From halr at voltaire.com Fri Oct 1 05:25:28 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 01 Oct 2004 08:25:28 -0400 Subject: [openib-general] [PATCH] ib_mad: Fix send side bugs Message-ID: <1096633528.2429.3.camel@hpc-1> ib_mad: Fix send side bugs Sending of DR SMPs is now working :-) Index: ib_mad.c =================================================================== --- ib_mad.c (revision 916) +++ ib_mad.c (working copy) @@ -894,6 +894,7 @@ struct ib_wc *wc) { struct ib_mad_send_wr_private *mad_send_wr; + struct list_head *send_wr; unsigned long flags; /* Completion corresponds to first entry on posted MAD send list */ @@ -907,9 +908,13 @@ mad_send_wr = list_entry(&port_priv->send_posted_mad_list, struct ib_mad_send_wr_private, send_list); - if (mad_send_wr->wr_id != wc->wr_id) { + send_wr = mad_send_wr->send_list.next; + mad_send_wr = container_of(send_wr, struct ib_mad_send_wr_private, send_list); + if (wc->wr_id != (unsigned long)mad_send_wr) { printk(KERN_ERR "Send completion WR ID 0x%Lx doesn't match " - "posted send WR ID 0x%Lx\n", wc->wr_id, mad_send_wr->wr_id); + "posted send WR ID 0x%lx\n", + wc->wr_id, + (unsigned long)mad_send_wr); goto error; } From halr at voltaire.com Fri Oct 1 05:34:31 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 01 Oct 2004 08:34:31 -0400 Subject: [openib-general] Re: [PATCH] code/stack cleanup in find_mad_agent/validate_mad routines In-Reply-To: <20040930125414.47bca27e.mshefty@ichips.intel.com> References: <20040930125414.47bca27e.mshefty@ichips.intel.com> Message-ID: <1096634071.2429.7.camel@hpc-1> On Thu, 2004-09-30 at 15:54, Sean Hefty wrote: > Patch removes a couple of stack variables, eliminates gotos, > and reformats lines over 80 columns. As the coding guideline have a section on "Centralized exiting of functions", I don't think the gotos should be eliminated, at least in this way. -- Hal From halr at voltaire.com Fri Oct 1 05:54:34 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 01 Oct 2004 08:54:34 -0400 Subject: [openib-general] [PATCH] ib_cancel_mad API In-Reply-To: <20040930123614.4eb864a2.mshefty@ichips.intel.com> References: <20040930095509.509ceace.mshefty@ichips.intel.com> <000501c4a70f$d8b539c0$655aa8c0@infiniconsys.com> <20040930123614.4eb864a2.mshefty@ichips.intel.com> Message-ID: <1096635274.2427.1.camel@hpc-1> On Thu, 2004-09-30 at 15:36, Sean Hefty wrote: > On Thu, 30 Sep 2004 10:06:36 -0700 > "Fab Tillier" wrote: > > > I find the "goto found" syntax ugly and confusing. It seems unnatural to > > jump over the unlock like that. > > Does this patch (version 3) seem more natural and less confusing to you? :) Thanks. Applied with a minor change (for centralized exiting of ib_cancel_mad). -- Hal From sean.hefty at intel.com Fri Oct 1 08:32:43 2004 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 1 Oct 2004 08:32:43 -0700 Subject: [openib-general] Re: [PATCH] code/stack cleanup infind_mad_agent/validate_mad routines In-Reply-To: <1096634071.2429.7.camel@hpc-1> Message-ID: >As the coding guideline have a section on "Centralized exiting of >functions", I don't think the gotos should be eliminated, at least in >this way. Bah - the example that the coding guideline gives in that section doesn't even follow this rule. It's usually only useful when there's common cleanup that's needed. But I am fine not applying this patch. 
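For readers following the style point: the "Centralized exiting of functions" section of the kernel coding guidelines is about sharing cleanup across several failure paths. A minimal sketch of that pattern follows; the structure, helper, and label names are invented for illustration and are not taken from the ib_mad code.

#include <linux/slab.h>

struct example_agent {
	void *recv_buf;
};

/* Placeholder for whatever real setup the agent would need. */
static int example_start(struct example_agent *agent)
{
	return 0;
}

static int example_register_agent(struct example_agent **out)
{
	struct example_agent *agent;
	int ret;

	agent = kmalloc(sizeof(*agent), GFP_KERNEL);
	if (!agent)
		return -ENOMEM;

	agent->recv_buf = kmalloc(256, GFP_KERNEL);
	if (!agent->recv_buf) {
		ret = -ENOMEM;
		goto err_free_agent;
	}

	ret = example_start(agent);
	if (ret)
		goto err_free_buf;

	*out = agent;
	return 0;

err_free_buf:				/* unwind in reverse order */
	kfree(agent->recv_buf);
err_free_agent:
	kfree(agent);
	return ret;
}

As the reply above notes, the labels only earn their keep when several failure points share cleanup like this; a function with nothing to undo reads just as well with early returns.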
From ftillier at infiniconsys.com Fri Oct 1 08:57:13 2004 From: ftillier at infiniconsys.com (Fab Tillier) Date: Fri, 1 Oct 2004 08:57:13 -0700 Subject: [openib-general] [PATCH] ib_cancel_mad API In-Reply-To: <20040930123614.4eb864a2.mshefty@ichips.intel.com> Message-ID: <000601c4a7cf$51ed46b0$655aa8c0@infiniconsys.com> > From: Sean Hefty [mailto:mshefty at ichips.intel.com] > Sent: Thursday, September 30, 2004 12:36 PM > > Does this patch (version 3) seem more natural and less confusing to > you? :) > It sure does. Thanks! - Fab From halr at voltaire.com Fri Oct 1 09:33:44 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 01 Oct 2004 12:33:44 -0400 Subject: [openib-general] Re: [PATCH] request/response matching in MAD code In-Reply-To: <20040930121628.2e966a1f.mshefty@ichips.intel.com> References: <20040930121628.2e966a1f.mshefty@ichips.intel.com> Message-ID: <1096648424.2432.2.camel@hpc-1> On Thu, 2004-09-30 at 15:16, Sean Hefty wrote: > The following patch should match response MADs with the corresponding request. A response without a matching request is discarded, and responses are reported before requests. > > Timeouts of request MADs are not yet handled. Thanks! Applied with a few changes: 1. In ib_post_send_mad, where the TID is obtained from the sg list, the addr is a physical one and needed converted to a virtual one. mad_send_wr->tid = ((struct ib_mad_hdr*) bus_to_virt(cur_send_wr->sg_list->addr))->tid; 2. Added the following to reassemble_recv (it was eliminated from ib_mad_recv_done_handler): INIT_LIST_HEAD(&recv->header.recv_buf.list); I also eliminated an unneeded stack variable in ib_post_send_mad. Also, should the TID be overwritten in the high 32 bits or do we trust the client to set this properly ? Note that I only validated an incoming request and have not tested the request/response matching as yet. -- Hal From mshefty at ichips.intel.com Fri Oct 1 09:41:26 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 1 Oct 2004 09:41:26 -0700 Subject: [openib-general] Re: [PATCH] request/response matching in MAD code In-Reply-To: <1096648424.2432.2.camel@hpc-1> References: <20040930121628.2e966a1f.mshefty@ichips.intel.com> <1096648424.2432.2.camel@hpc-1> Message-ID: <20041001094126.5114a67c.mshefty@ichips.intel.com> On Fri, 01 Oct 2004 12:33:44 -0400 Hal Rosenstock wrote: > mad_send_wr->tid = ((struct ib_mad_hdr*) > bus_to_virt(cur_send_wr->sg_list->addr))->tid; Thanks - good catch. > 2. Added the following to reassemble_recv (it was eliminated from > ib_mad_recv_done_handler): > > INIT_LIST_HEAD(&recv->header.recv_buf.list); I was going to get back to RMPP handling. I'm wasn't sure if we wanted to use a doubly linked list or singly linked one for this. > Also, should the TID be overwritten in the high 32 bits or do we trust > the client to set this properly ? Based on our previous discussions on this, clients are responsible for setting the upper 32-bits of the TID correctly. This should be more efficient, and only requires to use the TID provided through registration. > Note that I only validated an incoming request and have not tested the > request/response matching as yet. We may want to delay testing of this until I can get the timeout code added. 
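To make the scheme in this thread concrete - a response is routed to an agent by the high 32 bits of its transaction ID, then paired with the outstanding request by the full TID - here is a rough sketch. The types and function names are placeholders rather than the real ib_mad structures, the TID is assumed to already be in host byte order, and the spinlocks the real code holds around these list walks are omitted.

#include <linux/list.h>
#include <linux/types.h>

struct sample_agent {
	struct list_head list;		/* entry on the port's agent list */
	u32 hi_tid;			/* assigned when the client registers */
	struct list_head send_list;	/* sends still waiting for a response */
};

struct sample_send_req {
	struct list_head list;
	u64 tid;			/* TID copied from the request MAD */
	int timeout_ms;			/* non-zero: a response is expected */
};

/* Step 1: route a response to the agent whose hi_tid matches. */
static struct sample_agent *route_response(struct list_head *agent_list, u64 tid)
{
	struct sample_agent *agent;
	u32 hi_tid = (u32)(tid >> 32);

	list_for_each_entry(agent, agent_list, list)
		if (agent->hi_tid == hi_tid)
			return agent;
	return NULL;			/* no registered client: discard the MAD */
}

/* Step 2: pair the response with the request carrying the same full TID;
 * requests sent without a timeout never match. */
static struct sample_send_req *find_request(struct sample_agent *agent, u64 tid)
{
	struct sample_send_req *req;

	list_for_each_entry(req, &agent->send_list, list)
		if (req->timeout_ms && req->tid == tid)
			return req;
	return NULL;
}

The point being debated above is only about who guarantees the hi_tid half: the sketch assumes, as the thread concludes, that the client filled it in from the value handed back at registration.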
From mshefty at ichips.intel.com Fri Oct 1 16:11:22 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 1 Oct 2004 16:11:22 -0700 Subject: [openib-general] [PATCH] request/response matching in MAD code In-Reply-To: <20040930121628.2e966a1f.mshefty@ichips.intel.com> References: <20040930121628.2e966a1f.mshefty@ichips.intel.com> Message-ID: <20041001161122.7f79b08e.mshefty@ichips.intel.com> On Thu, 30 Sep 2004 12:16:28 -0700 Sean Hefty wrote: > Timeouts of request MADs are not yet handled. I have part of this implemented now. Does anyone have any opinions on how to manage the thread(s) for processing timeouts? My current thinking is to use workqueues, with one workqueue item per mad_agent. - Sean From roland at topspin.com Fri Oct 1 16:54:21 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 01 Oct 2004 16:54:21 -0700 Subject: [openib-general] [PATCH] request/response matching in MAD code In-Reply-To: <20041001161122.7f79b08e.mshefty@ichips.intel.com> (Sean Hefty's message of "Fri, 1 Oct 2004 16:11:22 -0700") References: <20040930121628.2e966a1f.mshefty@ichips.intel.com> <20041001161122.7f79b08e.mshefty@ichips.intel.com> Message-ID: <52vfduvvte.fsf@topspin.com> Sean> I have part of this implemented now. Does anyone have any Sean> opinions on how to manage the thread(s) for processing Sean> timeouts? Why do we need a thread to handle timeouts? Can't we do all the processing from the timer callback? - R. From mshefty at ichips.intel.com Fri Oct 1 17:11:28 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 1 Oct 2004 17:11:28 -0700 Subject: [openib-general] [PATCH] request/response matching in MAD code In-Reply-To: <52vfduvvte.fsf@topspin.com> References: <20040930121628.2e966a1f.mshefty@ichips.intel.com> <20041001161122.7f79b08e.mshefty@ichips.intel.com> <52vfduvvte.fsf@topspin.com> Message-ID: <20041001171128.18bdc3b0.mshefty@ichips.intel.com> On Fri, 01 Oct 2004 16:54:21 -0700 Roland Dreier wrote: > Sean> I have part of this implemented now. Does anyone have any > Sean> opinions on how to manage the thread(s) for processing > Sean> timeouts? > > Why do we need a thread to handle timeouts? Can't we do all the > processing from the timer callback? Handling the processing from the timer callback is an option. I wasn't sure that we wanted to invoke user callbacks from a timer callback, however. - Sean From roland at topspin.com Fri Oct 1 17:57:35 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 01 Oct 2004 17:57:35 -0700 Subject: [openib-general] [PATCH] request/response matching in MAD code In-Reply-To: <20041001171128.18bdc3b0.mshefty@ichips.intel.com> (Sean Hefty's message of "Fri, 1 Oct 2004 17:11:28 -0700") References: <20040930121628.2e966a1f.mshefty@ichips.intel.com> <20041001161122.7f79b08e.mshefty@ichips.intel.com> <52vfduvvte.fsf@topspin.com> <20041001171128.18bdc3b0.mshefty@ichips.intel.com> Message-ID: <52r7oivsw0.fsf@topspin.com> Sean> Handling the processing from the timer callback is an Sean> option. I wasn't sure that we wanted to invoke user Sean> callbacks from a timer callback, however. It seems like the natural thing to do. I'd also like to move the completion processing to a tasklet. - R. From sean.hefty at intel.com Fri Oct 1 19:43:58 2004 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 1 Oct 2004 19:43:58 -0700 Subject: [openib-general] [PATCH] request/response matching in MAD code In-Reply-To: <52r7oivsw0.fsf@topspin.com> Message-ID: > Sean> Handling the processing from the timer callback is an > Sean> option. 
I wasn't sure that we wanted to invoke user > Sean> callbacks from a timer callback, however. > >It seems like the natural thing to do. I'll do this if no one objects. >I'd also like to move the completion processing to a tasklet. Would there be any issues processing a local MAD on the HCA in a tasklet? (I'm not close to the code at the moment.) If not, then that sounds good to me. I do think that we'd want to define whether a client could sleep in their callbacks or not, and it would be nice if it were consistent, regardless of how the callback were invoked. From roland at topspin.com Fri Oct 1 20:05:50 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 01 Oct 2004 20:05:50 -0700 Subject: [openib-general] [PATCH] request/response matching in MAD code In-Reply-To: (Sean Hefty's message of "Fri, 1 Oct 2004 19:43:58 -0700") References: Message-ID: <52mzz5x1ip.fsf@topspin.com> Sean> Would there be any issues processing a local MAD on the HCA Sean> in a tasklet? (I'm not close to the code at the moment.) Sean> If not, then that sounds good to me. Hmm, that's a good point. Making that work is probably going to take a redesign of the MAD layer internals, and I don't think we should tackle that right now. For now it's probably better to have something like a workqueue per mad_port (singlethreaded I guess, the full per-CPU shebang feels like overkill). Both completion handling and timeouts can be processed with queue_work(); I think you need only two work_structs per mad_port (one for completions and one for timeouts). You can keep a list of timed out sends -- when a timeout happens, just add the send to the list and then call queue_work(). In addition to being neater (fewer kernel threads), using the workqueue for completions makes it easier to avoid lost wakeups: work_struct->func is guaranteed to be called after queue_work(). In the long run we probably do want to move all the MAD processing out of process context, to avoid the SMA starvation problems that were brought up on the list a few months ago. So maybe the rule should be that consumers can't sleep in callbacks, even though it's OK in the current implementation. - Roland From halr at voltaire.com Sat Oct 2 04:28:04 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Sat, 02 Oct 2004 07:28:04 -0400 Subject: [openib-general] Re: [PATCH] request/response matching in MAD code In-Reply-To: <20041001094126.5114a67c.mshefty@ichips.intel.com> References: <20040930121628.2e966a1f.mshefty@ichips.intel.com> <1096648424.2432.2.camel@hpc-1> <20041001094126.5114a67c.mshefty@ichips.intel.com> Message-ID: <1096716484.1865.13.camel@localhost.localdomain> On Fri, 2004-10-01 at 12:41, Sean Hefty wrote: > On Fri, 01 Oct 2004 12:33:44 -0400 > Hal Rosenstock wrote: > > 2. Added the following to reassemble_recv (it was eliminated from > > ib_mad_recv_done_handler): > > > > INIT_LIST_HEAD(&recv->header.recv_buf.list); > > I was going to get back to RMPP handling. I think we can wait on RMPP. > 2. Added the following to reassemble_recv (it was eliminated from > > ib_mad_recv_done_handler): > > > > INIT_LIST_HEAD(&recv->header.recv_buf.list); > I'm wasn't sure if we wanted to use a doubly linked list or singly linked one for this. This isn't just for RMPP; it's needed to free the receive buffers correctly. I'm not sure either and have used doubly linked lists everywhere although I too think that singly linked ones would suffice. > > Also, should the TID be overwritten in the high 32 bits or do we trust > > the client to set this properly ? 
> > Based on our previous discussions on this, clients are responsible for setting the upper 32-bits of the TID correctly. This should be more efficient, and only requires to use the TID provided through registration. > > > Note that I only validated an incoming request and have not tested the > > request/response matching as yet. > > We may want to delay testing of this until I can get the timeout code added. I will test this without timeouts first and then retest when timeouts are implemented. -- Hal From halr at voltaire.com Mon Oct 4 08:24:23 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 04 Oct 2004 11:24:23 -0400 Subject: [openib-general] Access Layer Status Message-ID: <1096903463.1859.40.camel@localhost.localdomain> Here's an update on the access layer status: MAD layer is working for both receiving and sending including matching responses with requests. I am hoping to have the SMI/SMA working (and make a release announcement) by the end of tomorrow. I am most of the way through this with a couple of issues to resolve. Note that timeouts are not yet supported but are on their way although a release announcement may occur without this support. I am presuming we can start building other code without this initially. If I am wrong about this let me know. You should start seeing more patches for the changes to both the MAD and SMI/SMA coming over the next two days. -- Hal From halr at voltaire.com Mon Oct 4 08:53:21 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 04 Oct 2004 11:53:21 -0400 Subject: [openib-general] Re: [PATCH] request/response matching in MAD code In-Reply-To: <20041001094126.5114a67c.mshefty@ichips.intel.com> References: <20040930121628.2e966a1f.mshefty@ichips.intel.com> <1096648424.2432.2.camel@hpc-1> <20041001094126.5114a67c.mshefty@ichips.intel.com> Message-ID: <1096905200.1873.77.camel@localhost.localdomain> On Fri, 2004-10-01 at 12:41, Sean Hefty wrote: > On Fri, 01 Oct 2004 12:33:44 -0400 > Hal Rosenstock wrote: > > Also, should the TID be overwritten in the high 32 bits or do we trust > > the client to set this properly ? > > Based on our previous discussions on this, clients are responsible > for setting the upper 32-bits of the TID correctly. Yup, I recall this discussion. > This should be more efficient, and only requires to use the TID provided > through registration. It is a minor efficiency gain. The downside is the following: If the client sets the hi_tid wrong (not corresponding to the mad_agent supplied), the response either gets thrown away (if there is no hi_tid match) or delivered to the "wrong" client callback. A "similar" scenario exists when the client specifies a timeout and the send is "unsolicited". In this case, the send although never getting a response is timed out and not completed until then. It could be completed when the send completion occurs ignoring the timeout. With minor checks in both places, the MAD layer could protect against these scenarios. -- Hal From tduffy at sun.com Mon Oct 4 08:59:27 2004 From: tduffy at sun.com (Tom Duffy) Date: Mon, 04 Oct 2004 08:59:27 -0700 Subject: [openib-general] Re: [openib-commits] r922 - gen2/branches/roland-merge/src/linux-kernel/infiniband/core In-Reply-To: <20041003181209.BA1C92283D5@openib.ca.sandia.gov> References: <20041003181209.BA1C92283D5@openib.ca.sandia.gov> Message-ID: <1096905567.29818.9.camel@duffman> On Sun, 2004-10-03 at 11:12 -0700, roland at openib.org wrote: > Remove code to calculate static LID base from IP address. 
It's ugly, > relies too much on net stack internals, and will never be accepted > upstream anyway. So save ourselves the headache of making it build. OK, so the only patch now that you need to get openib building on 2.6.9- r3 is the following: Index: drivers/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- drivers/infiniband/ulp/ipoib/ipoib_multicast.c (revision 922) +++ drivers/infiniband/ulp/ipoib/ipoib_multicast.c (working copy) @@ -823,7 +823,7 @@ list_for_each_entry(mcast, &priv->multicast_list, list) clear_bit(IPOIB_MCAST_FLAG_FOUND, &mcast->flags); - read_lock(&in_dev->lock); + read_lock(&in_dev->mc_list_lock); /* Mark all of the entries that are found or don't exist */ for (im = in_dev->mc_list; im; im = im->next) { @@ -885,7 +885,7 @@ } } - read_unlock(&in_dev->lock); + read_unlock(&in_dev->mc_list_lock); /* Remove all of the entries don't exist anymore */ list_for_each_entry_safe(mcast, tmcast, &priv->multicast_list, list) { From mshefty at ichips.intel.com Mon Oct 4 09:38:13 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 4 Oct 2004 09:38:13 -0700 Subject: [openib-general] Re: [PATCH] request/response matching in MAD code In-Reply-To: <1096905200.1873.77.camel@localhost.localdomain> References: <20040930121628.2e966a1f.mshefty@ichips.intel.com> <1096648424.2432.2.camel@hpc-1> <20041001094126.5114a67c.mshefty@ichips.intel.com> <1096905200.1873.77.camel@localhost.localdomain> Message-ID: <20041004093813.6869bd0f.mshefty@ichips.intel.com> On Mon, 04 Oct 2004 11:53:21 -0400 Hal Rosenstock wrote: > It is a minor efficiency gain. The downside is the following: > If the client sets the hi_tid wrong (not corresponding to the mad_agent > supplied), the response either gets thrown away (if there is no hi_tid > match) or delivered to the "wrong" client callback. > > A "similar" scenario exists when the client specifies a timeout and the > send is "unsolicited". In this case, the send although never getting a > response is timed out and not completed until then. It could be > completed when the send completion occurs ignoring the timeout. I don't have strong feelings on this either way. What I'd like to avoid is pushing too much protocol knowledge into the access layer for a given management class. Some of it is difficult to avoid and still provide some useful functionality. As an example, a CM REP MAD is solicited. The access layer shouldn't modify the TID, but it makes sense for the user to specify a timeout value. - Sean From halr at voltaire.com Mon Oct 4 10:26:42 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 04 Oct 2004 13:26:42 -0400 Subject: [openib-general] ib_mad: Scenarios for returning posted send MADs Message-ID: <1096910801.1859.202.camel@localhost.localdomain> There are two lists of posted send MADs: (1) a list of posted sends for the port, and (2) another list per MAD agent. When a send is first posted, it is placed on both lists until the send completion occurs and then is removed from the port send list. The handling of the agent send list is based on whether there is a timeout specified or not. 1.
However, it does not appear to me that if the send completion occurs after the deregistration that this completion is thrown away properly but rather a callback may be performed. Did I miss something here ? 2. Another scenario for this is on WC errors which currently attempt to restart the port. I am not sure all WC errors should do this. Perhaps only IB_WC_FATAL_ERR and IB_WC_GENERAL_ERR. 3. The final scenario is board (not currently possible) or module removal. My concern here is about potential send callbacks (indicating FLUSHED) to a potentially stale MAD agent. When the module is removed non forceably, the clients (upper layer modules) would need to be removed first, which should cause the proper deregistration (and these MADs would be cancelled so there would be none to cleanup). I am not sure what the rules for proper behavior are on forceable module removal. Board removal would be similar to this (the forceable module removal case). -- Hal From halr at voltaire.com Mon Oct 4 10:36:26 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 04 Oct 2004 13:36:26 -0400 Subject: [openib-general] mthca and DDR not hidden In-Reply-To: <20041002192612.GB8326@mellanox.co.il> References: <52655vzee8.fsf@topspin.com> <1096576805.2885.2.camel@hpc-1> <20040930140429.486106c3.mshefty@ichips.intel.com> <20041002192612.GB8326@mellanox.co.il> Message-ID: <1096911386.1873.209.camel@localhost.localdomain> On Sat, 2004-10-02 at 15:26, Michael S. Tsirkin wrote: > I'd like to suggest that the mad layer could expose an allocator > function that the user will call to grab the memory for the mad. > This function would return the memory pointer and the appropriate > lkey value. Wouldn't the triple of lkey, start address, and size need to be returned ? > This would make it possible to change the implementation to have > even multiple MRs if needed (for example for tavor you need > two MRs to cover the whole 64 bit address space). How can it be determined whether a second MR is needed ? > On systems like the Apple's G5 where enabling memory for DMA > consumes IOMMU resources, it may even be a good idea to > implement this by keeping a pool of MADs and allocate from there. There was a discussion about a MAD pool or even multiple pools before the ib_mad API was agreed upon. It seemed that consensus at the time was against this, although I think the consensus might have been 51 to 49 on this. -- Hal From mshefty at ichips.intel.com Mon Oct 4 10:52:29 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 4 Oct 2004 10:52:29 -0700 Subject: [openib-general] Re: ib_mad: Scenarios for returning posted send MADs In-Reply-To: <1096910801.1859.202.camel@localhost.localdomain> References: <1096910801.1859.202.camel@localhost.localdomain> Message-ID: <20041004105229.5ce12e0e.mshefty@ichips.intel.com> On Mon, 04 Oct 2004 13:26:42 -0400 Hal Rosenstock wrote: > There are two lists of posted send MADs: (1) a list of posted sends for > the port, and (2) another list per MAD agent. When a send is first > posted, it is placed on both lists until the send completion occurs and > then is removed from the port send list. The handling of the agent send > list is based on whether there is a timeout specified or not. This is correct. The list per MAD agent is intended for timeouts and RMPP handling. Without RMPP, the list of posted sends per port matches the MAD agent list. With RMPP, a send at the MAD agent level may result in posting multiple work requests to the port layer. > 1. 
In the case that a client unregisters with the MAD layer, there is > code which cleans up the agent send list. However, it does not appear to > me that if the send completion occurs after the deregistration that this > completion is thrown away properly but rather a callback may be > performed. Did I miss something here ? A reference on the MAD agent is taken whenever a work request is posted to the QP. An additional reference is taken on the MAD agent if the MAD has a timeout, indicating that a response MAD is expected. When RMPP is added, a single send may result in multiple references being taken on the MAD agent. The reference per work request is not released until the work request complete. The reference for the response is not released until the response has been received, the request times out, or is canceled. When a client deregisters, MADs waiting for responses are canceled. This decrements their reference counts. If the MAD had no other references, then it is done and may be completed. If it still has references, this indicates that it has active work requests on the QP that must complete before the send MAD can complete. This is why the deregistration code decrements the reference count, then checks the reference count before flushing the request. > 2. Another scenario for this is on WC errors which currently attempt to > restart the port. I am not sure all WC errors should do this. Perhaps > only IB_WC_FATAL_ERR and IB_WC_GENERAL_ERR. My thought is that work requests that result in a failure should be completed in error from the port layer to the MAD agent. The port layer _could_ then restart operations with the next work request, and the MAD agent would complete the send MAD to the user in error. Of course, throwing RMPP into this complicates the matter, since the work request immediately behind the one causing the failure might be another request associated with the same RMPP MAD, which may cause another failure... It would help in this case for the port layer code just return completions for all queued work requests to the MAD agents, and let the MAD agent code deal with the issue. > 3. The final scenario is board (not currently possible) or module > removal. My concern here is about potential send callbacks (indicating > FLUSHED) to a potentially stale MAD agent. When the module is removed > non forceably, the clients (upper layer modules) would need to be > removed first, which should cause the proper deregistration (and these > MADs would be cancelled so there would be none to cleanup). I am not > sure what the rules for proper behavior are on forceable module removal. > Board removal would be similar to this (the forceable module removal > case). Deregistration is a synchronous process, so will wait until all send MADs have completed. If this isn't happening, then the referencing counting is off somewhere. - Sean From mshefty at ichips.intel.com Mon Oct 4 10:57:00 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 4 Oct 2004 10:57:00 -0700 Subject: [openib-general] mthca and DDR not hidden In-Reply-To: <1096911386.1873.209.camel@localhost.localdomain> References: <52655vzee8.fsf@topspin.com> <1096576805.2885.2.camel@hpc-1> <20040930140429.486106c3.mshefty@ichips.intel.com> <20041002192612.GB8326@mellanox.co.il> <1096911386.1873.209.camel@localhost.localdomain> Message-ID: <20041004105700.3daea4b5.mshefty@ichips.intel.com> On Mon, 04 Oct 2004 13:36:26 -0400 Hal Rosenstock wrote: > On Sat, 2004-10-02 at 15:26, Michael S. 
Tsirkin wrote: > > I'd like to suggest that the mad layer could expose an allocator > > function that the user will call to grab the memory for the mad. > > This function would return the memory pointer and the appropriate > > lkey value. > > Wouldn't the triple of lkey, start address, and size need to be returned I would think that the size would need to be an input into the allocator routine. > There was a discussion about a MAD pool or even multiple pools before > the ib_mad API was agreed upon. It seemed that consensus at the time was > against this, although I think the consensus might have been 51 to 49 on > this. Having an allocator routine might force users to perform data copies when sending data. Do all of the existing MAD implementations have routines to allocate MADs when sending data, and require those routines to be used? From halr at voltaire.com Mon Oct 4 11:12:39 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 04 Oct 2004 14:12:39 -0400 Subject: [openib-general] mthca and DDR not hidden In-Reply-To: <20041004105700.3daea4b5.mshefty@ichips.intel.com> References: <52655vzee8.fsf@topspin.com> <1096576805.2885.2.camel@hpc-1> <20040930140429.486106c3.mshefty@ichips.intel.com> <20041002192612.GB8326@mellanox.co.il> <1096911386.1873.209.camel@localhost.localdomain> <20041004105700.3daea4b5.mshefty@ichips.intel.com> Message-ID: <1096913558.1873.211.camel@localhost.localdomain> On Mon, 2004-10-04 at 13:57, Sean Hefty wrote: > Do all of the existing MAD implementations have routines to > allocate MADs when sending data, and require those routines > to be used? I believe the answer is that not all but most do. -- Hal From halr at voltaire.com Mon Oct 4 12:34:51 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 04 Oct 2004 15:34:51 -0400 Subject: [openib-general] Re: ib_mad: Scenarios for returning posted send MADs In-Reply-To: <20041004105229.5ce12e0e.mshefty@ichips.intel.com> References: <1096910801.1859.202.camel@localhost.localdomain> <20041004105229.5ce12e0e.mshefty@ichips.intel.com> Message-ID: <1096918490.1859.230.camel@localhost.localdomain> On Mon, 2004-10-04 at 13:52, Sean Hefty wrote: Hal Rosenstock wrote: > > 1. In the case that a client unregisters with the MAD layer, there is > > code which cleans up the agent send list. However, it does not appear to > > me that if the send completion occurs after the deregistration that this > > completion is thrown away properly but rather a callback may be > > performed. Did I miss something here ? > > A reference on the MAD agent is taken whenever a work request is > posted to the QP. An additional reference is taken on the MAD > agent if the MAD has a timeout, indicating that a response MAD is > expected. When RMPP is added, a single send may result in multiple > references being taken on the MAD agent. > > The reference per work request is not released until the work > request complete. The reference for the response is not released > until the response has been received, the request times out, or is > canceled. > > When a client deregisters, MADs waiting for responses are canceled. This decrements their reference counts. If the MAD had no other references, then it is done and may be completed. If it still has references, this indicates that it has active work requests on the QP that must complete before the send MAD can complete. > > This is why the deregistration code decrements the reference count, > then checks the reference count before flushing the request. 
I am pretty sure there is a window here as follows: First, deregistration cancels the MAD removing it from the agent send list. ib_mad_complete_send_wr is invoked some time later and never checks for the send WR still being on the agent send list. It just assumes it is. It potentially makes a send callback. > > 2. Another scenario for this is on WC errors which currently attempt to > > restart the port. I am not sure all WC errors should do this. Perhaps > > only IB_WC_FATAL_ERR and IB_WC_GENERAL_ERR. > > My thought is that work requests that result in a failure should be > completed in error from the port layer to the MAD agent. The port > layer _could_ then restart operations with the next work request, > and the MAD agent would complete the send MAD to the user in error. Aren't some errors fine grained and pertain only to the WR supplied whereas other errors are coarser (like fatal and general) and might apply to something larger (perhaps the port but maybe the QP) ? I wonder whether there is any assistance in the Mellanox documentation as to which errors should be treated how. > > Of course, throwing RMPP into this complicates the matter, since > the work request immediately behind the one causing the failure > might be another request associated with the same RMPP MAD, which > may cause another failure... > > It would help in this case for the port layer code > just return completions for all queued work requests to the MAD > agents, and let the MAD agent code deal with the issue. True for most errors. Not sure about fatal and general errors yet. > > 3. The final scenario is board (not currently possible) or module > > removal. My concern here is about potential send callbacks (indicating > > FLUSHED) to a potentially stale MAD agent. When the module is removed > > non forceably, the clients (upper layer modules) would need to be > > removed first, which should cause the proper deregistration (and these > > MADs would be cancelled so there would be none to cleanup). I am not > > sure what the rules for proper behavior are on forceable module removal. > > Board removal would be similar to this (the forceable module removal > > case). > > Deregistration is a synchronous process, so will wait until all > send MADs have completed. If this isn't happening, then the > referencing counting is off somewhere. I think deregistration is fine (short of issue 1 which I think is readily fixable). I was more asking about the asynchronous scenario here (forced module (or board) removal) where that isn't the case. -- Hal From mshefty at ichips.intel.com Mon Oct 4 12:52:08 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 4 Oct 2004 12:52:08 -0700 Subject: [openib-general] Re: ib_mad: Scenarios for returning posted send MADs In-Reply-To: <1096918490.1859.230.camel@localhost.localdomain> References: <1096910801.1859.202.camel@localhost.localdomain> <20041004105229.5ce12e0e.mshefty@ichips.intel.com> <1096918490.1859.230.camel@localhost.localdomain> Message-ID: <20041004125208.61db4915.mshefty@ichips.intel.com> On Mon, 04 Oct 2004 15:34:51 -0400 Hal Rosenstock wrote: > I am pretty sure there is a window here as follows: > First, deregistration cancels the MAD removing it from the agent send > list. > ib_mad_complete_send_wr is invoked some time later and never checks for > the send WR still being on the agent send list. It just assumes it is. > It potentially makes a send callback. The deregistration only removes the mad_send_wr from the agent send list if its reference count is zero. 
A reference is held on the mad_send_wr from the time that a work request is posted to the port, until a completion is reported. So, you should never get a callback for a mad_send_wr, unless its reference count is at least one. > Aren't some errors fine grained and pertain only to the WR supplied > whereas other errors are coarser (like fatal and general) and might > apply to something larger (perhaps the port but maybe the QP) ? I wonder > whether there is any assistance in the Mellanox documentation as to > which errors should be treated how. I was referring to errors that applied to a single work request only. For fatal errors that we cannot recover from, we may need a way to report such errors to the user to indicate that their mad_agent is no longer operational. > > It would help in this case for the port layer code > > just return completions for all queued work requests to the MAD > > agents, and let the MAD agent code deal with the issue. > > True for most errors. Not sure about fatal and general errors yet. I think it would depend on the error code that was reported in the send_mad_wc. If the return code is flushed, the mad_agent could just repost the send. If the return code is fatal error, it should complete the MAD to the client. > > > 3. The final scenario is board (not currently possible) or module > > > removal. My concern here is about potential send callbacks (indicating > > > FLUSHED) to a potentially stale MAD agent. When the module is removed > > > non forceably, the clients (upper layer modules) would need to be > > > removed first, which should cause the proper deregistration (and these > > > MADs would be cancelled so there would be none to cleanup). I am not > > > sure what the rules for proper behavior are on forceable module removal. > > > Board removal would be similar to this (the forceable module removal > > > case). > > > > Deregistration is a synchronous process, so will wait until all > > send MADs have completed. If this isn't happening, then the > > referencing counting is off somewhere. > > I think deregistration is fine (short of issue 1 which I think is > readily fixable). I was more asking about the asynchronous scenario here > (forced module (or board) removal) where that isn't the case. Unless there's a bug in the code, I don't believe that we can have send callbacks to stale MAD agents. If you're trying to have the code deregister for a client, this would be impossible. Clients should receive some sort of removal notification event and would need to deregister in response to that event. From roland at topspin.com Mon Oct 4 13:14:34 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 04 Oct 2004 13:14:34 -0700 Subject: [openib-general] mthca and DDR not hidden In-Reply-To: <20041004105700.3daea4b5.mshefty@ichips.intel.com> (Sean Hefty's message of "Mon, 4 Oct 2004 10:57:00 -0700") References: <52655vzee8.fsf@topspin.com> <1096576805.2885.2.camel@hpc-1> <20040930140429.486106c3.mshefty@ichips.intel.com> <20041002192612.GB8326@mellanox.co.il> <1096911386.1873.209.camel@localhost.localdomain> <20041004105700.3daea4b5.mshefty@ichips.intel.com> Message-ID: <524qlaw89h.fsf@topspin.com> Sean> Having an allocator routine might force users to perform Sean> data copies when sending data. Sean> Do all of the existing MAD implementations have routines to Sean> allocate MADs when sending data, and require those routines Sean> to be used? Not Topspin's. 
I think moving allocation into the MAD layer is probably a bad idea, especially if the only motivation is to handle a DMA address of 0xffffffffffffffff for Tavor (I don't think that will ever happen in practice). The argument about IOMMU resources seems to be an argument in favor of letting the consumer handle allocation. - R. From roland at topspin.com Mon Oct 4 13:15:39 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 04 Oct 2004 13:15:39 -0700 Subject: [openib-general] Re: [openib-commits] r922 - gen2/branches/roland-merge/src/linux-kernel/infiniband/core In-Reply-To: <1096905567.29818.9.camel@duffman> (Tom Duffy's message of "Mon, 04 Oct 2004 08:59:27 -0700") References: <20041003181209.BA1C92283D5@openib.ca.sandia.gov> <1096905567.29818.9.camel@duffman> Message-ID: <52zn32utn8.fsf@topspin.com> Tom> OK, so the only patch now that you need to get openib Tom> building on 2.6.9- r3 is the following: Thanks. I'm currently working on reworking the IPoIB driver into a "native" driver, which will eliminate this ugly peeking into the network layer's multicast lists. When I have the required patch for the core network code, I'll post it here. - R. From vonwyl at EIG.UNIGE.CH Tue Oct 5 05:46:31 2004 From: vonwyl at EIG.UNIGE.CH (von Wyl) Date: Tue, 05 Oct 2004 14:46:31 +0200 Subject: [openib-general] using VAPI and CMAPI in kernel module. Message-ID: <416297A7.3000408@eig.unige.ch> Hi, I'm trying to compile a kernel module which use the VAPI and (maybe) the CMAPI. And when I compile a file with an inclusion of vapi.h I got this : /usr/src/linux-2.6.7/drivers/infiniband/hw/mellanox-hca/include/mosal_timer.h:481: error: parse error before '*' token /usr/src/linux-2.6.7/drivers/infiniband/hw/mellanox-hca/include/mosal_timer.h:481: warning: function declaration isn't a prototype /usr/src/linux-2.6.7/drivers/infiniband/hw/mellanox-hca/include/mosal_timer.h:501: error: parse error before '*' token /usr/src/linux-2.6.7/drivers/infiniband/hw/mellanox-hca/include/mosal_timer.h:502: warning: function declaration isn't a prototype /usr/src/linux-2.6.7/drivers/infiniband/hw/mellanox-hca/include/mosal_timer.h: In function `MOSAL_time_compare': /usr/src/linux-2.6.7/drivers/infiniband/hw/mellanox-hca/include/mosal_timer.h:503: error: `ts1' undeclared (first use in this function) /usr/src/linux-2.6.7/drivers/infiniband/hw/mellanox-hca/include/mosal_timer.h:503: error: (Each undeclared identifier is reported only once /usr/src/linux-2.6.7/drivers/infiniband/hw/mellanox-hca/include/mosal_timer.h:503: error: for each function it appears in.) 
/usr/src/linux-2.6.7/drivers/infiniband/hw/mellanox-hca/include/mosal_timer.h:503: error: `ts2' undeclared (first use in this function) /usr/src/linux-2.6.7/drivers/infiniband/hw/mellanox-hca/include/mosal_timer.h: At top level: /usr/src/linux-2.6.7/drivers/infiniband/hw/mellanox-hca/include/mosal_timer.h:527: error: parse error before '*' token /usr/src/linux-2.6.7/drivers/infiniband/hw/mellanox-hca/include/mosal_timer.h:527: error: parse error before '*' token /usr/src/linux-2.6.7/drivers/infiniband/hw/mellanox-hca/include/mosal_timer.h:528: warning: return type defaults to `int' /usr/src/linux-2.6.7/drivers/infiniband/hw/mellanox-hca/include/mosal_timer.h:528: warning: function declaration isn't a prototype /usr/src/linux-2.6.7/drivers/infiniband/hw/mellanox-hca/include/mosal_timer.h: In function `MOSAL_time_add_usec': /usr/src/linux-2.6.7/drivers/infiniband/hw/mellanox-hca/include/mosal_timer.h:529: error: `ts' undeclared (first use in this function) /usr/src/linux-2.6.7/drivers/infiniband/hw/mellanox-hca/include/mosal_timer.h:529: error: `usecs' undeclared (first use in this function) /usr/src/linux-2.6.7/drivers/infiniband/hw/mellanox-hca/include/mosal_timer.h: At top level: /usr/src/linux-2.6.7/drivers/infiniband/hw/mellanox-hca/include/mosal_timer.h:553: error: parse error before '*' token /usr/src/linux-2.6.7/drivers/infiniband/hw/mellanox-hca/include/mosal_timer.h:554: warning: function declaration isn't a prototype /usr/src/linux-2.6.7/drivers/infiniband/hw/mellanox-hca/include/mosal_timer.h: In function `MOSAL_time_init': /usr/src/linux-2.6.7/drivers/infiniband/hw/mellanox-hca/include/mosal_timer.h:555: error: `ts' undeclared (first use in this function) the file is simply : #include #include #include int init_hello(void) { printk(KERN_NOTICE "Hello, world\n"); return 0; } void cleanup_hello(void) { printk(KERN_ALERT "Goodbye, cruel world\n"); } module_init(init_hello); module_exit(cleanup_hello); and the Makefile : KDIR = /lib/modules/$(shell uname -r)/build/ EXTRA_CFLAGS :=-I/usr/src/linux-2.6.7/drivers/infiniband/hw/mellanox-hca/include/ -O2 -Wall obj-m += exemple01.o default: $(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules clean : rm -rf *~ *.o *.ko If someone tried something like that befor could he send me some very simple examples? I'm using the openIB stack on a 2.6.7 kernel. Thanks... From halr at voltaire.com Tue Oct 5 06:31:43 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 05 Oct 2004 09:31:43 -0400 Subject: [openib-general] Re: ib_mad: Scenarios for returning posted send MADs In-Reply-To: <20041004125208.61db4915.mshefty@ichips.intel.com> References: <1096910801.1859.202.camel@localhost.localdomain> <20041004105229.5ce12e0e.mshefty@ichips.intel.com> <1096918490.1859.230.camel@localhost.localdomain> <20041004125208.61db4915.mshefty@ichips.intel.com> Message-ID: <1096983102.1861.22.camel@localhost.localdomain> On Mon, 2004-10-04 at 15:52, Sean Hefty wrote: > Hal Rosenstock wrote: > > > I am pretty sure there is a window here as follows: > > First, deregistration cancels the MAD removing it from the agent send > > list. > > ib_mad_complete_send_wr is invoked some time later and never checks for > > the send WR still being on the agent send list. It just assumes it is. > > It potentially makes a send callback. > > The deregistration only removes the mad_send_wr from the agent send list > if its reference count is zero. A reference is held on the mad_send_wr > from the time that a work request is posted to the port, until a completion > is reported. 
So, you should never get a callback for a mad_send_wr, > unless its reference count is at least one. Do you mean that you should never get a callback for a mad_send_wr if (rather than unless) it's reference count is at least one ? Cancelling the MAD decrements the reference count which has the effect of moving the callback one stage ahead. Either the send completion has already occurred in which case the reference count will be 0 or negative and the callback will be invoked immediately or it has not yet occurred so is invoked when the send completion occurs subsequently. The latter case leaves the MAD on both the port and agent send lists until it occurs. There are the other removal scenarios which interplay with this. > > Aren't some errors fine grained and pertain only to the WR supplied > > whereas other errors are coarser (like fatal and general) and might > > apply to something larger (perhaps the port but maybe the QP) ? I wonder > > whether there is any assistance in the Mellanox documentation as to > > which errors should be treated how. > > I was referring to errors that applied to a single work request only. > For fatal errors that we cannot recover from, we may need a way to report > such errors to the user to indicate that their mad_agent is no longer > operational. In looking at the programmer's guide and the mthca driver, I found the following: All CQE syndromes are converted to the appropriate WC status. An unknown syndrome is reported as a general error. Fatal error appears to me to be currently unused. > > > It would help in this case for the port layer code > > > just return completions for all queued work requests to the MAD > > > agents, and let the MAD agent code deal with the issue. > > > > True for most errors. Not sure about fatal and general errors yet. > > I think it would depend on the error code that was reported in the > send_mad_wc. If the return code is flushed, the mad_agent could just > repost the send. Agreed. > If the return code is fatal error, it should complete the MAD to the client. I couldn't find any fatal errors. I think this would be true for a general error. > > > > 3. The final scenario is board (not currently possible) or module > > > > removal. My concern here is about potential send callbacks (indicating > > > > FLUSHED) to a potentially stale MAD agent. When the module is removed > > > > non forceably, the clients (upper layer modules) would need to be > > > > removed first, which should cause the proper deregistration (and these > > > > MADs would be cancelled so there would be none to cleanup). I am not > > > > sure what the rules for proper behavior are on forceable module removal. > > > > Board removal would be similar to this (the forceable module removal > > > > case). > > > > > > Deregistration is a synchronous process, so will wait until all > > > send MADs have completed. If this isn't happening, then the > > > referencing counting is off somewhere. > > > > I think deregistration is fine (short of issue 1 which I think is > > readily fixable). I was more asking about the asynchronous scenario here > > (forced module (or board) removal) where that isn't the case. > > Unless there's a bug in the code, I don't believe that we can have send > callbacks to stale MAD agents. If you're trying to have the code deregister > for a client, this would be impossible. Clients should receive some sort > of removal notification event and would need to deregister in response > to that event. It all depends on the ordering of these shutdown events. 
If the removal event went to all modules simultaneously, there would need to be an interlock to prevent this from occuring. In any case, what should the MAD layer do when there are posted sends on an agent list ? Should it just dump them and not attempt to make a callback ? The bad side of this is that there are legitimate scenarios where this might occur. Not making the send callback has a number of side effects beyond the individual client (in terms of PCI mapping and memory leakage). -- Hal From mst at mellanox.co.il Tue Oct 5 08:26:28 2004 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 5 Oct 2004 17:26:28 +0200 Subject: [openib-general] Re: ib_mad: Scenarios for returning posted send MADs In-Reply-To: <1096983102.1861.22.camel@localhost.localdomain> References: <1096910801.1859.202.camel@localhost.localdomain> <20041004105229.5ce12e0e.mshefty@ichips.intel.com> <1096918490.1859.230.camel@localhost.localdomain> <20041004125208.61db4915.mshefty@ichips.intel.com> <1096983102.1861.22.camel@localhost.localdomain> Message-ID: <20041005152628.GC8230@mellanox.co.il> Hello! Quoting r. Hal Rosenstock (halr at voltaire.com) "[openib-general] Re: ib_mad: Scenarios for returning posted send MADs": > > Unless there's a bug in the code, I don't believe that we can have send > > callbacks to stale MAD agents. If you're trying to have the code deregister > > for a client, this would be impossible. Clients should receive some sort > > of removal notification event and would need to deregister in response > > to that event. > > It all depends on the ordering of these shutdown events. If the removal > event went to all modules simultaneously, there would need to be an > interlock to prevent this from occuring. > > In any case, what should the MAD layer do when there are posted sends on > an agent list ? Should it just dump them and not attempt to make a > callback ? The bad side of this is that there are legitimate scenarios > where this might occur. Not making the send callback has a number of > side effects beyond the individual client (in terms of PCI mapping and > memory leakage). > > -- Hal I dont see where is this coming from. You dont want each send to play with locks and/or counters just to prevent module removal - it will hurt performance, and module reference count shall take care of that for you. For power management and/or hotswap, thats another issue. As Sean points out, you must first remove all client modules before removing the driver, or modprobe -r will fail. Force I think is for debugging, like for bugs in reference counting. MST From mst at mellanox.co.il Tue Oct 5 08:32:26 2004 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 5 Oct 2004 17:32:26 +0200 Subject: [openib-general] mthca and DDR not hidden In-Reply-To: <524qlaw89h.fsf@topspin.com> References: <52655vzee8.fsf@topspin.com> <1096576805.2885.2.camel@hpc-1> <20040930140429.486106c3.mshefty@ichips.intel.com> <20041002192612.GB8326@mellanox.co.il> <1096911386.1873.209.camel@localhost.localdomain> <20041004105700.3daea4b5.mshefty@ichips.intel.com> <524qlaw89h.fsf@topspin.com> Message-ID: <20041005153226.GD8230@mellanox.co.il> Hello! Quoting r. Roland Dreier (roland at topspin.com) "Re: [openib-general] mthca and DDR not hidden": > Sean> Having an allocator routine might force users to perform > Sean> data copies when sending data. > > Sean> Do all of the existing MAD implementations have routines to > Sean> allocate MADs when sending data, and require those routines > Sean> to be used? > > Not Topspin's. 
> > I think moving allocation into the MAD layer is probably a bad idea, > especially if the only motivation is to handle a DMA address of > 0xffffffffffffffff for Tavor (I don't think that will ever happen in > practice). Actually saw some mail on lkml the other day about a 64 bit system where memory is fragmented - PCI at 0 and actual memory near -1. Anyway, Tavor is likely not only device with restrictions on maximum region size? > The argument about IOMMU resources seems to be an argument > in favor of letting the consumer handle allocation. > > - R. Are you saying consumer shall play with IOMMU? But then PCI addresses of memory will change each time consumer does this and you wont be able to use your default region? MSt From mst at mellanox.co.il Tue Oct 5 08:34:32 2004 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 5 Oct 2004 17:34:32 +0200 Subject: [openib-general] mthca and DDR not hidden In-Reply-To: <20041004105700.3daea4b5.mshefty@ichips.intel.com> References: <52655vzee8.fsf@topspin.com> <1096576805.2885.2.camel@hpc-1> <20040930140429.486106c3.mshefty@ichips.intel.com> <20041002192612.GB8326@mellanox.co.il> <1096911386.1873.209.camel@localhost.localdomain> <20041004105700.3daea4b5.mshefty@ichips.intel.com> Message-ID: <20041005153432.GE8230@mellanox.co.il> Hello! Quoting r. Sean Hefty (mshefty at ichips.intel.com) "Re: [openib-general] mthca and DDR not hidden": > Having an allocator routine might force users to perform data copies when sending data. Well, no one is running MPI over MADs - its for setup and management, anyway. MST From mshefty at ichips.intel.com Tue Oct 5 08:52:13 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 5 Oct 2004 08:52:13 -0700 Subject: [openib-general] mthca and DDR not hidden In-Reply-To: <20041005153432.GE8230@mellanox.co.il> References: <52655vzee8.fsf@topspin.com> <1096576805.2885.2.camel@hpc-1> <20040930140429.486106c3.mshefty@ichips.intel.com> <20041002192612.GB8326@mellanox.co.il> <1096911386.1873.209.camel@localhost.localdomain> <20041004105700.3daea4b5.mshefty@ichips.intel.com> <20041005153432.GE8230@mellanox.co.il> Message-ID: <20041005085213.21d72fa5.mshefty@ichips.intel.com> On Tue, 5 Oct 2004 17:34:32 +0200 "Michael S. Tsirkin" wrote: > Hello! > Quoting r. Sean Hefty (mshefty at ichips.intel.com) "Re: [openib-general] mthca and DDR not hidden": > > Having an allocator routine might force users to perform data copies when sending data. > > Well, no one is running MPI over MADs - its for setup and > management, anyway. I guess I should have been clearer. I was only refering to data copies when sending MADs. Obviously, data sent on other QPs would not be affected. From roland at topspin.com Tue Oct 5 08:53:10 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 05 Oct 2004 08:53:10 -0700 Subject: [openib-general] mthca and DDR not hidden In-Reply-To: <20041005153226.GD8230@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 5 Oct 2004 17:32:26 +0200") References: <52655vzee8.fsf@topspin.com> <1096576805.2885.2.camel@hpc-1> <20040930140429.486106c3.mshefty@ichips.intel.com> <20041002192612.GB8326@mellanox.co.il> <1096911386.1873.209.camel@localhost.localdomain> <20041004105700.3daea4b5.mshefty@ichips.intel.com> <524qlaw89h.fsf@topspin.com> <20041005153226.GD8230@mellanox.co.il> Message-ID: <52lleltb4p.fsf@topspin.com> Michael> Actually saw some mail on lkml the other day about a 64 Michael> bit system where memory is fragmented - PCI at 0 and Michael> actual memory near -1. 
Anyway, Tavor is likely not only Michael> device with restrictions on maximum region size? Right, that's why there's pci_set_dma_mask() etc. Michael> Are you saying consumer shall play with IOMMU? But then Michael> PCI addresses of memory will change each time consumer Michael> does this and you wont be able to use your default Michael> region? Not "play with IOMMU" but use pci_map_single() etc. The PCI address may change but as long as it is in the low 0xffffffffffffffff part of memory we're fine. - Roland From iod00d at hp.com Tue Oct 5 08:53:01 2004 From: iod00d at hp.com (Grant Grundler) Date: Tue, 5 Oct 2004 08:53:01 -0700 Subject: [openib-general] mthca and DDR not hidden In-Reply-To: <20041005153226.GD8230@mellanox.co.il> References: <52655vzee8.fsf@topspin.com> <1096576805.2885.2.camel@hpc-1> <20040930140429.486106c3.mshefty@ichips.intel.com> <20041002192612.GB8326@mellanox.co.il> <1096911386.1873.209.camel@localhost.localdomain> <20041004105700.3daea4b5.mshefty@ichips.intel.com> <524qlaw89h.fsf@topspin.com> <20041005153226.GD8230@mellanox.co.il> Message-ID: <20041005155301.GA18567@cup.hp.com> On Tue, Oct 05, 2004 at 05:32:26PM +0200, Michael S. Tsirkin wrote: > > The argument about IOMMU resources seems to be an argument > > in favor of letting the consumer handle allocation. > > Are you saying consumer shall play with IOMMU? Yes, if "play with" means something above the HCA driver deals with DMA mappings. > But then > PCI addresses of memory will change each time consumer > does this and you wont be able to use your default region? Are regions defined by physical (DMA) or virtual addresses? In any case, a "default region" implies a long term DMA mapping. Documentation/DMA-mapping.txt is pretty clear that "streaming" DMA mappings are only intended for transient use - ie map something, do the DMA, then unmap. "Coherent" DMA mappings are intended to live longer periods but only for control structures like descriptor rings or other shared data. Linux tries to be "stingy" with DMA mappings because they are a scarce resource on some platforms. hth, grant From mst at mellanox.co.il Tue Oct 5 09:05:23 2004 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 5 Oct 2004 18:05:23 +0200 Subject: [openib-general] mthca and DDR not hidden In-Reply-To: <20041005085213.21d72fa5.mshefty@ichips.intel.com> References: <52655vzee8.fsf@topspin.com> <1096576805.2885.2.camel@hpc-1> <20040930140429.486106c3.mshefty@ichips.intel.com> <20041002192612.GB8326@mellanox.co.il> <1096911386.1873.209.camel@localhost.localdomain> <20041004105700.3daea4b5.mshefty@ichips.intel.com> <20041005153432.GE8230@mellanox.co.il> <20041005085213.21d72fa5.mshefty@ichips.intel.com> Message-ID: <20041005160522.GF8230@mellanox.co.il> Hello! Quoting r. Sean Hefty (mshefty at ichips.intel.com) "Re: [openib-general] mthca and DDR not hidden": > On Tue, 5 Oct 2004 17:34:32 +0200 > "Michael S. Tsirkin" wrote: > > > Hello! > > Quoting r. Sean Hefty (mshefty at ichips.intel.com) "Re: [openib-general] mthca and DDR not hidden": > > > Having an allocator routine might force users to perform data copies when sending data. > > > > Well, no one is running MPI over MADs - its for setup and > > management, anyway. > > I guess I should have been clearer. I was only refering to data copies when sending MADs. Obviously, data sent on other QPs would not be affected. > Right, thats why another copy shouldnt be such a big deal. 
mst From mshefty at ichips.intel.com Tue Oct 5 09:50:59 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 5 Oct 2004 09:50:59 -0700 Subject: [openib-general] Re: ib_mad: Scenarios for returning posted send MADs In-Reply-To: <1096983102.1861.22.camel@localhost.localdomain> References: <1096910801.1859.202.camel@localhost.localdomain> <20041004105229.5ce12e0e.mshefty@ichips.intel.com> <1096918490.1859.230.camel@localhost.localdomain> <20041004125208.61db4915.mshefty@ichips.intel.com> <1096983102.1861.22.camel@localhost.localdomain> Message-ID: <20041005095059.037c8961.mshefty@ichips.intel.com> On Tue, 05 Oct 2004 09:31:43 -0400 Hal Rosenstock wrote: > Do you mean that you should never get a callback for a mad_send_wr if > (rather than unless) it's reference count is at least one ? There will never be a completion callback associated with a mad_send_wr unless its reference count is >= 1. > Cancelling the MAD decrements the reference count which has the effect > of moving the callback one stage ahead. Either the send completion has > already occurred in which case the reference count will be 0 or negative > and the callback will be invoked immediately or it has not yet occurred > so is invoked when the send completion occurs subsequently. The latter > case leaves the MAD on both the port and agent send lists until it > occurs. There are the other removal scenarios which interplay with this. The reference count for a canceled MAD is only decremented *if there is a timeout specified*. The existence of a timeout value indicates that a response is expected. A mad_send_wr with a timeout has a reference count of 2 initially. It is decremented by one when the send completion callback is invoked. It is decremented if a response matches the request. In order for the request to match, it must still have a timeout value. If a MAD is canceled, its reference count is decremented if a timeout is given. The timeout is set to 0 in this case to prevent responses from matching with the MAD. > In any case, what should the MAD layer do when there are posted sends on > an agent list ? Should it just dump them and not attempt to make a > callback ? The bad side of this is that there are legitimate scenarios > where this might occur. Not making the send callback has a number of > side effects beyond the individual client (in terms of PCI mapping and > memory leakage). Currently, the MAD layer cancels all outstanding sends when the client deregisters, then waits for those sends to complete. The client receives a callback for all sends that it has posted, including those posted when it calls deregister. 
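The reference counting Sean describes above can be captured in a small standalone model. The structure and function names below are illustrative only (this is not the real ib_mad_send_wr_private); the sketch just encodes the rules from his mail: a posted send holds one reference, or two when a timeout (an expected response) is set, and the client's send callback runs when the count reaches zero. Cancel, timeout, and response match all drop the extra reference exactly once, clearing the timeout so it cannot be dropped again.

#include <stdio.h>

/* Illustrative model of the mad_send_wr reference counting described
 * above; not the actual ib_mad structures. */
struct send_wr_model {
        int refcount;
        int timeout_ms;   /* non-zero means a response is expected */
};

static void client_send_callback(struct send_wr_model *wr)
{
        (void) wr;
        printf("send callback delivered\n");
}

static void deref(struct send_wr_model *wr)
{
        if (--wr->refcount == 0)
                client_send_callback(wr);
}

static void post_send(struct send_wr_model *wr, int timeout_ms)
{
        wr->timeout_ms = timeout_ms;
        /* Extra reference held for the expected response (or timeout). */
        wr->refcount = timeout_ms ? 2 : 1;
}

/* Send completion reported by the CQ: always drops one reference. */
static void send_completion(struct send_wr_model *wr)
{
        deref(wr);
}

/* Response matched, request canceled, or request timed out: drop the
 * extra reference only if a timeout is still set, and clear it so a
 * later response can no longer match. */
static void response_cancel_or_timeout(struct send_wr_model *wr)
{
        if (wr->timeout_ms) {
                wr->timeout_ms = 0;
                deref(wr);
        }
}

int main(void)
{
        struct send_wr_model wr;

        /* Request expecting a response: callback fires only after both
         * the send completion and the response (or cancel/timeout). */
        post_send(&wr, 1000);
        send_completion(&wr);
        response_cancel_or_timeout(&wr);

        /* Unsolicited send: callback fires on send completion alone. */
        post_send(&wr, 0);
        send_completion(&wr);
        return 0;
}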
From halr at voltaire.com Tue Oct 5 11:54:15 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 05 Oct 2004 14:54:15 -0400 Subject: [openib-general] [PATCH] ib_mad.c: Fix request/response matching Message-ID: <1097002455.2319.11.camel@hpc-1> Fix endian of high tid so responses are properly matched to requests Index: access/ib_mad.c =================================================================== --- access/ib_mad.c (revision 923) +++ access/ib_mad.c (working copy) @@ -346,7 +346,7 @@ } mad_send_wr->tid = ((struct ib_mad_hdr*) - bus_to_virt(cur_send_wr->sg_list->addr))->tid; + bus_to_virt(cur_send_wr->sg_list->addr))->tid.id; mad_send_wr->agent = mad_agent; mad_send_wr->timeout_ms = cur_send_wr->wr.ud.timeout_ms; if (mad_send_wr->timeout_ms) @@ -420,7 +420,7 @@ void ib_coalesce_recv_mad(struct ib_mad_recv_wc *mad_recv_wc, void *buf) { - printk(KERN_ERR "ib_coalesce_recv_mad() not implemented as yet\n"); + printk(KERN_ERR "ib_coalesce_recv_mad() not implemented yet\n"); } EXPORT_SYMBOL(ib_coalesce_recv_mad); @@ -437,7 +437,7 @@ int ib_process_mad_wc(struct ib_mad_agent *mad_agent, struct ib_wc *wc) { - printk(KERN_ERR "ib_process_mad_wc() not implemented as yet\n"); + printk(KERN_ERR "ib_process_mad_wc() not implemented yet\n"); return 0; } EXPORT_SYMBOL(ib_process_mad_wc); @@ -684,7 +684,7 @@ /* Whether MAD was solicited determines type of routing to MAD client */ if (solicited) { /* Routing is based on high 32 bits of transaction ID of MAD */ - hi_tid = mad->mad_hdr.tid >> 32; + hi_tid = be32_to_cpu(mad->mad_hdr.tid.tid_field.hi_tid); list_for_each_entry(entry, &port_priv->agent_list, agent_list) { if (entry->agent.hi_tid == hi_tid) { mad_agent = entry; @@ -693,7 +693,7 @@ } if (!mad_agent) { printk(KERN_ERR "No client 0x%x for received MAD\n", - (u32)(mad->mad_hdr.tid >> 32)); + hi_tid); goto ret; } } else { @@ -795,7 +795,7 @@ if (solicited) { spin_lock_irqsave(&mad_agent_priv->send_list_lock, flags); mad_send_wr = find_send_req(mad_agent_priv, - recv->mad.mad.mad_hdr.tid); + recv->mad.mad.mad_hdr.tid.id); if (!mad_send_wr) { spin_unlock_irqrestore(&mad_agent_priv->send_list_lock, flags); @@ -936,6 +936,11 @@ } } + /* + * Leave sends with timeouts on the send list + * until either matching response is received + * or timeout occurs + */ if (--mad_send_wr->refcount > 0) { spin_unlock_irqrestore(&mad_agent_priv->send_list_lock, flags); return; @@ -1332,6 +1337,7 @@ list_del(&port_priv->recv_posted_mad_list[i]); } + INIT_LIST_HEAD(&port_priv->recv_posted_mad_list[i]); port_priv->recv_posted_mad_count[i] = 0; spin_unlock_irqrestore(&port_priv->recv_list_lock, flags); @@ -1353,6 +1359,7 @@ /* Call completion handler with flushed status !!! 
*/ } + INIT_LIST_HEAD(&port_priv->send_posted_mad_list); port_priv->send_posted_mad_count = 0; spin_unlock_irqrestore(&port_priv->send_list_lock, flags); Index: include/ib_mad.h =================================================================== --- include/ib_mad.h (revision 923) +++ include/ib_mad.h (working copy) @@ -69,6 +69,14 @@ union ib_gid dgid; } __attribute__ ((packed)); +union ib_tid { + u64 id; + struct { + u32 hi_tid; + u32 lo_tid; + } tid_field; +}; + struct ib_mad_hdr { u8 base_version; u8 mgmt_class; @@ -76,7 +84,7 @@ u8 method; u16 status; u16 class_specific; - u64 tid; + union ib_tid tid; u16 attr_id; u16 resv; u32 attr_mod; Index: include/ib_smi.h =================================================================== --- include/ib_smi.h (revision 923) +++ include/ib_smi.h (working copy) @@ -41,7 +41,7 @@ u16 status; u8 hop_ptr; u8 hop_cnt; - u64 tid; + union ib_tid tid; u16 attr_id; u16 resv; u32 attr_mod; From halr at voltaire.com Tue Oct 5 12:03:12 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 05 Oct 2004 15:03:12 -0400 Subject: [openib-general] [PATCH] ib_mad: Fix return posted receive MAD routine Message-ID: <1097002992.2319.21.camel@hpc-1> ib_mad: Fix return posted receive MAD routine Index: ib_mad.c =================================================================== --- ib_mad.c (revision 924) +++ ib_mad.c (working copy) @@ -1327,15 +1327,35 @@ { int i; unsigned long flags; + struct ib_mad_private_header *mad_priv_hdr; + struct ib_mad_recv_buf *rbuf; + struct ib_mad_private *recv; for (i = 0; i < IB_MAD_QPS_CORE; i++) { spin_lock_irqsave(&port_priv->recv_list_lock, flags); while (!list_empty(&port_priv->recv_posted_mad_list[i])) { - /* PCI mapping !!! */ + rbuf = list_entry(&port_priv->recv_posted_mad_list[i], + struct ib_mad_recv_buf, list); + rbuf = (struct ib_mad_recv_buf *)rbuf->list.next; + mad_priv_hdr = container_of(rbuf, + struct ib_mad_private_header, + recv_buf); + recv = container_of(mad_priv_hdr, + struct ib_mad_private, header); - list_del(&port_priv->recv_posted_mad_list[i]); + /* Remove for posted receive MAD list */ + list_del(&recv->header.recv_buf.list); + + /* Undo PCI mapping */ + pci_unmap_single(port_priv->device->dma_device, + pci_unmap_addr(&recv->header, mapping), + sizeof(struct ib_mad_private) - + sizeof(struct ib_mad_private_header), + PCI_DMA_FROMDEVICE); + kmem_cache_free(ib_mad_cache, recv); + } INIT_LIST_HEAD(&port_priv->recv_posted_mad_list[i]); @@ -1352,14 +1372,7 @@ unsigned long flags; spin_lock_irqsave(&port_priv->send_list_lock, flags); - while (!list_empty(&port_priv->send_posted_mad_list)) { - - list_del(&port_priv->send_posted_mad_list); - - /* Call completion handler with flushed status !!! */ - - } - + /* Just clear port send posted MAD list */ INIT_LIST_HEAD(&port_priv->send_posted_mad_list); port_priv->send_posted_mad_count = 0; spin_unlock_irqrestore(&port_priv->send_list_lock, flags); From roland at topspin.com Tue Oct 5 12:01:31 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 05 Oct 2004 12:01:31 -0700 Subject: [openib-general] [PATCH] ib_mad.c: Fix request/response matching In-Reply-To: <1097002455.2319.11.camel@hpc-1> (Hal Rosenstock's message of "Tue, 05 Oct 2004 14:54:15 -0400") References: <1097002455.2319.11.camel@hpc-1> Message-ID: <52k6u5rnuc.fsf@topspin.com> + bus_to_virt(cur_send_wr->sg_list->addr))->tid.id; Didn't notice this before but any use of bus_to_virt() is broken. We need to figure out a different way to do whatever you're trying to do here. - R. 
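For reference, the usual alternative to bus_to_virt(), and the direction Documentation/DMA-mapping.txt points in, is simply to remember the kernel virtual address at the time the buffer is mapped, so the CPU side never has to translate a bus address back. A minimal kernel-style sketch; the structure and function names are illustrative, not the actual ib_mad definitions:

#include <linux/pci.h>

/* Hypothetical bookkeeping kept per posted send. */
struct mapped_mad {
        void        *vaddr;    /* CPU view, saved at map time */
        dma_addr_t   dma;      /* bus address handed to the HCA */
        size_t       len;
};

static void map_mad_for_send(struct pci_dev *pdev, struct mapped_mad *m,
                             void *mad, size_t len)
{
        m->vaddr = mad;        /* remember the VA here ... */
        m->len   = len;
        m->dma   = pci_map_single(pdev, mad, len, PCI_DMA_TODEVICE);
}

static void unmap_mad_after_send(struct pci_dev *pdev, struct mapped_mad *m)
{
        /* ... and read the MAD header through m->vaddr, never through
         * bus_to_virt(m->dma). */
        pci_unmap_single(pdev, m->dma, m->len, PCI_DMA_TODEVICE);
}

In the MAD layer this is the same idea as keeping the virtual address (or the fields needed from it, such as the TID) in the private send tracking structure at post time.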
From halr at voltaire.com Tue Oct 5 12:11:53 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 05 Oct 2004 15:11:53 -0400 Subject: [openib-general] [PATCH] ib_mad: Include port number in mad_agent Message-ID: <1097003513.2319.30.camel@hpc-1> Include port_number in mad_agent This is to make it easy for the SMI client which needs to know the port number for received packets (which are associated with MAD agents) Index: access/ib_mad.c =================================================================== --- access/ib_mad.c (revision 925) +++ access/ib_mad.c (working copy) @@ -214,6 +214,7 @@ mad_agent_priv->agent.context = context; mad_agent_priv->agent.qp = port_priv->qp[qp_type]; mad_agent_priv->agent.hi_tid = ++ib_mad_client_id; + mad_agent_priv->agent.port_num = port_num; ret2 = add_mad_reg_req(mad_reg_req, mad_agent_priv); if (ret2) { Index: include/ib_mad.h =================================================================== --- include/ib_mad.h (revision 924) +++ include/ib_mad.h (working copy) @@ -144,6 +144,7 @@ * @hi_tid - Access layer assigned transaction ID for this client. * Unsolicited MADs sent by this client will have the upper 32-bits * of their TID set to this value. + * @port_num - Port number on which QP is registered */ struct ib_mad_agent { struct ib_device *device; @@ -152,6 +153,7 @@ ib_mad_send_handler send_handler; void *context; u32 hi_tid; + u8 port_num; }; /** From halr at voltaire.com Tue Oct 5 12:27:14 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 05 Oct 2004 15:27:14 -0400 Subject: [openib-general] [PATCH] ib_mad.c: Fix request/response matching In-Reply-To: <52k6u5rnuc.fsf@topspin.com> References: <1097002455.2319.11.camel@hpc-1> <52k6u5rnuc.fsf@topspin.com> Message-ID: <1097004434.2319.53.camel@hpc-1> On Tue, 2004-10-05 at 15:01, Roland Dreier wrote: > + bus_to_virt(cur_send_wr->sg_list->addr))->tid.id; > > Didn't notice this before but any use of bus_to_virt() is broken. We > need to figure out a different way to do whatever you're trying to do here. Can you explain why using bus_to_virt() is broken ? The TID of a request is needed so responses can be matched. One way around this would be to pass the TID as a separate parameter in the ib_post_send_mad call. Maybe there are other less brute force ways. -- Hal From roland at topspin.com Tue Oct 5 13:04:53 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 05 Oct 2004 13:04:53 -0700 Subject: [openib-general] [PATCH] ib_mad.c: Fix request/response matching In-Reply-To: <1097004434.2319.53.camel@hpc-1> (Hal Rosenstock's message of "Tue, 05 Oct 2004 15:27:14 -0400") References: <1097002455.2319.11.camel@hpc-1> <52k6u5rnuc.fsf@topspin.com> <1097004434.2319.53.camel@hpc-1> Message-ID: <52fz4trkwq.fsf@topspin.com> Hal> Can you explain why using bus_to_virt() is broken ? See Documentation/DMA-mapping.txt: "It is planned to completely remove virt_to_bus() and bus_to_virt() as they are entirely deprecated. Some ports already do not provide these as it is impossible to correctly support them." (for example ppc64 does not have bus_to_virt). Hal> The TID of a request is needed so responses can be matched. Hal> One way around this would be to pass the TID as a separate Hal> parameter in the ib_post_send_mad call. Maybe there are other Hal> less brute force ways. I don't see a way around adding a TID parameter to ib_post_send_mad or adding a TID member to the ib_send_wr.wr.ud union. - R.
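To make the second of those options concrete, here is a small standalone sketch of carrying the TID in the ud part of the send work request, so the post-send path never dereferences the mapped buffer. Every name here is hypothetical (this is not the real ib_verbs.h layout), and the tree ultimately went a different way, adding a struct ib_mad pointer to wr.ud once it became clear RMPP needs more than the TID; see the later patches in this thread.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical shapes only. */
struct example_ud_wr {
        uint64_t tid;          /* duplicate of mad_hdr.tid, set by the client */
        uint32_t remote_qpn;
        uint32_t remote_qkey;
        int      timeout_ms;
};

struct example_send_wr {
        struct example_ud_wr ud;
};

/* What the access layer would keep for request/response matching. */
struct example_mad_send_wr {
        uint64_t tid;
};

static void example_post_send_mad(struct example_mad_send_wr *tracked,
                                  const struct example_send_wr *wr)
{
        tracked->tid = wr->ud.tid;    /* no bus_to_virt() needed */
}

int main(void)
{
        struct example_send_wr wr = {
                .ud = {
                        .tid         = 0x123456789abcdef0ULL,
                        .remote_qpn  = 1,
                        .remote_qkey = 0x80010000,   /* QP1 well-known QKey */
                        .timeout_ms  = 1000,
                },
        };
        struct example_mad_send_wr tracked;

        example_post_send_mad(&tracked, &wr);
        printf("tracked tid = 0x%llx\n", (unsigned long long) tracked.tid);
        return 0;
}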
From halr at voltaire.com Tue Oct 5 13:26:11 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 05 Oct 2004 16:26:11 -0400 Subject: [openib-general] [PATCH] ib_mad.c: Fix request/response matching In-Reply-To: <52fz4trkwq.fsf@topspin.com> References: <1097002455.2319.11.camel@hpc-1> <52k6u5rnuc.fsf@topspin.com> <1097004434.2319.53.camel@hpc-1> <52fz4trkwq.fsf@topspin.com> Message-ID: <1097007971.2593.2.camel@hpc-1> On Tue, 2004-10-05 at 16:04, Roland Dreier wrote: > I don't see a way around adding a TID parameter to ib_post_send_mad or > adding a TID member to the ib_send_wr.wr.ud union. Adding another member into the ud union seems better than another parameter for this. I will generate a patch for this soon. -- Hal From sean.hefty at intel.com Tue Oct 5 13:25:18 2004 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 5 Oct 2004 13:25:18 -0700 Subject: [openib-general] [PATCH] ib_mad.c: Fix request/response matching In-Reply-To: <52fz4trkwq.fsf@topspin.com> Message-ID: > Hal> The tid of a requests is needed so responses can be matched. > > Hal> One way around this would be to pass the TID as a separate > Hal> parameter in the ib_post_send_mad call. Maybe there are other > Hal> less brute force ways. > >I don't see a way around adding a TID parameter to ib_post_send_mad or >adding a TID member to the ib_send_wr.wr.ud union. Are you saying that there's no way to access the MAD data itself from the access layer? RMPP will be extremely difficult to support if that's the case. From halr at voltaire.com Tue Oct 5 13:33:39 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 05 Oct 2004 16:33:39 -0400 Subject: [openib-general] [PATCH] ib_mad.c: Fix request/response matching In-Reply-To: References: Message-ID: <1097008419.2593.12.camel@hpc-1> On Tue, 2004-10-05 at 16:25, Sean Hefty wrote: > > Hal> The tid of a requests is needed so responses can be matched. > > > > Hal> One way around this would be to pass the TID as a separate > > Hal> parameter in the ib_post_send_mad call. Maybe there are other > > Hal> less brute force ways. > > > >I don't see a way around adding a TID parameter to ib_post_send_mad or > >adding a TID member to the ib_send_wr.wr.ud union. > > Are you saying that there's no way to access the MAD data itself from the > access layer? RMPP will be extremely difficult to support if that's the > case. Good point. We will need more than access to the TID for RMPP. We need a replacement for bus_to_virt. Is there an "approved" way to get from DMA address to VA ? -- Hal From sean.hefty at intel.com Tue Oct 5 13:34:39 2004 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 5 Oct 2004 13:34:39 -0700 Subject: [openib-general] [PATCH] ib_mad.c: Fix request/response matching In-Reply-To: <1097002455.2319.11.camel@hpc-1> Message-ID: >Fix endian of high tid so responses are properly matched to requests noooooooooooooooooooo.... the TID is in the MAD and goes on the wire. Please, do not use CPU endian! > mad_send_wr->tid = ((struct ib_mad_hdr*) >- bus_to_virt(cur_send_wr->sg_list->addr))->tid; >+ bus_to_virt(cur_send_wr->sg_list->addr))- >>tid.id; A response MAD should have exactly the same TID as what was sent. Not sure why we aren't matching against the entire TID. 
> if (solicited) { > /* Routing is based on high 32 bits of transaction ID of MAD >*/ >- hi_tid = mad->mad_hdr.tid >> 32; >+ hi_tid = be32_to_cpu(mad->mad_hdr.tid.tid_field.hi_tid); This shouldn't be necessary: Sender of request (system 1): mad.tid = (mad_agent.hi_tid << 32) | user_tid; send mad Receiver of response (system 1): hi_tid = mad.tid >> 32 The receiver of the request should just return the same TID that it received. >+ /* >+ * Leave sends with timeouts on the send list >+ * until either matching response is received >+ * or timeout occurs >+ */ FYI - this is about to change in my next patch. >+union ib_tid { >+ u64 id; >+ struct { >+ u32 hi_tid; >+ u32 lo_tid; >+ } tid_field; >+}; >+ I don't see why TID can't be u64 everywhere. We shouldn't have to make it a union. From mshefty at ichips.intel.com Tue Oct 5 13:42:54 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 5 Oct 2004 13:42:54 -0700 Subject: [openib-general] [PATCH] ib_mad: Fix return posted receive MAD routine In-Reply-To: <1097002992.2319.21.camel@hpc-1> References: <1097002992.2319.21.camel@hpc-1> Message-ID: <20041005134254.7db79eac.mshefty@ichips.intel.com> On Tue, 05 Oct 2004 15:03:12 -0400 Hal Rosenstock wrote: > + rbuf = list_entry(&port_priv->recv_posted_mad_list[i], > + struct ib_mad_recv_buf, list); > + rbuf = (struct ib_mad_recv_buf *)rbuf->list.next; Can we change this, and similar occurrences, to use "->next" in the list_entry macro call itself? Currently, after the execution of the first statement, rbuf does not reference struct ib_mad_recv_buf. It references the port_priv routine. - Sean From mshefty at ichips.intel.com Tue Oct 5 13:50:47 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 5 Oct 2004 13:50:47 -0700 Subject: [openib-general] [PATCH] ib_mad: Fix return posted receive MAD routine In-Reply-To: <20041005134254.7db79eac.mshefty@ichips.intel.com> References: <1097002992.2319.21.camel@hpc-1> <20041005134254.7db79eac.mshefty@ichips.intel.com> Message-ID: <20041005135047.547f44a7.mshefty@ichips.intel.com> On Tue, 5 Oct 2004 13:42:54 -0700 Sean Hefty wrote: > On Tue, 05 Oct 2004 15:03:12 -0400 > Hal Rosenstock wrote: > > > + rbuf = list_entry(&port_priv->recv_posted_mad_list[i], > > + struct ib_mad_recv_buf, list); > > + rbuf = (struct ib_mad_recv_buf *)rbuf->list.next; > > Can we change this, and similar occurrences, to use "->next" in the list_entry macro call itself? Currently, after the execution of the first statement, rbuf does not reference struct ib_mad_recv_buf. It references the port_priv routine. - Er... structure that is, not routine... too much multitasking on my brain at the moment. From iod00d at hp.com Tue Oct 5 13:51:37 2004 From: iod00d at hp.com (Grant Grundler) Date: Tue, 5 Oct 2004 13:51:37 -0700 Subject: [openib-general] [PATCH] ib_mad.c: Fix request/response matching In-Reply-To: <1097008419.2593.12.camel@hpc-1> References: <1097008419.2593.12.camel@hpc-1> Message-ID: <20041005205137.GA20420@cup.hp.com> On Tue, Oct 05, 2004 at 04:33:39PM -0400, Hal Rosenstock wrote: > Good point. We will need more than access to the TID for RMPP. We need a > replacement for bus_to_virt. Is there an "approved" way to get from DMA > address to VA ? Yes, pci_map_*. See Documentation/DMA-mapping.txt. The caller needs to save the DMA address to program it into HW and to call the pci_unmap_XXX() interfaces. No, if you mean does the OS support reverse lookups (return any DMA mapping associated with a given Virtual address). 
grant From halr at voltaire.com Tue Oct 5 13:57:58 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 05 Oct 2004 16:57:58 -0400 Subject: [openib-general] [PATCH] ib_mad.c: Fix request/response matching In-Reply-To: References: Message-ID: <1097009878.2431.5.camel@hpc-1> On Tue, 2004-10-05 at 16:34, Sean Hefty wrote: > >Fix endian of high tid so responses are properly matched to requests > > noooooooooooooooooooo.... the TID is in the MAD and goes on the wire. > Please, do not use CPU endian! > > > mad_send_wr->tid = ((struct ib_mad_hdr*) > >- > bus_to_virt(cur_send_wr->sg_list->addr))->tid; > >+ bus_to_virt(cur_send_wr->sg_list->addr))- > >>tid.id; > > A response MAD should have exactly the same TID as what was sent. Not sure > why we aren't matching against the entire TID. It's only done for comparison purposes (taking the TID off the wire in network endian and converting to CPU endian); hi_tid = be32_to_cpu(mad->mad_hdr.tid.tid_field.hi_tid); list_for_each_entry(entry, &port_priv->agent_list, agent_list) { if (entry->agent.hi_tid == hi_tid) { ... > > if (solicited) { > > /* Routing is based on high 32 bits of transaction ID of MAD > >*/ > >- hi_tid = mad->mad_hdr.tid >> 32; > >+ hi_tid = be32_to_cpu(mad->mad_hdr.tid.tid_field.hi_tid); > > This shouldn't be necessary: The comparison failed when the code was without the conversion. > Sender of request (system 1): > mad.tid = (mad_agent.hi_tid << 32) | user_tid; > send mad > > Receiver of response (system 1): > hi_tid = mad.tid >> 32 > > The receiver of the request should just return the same TID that it > received. It does. > >+ /* > >+ * Leave sends with timeouts on the send list > >+ * until either matching response is received > >+ * or timeout occurs > >+ */ > > FYI - this is about to change in my next patch. > > >+union ib_tid { > >+ u64 id; > >+ struct { > >+ u32 hi_tid; > >+ u32 lo_tid; > >+ } tid_field; > >+}; > >+ > > I don't see why TID can't be u64 everywhere. We shouldn't have to make it a > union. -- Hal From roland at topspin.com Tue Oct 5 13:59:37 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 05 Oct 2004 13:59:37 -0700 Subject: [openib-general] [PATCH] ib_mad.c: Fix request/response matching In-Reply-To: <1097008419.2593.12.camel@hpc-1> (Hal Rosenstock's message of "Tue, 05 Oct 2004 16:33:39 -0400") References: <1097008419.2593.12.camel@hpc-1> Message-ID: <52brfgswxy.fsf@topspin.com> Hal> Good point. We will need more than access to the TID for Hal> RMPP. We need a replacement for bus_to_virt. Is there an Hal> "approved" way to get from DMA address to VA ? No, you just need to save off the VA if you need to use it later. So maybe we need to add a pointer to the MAD header in ib_send_wr.wr.ud. - Roland From mshefty at ichips.intel.com Tue Oct 5 14:01:06 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 5 Oct 2004 14:01:06 -0700 Subject: [openib-general] [PATCH] ib_mad.c: Fix request/response matching In-Reply-To: <1097009878.2431.5.camel@hpc-1> References: <1097009878.2431.5.camel@hpc-1> Message-ID: <20041005140106.5db3837b.mshefty@ichips.intel.com> On Tue, 05 Oct 2004 16:57:58 -0400 Hal Rosenstock wrote: > > > mad_send_wr->tid = ((struct ib_mad_hdr*) > > >- > > bus_to_virt(cur_send_wr->sg_list->addr))->tid; > > >+ bus_to_virt(cur_send_wr->sg_list->addr))- > > >>tid.id; > > > > A response MAD should have exactly the same TID as what was sent. Not sure > > why we aren't matching against the entire TID. 
> > It's only done for comparison purposes (taking the TID off the wire in > network endian and converting to CPU endian); The sender of the request should be able to set the TID to whatever value it wants. The responder should just echo that TID. I don't understand why byte-swapping is needed at all on the TID. > > >- hi_tid = mad->mad_hdr.tid >> 32; > > >+ hi_tid = be32_to_cpu(mad->mad_hdr.tid.tid_field.hi_tid); > > > > This shouldn't be necessary: > > The comparison failed when the code was without the conversion. Why did the comparison fail? Was the client setting the upper 32-bits of the TID correctly, or was the client swapping the bits when setting it? From mshefty at ichips.intel.com Tue Oct 5 14:02:03 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 5 Oct 2004 14:02:03 -0700 Subject: [openib-general] [PATCH] ib_mad.c: Fix request/response matching In-Reply-To: <52brfgswxy.fsf@topspin.com> References: <1097008419.2593.12.camel@hpc-1> <52brfgswxy.fsf@topspin.com> Message-ID: <20041005140203.14c951f3.mshefty@ichips.intel.com> On Tue, 05 Oct 2004 13:59:37 -0700 Roland Dreier wrote: > Hal> Good point. We will need more than access to the TID for > Hal> RMPP. We need a replacement for bus_to_virt. Is there an > Hal> "approved" way to get from DMA address to VA ? > > No, you just need to save off the VA if you need to use it later. So > maybe we need to add a pointer to the MAD header in ib_send_wr.wr.ud. Would we need multiple VAs if scatter-gather is used by the client? From roland at topspin.com Tue Oct 5 14:06:13 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 05 Oct 2004 14:06:13 -0700 Subject: [openib-general] [PATCH] ib_mad.c: Fix request/response matching In-Reply-To: <20041005140203.14c951f3.mshefty@ichips.intel.com> (Sean Hefty's message of "Tue, 5 Oct 2004 14:02:03 -0700") References: <1097008419.2593.12.camel@hpc-1> <52brfgswxy.fsf@topspin.com> <20041005140203.14c951f3.mshefty@ichips.intel.com> Message-ID: <523c0sswmy.fsf@topspin.com> Sean> Would we need multiple VAs if scatter-gather is used by the client? Yep. Or we could just say that all the fields the access layer needs to look at must be in the first s/g entry. - R. From mshefty at ichips.intel.com Tue Oct 5 14:08:56 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 5 Oct 2004 14:08:56 -0700 Subject: [openib-general] [PATCH] ib_mad.c: Fix request/response matching In-Reply-To: <523c0sswmy.fsf@topspin.com> References: <1097008419.2593.12.camel@hpc-1> <52brfgswxy.fsf@topspin.com> <20041005140203.14c951f3.mshefty@ichips.intel.com> <523c0sswmy.fsf@topspin.com> Message-ID: <20041005140856.4d337584.mshefty@ichips.intel.com> On Tue, 05 Oct 2004 14:06:13 -0700 Roland Dreier wrote: > Sean> Would we need multiple VAs if scatter-gather is used by the client? > > Yep. Or we could just say that all the fields the access layer needs > to look at must be in the first s/g entry. You're right, and thinking about it more, we'd probably want that anyway. From halr at voltaire.com Tue Oct 5 17:18:11 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 05 Oct 2004 20:18:11 -0400 Subject: [openib-general] [PATCH] ib_smi: First working version of SMI/SMA Message-ID: <1097021890.3900.2.camel@hpc-1> ib_smi: First working version of SMI/SMA (port gets to active) There is a workaround for the hop pointer in the response which I will work on tomorrow. 
Index: ib_smi.c =================================================================== --- ib_smi.c (revision 923) +++ ib_smi.c (working copy) @@ -24,13 +24,24 @@ */ #include +#include "ib_smi_priv.h" #include "ib_mad_priv.h" + +MODULE_LICENSE("Dual BSD/GPL"); +MODULE_DESCRIPTION("kernel IB SMI"); +MODULE_AUTHOR("Sean Hefty"); +MODULE_AUTHOR("Hal Rosenstock"); + + +static spinlock_t ib_smi_port_list_lock = SPIN_LOCK_UNLOCKED; +static struct list_head ib_smi_port_list; + /* * Fixup a directed route SMP for sending. Return 0 if the SMP should be * discarded. */ -static int smi_handle_dr_smp_send(struct ib_mad_port_private *port_priv, +static int smi_handle_dr_smp_send(struct ib_mad_agent *mad_agent, struct ib_smp *smp) { u8 hop_ptr, hop_cnt; @@ -44,25 +55,25 @@ if (hop_cnt && hop_ptr == 0) { smp->hop_ptr++; return (smp->initial_path[smp->hop_ptr] == - port_priv->port_num); + mad_agent->port_num); } /* C14-9:2 */ if (hop_ptr && hop_ptr < hop_cnt) { - if (port_priv->device->node_type != IB_NODE_SWITCH) + if (mad_agent->device->node_type != IB_NODE_SWITCH) return 0; /* smp->return_path set when received */ smp->hop_ptr++; return (smp->initial_path[smp->hop_ptr] == - port_priv->port_num); + mad_agent->port_num); } /* C14-9:3 -- We're at the end of the DR segment of path */ if (hop_ptr == hop_cnt) { /* smp->return_path set when received */ smp->hop_ptr++; - return (port_priv->device->node_type != IB_NODE_CA || + return (mad_agent->device->node_type != IB_NODE_CA || smp->dr_dlid == IB_LID_PERMISSIVE); } @@ -75,24 +86,24 @@ if (hop_cnt && hop_ptr == hop_cnt + 1) { smp->hop_ptr--; return (smp->return_path[smp->hop_ptr] == - port_priv->port_num); + mad_agent->port_num); } /* C14-13:2 */ if (2 <= hop_ptr && hop_ptr <= hop_cnt) { - if (port_priv->device->node_type != IB_NODE_SWITCH) + if (mad_agent->device->node_type != IB_NODE_SWITCH) return 0; smp->hop_ptr--; return (smp->return_path[smp->hop_ptr] == - port_priv->port_num); + mad_agent->port_num); } /* C14-13:3 -- at the end of the DR segment of path */ if (hop_ptr == 1) { smp->hop_ptr--; /* C14-13:3 -- SMPs destined for SM shouldn't be here */ - return (port_priv->device->node_type == IB_NODE_SWITCH && + return (mad_agent->device->node_type == IB_NODE_SWITCH && smp->dr_slid != IB_LID_PERMISSIVE); } @@ -106,13 +117,13 @@ * Sender side handling of outgoing SMPs. Fixup the SMP as required by * the spec. Return 0 if the SMP should be dropped. */ -static int smi_handle_smp_send(struct ib_mad_port_private *port_priv, +static int smi_handle_smp_send(struct ib_mad_agent *mad_agent, struct ib_smp *smp) { switch (smp->mgmt_class) { case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE: - return smi_handle_dr_smp_send(port_priv, smp); + return smi_handle_dr_smp_send(mad_agent, smp); default: return 0; /* write me... */ } @@ -121,12 +132,12 @@ /* * Return 1 if the SMP should be handled by the local SMA via process_mad. */ -static inline int smi_check_local_smp(struct ib_mad_port_private *port_priv, +static inline int smi_check_local_smp(struct ib_mad_agent *mad_agent, struct ib_smp *smp) { /* C14-9:3 -- We're at the end of the DR segment of path */ /* C14-9:4 -- Hop Pointer = Hop Count + 1 -> give to SMA/SM. */ - return (port_priv->device->process_mad && + return (mad_agent->device->process_mad && !ib_get_smp_direction(smp) && (smp->hop_ptr == smp->hop_cnt + 1)); } @@ -135,7 +146,7 @@ * Adjust information for a received SMP. Return 0 if the SMP should be * dropped. 
*/ -static int smi_handle_dr_smp_recv(struct ib_mad_port_private *port_priv, +static int smi_handle_dr_smp_recv(struct ib_mad_agent *mad_agent, struct ib_smp *smp) { u8 hop_ptr, hop_cnt; @@ -151,22 +162,22 @@ /* C14-9:2 -- intermediate hop */ if (hop_ptr && hop_ptr < hop_cnt) { - if (port_priv->device->node_type != IB_NODE_SWITCH) + if (mad_agent->device->node_type != IB_NODE_SWITCH) return 0; - smp->return_path[hop_ptr] = port_priv->port_num; + smp->return_path[hop_ptr] = mad_agent->port_num; /* smp->hop_ptr updated when sending */ return 1; /*(smp->initial_path[hop_ptr+1] <= - port_priv->device->phys_port_cnt); */ + mad_agent->device->phys_port_cnt); */ } /* C14-9:3 -- We're at the end of the DR segment of path */ if (hop_ptr == hop_cnt) { if (hop_cnt) - smp->return_path[hop_ptr] = port_priv->port_num; + smp->return_path[hop_ptr] = mad_agent->port_num; /* smp->hop_ptr updated when sending */ - return (port_priv->device->node_type != IB_NODE_CA || + return (mad_agent->device->node_type != IB_NODE_CA || smp->dr_dlid == IB_LID_PERMISSIVE); } @@ -182,12 +193,12 @@ /* C14-13:2 */ if (2 <= hop_ptr && hop_ptr <= hop_cnt) { - if (port_priv->device->node_type != IB_NODE_SWITCH) + if (mad_agent->device->node_type != IB_NODE_SWITCH) return 0; /* smp->hop_ptr updated when sending */ return 1; /*(smp->return_path[hop_ptr-1] <= - port_priv->device->phys_port_cnt); */ + mad_agent->device->phys_port_cnt); */ } /* C14-13:3 -- We're at the end of the DR segment of path */ @@ -198,7 +209,7 @@ return 1; } /* smp->hop_ptr updated when sending */ - return (port_priv->device->node_type != IB_NODE_CA); + return (mad_agent->device->node_type != IB_NODE_CA); } /* C14-13:4 -- hop_ptr = 0 -> give to SM. */ @@ -211,13 +222,13 @@ * Receive side handling SMPs. Save receive information as required by * the spec. Return 0 if the SMP should be dropped. */ -static int smi_handle_smp_recv(struct ib_mad_port_private *port_priv, +static int smi_handle_smp_recv(struct ib_mad_agent *mad_agent, struct ib_smp *smp) { switch (smp->mgmt_class) { case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE: - return smi_handle_dr_smp_recv(port_priv, smp); + return smi_handle_dr_smp_recv(mad_agent, smp); default: return 0; /* write me... */ } @@ -227,7 +238,7 @@ * Return 1 if the received DR SMP should be forwarded to the send queue. * Return 0 if the SMP should be completed up the stack. */ -static int smi_check_forward_dr_smp(struct ib_mad_port_private *port_priv, +static int smi_check_forward_dr_smp(struct ib_mad_agent *mad_agent, struct ib_smp *smp) { u8 hop_ptr, hop_cnt; @@ -263,56 +274,426 @@ * Return 1 if the received SMP should be forwarded to the send queue. * Return 0 if the SMP should be completed up the stack. */ -static int smi_check_forward_smp(struct ib_mad_port_private *port_priv, +static int smi_check_forward_smp(struct ib_mad_agent *mad_agent, struct ib_smp *smp) { switch (smp->mgmt_class) { case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE: - return smi_check_forward_dr_smp(port_priv, smp); + return smi_check_forward_dr_smp(mad_agent, smp); default: return 0; /* write me... */ } } -/* -static int smi_process_local(struct ib_mad_port_private *port_priv, - struct ib_smp *smp) +static int smi_process_local(struct ib_mad_agent *mad_agent, + struct ib_mad *smp, + struct ib_mad *smp_response, + u16 slid) { - port_priv->device->process_mad( ... 
); + return mad_agent->device->process_mad(mad_agent->device, 0, + mad_agent->port_num, + slid, smp, smp_response); } -int smi_send_smp(struct ib_mad_port_private *port_priv, - struct ib_smp *smp) +void smp_send(struct ib_mad_agent *mad_agent, + struct ib_mad *smp, + struct ib_mad_recv_wc *mad_recv_wc) { - if (!smi_handle_smp_send(port_priv, smp)) { - smi_fail_send() - return 0; + struct ib_smi_port_private *entry, *port_priv = NULL; + struct ib_smi_send_wr *smi_send_wr; + struct ib_sge gather_list; + struct ib_send_wr send_wr; + struct ib_send_wr *bad_send_wr; + struct ib_ah_attr ah_attr; + struct ib_ah *ah; + unsigned long flags; + + /* Find matching MAD agent */ + spin_lock_irqsave(&ib_smi_port_list_lock, flags); + list_for_each_entry(entry, &ib_smi_port_list, port_list) { + if (entry->mad_agent == mad_agent) { + port_priv = entry; + break; + } } + spin_unlock_irqrestore(&ib_smi_port_list_lock, flags); + if (!port_priv) { + printk(KERN_ERR "smp_send: no matching MAD agent 0x%x\n", mad_agent); + return; + } - if (smi_check_local_smp(port_priv, smp)) { - smi_process_local(port_priv, smp); + smi_send_wr = kmalloc(sizeof(*smi_send_wr), GFP_KERNEL); + if (!smi_send_wr) + return; + smi_send_wr->smp = smp; + + /* PCI mapping */ + gather_list.addr = pci_map_single(mad_agent->device->dma_device, + smp, + sizeof(struct ib_mad), + PCI_DMA_TODEVICE); + gather_list.length = sizeof(struct ib_mad); + gather_list.lkey = (*port_priv->mr).lkey; + + send_wr.next = NULL; + send_wr.opcode = IB_WR_SEND; + send_wr.sg_list = &gather_list; + send_wr.num_sge = 1; + send_wr.wr.ud.remote_qpn = mad_recv_wc->wc->src_qp; /* DQPN */ + send_wr.wr.ud.timeout_ms = 0; + send_wr.wr.ud.pkey_index = 0; /* Should only matter for GMPs */ + send_wr.send_flags = IB_SEND_SIGNALED | IB_SEND_SOLICITED; + + ah_attr.dlid = mad_recv_wc->wc->slid; + ah_attr.port_num = mad_agent->port_num; + ah_attr.src_path_bits = mad_recv_wc->wc->dlid_path_bits; + ah_attr.ah_flags = 0; /* No GRH */ + ah_attr.sl = mad_recv_wc->wc->sl; + ah_attr.static_rate = 0; + + ah = ib_create_ah(mad_agent->qp->pd, &ah_attr); + if (IS_ERR(ah)) { + printk(KERN_ERR "No memory for address handle\n"); + kfree(smp); + return; + } + + send_wr.wr.ud.ah = ah; + send_wr.wr.ud.remote_qkey = 0; /* for SMPs */ + send_wr.wr_id = ++port_priv->wr_id; + + pci_unmap_addr_set(smp, mapping, gather_list.addr); + + /* Send */ + spin_lock_irqsave(&port_priv->send_list_lock, flags); + if (ib_post_send_mad(mad_agent, &send_wr, &bad_send_wr)) { + pci_unmap_single(mad_agent->device->dma_device, + pci_unmap_addr(smp, mapping), + sizeof(struct ib_mad), + PCI_DMA_TODEVICE); + } else { + list_add_tail(&smi_send_wr->send_list, + &port_priv->send_posted_smp_list); + } + spin_unlock_irqrestore(&port_priv->send_list_lock, flags); + ib_destroy_ah(ah); +} + +int smi_send_smp(struct ib_mad_agent *mad_agent, + struct ib_smp *smp, + struct ib_mad_recv_wc *mad_recv_wc, + u16 slid) +{ + struct ib_mad *smp_response; + int ret; + + if (!smi_handle_smp_send(mad_agent, smp)) { return 0; } - * Post the send on the QP * + if (smi_check_local_smp(mad_agent, smp)) { + smp_response = kmalloc(sizeof(struct ib_mad), GFP_KERNEL); + if (!smp_response) + return 0; + + ret = smi_process_local(mad_agent, (struct ib_mad *)smp, + smp_response, slid); + if (ret & IB_MAD_RESULT_SUCCESS) { + /* Workaround !!! 
*/ + ((struct ib_smp *)smp_response)->hop_ptr--; + smp_send(mad_agent, smp_response, mad_recv_wc); + } else + kfree(smp_response); + return 1; + } + + /* Post the send on the QP */ return 1; } -int smi_recv_smp(struct ib_mad_port_private *port_priv, - struct ib_smp *smp) +int smi_recv_smp(struct ib_mad_agent *mad_agent, + struct ib_smp *smp, + struct ib_mad_recv_wc *mad_recv_wc) { - if (!smi_handle_smp_recv(port_priv, smp)) { - smi_fail_recv(); + if (!smi_handle_smp_recv(mad_agent, smp)) { return 0; } - if (smi_check_forward_smp(port_priv, smp)) { - smi_send_smp(port_priv, smp); + if (smi_check_forward_smp(mad_agent, smp)) { + smi_send_smp(mad_agent, smp, mad_recv_wc, mad_recv_wc->wc->slid); return 0; } - * Complete receive up stack * + /* Complete receive up stack */ return 1; } -*/ + +static void smi_send_handler(struct ib_mad_agent *mad_agent, + struct ib_mad_send_wc *mad_send_wc) +{ + struct ib_smi_port_private *entry, *port_priv = NULL; + struct ib_smi_send_wr *smi_send_wr; + struct list_head *send_wr; + unsigned long flags; + + /* Find matching MAD agent */ + spin_lock_irqsave(&ib_smi_port_list_lock, flags); + list_for_each_entry(entry, &ib_smi_port_list, port_list) { + if (entry->mad_agent == mad_agent) { + port_priv = entry; + break; + } + } + /* Hold lock longer !!! */ + spin_unlock_irqrestore(&ib_smi_port_list_lock, flags); + if (!port_priv) { + printk(KERN_ERR "smi_send_handler: no matching MAD agent 0x%x\n", mad_agent); + return; + } + + /* Completion corresponds to first entry on posted MAD send list */ + spin_lock_irqsave(&port_priv->send_list_lock, flags); + if (list_empty(&port_priv->send_posted_smp_list)) { + spin_unlock_irqrestore(&port_priv->send_list_lock, flags); + printk(KERN_ERR "Send completion WR ID 0x%Lx but send list " + "is empty\n", mad_send_wc->wr_id); + return; + } + + smi_send_wr = list_entry(&port_priv->send_posted_smp_list, + struct ib_smi_send_wr, + send_list); + send_wr = smi_send_wr->send_list.next; + smi_send_wr = container_of(send_wr, struct ib_smi_send_wr, send_list); + + /* Remove from posted send SMP list */ + list_del(&smi_send_wr->send_list); + spin_unlock_irqrestore(&port_priv->send_list_lock, flags); + + /* Unmap PCI */ + pci_unmap_single(mad_agent->device->dma_device, + pci_unmap_addr(smi_send_wr->smp, mapping), + sizeof(struct ib_mad), + PCI_DMA_TODEVICE); + + /* Release allocated memory */ + kfree(smi_send_wr->smp); +} + +static void smi_recv_handler(struct ib_mad_agent *mad_agent, + struct ib_mad_recv_wc *mad_recv_wc) +{ + smi_recv_smp(mad_agent, + (struct ib_smp *)mad_recv_wc->recv_buf->mad, + mad_recv_wc); + + /* Free received MAD */ + ib_free_recv_mad(mad_recv_wc); +} + +static int ib_smi_port_open(struct ib_device *device, int port_num) +{ + int ret; + u64 iova = 0; + struct ib_phys_buf buf_list = { + .addr = 0, + .size = (unsigned long) high_memory - PAGE_OFFSET + }; + struct ib_smi_port_private *entry, *port_priv = NULL; + struct ib_mad_reg_req reg_req; + unsigned long flags; + + /* First, check if port already open for SMI */ + spin_lock_irqsave(&ib_smi_port_list_lock, flags); + list_for_each_entry(entry, &ib_smi_port_list, port_list) { + if (entry->mad_agent->device == device && entry->port_num == port_num) { + port_priv = entry; + break; + } + } + spin_unlock_irqrestore(&ib_smi_port_list_lock, flags); + if (port_priv) { + printk(KERN_DEBUG "%s port %d already open\n", + device->name, port_num); + return 0; + } + + /* Create new device info */ + port_priv = kmalloc(sizeof *port_priv, GFP_KERNEL); + if (!port_priv) { + 
printk(KERN_ERR "No memory for ib_smi_port_private\n"); + return -ENOMEM; + } + + memset(port_priv, 0, sizeof *port_priv); + port_priv->port_num = port_num; + port_priv->wr_id = 0; + spin_lock_init(&port_priv->send_list_lock); + INIT_LIST_HEAD(&port_priv->send_posted_smp_list); + + reg_req.mgmt_class = IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE; + reg_req.mgmt_class_version = 1; + /* All methods for now even though only some are used BY SMA !!! */ + bitmap_fill(®_req.method_mask, IB_MGMT_MAX_METHODS); + + port_priv->mad_agent = ib_register_mad_agent(device, port_num, + IB_QPT_SMI, + ®_req, 0, + &smi_send_handler, + &smi_recv_handler, + NULL); + if (IS_ERR(port_priv->mad_agent)) { + port_priv->mad_agent = NULL; + ret = PTR_ERR(port_priv->mad_agent); + kfree(port_priv); + return ret; + } + + port_priv->mr = ib_reg_phys_mr(port_priv->mad_agent->qp->pd, + &buf_list, 1, + IB_ACCESS_LOCAL_WRITE, &iova); + if (IS_ERR(port_priv->mr)) { + printk(KERN_ERR "Couldn't register MR\n"); + ib_unregister_mad_agent(port_priv->mad_agent); + ret = PTR_ERR(port_priv->mr); + kfree(port_priv); + return ret; + } + + spin_lock_irqsave(&ib_smi_port_list_lock, flags); + list_add_tail(&port_priv->port_list, &ib_smi_port_list); + spin_unlock_irqrestore(&ib_smi_port_list_lock, flags); + + return 0; +} + +static int ib_smi_port_close(struct ib_device *device, int port_num) +{ + struct ib_smi_port_private *entry, *port_priv = NULL; + unsigned long flags; + + spin_lock_irqsave(&ib_smi_port_list_lock, flags); + list_for_each_entry(entry, &ib_smi_port_list, port_list) { + if (entry->mad_agent->device == device && entry->port_num == port_num) { + port_priv = entry; + break; + } + } + + if (port_priv == NULL) { + printk(KERN_ERR "Port %d not found\n", port_num); + spin_unlock_irqrestore(&ib_smi_port_list_lock, flags); + return -ENODEV; + } + + list_del(&port_priv->port_list); + spin_unlock_irqrestore(&ib_smi_port_list_lock, flags); + + ib_dereg_mr(port_priv->mr); + kfree(port_priv); + + return 0; +} + +static void ib_smi_init_device(struct ib_device *device) +{ + int ret, num_ports, cur_port, i, ret2; + struct ib_device_attr device_attr; + + ret = ib_query_device(device, &device_attr); + if (ret) { + printk(KERN_ERR "Couldn't query device %s\n", device->name); + goto error_device_query; + } + + if (device->node_type == IB_NODE_SWITCH) { + num_ports = 1; + cur_port = 0; + } else { + num_ports = device_attr.phys_port_cnt; + cur_port = 1; + } + + for (i = 0; i < num_ports; i++, cur_port++) { + ret = ib_smi_port_open(device, cur_port); + if (ret) { + printk(KERN_ERR "Couldn't open %s port %d\n", + device->name, cur_port); + goto error_device_open; + } + } + + goto error_device_query; + +error_device_open: + while (i > 0) { + cur_port--; + ret2 = ib_smi_port_close(device, cur_port); + if (ret2) { + printk(KERN_ERR "Couldn't close %s port %d\n", + device->name, cur_port); + } + i--; + } + +error_device_query: + return; +} + +static void ib_smi_remove_device(struct ib_device *device) +{ + int ret, i, num_ports, cur_port, ret2; + struct ib_device_attr device_attr; + + ret = ib_query_device(device, &device_attr); + if (ret) { + printk(KERN_ERR "Couldn't query device %s\n", device->name); + goto error_device_query; + } + + if (device->node_type == IB_NODE_SWITCH) { + num_ports = 1; + cur_port = 0; + } else { + num_ports = device_attr.phys_port_cnt; + cur_port = 1; + } + for (i = 0; i < num_ports; i++, cur_port++) { + ret2 = ib_smi_port_close(device, cur_port); + if (ret2) { + printk(KERN_ERR "Couldn't close %s port %d\n", + device->name, 
cur_port); + if (!ret) + ret = ret2; + } + } + +error_device_query: + return; +} + +static struct ib_client ib_smi_client = { + .name = "ib_smi", + .add = ib_smi_init_device, + .remove = ib_smi_remove_device +}; + +static int __init ib_smi_init(void) +{ + INIT_LIST_HEAD(&ib_smi_port_list); + if (ib_register_client(&ib_smi_client)) { + printk(KERN_ERR "Couldn't register ib_smi client\n"); + return -EINVAL; + } + + return 0; +} + +static void __exit ib_smi_exit(void) +{ + ib_unregister_client(&ib_smi_client); +} + +module_init(ib_smi_init); +module_exit(ib_smi_exit); Index: ib_smi_priv.h =================================================================== --- ib_smi_priv.h (revision 930) +++ ib_smi_priv.h (working copy) @@ -23,8 +23,15 @@ Copyright (c) 2004 Voltaire Corporation. All rights reserved. */ +struct ib_smi_send_wr { + struct list_head send_list; + struct ib_mad *smp; +}; + struct ib_smi_port_private { struct list_head port_list; + struct list_head send_posted_smp_list; + spinlock_t send_list_lock; int port_num; struct ib_mad_agent *mad_agent; struct ib_mr *mr; Index: Makefile =================================================================== --- Makefile (revision 923) +++ Makefile (working copy) @@ -1,9 +1,11 @@ EXTRA_CFLAGS += -I. -Idrivers/infiniband/include obj-$(CONFIG_INFINIBAND_ACCESS_LAYER) += \ - ib_al.o + ib_al.o \ + ib_sma.o ib_al-objs := \ - ib_mad.o \ + ib_mad.o + +ib_sma-objs := \ ib_smi.o - From halr at voltaire.com Tue Oct 5 17:41:56 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 05 Oct 2004 20:41:56 -0400 Subject: [openib-general] [ANNOUNCEMENT]: MAD layer with SMI/SMA Pre Release Message-ID: <1097023316.3900.16.camel@hpc-1> All the bits for the MAD layer with SMI/SMA are now in the openib-candidate branch. The OpenIB access layer is comprised of the MAD layer and the SMI/SMA. These are currently implemented as separate modules. These are layered on top of the mthca driver and cannot run currently with the MAD layer from the roland_merge branch. These directions are how to combine the two for building and running. Note also that the stack currently ends here (e.g. don't expect to run IPoIB right now, although we are heading to this; especially don't expect to run SRP or SDP or u/kDAPL right now either). There is a README in src/linux-kernel/infiniband/access which also contains building and using instructions. There are a number of things I would like to do before formally announcing it but for anyone who is adventurous: Run through install and build process from scratch. Test GSI sending and matching now that port is active. Other Pending Items: Add virtual address to the ud parameter in ib_mad_send. There is a workaround on the hop pointer in ib_smi.c which needs a real fix. Investigate TID endian. Redo MAD completion handler error handling. Trim method mask used in ib_smi.c Undo commenting out of ib_mad_send in mthca_mad.c I will knock some of these off by early tomorrow. -- Hal From roland at topspin.com Tue Oct 5 18:03:21 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 05 Oct 2004 18:03:21 -0700 Subject: [openib-general] [ANNOUNCEMENT]: MAD layer with SMI/SMA Pre Release In-Reply-To: <1097023316.3900.16.camel@hpc-1> (Hal Rosenstock's message of "Tue, 05 Oct 2004 20:41:56 -0400") References: <1097023316.3900.16.camel@hpc-1> Message-ID: <52ekkcr73a.fsf@topspin.com> >From the README: > Note that starting ib_al does not yet cause ib_mthca to be started. I would delete the "yet" in that sentence. 
There is no reason for the core IB layer to load any low-level drivers, any more than eg "modprobe ipv6" should load e1000. On a modern distro, hotplug should load ib_mthca automatically if any HCAs are installed (in fact for development I have to manually disable this to avoid broken versions getting loaded during bootup). - R. From halr at voltaire.com Wed Oct 6 05:30:57 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 06 Oct 2004 08:30:57 -0400 Subject: [openib-general] [PATCH] ib_smi: Deregister agent when closing port Message-ID: <1097065857.11857.2.camel@hpc-1> ib_smi: Deregister agent when closing port Index: ib_smi.c =================================================================== --- ib_smi.c (revision 936) +++ ib_smi.c (working copy) @@ -593,6 +593,7 @@ spin_unlock_irqrestore(&ib_smi_port_list_lock, flags); ib_dereg_mr(port_priv->mr); + ib_unregister_mad_agent(port_priv->mad_agent); kfree(port_priv); return 0; From halr at voltaire.com Wed Oct 6 06:00:15 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 06 Oct 2004 09:00:15 -0400 Subject: [openib-general] [PATCH] ib_mad: Change completion hander WC status handling not to restart port Message-ID: <1097067615.2427.2.camel@hpc-1> ib_mad: Change completion handler WC status handling to not restart port Index: ib_mad.c =================================================================== --- ib_mad.c (revision 935) +++ ib_mad.c (working copy) @@ -80,7 +80,6 @@ static int add_mad_reg_req(struct ib_mad_reg_req *mad_reg_req, struct ib_mad_agent_private *priv); static void remove_mad_reg_req(struct ib_mad_agent_private *priv); -static int ib_mad_port_restart(struct ib_mad_port_private *priv); static int ib_mad_post_receive_mad(struct ib_mad_port_private *port_priv, struct ib_qp *qp); static int ib_mad_post_receive_mads(struct ib_mad_port_private *priv); @@ -1010,35 +1009,22 @@ static void ib_mad_completion_handler(struct ib_mad_port_private *port_priv) { struct ib_wc wc; - int err_status = 0; ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP); while (ib_poll_cq(port_priv->cq, 1, &wc) == 1) { printk(KERN_DEBUG "Completion opcode 0x%x WRID 0x%Lx\n", wc.opcode, wc.wr_id); - if (wc.status != IB_WC_SUCCESS) { - switch (wc.opcode) { - case IB_WC_SEND: - printk(KERN_ERR "Send completion error %d\n", - wc.status); - break; - case IB_WC_RECV: - printk(KERN_ERR "Recv completion error %d\n", - wc.status); - break; - default: - printk(KERN_ERR "Unknown completion 0x%x with error %d\n", wc.opcode, wc.status); - break; - } - err_status = 1; - break; - } - switch (wc.opcode) { case IB_WC_SEND: + if (wc.status != IB_WC_SUCCESS) + printk(KERN_ERR "Send completion error %d\n", + wc.status); ib_mad_send_done_handler(port_priv, &wc); break; case IB_WC_RECV: + if (wc.status != IB_WC_SUCCESS) + printk(KERN_ERR "Recv completion error %d\n", + wc.status); ib_mad_recv_done_handler(port_priv, &wc); break; default: @@ -1049,9 +1035,6 @@ } } } - - if (err_status) - ib_mad_port_restart(port_priv); } static void cancel_mads(struct ib_mad_agent_private *mad_agent_priv) @@ -1567,23 +1550,6 @@ } /* - * Restart the port - */ -static int ib_mad_port_restart(struct ib_mad_port_private *port_priv) -{ - int ret; - - ib_mad_port_stop(port_priv); - ret = ib_mad_port_start(port_priv); - if (ret) { - printk(KERN_ERR "Couldn't restart %s port %d\n", - port_priv->device->name, port_priv->port_num); - } - - return ret; -} - -/* * Open the port * Create the QP, PD, MR, and CQ if needed */ From halr at voltaire.com Wed Oct 6 06:50:00 2004 From: halr at voltaire.com 
(Hal Rosenstock) Date: Wed, 06 Oct 2004 09:50:00 -0400 Subject: [openib-general] [PATCH] ib_mad: Add MAD pointer to send WR ud structure Message-ID: <1097070600.2428.5.camel@hpc-1> ib_mad: Add MAD pointer to send WR ud structure (in ib_verbs.h) Index: access/ib_mad_priv.h =================================================================== --- access/ib_mad_priv.h (revision 935) +++ access/ib_mad_priv.h (working copy) @@ -119,7 +119,7 @@ struct list_head agent_send_list; struct ib_mad_agent *agent; u64 wr_id; /* client WR ID */ - u64 tid; + union ib_tid tid; int timeout_ms; int refcount; enum ib_wc_status status; Index: access/ib_smi.c =================================================================== --- access/ib_smi.c (revision 937) +++ access/ib_smi.c (working copy) @@ -361,8 +361,9 @@ send_wr.wr.ud.ah = ah; send_wr.wr.ud.remote_qkey = 0; /* for SMPs */ + send_wr.wr.ud.mad = smp; send_wr.wr_id = ++port_priv->wr_id; - + pci_unmap_addr_set(smp, mapping, gather_list.addr); /* Send */ Index: access/ib_mad.c =================================================================== --- access/ib_mad.c (revision 939) +++ access/ib_mad.c (working copy) @@ -344,8 +344,7 @@ return -ENOMEM; } - mad_send_wr->tid = ((struct ib_mad_hdr*) - bus_to_virt(cur_send_wr->sg_list->addr))->tid.id; + mad_send_wr->tid.id = send_wr->wr.ud.mad->mad_hdr.tid.id; mad_send_wr->agent = mad_agent; mad_send_wr->timeout_ms = cur_send_wr->wr.ud.timeout_ms; if (mad_send_wr->timeout_ms) @@ -765,7 +764,7 @@ list_for_each_entry(mad_send_wr, &mad_agent_priv->send_list, agent_send_list) { - if (mad_send_wr->tid == tid) { + if (mad_send_wr->tid.id == tid) { /* Verify request is still valid */ if (mad_send_wr->status == IB_WC_SUCCESS && mad_send_wr->timeout_ms) Index: include/ib_verbs.h =================================================================== --- include/ib_verbs.h (revision 935) +++ include/ib_verbs.h (working copy) @@ -539,6 +539,7 @@ } atomic; struct { struct ib_ah *ah; + struct ib_mad *mad; u32 remote_qpn; u32 remote_qkey; int timeout_ms; /* valid for MADs only */ From halr at voltaire.com Wed Oct 6 06:53:44 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 06 Oct 2004 09:53:44 -0400 Subject: [openib-general] [PATCH] ib_verbs.h (Roland's branch): Add MAD pointer to send WR ud structure Message-ID: <1097070824.2428.10.camel@hpc-1> ib_verbs.h (Roland's branch): Add MAD pointer to send WR ud structure This is to allow the MAD layer to poke at the MAD header without using bus_to_virt(). Index: ib_verbs.h =================================================================== --- ib_verbs.h (revision 939) +++ ib_verbs.h (working copy) @@ -523,6 +523,7 @@ } atomic; struct { struct ib_ah *ah; + struct ib_mad *mad; u32 remote_qpn; u32 remote_qkey; int timeout_ms; /* valid for MADs only */ From halr at voltaire.com Wed Oct 6 07:36:09 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 06 Oct 2004 10:36:09 -0400 Subject: [openib-general] [PATCH] ib_mad.c: Fix request/response matching In-Reply-To: References: Message-ID: <1097073368.2353.6.camel@hpc-1> On Tue, 2004-10-05 at 16:34, Sean Hefty wrote: > A response MAD should have exactly the same TID as what was sent. It does. > Not sure why we aren't matching against the entire TID. Because we just want to first find the right MAD agent. 
> > if (solicited) { > > /* Routing is based on high 32 bits of transaction ID of MAD > >*/ > >- hi_tid = mad->mad_hdr.tid >> 32; > >+ hi_tid = be32_to_cpu(mad->mad_hdr.tid.tid_field.hi_tid); > > This shouldn't be necessary: > > Sender of request (system 1): > mad.tid = (mad_agent.hi_tid << 32) | user_tid; > send mad The problem with this is that when this is done on a little endian machine it shows up byte swapped on the network and not in network endian. So if hi_tid = 1 and user tid = 0x9abcdef0 then the transaction ID in the MAD is 0xf0debc9a01000000 I don't think that is what we want. > Receiver of response (system 1): > hi_tid = mad.tid >> 32 > The receiver of the request should just return the same TID that it > received. Yes and it does. > >+union ib_tid { > >+ u64 id; > >+ struct { > >+ u32 hi_tid; > >+ u32 lo_tid; > >+ } tid_field; > >+}; > >+ > > I don't see why TID can't be u64 everywhere. We shouldn't have to make it a > union. I did this for convenience. I can remove it once we settle the endianness issue. -- Hal From halr at voltaire.com Wed Oct 6 08:25:15 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 06 Oct 2004 11:25:15 -0400 Subject: [openib-general] [PATCH] ib_mad.c: Fix request/response matching In-Reply-To: <1097073368.2353.6.camel@hpc-1> References: <1097073368.2353.6.camel@hpc-1> Message-ID: <1097076314.2764.2.camel@hpc-1> On Wed, 2004-10-06 at 10:36, Hal Rosenstock wrote: > On Tue, 2004-10-05 at 16:34, Sean Hefty wrote: > > > if (solicited) { > > > /* Routing is based on high 32 bits of transaction ID of MAD > > >*/ > > >- hi_tid = mad->mad_hdr.tid >> 32; > > >+ hi_tid = be32_to_cpu(mad->mad_hdr.tid.tid_field.hi_tid); > > > > This shouldn't be necessary: It is still necessary as the TID is in network endian in the MAD header and CPU endian in hi_tid. > > > > Sender of request (system 1): > > mad.tid = (mad_agent.hi_tid << 32) | user_tid; > > send mad > > The problem with this is that when this is done on a little endian > machine it shows up byte swapped on the network and not in network > endian. > > So if hi_tid = 1 and user tid = 0x9abcdef0 > then the transaction ID in the MAD is 0xf0debc9a01000000 > I don't think that is what we want. The client needs to do one more step: Sender of request (system 1): mad.tid = cpu_to_be64((mad_agent.hi_tid << 32) | user_tid); send mad If this seems right, I can post a patch and remove the TID union. > > Receiver of response (system 1): > > hi_tid = mad.tid >> 32 -- Hal From roland at topspin.com Wed Oct 6 08:34:52 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 06 Oct 2004 08:34:52 -0700 Subject: [openib-general] [PATCH] ib_verbs.h (Roland's branch): Add MAD pointer to send WR ud structure In-Reply-To: <1097070824.2428.10.camel@hpc-1> (Hal Rosenstock's message of "Wed, 06 Oct 2004 09:53:44 -0400") References: <1097070824.2428.10.camel@hpc-1> Message-ID: <528yajrhb7.fsf@topspin.com> thanks, applied. 
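The byte-order question here is easy to see with a short standalone program built around the numbers in Hal's example (hi_tid = 1, user TID = 0x9abcdef0). Nothing below is IB-specific; it only shows how the eight TID bytes sit in memory with and without an explicit big-endian conversion, which is exactly the difference being argued over in the follow-ups.

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Print the bytes of v exactly as they sit in memory, i.e. in the order
 * a byte-wise copy onto the wire would carry them. */
static void dump_bytes(const char *label, uint64_t v)
{
        unsigned char b[sizeof(v)];
        size_t i;

        memcpy(b, &v, sizeof(v));
        printf("%-30s", label);
        for (i = 0; i < sizeof(v); i++)
                printf(" %02x", b[i]);
        printf("\n");
}

/* Portable big-endian packing, the userspace equivalent of cpu_to_be64(). */
static uint64_t pack_be64(uint64_t v)
{
        unsigned char b[sizeof(v)];
        uint64_t out;
        size_t i;

        for (i = 0; i < sizeof(v); i++)
                b[i] = (unsigned char) (v >> (56 - 8 * i));
        memcpy(&out, b, sizeof(out));
        return out;
}

int main(void)
{
        uint32_t hi_tid   = 1;              /* agent hi_tid from Hal's example */
        uint32_t user_tid = 0x9abcdef0;     /* client's low 32 bits */
        uint64_t tid      = ((uint64_t) hi_tid << 32) | user_tid;

        /* On a little-endian CPU the first line prints
         * f0 de bc 9a 01 00 00 00 (the pattern Hal quotes), while the
         * second always prints 00 00 00 01 9a bc de f0.  A responder
         * that simply echoes the eight bytes returns either form
         * unchanged, which is the argument for treating the TID as
         * opaque. */
        dump_bytes("host order in memory:", tid);
        dump_bytes("after cpu_to_be64-style pack:", pack_be64(tid));
        return 0;
}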
From sean.hefty at intel.com Wed Oct 6 09:01:45 2004 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 6 Oct 2004 09:01:45 -0700 Subject: [openib-general] [PATCH] ib_mad: Add MAD pointer to send WR udstructure In-Reply-To: <1097070600.2428.5.camel@hpc-1> Message-ID: >Index: include/ib_verbs.h >=================================================================== >--- include/ib_verbs.h (revision 935) >+++ include/ib_verbs.h (working copy) >@@ -539,6 +539,7 @@ > } atomic; > struct { > struct ib_ah *ah; >+ struct ib_mad *mad; I _think_ that the user only needs to reference something like the following: union ib_mad_hdrs { struct ib_mad_hdr mad_hdr; struct { struct ib_mad_hdr mad_hdr; struct ib_rmpp_hdr rmpp_hdr; } hdr; }; I don't think that we need a pointer to the MAD data. - Sean From ftillier at infiniconsys.com Wed Oct 6 09:04:49 2004 From: ftillier at infiniconsys.com (Fab Tillier) Date: Wed, 6 Oct 2004 09:04:49 -0700 Subject: [openib-general] [PATCH] ib_mad.c: Fix request/response matching In-Reply-To: <1097073368.2353.6.camel@hpc-1> Message-ID: <000001c4abbe$36810ea0$655aa8c0@infiniconsys.com> > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Wednesday, October 06, 2004 7:36 AM > > The problem with this is that when this is done on a little endian > machine it shows up byte swapped on the network and not in network > endian. > > So if hi_tid = 1 and user tid = 0x9abcdef0 > then the transaction ID in the MAD is 0xf0debc9a01000000 > > I don't think that is what we want. The TID on the wire is opaque to any recipient. A response to a MAD should have exactly the same TID. The recipient of the response (original client) will then correctly decode the hi_tid and user_tid since the transaction ID is received exactly how it was sent. You only need byte swapping if the value needs to be interpreted by some other node with unknown endianness. If the recipient just blindly echoes the value back, the TID is effectively just a 64-bit data blob that it cares not about - byte ordering doesn't matter one bit. The endianness of the TID does not change for the client - it is sent in host order, and is received in host order. The only time this could cause problems is if your client's CPU changes endianness between the time the request is sent and the response received. I don't think we should bother coding for that possibility. 
- Fab From halr at voltaire.com Wed Oct 6 09:42:20 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 06 Oct 2004 12:42:20 -0400 Subject: [openib-general] [PATCH] ib_mad.h: Remove network endian conversion of QP1 QKey Message-ID: <1097080940.2432.2.camel@hpc-1> ib_mad.h: Remove network endian conversion of QP1 QKey Index: ib_mad.h =================================================================== --- ib_mad.h (revision 935) +++ ib_mad.h (working copy) @@ -58,7 +58,7 @@ #define IB_QP0 0 #define IB_QP1 cpu_to_be32(1) -#define IB_QP1_QKEY cpu_to_be32(0x80010000) +#define IB_QP1_QKEY 0x80010000 struct ib_grh { u32 version_tclass_flow; From halr at voltaire.com Wed Oct 6 09:47:34 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 06 Oct 2004 12:47:34 -0400 Subject: [openib-general] [PATCH] ib_mad: Add MAD pointer to send WR udstructure In-Reply-To: References: Message-ID: <1097081254.2432.5.camel@hpc-1> On Wed, 2004-10-06 at 12:01, Sean Hefty wrote: > >Index: include/ib_verbs.h > >=================================================================== > >--- include/ib_verbs.h (revision 935) > >+++ include/ib_verbs.h (working copy) > >@@ -539,6 +539,7 @@ > > } atomic; > > struct { > > struct ib_ah *ah; > >+ struct ib_mad *mad; > > I _think_ that the user only needs to reference something like the > following: > > union ib_mad_hdrs { > struct ib_mad_hdr mad_hdr; > struct { > struct ib_mad_hdr mad_hdr; > struct ib_rmpp_hdr rmpp_hdr; > } hdr; > }; > > I don't think that we need a pointer to the MAD data. OK. But your diffs are ahead of the tree right now so I can't change it to union ib_mad_hdrs yet. -- Hal From halr at voltaire.com Wed Oct 6 09:48:25 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 06 Oct 2004 12:48:25 -0400 Subject: [openib-general] [PATCH] ib_mad.h: Remove network endian conversion of QP1 QKey In-Reply-To: <1097080940.2432.2.camel@hpc-1> References: <1097080940.2432.2.camel@hpc-1> Message-ID: <1097081305.2432.7.camel@hpc-1> On Wed, 2004-10-06 at 12:42, Hal Rosenstock wrote: > ib_mad.h: Remove network endian conversion of QP1 QKey I forgot to say that with this change GMPs including request/response matching is now working :-) -- Hal From ftillier at infiniconsys.com Wed Oct 6 09:46:13 2004 From: ftillier at infiniconsys.com (Fab Tillier) Date: Wed, 6 Oct 2004 09:46:13 -0700 Subject: [openib-general] [PATCH] ib_mad.h: Remove network endian conversionof QP1 QKey In-Reply-To: <1097080940.2432.2.camel@hpc-1> Message-ID: <000101c4abc3$fe5cbf00$655aa8c0@infiniconsys.com> > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Wednesday, October 06, 2004 9:42 AM > > ib_mad.h: Remove network endian conversion of QP1 QKey I don't get the point. Is IB_QP1_QKEY going to be treated in host order, and then swapped by someone at some point when posting? Does mthca expect the QKey to be provided in host order and then swap it when formatting the WQEs? - Fab From halr at voltaire.com Wed Oct 6 09:49:47 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 06 Oct 2004 12:49:47 -0400 Subject: [openib-general] [ANNOUNCE]: MAD layer with SMI/SMA Message-ID: <1097081387.2432.10.camel@hpc-1> This is the "formal" announcement on the availability of the first release of the OpenIB MAD layer with SMI/SMA (located in the openib-candidate branch). The OpenIB access layer is comprised of the MAD layer and the SMI/SMA. These are currently implemented as separate modules. 
These are layered on top of the mthca driver and cannot run currently with the MAD layer from the roland_merge branch. These directions are how to combine the two for building and running. Note also that the stack currently ends here (e.g. don't expect to run IPoIB right now, although we are heading to this; especially don't expect to run SRP or SDP or u/kDAPL right now either). There is a README in src/linux-kernel/infiniband/access which also contains building and using instructions. I have now done the following since last night: Run through install and build process from scratch. Add virtual address to the ud parameter in ib_mad_send. Redo MAD completion handler error handling. Investigate TID endian. Initial GMP testing (including request/response) There is one patch pending relative to TID endian (and elimination of the TID union). Other Pending Items (that don't affect release): Trim method mask used in ib_smi.c There is a workaround on the hop pointer in ib_smi.c which needs a real fix. Undo commenting out of ib_mad_send in mthca_mad.c More GMP testing Change send WR ud parameter from MAD to MAD header From sean.hefty at intel.com Wed Oct 6 09:47:48 2004 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 6 Oct 2004 09:47:48 -0700 Subject: [openib-general] [PATCH] ib_mad: Add MAD pointer to send WRudstructure In-Reply-To: <1097081254.2432.5.camel@hpc-1> Message-ID: >> I _think_ that the user only needs to reference something like the >> following: >> >> union ib_mad_hdrs { >> struct ib_mad_hdr mad_hdr; >> struct { >> struct ib_mad_hdr mad_hdr; >> struct ib_rmpp_hdr rmpp_hdr; >> } hdr; >> }; >> >> I don't think that we need a pointer to the MAD data. > >OK. But your diffs are ahead of the tree right now so I can't change it >to union ib_mad_hdrs yet. I don't actually have this in my tree. Just thinking whether the user needs to provide a pointer to just the headers, or to the entire MAD. The API should be clear what the expectations are on the client. From halr at voltaire.com Wed Oct 6 10:16:17 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 06 Oct 2004 13:16:17 -0400 Subject: [openib-general] [PATCH] ib_mad: Add MAD pointer to send WRudstructure In-Reply-To: References: Message-ID: <1097082977.2432.33.camel@hpc-1> On Wed, 2004-10-06 at 12:47, Sean Hefty wrote: > >> I _think_ that the user only needs to reference something like the > >> following: > >> > >> union ib_mad_hdrs { > >> struct ib_mad_hdr mad_hdr; > >> struct { > >> struct ib_mad_hdr mad_hdr; > >> struct ib_rmpp_hdr rmpp_hdr; > >> } hdr; > >> }; > >> > >> I don't think that we need a pointer to the MAD data. > > > >OK. But your diffs are ahead of the tree right now so I can't change it > >to union ib_mad_hdrs yet. > > I don't actually have this in my tree. Thought you might have started on RMPP :-) > Just thinking whether the user needs > to provide a pointer to just the headers, or to the entire MAD. The API > should be clear what the expectations are on the client. I'll generate another patch for this. 
-- Hal From halr at voltaire.com Wed Oct 6 10:21:04 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 06 Oct 2004 13:21:04 -0400 Subject: [openib-general] [PATCH] ib_mad.h: Remove network endian conversionof QP1 QKey In-Reply-To: <000101c4abc3$fe5cbf00$655aa8c0@infiniconsys.com> References: <000101c4abc3$fe5cbf00$655aa8c0@infiniconsys.com> Message-ID: <1097083264.2432.39.camel@hpc-1> On Wed, 2004-10-06 at 12:46, Fab Tillier wrote: > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > Sent: Wednesday, October 06, 2004 9:42 AM > > > > ib_mad.h: Remove network endian conversion of QP1 QKey > > I don't get the point. Is IB_QP1_QKEY going to be treated in host order, > and then swapped by someone at some point when posting? Does mthca expect > the QKey to be provided in host order and then swap it when formatting the > WQEs? In ib_mad.c, it's used as a QKey attribute to the HCA when the QP is initialized. In a client, it's used as the remote QKey in the send WR ud request. The only byte swapping is done by the HCA not in the MAD layer or client. -- Hal From tduffy at sun.com Wed Oct 6 10:19:46 2004 From: tduffy at sun.com (Tom Duffy) Date: Wed, 06 Oct 2004 10:19:46 -0700 Subject: [openib-general] using VAPI and CMAPI in kernel module. In-Reply-To: <416297A7.3000408@eig.unige.ch> References: <416297A7.3000408@eig.unige.ch> Message-ID: <1097083186.5307.17.camel@duffman> On Tue, 2004-10-05 at 14:46 +0200, von Wyl wrote: > Hi, > > I'm trying to compile a kernel module which use the VAPI and (maybe) the > CMAPI. And when I compile a file with an inclusion of vapi.h I got this : Please use the new mthca driver instead of the old mellanox driver. -tduffy -- "When they took the 4th Amendment, I was quiet because I didn't deal drugs. When they took the 6th Amendment, I was quiet because I am innocent. When they took the 2nd Amendment, I was quiet because I don't own a gun. Now they have taken the 1st Amendment, and I can only be quiet." --Lyle Myhr -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From sean.hefty at intel.com Wed Oct 6 10:23:16 2004 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 6 Oct 2004 10:23:16 -0700 Subject: [openib-general] [PATCH] ib_mad.h: Remove network endianconversionof QP1 QKey In-Reply-To: <1097083264.2432.39.camel@hpc-1> Message-ID: >On Wed, 2004-10-06 at 12:46, Fab Tillier wrote: >> > From: Hal Rosenstock [mailto:halr at voltaire.com] >> > Sent: Wednesday, October 06, 2004 9:42 AM >> > >> > ib_mad.h: Remove network endian conversion of QP1 QKey >> >> I don't get the point. Is IB_QP1_QKEY going to be treated in host order, >> and then swapped by someone at some point when posting? Does mthca >expect >> the QKey to be provided in host order and then swap it when formatting >the >> WQEs? > >In ib_mad.c, it's used as a QKey attribute to the HCA when the QP is >initialized. > >In a client, it's used as the remote QKey in the send WR ud request. > >The only byte swapping is done by the HCA not in the MAD layer or >client. Mthca performs byte-swapping on the qkey in both of these cases. At this point, I think the openib stack needs byte-ordering reworked, but it can probably wait. 
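An illustrative sketch of the Q_Key convention the IB_QP1_QKEY patch and the exchange above settle on: the well-known Q_Key stays in host (CPU) order everywhere above the verbs boundary, and the swap to wire order happens once, inside the HCA driver, when the hardware descriptors are built. The structure and function below are hypothetical stand-ins, not mthca or ib_mad code.

#include <linux/types.h>
#include <asm/byteorder.h>

#define IB_QP1_QKEY 0x80010000	/* host order; no cpu_to_be32() here */

/* Hypothetical stand-in for the UD portion of a send work request. */
struct ud_wr_stub {
	u32 remote_qpn;
	u32 remote_qkey;	/* supplied in host order by MAD layer/clients */
};

/* Hypothetical HCA-driver step: the one place wire order appears. */
static void write_hw_qkey(u32 *hw_field, const struct ud_wr_stub *wr)
{
	/* single byte swap, done by the driver, not by its callers */
	*hw_field = cpu_to_be32(wr->remote_qkey);
}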
From roland at topspin.com Wed Oct 6 10:30:12 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 06 Oct 2004 10:30:12 -0700 Subject: [openib-general] [ANNOUNCE]: MAD layer with SMI/SMA In-Reply-To: <1097081387.2432.10.camel@hpc-1> (Hal Rosenstock's message of "Wed, 06 Oct 2004 12:49:47 -0400") References: <1097081387.2432.10.camel@hpc-1> Message-ID: <523c0rrbyz.fsf@topspin.com> Great! I will start merging this onto my tree as soon as I get my IPoIB driver into shape to commit. (I've currently torn it apart getting rid of the pseudo-ethernet layer, and I have it mostly working again). - Roland From halr at voltaire.com Wed Oct 6 10:45:17 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 06 Oct 2004 13:45:17 -0400 Subject: [openib-general] [PATCH] ib_mad: Eliminate ib_tid union Message-ID: <1097084716.2432.42.camel@hpc-1> ib_mad: Eliminate ib_tid union Index: access/ib_mad_priv.h =================================================================== --- access/ib_mad_priv.h (revision 942) +++ access/ib_mad_priv.h (working copy) @@ -119,7 +119,7 @@ struct list_head agent_send_list; struct ib_mad_agent *agent; u64 wr_id; /* client WR ID */ - union ib_tid tid; + u64 tid; int timeout_ms; int refcount; enum ib_wc_status status; Index: access/ib_mad.c =================================================================== --- access/ib_mad.c (revision 940) +++ access/ib_mad.c (working copy) @@ -344,7 +344,7 @@ return -ENOMEM; } - mad_send_wr->tid.id = send_wr->wr.ud.mad->mad_hdr.tid.id; + mad_send_wr->tid = send_wr->wr.ud.mad->mad_hdr.tid; mad_send_wr->agent = mad_agent; mad_send_wr->timeout_ms = cur_send_wr->wr.ud.timeout_ms; if (mad_send_wr->timeout_ms) @@ -682,7 +682,7 @@ /* Whether MAD was solicited determines type of routing to MAD client */ if (solicited) { /* Routing is based on high 32 bits of transaction ID of MAD */ - hi_tid = be32_to_cpu(mad->mad_hdr.tid.tid_field.hi_tid); + hi_tid = be64_to_cpu(mad->mad_hdr.tid) >> 32; list_for_each_entry(entry, &port_priv->agent_list, agent_list) { if (entry->agent.hi_tid == hi_tid) { mad_agent = entry; @@ -764,7 +764,7 @@ list_for_each_entry(mad_send_wr, &mad_agent_priv->send_list, agent_send_list) { - if (mad_send_wr->tid.id == tid) { + if (mad_send_wr->tid == tid) { /* Verify request is still valid */ if (mad_send_wr->status == IB_WC_SUCCESS && mad_send_wr->timeout_ms) @@ -793,7 +793,7 @@ if (solicited) { spin_lock_irqsave(&mad_agent_priv->send_list_lock, flags); mad_send_wr = find_send_req(mad_agent_priv, - recv->mad.mad.mad_hdr.tid.id); + recv->mad.mad.mad_hdr.tid); if (!mad_send_wr) { spin_unlock_irqrestore(&mad_agent_priv->send_list_lock, flags); Index: include/ib_mad.h =================================================================== --- include/ib_mad.h (revision 943) +++ include/ib_mad.h (working copy) @@ -69,14 +69,6 @@ union ib_gid dgid; } __attribute__ ((packed)); -union ib_tid { - u64 id; - struct { - u32 hi_tid; - u32 lo_tid; - } tid_field; -}; - struct ib_mad_hdr { u8 base_version; u8 mgmt_class; @@ -84,7 +76,7 @@ u8 method; u16 status; u16 class_specific; - union ib_tid tid; + u64 tid; u16 attr_id; u16 resv; u32 attr_mod; Index: include/ib_smi.h =================================================================== --- include/ib_smi.h (revision 935) +++ include/ib_smi.h (working copy) @@ -41,7 +41,7 @@ u16 status; u8 hop_ptr; u8 hop_cnt; - union ib_tid tid; + u64 tid; u16 attr_id; u16 resv; u32 attr_mod; From halr at voltaire.com Wed Oct 6 11:01:24 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 06 Oct 2004 
14:01:24 -0400 Subject: [openib-general] [PATCH] ib_mad.c: Fix request/response matching In-Reply-To: <000001c4abbe$36810ea0$655aa8c0@infiniconsys.com> References: <000001c4abbe$36810ea0$655aa8c0@infiniconsys.com> Message-ID: <1097085684.2432.48.camel@hpc-1> On Wed, 2004-10-06 at 12:04, Fab Tillier wrote: > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > Sent: Wednesday, October 06, 2004 7:36 AM > > > > The problem with this is that when this is done on a little endian > > machine it shows up byte swapped on the network and not in network > > endian. > > > > So if hi_tid = 1 and user tid = 0x9abcdef0 > > then the transaction ID in the MAD is 0xf0debc9a01000000 > > > > I don't think that is what we want. > > The TID on the wire is opaque to any recipient. True but by conforming to network endian it makes it easier to understand what is going on (which internal client should receive it). This is a minor cost. > A response to a MAD should have exactly the same TID. > The recipient of the response (original client) > will then correctly decode the hi_tid and user_tid since the transaction ID > is received exactly how it was sent. You only need byte swapping if the > value needs to be interpreted by some other node with unknown endianness. > If the recipient just blindly echoes the value back, the TID is effectively > just a 64-bit data blob that it cares not about - byte ordering doesn't > matter one bit. The endianness of the TID does not change for the client - > it is sent in host order, and is received in host order. The only time this > could cause problems is if your client's CPU changes endianness between the > time the request is sent and the response received. I don't think we should > bother coding for that possibility. Agreed. -- Hal From halr at voltaire.com Wed Oct 6 11:11:51 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 06 Oct 2004 14:11:51 -0400 Subject: [openib-general] [PATCH] ib_mad: Use pointer to madhdr rather than mad in send WR ud structure Message-ID: <1097086311.2768.1.camel@hpc-1> ib_mad: Use pointer to madhdr rather than mad in send WR ud structure We'll go the rest of the way on this when RMPP is implemented. 
Index: access/ib_smi.c =================================================================== --- access/ib_smi.c (revision 942) +++ access/ib_smi.c (working copy) @@ -361,7 +361,7 @@ send_wr.wr.ud.ah = ah; send_wr.wr.ud.remote_qkey = 0; /* for SMPs */ - send_wr.wr.ud.mad = smp; + send_wr.wr.ud.mad_hdr = (struct ib_mad_hdr *)smp; send_wr.wr_id = ++port_priv->wr_id; pci_unmap_addr_set(smp, smi_send_wr->mapping, gather_list.addr); Index: access/ib_mad.c =================================================================== --- access/ib_mad.c (revision 944) +++ access/ib_mad.c (working copy) @@ -344,7 +344,7 @@ return -ENOMEM; } - mad_send_wr->tid = send_wr->wr.ud.mad->mad_hdr.tid; + mad_send_wr->tid = send_wr->wr.ud.mad_hdr->tid; mad_send_wr->agent = mad_agent; mad_send_wr->timeout_ms = cur_send_wr->wr.ud.timeout_ms; if (mad_send_wr->timeout_ms) Index: include/ib_verbs.h =================================================================== --- include/ib_verbs.h (revision 940) +++ include/ib_verbs.h (working copy) @@ -539,7 +539,7 @@ } atomic; struct { struct ib_ah *ah; - struct ib_mad *mad; + struct ib_mad_hdr *mad_hdr; u32 remote_qpn; u32 remote_qkey; int timeout_ms; /* valid for MADs only */ From halr at voltaire.com Wed Oct 6 11:36:16 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 06 Oct 2004 14:36:16 -0400 Subject: [openib-general] [PATCH] ib_verbs.h (Roland's branch): Use pointer to madhdr rather than mad in send WR ud structure Message-ID: <1097087776.2768.20.camel@hpc-1> ib_verbs.h: Use pointer to madhdr rather than mad in send WR ud structure We'll go the rest of the way on this when RMPP is implemented. (This patch applies to Roland's branch). Index: ib_verbs.h =================================================================== --- ib_verbs.h (revision 944) +++ ib_verbs.h (working copy) @@ -523,7 +523,7 @@ } atomic; struct { struct ib_ah *ah; - struct ib_mad *mad; + struct ib_mad_hdr *mad_hdr; u32 remote_qpn; u32 remote_qkey; int timeout_ms; /* valid for MADs only */ From halr at voltaire.com Wed Oct 6 12:35:18 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 06 Oct 2004 15:35:18 -0400 Subject: [openib-general] mthca support for Arbel Message-ID: <1097091318.1963.4.camel@localhost.localdomain> Hi, Has mthca been tested with Arbel (PCI Express) ? Is this in compatibility mode or native mode or both ? Thanks. -- Hal From roland at topspin.com Wed Oct 6 13:09:50 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 06 Oct 2004 13:09:50 -0700 Subject: [openib-general] mthca support for Arbel In-Reply-To: <1097091318.1963.4.camel@localhost.localdomain> (Hal Rosenstock's message of "Wed, 06 Oct 2004 15:35:18 -0400") References: <1097091318.1963.4.camel@localhost.localdomain> Message-ID: <52u0t7pq0h.fsf@topspin.com> Hal> Hi, Has mthca been tested with Arbel (PCI Express) ? Is this Hal> in compatibility mode or native mode or both ? I've tested mthca with Arbel in compatibility mode. I haven't had access to native mode firmware or up-to-date documentation, so I have not even started on native mode support. - Roland From roland at topspin.com Wed Oct 6 13:32:56 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 06 Oct 2004 13:32:56 -0700 Subject: [openib-general] [PATCH] Start moving to a native IPoIB driver Message-ID: <52pt3vpoxz.fsf@topspin.com> I've just committed this patch. It removes the fake ethernet layer and starts turning IPoIB into a native driver (with addr_len 20 and type ARPHRD_INFINIBAND). 
The driver is working pretty well with these changes, although multicast is not working at all and there are lots of leaks and races that I still need to fix up. I'm still not sure that I'm doing everything the right way but I think I've made a lot of progress towards something I wouldn't be embarrassed to post to the netdev list. This approach seems to simplify things quite a bit (diffstat shows a net deletion of 1300 lines from a driver that was < 5000 lines to start with) and performance seems a bit better as well. Surprisingly tcpdump and ethereal still work; tcpdump warns; tcpdump: WARNING: arptype 32 not supported by libpcap - falling back to cooked socket but still works fine. The ifconfig and arp commands can't cope with the longer network address, but the ip command handles it fine: # ip link show dev ib0 6: ib0: mtu 2044 qdisc pfifo_fast qlen 128 link/[32] 00:00:04:04:fe:80:00:00:00:00:00:00:00:02:c9:01:07:8c:e4:61 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff # ip neigh show dev ib0 12.0.0.1 lladdr 00:00:04:04:fe:80:00:00:00:00:00:00:00:02:c9:01:07:fc:c7:11 nud reachable ip2pr (and indirectly sdp) are broken by these changes, but Libor has said he will work on fixing this up. For now I'm going to move onto integrating the new MAD layer into my tree and come back to IPoIB in a few days. - Roland Index: infiniband/ulp/Kconfig =================================================================== --- infiniband/ulp/Kconfig (revision 915) +++ infiniband/ulp/Kconfig (working copy) @@ -32,7 +32,7 @@ config INFINIBAND_SDP tristate "Sockets Direct Protocol" - depends on INFINIBAND && INFINIBAND_IPOIB + depends on BROKEN && INFINIBAND && INFINIBAND_IPOIB select INFINIBAND_CM ---help--- Support for Sockets Direct Protocol (SDP). This provides Index: infiniband/ulp/ipoib/ipoib_verbs.c =================================================================== --- infiniband/ulp/ipoib/ipoib_verbs.c (revision 915) +++ infiniband/ulp/ipoib/ipoib_verbs.c (working copy) @@ -122,6 +122,12 @@ } priv->local_qpn = priv->qp->qp_num; + ipoib_dbg(priv, "Local QPN: %06x\n", priv->local_qpn); + + priv->dev->dev_addr[1] = (priv->local_qpn >> 16) & 0xff; + priv->dev->dev_addr[2] = (priv->local_qpn >> 8) & 0xff; + priv->dev->dev_addr[3] = (priv->local_qpn ) & 0xff; + qp_attr.qp_state = IB_QPS_INIT; qp_attr.qkey = 0; qp_attr.port_num = priv->port; Index: infiniband/ulp/ipoib/ipoib_arp.c =================================================================== --- infiniband/ulp/ipoib/ipoib_arp.c (revision 915) +++ infiniband/ulp/ipoib/ipoib_arp.c (working copy) @@ -1,1057 +0,0 @@ -/* - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available at - * , or the OpenIB.org BSD - * license, available in the LICENSE.TXT file accompanying this - * software. These details are also available at - * . - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - * Copyright (c) 2004 Topspin Communications. All rights reserved. 
- * - * $Id$ - */ - -#include -#include -#include - -#include "ipoib.h" - -#include "ts_ib_sa_client.h" - -enum { - IPOIB_ADDRESS_HASH_BITS = IPOIB_ADDRESS_HASH_BYTES * 8, -}; - -struct ipoib_sarp_cache { - struct list_head table[256]; -}; - -struct ipoib_sarp { - struct list_head cache_list; - - atomic_t refcnt; - - uint8_t hash[IPOIB_ADDRESS_HASH_BYTES]; - - union ib_gid gid; - u32 qpn; - u16 lid; - tTS_IB_SL sl; - struct ib_ah *address_handle; - tTS_IB_CLIENT_QUERY_TID tid; - - unsigned long created; - unsigned long last_verify; - unsigned long last_directed_query; - unsigned long first_directed_reply; - - unsigned char require_verify:1; - unsigned char directed_query:1; - unsigned char directed_reply:4; - - unsigned char logcount; - - struct sk_buff_head pkt_queue; - struct work_struct path_record_work; - - struct net_device *dev; -}; - -struct ipoib_sarp_iter { - struct net_device *dev; - uint8_t hash; - struct list_head *cur; -}; - -struct ipoib_arp_payload { - uint8_t src_hw_addr[IPOIB_HW_ADDR_LEN]; - uint8_t src_ip_addr[4]; - uint8_t dst_hw_addr[IPOIB_HW_ADDR_LEN]; - uint8_t dst_ip_addr[4]; -}; - -static void _ipoib_sarp_path_lookup(void *_entry); - -/* =============================================================== */ -/*.._ipoib_sarp_hash -- hash GID/QPN to 6 bytes */ -static void _ipoib_sarp_hash(union ib_gid *gid, u32 qpn, uint8_t *hash) -{ - /* We use the FNV hash (http://www.isthe.com/chongo/tech/comp/fnv/) */ -#define TS_FNV_64_PRIME 0x100000001b3ULL -#define TS_FNV_64_INIT 0xcbf29ce484222325ULL - - int i; - uint64_t h = TS_FNV_64_INIT; - - /* make qpn big-endian so we know where digits are */ - qpn = cpu_to_be32(qpn); - - for (i = 0; i < sizeof (union ib_gid) + 3; ++i) { - h *= TS_FNV_64_PRIME; - h ^= (i < sizeof(tTS_IB_GID) - ? gid->raw[i] - : ((uint8_t *)&qpn)[i - sizeof (union ib_gid) + 1]); - } - - /* xor fold down to 6 bytes and make big-endian */ - h = cpu_to_be64((h >> IPOIB_ADDRESS_HASH_BITS) - ^ (h & ((1ULL << IPOIB_ADDRESS_HASH_BITS) - 1))); - - memcpy(hash, ((uint8_t *)&h) + 2, IPOIB_ADDRESS_HASH_BYTES); -} - -/* =============================================================== */ -/*..ipoib_sarp_get -- increment reference count for ARP entry */ -void ipoib_sarp_get(struct ipoib_sarp *entry) -{ - atomic_inc(&entry->refcnt); -} - -/* =============================================================== */ -/*.._ipoib_sarp_alloc -- allocate shadow ARP entry */ -static struct ipoib_sarp *_ipoib_sarp_alloc(struct net_device *dev) -{ - struct ipoib_sarp *entry; - - entry = kmalloc(sizeof(*entry), GFP_ATOMIC); - if (!entry) - return NULL; - - atomic_set(&entry->refcnt, 2); /* The calling function needs to put */ - - entry->dev = dev; - - entry->require_verify = 0; - entry->directed_query = 1; - entry->directed_reply = 0; - - entry->created = jiffies; - entry->last_verify = jiffies; - entry->last_directed_query = jiffies; - entry->first_directed_reply = jiffies; - - entry->logcount = 0; - - INIT_LIST_HEAD(&entry->cache_list); - - skb_queue_head_init(&entry->pkt_queue); - INIT_WORK(&entry->path_record_work, - _ipoib_sarp_path_lookup, entry); - - entry->address_handle = NULL; - - /* Will force a trigger on the first packet we need to send */ - entry->tid = TS_IB_CLIENT_QUERY_TID_INVALID; - - return entry; -} - -/* =============================================================== */ -/*..ipoib_sarp_put -- decrement reference count for ARP entry */ -void ipoib_sarp_put(struct ipoib_sarp *entry) -{ - struct net_device *dev = entry->dev; - struct ipoib_dev_priv *priv = 
netdev_priv(dev); - - if (atomic_dec_and_test(&entry->refcnt)) { - ipoib_dbg(priv, "deleting ARP shadow cache entry " - "%02x:%02x:%02x:%02x:%02x:%02x\n", - entry->hash[0], entry->hash[1], entry->hash[2], - entry->hash[3], entry->hash[4], entry->hash[5]); - - if (entry->address_handle != NULL) { - int ret = ib_destroy_ah(entry->address_handle); - if (ret < 0) - ipoib_warn(priv, "ib_destroy_ah failed (ret = %d)\n", - ret); - } - - while (!skb_queue_empty(&entry->pkt_queue)) { - struct sk_buff *skb = skb_dequeue(&entry->pkt_queue); - - skb->dev = dev; - dev_kfree_skb_any(skb); - } - - kfree(entry); - } -} - -/* =============================================================== */ -/*..__ipoib_sarp_find -- find ARP entry (unlocked) */ -static struct ipoib_sarp *__ipoib_sarp_find(struct net_device *dev, - const uint8_t *hash) -{ - struct ipoib_dev_priv *priv = netdev_priv(dev); - struct ipoib_sarp *entry; - - list_for_each_entry(entry, &priv->sarp_cache->table[hash[0]], - cache_list) { - ipoib_dbg_data(priv, "matching %02x:%02x:%02x:%02x:%02x:%02x\n", - entry->hash[0], entry->hash[1], entry->hash[2], - entry->hash[3], entry->hash[4], entry->hash[5]); - - if (!memcmp(hash, entry->hash, IPOIB_ADDRESS_HASH_BYTES)) { - ipoib_sarp_get(entry); - return entry; - } - } - - return NULL; -} - -/* =============================================================== */ -/*.._ipoib_sarp_find -- find ARP entry */ -static struct ipoib_sarp *_ipoib_sarp_find(struct net_device *dev, - const uint8_t *hash) -{ - struct ipoib_dev_priv *priv = netdev_priv(dev); - struct ipoib_sarp *entry; - unsigned long flags; - - spin_lock_irqsave(&priv->lock, flags); - entry = __ipoib_sarp_find(dev, hash); - spin_unlock_irqrestore(&priv->lock, flags); - - return entry; -} - -/* =============================================================== */ -/*..ipoib_sarp_iter_init -- create new ARP iterator */ -struct ipoib_sarp_iter *ipoib_sarp_iter_init(struct net_device *dev) -{ - struct ipoib_dev_priv *priv = netdev_priv(dev); - struct ipoib_sarp_iter *iter; - - iter = kmalloc(sizeof(*iter), GFP_KERNEL); - if (!iter) - return NULL; - - iter->dev = dev; - iter->hash = 0; - iter->cur = priv->sarp_cache->table[0].next; - - while (iter->cur == &priv->sarp_cache->table[iter->hash]) { - ++iter->hash; - if (iter->hash == 0) { - /* ARP table is empty */ - kfree(iter); - return NULL; - } - iter->cur = priv->sarp_cache->table[iter->hash].next; - } - - return iter; -} - -/* =============================================================== */ -/*..ipoib_sarp_iter_free -- free ARP iterator */ -void ipoib_sarp_iter_free(struct ipoib_sarp_iter *iter) -{ - kfree(iter); -} - -/* =============================================================== */ -/*..ipoib_sarp_iter_next -- incr. iter. 
-- return non-zero at end */ -int ipoib_sarp_iter_next(struct ipoib_sarp_iter *iter) -{ - struct ipoib_dev_priv *priv = netdev_priv(iter->dev); - - while (1) { - iter->cur = iter->cur->next; - - if (iter->cur == &priv->sarp_cache->table[iter->hash]) { - ++iter->hash; - if (!iter->hash) - return 1; - - iter->cur = &priv->sarp_cache->table[iter->hash]; - } else - return 0; - } -} - -/* =============================================================== */ -/*..ipoib_sarp_iter_read -- get data pointed to by ARP iterator */ -void ipoib_sarp_iter_read(struct ipoib_sarp_iter *iter, uint8_t *hash, - union ib_gid *gid, u32 *qpn, - unsigned long *created, unsigned long *last_verify, - unsigned int *queuelen, unsigned int *complete) -{ - struct ipoib_sarp *entry; - - entry = list_entry(iter->cur, struct ipoib_sarp, cache_list); - - memcpy(hash, entry->hash, IPOIB_ADDRESS_HASH_BYTES); - *gid = entry->gid; - *qpn = entry->qpn; - *created = entry->created; - *last_verify = entry->last_verify; - *queuelen = skb_queue_len(&entry->pkt_queue); - *complete = entry->address_handle != NULL; -} - -/* =============================================================== */ -/*..ipoib_sarp_add -- add ARP entry */ -struct ipoib_sarp *ipoib_sarp_add(struct net_device *dev, union ib_gid *gid, - u32 qpn) -{ - struct ipoib_dev_priv *priv = netdev_priv(dev); - uint8_t hash[IPOIB_ADDRESS_HASH_BYTES]; - struct ipoib_sarp *entry; - unsigned long flags; - - _ipoib_sarp_hash(gid, qpn, hash); - - entry = _ipoib_sarp_find(dev, hash); - if (entry) { - if (entry->qpn != qpn || - memcmp(entry->gid.raw, gid->raw, sizeof (union ib_gid))) { - ipoib_warn(priv, "hash collision\n"); - ipoib_sarp_put(entry); /* for _find() */ - return NULL; - } else - return entry; - } - - entry = _ipoib_sarp_alloc(dev); - if (!entry) { - ipoib_warn(priv, "out of memory for ARP entry\n"); - return NULL; - } - - memcpy(entry->hash, hash, sizeof(entry->hash)); - entry->gid = *gid; - entry->qpn = qpn; - - entry->require_verify = 1; - - spin_lock_irqsave(&priv->lock, flags); - list_add_tail(&entry->cache_list, &priv->sarp_cache->table[hash[0]]); - spin_unlock_irqrestore(&priv->lock, flags); - - return entry; -} - -/* =============================================================== */ -/*..ipoib_sarp_local_add -- add ARP hash for local node */ -struct ipoib_sarp *ipoib_sarp_local_add(struct net_device *dev, - union ib_gid *gid, u32 qpn) -{ - _ipoib_sarp_hash(gid, qpn, dev->dev_addr); - return ipoib_sarp_add(dev, gid, qpn); -} - -/* =============================================================== */ -/*..ipoib_sarp_delete -- delete shadow ARP cache entry */ -int ipoib_sarp_delete(struct net_device *dev, const uint8_t *hash) -{ - struct ipoib_dev_priv *priv = netdev_priv(dev); - struct ipoib_sarp *entry; - unsigned long flags; - - spin_lock_irqsave(&priv->lock, flags); - - entry = __ipoib_sarp_find(dev, hash); - if (!entry) { - spin_unlock_irqrestore(&priv->lock, flags); - - return 0; - } - - list_del_init(&entry->cache_list); - - spin_unlock_irqrestore(&priv->lock, flags); - - ipoib_sarp_put(entry); /* for _find() */ - ipoib_sarp_put(entry); /* for original reference */ - - return 1; -} - -/* =============================================================== */ -/*.._ipoib_sarp_path_record_completion -- path record comp func */ -static int _ipoib_sarp_path_record_completion(tTS_IB_CLIENT_QUERY_TID tid, - int status, - struct ib_path_record *path, - int remaining, void *entry_ptr) -{ - struct ipoib_sarp *entry = entry_ptr; - struct net_device *dev = entry->dev; - struct 
ipoib_dev_priv *priv = netdev_priv(dev); - - ipoib_dbg(priv, "path record lookup done, status %d\n", status); - - entry->tid = TS_IB_CLIENT_QUERY_TID_INVALID; - - if (!status) { - struct ib_ah_attr av = { - .dlid = path->dlid, - .sl = path->sl, - .src_path_bits = 0, - .static_rate = 0, - .ah_flags = 0, - .port_num = priv->port - }; - - entry->address_handle = ib_create_ah(priv->pd, &av); - if (IS_ERR(entry->address_handle)) { - ipoib_warn(priv, "ib_create_ah failed\n"); - } else { - ipoib_dbg(priv, "created address handle %p for LID 0x%04x, SL %d\n", - entry->address_handle, path->dlid, path->sl); - - entry->lid = path->dlid; - entry->sl = path->sl; - - /* actually send any queued packets */ - while (!skb_queue_empty(&entry->pkt_queue)) { - struct sk_buff *skb = - skb_dequeue(&entry->pkt_queue); - - skb->dev = dev; - - if (dev_queue_xmit(skb)) - ipoib_warn(priv, "dev_queue_xmit failed " - "to requeue packet\n"); - } - } - } else { - if (status != -ETIMEDOUT && entry->logcount < 20) { - ipoib_warn(priv, "tsIbPathRecordRequest completion failed " - "for %02x:%02x:%02x:%02x:%02x:%02x, status = %d\n", - entry->hash[0], entry->hash[1], - entry->hash[2], entry->hash[3], - entry->hash[4], entry->hash[5], status); - entry->logcount++; - } - - /* Flush out any queued packets */ - while (!skb_queue_empty(&entry->pkt_queue)) { - struct sk_buff *skb = skb_dequeue(&entry->pkt_queue); - - skb->dev = dev; - dev_kfree_skb_any(skb); - } - } - - ipoib_sarp_put(entry); /* for _get() in original call */ - - /* nonzero return means no more callbacks (we have our path) */ - return 1; -} - -/* =============================================================== */ -/*.._ipoib_sarp_path_lookup - start path lookup */ -static void _ipoib_sarp_path_lookup(void *entry_ptr) -{ - struct ipoib_sarp *entry = entry_ptr; - struct net_device *dev = entry->dev; - struct ipoib_dev_priv *priv = netdev_priv(dev); - tTS_IB_CLIENT_QUERY_TID tid; - - ipoib_sarp_get(entry); - if (tsIbPathRecordRequest(priv->ca, priv->port, - priv->local_gid.raw, - entry->gid.raw, - priv->pkey, 0, HZ, 3600 * HZ, /* XXX cache jiffies */ - _ipoib_sarp_path_record_completion, - entry, &tid)) { - ipoib_warn(priv, "tsIbPathRecordRequest failed\n"); - ipoib_sarp_put(entry); /* for _get() */ - } else { - ipoib_dbg(priv, "no address vector, starting path record lookup\n"); - entry->tid = tid; - } -} - -/* =============================================================== */ -/*.._ipoib_sarp_lookup -- return address and qpn for entry */ -static int _ipoib_sarp_lookup(struct ipoib_sarp *entry) -{ - struct net_device *dev = entry->dev; - struct ipoib_dev_priv *priv = netdev_priv(dev); - - /* If DEBUG is undefined, priv won't be used */ - (void) priv; - - if (entry->address_handle != NULL) - return 0; - - /* - * Found an entry, but without an address handle. - * Check to see if we have a path record lookup executing and if not, - * start one up. 
- */ - - if (entry->tid == TS_IB_CLIENT_QUERY_TID_INVALID) - schedule_work(&entry->path_record_work); - else - ipoib_dbg(priv, "no address vector, but path record lookup already started\n"); - - return -EAGAIN; -} - -/* =============================================================== */ -/*..ipoib_sarp_lookup -- lookup a hash in shadow ARP cache */ -int ipoib_sarp_lookup(struct net_device *dev, uint8_t *hash, - struct ipoib_sarp **entry) -{ - struct ipoib_sarp *tentry; - - tentry = _ipoib_sarp_find(dev, hash); - if (!tentry) - return -ENOENT; - - *entry = tentry; - - return _ipoib_sarp_lookup(tentry); -} - -/* =============================================================== */ -/*.._ipoib_sarp_tx_callback -- put reference to entry after TX */ -static void _ipoib_sarp_tx_callback(void *ptr) -{ - ipoib_sarp_put((struct ipoib_sarp *)ptr); /* for _get() in orig call */ -} - -/* =============================================================== */ -/*..ipoib_sarp_send -- send packet to dest */ -int ipoib_sarp_send(struct net_device *dev, struct ipoib_sarp *entry, - struct sk_buff *skb) -{ - return ipoib_dev_send(dev, skb, _ipoib_sarp_tx_callback, - entry, entry->address_handle, entry->qpn); -} - -/* =============================================================== */ -/*..ipoib_sarp_queue_packet -- queue packet during path rec lookup */ -int ipoib_sarp_queue_packet(struct ipoib_sarp *entry, struct sk_buff *skb) -{ - skb_queue_tail(&entry->pkt_queue, skb); - - return 0; -} - -/* =============================================================== */ -/*..ipoib_sarp_rewrite_receive -- rewrite ARP packet for Linux */ -int ipoib_sarp_rewrite_receive(struct net_device *dev, struct sk_buff *skb) -{ - struct ipoib_dev_priv *priv = netdev_priv(dev); - struct arphdr *arp; - struct ipoib_arp_payload *payload; - struct arphdr *new_arp; - struct ethhdr *header; - uint8_t *new_payload; - struct sk_buff *new_skb; - struct ipoib_sarp *entry; - uint8_t hash[IPOIB_ADDRESS_HASH_BYTES]; - int ret = 0; - - /* If DEBUG is undefined, priv won't be used */ - (void) priv; - - arp = (struct arphdr *)skb->data; - payload = (struct ipoib_arp_payload *)skb_pull(skb, sizeof(*arp)); - - ipoib_dbg(priv, "ARP receive: hwtype=0x%04x proto=0x%04x hwlen=%d prlen=%d op=0x%04x " - "sip=%d.%d.%d.%d dip=%d.%d.%d.%d\n", - ntohs(arp->ar_hrd), - ntohs(arp->ar_pro), - arp->ar_hln, - arp->ar_pln, - ntohs(arp->ar_op), - payload->src_ip_addr[0], payload->src_ip_addr[1], - payload->src_ip_addr[2], payload->src_ip_addr[3], - payload->dst_ip_addr[0], payload->dst_ip_addr[1], - payload->dst_ip_addr[2], payload->dst_ip_addr[3]); - - new_skb = dev_alloc_skb(dev->hard_header_len - + sizeof(*new_arp) - + 2 * (IPOIB_ADDRESS_HASH_BYTES + 4)); - if (!new_skb) { - ret = -ENOMEM; - goto out; - } - - new_skb->mac.raw = new_skb->data; - header = (struct ethhdr *)new_skb->mac.raw; - skb_reserve(new_skb, dev->hard_header_len); - - new_arp = (struct arphdr *)skb_put(new_skb, sizeof(*new_arp)); - new_payload = (uint8_t *)skb_put(new_skb, - 2 * (IPOIB_ADDRESS_HASH_BYTES + 4)); - - header->h_proto = htons(ETH_P_ARP); - - new_skb->dev = dev; - new_skb->pkt_type = PACKET_HOST; - new_skb->protocol = htons(ETH_P_ARP); - - /* copy ARP header */ - *new_arp = *arp; - new_arp->ar_hrd = htons(ARPHRD_ETHER); - new_arp->ar_hln = IPOIB_ADDRESS_HASH_BYTES; - - /* copy IP addresses */ - memcpy(new_payload + IPOIB_ADDRESS_HASH_BYTES, - payload->src_ip_addr, 4); - memcpy(new_payload + 2 * IPOIB_ADDRESS_HASH_BYTES + 4, - payload->dst_ip_addr, 4); - - /* rewrite IPoIB hw address to hashes 
*/ - if (be32_to_cpu(*(uint32_t *)payload->src_hw_addr) & 0xffffff) { - _ipoib_sarp_hash((union ib_gid *) (payload->src_hw_addr + 4), - be32_to_cpu(*(uint32_t *)payload->src_hw_addr) & 0xffffff, hash); - - /* add shadow ARP entries if necessary */ - if (ARPOP_REPLY == ntohs(arp->ar_op)) { - entry = _ipoib_sarp_find(dev, hash); - if (entry) { - if (entry->directed_query && - time_before(jiffies, - entry->last_directed_query + - HZ)) { - /* Directed query, everything's good */ - entry->last_verify = jiffies; - entry->directed_query = 0; - } else { - /* - * If we receive another ARP packet in that's not directed and - * we already have a path record outstanding, don't drop it yet - */ - if (entry->tid == - TS_IB_CLIENT_QUERY_TID_INVALID) { - /* Delete old one and create a new one */ - ipoib_dbg(priv, "LID change inferred on query for " - "%02x:%02x:%02x:%02x:%02x:%02x\n", - hash[0], hash[1], hash[2], - hash[3], hash[4], hash[5]); - - ipoib_sarp_delete(dev, hash); - ipoib_sarp_put(entry); /* for _find() */ - entry = NULL; - } else - ipoib_dbg(priv, "lookup in progress, skipping destroying entry " - "%02x:%02x:%02x:%02x:%02x:%02x\n", - hash[0], hash[1], hash[2], - hash[3], hash[4], hash[5]); - } - } - } else - entry = NULL; - - /* Small optimization, if we already found it once, don't search again */ - if (!entry) - entry = ipoib_sarp_add(dev, - (union ib_gid *) (payload->src_hw_addr + 4), - be32_to_cpu(*(uint32_t *) - payload->src_hw_addr) & - 0xffffff); - - if (ARPOP_REQUEST == ntohs(arp->ar_op)) { - if (entry && !entry->directed_reply) - /* Record when this window started */ - entry->first_directed_reply = jiffies; - } - - if (entry) - ipoib_sarp_put(entry); /* for _find() */ - } else - memset(hash, 0, sizeof(hash)); - - memcpy(new_payload, hash, IPOIB_ADDRESS_HASH_BYTES); - memcpy(header->h_source, hash, sizeof(header->h_source)); - - if (be32_to_cpu(*(uint32_t *)payload->dst_hw_addr) & 0xffffff) { - _ipoib_sarp_hash((union ib_gid *) (payload->dst_hw_addr + 4), - be32_to_cpu(*(uint32_t *)payload->dst_hw_addr) & 0xffffff, hash); - - entry = ipoib_sarp_add(dev, - (union ib_gid *) (payload->dst_hw_addr + 4), - be32_to_cpu(*(uint32_t *)payload->dst_hw_addr) & - 0xffffff); - if (entry) - ipoib_sarp_put(entry); /* for _add() */ - - memcpy(new_payload + IPOIB_ADDRESS_HASH_BYTES + 4, - hash, IPOIB_ADDRESS_HASH_BYTES); - memcpy(header->h_dest, hash, sizeof(header->h_dest)); - } else { - memset(new_payload + IPOIB_ADDRESS_HASH_BYTES + 4, - 0, IPOIB_ADDRESS_HASH_BYTES); - memset(header->h_dest, 0xff, sizeof(header->h_dest)); - } - - netif_rx_ni(new_skb); - -out: - dev_kfree_skb_any(skb); - return ret; -} - -/* =============================================================== */ -/*..ipoib_sarp_rewrite_send -- rewrite and send ARP packet */ -int ipoib_sarp_rewrite_send(struct net_device *dev, struct sk_buff *skb) -{ - struct ipoib_dev_priv *priv = netdev_priv(dev); - unsigned char broadcast_mac_addr[] = - { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }; - struct sk_buff *new_skb; - struct arphdr *arp = (struct arphdr *)(skb->data + ETH_HLEN); - uint8_t *payload = ((uint8_t *)arp) + sizeof(*arp); - struct arphdr *new_arp; - struct ipoib_arp_payload *new_payload; - struct ipoib_sarp *dentry = NULL, *entry; - struct ipoib_mcast *dmcast = NULL; - int ret; - - ipoib_dbg(priv, "ARP send: hwtype=0x%04x proto=0x%04x hwlen=%d prlen=%d op=0x%04x " - "sip=%d.%d.%d.%d dip=%d.%d.%d.%d\n", - ntohs(arp->ar_hrd), - ntohs(arp->ar_pro), - arp->ar_hln, - arp->ar_pln, - ntohs(arp->ar_op), - payload[arp->ar_hln], payload[arp->ar_hln 
+ 1], - payload[arp->ar_hln + 2], payload[arp->ar_hln + 3], - payload[2 * arp->ar_hln + 4], payload[2 * arp->ar_hln + 5], - payload[2 * arp->ar_hln + 6], payload[2 * arp->ar_hln + 7]); - - if (memcmp(broadcast_mac_addr, skb->data, ETH_ALEN) == 0) { - /* Broadcast gets handled differently */ - ret = ipoib_mcast_lookup(dev, &priv->bcast_gid, &dmcast); - - /* mcast is only valid if we get a return code of 0 or -EAGAIN */ - switch (ret) { - case 0: - break; - case -EAGAIN: - ipoib_mcast_queue_packet(dmcast, skb); - ipoib_mcast_put(dmcast); /* for _lookup() */ - return 0; - default: - ipoib_warn(priv, "dropping ARP packet with unknown dest " - "%02x:%02x:%02x:%02x:%02x:%02x\n", - skb->data[0], skb->data[1], - skb->data[2], skb->data[3], - skb->data[4], skb->data[5]); - return 1; - } - } else { - dentry = _ipoib_sarp_find(dev, skb->data); - if (!dentry) { - ipoib_warn(priv, "dropping ARP packet with unknown dest " - "%02x:%02x:%02x:%02x:%02x:%02x\n", - skb->data[0], skb->data[1], - skb->data[2], skb->data[3], - skb->data[4], skb->data[5]); - return 1; - } - - /* Make sure we catch any LID changes */ - - /* Update the entry to mark that we last sent a directed ARP query */ - if (dentry->require_verify - && dentry->address_handle != NULL) { - if (ARPOP_REQUEST == ntohs(arp->ar_op)) { - dentry->directed_query = 1; - dentry->last_directed_query = jiffies; - } - - /* - * Catch a LID change on the remote end. If we reply to 3 or more - * ARP queries without a reply, then ditch the entry we have and - * requery - */ - if (ARPOP_REPLY == ntohs(arp->ar_op)) { - dentry->directed_reply++; - - if (!time_before(jiffies, - dentry->first_directed_reply + 4 * HZ)) { - /* We're outside of the time window, so restart the counter */ - dentry->directed_reply = 0; - } else if (dentry->directed_reply > 3) { - if (dentry->tid == - TS_IB_CLIENT_QUERY_TID_INVALID) { - /* Delete old one and create a new one */ - ipoib_dbg(priv, "LID change inferred on reply for " - "%02x:%02x:%02x:%02x:%02x:%02x\n", - dentry->hash[0], dentry->hash[1], - dentry->hash[2], dentry->hash[3], - dentry->hash[4], dentry->hash[5]); - - ipoib_sarp_delete(dev, - dentry->hash); - entry = ipoib_sarp_add(dev, - &dentry->gid, - dentry->qpn); - if (NULL == entry) { - ipoib_dbg(priv, "could not allocate new entry for " - "%02x:%02x:%02x:%02x:%02x:%02x\n", - dentry->hash[0], dentry->hash[1], - dentry->hash[2], dentry->hash[3], - dentry->hash[4], dentry->hash[5]); - ipoib_sarp_put(dentry); /* for _find() */ - return 1; - } - - ipoib_sarp_put(dentry); /* for _find() */ - - dentry = entry; - } else - ipoib_dbg(priv, "lookup in progress, skipping destroying entry " - "%02x:%02x:%02x:%02x:%02x:%02x\n", - dentry->hash[0], dentry->hash[1], - dentry->hash[2], dentry->hash[3], - dentry->hash[4], dentry->hash[5]); - } - } - } - - ret = _ipoib_sarp_lookup(dentry); - if (ret == -EAGAIN) { - ipoib_sarp_queue_packet(dentry, skb); - ipoib_sarp_put(dentry); /* for _find() */ - return 0; - } - } - - new_skb = dev_alloc_skb(dev->hard_header_len - + sizeof(*new_arp) + sizeof(*new_payload)); - if (!new_skb) { - if (dentry) - ipoib_sarp_put(dentry); /* for _find() */ - if (dmcast) - ipoib_mcast_put(dmcast); /* for _lookup() */ - - return 1; - } - skb_reserve(new_skb, dev->hard_header_len); - - new_arp = (struct arphdr *)skb_put(new_skb, sizeof(*new_arp)); - new_payload = (struct ipoib_arp_payload *)skb_put(new_skb, - sizeof(*new_payload)); - - /* build what we need for the header */ - { - uint16_t *t; - - /* ethertype */ - t = (uint16_t *)skb_push(new_skb, 2); - *t = 
htons(ETH_P_ARP); - - /* leave space so send funct can skip ethernet addrs */ - skb_push(new_skb, IPOIB_ADDRESS_HASH_BYTES * 2); - } - - /* copy ARP header */ - *new_arp = *arp; - new_arp->ar_hrd = htons(ARPHRD_INFINIBAND); - new_arp->ar_hln = IPOIB_HW_ADDR_LEN; - - /* copy IP addresses */ - memcpy(&new_payload->src_ip_addr, payload + arp->ar_hln, 4); - memcpy(&new_payload->dst_ip_addr, payload + 2 * arp->ar_hln + 4, 4); - - /* rewrite hash to IPoIB hw address */ - entry = _ipoib_sarp_find(dev, payload); - if (!entry) { - ipoib_warn(priv, "can't find hw address for hash " - "%02x:%02x:%02x:%02x:%02x:%02x\n", - payload[0], payload[1], payload[2], - payload[3], payload[4], payload[5]); - memset(new_payload->src_hw_addr, 0, IPOIB_HW_ADDR_LEN); - } else { - *((uint32_t *)new_payload->src_hw_addr) = - cpu_to_be32(entry->qpn); - memcpy(&new_payload->src_hw_addr[4], entry->gid.raw, - sizeof (union ib_gid)); - ipoib_sarp_put(entry); /* for _find() */ - } - - if (memcmp(broadcast_mac_addr, payload + IPOIB_ADDRESS_HASH_BYTES + 4, - ETH_ALEN) == 0) { - *((uint32_t *)new_payload->dst_hw_addr) = - cpu_to_be32(IB_MULTICAST_QPN); - memcpy(&new_payload->dst_hw_addr[4], priv->bcast_gid.raw, - sizeof (union ib_gid)); - } else { - entry = _ipoib_sarp_find(dev, payload + - IPOIB_ADDRESS_HASH_BYTES + 4); - if (!entry) - memset(new_payload->dst_hw_addr, 0, IPOIB_HW_ADDR_LEN); - else { - *((uint32_t *)new_payload->dst_hw_addr) = - cpu_to_be32(entry->qpn); - memcpy(&new_payload->dst_hw_addr[4], entry->gid.raw, - sizeof (union ib_gid)); - ipoib_sarp_put(entry); /* for _find() */ - } - } - - dev_kfree_skb_any(skb); - - if (dmcast) - ipoib_mcast_send(dev, dmcast, new_skb); - else - ipoib_sarp_send(dev, dentry, new_skb); - - return 0; -} - -/* =============================================================== */ -/*..ipoib_sarp_dev_init -- initialize ARP cache */ -int ipoib_sarp_dev_init(struct net_device *dev) -{ - struct ipoib_dev_priv *priv = netdev_priv(dev); - int i; - - priv->sarp_cache = kmalloc(sizeof(*priv->sarp_cache), GFP_KERNEL); - if (!priv->sarp_cache) - return -ENOMEM; - - for (i = 0; i < 256; ++i) - INIT_LIST_HEAD(&priv->sarp_cache->table[i]); - - return 0; -} - -/* =============================================================== */ -/*..ipoib_sarp_dev_flush -- flush ARP cache */ -void ipoib_sarp_dev_flush(struct net_device *dev) -{ - struct ipoib_dev_priv *priv = netdev_priv(dev); - struct ipoib_sarp *entry, *tentry; - LIST_HEAD(delete_list); - unsigned long flags; - int i; - - ipoib_dbg(priv, "flushing shadow ARP cache\n"); - - /* - * We move to delete_list first because putting the reference could - * eventually end up blocking and we're in a spinlock - */ - - /* - * Instead of destroying the address vector, we destroy the entire - * entry, but we create a new empty entry before that. This way we - * don't have any races with freeing a used address vector. - */ - spin_lock_irqsave(&priv->lock, flags); - for (i = 0; i < 256; ++i) { - list_for_each_entry_safe(entry, tentry, - &priv->sarp_cache->table[i], - cache_list) { - struct ipoib_sarp *nentry; - - /* - * Allocation failure isn't fatal, just drop the entry. - * If it's important, a new one will be generated later - * automatically. 
- */ - nentry = _ipoib_sarp_alloc(entry->dev); - if (nentry) { - memcpy(nentry->hash, entry->hash, - sizeof(nentry->hash)); - nentry->gid = entry->gid; - - nentry->require_verify = entry->require_verify; - nentry->qpn = entry->qpn; - - /* Add it before the current entry */ - list_add_tail(&nentry->cache_list, - &entry->cache_list); - - ipoib_sarp_put(nentry); /* for _alloc() */ - } - - list_del(&entry->cache_list); - list_add_tail(&entry->cache_list, &delete_list); - } - } - spin_unlock_irqrestore(&priv->lock, flags); - - list_for_each_entry_safe(entry, tentry, &delete_list, cache_list) { - list_del_init(&entry->cache_list); - ipoib_sarp_put(entry); /* for original reference */ - } -} - -/* =============================================================== */ -/*..ipoib_sarp_dev_destroy -- destroy ARP cache */ -static void ipoib_sarp_dev_destroy(struct net_device *dev) -{ - struct ipoib_dev_priv *priv = netdev_priv(dev); - struct ipoib_sarp *entry, *tentry; - LIST_HEAD(delete_list); - unsigned long flags; - int i; - - /* - * We move to delete_list first because putting the reference could - * eventually end up blocking and we're in a spinlock - */ - spin_lock_irqsave(&priv->lock, flags); - for (i = 0; i < 256; ++i) { - list_for_each_entry_safe(entry, tentry, - &priv->sarp_cache->table[i], - cache_list) { - list_del(&entry->cache_list); - list_add_tail(&entry->cache_list, &delete_list); - } - } - spin_unlock_irqrestore(&priv->lock, flags); - - list_for_each_entry_safe(entry, tentry, &delete_list, cache_list) { - list_del_init(&entry->cache_list); - ipoib_sarp_put(entry); /* for original reference */ - } -} - -/* =============================================================== */ -/*..ipoib_sarp_dev_cleanup -- clean up ARP cache */ -void ipoib_sarp_dev_cleanup(struct net_device *dev) -{ - struct ipoib_dev_priv *priv = netdev_priv(dev); - - ipoib_sarp_dev_destroy(dev); - kfree(priv->sarp_cache); -} - -/* =============================================================== */ -/*..ipoib_get_gid -- find a hash in shadow ARP cache */ -int ipoib_get_gid(struct net_device *dev, uint8_t *hash, tTS_IB_GID gid) -{ - struct ipoib_sarp *entry = _ipoib_sarp_find(dev, hash); - - if (!entry) - return -EINVAL; - - memcpy(gid, entry->gid.raw, sizeof (union ib_gid)); - - ipoib_sarp_put(entry); /* for _find() */ - - return 0; -} -EXPORT_SYMBOL(ipoib_get_gid); - -/* - * Local Variables: - * c-file-style: "linux" - * indent-tabs-mode: t - * End: - */ - Index: infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- infiniband/ulp/ipoib/ipoib_main.c (revision 915) +++ infiniband/ulp/ipoib/ipoib_main.c (working copy) @@ -49,8 +49,7 @@ #endif module_param(debug_level, int, 0644); -MODULE_PARM_DESC(debug_level, - "Enable debug tracing if > 0" DATA_PATH_DEBUG_HELP); +MODULE_PARM_DESC(debug_level, "Enable debug tracing if > 0" DATA_PATH_DEBUG_HELP); int mcast_debug_level; @@ -62,8 +61,10 @@ DECLARE_MUTEX(ipoib_device_mutex); LIST_HEAD(ipoib_device_list); -static const uint8_t broadcast_mac_addr[] = { - 0xff, 0xff, 0xff, 0xff, 0xff, 0xff +static const u8 ipv4_bcast_addr[] = { + 0x00, 0xff, 0xff, 0xff, + 0xff, 0x12, 0x40, 0x1b, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff }; struct workqueue_struct *ipoib_workqueue; @@ -145,7 +146,7 @@ } EXPORT_SYMBOL(ipoib_device_handle); -int ipoib_dev_open(struct net_device *dev) +int ipoib_open(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); @@ -184,7 +185,7 @@ return 0; } -static int 
_ipoib_dev_stop(struct net_device *dev) +static int ipoib_stop(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); @@ -217,7 +218,7 @@ return 0; } -static int _ipoib_dev_change_mtu(struct net_device *dev, int new_mtu) +static int ipoib_change_mtu(struct net_device *dev, int new_mtu) { struct ipoib_dev_priv *priv = netdev_priv(dev); @@ -231,240 +232,364 @@ return 0; } -static int _ipoib_dev_set_config(struct net_device *dev, struct ifmap *map) +static int path_rec_completion(tTS_IB_CLIENT_QUERY_TID tid, + int status, + struct ib_path_record *pathrec, + int remaining, void *path_ptr) { - return -EOPNOTSUPP; + struct ipoib_path *path = path_ptr; + struct ipoib_dev_priv *priv = netdev_priv(path->dev); + struct sk_buff *skb; + struct ib_ah *ah; + + if (status) + goto err; + + { + struct ib_ah_attr av = { + .dlid = pathrec->dlid, + .sl = pathrec->sl, + .src_path_bits = 0, + .static_rate = 0, + .ah_flags = 0, + .port_num = priv->port + }; + + ah = ib_create_ah(priv->pd, &av); + } + + if (IS_ERR(ah)) + goto err; + + path->ah = ah; + + ipoib_dbg(priv, "created address handle %p for LID 0x%04x, SL %d\n", + ah, pathrec->dlid, pathrec->sl); + + while ((skb = __skb_dequeue(&path->queue))) { + skb->dev = path->dev; + if (dev_queue_xmit(skb)) + ipoib_warn(priv, "dev_queue_xmit failed " + "to requeue packet\n"); + } + + return 1; + +err: + while ((skb = __skb_dequeue(&path->queue))) + dev_kfree_skb(skb); + + if (path->neighbour) + IPOIB_PATH(path->neighbour) = NULL; + + kfree(path); + + return 1; } -static int _ipoib_dev_xmit(struct sk_buff *skb, struct net_device *dev) +static int path_rec_start(struct sk_buff *skb, struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); - uint16_t ethertype; - int ret; + struct ipoib_path *path = kmalloc(sizeof *path, GFP_ATOMIC); + tTS_IB_CLIENT_QUERY_TID tid; - ethertype = ntohs(((struct ethhdr *)skb->data)->h_proto); + if (!path) + goto err; - ipoib_dbg_data(priv, "packet to transmit, length=%d ethertype=0x%04x\n", - skb->len, ethertype); + path->ah = NULL; + path->qpn = be32_to_cpup((u32 *) skb->dst->neighbour->ha); + path->dev = dev; + skb_queue_head_init(&path->queue); + __skb_queue_tail(&path->queue, skb); + path->neighbour = NULL; - if (!netif_carrier_ok(dev)) { - ipoib_dbg(priv, "dropping packet since fabric is not up\n"); + /* + * XXX there's a race here if path record completion runs + * before we get to finish up. Add a lock to path struct? 
+ */ + if (tsIbPathRecordRequest(priv->ca, priv->port, + priv->local_gid.raw, + skb->dst->neighbour->ha + 4, + priv->pkey, 0, HZ, 0, + path_rec_completion, + path, &tid)) { + ipoib_warn(priv, "tsIbPathRecordRequest failed\n"); + goto err; + } - dev->trans_start = jiffies; - ++priv->stats.tx_packets; - priv->stats.tx_bytes += skb->len; - dev_kfree_skb_any(skb); - return 0; + path->neighbour = skb->dst->neighbour; + IPOIB_PATH(skb->dst->neighbour) = path; + return 0; + +err: + kfree(path); + ++priv->stats.tx_dropped; + dev_kfree_skb_any(skb); + + return 0; +} + +static int unicast_arp_completion(tTS_IB_CLIENT_QUERY_TID tid, + int status, + struct ib_path_record *pathrec, + int remaining, void *skb_ptr) +{ + struct sk_buff *skb = skb_ptr; + struct ipoib_dev_priv *priv = netdev_priv(skb->dev); + struct ib_ah *ah; + + if (status) + goto err; + + { + struct ib_ah_attr av = { + .dlid = pathrec->dlid, + .sl = pathrec->sl, + .src_path_bits = 0, + .static_rate = 0, + .ah_flags = 0, + .port_num = priv->port + }; + + ah = ib_create_ah(priv->pd, &av); } - switch (ethertype) { - case ETH_P_ARP: - if (ipoib_sarp_rewrite_send(dev, skb)) { - ++priv->stats.tx_dropped; - dev_kfree_skb_any(skb); - } - return 0; + if (IS_ERR(ah)) + goto err; - case ETH_P_IP: - if (skb->data[0] == 0x01 && skb->data[1] == 0x00 - && skb->data[2] == 0x5e - && (skb->data[3] & 0x80) == 0x00) { - /* Multicast MAC addr */ - struct ipoib_mcast *mcast = NULL; - union ib_gid mgid; - struct iphdr *iph = - (struct iphdr *)(skb->data + ETH_HLEN); - u32 multiaddr = ntohl(iph->daddr); + *(struct ib_ah **) skb->cb = ah; - mgid = ipoib_broadcast_mgid; + if (dev_queue_xmit(skb)) + ipoib_warn(priv, "dev_queue_xmit failed " + "to requeue ARP packet\n"); - /* Add in the P_Key */ - mgid.raw[4] = (priv->pkey >> 8) & 0xff; - mgid.raw[5] = priv->pkey & 0xff; + return 1; - /* Fixup the group mapping */ - mgid.raw[12] = (multiaddr >> 24) & 0x0f; - mgid.raw[13] = (multiaddr >> 16) & 0xff; - mgid.raw[14] = (multiaddr >> 8) & 0xff; - mgid.raw[15] = multiaddr & 0xff; +err: + dev_kfree_skb(skb); - ret = ipoib_mcast_lookup(dev, &mgid, &mcast); - switch (ret) { - case 0: - return ipoib_mcast_send(dev, mcast, skb); - case -EAGAIN: - ipoib_mcast_queue_packet(mcast, skb); - ipoib_mcast_put(mcast); - return 0; - } - } else - if (memcmp(broadcast_mac_addr, skb->data, ETH_ALEN) == 0) { - struct ipoib_mcast *mcast = NULL; + return 1; +} - ret = ipoib_mcast_lookup(dev, &priv->bcast_gid, &mcast); - switch (ret) { - case 0: - return ipoib_mcast_send(dev, mcast, skb); - case -EAGAIN: - ipoib_mcast_queue_packet(mcast, skb); - ipoib_mcast_put(mcast); - return 0; - } - } else { - struct ipoib_sarp *entry = NULL; +static void unicast_arp_finish(struct sk_buff *skb) +{ + struct ib_ah *ah = *(struct ib_ah **) skb->cb; - ret = ipoib_sarp_lookup(dev, skb->data, &entry); - switch (ret) { - case 0: - return ipoib_sarp_send(dev, entry, skb); - case -EAGAIN: - ipoib_sarp_queue_packet(entry, skb); - ipoib_sarp_put(entry); - return 0; - } - } + if (ah) + ib_destroy_ah(ah); +} - switch (ret) { - case 0: - case -EAGAIN: - /* Shouldn't get here anyway */ - break; - case -ENOENT: - ipoib_warn(priv, "dropping packet with unknown dest " - "%02x:%02x:%02x:%02x:%02x:%02x\n", - skb->data[0], skb->data[1], - skb->data[2], skb->data[3], - skb->data[4], skb->data[5]); +/* + * For unicast packets with no skb->dst->neighbour (unicast ARPs are + * the main example), we fire off a path record query for each packet. 
+ * This is pretty bad for scalability (since this is going to hammer + * the SM on a big fabric) but it's the best I can think of for now. + * + * Also we might have a problem if a path changes, because ARPs will + * still go through (since we'll get the new path from the SM for + * these queries) so we'll never update the neighbour. + */ +static int unicast_arp_start(struct sk_buff *skb, struct net_device *dev, + struct ipoib_pseudoheader *phdr) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct sk_buff *tmp_skb; + tTS_IB_CLIENT_QUERY_TID tid; + + if (skb->destructor) { + tmp_skb = skb; + skb = skb_clone(tmp_skb, GFP_ATOMIC); + dev_kfree_skb_any(tmp_skb); + if (!skb) { ++priv->stats.tx_dropped; - dev_kfree_skb_any(skb); return 0; - default: - ipoib_warn(priv, "sending to %02x:%02x:%02x:%02x:%02x:%02x " - "failed (ret = %d)\n", - skb->data[0], skb->data[1], - skb->data[2], skb->data[3], - skb->data[4], skb->data[5], ret); - ++priv->stats.tx_dropped; - dev_kfree_skb_any(skb); - return 0; } - return 0; + } - case ETH_P_IPV6: - ipoib_dbg(priv, "dropping IPv6 packet\n"); - ++priv->stats.tx_dropped; - dev_kfree_skb_any(skb); - return 0; + skb->dev = dev; + skb->destructor = unicast_arp_finish; + memset(skb->cb, 0, sizeof skb->cb); - default: - ipoib_warn(priv, "dropping packet with unknown ethertype 0x%04x\n", - ethertype); + /* + * XXX We need to keep a record of the skb and TID somewhere + * so that we can cancel the request if the device goes down + * before it finishes. + */ + if (tsIbPathRecordRequest(priv->ca, priv->port, + priv->local_gid.raw, + phdr->hwaddr + 4, + priv->pkey, 0, HZ, 0, + unicast_arp_completion, + skb, &tid)) { + ipoib_warn(priv, "tsIbPathRecordRequest failed\n"); ++priv->stats.tx_dropped; dev_kfree_skb_any(skb); - return 0; } return 0; } -struct net_device_stats *_ipoib_dev_get_stats(struct net_device *dev) +static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ipoib_path *path; - return &priv->stats; -} + if (skb->dst && skb->dst->neighbour) { + if (unlikely(!IPOIB_PATH(skb->dst->neighbour))) + return path_rec_start(skb, dev); -static void _ipoib_dev_timeout(struct net_device *dev) -{ - struct ipoib_dev_priv *priv = netdev_priv(dev); + path = IPOIB_PATH(skb->dst->neighbour); - if (priv->tx_free && !test_bit(IPOIB_FLAG_TIMEOUT, &priv->flags)) { - char ring[IPOIB_TX_RING_SIZE + 1]; - int i; + if (likely(path->ah)) { + ipoib_send(dev, skb, path->ah, path->qpn); + return 0; + } - for (i = 0; i < IPOIB_TX_RING_SIZE; ++i) - ring[i] = priv->tx_ring[i].skb ? 'X' : '.'; + if (skb_queue_len(&path->queue) < IPOIB_MAX_PATH_REC_QUEUE) + __skb_queue_tail(&path->queue, skb); + else + goto err; + } else { + struct ipoib_pseudoheader *phdr = + (struct ipoib_pseudoheader *) skb->data; + skb_pull(skb, sizeof *phdr); - ring[i] = 0; + if (phdr->hwaddr[4] == 0xff) { + /* multicast/broadcast GID */ + if (!memcmp(phdr->hwaddr, dev->broadcast, IPOIB_HW_ADDR_LEN)) + ipoib_mcast_send(dev, priv->broadcast, skb); + else { + ipoib_dbg(priv, "Dropping (no %s): type %04x, QPN %06x " + IPOIB_GID_FMT "\n", + skb->dst ? 
"neigh" : "dst", + be16_to_cpup((u16 *) skb->data), + be32_to_cpup((u32 *) phdr->hwaddr), + phdr->hwaddr[ 4], phdr->hwaddr[ 5], + phdr->hwaddr[ 6], phdr->hwaddr[ 7], + phdr->hwaddr[ 8], phdr->hwaddr[ 9], + phdr->hwaddr[10], phdr->hwaddr[11], + phdr->hwaddr[12], phdr->hwaddr[13], + phdr->hwaddr[14], phdr->hwaddr[15], + phdr->hwaddr[16], phdr->hwaddr[17], + phdr->hwaddr[18], phdr->hwaddr[19]); + goto err; + } + } else { + /* unicast GID -- ARP reply?? */ - ipoib_warn(priv, "transmit timeout: latency %ld, " - "tx_free %d, tx_ring [%s]\n", - jiffies - dev->trans_start, priv->tx_free, ring); + /* + * If destructor is unicast_arp_finish, we've + * already been through the path lookup and + * now we can just send the packet. + */ + if (skb->destructor == unicast_arp_finish) { + ipoib_send(dev, skb, *(struct ib_ah **) skb->cb, + be32_to_cpup((u32 *) phdr->hwaddr)); + return 0; + } - set_bit(IPOIB_FLAG_TIMEOUT, &priv->flags); - } else - ipoib_dbg(priv, "transmit timeout: latency %ld\n", - jiffies - dev->trans_start); + if (be16_to_cpup((u16 *) skb->data) != ETH_P_ARP) + ipoib_warn(priv, "Unicast, no %s: type %04x, QPN %06x " + IPOIB_GID_FMT "\n", + skb->dst ? "neigh" : "dst", + be16_to_cpup((u16 *) skb->data), + be32_to_cpup((u32 *) phdr->hwaddr), + phdr->hwaddr[ 4], phdr->hwaddr[ 5], + phdr->hwaddr[ 6], phdr->hwaddr[ 7], + phdr->hwaddr[ 8], phdr->hwaddr[ 9], + phdr->hwaddr[10], phdr->hwaddr[11], + phdr->hwaddr[12], phdr->hwaddr[13], + phdr->hwaddr[14], phdr->hwaddr[15], + phdr->hwaddr[16], phdr->hwaddr[17], + phdr->hwaddr[18], phdr->hwaddr[19]); + + /* put the pseudoheader back on */ + skb_push(skb, sizeof *phdr); + return unicast_arp_start(skb, dev, phdr); + } + } + + return 0; + +err: + ++priv->stats.tx_dropped; + dev_kfree_skb_any(skb); + + return 0; } -/* - * Setup the packet to look like ethernet here, we'll fix it later when - * we actually send it to look like an IPoIB packet - */ -static int _ipoib_dev_hard_header(struct sk_buff *skb, - struct net_device *dev, - unsigned short type, - void *daddr, void *saddr, unsigned len) +struct net_device_stats *ipoib_get_stats(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); - struct ethhdr *header = (struct ethhdr *)skb_push(skb, ETH_HLEN); - /* If DEBUG is undefined, priv won't be used */ - (void) priv; + return &priv->stats; +} - ipoib_dbg_data(priv, "building header, ethertype=0x%04x\n", type); +static void ipoib_timeout(struct net_device *dev) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); - if (daddr) - memcpy(header->h_dest, daddr, IPOIB_ADDRESS_HASH_BYTES); + ipoib_warn(priv, "transmit timeout: latency %ld\n", + jiffies - dev->trans_start); + /* XXX reset QP, etc. */ +} - if (saddr) - memcpy(header->h_source, saddr, IPOIB_ADDRESS_HASH_BYTES); - else - memcpy(header->h_source, dev->dev_addr, - IPOIB_ADDRESS_HASH_BYTES); +static int ipoib_hard_header(struct sk_buff *skb, + struct net_device *dev, + unsigned short type, + void *daddr, void *saddr, unsigned len) +{ + struct ipoib_header *header; - header->h_proto = htons(type); + header = (struct ipoib_header *) skb_push(skb, sizeof *header); + header->proto = htons(type); + header->reserved = 0; + + /* + * If we don't have a neighbour structure, stuff the + * destination address onto the front of the skb so we can + * figure out where to send the packet later. 
+ */ + if (!skb->dst || !skb->dst->neighbour) { + struct ipoib_pseudoheader *phdr = + (struct ipoib_pseudoheader *) skb_push(skb, sizeof *phdr); + memcpy(phdr->hwaddr, daddr, IPOIB_HW_ADDR_LEN); + } + return 0; } -static void _ipoib_dev_set_mcast_list(struct net_device *dev) +static void ipoib_set_mcast_list(struct net_device *dev) { - struct ipoib_dev_priv *priv = netdev_priv(dev); - - schedule_work(&priv->restart_task); + /* XXX Join multicast groups */ } int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port) { struct ipoib_dev_priv *priv = netdev_priv(dev); - if (ipoib_sarp_dev_init(dev)) - goto out; - /* Allocate RX/TX "rings" to hold queued skbs */ - priv->rx_ring = - kmalloc(IPOIB_RX_RING_SIZE * sizeof (struct ipoib_rx_buf), - GFP_KERNEL); + priv->rx_ring = kmalloc(IPOIB_RX_RING_SIZE * sizeof (struct ipoib_buf), + GFP_KERNEL); if (!priv->rx_ring) { printk(KERN_WARNING "%s: failed to allocate RX ring (%d entries)\n", ca->name, IPOIB_RX_RING_SIZE); - goto out_arp_cleanup; + goto out; } memset(priv->rx_ring, 0, - IPOIB_RX_RING_SIZE * sizeof (struct ipoib_rx_buf)); + IPOIB_RX_RING_SIZE * sizeof (struct ipoib_buf)); - priv->tx_ring = - kmalloc(IPOIB_TX_RING_SIZE * sizeof(struct ipoib_tx_buf), - GFP_KERNEL); + priv->tx_ring = kmalloc(IPOIB_TX_RING_SIZE * sizeof (struct ipoib_buf), + GFP_KERNEL); if (!priv->tx_ring) { printk(KERN_WARNING "%s: failed to allocate TX ring (%d entries)\n", ca->name, IPOIB_TX_RING_SIZE); goto out_rx_ring_cleanup; } memset(priv->tx_ring, 0, - IPOIB_TX_RING_SIZE * sizeof(struct ipoib_tx_buf)); + IPOIB_TX_RING_SIZE * sizeof(struct ipoib_buf)); /* set up the rest of our private data */ @@ -482,9 +607,6 @@ out_rx_ring_cleanup: kfree(priv->rx_ring); -out_arp_cleanup: - ipoib_sarp_dev_cleanup(dev); - out: return -ENOMEM; } @@ -507,7 +629,6 @@ ipoib_proc_dev_cleanup(dev); ipoib_ib_dev_cleanup(dev); - ipoib_sarp_dev_cleanup(dev); if (priv->rx_ring) { for (i = 0; i < IPOIB_RX_RING_SIZE; ++i) @@ -532,15 +653,14 @@ { struct ipoib_dev_priv *priv = netdev_priv(dev); - dev->open = ipoib_dev_open; - dev->stop = _ipoib_dev_stop; - dev->change_mtu = _ipoib_dev_change_mtu; - dev->set_config = _ipoib_dev_set_config; - dev->hard_start_xmit = _ipoib_dev_xmit; - dev->get_stats = _ipoib_dev_get_stats; - dev->tx_timeout = _ipoib_dev_timeout; - dev->hard_header = _ipoib_dev_hard_header; - dev->set_multicast_list = _ipoib_dev_set_mcast_list; + dev->open = ipoib_open; + dev->stop = ipoib_stop; + dev->change_mtu = ipoib_change_mtu; + dev->hard_start_xmit = ipoib_start_xmit; + dev->get_stats = ipoib_get_stats; + dev->tx_timeout = ipoib_timeout; + dev->hard_header = ipoib_hard_header; + dev->set_multicast_list = ipoib_set_mcast_list; dev->watchdog_timeo = HZ; dev->rebuild_header = NULL; @@ -548,17 +668,21 @@ dev->header_cache_update = NULL; dev->flags |= IFF_BROADCAST | IFF_MULTICAST; - - dev->hard_header_len = ETH_HLEN; - dev->addr_len = IPOIB_ADDRESS_HASH_BYTES; - dev->type = ARPHRD_ETHER; + + /* + * We add in IPOIB_HW_ADDR_LEN to allow for the destination + * address "pseudoheader" for skbs without neighbour struct. 
+ */ + dev->hard_header_len = IPOIB_ENCAP_LEN + IPOIB_HW_ADDR_LEN; + dev->addr_len = IPOIB_HW_ADDR_LEN; + dev->type = ARPHRD_INFINIBAND; dev->tx_queue_len = IPOIB_TX_RING_SIZE * 2; /* MTU will be reset when mcast join happens */ dev->mtu = IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN; priv->mcast_mtu = priv->admin_mtu = dev->mtu; - memset(dev->broadcast, 0xff, dev->addr_len); + memcpy(dev->broadcast, ipv4_bcast_addr, IPOIB_HW_ADDR_LEN); netif_carrier_off(dev); @@ -610,6 +734,9 @@ goto alloc_mem_failed; } + priv->dev->broadcast[8] = priv->pkey >> 8; + priv->dev->broadcast[9] = priv->pkey & 0xff; + result = ipoib_dev_init(priv->dev, hca, port); if (result < 0) { printk(KERN_WARNING "%s: failed to initialize port %d (ret = %d)\n", Index: infiniband/ulp/ipoib/ipoib.h =================================================================== --- infiniband/ulp/ipoib/ipoib.h (revision 915) +++ infiniband/ulp/ipoib/ipoib.h (working copy) @@ -44,30 +44,27 @@ /* constants */ -#define ARPHRD_INFINIBAND 32 +enum { + IPOIB_PACKET_SIZE = 2048, + IPOIB_BUF_SIZE = IPOIB_PACKET_SIZE + IB_GRH_BYTES, -#define IPOIB_PACKET_SIZE 2048 + IPOIB_ENCAP_LEN = 4, + IPOIB_HW_ADDR_LEN = 20, -enum { IPOIB_RX_RING_SIZE = 128, IPOIB_TX_RING_SIZE = 64, IPOIB_NUM_WC = 4, - IPOIB_BUF_SIZE = IPOIB_PACKET_SIZE + IB_GRH_BYTES, + IPOIB_MAX_PATH_REC_QUEUE = 3, - IPOIB_ADDRESS_HASH_BYTES = ETH_ALEN, - IPOIB_ENCAP_LEN = 4, - IPOIB_HW_ADDR_LEN = 20, - IPOIB_FLAG_TX_FULL = 0, - IPOIB_FLAG_TIMEOUT = 1, - IPOIB_FLAG_OPER_UP = 2, - IPOIB_FLAG_ADMIN_UP = 3, - IPOIB_PKEY_ASSIGNED = 4, - IPOIB_PKEY_STOP = 5, - IPOIB_FLAG_SUBINTERFACE = 6, - IPOIB_MCAST_STOP = 7, + IPOIB_FLAG_OPER_UP = 1, + IPOIB_FLAG_ADMIN_UP = 2, + IPOIB_PKEY_ASSIGNED = 3, + IPOIB_PKEY_STOP = 4, + IPOIB_FLAG_SUBINTERFACE = 5, + IPOIB_MCAST_STOP = 6, IPOIB_MAX_BACKOFF_SECONDS = 16, @@ -79,23 +76,22 @@ /* structs */ -typedef void (*ipoib_tx_callback_t)(void *); +struct ipoib_header { + u16 proto; + u16 reserved; +}; -struct ipoib_sarp; +struct ipoib_pseudoheader { + u8 hwaddr[IPOIB_HW_ADDR_LEN]; +}; + struct ipoib_mcast; -struct ipoib_tx_buf { +struct ipoib_buf { struct sk_buff *skb; - ipoib_tx_callback_t callback; - void *ptr; DECLARE_PCI_UNMAP_ADDR(mapping) }; -struct ipoib_rx_buf { - struct sk_buff *skb; - DECLARE_PCI_UNMAP_ADDR(mapping) -}; - struct ipoib_dev_priv { spinlock_t lock; @@ -132,22 +128,17 @@ u16 local_lid; u32 local_qpn; - union ib_gid bcast_gid; - unsigned int admin_mtu; unsigned int mcast_mtu; - struct ipoib_rx_buf *rx_ring; + struct ipoib_buf *rx_ring; - struct ipoib_tx_buf *tx_ring; + struct ipoib_buf *tx_ring; int tx_head; int tx_free; struct ib_wc ibwc[IPOIB_NUM_WC]; - struct ipoib_sarp_cache *sarp_cache; - - struct proc_dir_entry *arp_proc_entry; struct proc_dir_entry *mcast_proc_entry; struct ib_event_handler event_handler; @@ -158,21 +149,29 @@ struct list_head child_intfs; }; +struct ipoib_path { + struct ib_ah *ah; + u32 qpn; + struct sk_buff_head queue; + + struct net_device *dev; + struct neighbour *neighbour; +}; + +#define IPOIB_PATH(neigh) (*(struct ipoib_path **) ((neigh)->ha + 24)) + extern struct workqueue_struct *ipoib_workqueue; /* list of IPoIB network devices */ extern struct semaphore ipoib_device_mutex; extern struct list_head ipoib_device_list; -extern union ib_gid ipoib_broadcast_mgid; - /* functions */ void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr); -int ipoib_dev_send(struct net_device *dev, struct sk_buff *skb, - ipoib_tx_callback_t callback, - void *ptr, struct ib_ah *address, u32 qpn); +void ipoib_send(struct net_device *dev, 
struct sk_buff *skb, + struct ib_ah *address, u32 qpn); struct ipoib_dev_priv *ipoib_intf_alloc(const char *format); @@ -188,33 +187,6 @@ int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port); void ipoib_dev_cleanup(struct net_device *dev); -void ipoib_sarp_get(struct ipoib_sarp *entry); -void ipoib_sarp_put(struct ipoib_sarp *entry); -struct ipoib_sarp *ipoib_sarp_add(struct net_device *dev, union ib_gid *gid, - u32 qpn); -struct ipoib_sarp *ipoib_sarp_local_add(struct net_device *dev, union ib_gid *gid, - u32 qpn); -int ipoib_sarp_delete(struct net_device *dev, const uint8_t *hash); -int ipoib_sarp_lookup(struct net_device *dev, uint8_t *hash, - struct ipoib_sarp **entry); -int ipoib_sarp_queue_packet(struct ipoib_sarp *entry, struct sk_buff *skb); -int ipoib_sarp_send(struct net_device *dev, struct ipoib_sarp *entry, - struct sk_buff *skb); -int ipoib_sarp_rewrite_receive(struct net_device *dev, struct sk_buff *skb); -int ipoib_sarp_rewrite_send(struct net_device *dev, struct sk_buff *skb); -int ipoib_sarp_dev_init(struct net_device *dev); -void ipoib_sarp_dev_flush(struct net_device *dev); -void ipoib_sarp_dev_cleanup(struct net_device *dev); - -struct ipoib_sarp_iter *ipoib_sarp_iter_init(struct net_device *dev); -void ipoib_sarp_iter_free(struct ipoib_sarp_iter *iter); -int ipoib_sarp_iter_next(struct ipoib_sarp_iter *iter); -void ipoib_sarp_iter_read(struct ipoib_sarp_iter *iter, uint8_t *hash, - union ib_gid *gid, u32 *qpn, - unsigned long *created, - unsigned long *last_verify, - unsigned int *queuelen, unsigned int *complete); - int ipoib_proc_dev_init(struct net_device *dev); void ipoib_proc_dev_cleanup(struct net_device *dev); @@ -223,8 +195,8 @@ int ipoib_mcast_lookup(struct net_device *dev, union ib_gid *mgid, struct ipoib_mcast **mcast); int ipoib_mcast_queue_packet(struct ipoib_mcast *mcast, struct sk_buff *skb); -int ipoib_mcast_send(struct net_device *dev, struct ipoib_mcast *mcast, - struct sk_buff *skb); +void ipoib_mcast_send(struct net_device *dev, struct ipoib_mcast *mcast, + struct sk_buff *skb); void ipoib_mcast_restart_task(void *dev_ptr); int ipoib_mcast_start_thread(struct net_device *dev); Index: infiniband/ulp/ipoib/ipoib_ib.c =================================================================== --- infiniband/ulp/ipoib/ipoib_ib.c (revision 915) +++ infiniband/ulp/ipoib/ipoib_ib.c (working copy) @@ -113,7 +113,7 @@ if (entry->opcode == IB_WC_SEND) { if (work_request_id < IPOIB_TX_RING_SIZE) { - struct ipoib_tx_buf *tx_req; + struct ipoib_buf *tx_req; tx_req = &priv->tx_ring[work_request_id]; @@ -145,42 +145,18 @@ if (entry->slid != priv->local_lid || entry->src_qp != priv->local_qpn) { - struct ethhdr *header; + skb->protocol = ((struct ipoib_header *) skb->data)->proto; - skb->protocol = *(uint16_t *)skb->data; - - /* pull the IPoIB header and add an ethernet header */ skb_pull(skb, IPOIB_ENCAP_LEN); - header = (struct ethhdr *)skb_push(skb, - ETH_HLEN); - - /* - * We could figure out the MAC address from - * the IPoIB header and matching, but it's - * probably too much effort for what it's worth - */ - memset(header->h_dest, 0, - sizeof(header->h_dest)); - memset(header->h_source, 0, - sizeof(header->h_source)); - header->h_proto = skb->protocol; - - skb->mac.raw = skb->data; - skb_pull(skb, ETH_HLEN); - dev->last_rx = jiffies; ++priv->stats.rx_packets; priv->stats.rx_bytes += skb->len; - if (skb->protocol == htons(ETH_P_ARP)) { - if (ipoib_sarp_rewrite_receive(dev, skb)) - ipoib_warn(priv, "ipoib_sarp_rewrite_receive failed\n"); - } else 
{ - skb->dev = dev; - skb->pkt_type = PACKET_HOST; - netif_rx_ni(skb); - } + skb->dev = dev; + /* XXX get correct PACKET_ type here */ + skb->pkt_type = PACKET_HOST; + netif_rx_ni(skb); } else { ipoib_dbg_data(priv, "dropping loopback packet\n"); dev_kfree_skb_any(skb); @@ -198,7 +174,7 @@ case IB_WC_SEND: { - struct ipoib_tx_buf *tx_req; + struct ipoib_buf *tx_req; unsigned long flags; if (work_request_id >= IPOIB_TX_RING_SIZE) { @@ -216,18 +192,12 @@ tx_req->skb->len, PCI_DMA_TODEVICE); - clear_bit(IPOIB_FLAG_TIMEOUT, &priv->flags); - ++priv->stats.tx_packets; priv->stats.tx_bytes += tx_req->skb->len; dev_kfree_skb_any(tx_req->skb); tx_req->skb = NULL; - tx_req->callback(tx_req->ptr); - tx_req->callback = NULL; - tx_req->ptr = NULL; - spin_lock_irqsave(&priv->lock, flags); ++priv->tx_free; if (priv->tx_free > IPOIB_TX_RING_SIZE / 2) @@ -260,10 +230,10 @@ } while (n == IPOIB_NUM_WC); } -static int _ipoib_ib_send(struct ipoib_dev_priv *priv, - u64 work_request_id, - struct ib_ah *address, u32 qpn, - dma_addr_t addr, int len) +static inline int post_send(struct ipoib_dev_priv *priv, + u64 work_request_id, + struct ib_ah *address, u32 qpn, + dma_addr_t addr, int len) { struct ib_sge list = { .addr = addr, @@ -290,13 +260,12 @@ } /* =============================================================== */ -/*..ipoib_dev_send -- schedule an IB send work request */ -int ipoib_dev_send(struct net_device *dev, struct sk_buff *skb, - ipoib_tx_callback_t callback, void *ptr, - struct ib_ah *address, u32 qpn) +/*..ipoib_send -- schedule an IB send work request */ +void ipoib_send(struct net_device *dev, struct sk_buff *skb, + struct ib_ah *address, u32 qpn) { struct ipoib_dev_priv *priv = netdev_priv(dev); - struct ipoib_tx_buf *tx_req; + struct ipoib_buf *tx_req; dma_addr_t addr; if (skb->len > dev->mtu + IPOIB_HW_ADDR_LEN) { @@ -304,56 +273,39 @@ skb->len, dev->mtu + IPOIB_HW_ADDR_LEN); ++priv->stats.tx_dropped; ++priv->stats.tx_errors; - - goto err; + dev_kfree_skb_any(skb); + return; } if (!(skb = skb_unshare(skb, GFP_ATOMIC))) { ipoib_warn(priv, "failed to unshare sk_buff. Dropping\n"); ++priv->stats.tx_dropped; ++priv->stats.tx_errors; - - goto out; + return; } ipoib_dbg_data(priv, "sending packet, length=%d address=%p qpn=0x%06x\n", skb->len, address, qpn); - /* make the skb look like an IPoIB packet again */ - { - struct ethhdr *header = (struct ethhdr *)skb->data; - uint16_t *reserved, *ether_type; - - skb_pull(skb, ETH_HLEN); - reserved = (uint16_t *)skb_push(skb, 2); - ether_type = (uint16_t *)skb_push(skb, 2); - - *ether_type = header->h_proto; - *reserved = 0; - } - /* - * We put the skb into the tx_ring _before_ we call _ipoib_ib_send() + * We put the skb into the tx_ring _before_ we call post_send() * because it's entirely possible that the completion handler will - * run before we execute anything after the _ipoib_ib_send(). That + * run before we execute anything after the post_send(). That * means we have to make sure everything is properly recorded and - * our state is consistent before we call _ipoib_ib_send(). + * our state is consistent before we call post_send(). 
*/ tx_req = &priv->tx_ring[priv->tx_head]; tx_req->skb = skb; - tx_req->callback = callback; - tx_req->ptr = ptr; addr = pci_map_single(priv->ca->dma_device, skb->data, skb->len, PCI_DMA_TODEVICE); pci_unmap_addr_set(tx_req, mapping, addr); - if (_ipoib_ib_send(priv, priv->tx_head, address, qpn, addr, skb->len)) { - ipoib_warn(priv, "_ipoib_ib_send failed\n"); + if (post_send(priv, priv->tx_head, address, qpn, addr, skb->len)) { + ipoib_warn(priv, "post_send failed\n"); ++priv->stats.tx_errors; tx_req->skb = NULL; - tx_req->callback = NULL; - tx_req->ptr = NULL; + dev_kfree_skb_any(skb); } else { unsigned long flags; @@ -368,17 +320,7 @@ netif_stop_queue(dev); } spin_unlock_irqrestore(&priv->lock, flags); - - return 0; } - -err: - dev_kfree_skb_any(skb); - -out: - callback(ptr); - - return 0; } int ipoib_ib_dev_open(struct net_device *dev) @@ -451,11 +393,7 @@ /* Delete broadcast and local addresses since they will be recreated */ ipoib_mcast_dev_down(dev); - ipoib_sarp_delete(dev, dev->dev_addr); - /* Invalidate all address vectors */ - ipoib_sarp_dev_flush(dev); - return 0; } @@ -546,7 +484,6 @@ /* Delete the broadcast address and the local address */ ipoib_mcast_dev_down(dev); - ipoib_sarp_delete(dev, dev->dev_addr); ipoib_transport_dev_cleanup(dev); } @@ -560,7 +497,7 @@ * Bug #2507. This implementation will probably be removed when the P_Key * change async notification is available. */ -int ipoib_dev_open(struct net_device *dev); +int ipoib_open(struct net_device *dev); /* =================================================================== */ /*.. ipoib_pkey_dev_check_presence - Check for the interface P_Key presence */ @@ -585,7 +522,7 @@ ipoib_pkey_dev_check_presence(dev); if (test_bit(IPOIB_PKEY_ASSIGNED, &priv->flags)) - ipoib_dev_open(dev); + ipoib_open(dev); else { down(&pkey_sem); if (!test_bit(IPOIB_PKEY_STOP, &priv->flags)) Index: infiniband/ulp/ipoib/ipoib_vlan.c =================================================================== --- infiniband/ulp/ipoib/ipoib_vlan.c (revision 915) +++ infiniband/ulp/ipoib/ipoib_vlan.c (working copy) @@ -75,6 +75,9 @@ priv->pkey = pkey; + priv->dev->broadcast[8] = pkey >> 8; + priv->dev->broadcast[9] = pkey & 0xff; + result = ipoib_dev_init(priv->dev, ppriv->ca, ppriv->port); if (result < 0) { ipoib_warn(ppriv, "failed to initialize subinterface: " Index: infiniband/ulp/ipoib/ipoib_proc.c =================================================================== --- infiniband/ulp/ipoib/ipoib_proc.c (revision 915) +++ infiniband/ulp/ipoib/ipoib_proc.c (working copy) @@ -33,263 +33,6 @@ #include "ts_kernel_services.h" /* - * ARP proc file stuff - */ - -static const char ipoib_arp_proc_entry_name[] = "ipoib_arp_%s"; -/* - * we have a static variable to hold the device pointer between when - * the /proc file is opened and the seq_file start function is - * called. 
(This is a kludge to get around the fact that we don't get - * to pass user data to the seq_file start function) - */ -static DECLARE_MUTEX(proc_arp_mutex); -static struct net_device *proc_arp_device; - -/* =============================================================== */ -/*.._ipoib_sarp_seq_start -- seq file handling */ -static void *_ipoib_sarp_seq_start(struct seq_file *file, loff_t *pos) -{ - struct ipoib_sarp_iter *iter = ipoib_sarp_iter_init(proc_arp_device); - loff_t n = *pos; - - while (n--) { - if (ipoib_sarp_iter_next(iter)) { - ipoib_sarp_iter_free(iter); - return NULL; - } - } - - return iter; -} - -/* =============================================================== */ -/*.._ipoib_sarp_seq_next -- seq file handling */ -static void *_ipoib_sarp_seq_next(struct seq_file *file, void *iter_ptr, - loff_t *pos) -{ - struct ipoib_sarp_iter *iter = iter_ptr; - - (*pos)++; - - if (ipoib_sarp_iter_next(iter)) { - ipoib_sarp_iter_free(iter); - return NULL; - } - - return iter; -} - -/* =============================================================== */ -/*.._ipoib_sarp_seq_stop -- seq file handling */ -static void _ipoib_sarp_seq_stop(struct seq_file *file, void *iter_ptr) -{ - /* nothing for now */ -} - -/* =============================================================== */ -/*.._ipoib_sarp_seq_show -- seq file handling */ -static int _ipoib_sarp_seq_show(struct seq_file *file, void *iter_ptr) -{ - struct ipoib_sarp_iter *iter = iter_ptr; - uint8_t hash[IPOIB_ADDRESS_HASH_BYTES]; - char gid_buf[sizeof("ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff")]; - union ib_gid gid; - u32 qpn; - int i, n; - unsigned long created, last_verify; - unsigned int queuelen, complete; - - if (iter) { - ipoib_sarp_iter_read(iter, hash, &gid, &qpn, &created, - &last_verify, &queuelen, &complete); - - for (i = 0; i < IPOIB_ADDRESS_HASH_BYTES; ++i) { - seq_printf(file, "%02x", hash[i]); - if (i < IPOIB_ADDRESS_HASH_BYTES - 1) - seq_putc(file, ':'); - else - seq_printf(file, " "); - } - - for (n = 0, i = 0; i < sizeof gid / 2; ++i) { - n += sprintf(gid_buf + n, "%x", - be16_to_cpu(((u16 *)gid.raw)[i])); - if (i < sizeof gid / 2 - 1) - gid_buf[n++] = ':'; - } - } - - seq_printf(file, "GID: %*s", -(1 + (int) sizeof(gid_buf)), gid_buf); - seq_printf(file, "QP#: 0x%06x", qpn); - - seq_printf(file, - " created: %10ld last_verify: %10ld queuelen: %4d complete: %d\n", - created, last_verify, queuelen, complete); - - return 0; -} - -static struct seq_operations ipoib_sarp_seq_operations = { - .start = _ipoib_sarp_seq_start, - .next = _ipoib_sarp_seq_next, - .stop = _ipoib_sarp_seq_stop, - .show = _ipoib_sarp_seq_show, -}; - -/* =============================================================== */ -/*.._ipoib_sarp_proc_open -- proc file handling */ -static int _ipoib_sarp_proc_open(struct inode *inode, struct file *file) -{ - struct proc_dir_entry *pde = PDE(inode); - - if (down_interruptible(&proc_arp_mutex)) - return -ERESTARTSYS; - - proc_arp_device = pde->data; - - return seq_open(file, &ipoib_sarp_seq_operations); -} - -/* - _ipoib_ascii_to_gid is adapted from BSD's inet_pton6, which was - originally written by Paul Vixie -*/ - -/* =============================================================== */ -/*.._ipoib_ascii_to_gid -- read GID from string */ -static int _ipoib_ascii_to_gid(const char *src, union ib_gid *dst) -{ - static const char xdigits[] = "0123456789abcdef"; - unsigned char *tp, *endp, *colonp; - const char *curtok; - int ch, saw_xdigit; - unsigned int val; - - memset((tp = (char *) dst), 0, sizeof (union ib_gid)); 
- endp = tp + sizeof (union ib_gid); - colonp = NULL; - - /* Leading :: requires some special handling. */ - if (*src == ':' && *++src != ':') - return 0; - - curtok = src; - saw_xdigit = 0; - val = 0; - - while ((ch = *src++) != '\0') { - const char *pch; - - pch = strchr(xdigits, tolower(ch)); - - if (pch) { - val <<= 4; - val |= (pch - xdigits); - if (val > 0xffff) - return 0; - - saw_xdigit = 1; - continue; - } - - if (ch == ':') { - curtok = src; - - if (!saw_xdigit) { - if (colonp) - return 0; - - colonp = tp; - continue; - } else if (*src == '\0') - return 0; - - if (tp + 2 > endp) - return 0; - - *tp++ = (u_char) (val >> 8) & 0xff; - *tp++ = (u_char) val & 0xff; - saw_xdigit = 0; - val = 0; - continue; - } - - return 0; - } - - if (saw_xdigit) { - if (tp + 2 > endp) - return 0; - - *tp++ = (u_char) (val >> 8) & 0xff; - *tp++ = (u_char) val & 0xff; - } - - if (colonp) { - memmove(endp - (tp - colonp), colonp, tp - colonp); - memset(colonp, 0, tp - colonp); - tp = endp; - } - - if (tp != endp) - return 0; - - return 1; -} - -/* =============================================================== */ -/*.._ipoib_sarp_proc_write -- proc file handling */ -static ssize_t _ipoib_sarp_proc_write(struct file *file, const char *buffer, - size_t count, loff_t *pos) -{ - struct ipoib_sarp *entry; - char kernel_buf[256]; - char gid_buf[sizeof("ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff")]; - union ib_gid gid; - u32 qpn; - - count = min(count, sizeof(kernel_buf)); - - if (copy_from_user(kernel_buf, buffer, count)) - return -EFAULT; - - kernel_buf[count - 1] = '\0'; - - if (sscanf(kernel_buf, "%39s %i", gid_buf, &qpn) != 2) - return -EINVAL; - - if (!_ipoib_ascii_to_gid(gid_buf, &gid)) - return -EINVAL; - - if (qpn > 0xffffff) - return -EINVAL; - - entry = ipoib_sarp_add(proc_arp_device, &gid, qpn); - if (entry) - ipoib_sarp_put(entry); - - return count; -} - -/* =============================================================== */ -/*.._ipoib_sarp_proc_release -- proc file handling */ -static int _ipoib_sarp_proc_release(struct inode *inode, struct file *file) -{ - up(&proc_arp_mutex); - - return seq_release(inode, file); -} - -static struct file_operations ipoib_sarp_proc_device_operations = { - .open = _ipoib_sarp_proc_open, - .read = seq_read, - .write = _ipoib_sarp_proc_write, - .llseek = seq_lseek, - .release = _ipoib_sarp_proc_release, -}; - -/* * Multicast proc stuff */ @@ -419,28 +162,14 @@ int ipoib_proc_dev_init(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); - char name[sizeof(ipoib_arp_proc_entry_name) + sizeof (dev->name)]; + char name[sizeof(ipoib_mcast_proc_entry_name) + sizeof (dev->name)]; - snprintf(name, sizeof(name) - 1, ipoib_arp_proc_entry_name, dev->name); - priv->arp_proc_entry = create_proc_entry(name, - S_IRUGO | S_IWUGO, - tsKernelProcDirGet()); - - if (!priv->arp_proc_entry) { - ipoib_warn(priv, "Can't create %s in /proc\n", name); - return -ENOMEM; - } - - priv->arp_proc_entry->proc_fops = &ipoib_sarp_proc_device_operations; - priv->arp_proc_entry->data = dev; - snprintf(name, sizeof(name) - 1, ipoib_mcast_proc_entry_name, dev->name); priv->mcast_proc_entry = create_proc_entry(name, S_IRUGO, tsKernelProcDirGet()); if (!priv->mcast_proc_entry) { ipoib_warn(priv, "Can't create %s in /proc\n", name); - /* FIXME: Delete ARP proc entry */ return -ENOMEM; } @@ -455,14 +184,8 @@ void ipoib_proc_dev_cleanup(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); - char name[sizeof(ipoib_arp_proc_entry_name) + sizeof(dev->name)]; + char 
name[sizeof(ipoib_mcast_proc_entry_name) + sizeof(dev->name)]; - if (priv->arp_proc_entry) { - snprintf(name, sizeof(name) - 1, ipoib_arp_proc_entry_name, - dev->name); - remove_proc_entry(name, tsKernelProcDirGet()); - } - if (priv->mcast_proc_entry) { snprintf(name, sizeof(name) - 1, ipoib_mcast_proc_entry_name, dev->name); Index: infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- infiniband/ulp/ipoib/ipoib_multicast.c (revision 915) +++ infiniband/ulp/ipoib/ipoib_multicast.c (working copy) @@ -62,11 +62,6 @@ struct rb_node *rb_node; }; -union ib_gid ipoib_broadcast_mgid = { - .raw = { 0xff, 0x12, 0x40, 0x1b, 0x00, 0x00, 0x00, 0x00, - 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff } -}; - /* =============================================================== */ /*..ipoib_mcast_get - get reference to multicast group */ static inline void ipoib_mcast_get(struct ipoib_mcast *mcast) @@ -212,7 +207,7 @@ } /* Set the cached Q_Key before we attach if it's the broadcast group */ - if (!memcmp(mcast->mgid.raw, priv->bcast_gid.raw, sizeof (union ib_gid))) + if (!memcmp(mcast->mgid.raw, priv->dev->broadcast + 4, sizeof (union ib_gid))) priv->qkey = priv->broadcast->mcast_member.qkey; ret = ipoib_mcast_attach(dev, mcast->mcast_member.mlid, &mcast->mgid); @@ -452,7 +447,6 @@ { struct net_device *dev = dev_ptr; struct ipoib_dev_priv *priv = netdev_priv(dev); - struct ipoib_sarp *entry; unsigned long flags; down(&mcast_mutex); @@ -474,12 +468,9 @@ return; } - priv->bcast_gid = ipoib_broadcast_mgid; - priv->bcast_gid.raw[4] = (priv->pkey >> 8) & 0xff; - priv->bcast_gid.raw[5] = priv->pkey & 0xff; + memcpy(priv->broadcast->mgid.raw, priv->dev->broadcast + 4, + sizeof (union ib_gid)); - priv->broadcast->mgid = priv->bcast_gid; - spin_lock_irqsave(&priv->lock, flags); __ipoib_mcast_add(dev, priv->broadcast); spin_unlock_irqrestore(&priv->lock, flags); @@ -524,16 +515,14 @@ if (ib_query_gid(priv->ca, priv->port, 0, &priv->local_gid)) ipoib_warn(priv, "ib_gid_entry_get() failed\n"); + else + memcpy(priv->dev->dev_addr + 4, priv->local_gid.raw, sizeof (union ib_gid)); priv->mcast_mtu = ib_mtu_enum_to_int(priv->broadcast->mcast_member.mtu) - IPOIB_ENCAP_LEN; dev->mtu = min(priv->mcast_mtu, priv->admin_mtu); - entry = ipoib_sarp_local_add(dev, &priv->local_gid, priv->local_qpn); - if (entry) - ipoib_sarp_put(entry); - ipoib_dbg_mcast(priv, "successfully joined all multicast groups\n"); netif_carrier_on(dev); @@ -682,19 +671,11 @@ } /* =============================================================== */ -/*..ipoib_mcast_tx_callback -- put reference to group after TX */ -static void ipoib_mcast_tx_callback(void *ptr) -{ - ipoib_mcast_put((struct ipoib_mcast *)ptr); -} - -/* =============================================================== */ /*..ipoib_mcast_send -- send skb to multicast group */ -int ipoib_mcast_send(struct net_device *dev, struct ipoib_mcast *mcast, - struct sk_buff *skb) +void ipoib_mcast_send(struct net_device *dev, struct ipoib_mcast *mcast, + struct sk_buff *skb) { - return ipoib_dev_send(dev, skb, ipoib_mcast_tx_callback, mcast, - mcast->address_handle, IB_MULTICAST_QPN); + ipoib_send(dev, skb, mcast->address_handle, IB_MULTICAST_QPN); } /* =============================================================== */ @@ -830,7 +811,7 @@ u32 multiaddr = ntohl(im->multiaddr); union ib_gid mgid; - mgid = ipoib_broadcast_mgid; + memcpy(mgid.raw, dev->broadcast + 4, sizeof mgid); /* Add in the P_Key */ mgid.raw[4] = (priv->pkey >> 8) & 0xff; @@ -843,8 
+824,7 @@ mgid.raw[15] = multiaddr & 0xff; mcast = __ipoib_mcast_find(dev, &mgid); - if (!mcast - || test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags)) { + if (!mcast || test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags)) { struct ipoib_mcast *nmcast; /* Not found or send-only group, let's add a new entry */ Index: infiniband/ulp/ipoib/Makefile =================================================================== --- infiniband/ulp/ipoib/Makefile (revision 915) +++ infiniband/ulp/ipoib/Makefile (working copy) @@ -2,13 +2,15 @@ -Idrivers/infiniband/include \ -D_NO_DATA_PATH_TRACE -obj-$(CONFIG_INFINIBAND_IPOIB) += ib_ipoib.o ib_ip2pr.o +obj-$(CONFIG_INFINIBAND_IPOIB) += ib_ipoib.o +# ip2pr is BROKEN now +# obj-$(CONFIG_INFINIBAND_IPOIB) += ib_ip2pr.o + ib_ipoib-objs := \ ipoib_main.o \ ipoib_ib.o \ ipoib_multicast.o \ - ipoib_arp.o \ ipoib_proc.o \ ipoib_verbs.o \ ipoib_vlan.o From mshefty at ichips.intel.com Wed Oct 6 16:01:44 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 6 Oct 2004 16:01:44 -0700 Subject: [openib-general] [PATCH] for review to timeout send MADs Message-ID: <20041006160144.3cadd22d.mshefty@ichips.intel.com> Here are some modifications to support timing out send MADs in the access layer. I haven't tested this code beyond building it, but wanted to make it available for review. There are a few race conditions that need to be avoided when handling timeouts, so if it looks like something was missed, let me know. Hal, can you either send me your test code or check it into svn somewhere? I want to verify that this doesn't break your current tests, then expand the tests to check that the timeout code is working properly. Additional comments: A couple of structure elements were renamed to better reflect their new usage. mad_agents now have two lists: send_list and wait_list. mad_send_wr on the send_list have active work requests posted for them. Once all work requests have completed, if the mad_send_wr has a timeout and hasn't been canceled, it is moved to the wait_list. A workqueue is used to schedule delayed processing of MAD timeouts. The scheduling delay of the timeout thread is adjusted when a send completes or is canceled. If anyone sees an issue with my usage of the workqueue, just let me know. 
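In outline, the bookkeeping works roughly like this (a stripped-down sketch for
review purposes only, not the patch itself; the struct and function names here --
pending_req, arm_timeout -- are made up, while the list and workqueue calls are
the stock 2.6 primitives):

#include <linux/list.h>
#include <linux/workqueue.h>
#include <linux/jiffies.h>

struct pending_req {
	struct list_head list;		/* entry on the agent's wait list */
	unsigned long deadline;		/* absolute expiration, in jiffies */
};

/*
 * Re-arm the delayed work item for the earliest deadline.  Assumes
 * 'wait_list' is kept sorted by deadline and 'timeout_work' was set up
 * with INIT_WORK() to point at the timeout handler.
 */
static void arm_timeout(struct list_head *wait_list,
			struct work_struct *timeout_work)
{
	struct pending_req *req;
	unsigned long delay;

	if (list_empty(wait_list)) {
		cancel_delayed_work(timeout_work);
		return;
	}

	req = list_entry(wait_list->next, struct pending_req, list);
	delay = time_after(req->deadline, jiffies) ?
		req->deadline - jiffies : 1;

	cancel_delayed_work(timeout_work);
	schedule_delayed_work(timeout_work, delay);
}

The patch below additionally has to handle the races against send completions,
cancels, and received responses, which is where most of the locking comes from.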
- Sean -- Index: access/ib_mad_priv.h =================================================================== --- access/ib_mad_priv.h (revision 946) +++ access/ib_mad_priv.h (working copy) @@ -58,6 +58,8 @@ #include #include +#include +#include #include #include @@ -106,8 +108,11 @@ struct ib_mad_reg_req *reg_req; struct ib_mad_port_private *port_priv; - spinlock_t send_list_lock; + spinlock_t lock; struct list_head send_list; + struct list_head wait_list; + struct work_struct work; + unsigned long timeout; atomic_t refcount; wait_queue_head_t wait; @@ -116,11 +121,11 @@ struct ib_mad_send_wr_private { struct list_head send_list; - struct list_head agent_send_list; + struct list_head agent_list; struct ib_mad_agent *agent; u64 wr_id; /* client WR ID */ u64 tid; - int timeout_ms; + unsigned long timeout; int refcount; enum ib_wc_status status; }; Index: access/ib_mad.c =================================================================== --- access/ib_mad.c (revision 946) +++ access/ib_mad.c (working copy) @@ -87,6 +87,7 @@ static void cancel_mads(struct ib_mad_agent_private *mad_agent_priv); static void ib_mad_complete_send_wr(struct ib_mad_send_wr_private *mad_send_wr, struct ib_mad_send_wc *mad_send_wc); +static void timeout_sends(void *data); /* * ib_register_mad_agent - Register to send/receive MADs @@ -225,8 +226,10 @@ list_add_tail(&mad_agent_priv->agent_list, &port_priv->agent_list); spin_unlock_irqrestore(&port_priv->reg_lock, flags); - spin_lock_init(&mad_agent_priv->send_list_lock); + spin_lock_init(&mad_agent_priv->lock); INIT_LIST_HEAD(&mad_agent_priv->send_list); + INIT_LIST_HEAD(&mad_agent_priv->wait_list); + INIT_WORK(&mad_agent_priv->work, timeout_sends, mad_agent_priv); atomic_set(&mad_agent_priv->refcount, 1); init_waitqueue_head(&mad_agent_priv->wait); mad_agent_priv->port_priv = port_priv; @@ -254,17 +257,26 @@ mad_agent_priv = container_of(mad_agent, struct ib_mad_agent_private, agent); - /* Cleanup pending receives for this agent !!! */ + /* Note that we could still be handling received MADs. */ + + /* + * Canceling all sends results in dropping received response MADs, + * preventing us from queuing additional work. + */ cancel_mads(mad_agent_priv); + cancel_delayed_work(&mad_agent_priv->work); + flush_scheduled_work(); + spin_lock_irqsave(&mad_agent_priv->port_priv->reg_lock, flags); remove_mad_reg_req(mad_agent_priv); list_del(&mad_agent_priv->agent_list); spin_unlock_irqrestore(&mad_agent_priv->port_priv->reg_lock, flags); + + /* Cleanup pending RMPP receives for this agent !!! */ atomic_dec(&mad_agent_priv->refcount); - wait_event(mad_agent_priv->wait, - !atomic_read(&mad_agent_priv->refcount)); + wait_event(mad_agent_priv->wait, !atomic_read(&mad_agent_priv->refcount)); if (mad_agent_priv->reg_req) kfree(mad_agent_priv->reg_req); @@ -346,19 +358,19 @@ mad_send_wr->tid = send_wr->wr.ud.mad_hdr->tid; mad_send_wr->agent = mad_agent; - mad_send_wr->timeout_ms = cur_send_wr->wr.ud.timeout_ms; - if (mad_send_wr->timeout_ms) - mad_send_wr->refcount = 2; - else - mad_send_wr->refcount = 1; + /* Timeout will be updated after send completes. */ + mad_send_wr->timeout = msecs_to_jiffies( + cur_send_wr->wr.ud.timeout_ms); + /* One reference for each work request to QP + response. 
*/ + mad_send_wr->refcount = 1 + (mad_send_wr->timeout > 0); mad_send_wr->status = IB_WC_SUCCESS; /* Reference MAD agent until send completes */ atomic_inc(&mad_agent_priv->refcount); - spin_lock_irqsave(&mad_agent_priv->send_list_lock, flags); - list_add_tail(&mad_send_wr->agent_send_list, + spin_lock_irqsave(&mad_agent_priv->lock, flags); + list_add_tail(&mad_send_wr->agent_list, &mad_agent_priv->send_list); - spin_unlock_irqrestore(&mad_agent_priv->send_list_lock, flags); + spin_unlock_irqrestore(&mad_agent_priv->lock, flags); cur_send_wr->next = NULL; ret = ib_send_mad(mad_agent_priv, mad_send_wr, @@ -367,16 +379,12 @@ /* Handle QP overrun separately... -ENOMEM */ /* Fail send request */ - spin_lock_irqsave(&mad_agent_priv->send_list_lock, - flags); - list_del(&mad_send_wr->agent_send_list); - spin_unlock_irqrestore(&mad_agent_priv->send_list_lock, - flags); + spin_lock_irqsave(&mad_agent_priv->lock, flags); + list_del(&mad_send_wr->agent_list); + spin_unlock_irqrestore(&mad_agent_priv->lock, flags); *bad_send_wr = cur_send_wr; - if (atomic_dec_and_test(&mad_agent_priv->refcount)) - wake_up(&mad_agent_priv->wait); - + atomic_dec(&mad_agent_priv->refcount); return ret; } cur_send_wr= next_send_wr; @@ -690,28 +698,35 @@ } } if (!mad_agent) { - printk(KERN_ERR "No client 0x%x for received MAD on port %d\n", - hi_tid, port_priv->port_num); + printk(KERN_ERR "No client 0x%x for received MAD on " + "port %d\n", hi_tid, port_priv->port_num); goto ret; } } else { /* Routing is based on version, class, and method */ if (mad->mad_hdr.class_version >= MAX_MGMT_VERSION) { - printk(KERN_ERR "MAD received with unsupported class version %d on port %d\n", + printk(KERN_ERR "MAD received with unsupported class " + "version %d on port %d\n", mad->mad_hdr.class_version, port_priv->port_num); goto ret; } version = port_priv->version[mad->mad_hdr.class_version]; if (!version) { - printk(KERN_ERR "MAD received on port %d for class version %d with no client\n", port_priv->port_num, mad->mad_hdr.class_version); + printk(KERN_ERR "MAD received on port %d for class " + "version %d with no client\n", + port_priv->port_num, mad->mad_hdr.class_version); goto ret; } - class = version->method_table[convert_mgmt_class(mad->mad_hdr.mgmt_class)]; + class = version->method_table[convert_mgmt_class( + mad->mad_hdr.mgmt_class)]; if (!class) { - printk(KERN_ERR "MAD received on port %d for class %d with no client\n", port_priv->port_num, mad->mad_hdr.mgmt_class); + printk(KERN_ERR "MAD received on port %d for class %d " + "with no client\n", port_priv->port_num, + mad->mad_hdr.mgmt_class); goto ret; } - mad_agent = class->agent[mad->mad_hdr.method & ~IB_MGMT_METHOD_RESP]; + mad_agent = class->agent[mad->mad_hdr.method & + ~IB_MGMT_METHOD_RESP]; } ret: @@ -724,8 +739,8 @@ /* Make sure MAD base version is understood */ if (mad->mad_hdr.base_version != IB_MGMT_BASE_VERSION) { - printk(KERN_ERR "MAD received with unsupported base version %d\n", - mad->mad_hdr.base_version); + printk(KERN_ERR "MAD received with unsupported base " + "version %d\n", mad->mad_hdr.base_version); goto ret; } @@ -761,16 +776,24 @@ { struct ib_mad_send_wr_private *mad_send_wr; + list_for_each_entry(mad_send_wr, &mad_agent_priv->wait_list, + agent_list) { + + if (mad_send_wr->tid == tid) + return mad_send_wr; + } + + /* + * It's possible to receive the response before we've been notified + * that the send has completed. 
+ */ list_for_each_entry(mad_send_wr, &mad_agent_priv->send_list, - agent_send_list) { + agent_list) { - if (mad_send_wr->tid == tid) { - /* Verify request is still valid */ - if (mad_send_wr->status == IB_WC_SUCCESS && - mad_send_wr->timeout_ms) - return mad_send_wr; - else - return NULL; + if (mad_send_wr->tid == tid && mad_send_wr->timeout) { + /* Verify request has not been canceled. */ + return (mad_send_wr->status == IB_WC_SUCCESS) ? + mad_send_wr : NULL; } } return NULL; @@ -791,17 +814,17 @@ /* Complete corresponding request */ if (solicited) { - spin_lock_irqsave(&mad_agent_priv->send_list_lock, flags); + spin_lock_irqsave(&mad_agent_priv->lock, flags); mad_send_wr = find_send_req(mad_agent_priv, recv->mad.mad.mad_hdr.tid); if (!mad_send_wr) { - spin_unlock_irqrestore(&mad_agent_priv->send_list_lock, - flags); + spin_unlock_irqrestore(&mad_agent_priv->lock, flags); ib_free_recv_mad(&recv->header.recv_wc); return; } - mad_send_wr->timeout_ms = 0; - spin_unlock_irqrestore(&mad_agent_priv->send_list_lock, flags); + /* Timeout = 0 means that we won't wait for a response. */ + mad_send_wr->timeout = 0; + spin_unlock_irqrestore(&mad_agent_priv->lock, flags); /* Defined behavior is to complete response before request */ mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent, @@ -849,20 +872,20 @@ spin_lock_irqsave(&port_priv->recv_list_lock, flags); if (!list_empty(&port_priv->recv_posted_mad_list[qpn])) { rbuf = list_entry(&port_priv->recv_posted_mad_list[qpn], - struct ib_mad_recv_buf, - list); - rbuf = (struct ib_mad_recv_buf *)rbuf->list.next; + struct ib_mad_recv_buf, + list); + rbuf = (struct ib_mad_recv_buf*)rbuf->list.next; mad_priv_hdr = container_of(rbuf, struct ib_mad_private_header, recv_buf); - recv = container_of(mad_priv_hdr, struct ib_mad_private, header); + recv = container_of(mad_priv_hdr,struct ib_mad_private,header); /* Remove from posted receive MAD list */ list_del(&recv->header.recv_buf.list); port_priv->recv_posted_mad_count[qpn]--; } else { - printk(KERN_ERR "Receive completion WR ID 0x%Lx on QP %d with no" - "posted receive\n", wc->wr_id, qp_num); + printk(KERN_ERR "Receive completion WR ID 0x%Lx on QP %d with " + "no posted receive\n", wc->wr_id, qp_num); spin_unlock_irqrestore(&port_priv->recv_list_lock, flags); ib_mad_post_receive_mad(port_priv, port_priv->qp[qp_num]); return; @@ -893,7 +916,8 @@ solicited); if (!mad_agent) { spin_unlock_irqrestore(&port_priv->reg_lock, flags); - printk(KERN_NOTICE "No matching mad agent found for received MAD on port %d\n", port_priv->port_num); + printk(KERN_NOTICE "No matching mad agent found for received " + "MAD on port %d\n", port_priv->port_num); } else { atomic_inc(&mad_agent->refcount); spin_unlock_irqrestore(&port_priv->reg_lock, flags); @@ -911,6 +935,59 @@ return; } +static void adjust_timeout(struct ib_mad_agent_private *mad_agent_priv) +{ + struct ib_mad_send_wr_private *mad_send_wr; + unsigned long delay; + + if (list_empty(&mad_agent_priv->wait_list)) { + cancel_delayed_work(&mad_agent_priv->work); + } else { + + mad_send_wr = list_entry(mad_agent_priv->wait_list.next, + struct ib_mad_send_wr_private, + agent_list); + + if (time_after(mad_agent_priv->timeout, mad_send_wr->timeout)) { + + mad_agent_priv->timeout = mad_send_wr->timeout; + cancel_delayed_work(&mad_agent_priv->work); + delay = mad_send_wr->timeout - jiffies; + if ((long)delay <= 0) + delay = 1; + schedule_delayed_work(&mad_agent_priv->work, delay); + } + } +} + +static void wait_for_response(struct ib_mad_agent_private *mad_agent_priv, + struct 
ib_mad_send_wr_private *mad_send_wr ) +{ + struct ib_mad_send_wr_private *temp_mad_send_wr; + struct list_head *list_item; + unsigned long delay; + + list_del(&mad_send_wr->agent_list); + + delay = mad_send_wr->timeout; + mad_send_wr->timeout += jiffies; + + list_for_each_prev(list_item, &mad_agent_priv->wait_list) { + temp_mad_send_wr = list_entry(list_item, + struct ib_mad_send_wr_private, + agent_list); + if (time_after(mad_send_wr->timeout, temp_mad_send_wr->timeout)) + break; + } + list_add(&mad_send_wr->agent_list, list_item); + + /* Re-schedule a work item if we have a shorter timeout. */ + if (mad_agent_priv->wait_list.next == &mad_send_wr->agent_list) { + cancel_delayed_work(&mad_agent_priv->work); + schedule_delayed_work(&mad_agent_priv->work, delay); + } +} + /* * Process a send work completion. */ @@ -923,30 +1000,27 @@ mad_agent_priv = container_of(mad_send_wr->agent, struct ib_mad_agent_private, agent); - spin_lock_irqsave(&mad_agent_priv->send_list_lock, flags); + spin_lock_irqsave(&mad_agent_priv->lock, flags); if (mad_send_wc->status != IB_WC_SUCCESS && mad_send_wr->status == IB_WC_SUCCESS) { mad_send_wr->status = mad_send_wc->status; - if (mad_send_wr->timeout_ms) { - mad_send_wr->timeout_ms = 0; - mad_send_wr->refcount--; - } + mad_send_wr->refcount -= (mad_send_wr->timeout > 0); } - /* - * Leave sends with timeouts on the send list - * until either matching response is received - * or timeout occurs - */ if (--mad_send_wr->refcount > 0) { - spin_unlock_irqrestore(&mad_agent_priv->send_list_lock, flags); + if (mad_send_wr->refcount == 1 && mad_send_wr->timeout && + mad_send_wr->status == IB_WC_SUCCESS) { + wait_for_response(mad_agent_priv, mad_send_wr); + } + spin_unlock_irqrestore(&mad_agent_priv->lock, flags); return; } /* Remove send from MAD agent and notify client of completion */ - list_del(&mad_send_wr->agent_send_list); - spin_unlock_irqrestore(&mad_agent_priv->send_list_lock, flags); + list_del(&mad_send_wr->agent_list); + adjust_timeout(mad_agent_priv); + spin_unlock_irqrestore(&mad_agent_priv->lock, flags); if (mad_send_wr->status != IB_WC_SUCCESS ) mad_send_wc->status = mad_send_wr->status; @@ -1045,40 +1119,33 @@ INIT_LIST_HEAD(&cancel_list); - spin_lock_irqsave(&mad_agent_priv->send_list_lock, flags); + spin_lock_irqsave(&mad_agent_priv->lock, flags); list_for_each_entry_safe(mad_send_wr, temp_mad_send_wr, - &mad_agent_priv->send_list, agent_send_list) { + &mad_agent_priv->send_list, agent_list) { - if (mad_send_wr->status == IB_WC_SUCCESS) + if (mad_send_wr->status == IB_WC_SUCCESS) { mad_send_wr->status = IB_WC_WR_FLUSH_ERR; - - if (mad_send_wr->timeout_ms) { - mad_send_wr->timeout_ms = 0; - mad_send_wr->refcount--; - } - - if (mad_send_wr->refcount == 0) { - list_del(&mad_send_wr->agent_send_list); - list_add_tail(&mad_send_wr->agent_send_list, - &cancel_list); + mad_send_wr->refcount -= (mad_send_wr->timeout > 0); } } - spin_unlock_irqrestore(&mad_agent_priv->send_list_lock, flags); + + /* Empty wait list to prevent receives from finding a request. 
*/ + list_splice_init(&mad_agent_priv->wait_list, &cancel_list); + spin_unlock_irqrestore(&mad_agent_priv->lock, flags); /* Report all cancelled requests */ mad_send_wc.status = IB_WC_WR_FLUSH_ERR; mad_send_wc.vendor_err = 0; list_for_each_entry_safe(mad_send_wr, temp_mad_send_wr, - &cancel_list, agent_send_list) { + &cancel_list, agent_list) { mad_send_wc.wr_id = mad_send_wr->wr_id; mad_agent_priv->agent.send_handler(&mad_agent_priv->agent, &mad_send_wc); - list_del(&mad_send_wr->agent_send_list); + list_del(&mad_send_wr->agent_list); kfree(mad_send_wr); - atomic_dec(&mad_agent_priv->refcount); } } @@ -1089,11 +1156,18 @@ { struct ib_mad_send_wr_private *mad_send_wr; + list_for_each_entry(mad_send_wr, &mad_agent_priv->wait_list, + agent_list) { + if (mad_send_wr->wr_id == wr_id) + return mad_send_wr; + } + list_for_each_entry(mad_send_wr, &mad_agent_priv->send_list, - agent_send_list) { + agent_list) { if (mad_send_wr->wr_id == wr_id) return mad_send_wr; } + return NULL; } @@ -1107,28 +1181,25 @@ mad_agent_priv = container_of(mad_agent, struct ib_mad_agent_private, agent); - spin_lock_irqsave(&mad_agent_priv->send_list_lock, flags); + spin_lock_irqsave(&mad_agent_priv->lock, flags); mad_send_wr = find_send_by_wr_id(mad_agent_priv, wr_id); if (!mad_send_wr) { - spin_unlock_irqrestore(&mad_agent_priv->send_list_lock, flags); + spin_unlock_irqrestore(&mad_agent_priv->lock, flags); goto ret; } if (mad_send_wr->status == IB_WC_SUCCESS) - mad_send_wr->status = IB_WC_WR_FLUSH_ERR; - - if (mad_send_wr->timeout_ms) { - mad_send_wr->timeout_ms = 0; - mad_send_wr->refcount--; - } + mad_send_wr->refcount -= (mad_send_wr->timeout > 0); if (mad_send_wr->refcount != 0) { - spin_unlock_irqrestore(&mad_agent_priv->send_list_lock, flags); + mad_send_wr->status = IB_WC_WR_FLUSH_ERR; + spin_unlock_irqrestore(&mad_agent_priv->lock, flags); goto ret; } - list_del(&mad_send_wr->agent_send_list); - spin_unlock_irqrestore(&mad_agent_priv->send_list_lock, flags); + list_del(&mad_send_wr->agent_list); + adjust_timeout(mad_agent_priv); + spin_unlock_irqrestore(&mad_agent_priv->lock, flags); mad_send_wc.status = IB_WC_WR_FLUSH_ERR; mad_send_wc.vendor_err = 0; @@ -1145,6 +1216,47 @@ } EXPORT_SYMBOL(ib_cancel_mad); +static void timeout_sends(void *data) +{ + struct ib_mad_agent_private *mad_agent_priv; + struct ib_mad_send_wr_private *mad_send_wr; + struct ib_mad_send_wc mad_send_wc; + unsigned long flags, delay; + + mad_agent_priv = (struct ib_mad_agent_private*)data; + + mad_send_wc.status = IB_WC_RESP_TIMEOUT_ERR; + mad_send_wc.vendor_err = 0; + + spin_lock_irqsave(&mad_agent_priv->lock, flags); + while (!list_empty(&mad_agent_priv->wait_list)) { + + mad_send_wr = list_entry(mad_agent_priv->wait_list.next, + struct ib_mad_send_wr_private, + agent_list); + + if (time_after(mad_send_wr->timeout, jiffies)) { + delay = mad_send_wr->timeout - jiffies; + if ((long)delay <= 0) + delay = 1; + schedule_delayed_work(&mad_agent_priv->work, delay); + break; + } + + list_del(&mad_send_wr->agent_list); + spin_unlock_irqrestore(&mad_agent_priv->lock, flags); + + mad_send_wc.wr_id = mad_send_wr->wr_id; + mad_agent_priv->agent.send_handler(&mad_agent_priv->agent, + &mad_send_wc); + + kfree(mad_send_wr); + atomic_dec(&mad_agent_priv->refcount); + spin_lock_irqsave(&mad_agent_priv->lock, flags); + } + spin_unlock_irqrestore(&mad_agent_priv->lock, flags); +} + /* * IB MAD thread */ @@ -1815,6 +1927,8 @@ static int __init ib_mad_init_module(void) { + int ret; + ib_mad_cache = kmem_cache_create("ib_mad", sizeof(struct ib_mad_private), 0, 
@@ -1830,10 +1944,14 @@ if (ib_register_client(&mad_client)) { printk(KERN_ERR "Couldn't register ib_mad client\n"); - return -EINVAL; + ret = -EINVAL; + goto error; } - return 0; + +error: + kmem_cache_destroy(ib_mad_cache); + return ret; } static void __exit ib_mad_cleanup_module(void) From robert.j.woodruff at intel.com Wed Oct 6 16:48:51 2004 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Wed, 6 Oct 2004 16:48:51 -0700 Subject: [openib-general] mthca support for Arbel Message-ID: <1AC79F16F5C5284499BB9591B33D6F000266F6A1@orsmsx408> Hal > Hi, >Has mthca been tested with Arbel (PCI Express) ? Is this in >compatibility mode or native mode or both ? >Thanks. >-- Hal Does the PCI-Express HCA require any software changes to the HCA driver ? i.e., will it run with an existing PCI-X tavor driver ? I have heard that it does not require any changes, but I just wanted to confim that. From roland at topspin.com Wed Oct 6 17:19:43 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 06 Oct 2004 17:19:43 -0700 Subject: [openib-general] mthca support for Arbel In-Reply-To: <1AC79F16F5C5284499BB9591B33D6F000266F6A1@orsmsx408> (Robert J. Woodruff's message of "Wed, 6 Oct 2004 16:48:51 -0700") References: <1AC79F16F5C5284499BB9591B33D6F000266F6A1@orsmsx408> Message-ID: <52655npeg0.fsf@topspin.com> Robert> Does the PCI-Express HCA require any software changes to Robert> the HCA driver ? i.e., will it run with an existing PCI-X Robert> tavor driver ? I have heard that it does not require any Robert> changes, but I just wanted to confim that. As Hal mentioned, there are two modes that the Mellanox PCIe HCA can run in: compatible and native mode. In compatible mode, the same driver as for the PCI-X HCA can be used. In native mode, new features are exposed (memory-free mode, verbs extensions, etc), but new software is required. - Roland From roland at topspin.com Wed Oct 6 17:41:29 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 06 Oct 2004 17:41:29 -0700 Subject: [openib-general] [PATCH] for review to timeout send MADs In-Reply-To: <20041006160144.3cadd22d.mshefty@ichips.intel.com> (Sean Hefty's message of "Wed, 6 Oct 2004 16:01:44 -0700") References: <20041006160144.3cadd22d.mshefty@ichips.intel.com> Message-ID: <521xgbpdfq.fsf@topspin.com> Sean> A workqueue is used to schedule delayed processing of MAD Sean> timeouts. The scheduling delay of the timeout thread is Sean> adjusted when a send completes or is canceled. If anyone Sean> sees an issue with my usage of the workqueue, just let me Sean> know. It seems you are using the system-wide keventd queue. This isn't necessarily a problem per se, but it would probably be better to use a MAD-layer private workqueue (I suggested a single-threaded workqueue per MAD "port" earlier). This avoids two problems. First, keventd is subject to arbitrary delays because it is used as a dumping ground for any code that needs to sleep. Second, if the MAD layer has its own workqueue, then it can be used for completion processing as well; as it stands it seems sort of funny to create a kthread to do some work and run other work from a workqueue. 
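Something along these lines is what I have in mind (rough sketch only, untested;
the ib_mad_wq name and the wrapper functions are made up, the workqueue calls
themselves are the standard 2.6 API):

#include <linux/errno.h>
#include <linux/workqueue.h>

/* Private MAD-layer workqueue (could also be one per port). */
static struct workqueue_struct *ib_mad_wq;

static int ib_mad_wq_init(void)
{
	/* One dedicated worker thread, independent of keventd. */
	ib_mad_wq = create_singlethread_workqueue("ib_mad");
	return ib_mad_wq ? 0 : -ENOMEM;
}

static void ib_mad_wq_cleanup(void)
{
	flush_workqueue(ib_mad_wq);
	destroy_workqueue(ib_mad_wq);
}

/* Used in place of schedule_delayed_work()/flush_scheduled_work(): */
static void ib_mad_queue_timeout(struct work_struct *work, unsigned long delay)
{
	queue_delayed_work(ib_mad_wq, work, delay);
}

Completion processing could then be queued to the same thread with
queue_work(ib_mad_wq, ...) instead of having its own kthread.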
A few low-level comments too: rbuf = list_entry(&port_priv->recv_posted_mad_list[qpn], - struct ib_mad_recv_buf, - list); - rbuf = (struct ib_mad_recv_buf *)rbuf->list.next; + struct ib_mad_recv_buf, + list); + rbuf = (struct ib_mad_recv_buf*)rbuf->list.next; I don't understand what's going on here; can this not be written as: rbuf = list_entry(port_priv->recv_posted_mad_list[qpn].next, struct ib_mad_recv_buf, list); (By the way the cast should be written with spaces as: (struct ib_mad_recv_buf *) rbuf->list.next) This patch seems whitespace-challenged in other places too: - recv = container_of(mad_priv_hdr, struct ib_mad_private, header); + recv = container_of(mad_priv_hdr,struct ib_mad_private,header); and has extra empty lines places like here: + if (time_after(mad_agent_priv->timeout, mad_send_wr->timeout)) { + + mad_agent_priv->timeout = mad_send_wr->timeout; and here: + if (list_empty(&mad_agent_priv->wait_list)) { + cancel_delayed_work(&mad_agent_priv->work); + } else { + + mad_send_wr = list_entry(mad_agent_priv->wait_list.next, From sean.hefty at intel.com Wed Oct 6 21:10:00 2004 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 6 Oct 2004 21:10:00 -0700 Subject: [openib-general] [PATCH] for review to timeout send MADs In-Reply-To: <521xgbpdfq.fsf@topspin.com> Message-ID: >It seems you are using the system-wide keventd queue. This isn't >necessarily a problem per se, but it would probably be better to use a >MAD-layer private workqueue (I suggested a single-threaded workqueue >per MAD "port" earlier). This avoids two problems. First, keventd is >subject to arbitrary delays because it is used as a dumping ground for >any code that needs to sleep. Second, if the MAD layer has its own >workqueue, then it can be used for completion processing as well; as >it stands it seems sort of funny to create a kthread to do some work >and run other work from a workqueue. I am using the system level queue. If we think that using our own MAD queue is better, I will do that. I was thinking more along the lines of a single workqueue for all MAD services, with one per processor, rather than a workqueue per port, however. I was planning on changing completion processing to using work queues in a separate patch. >A few low-level comments too: > > rbuf = list_entry(&port_priv->recv_posted_mad_list[qpn], >- struct ib_mad_recv_buf, >- list); >- rbuf = (struct ib_mad_recv_buf *)rbuf->list.next; >+ struct ib_mad_recv_buf, >+ list); >+ rbuf = (struct ib_mad_recv_buf*)rbuf->list.next; > >I don't understand what's going on here; can this not be written as: > > rbuf = list_entry(port_priv->recv_posted_mad_list[qpn].next, > struct ib_mad_recv_buf, list); > What happened is that I noticed that list_entry was being used like you saw and went to correct it the way you suggested, but backed out that change (manually) after I saw the problem in another area to avoid combining patches. I missed that I didn't revert to the original code. I've started a separate patch to fix-up this issue, and will make sure that this patch does not modify this area of the code. >This patch seems whitespace-challenged in other places too: I'll go back and update this. Thanks. If you see other places that I missed other than what you mentioned, please let me know. 
From roland.list at gmail.com Thu Oct 7 07:19:51 2004 From: roland.list at gmail.com (Roland Dreier) Date: Thu, 7 Oct 2004 07:19:51 -0700 Subject: [openib-general] [PATCH] for review to timeout send MADs In-Reply-To: References: <521xgbpdfq.fsf@topspin.com> Message-ID: > I am using the system level queue. If we think that using our own MAD queue > is better, I will do that. I was thinking more along the lines of a single > workqueue for all MAD services, with one per processor, rather than a > workqueue per port, however. I don't think the system keventd queue is appropriate for completion handling; I definitely think we should have a MAD layer workqueue. I think it's OK to have a single workqueue for the whole MAD layer; it's probably not scalable to systems with a huge number of HCAs, but the right fix for that is more likely to be moving to softirq/timer context anyway. - R. From halr at voltaire.com Thu Oct 7 07:22:38 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 07 Oct 2004 10:22:38 -0400 Subject: [openib-general] [PATCH] ib_mad.h: Remove network endianconversionof QP1 QKey In-Reply-To: References: Message-ID: <1097158958.2012.56.camel@localhost.localdomain> On Wed, 2004-10-06 at 13:23, Sean Hefty wrote: > >The only byte swapping is done by the HCA not in the MAD layer or > >client. > > Mthca performs byte-swapping on the qkey in both of these cases. Oops. Forgot about the driver :-( > At this point, I think the openib stack needs byte-ordering reworked, but it > can probably wait. Agreed. I think this is a larger undertaking given the reliance on the current implementation. Also, recall several postings on this on this list. Perhaps this is something to consider post SC 2004. -- Hal From halr at voltaire.com Thu Oct 7 07:28:10 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 07 Oct 2004 10:28:10 -0400 Subject: [openib-general] Re: [PATCH] for review to timeout send MADs In-Reply-To: <20041006160144.3cadd22d.mshefty@ichips.intel.com> References: <20041006160144.3cadd22d.mshefty@ichips.intel.com> Message-ID: <1097159289.2012.63.camel@localhost.localdomain> On Wed, 2004-10-06 at 19:01, Sean Hefty wrote: > Here are some modifications to support timing out send MADs in > the access layer. I haven't tested this code beyond building it, Is this still the case ? I noticed you burning the midnight oil last night :-) > but wanted to make it available for review. There are a few race > conditions that need to be avoided when handling timeouts, so if > it looks like something was missed, let me know. If it is not too much work, I would prefer this broken into 2 patches: the first on variable name changes, common error exiting, etc. (all the changes not related to the addition of the timeout functionality) and a second with just the timeout functionality. > Hal, can you either send me your test code or check it into svn > somewhere? I want to verify that this doesn't break your current > tests, then expand the tests to check that the timeout code is > working properly. This test code currently relies on a GSM generating a request in order to reflect the request back. Is that useful to you ? If so, I will post it to this list will integrate it into the tree and the build process such that it does not get included with the normal Linux source. 
-- Hal From halr at voltaire.com Thu Oct 7 07:30:59 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 07 Oct 2004 10:30:59 -0400 Subject: [openib-general] [PATCH] for review to timeout send MADs In-Reply-To: References: <521xgbpdfq.fsf@topspin.com> Message-ID: <1097159458.2012.67.camel@localhost.localdomain> On Thu, 2004-10-07 at 10:19, Roland Dreier wrote: > I think it's OK to have a single workqueue for the whole MAD layer; > it's probably > not scalable to systems with a huge number of HCAs, Any idea on where the cutoff for huge is or is this likely to be a matter of experience ? Is it at least 2 ? What about 4 ? Just wondering... -- Hal > but the right fix for that > is more likely to be moving to softirq/timer context anyway. From roland.list at gmail.com Thu Oct 7 08:20:16 2004 From: roland.list at gmail.com (Roland Dreier) Date: Thu, 7 Oct 2004 08:20:16 -0700 Subject: [openib-general] [PATCH] for review to timeout send MADs In-Reply-To: <1097159458.2012.67.camel@localhost.localdomain> References: <521xgbpdfq.fsf@topspin.com> <1097159458.2012.67.camel@localhost.localdomain> Message-ID: > Any idea on where the cutoff for huge is or is this likely to be a > matter of experience ? Is it at least 2 ? What about 4 ? Depends on the number of CPUs and the workload. The single workqueue design starts being inefficient when you start getting idle time because every workqueue thread is asleep waiting for an HCA to do something (like a modify QP firmware command). If there are other HCAs ready to do useful work they won't be able to get to it and throughput will be lower than it would be with a thread per HCA port. - R. From sean.hefty at intel.com Thu Oct 7 09:36:36 2004 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 7 Oct 2004 09:36:36 -0700 Subject: [openib-general] Re: [PATCH] for review to timeout send MADs In-Reply-To: <1097159289.2012.63.camel@localhost.localdomain> Message-ID: >On Wed, 2004-10-06 at 19:01, Sean Hefty wrote: >> but wanted to make it available for review. There are a few race >> conditions that need to be avoided when handling timeouts, so if >> it looks like something was missed, let me know. > >If it is not too much work, I would prefer this broken into 2 patches: >the first on variable name changes, common error exiting, etc. (all the >changes not related to the addition of the timeout functionality) and a >second with just the timeout functionality. I'll do this. Is just that some of the changes, i.e. variable names, make more sense in the context of the other changes. I will also change to using a MAD specific workqueue. - Sean From halr at voltaire.com Thu Oct 7 11:46:18 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 07 Oct 2004 14:46:18 -0400 Subject: [openib-general] [PATCH] ib_smi: Trim methods needed to Get, Set, and TrapRepress Message-ID: <1097174778.2922.7.camel@hpc-1> ib_smi: Trim methods needed to Get, Set, and TrapRepress Also, eliminate some compiler warnings Index: ib_smi.c =================================================================== --- ib_smi.c (revision 947) +++ ib_smi.c (working copy) @@ -319,7 +319,7 @@ } spin_unlock_irqrestore(&ib_smi_port_list_lock, flags); if (!port_priv) { - printk(KERN_ERR "smp_send: no matching MAD agent 0x%x\n", mad_agent); + printk(KERN_ERR "smp_send: no matching MAD agent 0x%x\n", (unsigned int)mad_agent); return; } @@ -451,7 +451,7 @@ /* Hold lock longer !!! 
*/ spin_unlock_irqrestore(&ib_smi_port_list_lock, flags); if (!port_priv) { - printk(KERN_ERR "smi_send_handler: no matching MAD agent 0x%x\n", mad_agent); + printk(KERN_ERR "smi_send_handler: no matching MAD agent 0x%x\n", (unsigned int)mad_agent); return; } @@ -537,9 +537,13 @@ reg_req.mgmt_class = IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE; reg_req.mgmt_class_version = 1; - /* All methods for now even though only some are used BY SMA !!! */ - bitmap_fill(®_req.method_mask, IB_MGMT_MAX_METHODS); + /* SMA needs to receive Get, Set, and TrapRepress methods */ + bitmap_zero((unsigned long *)®_req.method_mask, IB_MGMT_MAX_METHODS); + set_bit(IB_MGMT_METHOD_GET, (unsigned long *)®_req.method_mask); + set_bit(IB_MGMT_METHOD_SET, (unsigned long *)®_req.method_mask); + set_bit(IB_MGMT_METHOD_TRAP_REPRESS, (unsigned long *)®_req.method_mask); + port_priv->mad_agent = ib_register_mad_agent(device, port_num, IB_QPT_SMI, ®_req, 0, From mshefty at ichips.intel.com Thu Oct 7 11:49:43 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 7 Oct 2004 11:49:43 -0700 Subject: [openib-general] [PATCH] reformat code to within 80 columns Message-ID: <20041007114943.1951b91a.mshefty@ichips.intel.com> Only purpose of patch is to reformat code to keep it within 80 columns. The resulting code highlights some areas where we may want to look at restructing it. - Sean Index: access/ib_mad.c =================================================================== --- access/ib_mad.c (revision 949) +++ access/ib_mad.c (working copy) @@ -122,7 +122,7 @@ } if (rmpp_version) { - ret = ERR_PTR(-EINVAL); /* for now!!! (until RMPP implemented) */ + ret = ERR_PTR(-EINVAL); /* until RMPP implemented!!! */ goto error1; } @@ -133,8 +133,12 @@ goto error1; } if (mad_reg_req->mgmt_class >= MAX_MGMT_CLASS) { - /* IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE is the only one currently allowed */ - if (mad_reg_req->mgmt_class != IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) { + /* + * IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE is the only + * one currently allowed + */ + if (mad_reg_req->mgmt_class != + IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) { ret = ERR_PTR(-EINVAL); goto error1; } @@ -188,12 +192,13 @@ if (mad_reg_req) { class = port_priv->version[mad_reg_req->mgmt_class_version]; if (class) { - mgmt_class = convert_mgmt_class(mad_reg_req->mgmt_class); + mgmt_class = convert_mgmt_class( + mad_reg_req->mgmt_class); method = class->method_table[mgmt_class]; if (method) { if (method_in_use(&method, mad_reg_req)) { - spin_unlock_irqrestore(&port_priv->reg_lock, flags); - + spin_unlock_irqrestore( + &port_priv->reg_lock, flags); ret = ERR_PTR(-EINVAL); goto error2; } @@ -340,7 +345,8 @@ GFP_ATOMIC : GFP_KERNEL); if (!mad_send_wr) { *bad_send_wr = cur_send_wr; - printk(KERN_ERR "No memory for ib_mad_send_wr_private\n"); + printk(KERN_ERR "No memory for " + "ib_mad_send_wr_private\n"); return -ENOMEM; } @@ -396,7 +402,8 @@ struct ib_mad_private_header *mad_priv_hdr; struct ib_mad_private *priv; - mad_priv_hdr = container_of(mad_recv_wc, struct ib_mad_private_header, recv_wc); + mad_priv_hdr = container_of(mad_recv_wc, struct ib_mad_private_header, + recv_wc); priv = container_of(mad_priv_hdr, struct ib_mad_private, header); /* @@ -406,8 +413,10 @@ list_for_each_entry(entry, &mad_recv_wc->recv_buf->list, list) { /* Free previous receive buffer */ kmem_cache_free(ib_mad_cache, priv); - mad_priv_hdr = container_of(entry, struct ib_mad_private_header, recv_buf); - priv = container_of(mad_priv_hdr, struct ib_mad_private, header); + mad_priv_hdr = container_of(entry, struct 
ib_mad_private_header, + recv_buf); + priv = container_of(mad_priv_hdr, struct ib_mad_private, + header); } /* Free last buffer */ @@ -454,7 +463,8 @@ for (i = find_first_bit(mad_reg_req->method_mask, IB_MGMT_MAX_METHODS); i < IB_MGMT_MAX_METHODS; - i = find_next_bit(mad_reg_req->method_mask, IB_MGMT_MAX_METHODS, 1+i)) { + i = find_next_bit(mad_reg_req->method_mask, IB_MGMT_MAX_METHODS, + 1+i)) { if ((*method)->agent[i]) { printk(KERN_ERR "Method %d already in use\n", i); return -EINVAL; @@ -494,7 +504,10 @@ { int i, j; - /* Check to see if there are any method tables for this class still in use */ + /* + * Check to see if there are any method tables for this class still + * in use + */ j = 0; for (i = 0; i < MAX_MGMT_CLASS; i++) { if (class->method_table[i]) { @@ -538,7 +551,8 @@ /* Allocate management class table for "new" class version */ *class = kmalloc(sizeof **class, GFP_KERNEL); if (!*class) { - printk(KERN_ERR "No memory for ib_mad_mgmt_class_table\n"); + printk(KERN_ERR "No memory for " + "ib_mad_mgmt_class_table\n"); goto error1; } /* Clear management class table for this class version */ @@ -568,7 +582,8 @@ /* Finally, add in methods being registered */ for (i = find_first_bit(mad_reg_req->method_mask, IB_MGMT_MAX_METHODS); i < IB_MGMT_MAX_METHODS; - i = find_next_bit(mad_reg_req->method_mask, IB_MGMT_MAX_METHODS, 1+i)) { + i = find_next_bit(mad_reg_req->method_mask, IB_MGMT_MAX_METHODS, + 1+i)) { (*method)->agent[i] = priv; } return 0; @@ -608,7 +623,8 @@ port_priv = agent_priv->port_priv; class = port_priv->version[agent_priv->reg_req->mgmt_class_version]; if (!class) { - printk(KERN_ERR "No class table yet MAD registration request supplied\n"); + printk(KERN_ERR "No class table yet MAD registration request " + "supplied\n"); goto ret; } @@ -626,7 +642,8 @@ if (!check_class_table(class)) { /* If not, release management class table */ kfree(class); - port_priv->version[agent_priv->reg_req->mgmt_class_version] = NULL; + port_priv->version[agent_priv->reg_req-> + mgmt_class_version]= NULL; } } } @@ -670,9 +687,10 @@ return response_mad(mad); } -static struct ib_mad_agent_private *find_mad_agent(struct ib_mad_port_private *port_priv, - struct ib_mad *mad, - int solicited) +static struct ib_mad_agent_private * +find_mad_agent(struct ib_mad_port_private *port_priv, + struct ib_mad *mad, + int solicited) { struct ib_mad_agent_private *entry, *mad_agent = NULL; struct ib_mad_mgmt_class_table *version; @@ -690,28 +708,35 @@ } } if (!mad_agent) { - printk(KERN_ERR "No client 0x%x for received MAD on port %d\n", - hi_tid, port_priv->port_num); + printk(KERN_ERR "No client 0x%x for received MAD on " + "port %d\n", hi_tid, port_priv->port_num); goto ret; } } else { /* Routing is based on version, class, and method */ if (mad->mad_hdr.class_version >= MAX_MGMT_VERSION) { - printk(KERN_ERR "MAD received with unsupported class version %d on port %d\n", + printk(KERN_ERR "MAD received with unsupported class " + "version %d on port %d\n", mad->mad_hdr.class_version, port_priv->port_num); goto ret; } version = port_priv->version[mad->mad_hdr.class_version]; if (!version) { - printk(KERN_ERR "MAD received on port %d for class version %d with no client\n", port_priv->port_num, mad->mad_hdr.class_version); + printk(KERN_ERR "MAD received on port %d for class " + "version %d with no client\n", + port_priv->port_num, mad->mad_hdr.class_version); goto ret; } - class = version->method_table[convert_mgmt_class(mad->mad_hdr.mgmt_class)]; + class = version->method_table[convert_mgmt_class( + 
mad->mad_hdr.mgmt_class)]; if (!class) { - printk(KERN_ERR "MAD received on port %d for class %d with no client\n", port_priv->port_num, mad->mad_hdr.mgmt_class); + printk(KERN_ERR "MAD received on port %d for class " + "%d with no client\n", + port_priv->port_num, mad->mad_hdr.mgmt_class); goto ret; } - mad_agent = class->agent[mad->mad_hdr.method & ~IB_MGMT_METHOD_RESP]; + mad_agent = class->agent[mad->mad_hdr.method & + ~IB_MGMT_METHOD_RESP]; } ret: @@ -724,8 +749,8 @@ /* Make sure MAD base version is understood */ if (mad->mad_hdr.base_version != IB_MGMT_BASE_VERSION) { - printk(KERN_ERR "MAD received with unsupported base version %d\n", - mad->mad_hdr.base_version); + printk(KERN_ERR "MAD received with unsupported base " + "version %d\n", mad->mad_hdr.base_version); goto ret; } @@ -747,8 +772,9 @@ /* * Return start of fully reassembled MAD, or NULL, if MAD isn't assembled yet */ -static struct ib_mad_private* reassemble_recv(struct ib_mad_agent_private *mad_agent_priv, - struct ib_mad_private *recv) +static struct ib_mad_private * +reassemble_recv(struct ib_mad_agent_private *mad_agent_priv, + struct ib_mad_private *recv) { /* Until we have RMPP, all receives are reassembled!... */ INIT_LIST_HEAD(&recv->header.recv_buf.list); @@ -854,15 +880,16 @@ rbuf = (struct ib_mad_recv_buf *)rbuf->list.next; mad_priv_hdr = container_of(rbuf, struct ib_mad_private_header, recv_buf); - recv = container_of(mad_priv_hdr, struct ib_mad_private, header); + recv = container_of(mad_priv_hdr, struct ib_mad_private, + header); /* Remove from posted receive MAD list */ list_del(&recv->header.recv_buf.list); port_priv->recv_posted_mad_count[qpn]--; } else { - printk(KERN_ERR "Receive completion WR ID 0x%Lx on QP %d with no" - "posted receive\n", wc->wr_id, qp_num); + printk(KERN_ERR "Receive completion WR ID 0x%Lx on QP %d " + "with no posted receive\n", wc->wr_id, qp_num); spin_unlock_irqrestore(&port_priv->recv_list_lock, flags); ib_mad_post_receive_mad(port_priv, port_priv->qp[qp_num]); return; @@ -893,7 +920,8 @@ solicited); if (!mad_agent) { spin_unlock_irqrestore(&port_priv->reg_lock, flags); - printk(KERN_NOTICE "No matching mad agent found for received MAD on port %d\n", port_priv->port_num); + printk(KERN_NOTICE "No matching mad agent found for received " + "MAD on port %d\n", port_priv->port_num); } else { atomic_inc(&mad_agent->refcount); spin_unlock_irqrestore(&port_priv->reg_lock, flags); @@ -978,7 +1006,8 @@ struct ib_mad_send_wr_private, send_list); send_wr = mad_send_wr->send_list.next; - mad_send_wr = container_of(send_wr, struct ib_mad_send_wr_private, send_list); + mad_send_wr = container_of(send_wr, struct ib_mad_send_wr_private, + send_list); if (wc->wr_id != (unsigned long)mad_send_wr) { printk(KERN_ERR "Send completion WR ID 0x%Lx doesn't match " "posted send WR ID 0x%lx\n", @@ -994,7 +1023,6 @@ /* Restore client wr_id in WC */ wc->wr_id = mad_send_wr->wr_id; - ib_mad_complete_send_wr(mad_send_wr, (struct ib_mad_send_wc*)wc); return; @@ -1012,7 +1040,8 @@ ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP); while (ib_poll_cq(port_priv->cq, 1, &wc) == 1) { - printk(KERN_DEBUG "Completion opcode 0x%x WRID 0x%Lx\n", wc.opcode, wc.wr_id); + printk(KERN_DEBUG "Completion opcode 0x%x WRID 0x%Lx\n", + wc.opcode, wc.wr_id); switch (wc.opcode) { case IB_WC_SEND: if (wc.status != IB_WC_SUCCESS) @@ -1027,10 +1056,11 @@ ib_mad_recv_done_handler(port_priv, &wc); break; default: - printk(KERN_ERR "Wrong Opcode 0x%x on completion\n", wc.opcode); + printk(KERN_ERR "Wrong Opcode 0x%x on completion\n", + 
wc.opcode); if (wc.status) { - printk(KERN_ERR "Completion error %d\n", wc.status); - + printk(KERN_ERR "Completion error %d\n", + wc.status); } } } @@ -1235,7 +1265,8 @@ /* Setup scatter list */ sg_list.addr = pci_map_single(port_priv->device->dma_device, &mad_priv->grh, - sizeof *mad_priv - sizeof mad_priv->header, + sizeof *mad_priv - + sizeof mad_priv->header, PCI_DMA_FROMDEVICE); sg_list.length = sizeof *mad_priv - sizeof mad_priv->header; sg_list.lkey = (*port_priv->mr).lkey; @@ -1274,7 +1305,8 @@ spin_unlock_irqrestore(&port_priv->recv_list_lock, flags); kmem_cache_free(ib_mad_cache, mad_priv); - printk(KERN_NOTICE "ib_post_recv WRID 0x%Lx failed ret = %d\n", recv_wr.wr_id, ret); + printk(KERN_NOTICE "ib_post_recv WRID 0x%Lx failed ret = %d\n", + recv_wr.wr_id, ret); return -EINVAL; } @@ -1292,8 +1324,9 @@ for (j = 0; j < IB_MAD_QPS_CORE; j++) { if (ib_mad_post_receive_mad(port_priv, port_priv->qp[j])) { - printk(KERN_ERR "receive post %d failed on %s port %d\n", - i + 1, port_priv->device->name, + printk(KERN_ERR "receive post %d failed on %s " + "port %d\n", i + 1, + port_priv->device->name, port_priv->port_num); } } @@ -1337,7 +1370,6 @@ PCI_DMA_FROMDEVICE); kmem_cache_free(ib_mad_cache, recv); - } INIT_LIST_HEAD(&port_priv->recv_posted_mad_list[i]); @@ -1485,7 +1517,8 @@ for (i = 0; i < IB_MAD_QPS_CORE; i++) { ret = ib_mad_change_qp_state_to_init(port_priv->qp[i]); if (ret) { - printk(KERN_ERR "Couldn't change QP%d state to INIT\n", i); + printk(KERN_ERR "Couldn't change QP%d state to " + "INIT\n", i); return ret; } } @@ -1505,13 +1538,15 @@ for (i = 0; i < IB_MAD_QPS_CORE; i++) { ret = ib_mad_change_qp_state_to_rtr(port_priv->qp[i]); if (ret) { - printk(KERN_ERR "Couldn't change QP%d state to RTR\n", i); + printk(KERN_ERR "Couldn't change QP%d state to " + "RTR\n", i); goto error; } ret = ib_mad_change_qp_state_to_rts(port_priv->qp[i]); if (ret) { - printk(KERN_ERR "Couldn't change QP%d state to RTS\n", i); + printk(KERN_ERR "Couldn't change QP%d state to " + "RTS\n", i); goto error; } } @@ -1522,7 +1557,8 @@ for (i = 0; i < IB_MAD_QPS_CORE; i++) { ret2 = ib_mad_change_qp_state_to_reset(port_priv->qp[i]); if (ret2) { - printk(KERN_ERR "ib_mad_port_start: Couldn't change QP%d state to RESET\n", i); + printk(KERN_ERR "ib_mad_port_start: Couldn't change " + "QP%d state to RESET\n", i); } } @@ -1539,7 +1575,8 @@ for (i = 0; i < IB_MAD_QPS_CORE; i++) { ret = ib_mad_change_qp_state_to_reset(port_priv->qp[i]); if (ret) { - printk(KERN_ERR "ib_mad_port_stop: Couldn't change %s port %d QP%d state to RESET\n", + printk(KERN_ERR "ib_mad_port_stop: Couldn't change %s " + "port %d QP%d state to RESET\n", port_priv->device->name, port_priv->port_num, i); } } @@ -1597,7 +1634,8 @@ cq_size = (IB_MAD_QP_SEND_SIZE + IB_MAD_QP_RECV_SIZE) * 2; port_priv->cq = ib_create_cq(port_priv->device, - (ib_comp_handler) ib_mad_thread_completion_handler, + (ib_comp_handler) + ib_mad_thread_completion_handler, NULL, port_priv, cq_size); if (IS_ERR(port_priv->cq)) { printk(KERN_ERR "Couldn't create ib_mad CQ\n"); -- From mshefty at ichips.intel.com Thu Oct 7 11:55:22 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 7 Oct 2004 11:55:22 -0700 Subject: [openib-general] [PATCH] reformat code to within 80 columns In-Reply-To: <20041007114943.1951b91a.mshefty@ichips.intel.com> References: <20041007114943.1951b91a.mshefty@ichips.intel.com> Message-ID: <20041007115522.72df3999.mshefty@ichips.intel.com> On Thu, 7 Oct 2004 11:49:43 -0700 Sean Hefty wrote: > Only purpose of patch is to reformat code to keep 
it within 80 columns. The resulting code highlights some areas where we may want to look at restructing it. > > - Sean Same purpose - different file... SMI code, which was not in last patch. Index: access/ib_smi.c =================================================================== --- access/ib_smi.c (revision 953) +++ access/ib_smi.c (working copy) @@ -319,7 +319,8 @@ } spin_unlock_irqrestore(&ib_smi_port_list_lock, flags); if (!port_priv) { - printk(KERN_ERR "smp_send: no matching MAD agent 0x%x\n", (unsigned int)mad_agent); + printk(KERN_ERR "smp_send: no matching MAD agent 0x%x\n", + (unsigned int)mad_agent); return; } @@ -424,7 +425,8 @@ } if (smi_check_forward_smp(mad_agent, smp)) { - smi_send_smp(mad_agent, smp, mad_recv_wc, mad_recv_wc->wc->slid); + smi_send_smp(mad_agent, smp, mad_recv_wc, + mad_recv_wc->wc->slid); return 0; } @@ -451,7 +453,8 @@ /* Hold lock longer !!! */ spin_unlock_irqrestore(&ib_smi_port_list_lock, flags); if (!port_priv) { - printk(KERN_ERR "smi_send_handler: no matching MAD agent 0x%x\n", (unsigned int)mad_agent); + printk(KERN_ERR "smi_send_handler: no matching MAD agent " + "0x%x\n", (unsigned int)mad_agent); return; } @@ -510,7 +513,8 @@ /* First, check if port already open for SMI */ spin_lock_irqsave(&ib_smi_port_list_lock, flags); list_for_each_entry(entry, &ib_smi_port_list, port_list) { - if (entry->mad_agent->device == device && entry->port_num == port_num) { + if (entry->mad_agent->device == device && + entry->port_num == port_num) { port_priv = entry; break; } @@ -542,7 +546,8 @@ bitmap_zero((unsigned long *)®_req.method_mask, IB_MGMT_MAX_METHODS); set_bit(IB_MGMT_METHOD_GET, (unsigned long *)®_req.method_mask); set_bit(IB_MGMT_METHOD_SET, (unsigned long *)®_req.method_mask); - set_bit(IB_MGMT_METHOD_TRAP_REPRESS, (unsigned long *)®_req.method_mask); + set_bit(IB_MGMT_METHOD_TRAP_REPRESS, + (unsigned long *)®_req.method_mask); port_priv->mad_agent = ib_register_mad_agent(device, port_num, IB_QPT_SMI, @@ -582,7 +587,8 @@ spin_lock_irqsave(&ib_smi_port_list_lock, flags); list_for_each_entry(entry, &ib_smi_port_list, port_list) { - if (entry->mad_agent->device == device && entry->port_num == port_num) { + if (entry->mad_agent->device == device && + entry->port_num == port_num) { port_priv = entry; break; } From halr at voltaire.com Thu Oct 7 12:18:14 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 07 Oct 2004 15:18:14 -0400 Subject: [openib-general] Re: [PATCH] reformat code to within 80 columns In-Reply-To: <20041007114943.1951b91a.mshefty@ichips.intel.com> References: <20041007114943.1951b91a.mshefty@ichips.intel.com> Message-ID: <1097176693.2922.11.camel@hpc-1> On Thu, 2004-10-07 at 14:49, Sean Hefty wrote: > Only purpose of patch is to reformat code to keep it within 80 columns. Thanks. Applied. -- Hal From halr at voltaire.com Thu Oct 7 12:24:27 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 07 Oct 2004 15:24:27 -0400 Subject: [openib-general] [PATCH] reformat code to within 80 columns In-Reply-To: <20041007115522.72df3999.mshefty@ichips.intel.com> References: <20041007114943.1951b91a.mshefty@ichips.intel.com> <20041007115522.72df3999.mshefty@ichips.intel.com> Message-ID: <1097177067.2922.16.camel@hpc-1> On Thu, 2004-10-07 at 14:55, Sean Hefty wrote: > On Thu, 7 Oct 2004 11:49:43 -0700 > Sean Hefty wrote: > > > Only purpose of patch is to reformat code to keep it within 80 columns. The resulting code highlights some areas where we may want to look at restructing it. > > > > - Sean > > Same purpose - different file... 
SMI code, which was not in last patch. Thanks. Applied. -- Hal From mshefty at ichips.intel.com Thu Oct 7 12:42:09 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 7 Oct 2004 12:42:09 -0700 Subject: [openib-general] [PATCH] rename structure members Message-ID: <20041007124209.720bc6d1.mshefty@ichips.intel.com> Here's a patch that just renames a few structure members related to MADs. The renamed variables will be used when handling MAD timeouts. - Sean -- Index: access/ib_mad_priv.h =================================================================== --- access/ib_mad_priv.h (revision 955) +++ access/ib_mad_priv.h (working copy) @@ -106,7 +106,7 @@ struct ib_mad_reg_req *reg_req; struct ib_mad_port_private *port_priv; - spinlock_t send_list_lock; + spinlock_t lock; struct list_head send_list; atomic_t refcount; @@ -116,11 +116,11 @@ struct ib_mad_send_wr_private { struct list_head send_list; - struct list_head agent_send_list; + struct list_head agent_list; struct ib_mad_agent *agent; u64 wr_id; /* client WR ID */ u64 tid; - int timeout_ms; + int timeout; int refcount; enum ib_wc_status status; }; Index: access/ib_mad.c =================================================================== --- access/ib_mad.c (revision 955) +++ access/ib_mad.c (working copy) @@ -230,7 +230,7 @@ list_add_tail(&mad_agent_priv->agent_list, &port_priv->agent_list); spin_unlock_irqrestore(&port_priv->reg_lock, flags); - spin_lock_init(&mad_agent_priv->send_list_lock); + spin_lock_init(&mad_agent_priv->lock); INIT_LIST_HEAD(&mad_agent_priv->send_list); atomic_set(&mad_agent_priv->refcount, 1); init_waitqueue_head(&mad_agent_priv->wait); @@ -352,8 +352,8 @@ mad_send_wr->tid = send_wr->wr.ud.mad_hdr->tid; mad_send_wr->agent = mad_agent; - mad_send_wr->timeout_ms = cur_send_wr->wr.ud.timeout_ms; - if (mad_send_wr->timeout_ms) + mad_send_wr->timeout = cur_send_wr->wr.ud.timeout_ms; + if (mad_send_wr->timeout) mad_send_wr->refcount = 2; else mad_send_wr->refcount = 1; @@ -361,10 +361,10 @@ /* Reference MAD agent until send completes */ atomic_inc(&mad_agent_priv->refcount); - spin_lock_irqsave(&mad_agent_priv->send_list_lock, flags); - list_add_tail(&mad_send_wr->agent_send_list, + spin_lock_irqsave(&mad_agent_priv->lock, flags); + list_add_tail(&mad_send_wr->agent_list, &mad_agent_priv->send_list); - spin_unlock_irqrestore(&mad_agent_priv->send_list_lock, flags); + spin_unlock_irqrestore(&mad_agent_priv->lock, flags); cur_send_wr->next = NULL; ret = ib_send_mad(mad_agent_priv, mad_send_wr, @@ -373,11 +373,9 @@ /* Handle QP overrun separately... 
-ENOMEM */ /* Fail send request */ - spin_lock_irqsave(&mad_agent_priv->send_list_lock, - flags); - list_del(&mad_send_wr->agent_send_list); - spin_unlock_irqrestore(&mad_agent_priv->send_list_lock, - flags); + spin_lock_irqsave(&mad_agent_priv->lock, flags); + list_del(&mad_send_wr->agent_list); + spin_unlock_irqrestore(&mad_agent_priv->lock, flags); *bad_send_wr = cur_send_wr; if (atomic_dec_and_test(&mad_agent_priv->refcount)) @@ -788,12 +786,12 @@ struct ib_mad_send_wr_private *mad_send_wr; list_for_each_entry(mad_send_wr, &mad_agent_priv->send_list, - agent_send_list) { + agent_list) { if (mad_send_wr->tid == tid) { /* Verify request is still valid */ if (mad_send_wr->status == IB_WC_SUCCESS && - mad_send_wr->timeout_ms) + mad_send_wr->timeout) return mad_send_wr; else return NULL; @@ -817,17 +815,16 @@ /* Complete corresponding request */ if (solicited) { - spin_lock_irqsave(&mad_agent_priv->send_list_lock, flags); + spin_lock_irqsave(&mad_agent_priv->lock, flags); mad_send_wr = find_send_req(mad_agent_priv, recv->mad.mad.mad_hdr.tid); if (!mad_send_wr) { - spin_unlock_irqrestore(&mad_agent_priv->send_list_lock, - flags); + spin_unlock_irqrestore(&mad_agent_priv->lock, flags); ib_free_recv_mad(&recv->header.recv_wc); return; } - mad_send_wr->timeout_ms = 0; - spin_unlock_irqrestore(&mad_agent_priv->send_list_lock, flags); + mad_send_wr->timeout = 0; + spin_unlock_irqrestore(&mad_agent_priv->lock, flags); /* Defined behavior is to complete response before request */ mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent, @@ -951,13 +948,13 @@ mad_agent_priv = container_of(mad_send_wr->agent, struct ib_mad_agent_private, agent); - spin_lock_irqsave(&mad_agent_priv->send_list_lock, flags); + spin_lock_irqsave(&mad_agent_priv->lock, flags); if (mad_send_wc->status != IB_WC_SUCCESS && mad_send_wr->status == IB_WC_SUCCESS) { mad_send_wr->status = mad_send_wc->status; - if (mad_send_wr->timeout_ms) { - mad_send_wr->timeout_ms = 0; + if (mad_send_wr->timeout) { + mad_send_wr->timeout = 0; mad_send_wr->refcount--; } } @@ -968,13 +965,13 @@ * or timeout occurs */ if (--mad_send_wr->refcount > 0) { - spin_unlock_irqrestore(&mad_agent_priv->send_list_lock, flags); + spin_unlock_irqrestore(&mad_agent_priv->lock, flags); return; } /* Remove send from MAD agent and notify client of completion */ - list_del(&mad_send_wr->agent_send_list); - spin_unlock_irqrestore(&mad_agent_priv->send_list_lock, flags); + list_del(&mad_send_wr->agent_list); + spin_unlock_irqrestore(&mad_agent_priv->lock, flags); if (mad_send_wr->status != IB_WC_SUCCESS ) mad_send_wc->status = mad_send_wr->status; @@ -1075,38 +1072,38 @@ INIT_LIST_HEAD(&cancel_list); - spin_lock_irqsave(&mad_agent_priv->send_list_lock, flags); + spin_lock_irqsave(&mad_agent_priv->lock, flags); list_for_each_entry_safe(mad_send_wr, temp_mad_send_wr, - &mad_agent_priv->send_list, agent_send_list) { + &mad_agent_priv->send_list, agent_list) { if (mad_send_wr->status == IB_WC_SUCCESS) mad_send_wr->status = IB_WC_WR_FLUSH_ERR; - if (mad_send_wr->timeout_ms) { - mad_send_wr->timeout_ms = 0; + if (mad_send_wr->timeout) { + mad_send_wr->timeout = 0; mad_send_wr->refcount--; } if (mad_send_wr->refcount == 0) { - list_del(&mad_send_wr->agent_send_list); - list_add_tail(&mad_send_wr->agent_send_list, + list_del(&mad_send_wr->agent_list); + list_add_tail(&mad_send_wr->agent_list, &cancel_list); } } - spin_unlock_irqrestore(&mad_agent_priv->send_list_lock, flags); + spin_unlock_irqrestore(&mad_agent_priv->lock, flags); /* Report all cancelled requests */ 
mad_send_wc.status = IB_WC_WR_FLUSH_ERR; mad_send_wc.vendor_err = 0; list_for_each_entry_safe(mad_send_wr, temp_mad_send_wr, - &cancel_list, agent_send_list) { + &cancel_list, agent_list) { mad_send_wc.wr_id = mad_send_wr->wr_id; mad_agent_priv->agent.send_handler(&mad_agent_priv->agent, &mad_send_wc); - list_del(&mad_send_wr->agent_send_list); + list_del(&mad_send_wr->agent_list); kfree(mad_send_wr); atomic_dec(&mad_agent_priv->refcount); @@ -1120,7 +1117,7 @@ struct ib_mad_send_wr_private *mad_send_wr; list_for_each_entry(mad_send_wr, &mad_agent_priv->send_list, - agent_send_list) { + agent_list) { if (mad_send_wr->wr_id == wr_id) return mad_send_wr; } @@ -1137,28 +1134,28 @@ mad_agent_priv = container_of(mad_agent, struct ib_mad_agent_private, agent); - spin_lock_irqsave(&mad_agent_priv->send_list_lock, flags); + spin_lock_irqsave(&mad_agent_priv->lock, flags); mad_send_wr = find_send_by_wr_id(mad_agent_priv, wr_id); if (!mad_send_wr) { - spin_unlock_irqrestore(&mad_agent_priv->send_list_lock, flags); + spin_unlock_irqrestore(&mad_agent_priv->lock, flags); goto ret; } if (mad_send_wr->status == IB_WC_SUCCESS) mad_send_wr->status = IB_WC_WR_FLUSH_ERR; - if (mad_send_wr->timeout_ms) { - mad_send_wr->timeout_ms = 0; + if (mad_send_wr->timeout) { + mad_send_wr->timeout = 0; mad_send_wr->refcount--; } if (mad_send_wr->refcount != 0) { - spin_unlock_irqrestore(&mad_agent_priv->send_list_lock, flags); + spin_unlock_irqrestore(&mad_agent_priv->lock, flags); goto ret; } - list_del(&mad_send_wr->agent_send_list); - spin_unlock_irqrestore(&mad_agent_priv->send_list_lock, flags); + list_del(&mad_send_wr->agent_list); + spin_unlock_irqrestore(&mad_agent_priv->lock, flags); mad_send_wc.status = IB_WC_WR_FLUSH_ERR; mad_send_wc.vendor_err = 0; From halr at voltaire.com Thu Oct 7 14:07:16 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 07 Oct 2004 17:07:16 -0400 Subject: [openib-general] Re: [PATCH] rename structure members In-Reply-To: <20041007124209.720bc6d1.mshefty@ichips.intel.com> References: <20041007124209.720bc6d1.mshefty@ichips.intel.com> Message-ID: <1097183236.2922.69.camel@hpc-1> On Thu, 2004-10-07 at 15:42, Sean Hefty wrote: > Here's a patch that just renames a few structure members related to MADs. > The renamed variables will be used when handling MAD timeouts. Thanks. Applied. -- Hal From mshefty at ichips.intel.com Thu Oct 7 14:33:21 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 7 Oct 2004 14:33:21 -0700 Subject: [openib-general] [PATCH] stored MAD timeout usage changes Message-ID: <20041007143321.13be9029.mshefty@ichips.intel.com> This patch converts stored MAD timeouts from ms to jiffies and changes from keying off a non-zero timeout to using the MAD status value when handling errors and canceling MADs. - Sean Index: access/ib_mad_priv.h =================================================================== --- access/ib_mad_priv.h (revision 956) +++ access/ib_mad_priv.h (working copy) @@ -120,7 +120,7 @@ struct ib_mad_agent *agent; u64 wr_id; /* client WR ID */ u64 tid; - int timeout; + unsigned long timeout; int refcount; enum ib_wc_status status; }; Index: access/ib_mad.c =================================================================== --- access/ib_mad.c (revision 956) +++ access/ib_mad.c (working copy) @@ -352,11 +352,10 @@ mad_send_wr->tid = send_wr->wr.ud.mad_hdr->tid; mad_send_wr->agent = mad_agent; + /* Timeout will be updated after send completes. 
*/ mad_send_wr->timeout = cur_send_wr->wr.ud.timeout_ms; - if (mad_send_wr->timeout) - mad_send_wr->refcount = 2; - else - mad_send_wr->refcount = 1; + /* One reference for each work request to QP + response. */ + mad_send_wr->refcount = 1 + (mad_send_wr->timeout > 0); mad_send_wr->status = IB_WC_SUCCESS; /* Reference MAD agent until send completes */ @@ -787,14 +786,10 @@ list_for_each_entry(mad_send_wr, &mad_agent_priv->send_list, agent_list) { - - if (mad_send_wr->tid == tid) { - /* Verify request is still valid */ - if (mad_send_wr->status == IB_WC_SUCCESS && - mad_send_wr->timeout) - return mad_send_wr; - else - return NULL; + if (mad_send_wr->tid == tid && mad_send_wr->timeout) { + /* Verify request has not been canceled. */ + return (mad_send_wr->status == IB_WC_SUCCESS) ? + mad_send_wr : NULL; } } return NULL; @@ -823,6 +818,7 @@ ib_free_recv_mad(&recv->header.recv_wc); return; } + /* Timeout = 0 means that we won't wait for a response. */ mad_send_wr->timeout = 0; spin_unlock_irqrestore(&mad_agent_priv->lock, flags); @@ -951,12 +947,8 @@ spin_lock_irqsave(&mad_agent_priv->lock, flags); if (mad_send_wc->status != IB_WC_SUCCESS && mad_send_wr->status == IB_WC_SUCCESS) { - mad_send_wr->status = mad_send_wc->status; - if (mad_send_wr->timeout) { - mad_send_wr->timeout = 0; - mad_send_wr->refcount--; - } + mad_send_wr->refcount -= (mad_send_wr->timeout > 0); } /* @@ -1077,17 +1069,14 @@ &mad_agent_priv->send_list, agent_list) { if (mad_send_wr->status == IB_WC_SUCCESS) - mad_send_wr->status = IB_WC_WR_FLUSH_ERR; - - if (mad_send_wr->timeout) { - mad_send_wr->timeout = 0; - mad_send_wr->refcount--; - } + mad_send_wr->refcount -= (mad_send_wr->timeout > 0); if (mad_send_wr->refcount == 0) { list_del(&mad_send_wr->agent_list); list_add_tail(&mad_send_wr->agent_list, &cancel_list); + } else { + mad_send_wr->status = IB_WC_WR_FLUSH_ERR; } } spin_unlock_irqrestore(&mad_agent_priv->lock, flags); @@ -1142,14 +1131,10 @@ } if (mad_send_wr->status == IB_WC_SUCCESS) - mad_send_wr->status = IB_WC_WR_FLUSH_ERR; - - if (mad_send_wr->timeout) { - mad_send_wr->timeout = 0; - mad_send_wr->refcount--; - } + mad_send_wr->refcount -= (mad_send_wr->timeout > 0); if (mad_send_wr->refcount != 0) { + mad_send_wr->status = IB_WC_WR_FLUSH_ERR; spin_unlock_irqrestore(&mad_agent_priv->lock, flags); goto ret; } From halr at voltaire.com Thu Oct 7 14:54:37 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 07 Oct 2004 17:54:37 -0400 Subject: [openib-general] [PATCH] ib_smi: Support LID routed Gets and Sets in SMA Message-ID: <1097186077.2922.89.camel@hpc-1> ib_smi: Support LID routed Gets and Sets in SMA Index: ib_smi.c =================================================================== --- ib_smi.c (revision 955) +++ ib_smi.c (working copy) @@ -125,7 +125,7 @@ case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE: return smi_handle_dr_smp_send(mad_agent, smp); default: - return 0; /* write me... */ + return 1; } } @@ -137,9 +137,10 @@ { /* C14-9:3 -- We're at the end of the DR segment of path */ /* C14-9:4 -- Hop Pointer = Hop Count + 1 -> give to SMA/SM. */ - return (mad_agent->device->process_mad && + return ((smp->mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED) || + (mad_agent->device->process_mad && !ib_get_smp_direction(smp) && - (smp->hop_ptr == smp->hop_cnt + 1)); + (smp->hop_ptr == smp->hop_cnt + 1))); } /* @@ -230,7 +231,7 @@ case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE: return smi_handle_dr_smp_recv(mad_agent, smp); default: - return 0; /* write me... 
*/ + return 1; } } @@ -282,7 +283,7 @@ case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE: return smi_check_forward_dr_smp(mad_agent, smp); default: - return 0; /* write me... */ + return 1; } } @@ -312,7 +313,8 @@ /* Find matching MAD agent */ spin_lock_irqsave(&ib_smi_port_list_lock, flags); list_for_each_entry(entry, &ib_smi_port_list, port_list) { - if (entry->mad_agent == mad_agent) { + if ((entry->mad_agent == mad_agent) || + (entry->mad_agent2 == mad_agent)) { port_priv = entry; break; } @@ -403,8 +405,11 @@ ret = smi_process_local(mad_agent, (struct ib_mad *)smp, smp_response, slid); if (ret & IB_MAD_RESULT_SUCCESS) { - /* Workaround !!! */ - ((struct ib_smp *)smp_response)->hop_ptr--; + if (smp_response->mad_hdr.mgmt_class == + IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) { + /* Workaround !!! */ + ((struct ib_smp *)smp_response)->hop_ptr--; + } smp_send(mad_agent, smp_response, mad_recv_wc); } else kfree(smp_response); @@ -445,7 +450,8 @@ /* Find matching MAD agent */ spin_lock_irqsave(&ib_smi_port_list_lock, flags); list_for_each_entry(entry, &ib_smi_port_list, port_list) { - if (entry->mad_agent == mad_agent) { + if ((entry->mad_agent == mad_agent) || + (entry->mad_agent2 == mad_agent)) { port_priv = entry; break; } @@ -539,6 +545,7 @@ spin_lock_init(&port_priv->send_list_lock); INIT_LIST_HEAD(&port_priv->send_posted_smp_list); + /* Obtain MAD agent for directed route SM class */ reg_req.mgmt_class = IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE; reg_req.mgmt_class_version = 1; @@ -556,17 +563,32 @@ &smi_recv_handler, NULL); if (IS_ERR(port_priv->mad_agent)) { - port_priv->mad_agent = NULL; ret = PTR_ERR(port_priv->mad_agent); kfree(port_priv); return ret; } + /* Obtain MAD agent for LID routed SM class */ + reg_req.mgmt_class = IB_MGMT_CLASS_SUBN_LID_ROUTED; + port_priv->mad_agent2 = ib_register_mad_agent(device, port_num, + IB_QPT_SMI, + ®_req, 0, + &smi_send_handler, + &smi_recv_handler, + NULL); + if (IS_ERR(port_priv->mad_agent2)) { + ret = PTR_ERR(port_priv->mad_agent2); + ib_unregister_mad_agent(port_priv->mad_agent); + kfree(port_priv); + return ret; + } + port_priv->mr = ib_reg_phys_mr(port_priv->mad_agent->qp->pd, &buf_list, 1, IB_ACCESS_LOCAL_WRITE, &iova); if (IS_ERR(port_priv->mr)) { printk(KERN_ERR "Couldn't register MR\n"); + ib_unregister_mad_agent(port_priv->mad_agent2); ib_unregister_mad_agent(port_priv->mad_agent); ret = PTR_ERR(port_priv->mr); kfree(port_priv); @@ -604,6 +626,7 @@ spin_unlock_irqrestore(&ib_smi_port_list_lock, flags); ib_dereg_mr(port_priv->mr); + ib_unregister_mad_agent(port_priv->mad_agent2); ib_unregister_mad_agent(port_priv->mad_agent); kfree(port_priv); Index: ib_smi_priv.h =================================================================== --- ib_smi_priv.h (revision 947) +++ ib_smi_priv.h (working copy) @@ -39,7 +39,8 @@ struct list_head send_posted_smp_list; spinlock_t send_list_lock; int port_num; - struct ib_mad_agent *mad_agent; + struct ib_mad_agent *mad_agent; /* DR */ + struct ib_mad_agent *mad_agent2; /* LR */ struct ib_mr *mr; u64 wr_id; }; Index: TODO =================================================================== --- TODO (revision 947) +++ TODO (working copy) @@ -5,8 +5,7 @@ Short Term Send timeout support Treat send overruns as timeouts for now (once timeout support implemented) ? -LR SMP -GMP with GRH testing +Test GMPs with GRH Revisit ib_mad.h structure packing @@ -18,7 +17,7 @@ RMPP support Redirection support PMA support -SMI support for switches +Test SMI support for switches sysfs support for MAD layer (statistics, debug support, etc.) 
Replace locking with RCU From halr at voltaire.com Thu Oct 7 15:13:42 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 07 Oct 2004 18:13:42 -0400 Subject: [openib-general] Re: [PATCH] stored MAD timeout usage changes In-Reply-To: <20041007143321.13be9029.mshefty@ichips.intel.com> References: <20041007143321.13be9029.mshefty@ichips.intel.com> Message-ID: <1097187222.2922.92.camel@hpc-1> On Thu, 2004-10-07 at 17:33, Sean Hefty wrote: > This patch converts stored MAD timeouts from ms to jiffies and > changes from keying off a non-zero timeout to using the MAD status > value when handling errors and canceling MADs. Thanks. Applied. -- Hal From mshefty at ichips.intel.com Thu Oct 7 15:21:28 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 7 Oct 2004 15:21:28 -0700 Subject: [openib-general] [PATCH] use wait_list for MADs waiting for a response Message-ID: <20041007152128.4f171812.mshefty@ichips.intel.com> This patch uses a wait_list to store MADs waiting for a response. - Sean -- Index: access/ib_mad_priv.h =================================================================== --- access/ib_mad_priv.h (revision 958) +++ access/ib_mad_priv.h (working copy) @@ -108,6 +108,7 @@ spinlock_t lock; struct list_head send_list; + struct list_head wait_list; atomic_t refcount; wait_queue_head_t wait; Index: access/ib_mad.c =================================================================== --- access/ib_mad.c (revision 958) +++ access/ib_mad.c (working copy) @@ -232,6 +232,7 @@ spin_lock_init(&mad_agent_priv->lock); INIT_LIST_HEAD(&mad_agent_priv->send_list); + INIT_LIST_HEAD(&mad_agent_priv->wait_list); atomic_set(&mad_agent_priv->refcount, 1); init_waitqueue_head(&mad_agent_priv->wait); mad_agent_priv->port_priv = port_priv; @@ -259,7 +260,12 @@ mad_agent_priv = container_of(mad_agent, struct ib_mad_agent_private, agent); - /* Cleanup pending receives for this agent !!! */ + /* Note that we could still be handling received MADs. */ + + /* + * Canceling all sends results in dropping received response MADs, + * preventing us from queuing additional work. + */ cancel_mads(mad_agent_priv); spin_lock_irqsave(&mad_agent_priv->port_priv->reg_lock, flags); @@ -267,6 +273,8 @@ list_del(&mad_agent_priv->agent_list); spin_unlock_irqrestore(&mad_agent_priv->port_priv->reg_lock, flags); + /* Cleanup pending RMPP receives for this agent !!! */ + atomic_dec(&mad_agent_priv->refcount); wait_event(mad_agent_priv->wait, !atomic_read(&mad_agent_priv->refcount)); @@ -784,6 +792,16 @@ { struct ib_mad_send_wr_private *mad_send_wr; + list_for_each_entry(mad_send_wr, &mad_agent_priv->wait_list, + agent_list) { + if (mad_send_wr->tid == tid) + return mad_send_wr; + } + + /* + * It's possible to receive the response before we've been notified + * that the send has completed. 
+ */ list_for_each_entry(mad_send_wr, &mad_agent_priv->send_list, agent_list) { if (mad_send_wr->tid == tid && mad_send_wr->timeout) { @@ -932,6 +950,28 @@ return; } +static void wait_for_response(struct ib_mad_agent_private *mad_agent_priv, + struct ib_mad_send_wr_private *mad_send_wr ) +{ + struct ib_mad_send_wr_private *temp_mad_send_wr; + struct list_head *list_item; + unsigned long delay; + + list_del(&mad_send_wr->agent_list); + + delay = mad_send_wr->timeout; + mad_send_wr->timeout += jiffies; + + list_for_each_prev(list_item, &mad_agent_priv->wait_list) { + temp_mad_send_wr = list_entry(list_item, + struct ib_mad_send_wr_private, + agent_list); + if (time_after(mad_send_wr->timeout, temp_mad_send_wr->timeout)) + break; + } + list_add(&mad_send_wr->agent_list, list_item); +} + /* * Process a send work completion. */ @@ -951,12 +991,11 @@ mad_send_wr->refcount -= (mad_send_wr->timeout > 0); } - /* - * Leave sends with timeouts on the send list - * until either matching response is received - * or timeout occurs - */ if (--mad_send_wr->refcount > 0) { + if (mad_send_wr->refcount == 1 && mad_send_wr->timeout && + mad_send_wr->status == IB_WC_SUCCESS) { + wait_for_response(mad_agent_priv, mad_send_wr); + } spin_unlock_irqrestore(&mad_agent_priv->lock, flags); return; } @@ -1067,18 +1106,14 @@ spin_lock_irqsave(&mad_agent_priv->lock, flags); list_for_each_entry_safe(mad_send_wr, temp_mad_send_wr, &mad_agent_priv->send_list, agent_list) { - - if (mad_send_wr->status == IB_WC_SUCCESS) + if (mad_send_wr->status == IB_WC_SUCCESS) { + mad_send_wr->status = IB_WC_WR_FLUSH_ERR; mad_send_wr->refcount -= (mad_send_wr->timeout > 0); - - if (mad_send_wr->refcount == 0) { - list_del(&mad_send_wr->agent_list); - list_add_tail(&mad_send_wr->agent_list, - &cancel_list); - } else { - mad_send_wr->status = IB_WC_WR_FLUSH_ERR; } } + + /* Empty wait list to prevent receives from finding a request. */ + list_splice_init(&mad_agent_priv->wait_list, &cancel_list); spin_unlock_irqrestore(&mad_agent_priv->lock, flags); /* Report all cancelled requests */ @@ -1087,14 +1122,12 @@ list_for_each_entry_safe(mad_send_wr, temp_mad_send_wr, &cancel_list, agent_list) { - mad_send_wc.wr_id = mad_send_wr->wr_id; mad_agent_priv->agent.send_handler(&mad_agent_priv->agent, &mad_send_wc); list_del(&mad_send_wr->agent_list); kfree(mad_send_wr); - atomic_dec(&mad_agent_priv->refcount); } } @@ -1105,6 +1138,12 @@ { struct ib_mad_send_wr_private *mad_send_wr; + list_for_each_entry(mad_send_wr, &mad_agent_priv->wait_list, + agent_list) { + if (mad_send_wr->wr_id == wr_id) + return mad_send_wr; + } + list_for_each_entry(mad_send_wr, &mad_agent_priv->send_list, agent_list) { if (mad_send_wr->wr_id == wr_id) From halr at voltaire.com Fri Oct 8 06:24:27 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 08 Oct 2004 09:24:27 -0400 Subject: [openib-general] Re: [PATCH] use wait_list for MADs waiting for a response In-Reply-To: <20041007152128.4f171812.mshefty@ichips.intel.com> References: <20041007152128.4f171812.mshefty@ichips.intel.com> Message-ID: <1097241864.6552.101.camel@hpc-1> On Thu, 2004-10-07 at 18:21, Sean Hefty wrote: > This patch uses a wait_list to store MADs waiting for a response. Thanks. Applied. 
-- Hal From halr at voltaire.com Fri Oct 8 06:25:12 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 08 Oct 2004 09:25:12 -0400 Subject: [openib-general] [PATCH] ib_smi: Centralized error exiting in ib_smi_port_open Message-ID: <1097241912.6552.104.camel@hpc-1> ib_smi: Centralized error exiting in ib_smi_port_open Index: ib_smi.c =================================================================== --- ib_smi.c (revision 958) +++ ib_smi.c (working copy) @@ -536,7 +536,8 @@ port_priv = kmalloc(sizeof *port_priv, GFP_KERNEL); if (!port_priv) { printk(KERN_ERR "No memory for ib_smi_port_private\n"); - return -ENOMEM; + ret = -ENOMEM; + goto error1; } memset(port_priv, 0, sizeof *port_priv); @@ -564,8 +565,7 @@ NULL); if (IS_ERR(port_priv->mad_agent)) { ret = PTR_ERR(port_priv->mad_agent); - kfree(port_priv); - return ret; + goto error2; } /* Obtain MAD agent for LID routed SM class */ @@ -578,9 +578,7 @@ NULL); if (IS_ERR(port_priv->mad_agent2)) { ret = PTR_ERR(port_priv->mad_agent2); - ib_unregister_mad_agent(port_priv->mad_agent); - kfree(port_priv); - return ret; + goto error3; } port_priv->mr = ib_reg_phys_mr(port_priv->mad_agent->qp->pd, @@ -588,11 +586,8 @@ IB_ACCESS_LOCAL_WRITE, &iova); if (IS_ERR(port_priv->mr)) { printk(KERN_ERR "Couldn't register MR\n"); - ib_unregister_mad_agent(port_priv->mad_agent2); - ib_unregister_mad_agent(port_priv->mad_agent); ret = PTR_ERR(port_priv->mr); - kfree(port_priv); - return ret; + goto error4; } spin_lock_irqsave(&ib_smi_port_list_lock, flags); @@ -600,6 +595,15 @@ spin_unlock_irqrestore(&ib_smi_port_list_lock, flags); return 0; + +error4: + ib_unregister_mad_agent(port_priv->mad_agent2); +error3: + ib_unregister_mad_agent(port_priv->mad_agent); +error2: + kfree(port_priv); +error1: + return ret; } static int ib_smi_port_close(struct ib_device *device, int port_num) From halr at voltaire.com Fri Oct 8 07:16:49 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 08 Oct 2004 10:16:49 -0400 Subject: [openib-general] [PATCH] ib_smi: Add PMA support Message-ID: <1097245009.2494.1.camel@hpc-1> ib_smi: Add PMA support Index: ib_smi_priv.h =================================================================== --- ib_smi_priv.h (revision 960) +++ ib_smi_priv.h (working copy) @@ -41,6 +41,7 @@ int port_num; struct ib_mad_agent *mad_agent; /* DR SM class */ struct ib_mad_agent *mad_agent2; /* LR SM class */ + struct ib_mad_agent *pma_mad_agent; /* PerfMgt class */ struct ib_mr *mr; u64 wr_id; }; Index: ib_smi.c =================================================================== --- ib_smi.c (revision 960) +++ ib_smi.c (working copy) @@ -124,7 +124,7 @@ { case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE: return smi_handle_dr_smp_send(mad_agent, smp); - default: + default: /* LR SM or PerfMgt classes */ return 1; } } @@ -137,7 +137,7 @@ { /* C14-9:3 -- We're at the end of the DR segment of path */ /* C14-9:4 -- Hop Pointer = Hop Count + 1 -> give to SMA/SM. 
*/ - return ((smp->mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED) || + return ((smp->mgmt_class != IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) || (mad_agent->device->process_mad && !ib_get_smp_direction(smp) && (smp->hop_ptr == smp->hop_cnt + 1))); @@ -230,7 +230,7 @@ { case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE: return smi_handle_dr_smp_recv(mad_agent, smp); - default: + default: /* LR SM or PerfMgt classes */ return 1; } } @@ -282,7 +282,7 @@ { case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE: return smi_check_forward_dr_smp(mad_agent, smp); - default: + default: /* LR SM or PerfMgt classes */ return 1; } } @@ -314,7 +314,8 @@ spin_lock_irqsave(&ib_smi_port_list_lock, flags); list_for_each_entry(entry, &ib_smi_port_list, port_list) { if ((entry->mad_agent == mad_agent) || - (entry->mad_agent2 == mad_agent)) { + (entry->mad_agent2 == mad_agent) || + (entry->pma_mad_agent == mad_agent)) { port_priv = entry; break; } @@ -363,7 +364,13 @@ } send_wr.wr.ud.ah = ah; - send_wr.wr.ud.remote_qkey = 0; /* for SMPs */ + if (smp->mad_hdr.mgmt_class == IB_MGMT_CLASS_PERF_MGMT) { + send_wr.wr.ud.pkey_index = mad_recv_wc->wc->pkey_index; + send_wr.wr.ud.remote_qkey = IB_QP1_QKEY; + } else { + send_wr.wr.ud.pkey_index = 0; /* Should only matter for GMPs */ + send_wr.wr.ud.remote_qkey = 0; /* for SMPs */ + } send_wr.wr.ud.mad_hdr = (struct ib_mad_hdr *)smp; send_wr.wr_id = ++port_priv->wr_id; @@ -451,7 +458,8 @@ spin_lock_irqsave(&ib_smi_port_list_lock, flags); list_for_each_entry(entry, &ib_smi_port_list, port_list) { if ((entry->mad_agent == mad_agent) || - (entry->mad_agent2 == mad_agent)) { + (entry->mad_agent2 == mad_agent) || + (entry->pma_mad_agent == mad_agent)) { port_priv = entry; break; } @@ -581,13 +589,26 @@ goto error3; } + /* Obtain MAD agent for PerfMgt class */ + reg_req.mgmt_class = IB_MGMT_CLASS_PERF_MGMT; + port_priv->pma_mad_agent = ib_register_mad_agent(device, port_num, + IB_QPT_GSI, + ®_req, 0, + &smi_send_handler, + &smi_recv_handler, + NULL); + if (IS_ERR(port_priv->pma_mad_agent)) { + ret = PTR_ERR(port_priv->pma_mad_agent); + goto error4; + } + port_priv->mr = ib_reg_phys_mr(port_priv->mad_agent->qp->pd, &buf_list, 1, IB_ACCESS_LOCAL_WRITE, &iova); if (IS_ERR(port_priv->mr)) { printk(KERN_ERR "Couldn't register MR\n"); ret = PTR_ERR(port_priv->mr); - goto error4; + goto error5; } spin_lock_irqsave(&ib_smi_port_list_lock, flags); @@ -596,6 +617,8 @@ return 0; +error5: + ib_unregister_mad_agent(port_priv->pma_mad_agent); error4: ib_unregister_mad_agent(port_priv->mad_agent2); error3: @@ -630,6 +653,7 @@ spin_unlock_irqrestore(&ib_smi_port_list_lock, flags); ib_dereg_mr(port_priv->mr); + ib_unregister_mad_agent(port_priv->pma_mad_agent); ib_unregister_mad_agent(port_priv->mad_agent2); ib_unregister_mad_agent(port_priv->mad_agent); kfree(port_priv); From halr at voltaire.com Fri Oct 8 11:27:28 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 08 Oct 2004 14:27:28 -0400 Subject: [openib-general] SMI comments Message-ID: <1097260048.2494.30.camel@hpc-1> Hi, I started looking at the SMI code in detail and have a few questions/comments below. I am assuming the implementation is based on the IBA 1.1 spec description. It looks to me like there is no SMP initialization code (per 14.2.2.1 and 14.2.2.3). 14.2.2.1 doesn't matter right now as we don't initiate any outgoing SMPs, but 14.2.2.3 does. I think this is why I needed the hop pointer workaround. I also didn't see the LID handling as described in this section of the spec. 
In terms of switches, the port is the input or output (physical) port and not the port of the MAD registration (base or enhanced port 0). More detailed comments and one or more patches to follow based on these comments. Thanks. -- Hal From greg at kroah.com Fri Oct 8 13:22:47 2004 From: greg at kroah.com (Greg KH) Date: Fri, 8 Oct 2004 13:22:47 -0700 Subject: [openib-general] InfiniBand incompatible with the Linux kernel? Message-ID: <20041008202247.GA9653@kroah.com> Hi all, Enough people have been asking me about this lately, that I thought I would just bring it up publicly here. It seems that the Infiniband group (IBTA) has changed their licensing agrement of the basic Infiniband spec. See: http://www.theinquirer.net/?article=18922 for more info about this. The main point that affects Linux is the fact that now, no non-member of the IBTA can implement any working Infiniband code, otherwise they might run into legal problems. As an anonymous member of a IBTA company told me: If someone downloads the spec without joining the IBTA, and proceeds to use the spec for an implementation of the IBTA spec, that person (company) runs the risk of being a target of patent infringement claims by IBTA members. Another person, wanting to remain anonymous stated to me: In justification for this position people say that they are just trying to get more people to join the IBTA because they need the dues, which by coincidence are $9500 per year, and point out that some other commonly used specs are similarly made available for steep prices. I don't know one way or the other about that but this sounds a lot like the reason that we all gave ourselves for NOT including SDP in the kernel[1]. So, even if a IBTA member company creates a Linux IB implementation, and gets it into the kernel tree, any company who ships such a implementation, who is not a IBTA member, could be the target of any patent infringement claims[2]. So, OpenIB group, how to you plan to address this issue? Do you all have a position as to how you think your code base can be accepted into the main kernel tree given these recent events? thanks, greg k-h [1] SDP, for those who do not know, is a part of the IB spec that Microsoft has come out and stated they they currently own the patents that cover that portion of the specification, and that anyone who wants to implement it, needs to get a licensing agreement with them. Of course, that license agreement does not allow for a GPLed version of the implementation. [2] Sure, any person who has a copy of the kernel source tree could be a target for any of a zillion other potential claims, nothing new there, but the point here is they are explicitly stating that they will go after non-IBTA members who touch IB code[3]. [3] An insanely stupid position to take, given the fact that any normal industry group would be very happy to actually have people use their specification, but hey, the IB people have never been know for their brilliance in the past... From rminnich at lanl.gov Fri Oct 8 13:38:06 2004 From: rminnich at lanl.gov (Ronald G. Minnich) Date: Fri, 8 Oct 2004 14:38:06 -0600 (MDT) Subject: [openib-general] InfiniBand incompatible with the Linux kernel? 
In-Reply-To: <20041008202247.GA9653@kroah.com> References: <20041008202247.GA9653@kroah.com> Message-ID: On Fri, 8 Oct 2004, Greg KH wrote: > If someone downloads the spec without joining the IBTA, and > proceeds to use the spec for an implementation of the IBTA spec, > that person (company) runs the risk of being a target of patent > infringement claims by IBTA members. Another solid reason to write infiniband off. I keep hoping that the IB vendor crowd will stop shooting themselves in the head with such regularity, and they just won't. They just keep increasing the size of the bore. Infiniband can now be spelled a few different ways, "I2O" and "ATM" come to mind, except that "ATM" was less unsuccessful in its lifetime than IB has been so far. > In justification for this position people say that they are just > trying to get more people to join the IBTA because they need the > dues, which by coincidence are $9500 per year, and point out > that some other commonly used specs are similarly made available > for steep prices. I don't know one way or the other about that > but this sounds a lot like the reason that we all gave ourselves > for NOT including SDP in the kernel[1]. > So, OpenIB group, how to you plan to address this issue? Do you all > have a position as to how you think your code base can be accepted into > the main kernel tree given these recent events? Well, we non-vendors have no power, and it appears the vendors are determined to kill IB. This is all very discouraging. A lot of people at the Labs put a lot of work into the Infiniband openib effort, including getting money to support the software development, and it looks like we're not going to get very far if these rules stick. I am going to renew my search for non-IB solutions, I guess. It's hard to recommend this interconnect when IBTA takes this kind of action. ron From krause at cup.hp.com Fri Oct 8 15:48:46 2004 From: krause at cup.hp.com (Michael Krause) Date: Fri, 08 Oct 2004 15:48:46 -0700 Subject: [openib-general] InfiniBand incompatible with the Linux kernel? In-Reply-To: <20041008202247.GA9653@kroah.com> References: <20041008202247.GA9653@kroah.com> Message-ID: <6.1.2.0.2.20041008152933.01f671a8@esmail.cup.hp.com> At 01:22 PM 10/8/2004, Greg KH wrote: >Hi all, > >Enough people have been asking me about this lately, that I thought I >would just bring it up publicly here. > >It seems that the Infiniband group (IBTA) has changed their licensing >agrement of the basic Infiniband spec. See: > http://www.theinquirer.net/?article=18922 >for more info about this. > >The main point that affects Linux is the fact that now, no non-member of >the IBTA can implement any working Infiniband code, otherwise they might >run into legal problems. As an anonymous member of a IBTA company told >me: > If someone downloads the spec without joining the IBTA, and > proceeds to use the spec for an implementation of the IBTA spec, > that person (company) runs the risk of being a target of patent > infringement claims by IBTA members. Caution: I'm not a lawyer so the following discussion is just a personal opinion. Spec for free or spec for a price - neither grants anyone rights to any IP contained within the specifications or on the technologies that surround the specification. The change in spec cost, while clearly unfortunate, has no impact on the IP rights. IP rights are defined by the IBTA membership agreement (just like they are for PCI and any number of other technologies used within the industry). 
If you want to implement a technology, then you have to be a member of the appropriate organization and agree to the same industry-wide terms that others do. Hence, this problem is not IB-specific but a fact of life within the industry. >Another person, wanting to remain anonymous stated to me: > In justification for this position people say that they are just > trying to get more people to join the IBTA because they need the > dues, which by coincidence are $9500 per year, and point out > that some other commonly used specs are similarly made available > for steep prices. I don't know one way or the other about that > but this sounds a lot like the reason that we all gave ourselves > for NOT including SDP in the kernel[1]. > >So, even if a IBTA member company creates a Linux IB implementation, and >gets it into the kernel tree, any company who ships such a implementation, >who is not a IBTA member, could be the target of any patent infringement >claims[2]. Again, this is true of many technologies not just IB. For example, if a company has patents on PCI Express and someone implements a device / chipset / whatever and they are not part of the PCI-SIG, then they can be subject to different terms than someone who is a member of the PCI-SIG. In both cases, the access to specs, etc. has nothing to do with IP licensing. >So, OpenIB group, how to you plan to address this issue? Do you all have >a position as to how you think your code base can be accepted into the >main kernel tree given these recent events? This problem isn't just an OpenIB issue. It is true for the IETF, PCI-SIG, USB, PCMCIA, etc. which all have technologies with varying degrees of patents. Even going beyond what is in these various industry organizations, there are also many companies who have patents on protocol off-load, OS bypass, copy avoidance, RDMA, QoS algorithms, etc. Does any subsystem implemented in or on top of Linux suddenly stop work because there is IP involved? >thanks, > >greg k-h > >[1] SDP, for those who do not know, is a part of the IB spec that >Microsoft has come out and stated they they currently own the patents that >cover that portion of the specification, and that anyone who wants to >implement it, needs to get a licensing agreement with them. Of course, >that license agreement does not allow for a GPLed version of the >implementation. SDP was derived from Winsocks Direct. Microsoft may have IP associated with the specification. Other companies who worked on SDP may also have IP. One does not know all of the IP that may exist in any technology until someone attempts to enforce their rights. >[2] Sure, any person who has a copy of the kernel source tree could be a >target for any of a zillion other potential claims, nothing new there, but >the point here is they are explicitly stating that they will go >after non-IBTA members who touch IB code[3]. I don't see how this can be asserted. The IBTA defines the licensing requirements for member companies. It is the companies that own the IP that have to enforce their IP; the IBTA has no role in the process other than to set a level playing field for those that participate in the IBTA. This is true for other industry organizations as well. >[3] An insanely stupid position to take, given the fact that any normal >industry group would be very happy to actually have people use their >specification, but hey, the IB people have never been know for their >brilliance in the past... The same can be stated for many different technologies. 
The IBTA is no different than the rest of the industry and was founded using the same principles already in use in the industry at that time. In general, these operating principles are still in sync with other organizations so it is not clear that you can blame the IBTA as doing something outrageous when all of this has been known and published on the IBTA web site from the start. Mike -------------- next part -------------- An HTML attachment was scrubbed... URL: From ebiederman at lnxi.com Fri Oct 8 15:49:16 2004 From: ebiederman at lnxi.com (Eric W. Biederman) Date: 08 Oct 2004 16:49:16 -0600 Subject: [openib-general] InfiniBand incompatible with the Linux kernel? In-Reply-To: <20041008202247.GA9653@kroah.com> References: <20041008202247.GA9653@kroah.com> Message-ID: Greg KH writes: > [2] Sure, any person who has a copy of the kernel source tree could be a > target for any of a zillion other potential claims, nothing new there, > but the point here is they are explicitly stating that they will go > after non-IBTA members who touch IB code[3]. Greg I see nothing to back up the idea that IBTA intends to go after non-members. I simply see a disclaimer of warranty, and I see wording by your anonymous source that restates a disclaimer of warranty. Until I see something more to back this up I do not see a problem. In fact I see infiniband prices dropping, and competition increasing. The drivers off of openib.org look like they are a good start at making a sane linux implementation. Even the PCI-SIG requires you to pay for the spec. I agree it would be suicidally insane for the infiniband trade association to go after a linux stack, as it appears that a large portion of the infiniband users are currently running linux. Given the vendors I have seen working on hardware and the vendors who are a part of the infiniband trade association there does appear to be a certain amount of disconnect between the two. So this may be an attempt to bring all of the interested parties together. Eric From jgarzik at pobox.com Fri Oct 8 16:13:31 2004 From: jgarzik at pobox.com (Jeff Garzik) Date: Fri, 08 Oct 2004 19:13:31 -0400 Subject: [openib-general] InfiniBand incompatible with the Linux kernel? In-Reply-To: References: <20041008202247.GA9653@kroah.com> Message-ID: <41671F1B.1050907@pobox.com> Eric W. Biederman wrote: > Greg KH writes: > > >>[2] Sure, any person who has a copy of the kernel source tree could be a >>target for any of a zillion other potential claims, nothing new there, >>but the point here is they are explicitly stating that they will go >>after non-IBTA members who touch IB code[3]. > > > Greg I see nothing to back up the idea that IBTA intends to go after > non-members. I simply see a disclaimer of warranty, and I see wording > by your anonymous source that restates a disclaimer of warranty. Well, let's not rely on anonymous sources and go straight to the web site, shall we? Ordering copies of the spec, for non-members: http://www.infinibandta.org/specs/How_to_Order_IBTA_Specifications.pdf Key note: use of spec is only granted for NON-COMMERCIAL use Now, let's look at the membership agreement for IBTA: http://www.infinibandta.org/meminfo/mem-agreement.pdf Key note: The point is made repeatedly that there are no patent grants simply by being a member. From greg at kroah.com Fri Oct 8 16:13:07 2004 From: greg at kroah.com (Greg KH) Date: Fri, 8 Oct 2004 16:13:07 -0700 Subject: [openib-general] InfiniBand incompatible with the Linux kernel? 
In-Reply-To: References: <20041008202247.GA9653@kroah.com> Message-ID: <20041008231307.GA32530@kroah.com> On Fri, Oct 08, 2004 at 04:49:16PM -0600, Eric W. Biederman wrote: > Greg KH writes: > > > [2] Sure, any person who has a copy of the kernel source tree could be a > > target for any of a zillion other potential claims, nothing new there, > > but the point here is they are explicitly stating that they will go > > after non-IBTA members who touch IB code[3]. > > Greg I see nothing to back up the idea that IBTA intends to go after > non-members. I simply see a disclaimer of warranty, and I see wording > by your anonymous source that restates a disclaimer of warranty. All I know is a number of different people, from different companies are suddenly very worried about this. The fact that they don't want to comment on it in public leads me to believe that there is something behind their fears. > Until I see something more to back this up I do not see a problem. In > fact I see infiniband prices dropping, and competition increasing. > The drivers off of openib.org look like they are a good start at > making a sane linux implementation. It is a good start. And as all OpenIB members are also IBTA members, I am asking for the group's position as to this change. > Even the PCI-SIG requires you to pay for the spec. I know that, almost all groups do. Although $9500 does seem a bit steep for spec prices :) > I agree it would be suicidally insane for the infiniband trade > association to go after a linux stack, as it appears that a large > portion of the infiniband users are currently running linux. One specific IBTA member has issues with the adaption of Linux, and has already done one thing to restrict a full IB implementation that would work on Linux. And as for insane, have you ever tried to actually read that spec? :) thanks, greg k-h From rlrevell at joe-job.com Fri Oct 8 16:24:00 2004 From: rlrevell at joe-job.com (Lee Revell) Date: Fri, 08 Oct 2004 19:24:00 -0400 Subject: [openib-general] InfiniBand incompatible with the Linux kernel? In-Reply-To: <20041008231307.GA32530@kroah.com> References: <20041008202247.GA9653@kroah.com> <20041008231307.GA32530@kroah.com> Message-ID: <1097277840.1442.92.camel@krustophenia.net> On Fri, 2004-10-08 at 19:13, Greg KH wrote: > All I know is a number of different people, from different companies are > suddenly very worried about this. The fact that they don't want to > comment on it in public leads me to believe that there is something > behind their fears. Sounds like our favorite software company's FUD squad has been busy. Lee From davej at redhat.com Fri Oct 8 16:26:32 2004 From: davej at redhat.com (Dave Jones) Date: Fri, 8 Oct 2004 19:26:32 -0400 Subject: [openib-general] InfiniBand incompatible with the Linux kernel? In-Reply-To: <20041008231307.GA32530@kroah.com> References: <20041008202247.GA9653@kroah.com> <20041008231307.GA32530@kroah.com> Message-ID: <20041008232632.GL25892@redhat.com> On Fri, Oct 08, 2004 at 04:13:07PM -0700, Greg KH wrote: > > Even the PCI-SIG requires you to pay for the spec. > > I know that, almost all groups do. Although $9500 does seem a bit steep > for spec prices :) Especially as (wrt PCISIG at least) the mindshare books contain almost exactly the same information for around $50. In fact, I find those books are much better presented than the actual specs. 
Dave From roland at topspin.com Fri Oct 8 16:27:14 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 08 Oct 2004 16:27:14 -0700 Subject: [openib-general] InfiniBand incompatible with the Linux kernel? In-Reply-To: <20041008202247.GA9653@kroah.com> (Greg KH's message of "Fri, 8 Oct 2004 13:22:47 -0700") References: <20041008202247.GA9653@kroah.com> Message-ID: <528yagn63x.fsf@topspin.com> The increase in cost for the spec is rather unfortunate but I think it's orthogonal to any IP issues. Since the Linux kernel contains a lot of code written to specs available only under NDA (and even reverse-engineered code where specs are completely unavailable), I don't think the expense should be an issue. As for IP, as far as I know, there has been no change to any of the bylaws or other members agreements. If there is some specific provision that concerns you, please bring it to our attention -- the IBTA in general and the IBTA steering committee in general have been very supportive of the OpenIB effort. In fact, most of the IBTA steering commitee companies (Agilent, HP, IBM, InfiniCon, Intel, Mellanox, Sun, Topspin, and Voltaire) have been active participants in OpenIB development. I would hope we can resolve any issues relating to open source and the Linux kernel. However, I would suspect that we'll find the USB, Firewire, Bluetooth, etc., etc. standards bodies all have very similar IP language in their bylaws and licenses. Thanks, Roland From roland at topspin.com Fri Oct 8 16:29:03 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 08 Oct 2004 16:29:03 -0700 Subject: [openib-general] InfiniBand incompatible with the Linux kernel? In-Reply-To: <20041008231307.GA32530@kroah.com> (Greg KH's message of "Fri, 8 Oct 2004 16:13:07 -0700") References: <20041008202247.GA9653@kroah.com> <20041008231307.GA32530@kroah.com> Message-ID: <524ql4n60w.fsf@topspin.com> Greg> All I know is a number of different people, from different Greg> companies are suddenly very worried about this. The fact Greg> that they don't want to comment on it in public leads me to Greg> believe that there is something behind their fears. Hmm, I haven't heard anything. I guess I'm out of the loop. Greg> One specific IBTA member has issues with the adaption of Greg> Linux, and has already done one thing to restrict a full IB Greg> implementation that would work on Linux. Microsoft is actually no longer an IBTA member. Thanks, Roland From greg at kroah.com Fri Oct 8 16:34:50 2004 From: greg at kroah.com (Greg KH) Date: Fri, 8 Oct 2004 16:34:50 -0700 Subject: [openib-general] InfiniBand incompatible with the Linux kernel? In-Reply-To: <528yagn63x.fsf@topspin.com> References: <20041008202247.GA9653@kroah.com> <528yagn63x.fsf@topspin.com> Message-ID: <20041008233450.GA1490@kroah.com> On Fri, Oct 08, 2004 at 04:27:14PM -0700, Roland Dreier wrote: > The increase in cost for the spec is rather unfortunate but I think > it's orthogonal to any IP issues. Since the Linux kernel contains a > lot of code written to specs available only under NDA (and even > reverse-engineered code where specs are completely unavailable), I > don't think the expense should be an issue. It isn't at all, just an odd side point. > As for IP, as far as I know, there has been no change to any of the > bylaws or other members agreements. The "purchase a spec" agreement has changed, right? 
> If there is some specific > provision that concerns you, please bring it to our attention -- the > IBTA in general and the IBTA steering committee in general have been > very supportive of the OpenIB effort. In fact, most of the IBTA > steering commitee companies (Agilent, HP, IBM, InfiniCon, Intel, > Mellanox, Sun, Topspin, and Voltaire) have been active participants in > OpenIB development. I would hope we can resolve any issues relating > to open source and the Linux kernel. What about the issue of not being able to use the spec for "commercial" applications? And doesn't the member agreement not cover anyone who implements the spec, and then gives that implementation to someone who is not a member? > However, I would suspect that we'll find the USB, Firewire, Bluetooth, > etc., etc. standards bodies all have very similar IP language in their > bylaws and licenses. No, the USB bylaws explicitly forbid any member company from putting in, or trying to claim any IP that is in the USB specs. That is something that makes USB quite different from IB. I haven't had the misfortune to have to go read the PCI SIG bylaws and member agreement... thanks, greg k-h From jgarzik at pobox.com Fri Oct 8 17:57:22 2004 From: jgarzik at pobox.com (Jeff Garzik) Date: Fri, 08 Oct 2004 20:57:22 -0400 Subject: [openib-general] InfiniBand incompatible with the Linux kernel? In-Reply-To: <528yagn63x.fsf@topspin.com> References: <20041008202247.GA9653@kroah.com> <528yagn63x.fsf@topspin.com> Message-ID: <41673772.9010402@pobox.com> Roland Dreier wrote: > As for IP, as far as I know, there has been no change to any of the > bylaws or other members agreements. If there is some specific > provision that concerns you, please bring it to our attention -- the > IBTA in general and the IBTA steering committee in general have been > very supportive of the OpenIB effort. In fact, most of the IBTA > steering commitee companies (Agilent, HP, IBM, InfiniCon, Intel, > Mellanox, Sun, Topspin, and Voltaire) have been active participants in > OpenIB development. I would hope we can resolve any issues relating > to open source and the Linux kernel. Read the member agreement :) It -explicitly- does -not- require waiving of patent claims related to any implementation of IB. That's different from ATA, SCSI, USB, the list goes on... Jeff From roland at topspin.com Fri Oct 8 20:09:50 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 08 Oct 2004 20:09:50 -0700 Subject: [openib-general] InfiniBand incompatible with the Linux kernel? In-Reply-To: <41673772.9010402@pobox.com> (Jeff Garzik's message of "Fri, 08 Oct 2004 20:57:22 -0400") References: <20041008202247.GA9653@kroah.com> <528yagn63x.fsf@topspin.com> <41673772.9010402@pobox.com> Message-ID: <52zn2wlh8h.fsf@topspin.com> Jeff> Read the member agreement :) It -explicitly- does -not- Jeff> require waiving of patent claims related to any Jeff> implementation of IB. Jeff> That's different from ATA, SCSI, USB, the list goes on... Fair enough, but read the Bluetooth SIG patent agreement [1]. As far as I can tell, all it requires is that other SIG members receive a patent license. Do we need to do rm -rf net/bluetooth? IEEE only requires that patents be licensed under RAND terms (it does not even require royalty free licensing) [2]. Time for rm -rf drivers/ieee1394? The code that we have written so far is pretty standard driver code, so I have a hard time believing that the IB drivers are any more at risk than any other Linux code. 
There may be good and valid reasons not to merge IB drivers upstream, but I'd be very disappointed if this FUD about patents is what keeps them out. Thanks, Roland [1] https://www.bluetooth.org/foundry/sitecontent/document/Patent_and_Copyright_License_Agreement [2] http://standards.ieee.org/guides/bylaws/sect6-7.html#6 From roland at topspin.com Fri Oct 8 20:40:20 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 08 Oct 2004 20:40:20 -0700 Subject: [openib-general] InfiniBand incompatible with the Linux kernel? In-Reply-To: <20041008233450.GA1490@kroah.com> (Greg KH's message of "Fri, 8 Oct 2004 16:34:50 -0700") References: <20041008202247.GA9653@kroah.com> <528yagn63x.fsf@topspin.com> <20041008233450.GA1490@kroah.com> Message-ID: <52vfdklftn.fsf@topspin.com> Greg> The "purchase a spec" agreement has changed, right? Good point. I think the right way to understand this is that the purchase agreement and the $9500 cost is intended to discourage anyone from actually buying the spec -- for the same money you can become a full IBTA member so why shell out for the spec with more restrictions? This might be counterproductive but I don't think there's anything sinister behind it. - Roland From jgarzik at pobox.com Fri Oct 8 21:23:22 2004 From: jgarzik at pobox.com (Jeff Garzik) Date: Sat, 09 Oct 2004 00:23:22 -0400 Subject: [openib-general] InfiniBand incompatible with the Linux kernel? In-Reply-To: <52zn2wlh8h.fsf@topspin.com> References: <20041008202247.GA9653@kroah.com> <528yagn63x.fsf@topspin.com> <41673772.9010402@pobox.com> <52zn2wlh8h.fsf@topspin.com> Message-ID: <416767BA.1020204@pobox.com> Roland Dreier wrote: > Jeff> Read the member agreement :) It -explicitly- does -not- > Jeff> require waiving of patent claims related to any > Jeff> implementation of IB. > > Jeff> That's different from ATA, SCSI, USB, the list goes on... > > Fair enough, but read the Bluetooth SIG patent agreement [1]. As far > as I can tell, all it requires is that other SIG members receive a > patent license. Do we need to do rm -rf net/bluetooth? IEEE only > requires that patents be licensed under RAND terms (it does not even > require royalty free licensing) [2]. Time for rm -rf drivers/ieee1394? As my mother would ask, would you jump off a cliff just because your friend did? If there is questionable code, that is _not_ a justification to add more. Jeff From romieu at fr.zoreil.com Sat Oct 9 04:50:28 2004 From: romieu at fr.zoreil.com (Francois Romieu) Date: Sat, 9 Oct 2004 13:50:28 +0200 Subject: [openib-general] InfiniBand incompatible with the Linux kernel? In-Reply-To: <528yagn63x.fsf@topspin.com> References: <20041008202247.GA9653@kroah.com> <528yagn63x.fsf@topspin.com> Message-ID: <20041009115028.GA14571@electric-eye.fr.zoreil.com> Roland Dreier : > it's orthogonal to any IP issues. Since the Linux kernel contains a > lot of code written to specs available only under NDA (and even > reverse-engineered code where specs are completely unavailable), I > don't think the expense should be an issue. One can say good bye to peer review. -- Ueimor From roland at topspin.com Sat Oct 9 13:47:15 2004 From: roland at topspin.com (Roland Dreier) Date: Sat, 09 Oct 2004 13:47:15 -0700 Subject: [openib-general] InfiniBand incompatible with the Linux kernel? 
In-Reply-To: <20041009115028.GA14571@electric-eye.fr.zoreil.com> (Francois Romieu's message of "Sat, 9 Oct 2004 13:50:28 +0200") References: <20041008202247.GA9653@kroah.com> <528yagn63x.fsf@topspin.com> <20041009115028.GA14571@electric-eye.fr.zoreil.com> Message-ID: <52oejbliuk.fsf@topspin.com> Roland> it's orthogonal to any IP issues. Since the Linux kernel Roland> contains a lot of code written to specs available only Roland> under NDA (and even reverse-engineered code where specs Roland> are completely unavailable), I don't think the expense Roland> should be an issue. Francois> One can say good bye to peer review. Yes and no. Certainly people without specs can't review spec compliance, but review for coding style, locking bugs, etc. is if anything more valuable. Thanks, Roland From roland at topspin.com Sat Oct 9 14:11:06 2004 From: roland at topspin.com (Roland Dreier) Date: Sat, 09 Oct 2004 14:11:06 -0700 Subject: [openib-general] InfiniBand incompatible with the Linux kernel? In-Reply-To: <416767BA.1020204@pobox.com> (Jeff Garzik's message of "Sat, 09 Oct 2004 00:23:22 -0400") References: <20041008202247.GA9653@kroah.com> <528yagn63x.fsf@topspin.com> <41673772.9010402@pobox.com> <52zn2wlh8h.fsf@topspin.com> <416767BA.1020204@pobox.com> Message-ID: <52k6tzlhqt.fsf@topspin.com> Jeff> If there is questionable code, that is _not_ a justification Jeff> to add more. I guess my point was not that the bluetooth stack is somehow questionable, but rather that the IP policies of a standards bodies are really not a good reason to keep code out of the kernel. If someone can name one patent that the IB driver stack looks like it might possibly run into, then we would have to take that very seriously. However, no one has done this here -- all we have is FUD or guilt by association or whatever you want to call it. The mere fact that the IBTA bylaws only require members license their patents under RAND terms shouldn't be an issue. If nothing else, the fact that there are hugely more non-IBTA member companies than member companies who might have patents makes the IBTA bylaws almost a moot point. For what its worth, I know of at least five companies shipping IB stacks and the only patent licensing that I know of is the Microsoft SDP license, and even that is really just CYA: all Microsoft says is that they _might_ have patents that cover SDP and that they will license them at no cost to anyone who wants them; unfortunately this license is not GPL-compatible, but for proprietary stacks the zero-cost terms look fine. There are people who have looked at Microsoft's patents and concluded that none of them actually apply to SDP as specified by the IBTA. Thanks, Roland From alan at lxorguk.ukuu.org.uk Sun Oct 10 12:05:32 2004 From: alan at lxorguk.ukuu.org.uk (Alan Cox) Date: Sun, 10 Oct 2004 20:05:32 +0100 Subject: [openib-general] InfiniBand incompatible with the Linux kernel? In-Reply-To: <52k6tzlhqt.fsf@topspin.com> References: <20041008202247.GA9653@kroah.com> <528yagn63x.fsf@topspin.com> <41673772.9010402@pobox.com> <52zn2wlh8h.fsf@topspin.com> <416767BA.1020204@pobox.com> <52k6tzlhqt.fsf@topspin.com> Message-ID: <1097435131.29422.7.camel@localhost.localdomain> On Sad, 2004-10-09 at 22:11, Roland Dreier wrote: > I guess my point was not that the bluetooth stack is somehow > questionable, but rather that the IP policies of a standards bodies > are really not a good reason to keep code out of the kernel. 
If > someone can name one patent that the IB driver stack looks like it > might possibly run into, then we would have to take that very > seriously. However, no one has done this here -- all we have is FUD > or guilt by association or whatever you want to call it. It's called "caution". It's why nobody does innovation in the USA any more, it's too dangerous to innovate. Far better to make it available as before with a blue LED and a beeper. > The mere fact that the IBTA bylaws only require members license their > patents under RAND terms shouldn't be an issue. If nothing else, the > fact that there are hugely more non-IBTA member companies than member > companies who might have patents makes the IBTA bylaws almost a moot > point. The big question seems to be about the standard itself. Are the items at issue hardware or software ? We already deal with a lot of devices that have hardware related patent pools and those by themselves don't seem to cause problems. In the meantime I guess the guys down in Bristol[1] will be feeling happier and happier at the Infiniband self destruct sequence. Alan [1] Quadrics From roland at topspin.com Sun Oct 10 13:33:46 2004 From: roland at topspin.com (Roland Dreier) Date: Sun, 10 Oct 2004 13:33:46 -0700 Subject: [openib-general] InfiniBand incompatible with the Linux kernel? In-Reply-To: <1097435131.29422.7.camel@localhost.localdomain> (Alan Cox's message of "Sun, 10 Oct 2004 20:05:32 +0100") References: <20041008202247.GA9653@kroah.com> <528yagn63x.fsf@topspin.com> <41673772.9010402@pobox.com> <52zn2wlh8h.fsf@topspin.com> <416767BA.1020204@pobox.com> <52k6tzlhqt.fsf@topspin.com> <1097435131.29422.7.camel@localhost.localdomain> Message-ID: <52brfal3dh.fsf@topspin.com> Alan> The big question seems to be about the standard itself. Are Alan> the items at issue hardware or software ? We already deal Alan> with a lot of devices that have hardware related patent Alan> pools and those by themselves don't seem to cause problems. As far as I know there are no items at issue. No one has suggested that there actually are any patents to worry about. The big complaint is just that the IBTA member companies haven't made enough promises about their patents. The OpenIB subversion repository can be checked out by anyone interested. Anyone who wants to can look the code over and look for something patent encumbered. Thanks, Roland From halr at voltaire.com Mon Oct 11 07:36:24 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 11 Oct 2004 10:36:24 -0400 Subject: [openib-general] tvflash questions Message-ID: <1097505384.2821.5.camel@localhost.localdomain> Hi, I have two HCAs which I am attempting to update via tvflash. One appears to be at firmware version 1.18.00 and the other at 1.18.01. When I use tvflash, I get the following (error) message: tvflash -v -d fw-23108-rel-3_2_0/fw-23108-a1-rel.mlx GUID pointer (0x4D545061) is larger than image size (0x19EC27)! Am I correct in assuming the firmware is not updated ? (It appears that way since after rebooting the machine the same version of firmware (1.18) is indicated.) Also (and more importantly) is there a way around this with the OpenIB tools ? Thanks.
-- Hal From roland at topspin.com Mon Oct 11 07:50:21 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 11 Oct 2004 07:50:21 -0700 Subject: [openib-general] Re: tvflash questions In-Reply-To: <1097505384.2821.5.camel@localhost.localdomain> (Hal Rosenstock's message of "Mon, 11 Oct 2004 10:36:24 -0400") References: <1097505384.2821.5.camel@localhost.localdomain> Message-ID: <527jpxl36a.fsf@topspin.com> Hal> tvflash -v -d fw-23108-rel-3_2_0/fw-23108-a1-rel.mlx GUID tvflash uses binary firmware images, not "mlx" format. See the README. - R. From halr at voltaire.com Mon Oct 11 08:10:03 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 11 Oct 2004 11:10:03 -0400 Subject: [openib-general] Re: tvflash questions In-Reply-To: <527jpxl36a.fsf@topspin.com> References: <1097505384.2821.5.camel@localhost.localdomain> <527jpxl36a.fsf@topspin.com> Message-ID: <1097507402.2821.22.camel@localhost.localdomain> On Mon, 2004-10-11 at 10:50, Roland Dreier wrote: > Hal> tvflash -v -d fw-23108-rel-3_2_0/fw-23108-a1-rel.mlx GUID > > tvflash uses binary firmware images, not "mlx" format. See the README. Thanks. I should learn to RTFM... The README states: To generate a firmware file for use with tvflash, use Mellanox's InfiniBurn tool and save a firmware image in binary format to a file. In the long run, do we need either of the following: 1. Mellanox to release their firmware in binary form 2. An open source tool to do this conversion -- Hal From krause at cup.hp.com Mon Oct 11 09:09:09 2004 From: krause at cup.hp.com (Michael Krause) Date: Mon, 11 Oct 2004 09:09:09 -0700 Subject: [openib-general] InfiniBand incompatible with the Linux kernel? In-Reply-To: <52k6tzlhqt.fsf@topspin.com> References: <20041008202247.GA9653@kroah.com> <528yagn63x.fsf@topspin.com> <41673772.9010402@pobox.com> <52zn2wlh8h.fsf@topspin.com> <416767BA.1020204@pobox.com> <52k6tzlhqt.fsf@topspin.com> Message-ID: <6.1.2.0.2.20041011090336.01f3bcf8@esmail.cup.hp.com> At 02:11 PM 10/9/2004, you wrote: > Jeff> If there is questionable code, that is _not_ a justification > Jeff> to add more. > >I guess my point was not that the bluetooth stack is somehow >questionable, but rather that the IP policies of a standards bodies >are really not a good reason to keep code out of the kernel. If >someone can name one patent that the IB driver stack looks like it >might possibly run into, then we would have to take that very >seriously. However, no one has done this here -- all we have is FUD >or guilt by association or whatever you want to call it. > >The mere fact that the IBTA bylaws only require members license their >patents under RAND terms shouldn't be an issue. If nothing else, the >fact that there are hugely more non-IBTA member companies than member >companies who might have patents makes the IBTA bylaws almost a moot >point. It isn't a numbers game when it comes to the law. >For what its worth, I know of at least five companies shipping IB >stacks and the only patent licensing that I know of is the Microsoft >SDP license, The lack of companies enforcing their patents at this time does not mean that they will not do so in the future. There are many patents in the IB specification suite. The question is whether these apply to the software stack being done within this forum. Until a company comes forward and states their intention, there is no way to tell. Not attempting any FUD here as I don't see a reason to stop development of any of a wide range of technologies that are covered by similar terms in various industry bodies. 
>and even that is really just CYA: all Microsoft says is >that they _might_ have patents that cover SDP and that they will >license them at no cost to anyone who wants them; unfortunately this >license is not GPL-compatible, but for proprietary stacks the >zero-cost terms look fine. There are people who have looked at >Microsoft's patents and concluded that none of them actually apply to >SDP as specified by the IBTA. The patent office is the only one who can draw a conclusion that can be relied upon. I'd never rely upon hearsay in making a decision. Mike -------------- next part -------------- An HTML attachment was scrubbed... URL: From halr at voltaire.com Mon Oct 11 11:16:09 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 11 Oct 2004 14:16:09 -0400 Subject: [openib-general] [PATCH] ib_mad: Add module name to messages Message-ID: <1097518569.6014.0.camel@hpc-1> ib_mad: Add modules name to messages Index: ib_mad_priv.h =================================================================== --- ib_mad_priv.h (revision 965) +++ ib_mad_priv.h (working copy) @@ -61,6 +61,9 @@ #include #include + +#define PFX "ib_mad: " + #define IB_MAD_QPS_CORE 2 /* Always QP0 and QP1 as a minimum */ /* QP and CQ parameters */ Index: ib_mad.c =================================================================== --- ib_mad.c (revision 966) +++ ib_mad.c (working copy) @@ -308,7 +308,7 @@ &port_priv->send_posted_mad_list); port_priv->send_posted_mad_count++; } else { - printk(KERN_NOTICE "ib_post_send failed ret = %d\n", ret); + printk(KERN_NOTICE PFX "ib_post_send failed ret = %d\n", ret); *bad_send_wr = send_wr; } spin_unlock_irqrestore(&port_priv->send_list_lock, flags); @@ -353,7 +353,7 @@ GFP_ATOMIC : GFP_KERNEL); if (!mad_send_wr) { *bad_send_wr = cur_send_wr; - printk(KERN_ERR "No memory for " + printk(KERN_ERR PFX "No memory for " "ib_mad_send_wr_private\n"); return -ENOMEM; } @@ -432,7 +432,7 @@ void ib_coalesce_recv_mad(struct ib_mad_recv_wc *mad_recv_wc, void *buf) { - printk(KERN_ERR "ib_coalesce_recv_mad() not implemented yet\n"); + printk(KERN_ERR PFX "ib_coalesce_recv_mad() not implemented yet\n"); } EXPORT_SYMBOL(ib_coalesce_recv_mad); @@ -449,7 +449,7 @@ int ib_process_mad_wc(struct ib_mad_agent *mad_agent, struct ib_wc *wc) { - printk(KERN_ERR "ib_process_mad_wc() not implemented yet\n"); + printk(KERN_ERR PFX "ib_process_mad_wc() not implemented yet\n"); return 0; } EXPORT_SYMBOL(ib_process_mad_wc); @@ -471,7 +471,7 @@ i = find_next_bit(mad_reg_req->method_mask, IB_MGMT_MAX_METHODS, 1+i)) { if ((*method)->agent[i]) { - printk(KERN_ERR "Method %d already in use\n", i); + printk(KERN_ERR PFX "Method %d already in use\n", i); return -EINVAL; } } @@ -483,7 +483,7 @@ /* Allocate management method table */ *method = kmalloc(sizeof **method, GFP_KERNEL); if (!*method) { - printk(KERN_ERR "No memory for ib_mad_mgmt_method_table\n"); + printk(KERN_ERR PFX "No memory for ib_mad_mgmt_method_table\n"); return -ENOMEM; } /* Clear management method table */ @@ -556,7 +556,7 @@ /* Allocate management class table for "new" class version */ *class = kmalloc(sizeof **class, GFP_KERNEL); if (!*class) { - printk(KERN_ERR "No memory for " + printk(KERN_ERR PFX "No memory for " "ib_mad_mgmt_class_table\n"); goto error1; } @@ -628,8 +628,8 @@ port_priv = agent_priv->port_priv; class = port_priv->version[agent_priv->reg_req->mgmt_class_version]; if (!class) { - printk(KERN_ERR "No class table yet MAD registration request " - "supplied\n"); + printk(KERN_ERR PFX "No class table yet MAD registration " + "request 
supplied\n"); goto ret; } @@ -713,21 +713,21 @@ } } if (!mad_agent) { - printk(KERN_ERR "No client 0x%x for received MAD on " - "port %d\n", hi_tid, port_priv->port_num); + printk(KERN_ERR PFX "No client 0x%x for received MAD " + "on port %d\n", hi_tid, port_priv->port_num); goto ret; } } else { /* Routing is based on version, class, and method */ if (mad->mad_hdr.class_version >= MAX_MGMT_VERSION) { - printk(KERN_ERR "MAD received with unsupported class " - "version %d on port %d\n", + printk(KERN_ERR PFX "MAD received with unsupported " + "class version %d on port %d\n", mad->mad_hdr.class_version, port_priv->port_num); goto ret; } version = port_priv->version[mad->mad_hdr.class_version]; if (!version) { - printk(KERN_ERR "MAD received on port %d for class " + printk(KERN_ERR PFX "MAD received on port %d for class " "version %d with no client\n", port_priv->port_num, mad->mad_hdr.class_version); goto ret; @@ -735,7 +735,7 @@ class = version->method_table[convert_mgmt_class( mad->mad_hdr.mgmt_class)]; if (!class) { - printk(KERN_ERR "MAD received on port %d for class " + printk(KERN_ERR PFX "MAD received on port %d for class " "%d with no client\n", port_priv->port_num, mad->mad_hdr.mgmt_class); goto ret; @@ -754,7 +754,7 @@ /* Make sure MAD base version is understood */ if (mad->mad_hdr.base_version != IB_MGMT_BASE_VERSION) { - printk(KERN_ERR "MAD received with unsupported base " + printk(KERN_ERR PFX "MAD received with unsupported base " "version %d\n", mad->mad_hdr.base_version); goto ret; } @@ -874,7 +874,8 @@ qp_num = wrid.wrid_field.qpn; qpn = convert_qpnum(qp_num); if (qpn == -1) { - printk(KERN_ERR "Packet received on unknown QPN %d\n", qp_num); + printk(KERN_ERR PFX "Packet received on unknown QPN %d\n", + qp_num); ib_mad_post_receive_mad(port_priv, port_priv->qp[qp_num]); return; } @@ -899,7 +900,7 @@ port_priv->recv_posted_mad_count[qpn]--; } else { - printk(KERN_ERR "Receive completion WR ID 0x%Lx on QP %d " + printk(KERN_ERR PFX "Receive completion WR ID 0x%Lx on QP %d " "with no posted receive\n", wc->wr_id, qp_num); spin_unlock_irqrestore(&port_priv->recv_list_lock, flags); ib_mad_post_receive_mad(port_priv, port_priv->qp[qp_num]); @@ -931,8 +932,8 @@ solicited); if (!mad_agent) { spin_unlock_irqrestore(&port_priv->reg_lock, flags); - printk(KERN_NOTICE "No matching mad agent found for received " - "MAD on port %d\n", port_priv->port_num); + printk(KERN_NOTICE PFX "No matching mad agent found for " + "received MAD on port %d\n", port_priv->port_num); } else { atomic_inc(&mad_agent->refcount); spin_unlock_irqrestore(&port_priv->reg_lock, flags); @@ -1025,8 +1026,8 @@ /* Completion corresponds to first entry on posted MAD send list */ spin_lock_irqsave(&port_priv->send_list_lock, flags); if (list_empty(&port_priv->send_posted_mad_list)) { - printk(KERN_ERR "Send completion WR ID 0x%Lx but send list " - "is empty\n", wc->wr_id); + printk(KERN_ERR PFX "Send completion WR ID 0x%Lx but send " + "list is empty\n", wc->wr_id); goto error; } @@ -1037,7 +1038,7 @@ mad_send_wr = container_of(send_wr, struct ib_mad_send_wr_private, send_list); if (wc->wr_id != (unsigned long)mad_send_wr) { - printk(KERN_ERR "Send completion WR ID 0x%Lx doesn't match " + printk(KERN_ERR PFX "Send completion WR ID 0x%Lx doesn't match " "posted send WR ID 0x%lx\n", wc->wr_id, (unsigned long)mad_send_wr); @@ -1068,26 +1069,26 @@ ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP); while (ib_poll_cq(port_priv->cq, 1, &wc) == 1) { - printk(KERN_DEBUG "Completion opcode 0x%x WRID 0x%Lx\n", + printk(KERN_DEBUG 
PFX "Completion opcode 0x%x WRID 0x%Lx\n", wc.opcode, wc.wr_id); switch (wc.opcode) { case IB_WC_SEND: if (wc.status != IB_WC_SUCCESS) - printk(KERN_ERR "Send completion error %d\n", + printk(KERN_ERR PFX "Send completion error %d\n", wc.status); ib_mad_send_done_handler(port_priv, &wc); break; case IB_WC_RECV: if (wc.status != IB_WC_SUCCESS) - printk(KERN_ERR "Recv completion error %d\n", + printk(KERN_ERR PFX "Recv completion error %d\n", wc.status); ib_mad_recv_done_handler(port_priv, &wc); break; default: - printk(KERN_ERR "Wrong Opcode 0x%x on completion\n", + printk(KERN_ERR PFX "Wrong Opcode 0x%x on completion\n", wc.opcode); if (wc.status) { - printk(KERN_ERR "Completion error %d\n", + printk(KERN_ERR PFX "Completion error %d\n", wc.status); } } @@ -1233,7 +1234,7 @@ port_priv->device->name, port_priv->port_num); if (IS_ERR(port_priv->mad_thread)) { - printk(KERN_ERR "Couldn't start ib_mad thread for %s port %d\n", + printk(KERN_ERR PFX "Couldn't start ib_mad thread for %s port %d\n", port_priv->device->name, port_priv->port_num); return PTR_ERR(port_priv->mad_thread); } @@ -1264,7 +1265,7 @@ qpn = convert_qpnum(qp->qp_num); if (qpn == -1) { - printk(KERN_ERR "Post receive to invalid QPN %d\n", qp->qp_num); + printk(KERN_ERR PFX "Post receive to invalid QPN %d\n", qp->qp_num); return -EINVAL; } @@ -1279,7 +1280,7 @@ (in_atomic() || irqs_disabled()) ? GFP_ATOMIC : GFP_KERNEL); if (!mad_priv) { - printk(KERN_ERR "No memory for receive buffer\n"); + printk(KERN_ERR PFX "No memory for receive buffer\n"); return -ENOMEM; } @@ -1326,7 +1327,7 @@ spin_unlock_irqrestore(&port_priv->recv_list_lock, flags); kmem_cache_free(ib_mad_cache, mad_priv); - printk(KERN_NOTICE "ib_post_recv WRID 0x%Lx failed ret = %d\n", + printk(KERN_NOTICE PFX "ib_post_recv WRID 0x%Lx failed ret = %d\n", recv_wr.wr_id, ret); return -EINVAL; } @@ -1345,8 +1346,8 @@ for (j = 0; j < IB_MAD_QPS_CORE; j++) { if (ib_mad_post_receive_mad(port_priv, port_priv->qp[j])) { - printk(KERN_ERR "receive post %d failed on %s " - "port %d\n", i + 1, + printk(KERN_ERR PFX "receive post %d failed " + "on %s port %d\n", i + 1, port_priv->device->name, port_priv->port_num); } @@ -1425,7 +1426,7 @@ attr = kmalloc(sizeof *attr, GFP_KERNEL); if (!attr) { - printk(KERN_ERR "Couldn't allocate memory for ib_qp_attr\n"); + printk(KERN_ERR PFX "Couldn't allocate memory for ib_qp_attr\n"); return -ENOMEM; } @@ -1445,7 +1446,7 @@ ret = ib_modify_qp(qp, attr, attr_mask, &qp_cap); kfree(attr); - printk(KERN_DEBUG "ib_mad_change_qp_state_to_init ret = %d\n", ret); + printk(KERN_DEBUG PFX "ib_mad_change_qp_state_to_init ret = %d\n", ret); return ret; } @@ -1461,7 +1462,7 @@ attr = kmalloc(sizeof *attr, GFP_KERNEL); if (!attr) { - printk(KERN_ERR "Couldn't allocate memory for ib_qp_attr\n"); + printk(KERN_ERR PFX "Couldn't allocate memory for ib_qp_attr\n"); return -ENOMEM; } @@ -1471,7 +1472,7 @@ ret = ib_modify_qp(qp, attr, attr_mask, &qp_cap); kfree(attr); - printk(KERN_DEBUG "ib_mad_change_qp_state_to_rtr ret = %d\n", ret); + printk(KERN_DEBUG PFX "ib_mad_change_qp_state_to_rtr ret = %d\n", ret); return ret; } @@ -1487,7 +1488,7 @@ attr = kmalloc(sizeof *attr, GFP_KERNEL); if (!attr) { - printk(KERN_ERR "Couldn't allocate memory for ib_qp_attr\n"); + printk(KERN_ERR PFX "Couldn't allocate memory for ib_qp_attr\n"); return -ENOMEM; } @@ -1498,7 +1499,7 @@ ret = ib_modify_qp(qp, attr, attr_mask, &qp_cap); kfree(attr); - printk(KERN_DEBUG "ib_mad_change_qp_state_to_rts ret = %d\n", ret); + printk(KERN_DEBUG PFX "ib_mad_change_qp_state_to_rts ret = 
%d\n", ret); return ret; } @@ -1514,7 +1515,7 @@ attr = kmalloc(sizeof *attr, GFP_KERNEL); if (!attr) { - printk(KERN_ERR "Couldn't allocate memory for ib_qp_attr\n"); + printk(KERN_ERR PFX "Couldn't allocate memory for ib_qp_attr\n"); return -ENOMEM; } @@ -1524,7 +1525,7 @@ ret = ib_modify_qp(qp, attr, attr_mask, &qp_cap); kfree(attr); - printk(KERN_DEBUG "ib_mad_change_qp_state_to_reset ret = %d\n", ret); + printk(KERN_DEBUG PFX "ib_mad_change_qp_state_to_reset ret = %d\n", ret); return ret; } @@ -1538,7 +1539,7 @@ for (i = 0; i < IB_MAD_QPS_CORE; i++) { ret = ib_mad_change_qp_state_to_init(port_priv->qp[i]); if (ret) { - printk(KERN_ERR "Couldn't change QP%d state to " + printk(KERN_ERR PFX "Couldn't change QP%d state to " "INIT\n", i); return ret; } @@ -1546,27 +1547,27 @@ ret = ib_mad_post_receive_mads(port_priv); if (ret) { - printk(KERN_ERR "Couldn't post receive requests\n"); + printk(KERN_ERR PFX "Couldn't post receive requests\n"); goto error; } ret = ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP); if (ret) { - printk(KERN_ERR "Failed to request completion notification\n"); + printk(KERN_ERR PFX "Failed to request completion notification\n"); goto error; } for (i = 0; i < IB_MAD_QPS_CORE; i++) { ret = ib_mad_change_qp_state_to_rtr(port_priv->qp[i]); if (ret) { - printk(KERN_ERR "Couldn't change QP%d state to " + printk(KERN_ERR PFX "Couldn't change QP%d state to " "RTR\n", i); goto error; } ret = ib_mad_change_qp_state_to_rts(port_priv->qp[i]); if (ret) { - printk(KERN_ERR "Couldn't change QP%d state to " + printk(KERN_ERR PFX "Couldn't change QP%d state to " "RTS\n", i); goto error; } @@ -1578,8 +1579,8 @@ for (i = 0; i < IB_MAD_QPS_CORE; i++) { ret2 = ib_mad_change_qp_state_to_reset(port_priv->qp[i]); if (ret2) { - printk(KERN_ERR "ib_mad_port_start: Couldn't change " - "QP%d state to RESET\n", i); + printk(KERN_ERR PFX "ib_mad_port_start: Couldn't " + "change QP%d state to RESET\n", i); } } @@ -1596,8 +1597,8 @@ for (i = 0; i < IB_MAD_QPS_CORE; i++) { ret = ib_mad_change_qp_state_to_reset(port_priv->qp[i]); if (ret) { - printk(KERN_ERR "ib_mad_port_stop: Couldn't change %s " - "port %d QP%d state to RESET\n", + printk(KERN_ERR PFX "ib_mad_port_stop: Couldn't change " + "%s port %d QP%d state to RESET\n", port_priv->device->name, port_priv->port_num, i); } } @@ -1633,7 +1634,7 @@ } spin_unlock_irqrestore(&ib_mad_port_list_lock, flags); if (port_priv) { - printk(KERN_DEBUG "%s port %d already open\n", + printk(KERN_DEBUG PFX "%s port %d already open\n", device->name, port_num); return 0; } @@ -1641,7 +1642,7 @@ /* Create new device info */ port_priv = kmalloc(sizeof *port_priv, GFP_KERNEL); if (!port_priv) { - printk(KERN_ERR "No memory for ib_mad_port_private\n"); + printk(KERN_ERR PFX "No memory for ib_mad_port_private\n"); return -ENOMEM; } @@ -1659,14 +1660,14 @@ ib_mad_thread_completion_handler, NULL, port_priv, cq_size); if (IS_ERR(port_priv->cq)) { - printk(KERN_ERR "Couldn't create ib_mad CQ\n"); + printk(KERN_ERR PFX "Couldn't create ib_mad CQ\n"); ret = PTR_ERR(port_priv->cq); goto error3; } port_priv->pd = ib_alloc_pd(device); if (IS_ERR(port_priv->pd)) { - printk(KERN_ERR "Couldn't create ib_mad PD\n"); + printk(KERN_ERR PFX "Couldn't create ib_mad PD\n"); ret = PTR_ERR(port_priv->pd); goto error4; } @@ -1674,7 +1675,7 @@ port_priv->mr = ib_reg_phys_mr(port_priv->pd, &buf_list, 1, IB_ACCESS_LOCAL_WRITE, &iova); if (IS_ERR(port_priv->mr)) { - printk(KERN_ERR "Couldn't register ib_mad MR\n"); + printk(KERN_ERR PFX "Couldn't register ib_mad MR\n"); ret = 
PTR_ERR(port_priv->mr); goto error5; } @@ -1694,14 +1695,14 @@ port_priv->qp[i] = ib_create_qp(port_priv->pd, &qp_init_attr, &qp_cap); if (IS_ERR(port_priv->qp[i])) { - printk(KERN_ERR "Couldn't create ib_mad QP%d\n", i); + printk(KERN_ERR PFX "Couldn't create ib_mad QP%d\n", i); ret = PTR_ERR(port_priv->qp[i]); if (i == 0) goto error6; else goto error7; } - printk(KERN_DEBUG "Created ib_mad QP %d\n", + printk(KERN_DEBUG PFX "Created ib_mad QP %d\n", port_priv->qp[i]->qp_num); } @@ -1723,7 +1724,7 @@ ret = ib_mad_port_start(port_priv); if (ret) { - printk(KERN_ERR "Couldn't start port\n"); + printk(KERN_ERR PFX "Couldn't start port\n"); goto error8; } @@ -1768,7 +1769,7 @@ } if (port_priv == NULL) { - printk(KERN_ERR "Port %d not found\n", port_num); + printk(KERN_ERR PFX "Port %d not found\n", port_num); spin_unlock_irqrestore(&ib_mad_port_list_lock, flags); return -ENODEV; } @@ -1797,7 +1798,7 @@ ret = ib_query_device(device, &device_attr); if (ret) { - printk(KERN_ERR "Couldn't query device %s\n", device->name); + printk(KERN_ERR PFX "Couldn't query device %s\n", device->name); goto error_device_query; } @@ -1811,7 +1812,7 @@ for (i = 0; i < num_ports; i++, cur_port++) { ret = ib_mad_port_open(device, cur_port); if (ret) { - printk(KERN_ERR "Couldn't open %s port %d\n", + printk(KERN_ERR PFX "Couldn't open %s port %d\n", device->name, cur_port); goto error_device_open; } @@ -1824,7 +1825,7 @@ cur_port--; ret2 = ib_mad_port_close(device, cur_port); if (ret2) { - printk(KERN_ERR "Couldn't close %s port %d\n", + printk(KERN_ERR PFX "Couldn't close %s port %d\n", device->name, cur_port); } i--; @@ -1841,7 +1842,7 @@ ret = ib_query_device(device, &device_attr); if (ret) { - printk(KERN_ERR "Couldn't query device %s\n", device->name); + printk(KERN_ERR PFX "Couldn't query device %s\n", device->name); goto error_device_query; } @@ -1855,7 +1856,7 @@ for (i = 0; i < num_ports; i++, cur_port++) { ret2 = ib_mad_port_close(device, cur_port); if (ret2) { - printk(KERN_ERR "Couldn't close %s port %d\n", + printk(KERN_ERR PFX "Couldn't close %s port %d\n", device->name, cur_port); if (!ret) ret = ret2; @@ -1881,14 +1882,14 @@ NULL, NULL); if (!ib_mad_cache) { - printk(KERN_ERR "Couldn't create ib_mad cache\n"); + printk(KERN_ERR PFX "Couldn't create ib_mad cache\n"); return -ENOMEM; } INIT_LIST_HEAD(&ib_mad_port_list); if (ib_register_client(&mad_client)) { - printk(KERN_ERR "Couldn't register ib_mad client\n"); + printk(KERN_ERR PFX "Couldn't register ib_mad client\n"); return -EINVAL; } @@ -1900,7 +1901,7 @@ ib_unregister_client(&mad_client); if (kmem_cache_destroy(ib_mad_cache)) { - printk(KERN_DEBUG "Failed to destroy ib_mad cache\n"); + printk(KERN_DEBUG PFX "Failed to destroy ib_mad cache\n"); } } From halr at voltaire.com Mon Oct 11 11:25:22 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 11 Oct 2004 14:25:22 -0400 Subject: [openib-general] [PATCH] ib_smi: Add module names to messages Message-ID: <1097519122.6014.2.camel@hpc-1> ib_smi: Add modules names to messages Index: ib_smi_priv.h =================================================================== --- ib_smi_priv.h (revision 965) +++ ib_smi_priv.h (working copy) @@ -28,6 +28,8 @@ #include +#define SPFX "ib_sma: " + struct ib_smi_send_wr { struct list_head send_list; struct ib_mad *smp; Index: ib_smi.c =================================================================== --- ib_smi.c (revision 965) +++ ib_smi.c (working copy) @@ -322,7 +322,7 @@ } spin_unlock_irqrestore(&ib_smi_port_list_lock, flags); if (!port_priv) { - 
printk(KERN_ERR "smp_send: no matching MAD agent 0x%x\n", + printk(KERN_ERR SPFX "smp_send: no matching MAD agent 0x%x\n", (unsigned int)mad_agent); return; } @@ -357,7 +357,7 @@ ah = ib_create_ah(mad_agent->qp->pd, &ah_attr); if (IS_ERR(ah)) { - printk(KERN_ERR "No memory for address handle\n"); + printk(KERN_ERR SPFX "No memory for address handle\n"); kfree(smp); return; } @@ -467,7 +467,7 @@ /* Hold lock longer !!! */ spin_unlock_irqrestore(&ib_smi_port_list_lock, flags); if (!port_priv) { - printk(KERN_ERR "smi_send_handler: no matching MAD agent " + printk(KERN_ERR SPFX "smi_send_handler: no matching MAD agent " "0x%x\n", (unsigned int)mad_agent); return; } @@ -476,7 +476,7 @@ spin_lock_irqsave(&port_priv->send_list_lock, flags); if (list_empty(&port_priv->send_posted_smp_list)) { spin_unlock_irqrestore(&port_priv->send_list_lock, flags); - printk(KERN_ERR "Send completion WR ID 0x%Lx but send list " + printk(KERN_ERR SPFX "Send completion WR ID 0x%Lx but send list " "is empty\n", mad_send_wc->wr_id); return; } @@ -535,7 +535,7 @@ } spin_unlock_irqrestore(&ib_smi_port_list_lock, flags); if (port_priv) { - printk(KERN_DEBUG "%s port %d already open\n", + printk(KERN_DEBUG SPFX "%s port %d already open\n", device->name, port_num); return 0; } @@ -543,7 +543,7 @@ /* Create new device info */ port_priv = kmalloc(sizeof *port_priv, GFP_KERNEL); if (!port_priv) { - printk(KERN_ERR "No memory for ib_smi_port_private\n"); + printk(KERN_ERR SPFX "No memory for ib_smi_port_private\n"); ret = -ENOMEM; goto error1; } @@ -606,7 +606,7 @@ &buf_list, 1, IB_ACCESS_LOCAL_WRITE, &iova); if (IS_ERR(port_priv->mr)) { - printk(KERN_ERR "Couldn't register MR\n"); + printk(KERN_ERR SPFX "Couldn't register MR\n"); ret = PTR_ERR(port_priv->mr); goto error5; } @@ -644,7 +644,7 @@ } if (port_priv == NULL) { - printk(KERN_ERR "Port %d not found\n", port_num); + printk(KERN_ERR SPFX "Port %d not found\n", port_num); spin_unlock_irqrestore(&ib_smi_port_list_lock, flags); return -ENODEV; } @@ -668,7 +668,7 @@ ret = ib_query_device(device, &device_attr); if (ret) { - printk(KERN_ERR "Couldn't query device %s\n", device->name); + printk(KERN_ERR SPFX "Couldn't query device %s\n", device->name); goto error_device_query; } @@ -683,7 +683,7 @@ for (i = 0; i < num_ports; i++, cur_port++) { ret = ib_smi_port_open(device, cur_port); if (ret) { - printk(KERN_ERR "Couldn't open %s port %d\n", + printk(KERN_ERR SPFX "Couldn't open %s port %d\n", device->name, cur_port); goto error_device_open; } @@ -696,7 +696,7 @@ cur_port--; ret2 = ib_smi_port_close(device, cur_port); if (ret2) { - printk(KERN_ERR "Couldn't close %s port %d\n", + printk(KERN_ERR SPFX "Couldn't close %s port %d\n", device->name, cur_port); } i--; @@ -713,7 +713,7 @@ ret = ib_query_device(device, &device_attr); if (ret) { - printk(KERN_ERR "Couldn't query device %s\n", device->name); + printk(KERN_ERR SPFX "Couldn't query device %s\n", device->name); goto error_device_query; } @@ -727,7 +727,7 @@ for (i = 0; i < num_ports; i++, cur_port++) { ret2 = ib_smi_port_close(device, cur_port); if (ret2) { - printk(KERN_ERR "Couldn't close %s port %d\n", + printk(KERN_ERR SPFX "Couldn't close %s port %d\n", device->name, cur_port); if (!ret) ret = ret2; @@ -748,7 +748,7 @@ { INIT_LIST_HEAD(&ib_smi_port_list); if (ib_register_client(&ib_smi_client)) { - printk(KERN_ERR "Couldn't register ib_smi client\n"); + printk(KERN_ERR SPFX "Couldn't register ib_smi client\n"); return -EINVAL; } From rminnich at lanl.gov Mon Oct 11 11:27:48 2004 From: rminnich at lanl.gov (Ronald 
G. Minnich) Date: Mon, 11 Oct 2004 12:27:48 -0600 (MDT) Subject: [openib-general] InfiniBand incompatible with the Linux kernel? In-Reply-To: <6.1.2.0.2.20041008152933.01f671a8@esmail.cup.hp.com> References: <20041008202247.GA9653@kroah.com> <6.1.2.0.2.20041008152933.01f671a8@esmail.cup.hp.com> Message-ID: On Fri, 8 Oct 2004, Michael Krause wrote: > Spec for free or spec for a price - neither grants anyone rights to any > IP contained within the specifications or on the technologies that > surround the specification. The change in spec cost, while clearly > unfortunate, has no impact on the IP rights. IP rights are defined by > the IBTA membership agreement (just like they are for PCI and any number > of other technologies used within the industry). If you want to > implement a technology, then you have to be a member of the appropriate > organization and agree to the same industry-wide terms that others do. > Hence, this problem is not IB-specific but a fact of life within the > industry. funny, I don't recall these problems with Ethernet. > Again, this is true of many technologies not just IB. For example, if a > company has patents on PCI Express and someone implements a device / > chipset / whatever and they are not part of the PCI-SIG, then they can > be subject to different terms than someone who is a member of the > PCI-SIG. In both cases, the access to specs, etc. has nothing to do > with IP licensing. sorry, I think about protocol software differently than chips. Maybe I'm thinking incorrectly here. ron From halr at voltaire.com Mon Oct 11 11:37:55 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 11 Oct 2004 14:37:55 -0400 Subject: [openib-general] [PATCH] mthca: Display firmware version message Message-ID: <1097519875.6014.15.camel@hpc-1> mthca: Display firmware version message Make message always come out so easy to validate firmware version as this is significant Index: mthca_cmd.c =================================================================== --- mthca_cmd.c (revision 965) +++ mthca_cmd.c (working copy) @@ -525,6 +525,8 @@ MTHCA_GET(dev->fw_start, outbox, QUERY_FW_START_OFFSET); MTHCA_GET(dev->fw_end, outbox, QUERY_FW_END_OFFSET); + printk(KERN_INFO PFX "FW version %012llx\n", + (unsigned long long) dev->fw_ver); mthca_dbg(dev, "FW version %012llx, max_cmds %d\n", (unsigned long long) dev->fw_ver, dev->cmd.max_cmds); mthca_dbg(dev, "FW size %d KB (start %llx, end %llx)\n", From halr at voltaire.com Mon Oct 11 12:01:01 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 11 Oct 2004 15:01:01 -0400 Subject: [openib-general] [PATCH] mthca: mthca_modify_qp should allow any to reset QP state transition Message-ID: <1097521261.6014.19.camel@hpc-1> mthca: mthca_modify_qp should validate attr->qp_state < 0 rather than <=0 so any to reset QP state transition is allowed Index: mthca_qp.c =================================================================== --- mthca_qp.c (revision 965) +++ mthca_qp.c (working copy) @@ -495,7 +495,7 @@ spin_unlock_irq(&qp->lock); if (attr_mask & IB_QP_STATE) { - if (attr->qp_state <= 0 || attr->qp_state > IB_QPS_ERR) + if (attr->qp_state < 0 || attr->qp_state > IB_QPS_ERR) return -EINVAL; new_state = attr->qp_state; } else From roland at topspin.com Mon Oct 11 12:26:06 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 11 Oct 2004 12:26:06 -0700 Subject: [openib-general] [PATCH] mthca: Display firmware version message In-Reply-To: <1097519875.6014.15.camel@hpc-1> (Hal Rosenstock's message of "Mon, 11 Oct 2004 14:37:55 
-0400") References: <1097519875.6014.15.camel@hpc-1> Message-ID: <52pt3pjbu9.fsf@topspin.com> Hal> mthca: Display firmware version message Make message always Hal> come out so easy to validate firmware version as this is Hal> significant Since you can always do # cat /sys/class/infiniband/mthca0/fw_ver 3.2.0 I'm not going to apply this. Thanks, Roland From roland at topspin.com Mon Oct 11 12:27:37 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 11 Oct 2004 12:27:37 -0700 Subject: [openib-general] [PATCH] mthca: mthca_modify_qp should allow any to reset QP state transition In-Reply-To: <1097521261.6014.19.camel@hpc-1> (Hal Rosenstock's message of "Mon, 11 Oct 2004 15:01:01 -0400") References: <1097521261.6014.19.camel@hpc-1> Message-ID: <52lledjbrq.fsf@topspin.com> Hal> mthca: mthca_modify_qp should validate attr->qp_state < 0 Hal> rather than <=0 so any to reset QP state transition is Hal> allowed Good catch, thanks. I applied this. - R. From krause at cup.hp.com Mon Oct 11 13:25:03 2004 From: krause at cup.hp.com (Michael Krause) Date: Mon, 11 Oct 2004 13:25:03 -0700 Subject: [openib-general] InfiniBand incompatible with the Linux kernel? In-Reply-To: References: <20041008202247.GA9653@kroah.com> <6.1.2.0.2.20041008152933.01f671a8@esmail.cup.hp.com> Message-ID: <6.1.2.0.2.20041011132033.034b74b0@esmail.cup.hp.com> At 11:27 AM 10/11/2004, Ronald G. Minnich wrote: >On Fri, 8 Oct 2004, Michael Krause wrote: > > > Spec for free or spec for a price - neither grants anyone rights to any > > IP contained within the specifications or on the technologies that > > surround the specification. The change in spec cost, while clearly > > unfortunate, has no impact on the IP rights. IP rights are defined by > > the IBTA membership agreement (just like they are for PCI and any number > > of other technologies used within the industry). If you want to > > implement a technology, then you have to be a member of the appropriate > > organization and agree to the same industry-wide terms that others do. > > Hence, this problem is not IB-specific but a fact of life within the > > industry. > >funny, I don't recall these problems with Ethernet. IEEE requires RAND licensing by companies; it does not license the technology to individuals. So the same problems exist for Ethernet as they do for IB as they do for many technologies. > > Again, this is true of many technologies not just IB. For example, if a > > company has patents on PCI Express and someone implements a device / > > chipset / whatever and they are not part of the PCI-SIG, then they can > > be subject to different terms than someone who is a member of the > > PCI-SIG. In both cases, the access to specs, etc. has nothing to do > > with IP licensing. > >sorry, I think about protocol software differently than chips. Maybe I'm >thinking incorrectly here. There are software patents in all of these protocol off-load technologies just like there are hardware patents. As people start to use embedded processors for various communication workloads, the line between software and hardware patents will blur even further. Mike -------------- next part -------------- An HTML attachment was scrubbed... URL: From halr at voltaire.com Tue Oct 12 06:22:10 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 12 Oct 2004 09:22:10 -0400 Subject: [openib-general] SMI comments Message-ID: <1097587330.31469.22.camel@hpc-1> Hi again, I have another comment on the return value from smi_handle_dr_smp_send. In most cases, it means the output port.
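A minimal sketch of an alternative shape for such a helper -- the enum, helper name, and signature below are invented for illustration and are not the smi_handle_dr_smp_send interface actually in the tree -- would hand back an explicit verdict and report the egress port through a separate out-parameter, so a single integer never has to carry two meanings:

    /* Hypothetical sketch only, not code from the OpenIB repository.
     * Assumes the kernel's u8 and the struct ib_smp definition from the
     * SMI/MAD headers.  Splitting the verdict from the port number lets
     * 0 remain a legal egress port (switch port 0) rather than doubling
     * as the discard indication.
     */
    enum smi_verdict {
            SMI_DISCARD,        /* drop the SMP */
            SMI_FORWARD,        /* send it out of *out_port */
            SMI_LOCAL           /* consume it locally (SM/SMA) */
    };

    static enum smi_verdict smi_route_dr_smp_send(struct ib_smp *smp,
                                                  u8 node_type,
                                                  u8 *out_port)
    {
            /* The C14-13 checks that smi_handle_dr_smp_send performs would
             * go here (node_type selecting the CA/switch/router variants),
             * setting *out_port instead of encoding the port in the return
             * value.  The assignment below is a placeholder, not the real
             * directed-route forwarding rule.
             */
            *out_port = smp->initial_path[smp->hop_ptr];
            return SMI_FORWARD;
    }
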
In other cases, 0 means discard. This is an issue for switches where 0 is a valid port number. As far as the initialization of outgoing and returning DR SMPs goes, the assumption is that this is done outside of this code. The same is true for most of the LID handling, although there need to be more cases and support for non-permissive LIDs. -- Hal -----Forwarded Message----- From: Hal Rosenstock To: Sean Hefty Cc: openib-general at openib.org Subject: [openib-general] SMI comments Date: 08 Oct 2004 14:27:28 -0400 Hi, I started looking at the SMI code in detail and have a few questions/comments below. I am assuming the implementation is based on the IBA 1.1 spec description. It looks to me like there is no SMP initialization code (per 14.2.2.1 and 14.2.2.3). 14.2.2.1 doesn't matter right now as we don't initiate any outgoing SMPs, but 14.2.2.3 does. I think this is why I needed the hop pointer workaround. I also didn't see the LID handling as described in this section of the spec. In terms of switches, the port is the input or output (physical) port and not the port of the MAD registration (base or enhanced port 0). More detailed comments and one or more patches to follow based on these comments. Thanks. -- Hal _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From halr at voltaire.com Tue Oct 12 06:34:50 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 12 Oct 2004 09:34:50 -0400 Subject: [openib-general] [PATCH] ib_smi: Check node type against switch rather than not CA Message-ID: <1097588090.31469.26.camel@hpc-1> ib_smi: Check node type against switch rather than not CA (handles router case as well, in case this happens) Index: ib_smi.c =================================================================== --- ib_smi.c (revision 970) +++ ib_smi.c (working copy) @@ -73,7 +73,7 @@ if (hop_ptr == hop_cnt) { /* smp->return_path set when received */ smp->hop_ptr++; - return (mad_agent->device->node_type != IB_NODE_CA || + return (mad_agent->device->node_type == IB_NODE_SWITCH || smp->dr_dlid == IB_LID_PERMISSIVE); } @@ -178,7 +178,7 @@ smp->return_path[hop_ptr] = mad_agent->port_num; /* smp->hop_ptr updated when sending */ - return (mad_agent->device->node_type != IB_NODE_CA || + return (mad_agent->device->node_type == IB_NODE_SWITCH || smp->dr_dlid == IB_LID_PERMISSIVE); } @@ -210,7 +210,7 @@ return 1; } /* smp->hop_ptr updated when sending */ - return (mad_agent->device->node_type != IB_NODE_CA); + return (mad_agent->device->node_type == IB_NODE_SWITCH); } /* C14-13:4 -- hop_ptr = 0 -> give to SM.
*/ From halr at voltaire.com Tue Oct 12 06:51:26 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 12 Oct 2004 09:51:26 -0400 Subject: [openib-general] [PATCH] ib_smi: PMA does not receive TrapRepress Message-ID: <1097589086.31469.33.camel@hpc-1> ib_smi: PMA does not receive TrapRepress Index: ib_smi.c =================================================================== --- ib_smi.c (revision 971) +++ ib_smi.c (working copy) @@ -592,6 +592,8 @@ /* Obtain MAD agent for PerfMgt class */ reg_req.mgmt_class = IB_MGMT_CLASS_PERF_MGMT; + clear_bit(IB_MGMT_METHOD_TRAP_REPRESS, + (unsigned long *)®_req.method_mask); port_priv->pma_mad_agent = ib_register_mad_agent(device, port_num, IB_QPT_GSI, ®_req, 0, From halr at voltaire.com Tue Oct 12 07:20:45 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 12 Oct 2004 10:20:45 -0400 Subject: [openib-general] [PATCH] ib_smi: In smi_handle_dr_smp_send, fix C14-13:3 handling Message-ID: <1097590845.31469.40.camel@hpc-1> ib_smi: In smi_handle_dr_smp_send, fix C14-13:3 handling Index: ib_smi.c =================================================================== --- ib_smi.c (revision 972) +++ ib_smi.c (working copy) @@ -103,8 +103,8 @@ if (hop_ptr == 1) { smp->hop_ptr--; /* C14-13:3 -- SMPs destined for SM shouldn't be here */- return (mad_agent->device->node_type == IB_NODE_SWITCH && - smp->dr_slid != IB_LID_PERMISSIVE); + return (mad_agent->device->node_type == IB_NODE_SWITCH || + smp->dr_slid == IB_LID_PERMISSIVE); } /* C14-13:4 -- hop_ptr = 0 -> should have gone to SM. */ From halr at voltaire.com Tue Oct 12 09:15:14 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 12 Oct 2004 12:15:14 -0400 Subject: [openib-general] [PATCH] ib_smi: Remove DR handling workaround in smi_send_smp Message-ID: <1097597714.31469.45.camel@hpc-1> ib_smi: Remove DR handling workaround in smi_send_smp Index: ib_smi.c =================================================================== --- ib_smi.c (revision 973) +++ ib_smi.c (working copy) @@ -188,9 +188,12 @@ } else { - /* C14-13:1 -- sender should have decremented hop_ptr */ - if (hop_cnt && hop_ptr == hop_cnt + 1) - return 0; + /* C14-13:1 */ + if (hop_cnt && hop_ptr == hop_cnt + 1) { + smp->hop_ptr--; + return (smp->return_path[smp->hop_ptr] == + mad_agent->port_num); + } /* C14-13:2 */ if (2 <= hop_ptr && hop_ptr <= hop_cnt) { @@ -411,12 +414,11 @@ ret = smi_process_local(mad_agent, (struct ib_mad *)smp, smp_response, slid); if (ret & IB_MAD_RESULT_SUCCESS) { - /* DR handling workaround !!! */ - if (smp_response->mad_hdr.mgmt_class == - IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) { - if (((struct ib_smp *)smp_response)->hop_ptr) { - ((struct ib_smp *)smp_response)->hop_ptr--; - } + if (!smi_handle_smp_recv(mad_agent, + (struct ib_smp *)smp_response)) { + /* SMI failed receive */ + kfree(smp_response); + return 0; } smp_send(mad_agent, smp_response, mad_recv_wc); } else From roland at topspin.com Tue Oct 12 10:20:37 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 12 Oct 2004 10:20:37 -0700 Subject: [openib-general] Questions about SMI Message-ID: <52ekk3j1ju.fsf@topspin.com> I'm reading over the SMI implementation, and I have a few questions. First of all, I don't see how LID-routed SMPs or PMA MADs are handled correctly. 
As far as I can tell, what ends up happening for a LR-SMP or PMA query is: smi_recv_handler() gets MAD and just calls smi_recv_smp(), which will call smi_handle_smp_recv() which just returns 1 so smi_recv_smp() goes on to call smi_check_forward_smp(), which just returns 1 so smi_recv_smp() calls smi_send_smp() etc. Am I wrong or is this completely b0rken? Also, it seems very confusing that PMA queries end up going through functions named "smi_recv_handler()" and so on (since they are using GSI, not SMI). In the same vein, having port_priv->mad_agent and port_priv->mad_agent2 is less than enlightening; it would make more sense to use either more descriptive names or an array of agents. Finally, what's the plan for handling device-specific things like Tavor-generated MADs (to recap: when Tavor needs to send a trap to the SM, the trap shows up as a receive on the local QP0 with source LID 0; this MAD needs to be forwarded to the SM)? Thanks, Roland From halr at voltaire.com Tue Oct 12 10:39:07 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 12 Oct 2004 13:39:07 -0400 Subject: [openib-general] Questions about SMI In-Reply-To: <52ekk3j1ju.fsf@topspin.com> References: <52ekk3j1ju.fsf@topspin.com> Message-ID: <1097602747.31469.54.camel@hpc-1> On Tue, 2004-10-12 at 13:20, Roland Dreier wrote: > I'm reading over the SMI implementation, and I have a few questions. > > First of all, I don't see how LID-routed SMPs or PMA MADs are handled > correctly. As far as I can tell, what ends up happening for a LR-SMP > or PMA query is: > > smi_recv_handler() gets MAD and just calls > smi_recv_smp(), which will call > smi_handle_smp_recv() which just returns 1 > so smi_recv_smp() goes on to call > smi_check_forward_smp(), which just returns 1 > so smi_recv_smp() calls smi_send_smp() > > etc. > > Am I wrong or is this completely b0rken? No, it's not broken (of course depending on one's definition) and yes, this can be bypassed and is on my TODO list. I have been focusing more on functionality right now. > Also, it seems very confusing that PMA queries end up going through > functions named "smi_recv_handler()" and so on (since they are using > GSI, not SMI). Understand the evolution of the code. It started with just being DR SMP and evolved to support LR SMP and then PerfMgt. I will change the routine names and code flow so it is more intuitive. > In the same vein, having port_priv->mad_agent and > port_priv->mad_agent2 is less than enlightening; it would make more > sense to use either more descriptive names or an array of agents. OK. > Finally, what's the plan for handling device-specific things like > Tavor-generated MADs (to recap: when Tavor needs to send a trap to the > SM, the trap shows up as a receive on the local QP0 with source LID 0; > this MAD needs to be forwarded to the SM)? How important is this ? Things seem to run fine without it right now as far as I can tell. In any case, I can make this the next item on my list (putting the code restructure and renaming as well as more SMI work below this in priority). I should be able to start on this later today or early tomorrow. 
-- Hal From roland at topspin.com Tue Oct 12 10:57:03 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 12 Oct 2004 10:57:03 -0700 Subject: [openib-general] Questions about SMI In-Reply-To: <1097602747.31469.54.camel@hpc-1> (Hal Rosenstock's message of "Tue, 12 Oct 2004 13:39:07 -0400") References: <52ekk3j1ju.fsf@topspin.com> <1097602747.31469.54.camel@hpc-1> Message-ID: <52acurizv4.fsf@topspin.com> Hal> No, it's not broken (of course depending on one's definition) Hal> and yes, this can be bypassed and is on my TODO list. I have Hal> been focusing more on functionality right now. Oh, OK, I see. smi_send_smp() is what actually generates the response and sends it back. Sorry I didn't follow the code far enough. By the way, what do you mean by "bypassing" this? Roland> Finally, what's the plan for handling device-specific Roland> things like Tavor-generated MADs (to recap: when Tavor Roland> needs to send a trap to the SM, the trap shows up as a Roland> receive on the local QP0 with source LID 0; this MAD needs Roland> to be forwarded to the SM)? Hal> How important is this ? Things seem to run fine without it Hal> right now as far as I can tell. Yes, things will usually work fine since SM traps are not used for things like subnet discovery, so it's not that important for now. However, if something happens to cause the Tavor to start sending out a trap, it will continue to send traps forever (since the trap is not being forwarded to anyone who will generate a trap repress). Also, I don't think it's possible to pass compliance testing without this. In any case, my question is really about the architecture to handle this, since it can't be done by registering an agent (someone, probably the low-level driver, needs to look at SMPs before agent dispatch and check the SLID to decide what to do with it). Thanks, Roland From halr at voltaire.com Tue Oct 12 11:52:04 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 12 Oct 2004 14:52:04 -0400 Subject: [openib-general] Questions about SMI In-Reply-To: <52acurizv4.fsf@topspin.com> References: <52ekk3j1ju.fsf@topspin.com> <1097602747.31469.54.camel@hpc-1> <52acurizv4.fsf@topspin.com> Message-ID: <1097607124.31469.97.camel@hpc-1> On Tue, 2004-10-12 at 13:57, Roland Dreier wrote: > Hal> No, it's not broken (of course depending on one's definition) > Hal> and yes, this can be bypassed and is on my TODO list. I have > Hal> been focusing more on functionality right now. > > Oh, OK, I see. smi_send_smp() is what actually generates the response > and sends it back. Sorry I didn't follow the code far enough. By the > way, what do you mean by "bypassing" this? The choice is to either change the routine names so not just smi or smp or to pick off the 2 classes (LR SMPs and PerfMgt) and just handle them a little more "directly". That's all. > Roland> Finally, what's the plan for handling device-specific > Roland> things like Tavor-generated MADs (to recap: when Tavor > Roland> needs to send a trap to the SM, the trap shows up as a > Roland> receive on the local QP0 with source LID 0; this MAD needs > Roland> to be forwarded to the SM)? > > Hal> How important is this ? Things seem to run fine without it > Hal> right now as far as I can tell. > > Yes, things will usually work fine since SM traps are not used for > things like subnet discovery, so it's not that important for now. 
> However, if something happens to cause the Tavor to start sending out > a trap, it will continue to send traps forever (since the trap is not > being forwarded to anyone who will generate a trap repress). I'm not sure the traps get out on the wire right now so it may just be a local issue right now... > Also, I don't think it's possible to pass compliance testing without this. Yes, there is at least one cause Trap to be generated at DUT and send TrapRepress to DUT case in the compliance suite if I recall correctly. > In any case, my question is really about the architecture to handle > this, since it can't be done by registering an agent I'm missing something here. Do these traps come in as normal receives (to the SMA) ? If so, can't they be identified from other receives (e.g. SLID = 0 (perhaps other parameters)) ? If they don't come in as normal receives, can they be "made into" receives ? > (someone,probably the low-level driver, needs to look at SMPs before agent > dispatch and check the SLID to decide what to do with it). Thanks. -- Hal From roland at topspin.com Tue Oct 12 11:58:49 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 12 Oct 2004 11:58:49 -0700 Subject: [openib-general] Questions about SMI In-Reply-To: <1097607124.31469.97.camel@hpc-1> (Hal Rosenstock's message of "Tue, 12 Oct 2004 14:52:04 -0400") References: <52ekk3j1ju.fsf@topspin.com> <1097602747.31469.54.camel@hpc-1> <52acurizv4.fsf@topspin.com> <1097607124.31469.97.camel@hpc-1> Message-ID: <524qkzix06.fsf@topspin.com> Hal> I'm not sure the traps get out on the wire right now so it Hal> may just be a local issue right now... They are sent by the ib_mad_send() in mthca_mad.c that the patch in your tree comments out. Hal> I'm missing something here. Do these traps come in as normal Hal> receives (to the SMA) ? If so, can't they be identified from Hal> other receives (e.g. SLID = 0 (perhaps other parameters)) ? Hal> If they don't come in as normal receives, can they be "made Hal> into" receives ? They arrive as normal receives but with SLID = 0. The question I have is where the check for SMP trap with SLID == 0 will be made. It's not possible for someone to register a MAD agent for SMP traps to handle this, because the SM itself needs to get all normal SMP traps. So this special case needs to be handled before agent dispatch is done. Also, since this is really just a Tavor/Arbel-specific quirk, it would be better to find a generic way for low-level drivers to hook into the MAD code and handle this. - R. From halr at voltaire.com Tue Oct 12 12:56:12 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 12 Oct 2004 15:56:12 -0400 Subject: [openib-general] Questions about SMI In-Reply-To: <524qkzix06.fsf@topspin.com> References: <52ekk3j1ju.fsf@topspin.com> <1097602747.31469.54.camel@hpc-1> <52acurizv4.fsf@topspin.com> <1097607124.31469.97.camel@hpc-1> <524qkzix06.fsf@topspin.com> Message-ID: <1097610972.31469.105.camel@hpc-1> On Tue, 2004-10-12 at 14:58, Roland Dreier wrote: > They arrive as normal receives but with SLID = 0. The question I have > is where the check for SMP trap with SLID == 0 will be made. It's not > possible for someone to register a MAD agent for SMP traps to handle > this, because the SM itself needs to get all normal SMP traps. So > this special case needs to be handled before agent dispatch is done. Sounds like the driver is the only place where this can be done then. 
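(Side note: the identification test under discussion comes down to a couple of header checks. The sketch below is not code from this thread; the helper name is invented, and the field and constant names are the ones used in the mthca_mad.c code quoted later in the thread.)

/* Sketch only: does this receive look like a trap the local HCA
 * generated itself?  Locally sourced traps arrive with source LID 0.
 * Which management classes to match (SM traps only, or BM traps as
 * well) is part of the discussion that follows.
 */
static inline int is_locally_generated_trap(struct ib_mad *mad, u16 slid)
{
	return slid == 0 &&
	       mad->mad_hdr.method == IB_MGMT_METHOD_TRAP &&
	       mad->mad_hdr.mgmt_class != IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE;
}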
> Also, since this is really just a Tavor/Arbel-specific quirk, it would > be better to find a generic way for low-level drivers to hook into the > MAD code and handle this. I can provide an ib_mad_send routine that takes a MAD pointer similar to what is there now. Would that work for you ? Is that generic enough ? The only issue will be for the MAD layer to deal with the send completion differently than sends posted by agents of the MAD layer. That should be straightforward to resolve. -- Hal From bohra at cs.rutgers.edu Tue Oct 12 13:37:36 2004 From: bohra at cs.rutgers.edu (Aniruddha Bohra) Date: Tue, 12 Oct 2004 16:37:36 -0400 Subject: [openib-general] Connection from a kernel module Message-ID: <416C4090.1060100@cs.rutgers.edu> Hello all, I am writing a VAPI based kernel module for Tavor HCAs. AFAIK, the CM service cannot be used from within a kernel module. Is there a way to create a connection between two in-kernel modules? Any pointers would be invaluable. Thanks Aniruddha PS - I am not subscribed to this list. Please include me in the cc. From roland at topspin.com Tue Oct 12 15:57:17 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 12 Oct 2004 15:57:17 -0700 Subject: [openib-general] Questions about SMI In-Reply-To: <1097610972.31469.105.camel@hpc-1> (Hal Rosenstock's message of "Tue, 12 Oct 2004 15:56:12 -0400") References: <52ekk3j1ju.fsf@topspin.com> <1097602747.31469.54.camel@hpc-1> <52acurizv4.fsf@topspin.com> <1097607124.31469.97.camel@hpc-1> <524qkzix06.fsf@topspin.com> <1097610972.31469.105.camel@hpc-1> Message-ID: <52u0szh7ea.fsf@topspin.com> Hal> Sounds like the driver is the only place where this can be done then. Do you mean that the mthca driver should snoop on receive completions on QP0 and look for traps with SLID==0? This seems pretty awkward (since the MAD layer is going to lose those receive completions, how do we handle freeing the MAD memory, etc., etc) and doesn't really seem practical. The last time we discussed this I thought we reached the conclusion that the MAD layer should give the low-level driver a look at every MAD before doing agent dispatch, eg: http://article.gmane.org/gmane.linux.drivers.openib/4264 - R. From halr at voltaire.com Wed Oct 13 03:45:05 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 13 Oct 2004 06:45:05 -0400 Subject: [openib-general] Questions about SMI In-Reply-To: <52u0szh7ea.fsf@topspin.com> References: <52ekk3j1ju.fsf@topspin.com> <1097602747.31469.54.camel@hpc-1> <52acurizv4.fsf@topspin.com> <1097607124.31469.97.camel@hpc-1> <524qkzix06.fsf@topspin.com> <1097610972.31469.105.camel@hpc-1> <52u0szh7ea.fsf@topspin.com> Message-ID: <1097664305.2751.20.camel@localhost.localdomain> On Tue, 2004-10-12 at 18:57, Roland Dreier wrote: > Do you mean that the mthca driver should snoop on receive completions > on QP0 and look for traps with SLID==0? This seems pretty awkward > (since the MAD layer is going to lose those receive completions, how > do we handle freeing the MAD memory, etc., etc) and doesn't really > seem practical. The last time we discussed this I thought we reached > the conclusion that the MAD layer should give the low-level driver a > look at every MAD before doing agent dispatch, eg: > http://article.gmane.org/gmane.linux.drivers.openib/4264 It seems like the thread ended with an unanswered question. The answer appears to be that process_mad was used. Is that what we want to do for OpenIB ? Is this to be done for all MADs, all SMPs, or only certain ones ? 
If certain ones, is there a flag to indicate which ones need to be fed to the driver ? Does that avoid the issue of more than 1 consumer for a particular MAD ? That was another "rule" that I thought we achieved consensus on. It looks like the consumed flag takes care of this. Would there also need to be a send routine provided (by the MAD layer) which does not require an agent registration ? If so, is there a way to ensure that the driver is the only component to use this routine ? -- Hal From halr at voltaire.com Wed Oct 13 05:27:58 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 13 Oct 2004 08:27:58 -0400 Subject: [openib-general] [PATCH] ib_smi: Make agent names more meaningful Message-ID: <1097670478.2751.29.camel@localhost.localdomain> ib_smi: Make agent names more meaningful Index: ib_smi_priv.h =================================================================== --- ib_smi_priv.h (revision 970) +++ ib_smi_priv.h (working copy) @@ -41,9 +41,9 @@ struct list_head send_posted_smp_list; spinlock_t send_list_lock; int port_num; - struct ib_mad_agent *mad_agent; /* DR SM class */ - struct ib_mad_agent *mad_agent2; /* LR SM class */ - struct ib_mad_agent *pma_mad_agent; /* PerfMgt class */ + struct ib_mad_agent *dr_smp_mad_agent; /* DR SM class */ + struct ib_mad_agent *lr_smp_mad_agent; /* LR SM class */ + struct ib_mad_agent *pma_mad_agent; /* PerfMgt class */ struct ib_mr *mr; u64 wr_id; }; Index: ib_smi.c =================================================================== --- ib_smi.c (revision 974) +++ ib_smi.c (working copy) @@ -316,8 +316,8 @@ /* Find matching MAD agent */ spin_lock_irqsave(&ib_smi_port_list_lock, flags); list_for_each_entry(entry, &ib_smi_port_list, port_list) { - if ((entry->mad_agent == mad_agent) || - (entry->mad_agent2 == mad_agent) || + if ((entry->dr_smp_mad_agent == mad_agent) || + (entry->lr_smp_mad_agent == mad_agent) || (entry->pma_mad_agent == mad_agent)) { port_priv = entry; break; @@ -460,8 +460,8 @@ /* Find matching MAD agent */ spin_lock_irqsave(&ib_smi_port_list_lock, flags); list_for_each_entry(entry, &ib_smi_port_list, port_list) { - if ((entry->mad_agent == mad_agent) || - (entry->mad_agent2 == mad_agent) || + if ((entry->dr_smp_mad_agent == mad_agent) || + (entry->lr_smp_mad_agent == mad_agent) || (entry->pma_mad_agent == mad_agent)) { port_priv = entry; break; @@ -530,7 +530,7 @@ /* First, check if port already open for SMI */ spin_lock_irqsave(&ib_smi_port_list_lock, flags); list_for_each_entry(entry, &ib_smi_port_list, port_list) { - if (entry->mad_agent->device == device && + if (entry->dr_smp_mad_agent->device == device && entry->port_num == port_num) { port_priv = entry; break; @@ -568,27 +568,27 @@ set_bit(IB_MGMT_METHOD_TRAP_REPRESS, (unsigned long *)®_req.method_mask); - port_priv->mad_agent = ib_register_mad_agent(device, port_num, - IB_QPT_SMI, - ®_req, 0, - &smi_send_handler, - &smi_recv_handler, - NULL); - if (IS_ERR(port_priv->mad_agent)) { - ret = PTR_ERR(port_priv->mad_agent); + port_priv->dr_smp_mad_agent = ib_register_mad_agent(device, port_num, + IB_QPT_SMI, + ®_req, 0, + &smi_send_handler, + &smi_recv_handler, + NULL); + if (IS_ERR(port_priv->dr_smp_mad_agent)) { + ret = PTR_ERR(port_priv->dr_smp_mad_agent); goto error2; } /* Obtain MAD agent for LID routed SM class */ reg_req.mgmt_class = IB_MGMT_CLASS_SUBN_LID_ROUTED; - port_priv->mad_agent2 = ib_register_mad_agent(device, port_num, - IB_QPT_SMI, - ®_req, 0, - &smi_send_handler, - &smi_recv_handler, - NULL); - if (IS_ERR(port_priv->mad_agent2)) { - ret = 
PTR_ERR(port_priv->mad_agent2); + port_priv->lr_smp_mad_agent = ib_register_mad_agent(device, port_num, + IB_QPT_SMI, + ®_req, 0, + &smi_send_handler, + &smi_recv_handler, + NULL); + if (IS_ERR(port_priv->lr_smp_mad_agent)) { + ret = PTR_ERR(port_priv->lr_smp_mad_agent); goto error3; } @@ -607,7 +607,7 @@ goto error4; } - port_priv->mr = ib_reg_phys_mr(port_priv->mad_agent->qp->pd, + port_priv->mr = ib_reg_phys_mr(port_priv->dr_smp_mad_agent->qp->pd, &buf_list, 1, IB_ACCESS_LOCAL_WRITE, &iova); if (IS_ERR(port_priv->mr)) { @@ -625,9 +625,9 @@ error5: ib_unregister_mad_agent(port_priv->pma_mad_agent); error4: - ib_unregister_mad_agent(port_priv->mad_agent2); + ib_unregister_mad_agent(port_priv->lr_smp_mad_agent); error3: - ib_unregister_mad_agent(port_priv->mad_agent); + ib_unregister_mad_agent(port_priv->dr_smp_mad_agent); error2: kfree(port_priv); error1: @@ -641,7 +641,7 @@ spin_lock_irqsave(&ib_smi_port_list_lock, flags); list_for_each_entry(entry, &ib_smi_port_list, port_list) { - if (entry->mad_agent->device == device && + if (entry->dr_smp_mad_agent->device == device && entry->port_num == port_num) { port_priv = entry; break; @@ -659,8 +659,8 @@ ib_dereg_mr(port_priv->mr); ib_unregister_mad_agent(port_priv->pma_mad_agent); - ib_unregister_mad_agent(port_priv->mad_agent2); - ib_unregister_mad_agent(port_priv->mad_agent); + ib_unregister_mad_agent(port_priv->lr_smp_mad_agent); + ib_unregister_mad_agent(port_priv->dr_smp_mad_agent); kfree(port_priv); return 0; From halr at voltaire.com Wed Oct 13 06:11:20 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 13 Oct 2004 09:11:20 -0400 Subject: [openib-general] [PATCH] ib_smi: Change from smi to agent names for more code clarity Message-ID: <1097673080.2751.38.camel@localhost.localdomain> ib_smi: Change from smi to agent names for more code clarity Index: ib_smi_priv.h =================================================================== --- ib_smi_priv.h (revision 976) +++ ib_smi_priv.h (working copy) @@ -23,22 +23,22 @@ Copyright (c) 2004 Voltaire Corporation. All rights reserved. */ -#ifndef __IB_SMI_PRIV_H__ -#define __IB_SMI_PRIV_H__ +#ifndef __IB_AGENT_PRIV_H__ +#define __IB_AGENT_PRIV_H__ #include -#define SPFX "ib_sma: " +#define SPFX "ib_agent: " -struct ib_smi_send_wr { +struct ib_agent_send_wr { struct list_head send_list; - struct ib_mad *smp; + struct ib_mad *mad; DECLARE_PCI_UNMAP_ADDR(mapping) }; -struct ib_smi_port_private { +struct ib_agent_port_private { struct list_head port_list; - struct list_head send_posted_smp_list; + struct list_head send_posted_list; spinlock_t send_list_lock; int port_num; struct ib_mad_agent *dr_smp_mad_agent; /* DR SM class */ @@ -48,4 +48,4 @@ u64 wr_id; }; -#endif /* __IB_SMI_PRIV_H__ */ +#endif /* __IB_AGENT_PRIV_H__ */ Index: ib_smi.c =================================================================== --- ib_smi.c (revision 976) +++ ib_smi.c (working copy) @@ -34,8 +34,8 @@ MODULE_AUTHOR("Hal Rosenstock"); -static spinlock_t ib_smi_port_list_lock = SPIN_LOCK_UNLOCKED; -static struct list_head ib_smi_port_list; +static spinlock_t ib_agent_port_list_lock = SPIN_LOCK_UNLOCKED; +static struct list_head ib_agent_port_list; /* * Fixup a directed route SMP for sending. 
Return 0 if the SMP should be @@ -300,12 +300,12 @@ slid, smp, smp_response); } -void smp_send(struct ib_mad_agent *mad_agent, - struct ib_mad *smp, +void mad_send(struct ib_mad_agent *mad_agent, + struct ib_mad *mad, struct ib_mad_recv_wc *mad_recv_wc) { - struct ib_smi_port_private *entry, *port_priv = NULL; - struct ib_smi_send_wr *smi_send_wr; + struct ib_agent_port_private *entry, *port_priv = NULL; + struct ib_agent_send_wr *agent_send_wr; struct ib_sge gather_list; struct ib_send_wr send_wr; struct ib_send_wr *bad_send_wr; @@ -314,8 +314,8 @@ unsigned long flags; /* Find matching MAD agent */ - spin_lock_irqsave(&ib_smi_port_list_lock, flags); - list_for_each_entry(entry, &ib_smi_port_list, port_list) { + spin_lock_irqsave(&ib_agent_port_list_lock, flags); + list_for_each_entry(entry, &ib_agent_port_list, port_list) { if ((entry->dr_smp_mad_agent == mad_agent) || (entry->lr_smp_mad_agent == mad_agent) || (entry->pma_mad_agent == mad_agent)) { @@ -323,21 +323,21 @@ break; } } - spin_unlock_irqrestore(&ib_smi_port_list_lock, flags); + spin_unlock_irqrestore(&ib_agent_port_list_lock, flags); if (!port_priv) { - printk(KERN_ERR SPFX "smp_send: no matching MAD agent 0x%x\n", + printk(KERN_ERR SPFX "mad_send: no matching MAD agent 0x%x\n", (unsigned int)mad_agent); return; } - smi_send_wr = kmalloc(sizeof(*smi_send_wr), GFP_KERNEL); - if (!smi_send_wr) + agent_send_wr = kmalloc(sizeof(*agent_send_wr), GFP_KERNEL); + if (!agent_send_wr) return; - smi_send_wr->smp = smp; + agent_send_wr->mad = mad; /* PCI mapping */ gather_list.addr = pci_map_single(mad_agent->device->dma_device, - smp, + mad, sizeof(struct ib_mad), PCI_DMA_TODEVICE); gather_list.length = sizeof(struct ib_mad); @@ -361,33 +361,33 @@ ah = ib_create_ah(mad_agent->qp->pd, &ah_attr); if (IS_ERR(ah)) { printk(KERN_ERR SPFX "No memory for address handle\n"); - kfree(smp); + kfree(mad); return; } send_wr.wr.ud.ah = ah; - if (smp->mad_hdr.mgmt_class == IB_MGMT_CLASS_PERF_MGMT) { + if (mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_PERF_MGMT) { send_wr.wr.ud.pkey_index = mad_recv_wc->wc->pkey_index; send_wr.wr.ud.remote_qkey = IB_QP1_QKEY; } else { send_wr.wr.ud.pkey_index = 0; /* Should only matter for GMPs */ send_wr.wr.ud.remote_qkey = 0; /* for SMPs */ } - send_wr.wr.ud.mad_hdr = (struct ib_mad_hdr *)smp; + send_wr.wr.ud.mad_hdr = (struct ib_mad_hdr *)mad; send_wr.wr_id = ++port_priv->wr_id; - pci_unmap_addr_set(smp, smi_send_wr->mapping, gather_list.addr); + pci_unmap_addr_set(mad, agent_send_wr->mapping, gather_list.addr); /* Send */ spin_lock_irqsave(&port_priv->send_list_lock, flags); if (ib_post_send_mad(mad_agent, &send_wr, &bad_send_wr)) { pci_unmap_single(mad_agent->device->dma_device, - pci_unmap_addr(smp, smi_send_wr->mapping), + pci_unmap_addr(mad, agent_send_wr->mapping), sizeof(struct ib_mad), PCI_DMA_TODEVICE); } else { - list_add_tail(&smi_send_wr->send_list, - &port_priv->send_posted_smp_list); + list_add_tail(&agent_send_wr->send_list, + &port_priv->send_posted_list); } spin_unlock_irqrestore(&port_priv->send_list_lock, flags); ib_destroy_ah(ah); @@ -420,7 +420,7 @@ kfree(smp_response); return 0; } - smp_send(mad_agent, smp_response, mad_recv_wc); + mad_send(mad_agent, smp_response, mad_recv_wc); } else kfree(smp_response); return 1; @@ -449,17 +449,17 @@ return 1; } -static void smi_send_handler(struct ib_mad_agent *mad_agent, - struct ib_mad_send_wc *mad_send_wc) +static void agent_send_handler(struct ib_mad_agent *mad_agent, + struct ib_mad_send_wc *mad_send_wc) { - struct ib_smi_port_private *entry, *port_priv = 
NULL; - struct ib_smi_send_wr *smi_send_wr; + struct ib_agent_port_private *entry, *port_priv = NULL; + struct ib_agent_send_wr *agent_send_wr; struct list_head *send_wr; unsigned long flags; /* Find matching MAD agent */ - spin_lock_irqsave(&ib_smi_port_list_lock, flags); - list_for_each_entry(entry, &ib_smi_port_list, port_list) { + spin_lock_irqsave(&ib_agent_port_list_lock, flags); + list_for_each_entry(entry, &ib_agent_port_list, port_list) { if ((entry->dr_smp_mad_agent == mad_agent) || (entry->lr_smp_mad_agent == mad_agent) || (entry->pma_mad_agent == mad_agent)) { @@ -468,44 +468,46 @@ } } /* Hold lock longer !!! */ - spin_unlock_irqrestore(&ib_smi_port_list_lock, flags); + spin_unlock_irqrestore(&ib_agent_port_list_lock, flags); if (!port_priv) { - printk(KERN_ERR SPFX "smi_send_handler: no matching MAD agent " + printk(KERN_ERR SPFX "agent_send_handler: no matching MAD agent " "0x%x\n", (unsigned int)mad_agent); return; } /* Completion corresponds to first entry on posted MAD send list */ spin_lock_irqsave(&port_priv->send_list_lock, flags); - if (list_empty(&port_priv->send_posted_smp_list)) { + if (list_empty(&port_priv->send_posted_list)) { spin_unlock_irqrestore(&port_priv->send_list_lock, flags); printk(KERN_ERR SPFX "Send completion WR ID 0x%Lx but send list " "is empty\n", mad_send_wc->wr_id); return; } - smi_send_wr = list_entry(&port_priv->send_posted_smp_list, - struct ib_smi_send_wr, - send_list); - send_wr = smi_send_wr->send_list.next; - smi_send_wr = container_of(send_wr, struct ib_smi_send_wr, send_list); + agent_send_wr = list_entry(&port_priv->send_posted_list, + struct ib_agent_send_wr, + send_list); + send_wr = agent_send_wr->send_list.next; + agent_send_wr = container_of(send_wr, struct ib_agent_send_wr, + send_list); /* Remove from posted send SMP list */ - list_del(&smi_send_wr->send_list); + list_del(&agent_send_wr->send_list); spin_unlock_irqrestore(&port_priv->send_list_lock, flags); /* Unmap PCI */ pci_unmap_single(mad_agent->device->dma_device, - pci_unmap_addr(smi_send_wr->smp, smi_send_wr->mapping), + pci_unmap_addr(agent_send_wr->smp, + agent_send_wr->mapping), sizeof(struct ib_mad), PCI_DMA_TODEVICE); /* Release allocated memory */ - kfree(smi_send_wr->smp); + kfree(agent_send_wr->mad); } -static void smi_recv_handler(struct ib_mad_agent *mad_agent, - struct ib_mad_recv_wc *mad_recv_wc) +static void agent_recv_handler(struct ib_mad_agent *mad_agent, + struct ib_mad_recv_wc *mad_recv_wc) { smi_recv_smp(mad_agent, (struct ib_smp *)mad_recv_wc->recv_buf->mad, @@ -515,7 +517,7 @@ ib_free_recv_mad(mad_recv_wc); } -static int ib_smi_port_open(struct ib_device *device, int port_num) +static int ib_agent_port_open(struct ib_device *device, int port_num) { int ret; u64 iova = 0; @@ -523,20 +525,20 @@ .addr = 0, .size = (unsigned long) high_memory - PAGE_OFFSET }; - struct ib_smi_port_private *entry, *port_priv = NULL; + struct ib_agent_port_private *entry, *port_priv = NULL; struct ib_mad_reg_req reg_req; unsigned long flags; /* First, check if port already open for SMI */ - spin_lock_irqsave(&ib_smi_port_list_lock, flags); - list_for_each_entry(entry, &ib_smi_port_list, port_list) { + spin_lock_irqsave(&ib_agent_port_list_lock, flags); + list_for_each_entry(entry, &ib_agent_port_list, port_list) { if (entry->dr_smp_mad_agent->device == device && entry->port_num == port_num) { port_priv = entry; break; } } - spin_unlock_irqrestore(&ib_smi_port_list_lock, flags); + spin_unlock_irqrestore(&ib_agent_port_list_lock, flags); if (port_priv) { printk(KERN_DEBUG 
SPFX "%s port %d already open\n", device->name, port_num); @@ -546,7 +548,7 @@ /* Create new device info */ port_priv = kmalloc(sizeof *port_priv, GFP_KERNEL); if (!port_priv) { - printk(KERN_ERR SPFX "No memory for ib_smi_port_private\n"); + printk(KERN_ERR SPFX "No memory for ib_agent_port_private\n"); ret = -ENOMEM; goto error1; } @@ -555,7 +557,7 @@ port_priv->port_num = port_num; port_priv->wr_id = 0; spin_lock_init(&port_priv->send_list_lock); - INIT_LIST_HEAD(&port_priv->send_posted_smp_list); + INIT_LIST_HEAD(&port_priv->send_posted_list); /* Obtain MAD agent for directed route SM class */ reg_req.mgmt_class = IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE; @@ -571,8 +573,8 @@ port_priv->dr_smp_mad_agent = ib_register_mad_agent(device, port_num, IB_QPT_SMI, ®_req, 0, - &smi_send_handler, - &smi_recv_handler, + &agent_send_handler, + &agent_recv_handler, NULL); if (IS_ERR(port_priv->dr_smp_mad_agent)) { ret = PTR_ERR(port_priv->dr_smp_mad_agent); @@ -584,8 +586,8 @@ port_priv->lr_smp_mad_agent = ib_register_mad_agent(device, port_num, IB_QPT_SMI, ®_req, 0, - &smi_send_handler, - &smi_recv_handler, + &agent_send_handler, + &agent_recv_handler, NULL); if (IS_ERR(port_priv->lr_smp_mad_agent)) { ret = PTR_ERR(port_priv->lr_smp_mad_agent); @@ -599,8 +601,8 @@ port_priv->pma_mad_agent = ib_register_mad_agent(device, port_num, IB_QPT_GSI, ®_req, 0, - &smi_send_handler, - &smi_recv_handler, + &agent_send_handler, + &agent_recv_handler, NULL); if (IS_ERR(port_priv->pma_mad_agent)) { ret = PTR_ERR(port_priv->pma_mad_agent); @@ -616,9 +618,9 @@ goto error5; } - spin_lock_irqsave(&ib_smi_port_list_lock, flags); - list_add_tail(&port_priv->port_list, &ib_smi_port_list); - spin_unlock_irqrestore(&ib_smi_port_list_lock, flags); + spin_lock_irqsave(&ib_agent_port_list_lock, flags); + list_add_tail(&port_priv->port_list, &ib_agent_port_list); + spin_unlock_irqrestore(&ib_agent_port_list_lock, flags); return 0; @@ -634,13 +636,13 @@ return ret; } -static int ib_smi_port_close(struct ib_device *device, int port_num) +static int ib_agent_port_close(struct ib_device *device, int port_num) { - struct ib_smi_port_private *entry, *port_priv = NULL; + struct ib_agent_port_private *entry, *port_priv = NULL; unsigned long flags; - spin_lock_irqsave(&ib_smi_port_list_lock, flags); - list_for_each_entry(entry, &ib_smi_port_list, port_list) { + spin_lock_irqsave(&ib_agent_port_list_lock, flags); + list_for_each_entry(entry, &ib_agent_port_list, port_list) { if (entry->dr_smp_mad_agent->device == device && entry->port_num == port_num) { port_priv = entry; @@ -650,12 +652,12 @@ if (port_priv == NULL) { printk(KERN_ERR SPFX "Port %d not found\n", port_num); - spin_unlock_irqrestore(&ib_smi_port_list_lock, flags); + spin_unlock_irqrestore(&ib_agent_port_list_lock, flags); return -ENODEV; } list_del(&port_priv->port_list); - spin_unlock_irqrestore(&ib_smi_port_list_lock, flags); + spin_unlock_irqrestore(&ib_agent_port_list_lock, flags); ib_dereg_mr(port_priv->mr); ib_unregister_mad_agent(port_priv->pma_mad_agent); @@ -666,7 +668,7 @@ return 0; } -static void ib_smi_init_device(struct ib_device *device) +static void ib_agent_init_device(struct ib_device *device) { int ret, num_ports, cur_port, i, ret2; struct ib_device_attr device_attr; @@ -686,7 +688,7 @@ } for (i = 0; i < num_ports; i++, cur_port++) { - ret = ib_smi_port_open(device, cur_port); + ret = ib_agent_port_open(device, cur_port); if (ret) { printk(KERN_ERR SPFX "Couldn't open %s port %d\n", device->name, cur_port); @@ -699,7 +701,7 @@ error_device_open: while (i > 0) { 
cur_port--; - ret2 = ib_smi_port_close(device, cur_port); + ret2 = ib_agent_port_close(device, cur_port); if (ret2) { printk(KERN_ERR SPFX "Couldn't close %s port %d\n", device->name, cur_port); @@ -711,7 +713,7 @@ return; } -static void ib_smi_remove_device(struct ib_device *device) +static void ib_agent_remove_device(struct ib_device *device) { int ret, i, num_ports, cur_port, ret2; struct ib_device_attr device_attr; @@ -730,7 +732,7 @@ cur_port = 1; } for (i = 0; i < num_ports; i++, cur_port++) { - ret2 = ib_smi_port_close(device, cur_port); + ret2 = ib_agent_port_close(device, cur_port); if (ret2) { printk(KERN_ERR SPFX "Couldn't close %s port %d\n", device->name, cur_port); @@ -743,27 +745,27 @@ return; } -static struct ib_client ib_smi_client = { - .name = "ib_smi", - .add = ib_smi_init_device, - .remove = ib_smi_remove_device +static struct ib_client ib_agent_client = { + .name = "ib_agent", + .add = ib_agent_init_device, + .remove = ib_agent_remove_device }; -static int __init ib_smi_init(void) +static int __init ib_agent_init(void) { - INIT_LIST_HEAD(&ib_smi_port_list); - if (ib_register_client(&ib_smi_client)) { - printk(KERN_ERR SPFX "Couldn't register ib_smi client\n"); + INIT_LIST_HEAD(&ib_agent_port_list); + if (ib_register_client(&ib_agent_client)) { + printk(KERN_ERR SPFX "Couldn't register ib_agent client\n"); return -EINVAL; } return 0; } -static void __exit ib_smi_exit(void) +static void __exit ib_agent_exit(void) { - ib_unregister_client(&ib_smi_client); + ib_unregister_client(&ib_agent_client); } -module_init(ib_smi_init); -module_exit(ib_smi_exit); +module_init(ib_agent_init); +module_exit(ib_agent_exit); From halr at voltaire.com Wed Oct 13 06:25:30 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 13 Oct 2004 09:25:30 -0400 Subject: [openib-general] [PATCH] ib_smi: Add physical port count into agent private structure Message-ID: <1097673930.2751.45.camel@localhost.localdomain> ib_smi: Add physical port count into agent private structure (Needed by switch SMI to validate port number in initial/return paths) Index: ib_smi_priv.h =================================================================== --- ib_smi_priv.h (revision 977) +++ ib_smi_priv.h (working copy) @@ -41,6 +41,7 @@ struct list_head send_posted_list; spinlock_t send_list_lock; int port_num; + int phys_port_cnt; struct ib_mad_agent *dr_smp_mad_agent; /* DR SM class */ struct ib_mad_agent *lr_smp_mad_agent; /* LR SM class */ struct ib_mad_agent *pma_mad_agent; /* PerfMgt class */ Index: ib_smi.c =================================================================== --- ib_smi.c (revision 977) +++ ib_smi.c (working copy) @@ -517,7 +517,8 @@ ib_free_recv_mad(mad_recv_wc); } -static int ib_agent_port_open(struct ib_device *device, int port_num) +static int ib_agent_port_open(struct ib_device *device, int port_num, + int num_ports) { int ret; u64 iova = 0; @@ -555,6 +556,7 @@ memset(port_priv, 0, sizeof *port_priv); port_priv->port_num = port_num; + port_priv->phys_port_cnt = num_ports; port_priv->wr_id = 0; spin_lock_init(&port_priv->send_list_lock); INIT_LIST_HEAD(&port_priv->send_posted_list); @@ -688,7 +690,7 @@ } for (i = 0; i < num_ports; i++, cur_port++) { - ret = ib_agent_port_open(device, cur_port); + ret = ib_agent_port_open(device, cur_port, num_ports); if (ret) { printk(KERN_ERR SPFX "Couldn't open %s port %d\n", device->name, cur_port); From halr at voltaire.com Wed Oct 13 07:38:05 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 13 Oct 2004 10:38:05 -0400 Subject: [openib-general] 
[PATCH] ib_smi: Add port number into receive SMI checking Message-ID: <1097678285.2751.66.camel@localhost.localdomain> ib_smi: Add port number into receive SMI checking Index: ib_smi.c =================================================================== --- ib_smi.c (revision 981) +++ ib_smi.c (working copy) @@ -148,7 +148,8 @@ * dropped. */ static int smi_handle_dr_smp_recv(struct ib_mad_agent *mad_agent, - struct ib_smp *smp) + struct ib_smp *smp, + int phys_port_cnt) { u8 hop_ptr, hop_cnt; @@ -168,8 +169,7 @@ smp->return_path[hop_ptr] = mad_agent->port_num; /* smp->hop_ptr updated when sending */ - return 1; /*(smp->initial_path[hop_ptr+1] <= - mad_agent->device->phys_port_cnt); */ + return (smp->initial_path[hop_ptr+1] <= phys_port_cnt); } /* C14-9:3 -- We're at the end of the DR segment of path */ @@ -201,8 +201,7 @@ return 0; /* smp->hop_ptr updated when sending */ - return 1; /*(smp->return_path[hop_ptr-1] <= - mad_agent->device->phys_port_cnt); */ + return (smp->return_path[hop_ptr-1] <= phys_port_cnt); } /* C14-13:3 -- We're at the end of the DR segment of path */ @@ -227,12 +226,13 @@ * the spec. Return 0 if the SMP should be dropped. */ static int smi_handle_smp_recv(struct ib_mad_agent *mad_agent, - struct ib_smp *smp) + struct ib_smp *smp, + int phys_port_cnt) { switch (smp->mgmt_class) { case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE: - return smi_handle_dr_smp_recv(mad_agent, smp); + return smi_handle_dr_smp_recv(mad_agent, smp, phys_port_cnt); default: /* LR SM or PerfMgmt classes */ return 1; } @@ -396,7 +396,8 @@ int smi_send_smp(struct ib_mad_agent *mad_agent, struct ib_smp *smp, struct ib_mad_recv_wc *mad_recv_wc, - u16 slid) + u16 slid, + int phys_port_cnt) { struct ib_mad *smp_response; int ret; @@ -415,7 +416,8 @@ smp_response, slid); if (ret & IB_MAD_RESULT_SUCCESS) { if (!smi_handle_smp_recv(mad_agent, - (struct ib_smp *)smp_response)) { + (struct ib_smp *)smp_response, + phys_port_cnt)) { /* SMI failed receive */ kfree(smp_response); return 0; @@ -432,16 +434,17 @@ int smi_recv_smp(struct ib_mad_agent *mad_agent, struct ib_smp *smp, - struct ib_mad_recv_wc *mad_recv_wc) + struct ib_mad_recv_wc *mad_recv_wc, + int phys_port_cnt) { - if (!smi_handle_smp_recv(mad_agent, smp)) { + if (!smi_handle_smp_recv(mad_agent, smp, phys_port_cnt)) { /* SMI failed receive */ return 0; } if (smi_check_forward_smp(mad_agent, smp)) { smi_send_smp(mad_agent, smp, mad_recv_wc, - mad_recv_wc->wc->slid); + mad_recv_wc->wc->slid, phys_port_cnt); return 0; } @@ -509,16 +512,36 @@ static void agent_recv_handler(struct ib_mad_agent *mad_agent, struct ib_mad_recv_wc *mad_recv_wc) { - smi_recv_smp(mad_agent, - (struct ib_smp *)mad_recv_wc->recv_buf->mad, - mad_recv_wc); + struct ib_agent_port_private *entry, *port_priv = NULL; + unsigned long flags; + /* Find matching MAD agent */ + spin_lock_irqsave(&ib_agent_port_list_lock, flags); + list_for_each_entry(entry, &ib_agent_port_list, port_list) { + if ((entry->dr_smp_agent == mad_agent) || + (entry->lr_smp_agent == mad_agent) || + (entry->perf_mgmt_agent == mad_agent)) { + port_priv = entry; + break; + } + } + spin_unlock_irqrestore(&ib_agent_port_list_lock, flags); + if (!port_priv) { + printk(KERN_ERR SPFX "agent_recv_handler: no matching MAD agent 0x%x\n", + (unsigned int)mad_agent); + + } else { + smi_recv_smp(mad_agent, + (struct ib_smp *)mad_recv_wc->recv_buf->mad, + mad_recv_wc, port_priv->phys_port_cnt); + } + /* Free received MAD */ ib_free_recv_mad(mad_recv_wc); } static int ib_agent_port_open(struct ib_device *device, int port_num, - int 
num_ports) + int phys_port_cnt) { int ret; u64 iova = 0; @@ -556,7 +579,7 @@ memset(port_priv, 0, sizeof *port_priv); port_priv->port_num = port_num; - port_priv->phys_port_cnt = num_ports; + port_priv->phys_port_cnt = phys_port_cnt; port_priv->wr_id = 0; spin_lock_init(&port_priv->send_list_lock); INIT_LIST_HEAD(&port_priv->send_posted_list); From roland at topspin.com Wed Oct 13 08:14:42 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 13 Oct 2004 08:14:42 -0700 Subject: [openib-general] Questions about SMI In-Reply-To: <1097664305.2751.20.camel@localhost.localdomain> (Hal Rosenstock's message of "Wed, 13 Oct 2004 06:45:05 -0400") References: <52ekk3j1ju.fsf@topspin.com> <1097602747.31469.54.camel@hpc-1> <52acurizv4.fsf@topspin.com> <1097607124.31469.97.camel@hpc-1> <524qkzix06.fsf@topspin.com> <1097610972.31469.105.camel@hpc-1> <52u0szh7ea.fsf@topspin.com> <1097664305.2751.20.camel@localhost.localdomain> Message-ID: <52ekk2fy59.fsf@topspin.com> Hal> It seems like the thread ended with an unanswered Hal> question. The answer appears to be that process_mad was Hal> used. Is that what we want to do for OpenIB ? That seems simplest to me but I'm not opposed to adding another driver entry point if that makes things simpler. Hal> Is this to be done for all MADs, all SMPs, or only certain Hal> ones ? If certain ones, is there a flag to indicate which Hal> ones need to be fed to the driver ? Again, all MADs seems simplest but I'm not opposed to another method. We need to do more than just SMPs because Tavor also generates Baseboard Management traps that need to be forwarded. If we want to limit which MADs to give to the low-level driver we could add a flag to ib_driver.flags, like "IB_MAD_TRAP_FORWARDING" or something. Hal> Does that avoid the issue of more than 1 consumer for a Hal> particular MAD ? That was another "rule" that I thought we Hal> achieved consensus on. It looks like the consumed flag takes Hal> care of this. Right, if you don't count the low-level driver seeing the MAD first. We can get by with one agent per MAD other than that. Hal> Would there also need to be a send routine provided (by the Hal> MAD layer) which does not require an agent registration ? If Hal> so, is there a way to ensure that the driver is the only Hal> component to use this routine ? I don't see any reason why the driver needs a special send routine -- it should work fine for the driver to create a couple of MAD agents for each port and use them to send MADs. If I'm wrong about this and we need a special send routine, I don't think we have to do anything to prevent other code from using the routine beyond marking it "for low-level driver use only" in a comment -- this is the kernel so we can trust other code not to be malicious. I'm not that concerned about the details of how we do this for now. We can always fix it up later, so I suggest you implement whatever seems easiest to you and I will fix up mthca to match. 
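(For illustration, a send-only registration of the sort suggested above might look like the sketch below. It is not code from this thread: the function names are invented, the argument order follows the ib_register_mad_agent calls quoted elsewhere in the thread, and it relies on the ib_mad patch in the next message, which drops the receive-handler requirement when no registration request is supplied.)

/* Sketch: a send-only MAD agent for a low-level driver that only needs
 * to post MADs (e.g. forwarding locally generated traps).  No mad_reg_req
 * and no recv_handler are passed, so the agent never receives anything.
 */
static void fwd_send_handler(struct ib_mad_agent *agent,
			     struct ib_mad_send_wc *wc)
{
	/* unmap and free the MAD buffer that was posted */
}

static struct ib_mad_agent *fwd_agent_open(struct ib_device *device,
					   u8 port_num)
{
	return ib_register_mad_agent(device, port_num, IB_QPT_SMI,
				     NULL,	/* no reg_req: send-only */
				     0,
				     &fwd_send_handler,
				     NULL,	/* no recv_handler needed */
				     NULL);
}

The driver would keep one such agent per port and hand it any MAD it needs to originate.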
- Roland From halr at voltaire.com Wed Oct 13 09:49:35 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 13 Oct 2004 12:49:35 -0400 Subject: [openib-general] [PATCH] ib_mad: Only validate that receive completion handler is supplied if registration request supplied Message-ID: <1097686175.2751.158.camel@localhost.localdomain> ib_mad: Only validate that receive completion handler is supplied if registration request supplied in ib_register_mad_agent Index: ib_mad.c =================================================================== --- ib_mad.c (revision 970) +++ ib_mad.c (working copy) @@ -116,7 +116,7 @@ goto error1; } - if (!send_handler || !recv_handler) { + if (!send_handler) { ret = ERR_PTR(-EINVAL); goto error1; } @@ -128,7 +128,8 @@ /* Validate MAD registration request if supplied */ if (mad_reg_req) { - if (mad_reg_req->mgmt_class_version >= MAX_MGMT_VERSION) { + if (!recv_handler || + mad_reg_req->mgmt_class_version >= MAX_MGMT_VERSION) { ret = ERR_PTR(-EINVAL); goto error1; } From halr at voltaire.com Wed Oct 13 10:04:01 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 13 Oct 2004 13:04:01 -0400 Subject: [openib-general] Questions about SMI In-Reply-To: <52ekk2fy59.fsf@topspin.com> References: <52ekk3j1ju.fsf@topspin.com> <1097602747.31469.54.camel@hpc-1> <52acurizv4.fsf@topspin.com> <1097607124.31469.97.camel@hpc-1> <524qkzix06.fsf@topspin.com> <1097610972.31469.105.camel@hpc-1> <52u0szh7ea.fsf@topspin.com> <1097664305.2751.20.camel@localhost.localdomain> <52ekk2fy59.fsf@topspin.com> Message-ID: <1097687041.2751.177.camel@localhost.localdomain> On Wed, 2004-10-13 at 11:14, Roland Dreier wrote: > Hal> It seems like the thread ended with an unanswered > Hal> question. The answer appears to be that process_mad was > Hal> used. Is that what we want to do for OpenIB ? > > That seems simplest to me but I'm not opposed to adding another driver > entry point if that makes things simpler. > Hal> Is this to be done for all MADs, all SMPs, or only certain > Hal> ones ? If certain ones, is there a flag to indicate which > Hal> ones need to be fed to the driver ? > > Again, all MADs seems simplest but I'm not opposed to another method. A separate driver entry point (which would be NULL in other device cases) seems slightly cleaner to me. How about: int snoop_mad(struct ib_device *ibdev, u8 port_num, u16 slid, struct ib_mad *mad); > We need to do more than just SMPs because Tavor also generates > Baseboard Management traps that need to be forwarded. Got it: it's any locally generated traps. Do they all show up on QP0 regardless of the class of the trap ? > If we want to limit which MADs to give to the low-level driver > we could add a flag to ib_driver.flags, like "IB_MAD_TRAP_FORWARDING" > or something. This seems to be adding more device specific knowledge (although somewhat abstracted) into the MAD layer receive handler so letting the device driver see them all and decide which it wants is probably better. > Hal> Does that avoid the issue of more than 1 consumer for a > Hal> particular MAD ? That was another "rule" that I thought we > Hal> achieved consensus on. It looks like the consumed flag takes > Hal> care of this. > > Right, if you don't count the low-level driver seeing the MAD first. > We can get by with one agent per MAD other than that. I'm counting the low level driver :-) It depends on what the driver will do with the MAD. There are "ownership" issues on the MAD. 
> Hal> Would there also need to be a send routine provided (by the > Hal> MAD layer) which does not require an agent registration ? If > Hal> so, is there a way to ensure that the driver is the only > Hal> component to use this routine ? > > I don't see any reason why the driver needs a special send routine -- > it should work fine for the driver to create a couple of MAD agents > for each port and use them to send MADs. Right. a MAD registration without a supplied ib_mad_reg_req structure should work. You will need to pick up the latest ib_mad.c to have a shot at this working. I will work on testing this useage in the meantime. > If I'm wrong about this and we need a special send routine, I don't think > we have to do anything > to prevent other code from using the routine beyond marking it "for > low-level driver use only" in a comment -- this is the kernel so we > can trust other code not to be malicious. > > I'm not that concerned about the details of how we do this for now. > We can always fix it up later, so I suggest you implement whatever > seems easiest to you and I will fix up mthca to match. OK. It shouldn't take long to code this up. Testing will take a little longer. Is there any easy way to cause one of these traps to be generated ? Thanks. -- Hal From roland at topspin.com Wed Oct 13 10:27:02 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 13 Oct 2004 10:27:02 -0700 Subject: [openib-general] Questions about SMI In-Reply-To: <1097687041.2751.177.camel@localhost.localdomain> (Hal Rosenstock's message of "Wed, 13 Oct 2004 13:04:01 -0400") References: <52ekk3j1ju.fsf@topspin.com> <1097602747.31469.54.camel@hpc-1> <52acurizv4.fsf@topspin.com> <1097607124.31469.97.camel@hpc-1> <524qkzix06.fsf@topspin.com> <1097610972.31469.105.camel@hpc-1> <52u0szh7ea.fsf@topspin.com> <1097664305.2751.20.camel@localhost.localdomain> <52ekk2fy59.fsf@topspin.com> <1097687041.2751.177.camel@localhost.localdomain> Message-ID: <521xg2fs0p.fsf@topspin.com> Hal> int snoop_mad(struct ib_device *ibdev, Hal> u8 port_num, Hal> u16 slid, Hal> struct ib_mad *mad); That's fine with me. How about the return value -- I would suggest 0 for ignored, non-zero for consumed? So the MAD layer would do: if (device->snoop_mad(...)) { /* free mad */ /* stop processing */ } /* otherwise proceed as usual */ Hal> Got it: it's any locally generated traps. Do they all show up Hal> on QP0 regardless of the class of the trap ? No, I think BM traps arrive on QP1 (although I haven't tested). Hal> I'm counting the low level driver :-) It depends on what the Hal> driver will do with the MAD. There are "ownership" issues on Hal> the MAD. If you count the low-level driver then there can be two consumers per MAD: first snoop_mad() sees it and then agent dispatch gives it to a different consumer. Hal> OK. It shouldn't take long to code this up. Testing will take Hal> a little longer. Is there any easy way to cause one of these Hal> traps to be generated ? I think if you send an SMP with bad M_Key it should generate a trap. - R. 
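(Putting the prototype and the return convention together: with a check added for drivers that leave the hook NULL, the call site agreed on above comes out roughly as the sketch below. The port_priv, wc and recv names follow the ib_mad patch that appears later in the thread; this is a restatement for readability, not the patch itself.)

	/* Give the low-level driver a first look at the MAD, if it
	 * provides a snoop_mad entry point.  A non-zero return means the
	 * driver consumed the MAD and normal agent dispatch is skipped.
	 */
	if (port_priv->device->snoop_mad &&
	    port_priv->device->snoop_mad(port_priv->device,
					 port_priv->port_num,
					 wc->slid,
					 recv->header.recv_buf.mad)) {
		/* free the receive buffer and stop processing */
		goto ret;
	}

	/* otherwise fall through to normal MAD agent dispatch */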
From halr at voltaire.com Wed Oct 13 10:39:52 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 13 Oct 2004 13:39:52 -0400 Subject: [openib-general] Questions about SMI In-Reply-To: <521xg2fs0p.fsf@topspin.com> References: <52ekk3j1ju.fsf@topspin.com> <1097602747.31469.54.camel@hpc-1> <52acurizv4.fsf@topspin.com> <1097607124.31469.97.camel@hpc-1> <524qkzix06.fsf@topspin.com> <1097610972.31469.105.camel@hpc-1> <52u0szh7ea.fsf@topspin.com> <1097664305.2751.20.camel@localhost.localdomain> <52ekk2fy59.fsf@topspin.com> <1097687041.2751.177.camel@localhost.localdomain> <521xg2fs0p.fsf@topspin.com> Message-ID: <1097689192.2751.210.camel@localhost.localdomain> On Wed, 2004-10-13 at 13:27, Roland Dreier wrote: > Hal> int snoop_mad(struct ib_device *ibdev, > Hal> u8 port_num, > Hal> u16 slid, > Hal> struct ib_mad *mad); > > That's fine with me. How about the return value -- I would suggest 0 > for ignored, non-zero for consumed? Sounds good. > So the MAD layer would do: > > if (device->snoop_mad(...)) { > /* free mad */ > /* stop processing */ > } > > /* otherwise proceed as usual */ Almost. The MAD layer would check for a valid snoop_mad routine first. Details... details... > Hal> Got it: it's any locally generated traps. Do they all show up > Hal> on QP0 regardless of the class of the trap ? > > No, I think BM traps arrive on QP1 (although I haven't tested). OK. The MAD layer will push all MADs (on QPs 0 and 1) to snoop_mad if provided. > Hal> Is there any easy way to cause one of these > Hal> traps to be generated ? > > I think if you send an SMP with bad M_Key it should generate a trap. Thanks. I'll give that a shot and see if I can generate one. -- Hal From halr at voltaire.com Wed Oct 13 11:31:19 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 13 Oct 2004 14:31:19 -0400 Subject: [openib-general] ib_mad: Implement snoop_mad in receive handler for locally generated traps Message-ID: <1097692278.2751.273.camel@localhost.localdomain> ib_mad: Implement snoop_mad in receive handler for locally generated traps Index: include/ib_verbs.h =================================================================== --- include/ib_verbs.h (revision 970) +++ include/ib_verbs.h (working copy) @@ -638,10 +638,14 @@ enum ib_mad_result { IB_MAD_RESULT_FAILURE = 0, /* (!SUCCESS is the important flag) */ IB_MAD_RESULT_SUCCESS = 1 << 0, /* MAD was successfully processed */ - IB_MAD_RESULT_REPLY = 1 << 1, /* Reply packet needs to be sent */ - IB_MAD_RESULT_CONSUMED = 1 << 2 /* Packet consumed: stop processing */ + IB_MAD_RESULT_REPLY = 1 << 1 /* Reply packet needs to be sent */ }; +enum ib_snoop_mad_result { + IB_SNOOP_MAD_IGNORED, + IB_SNOOP_MAD_CONSUMED +}; + #define IB_DEVICE_NAME_MAX 64 struct ib_device { @@ -760,6 +764,10 @@ u16 source_lid, struct ib_mad *in_mad, struct ib_mad *out_mad); + int (*snoop_mad)(struct ib_device *device, + u8 port_num, + u16 source_lid, + struct ib_mad *mad); struct class_device class_dev; u8 node_type; Index: access/ib_mad.c =================================================================== --- access/ib_mad.c (revision 984) +++ access/ib_mad.c (working copy) @@ -926,6 +926,16 @@ if (!validate_mad(recv->header.recv_buf.mad, qp_num)) goto ret; + /* Snoop MAD ? 
*/ + if (port_priv->device->snoop_mad) { + if (port_priv->device->snoop_mad(port_priv->device, + port_priv->port_num, + wc->slid, + recv->header.recv_buf.mad)) { + goto ret; + } + } + spin_lock_irqsave(&port_priv->reg_lock, flags); /* Determine corresponding MAD agent for incoming receive MAD */ solicited = solicited_mad(recv->header.recv_buf.mad); From halr at voltaire.com Wed Oct 13 11:36:42 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 13 Oct 2004 14:36:42 -0400 Subject: [openib-general] [PATCH] mthca: Implement snoop_mad] Message-ID: <1097692601.2751.286.camel@localhost.localdomain> mthca: Implement snoop_mad Index: include/ib_verbs.h =================================================================== --- include/ib_verbs.h (revision 970) +++ include/ib_verbs.h (working copy) @@ -654,10 +654,14 @@ enum ib_mad_result { IB_MAD_RESULT_FAILURE = 0, /* (!SUCCESS is the important flag) */ IB_MAD_RESULT_SUCCESS = 1 << 0, /* MAD was successfully processed */ - IB_MAD_RESULT_REPLY = 1 << 1, /* Reply packet needs to be sent */ - IB_MAD_RESULT_CONSUMED = 1 << 2 /* Packet consumed: stop processing */ + IB_MAD_RESULT_REPLY = 1 << 1 /* Reply packet needs to be sent */ }; +enum ib_snoop_mad_result { + IB_SNOOP_MAD_IGNORED, + IB_SNOOP_MAD_CONSUMED +}; + #define IB_DEVICE_NAME_MAX 64 struct ib_device { @@ -771,6 +775,10 @@ u16 source_lid, struct ib_mad *in_mad, struct ib_mad *out_mad); + int (*snoop_mad)(struct ib_device *device, + u8 port_num, + u16 source_lid, + struct ib_mad *mad); struct class_device class_dev; struct kobject ports_parent; Index: hw/mthca/mthca_dev.h =================================================================== --- hw/mthca/mthca_dev.h (revision 970) +++ hw/mthca/mthca_dev.h (working copy) @@ -352,6 +352,10 @@ int mthca_multicast_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid); int mthca_multicast_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid); +int mthca_snoop_mad(struct ib_device *ibdev, + u8 port_num, + u16 slid, + struct ib_mad *mad); int mthca_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num, Index: hw/mthca/mthca_provider.c =================================================================== --- hw/mthca/mthca_provider.c (revision 970) +++ hw/mthca/mthca_provider.c (working copy) @@ -581,6 +581,7 @@ dev->ib_dev.attach_mcast = mthca_multicast_attach; dev->ib_dev.detach_mcast = mthca_multicast_detach; dev->ib_dev.process_mad = mthca_process_mad; + dev->ib_dev.snoop_mad = mthca_snoop_mad; ret = ib_register_device(&dev->ib_dev); if (ret) Index: hw/mthca/mthca_mad.c =================================================================== --- hw/mthca/mthca_mad.c (revision 970) +++ hw/mthca/mthca_mad.c (working copy) @@ -69,33 +69,44 @@ } } -int mthca_process_mad(struct ib_device *ibdev, - int mad_flags, - u8 port_num, - u16 slid, - struct ib_mad *in_mad, - struct ib_mad *out_mad) +int mthca_snoop_mad(struct ib_device *ibdev, + u8 port_num, + u16 slid, + struct ib_mad *mad) { - int err; - u8 status; - /* Forward locally generated traps to the SM */ - if (in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED && - in_mad->mad_hdr.method == IB_MGMT_METHOD_TRAP && + if (mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED && + mad->mad_hdr.method == IB_MGMT_METHOD_TRAP && slid == 0) { struct ib_sm_path sm_path; ib_cached_sm_path_get(ibdev, port_num, &sm_path); if (sm_path.sm_lid) { - in_mad->sqpn = 0; - in_mad->dlid = sm_path.sm_lid; - in_mad->completion_func = NULL; - ib_mad_send(in_mad); + mad->sqpn = 0; + mad->dlid = 
sm_path.sm_lid; + mad->completion_func = NULL; +#if 0 + ib_mad_send(mad); +#else + printk(KERN_ERR "mthca_snoop_mad: ib_mad_send\n"); +#endif } - return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_CONSUMED; + return IB_SNOOP_MAD_CONSUMED; } + return IB_SNOOP_MAD_IGNORED; +} +int mthca_process_mad(struct ib_device *ibdev, + int mad_flags, + u8 port_num, + u16 slid, + struct ib_mad *in_mad, + struct ib_mad *out_mad) +{ + int err; + u8 status; + /* * Only handle SM gets, sets and trap represses for SM class * From roland at topspin.com Wed Oct 13 13:23:16 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 13 Oct 2004 13:23:16 -0700 Subject: [openib-general] [PATCH] mthca: Implement snoop_mad] In-Reply-To: <1097692601.2751.286.camel@localhost.localdomain> (Hal Rosenstock's message of "Wed, 13 Oct 2004 14:36:42 -0400") References: <1097692601.2751.286.camel@localhost.localdomain> Message-ID: <52sm8ie5aj.fsf@topspin.com> Thanks. Your patch was severely whitespace damaged, so I implemented snoop_mad by hand as below. Something about your mailer (Evolution?) or the way you're using it is killing all the tabs and wrapping some lines. In any case this should be pretty much the same as your version. - R. Index: infiniband/include/ib_verbs.h =================================================================== --- infiniband/include/ib_verbs.h (revision 948) +++ infiniband/include/ib_verbs.h (working copy) @@ -654,10 +654,14 @@ enum ib_mad_result { IB_MAD_RESULT_FAILURE = 0, /* (!SUCCESS is the important flag) */ IB_MAD_RESULT_SUCCESS = 1 << 0, /* MAD was successfully processed */ - IB_MAD_RESULT_REPLY = 1 << 1, /* Reply packet needs to be sent */ - IB_MAD_RESULT_CONSUMED = 1 << 2 /* Packet consumed: stop processing */ + IB_MAD_RESULT_REPLY = 1 << 1 /* Reply packet needs to be sent */ }; +enum ib_snoop_mad_result { + IB_SNOOP_MAD_IGNORED, + IB_SNOOP_MAD_CONSUMED +}; + #define IB_DEVICE_NAME_MAX 64 struct ib_device { @@ -771,6 +775,10 @@ u16 source_lid, struct ib_mad *in_mad, struct ib_mad *out_mad); + enum ib_snoop_mad_result (*snoop_mad)(struct ib_device *device, + u8 port_num, + u16 source_lid, + struct ib_mad *mad); struct class_device class_dev; struct kobject ports_parent; Index: infiniband/core/mad_filter.c =================================================================== --- infiniband/core/mad_filter.c (revision 915) +++ infiniband/core/mad_filter.c (working copy) @@ -301,13 +301,6 @@ mad->mad_hdr.mgmt_class, be16_to_cpu(mad->mad_hdr.attr_id)); - /* If the packet was consumed, we don't want to let anyone else look at it. - * This is a special case for hardware (tavor) which uses the input queue - * to generate traps. - */ - if (ret & IB_MAD_RESULT_CONSUMED) - goto no_response; - /* Look at incoming MADs to see if they match any filters. * Outgoing MADs are checked in ib_mad_work_thread(). 
*/ Index: infiniband/hw/mthca/mthca_dev.h =================================================================== --- infiniband/hw/mthca/mthca_dev.h (revision 915) +++ infiniband/hw/mthca/mthca_dev.h (working copy) @@ -358,6 +358,10 @@ u16 slid, struct ib_mad *in_mad, struct ib_mad *out_mad); +enum ib_snoop_mad_result mthca_snoop_mad(struct ib_device *ibdev, + u8 port_num, + u16 slid, + struct ib_mad *mad); static inline struct mthca_dev *to_mdev(struct ib_device *ibdev) { Index: infiniband/hw/mthca/mthca_provider.c =================================================================== --- infiniband/hw/mthca/mthca_provider.c (revision 949) +++ infiniband/hw/mthca/mthca_provider.c (working copy) @@ -581,6 +581,7 @@ dev->ib_dev.attach_mcast = mthca_multicast_attach; dev->ib_dev.detach_mcast = mthca_multicast_detach; dev->ib_dev.process_mad = mthca_process_mad; + dev->ib_dev.snoop_mad = mthca_snoop_mad; ret = ib_register_device(&dev->ib_dev); if (ret) Index: infiniband/hw/mthca/mthca_mad.c =================================================================== --- infiniband/hw/mthca/mthca_mad.c (revision 915) +++ infiniband/hw/mthca/mthca_mad.c (working copy) @@ -79,23 +79,6 @@ int err; u8 status; - /* Forward locally generated traps to the SM */ - if (in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED && - in_mad->mad_hdr.method == IB_MGMT_METHOD_TRAP && - slid == 0) { - struct ib_sm_path sm_path; - - ib_cached_sm_path_get(ibdev, port_num, &sm_path); - if (sm_path.sm_lid) { - in_mad->sqpn = 0; - in_mad->dlid = sm_path.sm_lid; - in_mad->completion_func = NULL; - ib_mad_send(in_mad); - } - - return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_CONSUMED; - } - /* * Only handle SM gets, sets and trap represses for SM class * @@ -154,6 +137,21 @@ return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY; } +enum ib_snoop_mad_result mthca_snoop_mad(struct ib_device *ibdev, + u8 port_num, + u16 slid, + struct ib_mad *mad) +{ + if (mad->mad_hdr.method != IB_MGMT_METHOD_TRAP || + mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE || + slid != 0) + return IB_SNOOP_MAD_IGNORED; + + /* XXX: forward locally generated MAD to SM */ + + return IB_SNOOP_MAD_CONSUMED; +} + /* * Local Variables: * c-file-style: "linux" From halr at voltaire.com Wed Oct 13 13:31:58 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 13 Oct 2004 16:31:58 -0400 Subject: [openib-general] [PATCH] mthca: Implement snoop_mad] In-Reply-To: <52sm8ie5aj.fsf@topspin.com> References: <1097692601.2751.286.camel@localhost.localdomain> <52sm8ie5aj.fsf@topspin.com> Message-ID: <1097699518.2751.327.camel@localhost.localdomain> On Wed, 2004-10-13 at 16:23, Roland Dreier wrote: > Thanks. Your patch was severely whitespace damaged, so I implemented > snoop_mad by hand as below. Something about your mailer (Evolution?) > or the way you're using it is killing all the tabs and wrapping some > lines. Yeah. I'm doing things a little differently today. It's not Evolution. I'll get it straightened out. Thanks for letting me know. > In any case this should be pretty much the same as your > version. Looks good. I will integrate and test tomorrow. I should be able to let you know by early afternoon eastern time. I will also update my patches for your tree :-) As they approach nil... 
-- Hal From libor at topspin.com Wed Oct 13 13:44:20 2004 From: libor at topspin.com (Libor Michalek) Date: Wed, 13 Oct 2004 13:44:20 -0700 Subject: [openib-general] [PATCH] move IP2PR/SDP to the native IPoIB driver In-Reply-To: <52pt3vpoxz.fsf@topspin.com>; from roland@topspin.com on Wed, Oct 06, 2004 at 01:32:56PM -0700 References: <52pt3vpoxz.fsf@topspin.com> Message-ID: <20041013134420.A16145@topspin.com> On Wed, Oct 06, 2004 at 01:32:56PM -0700, Roland Dreier wrote: > I've just committed this patch. It removes the fake ethernet layer > and starts turning IPoIB into a native driver (with addr_len 20 and > type ARPHRD_INFINIBAND). The driver is working pretty well with these > changes, although multicast is not working at all and there are lots > of leaks and races that I still need to fix up. > > ip2pr (and indirectly sdp) are broken by these changes, but Libor has > said he will work on fixing this up. This patch gets ip2pr working on top of the new native ipoib driver that you checked in. The main change was removing the step of looking up the hardware address, from the arp cache or arp packet header, in the ipoib hardware-to-gid table. Instead we now use the gid directly in the arp cache or arp packet header. -Libor Index: infiniband/ulp/Kconfig =================================================================== --- infiniband/ulp/Kconfig (revision 974) +++ infiniband/ulp/Kconfig (working copy) @@ -32,7 +32,7 @@ config INFINIBAND_SDP tristate "Sockets Direct Protocol" - depends on BROKEN && INFINIBAND && INFINIBAND_IPOIB + depends on INFINIBAND && INFINIBAND_IPOIB select INFINIBAND_CM ---help--- Support for Sockets Direct Protocol (SDP). This provides Index: infiniband/ulp/ipoib/Makefile =================================================================== --- infiniband/ulp/ipoib/Makefile (revision 985) +++ infiniband/ulp/ipoib/Makefile (working copy) @@ -3,10 +3,8 @@ -D_NO_DATA_PATH_TRACE obj-$(CONFIG_INFINIBAND_IPOIB) += ib_ipoib.o +obj-$(CONFIG_INFINIBAND_IPOIB) += ib_ip2pr.o -# ip2pr is BROKEN now -# obj-$(CONFIG_INFINIBAND_IPOIB) += ib_ip2pr.o - ib_ipoib-objs := \ ipoib_main.o \ ipoib_ib.o \ Index: infiniband/ulp/ipoib/ip2pr_priv.h =================================================================== --- infiniband/ulp/ipoib/ip2pr_priv.h (revision 985) +++ infiniband/ulp/ipoib/ip2pr_priv.h (working copy) @@ -122,10 +122,10 @@ /* * begin ethernet */ - u8 src_hw[ETH_ALEN]; - u32 src_ip; - u8 dst_hw[ETH_ALEN]; - u32 dst_ip; + struct ip2pr_ipoib_addr src_hw; + u32 src_ip; + struct ip2pr_ipoib_addr dst_hw; + u32 dst_ip; } __attribute__ ((packed)); typedef enum { @@ -193,30 +193,32 @@ * wait for an ARP event to complete. */ struct ip2pr_ipoib_wait { - s8 type; /* ip2pr or gid2pr */ - tIP2PR_PATH_LOOKUP_ID plid; /* request identifier */ - void *func; /* callback function for completion */ - void *arg; /* user argument */ - struct net_device *dev; /* ipoib device */ - tTS_KERNEL_TIMER_STRUCT timer; /* retry timer */ - u8 retry; /* retry counter */ - u8 flags; /* usage flags */ - u8 state; /* current state */ - u8 hw[ETH_ALEN]; /* hardware address */ - u32 src_addr; /* requested address. */ - u32 dst_addr; /* requested address. 
*/ - u32 gw_addr; /* next hop IP address */ - u8 local_rt; /* local route only */ - s32 bound_dev; /* bound device interface */ - tTS_IB_GID src_gid; /* source GID */ - tTS_IB_GID dst_gid; /* destination GID */ - u16 pkey; /* pkey to use */ - tTS_IB_PORT hw_port; /* hardware port */ - struct ib_device *ca; /* hardware HCA */ - u32 prev_timeout; /* timeout value for pending request */ - tTS_IB_CLIENT_QUERY_TID tid; /* path record lookup transactionID */ + s8 type; /* ip2pr or gid2pr */ + tIP2PR_PATH_LOOKUP_ID plid; /* request identifier */ + void *func; /* callback function for completion */ + void *arg; /* user argument */ + u8 state; /* current state */ + u8 flags; /* usage flags */ + tTS_KERNEL_TIMER_STRUCT timer; /* retry timer */ + u8 retry; /* retry counter */ + u32 prev_timeout; /* timeout value for pending request */ + tTS_IB_CLIENT_QUERY_TID tid; /* path record lookup transactionID */ + + u8 local_rt; /* local route only */ + s32 bound_dev; /* bound device interface */ + u32 src_addr; /* requested address. */ + u32 dst_addr; /* requested address. */ + u32 gw_addr; /* next hop IP address */ + + struct net_device *dev; /* ipoib device */ + struct ip2pr_ipoib_addr src_hw; /* source QP/GID */ + struct ip2pr_ipoib_addr dst_hw; /* destination QP/GID */ + u16 pkey; /* pkey to use */ + tTS_IB_PORT hw_port; /* hardware port */ + struct ib_device *ca; /* hardware HCA */ + spinlock_t lock; - struct ip2pr_ipoib_wait *next; /* next element in wait list. */ + struct ip2pr_ipoib_wait *next; /* next element in wait list. */ struct ip2pr_ipoib_wait **p_next; /* previous next element in list */ struct work_struct arp_completion; }; Index: infiniband/ulp/ipoib/ip2pr_link.c =================================================================== --- infiniband/ulp/ipoib/ip2pr_link.c (revision 985) +++ infiniband/ulp/ipoib/ip2pr_link.c (working copy) @@ -687,9 +687,10 @@ /* ip2pr_path_record_complete -- path lookup complete, save result */ static s32 ip2pr_path_record_complete(tTS_IB_CLIENT_QUERY_TID tid, s32 status, struct ib_path_record *path, - s32 remaining, void *arg) + s32 remaining, + void *arg) { - struct ip2pr_ipoib_wait *ipoib_wait = (struct ip2pr_ipoib_wait *) arg; + struct ip2pr_ipoib_wait *ipoib_wait = (struct ip2pr_ipoib_wait *)arg; struct ip2pr_path_element *path_elmt = NULL; s32 result; @@ -703,7 +704,6 @@ tid, ipoib_wait->tid); return -EFAULT; } - /* if */ /* * path lookup is complete */ @@ -714,58 +714,69 @@ remaining); } - /* if */ + TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_INOUT, + "POST: Status <%d> path completion:", status); + TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_INOUT, + "POST: <%p:%d:%04x> <%016llx:%016llx> <%016llx:%016llx>", + ipoib_wait->ca, + ipoib_wait->hw_port, + ipoib_wait->pkey, + be64_to_cpu(ipoib_wait->src_hw.gid.s.high), + be64_to_cpu(ipoib_wait->src_hw.gid.s.low), + be64_to_cpu(ipoib_wait->dst_hw.gid.s.high), + be64_to_cpu(ipoib_wait->dst_hw.gid.s.low)); /* * Save result. 
*/ switch (status) { case -ETIMEDOUT: - if (0 < ipoib_wait->retry--) { - ip2pr_path_timeout++; - ipoib_wait->prev_timeout = (ipoib_wait->prev_timeout * 2); /* backoff */ - if (ipoib_wait->prev_timeout > _tsIp2prLinkRoot.backoff) - ipoib_wait->prev_timeout = - _tsIp2prLinkRoot.backoff; + if (0 == ipoib_wait->retry--) { /* - * reinitiate path record resolution - */ - result = tsIbPathRecordRequest(ipoib_wait->ca, - ipoib_wait->hw_port, - ipoib_wait->src_gid, - ipoib_wait->dst_gid, - ipoib_wait->pkey, - TS_IB_PATH_RECORD_FORCE_REMOTE, - (ipoib_wait-> - prev_timeout * HZ) + - (jiffies & 0x0f), 0, - ip2pr_path_record_complete, - ipoib_wait, - &ipoib_wait->tid); - if (0 != result) { - - TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, - "FUNC: Error initiating path record request. <%d>", - result); - status = result; - goto callback; - } /* if */ - } /* if */ - else { - /* * no more retries, failure. */ goto callback; - } /* else */ + } + + ip2pr_path_timeout++; + ipoib_wait->prev_timeout = (ipoib_wait->prev_timeout * 2); + if (ipoib_wait->prev_timeout > _tsIp2prLinkRoot.backoff) { + + ipoib_wait->prev_timeout = _tsIp2prLinkRoot.backoff; + } + /* + * reinitiate path record resolution + */ + result = tsIbPathRecordRequest(ipoib_wait->ca, + ipoib_wait->hw_port, + ipoib_wait->src_hw.gid.all, + ipoib_wait->dst_hw.gid.all, + ipoib_wait->pkey, + TS_IB_PATH_RECORD_FORCE_REMOTE, + (ipoib_wait->prev_timeout * HZ) + + (jiffies & 0x0f), + 0, + ip2pr_path_record_complete, + ipoib_wait, + &ipoib_wait->tid); + if (0 != result) { + + TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, + "FUNC: Path record request error. <%d>", + result); + + status = result; + goto callback; + } /* if */ + break; case 0: TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_INOUT, - "POST: Path record lookup complete. <%016llx:%016llx:%d>", - be64_to_cpu(*(u64 *) path->dgid), - be64_to_cpu(*(u64 *) - (path->dgid + sizeof(u64))), + "POST: Path record complete. 
<%016llx:%016llx:%d>", + be64_to_cpu(*(u64 *)path->dgid), + be64_to_cpu(*(u64 *)(path->dgid + sizeof(u64))), path->dlid); result = ip2pr_path_element_create(ipoib_wait->dst_addr, @@ -774,25 +785,27 @@ ipoib_wait->ca, path, &path_elmt); if (0 > result) { - + TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, "POST: Error <%d> creating path element.", result); status = result; } - /* if */ + goto callback; default: TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, - "POST: Error <%d> in path record completion.", status); + "POST: Error <%d> in path record completion.", + status); goto callback; - } /* switch */ + } return 0; - callback: +callback: + if (0 < TS_IP2PR_IPOIB_FLAG_GET_FUNC(ipoib_wait)) { result = ip2pr_path_lookup_complete(ipoib_wait->plid, @@ -802,12 +815,14 @@ ipoib_wait->arg); if (0 > result) { - TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, - "PATH: Error <%d> completing Path Record Lookup.", - result); + TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_WARN, + "FUNC: completion error <%d> <%d:%08x>", + result, status, ipoib_wait->dst_addr); } + TS_IP2PR_IPOIB_FLAG_CLR_FUNC(ipoib_wait); } + if (0 < TS_IP2PR_IPOIB_FLAGS_EMPTY(ipoib_wait)) { result = ip2pr_ipoib_wait_destroy(ipoib_wait, @@ -823,7 +838,8 @@ /* ip2pr_link_find_complete -- complete the resolution of an ip address */ static s32 ip2pr_link_find_complete(struct ip2pr_ipoib_wait *ipoib_wait, - s32 status, IP2PR_USE_LOCK use_lock) + s32 status, + IP2PR_USE_LOCK use_lock) { s32 result = 0; s32 expect; @@ -838,39 +854,26 @@ status = -EFAULT; goto done; } - /* if */ + if (0 != status) { goto done; } - /* if */ /* - * lookup real address - */ - result = ipoib_get_gid(ipoib_wait->dev, ipoib_wait->hw, - ipoib_wait->dst_gid); - if (0 > result) { - - TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_WARN, - "FUNC: Error <%d> on shadow cache lookup.", result); - return (0); - } - /* if */ - /* * reset retry counter */ ipoib_wait->retry = _tsIp2prLinkRoot.max_retries; ipoib_wait->prev_timeout = _tsIp2prLinkRoot.retry_timeout; - /* * initiate path record resolution */ spin_lock_irqsave(&ipoib_wait->lock, flags); if (ipoib_wait->state == IP2PR_STATE_ARP_WAIT) { + result = tsIbPathRecordRequest(ipoib_wait->ca, ipoib_wait->hw_port, - ipoib_wait->src_gid, - ipoib_wait->dst_gid, + ipoib_wait->src_hw.gid.all, + ipoib_wait->dst_hw.gid.all, ipoib_wait->pkey, TS_IB_PATH_RECORD_FORCE_REMOTE, (ipoib_wait->prev_timeout * HZ) + @@ -881,21 +884,24 @@ if (0 != result) { TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, - "FUNC: Error initiating path record request. <%d>", + "FUNC: Path record request error. <%d>", result); status = result; spin_unlock_irqrestore(&ipoib_wait->lock, flags); goto done; - } /* if */ + } + ipoib_wait->state = IP2PR_STATE_PATH_WAIT; - } else { + } + else { + TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, "FUNC: Invalid state "); } spin_unlock_irqrestore(&ipoib_wait->lock, flags); return 0; - done: +done: if (0 < TS_IP2PR_IPOIB_FLAG_GET_FUNC(ipoib_wait)) { result = ip2pr_path_lookup_complete(ipoib_wait->plid, @@ -906,19 +912,19 @@ if (0 > result) { TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_WARN, - "FUNC: Error <%d> completing address resolution. 
<%d:%08x>", + "FUNC: completion error <%d> <%d:%08x>", result, status, ipoib_wait->dst_addr); } - /* if */ + TS_IP2PR_IPOIB_FLAG_CLR_FUNC(ipoib_wait); } - /* if */ + if (0 < TS_IP2PR_IPOIB_FLAGS_EMPTY(ipoib_wait)) { - + expect = ip2pr_ipoib_wait_destroy(ipoib_wait, use_lock); TS_EXPECT(MOD_IP2PR, !(0 > expect)); } - /* if */ + return 0; } @@ -928,19 +934,27 @@ struct neighbour *neigh; extern struct neigh_table arp_tbl; - neigh = neigh_lookup(&arp_tbl, &ipoib_wait->dst_addr, ipoib_wait->dev); - if (neigh) { + neigh = neigh_lookup(&arp_tbl, + &ipoib_wait->dst_addr, + ipoib_wait->dev); + if (NULL != neigh) { + read_lock_bh(&neigh->lock); + *state = neigh->nud_state; - memcpy(ipoib_wait->hw, neigh->ha, sizeof(ipoib_wait->hw)); + memcpy(&ipoib_wait->dst_hw, + neigh->ha, + sizeof(ipoib_wait->dst_hw)); + read_unlock_bh(&neigh->lock); - return (0); - } else { - memset(ipoib_wait->hw, 0, sizeof(ipoib_wait->hw)); + return 0; + } + else { + memset(&ipoib_wait->dst_hw, 0, sizeof(ipoib_wait->dst_hw)); *state = 0; - return (-ENOENT); + return -ENOENT; } } @@ -953,30 +967,27 @@ char devname[20]; int i; + struct flowi fl = { + .oif = ipoib_wait->bound_dev, + .nl_u = { + .ip4_u = { + .daddr = ipoib_wait->dst_addr, + .saddr = ipoib_wait->src_addr, + .tos = 0, + } + }, + .proto = 0, + .uli_u = { + .ports = { + .sport = 0, + .dport = 0, + } + } + }; + TS_CHECK_NULL(ipoib_wait, -EINVAL); - { - struct flowi fl = { - .oif = ipoib_wait->bound_dev, /* oif */ - .nl_u = { - .ip4_u = { - .daddr = ipoib_wait->dst_addr, /* dst */ - .saddr = ipoib_wait->src_addr, /* src */ - .tos = 0, /* tos */ - } - }, - .proto = 0, /* protocol */ - .uli_u = { - .ports = { - .sport = 0, /* sport */ - .dport = 0, /* dport */ - } - } - }; - - result = ip_route_output_key(&rt, &fl); - } - + result = ip_route_output_key(&rt, &fl); if (0 > result || NULL == rt) { TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_WARN, @@ -985,7 +996,6 @@ ipoib_wait->local_rt, ipoib_wait->bound_dev); return result; } - /* if */ /* * check route flags */ @@ -994,11 +1004,11 @@ ip_rt_put(rt); return -ENETUNREACH; } - /* if */ /* * check that device is IPoIB */ - if (NULL == rt->u.dst.neighbour || NULL == rt->u.dst.neighbour->dev) { + if (NULL == rt->u.dst.neighbour || + NULL == rt->u.dst.neighbour->dev) { TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_WARN, "FIND: No neighbour found for <%08x:%08x>", @@ -1007,48 +1017,53 @@ result = -EINVAL; goto error; } - /* if */ + if (0 > TS_IP2PR_IPOIB_DEV_TOPSPIN(rt->u.dst.neighbour->dev) && 0 == (IFF_LOOPBACK & rt->u.dst.neighbour->dev->flags)) { TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_WARN, - "FIND: Destination or neighbour device is not IPoIB. <%s:%08x>", + "FIND: Nneighbour device is not IPoIB. <%s:%08x>", rt->u.dst.neighbour->dev->name, rt->u.dst.neighbour->dev->flags); result = -ENETUNREACH; goto error; } - /* if */ + TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_WARN, "FIND: Found neighbour. <%08x:%08x:%08x> nud state <%02x>", rt->rt_src, rt->rt_dst, rt->rt_gateway, rt->u.dst.neighbour->nud_state); - ipoib_wait->gw_addr = rt->rt_gateway; + ipoib_wait->gw_addr = rt->rt_gateway; ipoib_wait->src_addr = rt->rt_src; /* * device needs to be a valid IB device. Check for loopback. */ - ipoib_wait->dev = - ((0 < - (IFF_LOOPBACK & rt->u.dst.neighbour->dev-> - flags)) ? 
ip_dev_find(rt->rt_src) : rt->u.dst.neighbour->dev); + if (0 < (IFF_LOOPBACK & rt->u.dst.neighbour->dev->flags)) { + ipoib_wait->dev = ip_dev_find(rt->rt_src); + } + else { + + ipoib_wait->dev = rt->u.dst.neighbour->dev; + } + if (NULL == ipoib_wait->dev) { + TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_WARN, "network device is null\n", rt->rt_src, ipoib_wait->dev->name); result = -EINVAL; goto error; } - /* * if loopback, check if src device is an ib device. Allow lo device */ if (((0 > TS_IP2PR_IPOIB_DEV_TOPSPIN(ipoib_wait->dev)) && (0 != strncmp(ipoib_wait->dev->name, "lo", 2))) && (IFF_LOOPBACK & rt->u.dst.neighbour->dev->flags)) { + TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_WARN, "<%08x> is loopback, but is on device %s\n", rt->rt_src, ipoib_wait->dev->name); @@ -1061,100 +1076,102 @@ TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_WARN, "dev->flags 0x%lx means loopback", ipoib_wait->dev->flags); - /* * Get ipoib interface which is active */ - for (i = 0; i < (IP2PR_MAX_HCAS * 2); i++) { + for (i = 0, ipoib_wait->dev = NULL; + i < (IP2PR_MAX_HCAS * 2); + i++) { + sprintf(devname, "ib%d", i); - if (NULL != - (ipoib_wait->dev = dev_get_by_name(devname))) { - if (0 < (IFF_UP & ipoib_wait->dev->flags)) { - break; - } + ipoib_wait->dev = dev_get_by_name(devname); + + if (NULL != ipoib_wait->dev && + 0 < (IFF_UP & ipoib_wait->dev->flags)) { + + break; } } - if (IP2PR_MAX_HCAS == i) - ipoib_wait->dev = NULL; } - /* if */ /* * Verify device. */ if (NULL == ipoib_wait->dev) { - + TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_WARN, - "FIND: No device for IB communications <%s:%08x:%08x>", + "FIND: No dev for IB communications <%s:%08x:%08x>", rt->u.dst.neighbour->dev->name, rt->u.dst.neighbour->dev->flags, rt->rt_src); result = -EINVAL; goto error; } - - /* if */ /* * lookup local information */ result = ipoib_device_handle(ipoib_wait->dev, &ipoib_wait->ca, &ipoib_wait->hw_port, - ipoib_wait->src_gid, &ipoib_wait->pkey); - + ipoib_wait->src_hw.gid.all, + &ipoib_wait->pkey); if (0 > result) { TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, - "FUNC: Error <%d> looking up local device information.", + "FUNC: Error <%d> looking up local device info.", result); goto error; } - /* if */ + TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_INOUT, - "FIND: hca <%04x> for port <%02x> gidp <%p>", - ipoib_wait->ca, ipoib_wait->hw_port, ipoib_wait->src_gid); - + "FIND: hca <%04x> for port <%02x> gid <%016llx:%016llx>", + ipoib_wait->ca, + ipoib_wait->hw_port, + be64_to_cpu(ipoib_wait->src_hw.gid.s.high), + be64_to_cpu(ipoib_wait->src_hw.gid.s.low)); /* * if this is a loopback connection, find the local source interface * and get the associated HW address. */ if (rt->u.dst.neighbour->dev->flags & IFF_LOOPBACK) { - memcpy((char *)ipoib_wait->hw, - (char *)ipoib_wait->dev->dev_addr, - sizeof(ipoib_wait->hw)); - } else { + + memcpy(&ipoib_wait->dst_hw, + ipoib_wait->dev->dev_addr, + sizeof(ipoib_wait->dst_hw)); + + goto complete; + } + /* + * Not Lookback. Get the Mac address from arp + */ + result = ip2pr_arp_query(ipoib_wait, &state); + + if (result || + state & NUD_FAILED || + !memcmp(ipoib_wait->dst_hw.gid.all, nullgid, sizeof(nullgid))) { /* - * Not Lookback. Get the Mac address from arp + * No arp entry. 
Create a Wait entry and send + * Arp request */ - result = ip2pr_arp_query(ipoib_wait, &state); - if ((result) || (state & NUD_FAILED) || - ((ipoib_wait->hw[0] == 0) && - (ipoib_wait->hw[1] == 0) && - (ipoib_wait->hw[2] == 0) && - (ipoib_wait->hw[3] == 0) && - (ipoib_wait->hw[4] == 0) && (ipoib_wait->hw[5] == 0))) { - /* - * No arp entry. Create a Wait entry and send Arp request - */ - result = ip2pr_ipoib_wait_list_insert(ipoib_wait); - if (0 > result) { + result = ip2pr_ipoib_wait_list_insert(ipoib_wait); + if (0 > result) { - TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, - TRACE_FLOW_WARN, - "FIND: Error <%d> inserting wait for address resolution.", - result); - goto error; - } - /* if */ - arp_send(ARPOP_REQUEST, - ETH_P_ARP, - rt->rt_gateway, - rt->u.dst.neighbour->dev, - ipoib_wait->src_addr, - NULL, - rt->u.dst.neighbour->dev->dev_addr, NULL); - return (0); + TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_WARN, + "FIND: Error <%d> inserting wait request.", + result); + goto error; } + + arp_send(ARPOP_REQUEST, + ETH_P_ARP, + rt->rt_gateway, + rt->u.dst.neighbour->dev, + ipoib_wait->src_addr, + NULL, + rt->u.dst.neighbour->dev->dev_addr, + NULL); + return 0; } +complete: /* * We have a valid arp entry or this is a loopback interface. */ @@ -1162,17 +1179,17 @@ if (0 > result) { TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_WARN, - "FIND: Error <%d> completing address lookup. <%08x:%08x>", + "FIND: Error <%d> completing lookup. <%08x:%08x>", result, ipoib_wait->src_addr, ipoib_wait->dst_addr); goto error; - } /* if */ - return (0); - error: + } + return 0; +error: return result; } -/** +/* * Arp packet reception for completions */ @@ -1194,33 +1211,39 @@ next_wait = ipoib_wait->next; - if ((ip_addr == ipoib_wait->gw_addr) && - (LOOKUP_IP2PR == ipoib_wait->type)) { + if (ip_addr != ipoib_wait->gw_addr || + LOOKUP_IP2PR != ipoib_wait->type) { - spin_lock(&ipoib_wait->lock); - if ((ipoib_wait->state & IP2PR_STATE_ARP_WAIT) == 0) { - spin_unlock(&ipoib_wait->lock); - ipoib_wait = next_wait; - continue; - } + continue; + } + + spin_lock(&ipoib_wait->lock); + + if ((ipoib_wait->state & IP2PR_STATE_ARP_WAIT) == 0) { + spin_unlock(&ipoib_wait->lock); + ipoib_wait = next_wait; - TS_IP2PR_IPOIB_FLAG_CLR_TASK(ipoib_wait); + continue; + } + spin_unlock(&ipoib_wait->lock); - result = ip2pr_link_find_complete(ipoib_wait, 0, 0); - if (0 > result) { - TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, - TRACE_FLOW_WARN, - "FIND: Error <%d> completing address lookup. <%08x>", - result, ipoib_wait->dst_addr); + TS_IP2PR_IPOIB_FLAG_CLR_TASK(ipoib_wait); - result = ip2pr_ipoib_wait_destroy(ipoib_wait, - IP2PR_LOCK_HELD); - TS_EXPECT(MOD_IP2PR, !(0 > result)); - } + result = ip2pr_link_find_complete(ipoib_wait, 0, 0); + if (0 > result) { + + TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_WARN, + "FIND: Error <%d> completing lookup. <%08x>", + result, ipoib_wait->dst_addr); + + result = ip2pr_ipoib_wait_destroy(ipoib_wait, + IP2PR_LOCK_HELD); + TS_EXPECT(MOD_IP2PR, !(0 > result)); } + ipoib_wait = next_wait; - } /* while */ + } spin_unlock_irqrestore(&_tsIp2prLinkRoot.wait_lock, flags); return; @@ -1261,8 +1284,10 @@ * determine if anyone is waiting for this ARP response. 
*/ spin_lock_irqsave(&_tsIp2prLinkRoot.wait_lock, flags); + for (counter = 0, ipoib_wait = _tsIp2prLinkRoot.wait_list; - NULL != ipoib_wait; ipoib_wait = ipoib_wait->next) { + NULL != ipoib_wait; + ipoib_wait = ipoib_wait->next) { /* skip gid2pr lookup entries */ if (LOOKUP_GID2PR == ipoib_wait->type) { @@ -1273,7 +1298,8 @@ (ipoib_wait->state & IP2PR_STATE_ARP_WAIT)) { TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_INOUT, - "RECV: Arp Recv for <%08x>.", arp_hdr->src_ip); + "RECV: Arp Recv for <%08x>.", + arp_hdr->src_ip); /* * remove timer, before scheduling the task. */ @@ -1281,8 +1307,9 @@ /* * save results */ - memcpy(ipoib_wait->hw, arp_hdr->src_hw, - sizeof(ipoib_wait->hw)); + memcpy(&ipoib_wait->dst_hw, + &arp_hdr->src_hw, + sizeof(ipoib_wait->dst_hw)); /* * flags */ @@ -1299,13 +1326,14 @@ * Schedule the ARP completion. */ if (0 < counter) { - INIT_WORK(tqp, ip2pr_arp_recv_complete, + + INIT_WORK(tqp, + ip2pr_arp_recv_complete, (void *)(unsigned long)arp_hdr->src_ip); schedule_work(tqp); } - /* if */ - done: +done: kfree_skb(skb); return 0; } @@ -1702,7 +1730,7 @@ struct ip2pr_sgid_element *gid_node = NULL; struct ip2pr_gid_pr_element *prn_elmt; - if (ip2pr_src_gid_node_get(ipoib_wait->src_gid, &gid_node)) + if (ip2pr_src_gid_node_get(ipoib_wait->src_hw.gid.all, &gid_node)) return (-EINVAL); prn_elmt = kmem_cache_alloc(_tsIp2prLinkRoot.gid_pr_cache, SLAB_ATOMIC); @@ -1849,29 +1877,32 @@ /* if */ switch (status) { case -ETIMEDOUT: - if (0 < ipoib_wait->retry--) { - ipoib_wait->tid = TS_IB_CLIENT_QUERY_TID_INVALID; - result = tsIbPathRecordRequest(ipoib_wait->ca, - ipoib_wait->hw_port, - ipoib_wait->src_gid, - ipoib_wait->dst_gid, - ipoib_wait->pkey, - TS_IB_PATH_RECORD_FORCE_REMOTE, - TS_IP2PR_DEV_PATH_WAIT, - 0, - gid2pr_complete, - ipoib_wait, - &ipoib_wait->tid); - if (0 > result) { - status = result; - TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, - "PATH: Error <%d> Completing Path Record Request.", - result); - goto callback; - } - } else { + + if (0 == ipoib_wait->retry--) { + goto callback; } + + ipoib_wait->tid = TS_IB_CLIENT_QUERY_TID_INVALID; + result = tsIbPathRecordRequest(ipoib_wait->ca, + ipoib_wait->hw_port, + ipoib_wait->src_hw.gid.all, + ipoib_wait->dst_hw.gid.all, + ipoib_wait->pkey, + TS_IB_PATH_RECORD_FORCE_REMOTE, + TS_IP2PR_DEV_PATH_WAIT, + 0, + gid2pr_complete, + ipoib_wait, + &ipoib_wait->tid); + if (0 > result) { + status = result; + TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, + "PATH: Path Record Request error. <%d>", + result); + goto callback; + } + break; case 0: /* @@ -1948,8 +1979,8 @@ ipoib_wait->ca = gid_node->ca; ipoib_wait->hw_port = gid_node->port; ipoib_wait->pkey = pkey; - memcpy(ipoib_wait->src_gid, src_gid, sizeof(src_gid)); - memcpy(ipoib_wait->dst_gid, dst_gid, sizeof(dst_gid)); + memcpy(ipoib_wait->src_hw.gid.all, src_gid, sizeof(src_gid)); + memcpy(ipoib_wait->dst_hw.gid.all, dst_gid, sizeof(dst_gid)); result = ip2pr_ipoib_wait_list_insert(ipoib_wait); if (0 > result) { From roland at topspin.com Wed Oct 13 15:59:48 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 13 Oct 2004 15:59:48 -0700 Subject: [openib-general] Re: [PATCH] move IP2PR/SDP to the native IPoIB driver In-Reply-To: <20041013134420.A16145@topspin.com> (Libor Michalek's message of "Wed, 13 Oct 2004 13:44:20 -0700") References: <52pt3vpoxz.fsf@topspin.com> <20041013134420.A16145@topspin.com> Message-ID: <52lleady1n.fsf@topspin.com> Cool, I applied this to my branch. 
For bonus points change ip2pr to use ib_register_client()/ib_unregister_client() so we can get rid of these warnings (and so it works when HCA drivers are loaded after SDP): /data/home/roland/Src/linux-2.6.8.1-openib/drivers/infiniband/ulp/ipoib/ip2pr_link.c:2070: warning: b_device_get_by_index' is deprecated (declared at /data/home/roland/Src/linux-2.6.8.1-openib/drivers/infiniband/include/ts_ib_core.h:41) /data/home/roland/Src/linux-2.6.8.1-openib/drivers/infiniband/ulp/ipoib/ip2pr_link.c:2165: warning: b_device_get_by_index' is deprecated (declared at /data/home/roland/Src/linux-2.6.8.1-openib/drivers/infiniband/include/ts_ib_core.h:41) - R. From mshefty at ichips.intel.com Wed Oct 13 16:02:52 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 13 Oct 2004 16:02:52 -0700 Subject: [openib-general] [PATCH] single exit in ib_mad_init_module Message-ID: <20041013160252.611c98e7.mshefty@ichips.intel.com> Patch to fix cleanup issue in ib_mad_init_module. - Sean -- Index: access/ib_mad.c =================================================================== --- access/ib_mad.c (revision 987) +++ access/ib_mad.c (working copy) @@ -1886,6 +1886,8 @@ static int __init ib_mad_init_module(void) { + int ret; + ib_mad_cache = kmem_cache_create("ib_mad", sizeof(struct ib_mad_private), 0, @@ -1894,17 +1896,23 @@ NULL); if (!ib_mad_cache) { printk(KERN_ERR PFX "Couldn't create ib_mad cache\n"); - return -ENOMEM; + ret = -ENOMEM; + goto error1; } INIT_LIST_HEAD(&ib_mad_port_list); if (ib_register_client(&mad_client)) { printk(KERN_ERR PFX "Couldn't register ib_mad client\n"); - return -EINVAL; + ret = -EINVAL; + goto error2; } - return 0; + +error2: + kmem_cache_destroy(ib_mad_cache); +error1: + return ret; } static void __exit ib_mad_cleanup_module(void) From mshefty at ichips.intel.com Wed Oct 13 16:52:44 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 13 Oct 2004 16:52:44 -0700 Subject: [openib-general] [PATCH] fix list_entry usage Message-ID: <20041013165244.3ae058c4.mshefty@ichips.intel.com> Patch fixes casting to incorrect structures when calling list_entry(). 
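For illustration, a minimal generic sketch of the list_entry() usage the patch below moves to (the struct and field names here are hypothetical, not taken from ib_mad.c): list_entry() must be handed the embedded list_head of an *element* -- for the first entry that is head->next -- rather than the list head itself.

#include <linux/list.h>

struct foo {
	int data;
	struct list_head list;	/* links this element into its list */
};

/* Return the first element on the list, or NULL if it is empty. */
static struct foo *first_foo(struct list_head *head)
{
	if (list_empty(head))
		return NULL;
	/* head->next points at the first element's embedded list_head;
	 * list_entry() (container_of) converts it back to the struct. */
	return list_entry(head->next, struct foo, list);
}

Since list_entry() is only pointer arithmetic, passing the list head instead of an element compiles fine but silently yields a bogus structure pointer, which is the casting problem being fixed here.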
- Sean -- Index: access/ib_mad.c =================================================================== --- access/ib_mad.c (revision 987) +++ access/ib_mad.c (working copy) @@ -887,10 +887,9 @@ */ spin_lock_irqsave(&port_priv->recv_list_lock, flags); if (!list_empty(&port_priv->recv_posted_mad_list[qpn])) { - rbuf = list_entry(&port_priv->recv_posted_mad_list[qpn], + rbuf = list_entry(&port_priv->recv_posted_mad_list[qpn].next, struct ib_mad_recv_buf, list); - rbuf = (struct ib_mad_recv_buf *)rbuf->list.next; mad_priv_hdr = container_of(rbuf, struct ib_mad_private_header, recv_buf); recv = container_of(mad_priv_hdr, struct ib_mad_private, @@ -1031,7 +1030,6 @@ struct ib_wc *wc) { struct ib_mad_send_wr_private *mad_send_wr; - struct list_head *send_wr; unsigned long flags; /* Completion corresponds to first entry on posted MAD send list */ @@ -1042,12 +1040,9 @@ goto error; } - mad_send_wr = list_entry(&port_priv->send_posted_mad_list, + mad_send_wr = list_entry(&port_priv->send_posted_mad_list.next, struct ib_mad_send_wr_private, send_list); - send_wr = mad_send_wr->send_list.next; - mad_send_wr = container_of(send_wr, struct ib_mad_send_wr_private, - send_list); if (wc->wr_id != (unsigned long)mad_send_wr) { printk(KERN_ERR PFX "Send completion WR ID 0x%Lx doesn't match " "posted send WR ID 0x%lx\n", @@ -1383,9 +1378,8 @@ spin_lock_irqsave(&port_priv->recv_list_lock, flags); while (!list_empty(&port_priv->recv_posted_mad_list[i])) { - rbuf = list_entry(&port_priv->recv_posted_mad_list[i], - struct ib_mad_recv_buf, list); - rbuf = (struct ib_mad_recv_buf *)rbuf->list.next; + rbuf = list_entry(&port_priv->recv_posted_mad_list[i].next, + struct ib_mad_recv_buf, list); mad_priv_hdr = container_of(rbuf, struct ib_mad_private_header, recv_buf); From libor at topspin.com Wed Oct 13 17:41:01 2004 From: libor at topspin.com (Libor Michalek) Date: Wed, 13 Oct 2004 17:41:01 -0700 Subject: [openib-general] [PATCH] dynamic device init for IP2PR In-Reply-To: <52lleady1n.fsf@topspin.com>; from roland@topspin.com on Wed, Oct 13, 2004 at 03:59:48PM -0700 References: <52pt3vpoxz.fsf@topspin.com> <20041013134420.A16145@topspin.com> <52lleady1n.fsf@topspin.com> Message-ID: <20041013174101.B16145@topspin.com> On Wed, Oct 13, 2004 at 03:59:48PM -0700, Roland Dreier wrote: > > For bonus points change ip2pr to use ib_register_client() > ib_unregister_client() so we can get rid of these warnings > (and so it works when HCA drivers are loaded after SDP): No problem, here's the patch. I tried it out by unloading/loading mthca. -Libor Index: infiniband/ulp/ipoib/ip2pr_priv.h =================================================================== --- infiniband/ulp/ipoib/ip2pr_priv.h (revision 988) +++ infiniband/ulp/ipoib/ip2pr_priv.h (working copy) @@ -250,7 +250,7 @@ */ struct ip2pr_gid_pr_element { struct ib_path_record path_record; - u32 usage; /* last used time. */ + u32 usage; /* last used time. 
*/ struct ip2pr_gid_pr_element *next; struct ip2pr_gid_pr_element **p_next; }; Index: infiniband/ulp/ipoib/ip2pr_mod.c =================================================================== --- infiniband/ulp/ipoib/ip2pr_mod.c (revision 988) +++ infiniband/ulp/ipoib/ip2pr_mod.c (working copy) @@ -27,14 +27,12 @@ MODULE_DESCRIPTION("IB path record lookup module"); MODULE_LICENSE("Dual BSD/GPL"); -extern s32 ip2pr_link_addr_init(void); -extern s32 ip2pr_link_addr_cleanup(void); -extern s32 ip2pr_user_lookup(unsigned long arg); -extern s32 gid2pr_user_lookup(unsigned long arg); -extern s32 ip2pr_proc_fs_init(void); -extern s32 ip2pr_proc_fs_cleanup(void); -extern s32 ip2pr_src_gid_init(void); -extern s32 ip2pr_src_gid_cleanup(void); +extern int ip2pr_link_addr_init(void); +extern int ip2pr_link_addr_cleanup(void); +extern int ip2pr_user_lookup(unsigned long arg); +extern int gid2pr_user_lookup(unsigned long arg); +extern int ip2pr_proc_fs_init(void); +extern int ip2pr_proc_fs_cleanup(void); static int ip2pr_major_number = 240; static int ip2pr_open(struct inode *inode, struct file *fp); @@ -93,70 +91,86 @@ /* ip2pr_driver_init_module -- initialize the PathRecord Lookup host module */ int __init ip2pr_driver_init_module(void) { - s32 result = 0; + int result = 0; TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_INOUT, "INIT: Path Record Lookup module load."); - result = - register_chrdev(ip2pr_major_number, IP2PR_DEVNAME, &ip2pr_fops); + result = register_chrdev(ip2pr_major_number, + IP2PR_DEVNAME, + &ip2pr_fops); if (0 > result) { - TS_REPORT_FATAL(MOD_IP2PR, "Device registration failed"); - return (result); + + TS_REPORT_FATAL(MOD_IP2PR, + "Device registration error <%d>", result); + goto error_dev; } - if (ip2pr_major_number == 0) + + if (0 == ip2pr_major_number) { + ip2pr_major_number = result; + } result = ip2pr_proc_fs_init(); if (0 > result) { - TS_REPORT_FATAL(MOD_IP2PR, "Init: Error creating proc entries"); - unregister_chrdev(ip2pr_major_number, IP2PR_DEVNAME); - return (result); + + TS_REPORT_FATAL(MOD_IP2PR, + "Error <%d> creating proc entries", result); + goto error_fs; } result = ip2pr_link_addr_init(); if (0 > result) { - TS_REPORT_FATAL(MOD_IP2PR, "Device resource allocation failed"); - (void)ip2pr_proc_fs_cleanup(); - unregister_chrdev(ip2pr_major_number, IP2PR_DEVNAME); - return (result); - } - result = ip2pr_src_gid_init(); - if (0 > result) { - TS_REPORT_FATAL(MOD_IP2PR, "Gid resource allocation failed"); - (void)ip2pr_link_addr_cleanup(); - (void)ip2pr_proc_fs_cleanup(); - unregister_chrdev(ip2pr_major_number, IP2PR_DEVNAME); - return (result); + TS_REPORT_FATAL(MOD_IP2PR, + "Device resource allocation error <%d>", + result); + goto error_lnk; } - return (result); + return 0; +error_lnk: + (void)ip2pr_proc_fs_cleanup(); +error_fs: + unregister_chrdev(ip2pr_major_number, IP2PR_DEVNAME); +error_dev: + return result; } static void __exit ip2pr_driver_cleanup_module(void) { + int result; + TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_INOUT, "INIT: Path Record Lookup module load."); - - if (unregister_chrdev(ip2pr_major_number, IP2PR_DEVNAME) != 0) { - TS_REPORT_WARN(MOD_UDAPL, "Cannot unregister device"); - } - /* - * Src Gid Cleanup - */ - (void)ip2pr_src_gid_cleanup(); - /* * link level addressing services. */ (void)ip2pr_link_addr_cleanup(); - /* * proc tables */ (void)ip2pr_proc_fs_cleanup(); + /* + * unregister character device. 
+ */ + result = unregister_chrdev(ip2pr_major_number, IP2PR_DEVNAME); + if (result) { + TS_REPORT_WARN(MOD_IP2PR, "Cannot unregister device"); + } + + return; } module_init(ip2pr_driver_init_module); module_exit(ip2pr_driver_cleanup_module); + + + + + + + + + + Index: infiniband/ulp/ipoib/ip2pr_link.c =================================================================== --- infiniband/ulp/ipoib/ip2pr_link.c (revision 988) +++ infiniband/ulp/ipoib/ip2pr_link.c (working copy) @@ -27,14 +27,13 @@ static tTS_KERNEL_TIMER_STRUCT _tsIp2prPathTimer; static tIP2PR_PATH_LOOKUP_ID _tsIp2prPathLookupId = 0; -static struct ib_event_handler _tsIp2prEventHandle[IP2PR_MAX_HCAS]; static unsigned int ip2pr_total_req = 0; static unsigned int ip2pr_arp_timeout = 0; static unsigned int ip2pr_path_timeout = 0; static unsigned int ip2pr_total_fail = 0; -static struct ip2pr_link_root _tsIp2prLinkRoot = { +static struct ip2pr_link_root _link_root = { wait_list:NULL, path_list:NULL, wait_lock:SPIN_LOCK_UNLOCKED, @@ -54,6 +53,15 @@ ((TS_IP2PR_PATH_LOOKUP_INVALID == ++_tsIp2prPathLookupId) ? \ ++_tsIp2prPathLookupId : _tsIp2prPathLookupId) +static void ip2pr_device_init_one(struct ib_device *device); +static void ip2pr_device_remove_one(struct ib_device *device); + +static struct ib_client ip2pr_client = { + .name = "ip2pr", + .add = ip2pr_device_init_one, + .remove = ip2pr_device_remove_one +}; + /** * Path Record lookup caching */ @@ -63,7 +71,7 @@ { struct ip2pr_path_element *path_elmt; - for (path_elmt = _tsIp2prLinkRoot.path_list; + for (path_elmt = _link_root.path_list; NULL != path_elmt; path_elmt = path_elmt->next) if (ip_addr == path_elmt->dst_addr) break; @@ -72,8 +80,10 @@ } /* ip2pr_path_element_create -- create an entry for a path record element */ -static s32 ip2pr_path_element_create(u32 dst_addr, u32 src_addr, - tTS_IB_PORT hw_port, struct ib_device *ca, +static s32 ip2pr_path_element_create(u32 dst_addr, + u32 src_addr, + tTS_IB_PORT hw_port, + struct ib_device *ca, struct ib_path_record *path_r, struct ip2pr_path_element **return_elmt) { @@ -82,23 +92,23 @@ TS_CHECK_NULL(path_r, -EINVAL); TS_CHECK_NULL(return_elmt, -EINVAL); - TS_CHECK_NULL(_tsIp2prLinkRoot.path_cache, -EINVAL); + TS_CHECK_NULL(_link_root.path_cache, -EINVAL); - path_elmt = kmem_cache_alloc(_tsIp2prLinkRoot.path_cache, SLAB_ATOMIC); + path_elmt = kmem_cache_alloc(_link_root.path_cache, SLAB_ATOMIC); if (NULL == path_elmt) return -ENOMEM; memset(path_elmt, 0, sizeof(*path_elmt)); - spin_lock_irqsave(&_tsIp2prLinkRoot.path_lock, flags); - path_elmt->next = _tsIp2prLinkRoot.path_list; - _tsIp2prLinkRoot.path_list = path_elmt; - path_elmt->p_next = &_tsIp2prLinkRoot.path_list; + spin_lock_irqsave(&_link_root.path_lock, flags); + path_elmt->next = _link_root.path_list; + _link_root.path_list = path_elmt; + path_elmt->p_next = &_link_root.path_list; if (NULL != path_elmt->next) path_elmt->next->p_next = &path_elmt->next; - spin_unlock_irqrestore(&_tsIp2prLinkRoot.path_lock, flags); + spin_unlock_irqrestore(&_link_root.path_lock, flags); /* * set values */ @@ -120,9 +130,9 @@ unsigned long flags; TS_CHECK_NULL(path_elmt, -EINVAL); - TS_CHECK_NULL(_tsIp2prLinkRoot.path_cache, -EINVAL); + TS_CHECK_NULL(_link_root.path_cache, -EINVAL); - spin_lock_irqsave(&_tsIp2prLinkRoot.path_lock, flags); + spin_lock_irqsave(&_link_root.path_lock, flags); if (NULL != path_elmt->p_next) { if (NULL != path_elmt->next) path_elmt->next->p_next = path_elmt->p_next; @@ -132,9 +142,9 @@ path_elmt->p_next = NULL; path_elmt->next = NULL; } /* if */ - 
spin_unlock_irqrestore(&_tsIp2prLinkRoot.path_lock, flags); + spin_unlock_irqrestore(&_link_root.path_lock, flags); - kmem_cache_free(_tsIp2prLinkRoot.path_cache, path_elmt); + kmem_cache_free(_link_root.path_cache, path_elmt); return 0; } @@ -176,10 +186,10 @@ unsigned long flags = 0; TS_CHECK_NULL(ipoib_wait, -EINVAL); - TS_CHECK_NULL(_tsIp2prLinkRoot.wait_cache, -EINVAL); + TS_CHECK_NULL(_link_root.wait_cache, -EINVAL); if (use_lock) - spin_lock_irqsave(&_tsIp2prLinkRoot.wait_lock, flags); + spin_lock_irqsave(&_link_root.wait_lock, flags); if (NULL != ipoib_wait->p_next) { if (NULL != ipoib_wait->next) { @@ -192,9 +202,9 @@ ipoib_wait->next = NULL; } /* if */ if (use_lock) - spin_unlock_irqrestore(&_tsIp2prLinkRoot.wait_lock, flags); + spin_unlock_irqrestore(&_link_root.wait_lock, flags); - kmem_cache_free(_tsIp2prLinkRoot.wait_cache, ipoib_wait); + kmem_cache_free(_link_root.wait_cache, ipoib_wait); return 0; } @@ -233,8 +243,8 @@ * rearm the timer (check for neighbour nud status?) */ ipoib_wait->prev_timeout = (ipoib_wait->prev_timeout * 2); /* backoff */ - if (ipoib_wait->prev_timeout > _tsIp2prLinkRoot.backoff) - ipoib_wait->prev_timeout = _tsIp2prLinkRoot.backoff; + if (ipoib_wait->prev_timeout > _link_root.backoff) + ipoib_wait->prev_timeout = _link_root.backoff; ipoib_wait->timer.run_time = jiffies + (ipoib_wait->prev_timeout * HZ) + (jiffies & 0x0f); @@ -283,9 +293,9 @@ { struct ip2pr_ipoib_wait *ipoib_wait; - TS_CHECK_NULL(_tsIp2prLinkRoot.wait_cache, NULL); + TS_CHECK_NULL(_link_root.wait_cache, NULL); - ipoib_wait = kmem_cache_alloc(_tsIp2prLinkRoot.wait_cache, SLAB_ATOMIC); + ipoib_wait = kmem_cache_alloc(_link_root.wait_cache, SLAB_ATOMIC); if (NULL != ipoib_wait) { memset(ipoib_wait, 0, sizeof(*ipoib_wait)); @@ -296,7 +306,7 @@ if (LOOKUP_IP2PR == ltype) { tsKernelTimerInit(&ipoib_wait->timer); ipoib_wait->timer.run_time = jiffies + - (_tsIp2prLinkRoot.retry_timeout * HZ); + (_link_root.retry_timeout * HZ); ipoib_wait->timer.function = ip2pr_ipoib_wait_timeout; ipoib_wait->timer.arg = ipoib_wait; } @@ -310,8 +320,8 @@ ipoib_wait->func = (void *) func; ipoib_wait->plid = plid; ipoib_wait->dev = 0; - ipoib_wait->retry = _tsIp2prLinkRoot.max_retries; - ipoib_wait->prev_timeout = _tsIp2prLinkRoot.retry_timeout; + ipoib_wait->retry = _link_root.max_retries; + ipoib_wait->prev_timeout = _link_root.retry_timeout; ipoib_wait->tid = TS_IB_CLIENT_QUERY_TID_INVALID; ipoib_wait->hw_port = 0; ipoib_wait->ca = NULL; @@ -338,17 +348,17 @@ return -EFAULT; } /* if */ - spin_lock_irqsave(&_tsIp2prLinkRoot.wait_lock, flags); + spin_lock_irqsave(&_link_root.wait_lock, flags); - ipoib_wait->next = _tsIp2prLinkRoot.wait_list; - _tsIp2prLinkRoot.wait_list = ipoib_wait; - ipoib_wait->p_next = &_tsIp2prLinkRoot.wait_list; + ipoib_wait->next = _link_root.wait_list; + _link_root.wait_list = ipoib_wait; + ipoib_wait->p_next = &_link_root.wait_list; if (NULL != ipoib_wait->next) { ipoib_wait->next->p_next = &ipoib_wait->next; } /* if */ - spin_unlock_irqrestore(&_tsIp2prLinkRoot.wait_lock, flags); + spin_unlock_irqrestore(&_link_root.wait_lock, flags); /* * Start timer only for IP 2 PR lookup @@ -373,8 +383,8 @@ unsigned long flags; struct ip2pr_ipoib_wait *ipoib_wait; - spin_lock_irqsave(&_tsIp2prLinkRoot.wait_lock, flags); - for (ipoib_wait = _tsIp2prLinkRoot.wait_list; + spin_lock_irqsave(&_link_root.wait_lock, flags); + for (ipoib_wait = _link_root.wait_list; NULL != ipoib_wait; ipoib_wait = ipoib_wait->next) { if (plid == ipoib_wait->plid) { @@ -382,7 +392,7 @@ break; } /* if */ } /* for */ - 
spin_unlock_irqrestore(&_tsIp2prLinkRoot.wait_lock, flags); + spin_unlock_irqrestore(&_link_root.wait_lock, flags); return ipoib_wait; } @@ -419,8 +429,8 @@ /* * loop across connections. */ - spin_lock_irqsave(&_tsIp2prLinkRoot.path_lock, flags); - for (path_elmt = _tsIp2prLinkRoot.path_list, counter = 0; + spin_lock_irqsave(&_link_root.path_lock, flags); + for (path_elmt = _link_root.path_list, counter = 0; NULL != path_elmt && !(TS_IP2PR_PATH_PROC_DUMP_SIZE > (max_size - offset)); path_elmt = path_elmt->next, counter++) { @@ -448,7 +458,7 @@ path_elmt->hw_port, path_elmt->usage); } /* if */ } /* for */ - spin_unlock_irqrestore(&_tsIp2prLinkRoot.path_lock, flags); + spin_unlock_irqrestore(&_link_root.path_lock, flags); if (!(start_index > counter)) { @@ -489,8 +499,8 @@ /* * loop across connections. */ - spin_lock_irqsave(&_tsIp2prLinkRoot.wait_lock, flags); - for (ipoib_wait = _tsIp2prLinkRoot.wait_list, counter = 0; + spin_lock_irqsave(&_link_root.wait_lock, flags); + for (ipoib_wait = _link_root.wait_list, counter = 0; NULL != ipoib_wait && !(TS_IP2PR_IPOIB_PROC_DUMP_SIZE > (max_size - offset)); ipoib_wait = ipoib_wait->next, counter++) { @@ -510,7 +520,7 @@ ipoib_wait->retry, ipoib_wait->flags); } /* if */ } /* for */ - spin_unlock_irqrestore(&_tsIp2prLinkRoot.wait_lock, flags); + spin_unlock_irqrestore(&_link_root.wait_lock, flags); if (!(start_index > counter)) { @@ -541,7 +551,7 @@ long *end_index) { return (ip2pr_proc_read_int(buffer, max_size, start_index, - end_index, _tsIp2prLinkRoot.max_retries)); + end_index, _link_root.max_retries)); } /* ip2pr_proc-timeout_read -- dump current timeout value */ @@ -549,7 +559,7 @@ long *end_index) { return (ip2pr_proc_read_int(buffer, max_size, start_index, - end_index, _tsIp2prLinkRoot.retry_timeout)); + end_index, _link_root.retry_timeout)); } /* ip2pr_proc_backoff_read -- dump current backoff value */ @@ -557,7 +567,7 @@ long *end_index) { return (ip2pr_proc_read_int(buffer, max_size, start_index, - end_index, _tsIp2prLinkRoot.backoff)); + end_index, _link_root.backoff)); } /* ip2pr_proc_cache_timeout_read -- dump current cache timeout value */ @@ -565,7 +575,7 @@ long *end_index) { return (ip2pr_proc_read_int(buffer, max_size, start_index, - end_index, _tsIp2prLinkRoot.cache_timeout)); + end_index, _link_root.cache_timeout)); } /* ip2pr_proc_total_req -- dump current retry value */ @@ -633,7 +643,7 @@ ret = ip2pr_proc_write_int(file, buffer, count, pos, &val); if (val <= TS_IP2PR_PATH_MAX_RETRIES) - _tsIp2prLinkRoot.max_retries = val; + _link_root.max_retries = val; return (ret); } @@ -647,7 +657,7 @@ ret = ip2pr_proc_write_int(file, buffer, count, pos, &val); if (val <= TS_IP2PR_MAX_DEV_PATH_WAIT) - _tsIp2prLinkRoot.retry_timeout = val; + _link_root.retry_timeout = val; return (ret); } @@ -661,7 +671,7 @@ ret = ip2pr_proc_write_int(file, buffer, count, pos, &val); if (val <= TS_IP2PR_PATH_MAX_BACKOFF) - _tsIp2prLinkRoot.backoff = val; + _link_root.backoff = val; return (ret); } @@ -675,7 +685,7 @@ ret = ip2pr_proc_write_int(file, buffer, count, pos, &val); if (val <= TS_IP2PR_PATH_MAX_CACHE_TIMEOUT) - _tsIp2prLinkRoot.cache_timeout = val; + _link_root.cache_timeout = val; return (ret); } @@ -717,8 +727,8 @@ TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_INOUT, "POST: Status <%d> path completion:", status); TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_INOUT, - "POST: <%p:%d:%04x> <%016llx:%016llx> <%016llx:%016llx>", - ipoib_wait->ca, + "POST: <%s:%d:%04x> <%016llx:%016llx> <%016llx:%016llx>", + ipoib_wait->ca->name, 
ipoib_wait->hw_port, ipoib_wait->pkey, be64_to_cpu(ipoib_wait->src_hw.gid.s.high), @@ -741,9 +751,9 @@ ip2pr_path_timeout++; ipoib_wait->prev_timeout = (ipoib_wait->prev_timeout * 2); - if (ipoib_wait->prev_timeout > _tsIp2prLinkRoot.backoff) { + if (ipoib_wait->prev_timeout > _link_root.backoff) { - ipoib_wait->prev_timeout = _tsIp2prLinkRoot.backoff; + ipoib_wait->prev_timeout = _link_root.backoff; } /* * reinitiate path record resolution @@ -862,8 +872,8 @@ /* * reset retry counter */ - ipoib_wait->retry = _tsIp2prLinkRoot.max_retries; - ipoib_wait->prev_timeout = _tsIp2prLinkRoot.retry_timeout; + ipoib_wait->retry = _link_root.max_retries; + ipoib_wait->prev_timeout = _link_root.retry_timeout; /* * initiate path record resolution */ @@ -1022,7 +1032,7 @@ 0 == (IFF_LOOPBACK & rt->u.dst.neighbour->dev->flags)) { TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_WARN, - "FIND: Nneighbour device is not IPoIB. <%s:%08x>", + "FIND: Neighbour device is not IPoIB. <%s:%08x>", rt->u.dst.neighbour->dev->name, rt->u.dst.neighbour->dev->flags); @@ -1122,8 +1132,8 @@ } TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_INOUT, - "FIND: hca <%04x> for port <%02x> gid <%016llx:%016llx>", - ipoib_wait->ca, + "FIND: hca <%s> for port <%02x> gid <%016llx:%016llx>", + ipoib_wait->ca->name, ipoib_wait->hw_port, be64_to_cpu(ipoib_wait->src_hw.gid.s.high), be64_to_cpu(ipoib_wait->src_hw.gid.s.low)); @@ -1205,8 +1215,8 @@ TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_INOUT, "RECV: Arp completion for <%08x>.", ip_addr); - spin_lock_irqsave(&_tsIp2prLinkRoot.wait_lock, flags); - ipoib_wait = _tsIp2prLinkRoot.wait_list; + spin_lock_irqsave(&_link_root.wait_lock, flags); + ipoib_wait = _link_root.wait_list; while (NULL != ipoib_wait) { next_wait = ipoib_wait->next; @@ -1244,7 +1254,7 @@ ipoib_wait = next_wait; } - spin_unlock_irqrestore(&_tsIp2prLinkRoot.wait_lock, flags); + spin_unlock_irqrestore(&_link_root.wait_lock, flags); return; } @@ -1283,9 +1293,9 @@ /* * determine if anyone is waiting for this ARP response. */ - spin_lock_irqsave(&_tsIp2prLinkRoot.wait_lock, flags); + spin_lock_irqsave(&_link_root.wait_lock, flags); - for (counter = 0, ipoib_wait = _tsIp2prLinkRoot.wait_list; + for (counter = 0, ipoib_wait = _link_root.wait_list; NULL != ipoib_wait; ipoib_wait = ipoib_wait->next) { @@ -1320,7 +1330,7 @@ tqp = &ipoib_wait->arp_completion; } /* if */ } /* for */ - spin_unlock_irqrestore(&_tsIp2prLinkRoot.wait_lock, flags); + spin_unlock_irqrestore(&_link_root.wait_lock, flags); /* * Schedule the ARP completion. @@ -1355,7 +1365,7 @@ /* * destroy all cached path record elements. 
*/ - while (NULL != (path_elmt = _tsIp2prLinkRoot.path_list)) { + while (NULL != (path_elmt = _link_root.path_list)) { result = ip2pr_path_element_destroy(path_elmt); TS_EXPECT(MOD_IP2PR, !(0 > result)); @@ -1364,8 +1374,8 @@ /* * Mark the source gid node based on port state */ - spin_lock_irqsave(&_tsIp2prLinkRoot.gid_lock, flags); - for (sgid_elmt = _tsIp2prLinkRoot.src_gid_list; + spin_lock_irqsave(&_link_root.gid_lock, flags); + for (sgid_elmt = _link_root.src_gid_list; NULL != sgid_elmt; sgid_elmt = sgid_elmt->next) { if ((sgid_elmt->ca == record->device) && (sgid_elmt->port == record->element.port_num)) { @@ -1393,11 +1403,13 @@ break; } } - spin_unlock_irqrestore(&_tsIp2prLinkRoot.gid_lock, flags); + spin_unlock_irqrestore(&_link_root.gid_lock, flags); TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, - "Async Port Event on hca=<%d>,port=<%d>, event=%d", - record->device, record->element.port_num, record->event); + "Async Port Event on hca <%s> port <%d> event <%d>", + record->device->name, + record->element.port_num, + record->event); return; } @@ -1412,14 +1424,14 @@ struct ip2pr_gid_pr_element *prn_elmt, *next_prn; /* cache_timeout of zero implies static path records. */ - if (_tsIp2prLinkRoot.cache_timeout) { + if (_link_root.cache_timeout) { /* * arg entry is unused. */ - path_elmt = _tsIp2prLinkRoot.path_list; + path_elmt = _link_root.path_list; while (NULL != path_elmt) { next_elmt = path_elmt->next; - if (!((_tsIp2prLinkRoot.cache_timeout * HZ) > + if (!((_link_root.cache_timeout * HZ) > (s32) (jiffies - path_elmt->usage))) { TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, @@ -1438,12 +1450,12 @@ /* * Go thru' the GID List */ - sgid_elmt = _tsIp2prLinkRoot.src_gid_list; + sgid_elmt = _link_root.src_gid_list; while (NULL != sgid_elmt) { prn_elmt = sgid_elmt->pr_list; while (NULL != prn_elmt) { next_prn = prn_elmt->next; - if (!((_tsIp2prLinkRoot.cache_timeout * HZ) > + if (!((_link_root.cache_timeout * HZ) > (s32) (jiffies - prn_elmt->usage))) { TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, @@ -1648,8 +1660,8 @@ unsigned long flags; *gid_node = NULL; - spin_lock_irqsave(&_tsIp2prLinkRoot.gid_lock, flags); - for (sgid_elmt = _tsIp2prLinkRoot.src_gid_list; + spin_lock_irqsave(&_link_root.gid_lock, flags); + for (sgid_elmt = _link_root.src_gid_list; NULL != sgid_elmt; sgid_elmt = sgid_elmt->next) { if (IB_PORT_ACTIVE == sgid_elmt->port_state) { @@ -1687,7 +1699,7 @@ prn_elmt->usage = jiffies; spin_unlock_irqrestore - (&_tsIp2prLinkRoot.gid_lock, + (&_link_root.gid_lock, flags); return (0); } @@ -1695,7 +1707,7 @@ } } } - spin_unlock_irqrestore(&_tsIp2prLinkRoot.gid_lock, flags); + spin_unlock_irqrestore(&_link_root.gid_lock, flags); return (-ENOENT); } @@ -1707,17 +1719,17 @@ unsigned long flags; *gid_node = NULL; - spin_lock_irqsave(&_tsIp2prLinkRoot.gid_lock, flags); - for (sgid_elmt = _tsIp2prLinkRoot.src_gid_list; + spin_lock_irqsave(&_link_root.gid_lock, flags); + for (sgid_elmt = _link_root.src_gid_list; NULL != sgid_elmt; sgid_elmt = sgid_elmt->next) { if (0 == memcmp(sgid_elmt->gid, src_gid, sizeof(src_gid))) { *gid_node = sgid_elmt; - spin_unlock_irqrestore(&_tsIp2prLinkRoot.gid_lock, + spin_unlock_irqrestore(&_link_root.gid_lock, flags); return (0); } } - spin_unlock_irqrestore(&_tsIp2prLinkRoot.gid_lock, flags); + spin_unlock_irqrestore(&_link_root.gid_lock, flags); return (-EINVAL); } @@ -1733,7 +1745,7 @@ if (ip2pr_src_gid_node_get(ipoib_wait->src_hw.gid.all, &gid_node)) return (-EINVAL); - prn_elmt = kmem_cache_alloc(_tsIp2prLinkRoot.gid_pr_cache, SLAB_ATOMIC); + prn_elmt = 
kmem_cache_alloc(_link_root.gid_pr_cache, SLAB_ATOMIC); if (NULL == prn_elmt) { TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, "PATH: Error Allocating prn memory."); @@ -1744,7 +1756,7 @@ /* * Insert into the ccache list */ - spin_lock_irqsave(&_tsIp2prLinkRoot.gid_lock, flags); + spin_lock_irqsave(&_link_root.gid_lock, flags); prn_elmt->next = gid_node->pr_list; gid_node->pr_list = prn_elmt; prn_elmt->p_next = &gid_node->pr_list; @@ -1754,7 +1766,7 @@ prn_elmt->next->p_next = &prn_elmt->next; } /* if */ - spin_unlock_irqrestore(&_tsIp2prLinkRoot.gid_lock, flags); + spin_unlock_irqrestore(&_link_root.gid_lock, flags); return (0); } @@ -1772,44 +1784,11 @@ prn_elmt->p_next = NULL; prn_elmt->next = NULL; } /* if */ - kmem_cache_free(_tsIp2prLinkRoot.gid_pr_cache, prn_elmt); + kmem_cache_free(_link_root.gid_pr_cache, prn_elmt); return (0); } -/* ip2pr_src_gid_delete -- Cleanup one node in Source GID List. */ -static s32 ip2pr_src_gid_delete(struct ip2pr_sgid_element *sgid_elmt) -{ - unsigned long flags; - struct ip2pr_gid_pr_element *prn_elmt; - - spin_lock_irqsave(&_tsIp2prLinkRoot.gid_lock, flags); - - /* - * Clear Path Record List for this Source GID node - */ - while (NULL != (prn_elmt = sgid_elmt->pr_list)) { - ip2pr_delete(prn_elmt); - } /* while */ - - if (NULL != sgid_elmt->p_next) { - - if (NULL != sgid_elmt->next) { - sgid_elmt->next->p_next = sgid_elmt->p_next; - } - /* if */ - *(sgid_elmt->p_next) = sgid_elmt->next; - - sgid_elmt->p_next = NULL; - sgid_elmt->next = NULL; - } /* if */ - spin_unlock_irqrestore(&_tsIp2prLinkRoot.gid_lock, flags); - - kmem_cache_free(_tsIp2prLinkRoot.src_gid_cache, sgid_elmt); - - return (0); -} - /* ip2pr_src_gid_add -- Add one node to Source GID List. */ s32 ip2pr_src_gid_add(struct ib_device *hca_device, tTS_IB_PORT port, @@ -1818,8 +1797,7 @@ struct ip2pr_sgid_element *sgid_elmt; unsigned long flags; - sgid_elmt = - kmem_cache_alloc(_tsIp2prLinkRoot.src_gid_cache, SLAB_ATOMIC); + sgid_elmt = kmem_cache_alloc(_link_root.src_gid_cache, SLAB_ATOMIC); if (NULL == sgid_elmt) { TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, "PATH: Error Allocating sgidn memory."); @@ -1829,10 +1807,9 @@ memset(sgid_elmt, 0, sizeof(*sgid_elmt)); if (ib_query_gid(hca_device, port, 0, (union ib_gid *) sgid_elmt->gid)) { - kmem_cache_free(_tsIp2prLinkRoot.src_gid_cache, sgid_elmt); + kmem_cache_free(_link_root.src_gid_cache, sgid_elmt); return (-EFAULT); } - /* * set the fields */ @@ -1840,22 +1817,20 @@ sgid_elmt->port = port; sgid_elmt->port_state = port_state; sgid_elmt->gid_index = 0; - sgid_elmt->port_state = port_state; - /* * insert it into the list */ - spin_lock_irqsave(&_tsIp2prLinkRoot.gid_lock, flags); - sgid_elmt->next = _tsIp2prLinkRoot.src_gid_list; - _tsIp2prLinkRoot.src_gid_list = sgid_elmt; - sgid_elmt->p_next = &_tsIp2prLinkRoot.src_gid_list; + spin_lock_irqsave(&_link_root.gid_lock, flags); + sgid_elmt->next = _link_root.src_gid_list; + _link_root.src_gid_list = sgid_elmt; + sgid_elmt->p_next = &_link_root.src_gid_list; if (NULL != sgid_elmt->next) { sgid_elmt->next->p_next = &sgid_elmt->next; } - /* if */ - spin_unlock_irqrestore(&_tsIp2prLinkRoot.gid_lock, flags); + spin_unlock_irqrestore(&_link_root.gid_lock, flags); + return (0); } @@ -1921,7 +1896,10 @@ func = (tGID2PR_LOOKUP_FUNC) ipoib_wait->func; return func(tid, status, - ipoib_wait->hw_port, ipoib_wait->ca, path, ipoib_wait->arg); + ipoib_wait->hw_port, + ipoib_wait->ca, + path, + ipoib_wait->arg); return (0); } @@ -1949,12 +1927,15 @@ if (0 == ip2pr_gid_cache_lookup(src_gid, dst_gid, 
&path_record, &gid_node)) { func = (tGID2PR_LOOKUP_FUNC) funcptr; - result = - func(*plid, 0, gid_node->port, gid_node->ca, &path_record, - arg); + result = func(*plid, + 0, + gid_node->port, + gid_node->ca, + &path_record, + arg); if (0 != result) { TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, - "PATH: Error <%d> Completing Path Record Request.", + "PATH: Path Record Request error. <%d>", result); } return (0); @@ -1973,9 +1954,9 @@ if (NULL == ipoib_wait) { TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, - "PATH: Error creating address resolution wait object"); + "PATH: Error creating wait object"); return (-ENOMEM); - } /* if */ + } ipoib_wait->ca = gid_node->ca; ipoib_wait->hw_port = gid_node->port; ipoib_wait->pkey = pkey; @@ -2010,102 +1991,152 @@ } EXPORT_SYMBOL(gid2pr_lookup); -/* ip2pr_src_gid_cleanup -- Cleanup the Source GID List. */ -s32 ip2pr_src_gid_cleanup(void) +/* ip2pr_device_remove_one -- remove one device */ +static void ip2pr_device_remove_one(struct ib_device *device) { + struct ip2pr_gid_pr_element *prn_elmt; struct ip2pr_sgid_element *sgid_elmt; - s32 result; + struct ip2pr_sgid_element *next_elmt; + struct ib_event_handler *handler; + unsigned long flags; - while (NULL != (sgid_elmt = _tsIp2prLinkRoot.src_gid_list)) { + TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_INOUT, + "INIT: removing device. <%s>", device->name); - result = ip2pr_src_gid_delete(sgid_elmt); - TS_EXPECT(MOD_IP2PR, !(0 > result)); - } /* while */ + spin_lock_irqsave(&_link_root.gid_lock, flags); - kmem_cache_destroy(_tsIp2prLinkRoot.src_gid_cache); - kmem_cache_destroy(_tsIp2prLinkRoot.gid_pr_cache); + sgid_elmt = _link_root.src_gid_list; + while (NULL != sgid_elmt) { - return (0); + if (device != sgid_elmt->ca) { + + sgid_elmt = sgid_elmt->next; + continue; + } + /* + * Clear Path Record List for this Source GID node + */ + while (NULL != (prn_elmt = sgid_elmt->pr_list)) { + + ip2pr_delete(prn_elmt); + } + + next_elmt = sgid_elmt->next; + + if (NULL != sgid_elmt->next) { + sgid_elmt->next->p_next = sgid_elmt->p_next; + } + + *(sgid_elmt->p_next) = sgid_elmt->next; + + sgid_elmt->p_next = NULL; + sgid_elmt->next = NULL; + + kmem_cache_free(_link_root.src_gid_cache, sgid_elmt); + + sgid_elmt = next_elmt; + } + + spin_unlock_irqrestore(&_link_root.gid_lock, flags); + /* + * clean up async handler + */ + handler = ib_get_client_data(device, &ip2pr_client); + if (NULL == handler) { + + TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, + "INIT: async handler lookup failure. <%s>", + device->name); + } + else { + + ib_unregister_event_handler(handler); + kfree(handler); + } + + return; } -/* ip2pr_src_gid_init -- initialize the Source GID List. */ -s32 ip2pr_src_gid_init(void) +/* ip2pr_device_init_one -- initialize one device */ +static void ip2pr_device_init_one(struct ib_device *device) { - s32 result = 0; - int i, j; - struct ib_device *hca_device; struct ib_device_attr dev_prop; struct ib_port_attr port_prop; + struct ib_event_handler *handler; + int counter; + int result; - _tsIp2prLinkRoot.src_gid_cache = kmem_cache_create("Ip2prSrcGidList", - sizeof - (struct ip2pr_sgid_element), - 0, - SLAB_HWCACHE_ALIGN, - NULL, NULL); - if (NULL == _tsIp2prLinkRoot.src_gid_cache) { + TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_INOUT, + "INIT: adding new device. 
<%s>", device->name); - TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, - "INIT: Failed to create src gid cache."); - return (-ENOMEM); - } - /* if */ - _tsIp2prLinkRoot.gid_pr_cache = kmem_cache_create("Ip2prGidPrList", - sizeof - (struct ip2pr_gid_pr_element), - 0, SLAB_HWCACHE_ALIGN, - NULL, NULL); - if (NULL == _tsIp2prLinkRoot.gid_pr_cache) { + result = ib_query_device(device, &dev_prop); + if (result) { - TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, - "INIT: Failed to create gid to pr list cache."); - kmem_cache_destroy(_tsIp2prLinkRoot.src_gid_cache); - return (-ENOMEM); + TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_INOUT, + "INIT: Error <%d> querying device. <%s>", + result, device->name); + + return; } - - /* if */ /* - * Create SGID list for each port on hca + * query ports. */ - for (i = 0; ((hca_device = ib_device_get_by_index(i)) != NULL); ++i) { - if (ib_query_device(hca_device, &dev_prop)) { - TS_REPORT_FATAL(MOD_IB_NET, - "ib_device_properties_get() failed"); - return -EINVAL; + for (counter = 0; counter < dev_prop.phys_port_cnt; counter++) { + + result = ib_query_port(device, (counter + 1), &port_prop); + if (result) { + + TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_INOUT, + "INIT: Error <%d> querying port. <%s:%d:%d>", + result, device->name, counter + 1, + dev_prop.phys_port_cnt); + continue; } - for (j = 1; j <= dev_prop.phys_port_cnt; j++) { - if (ib_query_port(hca_device, j, &port_prop)) { - continue; - } + result = ip2pr_src_gid_add(device, + (counter + 1), + port_prop.state); + if (0 > result) { + + TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_INOUT, + "INIT: Error <%d> saving GID. <%s:%d:%d>", + result, device->name, counter + 1, + dev_prop.phys_port_cnt); + } + } + /* + * allocate and set async event handler. + */ + handler = kmalloc(sizeof(*handler), GFP_KERNEL); - result = ip2pr_src_gid_add(hca_device, j, - port_prop.state); - if (0 > result) { - goto port_err; - } - } /* for */ - } /* for */ - return (0); + INIT_IB_EVENT_HANDLER(handler, device, ip2pr_event_func); - port_err: - kmem_cache_destroy(_tsIp2prLinkRoot.src_gid_cache); - kmem_cache_destroy(_tsIp2prLinkRoot.gid_pr_cache); + result = ib_register_event_handler(handler); + if (result) { - return (result); + TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, + "INIT: Error <%d> registering event handler.", + result); + + kfree(handler); + } + else { + + ib_set_client_data(device, &ip2pr_client, handler); + } + + return; } /* ip2pr_link_addr_init -- initialize the advertisment caches. 
*/ -s32 ip2pr_link_addr_init(void) +int ip2pr_link_addr_init(void) { - s32 result = 0; - int i; - struct ib_device *hca_device; + int result = 0; TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_INOUT, "INIT: Link level services initialization."); - if (NULL != _tsIp2prLinkRoot.wait_cache) { + if (NULL != _link_root.wait_cache) { TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, "INIT: Wait cache is already initialized!"); @@ -2113,67 +2144,87 @@ result = -EINVAL; goto error; } - /* if */ /* * create cache */ - _tsIp2prLinkRoot.wait_cache = kmem_cache_create("Ip2prIpoibWait", - sizeof - (struct ip2pr_ipoib_wait), - 0, SLAB_HWCACHE_ALIGN, - NULL, NULL); - if (NULL == _tsIp2prLinkRoot.wait_cache) { + _link_root.wait_cache = kmem_cache_create("ip2pr_wait", + sizeof + (struct ip2pr_ipoib_wait), + 0, SLAB_HWCACHE_ALIGN, + NULL, NULL); + if (NULL == _link_root.wait_cache) { TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, "INIT: Failed to create wait cache."); - + result = -ENOMEM; goto error_wait; } - /* if */ - _tsIp2prLinkRoot.path_cache = kmem_cache_create("Ip2prPathLookup", - sizeof - (struct ip2pr_path_element), - 0, SLAB_HWCACHE_ALIGN, - NULL, NULL); - if (NULL == _tsIp2prLinkRoot.path_cache) { + _link_root.path_cache = kmem_cache_create("ip2pr_path", + sizeof + (struct ip2pr_path_element), + 0, SLAB_HWCACHE_ALIGN, + NULL, NULL); + if (NULL == _link_root.path_cache) { + TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, "INIT: Failed to create path lookup cache."); result = -ENOMEM; goto error_path; } - /* if */ - _tsIp2prLinkRoot.user_req = kmem_cache_create("Ip2prUserReq", - sizeof - (struct ip2pr_user_req), - 0, SLAB_HWCACHE_ALIGN, - NULL, NULL); - if (NULL == _tsIp2prLinkRoot.user_req) { + _link_root.user_req = kmem_cache_create("ip2pr_user", + sizeof(struct ip2pr_user_req), + 0, SLAB_HWCACHE_ALIGN, + NULL, NULL); + if (NULL == _link_root.user_req) { + TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, "INIT: Failed to create user request cache."); result = -ENOMEM; goto error_user; } + + _link_root.src_gid_cache = kmem_cache_create("ip2pr_src_gid", + sizeof + (struct ip2pr_sgid_element), + 0, + SLAB_HWCACHE_ALIGN, + NULL, NULL); + if (NULL == _link_root.src_gid_cache) { + + TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, + "INIT: Failed to create src gid cache."); + result = -ENOMEM; + goto error_gid; + } + + _link_root.gid_pr_cache = kmem_cache_create("ip2pr_gid_pr", + sizeof + (struct ip2pr_gid_pr_element), + 0, SLAB_HWCACHE_ALIGN, + NULL, NULL); + if (NULL == _link_root.gid_pr_cache) { + + TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, + "INIT: Failed to create gid to pr list cache."); + + result = -ENOMEM; + goto error_pre; + } /* - * Install async event handler, to clear cache on port down + * register for device events. */ + result = ib_register_client(&ip2pr_client); + if (0 > result) { - for (i = 0; ((hca_device = ib_device_get_by_index(i)) != NULL); ++i) { - INIT_IB_EVENT_HANDLER(&_tsIp2prEventHandle[i], - hca_device, ip2pr_event_func); - result = ib_register_event_handler(&_tsIp2prEventHandle[i]); - if (result) { - TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, - "INIT: Error <%d> registering event handler.", - result); - goto error_async; - } + TS_TRACE(MOD_IP2PR, T_TERSE, TRACE_FLOW_FATAL, + "INIT: Error <%d> registering client.", result); + goto error_hca; } - /* * create timer for pruning path record cache. 
*/ @@ -2189,41 +2240,43 @@ */ dev_add_pack(&_sdp_arp_type); - _tsIp2prLinkRoot.backoff = TS_IP2PR_PATH_BACKOFF; - _tsIp2prLinkRoot.max_retries = TS_IP2PR_PATH_RETRIES; - _tsIp2prLinkRoot.retry_timeout = TS_IP2PR_DEV_PATH_WAIT; - _tsIp2prLinkRoot.cache_timeout = TS_IP2PR_PATH_REAPING_AGE; + _link_root.backoff = TS_IP2PR_PATH_BACKOFF; + _link_root.max_retries = TS_IP2PR_PATH_RETRIES; + _link_root.retry_timeout = TS_IP2PR_DEV_PATH_WAIT; + _link_root.cache_timeout = TS_IP2PR_PATH_REAPING_AGE; return 0; - error_async: - - for (i = 0; i < IP2PR_MAX_HCAS; i++) - if (_tsIp2prEventHandle[i].device) - ib_unregister_event_handler(&_tsIp2prEventHandle[i]); - - kmem_cache_destroy(_tsIp2prLinkRoot.user_req); - error_user: - kmem_cache_destroy(_tsIp2prLinkRoot.path_cache); - error_path: - kmem_cache_destroy(_tsIp2prLinkRoot.wait_cache); - error_wait: - error: +error_hca: + kmem_cache_destroy(_link_root.gid_pr_cache); +error_pre: + kmem_cache_destroy(_link_root.src_gid_cache); +error_gid: + kmem_cache_destroy(_link_root.user_req); +error_user: + kmem_cache_destroy(_link_root.path_cache); +error_path: + kmem_cache_destroy(_link_root.wait_cache); +error_wait: +error: return result; } /* ip2pr_link_addr_cleanup -- cleanup the advertisment caches. */ -s32 ip2pr_link_addr_cleanup(void) +int ip2pr_link_addr_cleanup(void) { struct ip2pr_path_element *path_elmt; struct ip2pr_ipoib_wait *ipoib_wait; - u32 result; - int i; + int result; - TS_CHECK_NULL(_tsIp2prLinkRoot.wait_cache, -EINVAL); + TS_CHECK_NULL(_link_root.wait_cache, -EINVAL); TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_INOUT, "INIT: Link level services cleanup."); /* + * delete list of HCAs/PORTs + */ + ib_unregister_client(&ip2pr_client); + /* * stop cache pruning timer */ tsKernelTimerRemove(&_tsIp2prPathTimer); @@ -2231,25 +2284,17 @@ * remove ARP packet processing. 
*/ dev_remove_pack(&_sdp_arp_type); - /* - * release async event handler(s) - */ - for (i = 0; i < IP2PR_MAX_HCAS; i++) - if (_tsIp2prEventHandle[i].device) - ib_unregister_event_handler(&_tsIp2prEventHandle[i]); - - /* * clear wait list */ - while (NULL != (ipoib_wait = _tsIp2prLinkRoot.wait_list)) { + while (NULL != (ipoib_wait = _link_root.wait_list)) { result = ip2pr_ipoib_wait_destroy(ipoib_wait, IP2PR_LOCK_NOT_HELD); TS_EXPECT(MOD_IP2PR, !(0 > result)); } /* while */ - while (NULL != (path_elmt = _tsIp2prLinkRoot.path_list)) { + while (NULL != (path_elmt = _link_root.path_list)) { result = ip2pr_path_element_destroy(path_elmt); TS_EXPECT(MOD_IP2PR, !(0 > result)); @@ -2257,10 +2302,13 @@ /* * delete cache */ - kmem_cache_destroy(_tsIp2prLinkRoot.wait_cache); - kmem_cache_destroy(_tsIp2prLinkRoot.path_cache); - kmem_cache_destroy(_tsIp2prLinkRoot.user_req); + kmem_cache_destroy(_link_root.gid_pr_cache); + kmem_cache_destroy(_link_root.src_gid_cache); + kmem_cache_destroy(_link_root.wait_cache); + kmem_cache_destroy(_link_root.path_cache); + kmem_cache_destroy(_link_root.user_req); + return 0; } @@ -2330,7 +2378,7 @@ return (-EINVAL); } - ureq = kmem_cache_alloc(_tsIp2prLinkRoot.user_req, SLAB_ATOMIC); + ureq = kmem_cache_alloc(_link_root.user_req, SLAB_ATOMIC); if (NULL == ureq) { return (-ENOMEM); } @@ -2340,25 +2388,25 @@ status = ip2pr_path_record_lookup(param.dst_addr, 0, 0, 0, ip2pr_cb_internal, ureq, &plid); if (status < 0) { - kmem_cache_free(_tsIp2prLinkRoot.user_req, ureq); + kmem_cache_free(_link_root.user_req, ureq); return (-EFAULT); } status = down_interruptible(&ureq->sem); if (status) { ip2pr_path_record_cancel(plid); - kmem_cache_free(_tsIp2prLinkRoot.user_req, ureq); + kmem_cache_free(_link_root.user_req, ureq); return (-EINTR); } if (ureq->status) { - kmem_cache_free(_tsIp2prLinkRoot.user_req, ureq); + kmem_cache_free(_link_root.user_req, ureq); return (-EHOSTUNREACH); } copy_to_user(param.path_record, &ureq->path_record, sizeof(*param.path_record)); - kmem_cache_free(_tsIp2prLinkRoot.user_req, ureq); + kmem_cache_free(_link_root.user_req, ureq); return (0); } @@ -2383,7 +2431,7 @@ if (NULL == param.path_record) { return (-EINVAL); } - ureq = kmem_cache_alloc(_tsIp2prLinkRoot.user_req, SLAB_ATOMIC); + ureq = kmem_cache_alloc(_link_root.user_req, SLAB_ATOMIC); if (NULL == ureq) { return (-ENOMEM); } @@ -2393,19 +2441,19 @@ status = gid2pr_lookup(param.src_gid, param.dst_gid, param.pkey, gid2pr_cb_internal, (void *) ureq, &plid); if (status < 0) { - kmem_cache_free(_tsIp2prLinkRoot.user_req, ureq); + kmem_cache_free(_link_root.user_req, ureq); return (-EFAULT); } status = down_interruptible(&ureq->sem); if (status) { gid2pr_cancel(plid); - kmem_cache_free(_tsIp2prLinkRoot.user_req, ureq); + kmem_cache_free(_link_root.user_req, ureq); return (-EINTR); } if (ureq->status) { - kmem_cache_free(_tsIp2prLinkRoot.user_req, ureq); + kmem_cache_free(_link_root.user_req, ureq); return (-EHOSTUNREACH); } @@ -2414,7 +2462,7 @@ copy_to_user(&upa->port, &ureq->port, sizeof(upa->port)); copy_to_user(param.path_record, &ureq->path_record, sizeof(*param.path_record)); - kmem_cache_free(_tsIp2prLinkRoot.user_req, ureq); + kmem_cache_free(_link_root.user_req, ureq); return (0); } From libor at topspin.com Wed Oct 13 18:28:00 2004 From: libor at topspin.com (Libor Michalek) Date: Wed, 13 Oct 2004 18:28:00 -0700 Subject: [openib-general] [PATCH] SDP buffer managment simplification. 
In-Reply-To: <20041013174101.B16145@topspin.com>; from libor@topspin.com on Wed, Oct 13, 2004 at 05:41:01PM -0700 References: <52pt3vpoxz.fsf@topspin.com> <20041013134420.A16145@topspin.com> <52lleady1n.fsf@topspin.com> <20041013174101.B16145@topspin.com> Message-ID: <20041013182800.C16145@topspin.com> This patch is a simpliification to the SDP buffer management code. The previous code would never return memory once it had grabbed it, and the way the buffers were allocated made adding that feature difficult. So, I simplified the code in the process of adding this behaviour. -Libor Index: infiniband/ulp/sdp/sdp_proto.h =================================================================== --- infiniband/ulp/sdp/sdp_proto.h (revision 988) +++ infiniband/ulp/sdp/sdp_proto.h (working copy) @@ -98,7 +98,10 @@ tSDP_BUFF_TEST_FUNC test_func, void *usr_arg); -int sdp_buff_pool_init(u32 buff_min, u32 buff_max); +int sdp_buff_pool_init(int buff_min, + int buff_max, + int alloc_inc, + int free_mark); void sdp_buff_pool_destroy(void); @@ -276,8 +279,6 @@ /* --------------------------------------------------------------------- */ int sdp_conn_table_init(int proto_family, int conn_size, - int buff_min, - int buff_max, int recv_post_max, int recv_buff_max, int send_post_max, Index: infiniband/ulp/sdp/sdp_buff.h =================================================================== --- infiniband/ulp/sdp/sdp_buff.h (revision 988) +++ infiniband/ulp/sdp/sdp_buff.h (working copy) @@ -32,37 +32,36 @@ * structures */ struct sdpc_buff_q { - struct sdpc_buff *head; /* double linked list of buffers */ - u32 size; /* current number of buffers allocated to the pool */ + struct sdpc_buff *head; /* double linked list of buffers */ + u32 size; /* number of buffers in the pool */ #ifdef _TS_SDP_DEBUG_POOL_NAME - char *name; /* pointer to pools name */ + char *name; /* pointer to pools name */ #endif }; /* struct sdpc_buff_q */ struct sdpc_buff { struct sdpc_buff *next; struct sdpc_buff *prev; - u32 type; /* element type. (for generic queue) */ - struct sdpc_buff_q *pool; /* pool currently holding this buffer. */ - tSDP_GENERIC_DESTRUCT_FUNC release; /* release the object */ + u32 type; /* element type. (for generic queue) */ + struct sdpc_buff_q *pool; /* pool currently holding this buffer. 
*/ + tSDP_GENERIC_DESTRUCT_FUNC release; /* release the object */ /* * primary generic data pointers */ - void *head; /* first byte of data buffer */ - void *data; /* first byte of valid data in buffer */ - void *tail; /* last byte of valid data in buffer */ - void *end; /* last byte of data buffer */ + void *head; /* first byte of data buffer */ + void *data; /* first byte of valid data in buffer */ + void *tail; /* last byte of valid data in buffer */ + void *end; /* last byte of data buffer */ /* * Experimental */ - u32 flags; /* Buffer flags */ - u32 u_id; /* unique buffer ID, used for tracking */ + u32 flags; /* Buffer flags */ /* * Protocol specific data */ - struct msg_hdr_bsdh *bsdh_hdr; /* SDP header (BSDH) */ - u32 data_size; /* size of just data in the buffer */ - u32 ib_wrid; /* IB work request ID */ + struct msg_hdr_bsdh *bsdh_hdr; /* SDP header (BSDH) */ + u32 data_size; /* size of just data in the buffer */ + u32 ib_wrid; /* IB work request ID */ /* * IB specific data (The main buffer pool sets the lkey when * it is created) Index: infiniband/ulp/sdp/sdp_buff_p.h =================================================================== --- infiniband/ulp/sdp/sdp_buff_p.h (revision 988) +++ infiniband/ulp/sdp/sdp_buff_p.h (working copy) @@ -38,59 +38,36 @@ * definitions */ #define TS_SDP_BUFFER_COUNT_MIN 1024 -#define TS_SDP_BUFFER_COUNT_MAX 131072 -#define TS_SDP_BUFFER_COUNT_INC 1024 +#define TS_SDP_BUFFER_COUNT_MAX 1048576 +#define TS_SDP_BUFFER_COUNT_INC 128 +#define TS_SDP_BUFFER_FREE_MARK 1024 #define TS_SDP_POOL_NAME_MAX 16 /* maximum size pool name */ #define TS_SDP_MAIN_POOL_NAME "main" -#define TS_SDP_BUFF_OUT_LEN 33 /* size of buffer output line */ /* - * types - */ -typedef struct tSDP_MAIN_POOL_STRUCT tSDP_MAIN_POOL_STRUCT, *tSDP_MAIN_POOL; -typedef struct tSDP_MEMORY_SEGMENT_STRUCT tSDP_MEMORY_SEGMENT_STRUCT, - *tSDP_MEMORY_SEGMENT; -typedef struct tSDP_MEM_SEG_HEAD_STRUCT tSDP_MEM_SEG_HEAD_STRUCT, - *tSDP_MEM_SEG_HEAD; -/* * structures */ -struct tSDP_MAIN_POOL_STRUCT { +struct sdpc_buff_root { /* * variant */ - struct sdpc_buff_q pool; /* actual pool of buffers */ - spinlock_t lock; /* spin lock for pool access */ + struct sdpc_buff_q pool; /* actual pool of buffers */ + spinlock_t lock; /* spin lock for pool access */ /* * invariant */ - kmem_cache_t *pool_cache; /* cache of pool objects */ + kmem_cache_t *pool_cache; /* cache of pool objects */ + kmem_cache_t *buff_cache; /* cache of buffer descriptor objects */ - u32 buff_min; - u32 buff_max; - u32 buff_cur; - u32 buff_size; /* size of each buffer in the pool */ + int buff_min; /* minimum allocated buffers */ + int buff_max; /* maximum allocated buffers */ + int buff_cur; /* total allocated buffers */ + int buff_size; /* size of each buffer in the pool */ - tSDP_MEMORY_SEGMENT segs; -}; /* tSDP_MAIN_POOL_STRUCT */ + int alloc_inc; /* allocation increment */ + int free_mark; /* start freeing unused buffers */ +}; /* struct sdpc_buff_root */ -/* - * Each memory segment is its own page. 
- */ -struct tSDP_MEM_SEG_HEAD_STRUCT { - tSDP_MEMORY_SEGMENT next; - tSDP_MEMORY_SEGMENT prev; - u32 size; -}; /* tSDP_MEM_SEG_HEAD_STRUCT */ - -#define TS_SDP_BUFF_COUNT ((PAGE_SIZE - sizeof(tSDP_MEM_SEG_HEAD_STRUCT))/ \ - sizeof(struct sdpc_buff)) - -struct tSDP_MEMORY_SEGMENT_STRUCT { - tSDP_MEM_SEG_HEAD_STRUCT head; - struct sdpc_buff list[TS_SDP_BUFF_COUNT]; -}; /* tSDP_MEMORY_REGION_STRUCT */ - #endif /* _TS_SDP_BUFF_P_H */ Index: infiniband/ulp/sdp/sdp_conn.c =================================================================== --- infiniband/ulp/sdp/sdp_conn.c (revision 988) +++ infiniband/ulp/sdp/sdp_conn.c (working copy) @@ -2061,8 +2061,6 @@ /*..sdp_conn_table_init -- create a sdp connection table */ int sdp_conn_table_init(int proto_family, int conn_size, - int buff_min, - int buff_max, int recv_post_max, int recv_buff_max, int send_post_max, @@ -2151,17 +2149,7 @@ result); goto error_iocb; } - /* - * buffer memory - */ - result = sdp_buff_pool_init(buff_min, buff_max); - if (0 > result) { - TS_TRACE(MOD_LNX_SDP, T_TERSE, TRACE_FLOW_FATAL, - "INIT: Error <%d> initializing buffer pool.", result); - goto error_buff; - } - _dev_root_s.conn_cache = kmem_cache_create("SdpConnCache", sizeof(struct sdp_opt), 0, SLAB_HWCACHE_ALIGN, @@ -2203,8 +2191,6 @@ error_sock: kmem_cache_destroy(_dev_root_s.conn_cache); error_conn: - (void)sdp_buff_pool_destroy(); -error_buff: (void)sdp_main_iocb_cleanup(); error_iocb: _dev_root_s.sk_array--; @@ -2251,10 +2237,6 @@ */ (void)sdp_cm_listen_stop(&_dev_root_s); /* - * delete buffer memory - */ - (void)sdp_buff_pool_destroy(); - /* * delete IOCB table */ (void)sdp_main_iocb_cleanup(); Index: infiniband/ulp/sdp/sdp_inet.c =================================================================== --- infiniband/ulp/sdp/sdp_inet.c (revision 988) +++ infiniband/ulp/sdp/sdp_inet.c (working copy) @@ -27,10 +27,12 @@ /* * list of connections waiting for an incomming connection */ -static int _proto_family = TS_SDP_DEV_PROTO; -static int _buff_min = TS_SDP_BUFFER_COUNT_MIN; -static int _buff_max = TS_SDP_BUFFER_COUNT_MAX; -static int _conn_size = TS_SDP_DEV_SK_LIST_SIZE; +static int _proto_family = TS_SDP_DEV_PROTO; +static int _buff_min = TS_SDP_BUFFER_COUNT_MIN; +static int _buff_max = TS_SDP_BUFFER_COUNT_MAX; +static int _alloc_inc = TS_SDP_BUFFER_COUNT_INC; +static int _free_mark = TS_SDP_BUFFER_FREE_MARK; +static int _conn_size = TS_SDP_DEV_SK_LIST_SIZE; static int _recv_post_max = TS_SDP_DEV_RECV_POST_MAX; static int _recv_buff_max = TS_SDP_RECV_BUFFERS_MAX; @@ -1837,17 +1839,28 @@ if (0 > result) { TS_TRACE(MOD_LNX_SDP, T_TERSE, TRACE_FLOW_FATAL, - "INIT: Error <%d> initializing SDP advertisments <%d>", + "INIT: Error <%d> initializing advertisments <%d>", result); goto error_advt; } /* + * buffer memory + */ + result = sdp_buff_pool_init(_buff_min, + _buff_max, + _alloc_inc, + _free_mark); + if (0 > result) { + + TS_TRACE(MOD_LNX_SDP, T_TERSE, TRACE_FLOW_FATAL, + "INIT: Error <%d> initializing buffer pool.", result); + goto error_buff; + } + /* * connection table */ result = sdp_conn_table_init(_proto_family, _conn_size, - _buff_min, - _buff_max, _recv_post_max, _recv_buff_max, _send_post_max, @@ -1878,6 +1891,8 @@ error_sock: (void)sdp_conn_table_clear(); error_conn: + (void)sdp_buff_pool_destroy(); +error_buff: (void)sdp_main_advt_cleanup(); error_advt: (void)sdp_main_desc_cleanup(); @@ -1902,6 +1917,10 @@ */ (void)sdp_conn_table_clear(); /* + * delete buffer memory + */ + (void)sdp_buff_pool_destroy(); + /* * delete advertisment table */ 
(void)sdp_main_advt_cleanup(); Index: infiniband/ulp/sdp/sdp_buff.c =================================================================== --- infiniband/ulp/sdp/sdp_buff.c (revision 988) +++ infiniband/ulp/sdp/sdp_buff.c (working copy) @@ -24,16 +24,19 @@ #include "sdp_main.h" static char _main_pool_name[] = TS_SDP_MAIN_POOL_NAME; -static tSDP_MAIN_POOL main_pool = NULL; +static struct sdpc_buff_root *main_pool = NULL; /* * data buffers managment API */ /* ========================================================================= */ /*.._sdp_buff_q_get - Get a buffer from a specific pool */ -static __inline__ struct sdpc_buff *_sdp_buff_q_get(struct sdpc_buff_q *pool, - int fifo, - tSDP_BUFF_TEST_FUNC test_func, - void *usr_arg) +static __inline__ struct sdpc_buff *_sdp_buff_q_get +( + struct sdpc_buff_q *pool, + int fifo, + tSDP_BUFF_TEST_FUNC test_func, + void *usr_arg +) { struct sdpc_buff *buff; @@ -483,11 +486,11 @@ if (0 > result) { TS_TRACE(MOD_LNX_SDP, T_VERY_VERBOSE, TRACE_FLOW_INOUT, - "BUFF: Error <%d> returning buffer to main. <%d>", - result, pool->size); + "BUFF: Error <%d> returning buffer to main", + result); } } - + return 0; } /* sdp_buff_q_clear */ @@ -498,190 +501,148 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ -/*.._sdp_buff_pool_seg_release -- release buffers from the segment */ -static int _sdp_buff_pool_seg_release(tSDP_MEMORY_SEGMENT mem_seg) +/*.._sdp_buff_pool_release -- release allocated buffers from the main pool */ +static int _sdp_buff_pool_release(struct sdpc_buff_root *m_pool, + int count) { - TS_CHECK_NULL(mem_seg, -EINVAL); - /* - * loop through pages. - */ - while (0 < mem_seg->head.size) { + struct sdpc_buff *buff; - mem_seg->head.size--; - free_page((unsigned long)mem_seg->list[mem_seg->head.size]. - head); - } - /* - * free descriptor page - */ - free_page((unsigned long)mem_seg); - - return 0; -} /* _sdp_buff_pool_seg_release */ - -/* ========================================================================= */ -/*.._sdp_buff_pool_seg_release_all -- release buffers from the segment */ -static int _sdp_buff_pool_seg_release_all(tSDP_MAIN_POOL m_pool) -{ - tSDP_MEMORY_SEGMENT mem_seg; - int result; - TS_CHECK_NULL(m_pool, -EINVAL); /* - * loop through pages. + * Release count buffers. */ - while (NULL != m_pool->segs) { - - mem_seg = m_pool->segs; - m_pool->segs = mem_seg->head.next; - - m_pool->buff_cur -= mem_seg->head.size; - - result = _sdp_buff_pool_seg_release(mem_seg); - TS_EXPECT(MOD_LNX_SDP, !(0 > result)); + while (count--) { + + buff = sdp_buff_q_get(&m_pool->pool); + if (NULL == buff) { + + break; + } + /* + * decrement global buffer count, free buffer page, and free + * buffer descriptor. 
+ */ + m_pool->buff_cur--; + free_page((unsigned long)buff->head); + kmem_cache_free(m_pool->buff_cache, buff); } - + return 0; -} /* _sdp_buff_pool_seg_release_all */ +} /* _sdp_buff_pool_release */ /* ========================================================================= */ -/*.._sdp_buff_pool_seg_alloc -- allocate more buffers for the main pool */ -static tSDP_MEMORY_SEGMENT _sdp_buff_pool_seg_alloc(void) +/*.._sdp_buff_pool_release_check -- check for buffer release from main pool */ +static __inline__ int _sdp_buff_pool_release_check +( + struct sdpc_buff_root *m_pool +) { - tSDP_MEMORY_SEGMENT mem_seg; - int counter; - int result; + TS_CHECK_NULL(m_pool, -EINVAL); /* - * get descriptor page + * If there are more then minimum buffers outstanding, free half of + * the available buffers. */ - mem_seg = (tSDP_MEMORY_SEGMENT) __get_free_page(GFP_ATOMIC); - if (NULL == mem_seg) { + if (m_pool->buff_cur > m_pool->buff_min && + m_pool->pool.size > m_pool->free_mark) { + int count; + /* + * Always leave at least minimum buffers, otherwise remove + * either half of the pool, which is more then the mark + */ + count = min((m_pool->buff_cur - m_pool->buff_min), + (m_pool->free_mark/2)); - TS_TRACE(MOD_LNX_SDP, T_TERSE, TRACE_FLOW_FATAL, - "BUFFER: Failed to allocate descriptor page."); - - goto error; + return _sdp_buff_pool_release(m_pool, count); } - /* - * loop - */ - for (counter = 0, mem_seg->head.size = 0; - counter < TS_SDP_BUFF_COUNT; counter++, mem_seg->head.size++) { - - mem_seg->list[counter].head = - (void *) __get_free_page(GFP_ATOMIC); - if (NULL == mem_seg->list[counter].head) { - - TS_TRACE(MOD_LNX_SDP, T_TERSE, TRACE_FLOW_FATAL, - "BUFFER: Failed to allocate buffer page. <%d>", - counter); - - goto error_free; - } - - mem_seg->list[counter].end = - mem_seg->list[counter].head + PAGE_SIZE; - mem_seg->list[counter].data = mem_seg->list[counter].head; - mem_seg->list[counter].tail = mem_seg->list[counter].head; - mem_seg->list[counter].lkey = 0; - mem_seg->list[counter].real = 0; - mem_seg->list[counter].size = 0; - mem_seg->list[counter].u_id = 0; - mem_seg->list[counter].pool = NULL; - mem_seg->list[counter].type = TS_SDP_GENERIC_TYPE_BUFF; - mem_seg->list[counter].release = ((tSDP_GENERIC_DESTRUCT_FUNC) - sdp_buff_pool_put); + else { + + return 0; } - /* - * return segment - */ - return mem_seg; -error_free: - result = _sdp_buff_pool_seg_release(mem_seg); - TS_EXPECT(MOD_LNX_SDP, !(0 > result)); -error: - return NULL; -} /* _sdp_buff_pool_seg_alloc */ +} /* _sdp_buff_pool_release_check */ /* ========================================================================= */ /*.._sdp_buff_pool_alloc -- allocate more buffers for the main pool */ -static int _sdp_buff_pool_alloc(tSDP_MAIN_POOL m_pool, u32 size) +static int _sdp_buff_pool_alloc(struct sdpc_buff_root *m_pool) { - tSDP_MEMORY_SEGMENT head_seg = NULL; - tSDP_MEMORY_SEGMENT mem_seg; - u32 counter = 0; - u32 total = 0; + struct sdpc_buff *buff; + int total; int result; TS_CHECK_NULL(m_pool, -EINVAL); /* - * check pool limits. + * Calculate the total number of buffers. */ - if (m_pool->buff_max < (m_pool->buff_cur + size)) { + total = max(m_pool->buff_min, (m_pool->buff_cur + m_pool->alloc_inc)); + total = min(total, m_pool->buff_max); - goto error; - } - /* - * first allocate the requested number of buffers. Once complete - * place them all into the main pool. - */ - while (total < size) { + while (total > m_pool->buff_cur) { + /* + * allocate a buffer descriptor, buffer, and then add it to + * the pool. 
+ */ + buff = kmem_cache_alloc(m_pool->buff_cache, GFP_ATOMIC); + if (NULL == buff) { - mem_seg = _sdp_buff_pool_seg_alloc(); - if (NULL == mem_seg) { - TS_TRACE(MOD_LNX_SDP, T_TERSE, TRACE_FLOW_FATAL, - "BUFFER: Failed to allocate segment."); - - goto error; + "BUFFER: Failed to allocate buffer. <%d:%d>", + total, m_pool->buff_cur); + break; } - mem_seg->head.next = head_seg; - head_seg = mem_seg; + buff->head = (void *)__get_free_page(GFP_ATOMIC); + if (NULL == buff->head) { - total += mem_seg->head.size; - } - /* - * insert each segment into the list, and insert each buffer into - * the main pool - */ - while (NULL != head_seg) { + TS_TRACE(MOD_LNX_SDP, T_TERSE, TRACE_FLOW_FATAL, + "BUFFER: Failed to allocate page. <%d:%d>", + total, m_pool->buff_cur); - mem_seg = head_seg; - head_seg = mem_seg->head.next; + kmem_cache_free(m_pool->buff_cache, buff); + break; + } + + buff->end = buff->head + PAGE_SIZE; + buff->data = buff->head; + buff->tail = buff->head; + buff->lkey = 0; + buff->real = 0; + buff->size = 0; + buff->pool = NULL; + buff->type = TS_SDP_GENERIC_TYPE_BUFF; + buff->release = ((tSDP_GENERIC_DESTRUCT_FUNC)sdp_buff_q_put); - mem_seg->head.next = m_pool->segs; - m_pool->segs = mem_seg; + result = sdp_buff_q_put(&m_pool->pool, buff); + if (0 > result) { - for (counter = 0; counter < mem_seg->head.size; counter++) { - - mem_seg->list[counter].u_id = m_pool->buff_cur++; - - result = - sdp_buff_q_put(&main_pool->pool, - &mem_seg->list[counter]); - TS_EXPECT(MOD_LNX_SDP, !(0 > result)); + TS_TRACE(MOD_LNX_SDP, T_TERSE, TRACE_FLOW_FATAL, + "BUFFER: Failed to insert buffer. <%d>", + result); + + free_page((unsigned long)buff->head); + kmem_cache_free(m_pool->buff_cache, buff); + break; } + + m_pool->buff_cur++; } - return total; -error: + if (NULL == main_pool->pool.head) { - while (NULL != head_seg) { + TS_TRACE(MOD_LNX_SDP, T_VERBOSE, TRACE_FLOW_FATAL, + "BUFFER: Failed to allocate any buffers. <%d:%d:%d>", + total, m_pool->buff_cur, m_pool->alloc_inc); - mem_seg = head_seg; - head_seg = mem_seg->head.next; - - result = _sdp_buff_pool_seg_release(mem_seg); - TS_EXPECT(MOD_LNX_SDP, !(0 > result)); + return -ENOMEM; } - return -ENOMEM; + return 0; } /* _sdp_buff_pool_alloc */ /* ========================================================================= */ /*..sdp_buff_pool_init - Initialize the main buffer pool of memory */ -int sdp_buff_pool_init(u32 buff_min, u32 buff_max) +int sdp_buff_pool_init(int buff_min, + int buff_max, + int alloc_inc, + int free_mark) { int result; @@ -692,17 +653,20 @@ return -EEXIST; } - if (!(0 < buff_min) || buff_max < buff_min) { + if (!(0 < buff_min) || + !(0 < alloc_inc) || + !(0 < free_mark) || + buff_max < buff_min) { TS_TRACE(MOD_LNX_SDP, T_TERSE, TRACE_FLOW_FATAL, - "BUFFER: Pool allocation count error. <%d:%d>", - buff_min, buff_max); + "BUFFER: Pool allocation count error. 
<%d:%d:%d:%d>", + buff_min, buff_max, alloc_inc, free_mark); return -ERANGE; } /* * allocate the main pool structures */ - main_pool = kmalloc(sizeof(tSDP_MAIN_POOL_STRUCT), GFP_KERNEL); + main_pool = kmalloc(sizeof(struct sdpc_buff_root), GFP_KERNEL); if (NULL == main_pool) { TS_TRACE(MOD_LNX_SDP, T_TERSE, TRACE_FLOW_FATAL, @@ -711,18 +675,20 @@ goto done; } - memset(main_pool, 0, sizeof(tSDP_MAIN_POOL_STRUCT)); + memset(main_pool, 0, sizeof(struct sdpc_buff_root)); main_pool->buff_size = PAGE_SIZE; - main_pool->buff_min = buff_min; - main_pool->buff_max = buff_max; + main_pool->buff_min = buff_min; + main_pool->buff_max = buff_max; + main_pool->alloc_inc = alloc_inc; + main_pool->free_mark = free_mark; spin_lock_init(&main_pool->lock); result = sdp_buff_q_init(&main_pool->pool, _main_pool_name, 0); TS_EXPECT(MOD_LNX_SDP, !(0 > result)); - main_pool->pool_cache = kmem_cache_create("SdpBuffPool", + main_pool->pool_cache = kmem_cache_create("sdp_buff_pool", sizeof(struct sdpc_buff_q), 0, SLAB_HWCACHE_ALIGN, NULL, NULL); @@ -733,10 +699,22 @@ result = -ENOMEM; goto error_pool; } + + main_pool->buff_cache = kmem_cache_create("sdp_buff_desc", + sizeof(struct sdpc_buff), + 0, SLAB_HWCACHE_ALIGN, + NULL, NULL); + if (NULL == main_pool->buff_cache) { + + TS_TRACE(MOD_LNX_SDP, T_TERSE, TRACE_FLOW_FATAL, + "BUFFER: Failed to allocate buffer cache."); + result = -ENOMEM; + goto error_buff; + } /* if */ /* * allocate the minimum number of buffers. */ - result = _sdp_buff_pool_alloc(main_pool, buff_min); + result = _sdp_buff_pool_alloc(main_pool); if (0 > result) { TS_TRACE(MOD_LNX_SDP, T_TERSE, TRACE_FLOW_FATAL, @@ -753,6 +731,8 @@ return 0; /* success */ error_alloc: + kmem_cache_destroy(main_pool->buff_cache); +error_buff: kmem_cache_destroy(main_pool->pool_cache); error_pool: kfree(main_pool); @@ -772,13 +752,23 @@ return; } /* - * Free all the memory regions + * Free all the buffers. */ - (void)_sdp_buff_pool_seg_release_all(main_pool); + (void)_sdp_buff_pool_release(main_pool, main_pool->buff_cur); /* + * Sanity check that the current number of buffers was released. + */ + if (main_pool->buff_cur) { + + TS_TRACE(MOD_LNX_SDP, T_TERSE, TRACE_FLOW_CLEANUP, + "BUFFER: Leaking buffers during cleanup. 
<%d>", + main_pool->buff_cur); + } + /* * free pool cache */ kmem_cache_destroy(main_pool->pool_cache); + kmem_cache_destroy(main_pool->buff_cache); /* * free main */ @@ -809,14 +799,13 @@ if (NULL == main_pool->pool.head) { - result = - _sdp_buff_pool_alloc(main_pool, TS_SDP_BUFFER_COUNT_INC); + result = _sdp_buff_pool_alloc(main_pool); if (0 > result) { TS_TRACE(MOD_LNX_SDP, T_TERSE, TRACE_FLOW_FATAL, "BUFFER: Error <%d> allocating buffers.", result); - + spin_unlock_irqrestore(&main_pool->lock, flags); return NULL; } @@ -899,6 +888,8 @@ main_pool->pool.size++; + (void)_sdp_buff_pool_release_check(main_pool); + spin_unlock_irqrestore(&main_pool->lock, flags); return 0; @@ -972,6 +963,8 @@ main_pool->pool.size += count; + (void)_sdp_buff_pool_release_check(main_pool); + spin_unlock_irqrestore(&main_pool->lock, flags); return 0; @@ -1021,11 +1014,8 @@ off_t start_index, long *end_index) { - tSDP_MEMORY_SEGMENT mem_seg; - int buff_count; - int offset = 0; - int counter; unsigned long flags; + int offset = 0; TS_CHECK_NULL(buffer, -EINVAL); /* @@ -1039,19 +1029,22 @@ if (0 == start_index) { - offset += sprintf((buffer + offset), "Totals:\n"); - offset += sprintf((buffer + offset), "-------\n"); - offset += sprintf((buffer + offset), " buffer size: %8d\n", main_pool->buff_size); - offset += sprintf((buffer + offset), + offset += sprintf((buffer + offset), " buffers maximum: %8d\n", main_pool->buff_max); - offset += sprintf((buffer + offset), + offset += sprintf((buffer + offset), " buffers minimum: %8d\n", main_pool->buff_min); offset += sprintf((buffer + offset), + " buffers increment: %8d\n", + main_pool->alloc_inc); + offset += sprintf((buffer + offset), + " buffers decrement: %8d\n", + main_pool->free_mark); + offset += sprintf((buffer + offset), " buffers allocated: %8d\n", main_pool->buff_cur); offset += sprintf((buffer + offset), @@ -1060,54 +1053,8 @@ offset += sprintf((buffer + offset), " buffers outstanding: %8d\n", main_pool->buff_cur - main_pool->pool.size); - offset += sprintf((buffer + offset), "\nBuffers:\n"); - offset += sprintf((buffer + offset), "--------\n"); - offset += sprintf((buffer + offset), - " id size pool name\n"); - offset += sprintf((buffer + offset), - " -------- ---- ----------------\n"); } - /* - * buffers - */ - if (!(start_index < main_pool->buff_cur)) { - goto done; - } - - for (counter = 0, buff_count = 0, mem_seg = main_pool->segs; - NULL != mem_seg && TS_SDP_BUFF_OUT_LEN < (max_size - offset); - mem_seg = mem_seg->head.next) { - - for (counter = 0; - counter < mem_seg->head.size && - TS_SDP_BUFF_OUT_LEN < (max_size - offset); - counter++, buff_count++) { - - if (start_index > buff_count) { - - continue; - } - - offset += sprintf((buffer + offset), - " %08x %04x %-16s\n", - mem_seg->list[counter].u_id, - (int)(mem_seg->list[counter].tail - - mem_seg->list[counter].data), -#ifdef _TS_SDP_DEBUG_POOL_NAME - ((NULL != mem_seg->list[counter]. - pool) ? 
- mem_seg->list[counter].pool->name : - "") -#else - "" -#endif - ); - } - } - - *end_index = buff_count - start_index; -done: spin_unlock_irqrestore(&main_pool->lock, flags); return offset; From halr at voltaire.com Wed Oct 13 20:59:43 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 13 Oct 2004 23:59:43 -0400 Subject: [openib-general] Re: [PATCH] single exit in ib_mad_init_module In-Reply-To: <20041013160252.611c98e7.mshefty@ichips.intel.com> References: <20041013160252.611c98e7.mshefty@ichips.intel.com> Message-ID: <1097726382.2751.332.camel@localhost.localdomain> On Wed, 2004-10-13 at 19:02, Sean Hefty wrote: > Patch to fix cleanup issue in ib_mad_init_module. Thanks. Applied. -- Hal From roland at topspin.com Wed Oct 13 22:23:45 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 13 Oct 2004 22:23:45 -0700 Subject: [openib-general] Re: [PATCH] dynamic device init for IP2PR In-Reply-To: <20041013174101.B16145@topspin.com> (Libor Michalek's message of "Wed, 13 Oct 2004 17:41:01 -0700") References: <52pt3vpoxz.fsf@topspin.com> <20041013134420.A16145@topspin.com> <52lleady1n.fsf@topspin.com> <20041013174101.B16145@topspin.com> Message-ID: <52hdoxeuu6.fsf@topspin.com> Excellent, 7900 bonus points counting the 100 point deduction for this whitespace atrocity: module_init(ip2pr_driver_init_module); module_exit(ip2pr_driver_cleanup_module); + + + + + + + + + + From roland at topspin.com Wed Oct 13 22:26:20 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 13 Oct 2004 22:26:20 -0700 Subject: [openib-general] [PATCH] SDP buffer managment simplification. In-Reply-To: <20041013182800.C16145@topspin.com> (Libor Michalek's message of "Wed, 13 Oct 2004 18:28:00 -0700") References: <52pt3vpoxz.fsf@topspin.com> <20041013134420.A16145@topspin.com> <52lleady1n.fsf@topspin.com> <20041013174101.B16145@topspin.com> <20041013182800.C16145@topspin.com> Message-ID: <52d5zleupv.fsf@topspin.com> thanks, applied to my branch. - R. From halr at voltaire.com Thu Oct 14 05:30:45 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 14 Oct 2004 08:30:45 -0400 Subject: [openib-general] Re: [PATCH] fix list_entry usage In-Reply-To: <20041013165244.3ae058c4.mshefty@ichips.intel.com> References: <20041013165244.3ae058c4.mshefty@ichips.intel.com> Message-ID: <1097757045.22373.8.camel@hpc-1> On Wed, 2004-10-13 at 19:52, Sean Hefty wrote: > Patch fixes casting to incorrect structures when calling list_entry(). Have you tried this ? It doesn't work (at least for me). (I had also tried the previous similar patch with the same results but forgot to report back). It is on my list to debug further unless you get to it first. rbuf = list_entry(&port_priv->recv_posted_mad_list[qpn].next, struct ib_mad_recv_buf, list); does not appear to be obtaining next. Similarly for the other changes. -- Hal From halr at voltaire.com Thu Oct 14 06:12:21 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 14 Oct 2004 09:12:21 -0400 Subject: [openib-general] Switch SMI incoming MAD question Message-ID: <1097759540.22373.29.camel@hpc-1> Hi, We added port_num into the UD send_wr structure to accomodate switches. Isn't a similar thing needed for receive WCs ? The SMI needs to know which physical port the DR SMP came in on. If this is the case, it seems to me that the right thing to do to add this to the ib_mad_recv_wc structure. 
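As an illustration of the point above, here is a minimal user-space sketch in which the receive completion carries the physical arrival port and the SMI copies it into the DR SMP return path. The structure and field names are hypothetical stand-ins, not the actual ib_wc or ib_mad_recv_wc definitions.

#include <stdio.h>

/*
 * Simplified model only: a receive completion that records the physical
 * port a DR SMP arrived on, and the step where the SMI writes that port
 * into the SMP's return path so the response can go back out the same way.
 */
struct recv_completion_model {
        unsigned long wr_id;
        unsigned short slid;
        int port_num;           /* physical port the SMP arrived on */
};

static void record_return_port(unsigned char *return_path, int hop_ptr,
                               const struct recv_completion_model *wc)
{
        return_path[hop_ptr] = (unsigned char)wc->port_num;
}

int main(void)
{
        unsigned char return_path[64] = { 0 };
        struct recv_completion_model wc = { .wr_id = 1, .slid = 4, .port_num = 3 };

        record_return_port(return_path, 2, &wc);
        printf("DR SMP return_path[2] = %d\n", return_path[2]);
        return 0;
}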
-- Hal From halr at voltaire.com Thu Oct 14 06:16:26 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 14 Oct 2004 09:16:26 -0400 Subject: [openib-general] [PATCH] ib_smi: Make port number parameter to SMI DR routines Message-ID: <1097759786.22373.35.camel@hpc-1> ib_smi: Make port number parameter to SMI DR routines (in preparation for switch support where port number is not the port of the receiving MAD agent) Index: ib_smi.c =================================================================== --- ib_smi.c (revision 990) +++ ib_smi.c (working copy) @@ -42,7 +42,8 @@ * discarded. */ static int smi_handle_dr_smp_send(struct ib_mad_agent *mad_agent, - struct ib_smp *smp) + struct ib_smp *smp, + int port_num) { u8 hop_ptr, hop_cnt; @@ -55,7 +56,7 @@ if (hop_cnt && hop_ptr == 0) { smp->hop_ptr++; return (smp->initial_path[smp->hop_ptr] == - mad_agent->port_num); + port_num); } /* C14-9:2 */ @@ -66,7 +67,7 @@ /* smp->return_path set when received */ smp->hop_ptr++; return (smp->initial_path[smp->hop_ptr] == - mad_agent->port_num); + port_num); } /* C14-9:3 -- We're at the end of the DR segment of path */ @@ -86,7 +87,7 @@ if (hop_cnt && hop_ptr == hop_cnt + 1) { smp->hop_ptr--; return (smp->return_path[smp->hop_ptr] == - mad_agent->port_num); + port_num); } /* C14-13:2 */ @@ -96,7 +97,7 @@ smp->hop_ptr--; return (smp->return_path[smp->hop_ptr] == - mad_agent->port_num); + port_num); } /* C14-13:3 -- at the end of the DR segment of path */ @@ -118,12 +119,13 @@ * the spec. Return 0 if the SMP should be dropped. */ static int smi_handle_smp_send(struct ib_mad_agent *mad_agent, - struct ib_smp *smp) + struct ib_smp *smp, + int port_num) { switch (smp->mgmt_class) { case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE: - return smi_handle_dr_smp_send(mad_agent, smp); + return smi_handle_dr_smp_send(mad_agent, smp, port_num); default: /* LR SM or PerfMgmt classes */ return 1; } @@ -149,6 +151,7 @@ */ static int smi_handle_dr_smp_recv(struct ib_mad_agent *mad_agent, struct ib_smp *smp, + int port_num, int phys_port_cnt) { u8 hop_ptr, hop_cnt; @@ -175,7 +178,7 @@ /* C14-9:3 -- We're at the end of the DR segment of path */ if (hop_ptr == hop_cnt) { if (hop_cnt) - smp->return_path[hop_ptr] = mad_agent->port_num; + smp->return_path[hop_ptr] = port_num; /* smp->hop_ptr updated when sending */ return (mad_agent->device->node_type == IB_NODE_SWITCH || @@ -192,7 +195,7 @@ if (hop_cnt && hop_ptr == hop_cnt + 1) { smp->hop_ptr--; return (smp->return_path[smp->hop_ptr] == - mad_agent->port_num); + port_num); } /* C14-13:2 */ @@ -227,12 +230,14 @@ */ static int smi_handle_smp_recv(struct ib_mad_agent *mad_agent, struct ib_smp *smp, + int port_num, int phys_port_cnt) { switch (smp->mgmt_class) { case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE: - return smi_handle_dr_smp_recv(mad_agent, smp, phys_port_cnt); + return smi_handle_dr_smp_recv(mad_agent, smp, + port_num, phys_port_cnt); default: /* LR SM or PerfMgmt classes */ return 1; } @@ -402,7 +407,7 @@ struct ib_mad *smp_response; int ret; - if (!smi_handle_smp_send(mad_agent, smp)) { + if (!smi_handle_smp_send(mad_agent, smp, mad_agent->port_num)) { /* SMI failed send */ return 0; } @@ -417,6 +422,7 @@ if (ret & IB_MAD_RESULT_SUCCESS) { if (!smi_handle_smp_recv(mad_agent, (struct ib_smp *)smp_response, + mad_agent->port_num, phys_port_cnt)) { /* SMI failed receive */ kfree(smp_response); @@ -437,7 +443,8 @@ struct ib_mad_recv_wc *mad_recv_wc, int phys_port_cnt) { - if (!smi_handle_smp_recv(mad_agent, smp, phys_port_cnt)) { + if (!smi_handle_smp_recv(mad_agent, smp, + 
mad_agent->port_num, phys_port_cnt)) { /* SMI failed receive */ return 0; } From halr at voltaire.com Thu Oct 14 06:21:27 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 14 Oct 2004 09:21:27 -0400 Subject: [openib-general] Switch SMI incoming MAD question In-Reply-To: <1097759540.22373.29.camel@hpc-1> References: <1097759540.22373.29.camel@hpc-1> Message-ID: <1097760087.22373.38.camel@hpc-1> On Thu, 2004-10-14 at 09:12, Hal Rosenstock wrote: > Hi, > > We added port_num into the UD send_wr structure to accomodate switches. > Isn't a similar thing needed for receive WCs ? The SMI needs to know > which physical port the DR SMP came in on. If this is the case, it seems > to me that the right thing to do to add this to the ib_mad_recv_wc > structure. I'm half asleep (and half right above)... I think it needs to be added to ib_wc structure rather than ib_mad_recv_wc structure as the incoming port that the MAD layer receives on is not the same as the switch physical port. -- Hal From Tom.Duffy at Sun.COM Thu Oct 14 08:33:10 2004 From: Tom.Duffy at Sun.COM (Tom Duffy) Date: Thu, 14 Oct 2004 08:33:10 -0700 Subject: [openib-general] Re: [PATCH] dynamic device init for IP2PR In-Reply-To: <52hdoxeuu6.fsf@topspin.com> References: <52pt3vpoxz.fsf@topspin.com> <20041013134420.A16145@topspin.com> <52lleady1n.fsf@topspin.com> <20041013174101.B16145@topspin.com> <52hdoxeuu6.fsf@topspin.com> Message-ID: <416E9C36.4040507@sun.com> Roland Dreier wrote: > Excellent, 7900 bonus points counting the 100 point deduction for this > whitespace atrocity: That's what you get for using emacs ;-) -tduffy From tduffy at sun.com Thu Oct 14 09:24:30 2004 From: tduffy at sun.com (Tom Duffy) Date: Thu, 14 Oct 2004 09:24:30 -0700 Subject: [openib-general] Re: [openib-commits] r1001 - in gen1/trunk/src/userspace/osm: . opensm osmsh osmtest In-Reply-To: <20041014160410.C79AF2283D4@openib.ca.sandia.gov> References: <20041014160410.C79AF2283D4@openib.ca.sandia.gov> Message-ID: <1097771070.19323.4.camel@duffman> On Thu, 2004-10-14 at 09:04 -0700, eitan at openib.org wrote: > Supporting new vendor - ts_no_vapi > This optional vendor does not require VAPI user level and > is based on gen1 /proc/infiniband/core tree Huh? Why are you adding support for the (obsolete) gen1 openib? What about the gen2 tree that uses /sys/? -tduffy -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From mshefty at ichips.intel.com Thu Oct 14 10:16:43 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 14 Oct 2004 10:16:43 -0700 Subject: [openib-general] Re: [PATCH] fix list_entry usage In-Reply-To: <1097757045.22373.8.camel@hpc-1> References: <20041013165244.3ae058c4.mshefty@ichips.intel.com> <1097757045.22373.8.camel@hpc-1> Message-ID: <20041014101643.1fb402e0.mshefty@ichips.intel.com> On Thu, 14 Oct 2004 08:30:45 -0400 Hal Rosenstock wrote: > On Wed, 2004-10-13 at 19:52, Sean Hefty wrote: > > Patch fixes casting to incorrect structures when calling list_entry(). > > Have you tried this ? It doesn't work (at least for me). (I had also > tried the previous similar patch with the same results but forgot to > report back). It is on my list to debug further unless you get to it > first. 
> > rbuf = > list_entry(&port_priv->recv_posted_mad_list[qpn].next, > struct ib_mad_recv_buf, > list); I didn't try this (it was a diversion from the change that I was trying to make), but I'm guessing that we need to remove the &. - Sean From halr at voltaire.com Thu Oct 14 10:33:08 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 14 Oct 2004 13:33:08 -0400 Subject: [openib-general] [PATCH] ib_mad: Fix send only registrations Message-ID: <1097775188.2514.1.camel@hpc-1> ib_mad: Fix send only registrations Index: ib_mad.c =================================================================== --- ib_mad.c (revision 990) +++ ib_mad.c (working copy) @@ -116,11 +116,6 @@ goto error1; } - if (!send_handler) { - ret = ERR_PTR(-EINVAL); - goto error1; - } - if (rmpp_version) { ret = ERR_PTR(-EINVAL); /* until RMPP implemented!!! */ goto error1; @@ -128,11 +123,22 @@ /* Validate MAD registration request if supplied */ if (mad_reg_req) { - if (!recv_handler || - mad_reg_req->mgmt_class_version >= MAX_MGMT_VERSION) { + if (mad_reg_req->mgmt_class_version >= MAX_MGMT_VERSION) { ret = ERR_PTR(-EINVAL); goto error1; } + if (!bitmap_empty(mad_reg_req->method_mask, + IB_MGMT_MAX_METHODS)) { + if (!recv_handler) { + ret = ERR_PTR(-EINVAL); + goto error1; + } + } else { + if (!send_handler) { + ret = ERR_PTR(-EINVAL); + goto error1; + } + } if (mad_reg_req->mgmt_class >= MAX_MGMT_CLASS) { /* * IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE is the only @@ -151,6 +157,12 @@ ret = ERR_PTR(-EINVAL); goto error1; } + } else { + /* No registration request supplied */ + if (!send_handler) { + ret = ERR_PTR(-EINVAL); + goto error1; + } } /* Validate device and port */ @@ -842,7 +854,9 @@ spin_unlock_irqrestore(&mad_agent_priv->lock, flags); /* Defined behavior is to complete response before request */ - mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent, + if (mad_agent_priv->agent.recv_handler) + mad_agent_priv->agent.recv_handler( + &mad_agent_priv->agent, &recv->header.recv_wc); atomic_dec(&mad_agent_priv->refcount); @@ -851,7 +865,9 @@ mad_send_wc.wr_id = mad_send_wr->wr_id; ib_mad_complete_send_wr(mad_send_wr, &mad_send_wc); } else { - mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent, + if (mad_agent_priv->agent.recv_handler) + mad_agent_priv->agent.recv_handler( + &mad_agent_priv->agent, &recv->header.recv_wc); if (atomic_dec_and_test(&mad_agent_priv->refcount)) wake_up(&mad_agent_priv->wait); @@ -929,7 +945,7 @@ /* Snoop MAD ? */ if (port_priv->device->snoop_mad) { if (port_priv->device->snoop_mad(port_priv->device, - port_priv->port_num, + (u8)port_priv->port_num, wc->slid, recv->header.recv_buf.mad)) { goto ret; @@ -952,7 +968,7 @@ } ret: - if (!mad_agent) { + if (!mad_agent || !mad_agent->agent.recv_handler) { /* Should this case be optimized ? 
*/ kmem_cache_free(ib_mad_cache, recv); } @@ -1018,7 +1034,9 @@ if (mad_send_wr->status != IB_WC_SUCCESS ) mad_send_wc->status = mad_send_wr->status; - mad_agent_priv->agent.send_handler(&mad_agent_priv->agent, mad_send_wc); + if (mad_agent_priv->agent.send_handler) + mad_agent_priv->agent.send_handler(&mad_agent_priv->agent, + mad_send_wc); /* Release reference on agent taken when sending */ if (atomic_dec_and_test(&mad_agent_priv->refcount)) @@ -1135,7 +1153,9 @@ list_for_each_entry_safe(mad_send_wr, temp_mad_send_wr, &cancel_list, agent_list) { mad_send_wc.wr_id = mad_send_wr->wr_id; - mad_agent_priv->agent.send_handler(&mad_agent_priv->agent, + if (mad_agent_priv->agent.send_handler) + mad_agent_priv->agent.send_handler( + &mad_agent_priv->agent, &mad_send_wc); list_del(&mad_send_wr->agent_list); @@ -1196,8 +1216,9 @@ mad_send_wc.status = IB_WC_WR_FLUSH_ERR; mad_send_wc.vendor_err = 0; mad_send_wc.wr_id = mad_send_wr->wr_id; - mad_agent_priv->agent.send_handler(&mad_agent_priv->agent, - &mad_send_wc); + if (mad_agent_priv->agent.send_handler) + mad_agent_priv->agent.send_handler(&mad_agent_priv->agent, + &mad_send_wc); kfree(mad_send_wr); if (atomic_dec_and_test(&mad_agent_priv->refcount)) From halr at voltaire.com Thu Oct 14 10:55:01 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 14 Oct 2004 13:55:01 -0400 Subject: [openib-general] Re: [PATCH] fix list_entry usage In-Reply-To: <20041014101643.1fb402e0.mshefty@ichips.intel.com> References: <20041013165244.3ae058c4.mshefty@ichips.intel.com> <1097757045.22373.8.camel@hpc-1> <20041014101643.1fb402e0.mshefty@ichips.intel.com> Message-ID: <1097776501.2514.15.camel@hpc-1> On Thu, 2004-10-14 at 13:16, Sean Hefty wrote: > > rbuf = > > list_entry(&port_priv->recv_posted_mad_list[qpn].next, > > struct ib_mad_recv_buf, > > list); > > I didn't try this (it was a diversion from the change that I was > trying to make), but I'm guessing that we need to remove the &. Good guess :-) Thanks. Applied. -- Hal From mshefty at ichips.intel.com Thu Oct 14 10:55:05 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 14 Oct 2004 10:55:05 -0700 Subject: [openib-general] [PATCH] ib_mad: Fix send only registrations In-Reply-To: <1097775188.2514.1.camel@hpc-1> References: <1097775188.2514.1.camel@hpc-1> Message-ID: <20041014105505.0fb30306.mshefty@ichips.intel.com> On Thu, 14 Oct 2004 13:33:08 -0400 Hal Rosenstock wrote: > /* Validate MAD registration request if supplied */ > if (mad_reg_req) { > - if (!recv_handler || > - mad_reg_req->mgmt_class_version >= MAX_MGMT_VERSION) { > + if (mad_reg_req->mgmt_class_version >= MAX_MGMT_VERSION) { > ret = ERR_PTR(-EINVAL); > goto error1; > } > + if (!bitmap_empty(mad_reg_req->method_mask, > + IB_MGMT_MAX_METHODS)) { > + if (!recv_handler) { > + ret = ERR_PTR(-EINVAL); > + goto error1; > + } > + } else { > + if (!send_handler) { > + ret = ERR_PTR(-EINVAL); > + goto error1; > + } > + } I'm not quite understanding this change. If the user has provided a mad_reg_req, they are indicating that they want to receive unsolicited MADs. A recv_handler should be required. Am I missing something? 
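For reference, a user-space sketch of the validation rule the patch encodes: a receive handler is required only when the registration's method mask actually asks for unsolicited MADs; otherwise a send handler suffices. This models the logic under discussion, not the real ib_register_mad_agent() code or its bitmap types. A send-only client (one that only generates traps, for instance) would pass no registration request and just supply the send handler.

#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the real registration request and its method bitmap. */
struct reg_req_model {
        unsigned long method_mask;
};

static int validate_handlers(const struct reg_req_model *reg_req,
                             bool have_send_handler,
                             bool have_recv_handler)
{
        if (!reg_req)                   /* no registration request: send-only client */
                return have_send_handler ? 0 : -EINVAL;

        if (reg_req->method_mask)       /* unsolicited MADs requested */
                return have_recv_handler ? 0 : -EINVAL;

        return have_send_handler ? 0 : -EINVAL;
}

int main(void)
{
        struct reg_req_model get_set = { .method_mask = 0x4 };
        struct reg_req_model empty   = { .method_mask = 0 };

        assert(validate_handlers(NULL, true, false) == 0);
        assert(validate_handlers(&get_set, false, true) == 0);
        assert(validate_handlers(&get_set, true, false) == -EINVAL);
        assert(validate_handlers(&empty, true, false) == 0);
        return 0;
}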
> /* Validate device and port */ > @@ -842,7 +854,9 @@ > spin_unlock_irqrestore(&mad_agent_priv->lock, flags); > > /* Defined behavior is to complete response before request */ > - mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent, > + if (mad_agent_priv->agent.recv_handler) > + mad_agent_priv->agent.recv_handler( > + &mad_agent_priv->agent, > &recv->header.recv_wc); > atomic_dec(&mad_agent_priv->refcount); If I understand this change, a client sent a MAD, expecting a response, got one, but didn't register with a receive handler. As a side thought, I'm wondering how much protection we need to build into the code to handle kernel clients that don't provide all of the necessary parameters, but we can discuss this. The only real problem with this change is that the receive buffer needs to be released if it is not given to a client. We should probably change the send status as well, since no response was delivered. > @@ -851,7 +865,9 @@ > mad_send_wc.wr_id = mad_send_wr->wr_id; > ib_mad_complete_send_wr(mad_send_wr, &mad_send_wc); > } else { > - mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent, > + if (mad_agent_priv->agent.recv_handler) > + mad_agent_priv->agent.recv_handler( > + &mad_agent_priv->agent, > &recv->header.recv_wc); Need to free the receive buffer here as well if not delivered. > - if (!mad_agent) { > + if (!mad_agent || !mad_agent->agent.recv_handler) { This appears to be where the receive buffer would have been freed, but... We can't safely walk into the mad_agent structure after calling ib_mad_complete_recv(). Immediately above this code a reference is taken on the mad_agent. That reference is released in ib_mad_complete_recv(), which would allow the user to destroy the mad_agent before returning back to this call and the if statement above. I'm more in favor of removing checks for a recv_handler completely, but if we want to keep it, we can move it into find_mad_agent(), and just not report a mad_agent if it doesn't have a recv_handler. > if (mad_send_wr->status != IB_WC_SUCCESS ) > mad_send_wc->status = mad_send_wr->status; > - mad_agent_priv->agent.send_handler(&mad_agent_priv->agent, > mad_send_wc); > + if (mad_agent_priv->agent.send_handler) > + mad_agent_priv->agent.send_handler(&mad_agent_priv->agent, > + mad_send_wc); This has similar problems to the receive handling. If a client issued a send, but doesn't have a send_handler, there's nothing we can do with the send buffer, which needs to be freed. I think that a client who does this is causing more problems then we can deal with in the access layer. A possible fix for this is to check that mad_agent has a send_handler when the send is posted, rather than waiting until it completes. > /* Release reference on agent taken when sending */ > if (atomic_dec_and_test(&mad_agent_priv->refcount)) > @@ -1135,7 +1153,9 @@ > list_for_each_entry_safe(mad_send_wr, temp_mad_send_wr, > &cancel_list, agent_list) { > mad_send_wc.wr_id = mad_send_wr->wr_id; > - mad_agent_priv->agent.send_handler(&mad_agent_priv->agent, > + if (mad_agent_priv->agent.send_handler) > + mad_agent_priv->agent.send_handler( > + &mad_agent_priv->agent, > &mad_send_wc); Same issue as above. 
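One way to picture the suggestion about checking at post time, as a user-space sketch with illustrative types (not the actual MAD layer structures): an agent without a send handler has no way to take back a completed send buffer, so the post call can simply refuse the work request up front instead of every completion path having to test for the handler.

#include <errno.h>
#include <stdio.h>

/* Illustrative agent with only the field this sketch needs. */
struct agent_model {
        void (*send_handler)(void *send_wc);
};

static int post_send_model(const struct agent_model *agent, void *send_wr)
{
        if (!agent || !agent->send_handler)
                return -EINVAL; /* refuse early: nothing can consume the completion */

        (void)send_wr;          /* a real implementation would queue the work request here */
        return 0;
}

int main(void)
{
        struct agent_model no_send_handler = { .send_handler = NULL };
        int dummy_wr;

        printf("post without a send handler -> %d\n",
               post_send_model(&no_send_handler, &dummy_wr));
        return 0;
}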
> list_del(&mad_send_wr->agent_list); > @@ -1196,8 +1216,9 @@ > mad_send_wc.status = IB_WC_WR_FLUSH_ERR; > mad_send_wc.vendor_err = 0; > mad_send_wc.wr_id = mad_send_wr->wr_id; > - mad_agent_priv->agent.send_handler(&mad_agent_priv->agent, > - &mad_send_wc); > + if (mad_agent_priv->agent.send_handler) > + mad_agent_priv->agent.send_handler(&mad_agent_priv->agent, > + &mad_send_wc); Ditto. From halr at voltaire.com Thu Oct 14 11:32:57 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 14 Oct 2004 14:32:57 -0400 Subject: [openib-general] [PATCH] ib_mad: Fix send only registrations In-Reply-To: <20041014105505.0fb30306.mshefty@ichips.intel.com> References: <1097775188.2514.1.camel@hpc-1> <20041014105505.0fb30306.mshefty@ichips.intel.com> Message-ID: <1097778776.2514.43.camel@hpc-1> On Thu, 2004-10-14 at 13:55, Sean Hefty wrote: > I'm not quite understanding this change. If the user has provided a mad_reg_req, > they are indicating that they want to receive unsolicited MADs. A recv_handler should be required. > Am I missing something? What if they didn't fill in any methods in their registration request ? Should a recv_handler be required ? Note that the mthca appears to be a send only MAD client (for locally generated traps). That's what all this stemmed from. Maybe I took it too far. > If I understand this change, a client sent a MAD, expecting a response, got one, > but didn't register with a receive handler. Maybe he sent something and wasn't expecting a response (so didn't register a recv handler) but got one anyway. Should the MAD layer crash because of this ? > As a side thought, I'm wondering how much protection we need to build into the code to > handle kernel clients that don't provide all of the necessary parameters, but we can discuss this. I've been wondering this myself too. In the previous case, it caused the MAD layer to crash on a NULL pointer reference and the Linux locked up sometime thereafter. > The only real problem with this change is that the receive buffer needs to be released > if it is not given to a client. I think it's there at the end of the ib_mad_recv_done_handler where !mad_agent->agent.recv_handler is checked and kmem_cache_free is called if so. I see the problem you have with doing it this way later in your email so I will fix it. > We should probably change the send status as well, since no response was delivered. OK. What status would you propose ? Do you want to generate a patch for this or should I ? > > @@ -851,7 +865,9 @@ > > mad_send_wc.wr_id = mad_send_wr->wr_id; > > ib_mad_complete_send_wr(mad_send_wr, &mad_send_wc); > > } else { > > - mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent, > > + if (mad_agent_priv->agent.recv_handler) > > + mad_agent_priv->agent.recv_handler( > > + &mad_agent_priv->agent, > > &recv->header.recv_wc); > > Need to free the receive buffer here as well if not delivered. This was to be handled by the code at the end of ib_mad_recv_done_handler. I will fix it. > > - if (!mad_agent) { > > + if (!mad_agent || !mad_agent->agent.recv_handler) { > > This appears to be where the receive buffer would have been freed, but... > We can't safely walk into the mad_agent structure after calling ib_mad_complete_recv(). > Immediately above this code a reference is taken on the mad_agent. > That reference is released in ib_mad_complete_recv(), which would allow the user to > destroy the mad_agent before returning back to this call and the if statement above. OK. I will move the buffer releases to where the references are held. 
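The fix being agreed to here, as a sketch (the call site and argument lists are simplified assumptions pieced together from the fragments quoted in this thread, not the patch that was applied): decide the fate of an undelivered receive buffer while the reference taken in find_mad_agent() is still held, and never dereference the agent after ib_mad_complete_recv() has dropped that reference.

/* Sketch only -- simplified signatures, not the in-tree code. */
mad_agent = find_mad_agent(port_priv, recv);    /* takes a reference on success and,
                                                   per the suggestion above, returns
                                                   NULL if there is no suitable
                                                   recv_handler for this MAD */
if (!mad_agent) {
        /* nobody will consume the MAD: free it while we still own it */
        kmem_cache_free(ib_mad_cache, recv);
} else {
        /* ownership of 'recv' passes to the agent; the reference from
         * find_mad_agent() is released inside, so mad_agent must not be
         * touched after this call returns */
        ib_mad_complete_recv(mad_agent, &recv->header.recv_wc);
}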
> I'm more in favor of removing checks for a recv_handler completely, > but if we want to keep it, we can move it into find_mad_agent(), > and just not report a mad_agent if it doesn't have a recv_handler. That's a much better solution. As to the checks for the recv_handler, it depends on whether a recv_handler is required. Is it is, a simple check during registration and removal of the checks for whether than recv handler was supplied (the way it was). I thought it more flexible to make this optional but only got part way there with the implementation. > > if (mad_send_wr->status != IB_WC_SUCCESS ) > > mad_send_wc->status = mad_send_wr->status; > > - mad_agent_priv->agent.send_handler(&mad_agent_priv->agent, > > mad_send_wc); > > + if (mad_agent_priv->agent.send_handler) > > + mad_agent_priv->agent.send_handler(&mad_agent_priv->agent, > > + mad_send_wc); > > This has similar problems to the receive handling. If a client issued a send, > but doesn't have a send_handler, there's nothing we can do with the send buffer, > which needs to be freed. I think that a client who does this is causing more > problems then we can deal with in the access layer. Agreed. So are send handlers always required ? I just continued with thinking that if receive handlers are optional, might send ones be too in certain cases ? > A possible fix for this is to check that mad_agent has a send_handler when the send is posted, > rather than waiting until it completes. Yes, that's a better solution. > > /* Release reference on agent taken when sending */ > > if (atomic_dec_and_test(&mad_agent_priv->refcount)) > > @@ -1135,7 +1153,9 @@ > > list_for_each_entry_safe(mad_send_wr, temp_mad_send_wr, > > &cancel_list, agent_list) { > > mad_send_wc.wr_id = mad_send_wr->wr_id; > > - mad_agent_priv->agent.send_handler(&mad_agent_priv->agent, > > + if (mad_agent_priv->agent.send_handler) > > + mad_agent_priv->agent.send_handler( > > + &mad_agent_priv->agent, > > &mad_send_wc); > > Same issue as above. > > > list_del(&mad_send_wr->agent_list); > > @@ -1196,8 +1216,9 @@ > > mad_send_wc.status = IB_WC_WR_FLUSH_ERR; > > mad_send_wc.vendor_err = 0; > > mad_send_wc.wr_id = mad_send_wr->wr_id; > > - mad_agent_priv->agent.send_handler(&mad_agent_priv->agent, > > - &mad_send_wc); > > + if (mad_agent_priv->agent.send_handler) > > + mad_agent_priv->agent.send_handler(&mad_agent_priv->agent, > > + &mad_send_wc); > > Ditto. -- Hal From halr at voltaire.com Thu Oct 14 11:35:13 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 14 Oct 2004 14:35:13 -0400 Subject: [openib-general] [PATCH] ib_mad: Fix send only registrations In-Reply-To: <1097778776.2514.43.camel@hpc-1> References: <1097775188.2514.1.camel@hpc-1> <20041014105505.0fb30306.mshefty@ichips.intel.com> <1097778776.2514.43.camel@hpc-1> Message-ID: <1097778913.2514.46.camel@hpc-1> On Thu, 2004-10-14 at 14:32, Hal Rosenstock wrote: > > A possible fix for this is to check that mad_agent has a send_handler when the send is posted, > > rather than waiting until it completes. > > Yes, that's a better solution. What status code would we use for this case ? 
-- Hal From mshefty at ichips.intel.com Thu Oct 14 11:50:17 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 14 Oct 2004 11:50:17 -0700 Subject: [openib-general] [PATCH] ib_mad: Fix send only registrations In-Reply-To: <1097778776.2514.43.camel@hpc-1> References: <1097775188.2514.1.camel@hpc-1> <20041014105505.0fb30306.mshefty@ichips.intel.com> <1097778776.2514.43.camel@hpc-1> Message-ID: <20041014115017.7e375229.mshefty@ichips.intel.com> On Thu, 14 Oct 2004 14:32:57 -0400 Hal Rosenstock wrote: > What if they didn't fill in any methods in their registration request ? > Should a recv_handler be required ? Note that the mthca appears to be a > send only MAD client (for locally generated traps). That's what all this > stemmed from. Maybe I took it too far. I was thinking that clients wouldn't provide a mad_reg_req parameter if they were only going to issue sends. Although, I can see where the documentation says that the parameter _may_ be NULL in that case, rather than _must_ be NULL. Is there any use for mad_reg_req for clients that only issue sends? > Maybe he sent something and wasn't expecting a response (so didn't > register a recv handler) but got one anyway. Should the MAD layer crash > because of this ? I think that the local client can prevent this by not setting a timeout value. But, I agree that we shouldn't crash because of bad incoming data. > > We should probably change the send status as well, since no response was delivered. > > OK. What status would you propose ? Do you want to generate a patch for > this or should I ? Your guess is as good as mine... Maybe we can avoid this situation altogether though. See below. > > A possible fix for this is to check that mad_agent has a send_handler when the send is posted, > > rather than waiting until it completes. > > Yes, that's a better solution. How about this? In ib_post_send_mad(), we could perform something like: if (!mad_agent->send_handler || (send_wc->wr.ud.timeout_ms && !mad_agent->recv_handler)) return -EINVAL; With this check and a check in ib_register_mad_agent() for: if (mad_reg_req && !recv_handler) return -EINVAL (or something similar, depending on when mad_reg_req is required.) I think we can safely remove all other checks for valid handlers. We may still want to keep the other send_handler check after the else to if (mad_reg_req) in ib_register_mad_agent(), but it shouldn't be needed. - Sean From halr at voltaire.com Thu Oct 14 12:20:46 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 14 Oct 2004 15:20:46 -0400 Subject: [openib-general] [PATCH] ib_mad: Fix send only registrations In-Reply-To: <20041014115017.7e375229.mshefty@ichips.intel.com> References: <1097775188.2514.1.camel@hpc-1> <20041014105505.0fb30306.mshefty@ichips.intel.com> <1097778776.2514.43.camel@hpc-1> <20041014115017.7e375229.mshefty@ichips.intel.com> Message-ID: <1097781646.2514.64.camel@hpc-1> On Thu, 2004-10-14 at 14:50, Sean Hefty wrote: > I was thinking that clients wouldn't provide a mad_reg_req parameter > if they were only going to issue sends. Although, I can see where the > documentation says that the parameter _may_ be NULL in that case, > rather than _must_ be NULL. Is there any use for mad_reg_req > for clients that only issue sends? I don't see one. It is pretty much ignored on the send side right now. The only use would be if we implemented some send side checking. > How about this? 
> > In ib_post_send_mad(), we could perform something like: > > if (!mad_agent->send_handler || > (send_wc->wr.ud.timeout_ms && !mad_agent->recv_handler)) > return -EINVAL; > > With this check and a check in ib_register_mad_agent() for: > > if (mad_reg_req && !recv_handler) > return -EINVAL > > (or something similar, depending on when mad_reg_req is required.) > I think we can safely remove all other checks for valid handlers. > We may still want to keep the other send_handler check after the > else to if (mad_reg_req) in ib_register_mad_agent(), but it shouldn't > be needed. This makes sense. I'll work up a patch for this approach. -- Hal From halr at voltaire.com Thu Oct 14 12:32:01 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 14 Oct 2004 15:32:01 -0400 Subject: [openib-general] [PATCH] ib_mad: Better handling of send and receive handlers Message-ID: <1097782321.2514.67.camel@hpc-1> ib_mad: Better handling of send and receive handlers Index: ib_mad.c =================================================================== --- ib_mad.c (revision 1004) +++ ib_mad.c (working copy) @@ -127,17 +127,9 @@ ret = ERR_PTR(-EINVAL); goto error1; } - if (!bitmap_empty(mad_reg_req->method_mask, - IB_MGMT_MAX_METHODS)) { - if (!recv_handler) { - ret = ERR_PTR(-EINVAL); - goto error1; - } - } else { - if (!send_handler) { - ret = ERR_PTR(-EINVAL); - goto error1; - } + if (!recv_handler) { + ret = ERR_PTR(-EINVAL); + goto error1; } if (mad_reg_req->mgmt_class >= MAX_MGMT_CLASS) { /* @@ -351,6 +343,12 @@ return -EINVAL; } + if (!mad_agent->send_handler || + (send_wr->wr.ud.timeout_ms && !mad_agent->recv_handler)) { + *bad_send_wr = cur_send_wr; + return -EINVAL; + } + mad_agent_priv = container_of(mad_agent, struct ib_mad_agent_private, agent); port_priv = mad_agent_priv->port_priv; @@ -758,6 +756,14 @@ } ret: + if (!mad_agent->agent.recv_handler) { + printk(KERN_ERR PFX "No receive handler for client " + "0x%x on port %d\n", + (unsigned int)&mad_agent->agent, + port_priv->port_num); + mad_agent = NULL; + } + return mad_agent; } @@ -854,8 +860,7 @@ spin_unlock_irqrestore(&mad_agent_priv->lock, flags); /* Defined behavior is to complete response before request */ - if (mad_agent_priv->agent.recv_handler) - mad_agent_priv->agent.recv_handler( + mad_agent_priv->agent.recv_handler( &mad_agent_priv->agent, &recv->header.recv_wc); atomic_dec(&mad_agent_priv->refcount); @@ -865,8 +870,7 @@ mad_send_wc.wr_id = mad_send_wr->wr_id; ib_mad_complete_send_wr(mad_send_wr, &mad_send_wc); } else { - if (mad_agent_priv->agent.recv_handler) - mad_agent_priv->agent.recv_handler( + mad_agent_priv->agent.recv_handler( &mad_agent_priv->agent, &recv->header.recv_wc); if (atomic_dec_and_test(&mad_agent_priv->refcount)) @@ -967,7 +971,7 @@ } ret: - if (!mad_agent || !mad_agent->agent.recv_handler) { + if (!mad_agent) { /* Should this case be optimized ? 
*/ kmem_cache_free(ib_mad_cache, recv); } @@ -1033,9 +1037,8 @@ if (mad_send_wr->status != IB_WC_SUCCESS ) mad_send_wc->status = mad_send_wr->status; - if (mad_agent_priv->agent.send_handler) - mad_agent_priv->agent.send_handler(&mad_agent_priv->agent, - mad_send_wc); + mad_agent_priv->agent.send_handler(&mad_agent_priv->agent, + mad_send_wc); /* Release reference on agent taken when sending */ if (atomic_dec_and_test(&mad_agent_priv->refcount)) @@ -1148,9 +1151,7 @@ list_for_each_entry_safe(mad_send_wr, temp_mad_send_wr, &cancel_list, agent_list) { mad_send_wc.wr_id = mad_send_wr->wr_id; - if (mad_agent_priv->agent.send_handler) - mad_agent_priv->agent.send_handler( - &mad_agent_priv->agent, + mad_agent_priv->agent.send_handler(&mad_agent_priv->agent, &mad_send_wc); list_del(&mad_send_wr->agent_list); @@ -1211,9 +1212,8 @@ mad_send_wc.status = IB_WC_WR_FLUSH_ERR; mad_send_wc.vendor_err = 0; mad_send_wc.wr_id = mad_send_wr->wr_id; - if (mad_agent_priv->agent.send_handler) - mad_agent_priv->agent.send_handler(&mad_agent_priv->agent, - &mad_send_wc); + mad_agent_priv->agent.send_handler(&mad_agent_priv->agent, + &mad_send_wc); kfree(mad_send_wr); if (atomic_dec_and_test(&mad_agent_priv->refcount)) From yaronh at voltaire.com Thu Oct 14 15:17:09 2004 From: yaronh at voltaire.com (Yaron Haviv) Date: Fri, 15 Oct 2004 00:17:09 +0200 Subject: [openib-general] SDP socket address family Message-ID: <35EA21F54A45CB47B879F21A91F4862F257C37@taurus.voltaire.com> There seems to be a conflict between the currently used SDP socket address family number (26) and the current linux kernel. Linux allocates this address family number (26) for 'LLC' protocol. Any ideas if we should change it from 26, and to what ?   Below are some related header-file snippets:   SuSE-9.1  /usr/include/linux/socket.h: ------------------------------------------------------------------- #define AF_IRDA         23      /* IRDA sockets                 */ #define AF_PPPOX        24      /* PPPoX sockets                */ #define AF_WANPIPE      25      /* Wanpipe API Sockets */ #define AF_LLC          26      /* Linux LLC                    */ #define AF_BLUETOOTH    31      /* Bluetooth sockets            */ #define AF_MAX          32      /* For now.. */   Voltaire's sdp/sdp-sockets/sdp-sockets.h: ------------------------------------------------------------------- # define AF_IBT  26   TopSpin's infiniband/ulp/sdp/sdp_inet.h: ------------------------------------------------------------------- /*  * constants shared between user and kernel space.  */ #define AF_INET_SDP 26             /* SDP socket protocol family */ #define AF_INET_STR "AF_INET_SDP"  /* SDP enabled enviroment variable */   Yaron From tduffy at sun.com Thu Oct 14 16:13:52 2004 From: tduffy at sun.com (Tom Duffy) Date: Thu, 14 Oct 2004 16:13:52 -0700 Subject: [openib-general] [PATCH][TRIVIAL] remove unused variable in ib_device.c Message-ID: <1097795632.25394.2.camel@duffman> Index: drivers/infiniband/core/ib_device.c =================================================================== --- drivers/infiniband/core/ib_device.c (revision 1007) +++ drivers/infiniband/core/ib_device.c (working copy) @@ -188,7 +188,6 @@ struct ib_device_private *priv; struct ib_device_attr prop; int ret; - int p; down(&device_sem); -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From krause at cup.hp.com Thu Oct 14 16:21:03 2004 From: krause at cup.hp.com (Michael Krause) Date: Thu, 14 Oct 2004 16:21:03 -0700 Subject: [openib-general] SDP socket address family In-Reply-To: <35EA21F54A45CB47B879F21A91F4862F257C37@taurus.voltaire.com > References: <35EA21F54A45CB47B879F21A91F4862F257C37@taurus.voltaire.com> Message-ID: <6.1.2.0.2.20041014161825.035f6238@esmail.cup.hp.com> Why not just leverage the SDP port mapper protocol already defined in the RDMAC version and avoid having to provide a new address family? The port mapper protocol is interconnect independent and will enable sockets applications to more easily be executed transparently. It seems counterproductive to continue to pursue a new address family. BTW, the new port mapper protocol will also work with the new async sockets and memory management API that is nearly complete (should be approved soon within the OpenGroup). This would greatly enhance socket application design and provide greater performance when operating over a RDMA interconnect than traditional BSD sockets. Mike At 03:17 PM 10/14/2004, Yaron Haviv wrote: >There seems to be a conflict between the currently used SDP socket address >family number (26) and the current linux kernel. Linux allocates this >address family number (26) for 'LLC' protocol. > >Any ideas if we should change it from 26, and to what ? > >Below are some related header-file snippets: > >SuSE-9.1 /usr/include/linux/socket.h: >------------------------------------------------------------------- >#define AF_IRDA 23 /* IRDA sockets */ >#define AF_PPPOX 24 /* PPPoX sockets */ >#define AF_WANPIPE 25 /* Wanpipe API Sockets */ >#define AF_LLC 26 /* Linux LLC */ >#define AF_BLUETOOTH 31 /* Bluetooth sockets */ >#define AF_MAX 32 /* For now.. */ > > >Voltaire's sdp/sdp-sockets/sdp-sockets.h: >------------------------------------------------------------------- ># define AF_IBT 26 > > >TopSpin's infiniband/ulp/sdp/sdp_inet.h: >------------------------------------------------------------------- >/* > * constants shared between user and kernel space. > */ >#define AF_INET_SDP 26 /* SDP socket protocol family */ >#define AF_INET_STR "AF_INET_SDP" /* SDP enabled enviroment variable */ > >Yaron > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit >http://openib.org/mailman/listinfo/openib-general -------------- next part -------------- An HTML attachment was scrubbed... URL: From roland at topspin.com Thu Oct 14 16:32:07 2004 From: roland at topspin.com (Roland Dreier) Date: Thu, 14 Oct 2004 16:32:07 -0700 Subject: [openib-general] [PATCH][TRIVIAL] remove unused variable in ib_device.c In-Reply-To: <1097795632.25394.2.camel@duffman> (Tom Duffy's message of "Thu, 14 Oct 2004 16:13:52 -0700") References: <1097795632.25394.2.camel@duffman> Message-ID: <523c0gev0o.fsf@topspin.com> Thanks, applied. - R. 
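On the SDP address-family collision raised above, a build-time guard makes the clash visible immediately. This is purely an illustration (AF_INET_SDP stands for whichever value SDP ends up claiming, and the guard is not something proposed in the thread): 26 collides with AF_LLC on current kernels, so any chosen value should be checked against linux/socket.h.

/* Illustrative only: fail the build if the chosen SDP family number
 * collides with one the kernel already assigns. */
#include <linux/socket.h>
#if AF_INET_SDP == AF_LLC
#error "SDP address family collides with AF_LLC -- pick an unused value below AF_MAX"
#endif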
From ftillier at infiniconsys.com Thu Oct 14 16:57:45 2004 From: ftillier at infiniconsys.com (Fab Tillier) Date: Thu, 14 Oct 2004 16:57:45 -0700 Subject: [openib-general] SDP socket address family In-Reply-To: <6.1.2.0.2.20041014161825.035f6238@esmail.cup.hp.com> Message-ID: <000001c4b249$9ab2eab0$655aa8c0@infiniconsys.com> > From: Michael Krause [mailto:krause at cup.hp.com] > Sent: Thursday, October 14, 2004 4:21 PM > > Why not just leverage the SDP port mapper protocol already defined in the > RDMAC version and avoid having to provide a new address family? The port > mapper protocol is interconnect independent and will enable sockets > applications to more easily be executed transparently. It seems > counterproductive to continue to pursue a new address family. > > BTW, the new port mapper protocol will also work with the new async > sockets and memory management API that is nearly complete (should be > approved soon within the OpenGroup). This would greatly enhance socket > application design and provide greater performance when operating over a > RDMA interconnect than traditional BSD sockets. > A while ago, there was some discussion of having transparent port mapping be a bad thing, and a security vulnerability of some sort. Note that I don't personally believe that. - Fab From krause at cup.hp.com Fri Oct 15 07:21:44 2004 From: krause at cup.hp.com (Michael Krause) Date: Fri, 15 Oct 2004 07:21:44 -0700 Subject: [openib-general] SDP socket address family In-Reply-To: <000001c4b249$9ab2eab0$655aa8c0@infiniconsys.com> References: <6.1.2.0.2.20041014161825.035f6238@esmail.cup.hp.com> <000001c4b249$9ab2eab0$655aa8c0@infiniconsys.com> Message-ID: <6.1.2.0.2.20041015071933.01de8be0@esmail.cup.hp.com> At 04:57 PM 10/14/2004, Fab Tillier wrote: > > From: Michael Krause [mailto:krause at cup.hp.com] > > Sent: Thursday, October 14, 2004 4:21 PM > > > > Why not just leverage the SDP port mapper protocol already defined in the > > RDMAC version and avoid having to provide a new address family? The port > > mapper protocol is interconnect independent and will enable sockets > > applications to more easily be executed transparently. It seems > > counterproductive to continue to pursue a new address family. > > > > BTW, the new port mapper protocol will also work with the new async > > sockets and memory management API that is nearly complete (should be > > approved soon within the OpenGroup). This would greatly enhance socket > > application design and provide greater performance when operating over a > > RDMA interconnect than traditional BSD sockets. > > > >A while ago, there was some discussion of having transparent port mapping be >a bad thing, and a security vulnerability of some sort. Note that I don't >personally believe that. The SDP spec submitted to the IETF as a draft has the port mapping. The HP IETF gurus did not have a problem with the port mapping. Discussions with various IETF AD did not generate a negative reaction but one never knows how the IETF will act. In general, the exchange has measures to mitigate any DOS attacks and conserve resources. Take a look and please consider as this is the best way to get a large number of sockets applications to quickly operate over RDMA interconnects without requiring any source code changes. Mike -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From halr at voltaire.com Fri Oct 15 07:27:52 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 15 Oct 2004 10:27:52 -0400 Subject: [Fwd: Re: [openib-general] Switch SMI incoming MAD question] Message-ID: <1097850472.3060.13.camel@localhost.localdomain> If port_num is added to struct ib_wc, there are 2 options for this field: 1. It is valid for switches only so that HCA drivers don't have to fill this in. This is easier for HCA drivers and doesn't put an unnecessary requirement on them. Also, there already are other switch or not checks in the MAD layer (as dictated by SMI) so another one won't hurt. or: 2. It is always filled in by the driver. This is easier for the MAD layer as there is no special case code for this. port_num only really needs to be set for DR SMPs. Not sure if that helps or not. I am going with option 1 unless I hear otherwise. -- Hal -----Forwarded Message----- From: Hal Rosenstock To: openib-general at openib.org Subject: Re: [openib-general] Switch SMI incoming MAD question Date: 14 Oct 2004 09:21:27 -0400 On Thu, 2004-10-14 at 09:12, Hal Rosenstock wrote: > Hi, > > We added port_num into the UD send_wr structure to accomodate switches. > Isn't a similar thing needed for receive WCs ? The SMI needs to know > which physical port the DR SMP came in on. If this is the case, it seems > to me that the right thing to do to add this to the ib_mad_recv_wc > structure. I'm half asleep (and half right above)... I think it needs to be added to ib_wc structure rather than ib_mad_recv_wc structure as the incoming port that the MAD layer receives on is not the same as the switch physical port. -- Hal _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From halr at voltaire.com Fri Oct 15 07:31:14 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 15 Oct 2004 10:31:14 -0400 Subject: [openib-general] [PATCH] ib_verbs.h: Add port_num into ib_wc structure for switch DR SMPs Message-ID: <1097850674.3060.16.camel@localhost.localdomain> ib_verbs.h: Add port_num into ib_wc structure for switch DR SMPs Index: ib_verbs.h =================================================================== --- ib_verbs.h (revision 1009) +++ ib_verbs.h (working copy) @@ -618,6 +618,7 @@ u16 slid; u8 sl; u8 dlid_path_bits; + u8 port_num; /* valid for DR SMPs on switches */ }; enum ib_cq_notify { From halr at voltaire.com Fri Oct 15 07:33:39 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 15 Oct 2004 10:33:39 -0400 Subject: [openib-general] [PATCH] ib_verbs.h: Add port_num into ib_wc structure for switch DR SMPs (Roland's branch) Message-ID: <1097850819.3060.19.camel@localhost.localdomain> ib_verbs.h: Add port_num into ib_wc structure for switch DR SMPs (this is for Roland's branch) Index: ib_verbs.h =================================================================== --- ib_verbs.h (revision 1009) +++ ib_verbs.h (working copy) @@ -325,6 +325,7 @@ u16 slid; u8 sl; u8 dlid_path_bits; + u8 port_num; /* valid for DR SMPs on switches */ }; enum ib_cq_notify { From roland at topspin.com Fri Oct 15 15:03:47 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 15 Oct 2004 15:03:47 -0700 Subject: [openib-general] [PATCH] ib_verbs.h: Add port_num into ib_wc structure for switch DR SMPs (Roland's branch) In-Reply-To: <1097850819.3060.19.camel@localhost.localdomain> (Hal Rosenstock's message of "Fri, 15 
Oct 2004 10:33:39 -0400") References: <1097850819.3060.19.camel@localhost.localdomain> Message-ID: <52fz4fbpvg.fsf@topspin.com> Applied by hand (the patch had no tabs and was line wrapped) - R. From roland at topspin.com Sun Oct 17 17:59:22 2004 From: roland at topspin.com (Roland Dreier) Date: Sun, 17 Oct 2004 17:59:22 -0700 Subject: [openib-general] Usage of ib_mad_recv_wc.recv_buf? Message-ID: <521xfwc045.fsf@topspin.com> I'm working with the new MAD code, and I'm wondering what the intended usage of ib_mad_recv_wc.recv_buf is, specifically in the RMPP case. I see that struct ib_mad_recv_buf has a struct list_head member, but struct ib_mad_recv_wc just has a struct ib_mad_recv_buf * member. I assume that the idea is for multiple MAD packets to be passed as a linked list, so it would seem that struct ib_mad_recv_wc should just have a struct list_head where the MAD buffers are linked. Am I missing something or does this need to be changed? Thanks, Roland From halr at voltaire.com Mon Oct 18 05:34:48 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 18 Oct 2004 08:34:48 -0400 Subject: [openib-general] Usage of ib_mad_recv_wc.recv_buf? In-Reply-To: <521xfwc045.fsf@topspin.com> References: <521xfwc045.fsf@topspin.com> Message-ID: <1098102888.2753.10.camel@localhost.localdomain> On Sun, 2004-10-17 at 20:59, Roland Dreier wrote: > I'm working with the new MAD code, and I'm wondering what the > intended usage of ib_mad_recv_wc.recv_buf is, specifically in the > RMPP case. I see that struct ib_mad_recv_buf has a struct list_head > member, but struct ib_mad_recv_wc just has a struct ib_mad_recv_buf * > member. I assume that the idea is for multiple MAD packets to be > passed as a linked list, so it would seem that struct ib_mad_recv_wc > should just have a struct list_head where the MAD buffers are linked. > Am I missing something or does this need to be changed? I believe that the intention is that the receive completion is not indicated to the consumer until the complete RMPP transaction has been received. The multiple receive completions (one per segment) are "coalesced" into one and provided to the client when the final RMPP segment is received.
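As a concrete illustration of that coalescing step, here is a minimal sketch. It assumes each RMPP segment sits in its own ib_mad_recv_buf with the remaining segments chained off the first one through the list member; the helper name is hypothetical, and a real implementation would strip the duplicated MAD/RMPP headers rather than copying each segment whole.

/* Hypothetical helper, not the in-tree API: flatten a completed RMPP
 * receive into one contiguous buffer supplied by the consumer. */
static void coalesce_rmpp_segments(struct ib_mad_recv_wc *recv_wc, void *buf)
{
        struct ib_mad_recv_buf *first = recv_wc->recv_buf;
        struct ib_mad_recv_buf *seg;
        u8 *dst = buf;

        /* first segment: copy the whole 256-byte MAD */
        memcpy(dst, first->mad, sizeof(struct ib_mad));
        dst += sizeof(struct ib_mad);

        /* remaining segments, naively copied whole for brevity */
        list_for_each_entry(seg, &first->list, list) {
                memcpy(dst, seg->mad, sizeof(struct ib_mad));
                dst += sizeof(struct ib_mad);
        }
}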
-- Hal From halr at voltaire.com Mon Oct 18 08:27:02 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 18 Oct 2004 11:27:02 -0400 Subject: [openib-general] ib_smi: More changes for better code clarity Message-ID: <1098113222.27056.4.camel@hpc-1> ib_smi: More changes for better code clarity Index: ib_smi.c =================================================================== --- ib_smi.c (revision 1012) +++ ib_smi.c (working copy) @@ -126,7 +126,7 @@ { case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE: return smi_handle_dr_smp_send(mad_agent, smp, port_num); - default: /* LR SM or PerfMgmt classes */ + default: /* LR SM class */ return 1; } } @@ -238,7 +238,7 @@ case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE: return smi_handle_dr_smp_recv(mad_agent, smp, port_num, phys_port_cnt); - default: /* LR SM or PerfMgmt classes */ + default: /* LR SM class */ return 1; } } @@ -290,7 +290,7 @@ { case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE: return smi_check_forward_dr_smp(mad_agent, smp); - default: /* LR SM or PerfMgmt classes */ + default: /* LR SM class */ return 1; } } @@ -305,9 +305,9 @@ slid, mad, mad_response); } -void mad_send(struct ib_mad_agent *mad_agent, - struct ib_mad *mad, - struct ib_mad_recv_wc *mad_recv_wc) +void agent_mad_send(struct ib_mad_agent *mad_agent, + struct ib_mad *mad, + struct ib_mad_recv_wc *mad_recv_wc) { struct ib_agent_port_private *entry, *port_priv = NULL; struct ib_agent_send_wr *agent_send_wr; @@ -330,7 +330,7 @@ } spin_unlock_irqrestore(&ib_agent_port_list_lock, flags); if (!port_priv) { - printk(KERN_ERR SPFX "mad_send: no matching MAD agent 0x%x\n", + printk(KERN_ERR SPFX "agent_mad_send: no matching MAD agent 0x%x\n", (unsigned int)mad_agent); return; } @@ -428,7 +428,7 @@ kfree(smp_response); return 0; } - mad_send(mad_agent, smp_response, mad_recv_wc); + agent_mad_send(mad_agent, smp_response, mad_recv_wc); } else kfree(smp_response); return 1; @@ -438,23 +438,59 @@ return 1; } -int smi_recv_smp(struct ib_mad_agent *mad_agent, - struct ib_smp *smp, +int mad_response(struct ib_mad_agent *mad_agent, + struct ib_mad *mad, struct ib_mad_recv_wc *mad_recv_wc, - int phys_port_cnt) + u16 slid) { - if (!smi_handle_smp_recv(mad_agent, smp, - mad_agent->port_num, phys_port_cnt)) { - /* SMI failed receive */ + struct ib_mad *response; + int ret; + + response = kmalloc(sizeof(struct ib_mad), GFP_KERNEL); + if (!response) return 0; - } - if (smi_check_forward_smp(mad_agent, smp)) { - smi_send_smp(mad_agent, smp, mad_recv_wc, - mad_recv_wc->wc->slid, phys_port_cnt); + ret = mad_process_local(mad_agent, mad, response, slid); + if (ret & IB_MAD_RESULT_SUCCESS) { + agent_mad_send(mad_agent, response, mad_recv_wc); + } else + kfree(response); + return 1; +} + +int agent_recv_mad(struct ib_mad_agent *mad_agent, + struct ib_mad *mad, + struct ib_mad_recv_wc *mad_recv_wc, + int phys_port_cnt) +{ + /* SM Directed Route or LID Routed class */ + if (mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE || + mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED) { + if (!smi_handle_smp_recv(mad_agent, (struct ib_smp *)mad, + mad_agent->port_num, phys_port_cnt)) { + /* SMI failed receive */ + return 0; + } + + if (smi_check_forward_smp(mad_agent, (struct ib_smp *)mad)) { + smi_send_smp(mad_agent, (struct ib_smp *)mad, + mad_recv_wc, + mad_recv_wc->wc->slid, + phys_port_cnt); + return 0; + } + + } else { + /* PerfMgmt class */ + if (mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_PERF_MGMT) { + mad_response(mad_agent, mad, mad_recv_wc, + mad_recv_wc->wc->slid); + } else { + printk(KERN_ERR 
"agent_recv_mad: Unexpected mgmt class 0x%x received\n", mad->mad_hdr.mgmt_class); + } return 0; } - + /* Complete receive up stack */ return 1; } @@ -538,9 +574,9 @@ (unsigned int)mad_agent); } else { - smi_recv_smp(mad_agent, - (struct ib_smp *)mad_recv_wc->recv_buf->mad, - mad_recv_wc, port_priv->phys_port_cnt); + agent_recv_mad(mad_agent, + mad_recv_wc->recv_buf->mad, + mad_recv_wc, port_priv->phys_port_cnt); } /* Free received MAD */ From roland at topspin.com Mon Oct 18 09:11:34 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 18 Oct 2004 09:11:34 -0700 Subject: [openib-general] Usage of ib_mad_recv_wc.recv_buf? In-Reply-To: <1098102888.2753.10.camel@localhost.localdomain> (Hal Rosenstock's message of "Mon, 18 Oct 2004 08:34:48 -0400") References: <521xfwc045.fsf@topspin.com> <1098102888.2753.10.camel@localhost.localdomain> Message-ID: <52wtxo9fbd.fsf@topspin.com> Hal> I believe that the intention is that the receive completion Hal> is not indicated to the consumer until the complete RMPP Hal> transaction has been received. The multiple receive Hal> completions (one per segment) are "coaelsced" into one and Hal> provided to the client when the final RMPP segment is Hal> received. Right, and then the consumer can call void ib_coalesce_recv_mad(struct ib_mad_recv_wc *mad_recv_wc, void *buf); to get the MAD packets' payloads copied into a single buffer. What I was trying to point out was that we have: struct ib_mad_recv_buf { struct list_head list; struct ib_grh *grh; struct ib_mad *mad; }; with "list" described as "Reference to next data buffer for a received RMPP MAD," but: struct ib_mad_recv_wc { struct ib_wc *wc; struct ib_mad_recv_buf *recv_buf; int mad_len; }; with only a pointer to a single ib_mad_recv_buf (although the comment describes recv_buf as "Specifies the location of the received data buffer(s)"). It seems to me that if ib_mad_recv_wc is supposed to return a list of MAD packets, then instead of the recv_buf member, it should have a struct list_head member to serve as the head of the list of packets. - R. From mshefty at ichips.intel.com Mon Oct 18 09:10:39 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 18 Oct 2004 09:10:39 -0700 Subject: [openib-general] Usage of ib_mad_recv_wc.recv_buf? In-Reply-To: <521xfwc045.fsf@topspin.com> References: <521xfwc045.fsf@topspin.com> Message-ID: <20041018091039.6e6c6782.mshefty@ichips.intel.com> On Sun, 17 Oct 2004 17:59:22 -0700 Roland Dreier wrote: > I'm working with the new MAD code, and I'm wondering what the > intendend usage of ib_mad_recv_wc.recv_buf is, specifically in the > RMPP case. I see that struct ib_mad_recv_buf has a struct list_head > member, but struct ib_mad_recv_wc just has a struct ib_mad_recv_buf * > member. I assume that the idea is for multiple MAD packets to be > passed as a linked list, so it would seem that struct ib_mad_recv_wc > should just have a struct list_head where the MAD buffers are linked. > Am I missing something or does this need to be changed? Your assumption is correct. For RMPP, the idea is to pass back a linked list of received segments (duplicated MAD headers and all), in order to avoid data copies. I think that the current API is a result of changes to ib_mad_recv_buf. At one point I think that it had a *recv_buf field, which was changed to list_head, but ib_mad_recv_wc was left alone. I'm not sure that we gain any advantage to using a doubly-linked list over a singly-linked one, so I would be fine with converting it back to *recv_buf. 
If we do decide to keep list_head, then list_head should be added into ib_mad_recv_wc as you suggest. - Sean From roland at topspin.com Mon Oct 18 09:30:40 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 18 Oct 2004 09:30:40 -0700 Subject: [openib-general] Usage of ib_mad_recv_wc.recv_buf? In-Reply-To: <20041018091039.6e6c6782.mshefty@ichips.intel.com> (Sean Hefty's message of "Mon, 18 Oct 2004 09:10:39 -0700") References: <521xfwc045.fsf@topspin.com> <20041018091039.6e6c6782.mshefty@ichips.intel.com> Message-ID: <52sm8c9efj.fsf@topspin.com> Sean> I'm not sure that we gain any advantage to using a Sean> doubly-linked list over a singly-linked one, so I would be Sean> fine with converting it back to *recv_buf. If we do decide Sean> to keep list_head, then list_head should be added into Sean> ib_mad_recv_wc as you suggest. It seems that consumers will walk the list from the start, while the MAD layer will need to add buffers to the tail, it seems we need both a head and tail pointer in ib_mad_recv_wc anyway. Since the recv_bufs all have 256-byte packet buffers, the savings of having only a single pointer there is pretty minimal. I would think that avoiding open-coding all the list manipulation is worth the extra pointer anyway. Since we need an ib_mad_recv_wc structure anyway, the objection I had to using struct list_head for posting send requests (namely that it forces the creation of another structure to hold the list head) doesn't apply here. - R. From halr at voltaire.com Mon Oct 18 10:10:18 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 18 Oct 2004 13:10:18 -0400 Subject: [openib-general] [PATCH] ib_smi: Add GRH support for PMA Message-ID: <1098119418.27056.6.camel@hpc-1> ib_smi: Add GRH support for PMA Index: ib_smi.c =================================================================== --- ib_smi.c (revision 1017) +++ ib_smi.c (working copy) @@ -307,6 +307,7 @@ void agent_mad_send(struct ib_mad_agent *mad_agent, struct ib_mad *mad, + struct ib_grh *grh, struct ib_mad_recv_wc *mad_recv_wc) { struct ib_agent_port_private *entry, *port_priv = NULL; @@ -359,9 +360,24 @@ ah_attr.dlid = mad_recv_wc->wc->slid; ah_attr.port_num = mad_agent->port_num; ah_attr.src_path_bits = mad_recv_wc->wc->dlid_path_bits; - ah_attr.ah_flags = 0; /* No GRH */ ah_attr.sl = mad_recv_wc->wc->sl; ah_attr.static_rate = 0; + if (mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_PERF_MGMT) { + if (mad_recv_wc->wc->wc_flags & IB_WC_GRH) { + ah_attr.ah_flags = IB_AH_GRH; + ah_attr.grh.sgid_index = 0; /* Should sgid be looked up +? 
*/ + ah_attr.grh.hop_limit = grh->hop_limit; + ah_attr.grh.flow_label = be32_to_cpup(&grh->version_tclass_flow) & 0xfffff; + ah_attr.grh.traffic_class = (be32_to_cpup(&grh->version_tclass_flow) >> 20) & 0xff; + memcpy(ah_attr.grh.dgid.raw, grh->sgid.raw, sizeof(struct ib_grh)); + } else { + ah_attr.ah_flags = 0; /* No GRH */ + } + } else { + /* Directed route or LID routed SM class */ + ah_attr.ah_flags = 0; /* No GRH */ + } ah = ib_create_ah(mad_agent->qp->pd, &ah_attr); if (IS_ERR(ah)) { @@ -428,7 +444,8 @@ kfree(smp_response); return 0; } - agent_mad_send(mad_agent, smp_response, mad_recv_wc); + agent_mad_send(mad_agent, smp_response, + NULL, mad_recv_wc); } else kfree(smp_response); return 1; @@ -438,12 +455,13 @@ return 1; } -int mad_response(struct ib_mad_agent *mad_agent, - struct ib_mad *mad, - struct ib_mad_recv_wc *mad_recv_wc, - u16 slid) +int agent_mad_response(struct ib_mad_agent *mad_agent, + struct ib_mad *mad, + struct ib_mad_recv_wc *mad_recv_wc, + u16 slid) { struct ib_mad *response; + struct ib_grh *grh; int ret; response = kmalloc(sizeof(struct ib_mad), GFP_KERNEL); @@ -452,7 +470,8 @@ ret = mad_process_local(mad_agent, mad, response, slid); if (ret & IB_MAD_RESULT_SUCCESS) { - agent_mad_send(mad_agent, response, mad_recv_wc); + grh = (void *)mad - sizeof(struct ib_grh); + agent_mad_send(mad_agent, response, grh, mad_recv_wc); } else kfree(response); return 1; @@ -483,8 +502,8 @@ } else { /* PerfMgmt class */ if (mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_PERF_MGMT) { - mad_response(mad_agent, mad, mad_recv_wc, - mad_recv_wc->wc->slid); + agent_mad_response(mad_agent, mad, mad_recv_wc, + mad_recv_wc->wc->slid); } else { printk(KERN_ERR "agent_recv_mad: Unexpected mgmt class 0x%x received\n", mad->mad_hdr.mgmt_class); } From halr at voltaire.com Mon Oct 18 10:52:57 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 18 Oct 2004 13:52:57 -0400 Subject: [openib-general] [PATCH] ib_smi: More minor change to DR SMI code for port number Message-ID: <1098121977.27056.9.camel@hpc-1> ib_smi: More minor changes to DR SMI code for port number Index: ib_smi.c =================================================================== --- ib_smi.c (revision 1018) +++ ib_smi.c (working copy) @@ -41,8 +41,8 @@ * Fixup a directed route SMP for sending. Return 0 if the SMP should be * discarded. */ -static int smi_handle_dr_smp_send(struct ib_mad_agent *mad_agent, - struct ib_smp *smp, +static int smi_handle_dr_smp_send(struct ib_smp *smp, + u8 node_type, int port_num) { u8 hop_ptr, hop_cnt; @@ -61,7 +61,7 @@ /* C14-9:2 */ if (hop_ptr && hop_ptr < hop_cnt) { - if (mad_agent->device->node_type != IB_NODE_SWITCH) + if (node_type != IB_NODE_SWITCH) return 0; /* smp->return_path set when received */ @@ -74,7 +74,7 @@ if (hop_ptr == hop_cnt) { /* smp->return_path set when received */ smp->hop_ptr++; - return (mad_agent->device->node_type == IB_NODE_SWITCH || + return (node_type == IB_NODE_SWITCH || smp->dr_dlid == IB_LID_PERMISSIVE); } @@ -92,7 +92,7 @@ /* C14-13:2 */ if (2 <= hop_ptr && hop_ptr <= hop_cnt) { - if (mad_agent->device->node_type != IB_NODE_SWITCH) + if (node_type != IB_NODE_SWITCH) return 0; smp->hop_ptr--; @@ -104,7 +104,7 @@ if (hop_ptr == 1) { smp->hop_ptr--; /* C14-13:3 -- SMPs destined for SM shouldn't be here */ - return (mad_agent->device->node_type == IB_NODE_SWITCH || + return (node_type == IB_NODE_SWITCH || smp->dr_slid == IB_LID_PERMISSIVE); } @@ -118,14 +118,14 @@ * Sender side handling of outgoing SMPs. Fixup the SMP as required by * the spec. 
Return 0 if the SMP should be dropped. */ -static int smi_handle_smp_send(struct ib_mad_agent *mad_agent, - struct ib_smp *smp, +static int smi_handle_smp_send(struct ib_smp *smp, + u8 node_type, int port_num) { switch (smp->mgmt_class) { case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE: - return smi_handle_dr_smp_send(mad_agent, smp, port_num); + return smi_handle_dr_smp_send(smp, node_type, port_num); default: /* LR SM class */ return 1; } @@ -149,8 +149,8 @@ * Adjust information for a received SMP. Return 0 if the SMP should be * dropped. */ -static int smi_handle_dr_smp_recv(struct ib_mad_agent *mad_agent, - struct ib_smp *smp, +static int smi_handle_dr_smp_recv(struct ib_smp *smp, + u8 node_type, int port_num, int phys_port_cnt) { @@ -167,10 +167,10 @@ /* C14-9:2 -- intermediate hop */ if (hop_ptr && hop_ptr < hop_cnt) { - if (mad_agent->device->node_type != IB_NODE_SWITCH) + if (node_type != IB_NODE_SWITCH) return 0; - smp->return_path[hop_ptr] = mad_agent->port_num; + smp->return_path[hop_ptr] = port_num; /* smp->hop_ptr updated when sending */ return (smp->initial_path[hop_ptr+1] <= phys_port_cnt); } @@ -181,7 +181,7 @@ smp->return_path[hop_ptr] = port_num; /* smp->hop_ptr updated when sending */ - return (mad_agent->device->node_type == IB_NODE_SWITCH || + return (node_type == IB_NODE_SWITCH || smp->dr_dlid == IB_LID_PERMISSIVE); } @@ -200,7 +200,7 @@ /* C14-13:2 */ if (2 <= hop_ptr && hop_ptr <= hop_cnt) { - if (mad_agent->device->node_type != IB_NODE_SWITCH) + if (node_type != IB_NODE_SWITCH) return 0; /* smp->hop_ptr updated when sending */ @@ -215,7 +215,7 @@ return 1; } /* smp->hop_ptr updated when sending */ - return (mad_agent->device->node_type == IB_NODE_SWITCH); + return (node_type == IB_NODE_SWITCH); } /* C14-13:4 -- hop_ptr = 0 -> give to SM. */ @@ -228,15 +228,15 @@ * Receive side handling SMPs. Save receive information as required by * the spec. Return 0 if the SMP should be dropped. */ -static int smi_handle_smp_recv(struct ib_mad_agent *mad_agent, - struct ib_smp *smp, +static int smi_handle_smp_recv(struct ib_smp *smp, + u8 node_type, int port_num, int phys_port_cnt) { switch (smp->mgmt_class) { case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE: - return smi_handle_dr_smp_recv(mad_agent, smp, + return smi_handle_dr_smp_recv(smp, node_type, port_num, phys_port_cnt); default: /* LR SM class */ return 1; @@ -247,8 +247,7 @@ * Return 1 if the received DR SMP should be forwarded to the send queue. * Return 0 if the SMP should be completed up the stack. */ -static int smi_check_forward_dr_smp(struct ib_mad_agent *mad_agent, - struct ib_smp *smp) +static int smi_check_forward_dr_smp(struct ib_smp *smp) { u8 hop_ptr, hop_cnt; @@ -283,13 +282,12 @@ * Return 1 if the received SMP should be forwarded to the send queue. * Return 0 if the SMP should be completed up the stack. 
*/ -static int smi_check_forward_smp(struct ib_mad_agent *mad_agent, - struct ib_smp *smp) +static int smi_check_forward_smp(struct ib_smp *smp) { switch (smp->mgmt_class) { case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE: - return smi_check_forward_dr_smp(mad_agent, smp); + return smi_check_forward_dr_smp(smp); default: /* LR SM class */ return 1; } @@ -423,7 +421,8 @@ struct ib_mad *smp_response; int ret; - if (!smi_handle_smp_send(mad_agent, smp, mad_agent->port_num)) { + if (!smi_handle_smp_send(smp, mad_agent->device->node_type, + mad_agent->port_num)) { /* SMI failed send */ return 0; } @@ -436,8 +435,8 @@ ret = mad_process_local(mad_agent, (struct ib_mad *)smp, smp_response, slid); if (ret & IB_MAD_RESULT_SUCCESS) { - if (!smi_handle_smp_recv(mad_agent, - (struct ib_smp *)smp_response, + if (!smi_handle_smp_recv((struct ib_smp *)smp_response, + mad_agent->device->node_type, mad_agent->port_num, phys_port_cnt)) { /* SMI failed receive */ @@ -485,14 +484,16 @@ /* SM Directed Route or LID Routed class */ if (mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE || mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED) { - if (!smi_handle_smp_recv(mad_agent, (struct ib_smp *)mad, + if (!smi_handle_smp_recv((struct ib_smp *)mad, + mad_agent->device->node_type, mad_agent->port_num, phys_port_cnt)) { /* SMI failed receive */ return 0; } - if (smi_check_forward_smp(mad_agent, (struct ib_smp *)mad)) { - smi_send_smp(mad_agent, (struct ib_smp *)mad, + if (smi_check_forward_smp((struct ib_smp *)mad)) { + smi_send_smp(mad_agent, + (struct ib_smp *)mad, mad_recv_wc, mad_recv_wc->wc->slid, phys_port_cnt); From halr at voltaire.com Mon Oct 18 11:10:57 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 18 Oct 2004 14:10:57 -0400 Subject: [openib-general] [PATCH] ib_smi: DR SMI switch port number handling Message-ID: <1098123056.27056.12.camel@hpc-1> ib_smi: DR SMI switch port number handling Index: ib_smi.c =================================================================== --- ib_smi.c (revision 1019) +++ ib_smi.c (working copy) @@ -481,12 +481,18 @@ struct ib_mad_recv_wc *mad_recv_wc, int phys_port_cnt) { + int port_num; + /* SM Directed Route or LID Routed class */ if (mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE || mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED) { + if (mad_agent->device->node_type != IB_NODE_SWITCH) + port_num = mad_agent->port_num; + else + port_num = mad_recv_wc->wc->port_num; if (!smi_handle_smp_recv((struct ib_smp *)mad, mad_agent->device->node_type, - mad_agent->port_num, phys_port_cnt)) { + port_num, phys_port_cnt)) { /* SMI failed receive */ return 0; } From krkumar at us.ibm.com Mon Oct 18 13:11:19 2004 From: krkumar at us.ibm.com (Krishna Kumar) Date: Mon, 18 Oct 2004 13:11:19 -0700 (PDT) Subject: [openib-general] [PATCH] MAD thread doesn't get killed on failure to open port Message-ID: This patch should fix above problem, as well as get rid of redundant initialization code. 
Thanks, - KK --- ib_mad.c.org 2004-10-18 12:42:18.000000000 -0700 +++ ib_mad.c 2004-10-18 12:44:20.000000000 -0700 @@ -1676,9 +1676,6 @@ static int ib_mad_port_open(struct ib_de port_priv->device = device; port_priv->port_num = port_num; spin_lock_init(&port_priv->reg_lock); - for (i = 0; i < MAX_MGMT_VERSION; i++) { - port_priv->version[i] = NULL; - } cq_size = (IB_MAD_QP_SEND_SIZE + IB_MAD_QP_RECV_SIZE) * 2; port_priv->cq = ib_create_cq(port_priv->device, @@ -1737,12 +1734,8 @@ static int ib_mad_port_open(struct ib_de spin_lock_init(&port_priv->send_list_lock); INIT_LIST_HEAD(&port_priv->agent_list); INIT_LIST_HEAD(&port_priv->send_posted_mad_list); - port_priv->send_posted_mad_count = 0; - for (i = 0; i < IB_MAD_QPS_CORE; i++) { + for (i = 0; i < IB_MAD_QPS_CORE; i++) INIT_LIST_HEAD(&port_priv->recv_posted_mad_list[i]); - port_priv->recv_posted_mad_count[i] = 0; - port_priv->recv_wr_index[i] = 0; - } ret = ib_mad_thread_init(port_priv); if (ret) @@ -1751,7 +1744,7 @@ static int ib_mad_port_open(struct ib_de ret = ib_mad_port_start(port_priv); if (ret) { printk(KERN_ERR PFX "Couldn't start port\n"); - goto error8; + goto error9; } spin_lock_irqsave(&ib_mad_port_list_lock, flags); @@ -1760,6 +1753,8 @@ static int ib_mad_port_open(struct ib_de return 0; +error9: + kthread_stop(port_priv->mad_thread); error8: ib_destroy_qp(port_priv->qp[1]); error7: From krkumar at us.ibm.com Mon Oct 18 16:21:32 2004 From: krkumar at us.ibm.com (Krishna Kumar) Date: Mon, 18 Oct 2004 16:21:32 -0700 (PDT) Subject: [openib-general] [PATCH] Encapsulate searching of device/port into 1 routine Message-ID: This patch encapsulates the search of existing device/port open into a new routine that the existing 3 users can call instead of each of them going through the ib_mad_port_list list (lock and lockless version). Thanks, - KK --- ib_mad.c.org 2004-10-18 16:16:46.000000000 -0700 +++ ib_mad.c 2004-10-18 16:19:25.000000000 -0700 @@ -89,6 +89,38 @@ struct ib_mad_send_wc *mad_send_wc); /* + * Returns a ib_mad_port_private structure or NULL for a device/port. + * Assumes ib_mad_port_list_lock is being held. + */ +static inline struct ib_mad_port_private * +__ib_get_mad_port(struct ib_device *device, int port_num) +{ + struct ib_mad_port_private *entry; + + BUG_ON(!spin_is_locked(&ib_mad_port_list_lock)); + list_for_each_entry(entry, &ib_mad_port_list, port_list) { + if (entry->device == device && entry->port_num == port_num) + return entry; + } + return NULL; +} + +/* + * Wrapper function to return a ib_mad_port_private structure or NULL for + * a device/port. 
+ */ +static inline struct ib_mad_port_private * +ib_get_mad_port(struct ib_device *device, int port_num) +{ + struct ib_mad_port_private *entry; + + spin_lock_irqsave(&ib_mad_port_list_lock, flags); + entry = __ib_get_mad_port(device, port_num); + spin_unlock_irqrestore(&ib_mad_port_list_lock, flags); + + return entry; +} +/* * ib_register_mad_agent - Register to send/receive MADs */ struct ib_mad_agent *ib_register_mad_agent(struct ib_device *device, @@ -100,7 +132,7 @@ ib_mad_recv_handler recv_handler, void *context) { - struct ib_mad_port_private *entry, *port_priv = NULL; + struct ib_mad_port_private *port_priv; struct ib_mad_agent *ret; struct ib_mad_agent_private *mad_agent_priv; struct ib_mad_reg_req *reg_req = NULL; @@ -158,14 +190,7 @@ } /* Validate device and port */ - spin_lock_irqsave(&ib_mad_port_list_lock, flags); - list_for_each_entry(entry, &ib_mad_port_list, port_list) { - if (entry->device == device && entry->port_num == port_num) { - port_priv = entry; - break; - } - } - spin_unlock_irqrestore(&ib_mad_port_list_lock, flags); + port_priv = ib_get_mad_port(device, port_num); if (!port_priv) { ret = ERR_PTR(-ENODEV); goto error1; @@ -1647,18 +1672,11 @@ }; struct ib_qp_init_attr qp_init_attr; struct ib_qp_cap qp_cap; - struct ib_mad_port_private *entry, *port_priv = NULL; + struct ib_mad_port_private *port_priv; unsigned long flags; /* First, check if port already open at MAD layer */ - spin_lock_irqsave(&ib_mad_port_list_lock, flags); - list_for_each_entry(entry, &ib_mad_port_list, port_list) { - if (entry->device == device && entry->port_num == port_num) { - port_priv = entry; - break; - } - } - spin_unlock_irqrestore(&ib_mad_port_list_lock, flags); + port_priv = ib_get_mad_port(device, port_num); if (port_priv) { printk(KERN_DEBUG PFX "%s port %d already open\n", device->name, port_num); @@ -1778,20 +1796,15 @@ */ static int ib_mad_port_close(struct ib_device *device, int port_num) { - struct ib_mad_port_private *entry, *port_priv = NULL; + struct ib_mad_port_private *port_priv; unsigned long flags; spin_lock_irqsave(&ib_mad_port_list_lock, flags); - list_for_each_entry(entry, &ib_mad_port_list, port_list) { - if (entry->device == device && entry->port_num == port_num) { - port_priv = entry; - break; - } - } + port_priv = __ib_get_mad_port(device, port_num); if (port_priv == NULL) { - printk(KERN_ERR PFX "Port %d not found\n", port_num); spin_unlock_irqrestore(&ib_mad_port_list_lock, flags); + printk(KERN_ERR PFX "Port %d not found\n", port_num); return -ENODEV; } From krkumar at us.ibm.com Mon Oct 18 17:14:42 2004 From: krkumar at us.ibm.com (Krishna Kumar) Date: Mon, 18 Oct 2004 17:14:42 -0700 (PDT) Subject: [openib-general] [PATCH] Encapsulate searching of device/port into 1 routine In-Reply-To: Message-ID: While creating patch, I inadvertently deleted a line. The function is ib_get_mad_port() where I missed declaring flags. Please use this patch instead. thx, - KK --- ib_mad.c.org 2004-10-18 16:16:46.000000000 -0700 +++ ib_mad.c 2004-10-18 17:09:51.000000000 -0700 @@ -89,6 +89,39 @@ static void ib_mad_complete_send_wr(stru struct ib_mad_send_wc *mad_send_wc); /* + * Returns a ib_mad_port_private structure or NULL for a device/port. + * Assumes ib_mad_port_list_lock is being held. 
+ */ +static inline struct ib_mad_port_private * +__ib_get_mad_port(struct ib_device *device, int port_num) +{ + struct ib_mad_port_private *entry; + + BUG_ON(!spin_is_locked(&ib_mad_port_list_lock)); + list_for_each_entry(entry, &ib_mad_port_list, port_list) { + if (entry->device == device && entry->port_num == port_num) + return entry; + } + return NULL; +} + +/* + * Wrapper function to return a ib_mad_port_private structure or NULL for + * a device/port. + */ +static inline struct ib_mad_port_private * +ib_get_mad_port(struct ib_device *device, int port_num) +{ + struct ib_mad_port_private *entry; + unsigned long flags; + + spin_lock_irqsave(&ib_mad_port_list_lock, flags); + entry = __ib_get_mad_port(device, port_num); + spin_unlock_irqrestore(&ib_mad_port_list_lock, flags); + + return entry; +} +/* * ib_register_mad_agent - Register to send/receive MADs */ struct ib_mad_agent *ib_register_mad_agent(struct ib_device *device, @@ -100,7 +133,7 @@ struct ib_mad_agent *ib_register_mad_age ib_mad_recv_handler recv_handler, void *context) { - struct ib_mad_port_private *entry, *port_priv = NULL; + struct ib_mad_port_private *port_priv; struct ib_mad_agent *ret; struct ib_mad_agent_private *mad_agent_priv; struct ib_mad_reg_req *reg_req = NULL; @@ -158,14 +191,7 @@ struct ib_mad_agent *ib_register_mad_age } /* Validate device and port */ - spin_lock_irqsave(&ib_mad_port_list_lock, flags); - list_for_each_entry(entry, &ib_mad_port_list, port_list) { - if (entry->device == device && entry->port_num == port_num) { - port_priv = entry; - break; - } - } - spin_unlock_irqrestore(&ib_mad_port_list_lock, flags); + port_priv = ib_get_mad_port(device, port_num); if (!port_priv) { ret = ERR_PTR(-ENODEV); goto error1; @@ -1647,18 +1673,11 @@ static int ib_mad_port_open(struct ib_de }; struct ib_qp_init_attr qp_init_attr; struct ib_qp_cap qp_cap; - struct ib_mad_port_private *entry, *port_priv = NULL; + struct ib_mad_port_private *port_priv; unsigned long flags; /* First, check if port already open at MAD layer */ - spin_lock_irqsave(&ib_mad_port_list_lock, flags); - list_for_each_entry(entry, &ib_mad_port_list, port_list) { - if (entry->device == device && entry->port_num == port_num) { - port_priv = entry; - break; - } - } - spin_unlock_irqrestore(&ib_mad_port_list_lock, flags); + port_priv = ib_get_mad_port(device, port_num); if (port_priv) { printk(KERN_DEBUG PFX "%s port %d already open\n", device->name, port_num); @@ -1778,20 +1797,15 @@ error3: */ static int ib_mad_port_close(struct ib_device *device, int port_num) { - struct ib_mad_port_private *entry, *port_priv = NULL; + struct ib_mad_port_private *port_priv; unsigned long flags; spin_lock_irqsave(&ib_mad_port_list_lock, flags); - list_for_each_entry(entry, &ib_mad_port_list, port_list) { - if (entry->device == device && entry->port_num == port_num) { - port_priv = entry; - break; - } - } + port_priv = __ib_get_mad_port(device, port_num); if (port_priv == NULL) { - printk(KERN_ERR PFX "Port %d not found\n", port_num); spin_unlock_irqrestore(&ib_mad_port_list_lock, flags); + printk(KERN_ERR PFX "Port %d not found\n", port_num); return -ENODEV; } From volta104 at mail.netvision.net.il Tue Oct 19 07:27:11 2004 From: volta104 at mail.netvision.net.il (volta104 at mail.netvision.net.il) Date: Tue, 19 Oct 2004 10:27:11 -0400 Subject: [openib-general] [PATCH] Encapsulate searching of device/port into1 routine Message-ID: <94350-2200410219142711463@M2W084.mail2web.com> Hi, I will not be able to look at integrating your two ib_mad patches until Thursday. 
Thanks. -- Hal -------------------------------------------------------------------- mail2web - Check your email from the web at http://mail2web.com/ . From roland at topspin.com Tue Oct 19 08:52:58 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 19 Oct 2004 08:52:58 -0700 Subject: [openib-general] Kernel 2.6.9 Message-ID: <52ekjuaen9.fsf@topspin.com> Now that Linus has officially released 2.6.9, I am removing backwards compatibility from mthca (basically the patch below). I added a 2.6.9 patch in the src/linux-kernel/patches directory. Since the tree does not compile against anything older, I removed the older kernel patches (I don't think they're useful and they probably create confusion). Eagle-eyed readers will notice that there is now a linux-2.6.9-ipoib-multicast.diff patch in the patches directory as well. This adds the mapping from IPv4 multicast address to IB MGID. Once I fix multicast in the IPoIB driver, this patch will be required for correct operation. - R. Index: infiniband/hw/mthca/mthca_dev.h =================================================================== --- infiniband/hw/mthca/mthca_dev.h (revision 987) +++ infiniband/hw/mthca/mthca_dev.h (working copy) @@ -29,14 +29,6 @@ #include #include -/* - * Backwards compatibility for kernel 2.6.8.1. Remove when 2.6.9 is - * officially released with support for __iomem annotations. - */ -#ifndef __iomem -#define __iomem -#endif - #include "mthca_provider.h" #include "mthca_doorbell.h" From halr at voltaire.com Tue Oct 19 13:52:23 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 19 Oct 2004 16:52:23 -0400 Subject: [openib-general] GRH Validation on Incoming Packets Message-ID: <008a01c4b61d$8b79d220$9114a8c0@Gripen> Hi, I have a question about whether any GRH validation is done on incoming GSI packets which contain GRHs. If so, is the IP version field validated or does that need to be done by the MAD layer or clients (like PMA) ? Thanks. -- Hal -------------- next part -------------- An HTML attachment was scrubbed... URL: From gdror at mellanox.co.il Wed Oct 20 14:54:40 2004 From: gdror at mellanox.co.il (Dror Goldenberg) Date: Wed, 20 Oct 2004 23:54:40 +0200 Subject: [openib-general] GRH Validation on Incoming Packets Message-ID: <506C3D7B14CDD411A52C00025558DED60638B517@mtlex01.yok.mtl.com> You're exempt from checking it. Check out IB spec: C8-19: The network layer shall silently discard, with the exception of adjusting any applicable management counters specified elsewhere in this specification, packets that meet any of the following conditions: * Value of IPVer is not 6. * The value of DGID does not equal one of the GID values assigned to the port that received the packet. -----Original Message----- From: Hal Rosenstock [mailto:halr at voltaire.com] Sent: Tuesday, October 19, 2004 10:52 PM To: openib-general at openib.org Subject: [openib-general] GRH Validation on Incoming Packets Hi, I have a question about whether any GRH validation is done on incoming GSI packets which contain GRHs. If so, is the IP version field validated or does that need to be done by the MAD layer or clients (like PMA) ? Thanks. -- Hal -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mgalecki at sbs.com Wed Oct 20 15:08:55 2004 From: mgalecki at sbs.com (Mark Galecki) Date: Wed, 20 Oct 2004 15:08:55 -0700 Subject: [openib-general] Hello and a basic question Message-ID: <4DCF75ED419BD411821A0050DAB499CD029BDB8F@newex.sbscorp.sbs.com> Hello, I am new to IB in Linux, and I have a basic naming question (for which I tried to find an answer on the Web but failed because I guess it is too basic). Why is IP-over-IB named IP-over-IB? From what I can understand, this is really a software layer, that connects the protocol layer (IP and ARP) to the IB device driver. So, it should be called something like Protocol-to-IB, not IP-over-IB. Either my understanding is wrong, in which case please briefly explain why, or "Protocol-to-IB" is too complicated and unclear a name - then please confirm that was the reason why such a name was not chosen. Thank you, Mark Galeck This communication, including any attachments, is confidential information of SBS Technologies and intended solely for the use of the individual or entity to which it is addressed. Any unauthorized review, use, disclosure or distribution is prohibited. If you believe this message was sent to you in error, please immediately notify the sender by reply e-mail and delete this message without disclosing it. Thank you. From ftillier at infiniconsys.com Wed Oct 20 15:45:37 2004 From: ftillier at infiniconsys.com (Fab Tillier) Date: Wed, 20 Oct 2004 15:45:37 -0700 Subject: [openib-general] Hello and a basic question In-Reply-To: <4DCF75ED419BD411821A0050DAB499CD029BDB8F@newex.sbscorp.sbs.com> Message-ID: <000001c4b6f6$87da57a0$6401a8c0@infiniconsys.com> > From: Mark Galecki [mailto:mgalecki at sbs.com] > Sent: Wednesday, October 20, 2004 3:09 PM > > Why is IP-over-IB named IP-over-IB? From what I can understand, this is > really a software layer, that connects the protocol layer (IP and ARP) to > the IB device driver. So, it should be called something like > Protocol-to-IB, not IP-over-IB. ARP is included in the IP over IB spec because it is used to do hardware address resolution. IPoIB is not a generic software layer that can route any protocol - it only supports IP (and whatever is needed for IP to work, i.e. ARP). The name illustrates the limitation - you can't send other protocols over an IPoIB implementation, as the rules for encapsulation aren't defined for other protocols. So you could call it "IP Protocol to IB", but that's a bit cumbersome. Hope that helps! - Fab From roland at topspin.com Wed Oct 20 16:04:43 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 20 Oct 2004 16:04:43 -0700 Subject: [openib-general] [PATCH][0/5] Add ib_get_dma_mr() API Message-ID: <200410201604.WCsI6flbVRY89a6V@topspin.com> This series of patches adds the struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags); API to my branch. To recap, this creates an MR that can access any DMA address for an HCA device (all 64 bits of memory in the Tavor case) (Tom, if you want to give this a spin on sparc64, that would be great... this should cleanly fix the problem we hacked around before) - R. 
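For context on the consumer side: patches 3/5 through 5/5 below convert the MAD layer, IPoIB and SDP from registering all of physical memory to using the new call. Condensed, and with "foo_priv" plus the reduced error handling as placeholders rather than code taken from the series, the change looks like this:

	/* Before: cover all of physical memory via ib_reg_phys_mr() */
	static int foo_create_mr_old(struct foo_priv *priv)
	{
		struct ib_phys_buf buf = {
			.addr = 0,
			.size = (unsigned long) high_memory - PAGE_OFFSET,
		};
		u64 iova = 0;

		priv->mr = ib_reg_phys_mr(priv->pd, &buf, 1,
					  IB_ACCESS_LOCAL_WRITE, &iova);
		return IS_ERR(priv->mr) ? PTR_ERR(priv->mr) : 0;
	}

	/* After: one call that covers the HCA's whole DMA address space */
	static int foo_create_mr_new(struct foo_priv *priv)
	{
		priv->mr = ib_get_dma_mr(priv->pd, IB_ACCESS_LOCAL_WRITE);
		return IS_ERR(priv->mr) ? PTR_ERR(priv->mr) : 0;
	}

	/* Send/receive SGEs then use priv->mr->lkey instead of a cached lkey. */
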
From roland at topspin.com Wed Oct 20 16:04:44 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 20 Oct 2004 16:04:44 -0700 Subject: [openib-general] [PATCH][1/5] ib_get_dma_mr(): core In-Reply-To: <200410201604.WCsI6flbVRY89a6V@topspin.com> Message-ID: <200410201604.lATCUQgyHykmXCXX@topspin.com> Index: infiniband/include/ib_verbs.h =================================================================== --- infiniband/include/ib_verbs.h (revision 1024) +++ infiniband/include/ib_verbs.h (working copy) @@ -736,6 +736,8 @@ enum ib_cq_notify cq_notify); int (*req_ncomp_notif)(struct ib_cq *cq, int wc_cnt); + struct ib_mr * (*get_dma_mr)(struct ib_pd *pd, + int mr_access_flags); struct ib_mr * (*reg_phys_mr)(struct ib_pd *pd, struct ib_phys_buf *phys_buf_array, int num_phys_buf, @@ -924,6 +926,8 @@ -ENOSYS; } +struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags); + struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd, struct ib_phys_buf *phys_buf_array, int num_phys_buf, Index: infiniband/core/ib_verbs.c =================================================================== --- infiniband/core/ib_verbs.c (revision 915) +++ infiniband/core/ib_verbs.c (working copy) @@ -226,6 +226,23 @@ /* Memory regions */ +struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags) +{ + struct ib_mr *mr; + + mr = pd->device->get_dma_mr(pd, mr_access_flags); + + if (!IS_ERR(mr)) { + mr->device = pd->device; + mr->pd = pd; + atomic_inc(&pd->usecnt); + atomic_set(&mr->usecnt, 0); + } + + return mr; +} +EXPORT_SYMBOL(ib_get_dma_mr); + struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd, struct ib_phys_buf *phys_buf_array, int num_phys_buf, From roland at topspin.com Wed Oct 20 16:04:44 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 20 Oct 2004 16:04:44 -0700 Subject: [openib-general] [PATCH][2/5] ib_get_dma_mr(): mthca implementation In-Reply-To: <200410201604.lATCUQgyHykmXCXX@topspin.com> Message-ID: <200410201604.R9qWj8KatnyaxMsP@topspin.com> Index: infiniband/hw/mthca/mthca_dev.h =================================================================== --- infiniband/hw/mthca/mthca_dev.h (revision 1023) +++ infiniband/hw/mthca/mthca_dev.h (working copy) @@ -287,7 +287,7 @@ void mthca_pd_free(struct mthca_dev *dev, struct mthca_pd *pd); int mthca_mr_alloc_notrans(struct mthca_dev *dev, u32 pd, - struct mthca_mr *mr); + u32 access, struct mthca_mr *mr); int mthca_mr_alloc_phys(struct mthca_dev *dev, u32 pd, u64 *buffer_list, int buffer_size_shift, int list_len, u64 iova, u64 total_size, Index: infiniband/hw/mthca/mthca_provider.c =================================================================== --- infiniband/hw/mthca/mthca_provider.c (revision 987) +++ infiniband/hw/mthca/mthca_provider.c (working copy) @@ -398,6 +398,36 @@ return 0; } +static inline u32 convert_access(int acc) +{ + return (acc & IB_ACCESS_REMOTE_ATOMIC ? MTHCA_MPT_FLAG_ATOMIC : 0) | + (acc & IB_ACCESS_REMOTE_WRITE ? MTHCA_MPT_FLAG_REMOTE_WRITE : 0) | + (acc & IB_ACCESS_REMOTE_READ ? MTHCA_MPT_FLAG_REMOTE_READ : 0) | + (acc & IB_ACCESS_LOCAL_WRITE ? 
MTHCA_MPT_FLAG_LOCAL_WRITE : 0) | + MTHCA_MPT_FLAG_LOCAL_READ; +} + +static struct ib_mr *mthca_get_dma_mr(struct ib_pd *pd, int acc) +{ + struct mthca_mr *mr; + int err; + + mr = kmalloc(sizeof *mr, GFP_KERNEL); + if (!mr) + return ERR_PTR(-ENOMEM); + + err = mthca_mr_alloc_notrans(to_mdev(pd->device), + to_mpd(pd)->pd_num, + convert_access(acc), mr); + + if (err) { + kfree(mr); + return ERR_PTR(err); + } + + return &mr->ibmr; +} + static struct ib_mr *mthca_reg_phys_mr(struct ib_pd *pd, struct ib_phys_buf *buffer_list, int num_phys_buf, @@ -410,8 +440,7 @@ u64 mask; int shift; int npages; - u32 access; - int err = -ENOMEM; + int err; int i, j, n; /* First check that we have enough alignment */ @@ -475,13 +504,6 @@ ++j) page_list[n++] = buffer_list[i].addr + ((u64) j << shift); - access = - (acc & IB_ACCESS_REMOTE_ATOMIC ? MTHCA_MPT_FLAG_ATOMIC : 0) | - (acc & IB_ACCESS_REMOTE_WRITE ? MTHCA_MPT_FLAG_REMOTE_WRITE : 0) | - (acc & IB_ACCESS_REMOTE_READ ? MTHCA_MPT_FLAG_REMOTE_READ : 0) | - (acc & IB_ACCESS_LOCAL_WRITE ? MTHCA_MPT_FLAG_LOCAL_WRITE : 0) | - MTHCA_MPT_FLAG_LOCAL_READ; - mthca_dbg(to_mdev(pd->device), "Registering memory at %llx (iova %llx) " "in PD %x; shift %d, npages %d.\n", (unsigned long long) buffer_list[0].addr, @@ -493,15 +515,13 @@ to_mpd(pd)->pd_num, page_list, shift, npages, *iova_start, total_size, - access, mr); + convert_access(acc), mr); if (err) { kfree(mr); - mr = ERR_PTR(err); - goto out; + return ERR_PTR(err); } -out: kfree(page_list); return &mr->ibmr; } @@ -576,6 +596,7 @@ dev->ib_dev.destroy_cq = mthca_destroy_cq; dev->ib_dev.poll_cq = mthca_poll_cq; dev->ib_dev.req_notify_cq = mthca_req_notify_cq; + dev->ib_dev.get_dma_mr = mthca_get_dma_mr; dev->ib_dev.reg_phys_mr = mthca_reg_phys_mr; dev->ib_dev.dereg_mr = mthca_dereg_mr; dev->ib_dev.attach_mcast = mthca_multicast_attach; Index: infiniband/hw/mthca/mthca_pd.c =================================================================== --- infiniband/hw/mthca/mthca_pd.c (revision 915) +++ infiniband/hw/mthca/mthca_pd.c (working copy) @@ -37,7 +37,10 @@ if (pd->pd_num == -1) return -ENOMEM; - err = mthca_mr_alloc_notrans(dev, pd->pd_num, &pd->ntmr); + err = mthca_mr_alloc_notrans(dev, pd->pd_num, + MTHCA_MPT_FLAG_LOCAL_READ | + MTHCA_MPT_FLAG_LOCAL_WRITE, + &pd->ntmr); if (err) mthca_free(&dev->pd_table.alloc, pd->pd_num); Index: infiniband/hw/mthca/mthca_mr.c =================================================================== --- infiniband/hw/mthca/mthca_mr.c (revision 915) +++ infiniband/hw/mthca/mthca_mr.c (working copy) @@ -108,7 +108,7 @@ } int mthca_mr_alloc_notrans(struct mthca_dev *dev, u32 pd, - struct mthca_mr *mr) + u32 access, struct mthca_mr *mr) { void *mailbox; struct mthca_mpt_entry *mpt_entry; @@ -133,10 +133,9 @@ mpt_entry->flags = cpu_to_be32(MTHCA_MPT_FLAG_SW_OWNS | MTHCA_MPT_FLAG_MIO | - MTHCA_MPT_FLAG_LOCAL_WRITE | - MTHCA_MPT_FLAG_LOCAL_READ | MTHCA_MPT_FLAG_PHYSICAL | - MTHCA_MPT_FLAG_REGION); + MTHCA_MPT_FLAG_REGION | + access); mpt_entry->page_size = 0; mpt_entry->key = cpu_to_be32(mr->ibmr.lkey); mpt_entry->pd = cpu_to_be32(pd); From roland at topspin.com Wed Oct 20 16:04:45 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 20 Oct 2004 16:04:45 -0700 Subject: [openib-general] [PATCH][3/5] ib_get_dma_mr(): use in MAD layer In-Reply-To: <200410201604.R9qWj8KatnyaxMsP@topspin.com> Message-ID: <200410201604.dIuGm8b59yxfhQ0W@topspin.com> Index: infiniband/core/mad_ib.c =================================================================== --- infiniband/core/mad_ib.c (revision 915) +++ 
infiniband/core/mad_ib.c (working copy) @@ -59,7 +59,7 @@ mad, IB_MAD_PACKET_SIZE, PCI_DMA_TODEVICE); gather_list.length = IB_MAD_PACKET_SIZE; - gather_list.lkey = priv->lkey; + gather_list.lkey = priv->mr->lkey; send_param.next = NULL; send_param.opcode = IB_WR_SEND; @@ -303,7 +303,7 @@ buf, IB_MAD_BUFFER_SIZE, PCI_DMA_FROMDEVICE); scatter_list.length = IB_MAD_BUFFER_SIZE; - scatter_list.lkey = priv->lkey; + scatter_list.lkey = priv->mr->lkey; receive_param.next = NULL; receive_param.sg_list = &scatter_list; Index: infiniband/core/mad_main.c =================================================================== --- infiniband/core/mad_main.c (revision 915) +++ infiniband/core/mad_main.c (working copy) @@ -42,31 +42,6 @@ kmem_cache_t *mad_cache; -static inline int ib_mad_register_memory(struct ib_pd *pd, - struct ib_mr **mr, - u32 *lkey) -{ - u64 iova = 0; - struct ib_phys_buf buffer_list = { - .addr = 0, - .size = (unsigned long) high_memory - PAGE_OFFSET - }; - - *mr = ib_reg_phys_mr(pd, &buffer_list, 1, /* list_len */ - IB_ACCESS_LOCAL_WRITE, &iova); - if (IS_ERR(*mr)) { - printk(KERN_WARNING "ib_reg_phys_mr failed " - "size 0x%016llx, iova 0x%016llx " - "(return code %ld)\n", - (unsigned long long) buffer_list.size, - (unsigned long long) iova, PTR_ERR(*mr)); - return PTR_ERR(*mr); - } - - *lkey = (*mr)->lkey; - return 0; -} - static int ib_mad_qp_create(struct ib_device *device, tTS_IB_PORT port, u32 qpn) @@ -197,8 +172,9 @@ INIT_WORK(&priv->cq_work, ib_mad_drain_cq, device); - if (ib_mad_register_memory(priv->pd, &priv->mr, &priv->lkey)) { - printk(KERN_WARNING "Failed to allocate MAD MR for %s\n", + priv->mr = ib_get_dma_mr(priv->pd, IB_ACCESS_LOCAL_WRITE); + if (IS_ERR(priv->mr)) { + printk(KERN_WARNING "Failed to create DMA MR for %s\n", device->name); goto error_free_cq; } Index: infiniband/core/mad_priv.h =================================================================== --- infiniband/core/mad_priv.h (revision 915) +++ infiniband/core/mad_priv.h (working copy) @@ -59,7 +59,6 @@ struct ib_pd *pd; struct ib_cq *cq; struct ib_mr *mr; - u32 lkey; struct ib_qp *qp[IB_MAD_MAX_PORTS_PER_DEVICE + 1][2]; struct ib_mad_buf send_buf [IB_MAD_MAX_PORTS_PER_DEVICE + 1][2][IB_MAD_SENDS_PER_QP]; From roland at topspin.com Wed Oct 20 16:04:51 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 20 Oct 2004 16:04:51 -0700 Subject: [openib-general] [PATCH][4/5] ib_get_dma_mr(): use in IPoIB In-Reply-To: <200410201604.dIuGm8b59yxfhQ0W@topspin.com> Message-ID: <200410201604.op4hdUVnLxZZfJpP@topspin.com> Index: infiniband/ulp/ipoib/ipoib_verbs.c =================================================================== --- infiniband/ulp/ipoib/ipoib_verbs.c (revision 952) +++ infiniband/ulp/ipoib/ipoib_verbs.c (working copy) @@ -201,24 +201,10 @@ if (ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP)) goto out_free_cq; - { - /* XXX we assume physical memory starts at address 0. 
*/ - struct ib_phys_buf buffer_list = { - .addr = 0, - .size = (unsigned long) high_memory - PAGE_OFFSET - }; - u64 dummy_iova = 0; - - priv->mr = ib_reg_phys_mr(priv->pd, &buffer_list, - 1, /* list_len */ - IB_ACCESS_LOCAL_WRITE, - &dummy_iova); - if (IS_ERR(priv->mr)) { - printk(KERN_WARNING "%s: ib_reg_phys_mr failed\n", ca->name); - goto out_free_cq; - } - - priv->lkey = priv->mr->lkey; + priv->mr = ib_get_dma_mr(priv->pd, IB_ACCESS_LOCAL_WRITE); + if (IS_ERR(priv->mr)) { + printk(KERN_WARNING "%s: ib_reg_phys_mr failed\n", ca->name); + goto out_free_cq; } return 0; Index: infiniband/ulp/ipoib/ipoib.h =================================================================== --- infiniband/ulp/ipoib/ipoib.h (revision 996) +++ infiniband/ulp/ipoib/ipoib.h (working copy) @@ -121,7 +121,6 @@ struct ib_mr *mr; struct ib_cq *cq; struct ib_qp *qp; - u32 lkey; u32 qkey; union ib_gid local_gid; Index: infiniband/ulp/ipoib/ipoib_ib.c =================================================================== --- infiniband/ulp/ipoib/ipoib_ib.c (revision 986) +++ infiniband/ulp/ipoib/ipoib_ib.c (working copy) @@ -36,7 +36,7 @@ struct ib_sge list = { .addr = addr, .length = IPOIB_BUF_SIZE, - .lkey = priv->lkey, + .lkey = priv->mr->lkey, }; struct ib_recv_wr param = { .wr_id = work_request_id, @@ -238,7 +238,7 @@ struct ib_sge list = { .addr = addr, .length = len, - .lkey = priv->lkey, + .lkey = priv->mr->lkey, }; struct ib_send_wr param = { .wr_id = work_request_id, From roland at topspin.com Wed Oct 20 16:04:52 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 20 Oct 2004 16:04:52 -0700 Subject: [openib-general] [PATCH][5/5] ib_get_dma_mr(): use in SDP In-Reply-To: <200410201604.op4hdUVnLxZZfJpP@topspin.com> Message-ID: <200410201604.GyJMYdnGqFmk8hI7@topspin.com> Index: infiniband/ulp/sdp/sdp_conn.c =================================================================== --- infiniband/ulp/sdp/sdp_conn.c (revision 994) +++ infiniband/ulp/sdp/sdp_conn.c (working copy) @@ -1869,7 +1869,6 @@ #ifdef _TS_SDP_AIO_SUPPORT struct ib_fmr_pool_param fmr_param_s; #endif - struct ib_phys_buf buffer_list; struct ib_device_attr node_info; struct sdev_hca_port *port; struct sdev_hca *hca; @@ -1918,16 +1917,7 @@ /* * memory registration */ - buffer_list.addr = 0; - buffer_list.size = (unsigned long)high_memory - PAGE_OFFSET; - - hca->iova = 0; - - hca->mem_h = ib_reg_phys_mr(hca->pd, - &buffer_list, - 1, /* list_len */ - IB_ACCESS_LOCAL_WRITE, - &hca->iova); + hca->mem_h = ib_get_dma_mr(hca->pd, IB_ACCESS_LOCAL_WRITE); if (IS_ERR(hca->mem_h)) { result = PTR_ERR(hca->mem_h); TS_TRACE(MOD_LNX_SDP, T_TERSE, TRACE_FLOW_FATAL, Index: infiniband/ulp/sdp/sdp_dev.h =================================================================== --- infiniband/ulp/sdp/sdp_dev.h (revision 915) +++ infiniband/ulp/sdp/sdp_dev.h (working copy) @@ -149,7 +149,6 @@ struct ib_mr *mem_h; /* registered memory region */ u32 l_key; /* local key */ u32 r_key; /* remote key */ - u64 iova; /* address */ struct ib_fmr_pool *fmr_pool; /* fast memory for Zcopy */ struct sdev_hca_port *port_list; /* ports on this HCA */ struct sdev_hca *next; /* next HCA in the list */ From bill at strahm.net Wed Oct 20 16:06:31 2004 From: bill at strahm.net (bill at strahm.net) Date: Wed, 20 Oct 2004 16:06:31 -0700 Subject: [openib-general] Hello and a basic question Message-ID: <20041020230631.31890.qmail@webmail01.mesa1.secureserver.net> The IETF Working Group that is specifying IP over IB is the IP over IB Working Group (IPoIB) 
http://www.ietf.org/html.charters/ipoib-charter.html That said it became very easy to just call the interface IP over IB. I personally don't like the name - but I can live with it. Kinda like calling eth0 interfaces IP over Ethernet interfaces. In reality I would just prefer to call it the IB interface. There is a rather long history in the IETF of creating IP over Foo Working Groups (IPoCDN, IPO, IPDVB, IPRPR) where foo is your favorite layer 2 protocol. This is my view of what has happened historically - whether this matches anyone elses opinion is left up to the reader. Bill > -------- Original Message -------- > Subject: [openib-general] Hello and a basic question > From: "Mark Galecki" > Date: Wed, October 20, 2004 3:08 pm > To: "'openib-general at openib.org'" > > Hello, I am new to IB in Linux, and I have a basic naming question (for > which I tried to find an answer on the Web but failed because I guess it is > too basic). > > Why is IP-over-IB named IP-over-IB? From what I can understand, this is > really a software layer, that connects the protocol layer (IP and ARP) to > the IB device driver. So, it should be called something like > Protocol-to-IB, not IP-over-IB. > > Either my understanding is wrong, in which case please briefly explain why, > or "Protocol-to-IB" is too complicated and unclear a name - then please > confirm that was the reason why such a name was not chosen. > > Thank you, > > Mark Galeck > > > This communication, including any attachments, is confidential information of SBS Technologies and intended solely for the use of the individual or entity to which it is addressed. Any unauthorized review, use, disclosure or distribution is prohibited. If you believe this message was sent to you in error, please immediately notify the sender by reply e-mail and delete this message without disclosing it. Thank you. > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From roland at topspin.com Wed Oct 20 16:11:06 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 20 Oct 2004 16:11:06 -0700 Subject: [openib-general] Hello and a basic question In-Reply-To: <4DCF75ED419BD411821A0050DAB499CD029BDB8F@newex.sbscorp.sbs.com> (Mark Galecki's message of "Wed, 20 Oct 2004 15:08:55 -0700") References: <4DCF75ED419BD411821A0050DAB499CD029BDB8F@newex.sbscorp.sbs.com> Message-ID: <52acuh6l4l.fsf@topspin.com> Mark> Why is IP-over-IB named IP-over-IB? From what I can Mark> understand, this is really a software layer, that connects Mark> the protocol layer (IP and ARP) to the IB device driver. Mark> So, it should be called something like Protocol-to-IB, not Mark> IP-over-IB. I think the name IP-over-IB is used by the IETF working group because what they are specifying is an encapsulation to be used to carry IP packets (both IPv4 and IPv6) over InfiniBand. In addition, an encapsulation is given for ARP packets, but IP&ARP-over-IB is too cumbersome a name. You can think of the current IP-over-IB drafts as being analogous to RFC 894 et al ("Standard for the transmission of IP datagrams over Ethernet networks"). The name for the kernel driver carries over from the name of the encapsulation -- we have a driver that talks IP-over-IB. This is analogous to eg. the naming of the kernel "clip" driver, which does classical IP over ATM. 
- Roland From tduffy at sun.com Wed Oct 20 16:27:02 2004 From: tduffy at sun.com (Tom Duffy) Date: Wed, 20 Oct 2004 16:27:02 -0700 Subject: [openib-general] Hello and a basic question In-Reply-To: <4DCF75ED419BD411821A0050DAB499CD029BDB8F@newex.sbscorp.sbs.com> References: <4DCF75ED419BD411821A0050DAB499CD029BDB8F@newex.sbscorp.sbs.com> Message-ID: <1098314822.30513.10.camel@duffman> On Wed, 2004-10-20 at 15:08 -0700, Mark Galecki wrote: > This communication, including any attachments, is confidential > information of SBS Technologies and intended solely for the use of the > individual or entity to which it is addressed. Any unauthorized > review, use, disclosure or distribution is prohibited. If you believe > this message was sent to you in error, please immediately notify the > sender by reply e-mail and delete this message without disclosing it. > Thank you. I find it interesting that your email to a public list was confidential. I was clearly not the intended recipient and I will not agree to your terms. Nor do I think Sandia would be willing to delete it from their archives. -tduffy -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From mlleinin at hpcn.ca.sandia.gov Wed Oct 20 17:28:02 2004 From: mlleinin at hpcn.ca.sandia.gov (Matt Leininger) Date: Wed, 20 Oct 2004 17:28:02 -0700 Subject: [openib-general] Hello and a basic question In-Reply-To: <1098314822.30513.10.camel@duffman> References: <4DCF75ED419BD411821A0050DAB499CD029BDB8F@newex.sbscorp.sbs.com> <1098314822.30513.10.camel@duffman> Message-ID: <1098318482.12120.124.camel@trinity> On Wed, 2004-10-20 at 16:27 -0700, Tom Duffy wrote: > On Wed, 2004-10-20 at 15:08 -0700, Mark Galecki wrote: > > This communication, including any attachments, is confidential > > information of SBS Technologies and intended solely for the use of the > > individual or entity to which it is addressed. Any unauthorized > > review, use, disclosure or distribution is prohibited. If you believe > > this message was sent to you in error, please immediately notify the > > sender by reply e-mail and delete this message without disclosing it. > > Thank you. > > I find it interesting that your email to a public list was confidential. > I was clearly not the intended recipient and I will not agree to your > terms. Nor do I think Sandia would be willing to delete it from their > archives. > Also remember all openib-general mail archived at gmane.org and mail-archive.com. :) - Matt From roland at topspin.com Wed Oct 20 22:28:45 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 20 Oct 2004 22:28:45 -0700 Subject: [openib-general] L_Key/MR for sending MADs? Message-ID: <526554iqr6.fsf@topspin.com> A little while ago, we had a brief discussion about what MR consumers should use for MADs they want to send. It seems the two possibilities where for the MAD layer to expose its MR for consumer use, or for consumers to create a new MR using the MAD layer's PD. Which option did we decide was the right way to go? - R. From Andras.Horvath at cern.ch Thu Oct 21 01:44:12 2004 From: Andras.Horvath at cern.ch (Andras.Horvath at cern.ch) Date: Thu, 21 Oct 2004 10:44:12 +0200 Subject: [openib-general] Kernel 2.6.9 In-Reply-To: <52ekjuaen9.fsf@topspin.com> References: <52ekjuaen9.fsf@topspin.com> Message-ID: <20041021084412.GL21516@cern.ch> > compatibility from mthca (basically the patch below). 
I added a 2.6.9 > patch in the src/linux-kernel/patches directory. Since the tree does > not compile against anything older, I removed the older kernel > patches (I don't think they're useful and they probably create confusion). Please excuse my ignorance, but what exactly do I need to check out? I got https://openib.org/svn/gen2/branches/roland-merge/ for me, and the kernel patching and compiling part went OK. However, the userspace tools seem to be (at least partially) missing, and README.user-build does not really help much. What (or what else) do I need? thanks in advance, Andras From halr at voltaire.com Thu Oct 21 08:44:35 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 21 Oct 2004 11:44:35 -0400 Subject: [openib-general] L_Key/MR for sending MADs? In-Reply-To: <526554iqr6.fsf@topspin.com> References: <526554iqr6.fsf@topspin.com> Message-ID: <1098373475.1063.5.camel@hpc-1> On Thu, 2004-10-21 at 01:28, Roland Dreier wrote: > A little while ago, we had a brief discussion about what MR consumers > should use for MADs they want to send. It seems the two possibilities > where for the MAD layer to expose its MR for consumer use, or for > consumers to create a new MR using the MAD layer's PD. Which option > did we decide was the right way to go? I'm not sure we reached consensus/conclusion on this although there was discussion of some pros and cons of each approach. Right now, the agents are using the latter approach but that could easily be changed. -- Hal From halr at voltaire.com Thu Oct 21 09:22:06 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 21 Oct 2004 12:22:06 -0400 Subject: [openib-general] [PATCH] MAD thread doesn't get killed on failure to open port In-Reply-To: References: Message-ID: <1098375726.1063.22.camel@hpc-1> On Mon, 2004-10-18 at 16:11, Krishna Kumar wrote: > This patch should fix above problem, as well as get rid of redundant > initialization code. Thanks. Applied. -- Hal From halr at voltaire.com Thu Oct 21 09:25:39 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 21 Oct 2004 12:25:39 -0400 Subject: [openib-general] [PATCH] Encapsulate searching of device/port into 1 routine In-Reply-To: References: Message-ID: <1098375939.1063.27.camel@hpc-1> On Mon, 2004-10-18 at 20:14, Krishna Kumar wrote: > While creating patch, I inadvertently deleted a line. The function is > ib_get_mad_port() where I missed declaring flags. Please use this patch > instead. Thanks. Applied. -- Hal From mshefty at ichips.intel.com Thu Oct 21 09:42:36 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 21 Oct 2004 09:42:36 -0700 Subject: [openib-general] L_Key/MR for sending MADs? In-Reply-To: <526554iqr6.fsf@topspin.com> References: <526554iqr6.fsf@topspin.com> Message-ID: <20041021094236.469858ed.mshefty@ichips.intel.com> On Wed, 20 Oct 2004 22:28:45 -0700 Roland Dreier wrote: > A little while ago, we had a brief discussion about what MR consumers > should use for MADs they want to send. It seems the two possibilities > where for the MAD layer to expose its MR for consumer use, or for > consumers to create a new MR using the MAD layer's PD. Which option > did we decide was the right way to go? I'm not sure that we decided that there was a right way, so I'd say go with whatever is most convenient for you. My preference would be to expose the MR over just the lkey though, if we go that route. 
- Sean From tduffy at sun.com Thu Oct 21 10:55:10 2004 From: tduffy at sun.com (Tom Duffy) Date: Thu, 21 Oct 2004 10:55:10 -0700 Subject: [openib-general] [PATCH][0/5] Add ib_get_dma_mr() API In-Reply-To: <200410201604.WCsI6flbVRY89a6V@topspin.com> References: <200410201604.WCsI6flbVRY89a6V@topspin.com> Message-ID: <1098381310.2389.6.camel@duffman> On Wed, 2004-10-20 at 16:04 -0700, Roland Dreier wrote: > (Tom, if you want to give this a spin on sparc64, that would be > great... this should cleanly fix the problem we hacked around before) Ok, the new driver seems to load fine on sparc64. And I can load ipoib also, but I cannot get onto the IP network (most likely because of multicast join issues): ib0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 BROADCAST MULTICAST MTU:2044 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:128 RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) ib1 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 BROADCAST MULTICAST MTU:2044 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:128 RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) ib0.8001 Link encap:UNSPEC HWaddr 00-00-00-14-00-00-00-00-00-00-00-00-00-00-00-00 inet addr:192.168.0.48 Bcast:192.168.0.255 Mask:255.255.255.0 UP BROADCAST MULTICAST MTU:2044 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:13 overruns:0 carrier:0 collisions:0 txqueuelen:128 RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) -tduffy -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From tduffy at sun.com Thu Oct 21 10:56:56 2004 From: tduffy at sun.com (Tom Duffy) Date: Thu, 21 Oct 2004 10:56:56 -0700 Subject: [openib-general] Kernel 2.6.9 In-Reply-To: <20041021084412.GL21516@cern.ch> References: <52ekjuaen9.fsf@topspin.com> <20041021084412.GL21516@cern.ch> Message-ID: <1098381416.2389.9.camel@duffman> On Thu, 2004-10-21 at 10:44 +0200, Andras.Horvath at cern.ch wrote: > > compatibility from mthca (basically the patch below). I added a 2.6.9 > > patch in the src/linux-kernel/patches directory. Since the tree does > > not compile against anything older, I removed the older kernel > > patches (I don't think they're useful and they probably create confusion). > > Please excuse my ignorance, but what exactly do I need to check out? > > I got https://openib.org/svn/gen2/branches/roland-merge/ for me, and the > kernel patching and compiling part went OK. However, the userspace tools > seem to be (at least partially) missing, and README.user-build does not > really help much. What (or what else) do I need? What are you trying to get working? Have you loaded ib_mthca? How about ib_ipoib? Do you have an external subnet manager running on something? -tduffy -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From tduffy at sun.com Thu Oct 21 11:25:03 2004 From: tduffy at sun.com (Tom Duffy) Date: Thu, 21 Oct 2004 11:25:03 -0700 Subject: [OOPS] in ipoib_mcast_attach [openib-general] In-Reply-To: <1098381310.2389.6.camel@duffman> References: <200410201604.WCsI6flbVRY89a6V@topspin.com> <1098381310.2389.6.camel@duffman> Message-ID: <1098383103.2389.15.camel@duffman> On Thu, 2004-10-21 at 10:55 -0700, Tom Duffy wrote: > Ok, the new driver seems to load fine on sparc64. Actually, looks like I got an oops (probably because qp (i0) was NULL when passed to ib_modify_qp()): Oct 21 10:58:39 localhost kernel: \|/ ____ \|/ Oct 21 10:58:39 localhost kernel: "@'/ .. \`@" Oct 21 10:58:39 localhost kernel: /_| \__/ |_\ Oct 21 10:58:39 localhost kernel: \__U_/ Oct 21 10:58:39 localhost kernel: ts_ib_mad(2510): Oops [#1] Oct 21 10:58:39 localhost kernel: TSTATE: 0000004480009605 TPC: 0000000002018244 TNPC: 0000000002018248 Y: 00000000 Not tainted Oct 21 10:58:39 localhost kernel: TPC: Oct 21 10:58:39 localhost kernel: g0: d320a004d520a008 g1: 0000000081234568 g2: 0000000000000003 g3: 0000000000000000 Oct 21 10:58:39 localhost kernel: g4: fffff8007af1f980 g5: 0000000000000008 g6: fffff8007a574000 g7: 0000000000000024 Oct 21 10:58:39 localhost kernel: o0: fffff800798a0500 o1: 0000000000000020 o2: 0000000000000001 o3: fffff8007a57735e Oct 21 10:58:39 localhost kernel: o4: 0000000000008001 o5: 00000000000000f0 sp: fffff8007a5769e1 ret_pc: fffff8007f8c4ea0 Oct 21 10:58:39 localhost kernel: RPC: <0xfffff8007f8c4ea0> Oct 21 10:58:39 localhost kernel: l0: 0000000000000001 l1: fffff8007eb19000 l2: 0000000002173800 l3: fffff8007fe99818 Oct 21 10:58:39 localhost kernel: l4: 00000000000229f2 l5: 00000000006cfd90 l6: 00000000000057f0 l7: fffff8007fe1bb50 Oct 21 10:58:39 localhost kernel: i0: 0000000000000000 i1: fffff800798a0500 i2: 0000000000000040 i3: fffff8007a577360 Oct 21 10:58:39 localhost kernel: i4: 0000000000000040 i5: fffff8007f972840 i6: fffff8007a576aa1 i7: 000000000218cdf0 Oct 21 10:58:39 localhost kernel: I7: Oct 21 10:58:39 localhost kernel: Caller[000000000218cdf0]: ipoib_mcast_attach+0x70/0x120 [ib_ipoib] Oct 21 10:58:39 localhost kernel: Caller[000000000218b088]: ipoib_mcast_join_finish+0x148/0x3a0 [ib_ipoib] Oct 21 10:58:39 localhost kernel: Caller[000000000218b834]: ipoib_mcast_join_complete+0x174/0x1e0 [ib_ipoib] Oct 21 10:58:39 localhost kernel: Caller[000000000217e5bc]: _tsIbMulticastJoinResponse+0xdc/0x2e0 [ib_sa_client] Oct 21 10:58:39 localhost kernel: Caller[0000000002176328]: ib_client_query_callback+0x68/0xa0 [ib_client_query] Oct 21 10:58:39 localhost kernel: Caller[0000000002176fa0]: ib_client_mad_handler+0x60/0x100 [ib_client_query] Oct 21 10:58:39 localhost kernel: Caller[000000000216f2b8]: ib_mad_invoke_filters+0x98/0x120 [ib_mad] Oct 21 10:58:39 localhost kernel: Caller[000000000216f744]: ib_mad_dispatch+0xe4/0x1e0 [ib_mad] Oct 21 10:58:39 localhost kernel: Caller[000000000216fcf0]: ib_mad_work_thread+0x70/0x540 [ib_mad] Oct 21 10:58:39 localhost kernel: Caller[00000000020088b0]: _tsKernelQueueThread+0xf0/0x160 [ib_services] Oct 21 10:58:39 localhost kernel: Caller[0000000002008624]: _tsKernelThreadStart+0x84/0xa0 [ib_services] Oct 21 10:58:39 localhost kernel: Caller[0000000000417430]: kernel_thread+0x30/0x60 Oct 21 10:58:39 localhost kernel: Caller[00000000020086d4]: tsKernelThreadStart+0x94/0xe0 [ib_services] Oct 21 10:58:39 localhost kernel: Instruction DUMP: 01000000 
01000000 9de3bf40 90100018 9410001a 92100019 c258a110 9fc04000 Oct 21 10:58:39 localhost kernel: TSTATE: 0000004411009601 TPC: 0000000000416428 TNPC: 000000000041642c Y: 00000000 Not tainted Oct 21 10:58:39 localhost kernel: TPC: Oct 21 10:58:39 localhost kernel: g0: 000000000000044d g1: 0000000000000001 g2: fffff80000016020 g3: fffff8007fe34008 Oct 21 10:58:39 localhost kernel: g4: fffff8007fdd4040 g5: 0000000000000001 g6: fffff8007fe34000 g7: 0000000000798018 Oct 21 10:58:39 localhost kernel: o0: 000000000000000a o1: fffff8007fe34008 o2: 000000000041742c o3: 00000000fffff800 Oct 21 10:58:39 localhost kernel: o4: 00000000000009cd o5: 0000000000000010 sp: fffff8007fe37681 ret_pc: 0000000000416478 Oct 21 10:58:39 localhost kernel: RPC: Oct 21 10:58:39 localhost kernel: l0: 00000000007920c0 l1: 0000000000798018 l2: 00000000007920c0 l3: 00000000007c1400 Oct 21 10:58:39 localhost kernel: l4: 0000000000000000 l5: 0000000000000000 l6: 000000000000000f l7: fffff8007fe37fd0 Oct 21 10:58:39 localhost kernel: i0: 0000000000000000 i1: 00000000007ae508 i2: 00000000007c15a8 i3: fffff8007fe37fd8 Oct 21 10:58:39 localhost kernel: i4: 0000000000000001 i5: 0000000000000000 i6: fffff8007fe37741 i7: 000000000042c42c Oct 21 10:58:40 localhost kernel: I7: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From halr at voltaire.com Thu Oct 21 12:33:30 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 21 Oct 2004 15:33:30 -0400 Subject: [openib-general] [PATCH] Access layer differences to roland_merge branch updated Message-ID: <1098387210.3026.3.camel@hpc-1> Differences to roland_merge branch needed for using the OpenIB access layer (.../openib-candidate/src/linux-kernel/patches/roland_merge.diff) have been updated. This is a change to the core/Makefile. It has been tested with 2.6.9. -- Hal From krkumar at us.ibm.com Thu Oct 21 14:55:18 2004 From: krkumar at us.ibm.com (Krishna Kumar) Date: Thu, 21 Oct 2004 14:55:18 -0700 (PDT) Subject: [openib-general] [PATCH] Minor cleanup and remove redundant initializations Message-ID: Trivial patch as described above in ib_mad.c (plus moving some variables into localized loop). 
Thanks, - KK --- ib_mad.c.org 2004-10-21 14:48:53.000000000 -0700 +++ ib_mad.c 2004-10-21 14:52:31.000000000 -0700 @@ -356,11 +356,8 @@ { int ret; struct ib_send_wr *cur_send_wr, *next_send_wr; - struct ib_send_wr *bad_wr; - struct ib_mad_send_wr_private *mad_send_wr; struct ib_mad_agent_private *mad_agent_priv; struct ib_mad_port_private *port_priv; - unsigned long flags; cur_send_wr = send_wr; /* Validate supplied parameters */ @@ -382,6 +379,10 @@ /* Walk list of send WRs and post each on send list */ cur_send_wr = send_wr; while (cur_send_wr) { + unsigned long flags; + struct ib_send_wr *bad_wr; + struct ib_mad_send_wr_private *mad_send_wr; + next_send_wr = (struct ib_send_wr *)cur_send_wr->next; /* Allocate MAD send WR tracking structure */ @@ -1472,7 +1473,7 @@ static inline int ib_mad_change_qp_state_to_init(struct ib_qp *qp) { int ret; - struct ib_qp_attr *attr = NULL; + struct ib_qp_attr *attr; int attr_mask; struct ib_qp_cap qp_cap; @@ -1508,7 +1509,7 @@ static inline int ib_mad_change_qp_state_to_rtr(struct ib_qp *qp) { int ret; - struct ib_qp_attr *attr = NULL; + struct ib_qp_attr *attr; int attr_mask; struct ib_qp_cap qp_cap; @@ -1534,7 +1535,7 @@ static inline int ib_mad_change_qp_state_to_rts(struct ib_qp *qp) { int ret; - struct ib_qp_attr *attr = NULL; + struct ib_qp_attr *attr; int attr_mask; struct ib_qp_cap qp_cap; @@ -1561,7 +1562,7 @@ static inline int ib_mad_change_qp_state_to_reset(struct ib_qp *qp) { int ret; - struct ib_qp_attr *attr = NULL; + struct ib_qp_attr *attr; int attr_mask; struct ib_qp_cap qp_cap; @@ -1671,8 +1672,6 @@ .addr = 0, .size = (unsigned long) high_memory - PAGE_OFFSET }; - struct ib_qp_init_attr qp_init_attr; - struct ib_qp_cap qp_cap; struct ib_mad_port_private *port_priv; unsigned long flags; @@ -1723,6 +1722,9 @@ } for (i = 0; i < IB_MAD_QPS_CORE; i++) { + struct ib_qp_init_attr qp_init_attr; + struct ib_qp_cap qp_cap; + memset(&qp_init_attr, 0, sizeof qp_init_attr); qp_init_attr.send_cq = port_priv->cq; qp_init_attr.recv_cq = port_priv->cq; @@ -1743,7 +1745,7 @@ goto error6; else goto error7; - } + } printk(KERN_DEBUG PFX "Created ib_mad QP %d\n", port_priv->qp[i]->qp_num); } From mshefty at ichips.intel.com Thu Oct 21 16:08:49 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 21 Oct 2004 16:08:49 -0700 Subject: [openib-general] [PATCH] timeout wq code Message-ID: <20041021160849.46271e5f.mshefty@ichips.intel.com> Code to use a work queue to time out MADs. There is one work queue per port. (Completion handling code was not changed.) I'm working on creating a few simple test cases to verify MAD functionality (registration, timeouts, sends, receives, and RMPP in the future), but these are not yet done. 
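In outline, the mechanism the patch implements is the standard delayed-workqueue pattern: one work queue per port, one delayed work item per agent. A condensed sketch of the calls involved (not an excerpt from the patch):

	port_priv->wq = create_workqueue("ib_mad");
	INIT_WORK(&mad_agent_priv->work, timeout_sends, mad_agent_priv);

	/* when a sent request moves to the wait list, (re)arm the work
	   item for the earliest pending timeout */
	queue_delayed_work(port_priv->wq, &mad_agent_priv->work,
			   mad_send_wr->timeout - jiffies);

	/* when the earliest timeout changes, or the agent unregisters */
	cancel_delayed_work(&mad_agent_priv->work);
	flush_workqueue(port_priv->wq);

timeout_sends() then walks the agent's wait list and reports IB_WC_RESP_TIMEOUT_ERR completions for anything that has expired.
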
-Sean Index: access/ib_mad_priv.h =================================================================== --- access/ib_mad_priv.h (revision 1036) +++ access/ib_mad_priv.h (working copy) @@ -58,6 +58,7 @@ #include #include +#include #include #include @@ -112,6 +113,8 @@ spinlock_t lock; struct list_head send_list; struct list_head wait_list; + struct work_struct work; + unsigned long timeout; atomic_t refcount; wait_queue_head_t wait; @@ -149,6 +152,7 @@ spinlock_t reg_lock; struct ib_mad_mgmt_class_table *version[MAX_MGMT_VERSION]; struct list_head agent_list; + struct workqueue_struct *wq; spinlock_t send_list_lock; struct list_head send_posted_mad_list; Index: access/ib_mad.c =================================================================== --- access/ib_mad.c (revision 1036) +++ access/ib_mad.c (working copy) @@ -87,6 +87,7 @@ static void cancel_mads(struct ib_mad_agent_private *mad_agent_priv); static void ib_mad_complete_send_wr(struct ib_mad_send_wr_private *mad_send_wr, struct ib_mad_send_wc *mad_send_wc); +static void timeout_sends(void *data); /* * Returns a ib_mad_port_private structure or NULL for a device/port. @@ -264,6 +265,7 @@ spin_lock_init(&mad_agent_priv->lock); INIT_LIST_HEAD(&mad_agent_priv->send_list); INIT_LIST_HEAD(&mad_agent_priv->wait_list); + INIT_WORK(&mad_agent_priv->work, timeout_sends, mad_agent_priv); atomic_set(&mad_agent_priv->refcount, 1); init_waitqueue_head(&mad_agent_priv->wait); mad_agent_priv->port_priv = port_priv; @@ -299,6 +301,9 @@ */ cancel_mads(mad_agent_priv); + cancel_delayed_work(&mad_agent_priv->work); + flush_workqueue(mad_agent_priv->port_priv->wq); + spin_lock_irqsave(&mad_agent_priv->port_priv->reg_lock, flags); remove_mad_reg_req(mad_agent_priv); list_del(&mad_agent_priv->agent_list); @@ -398,7 +403,8 @@ mad_send_wr->tid = send_wr->wr.ud.mad_hdr->tid; mad_send_wr->agent = mad_agent; /* Timeout will be updated after send completes. */ - mad_send_wr->timeout = cur_send_wr->wr.ud.timeout_ms; + mad_send_wr->timeout = msecs_to_jiffies(cur_send_wr->wr. + ud.timeout_ms); /* One reference for each work request to QP + response. */ mad_send_wr->refcount = 1 + (mad_send_wr->timeout > 0); mad_send_wr->status = IB_WC_SUCCESS; @@ -422,9 +428,7 @@ spin_unlock_irqrestore(&mad_agent_priv->lock, flags); *bad_send_wr = cur_send_wr; - if (atomic_dec_and_test(&mad_agent_priv->refcount)) - wake_up(&mad_agent_priv->wait); - + atomic_dec(&mad_agent_priv->refcount); return ret; } cur_send_wr= next_send_wr; @@ -1007,6 +1011,30 @@ return; } +static void adjust_timeout(struct ib_mad_agent_private *mad_agent_priv) +{ + struct ib_mad_send_wr_private *mad_send_wr; + unsigned long delay; + + if (list_empty(&mad_agent_priv->wait_list)) { + cancel_delayed_work(&mad_agent_priv->work); + } else { + mad_send_wr = list_entry(mad_agent_priv->wait_list.next, + struct ib_mad_send_wr_private, + agent_list); + + if (time_after(mad_agent_priv->timeout, mad_send_wr->timeout)) { + mad_agent_priv->timeout = mad_send_wr->timeout; + cancel_delayed_work(&mad_agent_priv->work); + delay = mad_send_wr->timeout - jiffies; + if ((long)delay <= 0) + delay = 1; + queue_delayed_work(mad_agent_priv->port_priv->wq, + &mad_agent_priv->work, delay); + } + } +} + static void wait_for_response(struct ib_mad_agent_private *mad_agent_priv, struct ib_mad_send_wr_private *mad_send_wr ) { @@ -1027,6 +1055,13 @@ break; } list_add(&mad_send_wr->agent_list, list_item); + + /* Re-schedule a work item if we have a shorter timeout. 
*/ + if (mad_agent_priv->wait_list.next == &mad_send_wr->agent_list) { + cancel_delayed_work(&mad_agent_priv->work); + queue_delayed_work(mad_agent_priv->port_priv->wq, + &mad_agent_priv->work, delay); + } } /* @@ -1059,6 +1094,7 @@ /* Remove send from MAD agent and notify client of completion */ list_del(&mad_send_wr->agent_list); + adjust_timeout(mad_agent_priv); spin_unlock_irqrestore(&mad_agent_priv->lock, flags); if (mad_send_wr->status != IB_WC_SUCCESS ) @@ -1233,6 +1269,7 @@ } list_del(&mad_send_wr->agent_list); + adjust_timeout(mad_agent_priv); spin_unlock_irqrestore(&mad_agent_priv->lock, flags); mad_send_wc.status = IB_WC_WR_FLUSH_ERR; @@ -1250,6 +1287,47 @@ } EXPORT_SYMBOL(ib_cancel_mad); +static void timeout_sends(void *data) +{ + struct ib_mad_agent_private *mad_agent_priv; + struct ib_mad_send_wr_private *mad_send_wr; + struct ib_mad_send_wc mad_send_wc; + unsigned long flags, delay; + + mad_agent_priv = (struct ib_mad_agent_private *)data; + + mad_send_wc.status = IB_WC_RESP_TIMEOUT_ERR; + mad_send_wc.vendor_err = 0; + + spin_lock_irqsave(&mad_agent_priv->lock, flags); + while (!list_empty(&mad_agent_priv->wait_list)) { + mad_send_wr = list_entry(mad_agent_priv->wait_list.next, + struct ib_mad_send_wr_private, + agent_list); + + if (time_after(mad_send_wr->timeout, jiffies)) { + delay = mad_send_wr->timeout - jiffies; + if ((long)delay <= 0) + delay = 1; + queue_delayed_work(mad_agent_priv->port_priv->wq, + &mad_agent_priv->work, delay); + break; + } + + list_del(&mad_send_wr->agent_list); + spin_unlock_irqrestore(&mad_agent_priv->lock, flags); + + mad_send_wc.wr_id = mad_send_wr->wr_id; + mad_agent_priv->agent.send_handler(&mad_agent_priv->agent, + &mad_send_wc); + + kfree(mad_send_wr); + atomic_dec(&mad_agent_priv->refcount); + spin_lock_irqsave(&mad_agent_priv->lock, flags); + } + spin_unlock_irqrestore(&mad_agent_priv->lock, flags); +} + /* * IB MAD thread */ @@ -1756,14 +1834,20 @@ for (i = 0; i < IB_MAD_QPS_CORE; i++) INIT_LIST_HEAD(&port_priv->recv_posted_mad_list[i]); + port_priv->wq = create_workqueue("ib_mad"); + if (!port_priv->wq) { + ret = -ENOMEM; + goto error8; + } + ret = ib_mad_thread_init(port_priv); if (ret) - goto error8; + goto error9; ret = ib_mad_port_start(port_priv); if (ret) { printk(KERN_ERR PFX "Couldn't start port\n"); - goto error9; + goto error10; } spin_lock_irqsave(&ib_mad_port_list_lock, flags); @@ -1772,8 +1856,10 @@ return 0; -error9: +error10: kthread_stop(port_priv->mad_thread); +error9: + destroy_workqueue(port_priv->wq); error8: ib_destroy_qp(port_priv->qp[1]); error7: @@ -1814,6 +1900,7 @@ ib_mad_port_stop(port_priv); kthread_stop(port_priv->mad_thread); + destroy_workqueue(port_priv->wq); ib_destroy_qp(port_priv->qp[1]); ib_destroy_qp(port_priv->qp[0]); ib_dereg_mr(port_priv->mr); -- From roland at topspin.com Thu Oct 21 21:21:56 2004 From: roland at topspin.com (Roland Dreier) Date: Thu, 21 Oct 2004 21:21:56 -0700 Subject: [OOPS] in ipoib_mcast_attach [openib-general] In-Reply-To: <1098383103.2389.15.camel@duffman> (Tom Duffy's message of "Thu, 21 Oct 2004 11:25:03 -0700") References: <200410201604.WCsI6flbVRY89a6V@topspin.com> <1098381310.2389.6.camel@duffman> <1098383103.2389.15.camel@duffman> Message-ID: <52oeivgz6j.fsf@topspin.com> Tom> Ok, the new driver seems to load fine on sparc64. Do your ports at least get to active? Tom> Actually, looks like I got an oops (probably because qp (i0) Tom> was NULL when passed to ib_modify_qp()): Strange, I'm not seeing how this is possible. I'll keep looking. - R. 
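A debugging aid that might help localize it in the meantime (a sketch only -- the surrounding ipoib_mcast_attach() code isn't quoted in this thread, so the exact placement and return value are guesses):

	if (!priv->qp) {
		ipoib_warn(priv, "multicast attach with priv->qp == NULL\n");
		return -EINVAL;
	}

That would at least turn the oops into a warning and show whether the QP really is gone by the time the multicast join completion path runs.
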
From roland at topspin.com Thu Oct 21 21:22:24 2004 From: roland at topspin.com (Roland Dreier) Date: Thu, 21 Oct 2004 21:22:24 -0700 Subject: [OOPS] in ipoib_mcast_attach [openib-general] In-Reply-To: <1098383103.2389.15.camel@duffman> (Tom Duffy's message of "Thu, 21 Oct 2004 11:25:03 -0700") References: <200410201604.WCsI6flbVRY89a6V@topspin.com> <1098381310.2389.6.camel@duffman> <1098383103.2389.15.camel@duffman> Message-ID: <52k6tjgz5r.fsf@topspin.com> Tom> Actually, looks like I got an oops (probably because qp (i0) Tom> was NULL when passed to ib_modify_qp()): By the way, how do you trigger this oops? ifconfig the interface up? - R. From Andras.Horvath at cern.ch Fri Oct 22 04:35:24 2004 From: Andras.Horvath at cern.ch (Andras.Horvath at cern.ch) Date: Fri, 22 Oct 2004 13:35:24 +0200 Subject: [openib-general] Kernel 2.6.9 In-Reply-To: <1098381416.2389.9.camel@duffman> References: <52ekjuaen9.fsf@topspin.com> <20041021084412.GL21516@cern.ch> <1098381416.2389.9.camel@duffman> Message-ID: <20041022113524.GI21516@cern.ch> > What are you trying to get working? VAPI and/or IPoIB. > Have you loaded ib_mthca? How about ib_ipoib? well.. now I've started from 'find /lib/modules/2.6.9ib/kernel/drivers/infiniband/' due to lack of any docs. In short, the kernel crashed after trying to send the first packet on ipoib (?). The test setup is two dual Xeon i386 boxes, 2.6.9 plus Roland's IB patches (SMP kernel). Kernel messages as below - please let me know if I did something wrong :) or how I can aid the debug process. The hardware is a Voltaire PCI-X HCA: 03:01.0 PCI bridge: Mellanox Technology MT23108 PCI Bridge (rev a1) 04:00.0 InfiniBand: Mellanox Technology MT23108 InfiniHost (rev a1) (but I can try with "original" Mellanox Cougar boards as we have those as well) By the way, what is the primary development platform (architecture) for openib.org? Thanks in advance! > Do you have an external subnet manager running on something? not (at least, not yet, this is a back-to-back demo setup). So, what I did: # modprobe ib_mthca # modprobe ib_ipoib # dmesg [...] 
ib_mthca: Mellanox InfiniBand HCA driver v0.05-pre (June 13, 2004) ib_mthca: Initializing Mellanox Technology MT23108 InfiniHost (0000:04:00.0) ib_mthca 0000:04:00.0: Found bridge: Mellanox Technology MT23108 PCI Bridge (0000:03:01.0) ib_mthca 0000:04:00.0: FW version 000100180000, max_cmds 1 ib_mthca 0000:04:00.0: FW size 6143 KB (start f7a00000, end f7ffffff) ib_mthca 0000:04:00.0: HCA memory size 131071 KB (start f0000000, end f7ffffff) ib_mthca 0000:04:00.0: Max QPs: 16777216, reserved QPs: 16, entry size: 256 ib_mthca 0000:04:00.0: Max CQs: 16777216, reserved CQs: 128, entry size: 64 ib_mthca 0000:04:00.0: Max EQs: 64, reserved EQs: 1, entry size: 64 ib_mthca 0000:04:00.0: reserved MPTs: 16, reserved MTTs: 16 ib_mthca 0000:04:00.0: Max PDs: 16777216, reserved PDs: 0, reserved UARs: 1 ib_mthca 0000:04:00.0: Max QP/MCG: 16777216, reserved MGMs: 0 ib_mthca 0000:04:00.0: Flags: 003f0337 ib_mthca 0000:04:00.0: profile[ 0]--10/20 @ 0x f0000000 (size 0x 4000000) ib_mthca 0000:04:00.0: profile[ 1]-- 0/16 @ 0x f4000000 (size 0x 1000000) ib_mthca 0000:04:00.0: profile[ 2]-- 7/18 @ 0x f5000000 (size 0x 800000) ib_mthca 0000:04:00.0: profile[ 3]-- 9/17 @ 0x f5800000 (size 0x 800000) ib_mthca 0000:04:00.0: profile[ 4]-- 3/16 @ 0x f6000000 (size 0x 400000) ib_mthca 0000:04:00.0: profile[ 5]-- 4/16 @ 0x f6400000 (size 0x 200000) ib_mthca 0000:04:00.0: profile[ 6]--12/15 @ 0x f6600000 (size 0x 100000) ib_mthca 0000:04:00.0: profile[ 7]-- 8/13 @ 0x f6700000 (size 0x 80000) ib_mthca 0000:04:00.0: profile[ 8]--11/11 @ 0x f6780000 (size 0x 10000) ib_mthca 0000:04:00.0: profile[ 9]-- 6/ 5 @ 0x f6790000 (size 0x 800) ib_mthca 0000:04:00.0: HCA memory: allocated 106050 KB/124928 KB (18878 KB free) ib_mthca 0000:04:00.0: Allocated EQ 1 with 65536 entries ib_mthca 0000:04:00.0: Allocated EQ 2 with 128 entries ib_mthca 0000:04:00.0: Allocated EQ 3 with 128 entries ib_mthca 0000:04:00.0: Setting mask 000000000003c3fe for eqn 2 ib_mthca 0000:04:00.0: Setting mask 0000000000000400 for eqn 3 ib_mthca 0000:04:00.0: Registering memory at 0 (iova 0) in PD 1; shift 30, npages 1. ib_mthca 0000:04:00.0: Registering memory at 0 (iova 0) in PD 2; shift 30, npages 1. ib_mthca 0000:04:00.0: Registering memory at 0 (iova 0) in PD 3; shift 30, npages 1. # ifconfig ib0 ib0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 BROADCAST MULTICAST MTU:2044 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:128 RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) # ifconfig ib0 up 10.0.0.2/8 # ping 10.0.0.2 PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data. 
64 bytes from 10.0.0.2: icmp_seq=0 ttl=64 time=0.035 ms 64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.015 ms [etc] (did same on the other box, configuring it to 10.0.0.4, then on 10.0.0.2: ) # ping 10.0.0.4 PING 10.0.0.4 (1Unable to handle kernel NULL pointer dereference at virtual address 0000000c printing eip: f89abc03 *pde = 36a1b001 Oops: 0000 [#1] SMP Modules linked in: ib_ipoib ib_sa_client ib_client_query ib_mad ib_poll ib_mthca ib_core ib_services e100 e1000 floppy sg scsi_mod microcode CPU: 0 EIP: 0060:[] Not tainted VLI EFLAGS: 00010046 (2.6.9ib) EIP is at mthca_post_send+0x575/0x70d [ib_mthca] eax: 00000000 ebx: c03d5d90 ecx: 00000000 edx: f7d59280 esi: f7d59280 edi: c03d5d8c ebp: f654e010 esp: c03d5cd0 ds: 007b es: 007b ss: 0068 Process swapper (pid: 0, threadinfo=c03d4000 task=c034fac0) Stack: f7d59280 00000000 00000206 f659b080 f8c15153 f6d3d880 c03d5d54 00c15153 00000000 00000000 00000001 f6d3d880 00000000 00000000 00000286 00000000 f6981800 f8c15153 88088a89 00000000 00000000 f6981800 f6afa220 c03d5d8c Call Trace: [] ipoib_send+0x1de/0x3e8 [ib_ipoib] [] ipoib_mcast_send+0x2a/0x2e [ib_ipoib] [] ipoib_start_xmit+0x30e/0x320 [ib_ipoib] [] ipoib_hard_header+0x90/0xcf [ib_ipoib] [] arp_create+0x1e9/0x269 [] qdisc_restart+0x14a/0x1d1 [] dev_queue_xmit+0x216/0x29f [] arp_solicit+0x104/0x1d8 [] neigh_timer_handler+0x176/0x27d [] neigh_timer_handler+0x0/0x27d [] run_timer_softirq+0xc7/0x180 [] __do_softirq+0xba/0xc9 [] do_softirq+0x2d/0x2f [] smp_apic_timer_interrupt+0x90/0xfa [] default_idle+0x2a/0x2d [] default_idle+0x0/0x2d [] apic_timer_interrupt+0x1a/0x20 [] default_idle+0x0/0x2d [] default_idle+0x2a/0x2d [] cpu_idle+0x37/0x40 [] start_kernel+0x18d/0x1cb [] unknown_bootoption+0x0/0x182 Code: 44 24 34 75 10 83 c5 10 c7 44 24 28 02 00 00 00 e9 b4 fb ff ff 8b 54 24 6c 8b 44 24 70 89 10 e9 31 fe ff ff 8b 5c 24 6c 8b 43 20 <8b> 48 0c 89 c8 89 ca 25 00 ff 00 00 c1 e2 18 c1 e0 08 09 c2 89 0.0.0.4) 56(84) <0>Kernel panic - not syncing: Fatal exception in interrupt From krause at cup.hp.com Fri Oct 22 07:36:14 2004 From: krause at cup.hp.com (Michael Krause) Date: Fri, 22 Oct 2004 07:36:14 -0700 Subject: [openib-general] IBTA Spec Fee Reversal - FYI Message-ID: <6.1.2.0.2.20041022073315.01e3c740@esmail.cup.hp.com> FYI... The IBTA reversed the recent change in spec fee policy to its previous state, i.e. the specs are free to download to the same extent they were before. IP rights are orthogonal to free spec access and nothing has changed in this regard since the inception of the IBTA. Mike From tom.duffy at sun.com Fri Oct 22 07:45:50 2004 From: tom.duffy at sun.com (Tom Duffy) Date: Fri, 22 Oct 2004 07:45:50 -0700 Subject: [OOPS] in ipoib_mcast_attach [openib-general] In-Reply-To: <52k6tjgz5r.fsf@topspin.com> References: <200410201604.WCsI6flbVRY89a6V@topspin.com> <1098381310.2389.6.camel@duffman> <1098383103.2389.15.camel@duffman> <52k6tjgz5r.fsf@topspin.com> Message-ID: <1098456350.1999.1.camel@duffman> On Thu, 2004-10-21 at 21:22 -0700, Roland Dreier wrote: > Tom> Actually, looks like I got an oops (probably because qp (i0) > Tom> was NULL when passed to ib_modify_qp()): > > By the way, how do you trigger this oops? ifconfig the interface up? ifconfig up, then assign it an IP address. And somewhere along the way, I think I might have restarted the SM. -tduffy -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From roland at topspin.com Fri Oct 22 09:06:57 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 22 Oct 2004 09:06:57 -0700 Subject: [openib-general] Kernel 2.6.9 In-Reply-To: <20041022113524.GI21516@cern.ch> (Andras Horvath's message of "Fri, 22 Oct 2004 13:35:24 +0200") References: <52ekjuaen9.fsf@topspin.com> <20041021084412.GL21516@cern.ch> <1098381416.2389.9.camel@duffman> <20041022113524.GI21516@cern.ch> Message-ID: <52d5zahh3y.fsf@topspin.com> Andras> VAPI and/or IPoIB. Unfortunately there are no user space verbs right now (work should be starting soon). When we do implement the verbs the API will most likely be closer to the current kernel API than to VAPI. Andras> In short, the kernel crashed after trying to send the Andras> first packet on ipoib (?). The test setup is two dual Xeon Andras> i386 boxes, 2.6.9 plus Roland's IB patches (SMP kernel). Andras> Kernel messages as below - please let me know if I did Andras> something wrong :) or how I can aid the debug process. I think I understand this crash and know how to get rid of it. However, IPoIB probably still will not work for you because of the multicast member record issue with your SM. That will be fixed once we cut over to the new MAD code, which should happen next week (assuming that the new MAD module work is completed). Thanks, Roland From halr at voltaire.com Fri Oct 22 11:18:11 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 22 Oct 2004 14:18:11 -0400 Subject: [openib-general] Kernel 2.6.9 In-Reply-To: <52d5zahh3y.fsf@topspin.com> References: <52ekjuaen9.fsf@topspin.com> <20041021084412.GL21516@cern.ch> <1098381416.2389.9.camel@duffman> <20041022113524.GI21516@cern.ch> <52d5zahh3y.fsf@topspin.com> Message-ID: <1098469091.22400.0.camel@hpc-1> On Fri, 2004-10-22 at 12:06, Roland Dreier wrote: > That will be fixed once > we cut over to the new MAD code, which should happen next week > (assuming that the new MAD module work is completed). What new MAD module work are you referring to here ? -- Hal From roland at topspin.com Fri Oct 22 11:15:31 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 22 Oct 2004 11:15:31 -0700 Subject: [openib-general] [PATCH] Better IPoIB multicast handling Message-ID: <528y9yhb5o.fsf@topspin.com> This patch improves how IPoIB handles multicasts. It should fix the crash that Andras saw; unfortunately I don't think it will help with Tom's crash (although I don't understand that crash so it might fix it). Unfortunately it still probably doesn't work with some SMs. Also, with this patch, multicast seems to work (tested only with "ping -I ib0 224.0.0.1") although the way I handle multicast neighbours needs cleanup. Any test feedback is appreciated... 
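For reference, the 20-byte IPoIB hardware address layout that the hwaddr[4] == 0xff test and the P_Key fix-up rely on (summarized as an aid; the byte offsets follow the IPoIB drafts and are not spelled out in the patch itself):

	/*
	 * hwaddr[0..3]   QPN, network byte order
	 * hwaddr[4..19]  GID
	 *
	 * A multicast GID begins with 0xff, hence the hwaddr[4] == 0xff
	 * check, and IPoIB multicast GIDs carry the P_Key in GID bytes
	 * 4..5, i.e. hwaddr[8..9] -- which is what the patch fills in
	 * before handing the packet to ipoib_mcast_send().
	 */
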
Thanks, Roland Index: infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- infiniband/ulp/ipoib/ipoib_main.c (revision 952) +++ infiniband/ulp/ipoib/ipoib_main.c (working copy) @@ -435,6 +435,16 @@ struct ipoib_path *path; if (skb->dst && skb->dst->neighbour) { + if (unlikely(skb->dst->neighbour->ha[4] == 0xff)) { + /* Add in the P_Key */ + skb->dst->neighbour->ha[8] = (priv->pkey >> 8) & 0xff; + skb->dst->neighbour->ha[9] = priv->pkey & 0xff; + ipoib_mcast_send(dev, + (union ib_gid *) (skb->dst->neighbour->ha + 4), + skb); + return 0; + } + if (unlikely(!IPOIB_PATH(skb->dst->neighbour))) return path_rec_start(skb, dev); @@ -455,26 +465,13 @@ skb_pull(skb, sizeof *phdr); if (phdr->hwaddr[4] == 0xff) { - /* multicast/broadcast GID */ - if (!memcmp(phdr->hwaddr, dev->broadcast, IPOIB_HW_ADDR_LEN)) - ipoib_mcast_send(dev, priv->broadcast, skb); - else { - ipoib_dbg(priv, "Dropping (no %s): type %04x, QPN %06x " - IPOIB_GID_FMT "\n", - skb->dst ? "neigh" : "dst", - be16_to_cpup((u16 *) skb->data), - be32_to_cpup((u32 *) phdr->hwaddr), - phdr->hwaddr[ 4], phdr->hwaddr[ 5], - phdr->hwaddr[ 6], phdr->hwaddr[ 7], - phdr->hwaddr[ 8], phdr->hwaddr[ 9], - phdr->hwaddr[10], phdr->hwaddr[11], - phdr->hwaddr[12], phdr->hwaddr[13], - phdr->hwaddr[14], phdr->hwaddr[15], - phdr->hwaddr[16], phdr->hwaddr[17], - phdr->hwaddr[18], phdr->hwaddr[19]); - goto err; - } - } else { + /* Add in the P_Key */ + phdr->hwaddr[8] = (priv->pkey >> 8) & 0xff; + phdr->hwaddr[9] = priv->pkey & 0xff; + + ipoib_mcast_send(dev, (union ib_gid *) (phdr->hwaddr + 4), skb); + } + else { /* unicast GID -- ARP reply?? */ /* Index: infiniband/ulp/ipoib/ipoib.h =================================================================== --- infiniband/ulp/ipoib/ipoib.h (revision 1031) +++ infiniband/ulp/ipoib/ipoib.h (working copy) @@ -57,6 +57,7 @@ IPOIB_NUM_WC = 4, IPOIB_MAX_PATH_REC_QUEUE = 3, + IPOIB_MAX_MCAST_QUEUE = 3, IPOIB_FLAG_TX_FULL = 0, IPOIB_FLAG_OPER_UP = 1, @@ -191,10 +192,7 @@ void ipoib_mcast_join_task(void *dev_ptr); void ipoib_mcast_put(struct ipoib_mcast *mcast); -int ipoib_mcast_lookup(struct net_device *dev, union ib_gid *mgid, - struct ipoib_mcast **mcast); -int ipoib_mcast_queue_packet(struct ipoib_mcast *mcast, struct sk_buff *skb); -void ipoib_mcast_send(struct net_device *dev, struct ipoib_mcast *mcast, +void ipoib_mcast_send(struct net_device *dev, union ib_gid *mgid, struct sk_buff *skb); void ipoib_mcast_restart_task(void *dev_ptr); Index: infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- infiniband/ulp/ipoib/ipoib_multicast.c (revision 952) +++ infiniband/ulp/ipoib/ipoib_multicast.c (working copy) @@ -618,15 +618,13 @@ } /* =============================================================== */ -/*..ipoib_mcast_lookup -- return reference to multicast */ -int ipoib_mcast_lookup(struct net_device *dev, - union ib_gid *mgid, - struct ipoib_mcast **mmcast) +/*..ipoib_mcast_send -- handle sending mcast packet */ +void ipoib_mcast_send(struct net_device *dev, union ib_gid *mgid, + struct sk_buff *skb) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_mcast *mcast; unsigned long flags; - int ret = 0; spin_lock_irqsave(&priv->lock, flags); mcast = __ipoib_mcast_find(dev, mgid); @@ -637,57 +635,41 @@ mcast = ipoib_mcast_alloc(dev, 0); if (!mcast) { - ipoib_warn(priv, "unable to allocate memory for multicast structure\n"); - ret = -ENOMEM; + ipoib_warn(priv, "unable to allocate memory for " + "multicast 
structure\n"); + dev_kfree_skb_any(skb); goto out; } set_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags); - mcast->mgid = *mgid; - __ipoib_mcast_add(dev, mcast); - list_add_tail(&mcast->list, &priv->multicast_list); - - /* Leave references for the calling application */ } - if (mcast->address_handle == NULL) { + if (!mcast->address_handle) { if (mcast->tid != TS_IB_CLIENT_QUERY_TID_INVALID) - ipoib_dbg_mcast(priv, "no address vector, but multicast join already started\n"); + ipoib_dbg_mcast(priv, "no address vector, " + "but multicast join already started\n"); else if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags)) ipoib_mcast_sendonly_join(mcast); - ret = -EAGAIN; + if (skb_queue_len(&mcast->pkt_queue) < IPOIB_MAX_MCAST_QUEUE) + skb_queue_tail(&mcast->pkt_queue, skb); + else + dev_kfree_skb_any(skb); } - *mmcast = mcast; - out: spin_unlock_irqrestore(&priv->lock, flags); - - return ret; + if (mcast) { + if (mcast->address_handle) + ipoib_send(dev, skb, mcast->address_handle, IB_MULTICAST_QPN); + ipoib_mcast_put(mcast); + } } /* =============================================================== */ -/*..ipoib_mcast_send -- send skb to multicast group */ -void ipoib_mcast_send(struct net_device *dev, struct ipoib_mcast *mcast, - struct sk_buff *skb) -{ - ipoib_send(dev, skb, mcast->address_handle, IB_MULTICAST_QPN); -} - -/* =============================================================== */ -/*..ipoib_mcast_queue_packet -- queue skb pending join */ -int ipoib_mcast_queue_packet(struct ipoib_mcast *mcast, struct sk_buff *skb) -{ - skb_queue_tail(&mcast->pkt_queue, skb); - - return 0; -} - -/* =============================================================== */ /*..ipoib_mcast_dev_flush -- flush joins and address vectors */ void ipoib_mcast_dev_flush(struct net_device *dev) { From roland at topspin.com Fri Oct 22 11:17:30 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 22 Oct 2004 11:17:30 -0700 Subject: [openib-general] Kernel 2.6.9 In-Reply-To: <1098469091.22400.0.camel@hpc-1> (Hal Rosenstock's message of "Fri, 22 Oct 2004 14:18:11 -0400") References: <52ekjuaen9.fsf@topspin.com> <20041021084412.GL21516@cern.ch> <1098381416.2389.9.camel@duffman> <20041022113524.GI21516@cern.ch> <52d5zahh3y.fsf@topspin.com> <1098469091.22400.0.camel@hpc-1> Message-ID: <524qkmhb2d.fsf@topspin.com> Hal> What new MAD module work are you referring to here ? The stuff that you and Sean are working on. I'm waiting for Sean's timeout code and replacement of the MAD thread with a workqueue to be finished and merged before I pull the code into my tree (to avoid merging hassles on my side). - Roland From halr at voltaire.com Fri Oct 22 12:14:27 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 22 Oct 2004 15:14:27 -0400 Subject: [openib-general] Kernel 2.6.9 In-Reply-To: <524qkmhb2d.fsf@topspin.com> References: <52ekjuaen9.fsf@topspin.com> <20041021084412.GL21516@cern.ch> <1098381416.2389.9.camel@duffman> <20041022113524.GI21516@cern.ch> <52d5zahh3y.fsf@topspin.com> <1098469091.22400.0.camel@hpc-1> <524qkmhb2d.fsf@topspin.com> Message-ID: <1098472467.22400.6.camel@hpc-1> On Fri, 2004-10-22 at 14:17, Roland Dreier wrote: > Hal> What new MAD module work are you referring to here ? > > The stuff that you and Sean are working on. I'm waiting for Sean's > timeout code and replacement of the MAD thread with a workqueue to be > finished and merged before I pull the code into my tree (to avoid > merging hassles on my side). 
I do not understand why the lack of the timeout code or the replacement of the MAD thread with a workqueue needs to be completed before you can integrate. I think there is sufficient functionality there now to get this up and running. The code will continue to evolve anyhow. The changes that have made and put into the openib-candidate tree make it possible to not have to pull it into your tree and track all the changes. -- Hal From halr at voltaire.com Fri Oct 22 12:16:44 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 22 Oct 2004 15:16:44 -0400 Subject: [openib-general] Kernel 2.6.9 In-Reply-To: <52d5zahh3y.fsf@topspin.com> References: <52ekjuaen9.fsf@topspin.com> <20041021084412.GL21516@cern.ch> <1098381416.2389.9.camel@duffman> <20041022113524.GI21516@cern.ch> <52d5zahh3y.fsf@topspin.com> Message-ID: <1098472604.22400.9.camel@hpc-1> On Fri, 2004-10-22 at 12:06, Roland Dreier wrote: > However, IPoIB probably still will not work for you because of the > multicast member record issue with your SM. If I recall correctly, this is an end station issue with the SA multicast request rather than an SM issue. -- Hal From halr at voltaire.com Fri Oct 22 13:15:22 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 22 Oct 2004 16:15:22 -0400 Subject: [openib-general] [PATCH] Minor cleanup and remove redundant initializations In-Reply-To: References: Message-ID: <1098476122.22400.18.camel@hpc-1> On Thu, 2004-10-21 at 17:55, Krishna Kumar wrote: > Trivial patch as described above in ib_mad.c (plus moving some variables > into localized loop). Thanks. Applied. -- Hal From halr at voltaire.com Fri Oct 22 13:25:33 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 22 Oct 2004 16:25:33 -0400 Subject: [openib-general] [PATCH] ib_verbs.h: Add response timeout WC error code (Roland's branch) Message-ID: <1098476733.22400.24.camel@hpc-1> ib_verbs.h: Add response timeout WC error code (Roland's branch) Index: ib_verbs.h =================================================================== --- ib_verbs.h (revision 1039) +++ ib_verbs.h (working copy) @@ -289,7 +289,8 @@ IB_WC_INV_EECN_ERR, IB_WC_INV_EEC_STATE_ERR, IB_WC_FATAL_ERR, - IB_WC_GENERAL_ERR + IB_WC_GENERAL_ERR, + IB_WC_RESP_TIMEOUT_ERR }; enum ib_wc_opcode { From roland at topspin.com Fri Oct 22 13:25:09 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 22 Oct 2004 13:25:09 -0700 Subject: [openib-general] [PATCH] ib_verbs.h: Add response timeout WC error code (Roland's branch) In-Reply-To: <1098476733.22400.24.camel@hpc-1> (Hal Rosenstock's message of "Fri, 22 Oct 2004 16:25:33 -0400") References: <1098476733.22400.24.camel@hpc-1> Message-ID: <52mzyefql6.fsf@topspin.com> thanks, applied. - R. From roland at topspin.com Fri Oct 22 13:26:57 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 22 Oct 2004 13:26:57 -0700 Subject: [openib-general] Kernel 2.6.9 In-Reply-To: <1098472467.22400.6.camel@hpc-1> (Hal Rosenstock's message of "Fri, 22 Oct 2004 15:14:27 -0400") References: <52ekjuaen9.fsf@topspin.com> <20041021084412.GL21516@cern.ch> <1098381416.2389.9.camel@duffman> <20041022113524.GI21516@cern.ch> <52d5zahh3y.fsf@topspin.com> <1098469091.22400.0.camel@hpc-1> <524qkmhb2d.fsf@topspin.com> <1098472467.22400.6.camel@hpc-1> Message-ID: <52is92fqi6.fsf@topspin.com> Hal> I do not understand why the lack of the timeout code or the Hal> replacement of the MAD thread with a workqueue needs to be Hal> completed before you can integrate. 
Hal> completed before you can integrate. I think there is Hal> sufficient functionality there now to get this up and Hal> running. The code will continue to evolve anyhow. Hal> The changes that have made and put into the openib-candidate Hal> tree make it possible to not have to pull it into your tree Hal> and track all the changes. My general policy is that I don't commit changes that break IPoIB on my tree. If I cut over to using new MAD code (+ new SA query API etc) without the new MAD code in my tree then my tree is broken. So I want to wait until the major work on the MAD code is done before I put it on my tree. Also my new SA code isn't quite done anyway. In any case I think I'll integrate the new MAD stuff on Monday no matter where it stands, and we can just deal with merging changes back and forth. - R. From halr at voltaire.com Fri Oct 22 13:36:34 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 22 Oct 2004 16:36:34 -0400 Subject: [openib-general] Kernel 2.6.9 In-Reply-To: <52is92fqi6.fsf@topspin.com> References: <52ekjuaen9.fsf@topspin.com> <20041021084412.GL21516@cern.ch> <1098381416.2389.9.camel@duffman> <20041022113524.GI21516@cern.ch> <52d5zahh3y.fsf@topspin.com> <1098469091.22400.0.camel@hpc-1> <524qkmhb2d.fsf@topspin.com> <1098472467.22400.6.camel@hpc-1> <52is92fqi6.fsf@topspin.com> Message-ID: <1098477394.22400.28.camel@hpc-1> On Fri, 2004-10-22 at 16:26, Roland Dreier wrote: > In any case I think I'll integrate the new MAD stuff on Monday no > matter where it stands, and we can just deal with merging changes back > and forth. Thanks. I did a first pass of integrating the timeout code and it works without the timeouts. I will work on testing with timeouts now. I will get as many changes as I can in by Monday. -- Hal From roland at topspin.com Fri Oct 22 13:32:55 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 22 Oct 2004 13:32:55 -0700 Subject: [openib-general] Handling SM class (SMInfo vs. other queries) Message-ID: <52ekjqfq88.fsf@topspin.com> One issue was raised earlier, but I forgot about it in the recent discussion about how MADs are passed to the low-level driver (which ended with adding the snoop_mad entry point). The issue is how to process SM class GETs that are handled by the SMA (nearly every attribute) vs. GETs that need to be handled by the SM (namely SMInfo). As the code stands now, the ib_smi module registers an agent for SM GETs, so there's no way for an SM to receive SMInfo GET requests. How do we want to solve this? - R. From halr at voltaire.com Fri Oct 22 13:39:58 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 22 Oct 2004 16:39:58 -0400 Subject: [openib-general] Kernel 2.6.9 In-Reply-To: <52is92fqi6.fsf@topspin.com> References: <52ekjuaen9.fsf@topspin.com> <20041021084412.GL21516@cern.ch> <1098381416.2389.9.camel@duffman> <20041022113524.GI21516@cern.ch> <52d5zahh3y.fsf@topspin.com> <1098469091.22400.0.camel@hpc-1> <524qkmhb2d.fsf@topspin.com> <1098472467.22400.6.camel@hpc-1> <52is92fqi6.fsf@topspin.com> Message-ID: <1098477598.22400.32.camel@hpc-1> On Fri, 2004-10-22 at 16:26, Roland Dreier wrote: > My general policy is that I don't commit changes that break IPoIB on > my tree. If I cut over to using new MAD code (+ new SA query API etc) > without the new MAD code in my tree then my tree is broken. So you don't plan on supporting both MAD layers for a short period of time. I wasn't sure about this. Also, this raises the question again about the "official" gen2 tree.
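
To make the SMInfo conflict Roland raises above concrete: MAD agent registration is per (management class, method), with no per-attribute filter, so whichever agent claims Get/Set for the SM class sees every attribute, SMInfo included. The fragment below is an editor's sketch modelled on the ib_agent registration code quoted later in this section (agent_send_handler, agent_recv_handler and the ib_register_mad_agent arguments come from that listing; the wrapper function name is invented), not a proposed fix:

/*
 * Sketch: how the kernel SMA side claims the whole LID-routed SM class
 * on a port.  "claim_sm_class" is a made-up name for illustration.
 */
static struct ib_mad_agent *claim_sm_class(struct ib_device *device,
                                           int port_num)
{
        struct ib_mad_reg_req reg_req;

        reg_req.mgmt_class = IB_MGMT_CLASS_SUBN_LID_ROUTED;
        reg_req.mgmt_class_version = 1;
        bitmap_zero((unsigned long *) &reg_req.method_mask,
                    IB_MGMT_MAX_METHODS);
        set_bit(IB_MGMT_METHOD_GET, (unsigned long *) &reg_req.method_mask);
        set_bit(IB_MGMT_METHOD_SET, (unsigned long *) &reg_req.method_mask);

        /*
         * This agent now receives every LID-routed SM class Get/Set on
         * the port -- including SubnGet/SubnSet(SMInfo), which a subnet
         * manager would want to answer itself -- because the method mask
         * cannot express "all attributes except SMInfo".
         */
        return ib_register_mad_agent(device, port_num, IB_QPT_SMI,
                                     &reg_req, 0,
                                     &agent_send_handler,
                                     &agent_recv_handler, NULL);
}

Whatever the answer, it has to break this tie: either the registration interface learns about SMInfo specifically, or the dispatch path consults something besides the (class, method) mask.
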
-- Hal From tduffy at sun.com Fri Oct 22 13:39:16 2004 From: tduffy at sun.com (Tom Duffy) Date: Fri, 22 Oct 2004 13:39:16 -0700 Subject: [openib-general] [PATCH] Better IPoIB multicast handling In-Reply-To: <528y9yhb5o.fsf@topspin.com> References: <528y9yhb5o.fsf@topspin.com> Message-ID: <1098477556.1127.9.camel@duffman> On Fri, 2004-10-22 at 11:15 -0700, Roland Dreier wrote: > This patch improves how IPoIB handles multicasts. It should fix the > crash that Andras saw; unfortunately I don't think it will help with > Tom's crash (although I don't understand that crash so it might fix > it). Unfortunately it still probably doesn't work with some SMs. > > Also, with this patch, multicast seems to work (tested only with > "ping -I ib0 224.0.0.1") although the way I handle multicast > neighbours needs cleanup. > > Any test feedback is appreciated... Still crashing on my sparc64. Looks like qp is NULL. And yes, port 1 is going to ACTIVE. Unable to handle kernel NULL pointer dereference tsk->{mm,active_mm}->context = 0000000000000625 tsk->{mm,active_mm}->pgd = fffff8007ce84000 \|/ ____ \|/ "@'/ .. \`@" /_| \__/ |_\ \__U_/ ts_ib_mad(8771): Oops [#1] TSTATE: 0000004480009605 TPC: 0000000002018244 TNPC: 0000000002018248 Y: 00000000 Not tainted TPC: g0: 0000000000000000 g1: 0000000081234568 g2: 0000000000000003 g3: 0000000000000000 g4: fffff8007c07a100 g5: 0000000000000008 g6: fffff8006b6f8000 g7: 0000000000000024 o0: fffff8007d551640 o1: 0000000000000020 o2: 0000000000000001 o3: fffff8006b6fb35e o4: 0000000000008001 o5: 00000000000000f0 sp: fffff8006b6fa9e1 ret_pc: fffff8007f8ad220 RPC: <0xfffff8007f8ad220> l0: 0000000000000001 l1: fffff8006c389800 l2: 0000000002173800 l3: fffff8007fe99818 l4: 0000000000022b8c l5: 00000000006cfd90 l6: 000000000000595b l7: fffff8007fe1bb50 i0: 0000000000000000 i1: fffff8007d551640 i2: 0000000000000040 i3: fffff8006b6fb360 i4: 0000000000000040 i5: fffff8007f977ae0 i6: fffff8006b6faaa1 i7: 000000000218ce10 I7: Caller[000000000218ce10]: ipoib_mcast_attach+0x70/0x120 [ib_ipoib] Caller[000000000218b008]: ipoib_mcast_join_finish+0x148/0x3a0 [ib_ipoib] Caller[000000000218b7b4]: ipoib_mcast_join_complete+0x174/0x1e0 [ib_ipoib] Caller[000000000217e5bc]: _tsIbMulticastJoinResponse+0xdc/0x2e0 [ib_sa_client] Caller[0000000002176328]: ib_client_query_callback+0x68/0xa0 [ib_client_query] Caller[0000000002176fa0]: ib_client_mad_handler+0x60/0x100 [ib_client_query] Caller[000000000216f2b8]: ib_mad_invoke_filters+0x98/0x120 [ib_mad] Caller[000000000216f744]: ib_mad_dispatch+0xe4/0x1e0 [ib_mad] Caller[000000000216fcf0]: ib_mad_work_thread+0x70/0x540 [ib_mad] Caller[00000000020088b0]: _tsKernelQueueThread+0xf0/0x160 [ib_services] Caller[0000000002008624]: _tsKernelThreadStart+0x84/0xa0 [ib_services] Caller[0000000000417430]: kernel_thread+0x30/0x60 Caller[00000000020086d4]: tsKernelThreadStart+0x94/0xe0 [ib_services] Instruction DUMP: 01000000 01000000 9de3bf40 90100018 9410001a 92100019 c258a110 9fc04000 TSTATE: 00000000f0009601 TPC: 0000000000516398 TNPC: 000000000051639c Y: 00000000 Not tainted TPC: <__bzero+0x14c/0x274> g0: fffff8007fe5ba90 g1: fffff80000000000 g2: 00000000b6db6db7 g3: 0000000000000000 g4: fffff8007fe1b3e0 g5: 0000000000000080 g6: fffff8007fe58000 g7: 000000000051 000 o0: fffff8007d7f3100 o1: 0000000000000000 o2: 0000000000000000 o3: 0000000000000f00 o4: 0000000000000080 o5: 0000000000000000 sp: fffff8007fe5b141 ret_pc: 0000000000425fc4 RPC: l0: fffff8007d7f2000 l1: 0000000000000000 l2: 0000000000000000 l3: fffff8007fe73000 l4: 0000000000002000 l5: 
fffff8007c886020 l6: fffff8006c389b90 l7: fffff8007fe37fd0 i0: 0000000000000000 i1: 0000000000000001 i2: fffff8007fe5bacc i3: 00000000c7dc0000 i4: 0000000000000000 i5: 0000000000000000 i6: fffff8007fe5b201 i7: 00000000020549f4 I7: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From halr at voltaire.com Fri Oct 22 13:45:57 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 22 Oct 2004 16:45:57 -0400 Subject: [openib-general] Re: [PATCH] timeout wq code In-Reply-To: <20041021160849.46271e5f.mshefty@ichips.intel.com> References: <20041021160849.46271e5f.mshefty@ichips.intel.com> Message-ID: <1098477957.22400.35.camel@hpc-1> On Thu, 2004-10-21 at 19:08, Sean Hefty wrote: > Code to use a work queue to time out MADs. There is one work queue per port. > (Completion handling code was not changed.) Thanks! Applied. Note that the timeouts have not been tested yet. I will send out a subsequent email when I have done that. > I'm working on creating a few simple test cases to verify > MAD functionality (registration, timeouts, sends, receives, > and RMPP in the future), but these are not yet done. Great. -- Hal From roland at topspin.com Fri Oct 22 13:55:52 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 22 Oct 2004 13:55:52 -0700 Subject: [openib-general] wr_id in solicited receive requests Message-ID: <526552fp5z.fsf@topspin.com> In ib_mad.h, we have the following comment for ib_mad_recv_wc: * For received response, the wr_id field of the wc is set to the wr_id * for the corresponding send request. However, I can't find where this is done in the code in ib_mad.c. The only thing that looks close is in ib_mad_send_done_handler(): /* Restore client wr_id in WC */ wc->wr_id = mad_send_wr->wr_id; ib_mad_complete_send_wr(mad_send_wr, (struct ib_mad_send_wc*)wc); return; but that's for send completions. By the way, why is the cast (struct ib_mad_send_wc*)wc in the call to ib_mad_complete_send_wr() valid? It looks like ib_mad_completion_handler() calls ib_mad_send_done_handler() with the address of a struct ib_wc on the stack; the layouts of struct ib_mad_send_wc and struct ib_wc don't match so I'm a little confused as to what that cast is trying to do. Thanks, Roland From roland at topspin.com Fri Oct 22 13:56:58 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 22 Oct 2004 13:56:58 -0700 Subject: [openib-general] [PATCH] Better IPoIB multicast handling In-Reply-To: <1098477556.1127.9.camel@duffman> (Tom Duffy's message of "Fri, 22 Oct 2004 13:39:16 -0700") References: <528y9yhb5o.fsf@topspin.com> <1098477556.1127.9.camel@duffman> Message-ID: <521xfqfp45.fsf@topspin.com> Tom> Still crashing on my sparc64. Looks like qp is NULL. OK, I'm sort of happy (I hate fixing bugs without understanding what the bug was in the first place). I'll keep thinking about it. Tom> And yes, port 1 is going to ACTIVE. That's good ... it means ib_get_dma_mr() does the right thing. - R. From roland at topspin.com Fri Oct 22 14:04:28 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 22 Oct 2004 14:04:28 -0700 Subject: [openib-general] [PATCH] Better IPoIB multicast handling In-Reply-To: <1098477556.1127.9.camel@duffman> (Tom Duffy's message of "Fri, 22 Oct 2004 13:39:16 -0700") References: <528y9yhb5o.fsf@topspin.com> <1098477556.1127.9.camel@duffman> Message-ID: <52wtxiea77.fsf@topspin.com> Can you try running with this debugging patch? 
(It should just crash sooner) Thanks, Roland Index: infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- infiniband/ulp/ipoib/ipoib_multicast.c (revision 1038) +++ infiniband/ulp/ipoib/ipoib_multicast.c (working copy) @@ -536,6 +536,8 @@ ipoib_dbg_mcast(priv, "starting multicast thread\n"); + BUG_ON(!priv->qp); + down(&mcast_mutex); clear_bit(IPOIB_MCAST_STOP, &priv->flags); queue_work(ipoib_workqueue, &priv->mcast_task); From halr at voltaire.com Fri Oct 22 14:11:21 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 22 Oct 2004 17:11:21 -0400 Subject: [openib-general] wr_id in solicited receive requests In-Reply-To: <526552fp5z.fsf@topspin.com> References: <526552fp5z.fsf@topspin.com> Message-ID: <1098479481.22400.58.camel@hpc-1> On Fri, 2004-10-22 at 16:55, Roland Dreier wrote: > By the way, why is the cast (struct ib_mad_send_wc*)wc in the call to > ib_mad_complete_send_wr() valid? It looks like ib_mad_completion_handler() > calls ib_mad_send_done_handler() with the address of a struct ib_wc on > the stack; the layouts of struct ib_mad_send_wc and struct ib_wc don't > match so I'm a little confused as to what that cast is trying to do. It's an optimization since the ib_mad_send_wc structure is a subset of and can be overlaid on the ib_wc structure. The ib_mad_send_wc struct is the first 3 elements of the ib_wc structure. -- Hal From halr at voltaire.com Fri Oct 22 14:21:00 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 22 Oct 2004 17:21:00 -0400 Subject: [openib-general] Handling SM class (SMInfo vs. other queries) In-Reply-To: <52ekjqfq88.fsf@topspin.com> References: <52ekjqfq88.fsf@topspin.com> Message-ID: <1098480059.22400.69.camel@hpc-1> On Fri, 2004-10-22 at 16:32, Roland Dreier wrote: > The issue is how to process SM class GETs that are handled by the SMA > (nearly every attribute) vs. GETs that need to be handled by the SM > (namely SMInfo). As the code stands now, the ib_smi module registers > an agent for SM GETs, so there's no way for an SM to receive SMInfo > GET requests. This applies to Sets of SMInfo as well as Gets. The registration interface will need to be extended somehow for this. I don't think we want to add an attribute ID mask :-) The simplest way I can think of is to add an issm bit to the registration request structure and special case these unsolicited receives to go to the SM if it has been registered and to drop otherwise. -- Hal From roland at topspin.com Fri Oct 22 14:25:34 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 22 Oct 2004 14:25:34 -0700 Subject: [openib-general] Handling SM class (SMInfo vs. other queries) In-Reply-To: <1098480059.22400.69.camel@hpc-1> (Hal Rosenstock's message of "Fri, 22 Oct 2004 17:21:00 -0400") References: <52ekjqfq88.fsf@topspin.com> <1098480059.22400.69.camel@hpc-1> Message-ID: <52mzyee981.fsf@topspin.com> Hal> This applies to Sets of SMInfo as well as Gets. True, good point. Hal> The registration interface will need to be extended somehow Hal> for this. I don't think we want to add an attribute ID mask Hal> :-) The simplest way I can think of is to add an issm bit to Hal> the registration request structure and special case these Hal> unsolicited receives to go to the SM if it has been Hal> registered and to drop otherwise. All these special cases are starting to look pretty awkward. Another solution is to get rid of the snoop_mad entry point and the SMA and PMA agents in ib_smi, and just pass every MAD to the low-level driver. 
The low-level driver can decide to handle it (consume it and/or generate a response) or not. Everything not consumed by the low-level driver gets put through agent dispatch as usual. This also avoids having to add another agent for BMA to ib_smi. Also it makes it pretty easy for low-level drivers to handle vendor-specific GSI classes (of course drivers can register an agent for every port and class they want to handle but it turns into a bit of a pain). - R. From ftillier at infiniconsys.com Fri Oct 22 14:32:07 2004 From: ftillier at infiniconsys.com (Fab Tillier) Date: Fri, 22 Oct 2004 14:32:07 -0700 Subject: [openib-general] Handling SM class (SMInfo vs. other queries) In-Reply-To: <52mzyee981.fsf@topspin.com> Message-ID: <000001c4b87e$95fa3790$655aa8c0@infiniconsys.com> > From: Roland Dreier [mailto:roland at topspin.com] > Sent: Friday, October 22, 2004 2:26 PM > > All these special cases are starting to look pretty awkward. Another > solution is to get rid of the snoop_mad entry point and the SMA and > PMA agents in ib_smi, and just pass every MAD to the low-level > driver. The low-level driver can decide to handle it (consume it > and/or generate a response) or not. Everything not consumed by the > low-level driver gets put through agent dispatch as usual. > This seems to imply that received MAD completion processing will never happen in the context of the CQ notification callback, and always require a context switch to a thread context that can block while the MADs are handed off. Am I following correctly? Do we want to have such a context switch for every received MAD? This is where having a local mad interface that is asynchronous would help quite a bit. - Fab From halr at voltaire.com Fri Oct 22 14:47:19 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 22 Oct 2004 17:47:19 -0400 Subject: [openib-general] wr_id in solicited receive requests In-Reply-To: <526552fp5z.fsf@topspin.com> References: <526552fp5z.fsf@topspin.com> Message-ID: <1098481639.22400.95.camel@hpc-1> On Fri, 2004-10-22 at 16:55, Roland Dreier wrote: > In ib_mad.h, we have the following comment for ib_mad_recv_wc: > > * For received response, the wr_id field of the wc is set to the wr_id > * for the corresponding send request. > > However, I can't find where this is done in the code in ib_mad.c. Good catch. Thanks for pointing this out. I'll work up a patch for this (it looks like a one liner from looking at the code). -- Hal From mshefty at ichips.intel.com Fri Oct 22 15:20:12 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 22 Oct 2004 15:20:12 -0700 Subject: [openib-general] Handling SM class (SMInfo vs. other queries) In-Reply-To: <52mzyee981.fsf@topspin.com> References: <52ekjqfq88.fsf@topspin.com> <1098480059.22400.69.camel@hpc-1> <52mzyee981.fsf@topspin.com> Message-ID: <20041022152012.4612097b.mshefty@ichips.intel.com> On Fri, 22 Oct 2004 14:25:34 -0700 Roland Dreier wrote: > All these special cases are starting to look pretty awkward. Another > solution is to get rid of the snoop_mad entry point and the SMA and > PMA agents in ib_smi, and just pass every MAD to the low-level > driver. The low-level driver can decide to handle it (consume it > and/or generate a response) or not. Everything not consumed by the > low-level driver gets put through agent dispatch as usual. I think I'm missing something here. I thought that the snoop_mad entry point was the solution to this issue. 
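
As a side note on the (struct ib_mad_send_wc *) cast Roland questioned a few messages up: Hal's answer means the cast is a prefix overlay -- it is only safe while ib_mad_send_wc keeps duplicating the leading members of ib_wc in the same order, so a completion read through the smaller struct never reaches past the shared prefix. The guard below is an editor's sketch, not code from the tree; it assumes both structs declare wr_id and status members and that BUILD_BUG_ON/offsetof are available in this kernel (otherwise the usual negative-array-size trick does the same job):

/*
 * Compile-time guard for the prefix-overlay assumption behind casting a
 * struct ib_wc pointer to struct ib_mad_send_wc.  The function is never
 * called; it exists only so the checks are evaluated at build time and
 * the cast cannot silently break if either layout changes.
 */
static inline void ib_mad_send_wc_layout_check(void)
{
        BUILD_BUG_ON(offsetof(struct ib_mad_send_wc, wr_id) !=
                     offsetof(struct ib_wc, wr_id));
        BUILD_BUG_ON(offsetof(struct ib_mad_send_wc, status) !=
                     offsetof(struct ib_wc, status));
        BUILD_BUG_ON(sizeof(struct ib_mad_send_wc) > sizeof(struct ib_wc));
}
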
- Sean From roland at topspin.com Fri Oct 22 15:43:18 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 22 Oct 2004 15:43:18 -0700 Subject: [openib-general] Handling SM class (SMInfo vs. other queries) In-Reply-To: <20041022152012.4612097b.mshefty@ichips.intel.com> (Sean Hefty's message of "Fri, 22 Oct 2004 15:20:12 -0700") References: <52ekjqfq88.fsf@topspin.com> <1098480059.22400.69.camel@hpc-1> <52mzyee981.fsf@topspin.com> <20041022152012.4612097b.mshefty@ichips.intel.com> Message-ID: <52is92e5mh.fsf@topspin.com> Sean> I think I'm missing something here. I thought that the Sean> snoop_mad entry point was the solution to this issue. Except as currently defined, it doesn't provide a way for the low-level driver to give back a response -- it just lets the low-level driver steal MADs like locally generated traps. There's no way to handle SMA, PMA, BMA and vendor-specific requests etc. My suggestion basically amounts to expanding snoop_mad to handle everything. But then there's no need for the process_mad entry point (snoop_mad becomes the interface). So we might as well get rid of snoop_mad since I think the process_mad name is more descriptive. - R. From roland at topspin.com Fri Oct 22 15:44:34 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 22 Oct 2004 15:44:34 -0700 Subject: [openib-general] Handling SM class (SMInfo vs. other queries) In-Reply-To: <000001c4b87e$95fa3790$655aa8c0@infiniconsys.com> (Fab Tillier's message of "Fri, 22 Oct 2004 14:32:07 -0700") References: <000001c4b87e$95fa3790$655aa8c0@infiniconsys.com> Message-ID: <52ekjqe5kd.fsf@topspin.com> Fab> This seems to imply that received MAD completion processing Fab> will never happen in the context of the CQ notification Fab> callback, and always require a context switch to a thread Fab> context that can block while the MADs are handed off. Am I Fab> following correctly? Do we want to have such a context Fab> switch for every received MAD? This design decision (to handle MAD completions in thread context) was made a while ago. However I agree that long term it would make sense to move at least the SMA processing back to interrupt/tasklet context (and switch the local MAD interface to be asynch, of course). - R. From ftillier at infiniconsys.com Fri Oct 22 15:58:21 2004 From: ftillier at infiniconsys.com (Fab Tillier) Date: Fri, 22 Oct 2004 15:58:21 -0700 Subject: [openib-general] Handling SM class (SMInfo vs. other queries) In-Reply-To: <52ekjqe5kd.fsf@topspin.com> Message-ID: <000101c4b88a$a34e6cc0$655aa8c0@infiniconsys.com> > From: Roland Dreier [mailto:roland at topspin.com] > Sent: Friday, October 22, 2004 3:45 PM > > Fab> This seems to imply that received MAD completion processing > Fab> will never happen in the context of the CQ notification > Fab> callback, and always require a context switch to a thread > Fab> context that can block while the MADs are handed off. Am I > Fab> following correctly? Do we want to have such a context > Fab> switch for every received MAD? > > This design decision (to handle MAD completions in thread context) was > made a while ago. However I agree that long term it would make sense > to move at least the SMA processing back to interrupt/tasklet context > (and switch the local MAD interface to be asynch, of course). > Ok, cool. Just wanted to make sure we're all on the same page. The course of action you describe seems perfectly sane to me. 
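
To pin down the shape of Roland's "expand process_mad to handle everything" proposal, here is a rough receive-path sketch. The process_mad call signature and the IB_MAD_RESULT_SUCCESS flag are the ones already used elsewhere in this thread; the IB_MAD_RESULT_REPLY/IB_MAD_RESULT_CONSUMED flag names, the helper functions and the port structure are placeholders invented for the illustration, not the agreed interface:

/*
 * Sketch: offer every received MAD to the low-level driver first, send
 * any response it generates, and fall back to normal (class, method)
 * agent dispatch for anything the driver does not consume.
 */
static void mad_dispatch_recv(struct ib_mad_port_private *port_priv,
                              struct ib_mad *mad, u16 slid,
                              struct ib_mad *response)
{
        int ret;

        ret = port_priv->device->process_mad(port_priv->device, 0,
                                             port_priv->port_num, slid,
                                             mad, response);

        if (ret & IB_MAD_RESULT_SUCCESS) {
                /* e.g. SMA/PMA attributes handled inside the driver */
                if (ret & IB_MAD_RESULT_REPLY)
                        queue_reply(port_priv, response);
                if (ret & IB_MAD_RESULT_CONSUMED)
                        return;
        }

        /* Not consumed (e.g. SMInfo when an SM is registered):
         * registered agents see it as usual. */
        dispatch_to_agents(port_priv, mad);
}

Whether this runs in the completion thread or, as Roland suggests longer term, from interrupt/tasklet context with an asynchronous local-MAD interface, the consult-the-driver-first ordering is the same.
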
- Fab From troy at scl.ameslab.gov Fri Oct 22 16:59:43 2004 From: troy at scl.ameslab.gov (Troy Benjegerdes) Date: Fri, 22 Oct 2004 18:59:43 -0500 Subject: [openib-general] InfiniBand incompatible with the Linux kernel? In-Reply-To: <52oejbliuk.fsf@topspin.com> References: <20041008202247.GA9653@kroah.com> <528yagn63x.fsf@topspin.com> <20041009115028.GA14571@electric-eye.fr.zoreil.com> <52oejbliuk.fsf@topspin.com> Message-ID: <41799EEF.8060902@scl.ameslab.gov> Well, fortunately this has turned out to be a non-issue. I just went to www.infinibandta.org and the 1.2 spec is available for download. http://www.infinibandta.org/specs/register/publicspec/vol1r1_2.zip http://www.infinibandta.org/specs/register/publicspec/vol2r1_2.zip Roland Dreier wrote: > Roland> it's orthogonal to any IP issues. Since the Linux kernel > Roland> contains a lot of code written to specs available only > Roland> under NDA (and even reverse-engineered code where specs > Roland> are completely unavailable), I don't think the expense > Roland> should be an issue. > > Francois> One can say good bye to peer review. > >Yes and no. Certainly people without specs can't review spec >compliance, but review for coding style, locking bugs, etc. is if >anything more valuable. > >Thanks, > Roland >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From mst at mellanox.co.il Sat Oct 23 14:02:56 2004 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sat, 23 Oct 2004 23:02:56 +0200 Subject: [openib-general] Handling SM class (SMInfo vs. other queries) In-Reply-To: <52ekjqe5kd.fsf@topspin.com> References: <000001c4b87e$95fa3790$655aa8c0@infiniconsys.com> <52ekjqe5kd.fsf@topspin.com> Message-ID: <20041023210256.GA8995@mellanox.co.il> Hello! Quoting r. Roland Dreier (roland at topspin.com) "Re: [openib-general] Handling SM class (SMInfo vs. other queries)": > Fab> This seems to imply that received MAD completion processing > Fab> will never happen in the context of the CQ notification > Fab> callback, and always require a context switch to a thread > Fab> context that can block while the MADs are handed off. Am I > Fab> following correctly? Do we want to have such a context > Fab> switch for every received MAD? > > This design decision (to handle MAD completions in thread context) was > made a while ago. However I agree that long term it would make sense > to move at least the SMA processing back to interrupt/tasklet context > (and switch the local MAD interface to be asynch, of course). > > - R. I think the difficulty with the last one is that (at least for Tavor) process local mad can block, since there is a limited number of outstanding commands. Of course you could always make it non-blocking by dropping the MAD if the command interface is busy ... no idea if this will improve or decease performance, though. MST From roland at topspin.com Sat Oct 23 15:50:55 2004 From: roland at topspin.com (Roland Dreier) Date: Sat, 23 Oct 2004 15:50:55 -0700 Subject: [openib-general] Handling SM class (SMInfo vs. other queries) In-Reply-To: <20041023210256.GA8995@mellanox.co.il> (Michael S. 
Tsirkin's message of "Sat, 23 Oct 2004 23:02:56 +0200") References: <000001c4b87e$95fa3790$655aa8c0@infiniconsys.com> <52ekjqe5kd.fsf@topspin.com> <20041023210256.GA8995@mellanox.co.il> Message-ID: <52acuddp68.fsf@topspin.com> Michael> I think the difficulty with the last one is that (at Michael> least for Tavor) process local mad can block, since there Michael> is a limited number of outstanding commands. Of course Michael> you could always make it non-blocking by dropping the MAD Michael> if the command interface is busy ... no idea if this will Michael> improve or decease performance, though. Not sure if I understand what the issue is... if the interface is asynchronous, if there is not room for another outstanding command, it seems one can just queue the command until a slot becomes open. - Roland From mst at mellanox.co.il Sun Oct 24 00:59:27 2004 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 24 Oct 2004 09:59:27 +0200 Subject: [openib-general] Handling SM class (SMInfo vs. other queries) In-Reply-To: <52acuddp68.fsf@topspin.com> References: <000001c4b87e$95fa3790$655aa8c0@infiniconsys.com> <52ekjqe5kd.fsf@topspin.com> <20041023210256.GA8995@mellanox.co.il> <52acuddp68.fsf@topspin.com> Message-ID: <20041024075927.GA24740@mellanox.co.il> Hello! Quoting r. Roland Dreier (roland at topspin.com) "Re: [openib-general] Handling SM class (SMInfo vs. other queries)": > Michael> I think the difficulty with the last one is that (at > Michael> least for Tavor) process local mad can block, since there > Michael> is a limited number of outstanding commands. Of course > Michael> you could always make it non-blocking by dropping the MAD > Michael> if the command interface is busy ... no idea if this will > Michael> improve or decease performance, though. > > Not sure if I understand what the issue is... if the interface is > asynchronous, if there is not room for another outstanding command, > it seems one can just queue the command until a slot becomes open. > > - Roland I was thinking in terms of gen1 code I guess. What I was trying to say I think is that you'll need some kind of limit to avoid exhausing kernel memory if there are lots of MADs. MST From halr at voltaire.com Sun Oct 24 10:38:01 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Sun, 24 Oct 2004 13:38:01 -0400 Subject: [openib-general] [PATCH] ib_mad: In ib_mad_complete_recv, decrement agent refcount when not fully reassembled and when no request found Message-ID: <1098639481.3512.6.camel@hpc-1> ib_mad: In ib_mad_complete_recv, decrement agent reference count when receive is not fully reassembled, and also when solicited and no matching request is found. This allows deregistration to complete rather than waiting for an event which never occurs. 
Index: ib_mad.c =================================================================== --- ib_mad.c (revision 1044) +++ ib_mad.c (working copy) @@ -873,8 +873,10 @@ /* Fully reassemble receive before processing */ recv = reassemble_recv(mad_agent_priv, recv); - if (!recv) + if (!recv) { + atomic_dec(&mad_agent_priv->refcount); return; + } /* Complete corresponding request */ if (solicited) { @@ -884,6 +886,7 @@ if (!mad_send_wr) { spin_unlock_irqrestore(&mad_agent_priv->lock, flags); ib_free_recv_mad(&recv->header.recv_wc); + atomic_dec(&mad_agent_priv->refcount); return; } /* Timeout = 0 means that we won't wait for a response */ From halr at voltaire.com Sun Oct 24 10:40:28 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Sun, 24 Oct 2004 13:40:28 -0400 Subject: [openib-general] ib_verbs.h (Roland's branch): IB_WC_FATAL_ERR Message-ID: <1098639628.3512.10.camel@hpc-1> Hi, Is there a need for IB_WC_FATAL_ERR value of ib_wc_status enum ? It appears to be unused. If so, can it be removed from ib_verbs.h (Roland's branch) ? Thanks. -- Hal From halr at voltaire.com Sun Oct 24 11:35:31 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Sun, 24 Oct 2004 14:35:31 -0400 Subject: [openib-general] MAD timeout code Message-ID: <1098642930.6252.4.camel@hpc-1> Hi, I have now completed 4 test cases with the MAD timeout code and it appears to be working. -- Hal From halr at voltaire.com Sun Oct 24 11:45:40 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Sun, 24 Oct 2004 14:45:40 -0400 Subject: [openib-general] [PATCH] ib_smi: Changes to make ib_sma into ib_agt module since this supports PMA as well as SMA currently Message-ID: <1098643539.6252.16.camel@hpc-1> ib_smi: Changes to make ib_sma into ib_agt module since this supports PMA as well as SMA currently Index: ib_smi.c =================================================================== --- ib_smi.c (revision 1037) +++ ib_smi.c (working copy) @@ -1,865 +0,0 @@ -/* - This software is available to you under a choice of one of two - licenses. You may choose to be licensed under the terms of the GNU - General Public License (GPL) Version 2, available at - , or the OpenIB.org BSD - license, available in the LICENSE.TXT file accompanying this - software. These details are also available at - . - - THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - SOFTWARE. - - Copyright (c) 2004 Mellanox Technologies Ltd. All rights reserved. - Copyright (c) 2004 Infinicon Corporation. All rights reserved. - Copyright (c) 2004 Intel Corporation. All rights reserved. - Copyright (c) 2004 Topspin Corporation. All rights reserved. - Copyright (c) 2004 Voltaire Corporation. All rights reserved. -*/ - -#include -#include "ib_smi_priv.h" -#include "ib_mad_priv.h" - - -MODULE_LICENSE("Dual BSD/GPL"); -MODULE_DESCRIPTION("kernel IB agents (SMA and PMA)"); -MODULE_AUTHOR("Sean Hefty"); -MODULE_AUTHOR("Hal Rosenstock"); - - -static spinlock_t ib_agent_port_list_lock = SPIN_LOCK_UNLOCKED; -static struct list_head ib_agent_port_list; - -/* - * Fixup a directed route SMP for sending. Return 0 if the SMP should be - * discarded. 
- */ -static int smi_handle_dr_smp_send(struct ib_smp *smp, - u8 node_type, - int port_num) -{ - u8 hop_ptr, hop_cnt; - - hop_ptr = smp->hop_ptr; - hop_cnt = smp->hop_cnt; - - /* See section 14.2.2.2, Vol 1 IB spec */ - if (!ib_get_smp_direction(smp)) { - /* C14-9:1 */ - if (hop_cnt && hop_ptr == 0) { - smp->hop_ptr++; - return (smp->initial_path[smp->hop_ptr] == - port_num); - } - - /* C14-9:2 */ - if (hop_ptr && hop_ptr < hop_cnt) { - if (node_type != IB_NODE_SWITCH) - return 0; - - /* smp->return_path set when received */ - smp->hop_ptr++; - return (smp->initial_path[smp->hop_ptr] == - port_num); - } - - /* C14-9:3 -- We're at the end of the DR segment of path */ - if (hop_ptr == hop_cnt) { - /* smp->return_path set when received */ - smp->hop_ptr++; - return (node_type == IB_NODE_SWITCH || - smp->dr_dlid == IB_LID_PERMISSIVE); - } - - /* C14-9:4 -- hop_ptr = hop_cnt + 1 -> give to SMA/SM. */ - /* C14-9:5 -- Fail unreasonable hop pointer. */ - return (hop_ptr == hop_cnt + 1); - - } else { - /* C14-13:1 */ - if (hop_cnt && hop_ptr == hop_cnt + 1) { - smp->hop_ptr--; - return (smp->return_path[smp->hop_ptr] == - port_num); - } - - /* C14-13:2 */ - if (2 <= hop_ptr && hop_ptr <= hop_cnt) { - if (node_type != IB_NODE_SWITCH) - return 0; - - smp->hop_ptr--; - return (smp->return_path[smp->hop_ptr] == - port_num); - } - - /* C14-13:3 -- at the end of the DR segment of path */ - if (hop_ptr == 1) { - smp->hop_ptr--; - /* C14-13:3 -- SMPs destined for SM shouldn't be here */ - return (node_type == IB_NODE_SWITCH || - smp->dr_slid == IB_LID_PERMISSIVE); - } - - /* C14-13:4 -- hop_ptr = 0 -> should have gone to SM. */ - /* C14-13:5 -- Check for unreasonable hop pointer. */ - return 0; - } -} - -/* - * Sender side handling of outgoing SMPs. Fixup the SMP as required by - * the spec. Return 0 if the SMP should be dropped. - */ -static int smi_handle_smp_send(struct ib_smp *smp, - u8 node_type, - int port_num) -{ - switch (smp->mgmt_class) - { - case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE: - return smi_handle_dr_smp_send(smp, node_type, port_num); - default: /* LR SM class */ - return 1; - } -} - -/* - * Return 1 if the SMP should be handled by the local SMA via process_mad. - */ -static inline int smi_check_local_smp(struct ib_mad_agent *mad_agent, - struct ib_smp *smp) -{ - /* C14-9:3 -- We're at the end of the DR segment of path */ - /* C14-9:4 -- Hop Pointer = Hop Count + 1 -> give to SMA/SM. */ - return ((smp->mgmt_class != IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) || - (mad_agent->device->process_mad && - !ib_get_smp_direction(smp) && - (smp->hop_ptr == smp->hop_cnt + 1))); -} - -/* - * Adjust information for a received SMP. Return 0 if the SMP should be - * dropped. 
- */ -static int smi_handle_dr_smp_recv(struct ib_smp *smp, - u8 node_type, - int port_num, - int phys_port_cnt) -{ - u8 hop_ptr, hop_cnt; - - hop_ptr = smp->hop_ptr; - hop_cnt = smp->hop_cnt; - - /* See section 14.2.2.2, Vol 1 IB spec */ - if (!ib_get_smp_direction(smp)) { - /* C14-9:1 -- sender should have incremented hop_ptr */ - if (hop_cnt && hop_ptr == 0) - return 0; - - /* C14-9:2 -- intermediate hop */ - if (hop_ptr && hop_ptr < hop_cnt) { - if (node_type != IB_NODE_SWITCH) - return 0; - - smp->return_path[hop_ptr] = port_num; - /* smp->hop_ptr updated when sending */ - return (smp->initial_path[hop_ptr+1] <= phys_port_cnt); - } - - /* C14-9:3 -- We're at the end of the DR segment of path */ - if (hop_ptr == hop_cnt) { - if (hop_cnt) - smp->return_path[hop_ptr] = port_num; - /* smp->hop_ptr updated when sending */ - - return (node_type == IB_NODE_SWITCH || - smp->dr_dlid == IB_LID_PERMISSIVE); - } - - /* C14-9:4 -- hop_ptr = hop_cnt + 1 -> give to SMA/SM. */ - /* C14-9:5 -- fail unreasonable hop pointer. */ - return (hop_ptr == hop_cnt + 1); - - } else { - - /* C14-13:1 */ - if (hop_cnt && hop_ptr == hop_cnt + 1) { - smp->hop_ptr--; - return (smp->return_path[smp->hop_ptr] == - port_num); - } - - /* C14-13:2 */ - if (2 <= hop_ptr && hop_ptr <= hop_cnt) { - if (node_type != IB_NODE_SWITCH) - return 0; - - /* smp->hop_ptr updated when sending */ - return (smp->return_path[hop_ptr-1] <= phys_port_cnt); - } - - /* C14-13:3 -- We're at the end of the DR segment of path */ - if (hop_ptr == 1) { - if (smp->dr_slid == IB_LID_PERMISSIVE) { - /* giving SMP to SM - update hop_ptr */ - smp->hop_ptr--; - return 1; - } - /* smp->hop_ptr updated when sending */ - return (node_type == IB_NODE_SWITCH); - } - - /* C14-13:4 -- hop_ptr = 0 -> give to SM. */ - /* C14-13:5 -- Check for unreasonable hop pointer. */ - return (hop_ptr == 0); - } -} - -/* - * Receive side handling SMPs. Save receive information as required by - * the spec. Return 0 if the SMP should be dropped. - */ -static int smi_handle_smp_recv(struct ib_smp *smp, - u8 node_type, - int port_num, - int phys_port_cnt) -{ - switch (smp->mgmt_class) - { - case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE: - return smi_handle_dr_smp_recv(smp, node_type, - port_num, phys_port_cnt); - default: /* LR SM class */ - return 1; - } -} - -/* - * Return 1 if the received DR SMP should be forwarded to the send queue. - * Return 0 if the SMP should be completed up the stack. - */ -static int smi_check_forward_dr_smp(struct ib_smp *smp) -{ - u8 hop_ptr, hop_cnt; - - hop_ptr = smp->hop_ptr; - hop_cnt = smp->hop_cnt; - - if (!ib_get_smp_direction(smp)) { - /* C14-9:2 -- intermediate hop */ - if (hop_ptr && hop_ptr < hop_cnt) - return 1; - - /* C14-9:3 -- at the end of the DR segment of path */ - if (hop_ptr == hop_cnt) - return (smp->dr_dlid == IB_LID_PERMISSIVE); - - /* C14-9:4 -- hop_ptr = hop_cnt + 1 -> give to SMA/SM. */ - if (hop_ptr == hop_cnt + 1) - return 1; - } else { - /* C14-13:2 */ - if (2 <= hop_ptr && hop_ptr <= hop_cnt) - return 1; - - /* C14-13:3 -- at the end of the DR segment of path */ - if (hop_ptr == 1) - return (smp->dr_slid != IB_LID_PERMISSIVE); - } - return 0; -} - -/* - * Return 1 if the received SMP should be forwarded to the send queue. - * Return 0 if the SMP should be completed up the stack. 
- */ -static int smi_check_forward_smp(struct ib_smp *smp) -{ - switch (smp->mgmt_class) - { - case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE: - return smi_check_forward_dr_smp(smp); - default: /* LR SM class */ - return 1; - } -} - -static int mad_process_local(struct ib_mad_agent *mad_agent, - struct ib_mad *mad, - struct ib_mad *mad_response, - u16 slid) -{ - return mad_agent->device->process_mad(mad_agent->device, 0, - mad_agent->port_num, - slid, mad, mad_response); -} - -void agent_mad_send(struct ib_mad_agent *mad_agent, - struct ib_mad *mad, - struct ib_grh *grh, - struct ib_mad_recv_wc *mad_recv_wc) -{ - struct ib_agent_port_private *entry, *port_priv = NULL; - struct ib_agent_send_wr *agent_send_wr; - struct ib_sge gather_list; - struct ib_send_wr send_wr; - struct ib_send_wr *bad_send_wr; - struct ib_ah_attr ah_attr; - struct ib_ah *ah; - unsigned long flags; - - /* Find matching MAD agent */ - spin_lock_irqsave(&ib_agent_port_list_lock, flags); - list_for_each_entry(entry, &ib_agent_port_list, port_list) { - if ((entry->dr_smp_agent == mad_agent) || - (entry->lr_smp_agent == mad_agent) || - (entry->perf_mgmt_agent == mad_agent)) { - port_priv = entry; - break; - } - } - spin_unlock_irqrestore(&ib_agent_port_list_lock, flags); - if (!port_priv) { - printk(KERN_ERR SPFX "agent_mad_send: no matching MAD agent 0x%x\n", - (unsigned int)mad_agent); - return; - } - - agent_send_wr = kmalloc(sizeof(*agent_send_wr), GFP_KERNEL); - if (!agent_send_wr) - return; - agent_send_wr->mad = mad; - - /* PCI mapping */ - gather_list.addr = pci_map_single(mad_agent->device->dma_device, - mad, - sizeof(struct ib_mad), - PCI_DMA_TODEVICE); - gather_list.length = sizeof(struct ib_mad); - gather_list.lkey = (*port_priv->mr).lkey; - - send_wr.next = NULL; - send_wr.opcode = IB_WR_SEND; - send_wr.sg_list = &gather_list; - send_wr.num_sge = 1; - send_wr.wr.ud.remote_qpn = mad_recv_wc->wc->src_qp; /* DQPN */ - send_wr.wr.ud.timeout_ms = 0; - send_wr.send_flags = IB_SEND_SIGNALED | IB_SEND_SOLICITED; - - ah_attr.dlid = mad_recv_wc->wc->slid; - ah_attr.port_num = mad_agent->port_num; - ah_attr.src_path_bits = mad_recv_wc->wc->dlid_path_bits; - ah_attr.sl = mad_recv_wc->wc->sl; - ah_attr.static_rate = 0; - if (mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_PERF_MGMT) { - if (mad_recv_wc->wc->wc_flags & IB_WC_GRH) { - ah_attr.ah_flags = IB_AH_GRH; - ah_attr.grh.sgid_index = 0; /* Should sgid be looked up -? 
*/ - ah_attr.grh.hop_limit = grh->hop_limit; - ah_attr.grh.flow_label = be32_to_cpup(&grh->version_tclass_flow) & 0xfffff; - ah_attr.grh.traffic_class = (be32_to_cpup(&grh->version_tclass_flow) >> 20) & 0xff; - memcpy(ah_attr.grh.dgid.raw, grh->sgid.raw, sizeof(struct ib_grh)); - } else { - ah_attr.ah_flags = 0; /* No GRH */ - } - } else { - /* Directed route or LID routed SM class */ - ah_attr.ah_flags = 0; /* No GRH */ - } - - ah = ib_create_ah(mad_agent->qp->pd, &ah_attr); - if (IS_ERR(ah)) { - printk(KERN_ERR SPFX "No memory for address handle\n"); - kfree(mad); - return; - } - - send_wr.wr.ud.ah = ah; - if (mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_PERF_MGMT) { - send_wr.wr.ud.pkey_index = mad_recv_wc->wc->pkey_index; - send_wr.wr.ud.remote_qkey = IB_QP1_QKEY; - } else { - send_wr.wr.ud.pkey_index = 0; /* Should only matter for GMPs */ - send_wr.wr.ud.remote_qkey = 0; /* for SMPs */ - } - send_wr.wr.ud.mad_hdr = (struct ib_mad_hdr *)mad; - send_wr.wr_id = ++port_priv->wr_id; - - pci_unmap_addr_set(mad, agent_send_wr->mapping, gather_list.addr); - - /* Send */ - spin_lock_irqsave(&port_priv->send_list_lock, flags); - if (ib_post_send_mad(mad_agent, &send_wr, &bad_send_wr)) { - pci_unmap_single(mad_agent->device->dma_device, - pci_unmap_addr(mad, agent_send_wr->mapping), - sizeof(struct ib_mad), - PCI_DMA_TODEVICE); - } else { - list_add_tail(&agent_send_wr->send_list, - &port_priv->send_posted_list); - } - spin_unlock_irqrestore(&port_priv->send_list_lock, flags); - ib_destroy_ah(ah); -} - -int smi_send_smp(struct ib_mad_agent *mad_agent, - struct ib_smp *smp, - struct ib_mad_recv_wc *mad_recv_wc, - u16 slid, - int phys_port_cnt) -{ - struct ib_mad *smp_response; - int ret; - - if (!smi_handle_smp_send(smp, mad_agent->device->node_type, - mad_agent->port_num)) { - /* SMI failed send */ - return 0; - } - - if (smi_check_local_smp(mad_agent, smp)) { - smp_response = kmalloc(sizeof(struct ib_mad), GFP_KERNEL); - if (!smp_response) - return 0; - - ret = mad_process_local(mad_agent, (struct ib_mad *)smp, - smp_response, slid); - if (ret & IB_MAD_RESULT_SUCCESS) { - if (!smi_handle_smp_recv((struct ib_smp *)smp_response, - mad_agent->device->node_type, - mad_agent->port_num, - phys_port_cnt)) { - /* SMI failed receive */ - kfree(smp_response); - return 0; - } - agent_mad_send(mad_agent, smp_response, - NULL, mad_recv_wc); - } else - kfree(smp_response); - return 1; - } - - /* Post the send on the QP */ - return 1; -} - -int agent_mad_response(struct ib_mad_agent *mad_agent, - struct ib_mad *mad, - struct ib_mad_recv_wc *mad_recv_wc, - u16 slid) -{ - struct ib_mad *response; - struct ib_grh *grh; - int ret; - - response = kmalloc(sizeof(struct ib_mad), GFP_KERNEL); - if (!response) - return 0; - - ret = mad_process_local(mad_agent, mad, response, slid); - if (ret & IB_MAD_RESULT_SUCCESS) { - grh = (void *)mad - sizeof(struct ib_grh); - agent_mad_send(mad_agent, response, grh, mad_recv_wc); - } else - kfree(response); - return 1; -} - -int agent_recv_mad(struct ib_mad_agent *mad_agent, - struct ib_mad *mad, - struct ib_mad_recv_wc *mad_recv_wc, - int phys_port_cnt) -{ - int port_num; - - /* SM Directed Route or LID Routed class */ - if (mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE || - mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED) { - if (mad_agent->device->node_type != IB_NODE_SWITCH) - port_num = mad_agent->port_num; - else - port_num = mad_recv_wc->wc->port_num; - if (!smi_handle_smp_recv((struct ib_smp *)mad, - mad_agent->device->node_type, - port_num, 
phys_port_cnt)) { - /* SMI failed receive */ - return 0; - } - - if (smi_check_forward_smp((struct ib_smp *)mad)) { - smi_send_smp(mad_agent, - (struct ib_smp *)mad, - mad_recv_wc, - mad_recv_wc->wc->slid, - phys_port_cnt); - return 0; - } - - } else { - /* PerfMgmt class */ - if (mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_PERF_MGMT) { - agent_mad_response(mad_agent, mad, mad_recv_wc, - mad_recv_wc->wc->slid); - } else { - printk(KERN_ERR "agent_recv_mad: Unexpected mgmt class 0x%x received\n", mad->mad_hdr.mgmt_class); - } - return 0; - } - - /* Complete receive up stack */ - return 1; -} - -static void agent_send_handler(struct ib_mad_agent *mad_agent, - struct ib_mad_send_wc *mad_send_wc) -{ - struct ib_agent_port_private *entry, *port_priv = NULL; - struct ib_agent_send_wr *agent_send_wr; - struct list_head *send_wr; - unsigned long flags; - - /* Find matching MAD agent */ - spin_lock_irqsave(&ib_agent_port_list_lock, flags); - list_for_each_entry(entry, &ib_agent_port_list, port_list) { - if ((entry->dr_smp_agent == mad_agent) || - (entry->lr_smp_agent == mad_agent) || - (entry->perf_mgmt_agent == mad_agent)) { - port_priv = entry; - break; - } - } - spin_unlock_irqrestore(&ib_agent_port_list_lock, flags); - if (!port_priv) { - printk(KERN_ERR SPFX "agent_send_handler: no matching MAD agent " - "0x%x\n", (unsigned int)mad_agent); - return; - } - - /* Completion corresponds to first entry on posted MAD send list */ - spin_lock_irqsave(&port_priv->send_list_lock, flags); - if (list_empty(&port_priv->send_posted_list)) { - spin_unlock_irqrestore(&port_priv->send_list_lock, flags); - printk(KERN_ERR SPFX "Send completion WR ID 0x%Lx but send list " - "is empty\n", mad_send_wc->wr_id); - return; - } - - agent_send_wr = list_entry(&port_priv->send_posted_list, - struct ib_agent_send_wr, - send_list); - send_wr = agent_send_wr->send_list.next; - agent_send_wr = container_of(send_wr, struct ib_agent_send_wr, - send_list); - - /* Remove from posted send SMP list */ - list_del(&agent_send_wr->send_list); - spin_unlock_irqrestore(&port_priv->send_list_lock, flags); - - /* Unmap PCI */ - pci_unmap_single(mad_agent->device->dma_device, - pci_unmap_addr(agent_send_wr->smp, - agent_send_wr->mapping), - sizeof(struct ib_mad), - PCI_DMA_TODEVICE); - - /* Release allocated memory */ - kfree(agent_send_wr->mad); -} - -static void agent_recv_handler(struct ib_mad_agent *mad_agent, - struct ib_mad_recv_wc *mad_recv_wc) -{ - struct ib_agent_port_private *entry, *port_priv = NULL; - unsigned long flags; - - /* Find matching MAD agent */ - spin_lock_irqsave(&ib_agent_port_list_lock, flags); - list_for_each_entry(entry, &ib_agent_port_list, port_list) { - if ((entry->dr_smp_agent == mad_agent) || - (entry->lr_smp_agent == mad_agent) || - (entry->perf_mgmt_agent == mad_agent)) { - port_priv = entry; - break; - } - } - spin_unlock_irqrestore(&ib_agent_port_list_lock, flags); - if (!port_priv) { - printk(KERN_ERR SPFX "agent_recv_handler: no matching MAD agent 0x%x\n", - (unsigned int)mad_agent); - - } else { - agent_recv_mad(mad_agent, - mad_recv_wc->recv_buf->mad, - mad_recv_wc, port_priv->phys_port_cnt); - } - - /* Free received MAD */ - ib_free_recv_mad(mad_recv_wc); -} - -static int ib_agent_port_open(struct ib_device *device, int port_num, - int phys_port_cnt) -{ - int ret; - u64 iova = 0; - struct ib_phys_buf buf_list = { - .addr = 0, - .size = (unsigned long) high_memory - PAGE_OFFSET - }; - struct ib_agent_port_private *entry, *port_priv = NULL; - struct ib_mad_reg_req reg_req; - unsigned long flags; - - 
/* First, check if port already open for SMI */ - spin_lock_irqsave(&ib_agent_port_list_lock, flags); - list_for_each_entry(entry, &ib_agent_port_list, port_list) { - if (entry->dr_smp_agent->device == device && - entry->port_num == port_num) { - port_priv = entry; - break; - } - } - spin_unlock_irqrestore(&ib_agent_port_list_lock, flags); - if (port_priv) { - printk(KERN_DEBUG SPFX "%s port %d already open\n", - device->name, port_num); - return 0; - } - - /* Create new device info */ - port_priv = kmalloc(sizeof *port_priv, GFP_KERNEL); - if (!port_priv) { - printk(KERN_ERR SPFX "No memory for ib_agent_port_private\n"); - ret = -ENOMEM; - goto error1; - } - - memset(port_priv, 0, sizeof *port_priv); - port_priv->port_num = port_num; - port_priv->phys_port_cnt = phys_port_cnt; - port_priv->wr_id = 0; - spin_lock_init(&port_priv->send_list_lock); - INIT_LIST_HEAD(&port_priv->send_posted_list); - - /* Obtain MAD agent for directed route SM class */ - reg_req.mgmt_class = IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE; - reg_req.mgmt_class_version = 1; - - /* SMA needs to receive Get, Set, and TrapRepress methods */ - bitmap_zero((unsigned long *)®_req.method_mask, IB_MGMT_MAX_METHODS); - set_bit(IB_MGMT_METHOD_GET, (unsigned long *)®_req.method_mask); - set_bit(IB_MGMT_METHOD_SET, (unsigned long *)®_req.method_mask); - set_bit(IB_MGMT_METHOD_TRAP_REPRESS, - (unsigned long *)®_req.method_mask); - - port_priv->dr_smp_agent = ib_register_mad_agent(device, port_num, - IB_QPT_SMI, - ®_req, 0, - &agent_send_handler, - &agent_recv_handler, - NULL); - if (IS_ERR(port_priv->dr_smp_agent)) { - ret = PTR_ERR(port_priv->dr_smp_agent); - goto error2; - } - - /* Obtain MAD agent for LID routed SM class */ - reg_req.mgmt_class = IB_MGMT_CLASS_SUBN_LID_ROUTED; - port_priv->lr_smp_agent = ib_register_mad_agent(device, port_num, - IB_QPT_SMI, - ®_req, 0, - &agent_send_handler, - &agent_recv_handler, - NULL); - if (IS_ERR(port_priv->lr_smp_agent)) { - ret = PTR_ERR(port_priv->lr_smp_agent); - goto error3; - } - - /* Obtain MAD agent for PerfMgmt class */ - reg_req.mgmt_class = IB_MGMT_CLASS_PERF_MGMT; - clear_bit(IB_MGMT_METHOD_TRAP_REPRESS, - (unsigned long *)®_req.method_mask); - port_priv->perf_mgmt_agent = ib_register_mad_agent(device, port_num, - IB_QPT_GSI, - ®_req, 0, - &agent_send_handler, - &agent_recv_handler, - NULL); - if (IS_ERR(port_priv->perf_mgmt_agent)) { - ret = PTR_ERR(port_priv->perf_mgmt_agent); - goto error4; - } - - port_priv->mr = ib_reg_phys_mr(port_priv->dr_smp_agent->qp->pd, - &buf_list, 1, - IB_ACCESS_LOCAL_WRITE, &iova); - if (IS_ERR(port_priv->mr)) { - printk(KERN_ERR SPFX "Couldn't register MR\n"); - ret = PTR_ERR(port_priv->mr); - goto error5; - } - - spin_lock_irqsave(&ib_agent_port_list_lock, flags); - list_add_tail(&port_priv->port_list, &ib_agent_port_list); - spin_unlock_irqrestore(&ib_agent_port_list_lock, flags); - - return 0; - -error5: - ib_unregister_mad_agent(port_priv->perf_mgmt_agent); -error4: - ib_unregister_mad_agent(port_priv->lr_smp_agent); -error3: - ib_unregister_mad_agent(port_priv->dr_smp_agent); -error2: - kfree(port_priv); -error1: - return ret; -} - -static int ib_agent_port_close(struct ib_device *device, int port_num) -{ - struct ib_agent_port_private *entry, *port_priv = NULL; - unsigned long flags; - - spin_lock_irqsave(&ib_agent_port_list_lock, flags); - list_for_each_entry(entry, &ib_agent_port_list, port_list) { - if (entry->dr_smp_agent->device == device && - entry->port_num == port_num) { - port_priv = entry; - break; - } - } - - if (port_priv == NULL) { - 
printk(KERN_ERR SPFX "Port %d not found\n", port_num); - spin_unlock_irqrestore(&ib_agent_port_list_lock, flags); - return -ENODEV; - } - - list_del(&port_priv->port_list); - spin_unlock_irqrestore(&ib_agent_port_list_lock, flags); - - ib_dereg_mr(port_priv->mr); - - ib_unregister_mad_agent(port_priv->perf_mgmt_agent); - ib_unregister_mad_agent(port_priv->lr_smp_agent); - ib_unregister_mad_agent(port_priv->dr_smp_agent); - kfree(port_priv); - - return 0; -} - -static void ib_agent_init_device(struct ib_device *device) -{ - int ret, num_ports, cur_port, i, ret2; - struct ib_device_attr device_attr; - - ret = ib_query_device(device, &device_attr); - if (ret) { - printk(KERN_ERR SPFX "Couldn't query device %s\n", device->name); - goto error_device_query; - } - - if (device->node_type == IB_NODE_SWITCH) { - num_ports = 1; - cur_port = 0; - } else { - num_ports = device_attr.phys_port_cnt; - cur_port = 1; - } - - for (i = 0; i < num_ports; i++, cur_port++) { - ret = ib_agent_port_open(device, cur_port, num_ports); - if (ret) { - printk(KERN_ERR SPFX "Couldn't open %s port %d\n", - device->name, cur_port); - goto error_device_open; - } - } - - goto error_device_query; - -error_device_open: - while (i > 0) { - cur_port--; - ret2 = ib_agent_port_close(device, cur_port); - if (ret2) { - printk(KERN_ERR SPFX "Couldn't close %s port %d\n", - device->name, cur_port); - } - i--; - } - -error_device_query: - return; -} - -static void ib_agent_remove_device(struct ib_device *device) -{ - int ret, i, num_ports, cur_port, ret2; - struct ib_device_attr device_attr; - - ret = ib_query_device(device, &device_attr); - if (ret) { - printk(KERN_ERR SPFX "Couldn't query device %s\n", device->name); - goto error_device_query; - } - - if (device->node_type == IB_NODE_SWITCH) { - num_ports = 1; - cur_port = 0; - } else { - num_ports = device_attr.phys_port_cnt; - cur_port = 1; - } - for (i = 0; i < num_ports; i++, cur_port++) { - ret2 = ib_agent_port_close(device, cur_port); - if (ret2) { - printk(KERN_ERR SPFX "Couldn't close %s port %d\n", - device->name, cur_port); - if (!ret) - ret = ret2; - } - } - -error_device_query: - return; -} - -static struct ib_client ib_agent_client = { - .name = "ib_agent", - .add = ib_agent_init_device, - .remove = ib_agent_remove_device -}; - -static int __init ib_agent_init(void) -{ - INIT_LIST_HEAD(&ib_agent_port_list); - if (ib_register_client(&ib_agent_client)) { - printk(KERN_ERR SPFX "Couldn't register ib_agent client\n"); - return -EINVAL; - } - - return 0; -} - -static void __exit ib_agent_exit(void) -{ - ib_unregister_client(&ib_agent_client); -} - -module_init(ib_agent_init); -module_exit(ib_agent_exit); Index: ib_smi_priv.h =================================================================== --- ib_smi_priv.h (revision 1037) +++ ib_smi_priv.h (working copy) @@ -1,52 +0,0 @@ -/* - This software is available to you under a choice of one of two - licenses. You may choose to be licensed under the terms of the GNU - General Public License (GPL) Version 2, available at - , or the OpenIB.org BSD - license, available in the LICENSE.TXT file accompanying this - software. These details are also available at - . - - THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - SOFTWARE. - - Copyright (c) 2004 Mellanox Technologies Ltd. All rights reserved. - Copyright (c) 2004 Infinicon Corporation. All rights reserved. - Copyright (c) 2004 Intel Corporation. All rights reserved. - Copyright (c) 2004 Topspin Corporation. All rights reserved. - Copyright (c) 2004 Voltaire Corporation. All rights reserved. -*/ - -#ifndef __IB_AGENT_PRIV_H__ -#define __IB_AGENT_PRIV_H__ - -#include - -#define SPFX "ib_agent: " - -struct ib_agent_send_wr { - struct list_head send_list; - struct ib_mad *mad; - DECLARE_PCI_UNMAP_ADDR(mapping) -}; - -struct ib_agent_port_private { - struct list_head port_list; - struct list_head send_posted_list; - spinlock_t send_list_lock; - int port_num; - int phys_port_cnt; - struct ib_mad_agent *dr_smp_agent; /* DR SM class */ - struct ib_mad_agent *lr_smp_agent; /* LR SM class */ - struct ib_mad_agent *perf_mgmt_agent; /* PerfMgmt class */ - struct ib_mr *mr; - u64 wr_id; -}; - -#endif /* __IB_AGENT_PRIV_H__ */ Index: ib_agent.c =================================================================== --- ib_agent.c (revision 1037) +++ ib_agent.c (working copy) @@ -24,7 +24,7 @@ */ #include -#include "ib_smi_priv.h" +#include "ib_agent_priv.h" #include "ib_mad_priv.h" Index: Makefile =================================================================== --- Makefile (revision 1037) +++ Makefile (working copy) @@ -2,10 +2,10 @@ obj-$(CONFIG_INFINIBAND_ACCESS_LAYER) += \ ib_al.o \ - ib_sma.o + ib_agt.o ib_al-objs := \ ib_mad.o -ib_sma-objs := \ - ib_smi.o +ib_agt-objs := \ + ib_agent.o Index: README =================================================================== --- README (revision 1037) +++ README (working copy) @@ -43,6 +43,6 @@ 6. You are now ready to run the new access layer as follows: /sbin/modprobe ib_mthca /sbin/modprobe ib_al (This can be skipped) - /sbin/modprobe ib_sma + /sbin/modprobe ib_agt Note that starting ib_al does not cause ib_mthca to be started. From halr at voltaire.com Sun Oct 24 12:07:18 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Sun, 24 Oct 2004 15:07:18 -0400 Subject: [openib-general] [PATCH] ib_mad: For received responses, set the wr_id field of the wc is set to the wr_id of the corresponding send request. Message-ID: <1098644838.6252.29.camel@hpc-1> ib_mad: For received responses, set the wr_id field of the wc is set to the wr_id of the corresponding send request. Index: ib_mad.c =================================================================== --- ib_mad.c (revision 1046) +++ ib_mad.c (working copy) @@ -895,6 +895,7 @@ spin_unlock_irqrestore(&mad_agent_priv->lock, flags); /* Defined behavior is to complete response before request */ + recv->header.recv_wc.wc->wr_id = mad_send_wr->wr_id; mad_agent_priv->agent.recv_handler( &mad_agent_priv->agent, &recv->header.recv_wc); From roland at topspin.com Sun Oct 24 21:27:39 2004 From: roland at topspin.com (Roland Dreier) Date: Sun, 24 Oct 2004 21:27:39 -0700 Subject: [openib-general] [PATCH][0/3] Trivial fixes to MAD code Message-ID: <200410242127.i9TIPUFsNZ4ESGxv@topspin.com> Here is a series of small fixes to the MAD code that I applied on my branch. They fix the build on non-i386 archs and clean up a few other minor things. - R. 
From roland at topspin.com Sun Oct 24 21:27:40 2004 From: roland at topspin.com (Roland Dreier) Date: Sun, 24 Oct 2004 21:27:40 -0700 Subject: [openib-general] [PATCH][1/3] Make convert_mgmt_class() really inline In-Reply-To: <200410242127.i9TIPUFsNZ4ESGxv@topspin.com> Message-ID: <200410242127.rb9yItwau1fzwwfI@topspin.com> Because the first declaration of convert_mgmt_class() had no body, gcc doesn't actually inline the function. Fix that by moving the body before the first use of the function. Index: linux-kernel/infiniband/core/ib_mad.c =================================================================== --- linux-kernel.orig/infiniband/core/ib_mad.c 2004-10-24 20:04:11.000000000 -0700 +++ linux-kernel/infiniband/core/ib_mad.c 2004-10-24 20:31:30.000000000 -0700 @@ -84,7 +84,6 @@ static int ib_mad_post_receive_mad(struct ib_mad_port_private *port_priv, struct ib_qp *qp); static int ib_mad_post_receive_mads(struct ib_mad_port_private *priv); -static inline u8 convert_mgmt_class(u8 mgmt_class); static void cancel_mads(struct ib_mad_agent_private *mad_agent_priv); static void ib_mad_complete_send_wr(struct ib_mad_send_wr_private *mad_send_wr, struct ib_mad_send_wc *mad_send_wc); @@ -123,6 +122,14 @@ return entry; } + +static inline u8 convert_mgmt_class(u8 mgmt_class) +{ + /* Alias IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE to 0 */ + return mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE ? + 0 : mgmt_class; +} + /* * ib_register_mad_agent - Register to send/receive MADs */ @@ -497,13 +504,6 @@ } EXPORT_SYMBOL(ib_process_mad_wc); -static inline u8 convert_mgmt_class(u8 mgmt_class) -{ - /* Alias IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE to 0 */ - return mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE ? - 0 : mgmt_class; -} - static int method_in_use(struct ib_mad_mgmt_method_table **method, struct ib_mad_reg_req *mad_reg_req) { From roland at topspin.com Sun Oct 24 21:27:40 2004 From: roland at topspin.com (Roland Dreier) Date: Sun, 24 Oct 2004 21:27:40 -0700 Subject: [openib-general] [PATCH][2/3] Use static LIST_HEAD() to initialize In-Reply-To: <200410242127.rb9yItwau1fzwwfI@topspin.com> Message-ID: <200410242127.XxqH9M49aLc4fv9B@topspin.com> Use the LIST_HEAD macro instead of explicitly calling INIT_LIST_HEAD(). Index: linux-kernel/infiniband/core/ib_agent.c =================================================================== --- linux-kernel.orig/infiniband/core/ib_agent.c 2004-10-24 20:32:27.000000000 -0700 +++ linux-kernel/infiniband/core/ib_agent.c 2004-10-24 20:32:58.000000000 -0700 @@ -35,7 +35,7 @@ static spinlock_t ib_agent_port_list_lock = SPIN_LOCK_UNLOCKED; -static struct list_head ib_agent_port_list; +static LIST_HEAD(ib_agent_port_list); /* * Fixup a directed route SMP for sending. Return 0 if the SMP should be @@ -847,7 +847,6 @@ static int __init ib_agent_init(void) { - INIT_LIST_HEAD(&ib_agent_port_list); if (ib_register_client(&ib_agent_client)) { printk(KERN_ERR SPFX "Couldn't register ib_agent client\n"); return -EINVAL; From roland at topspin.com Sun Oct 24 21:27:40 2004 From: roland at topspin.com (Roland Dreier) Date: Sun, 24 Oct 2004 21:27:40 -0700 Subject: [openib-general] [PATCH][3/3] Fix errors and warnings on 64-bit archs In-Reply-To: <200410242127.XxqH9M49aLc4fv9B@topspin.com> Message-ID: <200410242127.WEp8ZaMMWz7Usnqn@topspin.com> Fix improper use of pci_umap_addr macros (code would not compile on platforms where the macros are non-trivial). Also fix some printk format warnings (pointers can't be cast to ints on 64-bit platforms; u64 is not unsigned long long either). 
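A quick illustration of the two printk idioms the diff below converges on (a hypothetical snippet, not taken from the patch; the function and variable names are invented):

#include <linux/kernel.h>
#include <linux/types.h>

/* Illustrative only: portable printk of a pointer and of a u64 value. */
static void show_portable_formats(void *agent, u64 wr_id)
{
        /* Pointers: never cast to an integer type; %p works on any arch. */
        printk(KERN_ERR "agent %p\n", agent);

        /* u64 may be unsigned long rather than unsigned long long on some
         * 64-bit architectures, so cast explicitly to match the format. */
        printk(KERN_ERR "wr_id 0x%Lx\n", (unsigned long long) wr_id);
}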
Index: linux-kernel/infiniband/core/ib_agent.c =================================================================== --- linux-kernel.orig/infiniband/core/ib_agent.c 2004-10-24 20:32:58.000000000 -0700 +++ linux-kernel/infiniband/core/ib_agent.c 2004-10-24 21:12:23.000000000 -0700 @@ -329,8 +329,8 @@ } spin_unlock_irqrestore(&ib_agent_port_list_lock, flags); if (!port_priv) { - printk(KERN_ERR SPFX "agent_mad_send: no matching MAD agent 0x%x\n", - (unsigned int)mad_agent); + printk(KERN_ERR SPFX "agent_mad_send: no matching MAD agent %p\n", + mad_agent); return; } @@ -395,13 +395,13 @@ send_wr.wr.ud.mad_hdr = (struct ib_mad_hdr *)mad; send_wr.wr_id = ++port_priv->wr_id; - pci_unmap_addr_set(mad, agent_send_wr->mapping, gather_list.addr); + pci_unmap_addr_set(agent_send_wr, mapping, gather_list.addr); /* Send */ spin_lock_irqsave(&port_priv->send_list_lock, flags); if (ib_post_send_mad(mad_agent, &send_wr, &bad_send_wr)) { pci_unmap_single(mad_agent->device->dma_device, - pci_unmap_addr(mad, agent_send_wr->mapping), + pci_unmap_addr(agent_send_wr, mapping), sizeof(struct ib_mad), PCI_DMA_TODEVICE); } else { @@ -542,7 +542,7 @@ spin_unlock_irqrestore(&ib_agent_port_list_lock, flags); if (!port_priv) { printk(KERN_ERR SPFX "agent_send_handler: no matching MAD agent " - "0x%x\n", (unsigned int)mad_agent); + "%p\n", mad_agent); return; } @@ -551,7 +551,7 @@ if (list_empty(&port_priv->send_posted_list)) { spin_unlock_irqrestore(&port_priv->send_list_lock, flags); printk(KERN_ERR SPFX "Send completion WR ID 0x%Lx but send list " - "is empty\n", mad_send_wc->wr_id); + "is empty\n", (unsigned long long) mad_send_wc->wr_id); return; } @@ -568,8 +568,7 @@ /* Unmap PCI */ pci_unmap_single(mad_agent->device->dma_device, - pci_unmap_addr(agent_send_wr->smp, - agent_send_wr->mapping), + pci_unmap_addr(agent_send_wr, mapping), sizeof(struct ib_mad), PCI_DMA_TODEVICE); @@ -595,9 +594,8 @@ } spin_unlock_irqrestore(&ib_agent_port_list_lock, flags); if (!port_priv) { - printk(KERN_ERR SPFX "agent_recv_handler: no matching MAD agent 0x%x\n", - (unsigned int)mad_agent); - + printk(KERN_ERR SPFX "agent_recv_handler: no matching MAD agent %p\n", + mad_agent); } else { agent_recv_mad(mad_agent, mad_recv_wc->recv_buf->mad, Index: linux-kernel/infiniband/core/ib_mad.c =================================================================== --- linux-kernel.orig/infiniband/core/ib_mad.c 2004-10-24 20:36:01.000000000 -0700 +++ linux-kernel/infiniband/core/ib_mad.c 2004-10-24 21:12:09.000000000 -0700 @@ -790,8 +790,8 @@ ret: if (!mad_agent->agent.recv_handler) { printk(KERN_ERR PFX "No receive handler for client " - "0x%x on port %d\n", - (unsigned int)&mad_agent->agent, + "%p on port %d\n", + &mad_agent->agent, port_priv->port_num); mad_agent = NULL; } @@ -957,7 +957,8 @@ } else { printk(KERN_ERR PFX "Receive completion WR ID 0x%Lx on QP %d " - "with no posted receive\n", wc->wr_id, qp_num); + "with no posted receive\n", (unsigned long long) wc->wr_id, + qp_num); spin_unlock_irqrestore(&port_priv->recv_list_lock, flags); ib_mad_post_receive_mad(port_priv, port_priv->qp[qp_num]); return; @@ -1123,7 +1124,7 @@ spin_lock_irqsave(&port_priv->send_list_lock, flags); if (list_empty(&port_priv->send_posted_mad_list)) { printk(KERN_ERR PFX "Send completion WR ID 0x%Lx but send " - "list is empty\n", wc->wr_id); + "list is empty\n", (unsigned long long) wc->wr_id); goto error; } @@ -1133,7 +1134,7 @@ if (wc->wr_id != (unsigned long)mad_send_wr) { printk(KERN_ERR PFX "Send completion WR ID 0x%Lx doesn't match " "posted send WR ID 
0x%lx\n", - wc->wr_id, + (unsigned long long) wc->wr_id, (unsigned long)mad_send_wr); goto error; } @@ -1163,7 +1164,7 @@ while (ib_poll_cq(port_priv->cq, 1, &wc) == 1) { printk(KERN_DEBUG PFX "Completion opcode 0x%x WRID 0x%Lx\n", - wc.opcode, wc.wr_id); + wc.opcode, (unsigned long long) wc.wr_id); switch (wc.opcode) { case IB_WC_SEND: if (wc.status != IB_WC_SUCCESS) @@ -1463,7 +1464,7 @@ kmem_cache_free(ib_mad_cache, mad_priv); printk(KERN_NOTICE PFX "ib_post_recv WRID 0x%Lx failed ret = %d\n", - recv_wr.wr_id, ret); + (unsigned long long) recv_wr.wr_id, ret); return -EINVAL; } From Andras.Horvath at cern.ch Mon Oct 25 03:22:34 2004 From: Andras.Horvath at cern.ch (Andras.Horvath at cern.ch) Date: Mon, 25 Oct 2004 12:22:34 +0200 Subject: [openib-general] Kernel 2.6.9 In-Reply-To: <52d5zahh3y.fsf@topspin.com> References: <52ekjuaen9.fsf@topspin.com> <20041021084412.GL21516@cern.ch> <1098381416.2389.9.camel@duffman> <20041022113524.GI21516@cern.ch> <52d5zahh3y.fsf@topspin.com> Message-ID: <20041025102234.GV21516@cern.ch> > Unfortunately there are no user space verbs right now (work should be > starting soon). When we do implement the verbs the API will most > likely be closer to the current kernel API than to VAPI. OK, I guess we can modify our apps. > I think I understand this crash and know how to get rid of it. > However, IPoIB probably still will not work for you because of the > multicast member record issue with your SM. That will be fixed once > we cut over to the new MAD code, which should happen next week > (assuming that the new MAD module work is completed). I was not running the Voltaire SM but a back-to-back setup without any subnet manager in fact (since I can't compile the userspace stuff due to missing files like ib_al.h - can somebody please provide instructions on how to build the userspace part?) So maybe I did the unexpected, ping'ing away without SM, hence the crash :] thanks a lot, Andras From halr at voltaire.com Mon Oct 25 05:36:01 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 25 Oct 2004 08:36:01 -0400 Subject: [openib-general] Kernel 2.6.9 In-Reply-To: <20041025102234.GV21516@cern.ch> References: <52ekjuaen9.fsf@topspin.com> <20041021084412.GL21516@cern.ch> <1098381416.2389.9.camel@duffman> <20041022113524.GI21516@cern.ch> <52d5zahh3y.fsf@topspin.com> <20041025102234.GV21516@cern.ch> Message-ID: <1098707761.3269.840.camel@localhost.localdomain> On Mon, 2004-10-25 at 06:22, Andras.Horvath at cern.ch wrote: > So maybe I did the unexpected, ping'ing away without SM, hence the crash > :] In order for a ping to work, the port must be active (or at least armed) which requires the SM. However, if there was no SM in the subnet, that would not be the case but the ping should not cause a crash; just a command failure. -- Hal From halr at voltaire.com Mon Oct 25 07:33:35 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 25 Oct 2004 10:33:35 -0400 Subject: [openib-general] [PATCH][1/3] Make convert_mgmt_class() really inline In-Reply-To: <200410242127.rb9yItwau1fzwwfI@topspin.com> References: <200410242127.rb9yItwau1fzwwfI@topspin.com> Message-ID: <1098714815.4191.4.camel@hpc-1> On Mon, 2004-10-25 at 00:27, Roland Dreier wrote: > Because the first declaration of convert_mgmt_class() had no body, gcc > doesn't actually inline the function. Fix that by moving the body > before the first use of the function. Thanks. Applied. 
-- Hal From halr at voltaire.com Mon Oct 25 07:40:06 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 25 Oct 2004 10:40:06 -0400 Subject: [openib-general] [PATCH][2/3] Use static LIST_HEAD() to initialize In-Reply-To: <200410242127.XxqH9M49aLc4fv9B@topspin.com> References: <200410242127.XxqH9M49aLc4fv9B@topspin.com> Message-ID: <1098715205.4191.9.camel@hpc-1> On Mon, 2004-10-25 at 00:27, Roland Dreier wrote: > Use the LIST_HEAD macro instead of explicitly calling INIT_LIST_HEAD(). Thanks. Applied. -- Hal From halr at voltaire.com Mon Oct 25 07:47:41 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 25 Oct 2004 10:47:41 -0400 Subject: [openib-general] [PATCH][3/3] Fix errors and warnings on 64-bit archs In-Reply-To: <200410242127.WEp8ZaMMWz7Usnqn@topspin.com> References: <200410242127.WEp8ZaMMWz7Usnqn@topspin.com> Message-ID: <1098715661.4191.16.camel@hpc-1> On Mon, 2004-10-25 at 00:27, Roland Dreier wrote: > Fix improper use of pci_umap_addr macros (code would not compile on > platforms where the macros are non-trivial). Also fix some printk > format warnings (pointers can't be cast to ints on 64-bit platforms; > u64 is not unsigned long long either). Thanks. Applied. -- Hal From roland at topspin.com Mon Oct 25 09:20:57 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 25 Oct 2004 09:20:57 -0700 Subject: [openib-general] Kernel 2.6.9 In-Reply-To: <20041025102234.GV21516@cern.ch> (Andras Horvath's message of "Mon, 25 Oct 2004 12:22:34 +0200") References: <52ekjuaen9.fsf@topspin.com> <20041021084412.GL21516@cern.ch> <1098381416.2389.9.camel@duffman> <20041022113524.GI21516@cern.ch> <52d5zahh3y.fsf@topspin.com> <20041025102234.GV21516@cern.ch> Message-ID: <52654yepli.fsf@topspin.com> Andras> I was not running the Voltaire SM but a back-to-back setup Andras> without any subnet manager in fact (since I can't compile Andras> the userspace stuff due to missing files like ib_al.h - Andras> can somebody please provide instructions on how to build Andras> the userspace part?) So maybe I did the unexpected, Andras> ping'ing away without SM, hence the crash :] You will not be able to use IPoIB without an SM. The crash in IPoIB should be fixed now but it still will not work without SM. As I said before we don't have any userspace support right now. I am currently working on the userspace support required for the SM but there is not even anything to test yet. Thanks, Roland From halr at voltaire.com Mon Oct 25 09:35:18 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 25 Oct 2004 12:35:18 -0400 Subject: [openib-general] Handling SM class (SMInfo vs. other queries) In-Reply-To: <52mzyee981.fsf@topspin.com> References: <52ekjqfq88.fsf@topspin.com> <1098480059.22400.69.camel@hpc-1> <52mzyee981.fsf@topspin.com> Message-ID: <1098722118.3269.929.camel@localhost.localdomain> On Fri, 2004-10-22 at 17:25, Roland Dreier wrote: > Hal> The registration interface will need to be extended somehow > Hal> for this. I don't think we want to add an attribute ID mask > Hal> :-) The simplest way I can think of is to add an issm bit to > Hal> the registration request structure and special case these > Hal> unsolicited receives to go to the SM if it has been > Hal> registered and to drop otherwise. > > All these special cases are starting to look pretty awkward. What are the other special cases for registration ? > Another solution is to get rid of the snoop_mad entry point and the SMA and > PMA agents in ib_smi, and just pass every MAD to the low-level > driver. 
The low-level driver can decide to handle it (consume it > and/or generate a response) or not. Everything not consumed by the > low-level driver gets put through agent dispatch as usual. Wish this part of the discussion occurred 10 days ago or so :-) > This also avoids having to add another agent for BMA to ib_smi. Adding another agent is relatively trivial as evidenced by PMA. > Also it makes it pretty easy for low-level drivers to handle > vendor-specific GSI classes (of course drivers can register an agent > for every port and class they want to handle but it turns into a bit > of a pain). -- Hal From halr at voltaire.com Mon Oct 25 09:39:36 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 25 Oct 2004 12:39:36 -0400 Subject: [openib-general] Handling SM class (SMInfo vs. other queries) In-Reply-To: <52is92e5mh.fsf@topspin.com> References: <52ekjqfq88.fsf@topspin.com> <1098480059.22400.69.camel@hpc-1> <52mzyee981.fsf@topspin.com> <20041022152012.4612097b.mshefty@ichips.intel.com> <52is92e5mh.fsf@topspin.com> Message-ID: <1098722375.3269.934.camel@localhost.localdomain> On Fri, 2004-10-22 at 18:43, Roland Dreier wrote: > Sean> I think I'm missing something here. I thought that the > Sean> snoop_mad entry point was the solution to this issue. > > Except as currently defined, it doesn't provide a way for the > low-level driver to give back a response -- it just lets the low-level > driver steal MADs like locally generated traps. There's no way to > handle SMA, PMA, BMA and vendor-specific requests etc. > > My suggestion basically amounts to expanding snoop_mad to handle > everything. But then there's no need for the process_mad entry point > (snoop_mad becomes the interface). So we might as well get rid of > snoop_mad since I think the process_mad name is more descriptive. OK. It's pretty straightforward to change the MAD layer to use PLM rather than snoop MAD (and remove snoop_mad (undo that patch)). Should I post the changes ? -- Hal From halr at voltaire.com Mon Oct 25 09:44:32 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 25 Oct 2004 12:44:32 -0400 Subject: [openib-general] Kernel 2.6.9 In-Reply-To: <52is92fqi6.fsf@topspin.com> References: <52ekjuaen9.fsf@topspin.com> <20041021084412.GL21516@cern.ch> <1098381416.2389.9.camel@duffman> <20041022113524.GI21516@cern.ch> <52d5zahh3y.fsf@topspin.com> <1098469091.22400.0.camel@hpc-1> <524qkmhb2d.fsf@topspin.com> <1098472467.22400.6.camel@hpc-1> <52is92fqi6.fsf@topspin.com> Message-ID: <1098722672.3269.938.camel@localhost.localdomain> On Fri, 2004-10-22 at 16:26, Roland Dreier wrote: > In any case I think I'll integrate the new MAD stuff on Monday no > matter where it stands, and we can just deal with merging changes back > and forth. It appears we now (as of late yesterday) need to deal with merging changes back and forth. When do you see the official gen2 branch being setup ? What will happen to the roland_merge branch at that point ? -- Hal From roland at topspin.com Mon Oct 25 09:51:57 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 25 Oct 2004 09:51:57 -0700 Subject: [openib-general] Handling SM class (SMInfo vs. other queries) In-Reply-To: <1098722118.3269.929.camel@localhost.localdomain> (Hal Rosenstock's message of "Mon, 25 Oct 2004 12:35:18 -0400") References: <52ekjqfq88.fsf@topspin.com> <1098480059.22400.69.camel@hpc-1> <52mzyee981.fsf@topspin.com> <1098722118.3269.929.camel@localhost.localdomain> Message-ID: <52wtxed9le.fsf@topspin.com> Hal> What are the other special cases for registration ? 
Not just registration... I just meant that having an extra snoop_mad entry point and a special issm bit and hard-coding different treatment of SMInfo in the MAD layer starts to smell to me like the MAD layer is at the wrong level of abstraction... (and I haven't thought through the CM yet but I'm a little worried about how the solicited/unsolicited distinction that the MAD layer makes will fit with the CM). Hal> Wish this part of the discussion occurred 10 days ago or so :-) I first raised the issue at least two months ago: http://article.gmane.org/gmane.linux.drivers.openib/4217 Roland> This also avoids having to add another agent for BMA to ib_smi. Hal> Adding another agent is relatively trivial as evidenced by PMA. Sure, it's not a big deal; it's only a minor difference either way, but giving the low-level driver first crack at all MADs does make the ib_agent code simpler. - Roland From roland at topspin.com Mon Oct 25 09:52:36 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 25 Oct 2004 09:52:36 -0700 Subject: [openib-general] ib_verbs.h (Roland's branch): IB_WC_FATAL_ERR In-Reply-To: <1098639628.3512.10.camel@hpc-1> (Hal Rosenstock's message of "Sun, 24 Oct 2004 13:40:28 -0400") References: <1098639628.3512.10.camel@hpc-1> Message-ID: <52sm82d9kb.fsf@topspin.com> Hal> Hi, Is there a need for IB_WC_FATAL_ERR value of ib_wc_status Hal> enum ? It appears to be unused. If so, can it be removed from Hal> ib_verbs.h (Roland's branch) ? I think it's meant to be used for local catastrophic errors. Since mthca doesn't generate catastrophic errors (yet), it appears unused, but I think we want to leave it in. - R. From roland at topspin.com Mon Oct 25 10:01:02 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 25 Oct 2004 10:01:02 -0700 Subject: [openib-general] Kernel 2.6.9 In-Reply-To: <1098722672.3269.938.camel@localhost.localdomain> (Hal Rosenstock's message of "Mon, 25 Oct 2004 12:44:32 -0400") References: <52ekjuaen9.fsf@topspin.com> <20041021084412.GL21516@cern.ch> <1098381416.2389.9.camel@duffman> <20041022113524.GI21516@cern.ch> <52d5zahh3y.fsf@topspin.com> <1098469091.22400.0.camel@hpc-1> <524qkmhb2d.fsf@topspin.com> <1098472467.22400.6.camel@hpc-1> <52is92fqi6.fsf@topspin.com> <1098722672.3269.938.camel@localhost.localdomain> Message-ID: <52fz42d969.fsf@topspin.com> Hal> When do you see the official gen2 branch being setup ? What Hal> will happen to the roland_merge branch at that point ? I plan to continue working on my branch, and probably set up a separate branch for staging code to be submitted to the kernel soon (eg for removing the CM, SDP, etc... I'll keep them on roland-merge but they won't go to the kernel for a while). I guess for a branch to be official, something official has to happen in the OpenIB organization, but I'm not sure what. I'm not going to worry about that for now.... - Roland From roland at topspin.com Mon Oct 25 10:08:51 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 25 Oct 2004 10:08:51 -0700 Subject: [openib-general] Handling SM class (SMInfo vs. other queries) In-Reply-To: <1098722375.3269.934.camel@localhost.localdomain> (Hal Rosenstock's message of "Mon, 25 Oct 2004 12:39:36 -0400") References: <52ekjqfq88.fsf@topspin.com> <1098480059.22400.69.camel@hpc-1> <52mzyee981.fsf@topspin.com> <20041022152012.4612097b.mshefty@ichips.intel.com> <52is92e5mh.fsf@topspin.com> <1098722375.3269.934.camel@localhost.localdomain> Message-ID: <52breqd8t8.fsf@topspin.com> Hal> OK. 
It's pretty straightforward to change the MAD layer to Hal> use PLM rather than snoop MAD (and remove snoop_mad (undo Hal> that patch)). Should I post the changes ? It's my idea so I certainly like the approach :) Sean, what do you think? - R. From mshefty at ichips.intel.com Mon Oct 25 10:09:06 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 25 Oct 2004 10:09:06 -0700 Subject: [openib-general] Handling SM class (SMInfo vs. other queries) In-Reply-To: <1098722375.3269.934.camel@localhost.localdomain> References: <52ekjqfq88.fsf@topspin.com> <1098480059.22400.69.camel@hpc-1> <52mzyee981.fsf@topspin.com> <20041022152012.4612097b.mshefty@ichips.intel.com> <52is92e5mh.fsf@topspin.com> <1098722375.3269.934.camel@localhost.localdomain> Message-ID: <20041025100906.3a9aa95e.mshefty@ichips.intel.com> On Mon, 25 Oct 2004 12:39:36 -0400 Hal Rosenstock wrote: > OK. It's pretty straightforward to change the MAD layer to use PLM > rather than snoop MAD (and remove snoop_mad (undo that patch)). Should I > post the changes ? I think that this makes sense. - Sean From roland at topspin.com Mon Oct 25 10:16:23 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 25 Oct 2004 10:16:23 -0700 Subject: 64-bit compat (was Re: [openib-general] [PATCH][3/3] Fix errors and warnings on 64-bit archs) In-Reply-To: <200410242127.WEp8ZaMMWz7Usnqn@topspin.com> (Roland Dreier's message of "Sun, 24 Oct 2004 21:27:40 -0700") References: <200410242127.WEp8ZaMMWz7Usnqn@topspin.com> Message-ID: <527jped8go.fsf@topspin.com> By the way, in case someone else wants to use the same approach, here's how I make sure my changes build across multiple archs: I'm using toolchains built with and the attached script to make sure my tree builds on i386, x86_64, ppc64, ia64, ppc, sparc64 and i386/gcc-2.95. I just have to run one command to get a build report: $ check-oib compiling for i386 ... success compiling for x86_64 ... success compiling for ppc64 ... success compiling for ia64 ... success ERROR: ia64_monarch_init_handler: 186 slots, total region length = 0 compiling for sparc64 ... success compiling for i386 (gcc-2.95) ... success compiling for ppc ... success - Roland -------------- next part -------------- A non-text attachment was scrubbed... Name: check-oib Type: text/x-perl Size: 1946 bytes Desc: not available URL: From mshefty at ichips.intel.com Mon Oct 25 10:14:15 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 25 Oct 2004 10:14:15 -0700 Subject: [openib-general] [PATCH] ib_mad: In ib_mad_complete_recv, decrement agent refcount when not fully reassembled and when no request found In-Reply-To: <1098639481.3512.6.camel@hpc-1> References: <1098639481.3512.6.camel@hpc-1> Message-ID: <20041025101415.4c14f12b.mshefty@ichips.intel.com> On Sun, 24 Oct 2004 13:38:01 -0400 Hal Rosenstock wrote: > ib_mad: In ib_mad_complete_recv, decrement agent reference count when > receive is not fully reassembled, and also when solicited and no > matching request is found. This allows deregistration to complete rather > than waiting for an event which never occurs. > ... > /* Fully reassemble receive before processing */ > recv = reassemble_recv(mad_agent_priv, recv); > - if (!recv) > + if (!recv) { > + atomic_dec(&mad_agent_priv->refcount); > return; > + } I'm not sure about this. If we just start assembling a MAD, we're probably going to want to maintain a reference on the MAD agent while the reassembly is occurring, in order to handle timeouts. 
We'll have a better idea of what the RMPP code will do once it's actually written however, so that reference could be a different one. If we do keep this code, it should probably be an atomic_dec_and_test, followed by a wake_up. > /* Complete corresponding request */ > if (solicited) { > @@ -884,6 +886,7 @@ > if (!mad_send_wr) { > spin_unlock_irqrestore(&mad_agent_priv->lock, flags); > ib_free_recv_mad(&recv->header.recv_wc); > + atomic_dec(&mad_agent_priv->refcount); > return; I think that we want this to be atomic_dec_and_test(). (Similar to the call near the end of this function.) If a match is found, we can get away with a simple atomic_dec, since the send will still hold a reference on the mad agent. But if no match is found, then I think this may be the last reference being held. From mshefty at ichips.intel.com Mon Oct 25 10:27:52 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 25 Oct 2004 10:27:52 -0700 Subject: [openib-general] Handling SM class (SMInfo vs. other queries) In-Reply-To: <52breqd8t8.fsf@topspin.com> References: <52ekjqfq88.fsf@topspin.com> <1098480059.22400.69.camel@hpc-1> <52mzyee981.fsf@topspin.com> <20041022152012.4612097b.mshefty@ichips.intel.com> <52is92e5mh.fsf@topspin.com> <1098722375.3269.934.camel@localhost.localdomain> <52breqd8t8.fsf@topspin.com> Message-ID: <20041025102752.3113fb69.mshefty@ichips.intel.com> On Mon, 25 Oct 2004 10:08:51 -0700 Roland Dreier wrote: > Hal> OK. It's pretty straightforward to change the MAD layer to > Hal> use PLM rather than snoop MAD (and remove snoop_mad (undo > Hal> that patch)). Should I post the changes ? > > It's my idea so I certainly like the approach :) > > Sean, what do you think? I think that it makes sense, but just to make sure that I'm clear on this. We want to pass every received MAD to the HCA driver before any processing has occurred on the MAD, correct? If the MAD is not consumed by the driver, the MAD layer may update the MAD and call process_local_mad a second time, correct? Or are we only talking about calling process_local_mad for all received LID routed MADs? - Sean From roland at topspin.com Mon Oct 25 10:34:09 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 25 Oct 2004 10:34:09 -0700 Subject: [openib-general] Handling SM class (SMInfo vs. other queries) In-Reply-To: <20041025102752.3113fb69.mshefty@ichips.intel.com> (Sean Hefty's message of "Mon, 25 Oct 2004 10:27:52 -0700") References: <52ekjqfq88.fsf@topspin.com> <1098480059.22400.69.camel@hpc-1> <52mzyee981.fsf@topspin.com> <20041022152012.4612097b.mshefty@ichips.intel.com> <52is92e5mh.fsf@topspin.com> <1098722375.3269.934.camel@localhost.localdomain> <52breqd8t8.fsf@topspin.com> <20041025102752.3113fb69.mshefty@ichips.intel.com> Message-ID: <52y8hubt2m.fsf@topspin.com> Sean> I think that it makes sense, but just to make sure that I'm Sean> clear on this. We want to pass every received MAD to the Sean> HCA driver before any processing has occurred on the MAD, Sean> correct? That's my plan... Sean> If the MAD is not consumed by the driver, the MAD Sean> layer may update the MAD and call process_local_mad a second Sean> time, correct? Sure, I guess so -- nothing should break if the MAD layer does this (since the driver can't remember that it already saw the MAD). But when do you see this being done? 
If the driver didn't handle the MAD the first time around, it's unlikely to do anything different on the second try (unless the MAD layer does something extreme like change the attribute ID, and I can't think of a time when we'd do something like that). - R. From halr at voltaire.com Mon Oct 25 10:35:05 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 25 Oct 2004 13:35:05 -0400 Subject: [openib-general] [PATCH] ib_mad: In ib_mad_complete_recv, decrement agent refcount when not fully reassembled and when no request found In-Reply-To: <20041025101415.4c14f12b.mshefty@ichips.intel.com> References: <1098639481.3512.6.camel@hpc-1> <20041025101415.4c14f12b.mshefty@ichips.intel.com> Message-ID: <1098725705.3269.951.camel@localhost.localdomain> On Mon, 2004-10-25 at 13:14, Sean Hefty wrote: > On Sun, 24 Oct 2004 13:38:01 -0400 > Hal Rosenstock wrote: > > > ib_mad: In ib_mad_complete_recv, decrement agent reference count when > > receive is not fully reassembled, and also when solicited and no > > matching request is found. This allows deregistration to complete rather > > than waiting for an event which never occurs. > > ... > > /* Fully reassemble receive before processing */ > > recv = reassemble_recv(mad_agent_priv, recv); > > - if (!recv) > > + if (!recv) { > > + atomic_dec(&mad_agent_priv->refcount); > > return; > > + } > > I'm not sure about this. I wasn't sure about this either. > If we just start assembling a MAD, we're probably going to want to maintain > a reference on the MAD agent while the reassembly is occurring, in order to > handle timeouts. We'll have a better idea of what the RMPP code will do > once it's actually written however, so that reference could be a different one. Yes, but it gets incremented once every received segment and never decremented (prior to this patch). This is to handle the increment at line 1003 before ib_mad_complete_recv is called in ib_mad_recv_done_handler. > If we do keep this code, it should probably be an atomic_dec_and_test, > followed by a wake_up. Similar to below, I don't think it is the last reference held on the agent. > > /* Complete corresponding request */ > > if (solicited) { > > @@ -884,6 +886,7 @@ > > if (!mad_send_wr) { > > spin_unlock_irqrestore(&mad_agent_priv->lock, flags); > > ib_free_recv_mad(&recv->header.recv_wc); > > + atomic_dec(&mad_agent_priv->refcount); > > return; > > I think that we want this to be atomic_dec_and_test(). > (Similar to the call near the end of this function.) If a match is found, > we can get away with a simple atomic_dec, since the send will still hold a > reference on the mad agent. But if no match is found, then I think this > may be the last reference being held. I didn't change the match path only the non match. It's not the last reference held as one other reference is held on the agent which is given up at deregistration time. -- Hal From mshefty at ichips.intel.com Mon Oct 25 10:41:15 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 25 Oct 2004 10:41:15 -0700 Subject: [openib-general] Handling SM class (SMInfo vs. 
other queries) In-Reply-To: <52y8hubt2m.fsf@topspin.com> References: <52ekjqfq88.fsf@topspin.com> <1098480059.22400.69.camel@hpc-1> <52mzyee981.fsf@topspin.com> <20041022152012.4612097b.mshefty@ichips.intel.com> <52is92e5mh.fsf@topspin.com> <1098722375.3269.934.camel@localhost.localdomain> <52breqd8t8.fsf@topspin.com> <20041025102752.3113fb69.mshefty@ichips.intel.com> <52y8hubt2m.fsf@topspin.com> Message-ID: <20041025104115.275dc24b.mshefty@ichips.intel.com> On Mon, 25 Oct 2004 10:34:09 -0700 Roland Dreier wrote: > Sean> If the MAD is not consumed by the driver, the MAD > Sean> layer may update the MAD and call process_local_mad a second > Sean> time, correct? > > Sure, I guess so -- nothing should break if the MAD layer does this > (since the driver can't remember that it already saw the MAD). But > when do you see this being done? If the driver didn't handle the MAD > the first time around, it's unlikely to do anything different on the > second try (unless the MAD layer does something extreme like change > the attribute ID, and I can't think of a time when we'd do something > like that). I was thinking of DR MADs, that have several checks, plus updates to the hop_ptr. Are we talking about handing MADs to the driver immediately after it is received, or immediately before it would be dispatched to other clients? From roland at topspin.com Mon Oct 25 10:47:53 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 25 Oct 2004 10:47:53 -0700 Subject: [openib-general] Handling SM class (SMInfo vs. other queries) In-Reply-To: <20041025104115.275dc24b.mshefty@ichips.intel.com> (Sean Hefty's message of "Mon, 25 Oct 2004 10:41:15 -0700") References: <52ekjqfq88.fsf@topspin.com> <1098480059.22400.69.camel@hpc-1> <52mzyee981.fsf@topspin.com> <20041022152012.4612097b.mshefty@ichips.intel.com> <52is92e5mh.fsf@topspin.com> <1098722375.3269.934.camel@localhost.localdomain> <52breqd8t8.fsf@topspin.com> <20041025102752.3113fb69.mshefty@ichips.intel.com> <52y8hubt2m.fsf@topspin.com> <20041025104115.275dc24b.mshefty@ichips.intel.com> Message-ID: <52u0sibsfq.fsf@topspin.com> Sean> I was thinking of DR MADs, that have several checks, plus Sean> updates to the hop_ptr. Are we talking about handing MADs Sean> to the driver immediately after it is received, or Sean> immediately before it would be dispatched to other clients? I guess right before dispatch to other clients -- there's no way for the low-level driver to tell whether or not the hop_count/hop_ptr checks and updates have been done or not, so it would just (incorrectly) consume the MAD and generate a response the first time around. - R. 
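To restate the ordering Roland describes as code, here is a rough sketch of the proposed receive path; every helper name below is a placeholder rather than the actual interface, and only the ordering is the point:

/* Sketch only: SMI checks first, then one shot for the low-level driver,
 * then normal agent dispatch.  All helpers here are hypothetical. */
static void recv_mad_dispatch_sketch(struct ib_mad_port_private *port_priv,
                                     struct ib_mad *mad)
{
        struct ib_mad *reply = NULL;

        /* 1. Directed route hop_ptr/hop_count checks and updates are done
         *    before the low-level driver ever sees the MAD. */
        if (!smi_checks_ok(port_priv, mad))
                return;                 /* invalid SMP: drop it */

        /* 2. The low-level driver gets the MAD exactly once; it may consume
         *    it and optionally hand back a single response. */
        if (driver_process_mad(port_priv, mad, &reply)) {
                if (reply)
                        send_driver_reply(port_priv, reply);
                return;                 /* consumed: no agent dispatch */
        }

        /* 3. Anything the driver did not consume goes through agent
         *    dispatch as usual. */
        dispatch_to_agents(port_priv, mad);
}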
From halr at voltaire.com Mon Oct 25 10:57:21 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 25 Oct 2004 13:57:21 -0400 Subject: [openib-general] [PATCH] ib_verbs.h: Add IB_WC_FATAL_ERR to ib_wc_status enum Message-ID: <1098727040.4106.3.camel@hpc-1> ib_verbs.h: Add IB_WC_FATAL_ERR to ib_wc_status enum (to duplicate what is in Roland's branch) Index: ib_verbs.h =================================================================== --- ib_verbs.h (revision 1058) +++ ib_verbs.h (working copy) @@ -581,6 +581,7 @@ IB_WC_REM_ABORT_ERR, IB_WC_INV_EECN_ERR, IB_WC_INV_EEC_STATE_ERR, + IB_WC_FATAL_ERR, IB_WC_GENERAL_ERR, IB_WC_RESP_TIMEOUT_ERR }; From roland at topspin.com Mon Oct 25 10:57:33 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 25 Oct 2004 10:57:33 -0700 Subject: [openib-general] 2 questions on physical code layout Message-ID: <52pt36brzm.fsf@topspin.com> I have a couple of questions/suggestions about how we want to arrange the code for kernel inclusion: 1. Does it make sense to remove the ib_ prefixes on .c files in drivers/infiniband/core? After all, someone looking in drivers/infiniband should realize they're looking at IB source code. For reference, drivers/usb/core has file with names like hub.c, hcd.c, sysfs.c, ... and net/core has neighbour.c, sock.c, stream.c, .... (If we move our includes somewhere like include/infiniband, I would suggest getting rid of the ib_ prefix from .h files as well, since C files can do "#include ") 2. Should we combine ib_mad.c and ib_agent.c into a single module? In the past I've argued against arbitrarily combining modules, but in this case, ib_agent.c doesn't export any symbols (so dependencies can't bring it in automatically), and everything silently fails if it's not loaded. I'm afraid that "Why are my ports stuck in the DOWN state?" is going to become our most popular FAQ if we don't address this. mthca is auto-loaded by hotplug, and modprobe ib_ipoib will bring in every other required module, so ib_agent is the only problem I see right now. I can't think of any situation where one would want IB drivers loaded without a functioning SMI, so I don't see any disadvantage to having ib_agent.c linked into the same .ko as ib_mad.c. Thanks, Roland From iod00d at hp.com Mon Oct 25 11:17:17 2004 From: iod00d at hp.com (Grant Grundler) Date: Mon, 25 Oct 2004 11:17:17 -0700 Subject: [openib-general] 2 questions on physical code layout In-Reply-To: <52pt36brzm.fsf@topspin.com> References: <52pt36brzm.fsf@topspin.com> Message-ID: <20041025181717.GE22847@cup.hp.com> On Mon, Oct 25, 2004 at 10:57:33AM -0700, Roland Dreier wrote: > I have a couple of questions/suggestions about how we want to arrange > the code for kernel inclusion: > > 1. Does it make sense to remove the ib_ prefixes on .c files in > drivers/infiniband/core? Up to you (or whoever maintains the code). Some drivers that have their own subdir keep the prefixes. e1000 and sym2 drivers are the counter examples I had in mind. grant From roland at topspin.com Mon Oct 25 11:18:40 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 25 Oct 2004 11:18:40 -0700 Subject: [openib-general] 2 questions on physical code layout In-Reply-To: <52pt36brzm.fsf@topspin.com> (Roland Dreier's message of "Mon, 25 Oct 2004 10:57:33 -0700") References: <52pt36brzm.fsf@topspin.com> Message-ID: <52lldubr0f.fsf@topspin.com> By the way, here's what the diff to the code to combine ib_mad and ib_agent looks like (appropriate Makefile changes are also needed). 
Index: linux-kernel/infiniband/core/ib_agent.c =================================================================== --- linux-kernel.orig/infiniband/core/ib_agent.c 2004-10-24 21:28:25.000000000 -0700 +++ linux-kernel/infiniband/core/ib_agent.c 2004-10-25 11:11:46.000000000 -0700 @@ -837,26 +837,8 @@ return; } -static struct ib_client ib_agent_client = { +struct ib_client agent_client = { .name = "ib_agent", .add = ib_agent_init_device, .remove = ib_agent_remove_device }; - -static int __init ib_agent_init(void) -{ - if (ib_register_client(&ib_agent_client)) { - printk(KERN_ERR SPFX "Couldn't register ib_agent client\n"); - return -EINVAL; - } - - return 0; -} - -static void __exit ib_agent_exit(void) -{ - ib_unregister_client(&ib_agent_client); -} - -module_init(ib_agent_init); -module_exit(ib_agent_exit); Index: linux-kernel/infiniband/core/ib_mad.c =================================================================== --- linux-kernel.orig/infiniband/core/ib_mad.c 2004-10-24 21:28:25.000000000 -0700 +++ linux-kernel/infiniband/core/ib_mad.c 2004-10-25 11:12:58.000000000 -0700 @@ -1999,6 +1999,8 @@ .remove = ib_mad_remove_device }; +extern struct ib_client agent_client; + static int __init ib_mad_init_module(void) { int ret; @@ -2022,8 +2024,17 @@ ret = -EINVAL; goto error2; } + + if (ib_register_client(&agent_client)) { + printk(KERN_ERR PFX "Couldn't register ib_agent client\n"); + ret = -EINVAL; + goto error3; + } + return 0; +error3: + ib_unregister_client(&mad_client); error2: kmem_cache_destroy(ib_mad_cache); error1: @@ -2032,6 +2043,7 @@ static void __exit ib_mad_cleanup_module(void) { + ib_unregister_client(&agent_client); ib_unregister_client(&mad_client); if (kmem_cache_destroy(ib_mad_cache)) { From halr at voltaire.com Mon Oct 25 11:21:24 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 25 Oct 2004 14:21:24 -0400 Subject: [openib-general] 2 questions on physical code layout In-Reply-To: <52pt36brzm.fsf@topspin.com> References: <52pt36brzm.fsf@topspin.com> Message-ID: <1098728484.3269.968.camel@localhost.localdomain> On Mon, 2004-10-25 at 13:57, Roland Dreier wrote: > I have a couple of questions/suggestions about how we want to arrange > the code for kernel inclusion: > > 1. Does it make sense to remove the ib_ prefixes on .c files in > drivers/infiniband/core? After all, someone looking in > drivers/infiniband should realize they're looking at IB source > code. For reference, drivers/usb/core has file with names like > hub.c, hcd.c, sysfs.c, ... and net/core has neighbour.c, > sock.c, stream.c, .... > > (If we move our includes somewhere like include/infiniband, I > would suggest getting rid of the ib_ prefix from .h files as > well, since C files can do "#include ") Seems reasonable to me. > 2. Should we combine ib_mad.c and ib_agent.c into a single module? > In the past I've argued against arbitrarily combining modules, > but in this case, ib_agent.c doesn't export any symbols (so > dependencies can't bring it in automatically), and everything > silently fails if it's not loaded. Making the agent(s) a separate module was arbitrary and historical. They can easily be merged. It also seems we could make the MAD module pull in the agent module if we want to keep them separate. > I'm afraid that "Why are my ports stuck in the DOWN state?" is > going to become our most popular FAQ if we don't address this. > mthca is auto-loaded by hotplug, and modprobe ib_ipoib will > bring in every other required module, so ib_agent is the only > problem I see right now. 
> > I can't think of any situation where one would want IB drivers > loaded without a functioning SMI, so I don't see any > disadvantage to having ib_agent.c linked into the same .ko as ib_mad.c. The only situation I can see for this is certain MAD layer tests. -- Hal From mshefty at ichips.intel.com Mon Oct 25 11:22:25 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 25 Oct 2004 11:22:25 -0700 Subject: [openib-general] [PATCH] ib_mad: In ib_mad_complete_recv, decrement agent refcount when not fully reassembled and when no request found In-Reply-To: <1098725705.3269.951.camel@localhost.localdomain> References: <1098639481.3512.6.camel@hpc-1> <20041025101415.4c14f12b.mshefty@ichips.intel.com> <1098725705.3269.951.camel@localhost.localdomain> Message-ID: <20041025112225.077481ec.mshefty@ichips.intel.com> On Mon, 25 Oct 2004 13:35:05 -0400 Hal Rosenstock wrote: > > I think that we want this to be atomic_dec_and_test(). > > (Similar to the call near the end of this function.) If a match is found, > > we can get away with a simple atomic_dec, since the send will still hold a > > reference on the mad agent. But if no match is found, then I think this > > may be the last reference being held. > > I didn't change the match path only the non match. It's not the last > reference held as one other reference is held on the agent which is > given up at deregistration time. I only mentioned the match path, since it can rely on another reference being held against the mad_agent from the send. I.e. the match path and non-match path differ with respect to how many other references are held on the mad_agent. The reference taken for the receive is made by the MAD layer itself, so the client isn't aware that one was taken. If the client deregisters the MAD service at the same time that a MAD is received, then the reference from the registration will have been released, leaving only the reference from the receive being held. So, in the case of a non-match, we need to release the reference, but also check to see if that was the last reference held. - Sean From roland at topspin.com Mon Oct 25 11:26:05 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 25 Oct 2004 11:26:05 -0700 Subject: [openib-general] 2 questions on physical code layout In-Reply-To: <20041025181717.GE22847@cup.hp.com> (Grant Grundler's message of "Mon, 25 Oct 2004 11:17:17 -0700") References: <52pt36brzm.fsf@topspin.com> <20041025181717.GE22847@cup.hp.com> Message-ID: <52hdoibqo2.fsf@topspin.com> Grant> Up to you (or whoever maintains the code). Some drivers Grant> that have their own subdir keep the prefixes. e1000 and Grant> sym2 drivers are the counter examples I had in mind. Good point... well, I'm getting sick of typing ib_ :) Also I'd argue that the e1000_ or sym_ prefixes are more needed because the drivers are so close to drivers/net and drivers/scsi -- although they are in their own subdirectories there's not as much mental separation from other net/scsi drivers (eg I often look at tg3.c and then e1000_main.c, so e1000/main.c might be a little confusing to me). Anyway this doesn't matter that much to me but I'd like to resolve it one way or another before we try to go upstream. Right now my drivers/infiniband/core directory has both ib_verbs.c, ib_sysfs.c, etc. as well as plain file names like packer.c, ud_header.c. - R. 
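The reference-counting idiom discussed above, which the patch further down adopts, has roughly this shape (a sketch only; the refcount and wait field names follow the MAD code, the two functions are invented):

/* Drop one reference; if it was the last, wake a waiting deregistration. */
static inline void deref_mad_agent(struct ib_mad_agent_private *mad_agent_priv)
{
        if (atomic_dec_and_test(&mad_agent_priv->refcount))
                wake_up(&mad_agent_priv->wait);
}

/* Deregistration drops the registration's own reference, then blocks until
 * every outstanding send/receive has dropped its reference as well. */
static void wait_for_mad_agent_idle(struct ib_mad_agent_private *mad_agent_priv)
{
        deref_mad_agent(mad_agent_priv);
        wait_event(mad_agent_priv->wait,
                   !atomic_read(&mad_agent_priv->refcount));
}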
From halr at voltaire.com Mon Oct 25 11:31:37 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 25 Oct 2004 14:31:37 -0400 Subject: [openib-general] [PATCH] ib_mad: In ib_mad_complete_recv, decrement agent refcount when not fully reassembled and when no request found In-Reply-To: <20041025112225.077481ec.mshefty@ichips.intel.com> References: <1098639481.3512.6.camel@hpc-1> <20041025101415.4c14f12b.mshefty@ichips.intel.com> <1098725705.3269.951.camel@localhost.localdomain> <20041025112225.077481ec.mshefty@ichips.intel.com> Message-ID: <1098729097.3269.971.camel@localhost.localdomain> On Mon, 2004-10-25 at 14:22, Sean Hefty wrote: > The reference taken for the receive is made by the MAD layer itself, so the client isn't aware that > one was taken. If the client deregisters the MAD service at the same time that a MAD is received, > then the reference from the registration will have been released, leaving only the reference from > the receive being held. So, in the case of a non-match, we need to release the reference, but also > check to see if that was the last reference held. OK. It seems to apply to both as you said in your original email. I will generate a patch for this. -- Hal From halr at voltaire.com Mon Oct 25 11:57:20 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 25 Oct 2004 14:57:20 -0400 Subject: [openib-general] [PATCH] ib_mad: In ib_mad_complete_recv, use atomic_dec_and_test and wait_event rather than just atomic_dec for RMPP segments and when solicited and there is no matching request found Message-ID: <1098730639.6186.3.camel@hpc-1> ib_mad: In ib_mad_complete_recv, use atomic_dec_and_test and wait_event rather than just atomic_dec for RMPP segments and when solicited and there is no matching request found. Index: ib_mad.c =================================================================== --- ib_mad.c (revision 1062) +++ ib_mad.c (working copy) @@ -875,7 +875,8 @@ /* Fully reassemble receive before processing */ recv = reassemble_recv(mad_agent_priv, recv); if (!recv) { - atomic_dec(&mad_agent_priv->refcount); + if (atomic_dec_and_test(&mad_agent_priv->refcount)) + wake_up(&mad_agent_priv->wait); return; } @@ -887,7 +888,8 @@ if (!mad_send_wr) { spin_unlock_irqrestore(&mad_agent_priv->lock, flags); ib_free_recv_mad(&recv->header.recv_wc); - atomic_dec(&mad_agent_priv->refcount); + if (atomic_dec_and_test(&mad_agent_priv->refcount)) + wake_up(&mad_agent_priv->wait); return; } /* Timeout = 0 means that we won't wait for a response */ From halr at voltaire.com Mon Oct 25 12:41:04 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 25 Oct 2004 15:41:04 -0400 Subject: [openib-general] [PATCH] Start moving to a native IPoIB driver In-Reply-To: <52pt3vpoxz.fsf@topspin.com> References: <52pt3vpoxz.fsf@topspin.com> Message-ID: <1098733264.3266.6.camel@localhost.localdomain> On Wed, 2004-10-06 at 16:32, Roland Dreier wrote: > I've just committed this patch. It removes the fake ethernet layer > and starts turning IPoIB into a native driver (with addr_len 20 and > type ARPHRD_INFINIBAND). The driver is working pretty well with these > changes, although multicast is not working at all and there are lots > of leaks and races that I still need to fix up. In terms of the change to net/ip.h in linux-2.6.9-ipoib-multicast.diff, I'm curious as to why the mapping doesn't match the MGID layout (and where that conversion is done). Also, why is there a placeholder for the multicast QPN here ? The endian also looks a little odd to me as well. 
I will take more of a look here when I get a chance in the not too distant future. -- Hal From halr at voltaire.com Mon Oct 25 12:56:33 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 25 Oct 2004 15:56:33 -0400 Subject: [openib-general] Handling SM class (SMInfo vs. other queries) In-Reply-To: <52u0sibsfq.fsf@topspin.com> References: <52ekjqfq88.fsf@topspin.com> <1098480059.22400.69.camel@hpc-1> <52mzyee981.fsf@topspin.com> <20041022152012.4612097b.mshefty@ichips.intel.com> <52is92e5mh.fsf@topspin.com> <1098722375.3269.934.camel@localhost.localdomain> <52breqd8t8.fsf@topspin.com> <20041025102752.3113fb69.mshefty@ichips.intel.com> <52y8hubt2m.fsf@topspin.com> <20041025104115.275dc24b.mshefty@ichips.intel.com> <52u0sibsfq.fsf@topspin.com> Message-ID: <1098734193.3266.8.camel@localhost.localdomain> On Mon, 2004-10-25 at 13:47, Roland Dreier wrote: > I guess right before dispatch to other clients -- there's no way for > the low-level driver to tell whether or not the hop_count/hop_ptr > checks and updates have been done or not, so it would just > (incorrectly) consume the MAD and generate a response the first time > around. Do the responses get back to the SM ? If so, would there be 2 responses for certain things ? -- Hal From roland at topspin.com Mon Oct 25 13:09:47 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 25 Oct 2004 13:09:47 -0700 Subject: [openib-general] [PATCH] Start moving to a native IPoIB driver In-Reply-To: <1098733264.3266.6.camel@localhost.localdomain> (Hal Rosenstock's message of "Mon, 25 Oct 2004 15:41:04 -0400") References: <52pt3vpoxz.fsf@topspin.com> <1098733264.3266.6.camel@localhost.localdomain> Message-ID: <52d5z6blv8.fsf@topspin.com> Hal> In terms of the change to net/ip.h in Hal> linux-2.6.9-ipoib-multicast.diff, I'm curious as to why the Hal> mapping doesn't match the MGID layout (and where that Hal> conversion is done). Also, why is there a placeholder for the Hal> multicast QPN here ? The endian also looks a little odd to me Hal> as well. ip_ib_mc_map() is mapping an IPv4 multicast address to an IPoIB hardware address. So we have to create the full 20-byte address (as used by ARP etc), which means the first 4 bytes are reserved+QPN. I just looked over what I wrote again and it seems to be correct, including endianness: we put ff 12 40 1b as the top 4 bytes of the GID, and then the low 28 bits of the IPv4 address as the last 4 bytes of the GID. (Endianness is a little tricky because I used the same idiom as the rest of net/ip.h, where we do addr = ntohl(addr) and then fill in the group ID in reverse order). - R. From roland at topspin.com Mon Oct 25 13:10:58 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 25 Oct 2004 13:10:58 -0700 Subject: [openib-general] Handling SM class (SMInfo vs. 
other queries) In-Reply-To: <1098734193.3266.8.camel@localhost.localdomain> (Hal Rosenstock's message of "Mon, 25 Oct 2004 15:56:33 -0400") References: <52ekjqfq88.fsf@topspin.com> <1098480059.22400.69.camel@hpc-1> <52mzyee981.fsf@topspin.com> <20041022152012.4612097b.mshefty@ichips.intel.com> <52is92e5mh.fsf@topspin.com> <1098722375.3269.934.camel@localhost.localdomain> <52breqd8t8.fsf@topspin.com> <20041025102752.3113fb69.mshefty@ichips.intel.com> <52y8hubt2m.fsf@topspin.com> <20041025104115.275dc24b.mshefty@ichips.intel.com> <52u0sibsfq.fsf@topspin.com> <1098734193.3266.8.camel@localhost.localdomain> Message-ID: <528y9ublt9.fsf@topspin.com> Roland> I guess right before dispatch to other clients -- there's Roland> no way for the low-level driver to tell whether or not the Roland> hop_count/hop_ptr checks and updates have been done or Roland> not, so it would just (incorrectly) consume the MAD and Roland> generate a response the first time around. Hal> Do the responses get back to the SM ? If so, would there be 2 Hal> responses for certain things ? This shouldn't happen -- I was just describing what would happen if we followed Sean's idea of passing the same MAD to the low-level driver twice. Just to be clear -- SMI checks should happen before the low-level driver gets the MAD, and the low-level driver should only get the MAD once (and only generate one reply if necessary). - R. From krkumar at us.ibm.com Mon Oct 25 13:11:11 2004 From: krkumar at us.ibm.com (Krishna Kumar) Date: Mon, 25 Oct 2004 13:11:11 -0700 (PDT) Subject: [openib-general] [PATCH] Consolidate access to ib_agent_port_list Message-ID: Hi, This patch is similar to one for MAD that I sent some time earlier. I could also have split the search routine into two, a get_by_dev and a get_by_agent, but I felt it was too cumbersome.
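For comparison, the split KK mentions might have looked roughly like the sketch below (hypothetical function names, with field names taken from the patch that follows; the combined __ib_get_agent_mad() in the patch avoids writing the list walk twice):

/*
 * Sketch only: the two narrower lookup helpers KK decided against.
 * Both assume ib_agent_port_list_lock is already held by the caller.
 */
static struct ib_agent_port_private *
__ib_get_agent_port_by_dev(struct ib_device *device, int port_num)
{
        struct ib_agent_port_private *entry;

        list_for_each_entry(entry, &ib_agent_port_list, port_list) {
                if (entry->dr_smp_agent->device == device &&
                    entry->port_num == port_num)
                        return entry;
        }
        return NULL;
}

static struct ib_agent_port_private *
__ib_get_agent_port_by_agent(struct ib_mad_agent *mad_agent)
{
        struct ib_agent_port_private *entry;

        list_for_each_entry(entry, &ib_agent_port_list, port_list) {
                if (entry->dr_smp_agent == mad_agent ||
                    entry->lr_smp_agent == mad_agent ||
                    entry->perf_mgmt_agent == mad_agent)
                        return entry;
        }
        return NULL;
}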
Thanks, - KK --- ib_agent.c.org 2004-10-25 12:37:56.000000000 -0700 +++ ib_agent.c 2004-10-25 12:42:55.000000000 -0700 @@ -303,12 +303,52 @@ slid, mad, mad_response); } +static inline struct ib_agent_port_private * +__ib_get_agent_mad(struct ib_device *device, int port_num, + struct ib_mad_agent *mad_agent) +{ + struct ib_agent_port_private *entry; + + BUG_ON(!spin_is_locked(&ib_agent_port_list_lock); + BUG_ON(!(!!device ^ !!mad_agent)); /* Exactly one MUST be (!NULL) */ + + if (device) { + list_for_each_entry(entry, &ib_agent_port_list, port_list) { + if (entry->dr_smp_agent->device == device && + entry->port_num == port_num) + return entry; + } + } else { + list_for_each_entry(entry, &ib_agent_port_list, port_list) { + if ((entry->dr_smp_agent == mad_agent) || + (entry->lr_smp_agent == mad_agent) || + (entry->perf_mgmt_agent == mad_agent)) + return entry; + } + } + return NULL; +} + +static inline struct ib_agent_port_private * +ib_get_agent_mad(struct ib_device *device, int port_num, + struct ib_mad_agent *mad_agent) +{ + struct ib_agent_port_private *entry; + unsigned long flags; + + spin_lock_irqsave(&ib_agent_port_list_lock, flags); + entry = __ib_get_agent_mad(device, port_num, mad_agent); + spin_unlock_irqrestore(&ib_agent_port_list_lock, flags); + + return entry; +} + void agent_mad_send(struct ib_mad_agent *mad_agent, struct ib_mad *mad, struct ib_grh *grh, struct ib_mad_recv_wc *mad_recv_wc) { - struct ib_agent_port_private *entry, *port_priv = NULL; + struct ib_agent_port_private *port_priv; struct ib_agent_send_wr *agent_send_wr; struct ib_sge gather_list; struct ib_send_wr send_wr; @@ -318,16 +358,7 @@ unsigned long flags; /* Find matching MAD agent */ - spin_lock_irqsave(&ib_agent_port_list_lock, flags); - list_for_each_entry(entry, &ib_agent_port_list, port_list) { - if ((entry->dr_smp_agent == mad_agent) || - (entry->lr_smp_agent == mad_agent) || - (entry->perf_mgmt_agent == mad_agent)) { - port_priv = entry; - break; - } - } - spin_unlock_irqrestore(&ib_agent_port_list_lock, flags); + port_priv = ib_get_agent_mad(NULL, 0, mad_agent); if (!port_priv) { printk(KERN_ERR SPFX "agent_mad_send: no matching MAD agent %p\n", mad_agent); @@ -524,22 +555,13 @@ static void agent_send_handler(struct ib_mad_agent *mad_agent, struct ib_mad_send_wc *mad_send_wc) { - struct ib_agent_port_private *entry, *port_priv = NULL; + struct ib_agent_port_private *port_priv; struct ib_agent_send_wr *agent_send_wr; struct list_head *send_wr; unsigned long flags; /* Find matching MAD agent */ - spin_lock_irqsave(&ib_agent_port_list_lock, flags); - list_for_each_entry(entry, &ib_agent_port_list, port_list) { - if ((entry->dr_smp_agent == mad_agent) || - (entry->lr_smp_agent == mad_agent) || - (entry->perf_mgmt_agent == mad_agent)) { - port_priv = entry; - break; - } - } - spin_unlock_irqrestore(&ib_agent_port_list_lock, flags); + port_priv = ib_get_agent_mad(NULL, 0, mad_agent); if (!port_priv) { printk(KERN_ERR SPFX "agent_send_handler: no matching MAD agent " "%p\n", mad_agent); @@ -579,20 +601,10 @@ static void agent_recv_handler(struct ib_mad_agent *mad_agent, struct ib_mad_recv_wc *mad_recv_wc) { - struct ib_agent_port_private *entry, *port_priv = NULL; - unsigned long flags; + struct ib_agent_port_private *port_priv; /* Find matching MAD agent */ - spin_lock_irqsave(&ib_agent_port_list_lock, flags); - list_for_each_entry(entry, &ib_agent_port_list, port_list) { - if ((entry->dr_smp_agent == mad_agent) || - (entry->lr_smp_agent == mad_agent) || - (entry->perf_mgmt_agent == mad_agent)) { - 
port_priv = entry; - break; - } - } - spin_unlock_irqrestore(&ib_agent_port_list_lock, flags); + port_priv = ib_get_agent_mad(NULL, 0, mad_agent); if (!port_priv) { printk(KERN_ERR SPFX "agent_recv_handler: no matching MAD agent %p\n", mad_agent); @@ -615,20 +627,12 @@ .addr = 0, .size = (unsigned long) high_memory - PAGE_OFFSET }; - struct ib_agent_port_private *entry, *port_priv = NULL; + struct ib_agent_port_private *port_priv; struct ib_mad_reg_req reg_req; unsigned long flags; /* First, check if port already open for SMI */ - spin_lock_irqsave(&ib_agent_port_list_lock, flags); - list_for_each_entry(entry, &ib_agent_port_list, port_list) { - if (entry->dr_smp_agent->device == device && - entry->port_num == port_num) { - port_priv = entry; - break; - } - } - spin_unlock_irqrestore(&ib_agent_port_list_lock, flags); + port_priv = ib_get_agent_mad(device, port_num, NULL); if (port_priv) { printk(KERN_DEBUG SPFX "%s port %d already open\n", device->name, port_num); @@ -729,21 +733,14 @@ static int ib_agent_port_close(struct ib_device *device, int port_num) { - struct ib_agent_port_private *entry, *port_priv = NULL; + struct ib_agent_port_private *port_priv; unsigned long flags; spin_lock_irqsave(&ib_agent_port_list_lock, flags); - list_for_each_entry(entry, &ib_agent_port_list, port_list) { - if (entry->dr_smp_agent->device == device && - entry->port_num == port_num) { - port_priv = entry; - break; - } - } - + port_priv = __ib_get_agent_mad(NULL, 0, mad_agent); if (port_priv == NULL) { - printk(KERN_ERR SPFX "Port %d not found\n", port_num); spin_unlock_irqrestore(&ib_agent_port_list_lock, flags); + printk(KERN_ERR SPFX "Port %d not found\n", port_num); return -ENODEV; } From roland at topspin.com Mon Oct 25 13:22:05 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 25 Oct 2004 13:22:05 -0700 Subject: [openib-general] [PATCH] Fix GFP mask inside spinlock Message-ID: <524qkiblaq.fsf@topspin.com> add_mad_reg_req() is called with a spinlock held, so it has to allocate with GFP_ATOMIC. (It might be better to reorganize the code so the allocation can happen outside the lock but I didn't attempt that) Index: linux-kernel/infiniband/core/ib_mad.c =================================================================== --- linux-kernel.orig/infiniband/core/ib_mad.c 2004-10-25 12:16:15.000000000 -0700 +++ linux-kernel/infiniband/core/ib_mad.c 2004-10-25 13:19:32.000000000 -0700 @@ -597,7 +597,7 @@ class = &private->version[mad_reg_req->mgmt_class_version]; if (!*class) { /* Allocate management class table for "new" class version */ - *class = kmalloc(sizeof **class, GFP_KERNEL); + *class = kmalloc(sizeof **class, GFP_ATOMIC); if (!*class) { printk(KERN_ERR PFX "No memory for " "ib_mad_mgmt_class_table\n"); From roland at topspin.com Mon Oct 25 13:24:23 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 25 Oct 2004 13:24:23 -0700 Subject: [openib-general] [PATCH] Fix GFP mask inside spinlock In-Reply-To: <524qkiblaq.fsf@topspin.com> (Roland Dreier's message of "Mon, 25 Oct 2004 13:22:05 -0700") References: <524qkiblaq.fsf@topspin.com> Message-ID: <52zn2aa6mg.fsf@topspin.com> Actually allocate_method_table() needs the same treatment... 
here's an updated patch: Index: linux-kernel/infiniband/core/ib_mad.c =================================================================== --- linux-kernel.orig/infiniband/core/ib_mad.c 2004-10-25 12:16:15.000000000 -0700 +++ linux-kernel/infiniband/core/ib_mad.c 2004-10-25 13:23:27.000000000 -0700 @@ -524,7 +524,7 @@ static int allocate_method_table(struct ib_mad_mgmt_method_table **method) { /* Allocate management method table */ - *method = kmalloc(sizeof **method, GFP_KERNEL); + *method = kmalloc(sizeof **method, GFP_ATOMIC); if (!*method) { printk(KERN_ERR PFX "No memory for ib_mad_mgmt_method_table\n"); return -ENOMEM; @@ -597,7 +597,7 @@ class = &private->version[mad_reg_req->mgmt_class_version]; if (!*class) { /* Allocate management class table for "new" class version */ - *class = kmalloc(sizeof **class, GFP_KERNEL); + *class = kmalloc(sizeof **class, GFP_ATOMIC); if (!*class) { printk(KERN_ERR PFX "No memory for " "ib_mad_mgmt_class_table\n"); From roland at topspin.com Mon Oct 25 14:09:01 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 25 Oct 2004 14:09:01 -0700 Subject: [openib-general] Bug in ib_mad.c completion handling Message-ID: <52vfcya4k2.fsf@topspin.com> ib_mad_completion_handler switches on wc.opcode, even if wc.status is not SUCCESS. This isn't correct, since according to the IB spec, wc.opcode may not be valid if the completion has an unsuccessful status. - R. From halr at voltaire.com Mon Oct 25 15:07:18 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 25 Oct 2004 18:07:18 -0400 Subject: [openib-general] Bug in ib_mad.c completion handling In-Reply-To: <52vfcya4k2.fsf@topspin.com> References: <52vfcya4k2.fsf@topspin.com> Message-ID: <1098742038.3266.28.camel@localhost.localdomain> On Mon, 2004-10-25 at 17:09, Roland Dreier wrote: > ib_mad_completion_handler switches on wc.opcode, even if wc.status is > not SUCCESS. This isn't correct, since according to the IB spec, > wc.opcode may not be valid if the completion has an unsuccessful > status. Can you cite the compliance or chapter/verse on this ? Thanks. -- Hal From mshefty at ichips.intel.com Mon Oct 25 15:10:04 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 25 Oct 2004 15:10:04 -0700 Subject: [openib-general] Bug in ib_mad.c completion handling In-Reply-To: <1098742038.3266.28.camel@localhost.localdomain> References: <52vfcya4k2.fsf@topspin.com> <1098742038.3266.28.camel@localhost.localdomain> Message-ID: <20041025151004.57add65a.mshefty@ichips.intel.com> On Mon, 25 Oct 2004 18:07:18 -0400 Hal Rosenstock wrote: > On Mon, 2004-10-25 at 17:09, Roland Dreier wrote: > > ib_mad_completion_handler switches on wc.opcode, even if wc.status is > > not SUCCESS. This isn't correct, since according to the IB spec, > > wc.opcode may not be valid if the completion has an unsuccessful > > status. > > Can you cite the compliance or chapter/verse on this ? See 11.4.2.1 of the 1.1 spec version (poll_cq documentation). 
- Sean From halr at voltaire.com Mon Oct 25 15:23:02 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 25 Oct 2004 18:23:02 -0400 Subject: [openib-general] Bug in ib_mad.c completion handling In-Reply-To: <20041025151004.57add65a.mshefty@ichips.intel.com> References: <52vfcya4k2.fsf@topspin.com> <1098742038.3266.28.camel@localhost.localdomain> <20041025151004.57add65a.mshefty@ichips.intel.com> Message-ID: <1098742982.3266.37.camel@localhost.localdomain> On Mon, 2004-10-25 at 18:10, Sean Hefty wrote: > On Mon, 25 Oct 2004 18:07:18 -0400 > Hal Rosenstock wrote: > > > On Mon, 2004-10-25 at 17:09, Roland Dreier wrote: > > > ib_mad_completion_handler switches on wc.opcode, even if wc.status is > > > not SUCCESS. This isn't correct, since according to the IB spec, > > > wc.opcode may not be valid if the completion has an unsuccessful > > > status. > > > > Can you cite the compliance or chapter/verse on this ? > > See 11.4.2.1 of the 1.1 spec version (poll_cq documentation). It says "undefined except as noted below" and operation type (as well as WRID) are noted below. -- Hal From mshefty at ichips.intel.com Mon Oct 25 15:24:57 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 25 Oct 2004 15:24:57 -0700 Subject: [openib-general] Bug in ib_mad.c completion handling In-Reply-To: <1098742982.3266.37.camel@localhost.localdomain> References: <52vfcya4k2.fsf@topspin.com> <1098742038.3266.28.camel@localhost.localdomain> <20041025151004.57add65a.mshefty@ichips.intel.com> <1098742982.3266.37.camel@localhost.localdomain> Message-ID: <20041025152457.2d2bbbbf.mshefty@ichips.intel.com> On Mon, 25 Oct 2004 18:23:02 -0400 Hal Rosenstock wrote: > On Mon, 2004-10-25 at 18:10, Sean Hefty wrote: > > On Mon, 25 Oct 2004 18:07:18 -0400 > > Hal Rosenstock wrote: > > > > > On Mon, 2004-10-25 at 17:09, Roland Dreier wrote: > > > > ib_mad_completion_handler switches on wc.opcode, even if wc.status is > > > > not SUCCESS. This isn't correct, since according to the IB spec, > > > > wc.opcode may not be valid if the completion has an unsuccessful > > > > status. > > > > > > Can you cite the compliance or chapter/verse on this ? > > > > See 11.4.2.1 of the 1.1 spec version (poll_cq documentation). > > It says "undefined except as noted below" and operation type (as well as > WRID) are noted below. My interpretation is that only WRID is valid, since it is explicitly called out. All other fields are invalid. - Sean From halr at voltaire.com Mon Oct 25 15:34:22 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 25 Oct 2004 18:34:22 -0400 Subject: [openib-general] Bug in ib_mad.c completion handling In-Reply-To: <20041025152457.2d2bbbbf.mshefty@ichips.intel.com> References: <52vfcya4k2.fsf@topspin.com> <1098742038.3266.28.camel@localhost.localdomain> <20041025151004.57add65a.mshefty@ichips.intel.com> <1098742982.3266.37.camel@localhost.localdomain> <20041025152457.2d2bbbbf.mshefty@ichips.intel.com> Message-ID: <1098743662.3266.42.camel@localhost.localdomain> On Mon, 2004-10-25 at 18:24, Sean Hefty wrote: > > > See 11.4.2.1 of the 1.1 spec version (poll_cq documentation). > > > > It says "undefined except as noted below" and operation type (as well as > > WRID) are noted below. > > My interpretation is that only WRID is valid, since it is explicitly called out. All other fields are invalid. Seems to me operation type is also called out "equally". 
Call the spec lawyers :-) -- Hal From roland at topspin.com Mon Oct 25 16:22:23 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 25 Oct 2004 16:22:23 -0700 Subject: [openib-general] Bug in ib_mad.c completion handling In-Reply-To: <1098743662.3266.42.camel@localhost.localdomain> (Hal Rosenstock's message of "Mon, 25 Oct 2004 18:34:22 -0400") References: <52vfcya4k2.fsf@topspin.com> <1098742038.3266.28.camel@localhost.localdomain> <20041025151004.57add65a.mshefty@ichips.intel.com> <1098742982.3266.37.camel@localhost.localdomain> <20041025152457.2d2bbbbf.mshefty@ichips.intel.com> <1098743662.3266.42.camel@localhost.localdomain> Message-ID: <52r7nm9yds.fsf@topspin.com> Hal> Seems to me operation type is also called out "equally". Call Hal> the spec lawyers :-) I don't think your interpretation can be reconciled with the spec. 11.4.2.1 says: If the status of the operation that generates the Work Completion is anything other than success, the contents of the Work Completion are undefined except as noted below. The contents of a Work Completion are: and what follows is a list of _every_ component of a Work Completion. For WR ID, status and freed resource count _only_, the spec says: This is always valid, regardless of the status of the operation. Clearly the intent of the spec is that these 3 fields are the only fields required to be valid for unsuccessful completions. Both mthca and the Mellanox THCA driver followed this interpretation. - R. From halr at voltaire.com Mon Oct 25 18:32:15 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 25 Oct 2004 21:32:15 -0400 Subject: [openib-general] Bug in ib_mad.c completion handling In-Reply-To: <52r7nm9yds.fsf@topspin.com> References: <52vfcya4k2.fsf@topspin.com> <1098742038.3266.28.camel@localhost.localdomain> <20041025151004.57add65a.mshefty@ichips.intel.com> <1098742982.3266.37.camel@localhost.localdomain> <20041025152457.2d2bbbbf.mshefty@ichips.intel.com> <1098743662.3266.42.camel@localhost.localdomain> <52r7nm9yds.fsf@topspin.com> Message-ID: <1098754335.3266.207.camel@localhost.localdomain> On Mon, 2004-10-25 at 19:22, Roland Dreier wrote: > and what follows is a list of _every_ component of a Work Completion. > For WR ID, status and freed resource count _only_, the spec says: > > This is always valid, regardless of the status of the operation. That's what I missed. Thanks. I'll fix the code. -- Hal From halr at voltaire.com Mon Oct 25 18:38:40 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 25 Oct 2004 21:38:40 -0400 Subject: [openib-general] [PATCH] Fix GFP mask inside spinlock In-Reply-To: <52zn2aa6mg.fsf@topspin.com> References: <524qkiblaq.fsf@topspin.com> <52zn2aa6mg.fsf@topspin.com> Message-ID: <1098754720.3266.215.camel@localhost.localdomain> On Mon, 2004-10-25 at 16:24, Roland Dreier wrote: > Actually allocate_method_table() needs the same treatment... Thanks. Applied. -- Hal From halr at voltaire.com Mon Oct 25 19:01:07 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 25 Oct 2004 22:01:07 -0400 Subject: [openib-general] [PATCH] ib_mad.c: Fix bug in completion handler when status != success Message-ID: <1098756067.10520.4.camel@hpc-1> ib_mad.c: Fix bug in completion handler when status != success ib_mad_completion_handler switches on wc.opcode, even if wc.status is not SUCCESS. This isn't correct, since according to the IB spec, wc.opcode may not be valid if the completion has an unsuccessful status. 
Index: ib_mad.c =================================================================== --- ib_mad.c (revision 1069) +++ ib_mad.c (working copy) @@ -1165,27 +1165,24 @@ ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP); while (ib_poll_cq(port_priv->cq, 1, &wc) == 1) { - printk(KERN_DEBUG PFX "Completion opcode 0x%x WRID 0x%Lx\n", - wc.opcode, (unsigned long long) wc.wr_id); - switch (wc.opcode) { - case IB_WC_SEND: - if (wc.status != IB_WC_SUCCESS) - printk(KERN_ERR PFX "Send completion error %d\n", - wc.status); - ib_mad_send_done_handler(port_priv, &wc); - break; - case IB_WC_RECV: - if (wc.status != IB_WC_SUCCESS) - printk(KERN_ERR PFX "Recv completion error %d\n", - wc.status); - ib_mad_recv_done_handler(port_priv, &wc); - break; - default: - printk(KERN_ERR PFX "Wrong Opcode 0x%x on completion\n", - wc.opcode); - if (wc.status) { - printk(KERN_ERR PFX "Completion error %d\n", - wc.status); + if (wc.status != IB_WC_SUCCESS) { + printk(KERN_ERR PFX "Completion error %d WRID 0x%Lx\n", + wc.status, (unsigned long long) wc.wr_id); + } else { + printk(KERN_DEBUG PFX "Completion opcode 0x%x WRID 0x%Lx\n", + wc.opcode, (unsigned long long) wc.wr_id); + + switch (wc.opcode) { + case IB_WC_SEND: + ib_mad_send_done_handler(port_priv, &wc); + break; + case IB_WC_RECV: + ib_mad_recv_done_handler(port_priv, &wc); + break; + default: + printk(KERN_ERR PFX "Wrong Opcode 0x%x on completion\n", + wc.opcode); + break; } } } From halr at voltaire.com Mon Oct 25 19:06:46 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 25 Oct 2004 22:06:46 -0400 Subject: [openib-general] [PATCH] Start moving to a native IPoIB driver In-Reply-To: <52d5z6blv8.fsf@topspin.com> References: <52pt3vpoxz.fsf@topspin.com> <1098733264.3266.6.camel@localhost.localdomain> <52d5z6blv8.fsf@topspin.com> Message-ID: <1098756406.3266.246.camel@localhost.localdomain> On Mon, 2004-10-25 at 16:09, Roland Dreier wrote: > Hal> In terms of the change to net/ip.h in > Hal> linux-2.6.9-ipoib-multicast.diff, I'm curious as to why the > Hal> mapping doesn't match the MGID layout (and where that > Hal> conversion is done). Also, why is there a placeholder for the > Hal> multicast QPN here ? The endian also looks a little odd to me > Hal> as well. > > ip_ib_mc_map() is mapping an IPv4 multicast address to an IPoIB > hardware address. So we have to create the full 20-byte address (as > used by ARP etc), which means the first 4 bytes are reserved+QPN. OK; I thought the mapping was just to MGID but it is hardware address (QPN + MGID) even though multicast doesn't use ARP (but the QPN for the MGID is needed to multicast). > I just looked over what I wrote again and it seems to be correct, > including endianness: we put ff 12 40 1b as the top 4 bytes of the > GID, and then the low 28 bits of the IPv4 address as the last 4 bytes > of the GID. That sounds right. > (Endianness is a little tricky because I used the same idiom as the > rest of net/ip.h, where we do addr = ntohl(addr) and then fill in the > group ID in reverse order). That's what made it seem odd to me. Should've looked harder. Thanks. 
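For illustration, the mapping Roland describes might look roughly like the sketch below (not the committed net/ip.h diff; the helper name is hypothetical and the leading four bytes are simply zeroed here as the QPN placeholder):

/*
 * Sketch: map an IPv4 multicast address to a 20-byte IPoIB hardware
 * address.  Bytes 0-3 are a placeholder for the group's QPN (filled in
 * once the multicast group is joined); bytes 4-19 are the MGID
 * ff12:401b::<low 28 bits of the IPv4 address>.
 */
static inline void ip_ib_mc_map_sketch(u32 addr, char *buf)
{
        memset(buf, 0, 20);
        buf[4]  = 0xff;         /* start of the MGID */
        buf[5]  = 0x12;         /* link-local scope */
        buf[6]  = 0x40;         /* IPv4 signature */
        buf[7]  = 0x1b;
        addr    = ntohl(addr);  /* same idiom as the rest of net/ip.h */
        buf[19] = addr & 0xff;  /* group ID filled in reverse order */
        addr  >>= 8;
        buf[18] = addr & 0xff;
        addr  >>= 8;
        buf[17] = addr & 0xff;
        addr  >>= 8;
        buf[16] = addr & 0x0f;  /* only the low 28 bits are mapped */
}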
-- Hal From roland at topspin.com Mon Oct 25 19:12:30 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 25 Oct 2004 19:12:30 -0700 Subject: [openib-general] [PATCH] ib_mad.c: Fix bug in completion handler when status != success In-Reply-To: <1098756067.10520.4.camel@hpc-1> (Hal Rosenstock's message of "Mon, 25 Oct 2004 22:01:07 -0400") References: <1098756067.10520.4.camel@hpc-1> Message-ID: <52acua9qi9.fsf@topspin.com> This is no good: + if (wc.status != IB_WC_SUCCESS) { + printk(KERN_ERR PFX "Completion error %d WRID 0x%Lx\n", + wc.status, (unsigned long long) wc.wr_id); + } else if a request fails we'll never complete it. eg if a consumer uses a bad L_Key for a send request, the send will never complete (and if the consumer ever does ib_unregister_mad_agent it will hang waiting for the send to finish). - R. From halr at voltaire.com Mon Oct 25 19:25:25 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 25 Oct 2004 22:25:25 -0400 Subject: [openib-general] [PATCH] ib_mad.c: Fix bug in completion handler when status != success In-Reply-To: <52acua9qi9.fsf@topspin.com> References: <1098756067.10520.4.camel@hpc-1> <52acua9qi9.fsf@topspin.com> Message-ID: <1098757525.3266.252.camel@localhost.localdomain> On Mon, 2004-10-25 at 22:12, Roland Dreier wrote: > This is no good: > > + if (wc.status != IB_WC_SUCCESS) { > + printk(KERN_ERR PFX "Completion error %d WRID 0x%Lx\n", > + wc.status, (unsigned long long) > wc.wr_id); > + } else > > if a request fails we'll never complete it. eg if a consumer uses a > bad L_Key for a send request, the send will never complete (and if the > consumer ever does ib_unregister_mad_agent it will hang waiting for > the send to finish). It looks like I also need to add in a call to ib_mad_send_done_handler in this case as well (when status != success) to handle the case you cite. -- Hal From halr at voltaire.com Mon Oct 25 19:42:55 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 25 Oct 2004 22:42:55 -0400 Subject: [openib-general] [PATCH] ib_mad: In completion handler, when status != success call send done handler Message-ID: <1098758558.14442.6.camel@hpc-1> ib_mad: In completion handler, when status != success call send done handler so that if a request fails we'll complete it. eg if a consumer uses a bad L_Key for a send request, the send will complete (and if the consumer ever does ib_unregister_mad_agent it will not hang waiting for the send to finish). Index: ib_mad.c =================================================================== --- ib_mad.c (revision 1070) +++ ib_mad.c (working copy) @@ -1168,6 +1168,7 @@ if (wc.status != IB_WC_SUCCESS) { printk(KERN_ERR PFX "Completion error %d WRID 0x%Lx\n", wc.status, (unsigned long long) wc.wr_id); + ib_mad_send_done_handler(port_priv, &wc); } else { printk(KERN_DEBUG PFX "Completion opcode 0x%x WRID 0x%Lx\n", wc.opcode, (unsigned long long) wc.wr_id); From halr at voltaire.com Tue Oct 26 06:41:30 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 26 Oct 2004 09:41:30 -0400 Subject: [openib-general] Handling SM class (SMInfo vs. 
other queries) In-Reply-To: <52wtxed9le.fsf@topspin.com> References: <52ekjqfq88.fsf@topspin.com> <1098480059.22400.69.camel@hpc-1> <52mzyee981.fsf@topspin.com> <1098722118.3269.929.camel@localhost.localdomain> <52wtxed9le.fsf@topspin.com> Message-ID: <1098798090.3266.7.camel@localhost.localdomain> On Mon, 2004-10-25 at 12:51, Roland Dreier wrote: > (and I haven't thought through > the CM yet but I'm a little worried about how the solicited/unsolicited > distinction that the MAD layer makes will fit with the CM). All CM MADs are (considered) unsolicited. There was a thread on this a while back which I can dig out if needed. -- Hal From halr at voltaire.com Tue Oct 26 06:47:48 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 26 Oct 2004 09:47:48 -0400 Subject: [openib-general] Kernel 2.6.9 In-Reply-To: <52fz42d969.fsf@topspin.com> References: <52ekjuaen9.fsf@topspin.com> <20041021084412.GL21516@cern.ch> <1098381416.2389.9.camel@duffman> <20041022113524.GI21516@cern.ch> <52d5zahh3y.fsf@topspin.com> <1098469091.22400.0.camel@hpc-1> <524qkmhb2d.fsf@topspin.com> <1098472467.22400.6.camel@hpc-1> <52is92fqi6.fsf@topspin.com> <1098722672.3269.938.camel@localhost.localdomain> <52fz42d969.fsf@topspin.com> Message-ID: <1098798468.3266.16.camel@localhost.localdomain> On Mon, 2004-10-25 at 13:01, Roland Dreier wrote: > I guess for a branch to be official, something official has to happen > in the OpenIB organization, but I'm not sure what. I'm not sure what either but I would think if we all agree about this, then OpenIB does not need to do anything until right before kernel submission time. All we need to do is name and start this branch with the appropriate contents. -- Hal From halr at voltaire.com Tue Oct 26 08:10:06 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 26 Oct 2004 11:10:06 -0400 Subject: [openib-general] [PATCH] Consolidate access to ib_agent_port_list In-Reply-To: References: Message-ID: <1098803406.3266.44.camel@localhost.localdomain> Hi KK, On Mon, 2004-10-25 at 16:11, Krishna Kumar wrote: > This patch is similar to one for MAD that I sent some time earlier. > > I could also have split the search routine into two, a get_by_dev > and a get_by_agent, but I felt it was too cumbursome. Looks pretty good. A couple of minor points: > > Thanks, > > - KK > > --- ib_agent.c.org 2004-10-25 12:37:56.000000000 -0700 > +++ ib_agent.c 2004-10-25 12:42:55.000000000 -0700 > @@ -303,12 +303,52 @@ > slid, mad, mad_response); > } > > +static inline struct ib_agent_port_private * > +__ib_get_agent_mad(struct ib_device *device, int port_num, > + struct ib_mad_agent *mad_agent) > +{ > + struct ib_agent_port_private *entry; > + > + BUG_ON(!spin_is_locked(&ib_agent_port_list_lock); BUG_ON(!spin_is_locked(&ib_agent_port_list_lock)); > @@ -729,21 +733,14 @@ > > static int ib_agent_port_close(struct ib_device *device, int port_num) > { > - struct ib_agent_port_private *entry, *port_priv = NULL; > + struct ib_agent_port_private *port_priv; > unsigned long flags; > > spin_lock_irqsave(&ib_agent_port_list_lock, flags); > - list_for_each_entry(entry, &ib_agent_port_list, port_list) { > - if (entry->dr_smp_agent->device == device && > - entry->port_num == port_num) { > - port_priv = entry; > - break; > - } > - } > - > + port_priv = __ib_get_agent_mad(NULL, 0, mad_agent); I think this needs to be: port_priv = __ib_get_agent_mad(device, port_num, NULL); If that's the case, I'm all set. 
I also don't understand why the patching resulted in the need to do some manual merging (as several hunks failed). Thanks. -- Hal From roland at topspin.com Tue Oct 26 09:14:03 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 26 Oct 2004 09:14:03 -0700 Subject: [openib-general] [PATCH] ib_mad: In completion handler, when status != success call send done handler In-Reply-To: <1098758558.14442.6.camel@hpc-1> (Hal Rosenstock's message of "Mon, 25 Oct 2004 22:42:55 -0400") References: <1098758558.14442.6.camel@hpc-1> Message-ID: <52654xa244.fsf@topspin.com> if (wc.status != IB_WC_SUCCESS) { printk(KERN_ERR PFX "Completion error %d WRID 0x%Lx\n", wc.status, (unsigned long long) wc.wr_id); + ib_mad_send_done_handler(port_priv, &wc); } else { I think this is still not quite right: what if a receive fails? - R. From mshefty at ichips.intel.com Tue Oct 26 09:30:00 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 26 Oct 2004 09:30:00 -0700 Subject: [openib-general] [PATCH] ib_mad: In completion handler, when status != success call send done handler In-Reply-To: <52654xa244.fsf@topspin.com> References: <1098758558.14442.6.camel@hpc-1> <52654xa244.fsf@topspin.com> Message-ID: <20041026093000.72b60799.mshefty@ichips.intel.com> On Tue, 26 Oct 2004 09:14:03 -0700 Roland Dreier wrote: > if (wc.status != IB_WC_SUCCESS) { > printk(KERN_ERR PFX "Completion error %d WRID 0x%Lx\n", > wc.status, (unsigned long long) > wc.wr_id); > + ib_mad_send_done_handler(port_priv, &wc); > } else { > > I think this is still not quite right: what if a receive fails? As a suggestion, we can allocate 2 CQs per QP, one for receives, and one for sends. This would let us separate send from receive completions based on the callback. From mshefty at ichips.intel.com Tue Oct 26 09:40:54 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 26 Oct 2004 09:40:54 -0700 Subject: [openib-general] ib_free_recv_mad and references Message-ID: <20041026094054.2806fa7a.mshefty@ichips.intel.com> Currently, a call to ib_free_recv_mad does not dereference the mad_agent that the mad was given to. The call itself does not access the mad_agent, but should a reference be held on the mad_agent while it has a received MAD? Looking at the implementation, it appears that a mad_agent could deregister with the access layer, then call ib_free_recv_mad, which accesses the ib_mad_cache. - Sean -- From halr at voltaire.com Tue Oct 26 09:44:56 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 26 Oct 2004 12:44:56 -0400 Subject: [openib-general] [PATCH] ib_mad: In completion handler, when status != success call send done handler In-Reply-To: <20041026093000.72b60799.mshefty@ichips.intel.com> References: <1098758558.14442.6.camel@hpc-1> <52654xa244.fsf@topspin.com> <20041026093000.72b60799.mshefty@ichips.intel.com> Message-ID: <1098809096.3266.47.camel@localhost.localdomain> On Tue, 2004-10-26 at 12:30, Sean Hefty wrote: > > I think this is still not quite right: what if a receive fails? > > As a suggestion, we can allocate 2 CQs per QP, one for receives, > and one for sends. This would let us separate send from receive > completions based on the callback. Another alternative is to assume it is a receive if it does not match a posted send.
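A rough sketch of that fallback (illustrative only; the structure and field names follow those already used in ib_mad.c, and the posted-send list is assumed to be protected by a lock):

/*
 * Sketch: classify a completion by looking for its wr_id on the posted
 * send list; anything that does not match a posted send is treated as
 * a receive.  Not the actual ib_mad.c code.
 */
static int completion_is_send(struct ib_mad_port_private *port_priv,
                              struct ib_wc *wc)
{
        struct ib_mad_send_wr_private *mad_send_wr;
        unsigned long flags;
        int found = 0;

        spin_lock_irqsave(&port_priv->send_list_lock, flags);
        list_for_each_entry(mad_send_wr, &port_priv->send_posted_mad_list,
                            send_list) {
                if (wc->wr_id == (unsigned long)mad_send_wr) {
                        found = 1;
                        break;
                }
        }
        spin_unlock_irqrestore(&port_priv->send_list_lock, flags);
        return found;
}

The error path in the completion handler could then dispatch to the send or receive done handler based on the result.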
-- Hal From halr at voltaire.com Tue Oct 26 09:55:50 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 26 Oct 2004 12:55:50 -0400 Subject: [openib-general] [PATCH] Combine ib_agent into ib_mad module Message-ID: <1098809749.26411.1.camel@hpc-1> Combine ib_agent into ib_mad module The only downside of this currently is that when removing some module which uses the AL will cause the AL to be removed. Is there a way to stop a dependency from unloading ? Index: ib_agent.c =================================================================== --- ib_agent.c (revision 1073) +++ ib_agent.c (working copy) @@ -29,12 +29,7 @@ #include -MODULE_LICENSE("Dual BSD/GPL"); -MODULE_DESCRIPTION("kernel IB agents (SMA and PMA)"); -MODULE_AUTHOR("Sean Hefty"); -MODULE_AUTHOR("Hal Rosenstock"); - static spinlock_t ib_agent_port_list_lock = SPIN_LOCK_UNLOCKED; static LIST_HEAD(ib_agent_port_list); @@ -619,7 +614,7 @@ ib_free_recv_mad(mad_recv_wc); } -static int ib_agent_port_open(struct ib_device *device, int port_num, +int ib_agent_port_open(struct ib_device *device, int port_num, int phys_port_cnt) { int ret; @@ -732,7 +727,7 @@ return ret; } -static int ib_agent_port_close(struct ib_device *device, int port_num) +int ib_agent_port_close(struct ib_device *device, int port_num) { struct ib_agent_port_private *port_priv; unsigned long flags; @@ -757,103 +752,3 @@ return 0; } -static void ib_agent_init_device(struct ib_device *device) -{ - int ret, num_ports, cur_port, i, ret2; - struct ib_device_attr device_attr; - - ret = ib_query_device(device, &device_attr); - if (ret) { - printk(KERN_ERR SPFX "Couldn't query device %s\n", device->name); - goto error_device_query; - } - - if (device->node_type == IB_NODE_SWITCH) { - num_ports = 1; - cur_port = 0; - } else { - num_ports = device_attr.phys_port_cnt; - cur_port = 1; - } - - for (i = 0; i < num_ports; i++, cur_port++) { - ret = ib_agent_port_open(device, cur_port, num_ports); - if (ret) { - printk(KERN_ERR SPFX "Couldn't open %s port %d\n", - device->name, cur_port); - goto error_device_open; - } - } - - goto error_device_query; - -error_device_open: - while (i > 0) { - cur_port--; - ret2 = ib_agent_port_close(device, cur_port); - if (ret2) { - printk(KERN_ERR SPFX "Couldn't close %s port %d\n", - device->name, cur_port); - } - i--; - } - -error_device_query: - return; -} - -static void ib_agent_remove_device(struct ib_device *device) -{ - int ret, i, num_ports, cur_port, ret2; - struct ib_device_attr device_attr; - - ret = ib_query_device(device, &device_attr); - if (ret) { - printk(KERN_ERR SPFX "Couldn't query device %s\n", device->name); - goto error_device_query; - } - - if (device->node_type == IB_NODE_SWITCH) { - num_ports = 1; - cur_port = 0; - } else { - num_ports = device_attr.phys_port_cnt; - cur_port = 1; - } - for (i = 0; i < num_ports; i++, cur_port++) { - ret2 = ib_agent_port_close(device, cur_port); - if (ret2) { - printk(KERN_ERR SPFX "Couldn't close %s port %d\n", - device->name, cur_port); - if (!ret) - ret = ret2; - } - } - -error_device_query: - return; -} - -static struct ib_client ib_agent_client = { - .name = "ib_agent", - .add = ib_agent_init_device, - .remove = ib_agent_remove_device -}; - -static int __init ib_agent_init(void) -{ - if (ib_register_client(&ib_agent_client)) { - printk(KERN_ERR SPFX "Couldn't register ib_agent client\n"); - return -EINVAL; - } - - return 0; -} - -static void __exit ib_agent_exit(void) -{ - ib_unregister_client(&ib_agent_client); -} - -module_init(ib_agent_init); -module_exit(ib_agent_exit); Index: 
ib_mad.c =================================================================== --- ib_mad.c (revision 1071) +++ ib_mad.c (working copy) @@ -1917,6 +1917,12 @@ return 0; } + +extern int ib_agent_port_open(struct ib_device *device, int port_num, + int phys_port_cnt); +extern int ib_agent_port_close(struct ib_device *device, int port_num); + + static void ib_mad_init_device(struct ib_device *device) { int ret, num_ports, cur_port, i, ret2; @@ -1942,6 +1948,12 @@ device->name, cur_port); goto error_device_open; } + ret = ib_agent_port_open(device, cur_port, num_ports); + if (ret) { + printk(KERN_ERR PFX "Couldn't open %s port %d for agents\n", + device->name, cur_port); + goto error_device_open; + } } goto error_device_query; @@ -1949,6 +1961,11 @@ error_device_open: while (i > 0) { cur_port--; + ret2 = ib_agent_port_close(device, cur_port); + if (ret2) { + printk(KERN_ERR PFX "Couldn't close %s port %d for agent\n", + device->name, cur_port); + } ret2 = ib_mad_port_close(device, cur_port); if (ret2) { printk(KERN_ERR PFX "Couldn't close %s port %d\n", @@ -1980,6 +1997,13 @@ cur_port = 1; } for (i = 0; i < num_ports; i++, cur_port++) { + ret2 = ib_agent_port_close(device, cur_port); + if (ret2) { + printk(KERN_ERR PFX "Couldn't close %s port %d for agent\n", + device->name, cur_port); + if (!ret) + ret = ret2; + } ret2 = ib_mad_port_close(device, cur_port); if (ret2) { printk(KERN_ERR PFX "Couldn't close %s port %d\n", @@ -2022,6 +2046,7 @@ ret = -EINVAL; goto error2; } + return 0; error2: Index: Makefile =================================================================== --- Makefile (revision 1071) +++ Makefile (working copy) @@ -2,10 +2,8 @@ obj-$(CONFIG_INFINIBAND_ACCESS_LAYER) += \ ib_al.o \ - ib_agt.o + ib_al_test.o ib_al-objs := \ - ib_mad.o - -ib_agt-objs := \ + ib_mad.o \ ib_agent.o Index: README =================================================================== --- README (revision 1071) +++ README (working copy) @@ -42,7 +42,6 @@ 6. You are now ready to run the new access layer as follows: /sbin/modprobe ib_mthca - /sbin/modprobe ib_al (This can be skipped) - /sbin/modprobe ib_agt + /sbin/modprobe ib_al Note that starting ib_al does not cause ib_mthca to be started. From mshefty at ichips.intel.com Tue Oct 26 09:50:09 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 26 Oct 2004 09:50:09 -0700 Subject: [openib-general] [PATCH] ib_mad: In completion handler, when status != success call send done handler In-Reply-To: <1098809096.3266.47.camel@localhost.localdomain> References: <1098758558.14442.6.camel@hpc-1> <52654xa244.fsf@topspin.com> <20041026093000.72b60799.mshefty@ichips.intel.com> <1098809096.3266.47.camel@localhost.localdomain> Message-ID: <20041026095009.037a4d4a.mshefty@ichips.intel.com> On Tue, 26 Oct 2004 12:44:56 -0400 Hal Rosenstock wrote: > On Tue, 2004-10-26 at 12:30, Sean Hefty wrote: > > > I think this is still not quite right: what if a receive fails? > > > > As a suggestion, we can allocate 2 CQs per QP, one for receives, > > and one for sends. This would let us separate send from receive > > completions based on the callback. > > Another alternative is to assume it is a receive if it is not a send is > not matched. I think we have other issues with the completion handling as well. Since we use a single CQ for both QPs, I think that we need to search the send_posted_mad_list to find the corresponding completion. We cannot assume that the completion matches with the request at the head of the list. This appears to be broken in the non-error case as well. 
I will happily create a patch to fix these issues. - Sean From halr at voltaire.com Tue Oct 26 10:03:58 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 26 Oct 2004 13:03:58 -0400 Subject: [openib-general] [PATCH] ib_mad: In completion handler, when status != success call send done handler In-Reply-To: <20041026095009.037a4d4a.mshefty@ichips.intel.com> References: <1098758558.14442.6.camel@hpc-1> <52654xa244.fsf@topspin.com> <20041026093000.72b60799.mshefty@ichips.intel.com> <1098809096.3266.47.camel@localhost.localdomain> <20041026095009.037a4d4a.mshefty@ichips.intel.com> Message-ID: <1098810238.3266.58.camel@localhost.localdomain> On Tue, 2004-10-26 at 12:50, Sean Hefty wrote: > > Another alternative is to assume it is a receive if it is not a send is > > not matched. > > I think we have other issues with the completion handling as well. > Since we use a single CQ for both QPs, I think that we need to search > the send_posted_mad_list to find the corresponding completion. > We cannot assume that the completion matches with the request at the > head of the list. > > This appears to be broken in the non-error case as well. Right. > I will happily create a patch to fix these issues. Just wondering... will the patch change to a CQ/QP or leave it as 1 CQ/port ? (BTW, there was a patch a long time ago on this which was lost in the shuffle. Sorry). -- Hal From roland at topspin.com Tue Oct 26 10:10:23 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 26 Oct 2004 10:10:23 -0700 Subject: [openib-general] [PATCH] ib_mad: In completion handler, when status != success call send done handler In-Reply-To: <20041026093000.72b60799.mshefty@ichips.intel.com> (Sean Hefty's message of "Tue, 26 Oct 2004 09:30:00 -0700") References: <1098758558.14442.6.camel@hpc-1> <52654xa244.fsf@topspin.com> <20041026093000.72b60799.mshefty@ichips.intel.com> Message-ID: <52hdoh8kxs.fsf@topspin.com> Sean> As a suggestion, we can allocate 2 CQs per QP, one for Sean> receives, and one for sends. This would let us separate Sean> send from receive completions based on the callback. That's one solution, and another way to handle it is to have a way of distinguishing sends from receives based on wr_id (that's what the Topspin stack does). Not sure which is better really. - Roland From halr at voltaire.com Tue Oct 26 10:14:00 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 26 Oct 2004 13:14:00 -0400 Subject: [openib-general] [PATCH] ib_mad: In completion handler, when status != success call send done handler In-Reply-To: <52hdoh8kxs.fsf@topspin.com> References: <1098758558.14442.6.camel@hpc-1> <52654xa244.fsf@topspin.com> <20041026093000.72b60799.mshefty@ichips.intel.com> <52hdoh8kxs.fsf@topspin.com> Message-ID: <1098810840.3266.69.camel@localhost.localdomain> On Tue, 2004-10-26 at 13:10, Roland Dreier wrote: > Sean> As a suggestion, we can allocate 2 CQs per QP, one for > Sean> receives, and one for sends. This would let us separate > Sean> send from receive completions based on the callback. > > That's one solution, and another way to handle it is to have a way of > distinguishing sends from receives based on wr_id (that's what the > Topspin stack does). That's where I was heading with this. It implies a "stolen" bit in the WRID. > Not sure which is better really. Me neither but Sean seems to feel strongly about the CQ separation. 
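For illustration, the stolen-bit idea might look roughly like the sketch below (hypothetical names; since the private send and receive structures are at least pointer-aligned, the low bit of wr_id is free to carry a direction flag):

/*
 * Sketch only: encode the completion direction in the low bit of wr_id.
 */
#define IB_MAD_WRID_RECV        1UL     /* hypothetical flag */

static inline u64 mad_send_wrid(struct ib_mad_send_wr_private *mad_send_wr)
{
        return (unsigned long)mad_send_wr;              /* low bit clear */
}

static inline u64 mad_recv_wrid(void *recv_buf)
{
        return (unsigned long)recv_buf | IB_MAD_WRID_RECV;
}

static inline int mad_wrid_is_recv(u64 wr_id)
{
        return wr_id & IB_MAD_WRID_RECV;
}

The completion handler would mask the flag off before converting wr_id back to a pointer.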
-- Hal From roland at topspin.com Tue Oct 26 10:13:49 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 26 Oct 2004 10:13:49 -0700 Subject: [openib-general] [PATCH] Combine ib_agent into ib_mad module In-Reply-To: <1098809749.26411.1.camel@hpc-1> (Hal Rosenstock's message of "Tue, 26 Oct 2004 12:55:50 -0400") References: <1098809749.26411.1.camel@hpc-1> Message-ID: <52d5z58ks2.fsf@topspin.com> Hal> Combine ib_agent into ib_mad module Why did you use this approach (move ib_agent_port_{open,close} into ib_mad.c) rather than the more minimal way I posted, that just had ib_mad.c register the ib_agent_client? Hal> The only downside of this Hal> currently is that when removing some module which uses the AL Hal> will cause the AL to be removed. Is there a way to stop a Hal> dependency from unloading ? Does it work for you to use rmmod instead of 'modprobe -r'? In any case if nothing depends on the MAD module why does it have to stay loaded? - R. From mst at mellanox.co.il Tue Oct 26 10:16:28 2004 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 26 Oct 2004 19:16:28 +0200 Subject: [openib-general] [PATCH] ib_mad: In completion handler, when status != success call send done handler In-Reply-To: <52hdoh8kxs.fsf@topspin.com> References: <1098758558.14442.6.camel@hpc-1> <52654xa244.fsf@topspin.com> <20041026093000.72b60799.mshefty@ichips.intel.com> <52hdoh8kxs.fsf@topspin.com> Message-ID: <20041026171628.GC13064@mellanox.co.il> Hello! Quoting r. Roland Dreier (roland at topspin.com) "Re: [openib-general] [PATCH] ib_mad: In completion handler, when status != success call?send done handler": > Sean> As a suggestion, we can allocate 2 CQs per QP, one for > Sean> receives, and one for sends. This would let us separate > Sean> send from receive completions based on the callback. > > That's one solution, and another way to handle it is to have a way of > distinguishing sends from receives based on wr_id (that's what the > Topspin stack does). > > Not sure which is better really. > > - Roland If you have 2 CQs you could have separate threads handing sends and receives, waking up only the relevant one. MST From mshefty at ichips.intel.com Tue Oct 26 10:13:32 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 26 Oct 2004 10:13:32 -0700 Subject: [openib-general] [PATCH] ib_mad: In completion handler, when status != success call send done handler In-Reply-To: <1098810238.3266.58.camel@localhost.localdomain> References: <1098758558.14442.6.camel@hpc-1> <52654xa244.fsf@topspin.com> <20041026093000.72b60799.mshefty@ichips.intel.com> <1098809096.3266.47.camel@localhost.localdomain> <20041026095009.037a4d4a.mshefty@ichips.intel.com> <1098810238.3266.58.camel@localhost.localdomain> Message-ID: <20041026101332.7d5eb6ea.mshefty@ichips.intel.com> On Tue, 26 Oct 2004 13:03:58 -0400 Hal Rosenstock wrote: > On Tue, 2004-10-26 at 12:50, Sean Hefty wrote: > > > Another alternative is to assume it is a receive if it is not a send is > > > not matched. > > > > I think we have other issues with the completion handling as well. > > Since we use a single CQ for both QPs, I think that we need to search > > the send_posted_mad_list to find the corresponding completion. > > We cannot assume that the completion matches with the request at the > > head of the list. > > > > This appears to be broken in the non-error case as well. > > Right. > > > I will happily create a patch to fix these issues. > > Just wondering... will the patch change to a CQ/QP or leave it as 1 > CQ/port ? 
(BTW, there was a patch a long time ago on this which was lost > in the shuffle. Sorry). I was just looking at the other error handling cases to see what would make the most sense. At a minimum, I think that we want two send_posted_mad_list's, one per QP, in order to recover from errors on one of the QPs. Having a single list makes it more complicated to restart a QP. From a software viewpoint, I think that 2 CQs per QP, for a total of 4 per port, would make the code the simplest, and probably allow for the most optimization wrt completion processing and QP size. (My assumption is that the memory cost for 4 smaller CQs would basically be the same as 1 or 2 larger CQs.) Of course, we can always use a single CQ and just set the wr_id to something that can differentiate between which send/receive queue we're trying to process. - Sean From mshefty at ichips.intel.com Tue Oct 26 10:30:34 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 26 Oct 2004 10:30:34 -0700 Subject: [openib-general] Kernel 2.6.9 In-Reply-To: <52654yepli.fsf@topspin.com> References: <52ekjuaen9.fsf@topspin.com> <20041021084412.GL21516@cern.ch> <1098381416.2389.9.camel@duffman> <20041022113524.GI21516@cern.ch> <52d5zahh3y.fsf@topspin.com> <20041025102234.GV21516@cern.ch> <52654yepli.fsf@topspin.com> Message-ID: <20041026103034.19ef1502.mshefty@ichips.intel.com> On Mon, 25 Oct 2004 09:20:57 -0700 Roland Dreier wrote: > As I said before we don't have any userspace support right now. I am > currently working on the userspace support required for the SM but > there is not even anything to test yet. How far have you gotten for userspace SM support? Is there any code checked in anywhere, or have you just started on this? - Sean From krkumar at us.ibm.com Tue Oct 26 10:26:50 2004 From: krkumar at us.ibm.com (Krishna Kumar) Date: Tue, 26 Oct 2004 10:26:50 -0700 (PDT) Subject: [openib-general] [PATCH] Consolidate access to ib_agent_port_list In-Reply-To: <1098803406.3266.44.camel@localhost.localdomain> Message-ID: Hi Hal, Thanks for applying the patch with the 2 fixes. > I also don't understand why the patching resulted in the need to do some > manual merging (as several hunks failed). Sorry about this, but I am not sure why it happened - I did a svn update a few minutes before sending the patch. I am not sure if my mailer is mangling the patch - I will check that today. Thanks, - KK On Tue, 26 Oct 2004, Hal Rosenstock wrote: > Hi KK, > > On Mon, 2004-10-25 at 16:11, Krishna Kumar wrote: > > This patch is similar to one for MAD that I sent some time earlier. > > > > I could also have split the search routine into two, a get_by_dev > > and a get_by_agent, but I felt it was too cumbersome. > > Looks pretty good.
A couple of minor points: > > > > > Thanks, > > > > - KK > > > > --- ib_agent.c.org 2004-10-25 12:37:56.000000000 -0700 > > +++ ib_agent.c 2004-10-25 12:42:55.000000000 -0700 > > @@ -303,12 +303,52 @@ > > slid, mad, mad_response); > > } > > > > +static inline struct ib_agent_port_private * > > +__ib_get_agent_mad(struct ib_device *device, int port_num, > > + struct ib_mad_agent *mad_agent) > > +{ > > + struct ib_agent_port_private *entry; > > + > > + BUG_ON(!spin_is_locked(&ib_agent_port_list_lock); > > BUG_ON(!spin_is_locked(&ib_agent_port_list_lock)); > > > > @@ -729,21 +733,14 @@ > > > > static int ib_agent_port_close(struct ib_device *device, int port_num) > > { > > - struct ib_agent_port_private *entry, *port_priv = NULL; > > + struct ib_agent_port_private *port_priv; > > unsigned long flags; > > > > spin_lock_irqsave(&ib_agent_port_list_lock, flags); > > - list_for_each_entry(entry, &ib_agent_port_list, port_list) { > > - if (entry->dr_smp_agent->device == device && > > - entry->port_num == port_num) { > > - port_priv = entry; > > - break; > > - } > > - } > > - > > + port_priv = __ib_get_agent_mad(NULL, 0, mad_agent); > > I think this needs to be: > port_priv = __ib_get_agent_mad(device, port_num, NULL); > If that's the case, I'm all set. > > I also don't understand why the patching resulted in the need to do some > manual merging (as several hunks failed). > > Thanks. > > -- Hal > > > From halr at voltaire.com Tue Oct 26 10:36:24 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 26 Oct 2004 13:36:24 -0400 Subject: [openib-general] [PATCH] Combine ib_agent into ib_mad module In-Reply-To: <52d5z58ks2.fsf@topspin.com> References: <1098809749.26411.1.camel@hpc-1> <52d5z58ks2.fsf@topspin.com> Message-ID: <1098812184.3266.111.camel@localhost.localdomain> On Tue, 2004-10-26 at 13:13, Roland Dreier wrote: > Hal> Combine ib_agent into ib_mad module > > Why did you use this approach (move ib_agent_port_{open,close} into > ib_mad.c) rather than the more minimal way I posted, that just had > ib_mad.c register the ib_agent_client? Because with just your changes, the port did not become active. It appeared that there needed to be some synchronization between the MAD layer completing its initialization and the agents completing theirs. This was the simplest way I could think of. I'm sure there are other solutions too. Also, while your changes were fewer lines of code change, these changes are less lines of code total. Is there an issue with doing it this way ? > Hal> The only downside of this > Hal> currently is that when removing some module which uses the AL > Hal> will cause the AL to be removed. Is there a way to stop a > Hal> dependency from unloading ? > > Does it work for you to use rmmod instead of 'modprobe -r'? Yes. I forgot about rmmod... Maybe we will need to FAQ this. > In any case if nothing depends on the MAD module why does it have to stay > loaded? Because the agents need to stay around or the port does not stay active now. -- Hal From halr at voltaire.com Tue Oct 26 10:39:26 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 26 Oct 2004 13:39:26 -0400 Subject: [openib-general] [PATCH] Consolidate access to ib_agent_port_list In-Reply-To: References: Message-ID: <1098812366.3266.115.camel@localhost.localdomain> Hi again KK, On Tue, 2004-10-26 at 13:26, Krishna Kumar wrote: > Thanks for applying the patch with the 2 fixes. 
Thanks for your efforts on this :-) > > I also don't understand why the patching resulted in the need to do some > > manual merging (as several hunks failed). > > Sorry about this, but I am not sure why it happened - I did a svn update > a few minutes before sending the patch. I am not sure if my mailer is > mangling the patch - I will check that today. I don't think it is your mailer. patch seemed to complain about the line numbers although manually this was a trivial merge. I am less than impressed with svn's merge capabilities. It seems to handle only the most basic merging. Oh well, I've worked with less in the distant past. -- Hal From roland at topspin.com Tue Oct 26 12:33:49 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 26 Oct 2004 12:33:49 -0700 Subject: [openib-general] Kernel 2.6.9 In-Reply-To: <20041026103034.19ef1502.mshefty@ichips.intel.com> (Sean Hefty's message of "Tue, 26 Oct 2004 10:30:34 -0700") References: <52ekjuaen9.fsf@topspin.com> <20041021084412.GL21516@cern.ch> <1098381416.2389.9.camel@duffman> <20041022113524.GI21516@cern.ch> <52d5zahh3y.fsf@topspin.com> <20041025102234.GV21516@cern.ch> <52654yepli.fsf@topspin.com> <20041026103034.19ef1502.mshefty@ichips.intel.com> Message-ID: <52pt356zqa.fsf@topspin.com> Sean> How far have you gotten for userspace SM support? Is there Sean> any code checked in anywhere, or have you just started on Sean> this? I'm about 90% done, although nothing is checked in. I should be able to check it in by the end of this week. - R. From roland at topspin.com Tue Oct 26 12:35:33 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 26 Oct 2004 12:35:33 -0700 Subject: [openib-general] [PATCH] Combine ib_agent into ib_mad module In-Reply-To: <1098812184.3266.111.camel@localhost.localdomain> (Hal Rosenstock's message of "Tue, 26 Oct 2004 13:36:24 -0400") References: <1098809749.26411.1.camel@hpc-1> <52d5z58ks2.fsf@topspin.com> <1098812184.3266.111.camel@localhost.localdomain> Message-ID: <52lldt6zne.fsf@topspin.com> Hal> Because with just your changes, the port did not become Hal> active. It appeared that there needed to be some Hal> synchronization between the MAD layer completing its Hal> initialization and the agents completing theirs. This was Hal> the simplest way I could think of. I'm sure there are other Hal> solutions too. That's strange, my tree is working fine for me. If you do ib_register_client(&mad_client) ib_register_client(&agent_client) then it's guaranteed that the MAD layer is done initializing each device before the agent layer starts. Oh well. Hal> Also, while your changes were fewer lines of code change, Hal> these changes are less lines of code total. Is there an issue Hal> with doing it this way ? I guess the only downside I see is that it ties ib_mad.c and ib_agent.c together even more tightly (there is agent initialization code in ib_mad.c, rather than keeping all the agent code in ib_agent.c) - Roland From mst at mellanox.co.il Tue Oct 26 12:56:40 2004 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 26 Oct 2004 21:56:40 +0200 Subject: [openib-general] [PATCH] Consolidate access to ib_agent_port_list In-Reply-To: <1098812366.3266.115.camel@localhost.localdomain> References: <1098812366.3266.115.camel@localhost.localdomain> Message-ID: <20041026195640.GG13064@mellanox.co.il> Hello! Quoting r.
Hal Rosenstock (halr at voltaire.com) "Re: [openib-general] [PATCH] Consolidate access to?ib_agent_port_list": > Hi again KK, > > On Tue, 2004-10-26 at 13:26, Krishna Kumar wrote: > > Thanks for applying the patch with the 2 fixes. > > Thanks for your efforts on this :-) > > > > I also don't understand why the patching resulted in the need to do some > > > manual merging (as several hunks failed). > > > > Sorry about this, but I am not sure why it happened - I did a svn update > > a few minutes before sending the patch. I am not sure if my mailer is > > mangling the patch - I will check that today. > > I don't think it is your mailer. patch seemed to complain about the line > numbers although manually this was a trivial merge. I am less than > impressed with svn's merge capabilities. It seems to handle only the > most basic merging. Oh well, I've worked with less in the distant past. > > -- Hal How is this subversion related? Just patch -l -p0 seems to do the trick for me. From halr at voltaire.com Tue Oct 26 13:09:53 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 26 Oct 2004 16:09:53 -0400 Subject: [openib-general] [PATCH] Consolidate access to ib_agent_port_list In-Reply-To: <20041026195640.GG13064@mellanox.co.il> References: <1098812366.3266.115.camel@localhost.localdomain> <20041026195640.GG13064@mellanox.co.il> Message-ID: <1098821393.1718.10.camel@localhost.localdomain> On Tue, 2004-10-26 at 15:56, Michael S. Tsirkin wrote: > Hello! > Quoting r. Hal Rosenstock (halr at voltaire.com) "Re: [openib-general] [PATCH] Consolidate access to?ib_agent_port_list": > > Hi again KK, > > > > On Tue, 2004-10-26 at 13:26, Krishna Kumar wrote: > > > Thanks for applying the patch with the 2 fixes. > > > > Thanks for your efforts on this :-) > > > > > > I also don't understand why the patching resulted in the need to do some > > > > manual merging (as several hunks failed). > > > > > > Sorry about this, but I am not sure why it happened - I did a svn update > > > a few minutes before sending the patch. I am not sure if my mailer is > > > mangling the patch - I will check that today. > > > > I don't think it is your mailer. patch seemed to complain about the line > > numbers although manually this was a trivial merge. I am less than > > impressed with svn's merge capabilities. It seems to handle only the > > most basic merging. Oh well, I've worked with less in the distant past. > > > > -- Hal > > How is this subversion related? Right, it's patch not svn. > Just patch -l -p0 seems to do the trick for me In general or with this specific patch ? -- Hal From halr at voltaire.com Tue Oct 26 13:14:14 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 26 Oct 2004 16:14:14 -0400 Subject: [openib-general] [PATCH] Combine ib_agent into ib_mad module In-Reply-To: <52lldt6zne.fsf@topspin.com> References: <1098809749.26411.1.camel@hpc-1> <52d5z58ks2.fsf@topspin.com> <1098812184.3266.111.camel@localhost.localdomain> <52lldt6zne.fsf@topspin.com> Message-ID: <1098821654.3266.135.camel@localhost.localdomain> On Tue, 2004-10-26 at 15:35, Roland Dreier wrote: > Hal> Because with just your changes, the port did not become > Hal> active. It appeared that there needed to be some > Hal> synchronization between the MAD layer completing its > Hal> initialization and the agents completing theirs. This was > Hal> the simplest way I could think of. I'm sure there are other > Hal> solutions too. > > That's strange, my tree is working fine for me. 
If you do > > ib_register_client(&mad_client) > ib_register_client(&agent_client) > > then it's guaranteed that the MAD layer is done initializing each > device before the agent layer starts. > > Oh well. > > Hal> Also, while your changes were fewer lines of code change, > Hal> these changes are less lines of code total. Is there an issue > Hal> with doing it this way ? > > I guess the only downside I see is that it ties ib_mad.c and > ib_agent.c together event more tightly (there is agent initialization > code in ib_mad.c, rather than keeping all the agent code in ib_agent.c) Is it worth undoing this and investigating what was going on more to make it work in the more decoupled way ? -- Hal From mst at mellanox.co.il Tue Oct 26 13:33:34 2004 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 26 Oct 2004 22:33:34 +0200 Subject: [openib-general] [PATCH] Consolidate access to ib_agent_port_list In-Reply-To: <1098821393.1718.10.camel@localhost.localdomain> References: <1098812366.3266.115.camel@localhost.localdomain> <20041026195640.GG13064@mellanox.co.il> <1098821393.1718.10.camel@localhost.localdomain> Message-ID: <20041026203334.GA13741@mellanox.co.il> Hello! Quoting r. Hal Rosenstock (halr at voltaire.com) "Re: [openib-general] [PATCH] Consolidate access?to?ib_agent_port_list": > On Tue, 2004-10-26 at 15:56, Michael S. Tsirkin wrote: > > Hello! > > Quoting r. Hal Rosenstock (halr at voltaire.com) "Re: [openib-general] [PATCH] Consolidate access to?ib_agent_port_list": > > > Hi again KK, > > > > > > On Tue, 2004-10-26 at 13:26, Krishna Kumar wrote: > > > > Thanks for applying the patch with the 2 fixes. > > > > > > Thanks for your efforts on this :-) > > > > > > > > I also don't understand why the patching resulted in the need to do some > > > > > manual merging (as several hunks failed). > > > > > > > > Sorry about this, but I am not sure why it happened - I did a svn update > > > > a few minutes before sending the patch. I am not sure if my mailer is > > > > mangling the patch - I will check that today. > > > > > > I don't think it is your mailer. patch seemed to complain about the line > > > numbers although manually this was a trivial merge. I am less than > > > impressed with svn's merge capabilities. It seems to handle only the > > > most basic merging. Oh well, I've worked with less in the distant past. > > > > > > -- Hal > > > > How is this subversion related? > > Right, it's patch not svn. OK then. > > Just patch -l -p0 seems to do the trick for me > > In general or with this specific patch ? > > -- Hal In general. From mshefty at ichips.intel.com Tue Oct 26 14:35:02 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 26 Oct 2004 14:35:02 -0700 Subject: [openib-general] MAD completion processing workqueue Message-ID: <20041026143502.4b68c486.mshefty@ichips.intel.com> I'm looking at replacing the completion processing thread with a workqueue as suggested before by Roland. There is currently a workqueue per port that is used only to handle send timeouts. I can either use this same workqueue, or create a new workqueue to handle completion processing. Any preferences? One workqueue per port? Two per port, one for sends and one for receives? Two per port, one for each QP? Four per port? Separate the workqueue from the port completely? Along this same line, do we want to define any specific behavior that clients can expect (e.g. one callback at a time)? My initial thought is to use the current workqueue, but require clients to handle simultaneous callbacks. 
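For illustration, a minimal sketch of the pattern under discussion: the CQ completion callback does nothing but queue a work item on a per-port single-threaded workqueue, which then drains the CQ in process context. The names here (mad_port, mad_completion_work, mad_cq_handler) are placeholders rather than the actual tree; the real patch appears later in this thread.

	/* Sketch only: one single-threaded workqueue per port driving CQ polling. */
	struct mad_port {
		struct ib_cq		*cq;
		struct workqueue_struct	*wq;
		struct work_struct	work;
	};

	static void mad_completion_work(void *data)
	{
		struct mad_port *port = data;
		struct ib_wc wc;

		/* Rearm first so completions that race with the poll re-queue us. */
		ib_req_notify_cq(port->cq, IB_CQ_NEXT_COMP);
		while (ib_poll_cq(port->cq, 1, &wc) == 1)
			;	/* dispatch the send/receive completion to the owning agent */
	}

	static void mad_cq_handler(struct ib_cq *cq)
	{
		struct mad_port *port = cq->cq_context;

		/* Called in interrupt context: defer all processing to the workqueue. */
		queue_work(port->wq, &port->work);
	}

	static int mad_port_setup_wq(struct mad_port *port)
	{
		port->wq = create_singlethread_workqueue("ib_mad");
		if (!port->wq)
			return -ENOMEM;
		INIT_WORK(&port->work, mad_completion_work, port);
		return 0;
	}

Whether clients then have to tolerate concurrent callbacks falls out of how many such queues exist: with one single-threaded queue per port, completions for that port are serialized.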
- Sean -- From mshefty at ichips.intel.com Tue Oct 26 15:29:22 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 26 Oct 2004 15:29:22 -0700 Subject: [openib-general] agent_mad_send Message-ID: <20041026152922.6089cb14.mshefty@ichips.intel.com> In agent_mad_send, a call is made to create an address handle. Immediately after calling ib_post_send_mad, the address handle is destroyed. I think that we want to wait until the send is completed before destroying the address handle, and require this of all callers of ib_post_send_mad. Also, I don't think that we want to have this code access the port_priv structure, such as the send_list_lock (which ends up being acquired twice). Queuing of MADs should be done by the ib_post_send_mad call, and not by its caller. This will be needed to handle QP overflow anyway. Agent_send_handler has a similar issue. I will try to submit a patch for this in a couple of days, but I'm currently trying to separate the send_posted_list for the two QPs for better error and completion handling. - Sean -- From roland at topspin.com Tue Oct 26 20:23:52 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 26 Oct 2004 20:23:52 -0700 Subject: [openib-general] [PATCH] Combine ib_agent into ib_mad module In-Reply-To: <1098821654.3266.135.camel@localhost.localdomain> (Hal Rosenstock's message of "Tue, 26 Oct 2004 16:14:14 -0400") References: <1098809749.26411.1.camel@hpc-1> <52d5z58ks2.fsf@topspin.com> <1098812184.3266.111.camel@localhost.localdomain> <52lldt6zne.fsf@topspin.com> <1098821654.3266.135.camel@localhost.localdomain> Message-ID: <523c007sjb.fsf@topspin.com> Hal> Is it worth undoing this and investigating what was going on Hal> more to make it work in the more decoupled way ? Nah, I don't think so. - R. From roland at topspin.com Tue Oct 26 21:16:27 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 26 Oct 2004 21:16:27 -0700 Subject: [openib-general] MAD completion processing workqueue In-Reply-To: <20041026143502.4b68c486.mshefty@ichips.intel.com> (Sean Hefty's message of "Tue, 26 Oct 2004 14:35:02 -0700") References: <20041026143502.4b68c486.mshefty@ichips.intel.com> Message-ID: <52y8hs6bj8.fsf@topspin.com> Sean> Any preferences? One workqueue per port? Two per port, one Sean> for sends and one for receives? Two per port, one for each Sean> QP? Four per port? Separate the workqueue from the port Sean> completely? I would say one single threaded WQ per port (or maybe per QP). For the full per-CPU WQ I wouldn't do more than one per HCA (on a typical 2 CPU system with HT, this is already 4 threads). Sean> Along this same line, do we want to define any specific Sean> behavior that clients can expect (e.g. one callback at a Sean> time)? Sean> My initial thought is to use the current workqueue, but Sean> require clients to handle simultaneous callbacks. That seems fine, as long as the ordering of receive callback then send callback for each transaction is preserved. - R. From roland at topspin.com Tue Oct 26 21:17:34 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 26 Oct 2004 21:17:34 -0700 Subject: [openib-general] agent_mad_send In-Reply-To: <20041026152922.6089cb14.mshefty@ichips.intel.com> (Sean Hefty's message of "Tue, 26 Oct 2004 15:29:22 -0700") References: <20041026152922.6089cb14.mshefty@ichips.intel.com> Message-ID: <52u0sg6bhd.fsf@topspin.com> Sean> In agent_mad_send, a call is made to create an address Sean> handle. Immediately after calling ib_post_send_mad, the Sean> address handle is destroyed. 
I think that we want to wait Sean> until the send is completed before destroying the address Sean> handle, and require this of all callers of ib_post_send_mad. Yes, that's correct. Because of a quirk in the way Mellanox HCAs implement special QPs, it's actually OK to destroy the AH immediately after posting the send, but for an ordinary QP this will lead to some bizarre problems. - R. From halr at voltaire.com Wed Oct 27 06:47:25 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 27 Oct 2004 09:47:25 -0400 Subject: [openib-general] agent_mad_send In-Reply-To: <20041026152922.6089cb14.mshefty@ichips.intel.com> References: <20041026152922.6089cb14.mshefty@ichips.intel.com> Message-ID: <1098884845.3266.198.camel@localhost.localdomain> On Tue, 2004-10-26 at 18:29, Sean Hefty wrote: > In agent_mad_send, a call is made to create an address handle. > Immediately after calling ib_post_send_mad, the address handle is destroyed. > I think that we want to wait until the send is completed before destroying > the address handle, and require this of all callers of ib_post_send_mad. I can post a patch for this but this depends on whether the agent or MAD layer should destroy the AH. > Also, I don't think that we want to have this code access the port_priv > structure, such as the send_list_lock (which ends up being acquired twice). The agent is using a different port_priv structure and send_list_lock than the one the MAD layer uses. Where is it acquired twice ? > Queuing of MADs should be done by the ib_post_send_mad call, > and not by its caller. This will be needed to handle QP overflow anyway. > Agent_send_handler has a similar issue. The agent layer is queuing the send to release resources (PCI unmap and free MAD memory) on send completion, not for retransmission on QP overflow. -- Hal From halr at voltaire.com Wed Oct 27 07:08:45 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 27 Oct 2004 10:08:45 -0400 Subject: [openib-general] ib_free_recv_mad and references In-Reply-To: <20041026094054.2806fa7a.mshefty@ichips.intel.com> References: <20041026094054.2806fa7a.mshefty@ichips.intel.com> Message-ID: <1098886125.3266.215.camel@localhost.localdomain> On Tue, 2004-10-26 at 12:40, Sean Hefty wrote: > Currently, a call to ib_free_recv_mad does not dereference the mad_agent that > the mad was given to. The call itself does not access the mad_agent, > but should a reference be held on the mad_agent while it has a received MAD? > Looking at the implementation, it appears that a mad_agent could deregister > with the access layer, then call ib_free_recv_mad, which accesses the > ib_mad_cache. ib_mad_cache is in existence from the time of module insertion to removal. Deregistering the mad_agent has no effect on its presence so I don't think the order of ib_free_recv_mad and ib_unregister_mad_agent matters.
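To make the lifetime argument concrete, a small sketch (assumed names and structures, not the actual ib_mad code): a slab cache created in module_init and destroyed only in module_exit can safely be used by ib_free_recv_mad whether or not the owning agent is still registered.

	#include <linux/module.h>
	#include <linux/init.h>
	#include <linux/slab.h>

	/* Stand-in for the real receive buffer type. */
	struct mad_buf {
		char data[256];
	};

	static kmem_cache_t *example_mad_cache;

	static int __init example_init(void)
	{
		example_mad_cache = kmem_cache_create("example_mad",
						      sizeof(struct mad_buf), 0,
						      SLAB_HWCACHE_ALIGN,
						      NULL, NULL);
		return example_mad_cache ? 0 : -ENOMEM;
	}

	static void __exit example_exit(void)
	{
		/* The cache disappears only here, long after any agent deregisters. */
		kmem_cache_destroy(example_mad_cache);
	}

	/* Freeing a received buffer therefore never depends on agent registration. */
	static void example_free_recv_buf(struct mad_buf *buf)
	{
		kmem_cache_free(example_mad_cache, buf);
	}

	module_init(example_init);
	module_exit(example_exit);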
-- Hal From halr at voltaire.com Wed Oct 27 07:20:05 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 27 Oct 2004 10:20:05 -0400 Subject: [openib-general] [PATCH] ib_mad: In completion handler, when status != success call send done handler In-Reply-To: <20041026095009.037a4d4a.mshefty@ichips.intel.com> References: <1098758558.14442.6.camel@hpc-1> <52654xa244.fsf@topspin.com> <20041026093000.72b60799.mshefty@ichips.intel.com> <1098809096.3266.47.camel@localhost.localdomain> <20041026095009.037a4d4a.mshefty@ichips.intel.com> Message-ID: <1098886805.3266.218.camel@localhost.localdomain> On Tue, 2004-10-26 at 12:50, Sean Hefty wrote: > I think we have other issues with the completion handling as well. > Since we use a single CQ for both QPs, I think that we need to search > the send_posted_mad_list to find the corresponding completion. > We cannot assume that the completion matches with the request at the > head of the list. > > This appears to be broken in the non-error case as well. > > I will happily create a patch to fix these issues. Is it worth fixing this for the current approach or should I just wait for this patch ? Thanks. -- Hal From halr at voltaire.com Wed Oct 27 08:17:42 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 27 Oct 2004 11:17:42 -0400 Subject: [openib-general] Handling SM class (SMInfo vs. other queries) In-Reply-To: <528y9ublt9.fsf@topspin.com> References: <52ekjqfq88.fsf@topspin.com> <1098480059.22400.69.camel@hpc-1> <52mzyee981.fsf@topspin.com> <20041022152012.4612097b.mshefty@ichips.intel.com> <52is92e5mh.fsf@topspin.com> <1098722375.3269.934.camel@localhost.localdomain> <52breqd8t8.fsf@topspin.com> <20041025102752.3113fb69.mshefty@ichips.intel.com> <52y8hubt2m.fsf@topspin.com> <20041025104115.275dc24b.mshefty@ichips.intel.com> <52u0sibsfq.fsf@topspin.com> <1098734193.3266.8.camel@localhost.localdomain> <528y9ublt9.fsf@topspin.com> Message-ID: <1098890262.3266.277.camel@localhost.localdomain> On Mon, 2004-10-25 at 16:10, Roland Dreier wrote: > Just to be clear -- SMI checks should happen before the > low-level driver gets the MAD, and the low-level driver should only > get the MAD once (and only generate one reply if necessary). If I understand correctly, this obviates the need for what is now ib_agent. All that might remain is SMI handling for DR SMPs. Is that right ? -- Hal From mshefty at ichips.intel.com Wed Oct 27 09:40:56 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 27 Oct 2004 09:40:56 -0700 Subject: [openib-general] ib_free_recv_mad and references In-Reply-To: <1098886125.3266.215.camel@localhost.localdomain> References: <20041026094054.2806fa7a.mshefty@ichips.intel.com> <1098886125.3266.215.camel@localhost.localdomain> Message-ID: <20041027094056.77c13756.mshefty@ichips.intel.com> On Wed, 27 Oct 2004 10:08:45 -0400 Hal Rosenstock wrote: > On Tue, 2004-10-26 at 12:40, Sean Hefty wrote: > > Currently, a call to ib_free_recv_mad does not dereference the mad_agent that > > the mad was given to. The call itself does not access the mad_agent, > > but should a reference be held on the mad_agent while it has a received MAD? > > Looking at the implementation, it appears that a mad_agent could deregister > > with the access layer, then call ib_free_recv_mad, which accesses the > > ib_mad_cache. > > ib_mad_cache is in existence from the time of module insertion to > removal. Deregistering the mad_agent has no effect on its presence so I > don't think the order of ib_free_mad_recv and ib_unregister_mad_agent > matters. 
I'm just wondering more about whether we should permit an agent to unregister while it has received MADs outstanding. Or, if it makes more sense to block unregister until all MADs have been returned. - Sean From mshefty at ichips.intel.com Wed Oct 27 09:42:14 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 27 Oct 2004 09:42:14 -0700 Subject: [openib-general] agent_mad_send In-Reply-To: <52u0sg6bhd.fsf@topspin.com> References: <20041026152922.6089cb14.mshefty@ichips.intel.com> <52u0sg6bhd.fsf@topspin.com> Message-ID: <20041027094214.72894368.mshefty@ichips.intel.com> On Tue, 26 Oct 2004 21:17:34 -0700 Roland Dreier wrote: > Sean> In agent_mad_send, a call is made to create an address > Sean> handle. Immediately after calling ib_post_send_mad, the > Sean> address handle is destroyed. I think that we want to wait > Sean> until the send is completed before destroying the address > Sean> handle, and require this of all callers of ib_post_send_mad. > > Yes, that's correct. Because of a quirk in the way Mellanox HCA's > implement special QPs, it's actually OK to destroy the AH immediately > after posting the send, but for an ordinary QP this will lead to some > bizarre problems. I'm concerned about handling QP overrun cases, where the call to ib_post_send_mad doesn't immediately post to the QP. - Sean From mshefty at ichips.intel.com Wed Oct 27 09:47:53 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 27 Oct 2004 09:47:53 -0700 Subject: [openib-general] agent_mad_send In-Reply-To: <1098884845.3266.198.camel@localhost.localdomain> References: <20041026152922.6089cb14.mshefty@ichips.intel.com> <1098884845.3266.198.camel@localhost.localdomain> Message-ID: <20041027094753.2d8c64a0.mshefty@ichips.intel.com> On Wed, 27 Oct 2004 09:47:25 -0400 Hal Rosenstock wrote: > On Tue, 2004-10-26 at 18:29, Sean Hefty wrote: > > In agent_mad_send, a call is made to create an address handle. > > Immediately after calling ib_post_send_mad, the address handle is destroyed. > > I think that we want to wait until the send is completed before destroying > > the address handle, and require this of all callers of ib_post_send_mad. > > I can post a patch for this but this depends on whether the agent or MAD > layer should destroy the AH. I think that the MAD agent should, since it allocated the AH. > > Also, I don't think that we want to have this code access the port_priv > > structure, such as the send_list_lock (which ends up being acquired twice). > > The agent is using a different port_priv structure and send_list_lock > than the one the MAD layer uses. Where is it acquired twice ? My bad. I was working off of the variable names, and didn't check that they had different types. 
- Sean -- From mshefty at ichips.intel.com Wed Oct 27 09:53:28 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 27 Oct 2004 09:53:28 -0700 Subject: [openib-general] [PATCH] ib_mad: In completion handler, when status != success call send done handler In-Reply-To: <1098886805.3266.218.camel@localhost.localdomain> References: <1098758558.14442.6.camel@hpc-1> <52654xa244.fsf@topspin.com> <20041026093000.72b60799.mshefty@ichips.intel.com> <1098809096.3266.47.camel@localhost.localdomain> <20041026095009.037a4d4a.mshefty@ichips.intel.com> <1098886805.3266.218.camel@localhost.localdomain> Message-ID: <20041027095328.2e2a7f63.mshefty@ichips.intel.com> On Wed, 27 Oct 2004 10:20:05 -0400 Hal Rosenstock wrote: > On Tue, 2004-10-26 at 12:50, Sean Hefty wrote: > > I think we have other issues with the completion handling as well. > > Since we use a single CQ for both QPs, I think that we need to search > > the send_posted_mad_list to find the corresponding completion. > > We cannot assume that the completion matches with the request at the > > head of the list. > > > > This appears to be broken in the non-error case as well. > > > > I will happily create a patch to fix these issues. > > Is it worth fixing this for the current approach or should I just wait > for this patch ? I'll create a patch that uses separate send_posted_mad_list's for QP0/1, but try to keep the changes fairly minimal. I'll do this after changing the completion handling to use the current workqueue, rather than allocating a separate thread. (I've canned my user-mode work, since Roland is further along.) - Sean From halr at voltaire.com Wed Oct 27 10:11:17 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 27 Oct 2004 13:11:17 -0400 Subject: [openib-general] ib_free_recv_mad and references In-Reply-To: <20041027094056.77c13756.mshefty@ichips.intel.com> References: <20041026094054.2806fa7a.mshefty@ichips.intel.com> <1098886125.3266.215.camel@localhost.localdomain> <20041027094056.77c13756.mshefty@ichips.intel.com> Message-ID: <1098897077.3266.471.camel@localhost.localdomain> On Wed, 2004-10-27 at 12:40, Sean Hefty wrote: > I'm just wondering more about whether we should permit an agent to unregister > while it has received MADs outstanding. Or, if it makes more sense to block > unregister until all MADs have been returned. That sounds reasonable to me since if the deregistration is blocked until all received MADs are freed, this will make it more likely that the client will do the right thing. Shall I work up a patch for this ? -- Hal From halr at voltaire.com Wed Oct 27 10:21:27 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 27 Oct 2004 13:21:27 -0400 Subject: [openib-general] agent_mad_send In-Reply-To: <20041027094753.2d8c64a0.mshefty@ichips.intel.com> References: <20041026152922.6089cb14.mshefty@ichips.intel.com> <1098884845.3266.198.camel@localhost.localdomain> <20041027094753.2d8c64a0.mshefty@ichips.intel.com> Message-ID: <1098897687.3266.511.camel@localhost.localdomain> On Wed, 2004-10-27 at 12:47, Sean Hefty wrote: > > I can post a patch for this but this depends on whether the agent or MAD > > layer should destroy the AH. > > I think that the MAD agent should, since it allocated the AH. That's what I thought but didn't want to post a patch and find out otherwise. 
-- Hal From halr at voltaire.com Wed Oct 27 10:49:15 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 27 Oct 2004 13:49:15 -0400 Subject: [openib-general] [PATCH] ib_agent: In agent_mad_send, destroy address handle on send completion rather than immediately Message-ID: <1098899354.11291.3.camel@hpc-1> ib_agent: In agent_mad_send, destroy address handle on send completion rather than immediately Index: ib_agent_priv.h =================================================================== --- ib_agent_priv.h (revision 1071) +++ ib_agent_priv.h (working copy) @@ -32,6 +32,7 @@ struct ib_agent_send_wr { struct list_head send_list; + struct ib_ah *ah; struct ib_mad *mad; DECLARE_PCI_UNMAP_ADDR(mapping) }; Index: ib_agent.c =================================================================== --- ib_agent.c (revision 1077) +++ ib_agent.c (working copy) @@ -350,7 +350,6 @@ struct ib_send_wr send_wr; struct ib_send_wr *bad_send_wr; struct ib_ah_attr ah_attr; - struct ib_ah *ah; unsigned long flags; /* Find matching MAD agent */ @@ -404,14 +403,14 @@ ah_attr.ah_flags = 0; /* No GRH */ } - ah = ib_create_ah(mad_agent->qp->pd, &ah_attr); - if (IS_ERR(ah)) { + agent_send_wr->ah = ib_create_ah(mad_agent->qp->pd, &ah_attr); + if (IS_ERR(agent_send_wr->ah)) { printk(KERN_ERR SPFX "No memory for address handle\n"); kfree(mad); return; } - send_wr.wr.ud.ah = ah; + send_wr.wr.ud.ah = agent_send_wr->ah; if (mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_PERF_MGMT) { send_wr.wr.ud.pkey_index = mad_recv_wc->wc->pkey_index; send_wr.wr.ud.remote_qkey = IB_QP1_QKEY; @@ -427,16 +426,17 @@ /* Send */ spin_lock_irqsave(&port_priv->send_list_lock, flags); if (ib_post_send_mad(mad_agent, &send_wr, &bad_send_wr)) { + spin_unlock_irqrestore(&port_priv->send_list_lock, flags); pci_unmap_single(mad_agent->device->dma_device, pci_unmap_addr(agent_send_wr, mapping), sizeof(struct ib_mad), PCI_DMA_TODEVICE); + ib_destroy_ah(agent_send_wr->ah); } else { list_add_tail(&agent_send_wr->send_list, &port_priv->send_posted_list); + spin_unlock_irqrestore(&port_priv->send_list_lock, flags); } - spin_unlock_irqrestore(&port_priv->send_list_lock, flags); - ib_destroy_ah(ah); } int smi_send_smp(struct ib_mad_agent *mad_agent, @@ -590,6 +590,8 @@ sizeof(struct ib_mad), PCI_DMA_TODEVICE); + ib_destroy_ah(agent_send_wr->ah); + /* Release allocated memory */ kfree(agent_send_wr->mad); } From mshefty at ichips.intel.com Wed Oct 27 10:45:48 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 27 Oct 2004 10:45:48 -0700 Subject: [openib-general] ib_free_recv_mad and references In-Reply-To: <1098897077.3266.471.camel@localhost.localdomain> References: <20041026094054.2806fa7a.mshefty@ichips.intel.com> <1098886125.3266.215.camel@localhost.localdomain> <20041027094056.77c13756.mshefty@ichips.intel.com> <1098897077.3266.471.camel@localhost.localdomain> Message-ID: <20041027104548.5ed9d8e6.mshefty@ichips.intel.com> On Wed, 27 Oct 2004 13:11:17 -0400 Hal Rosenstock wrote: > On Wed, 2004-10-27 at 12:40, Sean Hefty wrote: > > I'm just wondering more about whether we should permit an agent to unregister > > while it has received MADs outstanding. Or, if it makes more sense to block > > unregister until all MADs have been returned. > > That sounds reasonable to me since if the deregistration is blocked > until all received MADs are freed, this will make it more likely that > the client will do the right thing. > > Shall I work up a patch for this ? If you have time. I'll get to it if not. I don't think this will be a large change. 
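A rough sketch of the kind of change being discussed here (this is not the patch that gets posted; the names and fields below are invented for illustration): give each agent a reference count that is bumped whenever a received MAD is handed to the client and dropped in ib_free_recv_mad, and have unregistration wait on a completion until the count reaches zero.

	#include <linux/completion.h>
	#include <asm/atomic.h>

	struct example_agent {
		atomic_t		refcount;	/* 1 + one per received MAD held by the client */
		struct completion	comp;
	};

	static void example_agent_init(struct example_agent *agent)
	{
		atomic_set(&agent->refcount, 1);
		init_completion(&agent->comp);
	}

	/* Taken when a received MAD is dispatched up to the client. */
	static void example_agent_get(struct example_agent *agent)
	{
		atomic_inc(&agent->refcount);
	}

	/* Dropped from ib_free_recv_mad() when the client returns the MAD. */
	static void example_agent_put(struct example_agent *agent)
	{
		if (atomic_dec_and_test(&agent->refcount))
			complete(&agent->comp);
	}

	/* Unregister drops the initial reference and blocks until every
	 * outstanding received MAD has been freed. */
	static void example_agent_unregister(struct example_agent *agent)
	{
		example_agent_put(agent);
		wait_for_completion(&agent->comp);
		/* safe to tear the agent down now */
	}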
- Sean From halr at voltaire.com Wed Oct 27 11:03:26 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 27 Oct 2004 14:03:26 -0400 Subject: [openib-general] ib_free_recv_mad and references In-Reply-To: <20041027104548.5ed9d8e6.mshefty@ichips.intel.com> References: <20041026094054.2806fa7a.mshefty@ichips.intel.com> <1098886125.3266.215.camel@localhost.localdomain> <20041027094056.77c13756.mshefty@ichips.intel.com> <1098897077.3266.471.camel@localhost.localdomain> <20041027104548.5ed9d8e6.mshefty@ichips.intel.com> Message-ID: <1098900206.3266.643.camel@localhost.localdomain> On Wed, 2004-10-27 at 13:45, Sean Hefty wrote: > If you have time. I'll get to it if not. I don't think this will be a > large change. I think the signature for ib_free_recv_mad needs to add in a mad_agent parameter as there is currently no need to know which mad_agent was returning the buffers but there will be for the ref counting. Do you see some other way to do this ? Also, with this, I now see what you were saying about partially reassembled (RMPP) receives. BTW, there are 2 comments in ib_unregister_mad_agent referring to both of these: /* Note that we could still be handling received MADs */ /* XXX: Cleanup pending RMPP receives for this agent */ -- Hal From rminnich at lanl.gov Wed Oct 27 11:06:09 2004 From: rminnich at lanl.gov (Ronald G. Minnich) Date: Wed, 27 Oct 2004 12:06:09 -0600 (MDT) Subject: [openib-general] how we DON'T want to make openib Message-ID: I just noticed this tree from a VAPI make :-) ├─sshd───bash───make───make───sh�─cat └─2*[grep] This is a bit of work for my poor machine :=) ron From roland at topspin.com Wed Oct 27 11:07:39 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 27 Oct 2004 11:07:39 -0700 Subject: [openib-general] Handling SM class (SMInfo vs. other queries) In-Reply-To: <1098890262.3266.277.camel@localhost.localdomain> (Hal Rosenstock's message of "Wed, 27 Oct 2004 11:17:42 -0400") References: <52ekjqfq88.fsf@topspin.com> <1098480059.22400.69.camel@hpc-1> <52mzyee981.fsf@topspin.com> <20041022152012.4612097b.mshefty@ichips.intel.com> <52is92e5mh.fsf@topspin.com> <1098722375.3269.934.camel@localhost.localdomain> <52breqd8t8.fsf@topspin.com> <20041025102752.3113fb69.mshefty@ichips.intel.com> <52y8hubt2m.fsf@topspin.com> <20041025104115.275dc24b.mshefty@ichips.intel.com> <52u0sibsfq.fsf@topspin.com> <1098734193.3266.8.camel@localhost.localdomain> <528y9ublt9.fsf@topspin.com> <1098890262.3266.277.camel@localhost.localdomain> Message-ID: <52wtxc3uhg.fsf@topspin.com> Hal> If I understand correctly, this obviates the need for what is Hal> now ib_agent. All that might remain is SMI handling for DR Hal> SMPs. Is that right ? I think the receive path looks something like if (DR SMP) SMI checks (discard on failure) rc = process_mad() if (rc & CONSUMED) if (rc & REPONSE) if (DR SMP) outgoing SMI updates send response free MAD else agent dispatch so ib_agent still needs a QP0 agent and a QP1 agent per port for handling sends, but it won't receive any MADs. 
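Roland's receive-path outline, re-rendered as C-style pseudocode for readability (with RESPONSE spelled out); every helper and flag name below is a placeholder rather than a real interface.

	/* Pseudocode sketch of the receive path described above. */
	void handle_incoming_mad(struct ib_mad *mad)
	{
		int rc;

		if (is_dr_smp(mad) && !smi_check(mad))
			return;				/* discard on SMI failure */

		rc = process_mad(mad);			/* low-level driver sees it once */

		if (rc & RESULT_CONSUMED) {
			if (rc & RESULT_RESPONSE) {
				if (is_dr_smp(mad))
					smi_update_outgoing(mad);	/* outgoing SMI updates */
				send_response(mad);
			}
			free_mad(mad);
		} else {
			agent_dispatch(mad);		/* hand off to registered agents */
		}
	}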
- Roland From mshefty at ichips.intel.com Wed Oct 27 11:59:22 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 27 Oct 2004 11:59:22 -0700 Subject: [openib-general] [PATCH] change MAD completion processing to use workqueue Message-ID: <20041027115922.6ca6a0eb.mshefty@ichips.intel.com> Index: access/ib_mad_priv.h =================================================================== --- access/ib_mad_priv.h (revision 1078) +++ access/ib_mad_priv.h (working copy) @@ -153,6 +153,7 @@ struct ib_mad_mgmt_class_table *version[MAX_MGMT_VERSION]; struct list_head agent_list; struct workqueue_struct *wq; + struct work_struct work; spinlock_t send_list_lock; struct list_head send_posted_mad_list; @@ -162,9 +163,6 @@ struct list_head recv_posted_mad_list[IB_MAD_QPS_CORE]; int recv_posted_mad_count[IB_MAD_QPS_CORE]; u32 recv_wr_index[IB_MAD_QPS_CORE]; - - struct task_struct *mad_thread; - int thread_wake; }; #endif /* __IB_MAD_PRIV_H__ */ Index: access/ib_mad.c =================================================================== --- access/ib_mad.c (revision 1078) +++ access/ib_mad.c (working copy) @@ -1158,10 +1158,12 @@ /* * IB MAD completion callback */ -static void ib_mad_completion_handler(struct ib_mad_port_private *port_priv) +static void ib_mad_completion_handler(void *data) { + struct ib_mad_port_private *port_priv; struct ib_wc wc; + port_priv = (struct ib_mad_port_private*)data; ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP); while (ib_poll_cq(port_priv->cq, 1, &wc) == 1) { @@ -1333,57 +1335,10 @@ spin_unlock_irqrestore(&mad_agent_priv->lock, flags); } -/* - * IB MAD thread - */ -static int ib_mad_thread(void *param) -{ - struct ib_mad_port_private *port_priv = param; - - __set_current_state(TASK_RUNNING); - - do { - port_priv->thread_wake = 0; - wmb(); - - ib_mad_completion_handler(port_priv); - - set_current_state(TASK_INTERRUPTIBLE); - if (!port_priv->thread_wake) - schedule(); - __set_current_state(TASK_RUNNING); - } while (!kthread_should_stop()); - - return 0; -} - -/* - * Initialize the IB MAD thread - */ -static int ib_mad_thread_init(struct ib_mad_port_private *port_priv) -{ - port_priv->thread_wake = 0; - - port_priv->mad_thread = kthread_create(ib_mad_thread, - port_priv, - "ib_mad(%6s-%-2d)", - port_priv->device->name, - port_priv->port_num); - if (IS_ERR(port_priv->mad_thread)) { - printk(KERN_ERR PFX "Couldn't start ib_mad thread for %s port %d\n", - port_priv->device->name, port_priv->port_num); - return PTR_ERR(port_priv->mad_thread); - } - return 0; -} - static void ib_mad_thread_completion_handler(struct ib_cq *cq) { struct ib_mad_port_private *port_priv = cq->cq_context; - - port_priv->thread_wake = 1; - wmb(); - wake_up_process(port_priv->mad_thread); + queue_work(port_priv->wq, &port_priv->work); } static int ib_mad_post_receive_mad(struct ib_mad_port_private *port_priv, @@ -1845,15 +1800,12 @@ ret = -ENOMEM; goto error8; } - - ret = ib_mad_thread_init(port_priv); - if (ret) - goto error9; + INIT_WORK(&port_priv->work, ib_mad_completion_handler, port_priv); ret = ib_mad_port_start(port_priv); if (ret) { printk(KERN_ERR PFX "Couldn't start port\n"); - goto error10; + goto error9; } spin_lock_irqsave(&ib_mad_port_list_lock, flags); @@ -1862,8 +1814,6 @@ return 0; -error10: - kthread_stop(port_priv->mad_thread); error9: destroy_workqueue(port_priv->wq); error8: @@ -1903,7 +1853,7 @@ spin_unlock_irqrestore(&ib_mad_port_list_lock, flags); ib_mad_port_stop(port_priv); - kthread_stop(port_priv->mad_thread); + flush_workqueue(port_priv->wq); 
destroy_workqueue(port_priv->wq); ib_destroy_qp(port_priv->qp[1]); ib_destroy_qp(port_priv->qp[0]); -- From rminnich at lanl.gov Wed Oct 27 12:08:38 2004 From: rminnich at lanl.gov (Ronald G. Minnich) Date: Wed, 27 Oct 2004 13:08:38 -0600 (MDT) Subject: [openib-general] how we DON'T want to make openib In-Reply-To: References: Message-ID: On Wed, 27 Oct 2004, Ronald G. Minnich wrote: > > > I just noticed this tree from a VAPI make :-) > > ├─sshd───bash───make───make───sh�─cat > └─2*[grep] > well that didn't translate make make sh make make make (cat|grep) was the tree. ron From halr at voltaire.com Wed Oct 27 12:16:53 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 27 Oct 2004 15:16:53 -0400 Subject: [openib-general] Re: [PATCH] change MAD completion processing to use workqueue In-Reply-To: <20041027115922.6ca6a0eb.mshefty@ichips.intel.com> References: <20041027115922.6ca6a0eb.mshefty@ichips.intel.com> Message-ID: <1098904613.3266.844.camel@localhost.localdomain> On Wed, 2004-10-27 at 14:59, Sean Hefty wrote: Thanks. Applied. -- Hal From rminnich at lanl.gov Wed Oct 27 12:23:01 2004 From: rminnich at lanl.gov (Ronald G. Minnich) Date: Wed, 27 Oct 2004 13:23:01 -0600 (MDT) Subject: [openib-general] logset Message-ID: I'm unclear on how to use this. logset print gives back nothing. logset debug 8 gives back nothing. what is this thing? how do I get trace logging of something like vstat, given that vstat is not working on my machine? thanks ron From roland at topspin.com Wed Oct 27 13:35:22 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 27 Oct 2004 13:35:22 -0700 Subject: [openib-general] 2 questions on physical code layout In-Reply-To: <1098728484.3269.968.camel@localhost.localdomain> (Hal Rosenstock's message of "Mon, 25 Oct 2004 14:21:24 -0400") References: <52pt36brzm.fsf@topspin.com> <1098728484.3269.968.camel@localhost.localdomain> Message-ID: <52bren527p.fsf@topspin.com> OK, I'm going to go ahead and rename ib_mad.c -> mad.c, ib_agent.c -> agent.c etc. (This also makes it possible to build a module named ib_mad.o, which I think makes more sense than ib_al.o, from multiple sources). I can continue to merge by hand but it might make sense to make the same change on Hal's branch. - R. From mshefty at ichips.intel.com Wed Oct 27 13:55:17 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 27 Oct 2004 13:55:17 -0700 Subject: [openib-general] 2 questions on physical code layout In-Reply-To: <52bren527p.fsf@topspin.com> References: <52pt36brzm.fsf@topspin.com> <1098728484.3269.968.camel@localhost.localdomain> <52bren527p.fsf@topspin.com> Message-ID: <20041027135517.17ba87fb.mshefty@ichips.intel.com> On Wed, 27 Oct 2004 13:35:22 -0700 Roland Dreier wrote: > OK, I'm going to go ahead and rename ib_mad.c -> mad.c, ib_agent.c -> > agent.c etc. (This also makes it possible to build a module named > ib_mad.o, which I think makes more sense than ib_al.o, from multiple > sources). > > I can continue to merge by hand but it might make sense to make the > same change on Hal's branch. The name changes sound good to me. I didn't realize that you had taken a copy of the current mad code. Is there anything in the openib-candidate branch that isn't in your branch? Does it make sense to just update the code in the roland-merge branch? 
From mshefty at ichips.intel.com Wed Oct 27 14:42:17 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 27 Oct 2004 14:42:17 -0700 Subject: [openib-general] [PATCH] ib_mad: In completion handler, when status != success call send done handler In-Reply-To: <1098810840.3266.69.camel@localhost.localdomain> References: <1098758558.14442.6.camel@hpc-1> <52654xa244.fsf@topspin.com> <20041026093000.72b60799.mshefty@ichips.intel.com> <52hdoh8kxs.fsf@topspin.com> <1098810840.3266.69.camel@localhost.localdomain> Message-ID: <20041027144217.2dac3118.mshefty@ichips.intel.com> On Tue, 26 Oct 2004 13:14:00 -0400 Hal Rosenstock wrote: > On Tue, 2004-10-26 at 13:10, Roland Dreier wrote: > > Sean> As a suggestion, we can allocate 2 CQs per QP, one for > > Sean> receives, and one for sends. This would let us separate > > Sean> send from receive completions based on the callback. > > > > That's one solution, and another way to handle it is to have a way > > of distinguishing sends from receives based on wr_id (that's what > > the Topspin stack does). > > That's where I was heading with this. It implies a "stolen" bit in the > WRID. > > > Not sure which is better really. > > Me neither but Sean seems to feel strongly about the CQ separation. Just to make sure that we don't have duplicate efforts, I've been working on the patch to fix handling of send completions. My plan is to use one send_mad_posted_list per QP, to make it faster/easier to find the correct send completion, plus allow for easier error handling when one of the special QPs goes into the error state. The code currently maintains a single CQ per port. - Sean From mshefty at ichips.intel.com Wed Oct 27 15:42:50 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 27 Oct 2004 15:42:50 -0700 Subject: [openib-general] ib_mad_port_start allows receive processing before sends can be posted Message-ID: <20041027154250.23190b8f.mshefty@ichips.intel.com> There appears to be a minor race in ib_mad_port_start where the MAD layer could begin accepting and processing receives before the QP allows sends, or even before we know if the QP will finish initializing properly. This makes it difficult to handle traffic that comes in before the QP is transitioned to the RTS state, to recover from errors if the RTS transition fails, or to recover from errors if we fail to initialize QP1 after QP0 is active. Longer term, we may want to consider separating the QP0 and QP1 initialization. Short term, I think that if we just move the code around in ib_mad_port_start, we should be able to ensure that both QPs are ready to send and receive before handling any receives. (I don't think that we care if the QPs go to the RTS state without any receives being posted on them. We'll lose all MADs received before the QP goes into the RTR state anyway, so this adds a small delay onto the time that we need to begin handling receives.) Unless there's a reason to keep the code as is, I'll generate a patch for this. - Sean -- From mshefty at ichips.intel.com Wed Oct 27 16:15:25 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 27 Oct 2004 16:15:25 -0700 Subject: [openib-general] ib_mad_recv_wrid index field Message-ID: <20041027161525.770adada.mshefty@ichips.intel.com> What's the purpose behind the index field in the receive wr_id? 
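Returning to the ib_mad_port_start ordering issue raised a few messages above, a sketch of the reordering Sean describes: bring both special QPs all the way to RTS before arming the CQ or posting receives, so nothing can be received until sends are also possible. Every example_* name below is a placeholder, not code from the tree; as noted above, any MADs arriving before the QPs reach RTR are lost anyway, so delaying the receive posting is harmless.

	struct example_qp;			/* stand-in for struct ib_qp */
	struct example_port {
		struct example_qp *qp[2];	/* QP0 and QP1 */
	};

	/* Placeholder prototypes; the real code would use ib_modify_qp() etc. */
	int example_qp_to_init(struct example_qp *qp);
	int example_qp_to_rtr(struct example_qp *qp);
	int example_qp_to_rts(struct example_qp *qp);
	int example_arm_cq(struct example_port *port);
	int example_post_receives(struct example_port *port);

	static int example_port_start(struct example_port *port)
	{
		int i, ret;

		for (i = 0; i < 2; i++) {
			ret = example_qp_to_init(port->qp[i]);
			if (!ret)
				ret = example_qp_to_rtr(port->qp[i]);
			if (!ret)
				ret = example_qp_to_rts(port->qp[i]);
			if (ret)
				return ret;	/* nothing armed yet, so unwinding is trivial */
		}

		/* Only now allow completions and post receive buffers. */
		ret = example_arm_cq(port);
		if (!ret)
			ret = example_post_receives(port);
		return ret;
	}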
- Sean From halr at voltaire.com Wed Oct 27 18:53:17 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 27 Oct 2004 21:53:17 -0400 Subject: [openib-general] 2 questions on physical code layout In-Reply-To: <52bren527p.fsf@topspin.com> References: <52pt36brzm.fsf@topspin.com> <1098728484.3269.968.camel@localhost.localdomain> <52bren527p.fsf@topspin.com> Message-ID: <1098928397.3266.1638.camel@localhost.localdomain> On Wed, 2004-10-27 at 16:35, Roland Dreier wrote: > OK, I'm going to go ahead and rename ib_mad.c -> mad.c, ib_agent.c -> > agent.c etc. (This also makes it possible to build a module named > ib_mad.o, which I think makes more sense than ib_al.o, from multiple > sources). > > I can continue to merge by hand but it might make sense to make the > same change on Hal's branch. I will do this too. -- Hal From roland at topspin.com Wed Oct 27 22:18:44 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 27 Oct 2004 22:18:44 -0700 Subject: [openib-general] [PATCH] Add new SA module Message-ID: <52wtxb2zez.fsf@topspin.com> Here's the new SA module (with support only for PathRecord GETs and MCMemberRecord SETs) that I just checked in. All comments and criticism welcome... (It may be easier to review the code just by looking at include/ib_sa.h and core/sa_query.c in the repo rather than a diff that is just added lines...) - R. Index: include/ib_sa.h =================================================================== --- include/ib_sa.h (revision 0) +++ include/ib_sa.h (revision 0) @@ -0,0 +1,178 @@ +/* + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available at + * , or the OpenIB.org BSD + * license, available in the LICENSE.TXT file accompanying this + * software. These details are also available at + * . + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Copyright (c) 2004 Topspin Communications. All rights reserved. + * + * $Id$ + */ + +#ifndef IB_SA_H +#define IB_SA_H + +#include + +#include + +enum { + IB_SA_CLASS_VERSION = 2 /* IB spec version 1.1 */ +}; + +enum ib_sa_selector { + IB_SA_GTE = 0, + IB_SA_LTE = 1, + IB_SA_EQ = 2, + /* + * The meaning of "best" depends on the attribute: for + * example, for MTU best will return the largest available + * MTU, while for packet life time, best will return the + * smallest available life time. + */ + IB_SA_BEST = 3 +}; + +typedef u64 __bitwise ib_sa_comp_mask; + +#define IB_SA_COMP_MASK(n) ((__force ib_sa_comp_mask) cpu_to_be64(1ull << n)) + +/* + * Structures for SA records are named "struct ib_sa_xxx_rec." No + * attempt is made to pack structures to match the physical layout of + * SA records in SA MADs; all packing and unpacking is handled by the + * SA query code. + * + * For a record with structure ib_sa_xxx_rec, the naming convention + * for the component mask value for field yyy is IB_SA_XXX_REC_YYY (we + * never use different abbreviations or otherwise change the spelling + * of xxx/yyy between ib_sa_xxx_rec.yyy and IB_SA_XXX_REC_YYY). 
+ * + * Reserved rows are indicated with comments to help maintainability. + */ + +/* reserved: 0 */ +/* reserved: 1 */ +#define IB_SA_PATH_REC_DGID IB_SA_COMP_MASK( 2) +#define IB_SA_PATH_REC_SGID IB_SA_COMP_MASK( 3) +#define IB_SA_PATH_REC_DLID IB_SA_COMP_MASK( 4) +#define IB_SA_PATH_REC_SLID IB_SA_COMP_MASK( 5) +#define IB_SA_PATH_REC_RAW_TRAFFIC IB_SA_COMP_MASK( 6) +/* reserved: 7 */ +#define IB_SA_PATH_REC_FLOW_LABEL IB_SA_COMP_MASK( 8) +#define IB_SA_PATH_REC_HOP_LIMIT IB_SA_COMP_MASK( 9) +#define IB_SA_PATH_REC_TRAFFIC_CLASS IB_SA_COMP_MASK(10) +#define IB_SA_PATH_REC_REVERSIBLE IB_SA_COMP_MASK(11) +#define IB_SA_PATH_REC_NUMB_PATH IB_SA_COMP_MASK(12) +#define IB_SA_PATH_REC_PKEY IB_SA_COMP_MASK(13) +/* reserved: 14 */ +#define IB_SA_PATH_REC_SL IB_SA_COMP_MASK(15) +#define IB_SA_PATH_REC_MTU_SELECTOR IB_SA_COMP_MASK(16) +#define IB_SA_PATH_REC_MTU IB_SA_COMP_MASK(17) +#define IB_SA_PATH_REC_RATE_SELECTOR IB_SA_COMP_MASK(18) +#define IB_SA_PATH_REC_RATE IB_SA_COMP_MASK(19) +#define IB_SA_PATH_REC_PACKET_LIFE_TIME_SELECTOR IB_SA_COMP_MASK(20) +#define IB_SA_PATH_REC_PACKET_LIFE_TIME IB_SA_COMP_MASK(21) +#define IB_SA_PATH_REC_PREFERENCE IB_SA_COMP_MASK(22) + +struct ib_sa_path_rec { + /* reserved */ + /* reserved */ + union ib_gid dgid; + union ib_gid sgid; + u16 dlid; + u16 slid; + int raw_traffic; + /* reserved */ + u32 flow_label; + u8 hop_limit; + u8 traffic_class; + int reversible; + u8 numb_path; + u16 pkey; + /* reserved */ + u8 sl; + u8 mtu_selector; + enum ib_mtu mtu; + u8 rate_selector; + u8 rate; + u8 packet_life_time_selector; + u8 packet_life_time; + u8 preference; +}; + +#define IB_SA_MCMEMBER_REC_MGID IB_SA_COMP_MASK( 0) +#define IB_SA_MCMEMBER_REC_PORT_GID IB_SA_COMP_MASK( 1) +#define IB_SA_MCMEMBER_REC_QKEY IB_SA_COMP_MASK( 2) +#define IB_SA_MCMEMBER_REC_MLID IB_SA_COMP_MASK( 3) +#define IB_SA_MCMEMBER_REC_MTU_SELECTOR IB_SA_COMP_MASK( 4) +#define IB_SA_MCMEMBER_REC_MTU IB_SA_COMP_MASK( 5) +#define IB_SA_MCMEMBER_REC_TRAFFIC_CLASS IB_SA_COMP_MASK( 6) +#define IB_SA_MCMEMBER_REC_PKEY IB_SA_COMP_MASK( 7) +#define IB_SA_MCMEMBER_REC_RATE_SELECTOR IB_SA_COMP_MASK( 8) +#define IB_SA_MCMEMBER_REC_RATE IB_SA_COMP_MASK( 9) +#define IB_SA_MCMEMBER_REC_PACKET_LIFE_TIME_SELECTOR IB_SA_COMP_MASK(10) +#define IB_SA_MCMEMBER_REC_PACKET_LIFE_TIME IB_SA_COMP_MASK(11) +#define IB_SA_MCMEMBER_REC_SL IB_SA_COMP_MASK(12) +#define IB_SA_MCMEMBER_REC_FLOW_LABEL IB_SA_COMP_MASK(13) +#define IB_SA_MCMEMBER_REC_HOP_LIMIT IB_SA_COMP_MASK(14) +#define IB_SA_MCMEMBER_REC_SCOPE IB_SA_COMP_MASK(15) +#define IB_SA_MCMEMBER_REC_JOIN_STATE IB_SA_COMP_MASK(16) +#define IB_SA_MCMEMBER_REC_PROXY_JOIN IB_SA_COMP_MASK(17) + +struct ib_sa_mcmember_rec { + union ib_gid mgid; + union ib_gid port_gid; + u32 qkey; + u16 mlid; + u8 mtu_selector; + enum ib_mtu mtu; + u8 traffic_class; + u16 pkey; + u8 rate_selector; + u8 rate; + u8 packet_life_time_selector; + u8 packet_life_time; + u8 sl; + u32 flow_label; + u8 hop_limit; + u8 scope; + u8 join_state; + int proxy_join; +}; + +struct ib_sa_query; + +void ib_sa_cancel_query(int id, struct ib_sa_query *query); + +int ib_sa_path_rec_get(struct ib_device *device, u8 port_num, + struct ib_sa_path_rec *rec, u64 comp_mask, + int timeout_ms, int gfp_mask, + void (*callback)(int status, + struct ib_sa_path_rec *resp, + void *context), + void *context, + struct ib_sa_query **query); + +int ib_sa_mcmember_rec_set(struct ib_device *device, u8 port_num, + struct ib_sa_mcmember_rec *rec, u64 comp_mask, + int timeout_ms, int gfp_mask, + void (*callback)(int status, + struct 
ib_sa_mcmember_rec *resp, + void *context), + void *context, + struct ib_sa_query **query); + +#endif /* IB_SA_H */ Index: core/sa_query.c =================================================================== --- core/sa_query.c (revision 0) +++ core/sa_query.c (revision 0) @@ -0,0 +1,789 @@ +/* + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available at + * , or the OpenIB.org BSD + * license, available in the LICENSE.TXT file accompanying this + * software. These details are also available at + * . + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Copyright (c) 2004 Topspin Communications. All rights reserved. + * + * $Id$ + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +MODULE_AUTHOR("Roland Dreier"); +MODULE_DESCRIPTION("InfiniBand subnet administration query support"); +MODULE_LICENSE("Dual BSD/GPL"); + +struct ib_sa_hdr { + u64 sm_key; + u16 attr_offset; + u16 reserved; + ib_sa_comp_mask comp_mask; +} __attribute__ ((packed)); + +struct ib_sa_mad { + struct ib_mad_hdr mad_hdr; + struct ib_rmpp_hdr rmpp_hdr; + struct ib_sa_hdr sa_hdr; + u8 data[200]; +} __attribute__ ((packed)); + +struct ib_sa_sm_ah { + struct ib_ah *ah; + struct kref ref; +}; + +struct ib_sa_port { + struct ib_mad_agent *agent; + struct ib_mr *mr; + struct ib_sa_sm_ah *sm_ah; + struct work_struct update_task; + spinlock_t ah_lock; + u8 port_num; +}; + +struct ib_sa_device { + int start_port, end_port; + struct ib_event_handler event_handler; + struct ib_sa_port port[0]; +}; + +struct ib_sa_query { + void (*callback)(struct ib_sa_query *, int, struct ib_sa_mad *); + void (*release)(struct ib_sa_query *); + struct ib_sa_port *port; + struct ib_sa_mad *mad; + struct ib_sa_sm_ah *sm_ah; + DECLARE_PCI_UNMAP_ADDR(mapping) + int id; +}; + +struct ib_sa_path_query { + void (*callback)(int, struct ib_sa_path_rec *, void *); + void *context; + struct ib_sa_query sa_query; +}; + +struct ib_sa_mcmember_query { + void (*callback)(int, struct ib_sa_mcmember_rec *, void *); + void *context; + struct ib_sa_query sa_query; +}; + +static void ib_sa_add_one(struct ib_device *device); +static void ib_sa_remove_one(struct ib_device *device); + +static struct ib_client sa_client = { + .name = "sa", + .add = ib_sa_add_one, + .remove = ib_sa_remove_one +}; + +static spinlock_t idr_lock = SPIN_LOCK_UNLOCKED; +DEFINE_IDR(query_idr); + +enum { + IB_SA_ATTR_CLASS_PORTINFO = 0x01, + IB_SA_ATTR_NOTICE = 0x02, + IB_SA_ATTR_INFORM_INFO = 0x03, + IB_SA_ATTR_NODE_REC = 0x11, + IB_SA_ATTR_PORT_INFO_REC = 0x12, + IB_SA_ATTR_SL2VL_REC = 0x13, + IB_SA_ATTR_SWITCH_REC = 0x14, + IB_SA_ATTR_LINEAR_FDB_REC = 0x15, + IB_SA_ATTR_RANDOM_FDB_REC = 0x16, + IB_SA_ATTR_MCAST_FDB_REC = 0x17, + IB_SA_ATTR_SM_INFO_REC = 0x18, + IB_SA_ATTR_LINK_REC = 0x20, + IB_SA_ATTR_GUID_INFO_REC = 0x30, + IB_SA_ATTR_SERVICE_REC = 0x31, + IB_SA_ATTR_PARTITION_REC = 0x33, + IB_SA_ATTR_RANGE_REC = 0x34, + IB_SA_ATTR_PATH_REC 
= 0x35, + IB_SA_ATTR_VL_ARB_REC = 0x36, + IB_SA_ATTR_MC_GROUP_REC = 0x37, + IB_SA_ATTR_MC_MEMBER_REC = 0x38, + IB_SA_ATTR_TRACE_REC = 0x39, + IB_SA_ATTR_MULTI_PATH_REC = 0x3a, + IB_SA_ATTR_SERVICE_ASSOC_REC = 0x3b +}; + +#define PATH_REC_FIELD(field) \ + .struct_offset_bytes = offsetof(struct ib_sa_path_rec, field), \ + .struct_size_bytes = sizeof ((struct ib_sa_path_rec *) 0)->field, \ + .field_name = "sa_path_rec:" #field + +static const struct ib_field path_rec_table[] = { + { RESERVED, + .offset_words = 0, + .offset_bits = 0, + .size_bits = 32 }, + { RESERVED, + .offset_words = 1, + .offset_bits = 0, + .size_bits = 32 }, + { PATH_REC_FIELD(dgid), + .offset_words = 2, + .offset_bits = 0, + .size_bits = 128 }, + { PATH_REC_FIELD(sgid), + .offset_words = 6, + .offset_bits = 0, + .size_bits = 128 }, + { PATH_REC_FIELD(dlid), + .offset_words = 10, + .offset_bits = 0, + .size_bits = 16 }, + { PATH_REC_FIELD(slid), + .offset_words = 10, + .offset_bits = 16, + .size_bits = 16 }, + { PATH_REC_FIELD(raw_traffic), + .offset_words = 11, + .offset_bits = 0, + .size_bits = 1 }, + { RESERVED, + .offset_words = 11, + .offset_bits = 1, + .size_bits = 3 }, + { PATH_REC_FIELD(flow_label), + .offset_words = 11, + .offset_bits = 4, + .size_bits = 20 }, + { PATH_REC_FIELD(hop_limit), + .offset_words = 11, + .offset_bits = 24, + .size_bits = 8 }, + { PATH_REC_FIELD(traffic_class), + .offset_words = 12, + .offset_bits = 0, + .size_bits = 8 }, + { PATH_REC_FIELD(reversible), + .offset_words = 12, + .offset_bits = 8, + .size_bits = 1 }, + { PATH_REC_FIELD(numb_path), + .offset_words = 12, + .offset_bits = 9, + .size_bits = 7 }, + { PATH_REC_FIELD(pkey), + .offset_words = 12, + .offset_bits = 16, + .size_bits = 16 }, + { RESERVED, + .offset_words = 13, + .offset_bits = 0, + .size_bits = 12 }, + { PATH_REC_FIELD(sl), + .offset_words = 13, + .offset_bits = 12, + .size_bits = 4 }, + { PATH_REC_FIELD(mtu_selector), + .offset_words = 13, + .offset_bits = 16, + .size_bits = 2 }, + { PATH_REC_FIELD(mtu), + .offset_words = 13, + .offset_bits = 18, + .size_bits = 6 }, + { PATH_REC_FIELD(rate_selector), + .offset_words = 13, + .offset_bits = 24, + .size_bits = 2 }, + { PATH_REC_FIELD(rate), + .offset_words = 13, + .offset_bits = 26, + .size_bits = 6 }, + { PATH_REC_FIELD(packet_life_time_selector), + .offset_words = 14, + .offset_bits = 0, + .size_bits = 2 }, + { PATH_REC_FIELD(packet_life_time), + .offset_words = 14, + .offset_bits = 2, + .size_bits = 6 }, + { PATH_REC_FIELD(preference), + .offset_words = 14, + .offset_bits = 8, + .size_bits = 8 }, + { RESERVED, + .offset_words = 14, + .offset_bits = 16, + .size_bits = 48 }, +}; + +#define MCMEMBER_REC_FIELD(field) \ + .struct_offset_bytes = offsetof(struct ib_sa_mcmember_rec, field), \ + .struct_size_bytes = sizeof ((struct ib_sa_mcmember_rec *) 0)->field, \ + .field_name = "sa_mcmember_rec:" #field + +static const struct ib_field mcmember_rec_table[] = { + { MCMEMBER_REC_FIELD(mgid), + .offset_words = 0, + .offset_bits = 0, + .size_bits = 128 }, + { MCMEMBER_REC_FIELD(port_gid), + .offset_words = 4, + .offset_bits = 0, + .size_bits = 128 }, + { MCMEMBER_REC_FIELD(qkey), + .offset_words = 8, + .offset_bits = 0, + .size_bits = 32 }, + { MCMEMBER_REC_FIELD(mlid), + .offset_words = 9, + .offset_bits = 0, + .size_bits = 16 }, + { MCMEMBER_REC_FIELD(mtu_selector), + .offset_words = 9, + .offset_bits = 16, + .size_bits = 2 }, + { MCMEMBER_REC_FIELD(mtu), + .offset_words = 9, + .offset_bits = 18, + .size_bits = 6 }, + { MCMEMBER_REC_FIELD(traffic_class), + .offset_words = 9, + 
.offset_bits = 24, + .size_bits = 8 }, + { MCMEMBER_REC_FIELD(pkey), + .offset_words = 10, + .offset_bits = 0, + .size_bits = 16 }, + { MCMEMBER_REC_FIELD(rate_selector), + .offset_words = 10, + .offset_bits = 16, + .size_bits = 2 }, + { MCMEMBER_REC_FIELD(rate), + .offset_words = 10, + .offset_bits = 18, + .size_bits = 6 }, + { MCMEMBER_REC_FIELD(packet_life_time_selector), + .offset_words = 10, + .offset_bits = 24, + .size_bits = 2 }, + { MCMEMBER_REC_FIELD(packet_life_time), + .offset_words = 10, + .offset_bits = 26, + .size_bits = 6 }, + { MCMEMBER_REC_FIELD(sl), + .offset_words = 11, + .offset_bits = 0, + .size_bits = 4 }, + { MCMEMBER_REC_FIELD(flow_label), + .offset_words = 11, + .offset_bits = 4, + .size_bits = 20 }, + { MCMEMBER_REC_FIELD(hop_limit), + .offset_words = 11, + .offset_bits = 24, + .size_bits = 8 }, + { MCMEMBER_REC_FIELD(scope), + .offset_words = 12, + .offset_bits = 0, + .size_bits = 4 }, + { MCMEMBER_REC_FIELD(join_state), + .offset_words = 12, + .offset_bits = 4, + .size_bits = 4 }, + { MCMEMBER_REC_FIELD(proxy_join), + .offset_words = 12, + .offset_bits = 8, + .size_bits = 1 }, + { RESERVED, + .offset_words = 12, + .offset_bits = 9, + .size_bits = 23 }, +}; + +static void free_sm_ah(struct kref *kref) +{ + struct ib_sa_sm_ah *sm_ah = container_of(kref, struct ib_sa_sm_ah, ref); + + ib_destroy_ah(sm_ah->ah); + kfree(sm_ah); +} + +static void update_sm_ah(void *port_ptr) +{ + struct ib_sa_port *port = port_ptr; + struct ib_sa_sm_ah *new_ah, *old_ah; + struct ib_port_attr port_attr; + struct ib_ah_attr ah_attr; + + if (ib_query_port(port->agent->device, port->port_num, &port_attr)) { + printk(KERN_WARNING "Couldn't query port\n"); + return; + } + + new_ah = kmalloc(sizeof *new_ah, GFP_KERNEL); + if (!new_ah) { + printk(KERN_WARNING "Couldn't allocate new SM AH\n"); + return; + } + + kref_init(&new_ah->ref); + + memset(&ah_attr, 0, sizeof ah_attr); + ah_attr.dlid = port_attr.sm_lid; + ah_attr.sl = port_attr.sm_sl; + ah_attr.port_num = port->port_num; + + new_ah->ah = ib_create_ah(port->agent->qp->pd, &ah_attr); + if (IS_ERR(new_ah->ah)) { + printk(KERN_WARNING "Couldn't create new SM AH\n"); + kfree(new_ah); + return; + } + + spin_lock_irq(&port->ah_lock); + old_ah = port->sm_ah; + port->sm_ah = new_ah; + spin_unlock_irq(&port->ah_lock); + + if (old_ah) + kref_put(&old_ah->ref, free_sm_ah); +} + +static void ib_sa_event(struct ib_event_handler *handler, struct ib_event *event) +{ + if (event->event == IB_EVENT_PORT_ERR || + event->event == IB_EVENT_PORT_ACTIVE || + event->event == IB_EVENT_LID_CHANGE || + event->event == IB_EVENT_PKEY_CHANGE || + event->event == IB_EVENT_SM_CHANGE) { + struct ib_sa_device *sa_dev = + ib_get_client_data(event->device, &sa_client); + + schedule_work(&sa_dev->port[event->element.port_num - + sa_dev->start_port].update_task); + } +} + +void ib_sa_cancel_query(int id, struct ib_sa_query *query) +{ + unsigned long flags; + + spin_lock_irqsave(&idr_lock, flags); + if (idr_find(&query_idr, query->id) != query) { + spin_unlock_irqrestore(&idr_lock, flags); + return; + } + spin_unlock_irqrestore(&idr_lock, flags); + + ib_cancel_mad(query->port->agent, query->id); +} +EXPORT_SYMBOL(ib_sa_cancel_query); + +static void init_mad(struct ib_sa_mad *mad, struct ib_mad_agent *agent) +{ + memset(mad, 0, sizeof *mad); + + mad->mad_hdr.base_version = IB_MGMT_BASE_VERSION; + mad->mad_hdr.mgmt_class = IB_MGMT_CLASS_SUBN_ADM; + mad->mad_hdr.class_version = IB_SA_CLASS_VERSION; +} + +static int send_mad(struct ib_sa_query *query, int timeout_ms) +{ + struct 
ib_sa_port *port = query->port; + unsigned long flags; + int ret; + struct ib_sge gather_list; + struct ib_send_wr *bad_wr, wr = { + .opcode = IB_WR_SEND, + .sg_list = &gather_list, + .num_sge = 1, + .send_flags = IB_SEND_SIGNALED, + .wr = { + .ud = { + .mad_hdr = &query->mad->mad_hdr, + .remote_qpn = 1, + .remote_qkey = IB_QP1_QKEY, + .timeout_ms = timeout_ms + } + } + }; + +retry: + if (!idr_pre_get(&query_idr, GFP_ATOMIC)) + return -ENOMEM; + spin_lock_irqsave(&idr_lock, flags); + ret = idr_get_new(&query_idr, query, &query->id); + spin_unlock_irqrestore(&idr_lock, flags); + if (ret == -EAGAIN) + goto retry; + if (ret) + return ret; + + query->mad->mad_hdr.tid = + cpu_to_be64(((u64) port->agent->hi_tid) << 32 | query->id); + wr.wr_id = query->id; + printk("tid %016llx id %08x\n", be64_to_cpu(query->mad->mad_hdr.tid), query->id); + + spin_lock_irqsave(&port->ah_lock, flags); + kref_get(&port->sm_ah->ref); + query->sm_ah = port->sm_ah; + wr.wr.ud.ah = port->sm_ah->ah; + spin_unlock_irqrestore(&port->ah_lock, flags); + + gather_list.addr = pci_map_single(port->agent->device->dma_device, + query->mad, + sizeof (struct ib_sa_mad), + PCI_DMA_TODEVICE); + gather_list.length = sizeof (struct ib_sa_mad); + gather_list.lkey = port->mr->lkey; + pci_unmap_addr_set(query, mapping, gather_list.addr); + + ret = ib_post_send_mad(port->agent, &wr, &bad_wr); + if (ret) { + pci_unmap_single(port->agent->device->dma_device, + pci_unmap_addr(query, mapping), + sizeof (struct ib_sa_mad), + PCI_DMA_TODEVICE); + kref_put(&query->sm_ah->ref, free_sm_ah); + spin_lock_irqsave(&idr_lock, flags); + idr_remove(&query_idr, query->id); + spin_unlock_irqrestore(&idr_lock, flags); + } + + return ret; +} + +static void ib_sa_path_rec_callback(struct ib_sa_query *sa_query, + int status, + struct ib_sa_mad *mad) +{ + struct ib_sa_path_query *query = + container_of(sa_query, struct ib_sa_path_query, sa_query); + + if (mad) { + struct ib_sa_path_rec rec; + + ib_unpack(path_rec_table, ARRAY_SIZE(path_rec_table), + mad->data, &rec); + query->callback(status, &rec, query->context); + } else + query->callback(status, NULL, query->context); +} + +static void ib_sa_path_rec_release(struct ib_sa_query *sa_query) +{ + kfree(container_of(sa_query, struct ib_sa_path_query, sa_query)); +} + +int ib_sa_path_rec_get(struct ib_device *device, u8 port_num, + struct ib_sa_path_rec *rec, u64 comp_mask, + int timeout_ms, int gfp_mask, + void (*callback)(int status, + struct ib_sa_path_rec *resp, + void *context), + void *context, + struct ib_sa_query **sa_query) +{ + struct ib_sa_path_query *query; + struct ib_sa_device *sa_dev = ib_get_client_data(device, &sa_client); + struct ib_sa_port *port = &sa_dev->port[port_num - sa_dev->start_port]; + struct ib_mad_agent *agent = port->agent; + int ret; + + query = kmalloc(sizeof *query, gfp_mask); + if (!query) + return -ENOMEM; + query->sa_query.mad = kmalloc(sizeof *query->sa_query.mad, gfp_mask); + if (!query->sa_query.mad) { + kfree(query); + return -ENOMEM; + } + + query->callback = callback; + query->context = context; + + init_mad(query->sa_query.mad, agent); + + query->sa_query.callback = ib_sa_path_rec_callback; + query->sa_query.release = ib_sa_path_rec_release; + query->sa_query.port = port; + query->sa_query.mad->mad_hdr.method = IB_MGMT_METHOD_GET; + query->sa_query.mad->mad_hdr.attr_id = cpu_to_be16(IB_SA_ATTR_PATH_REC); + query->sa_query.mad->sa_hdr.comp_mask = comp_mask; + + ib_pack(path_rec_table, ARRAY_SIZE(path_rec_table), + rec, query->sa_query.mad->data); + + ret = 
send_mad(&query->sa_query, timeout_ms); + if (ret) + kfree(query); + + *sa_query = &query->sa_query; + + return ret ? ret : query->sa_query.id; +} +EXPORT_SYMBOL(ib_sa_path_rec_get); + +static void ib_sa_mcmember_rec_callback(struct ib_sa_query *sa_query, + int status, + struct ib_sa_mad *mad) +{ + struct ib_sa_mcmember_query *query = + container_of(sa_query, struct ib_sa_mcmember_query, sa_query); + + if (mad) { + struct ib_sa_mcmember_rec rec; + + ib_unpack(mcmember_rec_table, ARRAY_SIZE(mcmember_rec_table), + mad->data, &rec); + query->callback(status, &rec, query->context); + } else + query->callback(status, NULL, query->context); +} + +static void ib_sa_mcmember_rec_release(struct ib_sa_query *sa_query) +{ + kfree(container_of(sa_query, struct ib_sa_mcmember_query, sa_query)); +} + +int ib_sa_mcmember_rec_set(struct ib_device *device, u8 port_num, + struct ib_sa_mcmember_rec *rec, u64 comp_mask, + int timeout_ms, int gfp_mask, + void (*callback)(int status, + struct ib_sa_mcmember_rec *resp, + void *context), + void *context, + struct ib_sa_query **sa_query) +{ + struct ib_sa_mcmember_query *query; + struct ib_sa_device *sa_dev = ib_get_client_data(device, &sa_client); + struct ib_sa_port *port = &sa_dev->port[port_num - sa_dev->start_port]; + struct ib_mad_agent *agent = port->agent; + int ret; + + query = kmalloc(sizeof *query, gfp_mask); + if (!query) + return -ENOMEM; + query->sa_query.mad = kmalloc(sizeof *query->sa_query.mad, gfp_mask); + if (!query->sa_query.mad) { + kfree(query); + return -ENOMEM; + } + + query->callback = callback; + query->context = context; + + init_mad(query->sa_query.mad, agent); + + query->sa_query.callback = ib_sa_mcmember_rec_callback; + query->sa_query.release = ib_sa_mcmember_rec_release; + query->sa_query.port = port; + query->sa_query.mad->mad_hdr.method = IB_MGMT_METHOD_SET; + query->sa_query.mad->mad_hdr.attr_id = cpu_to_be16(IB_SA_ATTR_MC_MEMBER_REC); + query->sa_query.mad->sa_hdr.comp_mask = comp_mask; + + ib_pack(mcmember_rec_table, ARRAY_SIZE(mcmember_rec_table), + rec, query->sa_query.mad->data); + + ret = send_mad(&query->sa_query, timeout_ms); + if (ret) + kfree(query); + + *sa_query = &query->sa_query; + + return ret ? ret : query->sa_query.id; +} +EXPORT_SYMBOL(ib_sa_mcmember_rec_set); + +static void send_handler(struct ib_mad_agent *mad_agent, + struct ib_mad_send_wc *mad_send_wc) +{ + struct ib_sa_query *query; + unsigned long flags; + + spin_lock_irqsave(&idr_lock, flags); + query = idr_find(&query_idr, mad_send_wc->wr_id); + spin_unlock_irqrestore(&idr_lock, flags); + + if (!query) + return; + + if (mad_send_wc->status) + query->callback(query, + mad_send_wc->status == IB_WC_RESP_TIMEOUT_ERR ? + -ETIMEDOUT : -EIO, NULL); + + query->release(query); + + spin_lock_irqsave(&idr_lock, flags); + idr_remove(&query_idr, mad_send_wc->wr_id); + spin_unlock_irqrestore(&idr_lock, flags); +} + +static void recv_handler(struct ib_mad_agent *mad_agent, + struct ib_mad_recv_wc *mad_recv_wc) +{ + struct ib_sa_query *query; + unsigned long flags; + + spin_lock_irqsave(&idr_lock, flags); + query = idr_find(&query_idr, mad_recv_wc->wc->wr_id); + spin_unlock_irqrestore(&idr_lock, flags); + + if (query) { + if (mad_recv_wc->wc->status == IB_WC_SUCCESS) + query->callback(query, + mad_recv_wc->recv_buf->mad->mad_hdr.status ? 
+ -EINVAL : 0, + (struct ib_sa_mad *) mad_recv_wc->recv_buf->mad); + else + query->callback(query, -EIO, NULL); + } + + ib_free_recv_mad(mad_recv_wc); +} + +static void ib_sa_add_one(struct ib_device *device) +{ + struct ib_sa_device *sa_dev; + int s, e, i; + + if (device->node_type == IB_NODE_SWITCH) + s = e = 0; + else { + struct ib_device_attr attr; + if (ib_query_device(device, &attr)) + return; + + s = 1; + e = attr.phys_port_cnt; + } + + sa_dev = kmalloc(sizeof *sa_dev + + (e - s + 1) * sizeof (struct ib_sa_port), + GFP_KERNEL); + if (!sa_dev) + return; + + sa_dev->start_port = s; + sa_dev->end_port = e; + + for (i = s; i <= e; ++i) { + sa_dev->port[i - s].mr = NULL; + sa_dev->port[i - s].sm_ah = NULL; + sa_dev->port[i - s].port_num = i; + spin_lock_init(&sa_dev->port[i - s].ah_lock); + + sa_dev->port[i - s].agent = + ib_register_mad_agent(device, i, IB_QPT_GSI, + NULL, 0, send_handler, + recv_handler, sa_dev); + if (IS_ERR(sa_dev->port[i - s].agent)) + goto err; + + sa_dev->port[i - s].mr = ib_get_dma_mr(sa_dev->port[i - s].agent->qp->pd, + IB_ACCESS_LOCAL_WRITE); + if (IS_ERR(sa_dev->port[i - s].mr)) { + /* Bump i so agent from this iter. is freed */ + ++i; + goto err; + } + + INIT_WORK(&sa_dev->port[i - s].update_task, + update_sm_ah, &sa_dev->port[i - s]); + } + + /* + * We register our event handler after everything is set up, + * and then update our cached info after the event handler is + * registered to avoid any problems if a port changes state + * during our initialization. + */ + + INIT_IB_EVENT_HANDLER(&sa_dev->event_handler, device, ib_sa_event); + if (ib_register_event_handler(&sa_dev->event_handler)) { + kfree(sa_dev); + goto err; + } + + for (i = s; i <= e; ++i) + update_sm_ah(&sa_dev->port[i - s]); + + ib_set_client_data(device, &sa_client, sa_dev); + + return; + +err: + while (--i >= s) { + if (sa_dev->port[i - s].mr && !IS_ERR(sa_dev->port[i - s].mr)) + ib_dereg_mr(sa_dev->port[i - s].mr); + + if (sa_dev->port[i - s].sm_ah) + kref_put(&sa_dev->port[i].sm_ah->ref, free_sm_ah); + + ib_unregister_mad_agent(sa_dev->port[i - s].agent); + } + + kfree(sa_dev); + + return; +} + +static void ib_sa_remove_one(struct ib_device *device) +{ + struct ib_sa_device *sa_dev = ib_get_client_data(device, &sa_client); + int i; + + if (!sa_dev) + return; + + ib_unregister_event_handler(&sa_dev->event_handler); + + for (i = 0; i <= sa_dev->end_port - sa_dev->start_port; ++i) { + ib_unregister_mad_agent(sa_dev->port[i].agent); + kref_put(&sa_dev->port[i].sm_ah->ref, free_sm_ah); + } + + kfree(sa_dev); +} + +static int __init ib_sa_init(void) +{ + int ret; + + ret = ib_register_client(&sa_client); + if (ret) + printk(KERN_ERR "Couldn't register ib_sa client\n"); + + return ret; +} + +static void __exit ib_sa_cleanup(void) +{ + ib_unregister_client(&sa_client); +} + +module_init(ib_sa_init); +module_exit(ib_sa_cleanup); From roland at topspin.com Wed Oct 27 22:20:00 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 27 Oct 2004 22:20:00 -0700 Subject: [openib-general] [PATCH] Convert IPoIB to use new SA module Message-ID: <52sm7z2zcv.fsf@topspin.com> This converts IPoIB to use the new SA API for PathRecord and MCMemberRecord transactions. Correcting the component mask used for multicast joins after the initial broadcast group still needs to be done... - R. 
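For reference, here is a minimal caller-side sketch of the new ib_sa_path_rec_get() interface that the conversion below builds on. The names my_path_done() and my_query_path() are hypothetical, and error handling is pared down to the essentials; treat this as an illustration of the callback-based API, not as code from the patch.

static void my_path_done(int status, struct ib_sa_path_rec *resp, void *context)
{
	/* resp is NULL when the query failed or timed out */
	if (status)
		printk(KERN_WARNING "path record query failed: %d\n", status);
	else
		printk(KERN_INFO "path found, DLID 0x%04x SL %d\n",
		       be16_to_cpu(resp->dlid), resp->sl);
}

static int my_query_path(struct ib_device *device, u8 port_num,
			 union ib_gid *sgid, union ib_gid *dgid, u16 pkey)
{
	struct ib_sa_path_rec rec = {
		.numb_path = 1
	};
	struct ib_sa_query *query;
	int id;

	rec.sgid = *sgid;
	rec.dgid = *dgid;
	rec.pkey = cpu_to_be16(pkey);

	id = ib_sa_path_rec_get(device, port_num, &rec,
				IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID |
				IB_SA_PATH_REC_NUMB_PATH | IB_SA_PATH_REC_PKEY,
				1000, GFP_KERNEL, my_path_done, NULL, &query);
	if (id < 0)
		return id;

	/* id and query can later be handed to ib_sa_cancel_query(id, query) */
	return 0;
}

The multicast joins in ipoib_multicast.c follow the same pattern through ib_sa_mcmember_rec_set().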
Index: ulp/ipoib/ipoib_main.c =================================================================== --- ulp/ipoib/ipoib_main.c (revision 1085) +++ ulp/ipoib/ipoib_main.c (working copy) @@ -232,22 +232,24 @@ return 0; } -static int path_rec_completion(tTS_IB_CLIENT_QUERY_TID tid, - int status, - struct ib_path_record *pathrec, - int remaining, void *path_ptr) +static void path_rec_completion(int status, + struct ib_sa_path_rec *pathrec, + void *path_ptr) { struct ipoib_path *path = path_ptr; struct ipoib_dev_priv *priv = netdev_priv(path->dev); struct sk_buff *skb; struct ib_ah *ah; - if (status) + ipoib_dbg(priv, "status %d, LID 0x%04x for GID " IPOIB_GID_FMT "\n", + status, be16_to_cpu(pathrec->dlid), IPOIB_GID_ARG(pathrec->dgid)); + + if (status != IB_WC_SUCCESS) goto err; { struct ib_ah_attr av = { - .dlid = pathrec->dlid, + .dlid = be16_to_cpu(pathrec->dlid), .sl = pathrec->sl, .src_path_bits = 0, .static_rate = 0, @@ -273,7 +275,7 @@ "to requeue packet\n"); } - return 1; + return; err: while ((skb = __skb_dequeue(&path->queue))) @@ -283,15 +285,16 @@ IPOIB_PATH(path->neighbour) = NULL; kfree(path); - - return 1; } static int path_rec_start(struct sk_buff *skb, struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_path *path = kmalloc(sizeof *path, GFP_ATOMIC); - tTS_IB_CLIENT_QUERY_TID tid; + struct ib_sa_path_rec rec = { + .numb_path = 1 + }; + struct ib_sa_query *query; if (!path) goto err; @@ -303,17 +306,23 @@ __skb_queue_tail(&path->queue, skb); path->neighbour = NULL; + rec.sgid = priv->local_gid; + memcpy(rec.dgid.raw, skb->dst->neighbour->ha + 4, 16); + rec.pkey = cpu_to_be16(priv->pkey); + /* * XXX there's a race here if path record completion runs * before we get to finish up. Add a lock to path struct? 
*/ - if (tsIbPathRecordRequest(priv->ca, priv->port, - priv->local_gid.raw, - skb->dst->neighbour->ha + 4, - priv->pkey, 0, HZ, 0, - path_rec_completion, - path, &tid)) { - ipoib_warn(priv, "tsIbPathRecordRequest failed\n"); + if (ib_sa_path_rec_get(priv->ca, priv->port, &rec, + IB_SA_PATH_REC_DGID | + IB_SA_PATH_REC_SGID | + IB_SA_PATH_REC_NUMB_PATH | + IB_SA_PATH_REC_PKEY, + 1000, GFP_ATOMIC, + path_rec_completion, + path, &query) < 0) { + ipoib_warn(priv, "ib_sa_path_rec_get failed\n"); goto err; } @@ -329,21 +338,23 @@ return 0; } -static int unicast_arp_completion(tTS_IB_CLIENT_QUERY_TID tid, - int status, - struct ib_path_record *pathrec, - int remaining, void *skb_ptr) +static void unicast_arp_completion(int status, + struct ib_sa_path_rec *pathrec, + void *skb_ptr) { struct sk_buff *skb = skb_ptr; struct ipoib_dev_priv *priv = netdev_priv(skb->dev); struct ib_ah *ah; + ipoib_dbg(priv, "status %d, LID 0x%04x for GID " IPOIB_GID_FMT "\n", + status, be16_to_cpu(pathrec->dlid), IPOIB_GID_ARG(pathrec->dgid)); + if (status) goto err; { struct ib_ah_attr av = { - .dlid = pathrec->dlid, + .dlid = be16_to_cpu(pathrec->dlid), .sl = pathrec->sl, .src_path_bits = 0, .static_rate = 0, @@ -363,12 +374,10 @@ ipoib_warn(priv, "dev_queue_xmit failed " "to requeue ARP packet\n"); - return 1; + return; err: dev_kfree_skb(skb); - - return 1; } static void unicast_arp_finish(struct sk_buff *skb) @@ -394,7 +403,10 @@ { struct ipoib_dev_priv *priv = netdev_priv(dev); struct sk_buff *tmp_skb; - tTS_IB_CLIENT_QUERY_TID tid; + struct ib_sa_path_rec rec = { + .numb_path = 1 + }; + struct ib_sa_query *query; if (skb->destructor) { tmp_skb = skb; @@ -410,18 +422,24 @@ skb->destructor = unicast_arp_finish; memset(skb->cb, 0, sizeof skb->cb); + rec.sgid = priv->local_gid; + memcpy(rec.dgid.raw, phdr->hwaddr + 4, 16); + rec.pkey = cpu_to_be16(priv->pkey); + /* * XXX We need to keep a record of the skb and TID somewhere * so that we can cancel the request if the device goes down * before it finishes. 
*/ - if (tsIbPathRecordRequest(priv->ca, priv->port, - priv->local_gid.raw, - phdr->hwaddr + 4, - priv->pkey, 0, HZ, 0, - unicast_arp_completion, - skb, &tid)) { - ipoib_warn(priv, "tsIbPathRecordRequest failed\n"); + if (ib_sa_path_rec_get(priv->ca, priv->port, &rec, + IB_SA_PATH_REC_DGID | + IB_SA_PATH_REC_SGID | + IB_SA_PATH_REC_NUMB_PATH | + IB_SA_PATH_REC_PKEY, + 1000, GFP_ATOMIC, + unicast_arp_completion, + skb, &query) < 0) { + ipoib_warn(priv, "ib_sa_path_rec_get failed\n"); ++priv->stats.tx_dropped; dev_kfree_skb_any(skb); } @@ -736,6 +754,15 @@ priv->dev->broadcast[8] = priv->pkey >> 8; priv->dev->broadcast[9] = priv->pkey & 0xff; + result = ib_query_gid(hca, port, 0, &priv->local_gid); + if (result) { + printk(KERN_WARNING "%s: ib_query_gid port %d failed (ret = %d)\n", + hca->name, port, result); + goto alloc_mem_failed; + } else + memcpy(priv->dev->dev_addr + 4, priv->local_gid.raw, sizeof (union ib_gid)); + + result = ipoib_dev_init(priv->dev, hca, port); if (result < 0) { printk(KERN_WARNING "%s: failed to initialize port %d (ret = %d)\n", Index: ulp/ipoib/ipoib.h =================================================================== --- ulp/ipoib/ipoib.h (revision 1085) +++ ulp/ipoib/ipoib.h (working copy) @@ -39,9 +39,8 @@ #include #include +#include -#include - /* constants */ enum { @@ -102,7 +101,8 @@ struct semaphore mcast_mutex; - tTS_IB_CLIENT_QUERY_TID mcast_tid; + int mcast_query_id; + struct ib_sa_query *mcast_query; struct ipoib_mcast *broadcast; struct list_head multicast_list; Index: ulp/ipoib/ipoib_ib.c =================================================================== --- ulp/ipoib/ipoib_ib.c (revision 1085) +++ ulp/ipoib/ipoib_ib.c (working copy) @@ -25,8 +25,6 @@ #include "ipoib.h" -#include "ts_ib_sa_client.h" - static DECLARE_MUTEX(pkey_sem); static int _ipoib_ib_receive(struct ipoib_dev_priv *priv, @@ -427,7 +425,7 @@ priv->ca = ca; priv->port = port; priv->qp = NULL; - priv->mcast_tid = TS_IB_CLIENT_QUERY_TID_INVALID; + priv->mcast_query = NULL; if (ipoib_transport_dev_init(dev, ca)) { printk(KERN_WARNING "%s: ipoib_transport_dev_init failed\n", ca->name); Index: ulp/ipoib/ipoib_multicast.c =================================================================== --- ulp/ipoib/ipoib_multicast.c (revision 1085) +++ ulp/ipoib/ipoib_multicast.c (working copy) @@ -30,8 +30,6 @@ #include "ipoib.h" -#include "ts_ib_sa_client.h" - static DECLARE_MUTEX(mcast_mutex); /* Used for all multicast joins (broadcast, IPv4 mcast and IPv6 mcast) */ @@ -43,10 +41,12 @@ unsigned long created; unsigned long backoff; - struct ib_multicast_member mcast_member; + struct ib_sa_mcmember_rec mcmember; struct ib_ah *address_handle; - tTS_IB_CLIENT_QUERY_TID tid; + int query_id; + struct ib_sa_query *query; + union ib_gid mgid; unsigned long flags; @@ -125,7 +125,7 @@ mcast->address_handle = NULL; /* Will force a trigger on the first packet we need to send */ - mcast->tid = TS_IB_CLIENT_QUERY_TID_INVALID; + mcast->query = NULL; return mcast; } @@ -189,14 +189,13 @@ /* =============================================================== */ /*..ipoib_mcast_join_finish - finish joining mcast group entry */ static int ipoib_mcast_join_finish(struct ipoib_mcast *mcast, - struct ib_multicast_member *member_ptr) + struct ib_sa_mcmember_rec *mcmember) { struct net_device *dev = mcast->dev; struct ipoib_dev_priv *priv = netdev_priv(dev); int ret; - mcast->mcast_member = *member_ptr; - priv->qkey = priv->broadcast->mcast_member.qkey; + mcast->mcmember = *mcmember; if 
(test_and_set_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) { ipoib_warn(priv, "multicast group " IPOIB_GID_FMT @@ -208,9 +207,9 @@ /* Set the cached Q_Key before we attach if it's the broadcast group */ if (!memcmp(mcast->mgid.raw, priv->dev->broadcast + 4, sizeof (union ib_gid))) - priv->qkey = priv->broadcast->mcast_member.qkey; + priv->qkey = be32_to_cpu(priv->broadcast->mcmember.qkey); - ret = ipoib_mcast_attach(dev, mcast->mcast_member.mlid, &mcast->mgid); + ret = ipoib_mcast_attach(dev, be16_to_cpu(mcast->mcmember.mlid), &mcast->mgid); if (ret < 0) { ipoib_warn(priv, "couldn't attach QP to multicast group " IPOIB_GID_FMT "\n", @@ -222,21 +221,21 @@ { struct ib_ah_attr av = { - .dlid = mcast->mcast_member.mlid, + .dlid = be16_to_cpu(mcast->mcmember.mlid), .port_num = priv->port, - .sl = mcast->mcast_member.sl, + .sl = mcast->mcmember.sl, .src_path_bits = 0, .static_rate = 0, .ah_flags = IB_AH_GRH, .grh = { - .flow_label = mcast->mcast_member.flowlabel, - .hop_limit = mcast->mcast_member.hoplmt, + .flow_label = be32_to_cpu(mcast->mcmember.flow_label), + .hop_limit = mcast->mcmember.hop_limit, .sgid_index = 0, - .traffic_class = mcast->mcast_member.tclass + .traffic_class = mcast->mcmember.traffic_class } }; - memcpy(av.grh.dgid.raw, mcast->mcast_member.mgid, + memcpy(av.grh.dgid.raw, mcast->mcmember.mgid.raw, sizeof (union ib_gid)); mcast->address_handle = ib_create_ah(priv->pd, &av); @@ -247,8 +246,8 @@ " AV %p, LID 0x%04x, SL %d\n", IPOIB_GID_ARG(mcast->mgid), mcast->address_handle, - mcast->mcast_member.mlid, - mcast->mcast_member.sl); + be16_to_cpu(mcast->mcmember.mlid), + mcast->mcmember.sl); } } @@ -268,19 +267,18 @@ /* =============================================================== */ /*..ipoib_mcast_sendonly_join_complete -- handler for multicast join */ static void -ipoib_mcast_sendonly_join_complete(tTS_IB_CLIENT_QUERY_TID tid, - int status, - struct ib_multicast_member *member_ptr, +ipoib_mcast_sendonly_join_complete(int status, + struct ib_sa_mcmember_rec *mcmember, void *mcast_ptr) { struct ipoib_mcast *mcast = mcast_ptr; struct net_device *dev = mcast->dev; struct ipoib_dev_priv *priv = netdev_priv(dev); - mcast->tid = TS_IB_CLIENT_QUERY_TID_INVALID; + mcast->query = NULL; if (!status) - ipoib_mcast_join_finish(mcast, member_ptr); + ipoib_mcast_join_finish(mcast, mcmember); else { if (mcast->logcount++ < 20) ipoib_dbg_mcast(priv, "multicast join failed for " IPOIB_GID_FMT @@ -311,7 +309,14 @@ { struct net_device *dev = mcast->dev; struct ipoib_dev_priv *priv = netdev_priv(dev); - tTS_IB_CLIENT_QUERY_TID tid; + struct ib_sa_mcmember_rec rec = { +#if 0 /* Some SMs don't support send-only yet */ + .join_state = 4 +#else + .join_state = 1 +#endif + }; + struct ib_sa_query *query; int ret = 0; atomic_inc(&priv->mcast_joins); @@ -329,16 +334,20 @@ } ipoib_mcast_get(mcast); - ret = tsIbMulticastGroupJoin(priv->ca, - priv->port, mcast->mgid.raw, priv->pkey, -/* ib_sm doesn't support send only yet - TS_IB_MULTICAST_JOIN_SEND_ONLY_NON_MEMBER, -*/ - TS_IB_MULTICAST_JOIN_FULL_MEMBER, - HZ, + + rec.mgid = mcast->mgid; + rec.port_gid = priv->local_gid; + rec.pkey = be16_to_cpu(priv->pkey); + + ret = ib_sa_mcmember_rec_set(priv->ca, priv->port, &rec, + IB_SA_MCMEMBER_REC_MGID | + IB_SA_MCMEMBER_REC_PORT_GID | + IB_SA_MCMEMBER_REC_PKEY | + IB_SA_MCMEMBER_REC_JOIN_STATE, + 1000, GFP_ATOMIC, ipoib_mcast_sendonly_join_complete, - mcast, &tid); - if (ret) { + mcast, &query); + if (ret < 0) { ipoib_warn(priv, "tsIbMulticastGroupJoin failed (ret = %d)\n", ret); ipoib_mcast_put(mcast); @@ 
-347,7 +356,8 @@ ", starting join\n", IPOIB_GID_ARG(mcast->mgid)); - mcast->tid = tid; + mcast->query = query; + mcast->query_id = ret; } out: @@ -359,18 +369,17 @@ /* =============================================================== */ /*..ipoib_mcast_join_complete - handle comp of mcast join */ -static void ipoib_mcast_join_complete(tTS_IB_CLIENT_QUERY_TID tid, - int status, - struct ib_multicast_member *member_ptr, +static void ipoib_mcast_join_complete(int status, + struct ib_sa_mcmember_rec *mcmember, void *mcast_ptr) { struct ipoib_mcast *mcast = mcast_ptr; struct net_device *dev = mcast->dev; struct ipoib_dev_priv *priv = netdev_priv(dev); - priv->mcast_tid = TS_IB_CLIENT_QUERY_TID_INVALID; + priv->mcast_query = NULL; - if (!status && !ipoib_mcast_join_finish(mcast, member_ptr)) { + if (!status && !ipoib_mcast_join_finish(mcast, mcmember)) { mcast->backoff = HZ; down(&mcast_mutex); if (!test_bit(IPOIB_MCAST_STOP, &priv->flags)) @@ -410,24 +419,30 @@ static void ipoib_mcast_join(struct net_device *dev, struct ipoib_mcast *mcast) { struct ipoib_dev_priv *priv = netdev_priv(dev); - int status; + struct ib_sa_mcmember_rec rec = { + .join_state = 1 + }; + int ret = 0; ipoib_dbg_mcast(priv, "joining MGID " IPOIB_GID_FMT "\n", IPOIB_GID_ARG(mcast->mgid)); - status = tsIbMulticastGroupJoin(priv->ca, - priv->port, - mcast->mgid.raw, - priv->pkey, - TS_IB_MULTICAST_JOIN_FULL_MEMBER, - mcast->backoff, - ipoib_mcast_join_complete, - mcast, &priv->mcast_tid); + rec.mgid = mcast->mgid; + rec.port_gid = priv->local_gid; + rec.pkey = be16_to_cpu(priv->pkey); - if (status) { - ipoib_warn(priv, "tsIbMulticastGroupJoin failed, status %d\n", - status); + ret = ib_sa_mcmember_rec_set(priv->ca, priv->port, &rec, + IB_SA_MCMEMBER_REC_MGID | + IB_SA_MCMEMBER_REC_PORT_GID | + IB_SA_MCMEMBER_REC_PKEY | + IB_SA_MCMEMBER_REC_JOIN_STATE, + mcast->backoff * 1000, GFP_ATOMIC, + ipoib_mcast_join_complete, + mcast, &priv->mcast_query); + if (ret < 0) { + ipoib_warn(priv, "tsIbMulticastGroupJoin failed, status %d\n", ret); + mcast->backoff *= 2; if (mcast->backoff > IPOIB_MAX_BACKOFF_SECONDS) mcast->backoff = IPOIB_MAX_BACKOFF_SECONDS; @@ -438,7 +453,8 @@ &priv->mcast_task, mcast->backoff); up(&mcast_mutex); - } + } else + priv->mcast_query_id = ret; } /* =============================================================== */ @@ -456,6 +472,11 @@ } up(&mcast_mutex); + if (ib_query_gid(priv->ca, priv->port, 0, &priv->local_gid)) + ipoib_warn(priv, "ib_gid_entry_get() failed\n"); + else + memcpy(priv->dev->dev_addr + 4, priv->local_gid.raw, sizeof (union ib_gid)); + if (!priv->broadcast) { priv->broadcast = ipoib_mcast_alloc(dev, 1); if (!priv->broadcast) { @@ -513,12 +534,7 @@ priv->local_lid = port_lid.lid; } - if (ib_query_gid(priv->ca, priv->port, 0, &priv->local_gid)) - ipoib_warn(priv, "ib_gid_entry_get() failed\n"); - else - memcpy(priv->dev->dev_addr + 4, priv->local_gid.raw, sizeof (union ib_gid)); - - priv->mcast_mtu = ib_mtu_enum_to_int(priv->broadcast->mcast_member.mtu) + priv->mcast_mtu = ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu) - IPOIB_ENCAP_LEN; dev->mtu = min(priv->mcast_mtu, priv->admin_mtu); @@ -554,9 +570,9 @@ down(&mcast_mutex); - if (priv->mcast_tid != TS_IB_CLIENT_QUERY_TID_INVALID) { - ib_client_query_cancel(priv->mcast_tid); - priv->mcast_tid = TS_IB_CLIENT_QUERY_TID_INVALID; + if (priv->mcast_query) { + ib_sa_cancel_query(priv->mcast_query_id, priv->mcast_query); + priv->mcast_query = NULL; } set_bit(IPOIB_MCAST_STOP, &priv->flags); @@ -580,14 +596,11 @@ return 0; /* Remove ourselves from 
the multicast group */ - result = ipoib_mcast_detach(dev, mcast->mcast_member.mlid, &mcast->mgid); + result = ipoib_mcast_detach(dev, be16_to_cpu(mcast->mcmember.mlid), &mcast->mgid); if (result) ipoib_warn(priv, "ipoib_mcast_detach failed (result = %d)\n", result); - result = tsIbMulticastGroupLeave(priv->ca, priv->port, - mcast->mcast_member.mgid); - if (result) - ipoib_warn(priv, "tsIbMulticastGroupLeave failed (result = %d)\n", result); + /* XXX implement leaving SA's multicast group */ return 0; } @@ -648,7 +661,7 @@ } if (!mcast->address_handle) { - if (mcast->tid != TS_IB_CLIENT_QUERY_TID_INVALID) + if (mcast->query) ipoib_dbg_mcast(priv, "no address vector, " "but multicast join already started\n"); else if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags)) @@ -728,8 +741,8 @@ list_for_each_entry_safe(mcast, tmcast, &remove_list, list) { list_del_init(&mcast->list); - if (mcast->tid != TS_IB_CLIENT_QUERY_TID_INVALID) - ib_client_query_cancel(mcast->tid); + if (mcast->query) + ib_sa_cancel_query(mcast->query_id, mcast->query); ipoib_mcast_leave(dev, mcast); ipoib_mcast_put(mcast); } @@ -845,8 +858,8 @@ list_for_each_entry_safe(mcast, tmcast, &remove_list, list) { list_del_init(&mcast->list); - if (mcast->tid != TS_IB_CLIENT_QUERY_TID_INVALID) - ib_client_query_cancel(mcast->tid); + if (mcast->query) + ib_sa_cancel_query(mcast->query_id, mcast->query); ipoib_mcast_leave(mcast->dev, mcast); ipoib_mcast_put(mcast); } From roland at topspin.com Wed Oct 27 22:29:02 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 27 Oct 2004 22:29:02 -0700 Subject: [openib-general] 2 questions on physical code layout In-Reply-To: <20041027135517.17ba87fb.mshefty@ichips.intel.com> (Sean Hefty's message of "Wed, 27 Oct 2004 13:55:17 -0700") References: <52pt36brzm.fsf@topspin.com> <1098728484.3269.968.camel@localhost.localdomain> <52bren527p.fsf@topspin.com> <20041027135517.17ba87fb.mshefty@ichips.intel.com> Message-ID: <52oein2yxt.fsf@topspin.com> Sean> I didn't realize that you had taken a copy of the current Sean> mad code. Is there anything in the openib-candidate branch Sean> that isn't in your branch? Does it make sense to just Sean> update the code in the roland-merge branch? I've got everything up to r1080 in my branch (which I think is the latest). I would be fine with consolidating work onto the roland-merge branch and pushing core/mad changes through you and Hal, if that's what the consensus is. Or we could copy roland-merge to a new branch with a different name and work there. - R. From mshefty at ichips.intel.com Wed Oct 27 23:01:19 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 27 Oct 2004 23:01:19 -0700 Subject: [openib-general] [PATCH] ib_mad: In completion handler, when status != success call send done handler In-Reply-To: <20041027095328.2e2a7f63.mshefty@ichips.intel.com> References: <1098758558.14442.6.camel@hpc-1> <52654xa244.fsf@topspin.com> <20041026093000.72b60799.mshefty@ichips.intel.com> <1098809096.3266.47.camel@localhost.localdomain> <20041026095009.037a4d4a.mshefty@ichips.intel.com> <1098886805.3266.218.camel@localhost.localdomain> <20041027095328.2e2a7f63.mshefty@ichips.intel.com> Message-ID: <20041027230119.1f5d11d4.mshefty@ichips.intel.com> On Wed, 27 Oct 2004 09:53:28 -0700 Sean Hefty wrote: > I'll create a patch that uses separate send_posted_mad_list's for > QP0/1, but try to keep the changes fairly minimal. I'll do this after > changing the completion handling to use the current workqueue, rather > than allocating a separate thread. 
(I've canned my user-mode work,
> since Roland is further along.)

I've run into a few other issues trying to use separate send queues.

One of note is that receives are posted to the QP outside of the lock that
inserts them onto the recv_posted_mad_list. I don't think that this causes
a problem at the moment, since receives are always re-posted from the
completion handler, which is single threaded.

Question then, should I go ahead and fix this so that it would work in a
multi-threaded case, or assume that completion handling will be single
threaded and optimize for this by removing unnecessary locking?
(Currently, my patch fixes the locking, but it should be noted that the
code won't actually test that the locking is correct as it's written.)

- Sean

From mshefty at ichips.intel.com Wed Oct 27 23:06:41 2004
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 27 Oct 2004 23:06:41 -0700
Subject: [openib-general] 2 questions on physical code layout
In-Reply-To: <52oein2yxt.fsf@topspin.com>
References: <52pt36brzm.fsf@topspin.com>
	<1098728484.3269.968.camel@localhost.localdomain>
	<52bren527p.fsf@topspin.com>
	<20041027135517.17ba87fb.mshefty@ichips.intel.com>
	<52oein2yxt.fsf@topspin.com>
Message-ID: <20041027230641.742e2543.mshefty@ichips.intel.com>

On Wed, 27 Oct 2004 22:29:02 -0700
Roland Dreier wrote:

> Sean> I didn't realize that you had taken a copy of the current
> Sean> mad code. Is there anything in the openib-candidate branch
> Sean> that isn't in your branch? Does it make sense to just
> Sean> update the code in the roland-merge branch?
>
> I've got everything up to r1080 in my branch (which I think is the
> latest).
>
> I would be fine with consolidating work onto the roland-merge branch
> and pushing core/mad changes through you and Hal, if that's what the
> consensus is.
>
> Or we could copy roland-merge to a new branch with a different name
> and work there.

Either is fine. I'd just like to get to a single branch. Renaming
roland-merge may make it easier for people to locate the correct code.

From halr at voltaire.com Thu Oct 28 05:14:26 2004
From: halr at voltaire.com (Hal Rosenstock)
Date: Thu, 28 Oct 2004 08:14:26 -0400
Subject: [openib-general] [PATCH] mad: Comment change (Roland's branch)
Message-ID: <1098965660.22191.7.camel@hpc-1>

mad: Comment change (Roland's branch)

Index: mad.c
===================================================================
--- mad.c	(revision 1088)
+++ mad.c	(working copy)
@@ -1771,7 +1771,7 @@
 	qp_init_attr.cap.max_recv_wr = IB_MAD_QP_RECV_SIZE;
 	qp_init_attr.cap.max_send_sge = IB_MAD_SEND_REQ_MAX_SG;
 	qp_init_attr.cap.max_recv_sge = IB_MAD_RECV_REQ_MAX_SG;
-	qp_init_attr.qp_type = i;
+	qp_init_attr.qp_type = i; /* Relies on ib_qp_type enum ordering of IB_QPT_SMI and IB_QPT_GSI */
 	qp_init_attr.port_num = port_priv->port_num;
 	port_priv->qp[i] = ib_create_qp(port_priv->pd, &qp_init_attr, &qp_cap);

From tziporet at mellanox.co.il Thu Oct 28 05:22:36 2004
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Thu, 28 Oct 2004 14:22:36 +0200
Subject: [openib-general] agent_mad_send
Message-ID: <506C3D7B14CDD411A52C00025558DED6064BE93A@mtlex01.yok.mtl.com>

The reason for this is that for special QPs the driver inserts the AV
into the packet, not the HW. For a regular QP the HW reads the AV only
when it is actually doing the send, so you have to wait and not destroy
the AV until the send completes.

Tziporet
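To make the point concrete, here is a rough sketch of the pattern being described: defer ib_destroy_ah() until the send completion arrives. The context structure and the my_post_mad()/my_send_handler() names are made up for illustration and are not part of the posted code.

struct my_mad_ctx {
	struct ib_ah *ah;
};

static int my_post_mad(struct ib_mad_agent *agent, struct ib_ah_attr *ah_attr,
		       struct ib_send_wr *wr)
{
	struct my_mad_ctx *ctx;
	struct ib_send_wr *bad_wr;
	int ret;

	ctx = kmalloc(sizeof *ctx, GFP_KERNEL);
	if (!ctx)
		return -ENOMEM;

	ctx->ah = ib_create_ah(agent->qp->pd, ah_attr);
	if (IS_ERR(ctx->ah)) {
		ret = PTR_ERR(ctx->ah);
		kfree(ctx);
		return ret;
	}

	wr->wr.ud.ah = ctx->ah;
	wr->wr_id = (unsigned long) ctx;

	ret = ib_post_send_mad(agent, wr, &bad_wr);
	if (ret) {
		/* The send was never posted, so clean up right away */
		ib_destroy_ah(ctx->ah);
		kfree(ctx);
	}
	return ret;
}

static void my_send_handler(struct ib_mad_agent *agent,
			    struct ib_mad_send_wc *wc)
{
	struct my_mad_ctx *ctx = (struct my_mad_ctx *) (unsigned long) wc->wr_id;

	/* Only now is the HW guaranteed to be done reading the AV */
	ib_destroy_ah(ctx->ah);
	kfree(ctx);
}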
-----Original Message-----
From: Roland Dreier [mailto:roland at topspin.com]
Sent: Wednesday, October 27, 2004 6:18 AM
To: Sean Hefty
Cc: openib-general at openib.org
Subject: Re: [openib-general] agent_mad_send

Sean> In agent_mad_send, a call is made to create an address
Sean> handle. Immediately after calling ib_post_send_mad, the
Sean> address handle is destroyed. I think that we want to wait
Sean> until the send is completed before destroying the address
Sean> handle, and require this of all callers of ib_post_send_mad.

Yes, that's correct. Because of a quirk in the way Mellanox HCAs
implement special QPs, it's actually OK to destroy the AH immediately
after posting the send, but for an ordinary QP this will lead to some
bizarre problems.

- R.
_______________________________________________
openib-general mailing list
openib-general at openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general
-------------- next part --------------
An HTML attachment was scrubbed...

From halr at voltaire.com Thu Oct 28 05:29:28 2004
From: halr at voltaire.com (Hal Rosenstock)
Date: Thu, 28 Oct 2004 08:29:28 -0400
Subject: [openib-general] hotplug and mthca
Message-ID: <1098966568.3266.2947.camel@localhost.localdomain>

Hi,

When I start mthca, I get the following:
/sbin/hotplug: no runnable /etc/hotplug/infiniband.agent is installed

How does one get mthca to load automatically ?

Note this is just curiosity/understanding. It is not impacting anything.

Thanks.

-- Hal

From tziporet at mellanox.co.il Thu Oct 28 05:47:13 2004
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Thu, 28 Oct 2004 14:47:13 +0200
Subject: [openib-general] logset
Message-ID: <506C3D7B14CDD411A52C00025558DED6064BE93B@mtlex01.yok.mtl.com>

logset print will print the kernel modules' debug print status into
/var/log/messages. You should use it after the driver is loaded.

Example:
> logset print
> tail /var/log/messages
Oct 28 14:34:18 swlab99 kernel: mtl_log_set: layer 'print', info ''
Oct 28 14:34:18 swlab99 kernel:
Oct 28 14:34:18 swlab99 kernel: Layers and severities for print
Oct 28 14:34:18 swlab99 kernel: -------------------------------
Oct 28 14:34:18 swlab99 kernel: Layer - "THH":
Oct 28 14:34:18 swlab99 kernel: Name="trace", severities=""
Oct 28 14:34:18 swlab99 kernel: Name="debug", severities=""
Oct 28 14:34:18 swlab99 kernel: Name="error", severities="1234"

By default, error messages are always printed into the /var/log/messages
file, so if vstat is not working look for an error message in this file.

In case of user-level errors, in order to see them you need to set:
export MTL_LOG="error:"

Tziporet

-----Original Message-----
From: Ronald G. Minnich [mailto:rminnich at lanl.gov]
Sent: Wednesday, October 27, 2004 9:23 PM
To: openib-general at openib.org
Subject: [openib-general] logset

I'm unclear on how to use this.

logset print gives back nothing.

logset debug 8 gives back nothing.

what is this thing? how do I get trace logging of something like vstat,
given that vstat is not working on my machine?

thanks

ron
_______________________________________________
openib-general mailing list
openib-general at openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From halr at voltaire.com Thu Oct 28 06:06:14 2004
From: halr at voltaire.com (Hal Rosenstock)
Date: Thu, 28 Oct 2004 09:06:14 -0400
Subject: [openib-general] Re: ib_mad_recv_wrid index field
In-Reply-To: <20041027161525.770adada.mshefty@ichips.intel.com>
References: <20041027161525.770adada.mshefty@ichips.intel.com>
Message-ID: <1098968774.3266.2984.camel@localhost.localdomain>

On Wed, 2004-10-27 at 19:15, Sean Hefty wrote:
> What's the purpose behind the index field in the receive wr_id?

Currently, it's just a uniquifier. It's not used for anything (other
than perhaps a pseudo receive packet counter on the QPN).

-- Hal

From halr at voltaire.com Thu Oct 28 07:23:28 2004
From: halr at voltaire.com (Hal Rosenstock)
Date: Thu, 28 Oct 2004 10:23:28 -0400
Subject: [openib-general] ib_mad_port_start allows receive processing before sends can be posted
In-Reply-To: <20041027154250.23190b8f.mshefty@ichips.intel.com>
References: <20041027154250.23190b8f.mshefty@ichips.intel.com>
Message-ID: <1098973408.3339.3.camel@localhost.localdomain>

On Wed, 2004-10-27 at 18:42, Sean Hefty wrote:
> There appears to be a minor race in ib_mad_port_start where the MAD
> layer could begin accepting and processing receives before the QP allows
> sends, or even before we know if the QP will finish initializing
> properly. This makes it difficult to handle traffic that comes in
> before the QP is transitioned to the RTS state, to recover from errors
> if the RTS transition fails, or to recover from errors if we fail to
> initialize QP1 after QP0 is active.

Not sure I see the problem. If we receive something and decide to
respond, as long as the send is dropped when the QP has not yet reached
RTS, there should be no issue.

> Longer term, we may want to consider separating the QP0 and QP1
> initialization.

Perhaps.

> Short term, I think that if we just move the code around in
> ib_mad_port_start, we should be able to ensure that both QPs are ready
> to send and receive before handling any receives. (I don't think that
> we care if the QPs go to the RTS state without any receives being posted
> on them. We'll lose all MADs received before the QP goes into the RTR
> state anyway, so this adds a small delay onto the time that we need to
> begin handling receives.)
>
> Unless there's a reason to keep the code as is, I'll generate a patch
> for this.

I have no objection (but would like to understand more of the perceived
problem(s)). Perhaps this approach makes things simpler in the long run,
but I think the case of sending when the QP is not in RTS needs to be
dealt with anyhow and may take care of the problems you are concerned
about.

-- Hal

>
> - Sean

From halr at voltaire.com Thu Oct 28 07:30:44 2004
From: halr at voltaire.com (Hal Rosenstock)
Date: Thu, 28 Oct 2004 10:30:44 -0400
Subject: [Fwd: Re: [openib-general] ib_free_recv_mad and references]
Message-ID: <1098973844.3339.11.camel@localhost.localdomain>

I'm not sure whether this overlaps with Sean's imminent patches so I'll
hold off on this for now until I see or hear something. (I don't think
it does but want to be sure). Also, since I haven't heard back, I would
assume the additional parameter needs to be added to ib_free_recv_mad.

-- Hal

-----Forwarded Message-----

From: Hal Rosenstock
To: Sean Hefty
Cc: openib-general at openib.org
Subject: Re: [openib-general] ib_free_recv_mad and references
Date: 27 Oct 2004 14:03:26 -0400

On Wed, 2004-10-27 at 13:45, Sean Hefty wrote:
> If you have time. I'll get to it if not. I don't think this will be a
> large change.
I think the signature for ib_free_recv_mad needs to add in a mad_agent parameter as there is currently no need to know which mad_agent was returning the buffers but there will be for the ref counting. Do you see some other way to do this ? Also, with this, I now see what you were saying about partially reassembled (RMPP) receives. BTW, there are 2 comments in ib_unregister_mad_agent referring to both of these: /* Note that we could still be handling received MADs */ /* XXX: Cleanup pending RMPP receives for this agent */ -- Hal _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From halr at voltaire.com Thu Oct 28 07:42:23 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 28 Oct 2004 10:42:23 -0400 Subject: [openib-general] [PATCH] ib_mad: In completion handler, when status != success call send done handler In-Reply-To: <20041027230119.1f5d11d4.mshefty@ichips.intel.com> References: <1098758558.14442.6.camel@hpc-1> <52654xa244.fsf@topspin.com> <20041026093000.72b60799.mshefty@ichips.intel.com> <1098809096.3266.47.camel@localhost.localdomain> <20041026095009.037a4d4a.mshefty@ichips.intel.com> <1098886805.3266.218.camel@localhost.localdomain> <20041027095328.2e2a7f63.mshefty@ichips.intel.com> <20041027230119.1f5d11d4.mshefty@ichips.intel.com> Message-ID: <1098974356.3339.16.camel@localhost.localdomain> On Thu, 2004-10-28 at 02:01, Sean Hefty wrote: > I've run into a few other issues trying to use separate send queues. > One of note is that receives are posted to the QP outside of the lock > that inserts them onto the recv_posted_mad_list. I couldn't find where you were referring to. Can you point me at it ? > I don't think that this causes a problem at the moment, since receives are > always re-posted from the completion handler, which is single threaded. > > Question then, should I go ahead and fix this so that it would work in a > multi-threaded case, or assume that completion handling will be single > threaded and optimize for this by removing unnecessary locking? Also, if this is locked, should we go to finer grained locks ? Currently there is a lock for the receive list, but might a lock per receive list per QP be better ? > (Currently, my patch fixes the locking, but it should be noted that the > code won't actually test that the locking is correct as it's written.) I guess we'll just do it by code inspection or someone should develop test(s) or real case(s) for this. -- Hal From roland at topspin.com Thu Oct 28 08:05:32 2004 From: roland at topspin.com (Roland Dreier) Date: Thu, 28 Oct 2004 08:05:32 -0700 Subject: [openib-general] hotplug and mthca In-Reply-To: <1098966568.3266.2947.camel@localhost.localdomain> (Hal Rosenstock's message of "Thu, 28 Oct 2004 08:29:28 -0400") References: <1098966568.3266.2947.camel@localhost.localdomain> Message-ID: <52654u3mtf.fsf@topspin.com> Hal> Hi, When I start mthca, I get the following: Hal> /sbin/hotplug: no runnable /etc/hotplug/infiniband.agent is Hal> installed That's benign -- hotplug gets an event when /sys/class/infiniband/mthca0 is created but it doesn't have a script for handling the infiniband class. Hal> How does one get mthca to load automatically ? 
If ib_mthca.ko is in /lib/modules/xxx for your boot kernel, hotplug should load it automatically on boot; it matches the HCA PCI device with the signature in ib_mthca.ko (the alias: pci:v000015B3d00005A44sv*sd*bc*sc*i* alias: pci:v00001867d00005A44sv*sd*bc*sc*i* alias: pci:v000015B3d00006278sv*sd*bc*sc*i* alias: pci:v00001867d00006278sv*sd*bc*sc*i* part of modinfo output). - R. From roland at topspin.com Thu Oct 28 08:13:25 2004 From: roland at topspin.com (Roland Dreier) Date: Thu, 28 Oct 2004 08:13:25 -0700 Subject: [openib-general] hotplug and mthca In-Reply-To: <52654u3mtf.fsf@topspin.com> (Roland Dreier's message of "Thu, 28 Oct 2004 08:05:32 -0700") References: <1098966568.3266.2947.camel@localhost.localdomain> <52654u3mtf.fsf@topspin.com> Message-ID: <521xfi3mga.fsf@topspin.com> Roland> If ib_mthca.ko is in /lib/modules/xxx for your boot Roland> kernel, hotplug should load it automatically on boot (Assuming either your distro or you set up hotplug correctly to run on boot...) - R. From iod00d at hp.com Thu Oct 28 08:48:12 2004 From: iod00d at hp.com (Grant Grundler) Date: Thu, 28 Oct 2004 08:48:12 -0700 Subject: [openib-general] 2 questions on physical code layout In-Reply-To: <52oein2yxt.fsf@topspin.com> References: <52pt36brzm.fsf@topspin.com> <1098728484.3269.968.camel@localhost.localdomain> <52bren527p.fsf@topspin.com> <20041027135517.17ba87fb.mshefty@ichips.intel.com> <52oein2yxt.fsf@topspin.com> Message-ID: <20041028154812.GA16568@cup.hp.com> On Wed, Oct 27, 2004 at 10:29:02PM -0700, Roland Dreier wrote: > Or we could copy roland-merge to a new branch with a different name > and work there. I'd prefer this *after* openib.org blesses the new name. Otherwise calling it roland_merge is fine with me. grant From halr at voltaire.com Thu Oct 28 09:38:12 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 28 Oct 2004 12:38:12 -0400 Subject: [openib-general] Errors when building latest IPoIB Message-ID: <1098981492.17991.20.camel@hpc-1> When I build with IPoIB configured, I get the following errors: *** Warning: "ib_client_query_cancel" [drivers/infiniband/ulp/ipoib/ib_ip2pr.ko] undefined! *** Warning: "tsIbPathRecordRequest" [drivers/infiniband/ulp/ipoib/ib_ip2pr.ko] undefined! The former is part of client_query.c and the latter sa_client_path_record.c but these are not included in the build. Are they supposed to be or is something else broken ? -- Hal From roland at topspin.com Thu Oct 28 09:34:58 2004 From: roland at topspin.com (Roland Dreier) Date: Thu, 28 Oct 2004 09:34:58 -0700 Subject: [openib-general] Errors when building latest IPoIB In-Reply-To: <1098981492.17991.20.camel@hpc-1> (Hal Rosenstock's message of "Thu, 28 Oct 2004 12:38:12 -0400") References: <1098981492.17991.20.camel@hpc-1> Message-ID: <52is8u243x.fsf@topspin.com> Hal> The former is part of client_query.c and the latter Hal> sa_client_path_record.c but these are not included in the Hal> build. Are they supposed to be or is something else broken ? Oh, ip2pr shouldn't be built any more either (new MAD/SA stuff breaks it). I'll fix the Makefile. - R. 
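Tying this back to Roland's hotplug explanation above: the pci: aliases reported by modinfo are generated from the driver's PCI ID table, which is what lets hotplug/modprobe pick ib_mthca for a matching HCA. The following is a hypothetical sketch of such a table, not the actual mthca source; the vendor/device IDs are simply copied from the aliases quoted above.

#include <linux/module.h>
#include <linux/pci.h>

static struct pci_device_id my_hca_pci_table[] = {
	{ PCI_DEVICE(0x15b3, 0x5a44) },
	{ PCI_DEVICE(0x1867, 0x5a44) },
	{ PCI_DEVICE(0x15b3, 0x6278) },
	{ PCI_DEVICE(0x1867, 0x6278) },
	{ 0 }
};

/* modpost turns this table into the pci:v...d... alias strings */
MODULE_DEVICE_TABLE(pci, my_hca_pci_table);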
From mshefty at ichips.intel.com Thu Oct 28 09:58:51 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 28 Oct 2004 09:58:51 -0700 Subject: [openib-general] [PATCH] ib_mad: In completion handler, when status != success call send done handler In-Reply-To: <1098974356.3339.16.camel@localhost.localdomain> References: <1098758558.14442.6.camel@hpc-1> <52654xa244.fsf@topspin.com> <20041026093000.72b60799.mshefty@ichips.intel.com> <1098809096.3266.47.camel@localhost.localdomain> <20041026095009.037a4d4a.mshefty@ichips.intel.com> <1098886805.3266.218.camel@localhost.localdomain> <20041027095328.2e2a7f63.mshefty@ichips.intel.com> <20041027230119.1f5d11d4.mshefty@ichips.intel.com> <1098974356.3339.16.camel@localhost.localdomain> Message-ID: <20041028095851.6a77a2a9.mshefty@ichips.intel.com> On Thu, 28 Oct 2004 10:42:23 -0400 Hal Rosenstock wrote: > On Thu, 2004-10-28 at 02:01, Sean Hefty wrote: > > I've run into a few other issues trying to use separate send queues. > > > > One of note is that receives are posted to the QP outside of the > > lock that inserts them onto the recv_posted_mad_list. > > I couldn't find where you were referring to. Can you point me at it ? I think it was in ib_mad_post_receive_mad. > Also, if this is locked, should we go to finer grained locks ? > Currently there is a lock for the receive list, but might a lock per > receive list per QP be better ? I've changed the code to use a lock per QP. So, each QP now has their own send list, receive list, send lock, and receive lock. - Sean From mshefty at ichips.intel.com Thu Oct 28 10:05:43 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 28 Oct 2004 10:05:43 -0700 Subject: [openib-general] ib_mad_port_start allows receive processing before sends can be posted In-Reply-To: <1098973408.3339.3.camel@localhost.localdomain> References: <20041027154250.23190b8f.mshefty@ichips.intel.com> <1098973408.3339.3.camel@localhost.localdomain> Message-ID: <20041028100543.26d193c0.mshefty@ichips.intel.com> On Thu, 28 Oct 2004 10:23:28 -0400 Hal Rosenstock wrote: > I have no objection (but would like to understand more of the > perceived problem(s)). Perhaps this approach makes things simpler in > the long run, but I think the case of sending when the QP is not RTS > needs to be dealt with anyhow and make take care of the problems you > are concerned about. I think that we'll have to handle this as well, but think that it may be easier to handle it in error handling code only, rather than during initialization as well. This gets a little into how we handle immediate errors when posting sends, with respect to which errors result in the access layer queuing the MAD, versus errors that are reported to the user directly. But, I probably won't get to looking at that for a couple of weeks. - Sean From mshefty at ichips.intel.com Thu Oct 28 10:09:09 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 28 Oct 2004 10:09:09 -0700 Subject: [openib-general] [PATCH] mad: Comment change (Roland's branch) In-Reply-To: <1098965660.22191.7.camel@hpc-1> References: <1098965660.22191.7.camel@hpc-1> Message-ID: <20041028100909.4bdbfb9d.mshefty@ichips.intel.com> On Thu, 28 Oct 2004 08:14:26 -0400 Hal Rosenstock wrote: > mad: Comment change (Roland's branch) > > - qp_init_attr.qp_type = i; > + qp_init_attr.qp_type = i; /* Relies on ib_qp_type enum ordering > of IB_QPT_SMI and IB_QPT_GSI */ As a side note, it was convenient for me to remove this restriction with my next set of changes, so I did. 
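As a rough illustration of the per-QP layout being described (each QP with its own posted-send and posted-receive queues, each under its own lock), something along these lines; the exact definitions live in the patch and its headers, so treat the field names here only as an approximation.

/* Illustrative only -- see the mad.c patch later in this thread for the
 * real ib_mad_queue / ib_mad_qp_info definitions. */
struct ib_mad_qp_info;
struct ib_mad_port_private;

struct ib_mad_queue {
	spinlock_t		lock;
	struct list_head	list;
	int			count;
	struct ib_mad_qp_info	*qp_info;
};

struct ib_mad_qp_info {
	struct ib_mad_port_private	*port_priv;
	struct ib_qp			*qp;
	struct ib_mad_queue		send_queue;
	struct ib_mad_queue		recv_queue;
};

With this layout, the queue_mad()/dequeue_mad() helpers only take the lock belonging to the one queue they touch, so QP0 and QP1 no longer contend on shared port-wide lists.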
- Sean From mshefty at ichips.intel.com Thu Oct 28 10:13:11 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 28 Oct 2004 10:13:11 -0700 Subject: [Fwd: Re: [openib-general] ib_free_recv_mad and references] In-Reply-To: <1098973844.3339.11.camel@localhost.localdomain> References: <1098973844.3339.11.camel@localhost.localdomain> Message-ID: <20041028101311.17a35e69.mshefty@ichips.intel.com> On Thu, 28 Oct 2004 10:30:44 -0400 Hal Rosenstock wrote: > I'm not sure whether this overlaps with Sean's imminent patches so > I'll hold off on this for now until I see or hear something. (I don't > think it does but want to be sure). Also, since I haven't heard back, > I would assume the additional parameter needs to be added to > ib_free_recv_mad. I haven't touched this area of the code, so if you want to apply a patch, that should work fine. And we can either add a new parameter to ib_free_recv_mad, or store the mad_agent somewhere off the MAD completion. My preference would be to add the parameter to ib_free_recv_mad. - Sean From halr at voltaire.com Thu Oct 28 10:51:43 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 28 Oct 2004 13:51:43 -0400 Subject: [openib-general] Latest IPoIB Bringup Questions Message-ID: <1098985903.17991.74.camel@hpc-1> Hi, After I load the ib_ipoib module (after loading ib_mthca), I see the ib0 and ib1 net interfaces. The HW address has no QPN and the default subnet prefix (0xFE80:0....) in it. It indicates UP and RUNNING (as well as BROADCAST and MULTICAST). After ifconfig'ing an IPv4 address and netmask< I see the that the QPN is filled in in the HWaddr (00-00-04-04). No other bytes change. Is that correct ? Wouldn't the lower bytes get filled in with some local GID ? Also, when I look at the SA packets on the IB "wire", I see the following: IPoIB End Node SA Set(MC) Bcast group (TID 0x0000000700000000) ------> 1 msec Set(MC) Bcast group (TID 0x0000000700000000) ------> 16.837 msec Set(MC) Bcast group (TID 0x0000000700000001) ------> 1.256 msec <------------- GetResp (TID 0x0000000700000000) status 0 0.536 msec Set(MC) x.x.x.01 group (TID x0000000700000001) ------> presumably 224.0.0.1 1.251 msec <------------- GetResp (TID 0x0000000700000000) status 0 1.579 msec <------------- GetResp (TID 0x0000000700000001) status 0 1.465 msec <------------- GetResp (TID 0x0000000700000000) status 0x0600 Some questions/comments on this sequence: 1. In general, how long is the end node waiting before retransmitting a SA MC request ? 2. How are TIDs chosen ? It looks to me like there are 2 different MC requests potentially outstanding with the same TID. The last request is the insufficient component mask issue. I haven't checked connectivity at the IP level yet. (Not sure whether the broadcast group was formed correctly or not). One other minor comment: Should we teach ifconfig to display Link Encap: INFINIBAND ? -- Hal From roland at topspin.com Thu Oct 28 12:32:15 2004 From: roland at topspin.com (Roland Dreier) Date: Thu, 28 Oct 2004 12:32:15 -0700 Subject: [openib-general] Latest IPoIB Bringup Questions In-Reply-To: <1098985903.17991.74.camel@hpc-1> (Hal Rosenstock's message of "Thu, 28 Oct 2004 13:51:43 -0400") References: <1098985903.17991.74.camel@hpc-1> Message-ID: <52654u1vwg.fsf@topspin.com> Hal> After ifconfig'ing an IPv4 address and netmask< I see the Hal> that the QPN is filled in in the HWaddr (00-00-04-04). No Hal> other bytes change. Is that correct ? Wouldn't the lower Hal> bytes get filled in with some local GID ? 
Actually with ifconfig I don't think you can see enough of the HW addr to see the GUID part of the GID. On my box I have: # ifconfig ib0 ib0 Link encap:UNSPEC HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00 but # ip addr show dev ib0 6: ib0: mtu 2044 qdisc pfifo_fast qlen 128 link/[32] 00:00:04:04:fe:80:00:00:00:00:00:00:00:02:c9:01:07:8c:e4:61 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff Hal> Some questions/comments on this sequence: 1. In general, how Hal> long is the end node waiting before retransmitting a SA MC Hal> request ? It should wait 1 second and back off from there. But I think I have a bug somewhere either in the IPoIB driver or the SA layer. Hal> 2. How are TIDs chosen ? It looks to me like there are 2 Hal> different MC requests potentially outstanding with the same Hal> TID. The SA layer uses idr.h to allocate the low TID -- so it shouldn't reuse a TID until the previous request times out or is completed. Hal> The last request is the insufficient component mask issue. Right, that should be possible to fix now with the new SA API. Hal> One other minor comment: Should we teach ifconfig to display Hal> Link Encap: INFINIBAND ? Probably better to work on ip, since ifconfig has other issues (such as using an ioctl limited to 14 bytes to get the HW addr) - R. From halr at voltaire.com Thu Oct 28 12:54:53 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 28 Oct 2004 15:54:53 -0400 Subject: [openib-general] Latest IPoIB Bringup Questions In-Reply-To: <52654u1vwg.fsf@topspin.com> References: <1098985903.17991.74.camel@hpc-1> <52654u1vwg.fsf@topspin.com> Message-ID: <1098993293.17991.186.camel@hpc-1> On Thu, 2004-10-28 at 15:32, Roland Dreier wrote: > Hal> After ifconfig'ing an IPv4 address and netmask< I see the > Hal> that the QPN is filled in in the HWaddr (00-00-04-04). No > Hal> other bytes change. Is that correct ? Wouldn't the lower > Hal> bytes get filled in with some local GID ? > > Actually with ifconfig I don't think you can see enough of the HW addr > to see the GUID part of the GID. On my box I have: > > # ifconfig ib0 > ib0 Link encap:UNSPEC HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00 > > but > > # ip addr show dev ib0 > 6: ib0: mtu 2044 qdisc pfifo_fast qlen 128 > link/[32] 00:00:04:04:fe:80:00:00:00:00:00:00:00:02:c9:01:07:8c:e4:61 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff Thanks. That looks better :-) > Hal> Some questions/comments on this sequence: 1. In general, how > Hal> long is the end node waiting before retransmitting a SA MC > Hal> request ? > > It should wait 1 second and back off from there. But I think I have a > bug somewhere either in the IPoIB driver or the SA layer. > > Hal> 2. How are TIDs chosen ? It looks to me like there are 2 > Hal> different MC requests potentially outstanding with the same > Hal> TID. > > The SA layer uses idr.h to allocate the low TID -- so it shouldn't > reuse a TID until the previous request times out or is completed. This could be related to #1 above. > Hal> The last request is the insufficient component mask issue. > > Right, that should be possible to fix now with the new SA API. > > Hal> One other minor comment: Should we teach ifconfig to display > Hal> Link Encap: INFINIBAND ? 
> > Probably better to work on ip, since ifconfig has other issues (such > as using an ioctl limited to 14 bytes to get the HW addr) I have basic IP connectivity but tbe component mask issue is getting in the way of some things in some cases I see a flood of join requests/rejects until the application gives up. The "flood" may also be due to the timing issues in #1 above. -- Hal From mshefty at ichips.intel.com Thu Oct 28 23:30:00 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 28 Oct 2004 23:30:00 -0700 Subject: [openib-general] [PATCH] for review -- fix MAD completion handling Message-ID: <20041028233000.19879b59.mshefty@ichips.intel.com> Here's what I have to handle MAD completion handling. This patch tries to fix the issue of matching a completion (successful or error) with the corresponding work request. Some notes: - I kept a single CQ, which meant that the wr_id needed to determine if the request was a send or receive (in the case of errors), and find the request. - I structured the code in order to support multi-threaded polling of the CQ. This implied that requests could be processed out of order from how they were queued on the posted lists. (Or we needed coarser locking.) - The correct work request is accessed directly, without searching. - Locking between the QPs, has been reduced, and locks have been made finer grained. - The code structure should help when adding error handling (which I have on my to do list). I have a few more notes and items to revisit, but I don't have them nearby at the moment. I will begin testing the code tomorrow, but wanted to make it available for review. - Sean Index: access/mad.c =================================================================== --- access/mad.c (revision 1092) +++ access/mad.c (working copy) @@ -81,9 +81,8 @@ static int add_mad_reg_req(struct ib_mad_reg_req *mad_reg_req, struct ib_mad_agent_private *priv); static void remove_mad_reg_req(struct ib_mad_agent_private *priv); -static int ib_mad_post_receive_mad(struct ib_mad_port_private *port_priv, - struct ib_qp *qp); -static int ib_mad_post_receive_mads(struct ib_mad_port_private *priv); +static int ib_mad_post_receive_mad(struct ib_mad_qp_info *qp_info); +static int ib_mad_post_receive_mads(struct ib_mad_qp_info *qp_info); static void cancel_mads(struct ib_mad_agent_private *mad_agent_priv); static void ib_mad_complete_send_wr(struct ib_mad_send_wr_private *mad_send_wr, struct ib_mad_send_wc *mad_send_wc); @@ -130,6 +129,19 @@ 0 : mgmt_class; } +static int get_spl_qp_index(enum ib_qp_type qp_type) +{ + switch (qp_type) + { + case IB_QPT_SMI: + return 0; + case IB_QPT_GSI: + return 1; + default: + return -1; + } +} + /* * ib_register_mad_agent - Register to send/receive MADs */ @@ -148,12 +160,13 @@ struct ib_mad_reg_req *reg_req = NULL; struct ib_mad_mgmt_class_table *class; struct ib_mad_mgmt_method_table *method; - int ret2; + int ret2, qpn; unsigned long flags; u8 mgmt_class; /* Validate parameters */ - if (qp_type != IB_QPT_GSI && qp_type != IB_QPT_SMI) { + qpn = get_spl_qp_index(qp_type); + if (qpn == -1) { ret = ERR_PTR(-EINVAL); goto error1; } @@ -248,14 +261,14 @@ /* Now, fill in the various structures */ memset(mad_agent_priv, 0, sizeof *mad_agent_priv); - mad_agent_priv->port_priv = port_priv; + mad_agent_priv->qp_info = &port_priv->qp_info[qpn]; mad_agent_priv->reg_req = reg_req; mad_agent_priv->rmpp_version = rmpp_version; mad_agent_priv->agent.device = device; mad_agent_priv->agent.recv_handler = recv_handler; mad_agent_priv->agent.send_handler = 
send_handler; mad_agent_priv->agent.context = context; - mad_agent_priv->agent.qp = port_priv->qp[qp_type]; + mad_agent_priv->agent.qp = port_priv->qp_info[qpn].qp; mad_agent_priv->agent.hi_tid = ++ib_mad_client_id; mad_agent_priv->agent.port_num = port_num; @@ -276,7 +289,6 @@ INIT_WORK(&mad_agent_priv->work, timeout_sends, mad_agent_priv); atomic_set(&mad_agent_priv->refcount, 1); init_waitqueue_head(&mad_agent_priv->wait); - mad_agent_priv->port_priv = port_priv; return &mad_agent_priv->agent; @@ -296,6 +308,7 @@ int ib_unregister_mad_agent(struct ib_mad_agent *mad_agent) { struct ib_mad_agent_private *mad_agent_priv; + struct ib_mad_port_private *port_priv; unsigned long flags; mad_agent_priv = container_of(mad_agent, struct ib_mad_agent_private, @@ -309,13 +322,14 @@ */ cancel_mads(mad_agent_priv); + port_priv = mad_agent_priv->qp_info->port_priv; cancel_delayed_work(&mad_agent_priv->work); - flush_workqueue(mad_agent_priv->port_priv->wq); + flush_workqueue(port_priv->wq); - spin_lock_irqsave(&mad_agent_priv->port_priv->reg_lock, flags); + spin_lock_irqsave(&port_priv->reg_lock, flags); remove_mad_reg_req(mad_agent_priv); list_del(&mad_agent_priv->agent_list); - spin_unlock_irqrestore(&mad_agent_priv->port_priv->reg_lock, flags); + spin_unlock_irqrestore(&port_priv->reg_lock, flags); /* XXX: Cleanup pending RMPP receives for this agent */ @@ -330,32 +344,51 @@ } EXPORT_SYMBOL(ib_unregister_mad_agent); +static void queue_mad(struct ib_mad_queue *mad_queue, + struct ib_mad_list_head *mad_list) +{ + unsigned long flags; + + mad_list->mad_queue = mad_queue; + spin_lock_irqsave(&mad_queue->lock, flags); + list_add_tail(&mad_queue->list, &mad_list->list); + mad_queue->count++; + spin_unlock_irqrestore(&mad_queue->lock, flags); +} + +static void dequeue_mad(struct ib_mad_list_head *mad_list) +{ + struct ib_mad_queue *mad_queue; + unsigned long flags; + + BUG_ON(!mad_list->mad_queue); + mad_queue = mad_list->mad_queue; + spin_lock_irqsave(&mad_queue->lock, flags); + list_del(&mad_list->list); + mad_queue->count--; + spin_unlock_irqrestore(&mad_queue->lock, flags); +} + static int ib_send_mad(struct ib_mad_agent_private *mad_agent_priv, struct ib_mad_send_wr_private *mad_send_wr, struct ib_send_wr *send_wr, struct ib_send_wr **bad_send_wr) { - struct ib_mad_port_private *port_priv; - unsigned long flags; + struct ib_mad_qp_info *qp_info; int ret; - port_priv = mad_agent_priv->port_priv; - /* Replace user's WR ID with our own to find WR upon completion */ + qp_info = mad_agent_priv->qp_info; mad_send_wr->wr_id = send_wr->wr_id; - send_wr->wr_id = (unsigned long)mad_send_wr; + send_wr->wr_id = (unsigned long)&mad_send_wr->mad_list; + queue_mad(&qp_info->send_queue, &mad_send_wr->mad_list); - spin_lock_irqsave(&port_priv->send_list_lock, flags); ret = ib_post_send(mad_agent_priv->agent.qp, send_wr, bad_send_wr); - if (!ret) { - list_add_tail(&mad_send_wr->send_list, - &port_priv->send_posted_mad_list); - port_priv->send_posted_mad_count++; - } else { + if (ret) { printk(KERN_NOTICE PFX "ib_post_send failed ret = %d\n", ret); + dequeue_mad(&mad_send_wr->mad_list); *bad_send_wr = send_wr; } - spin_unlock_irqrestore(&port_priv->send_list_lock, flags); return ret; } @@ -370,7 +403,6 @@ int ret; struct ib_send_wr *cur_send_wr, *next_send_wr; struct ib_mad_agent_private *mad_agent_priv; - struct ib_mad_port_private *port_priv; cur_send_wr = send_wr; /* Validate supplied parameters */ @@ -387,7 +419,6 @@ mad_agent_priv = container_of(mad_agent, struct ib_mad_agent_private, agent); - port_priv = 
mad_agent_priv->port_priv; /* Walk list of send WRs and post each on send list */ cur_send_wr = send_wr; @@ -430,6 +461,7 @@ cur_send_wr, &bad_wr); if (ret) { /* Handle QP overrun separately... -ENOMEM */ + /* Handle posting when QP is in error state... */ /* Fail send request */ spin_lock_irqsave(&mad_agent_priv->lock, flags); @@ -592,7 +624,7 @@ if (!mad_reg_req) return 0; - private = priv->port_priv; + private = priv->qp_info->port_priv; mgmt_class = convert_mgmt_class(mad_reg_req->mgmt_class); class = &private->version[mad_reg_req->mgmt_class_version]; if (!*class) { @@ -668,7 +700,7 @@ goto ret; } - port_priv = agent_priv->port_priv; + port_priv = agent_priv->qp_info->port_priv; class = port_priv->version[agent_priv->reg_req->mgmt_class_version]; if (!class) { printk(KERN_ERR PFX "No class table yet MAD registration " @@ -700,20 +732,6 @@ return; } -static int convert_qpnum(u32 qp_num) -{ - /* - * XXX: No redirection currently - * QP0 and QP1 only - * Ultimately, will need table of QP numbers and table index - * as QP numbers will not be packed once redirection supported - */ - if (qp_num > 1) { - return -1; - } - return qp_num; -} - static int response_mad(struct ib_mad *mad) { /* Trap represses are responses although response bit is reset */ @@ -919,54 +937,21 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv, struct ib_wc *wc) { + struct ib_mad_qp_info *qp_info; struct ib_mad_private_header *mad_priv_hdr; - struct ib_mad_recv_buf *rbuf; struct ib_mad_private *recv; - union ib_mad_recv_wrid wrid; - unsigned long flags; - u32 qp_num; + struct ib_mad_list_head *mad_list; struct ib_mad_agent_private *mad_agent = NULL; - int solicited, qpn; - - /* For receive, QP number is field in the WC WRID */ - wrid.wrid = wc->wr_id; - qp_num = wrid.wrid_field.qpn; - qpn = convert_qpnum(qp_num); - if (qpn == -1) { - printk(KERN_ERR PFX "Packet received on unknown QPN %d\n", - qp_num); - ib_mad_post_receive_mad(port_priv, port_priv->qp[qp_num]); - return; - } - - /* - * Completion corresponds to first entry on - * posted MAD receive list based on WRID in completion - */ - spin_lock_irqsave(&port_priv->recv_list_lock, flags); - if (!list_empty(&port_priv->recv_posted_mad_list[qpn])) { - rbuf = list_entry(port_priv->recv_posted_mad_list[qpn].next, - struct ib_mad_recv_buf, - list); - mad_priv_hdr = container_of(rbuf, struct ib_mad_private_header, - recv_buf); - recv = container_of(mad_priv_hdr, struct ib_mad_private, - header); - - /* Remove from posted receive MAD list */ - list_del(&recv->header.recv_buf.list); - port_priv->recv_posted_mad_count[qpn]--; - - } else { - printk(KERN_ERR PFX "Receive completion WR ID 0x%Lx on QP %d " - "with no posted receive\n", (unsigned long long) wc->wr_id, - qp_num); - spin_unlock_irqrestore(&port_priv->recv_list_lock, flags); - ib_mad_post_receive_mad(port_priv, port_priv->qp[qp_num]); - return; - } - spin_unlock_irqrestore(&port_priv->recv_list_lock, flags); + int solicited; + unsigned long flags; + mad_list = (struct ib_mad_list_head *)(unsigned long)wc->wr_id; + qp_info = mad_list->mad_queue->qp_info; + dequeue_mad(mad_list); + + mad_priv_hdr = container_of(mad_list, struct ib_mad_private_header, + mad_list); + recv = container_of(mad_priv_hdr, struct ib_mad_private, header); pci_unmap_single(port_priv->device->dma_device, pci_unmap_addr(&recv->header, mapping), sizeof(struct ib_mad_private) - @@ -981,7 +966,7 @@ recv->header.recv_buf.grh = &recv->grh; /* Validate MAD */ - if (!validate_mad(recv->header.recv_buf.mad, qp_num)) + if 
(!validate_mad(recv->header.recv_buf.mad, qp_info->qp->qp_num)) goto ret; /* Snoop MAD ? */ @@ -1014,7 +999,7 @@ } /* Post another receive request for this QP */ - ib_mad_post_receive_mad(port_priv, port_priv->qp[qp_num]); + ib_mad_post_receive_mad(qp_info); return; } @@ -1036,7 +1021,8 @@ delay = mad_send_wr->timeout - jiffies; if ((long)delay <= 0) delay = 1; - queue_delayed_work(mad_agent_priv->port_priv->wq, + queue_delayed_work(mad_agent_priv->qp_info-> + port_priv->wq, &mad_agent_priv->work, delay); } } @@ -1066,7 +1052,7 @@ /* Reschedule a work item if we have a shorter timeout */ if (mad_agent_priv->wait_list.next == &mad_send_wr->agent_list) { cancel_delayed_work(&mad_agent_priv->work); - queue_delayed_work(mad_agent_priv->port_priv->wq, + queue_delayed_work(mad_agent_priv->qp_info->port_priv->wq, &mad_agent_priv->work, delay); } } @@ -1120,39 +1106,15 @@ struct ib_wc *wc) { struct ib_mad_send_wr_private *mad_send_wr; - unsigned long flags; - - /* Completion corresponds to first entry on posted MAD send list */ - spin_lock_irqsave(&port_priv->send_list_lock, flags); - if (list_empty(&port_priv->send_posted_mad_list)) { - printk(KERN_ERR PFX "Send completion WR ID 0x%Lx but send " - "list is empty\n", (unsigned long long) wc->wr_id); - goto error; - } - - mad_send_wr = list_entry(port_priv->send_posted_mad_list.next, - struct ib_mad_send_wr_private, - send_list); - if (wc->wr_id != (unsigned long)mad_send_wr) { - printk(KERN_ERR PFX "Send completion WR ID 0x%Lx doesn't match " - "posted send WR ID 0x%lx\n", - (unsigned long long) wc->wr_id, - (unsigned long)mad_send_wr); - goto error; - } - - /* Remove from posted send MAD list */ - list_del(&mad_send_wr->send_list); - port_priv->send_posted_mad_count--; - spin_unlock_irqrestore(&port_priv->send_list_lock, flags); + struct ib_mad_list_head *mad_list; + mad_list = (struct ib_mad_list_head *)(unsigned long)wc->wr_id; + mad_send_wr = container_of(mad_list, struct ib_mad_send_wr_private, + mad_list); + dequeue_mad(mad_list); /* Restore client wr_id in WC */ wc->wr_id = mad_send_wr->wr_id; ib_mad_complete_send_wr(mad_send_wr, (struct ib_mad_send_wc*)wc); - return; - -error: - spin_unlock_irqrestore(&port_priv->send_list_lock, flags); } /* @@ -1162,31 +1124,36 @@ { struct ib_mad_port_private *port_priv; struct ib_wc wc; + struct ib_mad_list_head *mad_list; + struct ib_mad_qp_info *qp_info; port_priv = (struct ib_mad_port_private*)data; ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP); while (ib_poll_cq(port_priv->cq, 1, &wc) == 1) { if (wc.status != IB_WC_SUCCESS) { - printk(KERN_ERR PFX "Completion error %d WRID 0x%Lx\n", - wc.status, (unsigned long long) wc.wr_id); + /* Determine if failure was a send or receive. 
*/ + mad_list = (struct ib_mad_list_head *) + (unsigned long)wc.wr_id; + qp_info = mad_list->mad_queue->qp_info; + if (mad_list->mad_queue == &qp_info->send_queue) + wc.opcode = IB_WC_SEND; + else + wc.opcode = IB_WC_RECV; + } + printk(KERN_DEBUG PFX "Completion opcode 0x%x WRID 0x%Lx\n", + wc.opcode, (unsigned long long) wc.wr_id); + + switch (wc.opcode) { + case IB_WC_SEND: ib_mad_send_done_handler(port_priv, &wc); - } else { - printk(KERN_DEBUG PFX "Completion opcode 0x%x WRID 0x%Lx\n", - wc.opcode, (unsigned long long) wc.wr_id); - - switch (wc.opcode) { - case IB_WC_SEND: - ib_mad_send_done_handler(port_priv, &wc); - break; - case IB_WC_RECV: - ib_mad_recv_done_handler(port_priv, &wc); - break; - default: - printk(KERN_ERR PFX "Wrong Opcode 0x%x on completion\n", - wc.opcode); - break; - } + break; + case IB_WC_RECV: + ib_mad_recv_done_handler(port_priv, &wc); + break; + default: + BUG_ON(1); + break; } } } @@ -1316,7 +1283,8 @@ delay = mad_send_wr->timeout - jiffies; if ((long)delay <= 0) delay = 1; - queue_delayed_work(mad_agent_priv->port_priv->wq, + queue_delayed_work(mad_agent_priv->qp_info-> + port_priv->wq, &mad_agent_priv->work, delay); break; } @@ -1341,24 +1309,13 @@ queue_work(port_priv->wq, &port_priv->work); } -static int ib_mad_post_receive_mad(struct ib_mad_port_private *port_priv, - struct ib_qp *qp) +static int ib_mad_post_receive_mad(struct ib_mad_qp_info *qp_info) { struct ib_mad_private *mad_priv; struct ib_sge sg_list; struct ib_recv_wr recv_wr; struct ib_recv_wr *bad_recv_wr; - unsigned long flags; int ret; - union ib_mad_recv_wrid wrid; - int qpn; - - - qpn = convert_qpnum(qp->qp_num); - if (qpn == -1) { - printk(KERN_ERR PFX "Post receive to invalid QPN %d\n", qp->qp_num); - return -EINVAL; - } /* * Allocate memory for receive buffer. @@ -1376,47 +1333,32 @@ } /* Setup scatter list */ - sg_list.addr = pci_map_single(port_priv->device->dma_device, + sg_list.addr = pci_map_single(qp_info->port_priv->device->dma_device, &mad_priv->grh, sizeof *mad_priv - sizeof mad_priv->header, PCI_DMA_FROMDEVICE); sg_list.length = sizeof *mad_priv - sizeof mad_priv->header; - sg_list.lkey = (*port_priv->mr).lkey; + sg_list.lkey = (*qp_info->port_priv->mr).lkey; /* Setup receive WR */ recv_wr.next = NULL; recv_wr.sg_list = &sg_list; recv_wr.num_sge = 1; recv_wr.recv_flags = IB_RECV_SIGNALED; - wrid.wrid_field.index = port_priv->recv_wr_index[qpn]++; - wrid.wrid_field.qpn = qp->qp_num; - recv_wr.wr_id = wrid.wrid; - - /* Link receive WR into posted receive MAD list */ - spin_lock_irqsave(&port_priv->recv_list_lock, flags); - list_add_tail(&mad_priv->header.recv_buf.list, - &port_priv->recv_posted_mad_list[qpn]); - port_priv->recv_posted_mad_count[qpn]++; - spin_unlock_irqrestore(&port_priv->recv_list_lock, flags); - + recv_wr.wr_id = (unsigned long)&mad_priv->header.mad_list; pci_unmap_addr_set(&mad_priv->header, mapping, sg_list.addr); - /* Now, post receive WR */ - ret = ib_post_recv(qp, &recv_wr, &bad_recv_wr); + /* Post receive WR. 
*/ + queue_mad(&qp_info->recv_queue, &mad_priv->header.mad_list); + ret = ib_post_recv(qp_info->qp, &recv_wr, &bad_recv_wr); if (ret) { - - pci_unmap_single(port_priv->device->dma_device, + dequeue_mad(&mad_priv->header.mad_list); + pci_unmap_single(qp_info->port_priv->device->dma_device, pci_unmap_addr(&mad_priv->header, mapping), sizeof *mad_priv - sizeof mad_priv->header, PCI_DMA_FROMDEVICE); - /* Unlink from posted receive MAD list */ - spin_lock_irqsave(&port_priv->recv_list_lock, flags); - list_del(&mad_priv->header.recv_buf.list); - port_priv->recv_posted_mad_count[qpn]--; - spin_unlock_irqrestore(&port_priv->recv_list_lock, flags); - kmem_cache_free(ib_mad_cache, mad_priv); printk(KERN_NOTICE PFX "ib_post_recv WRID 0x%Lx failed ret = %d\n", (unsigned long long) recv_wr.wr_id, ret); @@ -1429,79 +1371,73 @@ /* * Allocate receive MADs and post receive WRs for them */ -static int ib_mad_post_receive_mads(struct ib_mad_port_private *port_priv) +static int ib_mad_post_receive_mads(struct ib_mad_qp_info *qp_info) { - int i, j; + int i, ret; for (i = 0; i < IB_MAD_QP_RECV_SIZE; i++) { - for (j = 0; j < IB_MAD_QPS_CORE; j++) { - if (ib_mad_post_receive_mad(port_priv, - port_priv->qp[j])) { - printk(KERN_ERR PFX "receive post %d failed " - "on %s port %d\n", i + 1, - port_priv->device->name, - port_priv->port_num); - } + ret = ib_mad_post_receive_mad(qp_info); + if (ret) { + printk(KERN_ERR PFX "receive post %d failed " + "on %s port %d\n", i + 1, + qp_info->port_priv->device->name, + qp_info->port_priv->port_num); + break; } } - - return 0; + return ret; } /* * Return all the posted receive MADs */ -static void ib_mad_return_posted_recv_mads(struct ib_mad_port_private *port_priv) +static void ib_mad_return_posted_recv_mads(struct ib_mad_qp_info *qp_info) { - int i; unsigned long flags; struct ib_mad_private_header *mad_priv_hdr; - struct ib_mad_recv_buf *rbuf; struct ib_mad_private *recv; + struct ib_mad_list_head *mad_list; - for (i = 0; i < IB_MAD_QPS_CORE; i++) { - spin_lock_irqsave(&port_priv->recv_list_lock, flags); - while (!list_empty(&port_priv->recv_posted_mad_list[i])) { + spin_lock_irqsave(&qp_info->recv_queue.lock, flags); + while (!list_empty(&qp_info->recv_queue.list)) { - rbuf = list_entry(port_priv->recv_posted_mad_list[i].next, - struct ib_mad_recv_buf, list); - mad_priv_hdr = container_of(rbuf, - struct ib_mad_private_header, - recv_buf); - recv = container_of(mad_priv_hdr, - struct ib_mad_private, header); - - /* Remove for posted receive MAD list */ - list_del(&recv->header.recv_buf.list); - - /* Undo PCI mapping */ - pci_unmap_single(port_priv->device->dma_device, - pci_unmap_addr(&recv->header, mapping), - sizeof(struct ib_mad_private) - - sizeof(struct ib_mad_private_header), - PCI_DMA_FROMDEVICE); - - kmem_cache_free(ib_mad_cache, recv); - } - - INIT_LIST_HEAD(&port_priv->recv_posted_mad_list[i]); - port_priv->recv_posted_mad_count[i] = 0; - spin_unlock_irqrestore(&port_priv->recv_list_lock, flags); + mad_list = list_entry(qp_info->recv_queue.list.next, + struct ib_mad_list_head, list); + mad_priv_hdr = container_of(mad_list, + struct ib_mad_private_header, + mad_list); + recv = container_of(mad_priv_hdr, struct ib_mad_private, + header); + + /* Remove from posted receive MAD list */ + list_del(&recv->header.recv_buf.list); + + /* Undo PCI mapping */ + pci_unmap_single(qp_info->port_priv->device->dma_device, + pci_unmap_addr(&recv->header, mapping), + sizeof(struct ib_mad_private) - + sizeof(struct ib_mad_private_header), + PCI_DMA_FROMDEVICE); + 
kmem_cache_free(ib_mad_cache, recv); } + + INIT_LIST_HEAD(&qp_info->recv_queue.list); + qp_info->recv_queue.count = 0; + spin_unlock_irqrestore(&qp_info->recv_queue.lock, flags); } /* * Return all the posted send MADs */ -static void ib_mad_return_posted_send_mads(struct ib_mad_port_private *port_priv) +static void ib_mad_return_posted_send_mads(struct ib_mad_qp_info *qp_info) { unsigned long flags; - spin_lock_irqsave(&port_priv->send_list_lock, flags); - /* Just clear port send posted MAD list */ - INIT_LIST_HEAD(&port_priv->send_posted_mad_list); - port_priv->send_posted_mad_count = 0; - spin_unlock_irqrestore(&port_priv->send_list_lock, flags); + /* Just clear port send posted MAD list... revisit!!! */ + spin_lock_irqsave(&qp_info->send_queue.lock, flags); + INIT_LIST_HEAD(&qp_info->send_queue.list); + qp_info->send_queue.count = 0; + spin_unlock_irqrestore(&qp_info->send_queue.lock, flags); } /* @@ -1627,35 +1563,21 @@ int ret, i, ret2; for (i = 0; i < IB_MAD_QPS_CORE; i++) { - ret = ib_mad_change_qp_state_to_init(port_priv->qp[i]); + ret = ib_mad_change_qp_state_to_init(port_priv->qp_info[i].qp); if (ret) { printk(KERN_ERR PFX "Couldn't change QP%d state to " "INIT\n", i); - return ret; + goto error; } - } - - ret = ib_mad_post_receive_mads(port_priv); - if (ret) { - printk(KERN_ERR PFX "Couldn't post receive requests\n"); - goto error; - } - - ret = ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP); - if (ret) { - printk(KERN_ERR PFX "Failed to request completion notification\n"); - goto error; - } - for (i = 0; i < IB_MAD_QPS_CORE; i++) { - ret = ib_mad_change_qp_state_to_rtr(port_priv->qp[i]); + ret = ib_mad_change_qp_state_to_rtr(port_priv->qp_info[i].qp); if (ret) { printk(KERN_ERR PFX "Couldn't change QP%d state to " "RTR\n", i); goto error; } - ret = ib_mad_change_qp_state_to_rts(port_priv->qp[i]); + ret = ib_mad_change_qp_state_to_rts(port_priv->qp_info[i].qp); if (ret) { printk(KERN_ERR PFX "Couldn't change QP%d state to " "RTS\n", i); @@ -1663,17 +1585,31 @@ } } + ret = ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP); + if (ret) { + printk(KERN_ERR PFX "Failed to request completion notification\n"); + goto error; + } + + for (i = 0; i < IB_MAD_QPS_CORE; i++) { + ret = ib_mad_post_receive_mads(&port_priv->qp_info[i]); + if (ret) { + printk(KERN_ERR PFX "Couldn't post receive requests\n"); + goto error; + } + } return 0; + error: - ib_mad_return_posted_recv_mads(port_priv); for (i = 0; i < IB_MAD_QPS_CORE; i++) { - ret2 = ib_mad_change_qp_state_to_reset(port_priv->qp[i]); + ib_mad_return_posted_recv_mads(&port_priv->qp_info[i]); + ret2 = ib_mad_change_qp_state_to_reset(port_priv-> + qp_info[i].qp); if (ret2) { printk(KERN_ERR PFX "ib_mad_port_start: Couldn't " "change QP%d state to RESET\n", i); } } - return ret; } @@ -1685,16 +1621,66 @@ int i, ret; for (i = 0; i < IB_MAD_QPS_CORE; i++) { - ret = ib_mad_change_qp_state_to_reset(port_priv->qp[i]); + ret = ib_mad_change_qp_state_to_reset(port_priv->qp_info[i].qp); if (ret) { printk(KERN_ERR PFX "ib_mad_port_stop: Couldn't change " "%s port %d QP%d state to RESET\n", port_priv->device->name, port_priv->port_num, i); } + ib_mad_return_posted_recv_mads(&port_priv->qp_info[i]); + ib_mad_return_posted_send_mads(&port_priv->qp_info[i]); } +} - ib_mad_return_posted_recv_mads(port_priv); - ib_mad_return_posted_send_mads(port_priv); +static void init_mad_queue(struct ib_mad_qp_info *qp_info, + struct ib_mad_queue *mad_queue) +{ + mad_queue->qp_info = qp_info; + mad_queue->count = 0; + spin_lock_init(&mad_queue->lock); + 
INIT_LIST_HEAD(&mad_queue->list); +} + +static int create_mad_qp(struct ib_mad_port_private *port_priv, + struct ib_mad_qp_info *qp_info, + enum ib_qp_type qp_type) +{ + struct ib_qp_init_attr qp_init_attr; + struct ib_qp_cap qp_cap; + int ret; + + qp_info->port_priv = port_priv; + init_mad_queue(qp_info, &qp_info->send_queue); + init_mad_queue(qp_info, &qp_info->recv_queue); + + memset(&qp_init_attr, 0, sizeof qp_init_attr); + qp_init_attr.send_cq = port_priv->cq; + qp_init_attr.recv_cq = port_priv->cq; + qp_init_attr.sq_sig_type = IB_SIGNAL_ALL_WR; + qp_init_attr.rq_sig_type = IB_SIGNAL_ALL_WR; + qp_init_attr.cap.max_send_wr = IB_MAD_QP_SEND_SIZE; + qp_init_attr.cap.max_recv_wr = IB_MAD_QP_RECV_SIZE; + qp_init_attr.cap.max_send_sge = IB_MAD_SEND_REQ_MAX_SG; + qp_init_attr.cap.max_recv_sge = IB_MAD_RECV_REQ_MAX_SG; + qp_init_attr.qp_type = qp_type; + qp_init_attr.port_num = port_priv->port_num; + qp_info->qp = ib_create_qp(port_priv->pd, &qp_init_attr, &qp_cap); + if (IS_ERR(qp_info->qp)) { + printk(KERN_ERR PFX "Couldn't create ib_mad QP%d\n", + get_spl_qp_index(qp_type)); + ret = PTR_ERR(qp_info->qp); + goto error; + } + printk(KERN_DEBUG PFX "Created ib_mad QP %d\n", qp_info->qp->qp_num); + return 0; + +error: + return ret; +} + +static void destroy_mad_qp(struct ib_mad_qp_info *qp_info) +{ + ib_destroy_qp(qp_info->qp); } /* @@ -1703,7 +1689,7 @@ */ static int ib_mad_port_open(struct ib_device *device, int port_num) { - int ret, cq_size, i; + int ret, cq_size; u64 iova = 0; struct ib_phys_buf buf_list = { .addr = 0, @@ -1758,42 +1744,15 @@ goto error5; } - for (i = 0; i < IB_MAD_QPS_CORE; i++) { - struct ib_qp_init_attr qp_init_attr; - struct ib_qp_cap qp_cap; - - memset(&qp_init_attr, 0, sizeof qp_init_attr); - qp_init_attr.send_cq = port_priv->cq; - qp_init_attr.recv_cq = port_priv->cq; - qp_init_attr.sq_sig_type = IB_SIGNAL_ALL_WR; - qp_init_attr.rq_sig_type = IB_SIGNAL_ALL_WR; - qp_init_attr.cap.max_send_wr = IB_MAD_QP_SEND_SIZE; - qp_init_attr.cap.max_recv_wr = IB_MAD_QP_RECV_SIZE; - qp_init_attr.cap.max_send_sge = IB_MAD_SEND_REQ_MAX_SG; - qp_init_attr.cap.max_recv_sge = IB_MAD_RECV_REQ_MAX_SG; - qp_init_attr.qp_type = i; /* Relies on ib_qp_type enum ordering of IB_QPT_SMI and IB_QPT_GSI */ - qp_init_attr.port_num = port_priv->port_num; - port_priv->qp[i] = ib_create_qp(port_priv->pd, &qp_init_attr, - &qp_cap); - if (IS_ERR(port_priv->qp[i])) { - printk(KERN_ERR PFX "Couldn't create ib_mad QP%d\n", i); - ret = PTR_ERR(port_priv->qp[i]); - if (i == 0) - goto error6; - else - goto error7; - } - printk(KERN_DEBUG PFX "Created ib_mad QP %d\n", - port_priv->qp[i]->qp_num); - } + ret = create_mad_qp(port_priv, &port_priv->qp_info[0], IB_QPT_SMI); + if (ret) + goto error6; + ret = create_mad_qp(port_priv, &port_priv->qp_info[1], IB_QPT_GSI); + if (ret) + goto error7; spin_lock_init(&port_priv->reg_lock); - spin_lock_init(&port_priv->recv_list_lock); - spin_lock_init(&port_priv->send_list_lock); INIT_LIST_HEAD(&port_priv->agent_list); - INIT_LIST_HEAD(&port_priv->send_posted_mad_list); - for (i = 0; i < IB_MAD_QPS_CORE; i++) - INIT_LIST_HEAD(&port_priv->recv_posted_mad_list[i]); port_priv->wq = create_workqueue("ib_mad"); if (!port_priv->wq) { @@ -1811,15 +1770,14 @@ spin_lock_irqsave(&ib_mad_port_list_lock, flags); list_add_tail(&port_priv->port_list, &ib_mad_port_list); spin_unlock_irqrestore(&ib_mad_port_list_lock, flags); - return 0; error9: destroy_workqueue(port_priv->wq); error8: - ib_destroy_qp(port_priv->qp[1]); + destroy_mad_qp(&port_priv->qp_info[1]); error7: - 
ib_destroy_qp(port_priv->qp[0]); + destroy_mad_qp(&port_priv->qp_info[0]); error6: ib_dereg_mr(port_priv->mr); error5: @@ -1855,8 +1813,8 @@ ib_mad_port_stop(port_priv); flush_workqueue(port_priv->wq); destroy_workqueue(port_priv->wq); - ib_destroy_qp(port_priv->qp[1]); - ib_destroy_qp(port_priv->qp[0]); + destroy_mad_qp(&port_priv->qp_info[1]); + destroy_mad_qp(&port_priv->qp_info[0]); ib_dereg_mr(port_priv->mr); ib_dealloc_pd(port_priv->pd); ib_destroy_cq(port_priv->cq); Index: access/mad_priv.h =================================================================== --- access/mad_priv.h (revision 1092) +++ access/mad_priv.h (working copy) @@ -79,16 +79,13 @@ #define MAX_MGMT_CLASS 80 #define MAX_MGMT_VERSION 8 - -union ib_mad_recv_wrid { - u64 wrid; - struct { - u32 index; - u32 qpn; - } wrid_field; +struct ib_mad_list_head { + struct list_head list; + struct ib_mad_queue *mad_queue; }; struct ib_mad_private_header { + struct ib_mad_list_head mad_list; struct ib_mad_recv_wc recv_wc; struct ib_mad_recv_buf recv_buf; DECLARE_PCI_UNMAP_ADDR(mapping) @@ -108,7 +105,7 @@ struct list_head agent_list; struct ib_mad_agent agent; struct ib_mad_reg_req *reg_req; - struct ib_mad_port_private *port_priv; + struct ib_mad_qp_info *qp_info; spinlock_t lock; struct list_head send_list; @@ -122,7 +119,7 @@ }; struct ib_mad_send_wr_private { - struct list_head send_list; + struct ib_mad_list_head mad_list; struct list_head agent_list; struct ib_mad_agent *agent; u64 wr_id; /* client WR ID */ @@ -140,11 +137,25 @@ struct ib_mad_mgmt_method_table *method_table[MAX_MGMT_CLASS]; }; +struct ib_mad_queue { + spinlock_t lock; + struct list_head list; + int count; + struct ib_mad_qp_info *qp_info; +}; + +struct ib_mad_qp_info { + struct ib_mad_port_private *port_priv; + struct ib_qp *qp; + struct ib_mad_queue send_queue; + struct ib_mad_queue recv_queue; + /* struct ib_mad_queue overflow_queue; */ +}; + struct ib_mad_port_private { struct list_head port_list; struct ib_device *device; int port_num; - struct ib_qp *qp[IB_MAD_QPS_CORE]; struct ib_cq *cq; struct ib_pd *pd; struct ib_mr *mr; @@ -154,15 +165,7 @@ struct list_head agent_list; struct workqueue_struct *wq; struct work_struct work; - - spinlock_t send_list_lock; - struct list_head send_posted_mad_list; - int send_posted_mad_count; - - spinlock_t recv_list_lock; - struct list_head recv_posted_mad_list[IB_MAD_QPS_CORE]; - int recv_posted_mad_count[IB_MAD_QPS_CORE]; - u32 recv_wr_index[IB_MAD_QPS_CORE]; + struct ib_mad_qp_info qp_info[IB_MAD_QPS_CORE]; }; #endif /* __IB_MAD_PRIV_H__ */ From halr at voltaire.com Fri Oct 29 07:30:12 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 29 Oct 2004 10:30:12 -0400 Subject: [openib-general] Latest IPoIB Bringup Status and Question Message-ID: <1099060193.7890.38.camel@hpc-1> Hi, Last night's changes appear to fix the join component mask issues :-) In all the GetResps, the status is now 0. There is no longer any join flood due to failed (status 0x0600) joins. The timing still appears a little off although this does not cause an operational issue. I see two Set(MC) requests for the broadcast group 0.1426 msec apart. I also see two Set(MC) requests for the 01 group 5.684 msec apart. Should down'ing the ib interface issue Set(MCMemberRecord) leaves for the groups which have been joined ? If not, should it or is there some other way to do this ? (This is minor.) There is one operational issue I am working more to isolate. I hope to report back on this later today. Thanks. 
-- Hal From halr at voltaire.com Fri Oct 29 07:30:24 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 29 Oct 2004 10:30:24 -0400 Subject: [openib-general] [PATCH] sa_query: Remove debug print statement Message-ID: <1099060224.7890.41.camel@hpc-1> sa_query: Remove debug print statement Index: sa_query.c =================================================================== --- sa_query.c (revision 1095) +++ sa_query.c (working copy) @@ -444,7 +444,6 @@ query->mad->mad_hdr.tid = cpu_to_be64(((u64) port->agent->hi_tid) << 32 | query->id); wr.wr_id = query->id; - printk("tid %016llx id %08x\n", be64_to_cpu(query->mad->mad_hdr.tid), query->id); spin_lock_irqsave(&port->ah_lock, flags); kref_get(&port->sm_ah->ref); From halr at voltaire.com Fri Oct 29 07:42:06 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 29 Oct 2004 10:42:06 -0400 Subject: [openib-general] [PATCH] mad: Minor reordering so prints follow unlocking or buffer posting rather than preceeding them Message-ID: <1099060926.7890.46.camel@hpc-1> mad: Minor reordering so prints follow unlocking or buffer posting rather than preceeding them Index: mad.c =================================================================== --- mad.c (revision 1089) +++ mad.c (working copy) @@ -933,9 +933,9 @@ qp_num = wrid.wrid_field.qpn; qpn = convert_qpnum(qp_num); if (qpn == -1) { + ib_mad_post_receive_mad(port_priv, port_priv->qp[qp_num]); printk(KERN_ERR PFX "Packet received on unknown QPN %d\n", qp_num); - ib_mad_post_receive_mad(port_priv, port_priv->qp[qp_num]); return; } @@ -958,11 +958,12 @@ port_priv->recv_posted_mad_count[qpn]--; } else { + spin_unlock_irqrestore(&port_priv->recv_list_lock, flags); + ib_mad_post_receive_mad(port_priv, port_priv->qp[qp_num]); printk(KERN_ERR PFX "Receive completion WR ID 0x%Lx on QP %d " - "with no posted receive\n", (unsigned long long) wc->wr_id, + "with no posted receive\n", + (unsigned long long) wc->wr_id, qp_num); - spin_unlock_irqrestore(&port_priv->recv_list_lock, flags); - ib_mad_post_receive_mad(port_priv, port_priv->qp[qp_num]); return; } spin_unlock_irqrestore(&port_priv->recv_list_lock, flags); @@ -1015,7 +1016,6 @@ /* Post another receive request for this QP */ ib_mad_post_receive_mad(port_priv, port_priv->qp[qp_num]); - return; } static void adjust_timeout(struct ib_mad_agent_private *mad_agent_priv) From roland at topspin.com Fri Oct 29 08:52:22 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 29 Oct 2004 08:52:22 -0700 Subject: [openib-general] Latest IPoIB Bringup Status and Question In-Reply-To: <1099060193.7890.38.camel@hpc-1> (Hal Rosenstock's message of "Fri, 29 Oct 2004 10:30:12 -0400") References: <1099060193.7890.38.camel@hpc-1> Message-ID: <528y9pzfm1.fsf@topspin.com> Hal> The timing still appears a little off although this does not Hal> cause an operational issue. I see two Set(MC) requests for Hal> the broadcast group 0.1426 msec apart. I also see two Set(MC) Hal> requests for the 01 group 5.684 msec apart. OK, I think I have a handle on this... Hal> Should down'ing the ib interface issue Set(MCMemberRecord) Hal> leaves for the groups which have been joined ? If not, should Hal> it or is there some other way to do this ? (This is minor.) Yes, see ipoib_mcast_leave(), specifically /* XXX implement leaving SA's multicast group */ - R. 
From roland at topspin.com Fri Oct 29 08:53:09 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 29 Oct 2004 08:53:09 -0700 Subject: [openib-general] [PATCH] sa_query: Remove debug print statement In-Reply-To: <1099060224.7890.41.camel@hpc-1> (Hal Rosenstock's message of "Fri, 29 Oct 2004 10:30:24 -0400") References: <1099060224.7890.41.camel@hpc-1> Message-ID: <524qkdzfkq.fsf@topspin.com> Hal> sa_query: Remove debug print statement Thanks, applied. Should we remove some of the debug output from mad.c? - R. From halr at voltaire.com Fri Oct 29 09:13:21 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 29 Oct 2004 12:13:21 -0400 Subject: [openib-general] [PATCH] sa_query: Remove debug print statement In-Reply-To: <524qkdzfkq.fsf@topspin.com> References: <1099060224.7890.41.camel@hpc-1> <524qkdzfkq.fsf@topspin.com> Message-ID: <1099066401.3532.5.camel@amirp.us.voltaire.com> On Fri, 2004-10-29 at 11:53, Roland Dreier wrote: > Should we remove some of the debug output from mad.c? Sure. Are there specific ones you have in mind ? The ones that are KERN_DEBUG ? Any others ? -- Hal From roland at topspin.com Fri Oct 29 09:15:59 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 29 Oct 2004 09:15:59 -0700 Subject: [openib-general] [PATCH] sa_query: Remove debug print statement In-Reply-To: <1099066401.3532.5.camel@amirp.us.voltaire.com> (Hal Rosenstock's message of "Fri, 29 Oct 2004 12:13:21 -0400") References: <1099060224.7890.41.camel@hpc-1> <524qkdzfkq.fsf@topspin.com> <1099066401.3532.5.camel@amirp.us.voltaire.com> Message-ID: <52y8hpxzy8.fsf@topspin.com> Hal> Sure. Are there specific ones you have in mind ? The ones Hal> that are KERN_DEBUG ? Any others ? This alone will probably clean up my dmesg a lot: Index: infiniband/core/mad.c =================================================================== --- infiniband/core/mad.c (revision 1098) +++ infiniband/core/mad.c (working copy) @@ -1172,9 +1172,6 @@ wc.status, (unsigned long long) wc.wr_id); ib_mad_send_done_handler(port_priv, &wc); } else { - printk(KERN_DEBUG PFX "Completion opcode 0x%x WRID 0x%Lx\n", - wc.opcode, (unsigned long long) wc.wr_id); - switch (wc.opcode) { case IB_WC_SEND: ib_mad_send_done_handler(port_priv, &wc); From roland at topspin.com Fri Oct 29 09:21:31 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 29 Oct 2004 09:21:31 -0700 Subject: [openib-general] [PATCH] sa_query: Remove debug print statement In-Reply-To: <52y8hpxzy8.fsf@topspin.com> (Roland Dreier's message of "Fri, 29 Oct 2004 09:15:59 -0700") References: <1099060224.7890.41.camel@hpc-1> <524qkdzfkq.fsf@topspin.com> <1099066401.3532.5.camel@amirp.us.voltaire.com> <52y8hpxzy8.fsf@topspin.com> Message-ID: <52r7nhxzp0.fsf@topspin.com> Actually how about this: Index: infiniband/core/mad.c =================================================================== --- infiniband/core/mad.c (revision 1098) +++ infiniband/core/mad.c (working copy) @@ -1172,9 +1172,6 @@ wc.status, (unsigned long long) wc.wr_id); ib_mad_send_done_handler(port_priv, &wc); } else { - printk(KERN_DEBUG PFX "Completion opcode 0x%x WRID 0x%Lx\n", - wc.opcode, (unsigned long long) wc.wr_id); - switch (wc.opcode) { case IB_WC_SEND: ib_mad_send_done_handler(port_priv, &wc); @@ -1536,7 +1533,8 @@ ret = ib_modify_qp(qp, attr, attr_mask, &qp_cap); kfree(attr); - printk(KERN_DEBUG PFX "ib_mad_change_qp_state_to_init ret = %d\n", ret); + if (ret) + printk(KERN_WARNING PFX "ib_mad_change_qp_state_to_init ret = %d\n", ret); return ret; } @@ -1562,7 +1560,8 @@ ret = 
ib_modify_qp(qp, attr, attr_mask, &qp_cap); kfree(attr); - printk(KERN_DEBUG PFX "ib_mad_change_qp_state_to_rtr ret = %d\n", ret); + if (ret) + printk(KERN_WARNING PFX "ib_mad_change_qp_state_to_rtr ret = %d\n", ret); return ret; } @@ -1589,7 +1588,8 @@ ret = ib_modify_qp(qp, attr, attr_mask, &qp_cap); kfree(attr); - printk(KERN_DEBUG PFX "ib_mad_change_qp_state_to_rts ret = %d\n", ret); + if (ret) + printk(KERN_WARNING PFX "ib_mad_change_qp_state_to_rts ret = %d\n", ret); return ret; } @@ -1615,7 +1615,8 @@ ret = ib_modify_qp(qp, attr, attr_mask, &qp_cap); kfree(attr); - printk(KERN_DEBUG PFX "ib_mad_change_qp_state_to_reset ret = %d\n", ret); + if (ret) + printk(KERN_WARNING PFX "ib_mad_change_qp_state_to_reset ret = %d\n", ret); return ret; } @@ -1783,8 +1784,6 @@ else goto error7; } - printk(KERN_DEBUG PFX "Created ib_mad QP %d\n", - port_priv->qp[i]->qp_num); } spin_lock_init(&port_priv->reg_lock); From halr at voltaire.com Fri Oct 29 09:30:34 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 29 Oct 2004 12:30:34 -0400 Subject: [openib-general] [PATCH] sa_query: Remove debug print statement In-Reply-To: <52y8hpxzy8.fsf@topspin.com> References: <1099060224.7890.41.camel@hpc-1> <524qkdzfkq.fsf@topspin.com> <1099066401.3532.5.camel@amirp.us.voltaire.com> <52y8hpxzy8.fsf@topspin.com> Message-ID: <1099067433.3532.24.camel@amirp.us.voltaire.com> On Fri, 2004-10-29 at 12:15, Roland Dreier wrote: > Hal> Sure. Are there specific ones you have in mind ? The ones > Hal> that are KERN_DEBUG ? Any others ? > > This alone will probably clean up my dmesg a lot: We'll start with that one. Thanks. Applied. -- Hal From halr at voltaire.com Fri Oct 29 09:35:36 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 29 Oct 2004 12:35:36 -0400 Subject: [openib-general] [PATCH] sa_query: Remove debug print statement In-Reply-To: <52r7nhxzp0.fsf@topspin.com> References: <1099060224.7890.41.camel@hpc-1> <524qkdzfkq.fsf@topspin.com> <1099066401.3532.5.camel@amirp.us.voltaire.com> <52y8hpxzy8.fsf@topspin.com> <52r7nhxzp0.fsf@topspin.com> Message-ID: <1099067736.3532.29.camel@amirp.us.voltaire.com> On Fri, 2004-10-29 at 12:21, Roland Dreier wrote: > Actually how about this: Even better :-) Thanks. Applied. -- Hal From halr at voltaire.com Fri Oct 29 11:23:15 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 29 Oct 2004 14:23:15 -0400 Subject: [openib-general] Latest IPoIB Status Message-ID: <1099074195.10075.12.camel@amirp.us.voltaire.com> Hi, I tested some very simple IP multicast test programs. The proper IB multicast groups are being joined. I used 239.0.0.1 as well as some others. This was something the gen1 implementation could not do as this hashed to the same ethernet address as 224.0.0.1 which is joined at ib network interface startup. No new issues were detected (join request timing and leave not implemented as yet). -- Hal From halr at voltaire.com Fri Oct 29 11:48:06 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 29 Oct 2004 14:48:06 -0400 Subject: [Fwd: [openib-general] Latest IPoIB Status] Message-ID: <1099075686.2991.16.camel@hpc-1> I wrote a little too soon: On shutdown of the machine with outstanding joins, the message: ib0: waiting on -5 multicast groups is repeated and shutdown appears to hang. -- Hal -----Forwarded Message----- From: Hal Rosenstock To: openib-general at openib.org Subject: [openib-general] Latest IPoIB Status Date: 29 Oct 2004 14:23:15 -0400 Hi, I tested some very simple IP multicast test programs. 
The proper IB multicast groups are being joined. I used 239.0.0.1 as well as some others. This was something the gen1 implementation could not do as this hashed to the same ethernet address as 224.0.0.1 which is joined at ib network interface startup. No new issues were detected (join request timing and leave not implemented as yet). -- Hal _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From roland at topspin.com Fri Oct 29 12:41:21 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 29 Oct 2004 12:41:21 -0700 Subject: [Fwd: [openib-general] Latest IPoIB Status] In-Reply-To: <1099075686.2991.16.camel@hpc-1> (Hal Rosenstock's message of "Fri, 29 Oct 2004 14:48:06 -0400") References: <1099075686.2991.16.camel@hpc-1> Message-ID: <52d5z1xqfy.fsf@topspin.com> Hal> I wrote a little too soon: On shutdown of the machine with Hal> outstanding joins, the message: ib0: waiting on -5 multicast Hal> groups is repeated and shutdown appears to hang. OK, looks like a reference counting bug. - R. From halr at voltaire.com Fri Oct 29 12:54:19 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 29 Oct 2004 15:54:19 -0400 Subject: [Fwd: [openib-general] Latest IPoIB Status] In-Reply-To: <52d5z1xqfy.fsf@topspin.com> References: <1099075686.2991.16.camel@hpc-1> <52d5z1xqfy.fsf@topspin.com> Message-ID: <1099079658.3270.8.camel@localhost.localdomain> On Fri, 2004-10-29 at 15:41, Roland Dreier wrote: > OK, looks like a reference counting bug. A little more info which may help: It occured in conjunction with the multicast testing and there were some add membership failures at the socket layer. These might be the ones outstanding. It does not seem to occur with just the broadcast group and 224.0.0.1. -- Hal From roland at topspin.com Fri Oct 29 13:05:05 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 29 Oct 2004 13:05:05 -0700 Subject: [Fwd: [openib-general] Latest IPoIB Status] In-Reply-To: <1099079658.3270.8.camel@localhost.localdomain> (Hal Rosenstock's message of "Fri, 29 Oct 2004 15:54:19 -0400") References: <1099075686.2991.16.camel@hpc-1> <52d5z1xqfy.fsf@topspin.com> <1099079658.3270.8.camel@localhost.localdomain> Message-ID: <523bzxxpce.fsf@topspin.com> OK, I just fixed the bug below (left over in the port to new SA API). should work better now I hope. - R. Index: infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- infiniband/ulp/ipoib/ipoib_multicast.c (revision 1102) +++ infiniband/ulp/ipoib/ipoib_multicast.c (revision 1103) @@ -361,7 +361,7 @@ } out: - if (ret) + if (ret < 0) atomic_dec(&priv->mcast_joins); return ret; From krkumar at us.ibm.com Fri Oct 29 13:01:03 2004 From: krkumar at us.ibm.com (Krishna Kumar) Date: Fri, 29 Oct 2004 13:01:03 -0700 (PDT) Subject: [openib-general] [RFC] [PATCH] Remove redundant ib_qp_cap from 2 verb routines. Message-ID: Hi, I know this changes the verbs interface a bit, but ... I don't see a value in the qp_cap being passed to different routines, when either ib_qp_attr or ib_qp_init_attr, both of which contain a qp_cap, are being passed at the same time. Above attached patch removes the qp_cap (saves stack space too) and I have also included a patch for mthca just for completeness to show how the mthca can avoid using the extra variable. Does this sound right ? 
Thanks, - KK [Inline patch for viewing] diff -ruNp org/access/mad.c new/access/mad.c --- org/access/mad.c 2004-10-29 12:38:35.000000000 -0700 +++ new/access/mad.c 2004-10-29 12:38:16.000000000 -0700 @@ -1509,7 +1509,6 @@ static inline int ib_mad_change_qp_state int ret; struct ib_qp_attr *attr; int attr_mask; - struct ib_qp_cap qp_cap; attr = kmalloc(sizeof *attr, GFP_KERNEL); if (!attr) { @@ -1530,7 +1529,7 @@ static inline int ib_mad_change_qp_state attr->qkey = IB_QP1_QKEY; attr_mask = IB_QP_STATE | IB_QP_PKEY_INDEX | IB_QP_QKEY; - ret = ib_modify_qp(qp, attr, attr_mask, &qp_cap); + ret = ib_modify_qp(qp, attr, attr_mask); kfree(attr); if (ret) @@ -1546,7 +1545,6 @@ static inline int ib_mad_change_qp_state int ret; struct ib_qp_attr *attr; int attr_mask; - struct ib_qp_cap qp_cap; attr = kmalloc(sizeof *attr, GFP_KERNEL); if (!attr) { @@ -1557,7 +1555,7 @@ static inline int ib_mad_change_qp_state attr->qp_state = IB_QPS_RTR; attr_mask = IB_QP_STATE; - ret = ib_modify_qp(qp, attr, attr_mask, &qp_cap); + ret = ib_modify_qp(qp, attr, attr_mask); kfree(attr); if (ret) @@ -1573,7 +1571,6 @@ static inline int ib_mad_change_qp_state int ret; struct ib_qp_attr *attr; int attr_mask; - struct ib_qp_cap qp_cap; attr = kmalloc(sizeof *attr, GFP_KERNEL); if (!attr) { @@ -1585,7 +1582,7 @@ static inline int ib_mad_change_qp_state attr->sq_psn = IB_MAD_SEND_Q_PSN; attr_mask = IB_QP_STATE | IB_QP_SQ_PSN; - ret = ib_modify_qp(qp, attr, attr_mask, &qp_cap); + ret = ib_modify_qp(qp, attr, attr_mask); kfree(attr); if (ret) @@ -1601,7 +1598,6 @@ static inline int ib_mad_change_qp_state int ret; struct ib_qp_attr *attr; int attr_mask; - struct ib_qp_cap qp_cap; attr = kmalloc(sizeof *attr, GFP_KERNEL); if (!attr) { @@ -1612,7 +1608,7 @@ static inline int ib_mad_change_qp_state attr->qp_state = IB_QPS_RESET; attr_mask = IB_QP_STATE; - ret = ib_modify_qp(qp, attr, attr_mask, &qp_cap); + ret = ib_modify_qp(qp, attr, attr_mask); kfree(attr); if (ret) @@ -1761,7 +1757,6 @@ static int ib_mad_port_open(struct ib_de for (i = 0; i < IB_MAD_QPS_CORE; i++) { struct ib_qp_init_attr qp_init_attr; - struct ib_qp_cap qp_cap; memset(&qp_init_attr, 0, sizeof qp_init_attr); qp_init_attr.send_cq = port_priv->cq; @@ -1774,8 +1769,7 @@ static int ib_mad_port_open(struct ib_de qp_init_attr.cap.max_recv_sge = IB_MAD_RECV_REQ_MAX_SG; qp_init_attr.qp_type = i; /* Relies on ib_qp_type enum ordering of IB_QPT_SMI and IB_QPT_GSI */ qp_init_attr.port_num = port_priv->port_num; - port_priv->qp[i] = ib_create_qp(port_priv->pd, &qp_init_attr, - &qp_cap); + port_priv->qp[i] = ib_create_qp(port_priv->pd, &qp_init_attr); if (IS_ERR(port_priv->qp[i])) { printk(KERN_ERR PFX "Couldn't create ib_mad QP%d\n", i); ret = PTR_ERR(port_priv->qp[i]); diff -ruNp org/access/verbs.c new/access/verbs.c --- org/access/verbs.c 2004-10-29 12:38:35.000000000 -0700 +++ new/access/verbs.c 2004-10-29 12:38:16.000000000 -0700 @@ -136,12 +136,11 @@ EXPORT_SYMBOL(ib_destroy_ah); /* Queue pair */ struct ib_qp *ib_create_qp(struct ib_pd *pd, - struct ib_qp_init_attr *qp_init_attr, - struct ib_qp_cap *qp_cap) + struct ib_qp_init_attr *qp_init_attr) { struct ib_qp *qp; - qp = pd->device->create_qp(pd, qp_init_attr, qp_cap); + qp = pd->device->create_qp(pd, qp_init_attr); if (!IS_ERR(qp)) { qp->device = pd->device; diff -ruNp org/include/ib_verbs.h new/include/ib_verbs.h --- org/include/ib_verbs.h 2004-10-29 12:38:39.000000000 -0700 +++ new/include/ib_verbs.h 2004-10-29 12:38:22.000000000 -0700 @@ -687,12 +687,10 @@ struct ib_device { struct ib_ah_attr *ah_attr); int 
(*destroy_ah)(struct ib_ah *ah); struct ib_qp * (*create_qp)(struct ib_pd *pd, - struct ib_qp_init_attr *qp_init_attr, - struct ib_qp_cap *qp_cap); + struct ib_qp_init_attr *qp_init_attr); int (*modify_qp)(struct ib_qp *qp, struct ib_qp_attr *qp_attr, - int qp_attr_mask, - struct ib_qp_cap *qp_cap); + int qp_attr_mask); int (*query_qp)(struct ib_qp *qp, struct ib_qp_attr *qp_attr, int qp_attr_mask, @@ -844,15 +842,13 @@ int ib_query_ah(struct ib_ah *ah, int ib_destroy_ah(struct ib_ah *ah); struct ib_qp *ib_create_qp(struct ib_pd *pd, - struct ib_qp_init_attr *qp_init_attr, - struct ib_qp_cap *qp_cap); + struct ib_qp_init_attr *qp_init_attr); static inline int ib_modify_qp(struct ib_qp *qp, struct ib_qp_attr *qp_attr, - int qp_attr_mask, - struct ib_qp_cap *qp_cap) + int qp_attr_mask) { - return qp->device->modify_qp(qp, qp_attr, qp_attr_mask, qp_cap); + return qp->device->modify_qp(qp, qp_attr, qp_attr_mask); } int ib_query_qp(struct ib_qp *qp, diff -ruNp org/hw/mthca/mthca_provider.c new/hw/mthca/mthca_provider.c --- org/hw/mthca/mthca_provider.c 2004-10-29 12:42:38.000000000 -0700 +++ new/hw/mthca/mthca_provider.c 2004-10-29 12:46:41.000000000 -0700 @@ -289,8 +289,7 @@ static int mthca_ah_destroy(struct ib_ah } static struct ib_qp *mthca_create_qp(struct ib_pd *pd, - struct ib_qp_init_attr *init_attr, - struct ib_qp_cap *qp_cap) + struct ib_qp_init_attr *init_attr) { struct mthca_qp *qp; int err; @@ -349,8 +348,7 @@ static struct ib_qp *mthca_create_qp(str return ERR_PTR(err); } - *qp_cap = init_attr->cap; - qp_cap->max_inline_data = 0; + init_attr->cap.max_inline_data = 0; return (struct ib_qp *) qp; } -------------- next part -------------- diff -ruNp org/access/mad.c new/access/mad.c --- org/access/mad.c 2004-10-29 12:38:35.000000000 -0700 +++ new/access/mad.c 2004-10-29 12:38:16.000000000 -0700 @@ -1509,7 +1509,6 @@ static inline int ib_mad_change_qp_state int ret; struct ib_qp_attr *attr; int attr_mask; - struct ib_qp_cap qp_cap; attr = kmalloc(sizeof *attr, GFP_KERNEL); if (!attr) { @@ -1530,7 +1529,7 @@ static inline int ib_mad_change_qp_state attr->qkey = IB_QP1_QKEY; attr_mask = IB_QP_STATE | IB_QP_PKEY_INDEX | IB_QP_QKEY; - ret = ib_modify_qp(qp, attr, attr_mask, &qp_cap); + ret = ib_modify_qp(qp, attr, attr_mask); kfree(attr); if (ret) @@ -1546,7 +1545,6 @@ static inline int ib_mad_change_qp_state int ret; struct ib_qp_attr *attr; int attr_mask; - struct ib_qp_cap qp_cap; attr = kmalloc(sizeof *attr, GFP_KERNEL); if (!attr) { @@ -1557,7 +1555,7 @@ static inline int ib_mad_change_qp_state attr->qp_state = IB_QPS_RTR; attr_mask = IB_QP_STATE; - ret = ib_modify_qp(qp, attr, attr_mask, &qp_cap); + ret = ib_modify_qp(qp, attr, attr_mask); kfree(attr); if (ret) @@ -1573,7 +1571,6 @@ static inline int ib_mad_change_qp_state int ret; struct ib_qp_attr *attr; int attr_mask; - struct ib_qp_cap qp_cap; attr = kmalloc(sizeof *attr, GFP_KERNEL); if (!attr) { @@ -1585,7 +1582,7 @@ static inline int ib_mad_change_qp_state attr->sq_psn = IB_MAD_SEND_Q_PSN; attr_mask = IB_QP_STATE | IB_QP_SQ_PSN; - ret = ib_modify_qp(qp, attr, attr_mask, &qp_cap); + ret = ib_modify_qp(qp, attr, attr_mask); kfree(attr); if (ret) @@ -1601,7 +1598,6 @@ static inline int ib_mad_change_qp_state int ret; struct ib_qp_attr *attr; int attr_mask; - struct ib_qp_cap qp_cap; attr = kmalloc(sizeof *attr, GFP_KERNEL); if (!attr) { @@ -1612,7 +1608,7 @@ static inline int ib_mad_change_qp_state attr->qp_state = IB_QPS_RESET; attr_mask = IB_QP_STATE; - ret = ib_modify_qp(qp, attr, attr_mask, &qp_cap); + ret = ib_modify_qp(qp, 
attr, attr_mask); kfree(attr); if (ret) @@ -1761,7 +1757,6 @@ static int ib_mad_port_open(struct ib_de for (i = 0; i < IB_MAD_QPS_CORE; i++) { struct ib_qp_init_attr qp_init_attr; - struct ib_qp_cap qp_cap; memset(&qp_init_attr, 0, sizeof qp_init_attr); qp_init_attr.send_cq = port_priv->cq; @@ -1774,8 +1769,7 @@ static int ib_mad_port_open(struct ib_de qp_init_attr.cap.max_recv_sge = IB_MAD_RECV_REQ_MAX_SG; qp_init_attr.qp_type = i; /* Relies on ib_qp_type enum ordering of IB_QPT_SMI and IB_QPT_GSI */ qp_init_attr.port_num = port_priv->port_num; - port_priv->qp[i] = ib_create_qp(port_priv->pd, &qp_init_attr, - &qp_cap); + port_priv->qp[i] = ib_create_qp(port_priv->pd, &qp_init_attr); if (IS_ERR(port_priv->qp[i])) { printk(KERN_ERR PFX "Couldn't create ib_mad QP%d\n", i); ret = PTR_ERR(port_priv->qp[i]); diff -ruNp org/access/verbs.c new/access/verbs.c --- org/access/verbs.c 2004-10-29 12:38:35.000000000 -0700 +++ new/access/verbs.c 2004-10-29 12:38:16.000000000 -0700 @@ -136,12 +136,11 @@ EXPORT_SYMBOL(ib_destroy_ah); /* Queue pair */ struct ib_qp *ib_create_qp(struct ib_pd *pd, - struct ib_qp_init_attr *qp_init_attr, - struct ib_qp_cap *qp_cap) + struct ib_qp_init_attr *qp_init_attr) { struct ib_qp *qp; - qp = pd->device->create_qp(pd, qp_init_attr, qp_cap); + qp = pd->device->create_qp(pd, qp_init_attr); if (!IS_ERR(qp)) { qp->device = pd->device; diff -ruNp org/include/ib_verbs.h new/include/ib_verbs.h --- org/include/ib_verbs.h 2004-10-29 12:38:39.000000000 -0700 +++ new/include/ib_verbs.h 2004-10-29 12:38:22.000000000 -0700 @@ -687,12 +687,10 @@ struct ib_device { struct ib_ah_attr *ah_attr); int (*destroy_ah)(struct ib_ah *ah); struct ib_qp * (*create_qp)(struct ib_pd *pd, - struct ib_qp_init_attr *qp_init_attr, - struct ib_qp_cap *qp_cap); + struct ib_qp_init_attr *qp_init_attr); int (*modify_qp)(struct ib_qp *qp, struct ib_qp_attr *qp_attr, - int qp_attr_mask, - struct ib_qp_cap *qp_cap); + int qp_attr_mask); int (*query_qp)(struct ib_qp *qp, struct ib_qp_attr *qp_attr, int qp_attr_mask, @@ -844,15 +842,13 @@ int ib_query_ah(struct ib_ah *ah, int ib_destroy_ah(struct ib_ah *ah); struct ib_qp *ib_create_qp(struct ib_pd *pd, - struct ib_qp_init_attr *qp_init_attr, - struct ib_qp_cap *qp_cap); + struct ib_qp_init_attr *qp_init_attr); static inline int ib_modify_qp(struct ib_qp *qp, struct ib_qp_attr *qp_attr, - int qp_attr_mask, - struct ib_qp_cap *qp_cap) + int qp_attr_mask) { - return qp->device->modify_qp(qp, qp_attr, qp_attr_mask, qp_cap); + return qp->device->modify_qp(qp, qp_attr, qp_attr_mask); } int ib_query_qp(struct ib_qp *qp, diff -ruNp org/hw/mthca/mthca_provider.c new/hw/mthca/mthca_provider.c --- org/hw/mthca/mthca_provider.c 2004-10-29 12:42:38.000000000 -0700 +++ new/hw/mthca/mthca_provider.c 2004-10-29 12:46:41.000000000 -0700 @@ -289,8 +289,7 @@ static int mthca_ah_destroy(struct ib_ah } static struct ib_qp *mthca_create_qp(struct ib_pd *pd, - struct ib_qp_init_attr *init_attr, - struct ib_qp_cap *qp_cap) + struct ib_qp_init_attr *init_attr) { struct mthca_qp *qp; int err; @@ -349,8 +348,7 @@ static struct ib_qp *mthca_create_qp(str return ERR_PTR(err); } - *qp_cap = init_attr->cap; - qp_cap->max_inline_data = 0; + init_attr->cap.max_inline_data = 0; return (struct ib_qp *) qp; } From mshefty at ichips.intel.com Fri Oct 29 13:14:37 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 29 Oct 2004 13:14:37 -0700 Subject: [openib-general] [RFC] [PATCH] Remove redundant ib_qp_cap from 2 verb routines. 
In-Reply-To: References: Message-ID: <20041029131437.6f1d0cf6.mshefty@ichips.intel.com> On Fri, 29 Oct 2004 13:01:03 -0700 (PDT) Krishna Kumar wrote: > Hi, > > I know this changes the verbs interface a bit, but ... > > I don't see a value in the qp_cap being passed to different routines, > when either ib_qp_attr or ib_qp_init_attr, both of which contain a > qp_cap, are being passed at the same time. The parameter is there to separate input/output parameters, and resulted from the original VAPI evolution of the code. There's no strong technical reason that it cannot be removed. - Sean From halr at voltaire.com Fri Oct 29 13:47:59 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 29 Oct 2004 16:47:59 -0400 Subject: [Fwd: [openib-general] Latest IPoIB Status] In-Reply-To: <523bzxxpce.fsf@topspin.com> References: <1099075686.2991.16.camel@hpc-1> <52d5z1xqfy.fsf@topspin.com> <1099079658.3270.8.camel@localhost.localdomain> <523bzxxpce.fsf@topspin.com> Message-ID: <1099082879.3270.33.camel@localhost.localdomain> On Fri, 2004-10-29 at 16:05, Roland Dreier wrote: > OK, I just fixed the bug below (left over in the port to new SA API). > > should work better now I hope. Good find. It looks like that did the trick :-) Thanks for the fast turnaround. That makes things much nicer as rebooting is now more reliable and one doesn't need to be at the machine to accomplish this... -- Hal From xma at us.ibm.com Fri Oct 29 16:35:40 2004 From: xma at us.ibm.com (Shirley Ma) Date: Fri, 29 Oct 2004 17:35:40 -0600 Subject: [openib-general] [PATCH]code optimization in ib_register_mad_agent() Message-ID: I am starting to look at the access layer code. Here is a code optimization patch in ib_register_mad_agent(). thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone: (503) 578-7638 FAX: (503) 578-3228 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: access.mad.patch Type: application/octet-stream Size: 2138 bytes Desc: not available URL: From xma at us.ibm.com Fri Oct 29 17:06:47 2004 From: xma at us.ibm.com (Shirley Ma) Date: Fri, 29 Oct 2004 18:06:47 -0600 Subject: [openib-general] [PATCH]spinlock shouldn't be held while calling ib_post_send() Message-ID: Here is the patch. thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone: (503) 578-7638 FAX: (503) 578-3228 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: access.mad1.patch Type: application/octet-stream Size: 839 bytes Desc: not available URL: From roland at topspin.com Fri Oct 29 17:11:12 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 29 Oct 2004 17:11:12 -0700 Subject: [openib-general] [PATCH]spinlock shouldn't be held while calling ib_post_send() In-Reply-To: (Shirley Ma's message of "Fri, 29 Oct 2004 18:06:47 -0600") References: Message-ID: <52y8hpvzdr.fsf@topspin.com> it's perfectly fine to call ib_post_send() from any context, including with spinlocks held. - R. 
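For reference, a minimal sketch of the pattern under discussion: queueing a send work request and posting it to the QP under the same spinlock, so the posted-send list stays in the same order the HCA (and hence the completion handler) sees the requests. The function and field names are illustrative, loosely modeled on the mad.c structures earlier in this thread; this is a sketch, not the actual code.

static int post_send_locked(struct ib_mad_qp_info *qp_info,
			    struct ib_mad_send_wr_private *mad_send_wr,
			    struct ib_send_wr *send_wr,
			    struct ib_send_wr **bad_send_wr)
{
	unsigned long flags;
	int ret;

	/* ib_post_send() may be called from any context, including with
	 * a spinlock held -- it does not sleep.  Holding the queue lock
	 * across both the list add and the post keeps the posted list in
	 * the same order the requests reach the QP. */
	spin_lock_irqsave(&qp_info->send_queue.lock, flags);
	list_add_tail(&mad_send_wr->mad_list.list, &qp_info->send_queue.list);
	qp_info->send_queue.count++;
	ret = ib_post_send(qp_info->qp, send_wr, bad_send_wr);
	if (ret) {
		/* Post failed: undo the queueing before dropping the lock */
		list_del(&mad_send_wr->mad_list.list);
		qp_info->send_queue.count--;
		*bad_send_wr = send_wr;
	}
	spin_unlock_irqrestore(&qp_info->send_queue.lock, flags);
	return ret;
}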
From mshefty at ichips.intel.com Fri Oct 29 17:09:17 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 29 Oct 2004 17:09:17 -0700 Subject: [openib-general] [PATCH]spinlock shouldn't be held while calling ib_post_send() In-Reply-To: References: Message-ID: <20041029170917.3faa58e3.mshefty@ichips.intel.com> On Fri, 29 Oct 2004 18:06:47 -0600 Shirley Ma wrote: > Here is the patch. Note that my patch removes the lock when calling ib_post_send. But, holding the lock when calling ib_post_send() should be fine. Also, the current completion code assumes that the work requests are queued in the same order that the sends are posted in. Releasing the lock after queuing the request, but before calling ib_psot_send() allows work requests to be posted out of order from the order that they are queued on the send posted list. - Sean From roland at topspin.com Fri Oct 29 17:12:40 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 29 Oct 2004 17:12:40 -0700 Subject: [openib-general] [PATCH]code optimization in ib_register_mad_agent() In-Reply-To: (Shirley Ma's message of "Fri, 29 Oct 2004 17:35:40 -0600") References: Message-ID: <52u0sdvzbb.fsf@topspin.com> Can you post patches in the body of your email or at least with a mime type like text/plain or text/x-patch so I don't have to save the patch to look at it? Thanks, Roland From mshefty at ichips.intel.com Fri Oct 29 17:13:45 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 29 Oct 2004 17:13:45 -0700 Subject: [openib-general] [PATCH]code optimization in ib_register_mad_agent() In-Reply-To: References: Message-ID: <20041029171345.1d01e8a3.mshefty@ichips.intel.com> On Fri, 29 Oct 2004 17:35:40 -0600 Shirley Ma wrote: > I am starting to look at the access layer code. Here is a code > optimization patch in ib_register_mad_agent(). ib_mad_client_id must be incremented while holding the spinlock (or converted into an atomic). The rest of the initialization looks fine moved upwards. - Sean From xma at us.ibm.com Fri Oct 29 17:21:50 2004 From: xma at us.ibm.com (Shirley Ma) Date: Fri, 29 Oct 2004 17:21:50 -0700 Subject: [openib-general] [PATCH]code optimization in ib_register_mad_agent() In-Reply-To: <52u0sdvzbb.fsf@topspin.com> Message-ID: > Can you post patches in the body of your email or at least with a mime type like text/plain or text/x-patch so I don't have to save the patch to look at it? Ok, let's see whether this works? My email has some problem for inline patch. 
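To illustrate the idiom Roland describes -- waiting for a reference count to drop to zero -- the usual combination is an atomic_t paired with a wait queue, which a semaphore cannot replace (a semaphore sleeps until its count is non-zero, not until it reaches zero). The names below are invented for the sketch; this is not the mad.c code itself.

struct my_agent {
	atomic_t		refcount;
	wait_queue_head_t	wait;
	/* ... */
};

/* Each outstanding user of the agent holds one reference. */
static inline void my_agent_get(struct my_agent *agent)
{
	atomic_inc(&agent->refcount);
}

/* Dropping the last reference wakes anyone blocked in unregister. */
static inline void my_agent_put(struct my_agent *agent)
{
	if (atomic_dec_and_test(&agent->refcount))
		wake_up(&agent->wait);
}

/* Unregister path: drop our own reference, then sleep until every
 * remaining reference is gone.  wait_event() re-checks the condition
 * before sleeping, so a wakeup cannot be lost. */
static void my_agent_unregister_wait(struct my_agent *agent)
{
	my_agent_put(agent);
	wait_event(agent->wait, !atomic_read(&agent->refcount));
}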
diff -urN access/mad.c access.patch/mad.c --- access/mad.c 2004-10-29 14:17:18.000000000 -0700 +++ access.patch/mad.c 2004-10-29 16:24:28.337157928 -0700 @@ -221,7 +221,21 @@ } /* Make a copy of the MAD registration request */ memcpy(reg_req, mad_reg_req, sizeof *reg_req); - } + } + + /* Now, fill in the various structures */ + memset(mad_agent_priv, 0, sizeof *mad_agent_priv); + mad_agent_priv->port_priv = port_priv; + mad_agent_priv->reg_req = reg_req; + mad_agent_priv->rmpp_version = rmpp_version; + mad_agent_priv->agent.device = device; + mad_agent_priv->agent.recv_handler = recv_handler; + mad_agent_priv->agent.send_handler = send_handler; + mad_agent_priv->agent.context = context; + mad_agent_priv->agent.qp = port_priv->qp[qp_type]; + mad_agent_priv->agent.hi_tid = ++ib_mad_client_id; + mad_agent_priv->agent.port_num = port_num; + spin_lock_irqsave(&port_priv->reg_lock, flags); @@ -237,31 +251,14 @@ method = class->method_table[mgmt_class]; if (method) { if (method_in_use(&method, mad_reg_req)) { - spin_unlock_irqrestore( - &port_priv->reg_lock, flags); ret = ERR_PTR(-EINVAL); goto error3; } } } } - - /* Now, fill in the various structures */ - memset(mad_agent_priv, 0, sizeof *mad_agent_priv); - mad_agent_priv->port_priv = port_priv; - mad_agent_priv->reg_req = reg_req; - mad_agent_priv->rmpp_version = rmpp_version; - mad_agent_priv->agent.device = device; - mad_agent_priv->agent.recv_handler = recv_handler; - mad_agent_priv->agent.send_handler = send_handler; - mad_agent_priv->agent.context = context; - mad_agent_priv->agent.qp = port_priv->qp[qp_type]; - mad_agent_priv->agent.hi_tid = ++ib_mad_client_id; - mad_agent_priv->agent.port_num = port_num; - ret2 = add_mad_reg_req(mad_reg_req, mad_agent_priv); if (ret2) { - spin_unlock_irqrestore(&port_priv->reg_lock, flags); ret = ERR_PTR(ret2); goto error3; } @@ -281,8 +278,8 @@ return &mad_agent_priv->agent; error3: - if (reg_req) - kfree(reg_req); + spin_unlock_irqrestore(&port_priv->reg_lock, flags); + kfree(reg_req); error2: kfree(mad_agent_priv); error1: thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone: (503) 578-7638 FAX: (503) 578-3228 -------------- next part -------------- An HTML attachment was scrubbed... URL: From xma at us.ibm.com Fri Oct 29 17:35:38 2004 From: xma at us.ibm.com (Shirley Ma) Date: Fri, 29 Oct 2004 17:35:38 -0700 Subject: [openib-general] atomic_read in ib_unregister_mad_agent() In-Reply-To: Message-ID: It's better to use semaphore instead of atomic_read to check the reference count 0 in wait_event() in ib_unregister_mad_agent(). Agree? Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone: (503) 578-7638 FAX: (503) 578-3228 -------------- next part -------------- An HTML attachment was scrubbed... URL: From roland at topspin.com Fri Oct 29 17:55:04 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 29 Oct 2004 17:55:04 -0700 Subject: [openib-general] atomic_read in ib_unregister_mad_agent() In-Reply-To: (Shirley Ma's message of "Fri, 29 Oct 2004 17:35:38 -0700") References: Message-ID: <52lldpvxcn.fsf@topspin.com> Shirley> It's better to use semaphore instead of atomic_read to Shirley> check the reference count 0 in wait_event() in Shirley> ib_unregister_mad_agent(). Agree? I don't see how one uses a semaphore to wait for a reference count to become zero (semaphores sleep until their count is non-zero). - R. 
From roland at topspin.com Fri Oct 29 17:56:25 2004
From: roland at topspin.com (Roland Dreier)
Date: Fri, 29 Oct 2004 17:56:25 -0700
Subject: [openib-general] [PATCH]code optimization in ib_register_mad_agent()
In-Reply-To: (Shirley Ma's message of "Fri, 29 Oct 2004 17:21:50 -0700")
References:
Message-ID: <52hdodvxae.fsf@topspin.com>

The inline patch was whitespace damaged and line wrapped. Is there any way to make your attachments have the mime type text/x-patch? That way my client will display them inline.

 - R.

From mb48 at cs.unh.edu Sat Oct 30 09:39:33 2004
From: mb48 at cs.unh.edu (Mahadevan Balasubramaniam)
Date: Sat, 30 Oct 2004 12:39:33 -0400 (EDT)
Subject: [openib-general] IP over IB problem
Message-ID:

Hello,

I have two hosts with IP over InfiniBand interfaces. They are connected to each other directly. I am using minism as my subnet manager. When I try to discover the topology, it gives the following error:

Initialize Discovery... DONE
New Discovered Node PortGUID:0008F10403961019 NodeGUID:0008F10403961018 LID: 1
New Node - Type:CA NumPorts:02 LID:0001
****** ERROR: InfiniBandMAD - generic error (cMgtClass=81 cMethod=1 wAttrib=15 ) 0:0
Error in Link 4x FromLID:0001 FromPort:01
Verify Topology...DONE

I don't know what this means. Any help is greatly appreciated.

Thanks
Maha

From halr at voltaire.com Sat Oct 30 10:35:54 2004
From: halr at voltaire.com (Hal Rosenstock)
Date: Sat, 30 Oct 2004 13:35:54 -0400
Subject: [openib-general] IP over IB problem
In-Reply-To:
References:
Message-ID: <1099157753.3270.77.camel@localhost.localdomain>

On Sat, 2004-10-30 at 12:39, Mahadevan Balasubramaniam wrote:
> I have two hosts with IP over InfiniBand interfaces. They are
> connected to each other directly. I am using minism as my subnet manager.
> When I try to discover the topology, it gives the following error:
>
> Initialize Discovery... DONE
> New Discovered Node PortGUID:0008F10403961019 NodeGUID:0008F10403961018 LID: 1
> New Node - Type:CA NumPorts:02 LID:0001
> ****** ERROR: InfiniBandMAD - generic error (cMgtClass=81 cMethod=1 wAttrib=15 ) 0:0
> Error in Link 4x FromLID:0001 FromPort:01
> Verify Topology...DONE
>
> I don't know what this means. Any help is greatly appreciated.

Management class 0x81 is Subn (Directed Route). Method 1 is a Get. Attribute ID 0x15 is PortInfo. So this is a SubnGet(PortInfo), which should be answered by the SMA. Not sure why there is an error on this.

What IB stack are you running on the 2 end nodes with the HCAs?

BTW, I don't think minism can support IPoIB. It is lacking an SA, which is needed for PathRecord and MulticastRecord support.

-- Hal
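For anyone following the decode above, a small user-space sketch of how those three numbers map onto the common MAD header. The layout follows the IBA common MAD header; the constant and struct names here are illustrative, not the ones defined in the OpenIB headers.

#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Illustrative constants; real equivalents live in the IB headers. */
#define MGMT_CLASS_SUBN_DIRECTED_ROUTE  0x81    /* directed-route SMP class */
#define MAD_METHOD_GET                  0x01
#define ATTR_ID_PORT_INFO               0x0015

struct mad_hdr {                /* 24-byte common MAD header */
        uint8_t  base_version;
        uint8_t  mgmt_class;
        uint8_t  class_version;
        uint8_t  method;
        uint16_t status;
        uint16_t class_specific;
        uint64_t tid;
        uint16_t attr_id;       /* big-endian on the wire */
        uint16_t resv;
        uint32_t attr_mod;
};

int main(void)
{
        struct mad_hdr hdr = {
                .mgmt_class = 0x81,             /* cMgtClass=81 in the error */
                .method     = 0x01,             /* cMethod=1 */
                .attr_id    = htons(0x0015),    /* wAttrib=15 */
        };

        if (hdr.mgmt_class == MGMT_CLASS_SUBN_DIRECTED_ROUTE &&
            hdr.method == MAD_METHOD_GET &&
            ntohs(hdr.attr_id) == ATTR_ID_PORT_INFO)
                printf("SubnGet(PortInfo) - should be answered by the SMA\n");
        return 0;
}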
From mb48 at cs.unh.edu Sat Oct 30 11:35:32 2004
From: mb48 at cs.unh.edu (Mahadevan Balasubramaniam)
Date: Sat, 30 Oct 2004 14:35:32 -0400 (EDT)
Subject: [openib-general] IP over IB problem
In-Reply-To: <1099157753.3270.77.camel@localhost.localdomain>
References: <1099157753.3270.77.camel@localhost.localdomain>
Message-ID:

Hi,

I'm using Voltaire's stack. Is there any place on the web where I can find a user manual (or some information) about minism? Actually, I'm quite new to using this stuff.

Thanks
Maha

On Sat, 30 Oct 2004, Hal Rosenstock wrote:
> On Sat, 2004-10-30 at 12:39, Mahadevan Balasubramaniam wrote:
> > I have two hosts with IP over InfiniBand interfaces. They are
> > connected to each other directly. I am using minism as my subnet manager.
> > When I try to discover the topology, it gives the following error:
> >
> > Initialize Discovery... DONE
> > New Discovered Node PortGUID:0008F10403961019 NodeGUID:0008F10403961018 LID: 1
> > New Node - Type:CA NumPorts:02 LID:0001
> > ****** ERROR: InfiniBandMAD - generic error (cMgtClass=81 cMethod=1 wAttrib=15 ) 0:0
> > Error in Link 4x FromLID:0001 FromPort:01
> > Verify Topology...DONE
> >
> > I don't know what this means. Any help is greatly appreciated.
>
> Management class 0x81 is Subn (Directed Route). Method 1 is a Get.
> Attribute ID 0x15 is PortInfo. So this is a SubnGet(PortInfo), which should
> be answered by the SMA. Not sure why there is an error on this.
>
> What IB stack are you running on the 2 end nodes with the HCAs?
>
> BTW, I don't think minism can support IPoIB. It is lacking an SA, which
> is needed for PathRecord and MulticastRecord support.
>
> -- Hal

From halr at voltaire.com Sat Oct 30 13:56:11 2004
From: halr at voltaire.com (Hal Rosenstock)
Date: Sat, 30 Oct 2004 16:56:11 -0400
Subject: [openib-general] IP over IB problem
In-Reply-To:
References: <1099157753.3270.77.camel@localhost.localdomain>
Message-ID: <1099169771.3270.113.camel@localhost.localdomain>

On Sat, 2004-10-30 at 14:35, Mahadevan Balasubramaniam wrote:
> Hi,
> I'm using Voltaire's stack. Is there any place on the web where I can
> find a user manual (or some information) about minism?

minism is insufficient for running IPoIB. You need a full SM. I do not believe there is a host-based SM for the Voltaire stack, although I'm not 100% sure whether OpenSM in some configuration works over the Voltaire stack.

So what I would recommend is not a back-to-back configuration for doing this. Either add in a switch with an embedded SM, or a switch without an embedded SM and another end node which would run an SM. Is that possible?

-- Hal

From halr at voltaire.com Sat Oct 30 14:25:39 2004
From: halr at voltaire.com (Hal Rosenstock)
Date: Sat, 30 Oct 2004 17:25:39 -0400
Subject: [openib-general] 2 questions on physical code layout
In-Reply-To: <20041027230641.742e2543.mshefty@ichips.intel.com>
References: <52pt36brzm.fsf@topspin.com> <1098728484.3269.968.camel@localhost.localdomain> <52bren527p.fsf@topspin.com> <20041027135517.17ba87fb.mshefty@ichips.intel.com> <52oein2yxt.fsf@topspin.com> <20041027230641.742e2543.mshefty@ichips.intel.com>
Message-ID: <1099171539.3270.160.camel@localhost.localdomain>

On Thu, 2004-10-28 at 02:06, Sean Hefty wrote:
> Either is fine. I'd just like to get to a single branch. Renaming
> roland-merge may make it easier for people to locate the correct code.

Not sure what you mean by "correct code" in the statement above.

-- Hal

From halr at voltaire.com Sat Oct 30 14:25:34 2004
From: halr at voltaire.com (Hal Rosenstock)
Date: Sat, 30 Oct 2004 17:25:34 -0400
Subject: [openib-general] 2 questions on physical code layout
In-Reply-To: <52oein2yxt.fsf@topspin.com>
References: <52pt36brzm.fsf@topspin.com> <1098728484.3269.968.camel@localhost.localdomain> <52bren527p.fsf@topspin.com> <20041027135517.17ba87fb.mshefty@ichips.intel.com> <52oein2yxt.fsf@topspin.com>
Message-ID: <1099171534.3270.158.camel@localhost.localdomain>

On Thu, 2004-10-28 at 01:29, Roland Dreier wrote:
> I would be fine with consolidating work onto the roland-merge branch
> and pushing core/mad changes through you and Hal, if that's what the
> consensus is.
> Or we could copy roland-merge to a new branch with a different name
> and work there.

Are you talking about the whole branch or only parts of the branch? Does this include SDP, SRP, and the user space components?
I would like to be able to see exactly what is eligible for kernel submission initially. This would become the initial stable branch. We would then add components as they become useful. That would be the "main" branch.

-- Hal

From shaharf at voltaire.com Sun Oct 31 05:25:32 2004
From: shaharf at voltaire.com (shaharf)
Date: Sun, 31 Oct 2004 15:25:32 +0200
Subject: [openib-general] RE: [opened-general] IP over IB problem
Message-ID:

Just a small correction: Voltaire's IPoIB should work with minism as long as it is configured to do so (using the ib-setup option 3,2). As a matter of fact it should be treated as a hack, but it should be enough for a back-to-back configuration. OpenSM should work with a Voltaire host, but I am not sure that it has been tested.

Shahar

-----Original Message-----
From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Hal Rosenstock
Sent: Saturday, October 30, 2004 10:56 PM
To: Mahadevan Balasubramaniam
Cc: openib-general at openib.org
Subject: Re: [openib-general] IP over IB problem

On Sat, 2004-10-30 at 14:35, Mahadevan Balasubramaniam wrote:
> Hi,
> I'm using Voltaire's stack. Is there any place on the web where I can
> find a user manual (or some information) about minism?

minism is insufficient for running IPoIB. You need a full SM. I do not believe there is a host-based SM for the Voltaire stack, although I'm not 100% sure whether OpenSM in some configuration works over the Voltaire stack.

So what I would recommend is not a back-to-back configuration for doing this. Either add in a switch with an embedded SM, or a switch without an embedded SM and another end node which would run an SM. Is that possible?

-- Hal

From kcm at psc.edu Sun Oct 31 10:20:01 2004
From: kcm at psc.edu (Ken MacInnis)
Date: Sun, 31 Oct 2004 13:20:01 -0500
Subject: [openib-general] Problem with 2.4.24 and gen1
Message-ID: <41852CD1.9080108@psc.edu>

Hi,

I've got a fairly modified kernel here that I'm trying to get an OpenIB stack running on. It's a vanilla 2.4.24 kernel with Lustre and other patches in it, but I'm seeing this when I modprobe ib_tavor:

Oct 31 13:13:05 samwise kernel: THH(1): cmdif.c[1190]: Command not completed after timeout: cmd=TAVOR_IF_CMD_MAD_IFC (0x24), token=0x1400, pid=0x8E1, go=0
Oct 31 13:13:05 samwise kernel: THH(1): CMD ERROR DUMP. opcode=0x24, opc_mod = 0x1, exec_time_micro=300000000
.
.
Oct 31 13:13:06 samwise kernel: THH(1): cmdif.c[842]: Failed command 0x24 (TAVOR_IF_CMD_MAD_IFC): status=0x103 (0x0103 - unexpected error - fatal)
Oct 31 13:13:06 samwise kernel:
Oct 31 13:13:06 samwise kernel: THH(1): thh_hob.c[2790]: THH_hob_query_port_prop: cmdif returned FATAL
Oct 31 13:13:06 samwise kernel: VIPKL(1): qpm.c[278]: QPM_new: HOBKL_query_port_prop returned with error: -254 = VAPI_EFATAL
Oct 31 13:13:06 samwise kernel: VIPKL(1): qpm.c[302]: QPM_new: returned with error: -254 = VAPI_EFATAL
Oct 31 13:13:06 samwise kernel: THH(1): thh_hob.c[3474]: THH_hob_fatal_err_thread: RECEIVED FATAL ERROR WAKEUP
Oct 31 13:13:06 samwise kernel: THH(1): thh_hob.c[4490]: THH_hob_halt_hca: HALT HCA returned 0x103
Oct 31 13:13:06 samwise kernel: THH(1): thh_hob.c[1620]: THH_hob_destroy: FATAL ERROR
Oct 31 13:13:06 samwise kernel: THH(1): thh_hob.c[1627]: THH_hob_destroy: PERFORMING SW RESET. pa=0xFE9F0010 va=0xF8A01010
Oct 31 13:13:06 samwise kernel:
Oct 31 13:13:06 samwise kernel: Mellanox Tavor Device Driver is creating device "InfiniHost0" (bus=04, devfn=00)
Oct 31 13:13:06 samwise kernel:
Oct 31 13:13:06 samwise kernel: [KERNEL_IB][_tsIbTavorInitOne][tavor_main.c:86]InfiniHost0: VAPI_open_hca failed, status -254 (Fatal error (Local Catastrophic Error))
Oct 31 13:13:06 samwise kernel: [SRPTP][srp_host_init][srp_host.c:1495]SRP Host using indirect addressing

This occurs with an older openib rev (200-ish) as well as one up-to-date as of today. Everything else (modules.conf, etc.) is set up as it was when I was messing with 2.4 kernels and OpenIB a few months ago, so I don't think it's related to that. Any ideas? Yes, I know it's 2.4, and a fairly old 2.4 at that, but I have no choice here. :)

lspci -vvv bits follow.

03:01.0 PCI bridge: Mellanox Technology: Unknown device 5a46 (rev a1) (prog-if 00 [Normal decode])
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR-
        Reset- FastB2B-
        Capabilities: [70] PCI-X non-bridge device.
                Command: DPERE+ ERO+ RBC=0 OST=4
                Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-

04:00.0 InfiniBand: Mellanox Technology: Unknown device 5a44 (rev a1)
        Subsystem: Mellanox Technology: Unknown device 5a44
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR-

Hi,

The problem is that the driver does not get the interrupt for the command completion, and thus you get the error: "Command not completed after timeout". It is related to the OS and system you are using. What distribution are you using? We once saw such problems with older versions of SuSE.

Try adding append="acpi=off" to the lilo configuration you are using, or also add disableapic to the same append line.

Tziporet

-----Original Message-----
From: Ken MacInnis [mailto:kcm at psc.edu]
Sent: Sunday, October 31, 2004 8:20 PM
To: openib-general at openib.org
Subject: [openib-general] Problem with 2.4.24 and gen1
