From halr at voltaire.com Sun Aug 1 07:40:48 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Sun, 01 Aug 2004 10:40:48 -0400 Subject: [openib-general] gen2 dev branch References: <35EA21F54A45CB47B879F21A91F4862F18AA8F@taurus.voltaire.com> <52ekmrr5o5.fsf@topspin.com> Message-ID: <004101c477d5$8a2c1260$6401a8c0@comcast.net> Roland Dreier wrote: > I don't like the assumption that every MAD on QP1 > will follow all the GSI rules (especially since CM doesn't really > follow all the GSI rules). As far as I know, CM follows all the GSI rules. Are you referring to the CM being the only GS entity to use send rather than request/response methods (and the use of transaction IDs for message sequences rather than request/response pairs)? In any case, there are several existence proofs of CMs running on top of an interface similar to the one being proposed. > I don't like building a general kernel GSI > layer when the SA client is the only known consumer. Not sure what you mean by "known" in the above. Are you referring to "phase 1" kernel consumers only in that statement ? 
-- Hal From yaronh at voltaire.com Sun Aug 1 12:40:02 2004 From: yaronh at voltaire.com (Yaron Haviv) Date: Sun, 1 Aug 2004 22:40:02 +0300 Subject: [openib-general] gen2 dev branch Message-ID: <35EA21F54A45CB47B879F21A91F4862F18AB07@taurus.voltaire.com> On Sunday, August 01, 2004 4:47 AM, Roland Dreier wrote: > Yaron> I don't see where is the policy vs mechanism, we suggest a > Yaron> mechanism that works and performs in the most extreme cases > Yaron> possible, enable more functionality, more scalable, and > Yaron> easier to work with, and the only counter arguments I see > Yaron> is that the MAD snooping is not flexible > > Yaron> I still think you don't have a real reason to object to our > Yaron> model so aggressively, most people in this list support it, > Yaron> it can work with your code, and we are quite open to > Yaron> incorporate any useful suggestions as long as they are > Yaron> focused and productive. > > I don't like the assumption that there is only one consumer > for each MAD. Any reason we should have more than one consumer for the same MAD besides snooping? Should that be the main guideline when we build a GSI layer? > I don't like building a general kernel GSI > layer when the SA client is the only known consumer. Just from what people use today: CM, SA, PM, DM, and vendor-specific (we use it for SM sync) GSI consumers. So do we want to repeat the same GSI functionality for each of them? In our case a client sends a MAD, gets a callback for the response, or retries if no response arrived. A client can be in the kernel or in user space, and only the relevant client will get interrupted, so elegant :) > I don't like to treat QP0 and QP1 so differently. A. QP0 and QP1 are different: different size constraints (RMPP, Direct Route), different QP attributes and handling (Q_Key, Pkey, ..), different variety of classes and services, different behavior in some cases (Redirect), etc.; the only common thing is that they use MADs. B. 
I also don't see any reason for QP0 MADs to be handled by a few consumers at the same time; you need at most to register based on hca, port, mngr/agent (you even suggested having the SMA as part of the driver) > However I agree that your approach can be made to work for today's > applications, so if Sean is comfortable with it I will go along. In > any case we can always replace it later if it turns out I'm right. > > - Roland Give other people some credit, it won't hurt :) If there are bugs/limitations we will fix them; I don't see any reason to totally replace it in the future, it's already a proven solution. Yaron From roland at topspin.com Sun Aug 1 13:55:58 2004 From: roland at topspin.com (Roland Dreier) Date: Sun, 01 Aug 2004 13:55:58 -0700 Subject: [openib-general] gen2 dev branch In-Reply-To: <004101c477d5$8a2c1260$6401a8c0@comcast.net> (Hal Rosenstock's message of "Sun, 01 Aug 2004 10:40:48 -0400") References: <35EA21F54A45CB47B879F21A91F4862F18AA8F@taurus.voltaire.com> <52ekmrr5o5.fsf@topspin.com> <004101c477d5$8a2c1260$6401a8c0@comcast.net> Message-ID: <528ycyr31d.fsf@topspin.com> Hal> As far as I know, CM follows all the GSI rules. Hal> Are you referring to the CM being the only GS entity to use Hal> send rather than request/response methods (and the use of Hal> transaction IDs for message sequences rather than Hal> request/response pairs)? Right -- also redirection works somewhat differently (new port info as part of a REJ message). Hal> In any case, there are several existence proofs of CMs Hal> running on top of an interface similar to the one being Hal> proposed. Sure, it can work. It just doesn't seem like a perfect fit to me. 
- Roland From roland at topspin.com Sun Aug 1 14:00:51 2004 From: roland at topspin.com (Roland Dreier) Date: Sun, 01 Aug 2004 14:00:51 -0700 Subject: [openib-general] gen2 dev branch In-Reply-To: <35EA21F54A45CB47B879F21A91F4862F18AB07@taurus.voltaire.com> (Yaron Haviv's message of "Sun, 1 Aug 2004 22:40:02 +0300") References: <35EA21F54A45CB47B879F21A91F4862F18AB07@taurus.voltaire.com> Message-ID: <521xiqr2t8.fsf@topspin.com> Roland> I don't like building a general kernel GSI layer when the Roland> SA client is the only known consumer. Yaron> Just from what people use today: CM, SA, PM, DM, and vendor Yaron> specific (we use for SM sync) GSI consumers So do we want Yaron> to repeat the same GSI functionality for each of them ? CM has no use for RMPP or really any of the GSI functions. I haven't seen anyone doing PM from the kernel. Right now DM is in the kernel (for Topspin's stack too) but I would like to try to move that to userspace. I assume your vendor-specific SM sync is in userspace as well. Given that most of the users are in userspace, I would like to try and move the general code to a user library too. - Roland From David.Brean at Sun.COM Sun Aug 1 15:51:43 2004 From: David.Brean at Sun.COM (David M. Brean) Date: Sun, 01 Aug 2004 18:51:43 -0400 Subject: [openib-general] Updated IPoIB IETF WG presentation In-Reply-To: <006d01c4756b$948aeea0$6401a8c0@comcast.net> References: <006d01c4756b$948aeea0$6401a8c0@comcast.net> Message-ID: <410D73FF.5060908@sun.com> Hello, Some comments/questions about these slides: * slide 1 - nit: perhaps the title should be "Some Experience with Linux IPoIB Implementations" since the information is coming from Linux developers. * slide 4 - nit: move the first bullet after bullet containing "single implementation" * slide 6 - nit: first bullet should be highlighted as the "problem" and the second bullet as the "solution". 
* slide 7 and 8 - In section 5.0 of the I-D, there is text stating that the "broadcast group may be created by the first IPoIB node to be initialized or it can be created administratively before the IPoIB subnet is setup". The mechanism used to administratively create the group is intentionally beyond the scope of the I-D. For example, an implementation could enable the fabric (or "network" as you say) administrator to control membership in a partition and therefore make sure that the first node added to that partition creates the broadcast group correctly. In any case, mentioning the administrative option is kind of a "helpful" hint. All the IPoIB nodes are free to create the broadcast group, just like they can create any multicast group, as long as the IPoIB node has enough information to specify the necessary parameters as required by the SA interface. The I-D suggests how to find the necessary parameters for the multicast groups and leaves open how IPoIB nodes obtain that information if they need to create that group. Are these slides suggesting that the I-D be changed to specify the IPoIB parameters via defaults for the case where the IPoIB node must create the broadcast group? [Note, Q_Key is provided by the broadcast group, so it isn't necessary to distribute to all IPoIB nodes.] * slide 9 and 10 - "Running" may be the description of a state that is OS-specific and beyond the scope of the I-D (does the Windows network interface support a "running" state?). However, the I-D does say that an IPoIB link is "formed" only when the broadcast group exists. The I-D doesn't say anything about operation in a "degraded" mode, for example, when an IPoIB node can't join a multicast group. Behavior in degraded mode seems like an implementation issue. It's not clear what you would want to change in the I-D, perhaps you can suggest what you want changed in the presentation. 
* slide 12 - I recall two scenarios from the email discussion: 1) a boot-time scenario where the IPoIB nodes had to access the SA to obtain pathrecord information to fill the pathrecord cache and send unicast ARP messages 2) an SM failover/restart scenario For #1, the speed at which the IPoIB nodes can begin normal operation depends on the fabric and SA implementation. I guess the question is whether this is an architecture or implementation problem. Is it impossible to implement a working system based on the current architecture? I think the proposed alternative would require changes to the encapsulation scheme plus specifying some defaults such as the SL so that SA queries are eliminated. Some of that might require input from the IBTA. For #2, how long is too long for a subnet to operate without successful SA queries? 10 seconds? 20 seconds? Or is this change suggesting that the subnet should continue operating, perhaps establishing new IP connections (note, this proposal doesn't attempt to fix the situation at the IB transport level) even in the case where no SA exists? Please clarify in the slides. * slide 13 - An IB CA should perform as well as a "dumb" ethernet NIC with respect to bandwidth and CPU utilization. If not, someone should look at the overheads in the IB access layer and the CA implementation, right? The statement "not equivalent to ethernet" is highlighting the lack of offload mechanisms in the CA such as checksum, correct? If so, perhaps that point should be made explicit. Note, I'm not attempting to respond to the issues raised on the slides since that will happen at the meeting, but merely seeking clarification of the issues being raised. -David Hal Rosenstock wrote: > Here's an updated presentation based on the comments from yesterday: > - Separate slide and more detail on openib > - Eliminate checksum slide > > It is also available as > https://openib.org/svn/trunk/contrib/voltaire/ietf_ipoib/ipoib_exp.pdf. 
> > -- Hal > > From halr at voltaire.com Sun Aug 1 18:50:04 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Sun, 01 Aug 2004 21:50:04 -0400 Subject: [openib-general] [ANNOUNCE] GSI Implementation Candidate Message-ID: <025a01c47833$096cb180$6401a8c0@comcast.net> Under https://openib.org/svn/gen2/branches/openib-candidate/src/linux-kernel/infiniband/, there is a GSI implementation under access, including RMPP under access/rmpp. A couple of the include files (ib_verbs.h, a skeletal ib_core.h, and ib_core_types.h) are under the include subdirectory. The GSI (including RMPP) has been ported to the new ib_verbs (although it is using a version from a couple of days ago, prior to the start of its merging with mthca). While this version stems from a working implementation, it has not been tested on ib_verbs as yet. The next step in this evolution is to modify the GSI to the one proposed for openib in https://openib.org/svn/trunk/contrib/voltaire/access/gsi.h -- Hal From halr at voltaire.com Mon Aug 2 06:59:25 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 02 Aug 2004 09:59:25 -0400 Subject: [openib-general] Updated IPoIB IETF WG presentation References: <006d01c4756b$948aeea0$6401a8c0@comcast.net> <410D73FF.5060908@sun.com> Message-ID: <001701c47898$ec8f6880$6401a8c0@comcast.net> Hi David, David M. Brean wrote: > Hello, > > Some comments/questions about these slides: > > * slide 1 - nit: perhaps the title should be "Some Experience with > Linux IPoIB Implementations" since the information is coming from > Linux > developers. Good point. > * slide 4 - nit: move the first bullet after bullet containing "single > implementation" I reordered the bullets as suggested. > * slide 6 - nit: first bullet should be highlighted as the "problem" > and > the second bullet as the "solution". Done. 
> * slide 7 and 8 - In section 5.0 of the I-D, there is text stating > that > the "broadcast group may be created by the first IPoIB node to be > initialized or it can be created administratively before the IPoIB > subnet is setup". The mechanism used to administratively create the > group is intentionally beyond the scope of the I-D. For example, an > implementation could enable the fabric (or "network" as you say) > administrator to control membership in a partition and therefore make > sure that the first node added to that partition creates the broadcast > group correctly. In any case, mentioning the administrative option is > kinda a "helpful" hint. All the IPoIB nodes are free to create the > broadcast group, just like they can create any multicast group, as > long > as the IPoIB node has enough information to specify the necessary > parameters as required by the SA interface. The I-D suggests how to > find the necessary parameters for the multicast groups and leaves open > how IPoIB nodes obtain that information if they need to create that > group. > > Are these slides suggesting that the I-D be changed to specify the > IPoIB parameters via defaults for the case where the IPoIB node must > create the broadcast group? From the discussion on the group, it was stated that some may have interpreted the spec as requiring the pre-administered groups and not supporting the end node creation of a group (even the broadcast group if not already present) (at least that's the way at least two were implemented). This may not be an issue any more but I have not seen this stated explicitly on this email list. Yes, it might be good (to eliminate the need for explicit configuration) to select a specific controlled QKey as a default for the end node case. > [Note, Q_Key is provided by broadcast group, so it isn't necessary > to distribute to all IPoIB nodes.] Are you referring to "It is RECOMMENDED that a controlled Q_Key be used with the high order bit set." 
for the broadcast group (and all other groups using the broadcast group parameters)? Aren't there many controlled QKeys so this still needs configuration somewhere (either at the SM/SA or at at least one end node (if all the others join rather than create the broadcast group (otherwise all end nodes if they all attempt to create this group when not present))? > * slide 9 and 10 - "Running" may be the description of a state that is > OS-specific and beyond the scope of the I-D (does Windows network interface > support a "running" state?). However, the I-D does say that an IPoIB > link is "formed" only when the broadcast group exists. The I-D > doesn't > say anything about operation in a "degraded" mode, for example, when an > IPoIB node can't join a multicast group. Behavior in degraded mode > seems like an implementation issue. It's not clear what you would > want > to change in the I-D, perhaps you can suggest what you want changed in > the presentation. I added a bullet on interface state being OS specific. What I was wondering about (due to the implementations not currently dealing with the failure modes) was: Is the statement "an IPoIB link is "formed" only when the broadcast group exists" sufficient for an IPoIB node failing to join the broadcast group? Perhaps it should state "From the IPoIB node perspective, the node is not part of the IPoIB link until (at least) the broadcast group is successfully joined" as well. 
> * slide 12 - I recall that during the email discussion: > 1) a boot-time scenario where the IPoIB nodes had to access the SA to > obtain pathrecord information to fill the pathrecord cache and send > unicast ARP messages I didn't mention this one in the presentation although it is mentioned in the bullet which states "Only if node has talked with other node (and cached information); otherwise SA interaction is currently needed" > 2) an SM failover/restart scenario > > For #1, the speed at which the IPoIB nodes can begin normal > operation depends on the fabric and SA implementation. I guess the > question is > whether this is an architecture or implementation problem. Is it > impossible to implement a working system based on the current > architecture? I think the proposed alternative would require changes > to > the encapsulation scheme plus specifying some defaults such as the SL > so > that SA queries are eliminated. Some of that might require input from > the IBTA. > > For #2, how long is too long for a subnet to operate without > successful SA queries? 10 seconds? 20 seconds? Don't know. Perhaps there are some on this list with opinions on this. > Or is this change > suggesting that the subnet should continue operating, perhaps > establishing new IP connections (note, this proposal doesn't attempt > to > fix the situation at the IB transport level) even in the case where no > SA exists. Please clarify in the slides. The intent is to continue operation for all IPoIB nodes currently on the subnet (in the absence of any changes) during the window when no SM/SA exists. > * slide 13 - An IB CA should perform as well as a "dumb" ethernet NIC > with respect to bandwidth and CPU utilization. If not, someone should > look at the overheads in the IB access layer and the CA > implementation, right? The statement "not equivalent to ethernet" is > highlighting the > lack of offload mechanisms in the CA such as checksum, correct? 
If so, > perhaps that point should be made explicit. Another lack of clarity. I did mean "dumb" ethernet and not anything more sophisticated with checksum offload, etc. That's a separate issue. I made this into 2 slides in the next version of this presentation. > Note, I'm not attempting to respond to the issues raised on the slides > since that will happen at the meeting, but merely seeking > clarification > of the issues being raised. Understood. Thanks for your comments. I think the (hopefully) added clarity will help. -- Hal From halr at voltaire.com Mon Aug 2 07:40:51 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 02 Aug 2004 10:40:51 -0400 Subject: [openib-general] Updated IPoIB IETF WG presentation Message-ID: <002901c4789e$b6e14900$6401a8c0@comcast.net> Hi, Here's another update to the IPoIB presentation based on David's comments. It is also available as https://openib.org/svn/trunk/contrib/voltaire/ietf_ipoib/ipoib_exp.pdf. -- Hal -------------- next part -------------- A non-text attachment was scrubbed... Name: IPoIB Implementation Experience.pdf Type: application/pdf Size: 225095 bytes Desc: not available URL: From roland at topspin.com Mon Aug 2 07:51:09 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 02 Aug 2004 07:51:09 -0700 Subject: [openib-general] [ANNOUNCE] GSI Implementation Candidate In-Reply-To: <025a01c47833$096cb180$6401a8c0@comcast.net> (Hal Rosenstock's message of "Sun, 01 Aug 2004 21:50:04 -0400") References: <025a01c47833$096cb180$6401a8c0@comcast.net> Message-ID: <52n01dpp9e.fsf@topspin.com> A few quick comments based on starting to read the code: - Makefile should use standard kbuild rather than your own rules. It doesn't seem like it can even build a 2.6 .ko module. - Need to get rid of your spinlock wrappers -- not just for style reasons, as usual the wrappers are buggy. 
- Need to remove all the #if 0/#if 1 (or replace with #ifdef SUITABLE_PREPROC_SYMBOL) -- however #ifdefs in .c files should be avoided if at all possible. - Static limit on number of ports/HCAs supported doesn't look good to me. - VD_ENTERFUNC() etc. debugging code needs to be removed - all printk()s need appropriate KERN_ levels. - all /proc files should be moved to sysfs - Shouldn't hard-code P_Key index ... needs to be settable by consumer - Need some way to send MADs with GRH - ib_reg_mr() function has been removed from the API, and registering memory in the data path doesn't look good to me -- you should do ib_reg_phys_mr() once to cover all of lowmem, and then just do pci_map_single()/pci_unmap_single() in the data path. - gsi_post_send_mad() looks buggy to me -- where is addr_hndl_attr filled in? Thanks, Roland From halr at voltaire.com Mon Aug 2 08:30:34 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 02 Aug 2004 11:30:34 -0400 Subject: [openib-general] [ANNOUNCE] GSI Implementation Candidate References: <025a01c47833$096cb180$6401a8c0@comcast.net> <52n01dpp9e.fsf@topspin.com> Message-ID: <006a01c478a5$a8337ca0$6401a8c0@comcast.net> Roland Dreier wrote: > A few quick comments based on starting to read the code: I added a TODO list as: https://openib.org/svn/gen2/branches/openib-candidate/src/linux-kernel/infiniband/access/TODO with your comments. > - Makefile should use standard kbuild rather than your own rules. It > doesn't seem like it can even build a 2.6 .ko module. > - Need to get rid of your spinlock wrappers -- not just for style > reasons, as usual the wrappers are buggy. > - Need to remove all the #if 0/#if 1 (or replace with #ifdef > SUITABLE_PREPROC_SYMBOL) -- however #ifdefs in .c files should be > avoided if at all possible. > - Static limit on number of ports/HCAs supported doesn't look good > to me. > - VD_ENTERFUNC() etc. debugging code needs to be removed > - all printk()s need appropriate KERN_ levels. 
> - all /proc files should be moved to sysfs > - Shouldn't hard-code P_Key index ... needs to be settable by > consumer > - Need some way to send mads with GRH I agree in general with this but: There is no requirement to send with GRH (only to receive with GRH) (until multisubnet is supported which is currently an incomplete aspect of IBA). I have not added this one into the TODO (at least yet)... > - ib_reg_mr() function has been removed from the API, and registering > memory in the data path doesn't look good to me -- you should do > ib_reg_phys_mr() once to cover all of lowmem, and then just do > pci_map_single()/pci_unmap_single() in the data path. > - gsi_post_send_mad() looks buggy to me -- where is addr_hndl_attr > filled in? Good catch. I missed that in the port to ib_verbs :-( -- Hal From iod00d at hp.com Mon Aug 2 08:30:19 2004 From: iod00d at hp.com (Grant Grundler) Date: Mon, 2 Aug 2004 08:30:19 -0700 Subject: [openib-general] [ANNOUNCE] GSI Implementation Candidate In-Reply-To: <52n01dpp9e.fsf@topspin.com> References: <025a01c47833$096cb180$6401a8c0@comcast.net> <52n01dpp9e.fsf@topspin.com> Message-ID: <20040802153019.GA20829@cup.hp.com> On Mon, Aug 02, 2004 at 07:51:09AM -0700, Roland Dreier wrote: > - ib_reg_mr() function has been removed from the API, and registering > memory in the data path doesn't look good to me -- you should do > ib_reg_phys_mr() once to cover all of lowmem, and then just do > pci_map_single()/pci_unmap_single() in the data path. How does that work on ia64? I think I'm being confused by "all of lowmem" when maybe only kernel bits are meant. Is that right? To be clear, we only want to call DMA mapping services for pages that are pinned (can't be swapped out). I'm pretty sure (Roland) you know that already though. 
thanks, grant From roland at topspin.com Mon Aug 2 08:33:10 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 02 Aug 2004 08:33:10 -0700 Subject: [openib-general] [ANNOUNCE] GSI Implementation Candidate In-Reply-To: <006a01c478a5$a8337ca0$6401a8c0@comcast.net> (Hal Rosenstock's message of "Mon, 02 Aug 2004 11:30:34 -0400") References: <025a01c47833$096cb180$6401a8c0@comcast.net> <52n01dpp9e.fsf@topspin.com> <006a01c478a5$a8337ca0$6401a8c0@comcast.net> Message-ID: <52fz75pnbd.fsf@topspin.com> Hal> I agree in general with this but: There is no requirement to Hal> send with GRH (only to receive with GRH) (until multisubnet Hal> is supported which is currently an incomplete aspect of IBA). Hal> I have not added this one into the TODO (at least yet)... Don't the current set of IB compliance tests try sending a request with GRH (and expect a response with GRH)? If you can receive requests with GRHs, then the response generation rules require the response to be sent with a GRH as well. - Roland From roland at topspin.com Mon Aug 2 08:39:32 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 02 Aug 2004 08:39:32 -0700 Subject: [openib-general] [ANNOUNCE] GSI Implementation Candidate In-Reply-To: <20040802153019.GA20829@cup.hp.com> (Grant Grundler's message of "Mon, 2 Aug 2004 08:30:19 -0700") References: <025a01c47833$096cb180$6401a8c0@comcast.net> <52n01dpp9e.fsf@topspin.com> <20040802153019.GA20829@cup.hp.com> Message-ID: <52brhtpn0r.fsf@topspin.com> Grant> How does that work on ia64? I think I'm being confused by Grant> "all of lowmem" when maybe only kernel bits are meant. Is Grant> that right? Grant> To be clear, we only want to call DMA mapping services for Grant> pages that are pinned (can't be swapped out). I'm pretty Grant> sure (Roland) you know that already though. Right now what we do is a little ugly. 
But we tell the HCA to create a memory region covering memory from address 0 up to (high_memory-PAGE_OFFSET) where address 0 is translated to address 0 (ie the identity mapping). This works OK on the vast majority of systems but will break if we encounter systems with physical memory starting somewhere other than 0, discontiguous memory, a crazy PCI mapping that uses some other range of addresses, etc... In any case, yes you are right, the pci_map call should be on DMA-able memory (kmalloced memory or the like). Once we have the basic verbs API all squared away, I would like to add an extension to get an L_Key with translation turned off. This is almost like 'default L_Key' from the verbs extensions, but Tavor will require a protection domain as well. This will let the HCA use any address returned from pci_map_xxx(). - Roland From halr at voltaire.com Mon Aug 2 08:58:54 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 02 Aug 2004 11:58:54 -0400 Subject: [openib-general] [ANNOUNCE] GSI Implementation Candidate References: <025a01c47833$096cb180$6401a8c0@comcast.net> <52n01dpp9e.fsf@topspin.com> <006a01c478a5$a8337ca0$6401a8c0@comcast.net> <52fz75pnbd.fsf@topspin.com> Message-ID: <003101c478a9$9e7e4ce0$6401a8c0@comcast.net> Roland Dreier wrote: > Hal> I agree in general with this but: There is no requirement to > Hal> send with GRH (only to receive with GRH) (until multisubnet > Hal> is supported which is currently an incomplete aspect of IBA). > Hal> I have not added this one into the TODO (at least yet)... > > Don't the current set of IB compliance tests try sending a request > with GRH (and expect a response with GRH)? If you can receive > requests with GRHs, then the response generation rules require the > response to be sent with a GRH as well. 
Can't speak to the current set of IB compliance tests without doing some homework on them, but you are right that there is an additional compliance statement (C13-52.1.1) for MAD responses which requires a response with GRH to be sent if a request with GRH was received. I will update the TODO list with this. -- Hal From Tom.Duffy at Sun.COM Mon Aug 2 09:50:34 2004 From: Tom.Duffy at Sun.COM (Tom Duffy) Date: Mon, 02 Aug 2004 09:50:34 -0700 Subject: [openib-general] Updated IPoIB IETF WG presentation In-Reply-To: <006d01c4756b$948aeea0$6401a8c0@comcast.net> References: <006d01c4756b$948aeea0$6401a8c0@comcast.net> Message-ID: <1091465433.5331.24.camel@localhost> On Thu, 2004-07-29 at 05:57, Hal Rosenstock wrote: > Here's an updated presentation based on the comments from yesterday: > - Separate slide and more detail on openib > - Eliminate checksum slide > > It is also available as > https://openib.org/svn/trunk/contrib/voltaire/ietf_ipoib/ipoib_exp.pdf. On slide 6, it says MAX_ADDR_LEN is set to 8. Just checked my 2.6.8-rc2 tree and it is set to 32. -tduffy From halr at voltaire.com Mon Aug 2 10:09:31 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 02 Aug 2004 13:09:31 -0400 Subject: [openib-general] Updated IPoIB IETF WG presentation In-Reply-To: <1091465433.5331.24.camel@localhost> References: <006d01c4756b$948aeea0$6401a8c0@comcast.net> <1091465433.5331.24.camel@localhost> Message-ID: <1091466573.18550.23.camel@localhost.localdomain> On Mon, 2004-08-02 at 12:50, Tom Duffy wrote: > On Thu, 2004-07-29 at 05:57, Hal Rosenstock wrote: > > Here's an updated presentation based on the comments from yesterday: > > - Separate slide and more detail on openib > > - Eliminate checksum slide > > > > It is also available as > > https://openib.org/svn/trunk/contrib/voltaire/ietf_ipoib/ipoib_exp.pdf. > > On slide 6, it says MAX_ADDR_LEN is set to 8. Just checked my 2.6.8-rc2 > tree and it is set to 32. Yes, this is fixed in 2.6. It was broken in 2.4. I'll fix that slide. Thanks. 
-- Hal From roland at topspin.com Mon Aug 2 10:10:24 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 02 Aug 2004 10:10:24 -0700 Subject: [openib-general] Updated IPoIB IETF WG presentation In-Reply-To: <1091465433.5331.24.camel@localhost> (Tom Duffy's message of "Mon, 02 Aug 2004 09:50:34 -0700") References: <006d01c4756b$948aeea0$6401a8c0@comcast.net> <1091465433.5331.24.camel@localhost> Message-ID: <52y8kxo48v.fsf@topspin.com> Tom> On slide 6, it says MAX_ADDR_LEN is set to 8. Just checked Tom> my 2.6.8-rc2 tree and it is set to 32. Yes, my patch to change MAX_ADDR_LEN from 8 to 32 was merged around 2.5.54 or so (around January 2003). Of course 2.4 continues to have a MAX_ADDR_LEN of 8. - Roland From iod00d at hp.com Mon Aug 2 10:16:22 2004 From: iod00d at hp.com (Grant Grundler) Date: Mon, 2 Aug 2004 10:16:22 -0700 Subject: [openib-general] [ANNOUNCE] GSI Implementation Candidate In-Reply-To: <52brhtpn0r.fsf@topspin.com> References: <025a01c47833$096cb180$6401a8c0@comcast.net> <52n01dpp9e.fsf@topspin.com> <20040802153019.GA20829@cup.hp.com> <52brhtpn0r.fsf@topspin.com> Message-ID: <20040802171622.GD20829@cup.hp.com> On Mon, Aug 02, 2004 at 08:39:32AM -0700, Roland Dreier wrote: > Right now what we do is a little ugly. But we tell the HCA to create > a memory region covering memory from address 0 up to (high_memory-PAGE_OFFSET) > where address 0 is translated to address 0 (ie the identity mapping). Is this representation of the memory region using kernel virtual or physical addresses? > This works OK on the vast majority of systems but will break if we > encounter systems with physical memory starting somewhere other than > 0, discontiguous memory, a crazy PCI mapping that uses some other > range of addresses, etc... Most NUMA machines are very likely to break this assumption. Even IA64 on ZX1 will break the assumptions about contiguous memory. 
The physical memory map for ia64 is something like: 0-1GB RAM 1-2 IOVA space (used by 32-bit PCI devices) 2-4 MMIO space, CPUs, Firmware, etc ... 257-260 RAM (mem controller remapped this from 1-4GB space) There's also space for more RAM and 64-bit MMIO (in HP-speak it's called GMMIO). > In any case, yes you are right, the pci_map call should be on DMA-able > memory (kmalloced memory or the like). ok > Once we have the basic verbs API all squared away, I would like to add > an extension to get an L_Key with translation turned off. This is > almost like 'default L_Key' from the verbs extensions, but Tavor will > require a protection domain as well. This will let the HCA use any > address returned from pci_map_xxx(). I don't understand how protection domains play with DMA. I gather so far the HCA hosts a virtual -> DMA mapping table which the HCA driver keeps up to date. Is the protection domain used by the HCA to limit which entries in the table a process on a remote host may dereference? grant From halr at voltaire.com Mon Aug 2 10:35:49 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 02 Aug 2004 13:35:49 -0400 Subject: [openib-general] IPoIB IETF WG presentation updated again Message-ID: <003d01c478b7$2798f360$6401a8c0@comcast.net> Hi, Here's another update to the IPoIB presentation (MAX_ADDR_LEN slide as pointed out by Tom). It is also available as https://openib.org/svn/trunk/contrib/voltaire/ietf_ipoib/ipoib_exp.pdf. -- Hal -------------- next part -------------- A non-text attachment was scrubbed... 
Name: IPoIB Implementation Experience.pdf Type: application/pdf Size: 226533 bytes Desc: not available URL: From roland at topspin.com Mon Aug 2 10:38:40 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 02 Aug 2004 10:38:40 -0700 Subject: [openib-general] [ANNOUNCE] GSI Implementation Candidate In-Reply-To: <20040802171622.GD20829@cup.hp.com> (Grant Grundler's message of "Mon, 2 Aug 2004 10:16:22 -0700") References: <025a01c47833$096cb180$6401a8c0@comcast.net> <52n01dpp9e.fsf@topspin.com> <20040802153019.GA20829@cup.hp.com> <52brhtpn0r.fsf@topspin.com> <20040802171622.GD20829@cup.hp.com> Message-ID: <52llgxo2xr.fsf@topspin.com> Grant> Most NUMA machines are very likely to break this Grant> assumption. Even IA64 on ZX1 will break the assumptions Grant> about contiguous memory. That map is OK... the problem would be if someone put half their RAM at 0x80000000000000 and half at 0xc0000000000000 or something like that. Small holes are fine; the DMA mapping code just won't return addresses in that area. Grant> I don't understand how protection domains play with DMA. I Grant> gather so far the HCA hosts a virtual -> DMA mapping table Grant> which the HCA driver keeps up to date. Is the protection Grant> domain used by the HCA to limit which entries in the table Grant> a process on a remote host may dereference? Every queue pair and every memory region is created in some protection domain. For a memory region to be used by an operation on a queue pair, their protection domains must match. However, the verbs extensions define a "default L_Key" (for kernel use only) that turns off both virtual->physical translation and protection domain checking. It's almost possible to simulate this on the current Mellanox HCA, except that I don't know of a way to turn off protection domain checking, so we'll have to create one pseudo-default L_Key per protection domain.
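The PD rule and the per-PD workaround described above can be sketched as follows. All names here are hypothetical (this is a toy model, not the verbs API): the first function is the check the HCA effectively performs, and the second is the "one pseudo-default L_Key per protection domain" cache, minted lazily.

```python
class PDMismatch(Exception):
    """Stands in for the HCA completing the work request with a
    protection error when QP and MR protection domains differ."""

def check_pd(qp_pd, mr_pd):
    # A work request posted to a QP may only use a memory region
    # that was created in the same protection domain.
    if qp_pd != mr_pd:
        raise PDMismatch((qp_pd, mr_pd))

# Tavor can't turn off the PD check, so a "pseudo-default" L_Key
# (an identity-mapped MR) has to exist once per PD; cache them.
_default_lkeys = {}

def get_default_lkey(pd, create_identity_mr):
    # create_identity_mr is a hypothetical hook that registers the
    # full identity-mapped region in the given PD and returns an L_Key.
    if pd not in _default_lkeys:
        _default_lkeys[pd] = create_identity_mr(pd)
    return _default_lkeys[pd]
```

The cache makes the cost of the workaround visible: one extra full-memory MR per protection domain in use, rather than a single global one.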
- Roland From mshefty at ichips.intel.com Mon Aug 2 11:03:34 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 2 Aug 2004 11:03:34 -0700 Subject: [openib-general] gen2 dev branch In-Reply-To: <52ekmrr5o5.fsf@topspin.com> References: <35EA21F54A45CB47B879F21A91F4862F18AA8F@taurus.voltaire.com> <52ekmrr5o5.fsf@topspin.com> Message-ID: <20040802110334.6d355987.mshefty@ichips.intel.com> On Sat, 31 Jul 2004 18:46:50 -0700 Roland Dreier wrote: > Yaron> I don't see where is the policy vs mechanism > > I guess I've said enough times that I'm not comfortable with this > design. I think it's important that we come to an agreement regarding the best approach to take with the GSI. And as a developer, I would rather start by focusing on the minimal requirements/features, then determine which additional features are desired and at which layer.

* We need a way to synchronize sending MADs out QP1.
* We need a method to route MADs received on QP1 to the appropriate handler.
* There should be a common RMPP module usable over any QP (for redirection).
* A useful feature would be matching responses with requests, with automatic retries.
* For debugging purposes, you may also want to snoop all sends/receives on QP1 and/or any QP.

Are any requirements missing from this? I've always thought of the "GSI" (layer-1) as meeting only the first two requirements. And I think that it would be optimal if layer-1 handed a received MAD to exactly one client, who would be responsible for the buffer. I would rather see a layer-1 registration be solicited-only or unsolicited based on version/class/method. This is more flexible than the current proposal, which is limited to trap/report. I don't think that layer-1 should have to know the meaning of the fields. It should act based purely on values in the fields. Specifically, to the proposed API, I would make class/version required for unsolicited registration only.
I would replace reg_flags with a list of methods, or route based on version/class only. QP redirection requires allocating a PD, one or two CQs, and a QP. The size of the CQs, rearm policy, size of the QP, etc. need to be determined. I don't see why a user-mode app can't redirect traffic to a user-mode QP. Because of the number of options available, I think that the user needs control over them. However, they shouldn't be required to re-implement RMPP support, if it is needed. For the CM, I don't see that redirection requires anything beyond the existing verbs, plus a simple layer-1 interface as mentioned. The CM can redirect to one QP, one QP per PKey, whatever, and no RMPP support is needed. This moves the value-add upwards into the CM. For clients requiring RMPP support, I think that there needs to be a way to use a common RMPP module above any QP. This can be done by adding a layer-2 set of interfaces above a simple layer-1 interface. (The actual API may extend the existing layer-1/QP APIs, but the internal architecture would have them layered.) From gdror at mellanox.co.il Mon Aug 2 12:44:13 2004 From: gdror at mellanox.co.il (Dror Goldenberg) Date: Mon, 2 Aug 2004 22:44:13 +0300 Subject: [openib-general] IPoIB IETF WG presentation updated again Message-ID: <506C3D7B14CDD411A52C00025558DED60585BDB2@mtlex01.yok.mtl.com> > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Monday, August 02, 2004 8:36 PM ... > Here's another update to the IPoIB presentation (MAX_ADDR_LEN slide as > pointed out by Tom). ... > -- Hal > It's good to see that MAX_ADDR_LEN has been changed to 32. Does that solve all the IPoIB ARP related problems for 2.6 kernel ? Can we store all related link information in this 32 bytes ? What is envisioned to be stored in this 32 bytes - is it just the QPN+GID, or the entire path info, or the address vector object too ? 
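On the question of what fits in the 32 bytes: the encapsulation draft's link-layer address is 20 bytes (a 4-byte field carrying the 24-bit QPN with the top byte reserved, followed by the 16-byte port GID), so QPN+GID alone fits comfortably in 2.6's MAX_ADDR_LEN; whether the remaining bytes hold path info or an address-vector handle is an implementation choice. A small sketch of the packing (illustrative helper, not kernel code):

```python
import struct

MAX_ADDR_LEN = 32  # 2.6 kernel value (was 8 in 2.4)

def pack_ipoib_hw_addr(qpn, gid):
    # 20-byte IPoIB link-layer address per the encapsulation I-D:
    # 4 bytes holding the 24-bit QPN (high byte reserved/flags),
    # followed by the 16-byte port GID.
    assert qpn < (1 << 24) and len(gid) == 16
    return struct.pack(">I", qpn) + gid

addr = pack_ipoib_hw_addr(0x000456, bytes(16))
# 20 bytes: fits 2.6's MAX_ADDR_LEN of 32, impossible in 2.4's 8.
```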
I think that ideally, if a network device can replace the ARP functionality in the kernel that'll be better. Because this way the IPoIB can get an address resolution request from the IP stack, handle it by sending an ARP, then SA query for the path record, then creation of HCA address handle, and then place it in cache and pass back this address handle. When cache is replaced or expires, IPoIB will destroy the HCA address handle. If this is not supported, then IPoIB will still need to maintain a shadow table. Beyond that, it'll be nice if we could have gotten the IP datagram without the "Ethernet" header. Currently the IPoIB driver has to chop it, and replace it with the IPoIB encapsulation header. Anyway, this is just the purity of the protocol stack layering. Dror -------------- next part -------------- An HTML attachment was scrubbed... URL: From roland.list at gmail.com Mon Aug 2 14:46:18 2004 From: roland.list at gmail.com (Roland Dreier) Date: Mon, 2 Aug 2004 14:46:18 -0700 Subject: [openib-general] gen2 dev branch In-Reply-To: <20040802110334.6d355987.mshefty@ichips.intel.com> References: <35EA21F54A45CB47B879F21A91F4862F18AA8F@taurus.voltaire.com> <52ekmrr5o5.fsf@topspin.com> <20040802110334.6d355987.mshefty@ichips.intel.com> Message-ID: I snipped a lot of stuff that I agree with. I just want to comment on one item: > * A useful feature would be matching responses with requests, with automatic retries. On general principles I'm leery of automatic retries (experience has shown that error handling should be moved up to the highest level possible). In particular for IB, we have found that exponential backoff is useful for preventing livelock in large fabrics (if a lot of clients are sending requests to a single server and the clients don't back off, the system may get stuck in a state where the server is spending all its time responding to requests that the client has already timed out). 
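The backoff behavior described above can be sketched in a few lines. This is a minimal illustration with made-up parameters, not a proposed API: the timeout before each successive resend doubles, so a loaded SA sees the aggregate request rate fall off instead of a fixed-rate retry storm.

```python
def backoff_schedule(base_ms, attempts, cap_ms=8000):
    # Timeout for attempt i is base_ms * 2**i, capped so a long
    # outage doesn't push the wait out indefinitely.
    return [min(base_ms << i, cap_ms) for i in range(attempts)]

# e.g. base 100ms, 6 attempts -> 100, 200, 400, 800, 1600, 3200
```

Keeping this policy in the consumer (rather than the access layer) is precisely what lets different clients pick different bases, caps, or no retry at all.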
- Roland From roland at topspin.com Mon Aug 2 14:51:41 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 02 Aug 2004 14:51:41 -0700 Subject: [openib-general] IPoIB IETF WG presentation updated again In-Reply-To: <506C3D7B14CDD411A52C00025558DED60585BDB2@mtlex01.yok.mtl.com> (Dror Goldenberg's message of "Mon, 2 Aug 2004 22:44:13 +0300") References: <506C3D7B14CDD411A52C00025558DED60585BDB2@mtlex01.yok.mtl.com> Message-ID: <528ycxnr82.fsf@topspin.com> Dror> I think that ideally, if a network device can replace the Dror> ARP functionality in the kernel that'll be better. Because Dror> this way the IPoIB can get an address resolution request Dror> from the IP stack, handle it by sending an ARP, then SA Dror> query for the path record, then creation of HCA address Dror> handle, and then place it in cache and pass back this Dror> address handle. When cache is replaced or expires, IPoIB Dror> will destroy the HCA address handle. If this is not Dror> supported, then IPoIB will still need to maintain a shadow Dror> table. I don't think the networking maintainers will have much desire to see pluggable ARP implementations. However it may be possible to use the hard_header_cache() methods to handle the address vector stuff (I haven't figured out if this can be made to work, this is just a vague idea right now). Dror> Beyond that, it'll be nice if we could have gotten the IP Dror> datagram without the "Ethernet" header. Currently the IPoIB Dror> driver has to chop it, and replace it with the IPoIB Dror> encapsulation header. Anyway, this is just the purity of the Dror> protocol  stack layering. Not sure where this is coming from -- the Linux kernel networking core does not put an ethernet header in a packet. The network device's hard_header method can do whatever it wants to set up the packet. 
- Roland From mshefty at ichips.intel.com Mon Aug 2 14:00:43 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 2 Aug 2004 14:00:43 -0700 Subject: [openib-general] gen2 dev branch In-Reply-To: References: <35EA21F54A45CB47B879F21A91F4862F18AA8F@taurus.voltaire.com> <52ekmrr5o5.fsf@topspin.com> <20040802110334.6d355987.mshefty@ichips.intel.com> Message-ID: <20040802140043.2642230e.mshefty@ichips.intel.com> On Mon, 2 Aug 2004 14:46:18 -0700 Roland Dreier wrote: > I snipped a lot of stuff that I agree with. I just want to comment on one item: > > > * A useful feature would be matching responses with requests, with automatic retries. > > On general principles I'm leery of automatic retries (experience has > shown that error handling should be moved up to the highest level > possible). I think of this more as providing a reliable communication transport. It seems that this sort of feature would be duplicated by any client that sends a MAD and wants to get a response. If you push RMPP down into the access layer (which I think is desirable, given its complexity - even if it's a user-mode piece), then retransmissions should probably be pushed down as well. Did you have a different idea of how to isolate retransmissions from RMPP? From roland.list at gmail.com Mon Aug 2 15:11:57 2004 From: roland.list at gmail.com (Roland Dreier) Date: Mon, 2 Aug 2004 15:11:57 -0700 Subject: [openib-general] gen2 dev branch In-Reply-To: <20040802140043.2642230e.mshefty@ichips.intel.com> References: <35EA21F54A45CB47B879F21A91F4862F18AA8F@taurus.voltaire.com> <52ekmrr5o5.fsf@topspin.com> <20040802110334.6d355987.mshefty@ichips.intel.com> <20040802140043.2642230e.mshefty@ichips.intel.com> Message-ID: > I think of this more as providing a reliable communication transport. It seems that this sort of feature would be duplicated by any client that sends a MAD and wants to get a response. 
> If you push RMPP down into the access layer (which I think is desirable, given its complexity - even if it's a user-mode piece), then retransmissions should probably be pushed down as well. > Did you have a different idea of how to isolate retransmissions from RMPP? I think it's fine to do retransmissions after timeouts in the context of an RMPP implementation, since the spec defines the timeout explicitly. If a consumer issues an RMPP send request, I agree completely that the RMPP implementation should handle any timeouts that happen during the send (and similarly for receiving multi-packet messages). The case I'm talking about is when, for example, a consumer sends a path record request. If no response is received after the consumer's specified timeout, then we should just return that timeout to the consumer and let the consumer decide whether and when to resend the request. - Roland From mshefty at ichips.intel.com Mon Aug 2 14:37:19 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 2 Aug 2004 14:37:19 -0700 Subject: [openib-general] gen2 dev branch In-Reply-To: References: <35EA21F54A45CB47B879F21A91F4862F18AA8F@taurus.voltaire.com> <52ekmrr5o5.fsf@topspin.com> <20040802110334.6d355987.mshefty@ichips.intel.com> <20040802140043.2642230e.mshefty@ichips.intel.com> Message-ID: <20040802143719.4081f169.mshefty@ichips.intel.com> On Mon, 2 Aug 2004 15:11:57 -0700 Roland Dreier wrote: > The case I'm talking about is when, for example, a consumer sends a > path record request. If no response is received after the consumer's > specified timeout, then we should just return that timeout to the > consumer and let the consumer decide whether and when to resend the > request. Ah - I didn't get that you wanted to include a timeout to wait for a response. I think if you did that, then I agree, there's no reason to automatically resend on the user's behalf. (They could just do that from the callback, for example.) 
So, assuming that, I'm left wondering what the impact would be on clients if the access layer _didn't_ provide a timeout to match responses with requests. It seems that clients would need to maintain a list of outstanding requests, along with a timer to time them out, information about whether a send had completed, and information on if a response had been received. Is this enough to justify pushing the timeout into the access layer? From mshefty at ichips.intel.com Mon Aug 2 15:24:26 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 2 Aug 2004 15:24:26 -0700 Subject: [openib-general] Update from merging in Roland's changes... In-Reply-To: <20040730153253.038876fe.mshefty@ichips.intel.com> References: <20040730151713.52bc6b55.mshefty@ichips.intel.com> <20040730153253.038876fe.mshefty@ichips.intel.com> Message-ID: <20040802152426.672a6f27.mshefty@ichips.intel.com> On Fri, 30 Jul 2004 15:32:53 -0700 Sean Hefty wrote: > On Fri, 30 Jul 2004 15:17:13 -0700 > Sean Hefty wrote: > > > I've updated ib_verbs based on changes that Roland had in his copy of the file. Specifically, I've updated the ib_xxx structures to include pointers to referenced items, added a usecnt field, and converted the functions from prototypes into static inline routines. The patch for this update is listed below. > > > Here's a separate patch to include the lkey/rkey in ib_mr, ib_mw, and ib_fmr. With these fields included in the structures, the lkey/rkey parameters can be removed from several calls. I have not committed these changes. I would like to reach an agreement whether or not we want these changes. I'm assuming that the lack of response indicates that people are fine with these changes then. If not, please speak up. 
From roland.list at gmail.com Mon Aug 2 16:35:05 2004 From: roland.list at gmail.com (Roland Dreier) Date: Mon, 2 Aug 2004 16:35:05 -0700 Subject: [openib-general] gen2 dev branch In-Reply-To: <20040802143719.4081f169.mshefty@ichips.intel.com> References: <35EA21F54A45CB47B879F21A91F4862F18AA8F@taurus.voltaire.com> <52ekmrr5o5.fsf@topspin.com> <20040802110334.6d355987.mshefty@ichips.intel.com> <20040802140043.2642230e.mshefty@ichips.intel.com> <20040802143719.4081f169.mshefty@ichips.intel.com> Message-ID: > So, assuming that, I'm left wondering what the impact would be on clients if the access layer _didn't_ provide a timeout to match responses with requests. It seems that clients would need to maintain a list of outstanding requests, along with a timer to time them out, information about whether a send had completed, and information on if a response had been received. Is this enough to justify pushing the timeout into the access layer? That's a good question. If we put timeouts in the consumer instead of the access layer, then consumers have to delete their requests on timeout (to avoid leaking request context). Also, I've found that the timeout code can have subtle ordering/locking bugs (eg handling the case when a timeout occurs just as a response arrives). I don't really have a strong opinion, but I would lean towards having basic handling of timeouts (just generating a callback when a timeout happens) in the access layer. 
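Whichever layer ends up owning it, the bookkeeping in question is roughly the following. This is a sketch with hypothetical names, not the proposed API: outstanding requests keyed by transaction ID, each with a deadline; responses are matched (late or unmatched ones dropped), and expiry produces the timeout callbacks.

```python
class RequestTable:
    """Toy model of request/response matching with timeouts."""

    def __init__(self):
        self.outstanding = {}  # tid -> deadline

    def sent(self, tid, timeout, now):
        # Record an outstanding request when its send is posted.
        self.outstanding[tid] = now + timeout

    def response(self, tid):
        # True if this response matches a live request; a late or
        # unmatched response is simply dropped.
        return self.outstanding.pop(tid, None) is not None

    def expire(self, now):
        # Collect timed-out requests; each would trigger the
        # timeout callback delivered to the consumer.
        dead = [t for t, dl in self.outstanding.items() if dl <= now]
        for t in dead:
            del self.outstanding[t]
        return dead
```

Note the ordering hazard mentioned above lives in `response()` vs `expire()`: once a TID has been expired (or matched), the other path must see it as gone, which the single dictionary pop/delete provides here.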
- Roland From mshefty at ichips.intel.com Mon Aug 2 15:57:26 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 2 Aug 2004 15:57:26 -0700 Subject: [openib-general] gen2 dev branch In-Reply-To: References: <35EA21F54A45CB47B879F21A91F4862F18AA8F@taurus.voltaire.com> <52ekmrr5o5.fsf@topspin.com> <20040802110334.6d355987.mshefty@ichips.intel.com> <20040802140043.2642230e.mshefty@ichips.intel.com> <20040802143719.4081f169.mshefty@ichips.intel.com> Message-ID: <20040802155726.2a6c6623.mshefty@ichips.intel.com> On Mon, 2 Aug 2004 16:35:05 -0700 Roland Dreier wrote: > I don't really have a strong opinion, but I would lean towards having > basic handling of timeouts (just generating a callback when a timeout > happens) in the access layer. I agree. Looking at the proposed API, I don't see how timeouts are used currently. So, I think we want an API that does *not* do automatic retransmission, but does use a timeout when matching a response with a request. This is probably easy enough to fix by adding a timeout to the MAD (GSI message?) structure. From yaronh at voltaire.com Mon Aug 2 20:56:33 2004 From: yaronh at voltaire.com (Yaron Haviv) Date: Tue, 3 Aug 2004 06:56:33 +0300 Subject: [openib-general] gen2 dev branch Message-ID: <35EA21F54A45CB47B879F21A91F4862F18AB57@taurus.voltaire.com> On Tuesday, August 03, 2004 1:57 AM, Sean Hefty wrote: > On Mon, 2 Aug 2004 16:35:05 -0700 > Roland Dreier wrote: >> I don't really have a strong opinion, but I would lean towards having >> basic handling of timeouts (just generating a callback when a timeout >> happens) in the access layer. > > I agree. Looking at the proposed API, I don't see how timeouts are > used currently. So, I think we want an API that does *not* do > automatic retransmission, but does use a timeout when matching a > response with a request. This is probably easy enough to fix by adding > a timeout to the MAD (GSI message?) structure.
The current API doesn't deal with timeouts and retransmits, but leaves that to the application as suggested in the past; the benefit is that different apps can deal with retransmits differently. E.g. an app may want to send a few MADs and have a single timer for all, instead of one per MAD, or have its own timer/retry policy. From what I remember, it does, however, match a response with a request and issue a callback on a complete transaction (when a response to the MAD arrived, rather than when the MAD was sent). Maybe Todd or Hal/Moni can respond on how it's done exactly. >QP redirection requires allocating ... Because of the number of options available, I think that the user needs control over them. Agreed, a GSI server should own its redirected QP. We do, however, need to look at the redirect messages that run on QP1: when an active side sends a MAD and gets a response that the MAD should be redirected, the MAD should be resent to the new QP. In the proposed implementation the GSI layer resends the MAD to the new QP when such a redirect message arrives without involving the app; otherwise every consumer would have to provide exactly the same functionality, and it's better to have it be transparent to the consumers. Yaron From gdror at mellanox.co.il Tue Aug 3 03:46:17 2004 From: gdror at mellanox.co.il (Dror Goldenberg) Date: Tue, 3 Aug 2004 13:46:17 +0300 Subject: [openib-general] [ANNOUNCE] GSI Implementation Candidate Message-ID: <506C3D7B14CDD411A52C00025558DED60585BE63@mtlex01.yok.mtl.com> > -----Original Message----- > From: Roland Dreier [mailto:roland at topspin.com] > Sent: Monday, August 02, 2004 8:39 PM > > > Grant> I don't understand how protection domains play with DMA. I > Grant> gather so far the HCA hosts a virtual -> DMA mapping table > Grant> which the HCA driver keeps up to date. Is the protection > Grant> domain used by the HCA to limit which entries in the table > Grant> a process on a remote host may dereference?
> > Every queue pair and every memory region is created in some protection > domain. For a memory region to be used by an operation on a queue > pair, their protection domains must match. However, the verbs > extensions define a "default L_Key" (for kernel use only) that turns > off both virtual->physical translation and protection domain > checking. It's almost possible to simulate this on the current > Mellanox HCA, except that I don't know of a way to turn off protection > domain checking, so we'll have to create one pseudo-default L_Key per > protection domain. > > - Roland > It is not possible to turn off PD check on a MR in Tavor. You will have to create an MR for each PD that you need. One way to go is to map all kernel apps to the same PD, but you probably don't want to do such a thing. Once we get to verbs extensions, you can configure Arbel to disable the PD check for certain QPs. -Dror -------------- next part -------------- An HTML attachment was scrubbed... URL: From David.Brean at Sun.COM Tue Aug 3 01:43:58 2004 From: David.Brean at Sun.COM (David M. Brean) Date: Tue, 03 Aug 2004 04:43:58 -0400 Subject: [openib-general] Updated IPoIB IETF WG presentation In-Reply-To: <001701c47898$ec8f6880$6401a8c0@comcast.net> References: <006d01c4756b$948aeea0$6401a8c0@comcast.net> <410D73FF.5060908@sun.com> <001701c47898$ec8f6880$6401a8c0@comcast.net> Message-ID: <410F504E.5000908@sun.com> Hello, Hal Rosenstock wrote: > Hi David, > > David M. Brean wrote: > >>Hello, >> >>Some comments/questions about these slides: >> >>* slide 1 - nit: perhaps the title should be "Some Experience with >>Linux IPoIB Implementations" since the information is coming from >>Linux >>developers. > > > Good point. > > >>* slide 4 - nit: move the first bullet after bullet containing "single >>implementation" > > > I reordered the bullets as suggested. > > >>* slide 6 - nit: first bullet should be highlighted as the "problem" >>and >>the second bullet as the "solution". > > > Done. 
> > >>* slide 7 and 8 - In section 5.0 of the I-D, there is text stating >>that >>the "broadcast group may be created by the first IPoIB node to be >>initialized or it can be created administratively before the IPoIB >>subnet is setup". The mechanism used to administratively create the >>group is intentionally beyond the scope of the I-D. For example, an >>implementation could enable the fabric (or "network" as you say) >>administrator to control membership in a partition and therefore make >>sure that the first node added to that partition creates the broadcast >>group correctly. In any case, mentioning the administrative option is >>kinda a "helpful" hint. All the IPoIB nodes are free to create the >>broadcast group, just like they can create any multicast group, as >>long >>as the IPoIB node has enough information to specify the necessary >>parameters as required by the SA interface. The I-D suggests how to >>find the necessary parameters for the multicast groups and leaves open >>how IPoIB nodes obtain that information if they need to create that >>group. >> >> Are these slides suggesting that the I-D be changed to specify the >>IPoIB parameters via defaults for the case where the IPoIB node must >>create the broadcast group? > > >>From the discussion on the group, it was stated that some may > have interpreted the spec as requiring the pre-administered groups > and not supporting the end node creation of a group (even > the broadcast group if not already present) > (at least that's the way at least two were implemented). > This may not be an issue any more but I have not seen this stated > explicitly on this email list. > The slides quote text from the I-D that says that IPoIB node should create group if it doesn't exist and use parameters from the broadcast group. What additional clarification is needed? By the way, the I-D is written to be consistent with the language in the IB specification and that is why JOIN and CREATE are separately described. 
However, JOIN and CREATE can be done in one SA operation and that operation has been described on this reflector. > Yes, it might be good (to eliminate the need for explicit configuration) > to select a specific controlled QKey as a default for the end node case. > > >> [Note, Q_Key is provided by broadcast group, so it isn't necessary >>to distribute to all IPoIB nodes.] > > > Are you referring to "It is RECOMMENDED that a controlled Q_Key be used with > the > high order bit set." for the broadcast group (and all other groups > using the broadcast group parameters) ? > > Aren't there many controlled QKeys so this still needs configuration > somewhere (either at the SM/SA or at at least one end node (if all the > others > join rather than create the broadcast group (otherwise all end nodes if they > all attempt to create this group when not present)) ? > Section 5.0 of the latest I-D says "The join operation (using the broadcast group) returns the MTU, the Q_Key and other parameters associated with the broadcast group. The node then associates the parameters received as a result of the join operation with its IPoIB interface." and in section 9.1.2 it says "The Q_Key received on joining the broadcast group MUST be used for all IPoIB communication over the particular IPoIB link." So, for a particular IPoIB link there is one Q_Key and there should be no need for explicit configuration on each IPoIB node except in the case of the broadcast group creation. Selection of the Q_Key value is left to the administrator, but the I-D recommends using one in the controlled range. So, I don't think there is a separate Q_Key distribution problem in addition to the broadcast group problem mentioned in the slides. > >>* slide 9 and 10 - "Running" may be the description of a state that is >>OS-specific and is beyond the scope of the I-D (does Windows network interface >>support a "running" state?). However, the I-D does say that an IPoIB link is "formed" only when the broadcast group exists.
The I-D >>doesn't >>say anything about operation in a "degraded" mode, for example, when an >>IPoIB node can't join a multicast group. Behavior in degraded mode >>seems like an implementation issue. It's not clear what you would >>want >>to change in the I-D, perhaps you can suggest what you want changed in >>the presentation. > > I added in a bullet on interface state being OS specific. > > What I was wondering about (due to the implementations not currently > dealing with the failure modes) was: > > Is the statement "an IPoIB link is "formed" only when the broadcast group > exists" sufficient for an IPoIB node failing to join the broadcast group ? > > Perhaps it should state "From the IPoIB node perspective, the node is not > part of the IPoIB link until (at least) the broadcast group is successfully > joined" as well. > Section 5 describes the IPoIB link setup. It says that "Every IPoIB interface MUST "FullMember" join the IB multicast group defined by the broadcast-GID." and later says "Thus the IPoIB link is formed by the IPoIB nodes joining the broadcast group." I interpreted the problems discussed in the email as being related to unclear behavior of communication when IPoIB is operating in a degraded mode. The I-D doesn't attempt to describe that (in my opinion), but I'm not sure that it needs to. Perhaps that's vendor value add. > >>* slide 12 - I recall that during the email discussion: >>1) a boot-time scenario where the IPoIB nodes had to access the SA to >>obtain pathrecord information to fill the pathrecord cache and send >>unicast ARP messages > > I didn't mention this one in the presentation although it is mentioned in a > bullet which states > "Only if node has talked with other node (and cached information); otherwise > SA interaction is currently needed" > Well, I mention this scenario because the implication is that the current mechanism does not scale.
I don't recall any comments on the reflector about performance problems under normal operating conditions. > >>2) an SM failover/restart scenario >> >> For #1, the speed at which the IPoIB nodes can begin normal >>operation depends on the fabric and SA implementation. I guess the >>question is >>whether this is an architecture or implementation problem. Is it >>impossible to implement a working system based on the current >>architecture? I think the proposed alternative would require changes >>to >>the encapsulation scheme plus specifying some defaults such as the SL >>so >>that SA queries are eliminated. Some of that might require input from >>the IBTA. >> >> For #2, how long is too long for a subnet to operate without >>successful SA queries? 10 seconds? 20 seconds? > > > Don't know. Perhaps there are some on this list with opinions on this. > > >>Or is this change >>suggesting that the subnet should continue operating, perhaps >>establishing new IP connections (note, this proposal doesn't attempt >>to >>fix the situation at the IB transport level) even in the case where no >>SA exists. Please clarify in the slides. > > > The intent is to continue operation for all IPoIB nodes currently > on the subnet (in the absence of any changes) in the window > when no SM/SA exists. > Yes, but the duration of the window depends on the SM implementation and fabric configuration. If you are going to suggest that the protocol be redesigned, then you need to explain how the architecture is unimplementable. [The alternative represents a significant change at this point and the proposals that I've seen are insufficient.] -David > >>* slide 13 - An IB CA should perform as well as a "dumb" ethernet NIC >>with respect to bandwidth and CPU utilization. If not, someone should >>look at the overheads in the IB access layer and the CA >>implementation, right?
The statement "not equivalent to ethernet" is >>highlighting the >>lack of offload mechanisms in the CA such as checksum, correct? If so, >>perhaps that point should be made explicit. > > > Another lack of clarity. I did mean "dumb" ethernet and not anything more > sophisticated with checksum offload, etc. That's a separate issue. > I made this into 2 slides in the next version of this presentation. > > >>Note, I'm not attempting to respond to the issues raised on the slides >>since that will happen at the meeting, but merely seeking >>clarification >>of the issues being raised. > > > Understood. Thanks for your comments. I think the (hopefully) added > clarity will help. > > -- Hal > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From halr at voltaire.com Tue Aug 3 07:00:09 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 03 Aug 2004 10:00:09 -0400 Subject: [openib-general] Updated IPoIB IETF WG presentation References: <006d01c4756b$948aeea0$6401a8c0@comcast.net> <410D73FF.5060908@sun.com> <001701c47898$ec8f6880$6401a8c0@comcast.net> <410F504E.5000908@sun.com> Message-ID: <003001c47962$315c3960$6401a8c0@comcast.net> David M. Brean wrote: >> Hal Rosenstock wrote: >> From the discussion on the group, it was stated that some may >> have interpreted the spec as requiring the pre-administered groups >> and not supporting the end node creation of a group (even >> the broadcast group if not already present) >> (at least that's the way at least two were implemented). >> This may not be an issue any more but I have not seen this stated >> explicitly on this email list. >> > > The slides quote text from the I-D that says that IPoIB node should > create group if it doesn't exist and use parameters from the broadcast > group. What additional clarification is needed?
IMO none. As I mentioned, some others in this group were unsure. > By the way, the I-D is written to be consistent with the language in > the IB specification and that is why JOIN and CREATE are separately > described. However, JOIN and CREATE can be done in one SA operation > and that operation has been described on this reflector. Understood. >> Yes, it might be good (to eliminate the need for explicit >> configuration) >> to select a specific controlled QKey as a default for the end node >> case. >> >> >>> [Note, Q_Key is provided by broadcast group, so it isn't necessary >>> to distribute to all IPoIB nodes.] >> >> >> Are you referring to "It is RECOMMENDED that a controlled Q_Key be >> used with the >> high order bit set." for the broadcast group (and all other groups >> using the broadcast group parameters) ? >> >> Aren't there many controlled QKeys so this still needs configuration >> somewhere (either at the SM/SA or at at least one end node (if all >> the >> others >> join rather than create the broadcast group (otherwise all end nodes >> if they all attempt to create this group when not present)) ? >> > > Section 5.0 of the latest I-D says "The join operation (using the > broadcast group) returns the MTU, the Q_Key and other parameters > associated with the broadcast group. The node then associates the > parameters received as a result of the join operation with its IPoIB > interface." and in section 9.1.2 it says "The Q_Key received on > joining > the broadcast group MUST be used for all IPoIB communication over the > particular IPoIB link." > > So, for a particular IPoIB link there is one Q_Key and there should be > no need for explicit configuration on each IPoIB node except in the > case > of the broadcast group creation. Selection of the Q_Key value is > left to the administrator, but the I-D recommends using one in the > controlled range. 
So, I don't think there is a separate Q_Key > distribution problem in addition to the broadcast group problem > mentioned in the slides. Agreed. Was this mentioned somewhere else in the slides ? >>> * slide 9 and 10 - "Running" may be the description of a state that >>> is OS specific and is beyond the scope of the I-D (does Windows network >>> interface support a "running" state?). However, the I-D does say >>> that an IPoIB link is "formed" only when the broadcast group >>> exists. The I-D doesn't >>> say anything about operation in a "degraded" mode, for example, >>> when an IPoIB node can't join a multicast group. Behavior in >>> degraded mode seems like an implementation issue. It's not clear >>> what you would want >>> to change in the I-D, perhaps you can suggest what you want changed >>> in the presentation. >> >> >> I added in a bullet on interface state being OS specific. >> >> What I was wondering about (due to the implementations not currently >> dealing with the failure modes) was: >> >> Is the statement "an IPoIB link is "formed" only when the broadcast >> group exists" sufficient for an IPoIB node failing to join the >> broadcast group ? >> >> Perhaps it should state "From the IPoIB node perspective, the node >> is not part of the IPoIB link until (at least) the broadcast group >> is successfully joined" as well. >> > > Section 5 describes the IPoIB link setup. It says that "Every IPoIB > interface MUST "FullMember" join the IB multicast group defined by the > broadcast-GID." and later says "Thus the IPoIB link is formed by > the IPoIB nodes joining the broadcast group." > > I interpreted the problems discussed in the email as being related to > unclear behavior of communication when IPoIB is operating in a > degraded > mode. The I-D doesn't attempt to describe that (in my opinion), but I'm > not sure that it needs to. Perhaps that's vendor value add. There were 2 aspects to the degraded operation.
One was related to critical groups (like the broadcast group, which is covered by the statement you cite) and non critical ones. There was an issue when the broadcast group could not be joined. There was also the issue of whether any other groups are "critical" or is the broadcast group the only one. >>> * slide 12 - I recall that during the email discussion: >>> 1) a boot-time scenario where the IPoIB nodes had to access the SA >>> to obtain pathrecord information to fill the pathrecord cache and >>> send unicast ARP messages >> >> >> I didn't mention this one in the presentation although it is >> mentioned in bullet which states >> "Only if node has talked with other node (and cached information); >> otherwise SA interaction is currently needed" >> > > Well, I mention this scenario because the implication is that the > current mechanism does not scale. I don't recall any comments on the > reflector about performance problems under normal operating > conditions. I believe boot up has been mentioned by some people on the list. I perceive this as an SA performance issue in not being able to keep up with the transaction rate in a large cluster. Do you think I should add this as a performance concern (not an IPoIB one, but related to IPoIB) ? >>> 2) a SM failover/restart scenario >>> >>> For #1, the speed at which the IPoIB nodes can begin normal >>> operation depends on the fabric and SA implementation. I guess the >>> question is >>> whether this is an architecture or implementation problem. Is it >>> impossible to implement a working system based on the current >>> architecture? I think the proposed alternative would require >>> changes to >>> the encapsulation scheme plus specifying some defaults such as the >>> SL so >>> that SA queries are eliminated. Some of that might require input >>> from the IBTA. >>> >>> For #2, how long is too long for a subnet to operate without >>> successful SA queries? 10 seconds? 20 seconds? >> >> >> Don't know. 
Perhaps there are some on this list with opinions on >> this. >> >> >>> Or is this change >>> suggesting that the subnet should continue operating, perhaps >>> establishing new IP connections (note, this proposal doesn't attempt >>> to >>> fix the situation at the IB transport level) even in the case where >>> no SA exists. Please clarify in the slides. >> >> >> The intent is to continue operation for all IPoIB nodes currently >> on the subnet (in the absence of any changes) when in the window >> when no SM/SA exists. >> > > Yes, but the duration of the window depends on the SM implementation > and fabric configuration. If you are going to suggest that the > protocol be redesigned, then you need to explain how the architecture > is > unimplementable. [The alternative represents a significant change at > this point and the proposals that I've seen are insufficient.] Agreed. Much more work (and detail) needs to be done here. I still think it is worth mentioning to plant the seed. -- Hal From halr at voltaire.com Tue Aug 3 07:12:36 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 03 Aug 2004 10:12:36 -0400 Subject: [openib-general] IPoIB IETF WG presentation updated again References: <506C3D7B14CDD411A52C00025558DED60585BDB2@mtlex01.yok.mtl.com> Message-ID: <003f01c47963$ee56b6c0$6401a8c0@comcast.net> Hi Dror, > It's good to see that MAX_ADDR_LEN has been changed to 32. Does that solve > all the IPoIB ARP related problems for 2.6 kernel ? Can we store all related link > information in this 32 bytes ? What is envisioned to be stored in this 32 bytes - is > it just the QPN+GID, or the entire path info, or the address vector object too ? If it holds 32 bytes, then it can hold GID + QPN with 13 bytes still available.
Other information you might want to hold:

  SL - 1 byte
  LID - 2 bytes
  MTU (for connected mode) - 1 byte
  Rate - 1 byte
  Network Layer Flow Label - 3 bytes (20 bits)
  Hop Limit - 1 byte
  TClass - 1 byte

So all the info for an AV could be stored there. Did I miss something needed ? I didn't double check this but there is still some room left over. > I think that ideally, if a network device can replace the ARP functionality in the kernel > that'll be better. Because this way the IPoIB can get an address resolution request > from the IP stack, handle it by sending an ARP, then SA query for the path record, then > creation of HCA address handle, and then place it in cache and pass back this address > handle. When cache is replaced or expires, IPoIB will destroy the HCA address handle. > If this is not supported, then IPoIB will still need to maintain a shadow table. Cloning an AH is probably faster than creating a new one from scratch. (We would need an additional verb for this.) How much does this cost ? Is this optimization worth it ? > Beyond that, it'll be nice if we could have gotten the IP datagram without the "Ethernet" > header. Currently the IPoIB driver has to chop it, and replace it with the IPoIB encapsulation > header. Anyway, this is just the purity of the protocol stack layering. There would need to be another way to identify the various protocols (aka ethertypes) being carried. -- Hal
From mshefty at ichips.intel.com Tue Aug 3 08:58:27 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 3 Aug 2004 08:58:27 -0700 Subject: [openib-general] gen2 dev branch In-Reply-To: <35EA21F54A45CB47B879F21A91F4862F18AB57@taurus.voltaire.com> References: <35EA21F54A45CB47B879F21A91F4862F18AB57@taurus.voltaire.com> Message-ID: <20040803085827.59b9162f.mshefty@ichips.intel.com> On Tue, 3 Aug 2004 06:56:33 +0300 "Yaron Haviv" wrote: > The current API doesn't deal with timeouts and retransmits, but leaves > that to the application as suggested in the past; the benefit is that > different apps can deal with retransmit differently. I think that the GSI needs to provide the timeout mechanism. We don't want clients to have to re-implement this, and it makes it easier to match requests with responses. > >QP redirection requires allocating ... Because of the number of > options available, I think that the user needs control over them. > > Agreed, a GSI server should own its redirected QP. I don't think we're in agreement. The client should control the redirected QPs, *not* the GSI. Having the GSI control redirected QPs puts too much policy into the GSI. I view the GSI as access to QP1 only. > we need however to look at the redirect messages that run on QP1; when > an active side sends a MAD and gets a response that the MAD should be > redirected, the MAD should be resent to the new QP. In the proposed > implementation the GSI layer resends the MAD to the new QP when such a > redirect message arrives without involving the app; otherwise every > consumer would have to provide exactly the same functionality, and it's > better to have it be transparent to the consumers. I see no benefit to making redirection transparent. The GSI is taking control away from the app, without having knowledge of what the client is trying to do. The process of redirecting a remote node involves formatting and sending a MAD, so I don't see this as a win.
If something like this *is* necessary, then a "redirection layer" could be implemented on top of a simple layer-1 interface. To be clear, I believe that the core GSI should operate on field values only, and let the meanings of those fields be defined by upper layers. RMPP and request/response matching need to be generic modules that are available over any QP. Clients should have control over redirection and redirected QPs. I think that we need to examine the design of the APIs and the software architecture from a couple of perspectives to see that it can *best* meet all needs. The CM doesn't require request/response matching or RMPP, and CM redirection is simple. SA queries require request/response matching and RMPP, and their redirection is more complex. Currently, I don't think that we're *best* meeting the needs of the CM. From halr at voltaire.com Tue Aug 3 10:42:31 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 03 Aug 2004 13:42:31 -0400 Subject: [openib-general] gen2 dev branch References: <35EA21F54A45CB47B879F21A91F4862F18AA8F@taurus.voltaire.com> <52ekmrr5o5.fsf@topspin.com> <20040802110334.6d355987.mshefty@ichips.intel.com> <20040802140043.2642230e.mshefty@ichips.intel.com> Message-ID: <008a01c47981$41c09a20$6401a8c0@comcast.net> Roland Dreier wrote: > The case I'm talking about is when, for example, a consumer sends a > path record request. If no response is received after the consumer's > specified timeout, then we should just return that timeout to the > consumer and let the consumer decide whether and when to resend the > request. By passing a timeout with the request to be sent to the GSI, the GSI can decide when to time out the matching of the response with the request and call back the consumer.
-- Hal From Tom.Duffy at Sun.COM Tue Aug 3 11:03:11 2004 From: Tom.Duffy at Sun.COM (Tom Duffy) Date: Tue, 03 Aug 2004 11:03:11 -0700 Subject: [openib-general] BOF at Linuxworld Message-ID: <1091556191.12096.5.camel@localhost> Would it be a good idea to set up a BOF or have all interested OpenIB parties go out for a beer at Linuxworld? I wasn't able to make it to OLS because of a previous commitment. -tduffy From mshefty at ichips.intel.com Tue Aug 3 10:25:08 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 3 Aug 2004 10:25:08 -0700 Subject: [openib-general] Update from merging in Roland's changes... In-Reply-To: <20040730153253.038876fe.mshefty@ichips.intel.com> References: <20040730151713.52bc6b55.mshefty@ichips.intel.com> <20040730153253.038876fe.mshefty@ichips.intel.com> Message-ID: <20040803102508.08c8f643.mshefty@ichips.intel.com> On Fri, 30 Jul 2004 15:32:53 -0700 Sean Hefty wrote: > Here's a separate patch to include the lkey/rkey in ib_mr, ib_mw, and ib_fmr. With these fields included in the structures, the lkey/rkey parameters can be removed from several calls. I have not committed these changes. I would like to reach an agreement on whether or not we want these changes. I have applied this patch.
From halr at voltaire.com Tue Aug 3 15:04:26 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 03 Aug 2004 18:04:26 -0400 Subject: [openib-general] [PATCH] GSI: Replace VD_xxx calls and use KERN_ levels in printk calls Message-ID: <1091570668.1212.6.camel@localhost.localdomain> Replace VD_xxx calls with printk calls Add KERN_xxx levels to printk calls Also, eliminated need for debug_support.h Index: gsi_main.c =================================================================== --- gsi_main.c (revision 561) +++ gsi_main.c (working copy) @@ -65,7 +65,6 @@ #include "mad.h" #define VD_MAIN_MODULE -#include "debug_support.h" #include "gsi.h" #include "gsi_priv.h" #if 0 /* GSI_REDIRECT */ @@ -238,10 +237,8 @@ int attr_mask; struct ib_qp_cap qp_cap; - VD_ENTERFUNC(); - if (!(attr = kmalloc(sizeof (*attr), GFP_KERNEL))) { - VD_TRACE(VD_ERROR, "Could not alloc mem for ib_qp_attr\n"); + printk(KERN_ERR "Could not alloc mem for ib_qp_attr\n"); return -ENOMEM; } @@ -254,7 +251,7 @@ ret = ib_modify_qp(qp, attr, attr_mask, &qp_cap); kfree(attr); - VD_LEAVEFUNC("ret = %d\n", ret); + printk(KERN_DEBUG "ret = %d\n", ret); return ret; } @@ -269,10 +266,8 @@ int attr_mask; struct ib_qp_cap qp_cap; - VD_ENTERFUNC(); - if (!(attr = kmalloc(sizeof (*attr), GFP_KERNEL))) { - VD_TRACE(VD_ERROR, "Could not alloc mem for ib_qp_attr\n"); + printk(KERN_ERR "Could not alloc mem for ib_qp_attr\n"); return -ENOMEM; } @@ -282,7 +277,7 @@ ret = ib_modify_qp(qp, attr, attr_mask, &qp_cap); kfree(attr); - VD_LEAVEFUNC("ret = %d\n", ret); + printk(KERN_DEBUG "ret = %d\n", ret); return ret; } @@ -297,10 +292,8 @@ int attr_mask; struct ib_qp_cap qp_cap; - VD_ENTERFUNC(); - if (!(attr = kmalloc(sizeof (*attr), GFP_KERNEL))) { - VD_TRACE(VD_ERROR, "Could not alloc mem for ib_qp_attr\n"); + printk(KERN_ERR "Could not alloc mem for ib_qp_attr\n"); return -ENOMEM; } @@ -311,7 +304,7 @@ ret = ib_modify_qp(qp, attr, attr_mask, &qp_cap); kfree(attr); - VD_LEAVEFUNC("ret = %d\n", ret); + 
printk(KERN_DEBUG "ret = %d\n", ret); return ret; } @@ -326,10 +319,8 @@ int attr_mask; struct ib_qp_cap qp_cap; - VD_ENTERFUNC(); - if (!(attr = kmalloc(sizeof (*attr), GFP_KERNEL))) { - VD_TRACE(VD_ERROR, "Could not alloc mem for ib_qp_attr\n"); + printk(KERN_ERR "Could not alloc mem for ib_qp_attr\n"); return -ENOMEM; } @@ -339,7 +330,7 @@ ret = ib_modify_qp(qp, attr, attr_mask, &qp_cap); kfree(attr); - VD_LEAVEFUNC("ret = %d\n", ret); + printk(KERN_DEBUG "ret = %d\n", ret); return ret; } @@ -351,8 +342,6 @@ { struct gsi_dtgrm_priv_st *dtgrm; - VD_ENTERFUNC(); - /* * Get the posted datagrams * and return it to pool. @@ -368,7 +357,6 @@ } INIT_LIST_HEAD(&hca->rcv_posted_dtgrm_list); hca->stat.rcv_posted_cnt = 0; - VD_LEAVEFUNC(); } static void @@ -378,7 +366,6 @@ struct mad_t *mad; GSI_SND_LIST_LOCK_VAR; - VD_ENTERFUNC(); GSI_SND_LIST_LOCK(class_info->hca); while (!list_empty(&class_info->snd_posted_dtgrm_list)) { dtgrm_priv = @@ -406,14 +393,13 @@ (struct gsi_dtgrm_t *) dtgrm_priv); else { - VD_TRACE(VD_ERROR, "No callback registered\n"); + printk(KERN_ERR "No callback registered\n"); gsi_dtgrm_pool_put((struct gsi_dtgrm_t *) dtgrm_priv); } GSI_SND_LIST_LOCK(class_info->hca); } INIT_LIST_HEAD(&class_info->snd_posted_dtgrm_list); GSI_SND_LIST_UNLOCK(class_info->hca); - VD_LEAVEFUNC(); } /* @@ -425,7 +411,6 @@ { struct gsi_redirect_info_st *redirect_info = NULL; - VD_ENTERFUNC(); while (!list_empty(&class_info->redirect_class_port_info_list)) { redirect_info = (struct gsi_redirect_info_st *) class_info-> @@ -439,7 +424,6 @@ kfree(redirect_info); } - VD_LEAVEFUNC(); } /* @@ -452,8 +436,6 @@ struct gsi_serv_class_info_st *class_info, *head = (struct gsi_serv_class_info_st *) &gsi_class_list; - VD_ENTERFUNC(); - gsi_get_all_classes(); list_for_each(class_info, head) { @@ -465,7 +447,6 @@ gsi_put_all_classes(); - VD_LEAVEFUNC(); } /* @@ -481,8 +462,6 @@ struct gsi_dtgrm_priv_st *dtgrm_priv; GSI_RCV_LIST_LOCK_VAR; - VD_ENTERFUNC(); - /* * Get all datagrams from the 
pool and * submit receive work requests @@ -495,7 +474,7 @@ IB_MR_LOCAL_WRITE, &dtgrm_priv->sg.lkey, &rkey); if (IS_ERR(dtgrm_priv->v_mem_h)) { - VD_TRACE(VD_ERROR, + printk(KERN_ERR \ "Could not get general memory region\n"); ret = PTR_ERR(dtgrm_priv->v_mem_h); goto error1; @@ -518,14 +497,12 @@ GSI_RCV_LIST_UNLOCK(hca); if (!(ret = ib_post_recv(hca->qp, &wr, &bad_wr))) { - VD_TRACE(VD_ERROR, "Could not post receive request\n"); + printk(KERN_ERR "Could not post receive request\n"); goto error2; } hca->stat.rcv_posted_cnt++; } - VD_LEAVEFUNC(); - return 0; error2: @@ -535,7 +512,7 @@ error1: gsi_dtgrm_pool_put((struct gsi_dtgrm_t *) dtgrm_priv); - VD_LEAVEFUNC("ret = %d\n", ret); + printk(KERN_DEBUG "ret = %d\n", ret); return ret; } @@ -560,39 +537,35 @@ { int ret = 0; - VD_ENTERFUNC(); - if ((ret = gsi_change_qp_state_to_init(hca->handle, hca->qp, hca->port))) { - VD_TRACE(VD_ERROR, "Could not change QP state to INIT\n"); + printk(KERN_ERR "Could not change QP state to INIT\n"); return (ret); } if ((ret = gsi_post_receive_dtgrms(hca))) { - VD_TRACE(VD_ERROR, "Could not post receive requests\n"); + printk(KERN_ERR "Could not post receive requests\n"); goto error; } if ((ret = ib_req_notify_cq(hca->cq, IB_CQ_NEXT_COMP))) { - VD_TRACE(VD_ERROR, + printk(KERN_ERR \ "Failed to request completion notification\n"); goto error; } if ((ret = gsi_change_qp_state_to_rtr(hca->handle, hca->qp))) { - VD_TRACE(VD_ERROR, "Could not change QP state to RTR\n"); + printk(KERN_ERR "Could not change QP state to RTR\n"); goto error; } if ((ret = gsi_change_qp_state_to_rts(hca->handle, hca->qp))) { - VD_TRACE(VD_ERROR, "Could not change QP state to RTS\n"); + printk(KERN_ERR "Could not change QP state to RTS\n"); goto error; } GSI_HCA_SET_UP(hca); - VD_LEAVEFUNC(); - return 0; error: @@ -608,8 +581,6 @@ static int gsi_hca_stop(struct gsi_hca_info_st *hca) { - VD_ENTERFUNC(); - GSI_HCA_SET_DOWN(hca); gsi_change_qp_state_to_reset(hca->handle, hca->qp); @@ -617,8 +588,6 @@ 
gsi_return_posted_rcv_dtgrms(hca); gsi_return_posted_snd_dtgrms(hca); - VD_LEAVEFUNC(); - return 0; } @@ -627,19 +596,16 @@ { int ret; - VD_ENTERFUNC(); - if ((ret = gsi_hca_stop(hca))) - VD_TRACE(VD_ERROR, "Could not stop %s/%d\n", hca->handle->name, + printk(KERN_ERR "Could not stop %s/%d\n", hca->handle->name, hca->port); else { if ((ret = gsi_hca_start(hca))) - VD_TRACE(VD_ERROR, "Could not start %s/%d\n", + printk(KERN_ERR "Could not start %s/%d\n", hca->handle->name, hca->port); } hca->stat.restart_cnt++; - VD_LEAVEFUNC(); return ret; } @@ -654,17 +620,14 @@ { GSI_HCA_LIST_LOCK_VAR; - VD_ENTERFUNC(); - if (hca == NULL) { - VD_TRACE(VD_ERROR, "hca == NULL\n"); - VD_LEAVEFUNC(); + printk(KERN_ERR "hca == NULL\n"); return -ENODEV; } GSI_HCA_LIST_LOCK(); if (--hca->ref_cnt > 0) { - VD_LEAVEFUNC("cnt - %d\n", hca->ref_cnt); + printk(KERN_DEBUG "cnt - %d\n", hca->ref_cnt); GSI_HCA_LIST_UNLOCK(); return 0; } @@ -679,8 +642,6 @@ ib_destroy_cq(hca->cq); kfree(hca); - VD_LEAVEFUNC(); - return 0; } @@ -699,8 +660,6 @@ struct ib_qp_cap qp_cap; GSI_HCA_LIST_LOCK_VAR; - VD_ENTERFUNC(); - /* * First check if HCA already open for GSI */ @@ -719,12 +678,11 @@ if (hca) { hca->ref_cnt++; *hca_info = hca; - VD_TRACE(VD_DEBUG, "Already open\n"); + printk(KERN_DEBUG "Already open\n"); } GSI_HCA_LIST_UNLOCK(); if (hca) { - VD_LEAVEFUNC(); return 0; } @@ -732,8 +690,7 @@ * Create new HCA info */ if ((hca = kmalloc(sizeof (*hca), GFP_KERNEL)) == NULL) { - VD_TRACE(VD_ERROR, "Memory allocation error\n"); - VD_LEAVEFUNC(); + printk(KERN_ERR "Memory allocation error\n"); return -ENOMEM; } @@ -751,14 +708,14 @@ (ib_comp_handler) gsi_thread_compl_cb, (void *) hca, &cq_size); if (IS_ERR(hca->cq)) { - VD_TRACE(VD_ERROR, "Could not create receive CQ.\n"); + printk(KERN_ERR "Could not create receive CQ.\n"); ret = PTR_ERR(hca->cq); goto error3; } hca->pd = ib_alloc_pd(device); if (IS_ERR(hca->pd)) { - VD_TRACE(VD_ERROR, "Could not allocate protection domain.\n"); + printk(KERN_ERR "Could not 
allocate protection domain.\n"); ret = PTR_ERR(hca->pd); goto error4; } @@ -783,16 +740,16 @@ &qp_cap); #endif if (IS_ERR(hca->qp)) { - VD_TRACE(VD_ERROR, "Could not create QP.\n"); + printk(KERN_ERR "Could not create QP.\n"); ret = PTR_ERR(hca->qp); goto error5; } - VD_TRACE(VD_DEBUG, "Created QP - %d\n", qp_attr.create_return.qp_num); + printk(KERN_DEBUG "Created QP - %d\n", hca->qp->qp_num); if ((ret = gsi_dtgrm_pool_create(GSI_QP_RCV_SIZE, &hca->rcv_dtgrm_pool)) < 0) { - VD_TRACE(VD_ERROR, "Could not create datagram pool\n"); + printk(KERN_ERR "Could not create datagram pool\n"); goto error6; } @@ -804,7 +761,7 @@ gsi_thread_init(hca); if ((ret = gsi_hca_start(hca))) { - VD_TRACE(VD_ERROR, "Could not start device\n"); + printk(KERN_ERR "Could not start device\n"); goto error7; } @@ -812,8 +769,6 @@ list_add_tail((struct list_head *) hca, &gsi_hca_list); GSI_HCA_LIST_UNLOCK(); - VD_LEAVEFUNC(); - return 0; error7: @@ -829,8 +784,6 @@ error3: kfree(hca); - VD_LEAVEFUNC(); - return ret; } @@ -852,15 +805,13 @@ #endif GSI_CLASS_LOCK_VAR; - VD_ENTERFUNC(); - if (rmpp) { - VD_TRACE(VD_ERROR, "RMPP not supported yet"); + printk(KERN_ERR "RMPP not supported yet"); return -EPERM; } if ((ret = gsi_hca_open(device, &hca, port))) { - VD_TRACE(VD_ERROR, "Cannot open HCA\n"); + printk(KERN_ERR "Cannot open HCA\n"); goto error1; } @@ -871,7 +822,7 @@ if ((server_info = gsi_get_class_info(class, hca, GSI_SERVER_ID)) != NULL) { gsi_put_class_info(server_info); - VD_TRACE(VD_ERROR, + printk(KERN_ERR \ "Agent already registered for this class -(0x%x)!\n", class); ret = -EBUSY; @@ -888,7 +839,7 @@ if ((ret = gsi_register_redirection(hca_name, hca->port, class, &class_port_info))) { - VD_TRACE(VD_ERROR, + printk(KERN_ERR \ "Could not register redirection for class (0x%x)!\n", class); goto error2; @@ -897,7 +848,7 @@ } if (!(newinfo = kmalloc(sizeof (*newinfo), GFP_KERNEL))) { - VD_TRACE(VD_ERROR, "Memory allocation error\n"); + printk(KERN_ERR "Memory allocation error\n"); ret = 
-ENOMEM; goto error2; } @@ -924,7 +875,7 @@ if (rmpp) { if (gsi_dtgrm_pool_create(GSI_RMPP_RCV_POOL_SIZE, &newinfo->rmpp_rcv_dtgrm_pool) < 0) { - VD_TRACE(VD_ERROR, + printk(KERN_ERR \ "Could not create RMPP receive pool\n"); ret = -ENOMEM; goto error3; @@ -933,7 +884,7 @@ if (gsi_dtgrm_pool_create (GSI_RMPP_SND_POOL_SIZE, &newinfo->rmpp_snd_dtgrm_pool) < 0) { - VD_TRACE(VD_ERROR, "Could not create RMPP send pool\n"); + printk(KERN_ERR "Could not create RMPP send pool\n"); ret = -ENOMEM; goto error4; } @@ -942,14 +893,14 @@ gsi_rmpp_receive_cb, gsi_rmpp_send_compl_cb)) == NULL) { - VD_TRACE(VD_ERROR, + printk(KERN_ERR \ "Could not register RMPP service for class (0x%x)!\n", class); ret = -EPERM; goto error5; } - VD_TRACE(VD_DEBUG, "Registered RMPP service for class (0x%x)\n", + printk(KERN_DEBUG "Registered RMPP service for class (0x%x)\n", class); } @@ -957,10 +908,9 @@ list_add_tail((struct list_head *) newinfo, &gsi_class_list); GSI_CLASS_UNLOCK(); - VD_TRACE(VD_DEBUG, "Registered class-%d, client id-%d\n", + printk(KERN_DEBUG "Registered class-%d, client id-%d\n", newinfo->class, newinfo->client_id); - VD_LEAVEFUNC(); return 0; error5: @@ -974,7 +924,6 @@ error2: gsi_hca_close(hca); error1: - VD_LEAVEFUNC(); return ret; } @@ -988,11 +937,9 @@ (struct gsi_serv_class_info_st *) handle; GSI_CLASS_LOCK_VAR; - VD_ENTERFUNC(); - GSI_CLASS_LOCK(); if (class_info->ref_cnt) { - VD_TRACE(VD_ERROR, + printk(KERN_ERR \ "Class in use! Cannot deregister class. 
Try later.\n"); GSI_CLASS_UNLOCK(); return -EPERM; @@ -1003,7 +950,7 @@ if (class_info->rmpp) { if (rmpp_deregister(class_info->rmpp_h) != RMPP_IB_SUCCESS) { - VD_TRACE(VD_ERROR, + printk(KERN_ERR \ "Could not deregister RMPP service.\n"); GSI_CLASS_LOCK(); list_add_tail((struct list_head *) class_info, @@ -1021,8 +968,6 @@ gsi_hca_close(class_info->hca); kfree(class_info); - VD_LEAVEFUNC(); - return 0; } @@ -1041,8 +986,6 @@ (struct gsi_serv_class_info_st *) &gsi_class_list, *ret_info = NULL; GSI_CLASS_LOCK_VAR; - VD_ENTERFUNC(); - GSI_CLASS_LOCK(); list_for_each(info, head) { if (info->class == class && @@ -1054,7 +997,6 @@ } GSI_CLASS_UNLOCK(); - VD_LEAVEFUNC(); return ret_info; } @@ -1094,14 +1036,11 @@ (struct gsi_serv_class_info_st *) &gsi_class_list; GSI_CLASS_LOCK_VAR; - VD_ENTERFUNC(); - GSI_CLASS_LOCK(); list_for_each(class_info, head) { class_info->ref_cnt++; } GSI_CLASS_UNLOCK(); - VD_LEAVEFUNC(); } /* @@ -1114,14 +1053,11 @@ (struct gsi_serv_class_info_st *) &gsi_class_list; GSI_CLASS_LOCK_VAR; - VD_ENTERFUNC(); - GSI_CLASS_LOCK(); list_for_each(class_info, head) { class_info->ref_cnt--; } GSI_CLASS_UNLOCK(); - VD_LEAVEFUNC(); } /* @@ -1138,8 +1074,6 @@ NULL; GSI_HCA_LIST_LOCK_VAR; - VD_ENTERFUNC(); - gsi_get_all_classes(); GSI_HCA_LIST_LOCK(); @@ -1155,16 +1089,13 @@ if (!ret_hca_info) gsi_put_all_classes(); - VD_LEAVEFUNC(); return ret_hca_info; } static void gsi_put_hca_info(struct gsi_hca_info_st *hca_info) { - VD_ENTERFUNC(); gsi_put_all_classes(); - VD_LEAVEFUNC(); } /* @@ -1197,27 +1128,25 @@ static struct ib_wc wc; int err_status = 0; - VD_ENTERFUNC(); - while (!ib_poll_cq(hca->cq, 1, &wc)) { - VD_TRACE(VD_DEBUG, "Completion - WR ID = 0x%Lx\n", wc.wr_id); + printk(KERN_DEBUG "Completion - WR ID = 0x%Lx\n", wc.wr_id); if (wc.status != IB_WC_SUCCESS) { switch (wc.opcode) { case IB_WC_SEND: - VD_TRACE(VD_ERROR, "Send compl error: %d\n", - wc.status) + printk(KERN_ERR "Send compl error: %d\n", + wc.status); hca->stat.snd_err_cnt++; break; case 
IB_WC_RECV: - VD_TRACE(VD_ERROR, "Rcv compl error: %d\n", + printk(KERN_ERR "Rcv compl error: %d\n", wc.status); hca->stat.rcv_err_cnt++; break; default: - VD_TRACE(VD_ERROR, "Wrong OPcode: %d\n", + printk(KERN_ERR "Wrong OPcode: %d\n", wc.opcode); - VD_TRACE(VD_ERROR, "Compl error: %d\n", + printk(KERN_ERR "Compl error: %d\n", wc.status); break; } @@ -1228,7 +1157,7 @@ dtgrm_priv = (struct gsi_dtgrm_priv_st *) (unsigned long) wc.wr_id; - VD_TRACE(VD_DEBUG, "Completion - dgrm ptr = 0x%p\n", + printk(KERN_DEBUG "Completion - dgrm ptr = 0x%p\n", dtgrm_priv); switch (wc.opcode) { case IB_WC_SEND: @@ -1238,7 +1167,7 @@ gsi_receive_cb(hca, dtgrm_priv, &wc); break; default: - VD_TRACE(VD_ERROR, "Wrong OPcode: %d\n", wc.opcode); + printk(KERN_DEBUG "Wrong Opcode: %d\n", wc.opcode); break; } } @@ -1250,7 +1179,6 @@ ib_req_notify_cq(hca->cq, IB_CQ_NEXT_COMP); } - VD_LEAVEFUNC(); } /* @@ -1264,8 +1192,6 @@ struct mad_t *mad = (struct mad_t *) &dtgrm_priv->mad; GSI_SND_LIST_LOCK_VAR; - VD_ENTERFUNC(); - #if 0 /* GSI_ADDRESS_HNDL_POOL_SUPPORT */ ib_put_ah(dtgrm_priv->addr_hndl); #else @@ -1288,7 +1214,7 @@ * Exception: * If GSI_DONT_KEEP_SEND_MADS defined, * MADs with Send() method are not kept. - * Communication Manager version 1 uses only Send() methods and + * Communication Manager (CM) uses only Send() methods and * never sets the response bit. * Every datagram will be in use for long time in this case. * Datagrams may finish during intensive communication. @@ -1305,7 +1231,7 @@ && mad->hdr.m.ms.method != MAD_MTHD_SEND #endif ) { - VD_TRACE(VD_DEBUG, "Request packet. Free resources later\n"); + printk(KERN_DEBUG "Request packet. 
Free resources later\n"); mad_swap_header(mad); GSI_SND_LIST_UNLOCK(hca); return; @@ -1328,10 +1254,9 @@ GSI_SND_LIST_UNLOCK(hca); class_info = (struct gsi_serv_class_info_st *) dtgrm_priv->context; - if (class_info == NULL) { /* This may happen for PLM (process local mad) */ - VD_TRACE(VD_DEBUG, + printk(KERN_DEBUG \ "No management class -%x associated with a sent MAD\n", mad->hdr.class); gsi_dtgrm_pool_put((struct gsi_dtgrm_t *) dtgrm_priv); @@ -1341,12 +1266,9 @@ /* Increase class use counter */ gsi_use_class_info(class_info); - /* - * If the sent MAD is for RMPP - - * handle it properly. - */ + /* If the sent MAD is for RMPP, handle it properly. */ if (gsi_is_rmpp_mad(class_info, (struct gsi_dtgrm_t *) dtgrm_priv)) { - VD_TRACE(VD_DEBUG, "RMPP segment send done\n"); + printk(KERN_DEBUG "RMPP segment send done\n"); rmpp_mad_send_done(class_info->rmpp_h, dtgrm_priv->rmpp_context); gsi_dtgrm_pool_put((struct gsi_dtgrm_t *) dtgrm_priv); @@ -1355,13 +1277,12 @@ class_info->context, (struct gsi_dtgrm_t *) dtgrm_priv); } else { - VD_TRACE(VD_ERROR, "No callback registered\n"); + printk(KERN_ERR "No callback registered\n"); gsi_dtgrm_pool_put((struct gsi_dtgrm_t *) dtgrm_priv); } gsi_put_class_info(class_info); - VD_LEAVEFUNC(); } /* @@ -1371,10 +1292,8 @@ static inline void gsi_sent_dtgrm_timer_run() { - VD_ENTERFUNC(); if (!atomic_dec_and_test(&gsi_sent_dtgrm_timer_running)) { - VD_TRACE(VD_DEBUG, "Timer already running\n"); - VD_LEAVEFUNC(); + printk(KERN_DEBUG "Timer already running\n"); return; } @@ -1384,7 +1303,6 @@ gsi_sent_dtgrm_timer.expires = jiffies + GSI_DGRM_LIFE_TIME; add_timer(&gsi_sent_dtgrm_timer); - VD_LEAVEFUNC(); } /* @@ -1393,16 +1311,13 @@ static inline void gsi_sent_dtgrm_timer_stop() { - VD_ENTERFUNC(); if (atomic_dec_and_test(&gsi_sent_dtgrm_timer_running)) { - VD_TRACE(VD_DEBUG, "Timer not running\n"); + printk(KERN_DEBUG "Timer not running\n"); atomic_inc(&gsi_sent_dtgrm_timer_running); - VD_LEAVEFUNC(); return; } 
del_timer_sync(&gsi_sent_dtgrm_timer); atomic_set(&gsi_sent_dtgrm_timer_running, 1); - VD_LEAVEFUNC(); } /* @@ -1419,7 +1334,6 @@ int run_timer_again = FALSE; GSI_SND_LIST_LOCK_VAR; - VD_ENTERFUNC(); gsi_get_all_classes(); INIT_LIST_HEAD(&sent_dtgrm_list); @@ -1437,13 +1351,13 @@ dtgrm != (struct gsi_dtgrm_priv_st *) &class_info-> snd_posted_dtgrm_list; dtgrm = dtgrm->next) { - VD_TRACE(VD_DEBUG, "Posted counter: %d \n", + printk(KERN_DEBUG "Posted counter: %d \n", dtgrm->posted); if (!dtgrm->posted && dtgrm->time + GSI_DGRM_LIFE_TIME < jiffies) { struct gsi_dtgrm_priv_st *dtgrm_tmp; - VD_TRACE(VD_DEBUG, "Remove the datagram\n"); + printk(KERN_DEBUG "Remove the datagram\n"); dtgrm_tmp = dtgrm->prev; v_list_del((struct list_head *) dtgrm); @@ -1455,10 +1369,7 @@ } GSI_SND_LIST_UNLOCK(class_info->hca); - /* - * If there are datagrams in lists, - * run the timer again. - */ + /* If there are datagrams in the lists, run the timer again. */ if (run_timer_again) gsi_sent_dtgrm_timer_run(); @@ -1475,13 +1386,13 @@ mad_swap_header((struct mad_t *) &dtgrm->mad); if (class_info->send_compl_cb) { - VD_TRACE(VD_DEBUG, "Call the callback\n"); + printk(KERN_DEBUG "Calling the callback\n"); class_info->send_compl_cb(class_info, class_info->context, (struct gsi_dtgrm_t *) dtgrm); } else { - VD_TRACE(VD_ERROR, "No callback registered\n"); + printk(KERN_ERR "No callback registered\n"); gsi_dtgrm_pool_put((struct gsi_dtgrm_t *) dtgrm); } @@ -1489,14 +1400,13 @@ } gsi_put_all_classes(); - VD_LEAVEFUNC(); } /* * Handle received response datagram. * * Look for corresponding request datagram waiting in the send datagram list. - * If the received response datagram is a redirection request - resent the wating datagram. + * If the received response datagram is a redirection request - resend the waiting datagram. * For regular response - release the request datagram. 
*/ static void @@ -1512,12 +1422,10 @@ GSI_SND_LIST_LOCK_VAR; GSI_REDIR_LIST_LOCK_VAR; - VD_ENTERFUNC(); - *redirect = (int) mad->hdr.u.ns.status.s.redir_rqrd; /* - * Search if there are any request datagrams wating for this response. + * Search if there are any request datagrams waiting for this response. */ GSI_SND_LIST_LOCK(class_info->hca); list_for_each((struct list_head *) req_dtgrm, @@ -1537,7 +1445,7 @@ */ if ((struct list_head *) req_dtgrm == &class_info->snd_posted_dtgrm_list) { - VD_TRACE(VD_DEBUG, "No corresponding request datagram.\n"); + printk(KERN_DEBUG "No corresponding request datagram.\n"); goto out; } @@ -1553,14 +1461,14 @@ if (attrib_id != MAD_CLASS_ATTRIB_ID_CLASS_PORT_INFO || attrib_modifier != 0) { - VD_TRACE(VD_ERROR, "Illegal attrib ID on redirection MAD\n"); + printk(KERN_ERR "Illegal attrib ID on redirection MAD\n"); goto out1; } /* * Redirect */ - VD_TRACE(VD_DEBUG, "Handle redirect packet\n"); + printk(KERN_DEBUG "Handle redirect packet\n"); redir_mad = (struct gsi_redirect_mad_t *) mad; @@ -1572,7 +1480,7 @@ list_for_each((struct list_head *) redirect_info, &class_info->redirect_class_port_info_list) { if (resp_dtgrm->rlid == redirect_info->rlid) { - VD_TRACE(VD_DEBUG, "Redirect info already exists\n"); + printk(KERN_DEBUG "Redirect info already exists\n"); v_list_del((struct list_head *) redirect_info); break; } @@ -1583,7 +1491,7 @@ &class_info->redirect_class_port_info_list) { if ((redirect_info = kmalloc(sizeof (*redirect_info), GFP_KERNEL)) == NULL) { - VD_TRACE(VD_DEBUG, "Memory allocation error.\n"); + printk(KERN_DEBUG "Memory allocation error.\n"); goto out1; } } @@ -1601,7 +1509,7 @@ &class_info->redirect_class_port_info_list); GSI_REDIR_LIST_UNLOCK(class_info); - VD_TRACE(VD_DEBUG, "Redirect LID: %d, QP: %d QK: 0x%x\n", + printk(KERN_DEBUG "Redirect LID: %d, QP: %d QK: 0x%x\n", redirect_info->class_port_info.redirect_lid, redirect_info->class_port_info.redirect_qp, redirect_info->class_port_info.redirect_q_key); @@ 
-1610,12 +1518,11 @@ * Retransmit the request datagram. */ if (gsi_post_send_mad(class_info, (struct gsi_dtgrm_t *) req_dtgrm)) { - VD_TRACE(VD_DEBUG, + printk(KERN_DEBUG \ "Could not retransmit redirected datagram.\n"); goto out1; } - VD_LEAVEFUNC(); return; out1: @@ -1623,17 +1530,16 @@ * Release the request datagram (call the send completion callback) */ if (class_info->send_compl_cb) { - VD_TRACE(VD_DEBUG, "Call the callback\n"); + printk(KERN_DEBUG "Calling the callback\n"); class_info->send_compl_cb(class_info, class_info->context, (struct gsi_dtgrm_t *) req_dtgrm); } else { - VD_TRACE(VD_ERROR, "No callback registered\n"); + printk(KERN_ERR "No callback registered\n"); gsi_dtgrm_pool_put((struct gsi_dtgrm_t *) req_dtgrm); } out: - VD_LEAVEFUNC(); return; } @@ -1643,10 +1549,8 @@ { struct rmpp_ib_mad_element_t *rmpp_mad; - VD_ENTERFUNC(); - if (rmpp_get_mad(MAD_RMPP_DATA_SIZE, &rmpp_mad) != RMPP_IB_SUCCESS) { - VD_TRACE(VD_ERROR, "Could not get RMPP mad element\n"); + printk(KERN_ERR "Could not get RMPP mad element\n"); goto get_mad_err; } @@ -1663,7 +1567,7 @@ rmpp_rcv(class_info->rmpp_h, rmpp_mad); get_mad_err: - VD_LEAVEFUNC(); + return; } static inline int @@ -1696,8 +1600,6 @@ char mad_out[256] = { 0 }; GSI_RCV_LIST_LOCK_VAR; - VD_ENTERFUNC(); - mad = &dtgrm_priv->mad; hca->stat.rcv_cnt++; @@ -1714,7 +1616,7 @@ mad_swap_header(mad); - VD_TRACE(VD_DEBUG, "MAD status: 0x%x, TID: 0x%Lx\n", + printk(KERN_DEBUG "MAD status: 0x%x, TID: 0x%Lx\n", mad->hdr.u.ns.status.raw16, mad->hdr.transact_id); /* @@ -1724,21 +1626,21 @@ client_id = ((struct gsi_tid_st *) &mad->hdr.transact_id)->client_id; - VD_TRACE(VD_DEBUG, + printk(KERN_DEBUG \ "Received response packet for class-%d, client id - %d\n", mad->hdr.class, client_id); } else { - VD_TRACE(VD_DEBUG, "Received request packet for class-%d\n", + printk(KERN_DEBUG "Received request packet for class-%d\n", mad->hdr.class); } if ((class_info = gsi_get_class_info(mad->hdr.class, hca, client_id)) == NULL) { - 
VD_TRACE(VD_DEBUG, + printk(KERN_DEBUG \ "Management class not registered - %x, client id %d - injecting back to FW\n", mad->hdr.class, client_id); - VD_TRACE(VD_DEBUG, "Source QP %d, LID %d \n", - wc->addres_info.source_qp, wc->addres_info.slid); + printk(KERN_DEBUG "Source QP %d, LID %d \n", + wc->src_qp, wc->slid); if (!mad->hdr.m.ms.r) { @@ -1750,14 +1652,14 @@ 0, mad, (struct mad_t *) mad_out)) { - VD_TRACE(VD_DEBUG, + printk(KERN_DEBUG \ "process_local_mad() failed.\n"); goto out; } memcpy(mad, mad_out, 256); - VD_TRACE(VD_DEBUG, "Posting PLM reply.\n"); + printk(KERN_DEBUG "Posting PLM reply.\n"); dtgrm_priv->rqp = wc->src_qp; dtgrm_priv->rlid = wc->slid; dtgrm_priv->path_bits = wc->dlid_path_bits; @@ -1782,7 +1684,7 @@ dtgrm_priv->path_bits = wc->dlid_path_bits; dtgrm_priv->sl = wc->sl; - VD_TRACE(VD_DEBUG, + printk(KERN_DEBUG \ "Received datagram - remote QP num-%d, LID-%d, path bits- %d, SL - %d\n", dtgrm_priv->rqp, dtgrm_priv->rlid, dtgrm_priv->path_bits, dtgrm_priv->sl); @@ -1791,7 +1693,7 @@ gsi_handle_response_mad(class_info, dtgrm_priv, &redirect); if (redirect) { - VD_TRACE(VD_DEBUG, "Redirect mad was handled\n"); + printk(KERN_DEBUG "Redirect mad was handled\n"); goto out1; } @@ -1808,20 +1710,19 @@ class_info->context, (struct gsi_dtgrm_t *) dtgrm_priv); } else { - VD_TRACE(VD_ERROR, "No receive callback registered\n"); + printk(KERN_DEBUG "No receive callback registered\n"); goto out1; } gsi_put_class_info(class_info); - VD_LEAVEFUNC(); - return; + goto out_plm; out1: gsi_put_class_info(class_info); out: gsi_dtgrm_pool_put((struct gsi_dtgrm_t *) dtgrm_priv); out_plm: - VD_LEAVEFUNC(); + return; } static void @@ -1833,14 +1734,13 @@ int redirect = FALSE; struct gsi_redirect_info_st *redirect_info; GSI_REDIR_LIST_LOCK_VAR; - VD_ENTERFUNC(); if (!dtgrm_priv->mad.hdr.m.ms.r) { GSI_REDIR_LIST_LOCK(class_info); list_for_each((struct list_head *) redirect_info, &class_info->redirect_class_port_info_list) { if (dtgrm_priv->rlid == redirect_info->rlid) { 
- VD_TRACE(VD_DEBUG, + printk(KERN_DEBUG \ "The packet will be redirected\n"); memcpy(&class_port_info, &redirect_info->class_port_info, @@ -1861,7 +1761,7 @@ addr_vec->port = class_info->hca->port; if (redirect) { - VD_TRACE(VD_DEBUG, "Redirect LID: %d, QP: %d QK: 0x%x\n", + printk(KERN_DEBUG "Redirect LID: %d, QP: %d QK: 0x%x\n", class_port_info.redirect_lid, class_port_info.redirect_qp, class_port_info.redirect_q_key); @@ -1881,7 +1781,6 @@ dtgrm_priv->r_q_key = GSI_QP1_WELL_KNOWN_Q_KEY; } - VD_LEAVEFUNC(); } /* @@ -1897,9 +1796,8 @@ int ret; struct mad_t *mad = (struct mad_t *) dtgrm->mad; - VD_ENTERFUNC(); if (!hca->up) { - VD_TRACE(VD_ERROR, "Device is down\n"); + printk(KERN_ERR "Device is down\n"); ret = -EPERM; goto error; } @@ -1934,14 +1832,13 @@ } if (gsi_is_rmpp_send_dtgrm(class_info, dtgrm)) { - VD_TRACE(VD_DEBUG, "Post RMPP datagram\n"); + printk(KERN_DEBUG "Post RMPP datagram\n"); ret = gsi_post_send_rmpp(class_info, dtgrm); } else { - VD_TRACE(VD_DEBUG, "Post regular datagram\n"); + printk(KERN_DEBUG "Post regular datagram\n"); ret = gsi_post_send_mad(class_info, dtgrm); } error: - VD_LEAVEFUNC(); return ret; } @@ -1966,19 +1863,17 @@ (struct gsi_dtgrm_priv_st *) dtgrm; GSI_SND_LIST_LOCK_VAR; - VD_ENTERFUNC(); - gsi_prepare_send_data(class_info, dtgrm_priv, &addr_vec); #if 0 /* GSI_ADDRESS_HNDL_POOL_SUPPORT */ if ((ret = ib_get_ah(hca->handle->pd, &addr_vec, &addr_hndl))) { - VD_TRACE(VD_ERROR, "Could not create address handle\n"); + printk(KERN_ERR "Could not create address handle\n"); goto error1; } #else addr_hndl = ib_create_ah(hca->pd, &addr_hndl_attr); if (IS_ERR(addr_hndl)) { - VD_TRACE(VD_ERROR, "Could not create address handle\n"); + printk(KERN_ERR "Could not create address handle\n"); ret = PTR_ERR(addr_hndl); goto error1; } @@ -1990,7 +1885,7 @@ IB_MR_LOCAL_WRITE, &dtgrm_priv->sg.lkey, &rkey); if (IS_ERR(dtgrm_priv->v_mem_h)) { - VD_TRACE(VD_ERROR, "Could not get general memory attr.\n"); + printk(KERN_ERR "Could not get general memory 
attr.\n"); ret = PTR_ERR(dtgrm_priv->v_mem_h); goto error2; } @@ -2030,12 +1925,11 @@ dtgrm_priv->context = (void *) class_info; if ((ret = ib_post_send(hca->qp, &wr, &bad_wr))) { - VD_TRACE(VD_ERROR, "Could not post send request\n"); + printk(KERN_ERR "Could not post send request\n"); goto error3; } class_info->stat.snd_cnt++; - VD_LEAVEFUNC(); return 0; error3: @@ -2059,13 +1953,11 @@ { void *rmpp_buf; - VD_ENTERFUNC(); if ((rmpp_buf = gsi_dtgrm_alloc_rmpp_buf((struct gsi_dtgrm_t *) dtgrm_priv, rmpp_mad->size - MAD_RMPP_HDR_SIZE)) == NULL) { - VD_TRACE(VD_ERROR, "Could not allocate RMPP buffer\n"); - VD_LEAVEFUNC(); + printk(KERN_ERR "Could not allocate RMPP buffer\n"); return; } @@ -2081,7 +1973,6 @@ dtgrm_priv->rmpp_dir_switch_needed = rmpp_mad->dir_switch_needed; - VD_LEAVEFUNC(); } /* @@ -2102,8 +1993,6 @@ struct gsi_dtgrm_priv_st *dtgrm_priv = (struct gsi_dtgrm_priv_st *) dtgrm; - VD_ENTERFUNC(); - addr_vec.grh_flag = 0; memset((char *) &addr_vec.grh.dgid, 0, sizeof (addr_vec.grh.dgid)); addr_vec.grh.sgid_index = 0; @@ -2121,13 +2010,13 @@ #if 0 /* GSI_ADDRESS_HNDL_POOL_SUPPORT */ if ((ret = ib_get_ah(hca->handle->pd, &addr_vec, &addr_hndl))) { - VD_TRACE(VD_ERROR, "Could not create address handle\n"); + printk(KERN_ERR "Could not create address handle\n"); goto error1; } #else addr_hndl = ib_create_ah(hca->pd, &addr_hndl_attr); if (IS_ERR(addr_hndl)) { - VD_TRACE(VD_ERROR, "Could not create address handle\n"); + printk(KERN_ERR "Could not create address handle\n"); ret = PTR_ERR(addr_hndl); goto error1; } @@ -2138,7 +2027,7 @@ IB_MR_LOCAL_WRITE, &dtgrm_priv->sg.lkey, &rkey); if (IS_ERR(dtgrm_priv->v_mem_h)) { - VD_TRACE(VD_ERROR, "Could not get general memory attr.\n"); + printk(KERN_ERR "Could not get general memory attr.\n"); goto error2; } @@ -2172,11 +2061,10 @@ dtgrm_priv->context = NULL; if ((ret = ib_post_send(hca->qp, &wr, &bad_wr))) { - VD_TRACE(VD_ERROR, "Could not post send request\n"); + printk(KERN_ERR "Could not post send request\n"); goto 
error2; } - VD_LEAVEFUNC(); return 0; error2: @@ -2193,8 +2081,6 @@ gsi_conv_rcv_dtgm_to_rmpp_mad(struct rmpp_ib_mad_element_t *rmpp_mad, struct gsi_dtgrm_priv_st *dtgrm_priv) { - VD_ENTERFUNC(); - memcpy(rmpp_mad->p_mad_buf, &dtgrm_priv->mad, MAD_BLOCK_SIZE); /* @@ -2213,15 +2099,12 @@ rmpp_mad->grh_valid = FALSE; rmpp_mad->pkey_index = GSI_P_KEY_INDEX; - VD_LEAVEFUNC(); } static void gsi_conv_send_dtgm_to_rmpp_mad(struct rmpp_ib_mad_element_t *rmpp_mad, struct gsi_dtgrm_priv_st *dtgrm_priv) { - VD_ENTERFUNC(); - memcpy(rmpp_mad->p_mad_buf, &dtgrm_priv->mad, MAD_RMPP_HDR_SIZE); memcpy((char *) (rmpp_mad->p_mad_buf) + MAD_RMPP_HDR_SIZE, dtgrm_priv->rmpp_payload, dtgrm_priv->rmpp_payload_size); @@ -2239,7 +2122,6 @@ rmpp_mad->context = dtgrm_priv; - VD_LEAVEFUNC(); } /* @@ -2252,11 +2134,9 @@ struct rmpp_ib_mad_element_t *rmpp_mad; int ret; - VD_ENTERFUNC(); - if (rmpp_get_mad(dtgrm->rmpp_payload_size, &rmpp_mad) != RMPP_IB_SUCCESS) { - VD_TRACE(VD_ERROR, "Could not get RMPP mad element\n"); + printk(KERN_ERR "Could not get RMPP mad element\n"); ret = -ENOBUFS; goto get_mad_err; } @@ -2265,18 +2145,16 @@ (struct gsi_dtgrm_priv_st *) dtgrm); if (rmpp_send(class_info->rmpp_h, rmpp_mad) != RMPP_IB_SUCCESS) { - VD_TRACE(VD_ERROR, "RMPP send error\n"); + printk(KERN_ERR "RMPP send error\n"); ret = -EPERM; goto send_err; } - VD_LEAVEFUNC(); return 0; send_err: rmpp_put_mad(rmpp_mad); get_mad_err: - VD_LEAVEFUNC(); return ret; } @@ -2287,22 +2165,21 @@ { struct gsi_serv_class_info_st *class_info; - VD_ENTERFUNC(); - VD_TRACE(VD_DEBUG, "RMPP send callback\n"); + printk(KERN_DEBUG "RMPP send callback\n"); class_info = (struct gsi_serv_class_info_st *) rmpp_get_vendal_context(rmpp_h); if (status == RMPP_IB_SUCCESS) { - VD_TRACE(VD_DEBUG, "RMPP send success\n"); + printk(KERN_DEBUG "RMPP send success\n"); } else if (status == RMPP_IB_ACK_DONE) { - VD_TRACE(VD_DEBUG, "RMPP ACK send success\n"); + printk(KERN_DEBUG "RMPP ACK send success\n"); /* * If this is an ACK or NACK 
completion, */ goto out; } else { - VD_TRACE(VD_ERROR, "RMPP send error %d\n", status); + printk(KERN_ERR "RMPP send error %d\n", status); } class_info->send_compl_cb(class_info, class_info->context, rmpp_mad->context); @@ -2310,7 +2187,6 @@ out: rmpp_put_mad(rmpp_mad); - VD_LEAVEFUNC(); } static void @@ -2319,15 +2195,14 @@ struct gsi_serv_class_info_st *class_info; struct gsi_dtgrm_t *dtgrm; - VD_ENTERFUNC(); - VD_TRACE(VD_DEBUG, "RMPP receive callback\n"); + printk(KERN_DEBUG "RMPP receive callback\n"); class_info = (struct gsi_serv_class_info_st *) rmpp_get_vendal_context(rmpp_h); if (gsi_dtgrm_pool_get(class_info->rmpp_rcv_dtgrm_pool, (struct gsi_dtgrm_t **) &dtgrm)) { - VD_TRACE(VD_ERROR, "Could not get datagram\n"); + printk(KERN_ERR "Could not get datagram\n"); goto err; } @@ -2337,7 +2212,6 @@ class_info->receive_cb(class_info, class_info->context, dtgrm); err: rmpp_put_mad(rmpp_mad); - VD_LEAVEFUNC(); } /* @@ -2381,8 +2255,6 @@ GSI_REDIR_LIST_LOCK_VAR; GSI_HCA_LIST_LOCK_VAR; - VD_ENTERFUNC(); - /* * Print HCA/port information */ @@ -2489,8 +2361,6 @@ gsi_put_all_classes(); *eof = 1; - VD_LEAVEFUNC(); - return out - page; } @@ -2528,8 +2398,6 @@ (struct gsi_hca_info_st *) &gsi_hca_list, *hca_info; GSI_HCA_LIST_LOCK_VAR; - VD_ENTERFUNC(); - printk("********************************************\n"); printk("SA SPEC compliant - %s\n\n", rmpp_spec_compliant ? 
"TRUE" : "FALSE"); @@ -2590,7 +2458,6 @@ printk("********************************************\n"); gsi_put_all_classes(); - VD_LEAVEFUNC(); } static int @@ -2598,12 +2465,11 @@ { struct gsi_hca_info_st *hca_info; int ret = 0; - VD_ENTERFUNC(); printk("GSI restart command - %s %d \n", name, port); if (!(hca_info = gsi_get_hca_info(name, port))) { - VD_TRACE(VD_ERROR, "Unknown device to restart\n"); + printk(KERN_ERR "Unknown device to restart\n"); ret = -ENODEV; goto error; } @@ -2613,7 +2479,6 @@ gsi_put_hca_info(hca_info); error: - VD_LEAVEFUNC(); return ret; } @@ -2628,30 +2493,29 @@ int port; int i = count; - VD_ENTERFUNC(); p = (char *) buffer; #define READ_STR(str, size)\ gsi_proc_read_str(str, size, &i, &p) if (!READ_STR(cmd, sizeof (cmd))) { - VD_TRACE(VD_DEBUG, "Could not read command\n"); + printk(KERN_DEBUG "Could not read command\n"); goto error; } if (!strcmp(cmd, "showstat")) { - VD_TRACE(VD_DEBUG, "The SHOW STATISTICS command\n"); + printk(KERN_DEBUG "The SHOW STATISTICS command\n"); gsi_proc_show_statistics(); } else if (!strcmp(cmd, "restart")) { - VD_TRACE(VD_DEBUG, "The RESTART command\n"); + printk(KERN_DEBUG "The RESTART command\n"); if (!READ_STR(name, sizeof (name))) { - VD_TRACE(VD_ERROR, "Could not read port number\n"); + printk(KERN_ERR "Could not read port number\n"); goto error; } if (!READ_STR(port_str, sizeof (port_str))) { - VD_TRACE(VD_ERROR, "Could not read port number\n"); + printk(KERN_ERR "Could not read port number\n"); goto error; } @@ -2665,15 +2529,15 @@ } } else if (!strcmp(cmd, "rcn")) { - VD_TRACE(VD_DEBUG, "The REQUEST COMPL NOTIF command\n"); + printk(KERN_DEBUG "The REQUEST COMPL NOTIF command\n"); if (!READ_STR(name, sizeof (name))) { - VD_TRACE(VD_ERROR, "Could not read HCA name\n"); + printk(KERN_ERR "Could not read HCA name\n"); goto error; } if (!READ_STR(port_str, sizeof (port_str))) { - VD_TRACE(VD_ERROR, "Could not read port number\n"); + printk(KERN_ERR "Could not read port number\n"); goto error; } @@ -2689,15 
+2553,15 @@ } } else if (!strcmp(cmd, "poll")) { - VD_TRACE(VD_DEBUG, "The POLL command\n"); + printk(KERN_DEBUG "The POLL command\n"); if (!READ_STR(name, sizeof (name))) { - VD_TRACE(VD_ERROR, "Could not read HCA name\n"); + printk(KERN_ERR "Could not read HCA name\n"); goto error; } if (!READ_STR(port_str, sizeof (port_str))) { - VD_TRACE(VD_ERROR, "Could not read port number\n"); + printk(KERN_ERR "Could not read port number\n"); goto error; } @@ -2715,7 +2579,6 @@ #undef READ_STR - VD_LEAVEFUNC(); return count; error: printk("GSI: wrong /proc/openib/gsi/control input.\n"); @@ -2723,7 +2586,6 @@ \"showstat\" show statistics\n\ \"restart HCA PORT\" reset QP and restart\n"); - VD_LEAVEFUNC(); return count; } @@ -2742,7 +2604,6 @@ struct gsi_hca_info_st *hca = param; struct gsi_thread_data_t *thread_data = &hca->thread_data; - VD_ENTERFUNC(); lock_kernel(); #if 1 /* LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0) */ @@ -2758,7 +2619,7 @@ while (1) { if (down_interruptible(&thread_data->sem)) { - VD_TRACE(VD_DEBUG, + printk(KERN_DEBUG \ "Waked up - interrupted - exit thread!\n"); break; } @@ -2766,10 +2627,9 @@ if (!thread_data->run) break; - VD_TRACE(VD_DEBUG, "Waked up!\n"); + printk(KERN_DEBUG "Waked up!\n"); gsi_compl_cb(hca); } - VD_LEAVEFUNC(); return 0; } @@ -2782,9 +2642,7 @@ { struct gsi_thread_data_t *thread_data = &hca->thread_data; - VD_ENTERFUNC(); up(&thread_data->sem); - VD_LEAVEFUNC(); } /* @@ -2795,10 +2653,8 @@ { struct gsi_thread_data_t *thread_data = &hca->thread_data; - VD_ENTERFUNC(); thread_data->run = 1; kernel_thread(gsi_thread, hca, 0); - VD_LEAVEFUNC(); } /* @@ -2809,11 +2665,9 @@ { struct gsi_thread_data_t *thread_data = &hca->thread_data; - VD_ENTERFUNC(); thread_data->run = 0; gsi_thread_signal(hca); schedule(); - VD_LEAVEFUNC(); } static int @@ -2824,12 +2678,12 @@ int i = 0; if ((ret = ib_query_hca_cap(device, &hca_attrib))) { - VD_TRACE(VD_ERROR, "Could not query HCA\n"); + printk(KERN_ERR "Could not query HCA\n"); goto error_hca_query; 
} if (hca_attrib.phys_port_cnt > GSI_MAX_SUPPORTED_PORTS) { - VD_TRACE(VD_ERROR, "Too many ports - %d (support up to %d)\n", + printk(KERN_ERR "Too many ports - %d (support up to %d)\n", hca_attrib.phys_port_cnt, GSI_MAX_SUPPORTED_PORTS); ret = -EMFILE; goto error_too_many_ports; @@ -2838,19 +2692,18 @@ for (i = 0; i < hca_attrib.phys_port_cnt; i++) { if ((ret = gsi_hca_open(device, &(gsi_hca_h_array[i]), (i + 1)))) { - VD_TRACE(VD_ERROR, "Cannot open HCA\n"); + printk(KERN_ERR "Cannot open HCA\n"); goto error_gsi_hca_open; } gsi_hca_h_array_num_ports++; } - VD_LEAVEFUNC(); return 0; error_gsi_hca_open: while (i > 0) { if ((ret = gsi_hca_close((gsi_hca_h_array[i - 1])))) { - VD_TRACE(VD_ERROR, + printk(KERN_ERR \ "Cannot close gsi HCA (ret: %d), ix: %d\n", ret, i - 1); } @@ -2861,7 +2714,6 @@ error_too_many_ports: error_hca_query: - VD_LEAVEFUNC(); return ret; } @@ -2873,13 +2725,13 @@ int i = 0; if ((ret = ib_query_hca_cap(device, &hca_attrib))) { - VD_TRACE(VD_ERROR, "Could not query HCA\n"); + printk(KERN_ERR "Could not query HCA\n"); goto error_hca_query; } if (hca_attrib.phys_port_cnt > GSI_MAX_SUPPORTED_PORTS) { - VD_TRACE(VD_ERROR, "Too many ports - %d (support up to %d)\n", - hca_attrib.port_num, GSI_MAX_SUPPORTED_PORTS); + printk(KERN_ERR "Too many ports - %d (support up to %d)\n", + hca_attrib.phys_port_cnt, GSI_MAX_SUPPORTED_PORTS); ret = -EMFILE; goto error_too_many_ports; } @@ -2887,27 +2739,25 @@ for (i = 0; i < hca_attrib.phys_port_cnt; i++) { if (gsi_hca_h_array[i] != NULL) { if ((ret = gsi_hca_close((gsi_hca_h_array[i])))) { - VD_TRACE(VD_ERROR, + printk(KERN_ERR \ "Cannot close gsi HCA (ret: %d), ix: %d\n", ret, i); } gsi_hca_h_array_num_ports--; } else { - VD_TRACE(VD_ERROR, "gsi_hca_h_array[%d] == NULL\n", i); + printk(KERN_ERR "gsi_hca_h_array[%d] == NULL\n", i); } } if (gsi_hca_h_array_num_ports != 0) { - VD_TRACE(VD_ERROR, "gsi_hca_h_array_num_ports: %d != 0\n", + printk(KERN_ERR "gsi_hca_h_array_num_ports: %d != 0\n", 
gsi_hca_h_array_num_ports); } - VD_LEAVEFUNC(); return 0; error_too_many_ports: error_hca_query: - VD_LEAVEFUNC(); return ret; } @@ -2918,7 +2768,7 @@ switch (event) { case IB_DEVICE_NOTIFIER_ADD: if (gsi_create_ports(device)) - VD_TRACE(VD_ERROR, "Failed to initialize device."); + printk(KERN_ERR "Failed to initialize device."); break; case IB_DEVICE_NOTIFIER_REMOVE: @@ -2926,7 +2776,7 @@ break; default: - VD_TRACE(VD_ERROR, "Unknown device notifier event %d.", event); + printk(KERN_ERR "Unknown device notifier event %d.", event); break; } } @@ -2941,14 +2791,12 @@ int result = 0; struct proc_dir_entry *ent; - VD_TRACE(VD_DEBUG, "GSI_PRIV_DTGRM_PRIV_HDR_SIZE = %ld\n", + printk(KERN_DEBUG "GSI_PRIV_DTGRM_PRIV_HDR_SIZE = %ld\n", (long) GSI_PRIV_DTGRM_PRIV_HDR_SIZE); - VD_TRACE(VD_DEBUG, "GSI_DTGRM_PRIV_HDR_SIZE = %d\n", + printk(KERN_DEBUG "GSI_DTGRM_PRIV_HDR_SIZE = %d\n", GSI_DTGRM_PRIV_HDR_SIZE); - VD_TRACE(VD_DEBUG, "MAD size= %ld\n", (long) sizeof (struct mad_t)); + printk(KERN_DEBUG "MAD size= %ld\n", (long) sizeof (struct mad_t)); - VD_LOGGER_INIT("openib/modules/gsi/"); - /* * validation check on the structure's size * the structure in gsi.h is a copy of gsi_priv.h. the size of the structures is defined with #define. 
in @@ -2967,7 +2815,7 @@ if ((result = module_version_init(MODNAME, GSI_VERSION, "$Name: NVIGOR_2_6_GPL_BR_040614 $")) != 0) { - VD_TRACE(VD_ERROR, "Could not init module version!\n"); + printk(KERN_ERR "Could not init module version!\n"); result = -ENOMEM; goto error1; } @@ -2980,13 +2828,13 @@ ent->read_proc = gsi_read_proc; ent->write_proc = gsi_write_proc; } else { - VD_TRACE(VD_ERROR, "Could not create proc entry!\n"); + printk(KERN_ERR "Could not create proc entry!\n"); result = -ENOMEM; goto error2; } if (rmpp_init() != RMPP_SUCCESS) { - VD_TRACE(VD_ERROR, "Could init RMPP!\n"); + printk(KERN_ERR "Could init RMPP!\n"); result = -ENXIO; goto error3; } @@ -3003,14 +2851,13 @@ error2: module_version_exit(MODNAME); error1: - VD_LOGGER_EXIT(); return result; } static void gsi_cleanup_module(void) { - VD_TRACE(VD_DEBUG, "Bye!\n"); + printk(KERN_DEBUG "Bye GSI!\n"); rmpp_cleanup(); ib_device_notifier_deregister(&gsi_notifier); @@ -3018,7 +2865,6 @@ gsi_sent_dtgrm_timer_stop(); remove_proc_entry("openib/gsi/control", NULL); module_version_exit(MODNAME); - VD_LOGGER_EXIT(); } struct gsi_dtgrm_pool_info_st { @@ -3033,12 +2879,10 @@ struct gsi_hca_info_st *hca_info; int ret = 0; - VD_ENTERFUNC(); - printk("GSI POLL command - %s %d \n", name, port); if ((hca_info = gsi_get_hca_info(name, port)) == NULL) { - VD_TRACE(VD_ERROR, "Unknown device to test\n"); + printk(KERN_ERR "Unknown device to test\n"); ret = -ENODEV; goto error; } @@ -3047,7 +2891,6 @@ gsi_put_hca_info(hca_info); error: - VD_LEAVEFUNC(); return ret; } @@ -3057,26 +2900,23 @@ struct gsi_hca_info_st *hca_info; int ret; - VD_ENTERFUNC(); - printk("GSI REQUEST COMPLETION NOTIFICATION command - %s %d \n", name, port); if ((hca_info = gsi_get_hca_info(name, port)) == NULL) { - VD_TRACE(VD_ERROR, "Unknown device to test\n"); + printk(KERN_ERR "Unknown device to test\n"); ret = -ENODEV; goto error; } if ((ret = ib_req_notify_cq(hca_info->cq, IB_CQ_NEXT_COMP))) { - VD_TRACE(VD_ERROR, + printk(KERN_ERR \ "Failed to 
request competion notification\n"); } gsi_put_hca_info(hca_info); error: - VD_LEAVEFUNC(); return ret; } @@ -3089,18 +2929,16 @@ struct gsi_dtgrm_pool_info_st *pool; char name[GSI_POOL_MAX_NAME_LEN]; - VD_ENTERFUNC(); - /* * Sanity check */ if (!modname || (strlen(modname) > (GSI_POOL_MAX_NAME_LEN - 8))) { - VD_LEAVEFUNC("Invalid pool name\n"); + printk(KERN_ERR "Invalid pool name\n"); return -ENOENT; } if ((pool = kmalloc(sizeof (*pool), GFP_KERNEL)) == NULL) { - VD_LEAVEFUNC("Memory allocation error\n"); + printk(KERN_ERR "Memory allocation error\n"); return -ENOMEM; } @@ -3111,7 +2949,7 @@ 0, SLAB_HWCACHE_ALIGN, NULL, NULL)) == NULL) { - VD_LEAVEFUNC("kmem_cache_create failed\n"); + printk(KERN_ERR "kmem_cache_create failed\n"); goto nomem; } @@ -3120,23 +2958,18 @@ *handle = pool; - VD_LEAVEFUNC(); return 0; nomem: kfree(pool); - - VD_LEAVEFUNC(); return -ENOMEM; } static void gsi_dtgrm_pool_do_destroy(struct gsi_dtgrm_pool_info_st *pool) { - VD_ENTERFUNC(); kmem_cache_destroy(pool->cache); kfree(pool); - VD_LEAVEFUNC(); } /* @@ -3148,8 +2981,6 @@ struct gsi_dtgrm_pool_info_st *pool = (struct gsi_dtgrm_pool_info_st *) handle; - VD_ENTERFUNC(); - /* * If atomic_sub_and_test() detects negative value (non-zero), * there are posted datagrames which @@ -3162,7 +2993,6 @@ gsi_dtgrm_pool_do_destroy(pool); } - VD_LEAVEFUNC(); } /* @@ -3174,8 +3004,6 @@ struct gsi_dtgrm_pool_info_st *pool = (struct gsi_dtgrm_pool_info_st *) handle; - VD_ENTERFUNC(); - atomic_dec(&pool->cnt); if (atomic_read(&pool->cnt) < 0) { @@ -3183,7 +3011,7 @@ * pool->cnt may ne negative if the pool is empty, or * gsi_dtgrm_pool_delete() was called before. 
*/ - VD_TRACE(VD_DEBUG, "Empty pool \n"); + printk(KERN_DEBUG "Empty pool \n"); goto empty_pool; } @@ -3191,7 +3019,7 @@ GFP_KERNEL); if (!(*dtgrm)) { - VD_TRACE(VD_ERROR, "Cannot allocate a datagram\n"); + printk(KERN_ERR "Cannot allocate a datagram\n"); goto alloc_error; } @@ -3203,13 +3031,11 @@ ((struct gsi_dtgrm_priv_st *) (*dtgrm))->posted = 0; ((struct gsi_dtgrm_priv_st *) (*dtgrm))->pool = pool; - VD_LEAVEFUNC(); return 0; empty_pool: alloc_error: atomic_inc(&pool->cnt); - VD_LEAVEFUNC(); return -ENOBUFS; } @@ -3222,8 +3048,6 @@ struct gsi_dtgrm_pool_info_st *pool = NULL; struct gsi_dtgrm_priv_st *dtgrm_priv = NULL; - VD_ENTERFUNC(); - dtgrm_priv = ((struct gsi_dtgrm_priv_st *) dtgrm); pool = (struct gsi_dtgrm_pool_info_st *) dtgrm_priv->pool; @@ -3237,11 +3061,10 @@ * destroy it now. */ if (atomic_inc_and_test(&pool->cnt)) { - VD_TRACE(VD_DEBUG, "Deleting marked for deletion pool!\n"); + printk(KERN_DEBUG "Deleting marked for deletion pool!\n"); gsi_dtgrm_pool_do_destroy(pool); } - VD_LEAVEFUNC(); return 0; } @@ -3251,8 +3074,6 @@ static int gsi_dtgrm_pool_size(void *handle) { - VD_ENTERFUNC(); - VD_LEAVEFUNC(); return ((struct gsi_dtgrm_pool_info_st *) handle)->size; } @@ -3266,24 +3087,18 @@ (struct gsi_dtgrm_pool_info_st *) handle; int cnt; - VD_ENTERFUNC(); - cnt = atomic_read(&pool->cnt); - - VD_LEAVEFUNC(); return cnt; } void * gsi_dtgrm_alloc_rmpp_buf(struct gsi_dtgrm_t *dtgrm, int size) { - VD_ENTERFUNC(); if ((dtgrm->rmpp_payload = kmalloc(size, GFP_KERNEL)) == NULL) goto alloc_error; dtgrm->rmpp_payload_size = size; - VD_LEAVEFUNC(); alloc_error: return dtgrm->rmpp_payload; } @@ -3291,22 +3106,16 @@ void gsi_dtgrm_free_rmpp_buf(struct gsi_dtgrm_t *dtgrm) { - VD_ENTERFUNC(); - kfree(dtgrm->rmpp_payload); dtgrm->rmpp_payload = NULL; dtgrm->rmpp_payload_size = 0; - - VD_LEAVEFUNC(); } void * gsi_dtgrm_get_rmpp_buf(struct gsi_dtgrm_t *dtgrm, int *size) { - VD_ENTERFUNC(); *size = dtgrm->rmpp_payload_size; - VD_LEAVEFUNC(); return dtgrm->rmpp_payload; 
} Index: gsi_rmpp_vendal.c =================================================================== --- gsi_rmpp_vendal.c (revision 561) +++ gsi_rmpp_vendal.c (working copy) @@ -60,7 +60,6 @@ #include #include -#include "debug_support.h" #include "ib_verbs.h" #include "mad.h" #include "gsi.h" @@ -73,8 +72,6 @@ enum rmpp_ib_api_status_t rmpp_vendal_register(void *vendal_p, void *rmpp_h) { - VD_ENTERFUNC(); - VD_LEAVEFUNC(); return RMPP_IB_SUCCESS; } @@ -82,16 +79,12 @@ gsi_conv_snd_rmpp_mad_to_dtgrm(struct gsi_dtgrm_t *dtgrm, struct rmpp_ib_mad_element_t *rmpp_mad) { - VD_ENTERFUNC(); - memcpy(&dtgrm->mad, &rmpp_mad->mad_seg, MAD_BLOCK_SIZE); dtgrm->rlid = rmpp_mad->remote_lid; dtgrm->sl = rmpp_mad->remote_sl; dtgrm->rqp = rmpp_mad->remote_qp; dtgrm->path_bits = rmpp_mad->path_bits; - - VD_LEAVEFUNC(); } enum rmpp_ib_api_status_t @@ -103,10 +96,9 @@ struct gsi_serv_class_info_st *class_info = (struct gsi_serv_class_info_st *) rmpp_get_vendal_context(rmpp_h); - VD_ENTERFUNC(); if (gsi_dtgrm_pool_get(class_info->rmpp_rcv_dtgrm_pool, (struct gsi_dtgrm_t **) &dtgrm)) { - VD_TRACE(VD_ERROR, "Could not get datagram\n"); + printk(KERN_ERR "Could not get datagram\n"); ret = RMPP_IB_INSUFFICIENT_MEMORY; goto pool_get_err; } @@ -116,36 +108,32 @@ dtgrm->rmpp_context = rmpp_mad; #ifdef DUMP - VD_TRACE(VD_DEBUG, "Posting RMPP MAD - TID: 0x%Lx \n", + printk(KERN_DEBUG "Posting RMPP MAD - TID: 0x%Lx \n", ((struct mad_t *) &dtgrm->mad)->hdr.transact_id); dump_block_with_ascii(__FUNCTION__ ": Posting MAD", (void *) &dtgrm->mad, MAD_BLOCK_SIZE, 0); - VD_TRACE(VD_DEBUG, "Remote LID=%d\n", dtgrm->rlid); + printk(KERN_DEBUG "Remote LID=%d\n", dtgrm->rlid); #endif if (gsi_post_send_mad(class_info, dtgrm)) { - VD_TRACE(VD_ERROR, "Could not post send datagram\n"); + printk(KERN_ERR "Could not post send datagram\n"); ret = RMPP_IB_ERROR; goto send_err; } - VD_LEAVEFUNC(); return RMPP_IB_SUCCESS; send_err: gsi_dtgrm_pool_put(dtgrm); pool_get_err: - VD_LEAVEFUNC(); return ret; } enum 
rmpp_ib_api_status_t rmpp_vendal_deregister(void *rmpp_h) { - VD_ENTERFUNC(); - VD_LEAVEFUNC(); return RMPP_IB_SUCCESS; } @@ -158,7 +146,7 @@ char buf[1024]; va_list args; #if defined(VDBG) || defined(VD_LOGGER_ON) - int vd_debug_level; + char vd_debug_level[12]; #endif if (level == RMPP_LOG_NONE) { @@ -166,16 +154,16 @@ } #if defined(VDBG) || defined(VD_LOGGER_ON) else if (level == RMPP_LOG_ERROR) { - vd_debug_level = VD_ERROR; + strcpy(vd_debug_level, "KERN_ERR"); } else { - vd_debug_level = VD_DEBUG; + strcpy(vd_debug_level, "KERN_DEBUG"); } #endif va_start(args, fmt); vsnprintf(buf, sizeof (buf) - 1, fmt, args); va_end(args); #if defined(VDBG) || defined(VD_LOGGER_ON) - VD_TRACE(vd_debug_level, "RMPP:%s (%s:%d) %s", function, file, line, + printk("%sRMPP:%s (%s:%d) %s", vd_debug_level, function, file, line, buf); #endif }

From mshefty at ichips.intel.com Tue Aug 3 15:19:35 2004
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 3 Aug 2004 15:19:35 -0700
Subject: [openib-general] struct ib_device added to ib_verbs.h
Message-ID: <20040803151935.07db66ba.mshefty@ichips.intel.com>

I've committed a change to ib_verbs to include struct ib_device. I used the struct ib_device in the mthca ib_verbs.h file as the basis, but modified the function calls. There was a second small change to make cqe an input-only parameter for create/modify CQ calls, since the cqe is part of struct ib_cq. Please respond if there are concerns. Thanks! - Sean

From mshefty at ichips.intel.com Tue Aug 3 17:36:43 2004
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 3 Aug 2004 17:36:43 -0700
Subject: [openib-general] GSI compromise
In-Reply-To: <52ekmrr5o5.fsf@topspin.com>
References: <35EA21F54A45CB47B879F21A91F4862F18AA8F@taurus.voltaire.com> <52ekmrr5o5.fsf@topspin.com>
Message-ID: <20040803173643.43143cdd.mshefty@ichips.intel.com>

There are still some disagreements on the GSI APIs and overall model, and I would like to try to get everyone together on this.
Using the gsi.h file under trunk/contrib/voltaire/access as a base, I tried to modify the API slightly to meet some sort of compromise. The modified file is below. Some comments about the changes:

* I tried to make the API cover both QP0 and QP1, but I focused on QP1. "gsi" was replaced with "mad" because of this.

* I added struct ib_mad_reg to replace the handle that is currently being returned.

* I combined ib_gsi_buffer and ib_gsi_msg. (I don't think there's any reason to have the ib_gsi_msg have a list. I know that this was done in the Intel gen1 code, but I think it was the wrong approach to take. It's *much* simpler for clients to send/receive a single data buffer. In fact, I'd be surprised if there was any client that didn't copy the chained buffers into a single buffer.)

* I added send_context and timeout fields to the gsi_msg. These are intended to be set when receiving a response to an outgoing request.

* I changed the registration call to operate based on method values.

* I removed the gsi_redir_class call for now, since there's some disagreement on how this should best be accomplished. After looking at the struct ib_gsi_reg, I have started thinking of some ideas on how this could work.

What is still missing is information about RMPP. My preference would be for the clients to indicate to the access layer which classes/methods require RMPP, rather than hard-coding it directly into the access layer. This allows a single implementation of RMPP, but gives the clients control over when it should be invoked.

For redirection, if you examine struct ib_gsi_reg, there's a pointer to a qp. For normal registration calls, I'm expecting that this value will go directly to QP1. For QP redirection, we can add calls such as:

struct ib_mad_reg *ib_qp_redir(struct ib_qp *qp, ib_mad_send_handler send_handler, ib_mad_recv_handler recv_handler, void *context);
int ib_process_wc(struct ib_mad_reg *mad_reg, struct ib_wc *wc);

The client would allocate and manage the QP and CQ(s). 
The redirect call simply informs the access layer that RMPP and request/response services should be enabled on that QP. When the client removes a work completion for that QP, it hands the work completion to the access layer for processing. The access layer can then perform RMPP and request/response matching. These calls wouldn't be needed for something like the CM, but would be helpful for an SA that required redirection, but didn't want to re-implement RMPP. I think that given these calls it would be possible to implement a "redirection layer" similar to the one mentioned in the currently proposed GSI. And I'm guessing that you could shim the proposed GSI over this API without too much trouble, especially if it disabled RMPP. Please respond with comments. Again, I'm hoping that we can reach a set of APIs that everyone is comfortable with. - Sean /* This software is available to you under a choice of one of two licenses. You may choose to be licensed under the terms of the GNU General Public License (GPL) Version 2, available at , or the OpenIB.org BSD license, available in the LICENSE.TXT file accompanying this software. These details are also available at . THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. Copyright (c) 2004 Mellanox Technologies Ltd. All rights reserved. Copyright (c) 2004 Infinicon Corporation. All rights reserved. Copyright (c) 2004 Intel Corporation. All rights reserved. Copyright (c) 2004 Topspin Corporation. All rights reserved. Copyright (c) 2004 Voltaire Corporation. All rights reserved. 
*/ #if !defined( IB_MAD_H ) #define IB_MAD_H #include "ib_verbs.h" typedef void (*ib_mad_send_handler)(struct ib_mad_reg *mad_reg, struct ib_mad_msg *msg); typedef void (*ib_mad_recv_handler)(struct ib_mad_reg *mad_reg, struct ib_mad_msg *msg); struct ib_mad_reg { struct ib_device *device; struct ib_qp *qp; ib_mad_recv_handler recv_handler; ib_mad_send_handler send_handler; void *context; }; struct ib_mad_msg { struct ib_mad_msg *next; void *buf; int length; /* send_context is set on a receive to context on matching send */ void *send_context; void *context; int timeout_ms; enum ib_wc_status_t status; u32 remote_qp; u32 remote_q_key; u16 remote_lid; u16 pkey_index; u8 service_level; u8 path_bits; u8 global_route; }; /* * mgmt_class, mgmt_class_version, and method_array are only * required if the user wishes to receive unsolicited MADs */ struct ib_mad_reg *ib_mad_reg_class(struct ib_device *device, u8 port, enum ib_qp_type qp_type, u8 mgmt_class, u8 mgmt_class_version, u8 *method_array, ib_mad_send_handler send_handler, ib_mad_recv_handler recv_handler, void *context); int ib_mad_dereg_class(struct ib_mad_reg *mad_reg); int ib_mad_post_send_msg(struct ib_mad_reg *mad_reg, struct ib_mad_msg *msg); /* Proposed redirection support. 
*/ struct ib_mad_reg *ib_qp_redir(struct ib_qp *qp, ib_mad_send_handler send_handler, ib_mad_recv_handler recv_handler, void *context); int ib_process_wc(struct ib_mad_reg *mad_reg, struct ib_wc *wc); #endif /* IB_MAD_H */ From halr at voltaire.com Tue Aug 3 20:41:06 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 03 Aug 2004 23:41:06 -0400 Subject: [openib-general] [PATCH] GSI: Fix addr_hndl_attr initialization in gsi_post_send_mad Message-ID: <1091590867.1485.9.camel@localhost.localdomain> Fix addr_hndl_attr initialization in gsi_post_send_mad Index: gsi_main.c =================================================================== --- gsi_main.c (revision 572) +++ gsi_main.c (working copy) @@ -1851,7 +1851,6 @@ { struct ib_ah_attr addr_vec; struct ib_ah *addr_hndl; - struct ib_ah_attr addr_hndl_attr; u32 rkey; struct ib_send_wr wr; struct ib_send_wr *bad_wr; @@ -1871,7 +1870,7 @@ goto error1; } #else - addr_hndl = ib_create_ah(hca->pd, &addr_hndl_attr); + addr_hndl = ib_create_ah(hca->pd, &addr_vec); if (IS_ERR(addr_hndl)) { printk(KERN_ERR "Could not create address handle\n"); ret = PTR_ERR(addr_hndl); @@ -1984,7 +1983,6 @@ { struct ib_ah_attr addr_vec; struct ib_ah *addr_hndl; - struct ib_ah_attr addr_hndl_attr; u32 rkey; struct ib_send_wr wr; struct ib_send_wr *bad_wr; @@ -2014,7 +2012,7 @@ goto error1; } #else - addr_hndl = ib_create_ah(hca->pd, &addr_hndl_attr); + addr_hndl = ib_create_ah(hca->pd, &addr_vec); if (IS_ERR(addr_hndl)) { printk(KERN_ERR "Could not create address handle\n"); ret = PTR_ERR(addr_hndl); From halr at voltaire.com Tue Aug 3 21:17:38 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 04 Aug 2004 00:17:38 -0400 Subject: [openib-general] Client Reregistration Status Message-ID: <1091593059.1485.21.camel@localhost.localdomain> Hi, Here's an update on client reregistration for SM/SA: A PortInfo bit to indicate this has been proposed. GS classes require a different solution. 
A possible direction for a GS class solution was discussed. Also, the following questions have been asked of the AWG (Application Working Group) chair: - whether GS manager restart is already resolved - and, if not, whether solving it is pressing enough to attempt a solution for 1.2. There was no quorum at yesterday's MgtWG meeting, so time is getting tight to complete the work to get this into 1.2, but I'm still hopeful. -- Hal From ted at topspin.com Tue Aug 3 23:55:54 2004 From: ted at topspin.com (Ted Wilcox) Date: Tue, 3 Aug 2004 23:55:54 -0700 Subject: [openib-general] [ANNOUNCE] GSI Implementation Candidate In-Reply-To: <003101c478a9$9e7e4ce0$6401a8c0@comcast.net> References: <025a01c47833$096cb180$6401a8c0@comcast.net> <52n01dpp9e.fsf@topspin.com> <006a01c478a5$a8337ca0$6401a8c0@comcast.net> <52fz75pnbd.fsf@topspin.com> <003101c478a9$9e7e4ce0$6401a8c0@comcast.net> Message-ID: <20040804065554.GA24198@topspin.com> On Mon, Aug 02, 2004 at 11:58:54AM -0400, Hal Rosenstock wrote: > Roland Dreier wrote: > > Hal> I agree in general with this but: There is no requirement to > > Hal> send with GRH (only to receive with GRH) (until multisubnet > > Hal> is supported which is currently an incomplete aspect of IBA). > > Hal> I have not added this one into the TODO (at least yet)... > > > > Don't the current set of IB compliance tests try sending a request > > with GRH (and expect a response with GRH)? If you can receive > > requests with GRHs, then the response generation rules require the > > response to be sent with a GRH as well. > > Can't speak to the current set of IB compliance tests without doing > some homework on them, but you are right that there is an extra > (C13-52.1.1) compliance statement for MAD responses which requires > a response with GRH to be sent if a request with GRH was > received. I will update the TODO list with this. The compliance tests do check this, at least as of the last plugfest. It's good to have on the list. -Ted. 
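Sean's proposed ib_mad_reg_class() earlier in this thread routes unsolicited MADs to consumers by management class, class version, and method. A minimal user-space sketch of that dispatch policy follows; the struct layout, table, and names here are illustrative assumptions for discussion, not the proposed kernel code:

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the proposed registration struct. */
struct mad_reg {
    unsigned char mgmt_class;
    unsigned char class_version;
    unsigned char methods[16];    /* methods this client accepts */
    int nmethods;
    void (*recv_handler)(struct mad_reg *reg, const void *mad);
};

#define MAX_REGS 8
static struct mad_reg *regs[MAX_REGS];
static int nregs;

static int mad_register(struct mad_reg *reg)
{
    if (nregs == MAX_REGS)
        return -1;
    regs[nregs++] = reg;
    return 0;
}

/* Route an incoming unsolicited MAD to the one matching registration. */
static struct mad_reg *mad_dispatch(unsigned char mgmt_class,
                                    unsigned char class_version,
                                    unsigned char method)
{
    for (int i = 0; i < nregs; i++) {
        struct mad_reg *r = regs[i];
        if (r->mgmt_class != mgmt_class || r->class_version != class_version)
            continue;
        for (int m = 0; m < r->nmethods; m++)
            if (r->methods[m] == method)
                return r;
    }
    return NULL;    /* no consumer registered: drop the MAD */
}
```

Under this model the "one consumer per MAD" assumption Roland questioned is explicit: the first matching (class, version, method) registration wins, and anything else would need a separate snoop path.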
From ram at mellanox.co.il Wed Aug 4 01:18:07 2004 From: ram at mellanox.co.il (Ram Izhaki) Date: Wed, 4 Aug 2004 11:18:07 +0300 Subject: [openib-general] IB Core compile problem on AS3.0 Message-ID: <506C3D7B14CDD411A52C00025558DED603E09FEB@mtlex01.yok.mtl.com> Roland, Can you help me here? I am getting this compile error both on RH EL 3.0 and RH 9.0: make -C core modules make[2]: Entering directory `/usr/src/linux-2.4.21-15.EL/drivers/infiniband/core' perl generate_pkt_access.pl access smp_packet.desc > smp_access.c || ( rm -f smp_access.c && exit 1 ) perl generate_pkt_access.pl header smp_packet.desc > smp_access.h || ( rm -f smp_access.h && exit 1 ) perl generate_pkt_access.pl type smp_packet.desc > smp_types.h || ( rm -f smp_types.h && exit 1 ) perl generate_pkt_access.pl header pm_packet.desc > pm_access.h || ( rm -f pm_access.h && exit 1 ) perl generate_pkt_access.pl type pm_packet.desc > pm_types.h || ( rm -f pm_types.h && exit 1 ) gcc -D__KERNEL__ -I/usr/src/linux-2.4.21-15.EL/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -Wno-unused -fomit-frame-pointer -pipe -freorder-blocks -mpreferred-stack-boundary=2 -march=i686 -DMODULE -DMODVERSIONS -include /usr/src/linux-2.4.21-15.EL/include/linux/modversions.h -I/usr/src/linux-2.4.21-15.EL/drivers/infiniband/include -I/usr/src/linux-2.4.21-15.EL/drivers/infiniband/ulp/ipoib -DIN_TREE_BUILD -DTS_HOST_DRIVER -D_NO_DATA_PATH_TRACE -nostdinc -iwithprefix include -DKBUILD_BASENAME=smp_access -c -o smp_access.o smp_access.c In file included from /usr/src/linux-2.4.21-15.EL/include/linux/prefetch.h:13, from /usr/src/linux-2.4.21-15.EL/include/linux/list.h:6, from /usr/src/linux-2.4.21-15.EL/drivers/infiniband/include/ts_ib_core_types.h:29 , from smp_types.h:9, from smp_access.h:9, from smp_access.c:6: /usr/src/linux-2.4.21-15.EL/include/asm/processor.h:61: warning: parameter names (without types) in function declaration /usr/src/linux-2.4.21-15.EL/include/asm/processor.h:61: field 
`loops_per_jiffy_R_ver_str' declared as a function /usr/src/linux-2.4.21-15.EL/include/asm/processor.h:84: invalid suffix on integer constant /usr/src/linux-2.4.21-15.EL/include/asm/processor.h:84: syntax error before numeric constant /usr/src/linux-2.4.21-15.EL/include/asm/processor.h:84: warning: function declaration isn't a prototype /usr/src/linux-2.4.21-15.EL/include/asm/processor.h:268: invalid suffix on integer constant /usr/src/linux-2.4.21-15.EL/include/asm/processor.h:268: syntax error before numeric constant /usr/src/linux-2.4.21-15.EL/include/asm/processor.h:268: warning: function declaration isn't a prototype /usr/src/linux-2.4.21-15.EL/include/asm/processor.h:272: warning: parameter names (without types) in function declaration In file included from /usr/src/linux-2.4.21-15.EL/include/linux/spinlock.h:56, from /usr/src/linux-2.4.21-15.EL/drivers/infiniband/include/ts_ib_core_types.h:30 , from smp_types.h:9, from smp_access.h:9, from smp_access.c:6: /usr/src/linux-2.4.21-15.EL/include/asm/spinlock.h:9: invalid suffix on integer constant /usr/src/linux-2.4.21-15.EL/include/asm/spinlock.h:9: syntax error before numeric constant /usr/src/linux-2.4.21-15.EL/include/asm/spinlock.h:10: `printk_R_ver_str' declared as function returning a function /usr/src/linux-2.4.21-15.EL/include/asm/spinlock.h:10: warning: function declaration isn't a prototype . . . . Thanks, Ram. Ram Weiss-Izhaki Project manager ---------------------------------------- Mellanox Technologies LTD Phone +972-4-9097200 ext: 239; Fax: +972-4-9593245 Mobile: +972-52-4559412 P.O.B 586, Yokne'am illit 20692, Israel. ---------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ram at mellanox.co.il Wed Aug 4 01:20:53 2004 From: ram at mellanox.co.il (Ram Izhaki) Date: Wed, 4 Aug 2004 11:20:53 +0300 Subject: [openib-general] IB Core compile problem on AS3.0 Message-ID: <506C3D7B14CDD411A52C00025558DED603E09FED@mtlex01.yok.mtl.com> Also, This happens only on x86 and not on Opteron. Ram. -----Original Message----- From: Ram Izhaki [mailto:ram at mellanox.co.il] Sent: Wednesday, August 04, 2004 11:18 AM To: 'openib-general at openib.org' Subject: [openib-general] IB Core compile problem on AS3.0 Roland, Can you help me here? I am getting this compile error both on RH EL 3.0 and RH 9.0: [...] Thanks, Ram. Ram Weiss-Izhaki Project manager ---------------------------------------- Mellanox Technologies LTD Phone +972-4-9097200 ext: 239; Fax: +972-4-9593245 Mobile: +972-52-4559412 P.O.B 586, Yokne'am illit 20692, Israel. 
---------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From mst at mellanox.co.il Wed Aug 4 01:41:38 2004 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 4 Aug 2004 11:41:38 +0300 Subject: [openib-general] ib_req_ncomp_notif in core_ layer Message-ID: <20040804084138.GA29136@mellanox.co.il> Hello! I am looking at the ib core gen2 code now, and I am puzzled by the fact that some functions just seem to do a check and then call another one. Take the following example: int ib_req_ncomp_notif(struct ib_cq *cq, int wc_cnt) { return cq->device->req_ncomp_notif ? cq->device->req_ncomp_notif(cq, wc_cnt) : -ENOSYS; } Why can't all users just do device->req_ncomp_notif(cq, wc_cnt) directly? It's true that req_ncomp_notif could be NULL, but this could be fixed, instead, by modifying it to a reasonable default when the device is initialised. Same goes for ib_resize_cq and some others. Thanks, MST From ram at mellanox.co.il Wed Aug 4 03:47:25 2004 From: ram at mellanox.co.il (Ram Izhaki) Date: Wed, 4 Aug 2004 13:47:25 +0300 Subject: [openib-general] IB Core compile problem on AS3.0 Message-ID: <506C3D7B14CDD411A52C00025558DED603E09FEE@mtlex01.yok.mtl.com> Roland, I managed to solve the problem. Please ignore the mail thread. Thanks, Ram. -----Original Message----- From: Ram Izhaki [mailto:ram at mellanox.co.il] Sent: Wednesday, August 04, 2004 11:21 AM To: 'openib-general at openib.org' Subject: RE: [openib-general] IB Core compile problem on AS3.0 Also, This happens only on x86 and not on Opteron. Ram. -----Original Message----- From: Ram Izhaki [mailto:ram at mellanox.co.il] Sent: Wednesday, August 04, 2004 11:18 AM To: 'openib-general at openib.org' Subject: [openib-general] IB Core compile problem on AS3.0 Roland, Can you help me here? 
I am getting this compile error both on RH EL 3.0 and RH 9.0: [...] Thanks, Ram. Ram Weiss-Izhaki Project manager ---------------------------------------- Mellanox Technologies LTD Phone +972-4-9097200 ext: 239; Fax: +972-4-9593245 Mobile: +972-52-4559412 P.O.B 586, Yokne'am illit 20692, Israel. ---------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From gdror at mellanox.co.il Wed Aug 4 03:59:40 2004 From: gdror at mellanox.co.il (Dror Goldenberg) Date: Wed, 4 Aug 2004 13:59:40 +0300 Subject: [openib-general] IPoIB IETF WG presentation updated again Message-ID: <506C3D7B14CDD411A52C00025558DED60585BFB7@mtlex01.yok.mtl.com> > From: Roland Dreier [mailto:roland at topspin.com] > Sent: Tuesday, August 03, 2004 12:52 AM > > > Dror> I think that ideally, if a network device can replace the > Dror> ARP functionality in the kernel that'll be better. 
Because > Dror> this way the IPoIB can get an address resolution request > Dror> from the IP stack, handle it by sending an ARP, then SA > Dror> query for the path record, then creation of HCA address > Dror> handle, and then place it in cache and pass back this > Dror> address handle. When cache is replaced or expires, IPoIB > Dror> will destroy the HCA address handle. If this is not > Dror> supported, then IPoIB will still need to maintain a shadow > Dror> table. > > I don't think the networking maintainers will have much desire to see > pluggable ARP implementations. However it may be possible to use the > hard_header_cache() methods to handle the address vector stuff (I > haven't figured out if this can be made to work, this is just a vague > idea right now). > What you're saying is that we'll have to maintain a shadow cache for the ARP table anyway. And that this table will be loosely synchronized with the OS ARP table (because you don't know when ARP table entry is invalidated, right?). I can understand why the kernel maintainers would rather not have a pluggable ARP cache, but the way IPoIB works today with the shadow ARP table doesn't look clean to me. Maybe it's possible to get a callback when the ARP table entry is invalidated ? BTW, I am not sure I understand how hard_header_cache() in net_device works... > Dror> Beyond that, it'll be nice if we could have gotten the IP > Dror> datagram without the "Ethernet" header. Currently the IPoIB > Dror> driver has to chop it, and replace it with the IPoIB > Dror> encapsulation header. Anyway, this is just the purity of the > Dror> protocol  stack layering. > > Not sure where this is coming from -- the Linux kernel networking core > does not put an ethernet header in a packet. The network device's > hard_header method can do whatever it wants to set up the packet. > Thanks. I wasn't aware of being able to override the default implementation of hard_header in net_device. 
So, I think I now understand how one would solve this cleanly. I looked at gen1 code and saw that the code for hard_header is currently doing what Ethernet does. That was probably why I got confused in the first place... -Dror -------------- next part -------------- An HTML attachment was scrubbed... URL: From halr at voltaire.com Wed Aug 4 04:13:17 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 04 Aug 2004 07:13:17 -0400 Subject: [openib-general] ib_req_ncomp_notif in core_ layer References: <20040804084138.GA29136@mellanox.co.il> Message-ID: <005c01c47a14$0c097ee0$6401a8c0@comcast.net> Michael S. Tsirkin wrote: > Hello! > I am looking at the ib core gen2 code now, and I am puzzled by > the fact that some functions just seem to do a check and then call > another one. Take the following example: > > int ib_req_ncomp_notif(struct ib_cq *cq, int wc_cnt) > { > return cq->device->req_ncomp_notif ? > cq->device->req_ncomp_notif(cq, wc_cnt) : > -ENOSYS; > } > > > Why can't all users just do device->req_ncomp_notif(cq, wc_cnt) > directly? They can (nothing stops them) but since these are currently optional functions, consumers would need to check that the driver supplied that function. > It's true that req_ncomp_notif could be NULL, but this could > be fixed, instead, by modifying it to a reasonable default when > the device is initialised. > > Same goes for ib_resize_cq and some others. Yes, they all could be done by the driver with error codes always returned for optional routines whose features aren't implemented. That is an alternative method of doing this. 
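The two conventions being weighed here can be put side by side: the NULL-checked wrapper the code uses today versus installing a default stub at device-init time so callers always go through the pointer directly. A small user-space model, with illustrative names rather than the actual ib_verbs code:

```c
#include <errno.h>
#include <stddef.h>

struct cq;

/* Device vtable with an optional operation, as in struct ib_device. */
struct device_ops {
    int (*req_ncomp_notif)(struct cq *cq, int wc_cnt);
};

struct cq {
    struct device_ops *ops;
};

/* Pattern 1 (current code): wrapper NULL-checks on every call. */
static int req_ncomp_notif_checked(struct cq *cq, int wc_cnt)
{
    return cq->ops->req_ncomp_notif ?
        cq->ops->req_ncomp_notif(cq, wc_cnt) : -ENOSYS;
}

/* Pattern 2 (Michael's suggestion): a default stub installed once,
 * at device registration, so the pointer is never NULL afterwards. */
static int req_ncomp_notif_default(struct cq *cq, int wc_cnt)
{
    (void)cq;
    (void)wc_cnt;
    return -ENOSYS;
}

static void device_ops_init(struct device_ops *ops)
{
    if (!ops->req_ncomp_notif)
        ops->req_ncomp_notif = req_ncomp_notif_default;
}
```

Both return -ENOSYS for a missing operation; the trade is one branch per call against one fixup pass per device at init.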
-- Hal From gdror at mellanox.co.il Wed Aug 4 04:15:57 2004 From: gdror at mellanox.co.il (Dror Goldenberg) Date: Wed, 4 Aug 2004 14:15:57 +0300 Subject: [openib-general] IPoIB IETF WG presentation updated again Message-ID: <506C3D7B14CDD411A52C00025558DED60585BFB9@mtlex01.yok.mtl.com> >-----Original Message----- >From: Hal Rosenstock [mailto:halr at voltaire.com] >Sent: Tuesday, August 03, 2004 5:13 PM >Hi Dror, > >> It's good to see that MAX_ADDR_LEN has been changed to 32. Does that solve >> all the IPoIB ARP related problems for 2.6 kernel ? Can we store all related link >> information in this 32 bytes ? What is envisioned to be stored in this 32 bytes - is >> it just the QPN+GID, or the entire path info, or the address vector object too ? > >If it holds 32 bytes, then it can hold GID + QPN with 13 bytes still available. > >Other information you might want to hold: >SL 1 byte >LID 2 bytes >MTU (for connected mode) (1 byte) >Rate (1 byte) >Network Layer >Flow Label 3 bytes (20 bits) >Hop Limit 1 byte >TClass 1 byte > Source Path bits (1 byte) ? >So all the info for an AV could be stored there. Did I miss something needed ? I didn't >double check this but there is still some room left over. This is the information for the AV. However, you don't want to create the AV for each packet that you send. Although in Mellanox devices the creation of AVs is relatively inexpensive (compared with other resources like QP, CQ), I don't think that it's the right way to go. I think that the right way to go is to store the AV handle in the ARP table as part of the HW address. And here comes the problem... If you're not notified of ARP table entry invalidation, then you cannot really destroy the AV handle. So you now have to maintain some state yourself. And I was wondering if there is a way to solve that. > >> I think that ideally, if a network device can replace the ARP functionality in the kernel >> that'll be better. 
Because this way the IPoIB can get an address resolution request >> from the IP stack, handle it by sending an ARP, then SA query for the path record, then >> creation of HCA address handle, and then place it in cache and pass back this address >> handle. When cache is replaced or expires, IPoIB will destroy the HCA address handle. >> If this is not supported, then IPoIB will still need to maintain a shadow table. > >Cloning an AH is probably faster than creating a new one from scratch. (We would need >an additional verb for this). How much does this cost ? Is this optimization worth it ? In Tavor the creation/modification of AV is relatively inexpensive. It involves writing the AV information to the attached DRAM or to the main memory. For userland applications, it may be more complicated in some modes of operation. Writing AV to DRAM involves PIO writes, which you probably want to avoid on the datapath. So, you'd better have an AV ready and persistent for each neighbor, as long as your cache keeps entries long enough. BTW, I don't think that modification will cost much less than creating a new one. > >> Beyond that, it'll be nice if we could have gotten the IP datagram without the "Ethernet" >> header. Currently the IPoIB driver has to chop it, and replace it with the IPoIB encapsulation >> header. Anyway, this is just the purity of the protocol stack layering. > >There would need to be another way to identify the various protocols (aka ethertypes) being >carried. > Yes. That was what Roland mentioned. Implement hard_header. -Dror -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From roland at topspin.com Tue Aug 3 22:17:58 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 03 Aug 2004 22:17:58 -0700 Subject: [openib-general] BOF at Linuxworld In-Reply-To: <1091556191.12096.5.camel@localhost> (Tom Duffy's message of "Tue, 03 Aug 2004 11:03:11 -0700") References: <1091556191.12096.5.camel@localhost> Message-ID: <52d627jxbt.fsf@topspin.com> Tom, Greg, Grant and I had a mini-BOF at Linuxworld today. I got a lot of good ideas on device model/sysfs support that I'll try to write up in semi-coherent form soon... - R. From roland at topspin.com Tue Aug 3 22:53:08 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 03 Aug 2004 22:53:08 -0700 Subject: [openib-general] device classes Message-ID: <524qnjjvp7.fsf@topspin.com> Today we decided that upper-layer protocols like IPoIB should be classes (in the sense of the word ;). However, as I start to try and figure things out, I don't see how a class gets hotplug-type notifications when devices appear or disappear. Am I missing something, or does the IB core have to implement something where classes like IPoIB register with the core, and then the core creates class_device instances for every registered class when devices are added? - R. From roland at topspin.com Tue Aug 3 21:19:39 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 03 Aug 2004 21:19:39 -0700 Subject: [openib-general] GSI compromise In-Reply-To: <20040803173643.43143cdd.mshefty@ichips.intel.com> (Sean Hefty's message of "Tue, 3 Aug 2004 17:36:43 -0700") References: <35EA21F54A45CB47B879F21A91F4862F18AA8F@taurus.voltaire.com> <52ekmrr5o5.fsf@topspin.com> <20040803173643.43143cdd.mshefty@ichips.intel.com> Message-ID: <52llgv7cx0.fsf@topspin.com> This looks pretty sane to me -- I really like the proposed way of handling redirection. ...just a few questions/comments/nitpicks: - what callback will a consumer receive if no response is received before a send timeout? - where does transaction ID allocation happen? 
- how does the MAD layer know how many elements are in the method_array parameter of ib_mad_reg_class()? - where is the GRH info stored if ib_mad_msg.global_route is set? - should we use a "struct list_head" instead of our own type for ib_mad_msg.next? - I assume we'll have helpers for parsing ClassPortInfo redirection responses and updating a message. - are any values other than IB_QPT_SMI and IB_QPT_GSI valid for the qp_type parameter of ib_mad_reg_class()? - should ib_qp_redir()/ib_process_wc() have "_mad_" in their name to give more of a clue as to what they do? - are you thinking that a single "support RMPP" flag would be passed into ib_mad_reg_class() and ib_qp_redir()? Thanks, Roland From roland at topspin.com Tue Aug 3 07:23:35 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 03 Aug 2004 07:23:35 -0700 Subject: [openib-general] IPoIB IETF WG presentation updated again In-Reply-To: <003f01c47963$ee56b6c0$6401a8c0@comcast.net> (Hal Rosenstock's message of "Tue, 03 Aug 2004 10:12:36 -0400") References: <506C3D7B14CDD411A52C00025558DED60585BDB2@mtlex01.yok.mtl.com> <003f01c47963$ee56b6c0$6401a8c0@comcast.net> Message-ID: <528ycwmhaw.fsf@topspin.com> Hal> Cloning an AH is probably faster than creating a new one from Hal> scratch. (We would need an additional verb for this). How Hal> much does this cost ? Is this optimization worth it ? Actually cloning an AH is probably slower on Tavor than creating one from scratch, at least if the AH is in Tavor memory. To clone an AH one has to do a bunch of reads across PCI and then write the data back across PCI, while creating a new one is just a matter of taking the address attributes (which are almost certainly already in CPU cache) and writing them across PCI. (Reads across PCI are much more expensive than writes, which can be posted and even write-combined) Obviously other HCAs may perform differently. 
- Roland From roland at topspin.com Tue Aug 3 21:23:24 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 03 Aug 2004 21:23:24 -0700 Subject: [openib-general] [ANNOUNCE] GSI Implementation Candidate In-Reply-To: <506C3D7B14CDD411A52C00025558DED60585BE63@mtlex01.yok.mtl.com> (Dror Goldenberg's message of "Tue, 3 Aug 2004 13:46:17 +0300") References: <506C3D7B14CDD411A52C00025558DED60585BE63@mtlex01.yok.mtl.com> Message-ID: <52hdrj7cqr.fsf@topspin.com> Dror> It is not possible to turn off PD check on a MR in Dror> Tavor. You will have to create an MR for each PD that you Dror> need. One way to go is to map all kernel apps to the same Dror> PD, but you probably don't want to do such a thing. It's no big deal to create one more MR with translation off for each kernel PD (in fact mthca does it already for access to address vectors). So this is a perfectly reasonable solution, it just means our API can't perfectly match the verbs extensions. Dror> Once we get to verbs extensions, you can configure Arbel to Dror> disable the PD check for certain QPs. Yes, I see how Arbel has full support for the base memory management extensions including reserved L_Key. I would assume we will allow use of reserved L_Key for all kernel QPs (and no userspace QPs). Thanks, Roland From roland at topspin.com Tue Aug 3 21:09:23 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 03 Aug 2004 21:09:23 -0700 Subject: [openib-general] [PATCH] [FIXED] Fix underlying problem with using typedefs In-Reply-To: <1091212925.2772.1148.camel@localhost> (Tom Duffy's message of "Fri, 30 Jul 2004 11:42:05 -0700") References: <1091209468.2772.1081.camel@localhost> <20040730180353.GA29306@kroah.com> <1091212925.2772.1148.camel@localhost> Message-ID: <52pt677de4.fsf@topspin.com> Thanks, I finally committed this. - R. 
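Roland's GSI-compromise questions earlier in the thread ask where transaction-ID allocation happens and what a consumer sees when no response arrives before the send timeout. One way an access layer could handle both, sketched in user-space C; the table, names, and sweep policy are assumptions for illustration, not proposed code:

```c
#include <stdint.h>
#include <stddef.h>

/* One pending request; timeout_ms counts down on each sweep tick. */
struct pending {
    uint64_t tid;
    int timeout_ms;
    int in_use;
    void *send_context;    /* handed back to the client on completion */
};

#define MAX_PENDING 32
static struct pending table[MAX_PENDING];
static uint64_t next_tid = 1;

/* The access layer allocates the TID when the request is posted. */
static uint64_t post_request(void *send_context, int timeout_ms)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!table[i].in_use) {
            table[i] = (struct pending){ next_tid, timeout_ms, 1, send_context };
            return next_tid++;
        }
    }
    return 0;    /* table full */
}

/* A response matches by TID; returns the original send_context or NULL. */
static void *match_response(uint64_t tid)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (table[i].in_use && table[i].tid == tid) {
            table[i].in_use = 0;
            return table[i].send_context;
        }
    }
    return NULL;    /* unsolicited, or already timed out */
}

/* Periodic sweep: returns how many requests expired on this tick.
 * A real layer would invoke the send_handler with a timeout status here. */
static int sweep(int elapsed_ms)
{
    int expired = 0;
    for (int i = 0; i < MAX_PENDING; i++) {
        if (table[i].in_use && (table[i].timeout_ms -= elapsed_ms) <= 0) {
            table[i].in_use = 0;
            expired++;
        }
    }
    return expired;
}
```

In this model a timed-out send completes through the send handler with a timeout status, and a late response then falls through match_response() as unsolicited, which answers the first of Roland's questions about callbacks.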
From roland at topspin.com Tue Aug 3 22:39:47 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 03 Aug 2004 22:39:47 -0700 Subject: [openib-general] Update from merging in Roland's changes... In-Reply-To: <20040803102508.08c8f643.mshefty@ichips.intel.com> (Sean Hefty's message of "Tue, 3 Aug 2004 10:25:08 -0700") References: <20040730151713.52bc6b55.mshefty@ichips.intel.com> <20040730153253.038876fe.mshefty@ichips.intel.com> <20040803102508.08c8f643.mshefty@ichips.intel.com> Message-ID: <528ycvjwbg.fsf@topspin.com> This patch (committed to my branch) syncs back up with the changes to the CQ and MR API. Next on my list is the AH API. - R. Index: src/linux-kernel/infiniband/ulp/ipoib/ipoib_verbs.c =================================================================== --- src/linux-kernel/infiniband/ulp/ipoib/ipoib_verbs.c (revision 545) +++ src/linux-kernel/infiniband/ulp/ipoib/ipoib_verbs.c (working copy) @@ -196,7 +196,6 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca) { struct ipoib_dev_priv *priv = dev->priv; - int entries; priv->pd = ib_alloc_pd(priv->ca); if (IS_ERR(priv->pd)) { @@ -205,15 +204,15 @@ return -ENODEV; } - entries = IPOIB_TX_RING_SIZE + IPOIB_RX_RING_SIZE + 1; - priv->cq = ib_create_cq(priv->ca, ipoib_ib_completion, dev, &entries); + priv->cq = ib_create_cq(priv->ca, ipoib_ib_completion, dev, + IPOIB_TX_RING_SIZE + IPOIB_RX_RING_SIZE + 1); if (IS_ERR(priv->cq)) { TS_REPORT_FATAL(MOD_IB_NET, "%s: failed to create CQ", dev->name); goto out_free_pd; } TS_TRACE(MOD_IB_NET, T_VERBOSE, TRACE_IB_NET_GEN, - "%s: CQ with %d entries", dev->name, entries); + "%s: CQ with %d entries", dev->name, priv->cq->cqe); if (ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP)) goto out_free_cq; @@ -225,19 +224,19 @@ .size = (unsigned long) high_memory - PAGE_OFFSET }; uint64_t dummy_iova = 0; - u32 rkey; priv->mr = ib_reg_phys_mr(priv->pd, &buffer_list, 1, /* list_len */ IB_MR_LOCAL_WRITE, - &dummy_iova, - &priv->lkey, &rkey); + 
&dummy_iova); if (IS_ERR(priv->mr)) { TS_REPORT_FATAL(MOD_IB_NET, "%s: ib_memory_register_physical failed", dev->name); goto out_free_cq; } + + priv->lkey = priv->mr->lkey; } return 0; Index: src/linux-kernel/infiniband/ulp/srp/srp_host.c =================================================================== --- src/linux-kernel/infiniband/ulp/srp/srp_host.c (revision 545) +++ src/linux-kernel/infiniband/ulp/srp/srp_host.c (working copy) @@ -529,8 +529,6 @@ int pkt_num; int max_num_pkts; srp_host_hca_params_t *hca; - int cq_entries; - int requested_cq_entries; int hca_index; /* allocate twice as many packets, send and receive */ @@ -563,22 +561,19 @@ if (hca->valid == FALSE) break; - cq_entries = MAX_SEND_WQES; - requested_cq_entries = cq_entries; - target->cqs_hndl[hca_index] = ib_create_cq(hca->ca_hndl, - cq_send_event, - target, - &cq_entries); + cq_send_event, + target, + MAX_SEND_WQES); if (IS_ERR(target->cqs_hndl[hca_index]) || - cq_entries < requested_cq_entries) { + target->cqs_hndl[hca_index]->cqe < MAX_SEND_WQES) { TS_REPORT_FATAL(MOD_SRPTP, "Send completion queue " "creation failed: %d asked " "for %d entries", - cq_entries, - requested_cq_entries); + target->cqs_hndl[hca_index]->cqe, + MAX_SEND_WQES); target->cqs_hndl[hca_index] = NULL; goto CQ_MR_FAIL; } @@ -586,16 +581,13 @@ if (ib_req_notify_cq(target->cqs_hndl[hca_index], IB_CQ_NEXT_COMP)) goto CQ_MR_FAIL; - cq_entries = MAX_RECV_WQES; - requested_cq_entries = cq_entries; - target->cqr_hndl[hca_index] = ib_create_cq(hca->ca_hndl, cq_recv_event, target, - &cq_entries); + MAX_RECV_WQES); if (IS_ERR(target->cqs_hndl[hca_index]) || - cq_entries < requested_cq_entries) { + target->cqr_hndl[hca_index]->cqe < MAX_RECV_WQES) { TS_REPORT_FATAL(MOD_SRPTP, "Recv completeion queue " "creation failed"); @@ -616,9 +608,7 @@ 1, /*list_len */ IB_MR_LOCAL_WRITE | IB_MR_REMOTE_READ, - &iova, - &target->l_key[hca_index], - &target->r_key[hca_index]); + &iova); if (IS_ERR(target->srp_pkt_data_mhndl[hca_index])) { 
TS_REPORT_FATAL(MOD_SRPTP, @@ -627,6 +617,9 @@ target->srp_pkt_data_mhndl[hca_index] = NULL; goto CQ_MR_FAIL; } + + target->l_key[hca_index] = target->srp_pkt_data_mhndl[hca_index]->lkey; + target->r_key[hca_index] = target->srp_pkt_data_mhndl[hca_index]->rkey; } srp_pkt = target->srp_pkt_hdr_area; Index: src/linux-kernel/infiniband/ulp/sdp/sdp_conn.c =================================================================== --- src/linux-kernel/infiniband/ulp/sdp/sdp_conn.c (revision 545) +++ src/linux-kernel/infiniband/ulp/sdp/sdp_conn.c (working copy) @@ -1064,7 +1064,7 @@ conn->send_cq = ib_create_cq(conn->ca, sdp_cq_event_handler, (void *)(unsigned long)conn->hashent, - &conn->send_cq_size); + conn->send_cq_size); if (IS_ERR(conn->send_cq)) { TS_TRACE(MOD_LNX_SDP, T_TERSE, TRACE_FLOW_FATAL, @@ -1074,6 +1074,8 @@ goto error_scq; } + conn->send_cq_size = conn->send_cq->cqe; + result = ib_req_notify_cq(conn->send_cq, IB_CQ_NEXT_COMP); if (0 > result) { @@ -1088,7 +1090,7 @@ conn->recv_cq = ib_create_cq(conn->ca, sdp_cq_event_handler, (void *)(unsigned long)conn->hashent, - &conn->recv_cq_size); + conn->recv_cq_size); if (IS_ERR(conn->recv_cq)) { @@ -1099,6 +1101,8 @@ goto error_rcq; } + conn->recv_cq_size = conn->recv_cq->cqe; + result = ib_req_notify_cq(conn->recv_cq, IB_CQ_NEXT_COMP); if (0 > result) { @@ -1980,8 +1984,7 @@ &buffer_list, 1, /* list_len */ IB_ACCESS_LOCAL_WRITE, - &hca->iova, - &hca->l_key, &hca->r_key); + &hca->iova); if (IS_ERR(hca->mem_h)) { result = PTR_ERR(hca->mem_h); TS_TRACE(MOD_LNX_SDP, T_TERSE, TRACE_FLOW_FATAL, @@ -1989,6 +1992,10 @@ result, hca_handle, hca_count); goto error; } + + hca->l_key = hca->mem_h->lkey; + hca->r_key = hca->mem_h->rkey; + #ifdef _TS_SDP_AIO_SUPPORT /* * FMR allocation Index: src/linux-kernel/infiniband/include/ib_verbs.h =================================================================== --- src/linux-kernel/infiniband/include/ib_verbs.h (revision 545) +++ src/linux-kernel/infiniband/include/ib_verbs.h (working 
copy) @@ -145,6 +145,8 @@ struct ib_mr { struct ib_device *device; struct ib_pd *pd; + u32 lkey; + u32 rkey; atomic_t usecnt; /* count number of MWs */ }; @@ -195,9 +197,7 @@ struct ib_phys_buf *phys_buf_array, int num_phys_buf, int mr_access_flags, - u64 *iova_start, - u32 *lkey, - u32 *rkey); + u64 *iova_start); int (*query_mr)(struct ib_mr *mr, struct ib_mr_attr *mr_attr); int (*dereg_mr)(struct ib_mr *mr); @@ -207,9 +207,7 @@ struct ib_phys_buf *phys_buf_array, int num_phys_buf, int mr_access_flags, - u64 *iova_start, - u32 *lkey, - u32 *rkey); + u64 *iova_start); ib_mw_create_func mw_create; ib_mw_destroy_func mw_destroy; ib_mw_bind_func mw_bind; @@ -229,9 +227,9 @@ struct ib_cq *ib_create_cq(struct ib_device *device, ib_comp_handler comp_handler, - void *cq_context, int *cqe); + void *cq_context, int cqe); -int ib_resize_cq(struct ib_cq *cq, int *cqe); +int ib_resize_cq(struct ib_cq *cq, int cqe); int ib_destroy_cq(struct ib_cq *cq); /** @@ -272,9 +270,7 @@ struct ib_phys_buf *phys_buf_array, int num_phys_buf, int mr_access_flags, - u64 *iova_start, - u32 *lkey, - u32 *rkey); + u64 *iova_start); int ib_rereg_phys_mr(struct ib_mr *mr, int mr_rereg_mask, @@ -282,9 +278,7 @@ struct ib_phys_buf *phys_buf_array, int num_phys_buf, int mr_access_flags, - u64 *iova_start, - u32 *lkey, - u32 *rkey); + u64 *iova_start); int ib_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr); int ib_dereg_mr(struct ib_mr *mr); Index: src/linux-kernel/infiniband/core/core_cq.c =================================================================== --- src/linux-kernel/infiniband/core/core_cq.c (revision 545) +++ src/linux-kernel/infiniband/core/core_cq.c (working copy) @@ -30,17 +30,17 @@ struct ib_cq *ib_create_cq(struct ib_device *device, ib_comp_handler comp_handler, - void *cq_context, int *cqe) + void *cq_context, int cqe) { struct ib_cq *cq; - cq = device->create_cq(device, cqe); + cq = device->create_cq(device, &cqe); if (!IS_ERR(cq)) { cq->device = device; cq->comp_handler = 
comp_handler; cq->context = cq_context; - cq->cqe = *cqe; + cq->cqe = cqe; atomic_set(&cq->usecnt, 0); } @@ -57,10 +57,19 @@ } EXPORT_SYMBOL(ib_destroy_cq); -int ib_resize_cq(struct ib_cq *cq, - int *cqe) +int ib_resize_cq(struct ib_cq *cq, + int cqe) { - return cq->device->resize_cq ? cq->device->resize_cq(cq, cqe) : -ENOSYS; + int ret; + + if (!cq->device->resize_cq) + return -ENOSYS; + + ret = cq->device->resize_cq(cq, &cqe); + if (!ret) + cq->cqe = cqe; + + return ret; } EXPORT_SYMBOL(ib_resize_cq); Index: src/linux-kernel/infiniband/core/core_mr.c =================================================================== --- src/linux-kernel/infiniband/core/core_mr.c (revision 545) +++ src/linux-kernel/infiniband/core/core_mr.c (working copy) @@ -32,14 +32,12 @@ struct ib_phys_buf *phys_buf_array, int num_phys_buf, int mr_access_flags, - u64 *iova_start, - u32 *lkey, - u32 *rkey) + u64 *iova_start) { struct ib_mr *mr; mr = pd->device->reg_phys_mr(pd, phys_buf_array, num_phys_buf, - mr_access_flags, iova_start, lkey, rkey); + mr_access_flags, iova_start); if (!IS_ERR(mr)) { mr->device = pd->device; @@ -58,15 +56,12 @@ struct ib_phys_buf *phys_buf_array, int num_phys_buf, int mr_access_flags, - u64 *iova_start, - u32 *lkey, - u32 *rkey) + u64 *iova_start) { return mr->device->rereg_phys_mr ? 
mr->device->rereg_phys_mr(mr, mr_rereg_mask, pd, phys_buf_array, num_phys_buf, - mr_access_flags, iova_start, - lkey, rkey) : + mr_access_flags, iova_start) : -ENOSYS; } EXPORT_SYMBOL(ib_rereg_phys_mr); Index: src/linux-kernel/infiniband/core/mad_main.c =================================================================== --- src/linux-kernel/infiniband/core/mad_main.c (revision 545) +++ src/linux-kernel/infiniband/core/mad_main.c (working copy) @@ -51,25 +51,25 @@ struct ib_mr **mr, u32 *lkey) { - u32 rkey; u64 iova = 0; struct ib_phys_buf buffer_list = { .addr = 0, .size = (unsigned long) high_memory - PAGE_OFFSET }; - *mr = ib_reg_phys_mr(pd, &buffer_list, - 1, /* list_len */ - IB_MR_LOCAL_WRITE, - &iova, lkey, &rkey); - if (IS_ERR(*mr)) + *mr = ib_reg_phys_mr(pd, &buffer_list, 1, /* list_len */ + IB_MR_LOCAL_WRITE, &iova); + if (IS_ERR(*mr)) { TS_REPORT_WARN(MOD_KERNEL_IB, "ib_reg_phys_mr failed " "size 0x%016" TS_U64_FMT "x, iova 0x%016" TS_U64_FMT "x" " (return code %d)", buffer_list.size, iova, PTR_ERR(*mr)); + return PTR_ERR(*mr); + } - return IS_ERR(*mr) ? 
PTR_ERR(*mr) : 0; + *lkey = (*mr)->lkey; + return 0; } static int ib_mad_qp_create(struct ib_device *device, @@ -200,7 +200,7 @@ (IB_MAD_RECEIVES_PER_QP + IB_MAD_SENDS_PER_QP) * priv->num_port; priv->cq = ib_create_cq(device, ib_mad_completion, - device, &entries); + device, entries); if (IS_ERR(priv->cq)) { TS_REPORT_FATAL(MOD_KERNEL_IB, "Failed to allocate CQ for %s", Index: src/linux-kernel/infiniband/hw/mthca/mthca_provider.c =================================================================== --- src/linux-kernel/infiniband/hw/mthca/mthca_provider.c (revision 545) +++ src/linux-kernel/infiniband/hw/mthca/mthca_provider.c (working copy) @@ -419,9 +419,7 @@ struct ib_phys_buf *buffer_list, int num_phys_buf, int acc, - u64 *iova_start, - u32 *lkey, - u32 *rkey) + u64 *iova_start) { struct mthca_mr *mr; u64 *page_list; @@ -520,8 +518,6 @@ goto out; } - *lkey = *rkey = mr->key; - out: kfree(page_list); return (struct ib_mr *) mr; Index: src/linux-kernel/infiniband/hw/mthca/mthca_provider.h =================================================================== --- src/linux-kernel/infiniband/hw/mthca/mthca_provider.h (revision 545) +++ src/linux-kernel/infiniband/hw/mthca/mthca_provider.h (working copy) @@ -40,7 +40,6 @@ struct mthca_mr { struct ib_mr ibmr; - u32 key; int order; u32 first_seg; }; Index: src/linux-kernel/infiniband/hw/mthca/mthca_cq.c =================================================================== --- src/linux-kernel/infiniband/hw/mthca/mthca_cq.c (revision 545) +++ src/linux-kernel/infiniband/hw/mthca/mthca_cq.c (working copy) @@ -657,7 +657,7 @@ cq_context->error_eqn = cpu_to_be32(dev->eq_table.eq[MTHCA_EQ_ASYNC].eqn); cq_context->comp_eqn = cpu_to_be32(dev->eq_table.eq[MTHCA_EQ_COMP].eqn); cq_context->pd = cpu_to_be32(dev->driver_pd.pd_num); - cq_context->lkey = cpu_to_be32(cq->mr.key); + cq_context->lkey = cpu_to_be32(cq->mr.ibmr.lkey); cq_context->cqn = cpu_to_be32(cq->cqn); err = mthca_SW2HW_CQ(dev, cq_context, cq->cqn, &status); Index: 
src/linux-kernel/infiniband/hw/mthca/mthca_eq.c =================================================================== --- src/linux-kernel/infiniband/hw/mthca/mthca_eq.c (revision 545) +++ src/linux-kernel/infiniband/hw/mthca/mthca_eq.c (working copy) @@ -408,7 +408,7 @@ MTHCA_KAR_PAGE); eq_context->pd = cpu_to_be32(dev->driver_pd.pd_num); eq_context->intr = intr; - eq_context->lkey = cpu_to_be32(eq->mr.key); + eq_context->lkey = cpu_to_be32(eq->mr.ibmr.lkey); err = mthca_SW2HW_EQ(dev, eq_context, eq->eqn, &status); if (err) { Index: src/linux-kernel/infiniband/hw/mthca/mthca_av.c =================================================================== --- src/linux-kernel/infiniband/hw/mthca/mthca_av.c (revision 521) +++ src/linux-kernel/infiniband/hw/mthca/mthca_av.c (working copy) @@ -77,7 +77,7 @@ av = ah->av; } - ah->key = pd->ntmr.key; + ah->key = pd->ntmr.ibmr.lkey; memset(av, 0, MTHCA_AV_SIZE); Index: src/linux-kernel/infiniband/hw/mthca/mthca_mr.c =================================================================== --- src/linux-kernel/infiniband/hw/mthca/mthca_mr.c (revision 545) +++ src/linux-kernel/infiniband/hw/mthca/mthca_mr.c (working copy) @@ -118,14 +118,15 @@ might_sleep(); mr->order = -1; - mr->key = mthca_alloc(&dev->mr_table.mpt_alloc); - if (mr->key == -1) + mr->ibmr.lkey = mthca_alloc(&dev->mr_table.mpt_alloc); + if (mr->ibmr.lkey == -1) return -ENOMEM; + mr->ibmr.rkey = mr->ibmr.lkey; mailbox = kmalloc(sizeof *mpt_entry + MTHCA_CMD_MAILBOX_EXTRA, GFP_KERNEL); if (!mailbox) { - mthca_free(&dev->mr_table.mpt_alloc, mr->key); + mthca_free(&dev->mr_table.mpt_alloc, mr->ibmr.lkey); return -ENOMEM; } mpt_entry = MAILBOX_ALIGN(mailbox); @@ -137,7 +138,7 @@ MTHCA_MPT_FLAG_PHYSICAL | MTHCA_MPT_FLAG_REGION); mpt_entry->page_size = 0; - mpt_entry->key = cpu_to_be32(mr->key); + mpt_entry->key = cpu_to_be32(mr->ibmr.lkey); mpt_entry->pd = cpu_to_be32(pd); mpt_entry->start = 0; mpt_entry->length = ~0ULL; @@ -146,7 +147,7 @@ sizeof *mpt_entry - offsetof(struct 
mthca_mpt_entry, lkey)); err = mthca_SW2HW_MPT(dev, mpt_entry, - mr->key & (dev->limits.num_mpts - 1), + mr->ibmr.lkey & (dev->limits.num_mpts - 1), &status); if (status) { mthca_warn(dev, "SW2HW_MPT returned status 0x%02x\n", @@ -173,9 +174,10 @@ might_sleep(); WARN_ON(buffer_size_shift >= 32); - mr->key = mthca_alloc(&dev->mr_table.mpt_alloc); - if (mr->key == -1) + mr->ibmr.lkey = mthca_alloc(&dev->mr_table.mpt_alloc); + if (mr->ibmr.lkey == -1) return -ENOMEM; + mr->ibmr.rkey = mr->ibmr.lkey; for (i = dev->limits.mtt_seg_size / 8, mr->order = 0; i < list_len; @@ -233,7 +235,7 @@ access); mpt_entry->page_size = cpu_to_be32(buffer_size_shift - 12); - mpt_entry->key = cpu_to_be32(mr->key); + mpt_entry->key = cpu_to_be32(mr->ibmr.lkey); mpt_entry->pd = cpu_to_be32(pd); mpt_entry->start = cpu_to_be64(iova); mpt_entry->length = cpu_to_be64(total_size); @@ -243,7 +245,7 @@ mr->first_seg * dev->limits.mtt_seg_size); if (0) { - mthca_dbg(dev, "Dumping MPT entry %08x:\n", mr->key); + mthca_dbg(dev, "Dumping MPT entry %08x:\n", mr->ibmr.lkey); for (i = 0; i < sizeof (struct mthca_mpt_entry) / 4; ++i) { if (i % 4 == 0) printk("[%02x] ", i * 4); @@ -254,7 +256,7 @@ } err = mthca_SW2HW_MPT(dev, mpt_entry, - mr->key & (dev->limits.num_mpts - 1), + mr->ibmr.lkey & (dev->limits.num_mpts - 1), &status); if (status) { mthca_warn(dev, "SW2HW_MPT returned status 0x%02x\n", @@ -272,7 +274,7 @@ mthca_free_mtt(dev, mr->first_seg, mr->order); err_out_mpt_free: - mthca_free(&dev->mr_table.mpt_alloc, mr->key); + mthca_free(&dev->mr_table.mpt_alloc, mr->ibmr.lkey); return err; } @@ -284,7 +286,7 @@ might_sleep(); err = mthca_HW2SW_MPT(dev, NULL, - mr->key & (dev->limits.num_mpts - 1), + mr->ibmr.lkey & (dev->limits.num_mpts - 1), &status); if (err) mthca_warn(dev, "HW2SW_MPT failed (%d)\n", err); @@ -295,7 +297,7 @@ if (mr->order >= 0) mthca_free_mtt(dev, mr->first_seg, mr->order); - mthca_free(&dev->mr_table.mpt_alloc, mr->key); + mthca_free(&dev->mr_table.mpt_alloc, mr->ibmr.lkey); } 
int __devinit mthca_init_mr_table(struct mthca_dev *dev) Index: src/linux-kernel/infiniband/hw/mthca/mthca_qp.c =================================================================== --- src/linux-kernel/infiniband/hw/mthca/mthca_qp.c (revision 545) +++ src/linux-kernel/infiniband/hw/mthca/mthca_qp.c (working copy) @@ -631,7 +631,7 @@ /* leave rdd as 0 */ qp_context->pd = cpu_to_be32(qp->pd->pd_num); /* leave wqe_base as 0 (we always create an MR based at 0 for WQs) */ - qp_context->wqe_lkey = cpu_to_be32(qp->mr.key); + qp_context->wqe_lkey = cpu_to_be32(qp->mr.ibmr.lkey); qp_context->params1 = cpu_to_be32((MTHCA_ACK_REQ_FREQ << 28) | (MTHCA_FLIGHT_LIMIT << 24) | MTHCA_QP_BIT_SRE | @@ -1086,7 +1086,7 @@ ind * MTHCA_UD_HEADER_SIZE); data->byte_count = cpu_to_be32(header_size); - data->lkey = cpu_to_be32(sqp->qp.pd->ntmr.key); + data->lkey = cpu_to_be32(sqp->qp.pd->ntmr.ibmr.lkey); data->addr = cpu_to_be64(sqp->header_dma + ind * MTHCA_UD_HEADER_SIZE); From roland at topspin.com Tue Aug 3 07:04:02 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 03 Aug 2004 07:04:02 -0700 Subject: [openib-general] gen2 dev branch In-Reply-To: <35EA21F54A45CB47B879F21A91F4862F18AB57@taurus.voltaire.com> (Yaron Haviv's message of "Tue, 3 Aug 2004 06:56:33 +0300") References: <35EA21F54A45CB47B879F21A91F4862F18AB57@taurus.voltaire.com> Message-ID: <52llgwmi7h.fsf@topspin.com> Yaron> we need however to look at the redirect messages that run Yaron> on QP1, when an active side sends a MAD and get a response Yaron> that the MAD should be redirected, the MAD should be resent Yaron> to the new QP, in the proposed implementation the GSI layer Yaron> resends the MAD to the new QP when such redirect message Yaron> arrives without involving the App, otherwise any consumer Yaron> should have provided exactly the same functionality and its Yaron> better to have it be transparent to the consumers. 
If redirect handling is done completely within the GSI layer, it seems the consumer would continue to send requests to the original destination and have every request redirected. It might be more efficient to expose the fact of redirection to the consumer so that future requests can be sent to the new destination. - Roland From mshefty at ichips.intel.com Wed Aug 4 08:01:29 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 4 Aug 2004 08:01:29 -0700 Subject: [openib-general] GSI compromise In-Reply-To: <52llgv7cx0.fsf@topspin.com> References: <35EA21F54A45CB47B879F21A91F4862F18AA8F@taurus.voltaire.com> <52ekmrr5o5.fsf@topspin.com> <20040803173643.43143cdd.mshefty@ichips.intel.com> <52llgv7cx0.fsf@topspin.com> Message-ID: <20040804080129.2749e280.mshefty@ichips.intel.com> On Tue, 03 Aug 2004 21:19:39 -0700 Roland Dreier wrote: > This looks pretty sane to me -- I really like the proposed way of > handling redirection. Thanks for reviewing this. > - what callback will a consumer receive if no response is received > before a send timeout? Hmm... maybe some sort of transaction status needs to be added to the ib_mad_msg to indicate a timeout on the send. I'm not sure we want to override the work completion status for this... > - where does transaction ID allocation happen? I was assuming with the registration call. The client TID should probably be placed inside struct ib_mad_reg. > - how does the MAD layer know how many elements are in the > method_array parameter of ib_mad_reg_class()? I was assuming a fixed size of 127 elements. (At least I think it's 127 methods per class.) This could probably be an optional argument if the registration is for all methods of a given class. > - where is the GRH info stored if ib_mad_msg.global_route is set? buf would point to the start of the MAD, including the GRH. Based on a separate mail, I'm still trying to figure out the best way to send/receive MADs without unnecessary data copies.
> - should we use a "struct list_head" instead of our own type for > ib_mad_msg.next? I'm fine with this. I followed what I had for the work requests. Maybe those should change as well? > - I assume we'll have helpers for parsing ClassPortInfo redirection > responses and updating a message. I'm good with redirection helper routines. I'd just like to get a clean layering. Btw, does anyone know if you can redirect in the middle of sending an RMPP message? > - are any values other than IB_QPT_SMI and IB_QPT_GSI valid for the > qp_type parameter of ib_mad_reg_class()? I wasn't thinking that there would be. In fact, I was considering the impact of removing ib_get_spl_qp from the exposed access layer API. I was also considering preventing calls to ib_modify_qp/ib_destroy_qp from using a QP of type SMI/GSI, but allowing calls to ib_query_qp to pass through. Thoughts? > - should ib_qp_redir()/ib_process_wc() have "_mad_" in their name to > give more of a clue as to what they do? Probably. :) > - are you thinking that a single "support RMPP" flag would be passed > into ib_mad_reg_class() and ib_qp_redir()? That is my current thought. If the flag is set, it indicates that the class/methods being accessed by the user carry the RMPP header. I think that this would allow a vendor to define their own MADs, but take advantage of RMPP as long as the standard RMPP header was used.
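On the send-timeout question discussed in this thread: the inputs the RMPP requirements bring in (PacketLifeTime, RespTimeValue) are not durations but small exponents, with duration = 4.096 us * 2^value in IBA's time encoding. A sketch of how a MAD layer might combine them into a response timeout; the helper names are invented for illustration, and the combining rule simply follows the thread's description (packet lifetime in both directions plus the responder's declared response time):

```c
#include <stdint.h>

/* IBA time encoding: duration = 4.096 us * 2^exponent.  Work in
 * nanoseconds (4096 ns) so everything stays in integer math. */
static uint64_t ib_time_to_usec(unsigned int exponent)
{
	return (4096ull << exponent) / 1000;
}

/* Round-trip packet lifetime plus the time the responder says it may
 * need (ClassPortInfo:RespTimeValue) to produce a reply. */
static uint64_t response_timeout_usec(unsigned int pkt_life,
				      unsigned int resp_time)
{
	return 2 * ib_time_to_usec(pkt_life) + ib_time_to_usec(resp_time);
}
```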
From halr at voltaire.com Wed Aug 4 09:29:02 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 04 Aug 2004 12:29:02 -0400 Subject: [openib-general] [PATCH] GSI: Merge in Sean's changes to ib_xxx_cq functions Message-ID: <1091636944.1212.108.camel@localhost.localdomain> Merge in Sean's changes to ib_xxx_cq functions Index: access/gsi_main.c =================================================================== --- access/gsi_main.c (revision 583) +++ access/gsi_main.c (working copy) @@ -655,7 +655,7 @@ struct gsi_hca_info_st *hca = NULL, *head = (struct gsi_hca_info_st *) &gsi_hca_list, *entry; int ret; - u32 cq_size; + int cq_size; struct ib_qp_init_attr qp_init_attr; struct ib_qp_cap qp_cap; GSI_HCA_LIST_LOCK_VAR; @@ -706,7 +706,7 @@ cq_size = GSI_QP_SND_SIZE + GSI_QP_RCV_SIZE + 20; hca->cq = ib_create_cq(hca->handle, (ib_comp_handler) gsi_thread_compl_cb, - (void *) hca, &cq_size); + (void *) hca, cq_size); if (IS_ERR(hca->cq)) { printk(KERN_ERR "Could not create receive CQ.\n"); ret = PTR_ERR(hca->cq); Index: include/ib_verbs.h =================================================================== --- include/ib_verbs.h (revision 562) +++ include/ib_verbs.h (working copy) @@ -70,6 +70,7 @@ struct ib_device *device; ib_comp_handler comp_handler; void *cq_context; + int cqe; }; struct ib_srq { @@ -600,12 +601,10 @@ struct ib_cq *ib_create_cq(struct ib_device *device, ib_comp_handler comp_handler, - void *cq_context, int *cqe); + void *cq_context, int cqe); -int ib_query_cq(struct ib_cq *cq, void *cq_context, int *cqe); +int ib_resize_cq(struct ib_cq *cq, int cqe); -int ib_resize_cq(struct ib_cq *cq, int *cqe); - int ib_destroy_cq(struct ib_cq *cq); /* in functions below iova_start is in/out parameter */ From halr at voltaire.com Wed Aug 4 09:40:00 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 04 Aug 2004 12:40:00 -0400 Subject: [openib-general] GSI compromise In-Reply-To: <20040804080129.2749e280.mshefty@ichips.intel.com> References: 
<35EA21F54A45CB47B879F21A91F4862F18AA8F@taurus.voltaire.com> <52ekmrr5o5.fsf@topspin.com> <20040803173643.43143cdd.mshefty@ichips.intel.com> <52llgv7cx0.fsf@topspin.com> <20040804080129.2749e280.mshefty@ichips.intel.com> Message-ID: <1091637602.1212.127.camel@localhost.localdomain> On Wed, 2004-08-04 at 11:01, Sean Hefty wrote: > On Tue, 03 Aug 2004 21:19:39 -0700 > Roland Dreier wrote: > > - how does the MAD layer know how many elements are in the > > method_array parameter of ib_mad_reg_class()? > > I was assuming a fixed size of 127 elements. (At least I think it's 127 methods per class.) This could probably be an optional argument if the registration is for all methods of a given class. I think it's 128 (7 bits) and the whole range is usable (although some are reserved). See IBA 1.1 Table 101, p. 637-8. > > - are you thinking that a single "support RMPP" flag would be passed > > into ib_mad_reg_class() and ib_qp_redir()? > > That is my current thought. If the flag is set, it indicates that the class/methods being accessed by the user carry the RMPP header. I think that this would allow a vendor to define their own MADs, but take advantage of RMPP as long as the standard RMPP header was used. I'm not sure it is quite this simple although that would be nice if it were :-) I am in the process of writing up a cut of the RMPP requirements.
-- Hal From halr at voltaire.com Wed Aug 4 10:21:58 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 04 Aug 2004 13:21:58 -0400 Subject: [openib-general] RMPP Requirements Message-ID: <1091640119.1212.205.camel@localhost.localdomain> Hi,

Here's my cut at RMPP requirements:

- At least a flag is needed to indicate RMPP or not
- There are 3 types of RMPP transfers:
  - receiver initiated
  - sender initiated
  - sender initiated two sided
  Perhaps this is better expressed in terms of roles:
  - receiver role (used by SA client GetTable and GetTraceTable)
  - sender role (used by SA GetTableResp)
  - sender initiated two sided (used by SA client GetMulti)
- RMPP can be used by each GS class although SA is the only one currently using it
- RMPP is method and attribute specific within any class that might use it
  - currently as there is no overlap with the same SA attribute being supported by multiple methods, method appears to be sufficient
- RMPP Timeouts
  - Response time is based on SA PathRecord:PacketLifeTime and GS agent ClassPortInfo:RespTimeValue (which can change) if one exists
  - Total transaction time is based on SA PathRecord:PacketLifeTime in both directions, receiver's RespTimeValue, and response time above

-- Hal From roland at topspin.com Wed Aug 4 10:28:04 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 04 Aug 2004 10:28:04 -0700 Subject: [openib-general] GSI compromise In-Reply-To: <20040804080129.2749e280.mshefty@ichips.intel.com> (Sean Hefty's message of "Wed, 4 Aug 2004 08:01:29 -0700") References: <35EA21F54A45CB47B879F21A91F4862F18AA8F@taurus.voltaire.com> <52ekmrr5o5.fsf@topspin.com> <20040803173643.43143cdd.mshefty@ichips.intel.com> <52llgv7cx0.fsf@topspin.com> <20040804080129.2749e280.mshefty@ichips.intel.com> Message-ID: <52smb2iziz.fsf@topspin.com> Roland> - how does the MAD layer know how many elements are in the Roland> method_array parameter of ib_mad_reg_class()? Sean> I was assuming a fixed size of 127 elements.
(At least I Sean> think it's 127 methods per class.) This could probably be Sean> an optional argument if the registration is for all methods Sean> of a given class. How does the response bit get handled then? (eg lots of consumers will want to see get responses but not gets). Also if it's just a bitmask of which method we want, we might as well make it a bitmap (ie use DECLARE_BITMAP(method_mask, 256) or something like that). Roland> - should we use a "struct list_head" instead of our own type Roland> for ib_mad_msg.next? Sean> I'm fine with this. I followed what I had for the work Sean> requests. Maybe those should change as well? That's a good idea. It adds one more pointer to the work request (since struct list_head is doubly linked) but I think being able to use macros like list_for_each(), list_del(), etc. is worth it. Sean> I'm good with redirection helper routines. I'd just like to Sean> get a clean layering. Btw, does anyone know if you can Sean> redirect in the midding of sending an RMPP message? I don't think so -- my impression is that a redirect message counts as a response to a request, so you can't send one until you get the whole request. - R. 
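Roland's DECLARE_BITMAP(method_mask, 256) suggestion, sketched in plain userspace C (kernel code would use the real DECLARE_BITMAP with set_bit()/test_bit(); the open-coded bitmap here is only a model). Because the mask covers all 256 method values and bit 7 of the method field is the response bit, Get (0x01) and GetResp (0x81) land on different bits, so a consumer can register for responses without also receiving requests:

```c
#include <stdint.h>

#define IB_MGMT_METHOD_GET	0x01	/* response bit clear */
#define IB_MGMT_METHOD_RESP	0x80	/* bit 7: response bit */

/* 256 bits = one bit per possible value of the 8-bit method field. */
struct method_mask {
	uint64_t bits[4];
};

static void method_set(struct method_mask *m, uint8_t method)
{
	m->bits[method / 64] |= 1ull << (method % 64);
}

static int method_test(const struct method_mask *m, uint8_t method)
{
	return (m->bits[method / 64] >> (method % 64)) & 1;
}
```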
From mshefty at ichips.intel.com Wed Aug 4 09:30:07 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 4 Aug 2004 09:30:07 -0700 Subject: [openib-general] RMPP Requirements In-Reply-To: <1091640119.1212.205.camel@localhost.localdomain> References: <1091640119.1212.205.camel@localhost.localdomain> Message-ID: <20040804093007.02c3a217.mshefty@ichips.intel.com> On Wed, 04 Aug 2004 13:21:58 -0400 Hal Rosenstock wrote: > Hi, > > Here's my cut at RMPP requirements: > > - At least a flag is needed to indicate RMPP or not > - There are 3 types of RMPP transfers: > - receiver initiated > - sender initiated > - sender initiated two sided > Perhaps this is better expressed in terms of roles: > - receiver role (used by SA client GetTable and GetTraceTable) > - sender role (used by SA GetTableResp) > - sender initiated two sided (used by SA client GetMulti) > - RMPP can be used by each GS class although SA is the only one > currently using it > - RMPP is method and attribute specific within any class that might use > it > - currently as there is no overlap with the same SA attribute > being > supported by multiple methods, method appears to be sufficient > - RMPP Timeouts > - Response time is based on SA PathRecord:PacketLifeTime and GS > agent > ClassPortInfo:RespTimeValue (which can change) if one exists > - Total transaction time is based on SA > PathRecord:PacketLifeTime in > both directions, receiver's RespTimeValue, and response time above I've started trying to incorporate RMPP into the "compromise" GSI/MAD APIs that I sent out earlier. For RMPP, I'm hoping that the access layer can have a limited view of RMPP, while still providing all of the support that's required. E.g.: * The access layer needs to know if the RMPP header is in a MAD or not. I think that this can be determined by the management class. * It needs to know the offset of the RMPP header into the MAD. This seems fixed for currently defined classes. 
* The access layer needs to know what version of RMPP is being used. This is given in the RMPP header, which would work, provided that the header doesn't change. Given this, the access layer should be able to determine if RMPP is active. I don't think it matters which of the three types of transfers are in use. From ftillier at infiniconsys.com Wed Aug 4 10:47:35 2004 From: ftillier at infiniconsys.com (Fab Tillier) Date: Wed, 4 Aug 2004 10:47:35 -0700 Subject: [openib-general] GSI compromise In-Reply-To: <20040803173643.43143cdd.mshefty@ichips.intel.com> Message-ID: <08628CA53C6CBA4ABAFB9E808A5214CB0205E190@mercury.infiniconsys.com> > From: Sean Hefty [mailto:mshefty at ichips.intel.com] > Sent: Tuesday, August 03, 2004 5:37 PM > > For redirection, if you examine struct ib_gsi_reg, there's a pointer to a > qp. For normal registration calls, I'm expecting that this value will go > directly to QP1. For QP redirection, we can add a calls such as: > > struct ib_mad_reg *ib_qp_redir(struct ib_qp *qp, > ib_mad_send_handler send_handler, > ib_mad_recv_handler recv_handler, > void *context); > > int ib_process_wc(struct ib_mad_reg *mad_reg, > struct ib_wc *wc); > > The client would allocate and manage the QP and CQ(s). The redirect call > simply informs the access layer that RMPP and request/response services > should be enabled on that QP. When the client removes a work completion > for that QP, it hands the work completion to the access layer for > processing. The access layer can then perform RMPP and request/response > matching. How does a client that wants to selectively redirect requests to more than one QP do so with this API? Is this a feature we want, or do we not care about supporting clients with more than one redirected QP? It seems that this could be a good scalability feature, no? 
- Fab From mshefty at ichips.intel.com Wed Aug 4 09:55:16 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 4 Aug 2004 09:55:16 -0700 Subject: [openib-general] GSI compromise In-Reply-To: <52smb2iziz.fsf@topspin.com> References: <35EA21F54A45CB47B879F21A91F4862F18AA8F@taurus.voltaire.com> <52ekmrr5o5.fsf@topspin.com> <20040803173643.43143cdd.mshefty@ichips.intel.com> <52llgv7cx0.fsf@topspin.com> <20040804080129.2749e280.mshefty@ichips.intel.com> <52smb2iziz.fsf@topspin.com> Message-ID: <20040804095516.41cbf286.mshefty@ichips.intel.com> On Wed, 04 Aug 2004 10:28:04 -0700 Roland Dreier wrote: > How does the response bit get handled then? (eg lots of consumers > will want to see get responses but not gets). Currently, there's not a way to hand a received MAD to more than one consumer, so responses go only to the requestor. My preference would be to give ownership of a received MAD to one consumer in order to avoid data copies. To allow other clients to see MADs, I was thinking of adding in some sort of snooping functionality (either by extending these APIs or adding a new one). But I wanted to make it clear through the API which clients would be taking ownership of received MADs, versus those who were only allowed to view it. > Also if it's just a bitmask of which method we want, we might as well > make it a bitmap (ie use DECLARE_BITMAP(method_mask, 256) or something > like that). I've updated the file to use a bitmask. I've also added the file to SVN under https://openib.org/svn/trunk/contrib/intel/ib_mad.h. > Roland> - should we use a "struct list_head" instead of our own type > Roland> for ib_mad_msg.next? > > Sean> I'm fine with this. I followed what I had for the work > Sean> requests. Maybe those should change as well? > > That's a good idea. It adds one more pointer to the work request > (since struct list_head is doubly linked) but I think being able to > use macros like list_for_each(), list_del(), etc. is worth it. I will update the pointers. 
From mshefty at ichips.intel.com Wed Aug 4 10:04:57 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 4 Aug 2004 10:04:57 -0700 Subject: [openib-general] GSI compromise In-Reply-To: <08628CA53C6CBA4ABAFB9E808A5214CB0205E190@mercury.infiniconsys.com> References: <20040803173643.43143cdd.mshefty@ichips.intel.com> <08628CA53C6CBA4ABAFB9E808A5214CB0205E190@mercury.infiniconsys.com> Message-ID: <20040804100457.25fa2dbe.mshefty@ichips.intel.com> On Wed, 4 Aug 2004 10:47:35 -0700 "Fab Tillier" wrote: > How does a client that wants to selectively redirect requests to more than > one QP do so with this API? Is this a feature we want, or do we not care > about supporting clients with more than one redirected QP? It seems that > this could be a good scalability feature, no? My thought was to let the client send the redirect message. This differs from what the proposed implementation does. The client can then redirect to any number of QPs that they wish. I do believe that this is a feature we want. From halr at voltaire.com Wed Aug 4 11:12:14 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 04 Aug 2004 14:12:14 -0400 Subject: [openib-general] RMPP Requirements References: <1091640119.1212.205.camel@localhost.localdomain> <20040804093007.02c3a217.mshefty@ichips.intel.com> Message-ID: <006e01c47a4e$92be4f80$6401a8c0@comcast.net> Sean Hefty wrote: > * The access layer needs to know if the RMPP header is in a MAD or > not. I think that this can be determined by the management class. Not sure what you mean by "determined by the management class". I think it is at a minimum class and method (at least for SA today). > * It needs to know the offset of the RMPP header into the MAD. This > seems fixed for currently defined classes. It is fixed for all classes using RMPP. Not sure what you mean by currently defined classes. > * The access layer needs to know what version of RMPP is being used. 
> This is given in the RMPP header, which would work, provided that the > header doesn't change. I don't think the position of the version in the header can change without the base version changing. That's not likely any time soon :-) > Given this, the access layer should be able to determine if RMPP is > active. Sure the RMPPFlags,Active can be checked. On the responder side, do we want to make sure that RMPP is allowed/supported on that class/method and possibly attribute ? > I don't think it matters which of the three types of transfers are in use. I'm not quite with you yet on this. I can see it for sender and receiver but doesn't someone have to indicate a 2 sided transfer ? Where/how do you see that being handled ? I think timeouts need some exposure through the API somehow. In order to handle the transaction timeout properly, it appears that the packet lifetime in both directions needs to be obtained (passed in ?). It also appears that in order to set the response time properly access to the ClassPortInfo:RespTimeValue is needed (which may change) if present. -- Hal From mshefty at ichips.intel.com Wed Aug 4 10:37:31 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 4 Aug 2004 10:37:31 -0700 Subject: [openib-general] RMPP Requirements In-Reply-To: <006e01c47a4e$92be4f80$6401a8c0@comcast.net> References: <1091640119.1212.205.camel@localhost.localdomain> <20040804093007.02c3a217.mshefty@ichips.intel.com> <006e01c47a4e$92be4f80$6401a8c0@comcast.net> Message-ID: <20040804103731.2182c540.mshefty@ichips.intel.com> On Wed, 04 Aug 2004 14:12:14 -0400 Hal Rosenstock wrote: > Sean Hefty wrote: > > * The access layer needs to know if the RMPP header is in a MAD or > > not. I think that this can be determined by the management class. > > Not sure what you mean by "determined by the management class". > I think it is at a minimum class and method (at least for SA today). 13.6.2.1. 
If a management class uses RMPP, it shall include the RMPP header...in every MAD of that class type. > > * It needs to know the offset of the RMPP header into the MAD. This > > seems fixed for currently defined classes. > > It is fixed for all classes using RMPP. Not sure what you mean by currently > defined classes. I was referring to vendor-defined classes, versus classes defined in the spec. > > * The access layer needs to know what version of RMPP is being used. > > This is given in the RMPP header, which would work, provided that the > > header doesn't change. > > I don't think the position of the version in the header can change without > the > base version changing. That's not likely any time soon :-) Didn't they change the MAD header from 1.0 to 1.1 without changing the version? Hmm... maybe we can't even trust the version number then... > > Given this, the access layer should be able to determine if RMPP is > > active. > > Sure the RMPPFlags,Active can be checked. On the responder side, > do we want to make sure that RMPP is allowed/supported on that class/method > and possibly attribute ? I would think the checks on a receive would be: 1) do we have someone to give this to 2) did the receiver tell us to look for an RMPP header 3) is RMPP active in the received MAD 4) invoke RMPP handling Checks for send would be: 1) did the sender indicate there's an RMPP header in his MADs 2) is RMPP active in the MAD to send 3) invoke RMPP transfer > > I don't think it matters which of the three types of transfers are in use. > > I'm not quite with you yet on this. I can see it for sender and receiver but > doesn't someone have to indicate a 2 sided transfer ? Where/how do you see > that being handled ? I think that the checks listed above are sufficient. > I think timeouts need some exposure through the API somehow. In order to > handle the transaction timeout properly, it appears that the packet lifetime > in both directions needs to be obtained (passed in ?). 
It also appears that > in order to set the response time properly access to the > ClassPortInfo:RespTimeValue is needed (which may change) if present. I need to think about this more. My thought is that the client needs to determine the appropriate timeout, either through queries or by using 2 seconds as a nice number. Clients can set the RRespTime directly in the RMPP packet. The access layer should only need to worry about modifying things like the SegmentNumber and PayloadLength. From halr at voltaire.com Wed Aug 4 11:45:16 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 04 Aug 2004 14:45:16 -0400 Subject: [openib-general] [PATCH] GSI: Take care of gsi_dtgrm_pool_create unresolved symbol Message-ID: <1091645117.1485.235.camel@localhost.localdomain> Take care of gsi_dtgrm_pool_create unresolved symbol in gsi_main.c Index: gsi_main.c =================================================================== --- gsi_main.c (revision 585) +++ gsi_main.c (working copy) @@ -881,9 +881,8 @@ goto error3; } - if (gsi_dtgrm_pool_create - (GSI_RMPP_SND_POOL_SIZE, - &newinfo->rmpp_snd_dtgrm_pool) < 0) { + if (gsi_dtgrm_pool_create(GSI_RMPP_SND_POOL_SIZE, + &newinfo->rmpp_snd_dtgrm_pool) < 0) { printk(KERN_ERR "Could not create RMPP send pool\n"); ret = -ENOMEM; goto error4; @@ -2921,12 +2920,17 @@ /* * Create datagram pool */ +#if 0 /* GSI_POOL_TRACE */ int gsi_dtgrm_pool_create_named(u32 cnt, void **handle, char *modname) +#else +int gsi_dtgrm_pool_create(u32 cnt, void **handle) +#endif { struct gsi_dtgrm_pool_info_st *pool; char name[GSI_POOL_MAX_NAME_LEN]; +#if 0 /* GSI_POOL_TRACE */ /* * Sanity check */ @@ -2934,14 +2938,16 @@ printk(KERN_ERR "Invalid pool name\n"); return -ENOENT; } + sprintf(name, "gsi_%s%-d", modname, gsi_pool_cnt++); +#else + sprintf(name, "gsi%-d", gsi_pool_cnt++); +#endif if ((pool = kmalloc(sizeof (*pool), GFP_KERNEL)) == NULL) { printk(KERN_ERR "Memory allocation error\n"); return -ENOMEM; } - sprintf(name, "gsi_%s%-d", modname, 
gsi_pool_cnt++); - if ((pool->cache = kmem_cache_create(name, sizeof (struct gsi_dtgrm_priv_st), 0, @@ -3122,7 +3128,11 @@ */ EXPORT_SYMBOL_NOVERS(gsi_reg_class); EXPORT_SYMBOL_NOVERS(gsi_dereg_class); +#if 0 /* GSI_POOL_TRACE */ EXPORT_SYMBOL_NOVERS(gsi_dtgrm_pool_create_named); +#else +EXPORT_SYMBOL_NOVERS(gsi_dtgrm_pool_create); +#endif EXPORT_SYMBOL_NOVERS(gsi_dtgrm_pool_destroy); EXPORT_SYMBOL_NOVERS(gsi_dtgrm_pool_get); EXPORT_SYMBOL_NOVERS(gsi_dtgrm_pool_put); From mshefty at ichips.intel.com Wed Aug 4 10:45:50 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 4 Aug 2004 10:45:50 -0700 Subject: [openib-general] PATCH *next to list Message-ID: <20040804104550.1e1b8335.mshefty@ichips.intel.com> Here's a patch that converts *next fields to using struct list_head. - Sean Index: ib_verbs.h =================================================================== --- ib_verbs.h (revision 571) +++ ib_verbs.h (working copy) @@ -460,7 +460,7 @@ }; struct ib_send_wr { - struct ib_send_wr *next; + struct list_head list; u64 wr_id; struct ib_sge *sg_list; int num_sge; @@ -488,7 +488,7 @@ }; struct ib_recv_wr { - struct _ib_recv_wr *next; + struct list_head list; u64 wr_id; struct ib_sge *sg_list; int num_sge; Index: ib_mad.h =================================================================== --- ib_mad.h (revision 584) +++ ib_mad.h (working copy) @@ -43,7 +43,7 @@ }; struct ib_mad_msg { - struct ib_mad_msg *next; + struct list_head list; /* See about zero-copy... */ void *buf; From roland at topspin.com Wed Aug 4 12:35:25 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 04 Aug 2004 12:35:25 -0700 Subject: [openib-general] PATCH *next to list In-Reply-To: <20040804104550.1e1b8335.mshefty@ichips.intel.com> (Sean Hefty's message of "Wed, 4 Aug 2004 10:45:50 -0700") References: <20040804104550.1e1b8335.mshefty@ichips.intel.com> Message-ID: <52k6weitmq.fsf@topspin.com> Sean> Here's a patch that converts *next fields to using struct Sean> list_head. 
Looks good to me. - R. From roland at topspin.com Wed Aug 4 12:36:48 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 04 Aug 2004 12:36:48 -0700 Subject: [openib-general] ib_req_ncomp_notif in core_ layer In-Reply-To: <20040804084138.GA29136@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 4 Aug 2004 11:41:38 +0300") References: <20040804084138.GA29136@mellanox.co.il> Message-ID: <52fz72itkf.fsf@topspin.com> Michael> Hello! I am looking at the ib core gen2 code now, and I Michael> am puzzled by the fact that some functions just seem to Michael> do a check and then copy another one. Taking the Michael> following example Michael> Its true that req_ncomp_notif could be NULL, but this Michael> could be fixed, instead, by modifying it to a reasonable Michael> default when the device is initialised. It's not really a big deal either way. The Linux model seems to be that function pointers are NULL for unimplemented methods (eg in struct netdevice). - R. From halr at voltaire.com Wed Aug 4 12:41:05 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 04 Aug 2004 15:41:05 -0400 Subject: [openib-general] RMPP Requirements References: <1091640119.1212.205.camel@localhost.localdomain> <20040804093007.02c3a217.mshefty@ichips.intel.com> <006e01c47a4e$92be4f80$6401a8c0@comcast.net> <20040804103731.2182c540.mshefty@ichips.intel.com> Message-ID: <009901c47a5a$fccdaf40$6401a8c0@comcast.net> Sean Hefty wrote: > On Wed, 04 Aug 2004 14:12:14 -0400 > Hal Rosenstock wrote: > >> Sean Hefty wrote: >>> * The access layer needs to know if the RMPP header is in a MAD or >>> not. I think that this can be determined by the management class. >> >> Not sure what you mean by "determined by the management class". >> I think it is at a minimum class and method (at least for SA today). > > 13.6.2.1. If a management class uses RMPP, it shall include the RMPP > header...in every MAD of that class type. Yes, you're right. 
There is some other non compliance text indicating that RMPP is on a class, version, attribute basis. This is related to whether the RMPP active flag is on or off. That was what confused me. >>> * It needs to know the offset of the RMPP header into the MAD. This >>> seems fixed for currently defined classes. >> >> It is fixed for all classes using RMPP. Not sure what you mean by >> currently defined classes. > > I was referring to vendor-defined classes, versus classes defined in > the spec. OK. >>> * The access layer needs to know what version of RMPP is being used. >>> This is given in the RMPP header, which would work, provided that >>> the header doesn't change. >> >> I don't think the position of the version in the header can change >> without the >> base version changing. That's not likely any time soon :-) > > Didn't they change the MAD header from 1.0 to 1.1 without changing > the version? Hmm... maybe we can't even trust the version number > then... SA class version changed from 1 to 2 between 1.0a and 1.1. In 1.0a RMPP was limited to SA class. The version numbers can be trusted. >>> Given this, the access layer should be able to determine if RMPP is >>> active. >> >> Sure the RMPPFlags,Active can be checked. On the responder side, >> do we want to make sure that RMPP is allowed/supported on that >> class/method and possibly attribute ? > > I would think the checks on a receive would be: > 1) do we have someone to give this to > 2) did the receiver tell us to look for an RMPP header > 3) is RMPP active in the received MAD > 4) invoke RMPP handling > > Checks for send would be: > 1) did the sender indicate there's an RMPP header in his MADs > 2) is RMPP active in the MAD to send > 3) invoke RMPP transfer As long as we don't want any further checking along the lines I described as part of #1. Assuming those checks are not done, this would allow a request to be sent via RMPP and it would be responded to even though it shouldn't be. 
So we would be relying on the requester to do the right thing. I don't think that is a good idea and it would get caught by a compliance test (some day). Request/response matching would occur above RMPP. >>> I don't think it matters which of the three types of transfers are >>> in use. >> >> I'm not quite with you yet on this. I can see it for sender and >> receiver but doesn't someone have to indicate a 2 sided transfer ? >> Where/how do you see that being handled ? > > I think that the checks listed above are sufficient. I still don't see how that handles the 2 sided transfer. Is a 2 sided transfer just a RMPP send followed by RMPP receive ? Wouldn't this have to be on the same context (transaction ID, etc. tuple) at a minimum ? That's the significance of the IsDS flag. >> I think timeouts need some exposure through the API somehow. In >> order to handle the transaction timeout properly, it appears that >> the packet lifetime in both directions needs to be obtained (passed >> in ?). It also appears that in order to set the response time >> properly access to the ClassPortInfo:RespTimeValue is needed (which >> may change) if present. > > I need to think about this more. My thought is that the client needs > to determine the appropriate timeout, either through queries or by > using 2 seconds as a nice number. There are some defaults supplied in the spec (different from 2 seconds) for when there is no GS class agent (no ClassPortInfo) and for ttime. Should we only support the default or allow for better timeouts ? > Clients can set the RRespTime directly in the RMPP packet. That's good. We might want to help with an encoding routine for this. > The access layer should only need to > worry about modifying things like the SegmentNumber and > PayloadLength. Wouldn't the response and transaction time outs occur within RMPP ? 
-- Hal From roland at topspin.com Wed Aug 4 12:45:52 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 04 Aug 2004 12:45:52 -0700 Subject: [openib-general] [RFC] mthca to require kernel version >= 2.6.8? Message-ID: <52brhqit5b.fsf@topspin.com> In my tree I have some changes to mthca that add better MSI/MSI-X support. MSI-X support in particular is a win because having separate interrupts for different event queues means that the mthca interrupt handler does not need to do a (slow, expensive) MMIO read across PCI to find out what event triggered the interrupt. However, the Linux MSI/MSI-X API made device drivers very awkward until I succeeded in getting Long Nguyen's patches merged into 2.6.8-rc3. The new mthca code is written to this new (sane) API, so it requires a very recent kernel. I am planning on committing my changes to my main subversion branch when kernel 2.6.8 is released (which should be soon). This will mean that mthca will not compile against any older kernels. Does anyone see a problem with this? Thanks, Roland From halr at voltaire.com Wed Aug 4 13:18:19 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 04 Aug 2004 16:18:19 -0400 Subject: [openib-general] GSI compromise In-Reply-To: <20040803173643.43143cdd.mshefty@ichips.intel.com> References: <35EA21F54A45CB47B879F21A91F4862F18AA8F@taurus.voltaire.com> <52ekmrr5o5.fsf@topspin.com> <20040803173643.43143cdd.mshefty@ichips.intel.com> Message-ID: <1091650700.1212.245.camel@localhost.localdomain> On Tue, 2004-08-03 at 20:36, Sean Hefty wrote: > There are still some disagreements on the GSI APIs and overall model, and I would like to try to get everyone together on this. Using the gsi.h file under trunk/contrib/voltaire/access as a base, I tried to modify the API slightly to meet some sort of compromise. The modified file is below. Some comments about the changes: Looks good although I need some more time to think about redirection before commenting on those aspects of the API. 
Just a couple of minor comments/questions (as others have already been brought up on the list): In terms of timeout_ms, would 0 or some other value be no timeout (for request/response matching) ? Rather than ib_mad_reg/dereg_class, it looks more like ib_mad_register/deregister to me now :-) -- Hal From Neeraj.Gupta at Sun.COM Wed Aug 4 13:39:23 2004 From: Neeraj.Gupta at Sun.COM (Neeraj Gupta) Date: Wed, 04 Aug 2004 13:39:23 -0700 Subject: [openib-general] vapi start error : mod_thh Message-ID: <4111497B.20606@Sun.COM> Hi, I am getting failure while starting VAPI. [root at ibsys1 root]# vapi start vapi: Inspecting PCI chipset: [ OK ] vapi: Loading mosal: [ OK ] vapi: Creating device node /dev/mosal: [ OK ] vapi: Loading mod_mpga: [ OK ] vapi: Loading mod_vapi_common: [ OK ] vapi: Loading mod_hh: [ OK ] vapi: Loading mod_thh: [FAILED] I tried many times.. burnt firmware again... reloaded software but no luck. Interesting thing is that the same error I got on another machine and it went away after a couple of reboots but not on this one. I am not sure what made the other machine work ! 
Some system info : [root at ibsys1 root]# uname -a Linux ibsys1 2.4.21-4.ELsmp #1 SMP Fri Oct 3 17:52:56 EDT 2003 i686 i686 i386 GNU/Linux [root at ibsys1 root]# cat /etc/redhat-release Red Hat Enterprise Linux AS release 3 (Taroon) [root at ibsys1 root]# lsmod Module Size Used by Not tainted mod_hh 16632 0 mod_vapi_common 64360 0 mod_mpga 24960 0 (unused) mosal 113300 0 [mod_vapi_common mod_mpga] mst_pciconf 79104 0 mst_pci 77248 0 mst_info 3448 0 nfs 96912 1 (autoclean) lockd 60656 1 (autoclean) [nfs] sunrpc 92124 1 (autoclean) [nfs lockd] lp 9220 0 (autoclean) parport 39072 0 (autoclean) [lp] autofs 13780 0 (autoclean) (unused) e1000 72320 1 floppy 59056 0 (autoclean) microcode 5248 0 (autoclean) keybdev 2976 0 (unused) mousedev 5688 1 hid 22404 0 (unused) input 6208 0 [keybdev mousedev hid] usb-uhci 27532 0 (unused) usbcore 83168 1 [hid usb-uhci] ext3 95784 1 jbd 56856 1 [ext3] aic79xx 190652 2 sd_mod 13744 4 scsi_mod 116904 2 [aic79xx sd_mod] If someone can help me here, I will appreciate. Thanks, -- ******************************************************************* * Neeraj Gupta email: neeraj.gupta at sun.com * Netra Systems & Networking phone: +1(510)936-4852 or x14852 * Sun Microsystems, Inc. fax : +1(510)936-4963 * 7788 Gateway Blvd, UNWK19-202, Newark, CA 94560, USA ******************************************************************* From Tom.Duffy at Sun.COM Wed Aug 4 13:52:26 2004 From: Tom.Duffy at Sun.COM (Tom Duffy) Date: Wed, 04 Aug 2004 13:52:26 -0700 Subject: [openib-general] [RFC] mthca to require kernel version >= 2.6.8? In-Reply-To: <52brhqit5b.fsf@topspin.com> References: <52brhqit5b.fsf@topspin.com> Message-ID: <1091652746.8342.8.camel@duffman> On Wed, 2004-08-04 at 12:45 -0700, Roland Dreier wrote: > Does anyone see a problem with this? As I said to you at Linuxworld, I think it is a good idea to track current development of 2.6.x and I don't have a problem keeping openib dependent on very recent bits. 
-tduffy -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From Tom.Duffy at Sun.COM Wed Aug 4 13:54:42 2004 From: Tom.Duffy at Sun.COM (Tom Duffy) Date: Wed, 04 Aug 2004 13:54:42 -0700 Subject: [openib-general] [PATCH] get rid of some more typedefs in ip2pr_export.h Message-ID: <1091652882.8342.11.camel@duffman> Signed-by: Tom Duffy with permission from Sun Legal. Index: drivers/infiniband/ulp/ipoib/ip2pr_export.h =================================================================== --- drivers/infiniband/ulp/ipoib/ip2pr_export.h (revision 583) +++ drivers/infiniband/ulp/ipoib/ip2pr_export.h (working copy) @@ -33,7 +33,7 @@ /* * address resolution ID */ -typedef tUINT64 tIP2PR_PATH_LOOKUP_ID; +typedef u64 tIP2PR_PATH_LOOKUP_ID; /* * invalid address resolution ID */ @@ -41,21 +41,21 @@ /* * address resolved completion function. */ -typedef tINT32(*tIP2PR_PATH_LOOKUP_FUNC) (tIP2PR_PATH_LOOKUP_ID plid, - tINT32 status, - tUINT32 src_addr, - tUINT32 dst_addr, - tTS_IB_PORT hw_port, - struct ib_device *ca, - struct ib_path_record *path, - tPTR usr_arg); +typedef s32(*tIP2PR_PATH_LOOKUP_FUNC) (tIP2PR_PATH_LOOKUP_ID plid, + s32 status, + u32 src_addr, + u32 dst_addr, + tTS_IB_PORT hw_port, + struct ib_device *ca, + struct ib_path_record *path, + tPTR usr_arg); -typedef tINT32(*tGID2PR_LOOKUP_FUNC) (tIP2PR_PATH_LOOKUP_ID plid, - tINT32 status, - tTS_IB_PORT hw_port, - struct ib_device *ca, - struct ib_path_record *path, - tPTR usr_arg); +typedef s32(*tGID2PR_LOOKUP_FUNC) (tIP2PR_PATH_LOOKUP_ID plid, + s32 status, + tTS_IB_PORT hw_port, + struct ib_device *ca, + struct ib_path_record *path, + tPTR usr_arg); /* * address lookup initiation. * @@ -67,32 +67,37 @@ * arg - supplied argument is returned in callback function * plid - pointer to storage for identifier of this query. 
*/ -tINT32 tsIp2prPathRecordLookup(tUINT32 dst_addr, /* NBO */ - tUINT32 src_addr, /* NBO */ - tUINT8 localroute, - tINT32 bound_dev_if, - tIP2PR_PATH_LOOKUP_FUNC func, - tPTR arg, tIP2PR_PATH_LOOKUP_ID * plid); +s32 tsIp2prPathRecordLookup(u32 dst_addr, /* NBO */ + u32 src_addr, /* NBO */ + u8 localroute, + s32 bound_dev_if, + tIP2PR_PATH_LOOKUP_FUNC func, + tPTR arg, + tIP2PR_PATH_LOOKUP_ID *plid); + /* * address lookup cancel */ -tINT32 tsIp2prPathRecordCancel(tIP2PR_PATH_LOOKUP_ID plid); +s32 tsIp2prPathRecordCancel(tIP2PR_PATH_LOOKUP_ID plid); + /* * Giver a Source and Destination GID, get the path record */ -tINT32 tsGid2prLookup - (tTS_IB_GID src_gid, - tTS_IB_GID dst_gid, - u16 pkey, - tGID2PR_LOOKUP_FUNC func, tPTR arg, tIP2PR_PATH_LOOKUP_ID * plid); -tINT32 tsGid2prCancel(tIP2PR_PATH_LOOKUP_ID plid); +s32 tsGid2prLookup(tTS_IB_GID src_gid, + tTS_IB_GID dst_gid, + u16 pkey, + tGID2PR_LOOKUP_FUNC func, + tPTR arg, + tIP2PR_PATH_LOOKUP_ID * plid); + +s32 tsGid2prCancel(tIP2PR_PATH_LOOKUP_ID plid); #endif -struct tIP2PR_LOOKUP_PARAM_STRUCT { - tUINT32 dst_addr; +struct ip2pr_lookup_param { + u32 dst_addr; struct ib_path_record *path_record; }; -struct tGID2PR_LOOKUP_PARAM_STRUCT { +struct gid2pr_lookup_param { tTS_IB_GID src_gid; tTS_IB_GID dst_gid; u16 pkey; @@ -100,10 +105,6 @@ tTS_IB_PORT port; struct ib_path_record *path_record; }; -typedef struct tIP2PR_LOOKUP_PARAM_STRUCT tIP2PR_LOOKUP_PARAM_STRUCT, - *tIP2PR_LOOKUP_PARAM; -typedef struct tGID2PR_LOOKUP_PARAM_STRUCT tGID2PR_LOOKUP_PARAM_STRUCT, - *tGID2PR_LOOKUP_PARAM; #define IP2PR_IOC_MAGIC 100 #define IP2PR_IOC_LOOKUP_REQ _IOWR(IP2PR_IOC_MAGIC, 0, void *) Index: drivers/infiniband/ulp/ipoib/ip2pr_link.c =================================================================== --- drivers/infiniband/ulp/ipoib/ip2pr_link.c (revision 583) +++ drivers/infiniband/ulp/ipoib/ip2pr_link.c (working copy) @@ -2410,14 +2410,15 @@ s32 _tsIp2prUserLookup(unsigned long arg) { struct ip2pr_user_req *ureq; - 
tIP2PR_LOOKUP_PARAM_STRUCT param; + struct ip2pr_lookup_param param; s32 status; tIP2PR_PATH_LOOKUP_ID plid; if (0 == arg) { return (-EINVAL); } - if (copy_from_user(¶m, (tIP2PR_LOOKUP_PARAM) arg, sizeof(param))) { + if (copy_from_user(¶m, (struct ip2pr_lookup_param *) arg, + sizeof(param))) { return (-EFAULT); } if (NULL == param.path_record) { @@ -2462,8 +2463,7 @@ s32 _tsGid2prUserLookup(unsigned long arg) { struct ip2pr_user_req *ureq; - tGID2PR_LOOKUP_PARAM_STRUCT param; - tGID2PR_LOOKUP_PARAM upa; + struct gid2pr_lookup_param param, *upa; s32 status; tIP2PR_PATH_LOOKUP_ID plid; @@ -2471,7 +2471,7 @@ return (-EINVAL); } - if (copy_from_user(¶m, (tGID2PR_LOOKUP_PARAM) arg, + if (copy_from_user(¶m, (struct gid2pr_lookup_param *) arg, sizeof(param))) { return (-EFAULT); } @@ -2505,7 +2505,7 @@ return (-EHOSTUNREACH); } - upa = (tGID2PR_LOOKUP_PARAM) arg; + upa = (struct gid2pr_lookup_param *) arg; copy_to_user(&upa->device, &ureq->device, sizeof(upa->device)); copy_to_user(&upa->port, &ureq->port, sizeof(upa->port)); copy_to_user(param.path_record, &ureq->path_record, -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From mlleinin at hpcn.ca.sandia.gov Wed Aug 4 13:54:21 2004 From: mlleinin at hpcn.ca.sandia.gov (Matt L. Leininger) Date: Wed, 04 Aug 2004 13:54:21 -0700 Subject: [openib-general] [RFC] mthca to require kernel version >= 2.6.8? In-Reply-To: <52brhqit5b.fsf@topspin.com> References: <52brhqit5b.fsf@topspin.com> Message-ID: <1091652861.12397.142.camel@trinity> On Wed, 2004-08-04 at 12:45, Roland Dreier wrote: > In my tree I have some changes to mthca that add better MSI/MSI-X > support. 
MSI-X support in particular is a win because have separate > interrupts for different event queues means that the mthca interrupt > handler does not need to do a (slow, expensive) MMIO read across PCI > to find out what event triggered the interrupt. > > However, the Linux MSI/MSI-X API made device drivers very awkward > until I succeeded in getting Long Nguyen's patches merged into > 2.6.8-rc3. The new mthca code is written to this new (sane) API, so > it requires a very recent kernel. > > I am planning on committing my changes to my main subversion branch > when kernel 2.6.8 is released (which should be soon). This will mean > that mthca will not compile against any older kernels. > > Does anyone see a problem with this? > I don't see a problem with this. Our foremost goal is to get into the 2.6 kernel, not to support older 2.6 (or 2.4 kernels). Those who need support for 2.4 or an earlier versions of the 2.6 kernels can use the vendor (gen1) stacks. - Matt From gdror at mellanox.co.il Wed Aug 4 14:17:47 2004 From: gdror at mellanox.co.il (Dror Goldenberg) Date: Thu, 5 Aug 2004 00:17:47 +0300 Subject: [openib-general] vapi start error : mod_thh Message-ID: <506C3D7B14CDD411A52C00025558DED605869E03@mtlex01.yok.mtl.com> > -----Original Message----- > From: Neeraj Gupta [mailto:Neeraj.Gupta at Sun.COM] > Sent: Wednesday, August 04, 2004 11:39 PM > > Hi, > > I am getting failure while starting VAPI. > Try looking at /var/log/messages - maybe you can get a hint there. I'd also try /sbin/lspci to see that you got the HCA properly installed. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Tom.Duffy at Sun.COM Wed Aug 4 14:17:32 2004 From: Tom.Duffy at Sun.COM (Tom Duffy) Date: Wed, 04 Aug 2004 14:17:32 -0700 Subject: [openib-general] [PATCH] cleanup typedefs in ip2pr_proc.h Message-ID: <1091654252.8342.15.camel@duffman> Signed-by: Tom Duffy with permission from Sun legal. 
Index: drivers/infiniband/ulp/ipoib/ip2pr_proc.c =================================================================== --- drivers/infiniband/ulp/ipoib/ip2pr_proc.c (revision 583) +++ drivers/infiniband/ulp/ipoib/ip2pr_proc.c (working copy) @@ -26,58 +26,73 @@ static const char _dir_name_root[] = TS_IP2PR_PROC_DIR_NAME; static struct proc_dir_entry *_dir_root = NULL; -extern tINT32 tsIp2prPathElementTableDump(tSTR buffer, - tINT32 max_size, - tINT32 start_index, long *end_index); -extern tINT32 tsIp2prIpoibWaitTableDump(tSTR buffer, - tINT32 max_size, - tINT32 start_index, long *end_index); -extern tINT32 tsIp2prProcRetriesRead(tSTR buffer, - tINT32 max_size, - tINT32 start_index, long *end_index); -extern tINT32 tsIp2prProcTimeoutRead(tSTR buffer, - tINT32 max_size, - tINT32 start_index, long *end_index); -extern tINT32 tsIp2prProcBackoffRead(tSTR buffer, - tINT32 max_size, - tINT32 start_index, long *end_index); -extern tINT32 tsIp2prProcCacheTimeoutRead(tSTR buffer, - tINT32 max_size, - tINT32 start_index, long *end_index); +extern s32 tsIp2prPathElementTableDump(tSTR buffer, + s32 max_size, + s32 start_index, + long *end_index); +extern s32 tsIp2prIpoibWaitTableDump(tSTR buffer, + s32 max_size, + s32 start_index, + long *end_index); +extern s32 tsIp2prProcRetriesRead(tSTR buffer, + s32 max_size, + s32 start_index, + long *end_index); +extern s32 tsIp2prProcTimeoutRead(tSTR buffer, + s32 max_size, + s32 start_index, + long *end_index); +extern s32 tsIp2prProcBackoffRead(tSTR buffer, + s32 max_size, + s32 start_index, + long *end_index); +extern s32 tsIp2prProcCacheTimeoutRead(tSTR buffer, + s32 max_size, + s32 start_index, + long *end_index); extern int tsIp2prProcRetriesWrite(struct file *file, const char *buffer, - unsigned long count, void *pos); + unsigned long count, + void *pos); extern int tsIp2prProcTimeoutWrite(struct file *file, const char *buffer, - unsigned long count, void *pos); + unsigned long count, + void *pos); extern int 
tsIp2prProcBackoffWrite(struct file *file, const char *buffer, - unsigned long count, void *pos); + unsigned long count, + void *pos); extern int tsIp2prProcCacheTimeoutWrite(struct file *file, const char *buffer, - unsigned long count, void *pos); + unsigned long count, + void *pos); extern int tsIp2prProcTotalReq(tSTR buffer, - tINT32 max_size, - tINT32 start_index, long *end_index); + s32 max_size, + s32 start_index, + long *end_index); extern int tsIp2prProcArpTimeout(tSTR buffer, - tINT32 max_size, - tINT32 start_index, long *end_index); + s32 max_size, + s32 start_index, + long *end_index); extern int tsIp2prProcPathTimeout(tSTR buffer, - tINT32 max_size, - tINT32 start_index, long *end_index); + s32 max_size, + s32 start_index, + long *end_index); extern int tsIp2prProcTotalFail(tSTR buffer, - tINT32 max_size, - tINT32 start_index, long *end_index); + s32 max_size, + s32 start_index, + long *end_index); /* ========================================================================= */ /*.._tsIp2prProcReadParse -- read function for the injection table */ -static tINT32 _tsIp2prProcReadParse - (char *page, - char **start, off_t offset, tINT32 count, tINT32 * eof, tPTR data) { - tIP2PR_PROC_SUB_ENTRY sub_entry = (tIP2PR_PROC_SUB_ENTRY) data; +static s32 _tsIp2prProcReadParse(char *page, char **start, off_t offset, + s32 count, s32 *eof, tPTR data) +{ + struct ip2pr_proc_sub_entry *sub_entry = + (struct ip2pr_proc_sub_entry *) data; long end_index = 0; - tINT32 size; + s32 size; TS_CHECK_NULL(sub_entry, -EINVAL); @@ -103,7 +118,7 @@ return size; } /* _tsIp2prProcReadParse */ -static tIP2PR_PROC_SUB_ENTRY_STRUCT _file_entry_list[TS_IP2PR_PROC_ENTRIES] = { +static struct ip2pr_proc_sub_entry _file_entry_list[TS_IP2PR_PROC_ENTRIES] = { {entry:NULL, type:TS_IP2PR_PROC_ENTRY_ARP_WAIT, name:"arp_wait", @@ -158,10 +173,10 @@ /* ========================================================================= */ /*..tsIp2prProcFsCleanup -- cleanup the proc filesystem entries */ 
-tINT32 tsIp2prProcFsCleanup(void - ) { - tIP2PR_PROC_SUB_ENTRY sub_entry; - tINT32 counter; +s32 tsIp2prProcFsCleanup(void) +{ + struct ip2pr_proc_sub_entry *sub_entry; + s32 counter; TS_CHECK_NULL(_dir_root, -EINVAL); /* @@ -189,19 +204,20 @@ /* ========================================================================= */ /*..tsIp2prProcFsInit -- initialize the proc filesystem entries */ -tINT32 tsIp2prProcFsInit(void) { - tIP2PR_PROC_SUB_ENTRY sub_entry; - tINT32 result; - tINT32 counter; +s32 tsIp2prProcFsInit(void) +{ + struct ip2pr_proc_sub_entry *sub_entry; + s32 result; + s32 counter; /* * XXX still need to check this: * validate some assumptions the write parser will be making. */ - if (0 && sizeof(tINT32) != sizeof(tSTR)) { + if (0 && sizeof(s32) != sizeof(tSTR)) { TS_TRACE(MOD_IP2PR, T_TERSE, TRACE_FLOW_FATAL, "PROC: integers and pointers of a different size. <%d:%d>", - sizeof(tINT32), sizeof(tSTR)); + sizeof(s32), sizeof(tSTR)); return -EFAULT; } /* if */ Index: drivers/infiniband/ulp/ipoib/ip2pr_proc.h =================================================================== --- drivers/infiniband/ulp/ipoib/ip2pr_proc.h (revision 583) +++ drivers/infiniband/ulp/ipoib/ip2pr_proc.h (working copy) @@ -30,19 +30,18 @@ /* --------------------------------------------------------------------- */ /* read callback prototype. 
*/ /* --------------------------------------------------------------------- */ -typedef tINT32(*tIP2PR_PROC_READ_CB_FUNC) (tSTR buffer, - tINT32 max_size, - tINT32 start, long *end); -typedef struct tIP2PR_PROC_SUB_ENTRY_STRUCT tIP2PR_PROC_SUB_ENTRY_STRUCT, - *tIP2PR_PROC_SUB_ENTRY; +typedef s32(*tIP2PR_PROC_READ_CB_FUNC) (tSTR buffer, + s32 max_size, + s32 start, + long *end); -struct tIP2PR_PROC_SUB_ENTRY_STRUCT { +struct ip2pr_proc_sub_entry { tSTR name; - tINT32 type; + s32 type; struct proc_dir_entry *entry; tIP2PR_PROC_READ_CB_FUNC read; write_proc_t *write; -}; /* tIP2PR_PROC_SUB_ENTRY_STRUCT */ +}; /* --------------------------------------------------------------------- */ /* entry write parsing */ /* --------------------------------------------------------------------- */ @@ -70,24 +69,19 @@ TS_IP2PR_PROC_ENTRIES /* number of entries in framework */ } TS_IP2PR_PROC_ENTRY_LIST; -typedef struct tIP2PR_PROC_ENTRY_WRITE_STRUCT tIP2PR_PROC_ENTRY_WRITE_STRUCT, - *tIP2PR_PROC_ENTRY_WRITE; -typedef struct tIP2PR_PROC_ENTRY_PARSE_STRUCT tIP2PR_PROC_ENTRY_PARSE_STRUCT, - *tIP2PR_PROC_ENTRY_PARSE; - -struct tIP2PR_PROC_ENTRY_WRITE_STRUCT { - tINT16 id; - tINT16 type; +struct ip2pr_proc_entry_write { + s16 id; + s16 type; union { - tINT32 i; + s32 i; tSTR s; } value; -}; /* tIP2PR_PROC_WRITE_STRUCT */ +}; -struct tIP2PR_PROC_ENTRY_PARSE_STRUCT { - tINT16 id; - tINT16 type; +struct ip2pr_proc_entry_parse { + s16 id; + s16 type; tSTR value; -}; /* tIP2PR_PROC_ENTRY_PARSE_STRUCT */ +}; #endif /* _TS_IP2PR_PROC_H */ -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From mshefty at ichips.intel.com Wed Aug 4 13:15:20 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 4 Aug 2004 13:15:20 -0700 Subject: [openib-general] GSI compromise In-Reply-To: <1091650700.1212.245.camel@localhost.localdomain> References: <35EA21F54A45CB47B879F21A91F4862F18AA8F@taurus.voltaire.com> <52ekmrr5o5.fsf@topspin.com> <20040803173643.43143cdd.mshefty@ichips.intel.com> <1091650700.1212.245.camel@localhost.localdomain> Message-ID: <20040804131520.05dfb3ca.mshefty@ichips.intel.com> On Wed, 04 Aug 2004 16:18:19 -0400 Hal Rosenstock wrote: > Looks good although I need some more time to think about redirection > before commenting on those aspects of the API. *nods* I also haven't given too much thought to the use of the API over QP0, so some additional tweaks may be necessary. > In terms of timeout_ms, would 0 or some other value be no timeout (for > request/response matching) ? I was assuming that a timeout_ms of 0 would be no timeout. I.e. it's just a send. > Rather than ib_mad_reg/dereg_class, it looks more like > ib_mad_register/deregister to me now :-) agreed From gdror at mellanox.co.il Wed Aug 4 14:26:54 2004 From: gdror at mellanox.co.il (Dror Goldenberg) Date: Thu, 5 Aug 2004 00:26:54 +0300 Subject: [openib-general] [ANNOUNCE] GSI Implementation Candidate Message-ID: <506C3D7B14CDD411A52C00025558DED605869E06@mtlex01.yok.mtl.com> > -----Original Message----- > From: Roland Dreier [mailto:roland at topspin.com] > Sent: Wednesday, August 04, 2004 7:23 AM > To: Dror Goldenberg > Cc: Grant Grundler; openib-general at openib.org > Subject: Re: [openib-general] [ANNOUNCE] GSI Implementation Candidate > > > Dror> It is not possible to turn off PD check on a MR in > Dror> Tavor. You will have to create an MR for each PD that you > Dror> need. 
One way to go is to map all kernel apps to the same > Dror> PD, but you probably don't want to do such a thing. > > It's no big deal to create one more MR with translation off for each > kernel PD (in fact mthca does it already for access to address > vectors). So this is a perfectly reasonable solution, it just means > our API can't perfectly match the verbs extensions. If you want to follow the VE API, then here is an idea. The consumer gets the reserved LKey value through query HCA. If you allocate an MR with translation off for each different PD you allocate, then you can provide the consumer with this MR. Obviously you need to do that for kernel consumers only, and you have a hidden assumption that a consumer doesn't create more than one PD. If you cannot assume that, then it won't work... Obviously, you'd need to create the PD (if it hasn't been created yet) when the CA is queried, which is also a bit ugly. Bottom line is that I am not so sure that it is so important to match the exact verbs in this case. -Dror -------------- next part -------------- An HTML attachment was scrubbed... URL: From gdror at mellanox.co.il Wed Aug 4 14:31:55 2004 From: gdror at mellanox.co.il (Dror Goldenberg) Date: Thu, 5 Aug 2004 00:31:55 +0300 Subject: [openib-general] IPoIB IETF WG presentation updated again Message-ID: <506C3D7B14CDD411A52C00025558DED605869E07@mtlex01.yok.mtl.com> > -----Original Message----- > From: Roland Dreier [mailto:roland at topspin.com] > Sent: Tuesday, August 03, 2004 5:24 PM > > > Hal> Cloning an AH is probably faster than creating a new one from > Hal> scratch. (We would need an additional verb for this). How > Hal> much does this cost ? Is this optimization worth it ? > > Actually cloning an AH is probably slower on Tavor than creating one > from scratch, at least if the AH is in Tavor memory.
To clone an AH > one has to do a bunch of reads across PCI and then write the data back > across PCI, while creating a new one is just a matter of taking the > address attributes (which are almost certainly already in CPU cache) > and writing them across PCI. (Reads across PCI are much more > expensive than writes, which can be posted and even write-combined) > If you want to do a fair comparison, then you need to compare ModifyAH with DestroyAH+CreateAH. Anyway, since AVs are read only from the HCA perspective, if you want to do a good implementation of AV operations, then you can completely avoid PIO reads. Just maintain a shadow data structure in the process address space that will hold the kind of information that you intended to read from the device memory. While this solution requires more memory, this memory doesn't need to be pinned, so it's relatively a cheap resource and it reduces PIO reads completely. Dror -------------- next part -------------- An HTML attachment was scrubbed... URL: From mshefty at ichips.intel.com Wed Aug 4 14:20:42 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 4 Aug 2004 14:20:42 -0700 Subject: [openib-general] RMPP Requirements In-Reply-To: <009901c47a5a$fccdaf40$6401a8c0@comcast.net> References: <1091640119.1212.205.camel@localhost.localdomain> <20040804093007.02c3a217.mshefty@ichips.intel.com> <006e01c47a4e$92be4f80$6401a8c0@comcast.net> <20040804103731.2182c540.mshefty@ichips.intel.com> <009901c47a5a$fccdaf40$6401a8c0@comcast.net> Message-ID: <20040804142042.6360ea1e.mshefty@ichips.intel.com> On Wed, 04 Aug 2004 15:41:05 -0400 Hal Rosenstock wrote: > >>> Given this, the access layer should be able to determine if RMPP is > >>> active. > >> > >> Sure the RMPPFlags,Active can be checked. On the responder side, > >> do we want to make sure that RMPP is allowed/supported on that > >> class/method and possibly attribute ? 
> > I would think the checks on a receive would be: > > 1) do we have someone to give this to > > 2) did the receiver tell us to look for an RMPP header > > 3) is RMPP active in the received MAD > > 4) invoke RMPP handling > > > > Checks for send would be: > > 1) did the sender indicate there's an RMPP header in his MADs > > 2) is RMPP active in the MAD to send > > 3) invoke RMPP transfer > > As long as we don't want any further checking along the lines > I described as part of #1. I'm not quite following this. Which checks are you referring to? > Assuming those checks are not done, this would > allow a request to be sent via RMPP and it would be responded > to even though it shouldn't be. So we would be relying on > the requester to do the right thing. I don't think that is a good > idea and it would get caught by a compliance test (some day). Still not quite following you here... > Request/response matching would occur above RMPP. I agree that request/response matching needs to be above RMPP. > >>> I don't think it matters which of the three types of transfers are > >>> in use. > >> > >> I'm not quite with you yet on this. I can see it for sender and > >> receiver but doesn't someone have to indicate a 2 sided transfer ? > >> Where/how do you see that being handled ? > > > > I think that the checks listed above are sufficient. > > I still don't see how that handles the 2 sided transfer. Is a > 2 sided transfer just a RMPP send followed by RMPP receive ? That's my understanding. (I'm not even sure why the spec bothered defining the startup scenarios. I don't think that they have any effect on the data seen on the wire.) > Wouldn't this have to be on the same context (transaction ID, etc. tuple) > at a minimum ? That's the significance of the IsDS flag. I think we should focus on what's in the MAD itself. E.g. IsDS isn't part of the RMPP header. The context should be able to be limited to the TID.
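[Editor's note] The receive-side checks enumerated above reduce to a few lines. As a sketch only — every name here is a hypothetical placeholder, not the actual gen2 code:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the real agent/MAD structures. */
struct mad_agent {
	int expects_rmpp;	/* did the receiver ask for RMPP headers? */
};
struct mad {
	unsigned char rmpp_flags;
};

#define RMPP_FLAG_ACTIVE 0x01	/* RMPPFlags.Active */

/* Nonzero if a received MAD should go through RMPP reassembly:
 * 1) someone registered to receive it,
 * 2) that receiver told us to look for an RMPP header,
 * 3) RMPP is active in the received MAD. */
static int recv_needs_rmpp(const struct mad_agent *agent,
			   const struct mad *mad)
{
	if (agent == NULL)
		return 0;
	if (!agent->expects_rmpp)
		return 0;
	return (mad->rmpp_flags & RMPP_FLAG_ACTIVE) != 0;
}
```

The send side is symmetric: the sender must have indicated an RMPP header in its MADs and RMPPFlags.Active must be set before the RMPP transfer is invoked.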
> There are some defaults supplied in the spec (different from 2 seconds) > for when there is no GS class agent (no ClassPortInfo) and for ttime. > Should we only support the default or allow for better timeouts ? I'm not really sure of the issue here. If the clients can set a timeout value, then how they come up with that value shouldn't matter. Are you suggesting that the client should provide multiple timeout values? > > Clients can set the RRespTime directly in the RMPP packet. > > That's good. We might want to help with an encoding routine for this. I agree that we can always add some helper routines in later as needed. > > The access layer should only need to > > worry about modifying things like the SegmentNumber and > > PayloadLength. > > Wouldn't the response and transaction time outs occur within RMPP ? Given that the spec gives us something like: total transaction time = 4.096 us x payloadlen/220 x (2^packetlife_send_to_recv + 2^packetlife_recv_to_send + 2^recv_resp_time + 2^send_RMPP_resp) x 8 because we want a bigger value And trying to calculate this actual timeout involves sending a bunch of queries to the SA, I'm inclined to say, let's just do something really simple. From greg at kroah.com Wed Aug 4 15:55:26 2004 From: greg at kroah.com (Greg KH) Date: Wed, 4 Aug 2004 15:55:26 -0700 Subject: [openib-general] Re: device classes In-Reply-To: <524qnjjvp7.fsf@topspin.com> References: <524qnjjvp7.fsf@topspin.com> Message-ID: <20040804225526.GB11004@kroah.com> On Tue, Aug 03, 2004 at 10:53:08PM -0700, Roland Dreier wrote: > Today we decided that upper-layer protocols like IPoIB should be > classes (in the sense of the word ;). However, as I > start to try and figure things out, I don't see how a class gets > hotplug-type notifications when devices appear or disappear. A class? Look at the struct class_interface code, specifically class_interface_register() and class_interface_unregister() functions. 
If you call them, then the function pointers you pass in for the add and remove functions in the struct class_interface will get called for every struct class_device that is added or removed for that struct class. > Am I missing something, or does the IB core have to implement > something where classes like IPoIB register with the core, and then > the core creates class_device instances for every registered class > when devices are added? No, that sounds about right. IPoIB would be a struct class. Hm, but then you want that class code to be called whenever a struct device (really an ib_device) is added to the system. That's not built in anywhere, you'll have to either add some custom code, or I need to get off my butt and create a struct bus_interface chunk of code that will work like struct class_interface. Would that help out here? thanks, greg k-h From iod00d at hp.com Wed Aug 4 16:02:00 2004 From: iod00d at hp.com (Grant Grundler) Date: Wed, 4 Aug 2004 16:02:00 -0700 Subject: [openib-general] ib_req_ncomp_notif in core_ layer In-Reply-To: <20040804084138.GA29136@mellanox.co.il> References: <20040804084138.GA29136@mellanox.co.il> Message-ID: <20040804230200.GB548@cup.hp.com> On Wed, Aug 04, 2004 at 11:41:38AM +0300, Michael S. Tsirkin wrote: > Why cant all users just do device->req_ncomp_notif(cq, wc_cnt) directly? It's useful to hide the indirect function call with either static inline or a macro. This reduces programming errors, hides the "method" used to access the indirect function and defines the "API" for a layer of code. > Its true that req_ncomp_notif could be NULL, but this could > be fixed, instead, by modifying it to a reasonable default when > the device is initialised. I asked the same question about 4 months ago for a different "jump table". If a function is "required", then supplying a default function call is practical (and my preference too).
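[Editor's note] A minimal sketch of the idiom Grant describes — wrap the jump-table slot in a static inline so callers get one well-defined entry point, with a default for drivers that leave the optional method NULL. Names and layout are illustrative only, not the actual gen2 ib_verbs.h:

```c
struct ib_cq;	/* opaque here */

struct ib_device {
	/* optional method; a driver may leave this NULL */
	int (*req_ncomp_notif)(struct ib_cq *cq, int wc_cnt);
};

/* Default the core could supply when the driver doesn't implement it. */
static int ib_req_ncomp_notif_nosup(struct ib_cq *cq, int wc_cnt)
{
	(void) cq;
	(void) wc_cnt;
	return -38;	/* -ENOSYS in kernel code */
}

/* The wrapper hides both the indirect call and the NULL check,
 * so consumers never touch the function pointer directly. */
static inline int ib_req_ncomp_notif(struct ib_device *device,
				     struct ib_cq *cq, int wc_cnt)
{
	if (device->req_ncomp_notif)
		return device->req_ncomp_notif(cq, wc_cnt);
	return ib_req_ncomp_notif_nosup(cq, wc_cnt);
}

/* A toy driver method so the dispatch can be exercised. */
static int toy_req_ncomp_notif(struct ib_cq *cq, int wc_cnt)
{
	(void) cq;
	return wc_cnt;	/* just echo the argument back */
}
```

Either variant above answers the "could be NULL" concern: the check lives in one place (the wrapper) or nowhere (the default fill-in at init time), not in every caller.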
thanks, grant From mshefty at ichips.intel.com Wed Aug 4 15:05:18 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 4 Aug 2004 15:05:18 -0700 Subject: [openib-general] MAD snooping Message-ID: <20040804150518.1d29d140.mshefty@ichips.intel.com> I've been looking at what it would take to get some sort of snooping functionality into the MAD APIs. Questions: At which level should this functionality be? Should it be at a lower-level, where individual MADs are seen? Or should it be above the request/response level, similar to what other clients would see? Or both? Is there a better name for this than snooping? Do we need this feature on redirected QPs, if possible? At the lower-layer, *all* MADs would be seen, but the user doesn't get the benefits of RMPP. At the upper level, they get RMPP, but some MADs wouldn't be seen. I'm also trying to understand how this might be implemented, such that if no one is snooping performance is relatively unaffected. I think this depends on the layering. From Neeraj.Gupta at Sun.COM Wed Aug 4 16:13:58 2004 From: Neeraj.Gupta at Sun.COM (Neeraj Gupta) Date: Wed, 04 Aug 2004 16:13:58 -0700 Subject: [openib-general] IPoIB (IBsNice) communication through a TCA Message-ID: <41116DB6.5090907@Sun.COM> Hi, If anyone has experience of setting up an environment like this: Linux+HCA+OpenIB(eth#) <--> IB Switch (with TCA) <--> ethernet host please let me know. I have a few questions and issues. Basically, I am unable to communicate through end to end. Regards, -- ******************************************************************* * Neeraj Gupta email: neeraj.gupta at sun.com * Netra Systems & Networking phone: +1(510)936-4852 or x14852 * Sun Microsystems, Inc. 
fax : +1(510)936-4963 * 7788 Gateway Blvd, UNWK19-202, Newark, CA 94560, USA ******************************************************************* From halr at voltaire.com Wed Aug 4 16:43:00 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 04 Aug 2004 19:43:00 -0400 Subject: [openib-general] IPoIB (IBsNice) communication through a TCA In-Reply-To: <41116DB6.5090907@Sun.COM> References: <41116DB6.5090907@Sun.COM> Message-ID: <1091662983.1485.256.camel@localhost.localdomain> On Wed, 2004-08-04 at 19:13, Neeraj Gupta wrote: > Hi, > > If anyone has experience of setting up an environment like this: > > Linux+HCA+OpenIB(eth#) <--> IB Switch (with TCA) <--> ethernet host > > please let me know. I have a few questions and issues. Basically, I am > unable to communicate through end to end. Are you running a routing protocol or did you set up static routes at both end nodes ? The two end nodes are on two different IP subnets. The IB switch with TCA is a gateway (router) between the two (IP) subnets. -- Hal From mshefty at ichips.intel.com Wed Aug 4 16:02:13 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 4 Aug 2004 16:02:13 -0700 Subject: [openib-general] [PATCH] added comments to ib_mad.h - minor update Message-ID: <20040804160213.3e93cb2a.mshefty@ichips.intel.com> Here's a patch that includes some additional comments for the APIs in ib_mad.h. I've also added a new value to ib_wc_status that can be used to indicate that a send request timed-out. (I originally wasn't going to do this, but it seemed like the best way to handle this case.) Other changes were renaming fields to match those used in ib_verbs.h. And a new field for an address handle was added to allow clients to cache and re-use address handles when sending MADs. I was going to rename the ib_mad_reg_class to ib_mad_reg, but that's the name of the structure, so I'd have to change that name first... These have not been committed yet. 
While commenting the ib_mad_msg structure, I was wondering if it made sense to break the structure apart into multiple structures: ib_send_mad_wr, ib_mad_send_wc, ib_mad_recv_wc. Thoughts? Index: ib_verbs.h =================================================================== --- ib_verbs.h (revision 587) +++ ib_verbs.h (working copy) @@ -515,7 +515,8 @@ IB_WC_REM_ABORT_ERR, IB_WC_INV_EECN_ERR, IB_WC_INV_EEC_STATE_ERR, - IB_WC_GENERAL_ERR + IB_WC_GENERAL_ERR, + IB_WC_RESP_TIMEOUT_ERR }; enum ib_wc_opcode { Index: ib_mad.h =================================================================== --- ib_mad.h (revision 587) +++ ib_mad.h (working copy) @@ -42,27 +42,41 @@ u32 lo_tid; }; +/** + * ib_mad_msg - MAD information to send or was received. + * list - Allows chaining together multiple messages. + * @context - Set by user and returned for sent messages. For a received + * response message, set to the @context for the matching send. + * @buf - TBD + * @length - TBD + * @ah - Address handle used for send operation. + * @timeout_ms - Used for send operations. Timeout value, in milliseconds, + * to wait for a response message. Set to 0 if no response is expected. + * @status - Completion status for a sent message. + * @remote_qp - Destination QP for a sent message. Source QP for a received + * message. + * @remote_qkey - Specifies the qkey used by remote QP for send operations. + * @remote_lid - LID of remote QP for a received message. + * @pkey_index - pkey index for a received message. + * @sl - service level of source for a received message. + * @path_bits - path bits of source for a received message. + * @grh_flag - Indicates if the GRH is valid. + */ struct ib_mad_msg { struct list_head list; - - /* See about zero-copy... 
*/ void *buf; int length; - - /* send_context is set on a receive to context on matching send */ - void *send_context; + struct ib_ah *ah; void *context; int timeout_ms; - enum ib_wc_status_t status; - u32 remote_qp; - u32 remote_q_key; + u32 remote_qkey; u16 remote_lid; u16 pkey_index; - u8 service_level; + u8 sl; u8 path_bits; - u8 grh_valid; + u8 grh_flag; }; /** From roland at topspin.com Wed Aug 4 17:30:25 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 04 Aug 2004 17:30:25 -0700 Subject: [openib-general] Re: device classes In-Reply-To: <20040804225526.GB11004@kroah.com> (Greg KH's message of "Wed, 4 Aug 2004 15:55:26 -0700") References: <524qnjjvp7.fsf@topspin.com> <20040804225526.GB11004@kroah.com> Message-ID: <523c32ifz2.fsf@topspin.com> Greg> A class? Look at the struct class_interface code, Greg> specifically class_interface_register() and Greg> class_interface_unregister() functions. If you call them, Greg> then the function pointers you pass in for the add and Greg> remove functions in the struct class_interface will get Greg> called for every struct class_device that is added or Greg> removed for that struct class. Hmm, that could work, if the core creates "infiniband_device" and "infiniband_port" classes. But is it kosher to use the class_device.dev pointer to go back up to the actual device (and then create a new class_device)? Greg> No, that sounds about right. IPoIB would be a struct class. Greg> Hm, but then you want that class code to be called whenever Greg> a struct device (really a ib_device) is added to the system. Greg> That's not built in anywhere, you'll have to either add some Greg> custom code, or I need to get off my butt and create a Greg> struct bus_interface chunk of code that will work like Greg> struct class_interface. Would that help out here? Not sure -- it seems like bus_interface would be per bus (not per "bus type"), so it's not quite what we want. 
IPoIB wants to get notified whenever a device is added to any of the "virtual infiniband" buses we talked about. At a high level here are the things I think we want to happen (exact function names below aren't important, I'm just trying to come up with placeholders):

- when a low-level driver (eg mthca) finds an HCA, it calls ib_register_device(), which creates a virtual bus rooted at the PCI device and adds new virtual devices for the HCA and each of its ports (in part to be able to put both global HCA attributes and also per-port attributes in sysfs)

- when an ULP (IPoIB, SDP, etc) is loaded, it calls ib_register_ulp(). This will trigger callbacks for all of the HCA devices that already exist (if the ULP is loaded after the LLD) and also when new HCA devices are added.

- when a low-level driver is unloaded or an HCA is hot-removed, it calls ib_unregister_device(), which calls every ULP's remove() method.

- when a ULP is unloaded, it calls ib_unregister_ulp(), which calls the ULP's remove() method for every HCA in the system.

The question is how to make this happen within the device model. One way would be to use (abuse?) the class_interface stuff by having ib_register_device() create virtual devices and then create class_devices for each virtual device. Then the ULPs would get called back by the class_interface stuff, and then follow the dev pointer back to the original struct device and create their own class_devices. There may be a better way... - R.
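[Editor's note] As a toy user-space model of the four placeholder calls sketched above (hypothetical names, as Roland says; no locking and no device-model integration), the flow is just two lists plus cross-notification:

```c
#include <stddef.h>

struct ib_device { const char *name; };

struct ib_ulp {
	void (*add)(struct ib_device *dev);
	void (*remove)(struct ib_device *dev);
};

#define MAX_ENTRIES 8
static struct ib_device *devices[MAX_ENTRIES];
static struct ib_ulp *ulps[MAX_ENTRIES];

/* Low-level driver found an HCA: record it, notify every loaded ULP. */
static void ib_register_device(struct ib_device *dev)
{
	int i;
	for (i = 0; i < MAX_ENTRIES; i++)
		if (devices[i] == NULL) { devices[i] = dev; break; }
	for (i = 0; i < MAX_ENTRIES; i++)
		if (ulps[i] != NULL)
			ulps[i]->add(dev);
}

/* ULP loaded: record it, then call add() for every existing HCA. */
static void ib_register_ulp(struct ib_ulp *ulp)
{
	int i;
	for (i = 0; i < MAX_ENTRIES; i++)
		if (ulps[i] == NULL) { ulps[i] = ulp; break; }
	for (i = 0; i < MAX_ENTRIES; i++)
		if (devices[i] != NULL)
			ulp->add(devices[i]);
}

/* Driver unloaded or HCA hot-removed: every ULP's remove() runs. */
static void ib_unregister_device(struct ib_device *dev)
{
	int i;
	for (i = 0; i < MAX_ENTRIES; i++)
		if (ulps[i] != NULL)
			ulps[i]->remove(dev);
	for (i = 0; i < MAX_ENTRIES; i++)
		if (devices[i] == dev)
			devices[i] = NULL;
}

/* A counting ULP so the callback ordering can be observed. */
static int toy_adds, toy_removes;
static void toy_add(struct ib_device *dev)    { (void) dev; toy_adds++; }
static void toy_remove(struct ib_device *dev) { (void) dev; toy_removes++; }
static struct ib_ulp toy_ulp = { toy_add, toy_remove };
```

In the real device model, this cross-notification is roughly what the class_interface add/remove callbacks would provide; the sketch only shows the ordering guarantees being asked for (ULPs see pre-existing HCAs at registration, and every ULP sees every removal).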
From roland at topspin.com Wed Aug 4 18:44:27 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 04 Aug 2004 18:44:27 -0700 Subject: [openib-general] IPoIB (IBsNice) communication through a TCA In-Reply-To: <41116DB6.5090907@Sun.COM> (Neeraj Gupta's message of "Wed, 04 Aug 2004 16:13:58 -0700") References: <41116DB6.5090907@Sun.COM> Message-ID: <52y8kugxz8.fsf@topspin.com> Neeraj> Hi, If anyone has experience of setting up an environment Neeraj> like this: Neeraj> Linux+HCA+OpenIB(eth#) <--> IB Switch (with TCA) <--> Neeraj> ethernet host Neeraj> please let me know. I have a few questions and Neeraj> issues. Basically, I am unable to communicate through end Neeraj> to end. What kind of TCA are you using? If it is a Topspin ethernet gateway I may be able to help you. - Roland From iod00d at hp.com Wed Aug 4 20:03:01 2004 From: iod00d at hp.com (Grant Grundler) Date: Wed, 4 Aug 2004 20:03:01 -0700 Subject: [openib-general] Re: device classes In-Reply-To: <20040804225526.GB11004@kroah.com> References: <524qnjjvp7.fsf@topspin.com> <20040804225526.GB11004@kroah.com> Message-ID: <20040805030301.GA3541@cup.hp.com> On Wed, Aug 04, 2004 at 03:55:26PM -0700, Greg KH wrote: > Hm, but > then you want that class code to be called whenever a struct device > (really a ib_device) is added to the system. Would the device driver have to advertise that it supports the class? 
grant From halr at voltaire.com Wed Aug 4 21:31:40 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 05 Aug 2004 00:31:40 -0400 Subject: [openib-general] [PATCH] ib_mad.h: Fix some typos Message-ID: <1091680301.3605.271.camel@localhost.localdomain> Fix some typos Index: ib_mad.h =================================================================== --- ib_mad.h (revision 587) +++ ib_mad.h (working copy) @@ -28,6 +28,9 @@ #include "ib_verbs.h" +struct ib_mad_reg; +struct ib_mad_msg; + typedef void (*ib_mad_send_handler)(struct ib_mad_reg *mad_reg, struct ib_mad_msg *msg); typedef void (*ib_mad_recv_handler)(struct ib_mad_reg *mad_reg, @@ -54,7 +57,7 @@ void *context; int timeout_ms; - enum ib_wc_status_t status; + enum ib_wc_status status; u32 remote_qp; u32 remote_q_key; @@ -77,8 +80,8 @@ */ struct ib_mad_reg_req { u8 mgmt_class; - u8 mgmt_class_version, - DECLARE_BITMAP(method_mask, 128), + u8 mgmt_class_version; + DECLARE_BITMAP(method_mask, 128); }; /** From tziporet at mellanox.co.il Thu Aug 5 04:18:06 2004 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Thu, 5 Aug 2004 14:18:06 +0300 Subject: [openib-general] vapi start error : mod_thh Message-ID: <506C3D7B14CDD411A52C00025558DED603CFD218@mtlex01.yok.mtl.com> Hi, From /var/log/messages we can see that the driver does not get the correct FW version (the version received is 0.0.0). Query FW is the first command that the driver performs. The command succeeds but the data written to outbox is wrong. The driver gives a physical address for the HCA to write the outbox, and since the command does succeed I suspect there is a problem with the physical address that the HCA writes. We once saw such a problem and it was a chipset/bios problem. Please provide us the following details so we will have a better understanding: 1. The platform you are using. 2. The OS & kernel 3.
The output of lspci -xvv Tziporet -----Original Message----- From: Neeraj Gupta [mailto:Neeraj.Gupta at Sun.COM] Sent: Thursday, August 05, 2004 12:58 AM To: Dror Goldenberg Subject: Re: [openib-general] vapi start error : mod_thh Hi Dror, Thanks for your reply. here is the log from /var/log/messages. Actually, I plugged a different HCA now.. and still the same result. I really need to get this system up asap. Please help if possible. Earlier, I was using the mellanox cougar cub card and this time, its the regular version (normal height with memory socket) - Neeraj Aug 4 14:56:51 nspgqa75b tdg: vapi Inspecting PCI chipset: : Success Aug 4 14:56:52 nspgqa75b tdg: vapi Loading mosal: : Success Aug 4 14:56:52 nspgqa75b tdg: vapi Creating device node /dev/mosal: : Success Aug 4 14:56:52 nspgqa75b tdg: vapi Loading mod_mpga: : Success Aug 4 14:56:52 nspgqa75b tdg: vapi Loading mod_vapi_common: : Success Aug 4 14:56:52 nspgqa75b tdg: vapi Loading mod_hh: : Success Aug 4 14:56:52 nspgqa75b kernel: Aug 4 14:56:52 nspgqa75b kernel: Mellanox Tavor Device Driver is creating device "InfiniHost0" (bus=04, devfn=00) Aug 4 14:56:52 nspgqa75b kernel: Aug 4 14:56:52 nspgqa75b kernel: Uhhuh. NMI received for unknown reason 31 on CPU 0. Aug 4 14:56:52 nspgqa75b kernel: Dazed and confused, but trying to continue Aug 4 14:56:52 nspgqa75b kernel: Do you have a strange power saving mode enabled? 
Aug 4 14:56:52 nspgqa75b kernel: THH(1): THH_hob_create: INSTALLED FIRMWARE VERSION IS NOT SUPPORTED: Aug 4 14:56:52 nspgqa75b kernel: Installed: 0.0.0, Minimum Required: 1.15.0 Aug 4 14:56:52 nspgqa75b kernel: Aug 4 14:56:52 nspgqa75b kernel: THH(1): linux/thh_mod_obj.c[292]: Failed creating THH_hob for InfiniHost0 Aug 4 14:56:52 nspgqa75b kernel: THH(1): THH_init_hh_all_tavor: For all 1 Tavor devices initialization was not successful Aug 4 14:56:52 nspgqa75b kernel: THH(1): linux/thh_mod_obj.c[329]: Failed initialization of all available InfiniHost devices Aug 4 14:56:52 nspgqa75b kernel: Aug 4 14:56:52 nspgqa75b tdg: vapi Loading mod_thh: : Failure Dror Goldenberg wrote: > > -----Original Message----- > > From: Neeraj Gupta [mailto:Neeraj.Gupta at Sun.COM] > > Sent: Wednesday, August 04, 2004 11:39 PM > > > > Hi, > > > > I am getting failure while starting VAPI. > > > > Try looking at /var/log/messages - maybe you can get a hint there. > I'd also try /sbin/lspci to see that you got the HCA properly installed. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From halr at voltaire.com Thu Aug 5 05:28:50 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 05 Aug 2004 08:28:50 -0400 Subject: [openib-general] ib_mad.h rmpp_version and RRespTime in header Message-ID: <1091708931.1212.283.camel@localhost.localdomain> Hi, Do we really want the consumer to know RMPP version and pass it into ib_mad_reg_class (and also ib_mad_qp_redir) ? It would seem to me that the RMPP implementation should support whatever version(s) it does and be backward compatible to other versions. The only reason I recall from this thread was allowing the consumer to set RRespTime in the RMPP header as a way of passing this. If that is the only thing, why not do this some other way (like part of ib_mad_msg struct) ? Shouldn't we isolate the consumer from these sorts of things ? 
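[Editor's note] The RRespTime values under discussion are the IBA 5-bit time exponents t representing 4.096 us x 2^t (the same encoding as the 2^recv_resp_time terms quoted elsewhere in this thread). If the consumer is to be isolated from the encoding, the MAD layer would need an encoder along these lines — a hypothetical helper, using 4 us as an under-approximation of 4.096 us, so results can be one step conservative near exact boundaries:

```c
/* Encode a timeout in microseconds as the smallest 5-bit IBA time
 * exponent t (clamped to 31) such that ~4 us * 2^t covers it. */
static unsigned char usec_to_time_exponent(unsigned long long usec)
{
	unsigned char t = 0;
	unsigned long long span = 4;	/* ~4.096 us, rounded down */

	while (t < 31 && span < usec) {
		span <<= 1;
		t++;
	}
	return t;
}
```

Whether this lives in ib_mad_msg (per message) or behind the registration call is exactly the policy question Hal raises; the helper itself is the same either way.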
-- Hal From mst at mellanox.co.il Thu Aug 5 05:52:42 2004 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 5 Aug 2004 15:52:42 +0300 Subject: [openib-general] ib_req_ncomp_notif in core_ layer In-Reply-To: <20040804230200.GB548@cup.hp.com> References: <20040804084138.GA29136@mellanox.co.il> <20040804230200.GB548@cup.hp.com> Message-ID: <20040805125242.GH2640@mellanox.co.il> Hello, Grant! Quoting r. Grant Grundler (iod00d at hp.com) "Re: [openib-general] ib_req_ncomp_notif in core_ layer": > On Wed, Aug 04, 2004 at 11:41:38AM +0300, Michael S. Tsirkin wrote: > > Why cant all users just do device->req_ncomp_notif(cq, wc_cnt) directly? > > It's useful to hide the indirect function call with either static inline > or a macro. But req_ncomp_notif is not inline, nor is it a macro. > This reduces programming errors, hides the "method" used > to access the indirect function and defines the "API" for a layer of code. I do see a problem though since each new method has to be added in multiple places - I count 4: core header, core implementation, device structure, specific implementation. Thanks, mst From halr at voltaire.com Thu Aug 5 06:03:47 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 05 Aug 2004 09:03:47 -0400 Subject: [openib-general] Some ib_mad.h Redirection Comments Message-ID: <1091711028.3605.295.camel@localhost.localdomain> Hi, I have some questions about the redirection support being proposed in ib_mad.h: Does ib_mad_qp_redir need to take a ib_mad_reg_req parameter or at least a GS class ? How is the redirection information in a GS agent's ClassPortInfo filled in ? Is this part of what register and redirection (maybe a different one from what is proposed) should handle ? The redirection can occur to a remote QP, not just a QP on the local node. In terms of a GS entity making requests (which have responses), if it is redirected, will the retransmission to the redirected location occur without consumer intervention ? Thanks. 
-- Hal From halr at voltaire.com Thu Aug 5 07:06:20 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 05 Aug 2004 10:06:20 -0400 Subject: [openib-general] [ANNOUNCE] Voltaire OpenIB stack 2.6 kernel support Message-ID: <1091714781.1169.6.camel@localhost.localdomain> The Voltaire OpenIB stack in https://openib.org/svn/trunk/contrib/voltaire/voltaire-ibhost/ now supports the 2.6 kernel. It has the following limitations currently: 1. It was NOT tested on 2.4 kernels. 2. No dapl & user mode interface for gsi & cm yet. 3. Needs MORE testing on 2.6. -- Hal From iod00d at hp.com Thu Aug 5 08:27:36 2004 From: iod00d at hp.com (Grant Grundler) Date: Thu, 5 Aug 2004 08:27:36 -0700 Subject: [openib-general] ib_req_ncomp_notif in core_ layer In-Reply-To: <20040805125242.GH2640@mellanox.co.il> References: <20040804084138.GA29136@mellanox.co.il> <20040804230200.GB548@cup.hp.com> <20040805125242.GH2640@mellanox.co.il> Message-ID: <20040805152736.GB6526@cup.hp.com> On Thu, Aug 05, 2004 at 03:52:42PM +0300, Michael S. Tsirkin wrote: > > It's useful to hide the indirect function call with either static inline > > or a macro. > But req_ncomp_notif is not inline, nor is it a macro. I saw that before I replied. Making it an inline or a macro means it has to be moved to a header file. I'm perfectly ok with that if folks are (1) comfortable the method won't change soon and (2) if it does, an ABI "event" will occur. > > This reduces programming errors, hides the "method" used > > to access the indirect function and defines the "API" for a layer of code. > > I do see a problem though > since each new method has to be added in multiple places - I count 4: > core header, core implementation, device structure, specific implementation. Yes. That can be reduced a bit by using inline/macro in a core header. But basically you are right. That's a tradeoff of using a jump table which provides needed flexibility.
thanks, grant

From mshefty at ichips.intel.com Thu Aug 5 07:32:26 2004
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 5 Aug 2004 07:32:26 -0700
Subject: [openib-general] ib_req_ncomp_notif in core_ layer
In-Reply-To: <20040805125242.GH2640@mellanox.co.il>
References: <20040804084138.GA29136@mellanox.co.il> <20040804230200.GB548@cup.hp.com> <20040805125242.GH2640@mellanox.co.il>
Message-ID: <20040805073226.497bb020.mshefty@ichips.intel.com>

On Thu, 5 Aug 2004 15:52:42 +0300
"Michael S. Tsirkin" wrote:

> But req_ncomp_notif is not inline, nor is it a macro.

This call is inline in the ib_verbs.h version, which is what I think we're slowly moving towards. Since it's an optional call though, it shouldn't assume that the call exists, and should check first.

From mshefty at ichips.intel.com Thu Aug 5 07:35:55 2004
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 5 Aug 2004 07:35:55 -0700
Subject: [openib-general] [PATCH] ib_mad.h: Fix some typos
In-Reply-To: <1091680301.3605.271.camel@localhost.localdomain>
References: <1091680301.3605.271.camel@localhost.localdomain>
Message-ID: <20040805073555.66331a3d.mshefty@ichips.intel.com>

On Thu, 05 Aug 2004 00:31:40 -0400
Hal Rosenstock wrote:

> Fix some typos

Thanks! I'll apply and commit shortly (about an hour).

From roland at topspin.com Thu Aug 5 08:39:00 2004
From: roland at topspin.com (Roland Dreier)
Date: Thu, 05 Aug 2004 08:39:00 -0700
Subject: [openib-general] ib_req_ncomp_notif in core_ layer
In-Reply-To: <20040805073226.497bb020.mshefty@ichips.intel.com> (Sean Hefty's message of "Thu, 5 Aug 2004 07:32:26 -0700")
References: <20040804084138.GA29136@mellanox.co.il> <20040804230200.GB548@cup.hp.com> <20040805125242.GH2640@mellanox.co.il> <20040805073226.497bb020.mshefty@ichips.intel.com>
Message-ID: <52llgth9wr.fsf@topspin.com>

Sean> This call is inline in the ib_verbs.h version, which is what
Sean> I think we're slowly moving towards.
Sean> Since it's an optional call though, it shouldn't assume that the
Sean> call exists, and should check first.

Yeah, I'll make req_ncomp_notif an inline in my tree as well, since it's a data path method.

I think most of the non-data path functions should not be inline, since they may need to do some reference counting and so on, and smaller text size is probably better for performance than inlining the function (especially since the jump to the core function is not through a function pointer and can be predicted perfectly).

- R.

From tduffy at sun.com Thu Aug 5 08:52:11 2004
From: tduffy at sun.com (Tom Duffy)
Date: Thu, 05 Aug 2004 08:52:11 -0700
Subject: [openib-general] reply-to munging
In-Reply-To: <1091219655.3942.1263.camel@localhost>
References: <1091219655.3942.1263.camel@localhost>
Message-ID: <1091721131.10131.4.camel@duffman>

On Fri, 2004-07-30 at 13:34 -0700, Tom Duffy wrote:
> Is there a reason why reply-to munging has been turned off?

Ah, it looks like openib switched to mailman (boy, that was quiet and done smoothly). Now there are more than two lists available. Can somebody explain what these various lists are for?

http://openib.org/mailman/listinfo

openib-bylaws
openib-commits
openib-general
openib-promoters
openib-software

Obviously, I know what general and commits are, but what are the other three?

-tduffy

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: 

From mshefty at ichips.intel.com Thu Aug 5 08:20:48 2004
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 5 Aug 2004 08:20:48 -0700
Subject: [openib-general] ib_mad.h rmpp_version and RRespTime in header
In-Reply-To: <1091708931.1212.283.camel@localhost.localdomain>
References: <1091708931.1212.283.camel@localhost.localdomain>
Message-ID: <20040805082048.14fe6da6.mshefty@ichips.intel.com>

On Thu, 05 Aug 2004 08:28:50 -0400
Hal Rosenstock wrote:

> Do we really want the consumer to know RMPP version and pass it into
> ib_mad_reg_class (and also ib_mad_qp_redir) ?

I think that we need a flag of some sort for the consumer to indicate that a MAD has the RMPP header. The size of the RMPP header may change between versions, so the access layer needs to know how large the header is. The consumer needs to know which version is being used, so that it can allocate the header in the MAD. I guess an alternative is to specify the version in the ib_mad_msg (or set directly in the MAD).

My hope is that the access layer can be isolated from changes to the management classes, or from having to know which classes require RMPP.

> It would seem to me that the RMPP implementation should support whatever
> version(s) it does and be backward compatible to other versions.

Didn't the RMPP stuff change completely between 1.0 and 1.1, such that trying to support 1.0 and 1.1 requires the client to control the RMPP? This is kind of what I'm worried about. Also, a client may have an idea of what RMPP version is supported by the remote side, whereas the access layer does not.

> The only reason I recall from this thread was allowing the consumer to
> set RRespTime in the RMPP header as a way of passing this. If that is
> the only thing, why not do this some other way (like part of ib_mad_msg
> struct) ? Shouldn't we isolate the consumer from these sorts of things ?
I think that we should examine whatever alternatives make sense. However, I believe that the sender should provide the buffer space for both the MAD and RMPP headers. If that is done, then it seems to make more sense to set the data directly in the header, rather than have the access layer copy it from ib_mad_msg into the header.

From tduffy at sun.com Thu Aug 5 09:39:43 2004
From: tduffy at sun.com (Tom Duffy)
Date: Thu, 05 Aug 2004 09:39:43 -0700
Subject: [openib-general] [PATCH] kill build warning in ipoib_arp.c
Message-ID: <1091723983.10131.7.camel@duffman>

Building ipoib_arp.c causes a warning with new API:

  CC [M]  drivers/infiniband/ulp/ipoib/ipoib_arp.o
drivers/infiniband/ulp/ipoib/ipoib_arp.c: In function `_ipoib_sarp_path_lookup':
drivers/infiniband/ulp/ipoib/ipoib_arp.c:494: warning: `return' with a value, in function returning void

Signed-by: Tom Duffy with permission from Sun legal.

Index: drivers/infiniband/ulp/ipoib/ipoib_arp.c
===================================================================
--- drivers/infiniband/ulp/ipoib/ipoib_arp.c	(revision 589)
+++ drivers/infiniband/ulp/ipoib/ipoib_arp.c	(working copy)
@@ -490,8 +490,6 @@
 			dev->name);
 		entry->tid = tid;
 	}
-
-	return 0;
 }
 
 /* =============================================================== */

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: 

From mshefty at ichips.intel.com Thu Aug 5 08:40:33 2004
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 5 Aug 2004 08:40:33 -0700
Subject: [openib-general] Some ib_mad.h Redirection Comments
In-Reply-To: <1091711028.3605.295.camel@localhost.localdomain>
References: <1091711028.3605.295.camel@localhost.localdomain>
Message-ID: <20040805084033.7dc26c7a.mshefty@ichips.intel.com>

On Thu, 05 Aug 2004 09:03:47 -0400
Hal Rosenstock wrote:

Obviously, I haven't thought through all of the details on this, so some discussion is good.

> Does ib_mad_qp_redir need to take an ib_mad_reg_req parameter or at least
> a GS class ?

I don't think it's necessary. The ib_mad_reg_req is used to route unsolicited MADs to the proper client. Since the user has control over the QP and CQ in the redirected case, they receive all MADs on the QP that they are redirecting to.

> How is the redirection information in a GS agent's ClassPortInfo filled
> in ? Is this part of what register and redirection (maybe a different
> one from what is proposed) should handle ? The redirection can occur to
> a remote QP, not just a QP on the local node.

I was assuming that the client would format (maybe with help) and send the redirection message. Redirection messages, therefore, would flow from client to client. If a simpler redirection interface were desired, it could be built on top of the proposed APIs, but I think that that could be above the access layer.

> In terms of a GS entity making requests (which have responses), if it is
> redirected, will the retransmission to the redirected location occur
> without consumer intervention ?

I was assuming not. Since redirection comes as a response to a request, my thought was to complete the request as redirected. (Maybe we can add another completion status value.)
It should be fairly simple for the client to re-issue the send to the redirected QP. The benefit of this is that the client can automatically send future requests to the redirected QP.

From mshefty at ichips.intel.com Thu Aug 5 08:46:12 2004
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 5 Aug 2004 08:46:12 -0700
Subject: [openib-general] [PATCH] added comments to ib_mad.h - minor update
In-Reply-To: <20040804160213.3e93cb2a.mshefty@ichips.intel.com>
References: <20040804160213.3e93cb2a.mshefty@ichips.intel.com>
Message-ID: <20040805084612.5deeb4ea.mshefty@ichips.intel.com>

On Wed, 4 Aug 2004 16:02:13 -0700
Sean Hefty wrote:

> Here's a patch that includes some additional comments for the APIs in ib_mad.h.

Applied - along with some additional syntax error fixes from Hal.

From ftillier at infiniconsys.com Thu Aug 5 09:50:26 2004
From: ftillier at infiniconsys.com (Fab Tillier)
Date: Thu, 5 Aug 2004 09:50:26 -0700
Subject: [openib-general] ib_mad.h rmpp_version and RRespTime in header
In-Reply-To: <20040805082048.14fe6da6.mshefty@ichips.intel.com>
Message-ID: <000001c47b0c$506b4d90$655aa8c0@infiniconsys.com>

> From: Sean Hefty [mailto:mshefty at ichips.intel.com]
> Sent: Thursday, August 05, 2004 8:21 AM
>
> On Thu, 05 Aug 2004 08:28:50 -0400
> Hal Rosenstock wrote:
>
> > It would seem to me that the RMPP implementation should support whatever
> > version(s) it does and be backward compatible to other versions.
>
> Didn't the RMPP stuff change completely between 1.0 and 1.1, such that
> trying to support 1.0 and 1.1 requires the client to control the RMPP?
> This is kind of what I'm worried about. Also, a client may have an idea
> of what RMPP version is supported by the remote side, whereas the access
> layer does not.

Do we really want to support 1.0a? Does anyone still have code implemented to that version of the IB spec? As far as I know, all vendor stacks implement the 1.1 spec, so I'd rather see that be the baseline.
This is similar to how we're only supporting 2.6 kernels.

- Fab

From mshefty at ichips.intel.com Thu Aug 5 08:57:26 2004
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 5 Aug 2004 08:57:26 -0700
Subject: [openib-general] ib_mad.h rmpp_version and RRespTime in header
In-Reply-To: <000001c47b0c$506b4d90$655aa8c0@infiniconsys.com>
References: <20040805082048.14fe6da6.mshefty@ichips.intel.com> <000001c47b0c$506b4d90$655aa8c0@infiniconsys.com>
Message-ID: <20040805085726.014ae378.mshefty@ichips.intel.com>

On Thu, 5 Aug 2004 09:50:26 -0700
"Fab Tillier" wrote:

> Do we really want to support 1.0a?

Egads - I hope not.

> Does anyone still have code implemented to
> that version of the IB spec? As far as I know, all vendor stacks implement
> the 1.1 spec, so I'd rather see that be the baseline. This is similar to
> how we're only supporting 2.6 kernels.

I only mentioned the older spec as an example of how static the headers may be. For example, I think that it makes more sense to move the RMPP headers first, and not have it be tied to MADs at all...

From mshefty at ichips.intel.com Thu Aug 5 09:02:26 2004
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 5 Aug 2004 09:02:26 -0700
Subject: [openib-general] ib_req_ncomp_notif in core_ layer
In-Reply-To: <52llgth9wr.fsf@topspin.com>
References: <20040804084138.GA29136@mellanox.co.il> <20040804230200.GB548@cup.hp.com> <20040805125242.GH2640@mellanox.co.il> <20040805073226.497bb020.mshefty@ichips.intel.com> <52llgth9wr.fsf@topspin.com>
Message-ID: <20040805090226.28e24b02.mshefty@ichips.intel.com>

On Thu, 05 Aug 2004 08:39:00 -0700
Roland Dreier wrote:

> I think most of the non-data path functions should not be inline,
> since they may need to do some reference counting and so on,

I agree. Based on your previous e-mail about setting struct fields and reference counting, my plan was to move some of the calls out of the header file as these features were added.
From Yuefeng.Liu at Sun.COM Thu Aug 5 10:26:48 2004
From: Yuefeng.Liu at Sun.COM (Yuefeng Liu)
Date: Thu, 05 Aug 2004 10:26:48 -0700
Subject: [openib-general] [ANNOUNCE] Voltaire OpenIB stack 2.6 kernel support
In-Reply-To: <1091714781.1169.6.camel@localhost.localdomain>
References: <1091714781.1169.6.camel@localhost.localdomain>
Message-ID: <41126DD8.3010701@Sun.COM>

I just did a svn update on my openib tree and tried to build the voltaire code. I got the following error code from building the src rpm ibhost-2.1.0_12_GPL_BSD-1.src.rpm (I packaged the rpm using ibhost-buildrpm.sh)

gcc -fPIC -DDYNAMIC -O2 -Wall -D_DEBUG -DVD_LOGGER_ON -std=c99 -nostdlib -shared -o sock-redirect-env.so sock-redirect-env.o -ldl
rm sock-redirect.o sock-redirect-env.o
make[4]: Leaving directory `/usr/src/openib/trunk/contrib/voltaire/rpm/BUILD/ibhost-2.1.0_12_GPL_BSD/src/nvigor/sdp/sock-redirect'
make[3]: Leaving directory `/usr/src/openib/trunk/contrib/voltaire/rpm/BUILD/ibhost-2.1.0_12_GPL_BSD/src/nvigor/sdp'
make[2]: Leaving directory `/usr/src/openib/trunk/contrib/voltaire/rpm/BUILD/ibhost-2.1.0_12_GPL_BSD/src/nvigor/sdp'
cp: cannot stat `sdp/q-mng/q-mng.ko': No such file or directory
make[1]: *** [bhost26] Error 1
make[1]: Leaving directory `/usr/src/openib/trunk/contrib/voltaire/rpm/BUILD/ibhost-2.1.0_12_GPL_BSD/src/nvigor/sdp'
make: *** [bhost] Error 1
Error building ibhost....
error: Bad exit status from /var/tmp/rpm-tmp.40290 (%build)

Hal Rosenstock wrote:
> The Voltaire OpenIB stack in
> https://openib.org/svn/trunk/contrib/voltaire/voltaire-ibhost/ now
> supports the 2.6 kernel.
>
> It has the following limitations currently:
> 1. It was NOT tested on 2.4 kernels.
> 2. No dapl & user mode interface for gsi & cm yet.
> 3. Needs MORE testing on 2.6.
>
> -- Hal
>
>
>
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

From halr at voltaire.com Thu Aug 5 13:21:24 2004
From: halr at voltaire.com (Hal Rosenstock)
Date: Thu, 05 Aug 2004 16:21:24 -0400
Subject: [openib-general] GSI compromise
In-Reply-To: <52smb2iziz.fsf@topspin.com>
References: <35EA21F54A45CB47B879F21A91F4862F18AA8F@taurus.voltaire.com> <52ekmrr5o5.fsf@topspin.com> <20040803173643.43143cdd.mshefty@ichips.intel.com> <52llgv7cx0.fsf@topspin.com> <20040804080129.2749e280.mshefty@ichips.intel.com> <52smb2iziz.fsf@topspin.com>
Message-ID: <1091737286.1911.32.camel@localhost.localdomain>

On Wed, 2004-08-04 at 13:28, Roland Dreier wrote:
> Sean> I'm good with redirection helper routines. I'd just like to
> Sean> get a clean layering. Btw, does anyone know if you can
> Sean> redirect in the middle of sending an RMPP message?
>
> I don't think so -- my impression is that a redirect message counts as
> a response to a request, so you can't send one until you get the whole request.

The redirect can occur at *any* time, including in the middle of an RMPP transfer.
-- Hal

From mshefty at ichips.intel.com Thu Aug 5 12:33:54 2004
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 5 Aug 2004 12:33:54 -0700
Subject: [openib-general] GSI compromise
In-Reply-To: <1091737286.1911.32.camel@localhost.localdomain>
References: <35EA21F54A45CB47B879F21A91F4862F18AA8F@taurus.voltaire.com> <52ekmrr5o5.fsf@topspin.com> <20040803173643.43143cdd.mshefty@ichips.intel.com> <52llgv7cx0.fsf@topspin.com> <20040804080129.2749e280.mshefty@ichips.intel.com> <52smb2iziz.fsf@topspin.com> <1091737286.1911.32.camel@localhost.localdomain>
Message-ID: <20040805123354.5e7ba6ce.mshefty@ichips.intel.com>

On Thu, 05 Aug 2004 16:21:24 -0400
Hal Rosenstock wrote:

> The redirect can occur at *any* time, including in the middle of an RMPP
> transfer.

I noticed this morning that the spec said that "redirection may be used at any time".

But as Roland mentioned, redirection is a response message. It would seem like the sender would be sending back two response messages, but with different data, if they could send it at *any* time.

From mshefty at ichips.intel.com Thu Aug 5 12:45:42 2004
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 5 Aug 2004 12:45:42 -0700
Subject: [openib-general] [PATCH] added comments to ib_mad.h - minor update
In-Reply-To: <20040804160213.3e93cb2a.mshefty@ichips.intel.com>
References: <20040804160213.3e93cb2a.mshefty@ichips.intel.com>
Message-ID: <20040805124542.55fec6f9.mshefty@ichips.intel.com>

On Wed, 4 Aug 2004 16:02:13 -0700
Sean Hefty wrote:

> While commenting the ib_mad_msg structure, I was wondering if it made sense to break the structure apart into multiple structures: ib_send_mad_wr, ib_mad_send_wc, ib_mad_recv_wc. Thoughts?

Here's an ib_mad.h version that breaks ib_mad_msg into multiple structures. Looking at the results, I think it makes sense to go this route.

Other notable changes:
* ib_mad_reg/dereg_class() were renamed to ib_mad_reg/dereg() (suggested by Hal).
Field names were changed to match ib_verbs.h (the work request/completion structures).
* ib_mad_send_wr now uses an SG-list to specify the send data. This should allow zero-copy sends, and more easily extend ib_mad_post_send() over redirected QPs.
* The new ib_mad_recv_wc structure was tweaked (based on the existing proposed GSI implementation) to allow zero-copy receives. This resulted in adding a new call, ib_free_sg_list(), that can be used to deallocate the received MAD buffers. Currently, I'm not overly fond of the zero-copy receive support as defined.

Just including the file, since a diff marked nearly every line...

- Sean

/*
  This software is available to you under a choice of one of two
  licenses.  You may choose to be licensed under the terms of the GNU
  General Public License (GPL) Version 2, available at , or the
  OpenIB.org BSD license, available in the LICENSE.TXT file
  accompanying this software.  These details are also available at .

  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
  EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
  MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
  IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
  CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
  TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
  SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

  Copyright (c) 2004 Mellanox Technologies Ltd.  All rights reserved.
  Copyright (c) 2004 Infinicon Corporation.  All rights reserved.
  Copyright (c) 2004 Intel Corporation.  All rights reserved.
  Copyright (c) 2004 Topspin Corporation.  All rights reserved.
  Copyright (c) 2004 Voltaire Corporation.  All rights reserved.
 */

#if !defined( IB_MAD_H )
#define IB_MAD_H

#include "ib_verbs.h"

struct ib_mad_agent;
struct ib_mad_send_wc;
struct ib_mad_recv_wc;

typedef void (*ib_mad_send_handler)(struct ib_mad_agent *mad_agent,
				    struct ib_mad_send_wc *mad_send_wc);

/**
 * ib_mad_recv_handler - callback handler for a received MAD.
 * @mad_agent - MAD agent requesting the received MAD.
 * @mad_recv_wc - Received work completion information on the received MAD.
 *
 * After receiving a MAD, clients must call ib_free_sg_list to release
 * the data buffers associated with the MAD.
 */
typedef void (*ib_mad_recv_handler)(struct ib_mad_agent *mad_agent,
				    struct ib_mad_recv_wc *mad_recv_wc);

struct ib_mad_agent {
	struct ib_device	*device;
	struct ib_qp		*qp;
	ib_mad_recv_handler	recv_handler;
	ib_mad_send_handler	send_handler;
	void			*context;
	u32			lo_tid;
};

enum ib_mad_flags {
	IB_MAD_GRH_VALID = 1
};

/**
 * ib_mad_send_wr - send MAD work request.
 * @list - Allows chaining together multiple requests.
 * @context - User-controlled work request context.
 * @sg_list - An array of scatter-gather entries, referencing the MAD's
 *   data buffer(s).  The first entry must reference a data buffer of
 *   256 bytes.
 * @num_sge - The number of scatter-gather entries.
 * @mad_flags - Flags used to control the send operation.
 * @ah - Address handle for the destination.
 * @timeout_ms - Timeout value, in milliseconds, to wait for a response
 *   message.  Set to 0 if no response is expected.
 * @remote_qpn - Destination QP.
 * @remote_qkey - Specifies the qkey used by remote QP.
 * @pkey_index - Pkey index to use.  Required when sending on QP1 only.
 */
struct ib_mad_send_wr {
	struct list_head	list;
	void			*context;
	struct ib_sge		*sg_list;
	int			num_sge;
	int			mad_flags;
	struct ib_ah		*ah;
	int			timeout_ms;
	u32			remote_qpn;
	u32			remote_qkey;
	u16			pkey_index;
};

/**
 * ib_mad_send_wc - MAD send completion information.
 * @context - Context associated with the send MAD request.
 * @status - Completion status.
 * @vendor_err - Optional vendor error information returned with a failed
 *   request.
 */
struct ib_mad_send_wc {
	void			*context;
	enum ib_wc_status	status;
	u32			vendor_err;
};

/**
 * ib_mad_recv_wc - received MAD information.
 * @context - For received response, set to the context specified for
 *   the corresponding send request.
 * @sg_list - An array of scatter-gather entries, referencing the received
 *   MAD's data buffer(s).
 * @num_sge - The number of scatter-gather entries.
 * @mad_flags - Flags used to specify information about the received MAD.
 * @mad_len - The length of the received MAD, without duplicated headers.
 * @src_qp - Source QP.
 * @pkey_index - Pkey index.
 * @slid - LID of remote QP.
 * @sl - Service level of source for a received message.
 * @dlid_path_bits - Path bits of source for a received message.
 *
 * An RMPP receive will either be coalesced into a single data buffer, or
 * will be handed to the user as a list of scatter-gather entries in order
 * to avoid data copies.  If a list of buffers is provided to the user,
 * each buffer will be 256 bytes and contain duplicated copies of the MAD
 * and RMPP headers.
 */
struct ib_mad_recv_wc {
	void		*context;
	struct ib_sge	*sg_list;
	int		num_sge;
	int		mad_flags;
	u32		mad_len;
	u32		src_qp;
	u16		pkey_index;
	u16		slid;
	u8		sl;
	u8		dlid_path_bits;
};

/**
 * ib_mad_reg_req - MAD registration request
 * @mgmt_class - Indicates which management class of MADs should be
 *   received by the caller.  This field is only required if the user
 *   wishes to receive unsolicited MADs, otherwise it should be 0.
 * @mgmt_class_version - Indicates which version of MADs for the given
 *   management class to receive.
 * @method_mask - The caller will receive unsolicited MADs for any method
 *   where @method_mask = 1.
 */
struct ib_mad_reg_req {
	u8	mgmt_class;
	u8	mgmt_class_version;
	DECLARE_BITMAP(method_mask, 128);
};

/**
 * ib_mad_reg - Register to send/receive MADs.
 * @device - The device to register with.
 * @port - The port on the specified device to use.
 * @qp_type - Specifies which QP to access.  Must be either
 *   IB_QPT_SMI or IB_QPT_GSI.
 * @mad_reg_req - Specifies which unsolicited MADs should be received
 *   by the caller.  This parameter may be NULL if the caller only
 *   wishes to receive solicited responses.
 * @rmpp_version - If set to 1, indicates that the client will send
 *   and receive MADs that contain the RMPP header for the given version.
 *   If set to 0, indicates that RMPP is not used by this client.
 * @send_handler - The completion callback routine invoked after a send
 *   request has completed.
 * @recv_handler - The completion callback routine invoked for a received
 *   MAD.
 * @context - User specified context associated with the registration.
 */
struct ib_mad_agent *ib_mad_reg(struct ib_device *device,
				u8 port,
				enum ib_qp_type qp_type,
				struct ib_mad_reg_req *mad_reg_req,
				u8 rmpp_version,
				ib_mad_send_handler send_handler,
				ib_mad_recv_handler recv_handler,
				void *context);

int ib_mad_dereg(struct ib_mad_agent *mad_agent);

int ib_mad_post_send(struct ib_mad_agent *mad_agent,
		     struct ib_mad_send_wr *mad_send_wr);

struct ib_mad_agent *ib_mad_qp_redir(struct ib_qp *qp,
				     u8 rmpp_version,
				     ib_mad_send_handler send_handler,
				     ib_mad_recv_handler recv_handler,
				     void *context);

int ib_mad_process_wc(struct ib_mad_agent *mad_agent,
		      struct ib_wc *wc);

/**
 * ib_free_sg_list - Releases the memory associated with a received MAD.
 * @sg_list - The sg-list referencing the data buffers to release.
 * @num_sge - The number of entries in the sg-list.
 *
 * This routine releases the memory referenced by the entries in the
 * scatter-gather list, as well as the @sg_list pointer itself.
 */
int ib_free_sg_list(struct ib_sge *sg_list, int num_sge);

#endif /* IB_MAD_H */

From halr at voltaire.com Thu Aug 5 13:52:33 2004
From: halr at voltaire.com (Hal Rosenstock)
Date: Thu, 05 Aug 2004 16:52:33 -0400
Subject: [openib-general] ib_mad.h rmpp_version and RRespTime in header
In-Reply-To: <20040805082048.14fe6da6.mshefty@ichips.intel.com>
References: <1091708931.1212.283.camel@localhost.localdomain> <20040805082048.14fe6da6.mshefty@ichips.intel.com>
Message-ID: <1091739155.1911.59.camel@localhost.localdomain>

On Thu, 2004-08-05 at 11:20, Sean Hefty wrote:
> On Thu, 05 Aug 2004 08:28:50 -0400
> Hal Rosenstock wrote:
>
> > Do we really want the consumer to know RMPP version and pass it into
> > ib_mad_reg_class (and also ib_mad_qp_redir) ?
>
> I think that we need a flag of some sort for the consumer to indicate that a MAD has the RMPP header. The size of the RMPP header may change between versions, so the access layer needs to know how large the header is. The consumer needs to know which version is being used, so that it can allocate the header in the MAD. I guess an alternative is to specify the version in the ib_mad_msg (or set directly in the MAD).
>
> My hope is that the access layer can be isolated from changes to the management classes, or from having to know which classes require RMPP.

OK. The version parameter serves as both the flag and the version.

> > It would seem to me that the RMPP implementation should support whatever
> > version(s) it does and be backward compatible to other versions.
>
> Didn't the RMPP stuff change completely between 1.0 and 1.1, such that trying to support 1.0 and 1.1 requires the client to control the RMPP? This is kind of what I'm worried about. Also, a client may have an idea of what RMPP version is supported by the remote side, whereas the access layer does not.

SA 1.0a (and RMPP) was broken in many ways, which is why RMPP was totally redesigned as part of 1.1 and SA changed its class version from 1 to 2.
IBA 1.1 is a minimum requirement for deployment (and should be a minimum requirement for OpenIB). If RMPP were to change, I would expect it to change in a way such that the older version were not orphaned and new versions could talk to old versions. So the only thing needed is the local version. In fact, the access layer would return an error code if it were not a supported version. Anyhow, I'm with you on the version...

> > The only reason I recall from this thread was allowing the consumer to
> > set RRespTime in the RMPP header as a way of passing this. If that is
> > the only thing, why not do this some other way (like part of ib_mad_msg
> > struct) ? Shouldn't we isolate the consumer from these sorts of things ?
>
> I think that we should examine whatever alternatives make sense. However, I believe that the sender should provide the buffer space for both the MAD and RMPP headers. If that is done, then it seems to make more sense to set the data directly in the header, rather than have the access layer copy it from ib_mad_msg into the header.

I'm confused. I thought the sender was just sending (ib_mad_post_send_msg) a buffer of some length (indicated in ib_mad_msg). Wouldn't RMPP take that and fragment it and add the headers ? Is there another call/structure yet to be added for this ?

-- Hal

From halr at voltaire.com Thu Aug 5 13:56:12 2004
From: halr at voltaire.com (Hal Rosenstock)
Date: Thu, 05 Aug 2004 16:56:12 -0400
Subject: [openib-general] ib_mad.h rmpp_version and RRespTime in header
In-Reply-To: <000001c47b0c$506b4d90$655aa8c0@infiniconsys.com>
References: <000001c47b0c$506b4d90$655aa8c0@infiniconsys.com>
Message-ID: <1091739374.1366.66.camel@localhost.localdomain>

On Thu, 2004-08-05 at 12:50, Fab Tillier wrote:
> Do we really want to support 1.0a? Does anyone still have code implemented to
> that version of the IB spec? As far as I know, all vendor stacks implement
> the 1.1 spec, so I'd rather see that be the baseline.
> This is similar to
> how we're only supporting 2.6 kernels.

I agree. 1.0.a is orphaned. 1.1 needs to be the minimum requirement of OpenIB. We will need to deal with the upcoming 1.2 as well.

-- Hal

From halr at voltaire.com Thu Aug 5 14:20:07 2004
From: halr at voltaire.com (Hal Rosenstock)
Date: Thu, 05 Aug 2004 17:20:07 -0400
Subject: [openib-general] GSI compromise
In-Reply-To: <20040805123354.5e7ba6ce.mshefty@ichips.intel.com>
References: <35EA21F54A45CB47B879F21A91F4862F18AA8F@taurus.voltaire.com> <52ekmrr5o5.fsf@topspin.com> <20040803173643.43143cdd.mshefty@ichips.intel.com> <52llgv7cx0.fsf@topspin.com> <20040804080129.2749e280.mshefty@ichips.intel.com> <52smb2iziz.fsf@topspin.com> <1091737286.1911.32.camel@localhost.localdomain> <20040805123354.5e7ba6ce.mshefty@ichips.intel.com>
Message-ID: <1091740808.1911.115.camel@localhost.localdomain>

On Thu, 2004-08-05 at 15:33, Sean Hefty wrote:
> On Thu, 05 Aug 2004 16:21:24 -0400
> Hal Rosenstock wrote:
>
> > The redirect can occur at *any* time, including in the middle of an RMPP
> > transfer.
>
> I noticed this morning that the spec said that "redirection may be used at any time".
>
> But as Roland mentioned, redirection is a response message.

Yes (but see below).

> It would seem like the sender would be sending back two response
> messages, but with different data if they could send it at *any* time.

There are 2 cases:

1. The request is RMPP'd (SA GetMulti is an example of this). The responder can redirect in the middle of the request. This is the more straightforward one.

2. I think you (and Roland) are talking about when a SA GetTable/GetTraceTable request is made which is not RMPP'd but elicits an RMPP response. How this is done is an "implementation issue". I think there are 2 valid choices: (1) complete the transaction and redirect the next request, or (2) terminate the transaction (cause the request to be retried) and redirect that request to the new location.
-- Hal

From halr at voltaire.com Thu Aug 5 14:36:58 2004
From: halr at voltaire.com (Hal Rosenstock)
Date: Thu, 05 Aug 2004 17:36:58 -0400
Subject: [openib-general] Some ib_mad.h Redirection Comments
In-Reply-To: <20040805084033.7dc26c7a.mshefty@ichips.intel.com>
References: <1091711028.3605.295.camel@localhost.localdomain> <20040805084033.7dc26c7a.mshefty@ichips.intel.com>
Message-ID: <1091741820.1911.134.camel@localhost.localdomain>

On Thu, 2004-08-05 at 11:40, Sean Hefty wrote:
> On Thu, 05 Aug 2004 09:03:47 -0400
> Hal Rosenstock wrote:
>
> Obviously, I haven't thought through all of the details on this, so some discussion is good.
>
> > Does ib_mad_qp_redir need to take an ib_mad_reg_req parameter or at least
> > a GS class ?
>
> I don't think it's necessary. The ib_mad_reg_req is used to route unsolicited MADs to the proper client. Since the user has control over the QP and CQ in the redirected case, they receive all MADs on the QP that they are redirecting to.
>
> > How is the redirection information in a GS agent's ClassPortInfo filled
> > in ? Is this part of what register and redirection (maybe a different
> > one from what is proposed) should handle ? The redirection can occur to
> > a remote QP, not just a QP on the local node.
>
> I was assuming that the client would format (maybe with help) and send the redirection message. Redirection messages, therefore, would flow from client to client. If a simpler redirection interface were desired, it could be built on top of the proposed APIs, but I think that that could be above the access layer.

If the class were also provided when the QP was redirected above, wouldn't all the information be available to issue the redirect from when a request came in ? The only other case is when it is being redirected outside the local node, which does not appear to be supported.
> > In terms of a GS entity making requests (which have responses), if it is > > redirected, will the retransmission to the redirected location occur > > without consumer intervention ? > > I was assuming not. Since redirection comes as a response to a request, my thought was to complete the request as redirected. (Maybe we can add another completion status value.) It should be fairly simple for the client to re-issue the send to the redirected QP. The benefit of this is that the client can automatically send future requests to the redirected QP. See my previous email on the two cases. Redirection could be deferred to the next request which is what you are proposing but I think I would want to look at the implementation complexity before committing to that. In terms of not supporting remote redirection, this might be an important feature for scaling. I suppose we can always cross that one when we get there. There is one other aspect of redirection that should be discussed. Should the redirection location be cached ? That seems like a good thing to me as it saves on additional redirects to get to the new location. -- Hal From mshefty at ichips.intel.com Thu Aug 5 13:56:59 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 5 Aug 2004 13:56:59 -0700 Subject: [openib-general] Some ib_mad.h Redirection Comments In-Reply-To: <1091741820.1911.134.camel@localhost.localdomain> References: <1091711028.3605.295.camel@localhost.localdomain> <20040805084033.7dc26c7a.mshefty@ichips.intel.com> <1091741820.1911.134.camel@localhost.localdomain> Message-ID: <20040805135659.6b27105c.mshefty@ichips.intel.com> On Thu, 05 Aug 2004 17:36:58 -0400 Hal Rosenstock wrote: > If the class were also provided when the QP was redirected above, > wouldn't all the information be available to issue the redirect from > when a request came in ? Having the access layer send the redirect response prevents the client from redirecting to different QPs. 
See line 27 of page 667: "It is permissible for different requesters...to be redirected to a different interface." This is why I think we want to move the redirection above the lowest level APIs. > The only other case is when it is being redirected outside the local > node which does not appear to be supported. This is supported. The API assumes that the client formats and sends the redirect response. The purpose behind the ib_mad_qp_redir() routine is to "attach" MAD processing to a user-allocated QP. The struct ib_qp* parameter into that routine may not be needed, but could be useful depending on the implementation. I could have made myself clearer by adding an IB_WC_REDIR_RESP to ib_wc for the send completion. > See my previous email on the two cases. Redirection could be deferred to > the next request which is what you are proposing but I think I would > want to look at the implementation complexity before committing to that. My initial thought is to have the redirection occur immediately. Any RMPP in progress is simply aborted, and the send is retransmitted. I don't think we need to worry about trying to optimize redirecting in the middle of RMPP, since it doesn't seem like a sane thing to try to do anyway. > There is one other aspect of redirection that should be discussed. > Should the redirection location be cached ? That seems like a good thing > to me as it saves on additional redirects to get to the new location. The client would be responsible for caching the redirection. 
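A minimal sketch of the "attach" semantics Sean describes for ib_mad_qp_redir(): the routine essentially just associates MAD dispatch state with the client-allocated QP. Every type and name below is a stand-in for illustration, not the actual proposed structures:

```c
#include <stddef.h>

struct ib_qp { int qp_num; };                      /* stub, not the real struct */
struct ib_mad;                                     /* opaque */
typedef void (*mad_recv_handler)(void *context, struct ib_mad *mad);

struct mad_agent {
	struct ib_qp *qp;        /* client-owned QP this agent is attached to */
	mad_recv_handler recv;   /* receive dispatch; may be NULL if polled */
	void *context;
};

/* No ib_mad_reg_req is needed here: the client owns the QP and CQ, so
 * every MAD arriving on the QP belongs to this one agent (Sean's point
 * earlier in the thread). */
static int mad_qp_redir(struct mad_agent *agent, struct ib_qp *qp,
			mad_recv_handler recv, void *context)
{
	if (agent == NULL || qp == NULL)
		return -1;
	agent->qp = qp;
	agent->recv = recv;
	agent->context = context;
	return 0;
}
```

The point of the sketch is how little state is involved: no class routing, no TID matching — just "this QP now belongs to MAD processing for this agent."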
From mshefty at ichips.intel.com Thu Aug 5 14:00:31 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 5 Aug 2004 14:00:31 -0700 Subject: [openib-general] Some ib_mad.h Redirection Comments In-Reply-To: <20040805135659.6b27105c.mshefty@ichips.intel.com> References: <1091711028.3605.295.camel@localhost.localdomain> <20040805084033.7dc26c7a.mshefty@ichips.intel.com> <1091741820.1911.134.camel@localhost.localdomain> <20040805135659.6b27105c.mshefty@ichips.intel.com> Message-ID: <20040805140031.4bb10055.mshefty@ichips.intel.com> On Thu, 5 Aug 2004 13:56:59 -0700 Sean Hefty wrote: > The struct ib_qp* parameter into that routine may not be needed, but > could be useful depending on the implementation. Scratch that. You can't *send* a MAD without knowing what QP to post it on. From mshefty at ichips.intel.com Thu Aug 5 14:12:07 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 5 Aug 2004 14:12:07 -0700 Subject: [openib-general] ib_mad.h rmpp_version and RRespTime in header In-Reply-To: <1091739155.1911.59.camel@localhost.localdomain> References: <1091708931.1212.283.camel@localhost.localdomain> <20040805082048.14fe6da6.mshefty@ichips.intel.com> <1091739155.1911.59.camel@localhost.localdomain> Message-ID: <20040805141207.7c40ea07.mshefty@ichips.intel.com> On Thu, 05 Aug 2004 16:52:33 -0400 Hal Rosenstock wrote: > I'm confused. I thought the sender was just sending > (ib_mad_post_send_msg) a buffer of some length (indicated in > ib_mad_msg). Wouldn't RMPP take that and fragment it and add > the headers ? Is there another call/structures yet to be added > for this ? I'm assuming that the headers are provided by the user, and that the user has set the data in the headers correctly. I'm picturing an RMPP implementation that would do one of two things: 1. Update the header and post a send_wr consisting of at most 2 sg-entries. The first sg-entry would reference the MAD header, the second the data portion. 
After the send_wr completes, the header would be updated, and the next segment would be transferred. This method allows zero-copy sends. 2. Allocate an array of MAD headers. Copy the existing MAD header into each slot, then update each header based on its segment number. Post send_wr's with 2 sg-entries each. Each send_wr would reference one of the headers, plus the corresponding data. This method allows sending multiple packets at once. I like the first method myself. From halr at voltaire.com Thu Aug 5 17:13:26 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 05 Aug 2004 20:13:26 -0400 Subject: [openib-general] Some ib_mad.h Redirection Comments In-Reply-To: <20040805135659.6b27105c.mshefty@ichips.intel.com> References: <1091711028.3605.295.camel@localhost.localdomain> <20040805084033.7dc26c7a.mshefty@ichips.intel.com> <1091741820.1911.134.camel@localhost.localdomain> <20040805135659.6b27105c.mshefty@ichips.intel.com> Message-ID: <1091751208.1366.205.camel@localhost.localdomain> On Thu, 2004-08-05 at 16:56, Sean Hefty wrote: > On Thu, 05 Aug 2004 17:36:58 -0400 > Hal Rosenstock wrote: > > > If the class were also provided when the QP was redirected above, > > wouldn't all the information be available to issue the redirect from > > when a request came in ? > > Having the access layer send the redirect response prevents the client from redirecting to different QPs. Are you talking about a client with multiple QPs and causing the requester to alternate amongst them via some (client) load balancing algorithm ? > See line 27 of page 667: "It is permissible for different requesters...to be redirected to a different interface." > This is why I think we want to move the redirection above the lowest level APIs. I don't think this by itself forces redirection above the API. I think it can still be handled transparently inside the GSI. I'm also not quite sure what a requester is in this context (and this is informative rather than normative (compliance) text). 
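Returning to the two RMPP send strategies Sean listed at the top of this message, "method 1" amounts to a loop that rewrites a single header in place between send_wr completions. A toy simulation of that loop follows; the 220-byte data capacity per segment (256-byte MAD minus an assumed 24-byte MAD header and 12-byte RMPP header), the flag values, and the PayloadLength convention are my assumptions, not settled behavior — Hal's open question about where PayloadLength appears applies here too:

```c
#include <stdint.h>

#define RMPP_DATA_PER_SEG 220   /* assumed: 256 - 24 (MAD hdr) - 12 (RMPP hdr) */

#define RMPP_FLAG_ACTIVE 0x01   /* flag values are illustrative only */
#define RMPP_FLAG_FIRST  0x02
#define RMPP_FLAG_LAST   0x04

struct rmpp_hdr {
	uint8_t  flags;
	uint32_t seg_num;   /* 1-based segment number */
	uint32_t paylen;    /* toy convention: total length in first/last segments */
};

/* Method 1: one header buffer, rewritten per segment, data pointer
 * advanced each time (zero-copy). Returns how many send_wr's would be
 * posted. */
static unsigned rmpp_send_segments(struct rmpp_hdr *hdr, uint32_t payload_len)
{
	uint32_t total = (payload_len + RMPP_DATA_PER_SEG - 1) / RMPP_DATA_PER_SEG;
	if (total == 0)
		total = 1;  /* even an empty payload occupies one MAD */

	for (uint32_t seg = 1; seg <= total; seg++) {
		hdr->flags = RMPP_FLAG_ACTIVE;
		if (seg == 1)
			hdr->flags |= RMPP_FLAG_FIRST;
		if (seg == total)
			hdr->flags |= RMPP_FLAG_LAST;
		hdr->seg_num = seg;
		hdr->paylen = (seg == 1 || seg == total) ? payload_len : 0;
		/* ...post send_wr with sg-entries { hdr, data + (seg-1)*220 },
		 * then wait for its completion before the next rewrite... */
	}
	return total;
}
```

The serialization between completion and header rewrite is exactly why method 1 is one-segment-at-a-time, while method 2 trades header copies for a window's worth of outstanding sends.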
> > See my previous email on the two cases. Redirection could be deferred to > > the next request which is what you are proposing but I think I would > > want to look at the implementation complexity before committing to that. > > My initial thought is to have the redirection occur immediately. Any RMPP in progress is simply aborted, > and the send is retransmitted. I don't think we need to worry about trying to optimize redirecting in the > middle of RMPP, since it doesn't seem like a sane thing to try to do anyway. I think this depends on what the meaning of calling qp_redir is/should be. I think we still have differing assumptions about this. > > There is one other aspect of redirection that should be discussed. > > Should the redirection location be cached ? That seems like a good thing > > to me as it saves on additional redirects to get to the new location. > > The client would be responsible for caching the redirection. This seems like a commonality which can be pushed down into the access layer and makes more sense when the access layer is handling the retransmissions on redirect. If it isn't, it makes less sense. So back to that issue: if the access layer is already matching responses to requests (when timeout is enabled), I don't see why it doesn't handle the retransmission on receipt of redirect so each client doesn't need to implement this. 
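Whichever layer ends up owning it, the redirect bookkeeping being debated here is small. A hypothetical sketch of a per-destination cache — the field names mirror the ib_mad_send_wr fields mentioned in this thread, but the structure and routines are mine, not the proposed API:

```c
#include <stdint.h>

struct redirect_entry {
	int      valid;
	uint32_t remote_qpn;   /* QP number from the redirect response */
	uint32_t remote_qkey;  /* Q_Key from the redirect response */
	uint16_t pkey_index;   /* P_Key index to use for future sends */
};

/* Record a redirect response so future requests skip the extra hop. */
static void redirect_cache_update(struct redirect_entry *e, uint32_t qpn,
				  uint32_t qkey, uint16_t pkey_index)
{
	e->valid = 1;
	e->remote_qpn = qpn;
	e->remote_qkey = qkey;
	e->pkey_index = pkey_index;
}

/* Destination for a new request: the well-known GSI QP1 unless a
 * redirect has been cached for this responder. */
static uint32_t redirect_cache_dest_qpn(const struct redirect_entry *e)
{
	return e->valid ? e->remote_qpn : 1;  /* QP1 is the GSI default */
}
```

The whole disagreement is only about where this handful of lines lives — in each client or once in the access layer — not about its complexity.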
-- Hal From halr at voltaire.com Thu Aug 5 17:19:37 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 05 Aug 2004 20:19:37 -0400 Subject: [openib-general] ib_mad.h rmpp_version and RRespTime in header In-Reply-To: <20040805141207.7c40ea07.mshefty@ichips.intel.com> References: <1091708931.1212.283.camel@localhost.localdomain> <20040805082048.14fe6da6.mshefty@ichips.intel.com> <1091739155.1911.59.camel@localhost.localdomain> <20040805141207.7c40ea07.mshefty@ichips.intel.com> Message-ID: <1091751578.1366.211.camel@localhost.localdomain> On Thu, 2004-08-05 at 17:12, Sean Hefty wrote: > On Thu, 05 Aug 2004 16:52:33 -0400 > Hal Rosenstock wrote: > > > I'm confused. I thought the sender was just sending > > (ib_mad_post_send_msg) a buffer of some length (indicated in > > ib_mad_msg). Wouldn't RMPP take that and fragment it and add > > the headers ? Is there another call/structures yet to be added > > for this ? > > I'm assuming that the headers are provided by the user, and that the user has set the data in the headers correctly. I'm picturing an RMPP implementation that would do one of two things: Yes, I can see from your latest proposed changes. > 1. Update the header and post a send_wr consisting of at most 2 sg-entries. The first sg-entry would reference the MAD header, the second the data portion. After the send_wr completes, the header would be updated, and the next segment would be transfered. This method allows zero-copy sends. > > 2. Allocate an array of MAD headers. Copy the existing MAD header into each slot, then update each header based on its segment number. Post send_wr's with 2 sg-entries each. Each send_wr would reference one of the headers, plus the corresponding data. This method allows sending multiple packets at once. > > I like the first method myself. 1 would need to be done per segment whereas 2 could get a window's worth of segments out at once. 1 is simpler to implement. 
2 seems more efficient in terms of RMPP but more complex to implement. -- Hal From halr at voltaire.com Thu Aug 5 17:44:33 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 05 Aug 2004 20:44:33 -0400 Subject: [openib-general] [PATCH] added comments to ib_mad.h - minor update In-Reply-To: <20040805124542.55fec6f9.mshefty@ichips.intel.com> References: <20040804160213.3e93cb2a.mshefty@ichips.intel.com> <20040805124542.55fec6f9.mshefty@ichips.intel.com> Message-ID: <1091753075.1911.233.camel@localhost.localdomain> On Thu, 2004-08-05 at 15:45, Sean Hefty wrote: > Just including the file, since a diff marked nearly every line... This answered a number of my previous questions. Code talks... I still have a few questions: ib_mad_agent has lo_tid of 32 bits. What are the other 32 bits of tid ? I vaguely recall some email (thread) went by on this. ib_mad_send_wr.sg_list indicates first entry must reference a data buffer of 256 bytes. Is the "base" RMPP header in it ? Which fields must be filled in by the client (RMPPActive, RRespTime, and length) ? Will length be in the first segment only (total length) or also in the last segment ? On the receive side, we need to handle either if we have to deal with non OpenIB implementations :-( What about subsequent entries in the s-g list for send ? Are they also constrained to be 256 bytes or something else ? I would presume RMPP would rewrite the RMPP header based on the first header and update the appropriate fields. Is timeout_ms used for Ttime when it is a RMPP send ? I am still wondering about the RMPP direction switch (IsDS) and whether this needs to be exposed somehow. -- Hal From mlleinin at hpcn.ca.sandia.gov Thu Aug 5 20:34:35 2004 From: mlleinin at hpcn.ca.sandia.gov (Matt L. 
Leininger) Date: Thu, 05 Aug 2004 20:34:35 -0700 Subject: [openib-general] reply-to munging In-Reply-To: <1091721131.10131.4.camel@duffman> References: <1091219655.3942.1263.camel@localhost> <1091721131.10131.4.camel@duffman> Message-ID: <1091763275.17401.135.camel@trinity> On Thu, 2004-08-05 at 08:52, Tom Duffy wrote: > On Fri, 2004-07-30 at 13:34 -0700, Tom Duffy wrote: > > Is there a reason why reply-to munging has been turned off? > > Ah, it looks like openib switched to mailman (boy, that was quiet and > done smoothly). Now there are more than two lists available. Can > somebody explain what these various lists are for? > > http://openib.org/mailman/listinfo > > openib-bylaws > openib-commits > openib-general > openib-promoters > openib-software > > Obviously, I know what general and commits are, but what are the other > three? > We switched over to mailman because mailboxer was painful and limited. openib-software was being used by the early software working group. All these discussions were moved to openib-general. openib-bylaws is being used by some folks working on ... Bylaws. When they get something worth reading it will be posted to the openib-general list. openib-promoters is being used by the promoters to figure out how to pay for legal and setup fees, etc. - Matt From halr at voltaire.com Fri Aug 6 05:34:09 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 06 Aug 2004 08:34:09 -0400 Subject: [openib-general] [PATCH] IPoIB Building (gen2) Message-ID: <1091795650.1805.1.camel@localhost.localdomain> Fix a couple of nits with gen2 IPoIB building Index: src/linux-kernel/infiniband/ulp/Kconfig =================================================================== --- src/linux-kernel/infiniband/ulp/Kconfig (revision 590) +++ src/linux-kernel/infiniband/ulp/Kconfig (working copy) @@ -2,7 +2,7 @@ tristate "IP-over-InfiniBand" depends on INFINIBAND && NETDEVICES && INET ---help--- - Support for the IP-over-InfiniBand protocol (IPoIB). 
This + Support for the IETF IP-over-InfiniBand protocol (IPoIB). This transports IP packets over InfiniBand so you can use your IB device as a fancy NIC. To configure interfaces on multiple InfiniBand partitions (P_Keys) you will need the ipoibcfg Index: src/linux-kernel/infiniband/ulp/ipoib/ipoib_arp.c =================================================================== --- src/linux-kernel/infiniband/ulp/ipoib/ipoib_arp.c (revision 590) +++ src/linux-kernel/infiniband/ulp/ipoib/ipoib_arp.c (working copy) @@ -491,7 +491,7 @@ entry->tid = tid; } - return 0; + return; } /* =============================================================== */ From Tom.Duffy at Sun.COM Fri Aug 6 08:06:14 2004 From: Tom.Duffy at Sun.COM (Tom Duffy) Date: Fri, 06 Aug 2004 08:06:14 -0700 Subject: [openib-general] [PATCH] IPoIB Building (gen2) In-Reply-To: <1091795650.1805.1.camel@localhost.localdomain> References: <1091795650.1805.1.camel@localhost.localdomain> Message-ID: <1091804774.11834.5.camel@localhost> On Fri, 2004-08-06 at 05:34, Hal Rosenstock wrote: > Index: src/linux-kernel/infiniband/ulp/ipoib/ipoib_arp.c > =================================================================== > --- src/linux-kernel/infiniband/ulp/ipoib/ipoib_arp.c (revision 590) > +++ src/linux-kernel/infiniband/ulp/ipoib/ipoib_arp.c (working copy) > @@ -491,7 +491,7 @@ > entry->tid = tid; > } > > - return 0; > + return; > } > > /* =============================================================== */ I submitted a patch for this yesterday as well. 
-tduffy From roland at topspin.com Fri Aug 6 08:23:51 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 06 Aug 2004 08:23:51 -0700 Subject: [openib-general] [PATCH] IPoIB Building (gen2) In-Reply-To: <1091804774.11834.5.camel@localhost> (Tom Duffy's message of "Fri, 06 Aug 2004 08:06:14 -0700") References: <1091795650.1805.1.camel@localhost.localdomain> <1091804774.11834.5.camel@localhost> Message-ID: <524qngffy0.fsf@topspin.com> Tom> I submitted a patch for this yesterday as well. Yup, it was on my queue to apply. I hope to catch up today... - R. From roland at topspin.com Fri Aug 6 08:26:14 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 06 Aug 2004 08:26:14 -0700 Subject: [openib-general] [PATCH] IPoIB Building (gen2) In-Reply-To: <1091795650.1805.1.camel@localhost.localdomain> (Hal Rosenstock's message of "Fri, 06 Aug 2004 08:34:09 -0400") References: <1091795650.1805.1.camel@localhost.localdomain> Message-ID: <52zn58e19l.fsf@topspin.com> > - Support for the IP-over-InfiniBand protocol (IPoIB). This > + Support for the IETF IP-over-InfiniBand protocol (IPoIB). This I'm wondering whether this change is useful -- none of the kernel's existing Kconfig entries for things like IPv6 or IPsec mention "IETF." Is adding IETF here going to make this clearer to anyone? - R. From halr at voltaire.com Fri Aug 6 08:28:43 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 06 Aug 2004 11:28:43 -0400 Subject: [openib-general] [PATCH] IPoIB Building (gen2) References: <1091795650.1805.1.camel@localhost.localdomain> <52zn58e19l.fsf@topspin.com> Message-ID: <001801c47bca$1003a0e0$6401a8c0@comcast.net> Roland Dreier wrote: > > - Support for the IP-over-InfiniBand protocol (IPoIB). This > > + Support for the IETF IP-over-InfiniBand protocol (IPoIB). > This > > I'm wondering whether this change is useful -- none of the kernel's > existing Kconfig entries for things like IPv6 or IPsec mention "IETF." 
> Is adding IETF here going to make this clearer to anyone? I just wanted to distinguish from all the earlier (and there were quite a few) proprietary ways of doing IPoIB and so there is some name confusion. It's not a big deal but I don't think the same thing applies in for IPv6 or IPsec. -- Hal From halr at voltaire.com Fri Aug 6 09:15:18 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 06 Aug 2004 12:15:18 -0400 Subject: [openib-general] [PATCH] GSI: Remove hard coded PKey Index Message-ID: <1091808919.1885.18.camel@localhost.localdomain> Support PKey index rather than hardcoding in GSI Index: access/TODO =================================================================== --- access/TODO (revision 586) +++ access/TODO (working copy) @@ -1,7 +1,6 @@ -8/4/04 +8/6/04 -Consumer needs to be able to set PKey index (currently hardcoded) -Add support for at least responses to requests with GRH +Add support for (at least) responses to requests with GRH Remove #if 0/1 with suitable preprocessor symbols Replace ib_reg_mr with ib_reg_phys_mr Eliminate static limit on numbers of ports/HCAs Index: access/make.rules =================================================================== --- access/make.rules (revision 561) +++ access/make.rules (working copy) @@ -68,8 +68,8 @@ CC=gcc LD=ld LDFLAGS=-m elf_i386 -r -#CFLAGS += -DCPU_BE=0 -DCPU_LE=1 -mpreferred-stack-boundary=2 -fno-common -I/usr/src/linux/include/asm/mach-default -CFLAGS += -mpreferred-stack-boundary=2 -fno-common -I/usr/src/linux/include/asm/mach-default +#CFLAGS += -DCPU_BE=0 -DCPU_LE=1 -mpreferred-stack-boundary=2 -fno-common -I/usr/src/linux-$(shell uname -r)/include/asm-i386/mach-default +CFLAGS += -mpreferred-stack-boundary=2 -fno-common -I/usr/src/linux-$(shell uname -r)/include/asm-i386/mach-default endif endif endif Index: access/gsi_main.c =================================================================== --- access/gsi_main.c (revision 586) +++ access/gsi_main.c (working copy) @@ -81,7 +81,6 @@ 
#define GSI_MAX_STATIC_RATE 0 #define GSI_SOURCE_PATH_BIT 0 #define GSI_SEND_Q_PSN 0 -#define GSI_P_KEY_INDEX 0 #define GSI_SL 0 #define GSI_TRAFIC_CLASS 0 #define GSI_HOP_LIMIT 63 @@ -243,7 +242,11 @@ } attr->qp_state = IB_QPS_INIT; - attr->pkey_index = GSI_P_KEY_INDEX; + /* + * PKey index for QP1 is irrelevant but one is needed for + * Reset to Init transition. + */ + attr->pkey_index = 0; attr->port = port; attr->qkey = GSI_QP1_WELL_KNOWN_Q_KEY; attr_mask = IB_QP_STATE | IB_QP_PKEY_INDEX | IB_QP_PORT | IB_QP_QKEY; @@ -1663,6 +1666,7 @@ dtgrm_priv->rlid = wc->slid; dtgrm_priv->path_bits = wc->dlid_path_bits; dtgrm_priv->sl = wc->sl; + dtgrm_priv->pkey_index = wc->pkey_index; mad_swap_header(mad); if (gsi_post_send_plm_reply_mad @@ -1682,6 +1686,7 @@ dtgrm_priv->rlid = wc->slid; dtgrm_priv->path_bits = wc->dlid_path_bits; dtgrm_priv->sl = wc->sl; + dtgrm_priv->pkey_index = wc->pkey_index; printk(KERN_DEBUG \ "Received datagram - remote QP num-%d, LID-%d, path bits- %d, SL - %d\n", @@ -1696,10 +1701,7 @@ goto out1; } - /* - * If the received MAD is for RMPP - - * handle it properly. - */ + /* If the received MAD is for RMPP - handle it properly. */ if (gsi_is_rmpp_mad(class_info, (struct gsi_dtgrm_t *) dtgrm_priv)) { gsi_rmpp_recv(class_info, dtgrm_priv); goto out1; @@ -1901,7 +1903,7 @@ /* QP1_WELL_KNOWN_Q_KEY = 0x80010000 */ wr.wr.ud.remote_qkey = dtgrm_priv->r_q_key; wr.wr.ud.ah = addr_hndl; - wr.wr.ud.pkey_index = GSI_P_KEY_INDEX; + wr.wr.ud.pkey_index = dtgrm_priv->pkey_index; wr.send_flags = IB_SEND_SIGNALED; mad_swap_header(mad); @@ -2041,7 +2043,7 @@ /* QP1_WELL_KNOWN_Q_KEY = 0x80010000 */ wr.wr.ud.remote_qkey = dtgrm_priv->r_q_key; wr.wr.ud.ah = addr_hndl; - wr.wr.ud.pkey_index = GSI_P_KEY_INDEX; + wr.wr.ud.pkey_index = dtgrm_priv->pkey_index; wr.send_flags = IB_SEND_SIGNALED; mad_swap_header(mad); @@ -2095,7 +2097,7 @@ /* NOTE: GRH not supported yet! 
*/ rmpp_mad->grh_valid = FALSE; - rmpp_mad->pkey_index = GSI_P_KEY_INDEX; + rmpp_mad->pkey_index = dtgrm_priv->pkey_index; } static void @@ -2111,7 +2113,7 @@ rmpp_mad->remote_sl = dtgrm_priv->sl; rmpp_mad->remote_qp = dtgrm_priv->rqp; rmpp_mad->path_bits = dtgrm_priv->path_bits; - rmpp_mad->pkey_index = GSI_P_KEY_INDEX; + rmpp_mad->pkey_index = dtgrm_priv->pkey_index; rmpp_mad->dir_switch_needed = dtgrm_priv->rmpp_dir_switch_needed; /* NOTE: GRH not supported yet! */ @@ -2603,11 +2605,11 @@ lock_kernel(); -#if 1 /* LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0) */ +#if 0 /* LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0) */ sprintf(current->comm, "gsi-%-6s-%-2d", hca->handle->name, hca->port); daemonize(); #else - daemonize("gsi-%-6s-%-2d", hca->handle->name, hca_port); + daemonize("gsi-%-6s-%-2d", hca->handle->name, hca->port); #endif unlock_kernel(); Index: access/gsi.h =================================================================== --- access/gsi.h (revision 561) +++ access/gsi.h (working copy) @@ -109,6 +109,7 @@ u32 rqp; u32 rqk; u16 rlid; + u16 pkey_index; u8 sl; u64 guid; Index: access/gsi_priv.h =================================================================== --- access/gsi_priv.h (revision 561) +++ access/gsi_priv.h (working copy) @@ -144,6 +144,7 @@ u32 rqp; u32 r_q_key; u16 rlid; + u16 pkey_index; u8 sl; u64 guid; From mshefty at ichips.intel.com Fri Aug 6 10:17:44 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 6 Aug 2004 10:17:44 -0700 Subject: [openib-general] [PATCH] added comments to ib_mad.h - minor update In-Reply-To: <1091753075.1911.233.camel@localhost.localdomain> References: <20040804160213.3e93cb2a.mshefty@ichips.intel.com> <20040805124542.55fec6f9.mshefty@ichips.intel.com> <1091753075.1911.233.camel@localhost.localdomain> Message-ID: <20040806101744.2ba901c5.mshefty@ichips.intel.com> On Thu, 05 Aug 2004 20:44:33 -0400 Hal Rosenstock wrote: > This answered a number of my previous questions. Code talks... 
I want to make sure that we can get agreement on the API. I do think that we're getting close. > ib_mad_agent has lo_tid of 32 bits. What are the other 32 bits of tid ? > I vaguely recall some email (thread) went by on this. The TID was going to be split between the client and the access layer. The access layer would use the ib_mad_agent to set its portion of the TID on outgoing requests. This allows it to route received responses to the proper client, but ensures that MADs sent from multiple clients do not conflict on TIDs. > ib_mad_send_wr.sg_list indicates first entry must reference a data > buffer of 256 bytes. Is the "base" RMPP header in it ? Which fields > must be filled in by the client (RMPPActive, RRespTime, and length) ? > Will length be in the first segment only (total length) or also in the > last segment ? Because the spec positioned the RMPP header in the middle of user-data (i.e. after the standard MAD header), I think that the most efficient way to handle sending a MAD is for the client to hand the access layer a buffer that contains both the MAD header and RMPP header. For most MADs, this results in a send work request that uses a single sg-entry. If we agree on this, then the intent here is that we don't want the user handing the access layer the RMPP header split across multiple data buffers. The real restriction is that the first sg-entry should reference both the MAD and RMPP headers. I think that the user only needs to set RRespTime and RMPPActive flag in the RMPP header if using RMPP. If RMPP is not used, they should set all fields to 0. > On the receive side, we need to handle either if we have > to deal with non OpenIB implementations :-( On the receive side, we control the data buffers, so this isn't an issue. We should just post receive buffers of 256 + sizeof(grh). > What about subsequent entries in the s-g list for send ? Are they also > constrained to be 256 bytes or something else ? 
I would presume > RMPP would rewrite the RMPP header based on the first header and > update the appropriate fields. We can be as flexible or restrictive as we want to be, I think. My request (based on feedback) is that we try to minimize the need to perform any data copies. > Is timeout_ms used for Ttime when it is a RMPP send ? timeout_ms applies to sends, whereas Ttime applies to the receiver. We could use the default of 40 seconds as mentioned in the spec, but this seems high to me. For a received response, timeout_ms should work fine. > I am still wondering about the RMPP direction switch (IsDS) and whether > this needs to be exposed somehow. I don't think that it does. I don't think anything like this was needed in the sourceforge stack, and the proposed GSI implementation uses the sourceforge RMPP code. I think that we just need to know if a send requires segmentation, or if a receive requires reassembly. From mshefty at ichips.intel.com Fri Aug 6 11:09:30 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 6 Aug 2004 11:09:30 -0700 Subject: [openib-general] Some ib_mad.h Redirection Comments In-Reply-To: <1091751208.1366.205.camel@localhost.localdomain> References: <1091711028.3605.295.camel@localhost.localdomain> <20040805084033.7dc26c7a.mshefty@ichips.intel.com> <1091741820.1911.134.camel@localhost.localdomain> <20040805135659.6b27105c.mshefty@ichips.intel.com> <1091751208.1366.205.camel@localhost.localdomain> Message-ID: <20040806110930.32db6f7d.mshefty@ichips.intel.com> On Thu, 05 Aug 2004 20:13:26 -0400 Hal Rosenstock wrote: > Are you talking about a client with multiple QPs and causing the > requester to alternate amongst them via some (client) load balancing > algorithm ? As an example: I think that a CM on node 1 should be able to redirect connection request messages from node 2 to QP 23, and messages from node 3 to QP 34. 
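The sizing rules scattered through this exchange — receive buffers of "256 + sizeof(grh)", and Sean's point that we only need to know whether a send requires segmentation — reduce to a few constants. The 256-byte MAD and 40-byte GRH are spec values; the 220-byte per-segment capacity is my own arithmetic (and ignores that non-RMPP classes carry no RMPP header), so treat it as a sketch:

```c
#include <stdint.h>

#define IB_MAD_SIZE       256
#define IB_GRH_SIZE       40
#define IB_MAD_HDR_SIZE   24   /* assumed common MAD header size */
#define IB_RMPP_HDR_SIZE  12   /* assumed RMPP header size */
#define RMPP_DATA_PER_SEG (IB_MAD_SIZE - IB_MAD_HDR_SIZE - IB_RMPP_HDR_SIZE)

/* Receive buffers: "256 + sizeof(grh)", as suggested above. */
static uint32_t mad_recv_buf_len(void)
{
	return IB_MAD_SIZE + IB_GRH_SIZE;
}

/* Does a send require RMPP segmentation? Only if the payload cannot fit
 * in a single MAD after the headers. */
static int mad_send_needs_rmpp(uint32_t payload_len)
{
	return payload_len > RMPP_DATA_PER_SEG;
}
```

A matching `mad_recv_needs_reassembly()` on the receive side would key off the RMPPActive flag in the arriving header rather than a length.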
The CM should be responsible for allocating the QPs, sizing them appropriately, allocating the CQs, and controlling the processing that occurs on the redirected QPs/CQs. > I don't think this by itself forces redirection above the API. I think > it can still be handled transparently inside the GSI. Trying to handle this "transparently" in the GSI puts too much policy in the GSI. I think that it's necessary for the clients to manage their QPs, not the access layer. > I think this depends on what the meaning of calling qp_redir is/should > be. I think we still have differing assumptions about this. I think that our assumptions differ. My assumption is that qp_redir essentially just creates an ib_mad_agent structure, so that a call to ib_mad_post_send can post a work request to the proper QP. > > The client would be responsible for caching the redirection. > > This seems like a commonality which can be pushed down into the access > layer and makes more sense when the access layer is handling the > retransmissions on redirect. If it isn't, it makes less sense. Sending to a redirected QP involves setting the remote_qpn, remote_qkey, and pkey_index in the ib_mad_send_wr to different values. It's trivial for a client to set this. > So back to that issue: if the access layer is already matching responses > to requests (when timeout is enabled), I don't see why it doesn't handle > the retransmission on receipt of redirect so each client doesn't need to > implement this. Currently, the access layer is not retransmitting MADs outside of RMPP. When a response comes in, both the request and response may be handed to the user. If the response is a redirect, the client can cache the redirection information and re-issue the request. All future requests sent by that client can now automatically go to the correct location without the GSI having to snoop the destination, lookup whether that destination has been redirected, and modifying the outbound MAD. 
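As a footnote to the client-routing theme in this reply: the TID split Sean described earlier in the digest (the client owning lo_tid, the access layer stamping its own half per agent) comes down to plain bit packing. Which half belongs to which layer is my assumption here:

```c
#include <stdint.h>

/* Build the 64-bit wire TID from the access layer's per-agent stamp and
 * the client's 32-bit lo_tid. */
static uint64_t mad_make_tid(uint32_t agent_hi, uint32_t lo_tid)
{
	return ((uint64_t)agent_hi << 32) | lo_tid;
}

/* On a received response, the access layer recovers its half to route
 * the MAD back to the agent that issued the request. */
static uint32_t mad_tid_agent(uint64_t tid)
{
	return (uint32_t)(tid >> 32);
}
```

Because no two agents share a stamp, requests from multiple clients can never collide on TIDs, which is the property Sean was after.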
Are you concerned about duplicating the management of a redirection table? I think that redirection tables must be maintained per client. Would adding support for redirection table management remove your concerns? From roland at topspin.com Fri Aug 6 11:33:15 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 06 Aug 2004 11:33:15 -0700 Subject: [openib-general] [PATCH] kill build warning in ipoib_arp.c In-Reply-To: <1091723983.10131.7.camel@duffman> (Tom Duffy's message of "Thu, 05 Aug 2004 09:39:43 -0700") References: <1091723983.10131.7.camel@duffman> Message-ID: <52oelodslw.fsf@topspin.com> thanks, applied. - R. From roland at topspin.com Fri Aug 6 11:39:56 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 06 Aug 2004 11:39:56 -0700 Subject: [openib-general] [PATCH] IPoIB Building (gen2) In-Reply-To: <001801c47bca$1003a0e0$6401a8c0@comcast.net> (Hal Rosenstock's message of "Fri, 06 Aug 2004 11:28:43 -0400") References: <1091795650.1805.1.camel@localhost.localdomain> <52zn58e19l.fsf@topspin.com> <001801c47bca$1003a0e0$6401a8c0@comcast.net> Message-ID: <528ycsdsar.fsf@topspin.com> How about something like this: Index: src/linux-kernel/infiniband/ulp/Kconfig =================================================================== --- src/linux-kernel/infiniband/ulp/Kconfig (revision 576) +++ src/linux-kernel/infiniband/ulp/Kconfig (working copy) @@ -8,6 +8,9 @@ InfiniBand partitions (P_Keys) you will need the ipoibcfg utility from . + The IPoIB protocol is defined by the IETF ipoib working + group: . 
+ config INFINIBAND_SDP tristate "Sockets Direct Protocol" depends on INFINIBAND && INFINIBAND_IPOIB From roland at topspin.com Fri Aug 6 11:35:20 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 06 Aug 2004 11:35:20 -0700 Subject: [openib-general] [PATCH] get rid of some more typedefs in ip2pr_export.h In-Reply-To: <1091652882.8342.11.camel@duffman> (Tom Duffy's message of "Wed, 04 Aug 2004 13:54:42 -0700") References: <1091652882.8342.11.camel@duffman> Message-ID: <52k6wcdsif.fsf@topspin.com> thanks, applied to my branch. From roland at topspin.com Fri Aug 6 11:35:23 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 06 Aug 2004 11:35:23 -0700 Subject: [openib-general] [PATCH] cleanup typedefs in ip2pr_proc.h In-Reply-To: <1091654252.8342.15.camel@duffman> (Tom Duffy's message of "Wed, 04 Aug 2004 14:17:32 -0700") References: <1091654252.8342.15.camel@duffman> Message-ID: <52fz70dsic.fsf@topspin.com> thanks, applied to my branch. From halr at voltaire.com Fri Aug 6 13:03:48 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 06 Aug 2004 16:03:48 -0400 Subject: [openib-general] [PATCH] IPoIB Building (gen2) In-Reply-To: <528ycsdsar.fsf@topspin.com> References: <1091795650.1805.1.camel@localhost.localdomain> <52zn58e19l.fsf@topspin.com> <001801c47bca$1003a0e0$6401a8c0@comcast.net> <528ycsdsar.fsf@topspin.com> Message-ID: <1091822631.1814.3.camel@localhost.localdomain> On Fri, 2004-08-06 at 14:39, Roland Dreier wrote: > How about something like this: > + The IPoIB protocol is defined by the IETF ipoib working > + group: . That should make it totally clear. -- Hal From Tom.Duffy at Sun.COM Fri Aug 6 14:04:43 2004 From: Tom.Duffy at Sun.COM (Tom Duffy) Date: Fri, 06 Aug 2004 14:04:43 -0700 Subject: [openib-general] [PATCH] remove the tSTR typedef Message-ID: <1091826283.22091.42.camel@localhost> This patch removes all the uses of tSTR from Roland's branch. Signed-by: Tom Duffy with permission from Sun legal. 
Index: drivers/infiniband/ulp/dapl/khash.c =================================================================== --- drivers/infiniband/ulp/dapl/khash.c (revision 594) +++ drivers/infiniband/ulp/dapl/khash.c (working copy) @@ -348,7 +348,7 @@ tINT32 DaplHashTableDump (DAPL_HASH_TABLE table, DAPL_HASH_DUMP_FUNC dfunc, - tSTR buffer, tINT32 max_size, tINT32 start, tINT32 * end) { + char *buffer, tINT32 max_size, tINT32 start, tINT32 * end) { DAPL_HASH_BUCKET bucket; tINT32 offset = 0; tINT32 elements; Index: drivers/infiniband/ulp/dapl/khash.h =================================================================== --- drivers/infiniband/ulp/dapl/khash.h (revision 594) +++ drivers/infiniband/ulp/dapl/khash.h (working copy) @@ -39,7 +39,7 @@ * bytes written into buffer, a negative return means that data will not * fit into max_size bytes. */ -typedef tINT32(*DAPL_HASH_DUMP_FUNC) (tSTR buffer, +typedef tINT32(*DAPL_HASH_DUMP_FUNC) (char *buffer, tINT32 max_size, char *key, tUINT32 value); /* @@ -77,7 +77,7 @@ tINT32 DaplHashTableDump(DAPL_HASH_TABLE table, DAPL_HASH_DUMP_FUNC dfunc, - tSTR buffer, + char *buffer, tINT32 max_size, tINT32 start, tINT32 * end); #endif /* _KHASH_H */ Index: drivers/infiniband/ulp/ipoib/ip2pr_link.c =================================================================== --- drivers/infiniband/ulp/ipoib/ip2pr_link.c (revision 594) +++ drivers/infiniband/ulp/ipoib/ip2pr_link.c (working copy) @@ -410,7 +410,7 @@ /* ========================================================================= */ /*..tsIp2prPathElementTableDump - dump the path record element table to proc */ -s32 tsIp2prPathElementTableDump(tSTR buffer, s32 max_size, s32 start_index, +s32 tsIp2prPathElementTableDump(char *buffer, s32 max_size, s32 start_index, long *end_index) { struct ip2pr_path_element *path_elmt; @@ -480,7 +480,7 @@ /* ========================================================================= */ /*..tsIp2prIpoibWaitTableDump - dump the address resolution wait table to proc 
*/ s32 -tsIp2prIpoibWaitTableDump(tSTR buffer, s32 max_size, s32 start_index, +tsIp2prIpoibWaitTableDump(char *buffer, s32 max_size, s32 start_index, long *end_index) { struct ip2pr_ipoib_wait *ipoib_wait; @@ -538,7 +538,7 @@ } /* tsIp2prIpoibWaitTableDump */ /* ..tsIp2prProcReadInt. dump integer value to /proc file */ -s32 tsIp2prProcReadInt(tSTR buffer, s32 max_size, s32 start_index, +s32 tsIp2prProcReadInt(char *buffer, s32 max_size, s32 start_index, long *end_index, int val) { s32 offset = 0; @@ -554,7 +554,7 @@ } /* ..tsIp2prProcMaxRetriesRead. dump current retry value */ -s32 tsIp2prProcRetriesRead(tSTR buffer, s32 max_size, s32 start_index, +s32 tsIp2prProcRetriesRead(char *buffer, s32 max_size, s32 start_index, long *end_index) { @@ -565,7 +565,7 @@ } /* ..tsIp2prProcTimeoutRead. dump current timeout value */ -s32 tsIp2prProcTimeoutRead(tSTR buffer, s32 max_size, s32 start_index, +s32 tsIp2prProcTimeoutRead(char *buffer, s32 max_size, s32 start_index, long *end_index) { @@ -576,7 +576,7 @@ } /* ..tsIp2prProcBackoutRead. dump current backout value */ -s32 tsIp2prProcBackoffRead(tSTR buffer, s32 max_size, s32 start_index, +s32 tsIp2prProcBackoffRead(char *buffer, s32 max_size, s32 start_index, long *end_index) { @@ -587,7 +587,7 @@ } /* ..tsIp2prProcCacheTimeoutRead. dump current cache timeout value */ -s32 tsIp2prProcCacheTimeoutRead(tSTR buffer, s32 max_size, s32 start_index, +s32 tsIp2prProcCacheTimeoutRead(char *buffer, s32 max_size, s32 start_index, long *end_index) { @@ -598,7 +598,7 @@ } /* ..tsIp2prProcMaxRetriesRead. dump current retry value */ -s32 tsIp2prProcTotalReq(tSTR buffer, s32 max_size, s32 start_index, +s32 tsIp2prProcTotalReq(char *buffer, s32 max_size, s32 start_index, long *end_index) { @@ -608,7 +608,7 @@ } /* ..tsIp2prProcMaxRetriesRead. 
dump current retry value */ -s32 tsIp2prProcArpTimeout(tSTR buffer, s32 max_size, s32 start_index, +s32 tsIp2prProcArpTimeout(char *buffer, s32 max_size, s32 start_index, long *end_index) { @@ -618,7 +618,7 @@ } /* ..tsIp2prProcMaxRetriesRead. dump current retry value */ -s32 tsIp2prProcPathTimeout(tSTR buffer, s32 max_size, s32 start_index, +s32 tsIp2prProcPathTimeout(char *buffer, s32 max_size, s32 start_index, long *end_index) { @@ -628,7 +628,7 @@ } /* ..tsIp2prProcMaxRetriesRead. dump current retry value */ -s32 tsIp2prProcTotalFail(tSTR buffer, s32 max_size, s32 start_index, +s32 tsIp2prProcTotalFail(char *buffer, s32 max_size, s32 start_index, long *end_index) { Index: drivers/infiniband/ulp/ipoib/ip2pr_proc.c =================================================================== --- drivers/infiniband/ulp/ipoib/ip2pr_proc.c (revision 594) +++ drivers/infiniband/ulp/ipoib/ip2pr_proc.c (working copy) @@ -26,27 +26,27 @@ static const char _dir_name_root[] = TS_IP2PR_PROC_DIR_NAME; static struct proc_dir_entry *_dir_root = NULL; -extern s32 tsIp2prPathElementTableDump(tSTR buffer, +extern s32 tsIp2prPathElementTableDump(char *buffer, s32 max_size, s32 start_index, long *end_index); -extern s32 tsIp2prIpoibWaitTableDump(tSTR buffer, +extern s32 tsIp2prIpoibWaitTableDump(char *buffer, s32 max_size, s32 start_index, long *end_index); -extern s32 tsIp2prProcRetriesRead(tSTR buffer, +extern s32 tsIp2prProcRetriesRead(char *buffer, s32 max_size, s32 start_index, long *end_index); -extern s32 tsIp2prProcTimeoutRead(tSTR buffer, +extern s32 tsIp2prProcTimeoutRead(char *buffer, s32 max_size, s32 start_index, long *end_index); -extern s32 tsIp2prProcBackoffRead(tSTR buffer, +extern s32 tsIp2prProcBackoffRead(char *buffer, s32 max_size, s32 start_index, long *end_index); -extern s32 tsIp2prProcCacheTimeoutRead(tSTR buffer, +extern s32 tsIp2prProcCacheTimeoutRead(char *buffer, s32 max_size, s32 start_index, long *end_index); @@ -67,19 +67,19 @@ unsigned long count, void 
*pos); -extern int tsIp2prProcTotalReq(tSTR buffer, +extern int tsIp2prProcTotalReq(char *buffer, s32 max_size, s32 start_index, long *end_index); -extern int tsIp2prProcArpTimeout(tSTR buffer, +extern int tsIp2prProcArpTimeout(char *buffer, s32 max_size, s32 start_index, long *end_index); -extern int tsIp2prProcPathTimeout(tSTR buffer, +extern int tsIp2prProcPathTimeout(char *buffer, s32 max_size, s32 start_index, long *end_index); -extern int tsIp2prProcTotalFail(tSTR buffer, +extern int tsIp2prProcTotalFail(char *buffer, s32 max_size, s32 start_index, long *end_index); @@ -213,11 +213,11 @@ * XXX still need to check this: * validate some assumptions the write parser will be making. */ - if (0 && sizeof(s32) != sizeof(tSTR)) { + if (0 && sizeof(s32) != sizeof(char *)) { TS_TRACE(MOD_IP2PR, T_TERSE, TRACE_FLOW_FATAL, "PROC: integers and pointers of a different size. <%d:%d>", - sizeof(s32), sizeof(tSTR)); + sizeof(s32), sizeof(char *)); return -EFAULT; } /* if */ Index: drivers/infiniband/ulp/ipoib/ip2pr_proc.h =================================================================== --- drivers/infiniband/ulp/ipoib/ip2pr_proc.h (revision 594) +++ drivers/infiniband/ulp/ipoib/ip2pr_proc.h (working copy) @@ -30,13 +30,13 @@ /* --------------------------------------------------------------------- */ /* read callback prototype. 
*/ /* --------------------------------------------------------------------- */ -typedef s32(*tIP2PR_PROC_READ_CB_FUNC) (tSTR buffer, +typedef s32(*tIP2PR_PROC_READ_CB_FUNC) (char *buffer, s32 max_size, s32 start, long *end); struct ip2pr_proc_sub_entry { - tSTR name; + char *name; s32 type; struct proc_dir_entry *entry; tIP2PR_PROC_READ_CB_FUNC read; @@ -74,14 +74,14 @@ s16 type; union { s32 i; - tSTR s; + char *s; } value; }; struct ip2pr_proc_entry_parse { s16 id; s16 type; - tSTR value; + char *value; }; #endif /* _TS_IP2PR_PROC_H */ Index: drivers/infiniband/ulp/sdp/sdp_proc.c =================================================================== --- drivers/infiniband/ulp/sdp/sdp_proc.c (revision 594) +++ drivers/infiniband/ulp/sdp/sdp_proc.c (working copy) @@ -352,11 +352,11 @@ * XXX still need to check this: * validate some assumptions the write parser will be making. */ - if (0 && sizeof(tINT32) != sizeof(tSTR)) { + if (0 && sizeof(tINT32) != sizeof(char *)) { TS_TRACE(MOD_LNX_SDP, T_TERSE, TRACE_FLOW_FATAL, "PROC: integers and pointers of a different size. <%d:%d>", - sizeof(tINT32), sizeof(tSTR)); + sizeof(tINT32), sizeof(char *)); return -EFAULT; } Index: drivers/infiniband/include/ib_legacy_types.h =================================================================== --- drivers/infiniband/include/ib_legacy_types.h (revision 594) +++ drivers/infiniband/include/ib_legacy_types.h (working copy) @@ -195,7 +195,6 @@ typedef tUINT8 tPORT; typedef void * tPTR; typedef const void * tCONST_PTR; -typedef char * tSTR; typedef const char * tCONST_STR; typedef tUINT32 tIFINDEX; From jdaley at systemfabricworks.com Fri Aug 6 14:07:33 2004 From: jdaley at systemfabricworks.com (Jan Daley) Date: Fri, 6 Aug 2004 16:07:33 -0500 Subject: [openib-general] [PATCH] OpenSM Assigning Duplicate LIDs Message-ID: <003a01c47bf9$67c6ee20$6b01a8c0@maverick> Hi, I ran across a scenario in which the SM would assign the same lid to multiple nodes. 
The problem occurs when two nodes that have the same lid are placed into a subnet that doesn't have an SM running. When OpenSM is started, it preserves the existing lids without checking for duplicates. Index: opensm/osm_lid_mgr.c =================================================================== --- opensm/osm_lid_mgr.c (revision 590) +++ opensm/osm_lid_mgr.c (working copy) @@ -878,6 +878,7 @@ osm_lid_mgr_t* const p_mgr = (osm_lid_mgr_t*)context; osm_physp_t* p_physp; cl_ptr_vector_t* p_tbl; + osm_port_t* temp_port; OSM_LOG_ENTER( p_mgr->p_log, __osm_lid_mgr_process_foreign ); @@ -912,19 +913,33 @@ min_lid_ho, max_lid_ho ); } - /* - Place this port into the port ptr vector. - And update the PortInfo attribute template. - */ - for( lid_ho = min_lid_ho; lid_ho <= max_lid_ho; lid_ho++ ) - cl_ptr_vector_set( p_tbl, lid_ho, p_port ); + if (CL_SUCCESS == cl_ptr_vector_at(p_tbl, min_lid_ho, (void*)&temp_port) + && NULL != temp_port) + { + /* + If something is already there, we need to find a new + lid range for this port. Process it like it is unassigned. + */ + __osm_lid_mgr_process_unassigned(p_object, context); + } + else + { + /* + Place this port into the port ptr vector. + And update the PortInfo attribute template. + */ + for( lid_ho = min_lid_ho; lid_ho <= max_lid_ho; lid_ho++ ) + { + cl_ptr_vector_set( p_tbl, lid_ho, p_port ); + } - /* - Set the PortInfo for the Physical Port associated - with this Port. - */ - p_physp = osm_port_get_default_phys_ptr( p_port ); - __osm_lid_mgr_set_physp_pi( p_mgr, p_physp, cl_hton16( min_lid_ho ) ); + /* + Set the PortInfo for the Physical Port associated + with this Port. 
+ */ + p_physp = osm_port_get_default_phys_ptr( p_port ); + __osm_lid_mgr_set_physp_pi( p_mgr, p_physp, cl_hton16( min_lid_ho ) ); + } OSM_LOG_EXIT( p_mgr->p_log ); } Jan Daley System Fabric Works (512) 343-6101 x 13 From jdaley at systemfabricworks.com Fri Aug 6 14:08:41 2004 From: jdaley at systemfabricworks.com (Jan Daley) Date: Fri, 6 Aug 2004 16:08:41 -0500 Subject: [openib-general] [PATCH] OpenSM - Clear IsSM bit when shutting down Message-ID: <003b01c47bf9$90029ce0$6b01a8c0@maverick> Hi, This patch is to clear the PortInfo:CapabilityMask:IsSM bit on shutdown. A SM that is brought up on a different node later on will do repeated SubnGet(SMInfo) that will just timeout. Index: opensm/osm_vendor_mlx.c =================================================================== --- opensm/osm_vendor_mlx.c (revision 590) +++ opensm/osm_vendor_mlx.c (working copy) @@ -674,6 +674,43 @@ } /* + * NAME __osm_vendor_clear_sm + * + * DESCRIPTION Modifies the port info for the bound port to clear the "IS_SM" bit. 
+ */ +static void +__osm_vendor_clear_sm( IN osm_bind_handle_t h_bind ) +{ + osmv_bind_obj_t *p_bo = ( osmv_bind_obj_t * ) h_bind; + osm_vendor_t const *p_vend = p_bo->p_vendor; + VAPI_ret_t status; + VAPI_hca_attr_t attr_mod; + VAPI_hca_attr_mask_t attr_mask; + + OSM_LOG_ENTER( p_vend->p_log, osm_vendor_set_sm ); + + cl_memclr( &attr_mod, sizeof( attr_mod ) ); + cl_memclr( &attr_mask, sizeof( attr_mask ) ); + + attr_mod.is_sm = FALSE; + attr_mask = HCA_ATTR_IS_SM; + + status = + VAPI_modify_hca_attr( p_bo->hca_hndl, p_bo->port_num, &attr_mod, + &attr_mask ); + if ( status != VAPI_OK ) + { + osm_log( p_vend->p_log, OSM_LOG_ERROR, + "osm_vendor_set_sm: ERR 5012: " + "Unable to clear 'IS_SM' bit in port attributes (%d).\n", + status ); + } + + OSM_LOG_EXIT( p_vend->p_log ); +} + + +/* * NAME __osm_vendor_internal_unbind * * DESCRIPTION Destroying a bind: @@ -689,6 +726,8 @@ OSM_LOG_ENTER(p_log,__osm_vendor_internal_unbind); + __osm_vendor_clear_sm(h_bind); + /* "notifying" all that from now on no new sends can be done */ osmv_txn_lock(p_bo); p_bo->is_closing = TRUE; Jan Daley System Fabric Works (512) 343-6101 x 13 From jdaley at systemfabricworks.com Fri Aug 6 14:08:51 2004 From: jdaley at systemfabricworks.com (Jan Daley) Date: Fri, 6 Aug 2004 16:08:51 -0500 Subject: [openib-general] [PATCH] OpenSM - Initializing a spinlock twice Message-ID: <003c01c47bf9$9663e9e0$6b01a8c0@maverick> Removing the second initialization of the spinlock. 
Index: opensm/cl_event_wheel.c =================================================================== --- opensm/cl_event_wheel.c (revision 590) +++ opensm/cl_event_wheel.c (working copy) @@ -249,7 +249,6 @@ CL_ASSERT( cl_spinlock_init( &(p_event_wheel->lock) ) == CL_SUCCESS ); cl_qlist_init( &p_event_wheel->events_wheel); cl_qmap_init( &p_event_wheel->events_map ); - cl_spinlock_init( &p_event_wheel->lock ); /* init the timer with timeout */ cl_status = cl_timer_init(&p_event_wheel->timer, Jan Daley System Fabric Works (512) 343-6101 x 13 From jdaley at systemfabricworks.com Fri Aug 6 14:09:05 2004 From: jdaley at systemfabricworks.com (Jan Daley) Date: Fri, 6 Aug 2004 16:09:05 -0500 Subject: [openib-general] [PATCH] OpenSM - SA Client Not Detecting SM Change Message-ID: <003d01c47bf9$9e76de80$6b01a8c0@maverick> Hi, The SM's lid was being saved off in the bind call and used for all subsequent SA queries. If a different SM became master, all queries would fail until a rebind occurred. The change is to not save off the SM's lid on the bind and to query the port for the SM's lid on the send. Also, fixed a memory leak in __osmv_get_lid_and_sm_lid_by_port_guid. 
Index: opensm/osm_vendor_mlx_sa.c =================================================================== --- opensm/osm_vendor_mlx_sa.c (revision 590) +++ opensm/osm_vendor_mlx_sa.c (working copy) @@ -87,8 +87,6 @@ osm_mad_pool_t *p_mad_pool; uint64_t port_guid; cl_event_t sync_event; - uint16_t lid; - uint16_t sm_lid; } osmv_sa_bind_info_t; /*********************************************************************** ****** @@ -317,6 +315,8 @@ } } + cl_free(p_attr_array); + Exit: return ( status ); } @@ -332,7 +332,6 @@ { osm_bind_info_t bind_info; osm_log_t *p_log = p_vend->p_log; - ib_api_status_t status = IB_SUCCESS; osmv_sa_bind_info_t *p_sa_bind_info; cl_status_t cl_status; @@ -368,6 +367,7 @@ p_sa_bind_info->p_log = p_log; p_sa_bind_info->p_mad_pool = p_mad_pool; p_sa_bind_info->p_vendor = p_vend; + p_sa_bind_info->port_guid = port_guid; /* Bind to the lower level */ p_sa_bind_info->h_bind = @@ -388,22 +388,6 @@ goto Exit; } - /* obtain the sm_lid from the vendor */ - status = - __osmv_get_lid_and_sm_lid_by_port_guid( - p_vend, port_guid, - &p_sa_bind_info->lid, - &p_sa_bind_info->sm_lid); - if (status != IB_SUCCESS) - { - cl_free(p_sa_bind_info); - p_sa_bind_info = OSM_BIND_INVALID_HANDLE; - osm_log( p_log, OSM_LOG_ERROR, - "osm_vendor_bind_sa: ERR 0507: " - "Fail to obtain the sm lid.\n" ); - goto Exit; - } - /* initialize the sync_event */ cl_event_construct( &p_sa_bind_info->sync_event ); cl_status = cl_event_init( &p_sa_bind_info->sync_event, TRUE ); @@ -480,9 +464,25 @@ static atomic32_t trans_id; boolean_t sync; osmv_query_req_t *p_query_req_copy; + uint16_t local_lid; + uint16_t sm_lid; OSM_LOG_ENTER( p_log, __osmv_send_sa_req ); + status = __osmv_get_lid_and_sm_lid_by_port_guid( + p_bind->p_vendor, + p_bind->port_guid, + &local_lid, + &sm_lid); + + if (IB_SUCCESS != status) + { + osm_log( p_log, OSM_LOG_ERROR, + "__osmv_send_sa_req: ERR 1103: " + "Unable to get SM's LID.\n" ); + goto Exit; + } + /* Get a MAD wrapper for the send */ p_madw = 
osm_mad_pool_get( p_bind->p_mad_pool, @@ -535,9 +535,13 @@ /* Provide the address to send to */ - p_madw->mad_addr.dest_lid = cl_hton16(p_bind->sm_lid); + + __osmv_get_lid_and_sm_lid_by_port_guid(p_bind->p_vendor, p_bind->port_guid, + &local_lid, &sm_lid); + + p_madw->mad_addr.dest_lid = cl_hton16(sm_lid); p_madw->mad_addr.addr_type.smi.source_lid = - cl_hton16(p_bind->lid); + cl_hton16(local_lid); p_madw->mad_addr.addr_type.gsi.remote_qp = CL_HTON32(1); p_madw->resp_expected = TRUE; p_madw->fail_msg = CL_DISP_MSGID_NONE; Jan Daley System Fabric Works (512) 343-6101 x 13 From jdaley at systemfabricworks.com Fri Aug 6 14:09:20 2004 From: jdaley at systemfabricworks.com (Jan Daley) Date: Fri, 6 Aug 2004 16:09:20 -0500 Subject: [openib-general] [PATCH] OpenSM - Multiple Initializations of objects on startup Message-ID: <003e01c47bf9$a730e890$6b01a8c0@maverick> Hi, osm_subn_construct and cl_map_init(&(p_subn->opt.port_pro_ignore_guids), 10) are being called twice. This causes some memory leaks. 
Index: opensm/osm_subnet.c =================================================================== --- opensm/osm_subnet.c (revision 590) +++ opensm/osm_subnet.c (working copy) @@ -112,7 +112,6 @@ cl_qmap_init( &p_subn->rtr_guid_tbl ); cl_qmap_init( &p_subn->prtn_pkey_tbl ); cl_qmap_init( &p_subn->mgrp_mlid_tbl ); - cl_map_init(&(p_subn->opt.port_pro_ignore_guids), 10); cl_list_construct( &p_subn->new_ports_list ); cl_list_init( &p_subn->new_ports_list, 10 ); } @@ -200,8 +199,6 @@ { cl_status_t status; - osm_subn_construct( p_subn ); - status = cl_ptr_vector_init( &p_subn->node_lid_tbl, OSM_SUBNET_VECTOR_MIN_SIZE, OSM_SUBNET_VECTOR_GROW_SIZE ); Jan Daley System Fabric Works (512) 343-6101 x 13 From jdaley at systemfabricworks.com Fri Aug 6 14:09:34 2004 From: jdaley at systemfabricworks.com (Jan Daley) Date: Fri, 6 Aug 2004 16:09:34 -0500 Subject: [openib-general] [PATCH] Osmtest - Byte Swapping issue and resource cleanup Message-ID: <003f01c47bf9$b1ad8a30$6b01a8c0@maverick> Hi, 1) In osmtest_stress_port_recs_small, the lid passed into the call to osmtest_get_port_rec wasn't being byte swapped. This caused all portinfo records to be retrieved instead of just one. 2) I added a couple of calls to cleanup qpools that were created. Index: osmtest/osmtest.c =================================================================== --- osmtest/osmtest.c (revision 590) +++ osmtest/osmtest.c (working copy) @@ -479,6 +479,8 @@ osm_vendor_delete( &p_osmt->p_vendor ); } + cl_qpool_destroy(&p_osmt->port_pool); + cl_qpool_destroy(&p_osmt->node_pool); osm_log_destroy( &p_osmt->log ); } @@ -1308,7 +1310,7 @@ /* * Do a blocking query for our own PortRecord in the subnet. 
*/ - status = osmtest_get_port_rec( p_osmt, p_osmt->local_port.lid, &context ); + status = osmtest_get_port_rec( p_osmt, cl_hton16(p_osmt->local_port.lid), &context ); if( status != IB_SUCCESS ) { Jan Daley System Fabric Works (512) 343-6101 x 13 From yaronh at voltaire.com Fri Aug 6 15:09:58 2004 From: yaronh at voltaire.com (Yaron Haviv) Date: Sat, 7 Aug 2004 01:09:58 +0300 Subject: [openib-general] Some ib_mad.h Redirection Comments Message-ID: <35EA21F54A45CB47B879F21A91F4862F18AC62@taurus.voltaire.com> On Friday, August 06, 2004 9:10 PM, Sean Hefty wrote: > On Thu, 05 Aug 2004 20:13:26 -0400 > Hal Rosenstock wrote: > >> I don't think this by itself forces redirection above the API. I >> think it can still be handled transparently inside the GSI. > > Trying to handle this "transparently" in the GSI puts too much policy > in the GSI. I think that it's necessary for the clients to manage > their QPs, not the access layer. I think Hal is focused on the Request side and Sean on the Response (Server) side. I'm not sure there is an argument that the Server side owns the QPs (other than QP1, which is owned by the GSI). >>> The client would be responsible for caching the redirection. >> >> This seems like a commonality which can be pushed down into the >> access layer and makes more sense when the access layer is handling >> the retransmissions on redirect. If it isn't, it makes less sense. > > Sending to a redirected QP involves setting the remote_qpn, > remote_qkey, and pkey_index in the ib_mad_send_wr to different > values. It's trivial for a client to set this. I don't see the value in having every potential GSI client implement the common functionality of identifying that a response is a Redirect and resending the request to the new location. Why not just have the GSI layer do that common functionality for all the consumers?
I think it is also trivial for the GSI to implement such functionality. > >> So back to that issue: if the access layer is already matching >> responses to requests (when timeout is enabled), I don't see why it >> doesn't handle the retransmission on receipt of redirect so each >> client doesn't need to implement this. > > Currently, the access layer is not retransmitting MADs outside of > RMPP. When a response comes in, both the request and response may be > handed to the user. If the response is a redirect, the client can > cache the redirection information and re-issue the request. All > future requests sent by that client can now automatically go to the > correct location without the GSI having to snoop the destination, > lookup whether that destination has been redirected, and modifying > the outbound MAD. This doesn't handle cases where there are multiple clients that access the same remote service (e.g. multiple SA clients). If the GSI layer implements the caching, all clients can benefit from it, not to mention having a central implementation. If your main concern is the CM (which has a single client per node, multiple remote servers, and some different behavior), we can treat it as an exception. I'm more focused on things such as a distributed SA; in such cases we can cache/lookup based on the class first. > > Are you concerned about duplicating the management of a redirection > table? I think that redirection tables must be maintained per > client. Would adding support for redirection table management remove > your concerns? I think we are starting to complicate something that was supposed to be trivial. If the Requestor-side redirect handling is done at the GSI level, it makes the Requestor code trivial: just build the proper MAD, send it, and parse the result (or retry) after the callback, without having every potential consumer deal with the same cases and complicate its code.
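The per-class caching being argued for here could be sketched roughly as below. This is purely illustrative: the names (`redir_entry`, `redir_cache_*`) and the flat fixed-size table are assumptions for the sketch, not part of any proposed ib_mad.h, and a real GSI would need locking and probably a source-address component in the key.

```c
#define MAX_REDIR_ENTRIES 16

/* One cached redirect, keyed by (management class, original dest LID).
 * Values come from the ClassPortInfo returned in the Redirect response. */
struct redir_entry {
    unsigned char  mgmt_class;  /* management class being redirected */
    unsigned short dest_lid;    /* original destination LID */
    unsigned int   new_qpn;     /* redirected QP number */
    unsigned int   new_qkey;    /* redirected Q_Key */
    int            valid;
};

static struct redir_entry redir_cache[MAX_REDIR_ENTRIES];

/* Record a redirect; overwrites an existing entry for the same key. */
int redir_cache_insert(unsigned char mgmt_class, unsigned short dest_lid,
                       unsigned int qpn, unsigned int qkey)
{
    int i;
    for (i = 0; i < MAX_REDIR_ENTRIES; ++i) {
        if (!redir_cache[i].valid ||
            (redir_cache[i].mgmt_class == mgmt_class &&
             redir_cache[i].dest_lid == dest_lid)) {
            redir_cache[i].mgmt_class = mgmt_class;
            redir_cache[i].dest_lid   = dest_lid;
            redir_cache[i].new_qpn    = qpn;
            redir_cache[i].new_qkey   = qkey;
            redir_cache[i].valid      = 1;
            return 0;
        }
    }
    return -1;  /* cache full */
}

/* Before sending, ask whether (class, lid) has been redirected;
 * if so, the send path would patch remote_qpn/remote_qkey from the
 * entry instead of defaulting to QP1. */
struct redir_entry *redir_cache_lookup(unsigned char mgmt_class,
                                       unsigned short dest_lid)
{
    int i;
    for (i = 0; i < MAX_REDIR_ENTRIES; ++i)
        if (redir_cache[i].valid &&
            redir_cache[i].mgmt_class == mgmt_class &&
            redir_cache[i].dest_lid == dest_lid)
            return &redir_cache[i];
    return 0;
}
```

Whether this table lives in the GSI or in each client is exactly the policy question being debated in this thread; the code itself is the same either way.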
From Tom.Duffy at Sun.COM Fri Aug 6 15:33:09 2004 From: Tom.Duffy at Sun.COM (Tom Duffy) Date: Fri, 06 Aug 2004 15:33:09 -0700 Subject: [openib-general] OpenSM on mthca Message-ID: <1091831588.2524.5.camel@localhost> So, one of the things we talked about at Linuxworld in our miniBOF was getting OpenSM working on top of mthca. This will be needed for any community member to begin to contribute as they are very unlikely to shell out big bucks for an IB switch with a built-in SM. I have only just begun to look into it. What is it going to take to do this port? Is it possible with the current mthca driver? Thanks, -tduffy From roland at topspin.com Fri Aug 6 15:51:31 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 06 Aug 2004 15:51:31 -0700 Subject: [openib-general] Some ib_mad.h Redirection Comments In-Reply-To: <35EA21F54A45CB47B879F21A91F4862F18AC62@taurus.voltaire.com> (Yaron Haviv's message of "Sat, 7 Aug 2004 01:09:58 +0300") References: <35EA21F54A45CB47B879F21A91F4862F18AC62@taurus.voltaire.com> Message-ID: <52ekmjdgng.fsf@topspin.com> Yaron> I don't see what is the value in having every potential GSI Yaron> client implement a common functionality of identifying it's Yaron> a Redirect, and resending the request to the new location, Yaron> why not just have the GSI layer do that common Yaron> functionality for all the consumers ? I think it is also Yaron> trivial for the GSI to implement such functionality I think you're underestimating the amount of policy and complication involved in putting redirect handling in the common layer. First of all the GSI layer would have to maintain a table indexed by (at least) class, destination and source address (since redirects may depend on the requester). Then we have the problem of what to do if a redirected request times out -- should the common layer then automatically try resending the request to the original address?
- Roland From roland at topspin.com Fri Aug 6 17:03:22 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 06 Aug 2004 17:03:22 -0700 Subject: [openib-general] OpenSM on mthca In-Reply-To: <1091831588.2524.5.camel@localhost> (Tom Duffy's message of "Fri, 06 Aug 2004 15:33:09 -0700") References: <1091831588.2524.5.camel@localhost> Message-ID: <52acx7ddbp.fsf@topspin.com> Tom> I have only just began to look into it. What is it going to Tom> take to do this port? Is it possible with the current mthca Tom> driver? I think all the VAPI calls from osm_vendor_XXX.[ch] need to be removed. It's not clear to me how close osm_vendor_mlx_ts_anafa.c is to what we would need. It's definitely possible on top of the current mthca driver, since sending and receiving MADs is all that is required. In any case all this MAD/GSI discussion is going to lead to a change in the userspace interface (not to mention the requirement of being 32/64 clean) so I wouldn't want to spend too much effort getting opensm working on the current tree. - Roland From halr at voltaire.com Sat Aug 7 08:06:16 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Sat, 07 Aug 2004 11:06:16 -0400 Subject: [openib-general] [PATCH] added comments to ib_mad.h - minor update In-Reply-To: <20040806101744.2ba901c5.mshefty@ichips.intel.com> References: <20040804160213.3e93cb2a.mshefty@ichips.intel.com> <20040805124542.55fec6f9.mshefty@ichips.intel.com> <1091753075.1911.233.camel@localhost.localdomain> <20040806101744.2ba901c5.mshefty@ichips.intel.com> Message-ID: <1091891178.1120.18.camel@localhost.localdomain> On Fri, 2004-08-06 at 13:17, Sean Hefty wrote: > > ib_mad_send_wr.sg_list indicates first entry must reference a data > > buffer of 256 bytes. Is the "base" RMPP header in it ? Which fields > > must be filled in by the client (RMPPActive, RRespTime, and length) ? > > Will length be in the first segment only (total length) or also in the > > last segment ? 
> > Because the spec positioned the RMPP header in the middle of user-data (i.e. after the standard MAD header), > I think that the most efficient way to handle sending a MAD is for the client to hand the access layer a buffer > that contains both the MAD header and RMPP header. For most MADs, > this results in a send work request that uses a single sg-entry. I'm unconvinced about so called "zero copy" RMPP. Someone has to do the fragmentation/reassembly. Seems to me that should be hidden by the access layer rather than exposed to the consumer. I think this is the fundamental issue to resolve for RMPP. > If we agree on this, then the intent here is that we don't want the user handing the access layer the RMPP header > split across multiple data buffers. The real restriction is that the first sg-entry should reference both the > MAD and RMPP headers. I think that the user only needs to set RRespTime and RMPPActive flag in the RMPP header > if using RMPP. If RMPP is not used, they should set all fields to 0. Not sure the consumer should need to set all fields to 0 when RMPPActive is not set. The access layer might be better to do this to be sure. > > On the receive side, we need to handle either if we have > > to deal with non OpenIB implementations :-( > > On the receive side, we control the data buffers, so this isn't an issue. > We should just post receive buffers of 256 + sizeof(grh). I was thinking about the model where RMPP performs the coalescing on the receive side in which case I think this helps as the segments can be copied and reused sooner. > > What about subsequent entries in the s-g list for send ? Are they also > > constrained to be 256 bytes or something else ? I would presume > > RMPP would rewrite the RMPP header based on the first header and > > update the appropriate fields. > > We can be as flexible or restrictive as we want to be, I think. > My request (based on feedback) is that we try to minimize the need > to perform any data copies. 
OK assuming this model is being used. > Is timeout_ms used for Ttime when it is a RMPP send ? > timeout_ms applies to sends, whereas Ttime applies to the receiver. Right. > We could use the default of 40 seconds as mentioned in the spec, but this seems high to me. Yes, that calculation is based on a set of assumptions which are documented in the spec. While it is easier to use some hard-coded value rather than a dynamically calculated one, it also leads to longer timeouts when an RMPP packet is dropped somewhere. > For a received response, timeout_ms should work fine. > > I am still wondering about the RMPP direction switch (IsDS) and whether > this needs to be exposed somehow. > I don't think that it does. What is used to indicate send only vs. send and (RMPP) response expected? > I don't think anything like this was needed > in the sourceforge stack, and the proposed GSI implementation uses the > sourceforge RMPP code. I think that we just need to know if a send requires segmentation, > or if a receive requires reassembly. Did the SF RMPP use SA GetMulti, which is where this is used? -- Hal From mshefty at ichips.intel.com Sat Aug 7 08:55:49 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Sat, 7 Aug 2004 08:55:49 -0700 Subject: [openib-general] [PATCH] added comments to ib_mad.h - minor update In-Reply-To: <1091891178.1120.18.camel@localhost.localdomain> References: <20040804160213.3e93cb2a.mshefty@ichips.intel.com> <20040805124542.55fec6f9.mshefty@ichips.intel.com> <1091753075.1911.233.camel@localhost.localdomain> <20040806101744.2ba901c5.mshefty@ichips.intel.com> <1091891178.1120.18.camel@localhost.localdomain> Message-ID: <20040807085549.26ad502a.mshefty@ichips.intel.com> On Sat, 07 Aug 2004 11:06:16 -0400 Hal Rosenstock wrote: > I'm unconvinced about so called "zero copy" RMPP. Someone has to do the > fragmentation/reassembly. Seems to me that should be hidden by the > access layer rather than exposed to the consumer.
I think this is the fundamental issue to resolve for RMPP. I think that we can separate segmentation from reassembly. For segmentation, we can definitely do zero-copies. And zero-copy shouldn't be an issue for non-RMPP MADs. For reassembly, this is harder because of how the MAD headers are defined. The standard and RMPP MAD headers are duplicated in every segment. The result is that in order to do zero-copy reassembly, the user needs to get back a chain of buffers. (I'm referring to kernel-only clients here. For user-space, there's no reason not to give the user a reassembled MAD in a single buffer.) I think that the issue here is that the API becomes kludgy, hard to define, and difficult to work with. Plus the data that the user cares about is now sprinkled throughout multiple buffers, offset into those buffers by sizeof(grh) + sizeof(mad header) + sizeof(rmpp header). Based on the API of the original GSI proposal, it appeared that it was trying to provide zero-copy reassembly. I'm open to reassembly requiring a single data copy, however. > Not sure the consumer should need to set all fields to 0 when RMPPActive > is not set. The access layer might be better to do this to be sure. I think that the client could do this more efficiently. The access layer would need to do this on every send, whereas the client could do it once for multiple transfers. > I was thinking about the model where RMPP performs the coalescing on the > receive side in which case I think this helps as the segments can be > copied and reused sooner. Something to consider is that the spec permits sending an RMPP packet of unknown length (PayloadLength = 0 in the first segment). This makes it difficult to coalesce into a single buffer when receiving a segment, because the size of the buffer isn't known until the last segment has been received.
A benefit of coalescing the data into a single buffer is that it decreases memory use, since we can avoid carrying around the duplicated GRH and MAD/RMPP headers. > Yes, that calculation is based on a set of assumptions which are documented in the spec. > While it is easier to use some hard-coded value rather than a dynamically calculated one, > it also leads to longer timeouts when an RMPP packet is dropped somewhere. Here's a problem that I see with the dynamic calculations. The GSI is sitting around when it *receives* the first segment of an RMPP packet. According to the spec, it now has to figure out the PayloadLength (which could be set to 0, in which case it just uses a default), figure out the packet lifetime from the sender to itself, get the packet lifetime from itself to the sender, and know what its own response time value is going to be (which should be set by the client, not the GSI). By the time the GSI figures all these values out, the RMPP transfer is either going to be done, or have timed out on the sender side... Anyway, I'd like to make the receive timeouts dynamic and client-controlled. We just need a good way to do it. > What is used to indicate send only v. send and (RMPP) response expected ? The timeout_ms field in ib_mad_send_wr indicates if a response is expected. When sending, if RMPPActive is set, the send will use RMPP. After the send completes, if timeout_ms is set, then a response is expected. On the received response, if the sender uses RMPP (set when calling ib_mad_reg()), the GSI will look in the RMPP header to see if RMPP is active for the receive. > Did the SF RMPP use SA GetMulti which is where this is used ? The SF RMPP did support GetMulti. It was only class-aware in a few cases, such as the CM and trap repress messages. I mentioned this before, but looking at the proposed GSI implementation, it copied the SF RMPP code. 
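The send-side rules Sean describes can be sketched as follows (an editorial illustration, not the actual ib_mad.h API; the struct and field names here merely mirror the fields discussed above):

```c
#include <stdbool.h>

/* Illustrative-only mirror of the two fields under discussion; the
 * real structure is ib_mad_send_wr in ib_mad.h. */
struct mad_send_info {
	bool rmpp_active;	/* RMPPActive bit in the RMPP header */
	int  timeout_ms;	/* 0 means no response is expected */
};

/* The send uses RMPP segmentation iff RMPPActive is set. */
static bool send_uses_rmpp(const struct mad_send_info *wr)
{
	return wr->rmpp_active;
}

/* After the send completes, a response is expected iff timeout_ms is
 * nonzero; no separate direction-switch flag is exposed to clients. */
static bool response_expected(const struct mad_send_info *wr)
{
	return wr->timeout_ms != 0;
}
```

Whether the *response* itself uses RMPP is then discovered from the RMPP header of the received MAD, not from anything in the send request.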
From mshefty at ichips.intel.com Sat Aug 7 09:07:57 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Sat, 7 Aug 2004 09:07:57 -0700 Subject: [openib-general] Some ib_mad.h Redirection Comments In-Reply-To: <35EA21F54A45CB47B879F21A91F4862F18AC62@taurus.voltaire.com> References: <35EA21F54A45CB47B879F21A91F4862F18AC62@taurus.voltaire.com> Message-ID: <20040807090757.056217f5.mshefty@ichips.intel.com> On Sat, 7 Aug 2004 01:09:58 +0300 "Yaron Haviv" wrote: > I think Hal is focused on the Request side and Sean on the Response > (Server) side > I'm not sure there is an argument that the Server side owns the QP's (other > than QP1 owned by the GSI) I'm focused on both sides. I'm just not sure that whatever is done on the request side necessarily changes the API. If we're in agreement on the server side, then I think we're more than half-way there. > I don't see what is the value in having every potential GSI client > implement a common functionality of identifying it's a Redirect, and > resending the request to the new location, why not just have the GSI > layer do that common functionality for all the consumers ? > I think it is also trivial for the GSI to implement such functionality My general thought is that if something is trivial to implement, just push it up to the clients. However, I'm not sure that redirection is that simple. With a 1000 node fabric, 3 services per node, and everyone redirecting, the redirecting table will be substantial. I need to think about this more and continue discussing it, but I'm becoming convinced that there may be enough work there to justify moving some of this into the access layer. > If your main concern is CM (that has a single client per node, multiple > remote servers, and some different behavior) we can treat it as an > exception, I'm more focused at things such as distributed SA in such > cases we can cache/lookup based on a class first I pick the CM because it's the simplest type of client. 
It doesn't use RMPP or request/response transfers. I think starting with the CM will ensure that we get the right layering of services. From halr at voltaire.com Sat Aug 7 12:01:11 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Sat, 07 Aug 2004 15:01:11 -0400 Subject: [openib-general] [PATCH] gen2: Only include CM and DM client in kernel build when needed Message-ID: <1091905273.1833.6.camel@localhost.localdomain> On Roland's gen2 branch, the following patch causes CM and DM client to be included in the kernel build only when needed. One example of this is when only building for IPoIB. Index: src/linux-kernel/infiniband/ulp/Kconfig =================================================================== --- src/linux-kernel/infiniband/ulp/Kconfig (revision 595) +++ src/linux-kernel/infiniband/ulp/Kconfig (working copy) @@ -2,7 +2,7 @@ tristate "IP-over-InfiniBand" depends on INFINIBAND && NETDEVICES && INET ---help--- - Support for the IP-over-InfiniBand protocol (IPoIB). This + Support for the IETF IP-over-InfiniBand protocol (IPoIB). This transports IP packets over InfiniBand so you can use your IB device as a fancy NIC. To configure interfaces on multiple InfiniBand partitions (P_Keys) you will need the ipoibcfg @@ -14,6 +14,7 @@ config INFINIBAND_SDP tristate "Sockets Direct Protocol" depends on INFINIBAND && INFINIBAND_IPOIB + select INFINIBAND_CM ---help--- Support for Sockets Direct Protocol (SDP). This provides sockets semantics over InfiniBand via address family @@ -24,7 +25,9 @@ config INFINIBAND_SRP tristate "SCSI RDMA Protocol" depends on INFINIBAND && SCSI - ---help--- + select INFINIBAND_CM + select INFINIBAND_DM_CLIENT + ---help--- Support for SCSI RDMA Protocol (SRP). This transports SCSI commands over InfiniBand and allows you to access storage connected via IB. 
@@ -33,6 +36,7 @@ tristate "uDAPL helper" depends on (BROKEN || m) && INFINIBAND && INFINIBAND_MELLANOX_HCA select INFINIBAND_USER_CM + select INFINIBAND_CM ---help--- Kernel space helper for uDAPL. Select this and use the uDAPL library from to run uDAPL applications. Index: src/linux-kernel/infiniband/Kconfig =================================================================== --- src/linux-kernel/infiniband/Kconfig (revision 595) +++ src/linux-kernel/infiniband/Kconfig (working copy) @@ -17,6 +17,21 @@ This allows userspace protocols such as uDAPL or MPI to use the kernel's services for establishing connections. +config INFINIBAND_CM + bool "CM" + depends on INFINIBAND + default y if (INFINIBAND_USER_CM || INFINIBAND_SRP || INFINIBAND_SDP) + ---help--- + Kernel communication manager. Needed for connection oriented protocols + such as SRP, SDP, DAPL, or MPI. + +config INFINIBAND_DM_CLIENT + bool "DM client" + depends on INFINIBAND + default y if INFINIBAND_SRP + ---help--- + Device Manager (DM) client. Needed for SRP. 
+ source "drivers/infiniband/ulp/Kconfig" source "drivers/infiniband/hw/Kconfig" Index: src/linux-kernel/infiniband/core/Makefile =================================================================== --- src/linux-kernel/infiniband/core/Makefile (revision 595) +++ src/linux-kernel/infiniband/core/Makefile (working copy) @@ -9,11 +9,19 @@ obj-$(CONFIG_INFINIBAND) += \ ib_core.o \ - ib_mad.o \ - ib_cm.o \ + ib_mad.o +ifdef CONFIG_INFINIBAND_CM +obj-$(CONFIG_INFINIBAND) += \ + ib_cm.o +endif +obj-$(CONFIG_INFINIBAND) += \ ib_client_query.o \ - ib_sa_client.o \ - ib_dm_client.o \ + ib_sa_client.o +ifdef CONFIG_INFINIBAND_DM_CLIENT +obj-$(CONFIG_INFINIBAND) += \ + ib_dm_client.o +endif +obj-$(CONFIG_INFINIBAND) += \ ib_useraccess.o obj-$(CONFIG_INFINIBAND_USER_CM) += \ @@ -54,6 +62,7 @@ mad_proc.o \ mad_export.o +ifdef CONFIG_INFINIBAND_CM ib_cm-objs := \ cm_main.o \ cm_api.o \ @@ -64,6 +73,7 @@ cm_connection_table.o \ cm_service_table.o \ cm_proc.o +endif ib_client_query-objs := \ client_query.o \ @@ -81,6 +91,7 @@ sa_client_export.o \ sa_client_node_info.o +ifdef CONFIG_INFINIBAND_DM_CLIENT ib_dm_client-objs := \ dm_client_main.o \ dm_client_query.o \ @@ -90,6 +101,7 @@ dm_client_ioc_profile.o \ dm_client_svc_entries.o \ dm_client_host.o +endif ib_useraccess-objs := \ useraccess_main.o \ @@ -104,6 +116,7 @@ $(obj)/pm_access.h $(obj)/pm_types.h $(obj)/mad_static.o: $(obj)/smp_access.h $(obj)/smp_types.h + $(ib_cm-objs:%=$(obj)/%): $(obj)/cm_packet.h quiet_cmd_gendecode = GEN $@ From yaronh at voltaire.com Sat Aug 7 12:18:39 2004 From: yaronh at voltaire.com (Yaron Haviv) Date: Sat, 7 Aug 2004 22:18:39 +0300 Subject: [openib-general] Some ib_mad.h Redirection Comments Message-ID: <35EA21F54A45CB47B879F21A91F4862F18AC7E@taurus.voltaire.com> On Saturday, August 07, 2004 1:52 AM, Roland Dreier wrote: > Yaron> I don't see what is the value in having every potential GSI > Yaron> client implement a common functionality of identifying it's > Yaron> a Redirect, and resending the 
request to the new location, > Yaron> why not just have the GSI layer do that common > Yaron> functionality for all the consumers ? I think it is also > Yaron> trivial for the GSI to implement such functionality > > I think you're underestimating the amount of policy and complication > involved in putting redirect handling in the common layer. First of > all the GSI layer would have to maintain a table indexed by (at > least) class, destination and source address (since redirects may > depend on the requester). Then we have the problem of what to do if > a redirected request times out -- should the common layer then > automatically try resending the request to the original address? If you think Redirect on the requestor side is not trivial, that is an even better reason to put it in a common place rather than in every potential client. From what I know, it is already supported to some degree by the code Hal posted, so it is achievable, and we can use it instead of re-inventing a new mechanism Yaron From yaronh at voltaire.com Sat Aug 7 12:29:18 2004 From: yaronh at voltaire.com (Yaron Haviv) Date: Sat, 7 Aug 2004 22:29:18 +0300 Subject: [openib-general] Some ib_mad.h Redirection Comments Message-ID: <35EA21F54A45CB47B879F21A91F4862F18AC7F@taurus.voltaire.com> On Saturday, August 07, 2004 7:08 PM, Sean Hefty wrote: > On Sat, 7 Aug 2004 01:09:58 +0300 > "Yaron Haviv" wrote: > > If we're in agreement on the server side, then I think we're more than half-way > there. To me it looks like there is agreement :) >> I don't see what is the value in having every potential GSI client >> implement a common functionality of identifying it's a Redirect, and >> resending the request to the new location, why not just have the GSI >> layer do that common functionality for all the consumers ? >> I think it is also trivial for the GSI to implement such >> functionality > > My general thought is that if something is trivial to implement, just > push it up to the clients. 
However, I'm not sure that redirection is > that simple. With a 1000 node fabric, 3 services per node, and > everyone redirecting, the redirecting table will be substantial. I > need to think about this more and continue discussing it, but I'm > becoming convinced that there may be enough work there to justify > moving some of this into the access layer. > >> If your main concern is CM (that has a single client per node, >> multiple remote servers, and some different behavior) we can treat it >> as an exception, I'm more focused at things such as distributed SA in >> such cases we can cache/lookup based on a class first > > I pick the CM because it's the simplest type of client. It doesn't > use RMPP or request/response transfers. I think starting with the CM > will ensure that we get the right layering of services. For CM the use cases are different from those for SA, PM, etc. Since CM Redirect can be used for per-session load balancing, etc., Redirect in that case is better done at the CM rather than in the access layer. For SA, there is usually only one target (SA server) accessed by each node, so we just need to cache one entry for the remote SA. One implementation I'm looking at is a distributed SA, where each node accesses a different SA; there can also be more than one SA client per node vs. usually one CM per node. So that's why I think Redirect in CM & SA are not the same and may require different approaches. Your thoughts ? 
Yaron From roland at topspin.com Sat Aug 7 12:32:54 2004 From: roland at topspin.com (Roland Dreier) Date: Sat, 07 Aug 2004 12:32:54 -0700 Subject: [openib-general] [PATCH] gen2: Only include CM and DM client in kernel build when needed In-Reply-To: <1091905273.1833.6.camel@localhost.localdomain> (Hal Rosenstock's message of "Sat, 07 Aug 2004 15:01:11 -0400") References: <1091905273.1833.6.camel@localhost.localdomain> Message-ID: <52u0vebv6h.fsf@topspin.com> I like this idea (although I might want to hide this level of control unless CONFIG_EMBEDDED is selected). I think this sort of thing in Makefiles is not the right way to go, though: +ifdef CONFIG_INFINIBAND_CM +obj-$(CONFIG_INFINIBAND) += \ + ib_cm.o +endif I think it would be better just to assign to obj-$(CONFIG_INFINIBAND_CM). Also, this is definitely not needed (if ib_cm.o isn't being built, ib_cm-objs will just be ignored): +ifdef CONFIG_INFINIBAND_CM ib_cm-objs := \ cm_main.o \ cm_api.o \ @@ -64,6 +73,7 @@ cm_connection_table.o \ cm_service_table.o \ cm_proc.o +endif Also I'm not sure that one needs both the "select" and "default y" stuff in Kconfig. I'd rather work this out later when we know what's in the real gen2 tree and how the core IPoIB stuff for merging upstream looks. Then we can figure out how to add CM, etc. (And see if we can move DM into userspace) - R. From roland at topspin.com Sat Aug 7 12:34:10 2004 From: roland at topspin.com (Roland Dreier) Date: Sat, 07 Aug 2004 12:34:10 -0700 Subject: [openib-general] Some ib_mad.h Redirection Comments In-Reply-To: <35EA21F54A45CB47B879F21A91F4862F18AC7F@taurus.voltaire.com> (Yaron Haviv's message of "Sat, 7 Aug 2004 22:29:18 +0300") References: <35EA21F54A45CB47B879F21A91F4862F18AC7F@taurus.voltaire.com> Message-ID: <52pt62bv4d.fsf@topspin.com> Yaron> So that's why I think Redirect in CM & SA are not the same Yaron> and may require different approach Seems like this is an argument for moving redirect support further up in the stack, right? 
- Roland From roland at topspin.com Sat Aug 7 12:36:02 2004 From: roland at topspin.com (Roland Dreier) Date: Sat, 07 Aug 2004 12:36:02 -0700 Subject: [openib-general] Some ib_mad.h Redirection Comments In-Reply-To: <35EA21F54A45CB47B879F21A91F4862F18AC7E@taurus.voltaire.com> (Yaron Haviv's message of "Sat, 7 Aug 2004 22:18:39 +0300") References: <35EA21F54A45CB47B879F21A91F4862F18AC7E@taurus.voltaire.com> Message-ID: <52llgqbv19.fsf@topspin.com> Yaron> If you think Redirect on the requestor side is not trivial, Yaron> that is even a better reason to put it in a common place Yaron> rather than for every potential client, from what I know it Yaron> is already supported by the code Hal posted to some degree, Yaron> so it is achievable as well as we can use it instead of Yaron> re-inventing a new mechanism Actually I think redirect on the requestor side involves a non-trivial amount of policy, which means I want the consumer to be in control of it. If there is a common policy that many consumers use then we can create a library or helper functions, but I wouldn't want to force every consumer to use the same policy. - R. From halr at voltaire.com Sat Aug 7 14:07:52 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Sat, 07 Aug 2004 17:07:52 -0400 Subject: [openib-general] [PATCH] gen2: Only include CM and DM client in kernel build when needed In-Reply-To: <52u0vebv6h.fsf@topspin.com> References: <1091905273.1833.6.camel@localhost.localdomain> <52u0vebv6h.fsf@topspin.com> Message-ID: <1091912874.1808.6.camel@localhost.localdomain> On Sat, 2004-08-07 at 15:32, Roland Dreier wrote: > I like this idea (although I might want to hide this level of control > unless CONFIG_EMBEDDED is selected). I think this sort of thing in > Makefiles is not the right way to go, though: > > +ifdef CONFIG_INFINIBAND_CM > +obj-$(CONFIG_INFINIBAND) += \ > + ib_cm.o > +endif > > I think it would be better just to assign to > obj-$(CONFIG_INFINIBAND_CM). 
> > Also, this is definitely not needed (if ib_cm.o isn't being build, > ib_cm-objs will just be ignored): > > +ifdef CONFIG_INFINIBAND_CM > ib_cm-objs := \ > cm_main.o \ > cm_api.o \ > @@ -64,6 +73,7 @@ > cm_connection_table.o \ > cm_service_table.o \ > cm_proc.o > +endif > > Also I'm not sure that one needs both the "select" and "default y" > stuff in Kconfig. > > I'd rather work this out later when we know what's in the real gen2 > tree and how the core IPoIB stuff for merging upstream looks. Then we > can figure out how to add CM, etc. (And see if we can move DM into > userspace) I'm interested in working this out now because having everything in one tree will make it easier right ? I'd like to avoid having 2 mthcas and have things build smoothly. Is there a better way to do this ? -- Hal From halr at voltaire.com Sat Aug 7 15:44:07 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Sat, 07 Aug 2004 18:44:07 -0400 Subject: [openib-general] [PATCH]: GSI: Eliminate function declaration isn't a prototype warnings Message-ID: <1091918649.1807.18.camel@localhost.localdomain> Eliminate function declaration isn't a prototype warnings Index: gsi_main.c =================================================================== --- gsi_main.c (revision 596) +++ gsi_main.c (working copy) @@ -178,8 +178,8 @@ *hca, u32 client_id); static void gsi_put_class_info(struct gsi_serv_class_info_st *class_info); -static void gsi_get_all_classes(); -static void gsi_put_all_classes(); +static void gsi_get_all_classes(void); +static void gsi_put_all_classes(void); static void gsi_send_compl_cb(struct gsi_hca_info_st *hca, struct gsi_dtgrm_priv_st *dtgrm_priv); @@ -1032,7 +1032,7 @@ * Increase reference counter for all classes */ static void -gsi_get_all_classes() +gsi_get_all_classes(void) { struct gsi_serv_class_info_st *class_info, *head = (struct gsi_serv_class_info_st *) &gsi_class_list; @@ -1049,7 +1049,7 @@ * Decrease reference counter for all classes */ static void 
-gsi_put_all_classes() +gsi_put_all_classes(void) { struct gsi_serv_class_info_st *class_info, *head = (struct gsi_serv_class_info_st *) &gsi_class_list; @@ -1292,7 +1292,7 @@ * (see gsi_sent_dtgrm_timer_handler() description) */ static inline void -gsi_sent_dtgrm_timer_run() +gsi_sent_dtgrm_timer_run(void) { if (!atomic_dec_and_test(&gsi_sent_dtgrm_timer_running)) { printk(KERN_DEBUG "Timer already running\n"); @@ -1311,7 +1311,7 @@ * Stop the sent datagram timer. */ static inline void -gsi_sent_dtgrm_timer_stop() +gsi_sent_dtgrm_timer_stop(void) { if (atomic_dec_and_test(&gsi_sent_dtgrm_timer_running)) { printk(KERN_DEBUG "Timer not running\n"); @@ -2388,7 +2388,7 @@ * Print statistics to console */ static void -gsi_proc_show_statistics() +gsi_proc_show_statistics(void) { struct gsi_serv_class_info_st *class_info, *head = (struct gsi_serv_class_info_st *) &gsi_class_list; Index: rmpp/rmpp.c =================================================================== --- rmpp/rmpp.c (revision 560) +++ rmpp/rmpp.c (working copy) @@ -211,7 +211,7 @@ * Get a send handling structure. */ struct rmpp_mad_send_t * -__get_mad_send() +__get_mad_send(void) { struct rmpp_mad_send_t *h_send; @@ -745,7 +745,8 @@ /* * Send the MAD. - * If we failed to send, just free all send requests from the queue. + * If we failed to send, just free all + * send requests from the queue. */ if (status == RMPP_IB_SUCCESS) status = @@ -1562,8 +1563,7 @@ /* * If a send tracking structure was already - * created by sender rmpp_send(), - * return it. + * created by sender rmpp_send(), return it. * If not, create it and return NULL. * Sender will use it later. 
*/ Index: rmpp/rmpp_al.h =================================================================== --- rmpp/rmpp_al.h (revision 560) +++ rmpp/rmpp_al.h (working copy) @@ -212,9 +212,9 @@ RMPP_IB_INVALID_MCAST_HANDLE, RMPP_IB_INVALID_CALLBACK, RMPP_IB_INVALID_AL_HANDLE, /* InfiniBand Access Layer */ - RMPP_IB_INVALID_HANDLE, /* InfiniBand Access Layer */ - RMPP_IB_ERROR, /* InfiniBand Access Layer */ - RMPP_IB_REMOTE_ERROR, /* Infiniband Access Layer */ + RMPP_IB_INVALID_HANDLE, /* InfiniBand Access Layer */ + RMPP_IB_ERROR, /* InfiniBand Access Layer */ + RMPP_IB_REMOTE_ERROR, /* Infiniband Access Layer */ RMPP_IB_VERBS_PROCESSING_DONE, /* See Notes above */ RMPP_IB_INVALID_WR_TYPE, RMPP_IB_QP_IN_TIMEWAIT, @@ -262,15 +262,14 @@ struct rmpp_mad_send_t *h_send; - struct rmpp_ib_mad_t *p_mad_buf; /* The buffer including + struct rmpp_ib_mad_t *p_mad_buf; /* + * Buffer including * MAD header and payload. * May be longer than 265 */ u32 size; - struct ib_rmpp_mad_seg_t mad_seg; /* 256 byte long - * MAD segment - */ + struct ib_rmpp_mad_seg_t mad_seg; /* 256 byte long MAD segment */ u32 remote_qp; @@ -300,11 +299,13 @@ u8 hop_limit; u8 sgid_index; - int dir_switch_needed; /* Sender will need to wait for an ACK + int dir_switch_needed; /* + * Sender will need to wait for an ACK * before start sending. * Set by RMPP service user. 
*/ - u32 payload_offset; /* SA packets may need to + u32 payload_offset; /* + * SA packets may need to * add SA header in every * segment */ From roland at topspin.com Sat Aug 7 16:06:24 2004 From: roland at topspin.com (Roland Dreier) Date: Sat, 07 Aug 2004 16:06:24 -0700 Subject: [openib-general] [PATCH] gen2: Only include CM and DM client in kernel build when needed In-Reply-To: <1091912874.1808.6.camel@localhost.localdomain> (Hal Rosenstock's message of "Sat, 07 Aug 2004 17:07:52 -0400") References: <1091905273.1833.6.camel@localhost.localdomain> <52u0vebv6h.fsf@topspin.com> <1091912874.1808.6.camel@localhost.localdomain> Message-ID: <52hdreblan.fsf@topspin.com> Hal> I'm interested in working this out now because having Hal> everything in one tree will make it easier right ? I'd like Hal> to avoid having 2 mthcas and have things build smoothly. Is Hal> there a better way to do this ? OK (although my impression was that it was important to emphasize that my tree was not the official gen2 tree and that there should be multiple trees for now) In any case, if we want to do this now then I think all of my previous comments should be addressed: make all the CONFIG_XXX options just choose which modules to build via obj-$(CONFIG_XXX) (it should be possible to make this work with no ifdefs in Makefiles), and get rid of the redundant "default y" stuff in Kconfig (I think "select" should be enough). Also I'd like to get opinions on whether this level of configurability should always be exposed or if it should be hidden unless the user selects CONFIG_EMBEDDED. - R. 
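For comparison, the ifdef-free kbuild style Roland asks for can be sketched as follows (an editorial illustration; CONFIG symbol names and the abbreviated object list are taken from the patch earlier in the thread):

```makefile
# Select the module purely through the CONFIG variable: when
# CONFIG_INFINIBAND_CM is unset, obj-$(CONFIG_INFINIBAND_CM) expands to
# the unused "obj-" list and ib_cm.o is never built; kbuild then simply
# ignores the ib_cm-objs assignment, so no ifdef is needed around it.
obj-$(CONFIG_INFINIBAND_CM)        += ib_cm.o
obj-$(CONFIG_INFINIBAND_DM_CLIENT) += ib_dm_client.o

ib_cm-objs := cm_main.o cm_api.o \
	cm_connection_table.o cm_service_table.o cm_proc.o
```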
From halr at voltaire.com Sat Aug 7 16:46:17 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Sat, 07 Aug 2004 19:46:17 -0400 Subject: [openib-general] [PATCH] gen2: Only include CM and DM client in kernel build when needed In-Reply-To: <52hdreblan.fsf@topspin.com> References: <1091905273.1833.6.camel@localhost.localdomain> <52u0vebv6h.fsf@topspin.com> <1091912874.1808.6.camel@localhost.localdomain> <52hdreblan.fsf@topspin.com> Message-ID: <1091922379.1803.30.camel@localhost.localdomain> On Sat, 2004-08-07 at 19:06, Roland Dreier wrote: > Hal> I'm interested in working this out now because having > Hal> everything in one tree will make it easier right ? I'd like > Hal> to avoid having 2 mthcas and have things build smoothly. Is > Hal> there a better way to do this ? > > OK (although my impression was that it was important to emphasize that > my tree was not the official gen2 tree and that there should be > multiple trees for now) It's not the official tree but neither of us wants to maintain mthca in 2 places :-) I would expect the official tree to at least include mthca and core/core* files with some minor modifications, don't you ? > In any case, if we want to do this now then I think all of my previous > comments should be addressed: make all the CONFIG_XXX options just > choose which modules to build via obj-$(CONFIG_XXX) (it should be > possible to make this work with no ifdefs in Makefiles), Fair enough. I just wanted to be sure there was a chance that this would be accepted before investing more time. > and get rid > of the redundant "default y" stuff in Kconfig (I think "select" should > be enough). I thought that might be the case but didn't have time to check it out. > Also I'd like to get opinions on whether this level of configurability > should always be exposed or if it should be hidden unless the user > selects CONFIG_EMBEDDED. Is that because it changes the kernel size in a minor way so is not worth whether it is in or out ? 
If so, in the long run, this might only be for CONFIG_EMBEDDED but it will be useful to build variants off one tree. I suspect you will want to keep your "tree" going for longer than just phase 1 of gen2 as it contains some things that are beyond that phase. I think this makes less work for all of us (not that there isn't more than enough right now) :-) -- Hal From halr at voltaire.com Sat Aug 7 16:49:20 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Sat, 07 Aug 2004 19:49:20 -0400 Subject: [openib-general] [PATCH]: GSI: Make RMPP ready to be a configurable option Message-ID: <1091922562.1803.34.camel@localhost.localdomain> Make RMPP ready to be a configurable option This will be completed when I put in the changes for 2.6 kbuilding For now, the Makefile continues to enable it Index: gsi_main.c =================================================================== --- gsi_main.c (revision 598) +++ gsi_main.c (working copy) @@ -138,7 +138,7 @@ # define GSI_REDIR_LIST_UNLOCK(class_info) spin_unlock_irqrestore(&class_info->redirect_class_port_info_list_lock,\ redirect_class_port_info_list_sflags) -/* ID reserved for server (AGENT in IB MI terms) */ +/* ID reserved for server (Agent in IB MI terms) */ #define GSI_SERVER_ID 0 static struct list_head gsi_class_list; @@ -154,14 +154,16 @@ static int gsi_hca_h_array_num_ports = 0; #endif -/* Current client ID (MANAGER in IB MI terms) */ +/* Current client ID (Manager in IB MI terms) */ static u32 gsi_curr_client_id = GSI_SERVER_ID + 1; static u32 gsi_curr_tid_cnt = 0; static u32 gsi_stat_2put_err = 0; static u32 gsi_stat_put_err = 0; +#ifdef GSI_RMPP_SUPPORT extern int rmpp_spec_compliant; +#endif /* * Our transaction ID structure - @@ -195,6 +197,7 @@ static void gsi_thread_stop(struct gsi_hca_info_st *hca); static void gsi_thread_signal(struct gsi_hca_info_st *hca); +#ifdef GSI_RMPP_SUPPORT static void gsi_rmpp_receive_cb(void *rmpp_h, struct rmpp_ib_mad_element_t *rmpp_mad); static void gsi_rmpp_send_compl_cb(void 
*rmpp_h, @@ -210,6 +213,7 @@ static inline int gsi_is_rmpp_mad(struct gsi_serv_class_info_st *class_info, struct gsi_dtgrm_t *dtgrm); +#endif #ifndef min static inline int @@ -875,6 +879,7 @@ *handle = (void *) newinfo; +#ifdef GSI_RMPP_SUPPORT if (rmpp) { if (gsi_dtgrm_pool_create(GSI_RMPP_RCV_POOL_SIZE, &newinfo->rmpp_rcv_dtgrm_pool) < 0) { @@ -905,6 +910,7 @@ printk(KERN_DEBUG "Registered RMPP service for class (0x%x)\n", class); } +#endif GSI_CLASS_LOCK(); list_add_tail((struct list_head *) newinfo, &gsi_class_list); @@ -915,6 +921,7 @@ return 0; +#ifdef GSI_RMPP_SUPPORT error5: if (rmpp) gsi_dtgrm_pool_destroy(newinfo->rmpp_snd_dtgrm_pool); @@ -923,6 +930,7 @@ gsi_dtgrm_pool_destroy(newinfo->rmpp_rcv_dtgrm_pool); error3: kfree(newinfo); +#endif error2: gsi_hca_close(hca); error1: @@ -950,6 +958,7 @@ v_list_del((struct list_head *) class_info); GSI_CLASS_UNLOCK(); +#ifdef GSI_RMPP_SUPPORT if (class_info->rmpp) { if (rmpp_deregister(class_info->rmpp_h) != RMPP_IB_SUCCESS) { printk(KERN_ERR \ @@ -963,6 +972,7 @@ gsi_dtgrm_pool_destroy(class_info->rmpp_snd_dtgrm_pool); gsi_dtgrm_pool_destroy(class_info->rmpp_rcv_dtgrm_pool); } +#endif gsi_class_return_posted_snd_dtgrms(class_info); gsi_class_clean_redirect_class_port_info_list(class_info); @@ -1180,7 +1190,6 @@ gsi_post_receive_dtgrms(hca); ib_req_notify_cq(hca->cq, IB_CQ_NEXT_COMP); } - } /* @@ -1268,6 +1277,7 @@ /* Increase class use counter */ gsi_use_class_info(class_info); +#ifdef GSI_RMPP_SUPPORT /* If the sent MAD is for RMPP, handle it properly. 
*/ if (gsi_is_rmpp_mad(class_info, (struct gsi_dtgrm_t *) dtgrm_priv)) { printk(KERN_DEBUG "RMPP segment send done\n"); @@ -1275,6 +1285,9 @@ dtgrm_priv->rmpp_context); gsi_dtgrm_pool_put((struct gsi_dtgrm_t *) dtgrm_priv); } else if (class_info->send_compl_cb) { +#else + if (class_info->send_compl_cb) { +#endif class_info->send_compl_cb(class_info, class_info->context, (struct gsi_dtgrm_t *) dtgrm_priv); @@ -1324,7 +1337,7 @@ /* * The function checks send datagram lists for all classes and release request datagrams - * that was not released as a result of response (if the datagram life time is finished). + * that were not released as a result of response (if the datagram life time is finished). */ static void gsi_sent_dtgrm_timer_handler(unsigned long data) @@ -1499,8 +1512,8 @@ } /* -* Keep the received class port info in the redirect info list. -*/ + * Keep the received class port info in the redirect info list. + */ memcpy(&redirect_info->class_port_info, &redir_mad->class_port_info, sizeof (redirect_info->class_port_info)); @@ -1545,6 +1558,7 @@ return; } +#ifdef GSI_RMPP_SUPPORT static void gsi_rmpp_recv(struct gsi_serv_class_info_st *class_info, struct gsi_dtgrm_priv_st *dtgrm_priv) @@ -1587,6 +1601,7 @@ (((struct mad_rmpp_hdr_t *) &dtgrm->mad)-> rmpp_flags & MAD_RMPP_FLAG_ACTIVE)); } +#endif /* * Receive callback @@ -1701,11 +1716,13 @@ goto out1; } +#ifdef GSI_RMPP_SUPPORT /* If the received MAD is for RMPP - handle it properly. 
*/ if (gsi_is_rmpp_mad(class_info, (struct gsi_dtgrm_t *) dtgrm_priv)) { gsi_rmpp_recv(class_info, dtgrm_priv); goto out1; } +#endif if (class_info->receive_cb) { class_info->receive_cb(class_info, class_info->context, @@ -1781,7 +1798,6 @@ dtgrm_priv->rqp = 1; dtgrm_priv->r_q_key = GSI_QP1_WELL_KNOWN_Q_KEY; } - } /* @@ -1832,6 +1848,7 @@ } } +#ifdef GSI_RMPP_SUPPORT if (gsi_is_rmpp_send_dtgrm(class_info, dtgrm)) { printk(KERN_DEBUG "Post RMPP datagram\n"); ret = gsi_post_send_rmpp(class_info, dtgrm); @@ -1839,6 +1856,11 @@ printk(KERN_DEBUG "Post regular datagram\n"); ret = gsi_post_send_mad(class_info, dtgrm); } +#else + printk(KERN_DEBUG "Post regular datagram\n"); + ret = gsi_post_send_mad(class_info, dtgrm); +#endif + error: return ret; } @@ -1947,6 +1969,7 @@ return ret; } +#ifdef GSI_RMPP_SUPPORT static void gsi_conv_rmpp_mad_to_dtgrm(struct gsi_dtgrm_priv_st *dtgrm_priv, struct rmpp_ib_mad_element_t *rmpp_mad) @@ -1974,6 +1997,7 @@ dtgrm_priv->rmpp_dir_switch_needed = rmpp_mad->dir_switch_needed; } +#endif /* * Post send a single PLM reply MAD. @@ -2076,6 +2100,7 @@ return ret; } +#ifdef GSI_RMPP_SUPPORT static void gsi_conv_rcv_dtgm_to_rmpp_mad(struct rmpp_ib_mad_element_t *rmpp_mad, struct gsi_dtgrm_priv_st *dtgrm_priv) @@ -2211,6 +2236,7 @@ err: rmpp_put_mad(rmpp_mad); } +#endif /* * proc file read procedure. @@ -2397,8 +2423,10 @@ GSI_HCA_LIST_LOCK_VAR; printk("********************************************\n"); +#ifdef GSI_RMPP_SUPPORT printk("SA SPEC compliant - %s\n\n", rmpp_spec_compliant ? "TRUE" : "FALSE"); +#endif /* * Print HCA/port information @@ -2589,9 +2617,8 @@ /* * GSI thread. - * If we may call completion callabacks only from - * a thread context, not tasklets, - * GSI thread is used. + * If we may call completion callbacks only from + * a thread context, not tasklets, GSI thread is used. * The thread sleeps on a wait queue waiting for * a signal from gsi_thread_compl_cb(). * gsi_compl_cb() will be called by the thread. 
@@ -2856,7 +2883,9 @@ gsi_cleanup_module(void) { printk(KERN_DEBUG "Bye GSI!\n"); +#ifdef GSI_RMPP_SUPPORT rmpp_cleanup(); +#endif ib_device_notifier_deregister(&gsi_notifier); @@ -3029,8 +3058,10 @@ } memset((*dtgrm)->mad, 0, sizeof ((*dtgrm)->mad)); +#ifdef GSI_RMPP_SUPPORT (*dtgrm)->rmpp_payload = NULL; (*dtgrm)->rmpp_payload_size = 0; +#endif (*dtgrm)->is_marked_for_release = 0; (*dtgrm)->is_in_use = 0; ((struct gsi_dtgrm_priv_st *) (*dtgrm))->posted = 0; @@ -3056,7 +3087,9 @@ dtgrm_priv = ((struct gsi_dtgrm_priv_st *) dtgrm); pool = (struct gsi_dtgrm_pool_info_st *) dtgrm_priv->pool; +#ifdef GSI_RMPP_SUPPORT gsi_dtgrm_free_rmpp_buf(dtgrm); +#endif kmem_cache_free(pool->cache, dtgrm); @@ -3096,6 +3129,7 @@ return cnt; } +#ifdef GSI_RMPP_SUPPORT void * gsi_dtgrm_alloc_rmpp_buf(struct gsi_dtgrm_t *dtgrm, int size) { @@ -3123,6 +3157,7 @@ *size = dtgrm->rmpp_payload_size; return dtgrm->rmpp_payload; } +#endif /* * Exported functions @@ -3138,9 +3173,11 @@ EXPORT_SYMBOL_NOVERS(gsi_dtgrm_pool_get); EXPORT_SYMBOL_NOVERS(gsi_dtgrm_pool_put); EXPORT_SYMBOL_NOVERS(gsi_post_send_dtgrm); +#ifdef GSI_RMPP_SUPPORT EXPORT_SYMBOL_NOVERS(gsi_dtgrm_alloc_rmpp_buf); EXPORT_SYMBOL_NOVERS(gsi_dtgrm_free_rmpp_buf); EXPORT_SYMBOL_NOVERS(gsi_dtgrm_get_rmpp_buf); +#endif MODULE_LICENSE("GPL/BSD"); Index: gsi.h =================================================================== --- gsi.h (revision 596) +++ gsi.h (working copy) @@ -60,7 +60,6 @@ #include "class_port_info.h" #if 0 -#define GSI_RMPP_SUPPORT #define GSI_POOL_TRACE #endif @@ -100,7 +99,7 @@ char mad[MAD_BLOCK_SIZE]; char cache_pad[(32 - ((sizeof (struct mad_t) + IB_GRH_LEN) % 32)) % 32]; -#if 1 /* GSI_RMPP_SUPPORT */ +#ifdef GSI_RMPP_SUPPORT void *rmpp_payload; int rmpp_payload_size; void *rmpp_context; @@ -230,7 +229,7 @@ */ int gsi_post_send_dtgrm(void *handle, struct gsi_dtgrm_t *dtgrm); -#if 1 /* GSI_RMPP_SUPPORT */ +#ifdef GSI_RMPP_SUPPORT /* * Allocate a buffer for RMMP (Relable Multi-Packet Protocol) payload. 
* The buffer does not include general MAD header or RMMP header. Index: Makefile =================================================================== --- Makefile (revision 561) +++ Makefile (working copy) @@ -10,7 +10,7 @@ OBJS += gsi_rmpp_vendal.o rmpp/rmpp_module.o #CFLAGS := -W -O2 -DVD_MODULE_NAME=GSI -DVD_TRACE_LEVEL=6 -DVD_ENTERLEAVE_LEVEL=3 -DMODULE -D__KERNEL__ -DKBUILD_BASENAME=$(MODULE) -DKBUILD_MODNAME=$(MODULE) $(INCDIRS) -CFLAGS := -W -O2 -DVD_MODULE_NAME=GSI -DMODULE -D__KERNEL__ -DKBUILD_BASENAME=$(MODULE) -DKBUILD_MODNAME=$(MODULE) $(INCDIRS) +CFLAGS := -W -O2 -DVD_MODULE_NAME=GSI -DMODULE -D__KERNEL__ -DKBUILD_BASENAME=$(MODULE) -DKBUILD_MODNAME=$(MODULE) -DGSI_RMPP_SUPPORT $(INCDIRS) all: gsi.o From ftillier at infiniconsys.com Sat Aug 7 16:49:15 2004 From: ftillier at infiniconsys.com (Fab Tillier) Date: Sat, 7 Aug 2004 16:49:15 -0700 Subject: [openib-general] [PATCH] added comments to ib_mad.h - minor update In-Reply-To: <20040807085549.26ad502a.mshefty@ichips.intel.com> Message-ID: <000301c47cd9$26954240$655aa8c0@infiniconsys.com> > From: Sean Hefty [mailto:mshefty at ichips.intel.com] > Sent: Saturday, August 07, 2004 8:56 AM > > On Sat, 07 Aug 2004 11:06:16 -0400 > Hal Rosenstock wrote: > > > What is used to indicate send only v. send and (RMPP) response > > expected? > > The timeout_ms field in ib_mad_send_wr indicates if a response is > expected. When sending, if RMPPActive is set, the send will use RMPP. > After the send completes, if timeout_ms is set, then a response is > expected. On the received response, if the sender uses RMPP (set when > calling ib_mad_reg()), the GSI will look in the RMPP header to see if RMPP > is active for the receive. If the GSI does not do the retries for the client, how do you handle the case where the request times out but isn't retried before the response comes in? In that case, the response no longer matches up to anything. 
If it gets dropped, the subsequent retry by the client could produce exactly the same result. Will clients have a way to tell the GSI to keep a request "active" for matching to receives between the time a send times out and the client resends it? - Fab From halr at voltaire.com Sat Aug 7 17:42:25 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Sat, 07 Aug 2004 20:42:25 -0400 Subject: [openib-general] [PATCH] gen2: Only include CM and DM client in kernel build when needed In-Reply-To: <52hdreblan.fsf@topspin.com> References: <1091905273.1833.6.camel@localhost.localdomain> <52u0vebv6h.fsf@topspin.com> <1091912874.1808.6.camel@localhost.localdomain> <52hdreblan.fsf@topspin.com> Message-ID: <1091925821.1808.13.camel@localhost.localdomain> On Sat, 2004-08-07 at 19:06, Roland Dreier wrote: > and get rid > of the redundant "default y" stuff in Kconfig (I think "select" should > be enough). I'm not convinced it's totally redundant. Using only select without default, after selecting an option which causes CM or DM client to be included, and then unselecting the original option, sometimes causes the underlying dependency to stay on. With the default syntax, I don't think that happened. 
-- Hal From roland.list at gmail.com Sun Aug 8 09:20:45 2004 From: roland.list at gmail.com (Roland Dreier) Date: Sun, 8 Aug 2004 09:20:45 -0700 Subject: [openib-general] [PATCH] gen2: Only include CM and DM client in kernel build when needed In-Reply-To: <1091922379.1803.30.camel@localhost.localdomain> References: <1091905273.1833.6.camel@localhost.localdomain> <52u0vebv6h.fsf@topspin.com> <1091912874.1808.6.camel@localhost.localdomain> <52hdreblan.fsf@topspin.com> <1091922379.1803.30.camel@localhost.localdomain> Message-ID: > > Also I'd like to get opinions on whether this level of configurability > > should always be exposed or if it should be hidden unless the user > > selects CONFIG_EMBEDDED. > Is that because it changes the kernel size in a minor way so is not > worth whether it is in or out ? Yes, and also most users don't want to have to decide whether or not to build things like the CM or the DM client. In fact most users probably won't even know enough about IB to be able to make an informed choice. So it might be better to hide the choice unless the user really cares. - R. From roland.list at gmail.com Sun Aug 8 09:23:32 2004 From: roland.list at gmail.com (Roland Dreier) Date: Sun, 8 Aug 2004 09:23:32 -0700 Subject: [openib-general] [PATCH] gen2: Only include CM and DM client in kernel build when needed In-Reply-To: <1091925821.1808.13.camel@localhost.localdomain> References: <1091905273.1833.6.camel@localhost.localdomain> <52u0vebv6h.fsf@topspin.com> <1091912874.1808.6.camel@localhost.localdomain> <52hdreblan.fsf@topspin.com> <1091925821.1808.13.camel@localhost.localdomain> Message-ID: > I'm not convinced it's totally redundant. Using only select without > default, after selecting an option which causes CM or DM client to be > included, and then unselecting the original option, sometimes causes the > underlying dependency to stay on. With the default syntax, I don't think > that happened. With the defaults as well, this does not occur. 
So I > think it is better this way unless it is a bug that there is this > difference for another reason. > I will resubmit the new patch which will have the same Kconfigs and a > new Makefile shortly. Hmm... I don't see anywhere else in the kernel that uses both default and select to express the same dependency. Also it seems it will be really ugly to make this work if you make the CM and DM options tristate (as they should be, since it seems if we're going to this trouble we should support situations like building IPoIB into the kernel and then building the CM and, say, SDP as modules). - R. From roland.list at gmail.com Sun Aug 8 09:25:52 2004 From: roland.list at gmail.com (Roland Dreier) Date: Sun, 8 Aug 2004 09:25:52 -0700 Subject: [openib-general] [PATCH] added comments to ib_mad.h - minor update In-Reply-To: <000301c47cd9$26954240$655aa8c0@infiniconsys.com> References: <000301c47cd9$26954240$655aa8c0@infiniconsys.com> Message-ID: > If the GSI does not do the retries for the client, how do you handle the > case where the request times out but isn't retried before the response comes > in? In that case, the response no longer matches up to anything. If it > gets dropped, the subsequent retry by the client could result in exactly the > same result. Will clients have a way to tell the GSI to keep a request > "active" for matching to receives between the time a send times out and the > client resends it? I think if the response arrives after the request has timed out, the only sensible thing to do is discard it. If the client is going to wait to retry after a timeout, and is willing to accept responses after the timeout, it seems that the client should have just used a longer timeout. - R. 
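Roland's discard-on-timeout policy can be sketched in a few lines of C. This is a hypothetical illustration, not the proposed GSI/ib_mad API: outstanding requests are tracked by transaction ID with a deadline, and a response that arrives after its request's deadline (or that matches no tracked request) is simply dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pending-request table; real code would hash on TID. */
struct pending_req {
    unsigned long long tid;     /* transaction ID of the outstanding send */
    unsigned long deadline;     /* expiry time, in abstract ticks */
    int in_use;
};

#define MAX_PENDING 8
static struct pending_req pending[MAX_PENDING];

/* Record a send; returns 0 on success, -1 if the table is full. */
int track_send(unsigned long long tid, unsigned long now, unsigned long timeout)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!pending[i].in_use) {
            pending[i].tid = tid;
            pending[i].deadline = now + timeout;
            pending[i].in_use = 1;
            return 0;
        }
    }
    return -1;
}

/*
 * Match an incoming response against the table.  The matched entry is
 * consumed either way; a response past its deadline (or with an unknown
 * TID) returns NULL, i.e. it is discarded -- the behavior argued for in
 * this thread.
 */
struct pending_req *match_response(unsigned long long tid, unsigned long now)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (pending[i].in_use && pending[i].tid == tid) {
            pending[i].in_use = 0;          /* consume the entry */
            if (now > pending[i].deadline)
                return NULL;                /* late response: discard */
            return &pending[i];
        }
    }
    return NULL;                            /* unknown TID: discard */
}
```

Under this model the window Fab describes is closed only by picking a sufficiently long timeout; there is no grace period in which a timed-out send can still be matched.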
From halr at voltaire.com Sun Aug 8 09:49:40 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Sun, 08 Aug 2004 12:49:40 -0400 Subject: [openib-general] [PATCH] gen2: Only include CM and DM client in kernel build when needed In-Reply-To: References: <1091905273.1833.6.camel@localhost.localdomain> <52u0vebv6h.fsf@topspin.com> <1091912874.1808.6.camel@localhost.localdomain> <52hdreblan.fsf@topspin.com> <1091925821.1808.13.camel@localhost.localdomain> Message-ID: <1091983782.1852.5.camel@localhost.localdomain> On Sun, 2004-08-08 at 12:23, Roland Dreier wrote: > Hmm... I don't see anywhere else in the kernel that uses both default and > select to express the same dependency. > > Also it seems it will be really ugly to make this work if you make the CM and > DM options tristate (as they should be, since it seems if we're going to this > trouble we should support situations like building IPoIB into the kernel and > then building the CM and, say, SDP as modules). The patch I just sent has both select and depends. I will play with this some more but there appears to be a subtle difference in the effect only when selecting and deselecting the option on which the other option is dependent. It's not a big deal if that doesn't work perfectly. So the bottom line is I can reissue another patch without this or you can fix this. Let me know. -- Hal From roland.list at gmail.com Sun Aug 8 10:48:41 2004 From: roland.list at gmail.com (Roland Dreier) Date: Sun, 8 Aug 2004 10:48:41 -0700 Subject: [openib-general] [PATCH] gen2: Only include CM and DM client in kernel build when needed In-Reply-To: <1091983782.1852.5.camel@localhost.localdomain> References: <1091905273.1833.6.camel@localhost.localdomain> <52u0vebv6h.fsf@topspin.com> <1091912874.1808.6.camel@localhost.localdomain> <52hdreblan.fsf@topspin.com> <1091925821.1808.13.camel@localhost.localdomain> <1091983782.1852.5.camel@localhost.localdomain> Message-ID: > The patch I just sent has both select and depends. 
I will play with this > some more but there appears to be a subtle difference in the effect only > when selecting and deselecting the option on which the other option is > dependent. It's not a big deal if that doesn't work perfectly. So the > bottom line is I can reissue another patch without this or you can fix > this. Let me know. The new patch hasn't shown up yet, so I can't be totally definite. But I would prefer not to have redundancy in the Kconfig file for several reasons: - maintain consistency with existing kernel practice - avoid maintenance problems caused by updating only half the info - although now you're saying "depends" & "select", the original patch used "default" & "select" -- in any case I would like to be able to have CONFIG_INFINIBAND=y plus CONFIG_INFINIBAND_CM=m work. - R. From mst at mellanox.co.il Sun Aug 8 11:21:12 2004 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 8 Aug 2004 21:21:12 +0300 Subject: [openib-general] [PATCH] OpenSM - Clear IsSM bit when shutting down In-Reply-To: <003b01c47bf9$90029ce0$6b01a8c0@maverick> References: <003b01c47bf9$90029ce0$6b01a8c0@maverick> Message-ID: <20040808182112.GA11499@mellanox.co.il> But, userspace clearly cannot guarantee an operation on shutdown (e.g. the app can be killed with -9). Doesn't this mean we need help from a kernel component to clear IsSM in a robust fashion? MST Quoting r. Jan Daley (jdaley at systemfabricworks.com) "[openib-general] [PATCH] OpenSM - Clear IsSM bit when shutting down": > Hi, > > This patch is to clear the PortInfo:CapabilityMask:IsSM bit on shutdown. > A SM that is brought up on a different node later on will do repeated > SubnGet(SMInfo) that will just timeout. 
> > > > Index: opensm/osm_vendor_mlx.c > =================================================================== > --- opensm/osm_vendor_mlx.c (revision 590) > +++ opensm/osm_vendor_mlx.c (working copy) > @@ -674,6 +674,43 @@ > } > > /* > + * NAME __osm_vendor_clear_sm > + * > + * DESCRIPTION Modifies the port info for the bound port to clear > the "IS_SM" bit. > + */ > +static void > +__osm_vendor_clear_sm( IN osm_bind_handle_t h_bind ) > +{ > + osmv_bind_obj_t *p_bo = ( osmv_bind_obj_t * ) h_bind; > + osm_vendor_t const *p_vend = p_bo->p_vendor; > + VAPI_ret_t status; > + VAPI_hca_attr_t attr_mod; > + VAPI_hca_attr_mask_t attr_mask; > + > + OSM_LOG_ENTER( p_vend->p_log, osm_vendor_set_sm ); > + > + cl_memclr( &attr_mod, sizeof( attr_mod ) ); > + cl_memclr( &attr_mask, sizeof( attr_mask ) ); > + > + attr_mod.is_sm = FALSE; > + attr_mask = HCA_ATTR_IS_SM; > + > + status = > + VAPI_modify_hca_attr( p_bo->hca_hndl, p_bo->port_num, &attr_mod, > + &attr_mask ); > + if ( status != VAPI_OK ) > + { > + osm_log( p_vend->p_log, OSM_LOG_ERROR, > + "osm_vendor_set_sm: ERR 5012: " > + "Unable to clear 'IS_SM' bit in port attributes (%d).\n", > + status ); > + } > + > + OSM_LOG_EXIT( p_vend->p_log ); > +} > + > + > +/* > * NAME __osm_vendor_internal_unbind > * > * DESCRIPTION Destroying a bind: > @@ -689,6 +726,8 @@ > > OSM_LOG_ENTER(p_log,__osm_vendor_internal_unbind); > > + __osm_vendor_clear_sm(h_bind); > + > /* "notifying" all that from now on no new sends can be done */ > osmv_txn_lock(p_bo); > p_bo->is_closing = TRUE; > > > > Jan Daley > System Fabric Works > (512) 343-6101 x 13 > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From ftillier at infiniconsys.com Sun Aug 8 13:02:13 2004 From: ftillier at infiniconsys.com (Fab Tillier) Date: Sun, 8 Aug 2004 13:02:13 -0700 
Subject: [openib-general] [PATCH] added comments to ib_mad.h - minor update In-Reply-To: Message-ID: <000401c47d82$990e05b0$655aa8c0@infiniconsys.com> > From: Roland Dreier [mailto:roland.list at gmail.com] > Sent: Sunday, August 08, 2004 9:26 AM > > > If the GSI does not do the retries for the client, how do you handle the > > case where the request times out but isn't retried before the response > > comes > > in? In that case, the response no longer matches up to anything. If it > > gets dropped, the subsequent retry by the client could result in exactly > > the > > same result. Will clients have a way to tell the GSI to keep a request > > "active" for matching to receives between the time a send times out and > > the client resends it? > > I think if the response arrives after the request has timed out, the only > sensible thing to do is discard it. If the client is going to wait to > retry after > a timeout, and is willing to accept responses after the timeout, it seems > that the client should have just used a longer timeout. It's not that the client is going to wait, but rather that there is a period of time between when the GSI times out a send and when a client receives notification of such and can retry the send. I'm concerned with any receive coming in within that window. It seems that the only client usage model is to have no retries and a really long timeout. Either that or the GSI has to provide some synchronization to keep the send valid until the send callback completes, but at the same time support the same MAD buffer to be resent by the client from the callback. I'm assuming that a send request by the client will get enqueued by the GSI so that it may be matched against received responses. The send needs to stay in that queue until the client gives up on retries. I'm just curious as to how the current API is going to support this and what effect supporting this has on API usage - i.e. can you retry a send from the send completion callback, etc. 
- Fab From eitan at mellanox.co.il Sun Aug 8 22:51:31 2004 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 9 Aug 2004 08:51:31 +0300 Subject: [openib-general] [PATCH] OpenSM - Clear IsSM bit when shutting down Message-ID: <506C3D7B14CDD411A52C00025558DED6047EE6C9@mtlex01.yok.mtl.com> Michael S. Tsirkin [mailto:mst at mellanox.co.il] wrote: "But, userspace clearly can not guarantee an operation on shutdown (e.g. the app can be killed with -9). Does not this mean we need a kernel component help, to make cleaning IsSM in a robust fashion?" [EZ] If SM dies and the SM Bit is kept as 1 (set) - the impact on the subnet is minimal. The first SM to be run should query SM-Info on the port that has the SM-Bit set. Since there is no SM there, it will never get an answer - i.e., the SM is dead. So I do not see this as a high priority issue. -------------- next part -------------- An HTML attachment was scrubbed... URL: From halr at voltaire.com Mon Aug 9 04:15:17 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 09 Aug 2004 07:15:17 -0400 Subject: [openib-general] [PATCH] Only include CM and DM client in kernel build when needed (Take 3) Message-ID: <1092050119.1818.15.camel@localhost.localdomain> Take 3: On Roland's gen2 branch, the following patch causes CM and DM client to be included in the kernel build only when needed. One example of this is when only building for IPoIB. Index: infiniband/ulp/Kconfig =================================================================== --- infiniband/ulp/Kconfig (revision 595) +++ infiniband/ulp/Kconfig (working copy) @@ -2,7 +2,7 @@ tristate "IP-over-InfiniBand" depends on INFINIBAND && NETDEVICES && INET ---help--- - Support for the IP-over-InfiniBand protocol (IPoIB). This + Support for the IETF IP-over-InfiniBand protocol (IPoIB). This transports IP packets over InfiniBand so you can use your IB device as a fancy NIC. 
To configure interfaces on multiple InfiniBand partitions (P_Keys) you will need the ipoibcfg @@ -14,6 +14,7 @@ config INFINIBAND_SDP tristate "Sockets Direct Protocol" depends on INFINIBAND && INFINIBAND_IPOIB + select INFINIBAND_CM ---help--- Support for Sockets Direct Protocol (SDP). This provides sockets semantics over InfiniBand via address family @@ -24,7 +25,9 @@ config INFINIBAND_SRP tristate "SCSI RDMA Protocol" depends on INFINIBAND && SCSI - ---help--- + select INFINIBAND_CM + select INFINIBAND_DM_CLIENT + ---help--- Support for SCSI RDMA Protocol (SRP). This transports SCSI commands over InfiniBand and allows you to access storage connected via IB. @@ -33,6 +36,7 @@ tristate "uDAPL helper" depends on (BROKEN || m) && INFINIBAND && INFINIBAND_MELLANOX_HCA select INFINIBAND_USER_CM + select INFINIBAND_CM ---help--- Kernel space helper for uDAPL. Select this and use the uDAPL library from to run uDAPL applications. Index: infiniband/Kconfig =================================================================== --- infiniband/Kconfig (revision 595) +++ infiniband/Kconfig (working copy) @@ -12,11 +12,25 @@ tristate "Userspace CM" depends on INFINIBAND && INFINIBAND_MELLANOX_HCA depends on BROKEN || m + select INFINIBAND_CM ---help--- Userspace access to the kernel communication manager. This allows userspace protocols such as uDAPL or MPI to use the kernel's services for establishing connections. +config INFINIBAND_CM + tristate "CM" + depends on INFINIBAND + ---help--- + Kernel communication manager. Needed for connection oriented protocols + such as SRP, SDP, DAPL, or MPI. + +config INFINIBAND_DM_CLIENT + tristate "DM client" + depends on INFINIBAND + ---help--- + Device Manager (DM) client. Needed for SRP. 
+ source "drivers/infiniband/ulp/Kconfig" source "drivers/infiniband/hw/Kconfig" Index: infiniband/core/Makefile =================================================================== --- infiniband/core/Makefile (revision 595) +++ infiniband/core/Makefile (working copy) @@ -10,12 +10,16 @@ obj-$(CONFIG_INFINIBAND) += \ ib_core.o \ ib_mad.o \ - ib_cm.o \ ib_client_query.o \ ib_sa_client.o \ - ib_dm_client.o \ ib_useraccess.o +obj-$(CONFIG_INFINIBAND_CM) += \ + ib_cm.o + +obj-$(CONFIG_INFINIBAND_DM_CLIENT) += \ + ib_dm_client.o + obj-$(CONFIG_INFINIBAND_USER_CM) += \ ib_useraccess_cm.o @@ -104,6 +108,7 @@ $(obj)/pm_access.h $(obj)/pm_types.h $(obj)/mad_static.o: $(obj)/smp_access.h $(obj)/smp_types.h + $(ib_cm-objs:%=$(obj)/%): $(obj)/cm_packet.h quiet_cmd_gendecode = GEN $@ From yael at mellanox.co.il Mon Aug 9 05:33:47 2004 From: yael at mellanox.co.il (Yael Kalka) Date: Mon, 9 Aug 2004 15:33:47 +0300 Subject: [openib-general] [PATCH] Osmtest - Byte Swapping issue and resource cleanup Message-ID: <506C3D7B14CDD411A52C00025558DED60585C2FF@mtlex01.yok.mtl.com> Committed to OpenSM - gen1. Thanks, Yael -----Original Message----- From: Jan Daley [mailto:jdaley at systemfabricworks.com] Sent: Saturday, August 07, 2004 12:10 AM To: openib-general at openib.org Subject: [openib-general] [PATCH] Osmtest - Byte Swapping issue and resource cleanup Hi, 1) In osmtest_stress_port_recs_small, the lid passed into the call to osmtest_get_port_rec wasn't being byte swapped. This caused all portinfo records to be retrieved instead of just one. 2) I added a couple of calls to cleanup qpools that were created. 
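The byte-order bug in item (1) comes down to host vs. network order: LIDs travel in MADs in big-endian (network) order, so a host-order LID must pass through cl_hton16() (htons() in plain C) before being placed in a query. A minimal standalone illustration with a hypothetical helper, not the actual osmtest code:

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>

/*
 * Build the LID field of a (hypothetical) SA query from a host-order
 * LID.  Forgetting the swap sends e.g. 0x0300 instead of 0x0003 on a
 * little-endian host, so the query matches the wrong record -- or,
 * depending on how the component mask is handled, far more records
 * than intended.
 */
uint16_t query_lid_field(uint16_t lid_ho)
{
    return htons(lid_ho);   /* equivalent to complib's cl_hton16() */
}
```

Round-tripping through ntohs() recovers the host-order value, which is a handy sanity check when debugging this class of bug.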
Index: osmtest/osmtest.c =================================================================== --- osmtest/osmtest.c (revision 590) +++ osmtest/osmtest.c (working copy) @@ -479,6 +479,8 @@ osm_vendor_delete( &p_osmt->p_vendor ); } + cl_qpool_destroy(&p_osmt->port_pool); + cl_qpool_destroy(&p_osmt->node_pool); osm_log_destroy( &p_osmt->log ); } @@ -1308,7 +1310,7 @@ /* * Do a blocking query for our own PortRecord in the subnet. */ - status = osmtest_get_port_rec( p_osmt, p_osmt->local_port.lid, &context ); + status = osmtest_get_port_rec( p_osmt, cl_hton16(p_osmt->local_port.lid), &context ); if( status != IB_SUCCESS ) { Jan Daley System Fabric Works (512) 343-6101 x 13 _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -------------- next part -------------- An HTML attachment was scrubbed... URL: From yael at mellanox.co.il Mon Aug 9 05:34:07 2004 From: yael at mellanox.co.il (Yael Kalka) Date: Mon, 9 Aug 2004 15:34:07 +0300 Subject: [openib-general] [PATCH] OpenSM Assigning Duplicate LIDs Message-ID: <506C3D7B14CDD411A52C00025558DED60585C300@mtlex01.yok.mtl.com> Committed to OpenSM - gen1. Thanks, Yael -----Original Message----- From: Jan Daley [mailto:jdaley at systemfabricworks.com] Sent: Saturday, August 07, 2004 12:08 AM To: openib-general at openib.org Subject: [openib-general] [PATCH] OpenSM Assigning Duplicate LIDs Hi, I ran across a scenario in which the SM would assign the same lid to multiple nodes. The problem occurs when two nodes that have the same lid are placed into a subnet that doesn't have an SM running. When OpenSM is started, it preserves the existing lids without checking for duplicates. 
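The fix below boils down to: before preserving an existing LID range, probe the LID-to-port table and fall back to the unassigned path on a collision. A simplified standalone sketch (a plain array in place of cl_ptr_vector_t; note the actual patch probes only the first LID of the range, so checking every LID as done here is a slight generalization):

```c
#include <assert.h>
#include <stddef.h>

#define LID_TBL_SIZE 64
/* index = LID, value = owning port, NULL = free */
static void *lid_tbl[LID_TBL_SIZE];

/*
 * Try to preserve an existing LID range for a port.  Claims the range
 * and returns 0 if every slot is free; returns -1 (caller should treat
 * the port as unassigned and pick a fresh range) if any LID is taken.
 */
int preserve_lid_range(void *port, unsigned min_lid, unsigned max_lid)
{
    for (unsigned lid = min_lid; lid <= max_lid; lid++)
        if (lid_tbl[lid] != NULL)
            return -1;          /* duplicate LID: reassign instead */
    for (unsigned lid = min_lid; lid <= max_lid; lid++)
        lid_tbl[lid] = port;
    return 0;
}
```

This is exactly the scenario described above: two nodes enter the subnet already holding the same LID, and without the probe both ranges would be preserved verbatim.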
Index: opensm/osm_lid_mgr.c =================================================================== --- opensm/osm_lid_mgr.c (revision 590) +++ opensm/osm_lid_mgr.c (working copy) @@ -878,6 +878,7 @@ osm_lid_mgr_t* const p_mgr = (osm_lid_mgr_t*)context; osm_physp_t* p_physp; cl_ptr_vector_t* p_tbl; + osm_port_t* temp_port; OSM_LOG_ENTER( p_mgr->p_log, __osm_lid_mgr_process_foreign ); @@ -912,19 +913,33 @@ min_lid_ho, max_lid_ho ); } - /* - Place this port into the port ptr vector. - And update the PortInfo attribute template. - */ - for( lid_ho = min_lid_ho; lid_ho <= max_lid_ho; lid_ho++ ) - cl_ptr_vector_set( p_tbl, lid_ho, p_port ); + if (CL_SUCCESS == cl_ptr_vector_at(p_tbl, min_lid_ho, (void*)&temp_port) + && NULL != temp_port) + { + /* + If something is already there, we need to find a new + lid range for this port. Process it like it is unassigned. + */ + __osm_lid_mgr_process_unassigned(p_object, context); + } + else + { + /* + Place this port into the port ptr vector. + And update the PortInfo attribute template. + */ + for( lid_ho = min_lid_ho; lid_ho <= max_lid_ho; lid_ho++ ) + { + cl_ptr_vector_set( p_tbl, lid_ho, p_port ); + } - /* - Set the PortInfo for the Physical Port associated - with this Port. - */ - p_physp = osm_port_get_default_phys_ptr( p_port ); - __osm_lid_mgr_set_physp_pi( p_mgr, p_physp, cl_hton16( min_lid_ho ) ); + /* + Set the PortInfo for the Physical Port associated + with this Port. + */ + p_physp = osm_port_get_default_phys_ptr( p_port ); + __osm_lid_mgr_set_physp_pi( p_mgr, p_physp, cl_hton16( min_lid_ho ) ); + } OSM_LOG_EXIT( p_mgr->p_log ); } Jan Daley System Fabric Works (512) 343-6101 x 13 _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yael at mellanox.co.il Mon Aug 9 05:34:24 2004 From: yael at mellanox.co.il (Yael Kalka) Date: Mon, 9 Aug 2004 15:34:24 +0300 Subject: [openib-general] [PATCH] OpenSM - Multiple Initializations of objects on startup Message-ID: <506C3D7B14CDD411A52C00025558DED60585C301@mtlex01.yok.mtl.com> Committed to OpenSM - gen1. Thanks, Yael -----Original Message----- From: Jan Daley [mailto:jdaley at systemfabricworks.com] Sent: Saturday, August 07, 2004 12:09 AM To: openib-general at openib.org Subject: [openib-general] [PATCH] OpenSM - Multiple Initializations of objects on startup Hi, osm_subn_construct and cl_map_init(&(p_subn->opt.port_pro_ignore_guids), 10) are being called twice. This causes some memory leaks. Index: opensm/osm_subnet.c =================================================================== --- opensm/osm_subnet.c (revision 590) +++ opensm/osm_subnet.c (working copy) @@ -112,7 +112,6 @@ cl_qmap_init( &p_subn->rtr_guid_tbl ); cl_qmap_init( &p_subn->prtn_pkey_tbl ); cl_qmap_init( &p_subn->mgrp_mlid_tbl ); - cl_map_init(&(p_subn->opt.port_pro_ignore_guids), 10); cl_list_construct( &p_subn->new_ports_list ); cl_list_init( &p_subn->new_ports_list, 10 ); } @@ -200,8 +199,6 @@ { cl_status_t status; - osm_subn_construct( p_subn ); - status = cl_ptr_vector_init( &p_subn->node_lid_tbl, OSM_SUBNET_VECTOR_MIN_SIZE, OSM_SUBNET_VECTOR_GROW_SIZE ); Jan Daley System Fabric Works (512) 343-6101 x 13 _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yael at mellanox.co.il Mon Aug 9 05:34:33 2004 From: yael at mellanox.co.il (Yael Kalka) Date: Mon, 9 Aug 2004 15:34:33 +0300 Subject: [openib-general] [PATCH] OpenSM - Initializing a spinlock twice Message-ID: <506C3D7B14CDD411A52C00025558DED60585C302@mtlex01.yok.mtl.com> Committed to OpenSM - gen1. Thanks, Yael -----Original Message----- From: Jan Daley [mailto:jdaley at systemfabricworks.com] Sent: Saturday, August 07, 2004 12:09 AM To: openib-general at openib.org Subject: [openib-general] [PATCH] OpenSM - Initializing a spinlock twice Removing the second initialization of the spinlock. Index: opensm/cl_event_wheel.c =================================================================== --- opensm/cl_event_wheel.c (revision 590) +++ opensm/cl_event_wheel.c (working copy) @@ -249,7 +249,6 @@ CL_ASSERT( cl_spinlock_init( &(p_event_wheel->lock) ) == CL_SUCCESS ); cl_qlist_init( &p_event_wheel->events_wheel); cl_qmap_init( &p_event_wheel->events_map ); - cl_spinlock_init( &p_event_wheel->lock ); /* init the timer with timeout */ cl_status = cl_timer_init(&p_event_wheel->timer, Jan Daley System Fabric Works (512) 343-6101 x 13 _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -------------- next part -------------- An HTML attachment was scrubbed... URL: From yael at mellanox.co.il Mon Aug 9 05:34:43 2004 From: yael at mellanox.co.il (Yael Kalka) Date: Mon, 9 Aug 2004 15:34:43 +0300 Subject: [openib-general] [PATCH] OpenSM - Clear IsSM bit when shutting down Message-ID: <506C3D7B14CDD411A52C00025558DED60585C303@mtlex01.yok.mtl.com> Committed to OpenSM - gen1. 
Thanks, Yael -----Original Message----- From: Jan Daley [mailto:jdaley at systemfabricworks.com] Sent: Saturday, August 07, 2004 12:09 AM To: openib-general at openib.org Subject: [openib-general] [PATCH] OpenSM - Clear IsSM bit when shutting down Hi, This patch is to clear the PortInfo:CapabilityMask:IsSM bit on shutdown. A SM that is brought up on a different node later on will do repeated SubnGet(SMInfo) that will just timeout. Index: opensm/osm_vendor_mlx.c =================================================================== --- opensm/osm_vendor_mlx.c (revision 590) +++ opensm/osm_vendor_mlx.c (working copy) @@ -674,6 +674,43 @@ } /* + * NAME __osm_vendor_clear_sm + * + * DESCRIPTION Modifies the port info for the bound port to clear the "IS_SM" bit. + */ +static void +__osm_vendor_clear_sm( IN osm_bind_handle_t h_bind ) +{ + osmv_bind_obj_t *p_bo = ( osmv_bind_obj_t * ) h_bind; + osm_vendor_t const *p_vend = p_bo->p_vendor; + VAPI_ret_t status; + VAPI_hca_attr_t attr_mod; + VAPI_hca_attr_mask_t attr_mask; + + OSM_LOG_ENTER( p_vend->p_log, osm_vendor_set_sm ); + + cl_memclr( &attr_mod, sizeof( attr_mod ) ); + cl_memclr( &attr_mask, sizeof( attr_mask ) ); + + attr_mod.is_sm = FALSE; + attr_mask = HCA_ATTR_IS_SM; + + status = + VAPI_modify_hca_attr( p_bo->hca_hndl, p_bo->port_num, &attr_mod, + &attr_mask ); + if ( status != VAPI_OK ) + { + osm_log( p_vend->p_log, OSM_LOG_ERROR, + "osm_vendor_set_sm: ERR 5012: " + "Unable to clear 'IS_SM' bit in port attributes (%d).\n", + status ); + } + + OSM_LOG_EXIT( p_vend->p_log ); +} + + +/* * NAME __osm_vendor_internal_unbind * * DESCRIPTION Destroying a bind: @@ -689,6 +726,8 @@ OSM_LOG_ENTER(p_log,__osm_vendor_internal_unbind); + __osm_vendor_clear_sm(h_bind); + /* "notifying" all that from now on no new sends can be done */ osmv_txn_lock(p_bo); p_bo->is_closing = TRUE; Jan Daley System Fabric Works (512) 343-6101 x 13 _______________________________________________ openib-general mailing list openib-general at 
openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -------------- next part -------------- An HTML attachment was scrubbed... URL: From halr at voltaire.com Mon Aug 9 06:03:31 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 09 Aug 2004 09:03:31 -0400 Subject: [openib-general] OpenSM Patches for gen2 Message-ID: <1092056613.1819.1.camel@localhost.localdomain> Hi, What is the plan for the gen2 OpenSM in terms of tracking changes to the gen1 OpenSM ? Thanks. -- Hal From sean.hefty at intel.com Mon Aug 9 09:30:24 2004 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 9 Aug 2004 09:30:24 -0700 Subject: [openib-general] [PATCH] added comments to ib_mad.h - minor update In-Reply-To: <000401c47d82$990e05b0$655aa8c0@infiniconsys.com> Message-ID: >It's not that the client is going to wait, but rather that there is a >period >of time between when the GSI times out a send and when a client receives >notification of such and can retry the send. I'm concerned with any >receive >coming in within that window. It seems that the only client usage model is >to have no retries and a really long timeout. Either that or the GSI has >to >provide some synchronization to keep the send valid until the send callback >completes, but at the same time support the same MAD buffer to be resent by >the client from the callback. I'm not sure that there's a real window here, and I agree with Roland. The client should have used a longer timeout. The response in this case is just discarded. It didn't arrive within the specified timeout window -- for the request that it was trying to match. If the request is going to be resent anyway, the client should resend the response, and hopefully, the timeout window has been adjusted. >I'm assuming that a send request by the client will get enqueued by the GSI >so that it may be matched against received responses. 
The send needs to >stay in that queue until the client gives up on retries. I'm just curious >as to how the current API is going to support this and what effect >supporting this has on API usage I think the difference in the models that you're talking about is whether a timeout is: time to wait for a response from a specific request, time from the first request multiplied by the number of retries to receive a response. I think that either can work. >i.e. can you retry a send from the send completion callback, etc. I think allowing retries to be issued from the send completion callback would be a requirement for the current API. From halr at voltaire.com Mon Aug 9 09:38:38 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 09 Aug 2004 12:38:38 -0400 Subject: [openib-general] [PATCH]: GSI: Eliminate spinlock wrappers Message-ID: <1092069520.1810.47.camel@localhost.localdomain> This patch eliminates the use of the spinlock wrappers in RMPP. Hence, it also eliminates the need for rmpp_lock.[ch] Index: rmpp/rmpp.c =================================================================== --- rmpp/rmpp.c (revision 598) +++ rmpp/rmpp.c (working copy) @@ -484,7 +484,7 @@ "h_send %p INIT, p_mad_element %p, p_mad_buf %p\n", h_send, p_mad_element, p_mad_element->p_mad_buf); - /* version 1 support only currently. */ + /* version 1 support only. 
*/ if (p_mad_element->rmpp_version != DEFAULT_RMPP_VERSION) { RMPP_LOG(RMPP_LOG_ERROR, "ERR: p_mad_element %p WRONG RMPP VERSION %d\n", @@ -492,12 +492,7 @@ return RMPP_IB_INVALID_SETTING; } - rmpp_spinlock_construct(&h_send->lock); - if (rmpp_spinlock_init(&h_send->lock) != RMPP_SUCCESS) { - RMPP_LOG(RMPP_LOG_ERROR, "ERR: rmpp_spinlock_init error\n"); - return RMPP_IB_ERROR; - } - + spin_lock_init(&h_send->lock); INIT_LIST_HEAD(&h_send->req_list); h_send->busy = FALSE; @@ -554,12 +549,7 @@ } #endif - rmpp_spinlock_construct(&h_send->lock); - if (rmpp_spinlock_init(&h_send->lock) != RMPP_SUCCESS) { - RMPP_LOG(RMPP_LOG_ERROR, "ERR: rmpp_spinlock_init error\n"); - return RMPP_IB_ERROR; - } - + spin_lock_init(&h_send->lock); INIT_LIST_HEAD(&h_send->req_list); h_send->busy = FALSE; @@ -613,7 +603,7 @@ goto exit; } - rmpp_spinlock_acquire(&h_send->lock); + spin_lock_bh(&h_send->lock); /* Reset information to track the send. */ h_send->retry_time = MAX_TIME; @@ -705,7 +695,7 @@ RMPP_LOG(RMPP_LOG_DEBUG, "h_send %p DEBUG - drop a packet\n", h_send); - rmpp_spinlock_release(&h_send->lock); + spin_unlock_bh(&h_send->lock); __set_retry_time(h_send); rmpp_timer_trim(&((struct rmpp_info_t *) (rmpp_h))->send_timer, h_send->p_send_mad->timeout_ms); @@ -728,7 +718,7 @@ */ if (h_send->busy) { RMPP_LOG(RMPP_LOG_DEBUG, "h_send %p BUSY\n", h_send); - rmpp_spinlock_release(&h_send->lock); + spin_unlock_bh(&h_send->lock); goto exit; } else h_send->busy = TRUE; @@ -737,7 +727,7 @@ send_req = (struct rmpp_send_req_t *) h_send->req_list.next; list_del((struct list_head *) send_req); - rmpp_spinlock_release(&h_send->lock); + spin_unlock_bh(&h_send->lock); RMPP_LOG(RMPP_LOG_DEBUG, "h_send %p SEND, send_req %p, p_send_mad %p\n", @@ -758,14 +748,14 @@ rmpp_free(send_req); - rmpp_spinlock_acquire(&h_send->lock); + spin_lock_bh(&h_send->lock); } if (h_send->ack_seg == h_send->total_seg) send_done = TRUE; h_send->busy = FALSE; - rmpp_spinlock_release(&h_send->lock); + 
spin_unlock_bh(&h_send->lock); if (send_done) __put_mad_send(h_send); @@ -885,10 +875,10 @@ p_mad_element, (p_rmpp_mad->common_hdr.trans_id)); /* - * Search for the send. The send may have timed out, + * Search for the send. The send may have timed out, * been canceled, or received a response. */ - rmpp_spinlock_acquire(&rmpp_info->lock); + spin_lock_bh(&rmpp_info->lock); /* * Check if direction switch is to be performed @@ -917,7 +907,7 @@ "h_send %p handle regular ACK, TID 0x%LX\n", h_send, p_mad_element->p_mad_buf->trans_id); } - rmpp_spinlock_release(&rmpp_info->lock); + spin_unlock_bh(&rmpp_info->lock); if (!h_send) { RMPP_LOG(RMPP_LOG_DEBUG, @@ -926,7 +916,7 @@ return; } - rmpp_spinlock_acquire(&h_send->lock); + spin_lock_bh(&h_send->lock); /* Drop old ACKs. */ if (ntohl(p_rmpp_mad->seg_num) < h_send->ack_seg) { @@ -969,12 +959,12 @@ h_send, h_send->ack_seg, h_send->total_seg); if (h_send->ack_seg == h_send->total_seg) { - rmpp_spinlock_release(&h_send->lock); + spin_unlock_bh(&h_send->lock); /* The send is done. All segments have been ack'ed. */ send_done = TRUE; } else if (h_send->ack_seg < h_send->seg_limit) { /* Send the next segment. 
*/ - rmpp_spinlock_release(&h_send->lock); + spin_unlock_bh(&h_send->lock); status = __send_rmpp_seg(rmpp_h, h_send); if (status != RMPP_IB_SUCCESS) { RMPP_LOG(RMPP_LOG_ERROR, @@ -985,19 +975,19 @@ wc_status = RMPP_IB_WCS_TIMEOUT_RETRY_ERR; } } else { - rmpp_spinlock_release(&h_send->lock); + spin_unlock_bh(&h_send->lock); RMPP_LOG(RMPP_LOG_DEBUG, "h_send %p INVALID ACK seg %d >= seg_limit %d\n", h_send, h_send->ack_seg, h_send->seg_limit); } if (send_done) { - rmpp_spinlock_acquire(&rmpp_info->lock); + spin_lock_bh(&rmpp_info->lock); RMPP_LOG(RMPP_LOG_DEBUG, "h_send %p SEND DONE, delete from list\n", h_send); list_del((struct list_head *) h_send); - rmpp_spinlock_release(&rmpp_info->lock); + spin_unlock_bh(&rmpp_info->lock); /* * Check if we finished sending a request MAD @@ -1029,7 +1019,7 @@ return; exit: - rmpp_spinlock_release(&h_send->lock); + spin_unlock_bh(&h_send->lock); } /* @@ -1046,29 +1036,28 @@ (struct rmpp_ib_rmpp_mad_t *) rmpp_ib_get_mad_buf(p_mad_element); /* Search for the send. The send may have timed out or been canceled. */ - rmpp_spinlock_acquire(&rmpp_info->lock); + spin_lock_bh(&rmpp_info->lock); h_send = __mad_send_match(rmpp_h, p_mad_element); + spin_unlock_bh(&rmpp_info->lock); if (!h_send) { - rmpp_spinlock_release(&rmpp_info->lock); return; } - rmpp_spinlock_release(&rmpp_info->lock); - rmpp_spinlock_acquire(&h_send->lock); + spin_lock_bh(&h_send->lock); /* If the send is active, we will finish processing it once it completes. */ if (h_send->retry_time == MAX_TIME) { h_send->canceled = TRUE; - rmpp_spinlock_release(&h_send->lock); + spin_unlock_bh(&h_send->lock); return; } - rmpp_spinlock_release(&h_send->lock); + spin_unlock_bh(&h_send->lock); /* Fail the send operation. 
*/ - rmpp_spinlock_acquire(&rmpp_info->lock); + spin_lock_bh(&rmpp_info->lock); RMPP_LOG(RMPP_LOG_DEBUG, "h_send %p DELETE from list\n", h_send); list_del((struct list_head *) h_send); - rmpp_spinlock_release(&rmpp_info->lock); + spin_unlock_bh(&rmpp_info->lock); if (rmpp_info->send_compl_cb) rmpp_info->send_compl_cb(rmpp_h, h_send->p_send_mad, @@ -1101,7 +1090,7 @@ INIT_LIST_HEAD(&timeout_list); cur_time = rmpp_get_time_stamp(); - rmpp_spinlock_acquire(&rmpp_info->lock); + spin_lock_bh(&rmpp_info->lock); /* Check all outstanding sends. */ list_for_each(h_send, (struct rmpp_mad_send_t *) &rmpp_info->send_list) { @@ -1110,7 +1099,7 @@ status = RMPP_IB_SUCCESS; - rmpp_spinlock_acquire(&h_send_current->lock); + spin_lock_bh(&h_send_current->lock); if (h_send->p_send_mad == NULL) { RMPP_LOG(RMPP_LOG_DEBUG, @@ -1167,7 +1156,7 @@ /* Resend all unacknowledged segments. */ h_send->cur_seg = h_send->ack_seg + 1; - rmpp_spinlock_release(&h_send_current->lock); + spin_unlock_bh(&h_send_current->lock); status = __send_rmpp_seg(rmpp_info, h_send); if (status != RMPP_IB_SUCCESS) { RMPP_LOG(RMPP_LOG_ERROR, @@ -1205,10 +1194,10 @@ h_send = h_send_tmp; cont: - rmpp_spinlock_release(&h_send_current->lock); + spin_unlock_bh(&h_send_current->lock); } - rmpp_spinlock_release(&rmpp_info->lock); + spin_unlock_bh(&rmpp_info->lock); /* Report all timed out sends to the user. */ @@ -1241,7 +1230,7 @@ rmpp_info = (struct rmpp_info_t *) context; - rmpp_spinlock_acquire(&rmpp_info->lock); + spin_lock_bh(&rmpp_info->lock); /* Check all outstanding receives. */ list_for_each(p_recv, (struct rmpp_mad_recv_t *) &rmpp_info->recv_list) { @@ -1273,7 +1262,7 @@ } restart_timer = !list_empty(&rmpp_info->recv_list); - rmpp_spinlock_release(&rmpp_info->lock); + spin_unlock_bh(&rmpp_info->lock); if (restart_timer) rmpp_timer_start(&rmpp_info->recv_timer, @@ -1371,7 +1360,7 @@ p_rmpp_hdr = rmpp_ib_get_mad_buf(p_mad_element); /* Try to find a receive already being reassembled. 
*/ - rmpp_spinlock_acquire(&rmpp_info->lock); + spin_lock_bh(&rmpp_info->lock); p_recv = __find_recv(rmpp_h, p_mad_element); if (!p_recv) { @@ -1380,7 +1369,7 @@ p_mad_element->p_mad_buf->trans_id); /* This receive is not being reassembled. It should be the first seg. */ if (ntoh32(p_rmpp_hdr->seg_num) != 1) { - rmpp_spinlock_release(&rmpp_info->lock); + spin_unlock_bh(&rmpp_info->lock); return RMPP_NOT_FOUND; } @@ -1388,7 +1377,7 @@ p_recv = __get_mad_recv(p_mad_element); if (!p_recv) { - rmpp_spinlock_release(&rmpp_info->lock); + spin_unlock_bh(&rmpp_info->lock); return RMPP_INSUFFICIENT_MEMORY; } @@ -1465,13 +1454,13 @@ rmpp_status = RMPP_OVERRUN; } - rmpp_spinlock_release(&rmpp_info->lock); + spin_unlock_bh(&rmpp_info->lock); /* - * Send any response MAD (ACK, ABORT, etc.) to the sender. Note that - * we are currently in the callback from the MAD dispatcher. The + * Send any response MAD (ACK, ABORT, etc.) to the sender. Note that + * we are currently in the callback from the MAD dispatcher. The * dispatcher holds a reference on the MAD service while in the callback, - * preventing the MAD service from being destroyed. This allows the + * preventing the MAD service from being destroyed. This allows the * call to ib_send_mad() to proceed even if the user tries to destroy * the MAD service. 
*/ @@ -1708,7 +1697,7 @@ RMPP_LOG(RMPP_LOG_DEBUG, "p_mad_element %p p_mad_buf %p\n", p_mad_element, p_mad_element->p_mad_buf); - rmpp_spinlock_acquire(&info->lock); + spin_lock_bh(&info->lock); if ((retval = __prepare_mad_send(rmpp_h, p_mad_element, @@ -1727,7 +1716,7 @@ } err: - rmpp_spinlock_release(&info->lock); + spin_unlock_bh(&info->lock); return retval; } @@ -1758,11 +1747,7 @@ INIT_LIST_HEAD(&info->send_list); INIT_LIST_HEAD(&info->recv_list); - rmpp_spinlock_construct(&info->lock); - if (rmpp_spinlock_init(&info->lock) != RMPP_SUCCESS) { - RMPP_LOG(RMPP_LOG_ERROR, "rmpp_spinlock_init error\n"); - goto send_spinlock_init_err; - } + spin_lock_init(&info->lock); rmpp_timer_construct(&info->send_timer); if (rmpp_timer_init(&info->send_timer, @@ -1820,7 +1805,7 @@ INIT_LIST_HEAD(&tmp_send_list); INIT_LIST_HEAD(&tmp_recv_list); - rmpp_spinlock_acquire(&info->lock); + spin_lock_bh(&info->lock); while (!list_empty(&info->send_list)) { h_send = (struct rmpp_mad_send_t *) info->send_list.next; @@ -1837,7 +1822,7 @@ list_del((struct list_head *) p_recv); list_add_tail((struct list_head *) p_recv, &tmp_recv_list); } - rmpp_spinlock_release(&info->lock); + spin_unlock_bh(&info->lock); /* Have a short sleep here in order to let fast operations (callbacks) finish */ RMPP_LOG(RMPP_LOG_DEBUG, "sleep\n"); @@ -1935,7 +1920,7 @@ return; } - rmpp_spinlock_acquire(&h_send->lock); + spin_lock_bh(&h_send->lock); RMPP_LOG(RMPP_LOG_VERBOSE, "h_send %p, ref_cnt %d, ack_seg %d, total_seg %d\n", @@ -1945,13 +1930,13 @@ if (h_send->ack_seg == h_send->total_seg) { #if 1 - rmpp_spinlock_release(&h_send->lock); + spin_unlock_bh(&h_send->lock); /* * ACK was already received even before the send completion */ - rmpp_spinlock_acquire(&rmpp_info->lock); + spin_lock_bh(&rmpp_info->lock); list_del((struct list_head *) h_send); - rmpp_spinlock_release(&rmpp_info->lock); + spin_unlock_bh(&rmpp_info->lock); if (rmpp_info->send_compl_cb) rmpp_info->send_compl_cb(rmpp_h, h_send->p_send_mad, @@ 
-1967,7 +1952,7 @@ rmpp_timer_trim(&rmpp_info->send_timer, h_send->p_send_mad->timeout_ms); - rmpp_spinlock_release(&h_send->lock); + spin_unlock_bh(&h_send->lock); #endif return; } @@ -1980,7 +1965,7 @@ h_send, rmpp_atomic_read(&h_send->ref_cnt), h_send->cur_seg); - rmpp_spinlock_release(&h_send->lock); + spin_unlock_bh(&h_send->lock); /* Send the next segment. */ status = __send_rmpp_seg(rmpp_h, h_send); @@ -1990,9 +1975,9 @@ "ERR: h_send %p DELETE, send status %d\n", h_send, status); - rmpp_spinlock_acquire(&rmpp_info->lock); + spin_lock_bh(&rmpp_info->lock); list_del((struct list_head *) h_send); - rmpp_spinlock_release(&rmpp_info->lock); + spin_unlock_bh(&rmpp_info->lock); #if 0 rmpp_put_mad(h_send->p_send_mad); @@ -2015,7 +2000,7 @@ RMPP_LOG(RMPP_LOG_DEBUG, "h_send %p TRIM TIMER timeout_ms %d\n", h_send, h_send->p_send_mad->timeout_ms); - rmpp_spinlock_release(&h_send->lock); + spin_unlock_bh(&h_send->lock); } } Index: rmpp/rmpp.h =================================================================== --- rmpp/rmpp.h (revision 560) +++ rmpp/rmpp.h (working copy) @@ -111,7 +111,6 @@ #include "rmpp_api.h" #include "rmpp_timer.h" -#include "rmpp_lock.h" #define RMPP_TYPE_DATA 1 #define RMPP_TYPE_ACK 2 @@ -150,7 +149,7 @@ u64 trans_id; struct list_head req_list; - struct rmpp_spinlock_t lock; + spinlock_t lock; int busy; /* Absolute time that the request should be retried. */ @@ -204,6 +203,6 @@ void *vendal_p; rmpp_recv_cb_t recv_cb; rmpp_send_compl_cb_t send_compl_cb; - struct rmpp_spinlock_t lock; + spinlock_t lock; }; #endif /* __RMPP_H__ */ Index: rmpp/Makefile =================================================================== --- rmpp/Makefile (revision 560) +++ rmpp/Makefile (working copy) @@ -8,7 +8,7 @@ #INCDIRS := -I. 
-I/usr/src/linux/include MODULE := rmpp_module.o -OBJS := rmpp.o rmpp_timer.o rmpp_lock.o +OBJS := rmpp.o rmpp_timer.o $(MODULE): $(OBJS) $(LD) $(LDFLAGS) -o $@ $(OBJS) Index: TODO =================================================================== --- TODO (revision 596) +++ TODO (working copy) @@ -1,10 +1,9 @@ -8/6/04 +8/9/04 Add support for (at least) responses to requests with GRH Remove #if 0/1 with suitable preprocessor symbols Replace ib_reg_mr with ib_reg_phys_mr Eliminate static limit on numbers of ports/HCAs -Get rid of spinlock wrappers Makefile needs to use standard kbuild Migrate from /proc to /sysfs Static rate handling (low priority) From roland at topspin.com Mon Aug 9 10:34:43 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 09 Aug 2004 10:34:43 -0700 Subject: [openib-general] [PATCH] update my branch's AH API Message-ID: <524qncb4gc.fsf@topspin.com> The AH API on my branch is now updated... - R. Index: src/linux-kernel/infiniband/ulp/ipoib/ipoib_arp.c =================================================================== --- src/linux-kernel/infiniband/ulp/ipoib/ipoib_arp.c (revision 592) +++ src/linux-kernel/infiniband/ulp/ipoib/ipoib_arp.c (working copy) @@ -50,7 +50,7 @@ u32 qpn; u16 lid; tTS_IB_SL sl; - struct ib_address *address_handle; + struct ib_ah *address_handle; tTS_IB_CLIENT_QUERY_TID tid; unsigned long created; @@ -174,10 +174,10 @@ entry->hash[3], entry->hash[4], entry->hash[5]); if (entry->address_handle != NULL) { - int ret = ib_address_destroy(entry->address_handle); + int ret = ib_destroy_ah(entry->address_handle); if (ret < 0) TS_REPORT_WARN(MOD_IB_NET, - "ib_address_destroy failed (ret = %d)", + "ib_destroy_ah failed (ret = %d)", ret); } @@ -406,18 +406,19 @@ entry->tid = TS_IB_CLIENT_QUERY_TID_INVALID; if (!status) { - struct ib_address_vector av = { - .dlid = path->dlid, - .service_level = path->sl, - .port = priv->port, - .source_path_bits = 0, - .use_grh = 0, - .static_rate = 0 + struct ib_ah_attr av = { + .dlid = 
path->dlid, + .sl = path->sl, + .src_path_bits = 0, + .static_rate = 0, + .grh_flag = 0, + .port = priv->port }; - if (ib_address_create(priv->pd, &av, &entry->address_handle)) { + entry->address_handle = ib_create_ah(priv->pd, &av); + if (IS_ERR(entry->address_handle)) { TS_REPORT_WARN(MOD_IB_NET, - "%s: ib_address_create failed", + "%s: ib_create_ah failed", dev->name); } else { TS_TRACE(MOD_IB_NET, T_VERY_VERBOSE, TRACE_IB_NET_ARP, Index: src/linux-kernel/infiniband/ulp/ipoib/ipoib.h =================================================================== --- src/linux-kernel/infiniband/ulp/ipoib/ipoib.h (revision 576) +++ src/linux-kernel/infiniband/ulp/ipoib/ipoib.h (working copy) @@ -169,7 +169,7 @@ int ipoib_dev_send(struct net_device *dev, struct sk_buff *skb, ipoib_tx_callback_t callback, - void *ptr, struct ib_address *address, u32 qpn); + void *ptr, struct ib_ah *address, u32 qpn); struct ipoib_dev_priv *ipoib_intf_alloc(void); Index: src/linux-kernel/infiniband/ulp/ipoib/ipoib_ib.c =================================================================== --- src/linux-kernel/infiniband/ulp/ipoib/ipoib_ib.c (revision 576) +++ src/linux-kernel/infiniband/ulp/ipoib/ipoib_ib.c (working copy) @@ -288,7 +288,7 @@ static int _ipoib_ib_send(struct ipoib_dev_priv *priv, u64 work_request_id, - struct ib_address *address, u32 qpn, + struct ib_ah *address, u32 qpn, dma_addr_t addr, int len) { struct ib_gather_scatter list = { @@ -314,7 +314,7 @@ /*..ipoib_dev_send -- schedule an IB send work request */ int ipoib_dev_send(struct net_device *dev, struct sk_buff *skb, ipoib_tx_callback_t callback, void *ptr, - struct ib_address *address, u32 qpn) + struct ib_ah *address, u32 qpn) { struct ipoib_dev_priv *priv = dev->priv; struct ipoib_tx_buf *tx_req; Index: src/linux-kernel/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- src/linux-kernel/infiniband/ulp/ipoib/ipoib_multicast.c (revision 576) +++ 
src/linux-kernel/infiniband/ulp/ipoib/ipoib_multicast.c (working copy) @@ -43,7 +43,7 @@ unsigned long created; struct ib_multicast_member mcast_member; - struct ib_address *address_handle; + struct ib_ah *address_handle; tTS_IB_CLIENT_QUERY_TID tid; tTS_IB_GID mgid; @@ -85,10 +85,10 @@ dev->name, IPOIB_GID_ARG(mcast->mgid)); if (mcast->address_handle != NULL) { - int ret = ib_address_destroy(mcast->address_handle); + int ret = ib_destroy_ah(mcast->address_handle); if (ret < 0) TS_REPORT_WARN(MOD_IB_NET, - "%s: ib_address_destroy failed (ret = %d)", + "%s: ib_destroy_ah failed (ret = %d)", dev->name, ret); } @@ -241,22 +241,25 @@ } { - struct ib_address_vector av = { - .dlid = mcast->mcast_member.mlid, - .port = priv->port, - .service_level = mcast->mcast_member.sl, - .source_path_bits = 0, - .use_grh = 1, - .flow_label = mcast->mcast_member.flowlabel, - .hop_limit = mcast->mcast_member.hoplmt, - .source_gid_index = 0, - .static_rate = 0, - .traffic_class = mcast->mcast_member.tclass, + struct ib_ah_attr av = { + .dlid = mcast->mcast_member.mlid, + .port = priv->port, + .sl = mcast->mcast_member.sl, + .src_path_bits = 0, + .static_rate = 0, + .grh_flag = 1, + .grh = { + .flow_label = mcast->mcast_member.flowlabel, + .hop_limit = mcast->mcast_member.hoplmt, + .sgid_index = 0, + .traffic_class = mcast->mcast_member.tclass + } }; - memcpy(av.dgid, mcast->mcast_member.mgid, sizeof(av.dgid)); + memcpy(av.grh.dgid.raw, mcast->mcast_member.mgid, sizeof av.grh.dgid); - if (ib_address_create(priv->pd, &av, &mcast->address_handle)) { + mcast->address_handle = ib_create_ah(priv->pd, &av); + if (IS_ERR(mcast->address_handle)) { TS_REPORT_WARN(MOD_IB_NET, "%s: ib_address_create failed", dev->name); Index: src/linux-kernel/infiniband/include/ib_verbs.h =================================================================== --- src/linux-kernel/infiniband/include/ib_verbs.h (revision 589) +++ src/linux-kernel/infiniband/include/ib_verbs.h (working copy) @@ -39,6 +39,32 @@ #ifdef 
__KERNEL__ +union ib_gid { + u8 raw[16]; + struct { + u64 subnet_prefix; + u64 interface_id; + } global; +}; + +struct ib_global_route { + union ib_gid dgid; + u32 flow_label; + u8 sgid_index; + u8 hop_limit; + u8 traffic_class; +}; + +struct ib_ah_attr { + struct ib_global_route grh; + u16 dlid; + u8 sl; + u8 src_path_bits; + u8 static_rate; + u8 grh_flag; + u8 port; +}; + enum ib_wc_status { IB_WC_SUCCESS, IB_WC_LOC_LEN_ERR, @@ -132,6 +158,11 @@ atomic_t usecnt; /* count all resources */ }; +struct ib_ah { + struct ib_device *device; + struct ib_pd *pd; +}; + typedef void (*ib_comp_handler)(struct ib_cq *cq, void *cq_context); struct ib_cq { @@ -172,9 +203,13 @@ ib_gid_query_func gid_query; struct ib_pd * (*alloc_pd)(struct ib_device *device); int (*dealloc_pd)(struct ib_pd *pd); - ib_address_create_func address_create; - ib_address_query_func address_query; - ib_address_destroy_func address_destroy; + struct ib_ah * (*create_ah)(struct ib_pd *pd, + struct ib_ah_attr *ah_attr); + int (*modify_ah)(struct ib_ah *ah, + struct ib_ah_attr *ah_attr); + int (*query_ah)(struct ib_ah *ah, + struct ib_ah_attr *ah_attr); + int (*destroy_ah)(struct ib_ah *ah); ib_qp_create_func qp_create; ib_special_qp_create_func special_qp_create; ib_qp_modify_func qp_modify; @@ -225,6 +260,11 @@ struct ib_pd *ib_alloc_pd(struct ib_device *device); int ib_dealloc_pd(struct ib_pd *pd); +struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr); +int ib_modify_ah(struct ib_ah *ah, struct ib_ah_attr *ah_attr); +int ib_query_ah(struct ib_ah *ah, struct ib_ah_attr *ah_attr); +int ib_destroy_ah(struct ib_ah *ah); + struct ib_cq *ib_create_cq(struct ib_device *device, ib_comp_handler comp_handler, void *cq_context, int cqe); Index: src/linux-kernel/infiniband/include/ts_ib_core_types.h =================================================================== --- src/linux-kernel/infiniband/include/ts_ib_core_types.h (revision 576) +++ 
src/linux-kernel/infiniband/include/ts_ib_core_types.h (working copy) @@ -230,12 +230,6 @@ IB_PKEY_CHANGE, }; -struct ib_address { - IB_DECLARE_MAGIC - struct ib_device *device; - void *private; -}; - struct ib_async_obj { void * free_ptr; spinlock_t lock; @@ -513,7 +507,7 @@ u32 rkey; u32 dest_qpn; u32 dest_qkey; - struct ib_address *dest_address; + struct ib_ah *dest_address; u32 immediate_data; u64 compare_add; u64 swap; @@ -586,12 +580,6 @@ tTS_IB_PORT port, int index, tTS_IB_GID gid); -typedef int (*ib_address_create_func)(struct ib_pd *pd, - struct ib_address_vector *address_vector, - struct ib_address *address); -typedef int (*ib_address_query_func)(struct ib_address *address, - struct ib_address_vector *address_vector); -typedef int (*ib_address_destroy_func)(struct ib_address *address); typedef int (*ib_qp_create_func)(struct ib_pd *pd, struct ib_qp_create_param *param, struct ib_qp *qp); Index: src/linux-kernel/infiniband/include/ts_ib_core.h =================================================================== --- src/linux-kernel/infiniband/include/ts_ib_core.h (revision 576) +++ src/linux-kernel/infiniband/include/ts_ib_core.h (working copy) @@ -63,13 +63,6 @@ int index, tTS_IB_GID gid); -int ib_address_create(struct ib_pd *pd, - struct ib_address_vector *address, - struct ib_address **address_handle); -int ib_address_query(struct ib_address *address_handle, - struct ib_address_vector *address); -int ib_address_destroy(struct ib_address *address_handle); - int ib_qp_create(struct ib_qp_create_param *param, struct ib_qp **qp, u32 *qpn); Index: src/linux-kernel/infiniband/core/mad_ib.c =================================================================== --- src/linux-kernel/infiniband/core/mad_ib.c (revision 576) +++ src/linux-kernel/infiniband/core/mad_ib.c (working copy) @@ -51,8 +51,8 @@ struct ib_mad_private *priv = device->mad; struct ib_gather_scatter gather_list; struct ib_send_param send_param; - struct ib_address_vector av; - struct ib_address 
*addr; + struct ib_ah_attr av; + struct ib_ah *addr; gather_list.address = pci_map_single(priv->ib_dev->dma_device, mad, IB_MAD_PACKET_SIZE, @@ -68,22 +68,23 @@ send_param.solicited_event = 1; send_param.signaled = 1; - av.dlid = mad->dlid; - av.port = mad->port; - av.source_path_bits = 0; - av.use_grh = mad->has_grh; - av.service_level = mad->sl; - av.static_rate = 0; + av.dlid = mad->dlid; + av.port = mad->port; + av.src_path_bits = 0; + av.grh_flag = mad->has_grh; + av.sl = mad->sl; + av.static_rate = 0; - if (av.use_grh) { - av.source_gid_index = mad->gid_index; - av.flow_label = mad->flow_label; - av.hop_limit = mad->hop_limit; - av.traffic_class = mad->traffic_class; - memcpy(av.dgid, mad->gid, sizeof (tTS_IB_GID)); + if (av.grh_flag) { + av.grh.sgid_index = mad->gid_index; + av.grh.flow_label = mad->flow_label; + av.grh.hop_limit = mad->hop_limit; + av.grh.traffic_class = mad->traffic_class; + memcpy(av.grh.dgid.raw, mad->gid, sizeof av.grh.dgid); } - if (ib_address_create(priv->pd, &av, &addr)) + addr = ib_create_ah(priv->pd, &av); + if (IS_ERR(addr)) return -EINVAL; { @@ -112,11 +113,11 @@ mapping), IB_MAD_PACKET_SIZE, PCI_DMA_TODEVICE); - ib_address_destroy(addr); + ib_destroy_ah(addr); return -EINVAL; } - ib_address_destroy(addr); + ib_destroy_ah(addr); return 0; } Index: src/linux-kernel/infiniband/core/core_ah.c =================================================================== --- src/linux-kernel/infiniband/core/core_ah.c (revision 576) +++ src/linux-kernel/infiniband/core/core_ah.c (working copy) @@ -32,66 +32,51 @@ #include #include -int ib_address_create(struct ib_pd *pd, - struct ib_address_vector *address_vector, - struct ib_address **address_handle) +struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr) { - struct ib_address *address; - int ret; + struct ib_ah *ah; - if (!pd->device->address_create) { - return -ENOSYS; - } + ah = pd->device->create_ah(pd, ah_attr); - address = kmalloc(sizeof *address, GFP_KERNEL); - if 
(!address) { - return -ENOMEM; + if (!IS_ERR(ah)) { + ah->device = pd->device; + ah->pd = pd; + atomic_inc(&pd->usecnt); } - ret = pd->device->address_create(pd, address_vector, address); - - if (!ret) { - IB_SET_MAGIC(address, ADDRESS); - address->device = pd->device; - *address_handle = address; - } else { - kfree(address); - } - - return ret; + return ah; } -EXPORT_SYMBOL(ib_address_create); +EXPORT_SYMBOL(ib_create_ah); -int ib_address_query(struct ib_address *address, - struct ib_address_vector *address_vector) +int ib_modify_ah(struct ib_ah *ah, struct ib_ah_attr *ah_attr) { - IB_CHECK_MAGIC(address, ADDRESS); + return ah->device->modify_ah ? + ah->device->modify_ah(ah, ah_attr) : + -ENOSYS; +} +EXPORT_SYMBOL(ib_modify_ah); - return address->device->address_query ? - address->device->address_query(address, address_vector) : -ENOSYS; +int ib_query_ah(struct ib_ah *ah, struct ib_ah_attr *ah_attr) +{ + return ah->device->query_ah ? + ah->device->query_ah(ah, ah_attr) : + -ENOSYS; } -EXPORT_SYMBOL(ib_address_query); +EXPORT_SYMBOL(ib_query_ah); -int ib_address_destroy(struct ib_address *address_handle) +int ib_destroy_ah(struct ib_ah *ah) { - struct ib_address *address = address_handle; - int ret; + struct ib_pd *pd; + int ret; - IB_CHECK_MAGIC(address, ADDRESS); + pd = ah->pd; + ret = ah->device->destroy_ah(ah); + if (!ret) + atomic_dec(&pd->usecnt); - if (!address->device->address_destroy) { - return -ENOSYS; - } - - ret = address->device->address_destroy(address); - if (!ret) { - IB_CLEAR_MAGIC(address); - kfree(address); - } - return ret; } -EXPORT_SYMBOL(ib_address_destroy); +EXPORT_SYMBOL(ib_destroy_ah); /* Local Variables: Index: src/linux-kernel/infiniband/core/core_device.c =================================================================== --- src/linux-kernel/infiniband/core/core_device.c (revision 576) +++ src/linux-kernel/infiniband/core/core_device.c (working copy) @@ -50,8 +50,8 @@ IB_MANDATORY_FUNC(gid_query), IB_MANDATORY_FUNC(alloc_pd),
IB_MANDATORY_FUNC(dealloc_pd), - IB_MANDATORY_FUNC(address_create), - IB_MANDATORY_FUNC(address_destroy), + IB_MANDATORY_FUNC(create_ah), + IB_MANDATORY_FUNC(destroy_ah), IB_MANDATORY_FUNC(special_qp_create), IB_MANDATORY_FUNC(qp_modify), IB_MANDATORY_FUNC(qp_destroy), Index: src/linux-kernel/infiniband/hw/mthca/mthca_dev.h =================================================================== --- src/linux-kernel/infiniband/hw/mthca/mthca_dev.h (revision 576) +++ src/linux-kernel/infiniband/hw/mthca/mthca_dev.h (working copy) @@ -334,7 +334,7 @@ void mthca_free_qp(struct mthca_dev *dev, struct mthca_qp *qp); int mthca_create_ah(struct mthca_dev *dev, struct mthca_pd *pd, - struct ib_address_vector *address, + struct ib_ah_attr *ah_attr, struct mthca_ah *ah); int mthca_destroy_ah(struct mthca_dev *dev, struct mthca_ah *ah); int mthca_read_ah(struct mthca_dev *dev, struct mthca_ah *ah, Index: src/linux-kernel/infiniband/hw/mthca/mthca_provider.c =================================================================== --- src/linux-kernel/infiniband/hw/mthca/mthca_provider.c (revision 576) +++ src/linux-kernel/infiniband/hw/mthca/mthca_provider.c (working copy) @@ -266,29 +266,31 @@ return 0; } -static int mthca_address_create(struct ib_pd *pd, - struct ib_address_vector *av, - struct ib_address *address) +static struct ib_ah *mthca_ah_create(struct ib_pd *pd, + struct ib_ah_attr *ah_attr) { int err; + struct mthca_ah *ah; - address->private = kmalloc(sizeof (struct mthca_ah), GFP_KERNEL); - if (!address->private) - return -ENOMEM; + ah = kmalloc(sizeof *ah, GFP_KERNEL); + if (!ah) + return ERR_PTR(-ENOMEM); err = mthca_create_ah(to_mdev(pd->device), (struct mthca_pd *) pd, - av, address->private); - if (err) - kfree(address->private); + ah_attr, ah); + if (err) { + kfree(ah); + return ERR_PTR(err); + } - return err; + return (struct ib_ah *) ah; } -static int mthca_address_destroy(struct ib_address *address) +static int mthca_ah_destroy(struct ib_ah *ah) { - 
mthca_destroy_ah(to_mdev(address->device), address->private); - kfree(address->private); + mthca_destroy_ah(to_mdev(ah->device), ah); + kfree(ah); return 0; } @@ -543,8 +545,8 @@ dev->ib_dev.gid_query = mthca_gid_query; dev->ib_dev.alloc_pd = mthca_alloc_pd; dev->ib_dev.dealloc_pd = mthca_dealloc_pd; - dev->ib_dev.address_create = mthca_address_create; - dev->ib_dev.address_destroy = mthca_address_destroy; + dev->ib_dev.create_ah = mthca_ah_create; + dev->ib_dev.destroy_ah = mthca_ah_destroy; dev->ib_dev.qp_create = mthca_qp_create; dev->ib_dev.special_qp_create = mthca_special_qp_create; dev->ib_dev.qp_modify = mthca_modify_qp; Index: src/linux-kernel/infiniband/hw/mthca/mthca_provider.h =================================================================== --- src/linux-kernel/infiniband/hw/mthca/mthca_provider.h (revision 576) +++ src/linux-kernel/infiniband/hw/mthca/mthca_provider.h (working copy) @@ -67,6 +67,7 @@ struct mthca_av; struct mthca_ah { + struct ib_ah ibah; int on_hca; u32 key; struct mthca_av *av; Index: src/linux-kernel/infiniband/hw/mthca/mthca_av.c =================================================================== --- src/linux-kernel/infiniband/hw/mthca/mthca_av.c (revision 576) +++ src/linux-kernel/infiniband/hw/mthca/mthca_av.c (working copy) @@ -42,7 +42,7 @@ int mthca_create_ah(struct mthca_dev *dev, struct mthca_pd *pd, - struct ib_address_vector *address, + struct ib_ah_attr *ah_attr, struct mthca_ah *ah) { u32 index = -1; @@ -81,20 +81,20 @@ memset(av, 0, MTHCA_AV_SIZE); - av->port_pd = cpu_to_be32(pd->pd_num | (address->port << 24)); - av->g_slid = (!!address->use_grh << 7) | address->source_path_bits; - av->dlid = cpu_to_be16(address->dlid); + av->port_pd = cpu_to_be32(pd->pd_num | (ah_attr->port << 24)); + av->g_slid = (!!ah_attr->grh_flag << 7) | ah_attr->src_path_bits; + av->dlid = cpu_to_be16(ah_attr->dlid); av->msg_sr = (3 << 4) | /* 2K message */ - address->static_rate; - av->sl_tclass_flowlabel = 
cpu_to_be32(address->service_level << 28); - if (address->use_grh) { - av->gid_index = (address->port - 1) * dev->limits.gid_table_len + - address->source_gid_index; - av->hop_limit = address->hop_limit; + ah_attr->static_rate; + av->sl_tclass_flowlabel = cpu_to_be32(ah_attr->sl << 28); + if (ah_attr->grh_flag) { + av->gid_index = (ah_attr->port - 1) * dev->limits.gid_table_len + + ah_attr->grh.sgid_index; + av->hop_limit = ah_attr->grh.hop_limit; av->sl_tclass_flowlabel |= - cpu_to_be32((address->traffic_class << 20) | - address->flow_label); - memcpy(av->dgid, address->dgid, 16); + cpu_to_be32((ah_attr->grh.traffic_class << 20) | + ah_attr->grh.flow_label); + memcpy(av->dgid, ah_attr->grh.dgid.raw, 16); } if (0) { Index: src/linux-kernel/infiniband/hw/mthca/mthca_qp.c =================================================================== --- src/linux-kernel/infiniband/hw/mthca/mthca_qp.c (revision 576) +++ src/linux-kernel/infiniband/hw/mthca/mthca_qp.c (working copy) @@ -226,11 +226,6 @@ u16 vcrc; } __attribute__((packed)); -static struct mthca_ah *to_mah(struct ib_address *ah) -{ - return (struct mthca_ah *) ah->private; -} - static int is_sqp(struct mthca_dev *dev, struct mthca_qp *qp) { return qp->qpn >= dev->qp_table.sqp_start && @@ -1034,7 +1029,8 @@ int header_size; int err; - err = mthca_read_ah(dev, to_mah(param->dest_address), &sqp->ud_header); + err = mthca_read_ah(dev, (struct mthca_ah *) param->dest_address, + &sqp->ud_header); if (err) return err; mlx->flags &= ~cpu_to_be32(MTHCA_NEXT_SOLICIT | 1); @@ -1165,9 +1161,9 @@ if (qp->transport == UD) { ((struct mthca_ud_seg *) wqe)->lkey = - cpu_to_be32(to_mah(param->dest_address)->key); + cpu_to_be32(((struct mthca_ah *) param->dest_address)->key); ((struct mthca_ud_seg *) wqe)->av_addr = - cpu_to_be64(to_mah(param->dest_address)->avdma); + cpu_to_be64(((struct mthca_ah *) param->dest_address)->avdma); ((struct mthca_ud_seg *) wqe)->dqpn = cpu_to_be32(param->dest_qpn); ((struct mthca_ud_seg *) wqe)->qkey = 
From roland at topspin.com Mon Aug 9 10:36:57 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 09 Aug 2004 10:36:57 -0700 Subject: [openib-general] Re: [PATCH] remove the tSTR typedef In-Reply-To: <1091826283.22091.42.camel@localhost> (Tom Duffy's message of "Fri, 06 Aug 2004 14:04:43 -0700") References: <1091826283.22091.42.camel@localhost> Message-ID: <52zn549ps6.fsf@topspin.com> thanks, applied. From roland at topspin.com Mon Aug 9 10:44:06 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 09 Aug 2004 10:44:06 -0700 Subject: [openib-general] [PATCH] Only include CM and DM client in kernel build when needed (Take 3) In-Reply-To: <1092050119.1818.15.camel@localhost.localdomain> (Hal Rosenstock's message of "Mon, 09 Aug 2004 07:15:17 -0400") References: <1092050119.1818.15.camel@localhost.localdomain> Message-ID: <52vffs9pg9.fsf@topspin.com> Thanks, I've applied this. I'm curious to get feedback (especially from people who aren't "IB experts") on whether the INFINIBAND_CM and INFINIBAND_DM_CLIENT options seem like too much control (and maybe should be hidden unless EMBEDDED==y). - R. From roland at topspin.com Mon Aug 9 10:51:14 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 09 Aug 2004 10:51:14 -0700 Subject: [openib-general] MELLANOX_HCA removed in favor of MTHCA on my branch Message-ID: <52r7qg9p4d.fsf@topspin.com> I've removed the MELLANOX_HCA driver from my branch as several people have suggested, since it was broken by all the API updates anyway. (Not posting a patch because this is really just deleting a bunch of files) - R. 
From mshefty at ichips.intel.com Mon Aug 9 10:25:37 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 9 Aug 2004 10:25:37 -0700 Subject: [openib-general] OpenSM Patches for gen2 In-Reply-To: <1092056613.1819.1.camel@localhost.localdomain> References: <1092056613.1819.1.camel@localhost.localdomain> Message-ID: <20040809102537.3ba465c0.mshefty@ichips.intel.com> On Mon, 09 Aug 2004 09:03:31 -0400 Hal Rosenstock wrote: > What is the plan for the gen2 OpenSM in terms of tracking changes to the > gen1 OpenSM ? Not sure of the plan, but we need to start defining the user-mode infrastructure for gen2 for this to happen. There was another message posted to the list about not requiring an embedded SM, so I think that this needs to be done fairly soon. We can probably port opensm once there's agreement on the user-mode portion. From tduffy at sun.com Mon Aug 9 11:46:39 2004 From: tduffy at sun.com (Tom Duffy) Date: Mon, 09 Aug 2004 11:46:39 -0700 Subject: [openib-general] Re: [openib-commits] r607 - in gen2/branches/roland-merge/src/linux-kernel/infiniband: core hw/mthca include ulp/ipoib In-Reply-To: <20040809174330.A47832283D4@openib.ca.sandia.gov> References: <20040809174330.A47832283D4@openib.ca.sandia.gov> Message-ID: <1092077199.14886.5.camel@duffman> On Mon, 2004-08-09 at 10:43 -0700, wrote: > gen2/branches/roland-merge/src/linux-kernel/infiniband/hw/mthca/mthca_provider.c > Modified: gen2/branches/roland-merge/src/linux-kernel/infiniband/hw/mthca/mthca_provider.c > =================================================================== > --- gen2/branches/roland-merge/src/linux-kernel/infiniband/hw/mthca/mthca_provider.c 2004-08-09 17:26:34 UTC (rev 606) > +++ gen2/branches/roland-merge/src/linux-kernel/infiniband/hw/mthca/mthca_provider.c 2004-08-09 17:43:29 UTC (rev 607) > > -static int mthca_address_destroy(struct ib_address *address) > +static int mthca_ah_destroy(struct ib_ah *ah) > { > - mthca_destroy_ah(to_mdev(address->device), address->private); > - 
kfree(address->private); > + mthca_destroy_ah(to_mdev(ah->device), ah); > + kfree(ah); > > return 0; > } This is causing a build warning: CC [M] drivers/infiniband/hw/mthca/mthca_provider.o /build1/tduffy/openib-work/linux-2.6.8-rc3-openib/drivers/infiniband/hw/mthca/mthca_provider.c: In function `mthca_ah_destroy': /build1/tduffy/openib-work/linux-2.6.8-rc3-openib/drivers/infiniband/hw/mthca/mthca_provider.c:292: warning: passing arg 2 of `mthca_destroy_ah' from incompatible pointer type This fixes it... Index: drivers/infiniband/hw/mthca/mthca_provider.c =================================================================== --- drivers/infiniband/hw/mthca/mthca_provider.c (revision 610) +++ drivers/infiniband/hw/mthca/mthca_provider.c (working copy) @@ -289,7 +289,7 @@ static int mthca_ah_destroy(struct ib_ah *ah) { - mthca_destroy_ah(to_mdev(ah->device), ah); + mthca_destroy_ah(to_mdev(ah->device), (struct mthca_ah *)ah); kfree(ah); return 0; -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From roland at topspin.com Mon Aug 9 11:50:57 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 09 Aug 2004 11:50:57 -0700 Subject: [openib-general] Re: [openib-commits] r607 - in gen2/branches/roland-merge/src/linux-kernel/infiniband: core hw/mthca include ulp/ipoib In-Reply-To: <1092077199.14886.5.camel@duffman> (Tom Duffy's message of "Mon, 09 Aug 2004 11:46:39 -0700") References: <20040809174330.A47832283D4@openib.ca.sandia.gov> <1092077199.14886.5.camel@duffman> Message-ID: <52n0149mcu.fsf@topspin.com> thanks, I fixed it. -R. 
From tduffy at sun.com Mon Aug 9 11:59:04 2004 From: tduffy at sun.com (Tom Duffy) Date: Mon, 09 Aug 2004 11:59:04 -0700 Subject: [openib-general] Re: [openib-commits] r611 - gen2/branches/roland-merge/src/linux-kernel/infiniband/hw/mthca In-Reply-To: <20040809185900.5B0E62283D4@openib.ca.sandia.gov> References: <20040809185900.5B0E62283D4@openib.ca.sandia.gov> Message-ID: <1092077944.14886.13.camel@duffman> On Mon, 2004-08-09 at 11:59 -0700, wrote: > Modified: gen2/branches/roland-merge/src/linux-kernel/infiniband/hw/mthca/mthca_provider.c > =================================================================== > --- gen2/branches/roland-merge/src/linux-kernel/infiniband/hw/mthca/mthca_provider.c 2004-08-09 17:58:24 UTC (rev 610) > +++ gen2/branches/roland-merge/src/linux-kernel/infiniband/hw/mthca/mthca_provider.c 2004-08-09 18:58:59 UTC (rev 611) > @@ -289,7 +289,7 @@ > > static int mthca_ah_destroy(struct ib_ah *ah) > { > - mthca_destroy_ah(to_mdev(ah->device), ah); > + mthca_destroy_ah(to_mdev(ah->device), (struct ib_mthca *) ah); > kfree(ah); hrm. I don't think that is the correct struct...should be struct mthca_ah. -tduffy -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From tduffy at sun.com Mon Aug 9 12:09:55 2004 From: tduffy at sun.com (Tom Duffy) Date: Mon, 09 Aug 2004 12:09:55 -0700 Subject: [openib-general] [PATCH] remove the tPTR typedef Message-ID: <1092078595.14886.16.camel@duffman> This patch removes all the uses of tPTR from Roland's branch. Signed-by: Tom Duffy with permission from Sun legal. 
Index: drivers/infiniband/ulp/dapl/udapl_mod.c =================================================================== --- drivers/infiniband/ulp/dapl/udapl_mod.c (revision 611) +++ drivers/infiniband/ulp/dapl/udapl_mod.c (working copy) @@ -1430,7 +1430,7 @@ tINT32 status, tTS_IB_PORT hw_port, struct ib_device *ca, - struct ib_path_record *path, tPTR usr_arg) + struct ib_path_record *path, void *usr_arg) { pr_entry_t *pr_entry; @@ -1479,7 +1479,7 @@ tUINT32 dst_addr, tTS_IB_PORT hw_port, struct ib_device *ca, - struct ib_path_record *path, tPTR usr_arg) + struct ib_path_record *path, void *usr_arg) { pr_entry_t *pr_entry; Index: drivers/infiniband/ulp/ipoib/ip2pr_priv.h =================================================================== --- drivers/infiniband/ulp/ipoib/ip2pr_priv.h (revision 611) +++ drivers/infiniband/ulp/ipoib/ip2pr_priv.h (working copy) @@ -197,8 +197,8 @@ struct ip2pr_ipoib_wait { s8 type; /* ip2pr or gid2pr */ tIP2PR_PATH_LOOKUP_ID plid; /* request identifier */ - tPTR func; /* callback function for completion */ - tPTR arg; /* user argument */ + void *func; /* callback function for completion */ + void *arg; /* user argument */ struct net_device *dev; /* ipoib device */ tTS_KERNEL_TIMER_STRUCT timer; /* retry timer */ u8 retry; /* retry counter */ Index: drivers/infiniband/ulp/ipoib/ip2pr_export.h =================================================================== --- drivers/infiniband/ulp/ipoib/ip2pr_export.h (revision 611) +++ drivers/infiniband/ulp/ipoib/ip2pr_export.h (working copy) @@ -48,14 +48,14 @@ tTS_IB_PORT hw_port, struct ib_device *ca, struct ib_path_record *path, - tPTR usr_arg); + void *usr_arg); typedef s32(*tGID2PR_LOOKUP_FUNC) (tIP2PR_PATH_LOOKUP_ID plid, s32 status, tTS_IB_PORT hw_port, struct ib_device *ca, struct ib_path_record *path, - tPTR usr_arg); + void *usr_arg); /* * address lookup initiation. 
* @@ -72,7 +72,7 @@ u8 localroute, s32 bound_dev_if, tIP2PR_PATH_LOOKUP_FUNC func, - tPTR arg, + void *arg, tIP2PR_PATH_LOOKUP_ID *plid); /* @@ -87,7 +87,7 @@ tTS_IB_GID dst_gid, u16 pkey, tGID2PR_LOOKUP_FUNC func, - tPTR arg, + void *arg, tIP2PR_PATH_LOOKUP_ID * plid); s32 tsGid2prCancel(tIP2PR_PATH_LOOKUP_ID plid); Index: drivers/infiniband/ulp/ipoib/ip2pr_link.c =================================================================== --- drivers/infiniband/ulp/ipoib/ip2pr_link.c (revision 611) +++ drivers/infiniband/ulp/ipoib/ip2pr_link.c (working copy) @@ -158,7 +158,7 @@ static s32 _tsIp2prPathLookupComplete(tIP2PR_PATH_LOOKUP_ID plid, s32 status, struct ip2pr_path_element *path_elmt, - tPTR funcptr, tPTR arg) + void *funcptr, void *arg) { tIP2PR_PATH_LOOKUP_FUNC func = (tIP2PR_PATH_LOOKUP_FUNC) funcptr; TS_CHECK_NULL(func, -EINVAL); @@ -217,7 +217,7 @@ /* ========================================================================= */ /*.._tsIp2prIpoibWaitTimeout -- timeout function for link resolution */ -static void _tsIp2prIpoibWaitTimeout(tPTR arg) +static void _tsIp2prIpoibWaitTimeout(void *arg) { struct ip2pr_ipoib_wait *ipoib_wait = (struct ip2pr_ipoib_wait *)arg; s32 result; @@ -301,7 +301,7 @@ static struct ip2pr_ipoib_wait * _tsIp2prIpoibWaitCreate(tIP2PR_PATH_LOOKUP_ID plid, u32 dst_addr, u32 src_addr, u8 localroute, u32 bound_dev_if, - tIP2PR_PATH_LOOKUP_FUNC func, tPTR arg, s32 ltype) + tIP2PR_PATH_LOOKUP_FUNC func, void *arg, s32 ltype) { struct ip2pr_ipoib_wait *ipoib_wait; @@ -329,7 +329,7 @@ ipoib_wait->bound_dev = bound_dev_if; ipoib_wait->gw_addr = 0; ipoib_wait->arg = arg; - ipoib_wait->func = (tPTR) func; + ipoib_wait->func = (void *) func; ipoib_wait->plid = plid; ipoib_wait->dev = 0; ipoib_wait->retry = _tsIp2prLinkRoot.max_retries; @@ -728,7 +728,7 @@ /*.._tsIp2prPathRecordComplete -- path lookup complete, save result */ static s32 _tsIp2prPathRecordComplete(tTS_IB_CLIENT_QUERY_TID tid, s32 status, struct ib_path_record *path, - s32 remaining, 
tPTR arg) + s32 remaining, void *arg) { struct ip2pr_ipoib_wait *ipoib_wait = (struct ip2pr_ipoib_wait *) arg; struct ip2pr_path_element *path_elmt = NULL; @@ -1227,7 +1227,7 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*.._tsIp2prArpRecvComplete -- receive all ARP packets. */ -static void _tsIp2prArpRecvComplete(tPTR arg) +static void _tsIp2prArpRecvComplete(void *arg) { struct ip2pr_ipoib_wait *ipoib_wait; struct ip2pr_ipoib_wait *next_wait; @@ -1437,7 +1437,7 @@ /* ========================================================================= */ /*.._tsIp2prPathSweepTimerFunc --sweep path cache to reap old entries. */ -static void _tsIp2prPathSweepTimerFunc(tPTR arg) +static void _tsIp2prPathSweepTimerFunc(void *arg) { struct ip2pr_path_element *path_elmt; struct ip2pr_path_element *next_elmt; @@ -1510,7 +1510,7 @@ /*..tsSdpPathRecordLookup -- resolve an ip address to a path record */ s32 tsIp2prPathRecordLookup(u32 dst_addr, u32 src_addr, u8 localroute, s32 bound_dev_if, tIP2PR_PATH_LOOKUP_FUNC func, - tPTR arg, tIP2PR_PATH_LOOKUP_ID * plid) + void *arg, tIP2PR_PATH_LOOKUP_ID * plid) { struct ip2pr_path_element *path_elmt; struct ip2pr_ipoib_wait *ipoib_wait; @@ -1908,7 +1908,7 @@ /*.._tsGid2prComplete -- path lookup complete, save result */ static s32 _tsGid2prComplete(tTS_IB_CLIENT_QUERY_TID tid, s32 status, struct ib_path_record *path, s32 remaining, - tPTR arg) + void *arg) { s32 result; struct ip2pr_ipoib_wait *ipoib_wait = (struct ip2pr_ipoib_wait *) arg; @@ -1971,7 +1971,7 @@ /* ========================================================================= */ /*..tsGid2prLookup -- Resolve a destination GD to Path Record */ s32 tsGid2prLookup(tTS_IB_GID src_gid, tTS_IB_GID dst_gid, u16 pkey, - tGID2PR_LOOKUP_FUNC funcptr, tPTR arg, + tGID2PR_LOOKUP_FUNC funcptr, void *arg, tIP2PR_PATH_LOOKUP_ID * plid) { struct ip2pr_sgid_element *gid_node; @@ -2016,7 
+2016,7 @@ 0, 0, 0, - (tPTR) funcptr, + (void *) funcptr, arg, LOOKUP_GID2PR); if (NULL == ipoib_wait) { @@ -2361,7 +2361,7 @@ static s32 _tsIp2prCbInternal(tIP2PR_PATH_LOOKUP_ID plid, s32 status, u32 src_addr, u32 dst_addr, tTS_IB_PORT hw_port, struct ib_device *ca, struct ib_path_record *path, - tPTR usr_arg) + void *usr_arg) { struct ip2pr_user_req *ureq; @@ -2384,7 +2384,7 @@ /*..tsIp2prCbInternal -- Callback for Gid to Path Record Lookup */ static s32 _tsGid2prCbInternal(tIP2PR_PATH_LOOKUP_ID plid, s32 status, tTS_IB_PORT hw_port, struct ib_device *ca, - struct ib_path_record *path, tPTR usr_arg) + struct ib_path_record *path, void *usr_arg) { struct ip2pr_user_req *ureq; @@ -2487,7 +2487,7 @@ ureq->status = 0; sema_init(&ureq->sem, 0); status = tsGid2prLookup(param.src_gid, param.dst_gid, param.pkey, - _tsGid2prCbInternal, (tPTR) ureq, &plid); + _tsGid2prCbInternal, (void *) ureq, &plid); if (status < 0) { kmem_cache_free(_tsIp2prLinkRoot.user_req, ureq); return (-EFAULT); Index: drivers/infiniband/ulp/ipoib/ip2pr_proc.c =================================================================== --- drivers/infiniband/ulp/ipoib/ip2pr_proc.c (revision 611) +++ drivers/infiniband/ulp/ipoib/ip2pr_proc.c (working copy) @@ -87,7 +87,7 @@ /* ========================================================================= */ /*.._tsIp2prProcReadParse -- read function for the injection table */ static s32 _tsIp2prProcReadParse(char *page, char **start, off_t offset, - s32 count, s32 *eof, tPTR data) + s32 count, s32 *eof, void *data) { struct ip2pr_proc_sub_entry *sub_entry = (struct ip2pr_proc_sub_entry *) data; Index: drivers/infiniband/include/ib_legacy_types.h =================================================================== --- drivers/infiniband/include/ib_legacy_types.h (revision 611) +++ drivers/infiniband/include/ib_legacy_types.h (working copy) @@ -193,7 +193,6 @@ typedef tUINT8 tSLOT; /* 1 based value */ typedef tUINT8 tCPU; /* 1 based value */ typedef tUINT8 tPORT; 
-typedef void * tPTR; typedef const void * tCONST_PTR; typedef const char * tCONST_STR; typedef tUINT32 tIFINDEX; Index: drivers/infiniband/core/useraccess_cm.c =================================================================== --- drivers/infiniband/core/useraccess_cm.c (revision 611) +++ drivers/infiniband/core/useraccess_cm.c (working copy) @@ -501,7 +501,7 @@ tTS_IB_PORT hw_port, struct ib_device *ca, struct ib_path_record *path, - tPTR usr_arg) + void *usr_arg) { struct pathrecordlookup *prl = (struct pathrecordlookup *)usr_arg; -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From roland at topspin.com Mon Aug 9 12:15:39 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 09 Aug 2004 12:15:39 -0700 Subject: [openib-general] Re: [openib-commits] r611 - gen2/branches/roland-merge/src/linux-kernel/infiniband/hw/mthca In-Reply-To: <1092077944.14886.13.camel@duffman> (Tom Duffy's message of "Mon, 09 Aug 2004 11:59:04 -0700") References: <20040809185900.5B0E62283D4@openib.ca.sandia.gov> <1092077944.14886.13.camel@duffman> Message-ID: <52isbs9l7o.fsf@topspin.com> You're right of course... serves me right for trying to apply "trivial" patches by hand.. - R. From roland at topspin.com Mon Aug 9 13:15:20 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 09 Aug 2004 13:15:20 -0700 Subject: [openib-general] [PATCH] Update my branch's MW functions Message-ID: <52ekmg9ig7.fsf@topspin.com> Pretty trivial because I haven't implemented memory windows in mthca and I don't have any ULPs that use MWs. - R. 
Index: src/linux-kernel/infiniband/include/ib_verbs.h =================================================================== --- src/linux-kernel/infiniband/include/ib_verbs.h (revision 607) +++ src/linux-kernel/infiniband/include/ib_verbs.h (working copy) @@ -153,6 +153,8 @@ IB_MR_REREG_ACCESS = (1<<2) }; +struct ib_mw_bind; + struct ib_pd { struct ib_device *device; atomic_t usecnt; /* count all resources */ @@ -181,6 +183,12 @@ atomic_t usecnt; /* count number of MWs */ }; +struct ib_mw { + struct ib_device *device; + struct ib_pd *pd; + u32 rkey; +}; + struct ib_device { IB_DECLARE_MAGIC @@ -243,9 +251,11 @@ int num_phys_buf, int mr_access_flags, u64 *iova_start); - ib_mw_create_func mw_create; - ib_mw_destroy_func mw_destroy; - ib_mw_bind_func mw_bind; + struct ib_mw * (*alloc_mw)(struct ib_pd *pd); + int (*bind_mw)(struct ib_qp *qp, + struct ib_mw *mw, + struct ib_mw_bind *mw_bind); + int (*dealloc_mw)(struct ib_mw *mw); ib_fmr_create_func fmr_create; ib_fmr_destroy_func fmr_destroy; ib_fmr_map_func fmr_map; @@ -328,6 +338,20 @@ int ib_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr); int ib_dereg_mr(struct ib_mr *mr); +struct ib_mw *ib_alloc_mw(struct ib_pd *pd); + +static inline int ib_bind_mw(struct ib_qp *qp, + struct ib_mw *mw, + struct ib_mw_bind *mw_bind) +{ + /* XXX reference counting in mw? */ + return mw->device->bind_mw ? 
+ mw->device->bind_mw(qp, mw, mw_bind) : + -ENOSYS; +} + +int ib_dealloc_mw(struct ib_mw *mw); + #endif /* __KERNEL __ */ /* XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX */ Index: src/linux-kernel/infiniband/include/ts_ib_core_types.h =================================================================== --- src/linux-kernel/infiniband/include/ts_ib_core_types.h (revision 607) +++ src/linux-kernel/infiniband/include/ts_ib_core_types.h (working copy) @@ -405,14 +405,6 @@ /* structures */ -struct ib_mw { - IB_DECLARE_MAGIC - struct ib_device *device; - struct ib_pd *pd; - u32 rkey; - void *private; -}; - enum { IB_DEVICE_NOTIFIER_ADD, IB_DEVICE_NOTIFIER_REMOVE @@ -531,17 +523,6 @@ int signaled:1; }; -struct ib_mw_bind_param { - u64 work_request_id; - u32 rkey; - u32 lkey; - u64 address; - u64 length; - enum ib_memory_access access; - int signaled:1; - int fence:1; -}; - struct ib_fmr_pool_param { int max_pages_per_fmr; enum ib_memory_access access; @@ -599,14 +580,6 @@ typedef int (*ib_receive_post_func)(struct ib_qp *qp, struct ib_receive_param *param, int num_work_requests); -typedef int (*ib_mw_create_func)(struct ib_pd *pd, - struct ib_mw **mw, - u32 *rkey); -typedef int (*ib_mw_destroy_func)(struct ib_mw *mw); -typedef int (*ib_mw_bind_func)(struct ib_qp *qp, - struct ib_mw *mw, - struct ib_mw_bind_param *param, - u32 *new_rkey); typedef int (*ib_fmr_create_func)(struct ib_pd *pd, enum ib_memory_access access, int max_pages, Index: src/linux-kernel/infiniband/include/ts_ib_core.h =================================================================== --- src/linux-kernel/infiniband/include/ts_ib_core.h (revision 607) +++ src/linux-kernel/infiniband/include/ts_ib_core.h (working copy) @@ -93,15 +93,6 @@ return qp->device->receive_post(qp, param, num_work_requests); } -int ib_mw_create(struct ib_pd *pd, - struct ib_mw **mw, - u32 *rkey); -int ib_mw_destroy(struct ib_mw *mw); -int ib_mw_bind(struct ib_qp *qp, - struct ib_mw *mw, - struct ib_mw_bind_param *param, - 
u32 *new_rkey); - int ib_fmr_pool_create(struct ib_pd *pd, struct ib_fmr_pool_param *params, struct ib_fmr_pool **pool); Index: src/linux-kernel/infiniband/core/core_mw.c =================================================================== --- src/linux-kernel/infiniband/core/core_mw.c (revision 576) +++ src/linux-kernel/infiniband/core/core_mw.c (working copy) @@ -30,71 +30,38 @@ #include "ts_kernel_trace.h" #include "ts_kernel_services.h" -int ib_mw_create(struct ib_pd *pd, - struct ib_mw **mw, - u32 *rkey) +struct ib_mw *ib_alloc_mw(struct ib_pd *pd) { - int ret; + struct ib_mw *mw; - if (!pd->device->mw_create) - return -ENOSYS; + if (!pd->device->alloc_mw) + return ERR_PTR(-ENOSYS); - *mw = kmalloc(sizeof **mw, GFP_KERNEL); - if (!*mw) - return -ENOMEM; - - (*mw)->device = pd->device; - (*mw)->pd = pd; - - ret = pd->device->mw_create(pd, mw, &(*mw)->rkey); - if (ret) { - kfree(mw); - return ret; + mw = pd->device->alloc_mw(pd); + if (!IS_ERR(mw)) { + mw->device = pd->device; + mw->pd = pd; + atomic_inc(&pd->usecnt); } - IB_SET_MAGIC(*mw, MW); - *rkey = (*mw)->rkey; - - return 0; + return mw; } -EXPORT_SYMBOL(ib_mw_create); +EXPORT_SYMBOL(ib_alloc_mw); -int ib_mw_destroy(struct ib_mw *mw) +int ib_dealloc_mw(struct ib_mw *mw) { + struct ib_pd *pd; int ret; - IB_CHECK_MAGIC(mw, MW); + pd = mw->pd; + ret = mw->device->dealloc_mw(mw); + if (!ret) + atomic_inc(&pd->usecnt); - if (!mw->device->mw_destroy) - return -ENOSYS; - - ret = mw->device->mw_destroy(mw); - if (ret) - return ret; - - IB_CLEAR_MAGIC(mw); - kfree(mw); - - return 0; + return ret; } -EXPORT_SYMBOL(ib_mw_destroy); +EXPORT_SYMBOL(ib_dealloc_mw); -int ib_mw_bind(struct ib_qp *qp, - struct ib_mw *mw, - struct ib_mw_bind_param *param, - u32 *new_rkey) -{ - IB_CHECK_MAGIC(qp, QP); - IB_CHECK_MAGIC(mw, MW); - - if (qp->device != mw->device) - return -EINVAL; - - return mw->device->mw_bind ? 
- mw->device->mw_bind(qp, mw, param, new_rkey) : -ENOSYS; -} -EXPORT_SYMBOL(ib_mw_bind); - /* * Local Variables: * c-file-style: "linux" From roland at topspin.com Mon Aug 9 13:16:35 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 09 Aug 2004 13:16:35 -0700 Subject: [openib-general] [PATCH] remove the tPTR typedef In-Reply-To: <1092078595.14886.16.camel@duffman> (Tom Duffy's message of "Mon, 09 Aug 2004 12:09:55 -0700") References: <1092078595.14886.16.camel@duffman> Message-ID: <52acx49ie4.fsf@topspin.com> thanks, applied. -R. From roland at topspin.com Mon Aug 9 13:16:58 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 09 Aug 2004 13:16:58 -0700 Subject: [openib-general] [PATCH] Update my branch's MW functions In-Reply-To: <52ekmg9ig7.fsf@topspin.com> (Roland Dreier's message of "Mon, 09 Aug 2004 13:15:20 -0700") References: <52ekmg9ig7.fsf@topspin.com> Message-ID: <52657s9idh.fsf@topspin.com> BTW, someone still needs to fill out the definition of struct ib_mw_bind... - R. From roland at topspin.com Mon Aug 9 13:31:24 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 09 Aug 2004 13:31:24 -0700 Subject: [openib-general] [PATCH] kill dead code from ib_legacy.h Message-ID: <521xig9hpf.fsf@topspin.com> Tom has been doing the real work of getting rid of the typedefs and so on that are actually used. Here's a patch that kills a bunch of stuff never referenced outside of ib_legacy.c. - R. 
Index: src/linux-kernel/infiniband/include/ib_legacy_types.h =================================================================== --- src/linux-kernel/infiniband/include/ib_legacy_types.h (revision 614) +++ src/linux-kernel/infiniband/include/ib_legacy_types.h (working copy) @@ -27,26 +27,7 @@ /* * #define section */ -#ifdef DEBUG -# ifndef PRIVATE -# define PRIVATE -# endif -# ifndef PUBLIC -# define PUBLIC -# endif -#else -# ifndef PRIVATE -# define PRIVATE static -# endif -# ifndef PUBLIC -# define PUBLIC -# endif -#endif -#ifndef NULL -#define NULL 0 -#endif - #ifndef TRUE #define TRUE 1 #endif @@ -55,129 +36,11 @@ #define FALSE 0 #endif -#ifndef STATIC -#define STATIC static -#endif - -#define TS_EOF TRUE -#define TS_NOT_EOF FALSE - - -/* Misc min, max, size values */ -#define MAX_IP_ADDR_LEN_IN_BYTE 4 -#define MAX_IP_ADDR_LEN_IN_WORD 2 -#define MAX_ETHER_ADDR_LEN_IN_BYTE 6 -#define MAX_ETHER_ADDR_LEN_IN_WORD 3 -#define MAX_LOGICAL_PORTS_IN_BITS 128 / 8 -#define TS_CLI_MAX_CONTEXT_BUFSIZE 64 -#define GUID_LEN_IN_BYTE 8 -#define GUID_LEN_IN_WORD 4 -#define GID_LEN_IN_BYTE 16 -#define GID_LEN_IN_WORD 8 -#define MAX_IPOIB_ADDR_LEN_IN_BYTE 26 -#define MAX_IPOIB_NOGID_ADDR_LEN_IN_BYTE 10 -#define MAX_IPOIB_HW_ADDR_LEN_IN_BYTE 19 - /* - * Low-level values used to distinguish between devices. - * (In-general, your code should not reference these - * values directly). 
- */ -#if defined(TS_ppc440_lt) || defined(TS_ppc440_lt_sim) -#define MAX_CC_PER_SHELF 1 -#define MAX_CPU_PER_SLOT 1 -#define MAX_FRU_PER_SHELF 0 -#define MAX_SLOT_PER_SHELF 2 -#define MAX_PORT_PER_SLOT 12 -#elif defined(TS_ppc440_en_sun) || defined(TS_ppc440_fc_sun) || defined(TS_ppc440_fg_sun) -#define MAX_CC_PER_SHELF 1 -#define MAX_CPU_PER_SLOT 1 -#define MAX_FRU_PER_SHELF 0 -#define MAX_SLOT_PER_SHELF 1 -#define MAX_PORT_PER_SLOT 12 /* including internal ib ports */ -#elif defined(TS_ppc440_270sc) -#define MAX_CC_PER_SHELF 2 -#define MAX_CPU_PER_SLOT 1 -#define MAX_FRU_PER_SHELF 0 -#define MAX_SLOT_PER_SHELF 17 -#define MAX_PORT_PER_SLOT 12 -#elif defined(TS_ppc440_120sc) -#define MAX_CC_PER_SHELF 1 -#define MAX_CPU_PER_SLOT 1 -#define MAX_FRU_PER_SHELF 4 -#define MAX_SLOT_PER_SHELF 1 -#define MAX_PORT_PER_SLOT 24 -#elif defined(TS_ppc440_bldsc) -#define MAX_CC_PER_SHELF 1 -#define MAX_CPU_PER_SLOT 1 -#define MAX_FRU_PER_SHELF 0 -#define MAX_SLOT_PER_SHELF 1 -#define MAX_PORT_PER_SLOT 24 -#elif defined(TS_i386) -#define MAX_CC_PER_SHELF 2 /* simulation target */ -#define MAX_CPU_PER_SLOT 1 -#define MAX_FRU_PER_SHELF 0 -#define MAX_SLOT_PER_SHELF 16 -#define MAX_PORT_PER_SLOT 12 -#else -#define MAX_CC_PER_SHELF 2 /* default target */ -#define MAX_CPU_PER_SLOT 1 -#define MAX_FRU_PER_SHELF 0 -#define MAX_SLOT_PER_SHELF 16 -#define MAX_PORT_PER_SLOT 12 -#endif - - -/* - * Generic values that are safe to use by all devices - * and software modules. 
- */ -#define NUM_FRUS MAX_FRU_PER_SHELF - -#define MIN_SLOT_NUM 1 -#define MAX_SLOT_NUM MAX_SLOT_PER_SHELF - -#define MIN_PORT_NUM 1 -#define MAX_PORT_NUM MAX_PORT_PER_SLOT - -#define MIN_VLAN_NUM 1 -#define MAX_VLAN_NUM 127 - -#define MIN_BRIDGE_NUM 1 -#define MAX_BRIDGE_NUM 127 - -#define MIN_TRK_NUM 1 -#define MAX_TRK_NUM 127 - -#define MIN_GATEWAY_PORT_NUM 1 -#define MAX_GATEWAY_PORT_NUM 2 - -#define GATEWAY_PORT_NUM 0 -#define GATEWAY_2_PORT_NUM 63 - -#if defined(TS_ppc440_lt) || defined(TS_ppc440_lt_sim) -#define FIRST_SWITCH_CARD_SLOT 1 -#else -#define FIRST_SWITCH_CARD_SLOT 16 -#endif - -#define TS_CONTROLLER_CARD_NUMBER 1 - -#define PKT_TCP_PORT_NUM (MAX_PORT_NUM + 1) - - - -/* - * typedef section - */ - -/* * Common types used by all proprietary TopSpin code (native C types * should not be used). */ typedef int tBOOLEAN; -typedef void tVOID; -typedef void* tpVOID; typedef char tINT8; typedef unsigned char tUINT8; typedef short tINT16; @@ -187,27 +50,6 @@ typedef long long tINT64; typedef unsigned long long tUINT64; -typedef tUINT32 tSTG_ID; -typedef tUINT16 tVLAN_ID; -typedef tUINT8 tSHELF; /* 1 based value */ -typedef tUINT8 tSLOT; /* 1 based value */ -typedef tUINT8 tCPU; /* 1 based value */ -typedef tUINT8 tPORT; -typedef const void * tCONST_PTR; -typedef const char * tCONST_STR; -typedef tUINT32 tIFINDEX; - -typedef tUINT32 tIB_RKEY; -typedef tUINT16 tIB_PKEY; -typedef tUINT32 tIB_QKEY; -typedef tUINT8 tIB_GUID[8]; -typedef tUINT8 tIB_GID[16]; -typedef tUINT8 tIB_LID[2]; -typedef tUINT32 tIB_QPN; - -typedef tIB_GUID * tpIB_GUID; - - /* * Generic type for returning pass/fail information back from subroutines * Note that this is the *opposite* semantics from BOOLEAN. I.e. 
a zero @@ -221,82 +63,4 @@ } tSTATUS; -/* - * Used to store error codes defined in "all/common/include/error_codes.h" - */ -typedef unsigned int tERROR_CODE; - - - -/* MAC address */ -typedef struct -{ - union - { - tUINT8 MacAddrByte[MAX_ETHER_ADDR_LEN_IN_BYTE]; - tUINT16 MacAddrWord[MAX_ETHER_ADDR_LEN_IN_WORD]; - } u; - -} tMAC_ADDR; - -/* IP v4 address */ -typedef struct -{ - union - { - tUINT8 IpAddrByte[MAX_IP_ADDR_LEN_IN_BYTE]; - tUINT16 IpAddrWord[MAX_IP_ADDR_LEN_IN_WORD]; - tUINT32 IpAddr; - } u; - -} tIP_ADDR, *tpIP_ADDR; - -#define TS_GET_IP_ADDR32(addr) ((addr).u.IpAddr) -#define TS_GET_IP_ADDR_HOST32(addr) ntohl((addr).u.IpAddr) -#define TS_GET_IP_ADDR_NET32(addr) htonl((addr).u.IpAddr) - -struct ipoib_struct { - tIB_GID gid; - tUINT32 cap_flags_qpn; /* low 3 bytes=QPN */ -} __attribute__ ((packed)); - - -/* IP over IB physical address */ -typedef struct ipoib_struct tIPOIB_ADDR, *tpIPOIB_ADDR; - - -/* Ethernet II Frame Header */ -typedef struct EthHdr -{ - tMAC_ADDR eth_daddr; /* off=0 */ - tMAC_ADDR eth_saddr __attribute__ ((packed)); /* off=6 */ - tUINT16 eth_type __attribute__ ((packed)); /* off=12 */ - tUINT8 eth_data[0] __attribute__ ((packed)); /* off=14 */ - -} tETH_HDR; - -typedef struct valueDescPairSt -{ - tINT32 iValue; - char* sDesc; -} tValueDescPair, *tpValueDescPair; - - -/* - * Table entry status - */ -typedef enum -{ - TS_ENTRY_DESTROY = 0, /* this is a command not a state */ - TS_ENTRY_STANDBY = 1, - TS_ENTRY_ACTIVE = 2, - TS_ENTRY_CREATE = 3 /* this is a command not a state */ - -} tTS_TBL_ENTRY_STATUS; - - -#define TS_EN4P1G_NUM_PORTS 6 - -//#define sim_ppc440_bldsc // to be commented out - #endif /* _IB_LEGACY_TYPES_H */ From halr at voltaire.com Mon Aug 9 14:25:26 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 09 Aug 2004 17:25:26 -0400 Subject: [openib-general] mthca v. 
current ib_verbs Message-ID: <1092086728.1923.101.camel@localhost.localdomain> I'm sure you know about these but here are a few places where mthca is not yet moved over to the latest ib_verbs: *** Warning: "ib_destroy_qp" [drivers/infiniband/access/gsi.ko] undefined! *** Warning: "ib_get_special_qp" [drivers/infiniband/access/gsi.ko] undefined! *** Warning: "ib_modify_qp" [drivers/infiniband/access/gsi.ko] undefined! *** Warning: "ib_req_notify_cq" [drivers/infiniband/access/gsi.ko] undefined! *** Warning: "ib_poll_cq" [drivers/infiniband/access/gsi.ko] undefined! *** Warning: "ib_post_send" [drivers/infiniband/access/gsi.ko] undefined! *** Warning: "ib_post_recv" [drivers/infiniband/access/gsi.ko] undefined! -- Hal From roland at topspin.com Mon Aug 9 14:29:26 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 09 Aug 2004 14:29:26 -0700 Subject: [openib-general] mthca v. current ib_verbs In-Reply-To: <1092086728.1923.101.camel@localhost.localdomain> (Hal Rosenstock's message of "Mon, 09 Aug 2004 17:25:26 -0400") References: <1092086728.1923.101.camel@localhost.localdomain> Message-ID: <52wu0880g9.fsf@topspin.com> Hal> I'm sure you know about these but here are a few places where Hal> mthca is not yet moved over to the latest ib_verbs: Yes, as my stream of patches shows I've been updating mthca step by step, but obviously I'm not done yet. - Roland From mshefty at ichips.intel.com Mon Aug 9 13:28:55 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 9 Aug 2004 13:28:55 -0700 Subject: [openib-general] mthca v. 
current ib_verbs In-Reply-To: <1092086728.1923.101.camel@localhost.localdomain> References: <1092086728.1923.101.camel@localhost.localdomain> Message-ID: <20040809132855.2e88b6e4.mshefty@ichips.intel.com> On Mon, 09 Aug 2004 17:25:26 -0400 Hal Rosenstock wrote: > I'm sure you know about these but here are a few places where mthca is > not yet moved over to the latest ib_verbs: I believe that Roland's been migrating the functions in groups to minimize testing and ensure that the ipoib still runs. From halr at voltaire.com Mon Aug 9 14:43:39 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 09 Aug 2004 17:43:39 -0400 Subject: [openib-general] mthca v. current ib_verbs In-Reply-To: <52wu0880g9.fsf@topspin.com> References: <1092086728.1923.101.camel@localhost.localdomain> <52wu0880g9.fsf@topspin.com> Message-ID: <1092087821.1691.2.camel@localhost.localdomain> On Mon, 2004-08-09 at 17:29, Roland Dreier wrote: > Yes, as my stream of patches shows I've been updating mthca step by > step, but obviously I'm not done yet. I don't know your entire list or your order of tackling them so I just wanted to provide some input on at least my preference for which ones might come sooner. -- Hal From tduffy at sun.com Mon Aug 9 14:52:04 2004 From: tduffy at sun.com (Tom Duffy) Date: Mon, 09 Aug 2004 14:52:04 -0700 Subject: [openib-general] [PATCH] Kill more t??? typedefs Message-ID: <1092088324.14886.20.camel@duffman> This patch gets rid of all the t[U]INT* types in Roland's branch. Signed-by: Tom Duffy with permission from Sun legal. 
Index: drivers/infiniband/ulp/dapl/khash.c =================================================================== --- drivers/infiniband/ulp/dapl/khash.c (revision 615) +++ drivers/infiniband/ulp/dapl/khash.c (working copy) @@ -42,10 +42,10 @@ #define DAPL_CHECK_LT(value, bound, result) \ if ((bound) > (value)) return(result); -static inline tUINT32 _DaplKeyHash(DAPL_HASH_TABLE table, char *key) +static inline u32 _DaplKeyHash(DAPL_HASH_TABLE table, char *key) { - tUINT32 i; - tUINT32 s = 0; + u32 i; + u32 s = 0; for (i = 0; i < HASH_KEY_SIZE; i++) s += key[i]; @@ -57,12 +57,12 @@ /* Static functions for simple hash table bucket managment */ /* ------------------------------------------------------------------------- */ static kmem_cache_t *_bucket_cache = NULL; -static tINT32 _use_count = 0; +static s32 _use_count = 0; /* ========================================================================= */ /*.._DaplBucketCacheCreate -- create, if necessary, a bucket cache */ -static tINT32 _DaplBucketCacheCreate(void - ) { +static s32 _DaplBucketCacheCreate(void) +{ if (NULL == _bucket_cache) { DAPL_EXPECT((0 == _use_count)); @@ -83,8 +83,8 @@ /* ========================================================================= */ /*.._DaplBucketCacheDestroy -- destroy, if necessary, the bucket cache */ -static tINT32 _DaplBucketCacheDestroy(void - ) { +static s32 _DaplBucketCacheDestroy(void) +{ DAPL_CHECK_NULL(_bucket_cache, -EINVAL); DAPL_CHECK_LT(_use_count, 1, -EINVAL); @@ -100,8 +100,8 @@ /* ========================================================================= */ /*.._DaplBucketCacheGet -- get the bucket cache */ -static kmem_cache_t *_DaplBucketCacheGet(void - ) { +static kmem_cache_t *_DaplBucketCacheGet(void) +{ return _bucket_cache; } /* _DaplBucketCacheGet */ @@ -111,9 +111,10 @@ /* ========================================================================= */ /*..DaplHashTableCreate -- create the simple hash table */ -DAPL_HASH_TABLE DaplHashTableCreate(tINT32 size) { 
+DAPL_HASH_TABLE DaplHashTableCreate(s32 size) +{ DAPL_HASH_TABLE table; - tINT32 result; + s32 result; /* * round size down to a multiple of a bucket pointer. */ @@ -161,9 +162,10 @@ /* ========================================================================= */ /*..DaplHashTableDestroy -- destroy the simple hash table */ -tINT32 DaplHashTableDestroy(DAPL_HASH_TABLE table) { +s32 DaplHashTableDestroy(DAPL_HASH_TABLE table) +{ DAPL_HASH_BUCKET trav_bucket; - tINT32 counter; + s32 counter; DAPL_CHECK_NULL(table, -EINVAL); /* @@ -187,10 +189,10 @@ /* ========================================================================= */ /*..DaplHashTableInsert -- insert a value into the hash table */ -tINT32 DaplHashTableInsert - (DAPL_HASH_TABLE table, char *key, unsigned long value) { +s32 DaplHashTableInsert(DAPL_HASH_TABLE table, char *key, unsigned long value) +{ DAPL_HASH_BUCKET bucket; - tUINT32 offset; + u32 offset; DAPL_CHECK_NULL(_DaplBucketCacheGet(), -EINVAL); DAPL_CHECK_NULL(table, -EINVAL); @@ -234,10 +236,10 @@ /* ========================================================================= */ /*..DaplHashTableLookup -- lookup a value using it's key in the table */ -tINT32 DaplHashTableLookup - (DAPL_HASH_TABLE table, char *key, unsigned long *value) { +s32 DaplHashTableLookup(DAPL_HASH_TABLE table, char *key, unsigned long *value) +{ DAPL_HASH_BUCKET bucket; - tUINT32 offset; + u32 offset; DAPL_CHECK_NULL(table, -EINVAL); /* @@ -263,10 +265,10 @@ /* ========================================================================= */ /*..DaplHashTableRemove -- remove a src indexed entry from the table */ -tINT32 DaplHashTableRemove - (DAPL_HASH_TABLE table, char *key, unsigned long *value) { +s32 DaplHashTableRemove(DAPL_HASH_TABLE table, char *key, unsigned long *value) +{ DAPL_HASH_BUCKET bucket; - tUINT32 offset; + u32 offset; DAPL_CHECK_NULL(_DaplBucketCacheGet(), -EINVAL); DAPL_CHECK_NULL(table, -EINVAL); @@ -304,10 +306,11 @@ /* 
========================================================================= */ /*..DaplHashTableRemoveValue -- remove all entries with a given value */ -tINT32 DaplHashTableRemoveValue(DAPL_HASH_TABLE table, unsigned long value) { +s32 DaplHashTableRemoveValue(DAPL_HASH_TABLE table, unsigned long value) +{ DAPL_HASH_BUCKET next; DAPL_HASH_BUCKET bucket; - tINT32 offset; + s32 offset; DAPL_CHECK_NULL(_DaplBucketCacheGet(), -EINVAL); DAPL_CHECK_NULL(table, -EINVAL); @@ -345,15 +348,14 @@ /* ========================================================================= */ /*..DaplHashTableDump -- dump the contents of the hash table */ -tINT32 DaplHashTableDump - (DAPL_HASH_TABLE table, - DAPL_HASH_DUMP_FUNC dfunc, - char *buffer, tINT32 max_size, tINT32 start, tINT32 * end) { +s32 DaplHashTableDump(DAPL_HASH_TABLE table, DAPL_HASH_DUMP_FUNC dfunc, + char *buffer, s32 max_size, s32 start, s32 * end) +{ DAPL_HASH_BUCKET bucket; - tINT32 offset = 0; - tINT32 elements; - tINT32 counter; - tINT32 result; + s32 offset = 0; + s32 elements; + s32 counter; + s32 result; DAPL_CHECK_NULL(table, -EINVAL); DAPL_CHECK_NULL(buffer, -EINVAL); Index: drivers/infiniband/ulp/dapl/khash.h =================================================================== --- drivers/infiniband/ulp/dapl/khash.h (revision 615) +++ drivers/infiniband/ulp/dapl/khash.h (working copy) @@ -39,9 +39,9 @@ * bytes written into buffer, a negative return means that data will not * fit into max_size bytes. */ -typedef tINT32(*DAPL_HASH_DUMP_FUNC) (char *buffer, - tINT32 max_size, - char *key, tUINT32 value); +typedef s32(*DAPL_HASH_DUMP_FUNC) (char *buffer, + s32 max_size, + char *key, u32 value); /* * A simple hash table. 
*/ @@ -53,31 +53,31 @@ }; struct DAPL_HASH_TABLE_STRUCT { - tINT32 size; /* size of hash table */ - tINT32 num_entries; /* number of entries in hash table */ - tUINT64 mask; /* mask used for computing the hash */ - tINT32 num_collisions; /* number of collisions (useful for stats) */ + s32 size; /* size of hash table */ + s32 num_entries; /* number of entries in hash table */ + u64 mask; /* mask used for computing the hash */ + s32 num_collisions; /* number of collisions (useful for stats) */ DAPL_HASH_BUCKET *buckets; /* room for pointers to entries */ }; -DAPL_HASH_TABLE DaplHashTableCreate(tINT32 size); +DAPL_HASH_TABLE DaplHashTableCreate(s32 size); -tINT32 DaplHashTableDestroy(DAPL_HASH_TABLE table); +s32 DaplHashTableDestroy(DAPL_HASH_TABLE table); -tINT32 DaplHashTableInsert(DAPL_HASH_TABLE table, - char *key, unsigned long value); +s32 DaplHashTableInsert(DAPL_HASH_TABLE table, + char *key, unsigned long value); -tINT32 DaplHashTableLookup(DAPL_HASH_TABLE table, - char *key, unsigned long *value); +s32 DaplHashTableLookup(DAPL_HASH_TABLE table, + char *key, unsigned long *value); -tINT32 DaplHashTableRemove(DAPL_HASH_TABLE table, - char *key, unsigned long *value); +s32 DaplHashTableRemove(DAPL_HASH_TABLE table, + char *key, unsigned long *value); -tINT32 DaplHashTableRemoveValue(DAPL_HASH_TABLE table, unsigned long value); +s32 DaplHashTableRemoveValue(DAPL_HASH_TABLE table, unsigned long value); -tINT32 DaplHashTableDump(DAPL_HASH_TABLE table, - DAPL_HASH_DUMP_FUNC dfunc, - char *buffer, - tINT32 max_size, tINT32 start, tINT32 * end); +s32 DaplHashTableDump(DAPL_HASH_TABLE table, + DAPL_HASH_DUMP_FUNC dfunc, + char *buffer, + s32 max_size, s32 start, s32 *end); #endif /* _KHASH_H */ Index: drivers/infiniband/ulp/dapl/udapl_mod.c =================================================================== --- drivers/infiniband/ulp/dapl/udapl_mod.c (revision 615) +++ drivers/infiniband/ulp/dapl/udapl_mod.c (working copy) @@ -58,9 +58,9 @@ #define SMR_DB_SIZE 256 /* 
Keep this a power of 2. Max is 4096 */ #define MAX_SMR_PER_PROCESS 256 #define MAX_MRH_PER_SMR 512 -#define MRH_INDEX_INVALID ((tINT32) ~0) +#define MRH_INDEX_INVALID ((s32) ~0) #define MAX_WO_PER_PROCESS 256 -#define TS_TIMEOUT_INFINITE ((tINT32) ~0) +#define TS_TIMEOUT_INFINITE ((s32) ~0) #define SMR_COOKIE_SIZE 40 #define ATS_TIMEOUT (2*HZ) #define ATS_RETRIES 15 @@ -78,7 +78,7 @@ struct semaphore sem; struct ib_path_record *user_path_record; struct ib_path_record path_record; - tINT32 status; + s32 status; pr_entry_t *next; }; @@ -86,8 +86,8 @@ struct ats_entry_s { struct semaphore sem; - uint8_t info[16]; - tINT32 status; + u8 info[16]; + s32 status; ats_entry_t *next; }; @@ -95,16 +95,16 @@ struct ats_cache_rec_s { unsigned long created; - uint8_t ip_addr[16]; - uint8_t gid[16]; + u8 ip_addr[16]; + u8 gid[16]; }; typedef struct ats_advert_s ats_advert_t; struct ats_advert_s { - uint8_t ip_addr[16]; - uint8_t gid[16]; - tUINT32 set_flag; + u8 ip_addr[16]; + u8 gid[16]; + u32 set_flag; struct ib_device *ts_hca_handle; }; @@ -121,8 +121,8 @@ struct smr_rec_s { VAPI_mr_hndl_t *mrh_array; - tINT32 ref_count; - tINT32 initialized; + s32 ref_count; + s32 initialized; char cookie[SMR_COOKIE_SIZE]; smr_rec_t *next; }; @@ -131,7 +131,7 @@ struct smr_clean_info_s { smr_rec_t *smr_rec; - tUINT32 mrh_index; + u32 mrh_index; }; typedef struct resources_s resources_t; @@ -139,7 +139,7 @@ struct resources_s { wo_entry_t **wo_entries; smr_clean_info_t *smr_clean_info; - tUINT32 shmem_sem_flag; + u32 shmem_sem_flag; }; static int udapl_major_number = 245; @@ -161,7 +161,7 @@ static spinlock_t ats_advert_lock = SPIN_LOCK_UNLOCKED; static ats_cache_rec_t ats_cache[ATS_CACHE_SIZE]; -static tUINT32 ats_cache_last; +static u32 ats_cache_last; static spinlock_t ats_cache_lock = SPIN_LOCK_UNLOCKED; static void *pr_area; @@ -582,7 +582,7 @@ /* ATS processing */ /* ------------------------------------------------------------------------- */ -static tUINT32 find_hca(struct ib_device 
*ts_hca_handle) +static u32 find_hca(struct ib_device *ts_hca_handle) { int i; @@ -595,10 +595,10 @@ return i; } -static void ats_cache_insert(uint8_t * ip_addr, uint8_t * gid, tINT32 what) +static void ats_cache_insert(u8 *ip_addr, u8 *gid, s32 what) { unsigned long flags; - tUINT32 entry_n; + u32 entry_n; spin_lock_irqsave(&ats_cache_lock, flags); @@ -652,10 +652,10 @@ return; } -static tINT32 ats_cache_lookup(uint8_t * ip_addr, uint8_t * gid, tINT32 what) +static s32 ats_cache_lookup(u8 *ip_addr, u8 *gid, s32 what) { unsigned long flags; - tUINT32 entry_n; + u32 entry_n; spin_lock_irqsave(&ats_cache_lock, flags); @@ -790,13 +790,13 @@ } -static tINT32 ats_gid_lookup(uint8_t * ip_addr, uint8_t * gid, - void *ts_hca_handle, uint8_t port) +static s32 ats_gid_lookup(u8 *ip_addr, u8 *gid, + void *ts_hca_handle, u8 port) { - tINT32 status; + s32 status; ats_entry_t *ats_entry; tTS_IB_CLIENT_QUERY_TID tid; - tUINT32 count; + u32 count; TS_REPORT_STAGE(MOD_UDAPL, "Looking up GID for IP: %hd.%hd.%hd.%hd.%hd.%hd.%hd.%hd.%hd.%hd.%hd.%hd.%hd.%hd.%hd.%hd on port %hd", @@ -940,15 +940,15 @@ } -static tINT32 ats_ipaddr_lookup(struct file *fp, unsigned long arg) +static s32 ats_ipaddr_lookup(struct file *fp, unsigned long arg) { ats_ipaddr_lookup_param_t param; - tINT32 status; - uint8_t gid[16]; - uint8_t ip_addr[16]; + s32 status; + u8 gid[16]; + u8 ip_addr[16]; ats_entry_t *ats_entry; tTS_IB_CLIENT_QUERY_TID tid; - tUINT32 count; + u32 count; if (!arg) { @@ -1116,13 +1116,13 @@ return; } -static tINT32 ats_do_set_ipaddr(uint8_t * ip_addr, uint8_t * gid, - void *ts_hca_handle, uint8_t port) +static s32 ats_do_set_ipaddr(u8 *ip_addr, u8 *gid, + void *ts_hca_handle, u8 port) { ats_entry_t *ats_entry; tTS_IB_CLIENT_QUERY_TID tid; - tUINT32 count; - tINT32 status; + u32 count; + s32 status; ats_entry = get_free_ats_entry(); @@ -1213,14 +1213,14 @@ return status; } -static tINT32 ats_set_ipaddr(struct file *fp, unsigned long arg) +static s32 ats_set_ipaddr(struct file *fp, 
unsigned long arg) { ats_set_ipaddr_param_t param; - tINT32 status; - uint8_t gid[16]; - uint8_t ip_addr[16]; + s32 status; + u8 gid[16]; + u8 ip_addr[16]; unsigned long flags; - tUINT32 hca_i, port_i; + u32 hca_i, port_i; if (!arg) { @@ -1312,9 +1312,9 @@ static void async_event_handler(struct ib_async_event_record *event, void *arg) { - tINT32 status; + s32 status; unsigned long flags; - tUINT32 hca_i, port_i; + u32 hca_i, port_i; switch (event->event) { @@ -1426,11 +1426,11 @@ return; } -static tINT32 gid_path_record_comp(tIP2PR_PATH_LOOKUP_ID plid, - tINT32 status, - tTS_IB_PORT hw_port, - struct ib_device *ca, - struct ib_path_record *path, void *usr_arg) +static s32 gid_path_record_comp(tIP2PR_PATH_LOOKUP_ID plid, + s32 status, + tTS_IB_PORT hw_port, + struct ib_device *ca, + struct ib_path_record *path, void *usr_arg) { pr_entry_t *pr_entry; @@ -1473,13 +1473,13 @@ return -1; } -static tINT32 ip_path_record_comp(tIP2PR_PATH_LOOKUP_ID plid, - tINT32 status, - tUINT32 src_addr, - tUINT32 dst_addr, - tTS_IB_PORT hw_port, - struct ib_device *ca, - struct ib_path_record *path, void *usr_arg) +static s32 ip_path_record_comp(tIP2PR_PATH_LOOKUP_ID plid, + s32 status, + u32 src_addr, + u32 dst_addr, + tTS_IB_PORT hw_port, + struct ib_device *ca, + struct ib_path_record *path, void *usr_arg) { pr_entry_t *pr_entry; @@ -1522,17 +1522,17 @@ return -1; } -static tINT32 get_path_record(struct file *fp, unsigned long arg) +static s32 get_path_record(struct file *fp, unsigned long arg) { path_record_param_t param; - tINT32 status; + s32 status; pr_entry_t *pr_entry; tIP2PR_PATH_LOOKUP_ID plid; - uint8_t dst_ip_addr[16]; - uint8_t src_gid[16]; - uint8_t dst_gid[16]; + u8 dst_ip_addr[16]; + u8 src_gid[16]; + u8 dst_gid[16]; u16 pkey; - tUINT32 dst_addr; + u32 dst_addr; if (!arg) { @@ -1742,7 +1742,7 @@ { unsigned long flags; smr_rec_t *smr_rec; - tUINT32 i; + u32 i; TS_ENTER(MOD_UDAPL); @@ -1781,14 +1781,14 @@ return smr_rec; } -static tINT32 smr_insert(struct file *fp, 
unsigned long arg) +static s32 smr_insert(struct file *fp, unsigned long arg) { smr_insert_param_t param; - tINT32 status; + s32 status; unsigned long flags; smr_rec_t *smr_rec; smr_clean_info_t *smr_clean_info; - tUINT32 smr_n; + u32 smr_n; TS_ENTER(MOD_UDAPL); @@ -1887,15 +1887,15 @@ return status; } -static tINT32 smr_add_mrh(struct file *fp, unsigned long arg) +static s32 smr_add_mrh(struct file *fp, unsigned long arg) { smr_add_mrh_param_t param; - tINT32 status; + s32 status; unsigned long flags; smr_rec_t *smr_rec; smr_clean_info_t *smr_clean_info; - tUINT32 i; - tUINT32 smr_n; + u32 i; + u32 smr_n; char cookie[SMR_COOKIE_SIZE]; char cookie_str[SMR_COOKIE_SIZE + 1]; @@ -1996,15 +1996,15 @@ return status; } -static tINT32 smr_del_mrh(struct file *fp, unsigned long arg) +static s32 smr_del_mrh(struct file *fp, unsigned long arg) { smr_del_mrh_param_t param; - tINT32 status; + s32 status; unsigned long flags; smr_rec_t *smr_rec; smr_clean_info_t *smr_clean_info; - tUINT32 i; - tUINT32 smr_n; + u32 i; + u32 smr_n; char cookie[SMR_COOKIE_SIZE]; char cookie_str[SMR_COOKIE_SIZE + 1]; @@ -2113,15 +2113,15 @@ return status; } -static tINT32 smr_query(struct file *fp, unsigned long arg) +static s32 smr_query(struct file *fp, unsigned long arg) { smr_query_param_t param; - tINT32 status; + s32 status; unsigned long flags; smr_rec_t *smr_rec; smr_clean_info_t *smr_clean_info; - tUINT32 smr_n; - tUINT32 i; + u32 smr_n; + u32 i; char cookie[SMR_COOKIE_SIZE]; char cookie_str[SMR_COOKIE_SIZE + 1]; @@ -2228,14 +2228,14 @@ return status; } -static tINT32 smr_dec(struct file *fp, unsigned long arg) +static s32 smr_dec(struct file *fp, unsigned long arg) { smr_insert_param_t param; - tINT32 status; + s32 status; unsigned long flags; smr_rec_t *smr_rec; smr_clean_info_t *smr_clean_info; - tUINT32 smr_n; + u32 smr_n; char cookie[SMR_COOKIE_SIZE]; char cookie_str[SMR_COOKIE_SIZE + 1]; @@ -2343,10 +2343,10 @@ static void smr_clean(smr_clean_info_t * smr_clean_info) { - tUINT32 
smr_n; + u32 smr_n; unsigned long flags; smr_rec_t *cur_smr; - tINT32 status; + s32 status; TS_ENTER(MOD_UDAPL); @@ -2401,7 +2401,7 @@ spin_unlock_irqrestore(&smr_db_lock, flags); } -static tINT32 smr_mutex_lock(struct file *fp) +static s32 smr_mutex_lock(struct file *fp) { TS_ENTER(MOD_UDAPL); @@ -2411,7 +2411,7 @@ return 0; } -static tINT32 smr_mutex_unlock(struct file *fp) +static s32 smr_mutex_unlock(struct file *fp) { TS_ENTER(MOD_UDAPL); @@ -2429,13 +2429,13 @@ /* Misc. helper ioctls */ /* ------------------------------------------------------------------------- */ -static tINT32 get_hca_ipaddr(struct file *fp, unsigned long arg) +static s32 get_hca_ipaddr(struct file *fp, unsigned long arg) { get_hca_ipaddr_param_t param; - tINT32 status; - uint8_t gid[16]; - uint8_t dev_gid[16]; - tINT32 i; + s32 status; + u8 gid[16]; + u8 dev_gid[16]; + s32 i; struct net_device *dev; struct in_device *inet_dev; struct ib_device *ca; @@ -2645,7 +2645,7 @@ static int udapl_ioctl(struct inode *inode, struct file *fp, unsigned int cmd, unsigned long arg) { - tINT32 status; + s32 status; if ((_IOC_TYPE(cmd) != T_IOC_MAGIC) || (_IOC_NR(cmd) > T_IOC_MAXNR) || (fp->private_data == NULL)) { @@ -2841,7 +2841,7 @@ } -static tINT32 reg_dev(void) +static s32 reg_dev(void) { static struct file_operations udapl_fops = { owner:THIS_MODULE, @@ -2850,7 +2850,7 @@ read:udapl_read, release:udapl_close, }; - tINT32 result; + s32 result; result = register_chrdev(udapl_major_number, UDAPL_DEVNAME, &udapl_fops); @@ -2875,8 +2875,8 @@ static int __init udapl_init_module(void) { - tINT32 status; - tINT32 entry_n; + s32 status; + s32 entry_n; pr_entry_t *pr_entry; ats_entry_t *ats_entry; wo_entry_t *wo_entry; @@ -3029,7 +3029,7 @@ static void udapl_cleanup_module(void) { - tINT32 status; + s32 status; kfree(wo_area); kfree(pr_area); Index: drivers/infiniband/ulp/dapl/udapl_mod.h =================================================================== --- drivers/infiniband/ulp/dapl/udapl_mod.h (revision 
615) +++ drivers/infiniband/ulp/dapl/udapl_mod.h (working copy) @@ -59,17 +59,17 @@ } clear_comp_eventh_param_t; typedef struct path_record_param_s { - uint8_t *dst_ip_addr; - uint8_t *src_gid; + u8 *dst_ip_addr; + u8 *src_gid; void *ts_hca_handle; - uint8_t port; - tUINT32 ats_flag; + u8 port; + u32 ats_flag; struct ib_path_record *path_record; } path_record_param_t; typedef struct smr_insert_param_s { void *cookie; - tUINT32 exists; + u32 exists; } smr_insert_param_t; typedef struct smr_add_mrh_param_s { @@ -84,7 +84,7 @@ typedef struct smr_query_param_s { void *cookie; - tUINT32 ready; + u32 ready; VAPI_mr_hndl_t mr_handle; } smr_query_param_t; @@ -93,22 +93,22 @@ } smr_dec_param_t; typedef struct ats_ipaddr_lookup_param_s { - uint8_t *gid; + u8 *gid; void *ts_hca_handle; - uint8_t port; - uint8_t *ip_addr; + u8 port; + u8 *ip_addr; } ats_ipaddr_lookup_param_t; typedef struct ats_set_ipaddr_param_s { - uint8_t *gid; + u8 *gid; void *ts_hca_handle; - uint8_t port; - uint8_t *ip_addr; + u8 port; + u8 *ip_addr; } ats_set_ipaddr_param_t; typedef struct get_hca_ipaddr_param_s { - uint8_t *gid; - uint32_t ip_addr; + u8 *gid; + u32 ip_addr; } get_hca_ipaddr_param_t; #define T_IOC_MAGIC 92 Index: drivers/infiniband/ulp/ipoib/ip2pr_mod.c =================================================================== --- drivers/infiniband/ulp/ipoib/ip2pr_mod.c (revision 615) +++ drivers/infiniband/ulp/ipoib/ip2pr_mod.c (working copy) @@ -27,19 +27,19 @@ MODULE_DESCRIPTION("IB path record lookup module"); MODULE_LICENSE("Dual BSD/GPL"); -extern tINT32 tsIp2prLinkAddrInit(void +extern s32 tsIp2prLinkAddrInit(void ); -extern tINT32 tsIp2prLinkAddrCleanup(void +extern s32 tsIp2prLinkAddrCleanup(void ); -extern tINT32 _tsIp2prUserLookup(unsigned long arg); -extern tINT32 _tsGid2prUserLookup(unsigned long arg); -extern tINT32 tsIp2prProcFsInit(void +extern s32 tsIp2prProcFsInit(void ); 
-extern tINT32 tsIp2prProcFsCleanup(void +extern s32 tsIp2prProcFsCleanup(void ); -extern tINT32 tsIp2prSrcGidInit(void +extern s32 tsIp2prSrcGidInit(void ); -extern tINT32 tsIp2prSrcGidCleanup(void +extern s32 tsIp2prSrcGidCleanup(void ); static int ip2pr_major_number = 240; @@ -103,7 +103,7 @@ /*..prlookup_init -- initialize the PathRecord Lookup host module */ int __init tsIp2prDriverInitModule(void ) { - tINT32 result = 0; + s32 result = 0; TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_INOUT, "INIT: Path Record Lookup module load."); Index: drivers/infiniband/ulp/srp/srp_host.c =================================================================== --- drivers/infiniband/ulp/srp/srp_host.c (revision 615) +++ drivers/infiniband/ulp/srp/srp_host.c (working copy) @@ -26,7 +26,7 @@ MODULE_PARM(dlid_conf, "i"); MODULE_PARM_DESC(dlid_conf, "dlid_conf (nonzero value indicates that dlid is being specified in the conf file)"); -tUINT32 dlid_conf = 0; +u32 dlid_conf = 0; MODULE_PARM(service_str, "s"); MODULE_PARM_DESC(service_str, "used with dlid_conf, in stand alone systems"); @@ -35,7 +35,7 @@ MODULE_PARM(ib_ports_mask, "i"); MODULE_PARM_DESC(ib_ports_mask, "bit mask to enable or disable SRP usage of local IB ports 1 - use port1 only, 2 - use port2 only, 3 - use both ports"); -tUINT32 ib_ports_mask = 0xffffffff; +u32 ib_ports_mask = 0xffffffff; MODULE_PARM(max_luns, "i"); MODULE_PARM_DESC(max_luns, @@ -67,7 +67,7 @@ MODULE_PARM(srp_discovery_timeout, "i"); MODULE_PARM_DESC(srp_discovery_timeout, "timeout (in seconds) for SRP discovery (of SRP targets) to complete"); -tUINT32 srp_discovery_timeout = IB_DISCOVERY_TIMEOUT; /* 60 seconds */ +u32 srp_discovery_timeout = IB_DISCOVERY_TIMEOUT; /* 60 seconds */ MODULE_PARM(fmr_cache, "i"); MODULE_PARM_DESC(fmr_cache, "size of cached fmr entries"); @@ -406,7 +406,7 @@ } else if (srp_pkt->in_use == TRUE) { srp_pkt->scatter_gather_list.address = - (uint64_t) (unsigned long)srp_pkt->data; + (u64) (unsigned long)srp_pkt->data; 
srp_pkt->scatter_gather_list.length = srp_cmd_pkt_size; srp_pkt->in_use = FALSE; srp_pkt->conn = INVALID_CONN_HANDLE; @@ -437,7 +437,7 @@ } else if (srp_pkt->in_use == TRUE) { srp_pkt->scatter_gather_list.address = - (uint64_t) (unsigned long)srp_pkt->data; + (u64) (unsigned long)srp_pkt->data; srp_pkt->scatter_gather_list.length = srp_cmd_pkt_size; srp_pkt->in_use = FALSE; srp_pkt->conn = INVALID_CONN_HANDLE; @@ -632,7 +632,7 @@ srp_pkt->target = target; srp_pkt->data = srp_pkt_data; srp_pkt->scatter_gather_list.address = - (uint64_t) (unsigned long) srp_pkt_data; + (u64) (unsigned long) srp_pkt_data; srp_pkt->scatter_gather_list.length = srp_cmd_pkt_size; if (pkt_num == max_num_pkts - 1) @@ -815,7 +815,7 @@ * Reclaim memory registrations */ if (ioq->sr_list) { - uint32_t sr_list_index = 0; + u32 sr_list_index = 0; for (; sr_list_index < ioq->sr_list_length; sr_list_index++) { srptp_dereg_phys_host_buf(ioq->sr_list + @@ -1638,8 +1638,8 @@ int i; srp_target_t *target; int not_first_entry = FALSE; - uint8_t *gid; - uint8_t *ioc_guid; + u8 *gid; + u8 *ioc_guid; char *buf; if (inout == TRUE) { @@ -1917,7 +1917,7 @@ { srp_host_conn_t *s; srp_resp_t *resphdr; - tUINT32 resp_code; + u32 resp_code; ioq_t *ioq = NULL; ioq_t *next_ioq = NULL; ioq_t *cmd_ioq; @@ -1927,7 +1927,7 @@ #if DBG_IGNORE_WRITE resphdr = (srp_resp_t *) pkt->data; - ioq = (ioq_t *) (tUINT32) (be64_to_cpu(resphdr->tag)); + ioq = (ioq_t *) (u32) (be64_to_cpu(resphdr->tag)); if (ioq->req) { scsi_cmnd = (Scsi_Cmnd *) ioq->req; if (scsi_cmnd->cmnd[0] == 0x2A) { @@ -1968,7 +1968,7 @@ * does not occur. 
*/ TS_REPORT_WARN(MOD_SRPTP, "NULL request queue, for tag %x", - (tUINT32) resphdr->tag); + (u32) resphdr->tag); spin_unlock_irqrestore(&target->spin_lock, cpu_flags); return 0; @@ -2021,7 +2021,7 @@ if (resphdr->status.bit.rspvalid) { /* response data */ - resp_code = ((tUINT8 *) (resphdr + 1))[3]; + resp_code = ((u8 *) (resphdr + 1))[3]; switch (resp_code) { case NO_FAILURE: @@ -2379,7 +2379,7 @@ srp_host_hca_params_t *hca = target->port->hca; if (ioq->sr_list) { - uint32_t sr_list_index = 0; + u32 sr_list_index = 0; for (; sr_list_index < ioq->sr_list_length; sr_list_index++) { srptp_dereg_phys_host_buf(ioq->sr_list + @@ -2462,7 +2462,7 @@ if (status) { /* we have a problem posting, disconnect, never should happen */ if (ioq->sr_list) { - uint32_t sr_list_index = 0; + u32 sr_list_index = 0; for (; sr_list_index < ioq->sr_list_length; sr_list_index++) { srptp_dereg_phys_host_buf(ioq->sr_list + @@ -2760,7 +2760,7 @@ TS_REPORT_STAGE(MOD_SRPTP, "Sending Login request for conn %p", s); TS_REPORT_STAGE(MOD_SRPTP, "SRP Initiator GUID: %llx", - be64_to_cpu(*(tUINT64 *) & s->port->hca->I_PORT_ID[8])); + be64_to_cpu(*(u64 *) & s->port->hca->I_PORT_ID[8])); s->login_buff_len = sizeof(srp_login_req_t); header = (srp_login_req_t *) s->login_buff; @@ -2897,7 +2897,7 @@ TS_REPORT_STAGE(MOD_SRPTP, "sense_buffer[%d] %x", i, - *((tUINT32 *) & + *((u32 *) & sense_buffer[i])); } TS_REPORT_STAGE(MOD_SRPTP, @@ -2925,7 +2925,7 @@ * we are going to need to free the response packet, and structures * to point to the host buffers */ if (cmnd->request_bufflen) { - uint32_t sr_list_index = 0; + u32 sr_list_index = 0; sr_list = ioq->sr_list; @@ -2960,13 +2960,13 @@ srp_remote_buf_t *curr_buff_descriptor = &header->partial_memory_descriptor_list[0]; - uint32_t total_length = 0; + u32 total_length = 0; dma_addr_t curr_dma_addr, base_dma_addr; - tUINT32 curr_registration_length = 0, curr_dma_length = 0; + u32 curr_registration_length = 0, curr_dma_length = 0; - uint64_t *dma_addr_list; - 
tUINT32 dma_addr_index = 0; + u64 *dma_addr_list; + u32 dma_addr_index = 0; int status; srp_host_buf_t *sr_list; @@ -2985,8 +2985,8 @@ sr_list->data = NULL; - dma_addr_list = (uint64_t *) - kmalloc(sizeof(uint64_t) * + dma_addr_list = (u64 *) + kmalloc(sizeof(u64) * ((max_xfer_sectors_per_io * 512 / PAGE_SIZE) + 2), GFP_ATOMIC); if (dma_addr_list == NULL) { @@ -3043,7 +3043,7 @@ * Register the current region */ sr_list->size = curr_registration_length; - sr_list->r_addr = (uint64_t) (unsigned long)sr_list->data; + sr_list->r_addr = (u64) (unsigned long)sr_list->data; status = srptp_register_memory(srp_pkt->conn, sr_list, @@ -3122,9 +3122,9 @@ header->total_length = cpu_to_be32(total_length); if (srp_cmd_frame->dofmt) - srp_cmd_frame->docount = (tUINT8) (*sr_list_length); + srp_cmd_frame->docount = (u8) (*sr_list_length); else if (srp_cmd_frame->difmt) - srp_cmd_frame->dicount = (tUINT8) (*sr_list_length); + srp_cmd_frame->dicount = (u8) (*sr_list_length); else srp_cmd_frame->dicount = srp_cmd_frame->docount = 0; @@ -3150,8 +3150,8 @@ int sg_cnt; int num_sg_elements; int offset, max_phys_pages, page_offset; - uint64_t *phys_buffer_list; - uint64_t new_phys_page, old_phys_page; + u64 *phys_buffer_list; + u64 new_phys_page, old_phys_page; int status, buf_len, num_phys_pages, old_buf_len; TS_REPORT_DATA(MOD_SRPTP, "sg cnt = %d buffer %p phys %lx", @@ -3207,7 +3207,7 @@ sr_list->data = (void *)(unsigned long) sg_dma_address(&st_buffer[0]); sr_list->r_addr = (unsigned long) sr_list->data; - sr_list->size = (uint32_t) cmnd->request_bufflen; + sr_list->size = (u32) cmnd->request_bufflen; /* * compute the number of physical pages @@ -3220,7 +3220,7 @@ */ max_phys_pages = cmnd->request_bufflen / PAGE_SIZE; page_offset = - (uint32_t) sg_dma_address(&st_buffer[0]) & (PAGE_SIZE - 1); + (u32) sg_dma_address(&st_buffer[0]) & (PAGE_SIZE - 1); if (page_offset) { max_phys_pages++; if ((PAGE_SIZE - page_offset) < @@ -3231,7 +3231,7 @@ } phys_buffer_list = - (uint64_t *) 
kmalloc(sizeof(uint64_t) * max_phys_pages, GFP_ATOMIC); + (u64 *) kmalloc(sizeof(u64) * max_phys_pages, GFP_ATOMIC); if (phys_buffer_list == NULL) { TS_REPORT_WARN(MOD_SRPTP, "phys buffer list allocation failed"); kfree(sr_list); @@ -3265,7 +3265,7 @@ */ for (sg_cnt = 0; sg_cnt < num_sg_elements; sg_cnt++) { - new_phys_page = (uint32_t) sg_dma_address(&st_buffer[sg_cnt]); + new_phys_page = (u32) sg_dma_address(&st_buffer[sg_cnt]); buf_len = sg_dma_len(&st_buffer[sg_cnt]); TS_REPORT_DATA(MOD_SRPTP, "virtual[%x] %llx len %x", sg_cnt, @@ -3274,7 +3274,7 @@ for (sg_cnt = 0; sg_cnt < num_sg_elements; sg_cnt++) { - new_phys_page = (uint32_t) sg_dma_address(&st_buffer[sg_cnt]); + new_phys_page = (u32) sg_dma_address(&st_buffer[sg_cnt]); buf_len = sg_dma_len(&st_buffer[sg_cnt]); TS_REPORT_DATA(MOD_SRPTP, "virtual[%x] %llx len %x", sg_cnt, @@ -3313,7 +3313,7 @@ "aligned at end and not last"); TS_REPORT_FATAL(MOD_SRPTP, "next addr 0x%x len 0x%x", - (uint32_t) (sg_dma_address + (u32) (sg_dma_address (&st_buffer [sg_cnt + 1])), sg_dma_len(&st_buffer Index: drivers/infiniband/ulp/srp/srp_dm.c =================================================================== --- drivers/infiniband/ulp/srp/srp_dm.c (revision 615) +++ drivers/infiniband/ulp/srp/srp_dm.c (working copy) @@ -53,7 +53,7 @@ if (ioc_table[i].valid == FALSE) { TS_REPORT_STAGE(MOD_SRPTP, "Creating IOC Entry %d for 0x%llx", i, - be64_to_cpu(*(uint64_t *) guid)); + be64_to_cpu(*(u64 *) guid)); memcpy(ioc_table[i].guid, guid, sizeof(tTS_IB_GUID)); ioc_table[i].valid = TRUE; @@ -97,7 +97,7 @@ if (path_available == FALSE) { TS_REPORT_WARN(MOD_SRPTP, "IOC GUID %llx, no available paths", - be64_to_cpu(*(uint64_t *) ioc->guid)); + be64_to_cpu(*(u64 *) ioc->guid)); /* * no paths available to this IOC, let's remove it from our @@ -331,7 +331,7 @@ return (TS_FAIL); } -int srp_find_query(srp_host_port_params_t * port, uint8_t * gid) +int srp_find_query(srp_host_port_params_t *port, u8 *gid) { unsigned long cpu_flags; struct 
list_head *temp_entry; @@ -403,8 +403,8 @@ { struct list_head *cur; char *service_name_str; - uint64_t service_id; - uint64_t service_name_cpu_endian, service_name; + u64 service_id; + u64 service_name_cpu_endian, service_name; int ioc_index = 0; int svc_index = 0; ioc_entry_t *ioc_entry; @@ -436,8 +436,8 @@ service_name = cpu_to_be64(service_name_cpu_endian); service_id = - be64_to_cpu(*(uint64_t *) io_svc->svc_entry.service_id); - if (service_id != ((uint64_t) (SRP_SERVICE_ID))) { + be64_to_cpu(*(u64 *) io_svc->svc_entry.service_id); + if (service_id != ((u64) (SRP_SERVICE_ID))) { TS_REPORT_WARN(MOD_SRPTP, "Invalid service id 0x%llx " "(expected 0x%llx) from DM Client\n", @@ -454,7 +454,7 @@ TS_REPORT_STAGE(MOD_SRPTP, "IOC not found %llx, creating new " "IOC entry", - be64_to_cpu(*(uint64_t *) io_svc-> + be64_to_cpu(*(u64 *) io_svc-> controller_guid)); status = @@ -714,7 +714,7 @@ static struct ib_path_record *srp_empty_path_record_cache(void) { - uint32_t index; + u32 index; struct ib_path_record unused_path_record; @@ -736,10 +736,9 @@ return NULL; } -static void srp_flush_path_record_cache(srp_host_port_params_t * port, - uint8_t * gid) +static void srp_flush_path_record_cache(srp_host_port_params_t *port, u8 *gid) { - uint32_t index; + u32 index; unsigned long cpu_flags; TS_REPORT_STAGE(MOD_SRPTP, "Flushing path record cache"); @@ -951,7 +950,7 @@ struct ib_device *dev_hndl, tTS_IB_PORT local_port, u16 io_port_lid, void *arg) { - uint8_t *notified_port_gid = notice->detail.sm_trap.gid; + u8 *notified_port_gid = notice->detail.sm_trap.gid; int hca_index; srp_host_hca_params_t *hca; srp_host_port_params_t *port; @@ -1043,7 +1042,7 @@ { srp_host_port_params_t *srp_port = (srp_host_port_params_t *) arg; - uint8_t *notified_port_gid = notice->detail.sm_trap.gid; + u8 *notified_port_gid = notice->detail.sm_trap.gid; srp_query_entry_t *query_entry; int status; @@ -1128,10 +1127,10 @@ tTS_IB_PORT local_port, void *arg) { - uint8_t *notified_port_gid = 
notice->detail.sm_trap.gid; + u8 *notified_port_gid = notice->detail.sm_trap.gid; srp_host_port_params_t *port = (srp_host_port_params_t *) arg; srp_target_t *target; - uint8_t *r; + u8 *r; TS_REPORT_WARN(MOD_SRPTP, "Lost GID %02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x on HCA %d Port %d", @@ -1469,7 +1468,7 @@ case IB_PORT_ERROR: { - tUINT32 i; + u32 i; int ioc_index; srp_target_t *target; unsigned long cpu_flags; Index: drivers/infiniband/ulp/srp/srp_host.h =================================================================== --- drivers/infiniband/ulp/srp/srp_host.h (revision 615) +++ drivers/infiniband/ulp/srp/srp_host.h (working copy) @@ -138,7 +138,7 @@ /* protection domain handle */ struct ib_pd *pd_hndl; - tUINT8 I_PORT_ID[16]; + u8 I_PORT_ID[16]; struct ib_fmr_pool *fmr_pool; @@ -175,7 +175,7 @@ tTS_KERNEL_THREAD thread; - tUINT8 I_PORT_ID[16]; + u8 I_PORT_ID[16]; int dm_shutdown; @@ -232,7 +232,7 @@ /* * --------------------------- */ - uint64_t r_addr; /* RDMA buffer address to be used by the + u64 r_addr; /* RDMA buffer address to be used by the * target */ u32 r_key; struct ib_fmr *mr_hndl; /* buffer's memory handle */ @@ -329,15 +329,15 @@ int retry_count; - uint8_t login_buff[256]; + u8 login_buff[256]; int login_buff_len; - uint8_t login_resp_data[256]; + u8 login_resp_data[256]; int login_resp_len; - uint8_t redirected_port_gid[16]; + u8 redirected_port_gid[16]; tTS_IB_CM_COMM_ID comm_id; @@ -362,7 +362,7 @@ * Data Buffers Mgmt */ struct list_head data_buffers_free_list; - uint8_t *data_buffers_vaddr; + u8 *data_buffers_vaddr; u32 r_key; /* R_Key to be used by the target */ u32 l_key; struct ib_mr *mr_hndl; /* buffer's memory handle */ @@ -421,7 +421,7 @@ srp_pkt_t *srp_pkt_hdr_area; /* memory area for SRP packet payloads */ - uint8_t *srp_pkt_data_area; + u8 *srp_pkt_data_area; u32 r_key[MAX_HCAS]; @@ -446,7 +446,7 @@ /* * IB pathing information */ - tUINT64 service_name; + u64 service_name; srp_host_port_params_t *port; 
ioc_entry_t *ioc; @@ -477,7 +477,7 @@ /* * Counters */ - int64_t ios_processed; + s64 ios_processed; } srp_target_t; @@ -531,11 +531,11 @@ extern void srp_fmr_flush_function(struct ib_fmr_pool *fmr_pool, void *flush_arg); -extern tUINT32 parse_parameters(char *parameters); +extern u32 parse_parameters(char *parameters); -extern tUINT32 parse_target_binding_parameters(char *parameters); +extern u32 parse_target_binding_parameters(char *parameters); -extern int StringToHex64(char *, uint64_t *); +extern int StringToHex64(char *, u64 *); extern int srp_host_disconnect_done(srp_host_conn_t * conn, int status); @@ -597,10 +597,10 @@ * Called by a host SRP driver to register a buffer on the host. * IN: host buffer */ -int srptp_register_memory(srp_host_conn_t * conn, - srp_host_buf_t * buf, - tUINT32 offset, - uint64_t * phys_buffer_list, tUINT32 list_len); +int srptp_register_memory(srp_host_conn_t *conn, + srp_host_buf_t *buf, + u32 offset, + u64 *phys_buffer_list, u32 list_len); /* * Called by a host SRP driver to deregister a buffer on the host. 
Index: drivers/infiniband/ulp/srp/srptp.c
===================================================================
--- drivers/infiniband/ulp/srp/srptp.c	(revision 615)
+++ drivers/infiniband/ulp/srp/srptp.c	(working copy)
@@ -271,7 +271,7 @@
 	TS_REPORT_STAGE(MOD_SRPTP,
			"SRP Initiator GUID: %llx for hca %d",
-			be64_to_cpu(*(tUINT64 *) & hca->I_PORT_ID[8]),
+			be64_to_cpu(*(u64 *) & hca->I_PORT_ID[8]),
			hca->hca_index + 1);
 
 	hca->pd_hndl = ib_alloc_pd(hca->ca_hndl);
@@ -793,11 +793,11 @@
 int srptp_register_memory(srp_host_conn_t * conn,
			   srp_host_buf_t * buf,
-			   tUINT32 offset, uint64_t * buffer_list, tUINT32 list_len)
+			   u32 offset, u64 * buffer_list, u32 list_len)
 {
 	int status;
 	u32 l_key;
-	uint64_t start_address = (unsigned long) buf->data;
+	u64 start_address = (unsigned long) buf->data;
 
 	if (buf == NULL) {
Index: drivers/infiniband/ulp/srp/hostoptions.c
===================================================================
--- drivers/infiniband/ulp/srp/hostoptions.c	(revision 615)
+++ drivers/infiniband/ulp/srp/hostoptions.c	(working copy)
@@ -129,7 +129,7 @@
 	*sourceCharsUsedPtr = index;
 }
 
-int StringToHex32(char *stringPtr, tUINT32 * hexptr)
+int StringToHex32(char *stringPtr, u32 * hexptr)
 {
 	int firsttime = 1;
 	long isError = kNoError;
@@ -165,7 +165,7 @@
 	return isError;
 }
 
-int StringToHex64(char *stringPtr, uint64_t * hexptr)
+int StringToHex64(char *stringPtr, u64 * hexptr)
 {
 	int firsttime = 1;
 	long isError = kNoError;
@@ -204,7 +204,7 @@
 }
 
 #if 0
-tUINT32 parse_parameters(char *parameters)
+u32 parse_parameters(char *parameters)
 {
 	char *curr_loc;
 	unsigned long chars_copied = 0;
@@ -212,12 +212,12 @@
 	char wwn_str[kWWNStringLength + 1];
 	char guid_str[kGUIDStringLength + 1];
 	char dlid_str[kDLIDStringLength + 1];
-	tUINT64 wwn;
-	tUINT64 guid;
-	tUINT32 dlid;
+	u64 wwn;
+	u64 guid;
+	u32 dlid;
 	long result;
-	tUINT32 i;
-	extern tUINT32 dlid_conf;
+	u32 i;
+	extern u32 dlid_conf;
 
 	/* first convert to lower case to make life easier */
 	ConvertToLowerCase(parameters);
@@ -237,7 +237,7 @@
				      &chars_copied);
 
 	/* printk( "wwn string %s\n", wwn_str ); */
-	/* printk( "characters copied %d\n", (tUINT32)chars_copied ); */
+	/* printk( "characters copied %d\n", (u32)chars_copied ); */
 	if (chars_copied > (kWWNStringLength + 1)) {
 		return (TS_FAILURE);
 	} else {
@@ -246,7 +246,7 @@
 	result = StringToHex64(wwn_str, &wwn);
 	printk("WWPN %llx ", wwn);
-	*(tUINT64 *) & (srp_targets[i].service_name) = cpu_to_be64(wwn);
+	*(u64 *) & (srp_targets[i].service_name) = cpu_to_be64(wwn);
 
 	if (result != kNoError)
 		return (TS_FAILURE);
@@ -261,7 +261,7 @@
 	printk("guid string %s\n", guid_str);
 	printk("characters copied %d\n",
-	       (tUINT32) chars_copied);
+	       (u32) chars_copied);
 
 	if (chars_copied > (kGUIDStringLength + 1)) {
 		return (TS_FAILURE);
@@ -270,10 +270,10 @@
 	}
 
 	result = StringToHex64(guid_str, &guid);
-	*(tUINT64 *) & (srp_targets[i].guid) =
+	*(u64 *) & (srp_targets[i].guid) =
	    cpu_to_be64(guid);
 	printk("GUID %llx\n",
-	       *(tUINT64 *) & srp_targets[i].guid);
+	       *(u64 *) & srp_targets[i].guid);
 
 	if (result != kNoError)
 		return (TS_FAILURE);
@@ -282,7 +282,7 @@
				       delimeter, &chars_copied);
 
 	/* printk( "dlid string %s\n", dlid_str ); */
-	/* printk( "characters copied %d\n", (tUINT32)chars_copied ); */
+	/* printk( "characters copied %d\n", (u32)chars_copied ); */
 
 	if (chars_copied > (kDLIDStringLength + 1)) {
 		return (TS_FAILURE);
@@ -306,19 +306,19 @@
 }
 #endif
 
-tUINT32 parse_target_binding_parameters(char *parameters)
+u32 parse_target_binding_parameters(char *parameters)
 {
 	char *curr_loc;
 	unsigned long chars_copied = 0;
 	char delimeter;
 	char wwn_str[kWWNStringLength + 1];
 	char target_index_str[kTargetIndexStringLength + 1];
-	uint64_t wwn;
-	tUINT32 target_index;
+	u64 wwn;
+	u32 target_index;
 	srp_target_t *target;
 	int status;
 	long result;
-	tUINT32 i;
+	u32 i;
 
 	/* first convert to lower case to make life easier */
 	ConvertToLowerCase(parameters);
@@ -362,7 +362,7 @@
 	result = StringToHex32(target_index_str, &target_index);
 	target = &srp_targets[target_index];
-	*(tUINT64 *) & (target->service_name) = cpu_to_be64(wwn);
+	*(u64 *) & (target->service_name) = cpu_to_be64(wwn);
 	printk("to Target %x\n", target_index);
 	status = srp_host_alloc_pkts(target);
 	if (status) {
Index: drivers/infiniband/ulp/srp/srp_cmd.h
===================================================================
--- drivers/infiniband/ulp/srp/srp_cmd.h	(revision 615)
+++ drivers/infiniband/ulp/srp/srp_cmd.h	(working copy)
@@ -109,270 +109,270 @@
 */
 typedef struct buf_format_s {
-	tUINT8 rsvd0;
+	u8 rsvd0;
 #if defined(__LITTLE_ENDIAN_BITFIELD)
-	tUINT8 rsvd_b0:1, ddbd:1, idbd:1, rsvd_b3:5;
+	u8 rsvd_b0:1, ddbd:1, idbd:1, rsvd_b3:5;
 #elif defined (__BIG_ENDIAN_BITFIELD)
-	tUINT8 rsvd_b3:5, idbd:1, ddbd:1, rsvd_b0:1;
+	u8 rsvd_b3:5, idbd:1, ddbd:1, rsvd_b0:1;
 #else
 #error "Please fix "
 #endif
 } buf_format_t;
 
 typedef struct _srp_iu {
-	tUINT8 opcode;
-	tUINT8 rsvd[7];
-	tUINT64 tag;
+	u8 opcode;
+	u8 rsvd[7];
+	u64 tag;
 } srp_iu_t;
 
 /*
  * Remote buffer header
 */
 typedef struct srp_remote_buf_s {
-	uint64_t r_data;	/* physical address on the remote node */
-	int32_t r_key;		/* R_Key */
+	u64 r_data;		/* physical address on the remote node */
+	s32 r_key;		/* R_Key */
 	unsigned int r_size;	/* size of remote buffer */
 } srp_remote_buf_t;
 
 typedef struct _srp_login_req_t {
-	tUINT8 opcode;
-	tUINT8 rsvd[7];
-	tUINT64 tag;
+	u8 opcode;
+	u8 rsvd[7];
+	u64 tag;
 	int request_I_T_IU;
-	tUINT8 rsvd1[4];
+	u8 rsvd1[4];
 	buf_format_t req_buf_format;
 #if defined(__LITTLE_ENDIAN_BITFIELD)
-	tUINT8 multi_chan:2, rsvd3:6;
+	u8 multi_chan:2, rsvd3:6;
 #elif defined (__BIG_ENDIAN_BITFIELD)
-	tUINT8 rsvd3:6, multi_chan:2;
+	u8 rsvd3:6, multi_chan:2;
 #endif
-	tUINT8 rsvd4;
-	tUINT8 rsvd5[4];
-	tUINT8 initiator_port_id[16];
-	tUINT8 target_port_id[16];
+	u8 rsvd4;
+	u8 rsvd5[4];
+	u8 initiator_port_id[16];
+	u8 target_port_id[16];
 } srp_login_req_t;
 
 typedef struct _srp_login_resp_t {
-	tUINT8 opcode;
-	tUINT8 rsvd[3];
-	tUINT32 request_limit_delta;
-	tUINT64 tag;
-	tUINT32 request_I_T_IU;
-	tUINT32 request_T_I_IU;
+	u8 opcode;
+	u8 rsvd[3];
+	u32 request_limit_delta;
+	u64 tag;
+	u32 request_I_T_IU;
+	u32 request_T_I_IU;
 	buf_format_t sup_buf_format;
 #if defined(__LITTLE_ENDIAN_BITFIELD)
-	tUINT8 multi_chan:2, rsvd3:6;
+	u8 multi_chan:2, rsvd3:6;
 #elif defined (__BIG_ENDIAN_BITFIELD)
-	tUINT8 rsvd3:6, multi_chan:2;
+	u8 rsvd3:6, multi_chan:2;
 #endif
-	tUINT8 rsvd4[25];
+	u8 rsvd4[25];
 } srp_login_resp_t;
 
 typedef struct _srp_login_rej {
-	tUINT8 opcode;
-	tUINT8 rsvd[3];
-	tUINT32 reason_code;
-	tUINT64 tag;
-	tUINT8 rsvd1[8];
+	u8 opcode;
+	u8 rsvd[3];
+	u32 reason_code;
+	u64 tag;
+	u8 rsvd1[8];
 	buf_format_t sup_buf_format;
-	tUINT8 rsvd2[6];
+	u8 rsvd2[6];
 } srp_login_rej_t;
 
 struct srp_I_logout {
-	tUINT8 opcode;
-	tUINT8 rsvd[7];
-	tUINT64 tag;
+	u8 opcode;
+	u8 rsvd[7];
+	u64 tag;
 };
 
 typedef struct _srp_t_logout_t {
-	tUINT8 opcode;
-	tUINT8 rsvd[3];
-	tUINT32 reason_code;
-	tUINT64 tag;
+	u8 opcode;
+	u8 rsvd[3];
+	u32 reason_code;
+	u64 tag;
 } srp_t_logout_t;
 
 typedef struct _srp_tm_t {
-	tUINT8 opcode;
-	tUINT8 rsvd[7];
-	tUINT64 tag;
-	tUINT32 rsvd1;
-	tUINT8 lun[8];
-	tUINT8 rsvd2[2];
-	tUINT8 task_mgt_flags;
-	tUINT8 rsvd3;
-	tUINT64 cmd_tag;
-	tUINT8 rsvd4[8];
+	u8 opcode;
+	u8 rsvd[7];
+	u64 tag;
+	u32 rsvd1;
+	u8 lun[8];
+	u8 rsvd2[2];
+	u8 task_mgt_flags;
+	u8 rsvd3;
+	u64 cmd_tag;
+	u8 rsvd4[8];
 } srp_tm_t;
 
 #define CMD_TAG_OFFSET 1
 #define CMD_LUN_OFFSET 4
 
 typedef struct _srp_cmd_t {
-	tUINT8 opcode;
-	tUINT8 rsvd1[4];
+	u8 opcode;
+	u8 rsvd1[4];
 #if defined(__LITTLE_ENDIAN_BITFIELD)
-	tUINT8 difmt:4, dofmt:4;
+	u8 difmt:4, dofmt:4;
 #elif defined (__BIG_ENDIAN_BITFIELD)
-	tUINT8 dofmt:4, difmt:4;
+	u8 dofmt:4, difmt:4;
 #else
#error "Please fix "
 #endif
-	tUINT8 docount;
-	tUINT8 dicount;
-	tUINT64 tag;
-	tUINT8 rsvd3[4];
-	tUINT8 lun[8];
-	tUINT8 rsvd4;
+	u8 docount;
+	u8 dicount;
+	u64 tag;
+	u8 rsvd3[4];
+	u8 lun[8];
+	u8 rsvd4;
 #if defined(__LITTLE_ENDIAN_BITFIELD)
-	tUINT8 task_attr:3, rsvd5:5;
+	u8 task_attr:3, rsvd5:5;
 #elif defined (__BIG_ENDIAN_BITFIELD)
-	tUINT8 rsvd5:5, task_attr:3;
+	u8 rsvd5:5, task_attr:3;
 #endif
-	tUINT8 rsvd6;
+	u8 rsvd6;
 #if defined(__LITTLE_ENDIAN_BITFIELD)
-	tUINT8 rsvd7:2, add_cdb_len:6;
+	u8 rsvd7:2, add_cdb_len:6;
 #elif defined (__BIG_ENDIAN_BITFIELD)
-	tUINT8 add_cdb_len:6, rsvd7:2;
+	u8 add_cdb_len:6, rsvd7:2;
 #endif
-	tUINT8 cdb[16];
+	u8 cdb[16];
 } srp_cmd_t;
 
 typedef struct _srp_cmd_indirect_data_buffer_descriptor {
 	srp_remote_buf_t indirect_table_descriptor;
-	tUINT32 total_length;	/* of the data transfer */
+	u32 total_length;	/* of the data transfer */
 	srp_remote_buf_t partial_memory_descriptor_list[0];
 } srp_cmd_indirect_data_buffer_descriptor_t;
 
 typedef struct _resp_data {
-	tUINT8 rsvd[3];
-	tUINT8 response_code;
+	u8 rsvd[3];
+	u8 response_code;
 } resp_data_t;
 
 typedef struct _srp_resp {
-	tUINT8 opcode;
-	tUINT8 rsvd1[3];
-	tUINT32 request_limit_delta;
-	tUINT64 tag;
+	u8 opcode;
+	u8 rsvd1[3];
+	u32 request_limit_delta;
+	u64 tag;
 	union {
 		struct {
-			tUINT8 rsvd[2];
+			u8 rsvd[2];
 #if defined(__LITTLE_ENDIAN_BITFIELD)
-			tUINT8 rspvalid:1, snsvalid:1, doover:1, dounder:1, diover:1, diunder:1, rsvd0:2;
+			u8 rspvalid:1, snsvalid:1, doover:1, dounder:1, diover:1, diunder:1, rsvd0:2;
 #elif defined (__BIG_ENDIAN_BITFIELD)
-			tUINT8 rsvd0:2, diunder:1, diover:1, dounder:1, doover:1, snsvalid:1, rspvalid:1;
+			u8 rsvd0:2, diunder:1, diover:1, dounder:1, doover:1, snsvalid:1, rspvalid:1;
 #else
#error "Please fix "
 #endif
-			tUINT8 status;
+			u8 status;
 		} bit;
-		tUINT32 word;
+		u32 word;
 	} status;
-	tUINT32 data_out_residual_count;
-	tUINT32 data_in_residual_count;
-	tUINT32 sense_len;
-	tUINT32 response_len;
+	u32 data_out_residual_count;
+	u32 data_in_residual_count;
+	u32 sense_len;
+	u32 response_len;
 } srp_resp_t;
 
 typedef struct srp_RPL_res_s {
-	tUINT32 lun_list_len;
-	tUINT32 res;
+	u32 lun_list_len;
+	u32 res;
 	union {
-		tUINT64 l;
-		tUINT8 b[8];
+		u64 l;
+		u8 b[8];
 	} u[MAX_SRP_LUN];
 } report_lun_res_t;
 
 typedef struct _INQUIRYDATA {
 #if defined(__LITTLE_ENDIAN_BITFIELD)
-	tUINT8 DeviceType:5;
-	tUINT8 DeviceTypeQualifier:3;
+	u8 DeviceType:5;
+	u8 DeviceTypeQualifier:3;
 #elif defined (__BIG_ENDIAN_BITFIELD)
-	tUINT8 DeviceTypeQualifier:3;
-	tUINT8 DeviceType:5;
+	u8 DeviceTypeQualifier:3;
+	u8 DeviceType:5;
 #endif
 #if defined(__LITTLE_ENDIAN_BITFIELD)
-	tUINT8 DeviceTypeModifier:7;
-	tUINT8 RemovableMedia:1;
+	u8 DeviceTypeModifier:7;
+	u8 RemovableMedia:1;
 #elif defined (__BIG_ENDIAN_BITFIELD)
-	tUINT8 RemovableMedia:1;
-	tUINT8 DeviceTypeModifier:7;
+	u8 RemovableMedia:1;
+	u8 DeviceTypeModifier:7;
 #endif
-	tUINT8 Versions;
+	u8 Versions;
 #if defined(__LITTLE_ENDIAN_BITFIELD)
-	tUINT8 ResponseDataFormat:4;
-	tUINT8 HiSup:1;
-	tUINT8 NormACA:1;
-	tUINT8 Obsolete:1;
-	tUINT8 AERC:1;
+	u8 ResponseDataFormat:4;
+	u8 HiSup:1;
+	u8 NormACA:1;
+	u8 Obsolete:1;
+	u8 AERC:1;
 #elif defined (__BIG_ENDIAN_BITFIELD)
-	tUINT8 AERC:1;
-	tUINT8 Obsolete:1;
-	tUINT8 NormACA:1;
-	tUINT8 HiSup:1;
-	tUINT8 ResponseDataFormat:4;
+	u8 AERC:1;
+	u8 Obsolete:1;
+	u8 NormACA:1;
+	u8 HiSup:1;
+	u8 ResponseDataFormat:4;
 #endif
-	tUINT8 AdditionalLength;
-	tUINT8 Reserved;
+	u8 AdditionalLength;
+	u8 Reserved;
 #if defined(__LITTLE_ENDIAN_BITFIELD)
-	tUINT8 BQue:1;
-	tUINT8 EncServ:1;
-	tUINT8 VS:1;
-	tUINT8 MultiP:1;
-	tUINT8 MChngr:1;
-	tUINT8 obso2:1;
-	tUINT8 Obso:1;
-	tUINT8 ADDR16:1;
+	u8 BQue:1;
+	u8 EncServ:1;
+	u8 VS:1;
+	u8 MultiP:1;
+	u8 MChngr:1;
+	u8 obso2:1;
+	u8 Obso:1;
+	u8 ADDR16:1;
 #elif defined (__BIG_ENDIAN_BITFIELD)
-	tUINT8 ADDR16:1;
-	tUINT8 Obso:1;
-	tUINT8 obso2:1;
-	tUINT8 MChngr:1;
-	tUINT8 MultiP:1;
-	tUINT8 VS:1;
-	tUINT8 EncServ:1;
-	tUINT8 BQue:1;
+	u8 ADDR16:1;
+	u8 Obso:1;
+	u8 obso2:1;
+	u8 MChngr:1;
+	u8 MultiP:1;
+	u8 VS:1;
+	u8 EncServ:1;
+	u8 BQue:1;
 #endif
 #if defined(__LITTLE_ENDIAN_BITFIELD)
-	tUINT8 SoftReset:1;
-	tUINT8 CommandQueue:1;
-	tUINT8 Reserved2:1;
-	tUINT8 LinkedCommands:1;
-	tUINT8 Synchronous:1;
-	tUINT8 Wide16Bit:1;
-	tUINT8 Wide32Bit:1;
-	tUINT8 RelativeAddressing:1;
+	u8 SoftReset:1;
+	u8 CommandQueue:1;
+	u8 Reserved2:1;
+	u8 LinkedCommands:1;
+	u8 Synchronous:1;
+	u8 Wide16Bit:1;
+	u8 Wide32Bit:1;
+	u8 RelativeAddressing:1;
 #elif defined (__BIG_ENDIAN_BITFIELD)
-	tUINT8 RelativeAddressing:1;
-	tUINT8 Wide32Bit:1;
-	tUINT8 Wide16Bit:1;
-	tUINT8 Synchronous:1;
-	tUINT8 LinkedCommands:1;
-	tUINT8 Reserved2:1;
-	tUINT8 CommandQueue:1;
-	tUINT8 SoftReset:1;
+	u8 RelativeAddressing:1;
+	u8 Wide32Bit:1;
+	u8 Wide16Bit:1;
+	u8 Synchronous:1;
+	u8 LinkedCommands:1;
+	u8 Reserved2:1;
+	u8 CommandQueue:1;
+	u8 SoftReset:1;
 #endif
-	tUINT8 VendorId[8];
-	tUINT8 ProductId[16];
-	tUINT8 ProductRevisionLevel[4];
-	tUINT8 VendorSpecific[20];
-	tUINT8 Resv[2];
-	tUINT8 VersionDesc[2];
-	tUINT8 Reserved3[36];
+	u8 VendorId[8];
+	u8 ProductId[16];
+	u8 ProductRevisionLevel[4];
+	u8 VendorSpecific[20];
+	u8 Resv[2];
+	u8 VersionDesc[2];
+	u8 Reserved3[36];
 } inq_data_t;
 
 /*
@@ -393,23 +393,23 @@
 #define DEVICE_IDENTIFIER_TYPE_LOGICAL_UNIT_GROUP 6
 
 typedef struct _T10_identifier_format {
-	tUINT8 VendorId[8];
-	tUINT8 VendorSpecificId[0];
+	u8 VendorId[8];
+	u8 VendorSpecificId[0];
 } T10_identifier_format_t;
 
 typedef struct _EUI64_identifier_format {
-	tUINT8 IeeeVendorID[3];
-	tUINT8 VendorSpecificId[5];
+	u8 IeeeVendorID[3];
+	u8 VendorSpecificId[5];
 } EUI64_identifier_format_t;
 
 typedef union _device_identifier_format_t {
-	tUINT8 VendorSpecificId[0];
+	u8 VendorSpecificId[0];
 	T10_identifier_format_t T10Id;
 	EUI64_identifier_format_t EUI64Id;
 } device_identifier_format_t;
 
 typedef struct _device_identifier_t {
-	tUINT8 IdLength;
+	u8 IdLength;
 	device_identifier_format_t Id;
 } device_identifier_t;
@@ -426,11 +426,11 @@
 #define CODE_SET_BINARY_IDENTIFIER 1
 #define CODE_SET_ASCII_IDENTIFIER 2
 #if defined(__LITTLE_ENDIAN_BITFIELD)
-	tUINT8 CodeSet:4;
-	tUINT8 Reserved1:4;
+	u8 CodeSet:4;
+	u8 Reserved1:4;
 #elif defined (__BIG_ENDIAN_BITFIELD)
-	tUINT8 Reserved1:4;
-	tUINT8 CodeSet:4;
+	u8 Reserved1:4;
+	u8 CodeSet:4;
 #endif
 
 /*
@@ -443,16 +443,16 @@
 #define DEV_ID_ASSOCIATION_PORT_DEPENDENT 1
 #if defined(__LITTLE_ENDIAN_BITFIELD)
-	tUINT8 IdentifierType:4;
-	tUINT8 Association:2;
-	tUINT8 Reserved2:2;
+	u8 IdentifierType:4;
+	u8 Association:2;
+	u8 Reserved2:2;
 #elif defined (__BIG_ENDIAN_BITFIELD)
-	tUINT8 Reserved2:2;
-	tUINT8 Association:2;
-	tUINT8 IdentifierType:4;
+	u8 Reserved2:2;
+	u8 Association:2;
+	u8 IdentifierType:4;
 #endif
-	tUINT8 Reserved3;
+	u8 Reserved3;
 	device_identifier_t identifier;
 } device_identification_descriptor_t;
@@ -463,15 +463,15 @@
 typedef struct _INQUIRYEVPD {
 #if defined(__LITTLE_ENDIAN_BITFIELD)
-	tUINT8 DeviceType:5;
-	tUINT8 DeviceTypeQualifier:3;
+	u8 DeviceType:5;
+	u8 DeviceTypeQualifier:3;
 #elif defined (__BIG_ENDIAN_BITFIELD)
-	tUINT8 DeviceTypeQualifier:3;
-	tUINT8 DeviceType:5;
+	u8 DeviceTypeQualifier:3;
+	u8 DeviceType:5;
 #endif
-	tUINT8 Page_code;
-	tUINT8 Resv;
-	tUINT8 Page_len;
+	u8 Page_code;
+	u8 Resv;
+	u8 Page_len;
 	device_identification_descriptor_t descriptors[0];
 } inq_evpddata_t;
@@ -492,32 +492,32 @@
 #define FCP_LUN_LEN 8
 
 typedef struct fcp_cntl_s {
-	tUINT8 reserved;
-	tUINT8 task_codes;
-	tUINT8 task_mgmt_flags;
-	tUINT8 exec_mgmt_code;
+	u8 reserved;
+	u8 task_codes;
+	u8 task_mgmt_flags;
+	u8 exec_mgmt_code;
 } fcp_cntl_t;
 
 /*
  * FCP_CMND IU
 */
 typedef struct fcp_cmnd_s {
-	tUINT8 lun[8];
+	u8 lun[8];
 	fcp_cntl_t cntl;
-	tUINT8 cdb[FCP_CDB_LEN];
-	tUINT32 data_len;
+	u8 cdb[FCP_CDB_LEN];
+	u32 data_len;
 } fcp_cmnd_t;
 
 /*
  * FCP_RSP IU
 */
 typedef struct fcp_rsp_s {
-	tUINT8 reserved[8];
-	tUINT8 status[4];
-	tUINT32 residual;
-	tUINT32 sns_len;
-	tUINT32 rsp_len;
-	tUINT8 rsp_sns[FCP_RSP_SNS_BUF_SIZE];
+	u8 reserved[8];
+	u8 status[4];
+	u32 residual;
+	u32 sns_len;
+	u32 rsp_len;
+	u8 rsp_sns[FCP_RSP_SNS_BUF_SIZE];
 } fcp_rsp_t;
 
 #pragma pack()
Index: drivers/infiniband/ulp/sdp/sdp_queue.h
===================================================================
--- drivers/infiniband/ulp/sdp/sdp_queue.h	(revision 615)
+++ drivers/infiniband/ulp/sdp/sdp_queue.h	(working copy)
@@ -55,8 +55,8 @@
 */
 struct sdpc_desc_q {
 	struct sdpc_desc *head;	/* double linked list of advertisments */
-	tINT32 size;		/* current number of advertisments in table */
-	tUINT16 count[TS_SDP_GENERIC_TYPE_NONE];	/* object specific counter */
+	s32 size;		/* current number of advertisments in table */
+	u16 count[TS_SDP_GENERIC_TYPE_NONE];	/* object specific counter */
 };				/* struct sdpc_desc_q */
 /*
  * object destruction callback type
Index: drivers/infiniband/ulp/sdp/sdp_write.c
===================================================================
--- drivers/infiniband/ulp/sdp/sdp_write.c	(revision 615)
+++ drivers/infiniband/ulp/sdp/sdp_write.c	(working copy)
@@ -37,12 +37,12 @@
 /* --------------------------------------------------------------------- */
 /* ========================================================================= */
 /*..sdp_event_write -- RDMA write event handler. */
-tINT32 sdp_event_write(struct sdp_opt *conn, struct ib_wc *comp)
+s32 sdp_event_write(struct sdp_opt *conn, struct ib_wc *comp)
 {
 	struct sdpc_iocb *iocb;
 	struct sdpc_buff *buff;
-	tINT32 result;
-	tINT32 type;
+	s32 result;
+	s32 type;
 
 	TS_CHECK_NULL(conn, -EINVAL);
 	TS_CHECK_NULL(comp, -EINVAL);
Index: drivers/infiniband/ulp/sdp/sdp_rcvd.c
===================================================================
--- drivers/infiniband/ulp/sdp/sdp_rcvd.c	(revision 615)
+++ drivers/infiniband/ulp/sdp/sdp_rcvd.c	(working copy)
@@ -649,7 +649,7 @@
 static int _sdp_rcvd_snk_cancel(struct sdp_opt *conn, struct sdpc_buff *buff)
 {
 	struct sdpc_advt *advt;
-	tINT32 counter;
+	s32 counter;
 	int result;
 
 	TS_CHECK_NULL(conn, -EINVAL);
@@ -1037,7 +1037,7 @@
 	struct msg_hdr_srcah *srcah;
 	struct sdpc_advt *advt;
 	int result;
-	tINT32 size;
+	s32 size;
 
 	TS_CHECK_NULL(conn, -EINVAL);
 	TS_CHECK_NULL(buff, -EINVAL);
@@ -1364,7 +1364,7 @@
 /* ========================================================================= */
 /*..sdp_event_recv -- recv event demultiplexing into sdp messages. */
-tINT32 sdp_event_recv(struct sdp_opt *conn, struct ib_wc *comp)
+s32 sdp_event_recv(struct sdp_opt *conn, struct ib_wc *comp)
 {
 	tGW_SDP_EVENT_CB_FUNC dispatch_func;
 	struct sdpc_buff *buff;
@@ -1452,8 +1452,8 @@
 	 * the number we've sent and the remote host has received.
 	 */
 	conn->r_recv_bf = (buff->bsdh_hdr->recv_bufs -
-			   abs((tINT32) conn->send_seq -
-			       (tINT32) buff->bsdh_hdr->seq_ack));
+			   abs((s32) conn->send_seq -
+			       (s32) buff->bsdh_hdr->seq_ack));
 	/*
	 * dispatch
	 */
Index: drivers/infiniband/ulp/sdp/sdp_proto.h
===================================================================
--- drivers/infiniband/ulp/sdp/sdp_proto.h	(revision 615)
+++ drivers/infiniband/ulp/sdp/sdp_proto.h	(working copy)
@@ -92,7 +92,7 @@
			  tSDP_BUFF_TEST_FUNC test_func,
			  void *usr_arg);
 
-int sdp_buff_pool_init(tUINT32 buff_min, tUINT32 buff_max);
+int sdp_buff_pool_init(u32 buff_min, u32 buff_max);
 
 void sdp_buff_pool_destroy(void);
@@ -141,15 +141,15 @@
 int sdp_wall_abort(struct sdp_opt *conn);
 
-tINT32 sdp_recv_buff(struct sdp_opt *conn, struct sdpc_buff *buff);
+s32 sdp_recv_buff(struct sdp_opt *conn, struct sdpc_buff *buff);
 /* --------------------------------------------------------------------- */
 /* Zcopy advertisment managment */
 /* --------------------------------------------------------------------- */
-tINT32 sdp_main_advt_init(void);
+s32 sdp_main_advt_init(void);
 
-tINT32 sdp_main_advt_cleanup(void);
+s32 sdp_main_advt_cleanup(void);
 
-struct sdpc_advt_q *sdp_advt_q_create(tINT32 * result);
+struct sdpc_advt_q *sdp_advt_q_create(s32 * result);
 
 int sdp_advt_q_init(struct sdpc_advt_q *table);
@@ -169,11 +169,11 @@
 /* --------------------------------------------------------------------- */
 /* Zcopy IOCB managment */
 /* --------------------------------------------------------------------- */
-tINT32 sdp_main_iocb_init(void);
+s32 sdp_main_iocb_init(void);
 
-tINT32 sdp_main_iocb_cleanup(void);
+s32 sdp_main_iocb_cleanup(void);
 
-struct sdpc_iocb_q *sdp_iocb_q_create(tINT32 * result);
+struct sdpc_iocb_q *sdp_iocb_q_create(s32 * result);
 
 int sdp_iocb_q_init(struct sdpc_iocb_q *table);
@@ -195,9 +195,9 @@
 int sdp_iocb_q_put_tail(struct sdpc_iocb_q *table, struct sdpc_iocb *iocb);
 
-struct sdpc_iocb *sdp_iocb_q_get_key(struct sdpc_iocb_q *table, tUINT32 key);
+struct sdpc_iocb *sdp_iocb_q_get_key(struct sdpc_iocb_q *table, u32 key);
 
-struct sdpc_iocb *sdp_iocb_q_lookup(struct sdpc_iocb_q *table, tUINT32 key);
+struct sdpc_iocb *sdp_iocb_q_lookup(struct sdpc_iocb_q *table, u32 key);
 
 int sdp_iocb_q_cancel(struct sdpc_iocb_q *table, u32 mask, ssize_t comp);
@@ -248,7 +248,7 @@
 int sdp_desc_q_size(struct sdpc_desc_q *table);
 
-struct sdpc_desc_q *sdp_desc_q_create(tINT32 * result);
+struct sdpc_desc_q *sdp_desc_q_create(s32 * result);
 
 int sdp_desc_q_init(struct sdpc_desc_q *table);
@@ -256,15 +256,15 @@
 int sdp_desc_q_destroy(struct sdpc_desc_q *table);
 
-tINT32 sdp_main_desc_init(void);
+s32 sdp_main_desc_init(void);
 
-tINT32 sdp_main_desc_cleanup(void);
+s32 sdp_main_desc_cleanup(void);
 /* --------------------------------------------------------------------- */
 /* proc entry managment */
 /* --------------------------------------------------------------------- */
-tINT32 sdp_main_proc_init(void);
+s32 sdp_main_proc_init(void);
 
-tINT32 sdp_main_proc_cleanup(void);
+s32 sdp_main_proc_cleanup(void);
 /* --------------------------------------------------------------------- */
 /* connection table */
 /* --------------------------------------------------------------------- */
@@ -278,7 +278,7 @@
			int send_buff_max,
			int send_usig_max);
 
-tINT32 sdp_conn_table_clear(void);
+s32 sdp_conn_table_clear(void);
 
 int sdp_proc_dump_conn_main(char *buffer,
			    int max_size,
@@ -305,17 +305,17 @@
			    off_t start_index,
			    long *end_index);
 
-tINT32 sdp_conn_table_remove(struct sdp_opt *conn);
+s32 sdp_conn_table_remove(struct sdp_opt *conn);
 
-struct sdp_opt *sdp_conn_table_lookup(tINT32 entry);
+struct sdp_opt *sdp_conn_table_lookup(s32 entry);
 
-struct sdp_opt *sdp_conn_alloc(tINT32 priority, tTS_IB_CM_COMM_ID comm_id);
+struct sdp_opt *sdp_conn_alloc(s32 priority, tTS_IB_CM_COMM_ID comm_id);
 
 int sdp_conn_alloc_ib(struct sdp_opt *conn,
		      struct ib_device *device,
		      tTS_IB_PORT hw_port);
 
-tINT32 sdp_conn_destruct(struct sdp_opt *conn);
+s32 sdp_conn_destruct(struct sdp_opt *conn);
 
 void sdp_inet_wake_send(struct sock *sk);
@@ -329,23 +329,23 @@
 /* --------------------------------------------------------------------- */
 /* port/queue managment */
 /* --------------------------------------------------------------------- */
-tINT32 sdp_inet_accept_q_put(struct sdp_opt *listen_conn, struct sdp_opt *accept_conn);
+s32 sdp_inet_accept_q_put(struct sdp_opt *listen_conn, struct sdp_opt *accept_conn);
 
 struct sdp_opt *sdp_inet_accept_q_get(struct sdp_opt *listen_conn);
 
-tINT32 sdp_inet_accept_q_remove(struct sdp_opt *accept_conn);
+s32 sdp_inet_accept_q_remove(struct sdp_opt *accept_conn);
 
-tINT32 sdp_inet_listen_start(struct sdp_opt *listen_conn);
+s32 sdp_inet_listen_start(struct sdp_opt *listen_conn);
 
-tINT32 sdp_inet_listen_stop(struct sdp_opt *listen_conn);
+s32 sdp_inet_listen_stop(struct sdp_opt *listen_conn);
 
-struct sdp_opt *sdp_inet_listen_lookup(tUINT32 addr, tUINT16 port);
+struct sdp_opt *sdp_inet_listen_lookup(u32 addr, u16 port);
 
-tINT32 sdp_inet_port_get(struct sdp_opt *conn, tUINT16 port);
+s32 sdp_inet_port_get(struct sdp_opt *conn, u16 port);
 
-tINT32 sdp_inet_port_put(struct sdp_opt *conn);
+s32 sdp_inet_port_put(struct sdp_opt *conn);
 
-tINT32 sdp_inet_port_inherit(struct sdp_opt *parent, struct sdp_opt *child);
+s32 sdp_inet_port_inherit(struct sdp_opt *parent, struct sdp_opt *child);
 /* --------------------------------------------------------------------- */
 /* post functions */
@@ -366,36 +366,36 @@
 int sdp_cm_confirm(struct sdp_opt *conn);
 
-tINT32 sdp_recv_flush(struct sdp_opt *conn);
+s32 sdp_recv_flush(struct sdp_opt *conn);
 
-tINT32 sdp_send_flush(struct sdp_opt *conn);
+s32 sdp_send_flush(struct sdp_opt *conn);
 
-tINT32 sdp_send_ctrl_ack(struct sdp_opt *conn);
+s32 sdp_send_ctrl_ack(struct sdp_opt *conn);
 
-tINT32 sdp_send_ctrl_disconnect(struct sdp_opt *conn);
+s32 sdp_send_ctrl_disconnect(struct sdp_opt *conn);
 
-tINT32 sdp_send_ctrl_abort(struct sdp_opt *conn);
+s32 sdp_send_ctrl_abort(struct sdp_opt *conn);
 
-tINT32 sdp_send_ctrl_send_sm(struct sdp_opt *conn);
+s32 sdp_send_ctrl_send_sm(struct sdp_opt *conn);
 
-tINT32 sdp_send_ctrl_snk_avail(struct sdp_opt *conn,
-			       tUINT32 size,
-			       tUINT32 rkey,
-			       tUINT64 addr);
+s32 sdp_send_ctrl_snk_avail(struct sdp_opt *conn,
+			    u32 size,
+			    u32 rkey,
+			    u64 addr);
 
-tINT32 sdp_send_ctrl_resize_buff_ack(struct sdp_opt *conn, tUINT32 size);
+s32 sdp_send_ctrl_resize_buff_ack(struct sdp_opt *conn, u32 size);
 
-tINT32 sdp_send_ctrl_rdma_rd(struct sdp_opt *conn, tINT32 size);
+s32 sdp_send_ctrl_rdma_rd(struct sdp_opt *conn, s32 size);
 
-tINT32 sdp_send_ctrl_rdma_wr(struct sdp_opt *conn, tUINT32 size);
+s32 sdp_send_ctrl_rdma_wr(struct sdp_opt *conn, u32 size);
 
-tINT32 sdp_send_ctrl_mode_ch(struct sdp_opt *conn, tUINT8 mode);
+s32 sdp_send_ctrl_mode_ch(struct sdp_opt *conn, u8 mode);
 
-tINT32 sdp_send_ctrl_src_cancel(struct sdp_opt *conn);
+s32 sdp_send_ctrl_src_cancel(struct sdp_opt *conn);
 
-tINT32 sdp_send_ctrl_snk_cancel(struct sdp_opt *conn);
+s32 sdp_send_ctrl_snk_cancel(struct sdp_opt *conn);
 
-tINT32 sdp_send_ctrl_snk_cancel_ack(struct sdp_opt *conn);
+s32 sdp_send_ctrl_snk_cancel_ack(struct sdp_opt *conn);
 /* --------------------------------------------------------------------- */
 /* inet functions */
@@ -403,7 +403,7 @@
 /* --------------------------------------------------------------------- */
 /* event functions */
 /* --------------------------------------------------------------------- */
-tINT32 sdp_cq_event_locked(struct ib_wc *comp, struct sdp_opt *conn);
+s32 sdp_cq_event_locked(struct ib_wc *comp, struct sdp_opt *conn);
 
 void sdp_cq_event_handler(struct ib_cq *cq, void *arg);
@@ -411,13 +411,13 @@
		       tTS_IB_CM_COMM_ID comm_id,
		       void *params,
		       void *arg);
 
-tINT32 sdp_event_recv(struct sdp_opt *conn, struct ib_wc *comp);
+s32 sdp_event_recv(struct sdp_opt *conn, struct ib_wc *comp);
 
-tINT32 sdp_event_send(struct sdp_opt *conn, struct ib_wc *comp);
+s32 sdp_event_send(struct sdp_opt *conn, struct ib_wc *comp);
 
-tINT32 sdp_event_read(struct sdp_opt *conn, struct ib_wc *comp);
+s32 sdp_event_read(struct sdp_opt *conn, struct ib_wc *comp);
 
-tINT32 sdp_event_write(struct sdp_opt *conn, struct ib_wc *comp);
+s32 sdp_event_write(struct sdp_opt *conn, struct ib_wc *comp);
 /* --------------------------------------------------------------------- */
 /* internal connection lock functions */
@@ -428,7 +428,7 @@
 void sdp_conn_internal_relock(struct sdp_opt *conn);
 
-tINT32 sdp_conn_cq_drain(struct ib_cq *cq, struct sdp_opt *conn);
+s32 sdp_conn_cq_drain(struct ib_cq *cq, struct sdp_opt *conn);
 /* --------------------------------------------------------------------- */
 /* DATA transport */
@@ -469,9 +469,9 @@
 /* --------------------------------------------------------------------- */
 /* ====================================================================== */
 /*..__sdp_inet_write_space -- writable space on send side. */
-static __inline__ tINT32 __sdp_inet_write_space(struct sdp_opt *conn, tINT32 urg)
+static __inline__ s32 __sdp_inet_write_space(struct sdp_opt *conn, s32 urg)
 {
-	tINT32 size;
+	s32 size;
 
 	TS_CHECK_NULL(conn, -EINVAL);
 	/*
@@ -497,7 +497,7 @@
 /* ====================================================================== */
 /*..__sdp_inet_writable -- return non-zero if socket is writable. */
-static __inline__ tINT32 __sdp_inet_writable(struct sdp_opt *conn)
+static __inline__ s32 __sdp_inet_writable(struct sdp_opt *conn)
 {
 	TS_CHECK_NULL(conn, -EINVAL);
@@ -514,10 +514,10 @@
 /* ======================================================================== */
 /*..__sdp_conn_stat_dump -- dump stats to the log */
-static __inline__ tINT32 __sdp_conn_stat_dump(struct sdp_opt *conn)
+static __inline__ s32 __sdp_conn_stat_dump(struct sdp_opt *conn)
 {
 #ifdef _TS_SDP_CONN_STATS_REC
-	tUINT32 counter;
+	u32 counter;
 
 	TS_TRACE(MOD_LNX_SDP, T_VERBOSE, TRACE_FLOW_INOUT,
		 "STAT: src <%u> snk <%u>", conn->src_serv, conn->snk_serv);
@@ -540,10 +540,10 @@
 /* ======================================================================== */
 /*..__sdp_conn_state_dump -- dump state information to the log */
-static __inline__ tINT32 __sdp_conn_state_dump(struct sdp_opt *conn)
+static __inline__ s32 __sdp_conn_state_dump(struct sdp_opt *conn)
 {
 #ifdef _TS_SDP_CONN_STATE_REC
-	tUINT32 counter;
+	u32 counter;
 
 	TS_CHECK_NULL(conn, -EINVAL);
@@ -593,14 +593,14 @@
 /* ======================================================================== */
 /*..__sdp_conn_error -- get the connections error value destructively. */
-static inline tINT32 __sdp_conn_error(struct sdp_opt *conn)
+static inline s32 __sdp_conn_error(struct sdp_opt *conn)
 {
 	/*
	 * The connection error parameter is set and read under the connection
	 * lock, however the linux socket error, needs to be xchg'd since the
	 * SO_ERROR getsockopt happens outside of the connection lock.
	 */
-	tINT32 error = xchg(&TS_SDP_OS_SK_ERR(conn->sk), 0);
+	s32 error = xchg(&TS_SDP_OS_SK_ERR(conn->sk), 0);
 	TS_SDP_OS_CONN_SET_ERR(conn, 0);
 	return -error;
Index: drivers/infiniband/ulp/sdp/sdp_read.c
===================================================================
--- drivers/infiniband/ulp/sdp/sdp_read.c	(revision 615)
+++ drivers/infiniband/ulp/sdp/sdp_read.c	(working copy)
@@ -38,11 +38,11 @@
 /* ========================================================================= */
 /*.._sdp_event_read_advt -- RDMA read event handler for source advertisments. */
-static tINT32
+static s32
 _sdp_event_read_advt(struct sdp_opt *conn, struct ib_wc *comp)
 {
 	struct sdpc_advt *advt;
-	tINT32 result;
+	s32 result;
 
 	TS_CHECK_NULL(conn, -EINVAL);
 	TS_CHECK_NULL(comp, -EINVAL);
@@ -114,12 +114,12 @@
 /* --------------------------------------------------------------------- */
 /* ========================================================================= */
 /*..sdp_event_read -- RDMA read event handler. */
-tINT32 sdp_event_read(struct sdp_opt *conn, struct ib_wc *comp)
+s32 sdp_event_read(struct sdp_opt *conn, struct ib_wc *comp)
 {
 	struct sdpc_iocb *iocb;
 	struct sdpc_buff *buff;
-	tINT32 result;
-	tINT32 type;
+	s32 result;
+	s32 type;
 
 	TS_CHECK_NULL(conn, -EINVAL);
 	TS_CHECK_NULL(comp, -EINVAL);
Index: drivers/infiniband/ulp/sdp/sdp_send.c
===================================================================
--- drivers/infiniband/ulp/sdp/sdp_send.c	(revision 615)
+++ drivers/infiniband/ulp/sdp/sdp_send.c	(working copy)
@@ -41,10 +41,10 @@
 /* ========================================================================= */
 /*.._sdp_inet_write_cancel_func -- lookup function for cancelation */
-static tINT32 _sdp_inet_write_cancel_func(struct sdpc_desc *element, void *arg)
+static s32 _sdp_inet_write_cancel_func(struct sdpc_desc *element, void *arg)
 {
 	struct sdpc_iocb *iocb = (struct sdpc_iocb *) element;
-	tINT32 value = (tINT32) (unsigned long)arg;
+	s32 value = (s32) (unsigned long)arg;
 
 	TS_CHECK_NULL(element, -EINVAL);
@@ -60,13 +60,13 @@
 /* ========================================================================= */
 /*.._sdp_inet_write_cancel -- cancel an IO operation */
-static tINT32 _sdp_inet_write_cancel(struct kiocb *kiocb
-				     _TS_AIO_UNUSED_CANCEL_PARAM)
+static s32
+_sdp_inet_write_cancel(struct kiocb *kiocb _TS_AIO_UNUSED_CANCEL_PARAM)
 {
 	struct sock *sk;
 	struct sdp_opt *conn;
 	struct sdpc_iocb *iocb;
-	tINT32 result = 0;
+	s32 result = 0;
 
 	TS_CHECK_NULL(kiocb, -ERANGE);
@@ -241,10 +241,10 @@
 /* ========================================================================= */
 /*.._sdp_send_buff_post -- Post a buffer send on a SDP connection. */
-static tINT32 _sdp_send_buff_post(struct sdp_opt *conn, struct sdpc_buff *buff)
+static s32 _sdp_send_buff_post(struct sdp_opt *conn, struct sdpc_buff *buff)
 {
 	struct ib_send_param send_param = { 0 };
-	tINT32 result;
+	s32 result;
 
 	TS_CHECK_NULL(conn, -EINVAL);
 	TS_CHECK_NULL(buff, -EINVAL);
@@ -402,10 +402,11 @@
 /* ========================================================================= */
 /*.._sdp_send_data_buff_post -- Post data for buffered transmission */
-static tINT32 _sdp_send_data_buff_post(struct sdp_opt *conn, struct sdpc_buff *buff)
+static s32
+_sdp_send_data_buff_post(struct sdp_opt *conn, struct sdpc_buff *buff)
 {
 	struct sdpc_advt *advt;
-	tINT32 result;
+	s32 result;
 
 	TS_CHECK_NULL(conn, -EINVAL);
 	TS_CHECK_NULL(buff, -EINVAL);
@@ -526,12 +527,12 @@
 /* ========================================================================= */
 /*.._sdp_send_data_buff_snk -- Post data for buffered transmission */
-static tINT32 _sdp_send_data_buff_snk(struct sdp_opt *conn, struct sdpc_buff *buff)
+static s32 _sdp_send_data_buff_snk(struct sdp_opt *conn, struct sdpc_buff *buff)
 {
 	struct ib_send_param send_param = { 0 };
 	struct sdpc_advt *advt;
-	tINT32 result;
-	tINT32 zcopy;
+	s32 result;
+	s32 zcopy;
 
 	TS_CHECK_NULL(conn, -EINVAL);
 	TS_CHECK_NULL(buff, -EINVAL);
@@ -678,13 +679,13 @@
 /* ========================================================================= */
 /*.._sdp_send_data_iocb_snk -- process a zcopy write advert in the data path */
-tINT32 _sdp_send_data_iocb_snk(struct sdp_opt *conn, struct sdpc_iocb *iocb)
+s32 _sdp_send_data_iocb_snk(struct sdp_opt *conn, struct sdpc_iocb *iocb)
 {
 	struct ib_send_param send_param = { 0 };
 	struct ib_gather_scatter sg_val;
 	struct sdpc_advt *advt;
-	tINT32 result;
-	tINT32 zcopy;
+	s32 result;
+	s32 zcopy;
 
 	TS_CHECK_NULL(conn, -EINVAL);
 	TS_CHECK_NULL(iocb, -EINVAL);
@@ -818,11 +819,11 @@
 /* ========================================================================= */
 /*.._sdp_send_data_iocb_src -- send a zcopy read advertisment in the data path */
-tINT32 _sdp_send_data_iocb_src(struct sdp_opt *conn, struct sdpc_iocb *iocb)
+s32 _sdp_send_data_iocb_src(struct sdp_opt *conn, struct sdpc_iocb *iocb)
 {
 	struct msg_hdr_srcah *src_ah;
 	struct sdpc_buff *buff;
-	tINT32 result;
+	s32 result;
 
 	TS_CHECK_NULL(conn, -EINVAL);
 	TS_CHECK_NULL(iocb, -EINVAL);
@@ -932,7 +933,7 @@
 	if (TS_SDP_MODE_COMB == conn->send_mode) {
 #ifdef _TS_SDP_AIO_SUPPORT
		void *vaddr;
-		tINT32 offset;
+		s32 offset;
		/*
		 * In combined mode, it's a protocol requirment to send at
		 * least a byte of data in the SrcAvail.
@@ -1005,14 +1006,14 @@ #ifdef _TS_SDP_AIO_SUPPORT /* ========================================================================= */ /*.._sdp_send_data_iocb_buff_kvec -- write into a SDP buffer from a kvec */ -static tINT32 _sdp_send_data_iocb_buff_kvec(struct kvec_dst *src, +static s32 _sdp_send_data_iocb_buff_kvec(struct kvec_dst *src, struct sdpc_buff *buff, - tINT32 len) + s32 len) { void *tail; - tINT32 part; - tINT32 left; - tINT32 copy; + s32 part; + s32 left; + s32 copy; TS_CHECK_NULL(src, -EINVAL); TS_CHECK_NULL(buff, -EINVAL); @@ -1023,7 +1024,7 @@ /* * copy from source to buffer */ - copy = min(len, (tINT32) (buff->end - buff->tail)); + copy = min(len, (s32) (buff->end - buff->tail)); tail = buff->tail; for (left = copy; 0 < left;) { @@ -1071,13 +1072,14 @@ /* ========================================================================= */ /*.._sdp_send_data_iocb_buff -- write multiple SDP buffers from an ioc */ -static tINT32 _sdp_send_data_iocb_buff(struct sdp_opt *conn, struct sdpc_iocb *iocb) +static s32 +_sdp_send_data_iocb_buff(struct sdp_opt *conn, struct sdpc_iocb *iocb) { struct sdpc_buff *buff; - tINT32 copy; - tINT32 partial = 0; - tINT32 result; - tINT32 w_space; + s32 copy; + s32 partial = 0; + s32 result; + s32 w_space; TS_CHECK_NULL(conn, -EINVAL); TS_CHECK_NULL(iocb, -EINVAL); @@ -1166,9 +1168,9 @@ /* ========================================================================= */ /*.._sdp_send_data_iocb -- Post IOCB data for transmission */ -static tINT32 _sdp_send_data_iocb(struct sdp_opt *conn, struct sdpc_iocb *iocb) +static s32 _sdp_send_data_iocb(struct sdp_opt *conn, struct sdpc_iocb *iocb) { - tINT32 result; + s32 result; TS_CHECK_NULL(conn, -EINVAL); TS_CHECK_NULL(iocb, -EINVAL); @@ -1300,9 +1302,10 @@ /* ========================================================================= */ /*.._sdp_send_data_queue_test -- send data buffer if conditions are met */ -static tINT32 _sdp_send_data_queue_test(struct sdp_opt *conn, struct sdpc_desc 
*element) +static s32 +_sdp_send_data_queue_test(struct sdp_opt *conn, struct sdpc_desc *element) { - tINT32 result; + s32 result; TS_CHECK_NULL(conn, -EINVAL); TS_CHECK_NULL(element, -EINVAL); @@ -1345,10 +1348,10 @@ /* ========================================================================= */ /*.._sdp_send_data_queue_flush -- Flush data from send queue, to send post. */ -static tINT32 _sdp_send_data_queue_flush(struct sdp_opt *conn) +static s32 _sdp_send_data_queue_flush(struct sdp_opt *conn) { struct sdpc_desc *element; - tINT32 result = 0; + s32 result = 0; TS_CHECK_NULL(conn, -EINVAL); /* @@ -1396,9 +1399,9 @@ /* ========================================================================= */ /*.._sdp_send_data_queue -- send using the data queue if necessary. */ -static tINT32 _sdp_send_data_queue(struct sdp_opt *conn, struct sdpc_desc *element) +static s32 _sdp_send_data_queue(struct sdp_opt *conn, struct sdpc_desc *element) { - tINT32 result = 0; + s32 result = 0; TS_CHECK_NULL(conn, -EINVAL); TS_CHECK_NULL(element, -EINVAL); @@ -1500,13 +1503,13 @@ /* ========================================================================= */ /*.._sdp_send_data_buff_put -- place a buffer into the send queue */ -static __inline__ tINT32 _sdp_send_data_buff_put(struct sdp_opt *conn, - struct sdpc_buff *buff, - tINT32 size, - tINT32 urg) +static __inline__ s32 _sdp_send_data_buff_put(struct sdp_opt *conn, + struct sdpc_buff *buff, + s32 size, + s32 urg) { - tINT32 result = 0; - tINT32 expect; + s32 result = 0; + s32 expect; TS_CHECK_NULL(conn, -EINVAL); TS_CHECK_NULL(buff, -EINVAL); @@ -1567,9 +1570,10 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*.._sdp_send_ctrl_buff_test -- determine if it's OK to post a control msg */ -static tINT32 _sdp_send_ctrl_buff_test(struct sdp_opt *conn, struct sdpc_buff *buff) +static s32 +_sdp_send_ctrl_buff_test(struct sdp_opt *conn, 
struct sdpc_buff *buff) { - tINT32 result = 0; + s32 result = 0; TS_CHECK_NULL(conn, -EINVAL); TS_CHECK_NULL(buff, -EINVAL); @@ -1599,10 +1603,10 @@ /* ========================================================================= */ /*.._sdp_send_ctrl_buff_flush -- Flush control buffers, to send post. */ -static tINT32 _sdp_send_ctrl_buff_flush(struct sdp_opt *conn) +static s32 _sdp_send_ctrl_buff_flush(struct sdp_opt *conn) { struct sdpc_desc *element; - tINT32 result = 0; + s32 result = 0; TS_CHECK_NULL(conn, -EINVAL); /* @@ -1643,9 +1647,10 @@ /* ========================================================================= */ /*.._sdp_send_ctrl_buff_buffered -- Send a buffered control message. */ -static tINT32 _sdp_send_ctrl_buff_buffered(struct sdp_opt *conn, struct sdpc_buff *buff) +static s32 +_sdp_send_ctrl_buff_buffered(struct sdp_opt *conn, struct sdpc_buff *buff) { - tINT32 result = 0; + s32 result = 0; TS_CHECK_NULL(conn, -EINVAL); TS_CHECK_NULL(buff, -EINVAL); @@ -1684,12 +1689,12 @@ /* ========================================================================= */ /*.._sdp_send_ctrl_buff -- Create and Send a buffered control message. */ -static tINT32 _sdp_send_ctrl_buff(struct sdp_opt *conn, - tUINT8 mid, +static s32 _sdp_send_ctrl_buff(struct sdp_opt *conn, + u8 mid, tBOOLEAN se, tBOOLEAN sig) { - tINT32 result = 0; + s32 result = 0; struct sdpc_buff *buff; TS_CHECK_NULL(conn, -EINVAL); @@ -1753,9 +1758,9 @@ /* ========================================================================= */ /*.._sdp_send_ctrl_disconnect -- Send a disconnect request. */ -static tINT32 _sdp_send_ctrl_disconnect(struct sdp_opt *conn) +static s32 _sdp_send_ctrl_disconnect(struct sdp_opt *conn) { - tINT32 result = 0; + s32 result = 0; struct sdpc_buff *buff; TS_CHECK_NULL(conn, -EINVAL); @@ -1806,9 +1811,9 @@ /* ========================================================================= */ /*..sdp_send_ctrl_disconnect -- potentially send a disconnect request. 
*/ -tINT32 sdp_send_ctrl_disconnect(struct sdp_opt *conn) +s32 sdp_send_ctrl_disconnect(struct sdp_opt *conn) { - tINT32 result; + s32 result; TS_CHECK_NULL(conn, -EINVAL); /* @@ -1840,7 +1845,7 @@ /* ========================================================================= */ /*..sdp_send_ctrl_ack -- Send a gratuitous Ack. */ -tINT32 sdp_send_ctrl_ack(struct sdp_opt *conn) +s32 sdp_send_ctrl_ack(struct sdp_opt *conn) { TS_CHECK_NULL(conn, -EINVAL); /* @@ -1864,28 +1869,28 @@ /* ========================================================================= */ /*..sdp_send_ctrl_send_sm -- Send a request for buffered mode. */ -tINT32 sdp_send_ctrl_send_sm(struct sdp_opt *conn) +s32 sdp_send_ctrl_send_sm(struct sdp_opt *conn) { return _sdp_send_ctrl_buff(conn, TS_SDP_MSG_MID_SEND_SM, TRUE, TRUE); } /* sdp_send_ctrl_send_sm */ /* ========================================================================= */ /*..sdp_send_ctrl_src_cancel -- Send a source cancel */ -tINT32 sdp_send_ctrl_src_cancel(struct sdp_opt *conn) +s32 sdp_send_ctrl_src_cancel(struct sdp_opt *conn) { return _sdp_send_ctrl_buff(conn, TS_SDP_MSG_MID_SRC_CANCEL, TRUE, TRUE); } /* sdp_send_ctrl_src_cancel */ /* ========================================================================= */ /*..sdp_send_ctrl_snk_cancel -- Send a sink cancel */ -tINT32 sdp_send_ctrl_snk_cancel(struct sdp_opt *conn) +s32 sdp_send_ctrl_snk_cancel(struct sdp_opt *conn) { return _sdp_send_ctrl_buff(conn, TS_SDP_MSG_MID_SNK_CANCEL, TRUE, TRUE); } /* sdp_send_ctrl_snk_cancel */ /* ========================================================================= */ /*..sdp_send_ctrl_snk_cancel_ack -- Send an ack for a sink cancel */ -tINT32 sdp_send_ctrl_snk_cancel_ack(struct sdp_opt *conn) +s32 sdp_send_ctrl_snk_cancel_ack(struct sdp_opt *conn) { return _sdp_send_ctrl_buff(conn, TS_SDP_MSG_MID_SNK_CANCEL_ACK, TRUE, TRUE); @@ -1893,7 +1898,7 @@ /* ========================================================================= */ /*..sdp_send_ctrl_abort 
-- Send an abort message. */ -tINT32 sdp_send_ctrl_abort(struct sdp_opt *conn) +s32 sdp_send_ctrl_abort(struct sdp_opt *conn) { TS_CHECK_NULL(conn, -EINVAL); /* @@ -1904,10 +1909,10 @@ /* ========================================================================= */ /*..sdp_send_ctrl_resize_buff_ack -- Send an ack for a buffer size change */ -tINT32 sdp_send_ctrl_resize_buff_ack(struct sdp_opt *conn, tUINT32 size) +s32 sdp_send_ctrl_resize_buff_ack(struct sdp_opt *conn, u32 size) { struct msg_hdr_crbah *crbah; - tINT32 result = 0; + s32 result = 0; struct sdpc_buff *buff; TS_CHECK_NULL(conn, -EINVAL); @@ -1960,10 +1965,10 @@ /* ========================================================================= */ /*..sdp_send_ctrl_rdma_rd -- Send an rdma read completion */ -tINT32 sdp_send_ctrl_rdma_rd(struct sdp_opt *conn, tINT32 size) +s32 sdp_send_ctrl_rdma_rd(struct sdp_opt *conn, s32 size) { struct msg_hdr_rrch *rrch; - tINT32 result = 0; + s32 result = 0; struct sdpc_buff *buff; TS_CHECK_NULL(conn, -EINVAL); @@ -1998,7 +2003,7 @@ buff->bsdh_hdr->flags = TS_SDP_MSG_FLAG_NON_FLAG; buff->tail += sizeof(struct msg_hdr_bsdh); rrch = (struct msg_hdr_rrch *) buff->tail; - rrch->size = (tUINT32) size; + rrch->size = (u32) size; buff->tail += sizeof(struct msg_hdr_rrch); /* * solicit event @@ -2036,10 +2041,10 @@ /* ========================================================================= */ /*..sdp_send_ctrl_rdma_wr -- Send an rdma write completion */ -tINT32 sdp_send_ctrl_rdma_wr(struct sdp_opt *conn, tUINT32 size) +s32 sdp_send_ctrl_rdma_wr(struct sdp_opt *conn, u32 size) { struct msg_hdr_rwch *rwch; - tINT32 result = 0; + s32 result = 0; struct sdpc_buff *buff; TS_CHECK_NULL(conn, -EINVAL); @@ -2098,13 +2103,13 @@ /* ========================================================================= */ /*..sdp_send_ctrl_snk_avail -- Send a sink available message */ -tINT32 sdp_send_ctrl_snk_avail(struct sdp_opt *conn, - tUINT32 size, - tUINT32 rkey, - tUINT64 addr) +s32 
sdp_send_ctrl_snk_avail(struct sdp_opt *conn, + u32 size, + u32 rkey, + u64 addr) { struct msg_hdr_snkah *snkah; - tINT32 result = 0; + s32 result = 0; struct sdpc_buff *buff; TS_CHECK_NULL(conn, -EINVAL); @@ -2170,10 +2175,10 @@ /* ========================================================================= */ /*..sdp_send_ctrl_mode_ch -- Send a mode change command */ -tINT32 sdp_send_ctrl_mode_ch(struct sdp_opt *conn, tUINT8 mode) +s32 sdp_send_ctrl_mode_ch(struct sdp_opt *conn, u8 mode) { struct msg_hdr_mch *mch; - tINT32 result = 0; + s32 result = 0; struct sdpc_buff *buff; TS_CHECK_NULL(conn, -EINVAL); @@ -2268,10 +2273,10 @@ /* ========================================================================= */ /*.._sdp_send_flush_advt -- Flush passive sink advertisments */ -static tINT32 _sdp_send_flush_advt(struct sdp_opt *conn) +static s32 _sdp_send_flush_advt(struct sdp_opt *conn) { struct sdpc_advt *advt; - tINT32 result; + s32 result; /* * If there is no data in the pending or active send pipes, and a * partially complete sink advertisment is pending, then it needs @@ -2312,9 +2317,9 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*..sdp_send_flush -- Flush buffers from send queue, in to send post. 
*/ -tINT32 sdp_send_flush(struct sdp_opt *conn) +s32 sdp_send_flush(struct sdp_opt *conn) { - tINT32 result = 0; + s32 result = 0; TS_CHECK_NULL(conn, -EINVAL); /* @@ -2389,10 +2394,10 @@ struct sock *sk; struct sdp_opt *conn; struct sdpc_buff *buff; - tINT32 result = 0; - tINT32 copied = 0; - tINT32 copy; - tINT32 oob; + s32 result = 0; + s32 copied = 0; + s32 copy; + s32 oob; long timeout = -1; TS_CHECK_NULL(sock, -EINVAL); @@ -2565,15 +2570,15 @@ #ifdef _TS_SDP_AIO_SUPPORT /* ========================================================================= */ /*.._sdp_inet_write_fast -- write multiple SDP buffers from an iocb */ -static tINT32 _sdp_inet_write_fast(struct sdp_opt *conn, - struct kvec_dst *src, - tINT32 len) +static s32 _sdp_inet_write_fast(struct sdp_opt *conn, + struct kvec_dst *src, + s32 len) { struct sdpc_buff *buff; - tINT32 copied = 0; - tINT32 expect; - tINT32 result; - tINT32 copy; + s32 copied = 0; + s32 expect; + s32 result; + s32 copy; TS_CHECK_NULL(conn, -EINVAL); TS_CHECK_NULL(src, -EINVAL); @@ -2631,8 +2636,8 @@ struct sock *sk; struct sdp_opt *conn; struct sdpc_iocb *iocb; - tINT32 copied = 0; - tINT32 result = 0; + s32 copied = 0; + s32 result = 0; TS_CHECK_NULL(sock, -EINVAL); TS_CHECK_NULL(sock->sk, -EINVAL); @@ -2654,7 +2659,7 @@ "<%d:%d:%08x>", cb.vec->max_nr, cb.vec->nr, cb.vec->veclet->offset, cb.vec->veclet->length, - (tUINT32) cb.fn, req->key, req->users, (tUINT32) req->data); + (u32) cb.fn, req->key, req->users, (u32) req->data); #endif /* * initialize memory destination Index: drivers/infiniband/ulp/sdp/sdp_conn.c =================================================================== --- drivers/infiniband/ulp/sdp/sdp_conn.c (revision 615) +++ drivers/infiniband/ulp/sdp/sdp_conn.c (working copy) @@ -35,7 +35,7 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*..sdp_inet_accept_q_put -- put a conn into a listen conn's 
accept Q. */ -tINT32 sdp_inet_accept_q_put(struct sdp_opt *listen_conn, struct sdp_opt *accept_conn) +s32 sdp_inet_accept_q_put(struct sdp_opt *listen_conn, struct sdp_opt *accept_conn) { struct sdp_opt *next_conn; @@ -111,7 +111,7 @@ /* ========================================================================= */ /*..sdp_inet_accept_q_remove -- remove a conn from a conn's accept Q. */ -tINT32 sdp_inet_accept_q_remove(struct sdp_opt *accept_conn) +s32 sdp_inet_accept_q_remove(struct sdp_opt *accept_conn) { struct sdp_opt *next_conn; struct sdp_opt *prev_conn; @@ -149,7 +149,7 @@ /* ========================================================================= */ /*..sdp_inet_listen_start -- start listening for new connections on a socket */ -tINT32 sdp_inet_listen_start(struct sdp_opt *conn) +s32 sdp_inet_listen_start(struct sdp_opt *conn) { unsigned long flags; @@ -188,10 +188,10 @@ /* ========================================================================= */ /*..sdp_inet_listen_stop -- stop listening for new connections on a socket */ -tINT32 sdp_inet_listen_stop(struct sdp_opt *listen_conn) +s32 sdp_inet_listen_stop(struct sdp_opt *listen_conn) { struct sdp_opt *accept_conn; - tINT32 result; + s32 result; unsigned long flags; TS_CHECK_NULL(listen_conn, -EINVAL); @@ -256,7 +256,7 @@ /* ========================================================================= */ /*..sdp_inet_listen_lookup -- lookup a connection in the listen list */ -struct sdp_opt *sdp_inet_listen_lookup(tUINT32 addr, tUINT16 port) +struct sdp_opt *sdp_inet_listen_lookup(u32 addr, u16 port) { struct sdp_opt *conn; unsigned long flags; @@ -284,17 +284,17 @@ /* ========================================================================= */ /*..sdp_inet_port_get -- bind a socket to a port. 
*/ -tINT32 sdp_inet_port_get(struct sdp_opt *conn, tUINT16 port) +s32 sdp_inet_port_get(struct sdp_opt *conn, u16 port) { struct sock *sk; struct sock *srch; struct sdp_opt *look; - tINT32 counter; - tINT32 low_port; - tINT32 top_port; - tINT32 port_ok; - tINT32 result; - static tINT32 rover = -1; + s32 counter; + s32 low_port; + s32 top_port; + s32 port_ok; + s32 result; + static s32 rover = -1; unsigned long flags; TS_CHECK_NULL(conn, -EINVAL); @@ -423,7 +423,7 @@ /* ========================================================================= */ /*..sdp_inet_port_put -- unbind a socket from a port. */ -tINT32 sdp_inet_port_put(struct sdp_opt *conn) +s32 sdp_inet_port_put(struct sdp_opt *conn) { unsigned long flags; @@ -456,9 +456,9 @@ /* ========================================================================= */ /*..sdp_inet_port_inherit -- inherit a port from another socket (accept) */ -tINT32 sdp_inet_port_inherit(struct sdp_opt *parent, struct sdp_opt *child) +s32 sdp_inet_port_inherit(struct sdp_opt *parent, struct sdp_opt *child) { - tINT32 result; + s32 result; unsigned long flags; TS_CHECK_NULL(child, -EINVAL); @@ -497,10 +497,10 @@ /* ========================================================================= */ /*..sdp_conn_table_insert -- insert a connection into the connection table */ -tINT32 sdp_conn_table_insert(struct sdp_opt *conn) +s32 sdp_conn_table_insert(struct sdp_opt *conn) { - tINT32 counter; - tINT32 result = -ENOMEM; + s32 counter; + s32 result = -ENOMEM; unsigned long flags; TS_CHECK_NULL(conn, -EINVAL); @@ -547,9 +547,9 @@ /* ========================================================================= */ /*..sdp_conn_table_remove -- remove a connection from the connection table */ -tINT32 sdp_conn_table_remove(struct sdp_opt *conn) +s32 sdp_conn_table_remove(struct sdp_opt *conn) { - tINT32 result = 0; + s32 result = 0; unsigned long flags; TS_CHECK_NULL(conn, -EINVAL); @@ -587,7 +587,7 @@ /* 
========================================================================= */ /*..sdp_conn_table_lookup -- look up connection in the connection table */ -struct sdp_opt *sdp_conn_table_lookup(tINT32 entry) +struct sdp_opt *sdp_conn_table_lookup(s32 entry) { struct sdp_opt *conn; unsigned long flags; @@ -625,10 +625,10 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*..sdp_conn_destruct -- final destructor for connection. */ -tINT32 sdp_conn_destruct(struct sdp_opt *conn) +s32 sdp_conn_destruct(struct sdp_opt *conn) { - tINT32 result = 0; - tINT32 dump = 0; + s32 result = 0; + s32 dump = 0; if (NULL == conn) { @@ -799,10 +799,10 @@ void sdp_conn_internal_relock(struct sdp_opt *conn) { struct ib_wc entry; - tINT32 result_r; - tINT32 result_s; - tINT32 result; - tINT32 rearm = 1; + s32 result_r; + s32 result_s; + s32 result; + s32 rearm = 1; while (1) { @@ -880,12 +880,12 @@ /* ========================================================================= */ /*..sdp_conn_cq_drain -- drain one of the the connection's CQs */ -tINT32 sdp_conn_cq_drain(struct ib_cq *cq, struct sdp_opt *conn) +s32 sdp_conn_cq_drain(struct ib_cq *cq, struct sdp_opt *conn) { struct ib_wc entry; - tINT32 result; - tINT32 rearm = 1; - tINT32 calls = 0; + s32 result; + s32 rearm = 1; + s32 calls = 0; /* * the function should only be called under the connection locks * spinlock to ensure the call is serialized to avoid races. @@ -956,7 +956,7 @@ /*..sdp_conn_internal_unlock -- lock the connection (use only from macro) */ void sdp_conn_internal_unlock(struct sdp_opt *conn) { - tINT32 calls = 0; + s32 calls = 0; /* * poll CQs for events. 
*/ @@ -980,7 +980,7 @@ /* ========================================================================= */ /*.._sdp_conn_lock_init -- initialize connection lock */ -static tINT32 _sdp_conn_lock_init(struct sdp_opt *conn) +static s32 _sdp_conn_lock_init(struct sdp_opt *conn) { TS_CHECK_NULL(conn, -EINVAL); @@ -1191,11 +1191,11 @@ /* ========================================================================= */ /*..sdp_conn_alloc -- allocate a new socket, and init. */ -struct sdp_opt *sdp_conn_alloc(tINT32 priority, tTS_IB_CM_COMM_ID comm_id) +struct sdp_opt *sdp_conn_alloc(s32 priority, tTS_IB_CM_COMM_ID comm_id) { struct sdp_opt *conn; struct sock *sk; - tINT32 result; + s32 result; sk = sk_alloc(_dev_root_s.proto, priority, 1, _dev_root_s.sock_cache); if (NULL == sk) { @@ -1808,8 +1808,8 @@ { struct sdev_hca_port *port; struct sdev_hca *hca; - tUINT64 subnet_prefix; - tUINT64 guid; + u64 subnet_prefix; + u64 guid; int hca_count; int port_count; int offset = 0; @@ -1892,7 +1892,7 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*.._sdp_device_table_init -- create hca list */ -static tINT32 _sdp_device_table_init(struct sdev_root *dev_root) +static s32 _sdp_device_table_init(struct sdev_root *dev_root) { #ifdef _TS_SDP_AIO_SUPPORT tTS_IB_FMR_POOL_PARAM_STRUCT fmr_param_s; @@ -1902,10 +1902,10 @@ struct ib_device *hca_handle; struct sdev_hca_port *port; struct sdev_hca *hca; - tINT32 result; - tINT32 hca_count; - tINT32 port_count; - tINT32 fmr_size; + s32 result; + s32 hca_count; + s32 port_count; + s32 fmr_size; TS_CHECK_NULL(dev_root, -EINVAL); @@ -2072,7 +2072,7 @@ /* ========================================================================= */ /*.._sdp_device_table_cleanup -- delete hca list */ -static tINT32 _sdp_device_table_cleanup(struct sdev_root *dev_root) +static s32 _sdp_device_table_cleanup(struct sdev_root *dev_root) { struct sdev_hca_port *port; 
struct sdev_hca *hca; @@ -2130,9 +2130,9 @@ int send_buff_max, int send_usig_max) { - tINT32 result; - tINT32 byte_size; - tINT32 page_size; + s32 result; + s32 byte_size; + s32 page_size; TS_TRACE(MOD_LNX_SDP, T_VERY_VERBOSE, TRACE_FLOW_INOUT, "INIT: creating connection table."); @@ -2279,7 +2279,7 @@ /* ========================================================================= */ /*..sdp_conn_table_clear -- destroy connection managment and tables */ -tINT32 sdp_conn_table_clear(void) +s32 sdp_conn_table_clear(void) { #if 0 struct sdp_opt *conn; Index: drivers/infiniband/ulp/sdp/sdp_advt.c =================================================================== --- drivers/infiniband/ulp/sdp/sdp_advt.c (revision 615) +++ drivers/infiniband/ulp/sdp/sdp_advt.c (working copy) @@ -168,7 +168,7 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*..sdp_advt_q_create - create an advertisment table */ -struct sdpc_advt_q *sdp_advt_q_create(tINT32 * result) +struct sdpc_advt_q *sdp_advt_q_create(s32 * result) { struct sdpc_advt_q *table = NULL; @@ -256,7 +256,7 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*..sdp_main_advt_init -- initialize the advertisment caches. */ -tINT32 sdp_main_advt_init(void) +s32 sdp_main_advt_init(void) { int result; @@ -302,7 +302,7 @@ /* ========================================================================= */ /*..sdp_main_advt_cleanup -- cleanup the advertisment caches. 
*/ -tINT32 sdp_main_advt_cleanup(void) +s32 sdp_main_advt_cleanup(void) { TS_TRACE(MOD_LNX_SDP, T_VERBOSE, TRACE_FLOW_INOUT, "INIT: Advertisment cache cleanup."); Index: drivers/infiniband/ulp/sdp/sdp_recv.c =================================================================== --- drivers/infiniband/ulp/sdp/sdp_recv.c (revision 615) +++ drivers/infiniband/ulp/sdp/sdp_recv.c (working copy) @@ -38,10 +38,10 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*.._sdp_post_recv_buff -- post a single buffers for data recv */ -static tINT32 _sdp_post_recv_buff(struct sdp_opt *conn) +static s32 _sdp_post_recv_buff(struct sdp_opt *conn) { struct ib_receive_param receive_param = { 0 }; - tINT32 result; + s32 result; struct sdpc_buff *buff; TS_CHECK_NULL(conn, -EINVAL); @@ -113,11 +113,11 @@ /* ========================================================================= */ /*.._sdp_post_rdma_buff -- post a single buffers for rdma read on a conn */ -static tINT32 _sdp_post_rdma_buff(struct sdp_opt *conn) +static s32 _sdp_post_rdma_buff(struct sdp_opt *conn) { struct ib_send_param send_param = { 0 }; struct sdpc_advt *advt; - tINT32 result; + s32 result; struct sdpc_buff *buff; TS_CHECK_NULL(conn, -EINVAL); @@ -157,7 +157,7 @@ * the correct range. 
*/ buff->tail = buff->end; - buff->data = buff->tail - min((tINT32) conn->recv_size, advt->size); + buff->data = buff->tail - min((s32) conn->recv_size, advt->size); buff->lkey = conn->l_key; buff->ib_wrid = TS_SDP_WRID_READ_FLAG | conn->recv_wrid++; @@ -248,14 +248,14 @@ /* ========================================================================= */ /*.._sdp_post_rdma_iocb_src -- post a iocb for rdma read on a conn */ -static tINT32 _sdp_post_rdma_iocb_src(struct sdp_opt *conn) +static s32 _sdp_post_rdma_iocb_src(struct sdp_opt *conn) { struct ib_send_param send_param = { 0 }; struct ib_gather_scatter sg_val; struct sdpc_iocb *iocb; struct sdpc_advt *advt; - tINT32 result; - tINT32 zcopy; + s32 result; + s32 zcopy; TS_CHECK_NULL(conn, -EINVAL); /* @@ -414,9 +414,9 @@ /* ========================================================================= */ /*.._sdp_post_rdma_iocb_snk -- post a iocb for rdma read on a conn */ -static tINT32 _sdp_post_rdma_iocb_snk(struct sdp_opt *conn) +static s32 _sdp_post_rdma_iocb_snk(struct sdp_opt *conn) { - tINT32 result = 0; + s32 result = 0; struct sdpc_iocb *iocb; TS_CHECK_NULL(conn, -EINVAL); @@ -534,9 +534,9 @@ /* ========================================================================= */ /*.._sdp_post_rdma -- post a rdma based requests for a connection */ -static tINT32 _sdp_post_rdma(struct sdp_opt *conn) +static s32 _sdp_post_rdma(struct sdp_opt *conn) { - tINT32 result; + s32 result; TS_CHECK_NULL(conn, -EINVAL); /* @@ -635,9 +635,9 @@ /* ========================================================================= */ /*..sdp_recv_flush -- post a certain number of buffers on a connection */ -tINT32 sdp_recv_flush(struct sdp_opt *conn) +s32 sdp_recv_flush(struct sdp_opt *conn) { - tINT32 result; + s32 result; int counter; TS_CHECK_NULL(conn, -EINVAL); @@ -732,7 +732,7 @@ if ((3 > conn->l_advt_bf && conn->l_recv_bf > conn->l_advt_bf) || (TS_SDP_CONN_RECV_POST_ACK < (conn->l_recv_bf - conn->l_advt_bf) && - 0 == ((tUINT32) 
conn->snk_recv + (tUINT32) conn->src_recv))) { + 0 == ((u32) conn->snk_recv + (u32) conn->src_recv))) { result = sdp_send_ctrl_ack(conn); if (0 > result) { @@ -757,15 +757,15 @@ #ifdef _TS_SDP_AIO_SUPPORT /* ========================================================================= */ /*.._sdp_read_buff_iocb -- read a SDP buffer into a kvec */ -static tINT32 _sdp_read_buff_iocb(struct kvec_dst *dst, - struct sdpc_buff *buff, - tINT32 len) +static s32 _sdp_read_buff_iocb(struct kvec_dst *dst, + struct sdpc_buff *buff, + s32 len) { void *data; void *tail; - tINT32 part; - tINT32 left; - tINT32 copy; + s32 part; + s32 left; + s32 copy; TS_CHECK_NULL(dst, -EINVAL); TS_CHECK_NULL(buff, -EINVAL); @@ -785,7 +785,7 @@ /* * copy buffer to dst */ - copy = min(len, (tINT32) (buff->tail - buff->data)); + copy = min(len, (s32) (buff->tail - buff->data)); for (left = copy; 0 < left;) { @@ -830,16 +830,16 @@ /* ========================================================================= */ /*.._sdp_read_buff_iocb_flush -- read multiple SDP buffers into a kvec */ -tINT32 _sdp_read_buff_iocb_flush(struct sock *sk, - struct kvec_dst *dst, - tINT32 len) +s32 _sdp_read_buff_iocb_flush(struct sock *sk, + struct kvec_dst *dst, + s32 len) { struct sdp_opt *conn; struct sdpc_buff *buff; - tINT32 copied = 0; - tINT32 partial; - tINT32 result; - tINT32 expect; + s32 copied = 0; + s32 partial; + s32 result; + s32 expect; TS_CHECK_NULL(sk, -EINVAL); TS_CHECK_NULL(dst, -EINVAL); @@ -905,11 +905,11 @@ /* ========================================================================= */ /*.._sdp_recv_buff_iocb_active -- Ease AIO read pending pressure */ -static tINT32 _sdp_recv_buff_iocb_active(struct sdp_opt *conn, struct sdpc_buff *buff) +static s32 _sdp_recv_buff_iocb_active(struct sdp_opt *conn, struct sdpc_buff *buff) { #ifdef _TS_SDP_AIO_SUPPORT struct sdpc_iocb *iocb; - tINT32 result; + s32 result; TS_CHECK_NULL(conn, -EINVAL); TS_CHECK_NULL(buff, -EINVAL); @@ -971,12 +971,12 @@ /* 
========================================================================= */ /*.._sdp_recv_buff_iocb_pending -- Ease AIO read pending pressure */ -static tINT32 _sdp_recv_buff_iocb_pending(struct sdp_opt *conn, struct sdpc_buff *buff) +static s32 _sdp_recv_buff_iocb_pending(struct sdp_opt *conn, struct sdpc_buff *buff) { #ifdef _TS_SDP_AIO_SUPPORT struct sdpc_iocb *iocb; - tINT32 copied; - tINT32 result; + s32 copied; + s32 result; TS_CHECK_NULL(conn, -EINVAL); TS_CHECK_NULL(buff, -EINVAL); @@ -1043,10 +1043,10 @@ /* ========================================================================= */ /*..sdp_recv_buff -- Process a new buffer based on queue type. */ -tINT32 sdp_recv_buff(struct sdp_opt *conn, struct sdpc_buff *buff) +s32 sdp_recv_buff(struct sdp_opt *conn, struct sdpc_buff *buff) { - tINT32 result; - tINT32 buffered; + s32 result; + s32 buffered; TS_CHECK_NULL(conn, -EINVAL); TS_CHECK_NULL(buff, -EINVAL); @@ -1164,7 +1164,7 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*.._sdp_inet_recv_urg_test_func -- recv queue urgent data cleanup function */ -static tINT32 _sdp_inet_recv_urg_test_func(struct sdpc_buff *buff, void *arg) +static s32 _sdp_inet_recv_urg_test_func(struct sdpc_buff *buff, void *arg) { TS_CHECK_NULL(buff, -EINVAL); @@ -1173,10 +1173,10 @@ /* ========================================================================= */ /*.._sdp_inet_recv_urg_trav_func -- recv queue urg data retreival function */ -static tINT32 _sdp_inet_recv_urg_trav_func(struct sdpc_buff *buff, void *arg) +static s32 _sdp_inet_recv_urg_trav_func(struct sdpc_buff *buff, void *arg) { - tUINT8 *value = (tUINT8 *) arg; - tUINT8 update; + u8 *value = (u8 *) arg; + u8 update; TS_CHECK_NULL(buff, -EINVAL); TS_CHECK_NULL(value, -EINVAL); @@ -1186,7 +1186,7 @@ TS_EXPECT(MOD_LNX_SDP, (buff->tail > buff->data)); update = *value; - *value = *(tUINT8 *) (buff->tail - 1); + 
*value = *(u8 *) (buff->tail - 1); if (0 < update) { @@ -1202,15 +1202,15 @@ /* ========================================================================= */ /*.._sdp_inet_recv_urg -- recv urgent data from the network to user space */ -static tINT32 _sdp_inet_recv_urg(struct sock *sk, - struct msghdr *msg, - int size, - int flags) +static s32 _sdp_inet_recv_urg(struct sock *sk, + struct msghdr *msg, + int size, + int flags) { struct sdp_opt *conn; struct sdpc_buff *buff; - tINT32 result = 0; - tUINT8 value; + s32 result = 0; + u8 value; TS_CHECK_NULL(sk, -EINVAL); TS_CHECK_NULL(msg, -EINVAL); @@ -1287,7 +1287,7 @@ /* ========================================================================= */ /*..sdp_inet_recv -- recv data from the network to user space. */ -tINT32 sdp_inet_recv( +s32 sdp_inet_recv( struct kiocb *iocb, struct socket *sock, struct msghdr *msg, @@ -1301,15 +1301,15 @@ struct sdpc_buff *head = NULL; long timeout; size_t length; - tINT32 result = 0; - tINT32 expect; - tINT32 low_water; - tINT32 copied = 0; - tINT32 copy; - tINT32 update; - tINT32 free_count = 0; - tINT8 oob = 0; - tINT8 ack = 0; + s32 result = 0; + s32 expect; + s32 low_water; + s32 copied = 0; + s32 copy; + s32 update; + s32 free_count = 0; + s8 oob = 0; + s8 ack = 0; struct sdpc_buff_q peek_queue; TS_CHECK_NULL(sock, -EINVAL); @@ -1657,10 +1657,10 @@ #ifdef _TS_SDP_AIO_SUPPORT /* ========================================================================= */ /*.._sdp_inet_read_cancel_func -- lookup function for cancelation */ -static tINT32 _sdp_inet_read_cancel_func(struct sdpc_desc *element, void *arg) +static s32 _sdp_inet_read_cancel_func(struct sdpc_desc *element, void *arg) { struct sdpc_iocb *iocb = (struct sdpc_iocb *) element; - tINT32 value = (tINT32) (unsigned long)arg; + s32 value = (s32) (unsigned long)arg; TS_CHECK_NULL(element, -EINVAL); @@ -1676,13 +1676,13 @@ /* ========================================================================= */ /*.._sdp_inet_read_cancel -- 
cancel an IO operation */ -static tINT32 _sdp_inet_read_cancel(struct kiocb *kiocb +static s32 _sdp_inet_read_cancel(struct kiocb *kiocb _TS_AIO_UNUSED_CANCEL_PARAM) { struct sock *sk; struct sdp_opt *conn; struct sdpc_iocb *iocb; - tINT32 result = 0; + s32 result = 0; TS_CHECK_NULL(kiocb, -ERANGE); @@ -1839,9 +1839,9 @@ struct sock *sk; struct sdp_opt *conn; struct sdpc_iocb *iocb; - tINT32 copied = 0; - tINT32 result = 0; - tINT32 expect; + s32 copied = 0; + s32 result = 0; + s32 expect; TS_CHECK_NULL(sock, -EINVAL); TS_CHECK_NULL(sock->sk, -EINVAL); @@ -1864,7 +1864,7 @@ "<%d:%d:%08x>", cb.vec->max_nr, cb.vec->nr, cb.vec->veclet->offset, cb.vec->veclet->length, - (tUINT32) cb.fn, req->key, req->users, (tUINT32) req->data); + (u32) cb.fn, req->key, req->users, (u32) req->data); #endif /* * initialize memory destination Index: drivers/infiniband/ulp/sdp/sdp_advt.h =================================================================== --- drivers/infiniband/ulp/sdp/sdp_advt.h (revision 615) +++ drivers/infiniband/ulp/sdp/sdp_advt.h (working copy) @@ -39,25 +39,25 @@ struct sdpc_advt { struct sdpc_advt *next; /* next structure in table */ struct sdpc_advt *prev; /* previous structure in table */ - tUINT32 type; /* element type. (for generic queue) */ + u32 type; /* element type. (for generic queue) */ struct sdpc_advt_q *table; /* table to which this object belongs */ tSDP_GENERIC_DESTRUCT_FUNC release; /* release the object */ /* * advertisment specific */ u32 rkey; /* advertised buffer remote key */ - tINT32 size; /* advertised buffer size */ - tINT32 post; /* running total of data moved for advert. */ - tUINT32 wrid; /* work request completing this advertisment */ - tUINT32 flag; /* advertisment flags. */ - tUINT64 addr; /* advertised buffer virtual address */ + s32 size; /* advertised buffer size */ + s32 post; /* running total of data moved for advert. */ + u32 wrid; /* work request completing this advertisment */ + u32 flag; /* advertisment flags. 
*/ + u64 addr; /* advertised buffer virtual address */ }; /* struct sdpc_advt */ /* * table for holding SDP advertisments. */ struct sdpc_advt_q { struct sdpc_advt *head; /* double linked list of advertisments */ - tINT32 size; /* current number of advertisments in table */ + s32 size; /* current number of advertisments in table */ }; /* struct sdpc_advt_q */ /* * make size a macro. Index: drivers/infiniband/ulp/sdp/sdp_proc.c =================================================================== --- drivers/infiniband/ulp/sdp/sdp_proc.c (revision 615) +++ drivers/infiniband/ulp/sdp/sdp_proc.c (working copy) @@ -34,17 +34,17 @@ #if 0 /* currently not used, because the no tables are not writable */ /* ========================================================================= */ /*.._sdp_proc_write_parse -- parse a buffer for write commands. */ -static tINT32 _sdp_proc_write_parse(tSDP_PROC_ENTRY_PARSE parse_list, +static s32 _sdp_proc_write_parse(tSDP_PROC_ENTRY_PARSE parse_list, char *buffer, - tUINT32 size, - tINT32 *result_array, - tINT32 result_max, + u32 size, + s32 *result_array, + s32 result_max, char **next) { tSDP_PROC_ENTRY_PARSE parse_item; - tINT32 elements; - tINT32 counter; - tINT32 end; + s32 elements; + s32 counter; + s32 end; char *name; char *value; /* @@ -54,7 +54,7 @@ * double check a few constants we'll be using to make sure everything * is safe. */ - if (0 != (result_max % sizeof(tINT32))) { + if (0 != (result_max % sizeof(s32))) { TS_TRACE(MOD_LNX_SDP, T_TERSE, TRACE_FLOW_FATAL, "PROC: result structure of an incorrect size. <%d>", @@ -62,7 +62,7 @@ return -EFAULT; } else { - result_max = result_max / sizeof(tINT32); + result_max = result_max / sizeof(s32); } /* * pre parse, to determine number of elements in this line. 
@@ -172,11 +172,11 @@ break; case TS_SDP_PROC_WRITE_STR: result_array[parse_item->id] = - (tINT32) value; + (s32) value; break; case TS_SDP_PROC_WRITE_U64: sscanf(value, "%Lx", - (tUINT64 *) & + (u64 *) & result_array[parse_item->id]); break; default: @@ -312,10 +312,10 @@ /* ========================================================================= */ /*..sdp_main_proc_cleanup -- cleanup the proc filesystem entries */ -tINT32 sdp_main_proc_cleanup(void) +s32 sdp_main_proc_cleanup(void) { tSDP_PROC_SUB_ENTRY sub_entry; - tINT32 counter; + s32 counter; TS_CHECK_NULL(_dir_root, -EINVAL); /* @@ -343,20 +343,20 @@ /* ========================================================================= */ /*..sdp_main_proc_init -- initialize the proc filesystem entries */ -tINT32 sdp_main_proc_init(void) +s32 sdp_main_proc_init(void) { tSDP_PROC_SUB_ENTRY sub_entry; - tINT32 result; - tINT32 counter; + s32 result; + s32 counter; /* * XXX still need to check this: * validate some assumptions the write parser will be making. */ - if (0 && sizeof(tINT32) != sizeof(char *)) { + if (0 && sizeof(s32) != sizeof(char *)) { TS_TRACE(MOD_LNX_SDP, T_TERSE, TRACE_FLOW_FATAL, "PROC: integers and pointers of a different size. 
<%d:%d>", - sizeof(tINT32), sizeof(char *)); + sizeof(s32), sizeof(char *)); return -EFAULT; } Index: drivers/infiniband/ulp/sdp/sdp_buff_p.h =================================================================== --- drivers/infiniband/ulp/sdp/sdp_buff_p.h (revision 615) +++ drivers/infiniband/ulp/sdp/sdp_buff_p.h (working copy) @@ -66,10 +66,10 @@ */ kmem_cache_t *pool_cache; /* cache of pool objects */ - tUINT32 buff_min; - tUINT32 buff_max; - tUINT32 buff_cur; - tUINT32 buff_size; /* size of each buffer in the pool */ + u32 buff_min; + u32 buff_max; + u32 buff_cur; + u32 buff_size; /* size of each buffer in the pool */ tSDP_MEMORY_SEGMENT segs; }; /* tSDP_MAIN_POOL_STRUCT */ @@ -80,7 +80,7 @@ struct tSDP_MEM_SEG_HEAD_STRUCT { tSDP_MEMORY_SEGMENT next; tSDP_MEMORY_SEGMENT prev; - tUINT32 size; + u32 size; }; /* tSDP_MEM_SEG_HEAD_STRUCT */ #define TS_SDP_BUFF_COUNT ((PAGE_SIZE - sizeof(tSDP_MEM_SEG_HEAD_STRUCT))/ \ Index: drivers/infiniband/ulp/sdp/sdp_proc.h =================================================================== --- drivers/infiniband/ulp/sdp/sdp_proc.h (revision 615) +++ drivers/infiniband/ulp/sdp/sdp_proc.h (working copy) @@ -58,7 +58,7 @@ struct tSDP_PROC_SUB_ENTRY_STRUCT { char *name; - tINT32 type; + s32 type; struct proc_dir_entry *entry; tSDP_PROC_READ_CB_FUNC read; write_proc_t *write; @@ -81,17 +81,17 @@ *tSDP_PROC_ENTRY_PARSE; struct tSDP_PROC_ENTRY_WRITE_STRUCT { - tINT16 id; - tINT16 type; + s16 id; + s16 type; union { - tINT32 i; + s32 i; char *s; } value; }; /* tSDP_PROC_WRITE_STRUCT */ struct tSDP_PROC_ENTRY_PARSE_STRUCT { - tINT16 id; - tINT16 type; + s16 id; + s16 type; char *value; }; /* tSDP_PROC_ENTRY_PARSE_STRUCT */ Index: drivers/infiniband/ulp/sdp/sdp_sent.c =================================================================== --- drivers/infiniband/ulp/sdp/sdp_sent.c (revision 615) +++ drivers/infiniband/ulp/sdp/sdp_sent.c (working copy) @@ -430,7 +430,7 @@ /* 
========================================================================= */ /*..sdp_event_send -- send event handler. */ -tINT32 sdp_event_send(struct sdp_opt *conn, struct ib_wc *comp) +s32 sdp_event_send(struct sdp_opt *conn, struct ib_wc *comp) { tGW_SDP_EVENT_CB_FUNC dispatch_func; u32 free_count = 0; Index: drivers/infiniband/ulp/sdp/sdp_iocb.c =================================================================== --- drivers/infiniband/ulp/sdp/sdp_iocb.c (revision 615) +++ drivers/infiniband/ulp/sdp/sdp_iocb.c (working copy) @@ -38,7 +38,7 @@ #ifdef _TS_SDP_AIO_SUPPORT struct kveclet *let; int result; - tINT32 counter; + s32 counter; TS_CHECK_NULL(iocb, -EINVAL); TS_CHECK_NULL(conn, -EINVAL); @@ -47,13 +47,13 @@ */ iocb->page_count = iocb->cb.vec->nr; iocb->page_array = - kmalloc((sizeof(tUINT64) * iocb->page_count), GFP_ATOMIC); + kmalloc((sizeof(u64) * iocb->page_count), GFP_ATOMIC); if (NULL == iocb->page_array) { TS_TRACE(MOD_LNX_SDP, T_VERBOSE, TRACE_FLOW_WARN, "POST: Failed to allocate IOCB page array. 
<%d:%d>", - sizeof(tUINT64) * iocb->page_count, iocb->page_count); + sizeof(u64) * iocb->page_count, iocb->page_count); result = -ENOMEM; goto error; @@ -77,9 +77,9 @@ * register IOCBs physical memory */ result = ib_fmr_register_physical(conn->fmr_pool, - (uint64_t *) iocb->page_array, + (u64 *) iocb->page_array, iocb->page_count, - (uint64_t *) & iocb->io_addr, + (u64 *) & iocb->io_addr, iocb->page_offset, &iocb->mem, &iocb->l_key, &iocb->r_key); @@ -229,10 +229,10 @@ /* ========================================================================= */ /*..sdp_iocb_q_lookup - find an iocb based on key, without removing */ -struct sdpc_iocb *sdp_iocb_q_lookup(struct sdpc_iocb_q *table, tUINT32 key) +struct sdpc_iocb *sdp_iocb_q_lookup(struct sdpc_iocb_q *table, u32 key) { struct sdpc_iocb *iocb = NULL; - tINT32 counter; + s32 counter; TS_CHECK_NULL(table, NULL); @@ -315,7 +315,7 @@ /* ========================================================================= */ /*..sdp_iocb_q_get_key - find an iocb based on key, and remove it */ -struct sdpc_iocb *sdp_iocb_q_get_key(struct sdpc_iocb_q *table, tUINT32 key) +struct sdpc_iocb *sdp_iocb_q_get_key(struct sdpc_iocb_q *table, u32 key) { struct sdpc_iocb *iocb; int result; @@ -459,9 +459,9 @@ { struct sdpc_iocb *iocb; struct sdpc_iocb *next; - tINT32 counter; + s32 counter; int result; - tINT32 total; + s32 total; TS_CHECK_NULL(table, -EINVAL); /* @@ -500,7 +500,7 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*..sdp_iocb_q_create - create an IOCB table */ -struct sdpc_iocb_q *sdp_iocb_q_create(tINT32 * result) +struct sdpc_iocb_q *sdp_iocb_q_create(s32 * result) { struct sdpc_iocb_q *table = NULL; @@ -584,7 +584,7 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*..sdp_main_iocb_init -- initialize the 
advertisment caches. */ -tINT32 sdp_main_iocb_init(void) +s32 sdp_main_iocb_init(void) { int result; @@ -634,7 +634,7 @@ /* ========================================================================= */ /*..sdp_main_iocb_cleanup -- cleanup the advertisment caches. */ -tINT32 sdp_main_iocb_cleanup(void) +s32 sdp_main_iocb_cleanup(void) { TS_TRACE(MOD_LNX_SDP, T_VERBOSE, TRACE_FLOW_INOUT, "INIT: IOCB cache cleanup."); Index: drivers/infiniband/ulp/sdp/sdp_iocb.h =================================================================== --- drivers/infiniband/ulp/sdp/sdp_iocb.h (revision 615) +++ drivers/infiniband/ulp/sdp/sdp_iocb.h (working copy) @@ -67,34 +67,34 @@ struct sdpc_iocb { struct sdpc_iocb *next; /* next structure in table */ struct sdpc_iocb *prev; /* previous structure in table */ - tUINT32 type; /* element type. (for generic queue) */ + u32 type; /* element type. (for generic queue) */ struct sdpc_iocb_q *table; /* table to which this iocb belongs */ tSDP_GENERIC_DESTRUCT_FUNC release; /* release the object */ /* * iocb sepcific */ - tUINT32 flags; /* usage flags */ + u32 flags; /* usage flags */ /* * iocb information */ - tINT32 len; /* space left in the user buffer */ - tINT32 size; /* total size of the user buffer */ - tINT32 post; /* amount of data requested so far. */ - tUINT32 wrid; /* work request completing this IOCB */ - tUINT32 key; /* matches kiocb key for lookups */ + s32 len; /* space left in the user buffer */ + s32 size; /* total size of the user buffer */ + s32 post; /* amount of data requested so far. */ + u32 wrid; /* work request completing this IOCB */ + u32 key; /* matches kiocb key for lookups */ /* * IB specific information for zcopy. */ struct ib_fmr *mem; /* memory region handle */ u32 l_key; /* local access key */ u32 r_key; /* remote access key */ - tUINT64 io_addr; /* virtual IO address */ + u64 io_addr; /* virtual IO address */ /* * page list. */ - tUINT64 *page_array; /* list of physical pages. 
*/ - tINT32 page_count; /* number of physical pages. */ - tINT32 page_offset; /* offset into first page. */ + u64 *page_array; /* list of physical pages. */ + s32 page_count; /* number of physical pages. */ + s32 page_offset; /* offset into first page. */ /* * AIO extension specific */ @@ -112,7 +112,7 @@ */ struct sdpc_iocb_q { struct sdpc_iocb *head; /* double linked list of IOCBs */ - tINT32 size; /* current number of IOCBs in table */ + s32 size; /* current number of IOCBs in table */ }; /* struct sdpc_iocb_q */ /* ----------------------------------------------------------------------- */ @@ -231,7 +231,7 @@ { void *vaddr; #if defined(__i386__) - tINT32 index; + s32 index; if (page < highmem_start_page) { @@ -289,7 +289,7 @@ if (in_interrupt()) { #if 1 - tINT32 index; + s32 index; /* * This isn't necessary, but it will catch bug if * someone tries to use an area which has been Index: drivers/infiniband/ulp/sdp/sdp_event.c =================================================================== --- drivers/infiniband/ulp/sdp/sdp_event.c (revision 615) +++ drivers/infiniband/ulp/sdp/sdp_event.c (working copy) @@ -37,9 +37,9 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*..sdp_cq_event_locked -- main per QP event handler */ -tINT32 sdp_cq_event_locked(struct ib_wc *comp, struct sdp_opt *conn) +s32 sdp_cq_event_locked(struct ib_wc *comp, struct sdp_opt *conn) { - tINT32 result = 0; + s32 result = 0; TS_CHECK_NULL(comp, -EINVAL); TS_CHECK_NULL(conn, -EINVAL); @@ -141,9 +141,9 @@ /*..sdp_cq_event_handler -- main per QP event handler, and demuxer */ void sdp_cq_event_handler(struct ib_cq *cq, void *arg) { - tINT32 hashent = (unsigned long)arg; + s32 hashent = (unsigned long)arg; struct sdp_opt *conn; - tINT32 result; + s32 result; unsigned long flags; #ifdef _TS_SDP_DATA_PATH_DEBUG @@ -222,9 +222,9 @@ /* 
--------------------------------------------------------------------- */ /* ========================================================================= */ /*.._sdp_cm_hello_check -- validate the hello header */ -static tINT32 _sdp_cm_hello_check(struct msg_hello *msg_hello, tINT32 size) +static s32 _sdp_cm_hello_check(struct msg_hello *msg_hello, s32 size) { - tINT32 result; + s32 result; TS_CHECK_NULL(msg_hello, -EINVAL); @@ -312,10 +312,10 @@ /* ========================================================================= */ /*.._sdp_cm_hello_ack_check -- validate the hello ack header */ -static tINT32 _sdp_cm_hello_ack_check(struct msg_hello_ack *hello_ack, - tINT32 size) +static s32 _sdp_cm_hello_ack_check(struct msg_hello_ack *hello_ack, + s32 size) { - tINT32 result; + s32 result; TS_CHECK_NULL(hello_ack, -EINVAL); @@ -395,13 +395,13 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*.._sdp_cm_req -- handler for passive connection open completion */ -static tINT32 _sdp_cm_req(tTS_IB_CM_COMM_ID comm_id, - struct ib_cm_req_received_param *param, - void *arg) +static s32 _sdp_cm_req(tTS_IB_CM_COMM_ID comm_id, + struct ib_cm_req_received_param *param, + void *arg) { struct msg_hello *msg_hello; struct sdp_opt *conn; - tINT32 result; + s32 result; TS_CHECK_NULL(param, -EINVAL); TS_CHECK_NULL(param->remote_private_data, -EINVAL); @@ -410,8 +410,8 @@ TS_TRACE(MOD_LNX_SDP, T_VERY_VERBOSE, TRACE_FLOW_INOUT, "EVENT: REQ. commID <%08x> service ID <%08x> ca <%p> port <%08x>", - (tUINT32) comm_id, (tUINT32) param->service_id, - param->device, (tUINT32) param->port); + (u32) comm_id, (u32) param->service_id, + param->device, (u32) param->port); /* * check Hello Header, to determine if we want the connection. */ @@ -475,8 +475,8 @@ * the size we advertise to the stream peer cannot be larger then our * internal buffer size. 
*/ - conn->send_size = min((tUINT16) sdp_buff_pool_buff_size(), - (tUINT16) (conn->send_size - + conn->send_size = min((u16) sdp_buff_pool_buff_size(), + (u16) (conn->send_size - TS_SDP_MSG_HDR_SIZE)); result = ib_cm_callback_modify(conn->comm_id, @@ -514,14 +514,14 @@ /* ========================================================================= */ /*.._sdp_cm_rep -- handler for active connection open completion */ -static tINT32 _sdp_cm_rep(tTS_IB_CM_COMM_ID comm_id, - struct ib_cm_rep_received_param *param, - struct sdp_opt *conn) +static s32 _sdp_cm_rep(tTS_IB_CM_COMM_ID comm_id, + struct ib_cm_rep_received_param *param, + struct sdp_opt *conn) { struct msg_hello_ack *hello_ack; struct sdpc_buff *buff; - tINT32 result; - tINT32 error; + s32 result; + s32 error; TS_CHECK_NULL(param, -EINVAL); TS_CHECK_NULL(param->remote_private_data, -EINVAL); @@ -534,7 +534,7 @@ TS_TRACE(MOD_LNX_SDP, T_VERY_VERBOSE, TRACE_FLOW_INOUT, "EVENT: <%d> REP receive. comm ID <%08x> qpn <%06x:%06x>", - conn->hashent, (tINT32) comm_id, + conn->hashent, (s32) comm_id, param->local_qpn, param->remote_qpn); /* * lock the connection @@ -653,12 +653,12 @@ /* ========================================================================= */ /*.._sdp_cm_idle -- handler for connection idle completion */ -static tINT32 _sdp_cm_idle(tTS_IB_CM_COMM_ID comm_id, - struct ib_cm_idle_param *param, - struct sdp_opt *conn) +static s32 _sdp_cm_idle(tTS_IB_CM_COMM_ID comm_id, + struct ib_cm_idle_param *param, + struct sdp_opt *conn) { - tINT32 result = 0; - tINT32 expect; + s32 result = 0; + s32 expect; TS_CHECK_NULL(param, -EINVAL); @@ -673,7 +673,7 @@ */ TS_TRACE(MOD_LNX_SDP, T_VERY_VERBOSE, TRACE_FLOW_INOUT, "EVENT: <%d> IDLE, comm ID <%08x> reason <%d> state <%04x>", - conn->hashent, (tINT32) comm_id, param->reason, conn->state); + conn->hashent, (s32) comm_id, param->reason, conn->state); /* * last comm reference */ @@ -767,12 +767,12 @@ /* 
========================================================================= */ /*.._sdp_cm_established -- handler for connection established completion */ -static tINT32 _sdp_cm_established(tTS_IB_CM_COMM_ID comm_id, - struct ib_cm_established_param *param, - struct sdp_opt *conn) +static s32 _sdp_cm_established(tTS_IB_CM_COMM_ID comm_id, + struct ib_cm_established_param *param, + struct sdp_opt *conn) { - tINT32 result = 0; - tINT32 expect; + s32 result = 0; + s32 expect; struct sdpc_buff *buff; TS_CHECK_NULL(param, -EINVAL); @@ -783,7 +783,7 @@ TS_TRACE(MOD_LNX_SDP, T_VERY_VERBOSE, TRACE_FLOW_INOUT, "EVENT: <%d> ESTABLISHED, comm ID <%08x> state <%04x>", - conn->hashent, (tINT32) comm_id, conn->state); + conn->hashent, (s32) comm_id, conn->state); /* * release disconnects. */ @@ -895,12 +895,12 @@ /* ========================================================================= */ /*.._sdp_cm_timewait -- handler for connection Time Wait completion */ -static tINT32 _sdp_cm_timewait(tTS_IB_CM_COMM_ID comm_id, - struct ib_cm_disconnected_param *param, - struct sdp_opt *conn) +static s32 _sdp_cm_timewait(tTS_IB_CM_COMM_ID comm_id, + struct ib_cm_disconnected_param *param, + struct sdp_opt *conn) { - tINT32 result = 0; - tINT32 expect; + s32 result = 0; + s32 expect; TS_CHECK_NULL(param, -EINVAL); @@ -911,7 +911,7 @@ TS_TRACE(MOD_LNX_SDP, T_VERY_VERBOSE, TRACE_FLOW_INOUT, "EVENT: <%d> TIME WAIT, ID <%08x> reason <%d> state <%04x>", - conn->hashent, (tINT32) comm_id, param->reason, conn->state); + conn->hashent, (s32) comm_id, param->reason, conn->state); /* * Clear out posted receives now, vs after IDLE timeout, which consumes * too many buffers when lots of connections are being established and @@ -990,13 +990,13 @@ /* ========================================================================= */ /*..sdp_cm_event_handler -- handler for CM state transitions request */ tTS_IB_CM_CALLBACK_RETURN sdp_cm_event_handler(tTS_IB_CM_EVENT event, - tTS_IB_CM_COMM_ID comm_id, - void 
*params, - void *arg) + tTS_IB_CM_COMM_ID comm_id, + void *params, + void *arg) { - tINT32 hashent = (unsigned long)arg; + s32 hashent = (unsigned long)arg; struct sdp_opt *conn = NULL; - tINT32 result = 0; + s32 result = 0; TS_TRACE(MOD_LNX_SDP, T_VERY_VERBOSE, TRACE_FLOW_INOUT, "EVENT: CM state transition <%d> for comm ID <%08x> conn <%d>", Index: drivers/infiniband/ulp/sdp/sdp_buff.c =================================================================== --- drivers/infiniband/ulp/sdp/sdp_buff.c (revision 615) +++ drivers/infiniband/ulp/sdp/sdp_buff.c (working copy) @@ -598,12 +598,12 @@ /* ========================================================================= */ /*.._sdp_buff_pool_alloc -- allocate more buffers for the main pool */ -static int _sdp_buff_pool_alloc(tSDP_MAIN_POOL m_pool, tUINT32 size) +static int _sdp_buff_pool_alloc(tSDP_MAIN_POOL m_pool, u32 size) { tSDP_MEMORY_SEGMENT head_seg = NULL; tSDP_MEMORY_SEGMENT mem_seg; - tUINT32 counter = 0; - tUINT32 total = 0; + u32 counter = 0; + u32 total = 0; int result; TS_CHECK_NULL(m_pool, -EINVAL); @@ -674,7 +674,7 @@ /* ========================================================================= */ /*..sdp_buff_pool_init - Initialize the main buffer pool of memory */ -int sdp_buff_pool_init(tUINT32 buff_min, tUINT32 buff_max) +int sdp_buff_pool_init(u32 buff_min, u32 buff_max) { int result; Index: drivers/infiniband/ulp/sdp/sdp_dev.h =================================================================== --- drivers/infiniband/ulp/sdp/sdp_dev.h (revision 615) +++ drivers/infiniband/ulp/sdp/sdp_dev.h (working copy) @@ -80,9 +80,9 @@ #define TS_SDP_MSG_SERVICE_ID_VALUE (0x000000000001FFFFULL) #define TS_SDP_MSG_SERVICE_ID_MASK (0xFFFFFFFFFFFF0000ULL) -#define TS_SDP_MSG_SID_TO_PORT(sid) ((tUINT16)((sid) & 0xFFFF)) +#define TS_SDP_MSG_SID_TO_PORT(sid) ((u16)((sid) & 0xFFFF)) #define TS_SDP_MSG_PORT_TO_SID(port) \ - ((tUINT64)(TS_SDP_MSG_SERVICE_ID_RANGE | ((port) & 0xFFFF))) + ((u64)(TS_SDP_MSG_SERVICE_ID_RANGE | 
((port) & 0xFFFF))) /* * invalid socket identifier, top entry in table. */ @@ -149,7 +149,7 @@ struct ib_mr *mem_h; /* registered memory region */ u32 l_key; /* local key */ u32 r_key; /* remote key */ - uint64_t iova; /* address */ + u64 iova; /* address */ struct ib_fmr_pool *fmr_pool; /* fast memory for Zcopy */ struct sdev_hca_port *port_list; /* ports on this HCA */ struct sdev_hca *next; /* next HCA in the list */ Index: drivers/infiniband/ulp/sdp/sdp_queue.c =================================================================== --- drivers/infiniband/ulp/sdp/sdp_queue.c (revision 615) +++ drivers/infiniband/ulp/sdp/sdp_queue.c (working copy) @@ -175,7 +175,7 @@ void *arg) { struct sdpc_desc *element; - tINT32 counter; + s32 counter; TS_CHECK_NULL(table, NULL); TS_CHECK_NULL(lookup_func, NULL); @@ -341,7 +341,7 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*..sdp_desc_q_create - create/allocate a generic table */ -struct sdpc_desc_q *sdp_desc_q_create(tINT32 *result) +struct sdpc_desc_q *sdp_desc_q_create(s32 *result) { struct sdpc_desc_q *table = NULL; @@ -432,7 +432,7 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*..sdp_main_desc_init -- initialize the generic table caches. */ -tINT32 sdp_main_desc_init(void) +s32 sdp_main_desc_init(void) { int result; @@ -465,7 +465,7 @@ /* ========================================================================= */ /*..sdp_main_desc_cleanup -- cleanup the generic table caches. 
*/ -tINT32 sdp_main_desc_cleanup(void) +s32 sdp_main_desc_cleanup(void) { TS_TRACE(MOD_LNX_SDP, T_VERBOSE, TRACE_FLOW_INOUT, "INIT: Generic table cache cleanup."); Index: drivers/infiniband/ulp/sdp/sdp_post.c =================================================================== --- drivers/infiniband/ulp/sdp/sdp_post.c (revision 615) +++ drivers/infiniband/ulp/sdp/sdp_post.c (working copy) @@ -98,8 +98,8 @@ TS_TRACE(MOD_LNX_SDP, T_VERY_VERBOSE, TRACE_FLOW_INOUT, "POST: <%d> Path record lookup complete <%016llx:%016llx:%d>", - conn->hashent, cpu_to_be64(*(tUINT64 *) path->dgid), - cpu_to_be64(*(tUINT64 *) (path->dgid + sizeof(tUINT64))), + conn->hashent, cpu_to_be64(*(u64 *) path->dgid), + cpu_to_be64(*(u64 *) (path->dgid + sizeof(u64))), path->dlid); /* * allocate IB resources. Index: drivers/infiniband/ulp/sdp/sdp_buff.h =================================================================== --- drivers/infiniband/ulp/sdp/sdp_buff.h (revision 615) +++ drivers/infiniband/ulp/sdp/sdp_buff.h (working copy) @@ -33,7 +33,7 @@ */ struct sdpc_buff_q { struct sdpc_buff *head; /* double linked list of buffers */ - tUINT32 size; /* current number of buffers allocated to the pool */ + u32 size; /* current number of buffers allocated to the pool */ #ifdef _TS_SDP_DEBUG_POOL_NAME char *name; /* pointer to pools name */ #endif Index: drivers/infiniband/include/ib_legacy_types.h =================================================================== --- drivers/infiniband/include/ib_legacy_types.h (revision 615) +++ drivers/infiniband/include/ib_legacy_types.h (working copy) @@ -41,14 +41,6 @@ * should not be used). 
*/ typedef int tBOOLEAN; -typedef char tINT8; -typedef unsigned char tUINT8; -typedef short tINT16; -typedef unsigned short tUINT16; -typedef int tINT32; -typedef unsigned int tUINT32; -typedef long long tINT64; -typedef unsigned long long tUINT64; /* * Generic type for returning pass/fail information back from subroutines -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From roland at topspin.com Mon Aug 9 14:53:07 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 09 Aug 2004 14:53:07 -0700 Subject: [openib-general] mthca v. current ib_verbs In-Reply-To: <1092087821.1691.2.camel@localhost.localdomain> (Hal Rosenstock's message of "Mon, 09 Aug 2004 17:43:39 -0400") References: <1092086728.1923.101.camel@localhost.localdomain> <52wu0880g9.fsf@topspin.com> <1092087821.1691.2.camel@localhost.localdomain> Message-ID: <52smaw7zcs.fsf@topspin.com> Hal> I don't know your entire list or your order of tackling them Hal> so I just wanted to provide some input on at least my Hal> preference for which ones might come sooner. OK, the QP functions should be one of the next areas I go for. By the way, there's something wrong with how you're building your test, because the CQ functions definitely do exist in my tree -- these warnings are bogus: > *** Warning: "ib_req_notify_cq" [drivers/infiniband/access/gsi.ko] undefined! > *** Warning: "ib_poll_cq" [drivers/infiniband/access/gsi.ko] undefined! - R. From itoumsn at nttdata.co.jp Mon Aug 9 17:58:07 2004 From: itoumsn at nttdata.co.jp (Masanori ITOH) Date: Tue, 10 Aug 2004 09:58:07 +0900 (JST) Subject: [openib-general] mthca v. 
current ib_verbs In-Reply-To: <52smaw7zcs.fsf@topspin.com> References: <52wu0880g9.fsf@topspin.com> <1092087821.1691.2.camel@localhost.localdomain> <52smaw7zcs.fsf@topspin.com> Message-ID: <20040810.095807.60847859.itoumsn@nttdata.co.jp> Hi, From: Roland Dreier Subject: Re: [openib-general] mthca v. current ib_verbs Date: Mon, 09 Aug 2004 14:53:07 -0700 > Hal> I don't know your entire list or your order of tackling them > Hal> so I just wanted to provide some input on at least my > Hal> preference for which ones might come sooner. > > OK, the QP functions should be one of the next areas I go for. By the > way, there's something wrong with how you're building your test, > because the CQ functions definitely do exist in my tree -- these > warnings are bogus: > > > *** Warning: "ib_req_notify_cq" [drivers/infiniband/access/gsi.ko] undefined! > > *** Warning: "ib_poll_cq" [drivers/infiniband/access/gsi.ko] undefined! Those messages sometimes appear when you are building your source outside linux kernel source tree. In this case, especially the modules containing ib_req_notify_cq or so. Regards, Masanori --- Masanori ITOH Open Source Software Development Center, NTT DATA CORPORATION e-mail: itoumsn at nttdata.co.jp phone : +81-3-3523-8122 (ext. 7354) From roland at topspin.com Mon Aug 9 18:23:34 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 09 Aug 2004 18:23:34 -0700 Subject: [openib-general] mthca v. current ib_verbs In-Reply-To: <20040810.095807.60847859.itoumsn@nttdata.co.jp> (Masanori ITOH's message of "Tue, 10 Aug 2004 09:58:07 +0900 (JST)") References: <52wu0880g9.fsf@topspin.com> <1092087821.1691.2.camel@localhost.localdomain> <52smaw7zcs.fsf@topspin.com> <20040810.095807.60847859.itoumsn@nttdata.co.jp> Message-ID: <52oelj946h.fsf@topspin.com> Masanori> Those messages sometimes appear when you are building Masanori> your source outside linux kernel source tree. 
In this Masanori> case, especially the modules containing ib_req_notify_cq Masanori> or so. Do you have any idea why? If the code includes ib_verbs.h properly, then the definitions below mean that no references to either ib_poll_cq() or ib_req_notify_cq() are generated: static inline int ib_poll_cq(struct ib_cq *cq, int num_entries, struct ib_wc *wc) { return cq->device->poll_cq(cq, num_entries, wc); } static inline int ib_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify) { return cq->device->req_notify_cq(cq, cq_notify); } Thanks, Roland From mshefty at ichips.intel.com Mon Aug 9 18:36:14 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 9 Aug 2004 18:36:14 -0700 Subject: [openib-general] Some ib_mad.h Redirection Comments In-Reply-To: <35EA21F54A45CB47B879F21A91F4862F18AC7F@taurus.voltaire.com> References: <35EA21F54A45CB47B879F21A91F4862F18AC7F@taurus.voltaire.com> Message-ID: <20040809183614.487654d2.mshefty@ichips.intel.com> On Sat, 7 Aug 2004 22:29:18 +0300 "Yaron Haviv" wrote: > > If we're in agreement on the server side, then I think we're more than > > half-way there. > > To me it looks like there is agreement :) I've updated ib_mad.h (again). This time additional comments were added, and notes were made where we need to continue or start having discussions. The only major change to the API was modifying ib_mad_recv_wc to use a single receive buffer, versus a chain. (I haven't given up on zero-copy receives; the proposed chaining just needs some additional thought.) Hopefully, we can begin working towards this API, but continue discussing some of the areas marked in the file with 'XXX'. > So that's why I think Redirect in CM & SA are not the same and may > require different approach > > Your thoughts ? I need to think more about the redirection case, and examine an implementation to see how much work/policy is in it. I think that if it is totally transparent to the requestor, then the API will be unaffected. 
Likewise, if it is exposed. We probably only need to modify the API if we need to reach some sort of compromise between the implementation and the policy. Modified file is below: /* This software is available to you under a choice of one of two licenses. You may choose to be licensed under the terms of the GNU General Public License (GPL) Version 2, available at , or the OpenIB.org BSD license, available in the LICENSE.TXT file accompanying this software. These details are also available at . THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. Copyright (c) 2004 Mellanox Technologies Ltd. All rights reserved. Copyright (c) 2004 Infinicon Corporation. All rights reserved. Copyright (c) 2004 Intel Corporation. All rights reserved. Copyright (c) 2004 Topspin Corporation. All rights reserved. Copyright (c) 2004 Voltaire Corporation. All rights reserved.
*/ #if !defined( IB_MAD_H ) #define IB_MAD_H #include "ib_verbs.h" struct ib_grh { u32 version_tclass_flow; u16 paylen; u8 next_hdr; u8 hop_limit; union ib_gid sgid; union ib_gid dgid; } __attribute__ ((packed)); struct ib_mad_hdr { u8 base_version; u8 mgmt_class; u8 class_version; u8 method; u16 status; u16 class_specific; u64 tid; u16 attr_id; u16 resv; u32 attr_mod; } __attribute__ ((packed)); struct ib_rmpp_hdr { u8 rmpp_version; u8 rmpp_type; u8 rmpp_flags; u8 rmpp_status; u32 seg_num; u32 paylen_newwin; } __attribute__ ((packed)); struct ib_mad { struct ib_mad_hdr mad_hdr; u8 data[232]; } __attribute__ ((packed)); struct ib_rmpp_mad { struct ib_mad_hdr mad_hdr; struct ib_rmpp_hdr rmpp_hdr; u8 data[220]; } __attribute__ ((packed)); struct ib_mad_agent; struct ib_mad_send_wc; struct ib_mad_recv_wc; /** * ib_mad_send_handler - callback handler for a sent MAD. * @mad_agent - MAD agent that sent the MAD. * @mad_send_wc - Send work completion information on the sent MAD. */ typedef void (*ib_mad_send_handler)(struct ib_mad_agent *mad_agent, struct ib_mad_send_wc *mad_send_wc); /** * ib_mad_recv_handler - callback handler for a received MAD. * @mad_agent - MAD agent requesting the received MAD. * @mad_recv_wc - Received work completion information on the received MAD. * * MADs received in response to a send request operation will be handed to * the user after the send operation completes. All data buffers given * to the user through this routine are owned by the receiving client. */ typedef void (*ib_mad_recv_handler)(struct ib_mad_agent *mad_agent, struct ib_mad_recv_wc *mad_recv_wc); /** * ib_mad_agent - Used to track MAD registration with the access layer. * @device - Reference to device registration is on. * @qp - Reference to QP used for sending and receiving MADs. * @recv_handler - Callback handler for a received MAD. * @send_handler - Callback handler for a sent MAD. * @context - User-specified context associated with this registration. * @hi_tid - Access layer assigned transaction ID for this client. * Unsolicited MADs sent by this client will have the upper 32-bits
 * @hi_tid - Access layer assigned transaction ID for this client.
 *   Unsolicited MADs sent by this client will have the upper 32-bits
 *   of their TID set to this value.
 */
struct ib_mad_agent {
	struct ib_device	*device;
	struct ib_qp		*qp;
	ib_mad_recv_handler	recv_handler;
	ib_mad_send_handler	send_handler;
	void			*context;
	u32			hi_tid;
};

enum ib_mad_flags {
	IB_MAD_GRH_VALID = 1
};

/**
 * ib_mad_send_wr - send MAD work request.
 * @list - Allows chaining together multiple requests.
 * @context - User-controlled work request context.
 * @sg_list - An array of scatter-gather entries, referencing the MAD's
 *   data buffer(s).  The first entry must reference the standard MAD
 *   header, plus any RMPP header, if used.
 * @num_sge - The number of scatter-gather entries.
 * @mad_flags - Flags used to control the send operation.
 * @ah - Address handle for the destination.
 * @timeout_ms - Timeout value, in milliseconds, to wait for a response
 *   message.  Set to 0 if no response is expected.
 * @remote_qpn - Destination QP.
 * @remote_qkey - Specifies the qkey used by the remote QP.
 * @pkey_index - Pkey index to use.  Required when sending on QP1 only.
 */
/* XXX See about using ib_send_wr directly, e.g.:
     context   -> wr_id
     mad_flags -> send_flags
     add new timeout_ms field or double use of imm_data
 */
struct ib_mad_send_wr {
	struct list_head	list;
	void			*context;
	struct ib_sge		*sg_list;
	int			num_sge;
	int			mad_flags;
	struct ib_ah		*ah;
	int			timeout_ms;
	u32			remote_qpn;
	u32			remote_qkey;
	u16			pkey_index;
};

/**
 * ib_mad_send_wc - MAD send completion information.
 * @context - Context associated with the send MAD request.
 * @status - Completion status.
 * @vendor_err - Optional vendor error information returned with a failed
 *   request.
 */
struct ib_mad_send_wc {
	void			*context;
	enum ib_wc_status	status;
	u32			vendor_err;
};

/**
 * ib_mad_recv_wc - received MAD information.
 * @context - For a received response, set to the context specified for
 *   the corresponding send request.
 * @grh - References a data buffer containing the global route header.
 *   The data referenced by this buffer is only valid if the GRH is
 *   valid.
 * @mad - References the start of the received MAD.
 * @length - Specifies the size of the received MAD.
 * @mad_flags - Flags used to specify information about the received MAD.
 * @mad_len - The length of the received MAD, without duplicated headers.
 * @src_qp - Source QP.
 * @pkey_index - Pkey index.
 * @slid - LID of the remote QP.
 * @sl - Service level of the source for a received message.
 * @dlid_path_bits - Path bits of the source for a received message.
 *
 * An RMPP receive will be coalesced into a single data buffer.
 */
/* XXX revisit possibility of zero-copy receive */
struct ib_mad_recv_wc {
	void		*context;
	struct ib_grh	*grh;
	struct ib_mad	*mad;
	u32		length;
	int		mad_flags;
	u32		mad_len;
	u32		src_qp;
	u16		pkey_index;
	u16		slid;
	u8		sl;
	u8		dlid_path_bits;
};

/**
 * ib_mad_reg_req - MAD registration request
 * @mgmt_class - Indicates which management class of MADs should be received
 *   by the caller.  This field is only required if the user wishes to
 *   receive unsolicited MADs, otherwise it should be 0.
 * @mgmt_class_version - Indicates which version of MADs for the given
 *   management class to receive.
 * @method_mask - The caller will receive unsolicited MADs for any method
 *   where @method_mask = 1.
 */
/* XXX Need to extend to support snooping - perhaps a registration type with
   masks for the class, version, and methods if the type is 'view-only'? */
struct ib_mad_reg_req {
	u8	mgmt_class;
	u8	mgmt_class_version;
	DECLARE_BITMAP(method_mask, 128);
};

/**
 * ib_mad_reg - Register to send/receive MADs.
 * @device - The device to register with.
 * @port - The port on the specified device to use.
 * @qp_type - Specifies which QP to access.  Must be either
 *   IB_QPT_SMI or IB_QPT_GSI.
 * @mad_reg_req - Specifies which unsolicited MADs should be received
 *   by the caller.  This parameter may be NULL if the caller only
 *   wishes to receive solicited responses.
 * @rmpp_version - If set, indicates that the client will send
 *   and receive MADs that contain the RMPP header for the given version.
 *   If set to 0, indicates that RMPP is not used by this client.
 * @send_handler - The completion callback routine invoked after a send
 *   request has completed.
 * @recv_handler - The completion callback routine invoked for a received
 *   MAD.
 * @context - User specified context associated with the registration.
 */
struct ib_mad_agent *ib_mad_reg(struct ib_device *device,
				u8 port,
				enum ib_qp_type qp_type,
				struct ib_mad_reg_req *mad_reg_req,
				u8 rmpp_version,
				ib_mad_send_handler send_handler,
				ib_mad_recv_handler recv_handler,
				void *context);

/**
 * ib_mad_dereg - Deregisters a client from using MAD services.
 * @mad_agent - Corresponding MAD registration request to deregister.
 *
 * After invoking this routine, MAD services are no longer usable by the
 * client on the associated QP.
 */
int ib_mad_dereg(struct ib_mad_agent *mad_agent);

/**
 * ib_mad_post_send - Posts a MAD to the send queue of the QP associated
 *   with the registered client.
 * @mad_agent - Specifies the associated registration to post the send to.
 * @mad_send_wr - Specifies the information needed to send the MAD.
 */
/* XXX Need to define queuing model - above or below API? */
int ib_mad_post_send(struct ib_mad_agent *mad_agent,
		     struct ib_mad_send_wr *mad_send_wr);

/**
 * ib_mad_qp_redir - Registers a QP for MAD services.
 * @qp - Reference to a QP that requires MAD services.
 * @rmpp_version - If set, indicates that the client will send
 *   and receive MADs that contain the RMPP header for the given version.
 *   If set to 0, indicates that RMPP is not used by this client.
 * @send_handler - The completion callback routine invoked after a send
 *   request has completed.
 * @recv_handler - The completion callback routine invoked for a received
 *   MAD.
 * @context - User specified context associated with the registration.
 *
 * Use of this call allows clients to use MAD services, such as RMPP,
 * on user-owned QPs.  After calling this routine, users may send
 * MADs on the specified QP by calling ib_mad_post_send.
 */
/* XXX Need to define provided features for requestor-side redirecting */
struct ib_mad_agent *ib_mad_qp_redir(struct ib_qp *qp,
				     u8 rmpp_version,
				     ib_mad_send_handler send_handler,
				     ib_mad_recv_handler recv_handler,
				     void *context);

/**
 * ib_mad_process_wc - Processes a work completion associated with a
 *   MAD sent or received on a redirected QP.
 * @mad_agent - Specifies the registered MAD service using the redirected QP.
 * @wc - References a work completion associated with a sent or received
 *   MAD segment.
 *
 * This routine is used to complete or continue processing on a MAD request.
 * If the work completion is associated with a send operation, calling
 * this routine is required to continue an RMPP transfer or to wait for a
 * corresponding response, if it is a request.  If the work completion is
 * associated with a receive operation, calling this routine is required to
 * process an inbound or outbound RMPP transfer, or to match a response MAD
 * with its corresponding request.
 */
int ib_mad_process_wc(struct ib_mad_agent *mad_agent,
		      struct ib_wc *wc);

#endif /* IB_MAD_H */

From roland at topspin.com  Mon Aug  9 20:16:09 2004
From: roland at topspin.com (Roland Dreier)
Date: Mon, 09 Aug 2004 20:16:09 -0700
Subject: [openib-general] [PATCH] Kill more t??? typedefs
In-Reply-To: <1092088324.14886.20.camel@duffman> (Tom Duffy's message of "Mon, 09 Aug 2004 14:52:04 -0700")
References: <1092088324.14886.20.camel@duffman>
Message-ID: <52isbr8yyu.fsf@topspin.com>

Thanks, I've committed this.

In the past I had held off on just doing all the tINT32 -> s32
conversions because I think a lot of them should just be changed to
plain int, but now that I can see how close we are to killing off
ib_legacy_types.h entirely, I like taking this step.

 - R.
From halr at voltaire.com  Mon Aug  9 20:34:38 2004
From: halr at voltaire.com (Hal Rosenstock)
Date: Mon, 09 Aug 2004 23:34:38 -0400
Subject: [openib-general] mthca v. current ib_verbs
References: <1092086728.1923.101.camel@localhost.localdomain> <52wu0880g9.fsf@topspin.com> <1092087821.1691.2.camel@localhost.localdomain> <52smaw7zcs.fsf@topspin.com>
Message-ID: <001a01c47e8a$f82cf8e0$6401a8c0@comcast.net>

Roland Dreier wrote:
> OK, the QP functions should be one of the next areas I go for.

Good.  I can see what's left to do.

> By the way, there's something wrong with how you're building your test,
> because the CQ functions definitely do exist in my tree -- these
> warnings are bogus:
>
> *** Warning: "ib_req_notify_cq" [drivers/infiniband/access/gsi.ko] undefined!
> *** Warning: "ib_poll_cq" [drivers/infiniband/access/gsi.ko] undefined!

Yes, this is my problem, which I will fix soon.

-- Hal

From itoumsn at nttdata.co.jp  Mon Aug  9 21:17:51 2004
From: itoumsn at nttdata.co.jp (Masanori ITOH)
Date: Tue, 10 Aug 2004 13:17:51 +0900 (JST)
Subject: [openib-general] mthca v. current ib_verbs
In-Reply-To: <52oelj946h.fsf@topspin.com>
References: <52smaw7zcs.fsf@topspin.com> <20040810.095807.60847859.itoumsn@nttdata.co.jp> <52oelj946h.fsf@topspin.com>
Message-ID: <20040810.131751.01369461.itoumsn@nttdata.co.jp>

Hi Roland,

From: Roland Dreier
Subject: Re: [openib-general] mthca v. current ib_verbs
Date: Mon, 09 Aug 2004 18:23:34 -0700

> Masanori> Those messages sometimes appear when you are building
> Masanori> your source outside the Linux kernel source tree.  In this
> Masanori> case, especially the modules containing ib_req_notify_cq
> Masanori> or so.
>
> Do you have any idea why?
> If the code includes ib_verbs.h properly,
> then the definitions below mean that no references to either
> ib_poll_cq() or ib_req_notify_cq() are generated:
>
> static inline int ib_poll_cq(struct ib_cq *cq, int num_entries,
>                              struct ib_wc *wc)
> {
>         return cq->device->poll_cq(cq, num_entries, wc);
> }
>
> static inline int ib_req_notify_cq(struct ib_cq *cq,
>                                    enum ib_cq_notify cq_notify)
> {
>         return cq->device->req_notify_cq(cq, cq_notify);
> }
>
> Thanks,
>   Roland

Humm... There are 15 ib_verbs.h files now, and it would be better to
ask Hal which 'ib_verbs.h' he is including.

ritafsdev:/opt/svn/openib[11] find . -name 'ib_verbs.h'
./trunk/contrib/voltaire/voltaire-ibhost/src/user-protocols/dapl/dapl-sf-031229/dapl/include/ib/IBM/ke/ib_verbs.h
./trunk/contrib/voltaire/voltaire-ibhost/src/user-protocols/dapl/dapl-sf-031229/dapl/include/ib/IBM/us/ib_verbs.h
./trunk/contrib/intel/ib_verbs.h
./trunk/contrib/infinicon/ALL_HOST/Dat/dapl/dapl/include/ib/IBM/ke/ib_verbs.h
./trunk/contrib/infinicon/ALL_HOST/Dat/dapl/dapl/include/ib/IBM/us/ib_verbs.h
./trunk/contrib/infinicon/latest/ALL_HOST/Dat/dapl/dapl/include/ib/IBM/ke/ib_verbs.h
./trunk/contrib/infinicon/latest/ALL_HOST/Dat/dapl/dapl/include/ib/IBM/us/ib_verbs.h
./trunk/openib/1.0/ib-support/host/ulp/dapl/dapl3/dapl/include/ib/IBM/ke/ib_verbs.h
./trunk/openib/1.0/ib-support/host/ulp/dapl/dapl3/dapl/include/ib/IBM/us/ib_verbs.h
./gen2/branches/openib-candidate/src/linux-kernel/infiniband/include/ib_verbs.h
./gen2/branches/roland-merge/src/linux-kernel/infiniband/include/ib_verbs.h
./tags/infinicon-latest-pre2.6/ALL_HOST/Dat/dapl/dapl/include/ib/IBM/ke/ib_verbs.h
./tags/infinicon-latest-pre2.6/ALL_HOST/Dat/dapl/dapl/include/ib/IBM/us/ib_verbs.h
./tags/infinicon2.1/ALL_HOST/Dat/dapl/dapl/include/ib/IBM/ke/ib_verbs.h
./tags/infinicon2.1/ALL_HOST/Dat/dapl/dapl/include/ib/IBM/us/ib_verbs.h

By the way, why not define macros like the following instead of
static inline functions?
#define ib_poll_cq(cq, num_entries, wc) \
        ((cq)->device->poll_cq(cq, num_entries, wc))

Thanks,
Masanori

---
Masanori ITOH
Open Source Software Development Center, NTT DATA CORPORATION
e-mail: itoumsn at nttdata.co.jp
phone : +81-3-3523-8122 (ext. 7354)

From roland at topspin.com  Mon Aug  9 21:23:31 2004
From: roland at topspin.com (Roland Dreier)
Date: Mon, 09 Aug 2004 21:23:31 -0700
Subject: [openib-general] mthca v. current ib_verbs
In-Reply-To: <20040810.131751.01369461.itoumsn@nttdata.co.jp> (Masanori ITOH's message of "Tue, 10 Aug 2004 13:17:51 +0900 (JST)")
References: <52smaw7zcs.fsf@topspin.com> <20040810.095807.60847859.itoumsn@nttdata.co.jp> <52oelj946h.fsf@topspin.com> <20040810.131751.01369461.itoumsn@nttdata.co.jp>
Message-ID: <52brhj8vuk.fsf@topspin.com>

    Masanori> Humm... There are 15 ib_verbs.h files now, and it would
    Masanori> be better to ask Hal which 'ib_verbs.h' he is including.

Good point :)  But I think Hal understands the problem now...

    Masanori> By the way, why not define macros like the following
    Masanori> instead of static inline functions?

I think inline functions are generally preferred to macros.  I use
macros only when an inline function won't work.

 - Roland

From roland at topspin.com  Mon Aug  9 21:24:13 2004
From: roland at topspin.com (Roland Dreier)
Date: Mon, 09 Aug 2004 21:24:13 -0700
Subject: [openib-general] [PATCH] kill ib_legacy.h
In-Reply-To: <52isbr8yyu.fsf@topspin.com> (Roland Dreier's message of "Mon, 09 Aug 2004 20:16:09 -0700")
References: <1092088324.14886.20.camel@duffman> <52isbr8yyu.fsf@topspin.com>
Message-ID: <52acx38vte.fsf_-_@topspin.com>

The coup de grace...
Index: src/linux-kernel/infiniband/ulp/dapl/khash.c
===================================================================
--- src/linux-kernel/infiniband/ulp/dapl/khash.c	(revision 619)
+++ src/linux-kernel/infiniband/ulp/dapl/khash.c	(working copy)
@@ -26,7 +26,6 @@
 #include 
 #include 
-#include "ib_legacy_types.h"
 #include "ts_kernel_trace.h"
 #include "khash.h"
Index: src/linux-kernel/infiniband/ulp/dapl/khash.h
===================================================================
--- src/linux-kernel/infiniband/ulp/dapl/khash.h	(revision 619)
+++ src/linux-kernel/infiniband/ulp/dapl/khash.h	(working copy)
@@ -24,8 +24,6 @@
 #ifndef _KHASH_H
 #define _KHASH_H
-#include 
-
 #define HASH_KEY_SIZE 40
 
 /*
Index: src/linux-kernel/infiniband/ulp/dapl/udapl_mod.c
===================================================================
--- src/linux-kernel/infiniband/ulp/dapl/udapl_mod.c	(revision 619)
+++ src/linux-kernel/infiniband/ulp/dapl/udapl_mod.c	(working copy)
@@ -39,7 +39,6 @@
 #include "vapi_common.h"
 #include "evapi.h"
-#include "ib_legacy_types.h"
 #include "ts_kernel_trace.h"
 #include 
 #include "ts_ib_sa_client.h"
Index: src/linux-kernel/infiniband/ulp/dapl/udapl_mod.h
===================================================================
--- src/linux-kernel/infiniband/ulp/dapl/udapl_mod.h	(revision 619)
+++ src/linux-kernel/infiniband/ulp/dapl/udapl_mod.h	(working copy)
@@ -28,7 +28,6 @@
 #include 
-#include "ib_legacy_types.h"
 #include 
 
 #define UDAPL_DEVNAME "udapl"
Index: src/linux-kernel/infiniband/ulp/ipoib/ip2pr_priv.h
===================================================================
--- src/linux-kernel/infiniband/ulp/ipoib/ip2pr_priv.h	(revision 614)
+++ src/linux-kernel/infiniband/ulp/ipoib/ip2pr_priv.h	(working copy)
@@ -26,7 +26,6 @@
 /*
  * topspin generic includes
  */
-#include 
 #include 
 #include 
 #include 
@@ -66,7 +65,6 @@
 /*
  * topspin IB includes
  */
-#include 
 #include 
 #include 
 #include 
Index: src/linux-kernel/infiniband/ulp/ipoib/ip2pr_export.h
===================================================================
--- src/linux-kernel/infiniband/ulp/ipoib/ip2pr_export.h	(revision 614)
+++ src/linux-kernel/infiniband/ulp/ipoib/ip2pr_export.h	(working copy)
@@ -24,8 +24,6 @@
 #ifndef _TS_IP2PR_EXPORT_H
 #define _TS_IP2PR_EXPORT_H
-#include 
-
 /* ------------------------------------------------------------------------- */
 /* kernel */
 /* ------------------------------------------------------------------------- */
Index: src/linux-kernel/infiniband/ulp/ipoib/ipoib_proto.h
===================================================================
--- src/linux-kernel/infiniband/ulp/ipoib/ipoib_proto.h	(revision 576)
+++ src/linux-kernel/infiniband/ulp/ipoib/ipoib_proto.h	(working copy)
@@ -25,7 +25,6 @@
 #define _IPOIB_PROTO_H
 
 #include 
-#include 
 #include 
 
 /* ------------------------------------------------------------------------- */
Index: src/linux-kernel/infiniband/ulp/srp/srp_host.c
===================================================================
--- src/linux-kernel/infiniband/ulp/srp/srp_host.c	(revision 619)
+++ src/linux-kernel/infiniband/ulp/srp/srp_host.c	(working copy)
@@ -154,7 +154,7 @@
 	.use_clustering = ENABLE_CLUSTERING,
 };
 
-static int scsi_unload_in_progress = FALSE;
+static int scsi_unload_in_progress = 0;
 
 static unsigned long connection_timeout = 0;
 static struct pci_dev *hca_pdev;
@@ -279,7 +279,7 @@
 	driver_params.num_connections++;
 	port->num_connections++;
 	target->ioc->num_connections++;
-	target->hard_reject = FALSE;
+	target->hard_reject = 0;
 	target->hard_reject_count = 0;
 
 	target->active_conn = s;
@@ -293,7 +293,7 @@
 	s->port = port;
 	s->state = SRP_HOST_LOGIN_INPROGRESS;
-	s->redirected = FALSE;
+	s->redirected = 0;
 	s->path_record_tid = 0;
 
 	srp_host_login(s);
@@ -323,7 +323,7 @@
 	target->active_conn = NULL;
 	target->timeout = jiffies + connection_timeout;
 	target->state = TARGET_POTENTIAL_CONNECTION;
-	target->need_device_reset = TRUE;
+	target->need_device_reset = 1;
spin_unlock_irqrestore(&target->spin_lock, cpu_flags); } @@ -332,14 +332,14 @@ { srp_target_t *target = s->target; unsigned long cpu_flags; - int force_close = FALSE; + int force_close = 0; spin_lock_irqsave(&target->spin_lock, cpu_flags); if (s->state == SRP_HOST_LOGIN_INPROGRESS) { driver_params.num_pending_connections--; } else if (s->state == SRP_HOST_GET_PATH_RECORD) { - force_close = TRUE; + force_close = 1; driver_params.num_pending_connections--; } else if (s->state == SRP_UP) { driver_params.num_active_connections--; @@ -352,8 +352,8 @@ if (target->state == TARGET_POTENTIAL_CONNECTION) target->timeout = jiffies + connection_timeout; target->active_conn = NULL; - target->need_disconnect = FALSE; - target->hard_reject = FALSE; + target->need_disconnect = 0; + target->hard_reject = 0; s->state = SRP_HOST_LOGOUT_INPROGRESS; @@ -398,17 +398,17 @@ spin_lock_irqsave(&target->spin_lock, cpu_flags); - if (srp_pkt->in_use == FALSE) { + if (!srp_pkt->in_use) { TS_REPORT_STAGE(MOD_SRPTP, "srp_pkt already free %d", srp_pkt->pkt_index); - } else if (srp_pkt->in_use == TRUE) { + } else if (srp_pkt->in_use) { srp_pkt->scatter_gather_list.address = (u64) (unsigned long)srp_pkt->data; srp_pkt->scatter_gather_list.length = srp_cmd_pkt_size; - srp_pkt->in_use = FALSE; + srp_pkt->in_use = 0; srp_pkt->conn = INVALID_CONN_HANDLE; srp_pkt->next = target->srp_pkt_free_list; @@ -429,17 +429,17 @@ } else { target = (srp_target_t *) srp_pkt->target; - if (srp_pkt->in_use == FALSE) { + if (!srp_pkt->in_use) { TS_REPORT_STAGE(MOD_SRPTP, "srp_pkt already free %d", srp_pkt->pkt_index); - } else if (srp_pkt->in_use == TRUE) { + } else if (srp_pkt->in_use) { srp_pkt->scatter_gather_list.address = (u64) (unsigned long)srp_pkt->data; srp_pkt->scatter_gather_list.length = srp_cmd_pkt_size; - srp_pkt->in_use = FALSE; + srp_pkt->in_use = 0; srp_pkt->conn = INVALID_CONN_HANDLE; srp_pkt->next = target->srp_pkt_free_list; @@ -479,7 +479,7 @@ atomic_dec(&target->free_pkt_counter); srp_pkt->next = 
NULL; - srp_pkt->in_use = TRUE; + srp_pkt->in_use = 1; } spin_unlock_irqrestore(&target->spin_lock, cpu_flags); @@ -514,7 +514,7 @@ } } - return (TS_SUCCESS); + return 0; } /* @@ -558,7 +558,7 @@ hca = &hca_params[hca_index]; - if (hca->valid == FALSE) + if (!hca->valid) break; target->cqs_hndl[hca_index] = ib_create_cq(hca->ca_hndl, @@ -656,7 +656,7 @@ hca = &hca_params[hca_index]; - if (hca->valid == FALSE) + if (!hca->valid) break; if (target->srp_pkt_data_mhndl[hca_index]) @@ -685,13 +685,13 @@ target = &srp_targets[target_index]; - if (target->valid == TRUE) { + if (target->valid) { for (hca_index = 0; hca_index < MAX_HCAS; hca_index++) { hca = &hca_params[hca_index]; - if (hca->valid == FALSE) + if (!hca->valid) break; ib_dereg_mr(target->srp_pkt_data_mhndl[hca_index]); @@ -720,10 +720,10 @@ target->target_index = target - &srp_targets[0]; target->state = TARGET_POTENTIAL_CONNECTION; target->timeout = jiffies + TARGET_POTENTIAL_STARTUP_TIMEOUT; - target->valid = FALSE; - target->need_disconnect = FALSE; - target->need_device_reset = FALSE; - target->hard_reject = FALSE; + target->valid = 0; + target->need_disconnect = 0; + target->need_device_reset = 0; + target->hard_reject = 0; target->hard_reject_count = 0; INIT_LIST_HEAD(&target->conn_list); @@ -982,7 +982,7 @@ if (((target->state > TARGET_INITIALIZED) && (target->state < TARGET_ACTIVE_CONNECTION)) && - (target->valid == TRUE) && (target->active_conn == NULL)) { + (target->valid) && (target->active_conn == NULL)) { TS_REPORT_STAGE(MOD_SRPTP, "target %d, no active connection", @@ -1007,7 +1007,7 @@ port = &hca_params[hca_index].port[port_index]; - port->dm_need_query = TRUE; + port->dm_need_query = 1; } } } @@ -1055,7 +1055,7 @@ int port_index, hca_index; int dm_query_sum = 0; unsigned long dm_query_filter_timeout = 0; - int sweep_targets_for_connections = FALSE; + int sweep_targets_for_connections = 0; srp_target_t *target; srp_host_hca_params_t *hca; srp_host_port_params_t *port; @@ -1088,18 +1088,18 @@ 
port = &hca_params[hca_index]. port[port_index]; - if (port->valid == FALSE) + if (!port->valid) break; srp_port_query_cancel(port); if (port->out_of_service_xid != 0) srp_register_out_of_service - (port, FALSE); + (port, 0); if (port->in_service_xid != 0) srp_register_in_service(port, - FALSE); + 0); if (port->dm_query_in_progress) { TS_REPORT_STAGE(MOD_SRP, @@ -1123,7 +1123,7 @@ */ if (driver_params.need_refresh) { - driver_params.need_refresh = FALSE; + driver_params.need_refresh = 0; /* * Refresh our local port information @@ -1147,7 +1147,7 @@ port = &hca_params[hca_index].port[port_index]; - if (port->valid == FALSE) + if (!port->valid) break; if (port->dm_query_in_progress) { @@ -1155,14 +1155,14 @@ continue; } - if (port->dm_need_query == FALSE) + if (!port->dm_need_query) continue; if (port->out_of_service_xid == 0) - srp_register_out_of_service(port, TRUE); + srp_register_out_of_service(port, 1); if (port->in_service_xid == 0) - srp_register_in_service(port, TRUE); + srp_register_in_service(port, 1); port->dm_retry_count = 0; @@ -1185,8 +1185,8 @@ "Number of active dm_queries %d", dm_query_sum); - sweep_targets_for_connections = TRUE; - driver_params.port_query = FALSE; + sweep_targets_for_connections = 1; + driver_params.port_query = 0; if (dm_query_filter_timeout == 0) dm_query_filter_timeout = @@ -1204,12 +1204,12 @@ sweep_targets(); } - driver_params.dm_active = TRUE; + driver_params.dm_active = 1; - } else if (sweep_targets_for_connections == TRUE) { - sweep_targets_for_connections = FALSE; + } else if (sweep_targets_for_connections) { + sweep_targets_for_connections = 0; dm_query_filter_timeout = 0; - driver_params.dm_active = FALSE; + driver_params.dm_active = 0; sweep_targets(); } @@ -1229,7 +1229,7 @@ * Cleanup various disconnect/reconnect methods into * one method */ - if (target->need_disconnect == TRUE) { + if (target->need_disconnect) { remove_connection(target->active_conn, TARGET_POTENTIAL_CONNECTION); @@ -1238,17 +1238,17 @@ 
initialize_connection(target); - target->need_device_reset = FALSE; - target->hard_reject = FALSE; + target->need_device_reset = 0; + target->hard_reject = 0; } - if (target->need_device_reset == TRUE) { + if (target->need_device_reset) { struct list_head *conn_entry; srp_host_conn_t *conn; - target->need_device_reset = FALSE; - target->need_disconnect = FALSE; - target->hard_reject = FALSE; + target->need_device_reset = 0; + target->need_disconnect = 0; + target->hard_reject = 0; list_for_each(conn_entry, &target->conn_list) { @@ -1268,12 +1268,12 @@ } } - if ((target->hard_reject == TRUE) + if ((target->hard_reject) && (target->active_conn)) { - target->need_device_reset = FALSE; - target->need_disconnect = FALSE; - target->hard_reject = FALSE; + target->need_device_reset = 0; + target->need_disconnect = 0; + target->hard_reject = 0; if (target->hard_reject_count++ < MAX_HARD_REJECT_COUNT) { @@ -1306,7 +1306,7 @@ srp_host_close_conn(conn); - srp_dm_kill_ioc(target, FALSE); + srp_dm_kill_ioc(target, 0); pick_connection_path(target); } @@ -1368,7 +1368,7 @@ for (hca = &hca_params[0]; hca < &hca_params[MAX_HCAS]; hca++) { - if (hca->valid == FALSE) + if (!hca->valid) break; /* @@ -1546,12 +1546,12 @@ port_index++) { if (hca_params[hca_index].port[port_index].valid) { hca_params[hca_index].port[port_index]. 
- dm_need_query = TRUE; + dm_need_query = 1; } } } - driver_params.need_refresh = TRUE; + driver_params.need_refresh = 1; err = tsKernelThreadStart("ts_srp_dm", srp_dm_poll_thread, @@ -1586,7 +1586,7 @@ * (1) timeout to expire */ } while (((driver_params.num_active_local_ports == 0) || - (driver_params.dm_active == TRUE) || + (driver_params.dm_active) || (driver_params.num_pending_connections != 0)) && (connections_timeout < (srp_discovery_timeout * HZ))); @@ -1637,12 +1637,12 @@ int len = 0; int i; srp_target_t *target; - int not_first_entry = FALSE; + int not_first_entry = 0; u8 *gid; u8 *ioc_guid; char *buf; - if (inout == TRUE) { + if (inout) { /* write to proc interface, redistribute connections */ if (!buffer || length >= PAGE_SIZE) { return (-EINVAL); @@ -1680,8 +1680,7 @@ target = &srp_targets[i]; - if ((target->valid == FALSE) - || (target->state != TARGET_ACTIVE_CONNECTION)) + if (!target->valid || target->state != TARGET_ACTIVE_CONNECTION) continue; gid = target->port->local_gid; @@ -1736,13 +1735,13 @@ for (target = &srp_targets[0]; target < &srp_targets[max_srp_targets]; target++) { - if (target->valid == TRUE) { + if (target->valid) { /* don't print colon on first guy */ - if (not_first_entry == TRUE) { + if (not_first_entry) { len += sprintf(&buffer[len], ":"); } else { - not_first_entry = TRUE; + not_first_entry = 1; } len += sprintf(&buffer[len], "%llx.%x", @@ -2455,9 +2454,8 @@ /* send the srp packet */ status = srptp_post_recv(recv_pkt); - if (status == TS_SUCCESS) { + if (!status) status = srptp_post_send(send_pkt); - } if (status) { /* we have a problem posting, disconnect, never should happen */ @@ -2491,7 +2489,7 @@ target->target_index, status); target->state = TARGET_POTENTIAL_CONNECTION; - target->need_disconnect = TRUE; + target->need_disconnect = 1; spin_unlock_irqrestore(&target->spin_lock, cpu_flags); @@ -2499,10 +2497,10 @@ } SEND_SUCCESS: - return (TS_SUCCESS); + return 0; SEND_FAIL: - return (TS_FAIL); + return -1; } #if 1 @@ 
-3499,7 +3497,7 @@ } s->state = SRP_HOST_LOGIN_INPROGRESS; - s->redirected = TRUE; + s->redirected = 1; srptp_connect(s, redirected_path_record, (__u8 *) s->login_buff, s->login_buff_len); @@ -3624,7 +3622,7 @@ "SRP Target rejected for target %d, redirect %d", target->target_index, s->redirected); - target->hard_reject = TRUE; + target->hard_reject = 1; break; @@ -3761,7 +3759,7 @@ target->target_index); target->state = TARGET_POTENTIAL_CONNECTION; - target->need_disconnect = TRUE; + target->need_disconnect = 1; } spin_unlock_irqrestore(&target->spin_lock, cpu_flags); @@ -3813,7 +3811,7 @@ /* First unregister, the scsi driver, set a flag to indicate * to the abort code to complete aborts immediately */ - scsi_unload_in_progress = TRUE; + scsi_unload_in_progress = 1; tsKernelThreadStop(driver_params.thread); Index: src/linux-kernel/infiniband/ulp/srp/srp_dm.c =================================================================== --- src/linux-kernel/infiniband/ulp/srp/srp_dm.c (revision 619) +++ src/linux-kernel/infiniband/ulp/srp/srp_dm.c (working copy) @@ -36,13 +36,13 @@ for (i = 0; i < MAX_IOCS; i++) { if ((memcmp(ioc_table[i].guid, guid, sizeof(tTS_IB_GUID)) == 0) - && (ioc_table[i].valid == TRUE)) { + && (ioc_table[i].valid == 1)) { /* we have a match, return IOC index */ *ioc_index = i; - return (TS_SUCCESS); + return 0; } } - return (TS_FAIL); + return -1; } int srp_new_ioc(tTS_IB_GUID guid, int *ioc_index) @@ -50,19 +50,19 @@ int i; for (i = 0; i < MAX_IOCS; i++) { - if (ioc_table[i].valid == FALSE) { + if (!ioc_table[i].valid) { TS_REPORT_STAGE(MOD_SRPTP, "Creating IOC Entry %d for 0x%llx", i, be64_to_cpu(*(u64 *) guid)); memcpy(ioc_table[i].guid, guid, sizeof(tTS_IB_GUID)); - ioc_table[i].valid = TRUE; + ioc_table[i].valid = 1; *ioc_index = i; - return (TS_SUCCESS); + return 0; } } - return (TS_FAIL); + return -1; } static void srp_check_ioc_paths(ioc_entry_t * ioc) @@ -78,24 +78,24 @@ spin_lock_irqsave(&driver_params.spin_lock, cpu_flags); - path_available 
= FALSE; + path_available = 0; /* check if it has any valid paths */ for (hca_index = 0; hca_index < MAX_HCAS; hca_index++) { for (port_index = 0; port_index < MAX_LOCAL_PORTS_PER_HCA; port_index++) { - if (ioc->path_valid[hca_index][port_index] == TRUE) { + if (ioc->path_valid[hca_index][port_index]) { /* * we have at least one path, lets keep the * IOC */ - path_available = TRUE; + path_available = 1; } } } - if (path_available == FALSE) { + if (!path_available) { TS_REPORT_WARN(MOD_SRPTP, "IOC GUID %llx, no available paths", be64_to_cpu(*(u64 *) ioc->guid)); @@ -103,14 +103,14 @@ * no paths available to this IOC, let's remove it from our * list */ - ioc->valid = FALSE; + ioc->valid = 0; /* loop through all targets, and indicate that this * ioc_index is not available as a path */ for (target = &srp_targets[0]; target < &srp_targets[max_srp_targets]; target++) { - target->ioc_mask[ioc_index] = FALSE; - target->ioc_needs_request[ioc_index] = FALSE; + target->ioc_mask[ioc_index] = 0; + target->ioc_needs_request[ioc_index] = 0; } } @@ -131,15 +131,15 @@ * the difference is if we get a IOU connection failure versus * a redirected connection failure */ - if (flag == TRUE) { + if (flag) { /* * this will not cause the path to be lost, * just asks connection hunt code to skip this IOC * for this target */ - target->ioc_needs_request[ioc - &ioc_table[0]] = FALSE; + target->ioc_needs_request[ioc - &ioc_table[0]] = 0; } else { - ioc->path_valid[hca_index][port_index] = FALSE; + ioc->path_valid[hca_index][port_index] = 0; srp_check_ioc_paths(ioc); } @@ -154,7 +154,7 @@ ioc_entry = &ioc_table[ioc_index]; /* check if the entry is valid */ - if (ioc_entry->valid == FALSE) + if (!ioc_entry->valid) continue; srp_check_ioc_paths(ioc_entry); @@ -175,8 +175,8 @@ * Search for IOC guid with lowest connections first */ for (ioc_index = 0; ioc_index < MAX_IOCS; ioc_index++) { - if ((target->ioc_mask[ioc_index] == TRUE) && - (target->ioc_needs_request[ioc_index] == TRUE)) { + if 
((target->ioc_mask[ioc_index]) && + (target->ioc_needs_request[ioc_index])) { if (ioc_table[ioc_index].num_connections < connection_count) { lowest_ioc_entry = &ioc_table[ioc_index]; @@ -203,10 +203,8 @@ * check if the port is valid, the port is up * and if the IOC can be seen through this port */ - if ((port->valid == TRUE) && - (lowest_ioc_entry-> - path_valid[hca_index][port_index] == - TRUE)) { + if (port->valid && + lowest_ioc_entry->path_valid[hca_index][port_index]) { if (port->num_connections < connection_count) { lowest_port = port; @@ -295,8 +293,8 @@ } if (port->dm_query_in_progress) { - port->dm_query_in_progress = FALSE; - port->dm_need_query = FALSE; + port->dm_query_in_progress = 0; + port->dm_need_query = 0; TS_REPORT_STAGE(MOD_SRP, "Canceling DM Query on hca %d port %d", port->hca->hca_index + 1, port->local_port); @@ -323,12 +321,12 @@ * flag */ if ((*query_entry)->id == id) { - return (TS_SUCCESS); + return 0; } } *query_entry = NULL; - return (TS_FAIL); + return -1; } int srp_find_query(srp_host_port_params_t *port, u8 *gid) @@ -348,16 +346,16 @@ */ if ((query->port == port) && (memcmp(query->remote_gid, gid, sizeof(tTS_IB_GID)) == 0)) { - query->need_retry = TRUE; + query->need_retry = 1; spin_unlock_irqrestore(&driver_params.spin_lock, cpu_flags); - return (TS_SUCCESS); + return 0; } } spin_unlock_irqrestore(&driver_params.spin_lock, cpu_flags); - return (TS_FAIL); + return -1; } void srp_update_ioc(ioc_entry_t * ioc_entry, @@ -383,7 +381,7 @@ memcpy(svc_path_record->sgid, port->local_gid, sizeof(tTS_IB_GID)); ioc_entry->path_valid[port->hca->hca_index][port->local_port - 1] = - TRUE; + 1; TS_REPORT_STAGE(MOD_SRPTP, "Updating IOC on hca %d port %d", port->hca->hca_index + 1, port->local_port); @@ -450,7 +448,7 @@ * update the pathing information for the IOC */ status = srp_find_ioc(io_svc->controller_guid, &ioc_index); - if (status == TS_FAIL) { + if (status) { TS_REPORT_STAGE(MOD_SRPTP, "IOC not found %llx, creating new " "IOC entry", @@ 
-460,7 +458,7 @@ status = srp_new_ioc(io_svc->controller_guid, &ioc_index); - if (status == TS_FAIL) { + if (status) { TS_REPORT_STAGE(MOD_SRPTP, "IOC entry creation failed, " "too many IOCs"); @@ -525,7 +523,7 @@ * If target wasn't previously discovered * Allocate packets and mark as in-use. */ - if (empty_target->valid == FALSE) { + if (!empty_target->valid) { int status; status = srp_host_alloc_pkts(empty_target); @@ -534,15 +532,15 @@ "Could not allocat target %d", empty_target->target_index); } else { - empty_target->valid = TRUE; + empty_target->valid = 1; } } /* * Indicate which IOCs the target/service is visible on */ - empty_target->ioc_mask[ioc_index] = TRUE; - empty_target->ioc_needs_request[ioc_index] = TRUE; + empty_target->ioc_mask[ioc_index] = 1; + empty_target->ioc_needs_request[ioc_index] = 1; } } @@ -556,7 +554,7 @@ * If we are shuting down, throw away the query */ if (driver_params.dm_shutdown) { - if ((status == TS_SUCCESS) && (io_list)) { + if (!status && io_list) { ib_host_io_list_free(io_list); } return; @@ -564,15 +562,15 @@ down(&driver_params.sema); - if ((status == TS_SUCCESS) && (io_list == NULL)) { + if (!status && !io_list) { TS_REPORT_STAGE(MOD_SRPTP, "DM Client Query complete hca %d port %d", port->hca->hca_index + 1, port->local_port); - port->dm_query_in_progress = FALSE; - port->dm_need_query = FALSE; + port->dm_query_in_progress = 0; + port->dm_need_query = 0; - if (port->dm_need_retry == TRUE) { + if (port->dm_need_retry) { if (port->dm_retry_count++ < MAX_DM_RETRIES) { TS_REPORT_WARN(MOD_SRPTP, @@ -599,7 +597,7 @@ TS_REPORT_WARN(MOD_SRPTP, "DM Client timeout on hca %d port %d", port->hca->hca_index + 1, port->local_port); - port->dm_need_retry = TRUE; + port->dm_need_retry = 1; } else if (status) { /* @@ -630,7 +628,7 @@ * If we are shuting down, throw away the query */ if (driver_params.dm_shutdown) { - if ((status == TS_SUCCESS) && (io_list)) { + if (!status && io_list) { ib_host_io_list_free(io_list); } return; @@ -649,16 
+647,16 @@ port = query_entry->port; - if ((status == TS_SUCCESS) && (io_list == NULL)) { + if (!status && !io_list) { TS_REPORT_STAGE(MOD_SRPTP, "Port Query %d complete hca %d port %d", query_entry->id, port->hca->hca_index + 1, port->local_port); - if (query_entry->need_retry == TRUE) { - query_entry->need_retry = FALSE; + if (query_entry->need_retry) { + query_entry->need_retry = 0; - driver_params.port_query = TRUE; + driver_params.port_query = 1; ib_host_io_port_query(query_entry->port->hca->ca_hndl, query_entry->port->local_port, @@ -678,13 +676,13 @@ port->local_port); if (++query_entry->retry < MAX_QUERY_RETRIES) { - query_entry->need_retry = TRUE; + query_entry->need_retry = 1; } else { TS_REPORT_WARN(MOD_SRPTP, "Retries exceeded on hca %d port %d", port->hca->hca_index + 1, port->local_port); - query_entry->need_retry = FALSE; + query_entry->need_retry = 0; } } else if (status) { /* @@ -784,11 +782,11 @@ sizeof(tTS_IB_GID)) == 0) && (srp_path_records[index].slid == port->slid)) { *path_record = &srp_path_records[index]; - return (TS_SUCCESS); + return 0; } } - return (TS_FAIL); + return -1; } void srp_update_cache(struct ib_path_record *path_record, @@ -824,7 +822,7 @@ status = srp_find_path_record(find_gid, port, &path_record); - if (status == TS_SUCCESS) { + if (!status) { TS_REPORT_STAGE(MOD_SRPTP, "Found Path Record in cache"); completion_function(TS_IB_CLIENT_QUERY_TID_INVALID, @@ -924,7 +922,7 @@ query_entry->state = QUERY_PORT_INFO; - driver_params.port_query = TRUE; + driver_params.port_query = 1; ib_host_io_port_query(port->hca->ca_hndl, port->local_port, @@ -980,7 +978,7 @@ /* * Query already outstanding, do nothing */ - if (srp_find_query(port, notified_port_gid) == TS_SUCCESS) { + if (!srp_find_query(port, notified_port_gid)) { up(&driver_params.sema); return; } @@ -1068,7 +1066,7 @@ down(&driver_params.sema); - if (srp_find_query(srp_port, notified_port_gid) == TS_SUCCESS) { + if (!srp_find_query(srp_port, notified_port_gid)) { 
up(&driver_params.sema); return; } @@ -1214,7 +1212,7 @@ port->hca->hca_index + 1, status); if (status == -ETIMEDOUT) - srp_register_out_of_service(port, TRUE); + srp_register_out_of_service(port, 1); else TS_REPORT_WARN(MOD_SRPTP, "Unhandled error"); } else { @@ -1244,7 +1242,7 @@ port->hca->hca_index + 1, status); if (status == -ETIMEDOUT) - srp_register_in_service(port, TRUE); + srp_register_in_service(port, 1); else TS_REPORT_WARN(MOD_SRPTP, "Unhandled error for in-service " @@ -1266,7 +1264,7 @@ tTS_IB_SA_NOTICE_HANDLER_FUNC handler; tTS_IB_INFORM_INFO_SET_COMPLETION_FUNC completion_handler; - if (port->valid == FALSE) + if (!port->valid) return; if (flag) { @@ -1316,7 +1314,7 @@ tTS_IB_SA_NOTICE_HANDLER_FUNC handler; tTS_IB_INFORM_INFO_SET_COMPLETION_FUNC completion_handler; - if (port->valid == FALSE) + if (!port->valid) return; if (flag) { @@ -1361,13 +1359,13 @@ int srp_dm_query(srp_host_port_params_t * port) { - int status = TS_FAIL; + int status = -1; if (port->port_state == IB_PORT_STATE_ACTIVE) { - port->dm_query_in_progress = TRUE; + port->dm_query_in_progress = 1; - port->dm_need_retry = FALSE; + port->dm_need_retry = 0; TS_REPORT_STAGE(MOD_SRPTP, "DM Query Initiated on hca %d local port %d", @@ -1383,7 +1381,7 @@ "tsIbHostIoQuery failed status 0x%x", status); - port->dm_query_in_progress = FALSE; + port->dm_query_in_progress = 0; } } @@ -1422,15 +1420,15 @@ "Port active event for hca %d port %d", hca_index + 1, event->modifier.port); - if (port->valid == FALSE) + if (!port->valid) break; down(&driver_params.sema); - port->dm_need_query = TRUE; + port->dm_need_query = 1; if (port->port_state != IB_PORT_ACTIVE) { - driver_params.need_refresh = TRUE; + driver_params.need_refresh = 1; } up(&driver_params.sema); @@ -1453,7 +1451,7 @@ for (port_index = 0; port_index < MAX_LOCAL_PORTS_PER_HCA; port_index++) { - if (hca->port[port_index].valid == FALSE) + if (!hca->port[port_index].valid) break; event->event = IB_PORT_ERROR; @@ -1477,7 +1475,7 @@ "Port 
error event for hca %d port %d", hca_index + 1, event->modifier.port); - if (port->valid == FALSE) + if (!port->valid) break; /* @@ -1498,7 +1496,7 @@ for (ioc_index = 0; ioc_index < MAX_IOCS; ioc_index++) { ioc_table[ioc_index].path_valid[hca-> hca_index] - [port->local_port - 1] = FALSE; + [port->local_port - 1] = 0; } spin_unlock_irqrestore(&driver_params.spin_lock, cpu_flags); @@ -1547,9 +1545,9 @@ up(&target->sema); } - srp_register_out_of_service(port, FALSE); + srp_register_out_of_service(port, 0); port->out_of_service_xid = 0; - srp_register_in_service(port, FALSE); + srp_register_in_service(port, 0); port->in_service_xid = 0; srp_port_query_cancel(port); @@ -1605,7 +1603,7 @@ for (hca_index = 0; hca_index < MAX_HCAS; hca_index++) { hca = &hca_params[hca_index]; - if (hca->valid == FALSE) + if (!hca->valid) break; TS_REPORT_STAGE(MOD_SRPTP, @@ -1656,7 +1654,7 @@ for (hca_index = 0; hca_index < MAX_HCAS; hca_index++) { hca = &hca_params[hca_index]; - if (hca->valid == FALSE) + if (!hca->valid) break; TS_REPORT_STAGE(MOD_SRPTP, Index: src/linux-kernel/infiniband/ulp/srp/srp_host.h =================================================================== --- src/linux-kernel/infiniband/ulp/srp/srp_host.h (revision 619) +++ src/linux-kernel/infiniband/ulp/srp/srp_host.h (working copy) @@ -50,7 +50,6 @@ #include #include #include -#include "ib_legacy_types.h" #include "ts_kernel_trace.h" #include "ts_kernel_thread.h" #include Index: src/linux-kernel/infiniband/ulp/srp/srptp.c =================================================================== --- src/linux-kernel/infiniband/ulp/srp/srptp.c (revision 619) +++ src/linux-kernel/infiniband/ulp/srp/srptp.c (working copy) @@ -30,7 +30,6 @@ #include #include -#include "ib_legacy_types.h" #include "ts_kernel_trace.h" #include #include "srp_cmd.h" @@ -156,7 +155,7 @@ rcv_param.scatter_list = &srp_pkt->scatter_gather_list; rcv_param.num_scatter_entries = 1; rcv_param.device_specific = NULL; - rcv_param.signaled = TRUE; + 
rcv_param.signaled = 1; status = ib_receive(srp_pkt->conn->qp_hndl, &rcv_param, 1); @@ -182,7 +181,7 @@ send_param.op = IB_OP_SEND; send_param.gather_list = &srp_pkt->scatter_gather_list; send_param.num_gather_entries = 1; - send_param.signaled = TRUE; + send_param.signaled = 1; status = ib_send(srp_pkt->conn->qp_hndl, &send_param, 1); @@ -252,14 +251,14 @@ goto cleanup; } - hca->valid = TRUE; + hca->valid = 1; hca->hca_index = hca_index; for (port_index = 0; port_index < MAX_LOCAL_PORTS_PER_HCA; port_index++) { /* * Apply IB ports mask here */ - hca->port[port_index].valid = TRUE; + hca->port[port_index].valid = 1; hca->port[port_index].hca = hca; hca->port[port_index].local_port = port_index + 1; hca->port[port_index].index = @@ -320,7 +319,7 @@ fmr_params.pool_size = 64 * max_cmds_per_lun * sg_elements; fmr_params.dirty_watermark = fmr_params.pool_size / 8; - fmr_params.cache = FALSE; + fmr_params.cache = 0; TS_REPORT_STAGE(MOD_SRPTP, "Pool Create max pages 0x%x pool size 0x%x", @@ -392,7 +391,7 @@ hca = &hca_params[i]; - if (hca_params[i].valid == FALSE) + if (!hca_params[i].valid) continue; status = ib_fmr_pool_destroy(hca->fmr_pool); @@ -534,7 +533,7 @@ TS_REPORT_WARN(MOD_SRPTP, "Unknown comm_id 0x%x for target %d", comm_id, target->target_index); up(&target->sema); - return (TS_SUCCESS); + return 0; } TS_REPORT_STAGE(MOD_SRPTP, "SRP conn event %d for comm id 0x%x", @@ -575,7 +574,7 @@ up(&target->sema); - return (TS_SUCCESS); + return 0; } /* @@ -599,7 +598,7 @@ hca = &hca_params[hca_index]; - if (hca->valid == FALSE) + if (!hca->valid) break; for (port_index = 0; port_index < MAX_LOCAL_PORTS_PER_HCA; @@ -726,7 +725,7 @@ active_param.rnr_retry_count = 3; active_param.cm_response_timeout = 19; active_param.max_cm_retries = 3; - active_param.flow_control = TRUE; + active_param.flow_control = 1; path_record->packet_life = 14; path_record->mtu = IB_MTU_1024; @@ -735,7 +734,7 @@ path_record, /* Primary Path */ NULL, /* alternate path */ SRP_SERVICE_ID, /* Service 
ID */ - FALSE, /* peer-to-peer */ + 0, /* peer-to-peer */ conn_handler, /* Callback function */ (void *)conn->target, /* Argument */ &conn->comm_id); /* comm_id */ @@ -750,7 +749,7 @@ return (status); } - return (TS_SUCCESS); + return 0; } /* Index: src/linux-kernel/infiniband/ulp/srp/hostoptions.c =================================================================== --- src/linux-kernel/infiniband/ulp/srp/hostoptions.c (revision 619) +++ src/linux-kernel/infiniband/ulp/srp/hostoptions.c (working copy) @@ -46,7 +46,6 @@ #include #include -#include "ib_legacy_types.h" #include "srp_cmd.h" #include "srptp.h" #include "srp_host.h" @@ -64,8 +63,6 @@ #define kLoadError 1 #define kNoError 0 -#define TS_FAILURE -1 -#define TS_SUCCESS 0 void ConvertToLowerCase(char *stringPtr) { @@ -239,7 +236,7 @@ /* printk( "wwn string %s\n", wwn_str ); */ /* printk( "characters copied %d\n", (u32)chars_copied ); */ if (chars_copied > (kWWNStringLength + 1)) { - return (TS_FAILURE); + return -1; } else { curr_loc += chars_copied; } @@ -249,7 +246,7 @@ *(u64 *) & (srp_targets[i].service_name) = cpu_to_be64(wwn); if (result != kNoError) - return (TS_FAILURE); + return -1; delimeter = ':'; memset(guid_str, 0, kGUIDStringLength + 1); @@ -264,7 +261,7 @@ (u32) chars_copied); if (chars_copied > (kGUIDStringLength + 1)) { - return (TS_FAILURE); + return -1; } else { curr_loc += chars_copied; } @@ -276,7 +273,7 @@ *(u64 *) & srp_targets[i].guid); if (result != kNoError) - return (TS_FAILURE); + return -1; } else { GetString(curr_loc, dlid_str, kDLIDStringLength, delimeter, &chars_copied); @@ -285,7 +282,7 @@ /* printk( "characters copied %d\n", (u32)chars_copied ); */ if (chars_copied > (kDLIDStringLength + 1)) { - return (TS_FAILURE); + return -1; } else { curr_loc += chars_copied; } @@ -296,13 +293,13 @@ srp_targets[i].iou_path_record[0].dlid); if (result != kNoError) - return (TS_FAILURE); + return -1; } i++; } - return (TS_SUCCESS); + return 0; } #endif @@ -338,7 +335,7 @@ &chars_copied); if 
(chars_copied > (kWWNStringLength + 1)) { - return (TS_FAILURE); + return -1; } else { curr_loc += chars_copied; } @@ -346,7 +343,7 @@ result = StringToHex64(wwn_str, &wwn); if (result != kNoError) - return (TS_FAILURE); + return -1; delimeter = ':'; memset(target_index_str, 0, kTargetIndexStringLength + 1); @@ -355,7 +352,7 @@ delimeter, &chars_copied); if (chars_copied > (kTargetIndexStringLength + 1)) { - return (TS_FAILURE); + return -1; } else { curr_loc += chars_copied; } @@ -369,33 +366,33 @@ TS_REPORT_FATAL(MOD_SRPTP, "Target %d, packet allocation failure"); } - target->valid = TRUE; + target->valid = 1; if (result != kNoError) - return (TS_FAILURE); + return -1; i++; } - return (TS_SUCCESS); + return 0; } void print_target_bindings(void) { srp_target_t *target; - int not_first_entry = FALSE; + int not_first_entry = 0; printk("srp_host: target_bindings="); for (target = &srp_targets[0]; (target < &srp_targets[max_srp_targets]); target++) { - if (target->valid == TRUE) { + if (target->valid) { /* don't print colon on first guy */ - if (not_first_entry == TRUE) { + if (not_first_entry) { printk(":"); } else { - not_first_entry = TRUE; + not_first_entry = 1; } printk("%llx.%x", Index: src/linux-kernel/infiniband/ulp/sdp/sdp_send.c =================================================================== --- src/linux-kernel/infiniband/ulp/sdp/sdp_send.c (revision 619) +++ src/linux-kernel/infiniband/ulp/sdp/sdp_send.c (working copy) @@ -1691,8 +1691,8 @@ /*.._sdp_send_ctrl_buff -- Create and Send a buffered control message. */ static s32 _sdp_send_ctrl_buff(struct sdp_opt *conn, u8 mid, - tBOOLEAN se, - tBOOLEAN sig) + int se, + int sig) { s32 result = 0; struct sdpc_buff *buff; @@ -1721,7 +1721,7 @@ /* * solicite event flag for IB sends. */ - if (TRUE == se) { + if (se) { TS_SDP_BUFF_F_SET_SE(buff); } @@ -1732,7 +1732,7 @@ /* * try for unsignalled? 
*/ - if (TRUE == sig) { + if (sig) { TS_SDP_BUFF_F_CLR_UNSIG(buff); } @@ -1864,36 +1864,35 @@ return 0; } - return _sdp_send_ctrl_buff(conn, TS_SDP_MSG_MID_DATA, FALSE, FALSE); + return _sdp_send_ctrl_buff(conn, TS_SDP_MSG_MID_DATA, 0, 0); } /* sdp_send_ctrl_ack */ /* ========================================================================= */ /*..sdp_send_ctrl_send_sm -- Send a request for buffered mode. */ s32 sdp_send_ctrl_send_sm(struct sdp_opt *conn) { - return _sdp_send_ctrl_buff(conn, TS_SDP_MSG_MID_SEND_SM, TRUE, TRUE); + return _sdp_send_ctrl_buff(conn, TS_SDP_MSG_MID_SEND_SM, 1, 1); } /* sdp_send_ctrl_send_sm */ /* ========================================================================= */ /*..sdp_send_ctrl_src_cancel -- Send a source cancel */ s32 sdp_send_ctrl_src_cancel(struct sdp_opt *conn) { - return _sdp_send_ctrl_buff(conn, TS_SDP_MSG_MID_SRC_CANCEL, TRUE, TRUE); + return _sdp_send_ctrl_buff(conn, TS_SDP_MSG_MID_SRC_CANCEL, 1, 1); } /* sdp_send_ctrl_src_cancel */ /* ========================================================================= */ /*..sdp_send_ctrl_snk_cancel -- Send a sink cancel */ s32 sdp_send_ctrl_snk_cancel(struct sdp_opt *conn) { - return _sdp_send_ctrl_buff(conn, TS_SDP_MSG_MID_SNK_CANCEL, TRUE, TRUE); + return _sdp_send_ctrl_buff(conn, TS_SDP_MSG_MID_SNK_CANCEL, 1, 1); } /* sdp_send_ctrl_snk_cancel */ /* ========================================================================= */ /*..sdp_send_ctrl_snk_cancel_ack -- Send an ack for a sink cancel */ s32 sdp_send_ctrl_snk_cancel_ack(struct sdp_opt *conn) { - return _sdp_send_ctrl_buff(conn, TS_SDP_MSG_MID_SNK_CANCEL_ACK, TRUE, - TRUE); + return _sdp_send_ctrl_buff(conn, TS_SDP_MSG_MID_SNK_CANCEL_ACK, 1, 1); } /* sdp_send_ctrl_snk_cancel_ack */ /* ========================================================================= */ @@ -1904,7 +1903,7 @@ /* * send */ - return _sdp_send_ctrl_buff(conn, TS_SDP_MSG_MID_ABORT_CONN, TRUE, TRUE); + return _sdp_send_ctrl_buff(conn, 
TS_SDP_MSG_MID_ABORT_CONN, 1, 1); } /* sdp_send_ctrl_abort */ /* ========================================================================= */ Index: src/linux-kernel/infiniband/ulp/sdp/sdp_iocb.c =================================================================== --- src/linux-kernel/infiniband/ulp/sdp/sdp_iocb.c (revision 619) +++ src/linux-kernel/infiniband/ulp/sdp/sdp_iocb.c (working copy) @@ -338,7 +338,7 @@ /* ========================================================================= */ /*.._sdp_iocb_q_get - get, and remove, the object at the tables head */ -static struct sdpc_iocb *_sdp_iocb_q_get(struct sdpc_iocb_q *table, tBOOLEAN head) +static struct sdpc_iocb *_sdp_iocb_q_get(struct sdpc_iocb_q *table, int head) { struct sdpc_iocb *iocb; struct sdpc_iocb *next; @@ -351,7 +351,7 @@ return NULL; } - if (TRUE == head) { + if (head) { iocb = table->head; } @@ -387,7 +387,7 @@ /*.._sdp_iocb_q_put - put the IOCB object at the tables tail */ int _sdp_iocb_q_put(struct sdpc_iocb_q *table, struct sdpc_iocb *iocb, - tBOOLEAN head) + int head) { struct sdpc_iocb *next; struct sdpc_iocb *prev; @@ -411,7 +411,7 @@ iocb->next = next; next->prev = iocb; - if (TRUE == head) { + if (head) { table->head = iocb; } } @@ -427,28 +427,28 @@ /*..sdp_iocb_q_get_tail - get an IOCB object from the tables tail */ struct sdpc_iocb *sdp_iocb_q_get_tail(struct sdpc_iocb_q *table) { - return _sdp_iocb_q_get(table, FALSE); + return _sdp_iocb_q_get(table, 0); } /* sdp_iocb_q_get_tail */ /* ========================================================================= */ /*..sdp_iocb_q_get_head - get an IOCB object from the tables head */ struct sdpc_iocb *sdp_iocb_q_get_head(struct sdpc_iocb_q *table) { - return _sdp_iocb_q_get(table, TRUE); + return _sdp_iocb_q_get(table, 1); } /* sdp_iocb_q_get_head */ /* ========================================================================= */ /*..sdp_iocb_q_put_tail - put the IOCB object at the tables tail */ int sdp_iocb_q_put_tail(struct sdpc_iocb_q 
*table, struct sdpc_iocb *iocb) { - return _sdp_iocb_q_put(table, iocb, FALSE); + return _sdp_iocb_q_put(table, iocb, 0); } /* sdp_iocb_q_put_tail */ /* ========================================================================= */ /*..sdp_iocb_q_put_head - put the IOCB object at the tables head */ int sdp_iocb_q_put_head(struct sdpc_iocb_q *table, struct sdpc_iocb *iocb) { - return _sdp_iocb_q_put(table, iocb, TRUE); + return _sdp_iocb_q_put(table, iocb, 1); } /* sdp_iocb_q_put_head */ /* ========================================================================= */ Index: src/linux-kernel/infiniband/ulp/sdp/sdp_buff.c =================================================================== --- src/linux-kernel/infiniband/ulp/sdp/sdp_buff.c (revision 619) +++ src/linux-kernel/infiniband/ulp/sdp/sdp_buff.c (working copy) @@ -44,7 +44,7 @@ return NULL; } - if (TRUE == fifo) { + if (fifo) { buff = pool->head; } @@ -110,7 +110,7 @@ buff->next->prev = buff; buff->prev->next = buff; - if (TRUE == fifo) { + if (fifo) { pool->head = buff; } } @@ -128,7 +128,7 @@ { TS_CHECK_NULL(pool, NULL); - if (NULL == pool->head || TRUE == fifo) { + if (NULL == pool->head || fifo) { return pool->head; } @@ -243,7 +243,7 @@ { struct sdpc_buff *buff; - buff = _sdp_buff_q_get(pool, TRUE, NULL, NULL); + buff = _sdp_buff_q_get(pool, 1, NULL, NULL); return buff; } /* sdp_buff_q_get */ @@ -254,7 +254,7 @@ { struct sdpc_buff *buff; - buff = _sdp_buff_q_get(pool, TRUE, NULL, NULL); + buff = _sdp_buff_q_get(pool, 1, NULL, NULL); return buff; } /* sdp_buff_q_get_head */ @@ -265,7 +265,7 @@ { struct sdpc_buff *buff; - buff = _sdp_buff_q_get(pool, FALSE, NULL, NULL); + buff = _sdp_buff_q_get(pool, 0, NULL, NULL); return buff; } /* sdp_buff_q_get_tail */ @@ -276,7 +276,7 @@ { struct sdpc_buff *buff; - buff = _sdp_buff_q_look(pool, TRUE); + buff = _sdp_buff_q_look(pool, 1); return buff; } /* sdp_buff_q_look_head */ @@ -287,7 +287,7 @@ { struct sdpc_buff *buff; - buff = _sdp_buff_q_look(pool, FALSE); + buff = 
_sdp_buff_q_look(pool, 0); return buff; } /* sdp_buff_q_look_tail */ @@ -300,7 +300,7 @@ { struct sdpc_buff *buff; - buff = _sdp_buff_q_get(pool, TRUE, test_func, usr_arg); + buff = _sdp_buff_q_get(pool, 1, test_func, usr_arg); return buff; } /* sdp_buff_q_fetch_head */ @@ -313,7 +313,7 @@ { struct sdpc_buff *buff; - buff = _sdp_buff_q_get(pool, FALSE, test_func, usr_arg); + buff = _sdp_buff_q_get(pool, 0, test_func, usr_arg); return buff; } /* sdp_buff_q_fetch_tail */ @@ -432,7 +432,7 @@ { int result; - result = _sdp_buff_q_put(pool, buff, TRUE); + result = _sdp_buff_q_put(pool, buff, 1); return result; } /* sdp_buff_q_put */ @@ -444,7 +444,7 @@ { int result; - result = _sdp_buff_q_put(pool, buff, TRUE); + result = _sdp_buff_q_put(pool, buff, 1); return result; } /* sdp_buff_q_put_head */ @@ -456,7 +456,7 @@ { int result; - result = _sdp_buff_q_put(pool, buff, FALSE); + result = _sdp_buff_q_put(pool, buff, 0); return result; } /* sdp_buff_q_put_tail */ @@ -470,7 +470,7 @@ TS_CHECK_NULL(pool, -EINVAL); - while (NULL != (buff = _sdp_buff_q_get(pool, FALSE, NULL, NULL))) { + while (NULL != (buff = _sdp_buff_q_get(pool, 0, NULL, NULL))) { result = sdp_buff_pool_put(buff); if (0 > result) { Index: src/linux-kernel/infiniband/ulp/sdp/sdp_queue.c =================================================================== --- src/linux-kernel/infiniband/ulp/sdp/sdp_queue.c (revision 619) +++ src/linux-kernel/infiniband/ulp/sdp/sdp_queue.c (working copy) @@ -33,7 +33,7 @@ /* ========================================================================= */ /*.._sdp_desc_q_get - Get an element from a specific table */ static struct sdpc_desc *_sdp_desc_q_get(struct sdpc_desc_q *table, - tBOOLEAN fifo) + int fifo) { struct sdpc_desc *element; @@ -44,7 +44,7 @@ return NULL; } - if (TRUE == fifo) { + if (fifo) { element = table->head; } @@ -80,7 +80,7 @@ /*.._sdp_desc_q_put - Place an element into a specific table */ static __inline__ int _sdp_desc_q_put(struct sdpc_desc_q *table, struct 
sdpc_desc *element, - tBOOLEAN fifo) + int fifo) { /* * fifo: false == tail, true == head @@ -107,7 +107,7 @@ element->next->prev = element; element->prev->next = element; - if (TRUE == fifo) { + if (fifo) { table->head = element; } } @@ -212,14 +212,14 @@ /*..sdp_desc_q_get_head - Get the element at the front of the table */ struct sdpc_desc *sdp_desc_q_get_head(struct sdpc_desc_q *table) { - return _sdp_desc_q_get(table, TRUE); + return _sdp_desc_q_get(table, 1); } /* sdp_desc_q_get_head */ /* ========================================================================= */ /*..sdp_desc_q_get_tail - Get the element at the end of the table */ struct sdpc_desc *sdp_desc_q_get_tail(struct sdpc_desc_q *table) { - return _sdp_desc_q_get(table, FALSE); + return _sdp_desc_q_get(table, 0); } /* sdp_desc_q_get_tail */ /* ========================================================================= */ @@ -227,7 +227,7 @@ int sdp_desc_q_put_head(struct sdpc_desc_q *table, struct sdpc_desc *element) { - return _sdp_desc_q_put(table, element, TRUE); + return _sdp_desc_q_put(table, element, 1); } /* sdp_desc_q_put_head */ /* ========================================================================= */ @@ -235,7 +235,7 @@ int sdp_desc_q_put_tail(struct sdpc_desc_q *table, struct sdpc_desc *element) { - return _sdp_desc_q_put(table, element, FALSE); + return _sdp_desc_q_put(table, element, 0); } /* sdp_desc_q_put_tail */ /* ========================================================================= */ Index: src/linux-kernel/infiniband/include/ib_legacy_types.h =================================================================== --- src/linux-kernel/infiniband/include/ib_legacy_types.h (revision 576) +++ src/linux-kernel/infiniband/include/ib_legacy_types.h (working copy) @@ -1,58 +0,0 @@ -/* - This software is available to you under a choice of one of two - licenses. 
You may choose to be licensed under the terms of the GNU - General Public License (GPL) Version 2, available at - , or the OpenIB.org BSD - license, available in the LICENSE.TXT file accompanying this - software. These details are also available at - . - - THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - SOFTWARE. - - Copyright (c) 2004 Topspin Communications. All rights reserved. - - $Id$ -*/ - -#ifndef _IB_LEGACY_TYPES_H -#define _IB_LEGACY_TYPES_H - -/* - * #define section - */ - -#ifndef TRUE -#define TRUE 1 -#endif - -#ifndef FALSE -#define FALSE 0 -#endif - -/* - * Common types used by all proprietary TopSpin code (native C types - * should not be used). - */ -typedef int tBOOLEAN; - -/* - * Generic type for returning pass/fail information back from subroutines - * Note that this is the *opposite* semantics from BOOLEAN. I.e. a zero - * (False) indicates success. This is consistent with the VxWorks stds. - */ -typedef enum -{ - TS_FAIL = -1, - TS_SUCCESS = 0 /* must be consistant with "OK" defined in */ - /* rl_rlstddef.h - RAPIDLOGIC */ - -} tSTATUS; - -#endif /* _IB_LEGACY_TYPES_H */ From mst at mellanox.co.il Mon Aug 9 23:20:44 2004 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Tue, 10 Aug 2004 09:20:44 +0300 Subject: [openib-general] ib_req_ncomp_notif in core_ layer In-Reply-To: <52llgth9wr.fsf@topspin.com> References: <20040804084138.GA29136@mellanox.co.il> <20040804230200.GB548@cup.hp.com> <20040805125242.GH2640@mellanox.co.il> <20040805073226.497bb020.mshefty@ichips.intel.com> <52llgth9wr.fsf@topspin.com> Message-ID: <20040810062044.GA6645@mellanox.co.il> Hello! Quoting r. Roland Dreier (roland at topspin.com) "Re: [openib-general] ib_req_ncomp_notif in core_ layer": > Sean> Since it's an optional > Sean> call though, it shouldn't assume that the call exists, and > Sean> should check first. I'd like to point out that any device can always implement req_ncomp_notif by means of req_comp_notif and cq peek, or possibly even just alias req_ncomp_notif to req_comp_notif, since the user will always be prepared to get a spurious event, right? So I would expect the core code to provide a software implementation using one of these two approaches if the hardware driver sets the pointer to 0 because the device hardware does not know how to handle such an event. This way the user would not need to worry about whether the hardware can handle this feature, and that's the point of an abstraction layer in my opinion - to abstract the hardware. MST From mst at mellanox.co.il Mon Aug 9 23:31:16 2004 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 10 Aug 2004 09:31:16 +0300 Subject: [openib-general] [PATCH] kill ib_legacy.h In-Reply-To: <52acx38vte.fsf_-_@topspin.com> References: <1092088324.14886.20.camel@duffman> <52isbr8yyu.fsf@topspin.com> <52acx38vte.fsf_-_@topspin.com> Message-ID: <20040810063116.GB6645@mellanox.co.il> Hello! Quoting r. Roland Dreier (roland at topspin.com) "[openib-general] [PATCH] kill ib_legacy.h": > The coup de grace...
> > Index: src/linux-kernel/infiniband/ulp/dapl/khash.c > =================================================================== > --- src/linux-kernel/infiniband/ulp/dapl/khash.c (revision 619) > +++ src/linux-kernel/infiniband/ulp/dapl/khash.c (working copy) > @@ -26,7 +26,6 @@ > #include > #include > > #include "ts_kernel_trace.h" Incidentally, do you think ts_kernel_trace is still a good idea? We have printk with priorities ... So far I have only seen people either enabling all traces at both compile time and run time, or just disabling all traces at compile time. MST From tziporet at mellanox.co.il Tue Aug 10 00:06:51 2004 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 10 Aug 2004 10:06:51 +0300 Subject: [openib-general] OpenSM on mthca Message-ID: <506C3D7B14CDD411A52C00025558DED603CFD24D@mtlex01.yok.mtl.com> Hi, The osm_vendor_mlx_ts_anafa.c is very close to what you need; it just contains some hard-coded values (e.g. one port) that will not suit Tavor. But you can easily take it and change it in a way that eliminates the need for any VAPI calls in OpenSM. Of course you need to close the API for user space, but this should be done in any case. Tziporet > -----Original Message----- > From: Roland Dreier [mailto:roland at topspin.com] > Sent: Saturday, August 07, 2004 3:03 AM > To: Tom Duffy > Cc: openib-general at openib.org > Subject: Re: [openib-general] OpenSM on mthca > > > Tom> I have only just begun to look into it. What is it going to > Tom> take to do this port? Is it possible with the current mthca > Tom> driver? > > I think all the VAPI calls from osm_vendor_XXX.[ch] need to be > removed. It's not clear to me how close osm_vendor_mlx_ts_anafa.c is > to what we would need. > > It's definitely possible on top of the current mthca driver, since > sending and receiving MADs is all that is required.
> > In any case all this MAD/GSI discussion is going to lead to a change > in the userspace interface (not to mention the requirement of being > 32/64 clean) so I wouldn't want to spend too much effort getting > opensm working on the current tree. > > - Roland > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > -------------- next part -------------- An HTML attachment was scrubbed... URL: From roland at topspin.com Tue Aug 10 06:51:26 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 10 Aug 2004 06:51:26 -0700 Subject: [openib-general] [PATCH] kill ib_legacy.h In-Reply-To: <20040810063116.GB6645@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 10 Aug 2004 09:31:16 +0300") References: <1092088324.14886.20.camel@duffman> <52isbr8yyu.fsf@topspin.com> <52acx38vte.fsf_-_@topspin.com> <20040810063116.GB6645@mellanox.co.il> Message-ID: <52vffr6qzl.fsf@topspin.com> Michael> Incidentally, do you think ts_kernel_trace is still a good idea? No, it needs to be removed as well. - Roland From mshefty at ichips.intel.com Tue Aug 10 07:57:51 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 10 Aug 2004 07:57:51 -0700 Subject: [openib-general] ib_req_ncomp_notif in core_ layer In-Reply-To: <20040810062044.GA6645@mellanox.co.il> References: <20040804084138.GA29136@mellanox.co.il> <20040804230200.GB548@cup.hp.com> <20040805125242.GH2640@mellanox.co.il> <20040805073226.497bb020.mshefty@ichips.intel.com> <52llgth9wr.fsf@topspin.com> <20040810062044.GA6645@mellanox.co.il> Message-ID: <20040810075751.07f19460.mshefty@ichips.intel.com> On Tue, 10 Aug 2004 09:20:44 +0300 "Michael S. 
Tsirkin" wrote: > I'd like to point out that any device can always implement > req_ncomp_notif by means of req_comp_notif and cq peek > or possibly even just alias req_ncomp_notif to req_comp_notif > since the user will always be prepared to get a spurious > event, right? The problem with abstracting this call is that the performance of the abstraction isn't what the client might expect. So, the abstraction does not behave as described. From halr at voltaire.com Tue Aug 10 09:41:47 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 10 Aug 2004 12:41:47 -0400 Subject: [openib-general] Some ib_mad.h Redirection Comments In-Reply-To: <20040809183614.487654d2.mshefty@ichips.intel.com> References: <35EA21F54A45CB47B879F21A91F4862F18AC7F@taurus.voltaire.com> <20040809183614.487654d2.mshefty@ichips.intel.com> Message-ID: <1092156109.1804.1.camel@localhost.localdomain> On Mon, 2004-08-09 at 21:36, Sean Hefty wrote: > I've updated ib_mad.h (again). This time additional comments were added, > and notes were made where we need to continue or start having discussions. > The only major change to the API was modifying ib_mad_recv_wc to use a single receive buffer, > versus a chain. Is it intended that this structure include a pointer to the MAD (struct ib_mad *mad) rather than a coalesced buffer and a (total) length? I presume it is a requirement (of the implementation) that out of order RMPP segments are reordered before presentation across this interface.
Likewise, if it is exposed. We probably only need to > modify the API if we need to reach some sort of compromise between the implementation > versus the policy. At this point, in the interest of moving on, we ought to defer redirect. We can live without it in the short term. > > Modified file is below: > > struct ib_rmpp_hdr { > u8 rmpp_version; > u8 rmpp_type; > u8 rmpp_flags; As this field also contains RespTime, maybe its name should be something more like rmpp_rtime_flags. > u8 rmpp_status; > u32 seg_num; > u32 paylen_newwin; > } __attribute__ ((packed)); -- Hal From mshefty at ichips.intel.com Tue Aug 10 09:49:12 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 10 Aug 2004 09:49:12 -0700 Subject: [openib-general] Some ib_mad.h Redirection Comments In-Reply-To: <1092156109.1804.1.camel@localhost.localdomain> References: <35EA21F54A45CB47B879F21A91F4862F18AC7F@taurus.voltaire.com> <20040809183614.487654d2.mshefty@ichips.intel.com> <1092156109.1804.1.camel@localhost.localdomain> Message-ID: <20040810094912.55d29965.mshefty@ichips.intel.com> On Tue, 10 Aug 2004 12:41:47 -0400 Hal Rosenstock wrote: Thanks for the feedback! > Is it intended that this structure include a pointer to the MAD (struct > ib_mad *mad) rather than a coalesced buffer and a (total) length ? My intention is that the structure should reference the start of the MAD (coalesced or not). The length is the total length of the buffer referenced by *mad. In most cases, this would be 256 bytes. For RMPP MADs, it could be larger. We can discuss ownership of the ib_mad_recv_wc structure. For now, I'm assuming that it would be owned by the access layer. The buffers belong to the user. Note that I separated the grh to allow using a single grh data buffer for all receives on redirected QPs. > I presume it is a requirement (of the implementation) that out of order > RMPP segments are reordered before presentation across this interface. Correct.
One issue with how the structure is defined is that it currently requires copying the RMPP segments into a single buffer before handing them to the user. We need to revisit how to avoid this data copy, but still have an interface that's usable for special QPs as well as redirected QPs. > > (I haven't given up on zero-copy receives; the proposed chaining just needs some additional thought.) > > Hopefully, we can begin working towards this API, but continue discussing some of the areas marked > > in the file with 'XXX'. > Maybe these should be in a GSI TODO rather than in this file. I will move these into a TODO file. > > I need to think more about the redirection case, and examine an implementation to see > > how much work/policy is in it. I think that if it is totally transparent to the requestor, > > then the API will be unaffected. Likewise, if it is exposed. We probably only need to > > modify the API if we need to reach some sort of compromise between the implementation > > versus the policy. > At this point, in the interest of moving on, we ought to defer redirect. > We can live without it in the short term. Sounds good to me. > > struct ib_rmpp_hdr { > > u8 rmpp_version; > > u8 rmpp_type; > > u8 rmpp_flags; > > As this field also contains RespTime, maybe it's name should be something > more like rmpp_rtime_flags. Good point. From roland at topspin.com Tue Aug 10 10:56:27 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 10 Aug 2004 10:56:27 -0700 Subject: [openib-general] [PATCH] Convert multicast functions to new API Message-ID: <528ycm7u7o.fsf@topspin.com> This patch converts the multicast functions to the new API. There's still more work to do to get rid of all uses of tTS_IB_GID (in favor of union ib_gid). 
- Roland Index: src/linux-kernel/infiniband/ulp/ipoib/ipoib_verbs.c =================================================================== --- src/linux-kernel/infiniband/ulp/ipoib/ipoib_verbs.c (revision 576) +++ src/linux-kernel/infiniband/ulp/ipoib/ipoib_verbs.c (working copy) @@ -38,7 +38,7 @@ set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); } -int ipoib_mcast_attach(struct net_device *dev, u16 mlid, tTS_IB_GID mgid) +int ipoib_mcast_attach(struct net_device *dev, u16 mlid, union ib_gid *mgid) { struct ipoib_dev_priv *priv = dev->priv; struct ib_qp_attribute *qp_attr; @@ -68,7 +68,7 @@ /* attach QP to multicast group */ down(&priv->mcast_mutex); - ret = ib_multicast_attach(mlid, mgid, priv->qp); + ret = ib_attach_mcast(priv->qp, mgid, mlid); up(&priv->mcast_mutex); if (ret) TS_REPORT_FATAL(MOD_IB_NET, @@ -80,17 +80,17 @@ return ret; } -int ipoib_mcast_detach(struct net_device *dev, u16 mlid, tTS_IB_GID mgid) +int ipoib_mcast_detach(struct net_device *dev, u16 mlid, union ib_gid *mgid) { struct ipoib_dev_priv *priv = dev->priv; int ret; down(&priv->mcast_mutex); - ret = ib_multicast_detach(mlid, mgid, priv->qp); + ret = ib_detach_mcast(priv->qp, mgid, mlid); up(&priv->mcast_mutex); if (ret) TS_REPORT_WARN(MOD_IB_NET, - "%s: ib_multicast_detach failed (result = %d)", + "%s: ib_detach_mcast failed (result = %d)", dev->name, ret); return ret; Index: src/linux-kernel/infiniband/ulp/ipoib/ipoib_arp.c =================================================================== --- src/linux-kernel/infiniband/ulp/ipoib/ipoib_arp.c (revision 607) +++ src/linux-kernel/infiniband/ulp/ipoib/ipoib_arp.c (working copy) @@ -46,7 +46,7 @@ uint8_t hash[IPOIB_ADDRESS_HASH_BYTES]; - tTS_IB_GID gid; + union ib_gid gid; u32 qpn; u16 lid; tTS_IB_SL sl; @@ -87,7 +87,7 @@ /* =============================================================== */ /*.._ipoib_sarp_hash -- hash GID/QPN to 6 bytes */ -static void _ipoib_sarp_hash(tTS_IB_GID gid, u32 qpn, uint8_t *hash) +static void _ipoib_sarp_hash(union ib_gid 
*gid, u32 qpn, uint8_t *hash) { /* We use the FNV hash (http://www.isthe.com/chongo/tech/comp/fnv/) */ #define TS_FNV_64_PRIME 0x100000001b3ULL @@ -99,11 +99,11 @@ /* make qpn big-endian so we know where digits are */ qpn = cpu_to_be32(qpn); - for (i = 0; i < sizeof(tTS_IB_GID) + 3; ++i) { + for (i = 0; i < sizeof (union ib_gid) + 3; ++i) { h *= TS_FNV_64_PRIME; h ^= (i < sizeof(tTS_IB_GID) - ? gid[i] - : ((uint8_t *)&qpn)[i - sizeof(tTS_IB_GID) + 1]); + ? gid->raw[i] + : ((uint8_t *)&qpn)[i - sizeof (union ib_gid) + 1]); } /* xor fold down to 6 bytes and make big-endian */ @@ -291,7 +291,7 @@ /* =============================================================== */ /*..ipoib_sarp_iter_read -- get data pointed to by ARP iterator */ void ipoib_sarp_iter_read(struct ipoib_sarp_iter *iter, uint8_t *hash, - tTS_IB_GID gid, u32 *qpn, + union ib_gid *gid, u32 *qpn, unsigned long *created, unsigned long *last_verify, unsigned int *queuelen, unsigned int *complete) { @@ -300,7 +300,7 @@ entry = list_entry(iter->cur, struct ipoib_sarp, cache_list); memcpy(hash, entry->hash, IPOIB_ADDRESS_HASH_BYTES); - memcpy(gid, entry->gid, sizeof(tTS_IB_GID)); + *gid = entry->gid; *qpn = entry->qpn; *created = entry->created; *last_verify = entry->last_verify; @@ -310,7 +310,7 @@ /* =============================================================== */ /*..ipoib_sarp_add -- add ARP entry */ -struct ipoib_sarp *ipoib_sarp_add(struct net_device *dev, tTS_IB_GID gid, +struct ipoib_sarp *ipoib_sarp_add(struct net_device *dev, union ib_gid *gid, u32 qpn) { struct ipoib_dev_priv *priv = dev->priv; @@ -323,7 +323,7 @@ entry = _ipoib_sarp_find(dev, hash); if (entry) { if (entry->qpn != qpn - || memcmp(entry->gid, gid, sizeof(tTS_IB_GID))) { + || memcmp(entry->gid.raw, gid->raw, sizeof (union ib_gid))) { TS_REPORT_WARN(MOD_IB_NET, "%s: hash collision", dev->name); ipoib_sarp_put(entry); /* for _find() */ @@ -340,8 +340,7 @@ } memcpy(entry->hash, hash, sizeof(entry->hash)); - memcpy(entry->gid, gid, 
sizeof(tTS_IB_GID)); - + entry->gid = *gid; entry->qpn = qpn; entry->require_verify = 1; @@ -356,7 +355,7 @@ /* =============================================================== */ /*..ipoib_sarp_local_add -- add ARP hash for local node */ struct ipoib_sarp *ipoib_sarp_local_add(struct net_device *dev, - tTS_IB_GID gid, u32 qpn) + union ib_gid *gid, u32 qpn) { _ipoib_sarp_hash(gid, qpn, dev->dev_addr); return ipoib_sarp_add(dev, gid, qpn); @@ -478,8 +477,10 @@ tTS_IB_CLIENT_QUERY_TID tid; ipoib_sarp_get(entry); - if (tsIbPathRecordRequest(priv->ca, priv->port, priv->local_gid, - entry->gid, priv->pkey, 0, HZ, 3600 * HZ, /* XXX cache jiffies */ + if (tsIbPathRecordRequest(priv->ca, priv->port, + priv->local_gid.raw, + entry->gid.raw, + priv->pkey, 0, HZ, 3600 * HZ, /* XXX cache jiffies */ _ipoib_sarp_path_record_completion, entry, &tid)) { TS_REPORT_WARN(MOD_IB_NET, @@ -626,8 +627,8 @@ /* rewrite IPoIB hw address to hashes */ if (be32_to_cpu(*(uint32_t *)payload->src_hw_addr) & 0xffffff) { - _ipoib_sarp_hash(payload->src_hw_addr + 4, - be32_to_cpu(*(uint32_t *)payload->src_hw_addr) & 0xffffff, hash); + _ipoib_sarp_hash((union ib_gid *) (payload->src_hw_addr + 4), + be32_to_cpu(*(uint32_t *)payload->src_hw_addr) & 0xffffff, hash); /* add shadow ARP entries if necessary */ if (ARPOP_REPLY == ntohs(arp->ar_op)) { @@ -676,7 +677,8 @@ /* Small optimization, if we already found it once, don't search again */ if (!entry) - entry = ipoib_sarp_add(dev, payload->src_hw_addr + 4, + entry = ipoib_sarp_add(dev, + (union ib_gid *) (payload->src_hw_addr + 4), be32_to_cpu(*(uint32_t *) payload->src_hw_addr) & 0xffffff); @@ -696,10 +698,11 @@ memcpy(header->h_source, hash, sizeof(header->h_source)); if (be32_to_cpu(*(uint32_t *)payload->dst_hw_addr) & 0xffffff) { - _ipoib_sarp_hash(payload->dst_hw_addr + 4, + _ipoib_sarp_hash((union ib_gid *) (payload->dst_hw_addr + 4), be32_to_cpu(*(uint32_t *)payload->dst_hw_addr) & 0xffffff, hash); - entry = ipoib_sarp_add(dev, payload->dst_hw_addr 
+ 4, + entry = ipoib_sarp_add(dev, + (union ib_gid *) (payload->dst_hw_addr + 4), be32_to_cpu(*(uint32_t *)payload->dst_hw_addr) & 0xffffff); if (entry) @@ -753,7 +756,7 @@ if (memcmp(broadcast_mac_addr, skb->data, ETH_ALEN) == 0) { /* Broadcast gets handled differently */ - ret = ipoib_mcast_lookup(dev, priv->bcast_gid, &dmcast); + ret = ipoib_mcast_lookup(dev, &priv->bcast_gid, &dmcast); /* mcast is only valid if we get a return code of 0 or -EAGAIN */ switch (ret) { @@ -826,7 +829,7 @@ ipoib_sarp_delete(dev, dentry->hash); entry = ipoib_sarp_add(dev, - dentry->gid, + &dentry->gid, dentry->qpn); if (NULL == entry) { TS_TRACE(MOD_IB_NET, @@ -918,8 +921,8 @@ } else { *((uint32_t *)new_payload->src_hw_addr) = cpu_to_be32(entry->qpn); - memcpy(&new_payload->src_hw_addr[4], entry->gid, - sizeof(tTS_IB_GID)); + memcpy(&new_payload->src_hw_addr[4], entry->gid.raw, + sizeof (union ib_gid)); ipoib_sarp_put(entry); /* for _find() */ } @@ -927,8 +930,8 @@ ETH_ALEN) == 0) { *((uint32_t *)new_payload->dst_hw_addr) = cpu_to_be32(IB_MULTICAST_QPN); - memcpy(&new_payload->dst_hw_addr[4], priv->bcast_gid, - sizeof(tTS_IB_GID)); + memcpy(&new_payload->dst_hw_addr[4], priv->bcast_gid.raw, + sizeof (union ib_gid)); } else { entry = _ipoib_sarp_find(dev, payload + IPOIB_ADDRESS_HASH_BYTES + 4); @@ -937,8 +940,8 @@ else { *((uint32_t *)new_payload->dst_hw_addr) = cpu_to_be32(entry->qpn); - memcpy(&new_payload->dst_hw_addr[4], entry->gid, - sizeof(tTS_IB_GID)); + memcpy(&new_payload->dst_hw_addr[4], entry->gid.raw, + sizeof (union ib_gid)); ipoib_sarp_put(entry); /* for _find() */ } } @@ -1009,8 +1012,7 @@ if (nentry) { memcpy(nentry->hash, entry->hash, sizeof(nentry->hash)); - memcpy(nentry->gid, entry->gid, - sizeof(tTS_IB_GID)); + nentry->gid = entry->gid; nentry->require_verify = entry->require_verify; nentry->qpn = entry->qpn; @@ -1087,7 +1089,7 @@ if (!entry) return -EINVAL; - memcpy(gid, entry->gid, sizeof(tTS_IB_GID)); + memcpy(gid, entry->gid.raw, sizeof (union ib_gid)); 
ipoib_sarp_put(entry); /* for _find() */ Index: src/linux-kernel/infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- src/linux-kernel/infiniband/ulp/ipoib/ipoib_main.c (revision 576) +++ src/linux-kernel/infiniband/ulp/ipoib/ipoib_main.c (working copy) @@ -45,8 +45,6 @@ DECLARE_MUTEX(ipoib_device_mutex); LIST_HEAD(ipoib_device_list); -extern tTS_IB_GID broadcast_mgid; - static const uint8_t broadcast_mac_addr[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }; @@ -121,7 +119,7 @@ *ca = priv->ca; *port = priv->port; - memcpy(gid, priv->local_gid, sizeof(tTS_IB_GID)); + memcpy(gid, priv->local_gid.raw, sizeof (union ib_gid)); *pkey = priv->pkey; return 0; @@ -271,25 +269,24 @@ && (skb->data[3] & 0x80) == 0x00) { /* Multicast MAC addr */ struct ipoib_mcast *mcast = NULL; - tTS_IB_GID mgid; + union ib_gid mgid; struct iphdr *iph = (struct iphdr *)(skb->data + ETH_HLEN); u32 multiaddr = ntohl(iph->daddr); - memcpy(mgid, ipoib_broadcast_mgid, - sizeof(tTS_IB_GID)); + mgid = ipoib_broadcast_mgid; /* Add in the P_Key */ - mgid[4] = (priv->pkey >> 8) & 0xff; - mgid[5] = priv->pkey & 0xff; + mgid.raw[4] = (priv->pkey >> 8) & 0xff; + mgid.raw[5] = priv->pkey & 0xff; /* Fixup the group mapping */ - mgid[12] = (multiaddr >> 24) & 0x0f; - mgid[13] = (multiaddr >> 16) & 0xff; - mgid[14] = (multiaddr >> 8) & 0xff; - mgid[15] = multiaddr & 0xff; + mgid.raw[12] = (multiaddr >> 24) & 0x0f; + mgid.raw[13] = (multiaddr >> 16) & 0xff; + mgid.raw[14] = (multiaddr >> 8) & 0xff; + mgid.raw[15] = multiaddr & 0xff; - ret = ipoib_mcast_lookup(dev, mgid, &mcast); + ret = ipoib_mcast_lookup(dev, &mgid, &mcast); switch (ret) { case 0: return ipoib_mcast_send(dev, mcast, skb); @@ -302,7 +299,7 @@ if (memcmp(broadcast_mac_addr, skb->data, ETH_ALEN) == 0) { struct ipoib_mcast *mcast = NULL; - ret = ipoib_mcast_lookup(dev, priv->bcast_gid, &mcast); + ret = ipoib_mcast_lookup(dev, &priv->bcast_gid, &mcast); switch (ret) { case 0: return 
ipoib_mcast_send(dev, mcast, skb); Index: src/linux-kernel/infiniband/ulp/ipoib/ipoib.h =================================================================== --- src/linux-kernel/infiniband/ulp/ipoib/ipoib.h (revision 607) +++ src/linux-kernel/infiniband/ulp/ipoib/ipoib.h (working copy) @@ -130,11 +130,11 @@ u16 pkey; tTS_KERNEL_THREAD pkey_thread; - tTS_IB_GID local_gid; + union ib_gid local_gid; u16 local_lid; u32 local_qpn; - tTS_IB_GID bcast_gid; + union ib_gid bcast_gid; unsigned int admin_mtu; unsigned int mcast_mtu; @@ -161,7 +161,7 @@ extern struct semaphore ipoib_device_mutex; extern struct list_head ipoib_device_list; -extern tTS_IB_GID ipoib_broadcast_mgid; +extern union ib_gid ipoib_broadcast_mgid; /* functions */ @@ -187,9 +187,9 @@ void ipoib_sarp_get(struct ipoib_sarp *entry); void ipoib_sarp_put(struct ipoib_sarp *entry); -struct ipoib_sarp *ipoib_sarp_add(struct net_device *dev, tTS_IB_GID gid, +struct ipoib_sarp *ipoib_sarp_add(struct net_device *dev, union ib_gid *gid, u32 qpn); -struct ipoib_sarp *ipoib_sarp_local_add(struct net_device *dev, tTS_IB_GID gid, +struct ipoib_sarp *ipoib_sarp_local_add(struct net_device *dev, union ib_gid *gid, u32 qpn); int ipoib_sarp_delete(struct net_device *dev, const uint8_t *hash); int ipoib_sarp_lookup(struct net_device *dev, uint8_t *hash, @@ -207,7 +207,7 @@ void ipoib_sarp_iter_free(struct ipoib_sarp_iter *iter); int ipoib_sarp_iter_next(struct ipoib_sarp_iter *iter); void ipoib_sarp_iter_read(struct ipoib_sarp_iter *iter, uint8_t *hash, - tTS_IB_GID gid, u32 *qpn, + union ib_gid *gid, u32 *qpn, unsigned long *created, unsigned long *last_verify, unsigned int *queuelen, unsigned int *complete); @@ -217,7 +217,7 @@ void ipoib_mcast_get(struct ipoib_mcast *mcast); void ipoib_mcast_put(struct ipoib_mcast *mcast); -int ipoib_mcast_lookup(struct net_device *dev, tTS_IB_GID mgid, +int ipoib_mcast_lookup(struct net_device *dev, union ib_gid *mgid, struct ipoib_mcast **mcast); int ipoib_mcast_queue_packet(struct 
ipoib_mcast *mcast, struct sk_buff *skb); int ipoib_mcast_send(struct net_device *dev, struct ipoib_mcast *mcast, @@ -234,16 +234,16 @@ void ipoib_mcast_iter_free(struct ipoib_mcast_iter *iter); int ipoib_mcast_iter_next(struct ipoib_mcast_iter *iter); void ipoib_mcast_iter_read(struct ipoib_mcast_iter *iter, - tTS_IB_GID gid, + union ib_gid *gid, unsigned long *created, unsigned int *queuelen, unsigned int *complete, unsigned int *send_only); int ipoib_mcast_attach(struct net_device *dev, u16 mlid, - tTS_IB_GID mgid); + union ib_gid *mgid); int ipoib_mcast_detach(struct net_device *dev, u16 mlid, - tTS_IB_GID mgid); + union ib_gid *mgid); int ipoib_qp_create(struct net_device *dev); void ipoib_qp_destroy(struct net_device *dev); @@ -271,9 +271,13 @@ #define IPOIB_GID_FMT "%02x%02x%02x%02x%02x%02x%02x%02x" \ "%02x%02x%02x%02x%02x%02x%02x%02x" -#define IPOIB_GID_ARG(gid) gid[ 0], gid[ 1], gid[ 2], gid[ 3], \ - gid[ 4], gid[ 5], gid[ 6], gid[ 7], \ - gid[ 8], gid[ 9], gid[10], gid[11], \ - gid[12], gid[13], gid[14], gid[15] +#define IPOIB_GID_ARG(gid) (gid).raw[ 0], (gid).raw[ 1], \ + (gid).raw[ 2], (gid).raw[ 3], \ + (gid).raw[ 4], (gid).raw[ 5], \ + (gid).raw[ 6], (gid).raw[ 7], \ + (gid).raw[ 8], (gid).raw[ 9], \ + (gid).raw[10], (gid).raw[11], \ + (gid).raw[12], (gid).raw[13], \ + (gid).raw[14], (gid).raw[15] #endif /* _IPOIB_H */ Index: src/linux-kernel/infiniband/ulp/ipoib/ipoib_ib.c =================================================================== --- src/linux-kernel/infiniband/ulp/ipoib/ipoib_ib.c (revision 607) +++ src/linux-kernel/infiniband/ulp/ipoib/ipoib_ib.c (working copy) @@ -29,11 +29,6 @@ #include -tTS_IB_GID broadcast_mgid = { - 0xff, 0x12, 0x40, 0x1b, 0x00, 0x00, 0x00, 0x00, - 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff -}; - static int _ipoib_ib_receive(struct ipoib_dev_priv *priv, u64 work_request_id, dma_addr_t addr) Index: src/linux-kernel/infiniband/ulp/ipoib/ipoib_proc.c 
=================================================================== --- src/linux-kernel/infiniband/ulp/ipoib/ipoib_proc.c (revision 576) +++ src/linux-kernel/infiniband/ulp/ipoib/ipoib_proc.c (working copy) @@ -94,14 +94,14 @@ struct ipoib_sarp_iter *iter = iter_ptr; uint8_t hash[IPOIB_ADDRESS_HASH_BYTES]; char gid_buf[sizeof("ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff")]; - tTS_IB_GID gid; + union ib_gid gid; u32 qpn; int i, n; unsigned long created, last_verify; unsigned int queuelen, complete; if (iter) { - ipoib_sarp_iter_read(iter, hash, gid, &qpn, &created, + ipoib_sarp_iter_read(iter, hash, &gid, &qpn, &created, &last_verify, &queuelen, &complete); for (i = 0; i < IPOIB_ADDRESS_HASH_BYTES; ++i) { @@ -112,10 +112,10 @@ seq_printf(file, " "); } - for (n = 0, i = 0; i < sizeof(tTS_IB_GID) / 2; ++i) { + for (n = 0, i = 0; i < sizeof gid / 2; ++i) { n += sprintf(gid_buf + n, "%x", - be16_to_cpu(((uint16_t *)gid)[i])); - if (i < sizeof(tTS_IB_GID) / 2 - 1) + be16_to_cpu(((u16 *)gid.raw)[i])); + if (i < sizeof gid / 2 - 1) gid_buf[n++] = ':'; } } @@ -162,7 +162,7 @@ /* =============================================================== */ /*.._ipoib_ascii_to_gid -- read GID from string */ -static int _ipoib_ascii_to_gid(const char *src, tTS_IB_GID dst) +static int _ipoib_ascii_to_gid(const char *src, union ib_gid *dst) { static const char xdigits[] = "0123456789abcdef"; unsigned char *tp, *endp, *colonp; @@ -170,8 +170,8 @@ int ch, saw_xdigit; unsigned int val; - memset((tp = dst), 0, sizeof(tTS_IB_GID)); - endp = tp + sizeof(tTS_IB_GID); + memset((tp = (char *) dst), 0, sizeof (union ib_gid)); + endp = tp + sizeof (union ib_gid); colonp = NULL; /* Leading :: requires some special handling. 
*/ @@ -250,7 +250,7 @@ struct ipoib_sarp *entry; char kernel_buf[256]; char gid_buf[sizeof("ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff")]; - tTS_IB_GID gid; + union ib_gid gid; u32 qpn; count = min(count, sizeof(kernel_buf)); @@ -263,13 +263,13 @@ if (sscanf(kernel_buf, "%39s %i", gid_buf, &qpn) != 2) return -EINVAL; - if (!_ipoib_ascii_to_gid(gid_buf, gid)) + if (!_ipoib_ascii_to_gid(gid_buf, &gid)) return -EINVAL; if (qpn > 0xffffff) return -EINVAL; - entry = ipoib_sarp_add(proc_arp_device, gid, qpn); + entry = ipoib_sarp_add(proc_arp_device, &gid, qpn); if (entry) ipoib_sarp_put(entry); @@ -355,19 +355,19 @@ { struct ipoib_mcast_iter *iter = iter_ptr; char gid_buf[sizeof("ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff")]; - tTS_IB_GID mgid; + union ib_gid mgid; int i, n; unsigned long created; unsigned int queuelen, complete, send_only; if (iter) { - ipoib_mcast_iter_read(iter, mgid, &created, &queuelen, + ipoib_mcast_iter_read(iter, &mgid, &created, &queuelen, &complete, &send_only); - for (n = 0, i = 0; i < sizeof(tTS_IB_GID) / 2; ++i) { + for (n = 0, i = 0; i < sizeof mgid / 2; ++i) { n += sprintf(gid_buf + n, "%x", - be16_to_cpu(((uint16_t *)mgid)[i])); - if (i < sizeof(tTS_IB_GID) / 2 - 1) + be16_to_cpu(((u16 *)mgid.raw)[i])); + if (i < sizeof mgid / 2 - 1) gid_buf[n++] = ':'; } } Index: src/linux-kernel/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- src/linux-kernel/infiniband/ulp/ipoib/ipoib_multicast.c (revision 607) +++ src/linux-kernel/infiniband/ulp/ipoib/ipoib_multicast.c (working copy) @@ -46,7 +46,7 @@ struct ib_ah *address_handle; tTS_IB_CLIENT_QUERY_TID tid; - tTS_IB_GID mgid; + union ib_gid mgid; unsigned long flags; unsigned char logcount; @@ -61,9 +61,9 @@ struct rb_node *rb_node; }; -tTS_IB_GID ipoib_broadcast_mgid = { - 0xff, 0x12, 0x40, 0x1b, 0x00, 0x00, 0x00, 0x00, - 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff +union ib_gid ipoib_broadcast_mgid = { + .raw = { 0xff, 0x12, 0x40, 0x1b, 
0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff } }; /* =============================================================== */ @@ -135,7 +135,7 @@ /* =============================================================== */ /*..__ipoib_mcast_find - find multicast group */ -struct ipoib_mcast *__ipoib_mcast_find(struct net_device *dev, tTS_IB_GID mgid) +struct ipoib_mcast *__ipoib_mcast_find(struct net_device *dev, union ib_gid *mgid) { struct ipoib_dev_priv *priv = dev->priv; struct rb_node *n = priv->multicast_tree.rb_node; @@ -146,7 +146,7 @@ mcast = rb_entry(n, struct ipoib_mcast, rb_node); - ret = memcmp(mgid, mcast->mgid, sizeof(tTS_IB_GID)); + ret = memcmp(mgid->raw, mcast->mgid.raw, sizeof(union ib_gid)); if (ret < 0) n = n->rb_left; else if (ret > 0) @@ -162,7 +162,7 @@ /* =============================================================== */ /*.._ipoib_mcast_find - find multicast group */ -struct ipoib_mcast *_ipoib_mcast_find(struct net_device *dev, tTS_IB_GID mgid) +struct ipoib_mcast *_ipoib_mcast_find(struct net_device *dev, union ib_gid *mgid) { struct ipoib_mcast *mcast; struct ipoib_dev_priv *priv = dev->priv; @@ -189,7 +189,7 @@ pn = *n; tmcast = rb_entry(pn, struct ipoib_mcast, rb_node); - ret = memcmp(mcast->mgid, tmcast->mgid, sizeof(tTS_IB_GID)); + ret = memcmp(mcast->mgid.raw, tmcast->mgid.raw, sizeof(union ib_gid)); if (ret < 0) n = &pn->rb_left; else if (ret > 0) @@ -226,10 +226,10 @@ } /* Set the cached Q_Key before we attach if it's the broadcast group */ - if (memcmp(mcast->mgid, priv->bcast_gid, sizeof(tTS_IB_GID)) == 0) + if (!memcmp(mcast->mgid.raw, priv->bcast_gid.raw, sizeof(union ib_gid))) priv->qkey = priv->broadcast->mcast_member.qkey; - ret = ipoib_mcast_attach(dev, mcast->mcast_member.mlid, mcast->mgid); + ret = ipoib_mcast_attach(dev, mcast->mcast_member.mlid, &mcast->mgid); if (ret < 0) { TS_REPORT_FATAL(MOD_IB_NET, "%s: couldn't attach QP to multicast group " @@ -256,7 +256,8 @@ } }; - memcpy(av.grh.dgid.raw, 
mcast->mcast_member.mgid, sizeof av.grh.dgid); + memcpy(av.grh.dgid.raw, mcast->mcast_member.mgid, + sizeof (union ib_gid)); mcast->address_handle = ib_create_ah(priv->pd, &av); if (IS_ERR(mcast->address_handle)) { @@ -360,7 +361,7 @@ ipoib_mcast_get(mcast); ret = tsIbMulticastGroupJoin(priv->ca, - priv->port, mcast->mgid, priv->pkey, + priv->port, mcast->mgid.raw, priv->pkey, /* ib_sm doesn't support send only yet TS_IB_MULTICAST_JOIN_SEND_ONLY_NON_MEMBER, */ @@ -427,7 +428,7 @@ status = tsIbMulticastGroupJoin(priv->ca, priv->port, - mcast->mgid, + mcast->mgid.raw, priv->pkey, TS_IB_MULTICAST_JOIN_FULL_MEMBER, HZ, @@ -524,13 +525,11 @@ goto out; } - memcpy(priv->bcast_gid, ipoib_broadcast_mgid, - sizeof(tTS_IB_GID)); - priv->bcast_gid[4] = (priv->pkey >> 8) & 0xff; - priv->bcast_gid[5] = priv->pkey & 0xff; + priv->bcast_gid = ipoib_broadcast_mgid; + priv->bcast_gid.raw[4] = (priv->pkey >> 8) & 0xff; + priv->bcast_gid.raw[5] = priv->pkey & 0xff; - memcpy(priv->broadcast->mgid, priv->bcast_gid, - sizeof(tTS_IB_GID)); + priv->broadcast->mgid = priv->bcast_gid; spin_lock_irqsave(&priv->lock, flags); __ipoib_mcast_add(dev, priv->broadcast); @@ -580,7 +579,7 @@ priv->local_lid = port_lid.lid; } - if (ib_gid_entry_get(priv->ca, priv->port, 0, priv->local_gid)) + if (ib_gid_entry_get(priv->ca, priv->port, 0, priv->local_gid.raw)) TS_REPORT_WARN(MOD_IB_NET, "%s: ib_gid_entry_get() failed", dev->name); @@ -589,7 +588,7 @@ dev->mtu = min(priv->mcast_mtu, priv->admin_mtu); - entry = ipoib_sarp_local_add(dev, priv->local_gid, priv->local_qpn); + entry = ipoib_sarp_local_add(dev, &priv->local_gid, priv->local_qpn); if (entry) ipoib_sarp_put(entry); @@ -693,7 +692,7 @@ return 0; /* Remove ourselves from the multicast group */ - result = ipoib_mcast_detach(dev, mcast->mcast_member.mlid, mcast->mgid); + result = ipoib_mcast_detach(dev, mcast->mcast_member.mlid, &mcast->mgid); if (result) TS_REPORT_WARN(MOD_IB_NET, "%s: ipoib_mcast_detach failed (result = %d)", @@ -737,7 +736,7 @@ 
/* =============================================================== */ /*.._ipoib_mcast_delete -- delete multicast group join */ -static int _ipoib_mcast_delete(struct net_device *dev, tTS_IB_GID mgid) +static int _ipoib_mcast_delete(struct net_device *dev, union ib_gid *mgid) { struct ipoib_mcast *mcast; struct ipoib_dev_priv *priv = dev->priv; @@ -763,7 +762,7 @@ /* =============================================================== */ /*..ipoib_mcast_lookup -- return reference to multicast */ int ipoib_mcast_lookup(struct net_device *dev, - tTS_IB_GID mgid, + union ib_gid *mgid, struct ipoib_mcast **mmcast) { struct ipoib_dev_priv *priv = dev->priv; @@ -777,7 +776,7 @@ /* Let's create a new send only group now */ TS_TRACE(MOD_IB_NET, T_VERY_VERBOSE, TRACE_IB_NET_MULTICAST, "%s: setting up send only multicast group for " - IPOIB_GID_FMT, dev->name, IPOIB_GID_ARG(mgid)); + IPOIB_GID_FMT, dev->name, IPOIB_GID_ARG(*mgid)); mcast = ipoib_mcast_alloc(dev); if (!mcast) { @@ -790,7 +789,7 @@ set_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags); - memcpy(mcast->mgid, mgid, sizeof(tTS_IB_GID)); + mcast->mgid = *mgid; __ipoib_mcast_add(dev, mcast); @@ -863,7 +862,7 @@ nmcast->flags = mcast->flags & (1 << IPOIB_MCAST_FLAG_SENDONLY); - memcpy(nmcast->mgid, mcast->mgid, sizeof(tTS_IB_GID)); + nmcast->mgid = mcast->mgid; /* Add the new group in before the to-be-destroyed group */ list_add_tail(&nmcast->list, &mcast->list); @@ -886,8 +885,7 @@ if (priv->broadcast) { nmcast = ipoib_mcast_alloc(dev); if (nmcast) { - memcpy(nmcast->mgid, priv->broadcast->mgid, - sizeof(tTS_IB_GID)); + nmcast->mgid = priv->broadcast->mgid; rb_replace_node(&priv->broadcast->rb_node, &nmcast->rb_node, @@ -920,7 +918,7 @@ /* Delete broadcast since it will be recreated */ if (priv->broadcast) { - _ipoib_mcast_delete(dev, priv->broadcast->mgid); + _ipoib_mcast_delete(dev, &priv->broadcast->mgid); priv->broadcast = NULL; } } @@ -968,21 +966,21 @@ /* Mark all of the entries that are found or don't exist */ for 
(im = in_dev->mc_list; im; im = im->next) { u32 multiaddr = ntohl(im->multiaddr); - tTS_IB_GID mgid; + union ib_gid mgid; - memcpy(mgid, ipoib_broadcast_mgid, sizeof(tTS_IB_GID)); + mgid = ipoib_broadcast_mgid; /* Add in the P_Key */ - mgid[4] = (priv->pkey >> 8) & 0xff; - mgid[5] = priv->pkey & 0xff; + mgid.raw[4] = (priv->pkey >> 8) & 0xff; + mgid.raw[5] = priv->pkey & 0xff; /* Fixup the group mapping */ - mgid[12] = (multiaddr >> 24) & 0x0f; - mgid[13] = (multiaddr >> 16) & 0xff; - mgid[14] = (multiaddr >> 8) & 0xff; - mgid[15] = multiaddr & 0xff; + mgid.raw[12] = (multiaddr >> 24) & 0x0f; + mgid.raw[13] = (multiaddr >> 16) & 0xff; + mgid.raw[14] = (multiaddr >> 8) & 0xff; + mgid.raw[15] = multiaddr & 0xff; - mcast = __ipoib_mcast_find(dev, mgid); + mcast = __ipoib_mcast_find(dev, &mgid); if (!mcast || test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags)) { struct ipoib_mcast *nmcast; @@ -1002,7 +1000,7 @@ set_bit(IPOIB_MCAST_FLAG_FOUND, &nmcast->flags); - memcpy(nmcast->mgid, mgid, sizeof(tTS_IB_GID)); + nmcast->mgid = mgid; if (mcast) { /* Destroy the send only entry */ @@ -1125,7 +1123,7 @@ /* =============================================================== */ /*..ipoib_mcast_iter_read -- get data pointed to by multicast iterator */ void ipoib_mcast_iter_read(struct ipoib_mcast_iter *iter, - tTS_IB_GID mgid, + union ib_gid *mgid, unsigned long *created, unsigned int *queuelen, unsigned int *complete, @@ -1135,7 +1133,7 @@ mcast = rb_entry(iter->rb_node, struct ipoib_mcast, rb_node); - memcpy(mgid, mcast->mgid, sizeof(tTS_IB_GID)); + *mgid = mcast->mgid; *created = mcast->created; *queuelen = skb_queue_len(&mcast->pkt_queue); *complete = mcast->address_handle != NULL; Index: src/linux-kernel/infiniband/include/ib_verbs.h =================================================================== --- src/linux-kernel/infiniband/include/ib_verbs.h (revision 613) +++ src/linux-kernel/infiniband/include/ib_verbs.h (working copy) @@ -260,8 +260,12 @@ ib_fmr_destroy_func 
fmr_destroy; ib_fmr_map_func fmr_map; ib_fmr_unmap_func fmr_unmap; - ib_multicast_attach_func multicast_attach; - ib_multicast_detach_func multicast_detach; + int (*attach_mcast)(struct ib_qp *qp, + union ib_gid *gid, + u16 lid); + int (*detach_mcast)(struct ib_qp *qp, + union ib_gid *gid, + u16 lid); ib_mad_process_func mad_process; struct class_device class_dev; @@ -352,6 +356,9 @@ int ib_dealloc_mw(struct ib_mw *mw); +int ib_attach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid); +int ib_detach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid); + #endif /* __KERNEL __ */ /* XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX */ Index: src/linux-kernel/infiniband/include/ts_ib_core_types.h =================================================================== --- src/linux-kernel/infiniband/include/ts_ib_core_types.h (revision 613) +++ src/linux-kernel/infiniband/include/ts_ib_core_types.h (working copy) @@ -595,12 +595,6 @@ u32 *rkey); typedef int (*ib_fmr_unmap_func)(struct ib_device *device, struct list_head *fmr_list); -typedef int (*ib_multicast_attach_func)(struct ib_qp *qp, - u16 lid, - tTS_IB_GID gid); -typedef int (*ib_multicast_detach_func)(struct ib_qp *qp, - u16 lid, - tTS_IB_GID gid); struct ib_mad; Index: src/linux-kernel/infiniband/include/ts_ib_core.h =================================================================== --- src/linux-kernel/infiniband/include/ts_ib_core.h (revision 613) +++ src/linux-kernel/infiniband/include/ts_ib_core.h (working copy) @@ -108,13 +108,6 @@ u32 *rkey); int ib_fmr_deregister(struct ib_fmr *fmr); -int ib_multicast_attach(u16 multicast_lid, - tTS_IB_GID multicast_gid, - struct ib_qp *qp); -int ib_multicast_detach(u16 multicast_lid, - tTS_IB_GID multicast_gid, - struct ib_qp *qp); - int ib_async_event_handler_register(struct ib_async_event_record *record, ib_async_event_handler_func function, void *arg, Index: src/linux-kernel/infiniband/core/core_mcast.c 
=================================================================== --- src/linux-kernel/infiniband/core/core_mcast.c (revision 576) +++ src/linux-kernel/infiniband/core/core_mcast.c (working copy) @@ -31,25 +31,21 @@ #include -int ib_multicast_attach(u16 multicast_lid, - tTS_IB_GID multicast_gid, - struct ib_qp *qp) +int ib_attach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid) { - IB_CHECK_MAGIC(qp, QP); - - return qp->device->multicast_attach(qp, multicast_lid, multicast_gid); + return qp->device->attach_mcast ? + qp->device->attach_mcast(qp, gid, lid) : + -ENOSYS; } -EXPORT_SYMBOL(ib_multicast_attach); +EXPORT_SYMBOL(ib_attach_mcast); -int ib_multicast_detach(u16 multicast_lid, - tTS_IB_GID multicast_gid, - struct ib_qp *qp) +int ib_detach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid) { - IB_CHECK_MAGIC(qp, QP); - - return qp->device->multicast_detach(qp, multicast_lid, multicast_gid); + return qp->device->detach_mcast ? + qp->device->detach_mcast(qp, gid, lid) : + -ENOSYS; } -EXPORT_SYMBOL(ib_multicast_detach); +EXPORT_SYMBOL(ib_detach_mcast); /* Local Variables: Index: src/linux-kernel/infiniband/hw/mthca/mthca_dev.h =================================================================== --- src/linux-kernel/infiniband/hw/mthca/mthca_dev.h (revision 607) +++ src/linux-kernel/infiniband/hw/mthca/mthca_dev.h (working copy) @@ -340,8 +340,8 @@ int mthca_read_ah(struct mthca_dev *dev, struct mthca_ah *ah, struct ib_ud_header *header); -int mthca_multicast_attach(struct ib_qp *ibqp, u16 lid, u8 gid[16]); -int mthca_multicast_detach(struct ib_qp *ibqp, u16 lid, u8 gid[16]); +int mthca_multicast_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid); +int mthca_multicast_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid); enum ib_mad_result mthca_process_mad(struct ib_device *ibdev, int ignore_mkey, Index: src/linux-kernel/infiniband/hw/mthca/mthca_provider.c =================================================================== --- 
src/linux-kernel/infiniband/hw/mthca/mthca_provider.c (revision 612) +++ src/linux-kernel/infiniband/hw/mthca/mthca_provider.c (working copy) @@ -559,8 +559,8 @@ dev->ib_dev.req_notify_cq = mthca_req_notify_cq; dev->ib_dev.reg_phys_mr = mthca_reg_phys_mr; dev->ib_dev.dereg_mr = mthca_dereg_mr; - dev->ib_dev.multicast_attach = mthca_multicast_attach; - dev->ib_dev.multicast_detach = mthca_multicast_detach; + dev->ib_dev.attach_mcast = mthca_multicast_attach; + dev->ib_dev.detach_mcast = mthca_multicast_detach; dev->ib_dev.mad_process = mthca_process_mad; return ib_device_register(&dev->ib_dev); Index: src/linux-kernel/infiniband/hw/mthca/mthca_mcg.c =================================================================== --- src/linux-kernel/infiniband/hw/mthca/mthca_mcg.c (revision 576) +++ src/linux-kernel/infiniband/hw/mthca/mthca_mcg.c (working copy) @@ -122,7 +122,7 @@ return err; } -int mthca_multicast_attach(struct ib_qp *ibqp, u16 lid, u8 gid[16]) +int mthca_multicast_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid) { struct mthca_dev *dev = to_mdev(ibqp->device); void *mailbox; @@ -142,13 +142,13 @@ if (down_interruptible(&dev->mcg_table.sem)) return -EINTR; - err = find_mgm(dev, gid, mgm, &hash, &prev, &index); + err = find_mgm(dev, gid->raw, mgm, &hash, &prev, &index); if (err) goto out; if (index != -1) { if (!memcmp(mgm->gid, zero_gid, 16)) - memcpy(mgm->gid, gid, 16); + memcpy(mgm->gid, gid->raw, 16); } else { link = 1; @@ -168,7 +168,7 @@ goto out; } - memcpy(mgm->gid, gid, 16); + memcpy(mgm->gid, gid->raw, 16); mgm->next_gid_index = 0; } @@ -220,7 +220,7 @@ return err; } -int mthca_multicast_detach(struct ib_qp *ibqp, u16 lid, u8 gid[16]) +int mthca_multicast_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid) { struct mthca_dev *dev = to_mdev(ibqp->device); void *mailbox; @@ -239,17 +239,21 @@ if (down_interruptible(&dev->mcg_table.sem)) return -EINTR; - err = find_mgm(dev, gid, mgm, &hash, &prev, &index); + err = find_mgm(dev, gid->raw, mgm, 
&hash, &prev, &index); if (err) goto out; if (index == -1) { mthca_err(dev, "MGID %04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x " "not found\n", - be16_to_cpu(((u16 *) gid)[0]), be16_to_cpu(((u16 *) gid)[1]), - be16_to_cpu(((u16 *) gid)[2]), be16_to_cpu(((u16 *) gid)[3]), - be16_to_cpu(((u16 *) gid)[4]), be16_to_cpu(((u16 *) gid)[5]), - be16_to_cpu(((u16 *) gid)[6]), be16_to_cpu(((u16 *) gid)[7])); + be16_to_cpu(((u16 *) gid->raw)[0]), + be16_to_cpu(((u16 *) gid->raw)[1]), + be16_to_cpu(((u16 *) gid->raw)[2]), + be16_to_cpu(((u16 *) gid->raw)[3]), + be16_to_cpu(((u16 *) gid->raw)[4]), + be16_to_cpu(((u16 *) gid->raw)[5]), + be16_to_cpu(((u16 *) gid->raw)[6]), + be16_to_cpu(((u16 *) gid->raw)[7])); err = -EINVAL; goto out; } From roland at topspin.com Tue Aug 10 11:19:11 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 10 Aug 2004 11:19:11 -0700 Subject: [openib-general] RNR timer enum? Message-ID: <524qna7t5s.fsf@topspin.com> Do we want something in ib_verbs.h like the below enum to make setting min_rnr_timer values easier? enum ib_rnr_timeout { IB_RNR_TIMER_655_36 = 0, IB_RNR_TIMER_000_01 = 1, IB_RNR_TIMER_000_02 = 2, IB_RNR_TIMER_000_03 = 3, IB_RNR_TIMER_000_04 = 4, IB_RNR_TIMER_000_06 = 5, IB_RNR_TIMER_000_08 = 6, IB_RNR_TIMER_000_12 = 7, IB_RNR_TIMER_000_16 = 8, IB_RNR_TIMER_000_24 = 9, IB_RNR_TIMER_000_32 = 10, IB_RNR_TIMER_000_48 = 11, IB_RNR_TIMER_000_64 = 12, IB_RNR_TIMER_000_96 = 13, IB_RNR_TIMER_001_28 = 14, IB_RNR_TIMER_001_92 = 15, IB_RNR_TIMER_002_56 = 16, IB_RNR_TIMER_003_84 = 17, IB_RNR_TIMER_005_12 = 18, IB_RNR_TIMER_007_68 = 19, IB_RNR_TIMER_010_24 = 20, IB_RNR_TIMER_015_36 = 21, IB_RNR_TIMER_020_48 = 22, IB_RNR_TIMER_030_72 = 23, IB_RNR_TIMER_040_96 = 24, IB_RNR_TIMER_061_44 = 25, IB_RNR_TIMER_081_92 = 26, IB_RNR_TIMER_122_88 = 27, IB_RNR_TIMER_163_84 = 28, IB_RNR_TIMER_245_76 = 29, IB_RNR_TIMER_327_68 = 30, IB_RNR_TIMER_491_52 = 31 }; - R. 
From roland at topspin.com Tue Aug 10 11:35:07 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 10 Aug 2004 11:35:07 -0700 Subject: [openib-general] ib_get_special_qp Message-ID: <52vffq6dus.fsf@topspin.com> Is there any point in having the enum ib_qp_type qp_type, parameter to ib_get_special_qp(), since struct ib_qp_init_attr has enum ib_qp_type qp_type; anyway? (In fact if we're willing to add a port member to struct ib_qp_init_attr, we could get rid of ib_get_special_qp() and just have ib_create_qp() handle special QPs as well, although I'm not sure it's worth it since both the low-level driver code and the ULP code for special QPs is probably pretty different from the code for regular QPs) - R. From roland at topspin.com Tue Aug 10 11:41:15 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 10 Aug 2004 11:41:15 -0700 Subject: [openib-general] remote_atomic_flags? Message-ID: <52r7qe6dkk.fsf@topspin.com> What is supposed to be filled into the remote_atomic_flags member of struct ib_qp_attr? Would it make sense to make the type of that field an enum instead of just int (so it's a little bit more self-documenting)? - R. From tduffy at sun.com Tue Aug 10 11:43:50 2004 From: tduffy at sun.com (Tom Duffy) Date: Tue, 10 Aug 2004 11:43:50 -0700 Subject: [openib-general] [PATCH] rename functions in ip2pr to linux standard conventions Message-ID: <1092163430.22057.27.camel@duffman> This patch renames the functions in ip2pr to follow standard Linux naming conventions. Signed-by: Tom Duffy with permission from Sun legal. Index: drivers/infiniband/ulp/ipoib/ip2pr_export.h =================================================================== --- drivers/infiniband/ulp/ipoib/ip2pr_export.h (revision 622) +++ drivers/infiniband/ulp/ipoib/ip2pr_export.h (working copy) @@ -65,30 +65,30 @@ * arg - supplied argument is returned in callback function * plid - pointer to storage for identifier of this query. 
*/ -s32 tsIp2prPathRecordLookup(u32 dst_addr, /* NBO */ - u32 src_addr, /* NBO */ - u8 localroute, - s32 bound_dev_if, - tIP2PR_PATH_LOOKUP_FUNC func, - void *arg, - tIP2PR_PATH_LOOKUP_ID *plid); +s32 ip2pr_path_record_lookup(u32 dst_addr, /* NBO */ + u32 src_addr, /* NBO */ + u8 localroute, + s32 bound_dev_if, + tIP2PR_PATH_LOOKUP_FUNC func, + void *arg, + tIP2PR_PATH_LOOKUP_ID *plid); /* * address lookup cancel */ -s32 tsIp2prPathRecordCancel(tIP2PR_PATH_LOOKUP_ID plid); +s32 ip2pr_path_record_cancel(tIP2PR_PATH_LOOKUP_ID plid); /* * Giver a Source and Destination GID, get the path record */ -s32 tsGid2prLookup(tTS_IB_GID src_gid, +s32 gid2pr_lookup(tTS_IB_GID src_gid, tTS_IB_GID dst_gid, u16 pkey, tGID2PR_LOOKUP_FUNC func, void *arg, tIP2PR_PATH_LOOKUP_ID * plid); -s32 tsGid2prCancel(tIP2PR_PATH_LOOKUP_ID plid); +s32 gid2pr_cancel(tIP2PR_PATH_LOOKUP_ID plid); #endif struct ip2pr_lookup_param { Index: drivers/infiniband/ulp/ipoib/ip2pr_link.c =================================================================== --- drivers/infiniband/ulp/ipoib/ip2pr_link.c (revision 622) +++ drivers/infiniband/ulp/ipoib/ip2pr_link.c (working copy) @@ -47,7 +47,7 @@ gid_lock:SPIN_LOCK_UNLOCKED }; -s32 _tsIp2PrnDelete(struct ip2pr_gid_pr_element *pr_elmt); +static s32 ip2pr_delete(struct ip2pr_gid_pr_element *pr_elmt); static tTS_IB_GID nullgid = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; @@ -55,32 +55,25 @@ ((TS_IP2PR_PATH_LOOKUP_INVALID == ++_tsIp2prPathLookupId) ? 
\ ++_tsIp2prPathLookupId : _tsIp2prPathLookupId) -/* --------------------------------------------------------------------- */ -/* */ -/* Path Record lookup caching */ -/* */ -/* --------------------------------------------------------------------- */ -/* ========================================================================= */ -/*.._tsIp2prPathElementLookup -- lookup a path record entry */ -static struct ip2pr_path_element *_tsIp2prPathElementLookup(u32 ip_addr) +/** + * Path Record lookup caching + */ + +/* ip2pr_path_element_lookup -- lookup a path record entry */ +static struct ip2pr_path_element *ip2pr_path_element_lookup(u32 ip_addr) { struct ip2pr_path_element *path_elmt; for (path_elmt = _tsIp2prLinkRoot.path_list; - NULL != path_elmt; path_elmt = path_elmt->next) { - - if (ip_addr == path_elmt->dst_addr) { - + NULL != path_elmt; path_elmt = path_elmt->next) + if (ip_addr == path_elmt->dst_addr) break; - } /* if */ - } /* for */ return path_elmt; -} /* _tsIp2prPathElementLookup */ +} -/* ========================================================================= */ -/*.._tsIp2prPathElementCreate -- create an entry for a path record element */ -static s32 _tsIp2prPathElementCreate(u32 dst_addr, u32 src_addr, +/* ip2pr_path_element_create -- create an entry for a path record element */ +static s32 ip2pr_path_element_create(u32 dst_addr, u32 src_addr, tTS_IB_PORT hw_port, struct ib_device *ca, struct ib_path_record *path_r, struct ip2pr_path_element **return_elmt) @@ -93,11 +86,9 @@ TS_CHECK_NULL(_tsIp2prLinkRoot.path_cache, -EINVAL); path_elmt = kmem_cache_alloc(_tsIp2prLinkRoot.path_cache, SLAB_ATOMIC); - if (NULL == path_elmt) { - + if (NULL == path_elmt) return -ENOMEM; - } - /* if */ + memset(path_elmt, 0, sizeof(*path_elmt)); spin_lock_irqsave(&_tsIp2prLinkRoot.path_lock, flags); @@ -105,10 +96,9 @@ _tsIp2prLinkRoot.path_list = path_elmt; path_elmt->p_next = &_tsIp2prLinkRoot.path_list; - if (NULL != path_elmt->next) { - + if (NULL != path_elmt->next) 
path_elmt->next->p_next = &path_elmt->next; - } /* if */ + spin_unlock_irqrestore(&_tsIp2prLinkRoot.path_lock, flags); /* * set values @@ -123,11 +113,10 @@ *return_elmt = path_elmt; return 0; -} /* _tsIp2prPathElementCreate */ +} -/* ========================================================================= */ -/*.._tsIp2prPathElementDestroy -- destroy an entry for a path record element */ -static s32 _tsIp2prPathElementDestroy(struct ip2pr_path_element *path_elmt) +/* ip2pr_path_element_destroy -- destroy an entry for a path record element */ +static s32 ip2pr_path_element_destroy(struct ip2pr_path_element *path_elmt) { unsigned long flags; @@ -136,11 +125,9 @@ spin_lock_irqsave(&_tsIp2prLinkRoot.path_lock, flags); if (NULL != path_elmt->p_next) { - - if (NULL != path_elmt->next) { + if (NULL != path_elmt->next) path_elmt->next->p_next = path_elmt->p_next; - } - /* if */ + *(path_elmt->p_next) = path_elmt->next; path_elmt->p_next = NULL; @@ -151,11 +138,10 @@ kmem_cache_free(_tsIp2prLinkRoot.path_cache, path_elmt); return 0; -} /* _tsIp2prPathElementDestroy */ +} -/* ========================================================================= */ -/*.._tsIp2prPathLookupComplete -- complete the resolution of a path record */ -static s32 _tsIp2prPathLookupComplete(tIP2PR_PATH_LOOKUP_ID plid, +/* ip2pr_path_lookup_complete -- complete the resolution of a path record */ +static s32 ip2pr_path_lookup_complete(tIP2PR_PATH_LOOKUP_ID plid, s32 status, struct ip2pr_path_element *path_elmt, void *funcptr, void *arg) @@ -178,17 +164,16 @@ return func(plid, status, 0, 0, 0, NULL, NULL, arg); } /* else */ -} /* _tsIp2prPathLookupComplete */ +} -/* --------------------------------------------------------------------- */ -/* */ -/* module specific functions */ -/* */ -/* --------------------------------------------------------------------- */ -/* ========================================================================= */ -/*.._tsIp2prIpoibWaitDestroy -- destroy an entry for an 
outstanding request */ -static s32 _tsIp2prIpoibWaitDestroy - (struct ip2pr_ipoib_wait *ipoib_wait, IP2PR_USE_LOCK use_lock) { +/** + * module specific functions + */ + +/* ip2pr_ipoib_wait_destroy -- destroy an entry for an outstanding request */ +static s32 ip2pr_ipoib_wait_destroy(struct ip2pr_ipoib_wait *ipoib_wait, + IP2PR_USE_LOCK use_lock) +{ unsigned long flags = 0; TS_CHECK_NULL(ipoib_wait, -EINVAL); @@ -213,11 +198,10 @@ kmem_cache_free(_tsIp2prLinkRoot.wait_cache, ipoib_wait); return 0; -} /* _tsIp2prIpoibWaitDestroy */ +} -/* ========================================================================= */ -/*.._tsIp2prIpoibWaitTimeout -- timeout function for link resolution */ -static void _tsIp2prIpoibWaitTimeout(void *arg) +/* ip2pr_ipoib_wait_timeout -- timeout function for link resolution */ +static void ip2pr_ipoib_wait_timeout(void *arg) { struct ip2pr_ipoib_wait *ipoib_wait = (struct ip2pr_ipoib_wait *)arg; s32 result; @@ -238,7 +222,7 @@ if (0 < TS_IP2PR_IPOIB_FLAGS_EMPTY(ipoib_wait)) { result = - _tsIp2prIpoibWaitDestroy(ipoib_wait, IP2PR_LOCK_NOT_HELD); + ip2pr_ipoib_wait_destroy(ipoib_wait, IP2PR_LOCK_NOT_HELD); TS_EXPECT(MOD_IP2PR, !(0 > result)); return; @@ -255,7 +239,7 @@ ipoib_wait->timer.run_time = jiffies + (ipoib_wait->prev_timeout * HZ) + (jiffies & 0x0f); - ipoib_wait->timer.function = _tsIp2prIpoibWaitTimeout; + ipoib_wait->timer.function = ip2pr_ipoib_wait_timeout; ipoib_wait->timer.arg = ipoib_wait; tsKernelTimerAdd(&ipoib_wait->timer); @@ -271,35 +255,30 @@ ipoib_wait->dev, ipoib_wait->src_addr, NULL, ipoib_wait->dev->dev_addr, NULL); - } /* if */ - else { - - result = _tsIp2prPathLookupComplete(ipoib_wait->plid, + } else { + result = ip2pr_path_lookup_complete(ipoib_wait->plid, -EHOSTUNREACH, NULL, ipoib_wait->func, ipoib_wait->arg); - if (0 > result) { - + if (0 > result) TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_WARN, "FUNC: Error <%d> timing out address resolution. 
<%08x>", result, ipoib_wait->dst_addr); - } - /* if */ + TS_IP2PR_IPOIB_FLAG_CLR_FUNC(ipoib_wait); - result = - _tsIp2prIpoibWaitDestroy(ipoib_wait, IP2PR_LOCK_NOT_HELD); + result = ip2pr_ipoib_wait_destroy(ipoib_wait, + IP2PR_LOCK_NOT_HELD); TS_EXPECT(MOD_IP2PR, !(0 > result)); - } /* else */ + } return; -} /* _tsIp2prIpoibWaitTimeout */ +} -/* ========================================================================= */ -/*.._tsIp2prIpoibWaitCreate -- create an entry for an outstanding request */ +/* ip2pr_ipoib_wait_create -- create an entry for an outstanding request */ static struct ip2pr_ipoib_wait * -_tsIp2prIpoibWaitCreate(tIP2PR_PATH_LOOKUP_ID plid, u32 dst_addr, u32 src_addr, +ip2pr_ipoib_wait_create(tIP2PR_PATH_LOOKUP_ID plid, u32 dst_addr, u32 src_addr, u8 localroute, u32 bound_dev_if, tIP2PR_PATH_LOOKUP_FUNC func, void *arg, s32 ltype) { @@ -319,7 +298,7 @@ tsKernelTimerInit(&ipoib_wait->timer); ipoib_wait->timer.run_time = jiffies + (_tsIp2prLinkRoot.retry_timeout * HZ); - ipoib_wait->timer.function = _tsIp2prIpoibWaitTimeout; + ipoib_wait->timer.function = ip2pr_ipoib_wait_timeout; ipoib_wait->timer.arg = ipoib_wait; } ipoib_wait->type = ltype; @@ -347,11 +326,10 @@ } /* if */ return ipoib_wait; -} /* _tsIp2prIpoibWaitCreate */ +} -/* ========================================================================= */ -/*.._tsIp2prIpoibWaitListInsert -- insert an entry into the wait list */ -static s32 _tsIp2prIpoibWaitListInsert(struct ip2pr_ipoib_wait *ipoib_wait) +/* ip2pr_ipoib_wait_list_insert -- insert an entry into the wait list */ +static s32 ip2pr_ipoib_wait_list_insert(struct ip2pr_ipoib_wait *ipoib_wait) { unsigned long flags; @@ -384,12 +362,14 @@ } return 0; -} /* _tsIp2prIpoibWaitListInsert */ +} -/* ========================================================================= */ -/*.._tsIp2prIpoibWaitPlidLookup -- lookup an entry for an outstanding request */ +/* + * ip2pr_ipoib_waith_plid_lookup -- lookup an entry for an outstanding + * request + 
*/ static struct ip2pr_ipoib_wait * -tsIp2prIpoibWaitPlidLookup(tIP2PR_PATH_LOOKUP_ID plid) +ip2pr_ipoib_wait_plid_lookup(tIP2PR_PATH_LOOKUP_ID plid) { unsigned long flags; struct ip2pr_ipoib_wait *ipoib_wait; @@ -406,12 +386,14 @@ spin_unlock_irqrestore(&_tsIp2prLinkRoot.wait_lock, flags); return ipoib_wait; -} /* _tsIp2prIpoibWaitPlidLookup */ +} -/* ========================================================================= */ -/*..tsIp2prPathElementTableDump - dump the path record element table to proc */ -s32 tsIp2prPathElementTableDump(char *buffer, s32 max_size, s32 start_index, - long *end_index) +/* + * ip2pr_path_element_table_dump -- dump the path record element table to + * proc + */ +s32 ip2pr_path_element_table_dump(char *buffer, s32 max_size, s32 start_index, + long *end_index) { struct ip2pr_path_element *path_elmt; s32 counter = 0; @@ -475,13 +457,15 @@ } /* if */ return offset; -} /* tsIp2prPathElementTableDump */ +} -/* ========================================================================= */ -/*..tsIp2prIpoibWaitTableDump - dump the address resolution wait table to proc */ +/* + * ip2pr_ipoib_wait_table_dump -- dump the address resolution wait table + * to proc + */ s32 -tsIp2prIpoibWaitTableDump(char *buffer, s32 max_size, s32 start_index, - long *end_index) +ip2pr_ipoib_wait_table_dump(char *buffer, s32 max_size, s32 start_index, + long *end_index) { struct ip2pr_ipoib_wait *ipoib_wait; s32 counter = 0; @@ -535,11 +519,11 @@ } /* if */ return offset; -} /* tsIp2prIpoibWaitTableDump */ +} -/* ..tsIp2prProcReadInt. dump integer value to /proc file */ -s32 tsIp2prProcReadInt(char *buffer, s32 max_size, s32 start_index, - long *end_index, int val) +/* ip2pr_proc_read_int -- dump integer value to /proc file */ +s32 ip2pr_proc_read_int(char *buffer, s32 max_size, s32 start_index, + long *end_index, int val) { s32 offset = 0; @@ -553,94 +537,73 @@ return (offset); } -/* ..tsIp2prProcMaxRetriesRead. 
dump current retry value */ -s32 tsIp2prProcRetriesRead(char *buffer, s32 max_size, s32 start_index, - long *end_index) +/* ip2pr_proc_retries_read -- dump current retry value */ +s32 ip2pr_proc_retries_read(char *buffer, s32 max_size, s32 start_index, + long *end_index) { - - return (tsIp2prProcReadInt(buffer, - max_size, - start_index, - end_index, _tsIp2prLinkRoot.max_retries)); + return (ip2pr_proc_read_int(buffer, max_size, start_index, + end_index, _tsIp2prLinkRoot.max_retries)); } -/* ..tsIp2prProcTimeoutRead. dump current timeout value */ -s32 tsIp2prProcTimeoutRead(char *buffer, s32 max_size, s32 start_index, - long *end_index) +/* ip2pr_proc-timeout_read -- dump current timeout value */ +s32 ip2pr_proc_timeout_read(char *buffer, s32 max_size, s32 start_index, + long *end_index) { - - return (tsIp2prProcReadInt(buffer, - max_size, - start_index, - end_index, _tsIp2prLinkRoot.retry_timeout)); + return (ip2pr_proc_read_int(buffer, max_size, start_index, + end_index, _tsIp2prLinkRoot.retry_timeout)); } -/* ..tsIp2prProcBackoutRead. dump current backout value */ -s32 tsIp2prProcBackoffRead(char *buffer, s32 max_size, s32 start_index, +/* ip2pr_proc_backoff_read -- dump current backoff value */ +s32 ip2pr_proc_backoff_read(char *buffer, s32 max_size, s32 start_index, long *end_index) { - - return (tsIp2prProcReadInt(buffer, - max_size, - start_index, - end_index, _tsIp2prLinkRoot.backoff)); + return (ip2pr_proc_read_int(buffer, max_size, start_index, + end_index, _tsIp2prLinkRoot.backoff)); } -/* ..tsIp2prProcCacheTimeoutRead. 
dump current cache timeout value */ -s32 tsIp2prProcCacheTimeoutRead(char *buffer, s32 max_size, s32 start_index, - long *end_index) +/* ip2pr_proc_cache_timeout_read -- dump current cache timeout value */ +s32 ip2pr_proc_cache_timeout_read(char *buffer, s32 max_size, s32 start_index, + long *end_index) { - - return (tsIp2prProcReadInt(buffer, - max_size, - start_index, - end_index, _tsIp2prLinkRoot.cache_timeout)); + return (ip2pr_proc_read_int(buffer, max_size, start_index, + end_index, _tsIp2prLinkRoot.cache_timeout)); } -/* ..tsIp2prProcMaxRetriesRead. dump current retry value */ -s32 tsIp2prProcTotalReq(char *buffer, s32 max_size, s32 start_index, - long *end_index) +/* ip2pr_proc_total_req -- dump current retry value */ +s32 ip2pr_proc_total_req(char *buffer, s32 max_size, s32 start_index, + long *end_index) { - - return (tsIp2prProcReadInt(buffer, - max_size, - start_index, end_index, ip2pr_total_req)); + return (ip2pr_proc_read_int(buffer, max_size, start_index, end_index, + ip2pr_total_req)); } -/* ..tsIp2prProcMaxRetriesRead. dump current retry value */ -s32 tsIp2prProcArpTimeout(char *buffer, s32 max_size, s32 start_index, +/* ip2pr_proc_arp_timeout -- dump current retry value */ +s32 ip2pr_proc_arp_timeout(char *buffer, s32 max_size, s32 start_index, long *end_index) { - - return (tsIp2prProcReadInt(buffer, - max_size, - start_index, end_index, ip2pr_arp_timeout)); + return (ip2pr_proc_read_int(buffer, max_size, start_index, end_index, + ip2pr_arp_timeout)); } -/* ..tsIp2prProcMaxRetriesRead. 
dump current retry value */ -s32 tsIp2prProcPathTimeout(char *buffer, s32 max_size, s32 start_index, +/* ip2pr_proc_path_timeout -- dump current retry value */ +s32 ip2pr_proc_path_timeout(char *buffer, s32 max_size, s32 start_index, long *end_index) { - - return (tsIp2prProcReadInt(buffer, - max_size, - start_index, end_index, ip2pr_path_timeout)); + return (ip2pr_proc_read_int(buffer, max_size, start_index, end_index, + ip2pr_path_timeout)); } -/* ..tsIp2prProcMaxRetriesRead. dump current retry value */ -s32 tsIp2prProcTotalFail(char *buffer, s32 max_size, s32 start_index, +/* ip2pr_proc_total_fail -- dump current retry value */ +s32 ip2pr_proc_total_fail(char *buffer, s32 max_size, s32 start_index, long *end_index) { - - return (tsIp2prProcReadInt(buffer, - max_size, - start_index, end_index, ip2pr_total_fail)); + return (ip2pr_proc_read_int(buffer, max_size, start_index, end_index, + ip2pr_total_fail)); } -/* ..tsIp2prProcWriteInt. scan integer value from /proc file */ -ssize_t tsIp2prProcWriteInt(struct file * file, - const char *buffer, - size_t count, loff_t * pos, int *val) +/* ip2pr_proc_write_int -- scan integer value from /proc file */ +ssize_t ip2pr_proc_write_int(struct file * file, const char *buffer, + size_t count, loff_t * pos, int *val) { char kernel_buf[256]; int ret; @@ -662,71 +625,68 @@ return (ret); } -/* ..tsIp2prProcMaxRetriesWrite. scan max retries value */ -ssize_t tsIp2prProcRetriesWrite(struct file * file, - const char *buffer, size_t count, loff_t * pos) +/* ip2pr_proc_retries_write -- scan max retries value */ +ssize_t ip2pr_proc_retries_write(struct file * file, const char *buffer, + size_t count, loff_t * pos) { int val; int ret; - ret = tsIp2prProcWriteInt(file, buffer, count, pos, &val); + ret = ip2pr_proc_write_int(file, buffer, count, pos, &val); if (val <= TS_IP2PR_PATH_MAX_RETRIES) _tsIp2prLinkRoot.max_retries = val; return (ret); } -/* ..tsIp2prProcTimeoutWrite. 
scan timeout value */ -ssize_t tsIp2prProcTimeoutWrite(struct file * file, - const char *buffer, size_t count, loff_t * pos) +/* ip2pr_proc_timeout_write -- scan timeout value */ +ssize_t ip2pr_proc_timeout_write(struct file * file, const char *buffer, + size_t count, loff_t * pos) { int val; int ret; - ret = tsIp2prProcWriteInt(file, buffer, count, pos, &val); + ret = ip2pr_proc_write_int(file, buffer, count, pos, &val); if (val <= TS_IP2PR_MAX_DEV_PATH_WAIT) _tsIp2prLinkRoot.retry_timeout = val; return (ret); } -/* ..tsIp2prProcBackoutWrite. scan backout value */ -ssize_t tsIp2prProcBackoffWrite(struct file * file, - const char *buffer, size_t count, loff_t * pos) +/* ip2pr_proc_backoff_write -- scan backoff value */ +ssize_t ip2pr_proc_backoff_write(struct file * file, const char *buffer, + size_t count, loff_t * pos) { int val; int ret; - ret = tsIp2prProcWriteInt(file, buffer, count, pos, &val); + ret = ip2pr_proc_write_int(file, buffer, count, pos, &val); if (val <= TS_IP2PR_PATH_MAX_BACKOFF) _tsIp2prLinkRoot.backoff = val; return (ret); } -/* ..tsIp2prProcCacheTimeoutWrite. 
scan cache timeout value */ -ssize_t tsIp2prProcCacheTimeoutWrite(struct file * file, - const char *buffer, - size_t count, loff_t * pos) +/* ip2pr_proc_cache_timeout_write -- scan cache timeout value */ +ssize_t ip2pr_proc_cache_timeout_write(struct file * file, const char *buffer, + size_t count, loff_t * pos) { int val; int ret; - ret = tsIp2prProcWriteInt(file, buffer, count, pos, &val); + ret = ip2pr_proc_write_int(file, buffer, count, pos, &val); if (val <= TS_IP2PR_PATH_MAX_CACHE_TIMEOUT) _tsIp2prLinkRoot.cache_timeout = val; return (ret); } -/* --------------------------------------------------------------------- */ -/* */ -/* Path record completion */ -/* */ -/* --------------------------------------------------------------------- */ -/* ========================================================================= */ -/*.._tsIp2prPathRecordComplete -- path lookup complete, save result */ -static s32 _tsIp2prPathRecordComplete(tTS_IB_CLIENT_QUERY_TID tid, s32 status, +/** + * Path record completion + */ + +/* ip2pr_path_record_complete -- path lookup complete, save result */ +static s32 ip2pr_path_record_complete(tTS_IB_CLIENT_QUERY_TID tid, s32 status, struct ib_path_record *path, s32 remaining, void *arg) { @@ -780,7 +740,7 @@ (ipoib_wait-> prev_timeout * HZ) + (jiffies & 0x0f), 0, - _tsIp2prPathRecordComplete, + ip2pr_path_record_complete, ipoib_wait, &ipoib_wait->tid); if (0 != result) { @@ -809,7 +769,7 @@ (path->dgid + sizeof(u64))), path->dlid); - result = _tsIp2prPathElementCreate(ipoib_wait->dst_addr, + result = ip2pr_path_element_create(ipoib_wait->dst_addr, ipoib_wait->src_addr, ipoib_wait->hw_port, ipoib_wait->ca, @@ -836,7 +796,7 @@ if (0 < TS_IP2PR_IPOIB_FLAG_GET_FUNC(ipoib_wait)) { - result = _tsIp2prPathLookupComplete(ipoib_wait->plid, + result = ip2pr_path_lookup_complete(ipoib_wait->plid, status, path_elmt, ipoib_wait->func, @@ -847,28 +807,23 @@ "PATH: Error <%d> completing Path Record Lookup.", result); } - /* if */ 
TS_IP2PR_IPOIB_FLAG_CLR_FUNC(ipoib_wait); } - /* if */ if (0 < TS_IP2PR_IPOIB_FLAGS_EMPTY(ipoib_wait)) { - result = - _tsIp2prIpoibWaitDestroy(ipoib_wait, IP2PR_LOCK_NOT_HELD); + result = ip2pr_ipoib_wait_destroy(ipoib_wait, + IP2PR_LOCK_NOT_HELD); TS_EXPECT(MOD_IP2PR, !(0 > result)); } - /* if */ return 0; -} /* _tsIp2prPathRecordComplete */ +} -/* --------------------------------------------------------------------- */ -/* */ -/* Address resolution */ -/* */ -/* --------------------------------------------------------------------- */ -/* ========================================================================= */ -/*.._tsIp2prLinkFindComplete -- complete the resolution of an ip address */ -static s32 _tsIp2prLinkFindComplete(struct ip2pr_ipoib_wait *ipoib_wait, +/** + * Address resolution + */ + +/* ip2pr_link_find_complete -- complete the resolution of an ip address */ +static s32 ip2pr_link_find_complete(struct ip2pr_ipoib_wait *ipoib_wait, s32 status, IP2PR_USE_LOCK use_lock) { s32 result = 0; @@ -922,7 +877,7 @@ (ipoib_wait->prev_timeout * HZ) + (jiffies & 0x0f), 0, - _tsIp2prPathRecordComplete, + ip2pr_path_record_complete, ipoib_wait, &ipoib_wait->tid); if (0 != result) { @@ -944,7 +899,7 @@ done: if (0 < TS_IP2PR_IPOIB_FLAG_GET_FUNC(ipoib_wait)) { - result = _tsIp2prPathLookupComplete(ipoib_wait->plid, + result = ip2pr_path_lookup_complete(ipoib_wait->plid, status, NULL, ipoib_wait->func, @@ -961,16 +916,15 @@ /* if */ if (0 < TS_IP2PR_IPOIB_FLAGS_EMPTY(ipoib_wait)) { - expect = _tsIp2prIpoibWaitDestroy(ipoib_wait, use_lock); + expect = ip2pr_ipoib_wait_destroy(ipoib_wait, use_lock); TS_EXPECT(MOD_IP2PR, !(0 > expect)); } /* if */ return 0; -} /*_tsIp2prLinkFindComplete */ +} -/* ========================================================================= */ -/*.._tsIp2prArpQuery -- query arp cache */ -static int tsIp2prArpQuery(struct ip2pr_ipoib_wait *ipoib_wait, u32 * state) +/* ip2pr_arp_query -- query arp cache */ +static int ip2pr_arp_query(struct 
ip2pr_ipoib_wait *ipoib_wait, u32 * state) { struct neighbour *neigh; extern struct neigh_table arp_tbl; @@ -989,11 +943,10 @@ return (-ENOENT); } -} /*.._tsIp2prArpQuery */ +} -/* ========================================================================= */ -/*.._tsIp2prLinkFind -- resolve an ip address to a ipoib link address. */ -static s32 _tsIp2prLinkFind(struct ip2pr_ipoib_wait *ipoib_wait) +/* ip2pr_link_find -- resolve an ip address to a ipoib link address. */ +static s32 ip2pr_link_find(struct ip2pr_ipoib_wait *ipoib_wait) { s32 result; u32 state; @@ -1172,7 +1125,7 @@ /* * Not Lookback. Get the Mac address from arp */ - result = tsIp2prArpQuery(ipoib_wait, &state); + result = ip2pr_arp_query(ipoib_wait, &state); if ((result) || (state & NUD_FAILED) || ((ipoib_wait->hw[0] == 0) && (ipoib_wait->hw[1] == 0) && @@ -1182,7 +1135,7 @@ /* * No arp entry. Create a Wait entry and send Arp request */ - result = _tsIp2prIpoibWaitListInsert(ipoib_wait); + result = ip2pr_ipoib_wait_list_insert(ipoib_wait); if (0 > result) { TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, @@ -1206,7 +1159,7 @@ /* * We have a valid arp entry or this is a loopback interface. */ - result = _tsIp2prLinkFindComplete(ipoib_wait, 0, 1); + result = ip2pr_link_find_complete(ipoib_wait, 0, 1); if (0 > result) { TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_WARN, @@ -1218,16 +1171,14 @@ error: return result; -} /* _tsIp2prLinkFind */ +} -/* --------------------------------------------------------------------- */ -/* */ -/* Arp packet reception for completions */ -/* */ -/* --------------------------------------------------------------------- */ -/* ========================================================================= */ -/*.._tsIp2prArpRecvComplete -- receive all ARP packets. */ -static void _tsIp2prArpRecvComplete(void *arg) +/** + * Arp packet reception for completions + */ + +/* ip2pr_arp_recv_complete -- receive all ARP packets. 
*/ +static void ip2pr_arp_recv_complete(void *arg) { struct ip2pr_ipoib_wait *ipoib_wait; struct ip2pr_ipoib_wait *next_wait; @@ -1257,32 +1208,28 @@ TS_IP2PR_IPOIB_FLAG_CLR_TASK(ipoib_wait); - result = _tsIp2prLinkFindComplete(ipoib_wait, 0, 0); + result = ip2pr_link_find_complete(ipoib_wait, 0, 0); if (0 > result) { - TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_WARN, "FIND: Error <%d> completing address lookup. <%08x>", result, ipoib_wait->dst_addr); - result = - _tsIp2prIpoibWaitDestroy(ipoib_wait, - IP2PR_LOCK_HELD); + result = ip2pr_ipoib_wait_destroy(ipoib_wait, + IP2PR_LOCK_HELD); TS_EXPECT(MOD_IP2PR, !(0 > result)); - } /* if */ + } } - /* if */ ipoib_wait = next_wait; } /* while */ spin_unlock_irqrestore(&_tsIp2prLinkRoot.wait_lock, flags); return; -} /* _tsIp2prArpRecvComplete */ +} -/* ========================================================================= */ -/*.._tsIp2prArpRecv -- receive all ARP packets. */ -static s32 _tsIp2prArpRecv(struct sk_buff *skb, struct net_device *dev, - struct packet_type *pt) +/* ip2pr_arp_recv -- receive all ARP packets. */ +static s32 ip2pr_arp_recv(struct sk_buff *skb, struct net_device *dev, + struct packet_type *pt) { struct ip2pr_ipoib_wait *ipoib_wait; struct ip2pr_ipoib_arp *arp_hdr; @@ -1353,7 +1300,7 @@ * Schedule the ARP completion. 
*/ if (0 < counter) { - INIT_WORK(tqp, _tsIp2prArpRecvComplete, + INIT_WORK(tqp, ip2pr_arp_recv_complete, (void *)(unsigned long)arp_hdr->src_ip); schedule_work(tqp); @@ -1362,11 +1309,10 @@ done: kfree_skb(skb); return 0; -} /* _tsIp2prArpRecv */ +} -/* ========================================================================= */ -/*.._tsIp2prAsyncEventFunc -- IB async event handler, for clearing caches */ -static void _tsIp2prAsyncEventFunc(struct ib_async_event_record *record, +/* ip2pr_async_event_func -- IB async event handler, for clearing caches */ +static void ip2pr_async_event_func(struct ib_async_event_record *record, void *arg) { struct ip2pr_path_element *path_elmt; @@ -1389,7 +1335,7 @@ */ while (NULL != (path_elmt = _tsIp2prLinkRoot.path_list)) { - result = _tsIp2prPathElementDestroy(path_elmt); + result = ip2pr_path_element_destroy(path_elmt); TS_EXPECT(MOD_IP2PR, !(0 > result)); } /* while */ @@ -1421,7 +1367,7 @@ } /* clear the Gid pr cache */ while (NULL != (prn_elmt = sgid_elmt->pr_list)) { - _tsIp2PrnDelete(prn_elmt); + ip2pr_delete(prn_elmt); } break; } @@ -1433,11 +1379,10 @@ record->device, record->modifier.port, record->event); return; -} /* _tsIp2prAsyncEventFunc */ +} -/* ========================================================================= */ -/*.._tsIp2prPathSweepTimerFunc --sweep path cache to reap old entries. */ -static void _tsIp2prPathSweepTimerFunc(void *arg) +/* ip2pr_path_sweep_timer_func -- sweep path cache to reap old entries. 
*/ +static void ip2pr_path_sweep_timer_func(void *arg) { struct ip2pr_path_element *path_elmt; struct ip2pr_path_element *next_elmt; @@ -1462,7 +1407,7 @@ path_elmt->usage, jiffies, htonl(path_elmt->dst_addr)); - result = _tsIp2prPathElementDestroy(path_elmt); + result = ip2pr_path_element_destroy(path_elmt); TS_EXPECT(MOD_IP2PR, !(0 > result)); } /* if */ @@ -1484,7 +1429,7 @@ TRACE_FLOW_INOUT, "GID: Deleting old <%u:%u>.", prn_elmt->usage, jiffies); - _tsIp2PrnDelete(prn_elmt); + ip2pr_delete(prn_elmt); } prn_elmt = next_prn; } @@ -1499,18 +1444,16 @@ tsKernelTimerAdd(&_tsIp2prPathTimer); return; -} /* _tsIp2prPathSweepTimerFunc */ +} -/* --------------------------------------------------------------------- */ -/* */ -/* Path record lookup functions */ -/* */ -/* --------------------------------------------------------------------- */ -/* ========================================================================= */ -/*..tsSdpPathRecordLookup -- resolve an ip address to a path record */ -s32 tsIp2prPathRecordLookup(u32 dst_addr, u32 src_addr, u8 localroute, - s32 bound_dev_if, tIP2PR_PATH_LOOKUP_FUNC func, - void *arg, tIP2PR_PATH_LOOKUP_ID * plid) +/** + * Path record lookup functions + */ + +/* ip2pr_path_record_lookup -- resolve an ip address to a path record */ +s32 ip2pr_path_record_lookup(u32 dst_addr, u32 src_addr, u8 localroute, + s32 bound_dev_if, tIP2PR_PATH_LOOKUP_FUNC func, + void *arg, tIP2PR_PATH_LOOKUP_ID * plid) { struct ip2pr_path_element *path_elmt; struct ip2pr_ipoib_wait *ipoib_wait; @@ -1526,15 +1469,15 @@ /* * perform a lookup to see if a path element structure exists. */ - path_elmt = _tsIp2prPathElementLookup(dst_addr); + path_elmt = ip2pr_path_element_lookup(dst_addr); if (NULL != path_elmt) { /* * update last used time. 
*/ path_elmt->usage = jiffies; - result = - _tsIp2prPathLookupComplete(*plid, 0, path_elmt, func, arg); + result = ip2pr_path_lookup_complete(*plid, 0, path_elmt, func, + arg); if (0 > result) { TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, @@ -1545,7 +1488,7 @@ } /* if */ else { - ipoib_wait = _tsIp2prIpoibWaitCreate(*plid, + ipoib_wait = ip2pr_ipoib_wait_create(*plid, dst_addr, src_addr, localroute, @@ -1560,16 +1503,15 @@ } /* if */ ip2pr_total_req++; - result = _tsIp2prLinkFind(ipoib_wait); + result = ip2pr_link_find(ipoib_wait); if (0 > result) { TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, "PATH: Error <%d> starting address resolution.", result); - expect = - _tsIp2prIpoibWaitDestroy(ipoib_wait, - IP2PR_LOCK_NOT_HELD); + expect = ip2pr_ipoib_wait_destroy(ipoib_wait, + IP2PR_LOCK_NOT_HELD); TS_EXPECT(MOD_IP2PR, !(0 > expect)); goto error; @@ -1579,11 +1521,11 @@ return 0; error: return result; -} /* tsIp2prPathRecordLookup */ +} +EXPORT_SYMBOL(ip2pr_path_record_lookup); -/* ========================================================================= */ -/*..tsIp2prPathRecordCancel -- cancel a lookup for an address. */ -s32 tsIp2prPathRecordCancel(tIP2PR_PATH_LOOKUP_ID plid) +/* ip2pr_path_record_cancel -- cancel a lookup for an address. */ +s32 ip2pr_path_record_cancel(tIP2PR_PATH_LOOKUP_ID plid) { struct ip2pr_ipoib_wait *ipoib_wait; s32 result; @@ -1593,7 +1535,7 @@ return -ERANGE; } /* if */ - ipoib_wait = tsIp2prIpoibWaitPlidLookup(plid); + ipoib_wait = ip2pr_ipoib_wait_plid_lookup(plid); if (NULL == ipoib_wait) { return -ENOENT; @@ -1619,15 +1561,16 @@ if (0 < TS_IP2PR_IPOIB_FLAGS_EMPTY(ipoib_wait)) { result = - _tsIp2prIpoibWaitDestroy(ipoib_wait, IP2PR_LOCK_NOT_HELD); + ip2pr_ipoib_wait_destroy(ipoib_wait, IP2PR_LOCK_NOT_HELD); TS_EXPECT(MOD_IP2PR, !(0 > result)); } /* if */ return 0; -} /* tsIp2prPathRecordCancel */ +} +EXPORT_SYMBOL(ip2pr_path_record_cancel); -/*..tsGid2prCancel -- cancel a lookup for an address. 
*/ -s32 tsGid2prCancel(tIP2PR_PATH_LOOKUP_ID plid) +/* gid2pr_cancel -- cancel a lookup for an address. */ +s32 gid2pr_cancel(tIP2PR_PATH_LOOKUP_ID plid) { struct ip2pr_ipoib_wait *ipoib_wait; s32 result; @@ -1637,7 +1580,7 @@ return -ERANGE; } /* if */ - ipoib_wait = tsIp2prIpoibWaitPlidLookup(plid); + ipoib_wait = ip2pr_ipoib_wait_plid_lookup(plid); if (NULL == ipoib_wait) { return -ENOENT; @@ -1656,29 +1599,26 @@ TS_IP2PR_IPOIB_FLAG_CLR_TIME(ipoib_wait); if (0 < TS_IP2PR_IPOIB_FLAGS_EMPTY(ipoib_wait)) { - - result = - _tsIp2prIpoibWaitDestroy(ipoib_wait, IP2PR_LOCK_NOT_HELD); + result = ip2pr_ipoib_wait_destroy(ipoib_wait, + IP2PR_LOCK_NOT_HELD); TS_EXPECT(MOD_IP2PR, !(0 > result)); } - /* if */ return 0; -} /* tsGid2prCancel */ +} +EXPORT_SYMBOL(gid2pr_cancel); -/* --------------------------------------------------------------------- */ -/* */ -/* primary initialization/cleanup functions */ -/* */ -/* --------------------------------------------------------------------- */ +/** + * primary initialization/cleanup functions + */ + static struct packet_type _sdp_arp_type = { .type = __constant_htons(ETH_P_ARP), - .func = _tsIp2prArpRecv, + .func = ip2pr_arp_recv, .af_packet_priv = (void *)1, /* understand shared skbs */ }; -/* ========================================================================= */ -/*.._tsIp2prGidCacheLookup -- Lookup for GID in cache */ -s32 _tsIp2prGidCacheLookup(tTS_IB_GID src_gid, tTS_IB_GID dst_gid, +/* ip2pr_gid_cache_lookup -- Lookup for GID in cache */ +s32 ip2pr_gid_cache_lookup(tTS_IB_GID src_gid, tTS_IB_GID dst_gid, struct ib_path_record *path_record, struct ip2pr_sgid_element **gid_node) { @@ -1739,10 +1679,8 @@ return (-ENOENT); } -/* ========================================================================= */ -/*.._tsIp2prSrcGidNodeGet -- */ -s32 _tsIp2prSrcGidNodeGet(tTS_IB_GID src_gid, - struct ip2pr_sgid_element **gid_node) +s32 ip2pr_src_gid_node_get(tTS_IB_GID src_gid, + struct ip2pr_sgid_element **gid_node) { struct 
ip2pr_sgid_element *sgid_elmt; unsigned long flags; @@ -1763,18 +1701,16 @@ return (-EINVAL); } -/* ========================================================================= */ -/*.._tsIp2prGidElementAdd -- Add one node to Source GID List. */ -s32 _tsIp2prGidElementAdd(struct ip2pr_ipoib_wait *ipoib_wait, - struct ib_path_record *path_record) +/* ip2pr_gid_element_add -- Add one node to Source GID List. */ +static s32 ip2pr_gid_element_add(struct ip2pr_ipoib_wait *ipoib_wait, + struct ib_path_record *path_record) { unsigned long flags; struct ip2pr_sgid_element *gid_node = NULL; struct ip2pr_gid_pr_element *prn_elmt; - if (_tsIp2prSrcGidNodeGet(ipoib_wait->src_gid, &gid_node)) { + if (ip2pr_src_gid_node_get(ipoib_wait->src_gid, &gid_node)) return (-EINVAL); - } prn_elmt = kmem_cache_alloc(_tsIp2prLinkRoot.gid_pr_cache, SLAB_ATOMIC); if (NULL == prn_elmt) { @@ -1782,8 +1718,7 @@ "PATH: Error Allocating prn memory."); return (-ENOMEM); } - memcpy(&prn_elmt->path_record, path_record, - sizeof(*path_record)); + memcpy(&prn_elmt->path_record, path_record, sizeof(*path_record)); /* * Insert into the ccache list @@ -1803,7 +1738,7 @@ return (0); } -s32 _tsIp2PrnDelete(struct ip2pr_gid_pr_element *prn_elmt) +static s32 ip2pr_delete(struct ip2pr_gid_pr_element *prn_elmt) { if (NULL != prn_elmt->p_next) { @@ -1821,9 +1756,8 @@ return (0); } -/* ========================================================================= */ -/*.._tsIp2prSrcGidDelete -- Cleanup one node in Source GID List. */ -s32 _tsIp2prSrcGidDelete(struct ip2pr_sgid_element *sgid_elmt) +/* ip2pr_src_gid_delete -- Cleanup one node in Source GID List. 
*/ +static s32 ip2pr_src_gid_delete(struct ip2pr_sgid_element *sgid_elmt) { unsigned long flags; struct ip2pr_gid_pr_element *prn_elmt; @@ -1834,7 +1768,7 @@ * Clear Path Record List for this Source GID node */ while (NULL != (prn_elmt = sgid_elmt->pr_list)) { - _tsIp2PrnDelete(prn_elmt); + ip2pr_delete(prn_elmt); } /* while */ if (NULL != sgid_elmt->p_next) { @@ -1855,9 +1789,8 @@ return (0); } -/* ========================================================================= */ -/*.._tsIp2prSrcGidAdd -- Add one node to Source GID List. */ -s32 _tsIp2prSrcGidAdd(struct ib_device *hca_device, +/* ip2pr_src_gid_add -- Add one node to Source GID List. */ +s32 ip2pr_src_gid_add(struct ib_device *hca_device, tTS_IB_PORT port, enum ib_port_state port_state) { @@ -1904,11 +1837,10 @@ return (0); } -/* ========================================================================= */ -/*.._tsGid2prComplete -- path lookup complete, save result */ -static s32 _tsGid2prComplete(tTS_IB_CLIENT_QUERY_TID tid, s32 status, - struct ib_path_record *path, s32 remaining, - void *arg) +/* gid2pr_complete -- path lookup complete, save result */ +static s32 gid2pr_complete(tTS_IB_CLIENT_QUERY_TID tid, s32 status, + struct ib_path_record *path, s32 remaining, + void *arg) { s32 result; struct ip2pr_ipoib_wait *ipoib_wait = (struct ip2pr_ipoib_wait *) arg; @@ -1933,7 +1865,7 @@ TS_IB_PATH_RECORD_FORCE_REMOTE, TS_IP2PR_DEV_PATH_WAIT, 0, - _tsGid2prComplete, + gid2pr_complete, ipoib_wait, &ipoib_wait->tid); if (0 > result) { @@ -1951,7 +1883,7 @@ /* * Add to cache */ - _tsIp2prGidElementAdd(ipoib_wait, path); + ip2pr_gid_element_add(ipoib_wait, path); goto callback; break; @@ -1968,11 +1900,10 @@ return (0); } -/* ========================================================================= */ -/*..tsGid2prLookup -- Resolve a destination GD to Path Record */ -s32 tsGid2prLookup(tTS_IB_GID src_gid, tTS_IB_GID dst_gid, u16 pkey, - tGID2PR_LOOKUP_FUNC funcptr, void *arg, - tIP2PR_PATH_LOOKUP_ID * plid) +/* 
gid2pr_lookup -- Resolve a destination GD to Path Record */ +s32 gid2pr_lookup(tTS_IB_GID src_gid, tTS_IB_GID dst_gid, u16 pkey, + tGID2PR_LOOKUP_FUNC funcptr, void *arg, + tIP2PR_PATH_LOOKUP_ID * plid) { struct ip2pr_sgid_element *gid_node; s32 result; @@ -1990,7 +1921,7 @@ /* * Lookup cache first */ - if (0 == _tsIp2prGidCacheLookup(src_gid, + if (0 == ip2pr_gid_cache_lookup(src_gid, dst_gid, &path_record, &gid_node)) { func = (tGID2PR_LOOKUP_FUNC) funcptr; result = @@ -2011,11 +1942,7 @@ return (-EHOSTUNREACH); } - ipoib_wait = _tsIp2prIpoibWaitCreate(*plid, - 0, - 0, - 0, - 0, + ipoib_wait = ip2pr_ipoib_wait_create(*plid, 0, 0, 0, 0, (void *) funcptr, arg, LOOKUP_GID2PR); if (NULL == ipoib_wait) { @@ -2030,9 +1957,9 @@ memcpy(ipoib_wait->src_gid, src_gid, sizeof(src_gid)); memcpy(ipoib_wait->dst_gid, dst_gid, sizeof(dst_gid)); - result = _tsIp2prIpoibWaitListInsert(ipoib_wait); + result = ip2pr_ipoib_wait_list_insert(ipoib_wait); if (0 > result) { - _tsIp2prIpoibWaitDestroy(ipoib_wait, IP2PR_LOCK_NOT_HELD); + ip2pr_ipoib_wait_destroy(ipoib_wait, IP2PR_LOCK_NOT_HELD); return (result); } @@ -2047,7 +1974,7 @@ TS_IB_PATH_RECORD_FORCE_REMOTE, TS_IP2PR_DEV_PATH_WAIT, 0, - _tsGid2prComplete, + gid2pr_complete, ipoib_wait, &ipoib_wait->tid); if (0 != result) { TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, @@ -2056,17 +1983,17 @@ return (result); } +EXPORT_SYMBOL(gid2pr_lookup); -/* ========================================================================= */ -/*..tsIp2prSrcGidCleanup -- Cleanup the Source GID List. */ -s32 tsIp2prSrcGidCleanup(void) +/* ip2pr_src_gid_cleanup -- Cleanup the Source GID List. 
*/ +s32 ip2pr_src_gid_cleanup(void) { struct ip2pr_sgid_element *sgid_elmt; s32 result; while (NULL != (sgid_elmt = _tsIp2prLinkRoot.src_gid_list)) { - result = _tsIp2prSrcGidDelete(sgid_elmt); + result = ip2pr_src_gid_delete(sgid_elmt); TS_EXPECT(MOD_IP2PR, !(0 > result)); } /* while */ @@ -2076,9 +2003,8 @@ return (0); } -/* ========================================================================= */ -/*..tsIp2prSrcGidInit -- initialize the Source GID List. */ -s32 tsIp2prSrcGidInit(void) +/* ip2pr_src_gid_init -- initialize the Source GID List. */ +s32 ip2pr_src_gid_init(void) { s32 result = 0; int i, j; @@ -2128,9 +2054,8 @@ continue; } - result = - _tsIp2prSrcGidAdd(hca_device, j, - port_prop.port_state); + result = ip2pr_src_gid_add(hca_device, j, + port_prop.port_state); if (0 > result) { goto port_err; } @@ -2145,9 +2070,8 @@ return (result); } -/* ========================================================================= */ -/*..tsIp2prLinkAddrInit -- initialize the advertisment caches. */ -s32 tsIp2prLinkAddrInit(void) +/* ip2pr_link_addr_init -- initialize the advertisment caches. 
*/ +s32 ip2pr_link_addr_init(void) { s32 result = 0; struct ib_async_event_record evt_rec; @@ -2223,7 +2147,7 @@ evt_rec.device = hca_device; evt_rec.event = IB_PORT_ERROR; result = ib_async_event_handler_register(&evt_rec, - _tsIp2prAsyncEventFunc, + ip2pr_async_event_func, NULL, &_tsIp2prAsyncErrHandle [i]); @@ -2238,7 +2162,7 @@ evt_rec.device = hca_device; evt_rec.event = IB_PORT_ACTIVE; result = ib_async_event_handler_register(&evt_rec, - _tsIp2prAsyncEventFunc, + ip2pr_async_event_func, NULL, &_tsIp2prAsyncActHandle [i]); @@ -2256,7 +2180,7 @@ */ tsKernelTimerInit(&_tsIp2prPathTimer); _tsIp2prPathTimer.run_time = jiffies + TS_IP2PR_PATH_TIMER_INTERVAL; - _tsIp2prPathTimer.function = _tsIp2prPathSweepTimerFunc; + _tsIp2prPathTimer.function = ip2pr_path_sweep_timer_func; _tsIp2prPathTimer.arg = NULL; tsKernelTimerAdd(&_tsIp2prPathTimer); /* @@ -2293,11 +2217,10 @@ error_wait: error: return result; -} /* tsIp2prLinkAddrInit */ +} -/* ========================================================================= */ -/*..tsIp2prLinkAddrCleanup -- cleanup the advertisment caches. */ -s32 tsIp2prLinkAddrCleanup(void) +/* ip2pr_link_addr_cleanup -- cleanup the advertisment caches. 
*/ +s32 ip2pr_link_addr_cleanup(void) { struct ip2pr_path_element *path_elmt; struct ip2pr_ipoib_wait *ipoib_wait; @@ -2336,14 +2259,14 @@ */ while (NULL != (ipoib_wait = _tsIp2prLinkRoot.wait_list)) { - result = - _tsIp2prIpoibWaitDestroy(ipoib_wait, IP2PR_LOCK_NOT_HELD); + result = ip2pr_ipoib_wait_destroy(ipoib_wait, + IP2PR_LOCK_NOT_HELD); TS_EXPECT(MOD_IP2PR, !(0 > result)); } /* while */ while (NULL != (path_elmt = _tsIp2prLinkRoot.path_list)) { - result = _tsIp2prPathElementDestroy(path_elmt); + result = ip2pr_path_element_destroy(path_elmt); TS_EXPECT(MOD_IP2PR, !(0 > result)); } /* while */ /* @@ -2354,14 +2277,13 @@ kmem_cache_destroy(_tsIp2prLinkRoot.user_req); return 0; -} /* tsIp2prLinkAddrCleanup */ +} -/* ========================================================================= */ -/*..tsIp2prCbInternal -- Callback for IP to Path Record Lookup */ -static s32 _tsIp2prCbInternal(tIP2PR_PATH_LOOKUP_ID plid, s32 status, - u32 src_addr, u32 dst_addr, tTS_IB_PORT hw_port, - struct ib_device *ca, struct ib_path_record *path, - void *usr_arg) +/* ip2pr_cb_internal -- Callback for IP to Path Record Lookup */ +static s32 ip2pr_cb_internal(tIP2PR_PATH_LOOKUP_ID plid, s32 status, + u32 src_addr, u32 dst_addr, tTS_IB_PORT hw_port, + struct ib_device *ca, struct ib_path_record *path, + void *usr_arg) { struct ip2pr_user_req *ureq; @@ -2380,11 +2302,10 @@ return (0); } -/* ========================================================================= */ -/*..tsIp2prCbInternal -- Callback for Gid to Path Record Lookup */ -static s32 _tsGid2prCbInternal(tIP2PR_PATH_LOOKUP_ID plid, s32 status, - tTS_IB_PORT hw_port, struct ib_device *ca, - struct ib_path_record *path, void *usr_arg) +/* gid2pr_cb_internal -- Callback for Gid to Path Record Lookup */ +static s32 gid2pr_cb_internal(tIP2PR_PATH_LOOKUP_ID plid, s32 status, + tTS_IB_PORT hw_port, struct ib_device *ca, + struct ib_path_record *path, void *usr_arg) { struct ip2pr_user_req *ureq; @@ -2405,9 +2326,8 @@ return (0); 
} -/* ========================================================================= */ -/*..tsIp2prUserLookup -- Process a IP to Path Record lookup ioctl request */ -s32 _tsIp2prUserLookup(unsigned long arg) +/* ip2pr_user_lookup -- Process a IP to Path Record lookup ioctl request */ +s32 ip2pr_user_lookup(unsigned long arg) { struct ip2pr_user_req *ureq; struct ip2pr_lookup_param param; @@ -2432,8 +2352,8 @@ ureq->status = 0; sema_init(&ureq->sem, 0); - status = tsIp2prPathRecordLookup(param.dst_addr, 0, 0, 0, - _tsIp2prCbInternal, ureq, &plid); + status = ip2pr_path_record_lookup(param.dst_addr, 0, 0, 0, + ip2pr_cb_internal, ureq, &plid); if (status < 0) { kmem_cache_free(_tsIp2prLinkRoot.user_req, ureq); return (-EFAULT); @@ -2441,7 +2361,7 @@ status = down_interruptible(&ureq->sem); if (status) { - tsIp2prPathRecordCancel(plid); + ip2pr_path_record_cancel(plid); kmem_cache_free(_tsIp2prLinkRoot.user_req, ureq); return (-EINTR); } @@ -2458,9 +2378,8 @@ return (0); } -/* ========================================================================= */ -/*..tsGid2prUserLookup -- Process a Gid to Path Record lookup ioctl request */ -s32 _tsGid2prUserLookup(unsigned long arg) +/* gid2pr_user_lookup -- Process a Gid to Path Record lookup ioctl request */ +s32 gid2pr_user_lookup(unsigned long arg) { struct ip2pr_user_req *ureq; struct gid2pr_lookup_param param, *upa; @@ -2486,8 +2405,8 @@ ureq->status = 0; sema_init(&ureq->sem, 0); - status = tsGid2prLookup(param.src_gid, param.dst_gid, param.pkey, - _tsGid2prCbInternal, (void *) ureq, &plid); + status = gid2pr_lookup(param.src_gid, param.dst_gid, param.pkey, + gid2pr_cb_internal, (void *) ureq, &plid); if (status < 0) { kmem_cache_free(_tsIp2prLinkRoot.user_req, ureq); return (-EFAULT); @@ -2495,7 +2414,7 @@ status = down_interruptible(&ureq->sem); if (status) { - tsGid2prCancel(plid); + gid2pr_cancel(plid); kmem_cache_free(_tsIp2prLinkRoot.user_req, ureq); return (-EINTR); } @@ -2514,8 +2433,3 @@ return (0); } - 
-EXPORT_SYMBOL(tsIp2prPathRecordLookup); -EXPORT_SYMBOL(tsIp2prPathRecordCancel); -EXPORT_SYMBOL(tsGid2prLookup); -EXPORT_SYMBOL(tsGid2prCancel); Index: drivers/infiniband/ulp/ipoib/ip2pr_proc.c =================================================================== --- drivers/infiniband/ulp/ipoib/ip2pr_proc.c (revision 622) +++ drivers/infiniband/ulp/ipoib/ip2pr_proc.c (working copy) @@ -26,67 +26,37 @@ static const char _dir_name_root[] = TS_IP2PR_PROC_DIR_NAME; static struct proc_dir_entry *_dir_root = NULL; -extern s32 tsIp2prPathElementTableDump(char *buffer, - s32 max_size, - s32 start_index, - long *end_index); -extern s32 tsIp2prIpoibWaitTableDump(char *buffer, - s32 max_size, - s32 start_index, - long *end_index); -extern s32 tsIp2prProcRetriesRead(char *buffer, - s32 max_size, - s32 start_index, +extern s32 ip2pr_path_element_table_dump(char *buffer, s32 max_size, + s32 start_index, long *end_index); +extern s32 ip2pr_ipoib_wait_table_dump(char *buffer, s32 max_size, + s32 start_index, long *end_index); +extern s32 ip2pr_proc_retries_read(char *buffer, s32 max_size, s32 start_index, + long *end_index); +extern s32 ip2pr_proc_timeout_read(char *buffer, s32 max_size, s32 start_index, + long *end_index); +extern s32 ip2pr_proc_backoff_read(char *buffer, s32 max_size, s32 start_index, + long *end_index); +extern s32 ip2pr_proc_cache_timeout_read(char *buffer, s32 max_size, + s32 start_index, long *end_index); +extern int ip2pr_proc_retries_write(struct file *file, const char *buffer, + unsigned long count, void *pos); +extern int ip2pr_proc_timeout_write(struct file *file, const char *buffer, + unsigned long count, void *pos); +extern int ip2pr_proc_backoff_write(struct file *file, const char *buffer, + unsigned long count, void *pos); +extern int ip2pr_proc_cache_timeout_write(struct file *file, const char *buffer, + unsigned long count, void *pos); +extern int ip2pr_proc_total_req(char *buffer, s32 max_size, s32 start_index, + long *end_index); +extern int 
ip2pr_proc_arp_timeout(char *buffer, s32 max_size, s32 start_index, long *end_index); -extern s32 tsIp2prProcTimeoutRead(char *buffer, - s32 max_size, - s32 start_index, - long *end_index); -extern s32 tsIp2prProcBackoffRead(char *buffer, - s32 max_size, - s32 start_index, - long *end_index); -extern s32 tsIp2prProcCacheTimeoutRead(char *buffer, - s32 max_size, - s32 start_index, - long *end_index); -extern int tsIp2prProcRetriesWrite(struct file *file, - const char *buffer, - unsigned long count, - void *pos); -extern int tsIp2prProcTimeoutWrite(struct file *file, - const char *buffer, - unsigned long count, - void *pos); -extern int tsIp2prProcBackoffWrite(struct file *file, - const char *buffer, - unsigned long count, - void *pos); -extern int tsIp2prProcCacheTimeoutWrite(struct file *file, - const char *buffer, - unsigned long count, - void *pos); - -extern int tsIp2prProcTotalReq(char *buffer, - s32 max_size, - s32 start_index, - long *end_index); -extern int tsIp2prProcArpTimeout(char *buffer, - s32 max_size, - s32 start_index, +extern int ip2pr_proc_path_timeout(char *buffer, s32 max_size, s32 start_index, + long *end_index); +extern int ip2pr_proc_total_fail(char *buffer, s32 max_size, s32 start_index, long *end_index); -extern int tsIp2prProcPathTimeout(char *buffer, - s32 max_size, - s32 start_index, - long *end_index); -extern int tsIp2prProcTotalFail(char *buffer, - s32 max_size, - s32 start_index, - long *end_index); -/* ========================================================================= */ -/*.._tsIp2prProcReadParse -- read function for the injection table */ -static s32 _tsIp2prProcReadParse(char *page, char **start, off_t offset, +/* ip2pr_proc_read_parse -- read function for the injection table */ +static s32 ip2pr_proc_read_parse(char *page, char **start, off_t offset, s32 count, s32 *eof, void *data) { struct ip2pr_proc_sub_entry *sub_entry = @@ -116,64 +86,63 @@ } /* if */ return size; -} /* _tsIp2prProcReadParse */ +} static struct 
ip2pr_proc_sub_entry _file_entry_list[TS_IP2PR_PROC_ENTRIES] = { {entry:NULL, type:TS_IP2PR_PROC_ENTRY_ARP_WAIT, name:"arp_wait", - read:tsIp2prIpoibWaitTableDump, + read:ip2pr_ipoib_wait_table_dump, write:NULL}, {entry:NULL, type:TS_IP2PR_PROC_ENTRY_PATH_TABLE, name:"path_cache", - read:tsIp2prPathElementTableDump, + read:ip2pr_path_element_table_dump, write:NULL}, {entry:NULL, type:TS_IP2PR_PROC_ENTRY_MAX_RETRIES, name:"retries", - read:tsIp2prProcRetriesRead, - write:tsIp2prProcRetriesWrite}, + read:ip2pr_proc_retries_read, + write:ip2pr_proc_retries_write}, {entry:NULL, type:TS_IP2PR_PROC_ENTRY_TIMEOUT, name:"timeout", - read:tsIp2prProcTimeoutRead, - write:tsIp2prProcTimeoutWrite}, + read:ip2pr_proc_timeout_read, + write:ip2pr_proc_timeout_write}, {entry:NULL, type:TS_IP2PR_PROC_ENTRY_BACKOUT, name:"backoff", - read:tsIp2prProcBackoffRead, - write:tsIp2prProcBackoffWrite}, + read:ip2pr_proc_backoff_read, + write:ip2pr_proc_backoff_write}, {entry:NULL, type:TS_IP2PR_PROC_ENTRY_CACHE_TIMEOUT, name:"cache_timeout", - read:tsIp2prProcCacheTimeoutRead, - write:tsIp2prProcCacheTimeoutWrite}, + read:ip2pr_proc_cache_timeout_read, + write:ip2pr_proc_cache_timeout_write}, {entry:NULL, type:TS_IP2PR_PROC_ENTRY_TOTAL_REQ, name:"total_req", - read:tsIp2prProcTotalReq, + read:ip2pr_proc_total_req, write:NULL}, {entry:NULL, type:TS_IP2PR_PROC_ENTRY_ARP_TIMEOUT, name:"arp_timeout", - read:tsIp2prProcArpTimeout, + read:ip2pr_proc_arp_timeout, write:NULL}, {entry:NULL, type:TS_IP2PR_PROC_ENTRY_PATH_TIMEOUT, name:"path_timeout", - read:tsIp2prProcPathTimeout, + read:ip2pr_proc_path_timeout, write:NULL}, {entry:NULL, type:TS_IP2PR_PROC_ENTRY_TOTAL_FAIL, name:"total_fail", - read:tsIp2prProcTotalFail, + read:ip2pr_proc_total_fail, write:NULL} }; -/* ========================================================================= */ -/*..tsIp2prProcFsCleanup -- cleanup the proc filesystem entries */ -s32 tsIp2prProcFsCleanup(void) +/* ip2pr_proc_fs_cleanup -- cleanup the proc filesystem 
entries */ +s32 ip2pr_proc_fs_cleanup(void) { struct ip2pr_proc_sub_entry *sub_entry; s32 counter; @@ -200,11 +169,10 @@ "PROC: /proc filesystem cleanup complete."); return 0; -} /* tsIp2prProcFsCleanup */ +} -/* ========================================================================= */ -/*..tsIp2prProcFsInit -- initialize the proc filesystem entries */ -s32 tsIp2prProcFsInit(void) +/* ip2pr_proc_fs_init -- initialize the proc filesystem entries */ +s32 ip2pr_proc_fs_init(void) { struct ip2pr_proc_sub_entry *sub_entry; s32 result; @@ -262,7 +230,7 @@ goto error; } /* if */ - sub_entry->entry->read_proc = _tsIp2prProcReadParse; + sub_entry->entry->read_proc = ip2pr_proc_read_parse; sub_entry->entry->write_proc = sub_entry->write; sub_entry->entry->data = sub_entry; sub_entry->entry->owner = THIS_MODULE; @@ -270,6 +238,6 @@ return 0; /* success */ error: - (void)tsIp2prProcFsCleanup(); + (void)ip2pr_proc_fs_cleanup(); return result; -} /* tsIp2prProcFsInit */ +} Index: drivers/infiniband/ulp/ipoib/ip2pr_mod.c =================================================================== --- drivers/infiniband/ulp/ipoib/ip2pr_mod.c (revision 622) +++ drivers/infiniband/ulp/ipoib/ip2pr_mod.c (working copy) @@ -27,53 +27,45 @@ MODULE_DESCRIPTION("IB path record lookup module"); MODULE_LICENSE("Dual BSD/GPL"); -extern s32 tsIp2prLinkAddrInit(void - ); -extern s32 tsIp2prLinkAddrCleanup(void - ); -extern s32 _tsIp2prUserLookup(unsigned long arg); -extern s32 _tsGid2prUserLookup(unsigned long arg); -extern s32 tsIp2prProcFsInit(void - ); -extern s32 tsIp2prProcFsCleanup(void - ); -extern s32 tsIp2prSrcGidInit(void - ); -extern s32 tsIp2prSrcGidCleanup(void - ); +extern s32 ip2pr_link_addr_init(void); +extern s32 ip2pr_link_addr_cleanup(void); +extern s32 ip2pr_user_lookup(unsigned long arg); +extern s32 gid2pr_user_lookup(unsigned long arg); +extern s32 ip2pr_proc_fs_init(void); +extern s32 ip2pr_proc_fs_cleanup(void); +extern s32 ip2pr_src_gid_init(void); +extern s32 
ip2pr_src_gid_cleanup(void); static int ip2pr_major_number = 240; -static int _tsIp2prOpen(struct inode *inode, struct file *fp); -static int _tsIp2prClose(struct inode *inode, struct file *fp); -static int _tsIp2prIoctl(struct inode *inode, struct file *fp, unsigned int cmd, +static int ip2pr_open(struct inode *inode, struct file *fp); +static int ip2pr_close(struct inode *inode, struct file *fp); +static int ip2pr_ioctl(struct inode *inode, struct file *fp, unsigned int cmd, unsigned long arg); static struct file_operations ip2pr_fops = { .owner = THIS_MODULE, - .ioctl = _tsIp2prIoctl, - .open = _tsIp2prOpen, - .release = _tsIp2prClose, + .ioctl = ip2pr_ioctl, + .open = ip2pr_open, + .release = ip2pr_close, }; -/* ========================================================================= */ -/*..tsIp2prOpen -- Driver Open Entry Point */ -static int _tsIp2prOpen(struct inode *inode, struct file *fp) { +/* ip2pr_open -- Driver Open Entry Point */ +static int ip2pr_open(struct inode *inode, struct file *fp) +{ TS_ENTER(MOD_IP2PR); return 0; } -/* ========================================================================= */ -/*..tsIp2prClose -- Driver Close Entry Point */ -static int _tsIp2prClose(struct inode *inode, struct file *fp) { +/* ip2pr_close -- Driver Close Entry Point */ +static int ip2pr_close(struct inode *inode, struct file *fp) { TS_ENTER(MOD_IP2PR); return 0; } -/* ========================================================================= */ -/*..tsIp2prIoctl -- Driver Ioctl Entry Point */ -static int _tsIp2prIoctl - (struct inode *inode, - struct file *fp, unsigned int cmd, unsigned long arg) { +/* ip2pr_ioctl -- Driver Ioctl Entry Point */ +static int ip2pr_ioctl(struct inode *inode, struct file *fp, unsigned int cmd, + unsigned long arg) +{ int result; if (_IOC_TYPE(cmd) != IP2PR_IOC_MAGIC) { @@ -82,10 +74,10 @@ switch (cmd) { case IP2PR_IOC_LOOKUP_REQ: - result = _tsIp2prUserLookup(arg); + result = ip2pr_user_lookup(arg); break; case 
GID2PR_IOC_LOOKUP_REQ: - result = _tsGid2prUserLookup(arg); + result = gid2pr_user_lookup(arg); break; default: result = -EINVAL; @@ -94,15 +86,13 @@ return (result); } -/* --------------------------------------------------------------------- */ -/* */ -/* Path Record lookup host module load/unload functions */ -/* */ -/* --------------------------------------------------------------------- */ -/* ========================================================================= */ -/*..prlookup_init -- initialize the PathRecord Lookup host module */ -int __init tsIp2prDriverInitModule(void - ) { +/** + * Path Record lookup host module load/unload functions + */ + +/* ip2pr_driver_init_module -- initialize the PathRecord Lookup host module */ +int __init ip2pr_driver_init_module(void) +{ s32 result = 0; TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_INOUT, @@ -117,26 +107,26 @@ if (ip2pr_major_number == 0) ip2pr_major_number = result; - result = tsIp2prProcFsInit(); + result = ip2pr_proc_fs_init(); if (0 > result) { TS_REPORT_FATAL(MOD_IP2PR, "Init: Error creating proc entries"); unregister_chrdev(ip2pr_major_number, IP2PR_DEVNAME); return (result); } - result = tsIp2prLinkAddrInit(); + result = ip2pr_link_addr_init(); if (0 > result) { TS_REPORT_FATAL(MOD_IP2PR, "Device resource allocation failed"); - (void)tsIp2prProcFsCleanup(); + (void)ip2pr_proc_fs_cleanup(); unregister_chrdev(ip2pr_major_number, IP2PR_DEVNAME); return (result); } - result = tsIp2prSrcGidInit(); + result = ip2pr_src_gid_init(); if (0 > result) { TS_REPORT_FATAL(MOD_IP2PR, "Gid resource allocation failed"); - (void)tsIp2prLinkAddrCleanup(); - (void)tsIp2prProcFsCleanup(); + (void)ip2pr_link_addr_cleanup(); + (void)ip2pr_proc_fs_cleanup(); unregister_chrdev(ip2pr_major_number, IP2PR_DEVNAME); return (result); } @@ -144,8 +134,8 @@ return (result); } -static void __exit tsIp2prDriverCleanupModule(void - ) { +static void __exit ip2pr_driver_cleanup_module(void) +{ TS_TRACE(MOD_IP2PR, T_VERBOSE, 
TRACE_FLOW_INOUT, "INIT: Path Record Lookup module load."); @@ -156,17 +146,17 @@ /* * Src Gid Cleanup */ - (void)tsIp2prSrcGidCleanup(); + (void)ip2pr_src_gid_cleanup(); /* * link level addressing services. */ - (void)tsIp2prLinkAddrCleanup(); + (void)ip2pr_link_addr_cleanup(); /* * proc tables */ - (void)tsIp2prProcFsCleanup(); + (void)ip2pr_proc_fs_cleanup(); } -module_init(tsIp2prDriverInitModule); -module_exit(tsIp2prDriverCleanupModule); +module_init(ip2pr_driver_init_module); +module_exit(ip2pr_driver_cleanup_module); Index: drivers/infiniband/ulp/sdp/sdp_event.c =================================================================== --- drivers/infiniband/ulp/sdp/sdp_event.c (revision 622) +++ drivers/infiniband/ulp/sdp/sdp_event.c (working copy) @@ -686,7 +686,7 @@ /* * cancel address resolution */ - result = tsIp2prPathRecordCancel(conn->plid); + result = ip2pr_path_record_cancel(conn->plid); TS_EXPECT(MOD_LNX_SDP, !(0 > result)); /* * fall through Index: drivers/infiniband/ulp/sdp/sdp_post.c =================================================================== --- drivers/infiniband/ulp/sdp/sdp_post.c (revision 622) +++ drivers/infiniband/ulp/sdp/sdp_post.c (working copy) @@ -265,12 +265,12 @@ TS_SDP_CONN_HOLD(conn); TS_SDP_CONN_UNLOCK(conn); - result = tsIp2prPathRecordLookup(htonl(conn->dst_addr), - htonl(conn->src_addr), - TS_SDP_OS_SK_LOCALROUTE(conn->sk), - TS_SDP_OS_SK_BOUND_IF(conn->sk), - _sdp_ip2pr_path_complete, - conn, &conn->plid); + result = ip2pr_path_record_lookup(htonl(conn->dst_addr), + htonl(conn->src_addr), + TS_SDP_OS_SK_LOCALROUTE(conn->sk), + TS_SDP_OS_SK_BOUND_IF(conn->sk), + _sdp_ip2pr_path_complete, + conn, &conn->plid); TS_SDP_CONN_LOCK(conn); if (0 > result) {
From mshefty at ichips.intel.com Tue Aug 10 10:42:48 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 10 Aug 2004 10:42:48 -0700 Subject: [openib-general] [PATCH] TODO list for access layer Message-ID: <20040810104248.4b39f6e9.mshefty@ichips.intel.com> Added a TODO list for the access layer. Removed TODO statements. One minor update to ib_rmpp_hdr based on feedback from Hal. Index: ib_verbs.h =================================================================== --- ib_verbs.h (revision 620) +++ ib_verbs.h (working copy) @@ -787,7 +787,6 @@ return fmr->device->map_phys_fmr(fmr, phys_buf_array, num_phys_buf); } -/* Need to discuss this... */ static inline int ib_unmap_fmr(struct ib_fmr **fmr_array, int fmr_cnt) { Index: ib_mad.h =================================================================== --- ib_mad.h (revision 620) +++ ib_mad.h (working copy) @@ -53,7 +53,7 @@ struct ib_rmpp_hdr { u8 rmpp_version; u8 rmpp_type; - u8 rmpp_flags; + u8 rmpp_rtime_flags; u8 rmpp_status; u32 seg_num; u32 paylen_newwin; @@ -134,11 +134,6 @@ * @remote_qkey - Specifies the qkey used by remote QP. * @pkey_index - Pkey index to use. Required when sending on QP1 only. */ -/* XXX See about using ib_send_wr directly, e.g.: context -> wr_id mad_flags -> send_flags add new timeout_ms field or double use of imm_data -*/ struct ib_mad_send_wr { struct list_head list; void *context; @@ -184,7 +179,6 @@ * * An RMPP receive will be coalesced into a single data buffer. */ -/* XXX revisit possibility of zero-copy receive */ struct ib_mad_recv_wc { void *context; struct ib_grh *grh; @@ -209,9 +203,6 @@ * @method_mask - The caller will receive unsolicited MADs for any method * where @method_mask = 1. */ -/* XXX Need to extend to support snooping - perhaps registration type - with masks for the class, version, methods if type is 'view-only'?
-*/ struct ib_mad_reg_req { u8 mgmt_class; u8 mgmt_class_version; @@ -260,7 +251,6 @@ * @mad_agent - Specifies the associated registration to post the send to. * @mad_send_wr - Specifies the information needed to send the MAD. */ -/* XXX Need to define queuing model - above or below API? */ int ib_mad_post_send(struct ib_mad_agent *mad_agent, struct ib_mad_send_wr *mad_send_wr); @@ -280,7 +270,6 @@ * on user-owned QPs. After calling this routine, users may send * MADs on the specified QP by calling ib_mad_post_send. */ -/* XXX Need to define provided features for requestor-side redirecting */ struct ib_mad_agent *ib_mad_qp_redir(struct ib_qp *qp, u8 rmpp_version, ib_mad_send_handler send_handler, Index: TODO =================================================================== --- TODO (revision 0) +++ TODO (revision 0) @@ -0,0 +1,31 @@ +Verbs TODOs: + + - Ensure ib_mod_qp can change all QP parameters - check resize. + - Determine proper value for static_rate - match CM or inter-packet + delay. Can an abstracted value be easier to use? + - Need to define struct ib_mw_bind. + - Optional calls need checks before invoking device driver. + - Migrate non-speed path routines into .c file. + - Add comments for API. + - Should ib_unmap_fmr take fmr_array as input, or just fmr? + What should the restriction on the fmr_array be? All from same + device? + +MAD TODOs: + - Need to define queuing model for ib_mad_post_send. + Should queuing be above or below access layer? + Does this affect starvation? + - Should RMPP sends post one or multiple work requests? + - See about combining ib_mad_send_wr with ib_send_wr. + context is wr_id. + mad_flags can combine with send_flags - useful for redirected QPs. + timeout_ms does not map - could use imm_data or add new field. + - Examine methods to support zero-copy for received RMPP MADs. + - Need to extend ib_mad_reg_req to support snooping. + could use masks for class, version, methods to snoop all. 
+ could add a registration type - receive MADs vs. view MADs. + registration for view MADs would need to apply to sends as well. + - Need to define features provided by access layer for requestor + side redirection. + - Should clients have ability to reserve QP entries (or at least one)? + From mshefty at ichips.intel.com Tue Aug 10 11:04:05 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 10 Aug 2004 11:04:05 -0700 Subject: [openib-general] RNR timer enum? In-Reply-To: <524qna7t5s.fsf@topspin.com> References: <524qna7t5s.fsf@topspin.com> Message-ID: <20040810110405.40313f9c.mshefty@ichips.intel.com> On Tue, 10 Aug 2004 11:19:11 -0700 Roland Dreier wrote: > Do we want something in ib_verbs.h like the below enum to make setting > min_rnr_timer values easier? > > enum ib_rnr_timeout { > IB_RNR_TIMER_655_36 = 0, > IB_RNR_TIMER_000_01 = 1, I think it makes sense to add these. I can copy these into ib_verbs.h... From mshefty at ichips.intel.com Tue Aug 10 11:17:19 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 10 Aug 2004 11:17:19 -0700 Subject: [openib-general] ib_get_special_qp In-Reply-To: <52vffq6dus.fsf@topspin.com> References: <52vffq6dus.fsf@topspin.com> Message-ID: <20040810111719.1e7398d5.mshefty@ichips.intel.com> On Tue, 10 Aug 2004 11:35:07 -0700 Roland Dreier wrote: > Is there any point in having the > > enum ib_qp_type qp_type, > > parameter to ib_get_special_qp(), since struct ib_qp_init_attr has > > enum ib_qp_type qp_type; > > anyway? I think this is just a remnant from having separated qp_type and spl_qp_type. I will remove. > (In fact if we're willing to add a port member to struct > ib_qp_init_attr, we could get rid of ib_get_special_qp() and just have > ib_create_qp() handle special QPs as well, although I'm not sure it's worth > it since both the low-level driver code and the ULP code for special > QPs is probably pretty different from the code for regular QPs) Hmm... I think this may be worth doing. 
At least from the access layer perspective, there's likely to be overlap in the code (if not exactly the same). Plus, I'm not sure that we want to expose QP0/1 above the access layer. (Does anything use the raw QP types, or ever plan on using them?) From halr at voltaire.com Tue Aug 10 12:27:37 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 10 Aug 2004 15:27:37 -0400 Subject: [openib-general] ib_get_special_qp In-Reply-To: <20040810111719.1e7398d5.mshefty@ichips.intel.com> References: <52vffq6dus.fsf@topspin.com> <20040810111719.1e7398d5.mshefty@ichips.intel.com> Message-ID: <1092166059.1840.93.camel@localhost.localdomain> On Tue, 2004-08-10 at 14:17, Sean Hefty wrote: Does anything use the raw QP types, or ever plan on using them? Raw datagram transports (ethertype and IPv6) are orphaned parts of IBA. I see no need for their support in OpenIB. -- Hal From mshefty at ichips.intel.com Tue Aug 10 11:28:18 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 10 Aug 2004 11:28:18 -0700 Subject: [openib-general] remote_atomic_flags? In-Reply-To: <52r7qe6dkk.fsf@topspin.com> References: <52r7qe6dkk.fsf@topspin.com> Message-ID: <20040810112818.02d71637.mshefty@ichips.intel.com> On Tue, 10 Aug 2004 11:41:15 -0700 Roland Dreier wrote: > What is supposed to be filled in the remote_atomic_flags member of > struct ib_qp_attr? Would it make sense to make the type of that field > be an enum instead of just int (so it's a little bit more > self-documenting)? Something is missing there. I believe this is supposed to indicate whether RDMA reads, writes, and/or atomics are enabled. I think flags (or a bit field) matches the rest of the API, but at least an enum is needed for the flag settings. I will update. 
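The flags-plus-enum scheme Sean describes for qp_access_flags can be sketched as follows. This is an illustrative fragment, not the committed ib_verbs.h API: the enum values mirror the IB_QP_REMOTE_* bits proposed in the patch later in this thread, but the names and the helper are hypothetical.

```c
#include <assert.h>

/* Hypothetical flag bits mirroring the proposed enum; the final names
 * and values are set by the ib_verbs.h patch, not fixed here. */
enum qp_access_flags {
	QP_ACCESS_REMOTE_WRITE  = (1 << 1),
	QP_ACCESS_REMOTE_READ   = (1 << 2),
	QP_ACCESS_REMOTE_ATOMIC = (1 << 3)
};

/* Test whether a given remote-access type is enabled in the
 * qp_access_flags word of a QP attribute structure. */
static int qp_access_enabled(int qp_access_flags, enum qp_access_flags f)
{
	return (qp_access_flags & f) != 0;
}
```

Keeping the field an int holding OR-ed enum bits (rather than three separate booleans) matches the style of the MR access flags elsewhere in the API, while the enum makes the legal bit values self-documenting.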
From mshefty at ichips.intel.com Tue Aug 10 11:36:59 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 10 Aug 2004 11:36:59 -0700 Subject: [openib-general] ib_get_special_qp In-Reply-To: <1092166059.1840.93.camel@localhost.localdomain> References: <52vffq6dus.fsf@topspin.com> <20040810111719.1e7398d5.mshefty@ichips.intel.com> <1092166059.1840.93.camel@localhost.localdomain> Message-ID: <20040810113659.32faf503.mshefty@ichips.intel.com> On Tue, 10 Aug 2004 15:27:37 -0400 Hal Rosenstock wrote: > On Tue, 2004-08-10 at 14:17, Sean Hefty wrote: > Does anything use the raw QP types, or ever plan on using them? > > Raw datagram transports (ethertype and IPv6) are orphaned parts of IBA. > I see no need for their support in OpenIB. Unless someone stands up, I'll remove these types. I'll also combine ib_get_spl_qp and ib_create_qp together at the access layer. From mshefty at ichips.intel.com Tue Aug 10 12:05:10 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 10 Aug 2004 12:05:10 -0700 Subject: [openib-general] [PATCH] QP updates Message-ID: <20040810120510.0f763400.mshefty@ichips.intel.com> Patch updates ib_verbs.h for QP changes: - removes ib_get_spl_qp - combines with ib_create_qp - Added RNR timer enum. - Renames qp_remote_atomic_flags to qp_access_flags (similar to MR) - Adds enum for qp_access_flags - Fixes some misnamed fields in struct ib_device. Not yet committed. 
Index: ib_verbs.h =================================================================== --- ib_verbs.h (revision 623) +++ ib_verbs.h (working copy) @@ -256,7 +256,7 @@ u8 init_type_reply; }; -enum ib_device_attr_flags { +enum ib_device_attr_flags { IB_DEVICE_SM = 1, IB_DEVICE_SNMP_TUN_SUP = (1<<1), IB_DEVICE_DM_SUP = (1<<2), @@ -322,12 +322,19 @@ enum ib_sig_type sq_sig_type; enum ib_sig_type rq_sig_type; enum ib_qp_type qp_type; + u8 port_num; /* special QP types only */ +}; + +enum ib_qp_access_flags { + IB_QP_REMOTE_WRITE = (1<<1), + IB_QP_REMOTE_READ = (1<<2), + IB_QP_REMOTE_ATOMIC = (1<<3) }; enum ib_qp_attr_mask { IB_QP_STATE = 1, IB_QP_EN_SQD_ASYNC_NOTIFY = (1<<1), - IB_QP_REMOTE_ATOMIC_FLAGS = (1<<3), + IB_QP_ACCESS_FLAGS = (1<<3), IB_QP_PKEY_INDEX = (1<<4), IB_QP_PORT = (1<<5), IB_QP_QKEY = (1<<6), @@ -363,6 +370,41 @@ IB_MIG_ARMED }; +enum ib_rnr_timeout { + IB_RNR_TIMER_655_36 = 0, + IB_RNR_TIMER_000_01 = 1, + IB_RNR_TIMER_000_02 = 2, + IB_RNR_TIMER_000_03 = 3, + IB_RNR_TIMER_000_04 = 4, + IB_RNR_TIMER_000_06 = 5, + IB_RNR_TIMER_000_08 = 6, + IB_RNR_TIMER_000_12 = 7, + IB_RNR_TIMER_000_16 = 8, + IB_RNR_TIMER_000_24 = 9, + IB_RNR_TIMER_000_32 = 10, + IB_RNR_TIMER_000_48 = 11, + IB_RNR_TIMER_000_64 = 12, + IB_RNR_TIMER_000_96 = 13, + IB_RNR_TIMER_001_28 = 14, + IB_RNR_TIMER_001_92 = 15, + IB_RNR_TIMER_002_56 = 16, + IB_RNR_TIMER_003_84 = 17, + IB_RNR_TIMER_005_12 = 18, + IB_RNR_TIMER_007_68 = 19, + IB_RNR_TIMER_010_24 = 20, + IB_RNR_TIMER_015_36 = 21, + IB_RNR_TIMER_020_48 = 22, + IB_RNR_TIMER_030_72 = 23, + IB_RNR_TIMER_040_96 = 24, + IB_RNR_TIMER_061_44 = 25, + IB_RNR_TIMER_081_92 = 26, + IB_RNR_TIMER_122_88 = 27, + IB_RNR_TIMER_163_84 = 28, + IB_RNR_TIMER_245_76 = 29, + IB_RNR_TIMER_327_68 = 30, + IB_RNR_TIMER_491_52 = 31 +}; + struct ib_qp_attr { enum ib_qp_state qp_state; enum ib_mtu path_mtu; @@ -371,7 +413,7 @@ u32 rq_psn; u32 sq_psn; u32 dest_qp_num; - int remote_atomic_flags; + int qp_access_flags; struct ib_qp_cap cap; struct ib_ah_attr ah_attr; 
struct ib_ah_attr alt_ah_attr; @@ -652,16 +694,6 @@ return qp->device->destroy_qp(qp); } -static inline struct ib_qp *ib_get_special_qp(struct ib_pd *pd, - u8 port_num, - enum ib_qp_type qp_type, - struct ib_qp_init_attr *qp_init_attr, - struct ib_qp_cap *qp_cap) -{ - return pd->device->get_special_qp(pd, port_num, qp_type, - qp_init_attr, qp_cap); -} - static inline struct ib_srq *ib_create_srq(struct ib_pd *pd, void *srq_context, struct ib_srq_attr *srq_attr) @@ -897,18 +929,18 @@ u8 port_num, u16 index, u16 *pkey); int (*modify_device)(struct ib_device *device, u8 port_num, int device_attr_flags); - struct ib_pd (*ib_alloc_pd)(struct ib_device *device); + struct ib_pd (*alloc_pd)(struct ib_device *device); int (*dealloc_pd)(struct ib_pd *pd); - struct ib_ah (*ib_create_ah)(struct ib_pd *pd, - struct ib_ah_attr *ah_attr); + struct ib_ah (*create_ah)(struct ib_pd *pd, + struct ib_ah_attr *ah_attr); int (*modify_ah)(struct ib_ah *ah, struct ib_ah_attr *ah_attr); int (*query_ah)(struct ib_ah *ah, struct ib_ah_attr *ah_attr); int (*destroy_ah)(struct ib_ah *ah); - struct ib_qp (*ib_create_qp)(struct ib_pd *pd, - struct ib_qp_init_attr *qp_init_attr, - struct ib_qp_cap *qp_cap); + struct ib_qp (*create_qp)(struct ib_pd *pd, + struct ib_qp_init_attr *qp_init_attr, + struct ib_qp_cap *qp_cap); int (*modify_qp)(struct ib_qp *qp, struct ib_qp_attr *qp_attr, int qp_attr_mask, @@ -918,14 +950,9 @@ int qp_attr_mask, struct ib_qp_init_attr *qp_init_attr); int (*destroy_qp)(struct ib_qp *qp); - struct ib_qp (*ib_get_special_qp)(struct ib_pd *pd, - u8 port_num, - enum ib_qp_type qp_type, - struct ib_qp_init_attr *qp_init_attr, - struct ib_qp_cap *qp_cap); - struct ib_srq (*ib_create_srq)(struct ib_pd *pd, - void *srq_context, - struct ib_srq_attr *srq_attr); + struct ib_srq (*create_srq)(struct ib_pd *pd, + void *srq_context, + struct ib_srq_attr *srq_attr); int (*query_srq)(struct ib_srq *srq, struct ib_srq_attr *srq_attr); int (*modify_srq)(struct ib_srq *srq, @@ -936,16 
+963,16 @@ int (*post_srq)(struct ib_srq *srq, struct ib_recv_wr *recv_wr, struct ib_recv_wr **bad_recv_wr); - struct ib_cq (*ib_create_cq)(struct ib_device *device, - ib_comp_handler comp_handler, - void *cq_context, int cqe); + struct ib_cq (*create_cq)(struct ib_device *device, + ib_comp_handler comp_handler, + void *cq_context, int cqe); int (*resize_cq)(struct ib_cq *cq, int cqe); int (*destroy_cq)(struct ib_cq *cq); - struct ib_mr (*ib_reg_phys_mr)(struct ib_pd *pd, - struct ib_phys_buf *phys_buf_array, - int num_phys_buf, - int mr_access_flags, - u64 *iova_start); + struct ib_mr (*reg_phys_mr)(struct ib_pd *pd, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + int mr_access_flags, + u64 *iova_start); int (*query_mr)(struct ib_mr *mr, struct ib_mr_attr *mr_attr); int (*dereg_mr)(struct ib_mr *mr); @@ -956,14 +983,14 @@ int num_phys_buf, int mr_access_flags, u64 *iova_start); - struct ib_mw (*ib_alloc_mw)(struct ib_pd *pd); + struct ib_mw (*alloc_mw)(struct ib_pd *pd); int (*bind_mw)(struct ib_qp *qp, struct ib_mw *mw, struct ib_mw_bind *mw_bind); int (*dealloc_mw)(struct ib_mw *mw); - struct ib_fmr (*ib_alloc_fmr)(struct ib_pd *pd, - int mr_access_flags, - struct ib_fmr_attr *fmr_attr); + struct ib_fmr (*alloc_fmr)(struct ib_pd *pd, + int mr_access_flags, + struct ib_fmr_attr *fmr_attr); int (*map_fmr)(struct ib_fmr *fmr, void *addr, u64 size); int (*map_phys_fmr)(struct ib_fmr *fmr, struct ib_phys_buf *phys_buf_array, From roland at topspin.com Tue Aug 10 13:19:41 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 10 Aug 2004 13:19:41 -0700 Subject: [openib-general] [PATCH] QP updates In-Reply-To: <20040810120510.0f763400.mshefty@ichips.intel.com> (Sean Hefty's message of "Tue, 10 Aug 2004 12:05:10 -0700") References: <20040810120510.0f763400.mshefty@ichips.intel.com> Message-ID: <52brhi690i.fsf@topspin.com> Looks good to me (I merged it into my WIP tree of QP API work) - R. 
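The RNR timer enum in the patch above encodes delays rather than storing them directly: value 0 means 655.36 ms and values 1-31 grow monotonically from 0.01 ms to 491.52 ms. A caller working in microseconds could map to the encoding with a small lookup, sketched below. The helper is hypothetical (not part of the patch); the delay table is transcribed from the IB_RNR_TIMER_* names.

```c
#include <assert.h>

/* Microseconds of delay encoded by each IB_RNR_TIMER_* value (index ==
 * enum value). Encoding 0 is the largest delay, 655.36 ms; encodings
 * 1..31 grow monotonically from 0.01 ms to 491.52 ms. */
static const unsigned int rnr_timer_usec[32] = {
	655360,    10,    20,     30,     40,     60,     80,    120,
	   160,   240,   320,    480,    640,    960,   1280,   1920,
	  2560,  3840,  5120,   7680,  10240,  15360,  20480,  30720,
	 40960, 61440, 81920, 122880, 163840, 245760, 327680, 491520
};

/* Hypothetical helper: pick the smallest encoding whose delay is at
 * least the requested number of microseconds, falling back to 0
 * (655.36 ms) when the request exceeds 491.52 ms. */
static int min_rnr_timer_for_usec(unsigned int usec)
{
	int i;

	for (i = 1; i < 32; ++i)
		if (rnr_timer_usec[i] >= usec)
			return i;
	return 0;
}
```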
From halr at voltaire.com Tue Aug 10 13:46:03 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 10 Aug 2004 16:46:03 -0400 Subject: [openib-general] Coupla ib_verbs.h nits Message-ID: <1092170765.1833.3.camel@localhost.localdomain> Add forward declaration for struct ib_mad Also add define for IB_DEVICE_NAME_MAX -- Hal From greg at kroah.com Tue Aug 10 14:39:06 2004 From: greg at kroah.com (Greg KH) Date: Tue, 10 Aug 2004 14:39:06 -0700 Subject: [openib-general] Re: device classes In-Reply-To: <523c32ifz2.fsf@topspin.com> References: <524qnjjvp7.fsf@topspin.com> <20040804225526.GB11004@kroah.com> <523c32ifz2.fsf@topspin.com> Message-ID: <20040810213906.GA8198@kroah.com> On Wed, Aug 04, 2004 at 05:30:25PM -0700, Roland Dreier wrote: > Greg> A class? Look at the struct class_interface code, > Greg> specifically class_interface_register() and > Greg> class_interface_unregister() functions. If you call them, > Greg> then the function pointers you pass in for the add and > Greg> remove functions in the struct class_interface will get > Greg> called for every struct class_device that is added or > Greg> removed for that struct class. > > Hmm, that could work, if the core creates "infiniband_device" and > "infiniband_port" classes. But is it kosher to use the > class_device.dev pointer to go back up to the actual device (and then > create a new class_device)? I don't see why not. > Greg> No, that sounds about right. IPoIB would be a struct class. > Greg> Hm, but then you want that class code to be called whenever > Greg> a struct device (really a ib_device) is added to the system. > Greg> That's not built in anywhere, you'll have to either add some > Greg> custom code, or I need to get off my butt and create a > Greg> struct bus_interface chunk of code that will work like > Greg> struct class_interface. Would that help out here? > > Not sure -- it seems like bus_interface would be per bus (not per "bus > type"), so it's not quite what we want. 
IPoIB wants to get notified > whenever a device is added to any of the "virtual infiniband" buses we > talked about. Then you might just want to stick with an internal notifier that you have full control over. > At a high level here are the things I think we want to happen (exact > function names below aren't important, I'm just trying to come up with > placeholders): > > - when a low-level driver (eg mthca) finds an HCA, it calls > ib_register_device(), which creates a virtual bus rooted at the PCI > device and adds new virtual devices for the HCA and each of its > ports (in part to be able to put both global HCA attributes and > also per-port attributes in sysfs) Sounds good. > - when an ULP (IPoIB, SDP, etc) is loaded, it calls ib_register_ulp() > This will trigger callbacks for all of the HCA devices that already > exist (if the ULP is loaded after the LLD) and also when new HCA > devices are added. Also sounds reasonable. You might have to do this in ib specific code however. > - when a low-level driver is unloaded or an HCA is hot-removed, it > calls ib_unregister_device(), which calls every ULP's remove() method. > - when a ULP is unloaded, it calls ib_unregister_ulp(), which calls > the ULP's remove() method for every HCA in the system. > > The question is how to make this happen within the device model. One > way would be to use (abuse?) the class_interface stuff by having > ib_register_device() create virtual devices and then create > class_devices for each virtual device. That might work. > Then the ULPs would get called back by the class_interface stuff, and > then follow the dev pointer back to the original struct device and > create their own class_devices. There may be a better way... I think you need to start out slow and work into it. Don't try to design it all ahead of time. Try getting the struct device stuff and the bus working first :) thanks, greg k-h (p.s. 
please cc me on any questions you want me to answer, as I'm not on the openib mailing lists anymore.) From greg at kroah.com Tue Aug 10 14:39:47 2004 From: greg at kroah.com (Greg KH) Date: Tue, 10 Aug 2004 14:39:47 -0700 Subject: [openib-general] Re: device classes In-Reply-To: <20040805030301.GA3541@cup.hp.com> References: <524qnjjvp7.fsf@topspin.com> <20040804225526.GB11004@kroah.com> <20040805030301.GA3541@cup.hp.com> Message-ID: <20040810213947.GB8198@kroah.com> On Wed, Aug 04, 2004 at 08:03:01PM -0700, Grant Grundler wrote: > On Wed, Aug 04, 2004 at 03:55:26PM -0700, Greg KH wrote: > > Hm, but > > then you want that class code to be called whenever a struct device > > (really a ib_device) is added to the system. > > Would the device driver have to advertise that it supports the class? Yes, in a way. Just like the input or tty drivers "advertise" that they are input or tty drivers. thanks, greg k-h From halr at voltaire.com Tue Aug 10 14:54:10 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 10 Aug 2004 17:54:10 -0400 Subject: [openib-general] [PATCH] GSI: GRH support for normal MADs Message-ID: <1092174852.1807.7.camel@localhost.localdomain> Added GRH support for normal MADs Index: access/TODO =================================================================== --- access/TODO (revision 605) +++ access/TODO (working copy) @@ -1,11 +1,12 @@ -8/9/04 +8/10/04 -Add support for (at least) responses to requests with GRH Remove #if 0/1 with suitable preprocessor symbols Replace ib_reg_mr with ib_reg_phys_mr Eliminate static limit on numbers of ports/HCAs Makefile needs to use standard kbuild Migrate from /proc to /sysfs +Obtain proper SGID index for GRH support (low priority) +Add GRH support for RMPP (low priority) Static rate handling (low priority) Update API to proposed openib GSI interface Index: access/gsi_main.c =================================================================== --- access/gsi_main.c (revision 622) +++ access/gsi_main.c (working 
copy) @@ -1680,6 +1680,7 @@ dtgrm_priv->path_bits = wc->dlid_path_bits; dtgrm_priv->sl = wc->sl; dtgrm_priv->pkey_index = wc->pkey_index; + dtgrm_priv->grh_flag = wc->grh_flag; mad_swap_header(mad); if (gsi_post_send_plm_reply_mad @@ -1700,6 +1701,7 @@ dtgrm_priv->path_bits = wc->dlid_path_bits; dtgrm_priv->sl = wc->sl; dtgrm_priv->pkey_index = wc->pkey_index; + dtgrm_priv->grh_flag = wc->grh_flag; printk(KERN_DEBUG \ "Received datagram - remote QP num-%d, LID-%d, path bits- %d, SL - %d\n", @@ -1749,6 +1751,7 @@ struct ib_class_port_info_t class_port_info; int redirect = FALSE; struct gsi_redirect_info_st *redirect_info; + struct ib_grh *grh; GSI_REDIR_LIST_LOCK_VAR; if (!dtgrm_priv->mad.hdr.m.ms.r) { @@ -1768,10 +1771,8 @@ GSI_REDIR_LIST_UNLOCK(class_info); } - addr_vec->grh_flag = 0; - memset((char *) &addr_vec->grh.dgid, 0, sizeof (addr_vec->grh.dgid)); + addr_vec->grh_flag = dtgrm_priv->grh_flag; addr_vec->grh.sgid_index = 0; - addr_vec->grh.hop_limit = GSI_HOP_LIMIT; addr_vec->static_rate = GSI_MAX_STATIC_RATE; addr_vec->src_path_bits = GSI_SOURCE_PATH_BIT; addr_vec->port = class_info->hca->port; @@ -1782,8 +1783,10 @@ class_port_info.redirect_qp, class_port_info.redirect_q_key); + memcpy(&addr_vec->grh.dgid, &class_port_info.redirect_gid, sizeof (addr_vec->grh.dgid)); addr_vec->dlid = class_port_info.redirect_lid; addr_vec->sl = class_port_info.redirect_sl; + addr_vec->grh.hop_limit = GSI_HOP_LIMIT; addr_vec->grh.traffic_class = class_port_info.redirect_tc; addr_vec->grh.flow_label = class_port_info.redirect_fl; dtgrm_priv->rqp = class_port_info.redirect_qp; @@ -1791,8 +1794,18 @@ } else { addr_vec->dlid = dtgrm_priv->rlid; addr_vec->sl = GSI_SL; - addr_vec->grh.traffic_class = GSI_TRAFIC_CLASS; - addr_vec->grh.flow_label = GSI_FLOW_LABEL; + if (addr_vec->grh_flag) { + grh = (struct ib_grh *)dtgrm_priv->grh; + memcpy(&addr_vec->grh.dgid, &grh->destination_gid, sizeof(addr_vec->grh.dgid)); + addr_vec->grh.hop_limit = grh->hop_limit; + 
addr_vec->grh.traffic_class = grh->traffic_class; + addr_vec->grh.flow_label = grh->flow_label; + } else { + memset((char *) &addr_vec->grh.dgid, 0, sizeof (addr_vec->grh.dgid)); + addr_vec->grh.hop_limit = GSI_HOP_LIMIT; + addr_vec->grh.traffic_class = GSI_TRAFIC_CLASS; + addr_vec->grh.flow_label = GSI_FLOW_LABEL; + } dtgrm_priv->rqp = 1; dtgrm_priv->r_q_key = GSI_QP1_WELL_KNOWN_Q_KEY; } @@ -1822,20 +1835,19 @@ mad->hdr.class_ver = class_info->version; #if 0 mad->hdr.m.ms.method-- - Set by caller - mad->hdr.m.ms.r-- - Response bit - set by caller + mad->hdr.m.ms.r-- - Response bit - set by caller #endif - /* - * For response - the client ID is taken from the request datagram. - * It is may be set by the caller or the same received datagram is used - * for response. - */ - if (!mad->hdr.m.ms.r) { + /* + * For response - the client ID is taken from the request datagram. + * It is may be set by the caller or the same received datagram + * is used for response. + */ + if (!mad->hdr.m.ms.r) { /* - * For Microsoft CM compatability: + * For Microsoft CM compatibility: * they require the same transaction id in CM negotiation, - * although response bit is not set in Send() methods. - * Caller may set transaction id or - * set zero and let gsi to do it. + * although response bit is not set in Send() method. + * Caller may set transaction id or set zero and let gsi do it. 
*/ if (!mad->hdr.transact_id) { ((struct gsi_tid_st *) &mad->hdr.transact_id)-> @@ -2013,19 +2025,28 @@ int ret = 0; struct gsi_dtgrm_priv_st *dtgrm_priv = (struct gsi_dtgrm_priv_st *) dtgrm; + struct ib_grh *grh; - addr_vec.grh_flag = 0; - memset((char *) &addr_vec.grh.dgid, 0, sizeof (addr_vec.grh.dgid)); + addr_vec.grh_flag = dtgrm_priv->grh_flag; addr_vec.grh.sgid_index = 0; - addr_vec.grh.hop_limit = GSI_HOP_LIMIT; addr_vec.static_rate = GSI_MAX_STATIC_RATE; addr_vec.src_path_bits = GSI_SOURCE_PATH_BIT; addr_vec.port = hca->port; addr_vec.dlid = dtgrm_priv->rlid; addr_vec.sl = GSI_SL; - addr_vec.grh.traffic_class = GSI_TRAFIC_CLASS; - addr_vec.grh.flow_label = GSI_FLOW_LABEL; + if (addr_vec.grh_flag) { + grh = (struct ib_grh *)dtgrm_priv->grh; + memcpy(&addr_vec.grh.dgid, &grh->source_gid, sizeof(addr_vec.grh.dgid)); + addr_vec.grh.hop_limit = grh->hop_limit; + addr_vec.grh.traffic_class = grh->traffic_class; + addr_vec.grh.flow_label = grh->flow_label; + } else { + memset((char *) &addr_vec.grh.dgid, 0, sizeof (addr_vec.grh.dgid)); + addr_vec.grh.hop_limit = GSI_HOP_LIMIT; + addr_vec.grh.traffic_class = GSI_TRAFIC_CLASS; + addr_vec.grh.flow_label = GSI_FLOW_LABEL; + } dtgrm_priv->rqp = 1; dtgrm_priv->r_q_key = GSI_QP1_WELL_KNOWN_Q_KEY; @@ -2117,7 +2138,7 @@ rmpp_mad->dir_switch_needed = dtgrm_priv->rmpp_dir_switch_needed; - /* NOTE: GRH not supported yet! */ + /* NOTE: GRH not supported yet for RMPP! */ rmpp_mad->grh_valid = FALSE; rmpp_mad->pkey_index = dtgrm_priv->pkey_index; @@ -2139,7 +2160,7 @@ rmpp_mad->pkey_index = dtgrm_priv->pkey_index; rmpp_mad->dir_switch_needed = dtgrm_priv->rmpp_dir_switch_needed; - /* NOTE: GRH not supported yet! */ + /* NOTE: GRH not supported yet for RMPP! 
*/ rmpp_mad->grh_valid = FALSE; rmpp_mad->context = dtgrm_priv; Index: access/gsi.h =================================================================== --- access/gsi.h (revision 599) +++ access/gsi.h (working copy) @@ -110,6 +110,7 @@ u16 rlid; u16 pkey_index; u8 sl; + int grh_flag:1; u64 guid; u8 path_bits; Index: access/gsi_priv.h =================================================================== --- access/gsi_priv.h (revision 596) +++ access/gsi_priv.h (working copy) @@ -145,6 +145,7 @@ u16 rlid; u16 pkey_index; u8 sl; + int grh_flag:1; u64 guid; u8 path_bits; Index: include/ib_core_types.h =================================================================== --- include/ib_core_types.h (revision 562) +++ include/ib_core_types.h (working copy) @@ -34,4 +34,15 @@ struct ib_device * device, int event); struct list_head list; }; + +struct ib_grh { + u8 ip_version; + u8 traffic_class; + u32 flow_label; + u16 next_header; + u8 hop_limit; + union ib_gid source_gid; + union ib_gid destination_gid; +}; + #endif /* _IB_CORE_TYPES_H */ From roland at topspin.com Tue Aug 10 15:12:09 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 10 Aug 2004 15:12:09 -0700 Subject: [openib-general] Re: device classes In-Reply-To: <20040810213906.GA8198@kroah.com> (Greg KH's message of "Tue, 10 Aug 2004 14:39:06 -0700") References: <524qnjjvp7.fsf@topspin.com> <20040804225526.GB11004@kroah.com> <523c32ifz2.fsf@topspin.com> <20040810213906.GA8198@kroah.com> Message-ID: <527js663t2.fsf@topspin.com> Greg> I think you need to start out slow and work into it. Don't Greg> try to design it all ahead of time. Try getting the struct Greg> device stuff and the bus working first :) Agreed... I've been thinking about it some more, and now that I know about struct class_interface, I actually don't see a need to create a virtual bus with virtual devices. I think it should work just to create an "infiniband" class, and have IPoIB et al register class_interfaces for this class. 
Then low-level drivers will register class_devices in the infiniband class so IPoIB etc. get notified when devices show up or leave. IPoIB etc can create their own class_devices (in their own class). Hierarchical classes might make it neater but it all seems to work in my head.... - R. From mshefty at ichips.intel.com Tue Aug 10 15:20:43 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 10 Aug 2004 15:20:43 -0700 Subject: [openib-general] [PATCH] QP updates In-Reply-To: <20040810120510.0f763400.mshefty@ichips.intel.com> References: <20040810120510.0f763400.mshefty@ichips.intel.com> Message-ID: <20040810152043.5d49b5ee.mshefty@ichips.intel.com> On Tue, 10 Aug 2004 12:05:10 -0700 Sean Hefty wrote: > Patch updates ib_verbs.h for QP changes: Patch has been applied. Updated to include Hal's comments (forward declaration of ib_mad and IB_DEVICE_NAME_MAX defined). - Sean From mshefty at ichips.intel.com Tue Aug 10 16:20:36 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 10 Aug 2004 16:20:36 -0700 Subject: [openib-general] [PATCH] defined struct ib_mw_bind Message-ID: <20040810162036.36bf3832.mshefty@ichips.intel.com> Here's a patch that defines ib_mw_bind. Rather than defining yet another set of flags, I combined the MR, QP, and MW memory access flags together. I will submit a second patch to update Roland's tree, to keep mthca in sync with this change. The change has not been submitted yet. - Sean -- Index: ib_verbs.h =================================================================== --- ib_verbs.h (revision 626) +++ ib_verbs.h (working copy) @@ -326,10 +326,23 @@ u8 port_num; /* special QP types only */ }; -enum ib_qp_access_flags { - IB_QP_REMOTE_WRITE = (1<<1), - IB_QP_REMOTE_READ = (1<<2), - IB_QP_REMOTE_ATOMIC = (1<<3) +/** + * ib_access_flags - Memory access flags for memory regions, memory windows, + * and queue pairs. + * @IB_ACCESS_REMOTE_WRITE - Enabled for remote RDMA writes. 
+ * @IB_ACCESS_REMOTE_READ - Enabled for remote RDMA reads. + * @IB_ACCESS_REMOTE_ATOMIC - Enabled for remote atomic operations. + * @IB_ACCESS_LOCAL_WRITE - Enabled for write access by the local device. + * Applies to memory regions only. + * @IB_ACCESS_MW_BIND - Enabled for memory window bind operations. Applies + * to memory regions only. + */ +enum ib_access_flags { + IB_ACCESS_REMOTE_WRITE = 1, + IB_ACCESS_REMOTE_READ = (1<<1), + IB_ACCESS_REMOTE_ATOMIC = (1<<2), + IB_ACCESS_LOCAL_WRITE = (1<<3), + IB_ACCESS_MW_BIND = (1<<4) }; enum ib_qp_attr_mask { @@ -446,14 +459,6 @@ int srq_limit; }; -enum ib_mr_access_flags { - IB_MR_LOCAL_WRITE = 1, - IB_MR_REMOTE_WRITE = (1<<1), - IB_MR_REMOTE_READ = (1<<2), - IB_MR_REMOTE_ATOMIC = (1<<3), - IB_MR_MW_BIND = (1<<4) }; - struct ib_phys_buf { u64 addr; u64 size; @@ -471,7 +476,14 @@ IB_MR_REREG_ACCESS = (1<<2) }; -struct ib_mw_bind; +struct ib_mw_bind { + struct ib_mr *mr; + u64 wr_id; + u64 addr; + u32 length; + int send_flags; + int mw_access_flags; +}; struct ib_fmr_attr { int max_pages; From mshefty at ichips.intel.com Tue Aug 10 16:27:01 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 10 Aug 2004 16:27:01 -0700 Subject: [openib-general] [PATCH] defined struct ib_mw_bind In-Reply-To: <20040810162036.36bf3832.mshefty@ichips.intel.com> References: <20040810162036.36bf3832.mshefty@ichips.intel.com> Message-ID: <20040810162701.0ec7352f.mshefty@ichips.intel.com> On Tue, 10 Aug 2004 16:20:36 -0700 Sean Hefty wrote: > Here's a patch that defines ib_mw_bind. Rather than defining yet another set of flags, I combined the MR, QP, and MW memory access flags together. I will submit a second patch to update Roland's tree, to keep mthca in sync with this change. Patch for mthca files (or whatever was under linux-kernel/infiniband). 
- Sean Index: ulp/ipoib/ipoib_verbs.c =================================================================== --- ulp/ipoib/ipoib_verbs.c (revision 626) +++ ulp/ipoib/ipoib_verbs.c (working copy) @@ -227,7 +227,7 @@ priv->mr = ib_reg_phys_mr(priv->pd, &buffer_list, 1, /* list_len */ - IB_MR_LOCAL_WRITE, + IB_ACCESS_LOCAL_WRITE, &dummy_iova); if (IS_ERR(priv->mr)) { TS_REPORT_FATAL(MOD_IB_NET, Index: ulp/srp/srp_host.c =================================================================== --- ulp/srp/srp_host.c (revision 626) +++ ulp/srp/srp_host.c (working copy) @@ -606,8 +606,8 @@ target->srp_pkt_data_mhndl[hca_index] = ib_reg_phys_mr(hca->pd_hndl, &buffer_list, 1, /*list_len */ - IB_MR_LOCAL_WRITE | - IB_MR_REMOTE_READ, + IB_ACCESS_LOCAL_WRITE | + IB_ACCESS_REMOTE_READ, &iova); if (IS_ERR(target->srp_pkt_data_mhndl[hca_index])) { Index: include/ib_verbs.h =================================================================== --- include/ib_verbs.h (revision 626) +++ include/ib_verbs.h (working copy) @@ -125,12 +125,12 @@ IB_CQ_NEXT_COMP }; -enum ib_mr_access_flags { - IB_MR_LOCAL_WRITE = 1, - IB_MR_REMOTE_WRITE = (1<<1), - IB_MR_REMOTE_READ = (1<<2), - IB_MR_REMOTE_ATOMIC = (1<<3), - IB_MR_MW_BIND = (1<<4) +enum ib_access_flags { + IB_ACCESS_LOCAL_WRITE = 1, + IB_ACCESS_REMOTE_WRITE = (1<<1), + IB_ACCESS_REMOTE_READ = (1<<2), + IB_ACCESS_REMOTE_ATOMIC = (1<<3), + IB_ACCESS_MW_BIND = (1<<4) }; struct ib_phys_buf { Index: core/mad_main.c =================================================================== --- core/mad_main.c (revision 626) +++ core/mad_main.c (working copy) @@ -58,7 +58,7 @@ }; *mr = ib_reg_phys_mr(pd, &buffer_list, 1, /* list_len */ - IB_MR_LOCAL_WRITE, &iova); + IB_ACCESS_LOCAL_WRITE, &iova); if (IS_ERR(*mr)) { TS_REPORT_WARN(MOD_KERNEL_IB, "ib_reg_phys_mr failed " Index: hw/mthca/mthca_provider.c =================================================================== --- hw/mthca/mthca_provider.c (revision 626) +++ hw/mthca/mthca_provider.c (working copy) @@ 
-495,10 +495,10 @@ page_list[n++] = buffer_list[i].addr + ((u64) j << shift); access = - (acc & IB_MR_REMOTE_ATOMIC ? MTHCA_MPT_FLAG_ATOMIC : 0) | - (acc & IB_MR_REMOTE_WRITE ? MTHCA_MPT_FLAG_REMOTE_WRITE : 0) | - (acc & IB_MR_REMOTE_READ ? MTHCA_MPT_FLAG_REMOTE_READ : 0) | - (acc & IB_MR_LOCAL_WRITE ? MTHCA_MPT_FLAG_LOCAL_WRITE : 0) | + (acc & IB_ACCESS_REMOTE_ATOMIC ? MTHCA_MPT_FLAG_ATOMIC : 0) | + (acc & IB_ACCESS_REMOTE_WRITE ? MTHCA_MPT_FLAG_REMOTE_WRITE : 0) | + (acc & IB_ACCESS_REMOTE_READ ? MTHCA_MPT_FLAG_REMOTE_READ : 0) | + (acc & IB_ACCESS_LOCAL_WRITE ? MTHCA_MPT_FLAG_LOCAL_WRITE : 0) | MTHCA_MPT_FLAG_LOCAL_READ; mthca_dbg(to_mdev(pd->device), "Registering memory at %llx (iova %llx) " From roland at topspin.com Tue Aug 10 18:02:04 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 10 Aug 2004 18:02:04 -0700 Subject: [openib-general] [PATCH] defined struct ib_mw_bind In-Reply-To: <20040810162701.0ec7352f.mshefty@ichips.intel.com> (Sean Hefty's message of "Tue, 10 Aug 2004 16:27:01 -0700") References: <20040810162036.36bf3832.mshefty@ichips.intel.com> <20040810162701.0ec7352f.mshefty@ichips.intel.com> Message-ID: <523c2u5vxv.fsf@topspin.com> Thanks, this is in my WIP tree (will commit once I have the QP stuff done). - R. From roland at topspin.com Tue Aug 10 18:06:33 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 10 Aug 2004 18:06:33 -0700 Subject: [openib-general] [PATCH] defined struct ib_mw_bind In-Reply-To: <20040810162036.36bf3832.mshefty@ichips.intel.com> (Sean Hefty's message of "Tue, 10 Aug 2004 16:20:36 -0700") References: <20040810162036.36bf3832.mshefty@ichips.intel.com> Message-ID: <52y8km4h5y.fsf@topspin.com> Looks reasonable -- kernel-doc-nano-HOWTO.txt says that constants should be marked with '%' instead of '@' though. However I'm not sure how kernel-doc handles enum declarations (since it mostly seems to be for documenting functions). 
- Roland From halr at voltaire.com Tue Aug 10 18:40:26 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 10 Aug 2004 21:40:26 -0400 Subject: [openib-general] [PATCH]: GSI Use ib_create_qp rather than ib_get_special_qp Message-ID: <1092188428.1840.5.camel@localhost.localdomain> Use ib_create_qp rather than ib_get_special_qp Index: access/gsi_main.c =================================================================== --- access/gsi_main.c (revision 624) +++ access/gsi_main.c (working copy) @@ -732,20 +732,19 @@ qp_init_attr.recv_cq = hca->cq; qp_init_attr.sq_sig_type = IB_SIGNAL_ALL_WR; qp_init_attr.rq_sig_type = IB_SIGNAL_ALL_WR; - qp_init_attr.qp_type = IB_QPT_GSI; qp_init_attr.cap.max_send_wr = GSI_QP_SND_SIZE; qp_init_attr.cap.max_recv_wr = GSI_QP_RCV_SIZE; qp_init_attr.cap.max_send_sge = GSI_SND_REQ_MAX_SG; qp_init_attr.cap.max_recv_sge = GSI_RCV_REQ_MAX_SG; #if 0 /* GSI_REDIRECT */ - hca->qp = - ib_create_qp(hca->pd, hca->port, IB_QPT_UD, &qp_init_attr, &qp_cap); + qp_init_attr.qp_type = IB_QPT_UD; #else - hca->qp = - ib_get_special_qp(hca->pd, hca->port, IB_QPT_GSI, &qp_init_attr, - &qp_cap); + qp_init_attr.qp_type = IB_QPT_GSI; + qp_init_attr.port_num = hca->port; #endif + hca->qp = + ib_create_qp(hca->pd, &qp_init_attr, &qp_cap); if (IS_ERR(hca->qp)) { printk(KERN_ERR "Could not create QP.\n"); ret = PTR_ERR(hca->qp); Index: include/ib_verbs.h =================================================================== --- include/ib_verbs.h (revision 625) +++ include/ib_verbs.h (working copy) @@ -299,6 +299,7 @@ enum ib_sig_type sq_sig_type; enum ib_sig_type rq_sig_type; enum ib_qp_type qp_type; + u8 port_num; /* special QP types only */ }; enum ib_qp_attr_mask { @@ -578,12 +579,6 @@ int ib_destroy_qp(struct ib_qp *qp); -struct ib_qp *ib_get_special_qp(struct ib_pd *pd, - u8 port_num, - enum ib_qp_type qp_type, - struct ib_qp_init_attr *qp_init_attr, - struct ib_qp_cap *qp_cap); - struct ib_srq *ib_create_srq(struct ib_pd *pd, struct ib_srq_attr *srq_attr); 
int ib_query_srq(struct ib_srq *srq, From halr at voltaire.com Wed Aug 11 04:36:32 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 11 Aug 2004 07:36:32 -0400 Subject: [openib-general] [PATCH]: GSI Eliminate static limit on HCAs and ports Message-ID: <1092224194.1840.8.camel@localhost.localdomain> Eliminate static limit on HCAs and ports in GSI Index: access/TODO =================================================================== --- access/TODO (revision 624) +++ access/TODO (working copy) @@ -1,8 +1,7 @@ -8/10/04 +8/11/04 Remove #if 0/1 with suitable preprocessor symbols Replace ib_reg_mr with ib_reg_phys_mr -Eliminate static limit on numbers of ports/HCAs Makefile needs to use standard kbuild Migrate from /proc to /sysfs Obtain proper SGID index for GRH support (low priority) Index: access/gsi_main.c =================================================================== --- access/gsi_main.c (revision 627) +++ access/gsi_main.c (working copy) @@ -147,13 +147,6 @@ static struct timer_list gsi_sent_dtgrm_timer; static atomic_t gsi_sent_dtgrm_timer_running = ATOMIC_INIT(1); -#if 1 /* GSI_CREATE_PORTS_IN_INSMOD */ -/* support max 10 ports !!! */ -# define GSI_MAX_SUPPORTED_PORTS 10 -static struct gsi_hca_info_st *gsi_hca_h_array[GSI_MAX_SUPPORTED_PORTS]; -static int gsi_hca_h_array_num_ports = 0; -#endif - /* Current client ID (Manager in IB MI terms) */ static u32 gsi_curr_client_id = GSI_SERVER_ID + 1; static u32 gsi_curr_tid_cnt = 0; @@ -623,16 +616,26 @@ * and remove the HCA info structure. 
*/ static int -gsi_hca_close(struct gsi_hca_info_st *hca) +gsi_hca_close(struct ib_device *device, int gsi_port) { + struct gsi_hca_info_st *hca = NULL, *head = + (struct gsi_hca_info_st *) &gsi_hca_list, *entry; GSI_HCA_LIST_LOCK_VAR; + GSI_HCA_LIST_LOCK(); + list_for_each(entry, head) { + if (entry->handle == device && entry->port == gsi_port) { + hca = entry; + break; + } + } + if (hca == NULL) { printk(KERN_ERR "hca == NULL\n"); + GSI_HCA_LIST_UNLOCK(); return -ENODEV; } - GSI_HCA_LIST_LOCK(); if (--hca->ref_cnt > 0) { printk(KERN_DEBUG "cnt - %d\n", hca->ref_cnt); GSI_HCA_LIST_UNLOCK(); @@ -656,8 +659,8 @@ * Open the HCA device, create the QP and CQ if needed */ static int -gsi_hca_open(struct ib_device *device, - struct gsi_hca_info_st **hca_info, int gsi_port) +gsi_hca_open(struct ib_device *device, int gsi_port, + struct gsi_hca_info_st **hca_info) { struct gsi_hca_info_st *hca = NULL, *head = (struct gsi_hca_info_st *) &gsi_hca_list, *entry; @@ -700,14 +703,11 @@ printk(KERN_ERR "Memory allocation error\n"); return -ENOMEM; } - - memset(hca, 0, sizeof (*hca)); *hca_info = hca; + memset(hca, 0, sizeof (*hca)); hca->port = gsi_port; - hca->ref_cnt = 1; - hca->handle = device; cq_size = GSI_QP_SND_SIZE + GSI_QP_RCV_SIZE + 20; @@ -816,7 +816,7 @@ return -EPERM; } - if ((ret = gsi_hca_open(device, &hca, port))) { + if ((ret = gsi_hca_open(device, port, &hca))) { printk(KERN_ERR "Cannot open HCA\n"); goto error1; } @@ -931,7 +931,7 @@ kfree(newinfo); #endif error2: - gsi_hca_close(hca); + gsi_hca_close(device, port); error1: return ret; } @@ -976,7 +976,7 @@ gsi_class_return_posted_snd_dtgrms(class_info); gsi_class_clean_redirect_class_port_info_list(class_info); - gsi_hca_close(class_info->hca); + gsi_hca_close(class_info->hca->handle, class_info->hca->port); kfree(class_info); return 0; @@ -2719,43 +2719,32 @@ int ret; struct ib_device_cap hca_attrib; int i = 0; + struct gsi_hca_info_st *hca = NULL; if ((ret = ib_query_device(device, &hca_attrib))) { 
printk(KERN_ERR "Could not query HCA\n"); goto error_hca_query; } - if (hca_attrib.phys_port_cnt > GSI_MAX_SUPPORTED_PORTS) { - printk(KERN_ERR "Too many ports - %d (support up to %d)\n", - hca_attrib.phys_port_cnt, GSI_MAX_SUPPORTED_PORTS); - ret = -EMFILE; - goto error_too_many_ports; - } - for (i = 0; i < hca_attrib.phys_port_cnt; i++) { - if ((ret = gsi_hca_open(device, - &(gsi_hca_h_array[i]), (i + 1)))) { + if ((ret = gsi_hca_open(device, i + 1, &hca))) { printk(KERN_ERR "Cannot open HCA\n"); goto error_gsi_hca_open; } - gsi_hca_h_array_num_ports++; } return 0; error_gsi_hca_open: while (i > 0) { - if ((ret = gsi_hca_close((gsi_hca_h_array[i - 1])))) { + if ((ret = gsi_hca_close(device, i - 1))) { printk(KERN_ERR \ "Cannot close gsi HCA (ret: %d), ix: %d\n", ret, i - 1); } - gsi_hca_h_array_num_ports--; - gsi_hca_h_array[i - 1] = NULL; i--; } -error_too_many_ports: error_hca_query: return ret; } @@ -2763,7 +2752,7 @@ static int gsi_delete_ports(struct ib_device *device) { - int ret; + int ret = 0; struct ib_device_cap hca_attrib; int i = 0; @@ -2772,34 +2761,14 @@ goto error_hca_query; } - if (hca_attrib.phys_port_cnt > GSI_MAX_SUPPORTED_PORTS) { - printk(KERN_ERR "Too many ports - %d (support up to %d)\n", - hca_attrib.phys_port_cnt, GSI_MAX_SUPPORTED_PORTS); - ret = -EMFILE; - goto error_too_many_ports; - } - for (i = 0; i < hca_attrib.phys_port_cnt; i++) { - if (gsi_hca_h_array[i] != NULL) { - if ((ret = gsi_hca_close((gsi_hca_h_array[i])))) { - printk(KERN_ERR \ - "Cannot close gsi HCA (ret: %d), ix: %d\n", - ret, i); - } - gsi_hca_h_array_num_ports--; - } else { - printk(KERN_ERR "gsi_hca_h_array[%d] == NULL\n", i); + if ((ret = gsi_hca_close(device, i))) { + printk(KERN_ERR \ + "Cannot close gsi HCA (ret: %d), ix: %d\n", + ret, i); } } - if (gsi_hca_h_array_num_ports != 0) { - printk(KERN_ERR "gsi_hca_h_array_num_ports: %d != 0\n", - gsi_hca_h_array_num_ports); - } - - return 0; - -error_too_many_ports: error_hca_query: return ret; } Index: 
access/gsi_priv.h =================================================================== --- access/gsi_priv.h (revision 624) +++ access/gsi_priv.h (working copy) @@ -187,7 +187,7 @@ spinlock_t rcv_list_lock; struct list_head rcv_posted_dtgrm_list; - u8 port; /* #### Set this field? */ + u8 port; struct ib_qp *qp; /* QP */ struct ib_cq *cq; /* Complete queue */ Index: access/class_port_info.h =================================================================== --- access/class_port_info.h (revision 561) +++ access/class_port_info.h (working copy) @@ -133,7 +133,7 @@ #if defined(__LITTLE_ENDIAN) static inline void -class_port_ntoh(char *dest, char *src, u8 size) +gid_ntoh(char *dest, char *src, u8 size) { u8 i; char temp; @@ -151,7 +151,7 @@ } } #else -#define class_port_ntoh(a,b,c) if(a != b)memcpy(a,b,c) +#define gid_ntoh(a,b,c) if(a != b)memcpy(a,b,c) #endif static inline void @@ -159,7 +159,7 @@ { info->capability_mask = htons(info->capability_mask); info->reserved1 = htonl(info->reserved1) >> 5; - class_port_ntoh((char *) &info->redirect_gid, + gid_ntoh((char *) &info->redirect_gid, (char *) &info->redirect_gid, sizeof (info->redirect_gid)); info->redirect_fl = htonl(info->redirect_fl) >> 12; @@ -167,8 +167,8 @@ info->redirect_p_key = htons(info->redirect_p_key); info->redirect_qp = (htonl(info->redirect_qp)) >> 8; info->redirect_q_key = htonl(info->redirect_q_key); - class_port_ntoh((char *) &info->trap_gid, (char *) &info->trap_gid, - sizeof (info->trap_gid)); + gid_ntoh((char *) &info->trap_gid, (char *) &info->trap_gid, + sizeof (info->trap_gid)); info->trapfl = htonl(info->trapfl) >> 12; info->trap_lid = htons(info->trap_lid); info->trap_p_key = htons(info->trap_p_key); From halr at voltaire.com Wed Aug 11 06:03:24 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 11 Aug 2004 09:03:24 -0400 Subject: [openib-general] [PATCH] ib_verbs.h: Fix some ib_device declarations Message-ID: <1092229406.1839.22.camel@localhost.localdomain> Fix some ib_device 
declarations Also, move struct ib_device structure up in file Index: ib_verbs.h =================================================================== --- ib_verbs.h (revision 626) +++ ib_verbs.h (working copy) @@ -30,7 +30,6 @@ #include #include -struct ib_device; struct ib_mad; enum ib_event_type { @@ -598,6 +597,129 @@ IB_CQ_NEXT_COMP }; + +enum ib_process_mad_flags { + IB_MAD_IGNORE_MKEY = 1 +}; + +#define IB_DEVICE_NAME_MAX 64 + +struct ib_device { + struct module *owner; + struct pci_dev *dma_device; + + char name[IB_DEVICE_NAME_MAX]; + char *provider; + void *private; + struct list_head core_list; + void *core; + void *mad; + u32 flags; + + int (*query_device)(struct ib_device *device, + struct ib_device_cap *device_cap); + int (*query_port)(struct ib_device *device, + u8 port_num, struct ib_port *port); + int (*query_gid)(struct ib_device *device, + u8 port_num, int index, + union ib_gid *gid); + int (*query_pkey)(struct ib_device *device, + u8 port_num, u16 index, u16 *pkey); + int (*modify_device)(struct ib_device *device, + u8 port_num, int device_attr_flags); + struct ib_pd * (*alloc_pd)(struct ib_device *device); + int (*dealloc_pd)(struct ib_pd *pd); + struct ib_ah * (*create_ah)(struct ib_pd *pd, + struct ib_ah_attr *ah_attr); + int (*modify_ah)(struct ib_ah *ah, + struct ib_ah_attr *ah_attr); + int (*query_ah)(struct ib_ah *ah, + struct ib_ah_attr *ah_attr); + int (*destroy_ah)(struct ib_ah *ah); + struct ib_qp * (*create_qp)(struct ib_pd *pd, + struct ib_qp_init_attr *qp_init_attr, + struct ib_qp_cap *qp_cap); + int (*modify_qp)(struct ib_qp *qp, + struct ib_qp_attr *qp_attr, + int qp_attr_mask, + struct ib_qp_cap *qp_cap); + int (*query_qp)(struct ib_qp *qp, + struct ib_qp_attr *qp_attr, + int qp_attr_mask, + struct ib_qp_init_attr *qp_init_attr); + int (*destroy_qp)(struct ib_qp *qp); + struct ib_srq * (*create_srq)(struct ib_pd *pd, + void *srq_context, + struct ib_srq_attr *srq_attr); + int (*query_srq)(struct ib_srq *srq, + struct ib_srq_attr 
*srq_attr); + int (*modify_srq)(struct ib_srq *srq, + struct ib_pd *pd, + struct ib_srq_attr *srq_attr, + int srq_attr_mask); + int (*destroy_srq)(struct ib_srq *srq); + int (*post_srq)(struct ib_srq *srq, + struct ib_recv_wr *recv_wr, + struct ib_recv_wr **bad_recv_wr); + struct ib_cq * (*create_cq)(struct ib_device *device, + ib_comp_handler comp_handler, + void *cq_context, int cqe); + int (*resize_cq)(struct ib_cq *cq, int cqe); + int (*destroy_cq)(struct ib_cq *cq); + struct ib_mr * (*reg_phys_mr)(struct ib_pd *pd, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + int mr_access_flags, + u64 *iova_start); + int (*query_mr)(struct ib_mr *mr, + struct ib_mr_attr *mr_attr); + int (*dereg_mr)(struct ib_mr *mr); + int (*rereg_phys_mr)(struct ib_mr *mr, + int mr_rereg_mask, + struct ib_pd *pd, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + int mr_access_flags, + u64 *iova_start); + struct ib_mw * (*alloc_mw)(struct ib_pd *pd); + int (*bind_mw)(struct ib_qp *qp, + struct ib_mw *mw, + struct ib_mw_bind *mw_bind); + int (*dealloc_mw)(struct ib_mw *mw); + struct ib_fmr * (*alloc_fmr)(struct ib_pd *pd, + int mr_access_flags, + struct ib_fmr_attr *fmr_attr); + int (*map_fmr)(struct ib_fmr *fmr, void *addr, u64 size); + int (*map_phys_fmr)(struct ib_fmr *fmr, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf); + int (*unmap_fmr)(struct ib_fmr **fmr_array, int fmr_cnt); + int (*free_fmr)(struct ib_fmr *fmr); + int (*attach_mcast)(struct ib_qp *qp, union ib_gid *gid, + u16 lid); + int (*detach_mcast)(struct ib_qp *qp, union ib_gid *gid, + u16 lid); + int (*post_send)(struct ib_qp *qp, + struct ib_send_wr *send_wr, + struct ib_send_wr **bad_send_wr); + int (*post_recv)(struct ib_qp *qp, + struct ib_recv_wr *recv_wr, + struct ib_recv_wr **bad_recv_wr); + int (*poll_cq)(struct ib_cq *cq, + int num_entries, + struct ib_wc *wc_array); + int (*peek_cq)(struct ib_cq *cq, int wc_cnt); + int (*req_notify_cq)(struct ib_cq *cq, + enum ib_cq_notify 
cq_notify); + int (*req_n_notify_cq)(struct ib_cq *cq, int wc_cnt); + int (*process_mad)(struct ib_device *device, + int process_mad_flags, + struct ib_mad *in_mad, + struct ib_mad *out_mad); + + struct class_device class_dev; +}; + static inline int ib_query_device(struct ib_device *device, struct ib_device_cap *device_cap) { @@ -677,7 +799,7 @@ static inline int ib_modify_qp(struct ib_qp *qp, struct ib_qp_attr *qp_attr, int qp_attr_mask, - struct ib_qp_cap *qp_cap ) + struct ib_qp_cap *qp_cap) { return qp->device->modify_qp(qp, qp_attr, qp_attr_mask, qp_cap); } @@ -903,126 +1025,4 @@ return cq->device->req_n_notify_cq(cq, wc_cnt); } -enum ib_process_mad_flags { - IB_MAD_IGNORE_MKEY = 1 -}; - -#define IB_DEVICE_NAME_MAX 64 - -struct ib_device { - struct module *owner; - struct pci_dev *dma_device; - - char name[IB_DEVICE_NAME_MAX]; - char *provider; - void *private; - struct list_head core_list; - void *core; - void *mad; - u32 flags; - - int (*query_device)(struct ib_device *device, - struct ib_device_cap *device_cap); - int (*query_port)(struct ib_device *device, - u8 port_num, struct ib_port *port); - int (*query_gid)(struct ib_device *device, - u8 port_num, int index, - union ib_gid *gid); - int (*query_pkey)(struct ib_device *device, - u8 port_num, u16 index, u16 *pkey); - int (*modify_device)(struct ib_device *device, - u8 port_num, int device_attr_flags); - struct ib_pd (*alloc_pd)(struct ib_device *device); - int (*dealloc_pd)(struct ib_pd *pd); - struct ib_ah (*create_ah)(struct ib_pd *pd, - struct ib_ah_attr *ah_attr); - int (*modify_ah)(struct ib_ah *ah, - struct ib_ah_attr *ah_attr); - int (*query_ah)(struct ib_ah *ah, - struct ib_ah_attr *ah_attr); - int (*destroy_ah)(struct ib_ah *ah); - struct ib_qp (*create_qp)(struct ib_pd *pd, - struct ib_qp_init_attr *qp_init_attr, - struct ib_qp_cap *qp_cap); - int (*modify_qp)(struct ib_qp *qp, - struct ib_qp_attr *qp_attr, - int qp_attr_mask, - struct ib_qp_cap *qp_cap ); - int (*query_qp)(struct ib_qp *qp, - 
struct ib_qp_attr *qp_attr, - int qp_attr_mask, - struct ib_qp_init_attr *qp_init_attr); - int (*destroy_qp)(struct ib_qp *qp); - struct ib_srq (*create_srq)(struct ib_pd *pd, - void *srq_context, - struct ib_srq_attr *srq_attr); - int (*query_srq)(struct ib_srq *srq, - struct ib_srq_attr *srq_attr); - int (*modify_srq)(struct ib_srq *srq, - struct ib_pd *pd, - struct ib_srq_attr *srq_attr, - int srq_attr_mask); - int (*destroy_srq)(struct ib_srq *srq); - int (*post_srq)(struct ib_srq *srq, - struct ib_recv_wr *recv_wr, - struct ib_recv_wr **bad_recv_wr); - struct ib_cq (*create_cq)(struct ib_device *device, - ib_comp_handler comp_handler, - void *cq_context, int cqe); - int (*resize_cq)(struct ib_cq *cq, int cqe); - int (*destroy_cq)(struct ib_cq *cq); - struct ib_mr (*reg_phys_mr)(struct ib_pd *pd, - struct ib_phys_buf *phys_buf_array, - int num_phys_buf, - int mr_access_flags, - u64 *iova_start); - int (*query_mr)(struct ib_mr *mr, - struct ib_mr_attr *mr_attr); - int (*dereg_mr)(struct ib_mr *mr); - int (*rereg_phys_mr)(struct ib_mr *mr, - int mr_rereg_mask, - struct ib_pd *pd, - struct ib_phys_buf *phys_buf_array, - int num_phys_buf, - int mr_access_flags, - u64 *iova_start); - struct ib_mw (*alloc_mw)(struct ib_pd *pd); - int (*bind_mw)(struct ib_qp *qp, - struct ib_mw *mw, - struct ib_mw_bind *mw_bind); - int (*dealloc_mw)(struct ib_mw *mw); - struct ib_fmr (*alloc_fmr)(struct ib_pd *pd, - int mr_access_flags, - struct ib_fmr_attr *fmr_attr); - int (*map_fmr)(struct ib_fmr *fmr, void *addr, u64 size); - int (*map_phys_fmr)(struct ib_fmr *fmr, - struct ib_phys_buf *phys_buf_array, - int num_phys_buf); - int (*unmap_fmr)(struct ib_fmr **fmr_array, int fmr_cnt); - int (*free_fmr)(struct ib_fmr *fmr); - int (*attach_mcast)(struct ib_qp *qp, union ib_gid *gid, - u16 lid); - int (*detach_mcast)(struct ib_qp *qp, union ib_gid *gid, - u16 lid); - int (*post_send)(struct ib_qp *qp, - struct ib_send_wr *send_wr, - struct ib_send_wr **bad_send_wr); - int 
(*post_recv)(struct ib_qp *qp, - struct ib_recv_wr *recv_wr, - struct ib_recv_wr **bad_recv_wr); - int (*poll_cq)(struct ib_cq *cq, - int num_entries, - struct ib_wc *wc_array); - int (*peek_cq)(struct ib_cq *cq, int wc_cnt); - int (*req_notify_cq)(struct ib_cq *cq, - enum ib_cq_notify cq_notify); - int (*req_n_notify_cq)(struct ib_cq *cq, int wc_cnt); - int (*process_mad)(struct ib_device *device, - int process_mad_flags, - struct ib_mad *in_mad, - struct ib_mad *out_mad); - - struct class_device class_dev; -}; - #endif /* IB_VERBS_H */ From Tom.Duffy at Sun.COM Wed Aug 11 07:51:28 2004 From: Tom.Duffy at Sun.COM (Tom Duffy) Date: Wed, 11 Aug 2004 07:51:28 -0700 Subject: [openib-general] [PATCH] ib_verbs.h: Fix some ib_device declarations In-Reply-To: <1092229406.1839.22.camel@localhost.localdomain> References: <1092229406.1839.22.camel@localhost.localdomain> Message-ID: <1092235888.30996.12.camel@localhost> On Wed, 2004-08-11 at 06:03, Hal Rosenstock wrote: > Fix some ib_device declarations > Also, move struct ib_device structure up in file Can you please do the move in a separate patch from the fixup. It is very hard to tell what was changed if you change it and move it in the same patch. Thanks, -tduffy From halr at voltaire.com Wed Aug 11 08:27:29 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 11 Aug 2004 11:27:29 -0400 Subject: [openib-general] [PATCH] ib_verbs.h: Fix some ib_device declarations In-Reply-To: <1092235888.30996.12.camel@localhost> References: <1092229406.1839.22.camel@localhost.localdomain> <1092235888.30996.12.camel@localhost> Message-ID: <1092238050.1800.0.camel@localhost.localdomain> On Wed, 2004-08-11 at 10:51, Tom Duffy wrote: > On Wed, 2004-08-11 at 06:03, Hal Rosenstock wrote: > > Fix some ib_device declarations > > Also, move struct ib_device structure up in file > > Can you please do the move in a separate patch from the fixup. It is > very hard to tell what was changed if you change it and move it in the > same patch. 
Sure. I'll reissue in 2 steps. -- Hal From halr at voltaire.com Wed Aug 11 08:32:35 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 11 Aug 2004 11:32:35 -0400 Subject: [openib-general] [PATCH] ib_verbs.h: Fix some ib_device declarations Message-ID: <1092238357.1800.3.camel@localhost.localdomain> ib_verbs.h: Fix some ib_device declarations Index: ib_verbs.h =================================================================== --- ib_verbs.h (revision 629) +++ ib_verbs.h (working copy) @@ -677,7 +677,7 @@ static inline int ib_modify_qp(struct ib_qp *qp, struct ib_qp_attr *qp_attr, int qp_attr_mask, - struct ib_qp_cap *qp_cap ) + struct ib_qp_cap *qp_cap) { return qp->device->modify_qp(qp, qp_attr, qp_attr_mask, qp_cap); } @@ -903,6 +903,7 @@ return cq->device->req_n_notify_cq(cq, wc_cnt); } + enum ib_process_mad_flags { IB_MAD_IGNORE_MKEY = 1 }; @@ -932,28 +933,28 @@ u8 port_num, u16 index, u16 *pkey); int (*modify_device)(struct ib_device *device, u8 port_num, int device_attr_flags); - struct ib_pd (*alloc_pd)(struct ib_device *device); + struct ib_pd * (*alloc_pd)(struct ib_device *device); int (*dealloc_pd)(struct ib_pd *pd); - struct ib_ah (*create_ah)(struct ib_pd *pd, + struct ib_ah * (*create_ah)(struct ib_pd *pd, struct ib_ah_attr *ah_attr); int (*modify_ah)(struct ib_ah *ah, struct ib_ah_attr *ah_attr); int (*query_ah)(struct ib_ah *ah, struct ib_ah_attr *ah_attr); int (*destroy_ah)(struct ib_ah *ah); - struct ib_qp (*create_qp)(struct ib_pd *pd, + struct ib_qp * (*create_qp)(struct ib_pd *pd, struct ib_qp_init_attr *qp_init_attr, struct ib_qp_cap *qp_cap); int (*modify_qp)(struct ib_qp *qp, struct ib_qp_attr *qp_attr, int qp_attr_mask, - struct ib_qp_cap *qp_cap ); + struct ib_qp_cap *qp_cap); int (*query_qp)(struct ib_qp *qp, struct ib_qp_attr *qp_attr, int qp_attr_mask, struct ib_qp_init_attr *qp_init_attr); int (*destroy_qp)(struct ib_qp *qp); - struct ib_srq (*create_srq)(struct ib_pd *pd, + struct ib_srq * (*create_srq)(struct ib_pd *pd, void 
*srq_context, struct ib_srq_attr *srq_attr); int (*query_srq)(struct ib_srq *srq, @@ -966,12 +967,12 @@ int (*post_srq)(struct ib_srq *srq, struct ib_recv_wr *recv_wr, struct ib_recv_wr **bad_recv_wr); - struct ib_cq (*create_cq)(struct ib_device *device, + struct ib_cq * (*create_cq)(struct ib_device *device, ib_comp_handler comp_handler, void *cq_context, int cqe); int (*resize_cq)(struct ib_cq *cq, int cqe); int (*destroy_cq)(struct ib_cq *cq); - struct ib_mr (*reg_phys_mr)(struct ib_pd *pd, + struct ib_mr * (*reg_phys_mr)(struct ib_pd *pd, struct ib_phys_buf *phys_buf_array, int num_phys_buf, int mr_access_flags, @@ -986,12 +987,12 @@ int num_phys_buf, int mr_access_flags, u64 *iova_start); - struct ib_mw (*alloc_mw)(struct ib_pd *pd); + struct ib_mw * (*alloc_mw)(struct ib_pd *pd); int (*bind_mw)(struct ib_qp *qp, struct ib_mw *mw, struct ib_mw_bind *mw_bind); int (*dealloc_mw)(struct ib_mw *mw); - struct ib_fmr (*alloc_fmr)(struct ib_pd *pd, + struct ib_fmr * (*alloc_fmr)(struct ib_pd *pd, int mr_access_flags, struct ib_fmr_attr *fmr_attr); int (*map_fmr)(struct ib_fmr *fmr, void *addr, u64 size); From mshefty at ichips.intel.com Wed Aug 11 07:41:03 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 11 Aug 2004 07:41:03 -0700 Subject: [openib-general] [PATCH] ib_verbs.h: Fix some ib_device declarations In-Reply-To: <1092238357.1800.3.camel@localhost.localdomain> References: <1092238357.1800.3.camel@localhost.localdomain> Message-ID: <20040811074103.5e5a413d.mshefty@ichips.intel.com> On Wed, 11 Aug 2004 11:32:35 -0400 Hal Rosenstock wrote: > ib_verbs.h: Fix some ib_device declarations applied From mshefty at ichips.intel.com Wed Aug 11 07:45:12 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 11 Aug 2004 07:45:12 -0700 Subject: [openib-general] [PATCH] defined struct ib_mw_bind In-Reply-To: <523c2u5vxv.fsf@topspin.com> References: <20040810162036.36bf3832.mshefty@ichips.intel.com> <20040810162701.0ec7352f.mshefty@ichips.intel.com> 
<523c2u5vxv.fsf@topspin.com> Message-ID: <20040811074512.4ab6cdc5.mshefty@ichips.intel.com> On Tue, 10 Aug 2004 18:02:04 -0700 Roland Dreier wrote: > Thanks, this is in my WIP tree (will commit once I have the QP stuff done). Patch has been committed. Removed comments from the enum. (Will just use comments on function calls when I get to them.) From halr at voltaire.com Wed Aug 11 09:09:11 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 11 Aug 2004 12:09:11 -0400 Subject: [openib-general] [PATCH] ib_verbs.h: Fix ib_mw_bind structure Message-ID: <1092240552.1821.4.camel@localhost.localdomain> Fix ib_mw_bind structure Index: ib_verbs.h =================================================================== --- ib_verbs.h (revision 630) +++ ib_verbs.h (working copy) @@ -466,7 +466,7 @@ }; struct ib_mw_bind { - struct *ib_mr; + struct ib_mr *mr; u64 wr_id; u64 addr; u32 length; From mshefty at ichips.intel.com Wed Aug 11 08:27:12 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 11 Aug 2004 08:27:12 -0700 Subject: [openib-general] [PATCH] ib_verbs.h: Fix ib_mw_bind structure In-Reply-To: <1092240552.1821.4.camel@localhost.localdomain> References: <1092240552.1821.4.camel@localhost.localdomain> Message-ID: <20040811082712.48c55f67.mshefty@ichips.intel.com> On Wed, 11 Aug 2004 12:09:11 -0400 Hal Rosenstock wrote: > Fix ib_mw_bind structure Thanks - applied. 
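For reference, a stand-alone sketch of the corrected `ib_mw_bind` declaration and how a caller might fill it in. The original `struct *ib_mr;` was ill-formed C; the fix names the member (`struct ib_mr *mr`). The `struct ib_mr` stub and the helper below are invented so the fragment compiles on its own, and the real struct may carry more fields than the four shown in the hunk.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t u64;
typedef uint32_t u32;

/* Stub so the example is self-contained; the real struct ib_mr is
 * larger (device pointer, protection domain, etc.). */
struct ib_mr {
	u32 lkey;
	u32 rkey;
};

/* Corrected layout from the patch above: a named pointer member,
 * not the ill-formed "struct *ib_mr". Only the fields visible in
 * the hunk are reproduced here. */
struct ib_mw_bind {
	struct ib_mr *mr;
	u64 wr_id;
	u64 addr;
	u32 length;
};

/* Hypothetical helper showing typical initialization of a bind
 * request before posting it. */
static struct ib_mw_bind make_bind(struct ib_mr *mr, u64 wr_id,
				   u64 addr, u32 length)
{
	struct ib_mw_bind bind;

	memset(&bind, 0, sizeof bind);
	bind.mr     = mr;
	bind.wr_id  = wr_id;
	bind.addr   = addr;
	bind.length = length;
	return bind;
}
```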
From halr at voltaire.com Wed Aug 11 09:38:15 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 11 Aug 2004 12:38:15 -0400 Subject: [openib-general] [PATCH] ib_verbs.h: Move struct ib_device up in file Message-ID: <1092242297.1826.2.camel@localhost.localdomain> Move definition of struct ib_device earlier in ib_verbs.h so it is defined prior to its use in functions Index: ib_verbs.h =================================================================== --- ib_verbs.h (revision 631) +++ ib_verbs.h (working copy) @@ -30,7 +30,6 @@ #include #include -struct ib_device; struct ib_mad; enum ib_event_type { @@ -599,6 +598,128 @@ IB_CQ_NEXT_COMP }; +enum ib_process_mad_flags { + IB_MAD_IGNORE_MKEY = 1 +}; + +#define IB_DEVICE_NAME_MAX 64 + +struct ib_device { + struct module *owner; + struct pci_dev *dma_device; + + char name[IB_DEVICE_NAME_MAX]; + char *provider; + void *private; + struct list_head core_list; + void *core; + void *mad; + u32 flags; + + int (*query_device)(struct ib_device *device, + struct ib_device_cap *device_cap); + int (*query_port)(struct ib_device *device, + u8 port_num, struct ib_port *port); + int (*query_gid)(struct ib_device *device, + u8 port_num, int index, + union ib_gid *gid); + int (*query_pkey)(struct ib_device *device, + u8 port_num, u16 index, u16 *pkey); + int (*modify_device)(struct ib_device *device, + u8 port_num, int device_attr_flags); + struct ib_pd * (*alloc_pd)(struct ib_device *device); + int (*dealloc_pd)(struct ib_pd *pd); + struct ib_ah * (*create_ah)(struct ib_pd *pd, + struct ib_ah_attr *ah_attr); + int (*modify_ah)(struct ib_ah *ah, + struct ib_ah_attr *ah_attr); + int (*query_ah)(struct ib_ah *ah, + struct ib_ah_attr *ah_attr); + int (*destroy_ah)(struct ib_ah *ah); + struct ib_qp * (*create_qp)(struct ib_pd *pd, + struct ib_qp_init_attr *qp_init_attr, + struct ib_qp_cap *qp_cap); + int (*modify_qp)(struct ib_qp *qp, + struct ib_qp_attr *qp_attr, + int qp_attr_mask, + struct ib_qp_cap *qp_cap); + int
(*query_qp)(struct ib_qp *qp, + struct ib_qp_attr *qp_attr, + int qp_attr_mask, + struct ib_qp_init_attr *qp_init_attr); + int (*destroy_qp)(struct ib_qp *qp); + struct ib_srq * (*create_srq)(struct ib_pd *pd, + void *srq_context, + struct ib_srq_attr *srq_attr); + int (*query_srq)(struct ib_srq *srq, + struct ib_srq_attr *srq_attr); + int (*modify_srq)(struct ib_srq *srq, + struct ib_pd *pd, + struct ib_srq_attr *srq_attr, + int srq_attr_mask); + int (*destroy_srq)(struct ib_srq *srq); + int (*post_srq)(struct ib_srq *srq, + struct ib_recv_wr *recv_wr, + struct ib_recv_wr **bad_recv_wr); + struct ib_cq * (*create_cq)(struct ib_device *device, + ib_comp_handler comp_handler, + void *cq_context, int cqe); + int (*resize_cq)(struct ib_cq *cq, int cqe); + int (*destroy_cq)(struct ib_cq *cq); + struct ib_mr * (*reg_phys_mr)(struct ib_pd *pd, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + int mr_access_flags, + u64 *iova_start); + int (*query_mr)(struct ib_mr *mr, + struct ib_mr_attr *mr_attr); + int (*dereg_mr)(struct ib_mr *mr); + int (*rereg_phys_mr)(struct ib_mr *mr, + int mr_rereg_mask, + struct ib_pd *pd, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + int mr_access_flags, + u64 *iova_start); + struct ib_mw * (*alloc_mw)(struct ib_pd *pd); + int (*bind_mw)(struct ib_qp *qp, + struct ib_mw *mw, + struct ib_mw_bind *mw_bind); + int (*dealloc_mw)(struct ib_mw *mw); + struct ib_fmr * (*alloc_fmr)(struct ib_pd *pd, + int mr_access_flags, + struct ib_fmr_attr *fmr_attr); + int (*map_fmr)(struct ib_fmr *fmr, void *addr, u64 size); + int (*map_phys_fmr)(struct ib_fmr *fmr, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf); + int (*unmap_fmr)(struct ib_fmr **fmr_array, int fmr_cnt); + int (*free_fmr)(struct ib_fmr *fmr); + int (*attach_mcast)(struct ib_qp *qp, union ib_gid *gid, + u16 lid); + int (*detach_mcast)(struct ib_qp *qp, union ib_gid *gid, + u16 lid); + int (*post_send)(struct ib_qp *qp, + struct ib_send_wr *send_wr, + struct 
ib_send_wr **bad_send_wr); + int (*post_recv)(struct ib_qp *qp, + struct ib_recv_wr *recv_wr, + struct ib_recv_wr **bad_recv_wr); + int (*poll_cq)(struct ib_cq *cq, + int num_entries, + struct ib_wc *wc_array); + int (*peek_cq)(struct ib_cq *cq, int wc_cnt); + int (*req_notify_cq)(struct ib_cq *cq, + enum ib_cq_notify cq_notify); + int (*req_n_notify_cq)(struct ib_cq *cq, int wc_cnt); + int (*process_mad)(struct ib_device *device, + int process_mad_flags, + struct ib_mad *in_mad, + struct ib_mad *out_mad); + + struct class_device class_dev; +}; + static inline int ib_query_device(struct ib_device *device, struct ib_device_cap *device_cap) { @@ -904,126 +1025,4 @@ return cq->device->req_n_notify_cq(cq, wc_cnt); } -enum ib_process_mad_flags { - IB_MAD_IGNORE_MKEY = 1 -}; - -#define IB_DEVICE_NAME_MAX 64 - -struct ib_device { - struct module *owner; - struct pci_dev *dma_device; - - char name[IB_DEVICE_NAME_MAX]; - char *provider; - void *private; - struct list_head core_list; - void *core; - void *mad; - u32 flags; - - int (*query_device)(struct ib_device *device, - struct ib_device_cap *device_cap); - int (*query_port)(struct ib_device *device, - u8 port_num, struct ib_port *port); - int (*query_gid)(struct ib_device *device, - u8 port_num, int index, - union ib_gid *gid); - int (*query_pkey)(struct ib_device *device, - u8 port_num, u16 index, u16 *pkey); - int (*modify_device)(struct ib_device *device, - u8 port_num, int device_attr_flags); - struct ib_pd * (*alloc_pd)(struct ib_device *device); - int (*dealloc_pd)(struct ib_pd *pd); - struct ib_ah * (*create_ah)(struct ib_pd *pd, - struct ib_ah_attr *ah_attr); - int (*modify_ah)(struct ib_ah *ah, - struct ib_ah_attr *ah_attr); - int (*query_ah)(struct ib_ah *ah, - struct ib_ah_attr *ah_attr); - int (*destroy_ah)(struct ib_ah *ah); - struct ib_qp * (*create_qp)(struct ib_pd *pd, - struct ib_qp_init_attr *qp_init_attr, - struct ib_qp_cap *qp_cap); - int (*modify_qp)(struct ib_qp *qp, - struct ib_qp_attr *qp_attr, - 
int qp_attr_mask, - struct ib_qp_cap *qp_cap ); - int (*query_qp)(struct ib_qp *qp, - struct ib_qp_attr *qp_attr, - int qp_attr_mask, - struct ib_qp_init_attr *qp_init_attr); - int (*destroy_qp)(struct ib_qp *qp); - struct ib_srq * (*create_srq)(struct ib_pd *pd, - void *srq_context, - struct ib_srq_attr *srq_attr); - int (*query_srq)(struct ib_srq *srq, - struct ib_srq_attr *srq_attr); - int (*modify_srq)(struct ib_srq *srq, - struct ib_pd *pd, - struct ib_srq_attr *srq_attr, - int srq_attr_mask); - int (*destroy_srq)(struct ib_srq *srq); - int (*post_srq)(struct ib_srq *srq, - struct ib_recv_wr *recv_wr, - struct ib_recv_wr **bad_recv_wr); - struct ib_cq * (*create_cq)(struct ib_device *device, - ib_comp_handler comp_handler, - void *cq_context, int cqe); - int (*resize_cq)(struct ib_cq *cq, int cqe); - int (*destroy_cq)(struct ib_cq *cq); - struct ib_mr * (*reg_phys_mr)(struct ib_pd *pd, - struct ib_phys_buf *phys_buf_array, - int num_phys_buf, - int mr_access_flags, - u64 *iova_start); - int (*query_mr)(struct ib_mr *mr, - struct ib_mr_attr *mr_attr); - int (*dereg_mr)(struct ib_mr *mr); - int (*rereg_phys_mr)(struct ib_mr *mr, - int mr_rereg_mask, - struct ib_pd *pd, - struct ib_phys_buf *phys_buf_array, - int num_phys_buf, - int mr_access_flags, - u64 *iova_start); - struct ib_mw * (*alloc_mw)(struct ib_pd *pd); - int (*bind_mw)(struct ib_qp *qp, - struct ib_mw *mw, - struct ib_mw_bind *mw_bind); - int (*dealloc_mw)(struct ib_mw *mw); - struct ib_fmr * (*alloc_fmr)(struct ib_pd *pd, - int mr_access_flags, - struct ib_fmr_attr *fmr_attr); - int (*map_fmr)(struct ib_fmr *fmr, void *addr, u64 size); - int (*map_phys_fmr)(struct ib_fmr *fmr, - struct ib_phys_buf *phys_buf_array, - int num_phys_buf); - int (*unmap_fmr)(struct ib_fmr **fmr_array, int fmr_cnt); - int (*free_fmr)(struct ib_fmr *fmr); - int (*attach_mcast)(struct ib_qp *qp, union ib_gid *gid, - u16 lid); - int (*detach_mcast)(struct ib_qp *qp, union ib_gid *gid, - u16 lid); - int (*post_send)(struct 
		ib_qp *qp,
-			 struct ib_send_wr *send_wr,
-			 struct ib_send_wr **bad_send_wr);
-	int	(*post_recv)(struct ib_qp *qp,
-			 struct ib_recv_wr *recv_wr,
-			 struct ib_recv_wr **bad_recv_wr);
-	int	(*poll_cq)(struct ib_cq *cq,
-			 int num_entries,
-			 struct ib_wc *wc_array);
-	int	(*peek_cq)(struct ib_cq *cq, int wc_cnt);
-	int	(*req_notify_cq)(struct ib_cq *cq,
-			 enum ib_cq_notify cq_notify);
-	int	(*req_n_notify_cq)(struct ib_cq *cq, int wc_cnt);
-	int	(*process_mad)(struct ib_device *device,
-			 int process_mad_flags,
-			 struct ib_mad *in_mad,
-			 struct ib_mad *out_mad);
-
-	struct class_device class_dev;
-};
-
 #endif /* IB_VERBS_H */

From mshefty at ichips.intel.com  Wed Aug 11 08:43:40 2004
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 11 Aug 2004 08:43:40 -0700
Subject: [openib-general] [PATCH] ib_verbs.h: Move struct ib_device up in file
In-Reply-To: <1092242297.1826.2.camel@localhost.localdomain>
References: <1092242297.1826.2.camel@localhost.localdomain>
Message-ID: <20040811084340.430cf4c0.mshefty@ichips.intel.com>

On Wed, 11 Aug 2004 12:38:15 -0400
Hal Rosenstock wrote:

> Move definition of struct ib_device earlier in ib_verbs.h so it is
> defined prior to its use in functions

Thanks! - applied.
From halr at voltaire.com Wed Aug 11 10:14:07 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 11 Aug 2004 13:14:07 -0400 Subject: [openib-general] [PATCH]: GSI: synchronize with latest ib_verbs.h file Message-ID: <1092244449.1877.10.camel@localhost.localdomain> Synchronize GSI with latest ib_verbs.h file Note that this change does not include the replacement of the use of ib_reg_mr with ib_reg_phys_mr Index: access/gsi_main.c =================================================================== --- access/gsi_main.c (revision 628) +++ access/gsi_main.c (working copy) @@ -471,7 +471,7 @@ dtgrm_priv->v_mem_h = ib_reg_mr(hca->pd, dtgrm_priv->grh, MAD_BLOCK_SIZE + IB_GRH_LEN, - IB_MR_LOCAL_WRITE, + IB_ACCESS_LOCAL_WRITE, &dtgrm_priv->sg.lkey, &rkey); if (IS_ERR(dtgrm_priv->v_mem_h)) { printk(KERN_ERR \ @@ -1913,7 +1913,7 @@ dtgrm_priv->v_mem_h = ib_reg_mr(hca->pd, mad, MAD_BLOCK_SIZE, - IB_MR_LOCAL_WRITE, + IB_ACCESS_LOCAL_WRITE, &dtgrm_priv->sg.lkey, &rkey); if (IS_ERR(dtgrm_priv->v_mem_h)) { printk(KERN_ERR "Could not get general memory attr.\n"); @@ -1921,7 +1921,6 @@ goto error2; } - wr.next = NULL; wr.wr_id = (unsigned long) dtgrm_priv; wr.sg_list = &dtgrm_priv->sg; wr.sg_list->addr = (unsigned long) mad; @@ -2065,14 +2064,13 @@ dtgrm_priv->v_mem_h = ib_reg_mr(hca->pd, mad, MAD_BLOCK_SIZE, - IB_MR_LOCAL_WRITE, + IB_ACCESS_LOCAL_WRITE, &dtgrm_priv->sg.lkey, &rkey); if (IS_ERR(dtgrm_priv->v_mem_h)) { printk(KERN_ERR "Could not get general memory attr.\n"); goto error2; } - wr.next = NULL; wr.wr_id = (unsigned long) dtgrm_priv; wr.sg_list = &dtgrm_priv->sg; wr.sg_list->addr = (unsigned long) mad; Index: include/ib_verbs.h =================================================================== --- include/ib_verbs.h (revision 627) +++ include/ib_verbs.h (working copy) @@ -26,6 +26,12 @@ #if !defined( IB_VERBS_H ) #define IB_VERBS_H +#include +#include +#include + +struct ib_mad; + enum ib_event_type { IB_EVENT_CQ_ERR, IB_EVENT_QP_FATAL, @@ -44,75 +50,96 @@ }; 
struct ib_event { - struct ib_device *device; - void *context; + struct ib_device *device; + void *context; union { - struct ib_cq *cq; - struct ib_qp *qp; - u8 port; + struct ib_cq *cq; + struct ib_qp *qp; + u8 port; } element; - enum ib_event_type event; + enum ib_event_type event; }; -typedef void (*ib_event_handler) (struct ib_event * async_event); +typedef void (*ib_event_handler)(struct ib_event *async_event); -typedef void (*ib_comp_handler) (struct ib_cq * cq); +typedef void (*ib_comp_handler)(struct ib_cq *cq); struct ib_pd { - struct ib_device *device; + struct ib_device *device; + atomic_t usecnt; }; struct ib_ah { - struct ib_device *device; + struct ib_device *device; + struct ib_pd *pd; + atomic_t usecnt; }; struct ib_cq { - struct ib_device *device; - ib_comp_handler comp_handler; - void *cq_context; - int cqe; + struct ib_device *device; + ib_comp_handler comp_handler; + void *cq_context; + int cqe; + atomic_t usecnt; }; struct ib_srq { - struct ib_device *device; - void *srq_context; + struct ib_device *device; + struct ib_pd *pd; + void *srq_context; + atomic_t usecnt; }; struct ib_qp { - struct ib_device *device; - void *qp_context; - u32 qp_num; + struct ib_device *device; + struct ib_pd *pd; + struct ib_cq *send_cq; + struct ib_cq *recv_cq; + void *qp_context; + u32 qp_num; + atomic_t usecnt; }; struct ib_mr { - struct ib_device *device; + struct ib_device *device; + struct ib_pd *pd; + u32 lkey; + u32 rkey; + atomic_t usecnt; }; struct ib_mw { - struct ib_device *device; + struct ib_device *device; + struct ib_pd *pd; + u32 rkey; + atomic_t usecnt; }; struct ib_fmr { - struct ib_device *device; + struct ib_device *device; + struct ib_pd *pd; + u32 lkey; + u32 rkey; + atomic_t usecnt; }; enum ib_device_cap_flags { - IB_DEVICE_RESIZE_MAX_WR = 1, - IB_DEVICE_BAD_PKEY_CNT = (1 << 1), - IB_DEVICE_BAD_QKEY_CNT = (1 << 2), - IB_DEVICE_RAW_MULTI = (1 << 3), - IB_DEVICE_AUTO_PATH_MIG = (1 << 4), - IB_DEVICE_CHANGE_PHY_PORT = (1 << 5), - 
IB_DEVICE_UD_AV_PORT_ENFORCE = (1 << 6), - IB_DEVICE_CURR_QP_STATE_MOD = (1 << 7), - IB_DEVICE_SHUTDOWN_PORT = (1 << 8), - IB_DEVICE_INIT_TYPE = (1 << 9), - IB_DEVICE_PORT_ACTIVE_EVENT = (1 << 10), - IB_DEVICE_SYS_IMG_GUID = (1 << 11), - IB_DEVICE_RC_RNR_NAK_GEN = (1 << 12), - IB_DEVICE_SRQ_RESIZE = (1 << 13), - IB_DEVICE_N_NOTIFY_CQ = (1 << 14), - IB_DEVICE_RQ_SIG_TYPE = (1 << 15) + IB_DEVICE_RESIZE_MAX_WR = 1, + IB_DEVICE_BAD_PKEY_CNT = (1<<1), + IB_DEVICE_BAD_QKEY_CNT = (1<<2), + IB_DEVICE_RAW_MULTI = (1<<3), + IB_DEVICE_AUTO_PATH_MIG = (1<<4), + IB_DEVICE_CHANGE_PHY_PORT = (1<<5), + IB_DEVICE_UD_AV_PORT_ENFORCE = (1<<6), + IB_DEVICE_CURR_QP_STATE_MOD = (1<<7), + IB_DEVICE_SHUTDOWN_PORT = (1<<8), + IB_DEVICE_INIT_TYPE = (1<<9), + IB_DEVICE_PORT_ACTIVE_EVENT = (1<<10), + IB_DEVICE_SYS_IMG_GUID = (1<<11), + IB_DEVICE_RC_RNR_NAK_GEN = (1<<12), + IB_DEVICE_SRQ_RESIZE = (1<<13), + IB_DEVICE_N_NOTIFY_CQ = (1<<14), + IB_DEVICE_RQ_SIG_TYPE = (1<<15) }; enum ib_atomic_cap { @@ -122,157 +149,153 @@ }; struct ib_device_cap { - u64 fw_ver; - u64 node_guid; - u64 sys_image_guid; - u64 max_mr_size; - u64 page_size_cap; - u32 vendor_id; - u32 vendor_part_id; - u32 hw_ver; - int max_qp; - int max_qp_wr; - int device_cap_flags; - int max_sge; - int max_sge_rd; - int max_cq; - int max_cqe; - int max_mr; - int max_pd; - int phys_port_cnt; - int max_qp_rd_atom; - int max_ee_rd_atom; - int max_res_rd_atom; - int max_qp_init_rd_atom; - int max_ee_init_rd_atom; - enum ib_atomic_cap atomic_cap; - int max_ee; - int max_rdd; - int max_mw; - int max_raw_ipv6_qp; - int max_raw_ethy_qp; - int max_mcast_grp; - int max_mcast_qp_attach; - int max_total_mcast_qp_attach; - int max_ah; - int max_fmr; - int max_map_per_fmr; - int max_srq; - int max_srq_wr; - int max_srq_sge; - u16 max_pkeys; - u8 local_ca_ack_delay; + u64 fw_ver; + u64 node_guid; + u64 sys_image_guid; + u64 max_mr_size; + u64 page_size_cap; + u32 vendor_id; + u32 vendor_part_id; + u32 hw_ver; + int max_qp; + int max_qp_wr; + int 
device_cap_flags; + int max_sge; + int max_sge_rd; + int max_cq; + int max_cqe; + int max_mr; + int max_pd; + int phys_port_cnt; + int max_qp_rd_atom; + int max_ee_rd_atom; + int max_res_rd_atom; + int max_qp_init_rd_atom; + int max_ee_init_rd_atom; + enum ib_atomic_cap atomic_cap; + int max_ee; + int max_rdd; + int max_mw; + int max_raw_ipv6_qp; + int max_raw_ethy_qp; + int max_mcast_grp; + int max_mcast_qp_attach; + int max_total_mcast_qp_attach; + int max_ah; + int max_fmr; + int max_map_per_fmr; + int max_srq; + int max_srq_wr; + int max_srq_sge; + u16 max_pkeys; + u8 local_ca_ack_delay; }; enum ib_mtu { - IB_MTU_256 = 1, - IB_MTU_512 = 2, + IB_MTU_256 = 1, + IB_MTU_512 = 2, IB_MTU_1024 = 3, IB_MTU_2048 = 4, IB_MTU_4096 = 5 }; enum ib_static_rate { - IB_STATIC_RATE_FULL = 0, - IB_STATIC_RATE_12X_TO_4X = 2, - IB_STATIC_RATE_4X_TO_1X = 3, - IB_STATIC_RATE_12X_TO_1X = 11 + IB_STATIC_RATE_FULL = 0, + IB_STATIC_RATE_12X_TO_4X = 2, + IB_STATIC_RATE_4X_TO_1X = 3, + IB_STATIC_RATE_12X_TO_1X = 11 }; enum ib_port_state { - IB_PORT_NOP = 0, - IB_PORT_DOWN = 1, - IB_PORT_INIT = 2, - IB_PORT_ARMED = 3, - IB_PORT_ACTIVE = 4, - IB_PORT_ACTIVE_DEFER = 5 + IB_PORT_NOP = 0, + IB_PORT_DOWN = 1, + IB_PORT_INIT = 2, + IB_PORT_ARMED = 3, + IB_PORT_ACTIVE = 4, + IB_PORT_ACTIVE_DEFER = 5 }; enum ib_port_cap_flags { - IB_PORT_SM = (1 << 1), - IB_PORT_NOTICE_SUP = (1 << 2), - IB_PORT_TRAP_SUP = (1 << 3), - IB_PORT_AUTO_MIGR_SUP = (1 << 5), - IB_PORT_SL_MAP_SUP = (1 << 6), - IB_PORT_MKEY_NVRAM = (1 << 7), - IB_PORT_PKEY_NVRAM = (1 << 8), - IB_PORT_LED_INFO_SUP = (1 << 9), - IB_PORT_SM_DISABLED = (1 << 10), - IB_PORT_SYS_IMAGE_GUID_SUP = (1 << 11), - IB_PORT_PKEY_SW_EXT_PORT_TRAP_SUP = (1 << 12), - IB_PORT_CM_SUP = (1 << 16), - IB_PORT_SNMP_TUNN_SUP = (1 << 17), - IB_PORT_REINIT_SUP = (1 << 18), - IB_PORT_DEVICE_MGMT_SUP = (1 << 19), - IB_PORT_VENDOR_CLS_SUP = (1 << 20), - IB_PORT_DR_NOTICE_SUP = (1 << 21), - IB_PORT_PORT_NOTICE_SUP = (1 << 22), - IB_PORT_BOOT_MGMT_SUP = (1 << 23) + 
IB_PORT_SM = (1<<1), + IB_PORT_NOTICE_SUP = (1<<2), + IB_PORT_TRAP_SUP = (1<<3), + IB_PORT_AUTO_MIGR_SUP = (1<<5), + IB_PORT_SL_MAP_SUP = (1<<6), + IB_PORT_MKEY_NVRAM = (1<<7), + IB_PORT_PKEY_NVRAM = (1<<8), + IB_PORT_LED_INFO_SUP = (1<<9), + IB_PORT_SM_DISABLED = (1<<10), + IB_PORT_SYS_IMAGE_GUID_SUP = (1<<11), + IB_PORT_PKEY_SW_EXT_PORT_TRAP_SUP = (1<<12), + IB_PORT_CM_SUP = (1<<16), + IB_PORT_SNMP_TUNN_SUP = (1<<17), + IB_PORT_REINIT_SUP = (1<<18), + IB_PORT_DEVICE_MGMT_SUP = (1<<19), + IB_PORT_VENDOR_CLS_SUP = (1<<20), + IB_PORT_DR_NOTICE_SUP = (1<<21), + IB_PORT_PORT_NOTICE_SUP = (1<<22), + IB_PORT_BOOT_MGMT_SUP = (1<<23) }; struct ib_port { - enum ib_port_state state; - enum ib_mtu max_mtu; - int port_cap_flags; - int gid_tbl_len; - u32 max_msg_sz; - u32 bad_pkey_cntr; - u32 qkey_viol_cntr; - u16 pkey_tbl_len; - u16 lid; - u16 sm_lid; - u8 lmc; - u8 max_vl_num; - u8 sm_sl; - u8 subnet_timeout; - u8 init_type_reply; -}; + enum ib_port_state state; + enum ib_mtu max_mtu; + enum ib_mtu active_mtu; + int port_cap_flags; + int gid_tbl_len; + u32 max_msg_sz; + u32 bad_pkey_cntr; + u32 qkey_viol_cntr; + u16 pkey_tbl_len; + u16 lid; + u16 sm_lid; + u8 lmc; + u8 max_vl_num; + u8 sm_sl; + u8 subnet_timeout; + u8 init_type_reply; +}; enum ib_device_attr_flags { - IB_DEVICE_SM = 1, - IB_DEVICE_SNMP_TUN_SUP = (1 << 1), - IB_DEVICE_DM_SUP = (1 << 2), - IB_DEVICE_VENDOR_CLS_SUP = (1 << 3), - IB_DEVICE_RESET_QKEY_CNTR = (1 << 4) + IB_DEVICE_SM = 1, + IB_DEVICE_SNMP_TUN_SUP = (1<<1), + IB_DEVICE_DM_SUP = (1<<2), + IB_DEVICE_VENDOR_CLS_SUP = (1<<3), + IB_DEVICE_RESET_QKEY_CNTR = (1<<4) }; union ib_gid { - u8 raw[16]; + u8 raw[16]; struct { -#if __BIG_ENDIAN - u64 subnet_prefix; - u64 interface_id; -#else - u64 interface_id; - u64 subnet_prefix; -#endif + u64 subnet_prefix; + u64 interface_id; } global; }; struct ib_global_route { - union ib_gid dgid; - u32 flow_label; - u8 sgid_index; - u8 hop_limit; - u8 traffic_class; + union ib_gid dgid; + u32 flow_label; + u8 sgid_index; + 
u8 hop_limit; + u8 traffic_class; }; struct ib_ah_attr { - struct ib_global_route grh; - u16 dlid; - u8 sl; - u8 src_path_bits; - u8 static_rate; - u8 grh_flag; - u8 port; + struct ib_global_route grh; + u16 dlid; + u8 sl; + u8 src_path_bits; + u8 static_rate; + u8 grh_flag; + u8 port; }; struct ib_qp_cap { - u32 max_send_wr; - u32 max_recv_wr; - u32 max_send_sge; - u32 max_recv_sge; - u32 max_inline_data; + u32 max_send_wr; + u32 max_recv_wr; + u32 max_send_sge; + u32 max_recv_sge; + u32 max_inline_data; }; enum ib_sig_type { @@ -291,38 +314,46 @@ }; struct ib_qp_init_attr { - void *qp_context; - struct ib_cq *send_cq; - struct ib_cq *recv_cq; - struct ib_srq *srq; - struct ib_qp_cap cap; - enum ib_sig_type sq_sig_type; - enum ib_sig_type rq_sig_type; - enum ib_qp_type qp_type; - u8 port_num; /* special QP types only */ + void *qp_context; + struct ib_cq *send_cq; + struct ib_cq *recv_cq; + struct ib_srq *srq; + struct ib_qp_cap cap; + enum ib_sig_type sq_sig_type; + enum ib_sig_type rq_sig_type; + enum ib_qp_type qp_type; + u8 port_num; /* special QP types only */ }; +enum ib_access_flags { + IB_ACCESS_REMOTE_WRITE = 1, + IB_ACCESS_REMOTE_READ = (1<<1), + IB_ACCESS_REMOTE_ATOMIC = (1<<2), + IB_ACCESS_LOCAL_WRITE = (1<<3), + IB_ACCESS_MW_BIND = (1<<4) +}; + enum ib_qp_attr_mask { - IB_QP_STATE = 1, - IB_QP_EN_SQD_ASYNC_NOTIFY = (1 << 1), - IB_QP_REMOTE_ATOMIC_FLAGS = (1 << 3), - IB_QP_PKEY_INDEX = (1 << 4), - IB_QP_PORT = (1 << 5), - IB_QP_QKEY = (1 << 6), - IB_QP_AV = (1 << 7), - IB_QP_PATH_MTU = (1 << 8), - IB_QP_TIMEOUT = (1 << 9), - IB_QP_RETRY_CNT = (1 << 10), - IB_QP_RNR_RETRY = (1 << 11), - IB_QP_RQ_PSN = (1 << 12), - IB_QP_MAX_QP_RD_ATOMIC = (1 << 13), - IB_QP_ALT_PATH = (1 << 14), - IB_QP_MIN_RNR_TIMER = (1 << 15), - IB_QP_SQ_PSN = (1 << 16), - IB_QP_MAX_DEST_RD_ATOMIC = (1 << 17), - IB_QP_PATH_MIG_STATE = (1 << 18), - IB_QP_CAP = (1 << 19), - IB_QP_DEST_QPN = (1 << 20) + IB_QP_STATE = 1, + IB_QP_EN_SQD_ASYNC_NOTIFY = (1<<1), + IB_QP_ACCESS_FLAGS = 
(1<<3), + IB_QP_PKEY_INDEX = (1<<4), + IB_QP_PORT = (1<<5), + IB_QP_QKEY = (1<<6), + IB_QP_AV = (1<<7), + IB_QP_PATH_MTU = (1<<8), + IB_QP_TIMEOUT = (1<<9), + IB_QP_RETRY_CNT = (1<<10), + IB_QP_RNR_RETRY = (1<<11), + IB_QP_RQ_PSN = (1<<12), + IB_QP_MAX_QP_RD_ATOMIC = (1<<13), + IB_QP_ALT_PATH = (1<<14), + IB_QP_MIN_RNR_TIMER = (1<<15), + IB_QP_SQ_PSN = (1<<16), + IB_QP_MAX_DEST_RD_ATOMIC = (1<<17), + IB_QP_PATH_MIG_STATE = (1<<18), + IB_QP_CAP = (1<<19), + IB_QP_DEST_QPN = (1<<20) }; enum ib_qp_state { @@ -341,81 +372,111 @@ IB_MIG_ARMED }; +enum ib_rnr_timeout { + IB_RNR_TIMER_655_36 = 0, + IB_RNR_TIMER_000_01 = 1, + IB_RNR_TIMER_000_02 = 2, + IB_RNR_TIMER_000_03 = 3, + IB_RNR_TIMER_000_04 = 4, + IB_RNR_TIMER_000_06 = 5, + IB_RNR_TIMER_000_08 = 6, + IB_RNR_TIMER_000_12 = 7, + IB_RNR_TIMER_000_16 = 8, + IB_RNR_TIMER_000_24 = 9, + IB_RNR_TIMER_000_32 = 10, + IB_RNR_TIMER_000_48 = 11, + IB_RNR_TIMER_000_64 = 12, + IB_RNR_TIMER_000_96 = 13, + IB_RNR_TIMER_001_28 = 14, + IB_RNR_TIMER_001_92 = 15, + IB_RNR_TIMER_002_56 = 16, + IB_RNR_TIMER_003_84 = 17, + IB_RNR_TIMER_005_12 = 18, + IB_RNR_TIMER_007_68 = 19, + IB_RNR_TIMER_010_24 = 20, + IB_RNR_TIMER_015_36 = 21, + IB_RNR_TIMER_020_48 = 22, + IB_RNR_TIMER_030_72 = 23, + IB_RNR_TIMER_040_96 = 24, + IB_RNR_TIMER_061_44 = 25, + IB_RNR_TIMER_081_92 = 26, + IB_RNR_TIMER_122_88 = 27, + IB_RNR_TIMER_163_84 = 28, + IB_RNR_TIMER_245_76 = 29, + IB_RNR_TIMER_327_68 = 30, + IB_RNR_TIMER_491_52 = 31 +}; + struct ib_qp_attr { - enum ib_qp_state qp_state; - enum ib_mtu path_mtu; - enum ib_mig_state path_mig_state; - u32 qkey; - u32 rq_psn; - u32 sq_psn; - u32 dest_qp_num; - int remote_atomic_flags; - struct ib_qp_cap cap; - struct ib_ah_attr ah_attr; - struct ib_ah_attr alt_ah_attr; - u16 pkey_index; - u16 alt_pkey_index; - u8 en_sqd_async_notify; - u8 sq_draining; - u8 max_rd_atomic; - u8 max_dest_rd_atomic; - u8 min_rnr_timer; - u8 port; - u8 timeout; - u8 retry_cnt; - u8 rnr_retry; - u8 alt_port; - u8 alt_timeout; + enum ib_qp_state 
qp_state; + enum ib_mtu path_mtu; + enum ib_mig_state path_mig_state; + u32 qkey; + u32 rq_psn; + u32 sq_psn; + u32 dest_qp_num; + int qp_access_flags; + struct ib_qp_cap cap; + struct ib_ah_attr ah_attr; + struct ib_ah_attr alt_ah_attr; + u16 pkey_index; + u16 alt_pkey_index; + u8 en_sqd_async_notify; + u8 sq_draining; + u8 max_rd_atomic; + u8 max_dest_rd_atomic; + u8 min_rnr_timer; + u8 port; + u8 timeout; + u8 retry_cnt; + u8 rnr_retry; + u8 alt_port; + u8 alt_timeout; }; enum ib_srq_attr_mask { - IB_SRQ_PD = 1, - IB_SRQ_MAX_WR = (1 << 1), - IB_SRQ_MAX_SGE = (1 << 2), - IB_SRQ_LIMIT = (1 << 3) + IB_SRQ_PD = 1, + IB_SRQ_MAX_WR = (1<<1), + IB_SRQ_MAX_SGE = (1<<2), + IB_SRQ_LIMIT = (1<<3) }; struct ib_srq_attr { - void *srq_context; - int max_wr; - int max_sge; - int srq_limit; + int max_wr; + int max_sge; + int srq_limit; }; -enum ib_mr_access_flags { - IB_MR_LOCAL_WRITE = 1, - IB_MR_REMOTE_WRITE = (1 << 1), - IB_MR_REMOTE_READ = (1 << 2), - IB_MR_REMOTE_ATOMIC = (1 << 3), - IB_MR_MW_BIND = (1 << 4) -}; - struct ib_phys_buf { - u64 addr; - u64 size; + u64 addr; + u64 size; }; struct ib_mr_attr { - struct ib_pd *pd; - u64 device_virt_addr; - u64 size; - int mr_access_flags; - u32 lkey; - u32 rkey; + u64 device_virt_addr; + u64 size; + int mr_access_flags; }; enum ib_mr_rereg_flags { - IB_MR_REREG_TRANS = 1, - IB_MR_REREG_PD = (1 << 1), - IB_MR_REREG_ACCESS = (1 << 2) + IB_MR_REREG_TRANS = 1, + IB_MR_REREG_PD = (1<<1), + IB_MR_REREG_ACCESS = (1<<2) }; -struct ib_mw_bind; +struct ib_mw_bind { + struct ib_mr *mr; + u64 wr_id; + u64 addr; + u32 length; + int send_flags; + int mw_access_flags; +}; struct ib_fmr_attr { - int max_pages; - int max_maps; - u8 page_size; + int max_pages; + int max_maps; + u8 page_size; }; enum ib_wr_opcode { @@ -429,52 +490,52 @@ }; enum ib_send_flags { - IB_SEND_FENCE = 1, - IB_SEND_SIGNALED = (1 << 1), - IB_SEND_SOLICITED = (1 << 2), - IB_SEND_INLINE = (1 << 3) + IB_SEND_FENCE = 1, + IB_SEND_SIGNALED = (1<<1), + IB_SEND_SOLICITED = (1<<2), 
+ IB_SEND_INLINE = (1<<3) }; struct ib_sge { - u64 addr; - u32 length; - u32 lkey; + u64 addr; + u32 length; + u32 lkey; }; struct ib_send_wr { - struct ib_send_wr *next; - u64 wr_id; - struct ib_sge *sg_list; - int num_sge; - enum ib_wr_opcode opcode; - int send_flags; - u32 imm_data; + struct list_head list; + u64 wr_id; + struct ib_sge *sg_list; + int num_sge; + enum ib_wr_opcode opcode; + int send_flags; + u32 imm_data; union { struct { - u64 remote_addr; - u32 rkey; + u64 remote_addr; + u32 rkey; } rdma; struct { - u64 remote_addr; - u64 compare_add; - u64 swap; - u32 rkey; + u64 remote_addr; + u64 compare_add; + u64 swap; + u32 rkey; } atomic; struct { - struct ib_ah *ah; - u32 remote_qpn; - u32 remote_qkey; - u16 pkey_index; + struct ib_ah *ah; + u32 remote_qpn; + u32 remote_qkey; + u16 pkey_index; /* valid for GSI only */ } ud; } wr; }; struct ib_recv_wr { - struct _ib_recv_wr *next; - u64 wr_id; - struct ib_sge *sg_list; - int num_sge; - int recv_flags; + struct list_head list; + u64 wr_id; + struct ib_sge *sg_list; + int num_sge; + int recv_flags; }; enum ib_wc_status { @@ -497,7 +558,8 @@ IB_WC_REM_ABORT_ERR, IB_WC_INV_EECN_ERR, IB_WC_INV_EEC_STATE_ERR, - IB_WC_GENERAL_ERR + IB_WC_GENERAL_ERR, + IB_WC_RESP_TIMEOUT_ERR }; enum ib_wc_opcode { @@ -511,24 +573,24 @@ * Set value of IB_WC_RECV so consumers can test if a completion is a * receive by testing (opcode & IB_WC_RECV). 
*/ - IB_WC_RECV = (1 << 7), + IB_WC_RECV = (1<<7), IB_WC_RECV_RDMA_WITH_IMM }; struct ib_wc { - u64 wr_id; - enum ib_wc_status status; - enum ib_wc_opcode opcode; - u32 vendor_err; - u32 byte_len; - u32 imm_data; - u32 src_qp; - u16 pkey_index; - int grh_flag:1; - int imm_data_valid:1; - u16 slid; - u8 sl; - u8 dlid_path_bits; + u64 wr_id; + enum ib_wc_status status; + enum ib_wc_opcode opcode; + u32 vendor_err; + u32 byte_len; + u32 imm_data; + u32 src_qp; + u16 pkey_index; + int grh_flag:1; + int imm_data_valid:1; + u16 slid; + u8 sl; + u8 dlid_path_bits; }; enum ib_cq_notify { @@ -536,129 +598,431 @@ IB_CQ_NEXT_COMP }; -int ib_query_device(struct ib_device *device, - struct ib_device_cap *device_cap); +enum ib_process_mad_flags { + IB_MAD_IGNORE_MKEY = 1 +}; -int ib_query_port(struct ib_device *device, - u8 port_num, struct ib_port *port); +#define IB_DEVICE_NAME_MAX 64 -int ib_query_gid(struct ib_device *device, - u8 port_num, - int index, union ib_gid *gid); +struct ib_device { + struct module *owner; + struct pci_dev *dma_device; -int ib_query_pkey(struct ib_device *device, - u8 port_num, - u16 index, u16 *pkey); + char name[IB_DEVICE_NAME_MAX]; + char *provider; + void *private; + struct list_head core_list; + void *core; + void *mad; + u32 flags; -int ib_modify_device(struct ib_device *device, - u8 port_num, int device_attr_flags); + int (*query_device)(struct ib_device *device, + struct ib_device_cap *device_cap); + int (*query_port)(struct ib_device *device, + u8 port_num, struct ib_port *port); + int (*query_gid)(struct ib_device *device, + u8 port_num, int index, + union ib_gid *gid); + int (*query_pkey)(struct ib_device *device, + u8 port_num, u16 index, u16 *pkey); + int (*modify_device)(struct ib_device *device, + u8 port_num, int device_attr_flags); + struct ib_pd * (*alloc_pd)(struct ib_device *device); + int (*dealloc_pd)(struct ib_pd *pd); + struct ib_ah * (*create_ah)(struct ib_pd *pd, + struct ib_ah_attr *ah_attr); + int (*modify_ah)(struct 
ib_ah *ah, + struct ib_ah_attr *ah_attr); + int (*query_ah)(struct ib_ah *ah, + struct ib_ah_attr *ah_attr); + int (*destroy_ah)(struct ib_ah *ah); + struct ib_qp * (*create_qp)(struct ib_pd *pd, + struct ib_qp_init_attr *qp_init_attr, + struct ib_qp_cap *qp_cap); + int (*modify_qp)(struct ib_qp *qp, + struct ib_qp_attr *qp_attr, + int qp_attr_mask, + struct ib_qp_cap *qp_cap); + int (*query_qp)(struct ib_qp *qp, + struct ib_qp_attr *qp_attr, + int qp_attr_mask, + struct ib_qp_init_attr *qp_init_attr); + int (*destroy_qp)(struct ib_qp *qp); + struct ib_srq * (*create_srq)(struct ib_pd *pd, + void *srq_context, + struct ib_srq_attr *srq_attr); + int (*query_srq)(struct ib_srq *srq, + struct ib_srq_attr *srq_attr); + int (*modify_srq)(struct ib_srq *srq, + struct ib_pd *pd, + struct ib_srq_attr *srq_attr, + int srq_attr_mask); + int (*destroy_srq)(struct ib_srq *srq); + int (*post_srq)(struct ib_srq *srq, + struct ib_recv_wr *recv_wr, + struct ib_recv_wr **bad_recv_wr); + struct ib_cq * (*create_cq)(struct ib_device *device, + ib_comp_handler comp_handler, + void *cq_context, int cqe); + int (*resize_cq)(struct ib_cq *cq, int cqe); + int (*destroy_cq)(struct ib_cq *cq); + struct ib_mr * (*reg_phys_mr)(struct ib_pd *pd, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + int mr_access_flags, + u64 *iova_start); + int (*query_mr)(struct ib_mr *mr, + struct ib_mr_attr *mr_attr); + int (*dereg_mr)(struct ib_mr *mr); + int (*rereg_phys_mr)(struct ib_mr *mr, + int mr_rereg_mask, + struct ib_pd *pd, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + int mr_access_flags, + u64 *iova_start); + struct ib_mw * (*alloc_mw)(struct ib_pd *pd); + int (*bind_mw)(struct ib_qp *qp, + struct ib_mw *mw, + struct ib_mw_bind *mw_bind); + int (*dealloc_mw)(struct ib_mw *mw); + struct ib_fmr * (*alloc_fmr)(struct ib_pd *pd, + int mr_access_flags, + struct ib_fmr_attr *fmr_attr); + int (*map_fmr)(struct ib_fmr *fmr, void *addr, u64 size); + int (*map_phys_fmr)(struct ib_fmr 
*fmr, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf); + int (*unmap_fmr)(struct ib_fmr **fmr_array, int fmr_cnt); + int (*free_fmr)(struct ib_fmr *fmr); + int (*attach_mcast)(struct ib_qp *qp, union ib_gid *gid, + u16 lid); + int (*detach_mcast)(struct ib_qp *qp, union ib_gid *gid, + u16 lid); + int (*post_send)(struct ib_qp *qp, + struct ib_send_wr *send_wr, + struct ib_send_wr **bad_send_wr); + int (*post_recv)(struct ib_qp *qp, + struct ib_recv_wr *recv_wr, + struct ib_recv_wr **bad_recv_wr); + int (*poll_cq)(struct ib_cq *cq, + int num_entries, + struct ib_wc *wc_array); + int (*peek_cq)(struct ib_cq *cq, int wc_cnt); + int (*req_notify_cq)(struct ib_cq *cq, + enum ib_cq_notify cq_notify); + int (*req_n_notify_cq)(struct ib_cq *cq, int wc_cnt); + int (*process_mad)(struct ib_device *device, + int process_mad_flags, + struct ib_mad *in_mad, + struct ib_mad *out_mad); -struct ib_pd *ib_alloc_pd(struct ib_device *device); + struct class_device class_dev; +}; -int ib_dealloc_pd(struct ib_pd *pd); +static inline int ib_query_device(struct ib_device *device, + struct ib_device_cap *device_cap) +{ + return device->query_device(device, device_cap); +} -struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr); +static inline int ib_query_port(struct ib_device *device, + u8 port_num, + struct ib_port *port) +{ + return device->query_port(device, port_num, port); +} -int ib_modify_ah(struct ib_ah *ah, struct ib_ah_attr *ah_attr); +static inline int ib_query_gid(struct ib_device *device, + u8 port_num, + int index, + union ib_gid *gid) +{ + return device->query_gid(device, port_num, index, gid); +} -int ib_query_ah(struct ib_ah *ah, struct ib_ah_attr *ah_attr); +static inline int ib_query_pkey(struct ib_device *device, + u8 port_num, + u16 index, + u16 *pkey) +{ + return device->query_pkey(device, port_num, index, pkey); +} -int ib_destroy_ah(struct ib_ah *ah); +static inline int ib_modify_device(struct ib_device *device, + u8 port_num, + int 
device_attr_flags) +{ + return device->modify_device(device, port_num, device_attr_flags); +} -struct ib_qp *ib_create_qp(struct ib_pd *pd, - struct ib_qp_init_attr *qp_init_attr, - struct ib_qp_cap *qp_cap); +static inline struct ib_pd *ib_alloc_pd(struct ib_device *device) +{ + return device->alloc_pd(device); +} -int ib_modify_qp(struct ib_qp *qp, - struct ib_qp_attr *qp_attr, - int qp_attr_mask, struct ib_qp_cap *qp_cap); +static inline int ib_dealloc_pd(struct ib_pd *pd) +{ + return pd->device->dealloc_pd(pd); +} -int ib_query_qp(struct ib_qp *qp, - struct ib_qp_attr *qp_attr, - int qp_attr_mask, struct ib_qp_init_attr *qp_init_attr); +static inline struct ib_ah *ib_create_ah(struct ib_pd *pd, + struct ib_ah_attr *ah_attr) +{ + return pd->device->create_ah(pd, ah_attr); +} -int ib_destroy_qp(struct ib_qp *qp); +static inline int ib_modify_ah(struct ib_ah *ah, + struct ib_ah_attr *ah_attr) +{ + return ah->device->modify_ah(ah, ah_attr); +} -struct ib_srq *ib_create_srq(struct ib_pd *pd, struct ib_srq_attr *srq_attr); +static inline int ib_query_ah(struct ib_ah *ah, + struct ib_ah_attr *ah_attr) +{ + return ah->device->query_ah(ah, ah_attr); +} -int ib_query_srq(struct ib_srq *srq, - struct ib_pd **pd, struct ib_srq_attr *srq_attr); +static inline int ib_destroy_ah(struct ib_ah *ah) +{ + return ah->device->destroy_ah(ah); +} -int ib_modify_srq(struct ib_srq *srq, - struct ib_pd *pd, - struct ib_srq_attr *srq_attr, int srq_attr_mask); +static inline struct ib_qp *ib_create_qp(struct ib_pd *pd, + struct ib_qp_init_attr *qp_init_attr, + struct ib_qp_cap *qp_cap) +{ + return pd->device->create_qp(pd, qp_init_attr, qp_cap); +} -int ib_destroy_srq(struct ib_srq *srq); +static inline int ib_modify_qp(struct ib_qp *qp, + struct ib_qp_attr *qp_attr, + int qp_attr_mask, + struct ib_qp_cap *qp_cap) +{ + return qp->device->modify_qp(qp, qp_attr, qp_attr_mask, qp_cap); +} -int ib_post_srq(struct ib_srq *srq, - struct ib_recv_wr *recv_wr, struct ib_recv_wr **bad_recv_wr); 
+static inline int ib_query_qp(struct ib_qp *qp, + struct ib_qp_attr *qp_attr, + int qp_attr_mask, + struct ib_qp_init_attr *qp_init_attr) +{ + return qp->device->query_qp(qp, qp_attr, qp_attr_mask, qp_init_attr); +} -struct ib_cq *ib_create_cq(struct ib_device *device, - ib_comp_handler comp_handler, - void *cq_context, int cqe); +static inline int ib_destroy_qp(struct ib_qp *qp) +{ + return qp->device->destroy_qp(qp); +} -int ib_resize_cq(struct ib_cq *cq, int cqe); +static inline struct ib_srq *ib_create_srq(struct ib_pd *pd, + void *srq_context, + struct ib_srq_attr *srq_attr) +{ + return pd->device->create_srq(pd, srq_context, srq_attr); +} -int ib_destroy_cq(struct ib_cq *cq); +static inline int ib_query_srq(struct ib_srq *srq, + struct ib_srq_attr *srq_attr) +{ + return srq->device->query_srq(srq, srq_attr); +} +static inline int ib_modify_srq(struct ib_srq *srq, + struct ib_pd *pd, + struct ib_srq_attr *srq_attr, + int srq_attr_mask) +{ + return srq->device->modify_srq(srq, pd, srq_attr, srq_attr_mask); +} + +static inline int ib_destroy_srq(struct ib_srq *srq) +{ + return srq->device->destroy_srq(srq); +} + +static inline int ib_post_srq(struct ib_srq *srq, + struct ib_recv_wr *recv_wr, + struct ib_recv_wr **bad_recv_wr) +{ + return srq->device->post_srq(srq, recv_wr, bad_recv_wr); +} + +static inline struct ib_cq *ib_create_cq(struct ib_device *device, + ib_comp_handler comp_handler, + void *cq_context, + int cqe) +{ + return device->create_cq(device, comp_handler, cq_context, cqe); +} + +static inline int ib_resize_cq(struct ib_cq *cq, + int cqe) +{ + return cq->device->resize_cq(cq, cqe); +} + +static inline int ib_destroy_cq(struct ib_cq *cq) +{ + return cq->device->destroy_cq(cq); +} + /* in functions below iova_start is in/out parameter */ -struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd, - struct ib_phys_buf *phys_buf_array, - int num_phys_buf, - int mr_access_flags, - u64 *iova_start); +static inline struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd, + 
		       struct ib_phys_buf *phys_buf_array,
+					     int num_phys_buf,
+					     int mr_access_flags,
+					     u64 *iova_start)
+{
+	return pd->device->reg_phys_mr(pd, phys_buf_array, num_phys_buf,
+				       mr_access_flags, iova_start);
+}

-int ib_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr);
+static inline int ib_query_mr(struct ib_mr *mr,
+			      struct ib_mr_attr *mr_attr)
+{
+	return mr->device->query_mr(mr, mr_attr);
+}

-int ib_dereg_mr(struct ib_mr *mr);
+static inline int ib_dereg_mr(struct ib_mr *mr)
+{
+	return mr->device->dereg_mr(mr);
+}

-int ib_rereg_phys_mr(struct ib_mr *mr,
-		     int mr_rereg_mask,
-		     struct ib_pd *pd,
-		     struct ib_phys_buf *phys_buf_array,
-		     int num_phys_buf,
-		     int mr_access_flags,
-		     u64 *iova_start);
+static inline int ib_rereg_phys_mr(struct ib_mr *mr,
+				   int mr_rereg_mask,
+				   struct ib_pd *pd,
+				   struct ib_phys_buf *phys_buf_array,
+				   int num_phys_buf,
+				   int mr_access_flags,
+				   u64 *iova_start)
+{
+	return mr->device->rereg_phys_mr(mr, mr_rereg_mask, pd, phys_buf_array,
+					 num_phys_buf, mr_access_flags,
+					 iova_start);
+}

-struct ib_mw *ib_alloc_mw(struct ib_pd *pd, u32 * rkey);
+static inline struct ib_mw *ib_alloc_mw(struct ib_pd *pd)
+{
+	return pd->device->alloc_mw(pd);
+}

-int ib_query_mw(struct ib_mw *mw, u32 * rkey, struct ib_pd **pd);
+static inline int ib_bind_mw(struct ib_qp *qp,
+			     struct ib_mw *mw,
+			     struct ib_mw_bind *mw_bind)
+{
+	return mw->device->bind_mw(qp, mw, mw_bind);
+}

-int ib_bind_mw(struct ib_qp *qp,
-	       struct ib_mw *mw, struct ib_mw_bind *mw_bind);
+static inline int ib_dealloc_mw(struct ib_mw *mw)
+{
+	return mw->device->dealloc_mw(mw);
+}

-int ib_dealloc_mw(struct ib_mw *mw);
+static inline struct ib_fmr *ib_alloc_fmr(struct ib_pd *pd,
+					  int mr_access_flags,
+					  struct ib_fmr_attr *fmr_attr)
+{
+	return pd->device->alloc_fmr(pd, mr_access_flags, fmr_attr);
+}

-struct ib_fmr *ib_alloc_fmr(struct ib_pd *pd,
-			    int mr_access_flags, struct ib_fmr_attr *fmr_attr);
+static inline int ib_map_fmr(struct ib_fmr *fmr,
+			     void *addr,
+			     u64 size)
+{
+	return fmr->device->map_fmr(fmr, addr, size);
+}

-int ib_map_fmr(struct ib_fmr *fmr,
-	       void *addr, u64 size);
+static inline int ib_map_phys_fmr(struct ib_fmr *fmr,
+				  struct ib_phys_buf *phys_buf_array,
+				  int num_phys_buf)
+{
+	return fmr->device->map_phys_fmr(fmr, phys_buf_array, num_phys_buf);
+}

-int ib_map_phys_fmr(struct ib_fmr *fmr,
-		    struct ib_phys_buf *phys_buf_array,
-		    int num_phys_buf, u32 * lkey, u32 * rkey);
+static inline int ib_unmap_fmr(struct ib_fmr **fmr_array,
+			       int fmr_cnt)
+{
+	/* Requires all FMRs to come from same device. */
+	return fmr_array[0]->device->unmap_fmr(fmr_array, fmr_cnt);
+}

-int ib_unmap_fmr(struct ib_fmr **fmr_array, int fmr_cnt);
+static inline int ib_free_fmr(struct ib_fmr *fmr)
+{
+	return fmr->device->free_fmr(fmr);
+}

-int ib_free_fmr(struct ib_fmr *fmr);
+static inline int ib_attach_mcast(struct ib_qp *qp,
+				  union ib_gid *gid,
+				  u16 lid)
+{
+	return qp->device->attach_mcast(qp, gid, lid);
+}

-int ib_attach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid);
+static inline int ib_detach_mcast(struct ib_qp *qp,
+				  union ib_gid *gid,
+				  u16 lid)
+{
+	return qp->device->detach_mcast(qp, gid, lid);
+}

-int ib_detach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid);
+static inline int ib_post_send(struct ib_qp *qp,
+			       struct ib_send_wr *send_wr,
+			       struct ib_send_wr **bad_send_wr)
+{
+	return qp->device->post_send(qp, send_wr, bad_send_wr);
+}

-int ib_post_send(struct ib_qp *qp,
-		 struct ib_send_wr *send_wr, struct ib_send_wr **bad_send_wr);
+static inline int ib_post_recv(struct ib_qp *qp,
+			       struct ib_recv_wr *recv_wr,
+			       struct ib_recv_wr **bad_recv_wr)
+{
+	return qp->device->post_recv(qp, recv_wr, bad_recv_wr);
+}

-int ib_post_recv(struct ib_qp *qp,
-		 struct ib_recv_wr *recv_wr, struct ib_recv_wr **bad_recv_wr);
+/**
+ * ib_poll_cq - poll a CQ for completion(s)
+ * @cq:the CQ being polled
+ * @num_entries:maximum number of completions to return
+ * @wc:array of at least @num_entries &struct ib_wc where completions
+ *   will be returned
+ *
+ * Poll a CQ for (possibly multiple) completions. If the return value
+ * is < 0, an error occurred. If the return value is >= 0, it is the
+ * number of completions returned. If the return value is
+ * non-negative and < num_entries, then the CQ was emptied.
+ */
+static inline int ib_poll_cq(struct ib_cq *cq,
+			     int num_entries,
+			     struct ib_wc *wc_array)
+{
+	return cq->device->poll_cq(cq, num_entries, wc_array);
+}

-int ib_poll_cq(struct ib_cq *cq, int num_entries, struct ib_wc *wc_array);
+static inline int ib_peek_cq(struct ib_cq *cq,
+			     int wc_cnt)
+{
+	return cq->device->peek_cq(cq, wc_cnt);
+}

-int ib_peek_cq(struct ib_cq *cq, int wc_cnt);
+/**
+ * ib_req_notify_cq - request completion notification
+ * @cq:the CQ to generate an event for
+ * @cq_notify:%IB_CQ_SOLICITED for next solicited event,
+ *   %IB_CQ_NEXT_COMP for any completion.
+ */
+static inline int ib_req_notify_cq(struct ib_cq *cq,
+				   enum ib_cq_notify cq_notify)
+{
+	return cq->device->req_notify_cq(cq, cq_notify);
+}

-int ib_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify);
+static inline int ib_req_n_notify_cq(struct ib_cq *cq,
+				     int wc_cnt)
+{
+	return cq->device->req_n_notify_cq(cq, wc_cnt);
+}

-int ib_req_n_notify_cq(struct ib_cq *cq, int wc_cnt);
-
-#endif /* IB_VERBS_H */
+#endif /* IB_VERBS_H */

Index: include/ib_core.h
===================================================================
--- include/ib_core.h	(revision 562)
+++ include/ib_core.h	(working copy)
@@ -24,10 +24,6 @@

 #include "ib_core_types.h"

-struct ib_device {
-	char name[IB_DEVICE_NAME_MAX];
-};
-
 struct ib_device *ib_device_get_by_name(const char *name);
 struct ib_device *ib_device_get_by_index(int index);

Index: include/ib_core_types.h
===================================================================
--- include/ib_core_types.h	(revision 624)
+++ include/ib_core_types.h	(working copy)
@@ -22,8 +22,6 @@
 #ifndef _IB_CORE_TYPES_H
 #define _IB_CORE_TYPES_H

-#define IB_DEVICE_NAME_MAX 64
-
 enum {
	IB_DEVICE_NOTIFIER_ADD,
	IB_DEVICE_NOTIFIER_REMOVE

From mst at mellanox.co.il Wed Aug 11 10:38:49 2004 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 11 Aug 2004 20:38:49 +0300 Subject: [openib-general] ib_req_ncomp_notif in core_ layer In-Reply-To: <20040810075751.07f19460.mshefty@ichips.intel.com> References: <20040804084138.GA29136@mellanox.co.il> <20040804230200.GB548@cup.hp.com> <20040805125242.GH2640@mellanox.co.il> <20040805073226.497bb020.mshefty@ichips.intel.com> <52llgth9wr.fsf@topspin.com> <20040810062044.GA6645@mellanox.co.il> <20040810075751.07f19460.mshefty@ichips.intel.com> Message-ID: <20040811173849.GA29669@mellanox.co.il> Hello! Quoting r. Sean Hefty (mshefty at ichips.intel.com) "Re: [openib-general] ib_req_ncomp_notif in core_ layer": > On Tue, 10 Aug 2004 09:20:44 +0300 > "Michael S. Tsirkin" wrote: > > > I'd like to point out that any device can always implement > > req_ncomp_notif by means of req_comp_notif and cq peek > > or possibly even just alias req_ncomp_notif to req_comp_notif > > since the user will be always prepared to get a spurious > > event, right? > > The problem with abstracting this call is that the performance of the abstraction isn't what the client might expect. So, the abstraction does not behave as described. I don't see how that's a problem. I think the simplest interface that gives the best performance possible should be presented. That's the idea of abstraction, right? Performance is hardware-dependent, anyway. Even if the driver does implement it, you don't get any guarantee that req_ncomp has better performance than req_comp, especially for small values of n. On some hardware the reverse may be true. So why is it a good idea to have all clients do something like if (req_ncomp_notif()) req_comp_notif() Why not make this part of the layer already?
If the client wants to check whether req_ncomp_notif is implemented and do something else (hard for me to think what), he can always check the pointer in the driver structure. Indeed, why not just have req_ncomp_notif and pass in n? Why is there a special call for n=1? Is it to save a conditional branch? Hardware could just ignore the n parameter if it can't support it. MST From roland at topspin.com Wed Aug 11 10:42:02 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 11 Aug 2004 10:42:02 -0700 Subject: [openib-general] ib_req_ncomp_notif in core_ layer In-Reply-To: <20040811173849.GA29669@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 11 Aug 2004 20:38:49 +0300") References: <20040804084138.GA29136@mellanox.co.il> <20040804230200.GB548@cup.hp.com> <20040805125242.GH2640@mellanox.co.il> <20040805073226.497bb020.mshefty@ichips.intel.com> <52llgth9wr.fsf@topspin.com> <20040810062044.GA6645@mellanox.co.il> <20040810075751.07f19460.mshefty@ichips.intel.com> <20040811173849.GA29669@mellanox.co.il> Message-ID: <52657p4ln9.fsf@topspin.com> Michael> Indeed, why not just have req_ncomp_notif and pass in n? Michael> Why is there a special call for n=1? Is it to save a Michael> conditional branch? Hardware could just ignore the n Michael> parameter if it cant support it. req_ncomp_notif doesn't seem to have any way to request an event on the next solicited completion. - Roland From mst at mellanox.co.il Wed Aug 11 11:32:56 2004 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Wed, 11 Aug 2004 21:32:56 +0300 Subject: [openib-general] ib_req_ncomp_notif in core_ layer In-Reply-To: <52657p4ln9.fsf@topspin.com> References: <20040804084138.GA29136@mellanox.co.il> <20040804230200.GB548@cup.hp.com> <20040805125242.GH2640@mellanox.co.il> <20040805073226.497bb020.mshefty@ichips.intel.com> <52llgth9wr.fsf@topspin.com> <20040810062044.GA6645@mellanox.co.il> <20040810075751.07f19460.mshefty@ichips.intel.com> <20040811173849.GA29669@mellanox.co.il> <52657p4ln9.fsf@topspin.com> Message-ID: <20040811183256.GB29669@mellanox.co.il> Quoting r. Roland Dreier (roland at topspin.com) "Re: [openib-general] ib_req_ncomp_notif in core_ layer": > Michael> Indeed, why not just have req_ncomp_notif and pass in n? > Michael> Why is there a special call for n=1? Is it to save a > Michael> conditional branch? Hardware could just ignore the n > Michael> parameter if it cant support it. > > req_ncomp_notif doesn't seem to have any way to request an event on > the next solicited completion. > It's a documentation bug, actually. The current implementation actually reports an event after n completions or after any solicited completion. Same for completion with error. Thus (going from more to fewer events): request completion notif. > request n completions notif. > request solicited notif. So they can be merged?
MST From mshefty at ichips.intel.com Wed Aug 11 11:04:03 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 11 Aug 2004 11:04:03 -0700 Subject: [openib-general] ib_req_ncomp_notif in core_ layer In-Reply-To: <20040811173849.GA29669@mellanox.co.il> References: <20040804084138.GA29136@mellanox.co.il> <20040804230200.GB548@cup.hp.com> <20040805125242.GH2640@mellanox.co.il> <20040805073226.497bb020.mshefty@ichips.intel.com> <52llgth9wr.fsf@topspin.com> <20040810062044.GA6645@mellanox.co.il> <20040810075751.07f19460.mshefty@ichips.intel.com> <20040811173849.GA29669@mellanox.co.il> Message-ID: <20040811110403.524ea2e3.mshefty@ichips.intel.com> On Wed, 11 Aug 2004 20:38:49 +0300 "Michael S. Tsirkin" wrote: > I dont see how its a problem. I think, a simplest interface > that gives a best performance possible shall be presented. > Thats the idea of abstraction, right? I view the overall goal of the access layer more along the lines of incorporating common code from the ULPs and device drivers than abstraction. > Indeed, why not just have req_ncomp_notif and pass in n? > Why is there a special call for n=1? Is it to save a conditional > branch? Hardware could just ignore the n parameter if it > cant support it. I'm fine with trying to merge the two calls, especially since ib_req_n_notify_cq can handle the case of solicited completions. Users who care will know whether they can expect fewer HW generated events by checking the device_cap_flags. How would something like this be: ib_req_notify_cq( *cq, cq_notify, wc_cnt ); If cq_notify is set to IB_CQ_NEXT_COMP - an event is generated after wc_cnt completions or after the next solicited completion, whichever comes first. If cq_notify is set to IB_CQ_SOLICITED, wc_cnt is ignored, and an event is generated after the next solicited completion.
From openib-in at polstra.com Wed Aug 11 12:14:55 2004 From: openib-in at polstra.com (John Polstra) Date: Wed, 11 Aug 2004 12:14:55 -0700 (PDT) Subject: [openib-general] [PATCH] Bug in gen1 opensm/osm_port.c revision 520 Message-ID: Revision 520 of gen1/trunk/src/userspace/osm/opensm/osm_port.c attempted to fix incorrect usage of cl_list_t structures. Although the original code was definitely wrong, I believe the fix in revision 520 is incomplete. As evidence, consider this code snippet from line 1003 of revision 520:

    currPortsList = &nextPortsList;
    cl_list_construct( &nextPortsList );
    cl_list_init( &nextPortsList, 10 );
    p_physp = (osm_physp_t*)cl_list_remove_head( currPortsList );
    while ( p_physp != NULL )
    {
      /* ... */
    }

The body of the loop will never execute, because currPortsList points to the nextPortsList header, and nextPortsList has just been made empty. I think the intent of the code is that both currPortsList and nextPortsList should be cl_list_t structures rather than pointers to cl_list_t structures. The snippet above should then look like this:

    currPortsList = nextPortsList;
    cl_list_construct( &nextPortsList );
    cl_list_init( &nextPortsList, 10 );
    p_physp = (osm_physp_t*)cl_list_remove_head( &currPortsList );
    while ( p_physp != NULL )
    {
      /* ... */
    }

So the first statement copies the whole list header (structure copy) before reinitializing nextPortsList. The attached patch is relative to the previous revision (357). I think it's the correct way to fix this code.
John -------------- next part --------------
Index: osm_port.c
===================================================================
RCS file: /a/jdp/isicvs/osm/opensm/osm_port.c,v
retrieving revision 1.1.1.1
retrieving revision 1.2
diff -u -r1.1.1.1 -r1.2
--- osm_port.c	11 Aug 2004 18:12:29 -0000	1.1.1.1
+++ osm_port.c	11 Aug 2004 18:20:13 -0000	1.2
@@ -888,7 +888,7 @@
   IN osm_bind_handle_t *h_bind )
 {
   cl_list_t tmpPortsList;
-  osm_physp_t *p_physp, *p_src_physp;
+  osm_physp_t *p_physp, *p_src_physp = NULL;
   uint8_t path_array[IB_SUBNET_PATH_HOPS_MAX];
   uint8_t i = 0;
   osm_dr_path_t *p_dr_path;
@@ -943,8 +943,8 @@
   cl_map_t physp_map;
   cl_map_t visited_map;
   osm_dr_path_t * p_dr_path;
-  cl_list_t *currPortsList;
-  cl_list_t *nextPortsList;
+  cl_list_t currPortsList;
+  cl_list_t nextPortsList;
   cl_qmap_t const *p_port_tbl;
   osm_port_t *p_port;
   osm_physp_t *p_physp, *p_remote_physp;
@@ -967,8 +967,8 @@
      BFS from OSM port until we find the target physp but avoid
      going through mapped ports
   */
-  cl_list_construct( nextPortsList );
-  cl_list_init( nextPortsList, 10 );
+  cl_list_construct( &nextPortsList );
+  cl_list_init( &nextPortsList, 10 );
   p_port_tbl = &p_subn->port_guid_tbl;
   port_guid = p_subn->sm_port_guid;
@@ -995,15 +995,15 @@
   CL_ASSERT( p_physp );
   CL_ASSERT( osm_physp_is_valid( p_physp ) );
-  cl_list_insert_tail( nextPortsList, p_physp );
+  cl_list_insert_tail( &nextPortsList, p_physp );
   while (next_list_is_full == TRUE)
   {
     next_list_is_full = FALSE;
     currPortsList = nextPortsList;
-    cl_list_construct( nextPortsList );
-    cl_list_init( nextPortsList, 10 );
-    p_physp = (osm_physp_t*)cl_list_remove_head( currPortsList );
+    cl_list_construct( &nextPortsList );
+    cl_list_init( &nextPortsList, 10 );
+    p_physp = (osm_physp_t*)cl_list_remove_head( &currPortsList );
     while ( p_physp != NULL )
     {
       /* If we are in a switch - need to go out through all the other
@@ -1046,30 +1046,30 @@
       }
       /* add the p_remote_physp to the nextPortsList */
-      cl_list_insert_tail( nextPortsList, p_remote_physp );
+      cl_list_insert_tail( &nextPortsList, p_remote_physp );
       next_list_is_full = TRUE;
     }
   }
-  p_physp = (osm_physp_t*)cl_list_remove_head( currPortsList );
+  p_physp = (osm_physp_t*)cl_list_remove_head( &currPortsList );
   if ( reached_dest == TRUE )
   {
     /* free the rest of the currPortsList */
     while ( p_physp != NULL )
-      p_physp = (osm_physp_t*)cl_list_remove_head( currPortsList );
+      p_physp = (osm_physp_t*)cl_list_remove_head( &currPortsList );
     /* free the nextPortsList, if items were added to it */
-    p_physp = (osm_physp_t*)cl_list_remove_head( nextPortsList );
+    p_physp = (osm_physp_t*)cl_list_remove_head( &nextPortsList );
     while ( p_physp != NULL )
-      p_physp = (osm_physp_t*)cl_list_remove_head( nextPortsList );
+      p_physp = (osm_physp_t*)cl_list_remove_head( &nextPortsList );
     next_list_is_full = FALSE;
   }
 }
-  cl_list_destroy( currPortsList );
+  cl_list_destroy( &currPortsList );
 }
 /* cleanup */
 Exit:
-  cl_list_destroy( nextPortsList );
+  cl_list_destroy( &nextPortsList );
 cl_map_destroy( &physp_map );
 cl_map_destroy( &visited_map );
 }

From mst at mellanox.co.il Wed Aug 11 14:23:40 2004 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 12 Aug 2004 00:23:40 +0300 Subject: [openib-general] ib_req_ncomp_notif in core_ layer In-Reply-To: <20040811110403.524ea2e3.mshefty@ichips.intel.com> References: <20040804084138.GA29136@mellanox.co.il> <20040804230200.GB548@cup.hp.com> <20040805125242.GH2640@mellanox.co.il> <20040805073226.497bb020.mshefty@ichips.intel.com> <52llgth9wr.fsf@topspin.com> <20040810062044.GA6645@mellanox.co.il> <20040810075751.07f19460.mshefty@ichips.intel.com> <20040811173849.GA29669@mellanox.co.il> <20040811110403.524ea2e3.mshefty@ichips.intel.com> Message-ID: <20040811212340.GC29669@mellanox.co.il> Quoting r. Sean Hefty (mshefty at ichips.intel.com) "Re: [openib-general] ib_req_ncomp_notif in core_ layer": > On Wed, 11 Aug 2004 20:38:49 +0300 > "Michael S. Tsirkin" wrote: > > > I dont see how its a problem.
I think, a simplest interface > > that gives a best performance possible shall be presented. > > Thats the idea of abstraction, right? > > I view the overall goal of the access layer more along the lines of incorporating common code from the ULPs and device driver, than abstraction. Semantics matter though. Currently some things are unclear: If you get an event and do req_comp_notif immediately, without polling - do you expect to get an event immediately? What currently happens as far as I can see, is: for req_comp_notif you don't, for req_ncomp_notif you do. For n=1 the behaviour of req_ncomp_notif probably does not make sense ... If you don't for req_ncomp_notif, what would you expect? > > Indeed, why not just have req_ncomp_notif and pass in n? > > Why is there a special call for n=1? Is it to save a conditional > > branch? Hardware could just ignore the n parameter if it > > cant support it. > > I'm fine with trying merging the two calls, especially since ib_req_n_notify_cq can handle the case of solicited completions. Users who care will know whether they can expect fewer HW generated events by checking the device_cap_flags. > > How would something like this be: > > ib_req_notify_cq( *cq, cq_notify, wc_cnt ); > > If cq_notify is set to IB_CQ_NEXT_COMP - an event is generated after wc_cnt completions or after the next solicited completion, whichever comes first. If cq_notify is set to IB_CQ_SOLICITED, wc_cnt is ignored, and an event is generated after the next solicited completion. Looks good. As I see it any hardware has an upper bound on the max legal value for wc_cnt - if a bigger value is given, it defaults to IB_CQ_NEXT_COMP. For devices without support for ib_req_n_notify_cq, the upper bound is simply 0. So device drivers could advertise this bound in their structure, and the access layer would call the proper method according to n.
I'd also like to see another option in addition to IB_CQ_SOLICITED and IB_CQ_NEXT_COMP - something like IB_CQ_WQE, where the user labels a work request whose completion he wants an event for, and the event is generated when such a work request is completed or if there is a solicited or error completion. This is often more convenient than counting how many WQEs were posted. Again, hardware not supporting such events can always do IB_CQ_NEXT_COMP. MST From mshefty at ichips.intel.com Wed Aug 11 14:51:58 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 11 Aug 2004 14:51:58 -0700 Subject: [openib-general] ib_req_ncomp_notif in core_ layer In-Reply-To: <20040811212340.GC29669@mellanox.co.il> References: <20040804084138.GA29136@mellanox.co.il> <20040804230200.GB548@cup.hp.com> <20040805125242.GH2640@mellanox.co.il> <20040805073226.497bb020.mshefty@ichips.intel.com> <52llgth9wr.fsf@topspin.com> <20040810062044.GA6645@mellanox.co.il> <20040810075751.07f19460.mshefty@ichips.intel.com> <20040811173849.GA29669@mellanox.co.il> <20040811110403.524ea2e3.mshefty@ichips.intel.com> <20040811212340.GC29669@mellanox.co.il> Message-ID: <20040811145158.09f20f5e.mshefty@ichips.intel.com> On Thu, 12 Aug 2004 00:23:40 +0300 "Michael S. Tsirkin" wrote: > Currently some things are unclear: > > If you get an event and do req_comp_notif > immediately, without polling - do you expect to get an event immediately? The call for ib_req_notify_cq() is intended to map to the semantics mentioned in the spec. For the example mentioned, calling ib_req_notify_cq() without polling arms the CQ, but will not generate an event until a new completion of the specified type is added to the CQ. > What currently happends as far as I can see, is: > for req_comp_notif you dont, for req_ncomp_notif you do. > For n=1 the behaviour of req_ncomp_notif probably does not > make sence ... > > If you dont for req_ncomp_notif, what would you expect?
ib_req_n_notify_cq() is outside of the spec, so needs to be defined. My expectation would be that it behaves similarly to ib_req_notify_cq(), in that it generates an event after wc_cnt (or one solicited) *new* completions are added to the CQ. > Looks good. > As I see it any hardware has an upper bound on the max legal value > for wc_cnt If there is an upper bound, then I think it makes sense to replace the device_cap_flag with a max_notify_cnt (or whatever), that can be set to 1 if needed. > I'd also like to see another option in addition to IB_CQ_SOLICITED and > IB_CQ_NEXT_COMP - something like IB_CQ_WQE, where the user labels work > request for completion of which he wants to get event, and the event generated > which such a work request is completed or of there is a solicited or error > completion. The receive work request has some unused recv_flags that could be used for this purpose. Can current hardware support such an option? From mst at mellanox.co.il Wed Aug 11 16:01:46 2004 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 12 Aug 2004 02:01:46 +0300 Subject: [openib-general] ib_req_ncomp_notif in core_ layer In-Reply-To: <20040811145158.09f20f5e.mshefty@ichips.intel.com> References: <20040804230200.GB548@cup.hp.com> <20040805125242.GH2640@mellanox.co.il> <20040805073226.497bb020.mshefty@ichips.intel.com> <52llgth9wr.fsf@topspin.com> <20040810062044.GA6645@mellanox.co.il> <20040810075751.07f19460.mshefty@ichips.intel.com> <20040811173849.GA29669@mellanox.co.il> <20040811110403.524ea2e3.mshefty@ichips.intel.com> <20040811212340.GC29669@mellanox.co.il> <20040811145158.09f20f5e.mshefty@ichips.intel.com> Message-ID: <20040811230146.GA31059@mellanox.co.il> Hello! Quoting r. Sean Hefty (mshefty at ichips.intel.com) "Re: [openib-general] ib_req_ncomp_notif in core_ layer": > On Thu, 12 Aug 2004 00:23:40 +0300 > "Michael S.
Tsirkin" wrote: > > > Currently some things are unclear: > > > > If you get an event and do req_comp_notif > > immediately, without polling - do you expect to get an event immediately? > > The call for ib_req_notify_cq() is intended to map to the semantics mentioned in the spec. For the example mentioned, calling ib_req_notify_cq() without polling arms the CQ, but will not generate an event until a new completion of the specified type is added to the CQ. > > > What currently happends as far as I can see, is: > > for req_comp_notif you dont, for req_ncomp_notif you do. > > For n=1 the behaviour of req_ncomp_notif probably does not > > make sence ... > > > > If you dont for req_ncomp_notif, what would you expect? > > ib_req_n_notify_cq() is outside of the spec, so needs to be defined. My expectation would be that it behaves similar to ib_req_notify_cq(), in that it generates an event after wc_cnt (or one solicited) *new* completions are added to the CQ. This can be done but it will require a poll to be done on existing hardware. But what if the event was, say, because of completion with error? > > Looks good. > > As I see it any hardware has an upper bound on the max legal value > > for wc_cnt > > If there would be an upper bound, then I think it makes sense to replace the device_cap_flag with a max_notify_cnt (or whatever), that can be set to 1 if needed. Sounds good. > > I'd also like to see another option in addition to IB_CQ_SOLICITED and > > IB_CQ_NEXT_COMP - something like IB_CQ_WQE, where the user labels work > > request for completion of which he wants to get event, and the event generated > > which such a work request is completed or of there is a solicited or error > > completion. > > The receive work request has some unused recv_flags that could be used for this purpose. Can current hardware support such an option? Yes. 
mst From ftillier at infiniconsys.com Wed Aug 11 16:19:57 2004 From: ftillier at infiniconsys.com (Fab Tillier) Date: Wed, 11 Aug 2004 16:19:57 -0700 Subject: [openib-general] ib_req_ncomp_notif in core_ layer In-Reply-To: <20040811145158.09f20f5e.mshefty@ichips.intel.com> Message-ID: <000501c47ff9$b82e41c0$655aa8c0@infiniconsys.com> > From: Sean Hefty [mailto:mshefty at ichips.intel.com] > Sent: Wednesday, August 11, 2004 2:52 PM > > On Thu, 12 Aug 2004 00:23:40 +0300 > "Michael S. Tsirkin" wrote: > > > Currently some things are unclear: > > > > If you get an event and do req_comp_notif > > immediately, without polling - do you expect to get an event > immediately? > > The call for ib_req_notify_cq() is intended to map to the semantics > mentioned in the spec. For the example mentioned, calling > ib_req_notify_cq() without polling arms the CQ, but will not generate an > event until a new completion of the specified type is added to the CQ. True, but as has been pointed out before, Tavor by default will generate a new CQ event if any completions are left unreaped in the CQ (or something to that effect). See http://openib.org/pipermail/openib-general/2004-June/003099.html. So rearming without polling *will* generate a CQ event if you're using Mellanox HCAs. > > > What currently happends as far as I can see, is: > > for req_comp_notif you dont, for req_ncomp_notif you do. > > For n=1 the behaviour of req_ncomp_notif probably does not > > make sence ... > > > > If you dont for req_ncomp_notif, what would you expect? > > ib_req_n_notify_cq() is outside of the spec, so needs to be defined. My > expectation would be that it behaves similar to ib_req_notify_cq(), in > that it generates an event after wc_cnt (or one solicited) *new* > completions are added to the CQ. We're hitting the whole "should CQ rearm be spec compliant" issue again. 
My understanding was that we were going for non-spec compliance because that's how the current Mellanox HCA is implemented (at least in the FW). So to poll, a user does poll->rearm, not rearm->poll. Either way, both rearm calls need to have the same general semantics, and the expected behavior needs to be clear so that future HCA HW and SW can be implemented "properly". - Fab From mst at mellanox.co.il Wed Aug 11 23:47:18 2004 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 12 Aug 2004 09:47:18 +0300 Subject: [openib-general] ib_req_ncomp_notif in core_ layer In-Reply-To: <000501c47ff9$b82e41c0$655aa8c0@infiniconsys.com> References: <20040811145158.09f20f5e.mshefty@ichips.intel.com> <000501c47ff9$b82e41c0$655aa8c0@infiniconsys.com> Message-ID: <20040812064718.GB28866@mellanox.co.il> Hello! Quoting r. Fab Tillier (ftillier at infiniconsys.com) "RE: [openib-general] ib_req_ncomp_notif in core_ layer": > > From: Sean Hefty [mailto:mshefty at ichips.intel.com] > > Sent: Wednesday, August 11, 2004 2:52 PM > > > > On Thu, 12 Aug 2004 00:23:40 +0300 > > "Michael S. Tsirkin" wrote: > > > > > Currently some things are unclear: > > > > > > If you get an event and do req_comp_notif > > > immediately, without polling - do you expect to get an event > > immediately? > > > > The call for ib_req_notify_cq() is intended to map to the semantics > > mentioned in the spec. For the example mentioned, calling > > ib_req_notify_cq() without polling arms the CQ, but will not generate an > > event until a new completion of the specified type is added to the CQ. > > True, but as has been pointed out before, Tavor by default will generate a > new CQ event if any completions are left unreaped in the CQ (or something to > that effect). See > http://openib.org/pipermail/openib-general/2004-June/003099.html. So > rearming without polling *will* generate a CQ event if you're using Mellanox > HCAs. Nope, sorry. 
Tavor will generate an event if completions are generated *after the event was generated*. So:

    arm
    completion
    event
    arm          <-- no event

But:

    arm
    completion
    event
    completion
    arm          <-- event

Thus you can do poll -> rearm and also rearm -> poll and get the same result without races. > > > > > What currently happends as far as I can see, is: > > > for req_comp_notif you dont, for req_ncomp_notif you do. > > > For n=1 the behaviour of req_ncomp_notif probably does not > > > make sence ... > > > > > > If you dont for req_ncomp_notif, what would you expect? > > > > ib_req_n_notify_cq() is outside of the spec, so needs to be defined. My > > expectation would be that it behaves similar to ib_req_notify_cq(), in > > that it generates an event after wc_cnt (or one solicited) *new* > > completions are added to the CQ. > > We're hitting the whole "should CQ rearm be spec compliant" issue again. My > understanding was that we were going for non-spec compliance because that's > how the current Mellanox HCA is implemented (at least in the FW). So to > poll, a user does poll->rearm, not rearm->poll. That's not what, say, ip over ib currently does, as far as I can see. > Either way, both rearm calls need to have the same general semantics, and > the expected behavior needs to be clear so that future HCA HW and SW can be > implemented "properly". Either way this would have to be documented. What do others here think? MST From mst at mellanox.co.il Thu Aug 12 00:28:06 2004 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 12 Aug 2004 10:28:06 +0300 Subject: [openib-general] qp lock in mthca_poll_cq Message-ID: <20040812072806.GA803@mellanox.co.il> Hi, Roland! Why do you need the qp locked in mthca_poll_cq_one? Since the wr_id table has an entry per QP per WQE, and since you use atomics for the ref count, it seems you should not need this lock? Thanks, MST From mst at mellanox.co.il Thu Aug 12 00:46:43 2004 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Thu, 12 Aug 2004 10:46:43 +0300 Subject: [openib-general] qp lock in mthca_poll_cq In-Reply-To: <20040812072806.GA803@mellanox.co.il> References: <20040812072806.GA803@mellanox.co.il> Message-ID: <20040812074643.GB803@mellanox.co.il> Quoting r. Michael S. Tsirkin (mst at mellanox.co.il) "[openib-general] qp lock in mthca_poll_cq": > Hi, Roland! > Why do you need the qp locked in mthca_poll_cq_one? > Since the wr_id table has an entry per QP per WQE, and since you > use atomics for the ref count, it seems you should not need this lock? > > Thanks, > MST Further, I don't fully understand the use of qp->refcount in that function? Is it true that what you are trying to do is to call wake_up only once on the same qp? The reason I ask is that if there is a mix of completions from different qps, it seems that wake_up is still called multiple times. So why not use a local variable in mthca_poll_cq for this, and avoid atomics (which are expensive)? MST From ftillier at infiniconsys.com Thu Aug 12 06:53:43 2004 From: ftillier at infiniconsys.com (Fab Tillier) Date: Thu, 12 Aug 2004 06:53:43 -0700 Subject: [openib-general] ib_req_ncomp_notif in core_ layer In-Reply-To: <20040812064718.GB28866@mellanox.co.il> Message-ID: <000601c48073$c85710c0$655aa8c0@infiniconsys.com> > From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] > Sent: Wednesday, August 11, 2004 11:47 PM > > Quoting r. Fab Tillier (ftillier at infiniconsys.com) "RE: [openib-general] > ib_req_ncomp_notif in core_ layer": > > > From: Sean Hefty [mailto:mshefty at ichips.intel.com] > > > Sent: Wednesday, August 11, 2004 2:52 PM > > > > > > On Thu, 12 Aug 2004 00:23:40 +0300 > > > "Michael S. Tsirkin" wrote: > > > > > > > Currently some things are unclear: > > > > > > > > If you get an event and do req_comp_notif > > > > immediately, without polling - do you expect to get an event > > > > immediately?
> > > > > > The call for ib_req_notify_cq() is intended to map to the semantics > > > mentioned in the spec. For the example mentioned, calling > > > ib_req_notify_cq() without polling arms the CQ, but will not generate > > > an > > > event until a new completion of the specified type is added to the CQ. > > > > True, but as has been pointed out before, Tavor by default will generate > > a new CQ event if any completions are left unreaped in the CQ (or > > something to that effect). See > > http://openib.org/pipermail/openib-general/2004-June/003099.html. So > > rearming without polling *will* generate a CQ event if you're using > > Mellanox HCAs. > > Nope, sorry. Tavor will generate an event if completion are generated > *after the event was generated*. > > So > arm > completion > event > arm <-- no event > > But > > arm > completion > event > completion > arm <-- event > > Thus you can do > poll -> rearm > and also > rearm -> poll > > and get the same result without races. > Great feedback! I think this makes a lot of sense, and I appreciate the clarification. I had been somewhat confused by the previous explanations. - Fab From roland at topspin.com Thu Aug 12 08:15:33 2004 From: roland at topspin.com (Roland Dreier) Date: Thu, 12 Aug 2004 08:15:33 -0700 Subject: [openib-general] qp lock in mthca_poll_cq In-Reply-To: <20040812074643.GB803@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 12 Aug 2004 10:46:43 +0300") References: <20040812072806.GA803@mellanox.co.il> <20040812074643.GB803@mellanox.co.il> Message-ID: <523c2s2xre.fsf@topspin.com> There are two separate uses of the QP during CQ poll, which is why refcount is an atomic_t and we also take the spinlock. First, the refcount is used to make sure that destroy_qp does not get rid of the QP struct while the QP is being accessed to handle poll_cq. It is an atomic_t because it may be accessed without the QP lock being held (eg when an async event is received for the QP). 
Second, the QP's lock is taken during CQE processing because other non-atomic parts of the QP struct such as the number of WQEs outstanding _are_ modified and need to be protected against concurrent access from the send/receive post routine. It might be possible to avoid taking the QP lock in poll_cq by making the current WQE count an atomic_t, but I'm not sure if that's a win because it means that send/receive posting would have to use atomic accesses as well (and I don't think you can make posting WQEs lock-free). Some of my reasoning is in the comment near the bottom of mthca_provider.h too. - Roland From mshefty at ichips.intel.com Thu Aug 12 07:49:46 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 12 Aug 2004 07:49:46 -0700 Subject: [openib-general] ib_req_ncomp_notif in core_ layer In-Reply-To: <20040812064718.GB28866@mellanox.co.il> References: <20040811145158.09f20f5e.mshefty@ichips.intel.com> <000501c47ff9$b82e41c0$655aa8c0@infiniconsys.com> <20040812064718.GB28866@mellanox.co.il> Message-ID: <20040812074946.6f61fbb2.mshefty@ichips.intel.com> On Thu, 12 Aug 2004 09:47:18 +0300 "Michael S. Tsirkin" wrote: > Either way this would have to be documented. > What do others here think? Thanks for the clarification. I think we want ib_req_notify_cq to behave as you described. How does the current implementation of req_ncomp_notify work? What options does the hardware support for n>1? I will also add some documentation around these calls (depending on if they can be combined) for clarification. 
From mshefty at ichips.intel.com Thu Aug 12 09:31:26 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 12 Aug 2004 09:31:26 -0700 Subject: [openib-general] ib_req_ncomp_notif in core_ layer In-Reply-To: <20040812064718.GB28866@mellanox.co.il> References: <20040811145158.09f20f5e.mshefty@ichips.intel.com> <000501c47ff9$b82e41c0$655aa8c0@infiniconsys.com> <20040812064718.GB28866@mellanox.co.il> Message-ID: <20040812093126.1adfc9f3.mshefty@ichips.intel.com> On Thu, 12 Aug 2004 09:47:18 +0300 "Michael S. Tsirkin" wrote: > Nope, sorry. Tavor will generate an event if completions are generated > *after the event was generated*. > > So > arm > completion > event > arm <-- no event > > But > > arm > completion > event > completion > arm <-- event Is the event generated only if the CQ is armed for the next completion? Or is it generated even in the case that the CQ is armed for the next solicited completion? If we add a generate event flag for the send and receive work requests, what sort of behavior is possible if the CQ is armed with completions still on it? From Yuefeng.Liu at Sun.COM Thu Aug 12 13:06:43 2004 From: Yuefeng.Liu at Sun.COM (Yuefeng Liu) Date: Thu, 12 Aug 2004 13:06:43 -0700 Subject: [openib-general] kdapl and udapl in openib Message-ID: <411BCDD3.50403@Sun.COM> I am looking at the kdapl and udapl specifications, but I can't find their implementations in the openib.org gen2 tree. There is something called a udapl helper in the gen1 tree that I could compile into a kernel module, but I can't figure out what part of kdapl or udapl it implements. Is anyone in openib working on udapl and kdapl? Yuefeng From robert.j.woodruff at intel.com Thu Aug 12 13:32:51 2004 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Thu, 12 Aug 2004 13:32:51 -0700 Subject: [openib-general] kdapl and udapl in openib Message-ID: <1AC79F16F5C5284499BB9591B33D6F00CA6350@orsmsx408> >Is anyone in openib working on udapl and kdapl?
>Yuefeng I think that someone from my team may be able to help here, once there is a usermode API defined for the access layer. Not sure if we will have the resources to maintain the openib.org kdapl or udapl, but would definitely be willing to help get it ported to the gen2 stack. woody From halr at voltaire.com Thu Aug 12 14:23:44 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 12 Aug 2004 17:23:44 -0400 Subject: [openib-general] Client Reregistration Status Message-ID: <1092345823.1750.25.camel@localhost.localdomain> Hi, It looks like client reregistration will be in IBA 1.2. The MgtWG approved their part of these changes today and the other changes were "shipped off" to the SWG. The details are as follows: 1. PortInfo:CapabilityMask bit to indicate IsClientReregistrationSupported (bit 4) 2. New PortInfo bit for ClientReregister Used By end nodes (CA, router, base SP0, enhanced SP0) Access RW Length 1 Offset 408 Description: Optional; shall be 0 if not implemented (PortInfo:CapabilityMask.IsClientReregistrationSupported = 0). Used by SM to request end node client reregistration of SA subscriptions. See . 3. New subsection on Client Reregistration: 14.4.11 Client Reregistration Client reregistration allows the Subnet Manager to request that a client reregister all subscriptions requested from this port. The SM may request this at any time for any port supporting this option. A typical reason for the SM doing this might be that the SM suffered a failure and as a result lost its own records of such subscriptions. The SM class uses the PortInfo attribute to effect client reregistration. A port indicates it supports client reregistration for the SM class by setting PortInfo:CapabilityMask.IsClientReregistrationSupported = 1.
o14-13.1.yy: If a port supports client reregistration (PortInfo.IsClientReregistrationSupported = 1), the SMA shall respond to a SubnSet(PortInfo) with PortInfo:ClientReregister=1 as follows: - a SubnGetResp(PortInfo) shall be returned with PortInfo:ClientReregister=1 - an asynchronous unaffiliated event of type Client Reregistration shall be generated (see ). o14-13.2.yy: This is the only situation in which any SMA shall return a PortInfo with ClientReregister=1; in all other cases, it shall be 0. 4. Query HCA can read the IsClientReregistrationSupported capability mask bit. 5. Addition of new asynchronous unaffiliated event for client reregistration o Client Reregistration Event - issued when SM requests client reregistration (see ) o11-6.1.2: If the CI indicates that the port supports client reregistration, the CI shall generate a Client Reregistration Event when the SMA receives this request from the SM. Note that the last 2 items (4 and 5) need approval by SWG and are subject to change. -- Hal From yaronh at voltaire.com Thu Aug 12 15:18:13 2004 From: yaronh at voltaire.com (Yaron Haviv) Date: Fri, 13 Aug 2004 01:18:13 +0300 Subject: [openib-general] kdapl and udapl in openib Message-ID: <35EA21F54A45CB47B879F21A91F4862F134CA7@taurus.voltaire.com> ________________________________ From: openib-general-bounces at openib.org on behalf of Yuefeng Liu Sent: Thu 8/12/2004 11:06 PM To: openib-general at openib.org Subject: [openib-general] kdapl and udapl in openib >I am looking kdapl and udapl specification but I can't find their >implementations in openib.org gen2 tree. there is something called >udapl helper in gen1 tree that I could compile into a kernel module, but >I can't figure out what part of kdapl or udapl it implements. > >Is anyone in openib working on udapl and kdapl?
Yuefeng, our gen1 trunk also incorporates a working kDAPL and uDAPL version. We intend to port the kDAPL to gen2 once it is stable and there is a CM to hook into. uDAPL is more work, since we need the user-mode support, and it may also be provided by one of the other vendors who posted it in their gen1. If you want to help in the kDAPL gen2 porting effort, we can guide you through it, as well as provide you the latest kDAPL code with some new updates/fixes. Yaron _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From roland at topspin.com Fri Aug 13 10:49:25 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 13 Aug 2004 10:49:25 -0700 Subject: [openib-general] [PATCH][1/4] Core QP API In-Reply-To: <10924193641148@topspin.com> Message-ID: <10924193642477@topspin.com> Index: src/linux-kernel/infiniband/include/ib_verbs.h =================================================================== --- src/linux-kernel/infiniband/include/ib_verbs.h (revision 621) +++ src/linux-kernel/infiniband/include/ib_verbs.h (working copy) @@ -112,9 +112,9 @@ u32 byte_len; u32 imm_data; u32 qp; - int pkey_index; int grh_flag:1; int imm_data_valid:1; + u16 pkey_index; u16 slid; u8 sl; u8 dlid_path_bits; @@ -125,14 +125,221 @@ IB_CQ_NEXT_COMP }; -enum ib_mr_access_flags { - IB_MR_LOCAL_WRITE = 1, - IB_MR_REMOTE_WRITE = (1<<1), - IB_MR_REMOTE_READ = (1<<2), - IB_MR_REMOTE_ATOMIC = (1<<3), - IB_MR_MW_BIND = (1<<4) +struct ib_qp_cap { + u32 max_send_wr; + u32 max_recv_wr; + u32 max_send_sge; + u32 max_recv_sge; + u32 max_inline_data; }; +enum ib_sig_type { + IB_SIGNAL_ALL_WR, + IB_SIGNAL_REQ_WR +}; + +enum ib_qp_type { + IB_QPT_RC, + IB_QPT_UC, + IB_QPT_UD, + IB_QPT_SMI, + IB_QPT_GSI, + IB_QPT_RAW_IPV6, + IB_QPT_RAW_ETY +}; + +struct ib_qp_init_attr { + void *qp_context; + struct ib_cq *send_cq; + struct
ib_cq *recv_cq; + struct ib_srq *srq; + struct ib_qp_cap cap; + enum ib_sig_type sq_sig_type; + enum ib_sig_type rq_sig_type; + enum ib_qp_type qp_type; + u8 port_num; /* special QP types only */ +}; + +enum ib_rnr_timeout { + IB_RNR_TIMER_655_36 = 0, + IB_RNR_TIMER_000_01 = 1, + IB_RNR_TIMER_000_02 = 2, + IB_RNR_TIMER_000_03 = 3, + IB_RNR_TIMER_000_04 = 4, + IB_RNR_TIMER_000_06 = 5, + IB_RNR_TIMER_000_08 = 6, + IB_RNR_TIMER_000_12 = 7, + IB_RNR_TIMER_000_16 = 8, + IB_RNR_TIMER_000_24 = 9, + IB_RNR_TIMER_000_32 = 10, + IB_RNR_TIMER_000_48 = 11, + IB_RNR_TIMER_000_64 = 12, + IB_RNR_TIMER_000_96 = 13, + IB_RNR_TIMER_001_28 = 14, + IB_RNR_TIMER_001_92 = 15, + IB_RNR_TIMER_002_56 = 16, + IB_RNR_TIMER_003_84 = 17, + IB_RNR_TIMER_005_12 = 18, + IB_RNR_TIMER_007_68 = 19, + IB_RNR_TIMER_010_24 = 20, + IB_RNR_TIMER_015_36 = 21, + IB_RNR_TIMER_020_48 = 22, + IB_RNR_TIMER_030_72 = 23, + IB_RNR_TIMER_040_96 = 24, + IB_RNR_TIMER_061_44 = 25, + IB_RNR_TIMER_081_92 = 26, + IB_RNR_TIMER_122_88 = 27, + IB_RNR_TIMER_163_84 = 28, + IB_RNR_TIMER_245_76 = 29, + IB_RNR_TIMER_327_68 = 30, + IB_RNR_TIMER_491_52 = 31 +}; + +enum ib_qp_attr_mask { + IB_QP_STATE = 1, + IB_QP_EN_SQD_ASYNC_NOTIFY = (1<<1), + IB_QP_ACCESS_FLAGS = (1<<3), + IB_QP_PKEY_INDEX = (1<<4), + IB_QP_PORT = (1<<5), + IB_QP_QKEY = (1<<6), + IB_QP_AV = (1<<7), + IB_QP_PATH_MTU = (1<<8), + IB_QP_TIMEOUT = (1<<9), + IB_QP_RETRY_CNT = (1<<10), + IB_QP_RNR_RETRY = (1<<11), + IB_QP_RQ_PSN = (1<<12), + IB_QP_MAX_QP_RD_ATOMIC = (1<<13), + IB_QP_ALT_PATH = (1<<14), + IB_QP_MIN_RNR_TIMER = (1<<15), + IB_QP_SQ_PSN = (1<<16), + IB_QP_MAX_DEST_RD_ATOMIC = (1<<17), + IB_QP_PATH_MIG_STATE = (1<<18), + IB_QP_CAP = (1<<19), + IB_QP_DEST_QPN = (1<<20) +}; + +enum ib_qp_state { + IB_QPS_RESET, + IB_QPS_INIT, + IB_QPS_RTR, + IB_QPS_RTS, + IB_QPS_SQD, + IB_QPS_SQE, + IB_QPS_ERR +}; + +enum ib_mtu { + IB_MTU_256 = 1, + IB_MTU_512 = 2, + IB_MTU_1024 = 3, + IB_MTU_2048 = 4, + IB_MTU_4096 = 5 +}; + +enum ib_mig_state { + IB_MIG_MIGRATED, + 
IB_MIG_REARM, + IB_MIG_ARMED +}; + +struct ib_qp_attr { + enum ib_qp_state qp_state; + enum ib_mtu path_mtu; + enum ib_mig_state path_mig_state; + u32 qkey; + u32 rq_psn; + u32 sq_psn; + u32 dest_qp_num; + int qp_access_flags; + struct ib_qp_cap cap; + struct ib_ah_attr ah_attr; + struct ib_ah_attr alt_ah_attr; + u16 pkey_index; + u16 alt_pkey_index; + u8 en_sqd_async_notify; + u8 sq_draining; + u8 max_rd_atomic; + u8 max_dest_rd_atomic; + u8 min_rnr_timer; + u8 port; + u8 timeout; + u8 retry_cnt; + u8 rnr_retry; + u8 alt_port; + u8 alt_timeout; +}; + +enum ib_wr_opcode { + IB_WR_RDMA_WRITE, + IB_WR_RDMA_WRITE_WITH_IMM, + IB_WR_SEND, + IB_WR_SEND_WITH_IMM, + IB_WR_RDMA_READ, + IB_WR_ATOMIC_CMP_AND_SWP, + IB_WR_ATOMIC_FETCH_AND_ADD +}; + +enum ib_send_flags { + IB_SEND_FENCE = 1, + IB_SEND_SIGNALED = (1<<1), + IB_SEND_SOLICITED = (1<<2), + IB_SEND_INLINE = (1<<3) +}; + +enum ib_recv_flags { + IB_RECV_SIGNALED = 1 +}; + +struct ib_sge { + u64 addr; + u32 length; + u32 lkey; +}; + +struct ib_send_wr { + struct ib_send_wr *next; + u64 wr_id; + struct ib_sge *sg_list; + int num_sge; + enum ib_wr_opcode opcode; + int send_flags; + u32 imm_data; + union { + struct { + u64 remote_addr; + u32 rkey; + } rdma; + struct { + u64 remote_addr; + u64 compare_add; + u64 swap; + u32 rkey; + } atomic; + struct { + struct ib_ah *ah; + u32 remote_qpn; + u32 remote_qkey; + u16 pkey_index; /* valid for GSI only */ + } ud; + } wr; +}; + +struct ib_recv_wr { + struct ib_recv_wr *next; + u64 wr_id; + struct ib_sge *sg_list; + int num_sge; + int recv_flags; +}; + +enum ib_access_flags { + IB_ACCESS_LOCAL_WRITE = 1, + IB_ACCESS_REMOTE_WRITE = (1<<1), + IB_ACCESS_REMOTE_READ = (1<<2), + IB_ACCESS_REMOTE_ATOMIC = (1<<3), + IB_ACCESS_MW_BIND = (1<<4) +}; + struct ib_phys_buf { u64 addr; u64 size; @@ -153,7 +360,14 @@ IB_MR_REREG_ACCESS = (1<<2) }; -struct ib_mw_bind; +struct ib_mw_bind { + struct ib_mr *mr; + u64 wr_id; + u64 addr; + u32 length; + int send_flags; + int mw_access_flags; +}; 
struct ib_pd { struct ib_device *device; @@ -175,6 +389,15 @@ atomic_t usecnt; /* count number of work queues */ }; +struct ib_qp { + struct ib_device *device; + struct ib_pd *pd; + struct ib_cq *send_cq; + struct ib_cq *recv_cq; + void *qp_context; + u32 qp_num; +}; + struct ib_mr { struct ib_device *device; struct ib_pd *pd; @@ -218,13 +441,24 @@ int (*query_ah)(struct ib_ah *ah, struct ib_ah_attr *ah_attr); int (*destroy_ah)(struct ib_ah *ah); - ib_qp_create_func qp_create; - ib_special_qp_create_func special_qp_create; - ib_qp_modify_func qp_modify; - ib_qp_query_func qp_query; - ib_qp_destroy_func qp_destroy; - ib_send_post_func send_post; - ib_receive_post_func receive_post; + struct ib_qp * (*create_qp)(struct ib_pd *pd, + struct ib_qp_init_attr *qp_init_attr, + struct ib_qp_cap *qp_cap); + int (*modify_qp)(struct ib_qp *qp, + struct ib_qp_attr *qp_attr, + int qp_attr_mask, + struct ib_qp_cap *qp_cap); + int (*query_qp)(struct ib_qp *qp, + struct ib_qp_attr *qp_attr, + int qp_attr_mask, + struct ib_qp_init_attr *qp_init_attr); + int (*destroy_qp)(struct ib_qp *qp); + int (*post_send)(struct ib_qp *qp, + struct ib_send_wr *send_wr, + struct ib_send_wr **bad_send_wr); + int (*post_recv)(struct ib_qp *qp, + struct ib_recv_wr *recv_wr, + struct ib_recv_wr **bad_recv_wr); struct ib_cq * (*create_cq)(struct ib_device *device, int *cqe); int (*destroy_cq)(struct ib_cq *cq); @@ -279,6 +513,36 @@ int ib_query_ah(struct ib_ah *ah, struct ib_ah_attr *ah_attr); int ib_destroy_ah(struct ib_ah *ah); +struct ib_qp *ib_create_qp(struct ib_pd *pd, + struct ib_qp_init_attr *qp_init_attr, + struct ib_qp_cap *qp_cap); + +int ib_modify_qp(struct ib_qp *qp, + struct ib_qp_attr *qp_attr, + int qp_attr_mask, + struct ib_qp_cap *qp_cap); + +int ib_query_qp(struct ib_qp *qp, + struct ib_qp_attr *qp_attr, + int qp_attr_mask, + struct ib_qp_init_attr *qp_init_attr); + +int ib_destroy_qp(struct ib_qp *qp); + +static inline int ib_post_send(struct ib_qp *qp, + struct ib_send_wr *send_wr, 
+ struct ib_send_wr **bad_send_wr) +{ + return qp->device->post_send(qp, send_wr, bad_send_wr); +} + +static inline int ib_post_recv(struct ib_qp *qp, + struct ib_recv_wr *recv_wr, + struct ib_recv_wr **bad_recv_wr) +{ + return qp->device->post_recv(qp, recv_wr, bad_recv_wr); +} + struct ib_cq *ib_create_cq(struct ib_device *device, ib_comp_handler comp_handler, void *cq_context, int cqe); Index: src/linux-kernel/infiniband/include/ts_ib_core.h =================================================================== --- src/linux-kernel/infiniband/include/ts_ib_core.h (revision 621) +++ src/linux-kernel/infiniband/include/ts_ib_core.h (working copy) @@ -63,36 +63,6 @@ int index, tTS_IB_GID gid); -int ib_qp_create(struct ib_qp_create_param *param, - struct ib_qp **qp, - u32 *qpn); -int ib_special_qp_create(struct ib_qp_create_param *param, - tTS_IB_PORT port, - enum ib_special_qp_type qp_type, - struct ib_qp **qp); -int ib_qp_modify(struct ib_qp *qp, - struct ib_qp_attribute *attr); -int ib_qp_query(struct ib_qp *qp, - struct ib_qp_attribute *attr); -int ib_qp_query_qpn(struct ib_qp *qp, - u32 *qpn); -int ib_qp_destroy(struct ib_qp *qp); - -static inline int ib_send(struct ib_qp *qp, - struct ib_send_param *param, - int num_work_requests) -{ - IB_CHECK_MAGIC(qp, QP); - return qp->device->send_post(qp, param, num_work_requests); -} -static inline int ib_receive(struct ib_qp *qp, - struct ib_receive_param *param, - int num_work_requests) -{ - IB_CHECK_MAGIC(qp, QP); - return qp->device->receive_post(qp, param, num_work_requests); -} - int ib_fmr_pool_create(struct ib_pd *pd, struct ib_fmr_pool_param *params, struct ib_fmr_pool **pool); @@ -140,7 +110,7 @@ int ib_cached_pkey_find(struct ib_device *device, tTS_IB_PORT port, u16 pkey, - int *index); + u16 *index); #endif /* _TS_IB_CORE_H */ Index: src/linux-kernel/infiniband/include/ts_ib_core_types.h =================================================================== --- src/linux-kernel/infiniband/include/ts_ib_core_types.h 
(revision 621) +++ src/linux-kernel/infiniband/include/ts_ib_core_types.h (working copy) @@ -52,14 +52,6 @@ struct ib_device; -enum ib_mtu { - IB_MTU_256 = 1, - IB_MTU_512 = 2, - IB_MTU_1024 = 3, - IB_MTU_2048 = 4, - IB_MTU_4096 = 5 -}; - enum ib_port_state { IB_PORT_STATE_NOP = 0, IB_PORT_STATE_DOWN = 1, @@ -76,7 +68,7 @@ }; struct ib_port_properties { - enum ib_mtu max_mtu; + int max_mtu; __u32 max_message_size; __u16 lid; __u8 lmc; @@ -130,7 +122,7 @@ __u8 tclass; __u16 pkey; tTS_IB_SL sl; - enum ib_mtu mtu; + int mtu; enum ib_rate rate; __u8 packet_life; __u8 preference; @@ -199,18 +191,6 @@ #ifdef __KERNEL__ -enum ib_op { - IB_OP_RECEIVE, - IB_OP_SEND, - IB_OP_SEND_IMMEDIATE, - IB_OP_RDMA_WRITE, - IB_OP_RDMA_WRITE_IMMEDIATE, - IB_OP_RDMA_READ, - IB_OP_COMPARE_SWAP, - IB_OP_FETCH_ADD, - IB_OP_MEMORY_WINDOW_BIND -}; - enum ib_async_event { IB_QP_PATH_MIGRATED, IB_EEC_PATH_MIGRATED, @@ -237,17 +217,8 @@ int dead; }; -struct ib_qp { - IB_DECLARE_MAGIC - struct ib_device *device; - u32 qpn; - struct ib_async_obj async_obj; - struct list_head async_handler_list; - spinlock_t async_handler_lock; - void *private; -}; - struct ib_fmr_pool; /* actual definition in core_fmr.c */ +struct ib_pd; struct ib_fmr { IB_DECLARE_MAGIC @@ -296,110 +267,11 @@ IB_STATIC_RATE_12X_TO_1X = 11 }; -enum ib_rnr_timeout { - IB_RNR_TIMER_655_36 = 0, - IB_RNR_TIMER_000_01 = 1, - IB_RNR_TIMER_000_02 = 2, - IB_RNR_TIMER_000_03 = 3, - IB_RNR_TIMER_000_04 = 4, - IB_RNR_TIMER_000_06 = 5, - IB_RNR_TIMER_000_08 = 6, - IB_RNR_TIMER_000_12 = 7, - IB_RNR_TIMER_000_16 = 8, - IB_RNR_TIMER_000_24 = 9, - IB_RNR_TIMER_000_32 = 10, - IB_RNR_TIMER_000_48 = 11, - IB_RNR_TIMER_000_64 = 12, - IB_RNR_TIMER_000_96 = 13, - IB_RNR_TIMER_001_28 = 14, - IB_RNR_TIMER_001_92 = 15, - IB_RNR_TIMER_002_56 = 16, - IB_RNR_TIMER_003_84 = 17, - IB_RNR_TIMER_005_12 = 18, - IB_RNR_TIMER_007_68 = 19, - IB_RNR_TIMER_010_24 = 20, - IB_RNR_TIMER_015_36 = 21, - IB_RNR_TIMER_020_48 = 22, - IB_RNR_TIMER_030_72 = 23, - 
IB_RNR_TIMER_040_96 = 24, - IB_RNR_TIMER_061_44 = 25, - IB_RNR_TIMER_081_92 = 26, - IB_RNR_TIMER_122_88 = 27, - IB_RNR_TIMER_163_84 = 28, - IB_RNR_TIMER_245_76 = 29, - IB_RNR_TIMER_327_68 = 30, - IB_RNR_TIMER_491_52 = 31 -}; - enum ib_device_properties_mask { IB_DEVICE_SYSTEM_IMAGE_GUID = 1 << 0 }; -enum ib_transport { - IB_TRANSPORT_RC = 0, - IB_TRANSPORT_UC = 1, - IB_TRANSPORT_RD = 2, - IB_TRANSPORT_UD = 3, -}; - -enum ib_special_qp_type { - IB_SMI_QP, - IB_GSI_QP, - IB_RAW_IPV6_QP, - IB_RAW_ETHERTYPE_QP -}; - -enum ib_wq_signal_policy { - IB_WQ_SIGNAL_ALL, - IB_WQ_SIGNAL_SELECTABLE -}; - -enum ib_qp_state { - IB_QP_STATE_RESET, - IB_QP_STATE_INIT, - IB_QP_STATE_RTR, - IB_QP_STATE_RTS, - IB_QP_STATE_SQD, - IB_QP_STATE_SQE, - IB_QP_STATE_ERROR -}; - -enum ib_migration_state { - IB_MIGRATED, - IB_REARM, - IB_ARMED -}; - -enum ib_qp_attribute_mask { - IB_QP_ATTRIBUTE_STATE = 1 << 0, - IB_QP_ATTRIBUTE_SEND_PSN = 1 << 1, - IB_QP_ATTRIBUTE_RECEIVE_PSN = 1 << 2, - IB_QP_ATTRIBUTE_DESTINATION_QPN = 1 << 3, - IB_QP_ATTRIBUTE_QKEY = 1 << 4, - IB_QP_ATTRIBUTE_PATH_MTU = 1 << 5, - IB_QP_ATTRIBUTE_MIGRATION_STATE = 1 << 6, - IB_QP_ATTRIBUTE_INITIATOR_DEPTH = 1 << 7, - IB_QP_ATTRIBUTE_RESPONDER_RESOURCES = 1 << 8, - IB_QP_ATTRIBUTE_RETRY_COUNT = 1 << 9, - IB_QP_ATTRIBUTE_RNR_RETRY_COUNT = 1 << 10, - IB_QP_ATTRIBUTE_RNR_TIMEOUT = 1 << 11, - IB_QP_ATTRIBUTE_PKEY_INDEX = 1 << 12, - IB_QP_ATTRIBUTE_PORT = 1 << 13, - IB_QP_ATTRIBUTE_ADDRESS = 1 << 14, - IB_QP_ATTRIBUTE_LOCAL_ACK_TIMEOUT = 1 << 15, - IB_QP_ATTRIBUTE_ALT_PKEY_INDEX = 1 << 16, - IB_QP_ATTRIBUTE_ALT_PORT = 1 << 17, - IB_QP_ATTRIBUTE_ALT_ADDRESS = 1 << 18, - IB_QP_ATTRIBUTE_ALT_LOCAL_ACK_TIMEOUT = 1 << 19, - IB_QP_ATTRIBUTE_RDMA_ATOMIC_ENABLE = 1 << 20, - IB_QP_ATTRIBUTE_SQD_ASYNC_EVENT_ENABLE = 1 << 21 -}; - enum ib_memory_access { - IB_ACCESS_LOCAL_WRITE = 1 << 0, - IB_ACCESS_REMOTE_WRITE = 1 << 1, - IB_ACCESS_REMOTE_READ = 1 << 2, - IB_ACCESS_REMOTE_ATOMIC = 1 << 3, IB_ACCESS_ENABLE_WINDOW = 1 << 4 }; @@ -422,107 
+294,6 @@ tTS_IB_GUID system_image_guid; }; -struct ib_address_vector { - int service_level; - enum ib_static_rate static_rate; - int source_path_bits; - u16 dlid; - tTS_IB_PORT port; - u32 flow_label; - int source_gid_index; - u8 hop_limit; - u8 traffic_class; - tTS_IB_GID dgid; - int use_grh:1; -}; - -struct ib_qp_limit { - int max_outstanding_send_request; - int max_outstanding_receive_request; - int max_send_gather_element; - int max_receive_scatter_element; -}; - -struct ib_qp_create_param { - struct ib_qp_limit limit; - struct ib_pd *pd; - struct ib_cq *send_queue; - struct ib_cq *receive_queue; - enum ib_wq_signal_policy send_policy; - enum ib_wq_signal_policy receive_policy; - struct ib_rdd *rd_domain; - enum ib_transport transport; - void *device_specific; -}; - -struct ib_qp_attribute { - enum ib_qp_attribute_mask valid_fields; - enum ib_qp_state state; - tTS_IB_PSN send_psn; - tTS_IB_PSN receive_psn; - u32 destination_qpn; - u32 qkey; - enum ib_mtu path_mtu; - enum ib_migration_state migration_state; - int initiator_depth; - int responder_resources; - u8 retry_count; - u8 rnr_retry_count; - enum ib_rnr_timeout rnr_timeout; - int pkey_index; - tTS_IB_PORT port; - struct ib_address_vector address; - u8 local_ack_timeout; - int alt_pkey_index; - tTS_IB_PORT alt_port; - struct ib_address_vector alt_address; - u8 alt_local_ack_timeout; - int enable_atomic:1; - int enable_rdma_read:1; - int enable_rdma_write:1; - int sqd_async_event_enable:1; - int sq_drained:1; -}; - -struct ib_gather_scatter { - u64 address; - u32 length; - u32 key; -}; - -struct ib_send_param { - u64 work_request_id; - enum ib_op op; - struct ib_gather_scatter *gather_list; - int num_gather_entries; - u64 remote_address; - u32 rkey; - u32 dest_qpn; - u32 dest_qkey; - struct ib_ah *dest_address; - u32 immediate_data; - u64 compare_add; - u64 swap; - u32 eecn; - u16 ethertype; - enum ib_static_rate static_rate; - int pkey_index; - void *device_specific; - int solicited_event:1; - int 
signaled:1; - int immediate_data_valid:1; - int fence:1; - int inline_data:1; -}; - -struct ib_receive_param { - u64 work_request_id; - struct ib_gather_scatter *scatter_list; - int num_scatter_entries; - void *device_specific; - int signaled:1; -}; - struct ib_fmr_pool_param { int max_pages_per_fmr; enum ib_memory_access access; @@ -561,25 +332,6 @@ tTS_IB_PORT port, int index, tTS_IB_GID gid); -typedef int (*ib_qp_create_func)(struct ib_pd *pd, - struct ib_qp_create_param *param, - struct ib_qp *qp); -typedef int (*ib_special_qp_create_func)(struct ib_pd *pd, - struct ib_qp_create_param *param, - tTS_IB_PORT port, - enum ib_special_qp_type qp_type, - struct ib_qp *qp); -typedef int (*ib_qp_modify_func)(struct ib_qp *qp, - struct ib_qp_attribute *attr); -typedef int (*ib_qp_query_func)(struct ib_qp *qp, - struct ib_qp_attribute *attr); -typedef int (*ib_qp_destroy_func)(struct ib_qp *qp); -typedef int (*ib_send_post_func)(struct ib_qp *qp, - struct ib_send_param *param, - int num_work_requests); -typedef int (*ib_receive_post_func)(struct ib_qp *qp, - struct ib_receive_param *param, - int num_work_requests); typedef int (*ib_fmr_create_func)(struct ib_pd *pd, enum ib_memory_access access, int max_pages, Index: src/linux-kernel/infiniband/core/core_ah.c =================================================================== --- src/linux-kernel/infiniband/core/core_ah.c (revision 607) +++ src/linux-kernel/infiniband/core/core_ah.c (working copy) @@ -21,17 +21,10 @@ $Id$ */ -#include "core_priv.h" - -#include "ts_kernel_trace.h" -#include "ts_kernel_services.h" - -#include -#include - #include -#include +#include "core_priv.h" + struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr) { struct ib_ah *ah; Index: src/linux-kernel/infiniband/core/core_async.c =================================================================== --- src/linux-kernel/infiniband/core/core_async.c (revision 576) +++ src/linux-kernel/infiniband/core/core_async.c (working copy) @@ 
-112,28 +112,8 @@ switch (event_table[record->event].mod) { case QP: - { - struct ib_qp *qp = record->modifier.qp; + break; - if (!IB_TEST_MAGIC(qp, QP)) { - TS_REPORT_WARN(MOD_KERNEL_IB, "Bad magic 0x%lx at %p for QP", - IB_GET_MAGIC(qp), qp); - ret = -EINVAL; - goto error; - } - - if (qp->device != record->device) { - ret = -EINVAL; - goto error; - } - - spin_lock_irqsave(&qp->async_handler_lock, flags); - handler->list_lock = &qp->async_handler_lock; - list_add_tail(&handler->list, &qp->async_handler_list); - spin_unlock_irqrestore(&qp->async_handler_lock, flags); - } - break; - case CQ: printk(KERN_WARNING "Async events for CQs not supported\n"); break; @@ -192,9 +172,6 @@ unsigned long flags = 0; /* initialize to shut up gcc */ switch (event_table[event_record->event].mod) { - case QP: - async_obj = &event_record->modifier.qp->async_obj; - break; default: break; } @@ -243,9 +220,6 @@ switch (event_table[event->record.event].mod) { case QP: sprintf(mod_buf, " (QP %p)", event->record.modifier.qp); - handler_list = &event->record.modifier.qp->async_handler_list; - handler_lock = &event->record.modifier.qp->async_handler_lock; - async_obj = &event->record.modifier.qp->async_obj; break; case CQ: Index: src/linux-kernel/infiniband/core/core_cache.c =================================================================== --- src/linux-kernel/infiniband/core/core_cache.c (revision 576) +++ src/linux-kernel/infiniband/core/core_cache.c (working copy) @@ -239,7 +239,7 @@ int ib_cached_pkey_find(struct ib_device *device, tTS_IB_PORT port, u16 pkey, - int *index) + u16 *index) { struct ib_device_private *priv; unsigned int seq; Index: src/linux-kernel/infiniband/core/core_cq.c =================================================================== --- src/linux-kernel/infiniband/core/core_cq.c (revision 589) +++ src/linux-kernel/infiniband/core/core_cq.c (working copy) @@ -21,9 +21,6 @@ $Id$ */ -#include -#include - #include #include "core_priv.h" Index: 
src/linux-kernel/infiniband/core/core_device.c =================================================================== --- src/linux-kernel/infiniband/core/core_device.c (revision 607) +++ src/linux-kernel/infiniband/core/core_device.c (working copy) @@ -52,11 +52,11 @@ IB_MANDATORY_FUNC(dealloc_pd), IB_MANDATORY_FUNC(create_ah), IB_MANDATORY_FUNC(destroy_ah), - IB_MANDATORY_FUNC(special_qp_create), - IB_MANDATORY_FUNC(qp_modify), - IB_MANDATORY_FUNC(qp_destroy), - IB_MANDATORY_FUNC(send_post), - IB_MANDATORY_FUNC(receive_post), + IB_MANDATORY_FUNC(create_qp), + IB_MANDATORY_FUNC(modify_qp), + IB_MANDATORY_FUNC(destroy_qp), + IB_MANDATORY_FUNC(post_send), + IB_MANDATORY_FUNC(post_recv), IB_MANDATORY_FUNC(create_cq), IB_MANDATORY_FUNC(destroy_cq), IB_MANDATORY_FUNC(poll_cq), Index: src/linux-kernel/infiniband/core/core_mcast.c =================================================================== --- src/linux-kernel/infiniband/core/core_mcast.c (revision 621) +++ src/linux-kernel/infiniband/core/core_mcast.c (working copy) @@ -21,16 +21,10 @@ $Id$ */ -#include "core_priv.h" - -#include "ts_kernel_trace.h" -#include "ts_kernel_services.h" - -#include -#include - #include +#include "core_priv.h" + int ib_attach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid) { return qp->device->attach_mcast ? 
Index: src/linux-kernel/infiniband/core/core_mw.c =================================================================== --- src/linux-kernel/infiniband/core/core_mw.c (revision 613) +++ src/linux-kernel/infiniband/core/core_mw.c (working copy) @@ -22,14 +22,9 @@ */ #include -#include -#include #include "core_priv.h" -#include "ts_kernel_trace.h" -#include "ts_kernel_services.h" - struct ib_mw *ib_alloc_mw(struct ib_pd *pd) { struct ib_mw *mw; Index: src/linux-kernel/infiniband/core/core_qp.c =================================================================== --- src/linux-kernel/infiniband/core/core_qp.c (revision 576) +++ src/linux-kernel/infiniband/core/core_qp.c (working copy) @@ -21,147 +21,72 @@ $Id$ */ -#include "core_priv.h" - -#include "ts_kernel_trace.h" -#include "ts_kernel_services.h" - -#include -#include - #include -#include -int ib_qp_create(struct ib_qp_create_param *param, - struct ib_qp **qp_handle, - u32 *qpn) -{ - struct ib_pd *pd; - struct ib_qp *qp; - int ret; +#include "core_priv.h" - pd = param->pd; - - if (!pd->device->qp_create) { - return -ENOSYS; - } - - qp = kmalloc(sizeof *qp, GFP_KERNEL); - if (!qp) { - return -ENOMEM; - } - - INIT_LIST_HEAD(&qp->async_handler_list); - spin_lock_init(&qp->async_handler_lock); - ib_async_obj_init(&qp->async_obj, qp); - - ret = pd->device->qp_create(pd, param, qp); - - if (!ret) { - IB_SET_MAGIC(qp, QP); - qp->device = pd->device; - *qp_handle = qp; - *qpn = qp->qpn; - } else { - kfree(qp); - } - - return ret; -} -EXPORT_SYMBOL(ib_qp_create); - -int ib_special_qp_create(struct ib_qp_create_param *param, - tTS_IB_PORT port, - enum ib_special_qp_type qp_type, - struct ib_qp **qp_handle) +struct ib_qp *ib_create_qp(struct ib_pd *pd, + struct ib_qp_init_attr *qp_init_attr, + struct ib_qp_cap *qp_cap) { - struct ib_pd *pd; struct ib_qp *qp; - int ret; - pd = param->pd; + qp = pd->device->create_qp(pd, qp_init_attr, qp_cap); - if (!pd->device->special_qp_create) { - return -ENOSYS; + if (!IS_ERR(qp)) { + 
qp->device = pd->device; + qp->pd = pd; + qp->send_cq = qp_init_attr->send_cq; + qp->recv_cq = qp_init_attr->recv_cq; + atomic_inc(&pd->usecnt); + atomic_inc(&qp_init_attr->send_cq->usecnt); + atomic_inc(&qp_init_attr->recv_cq->usecnt); } - qp = kmalloc(sizeof *qp, GFP_KERNEL); - if (!qp) { - return -ENOMEM; - } - - INIT_LIST_HEAD(&qp->async_handler_list); - spin_lock_init(&qp->async_handler_lock); - ib_async_obj_init(&qp->async_obj, qp); - - ret = pd->device->special_qp_create(pd, param, port, qp_type, qp); - - if (!ret) { - IB_SET_MAGIC(qp, QP); - qp->device = pd->device; - *qp_handle = qp; - } else { - kfree(qp); - } - - return ret; + return qp; } -EXPORT_SYMBOL(ib_special_qp_create); +EXPORT_SYMBOL(ib_create_qp); -int ib_qp_modify(struct ib_qp *qp, - struct ib_qp_attribute *attr) +int ib_modify_qp(struct ib_qp *qp, + struct ib_qp_attr *qp_attr, + int qp_attr_mask, + struct ib_qp_cap *qp_cap) { - IB_CHECK_MAGIC(qp, QP); - return qp->device->qp_modify ? qp->device->qp_modify(qp, attr) : -ENOSYS; + return qp->device->modify_qp(qp, qp_attr, qp_attr_mask, qp_cap); } -EXPORT_SYMBOL(ib_qp_modify); +EXPORT_SYMBOL(ib_modify_qp); -int ib_qp_query(struct ib_qp *qp, - struct ib_qp_attribute *attr) +int ib_query_qp(struct ib_qp *qp, + struct ib_qp_attr *qp_attr, + int qp_attr_mask, + struct ib_qp_init_attr *qp_init_attr) { - IB_CHECK_MAGIC(qp, QP); - return qp->device->qp_query ? qp->device->qp_query(qp, attr) : -ENOSYS; + return qp->device->query_qp ? 
+		qp->device->query_qp(qp, qp_attr, qp_attr_mask, qp_init_attr) :
+		-ENOSYS;
 }
-EXPORT_SYMBOL(ib_qp_query);
+EXPORT_SYMBOL(ib_query_qp);

-int ib_qp_query_qpn(struct ib_qp *qp,
-                    u32 *qpn)
+int ib_destroy_qp(struct ib_qp *qp)
 {
-	IB_CHECK_MAGIC(qp, QP);
+	struct ib_pd *pd;
+	struct ib_cq *scq, *rcq;
+	int ret;

-	*qpn = qp->qpn;
-	return 0;
-}
-EXPORT_SYMBOL(ib_qp_query_qpn);
+	pd  = qp->pd;
+	scq = qp->send_cq;
+	rcq = qp->recv_cq;

-int ib_qp_destroy(struct ib_qp *qp_handle)
-{
-	struct ib_qp *qp = qp_handle;
-	int ret;
-	unsigned long flags;
-
-	IB_CHECK_MAGIC(qp, QP);
-
-	if (!qp->device->qp_destroy) {
-		return -ENOSYS;
-	}
-
-	if (!list_empty(&qp->async_handler_list)) {
-		return -EBUSY;
-	}
-
-	ret = qp->device->qp_destroy(qp);
+	ret = qp->device->destroy_qp(qp);
 	if (!ret) {
-		IB_CLEAR_MAGIC(qp);
-		spin_lock_irqsave(&qp->async_obj.lock, flags);
-		if (!qp->async_obj.pending)
-			kfree(qp);
-		spin_unlock_irqrestore(&qp->async_obj.lock, flags);
+		atomic_dec(&pd->usecnt);
+		atomic_dec(&scq->usecnt);
+		atomic_dec(&rcq->usecnt);
 	}

 	return ret;
 }
-EXPORT_SYMBOL(ib_qp_destroy);
+EXPORT_SYMBOL(ib_destroy_qp);

 /* Local Variables:

From roland at topspin.com Fri Aug 13 10:49:25 2004
From: roland at topspin.com (Roland Dreier)
Date: Fri, 13 Aug 2004 10:49:25 -0700
Subject: [openib-general] [PATCH][2/4] Low-level driver QP API
In-Reply-To: <10924193642477@topspin.com>
Message-ID: <10924193652379@topspin.com>

Index: src/linux-kernel/infiniband/hw/mthca/mthca_dev.h
===================================================================
--- src/linux-kernel/infiniband/hw/mthca/mthca_dev.h	(revision 621)
+++ src/linux-kernel/infiniband/hw/mthca/mthca_dev.h	(working copy)
@@ -229,7 +229,7 @@
 #define MTHCA_GET(dest, source, offset)                       \
 	do {                                                  \
-		void *__p = (void *) (source) + (offset);     \
+		void *__p = (char *) (source) + (offset);     \
 		switch (sizeof (dest)) {                      \
 		case 1: (dest) = *(u8 *) __p;       break;    \
 		case 2: (dest) = be16_to_cpup(__p); break;    \
@@ -241,7 +241,8 @@
 #define MTHCA_PUT(dest, source, offset)                              \
 	do {                                                         \
-		__typeof__(source) *__p = (void *) (dest) + (offset);        \
+		__typeof__(source) *__p =                                    \
+			(__typeof__(source) *) ((char *) (dest) + (offset)); \
 		switch (sizeof(source)) {                                    \
 		case 1: *__p = (source);            break;                   \
 		case 2: *__p = cpu_to_be16(source); break;                   \
@@ -307,28 +308,29 @@
 void mthca_qp_event(struct mthca_dev *dev, u32 qpn,
 		    enum ib_async_event event);
-int mthca_modify_qp(struct ib_qp *qp, struct ib_qp_attribute *attr);
-int mthca_post_send(struct ib_qp *ibqp, struct ib_send_param *param,
-		    int nreq);
-int mthca_post_receive(struct ib_qp *ibqp, struct ib_receive_param *param,
-		       int nreq);
+int mthca_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
+		    int attr_mask, struct ib_qp_cap *qp_cap);
+int mthca_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
+		    struct ib_send_wr **bad_wr);
+int mthca_post_receive(struct ib_qp *ibqp, struct ib_recv_wr *wr,
+		       struct ib_recv_wr **bad_wr);
 int mthca_free_err_wqe(struct mthca_qp *qp, int is_send,
 		       int index, int *dbd, u32 *new_wqe);
 int mthca_alloc_qp(struct mthca_dev *dev,
 		   struct mthca_pd *pd,
 		   struct mthca_cq *send_cq,
 		   struct mthca_cq *recv_cq,
-		   enum ib_transport transport,
-		   enum ib_wq_signal_policy send_policy,
-		   enum ib_wq_signal_policy recv_policy,
+		   enum ib_qp_type type,
+		   enum ib_sig_type send_policy,
+		   enum ib_sig_type recv_policy,
 		   struct mthca_qp *qp);
 int mthca_alloc_sqp(struct mthca_dev *dev,
 		    struct mthca_pd *pd,
 		    struct mthca_cq *send_cq,
 		    struct mthca_cq *recv_cq,
-		    enum ib_wq_signal_policy send_policy,
-		    enum ib_wq_signal_policy recv_policy,
-		    enum ib_special_qp_type type,
+		    enum ib_sig_type send_policy,
+		    enum ib_sig_type recv_policy,
+		    int qpn,
 		    int port,
 		    struct mthca_sqp *sqp);
 void mthca_free_qp(struct mthca_dev *dev, struct mthca_qp *qp);
Index: src/linux-kernel/infiniband/hw/mthca/mthca_provider.c
===================================================================
--- src/linux-kernel/infiniband/hw/mthca/mthca_provider.c	(revision 621)
+++
src/linux-kernel/infiniband/hw/mthca/mthca_provider.c (working copy) @@ -295,87 +295,77 @@ return 0; } -static int mthca_qp_create(struct ib_pd *pd, - struct ib_qp_create_param *param, - struct ib_qp *ibqp) +static struct ib_qp *mthca_create_qp(struct ib_pd *pd, + struct ib_qp_init_attr *init_attr, + struct ib_qp_cap *qp_cap) { + struct mthca_qp *qp; int err; - struct mthca_qp *qp = kmalloc(sizeof *qp, GFP_KERNEL); - if (!qp) - return -ENOMEM; - qp->ib_qp = ibqp; + switch (init_attr->qp_type) { + case IB_QPT_RC: + case IB_QPT_UC: + case IB_QPT_UD: + { + qp = kmalloc(sizeof *qp, GFP_KERNEL); + if (!qp) + return ERR_PTR(-ENOMEM); - qp->sq.max = param->limit.max_outstanding_send_request; - qp->rq.max = param->limit.max_outstanding_receive_request; - qp->sq.max_gs = param->limit.max_send_gather_element; - qp->rq.max_gs = param->limit.max_receive_scatter_element; + qp->sq.max = init_attr->cap.max_send_wr; + qp->rq.max = init_attr->cap.max_recv_wr; + qp->sq.max_gs = init_attr->cap.max_send_sge; + qp->rq.max_gs = init_attr->cap.max_recv_sge; - err = mthca_alloc_qp(to_mdev(pd->device), (struct mthca_pd *) pd, - (struct mthca_cq *) param->send_queue, - (struct mthca_cq *) param->receive_queue, - param->transport, param->send_policy, - param->receive_policy, qp); - if (err) { - kfree(qp); - return err; + err = mthca_alloc_qp(to_mdev(pd->device), (struct mthca_pd *) pd, + (struct mthca_cq *) init_attr->send_cq, + (struct mthca_cq *) init_attr->recv_cq, + init_attr->qp_type, init_attr->sq_sig_type, + init_attr->rq_sig_type, qp); + qp->ibqp.qp_num = qp->qpn; + break; } + case IB_QPT_SMI: + case IB_QPT_GSI: + { + qp = kmalloc(sizeof (struct mthca_sqp), GFP_KERNEL); + if (!qp) + return ERR_PTR(-ENOMEM); - ibqp->private = qp; - ibqp->qpn = qp->qpn; + qp->sq.max = init_attr->cap.max_send_wr; + qp->rq.max = init_attr->cap.max_recv_wr; + qp->sq.max_gs = init_attr->cap.max_send_sge; + qp->rq.max_gs = init_attr->cap.max_recv_sge; - return 0; -} + qp->ibqp.qp_num = init_attr->qp_type == 
IB_QPT_SMI ? 0 : 1; -static int mthca_special_qp_create(struct ib_pd *pd, - struct ib_qp_create_param *param, - u8 port, - enum ib_special_qp_type qp_type, - struct ib_qp *ibqp) -{ - struct mthca_qp *qp; - int err; + err = mthca_alloc_sqp(to_mdev(pd->device), (struct mthca_pd *) pd, + (struct mthca_cq *) init_attr->send_cq, + (struct mthca_cq *) init_attr->recv_cq, + init_attr->sq_sig_type, init_attr->rq_sig_type, + qp->ibqp.qp_num, init_attr->port_num, + (struct mthca_sqp *) qp); + break; + } + default: + /* Don't support raw QPs */ + return ERR_PTR(-ENOSYS); + } - /* Don't support raw QPs */ - if (qp_type != IB_SMI_QP && qp_type != IB_GSI_QP) - return -ENOSYS; - - ibqp->private = kmalloc(sizeof (struct mthca_sqp), GFP_KERNEL); - if (!ibqp->private) - return -ENOMEM; - - ((struct mthca_qp *) ibqp->private)->ib_qp = ibqp; - - if (port < 1 || port > to_mdev(pd->device)->limits.num_ports) - return -EINVAL; - - qp = ibqp->private; - - qp->sq.max = param->limit.max_outstanding_send_request; - qp->rq.max = param->limit.max_outstanding_receive_request; - qp->sq.max_gs = param->limit.max_send_gather_element; - qp->rq.max_gs = param->limit.max_receive_scatter_element; - - err = mthca_alloc_sqp(to_mdev(pd->device), (struct mthca_pd *) pd, - (struct mthca_cq *) param->send_queue, - (struct mthca_cq *) param->receive_queue, - param->send_policy, param->receive_policy, - qp_type, port, ibqp->private); - if (err) { - kfree(ibqp->private); - return err; + kfree(qp); + return ERR_PTR(err); } - ibqp->qpn = qp_type == IB_SMI_QP ? 
0 : 1; + *qp_cap = init_attr->cap; + qp_cap->max_inline_data = 0; - return 0; + return (struct ib_qp *) qp; } -static int mthca_qp_destroy(struct ib_qp *qp) +static int mthca_destroy_qp(struct ib_qp *qp) { - mthca_free_qp(to_mdev(qp->device), qp->private); - kfree(qp->private); + mthca_free_qp(to_mdev(qp->device), (struct mthca_qp *) qp); + kfree(qp); return 0; } @@ -495,10 +485,10 @@ page_list[n++] = buffer_list[i].addr + ((u64) j << shift); access = - (acc & IB_MR_REMOTE_ATOMIC ? MTHCA_MPT_FLAG_ATOMIC : 0) | - (acc & IB_MR_REMOTE_WRITE ? MTHCA_MPT_FLAG_REMOTE_WRITE : 0) | - (acc & IB_MR_REMOTE_READ ? MTHCA_MPT_FLAG_REMOTE_READ : 0) | - (acc & IB_MR_LOCAL_WRITE ? MTHCA_MPT_FLAG_LOCAL_WRITE : 0) | + (acc & IB_ACCESS_REMOTE_ATOMIC ? MTHCA_MPT_FLAG_ATOMIC : 0) | + (acc & IB_ACCESS_REMOTE_WRITE ? MTHCA_MPT_FLAG_REMOTE_WRITE : 0) | + (acc & IB_ACCESS_REMOTE_READ ? MTHCA_MPT_FLAG_REMOTE_READ : 0) | + (acc & IB_ACCESS_LOCAL_WRITE ? MTHCA_MPT_FLAG_LOCAL_WRITE : 0) | MTHCA_MPT_FLAG_LOCAL_READ; mthca_dbg(to_mdev(pd->device), "Registering memory at %llx (iova %llx) " @@ -547,12 +537,11 @@ dev->ib_dev.dealloc_pd = mthca_dealloc_pd; dev->ib_dev.create_ah = mthca_ah_create; dev->ib_dev.destroy_ah = mthca_ah_destroy; - dev->ib_dev.qp_create = mthca_qp_create; - dev->ib_dev.special_qp_create = mthca_special_qp_create; - dev->ib_dev.qp_modify = mthca_modify_qp; - dev->ib_dev.qp_destroy = mthca_qp_destroy; - dev->ib_dev.send_post = mthca_post_send; - dev->ib_dev.receive_post = mthca_post_receive; + dev->ib_dev.create_qp = mthca_create_qp; + dev->ib_dev.modify_qp = mthca_modify_qp; + dev->ib_dev.destroy_qp = mthca_destroy_qp; + dev->ib_dev.post_send = mthca_post_send; + dev->ib_dev.post_recv = mthca_post_receive; dev->ib_dev.create_cq = mthca_create_cq; dev->ib_dev.destroy_cq = mthca_destroy_cq; dev->ib_dev.poll_cq = mthca_poll_cq; Index: src/linux-kernel/infiniband/hw/mthca/mthca_provider.h =================================================================== --- 
src/linux-kernel/infiniband/hw/mthca/mthca_provider.h (revision 607) +++ src/linux-kernel/infiniband/hw/mthca/mthca_provider.h (working copy) @@ -141,19 +141,16 @@ void *last; int max_gs; int wqe_shift; - enum ib_wq_signal_policy policy; + enum ib_sig_type policy; }; struct mthca_qp { + struct ib_qp ibqp; spinlock_t lock; atomic_t refcount; - struct ib_qp *ib_qp; - int qpn; + u32 qpn; int transport; - struct mthca_pd *pd; enum ib_qp_state state; - u32 cqn_send; - u32 cqn_recv; int is_direct; struct mthca_mr mr; @@ -172,7 +169,6 @@ struct mthca_sqp { struct mthca_qp qp; - int sqpn; int port; int pkey_index; u32 qkey; Index: src/linux-kernel/infiniband/hw/mthca/mthca_cmd.c =================================================================== --- src/linux-kernel/infiniband/hw/mthca/mthca_cmd.c (revision 576) +++ src/linux-kernel/infiniband/hw/mthca/mthca_cmd.c (working copy) @@ -1240,16 +1240,16 @@ u8 op_mod; switch (type) { - case IB_SMI_QP: + case IB_QPT_SMI: op_mod = 0; break; - case IB_GSI_QP: + case IB_QPT_GSI: op_mod = 1; break; - case IB_RAW_IPV6_QP: + case IB_QPT_RAW_IPV6: op_mod = 2; break; - case IB_RAW_ETHERTYPE_QP: + case IB_QPT_RAW_ETY: op_mod = 3; break; default: Index: src/linux-kernel/infiniband/hw/mthca/mthca_mcg.c =================================================================== --- src/linux-kernel/infiniband/hw/mthca/mthca_mcg.c (revision 621) +++ src/linux-kernel/infiniband/hw/mthca/mthca_mcg.c (working copy) @@ -174,7 +174,7 @@ for (i = 0; i < MTHCA_QP_PER_MGM; ++i) if (!(mgm->qp[i] & cpu_to_be32(1 << 31))) { - mgm->qp[i] = cpu_to_be32(ibqp->qpn | (1 << 31)); + mgm->qp[i] = cpu_to_be32(ibqp->qp_num | (1 << 31)); break; } @@ -259,14 +259,14 @@ } for (loc = -1, i = 0; i < MTHCA_QP_PER_MGM; ++i) { - if (mgm->qp[i] == cpu_to_be32(ibqp->qpn | (1 << 31))) + if (mgm->qp[i] == cpu_to_be32(ibqp->qp_num | (1 << 31))) loc = i; if (!(mgm->qp[i] & cpu_to_be32(1 << 31))) break; } if (loc == -1) { - mthca_err(dev, "QP %06x not found in MGM\n", ibqp->qpn); + 
mthca_err(dev, "QP %06x not found in MGM\n", ibqp->qp_num); err = -EINVAL; goto out; } Index: src/linux-kernel/infiniband/hw/mthca/mthca_qp.c =================================================================== --- src/linux-kernel/infiniband/hw/mthca/mthca_qp.c (revision 607) +++ src/linux-kernel/infiniband/hw/mthca/mthca_qp.c (working copy) @@ -279,7 +279,7 @@ event_record.device = &dev->ib_dev; event_record.event = event; - event_record.modifier.qp = qp->ib_qp; + event_record.modifier.qp = (struct ib_qp *) qp; ib_async_event_dispatch(&event_record); if (atomic_dec_and_test(&qp->refcount)) @@ -289,13 +289,13 @@ static int to_mthca_state(enum ib_qp_state ib_state) { switch (ib_state) { - case IB_QP_STATE_RESET: return MTHCA_QP_STATE_RST; - case IB_QP_STATE_INIT: return MTHCA_QP_STATE_INIT; - case IB_QP_STATE_RTR: return MTHCA_QP_STATE_RTR; - case IB_QP_STATE_RTS: return MTHCA_QP_STATE_RTS; - case IB_QP_STATE_SQD: return MTHCA_QP_STATE_SQD; - case IB_QP_STATE_SQE: return MTHCA_QP_STATE_SQE; - case IB_QP_STATE_ERROR: return MTHCA_QP_STATE_ERR; + case IB_QPS_RESET: return MTHCA_QP_STATE_RST; + case IB_QPS_INIT: return MTHCA_QP_STATE_INIT; + case IB_QPS_RTR: return MTHCA_QP_STATE_RTR; + case IB_QPS_RTS: return MTHCA_QP_STATE_RTS; + case IB_QPS_SQD: return MTHCA_QP_STATE_SQD; + case IB_QPS_SQE: return MTHCA_QP_STATE_SQE; + case IB_QPS_ERR: return MTHCA_QP_STATE_ERR; default: return -1; } } @@ -318,148 +318,140 @@ int trans; u32 req_param[NUM_TRANS]; u32 opt_param[NUM_TRANS]; -} state_table[IB_QP_STATE_ERROR + 1][IB_QP_STATE_ERROR + 1] = { - [IB_QP_STATE_RESET] = { - [IB_QP_STATE_RESET] = { .trans = MTHCA_TRANS_ANY2RST }, - [IB_QP_STATE_ERROR] = { .trans = MTHCA_TRANS_ANY2ERR }, - [IB_QP_STATE_INIT] = { +} state_table[IB_QPS_ERR + 1][IB_QPS_ERR + 1] = { + [IB_QPS_RESET] = { + [IB_QPS_RESET] = { .trans = MTHCA_TRANS_ANY2RST }, + [IB_QPS_ERR] = { .trans = MTHCA_TRANS_ANY2ERR }, + [IB_QPS_INIT] = { .trans = MTHCA_TRANS_RST2INIT, .req_param = { - [UD] = 
(IB_QP_ATTRIBUTE_PKEY_INDEX | - IB_QP_ATTRIBUTE_PORT | - IB_QP_ATTRIBUTE_QKEY), - [RC] = (IB_QP_ATTRIBUTE_PKEY_INDEX | - IB_QP_ATTRIBUTE_PORT | - IB_QP_ATTRIBUTE_RDMA_ATOMIC_ENABLE), - [MLX] = (IB_QP_ATTRIBUTE_PKEY_INDEX | - IB_QP_ATTRIBUTE_QKEY), + [UD] = (IB_QP_PKEY_INDEX | + IB_QP_PORT | + IB_QP_QKEY), + [RC] = (IB_QP_PKEY_INDEX | + IB_QP_PORT | + IB_QP_ACCESS_FLAGS), + [MLX] = (IB_QP_PKEY_INDEX | + IB_QP_QKEY), }, /* bug-for-bug compatibility with VAPI: */ .opt_param = { - [MLX] = IB_QP_ATTRIBUTE_PORT + [MLX] = IB_QP_PORT } }, }, - [IB_QP_STATE_INIT] = { - [IB_QP_STATE_RESET] = { .trans = MTHCA_TRANS_ANY2RST }, - [IB_QP_STATE_ERROR] = { .trans = MTHCA_TRANS_ANY2ERR }, - [IB_QP_STATE_INIT] = { + [IB_QPS_INIT] = { + [IB_QPS_RESET] = { .trans = MTHCA_TRANS_ANY2RST }, + [IB_QPS_ERR] = { .trans = MTHCA_TRANS_ANY2ERR }, + [IB_QPS_INIT] = { .trans = MTHCA_TRANS_INIT2INIT, .opt_param = { - [UD] = (IB_QP_ATTRIBUTE_PKEY_INDEX | - IB_QP_ATTRIBUTE_PORT | - IB_QP_ATTRIBUTE_QKEY), - [RC] = (IB_QP_ATTRIBUTE_PKEY_INDEX | - IB_QP_ATTRIBUTE_PORT | - IB_QP_ATTRIBUTE_RDMA_ATOMIC_ENABLE), - [MLX] = (IB_QP_ATTRIBUTE_PKEY_INDEX | - IB_QP_ATTRIBUTE_QKEY), + [UD] = (IB_QP_PKEY_INDEX | + IB_QP_PORT | + IB_QP_QKEY), + [RC] = (IB_QP_PKEY_INDEX | + IB_QP_PORT | + IB_QP_ACCESS_FLAGS), + [MLX] = (IB_QP_PKEY_INDEX | + IB_QP_QKEY), } }, - [IB_QP_STATE_RTR] = { + [IB_QPS_RTR] = { .trans = MTHCA_TRANS_INIT2RTR, .req_param = { - [RC] = (IB_QP_ATTRIBUTE_ADDRESS | - IB_QP_ATTRIBUTE_PATH_MTU | - IB_QP_ATTRIBUTE_DESTINATION_QPN | - IB_QP_ATTRIBUTE_RECEIVE_PSN | - IB_QP_ATTRIBUTE_RESPONDER_RESOURCES | - IB_QP_ATTRIBUTE_RNR_TIMEOUT), + [RC] = (IB_QP_AV | + IB_QP_PATH_MTU | + IB_QP_DEST_QPN | + IB_QP_RQ_PSN | + IB_QP_MAX_DEST_RD_ATOMIC | + IB_QP_MIN_RNR_TIMER), }, .opt_param = { - [UD] = (IB_QP_ATTRIBUTE_PKEY_INDEX | - IB_QP_ATTRIBUTE_QKEY), - [RC] = (IB_QP_ATTRIBUTE_ALT_ADDRESS | - IB_QP_ATTRIBUTE_ALT_PKEY_INDEX | - IB_QP_ATTRIBUTE_ALT_PORT | - IB_QP_ATTRIBUTE_ALT_LOCAL_ACK_TIMEOUT | - 
IB_QP_ATTRIBUTE_RDMA_ATOMIC_ENABLE | - IB_QP_ATTRIBUTE_PKEY_INDEX), - [MLX] = (IB_QP_ATTRIBUTE_PKEY_INDEX | - IB_QP_ATTRIBUTE_QKEY), + [UD] = (IB_QP_PKEY_INDEX | + IB_QP_QKEY), + [RC] = (IB_QP_ALT_PATH | + IB_QP_ACCESS_FLAGS | + IB_QP_PKEY_INDEX), + [MLX] = (IB_QP_PKEY_INDEX | + IB_QP_QKEY), } } }, - [IB_QP_STATE_RTR] = { - [IB_QP_STATE_RESET] = { .trans = MTHCA_TRANS_ANY2RST }, - [IB_QP_STATE_ERROR] = { .trans = MTHCA_TRANS_ANY2ERR }, - [IB_QP_STATE_RTS] = { + [IB_QPS_RTR] = { + [IB_QPS_RESET] = { .trans = MTHCA_TRANS_ANY2RST }, + [IB_QPS_ERR] = { .trans = MTHCA_TRANS_ANY2ERR }, + [IB_QPS_RTS] = { .trans = MTHCA_TRANS_RTR2RTS, .req_param = { - [UD] = IB_QP_ATTRIBUTE_SEND_PSN, - [RC] = (IB_QP_ATTRIBUTE_LOCAL_ACK_TIMEOUT | - IB_QP_ATTRIBUTE_RETRY_COUNT | - IB_QP_ATTRIBUTE_RNR_RETRY_COUNT | - IB_QP_ATTRIBUTE_SEND_PSN | - IB_QP_ATTRIBUTE_INITIATOR_DEPTH), - [MLX] = IB_QP_ATTRIBUTE_SEND_PSN, + [UD] = IB_QP_SQ_PSN, + [RC] = (IB_QP_TIMEOUT | + IB_QP_RETRY_CNT | + IB_QP_RNR_RETRY | + IB_QP_SQ_PSN | + IB_QP_MAX_QP_RD_ATOMIC), + [MLX] = IB_QP_SQ_PSN, }, .opt_param = { - [UD] = IB_QP_ATTRIBUTE_QKEY, - [RC] = (IB_QP_ATTRIBUTE_ALT_ADDRESS | - IB_QP_ATTRIBUTE_ALT_PKEY_INDEX | - IB_QP_ATTRIBUTE_ALT_PORT | - IB_QP_ATTRIBUTE_ALT_LOCAL_ACK_TIMEOUT | - IB_QP_ATTRIBUTE_RDMA_ATOMIC_ENABLE | - IB_QP_ATTRIBUTE_PKEY_INDEX | - IB_QP_ATTRIBUTE_RNR_TIMEOUT | - IB_QP_ATTRIBUTE_MIGRATION_STATE), - [MLX] = IB_QP_ATTRIBUTE_QKEY, + [UD] = IB_QP_QKEY, + [RC] = (IB_QP_ALT_PATH | + IB_QP_ACCESS_FLAGS | + IB_QP_PKEY_INDEX | + IB_QP_MIN_RNR_TIMER | + IB_QP_PATH_MIG_STATE), + [MLX] = IB_QP_QKEY, } } }, - [IB_QP_STATE_RTS] = { - [IB_QP_STATE_RESET] = { .trans = MTHCA_TRANS_ANY2RST }, - [IB_QP_STATE_ERROR] = { .trans = MTHCA_TRANS_ANY2ERR }, - [IB_QP_STATE_RTS] = { + [IB_QPS_RTS] = { + [IB_QPS_RESET] = { .trans = MTHCA_TRANS_ANY2RST }, + [IB_QPS_ERR] = { .trans = MTHCA_TRANS_ANY2ERR }, + [IB_QPS_RTS] = { .trans = MTHCA_TRANS_RTS2RTS, .opt_param = { - [UD] = IB_QP_ATTRIBUTE_QKEY, - [RC] = 
(IB_QP_ATTRIBUTE_RDMA_ATOMIC_ENABLE | - IB_QP_ATTRIBUTE_ALT_ADDRESS | - IB_QP_ATTRIBUTE_ALT_PKEY_INDEX | - IB_QP_ATTRIBUTE_ALT_PORT | - IB_QP_ATTRIBUTE_ALT_LOCAL_ACK_TIMEOUT | - IB_QP_ATTRIBUTE_MIGRATION_STATE | - IB_QP_ATTRIBUTE_RNR_TIMEOUT), - [MLX] = IB_QP_ATTRIBUTE_QKEY, + [UD] = IB_QP_QKEY, + [RC] = (IB_QP_ACCESS_FLAGS | + IB_QP_ALT_PATH | + IB_QP_PATH_MIG_STATE | + IB_QP_MIN_RNR_TIMER), + [MLX] = IB_QP_QKEY, } }, - [IB_QP_STATE_SQD] = { + [IB_QPS_SQD] = { .trans = MTHCA_TRANS_RTS2SQD, }, }, - [IB_QP_STATE_SQD] = { - [IB_QP_STATE_RESET] = { .trans = MTHCA_TRANS_ANY2RST }, - [IB_QP_STATE_ERROR] = { .trans = MTHCA_TRANS_ANY2ERR }, - [IB_QP_STATE_RTS] = { + [IB_QPS_SQD] = { + [IB_QPS_RESET] = { .trans = MTHCA_TRANS_ANY2RST }, + [IB_QPS_ERR] = { .trans = MTHCA_TRANS_ANY2ERR }, + [IB_QPS_RTS] = { .trans = MTHCA_TRANS_SQD2RTS, }, - [IB_QP_STATE_SQD] = { + [IB_QPS_SQD] = { .trans = MTHCA_TRANS_SQD2SQD, } }, - [IB_QP_STATE_SQE] = { - [IB_QP_STATE_RESET] = { .trans = MTHCA_TRANS_ANY2RST }, - [IB_QP_STATE_ERROR] = { .trans = MTHCA_TRANS_ANY2ERR }, - [IB_QP_STATE_RTS] = { + [IB_QPS_SQE] = { + [IB_QPS_RESET] = { .trans = MTHCA_TRANS_ANY2RST }, + [IB_QPS_ERR] = { .trans = MTHCA_TRANS_ANY2ERR }, + [IB_QPS_RTS] = { .trans = MTHCA_TRANS_SQERR2RTS, } }, - [IB_QP_STATE_ERROR] = { - [IB_QP_STATE_RESET] = { .trans = MTHCA_TRANS_ANY2RST }, - [IB_QP_STATE_ERROR] = { .trans = MTHCA_TRANS_ANY2ERR } + [IB_QPS_ERR] = { + [IB_QPS_RESET] = { .trans = MTHCA_TRANS_ANY2RST }, + [IB_QPS_ERR] = { .trans = MTHCA_TRANS_ANY2ERR } } }; -static void store_attrs(struct mthca_sqp *sqp, struct ib_qp_attribute *attr) +static void store_attrs(struct mthca_sqp *sqp, struct ib_qp_attr *attr, + int attr_mask) { - if (attr->valid_fields & IB_QP_ATTRIBUTE_PKEY_INDEX) + if (attr_mask & IB_QP_PKEY_INDEX) sqp->pkey_index = attr->pkey_index; - if (attr->valid_fields & IB_QP_ATTRIBUTE_QKEY) + if (attr_mask & IB_QP_QKEY) sqp->qkey = attr->qkey; - if (attr->valid_fields & IB_QP_ATTRIBUTE_SEND_PSN) - sqp->send_psn 
= attr->send_psn; + if (attr_mask & IB_QP_SQ_PSN) + sqp->send_psn = attr->sq_psn; } static void init_port(struct mthca_dev *dev, int port) @@ -484,10 +476,11 @@ mthca_warn(dev, "INIT_IB returned status %02x.\n", status); } -int mthca_modify_qp(struct ib_qp *ibqp, struct ib_qp_attribute *attr) +int mthca_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, + int attr_mask, struct ib_qp_cap *qp_cap) { struct mthca_dev *dev = to_mdev(ibqp->device); - struct mthca_qp *qp = ibqp->private; + struct mthca_qp *qp = (struct mthca_qp *) ibqp; enum ib_qp_state cur_state, new_state; void *mailbox = NULL; struct mthca_qp_param *qp_param; @@ -500,10 +493,10 @@ cur_state = qp->state; spin_unlock_irq(&qp->lock); - if (attr->valid_fields & IB_QP_ATTRIBUTE_STATE) { - if (attr->state <= 0 || attr->state > IB_QP_STATE_ERROR) + if (attr_mask & IB_QP_STATE) { + if (attr->qp_state <= 0 || attr->qp_state > IB_QPS_ERR) return -EINVAL; - new_state = attr->state; + new_state = attr->qp_state; } else new_state = cur_state; @@ -516,22 +509,21 @@ req_param = state_table[cur_state][new_state].req_param[qp->transport]; opt_param = state_table[cur_state][new_state].opt_param[qp->transport]; - if ((req_param & attr->valid_fields) != req_param) { + if ((req_param & attr_mask) != req_param) { mthca_dbg(dev, "QP transition " "%d->%d missing req attr 0x%08x\n", cur_state, new_state, - req_param & ~attr->valid_fields); + req_param & ~attr_mask); return -EINVAL; } - if (attr->valid_fields & ~(req_param | opt_param | - IB_QP_ATTRIBUTE_STATE)) { + if (attr_mask & ~(req_param | opt_param | IB_QP_STATE)) { mthca_dbg(dev, "QP transition (transport %d) " "%d->%d has extra attr 0x%08x\n", qp->transport, cur_state, new_state, - attr->valid_fields & ~(req_param | opt_param | - IB_QP_ATTRIBUTE_STATE)); + attr_mask & ~(req_param | opt_param | + IB_QP_STATE)); return -EINVAL; } @@ -545,18 +537,18 @@ qp_context->flags = cpu_to_be32((to_mthca_state(new_state) << 28) | (to_mthca_st(qp->transport) << 16)); 
qp_context->flags |= cpu_to_be32(MTHCA_QP_BIT_DE); - if (!(attr->valid_fields & IB_QP_ATTRIBUTE_MIGRATION_STATE)) + if (!(attr_mask & IB_QP_PATH_MIG_STATE)) qp_context->flags |= cpu_to_be32(MTHCA_QP_PM_MIGRATED << 11); else { qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_PM_STATE); - switch (attr->migration_state) { - case IB_MIGRATED: + switch (attr->path_mig_state) { + case IB_MIG_MIGRATED: qp_context->flags |= cpu_to_be32(MTHCA_QP_PM_MIGRATED << 11); break; - case IB_REARM: + case IB_MIG_REARM: qp_context->flags |= cpu_to_be32(MTHCA_QP_PM_REARM << 11); break; - case IB_ARMED: + case IB_MIG_ARMED: qp_context->flags |= cpu_to_be32(MTHCA_QP_PM_ARMED << 11); break; } @@ -565,66 +557,68 @@ if (qp->transport == MLX || qp->transport == UD) qp_context->mtu_msgmax = cpu_to_be32((IB_MTU_2048 << 29) | (11 << 24)); - else if (attr->valid_fields & IB_QP_ATTRIBUTE_PATH_MTU) { + else if (attr_mask & IB_QP_PATH_MTU) { qp_context->mtu_msgmax = cpu_to_be32((attr->path_mtu << 29) | (31 << 24)); } qp_context->usr_page = cpu_to_be32(MTHCA_KAR_PAGE); qp_context->local_qpn = cpu_to_be32(qp->qpn); - if (attr->valid_fields & IB_QP_ATTRIBUTE_DESTINATION_QPN) { - qp_context->remote_qpn = cpu_to_be32(attr->destination_qpn); + if (attr_mask & IB_QP_DEST_QPN) { + qp_context->remote_qpn = cpu_to_be32(attr->dest_qp_num); } if (qp->transport == MLX) qp_context->pri_path.port_pkey |= cpu_to_be32(((struct mthca_sqp *) qp)->port << 24); else { - if (attr->valid_fields & IB_QP_ATTRIBUTE_PORT) { + if (attr_mask & IB_QP_PORT) { qp_context->pri_path.port_pkey |= cpu_to_be32(attr->port << 24); qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_PORT_NUM); } } - if (attr->valid_fields & IB_QP_ATTRIBUTE_PKEY_INDEX) { + if (attr_mask & IB_QP_PKEY_INDEX) { qp_context->pri_path.port_pkey |= cpu_to_be32(attr->pkey_index); qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_PKEY_INDEX); } - if (attr->valid_fields & IB_QP_ATTRIBUTE_RNR_RETRY_COUNT) { - qp_context->pri_path.rnr_retry = 
attr->rnr_retry_count << 5; + if (attr_mask & IB_QP_RNR_RETRY) { + qp_context->pri_path.rnr_retry = attr->rnr_retry << 5; qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_RNR_RETRY); } - if (attr->valid_fields & IB_QP_ATTRIBUTE_ADDRESS) { - qp_context->pri_path.g_mylmc = attr->address.source_path_bits & 0x7f; - qp_context->pri_path.rlid = cpu_to_be16(attr->address.dlid); - qp_context->pri_path.static_rate = (!!attr->address.static_rate) << 3; - if (attr->address.use_grh) { + if (attr_mask & IB_QP_AV) { + qp_context->pri_path.g_mylmc = attr->ah_attr.src_path_bits & 0x7f; + qp_context->pri_path.rlid = cpu_to_be16(attr->ah_attr.dlid); + qp_context->pri_path.static_rate = (!!attr->ah_attr.static_rate) << 3; + if (attr->ah_attr.grh_flag) { qp_context->pri_path.g_mylmc |= 1 << 7; - qp_context->pri_path.mgid_index = attr->address.source_gid_index; - qp_context->pri_path.hop_limit = attr->address.hop_limit; + qp_context->pri_path.mgid_index = attr->ah_attr.grh.sgid_index; + qp_context->pri_path.hop_limit = attr->ah_attr.grh.hop_limit; qp_context->pri_path.sl_tclass_flowlabel = - cpu_to_be32((attr->address.service_level << 28) | - (attr->address.traffic_class << 20) | - (attr->address.flow_label)); + cpu_to_be32((attr->ah_attr.sl << 28) | + (attr->ah_attr.grh.traffic_class << 20) | + (attr->ah_attr.grh.flow_label)); memcpy(qp_context->pri_path.rgid, - attr->address.dgid, 16); + attr->ah_attr.grh.dgid.raw, 16); + } else { + qp_context->pri_path.sl_tclass_flowlabel = + cpu_to_be32(attr->ah_attr.sl << 28); } - qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_PRIMARY_ADDR_PATH); } - if (attr->valid_fields & IB_QP_ATTRIBUTE_LOCAL_ACK_TIMEOUT) { - qp_context->pri_path.ackto = attr->local_ack_timeout; + if (attr_mask & IB_QP_TIMEOUT) { + qp_context->pri_path.ackto = attr->timeout; qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_ACK_TIMEOUT); } /* XXX alt_path */ /* leave rdd as 0 */ - qp_context->pd = cpu_to_be32(qp->pd->pd_num); + qp_context->pd = 
cpu_to_be32(((struct mthca_pd *) ibqp->pd)->pd_num); /* leave wqe_base as 0 (we always create an MR based at 0 for WQs) */ qp_context->wqe_lkey = cpu_to_be32(qp->mr.ibmr.lkey); qp_context->params1 = cpu_to_be32((MTHCA_ACK_REQ_FREQ << 28) | @@ -632,34 +626,34 @@ MTHCA_QP_BIT_SRE | MTHCA_QP_BIT_SWE | MTHCA_QP_BIT_SAE); - if (qp->sq.policy == IB_WQ_SIGNAL_ALL) + if (qp->sq.policy == IB_SIGNAL_ALL_WR) qp_context->params1 |= cpu_to_be32(MTHCA_QP_BIT_SSC); - if (attr->valid_fields & IB_QP_ATTRIBUTE_RETRY_COUNT) { - qp_context->params1 |= cpu_to_be32(attr->retry_count << 16); + if (attr_mask & IB_QP_RETRY_CNT) { + qp_context->params1 |= cpu_to_be32(attr->retry_cnt << 16); qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_RETRY_COUNT); } /* XXX initiator resources */ - if (attr->valid_fields & IB_QP_ATTRIBUTE_SEND_PSN) - qp_context->next_send_psn = cpu_to_be32(attr->send_psn); - qp_context->cqn_snd = cpu_to_be32(qp->cqn_send); + if (attr_mask & IB_QP_SQ_PSN) + qp_context->next_send_psn = cpu_to_be32(attr->sq_psn); + qp_context->cqn_snd = cpu_to_be32(((struct mthca_cq *) ibqp->send_cq)->cqn); /* XXX RDMA/atomic enable, responder resources */ - if (qp->rq.policy == IB_WQ_SIGNAL_ALL) + if (qp->rq.policy == IB_SIGNAL_ALL_WR) qp_context->params2 |= cpu_to_be32(MTHCA_QP_BIT_RSC); - if (attr->valid_fields & IB_QP_ATTRIBUTE_RNR_TIMEOUT) { - qp_context->rnr_nextrecvpsn |= cpu_to_be32(attr->rnr_timeout << 24); + if (attr_mask & IB_QP_MIN_RNR_TIMER) { + qp_context->rnr_nextrecvpsn |= cpu_to_be32(attr->min_rnr_timer << 24); qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_RNR_TIMEOUT); } - if (attr->valid_fields & IB_QP_ATTRIBUTE_RECEIVE_PSN) - qp_context->rnr_nextrecvpsn |= cpu_to_be32(attr->receive_psn); + if (attr_mask & IB_QP_RQ_PSN) + qp_context->rnr_nextrecvpsn |= cpu_to_be32(attr->rq_psn); /* XXX ra_buff_indx */ - qp_context->cqn_rcv = cpu_to_be32(qp->cqn_recv); + qp_context->cqn_rcv = cpu_to_be32(((struct mthca_cq *) ibqp->recv_cq)->cqn); - if (attr->valid_fields 
& IB_QP_ATTRIBUTE_QKEY) { + if (attr_mask & IB_QP_QKEY) { qp_context->qkey = cpu_to_be32(attr->qkey); qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_Q_KEY); } @@ -682,21 +676,21 @@ kfree(mailbox); if (is_sqp(dev, qp)) - store_attrs((struct mthca_sqp *) qp, attr); + store_attrs((struct mthca_sqp *) qp, attr, attr_mask); /* * If we are moving QP0 to RTR, bring the IB link up; if we * are moving QP0 to RESET or ERROR, bring the link back down. */ if (is_qp0(dev, qp)) { - if (cur_state != IB_QP_STATE_RTR && - new_state == IB_QP_STATE_RTR) + if (cur_state != IB_QPS_RTR && + new_state == IB_QPS_RTR) init_port(dev, ((struct mthca_sqp *) qp)->port); - if (cur_state != IB_QP_STATE_RESET && - cur_state != IB_QP_STATE_ERROR && - (new_state == IB_QP_STATE_RESET || - new_state == IB_QP_STATE_ERROR)) + if (cur_state != IB_QPS_RESET && + cur_state != IB_QPS_ERR && + (new_state == IB_QPS_RESET || + new_state == IB_QPS_ERR)) mthca_CLOSE_IB(dev, ((struct mthca_sqp *) qp)->port, &status); } @@ -711,6 +705,7 @@ * queue) */ static int mthca_alloc_wqe_buf(struct mthca_dev *dev, + struct mthca_pd *pd, struct mthca_qp *qp) { int size; @@ -809,9 +804,8 @@ } } - err = mthca_mr_alloc_phys(dev, qp->pd->pd_num, - dma_list, shift, npages, - 0, size, + err = mthca_mr_alloc_phys(dev, pd->pd_num, dma_list, shift, + npages, 0, size, MTHCA_MPT_FLAG_LOCAL_WRITE | MTHCA_MPT_FLAG_LOCAL_READ, &qp->mr); @@ -846,18 +840,15 @@ struct mthca_pd *pd, struct mthca_cq *send_cq, struct mthca_cq *recv_cq, - enum ib_wq_signal_policy send_policy, - enum ib_wq_signal_policy recv_policy, + enum ib_sig_type send_policy, + enum ib_sig_type recv_policy, struct mthca_qp *qp) { int err; spin_lock_init(&qp->lock); atomic_set(&qp->refcount, 1); - qp->pd = pd; - qp->cqn_send = send_cq->cqn; - qp->cqn_recv = recv_cq->cqn; - qp->state = IB_QP_STATE_RESET; + qp->state = IB_QPS_RESET; qp->sq.policy = send_policy; qp->rq.policy = recv_policy; qp->rq.cur = 0; @@ -869,7 +860,7 @@ qp->rq.last = NULL; qp->sq.last = NULL; - 
err = mthca_alloc_wqe_buf(dev, qp); + err = mthca_alloc_wqe_buf(dev, pd, qp); return err; } @@ -877,18 +868,17 @@ struct mthca_pd *pd, struct mthca_cq *send_cq, struct mthca_cq *recv_cq, - enum ib_transport transport, - enum ib_wq_signal_policy send_policy, - enum ib_wq_signal_policy recv_policy, + enum ib_qp_type type, + enum ib_sig_type send_policy, + enum ib_sig_type recv_policy, struct mthca_qp *qp) { int err; - switch (transport) { - case IB_TRANSPORT_RC: qp->transport = RC; break; - case IB_TRANSPORT_UC: qp->transport = UC; break; - case IB_TRANSPORT_RD: qp->transport = RD; break; - case IB_TRANSPORT_UD: qp->transport = UD; break; + switch (type) { + case IB_QPT_RC: qp->transport = RC; break; + case IB_QPT_UC: qp->transport = UC; break; + case IB_QPT_UD: qp->transport = UD; break; default: return -EINVAL; } @@ -915,15 +905,14 @@ struct mthca_pd *pd, struct mthca_cq *send_cq, struct mthca_cq *recv_cq, - enum ib_wq_signal_policy send_policy, - enum ib_wq_signal_policy recv_policy, - enum ib_special_qp_type type, + enum ib_sig_type send_policy, + enum ib_sig_type recv_policy, + int qpn, int port, struct mthca_sqp *sqp) { int err = 0; - u32 mqpn = (type == IB_SMI_QP ? 0 : 2) - + dev->qp_table.sqp_start + port - 1; + u32 mqpn = qpn * 2 + dev->qp_table.sqp_start + port - 1; sqp->header_buf_size = sqp->qp.sq.max * MTHCA_UD_HEADER_SIZE; sqp->header_buf = pci_alloc_consistent(dev->pdev, sqp->header_buf_size, @@ -941,7 +930,6 @@ if (err) goto err_out; - sqp->sqpn = type == IB_SMI_QP ? 
0 : 1; sqp->port = port; sqp->qp.qpn = mqpn; sqp->qp.transport = MLX; @@ -985,9 +973,10 @@ mthca_MODIFY_QP(dev, MTHCA_TRANS_ANY2RST, qp->qpn, 0, NULL, 0, &status); - mthca_cq_clean(dev, qp->cqn_send, qp->qpn); - if (qp->cqn_recv != qp->cqn_send) - mthca_cq_clean(dev, qp->cqn_recv, qp->qpn); + mthca_cq_clean(dev, ((struct mthca_cq *) qp->ibqp.send_cq)->cqn, qp->qpn); + if (qp->ibqp.send_cq != qp->ibqp.recv_cq) + mthca_cq_clean(dev, ((struct mthca_cq *) qp->ibqp.recv_cq)->cqn, + qp->qpn); mthca_free_mr(dev, &qp->mr); @@ -1010,7 +999,7 @@ kfree(qp->wrid); if (is_sqp(dev, qp)) { - atomic_dec(&qp->pd->sqp_count); + atomic_dec(&((struct mthca_pd *) &qp->ibqp.pd)->sqp_count); pci_free_consistent(dev->pdev, ((struct mthca_sqp *) qp)->header_buf_size, ((struct mthca_sqp *) qp)->header_buf, @@ -1022,19 +1011,19 @@ /* Create UD header for an MLX send and build a data segment for it */ static int build_mlx_header(struct mthca_dev *dev, struct mthca_sqp *sqp, - int ind, struct ib_send_param *param, + int ind, struct ib_send_wr *wr, struct mthca_mlx_seg *mlx, struct mthca_data_seg *data) { int header_size; int err; - err = mthca_read_ah(dev, (struct mthca_ah *) param->dest_address, + err = mthca_read_ah(dev, (struct mthca_ah *) wr->wr.ud.ah, &sqp->ud_header); if (err) return err; mlx->flags &= ~cpu_to_be32(MTHCA_NEXT_SOLICIT | 1); - mlx->flags |= cpu_to_be32((!sqp->sqpn ? MTHCA_MLX_VL15 : 0) | + mlx->flags |= cpu_to_be32((!sqp->qp.ibqp.qp_num ? MTHCA_MLX_VL15 : 0) | (sqp->ud_header.lrh.destination_lid == 0xffff ? 
MTHCA_MLX_SLR : 0) | (sqp->ud_header.lrh.service_level << 8)); @@ -1045,102 +1034,92 @@ sqp->ud_header.grh_present, &sqp->ud_header); - switch (param->op) { - case IB_OP_SEND: + switch (wr->opcode) { + case IB_WR_SEND: sqp->ud_header.bth.opcode = IB_OPCODE_UD_SEND_ONLY; sqp->ud_header.immediate_present = 0; break; - case IB_OP_SEND_IMMEDIATE: + case IB_WR_SEND_WITH_IMM: sqp->ud_header.bth.opcode = IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE; sqp->ud_header.immediate_present = 1; - sqp->ud_header.immediate_data = param->immediate_data; + sqp->ud_header.immediate_data = wr->imm_data; break; default: return -EINVAL; } - sqp->ud_header.lrh.virtual_lane = !sqp->sqpn ? 15 : 0; + sqp->ud_header.lrh.virtual_lane = !sqp->qp.ibqp.qp_num ? 15 : 0; if (sqp->ud_header.lrh.destination_lid == 0xffff) sqp->ud_header.lrh.source_lid = 0xffff; - sqp->ud_header.bth.solicited_event = param->solicited_event; - if (!sqp->sqpn) + sqp->ud_header.bth.solicited_event = !!(wr->send_flags & IB_SEND_SOLICITED); + if (!sqp->qp.ibqp.qp_num) ib_cached_pkey_get(&dev->ib_dev, sqp->port, sqp->pkey_index, &sqp->ud_header.bth.pkey); else ib_cached_pkey_get(&dev->ib_dev, sqp->port, - param->pkey_index, + wr->wr.ud.pkey_index, &sqp->ud_header.bth.pkey); - sqp->ud_header.bth.destination_qpn = param->dest_qpn; + sqp->ud_header.bth.destination_qpn = wr->wr.ud.remote_qpn; sqp->ud_header.bth.psn = (sqp->send_psn++) & ((1 << 24) - 1); - sqp->ud_header.deth.qkey = param->dest_qkey & 0x80000000 ? - sqp->qkey : param->dest_qkey; - sqp->ud_header.deth.source_qpn = sqp->sqpn; + sqp->ud_header.deth.qkey = wr->wr.ud.remote_qkey & 0x80000000 ? 
+ sqp->qkey : wr->wr.ud.remote_qkey; + sqp->ud_header.deth.source_qpn = sqp->qp.ibqp.qp_num; header_size = ib_ud_header_pack(&sqp->ud_header, sqp->header_buf + ind * MTHCA_UD_HEADER_SIZE); data->byte_count = cpu_to_be32(header_size); - data->lkey = cpu_to_be32(sqp->qp.pd->ntmr.ibmr.lkey); + data->lkey = cpu_to_be32(((struct mthca_pd *) sqp->qp.ibqp.pd)->ntmr.ibmr.lkey); data->addr = cpu_to_be64(sqp->header_dma + ind * MTHCA_UD_HEADER_SIZE); return 0; } -int mthca_post_send(struct ib_qp *ibqp, struct ib_send_param *param, - int nreq) +int mthca_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, + struct ib_send_wr **bad_wr) { struct mthca_dev *dev = to_mdev(ibqp->device); - struct mthca_qp *qp = ibqp->private; + struct mthca_qp *qp = (struct mthca_qp *) ibqp; + void *wqe; + void *prev_wqe; unsigned long flags; int err = 0; - int i, j; + int nreq; + int i; int size; int size0 = 0; - u8 op0 = 0; u32 f0 = 0; int ind; - void *wqe; - void *prev_wqe; + u8 op0 = 0; static const u8 opcode[] = { - [IB_OP_RECEIVE] = MTHCA_OPCODE_INVALID, - [IB_OP_SEND] = MTHCA_OPCODE_SEND, - [IB_OP_SEND_IMMEDIATE] = MTHCA_OPCODE_SEND_IMM, - [IB_OP_RDMA_WRITE] = MTHCA_OPCODE_RDMA_WRITE, - [IB_OP_RDMA_WRITE_IMMEDIATE] = MTHCA_OPCODE_RDMA_WRITE_IMM, - [IB_OP_RDMA_READ] = MTHCA_OPCODE_RDMA_READ, - [IB_OP_COMPARE_SWAP] = MTHCA_OPCODE_ATOMIC_CS, - [IB_OP_FETCH_ADD] = MTHCA_OPCODE_ATOMIC_FA, - [IB_OP_MEMORY_WINDOW_BIND] = MTHCA_OPCODE_BIND_MW + [IB_WR_SEND] = MTHCA_OPCODE_SEND, + [IB_WR_SEND_WITH_IMM] = MTHCA_OPCODE_SEND_IMM, + [IB_WR_RDMA_WRITE] = MTHCA_OPCODE_RDMA_WRITE, + [IB_WR_RDMA_WRITE_WITH_IMM] = MTHCA_OPCODE_RDMA_WRITE_IMM, + [IB_WR_RDMA_READ] = MTHCA_OPCODE_RDMA_READ, + [IB_WR_ATOMIC_CMP_AND_SWP] = MTHCA_OPCODE_ATOMIC_CS, + [IB_WR_ATOMIC_FETCH_AND_ADD] = MTHCA_OPCODE_ATOMIC_FA, }; - if (nreq <= 0) - return -EINVAL; - spin_lock_irqsave(&qp->lock, flags); /* XXX check that state is OK to post send */ - if (qp->sq.cur + nreq > qp->sq.max) { - mthca_err(dev, "SQ full (%d posted, %d max, %d 
nreq)\n", - qp->sq.cur, qp->sq.max, nreq); - err = -EINVAL; - goto out; - } - ind = qp->sq.next; - /* - * XXX our semantics are wrong according to the verbs - * extensions spec: an immediate error with one work request - * should only cause that and subsequent requests not to be - * posted, rather than all of the requests to be thrown out. - */ + for (nreq = 0; wr; ++nreq, wr = wr->next) { + if (qp->sq.cur + nreq >= qp->sq.max) { + mthca_err(dev, "SQ full (%d posted, %d max, %d nreq)\n", + qp->sq.cur, qp->sq.max, nreq); + err = -ENOMEM; + *bad_wr = wr; + goto out; + } - for (i = 0; i < nreq; ++i) { wqe = get_send_wqe(qp, ind); prev_wqe = qp->sq.last; qp->sq.last = wqe; @@ -1148,53 +1127,58 @@ ((struct mthca_next_seg *) wqe)->nda_op = 0; ((struct mthca_next_seg *) wqe)->ee_nds = 0; ((struct mthca_next_seg *) wqe)->flags = - cpu_to_be32((param->signaled ? MTHCA_NEXT_CQ_UPDATE : 0) | - (param->solicited_event ? MTHCA_NEXT_SOLICIT : 0) | - 1); - if (param[i].op == IB_OP_SEND_IMMEDIATE || - param[i].op == IB_OP_RDMA_WRITE_IMMEDIATE) + ((wr->send_flags & IB_SEND_SIGNALED) ? + cpu_to_be32(MTHCA_NEXT_CQ_UPDATE) : 0) | + ((wr->send_flags & IB_SEND_SOLICITED) ? 
+ cpu_to_be32(MTHCA_NEXT_SOLICIT) : 0) | + cpu_to_be32(1); + if (wr->opcode == IB_WR_SEND_WITH_IMM || + wr->opcode == IB_WR_RDMA_WRITE_WITH_IMM) ((struct mthca_next_seg *) wqe)->flags = - cpu_to_be32(param->immediate_data); + cpu_to_be32(wr->imm_data); wqe += sizeof (struct mthca_next_seg); size = sizeof (struct mthca_next_seg) / 16; if (qp->transport == UD) { ((struct mthca_ud_seg *) wqe)->lkey = - cpu_to_be32(((struct mthca_ah *) param->dest_address)->key); + cpu_to_be32(((struct mthca_ah *) wr->wr.ud.ah)->key); ((struct mthca_ud_seg *) wqe)->av_addr = - cpu_to_be64(((struct mthca_ah *) param->dest_address)->avdma); + cpu_to_be64(((struct mthca_ah *) wr->wr.ud.ah)->avdma); ((struct mthca_ud_seg *) wqe)->dqpn = - cpu_to_be32(param->dest_qpn); + cpu_to_be32(wr->wr.ud.remote_qpn); ((struct mthca_ud_seg *) wqe)->qkey = - cpu_to_be32(param->dest_qkey); + cpu_to_be32(wr->wr.ud.remote_qkey); wqe += sizeof (struct mthca_ud_seg); size += sizeof (struct mthca_ud_seg) / 16; } else if (qp->transport == MLX) { err = build_mlx_header(dev, (struct mthca_sqp *) qp, - ind, param + i, + ind, wr, wqe - sizeof (struct mthca_next_seg), wqe); - if (err) + if (err) { + *bad_wr = wr; goto out; + } wqe += sizeof (struct mthca_data_seg); size += sizeof (struct mthca_data_seg) / 16; } - if (param[i].num_gather_entries > qp->sq.max_gs) { + if (wr->num_sge > qp->sq.max_gs) { mthca_err(dev, "too many gathers\n"); err = -EINVAL; + *bad_wr = wr; goto out; } - for (j = 0; j < param[i].num_gather_entries; ++j) { + for (i = 0; i < wr->num_sge; ++i) { ((struct mthca_data_seg *) wqe)->byte_count = - cpu_to_be32(param[i].gather_list[j].length); + cpu_to_be32(wr->sg_list[i].length); ((struct mthca_data_seg *) wqe)->lkey = - cpu_to_be32(param[i].gather_list[j].key); + cpu_to_be32(wr->sg_list[i].lkey); ((struct mthca_data_seg *) wqe)->addr = - cpu_to_be64(param[i].gather_list[j].address); + cpu_to_be64(wr->sg_list[i].addr); wqe += sizeof (struct mthca_data_seg); size += sizeof (struct mthca_data_seg) / 
16; } @@ -1208,12 +1192,12 @@ size += sizeof (struct mthca_data_seg) / 16; } - qp->wrid[ind + qp->rq.max] = param[i].work_request_id; + qp->wrid[ind + qp->rq.max] = wr->wr_id; - if (param[i].op >= ARRAY_SIZE(opcode) || - opcode[param[i].op] == MTHCA_OPCODE_INVALID) { + if (wr->opcode >= ARRAY_SIZE(opcode)) { mthca_err(dev, "opcode invalid\n"); err = -EINVAL; + *bad_wr = wr; goto out; } @@ -1221,15 +1205,15 @@ ((struct mthca_next_seg *) prev_wqe)->nda_op = cpu_to_be32(((ind << qp->sq.wqe_shift) + qp->send_wqe_offset) | - opcode[param[i].op]); + opcode[wr->opcode]); smp_wmb(); ((struct mthca_next_seg *) prev_wqe)->ee_nds = - cpu_to_be32((i ? 0 : MTHCA_NEXT_DBD) | size); + cpu_to_be32((size0 ? 0 : MTHCA_NEXT_DBD) | size); } - if (!i) { + if (!size0) { size0 = size; - op0 = opcode[param[i].op]; + op0 = opcode[wr->opcode]; } ++ind; @@ -1237,7 +1221,8 @@ ind -= qp->sq.max; } - { +out: + if (nreq) { u32 doorbell[2]; doorbell[0] = cpu_to_be32(((qp->sq.next << qp->sq.wqe_shift) + @@ -1254,48 +1239,39 @@ qp->sq.cur += nreq; qp->sq.next = ind; - out: spin_unlock_irqrestore(&qp->lock, flags); return err; } -int mthca_post_receive(struct ib_qp *ibqp, struct ib_receive_param *param, - int nreq) +int mthca_post_receive(struct ib_qp *ibqp, struct ib_recv_wr *wr, + struct ib_recv_wr **bad_wr) { struct mthca_dev *dev = to_mdev(ibqp->device); - struct mthca_qp *qp = ibqp->private; + struct mthca_qp *qp = (struct mthca_qp *) ibqp; unsigned long flags; int err = 0; - int i, j; + int nreq; + int i; int size; int size0 = 0; int ind; void *wqe; void *prev_wqe; - if (nreq <= 0) - return -EINVAL; - spin_lock_irqsave(&qp->lock, flags); /* XXX check that state is OK to post receive */ - if (qp->rq.cur + nreq > qp->rq.max) { - mthca_err(dev, "RQ %06x full\n", qp->qpn); - err = -EINVAL; - goto out; - } - ind = qp->rq.next; - /* - * XXX our semantics are wrong according to the verbs - * extensions spec: an immediate error with one work request - * should only cause that and subsequent requests 
not to be - * posted, rather than all of the requests to be thrown out. - */ + for (nreq = 0; wr; ++nreq, wr = wr->next) { + if (qp->rq.cur + nreq >= qp->rq.max) { + mthca_err(dev, "RQ %06x full\n", qp->qpn); + err = -ENOMEM; + *bad_wr = wr; + goto out; + } - for (i = 0; i < nreq; ++i) { wqe = get_recv_wqe(qp, ind); prev_wqe = qp->rq.last; qp->rq.last = wqe; @@ -1304,28 +1280,30 @@ ((struct mthca_next_seg *) wqe)->ee_nds = cpu_to_be32(MTHCA_NEXT_DBD); ((struct mthca_next_seg *) wqe)->flags = - cpu_to_be32(param->signaled ? MTHCA_NEXT_CQ_UPDATE : 0); + (wr->recv_flags & IB_RECV_SIGNALED) ? + cpu_to_be32(MTHCA_NEXT_CQ_UPDATE) : 0; wqe += sizeof (struct mthca_next_seg); size = sizeof (struct mthca_next_seg) / 16; - if (param[i].num_scatter_entries > qp->rq.max_gs) { + if (wr->num_sge > qp->rq.max_gs) { err = -EINVAL; + *bad_wr = wr; goto out; } - for (j = 0; j < param[i].num_scatter_entries; ++j) { + for (i = 0; i < wr->num_sge; ++i) { ((struct mthca_data_seg *) wqe)->byte_count = - cpu_to_be32(param[i].scatter_list[j].length); + cpu_to_be32(wr->sg_list[i].length); ((struct mthca_data_seg *) wqe)->lkey = - cpu_to_be32(param[i].scatter_list[j].key); + cpu_to_be32(wr->sg_list[i].lkey); ((struct mthca_data_seg *) wqe)->addr = - cpu_to_be64(param[i].scatter_list[j].address); + cpu_to_be64(wr->sg_list[i].addr); wqe += sizeof (struct mthca_data_seg); size += sizeof (struct mthca_data_seg) / 16; } - qp->wrid[ind] = param[i].work_request_id; + qp->wrid[ind] = wr->wr_id; if (prev_wqe) { ((struct mthca_next_seg *) prev_wqe)->nda_op = @@ -1335,7 +1313,7 @@ cpu_to_be32(MTHCA_NEXT_DBD | size); } - if (!i) + if (!size0) size0 = size; ++ind; @@ -1343,7 +1321,8 @@ ind -= qp->rq.max; } - { +out: + if (nreq) { u32 doorbell[2]; doorbell[0] = cpu_to_be32((qp->rq.next << qp->rq.wqe_shift) | size0); @@ -1359,7 +1338,6 @@ qp->rq.cur += nreq; qp->rq.next = ind; - out: spin_unlock_irqrestore(&qp->lock, flags); return err; } @@ -1413,7 +1391,7 @@ } for (i = 0; i < 2; ++i) { - err = 
mthca_CONF_SPECIAL_QP(dev, i, + err = mthca_CONF_SPECIAL_QP(dev, i ? IB_QPT_GSI : IB_QPT_SMI, dev->qp_table.sqp_start + i * 2, &status); if (err) From roland at topspin.com Fri Aug 13 10:49:25 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 13 Aug 2004 10:49:25 -0700 Subject: [openib-general] [PATCH][3/4] MAD, CM, SA/DM QP API In-Reply-To: <10924193652379@topspin.com> Message-ID: <10924193653247@topspin.com> Index: src/linux-kernel/infiniband/core/dm_client_svc_entries.c =================================================================== --- src/linux-kernel/infiniband/core/dm_client_svc_entries.c (revision 576) +++ src/linux-kernel/infiniband/core/dm_client_svc_entries.c (working copy) @@ -181,7 +181,7 @@ attrib_mod = TS_IB_DM_SE_SET_BEG_ENTRY_ID(attrib_mod, begin_svc_entry_id); attrib_mod = TS_IB_DM_SE_SET_END_ENTRY_ID(attrib_mod, end_svc_entry_id); - ib_dm_client_mad_init(&mad, device, port, dst_port_lid, IB_GSI_QP, + ib_dm_client_mad_init(&mad, device, port, dst_port_lid, 1, TS_IB_DM_METHOD_GET, TS_IB_DM_ATTRIBUTE_SVC_ENTRIES, attrib_mod); Index: src/linux-kernel/infiniband/core/cm_common.c =================================================================== --- src/linux-kernel/infiniband/core/cm_common.c (revision 576) +++ src/linux-kernel/infiniband/core/cm_common.c (working copy) @@ -166,7 +166,8 @@ void ib_cm_qp_to_error(struct ib_qp *qp) { - struct ib_qp_attribute *qp_attr; + struct ib_qp_attr *qp_attr; + int attr_mask; qp_attr = kmalloc(sizeof *qp_attr, GFP_KERNEL); if (!qp_attr) { @@ -175,10 +176,10 @@ return; } - qp_attr->state = IB_QP_STATE_ERROR; - qp_attr->valid_fields = IB_QP_ATTRIBUTE_STATE; + qp_attr->qp_state = IB_QPS_ERR; + attr_mask = IB_QP_STATE; - if (ib_cm_qp_modify(qp, qp_attr)) + if (ib_cm_qp_modify(qp, qp_attr, attr_mask)) TS_REPORT_WARN(MOD_IB_CM, "ib_qp_modify to error failed"); @@ -352,7 +353,7 @@ connection->mad.device = connection->local_cm_device; connection->mad.port = connection->local_cm_port; 
connection->mad.pkey_index = connection->local_cm_pkey_index; - connection->mad.sqpn = IB_GSI_QP; + connection->mad.sqpn = 1; connection->mad.dlid = connection->remote_cm_lid; connection->mad.dqpn = connection->remote_cm_qpn; @@ -414,7 +415,7 @@ packet->device = local_cm_device; packet->port = local_cm_port; packet->pkey_index = pkey_index; - packet->sqpn = IB_GSI_QP; + packet->sqpn = 1; packet->dlid = remote_cm_lid; packet->dqpn = remote_cm_qpn; packet->has_grh = 0; @@ -488,7 +489,7 @@ connection->mad.device = connection->local_cm_device; connection->mad.port = connection->local_cm_port; connection->mad.pkey_index = connection->local_cm_pkey_index; - connection->mad.sqpn = IB_GSI_QP; + connection->mad.sqpn = 1; connection->mad.dlid = connection->remote_cm_lid; connection->mad.dqpn = connection->remote_cm_qpn; @@ -711,7 +712,7 @@ drep->device = packet->device; drep->port = packet->port; drep->pkey_index = packet->pkey_index; - drep->sqpn = IB_GSI_QP; + drep->sqpn = 1; drep->dlid = packet->slid; drep->dqpn = packet->sqpn; drep->has_grh = 0; Index: src/linux-kernel/infiniband/core/cm_path_migration.c =================================================================== --- src/linux-kernel/infiniband/core/cm_path_migration.c (revision 576) +++ src/linux-kernel/infiniband/core/cm_path_migration.c (working copy) @@ -41,7 +41,8 @@ static int ib_cm_alt_path_load(struct ib_cm_connection *connection) { - struct ib_qp_attribute *qp_attr; + struct ib_qp_attr *qp_attr; + int attr_mask; int result; qp_attr = kmalloc(sizeof *qp_attr, GFP_KERNEL); @@ -51,15 +52,16 @@ memset(qp_attr, 0, sizeof *qp_attr); /* XXX need to include CA ACK delay */ - qp_attr->alt_local_ack_timeout = min(31, connection->alternate_path.packet_life + 1); - qp_attr->alt_address.service_level = connection->alternate_path.sl; - qp_attr->alt_address.dlid = connection->alternate_path.dlid; - qp_attr->alt_address.source_path_bits = connection->alternate_path.slid & 0x7f; - qp_attr->alt_address.static_rate = 0; - 
qp_attr->alt_address.use_grh = 0; - qp_attr->migration_state = IB_REARM; + qp_attr->alt_timeout = min(31, connection->alternate_path.packet_life + 1); + qp_attr->ah_attr.sl = connection->alternate_path.sl; + qp_attr->ah_attr.dlid = connection->alternate_path.dlid; + qp_attr->ah_attr.src_path_bits = connection->alternate_path.slid & 0x7f; + qp_attr->ah_attr.static_rate = 0; + qp_attr->ah_attr.grh_flag = 0; + qp_attr->path_mig_state = IB_MIG_REARM; - if (ib_cached_gid_find(connection->alternate_path.sgid, NULL, &qp_attr->alt_port, NULL)) { + if (ib_cached_gid_find(connection->alternate_path.sgid, NULL, + &qp_attr->alt_port, NULL)) { result = -EINVAL; goto out; } @@ -75,18 +77,13 @@ TS_TRACE(MOD_IB_CM, T_VERY_VERBOSE, TRACE_IB_CM_GEN, "Loading alternate path: port %d, timeout %d, 0x%04x -> 0x%04x", qp_attr->alt_port, - qp_attr->alt_local_ack_timeout, + qp_attr->alt_timeout, connection->alternate_path.slid, - qp_attr->alt_address.dlid); + qp_attr->ah_attr.dlid); - qp_attr->valid_fields = - IB_QP_ATTRIBUTE_ALT_PORT | - IB_QP_ATTRIBUTE_ALT_ADDRESS | - IB_QP_ATTRIBUTE_ALT_PKEY_INDEX | - IB_QP_ATTRIBUTE_MIGRATION_STATE; + attr_mask = (IB_QP_ALT_PATH | IB_QP_PATH_MIG_STATE); - result = ib_cm_qp_modify(connection->local_qp, - qp_attr); + result = ib_cm_qp_modify(connection->local_qp, qp_attr, attr_mask); if (result) { TS_REPORT_WARN(MOD_IB_CM, @@ -154,7 +151,7 @@ connection->mad.device = connection->local_cm_device; connection->mad.port = connection->local_cm_port; connection->mad.pkey_index = connection->local_cm_pkey_index; - connection->mad.sqpn = IB_GSI_QP; + connection->mad.sqpn = 1; connection->mad.dlid = connection->remote_cm_lid; connection->mad.dqpn = connection->remote_cm_qpn; @@ -291,7 +288,7 @@ connection->mad.device = connection->local_cm_device; connection->mad.port = connection->local_cm_port; connection->mad.pkey_index = connection->local_cm_pkey_index; - connection->mad.sqpn = IB_GSI_QP; + connection->mad.sqpn = 1; connection->mad.dlid = 
connection->remote_cm_lid; connection->mad.dqpn = connection->remote_cm_qpn; Index: src/linux-kernel/infiniband/core/cm_connection_table.c =================================================================== --- src/linux-kernel/infiniband/core/cm_connection_table.c (revision 576) +++ src/linux-kernel/infiniband/core/cm_connection_table.c (working copy) @@ -231,7 +231,7 @@ tsKernelTimerInit(&connection->timer); - connection->mad.sqpn = IB_GSI_QP; + connection->mad.sqpn = 1; connection->mad.has_grh = 0; connection->mad.completion_func = NULL; connection->remote_qp_node.pprev = NULL; Index: src/linux-kernel/infiniband/core/mad_thread.c =================================================================== --- src/linux-kernel/infiniband/core/mad_thread.c (revision 576) +++ src/linux-kernel/infiniband/core/mad_thread.c (working copy) @@ -214,7 +214,7 @@ ib_mad_invoke_filters(mad, TS_IB_MAD_DIRECTION_OUT); /* Handle directed route SMPs */ - if (mad->dqpn == IB_SMI_QP && + if (mad->dqpn == 0 && mad->dlid == IB_LID_PERMISSIVE && mad->mgmt_class == IB_SM_DIRECTED_ROUTE) if (ib_mad_smp_send(device, work, &reuse)) Index: src/linux-kernel/infiniband/core/dm_client_class_port_info.c =================================================================== --- src/linux-kernel/infiniband/core/dm_client_class_port_info.c (revision 576) +++ src/linux-kernel/infiniband/core/dm_client_class_port_info.c (working copy) @@ -188,7 +188,7 @@ return -ENOMEM; } - ib_dm_client_mad_init(&mad, device, port, dst_port_lid, IB_GSI_QP, + ib_dm_client_mad_init(&mad, device, port, dst_port_lid, 1, TS_IB_DM_METHOD_SET, TS_IB_DM_ATTRIBUTE_CLASS_PORTINFO, 0); Index: src/linux-kernel/infiniband/core/dm_client_ioc_profile.c =================================================================== --- src/linux-kernel/infiniband/core/dm_client_ioc_profile.c (revision 576) +++ src/linux-kernel/infiniband/core/dm_client_ioc_profile.c (working copy) @@ -203,7 +203,7 @@ return -ENOMEM; } - ib_dm_client_mad_init(&mad, 
device, port, dst_port_lid, IB_GSI_QP, + ib_dm_client_mad_init(&mad, device, port, dst_port_lid, 1, TS_IB_DM_METHOD_GET, TS_IB_DM_ATTRIBUTE_IOC_PROFILE, TS_IB_DM_IOCPROFILE_GET_CONTROLLER_ID (controller_id)); Index: src/linux-kernel/infiniband/core/mad_ib.c =================================================================== --- src/linux-kernel/infiniband/core/mad_ib.c (revision 607) +++ src/linux-kernel/infiniband/core/mad_ib.c (working copy) @@ -49,24 +49,26 @@ { struct ib_device *device = mad->device; struct ib_mad_private *priv = device->mad; - struct ib_gather_scatter gather_list; - struct ib_send_param send_param; + struct ib_sge gather_list; + struct ib_send_wr send_param; + struct ib_send_wr *bad_wr; struct ib_ah_attr av; struct ib_ah *addr; - gather_list.address = pci_map_single(priv->ib_dev->dma_device, + gather_list.addr = pci_map_single(priv->ib_dev->dma_device, mad, IB_MAD_PACKET_SIZE, PCI_DMA_TODEVICE); gather_list.length = IB_MAD_PACKET_SIZE; - gather_list.key = priv->lkey; + gather_list.lkey = priv->lkey; - send_param.op = IB_OP_SEND; - send_param.gather_list = &gather_list; - send_param.num_gather_entries = 1; - send_param.dest_qpn = mad->dqpn; - send_param.pkey_index = mad->pkey_index; - send_param.solicited_event = 1; - send_param.signaled = 1; + send_param.next = NULL; + send_param.opcode = IB_WR_SEND; + send_param.sg_list = &gather_list; + send_param.num_sge = 1; + send_param.wr.ud.remote_qpn = mad->dqpn; + send_param.wr.ud.pkey_index = mad->pkey_index; + send_param.send_flags = + IB_SEND_SIGNALED | IB_SEND_SOLICITED; av.dlid = mad->dlid; av.port = mad->port; @@ -95,16 +97,16 @@ wrid.field.qpn = mad->sqpn; wrid.field.index = index; - send_param.work_request_id = wrid.id; + send_param.wr_id = wrid.id; } - send_param.dest_address = addr; - send_param.dest_qkey = - mad->dqpn == IB_SMI_QP ? 0 : IB_GSI_WELL_KNOWN_QKEY; + send_param.wr.ud.ah = addr; + send_param.wr.ud.remote_qkey = + mad->dqpn ? 
IB_GSI_WELL_KNOWN_QKEY : 0; pci_unmap_addr_set(&priv->send_buf[mad->port][mad->sqpn][index], mapping, gather_list.address); - if (ib_send(priv->qp[mad->port][mad->sqpn], &send_param, 1)) { + if (ib_post_send(priv->qp[mad->port][mad->sqpn], &send_param, &bad_wr)) { TS_REPORT_WARN(MOD_KERNEL_IB, "ib_send failed for port %d QPN %d of %s", mad->port, mad->sqpn, device->name); @@ -290,23 +292,24 @@ { struct ib_mad_private *priv = device->mad; void *buf; - struct ib_receive_param receive_param; - struct ib_gather_scatter scatter_list; + struct ib_recv_wr receive_param; + struct ib_recv_wr *bad_wr; + struct ib_sge scatter_list; buf = kmalloc(sizeof (struct ib_mad) + IB_MAD_GRH_SIZE, GFP_KERNEL); if (!buf) return -ENOMEM; - scatter_list.address = pci_map_single(priv->ib_dev->dma_device, + scatter_list.addr = pci_map_single(priv->ib_dev->dma_device, buf, IB_MAD_BUFFER_SIZE, PCI_DMA_FROMDEVICE); scatter_list.length = IB_MAD_BUFFER_SIZE; - scatter_list.key = priv->lkey; + scatter_list.lkey = priv->lkey; - receive_param.scatter_list = &scatter_list; - receive_param.num_scatter_entries = 1; - receive_param.device_specific = NULL; - receive_param.signaled = 1; + receive_param.next = NULL; + receive_param.sg_list = &scatter_list; + receive_param.num_sge = 1; + receive_param.recv_flags = IB_RECV_SIGNALED; { union ib_mad_wrid wrid; @@ -316,14 +319,14 @@ wrid.field.qpn = qpn; wrid.field.index = index; - receive_param.work_request_id = wrid.id; + receive_param.wr_id = wrid.id; } priv->receive_buf[port][qpn][index].buf = buf; pci_unmap_addr_set(&priv->receive_buf[port][qpn][index], mapping, scatter_list.address); - if (ib_receive(priv->qp[port][qpn], &receive_param, 1)) { + if (ib_post_recv(priv->qp[port][qpn], &receive_param, &bad_wr)) { TS_REPORT_WARN(MOD_KERNEL_IB, "ib_receive failed for port %d QPN %d of %s", port, qpn, device->name); Index: src/linux-kernel/infiniband/core/cm_api.c =================================================================== --- 
src/linux-kernel/infiniband/core/cm_api.c (revision 576) +++ src/linux-kernel/infiniband/core/cm_api.c (working copy) @@ -99,20 +99,14 @@ *comm_id = connection->local_comm_id; - connection->local_qp = param->qp; - ret = ib_qp_query_qpn(param->qp, &connection->local_qpn); - if (ret) { - TS_REPORT_WARN(MOD_IB_CM, - "ib_qp_query_qpn failed %d", ret); - goto out; - } - + connection->local_qp = param->qp; + connection->local_qpn = param->qp->qp_num; connection->transaction_id = ib_cm_tid_generate(); connection->local_cm_device = device; connection->local_cm_port = port; connection->primary_path = *primary_path; connection->remote_cm_lid = primary_path->dlid; - connection->remote_cm_qpn = IB_GSI_QP; + connection->remote_cm_qpn = 1; connection->receive_psn = ib_cm_psn_generate(); connection->cm_function = function; connection->cm_arg = arg; @@ -133,7 +127,8 @@ } { - struct ib_qp_attribute *qp_attr; + struct ib_qp_attr *qp_attr; + int attr_mask; qp_attr = kmalloc(sizeof *qp_attr, GFP_KERNEL); if (!qp_attr) { @@ -153,10 +148,10 @@ } connection->local_cm_pkey_index = qp_attr->pkey_index; - qp_attr->valid_fields = - IB_QP_ATTRIBUTE_PORT | - IB_QP_ATTRIBUTE_PKEY_INDEX; - ret = ib_cm_qp_modify(connection->local_qp, qp_attr); + attr_mask = + IB_QP_PORT | + IB_QP_PKEY_INDEX; + ret = ib_cm_qp_modify(connection->local_qp, qp_attr, attr_mask); kfree(qp_attr); Index: src/linux-kernel/infiniband/core/cm_passive.c =================================================================== --- src/linux-kernel/infiniband/core/cm_passive.c (revision 576) +++ src/linux-kernel/infiniband/core/cm_passive.c (working copy) @@ -92,7 +92,7 @@ connection->mad.device = connection->local_cm_device; connection->mad.port = connection->local_cm_port; connection->mad.pkey_index = connection->local_cm_pkey_index; - connection->mad.sqpn = IB_GSI_QP; + connection->mad.sqpn = 1; connection->mad.dlid = connection->remote_cm_lid; connection->mad.dqpn = connection->remote_cm_qpn; @@ -122,15 +122,16 @@ void 
*response_data, int response_size) { - struct ib_qp_attribute *qp_attr; + struct ib_qp_attr *qp_attr; + int attr_mask; int result; - qp_attr = kmalloc(sizeof(struct ib_qp_attribute), GFP_KERNEL); + qp_attr = kmalloc(sizeof *qp_attr, GFP_KERNEL); if (NULL == qp_attr) { return -ENOMEM; } - memset(qp_attr, 0, sizeof(struct ib_qp_attribute)); + memset(qp_attr, 0, sizeof *qp_attr); qp_attr->port = connection->local_cm_port; if (ib_cached_gid_find(connection->primary_path.sgid, NULL, &qp_attr->port, NULL)) { @@ -145,11 +146,9 @@ } connection->local_cm_pkey_index = qp_attr->pkey_index; - qp_attr->valid_fields = - IB_QP_ATTRIBUTE_PORT | - IB_QP_ATTRIBUTE_PKEY_INDEX; + attr_mask = (IB_QP_PORT | IB_QP_PKEY_INDEX); - if (ib_cm_qp_modify(connection->local_qp, qp_attr)) { + if (ib_cm_qp_modify(connection->local_qp, qp_attr, attr_mask)) { TS_REPORT_WARN(MOD_IB_CM, "ib_qp_modify INIT->INIT failed"); goto fail; @@ -158,28 +157,28 @@ /* modify QP INIT->RTR */ connection->receive_psn = ib_cm_psn_generate(); - qp_attr->state = IB_QP_STATE_RTR; - qp_attr->receive_psn = connection->receive_psn; - qp_attr->destination_qpn = connection->remote_qpn; - qp_attr->responder_resources = connection->responder_resources; - qp_attr->rnr_timeout = IB_RNR_TIMER_122_88; /* XXX settable? */ + qp_attr->qp_state = IB_QPS_RTR; + qp_attr->rq_psn = connection->receive_psn; + qp_attr->dest_qp_num = connection->remote_qpn; + qp_attr->max_dest_rd_atomic = connection->responder_resources; + qp_attr->min_rnr_timer = IB_RNR_TIMER_122_88; /* XXX settable? 
*/ qp_attr->path_mtu = connection->primary_path.mtu; - qp_attr->address.service_level = connection->primary_path.sl; - qp_attr->address.dlid = connection->primary_path.dlid; - qp_attr->address.source_path_bits = connection->primary_path.slid & 0x7f; - qp_attr->address.static_rate = 0; - qp_attr->address.use_grh = 0; + qp_attr->ah_attr.sl = connection->primary_path.sl; + qp_attr->ah_attr.dlid = connection->primary_path.dlid; + qp_attr->ah_attr.src_path_bits = connection->primary_path.slid & 0x7f; + qp_attr->ah_attr.static_rate = 0; + qp_attr->ah_attr.grh_flag = 0; - qp_attr->valid_fields = - IB_QP_ATTRIBUTE_STATE | - IB_QP_ATTRIBUTE_RECEIVE_PSN | - IB_QP_ATTRIBUTE_DESTINATION_QPN | - IB_QP_ATTRIBUTE_RESPONDER_RESOURCES | - IB_QP_ATTRIBUTE_RNR_TIMEOUT | - IB_QP_ATTRIBUTE_PATH_MTU | - IB_QP_ATTRIBUTE_ADDRESS; + attr_mask = + IB_QP_STATE | + IB_QP_RQ_PSN | + IB_QP_DEST_QPN | + IB_QP_MAX_DEST_RD_ATOMIC | + IB_QP_MIN_RNR_TIMER | + IB_QP_PATH_MTU | + IB_QP_AV; - if (ib_cm_qp_modify(connection->local_qp, qp_attr)) { + if (ib_cm_qp_modify(connection->local_qp, qp_attr, attr_mask)) { TS_REPORT_WARN(MOD_IB_CM, "ib_qp_modify to RTR failed"); goto fail; @@ -246,49 +245,46 @@ /* =============================================================== */ /*..ib_cm_passive_rts - Transition a passive connection to RTS */ -int ib_cm_passive_rts( - struct ib_cm_connection *connection - ) { - struct ib_qp_attribute *qp_attr; - int result; +int ib_cm_passive_rts(struct ib_cm_connection *connection) { + struct ib_qp_attr *qp_attr; + int attr_mask; + int result; - qp_attr = kmalloc(sizeof(struct ib_qp_attribute), GFP_KERNEL); + qp_attr = kmalloc(sizeof *qp_attr, GFP_KERNEL); if (!qp_attr) { return -ENOMEM; } - memset(qp_attr, 0, sizeof(struct ib_qp_attribute)); + memset(qp_attr, 0, sizeof *qp_attr); - qp_attr->state = IB_QP_STATE_RTS; - qp_attr->send_psn = connection->send_psn; - qp_attr->initiator_depth = connection->initiator_depth; - qp_attr->retry_count = connection->retry_count; - 
qp_attr->rnr_retry_count = connection->rnr_retry_count; + qp_attr->qp_state = IB_QPS_RTS; + qp_attr->sq_psn = connection->send_psn; + qp_attr->max_rd_atomic = connection->initiator_depth; + qp_attr->retry_cnt = connection->retry_count; + qp_attr->rnr_retry = connection->rnr_retry_count; /* We abuse packet life and put local ACK timeout there */ - qp_attr->local_ack_timeout = connection->primary_path.packet_life; + qp_attr->timeout = connection->primary_path.packet_life; - qp_attr->valid_fields = - IB_QP_ATTRIBUTE_STATE | - IB_QP_ATTRIBUTE_SEND_PSN | - IB_QP_ATTRIBUTE_INITIATOR_DEPTH | - IB_QP_ATTRIBUTE_RETRY_COUNT | - IB_QP_ATTRIBUTE_RNR_RETRY_COUNT | - IB_QP_ATTRIBUTE_LOCAL_ACK_TIMEOUT; + attr_mask = + IB_QP_STATE | + IB_QP_SQ_PSN | + IB_QP_MAX_QP_RD_ATOMIC | + IB_QP_RETRY_CNT | + IB_QP_RNR_RETRY | + IB_QP_TIMEOUT; if (connection->alternate_path.dlid) { - qp_attr->valid_fields |= - IB_QP_ATTRIBUTE_ALT_PORT | - IB_QP_ATTRIBUTE_ALT_ADDRESS | - IB_QP_ATTRIBUTE_ALT_PKEY_INDEX | - IB_QP_ATTRIBUTE_MIGRATION_STATE; + attr_mask |= + IB_QP_ALT_PATH | + IB_QP_PATH_MIG_STATE; /* We abuse packet life and put local ACK timeout there */ - qp_attr->alt_local_ack_timeout = connection->alternate_path.packet_life; - qp_attr->alt_address.service_level = connection->alternate_path.sl; - qp_attr->alt_address.dlid = connection->alternate_path.dlid; - qp_attr->alt_address.source_path_bits = connection->alternate_path.slid & 0x7f; - qp_attr->alt_address.static_rate = 0; - qp_attr->alt_address.use_grh = 0; - qp_attr->migration_state = IB_REARM; + qp_attr->alt_timeout = connection->alternate_path.packet_life; + qp_attr->alt_ah_attr.sl = connection->alternate_path.sl; + qp_attr->alt_ah_attr.dlid = connection->alternate_path.dlid; + qp_attr->alt_ah_attr.src_path_bits = connection->alternate_path.slid & 0x7f; + qp_attr->alt_ah_attr.static_rate = 0; + qp_attr->alt_ah_attr.grh_flag = 0; + qp_attr->path_mig_state = IB_MIG_REARM; ib_cached_gid_find(connection->alternate_path.sgid, NULL, 
&qp_attr->alt_port, NULL); /* XXX check return value: */ @@ -298,7 +294,7 @@ &qp_attr->alt_pkey_index); } - result = ib_cm_qp_modify(connection->local_qp, qp_attr); + result = ib_cm_qp_modify(connection->local_qp, qp_attr, attr_mask); kfree(qp_attr); tsKernelTimerRemove(&connection->timer); @@ -311,18 +307,10 @@ int ib_cm_passive_param_store(struct ib_cm_connection *connection, struct ib_cm_passive_param *param) { - int result; + connection->local_qp = param->qp; + connection->local_qpn = param->qp->qp_num; - connection->local_qp = param->qp; - - result = ib_qp_query_qpn(param->qp, &connection->local_qpn); - if (result) { - TS_REPORT_WARN(MOD_IB_CM, - "ib_qp_query_qpn failed (return %d)", - result); - } - - return result; + return 0; } static void ib_cm_service_store(struct ib_cm_service *service, Index: src/linux-kernel/infiniband/core/cm_priv.h =================================================================== --- src/linux-kernel/infiniband/core/cm_priv.h (revision 576) +++ src/linux-kernel/infiniband/core/cm_priv.h (working copy) @@ -121,14 +121,14 @@ struct ib_path_record alternate_path; tTS_IB_PSN receive_psn; tTS_IB_PSN send_psn; - u8 retry_count; - u8 rnr_retry_count; - u8 responder_resources; - u8 initiator_depth; + u8 retry_count; + u8 rnr_retry_count; + u8 responder_resources; + u8 initiator_depth; struct ib_device *local_cm_device; tTS_IB_PORT local_cm_port; - int local_cm_pkey_index; + u16 local_cm_pkey_index; u16 remote_cm_lid; u32 remote_cm_qpn; u16 alternate_remote_cm_lid; @@ -174,10 +174,13 @@ tTS_IB_CM_COMM_ID comm_id; }; -static inline int ib_cm_qp_modify(struct ib_qp *qp, - struct ib_qp_attribute *attr) +static inline int ib_cm_qp_modify(struct ib_qp *qp, + struct ib_qp_attr *attr, + int attr_mask) { - return qp ? ib_qp_modify(qp, attr) : 0; + struct ib_qp_cap qp_cap; + + return qp ? 
ib_modify_qp(qp, attr, attr_mask, &qp_cap) : 0; } int ib_cm_timeout_to_jiffies(int timeout); Index: src/linux-kernel/infiniband/core/dm_client_iou_info.c =================================================================== --- src/linux-kernel/infiniband/core/dm_client_iou_info.c (revision 576) +++ src/linux-kernel/infiniband/core/dm_client_iou_info.c (working copy) @@ -141,7 +141,7 @@ return -ENOMEM; } - ib_dm_client_mad_init(&mad, device, port, dst_port_lid, IB_GSI_QP, + ib_dm_client_mad_init(&mad, device, port, dst_port_lid, 1, TS_IB_DM_METHOD_GET, TS_IB_DM_ATTRIBUTE_IOU_INFO, 0); Index: src/linux-kernel/infiniband/core/mad_main.c =================================================================== --- src/linux-kernel/infiniband/core/mad_main.c (revision 576) +++ src/linux-kernel/infiniband/core/mad_main.c (working copy) @@ -58,7 +58,7 @@ }; *mr = ib_reg_phys_mr(pd, &buffer_list, 1, /* list_len */ - IB_MR_LOCAL_WRITE, &iova); + IB_ACCESS_LOCAL_WRITE, &iova); if (IS_ERR(*mr)) { TS_REPORT_WARN(MOD_KERNEL_IB, "ib_reg_phys_mr failed " @@ -77,7 +77,9 @@ u32 qpn) { struct ib_mad_private *priv = device->mad; - struct ib_qp_attribute qp_attr; + struct ib_qp_attr qp_attr; + struct ib_qp_cap qp_cap; + int attr_mask; int ret; TS_TRACE(MOD_KERNEL_IB, T_VERY_VERBOSE, TRACE_KERNEL_IB_GEN, @@ -85,74 +87,69 @@ port, qpn, device->name); { - struct ib_qp_create_param param = { { 0 } }; + struct ib_qp_init_attr init_attr = { + .send_cq = priv->cq, + .recv_cq = priv->cq, + .cap = { + .max_send_wr = IB_MAD_SENDS_PER_QP, + .max_recv_wr = IB_MAD_RECEIVES_PER_QP, + .max_send_sge = 1, + .max_recv_sge = 1 + }, + .sq_sig_type = IB_SIGNAL_ALL_WR, + .rq_sig_type = IB_SIGNAL_ALL_WR, + .qp_type = qpn == 0 ? 
+					IB_QPT_SMI : IB_QPT_GSI,
+			.port_num    = port
+		};
+		struct ib_qp_cap qp_cap;

-		param.limit.max_outstanding_send_request    = IB_MAD_SENDS_PER_QP;
-		param.limit.max_outstanding_receive_request = IB_MAD_RECEIVES_PER_QP;
-		param.limit.max_send_gather_element         = 1;
-		param.limit.max_receive_scatter_element     = 1;
-
-		param.pd             = priv->pd;
-		param.send_queue     = priv->cq;
-		param.receive_queue  = priv->cq;
-		param.send_policy    = IB_WQ_SIGNAL_ALL;
-		param.receive_policy = IB_WQ_SIGNAL_ALL;
-		param.transport      = IB_TRANSPORT_UD;
-
-		ret = ib_special_qp_create(&param,
-					   port,
-					   qpn == 0 ? IB_SMI_QP : IB_GSI_QP,
-					   &priv->qp[port][qpn]);
-		if (ret) {
+		priv->qp[port][qpn] = ib_create_qp(priv->pd, &init_attr, &qp_cap);
+		if (IS_ERR(priv->qp[port][qpn])) {
 			TS_REPORT_FATAL(MOD_KERNEL_IB,
					"ib_special_qp_create failed for %s port %d QPN %d (%d)",
-					device->name, port, qpn, ret);
-			return ret;
+					device->name, port, qpn,
+					PTR_ERR(priv->qp[port][qpn]));
+			return PTR_ERR(priv->qp[port][qpn]);
 		}
 	}

-	qp_attr.state = IB_QP_STATE_INIT;
-	qp_attr.qkey  = qpn == 0 ? 0 : IB_GSI_WELL_KNOWN_QKEY;
+	qp_attr.qp_state = IB_QPS_INIT;
+	qp_attr.qkey     = qpn == 0 ? 0 : IB_GSI_WELL_KNOWN_QKEY;
 	/* P_Key index is really irrelevant for QP0/QP1, but we have to set
	   some value for RESET->INIT transition. */
 	qp_attr.pkey_index = 0;
-	qp_attr.valid_fields =
-		IB_QP_ATTRIBUTE_STATE |
-		IB_QP_ATTRIBUTE_QKEY |
-		IB_QP_ATTRIBUTE_PKEY_INDEX;
+	attr_mask =
+		IB_QP_STATE |
+		IB_QP_QKEY |
+		IB_QP_PKEY_INDEX;

-	/* This is not required, according to the IB spec, but do it until
	   the Tavor driver is fixed: */
-	qp_attr.port = port;
-	qp_attr.valid_fields |= IB_QP_ATTRIBUTE_PORT;
-
-	ret = ib_qp_modify(priv->qp[port][qpn], &qp_attr);
+	ret = ib_modify_qp(priv->qp[port][qpn], &qp_attr, attr_mask, &qp_cap);
 	if (ret) {
 		TS_REPORT_FATAL(MOD_KERNEL_IB,
-				"ib_qp_modify -> INIT failed for %s port %d QPN %d (%d)",
+				"ib_modify_qp -> INIT failed for %s port %d QPN %d (%d)",
				device->name, port, qpn, ret);
 		return ret;
 	}

-	qp_attr.state        = IB_QP_STATE_RTR;
-	qp_attr.valid_fields = IB_QP_ATTRIBUTE_STATE;
-	ret = ib_qp_modify(priv->qp[port][qpn], &qp_attr);
+	qp_attr.qp_state = IB_QPS_RTR;
+	attr_mask        = IB_QP_STATE;
+	ret = ib_modify_qp(priv->qp[port][qpn], &qp_attr, attr_mask, &qp_cap);
 	if (ret) {
 		TS_REPORT_FATAL(MOD_KERNEL_IB,
-				"ib_qp_modify -> RTR failed for %s port %d QPN %d (%d)",
+				"ib_modify_qp -> RTR failed for %s port %d QPN %d (%d)",
				device->name, port, qpn, ret);
 		return ret;
 	}

-	qp_attr.state    = IB_QP_STATE_RTS;
-	qp_attr.send_psn = 0;
-	qp_attr.valid_fields =
-		IB_QP_ATTRIBUTE_STATE |
-		IB_QP_ATTRIBUTE_SEND_PSN;
-	ret = ib_qp_modify(priv->qp[port][qpn], &qp_attr);
+	qp_attr.qp_state = IB_QPS_RTS;
+	qp_attr.sq_psn   = 0;
+	attr_mask =
+		IB_QP_STATE |
+		IB_QP_SQ_PSN;
+	ret = ib_modify_qp(priv->qp[port][qpn], &qp_attr, attr_mask, &qp_cap);
 	if (ret) {
 		TS_REPORT_FATAL(MOD_KERNEL_IB,
-				"ib_qp_modify -> RTS failed for %s port %d QPN %d (%d)",
+				"ib_modify_qp -> RTS failed for %s port %d QPN %d (%d)",
				device->name, port, qpn, ret);
 		return ret;
 	}
@@ -283,7 +280,7 @@
 	for (p = 0; p <= IB_MAD_MAX_PORTS_PER_DEVICE; ++p) {
 		for (q = 0; q <= 1; ++q) {
 			if (priv->qp[p][q]) {
-				ib_qp_destroy(priv->qp[p][q]);
+				ib_destroy_qp(priv->qp[p][q]);
 				for (i = 0; i < IB_MAD_RECEIVES_PER_QP; ++i) {
 					if (priv->receive_buf[p][q][i].buf)
						pci_unmap_single(priv->ib_dev->dma_device,
@@ -324,7 +321,7 @@
 	for (p = 0; p <= IB_MAD_MAX_PORTS_PER_DEVICE; ++p) {
 		for (q = 0; q <= 1; ++q) {
 			if (priv->qp[p][q]) {
-				ib_qp_destroy(priv->qp[p][q]);
+				ib_destroy_qp(priv->qp[p][q]);
 				for (i = 0; i < IB_MAD_RECEIVES_PER_QP; ++i) {
 					if (priv->receive_buf[p][q][i].buf)

Index: src/linux-kernel/infiniband/core/dm_client_query.c
===================================================================
--- src/linux-kernel/infiniband/core/dm_client_query.c	(revision 576)
+++ src/linux-kernel/infiniband/core/dm_client_query.c	(working copy)
@@ -53,7 +53,7 @@
 	packet->slid = 0xffff;
 	packet->dlid = dst_lid;
 	packet->sl   = 0;
-	packet->sqpn = IB_GSI_QP;
+	packet->sqpn = 1;
 	packet->dqpn = dst_qpn;
 	packet->r_method     = r_method;
 	packet->attribute_id = cpu_to_be16(attribute_id);

Index: src/linux-kernel/infiniband/core/sa_client_query.c
===================================================================
--- src/linux-kernel/infiniband/core/sa_client_query.c	(revision 576)
+++ src/linux-kernel/infiniband/core/sa_client_query.c	(working copy)
@@ -56,8 +56,8 @@
 	packet->slid = 0xffff;
 	packet->dlid = sm_path.sm_lid;
 	packet->sl   = sm_path.sm_sl;
-	packet->sqpn = IB_GSI_QP;
-	packet->dqpn = IB_GSI_QP;
+	packet->sqpn = 1;
+	packet->dqpn = 1;
 	packet->has_grh = 0;
 	packet->completion_func = NULL;

Index: src/linux-kernel/infiniband/core/cm_active.c
===================================================================
--- src/linux-kernel/infiniband/core/cm_active.c	(revision 576)
+++ src/linux-kernel/infiniband/core/cm_active.c	(working copy)
@@ -150,7 +150,7 @@
 	connection->mad.device     = connection->local_cm_device;
 	connection->mad.port       = connection->local_cm_port;
 	connection->mad.pkey_index = connection->local_cm_pkey_index;
-	connection->mad.sqpn       = IB_GSI_QP;
+	connection->mad.sqpn       = 1;
 	connection->mad.dlid       = connection->remote_cm_lid;
 	connection->mad.dqpn       = connection->remote_cm_qpn;
@@ -225,12 +225,13 @@
 int
 ib_cm_rtu_send(struct ib_cm_connection *connection)
 {
 	int result = 0;
-	struct ib_qp_attribute *qp_attr;
+	struct ib_qp_attr *qp_attr;
+	int attr_mask;

 	if (!connection)
 		return -EINVAL;

-	qp_attr = kmalloc(sizeof(struct ib_qp_attribute), GFP_KERNEL);
+	qp_attr = kmalloc(sizeof *qp_attr, GFP_KERNEL);
 	if (!qp_attr)
 		return -ENOMEM;
@@ -245,7 +246,7 @@
 	connection->mad.device     = connection->local_cm_device;
 	connection->mad.port       = connection->local_cm_port;
 	connection->mad.pkey_index = connection->local_cm_pkey_index;
-	connection->mad.sqpn       = IB_GSI_QP;
+	connection->mad.sqpn       = 1;
 	connection->mad.dlid       = connection->remote_cm_lid;
 	connection->mad.dqpn       = connection->remote_cm_qpn;
@@ -261,36 +262,34 @@
		 "Sent RTU");

 	/* move connection to established. */
-	qp_attr->state           = IB_QP_STATE_RTS;
-	qp_attr->send_psn        = connection->send_psn;
-	qp_attr->initiator_depth = connection->initiator_depth;
-	qp_attr->retry_count     = connection->retry_count;
-	qp_attr->rnr_retry_count = connection->rnr_retry_count;
+	qp_attr->qp_state      = IB_QPS_RTS;
+	qp_attr->sq_psn        = connection->send_psn;
+	qp_attr->max_rd_atomic = connection->initiator_depth;
+	qp_attr->retry_cnt     = connection->retry_count;
+	qp_attr->rnr_retry     = connection->rnr_retry_count;
 	/* XXX need to include CA ACK delay */
-	qp_attr->local_ack_timeout = min(31, connection->primary_path.packet_life + 1);
+	qp_attr->timeout = min(31, connection->primary_path.packet_life + 1);

-	qp_attr->valid_fields =
-		IB_QP_ATTRIBUTE_STATE |
-		IB_QP_ATTRIBUTE_SEND_PSN |
-		IB_QP_ATTRIBUTE_INITIATOR_DEPTH |
-		IB_QP_ATTRIBUTE_RETRY_COUNT |
-		IB_QP_ATTRIBUTE_RNR_RETRY_COUNT |
-		IB_QP_ATTRIBUTE_LOCAL_ACK_TIMEOUT;
+	attr_mask =
+		IB_QP_STATE |
+		IB_QP_SQ_PSN |
+		IB_QP_MAX_QP_RD_ATOMIC |
+		IB_QP_RETRY_CNT |
+		IB_QP_RNR_RETRY |
+		IB_QP_TIMEOUT;

 	if (connection->alternate_path.dlid) {
-		qp_attr->valid_fields |=
-			IB_QP_ATTRIBUTE_ALT_PORT |
-			IB_QP_ATTRIBUTE_ALT_ADDRESS |
-			IB_QP_ATTRIBUTE_ALT_PKEY_INDEX |
-			IB_QP_ATTRIBUTE_MIGRATION_STATE;
+		attr_mask |=
+			IB_QP_ALT_PATH |
+			IB_QP_PATH_MIG_STATE;
 		/* XXX need to include CA ACK delay */
-		qp_attr->alt_local_ack_timeout = min(31, connection->alternate_path.packet_life + 1);
-		qp_attr->alt_address.service_level    = connection->alternate_path.sl;
-		qp_attr->alt_address.dlid             = connection->alternate_path.dlid;
-		qp_attr->alt_address.source_path_bits = connection->alternate_path.slid & 0x7f;
-		qp_attr->alt_address.static_rate      = 0;
-		qp_attr->alt_address.use_grh          = 0;
-		qp_attr->migration_state              = IB_REARM;
+		qp_attr->alt_timeout = min(31, connection->alternate_path.packet_life + 1);
+		qp_attr->alt_ah_attr.sl            = connection->alternate_path.sl;
+		qp_attr->alt_ah_attr.dlid          = connection->alternate_path.dlid;
+		qp_attr->alt_ah_attr.src_path_bits = connection->alternate_path.slid & 0x7f;
+		qp_attr->alt_ah_attr.static_rate   = 0;
+		qp_attr->alt_ah_attr.grh_flag      = 0;
+		qp_attr->path_mig_state            = IB_MIG_REARM;

 		ib_cached_gid_find(connection->alternate_path.sgid, NULL,
				   &qp_attr->alt_port, NULL);
@@ -299,7 +298,7 @@
				     connection->alternate_path.pkey,
				     &qp_attr->alt_pkey_index);
 	}

-	result = ib_cm_qp_modify(connection->local_qp, qp_attr);
+	result = ib_cm_qp_modify(connection->local_qp, qp_attr, attr_mask);
 	if (result) {
 		TS_REPORT_WARN(MOD_IB_CM, "ib_qp_modify to RTS failed");
 		goto free;
@@ -344,7 +343,8 @@
 void ib_cm_rep_handler(struct ib_mad *packet)
 {
 	struct ib_cm_connection *connection;
-	struct ib_qp_attribute *qp_attr;
+	struct ib_qp_attr *qp_attr;
+	int attr_mask;
 	tTS_IB_CM_REJ_REASON rej_reason;
 	int result;
@@ -392,7 +392,7 @@
 		goto out;
 	}

-	qp_attr = kmalloc(sizeof(struct ib_qp_attribute), GFP_KERNEL);
+	qp_attr = kmalloc(sizeof *qp_attr, GFP_KERNEL);
 	if (!qp_attr) {
 		rej_reason = TS_IB_REJ_NO_RESOURCES;
 		goto reject;
 	}
@@ -406,30 +406,30 @@
 	connection->remote_comm_id = ib_cm_rep_local_comm_id_get(packet);
 	connection->send_psn       = ib_cm_rep_starting_psn_get(packet);

-	memset(qp_attr, 0, sizeof(struct ib_qp_attribute));
+	memset(qp_attr, 0, sizeof *qp_attr);

-	qp_attr->state               = IB_QP_STATE_RTR;
-	qp_attr->receive_psn         = connection->receive_psn;
-	qp_attr->destination_qpn     = connection->remote_qpn;
-	qp_attr->responder_resources = connection->responder_resources;
-	qp_attr->rnr_timeout         = IB_RNR_TIMER_122_88; /* XXX settable? */
+	qp_attr->qp_state           = IB_QPS_RTR;
+	qp_attr->rq_psn             = connection->receive_psn;
+	qp_attr->dest_qp_num        = connection->remote_qpn;
+	qp_attr->max_dest_rd_atomic = connection->responder_resources;
+	qp_attr->min_rnr_timer      = IB_RNR_TIMER_122_88; /* XXX settable? */
 	qp_attr->path_mtu = connection->primary_path.mtu;
-	qp_attr->address.service_level    = connection->primary_path.sl;
-	qp_attr->address.dlid             = connection->primary_path.dlid;
-	qp_attr->address.source_path_bits = connection->primary_path.slid & 0x7f;
-	qp_attr->address.static_rate      = 0;
-	qp_attr->address.use_grh          = 0;
+	qp_attr->ah_attr.sl            = connection->primary_path.sl;
+	qp_attr->ah_attr.dlid          = connection->primary_path.dlid;
+	qp_attr->ah_attr.src_path_bits = connection->primary_path.slid & 0x7f;
+	qp_attr->ah_attr.static_rate   = 0;
+	qp_attr->ah_attr.grh_flag      = 0;

-	qp_attr->valid_fields =
-		IB_QP_ATTRIBUTE_STATE |
-		IB_QP_ATTRIBUTE_RECEIVE_PSN |
-		IB_QP_ATTRIBUTE_DESTINATION_QPN |
-		IB_QP_ATTRIBUTE_RESPONDER_RESOURCES |
-		IB_QP_ATTRIBUTE_RNR_TIMEOUT |
-		IB_QP_ATTRIBUTE_PATH_MTU |
-		IB_QP_ATTRIBUTE_ADDRESS;
+	attr_mask =
+		IB_QP_STATE |
+		IB_QP_RQ_PSN |
+		IB_QP_DEST_QPN |
+		IB_QP_MAX_DEST_RD_ATOMIC |
+		IB_QP_MIN_RNR_TIMER |
+		IB_QP_PATH_MTU |
+		IB_QP_AV;

-	result = ib_cm_qp_modify(connection->local_qp, qp_attr);
+	result = ib_cm_qp_modify(connection->local_qp, qp_attr, attr_mask);
 	if (result) {
 		TS_REPORT_WARN(MOD_IB_CM, "ib_qp_modify to RTR failed. <%d>", result);
 		kfree(qp_attr);

Index: src/linux-kernel/infiniband/include/ts_ib_cm_types.h
===================================================================
--- src/linux-kernel/infiniband/include/ts_ib_cm_types.h	(revision 576)
+++ src/linux-kernel/infiniband/include/ts_ib_cm_types.h	(working copy)
@@ -31,11 +31,6 @@
 #  include		/* for size_t */
 #endif

-/* Visual C++ apparently can't handle empty structs. */
-#if !defined(EMPTY_STRUCT)
-#  define EMPTY_STRUCT
-#endif
-
 #include

 typedef uint32_t tTS_IB_CM_COMM_ID;
@@ -45,6 +40,13 @@
 #define TS_IB_CM_COMM_ID_INVALID    0
 #define TS_IB_CM_SERVICE_EXACT_MASK 0xffffffffffffffffULL

+enum ib_transport {
+	IB_TRANSPORT_RC = 0,
+	IB_TRANSPORT_UC = 1,
+	IB_TRANSPORT_RD = 2,
+	IB_TRANSPORT_UD = 3,
+};
+
 typedef enum {
 	TS_IB_CM_CALLBACK_PROCEED,
 	TS_IB_CM_CALLBACK_DEFER,
@@ -192,9 +194,7 @@
 	void *apr_info;
 };

-struct ib_cm_established_param {
-	EMPTY_STRUCT
-};
+struct ib_cm_established_param { };

 struct ib_cm_disconnected_param {
 	tTS_IB_CM_DISCONNECTED_REASON reason;

From roland at topspin.com  Fri Aug 13 10:49:25 2004
From: roland at topspin.com (Roland Dreier)
Date: Fri, 13 Aug 2004 10:49:25 -0700
Subject: [openib-general] [PATCH][4/4] ULP QP API
In-Reply-To: <10924193653247@topspin.com>
Message-ID: <10924193652637@topspin.com>

Index: src/linux-kernel/infiniband/ulp/ipoib/ipoib_verbs.c
===================================================================
--- src/linux-kernel/infiniband/ulp/ipoib/ipoib_verbs.c	(revision 621)
+++ src/linux-kernel/infiniband/ulp/ipoib/ipoib_verbs.c	(working copy)
@@ -30,7 +30,7 @@
 void ipoib_pkey_dev_check_presence(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = dev->priv;
-	int pkey_index = 0;
+	u16 pkey_index = 0;

 	if (ib_cached_pkey_find(priv->ca, priv->port, priv->pkey, &pkey_index))
 		clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
@@ -41,8 +41,11 @@
 int ipoib_mcast_attach(struct net_device *dev, u16 mlid, union ib_gid *mgid)
 {
 	struct ipoib_dev_priv *priv = dev->priv;
-	struct
 ib_qp_attribute *qp_attr;
-	int ret, pkey_index;
+	struct ib_qp_attr *qp_attr;
+	struct ib_qp_cap qp_cap;
+	int attr_mask;
+	int ret;
+	u16 pkey_index;

 	ret = -ENOMEM;
 	qp_attr = kmalloc(sizeof(*qp_attr), GFP_ATOMIC);
@@ -58,8 +61,8 @@
 	/* set correct QKey for QP */
 	qp_attr->qkey = priv->qkey;
-	qp_attr->valid_fields = IB_QP_ATTRIBUTE_QKEY;
-	ret = ib_qp_modify(priv->qp, qp_attr);
+	attr_mask = IB_QP_QKEY;
+	ret = ib_modify_qp(priv->qp, qp_attr, attr_mask, &qp_cap);
 	if (ret) {
 		TS_REPORT_FATAL(MOD_IB_NET, "%s: failed to modify QP, ret = %d",
				dev->name, ret);
@@ -99,21 +102,27 @@
 int ipoib_qp_create(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = dev->priv;
-	int ret, pkey_index;
-	struct ib_qp_create_param qp_create = {
-		.limit = {
-			.max_outstanding_send_request    = IPOIB_TX_RING_SIZE,
-			.max_outstanding_receive_request = IPOIB_RX_RING_SIZE,
-			.max_send_gather_element         = 1,
-			.max_receive_scatter_element     = 1,
-		},
-		.pd            = priv->pd,
-		.send_queue    = priv->cq,
-		.receive_queue = priv->cq,
-		.transport     = IB_TRANSPORT_UD,
+	int ret;
+	u16 pkey_index;
+
+	struct ib_qp_init_attr init_attr = {
+		.send_cq = priv->cq,
+		.recv_cq = priv->cq,
+		.cap     = {
+			.max_send_wr  = IPOIB_TX_RING_SIZE,
+			.max_recv_wr  = IPOIB_RX_RING_SIZE,
+			.max_send_sge = 1,
+			.max_recv_sge = 1
+		},
+		.sq_sig_type = IB_SIGNAL_ALL_WR,
+		.rq_sig_type = IB_SIGNAL_ALL_WR,
+		.qp_type     = IB_QPT_UD
 	};
-	struct ib_qp_attribute qp_attr;
+	struct ib_qp_cap qp_cap;
+	struct ib_qp_attr qp_attr;
+	int attr_mask;
+
 	/*
	 * Search through the port P_Key table for the requested pkey value.
	 * The port has to be assigned to the respective IB partition in
@@ -126,23 +135,24 @@
 	}
 	set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);

-	ret = ib_qp_create(&qp_create, &priv->qp, &priv->local_qpn);
-	if (ret) {
+	priv->qp = ib_create_qp(priv->pd, &init_attr, &qp_cap);
+	if (IS_ERR(priv->qp)) {
 		TS_REPORT_FATAL(MOD_IB_NET, "%s: failed to create QP", dev->name);
-		return ret;
+		return PTR_ERR(priv->qp);
 	}
+	priv->local_qpn = priv->qp->qp_num;

-	qp_attr.state      = IB_QP_STATE_INIT;
+	qp_attr.qp_state   = IB_QPS_INIT;
 	qp_attr.qkey       = 0;
 	qp_attr.port       = priv->port;
 	qp_attr.pkey_index = pkey_index;
-	qp_attr.valid_fields =
-		IB_QP_ATTRIBUTE_QKEY |
-		IB_QP_ATTRIBUTE_PORT |
-		IB_QP_ATTRIBUTE_PKEY_INDEX |
-		IB_QP_ATTRIBUTE_STATE;
-	ret = ib_qp_modify(priv->qp, &qp_attr);
+	attr_mask =
+		IB_QP_QKEY |
+		IB_QP_PORT |
+		IB_QP_PKEY_INDEX |
+		IB_QP_STATE;
+	ret = ib_modify_qp(priv->qp, &qp_attr, attr_mask, &qp_cap);
 	if (ret) {
 		TS_REPORT_FATAL(MOD_IB_NET,
				"%s: failed to modify QP to init, ret = %d",
@@ -150,10 +160,10 @@
 		goto out_fail;
 	}

-	qp_attr.state = IB_QP_STATE_RTR;
+	qp_attr.qp_state = IB_QPS_RTR;
 	/* Can't set this in a INIT->RTR transition */
-	qp_attr.valid_fields &= ~IB_QP_ATTRIBUTE_PORT;
-	ret = ib_qp_modify(priv->qp, &qp_attr);
+	attr_mask &= ~IB_QP_PORT;
+	ret = ib_modify_qp(priv->qp, &qp_attr, attr_mask, &qp_cap);
 	if (ret) {
 		TS_REPORT_FATAL(MOD_IB_NET,
				"%s: failed to modify QP to RTR, ret = %d",
@@ -161,11 +171,11 @@
 		goto out_fail;
 	}

-	qp_attr.state    = IB_QP_STATE_RTS;
-	qp_attr.send_psn = 0;
-	qp_attr.valid_fields |= IB_QP_ATTRIBUTE_SEND_PSN;
-	qp_attr.valid_fields &= ~IB_QP_ATTRIBUTE_PKEY_INDEX;
-	ret = ib_qp_modify(priv->qp, &qp_attr);
+	qp_attr.qp_state = IB_QPS_RTS;
+	qp_attr.sq_psn   = 0;
+	attr_mask |= IB_QP_SQ_PSN;
+	attr_mask &= ~IB_QP_PKEY_INDEX;
+	ret = ib_modify_qp(priv->qp, &qp_attr, attr_mask, &qp_cap);
 	if (ret) {
 		TS_REPORT_FATAL(MOD_IB_NET,
				"%s: failed to modify QP to RTS, ret = %d",
@@ -176,7 +186,7 @@
 	return 0;

 out_fail:
-	ib_qp_destroy(priv->qp);
+	ib_destroy_qp(priv->qp);
 	priv->qp = NULL;

 	return -EINVAL;
@@ -186,7 +196,7 @@
 {
 	struct ipoib_dev_priv *priv = dev->priv;

-	if (ib_qp_destroy(priv->qp))
+	if (ib_destroy_qp(priv->qp))
 		TS_REPORT_WARN(MOD_IB_NET,
			       "%s: ib_qp_destroy failed", dev->name);
@@ -227,7 +237,7 @@
 	priv->mr = ib_reg_phys_mr(priv->pd, &buffer_list,
				  1, 		/* list_len */
-				  IB_MR_LOCAL_WRITE,
+				  IB_ACCESS_LOCAL_WRITE,
				  &dummy_iova);
 	if (IS_ERR(priv->mr)) {
 		TS_REPORT_FATAL(MOD_IB_NET,
@@ -255,7 +265,7 @@
 	struct ipoib_dev_priv *priv = dev->priv;

 	if (priv->qp != NULL) {
-		if (ib_qp_destroy(priv->qp))
+		if (ib_destroy_qp(priv->qp))
 			TS_REPORT_WARN(MOD_IB_NET,
				       "%s: ib_qp_destroy failed", dev->name);

Index: src/linux-kernel/infiniband/ulp/ipoib/ipoib_ib.c
===================================================================
--- src/linux-kernel/infiniband/ulp/ipoib/ipoib_ib.c	(revision 621)
+++ src/linux-kernel/infiniband/ulp/ipoib/ipoib_ib.c	(working copy)
@@ -33,20 +33,20 @@
			  u64 work_request_id, dma_addr_t addr)
 {
-	struct ib_gather_scatter list = {
-		.address = addr,
+	struct ib_sge list = {
+		.addr   = addr,
 		.length = IPOIB_BUF_SIZE,
-		.key    = priv->lkey,
+		.lkey   = priv->lkey,
 	};
-	struct ib_receive_param param = {
-		.work_request_id     = work_request_id,
-		.scatter_list        = &list,
-		.num_scatter_entries = 1,
-		.device_specific     = NULL,
-		.signaled            = 1,
+	struct ib_recv_wr param = {
+		.wr_id      = work_request_id,
+		.sg_list    = &list,
+		.num_sge    = 1,
+		.recv_flags = IB_RECV_SIGNALED
 	};
+	struct ib_recv_wr *bad_wr;

-	return ib_receive(priv->qp, &param, 1);
+	return ib_post_recv(priv->qp, &param, &bad_wr);
 }

 /* =============================================================== */
@@ -286,23 +286,24 @@
		 struct ib_ah *address, u32 qpn,
		 dma_addr_t addr, int len)
 {
-	struct ib_gather_scatter list = {
-		.address = addr,
+	struct ib_sge list = {
+		.addr   = addr,
 		.length = len,
-		.key    = priv->lkey,
+		.lkey   = priv->lkey,
 	};
-	struct ib_send_param param = {
-		.work_request_id    = work_request_id,
-		.op                 = IB_OP_SEND,
-		.gather_list        = &list,
-		.num_gather_entries = 1,
-		.dest_qpn           = qpn,
-		.dest_qkey          = priv->qkey,
-		.dest_address       = address,
-		.signaled           = 1,
+	struct ib_send_wr param = {
+		.wr_id              = work_request_id,
+		.opcode             = IB_WR_SEND,
+		.sg_list            = &list,
+		.num_sge            = 1,
+		.wr.ud.remote_qpn   = qpn,
+		.wr.ud.remote_qkey  = priv->qkey,
+		.wr.ud.ah           = address,
+		.send_flags         = IB_SEND_SIGNALED,
 	};
+	struct ib_send_wr *bad_wr;

-	return ib_send(priv->qp, &param, 1);
+	return ib_post_send(priv->qp, &param, &bad_wr);
 }

 /* =============================================================== */
Index: src/linux-kernel/infiniband/ulp/srp/srp_host.c
===================================================================
--- src/linux-kernel/infiniband/ulp/srp/srp_host.c	(revision 620)
+++ src/linux-kernel/infiniband/ulp/srp/srp_host.c	(working copy)
@@ -405,7 +405,7 @@
 	} else if (srp_pkt->in_use) {
-		srp_pkt->scatter_gather_list.address =
+		srp_pkt->scatter_gather_list.addr =
		    (u64) (unsigned long)srp_pkt->data;
 		srp_pkt->scatter_gather_list.length = srp_cmd_pkt_size;
 		srp_pkt->in_use = 0;
@@ -436,7 +436,7 @@
 	} else if (srp_pkt->in_use) {
-		srp_pkt->scatter_gather_list.address =
+		srp_pkt->scatter_gather_list.addr =
		    (u64) (unsigned long)srp_pkt->data;
 		srp_pkt->scatter_gather_list.length = srp_cmd_pkt_size;
 		srp_pkt->in_use = 0;
@@ -501,7 +501,7 @@
 	if (recv_pkt) {
 		recv_pkt->conn = s;
-		recv_pkt->scatter_gather_list.key =
+		recv_pkt->scatter_gather_list.lkey =
		    target->l_key[s->port->hca->hca_index];
 		recv_pkt->scatter_gather_list.length = srp_cmd_pkt_size;
@@ -606,8 +606,8 @@
 	target->srp_pkt_data_mhndl[hca_index] =
	    ib_reg_phys_mr(hca->pd_hndl, &buffer_list,
			   1,	/*list_len */
-			   IB_MR_LOCAL_WRITE |
-			   IB_MR_REMOTE_READ,
+			   IB_ACCESS_LOCAL_WRITE |
+			   IB_ACCESS_REMOTE_READ,
			   &iova);

 	if (IS_ERR(target->srp_pkt_data_mhndl[hca_index])) {
@@ -631,7 +631,7 @@
 		srp_pkt->conn   = INVALID_CONN_HANDLE;
 		srp_pkt->target = target;
 		srp_pkt->data   = srp_pkt_data;
-		srp_pkt->scatter_gather_list.address =
+		srp_pkt->scatter_gather_list.addr =
		    (u64) (unsigned long)
 srp_pkt_data;
 		srp_pkt->scatter_gather_list.length = srp_cmd_pkt_size;
@@ -2361,11 +2361,11 @@
 		ioq->recv_pkt = recv_pkt;

 		send_pkt->conn = s;
-		send_pkt->scatter_gather_list.key =
+		send_pkt->scatter_gather_list.lkey =
		    target->l_key[s->port->hca->hca_index];

 		recv_pkt->conn = s;
-		recv_pkt->scatter_gather_list.key =
+		recv_pkt->scatter_gather_list.lkey =
		    target->l_key[s->port->hca->hca_index];
 	} else {

Index: src/linux-kernel/infiniband/ulp/srp/srp_host.h
===================================================================
--- src/linux-kernel/infiniband/ulp/srp/srp_host.h	(revision 620)
+++ src/linux-kernel/infiniband/ulp/srp/srp_host.h	(working copy)
@@ -244,7 +244,7 @@
 	int pkt_index;
 	int in_use;
 	char *data;
-	struct ib_gather_scatter scatter_gather_list;
+	struct ib_sge scatter_gather_list;
 	u32 r_key;
 } srp_pkt_t;

Index: src/linux-kernel/infiniband/ulp/srp/srptp.c
===================================================================
--- src/linux-kernel/infiniband/ulp/srp/srptp.c	(revision 620)
+++ src/linux-kernel/infiniband/ulp/srp/srptp.c	(working copy)
@@ -148,16 +148,17 @@
 int srptp_post_recv(srp_pkt_t * srp_pkt)
 {
 	int status;
-	struct ib_receive_param rcv_param;
+	struct ib_recv_wr rcv_param;
+	struct ib_recv_wr *bad_wr;

-	memset(&rcv_param, 0x00, sizeof(struct ib_receive_param));
-	rcv_param.work_request_id     = srp_pkt->pkt_index;
-	rcv_param.scatter_list        = &srp_pkt->scatter_gather_list;
-	rcv_param.num_scatter_entries = 1;
-	rcv_param.device_specific     = NULL;
-	rcv_param.signaled            = 1;
+	memset(&rcv_param, 0x00, sizeof rcv_param);
+	rcv_param.next       = NULL;
+	rcv_param.wr_id      = srp_pkt->pkt_index;
+	rcv_param.sg_list    = &srp_pkt->scatter_gather_list;
+	rcv_param.num_sge    = 1;
+	rcv_param.recv_flags = IB_RECV_SIGNALED;

-	status = ib_receive(srp_pkt->conn->qp_hndl, &rcv_param, 1);
+	status = ib_post_recv(srp_pkt->conn->qp_hndl, &rcv_param, &bad_wr);
 	if (status) {
 		TS_REPORT_FATAL(MOD_SRPTP, "Post Recv Failed failed");
@@ -173,17 +174,19 @@
  */
 int srptp_post_send(srp_pkt_t * srp_pkt)
 {
-	struct ib_send_param send_param;
+	struct ib_send_wr send_param;
+	struct ib_send_wr *bad_wr;
 	int status;

-	memset(&send_param, 0x00, sizeof(struct ib_send_param));
-	send_param.work_request_id    = srp_pkt->pkt_index;
-	send_param.op                 = IB_OP_SEND;
-	send_param.gather_list        = &srp_pkt->scatter_gather_list;
-	send_param.num_gather_entries = 1;
-	send_param.signaled           = 1;
+	memset(&send_param, 0x00, sizeof send_param);
+	send_param.next       = NULL;
+	send_param.wr_id      = srp_pkt->pkt_index;
+	send_param.opcode     = IB_WR_SEND;
+	send_param.sg_list    = &srp_pkt->scatter_gather_list;
+	send_param.num_sge    = 1;
+	send_param.send_flags = IB_SEND_SIGNALED;

-	status = ib_send(srp_pkt->conn->qp_hndl, &send_param, 1);
+	status = ib_post_send(srp_pkt->conn->qp_hndl, &send_param, &bad_wr);
 	if (status) {
 		TS_REPORT_FATAL(MOD_SRPTP, "ib_send failed: %d", status);
@@ -460,7 +463,7 @@
 		}
 	}

-	status = ib_qp_destroy(qp_hndl);
+	status = ib_destroy_qp(qp_hndl);
 	if (status)
 		TS_REPORT_WARN(MOD_SRPTP, "QP destroy failed");
@@ -489,7 +492,7 @@
 		srptp_reason = SRPTP_FAILURE;
 	}

-	status = ib_qp_destroy(qp_hndl);
+	status = ib_destroy_qp(qp_hndl);
 	if (status)
 		TS_REPORT_WARN(MOD_SRPTP, "QP destroy failed");
@@ -506,7 +509,7 @@
	 */
 	srptp_reason = SRPTP_FAILURE;

-	status = ib_qp_destroy(qp_hndl);
+	status = ib_destroy_qp(qp_hndl);
 	if (status)
 		TS_REPORT_WARN(MOD_SRPTP, "QP destroy failed");
@@ -652,8 +655,10 @@
		  struct ib_path_record *path_record,
		  char *srp_login_req, int srp_login_req_len)
 {
-	struct ib_qp_create_param qp_param;
-	struct ib_qp_attribute *qp_attr;
+	struct ib_qp_init_attr init_attr;
+	struct ib_qp_cap qp_cap;
+	struct ib_qp_attr *qp_attr;
+	int attr_mask;
 	struct ib_cm_active_param active_param;
 	srp_host_hca_params_t *hca;
 	int status;
@@ -665,52 +670,52 @@
	 * connection
	 */
-	qp_param.limit.max_outstanding_send_request    = MAX_SEND_WQES;
-	qp_param.limit.max_outstanding_receive_request = MAX_RECV_WQES;
-	qp_param.limit.max_send_gather_element         = 1;
-	qp_param.limit.max_receive_scatter_element     = 1;
+	init_attr.cap.max_send_wr  = MAX_SEND_WQES;
+	init_attr.cap.max_recv_wr  = MAX_RECV_WQES;
+	init_attr.cap.max_send_sge = 1;
+	init_attr.cap.max_recv_sge = 1;

-	qp_param.pd              = hca->pd_hndl;
-	qp_param.send_queue      = conn->cqs_hndl;
-	qp_param.receive_queue   = conn->cqr_hndl;
-	qp_param.send_policy     = IB_WQ_SIGNAL_ALL;
-	qp_param.receive_policy  = IB_WQ_SIGNAL_ALL;
-	qp_param.transport       = IB_TRANSPORT_RC;
-	qp_param.device_specific = NULL;
+	init_attr.send_cq     = conn->cqs_hndl;
+	init_attr.recv_cq     = conn->cqr_hndl;
+	init_attr.sq_sig_type = IB_SIGNAL_ALL_WR;
+	init_attr.rq_sig_type = IB_SIGNAL_ALL_WR;
+	init_attr.qp_type     = IB_QPT_RC;

-	status = ib_qp_create(&qp_param, &conn->qp_hndl, &conn->qpn);
-	if (status) {
-		TS_REPORT_FATAL(MOD_SRPTP, "QP Create failed %d", status);
-		return (status);
+	conn->qp_hndl = ib_create_qp(hca->pd_hndl, &init_attr, &qp_cap);
+	if (IS_ERR(conn->qp_hndl)) {
+		TS_REPORT_FATAL(MOD_SRPTP, "QP Create failed %d",
				PTR_ERR(conn->qp_hndl));
+		return PTR_ERR(conn->qp_hndl);
 	}
+	conn->qpn = conn->qp_hndl->qp_num;

	/*
	 * Modify QP to init state
	 */
-	qp_attr = kmalloc(sizeof(struct ib_qp_attribute), GFP_ATOMIC);
+	qp_attr = kmalloc(sizeof *qp_attr, GFP_ATOMIC);
 	if (qp_attr == NULL) {
 		return (ENOMEM);
 	}
-	memset(qp_attr, 0x00, sizeof(struct ib_qp_attribute));
+	memset(qp_attr, 0x00, sizeof *qp_attr);

-	qp_attr->valid_fields |= IB_QP_ATTRIBUTE_STATE;
-	qp_attr->state = IB_QP_STATE_INIT;
+	attr_mask = IB_QP_STATE;
+	qp_attr->qp_state = IB_QPS_INIT;

-	qp_attr->valid_fields |= IB_QP_ATTRIBUTE_RDMA_ATOMIC_ENABLE;
-	qp_attr->enable_rdma_read  = 1;
-	qp_attr->enable_rdma_write = 1;
+	attr_mask |= IB_QP_ACCESS_FLAGS;
+	qp_attr->qp_access_flags =
	    IB_ACCESS_REMOTE_READ | IB_ACCESS_REMOTE_WRITE;

-	qp_attr->valid_fields |= IB_QP_ATTRIBUTE_PORT;
+	attr_mask |= IB_QP_PORT;
 	qp_attr->port = conn->port->local_port;

-	qp_attr->valid_fields |= IB_QP_ATTRIBUTE_PKEY_INDEX;
+	attr_mask |= IB_QP_PKEY_INDEX;
 	qp_attr->pkey_index = 0;

-	status = ib_qp_modify(conn->qp_hndl, qp_attr);
+	status = ib_modify_qp(conn->qp_hndl, qp_attr, attr_mask, &qp_cap);
 	kfree(qp_attr);
 	if (status) {
-		ib_qp_destroy(conn->qp_hndl);
+		ib_destroy_qp(conn->qp_hndl);
 		return (status);
 	}
	/*
@@ -743,7 +748,7 @@
 		TS_REPORT_FATAL(MOD_SRPTP, "tsIbConnect failed: %d", status);
 		conn->comm_id = (tTS_IB_CM_COMM_ID) - 1;

-		ib_qp_destroy(conn->qp_hndl);
+		ib_destroy_qp(conn->qp_hndl);
 		srp_host_close_conn(conn);
 		return (status);

Index: src/linux-kernel/infiniband/ulp/sdp/sdp_send.c
===================================================================
--- src/linux-kernel/infiniband/ulp/sdp/sdp_send.c	(revision 620)
+++ src/linux-kernel/infiniband/ulp/sdp/sdp_send.c	(working copy)
@@ -243,7 +243,8 @@
 /*.._sdp_send_buff_post -- Post a buffer send on a SDP connection. */
 static s32 _sdp_send_buff_post(struct sdp_opt *conn, struct sdpc_buff *buff)
 {
-	struct ib_send_param send_param = { 0 };
+	struct ib_send_wr send_param = { 0 };
+	struct ib_send_wr *bad_wr;
 	s32 result;

 	TS_CHECK_NULL(conn, -EINVAL);
@@ -270,7 +271,7 @@
 	result = _sdp_msg_cpu_to_net_bsdh(buff->bsdh_hdr);
 	TS_EXPECT(MOD_LNX_SDP, !(0 > result));

-	send_param.op = IB_OP_SEND;
+	send_param.opcode = IB_WR_SEND;
	/*
	 * OOB processing. If there is a single OOB byte in flight then the
	 * pending flag is set as early as possible.
 IF a second OOB byte
@@ -319,7 +320,7 @@
	 */
 	if (0 < TS_SDP_BUFF_F_GET_SE(buff)) {

-		send_param.solicited_event = 1;
+		send_param.send_flags |= IB_SEND_SOLICITED;
 	}
	/*
	 * unsignalled event
@@ -333,7 +334,7 @@
 	else {
 		TS_SDP_BUFF_F_CLR_UNSIG(buff);
-		send_param.signaled = 1;
+		send_param.send_flags |= IB_SEND_SIGNALED;
 		conn->send_cons = 0;
 	}
	/*
@@ -357,11 +358,12 @@
	/*
	 * post send
	 */
-	send_param.work_request_id    = buff->ib_wrid;
-	send_param.gather_list        = TS_SDP_BUFF_GAT_SCAT(buff);
-	send_param.num_gather_entries = 1;
+	send_param.next    = NULL;
+	send_param.wr_id   = buff->ib_wrid;
+	send_param.sg_list = TS_SDP_BUFF_GAT_SCAT(buff);
+	send_param.num_sge = 1;

-	result = ib_send(conn->qp, &send_param, 1);
+	result = ib_post_send(conn->qp, &send_param, &bad_wr);
 	if (0 != result) {

 		TS_TRACE(MOD_LNX_SDP, T_VERBOSE, TRACE_FLOW_WARN,
@@ -529,7 +531,8 @@
 /*.._sdp_send_data_buff_snk -- Post data for buffered transmission */
 static s32 _sdp_send_data_buff_snk(struct sdp_opt *conn, struct sdpc_buff *buff)
 {
-	struct ib_send_param send_param = { 0 };
+	struct ib_send_wr send_param = { 0 };
+	struct ib_send_wr *bad_wr;
 	struct sdpc_advt *advt;
 	s32 result;
 	s32 zcopy;
@@ -586,10 +589,10 @@
	/*
	 * setup RDMA write
	 */
-	send_param.op             = IB_OP_RDMA_WRITE;
-	send_param.remote_address = advt->addr;
-	send_param.rkey           = advt->rkey;
-	send_param.signaled       = 1;
+	send_param.opcode              = IB_WR_RDMA_WRITE;
+	send_param.wr.rdma.remote_addr = advt->addr;
+	send_param.wr.rdma.rkey        = advt->rkey;
+	send_param.send_flags          = IB_SEND_SIGNALED;

 	buff->ib_wrid = conn->send_wrid++;
 	buff->lkey    = conn->l_key;
@@ -630,11 +633,12 @@
	/*
	 * post RDMA
	 */
-	send_param.work_request_id    = buff->ib_wrid;
-	send_param.gather_list        = TS_SDP_BUFF_GAT_SCAT(buff);
-	send_param.num_gather_entries = 1;
+	send_param.next    = NULL;
+	send_param.wr_id   = buff->ib_wrid;
+	send_param.sg_list = TS_SDP_BUFF_GAT_SCAT(buff);
+	send_param.num_sge = 1;

-	result = ib_send(conn->qp, &send_param, 1);
+	result = ib_post_send(conn->qp, &send_param, &bad_wr);
 	if (0 != result) {

 		TS_TRACE(MOD_LNX_SDP, T_VERBOSE, TRACE_FLOW_WARN,
@@ -681,8 +685,9 @@
 /*.._sdp_send_data_iocb_snk -- process a zcopy write advert in the data path */
 s32 _sdp_send_data_iocb_snk(struct sdp_opt *conn, struct sdpc_iocb *iocb)
 {
-	struct ib_send_param send_param = { 0 };
-	struct ib_gather_scatter sg_val;
+	struct ib_send_wr send_param = { 0 };
+	struct ib_send_wr *bad_wr;
+	struct ib_sge sg_val;
 	struct sdpc_advt *advt;
 	s32 result;
 	s32 zcopy;
@@ -722,14 +727,14 @@
	 */
 	zcopy = min(advt->size, iocb->len);

-	sg_val.address = iocb->io_addr;
-	sg_val.key     = iocb->l_key;
+	sg_val.addr = iocb->io_addr;
+	sg_val.lkey = iocb->l_key;
 	sg_val.length  = zcopy;

-	send_param.op             = IB_OP_RDMA_WRITE;
-	send_param.remote_address = advt->addr;
-	send_param.rkey           = advt->rkey;
-	send_param.signaled       = 1;
+	send_param.opcode              = IB_WR_RDMA_WRITE;
+	send_param.wr.rdma.remote_addr = advt->addr;
+	send_param.wr.rdma.rkey        = advt->rkey;
+	send_param.send_flags          = IB_SEND_SIGNALED;

 	iocb->wrid = conn->send_wrid++;
 	iocb->len -= zcopy;
@@ -757,11 +762,12 @@
	/*
	 * post RDMA
	 */
-	send_param.work_request_id    = iocb->wrid;
-	send_param.gather_list        = &sg_val;
-	send_param.num_gather_entries = 1;
+	send_param.next    = NULL;
+	send_param.wr_id   = iocb->wrid;
+	send_param.sg_list = &sg_val;
+	send_param.num_sge = 1;

-	result = ib_send(conn->qp, &send_param, 1);
+	result = ib_post_send(conn->qp, &send_param, &bad_wr);
 	if (0 != result) {

 		TS_TRACE(MOD_LNX_SDP, T_VERBOSE, TRACE_FLOW_WARN,
Index: src/linux-kernel/infiniband/ulp/sdp/sdp_conn.c
===================================================================
--- src/linux-kernel/infiniband/ulp/sdp/sdp_conn.c	(revision 619)
+++ src/linux-kernel/infiniband/ulp/sdp/sdp_conn.c	(working copy)
@@ -697,7 +697,7 @@
	 */
 	if (conn->qp) {

-		result = ib_qp_destroy(conn->qp);
+		result = ib_destroy_qp(conn->qp);
 		if (0 > result && -EINVAL != result) {

 			TS_TRACE(MOD_LNX_SDP, T_TERSE, TRACE_FLOW_WARN,
@@ -997,10 +997,12 @@
		     struct ib_device *device, tTS_IB_PORT hw_port)
 {
-	struct ib_qp_create_param *qp_param;
-	struct ib_qp_attribute *qp_attr;
+	struct ib_qp_init_attr *init_attr;
+	struct ib_qp_attr *qp_attr;
 	struct sdev_hca_port *port = NULL;
 	struct sdev_hca *hca = NULL;
+	struct ib_qp_cap qp_cap;
+	int attr_mask;
 	int result;

 	TS_CHECK_NULL(conn, -EINVAL);
@@ -1031,22 +1033,22 @@
	/*
	 * allocate creation parameters
	 */
-	qp_attr = kmalloc(sizeof(struct ib_qp_attribute), GFP_KERNEL);
+	qp_attr = kmalloc(sizeof *qp_attr, GFP_KERNEL);
 	if (NULL == qp_attr) {

 		result = -ENOMEM;
 		goto error_attr;
 	}			/* if */

-	qp_param = kmalloc(sizeof(struct ib_qp_create_param), GFP_KERNEL);
-	if (NULL == qp_param) {
+	init_attr = kmalloc(sizeof *init_attr, GFP_KERNEL);
+	if (NULL == init_attr) {

 		result = -ENOMEM;
 		goto error_param;
 	}			/* if */

-	memset(qp_attr, 0, sizeof(struct ib_qp_attribute));
-	memset(qp_param, 0, sizeof(struct ib_qp_create_param));
+	memset(qp_attr, 0, sizeof *qp_attr);
+	memset(init_attr, 0, sizeof *init_attr);
	/*
	 * set port specific connection parameters.
	 */
@@ -1115,49 +1117,47 @@
 	if (!conn->qp) {

-		qp_param->limit.max_outstanding_send_request    = conn->send_cq_size;
-		qp_param->limit.max_outstanding_receive_request = conn->recv_cq_size;
-		qp_param->limit.max_send_gather_element         = TS_SDP_QP_LIMIT_SG_SEND;
-		qp_param->limit.max_receive_scatter_element     = TS_SDP_QP_LIMIT_SG_RECV;
+		init_attr->cap.max_send_wr  = conn->send_cq_size;
+		init_attr->cap.max_recv_wr  = conn->recv_cq_size;
+		init_attr->cap.max_send_sge = TS_SDP_QP_LIMIT_SG_SEND;
+		init_attr->cap.max_recv_sge = TS_SDP_QP_LIMIT_SG_RECV;

-		qp_param->pd             = conn->pd;
-		qp_param->send_queue     = conn->send_cq;
-		qp_param->receive_queue  = conn->recv_cq;
-		qp_param->send_policy    = IB_WQ_SIGNAL_SELECTABLE;
-		qp_param->receive_policy = IB_WQ_SIGNAL_ALL;
-		qp_param->transport      = IB_TRANSPORT_RC;
+		init_attr->send_cq     = conn->send_cq;
+		init_attr->recv_cq     = conn->recv_cq;
+		init_attr->sq_sig_type = IB_SIGNAL_REQ_WR;
+		init_attr->rq_sig_type = IB_SIGNAL_ALL_WR;
+		init_attr->qp_type     = IB_QPT_RC;

-		result = ib_qp_create(qp_param, &conn->qp, &conn->s_qpn);
-		if (0 != result) {
+		conn->qp = ib_create_qp(conn->pd, init_attr, &qp_cap);
+		if (IS_ERR(conn->qp)) {
+			result = PTR_ERR(conn->qp);

 			TS_TRACE(MOD_LNX_SDP, T_TERSE, TRACE_FLOW_FATAL,
				 "INIT: Error <%d> creating queue pair.",
				 result);
 			goto error_qp;
 		}
+		conn->s_qpn = conn->qp->qp_num;
+
		/*
		 * modify QP to INIT
		 */
-		memset(qp_attr, 0, sizeof(struct ib_qp_attribute));
+		memset(qp_attr, 0, sizeof(struct ib_qp_attr));

-		qp_attr->valid_fields |= IB_QP_ATTRIBUTE_STATE;
-		qp_attr->state = IB_QP_STATE_INIT;
+		attr_mask = IB_QP_STATE;
+		qp_attr->qp_state = IB_QPS_INIT;

-		qp_attr->valid_fields |= IB_QP_ATTRIBUTE_RDMA_ATOMIC_ENABLE;
-		qp_attr->enable_rdma_read  = 1;
-		qp_attr->enable_rdma_write = 1;
+		attr_mask |= IB_QP_ACCESS_FLAGS;
+		qp_attr->qp_access_flags =
		    IB_ACCESS_REMOTE_READ | IB_ACCESS_REMOTE_WRITE;

-		qp_attr->valid_fields |= IB_QP_ATTRIBUTE_PORT;
+		attr_mask |= IB_QP_PORT;
 		qp_attr->port = conn->hw_port;

-		qp_attr->valid_fields |= IB_QP_ATTRIBUTE_PKEY_INDEX;
+		attr_mask |= IB_QP_PKEY_INDEX;
 		qp_attr->pkey_index = 0;

-		result = ib_qp_modify(conn->qp, qp_attr);
+		result = ib_modify_qp(conn->qp, qp_attr, attr_mask, &qp_cap);
 		if (0 != result) {
@@ -1172,7 +1172,7 @@
 	goto done;

error_mod:
-	(void)ib_qp_destroy(conn->qp);
+	(void)ib_destroy_qp(conn->qp);
error_qp:
 	(void)ib_destroy_cq(conn->recv_cq);
error_rcq:
@@ -1182,7 +1182,7 @@
 	conn->recv_cq = NULL;
 	conn->qp      = NULL;
done:
-	kfree(qp_param);
+	kfree(init_attr);
error_param:
 	kfree(qp_attr);
error_attr:
Index: src/linux-kernel/infiniband/ulp/sdp/sdp_recv.c
===================================================================
--- src/linux-kernel/infiniband/ulp/sdp/sdp_recv.c	(revision 619)
+++ src/linux-kernel/infiniband/ulp/sdp/sdp_recv.c	(working copy)
@@ -40,7 +40,8 @@
 /*.._sdp_post_recv_buff -- post a single buffers for data recv */
 static s32 _sdp_post_recv_buff(struct sdp_opt *conn)
 {
-	struct ib_receive_param receive_param = { 0 };
+	struct ib_recv_wr receive_param = { 0 };
+	struct ib_recv_wr *bad_wr;
 	s32 result;
 	struct sdpc_buff *buff;
@@ -88,12 +89,13 @@
	/*
	 * post recv
	 */
-	receive_param.work_request_id     = buff->ib_wrid;
-	receive_param.scatter_list        = TS_SDP_BUFF_GAT_SCAT(buff);
-	receive_param.num_scatter_entries = 1;
-	receive_param.signaled            = 1;
+	receive_param.next       = NULL;
+	receive_param.wr_id      = buff->ib_wrid;
+	receive_param.sg_list    = TS_SDP_BUFF_GAT_SCAT(buff);
+	receive_param.num_sge    = 1;
+	receive_param.recv_flags = IB_RECV_SIGNALED;

-	result = ib_receive(conn->qp, &receive_param, 1);
+	result = ib_post_recv(conn->qp, &receive_param, &bad_wr);
 	if (0 != result) {

 		TS_TRACE(MOD_LNX_SDP, T_VERBOSE, TRACE_FLOW_WARN,
@@ -115,7 +117,8 @@
 /*.._sdp_post_rdma_buff -- post a single buffers for rdma read on a conn */
 static s32 _sdp_post_rdma_buff(struct sdp_opt *conn)
 {
-	struct ib_send_param send_param = { 0 };
+	struct ib_send_wr send_param = { 0 };
+	struct ib_send_wr *bad_wr;
 	struct sdpc_advt *advt;
 	s32 result;
 	struct sdpc_buff *buff;
@@ -162,10 +165,10 @@
 	buff->ib_wrid = TS_SDP_WRID_READ_FLAG | conn->recv_wrid++;

-	send_param.op             = IB_OP_RDMA_READ;
-	send_param.remote_address = advt->addr;
-	send_param.rkey           = advt->rkey;
-	send_param.signaled       = 1;
+	send_param.opcode              = IB_WR_RDMA_READ;
+	send_param.wr.rdma.remote_addr = advt->addr;
+	send_param.wr.rdma.rkey        = advt->rkey;
+	send_param.send_flags          = IB_SEND_SIGNALED;

 	advt->wrid  = buff->ib_wrid;
 	advt->size -= (buff->tail - buff->data);
@@ -222,11 +225,12 @@
	/*
	 * post rdma
	 */
-	send_param.work_request_id    = buff->ib_wrid;
-	send_param.gather_list        = TS_SDP_BUFF_GAT_SCAT(buff);
-	send_param.num_gather_entries = 1;
+	send_param.next    = NULL;
+	send_param.wr_id   = buff->ib_wrid;
+	send_param.sg_list = TS_SDP_BUFF_GAT_SCAT(buff);
+	send_param.num_sge = 1;

-	result = ib_send(conn->qp, &send_param, 1);
+	result = ib_post_send(conn->qp, &send_param, &bad_wr);
 	if (0 != result) {

 		TS_TRACE(MOD_LNX_SDP, T_VERBOSE, TRACE_FLOW_WARN,
@@ -250,8 +254,9 @@
 /*.._sdp_post_rdma_iocb_src
-- post a iocb for rdma read on a conn */ static s32 _sdp_post_rdma_iocb_src(struct sdp_opt *conn) { - struct ib_send_param send_param = { 0 }; - struct ib_gather_scatter sg_val; + struct ib_send_wr send_param = { 0 }; + struct ib_send_wr *bad_wr; + struct ib_sge sg_val; struct sdpc_iocb *iocb; struct sdpc_advt *advt; s32 result; @@ -307,14 +312,14 @@ */ zcopy = min(advt->size, iocb->len); - sg_val.address = iocb->io_addr; - sg_val.key = iocb->l_key; + sg_val.addr = iocb->io_addr; + sg_val.lkey = iocb->l_key; sg_val.length = zcopy; - send_param.op = IB_OP_RDMA_READ; - send_param.remote_address = advt->addr; - send_param.rkey = advt->rkey; - send_param.signaled = 1; + send_param.opcode = IB_WR_RDMA_READ; + send_param.wr.rdma.remote_addr = advt->addr; + send_param.wr.rdma.rkey = advt->rkey; + send_param.send_flags = IB_SEND_SIGNALED; iocb->wrid = TS_SDP_WRID_READ_FLAG | conn->recv_wrid++; iocb->len -= zcopy; @@ -392,11 +397,12 @@ /* * post RDMA */ - send_param.work_request_id = iocb->wrid; - send_param.gather_list = &sg_val; - send_param.num_gather_entries = 1; + send_param.next = NULL; + send_param.wr_id = iocb->wrid; + send_param.sg_list = &sg_val; + send_param.num_sge = 1; - result = ib_send(conn->qp, &send_param, 1); + result = ib_post_send(conn->qp, &send_param, &bad_wr); if (0 != result) { TS_TRACE(MOD_LNX_SDP, T_VERBOSE, TRACE_FLOW_WARN, Index: src/linux-kernel/infiniband/ulp/sdp/sdp_buff.h =================================================================== --- src/linux-kernel/infiniband/ulp/sdp/sdp_buff.h (revision 619) +++ src/linux-kernel/infiniband/ulp/sdp/sdp_buff.h (working copy) @@ -93,7 +93,7 @@ #define TS_SDP_BUFF_GAT_SCAT(buff) \ ({ (buff)->real = virt_to_phys((buff)->data); \ (buff)->size = (buff)->tail - (buff)->data; \ - (struct ib_gather_scatter *)(&(buff)->real); }) + (struct ib_sge *)(&(buff)->real); }) /* * function prototypes used in certain functions. 
*/ From roland at topspin.com Fri Aug 13 10:49:24 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 13 Aug 2004 10:49:24 -0700 Subject: [openib-general] [PATCH][0/4] QP API patches Message-ID: <10924193641148@topspin.com> This series of patches (broken up because the total diff is rather giant) updates my branch to the new QP API. - R. From halr at voltaire.com Fri Aug 13 13:34:09 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 13 Aug 2004 16:34:09 -0400 Subject: [openib-general] [PATCH] GSI: Move some temporary definitions Message-ID: <1092429249.2926.34.camel@localhost.localdomain> Move some temporary definitions Index: access/gsi_main.c =================================================================== --- access/gsi_main.c (revision 636) +++ access/gsi_main.c (working copy) @@ -71,6 +71,7 @@ #include "gsi_redir.h" #endif #include "ib_core.h" +#include "ib_core_types.h" #include "vversion.h" #include "vv_list.h" #include "rmpp/rmpp_api.h" Index: include/ib_core.h =================================================================== --- include/ib_core.h (revision 634) +++ include/ib_core.h (working copy) @@ -27,6 +27,17 @@ struct ib_device *ib_device_get_by_name(const char *name); struct ib_device *ib_device_get_by_index(int index); +enum { + IB_DEVICE_NOTIFIER_ADD, + IB_DEVICE_NOTIFIER_REMOVE +}; + +struct ib_device_notifier { + void (*notifier) (struct ib_device_notifier * self, + struct ib_device * device, int event); + struct list_head list; +}; + int ib_device_notifier_register(struct ib_device_notifier *notifier); int ib_device_notifier_deregister(struct ib_device_notifier *notifier); Index: include/ib_core_types.h =================================================================== --- include/ib_core_types.h (revision 634) +++ include/ib_core_types.h (working copy) @@ -22,17 +22,6 @@ #ifndef _IB_CORE_TYPES_H #define _IB_CORE_TYPES_H -enum { - IB_DEVICE_NOTIFIER_ADD, - IB_DEVICE_NOTIFIER_REMOVE -}; - -struct ib_device_notifier { - void 
(*notifier) (struct ib_device_notifier * self, - struct ib_device * device, int event); - struct list_head list; -}; - struct ib_grh { u8 ip_version; u8 traffic_class; From halr at voltaire.com Fri Aug 13 13:52:29 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 13 Aug 2004 16:52:29 -0400 Subject: [openib-general] [PATCH] GSI: Make pools named Message-ID: <1092430349.1831.37.camel@localhost.localdomain> Make pools named Index: gsi_main.c =================================================================== --- gsi_main.c (revision 642) +++ gsi_main.c (working copy) @@ -756,8 +756,9 @@ printk(KERN_DEBUG "Created QP - %d\n", hca->qp->qp_num); - ret = gsi_dtgrm_pool_create(GSI_QP_RCV_SIZE, - &hca->rcv_dtgrm_pool); + ret = gsi_dtgrm_pool_create_named(GSI_QP_RCV_SIZE, + "rcv", + &hca->rcv_dtgrm_pool); if (ret < 0) { printk(KERN_ERR "Could not create receive datagram pool\n"); goto error6; @@ -887,16 +888,18 @@ #ifdef GSI_RMPP_SUPPORT if (rmpp) { - if (gsi_dtgrm_pool_create(GSI_RMPP_RCV_POOL_SIZE, - &newinfo->rmpp_rcv_dtgrm_pool) < 0) { + if (gsi_dtgrm_pool_create_named(GSI_RMPP_RCV_POOL_SIZE, + "rmpp-rcv", + &newinfo->rmpp_rcv_dtgrm_pool) < 0) { printk(KERN_ERR \ "Could not create RMPP receive pool\n"); ret = -ENOMEM; goto error3; } - if (gsi_dtgrm_pool_create(GSI_RMPP_SND_POOL_SIZE, - &newinfo->rmpp_snd_dtgrm_pool) < 0) { + if (gsi_dtgrm_pool_create_named(GSI_RMPP_SND_POOL_SIZE, + "rmpp-snd", + &newinfo->rmpp_snd_dtgrm_pool) < 0) { printk(KERN_ERR "Could not create RMPP send pool\n"); ret = -ENOMEM; goto error4; @@ -2942,27 +2945,19 @@ /* * Create datagram pool */ -#if 0 /* GSI_POOL_TRACE */ -int gsi_dtgrm_pool_create_named(u32 cnt, void **handle, char *modname) -#else -int gsi_dtgrm_pool_create(u32 cnt, void **handle) -#endif +int gsi_dtgrm_pool_create_named(u32 cnt, char *pool_name, void **handle) { struct gsi_dtgrm_pool_info_st *pool; char name[GSI_POOL_MAX_NAME_LEN]; -#if 0 /* GSI_POOL_TRACE */ /* * Sanity check */ - if (!modname || (strlen(modname) > 
(GSI_POOL_MAX_NAME_LEN - 8))) { + if (!pool_name || (strlen(pool_name) > (GSI_POOL_MAX_NAME_LEN - 8))) { printk(KERN_ERR "Invalid pool name\n"); return -ENOENT; } - sprintf(name, "gsi_%s%-d", modname, gsi_pool_cnt++); -#else - sprintf(name, "gsi%-d", gsi_pool_cnt++); -#endif + sprintf(name, "gsi_%s%-d", pool_name, gsi_pool_cnt++); pool = kmalloc(sizeof (*pool), GFP_KERNEL); if (!pool) { @@ -3158,11 +3153,7 @@ */ EXPORT_SYMBOL_NOVERS(gsi_reg_class); EXPORT_SYMBOL_NOVERS(gsi_dereg_class); -#if 0 /* GSI_POOL_TRACE */ EXPORT_SYMBOL_NOVERS(gsi_dtgrm_pool_create_named); -#else -EXPORT_SYMBOL_NOVERS(gsi_dtgrm_pool_create); -#endif EXPORT_SYMBOL_NOVERS(gsi_dtgrm_pool_destroy); EXPORT_SYMBOL_NOVERS(gsi_dtgrm_pool_get); EXPORT_SYMBOL_NOVERS(gsi_dtgrm_pool_put); Index: gsi.h =================================================================== --- gsi.h (revision 634) +++ gsi.h (working copy) @@ -59,10 +59,6 @@ #include "mad.h" #include "class_port_info.h" -#if 0 -#define GSI_POOL_TRACE -#endif - #define GSI_DTGRM_POOL_MAX_SIZE 5000 #define GSI_QP1_WELL_KNOWN_Q_KEY 0x80010000 @@ -183,14 +179,8 @@ /* * Create a datagram pool (see struct gsi_dtgrm_t) */ -#if 0 /* GSI_POOL_TRACE */ -int gsi_dtgrm_pool_create_named(u32 cnt, void **handle, char *modname); +int gsi_dtgrm_pool_create_named(u32 cnt, char *pool_name, void **handle); -#define gsi_dtgrm_pool_create(cnt, handle) gsi_dtgrm_pool_create_named((cnt), (handle), (char *)((THIS_MODULE)->name)) -#else -int gsi_dtgrm_pool_create(u32 cnt, void **handle); -#endif - /* * Destroy datagram pool */ From Tom.Duffy at Sun.COM Fri Aug 13 14:07:21 2004 From: Tom.Duffy at Sun.COM (Tom Duffy) Date: Fri, 13 Aug 2004 14:07:21 -0700 Subject: [openib-general] [PATCH][TRIVIAL] Fix mad_ib.c build In-Reply-To: <10924193653247@topspin.com> References: <10924193653247@topspin.com> Message-ID: <1092431240.29661.8.camel@localhost> Index: drivers/infiniband/core/mad_ib.c =================================================================== --- 
drivers/infiniband/core/mad_ib.c (revision 643) +++ drivers/infiniband/core/mad_ib.c (working copy) @@ -104,7 +104,7 @@ mad->dqpn ? IB_GSI_WELL_KNOWN_QKEY : 0; pci_unmap_addr_set(&priv->send_buf[mad->port][mad->sqpn][index], - mapping, gather_list.address); + mapping, gather_list.addr); if (ib_post_send(priv->qp[mad->port][mad->sqpn], &send_param, &bad_wr)) { TS_REPORT_WARN(MOD_KERNEL_IB, @@ -324,7 +324,7 @@ priv->receive_buf[port][qpn][index].buf = buf; pci_unmap_addr_set(&priv->receive_buf[port][qpn][index], - mapping, scatter_list.address); + mapping, scatter_list.addr); if (ib_post_recv(priv->qp[port][qpn], &receive_param, &bad_wr)) { TS_REPORT_WARN(MOD_KERNEL_IB, From halr at voltaire.com Fri Aug 13 14:14:11 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 13 Aug 2004 17:14:11 -0400 Subject: [openib-general] [PATCH] GSI: Create send pool per HCA Message-ID: <1092431651.1831.40.camel@localhost.localdomain> Create send pool per HCA Index: gsi_main.c =================================================================== --- gsi_main.c (revision 643) +++ gsi_main.c (working copy) @@ -649,6 +649,7 @@ gsi_hca_stop(hca); gsi_thread_stop(hca); gsi_dtgrm_pool_destroy(hca->rcv_dtgrm_pool); + gsi_dtgrm_pool_destroy(hca->snd_dtgrm_pool); ib_destroy_qp(hca->qp); ib_destroy_cq(hca->cq); kfree(hca); @@ -764,6 +765,14 @@ goto error6; } + ret = gsi_dtgrm_pool_create_named(GSI_QP_SND_SIZE, + "snd", + &hca->snd_dtgrm_pool); + if (ret < 0) { + printk(KERN_ERR "Could not create send datagram pool\n"); + goto error7; + } + spin_lock_init(&hca->rcv_list_lock); spin_lock_init(&hca->snd_list_lock); @@ -774,7 +783,7 @@ ret = gsi_hca_start(hca); if (ret) { printk(KERN_ERR "Could not start device\n"); - goto error7; + goto error8; } GSI_HCA_LIST_LOCK(); @@ -783,9 +792,11 @@ return 0; -error7: +error8: gsi_thread_stop(hca); + gsi_dtgrm_pool_destroy(hca->snd_dtgrm_pool); +error7: gsi_dtgrm_pool_destroy(hca->rcv_dtgrm_pool); error6: ib_destroy_qp(hca->qp); @@ -852,7 +863,8 @@ #endif 
if ((ret = gsi_register_redirection(hca_name, hca->port, - class, &class_port_info))) { + class, + &class_port_info))) { printk(KERN_ERR \ "Could not register redirection for class (0x%x)!\n", class); @@ -877,6 +889,7 @@ newinfo->send_compl_cb = send_compl_cb; newinfo->receive_cb = receive_cb; newinfo->context = context; + newinfo->snd_dtgrm_pool = hca->snd_dtgrm_pool; newinfo->client_id = server ? GSI_SERVER_ID : gsi_curr_client_id++; spin_lock_init(&newinfo->redirect_class_port_info_list_lock); @@ -2473,15 +2486,19 @@ else printk("DOWN\n"); - printk(" tx:%-8u err:%-5u\n", - hca_info->stat.snd_cnt, hca_info->stat.snd_err_cnt); - printk (" rx:%-8u err:%-5u posted:%-6d pool size:%-6d dtgrm in pool:%-6d\n", hca_info->stat.rcv_cnt, hca_info->stat.rcv_err_cnt, hca_info->stat.rcv_posted_cnt, gsi_dtgrm_pool_size(hca_info->rcv_dtgrm_pool), gsi_dtgrm_pool_dtgrm_cnt(hca_info->rcv_dtgrm_pool)); + + printk + (" tx:%-8u err:%-5u posted:%-6d pool size:%-6d dtgrm in pool:%-6d\n", + hca_info->stat.snd_cnt, hca_info->stat.snd_err_cnt, + hca_info->stat.snd_posted_cnt, + gsi_dtgrm_pool_size(hca_info->snd_dtgrm_pool), + gsi_dtgrm_pool_dtgrm_cnt(hca_info->snd_dtgrm_pool)); } GSI_HCA_LIST_UNLOCK(); Index: gsi_priv.h =================================================================== --- gsi_priv.h (revision 635) +++ gsi_priv.h (working copy) @@ -184,6 +184,7 @@ int up; void *rcv_dtgrm_pool; + void *snd_dtgrm_pool; spinlock_t snd_list_lock; @@ -244,6 +245,7 @@ gsi_receive_cb_t receive_cb; void *context; + void *snd_dtgrm_pool; void *rmpp_h; void *rmpp_rcv_dtgrm_pool; void *rmpp_snd_dtgrm_pool; From roland at topspin.com Fri Aug 13 14:40:57 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 13 Aug 2004 14:40:57 -0700 Subject: [openib-general] [PATCH][TRIVIAL] Fix mad_ib.c build In-Reply-To: <1092431240.29661.8.camel@localhost> (Tom Duffy's message of "Fri, 13 Aug 2004 14:07:21 -0700") References: <10924193653247@topspin.com> <1092431240.29661.8.camel@localhost> Message-ID: 
<52llgizpg6.fsf@topspin.com> Thanks, applied (shame on me, I only tested on i386 where pci_unmap_addr_set() discards its parameters...). - R. From Tom.Duffy at Sun.COM Fri Aug 13 15:18:03 2004 From: Tom.Duffy at Sun.COM (Tom Duffy) Date: Fri, 13 Aug 2004 15:18:03 -0700 Subject: [openib-general] [PATCH] rename functions in ip2pr to linux standard conventions In-Reply-To: <1092163430.22057.27.camel@duffman> References: <1092163430.22057.27.camel@duffman> Message-ID: <1092435483.32316.11.camel@localhost> On Tue, 2004-08-10 at 11:43, Tom Duffy wrote: > This patch renames the functions in ip2pr to be in Linux standard naming > conventions. Roland, are you planning on integrating this patch? Or did you want me to do something different? -tduffy From roland at topspin.com Fri Aug 13 15:31:46 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 13 Aug 2004 15:31:46 -0700 Subject: [openib-general] [PATCH] rename functions in ip2pr to linux standard conventions In-Reply-To: <1092435483.32316.11.camel@localhost> (Tom Duffy's message of "Fri, 13 Aug 2004 15:18:03 -0700") References: <1092163430.22057.27.camel@duffman> <1092435483.32316.11.camel@localhost> Message-ID: <524qn6zn3h.fsf@topspin.com> Tom> Roland, are you planning on integrating this patch? Or did Tom> you want me to do something different? Sorry, I had put it in my to-apply folder and forgot about it... it looks fine. - R. From roland at topspin.com Fri Aug 13 15:32:54 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 13 Aug 2004 15:32:54 -0700 Subject: [openib-general] [PATCH] rename functions in ip2pr to linux standard conventions In-Reply-To: <1092435483.32316.11.camel@localhost> (Tom Duffy's message of "Fri, 13 Aug 2004 15:18:03 -0700") References: <1092163430.22057.27.camel@duffman> <1092435483.32316.11.camel@localhost> Message-ID: <52zn4yy8h5.fsf@topspin.com> OK, applied it... thanks. - R. 
From Tom.Duffy at Sun.COM Fri Aug 13 15:37:29 2004 From: Tom.Duffy at Sun.COM (Tom Duffy) Date: Fri, 13 Aug 2004 15:37:29 -0700 Subject: [openib-general] [PATCH] rename functions in ip2pr to linux standard conventions In-Reply-To: <52zn4yy8h5.fsf@topspin.com> References: <1092163430.22057.27.camel@duffman> <1092435483.32316.11.camel@localhost> <52zn4yy8h5.fsf@topspin.com> Message-ID: <1092436648.32316.14.camel@localhost> On Fri, 2004-08-13 at 15:32, Roland Dreier wrote: > OK, applied it... thanks. Cool. Thanks. Have a good weekend. -tduffy From Tom.Duffy at Sun.COM Fri Aug 13 16:06:16 2004 From: Tom.Duffy at Sun.COM (Tom Duffy) Date: Fri, 13 Aug 2004 16:06:16 -0700 Subject: Intent to remove ts_kernel_trace WAS[Re: [openib-general] [PATCH] kill ib_legacy.h] In-Reply-To: <20040810063116.GB6645@mellanox.co.il> References: <1092088324.14886.20.camel@duffman> <52isbr8yyu.fsf@topspin.com> <52acx38vte.fsf_-_@topspin.com> <20040810063116.GB6645@mellanox.co.il> Message-ID: <1092438376.32316.30.camel@localhost> On Mon, 2004-08-09 at 23:31, Michael S. Tsirkin wrote: > Incidentally, do you think ts_kernel_trace is still a good idea? > We have printk with priorities ... So, I would like to go through and get rid of TS_TRACE and replace with standard printk. I am planning on using the following mapping from tTS_TRACE_LEVEL to printk priority: T_VERY_TERSE -> KERN_ERR T_TERSE -> KERN_WARNING T_VERBOSE -> KERN_NOTICE T_VERY_VERBOSE -> KERN_INFO T_SCREAM -> KERN_DEBUG What do people think? 
-tduffy From halr at voltaire.com Fri Aug 13 16:37:44 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 13 Aug 2004 19:37:44 -0400 Subject: [openib-general] [PATCH] [TRIVIAL] In struct ib_wc, member name is src_qp rather than qp Message-ID: <1092440263.1838.13.camel@localhost.localdomain> In struct ib_wc, member name is src_qp rather than qp (to be consistent with Sean's ib_verbs.h version) in Roland's branch Index: core/mad_ib.c =================================================================== --- core/mad_ib.c (revision 649) +++ core/mad_ib.c (working copy) @@ -228,7 +228,7 @@ mad->pkey_index = entry->pkey_index; mad->slid = entry->slid; mad->dlid = entry->dlid_path_bits; - mad->sqpn = entry->qp; + mad->sqpn = entry->src_qp; mad->dqpn = wrid.field.qpn; if (entry->grh_flag) { Index: hw/mthca/mthca_cq.c =================================================================== --- hw/mthca/mthca_cq.c (revision 649) +++ hw/mthca/mthca_cq.c (working copy) @@ -476,7 +476,7 @@ } entry->slid = be16_to_cpu(cqe->rlid); entry->sl = be16_to_cpu(cqe->sl_g_mlpath) >> 12; - entry->qp = be32_to_cpu(cqe->rqpn) & 0xffffff; + entry->src_qp = be32_to_cpu(cqe->rqpn) & 0xffffff; entry->dlid_path_bits = be16_to_cpu(cqe->sl_g_mlpath) & 0x7f; entry->pkey_index = be32_to_cpu(cqe->imm_etype_pkey_eec) >> 16; entry->grh_flag = !!(be16_to_cpu(cqe->sl_g_mlpath) & 0x80); Index: include/ib_verbs.h =================================================================== --- include/ib_verbs.h (revision 649) +++ include/ib_verbs.h (working copy) @@ -111,7 +111,7 @@ u32 vendor_err; u32 byte_len; u32 imm_data; - u32 qp; + u32 src_qp; int grh_flag:1; int imm_data_valid:1; u16 pkey_index; Index: ulp/ipoib/ipoib_ib.c =================================================================== --- ulp/ipoib/ipoib_ib.c (revision 649) +++ ulp/ipoib/ipoib_ib.c (working copy) @@ -153,7 +153,7 @@ skb_pull(skb, TS_IB_GRH_BYTES); if (entry->slid != priv->local_lid || - entry->qp != priv->local_qpn) { + entry->src_qp 
!= priv->local_qpn) { struct ethhdr *header; skb->protocol = *(uint16_t *)skb->data; From roland at topspin.com Fri Aug 13 17:33:26 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 13 Aug 2004 17:33:26 -0700 Subject: [openib-general] [PATCH] [TRIVIAL] In struct ib_wc, member name is src_qp rather than qp In-Reply-To: <1092440263.1838.13.camel@localhost.localdomain> (Hal Rosenstock's message of "Fri, 13 Aug 2004 19:37:44 -0400") References: <1092440263.1838.13.camel@localhost.localdomain> Message-ID: <52vffmy2w9.fsf@topspin.com> sorry, missed the qp->src_qp change when it happened. I've applied this patch. Thanks, Roland From roland at topspin.com Fri Aug 13 18:14:42 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 13 Aug 2004 18:14:42 -0700 Subject: Intent to remove ts_kernel_trace WAS[Re: [openib-general] [PATCH] kill ib_legacy.h] In-Reply-To: <1092438376.32316.30.camel@localhost> (Tom Duffy's message of "Fri, 13 Aug 2004 16:06:16 -0700") References: <1092088324.14886.20.camel@duffman> <52isbr8yyu.fsf@topspin.com> <52acx38vte.fsf_-_@topspin.com> <20040810063116.GB6645@mellanox.co.il> <1092438376.32316.30.camel@localhost> Message-ID: <52r7qay0zh.fsf@topspin.com> Tom> So, I would like to go through and get rid of TS_TRACE and Tom> replace with standard printk. Tom> What do people think? Some TS_TRACE messages can be directly replaced with printk (although in some cases dev_printk might be nice to get a little more info printed). For example stuff like TS_REPORT_WARN(MOD_KERNEL_IB, "Device %s is missing mandatory function %s", device->name, mandatory_table[i].name); return -EINVAL; (in core_device.c) can obviously be replaced with a printk(KERN_ERR ...). However most of the trace messages (probably everything at level VERBOSE and above) are for debugging and need to be wrapped in #ifdef DEBUG or (better) dynamically controlled with a debug level settable through sysfs. 
For example most of the tracing in cm_*.c is way too verbose to printk all the time (even at level KERN_DEBUG). - R. From roland at topspin.com Fri Aug 13 18:42:44 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 13 Aug 2004 18:42:44 -0700 Subject: [openib-general] modify_device API? Message-ID: <52n00yxzor.fsf@topspin.com> I'm starting to look at implementing the new device query/modify API, and I have a few questions about the modify_device API. First, the enum ib_device_attr_flags enum ib_device_attr_flags { IB_DEVICE_SM = 1, IB_DEVICE_SNMP_TUN_SUP = (1<<1), IB_DEVICE_DM_SUP = (1<<2), IB_DEVICE_VENDOR_CLS_SUP = (1<<3), IB_DEVICE_RESET_QKEY_CNTR = (1<<4) }; seems to leave out a few things that IBTA says can be changed, namely system image GUID, port shutdown, and PortInfo:InitType. Also, there are a couple of extensions we may want to add, namely setting IsCM and possibly IsClientReregistrationSupported in the capabilities. Next, the API int ib_modify_device(struct ib_device *device, u8 port_num, int device_attr_flags); seems to leave out the actual properties structure. One last minor question: system image GUID is really per-device (since it's in NodeInfo, not PortInfo), so requiring a port number to set it seems a little unclean. Is it worth creating a new entry point for setting system image GUID (and any other per-device settings we want)? - Roland From ftillier at infiniconsys.com Fri Aug 13 19:14:19 2004 From: ftillier at infiniconsys.com (Fab Tillier) Date: Fri, 13 Aug 2004 19:14:19 -0700 Subject: [openib-general] modify_device API? In-Reply-To: <52n00yxzor.fsf@topspin.com> Message-ID: <000201c481a4$68e22fd0$655aa8c0@infiniconsys.com> > From: Roland Dreier [mailto:roland at topspin.com] > Sent: Friday, August 13, 2004 6:43 PM > > I'm starting to look at implementing the new device query/modify API, > and I have a few questions about the modify_device API. 
> > First, the enum ib_device_attr_flags > > enum ib_device_attr_flags { > IB_DEVICE_SM = 1, > IB_DEVICE_SNMP_TUN_SUP = (1<<1), > IB_DEVICE_DM_SUP = (1<<2), > IB_DEVICE_VENDOR_CLS_SUP = (1<<3), > IB_DEVICE_RESET_QKEY_CNTR = (1<<4) > }; > > seems to leave out a few things that IBTA says can be changed, namely > system image GUID, port shutdown, and PortInfo:InitType. > > Also, there are a couple of extensions we may want to add, namely > setting IsCM and possibly IsClientReregistrationSupported in the > capabilities. I agree we should support setting these bits. > > Next, the API > > int ib_modify_device(struct ib_device *device, > u8 port_num, > int device_attr_flags); > > seems to leave out the actual properties structure. Only if you add things like system image GUID and InitType. Otherwise, the flags provide all the information needed. > > One last minor question: system image GUID is really per-device (since > it's in NodeInfo, not PortInfo), so requiring a port number to set it > seems a little unclean. Is it worth creating a new entry point for > setting system image GUID (and any other per-device settings we want)? > InitType is also per-device. I'd suggest ib_modify_device should be ib_modify_port, and a new API for the system image GUID and InitType created, named something like ib_modify_node. This way, the function names give a clue as to what is being changed - port info for ib_modify_port, and node info for ib_modify_node. Thoughts? - Fab From roland at topspin.com Fri Aug 13 20:08:41 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 13 Aug 2004 20:08:41 -0700 Subject: [openib-general] modify_device API? In-Reply-To: <000201c481a4$68e22fd0$655aa8c0@infiniconsys.com> (Fab Tillier's message of "Fri, 13 Aug 2004 19:14:19 -0700") References: <000201c481a4$68e22fd0$655aa8c0@infiniconsys.com> Message-ID: <52fz6qxvpi.fsf@topspin.com> Fab> Only if you add things like system image GUID and InitType. 
Fab> Otherwise, the flags provide all the information needed. Hmm... true in a way but what if someone eg only wants to set the IsSM bit without touching anything else? I guess they could do a query first to find out the current state but even that leaves a race open between the query and the modify. Fab> InitType is also per-device. In my copy of the 1.1 spec it's PortInfo:InitType, which seems to indicate that it's actually per-port (unless this has been changed in the errata). Fab> I'd suggest ib_modify_device should be ib_modify_port, and a Fab> new API for the system image GUID and InitType created, named Fab> something like ib_modify_node. This way, the function names Fab> give a clue as to what is being changed - port info for Fab> ib_modify_port, and node info for ib_modify_node. This is what we did for the Topspin API. - R. From halr at voltaire.com Sat Aug 14 08:44:29 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Sat, 14 Aug 2004 11:44:29 -0400 Subject: [openib-general] Linux 2.6.8/2.6.8.1 Message-ID: <1092498268.1831.2.camel@localhost.localdomain> Here's the diff file to add to Roland's branch to get started with Linux 2.6.8/2.6.8.1. 
-- Hal -------------- next part -------------- --- linux-2.6.8/drivers/Kconfig 2004-08-14 11:34:09.000000000 -0400 +++ linux-2.6.8/drivers/Kconfig.orig 2004-08-14 01:38:04.000000000 -0400 @@ -54,6 +54,4 @@ source "drivers/usb/Kconfig" -source "drivers/infiniband/Kconfig" - endmenu --- linux-2.6.8/drivers/Makefile 2004-08-14 11:35:00.000000000 -0400 +++ linux-2.6.8/drivers/Makefile.orig 2004-08-14 01:37:38.000000000 -0400 @@ -50,5 +50,4 @@ obj-$(CONFIG_MCA) += mca/ obj-$(CONFIG_EISA) += eisa/ obj-$(CONFIG_CPU_FREQ) += cpufreq/ -obj-$(CONFIG_INFINIBAND) += infiniband/ obj-y += firmware/ From roland at topspin.com Sat Aug 14 09:21:11 2004 From: roland at topspin.com (Roland Dreier) Date: Sat, 14 Aug 2004 09:21:11 -0700 Subject: [openib-general] Linux 2.6.8/2.6.8.1 In-Reply-To: <1092498268.1831.2.camel@localhost.localdomain> (Hal Rosenstock's message of "Sat, 14 Aug 2004 11:44:29 -0400") References: <1092498268.1831.2.camel@localhost.localdomain> Message-ID: <52u0v5zo5k.fsf@topspin.com> Thanks (the patch was reversed but easy to fix). I added it as linux-2.6.8.1-infiniband.diff since no one should be using linux-2.6.8. - R. From roland at topspin.com Sat Aug 14 09:36:53 2004 From: roland at topspin.com (Roland Dreier) Date: Sat, 14 Aug 2004 09:36:53 -0700 Subject: [openib-general] [PATCH] mthca updates (2.6.8 dependent) Message-ID: <52pt5tznfe.fsf@topspin.com> Now that Linus has released 2.6.8 (and 2.6.8.1), I've just committed a few updates to mthca. The major change is to use the new MSI/MSI-X API in 2.6.8 (which means that mthca will not build against older kernels). Since mthca now depends on 2.6.8 anyway, I got rid of the mthca_pci.h file (which just has defines that are all in 2.6.8). Finally, I changed the logic to unconditionally reset the HCA during initialization, since I've found that the previous initialization logic may not work when handed an HCA in an unknown state. 
To try MSI or MSI-X, you need a platform that supports MSI (right now, that means an Intel system -- I've only tested i386, but x86-64 with an Intel CPU and IA64 should theoretically work too). Then build your kernel with CONFIG_PCI_MSI=y and add msi=1 and/or msi_x=1 to your mthca module options (if both are set, msi_x will be tried first). I believe MSI-X requires HCA firmware that Mellanox has not yet released. MSI mode seems to work with current firmware (although I have seen some unexplained machine checks when using MSI). MSI-X seems to improve performance somewhat, because it allows the driver to register three separate interrupt vectors and avoid having to do a slow MMIO read of the event cause register in the interrupt handler. I don't know if there's much point to plain MSI, because Linux only allows a driver to have a single interrupt even in MSI mode. As usual all comments and test reports are gratefully accepted. - Roland Index: hw/mthca/mthca_reset.c =================================================================== --- hw/mthca/mthca_reset.c (revision 649) +++ hw/mthca/mthca_reset.c (working copy) @@ -25,6 +25,7 @@ #include #include #include +#include #include "mthca_dev.h" #include "mthca_cmd.h" @@ -33,7 +34,6 @@ { int i; int err = 0; - u8 status; u32 *hca_header = NULL; u32 *bridge_header = NULL; struct pci_dev *bridge = NULL; @@ -41,16 +41,9 @@ #define MTHCA_RESET_OFFSET 0xf0010 #define MTHCA_RESET_VALUE cpu_to_be32(1) - mthca_info(mdev, "HCA already enabled -- restarting.\n"); - /* Shut down the HCA cleanly first; assume it has two ports */ - mthca_CLOSE_IB(mdev, 1, &status); - mthca_CLOSE_IB(mdev, 2, &status); - mthca_CLOSE_HCA(mdev, 0, &status); - mthca_SYS_DIS(mdev, &status); - /* - * Now reset the chip. This is somewhat ugly because we have - * to save off the PCI header before reset and then restore it + * Reset the chip. 
This is somewhat ugly because we have to + * save off the PCI header before reset and then restore it * after the chip reboots. We skip config space offsets 22 * and 23 since those have a special meaning. * @@ -74,7 +67,12 @@ } if (!bridge) { - mthca_err(mdev, "No bridge found for %s (%s), aborting\n", + /* + * Didn't find a bridge for a Tavor device -- + * assume we're in no-bridge mode and hope for + * the best. + */ + mthca_warn(mdev, "No bridge found for %s (%s)\n", pci_pretty_name(mdev->pdev), pci_name(mdev->pdev)); return -ENODEV; } @@ -139,8 +137,7 @@ } /* Docs say to wait one second before accessing device */ - set_current_state(TASK_UNINTERRUPTIBLE); - schedule_timeout(HZ); + msleep(1000); /* Now wait for PCI device to start responding again */ { @@ -156,20 +153,18 @@ } if (v != 0xffffffff) - break; - } + goto good; - set_current_state(TASK_UNINTERRUPTIBLE); - schedule_timeout(HZ / 10); - - if (c == 100) { - err = -ENODEV; - mthca_err(mdev, "PCI device did not come back after reset, " - "aborting.\n"); - goto out; + msleep(100); } + + err = -ENODEV; + mthca_err(mdev, "PCI device did not come back after reset, " + "aborting.\n"); + goto out; } +good: /* Now restore the PCI headers */ if (bridge) { /* @@ -217,7 +212,7 @@ goto out; } - out: +out: if (bridge) pci_dev_put(bridge); kfree(bridge_header); Index: hw/mthca/Kconfig =================================================================== --- hw/mthca/Kconfig (revision 649) +++ hw/mthca/Kconfig (working copy) @@ -14,24 +14,6 @@ messages. Select this is you are developing the driver or trying to diagnose a problem. -config INFINIBAND_MTHCA_MSI - bool "MSI (Message Signaled Interrupt) support (EXPERIMENTAL)" - depends on INFINIBAND_MTHCA && PCI_USE_VECTOR && EXPERIMENTAL - ---help--- - This option will have the mthca driver attempt to use MSI - (message signaled interrupts) instead of old-style INTx PCI - interrupts. 
- -config INFINIBAND_MTHCA_MSI_X - bool "MSI-X support (EXPERIMENTAL)" - depends on INFINIBAND_MTHCA && PCI_USE_VECTOR && EXPERIMENTAL && BROKEN - ---help--- - This option will have the mthca driver attempt to use MSI-X - (extended message signaled interrupts). This allows the - driver to use separate interrupts for each event queue, - which may improve performance. If both MSI and MSI-X are - selected, MSI-X will be tried first. - config INFINIBAND_MTHCA_SSE_DOORBELL bool "SSE doorbell code" depends on INFINIBAND_MTHCA && X86 && !X86_64 Index: hw/mthca/mthca_main.c =================================================================== --- hw/mthca/mthca_main.c (revision 649) +++ hw/mthca/mthca_main.c (working copy) @@ -35,7 +35,6 @@ #endif #include "mthca_dev.h" -#include "mthca_pci.h" #include "mthca_config_reg.h" #include "mthca_cmd.h" #include "mthca_profile.h" @@ -45,6 +44,23 @@ MODULE_LICENSE("Dual BSD/GPL"); MODULE_VERSION(DRV_VERSION); +#ifdef CONFIG_PCI_MSI + +static int msi_x = 0; +module_param(msi_x, int, 0444); +MODULE_PARM_DESC(msi_x, "attempt to use MSI-X if nonzero"); + +static int msi = 0; +module_param(msi, int, 0444); +MODULE_PARM_DESC(msi_x, "attempt to use MSI if nonzero"); + +#else /* CONFIG_PCI_MSI */ + +#define msi_x (0) +#define msi (0) + +#endif /* CONFIG_PCI_MSI */ + static const char mthca_version[] __devinitdata = "ib_mthca: Mellanox InfiniBand HCA driver v" DRV_VERSION " (" DRV_RELDATE ")\n"; @@ -106,36 +122,9 @@ return err; } if (status) { - if (status == MTHCA_CMD_STAT_BAD_SYS_STATE || - status == MTHCA_CMD_STAT_BAD_OP) { - /* - * The HCA is already running, probably - * because a boot ROM left it enabled. - * Disable it and try again. 
- */ - err = mthca_reset(mdev); - if (err) { - mthca_err(mdev, "Failed to reset running HCA, " - "aborting.\n"); - return err; - } - - err = mthca_SYS_EN(mdev, &status); - if (err) { - mthca_err(mdev, "SYS_EN command failed, " - "aborting.\n"); - return err; - } - if (status) { - mthca_err(mdev, "SYS_EN returned status 0x%02x, " - "aborting.\n", status); - return -EINVAL; - } - } else { - mthca_err(mdev, "SYS_EN returned status 0x%02x, " - "aborting.\n", status); - return -EINVAL; - } + mthca_err(mdev, "SYS_EN returned status 0x%02x, " + "aborting.\n", status); + return -EINVAL; } err = mthca_QUERY_FW(mdev, &status); @@ -409,7 +398,6 @@ pci_release_region(pdev, 4); } -#ifdef CONFIG_INFINIBAND_MTHCA_MSI_X static int mthca_enable_msi_x(struct mthca_dev *mdev) { struct msix_entry entries[3]; @@ -419,7 +407,7 @@ entries[1].entry = 1; entries[2].entry = 2; - err = pci_enable_msix(mdev->pdev, &entries, ARRAY_SIZE(entries)); + err = pci_enable_msix(mdev->pdev, entries, ARRAY_SIZE(entries)); if (err) { if (err > 0) mthca_info(mdev, "Only %d MSI-X vectors available, " @@ -433,19 +421,7 @@ return 0; } -#endif /* CONFIG_INFINIBAND_MTHCA_MSI_X */ -static void mthca_free_irqs(struct mthca_dev *dev) -{ - int i; - - if (dev->eq_table.have_irq) - free_irq(dev->pdev->irq, dev); - for (i = 0; i < MTHCA_NUM_EQ; ++i) - if (dev->eq_table.eq[i].have_irq) - free_irq(dev->eq_table.eq[i].msi_x_vector, dev); -} - static int __devinit mthca_init_one(struct pci_dev *pdev, const struct pci_device_id *id) { @@ -534,15 +510,22 @@ if (ddr_hidden) mdev->mthca_flags |= MTHCA_FLAG_DDR_HIDDEN; -#ifdef CONFIG_INFINIBAND_MTHCA_MSI_X - if (!mthca_enable_msi_x(mdev)) + /* + * Now reset the HCA before we touch the PCI capabilities or + * attempt a firmware command, since a boot ROM may have left + * the HCA in an undefined state. 
+ */ + err = mthca_reset(mdev); + if (err) { + mthca_err(mdev, "Failed to reset HCA, aborting.\n"); + goto err_out_free_dev; + } + + if (msi_x && !mthca_enable_msi_x(mdev)) mdev->mthca_flags |= MTHCA_FLAG_MSI_X; -#endif -#ifdef CONFIG_INFINIBAND_MTHCA_MSI - if (!(mdev->mthca_flags & MTHCA_FLAG_MSI_X) && + if (msi && !(mdev->mthca_flags & MTHCA_FLAG_MSI_X) && !pci_enable_msi(pdev)) mdev->mthca_flags |= MTHCA_FLAG_MSI; -#endif sema_init(&mdev->cmd.hcr_sem, 1); sema_init(&mdev->cmd.poll_sem, 1); @@ -619,8 +602,6 @@ mthca_SYS_DIS(mdev, &status); } - mthca_free_irqs(mdev); - err_out_iounmap_kar: iounmap((void *) mdev->kar); @@ -631,6 +612,11 @@ iounmap((void *) mdev->hcr); err_out_free_dev: + if (mdev->mthca_flags & MTHCA_FLAG_MSI_X) + pci_disable_msix(pdev); + if (mdev->mthca_flags & MTHCA_FLAG_MSI) + pci_disable_msi(pdev); + kfree(mdev); err_out_free_res: @@ -668,15 +654,15 @@ mthca_CLOSE_HCA(mdev, 0, &status); mthca_SYS_DIS(mdev, &status); - /* - * We don't free our IRQ(s) until after SYS_DIS, - * because it seems that SYS_DIS rewrites the PCI - * config space, and if we are using MSI, we want to - * make sure MSI stays disabled after we unload. - */ - mthca_free_irqs(mdev); + iounmap((void *) mdev->hcr); iounmap((void *) mdev->clr_base); + + if (mdev->mthca_flags & MTHCA_FLAG_MSI_X) + pci_disable_msix(pdev); + if (mdev->mthca_flags & MTHCA_FLAG_MSI) + pci_disable_msi(pdev); + kfree(mdev); mthca_release_regions(pdev, mdev->mthca_flags & MTHCA_FLAG_DDR_HIDDEN); Index: hw/mthca/ChangeLog =================================================================== --- hw/mthca/ChangeLog (revision 649) +++ hw/mthca/ChangeLog (working copy) @@ -31,3 +31,8 @@ HCA firmware that has not been released yet. Both MSI and MSI-X may be less stable than using standard INTx). Implement more asynchronous events. + Always reset the HCA on initialization since we we + might not be able to tell that the HCA was left running. + Remove mthca_pci.h (since we already depend on kernel + 2.6.8). 
+ Rework API to conform with new OpenIB verbs. Index: hw/mthca/mthca_profile.c =================================================================== --- hw/mthca/mthca_profile.c (revision 649) +++ hw/mthca/mthca_profile.c (working copy) @@ -26,7 +26,7 @@ #include "mthca_profile.h" -static int use_profile = 0; +static int use_profile; module_param(use_profile, int, 0444); MODULE_PARM_DESC(use_profile, "load HCA profile through sysfs firmware " Index: hw/mthca/mthca_eq.c =================================================================== --- hw/mthca/mthca_eq.c (revision 649) +++ hw/mthca/mthca_eq.c (working copy) @@ -502,6 +502,18 @@ kfree(mailbox); } +static void mthca_free_irqs(struct mthca_dev *dev) +{ + int i; + + if (dev->eq_table.have_irq) + free_irq(dev->pdev->irq, dev); + for (i = 0; i < MTHCA_NUM_EQ; ++i) + if (dev->eq_table.eq[i].have_irq) + free_irq(dev->eq_table.eq[i].msi_x_vector, + dev->eq_table.eq + i); +} + int __devinit mthca_init_eq_table(struct mthca_dev *dev) { int err; @@ -561,7 +573,7 @@ for (i = 0; i < MTHCA_NUM_EQ; ++i) { err = request_irq(dev->eq_table.eq[i].msi_x_vector, mthca_msi_x_interrupt, 0, - eq_name[i], &dev->eq_table.eq[i]); + eq_name[i], dev->eq_table.eq + i); if (err) goto err_out_cmd; dev->eq_table.eq[i].have_irq = 1; @@ -593,6 +605,7 @@ return 0; err_out_cmd: + mthca_free_irqs(dev); mthca_free_eq(dev, &dev->eq_table.eq[MTHCA_EQ_CMD]); err_out_async: @@ -611,6 +624,8 @@ u8 status; int i; + mthca_free_irqs(dev); + mthca_MAP_EQ(dev, MTHCA_ASYNC_EVENT_MASK, 1, dev->eq_table.eq[MTHCA_EQ_ASYNC].eqn, &status); mthca_MAP_EQ(dev, MTHCA_CMD_EVENT_MASK, Index: hw/mthca/mthca_pci.h =================================================================== --- hw/mthca/mthca_pci.h (revision 649) +++ hw/mthca/mthca_pci.h (working copy) @@ -1,55 +0,0 @@ -/* - * This software is available to you under a choice of one of two - * licenses. 
You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available at - * , or the OpenIB.org BSD - * license, available in the LICENSE.TXT file accompanying this - * software. These details are also available at - * . - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - * Copyright (c) 2004 Topspin Communications. All rights reserved. - * - * $Id$ - */ - -/* - * This file will be removed once all IDs are merged into an official - * kernel release (they are already in Linus's BK tree and 2.6.7-rc2). - */ - -#ifndef MTHCA_PCI_H -#define MTHCA_PCI_H - -#if !defined(PCI_VENDOR_ID_MELLANOX) -#define PCI_VENDOR_ID_MELLANOX 0x15b3 -#endif - -#if !defined(PCI_VENDOR_ID_TOPSPIN) -#define PCI_VENDOR_ID_TOPSPIN 0x1867 -#endif - -#if !defined(PCI_DEVICE_ID_MELLANOX_TAVOR) -#define PCI_DEVICE_ID_MELLANOX_TAVOR 23108 -#endif - -#if !defined(PCI_DEVICE_ID_MELLANOX_ARBEL_COMPAT) -#define PCI_DEVICE_ID_MELLANOX_ARBEL_COMPAT 25208 -#endif - -#endif /* MTHCA_PCI_H */ - -/* - * Local Variables: - * c-file-style: "linux" - * indent-tabs-mode: t - * End: - */ From mst at mellanox.co.il Sun Aug 15 00:41:03 2004 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Sun, 15 Aug 2004 10:41:03 +0300 Subject: [openib-general] ib_req_ncomp_notif in core_ layer In-Reply-To: <20040812093126.1adfc9f3.mshefty@ichips.intel.com> References: <20040811145158.09f20f5e.mshefty@ichips.intel.com> <000501c47ff9$b82e41c0$655aa8c0@infiniconsys.com> <20040812064718.GB28866@mellanox.co.il> <20040812093126.1adfc9f3.mshefty@ichips.intel.com> Message-ID: <20040815074102.GD31266@mellanox.co.il> Quoting r. Sean Hefty (mshefty at ichips.intel.com) "Re: [openib-general] ib_req_ncomp_notif in core_ layer": > On Thu, 12 Aug 2004 09:47:18 +0300 > "Michael S. Tsirkin" wrote: > > > Nope, sorry. Tavor will generate an event if completions are generated > > *after the event was generated*. > > > > So > > arm > > completion > > event > > arm <-- no event > > > > But > > > > arm > > completion > > event > > completion > > arm <-- event > > Is the event generated only if the CQ is armed for the next completion? > Or is it generated even in the case that the CQ is armed for the next > solicited completion? If we add a generate event flag for the send and > receive work requests, what sort of behavior is possible if the CQ is armed > with completions still on it? All these work pretty much as you would expect, I think. Here's an example for solicited: arm solicited completion (solicited) event completion (not solicited) arm solicited <-- no event But arm solicited completion (solicited) event completion (solicited) arm <-- event Similarly for send/receive work requests marked for events. MST From gdror at mellanox.co.il Sun Aug 15 04:28:26 2004 From: gdror at mellanox.co.il (Dror Goldenberg) Date: Sun, 15 Aug 2004 14:28:26 +0300 Subject: [openib-general] Client Reregistration Status Message-ID: <506C3D7B14CDD411A52C00025558DED605C6870D@mtlex01.yok.mtl.com> > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Friday, August 13, 2004 12:24 AM > 4. Query HCA can read IsClientReregistration capability mask bit. > > 5.
Addition of new asynchronous unaffiliated event for client > reregistration > > o Client Reregistration Event - issued when SM requests > client reregistration (see ) > > o11-6.1.2: If the CI indicates that the port supports client > reregistration, the CI shall generate a Client Reregistration > Event when the SMA receives this request from the SM. > > Note that the last 2 items (4 and 5) need approval by SWG and > are subject to change. > It will be discussed on Tuesday. I don't expect any problems. Anyway, I'll update you after the meeting. Thanks Dror -------------- next part -------------- An HTML attachment was scrubbed... URL: From mst at mellanox.co.il Sun Aug 15 05:50:27 2004 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 15 Aug 2004 15:50:27 +0300 Subject: [openib-general] qp lock in mthca_poll_cq In-Reply-To: <523c2s2xre.fsf@topspin.com> References: <20040812072806.GA803@mellanox.co.il> <20040812074643.GB803@mellanox.co.il> <523c2s2xre.fsf@topspin.com> Message-ID: <20040815125027.GA410@mellanox.co.il> Hello, Roland! Thanks, the comment in mthca_provider.h clarified things for me. A couple of small things I'd like to remark on: Quoting r. Roland Dreier (roland at topspin.com) "Re: [openib-general] qp lock in mthca_poll_cq": > There are two separate uses of the QP during CQ poll, which is why > refcount is an atomic_t and we also take the spinlock. > > First, the refcount is used to make sure that destroy_qp does not > get rid of the QP struct while the QP is being accessed to handle > poll_cq. Wouldn't locking the cq while the QP is being destroyed also work? And maybe the eq which gets the async events. > It is an atomic_t because it may be accessed without the QP > lock being held (eg when an async event is received for the QP). I see.
Maybe this code: spin_lock(&dev->qp_table.lock); qp = mthca_array_get(&dev->qp_table.qp, qpn & (dev->limits.num_qps - 1)); if (qp) atomic_inc(&qp->refcount); spin_unlock(&dev->qp_table.lock); and this: if (atomic_dec_and_test(&qp->refcount)) wake_up(&qp->wait); should be factored into inline functions? There appear to be several users of such constructs in mthca_cq and mthca_qp. Something like mthca_qp_get / mthca_qp_put. > > Second, the QP's lock is taken during CQE processing because other > non-atomic parts of the QP struct such as the number of WQEs > outstanding _are_ modified and need to be protected against concurrent > access from the send/receive post routine. > > It might be possible to avoid taking the QP lock in poll_cq by making > the current WQE count an atomic_t, but I'm not sure if that's a win > because it means that send/receive posting would have to use atomic > accesses as well (and I don't think you can make posting WQEs lock-free). It seems so, since one has to serialize the access to the queue itself. > Some of my reasoning is in the comment near the bottom of > mthca_provider.h too. It says there: * We have one global lock that protects dev->cq/qp_table. Each * struct mthca_cq/qp also has its own lock. No locks should be * nested inside each other. However mthca_poll_one takes qp->lock while cq->lock is being held in mthca_poll_cq. Isn't that nesting? > - Roland MST
Sean Hefty (mshefty at ichips.intel.com) "Re: [openib-general] ib_req_ncomp_notif in core_ layer": > On Thu, 12 Aug 2004 09:47:18 +0300 > "Michael S. Tsirkin" wrote: > > > Either way this would have to be documented. > > What do others here think? > > Thanks for the clarification. > I think we want ib_req_notify_cq to behave as you described. How does > the current implementation of req_ncomp_notify work? What options does > the hardware support for n>1? Not sure which implementation you are referring to. Hardware, for n>1, reports an event after arm N if and only if 1. there was a completion with error or a solicited completion (or a descriptor marked for event) after the last completion event, and this completion was not yet polled when the arm was performed [this is the same as for a request for solicited notification] or 2. at least n completions were not polled yet. (2) means that if you did arm N and you see an event, you really want to poll at least once or you'll get another event immediately. > I will also add some documentation around these calls (depending on if > they can be combined) for clarification. From gdror at mellanox.co.il Sun Aug 15 12:16:45 2004 From: gdror at mellanox.co.il (Dror Goldenberg) Date: Sun, 15 Aug 2004 22:16:45 +0300 Subject: [openib-general] [PATCH] mthca updates (2.6.8 dependent) Message-ID: <506C3D7B14CDD411A52C00025558DED605C6879C@mtlex01.yok.mtl.com> > -----Original Message----- > From: Roland Dreier [mailto:roland at topspin.com] > Sent: Saturday, August 14, 2004 7:37 PM > > Now that Linus has released 2.6.8 (and 2.6.8.1), I've just > committed a few updates to mthca. The major change is to use > the new MSI/MSI-X API in 2.6.8 (which means that mthca will > not build against older kernels). Since mthca now depends on > 2.6.8 anyway, I got rid of the mthca_pci.h file (which just > has defines that are all in 2.6.8).
Finally, I changed the > logic to unconditionally reset the HCA during initialization, > since I've found that the previous initialization logic may > not work when handed an HCA in an unknown state. This is worth some discussion. What is the case where you expect to get the HCA in an unknown state? Is it when handing over from the pre-OS driver to the OS driver? It is always good to try as much as possible to quiesce the device before resetting it. The best way is to SYS_DIS it. Another option for a running HCA is to halt it (see CLOSE_HCA with the Panic close). Resetting the device while the PCI-X bus is running carries some risk, e.g. some PCI reads that got a split response will never get completed. > > To try MSI or MSI-X, you need a platform that supports MSI > (right now, that means an Intel system -- I've only tested > i386, but x86-64 with an Intel CPU and IA64 should > theoretically work too). Then build your kernel with > CONFIG_PCI_MSI=y and add msi=1 and/or msi_x=1 to your mthca > module options (if both are set, msi_x will be tried first). > > I believe MSI-X requires HCA firmware that Mellanox has not > yet released. MSI mode seems to work with current firmware > (although I have seen some unexplained machine checks when using MSI). Current Tavor FW is supposed to fully support MSI-X and MSI. You just choose... I think MSI-X is better :) > > MSI-X seems to improve performance somewhat, because it > allows the driver to register three separate interrupt > vectors and avoid having to do a slow MMIO read of the event > cause register in the interrupt handler. I don't know if > there's much point to plain MSI, because Linux only allows a > driver to have a single interrupt even in MSI mode. I think that limitation to a single vector is not necessarily Linux's but rather has to do with the chipset. The major difference between MSI and MSI-X is where the different messages go. While MSI sends them to the same address, MSI-X sends them to different addresses.
That enables the chipset/CPU-APIC to handle more interrupt messages per device. Anyway, I believe that with both MSI and MSI-X you can avoid the PIO read. * If it's MSI-X, it's easy: by the IRQ# you can tell which EQ has work. * If it's MSI, just go and peek into all available EQs. It's much more efficient than doing a PIO read. Last, if you take a look at the verbs extensions, there is a nice extension that allows using more than one CQ event handler. The idea was to use MSI/MSI-X for things like CPU affinity or IRQ priority, in which case you should plan on using multiple MSI/MSI-X vectors and allow the user to select interrupt characteristics. ... but we're trying to get work done, not talk about verbs extensions, aren't we :) > > As usual all comments and test reports are gratefully accepted. > > - Roland -------------- next part -------------- An HTML attachment was scrubbed... URL: From ftillier at infiniconsys.com Sun Aug 15 22:05:58 2004 From: ftillier at infiniconsys.com (Fab Tillier) Date: Sun, 15 Aug 2004 22:05:58 -0700 Subject: [openib-general] modify_device API? In-Reply-To: <52fz6qxvpi.fsf@topspin.com> Message-ID: <000301c4834e$b844c960$655aa8c0@infiniconsys.com> > From: Roland Dreier [mailto:roland at topspin.com] > Sent: Friday, August 13, 2004 8:09 PM > > Fab> Only if you add things like system image GUID and InitType. > Fab> Otherwise, the flags provide all the information needed. > > Hmm... true in a way but what if someone eg only wants to set the IsSM > bit without touching anything else? I guess they could do a query > first to find out the current state but even that leaves a race open > between the query and the modify. Good point... There needs to be a way to set *and* clear the bits. > > Fab> InitType is also per-device. > > In my copy of the 1.1 spec it's PortInfo:InitType, which seems to > indicate that it's actually per-port (unless this has been changed in > the errata). I confused NodeType with InitType, so you're right.
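The set-and-clear point above can be made concrete with a mask-pair interface: the caller passes independent set and clear masks, so no query-then-modify race is possible. This is only a hypothetical sketch -- the struct, function name, and bit values below are invented for illustration and are not the proposed OpenIB API:

```c
/*
 * Hypothetical sketch of race-free capability-bit modification.
 * All names and bit positions here are illustrative only.
 */
enum {
	CAP_IS_SM = 1 << 1,
	CAP_IS_CM = 1 << 16,
};

struct demo_port {
	unsigned int cap_mask;
};

static int demo_modify_port(struct demo_port *port,
			    unsigned int set_mask,
			    unsigned int clear_mask)
{
	if (set_mask & clear_mask)
		return -1;	/* setting and clearing the same bit is ambiguous */

	/* Clear first, then set: untouched bits pass through unchanged. */
	port->cap_mask = (port->cap_mask & ~clear_mask) | set_mask;
	return 0;
}
```

A caller that only wants IsSM raised would pass `set_mask = CAP_IS_SM, clear_mask = 0` and never needs to read the current capability mask first.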
Not sure what I was thinking... Sorry - Fab From gdror at mellanox.co.il Mon Aug 16 01:18:00 2004 From: gdror at mellanox.co.il (Dror Goldenberg) Date: Mon, 16 Aug 2004 11:18:00 +0300 Subject: [openib-general] [PATCH] mthca updates (2.6.8 dependent) Message-ID: <506C3D7B14CDD411A52C00025558DED605C6880E@mtlex01.yok.mtl.com> >-----Original Message----- >From: Dror Goldenberg [mailto:gdror at mellanox.co.il] >Sent: Sunday, August 15, 2004 10:17 PM > >> -----Original Message----- >> From: Roland Dreier [mailto:roland at topspin.com] >> Sent: Saturday, August 14, 2004 7:37 PM >> >> >> I believe MSI-X requires HCA firmware that Mellanox has not >> yet released. MSI mode seems to work with current firmware >> (although I have seen some unexplained machine checks when using MSI). > >Current Tavor FW is supposed to fully support MSI-X and MSI. You just >choose... I think MSI-X is better :) Correction. I checked, and it appears that there is a bug in MSI-X on the currently released firmware. The bug will be fixed in FW 3.3.0. The bug is about endianness, so it's theoretically possible to work around it, or just wait for the release. I'd wait for the release and get the driver developed much more easily. Roland, what do you say? -------------- next part -------------- An HTML attachment was scrubbed... URL: From roland at topspin.com Mon Aug 16 08:23:33 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 16 Aug 2004 08:23:33 -0700 Subject: [openib-general] qp lock in mthca_poll_cq In-Reply-To: <20040815125027.GA410@mellanox.co.il> (Michael S. Tsirkin's message of "Sun, 15 Aug 2004 15:50:27 +0300") References: <20040812072806.GA803@mellanox.co.il> <20040812074643.GB803@mellanox.co.il> <523c2s2xre.fsf@topspin.com> <20040815125027.GA410@mellanox.co.il> Message-ID: <527jrzqf7u.fsf@topspin.com> Michael> Wouldn't locking the cq while QP is being destroyed also Michael> work? And maybe the eq which gets the async events.
Yes, that's a good idea, trading some locking in the slow destroy path for removing an atomic access in the data path. EQ access is currently lock-free, but replacing the atomic_t refcounting of individual resources with a per-EQ spinlock should if anything be a little more cache friendly. I'll add this to my TODO list (I need to take care that the locking hierarchies are OK to avoid deadlocks). Michael> It says there: * We have one global lock that protects Michael> dev->cq/qp_table. Each * struct mthca_cq/qp also has its Michael> own lock. No locks should be * nested inside each other. Michael> However mthca_poll_one takes qp->lock while cq->lock is Michael> being held in mthca_poll_cq. Isn't that nesting? Yes, as usual the comment has not kept up with the code. I'll update this. (There's no risk of deadlock with that nesting) - Roland From roland at topspin.com Mon Aug 16 08:51:44 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 16 Aug 2004 08:51:44 -0700 Subject: [openib-general] [PATCH] mthca updates (2.6.8 dependent) In-Reply-To: <506C3D7B14CDD411A52C00025558DED605C6879C@mtlex01.yok.mtl.com> (Dror Goldenberg's message of "Sun, 15 Aug 2004 22:16:45 +0300") References: <506C3D7B14CDD411A52C00025558DED605C6879C@mtlex01.yok.mtl.com> Message-ID: <521xi7qdwv.fsf@topspin.com> Roland> Finally, I changed the logic to unconditionally reset the Roland> HCA during initialization, since I've found that the Roland> previous initialization logic may not work when handed an Roland> HCA in an unknown state. Dror> This worth some discussion. What is the case where you Dror> expect to get the HCA in an unknown state ? Is it when Dror> handing over between pre-OS driver to the OS-driver ? It is Dror> always good to try as much to quiesce the device before Dror> resetting it. The best way is to SYS_DIS it. Other option Dror> for a running HCA is to halt if (see CLOSE_HCA with the Dror> Panic close). 
Resetting the bus while HCA is running puts Dror> some risk if the PCI-X bus is running, e.g. some PCI reads Dror> that got split response will never get completed. I'll check into this. My impression is that we had problems running FW commands when the HCA was left running by a pre-OS driver. Roland> MSI-X seems to improve performance somewhat, because it Roland> allows the driver to register three separate interrupt Roland> vectors and avoid having to do a slow MMIO read of the Roland> event cause register in the interrupt handler. I don't Roland> know if there's much point to plain MSI, because Linux Roland> only allows a driver to have a single interrupt even in Roland> MSI mode. Dror> I think that limitation of a single vectors is not Dror> necessarily Linux but rather it has to do with the Dror> chipset. The major difference between MSI and MSI-X is where Dror> different messages go. While MSI sends them to the same Dror> address, MSI-X sends them to different addresses. That Dror> enables the chipset/CPU-APIC to handle more interrupts Dror> messages per device. Maybe, but the Linux Documentation/MSI-HOWTO.txt file says this about MSI vectors: "Due to the non-contiguous fashion in vector assignment of the existing Linux kernel, this version does not support multiple messages regardless of a device function is capable of supporting more than one vector." Dror> Anyway, I believe that with both MSI and MSI-X you can avoid Dror> the PIO read. * If it's MSI-X, it's easy. By the IRQ# you Dror> can tell which EQ has work. * If it's MSI, just go and peek Dror> into all available EQs. It's much more efficient than doing Dror> a PIO read. It seems this would apply to standard INTx mode as well. Do you know why Mellanox didn't use this in THCA? In any case, I'll have to benchmark this approach. Dror> Last, if you take a look at the verbs extensions. There is a Dror> nice extension that allows using more than one CQ event Dror> handler. 
The idea was to use MSI/MSI-X for thing like CPU Dror> affinity or IRQ priority. In which case, you should plan on Dror> using multiple MSI/MSI-X vectors and allow user to select Dror> interrupt characteristic. ... but we're trying to get work Dror> done, not talk about verbs extensions, aren't we :) Yes, I've definitely thought of this, and it's a great idea. However, I thought it would be worth it to get our base verbs working before trying to design an API for multiple CQ event handlers. It's definitely something I'll implement in the future (the actual mthca code is pretty easy; designing the access layer API is most of the work). Thanks, - R. From Tom.Duffy at Sun.COM Mon Aug 16 09:03:03 2004 From: Tom.Duffy at Sun.COM (Tom Duffy) Date: Mon, 16 Aug 2004 09:03:03 -0700 Subject: Intent to remove ts_kernel_trace WAS[Re: [openib-general] [PATCH] kill ib_legacy.h] In-Reply-To: <52r7qay0zh.fsf@topspin.com> References: <1092088324.14886.20.camel@duffman> <52isbr8yyu.fsf@topspin.com> <52acx38vte.fsf_-_@topspin.com> <20040810063116.GB6645@mellanox.co.il> <1092438376.32316.30.camel@localhost> <52r7qay0zh.fsf@topspin.com> Message-ID: <1092672183.2752.4.camel@duffman> On Fri, 2004-08-13 at 18:14 -0700, Roland Dreier wrote: > Some TS_TRACE messages can be directly replaced with printk (although > in some cases dev_printk might be nice to get a little more info > printed). For example stuff like > > TS_REPORT_WARN(MOD_KERNEL_IB, > "Device %s is missing mandatory function %s", > device->name, mandatory_table[i].name); > return -EINVAL; > > (in core_device.c) can obviously be replaced with a printk(KERN_ERR ...). Fair enough. > However most of the trace messages (probably everything at level > VERBOSE and above) are for debugging and need to be wrapped in #ifdef > DEBUG or (better) dynamically controlled with a debug level settable > through sysfs. How about I put all VERBOSE and above in a dbg() function only defined when DEBUG is set? This is how USB does it. 
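Tom's proposal can be sketched as a macro that compiles away entirely unless DEBUG is defined, in the style the USB drivers use. This is a hypothetical illustration, not the actual patch: the names are invented, and in kernel code the body would be printk(KERN_DEBUG ...) rather than the fprintf that stands in for it here:

```c
#include <stdio.h>

/*
 * Sketch of the dbg() idea: VERBOSE-and-above trace points cost no
 * code (and leave no format strings in the binary) unless the build
 * defines DEBUG.  fprintf stands in for printk(KERN_DEBUG ...).
 */
#ifdef DEBUG
#define dbg(fmt, ...) fprintf(stderr, "debug: " fmt "\n", ##__VA_ARGS__)
#else
#define dbg(fmt, ...) do { } while (0)	/* expands to nothing */
#endif

/* Example call site: with DEBUG undefined this function does no I/O. */
static int demo_probe(int port)
{
	dbg("probing port %d", port);	/* vanishes without -DDEBUG */
	if (port < 1 || port > 2)
		return -1;		/* real errors would still print unconditionally */
	return 0;
}
```

Error-level messages, by contrast, stay as plain printk(KERN_ERR ...) so they are always visible.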
> For example most of the tracing in cm_*.c is way too verbose to printk > all the time (even at level KERN_DEBUG). That is true. -tduffy -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From roland at topspin.com Mon Aug 16 09:29:37 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 16 Aug 2004 09:29:37 -0700 Subject: Intent to remove ts_kernel_trace WAS[Re: [openib-general] [PATCH] kill ib_legacy.h] In-Reply-To: <1092672183.2752.4.camel@duffman> (Tom Duffy's message of "Mon, 16 Aug 2004 09:03:03 -0700") References: <1092088324.14886.20.camel@duffman> <52isbr8yyu.fsf@topspin.com> <52acx38vte.fsf_-_@topspin.com> <20040810063116.GB6645@mellanox.co.il> <1092438376.32316.30.camel@localhost> <52r7qay0zh.fsf@topspin.com> <1092672183.2752.4.camel@duffman> Message-ID: <52k6vzoxla.fsf@topspin.com> Tom> How about I put all VERBOSE and above in a dbg() function Tom> only defined when DEBUG is set? This is how USB does it. Actually defines pr_debug() now, so we could use that. Somewhere down the road I'd like to get settable print levels, though (it's been pretty useful in the past to be able to turn on debug prints on the fly when a system gets in a bad state). - R. From mshefty at ichips.intel.com Mon Aug 16 09:07:53 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 16 Aug 2004 09:07:53 -0700 Subject: [openib-general] modify_device API? In-Reply-To: <52n00yxzor.fsf@topspin.com> References: <52n00yxzor.fsf@topspin.com> Message-ID: <20040816090753.4bd30a8d.mshefty@ichips.intel.com> On Fri, 13 Aug 2004 18:42:44 -0700 Roland Dreier wrote: > I'm starting to look at implementing the new device query/modify API, > and I have a few questions about the modify_device API. The call originally mapped to the vapi_modify_hca_attr routine. 
Looking at the vapi code, I don't see how you can set the system image guid, port shutdown, or inittype, so it does appear that this functionality is missing. > Also, there are a couple of extensions we may want to add, namely > setting IsCM and possibly IsClientReregistrationSupported in the > capabilities. Agreed. > Next, the API > > int ib_modify_device(struct ib_device *device, > u8 port_num, > int device_attr_flags); > > seems to leave out the actual properties structure. > > One last minor question: system image GUID is really per-device (since > it's in NodeInfo, not PortInfo), so requiring a port number to set it > seems a little unclean. Is it worth creating a new entry point for > setting system image GUID (and any other per-device settings we want)? If you have a proposal for an alternate API, please let me know. Otherwise, I will take some time and examine some of the other code and try to pull something together. I like the idea of having two calls, modify_port and modify_device/node. From mst at mellanox.co.il Mon Aug 16 10:33:03 2004 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 16 Aug 2004 20:33:03 +0300 Subject: [openib-general] [PATCH] mthca updates (2.6.8 dependent) In-Reply-To: <521xi7qdwv.fsf@topspin.com> References: <506C3D7B14CDD411A52C00025558DED605C6879C@mtlex01.yok.mtl.com> <521xi7qdwv.fsf@topspin.com> Message-ID: <20040816173303.GA15087@mellanox.co.il> Hello! Quoting r. Roland Dreier (roland at topspin.com) "Re: [openib-general] [PATCH] mthca updates (2.6.8 dependent)": > Dror> Anyway, I believe that with both MSI and MSI-X you can avoid > Dror> the PIO read. * If it's MSI-X, it's easy. By the IRQ# you > Dror> can tell which EQ has work. * If it's MSI, just go and peek > Dror> into all available EQs. It's much more efficient than doing > Dror> a PIO read. > > It seems this would apply to standard INTx mode as well. Do you know > why Mellanox didn't use this in THCA? In any case, I'll have to > benchmark this approach. 
It does apply to the standard INTx mode. However, with standard INTx mode there is more of a chance that by the time you peek at the EQ, the EQ is empty and you end up getting an extra interrupt. But please note that you must arm all EQs even if you don't find any EQ entries there. So you trade a PIO read for PIO writes, and the more EQs, the more writes you will need. I don't know what the driver currently does but it is certainly something we planned to do. MST From tduffy at sun.com Mon Aug 16 10:52:09 2004 From: tduffy at sun.com (Tom Duffy) Date: Mon, 16 Aug 2004 10:52:09 -0700 Subject: [openib-general] [PATCH] remove ts_kernel_trace from client_query* In-Reply-To: <52k6vzoxla.fsf@topspin.com> References: <1092088324.14886.20.camel@duffman> <52isbr8yyu.fsf@topspin.com> <52acx38vte.fsf_-_@topspin.com> <20040810063116.GB6645@mellanox.co.il> <1092438376.32316.30.camel@localhost> <52r7qay0zh.fsf@topspin.com> <1092672183.2752.4.camel@duffman> <52k6vzoxla.fsf@topspin.com> Message-ID: <1092678729.2752.12.camel@duffman> On Mon, 2004-08-16 at 09:29 -0700, Roland Dreier wrote: > Tom> How about I put all VERBOSE and above in a dbg() function > Tom> only defined when DEBUG is set? This is how USB does it. > > Actually pr_debug() is defined now, so we could use > that. Somewhere down the road I'd like to get settable print levels, > though (it's been pretty useful in the past to be able to turn on > debug prints on the fly when a system gets in a bad state). OK, here is a patch against client_query to get an idea of how this conversion will look. Let me know if this is a good thing and I will go through the rest. Signed-by: Tom Duffy with permission from Sun legal.
Index: client_query.c =================================================================== --- client_query.c (revision 654) +++ client_query.c (working copy) @@ -28,7 +28,6 @@ #include "ts_ib_client_query.h" #include "ts_ib_rmpp_mad_types.h" -#include "ts_kernel_trace.h" #include "ts_kernel_hash.h" #include "ts_kernel_timer.h" @@ -212,8 +211,8 @@ if (query) { if (!query->rmpp_rcv) { /* ERROR */ - TS_REPORT_WARN(MOD_KERNEL_IB, - "RMPP timeout for non-RMPP query"); + printk(KERN_WARNING "RMPP timeout for non-RMPP" + " query\n"); ib_client_query_put(query); } else { /* @@ -274,8 +273,7 @@ goto mem_err_query; - TS_TRACE(MOD_KERNEL_IB, T_VERBOSE, TRACE_KERNEL_IB_GEN, - "ib_client_rmpp_query_new()\n"); + pr_debug("ib_client_rmpp_query_new()\n"); query->callback_running = 0; query->transaction_id = packet->transaction_id; @@ -329,8 +327,7 @@ kfree(query); mem_err_query: - TS_TRACE(MOD_KERNEL_IB, T_TERSE, TRACE_KERNEL_IB_GEN, - "Error: Failed to allocate query structure\n"); + printk(KERN_WARNING "Error: Failed to allocate query structure\n"); return NULL; } @@ -351,8 +348,7 @@ attribute_modifier = be32_to_cpu(mad->attribute_modifier); flag = rmpp_mad->resp_time__flags & 0x0F; - TS_TRACE(MOD_KERNEL_IB, T_VERBOSE, TRACE_KERNEL_IB_GEN, - "ib_client_query_rmpp_rcv_mad(flag= 0x%x)\n", flag); + pr_debug("ib_client_query_rmpp_rcv_mad(flag= 0x%x)\n", flag); /* Basic checking */ if ((flag & TS_IB_CLIENT_RMPP_FLAG_ACTIVE) == 0) @@ -366,18 +362,15 @@ u32 payload_length = be32_to_cpu(rmpp_mad->specific.data.payload_length); - TS_TRACE(MOD_KERNEL_IB, T_VERBOSE, TRACE_KERNEL_IB_GEN, - "Data - segment_number= %d, payload_length= %d\n", - segment_number, payload_length); + pr_debug("Data - segment_number= %d, payload_length= %d\n", + segment_number, payload_length); /* if first - allocate data */ if (flag & TS_IB_CLIENT_RMPP_FLAG_DATA_FIRST) { if (query->rmpp_rcv->data) { /* WARNING */ } else { - TS_TRACE(MOD_KERNEL_IB, T_VERBOSE, - TRACE_KERNEL_IB_GEN, - "Allocate data on first 
segment\n"); + pr_debug("Allocate data on first segment\n"); query->rmpp_rcv->data_length = payload_length; query->rmpp_rcv->data = kmalloc(payload_length, GFP_ATOMIC); @@ -434,8 +427,7 @@ } /* send back ack with sliding window of 1 */ - TS_TRACE(MOD_KERNEL_IB, T_VERBOSE, TRACE_KERNEL_IB_GEN, - "Send back ack, dlid= %d\n", mad->slid); + pr_debug("Send back ack, dlid= %d\n", mad->slid); mad->dlid = mad->slid; mad->completion_func = NULL; mad->has_grh = 0; @@ -448,9 +440,7 @@ /* if last - call user supplied completion */ if (flag & TS_IB_CLIENT_RMPP_FLAG_DATA_LAST) { - TS_TRACE(MOD_KERNEL_IB, T_VERBOSE, - TRACE_KERNEL_IB_GEN, - "get last segment of transaction\n"); + pr_debug("get last segment of transaction\n"); ib_client_query_rmpp_callback(query, TS_IB_CLIENT_RESPONSE_OK, @@ -492,9 +482,8 @@ tTS_IB_CLIENT_RESPONSE_STATUS resp_status; if (!query) { - TS_TRACE(MOD_KERNEL_IB, T_VERBOSE, TRACE_KERNEL_IB_GEN, - "packet received for unknown TID 0x%016" TS_U64_FMT - "x", mad->transaction_id); + pr_debug("packet received for unknown TID 0x%016" TS_U64_FMT + "x\n", mad->transaction_id); return; } @@ -534,8 +523,7 @@ if (0) { int i; - TS_TRACE(MOD_KERNEL_IB, T_VERY_VERBOSE, TRACE_KERNEL_IB_GEN, - "Sending query packet:"); + pr_debug("Sending query packet:\n"); for (i = 0; i < 256; ++i) { if (i % 8 == 0) { @@ -584,8 +572,7 @@ void ib_client_mad_handler(struct ib_mad *mad, void *arg) { - TS_TRACE(MOD_KERNEL_IB, T_VERY_VERBOSE, TRACE_KERNEL_IB_GEN, - "query packet received, TID 0x%016" TS_U64_FMT "x", + pr_debug("query packet received, TID 0x%016" TS_U64_FMT "x\n", mad->transaction_id); if (0) { int i; @@ -689,8 +676,7 @@ { struct ib_client_query *query; - TS_TRACE(MOD_KERNEL_IB, T_VERBOSE, TRACE_KERNEL_IB_GEN, - "tsIbRmppClientQuery()\n"); + pr_debug("tsIbRmppClientQuery()\n"); query = ib_client_rmpp_query_new(packet, timeout_jiffies, header_length, function, arg); Index: client_query_main.c =================================================================== --- 
client_query_main.c (revision 654) +++ client_query_main.c (working copy) @@ -23,7 +23,6 @@ #include "client_query.h" #include "ts_ib_mad.h" -#include "ts_kernel_trace.h" #include #include @@ -71,8 +70,7 @@ &async_mad_table[mgmt_class]. filter); if (ret) { - TS_REPORT_WARN(MOD_KERNEL_IB, - "Failed to register MAD filter"); + printk(KERN_WARNING "Failed to register MAD filter\n"); return ret; } } else { From gdror at mellanox.co.il Mon Aug 16 11:27:13 2004 From: gdror at mellanox.co.il (Dror Goldenberg) Date: Mon, 16 Aug 2004 21:27:13 +0300 Subject: [openib-general] [PATCH] mthca updates (2.6.8 dependent) Message-ID: <506C3D7B14CDD411A52C00025558DED605C688FD@mtlex01.yok.mtl.com> > -----Original Message----- > From: Roland Dreier [mailto:roland at topspin.com] > Sent: Monday, August 16, 2004 6:52 PM > Dror> Anyway, I believe that with both MSI and MSI-X you can avoid > Dror> the PIO read. * If it's MSI-X, it's easy. By the IRQ# you > Dror> can tell which EQ has work. * If it's MSI, just go and peek > Dror> into all available EQs. It's much more efficient than doing > Dror> a PIO read. > > It seems this would apply to standard INTx mode as well. Do > you know why Mellanox didn't use this in THCA? In any case, > I'll have to benchmark this approach. > In PCI/PCI-X, the interrupt is a wire, so it is not guaranteed that by the time you get the interrupt, the EQE will be waiting in memory. This is because the interrupt goes on a separate wire from the HCA to the interrupt controller, while data goes up the PCI bridges. Therefore it is required to perform a PIO read to flush all posted writes flying upstream. In PCI-Express, the interrupt is a message, so it will work. The interrupt will just flush the data to memory because it maintains ordering with posted writes upstream.
In the current driver, since it's PCI and PCI-Express we don't do it. In the new mode for Arbel we may do it. When you do MSI/MSI-X, then architecturally it is guaranteed that by the time you get the interrupt, the data already waits for you in memory. Dror -------------- next part -------------- An HTML attachment was scrubbed... URL: From mst at mellanox.co.il Mon Aug 16 12:39:26 2004 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 16 Aug 2004 22:39:26 +0300 Subject: [openib-general] [PATCH] mthca updates (2.6.8 dependent) In-Reply-To: <506C3D7B14CDD411A52C00025558DED605C688FD@mtlex01.yok.mtl.com> References: <506C3D7B14CDD411A52C00025558DED605C688FD@mtlex01.yok.mtl.com> Message-ID: <20040816193925.GA15536@mellanox.co.il> Hello! Quoting r. Dror Goldenberg (gdror at mellanox.co.il) "RE: [openib-general] [PATCH] mthca updates (2.6.8 dependent)": > > > > -----Original Message----- > > From: Roland Dreier [mailto:roland at topspin.com] > > Sent: Monday, August 16, 2004 6:52 PM > > > Dror> Anyway, I believe that with both MSI and MSI-X you can avoid > > Dror> the PIO read. * If it's MSI-X, it's easy. By the IRQ# you > > Dror> can tell which EQ has work. * If it's MSI, just go and peek > > Dror> into all available EQs. It's much more efficient than doing > > Dror> a PIO read. > > > > It seems this would apply to standard INTx mode as well. Do > > you know why Mellanox didn't use this in THCA? In any case, > > I'll have to benchmark this approach. > > > > In PCI/PCIX, the interrupt is a wire, so it is not guaranteed that by the time > you > got the interrupt, the EQE will be waiting in memory. Clarification: if this happens, and you arm the EQ, you will get an immediate interrupt again. So what you can do is arm all EQs without regard to whether you did or did not find an EQ entry there. Whether this race happens a sufficient number of times to affect performance negatively due to an extra interrupt remains to be seen.
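The arm-everything scheme Michael describes can be sketched as a userspace toy; the structures and names are invented for illustration (a real handler would write one doorbell per EQ, which is exactly the extra-PIO-writes cost being traded against the PIO read):

```c
#include <stddef.h>

/* One event queue in the toy model: how many unconsumed EQEs it
 * holds, and whether its doorbell has been re-armed. */
struct toy_eq {
	int pending;	/* unconsumed event queue entries */
	int armed;	/* re-armed after this interrupt? */
};

/* MSI interrupt handler sketch: no PIO read to discover which EQ
 * fired; instead peek every EQ, consume what is there, and arm all
 * of them -- even the empty ones -- so no event is lost.
 * Returns the total number of events handled. */
static int toy_msi_handler(struct toy_eq *eqs, size_t n)
{
	int handled = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		handled += eqs[i].pending;	/* process every EQE found */
		eqs[i].pending = 0;
		eqs[i].armed = 1;		/* one doorbell write per EQ */
	}
	return handled;
}
```

With MSI-X the loop would collapse to the single EQ identified by the vector; the arm-even-if-empty step is what makes plain MSI (or INTx) safe against the race Michael clarifies above.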
> This is because interrupt > goes on a separate wire from HCA to interrupt controller, while data goes > up the PCI bridges. Therefore it is required to perform a PIO read to flush all > posted writes flying upstream. So not "required" per se - required only if you want to find out which EQ had an event. > In PCI-Express, the interrupt is a message, so it will work. The interrupt will > just flush the data to the memory because it maintain ordering with posted > writes upstream. I'm not sure you can rely on messages being ordered properly with regard to posted writes e.g. inside the chipset. > In the current driver, since it's PCI and PCI-Express we > don't do it. In the new mode for Arbel we may do it. > When you do MSI/MSI-X, then architecturally it is guaranteed that by the time > you get the interrupt, the data already waits for you in memory. > Dror > From mshefty at ichips.intel.com Mon Aug 16 12:23:10 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 16 Aug 2004 12:23:10 -0700 Subject: [openib-general] modify_device API? In-Reply-To: <20040816090753.4bd30a8d.mshefty@ichips.intel.com> References: <52n00yxzor.fsf@topspin.com> <20040816090753.4bd30a8d.mshefty@ichips.intel.com> Message-ID: <20040816122310.1ced998f.mshefty@ichips.intel.com> On Mon, 16 Aug 2004 09:07:53 -0700 Sean Hefty wrote: > > Next, the API > > > > int ib_modify_device(struct ib_device *device, > > u8 port_num, > > int device_attr_flags); > > > > seems to leave out the actual properties structure. > > > > One last minor question: system image GUID is really per-device (since > > it's in NodeInfo, not PortInfo), so requiring a port number to set it > > seems a little unclean. Is it worth creating a new entry point for > > setting system image GUID (and any other per-device settings we want)? > > If you have a proposal for an alternate API, please let me know. Otherwise, I will take some time and examine some of the other code and try to pull something together. 
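The two-call split Sean floated earlier (separate modify_device and modify_port entry points) can be modeled with a small userspace toy. The flag value and struct fields loosely mirror his proposal; the toy_* names and the set-then-clear semantics for the two capability masks are my assumptions, not the actual core API:

```c
#include <stdint.h>

enum { TOY_DEVICE_SYS_IMAGE_GUID = 1 };	/* device_modify_mask bit */

struct toy_device_modify {
	uint64_t sys_image_guid;
};

struct toy_port_modify {
	uint32_t set_port_cap_mask;
	uint32_t clr_port_cap_mask;
	uint8_t  init_type;
};

struct toy_device {
	uint64_t sys_image_guid;	/* per-device (NodeInfo) */
	uint32_t port_cap_mask[2];	/* per-port (PortInfo) */
};

/* Device-wide settings take no port number... */
static int toy_modify_device(struct toy_device *dev, int mask,
			     const struct toy_device_modify *m)
{
	if (mask & TOY_DEVICE_SYS_IMAGE_GUID)
		dev->sys_image_guid = m->sys_image_guid;
	return 0;
}

/* ...while port-level settings do. Set bits are applied first,
 * then clear bits (assumed semantics for the two masks). */
static int toy_modify_port(struct toy_device *dev, uint8_t port,
			   const struct toy_port_modify *m)
{
	uint32_t *cap = &dev->port_cap_mask[port];

	*cap = (*cap | m->set_port_cap_mask) & ~m->clr_port_cap_mask;
	return 0;
}
```

Splitting the calls this way resolves the unclean case Roland raised: setting the system image GUID no longer requires a meaningless port number.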
I like the idea of having two calls, modify_port and modify_device/node. Here's an initial attempt at fixing the ib_modify_device routine. I split the call into two calls: ib_modify_device and ib_modify_port. For ib_modify_port, I tried to re-use the port_info capability mask definitions. Please respond with any feedback. - Sean Index: ib_verbs.h =================================================================== --- ib_verbs.h (revision 654) +++ ib_verbs.h (working copy) @@ -135,7 +135,7 @@ IB_DEVICE_SHUTDOWN_PORT = (1<<8), IB_DEVICE_INIT_TYPE = (1<<9), IB_DEVICE_PORT_ACTIVE_EVENT = (1<<10), - IB_DEVICE_SYS_IMG_GUID = (1<<11), + IB_DEVICE_SYS_IMAGE_GUID = (1<<11), IB_DEVICE_RC_RNR_NAK_GEN = (1<<12), IB_DEVICE_SRQ_RESIZE = (1<<13), IB_DEVICE_N_NOTIFY_CQ = (1<<14), @@ -228,10 +228,10 @@ IB_PORT_SYS_IMAGE_GUID_SUP = (1<<11), IB_PORT_PKEY_SW_EXT_PORT_TRAP_SUP = (1<<12), IB_PORT_CM_SUP = (1<<16), - IB_PORT_SNMP_TUNN_SUP = (1<<17), + IB_PORT_SNMP_TUNNEL_SUP = (1<<17), IB_PORT_REINIT_SUP = (1<<18), IB_PORT_DEVICE_MGMT_SUP = (1<<19), - IB_PORT_VENDOR_CLS_SUP = (1<<20), + IB_PORT_VENDOR_CLASS_SUP = (1<<20), IB_PORT_DR_NOTICE_SUP = (1<<21), IB_PORT_PORT_NOTICE_SUP = (1<<22), IB_PORT_BOOT_MGMT_SUP = (1<<23) @@ -256,12 +256,24 @@ u8 init_type_reply; }; -enum ib_device_attr_flags { - IB_DEVICE_SM = 1, - IB_DEVICE_SNMP_TUN_SUP = (1<<1), - IB_DEVICE_DM_SUP = (1<<2), - IB_DEVICE_VENDOR_CLS_SUP = (1<<3), - IB_DEVICE_RESET_QKEY_CNTR = (1<<4) +enum ib_device_modify_flags { + IB_DEVICE_SYS_IMAGE_GUID = 1 +}; + +struct ib_device_modify { + u64 sys_image_guid; +}; + +enum ib_port_modify_flags { + IB_PORT_SHUTDOWN = 1, + IB_PORT_INIT_TYPE = (1<<2), + IB_PORT_RESET_QKEY_CNTR = (1<<3) +}; + +struct ib_port_modify { + u32 set_port_cap_mask; + u32 clr_port_cap_mask; + u8 init_type; }; union ib_gid { @@ -626,7 +638,11 @@ int (*query_pkey)(struct ib_device *device, u8 port_num, u16 index, u16 *pkey); int (*modify_device)(struct ib_device *device, - u8 port_num, int device_attr_flags); + int 
device_modify_mask, + struct ib_device_modify *device_modify); + int (*modify_port)(struct ib_device *device, + u8 port_num, int port_modify_mask, + struct ib_port_modify *port_modify); struct ib_pd * (*alloc_pd)(struct ib_device *device); int (*dealloc_pd)(struct ib_pd *pd); struct ib_ah * (*create_ah)(struct ib_pd *pd, @@ -750,10 +766,20 @@ } static inline int ib_modify_device(struct ib_device *device, - u8 port_num, - int device_attr_flags) + int device_modify_mask, + struct ib_device_modify *device_modify) +{ + return device->modify_device(device, device_modify_mask, + device_modify); +} + +static inline int ib_modify_port(struct ib_device *device, + u8 port_num, + int port_modify_mask, + struct ib_port_modify *port_modify) { - return device->modify_device(device, port_num, device_attr_flags); + return device->modify_device(device, port_num, port_modify_mask, + port_modify); } static inline struct ib_pd *ib_alloc_pd(struct ib_device *device) From halr at voltaire.com Mon Aug 16 13:55:09 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 16 Aug 2004 16:55:09 -0400 Subject: [openib-general] [PATCH] GSI: Use one send pool Message-ID: <1092689709.1877.9.camel@localhost.localdomain> Use one send pool rather than 1/HCA in GSI Index: gsi_main.c =================================================================== --- gsi_main.c (revision 648) +++ gsi_main.c (working copy) @@ -145,6 +145,7 @@ static struct list_head gsi_class_list; static struct list_head gsi_hca_list; static int gsi_pool_cnt = 0; +static void *gsi_snd_dtgrm_pool = NULL; static struct timer_list gsi_sent_dtgrm_timer; static atomic_t gsi_sent_dtgrm_timer_running = ATOMIC_INIT(1); @@ -649,7 +650,6 @@ gsi_hca_stop(hca); gsi_thread_stop(hca); gsi_dtgrm_pool_destroy(hca->rcv_dtgrm_pool); - gsi_dtgrm_pool_destroy(hca->snd_dtgrm_pool); ib_destroy_qp(hca->qp); ib_destroy_cq(hca->cq); kfree(hca); @@ -765,13 +765,7 @@ goto error6; } - ret = gsi_dtgrm_pool_create_named(GSI_QP_SND_SIZE, - "snd", - 
&hca->snd_dtgrm_pool); - if (ret < 0) { - printk(KERN_ERR "Could not create send datagram pool\n"); - goto error7; - } + hca->snd_dtgrm_pool = gsi_snd_dtgrm_pool; spin_lock_init(&hca->rcv_list_lock); spin_lock_init(&hca->snd_list_lock); @@ -783,7 +777,7 @@ ret = gsi_hca_start(hca); if (ret) { printk(KERN_ERR "Could not start device\n"); - goto error8; + goto error7; } GSI_HCA_LIST_LOCK(); @@ -792,11 +786,9 @@ return 0; -error8: +error7: gsi_thread_stop(hca); - gsi_dtgrm_pool_destroy(hca->snd_dtgrm_pool); -error7: gsi_dtgrm_pool_destroy(hca->rcv_dtgrm_pool); error6: ib_destroy_qp(hca->qp); @@ -2870,6 +2862,18 @@ goto error2; } + if (!gsi_snd_dtgrm_pool) + { + result = gsi_dtgrm_pool_create_named(GSI_QP_SND_SIZE, + "snd", + &gsi_snd_dtgrm_pool); + if (result < 0) { + printk(KERN_ERR "Could not create send datagram pool\n"); + goto error3; + } + + } + if (rmpp_init() != RMPP_SUCCESS) { printk(KERN_ERR "Could init RMPP!\n"); result = -ENXIO; @@ -2895,13 +2899,18 @@ gsi_cleanup_module(void) { printk(KERN_DEBUG "Bye GSI!\n"); + + ib_device_notifier_deregister(&gsi_notifier); + #ifdef GSI_RMPP_SUPPORT rmpp_cleanup(); #endif - ib_device_notifier_deregister(&gsi_notifier); - gsi_sent_dtgrm_timer_stop(); + if (gsi_snd_dtgrm_pool) { + gsi_dtgrm_pool_destroy(gsi_snd_dtgrm_pool); + gsi_snd_dtgrm_pool = NULL; + } remove_proc_entry("openib/gsi/control", NULL); module_version_exit(MODNAME); } From tduffy at sun.com Mon Aug 16 14:49:50 2004 From: tduffy at sun.com (Tom Duffy) Date: Mon, 16 Aug 2004 14:49:50 -0700 Subject: [openib-general] [PATCH] Remove drivers/infiniband/core/header_export.c Message-ID: <1092692991.2752.21.camel@duffman> Remove file drivers/infiniband/core/header_export.c. Roland, if you apply this, please don't forget to "svn delete header_export.c". Signed-by: Tom Duffy with permission from Sun legal. 
Index: drivers/infiniband/core/Makefile =================================================================== --- drivers/infiniband/core/Makefile (revision 654) +++ drivers/infiniband/core/Makefile (working copy) @@ -34,7 +34,6 @@ pm_export.o \ header_main.o \ header_ud.o \ - header_export.o \ core_main.o \ core_device.o \ core_pd.o \ Index: drivers/infiniband/core/header_main.c =================================================================== --- drivers/infiniband/core/header_main.c (revision 654) +++ drivers/infiniband/core/header_main.c (working copy) @@ -108,6 +108,7 @@ } } } +EXPORT_SYMBOL(ib_header_pack); static void ib_value_write(int offset, int size, @@ -190,6 +191,7 @@ } } } +EXPORT_SYMBOL(ib_header_unpack); /* Local Variables: Index: drivers/infiniband/core/header_ud.c =================================================================== --- drivers/infiniband/core/header_ud.c (revision 654) +++ drivers/infiniband/core/header_ud.c (working copy) @@ -28,6 +28,7 @@ #include "ts_kernel_services.h" #include +#include void ib_ud_header_init(int payload_bytes, int grh_present, @@ -74,6 +75,7 @@ header->bth.pad_count = (4 - payload_bytes) & 3; header->bth.transport_header_version = 0; } +EXPORT_SYMBOL(ib_ud_header_init); int ib_ud_header_pack(struct ib_ud_header *header, void *buf) @@ -113,6 +115,7 @@ return len; } +EXPORT_SYMBOL(ib_ud_header_pack); int ib_ud_header_unpack(void *buf, struct ib_ud_header *header) @@ -198,6 +201,7 @@ return 0; } +EXPORT_SYMBOL(ib_ud_header_unpack); /* Local Variables: Index: drivers/infiniband/core/header_export.c =================================================================== --- drivers/infiniband/core/header_export.c (revision 654) +++ drivers/infiniband/core/header_export.c (working copy) @@ -1,32 +0,0 @@ -/* - This software is available to you under a choice of one of two - licenses. 
You may choose to be licensed under the terms of the GNU - General Public License (GPL) Version 2, available at - , or the OpenIB.org BSD - license, available in the LICENSE.TXT file accompanying this - software. These details are also available at - . - - THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - SOFTWARE. - - Copyright (c) 2004 Topspin Communications. All rights reserved. - - $Id$ -*/ - -#include "ts_ib_header.h" - -#include - -EXPORT_SYMBOL(ib_header_pack); -EXPORT_SYMBOL(ib_header_unpack); -EXPORT_SYMBOL(ib_ud_header_init); -EXPORT_SYMBOL(ib_ud_header_pack); -EXPORT_SYMBOL(ib_ud_header_unpack); -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From tduffy at sun.com Mon Aug 16 14:56:12 2004 From: tduffy at sun.com (Tom Duffy) Date: Mon, 16 Aug 2004 14:56:12 -0700 Subject: [openib-general] [PATCH] Remove drivers/infiniband/core/mad_export.c In-Reply-To: <1092692991.2752.21.camel@duffman> References: <1092692991.2752.21.camel@duffman> Message-ID: <1092693372.2752.25.camel@duffman> Remove file drivers/infiniband/core/mad_export.c. Signed-by: Tom Duffy with permission from Sun legal. 
Index: drivers/infiniband/core/Makefile =================================================================== --- drivers/infiniband/core/Makefile (revision 654) +++ drivers/infiniband/core/Makefile (working copy) @@ -55,8 +54,7 @@ mad_filter.o \ mad_thread.o \ mad_static.o \ - mad_proc.o \ - mad_export.o + mad_proc.o ib_cm-objs := \ cm_main.o \ Index: drivers/infiniband/core/mad_ib.c =================================================================== --- drivers/infiniband/core/mad_ib.c (revision 654) +++ drivers/infiniband/core/mad_ib.c (working copy) @@ -159,6 +159,7 @@ *buf = *mad; return ib_mad_send_no_copy(buf); } +EXPORT_SYMBOL(ib_mad_send); static void ib_mad_handle_wc(struct ib_device *device, struct ib_wc *entry) Index: drivers/infiniband/core/mad_export.c =================================================================== --- drivers/infiniband/core/mad_export.c (revision 654) +++ drivers/infiniband/core/mad_export.c (working copy) @@ -1,30 +0,0 @@ -/* - This software is available to you under a choice of one of two - licenses. You may choose to be licensed under the terms of the GNU - General Public License (GPL) Version 2, available at - , or the OpenIB.org BSD - license, available in the LICENSE.TXT file accompanying this - software. These details are also available at - . - - THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - SOFTWARE. - - Copyright (c) 2004 Topspin Communications. All rights reserved. 
- - $Id$ -*/ - -#include "ts_ib_mad.h" - -#include - -EXPORT_SYMBOL(ib_mad_send); -EXPORT_SYMBOL(ib_mad_handler_register); -EXPORT_SYMBOL(ib_mad_handler_deregister); Index: drivers/infiniband/core/mad_filter.c =================================================================== --- drivers/infiniband/core/mad_filter.c (revision 654) +++ drivers/infiniband/core/mad_filter.c (working copy) @@ -347,6 +347,7 @@ *handle = filter; return 0; } +EXPORT_SYMBOL(ib_mad_handler_register); int ib_mad_handler_deregister(tTS_IB_MAD_FILTER_HANDLE handle) { @@ -373,6 +374,7 @@ kfree(filter); return 0; } +EXPORT_SYMBOL(ib_mad_handler_deregister); int ib_mad_filter_get_by_index(int index, struct ib_mad_filter_list *filter) -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From tduffy at sun.com Mon Aug 16 16:07:16 2004 From: tduffy at sun.com (Tom Duffy) Date: Mon, 16 Aug 2004 16:07:16 -0700 Subject: [openib-general] [PATCH] Remove drivers/infiniband/core/sa_client_export.c In-Reply-To: <1092693372.2752.25.camel@duffman> References: <1092692991.2752.21.camel@duffman> <1092693372.2752.25.camel@duffman> Message-ID: <1092697636.2752.31.camel@duffman> Remove file drivers/infiniband/core/sa_client_export.c. That is the last of them in core. Signed-by: Tom Duffy with permission from Sun legal. 
Index: drivers/infiniband/core/sa_client_service.c =================================================================== --- drivers/infiniband/core/sa_client_service.c (revision 654) +++ drivers/infiniband/core/sa_client_service.c (working copy) @@ -222,6 +222,7 @@ completion_arg, transaction_id, IB_MGMT_METHOD_GET); } +EXPORT_SYMBOL(ib_service_get); int ib_service_set(struct ib_device *device, tTS_IB_PORT port, @@ -237,6 +238,7 @@ completion_arg, transaction_id, IB_MGMT_METHOD_SET); } +EXPORT_SYMBOL(ib_service_set); int ib_service_delete(struct ib_device *device, tTS_IB_PORT port, @@ -252,6 +254,7 @@ completion_arg, transaction_id, IB_SA_METHOD_DELETE); } +EXPORT_SYMBOL(ib_service_delete); static void _tsIbServiceAtsGetGidResponse(tTS_IB_CLIENT_RESPONSE_STATUS status, struct ib_mad *packet, @@ -446,6 +449,7 @@ return 0; } +EXPORT_SYMBOL(tsIbAtsServiceSet); int tsIbAtsServiceGetGid(struct ib_device *device, tTS_IB_PORT port, @@ -503,6 +507,7 @@ return 0; } +EXPORT_SYMBOL(tsIbAtsServiceGetGid); int tsIbAtsServiceGetIp(struct ib_device *device, tTS_IB_PORT port, @@ -559,6 +564,7 @@ return 0; } +EXPORT_SYMBOL(tsIbAtsServiceGetIp); /* Local Variables: Index: drivers/infiniband/core/Makefile =================================================================== --- drivers/infiniband/core/Makefile (revision 654) +++ drivers/infiniband/core/Makefile (working copy) @@ -82,7 +80,6 @@ sa_client_inform.o \ sa_client_notice.o \ sa_client_service.o \ - sa_client_export.o \ sa_client_node_info.o ib_dm_client-objs := \ Index: drivers/infiniband/core/sa_client_inform.c =================================================================== --- drivers/infiniband/core/sa_client_inform.c (revision 654) +++ drivers/infiniband/core/sa_client_inform.c (working copy) @@ -30,6 +30,7 @@ #include #include #include +#include #include #include @@ -311,6 +312,7 @@ return rc; } +EXPORT_SYMBOL(tsIbSetInServiceNoticeHandler); int tsIbSetOutofServiceNoticeHandler(struct ib_device *device, tTS_IB_PORT 
port, @@ -348,6 +350,7 @@ return rc; } +EXPORT_SYMBOL(tsIbSetOutofServiceNoticeHandler); int tsIbSetMcastGroupCreateNoticeHandler(struct ib_device *device, tTS_IB_PORT port, Index: drivers/infiniband/core/sa_client_path_record.c =================================================================== --- drivers/infiniband/core/sa_client_path_record.c (revision 654) +++ drivers/infiniband/core/sa_client_path_record.c (working copy) @@ -30,6 +30,7 @@ #include #include #include +#include #include #include @@ -187,3 +188,4 @@ return 0; } +EXPORT_SYMBOL(tsIbPathRecordRequest); Index: drivers/infiniband/core/sa_client_node_info.c =================================================================== --- drivers/infiniband/core/sa_client_node_info.c (revision 654) +++ drivers/infiniband/core/sa_client_node_info.c (working copy) @@ -31,6 +31,7 @@ #include #include #include +#include #include #include @@ -184,3 +185,4 @@ return 0; } +EXPORT_SYMBOL(tsIbNodeInfoQuery); Index: drivers/infiniband/core/sa_client_export.c =================================================================== --- drivers/infiniband/core/sa_client_export.c (revision 654) +++ drivers/infiniband/core/sa_client_export.c (working copy) @@ -1,42 +0,0 @@ -/* - This software is available to you under a choice of one of two - licenses. You may choose to be licensed under the terms of the GNU - General Public License (GPL) Version 2, available at - , or the OpenIB.org BSD - license, available in the LICENSE.TXT file accompanying this - software. These details are also available at - . - - THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - SOFTWARE. - - Copyright (c) 2004 Topspin Communications. All rights reserved. - - $Id$ -*/ - -#include "ts_ib_sa_client.h" - -#include - -EXPORT_SYMBOL(tsIbPathRecordRequest); -EXPORT_SYMBOL(tsIbMulticastGroupJoin); -EXPORT_SYMBOL(tsIbMulticastGroupLeave); -EXPORT_SYMBOL(tsIbPortInfoQuery); -EXPORT_SYMBOL(tsIbPortInfoTblQuery); -EXPORT_SYMBOL(tsIbSetInServiceNoticeHandler); -EXPORT_SYMBOL(tsIbSetOutofServiceNoticeHandler); -EXPORT_SYMBOL(tsIbAtsServiceSet); -EXPORT_SYMBOL(tsIbAtsServiceGetGid); -EXPORT_SYMBOL(tsIbAtsServiceGetIp); -EXPORT_SYMBOL(tsIbMulticastGroupTableQuery); -EXPORT_SYMBOL(tsIbNodeInfoQuery); -EXPORT_SYMBOL(ib_service_get); -EXPORT_SYMBOL(ib_service_set); -EXPORT_SYMBOL(ib_service_delete); Index: drivers/infiniband/core/sa_client_multicast.c =================================================================== --- drivers/infiniband/core/sa_client_multicast.c (revision 654) +++ drivers/infiniband/core/sa_client_multicast.c (working copy) @@ -32,6 +32,7 @@ #include #include #include +#include #include #include @@ -357,6 +358,7 @@ return 0; } +EXPORT_SYMBOL(tsIbMulticastGroupJoin); int tsIbMulticastGroupLeave(struct ib_device *device, tTS_IB_PORT port, tTS_IB_GID mgid) @@ -364,6 +366,7 @@ /* XXX implement */ return 0; } +EXPORT_SYMBOL(tsIbMulticastGroupLeave); int tsIbMulticastGroupTableQuery(struct ib_device *device, tTS_IB_PORT port, @@ -411,3 +414,4 @@ return 0; } +EXPORT_SYMBOL(tsIbMulticastGroupTableQuery); Index: drivers/infiniband/core/sa_client_port_info.c =================================================================== --- drivers/infiniband/core/sa_client_port_info.c (revision 654) +++ drivers/infiniband/core/sa_client_port_info.c (working copy) @@ -31,6 +31,7 @@ #include #include 
#include +#include #include #include @@ -494,6 +495,7 @@ return 0; } +EXPORT_SYMBOL(tsIbPortInfoQuery); int tsIbPortInfoTblQuery(struct ib_device *device, tTS_IB_PORT port, @@ -532,3 +534,4 @@ return 0; } +EXPORT_SYMBOL(tsIbPortInfoTblQuery); -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From tduffy at sun.com Mon Aug 16 16:17:41 2004 From: tduffy at sun.com (Tom Duffy) Date: Mon, 16 Aug 2004 16:17:41 -0700 Subject: [openib-general] [PATCH] Remove drivers/infiniband/ulp/ipoib/ipoib_export.c In-Reply-To: <1092697636.2752.31.camel@duffman> References: <1092692991.2752.21.camel@duffman> <1092693372.2752.25.camel@duffman> <1092697636.2752.31.camel@duffman> Message-ID: <1092698261.2752.37.camel@duffman> Remove file drivers/infiniband/ulp/ipoib/ipoib_export.c. Signed-by: Tom Duffy with permission from Sun legal. Index: drivers/infiniband/ulp/ipoib/ipoib_export.c =================================================================== --- drivers/infiniband/ulp/ipoib/ipoib_export.c (revision 654) +++ drivers/infiniband/ulp/ipoib/ipoib_export.c (working copy) @@ -1,29 +0,0 @@ -/* - This software is available to you under a choice of one of two - licenses. You may choose to be licensed under the terms of the GNU - General Public License (GPL) Version 2, available at - , or the OpenIB.org BSD - license, available in the LICENSE.TXT file accompanying this - software. These details are also available at - . - - THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - SOFTWARE. - - Copyright (c) 2004 Topspin Communications. All rights reserved. - - $Id$ -*/ - -#include "ipoib_proto.h" - -#include - -EXPORT_SYMBOL(ipoib_get_gid); -EXPORT_SYMBOL(ipoib_device_handle); Index: drivers/infiniband/ulp/ipoib/ipoib_arp.c =================================================================== --- drivers/infiniband/ulp/ipoib/ipoib_arp.c (revision 654) +++ drivers/infiniband/ulp/ipoib/ipoib_arp.c (working copy) @@ -30,6 +30,7 @@ #include #include +#include enum { IPOIB_ADDRESS_HASH_BITS = IPOIB_ADDRESS_HASH_BYTES * 8, @@ -1095,6 +1096,7 @@ return 0; } +EXPORT_SYMBOL(ipoib_get_gid); /* * Local Variables: Index: drivers/infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- drivers/infiniband/ulp/ipoib/ipoib_main.c (revision 654) +++ drivers/infiniband/ulp/ipoib/ipoib_main.c (working copy) @@ -124,6 +124,7 @@ return 0; } +EXPORT_SYMBOL(ipoib_device_handle); int ipoib_dev_open(struct net_device *dev) { Index: drivers/infiniband/ulp/ipoib/Makefile =================================================================== --- drivers/infiniband/ulp/ipoib/Makefile (revision 654) +++ drivers/infiniband/ulp/ipoib/Makefile (working copy) @@ -10,7 +10,6 @@ ipoib_multicast.o \ ipoib_arp.o \ ipoib_proc.o \ - ipoib_export.o \ ipoib_verbs.o \ ipoib_vlan.o -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From tduffy at sun.com Mon Aug 16 16:29:30 2004 From: tduffy at sun.com (Tom Duffy) Date: Mon, 16 Aug 2004 16:29:30 -0700 Subject: [openib-general] [PATCH] Update Roland's kernel TODO Message-ID: <1092698970.2752.39.camel@duffman> Index: gen2/branches/roland-merge/src/linux-kernel/TODO =================================================================== --- gen2/branches/roland-merge/src/linux-kernel/TODO (revision 654) +++ gen2/branches/roland-merge/src/linux-kernel/TODO (working copy) @@ -5,15 +5,14 @@ uses of tsIb* are gone. - get rid of unnecessary typedefs of structs. + DONE for core and ipoib. - remove use of void * handles and change to passing pointers to underlying struct. - - get rid of bizarro types such as tUINT32 -- replace with Linux - standard u32 etc. - - remove the *_exports.c files and place the exports next to the function declarations + DONE except for legacy. - remove uses of in_atomic() (replace by a "can_sleep" parameter and push up the call chain until in a context that knows if it can @@ -38,6 +37,9 @@ user (e.g. using __get_free_pages where appropriate) and while you're at it you should probably switch it to struct hlist_head + - remove ts_kernel_trace.h and use printk/pr_debug instead of + TS_TRACE/TS_REPORT_* + IB specific tasks: - rewrite client_query/sa_client/dm_client so that they are more general (better support for component mask, RMPP, etc) and more -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From roland at topspin.com Tue Aug 17 10:06:40 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 17 Aug 2004 10:06:40 -0700 Subject: [openib-general] [PATCH] mthca updates (2.6.8 dependent) In-Reply-To: <506C3D7B14CDD411A52C00025558DED605C688FD@mtlex01.yok.mtl.com> (Dror Goldenberg's message of "Mon, 16 Aug 2004 21:27:13 +0300") References: <506C3D7B14CDD411A52C00025558DED605C688FD@mtlex01.yok.mtl.com> Message-ID: <52brh9d78f.fsf@topspin.com> Dror> In PCI/PCIX, the interrupt is a wire, so it is not Dror> guaranteed that by the time you got the interrupt, the EQE Dror> will be waiting in memory. Ah, thanks. I forgot about the write ordering issue. - Roland From roland at topspin.com Tue Aug 17 10:09:12 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 17 Aug 2004 10:09:12 -0700 Subject: [openib-general] [PATCH] mthca updates (2.6.8 dependent) In-Reply-To: <20040816193925.GA15536@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 16 Aug 2004 22:39:26 +0300") References: <506C3D7B14CDD411A52C00025558DED605C688FD@mtlex01.yok.mtl.com> <20040816193925.GA15536@mellanox.co.il> Message-ID: <527jrxd747.fsf@topspin.com> Michael> I'm not sure you can rely on messages being ordered Michael> properly with regard to posted writes e.g. inside the Michael> chipset. Hmm... that seems like a really ugly "feature" to allow interrupts to pass posted writes within the chipset. In any case in mthca I unconditionally rearm the EQ after polling it, so I think my MSI-X implementation should work OK even if we have that ordering problem. - R. 
From roland at topspin.com Tue Aug 17 10:14:49 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 17 Aug 2004 10:14:49 -0700 Subject: [openib-general] [PATCH] Remove drivers/infiniband/core/header_export.c In-Reply-To: <1092692991.2752.21.camel@duffman> (Tom Duffy's message of "Mon, 16 Aug 2004 14:49:50 -0700") References: <1092692991.2752.21.camel@duffman> Message-ID: <523c2ld6uu.fsf@topspin.com> Tom> Remove file drivers/infiniband/core/header_export.c. Committed, thanks. Tom> Roland, if you apply this, please don't forget to "svn delete Tom> header_export.c". Thanks for the reminder -- I can say that it is highly likely I would have forgotten to do this otherwise :) - R. From roland at topspin.com Tue Aug 17 10:14:54 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 17 Aug 2004 10:14:54 -0700 Subject: [openib-general] [PATCH] Remove drivers/infiniband/core/mad_export.c In-Reply-To: <1092693372.2752.25.camel@duffman> (Tom Duffy's message of "Mon, 16 Aug 2004 14:56:12 -0700") References: <1092692991.2752.21.camel@duffman> <1092693372.2752.25.camel@duffman> Message-ID: <52y8kdbsa9.fsf@topspin.com> thanks, applied. From roland at topspin.com Tue Aug 17 10:14:59 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 17 Aug 2004 10:14:59 -0700 Subject: [openib-general] [PATCH] Remove drivers/infiniband/core/sa_client_export.c In-Reply-To: <1092697636.2752.31.camel@duffman> (Tom Duffy's message of "Mon, 16 Aug 2004 16:07:16 -0700") References: <1092692991.2752.21.camel@duffman> <1092693372.2752.25.camel@duffman> <1092697636.2752.31.camel@duffman> Message-ID: <52u0v1bsa4.fsf@topspin.com> thanks, applied. 
From roland at topspin.com Tue Aug 17 10:15:04 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 17 Aug 2004 10:15:04 -0700 Subject: [openib-general] [PATCH] Remove drivers/infiniband/ulp/ipoib/ipoib_export.c In-Reply-To: <1092698261.2752.37.camel@duffman> (Tom Duffy's message of "Mon, 16 Aug 2004 16:17:41 -0700") References: <1092692991.2752.21.camel@duffman> <1092693372.2752.25.camel@duffman> <1092697636.2752.31.camel@duffman> <1092698261.2752.37.camel@duffman> Message-ID: <52pt5pbs9z.fsf@topspin.com> thanks, applied. From roland at topspin.com Tue Aug 17 10:15:08 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 17 Aug 2004 10:15:08 -0700 Subject: [openib-general] [PATCH] Update Roland's kernel TODO In-Reply-To: <1092698970.2752.39.camel@duffman> (Tom Duffy's message of "Mon, 16 Aug 2004 16:29:30 -0700") References: <1092698970.2752.39.camel@duffman> Message-ID: <52llgdbs9v.fsf@topspin.com> thanks, applied. From mst at mellanox.co.il Tue Aug 17 10:16:40 2004 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 17 Aug 2004 20:16:40 +0300 Subject: [openib-general] [PATCH] mthca updates (2.6.8 dependent) In-Reply-To: <527jrxd747.fsf@topspin.com> References: <506C3D7B14CDD411A52C00025558DED605C688FD@mtlex01.yok.mtl.com> <20040816193925.GA15536@mellanox.co.il> <527jrxd747.fsf@topspin.com> Message-ID: <20040817171640.GA23956@mellanox.co.il> Hello! Quoting r. Roland Dreier (roland at topspin.com) "Re: [openib-general] [PATCH] mthca updates (2.6.8 dependent)": > Michael> I'm not sure you can rely on messages being ordered > Michael> properly with regard to posted writes e.g. inside the > Michael> chipset. > > Hmm... that seems like a really ugly "feature" to allow interrupts to > pass posted writes within the chipset. No, I was talking about interrupt messages in PCI Express. > In any case in mthca I unconditionally rearm the EQ after polling it, > so I think my MSI-X implementation should work OK even if we have that > ordering problem. 
Should be OK then, the only reason you have to read ECR is to know which EQs to arm. If you know it from MSI-X Vector, ordering will take care of itself. MST From roland at topspin.com Tue Aug 17 10:16:18 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 17 Aug 2004 10:16:18 -0700 Subject: [openib-general] Re: [PATCH] remove ts_kernel_trace from client_query* In-Reply-To: <1092678729.2752.12.camel@duffman> (Tom Duffy's message of "Mon, 16 Aug 2004 10:52:09 -0700") References: <1092088324.14886.20.camel@duffman> <52isbr8yyu.fsf@topspin.com> <52acx38vte.fsf_-_@topspin.com> <20040810063116.GB6645@mellanox.co.il> <1092438376.32316.30.camel@localhost> <52r7qay0zh.fsf@topspin.com> <1092672183.2752.4.camel@duffman> <52k6vzoxla.fsf@topspin.com> <1092678729.2752.12.camel@duffman> Message-ID: <52hdr1bs7x.fsf@topspin.com> I need a little time to think this over and talk to a few people. As I said I would like to allow for dynamic setting of the trace level, so it may not be worth doing the pr_debug() conversion and then switching to yet another trace macro. Thanks, - R. From roland at topspin.com Tue Aug 17 10:19:54 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 17 Aug 2004 10:19:54 -0700 Subject: [openib-general] [PATCH] mthca updates (2.6.8 dependent) In-Reply-To: <20040817171640.GA23956@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 17 Aug 2004 20:16:40 +0300") References: <506C3D7B14CDD411A52C00025558DED605C688FD@mtlex01.yok.mtl.com> <20040816193925.GA15536@mellanox.co.il> <527jrxd747.fsf@topspin.com> <20040817171640.GA23956@mellanox.co.il> Message-ID: <524qn1bs1x.fsf@topspin.com> Roland> Hmm... that seems like a really ugly "feature" to allow Roland> interrupts to pass posted writes within the chipset. Michael> No, I was talking about interrupt messages in PCI Michael> Express. 
I haven't looked at this in a while but I remember reading the PCI Express spec and deciding that interrupt messages sent after a posted write cannot pass the write within the PCI Express world. I thought the issue is that once the PCI Express root complex has received the interrupt message, it may process it and raise an interrupt with the CPU before earlier writes have made it through to memory. - R. From mst at mellanox.co.il Tue Aug 17 10:26:23 2004 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 17 Aug 2004 20:26:23 +0300 Subject: [openib-general] [PATCH] mthca updates (2.6.8 dependent) In-Reply-To: <524qn1bs1x.fsf@topspin.com> References: <506C3D7B14CDD411A52C00025558DED605C688FD@mtlex01.yok.mtl.com> <20040816193925.GA15536@mellanox.co.il> <527jrxd747.fsf@topspin.com> <20040817171640.GA23956@mellanox.co.il> <524qn1bs1x.fsf@topspin.com> Message-ID: <20040817172623.GB23956@mellanox.co.il> Hello! Quoting r. Roland Dreier (roland at topspin.com) "Re: [openib-general] [PATCH] mthca updates (2.6.8 dependent)": > Roland> Hmm... that seems like a really ugly "feature" to allow > Roland> interrupts to pass posted writes within the chipset. > > Michael> No, I was talking about interrupt messages in PCI > Michael> Express. > > I haven't looked at this in a while but I remember reading the PCI > Express spec and deciding that interrupt messages sent after a posted > write cannot pass the write within the PCI Express world. I thought > the issue is that once the PCI Express root complex has received the > interrupt message, it may process it and raise an interrupt with the > CPU before earlier writes have made it through to memory. > Yes, that's it. So PCI Express interrupts are similar to PCI - ordering versus writes may not be guaranteed. No idea if this happens in practice though. MST From roland at topspin.com Tue Aug 17 10:39:11 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 17 Aug 2004 10:39:11 -0700 Subject: [openib-general] modify_device API?
In-Reply-To: <20040816122310.1ced998f.mshefty@ichips.intel.com> (Sean Hefty's message of "Mon, 16 Aug 2004 12:23:10 -0700") References: <52n00yxzor.fsf@topspin.com> <20040816090753.4bd30a8d.mshefty@ichips.intel.com> <20040816122310.1ced998f.mshefty@ichips.intel.com> Message-ID: <52n00taclc.fsf@topspin.com> This looks good to me. As I try to implement it in mthca I guess I'll see if there are any more problems... - R. From mshefty at ichips.intel.com Tue Aug 17 09:54:06 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 17 Aug 2004 09:54:06 -0700 Subject: [openib-general] modify_device API? In-Reply-To: <52n00taclc.fsf@topspin.com> References: <52n00yxzor.fsf@topspin.com> <20040816090753.4bd30a8d.mshefty@ichips.intel.com> <20040816122310.1ced998f.mshefty@ichips.intel.com> <52n00taclc.fsf@topspin.com> Message-ID: <20040817095406.7f57bde6.mshefty@ichips.intel.com> On Tue, 17 Aug 2004 10:39:11 -0700 Roland Dreier wrote: > This looks good to me. As I try to implement it in mthca I guess I'll > see if there are any more problems... Thanks for the feedback! I will update and commit these changes. I want to convert the ib_port_cap_flags to match bit locations defined by the PortInfo attribute. (E.g. IB_PORT_SM = (1<<31).) From mshefty at ichips.intel.com Tue Aug 17 10:23:13 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 17 Aug 2004 10:23:13 -0700 Subject: [openib-general] modify_device API? In-Reply-To: <20040817095406.7f57bde6.mshefty@ichips.intel.com> References: <52n00yxzor.fsf@topspin.com> <20040816090753.4bd30a8d.mshefty@ichips.intel.com> <20040816122310.1ced998f.mshefty@ichips.intel.com> <52n00taclc.fsf@topspin.com> <20040817095406.7f57bde6.mshefty@ichips.intel.com> Message-ID: <20040817102313.6126e078.mshefty@ichips.intel.com> On Tue, 17 Aug 2004 09:54:06 -0700 Sean Hefty wrote: > I will update and commit these changes. I want to convert the ib_port_cap_flags to match bit locations defined by the PortInfo attribute. (E.g. 
IB_PORT_SM = (1<<31).) Changes have been committed. From halr at voltaire.com Tue Aug 17 11:48:40 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 17 Aug 2004 14:48:40 -0400 Subject: [openib-general] modify_device API? In-Reply-To: <20040817102313.6126e078.mshefty@ichips.intel.com> References: <52n00yxzor.fsf@topspin.com> <20040816090753.4bd30a8d.mshefty@ichips.intel.com> <20040816122310.1ced998f.mshefty@ichips.intel.com> <52n00taclc.fsf@topspin.com> <20040817095406.7f57bde6.mshefty@ichips.intel.com> <20040817102313.6126e078.mshefty@ichips.intel.com> Message-ID: <1092768520.1837.15.camel@localhost.localdomain> On Tue, 2004-08-17 at 13:23, Sean Hefty wrote: > On Tue, 17 Aug 2004 09:54:06 -0700 > Sean Hefty wrote: > > I will update and commit these changes. I want to convert the ib_port_cap_flags to match bit locations defined by the PortInfo attribute. (E.g. IB_PORT_SM = (1<<31).) > > Changes have been committed. I found 2 minor issues with this change: ib_verbs.h:261: conflicting types for `IB_DEVICE_SYS_IMAGE_GUID' ib_verbs.h:138: previous declaration of `IB_DEVICE_SYS_IMAGE_GUID' and line 782 should be device->modify_port rather than modify_device Here's a patch for the latter: Index: ib_verbs.h =================================================================== --- ib_verbs.h (revision 660) +++ ib_verbs.h (working copy) @@ -778,8 +778,8 @@ int port_modify_mask, struct ib_port_modify *port_modify) { - return device->modify_device(device, port_num, port_modify_mask, - port_modify); + return device->modify_port(device, port_num, port_modify_mask, + port_modify); } static inline struct ib_pd *ib_alloc_pd(struct ib_device *device) > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From mshefty at ichips.intel.com Tue Aug 17 12:44:03 2004 From: mshefty at 
ichips.intel.com (Sean Hefty) Date: Tue, 17 Aug 2004 12:44:03 -0700 Subject: [openib-general] modify_device API? In-Reply-To: <1092768520.1837.15.camel@localhost.localdomain> References: <52n00yxzor.fsf@topspin.com> <20040816090753.4bd30a8d.mshefty@ichips.intel.com> <20040816122310.1ced998f.mshefty@ichips.intel.com> <52n00taclc.fsf@topspin.com> <20040817095406.7f57bde6.mshefty@ichips.intel.com> <20040817102313.6126e078.mshefty@ichips.intel.com> <1092768520.1837.15.camel@localhost.localdomain> Message-ID: <20040817124403.53cda21f.mshefty@ichips.intel.com> On Tue, 17 Aug 2004 14:48:40 -0400 Hal Rosenstock wrote: > I found 2 minor issues with this change: > ib_verbs.h:261: conflicting types for `IB_DEVICE_SYS_IMAGE_GUID' > ib_verbs.h:138: previous declaration of `IB_DEVICE_SYS_IMAGE_GUID' Thanks for the review! Patch is applied. I need to think about what to do about the ib_device_cap_flags and ib_device_modify_flags. Any thoughts? From gdror at mellanox.co.il Tue Aug 17 14:19:34 2004 From: gdror at mellanox.co.il (Dror Goldenberg) Date: Wed, 18 Aug 2004 00:19:34 +0300 Subject: [openib-general] Error code in "create" functions Message-ID: <506C3D7B14CDD411A52C00025558DED605C68A98@mtlex01.yok.mtl.com> Hi, It appears that in some functions that perform allocation of objects, the new object pointer is being returned. For example: struct ib_pd * (*alloc_pd)(struct ib_device *device); struct ib_ah * (*create_ah)(struct ib_pd *pd, struct ib_ah_attr *ah_attr); struct ib_qp * (*create_qp)(struct ib_pd *pd,struct ib_qp_init_attr *qp_init_attr, struct ib_qp_cap *qp_cap); struct ib_srq * (*create_srq)(struct ib_pd *pd, void *srq_context, struct ib_srq_attr *srq_attr); My concern is about the ability to return error status. The failure may be due to lack of resources, in which case, returning NULL makes sense. However, some may fail because of bad parameters being passed in. 
This is mainly because these functions not only deal with memory allocation of the object, they also deal with initialization of the object and configuration of the HW. It's not so bad just to return NULL when the operation fails and drop the error code. But, even for the sake of debug and diagnostics, I'd consider allowing the propagation of a return code. -Dror -------------- next part -------------- An HTML attachment was scrubbed... URL: From mshefty at ichips.intel.com Tue Aug 17 13:23:25 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 17 Aug 2004 13:23:25 -0700 Subject: [openib-general] Error code in "create" functions In-Reply-To: <506C3D7B14CDD411A52C00025558DED605C68A98@mtlex01.yok.mtl.com> References: <506C3D7B14CDD411A52C00025558DED605C68A98@mtlex01.yok.mtl.com> Message-ID: <20040817132325.186b79eb.mshefty@ichips.intel.com> On Wed, 18 Aug 2004 00:19:34 +0300 Dror Goldenberg wrote: > My concern is about the ability to return error status. The failure may be > due > to lack of resources, in which case, returning NULL makes sense. However, > some may fail because of bad parameters being passed in. This is mainly > because these functions not only deal with memory allocation of the object, > they also deal with initialization of the object and configuration of the > HW. Based on previous discussions, I believe that the plan is to use the ERR_PTR / PTR_ERR / IS_ERR routines to return failure reasons. From gdror at mellanox.co.il Tue Aug 17 14:30:22 2004 From: gdror at mellanox.co.il (Dror Goldenberg) Date: Wed, 18 Aug 2004 00:30:22 +0300 Subject: [openib-general] Client Reregistration Status Message-ID: <506C3D7B14CDD411A52C00025558DED605C68AAC@mtlex01.yok.mtl.com> > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Friday, August 13, 2004 12:24 AM > > Note that the last 2 items (4 and 5) need approval by SWG and > are subject to change. > I promised to update... There was no quorum today. 
It'll be discussed next Tue :) Anyways, I still don't expect any problems with it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gdror at mellanox.co.il Tue Aug 17 14:38:13 2004 From: gdror at mellanox.co.il (Dror Goldenberg) Date: Wed, 18 Aug 2004 00:38:13 +0300 Subject: [openib-general] Error code in "create" functions Message-ID: <506C3D7B14CDD411A52C00025558DED605C68AAE@mtlex01.yok.mtl.com> > -----Original Message----- > From: Sean Hefty [mailto:mshefty at ichips.intel.com] > Sent: Tuesday, August 17, 2004 11:23 PM > > > Based on previous discussions, I believe that the plan is to > use the ERR_PTR / PTR_ERR / IS_ERR routines to return failure reasons. > Got it ! Thanks, Dror -------------- next part -------------- An HTML attachment was scrubbed... URL: From roland at topspin.com Tue Aug 17 14:34:30 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 17 Aug 2004 14:34:30 -0700 Subject: [openib-general] Error code in "create" functions In-Reply-To: <20040817132325.186b79eb.mshefty@ichips.intel.com> (Sean Hefty's message of "Tue, 17 Aug 2004 13:23:25 -0700") References: <506C3D7B14CDD411A52C00025558DED605C68A98@mtlex01.yok.mtl.com> <20040817132325.186b79eb.mshefty@ichips.intel.com> Message-ID: <52acwta1p5.fsf@topspin.com> Dror> My concern is about the ability to return error status. The Dror> failure may be due to lack of resources, in which case, Dror> returning NULL makes sense. However, some may fail because Dror> of bad parameters being passed in. This is mainly because Dror> these functions not only deal with memory allocation of the Dror> object, they also deal with initialization of the object and Dror> configuration of the HW. Sean> Based on previous discussions, I believe that the plan is to Sean> use the ERR_PTR / PTR_ERR / IS_ERR routines to return Sean> failure reasons. Exactly, that is what my implementation in mthca does. 
- Roland From roland at topspin.com Tue Aug 17 14:49:01 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 17 Aug 2004 14:49:01 -0700 Subject: [openib-general] modify_device API? In-Reply-To: <20040817124403.53cda21f.mshefty@ichips.intel.com> (Sean Hefty's message of "Tue, 17 Aug 2004 12:44:03 -0700") References: <52n00yxzor.fsf@topspin.com> <20040816090753.4bd30a8d.mshefty@ichips.intel.com> <20040816122310.1ced998f.mshefty@ichips.intel.com> <52n00taclc.fsf@topspin.com> <20040817095406.7f57bde6.mshefty@ichips.intel.com> <20040817102313.6126e078.mshefty@ichips.intel.com> <1092768520.1837.15.camel@localhost.localdomain> <20040817124403.53cda21f.mshefty@ichips.intel.com> Message-ID: <52657ha10y.fsf@topspin.com> Sean> Patch is applied. I need to think about what to do about Sean> the ib_device_cap_flags and ib_device_modify_flags. Any Sean> thoughts? We could maybe use IB_DEV_CAP_ and IB_DEV_MOD_ as the prefixes... - R. From krkumar at us.ibm.com Tue Aug 17 16:29:43 2004 From: krkumar at us.ibm.com (Krishna Kumar) Date: Tue, 17 Aug 2004 16:29:43 -0700 (PDT) Subject: [openib-general] Couple of questions on struct ib_device In-Reply-To: <52hdrytory.fsf@topspin.com> Message-ID: Hi, I was looking at ib_verbs.h and I am confused about one thing : why are entry points like open_hca, modify_hca and close_hca missing ? Also, is there any value in encapsulating the various functions of ib_device into a different structure, like ib_device_ops or something rather than having a visually huge structure that we have today ? 
thanks, - KK From roland at topspin.com Tue Aug 17 17:32:01 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 17 Aug 2004 17:32:01 -0700 Subject: [openib-general] Couple of questions on struct ib_device In-Reply-To: (Krishna Kumar's message of "Tue, 17 Aug 2004 16:29:43 -0700 (PDT)") References: Message-ID: <521xi59tha.fsf@topspin.com> Krishna> Hi, I was looking at ib_verbs.h and I am confused about Krishna> one thing : why are entry points like open_hca, Krishna> modify_hca and close_hca missing ? Not sure which version of ib_verbs.h you were looking at. If you were looking in my branch then the structure is incomplete. In any case, modify_hca will be split into modify_port and modify_device functions. I'm not sure we really need open_hca and close_hca (at least as defined in the spec), and in any case I think this will be handled in the access layer (the low-level driver doesn't need to handle the reference counting or anything else). I expect to get a better idea about this as I implement device model/sysfs stuff. Krishna> Also, is there any value in encapsulating the various Krishna> functions of ib_device into a different structure, like Krishna> ib_device_ops or something rather than having a visually Krishna> huge structure that we have today ? Maybe, but the kernel currently has far bigger structures such as struct net_device or struct task_struct. We can just keep the function pointers separate in the declaration of struct ib_device. - R. From mshefty at ichips.intel.com Thu Aug 19 08:58:49 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 19 Aug 2004 08:58:49 -0700 Subject: [openib-general] modify_device API? 
In-Reply-To: <52657ha10y.fsf@topspin.com> References: <52n00yxzor.fsf@topspin.com> <20040816090753.4bd30a8d.mshefty@ichips.intel.com> <20040816122310.1ced998f.mshefty@ichips.intel.com> <52n00taclc.fsf@topspin.com> <20040817095406.7f57bde6.mshefty@ichips.intel.com> <20040817102313.6126e078.mshefty@ichips.intel.com> <1092768520.1837.15.camel@localhost.localdomain> <20040817124403.53cda21f.mshefty@ichips.intel.com> <52657ha10y.fsf@topspin.com> Message-ID: <20040819085849.404391af.mshefty@ichips.intel.com> On Tue, 17 Aug 2004 14:49:01 -0700 Roland Dreier wrote: > Sean> Patch is applied. I need to think about what to do about > Sean> the ib_device_cap_flags and ib_device_modify_flags. Any > Sean> thoughts? > > We could maybe use IB_DEV_CAP_ and IB_DEV_MOD_ as the prefixes... Here's a patch that fixes the duplicate names and updates a couple of areas for consistency with other parts of the API/spec. I renamed the ib_device_cap and ib_port structures to ib_device_attr and ib_port_attr, respectively. I didn't see that these were in use in the checked-in mthca driver yet, so I didn't create a patch for those files. Patch is not yet applied. Please respond with any comments. 
- Sean Index: ib_verbs.h =================================================================== --- ib_verbs.h (revision 666) +++ ib_verbs.h (working copy) @@ -125,8 +125,8 @@ enum ib_device_cap_flags { IB_DEVICE_RESIZE_MAX_WR = 1, - IB_DEVICE_BAD_PKEY_CNT = (1<<1), - IB_DEVICE_BAD_QKEY_CNT = (1<<2), + IB_DEVICE_BAD_PKEY_CNTR = (1<<1), + IB_DEVICE_BAD_QKEY_CNTR = (1<<2), IB_DEVICE_RAW_MULTI = (1<<3), IB_DEVICE_AUTO_PATH_MIG = (1<<4), IB_DEVICE_CHANGE_PHY_PORT = (1<<5), @@ -148,7 +148,7 @@ IB_ATOMIC_GLOB }; -struct ib_device_cap { +struct ib_device_attr { u64 fw_ver; u64 node_guid; u64 sys_image_guid; @@ -237,7 +237,7 @@ IB_PORT_BOOT_MGMT_SUP = (1<<9) }; -struct ib_port { +struct ib_port_attr { enum ib_port_state state; enum ib_mtu max_mtu; enum ib_mtu active_mtu; @@ -257,7 +257,7 @@ }; enum ib_device_modify_flags { - IB_DEVICE_SYS_IMAGE_GUID = 1 + IB_DEVICE_MODIFY_SYS_IMAGE_GUID = 1 }; struct ib_device_modify { @@ -629,9 +629,9 @@ u32 flags; int (*query_device)(struct ib_device *device, - struct ib_device_cap *device_cap); + struct ib_device_attr *device_attr); int (*query_port)(struct ib_device *device, - u8 port_num, struct ib_port *port); + u8 port_num, struct ib_port_attr *port_attr); int (*query_gid)(struct ib_device *device, u8 port_num, int index, union ib_gid *gid); @@ -737,16 +737,16 @@ }; static inline int ib_query_device(struct ib_device *device, - struct ib_device_cap *device_cap) + struct ib_device_attr *device_attr) { - return device->query_device(device, device_cap); + return device->query_device(device, device_attr); } static inline int ib_query_port(struct ib_device *device, u8 port_num, - struct ib_port *port) + struct ib_port_attr *port_attr) { - return device->query_port(device, port_num, port); + return device->query_port(device, port_num, port_attr); } static inline int ib_query_gid(struct ib_device *device, From roland.list at gmail.com Thu Aug 19 19:10:16 2004 From: roland.list at gmail.com (Roland Dreier) Date: Thu, 19 Aug 2004 19:10:16 -0700 
Subject: [openib-general] modify_device API? In-Reply-To: <20040819085849.404391af.mshefty@ichips.intel.com> References: <52n00yxzor.fsf@topspin.com> <20040816090753.4bd30a8d.mshefty@ichips.intel.com> <20040816122310.1ced998f.mshefty@ichips.intel.com> <52n00taclc.fsf@topspin.com> <20040817095406.7f57bde6.mshefty@ichips.intel.com> <20040817102313.6126e078.mshefty@ichips.intel.com> <1092768520.1837.15.camel@localhost.localdomain> <20040817124403.53cda21f.mshefty@ichips.intel.com> <52657ha10y.fsf@topspin.com> <20040819085849.404391af.mshefty@ichips.intel.com> Message-ID: This looks good to me. I haven't checked anything yet in for mthca (been stuck working on other stuff lately). - R. From mshefty at ichips.intel.com Fri Aug 20 09:59:19 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 20 Aug 2004 09:59:19 -0700 Subject: [openib-general] modify_device API? In-Reply-To: References: <52n00yxzor.fsf@topspin.com> <20040816090753.4bd30a8d.mshefty@ichips.intel.com> <20040816122310.1ced998f.mshefty@ichips.intel.com> <52n00taclc.fsf@topspin.com> <20040817095406.7f57bde6.mshefty@ichips.intel.com> <20040817102313.6126e078.mshefty@ichips.intel.com> <1092768520.1837.15.camel@localhost.localdomain> <20040817124403.53cda21f.mshefty@ichips.intel.com> <52657ha10y.fsf@topspin.com> <20040819085849.404391af.mshefty@ichips.intel.com> Message-ID: <20040820095919.185cefcb.mshefty@ichips.intel.com> On Thu, 19 Aug 2004 19:10:16 -0700 Roland Dreier wrote: > This looks good to me. Changes have been committed. > I haven't checked anything yet in for mthca (been stuck working on > other stuff lately). Same here... From stan.smith at intel.com Fri Aug 20 15:11:51 2004 From: stan.smith at intel.com (Smith, Stan) Date: Fri, 20 Aug 2004 15:11:51 -0700 Subject: [openib-general] IB stack configured for kernel only support? Message-ID: Will the openIB 2.6.x stack support a 'kernel only' configuration suitable to be run from a ramdisk (initrd) boot environment? 
Specifically, IB startup has no dependencies on 'root' filesystem inodes. Will user-mode IB be supported in such a manner that it can be enabled/loaded 'after' the kernel IB stack has been started? Case in point, an openSSI (www.openSSI.org) kernel creates an ICS (Internode Communication Subsystem) channel during the ramdisk (initrd) phase of kernel boot. This ICS channel is expected to persist thru the mounting of the root filesystem until kernel shutdown. For Infiniband ICS support, the kernel IB stack is started with connections established during the ramdisk (initrd) phase of system booting. If the user-mode IB stack requires a 'root' inode (/dev?) for ioctl access to kernel IB layers, then the user-mode startup must be delayed until the 'root' filesystem is actually mounted. -------------- next part -------------- An HTML attachment was scrubbed... URL: From roland at topspin.com Fri Aug 20 15:18:32 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 20 Aug 2004 15:18:32 -0700 Subject: [openib-general] IB stack configured for kernel only support? In-Reply-To: (Stan Smith's message of "Fri, 20 Aug 2004 15:11:51 -0700") References: Message-ID: <52fz6htpvr.fsf@topspin.com> Stan> Will the openIB 2.6.x stack support a 'kernel only' Stan> configuration suitable to be run from a ramdisk (initrd) Stan> boot environment?  Specifically, IB startup has no Stan> dependencies on 'root' filesystem inodes.Will user-mode IB Stan> be supported in such a manner that it can be enabled/loaded Stan> 'after'  the kernel IB stack has been started?Case in point, Stan> an openSSI (www.openSSI.org) kernel creates an ICS Stan> (Internode Communication Subsystem) channel during the Stan> ramdisk (initrd) phase of kernel boot. This ICS channel is Stan> expected to persist thru the mounting of the root filesystem Stan> until kernel shutdown. 
For Infiniband ICS support, the Stan> kernel IB stack is started with connections established Stan> during the ramdisk (initrd) phase of system booting. If the Stan> user-mode IB stack requires a 'root' inode (/dev?) for ioctl Stan> access to kernel IB layers, then the user-mode startup must Stan> be delayed until the 'root' filesystem is actually Stan> mounted. The kernel piece of the OpenIB stack works with no dependencies on userspace, if it is linked into the kernel. Obviously if modules need to be loaded then module-init-tools and possibly hotplug and others need to be running. However, it's pretty standard to load modules from an initrd so even a modular stack should work. Userspace access has not been implemented yet but it will definitely be possible to start it after the kernel IB layer is running. - R. From mlleinin at hpcn.ca.sandia.gov Sat Aug 21 15:57:30 2004 From: mlleinin at hpcn.ca.sandia.gov (Matt L. Leininger) Date: Sat, 21 Aug 2004 15:57:30 -0700 Subject: [openib-general] DAPL for openib In-Reply-To: <826A2C2E68E28D4781FAA6D675A3E93802C4DC48@exch-1.topspincom.com> References: <826A2C2E68E28D4781FAA6D675A3E93802C4DC48@exch-1.topspincom.com> Message-ID: <1093129050.15905.790.camel@trinity> Is anyone going to put their version of DAPL (uDAPL and kDAPL) source code in the openib code repository? About 7 weeks ago someone was going to 'add it soon', but I still don't see anything. - Matt From roland at topspin.com Sat Aug 21 17:34:41 2004 From: roland at topspin.com (Roland Dreier) Date: Sat, 21 Aug 2004 17:34:41 -0700 Subject: [openib-general] DAPL for openib In-Reply-To: <1093129050.15905.790.camel@trinity> (Matt L. Leininger's message of "Sat, 21 Aug 2004 15:57:30 -0700") References: <826A2C2E68E28D4781FAA6D675A3E93802C4DC48@exch-1.topspincom.com> <1093129050.15905.790.camel@trinity> Message-ID: <52smagvwm6.fsf@topspin.com> Matt> Is anyone going to put their version of DAPL (uDAPL and Matt> kDAPL) source code in the openib code repository? 
About 7 Matt> weeks ago someone was going to 'add it soon', but I still Matt> don't see anything. I believe both the Voltaire and InfiniCon trees checked into svn include uDAPL. The Topspin uDAPL is in the infiniband-support tarball linked on http://openib.org/downloads (we never got around to fixing up our build system, and now that userspace verbs have been pushed out so far, the priority of this work has gone way down). In any case I don't think it's worth worrying about DAPL right now before we have the prerequisites squared away (userspace access for uDAPL, CM API for kDAPL). - R. From yaronh at voltaire.com Mon Aug 23 01:20:48 2004 From: yaronh at voltaire.com (Yaron Haviv) Date: Mon, 23 Aug 2004 11:20:48 +0300 Subject: [openib-general] DAPL for openib Message-ID: <35EA21F54A45CB47B879F21A91F4862F1DE980@taurus.voltaire.com> > -----Original Message----- > From: openib-general-bounces at openib.org [mailto:openib-general- > bounces at openib.org] On Behalf Of Matt L. Leininger > Sent: Sunday, August 22, 2004 1:58 AM > To: openib-general at openib.org > Subject: [openib-general] DAPL for openib > > Is anyone going to put their version of DAPL (uDAPL and kDAPL) source > code in the openib code repository? About 7 weeks ago someone was going > to 'add it soon', but I still don't see anything. > > - Matt > Matt, Our trunk includes kDAPL working over our gen1 code. We will port it to gen2 once things become more stable and the missing access pieces will be completed. One thing we haven't discussed yet is the gen2 CM api which is a prerequisite for most ULP's. For uDAPL we have a version out, I believe others indicated they have done more work with Oracle, and will provide an updated drop sometime soon. I think the latest code should be published regardless of the userspace support in gen2, so development will be done in the open. 
Another idea I suggested in the past was to deliver all the changes back to Source Forge (maintained by Steve S) and have one unified kDAPL/uDAPL; our recent changes were updated there just a few weeks ago. Maybe we should also start the discussion on the userspace support. After all, people would like to see MPI & uDAPL running over gen2; the sooner the better. Yaron From yhkim93 at keti.re.kr Mon Aug 23 04:32:18 2004 From: yhkim93 at keti.re.kr (yhkim93) Date: Mon, 23 Aug 2004 20:32:18 +0900 Subject: [openib-general] is there SRP target emulation code in the gen1? Message-ID: <000001c48904$d8ba50a0$d2abfecb@ketiresearcher> I would like to form an SRP target emulation developer team. Who is interested in it? I am working on an SRP target module, but it is very difficult to do alone. -------------- next part -------------- An HTML attachment was scrubbed... URL: From halr at voltaire.com Mon Aug 23 09:34:11 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 23 Aug 2004 12:34:11 -0400 Subject: [openib-general] [PATCH] GSI: Use ib_reg_phys_mr rather than ib_reg_mr Message-ID: <1093278851.1830.2.camel@localhost.localdomain> Use ib_reg_phys_mr rather than ib_reg_mr as ib_reg_mr is no longer a kernel-supported function Index: access/ib_verbs_priv.h =================================================================== --- access/ib_verbs_priv.h (revision 634) +++ access/ib_verbs_priv.h (working copy) @@ -38,8 +38,4 @@ int proc_mad_opts, struct mad_t *mad_in, struct mad_t *mad_out); -/* temporary !!! 
*/ -struct ib_mr *ib_reg_mr(struct ib_pd *pd, - void *addr, - u64 size, int mr_access_flags, u32 * lkey, u32 * rkey); #endif /* IB_VERBS_PRIV_H */ Index: access/TODO =================================================================== --- access/TODO (revision 647) +++ access/TODO (working copy) @@ -1,6 +1,5 @@ -8/13/04 +8/23/04 -Replace ib_reg_mr with ib_reg_phys_mr Makefile needs to use standard kbuild Migrate from /proc to /sysfs Index: access/gsi_main.c =================================================================== --- access/gsi_main.c (revision 669) +++ access/gsi_main.c (working copy) @@ -60,6 +60,7 @@ #include #include #include +#include #include #include @@ -353,6 +354,10 @@ dtgrm = (struct gsi_dtgrm_priv_st *) hca->rcv_posted_dtgrm_list. next; + pci_unmap_single(hca->handle->dma_device, + pci_unmap_addr(&dtgrm->grh, mapping), + MAD_BLOCK_SIZE + IB_GRH_LEN, + PCI_DMA_FROMDEVICE); v_list_del((struct list_head *) dtgrm); gsi_dtgrm_pool_put((struct gsi_dtgrm_t *) dtgrm); @@ -373,7 +378,9 @@ dtgrm_priv = (struct gsi_dtgrm_priv_st *) class_info-> snd_posted_dtgrm_list.next; - + pci_unmap_single(class_info->hca->handle->dma_device, + pci_unmap_addr(&dtgrm_priv->mad, mapping), + MAD_BLOCK_SIZE, PCI_DMA_TODEVICE); /* * Remove the datagram from the posted datagram list * class_info->snd_posted_dtgrm_list @@ -458,7 +465,6 @@ gsi_post_receive_dtgrms(struct gsi_hca_info_st *hca) { int ret; - u32 rkey; struct ib_recv_wr wr; struct ib_recv_wr *bad_wr; struct gsi_dtgrm_priv_st *dtgrm_priv; @@ -470,24 +476,19 @@ */ while (gsi_dtgrm_pool_get(hca->rcv_dtgrm_pool, (struct gsi_dtgrm_t **) &dtgrm_priv) == 0) { - dtgrm_priv->v_mem_h = ib_reg_mr(hca->pd, - dtgrm_priv->grh, - MAD_BLOCK_SIZE + IB_GRH_LEN, - IB_ACCESS_LOCAL_WRITE, - &dtgrm_priv->sg.lkey, &rkey); - if (IS_ERR(dtgrm_priv->v_mem_h)) { - printk(KERN_ERR \ - "Could not get general memory region\n"); - ret = PTR_ERR(dtgrm_priv->v_mem_h); - goto error1; - } /* * Setup scatter-gather list */ - dtgrm_priv->sg.addr = 
(unsigned long) dtgrm_priv->grh; + dtgrm_priv->sg.addr = pci_map_single(hca->handle->dma_device, + dtgrm_priv->grh, + MAD_BLOCK_SIZE + IB_GRH_LEN, + PCI_DMA_FROMDEVICE); dtgrm_priv->sg.length = MAD_BLOCK_SIZE + IB_GRH_LEN; + dtgrm_priv->sg.lkey = hca->mr->lkey; + pci_unmap_addr_set(dtgrm_priv->grh, mapping, dtgrm_priv->sg.addr); + memset(&wr, 0, sizeof (wr)); wr.wr_id = (unsigned long) dtgrm_priv; wr.sg_list = &dtgrm_priv->sg; @@ -500,18 +501,20 @@ if (!(ret = ib_post_recv(hca->qp, &wr, &bad_wr))) { printk(KERN_ERR "Could not post receive request\n"); - goto error2; + goto error1; } hca->stat.rcv_posted_cnt++; } return 0; -error2: +error1: + pci_unmap_single(hca->handle->dma_device, + pci_unmap_addr(&dtgrm_priv->grh, mapping), + MAD_BLOCK_SIZE + IB_GRH_LEN, PCI_DMA_FROMDEVICE); GSI_RCV_LIST_LOCK(hca); v_list_del((struct list_head *) dtgrm_priv); GSI_RCV_LIST_UNLOCK(hca); -error1: gsi_dtgrm_pool_put((struct gsi_dtgrm_t *) dtgrm_priv); printk(KERN_DEBUG "ret = %d\n", ret); @@ -669,6 +672,11 @@ (struct gsi_hca_info_st *) &gsi_hca_list, *entry; int ret; int cq_size; + u64 iova = 0; + struct ib_phys_buf buf_list = { + .addr = 0, + .size = (unsigned long) high_memory - PAGE_OFFSET + }; struct ib_qp_init_attr qp_init_attr; struct ib_qp_cap qp_cap; GSI_HCA_LIST_LOCK_VAR; @@ -731,6 +739,14 @@ goto error4; } + hca->mr = ib_reg_phys_mr(hca->pd, &buf_list, 1, + IB_ACCESS_LOCAL_WRITE, &iova); + if (IS_ERR(hca->mr)) { + printk(KERN_ERR "Could not register MR.\n"); + ret = PTR_ERR(hca->mr); + goto error5; + } + memset(&qp_init_attr, 0, sizeof (qp_init_attr)); qp_init_attr.send_cq = hca->cq; qp_init_attr.recv_cq = hca->cq; @@ -1217,6 +1233,10 @@ struct mad_t *mad = (struct mad_t *) &dtgrm_priv->mad; GSI_SND_LIST_LOCK_VAR; + pci_unmap_single(hca->handle->dma_device, + pci_unmap_addr(&dtgrm_priv->mad, mapping), + MAD_BLOCK_SIZE, PCI_DMA_TODEVICE); + #if 0 /* GSI_ADDRESS_HNDL_POOL_SUPPORT */ ib_put_ah(dtgrm_priv->addr_hndl); #else @@ -1634,11 +1654,14 @@ hca->stat.rcv_cnt++; 
hca->stat.rcv_posted_cnt--; + pci_unmap_single(hca->handle->dma_device, + pci_unmap_addr(&dtgrm_priv->grh, mapping), + MAD_BLOCK_SIZE + IB_GRH_LEN, PCI_DMA_FROMDEVICE); + /* * Remove the datagram from the posted datagram list * hca->rcv_posted_dtgrm_list */ - GSI_RCV_LIST_LOCK(hca); v_list_del((struct list_head *) dtgrm_priv); GSI_RCV_LIST_UNLOCK(hca); @@ -1898,7 +1921,6 @@ { struct ib_ah_attr addr_vec; struct ib_ah *addr_hndl; - u32 rkey; struct ib_send_wr wr; struct ib_send_wr *bad_wr; struct mad_t *mad = (struct mad_t *) dtgrm->mad; @@ -1925,20 +1947,11 @@ } #endif - dtgrm_priv->v_mem_h = ib_reg_mr(hca->pd, - mad, - MAD_BLOCK_SIZE, - IB_ACCESS_LOCAL_WRITE, - &dtgrm_priv->sg.lkey, &rkey); - if (IS_ERR(dtgrm_priv->v_mem_h)) { - printk(KERN_ERR "Could not get general memory attr.\n"); - ret = PTR_ERR(dtgrm_priv->v_mem_h); - goto error2; - } - wr.wr_id = (unsigned long) dtgrm_priv; wr.sg_list = &dtgrm_priv->sg; - wr.sg_list->addr = (unsigned long) mad; + wr.sg_list->addr = pci_map_single(hca->handle->dma_device, + mad, MAD_BLOCK_SIZE, + PCI_DMA_TODEVICE); wr.sg_list->length = MAD_BLOCK_SIZE; wr.sg_list->lkey = dtgrm_priv->sg.lkey; wr.num_sge = 1; @@ -1953,6 +1966,8 @@ mad_swap_header(mad); + pci_unmap_addr_set(&dtgrm_priv->mad, mapping, wr.sg_list->addr); + dtgrm_priv->posted++; dtgrm_priv->addr_hndl = addr_hndl; @@ -1971,18 +1986,20 @@ if ((ret = ib_post_send(hca->qp, &wr, &bad_wr))) { printk(KERN_ERR "Could not post send request\n"); - goto error3; + goto error2; } class_info->stat.snd_cnt++; return 0; -error3: +error2: + pci_unmap_single(hca->handle->dma_device, + pci_unmap_addr(&dtgrm_priv->mad, mapping), + MAD_BLOCK_SIZE, PCI_DMA_TODEVICE); GSI_SND_LIST_LOCK(class_info->hca); v_list_del((struct list_head *) dtgrm_priv); dtgrm_priv->posted--; GSI_SND_LIST_UNLOCK(class_info->hca); -error2: #if 0 /* GSI_ADDRESS_HNDL_POOL_SUPPORT */ ib_put_ah(addr_hndl); #else @@ -2031,7 +2048,6 @@ { struct ib_ah_attr addr_vec; struct ib_ah *addr_hndl; - u32 rkey; struct ib_send_wr 
wr; struct ib_send_wr *bad_wr; struct mad_t *mad = (struct mad_t *) dtgrm->mad; @@ -2076,19 +2092,12 @@ goto error1; } #endif - dtgrm_priv->v_mem_h = ib_reg_mr(hca->pd, - mad, - MAD_BLOCK_SIZE, - IB_ACCESS_LOCAL_WRITE, - &dtgrm_priv->sg.lkey, &rkey); - if (IS_ERR(dtgrm_priv->v_mem_h)) { - printk(KERN_ERR "Could not get general memory attr.\n"); - goto error2; - } wr.wr_id = (unsigned long) dtgrm_priv; wr.sg_list = &dtgrm_priv->sg; - wr.sg_list->addr = (unsigned long) mad; + wr.sg_list->addr = pci_map_single(hca->handle->dma_device, + mad, MAD_BLOCK_SIZE, + PCI_DMA_TODEVICE); wr.sg_list->length = MAD_BLOCK_SIZE; wr.sg_list->lkey = dtgrm_priv->sg.lkey; wr.num_sge = 1; @@ -2103,6 +2112,8 @@ mad_swap_header(mad); + pci_unmap_addr_set(&dtgrm_priv->mad, mapping, wr.sg_list->addr); + dtgrm_priv->posted++; dtgrm_priv->addr_hndl = addr_hndl; @@ -2122,6 +2133,9 @@ return 0; error2: + pci_unmap_single(hca->handle->dma_device, + pci_unmap_addr(&dtgrm_priv->mad, mapping), + MAD_BLOCK_SIZE, PCI_DMA_TODEVICE); #if 0 /* GSI_ADDRESS_HNDL_POOL_SUPPORT */ ib_put_ah(addr_hndl); #else Index: access/gsi_priv.h =================================================================== --- access/gsi_priv.h (revision 644) +++ access/gsi_priv.h (working copy) @@ -88,7 +88,6 @@ struct gsi_dtgrm_priv_st *prev; void *pool; struct ib_sge sg; - struct ib_mr *v_mem_h; int owner; /* 0 - gsi pool, 1 - user */ } __attribute__ ((packed)); @@ -97,7 +96,6 @@ struct gsi_dtgrm_priv_st *prev; void *pool; struct ib_sge sg; - struct ib_mr *v_mem_h; int owner; /* 0 - gsi pool, 1 - user */ /* @@ -116,7 +114,6 @@ struct gsi_dtgrm_priv_st *prev; void *pool; struct ib_sge sg; - struct ib_mr *v_mem_h; int owner; /* 0 - gsi pool, 1 - user */ /* @@ -194,8 +191,9 @@ u8 port; struct ib_qp *qp; /* QP */ - struct ib_cq *cq; /* Complete queue */ + struct ib_cq *cq; /* Completion queue */ struct ib_pd *pd; /* Protection domain */ + struct ib_mr *mr; /* Memory region */ #if 0 /* GSI_AH_CACHE_SUPPORT */ #define 
GSI_AH_CACHE_ENTRY_EMPTY 0 From bryans at aspsys.com Mon Aug 23 09:59:45 2004 From: bryans at aspsys.com (Bryan Stillwell) Date: Mon, 23 Aug 2004 10:59:45 -0600 Subject: [openib-general] OpenIB w/ SuSE 9.1 Professional Message-ID: <20040823165945.GA14927@aspsys.com> I have the task of setting up a cluster for a customer in which each node contains an InfiniServ HCA and a Silicon Image SATA chipset (Sil3112). Pertinent lspci output: 01:05.0 RAID bus controller: Silicon Image, Inc. (formerly CMD Technology Inc) Silicon Image SiI 3114 SATARaid Controller (rev 02) 02:03.0 PCI bridge: Mellanox Technology MT23108 PCI Bridge (rev a1) 03:00.0 InfiniBand: Mellanox Technology MT23108 InfiniHost (rev a1) The problem I have is getting proper driver support for both pieces of hardware at the same time. The SATA chipset seems to be supported in 2.6 kernels and 2.4 kernels starting with 2.4.27. However, InfiniCon doesn't seem to have any released drivers for those kernels and I was wondering if openib might be a good solution for me? Thanks, Bryan -- Aspen Systems, Inc. | http://www.aspsys.com/ Production Engineer | Phone: (303)431-4606 bryans at aspsys.com | Fax: (303)431-7196 From roland at topspin.com Mon Aug 23 10:33:46 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 23 Aug 2004 10:33:46 -0700 Subject: [openib-general] DAPL for openib In-Reply-To: <35EA21F54A45CB47B879F21A91F4862F1DE980@taurus.voltaire.com> (Yaron Haviv's message of "Mon, 23 Aug 2004 11:20:48 +0300") References: <35EA21F54A45CB47B879F21A91F4862F1DE980@taurus.voltaire.com> Message-ID: <52pt5hvjwl.fsf@topspin.com> Yaron> Maybe we should also start the discussion on the userspace Yaron> support After all people would like to see MPI & uDAPL Yaron> running over gen2 sooner the better Given the small team that is actually coding, I think we should follow a step-by-step approach and focus on getting IPoIB ready for submission to the kernel. 
I think the pieces we need are: - Update verbs to new API (I should finish this week) - Add driver model/sysfs support (I'll start soon) - Finish new MAD API (stalled?) - Design and implement SA multicast/path record queries (not started) - clean up IPoIB driver (not started) Once we have that, we can look at userspace MAD access so that OpenSM can run on the new stack. Also the CM API would be a logical next step. I don't think it makes sense to even worry about user verbs until we've finished these pieces. - Roland From fabbri at isilon.com Mon Aug 23 15:01:41 2004 From: fabbri at isilon.com (fabbri) Date: Mon, 23 Aug 2004 15:01:41 -0700 Subject: [openib-general] Alternate Path Migration support? Message-ID: <20040823220141.GI16945@isilon.com> Hi, I am looking at using Alternate Path Migration (APM) to do failover of a Reliable Connection (RC). A couple of questions: 1. Does the openib stack on tavor H/W support APM? Anyone played with it? 2. Do I need to supply the alternate path in the ib_cm_connect, or can I later do an ib_qp_modify to supply the alternate path? If ib_qp_modify is sufficient, should this cause the LAP CM message to be sent, or is that something that the consumer has to request separately (after the call to qp_modify)? Thanks, Aaron -- << Aaron Fabbri | fabbri at isilon.com >> From roland at topspin.com Mon Aug 23 15:49:40 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 23 Aug 2004 15:49:40 -0700 Subject: [openib-general] Alternate Path Migration support? In-Reply-To: <20040823220141.GI16945@isilon.com> (fabbri@isilon.com's message of "Mon, 23 Aug 2004 15:01:41 -0700") References: <20040823220141.GI16945@isilon.com> Message-ID: <52brh1tqpn.fsf@topspin.com> fabbri> 1. Does the openib stack on tavor H/W support APM? Anyone fabbri> played with it? All the older stacks should support it, and I have tested it. 
I still have not implemented setting of the alternate path in the new mthca driver (although it is not much work), so the gen2 stack doesn't support APM yet. fabbri> 2. Do I need to supply the alternate path in the fabbri> ib_cm_connect, or can I later do a ib_qp_modify to supply fabbri> the alternate path? fabbri> If ib_qp_modify is sufficient, should this cause the LAP fabbri> CM message to be sent, or is that something that the fabbri> consumer has request separately (after the call to fabbri> qp_modify)? You can supply an alternate path to ib_cm_connect(), or load an alternate path after the connection is established using ib_cm_alternate_path_load(), which will cause a LAP to be sent, which will make the other side get a TS_IB_CM_LAP_RECEIVED callback, etc. You shouldn't use ib_qp_modify() while the CM owns the QP state -- in particular it definitely won't send a LAP. - R. From michael at mellanox.co.il Mon Aug 23 16:12:43 2004 From: michael at mellanox.co.il (Michael Kagan) Date: Tue, 24 Aug 2004 02:12:43 +0300 Subject: [openib-general] Alternate Path Migration support? Message-ID: <506C3D7B14CDD411A52C00025558DED605C224D2@mtlex01.yok.mtl.com> Tavor HW supports APM. Support is fully compliant to the IB spec (e.g. alternative path load, migration and notification). Michael Kagan VP of Architecture Mellanox Technologies, Ltd mailto:michael at mellanox.co.il Tel +972-4-9097200 Fax +972-4-9593245 Cellular +972-54-478807 (Israel) Cellular (408)-802-0838 (USA) P.O.B 86, Yokneam, 20692, Israel [sent from Blackberry] -----Original Message----- From: fabbri To: openib-general at openib.org Sent: Mon Aug 23 15:01:41 2004 Subject: [openib-general] Alternate Path Migration support? Hi, I am looking at using Alternate Path Migration (APM) to do failover of a Reliable Connection (RC). A couple of questions: 1. Does the openib stack on tavor H/W support APM? Anyone played with it? 2. 
Do I need to supply the alternate path in the ib_cm_connect, or can I later do an ib_qp_modify to supply the alternate path? If ib_qp_modify is sufficient, should this cause the LAP CM message to be sent, or is that something that the consumer has to request separately (after the call to qp_modify)? Thanks, Aaron -- << Aaron Fabbri | fabbri at isilon.com >> _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -------------- next part -------------- An HTML attachment was scrubbed... URL: From itoumsn at nttdata.co.jp Tue Aug 24 04:36:16 2004 From: itoumsn at nttdata.co.jp (Masanori ITOH) Date: Tue, 24 Aug 2004 20:36:16 +0900 (JST) Subject: [openib-general] DAPL for openib In-Reply-To: <35EA21F54A45CB47B879F21A91F4862F1DE980@taurus.voltaire.com> References: <35EA21F54A45CB47B879F21A91F4862F1DE980@taurus.voltaire.com> Message-ID: <20040824.203616.12327924.itoumsn@nttdata.co.jp> Hi all, I also have a working implementation of u/kDAPL on top of the OpenIB stack. It's based on beta2.04 of the reference implementation and the OpenIB uDAPL contained in the ib-support-2.0 tarball. The kDAPL portion uses the gen1 OpenIB API. The uDAPL portion is just a port of the OpenIB uDAPL onto beta2.04 and so still contains VAPI-dependent code. Although my u/kDAPL is still a quick hack, I have succeeded in running some performance test programs and a DAFS server/client suite. I think my work can also be a starting point for pure OpenIB u/kDAPL development, and I'm talking with my bosses about making it public in the coming days. From: "Yaron Haviv" Subject: RE: [openib-general] DAPL for openib Date: Mon, 23 Aug 2004 11:20:48 +0300 [snip] > Our trunk includes kDAPL working over our gen1 code > We will port it to gen2 once things become more stable and the missing > access pieces will be completed. 
One thing we haven't discussed yet is > the gen2 CM api which is a prerequisite for most ULP's. Which 'gen1' API and 'gen2' API do you mean? 'gen1' of Voltaire stack, and 'gen2' of OpenIB stack? > For uDAPL we have a version out, I believe others indicated they have > done more work with Oracle, and will provide an updated drop sometime > soon. I think the latest code should be published regardless of the > userspace support in gen2, so development will be done in the open. > > Another idea I suggested in the past was to deliver all the changes back > to Source Forge (maintained by Steve S) and have one unified > kDAPL/uDAPL; our recent changes were updated there just a few weeks ago. I was also thinking about that. > Maybe we should also start the discussion on the userspace support > After all people would like to see MPI & uDAPL running over gen2 sooner > the better Thanks, Masanori --- Masanori ITOH Open Source Software Development Center, NTT DATA CORPORATION e-mail: itoumsn at nttdata.co.jp phone : +81-3-3523-8122 (ext. 7354) From halr at voltaire.com Tue Aug 24 07:35:24 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 24 Aug 2004 10:35:24 -0400 Subject: [openib-general] mthca_mad.c Message-ID: <1093358123.1831.14.camel@localhost.localdomain> Hi, Shouldn't most of the functionality in mthca_mad.c be above the driver rather than part of the driver? It seems to me that most of this is part of the access layer rather than the driver. The only thing needed is for the driver to be able to perform the process_local_mad command. Also, a nit: In Sean's ib_verbs.h it is process_mad whereas in Roland's version it is mad_process. 
-- Hal From halr at voltaire.com Tue Aug 24 07:50:01 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 24 Aug 2004 10:50:01 -0400 Subject: [openib-general] DAPL for openib In-Reply-To: <52pt5hvjwl.fsf@topspin.com> References: <35EA21F54A45CB47B879F21A91F4862F1DE980@taurus.voltaire.com> <52pt5hvjwl.fsf@topspin.com> Message-ID: <1093359001.1831.28.camel@localhost.localdomain> On Mon, 2004-08-23 at 13:33, Roland Dreier wrote: > Given the small team that is actually coding, Yes, it currently does appear to be a small team :-( Maybe it will get larger once the core is nearer to completion. > I think we should follow > a step-by-step approach and focus on getting IPoIB ready for > submission to the kernel. I think the pieces we need are: > > - Update verbs to new API (I should finish this week) > - Add driver model/sysfs support (I'll start soon) > - Finish new MAD API (stalled?) Can you elaborate on what you mean by this? Are there API issues (other than the ones deferred) or is this referring to an implementation to go with the API? > - Design and implement SA multicast/path record queries (not started) > - clean up IPoIB driver (not started) Step by step but not necessarily serialized unless resources dictate this. > Once we have that, we can look at userspace MAD access so that OpenSM > can run on the new stack. Also the CM API would be a logical next > step. I don't think it makes sense to even worry about user verbs > until we've finished these pieces. This ordering makes sense to me. At the point there is a (kernel) CM API, the kernel ULPs are "enabled" (SDP, SRP, kDAPL). Hopefully we can be more parallelized by then. 
-- Hal From roland at topspin.com Tue Aug 24 08:54:34 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 24 Aug 2004 08:54:34 -0700 Subject: [openib-general] DAPL for openib In-Reply-To: <1093359001.1831.28.camel@localhost.localdomain> (Hal Rosenstock's message of "Tue, 24 Aug 2004 10:50:01 -0400") References: <35EA21F54A45CB47B879F21A91F4862F1DE980@taurus.voltaire.com> <52pt5hvjwl.fsf@topspin.com> <1093359001.1831.28.camel@localhost.localdomain> Message-ID: <52zn4kr0p1.fsf@topspin.com> Roland> - Finish new MAD API (stalled?) Hal> Can you elaborate on what you mean by this ? Are there API Hal> issues (other than the ones deferred) or is this referring to Hal> an implementation to go with the API ? I haven't been following this very closely but I didn't feel we had reached a final form of the API, and Sean's ib_mad.h has not been updated for several weeks. Also (correct me if I'm wrong) you seem to be working on an implementation of a different API. Also, I can think of at least one issue with the ib_mad.h API, and other people probably have their own issues. My question is how we should split SM class queries such as PortInfo, which go to the SMA, from queries such as SMInfo, which needs to be passed to userspace to be handled by the SM. Right now the ib_mad_reg() call just takes class, version and method, but both PortInfo and SMInfo will be identical in those attributes. - Roland From roland at topspin.com Tue Aug 24 08:59:43 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 24 Aug 2004 08:59:43 -0700 Subject: [openib-general] mthca_mad.c In-Reply-To: <1093358123.1831.14.camel@localhost.localdomain> (Hal Rosenstock's message of "Tue, 24 Aug 2004 10:35:24 -0400") References: <1093358123.1831.14.camel@localhost.localdomain> Message-ID: <52vff8r0gg.fsf@topspin.com> Hal> Shouldn't most of the functionality in mthca_mad.c be above Hal> the driver rather than part of the driver ? 
It seems to me Hal> that most of this is part of the access layer rather than the Hal> driver. We had some discussion about how to implement the SMA earlier, and it was suggested that more of the SMA should be handled in common code. However my feeling is that the SMA is quite device-specific and the low-level driver should have maximum flexibility about how to handle things. For example, the Tavor generates traps that need to be explicitly forwarded to the SM. Hal> Also, a nit: In Sean's ib_verbs.h it is process_mad and Hal> whereas in Roland's version it is mad_process. This is because I have not yet updated that entry point in my tree. In my old API, I named functions like {OBJECT}_{METHOD} (so that eg qp_create, qp_destroy, qp_modify etc all sort together), while Sean uses a {METHOD}_{OBJECT} naming convention. So this and all the other naming differences will be fixed as I finish merging with Sean's ib_verbs.h. - Roland From rminnich at lanl.gov Tue Aug 24 09:04:01 2004 From: rminnich at lanl.gov (ron minnich) Date: Tue, 24 Aug 2004 10:04:01 -0600 (MDT) Subject: [openib-general] one request Message-ID: Ollie Lo found in the latest mellanox gold source tree a circular dependency. The tavor driver depended on the mad symbols being loaded, but correct initialization of the mad layer required that the tavor driver be loaded. The way it was resolved in that code was to have the mad module demand-load the tavor driver (i.e. from the kernel). This solution is not workable in bproc systems, so I am posting this letter to request that nobody use this type of solution in the newer openib software. We void daemons like the plague, so we're not going to run daemons that allow this type of demand-loading of drivers. Please don't have kernel modules demand load other kernel modules. 
Thanks ron From roland at topspin.com Tue Aug 24 09:50:01 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 24 Aug 2004 09:50:01 -0700 Subject: [openib-general] one request In-Reply-To: (ron minnich's message of "Tue, 24 Aug 2004 10:04:01 -0600 (MDT)") References: Message-ID: <52n00kqy4m.fsf@topspin.com> ron> Ollie Lo found in the latest mellanox gold source tree a ron> circular dependency. The tavor driver depended on the mad ron> symbols being loaded, but correct initialization of the mad ron> layer required that the tavor driver be loaded. I don't believe there is a circular dependency (otherwise it would be impossible to load the modules). The problem is actually that the MAD module does _not_ depend on the Tavor driver, so module dependencies don't automatically load the Tavor driver. It should work fine to either load the MAD driver and then load the Tavor driver, or just modprobe the Tavor driver, which will bring in the Tavor driver and all its dependencies (including the MAD driver). - Roland From roland at topspin.com Tue Aug 24 09:50:49 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 24 Aug 2004 09:50:49 -0700 Subject: [openib-general] one request In-Reply-To: (ron minnich's message of "Tue, 24 Aug 2004 10:04:01 -0600 (MDT)") References: Message-ID: <52isb8qy3a.fsf@topspin.com> ron> This solution is not workable in bproc systems, so I am ron> posting this letter to request that nobody use this type of ron> solution in the newer openib software. We void daemons like ron> the plague, so we're not going to run daemons that allow this ron> type of demand-loading of drivers. Please don't have kernel ron> modules demand load other kernel modules. Oh yeah... one other point... request_module() does not use any daemon. The kernel simply execs modprobe to load the module it's looking for. - R. 
From mshefty at ichips.intel.com Tue Aug 24 08:52:23 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 24 Aug 2004 08:52:23 -0700 Subject: [openib-general] DAPL for openib In-Reply-To: <52zn4kr0p1.fsf@topspin.com> References: <35EA21F54A45CB47B879F21A91F4862F1DE980@taurus.voltaire.com> <52pt5hvjwl.fsf@topspin.com> <1093359001.1831.28.camel@localhost.localdomain> <52zn4kr0p1.fsf@topspin.com> Message-ID: <20040824085223.71efa553.mshefty@ichips.intel.com> On Tue, 24 Aug 2004 08:54:34 -0700 Roland Dreier wrote: > Roland> - Finish new MAD API (stalled?) > > Hal> Can you elaborate on what you mean by this ? Are there API > Hal> issues (other than the ones deferred) or is this referring to > Hal> an implementation to go with the API ? > > I haven't been following this very closely but I didn't feel we had > reached a final form of the API, and Sean's ib_mad.h has not been > updated for several weeks. Also (correct me if I'm wrong) you seem to > be working on an implementation of a different API. I thought that we were pretty much in agreement with the API as defined in ib_mad.h. There are some MAD related issues in trunk/contrib/intel/TODO, however. We decided to defer some of the QP redirection issues on the client side, but that may or may not affect the API. > Also, I can think of at least one issue with the ib_mad.h API, and > other people probably have their own issues. My question is how we > should split SM class queries such as PortInfo, which go to the SMA, > from queries such as SMInfo, which needs to be passed to userspace to > be handled by the SM. Right now the ib_mad_reg() call just takes > class, version and method, but both PortInfo and SMInfo will be > identical in those attributes. We need to discuss how to route MADs to the SMA. Are you suggesting that the SMA would register for MADs the same as any other client? My plan was to extend the ib_mad_reg_req structure as needed. 
I limited it to version, class, method currently, since it easily allows O(1) routing. We could add in AttributeID and AttributeModifier if those are needed. From roland at topspin.com Tue Aug 24 09:57:48 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 24 Aug 2004 09:57:48 -0700 Subject: [openib-general] DAPL for openib In-Reply-To: <20040824085223.71efa553.mshefty@ichips.intel.com> (Sean Hefty's message of "Tue, 24 Aug 2004 08:52:23 -0700") References: <35EA21F54A45CB47B879F21A91F4862F1DE980@taurus.voltaire.com> <52pt5hvjwl.fsf@topspin.com> <1093359001.1831.28.camel@localhost.localdomain> <52zn4kr0p1.fsf@topspin.com> <20040824085223.71efa553.mshefty@ichips.intel.com> Message-ID: <52eklwqxrn.fsf@topspin.com> Sean> I thought that we were pretty much in agreement with the API Sean> as defined in ib_mad.h. There are some MAD related issues Sean> in trunk/contrib/intel/TODO, however. We decided to defer Sean> some of the QP redirection issues on the client side, but Sean> that may or may not affect the API. That seems OK to me. Is anyone working on implementing the API? Sean> We need to discuss how to route MADs to the SMA. Are you Sean> suggesting that the SMA would register for MADs the same as Sean> any other client? My plan was to extend the ib_mad_reg_req Sean> structure as needed. I limited it to version, class, method Sean> currently, since it easily allows O(1) routing. We could Sean> add in AttributeID and AttributeModifier if those are Sean> needed. I don't know how the SMA should be handled. In the Topspin drivers it's a bit of a special case -- it gets to see all MADs before any other consumers. We could do the same, or have the SMA register. - R. From rminnich at lanl.gov Tue Aug 24 09:59:37 2004 From: rminnich at lanl.gov (ron minnich) Date: Tue, 24 Aug 2004 10:59:37 -0600 (MDT) Subject: [openib-general] one request In-Reply-To: Message-ID: On Tue, 24 Aug 2004, ron minnich wrote: > openib software. 
We void daemons like the plague, so we're not going to ^^^^ oops. we avoid daemons like the plague. :-) ron From rminnich at lanl.gov Tue Aug 24 10:03:41 2004 From: rminnich at lanl.gov (ron minnich) Date: Tue, 24 Aug 2004 11:03:41 -0600 (MDT) Subject: [openib-general] one request In-Reply-To: <52n00kqy4m.fsf@topspin.com> Message-ID: On Tue, 24 Aug 2004, Roland Dreier wrote: > ron> Ollie Lo found in the latest mellanox gold source tree a > ron> circular dependency. The tavor driver depended on the mad > ron> symbols being loaded, but correct initialization of the mad > ron> layer required that the tavor driver be loaded. > > I don't believe there is a circular dependency (otherwise it would be > impossible to load the modules). The problem is actually that the MAD > module does _not_ depend on the Tavor driver, so module dependencies > don't automatically load the Tavor driver. no, that's not what the code is doing. Have you run this stuff? It's really clear if you do. tavor load fails on an ib_mad symbol, and if you load mad without loading tavor, then the interface list won't get initialized correctly. So what the ib_mad stuff does is demand-load the tavor driver about halfway through its module init. I'll find the source and quote chapter and verse if that helps :-) > It should work fine to either load the MAD driver and then load the > Tavor driver, or just modprobe the Tavor driver, which will bring in > the Tavor driver and all its dependencies (including the MAD driver). it doesn't, at least on mellanox gold cd as of the most recent download we did. ron From rminnich at lanl.gov Tue Aug 24 10:04:58 2004 From: rminnich at lanl.gov (ron minnich) Date: Tue, 24 Aug 2004 11:04:58 -0600 (MDT) Subject: [openib-general] one request In-Reply-To: <52isb8qy3a.fsf@topspin.com> Message-ID: On Tue, 24 Aug 2004, Roland Dreier wrote: > Oh yeah... one other point... request_module() does not use any > daemon. 
The kernel simply execs modprobe to load the module it's > looking for. oops, my bad. Stupid of me! running modprobe is still not acceptable, however, on bproc systems. There's way too many assumptions going on about local file systems and binary availability that fail badly on bproc nodes. ron From roland at topspin.com Tue Aug 24 10:08:51 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 24 Aug 2004 10:08:51 -0700 Subject: [openib-general] one request In-Reply-To: (ron minnich's message of "Tue, 24 Aug 2004 11:03:41 -0600 (MDT)") References: Message-ID: <52acwkqx98.fsf@topspin.com> Roland> I don't believe there is a circular dependency (otherwise Roland> it would be impossible to load the modules). The problem Roland> is actually that the MAD module does _not_ depend on the Roland> Tavor driver, so module dependencies don't automatically Roland> load the Tavor driver. ron> no, that's not what the code is doing. ron> Have you run this stuff? It's really clear if you do. I wrote most of it :) I haven't tried the Mellanox gold CD but it sounds like they may have pulled a broken/old tree. ron> tavor load fails on an ib_mad symbol, and if you load mad ron> without loading tavor, then the interface list won't get ron> initialized correctly. So what the ib_mad stuff does is ron> demand-load the tavor driver about halfway through its module ron> init. For a while I did have a request_module() of infiniband_hca in mad_main.c at the very end of initialization, to help people who forgot to load the ib_tavor module. However that broke on 2.6 because it seems module symbols are not visible until init_module() has returned, so I took the request_module() out. I don't think the request_module was ever halfway through initialization. You can look at the gen1 tree to see how things stood before we gave up on that development. ron> I'll find the source and quote chapter and verse if that ron> helps :-) That would be interesting.... - R. 
From rminnich at lanl.gov Tue Aug 24 10:18:56 2004 From: rminnich at lanl.gov (ron minnich) Date: Tue, 24 Aug 2004 11:18:56 -0600 (MDT) Subject: [openib-general] one request In-Reply-To: <52acwkqx98.fsf@topspin.com> Message-ID: On Tue, 24 Aug 2004, Roland Dreier wrote: > ron> tavor load fails on an ib_mad symbol, and if you load mad > ron> without loading tavor, then the interface list won't get > ron> initialized correctly. So what the ib_mad stuff does is > ron> demand-load the tavor driver about halfway through its module > ron> init. > > For a while I did have a request_module() of infiniband_hca in > mad_main.c at the very end of initialization, to help people who > forgot to load the ib_tavor module. However that broke on 2.6 because > it seems module symbols are not visible until init_module() has > returned, so I took the request_module() out. I don't think the > request_module was ever halfway through initialization. ah ha! yes, we're on 2.6, and yes, that request_module of infiniband_hca is in there, and no, that is not at the very end any more, from my memory of looking at this code 10 days ago. And yes, your description of the behavior is close to what we're seeing. I guess it's old code. > You can look at the gen1 tree to see how things stood before we gave > up on that development. we're just trying to get gold cd running, and then once that's go we'll have a 'working' stack for users, at which point we move to gen2 for the real long-term code. thanks again, Roland, you cleared some things up for me. ron From roland at topspin.com Tue Aug 24 10:33:11 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 24 Aug 2004 10:33:11 -0700 Subject: [openib-general] one request In-Reply-To: (ron minnich's message of "Tue, 24 Aug 2004 11:18:56 -0600 (MDT)") References: Message-ID: <526578qw4o.fsf@topspin.com> ron> ah ha! 
yes, we're on 2.6, and yes, that request_module of ron> infiniband_hca is in there, and no, that is not at the very ron> end any more, from my memory of looking at this code 10 days ron> ago. ron> And yes, your description of the behavior is close to what ron> we're seeing. I guess it's old code. Seems so... I just looked back at the gen1 and this all was fixed up in revision 128 from May 6 2004. ron> we're just trying to get gold cd running, and then once ron> that's go we'll have a 'working' stack for users, at which ron> point we move to gen2 for the real long-term code. ron> thanks again, Roland, you cleared some things up for me. My pleasure... by the way, your main point of not relying on request_module() is well taken and I have no plans to add any code like that in the future. - R. From tduffy at sun.com Tue Aug 24 11:05:19 2004 From: tduffy at sun.com (Tom Duffy) Date: Tue, 24 Aug 2004 11:05:19 -0700 Subject: [openib-general] [PATCH] gen2/roland compile fixes for 2.6.9-rc1 Message-ID: <1093370719.13962.10.camel@duffman> This patch updates the locking mechanism introduced in 2.6.9-rc1. I modeled the fixes based off of changes that went in across the kernel in that update. Please look over for correctness before applying. Signed-off-by: Tom Duffy with permission from Sun legal.
Index: drivers/infiniband/core/mad_static.c
===================================================================
--- drivers/infiniband/core/mad_static.c	(revision 678)
+++ drivers/infiniband/core/mad_static.c	(working copy)
@@ -53,16 +53,15 @@
 			continue;
 		}
 
-		idev = in_dev_get(dev);
-		if (!idev) {
+		rcu_read_lock();
+		idev = __in_dev_get(dev);
+		if (idev == NULL) {
+			rcu_read_unlock();
 			continue;
 		}
 
-		read_lock(&idev->lock);
-
-		if (!idev->ifa_list) {
-			read_unlock(&idev->lock);
-			in_dev_put(idev);
+		if (idev->ifa_list == NULL) {
+			rcu_read_unlock();
 			continue;
 		}
 
@@ -86,8 +85,7 @@
 			 dev->name, i[0], i[1], i[2], i[3], lid_base);
 		}
 
-		read_unlock(&idev->lock);
-		in_dev_put(idev);
+		rcu_read_unlock();
 		break;
 	}
 	read_unlock(&dev_base_lock);
Index: drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===================================================================
--- drivers/infiniband/ulp/ipoib/ipoib_multicast.c	(revision 678)
+++ drivers/infiniband/ulp/ipoib/ipoib_multicast.c	(working copy)
@@ -961,7 +961,7 @@
 	list_for_each_entry(mcast, &priv->multicast_list, list)
 		clear_bit(IPOIB_MCAST_FLAG_FOUND, &mcast->flags);
 
-	read_lock(&in_dev->lock);
+	read_lock(&in_dev->mc_list_lock);
 
 	/* Mark all of the entries that are found or don't exist */
 	for (im = in_dev->mc_list; im; im = im->next) {
@@ -1026,7 +1026,7 @@
 		}
 	}
 
-	read_unlock(&in_dev->lock);
+	read_unlock(&in_dev->mc_list_lock);
 
 	/* Remove all of the entries don't exist anymore */
 	list_for_each_entry_safe(mcast, tmcast, &priv->multicast_list, list) {
-------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From roland at topspin.com Tue Aug 24 12:08:53 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 24 Aug 2004 12:08:53 -0700 Subject: [openib-general] Re: [PATCH] gen2/roland compile fixes for 2.6.9-rc1 In-Reply-To: <1093370719.13962.10.camel@duffman> (Tom Duffy's message of "Tue, 24 Aug 2004 11:05:19 -0700") References: <1093370719.13962.10.camel@duffman> Message-ID: <52r7pwpd4q.fsf@topspin.com> Tom> This patch updates the locking mechanism introduced in Tom> 2.6.9-rc1. I modeled the fixes based off of changes that Tom> went in across the kernel in that update. Please look over Tom> for correctness before applying. Thanks, this looks good to me (not that I'm a network layer locking expert). I think I'll hold off committing until the real 2.6.9 release comes out though (same policy I used for my incompatible MSI API changes). Those who run bleeding-edge kernels can just apply this for their local trees. Thanks, Roland From roland at topspin.com Tue Aug 24 12:14:42 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 24 Aug 2004 12:14:42 -0700 Subject: [openib-general] Re: [PATCH] remove ts_kernel_trace from client_query* In-Reply-To: <1092678729.2752.12.camel@duffman> (Tom Duffy's message of "Mon, 16 Aug 2004 10:52:09 -0700") References: <1092088324.14886.20.camel@duffman> <52isbr8yyu.fsf@topspin.com> <52acx38vte.fsf_-_@topspin.com> <20040810063116.GB6645@mellanox.co.il> <1092438376.32316.30.camel@localhost> <52r7qay0zh.fsf@topspin.com> <1092672183.2752.4.camel@duffman> <52k6vzoxla.fsf@topspin.com> <1092678729.2752.12.camel@duffman> Message-ID: <52n00kpcv1.fsf@topspin.com> So I thought about this a fair bit and I think what I would rather see would be something like module_param(debug_level, int, 0644); which would put a root-settable debug_level in sysfs.
I still need to work out the best way to handle the name of the debug_level variable though (to avoid symbol clashes between different multi-file modules compiled into a monolithic kernel). - R. From yaronh at voltaire.com Tue Aug 24 13:48:02 2004 From: yaronh at voltaire.com (Yaron Haviv) Date: Tue, 24 Aug 2004 23:48:02 +0300 Subject: [openib-general] DAPL for openib Message-ID: <35EA21F54A45CB47B879F21A91F4862F1DEA36@taurus.voltaire.com> > -----Original Message----- > From: openib-general-bounces at openib.org [mailto:openib-general- > bounces at openib.org] On Behalf Of Sean Hefty > Sent: Tuesday, August 24, 2004 6:52 PM > To: Roland Dreier > Cc: openib-general at openib.org > Subject: Re: [openib-general] DAPL for openib > > On Tue, 24 Aug 2004 08:54:34 -0700 > Roland Dreier wrote: > > > I haven't been following this very closely but I didn't feel we had > > reached a final form of the API, and Sean's ib_mad.h has not been > > updated for several weeks. Also (correct me if I'm wrong) you seem to > > be working on an implementation of a different API. > > I thought that we were pretty much in agreement with the API as defined in > ib_mad.h. There are some MAD related issues in trunk/contrib/intel/TODO, > however. We decided to defer some of the QP redirection issues on the > client side, but that may or may not affect the API. I was also under the impression at least the GSI was agreed on Hal is working on implementing it, if it wasn't clear from his postings > > > Also, I can think of at least one issue with the ib_mad.h API, and > > other people probably have their own issues. My question is how we > > should split SM class queries such as PortInfo, which go to the SMA, > > from queries such as SMInfo, which needs to be passed to userspace to > > be handled by the SM. Right now the ib_mad_reg() call just takes > > class, version and method, but both PortInfo and SMInfo will be > > identical in those attributes. 
> > We need to discuss how to route MADs to the SMA. Are you suggesting that > the SMA would register for MADs the same as any other client? My plan was > to extend the ib_mad_reg_req structure as needed. I limited it to > version, class, method currently, since it easily allows O(1) routing. We > could add in AttributeID and AttributeModifier if those are needed. Any reason to keep the same registration API for both SMA (QP0) and GSI ? There is very little in common between the two, beside the fact that both use MAD's Yaron From Tom.Duffy at Sun.COM Tue Aug 24 13:49:08 2004 From: Tom.Duffy at Sun.COM (Tom Duffy) Date: Tue, 24 Aug 2004 13:49:08 -0700 Subject: [openib-general] Re: [PATCH] remove ts_kernel_trace from client_query* In-Reply-To: <52n00kpcv1.fsf@topspin.com> References: <1092088324.14886.20.camel@duffman> <52isbr8yyu.fsf@topspin.com> <52acx38vte.fsf_-_@topspin.com> <20040810063116.GB6645@mellanox.co.il> <1092438376.32316.30.camel@localhost> <52r7qay0zh.fsf@topspin.com> <1092672183.2752.4.camel@duffman> <52k6vzoxla.fsf@topspin.com> <1092678729.2752.12.camel@duffman> <52n00kpcv1.fsf@topspin.com> Message-ID: <1093380548.13962.15.camel@duffman> On Tue, 2004-08-24 at 12:14 -0700, Roland Dreier wrote: > So I thought about this a fair bit and I think what I would rather see > would be something using something like > > module_param(debug_level, int, 0644); > > which would put a root-settable debug_level in sysfs. I still need to > work out the best way to handle the name of the debug_level variable > though (to avoid symbol clashes between different multi-file modules > compiled into a monolithic kernel). So, would this be a per module variable? So, there would be like 10 or 15 parameters to set? Or would this be set in the base module that provides the debug printing and everyone would be affected? -tduffy -- "When they took the 4th Amendment, I was quiet because I didn't deal drugs. When they took the 6th Amendment, I was quiet because I am innocent. 
When they took the 2nd Amendment, I was quiet because I don't own a gun. Now they have taken the 1st Amendment, and I can only be quiet." --Lyle Myhr From mshefty at ichips.intel.com Tue Aug 24 12:47:35 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 24 Aug 2004 12:47:35 -0700 Subject: [openib-general] DAPL for openib In-Reply-To: <52eklwqxrn.fsf@topspin.com> References: <35EA21F54A45CB47B879F21A91F4862F1DE980@taurus.voltaire.com> <52pt5hvjwl.fsf@topspin.com> <1093359001.1831.28.camel@localhost.localdomain> <52zn4kr0p1.fsf@topspin.com> <20040824085223.71efa553.mshefty@ichips.intel.com> <52eklwqxrn.fsf@topspin.com> Message-ID: <20040824124735.3d25da95.mshefty@ichips.intel.com> On Tue, 24 Aug 2004 09:57:48 -0700 Roland Dreier wrote: > Sean> I thought that we were pretty much in agreement with the API > Sean> as defined in ib_mad.h. There are some MAD related issues > Sean> in trunk/contrib/intel/TODO, however. We decided to defer > Sean> some of the QP redirection issues on the client side, but > Sean> that may or may not affect the API. > > That seems OK to me. Is anyone working on implementing the API? I was assuming that Hal was working towards this API. I've been working on other issues the past couple of weeks (covering for vacations), but do have plans on contributing to the MAD implementation after finishing the verbs portion.
From halr at voltaire.com Tue Aug 24 14:35:07 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 24 Aug 2004 17:35:07 -0400 Subject: [openib-general] DAPL for openib In-Reply-To: <20040824124735.3d25da95.mshefty@ichips.intel.com> References: <35EA21F54A45CB47B879F21A91F4862F1DE980@taurus.voltaire.com> <52pt5hvjwl.fsf@topspin.com> <1093359001.1831.28.camel@localhost.localdomain> <52zn4kr0p1.fsf@topspin.com> <20040824085223.71efa553.mshefty@ichips.intel.com> <52eklwqxrn.fsf@topspin.com> <20040824124735.3d25da95.mshefty@ichips.intel.com> Message-ID: <1093383307.5290.13.camel@localhost.localdomain> On Tue, 2004-08-24 at 15:47, Sean Hefty wrote: > On Tue, 24 Aug 2004 09:57:48 -0700 > Roland Dreier wrote: > Is anyone working on implementing the API? > > I was assuming that Hal was working towards this API. I've had this in my TODO list (Update API to proposed openib GSI interface (ib_mad.h)) and as of today I've started implementing this. It will likely turn out to be a rewrite with code liberally borrowed. The first cut will be without RMPP and without redirection. I am hoping to have this out by the end of next week. I may post something sooner to get some early feedback. -- Hal From mvonwyl at bluewin.ch Wed Aug 25 06:48:49 2004 From: mvonwyl at bluewin.ch (mvonwyl at bluewin.ch) Date: Wed, 25 Aug 2004 15:48:49 +0200 Subject: [openib-general] VAPI programming question Message-ID: <40F7A9ED00119618@mssbzhb-int.msg.bluewin.ch> Until now I made some samples using the vapi where the user must pass manually the lid/gid and qp number from one process (sender) to the other (receiver). Could anyone write me how to communicate the qp number from one part to another (I tried using the gsi but I don't think it is a good idea. Using a multicast group is a solution but I don't find it elegant)? 
Thanks From halr at voltaire.com Wed Aug 25 07:05:35 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 25 Aug 2004 10:05:35 -0400 Subject: [openib-general] VAPI programming question References: <40F7A9ED00119618@mssbzhb-int.msg.bluewin.ch> Message-ID: <000901c48aac$992d3020$6401a8c0@comcast.net> mvonwyl at bluewin.ch wrote: > Until now I made some samples using the vapi where the user must pass > manually the lid/gid and qp number from one process (sender) to the > other (receiver). Could anyone write me how to communicate the qp > number from one part to another (I tried using the gsi but I don't > think it is a good idea. Using a multicast group is a solution but I > don't find it elegant)? SA ServiceRecords would be one solution. The receiver would register its service with the SA and the sender would query for that service and obtain the relevant info to do a path query to get the rest of the info and then a connect. -- Hal From halr at voltaire.com Wed Aug 25 08:31:07 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 25 Aug 2004 11:31:07 -0400 Subject: [openib-general] ib_verbs.h ib_device_attr device type Message-ID: <1093447867.1832.4.camel@localhost.localdomain> Is there a way to determine whether a device is a HCA, switch, or router ? Does there need to be another field in ib_device_attr for this ? 
-- Hal From roland at topspin.com Wed Aug 25 09:30:29 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 25 Aug 2004 09:30:29 -0700 Subject: [openib-general] DAPL for openib In-Reply-To: <1093383307.5290.13.camel@localhost.localdomain> (Hal Rosenstock's message of "Tue, 24 Aug 2004 17:35:07 -0400") References: <35EA21F54A45CB47B879F21A91F4862F1DE980@taurus.voltaire.com> <52pt5hvjwl.fsf@topspin.com> <1093359001.1831.28.camel@localhost.localdomain> <52zn4kr0p1.fsf@topspin.com> <20040824085223.71efa553.mshefty@ichips.intel.com> <52eklwqxrn.fsf@topspin.com> <20040824124735.3d25da95.mshefty@ichips.intel.com> <1093383307.5290.13.camel@localhost.localdomain> Message-ID: <52isb7dvtm.fsf@topspin.com> Hal> I've had this in my TODO list (Update API to proposed openib Hal> GSI interface (ib_mad.h)) and as of today I've started Hal> implementing this. That's great, thanks for the clarification. - Roland From roland at topspin.com Wed Aug 25 09:36:42 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 25 Aug 2004 09:36:42 -0700 Subject: [openib-general] DAPL for openib In-Reply-To: <35EA21F54A45CB47B879F21A91F4862F1DEA36@taurus.voltaire.com> (Yaron Haviv's message of "Tue, 24 Aug 2004 23:48:02 +0300") References: <35EA21F54A45CB47B879F21A91F4862F1DEA36@taurus.voltaire.com> Message-ID: <52eklvdvj9.fsf@topspin.com> Yaron> Any reason to keep the same registration API for both SMA Yaron> (QP0) and GSI ? There is very little in common between the Yaron> two, beside the fact that both use MAD's No reason they have to be the same, but on the other hand I think we need a good reason to split the API, since some consumers will want both QP0 and QP1 MADs (eg low-level driver implements both SMA and PMA, subnet manager needs both SM and SA, etc). Also, most of the QP0/QP1 handling code should probably be shared. 
It seems creating PD, creating CQs, registering memory, creating special QPs, moving QP reset->init->rtr->rts, allocating receive buffers and posting receives, queuing and posting sends, processing completion events and CQ entries, etc. can all be done in common code. So if we want separate GSI and SMI implementations, then we should probably work on an API for this common MAD layer so that it can be abstracted and used by both the GSI and SMI. - Roland From roland at topspin.com Wed Aug 25 09:39:52 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 25 Aug 2004 09:39:52 -0700 Subject: [openib-general] Re: [PATCH] remove ts_kernel_trace from client_query* In-Reply-To: <1093380548.13962.15.camel@duffman> (Tom Duffy's message of "Tue, 24 Aug 2004 13:49:08 -0700") References: <1092088324.14886.20.camel@duffman> <52isbr8yyu.fsf@topspin.com> <52acx38vte.fsf_-_@topspin.com> <20040810063116.GB6645@mellanox.co.il> <1092438376.32316.30.camel@localhost> <52r7qay0zh.fsf@topspin.com> <1092672183.2752.4.camel@duffman> <52k6vzoxla.fsf@topspin.com> <1092678729.2752.12.camel@duffman> <52n00kpcv1.fsf@topspin.com> <1093380548.13962.15.camel@duffman> Message-ID: <52acwjdvdz.fsf@topspin.com> Tom> So, would this be a per module variable? So, there would be Tom> like 10 or 15 parameters to set? Tom> Or would this be set in the base module that provides the Tom> debug printing and everyone would be affected? I would say per-module, although there would probably end up being a fair number of parameters. Someone could easily write a userspace wrapper to make a nice interface to debuglevel setting though. The reason I don't like having it in a base module is that it doesn't let you just turn on one module's tracing -- eg if I'm debugging IPoIB, I just want IPoIB tracing. 
- Roland From roland at topspin.com Wed Aug 25 09:41:42 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 25 Aug 2004 09:41:42 -0700 Subject: [openib-general] ib_verbs.h ib_device_attr device type In-Reply-To: <1093447867.1832.4.camel@localhost.localdomain> (Hal Rosenstock's message of "Wed, 25 Aug 2004 11:31:07 -0400") References: <1093447867.1832.4.camel@localhost.localdomain> Message-ID: <526577dvax.fsf@topspin.com> Hal> Is there a way to determine whether a device is a HCA, Hal> switch, or router ? Does there need to be another field in Hal> ib_device_attr for this ? I would use the flags member of struct ib_device... add something like enum { IB_DEV_FLAG_IS_SWITCH = 1, /* etc */ }; - R. From halr at voltaire.com Wed Aug 25 09:59:30 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 25 Aug 2004 12:59:30 -0400 Subject: [openib-general] ib_verbs.h ib_device_attr device type In-Reply-To: <526577dvax.fsf@topspin.com> References: <1093447867.1832.4.camel@localhost.localdomain> <526577dvax.fsf@topspin.com> Message-ID: <1093453170.1832.7.camel@localhost.localdomain> On Wed, 2004-08-25 at 12:41, Roland Dreier wrote: > Hal> Is there a way to determine whether a device is a HCA, > Hal> switch, or router ? Does there need to be another field in > Hal> ib_device_attr for this ? > > I would use the flags member of struct ib_device... add something like > > enum { > IB_DEV_FLAG_IS_SWITCH = 1, > /* etc */ > }; Sounds like a good solution to me. 
-- Hal From mshefty at ichips.intel.com Wed Aug 25 09:56:13 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 25 Aug 2004 09:56:13 -0700 Subject: [openib-general] VAPI programming question In-Reply-To: <40F7A9ED00119618@mssbzhb-int.msg.bluewin.ch> References: <40F7A9ED00119618@mssbzhb-int.msg.bluewin.ch> Message-ID: <20040825095613.35dd2058.mshefty@ichips.intel.com> On Wed, 25 Aug 2004 15:48:49 +0200 mvonwyl at bluewin.ch wrote: > Until now I made some samples using the vapi where the user must pass manually > the lid/gid and qp number from one process (sender) to the other (receiver). > Could anyone write me how to communicate the qp number from one part to another > (I tried using the gsi but I don't think it is a good idea. Using a multicast > group is a solution but I don't find it elegant)? It sounds like you want to use the CM (either the connection protocol or SIDR, depending on what you're trying to do). The CM uses the GSI, so if the GSI works for you, I'd say go ahead and use it, even if you aren't using the CM protocol currently. From halr at voltaire.com Wed Aug 25 12:28:08 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 25 Aug 2004 15:28:08 -0400 Subject: [openib-general] ib_mad.h GSI QP1 Well Known QKey Message-ID: <1093462088.1832.12.camel@localhost.localdomain> Should there be a common definition for the GSI QP1 well known QKey (0x80010000) in ib_mad.h ? Also, a typo on line 104: * @hi_tid - Access layer assigned transition ID for this client. s.b. * @hi_tid - Access layer assigned transaction ID for this client. -- Hal From mst at mellanox.co.il Wed Aug 25 12:43:44 2004 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 25 Aug 2004 22:43:44 +0300 Subject: [openib-general] DAPL for openib In-Reply-To: <52eklwqxrn.fsf@topspin.com> References: <35EA21F54A45CB47B879F21A91F4862F1DE980@taurus.voltaire.com> <52pt5hvjwl.fsf@topspin.com> <1093359001.1831.28.camel@localhost.localdomain> <52zn4kr0p1.fsf@topspin.com> <20040824085223.71efa553.mshefty@ichips.intel.com> <52eklwqxrn.fsf@topspin.com> Message-ID: <20040825194344.GA2399@mellanox.co.il> Hello! Quoting r. Roland Dreier (roland at topspin.com) "Re: [openib-general] DAPL for openib": > > Sean> We need to discuss how to route MADs to the SMA. Are you > Sean> suggesting that the SMA would register for MADs the same as > Sean> any other client? My plan was to extend the ib_mad_reg_req > Sean> structure as needed. I limited it to version, class, method > Sean> currently, since it easily allows O(1) routing. We could > Sean> add in AttributeID and AttributeModifier if those are > Sean> needed. > > I don't know how the SMA should be handled. In the Topspin drivers > it's a bit of a special case -- it gets to see all MADs before any > other consumers. We could do the same, or have the SMA register. > > - R. I'm not sure I understand the semantics - if SMA simply registers without supplying version, class or method, will it get to see all MADs? MST From mst at mellanox.co.il Wed Aug 25 12:50:34 2004 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 25 Aug 2004 22:50:34 +0300 Subject: [openib-general] Re: [PATCH] remove ts_kernel_trace from client_query* In-Reply-To: <52acwjdvdz.fsf@topspin.com> References: <52acx38vte.fsf_-_@topspin.com> <20040810063116.GB6645@mellanox.co.il> <1092438376.32316.30.camel@localhost> <52r7qay0zh.fsf@topspin.com> <1092672183.2752.4.camel@duffman> <52k6vzoxla.fsf@topspin.com> <1092678729.2752.12.camel@duffman> <52n00kpcv1.fsf@topspin.com> <1093380548.13962.15.camel@duffman> <52acwjdvdz.fsf@topspin.com> Message-ID: <20040825195034.GB2399@mellanox.co.il> Quoting r. 
Roland Dreier (roland at topspin.com) "[openib-general] Re: [PATCH] remove ts_kernel_trace from client_query*": > Tom> So, would this be a per module variable? So, there would be > Tom> like 10 or 15 parameters to set? > > Tom> Or would this be set in the base module that provides the > Tom> debug printing and everyone would be affected? > > I would say per-module, although there would probably end up being a > fair number of parameters. Someone could easily write a userspace > wrapper to make a nice interface to debuglevel setting though. > > The reason I don't like having it in a base module is that it doesn't > let you just turn on one module's tracing -- eg if I'm debugging > IPoIB, I just want IPoIB tracing. > > - Roland Wouldn't compile-time be enough? If you want to change the setting, you could just add -DDEBUG or something in the makefile and rebuild the relevant module. I'm not sure multiple debug levels are really useful - from experience it's hard to define in which debug level each print belongs, so you end up enabling the maximum verbosity, anyway. MST From roland at topspin.com Wed Aug 25 12:55:45 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 25 Aug 2004 12:55:45 -0700 Subject: [openib-general] DAPL for openib In-Reply-To: <20040825194344.GA2399@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 25 Aug 2004 22:43:44 +0300") References: <35EA21F54A45CB47B879F21A91F4862F1DE980@taurus.voltaire.com> <52pt5hvjwl.fsf@topspin.com> <1093359001.1831.28.camel@localhost.localdomain> <52zn4kr0p1.fsf@topspin.com> <20040824085223.71efa553.mshefty@ichips.intel.com> <52eklwqxrn.fsf@topspin.com> <20040825194344.GA2399@mellanox.co.il> Message-ID: <52smabc7r2.fsf@topspin.com> Michael> I'm not sure I understand the semantics - if SMA simply Michael> registers without supplying version, class or method, Michael> will it get to see all MADs? In the OpenIB MAD API each MAD can be dispatched to only one
consumer. So if the SMA sees all MADs by registering this way, no other consumer would be able to receive MADs. - Roland From roland at topspin.com Wed Aug 25 12:57:17 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 25 Aug 2004 12:57:17 -0700 Subject: [openib-general] Re: [PATCH] remove ts_kernel_trace from client_query* In-Reply-To: <20040825195034.GB2399@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 25 Aug 2004 22:50:34 +0300") References: <52acx38vte.fsf_-_@topspin.com> <20040810063116.GB6645@mellanox.co.il> <1092438376.32316.30.camel@localhost> <52r7qay0zh.fsf@topspin.com> <1092672183.2752.4.camel@duffman> <52k6vzoxla.fsf@topspin.com> <1092678729.2752.12.camel@duffman> <52n00kpcv1.fsf@topspin.com> <1093380548.13962.15.camel@duffman> <52acwjdvdz.fsf@topspin.com> <20040825195034.GB2399@mellanox.co.il> Message-ID: <52oekzc7oi.fsf@topspin.com> Michael> Wouldn't compile-time be enough? If you want to change Michael> the setting, you could just add -DDEBUG or something in Michael> the makefile and rebuild the relevant module. We've found it very useful for debugging to be able to turn on debug output after a problem is detected without having to disturb the system. Michael> I'm not sure multiple debug levels are really useful - Michael> from experience it's hard to define in which debug level Michael> each print belongs, so you end up enabling the maximum Michael> verbosity, anyway. Yes, I agree, although perhaps two levels (DEBUG and VERBOSEDEBUG say) might be useful. - Roland From mst at mellanox.co.il Wed Aug 25 13:21:41 2004 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Wed, 25 Aug 2004 23:21:41 +0300 Subject: [openib-general] DAPL for openib In-Reply-To: <52smabc7r2.fsf@topspin.com> References: <35EA21F54A45CB47B879F21A91F4862F1DE980@taurus.voltaire.com> <52pt5hvjwl.fsf@topspin.com> <1093359001.1831.28.camel@localhost.localdomain> <52zn4kr0p1.fsf@topspin.com> <20040824085223.71efa553.mshefty@ichips.intel.com> <52eklwqxrn.fsf@topspin.com> <20040825194344.GA2399@mellanox.co.il> <52smabc7r2.fsf@topspin.com> Message-ID: <20040825202141.GA2672@mellanox.co.il> Quoting r. Roland Dreier (roland at topspin.com) "Re: [openib-general] DAPL for openib": > Michael> I'm not sure I understand the semantics - if SMA simply > Michael> registers without supplying version, class or method, > Michael> will it get to see all MADs? > > In the OpenIB MAD API each MAD can be dispatched to only one > consumer. So if the SMA sees all MADs by registering this way, no > other consumer would be able to receive MADs. > Pity. Is this dictated by difficulty of implementation? Maybe have something like linux probe function - in linux device driver model after device matches class vendor and device id parameters specified by the driver, and probe function is called to make it possible for driver to decide if it wants to handle this device based on other fields. In the same vein, for each MAD API could probe matching consumers and let them examine the MAD until one says "it's mine". So SMA could just refuse all MADs. MST From mst at mellanox.co.il Wed Aug 25 13:26:01 2004 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Wed, 25 Aug 2004 23:26:01 +0300 Subject: [openib-general] DAPL for openib In-Reply-To: <20040825202141.GA2672@mellanox.co.il> References: <35EA21F54A45CB47B879F21A91F4862F1DE980@taurus.voltaire.com> <52pt5hvjwl.fsf@topspin.com> <1093359001.1831.28.camel@localhost.localdomain> <52zn4kr0p1.fsf@topspin.com> <20040824085223.71efa553.mshefty@ichips.intel.com> <52eklwqxrn.fsf@topspin.com> <20040825194344.GA2399@mellanox.co.il> <52smabc7r2.fsf@topspin.com> <20040825202141.GA2672@mellanox.co.il> Message-ID: <20040825202601.GB2672@mellanox.co.il> Hello! Quoting r. Michael S. Tsirkin (mst at mellanox.co.il) "Re: [openib-general] DAPL for openib": > Quoting r. Roland Dreier (roland at topspin.com) "Re: [openib-general] DAPL for openib": > > Michael> I'm not sure I understand the semantics - if SMA simply > > Michael> registers without supplying version, class or method, > > Michael> will it get to see all MADs? > > > > In the OpenIB MAD API each MAD can be dispatched to only one > > consumer. So if the SMA sees all MADs by registering this way, no > > other consumer would be able to receive MADs. > > > > Pity. > Is this dictated by difficulty of implementation? > > Maybe have something like linux probe function - > in linux device driver model after device matches > class vendor and device id parameters specified by the > driver, and probe function is called to make it possible > for driver to decide if it wants to handle this device based on other > fields. > > In the same vein, for each MAD API could probe matching consumers > and let them examine the MAD until one says "it's mine". > > So SMA could just refuse all MADs.
> > MST > To clarify, typedef void (*ib_mad_recv_handler)(struct ib_mad_agent *mad_agent, struct ib_mad_recv_wc *mad_recv_wc); could be typedef int (*ib_mad_recv_handler)(struct ib_mad_agent *mad_agent, struct ib_mad_recv_wc *mad_recv_wc); MST From tduffy at sun.com Wed Aug 25 13:29:18 2004 From: tduffy at sun.com (Tom Duffy) Date: Wed, 25 Aug 2004 13:29:18 -0700 Subject: [openib-general] Re: [PATCH] remove ts_kernel_trace from client_query* In-Reply-To: <52oekzc7oi.fsf@topspin.com> References: <52acx38vte.fsf_-_@topspin.com> <20040810063116.GB6645@mellanox.co.il> <1092438376.32316.30.camel@localhost> <52r7qay0zh.fsf@topspin.com> <1092672183.2752.4.camel@duffman> <52k6vzoxla.fsf@topspin.com> <1092678729.2752.12.camel@duffman> <52n00kpcv1.fsf@topspin.com> <1093380548.13962.15.camel@duffman> <52acwjdvdz.fsf@topspin.com> <20040825195034.GB2399@mellanox.co.il> <52oekzc7oi.fsf@topspin.com> Message-ID: <1093465758.23633.38.camel@duffman> On Wed, 2004-08-25 at 12:57 -0700, Roland Dreier wrote: > Michael> Wouldnt compile-time be enough? If you want to change > Michael> the setting, you could just add -DDEBUG or something in > Michael> the makefile and rebuild the relevant module. > > We've found it very useful for debugging to be able to turn on debug > output after a problem is detected without having to disturb the system. I still think having a CONFIG option to enable debugging in the first place would be a good idea as having to call a function and make a choice to print out or not (especially in the data path) will hurt performance. -tduffy -- "When they took the 4th Amendment, I was quiet because I didn't deal drugs. When they took the 6th Amendment, I was quiet because I am innocent. When they took the 2nd Amendment, I was quiet because I don't own a gun. Now they have taken the 1st Amendment, and I can only be quiet." --Lyle Myhr -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From mst at mellanox.co.il Wed Aug 25 13:37:20 2004 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 25 Aug 2004 23:37:20 +0300 Subject: [openib-general] DAPL for openib In-Reply-To: <20040825202601.GB2672@mellanox.co.il> References: <35EA21F54A45CB47B879F21A91F4862F1DE980@taurus.voltaire.com> <52pt5hvjwl.fsf@topspin.com> <1093359001.1831.28.camel@localhost.localdomain> <52zn4kr0p1.fsf@topspin.com> <20040824085223.71efa553.mshefty@ichips.intel.com> <52eklwqxrn.fsf@topspin.com> <20040825194344.GA2399@mellanox.co.il> <52smabc7r2.fsf@topspin.com> <20040825202141.GA2672@mellanox.co.il> <20040825202601.GB2672@mellanox.co.il> Message-ID: <20040825203720.GC2672@mellanox.co.il> Hello! Quoting r. Michael S. Tsirkin (mst at mellanox.co.il) "Re: [openib-general] DAPL for openib": > Hello! > Quoting r. Michael S. Tsirkin (mst at mellanox.co.il) "Re: [openib-general] DAPL for openib": > > Quoting r. Roland Dreier (roland at topspin.com) "Re: [openib-general] DAPL for openib": > > > Michael> I'm not sure I understand the semantics - if SMA simply > > > Michael> registers without supplying version, class or method, > > > Michael> will it get to see all MADs? > > > > > > In the OpenIB MAD API each MAD can be dispatched to only one > > > consumer. So if the SMA sees all MADs by registering this way, no > > > other consumer would be able to receive MADs. > > > > > > > Pity. > > Is this dictated by difficulty of implementation? > > > > Maybe have something like linux probe function - > > in linux device driver model after device matches > > class vendor and device id parameters specified by the > > driver, and probe function is called to make it possible > > for driver to decide if it wants to handle this device based on other > > fields. 
> > > > In the same vein, for each MAD API cold probe matching consumers > > and let them examine the MAD until one says "its mine". > > > > So SMA could just refuse all MADs. > > > > MST > > > > To clarify, > > typedef void (*ib_mad_recv_handler)(struct ib_mad_agent *mad_agent, > struct ib_mad_recv_wc *mad_recv_wc); > > could be > > > typedef int (*ib_mad_recv_handler)(struct ib_mad_agent *mad_agent, > struct ib_mad_recv_wc *mad_recv_wc); > > MST > That would have to be for unsolicited MADs only, of course. Hmm ... I see where this gets messy. MST From mst at mellanox.co.il Wed Aug 25 13:44:21 2004 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 25 Aug 2004 23:44:21 +0300 Subject: [openib-general] DAPL for openib In-Reply-To: <20040825194344.GA2399@mellanox.co.il> References: <35EA21F54A45CB47B879F21A91F4862F1DE980@taurus.voltaire.com> <52pt5hvjwl.fsf@topspin.com> <1093359001.1831.28.camel@localhost.localdomain> <52zn4kr0p1.fsf@topspin.com> <20040824085223.71efa553.mshefty@ichips.intel.com> <52eklwqxrn.fsf@topspin.com> <20040825194344.GA2399@mellanox.co.il> Message-ID: <20040825204420.GA2829@mellanox.co.il> Hello! Quoting r. Michael S. Tsirkin (mst at mellanox.co.il) "Re: [openib-general] DAPL for openib": > Hello! > Quoting r. Roland Dreier (roland at topspin.com) "Re: [openib-general] DAPL for openib": > > > > Sean> We need to discuss how to route MADs to the SMA. Are you > > Sean> suggesting that the SMA would register for MADs the same as > > Sean> any other client? My plan was to extend the ib_mad_reg_req > > Sean> structure as needed. I limited it to version, class, method > > Sean> currently, since it easily allows O(1) routing. We could > > Sean> add in AttributeID and AttributeModifier if those are > > Sean> needed. > > > > I don't know how the SMA should be handled. In the Topspin drivers > > it's a bit of a special case -- it gets to see all MADs before any > > other consumers. We could do the same, or have the SMA register. > > > > - R. 
> > I'm not sure I understand the semantics - > if SMA simply registers without supplying > version, class or method, will it get to see all MADs? > Sorry, I just noticed this is being addressed already: From contrib/intel/TODO: - Need to extend ib_mad_reg_req to support snooping. could use masks for class, version, methods to snoop all. could add a registration type - receive MADs vs. view MADs. registration for view MADs would need to apply to sends as well. view MAD would solve the SMA issue , right? MST From mshefty at ichips.intel.com Wed Aug 25 13:38:08 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 25 Aug 2004 13:38:08 -0700 Subject: [openib-general] DAPL for openib In-Reply-To: <20040825204420.GA2829@mellanox.co.il> References: <35EA21F54A45CB47B879F21A91F4862F1DE980@taurus.voltaire.com> <52pt5hvjwl.fsf@topspin.com> <1093359001.1831.28.camel@localhost.localdomain> <52zn4kr0p1.fsf@topspin.com> <20040824085223.71efa553.mshefty@ichips.intel.com> <52eklwqxrn.fsf@topspin.com> <20040825194344.GA2399@mellanox.co.il> <20040825204420.GA2829@mellanox.co.il> Message-ID: <20040825133808.1a215494.mshefty@ichips.intel.com> On Wed, 25 Aug 2004 23:44:21 +0300 "Michael S. Tsirkin" wrote: > Sorry, I just noticed this is being addressed already: > > From contrib/intel/TODO: > > - Need to extend ib_mad_reg_req to support snooping. > could use masks for class, version, methods to snoop all. > could add a registration type - receive MADs vs. view MADs. > registration for view MADs would need to apply to sends as well. > > view MAD would solve the SMA issue , right? At the core of the issue is who owns the buffer, along with who should respond to an unsolicited MAD. In order to support zero-copy send/receives, registration for unsolicited MADs needs to be unique. The view option mentioned should allow a client to see the MAD, but the client doesn't own the buffer. 
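The receive-vs-view split described above could be sketched roughly like this in C. This is a minimal, hedged sketch with invented names (`reg_type`, `struct consumer`, `dispatch` are illustrative, not the actual ib_mad_reg_req API): exactly one matching receive consumer takes ownership of the buffer, while any number of view consumers merely get to look at it.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch, not the OpenIB API: one RECV consumer owns the
 * buffer for an unsolicited MAD; VIEW consumers only get to see it. */
enum reg_type { MAD_REG_RECV, MAD_REG_VIEW };

struct mad { int mgmt_class; };

struct consumer {
    enum reg_type type;
    int mgmt_class;
    int seen;   /* how many MADs this consumer was shown */
    int owned;  /* how many buffers it took ownership of */
};

/* Every matching VIEW consumer sees the MAD, but only the single
 * matching RECV consumer is handed the buffer. Returns the owner,
 * or NULL if the MAD went unclaimed. */
static struct consumer *dispatch(struct mad *m,
                                 struct consumer *list, size_t n)
{
    struct consumer *owner = NULL;
    for (size_t i = 0; i < n; ++i) {
        if (list[i].mgmt_class != m->mgmt_class)
            continue;
        list[i].seen++;
        if (list[i].type == MAD_REG_RECV && !owner) {
            list[i].owned++;
            owner = &list[i];
        }
    }
    return owner;
}
```

Under this scheme zero-copy receives stay possible, since only one registration per unsolicited MAD ever owns the buffer.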
From mshefty at ichips.intel.com Wed Aug 25 13:43:56 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 25 Aug 2004 13:43:56 -0700 Subject: [openib-general] DAPL for openib In-Reply-To: <20040825202141.GA2672@mellanox.co.il> References: <35EA21F54A45CB47B879F21A91F4862F1DE980@taurus.voltaire.com> <52pt5hvjwl.fsf@topspin.com> <1093359001.1831.28.camel@localhost.localdomain> <52zn4kr0p1.fsf@topspin.com> <20040824085223.71efa553.mshefty@ichips.intel.com> <52eklwqxrn.fsf@topspin.com> <20040825194344.GA2399@mellanox.co.il> <52smabc7r2.fsf@topspin.com> <20040825202141.GA2672@mellanox.co.il> Message-ID: <20040825134356.7491f403.mshefty@ichips.intel.com> On Wed, 25 Aug 2004 23:21:41 +0300 "Michael S. Tsirkin" wrote: > Is this dictated by difficulty of implementation? > > Maybe have something like linux probe function - > in linux device driver model after device matches > class vendor and device id parameters specified by the > driver, and probe function is called to make it possible > for driver to decide if it wants to handle this device based on other > fields. > > In the same vein, for each MAD API cold probe matching consumers > and let them examine the MAD until one says "its mine". My preference would be for the implementation to hand the MAD directly to the client who needs to process it, and for that algorithm to be O(1). From mst at mellanox.co.il Wed Aug 25 14:55:58 2004 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 26 Aug 2004 00:55:58 +0300 Subject: [openib-general] DAPL for openib In-Reply-To: <20040825134356.7491f403.mshefty@ichips.intel.com> References: <35EA21F54A45CB47B879F21A91F4862F1DE980@taurus.voltaire.com> <52pt5hvjwl.fsf@topspin.com> <1093359001.1831.28.camel@localhost.localdomain> <52zn4kr0p1.fsf@topspin.com> <20040824085223.71efa553.mshefty@ichips.intel.com> <52eklwqxrn.fsf@topspin.com> <20040825194344.GA2399@mellanox.co.il> <52smabc7r2.fsf@topspin.com> <20040825202141.GA2672@mellanox.co.il> <20040825134356.7491f403.mshefty@ichips.intel.com> Message-ID: <20040825215558.GB2829@mellanox.co.il> Hello! Quoting r. Sean Hefty (mshefty at ichips.intel.com) "Re: [openib-general] DAPL for openib": > On Wed, 25 Aug 2004 23:21:41 +0300 > "Michael S. Tsirkin" wrote: > > > Is this dictated by difficulty of implementation? > > > > Maybe have something like linux probe function - > > in linux device driver model after device matches > > class vendor and device id parameters specified by the > > driver, and probe function is called to make it possible > > for driver to decide if it wants to handle this device based on other > > fields. > > > > In the same vein, for each MAD API cold probe matching consumers > > and let them examine the MAD until one says "its mine". > > My preference would be for the implementation to hand the MAD directly to the client who needs to process it, and for that algorithm to be O(1). What was the data structure you had in mind for this? On Linux, a tree is the easiest, I think, but that's not O(1). If you want to, a tree can keep multiple identical items, I think. 
MST From roland at topspin.com Wed Aug 25 14:55:30 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 25 Aug 2004 14:55:30 -0700 Subject: [openib-general] DAPL for openib In-Reply-To: <20040825134356.7491f403.mshefty@ichips.intel.com> (Sean Hefty's message of "Wed, 25 Aug 2004 13:43:56 -0700") References: <35EA21F54A45CB47B879F21A91F4862F1DE980@taurus.voltaire.com> <52pt5hvjwl.fsf@topspin.com> <1093359001.1831.28.camel@localhost.localdomain> <52zn4kr0p1.fsf@topspin.com> <20040824085223.71efa553.mshefty@ichips.intel.com> <52eklwqxrn.fsf@topspin.com> <20040825194344.GA2399@mellanox.co.il> <52smabc7r2.fsf@topspin.com> <20040825202141.GA2672@mellanox.co.il> <20040825134356.7491f403.mshefty@ichips.intel.com> Message-ID: <52fz6adgrx.fsf@topspin.com> Sean> My preference would be for the implementation to hand the Sean> MAD directly to the client who needs to process it, and for Sean> that algorithm to be O(1). This is a reasonable goal. However here's one more SMA wrinkle that I just remembered. When a Mellanox HCA generates a trap for the SM, it doesn't send it directly -- the trap shows up on the receive queue of QP0 locally with a source LID of 0 instead. The low-level driver needs to know about this and forward it on to the SM. This leads to some difficulty on a node where the SM is running, since the SM will also want to see SM traps received on QP0 (obviously). The way we handled this in the Topspin drivers was to let the SMA/low-level driver see all MADs before they got dispatched. I guess we need to decide what to do for OpenIB. - R. From roland at topspin.com Wed Aug 25 14:58:07 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 25 Aug 2004 14:58:07 -0700 Subject: [openib-general] DAPL for openib In-Reply-To: <20040825215558.GB2829@mellanox.co.il> (Michael S. 
Tsirkin's message of "Thu, 26 Aug 2004 00:55:58 +0300") References: <35EA21F54A45CB47B879F21A91F4862F1DE980@taurus.voltaire.com> <52pt5hvjwl.fsf@topspin.com> <1093359001.1831.28.camel@localhost.localdomain> <52zn4kr0p1.fsf@topspin.com> <20040824085223.71efa553.mshefty@ichips.intel.com> <52eklwqxrn.fsf@topspin.com> <20040825194344.GA2399@mellanox.co.il> <52smabc7r2.fsf@topspin.com> <20040825202141.GA2672@mellanox.co.il> <20040825134356.7491f403.mshefty@ichips.intel.com> <20040825215558.GB2829@mellanox.co.il> Message-ID: <52brgydgnk.fsf@topspin.com> Michael> What was the data structure you had in mind for this? On Michael> linux tree is the easiest I think , but thats not O(1). Michael> If you want to, a tree can keep mutiple identical items I Michael> think. I think the O(1) data structure Sean has in mind is a dispatch table :) I guess we could have a linked list of consumers in each entry so that dispatch just takes time proportional to the number of consumers for a given MAD. If need be, O(log n) doesn't seem unacceptable to me either. - R. 
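The table-plus-list arrangement sketched in the message above might look like this in C. This is a hedged illustration with invented names (`struct mad_table`, `mad_register`, `mad_dispatch` are not the OpenIB API): reaching the bucket is O(1), and delivery then costs time proportional to the number of consumers registered for that management class.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch: index by 8-bit management class in O(1), then
 * walk a short linked list of consumers hanging off that entry. */
#define NUM_CLASSES 256

struct mad_consumer {
    int version;               /* class version this consumer handles */
    int hits;                  /* MADs delivered to this consumer */
    struct mad_consumer *next;
};

struct mad_table {
    struct mad_consumer *by_class[NUM_CLASSES];
};

static void mad_register(struct mad_table *t, int mgmt_class,
                         struct mad_consumer *c)
{
    /* Prepend: multiple consumers for one class form a list. */
    c->next = t->by_class[mgmt_class];
    t->by_class[mgmt_class] = c;
}

/* Returns the number of consumers the MAD was delivered to. */
static int mad_dispatch(struct mad_table *t, int mgmt_class, int version)
{
    int delivered = 0;
    for (struct mad_consumer *c = t->by_class[mgmt_class]; c; c = c->next) {
        if (c->version == version) {
            c->hits++;
            delivered++;
        }
    }
    return delivered;
}
```

A full flat table over class, version, and method is what raises the size concern discussed later in the thread; keeping only a per-class bucket (growing finer-grained tables on demand) keeps memory small.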
From roland at topspin.com Wed Aug 25 15:03:20 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 25 Aug 2004 15:03:20 -0700 Subject: [openib-general] Re: [PATCH] remove ts_kernel_trace from client_query* In-Reply-To: <1093465758.23633.38.camel@duffman> (Tom Duffy's message of "Wed, 25 Aug 2004 13:29:18 -0700") References: <52acx38vte.fsf_-_@topspin.com> <20040810063116.GB6645@mellanox.co.il> <1092438376.32316.30.camel@localhost> <52r7qay0zh.fsf@topspin.com> <1092672183.2752.4.camel@duffman> <52k6vzoxla.fsf@topspin.com> <1092678729.2752.12.camel@duffman> <52n00kpcv1.fsf@topspin.com> <1093380548.13962.15.camel@duffman> <52acwjdvdz.fsf@topspin.com> <20040825195034.GB2399@mellanox.co.il> <52oekzc7oi.fsf@topspin.com> <1093465758.23633.38.camel@duffman> Message-ID: <527jrmdgev.fsf@topspin.com> Tom> I still think having a CONFIG option to enable debugging in Tom> the first place would be a good idea as having to call a Tom> function and make a choice to print out or not (especially in Tom> the data path) will hurt performance. I agree. The model I like is what ALSA does: they have a CONFIG_SND_DEBUG (as well as some suboptions), which (among other things) makes files like /proc/asound/card0/pcm0p/xrun_debug show up. The default debug level is 0, but if you echo 1 or 2 into the xrun_debug file then more verbose debugging is enabled at runtime. I'm probably making too big a deal out of the debug code design but I guess I've spent too much time debugging IB code not to make a big deal out of it :) - R. PS Greg, I know we shouldn't put files in /proc, I was just describing what ALSA does :) From mst at mellanox.co.il Wed Aug 25 15:13:38 2004 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 26 Aug 2004 01:13:38 +0300 Subject: [openib-general] DAPL for openib In-Reply-To: <52brgydgnk.fsf@topspin.com> References: <1093359001.1831.28.camel@localhost.localdomain> <52zn4kr0p1.fsf@topspin.com> <20040824085223.71efa553.mshefty@ichips.intel.com> <52eklwqxrn.fsf@topspin.com> <20040825194344.GA2399@mellanox.co.il> <52smabc7r2.fsf@topspin.com> <20040825202141.GA2672@mellanox.co.il> <20040825134356.7491f403.mshefty@ichips.intel.com> <20040825215558.GB2829@mellanox.co.il> <52brgydgnk.fsf@topspin.com> Message-ID: <20040825221338.GC2829@mellanox.co.il> Hello! Quoting r. Roland Dreier (roland at topspin.com) "Re: [openib-general] DAPL for openib": > Michael> What was the data structure you had in mind for this? On > Michael> linux tree is the easiest I think , but thats not O(1). > Michael> If you want to, a tree can keep mutiple identical items I > Michael> think. > > I think the O(1) data structure Sean has in mind is a dispatch table :) Wouldn't it be a bit big? For 8-bit values for class/version and 128 methods, 256*256*128 * sizeof(void*) --> 32 MByte on a 32-bit machine? From mshefty at ichips.intel.com Wed Aug 25 14:11:15 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 25 Aug 2004 14:11:15 -0700 Subject: [openib-general] ib_mad.h GSI QP1 Well Known QKey In-Reply-To: <1093462088.1832.12.camel@localhost.localdomain> References: <1093462088.1832.12.camel@localhost.localdomain> Message-ID: <20040825141115.5278cbd4.mshefty@ichips.intel.com> On Wed, 25 Aug 2004 15:28:08 -0400 Hal Rosenstock wrote: > Should there be a common definition for the GSI QP1 well known QKey > (0x80010000) in ib_mad.h ? > > Also, a typo on line 104: > * @hi_tid - Access layer assigned transition ID for this client. > s.b. > * @hi_tid - Access layer assigned transaction ID for this client. 
How about something like this: Index: ib_mad.h =================================================================== --- ib_mad.h (revision 686) +++ ib_mad.h (working copy) @@ -28,6 +28,10 @@ #include "ib_verbs.h" +#define IB_QP0 0 +#define IB_QP1 __cpu_to_be32(1) +#define IB_QP1_QKEY __cpu_to_be32(0x80010000) + struct ib_grh { u32 version_tclass_flow; u16 paylen; @@ -101,7 +105,7 @@ * @recv_handler - Callback handler for a received MAD. * @send_handler - Callback hander for a sent MAD. * @context - User-specified context associated with this registration. - * @hi_tid - Access layer assigned transition ID for this client. + * @hi_tid - Access layer assigned transaction ID for this client. * Unsolicited MADs sent by this client will have the upper 32-bits * of their TID set to this value. */ From mshefty at ichips.intel.com Wed Aug 25 14:13:53 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 25 Aug 2004 14:13:53 -0700 Subject: [openib-general] DAPL for openib In-Reply-To: <52fz6adgrx.fsf@topspin.com> References: <35EA21F54A45CB47B879F21A91F4862F1DE980@taurus.voltaire.com> <52pt5hvjwl.fsf@topspin.com> <1093359001.1831.28.camel@localhost.localdomain> <52zn4kr0p1.fsf@topspin.com> <20040824085223.71efa553.mshefty@ichips.intel.com> <52eklwqxrn.fsf@topspin.com> <20040825194344.GA2399@mellanox.co.il> <52smabc7r2.fsf@topspin.com> <20040825202141.GA2672@mellanox.co.il> <20040825134356.7491f403.mshefty@ichips.intel.com> <52fz6adgrx.fsf@topspin.com> Message-ID: <20040825141353.5261e3e9.mshefty@ichips.intel.com> On Wed, 25 Aug 2004 14:55:30 -0700 Roland Dreier wrote: > The way we handled this in the Topspin drivers was to let the > SMA/low-level driver see all MADs before they got dispatched. I guess > we need to decide what to do for OpenIB. Hmm... I think that this approach may work. Did you need to do this on QP0 and 1? Did you have a separate entry point into the driver, or use the process_mad API? 
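The "SMA sees all MADs first" arrangement discussed above could be sketched like this, assuming the Topspin-style flow in which the low-level driver gets a first look before normal dispatch. All names here are illustrative for the sketch (in particular `driver_process_mad` and the `MAD_*` result codes are invented, not the real process_mad entry point):

```c
#include <assert.h>

/* Hedged sketch: the low-level driver/SMA gets first look at every
 * received MAD and may consume it (e.g. a locally generated trap that
 * it must forward to the SM); otherwise normal dispatch proceeds. */
enum mad_result { MAD_NOT_CONSUMED, MAD_CONSUMED };

struct mad {
    int is_local_trap; /* e.g. a trap arriving on QP0 with source LID 0 */
};

/* Hypothetical driver hook, standing in for a process_mad-style call. */
static enum mad_result driver_process_mad(struct mad *m)
{
    return m->is_local_trap ? MAD_CONSUMED : MAD_NOT_CONSUMED;
}

/* Returns 1 if the MAD reached normal consumer dispatch,
 * 0 if the driver swallowed it. */
static int receive_mad(struct mad *m)
{
    if (driver_process_mad(m) == MAD_CONSUMED)
        return 0;
    /* ... normal O(1) dispatch to registered consumers would go here ... */
    return 1;
}
```

The open design question in the thread is precisely whether this hook should be implicit (driver always sees everything) or just another registration.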
From roland at topspin.com Wed Aug 25 15:17:59 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 25 Aug 2004 15:17:59 -0700 Subject: [openib-general] ib_mad.h GSI QP1 Well Known QKey In-Reply-To: <20040825141115.5278cbd4.mshefty@ichips.intel.com> (Sean Hefty's message of "Wed, 25 Aug 2004 14:11:15 -0700") References: <1093462088.1832.12.camel@localhost.localdomain> <20040825141115.5278cbd4.mshefty@ichips.intel.com> Message-ID: <523c2adfqg.fsf@topspin.com> > #define IB_QP1 __cpu_to_be32(1) Why use the __ form? I think it should just be cpu_to_be32() By the way, I'm not looking forward to debugging everything if we switch our byte ordering convention for QPNs etc.... - R. From roland at topspin.com Wed Aug 25 15:20:24 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 25 Aug 2004 15:20:24 -0700 Subject: [openib-general] DAPL for openib In-Reply-To: <20040825141353.5261e3e9.mshefty@ichips.intel.com> (Sean Hefty's message of "Wed, 25 Aug 2004 14:13:53 -0700") References: <35EA21F54A45CB47B879F21A91F4862F1DE980@taurus.voltaire.com> <52pt5hvjwl.fsf@topspin.com> <1093359001.1831.28.camel@localhost.localdomain> <52zn4kr0p1.fsf@topspin.com> <20040824085223.71efa553.mshefty@ichips.intel.com> <52eklwqxrn.fsf@topspin.com> <20040825194344.GA2399@mellanox.co.il> <52smabc7r2.fsf@topspin.com> <20040825202141.GA2672@mellanox.co.il> <20040825134356.7491f403.mshefty@ichips.intel.com> <52fz6adgrx.fsf@topspin.com> <20040825141353.5261e3e9.mshefty@ichips.intel.com> Message-ID: <52y8k2c11z.fsf@topspin.com> Roland> The way we handled this in the Topspin drivers was to let Roland> the SMA/low-level driver see all MADs before they got Roland> dispatched. I guess we need to decide what to do for Roland> OpenIB. Sean> Hmm... I think that this approach may work. Did you need to Sean> do this on QP0 and 1? Did you have a separate entry point Sean> into the driver, or use the process_mad API? I think Mellanox HCAs will only generate traps like this on QP0. 
Of course we don't know what future devices might do. The mechanism just used the process_mad entry point. - R. From mshefty at ichips.intel.com Wed Aug 25 14:19:43 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 25 Aug 2004 14:19:43 -0700 Subject: [openib-general] ib_mad.h GSI QP1 Well Known QKey In-Reply-To: <523c2adfqg.fsf@topspin.com> References: <1093462088.1832.12.camel@localhost.localdomain> <20040825141115.5278cbd4.mshefty@ichips.intel.com> <523c2adfqg.fsf@topspin.com> Message-ID: <20040825141943.34c3339f.mshefty@ichips.intel.com> On Wed, 25 Aug 2004 15:17:59 -0700 Roland Dreier wrote: > > #define IB_QP1 __cpu_to_be32(1) > > Why use the __ form? I think it should just be cpu_to_be32() Agreed. > By the way, I'm not looking forward to debugging everything if we > switch our byte ordering convention for QPNs etc.... I don't want to debug host/network order issues either, which is why I think everything that appears in a MAD should be in network order. Otherwise, we'll have a combination of host/network order values. From mshefty at ichips.intel.com Wed Aug 25 14:25:17 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 25 Aug 2004 14:25:17 -0700 Subject: [openib-general] DAPL for openib In-Reply-To: <20040825221338.GC2829@mellanox.co.il> References: <1093359001.1831.28.camel@localhost.localdomain> <52zn4kr0p1.fsf@topspin.com> <20040824085223.71efa553.mshefty@ichips.intel.com> <52eklwqxrn.fsf@topspin.com> <20040825194344.GA2399@mellanox.co.il> <52smabc7r2.fsf@topspin.com> <20040825202141.GA2672@mellanox.co.il> <20040825134356.7491f403.mshefty@ichips.intel.com> <20040825215558.GB2829@mellanox.co.il> <52brgydgnk.fsf@topspin.com> <20040825221338.GC2829@mellanox.co.il> Message-ID: <20040825142517.70ec64a9.mshefty@ichips.intel.com> On Thu, 26 Aug 2004 01:13:38 +0300 "Michael S. Tsirkin" wrote: > Hello! > Quoting r. 
Roland Dreier (roland at topspin.com) "Re: [openib-general] DAPL for openib": > > Michael> What was the data structure you had in mind for this? On > > Michael> linux tree is the easiest I think , but thats not O(1). > > Michael> If you want to, a tree can keep mutiple identical items I > > Michael> think. > > > > I think the O(1) data structure Sean has in mind is a dispatch table :) > > Wouldnt it be a bit big? For 8 bit values for class/version and > 128 methods 256*256*128 * sizeof(void*) --> 32 MByte on a 32 bit machine? If you use separate tables for version, class, and method, and let them grow dynamically, they can be substantially smaller. E.g. the version table would likely be a single entry referencing a class table. The class table is about 8 entries long, but requires remapping class 0x81 to index 0. The method array is only needed if a client doesn't register to receive all unsolicited MADs for a specific class. Something like this should let you dispatch to a single client. To support a client viewing the MAD, you'd just need linked lists at any point along the dispatch path. From mshefty at ichips.intel.com Wed Aug 25 15:50:00 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 25 Aug 2004 15:50:00 -0700 Subject: [openib-general] PATCH to rename port to port_num Message-ID: <20040825155000.5538fa20.mshefty@ichips.intel.com> This patch renames port to port_num in the following structures: ib_ah_attr, ib_qp_attr, and ib_event. 
- Sean Index: ib_verbs.h =================================================================== --- ib_verbs.h (revision 686) +++ ib_verbs.h (working copy) @@ -55,7 +55,7 @@ union { struct ib_cq *cq; struct ib_qp *qp; - u8 port; + u8 port_num; } element; enum ib_event_type event; }; @@ -299,7 +299,7 @@ u8 src_path_bits; u8 static_rate; u8 grh_flag; - u8 port; + u8 port_num; }; struct ib_qp_cap { @@ -438,11 +438,11 @@ u8 max_rd_atomic; u8 max_dest_rd_atomic; u8 min_rnr_timer; - u8 port; + u8 port_num; u8 timeout; u8 retry_cnt; u8 rnr_retry; - u8 alt_port; + u8 alt_port_num; u8 alt_timeout; }; -- From mshefty at ichips.intel.com Wed Aug 25 15:57:09 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 25 Aug 2004 15:57:09 -0700 Subject: [openib-general] PATCH to rename port to port_num In-Reply-To: <20040825155000.5538fa20.mshefty@ichips.intel.com> References: <20040825155000.5538fa20.mshefty@ichips.intel.com> Message-ID: <20040825155709.67be28a3.mshefty@ichips.intel.com> On Wed, 25 Aug 2004 15:50:00 -0700 Sean Hefty wrote: > This patch renames port to port_num in the following structures: ib_ah_attr, ib_qp_attr, and ib_event. And if I didn't miss anything, here's a patch for the mthca stack. 
Index: ulp/ipoib/ipoib_verbs.c =================================================================== --- ulp/ipoib/ipoib_verbs.c (revision 687) +++ ulp/ipoib/ipoib_verbs.c (working copy) @@ -145,7 +145,7 @@ qp_attr.qp_state = IB_QPS_INIT; qp_attr.qkey = 0; - qp_attr.port = priv->port; + qp_attr.port_num = priv->port; qp_attr.pkey_index = pkey_index; attr_mask = IB_QP_QKEY | Index: ulp/ipoib/ipoib_arp.c =================================================================== --- ulp/ipoib/ipoib_arp.c (revision 687) +++ ulp/ipoib/ipoib_arp.c (working copy) @@ -412,7 +412,7 @@ .src_path_bits = 0, .static_rate = 0, .grh_flag = 0, - .port = priv->port + .port_num = priv->port }; entry->address_handle = ib_create_ah(priv->pd, &av); Index: ulp/ipoib/ipoib_multicast.c =================================================================== --- ulp/ipoib/ipoib_multicast.c (revision 687) +++ ulp/ipoib/ipoib_multicast.c (working copy) @@ -243,7 +243,7 @@ { struct ib_ah_attr av = { .dlid = mcast->mcast_member.mlid, - .port = priv->port, + .port_num = priv->port, .sl = mcast->mcast_member.sl, .src_path_bits = 0, .static_rate = 0, Index: ulp/srp/srptp.c =================================================================== --- ulp/srp/srptp.c (revision 687) +++ ulp/srp/srptp.c (working copy) @@ -706,7 +706,7 @@ IB_ACCESS_REMOTE_READ | IB_ACCESS_REMOTE_WRITE; attr_mask |= IB_QP_PORT; - qp_attr->port = conn->port->local_port; + qp_attr->port_num = conn->port->local_port; attr_mask |= IB_QP_PKEY_INDEX; qp_attr->pkey_index = 0; Index: ulp/sdp/sdp_conn.c =================================================================== --- ulp/sdp/sdp_conn.c (revision 687) +++ ulp/sdp/sdp_conn.c (working copy) @@ -1152,7 +1152,7 @@ IB_ACCESS_REMOTE_READ | IB_ACCESS_REMOTE_WRITE; attr_mask |= IB_QP_PORT; - qp_attr->port = conn->hw_port; + qp_attr->port_num = conn->hw_port; attr_mask |= IB_QP_PKEY_INDEX; qp_attr->pkey_index = 0; Index: include/ib_verbs.h 
=================================================================== --- include/ib_verbs.h (revision 687) +++ include/ib_verbs.h (working copy) @@ -62,7 +62,7 @@ u8 src_path_bits; u8 static_rate; u8 grh_flag; - u8 port; + u8 port_num; }; enum ib_wc_status { @@ -261,11 +261,11 @@ u8 max_rd_atomic; u8 max_dest_rd_atomic; u8 min_rnr_timer; - u8 port; + u8 port_num; u8 timeout; u8 retry_cnt; u8 rnr_retry; - u8 alt_port; + u8 alt_port_num; u8 alt_timeout; }; Index: core/cm_path_migration.c =================================================================== --- core/cm_path_migration.c (revision 687) +++ core/cm_path_migration.c (working copy) @@ -61,13 +61,13 @@ qp_attr->path_mig_state = IB_MIG_REARM; if (ib_cached_gid_find(connection->alternate_path.sgid, NULL, - &qp_attr->alt_port, NULL)) { + &qp_attr->alt_port_num, NULL)) { result = -EINVAL; goto out; } if (ib_cached_pkey_find(connection->local_cm_device, - qp_attr->alt_port, + qp_attr->alt_port_num, connection->alternate_path.pkey, &qp_attr->alt_pkey_index)) { result = -EINVAL; @@ -76,7 +76,7 @@ TS_TRACE(MOD_IB_CM, T_VERY_VERBOSE, TRACE_IB_CM_GEN, "Loading alternate path: port %d, timeout %d, 0x%04x -> 0x%04x", - qp_attr->alt_port, + qp_attr->alt_port_num, qp_attr->alt_timeout, connection->alternate_path.slid, qp_attr->ah_attr.dlid); Index: core/mad_ib.c =================================================================== --- core/mad_ib.c (revision 687) +++ core/mad_ib.c (working copy) @@ -71,7 +71,7 @@ IB_SEND_SIGNALED | IB_SEND_SOLICITED; av.dlid = mad->dlid; - av.port = mad->port; + av.port_num = mad->port; av.src_path_bits = 0; av.grh_flag = mad->has_grh; av.sl = mad->sl; Index: core/cm_api.c =================================================================== --- core/cm_api.c (revision 687) +++ core/cm_api.c (working copy) @@ -136,10 +136,10 @@ goto out; } - qp_attr->port = port; + qp_attr->port_num = port; if (ib_cached_pkey_find(connection->local_cm_device, - qp_attr->port, + qp_attr->port_num, 
connection->primary_path.pkey, &qp_attr->pkey_index)) { ret = -EINVAL; Index: core/cm_passive.c =================================================================== --- core/cm_passive.c (revision 687) +++ core/cm_passive.c (working copy) @@ -133,13 +133,13 @@ memset(qp_attr, 0, sizeof *qp_attr); - qp_attr->port = connection->local_cm_port; - if (ib_cached_gid_find(connection->primary_path.sgid, NULL, &qp_attr->port, NULL)) { - qp_attr->port = connection->local_cm_port; + qp_attr->port_num = connection->local_cm_port; + if (ib_cached_gid_find(connection->primary_path.sgid, NULL, &qp_attr->port_num, NULL)) { + qp_attr->port_num = connection->local_cm_port; } if (ib_cached_pkey_find(connection->local_cm_device, - qp_attr->port, + qp_attr->port_num, connection->primary_path.pkey, &qp_attr->pkey_index)) { goto fail; @@ -286,10 +286,10 @@ qp_attr->alt_ah_attr.grh_flag = 0; qp_attr->path_mig_state = IB_MIG_REARM; - ib_cached_gid_find(connection->alternate_path.sgid, NULL, &qp_attr->alt_port, NULL); + ib_cached_gid_find(connection->alternate_path.sgid, NULL, &qp_attr->alt_port_num, NULL); /* XXX check return value: */ ib_cached_pkey_find(connection->local_cm_device, - qp_attr->alt_port, + qp_attr->alt_port_num, connection->alternate_path.pkey, &qp_attr->alt_pkey_index); } Index: core/cm_active.c =================================================================== --- core/cm_active.c (revision 687) +++ core/cm_active.c (working copy) @@ -292,9 +292,9 @@ qp_attr->path_mig_state = IB_MIG_REARM; ib_cached_gid_find(connection->alternate_path.sgid, NULL, - &qp_attr->alt_port, NULL); + &qp_attr->alt_port_num, NULL); ib_cached_pkey_find(connection->local_cm_device, - qp_attr->alt_port, + qp_attr->alt_port_num, connection->alternate_path.pkey, &qp_attr->alt_pkey_index); } Index: hw/mthca/mthca_av.c =================================================================== --- hw/mthca/mthca_av.c (revision 687) +++ hw/mthca/mthca_av.c (working copy) @@ -81,14 +81,14 @@ memset(av, 0, 
MTHCA_AV_SIZE); - av->port_pd = cpu_to_be32(pd->pd_num | (ah_attr->port << 24)); + av->port_pd = cpu_to_be32(pd->pd_num | (ah_attr->port_num << 24)); av->g_slid = (!!ah_attr->grh_flag << 7) | ah_attr->src_path_bits; av->dlid = cpu_to_be16(ah_attr->dlid); av->msg_sr = (3 << 4) | /* 2K message */ ah_attr->static_rate; av->sl_tclass_flowlabel = cpu_to_be32(ah_attr->sl << 28); if (ah_attr->grh_flag) { - av->gid_index = (ah_attr->port - 1) * dev->limits.gid_table_len + + av->gid_index = (ah_attr->port_num - 1) * dev->limits.gid_table_len + ah_attr->grh.sgid_index; av->hop_limit = ah_attr->grh.hop_limit; av->sl_tclass_flowlabel |= Index: hw/mthca/mthca_qp.c =================================================================== --- hw/mthca/mthca_qp.c (revision 687) +++ hw/mthca/mthca_qp.c (working copy) @@ -573,7 +573,7 @@ else { if (attr_mask & IB_QP_PORT) { qp_context->pri_path.port_pkey |= - cpu_to_be32(attr->port << 24); + cpu_to_be32(attr->port_num << 24); qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_PORT_NUM); } } From roland at topspin.com Wed Aug 25 22:21:26 2004 From: roland at topspin.com (Roland Dreier) Date: Wed, 25 Aug 2004 22:21:26 -0700 Subject: [openib-general] PATCH to rename port to port_num In-Reply-To: <20040825155709.67be28a3.mshefty@ichips.intel.com> (Sean Hefty's message of "Wed, 25 Aug 2004 15:57:09 -0700") References: <20040825155000.5538fa20.mshefty@ichips.intel.com> <20040825155709.67be28a3.mshefty@ichips.intel.com> Message-ID: <52eklubhk9.fsf@topspin.com> Thanks, I applied this. - R. From mvonwyl at bluewin.ch Thu Aug 26 07:39:40 2004 From: mvonwyl at bluewin.ch (mvonwyl at bluewin.ch) Date: Thu, 26 Aug 2004 16:39:40 +0200 Subject: [openib-general] VAPI programming question In-Reply-To: <20040825095613.35dd2058.mshefty@ichips.intel.com> Message-ID: <40F7A9ED001243A7@mssbzhb-int.msg.bluewin.ch> >It sounds like you want to use the CM (either the connection protocol or >SIDR, depending on what you're trying to do). 
The CM uses the GSI, so if >the GSI works for you, I'd say go ahead and use it, even if you aren't using >the CM protocol currently. It seems that the GSI is always busy when I try to get it with the VAPI. If I use the CM to do that, I must register the server's QP with the Subnet Administrator and have the client perform a request to receive a list from the SA that contains all the QPs that match some properties, right? It looks a little bit complicated with the CM API, but I'll give it a try. Does anyone know where I can find a good CM API specification? Thanks From halr at voltaire.com Thu Aug 26 07:59:08 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 26 Aug 2004 10:59:08 -0400 Subject: [openib-general] VAPI programming question In-Reply-To: <40F7A9ED001243A7@mssbzhb-int.msg.bluewin.ch> References: <40F7A9ED001243A7@mssbzhb-int.msg.bluewin.ch> Message-ID: <1093532347.1831.12.camel@localhost.localdomain> On Thu, 2004-08-26 at 10:39, mvonwyl at bluewin.ch wrote: > >It sounds like you want to use the CM (either the connection protocol or > >SIDR, depending on what you're trying to do). The CM uses the GSI, so if > >the GSI works for you, I'd say go ahead and use it, even if you aren't using > >the CM protocol currently. > > It seems that the GSI is always busy when I try to get it with the VAPI. Not sure what you mean by this. Are you trying to get the special QP1, and is that what is "busy"? I think that is because there can be only one owner of QP1. There needs to be (and is) demultiplexing on QP1 to the various GSI clients (CM, SA, SA client, etc.). > If I use the CM to do that, I must register the server's QP with the Subnet > Administrator and have the client perform a request to receive a list from > the SA that contains all the QPs that match some properties, right? That's what SA service records are for. The server registers its service with the SA.
The client can then use some service ID/name to find the server GID, then look up the path record, and then either initiate the connection or use SIDR to find the UD QPN. If you already know the server GID, you can skip the first step. > It looks a little bit complicated with the CM API, but I'll give it a try. > > Does anyone know where I can find a good CM API specification? What CM are you using? -- Hal > > Thanks > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From mshefty at ichips.intel.com Thu Aug 26 08:26:44 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 26 Aug 2004 08:26:44 -0700 Subject: [openib-general] PATCH to rename port to port_num In-Reply-To: <20040825155000.5538fa20.mshefty@ichips.intel.com> References: <20040825155000.5538fa20.mshefty@ichips.intel.com> Message-ID: <20040826082644.006298a9.mshefty@ichips.intel.com> On Wed, 25 Aug 2004 15:50:00 -0700 Sean Hefty wrote: > This patch renames port to port_num in the following structures: ib_ah_attr, ib_qp_attr, and ib_event. patch committed From mshefty at ichips.intel.com Fri Aug 27 10:40:03 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 27 Aug 2004 10:40:03 -0700 Subject: [openib-general] PATCH - minor fixes for verb calls Message-ID: <20040827104003.3282b00e.mshefty@ichips.intel.com> Roland, Here are a couple of minor bug fixes for your verb implementation.
- Sean Index: core/core_mw.c =================================================================== --- core/core_mw.c (revision 691) +++ core/core_mw.c (working copy) @@ -51,7 +51,7 @@ pd = mw->pd; ret = mw->device->dealloc_mw(mw); if (!ret) - atomic_inc(&pd->usecnt); + atomic_dec(&pd->usecnt); return ret; } Index: core/core_ah.c =================================================================== --- core/core_ah.c (revision 691) +++ core/core_ah.c (working copy) @@ -65,7 +65,7 @@ pd = ah->pd; ret = ah->device->destroy_ah(ah); if (!ret) - atomic_inc(&pd->usecnt); + atomic_dec(&pd->usecnt); return ret; } -- From roland at topspin.com Fri Aug 27 12:08:01 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 27 Aug 2004 12:08:01 -0700 Subject: [openib-general] PATCH - minor fixes for verb calls In-Reply-To: <20040827104003.3282b00e.mshefty@ichips.intel.com> (Sean Hefty's message of "Fri, 27 Aug 2004 10:40:03 -0700") References: <20040827104003.3282b00e.mshefty@ichips.intel.com> Message-ID: <52n00g8kmm.fsf@topspin.com> oops... thanks, applied. - R. From mshefty at ichips.intel.com Fri Aug 27 15:28:07 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 27 Aug 2004 15:28:07 -0700 Subject: [openib-general] PATCH creation of ib_verbs.c file Message-ID: <20040827152807.1e7d60d5.mshefty@ichips.intel.com> Below is a patch to create a new file, ib_verbs.c. Several non-performance-critical calls were relocated into the file and updated based on changes in Roland's branch. Roland, here's a list of differences between this patch and your branch: ib_create_qp / ib_destroy_qp - include support for srq optional calls are not yet checked - I'll start a separate discussion on this. ib_create_cq / ib_resize_cq - I was thinking that the device driver would set the struct cq.cqe value directly, rather than returning a changed &cqe value and requiring the access layer to set it.
ib_rereg_phys_mr - checks for bound mw's and pd changes I only moved calls from ib_verbs.h to ib_verbs.c that were more than simple pass-through calls. - Sean Index: ib_verbs.h =================================================================== --- ib_verbs.h (revision 691) +++ ib_verbs.h (working copy) @@ -72,7 +72,6 @@ struct ib_ah { struct ib_device *device; struct ib_pd *pd; - atomic_t usecnt; }; struct ib_cq { @@ -95,9 +94,9 @@ struct ib_pd *pd; struct ib_cq *send_cq; struct ib_cq *recv_cq; + struct ib_srq *srq; void *qp_context; u32 qp_num; - atomic_t usecnt; }; struct ib_mr { @@ -112,7 +111,6 @@ struct ib_device *device; struct ib_pd *pd; u32 rkey; - atomic_t usecnt; }; struct ib_fmr { @@ -782,22 +780,6 @@ port_modify); } -static inline struct ib_pd *ib_alloc_pd(struct ib_device *device) -{ - return device->alloc_pd(device); -} - -static inline int ib_dealloc_pd(struct ib_pd *pd) -{ - return pd->device->dealloc_pd(pd); -} - -static inline struct ib_ah *ib_create_ah(struct ib_pd *pd, - struct ib_ah_attr *ah_attr) -{ - return pd->device->create_ah(pd, ah_attr); -} - static inline int ib_modify_ah(struct ib_ah *ah, struct ib_ah_attr *ah_attr) { @@ -810,18 +792,6 @@ return ah->device->query_ah(ah, ah_attr); } -static inline int ib_destroy_ah(struct ib_ah *ah) -{ - return ah->device->destroy_ah(ah); -} - -static inline struct ib_qp *ib_create_qp(struct ib_pd *pd, - struct ib_qp_init_attr *qp_init_attr, - struct ib_qp_cap *qp_cap) -{ - return pd->device->create_qp(pd, qp_init_attr, qp_cap); -} - static inline int ib_modify_qp(struct ib_qp *qp, struct ib_qp_attr *qp_attr, int qp_attr_mask, @@ -838,18 +808,6 @@ return qp->device->query_qp(qp, qp_attr, qp_attr_mask, qp_init_attr); } -static inline int ib_destroy_qp(struct ib_qp *qp) -{ - return qp->device->destroy_qp(qp); -} - -static inline struct ib_srq *ib_create_srq(struct ib_pd *pd, - void *srq_context, - struct ib_srq_attr *srq_attr) -{ - return pd->device->create_srq(pd, srq_context, srq_attr); -} - static 
inline int ib_query_srq(struct ib_srq *srq, struct ib_srq_attr *srq_attr) { @@ -864,11 +822,6 @@ return srq->device->modify_srq(srq, pd, srq_attr, srq_attr_mask); } -static inline int ib_destroy_srq(struct ib_srq *srq) -{ - return srq->device->destroy_srq(srq); -} - static inline int ib_post_srq(struct ib_srq *srq, struct ib_recv_wr *recv_wr, struct ib_recv_wr **bad_recv_wr) @@ -876,65 +829,18 @@ return srq->device->post_srq(srq, recv_wr, bad_recv_wr); } -static inline struct ib_cq *ib_create_cq(struct ib_device *device, - ib_comp_handler comp_handler, - void *cq_context, - int cqe) -{ - return device->create_cq(device, comp_handler, cq_context, cqe); -} - static inline int ib_resize_cq(struct ib_cq *cq, int cqe) { return cq->device->resize_cq(cq, cqe); } -static inline int ib_destroy_cq(struct ib_cq *cq) -{ - return cq->device->destroy_cq(cq); -} - -/* in functions below iova_start is in/out parameter */ -static inline struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd, - struct ib_phys_buf *phys_buf_array, - int num_phys_buf, - int mr_access_flags, - u64 *iova_start) -{ - return pd->device->reg_phys_mr(pd, phys_buf_array, num_phys_buf, - mr_access_flags, iova_start); -} - static inline int ib_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr) { return mr->device->query_mr(mr, mr_attr); } -static inline int ib_dereg_mr(struct ib_mr *mr) -{ - return mr->device->dereg_mr(mr); -} - -static inline int ib_rereg_phys_mr(struct ib_mr *mr, - int mr_rereg_mask, - struct ib_pd *pd, - struct ib_phys_buf *phys_buf_array, - int num_phys_buf, - int mr_access_flags, - u64 *iova_start) -{ - return mr->device->rereg_phys_mr(mr, mr_rereg_mask, pd, phys_buf_array, - num_phys_buf, mr_access_flags, - iova_start); -} - -static inline struct ib_mw *ib_alloc_mw(struct ib_pd *pd) -{ - return pd->device->alloc_mw(pd); -} - static inline int ib_bind_mw(struct ib_qp *qp, struct ib_mw *mw, struct ib_mw_bind *mw_bind) @@ -942,18 +848,6 @@ return mw->device->bind_mw(qp, mw, mw_bind); } -static 
inline int ib_dealloc_mw(struct ib_mw *mw) -{ - return mw->device->dealloc_mw(mw); -} - -static inline struct ib_fmr *ib_alloc_fmr(struct ib_pd *pd, - int mr_access_flags, - struct ib_fmr_attr *fmr_attr) -{ - return pd->device->alloc_fmr(pd, mr_access_flags, fmr_attr); -} - static inline int ib_map_fmr(struct ib_fmr *fmr, void *addr, u64 size) @@ -961,13 +855,6 @@ return fmr->device->map_fmr(fmr, addr, size); } -static inline int ib_map_phys_fmr(struct ib_fmr *fmr, - struct ib_phys_buf *phys_buf_array, - int num_phys_buf) -{ - return fmr->device->map_phys_fmr(fmr, phys_buf_array, num_phys_buf); -} - static inline int ib_unmap_fmr(struct ib_fmr **fmr_array, int fmr_cnt) { @@ -975,11 +862,6 @@ return fmr_array[0]->device->unmap_fmr(fmr_array, fmr_cnt); } -static inline int ib_free_fmr(struct ib_fmr *fmr) -{ - return fmr->device->free_fmr(fmr); -} - static inline int ib_attach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid) Index: TODO =================================================================== --- TODO (revision 691) +++ TODO (working copy) @@ -1,11 +1,10 @@ Verbs TODOs: + - Hey there! Howz about a makefile? - Ensure ib_mod_qp can change all QP parameters - check resize. - Determine proper value for static_rate - match CM or inter-packet delay. Can an abstracted value be easier to use? - - Need to define struct ib_mw_bind. - Optional calls need checks before invoking device driver. - - Migrate non-speed path routines into .c file. - Add comments for API. - Should ib_unmap_fmr take fmr_array as input, or just fmr? What should the restriction on the fmr_array be? All from same Index: ib_verbs.c =================================================================== --- ib_verbs.c (revision 0) +++ ib_verbs.c (revision 0) @@ -0,0 +1,338 @@ +/* + This software is available to you under a choice of one of two + licenses. 
You may choose to be licensed under the terms of the GNU + General Public License (GPL) Version 2, available at + , or the OpenIB.org BSD + license, available in the LICENSE.TXT file accompanying this + software. These details are also available at + . + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + SOFTWARE. + + Copyright (c) 2004 Mellanox Technologies Ltd. All rights reserved. + Copyright (c) 2004 Infinicon Corporation. All rights reserved. + Copyright (c) 2004 Intel Corporation. All rights reserved. + Copyright (c) 2004 Topspin Corporation. All rights reserved. + Copyright (c) 2004 Voltaire Corporation. All rights reserved. 
+*/ + +#include +#include + +struct ib_pd *ib_alloc_pd(struct ib_device *device) +{ + struct ib_pd *pd; + + pd = device->alloc_pd(device); + + if (!IS_ERR(pd)) { + pd->device = device; + atomic_set(&pd->usecnt, 0); + } + + return pd; +} +EXPORT_SYMBOL(ib_alloc_pd); + +int ib_dealloc_pd(struct ib_pd *pd) +{ + if (atomic_read(&pd->usecnt)) + return -EBUSY; + + return pd->device->dealloc_pd(pd); +} +EXPORT_SYMBOL(ib_dealloc_pd); + +struct ib_ah *ib_create_ah(struct ib_pd *pd, + struct ib_ah_attr *ah_attr) +{ + struct ib_ah *ah; + + ah = pd->device->create_ah(pd, ah_attr); + + if (!IS_ERR(ah)) { + ah->device = pd->device; + ah->pd = pd; + atomic_inc(&pd->usecnt); + } + + return ah; +} +EXPORT_SYMBOL(ib_create_ah); + +int ib_destroy_ah(struct ib_ah *ah) +{ + struct ib_pd *pd; + int ret; + + pd = ah->pd; + + ret = ah->device->destroy_ah(ah); + if (!ret) + atomic_dec(&pd->usecnt); + + return ret; +} +EXPORT_SYMBOL(ib_destroy_ah); + +struct ib_qp *ib_create_qp(struct ib_pd *pd, + struct ib_qp_init_attr *qp_init_attr, + struct ib_qp_cap *qp_cap) +{ + struct ib_qp *qp; + + qp = pd->device->create_qp(pd, qp_init_attr, qp_cap); + + if (!IS_ERR(qp)) { + qp->device = pd->device; + qp->pd = pd; + qp->send_cq = qp_init_attr->send_cq; + qp->recv_cq = qp_init_attr->recv_cq; + qp->srq = qp_init_attr->srq; + qp->qp_context = qp_init_attr->qp_context; + + atomic_inc(&pd->usecnt); + atomic_inc(&qp_init_attr->send_cq->usecnt); + atomic_inc(&qp_init_attr->recv_cq->usecnt); + if (qp_init_attr->srq) + atomic_inc(&qp_init_attr->srq->usecnt); + } + + return qp; +} +EXPORT_SYMBOL(ib_create_qp); + +int ib_destroy_qp(struct ib_qp *qp) +{ + struct ib_pd *pd; + struct ib_cq *send_cq, *recv_cq; + struct ib_srq *srq; + int ret; + + pd = qp->pd; + send_cq = qp->send_cq; + recv_cq = qp->recv_cq; + srq = qp->srq; + + ret = qp->device->destroy_qp(qp); + if (!ret) { + atomic_dec(&pd->usecnt); + atomic_dec(&send_cq->usecnt); + atomic_dec(&recv_cq->usecnt); + if (srq) + atomic_dec(&srq->usecnt); + } + + 
return ret; +} +EXPORT_SYMBOL(ib_destroy_qp); + +struct ib_srq *ib_create_srq(struct ib_pd *pd, + void *srq_context, + struct ib_srq_attr *srq_attr) +{ + struct ib_srq *srq; + + if (!pd->device->create_srq) + return -ENOSYS; + + srq = pd->device->create_srq(pd, srq_context, srq_attr); + + if (!IS_ERR(srq)) { + srq->device = pd->device; + srq->pd = pd; + srq->srq_context = srq_context; + atomic_inc(&pd->usecnt); + atomic_set(&srq->usecnt, 0); + } + + return srq; +} +EXPORT_SYMBOL(ib_create_srq); + +int ib_destroy_srq(struct ib_srq *srq) +{ + struct ib_pd *pd; + int ret; + + if (atomic_read(&srq->usecnt)) + return -EBUSY; + + pd = srq->pd; + + ret = srq->device->destroy_srq(srq); + if (!ret) + atomic_dec(&pd->usecnt); + + return ret; +} +EXPORT_SYMBOL(ib_destroy_srq); + +struct ib_cq *ib_create_cq(struct ib_device *device, + ib_comp_handler comp_handler, + void *cq_context, + int cqe) +{ + struct ib_cq *cq; + + cq = device->create_cq(device, comp_handler, cq_context, cqe); + + if (!IS_ERR(cq)) { + cq->device = device; + cq->comp_handler = comp_handler; + cq->cq_context = cq_context; + atomic_set(&cq->usecnt, 0); + } + + return cq; +} +EXPORT_SYMBOL(ib_create_cq); + +int ib_destroy_cq(struct ib_cq *cq) +{ + if (atomic_read(&cq->usecnt)) + return -EBUSY; + + return cq->device->destroy_cq(cq); +} +EXPORT_SYMBOL(ib_destroy_cq); + +struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + int mr_access_flags, + u64 *iova_start) +{ + struct ib_mr *mr; + + mr = pd->device->reg_phys_mr(pd, phys_buf_array, num_phys_buf, + mr_access_flags, iova_start); + + if (!IS_ERR(mr)) { + mr->device = pd->device; + mr->pd = pd; + atomic_inc(&pd->usecnt); + atomic_set(&mr->usecnt, 0); + } + + return mr; +} +EXPORT_SYMBOL(ib_reg_phys_mr); + +int ib_dereg_mr(struct ib_mr *mr) +{ + struct ib_pd *pd; + int ret; + + if (atomic_read(&mr->usecnt)) + return -EBUSY; + + pd = mr->pd; + + ret = mr->device->dereg_mr(mr); + if (!ret) + 
atomic_dec(&pd->usecnt); + + return ret; +} +EXPORT_SYMBOL(ib_dereg_mr); + +int ib_rereg_phys_mr(struct ib_mr *mr, + int mr_rereg_mask, + struct ib_pd *pd, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + int mr_access_flags, + u64 *iova_start) +{ + struct ib_pd *old_pd; + int ret; + + if (atomic_read(&mr->usecnt)) + return -EBUSY; + + old_pd = mr->pd; + + ret = mr->device->rereg_phys_mr(mr, mr_rereg_mask, pd, + phys_buf_array, num_phys_buf, + mr_access_flags, iova_start); + + if (!ret && (mr_rereg_mask & IB_MR_REREG_PD)) { + atomic_dec(&old_pd->usecnt); + atomic_inc(&pd->usecnt); + } + + return ret; +} +EXPORT_SYMBOL(ib_rereg_phys_mr); + +struct ib_mw *ib_alloc_mw(struct ib_pd *pd) +{ + struct ib_mw *mw; + + mw = pd->device->alloc_mw(pd); + + if (!IS_ERR(mw)) { + mw->device = pd->device; + mw->pd = pd; + atomic_inc(&pd->usecnt); + } + + return mw; +} +EXPORT_SYMBOL(ib_alloc_mw); + +int ib_dealloc_mw(struct ib_mw *mw) +{ + struct ib_pd *pd; + int ret; + + pd = mw->pd; + + ret = mw->device->dealloc_mw(mw); + if (!ret) + atomic_dec(&pd->usecnt); + + return ret; +} +EXPORT_SYMBOL(ib_dealloc_mw); + +struct ib_fmr *ib_alloc_fmr(struct ib_pd *pd, + int mr_access_flags, + struct ib_fmr_attr *fmr_attr) +{ + struct ib_fmr *fmr; + + fmr = pd->device->alloc_fmr(pd, mr_access_flags, fmr_attr); + + if (!IS_ERR(fmr)) { + fmr->device = pd->device; + fmr->pd = pd; + atomic_inc(&pd->usecnt); + } + + return fmr; +} +EXPORT_SYMBOL(ib_alloc_fmr); + +int ib_free_fmr(struct ib_fmr *fmr) +{ + struct ib_pd *pd; + int ret; + + pd = fmr->pd; + + ret = fmr->device->free_fmr(fmr); + if (!ret) + atomic_dec(&pd->usecnt); + + return ret; +} +EXPORT_SYMBOL(ib_free_fmr); -- From mshefty at ichips.intel.com Fri Aug 27 15:37:56 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 27 Aug 2004 15:37:56 -0700 Subject: [openib-general] optional function calls Message-ID: <20040827153756.0f29cdd9.mshefty@ichips.intel.com> I'm trying to decide which IB functions require checks to see
if they exist. Should we go by the spec and assume that all mandatory functions are implemented by the device driver? Or should we allow the minimal subset possible? Also, if calls like ib_create_srq and ib_attach_mcast check to see if the device implemented the functions, do we need the same check on calls to ib_destroy_srq and ib_detach_mcast? I.e., should we only perform the check in the creation calls? - Sean From roland at topspin.com Fri Aug 27 19:50:40 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 27 Aug 2004 19:50:40 -0700 Subject: [openib-general] [PATCH] update to new FMR API Message-ID: <5265747z7j.fsf@topspin.com> The patch below updates my branch to the new FMR API. I made a few cleanups to the API while implementing the core support: - I called the FMR free function dealloc_fmr rather than free_fmr, to match alloc_mw/dealloc_mw and alloc_pd/dealloc_pd. - I got rid of the map_fmr method since we don't do virtual memory registration in the kernel. - I added an iova parameter to the map_phys_fmr method since there has to be a way for the consumer to specify the address. I also moved a cleaned-up version of the 'FMR pool' stuff into core_fmr_pool.c, since it seems like a useful library for ULPs to have access to. We can also discuss further changes to this interface. mthca still doesn't actually implement FMRs.
- Roland Index: src/linux-kernel/infiniband/ulp/srp/srp_host.c =================================================================== --- src/linux-kernel/infiniband/ulp/srp/srp_host.c (revision 692) +++ src/linux-kernel/infiniband/ulp/srp/srp_host.c (working copy) @@ -3045,8 +3045,8 @@ status = srptp_register_memory(srp_pkt->conn, sr_list, - ((unsigned long)sr_list-> - data & (PAGE_SIZE - 1)), + ((unsigned long)sr_list->data & + (PAGE_SIZE - 1)), dma_addr_list, dma_addr_index); if (status == -EAGAIN) { Index: src/linux-kernel/infiniband/ulp/srp/srp_host.h =================================================================== --- src/linux-kernel/infiniband/ulp/srp/srp_host.h (revision 692) +++ src/linux-kernel/infiniband/ulp/srp/srp_host.h (working copy) @@ -53,7 +53,7 @@ #include "ts_kernel_trace.h" #include "ts_kernel_thread.h" #include -#include "ts_ib_core.h" +#include #include "ts_ib_dm_client_host.h" #include "ts_ib_sa_client.h" #include "ts_ib_cm_types.h" @@ -234,7 +234,7 @@ u64 r_addr; /* RDMA buffer address to be used by the * target */ u32 r_key; - struct ib_fmr *mr_hndl; /* buffer's memory handle */ + struct ib_pool_fmr *mr_hndl; /* buffer's memory handle */ } srp_host_buf_t; typedef struct _srp_pkt_t { Index: src/linux-kernel/infiniband/ulp/srp/srptp.c =================================================================== --- src/linux-kernel/infiniband/ulp/srp/srptp.c (revision 692) +++ src/linux-kernel/infiniband/ulp/srp/srptp.c (working copy) @@ -334,7 +334,7 @@ * fmr_params.dirty_watermark = 256; */ - status = ib_fmr_pool_create(hca->pd_hndl, + status = ib_create_fmr_pool(hca->pd_hndl, &fmr_params, &hca->fmr_pool); if (status != 0) { @@ -348,7 +348,7 @@ fmr_params.pool_size = 1024; fmr_params.dirty_watermark = 256; - status = ib_fmr_pool_create(hca->pd_hndl, + status = ib_create_fmr_pool(hca->pd_hndl, &fmr_params, &hca->fmr_pool); if (status != 0) { @@ -370,7 +370,7 @@ while (hca_index >= 0) { hca = &hca_params[hca_index]; if (hca->fmr_pool) - 
ib_fmr_pool_destroy(hca->fmr_pool); + ib_destroy_fmr_pool(hca->fmr_pool); if (hca->pd_hndl) ib_dealloc_pd(hca->pd_hndl); module_put(hca->ca_hndl->owner); @@ -397,7 +397,7 @@ if (!hca_params[i].valid) continue; - status = ib_fmr_pool_destroy(hca->fmr_pool); + status = ib_destroy_fmr_pool(hca->fmr_pool); if (status != 0) TS_REPORT_STAGE(MOD_SRPTP, @@ -801,7 +801,7 @@ { int status; u32 l_key; - u64 start_address = (unsigned long) buf->data; + u64 start_address = (unsigned long) buf->data - offset; if (buf == NULL) { @@ -813,17 +813,20 @@ TS_REPORT_DATA(MOD_SRPTP, "iova %llx, iova_offset %x length 0x%x", start_address, offset, buf->size); - status = ib_fmr_register_physical(conn->target->port->hca->fmr_pool, - buffer_list, list_len, - &start_address, offset, - &buf->mr_hndl, &l_key, &buf->r_key); + buf->mr_hndl = ib_fmr_pool_map_phys(conn->target->port->hca->fmr_pool, + buffer_list, list_len, + &start_address); - if (status) { + if (IS_ERR(buf->mr_hndl)) { + status = PTR_ERR(buf->mr_hndl); TS_REPORT_DATA(MOD_SRPTP, "Memory registration failed: %d", status); return (status); } + l_key = buf->mr_hndl->fmr->lkey; + buf->r_key = buf->mr_hndl->fmr->rkey; + TS_REPORT_DATA(MOD_SRPTP, "l_key %x, r_key %x, mr_hndl %x", l_key, buf->r_key, buf->mr_hndl); @@ -837,7 +840,7 @@ TS_REPORT_DATA(MOD_SRPTP, "releasing mr_hndl %x", buf->mr_hndl); - status = ib_fmr_deregister(buf->mr_hndl); + status = ib_fmr_pool_unmap(buf->mr_hndl); if (status != 0) { TS_REPORT_WARN(MOD_SRPTP, "de-registration failed: %d", status); Index: src/linux-kernel/infiniband/ulp/sdp/sdp_conn.c =================================================================== --- src/linux-kernel/infiniband/ulp/sdp/sdp_conn.c (revision 692) +++ src/linux-kernel/infiniband/ulp/sdp/sdp_conn.c (working copy) @@ -1895,7 +1895,7 @@ static s32 _sdp_device_table_init(struct sdev_root *dev_root) { #ifdef _TS_SDP_AIO_SUPPORT - tTS_IB_FMR_POOL_PARAM_STRUCT fmr_param_s; + struct ib_fmr_pool_param fmr_param_s; #endif struct ib_phys_buf 
buffer_list; struct ib_device_properties node_info; @@ -2012,7 +2012,7 @@ /* * create SDP memory pool */ - result = ib_fmr_pool_create(hca->pd, + result = ib_create_fmr_pool(hca->pd, &fmr_param_s, &hca->fmr_pool); if (0 > result) { @@ -2096,7 +2096,7 @@ if (NULL != hca->fmr_pool) { - (void)ib_fmr_pool_destroy(hca->fmr_pool); + (void)ib_destroy_fmr_pool(hca->fmr_pool); } if (hca->mem_h) { Index: src/linux-kernel/infiniband/ulp/sdp/sdp_iocb.c =================================================================== --- src/linux-kernel/infiniband/ulp/sdp/sdp_iocb.c (revision 692) +++ src/linux-kernel/infiniband/ulp/sdp/sdp_iocb.c (working copy) @@ -72,28 +72,28 @@ /* * prime io address with physical address of first byte? */ - iocb->io_addr = iocb->page_array[0] + iocb->page_offset; + iocb->io_addr = iocb->page_array[0]; /* * register IOCBs physical memory */ - result = ib_fmr_register_physical(conn->fmr_pool, + iocb->mem = ib_fmr_pool_map_phys(conn->fmr_pool, (u64 *) iocb->page_array, iocb->page_count, - (u64 *) & iocb->io_addr, - iocb->page_offset, - &iocb->mem, - &iocb->l_key, &iocb->r_key); - if (0 != result) { - if (-EAGAIN != result) { + iocb->io_addr); + if (IS_ERR(iocb->mem)) { + if (-EAGAIN != PTR_ERR(result)) { TS_TRACE(MOD_LNX_SDP, T_VERY_VERBOSE, TRACE_FLOW_WARN, "POST: Error <%d> registering physical memory. 
<%d:%d:%d>", - result, iocb->len, iocb->page_count, + PTR_ERR(result), iocb->len, iocb->page_count, iocb->page_offset); } goto error_register; } + + iocb->l_key = iocb->mem->fmr->lkey; + iocb->r_key = iocb->mem->fmr->rkey; /* * some data may have already been consumed, adjust the io address * to take this into account @@ -121,7 +121,7 @@ if (NULL != iocb->page_array) { - result = ib_fmr_deregister(iocb->mem); + result = ib_fmr_pool_unmap(iocb->mem); if (0 > result) { TS_TRACE(MOD_LNX_SDP, T_VERBOSE, TRACE_FLOW_WARN, Index: src/linux-kernel/infiniband/ulp/sdp/sdp_iocb.h =================================================================== --- src/linux-kernel/infiniband/ulp/sdp/sdp_iocb.h (revision 692) +++ src/linux-kernel/infiniband/ulp/sdp/sdp_iocb.h (working copy) @@ -26,6 +26,8 @@ #include +#include + /* * topspin specific includes. */ @@ -85,7 +87,7 @@ /* * IB specific information for zcopy. */ - struct ib_fmr *mem; /* memory region handle */ + struct ib_pool_fmr *mem; /* memory region handle */ u32 l_key; /* local access key */ u32 r_key; /* remote access key */ u64 io_addr; /* virtual IO address */ Index: src/linux-kernel/infiniband/include/ib_verbs.h =================================================================== --- src/linux-kernel/infiniband/include/ib_verbs.h (revision 692) +++ src/linux-kernel/infiniband/include/ib_verbs.h (working copy) @@ -369,6 +369,12 @@ int mw_access_flags; }; +struct ib_fmr_attr { + int max_pages; + int max_maps; + u8 page_size; +}; + struct ib_pd { struct ib_device *device; atomic_t usecnt; /* count all resources */ @@ -412,9 +418,15 @@ u32 rkey; }; +struct ib_fmr { + struct ib_device *device; + struct ib_pd *pd; + struct list_head list; + u32 lkey; + u32 rkey; +}; + struct ib_device { - IB_DECLARE_MAGIC - struct module *owner; struct pci_dev *dma_device; @@ -490,10 +502,14 @@ struct ib_mw *mw, struct ib_mw_bind *mw_bind); int (*dealloc_mw)(struct ib_mw *mw); - ib_fmr_create_func fmr_create; - ib_fmr_destroy_func fmr_destroy; 
- ib_fmr_map_func fmr_map; - ib_fmr_unmap_func fmr_unmap; + struct ib_fmr * (*alloc_fmr)(struct ib_pd *pd, + int mr_access_flags, + struct ib_fmr_attr *fmr_attr); + int (*map_phys_fmr)(struct ib_fmr *fmr, + u64 *page_list, int list_len, + u64 iova); + int (*unmap_fmr)(struct list_head *fmr_list); + int (*dealloc_fmr)(struct ib_fmr *fmr); int (*attach_mcast)(struct ib_qp *qp, union ib_gid *gid, u16 lid); @@ -612,7 +628,7 @@ struct ib_mw *mw, struct ib_mw_bind *mw_bind) { - /* XXX reference counting in mw? */ + /* XXX reference counting in corresponding MR? */ return mw->device->bind_mw ? mw->device->bind_mw(qp, mw, mw_bind) : -ENOSYS; @@ -620,6 +636,20 @@ int ib_dealloc_mw(struct ib_mw *mw); +struct ib_fmr *ib_alloc_fmr(struct ib_pd *pd, + int mr_access_flags, + struct ib_fmr_attr *fmr_attr); + +static inline int ib_map_phys_fmr(struct ib_fmr *fmr, + u64 *page_list, int list_len, + u64 iova) +{ + return fmr->device->map_phys_fmr(fmr, page_list, list_len, iova); +} + +int ib_unmap_fmr(struct list_head *fmr_list); +int ib_dealloc_fmr(struct ib_fmr *fmr); + int ib_attach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid); int ib_detach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid); Index: src/linux-kernel/infiniband/include/ts_ib_sma_provider_types.h =================================================================== --- src/linux-kernel/infiniband/include/ts_ib_sma_provider_types.h (revision 692) +++ src/linux-kernel/infiniband/include/ts_ib_sma_provider_types.h (working copy) @@ -31,7 +31,6 @@ # include /* for size_t */ #endif -#include "ts_ib_magic.h" #include #include "ts_ib_mad_types.h" #include "ts_ib_mad_smi_types.h" @@ -291,8 +290,6 @@ /* The provider structure that a device-specific SMA needs to fill in. 
*/ struct ib_sma_provider { - IB_DECLARE_MAGIC - struct ib_device *device; tTS_IB_SMA_PROVIDER_FLAGS flags; void *sma; // Generic SMA use Index: src/linux-kernel/infiniband/include/ts_ib_pma_provider_types.h =================================================================== --- src/linux-kernel/infiniband/include/ts_ib_pma_provider_types.h (revision 692) +++ src/linux-kernel/infiniband/include/ts_ib_pma_provider_types.h (working copy) @@ -31,7 +31,6 @@ # include /* for size_t */ #endif -#include "ts_ib_magic.h" #include #include "ts_ib_mad_types.h" @@ -119,8 +118,6 @@ /* The provider structure that a device-specific PMA needs to fill in. */ struct ib_pma_provider { - IB_DECLARE_MAGIC - struct ib_device *device; tTS_IB_PMA_PROVIDER_FLAGS flags; void *pma; // Generic PMA use Index: src/linux-kernel/infiniband/include/ts_ib_core_types.h =================================================================== --- src/linux-kernel/infiniband/include/ts_ib_core_types.h (revision 692) +++ src/linux-kernel/infiniband/include/ts_ib_core_types.h (working copy) @@ -33,8 +33,6 @@ # include #endif -#include - /* basic type definitions */ enum { @@ -210,36 +208,6 @@ IB_PKEY_CHANGE, }; -struct ib_async_obj { - void * free_ptr; - spinlock_t lock; - int pending; - int dead; -}; - -struct ib_fmr_pool; /* actual definition in core_fmr.c */ -struct ib_pd; - -struct ib_fmr { - IB_DECLARE_MAGIC - struct ib_device *device; - void *private; - struct ib_fmr_pool *pool; - u32 lkey; - u32 rkey; - int ref_count; - int remap_count; - struct list_head list; - tTS_HASH_NODE_STRUCT cache_node; - u64 io_virtual_address; - u64 iova_offset; - int page_list_len; - u64 page_list[0]; -}; - -typedef void (*ib_fmr_flush_func)(struct ib_fmr_pool *pool, - void *arg); - struct ib_async_event_handler; /* actual definition in core_async.c */ struct ib_async_event_record { @@ -271,10 +239,6 @@ IB_DEVICE_SYSTEM_IMAGE_GUID = 1 << 0 }; -enum ib_memory_access { - IB_ACCESS_ENABLE_WINDOW = 1 << 4 -}; - /* structures */ 
enum { @@ -294,16 +258,6 @@ tTS_IB_GUID system_image_guid; }; -struct ib_fmr_pool_param { - int max_pages_per_fmr; - enum ib_memory_access access; - int pool_size; - int dirty_watermark; - ib_fmr_flush_func flush_function; - void *flush_arg; - int cache:1; -}; - struct ib_sm_path { u16 sm_lid; tTS_IB_SL sm_sl; @@ -332,21 +286,6 @@ tTS_IB_PORT port, int index, tTS_IB_GID gid); -typedef int (*ib_fmr_create_func)(struct ib_pd *pd, - enum ib_memory_access access, - int max_pages, - int max_remaps, - struct ib_fmr *fmr); -typedef int (*ib_fmr_destroy_func)(struct ib_fmr *fmr); -typedef int (*ib_fmr_map_func)(struct ib_fmr *fmr, - u64 *page_list, - int list_len, - u64 *io_virtual_address, - u64 iova_offset, - u32 *lkey, - u32 *rkey); -typedef int (*ib_fmr_unmap_func)(struct ib_device *device, - struct list_head *fmr_list); struct ib_mad; Index: src/linux-kernel/infiniband/include/ts_ib_magic.h =================================================================== --- src/linux-kernel/infiniband/include/ts_ib_magic.h (revision 652) +++ src/linux-kernel/infiniband/include/ts_ib_magic.h (working copy) @@ -1,69 +0,0 @@ -/* - This software is available to you under a choice of one of two - licenses. You may choose to be licensed under the terms of the GNU - General Public License (GPL) Version 2, available at - , or the OpenIB.org BSD - license, available in the LICENSE.TXT file accompanying this - software. These details are also available at - . - - THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - SOFTWARE. - - Copyright (c) 2004 Topspin Communications. 
All rights reserved. - - $Id$ -*/ - -#ifndef _IB_MAGIC_H -#define _IB_MAGIC_H - -#include "ts_kernel_trace.h" - -#define IB_MAGIC_INVALID 0xbadf00d -#define IB_MAGIC_DEVICE 0x11f11f -#define IB_MAGIC_ADDRESS 0x33f11f -#define IB_MAGIC_QP 0x44f11f -#define IB_MAGIC_CQ 0x55f11f -#define IB_MAGIC_MR 0x66f11f -#define IB_MAGIC_FMR 0x77f11f -#define IB_MAGIC_FMR_POOL 0x88f11f -#define IB_MAGIC_ASYNC 0x99f11f -#define IB_MAGIC_FILTER 0xaaf11f -#define IB_MAGIC_SMA 0xbbf11f -#define IB_MAGIC_PMA 0xccf11f -#define IB_MAGIC_MW 0xddf11f - -#define IB_DECLARE_MAGIC \ - unsigned long magic; -#define IB_GET_MAGIC(ptr) \ - (*(unsigned long *) (ptr)) -#define IB_SET_MAGIC(ptr, type) \ - do { \ - IB_GET_MAGIC(ptr) = IB_MAGIC_##type; \ - } while (0) -#define IB_CLEAR_MAGIC(ptr) \ - do { \ - IB_GET_MAGIC(ptr) = IB_MAGIC_INVALID; \ - } while (0) -#define IB_CHECK_MAGIC(ptr, type) \ - do { \ - if (!ptr) { \ - return -EINVAL; \ - } \ - if (IB_GET_MAGIC(ptr) != IB_MAGIC_##type) { \ - TS_REPORT_WARN(MOD_KERNEL_IB, "Bad magic 0x%lx at %p for %s", \ - IB_GET_MAGIC(ptr), ptr, #type); \ - return -EINVAL; \ - } \ - } while (0) -#define IB_TEST_MAGIC(ptr, type) \ - (IB_GET_MAGIC(ptr) == IB_MAGIC_##type) - -#endif /* _IB_MAGIC_H */ Index: src/linux-kernel/infiniband/include/ts_ib_core.h =================================================================== --- src/linux-kernel/infiniband/include/ts_ib_core.h (revision 692) +++ src/linux-kernel/infiniband/include/ts_ib_core.h (working copy) @@ -63,21 +63,6 @@ int index, tTS_IB_GID gid); -int ib_fmr_pool_create(struct ib_pd *pd, - struct ib_fmr_pool_param *params, - struct ib_fmr_pool **pool); -int ib_fmr_pool_destroy(struct ib_fmr_pool *pool); -int ib_fmr_pool_force_flush(struct ib_fmr_pool *pool); -int ib_fmr_register_physical(struct ib_fmr_pool *pool, - uint64_t *page_list, - int list_len, - uint64_t *io_virtual_address, - uint64_t iova_offset, - struct ib_fmr **fmr, - u32 *lkey, - u32 *rkey); -int ib_fmr_deregister(struct ib_fmr *fmr); - int 
ib_async_event_handler_register(struct ib_async_event_record *record, ib_async_event_handler_func function, void *arg, Index: src/linux-kernel/infiniband/include/ib_fmr_pool.h =================================================================== --- src/linux-kernel/infiniband/include/ib_fmr_pool.h (revision 0) +++ src/linux-kernel/infiniband/include/ib_fmr_pool.h (revision 0) @@ -0,0 +1,69 @@ +/* + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available at + * , or the OpenIB.org BSD + * license, available in the LICENSE.TXT file accompanying this + * software. These details are also available at + * . + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Copyright (c) 2004 Topspin Communications. All rights reserved. 
+ * + * $Id$ + */ + +#if !defined(IB_FMR_POOL_H) +#define IB_FMR_POOL_H + +#include + +struct ib_fmr_pool; + +struct ib_fmr_pool_param { + int max_pages_per_fmr; + enum ib_access_flags access; + int pool_size; + int dirty_watermark; + void (*flush_function)(struct ib_fmr_pool *pool, + void * arg); + void *flush_arg; + unsigned cache:1; +}; + +struct ib_pool_fmr { + struct ib_fmr *fmr; + struct ib_fmr_pool *pool; + struct list_head list; + struct hlist_node cache_node; + int ref_count; + int remap_count; + u64 io_virtual_address; + int page_list_len; + u64 page_list[0]; +}; + +int ib_create_fmr_pool(struct ib_pd *pd, + struct ib_fmr_pool_param *params, + struct ib_fmr_pool **pool_handle); + +int ib_destroy_fmr_pool(struct ib_fmr_pool *pool); + +int ib_flush_fmr_pool(struct ib_fmr_pool *pool); + +struct ib_pool_fmr *ib_fmr_pool_map_phys(struct ib_fmr_pool *pool_handle, + u64 *page_list, + int list_len, + u64 *io_virtual_address); + +int ib_fmr_pool_unmap(struct ib_pool_fmr *fmr); + +#endif /* IB_FMR_POOL_H */ Property changes on: src/linux-kernel/infiniband/include/ib_fmr_pool.h ___________________________________________________________________ Name: svn:keywords + Id Index: src/linux-kernel/infiniband/core/Makefile =================================================================== --- src/linux-kernel/infiniband/core/Makefile (revision 692) +++ src/linux-kernel/infiniband/core/Makefile (working copy) @@ -42,6 +42,7 @@ core_cq.o \ core_mr.o \ core_fmr.o \ + core_fmr_pool.o \ core_mw.o \ core_mcast.o \ core_async.o \ Index: src/linux-kernel/infiniband/core/mad_ib.c =================================================================== --- src/linux-kernel/infiniband/core/mad_ib.c (revision 692) +++ src/linux-kernel/infiniband/core/mad_ib.c (working copy) @@ -148,8 +148,6 @@ { struct ib_mad *buf; - IB_CHECK_MAGIC(mad->device, DEVICE); - buf = kmem_cache_alloc(mad_cache, (in_atomic() || irqs_disabled()) ? 
GFP_ATOMIC : GFP_KERNEL); Index: src/linux-kernel/infiniband/core/core_cache.c =================================================================== --- src/linux-kernel/infiniband/core/core_cache.c (revision 692) +++ src/linux-kernel/infiniband/core/core_cache.c (working copy) @@ -37,8 +37,6 @@ { struct ib_device_private *priv; - IB_CHECK_MAGIC(device, DEVICE); - priv = device->core; memcpy(node_guid, priv->node_guid, sizeof (tTS_IB_GUID)); @@ -53,8 +51,6 @@ struct ib_device_private *priv; unsigned int seq; - IB_CHECK_MAGIC(device, DEVICE); - priv = device->core; if (port < priv->start_port || port > priv->end_port) @@ -78,8 +74,6 @@ struct ib_device_private *priv; unsigned int seq; - IB_CHECK_MAGIC(device, DEVICE); - priv = device->core; if (port < priv->start_port || port > priv->end_port) @@ -103,8 +97,6 @@ struct ib_device_private *priv; unsigned int seq; - IB_CHECK_MAGIC(device, DEVICE); - priv = device->core; if (port < priv->start_port || port > priv->end_port) @@ -129,8 +121,6 @@ struct ib_device_private *priv; unsigned int seq; - IB_CHECK_MAGIC(device, DEVICE); - priv = device->core; if (port < priv->start_port || port > priv->end_port) @@ -217,8 +207,6 @@ struct ib_device_private *priv; unsigned int seq; - IB_CHECK_MAGIC(device, DEVICE); - priv = device->core; if (port < priv->start_port || port > priv->end_port) @@ -246,8 +234,6 @@ int i; int found; - IB_CHECK_MAGIC(device, DEVICE); - priv = device->core; if (port < priv->start_port || port > priv->end_port) Index: src/linux-kernel/infiniband/core/core_fmr.c =================================================================== --- src/linux-kernel/infiniband/core/core_fmr.c (revision 692) +++ src/linux-kernel/infiniband/core/core_fmr.c (working copy) @@ -21,528 +21,58 @@ $Id$ */ -#include "core_priv.h" - -#include "ts_kernel_trace.h" -#include "ts_kernel_services.h" -#include "ts_kernel_thread.h" -#include "ts_kernel_hash.h" - -#include -#include - #include #include #include -#if defined(IB_FMR_NODEBUG) -# 
define IB_COMPILE_FMR_DEBUGGING_CODE 0 -#else -# define IB_COMPILE_FMR_DEBUGGING_CODE 1 -#endif +#include "core_priv.h" -enum { - TS_IB_FMR_MAX_REMAPS = 32, - - TS_IB_FMR_HASH_BITS = 8, - TS_IB_FMR_HASH_SIZE = 1 << TS_IB_FMR_HASH_BITS, - TS_IB_FMR_HASH_MASK = TS_IB_FMR_HASH_SIZE - 1 -}; - -/* - If an FMR is not in use, then the list member will point to either - its pool's free_list (if the FMR can be mapped again; that is, - remap_count < TS_IB_FMR_MAX_REMAPS) or its pool's dirty_list (if the - FMR needs to be unmapped before being remapped). In either of these - cases it is a bug if the ref_count is not 0. In other words, if - ref_count is > 0, then the list member must not be linked into - either free_list or dirty_list. - - The cache_node member is used to link the FMR into a cache bucket - (if caching is enabled). This is independent of the reference count - of the FMR. When a valid FMR is released, its ref_count is - decremented, and if ref_count reaches 0, the FMR is placed in either - free_list or dirty_list as appropriate. However, it is not removed - from the cache and may be "revived" if a call to - ib_fmr_register_physical() occurs before the FMR is remapped. In - this case we just increment the ref_count and remove the FMR from - free_list/dirty_list. - - Before we remap an FMR from free_list, we remove it from the cache - (to prevent another user from obtaining a stale FMR). When an FMR - is released, we add it to the tail of the free list, so that our - cache eviction policy is "least recently used." - - All manipulation of ref_count, list and cache_node is protected by - pool_lock to maintain consistency. 
-*/ - -struct ib_fmr_pool { - IB_DECLARE_MAGIC - struct ib_device *device; - - spinlock_t pool_lock; - - int pool_size; - int max_pages; - int dirty_watermark; - int dirty_len; - struct list_head free_list; - struct list_head dirty_list; - tTS_HASH_HEAD cache_bucket; - - tTS_KERNEL_THREAD thread; - - ib_fmr_flush_func flush_function; - void *flush_arg; - - atomic_t req_ser; - atomic_t flush_ser; - - wait_queue_head_t thread_wait; - wait_queue_head_t force_wait; -}; - -static inline u32 ib_fmr_hash(u64 first_page) +struct ib_fmr *ib_alloc_fmr(struct ib_pd *pd, + int mr_access_flags, + struct ib_fmr_attr *fmr_attr) { - return tsKernelHashFunction((u32) (first_page >> PAGE_SHIFT), - TS_IB_FMR_HASH_MASK); -} - -/* Caller must hold pool_lock */ -static inline struct ib_fmr *ib_fmr_cache_lookup(struct ib_fmr_pool *pool, - u64 *page_list, - int page_list_len, - u64 io_virtual_address, - u64 iova_offset) -{ - tTS_HASH_HEAD bucket; struct ib_fmr *fmr; - if (!pool->cache_bucket) { - return NULL; - } + if (!pd->device->alloc_fmr) + return ERR_PTR(-ENOSYS); - bucket = &pool->cache_bucket[ib_fmr_hash(*page_list)]; - - TS_KERNEL_HASH_FOR_EACH_ENTRY(fmr, bucket, cache_node) - if (io_virtual_address == fmr->io_virtual_address && - iova_offset == fmr->iova_offset && - page_list_len == fmr->page_list_len && - !memcmp(page_list, fmr->page_list, page_list_len * sizeof *page_list)) - return fmr; - - return NULL; -} - -/* Caller must hold pool_lock */ -static inline void ib_fmr_cache_store(struct ib_fmr_pool *pool, - struct ib_fmr *fmr) -{ - tsKernelHashNodeAdd(&fmr->cache_node, - &pool->cache_bucket[ib_fmr_hash(fmr->page_list[0])]); -} - -/* Caller must hold pool_lock */ -static inline void ib_fmr_cache_remove(struct ib_fmr *fmr) -{ - if (!tsKernelHashNodeUnhashed(&fmr->cache_node)) - tsKernelHashNodeRemove(&fmr->cache_node); -} - -static void ib_fmr_batch_release(struct ib_fmr_pool *pool) -{ - int ret; - struct list_head *ptr; - struct ib_fmr *fmr; - LIST_HEAD(unmap_list); - - 
spin_lock_irq(&pool->pool_lock); - - list_for_each(ptr, &pool->dirty_list) { - fmr = list_entry(ptr, struct ib_fmr, list); - - ib_fmr_cache_remove(fmr); - fmr->remap_count = 0; - - if (IB_COMPILE_FMR_DEBUGGING_CODE) { - if (fmr->ref_count !=0) { - TS_REPORT_WARN(MOD_KERNEL_IB, - "Unmapping FMR 0x%08x with ref count %d", - fmr, fmr->ref_count); - } - } + fmr = pd->device->alloc_fmr(pd, mr_access_flags, fmr_attr); + if (!IS_ERR(fmr)) { + fmr->device = pd->device; + fmr->pd = pd; + atomic_inc(&pd->usecnt); } - list_splice(&pool->dirty_list, &unmap_list); - INIT_LIST_HEAD(&pool->dirty_list); - pool->dirty_len = 0; - - spin_unlock_irq(&pool->pool_lock); - - if (list_empty(&unmap_list)) { - return; - } - - ret = pool->device->fmr_unmap(pool->device, &unmap_list); - if (ret) { - TS_REPORT_WARN(MOD_KERNEL_IB, - "fmr_unmap for %s returns %d", - pool->device->name, ret); - } - - spin_lock_irq(&pool->pool_lock); - list_splice(&unmap_list, &pool->free_list); - spin_unlock_irq(&pool->pool_lock); + return fmr; } +EXPORT_SYMBOL(ib_alloc_fmr); -static void ib_fmr_cleanup_thread(void *pool_ptr) +int ib_unmap_fmr(struct list_head *fmr_list) { - struct ib_fmr_pool *pool = pool_ptr; - int ret; + struct ib_fmr *fmr; - while (!signal_pending(current)) { - ret = wait_event_interruptible(pool->thread_wait, - (pool->dirty_len >= - pool->dirty_watermark) || - (atomic_read(&pool->flush_ser) - - atomic_read(&pool->req_ser) < 0)); + if (list_empty(fmr_list)) + return 0; - TS_TRACE(MOD_KERNEL_IB, T_VERY_VERBOSE, TRACE_KERNEL_IB_GEN, - "cleanup thread woken up, dirty len = %d", - pool->dirty_len); - - if (ret) - break; - - ib_fmr_batch_release(pool); - - atomic_inc(&pool->flush_ser); - wake_up_interruptible(&pool->force_wait); - - if (pool->flush_function) - pool->flush_function(pool, pool->flush_arg); - } - - TS_REPORT_CLEANUP(MOD_KERNEL_IB, "FMR cleanup thread exiting"); + fmr = list_entry(fmr_list->next, struct ib_fmr, list); + return fmr->device->unmap_fmr(fmr_list); } 
+EXPORT_SYMBOL(ib_unmap_fmr); -int ib_fmr_pool_create(struct ib_pd *pd, - struct ib_fmr_pool_param *params, - struct ib_fmr_pool **pool_handle) +int ib_dealloc_fmr(struct ib_fmr *fmr) { - struct ib_device *device; - struct ib_fmr_pool *pool; - int i; + struct ib_pd *pd; int ret; - if (!params) { - return -EINVAL; - } + pd = fmr->pd; + ret = fmr->device->dealloc_fmr(fmr); + if (!ret) + atomic_dec(&pd->usecnt); - device = pd->device; - if (!device->fmr_create || - !device->fmr_destroy || - !device->fmr_map || - !device->fmr_unmap) { - TS_REPORT_WARN(MOD_KERNEL_IB, - "Device %s does not support fast memory regions", - device->name); - return -ENOSYS; - } - - pool = kmalloc(sizeof *pool, GFP_KERNEL); - if (!pool) { - TS_REPORT_WARN(MOD_KERNEL_IB, - "couldn't allocate pool struct"); - return -ENOMEM; - } - - pool->cache_bucket = NULL; - - pool->flush_function = params->flush_function; - pool->flush_arg = params->flush_arg; - - INIT_LIST_HEAD(&pool->free_list); - INIT_LIST_HEAD(&pool->dirty_list); - - if (params->cache) { - pool->cache_bucket = - kmalloc(TS_IB_FMR_HASH_SIZE * sizeof *pool->cache_bucket, GFP_KERNEL); - if (!pool->cache_bucket) { - TS_REPORT_WARN(MOD_KERNEL_IB, "Failed to allocate cache in pool"); - ret = -ENOMEM; - goto out_free_pool; - } - - for (i = 0; i < TS_IB_FMR_HASH_SIZE; ++i) { - tsKernelHashHeadInit(&pool->cache_bucket[i]); - } - } - - pool->device = device; - pool->pool_size = 0; - pool->max_pages = params->max_pages_per_fmr; - pool->dirty_watermark = params->dirty_watermark; - pool->dirty_len = 0; - spin_lock_init(&pool->pool_lock); - atomic_set(&pool->req_ser, 0); - atomic_set(&pool->flush_ser, 0); - init_waitqueue_head(&pool->thread_wait); - init_waitqueue_head(&pool->force_wait); - - ret = tsKernelThreadStart("ts_fmr", - ib_fmr_cleanup_thread, - pool, - &pool->thread); - - if (ret) { - TS_REPORT_WARN(MOD_KERNEL_IB, - "couldn't start cleanup thread"); - goto out_free_pool; - } - - { - struct ib_fmr *fmr; - - for (i = 0; i < params->pool_size; 
++i) { - fmr = kmalloc(sizeof *fmr + params->max_pages_per_fmr * sizeof (u64), - GFP_KERNEL); - if (!fmr) { - TS_REPORT_WARN(MOD_KERNEL_IB, - "failed to allocate fmr struct for FMR %d", i); - goto out_fail; - } - - fmr->device = device; - fmr->pool = pool; - fmr->remap_count = 0; - fmr->ref_count = 0; - fmr->cache_node.pprev = NULL; - - if (device->fmr_create(pd, - params->access, - params->max_pages_per_fmr, - TS_IB_FMR_MAX_REMAPS, - fmr)) { - TS_REPORT_WARN(MOD_KERNEL_IB, - "fmr_create failed for FMR %d", i); - kfree(fmr); - goto out_fail; - } - - IB_SET_MAGIC(fmr, FMR); - list_add_tail(&fmr->list, &pool->free_list); - ++pool->pool_size; - } - } - - IB_SET_MAGIC(pool, FMR_POOL); - *pool_handle = pool; - return 0; - - out_free_pool: - kfree(pool->cache_bucket); - kfree(pool); - return ret; - - out_fail: - IB_SET_MAGIC(pool, FMR_POOL); - ib_fmr_pool_destroy(pool); - *pool_handle = NULL; - - return -ENOMEM; } -EXPORT_SYMBOL(ib_fmr_pool_create); +EXPORT_SYMBOL(ib_dealloc_fmr); -int ib_fmr_pool_destroy(struct ib_fmr_pool *pool) -{ - struct list_head *ptr; - struct list_head *tmp; - struct ib_fmr *fmr; - int i; - - IB_CHECK_MAGIC(pool, FMR_POOL); - - tsKernelThreadStop(pool->thread); - ib_fmr_batch_release(pool); - - i = 0; - list_for_each_safe(ptr, tmp, &pool->free_list) { - fmr = list_entry(ptr, struct ib_fmr, list); - pool->device->fmr_destroy(fmr); - - list_del(ptr); - kfree(fmr); - ++i; - } - - if (i < pool->pool_size) { - TS_REPORT_WARN(MOD_KERNEL_IB, - "pool still has %d regions registered", - pool->pool_size - i); - } - - kfree(pool->cache_bucket); - kfree(pool); - - return 0; -} -EXPORT_SYMBOL(ib_fmr_pool_destroy); - -int ib_fmr_pool_force_flush(struct ib_fmr_pool *pool) -{ - int serial; - - atomic_inc(&pool->req_ser); - /* It's OK if someone else bumps req_ser again here -- we'll - just wait a little longer. 
*/ - serial = atomic_read(&pool->req_ser); - - wake_up_interruptible(&pool->thread_wait); - - if (wait_event_interruptible(pool->force_wait, - atomic_read(&pool->flush_ser) - - atomic_read(&pool->req_ser) >= 0)) - return -EINTR; - - return 0; -} -EXPORT_SYMBOL(ib_fmr_pool_force_flush); - -int ib_fmr_register_physical(struct ib_fmr_pool *pool_handle, - u64 *page_list, - int list_len, - u64 *io_virtual_address, - u64 iova_offset, - struct ib_fmr **fmr_handle, - u32 *lkey, - u32 *rkey) -{ - struct ib_fmr_pool *pool = pool_handle; - struct ib_fmr *fmr; - unsigned long flags; - int result; - - IB_CHECK_MAGIC(pool, FMR_POOL); - - if (list_len < 1 || list_len > pool->max_pages) { - return -EINVAL; - } - - spin_lock_irqsave(&pool->pool_lock, flags); - fmr = ib_fmr_cache_lookup(pool, - page_list, - list_len, - *io_virtual_address, - iova_offset); - if (fmr) { - /* found in cache */ - ++fmr->ref_count; - if (fmr->ref_count == 1) { - list_del(&fmr->list); - } - - spin_unlock_irqrestore(&pool->pool_lock, flags); - - *lkey = fmr->lkey; - *rkey = fmr->rkey; - *fmr_handle = fmr; - - return 0; - } - - if (list_empty(&pool->free_list)) { - spin_unlock_irqrestore(&pool->pool_lock, flags); - return -EAGAIN; - } - - fmr = list_entry(pool->free_list.next, struct ib_fmr, list); - list_del(&fmr->list); - ib_fmr_cache_remove(fmr); - spin_unlock_irqrestore(&pool->pool_lock, flags); - - result = pool->device->fmr_map(fmr, - page_list, - list_len, - io_virtual_address, - iova_offset, - lkey, - rkey); - - if (result) { - spin_lock_irqsave(&pool->pool_lock, flags); - list_add(&fmr->list, &pool->free_list); - spin_unlock_irqrestore(&pool->pool_lock, flags); - - TS_REPORT_WARN(MOD_KERNEL_IB, - "fmr_map returns %d", - result); - *fmr_handle = NULL; - - return -EINVAL; - } - - ++fmr->remap_count; - fmr->ref_count = 1; - - *fmr_handle = fmr; - - if (pool->cache_bucket) { - fmr->lkey = *lkey; - fmr->rkey = *rkey; - fmr->io_virtual_address = *io_virtual_address; - fmr->iova_offset = iova_offset; - 
fmr->page_list_len = list_len; - memcpy(fmr->page_list, page_list, list_len * sizeof(*page_list)); - - spin_lock_irqsave(&pool->pool_lock, flags); - ib_fmr_cache_store(pool, fmr); - spin_unlock_irqrestore(&pool->pool_lock, flags); - } - - return 0; -} -EXPORT_SYMBOL(ib_fmr_register_physical); - -int ib_fmr_deregister(struct ib_fmr *fmr_handle) -{ - struct ib_fmr *fmr = fmr_handle; - struct ib_fmr_pool *pool; - unsigned long flags; - - IB_CHECK_MAGIC(fmr, FMR); - - pool = fmr->pool; - - spin_lock_irqsave(&pool->pool_lock, flags); - - --fmr->ref_count; - if (!fmr->ref_count) { - if (fmr->remap_count < TS_IB_FMR_MAX_REMAPS) { - list_add_tail(&fmr->list, &pool->free_list); - } else { - list_add_tail(&fmr->list, &pool->dirty_list); - ++pool->dirty_len; - wake_up_interruptible(&pool->thread_wait); - } - } - - if (IB_COMPILE_FMR_DEBUGGING_CODE) { - if (fmr->ref_count < 0) { - TS_REPORT_WARN(MOD_KERNEL_IB, - "FMR %p has ref count %d < 0", - fmr, fmr->ref_count); - } - } - - spin_unlock_irqrestore(&pool->pool_lock, flags); - - return 0; -} -EXPORT_SYMBOL(ib_fmr_deregister); - /* Local Variables: c-file-style: "linux" Index: src/linux-kernel/infiniband/core/core_priv.h =================================================================== --- src/linux-kernel/infiniband/core/core_priv.h (revision 692) +++ src/linux-kernel/infiniband/core/core_priv.h (working copy) @@ -79,7 +79,6 @@ void ib_remove_proc_dir(void); void ib_completion_thread(struct list_head *entry, void *device_ptr); void ib_async_thread(struct list_head *entry, void *device_ptr); -void ib_async_obj_init(struct ib_async_obj *async_obj, void *free_ptr); #endif /* _CORE_PRIV_H */ Index: src/linux-kernel/infiniband/core/core_async.c =================================================================== --- src/linux-kernel/infiniband/core/core_async.c (revision 692) +++ src/linux-kernel/infiniband/core/core_async.c (working copy) @@ -33,7 +33,6 @@ #include struct ib_async_event_handler { - IB_DECLARE_MAGIC struct 
ib_async_event_record record; ib_async_event_handler_func function; void *arg; @@ -75,14 +74,6 @@ [IB_PKEY_CHANGE] = { PORT, "P_Key Change" } }; -void ib_async_obj_init(struct ib_async_obj *async_obj, void *free_ptr) -{ - spin_lock_init(&async_obj->lock); - async_obj->free_ptr = free_ptr; - async_obj->pending = 0; - async_obj->dead = 0; -} - int ib_async_event_handler_register(struct ib_async_event_record *record, ib_async_event_handler_func function, void *arg, @@ -92,8 +83,6 @@ int ret; unsigned long flags; - IB_CHECK_MAGIC(record->device, DEVICE); - if (record->event < 0 || record->event >= ARRAY_SIZE(event_table)) { TS_REPORT_WARN(MOD_KERNEL_IB, "Attempt to register handler for invalid async event %d", @@ -137,7 +126,6 @@ break; } - IB_SET_MAGIC(handler, ASYNC); *handle = handler; return 0; @@ -152,13 +140,10 @@ struct ib_async_event_handler *handler = handle; unsigned long flags; - IB_CHECK_MAGIC(handle, ASYNC); - spin_lock_irqsave(handler->list_lock, flags); list_del(&handler->list); spin_unlock_irqrestore(handler->list_lock, flags); - IB_CLEAR_MAGIC(handle); kfree(handle); return 0; } @@ -168,7 +153,6 @@ { struct ib_async_event_list *event; struct ib_device_private *priv = event_record->device->core; - struct ib_async_obj *async_obj = NULL; unsigned long flags = 0; /* initialize to shut up gcc */ switch (event_table[event_record->event].mod) { @@ -176,12 +160,6 @@ break; } - if (async_obj) { - spin_lock_irqsave(&async_obj->lock, flags); - if (async_obj->dead) - goto out; - } - event = kmalloc(sizeof *event, GFP_ATOMIC); if (!event) { return; @@ -190,12 +168,6 @@ event->record = *event_record; tsKernelQueueThreadAdd(priv->async_thread, &event->list); - if (async_obj) - ++async_obj->pending; - -out: - if (async_obj) - spin_unlock_irqrestore(&async_obj->lock, flags); } EXPORT_SYMBOL(ib_async_event_dispatch); @@ -212,7 +184,6 @@ struct ib_async_event_handler *handler; ib_async_event_handler_func function; void *arg; - struct ib_async_obj *async_obj = NULL; event 
= list_entry(entry, struct ib_async_event_list, list); priv = ((struct ib_device *) event->record.device)->core; @@ -257,12 +228,6 @@ spin_lock_irq(handler_lock); - if (async_obj) { - spin_lock(&async_obj->lock); - if (async_obj->dead) - goto skip; - } - list_for_each_safe(pos, n, handler_list) { handler = list_entry(pos, struct ib_async_event_handler, list); if (handler->record.event == event->record.event) { @@ -275,14 +240,6 @@ } } -skip: - if (async_obj) { - --async_obj->pending; - if (async_obj->dead && !async_obj->pending) - kfree(async_obj->free_ptr); - spin_unlock(&async_obj->lock); - } - spin_unlock_irq(handler_lock); kfree(event); } Index: src/linux-kernel/infiniband/core/core_fmr_pool.c =================================================================== --- src/linux-kernel/infiniband/core/core_fmr_pool.c (revision 0) +++ src/linux-kernel/infiniband/core/core_fmr_pool.c (revision 0) @@ -0,0 +1,470 @@ +/* + This software is available to you under a choice of one of two + licenses. You may choose to be licensed under the terms of the GNU + General Public License (GPL) Version 2, available at + , or the OpenIB.org BSD + license, available in the LICENSE.TXT file accompanying this + software. These details are also available at + . + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + SOFTWARE. + + Copyright (c) 2004 Topspin Communications. All rights reserved. 
+ + $Id$ +*/ + +#include +#include +#include +#include +#include + +#include + +#include "core_priv.h" + +enum { + IB_FMR_MAX_REMAPS = 32, + + IB_FMR_HASH_BITS = 8, + IB_FMR_HASH_SIZE = 1 << IB_FMR_HASH_BITS, + IB_FMR_HASH_MASK = IB_FMR_HASH_SIZE - 1 +}; + +/* + If an FMR is not in use, then the list member will point to either + its pool's free_list (if the FMR can be mapped again; that is, + remap_count < IB_FMR_MAX_REMAPS) or its pool's dirty_list (if the + FMR needs to be unmapped before being remapped). In either of these + cases it is a bug if the ref_count is not 0. In other words, if + ref_count is > 0, then the list member must not be linked into + either free_list or dirty_list. + + The cache_node member is used to link the FMR into a cache bucket + (if caching is enabled). This is independent of the reference count + of the FMR. When a valid FMR is released, its ref_count is + decremented, and if ref_count reaches 0, the FMR is placed in either + free_list or dirty_list as appropriate. However, it is not removed + from the cache and may be "revived" if a call to + ib_fmr_register_physical() occurs before the FMR is remapped. In + this case we just increment the ref_count and remove the FMR from + free_list/dirty_list. + + Before we remap an FMR from free_list, we remove it from the cache + (to prevent another user from obtaining a stale FMR). When an FMR + is released, we add it to the tail of the free list, so that our + cache eviction policy is "least recently used." + + All manipulation of ref_count, list and cache_node is protected by + pool_lock to maintain consistency. 
+*/ + +struct ib_fmr_pool { + spinlock_t pool_lock; + + int pool_size; + int max_pages; + int dirty_watermark; + int dirty_len; + struct list_head free_list; + struct list_head dirty_list; + struct hlist_head *cache_bucket; + + void (*flush_function)(struct ib_fmr_pool *pool, + void * arg); + void *flush_arg; + + struct task_struct *thread; + + atomic_t req_ser; + atomic_t flush_ser; + + wait_queue_head_t force_wait; +}; + +static inline u32 ib_fmr_hash(u64 first_page) +{ + return jhash_2words((u32) first_page, + (u32) (first_page >> 32), + 0); +} + +/* Caller must hold pool_lock */ +static inline struct ib_pool_fmr *ib_fmr_cache_lookup(struct ib_fmr_pool *pool, + u64 *page_list, + int page_list_len, + u64 io_virtual_address) +{ + struct hlist_head *bucket; + struct ib_pool_fmr *fmr; + struct hlist_node *pos; + + if (!pool->cache_bucket) + return NULL; + + bucket = pool->cache_bucket + ib_fmr_hash(*page_list); + + hlist_for_each_entry(fmr, pos, bucket, cache_node) + if (io_virtual_address == fmr->io_virtual_address && + page_list_len == fmr->page_list_len && + !memcmp(page_list, fmr->page_list, + page_list_len * sizeof *page_list)) + return fmr; + + return NULL; +} + +static void ib_fmr_batch_release(struct ib_fmr_pool *pool) +{ + int ret; + struct ib_pool_fmr *fmr; + LIST_HEAD(unmap_list); + LIST_HEAD(fmr_list); + + spin_lock_irq(&pool->pool_lock); + + list_for_each_entry(fmr, &pool->dirty_list, list) { + hlist_del_init(&fmr->cache_node); + fmr->remap_count = 0; + list_add_tail(&fmr->fmr->list, &fmr_list); + +#ifdef DEBUG + if (fmr->ref_count !=0) { + printk(KERN_WARNING "Unmapping FMR 0x%08x with ref count %d", + fmr, fmr->ref_count); + } +#endif + } + + list_splice(&pool->dirty_list, &unmap_list); + INIT_LIST_HEAD(&pool->dirty_list); + pool->dirty_len = 0; + + spin_unlock_irq(&pool->pool_lock); + + if (list_empty(&unmap_list)) { + return; + } + + ret = ib_unmap_fmr(&fmr_list); + if (ret) + printk(KERN_WARNING "ib_unmap_fmr returned %d", ret); + + 
spin_lock_irq(&pool->pool_lock); + list_splice(&unmap_list, &pool->free_list); + spin_unlock_irq(&pool->pool_lock); +} + +static int ib_fmr_cleanup_thread(void *pool_ptr) +{ + struct ib_fmr_pool *pool = pool_ptr; + + do { + if (pool->dirty_len >= pool->dirty_watermark || + atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) < 0) { + ib_fmr_batch_release(pool); + + atomic_inc(&pool->flush_ser); + wake_up_interruptible(&pool->force_wait); + + if (pool->flush_function) + pool->flush_function(pool, pool->flush_arg); + } + + set_current_state(TASK_INTERRUPTIBLE); + if (pool->dirty_len < pool->dirty_watermark && + atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) >= 0 && + !kthread_should_stop()) + schedule(); + __set_current_state(TASK_RUNNING); + } while (!kthread_should_stop()); + + return 0; +} + +int ib_create_fmr_pool(struct ib_pd *pd, + struct ib_fmr_pool_param *params, + struct ib_fmr_pool **pool_handle) +{ + struct ib_device *device; + struct ib_fmr_pool *pool; + int i; + int ret; + + if (!params) { + return -EINVAL; + } + + device = pd->device; + if (!device->alloc_fmr || + !device->dealloc_fmr || + !device->map_phys_fmr || + !device->unmap_fmr) { + printk(KERN_WARNING "Device %s does not support fast memory regions", + device->name); + return -ENOSYS; + } + + pool = kmalloc(sizeof *pool, GFP_KERNEL); + if (!pool) { + printk(KERN_WARNING "couldn't allocate pool struct"); + return -ENOMEM; + } + + pool->cache_bucket = NULL; + + pool->flush_function = params->flush_function; + pool->flush_arg = params->flush_arg; + + INIT_LIST_HEAD(&pool->free_list); + INIT_LIST_HEAD(&pool->dirty_list); + + if (params->cache) { + pool->cache_bucket = + kmalloc(IB_FMR_HASH_SIZE * sizeof *pool->cache_bucket, + GFP_KERNEL); + if (!pool->cache_bucket) { + printk(KERN_WARNING "Failed to allocate cache in pool"); + ret = -ENOMEM; + goto out_free_pool; + } + + for (i = 0; i < IB_FMR_HASH_SIZE; ++i) + INIT_HLIST_HEAD(pool->cache_bucket + i); + } + + pool->pool_size = 0; 
+	pool->max_pages       = params->max_pages_per_fmr;
+	pool->dirty_watermark = params->dirty_watermark;
+	pool->dirty_len       = 0;
+	spin_lock_init(&pool->pool_lock);
+	atomic_set(&pool->req_ser,   0);
+	atomic_set(&pool->flush_ser, 0);
+	init_waitqueue_head(&pool->force_wait);
+
+	pool->thread = kthread_create(ib_fmr_cleanup_thread,
+				      pool,
+				      "ib_fmr(%s)",
+				      device->name);
+	if (IS_ERR(pool->thread)) {
+		printk(KERN_WARNING "couldn't start cleanup thread");
+		ret = PTR_ERR(pool->thread);
+		goto out_free_pool;
+	}
+
+	{
+		struct ib_pool_fmr *fmr;
+		struct ib_fmr_attr attr = {
+			.max_pages = params->max_pages_per_fmr,
+			.max_maps  = IB_FMR_MAX_REMAPS,
+			.page_size = PAGE_SHIFT
+		};
+
+		for (i = 0; i < params->pool_size; ++i) {
+			fmr = kmalloc(sizeof *fmr + params->max_pages_per_fmr * sizeof (u64),
+				      GFP_KERNEL);
+			if (!fmr) {
+				printk(KERN_WARNING "failed to allocate fmr struct for FMR %d", i);
+				goto out_fail;
+			}
+
+			fmr->pool        = pool;
+			fmr->remap_count = 0;
+			fmr->ref_count   = 0;
+			INIT_HLIST_NODE(&fmr->cache_node);
+
+			fmr->fmr = ib_alloc_fmr(pd, params->access, &attr);
+			if (IS_ERR(fmr->fmr)) {
+				printk(KERN_WARNING "fmr_create failed for FMR %d", i);
+				kfree(fmr);
+				goto out_fail;
+			}
+
+			list_add_tail(&fmr->list, &pool->free_list);
+			++pool->pool_size;
+		}
+	}
+
+	*pool_handle = pool;
+	return 0;
+
+ out_free_pool:
+	kfree(pool->cache_bucket);
+	kfree(pool);
+
+	return ret;
+
+ out_fail:
+	ib_destroy_fmr_pool(pool);
+	*pool_handle = NULL;
+
+	return -ENOMEM;
+}
+EXPORT_SYMBOL(ib_create_fmr_pool);
+
+int ib_destroy_fmr_pool(struct ib_fmr_pool *pool)
+{
+	struct ib_pool_fmr *fmr;
+	struct ib_pool_fmr *tmp;
+	int i;
+
+	kthread_stop(pool->thread);
+	ib_fmr_batch_release(pool);
+
+	i = 0;
+	list_for_each_entry_safe(fmr, tmp, &pool->free_list, list) {
+		ib_dealloc_fmr(fmr->fmr);
+		list_del(&fmr->list);
+		kfree(fmr);
+		++i;
+	}
+
+	if (i < pool->pool_size)
+		printk(KERN_WARNING "pool still has %d regions registered",
+		       pool->pool_size - i);
+
+	kfree(pool->cache_bucket);
+
kfree(pool);
+
+	return 0;
+}
+EXPORT_SYMBOL(ib_destroy_fmr_pool);
+
+int ib_flush_fmr_pool(struct ib_fmr_pool *pool)
+{
+	int serial;
+
+	atomic_inc(&pool->req_ser);
+	/* It's OK if someone else bumps req_ser again here -- we'll
+	   just wait a little longer. */
+	serial = atomic_read(&pool->req_ser);
+
+	wake_up_process(pool->thread);
+
+	if (wait_event_interruptible(pool->force_wait,
+				     atomic_read(&pool->flush_ser) -
+				     atomic_read(&pool->req_ser) >= 0))
+		return -EINTR;
+
+	return 0;
+}
+EXPORT_SYMBOL(ib_flush_fmr_pool);
+
+struct ib_pool_fmr *ib_fmr_pool_map_phys(struct ib_fmr_pool *pool_handle,
+					 u64 *page_list,
+					 int list_len,
+					 u64 *io_virtual_address)
+{
+	struct ib_fmr_pool *pool = pool_handle;
+	struct ib_pool_fmr *fmr;
+	unsigned long flags;
+	int result;
+
+	if (list_len < 1 || list_len > pool->max_pages)
+		return ERR_PTR(-EINVAL);
+
+	spin_lock_irqsave(&pool->pool_lock, flags);
+	fmr = ib_fmr_cache_lookup(pool,
+				  page_list,
+				  list_len,
+				  *io_virtual_address);
+	if (fmr) {
+		/* found in cache */
+		++fmr->ref_count;
+		if (fmr->ref_count == 1) {
+			list_del(&fmr->list);
+		}
+
+		spin_unlock_irqrestore(&pool->pool_lock, flags);
+
+		return fmr;
+	}
+
+	if (list_empty(&pool->free_list)) {
+		spin_unlock_irqrestore(&pool->pool_lock, flags);
+		return ERR_PTR(-EAGAIN);
+	}
+
+	fmr = list_entry(pool->free_list.next, struct ib_pool_fmr, list);
+	list_del(&fmr->list);
+	hlist_del_init(&fmr->cache_node);
+	spin_unlock_irqrestore(&pool->pool_lock, flags);
+
+	result = ib_map_phys_fmr(fmr->fmr, page_list, list_len,
+				 *io_virtual_address);
+
+	if (result) {
+		spin_lock_irqsave(&pool->pool_lock, flags);
+		list_add(&fmr->list, &pool->free_list);
+		spin_unlock_irqrestore(&pool->pool_lock, flags);
+
+		printk(KERN_WARNING "fmr_map returns %d",
+		       result);
+
+		return ERR_PTR(result);
+	}
+
+	++fmr->remap_count;
+	fmr->ref_count = 1;
+
+	if (pool->cache_bucket) {
+		fmr->io_virtual_address = *io_virtual_address;
+		fmr->page_list_len      = list_len;
+		memcpy(fmr->page_list, page_list,
list_len * sizeof(*page_list));
+
+		spin_lock_irqsave(&pool->pool_lock, flags);
+		hlist_add_head(&fmr->cache_node,
+			       pool->cache_bucket + ib_fmr_hash(fmr->page_list[0]));
+		spin_unlock_irqrestore(&pool->pool_lock, flags);
+	}
+
+	return fmr;
+}
+EXPORT_SYMBOL(ib_fmr_pool_map_phys);
+
+int ib_fmr_pool_unmap(struct ib_pool_fmr *fmr)
+{
+	struct ib_fmr_pool *pool;
+	unsigned long flags;
+
+	pool = fmr->pool;
+
+	spin_lock_irqsave(&pool->pool_lock, flags);
+
+	--fmr->ref_count;
+	if (!fmr->ref_count) {
+		if (fmr->remap_count < IB_FMR_MAX_REMAPS) {
+			list_add_tail(&fmr->list, &pool->free_list);
+		} else {
+			list_add_tail(&fmr->list, &pool->dirty_list);
+			++pool->dirty_len;
+			wake_up_process(pool->thread);
+		}
+	}
+
+#ifdef DEBUG
+	if (fmr->ref_count < 0)
+		printk(KERN_WARNING "FMR %p has ref count %d < 0",
+		       fmr, fmr->ref_count);
+#endif
+
+	spin_unlock_irqrestore(&pool->pool_lock, flags);
+
+	return 0;
+}
+EXPORT_SYMBOL(ib_fmr_pool_unmap);
+
+/*
+  Local Variables:
+  c-file-style: "linux"
+  indent-tabs-mode: t
+  End:
+*/

Property changes on: src/linux-kernel/infiniband/core/core_fmr_pool.c
___________________________________________________________________
Name: svn:keywords
   + Id

Index: src/linux-kernel/infiniband/core/mad_priv.h
===================================================================
--- src/linux-kernel/infiniband/core/mad_priv.h	(revision 692)
+++ src/linux-kernel/infiniband/core/mad_priv.h	(working copy)
@@ -81,7 +81,6 @@
 };
 
 struct ib_mad_filter_list {
-	IB_DECLARE_MAGIC
 	struct ib_mad_filter filter;
 	ib_mad_dispatch_func function;
 	void                *arg;

Index: src/linux-kernel/infiniband/core/mad_filter.c
===================================================================
--- src/linux-kernel/infiniband/core/mad_filter.c	(revision 692)
+++ src/linux-kernel/infiniband/core/mad_filter.c	(working copy)
@@ -334,7 +334,6 @@
 	filter->arg         = arg;
 	filter->matches     = 0;
 	filter->in_callback = 0;
-	IB_SET_MAGIC(filter, FILTER);
 
 	if (down_interruptible(&filter_sem)) {
 		kfree(filter);
@@ -354,8 +353,6 @@ { struct ib_mad_filter_list *filter = handle; - IB_CHECK_MAGIC(filter, FILTER); - if (down_interruptible(&filter_sem)) return -EINTR; @@ -371,7 +368,6 @@ up(&filter_sem); - IB_CLEAR_MAGIC(filter); kfree(filter); return 0; } Index: src/linux-kernel/infiniband/core/core_device.c =================================================================== --- src/linux-kernel/infiniband/core/core_device.c (revision 692) +++ src/linux-kernel/infiniband/core/core_device.c (working copy) @@ -215,8 +215,6 @@ goto out_stop_async; } - IB_SET_MAGIC(device, DEVICE); - list_add_tail(&device->core_list, &device_list); { struct list_head *ptr; @@ -253,8 +251,6 @@ { struct ib_device_private *priv; - IB_CHECK_MAGIC(device, DEVICE); - priv = device->core; if (tsKernelQueueThreadStop(priv->async_thread)) { @@ -279,7 +275,6 @@ } up(&device_lock); - IB_CLEAR_MAGIC(device); kfree(priv->port_data); kfree(priv); @@ -352,8 +347,6 @@ int ib_device_properties_get(struct ib_device *device, struct ib_device_properties *properties) { - IB_CHECK_MAGIC(device, DEVICE); - return device->device_query ? device->device_query(device, properties) : -ENOSYS; } EXPORT_SYMBOL(ib_device_properties_get); @@ -361,8 +354,6 @@ int ib_device_properties_set(struct ib_device *device, struct ib_device_changes *properties) { - IB_CHECK_MAGIC(device, DEVICE); - return device->device_modify ? device->device_modify(device, properties) : -ENOSYS; } @@ -370,8 +361,6 @@ tTS_IB_PORT port, struct ib_port_properties *properties) { - IB_CHECK_MAGIC(device, DEVICE); - return device->port_query ? device->port_query(device, port, properties) : -ENOSYS; } EXPORT_SYMBOL(ib_port_properties_get); @@ -384,8 +373,6 @@ struct ib_port_changes prop_set; unsigned long flags; - IB_CHECK_MAGIC(device, DEVICE); - priv = device->core; if (port < priv->start_port || port > priv->end_port) { @@ -455,8 +442,6 @@ int index, u16 *pkey) { - IB_CHECK_MAGIC(device, DEVICE); - return device->pkey_query ? 
device->pkey_query(device, port, index, pkey) : -ENOSYS; } EXPORT_SYMBOL(ib_pkey_entry_get); @@ -466,8 +451,6 @@ int index, tTS_IB_GID gid) { - IB_CHECK_MAGIC(device, DEVICE); - return device->gid_query ? device->gid_query(device, port, index, gid) : -ENOSYS; } EXPORT_SYMBOL(ib_gid_entry_get); From roland at topspin.com Fri Aug 27 19:54:47 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 27 Aug 2004 19:54:47 -0700 Subject: [openib-general] PATCH creation of ib_verbs.c file In-Reply-To: <20040827152807.1e7d60d5.mshefty@ichips.intel.com> (Sean Hefty's message of "Fri, 27 Aug 2004 15:28:07 -0700") References: <20040827152807.1e7d60d5.mshefty@ichips.intel.com> Message-ID: <52zn4g6kg8.fsf@topspin.com> Looks good. I've been meaning to consolidate a lot of the small core_xxx files for a while, this will give me the push to do it on my branch. Sean> ib_create_qp / ib_destroy_qp - include support for srq I'll merge this in. Sean> optional calls are not yet checked - I'll start a separate Sean> discussion on this. I'll reply separately :) Sean> ib_create_cq / ib_resize_cq - I was thinking that that Sean> device driver would set the struct cq.cqe value directly, Sean> rather than returning a changed &cqe value and requiring the Sean> access layer to set it. OK, seems reasonable. Sean> ib_rereg_phys_mr - checks for bound mw's and pd changes I'll merge this. - R. From roland at topspin.com Fri Aug 27 19:59:25 2004 From: roland at topspin.com (Roland Dreier) Date: Fri, 27 Aug 2004 19:59:25 -0700 Subject: [openib-general] optional function calls In-Reply-To: <20040827153756.0f29cdd9.mshefty@ichips.intel.com> (Sean Hefty's message of "Fri, 27 Aug 2004 15:37:56 -0700") References: <20040827153756.0f29cdd9.mshefty@ichips.intel.com> Message-ID: <52vff46k8i.fsf@topspin.com> Sean> I'm trying to decide which IB function require checks to see Sean> if they exist. Should we go by the spec and assume that all Sean> mandatory functions are implemented by the device driver? 
No, I don't think so.  I think we might as well make the stack useful
not just with full HCA hardware but also on embedded hardware (running
on switches and TCAs).  For example, a switch is unlikely to support
memory windows (even though the MW verbs are mandatory).

 Sean> Or should we allow the minimal subset possible?

This is my vote.

 Sean> Also, if calls like ib_create_srq and ib_attach_mcast check
 Sean> to see if the device implemented the functions, do we need
 Sean> the same check on calls to ib_destroy_srq and
 Sean> ib_detach_mcast?  I.e. should we only perform the check in
 Sean> the creation calls?

I think it's reasonable to only check the creation calls (and possibly
put a check in the device registration function that if a device
implements the create method, it implements all the other methods for
an object -- although this falls down for things like CQ resize, which
might not be implemented).

 - R.

From sean.hefty at intel.com  Sat Aug 28 21:46:15 2004
From: sean.hefty at intel.com (Sean Hefty)
Date: Sat, 28 Aug 2004 21:46:15 -0700
Subject: [openib-general] optional function calls
In-Reply-To: <52vff46k8i.fsf@topspin.com>
Message-ID: 

> Sean> I'm trying to decide which IB functions require checks to see
> Sean> if they exist.  Should we go by the spec and assume that all
> Sean> mandatory functions are implemented by the device driver?
>
>No, I don't think so.  I think we might as well make the stack useful
>not just with full HCA hardware but also on embedded hardware (running
>on switches and TCAs).  For example, a switch is unlikely to support
>memory windows (even though the MW verbs are mandatory).

This was my thinking as well, but wanted to see if anyone thought
differently.

> Sean> Or should we allow the minimal subset possible?
>
>This is my vote.

If there's no disagreement, then I'll probably use the minimal subset
in your core_device.c file.
> Sean> Also, if calls like ib_create_srq and ib_attach_mcast check
> Sean> to see if the device implemented the functions, do we need
> Sean> the same check on calls to ib_destroy_srq and
> Sean> ib_detach_mcast?  I.e. should we only perform the check in
> Sean> the creation calls?
>
>I think it's reasonable to only check the creation calls (and possibly
>put a check in the device registration function that if a device
>implements the create method, it implements all the other methods for
>an object -- although this falls down for things like CQ resize which
>might not be implemented).

I'll add checks only to the necessary calls then, e.g. ib_create_srq,
ib_attach_mcast, ib_resize_cq, ib_query_qp, etc.

From gdror at mellanox.co.il  Mon Aug 30 13:28:00 2004
From: gdror at mellanox.co.il (Dror Goldenberg)
Date: Mon, 30 Aug 2004 23:28:00 +0300
Subject: [openib-general] Multicast address aliasing in IPoIB
Message-ID: <506C3D7B14CDD411A52C00025558DED605E00235@mtlex01.yok.mtl.com>

IPoIB allows no aliasing in the mapping of IP multicast addresses into
IPoIB HW addresses.  In Ethernet there is aliasing, i.e. more than one
IP address can map to the same Ethernet multicast MAC address.  In
short: IP to Ether takes the 23 LSbits of the IP address, while IP to
IB takes the 28 LSbits (essentially the whole IP address, since the
remaining 4 bits are the class D prefix).

The problem is that the current IPoIB driver interfaces with the Linux
kernel as if it were an Ethernet driver.  Therefore, the IP layer will
not notify the net_device when a new MC address is added if it maps to
the same MAC address.  It will instead increment the reference count
of the MAC address (net_device->mc_list->dmi_users) and won't call
net_device->set_multicast_list().
Therefore, if a user just adds itself to an IP MC group (setsockopt
with IP_ADD_MEMBERSHIP), and the IPoIB driver already has this Ether
MAC address in its filter because of a previous registration to
another IP MC group, then the IPoIB driver will not get any
notification, and the user will not get registered to the MCG.

I was wondering what the solution for this should be in the current
kernels (gen1) and in future kernels (gen2).

For gen2, will it be possible to define a new medium for the IPoIB
driver (not ARPHRD_ETHER), such that arp_mc_map() will map the entire
IP address into the HW address?  Today it looks impossible, because
arp_mc_map() just overrides bits 31:24 of the IP address.

For gen1, what can we do?  Is there a way to obtain such an event from
the in_device?  If not, then I don't see any clean escape.  Is it
possible to periodically check the in_device multicast list and see if
anything has changed?  Would that cause any problem during the
transition periods?  Any other ideas for how to do this without a
kernel patch?

- Dror

* For reference: the algorithm for mapping an IP mcast address to an
Ether mcast address is defined in RFC 1112 section 6.4, and for IB in
draft-ietf-ipoib-ip-over-infiniband-07.txt section 4.0.

From roland at topspin.com  Mon Aug 30 13:35:03 2004
From: roland at topspin.com (Roland Dreier)
Date: Mon, 30 Aug 2004 13:35:03 -0700
Subject: [openib-general] Multicast address aliasing in IPoIB
In-Reply-To: <506C3D7B14CDD411A52C00025558DED605E00235@mtlex01.yok.mtl.com> (Dror Goldenberg's message of "Mon, 30 Aug 2004 23:28:00 +0300")
References: <506C3D7B14CDD411A52C00025558DED605E00235@mtlex01.yok.mtl.com>
Message-ID: <52r7po5pqg.fsf@topspin.com>

    Dror> For gen2, will it be possible to define a new medium for the
    Dror> IPoIB driver (not ARPHRD_ETHER), such that arp_mc_map() will
    Dror> map the entire IP address into the HW address?
    Dror> Today it looks impossible, because arp_mc_map() just
    Dror> overrides bits 31:24 of the IP address.

I guess when we merge the IPoIB driver we will need to include a patch
to the networking core that treats ARPHRD_INFINIBAND properly for IPv4
and IPv6 multicast addresses.

    Dror> For gen1, what can we do?  Is there a way to obtain such an
    Dror> event from the in_device?  If not, then I don't see any
    Dror> clean escape.  Is it possible to periodically check the
    Dror> in_device multicast list and see if anything has changed?
    Dror> Would that cause any problem during the transition periods?
    Dror> Any other ideas for how to do this without a kernel patch?

We're not developing the gen1 tree anymore, so I don't think we need
to worry about this.

 - Roland

From roland at topspin.com  Mon Aug 30 18:21:25 2004
From: roland at topspin.com (Roland Dreier)
Date: Mon, 30 Aug 2004 18:21:25 -0700
Subject: [openib-general] [PATCH] update to new query API
Message-ID: <5265705ch6.fsf@topspin.com>

This updates my branch to use the new query device/port/gid/pkey API.
The only method in struct ib_device I still need to convert is
mad_process.  I'm going to hold off on that though and work on async
events and sysfs/driver model stuff first...
- R Index: src/linux-kernel/infiniband/ulp/dapl/udapl_mod.c =================================================================== --- src/linux-kernel/infiniband/ulp/dapl/udapl_mod.c (revision 692) +++ src/linux-kernel/infiniband/ulp/dapl/udapl_mod.c (working copy) @@ -1317,7 +1317,7 @@ switch (event->event) { - case IB_PORT_ACTIVE: + case IB_EVENT_PORT_ACTIVE: if ((event->modifier.port < 1) || (event->modifier.port > MAX_PORTS_PER_HCA)) { @@ -3007,7 +3007,7 @@ /* FIXME: We should do this for each device supported */ struct ib_async_event_record event_record = { .device = ib_device_get_by_index(0), - .event = IB_PORT_ACTIVE, + .event = IB_EVENT_PORT_ACTIVE, }; status = ib_async_event_handler_register(&event_record, Index: src/linux-kernel/infiniband/ulp/ipoib/ipoib_verbs.c =================================================================== --- src/linux-kernel/infiniband/ulp/ipoib/ipoib_verbs.c (revision 692) +++ src/linux-kernel/infiniband/ulp/ipoib/ipoib_verbs.c (working copy) @@ -291,21 +291,21 @@ static void ipoib_device_notifier(struct ib_device_notifier *self, struct ib_device *device, int event) { - struct ib_device_properties props; + struct ib_device_attr props; int port; switch (event) { case IB_DEVICE_NOTIFIER_ADD: - if (ib_device_properties_get(device, &props)) { + if (ib_query_device(device, &props)) { TS_REPORT_WARN(MOD_IB_NET, "ib_device_properties_get failed"); return; } - if (props.is_switch) { + if (device->flags & IB_DEVICE_IS_SWITCH) { if (try_module_get(device->owner)) ipoib_add_port("ib%d", device, 0); } else { - for (port = 1; port <= props.num_port; ++port) + for (port = 1; port <= props.phys_port_cnt; ++port) if (try_module_get(device->owner)) ipoib_add_port("ib%d", device, port); } @@ -316,7 +316,7 @@ underneath us yet! 
*/ TS_REPORT_WARN(MOD_IB_NET, "IPoIB driver can't handle removal of device %s", - props.name); + device->name); break; default: @@ -345,7 +345,7 @@ { struct ipoib_dev_priv *priv = priv_ptr; - if (record->event == IB_PORT_ACTIVE) { + if (record->event == IB_EVENT_PORT_ACTIVE) { TS_TRACE(MOD_IB_NET, T_VERBOSE, TRACE_IB_NET_GEN, "%s: Port active Event", priv->dev.name); @@ -361,7 +361,7 @@ struct ipoib_dev_priv *priv = dev->priv; struct ib_async_event_record event_record = { .device = priv->ca, - .event = IB_PORT_ACTIVE, + .event = IB_EVENT_PORT_ACTIVE, }; if (ib_async_event_handler_register(&event_record, Index: src/linux-kernel/infiniband/ulp/ipoib/ip2pr_link.c =================================================================== --- src/linux-kernel/infiniband/ulp/ipoib/ip2pr_link.c (revision 692) +++ src/linux-kernel/infiniband/ulp/ipoib/ip2pr_link.c (working copy) @@ -1348,9 +1348,8 @@ if ((sgid_elmt->ca == record->device) && (sgid_elmt->port == record->modifier.port)) { sgid_elmt->port_state = - (record->event == - IB_PORT_ACTIVE) ? IB_PORT_STATE_ACTIVE : - IB_PORT_STATE_DOWN; + record->event == IB_EVENT_PORT_ACTIVE ? + IB_PORT_ACTIVE : IB_PORT_DOWN; /* Gid could have changed. Get the gid */ if (ib_cached_gid_get(record->device, @@ -1631,16 +1630,16 @@ for (sgid_elmt = _tsIp2prLinkRoot.src_gid_list; NULL != sgid_elmt; sgid_elmt = sgid_elmt->next) { - if (IB_PORT_STATE_ACTIVE == sgid_elmt->port_state) { + if (IB_PORT_ACTIVE == sgid_elmt->port_state) { /* * if the port is active and the gid is zero, then getting the * gid in the async handler had failed. Try to get it now. 
*/ if (0 == memcmp(sgid_elmt->gid, nullgid, sizeof(nullgid))) { - if (ib_gid_entry_get(sgid_elmt->ca, - sgid_elmt->port, 0, - sgid_elmt->gid)) { + if (ib_query_gid(sgid_elmt->ca, + sgid_elmt->port, 0, + (union ib_gid *) sgid_elmt->gid)) { TS_TRACE(MOD_IP2PR, T_VERBOSE, TRACE_FLOW_WARN, "Could not get GID: on hca=<%d>,port=<%d>", @@ -1806,7 +1805,8 @@ } memset(sgid_elmt, 0, sizeof(*sgid_elmt)); - if (ib_gid_entry_get(hca_device, port, 0, sgid_elmt->gid)) { + if (ib_query_gid(hca_device, port, 0, + (union ib_gid *) sgid_elmt->gid)) { kmem_cache_free(_tsIp2prLinkRoot.src_gid_cache, sgid_elmt); return (-EFAULT); } @@ -2009,8 +2009,8 @@ s32 result = 0; int i, j; struct ib_device *hca_device; - struct ib_device_properties dev_prop; - struct ib_port_properties port_prop; + struct ib_device_attr dev_prop; + struct ib_port_attr port_prop; _tsIp2prLinkRoot.src_gid_cache = kmem_cache_create("Ip2prSrcGidList", sizeof @@ -2043,19 +2043,19 @@ * Create SGID list for each port on hca */ for (i = 0; ((hca_device = ib_device_get_by_index(i)) != NULL); ++i) { - if (ib_device_properties_get(hca_device, &dev_prop)) { + if (ib_query_device(hca_device, &dev_prop)) { TS_REPORT_FATAL(MOD_IB_NET, "ib_device_properties_get() failed"); return -EINVAL; } - for (j = 1; j <= dev_prop.num_port; j++) { - if (ib_port_properties_get(hca_device, j, &port_prop)) { + for (j = 1; j <= dev_prop.phys_port_cnt; j++) { + if (ib_query_port(hca_device, j, &port_prop)) { continue; } result = ip2pr_src_gid_add(hca_device, j, - port_prop.port_state); + port_prop.state); if (0 > result) { goto port_err; } @@ -2160,7 +2160,7 @@ } /* if */ evt_rec.device = hca_device; - evt_rec.event = IB_PORT_ACTIVE; + evt_rec.event = IB_EVENT_PORT_ACTIVE; result = ib_async_event_handler_register(&evt_rec, ip2pr_async_event_func, NULL, Index: src/linux-kernel/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- src/linux-kernel/infiniband/ulp/ipoib/ipoib_multicast.c 
(revision 692) +++ src/linux-kernel/infiniband/ulp/ipoib/ipoib_multicast.c (working copy) @@ -579,7 +579,7 @@ priv->local_lid = port_lid.lid; } - if (ib_gid_entry_get(priv->ca, priv->port, 0, priv->local_gid.raw)) + if (ib_query_gid(priv->ca, priv->port, 0, &priv->local_gid)) TS_REPORT_WARN(MOD_IB_NET, "%s: ib_gid_entry_get() failed", dev->name); Index: src/linux-kernel/infiniband/ulp/srp/srp_dm.c =================================================================== --- src/linux-kernel/infiniband/ulp/srp/srp_dm.c (revision 692) +++ src/linux-kernel/infiniband/ulp/srp/srp_dm.c (working copy) @@ -1271,7 +1271,7 @@ handler = srp_in_service_handler; completion_handler = srp_in_service_completion; - if (port->port_state != IB_PORT_STATE_ACTIVE) + if (port->port_state != IB_PORT_ACTIVE) return; TS_REPORT_STAGE(MOD_SRPTP, @@ -1321,7 +1321,7 @@ handler = srp_out_of_service_handler; completion_handler = srp_out_of_service_completion; - if (port->port_state != IB_PORT_STATE_ACTIVE) + if (port->port_state != IB_PORT_ACTIVE) return; TS_REPORT_STAGE(MOD_SRPTP, @@ -1361,7 +1361,7 @@ { int status = -1; - if (port->port_state == IB_PORT_STATE_ACTIVE) { + if (port->port_state == IB_PORT_ACTIVE) { port->dm_query_in_progress = 1; @@ -1410,7 +1410,7 @@ switch (event->event) { - case IB_PORT_ACTIVE: + case IB_EVENT_PORT_ACTIVE: /* * Wake up that DM thread, so that it will start * a fresh scan and initiate connections to discovered @@ -1482,7 +1482,7 @@ * Cannot call refresh hca info as the HCA may be hung, * simply mark the port as being down */ - port->port_state = IB_PORT_STATE_DOWN; + port->port_state = IB_PORT_DOWN; /* * Two step process to update the paths for the IOCs. 
Index: src/linux-kernel/infiniband/ulp/srp/srptp.c =================================================================== --- src/linux-kernel/infiniband/ulp/srp/srptp.c (revision 696) +++ src/linux-kernel/infiniband/ulp/srp/srptp.c (working copy) @@ -207,7 +207,7 @@ extern Scsi_Host_Template driver_template; int sg_elements = driver_template.sg_tablesize; int port_index; - struct ib_device_properties device_properties; + struct ib_device_attr device_properties; tsKernelTraceLevelSet(MOD_SRPTP, srp_tracelevel); @@ -245,8 +245,7 @@ TS_REPORT_STAGE(MOD_SRPTP, "Found HCA %d %p", hca_index, hca); - status = ib_device_properties_get(hca->ca_hndl, - &device_properties); + status = ib_query_device(hca->ca_hndl, &device_properties); if (status != 0) { TS_REPORT_FATAL(MOD_SRPTP, "Property query failed with " @@ -256,7 +255,8 @@ hca->valid = 1; hca->hca_index = hca_index; - for (port_index = 0; port_index < MAX_LOCAL_PORTS_PER_HCA; + for (port_index = 0; + port_index < device_properties.phys_port_cnt; port_index++) { /* * Apply IB ports mask here @@ -269,7 +269,7 @@ } memset(&hca->I_PORT_ID[0], 0, 16); - memcpy(&hca->I_PORT_ID[8], device_properties.node_guid, 8); + memcpy(&hca->I_PORT_ID[8], &device_properties.node_guid, 8); TS_REPORT_STAGE(MOD_SRPTP, "SRP Initiator GUID: %llx for hca %d", @@ -592,7 +592,7 @@ { int status, hca_index, port_index; - struct ib_port_properties port_properties; + struct ib_port_attr port_properties; srp_host_port_params_t *port; srp_host_hca_params_t *hca; int port_active_count = 0; @@ -611,9 +611,9 @@ set_current_state(TASK_INTERRUPTIBLE); schedule_timeout(HZ / 10); - status = ib_port_properties_get(hca->ca_hndl, - port_index + 1, - &port_properties); + status = ib_query_port(hca->ca_hndl, + port_index + 1, + &port_properties); if (status != 0) { TS_REPORT_WARN(MOD_SRPTP, @@ -622,14 +622,14 @@ } port->slid = port_properties.lid; - port->port_state = port_properties.port_state; + port->port_state = port_properties.state; - if (port->port_state == 
IB_PORT_STATE_ACTIVE) + if (port->port_state == IB_PORT_ACTIVE) port_active_count++; - status = ib_gid_entry_get(hca->ca_hndl, - port->local_port, 0, - port->local_gid); + status = ib_query_gid(hca->ca_hndl, + port->local_port, 0, + (union ib_gid *) port->local_gid); if (status) { TS_REPORT_WARN(MOD_SRPTP, Index: src/linux-kernel/infiniband/ulp/sdp/sdp_conn.c =================================================================== --- src/linux-kernel/infiniband/ulp/sdp/sdp_conn.c (revision 696) +++ src/linux-kernel/infiniband/ulp/sdp/sdp_conn.c (working copy) @@ -1898,7 +1898,7 @@ struct ib_fmr_pool_param fmr_param_s; #endif struct ib_phys_buf buffer_list; - struct ib_device_properties node_info; + struct ib_device_attr node_info; struct ib_device *hca_handle; struct sdev_hca_port *port; struct sdev_hca *hca; @@ -1929,7 +1929,7 @@ if (!hca_handle || !try_module_get(hca_handle->owner)) continue; - result = ib_device_properties_get(hca_handle, &node_info); + result = ib_query_device(hca_handle, &node_info); if (0 != result) { TS_TRACE(MOD_LNX_SDP, T_VERBOSE, TRACE_FLOW_FATAL, @@ -2026,7 +2026,7 @@ /* * port allocation */ - for (port_count = 0; port_count < node_info.num_port; + for (port_count = 0; port_count < node_info.phys_port_cnt; port_count++) { port = kmalloc(sizeof(struct sdev_hca_port), @@ -2037,7 +2037,7 @@ TRACE_FLOW_FATAL, "INIT: Error allocating HCA <%d:%d> port <%x:%d> memory.", hca_handle, hca_count, port_count, - node_info.num_port); + node_info.phys_port_cnt); result = -ENOMEM; goto error; @@ -2049,17 +2049,17 @@ port->next = hca->port_list; hca->port_list = port; - result = ib_gid_entry_get(hca->ca, - port->index, - 0, /* index */ - port->gid); + result = ib_query_gid(hca->ca, + port->index, + 0, /* index */ + (union ib_gid *) port->gid); if (0 != result) { TS_TRACE(MOD_LNX_SDP, T_VERBOSE, TRACE_FLOW_FATAL, "INIT: Error <%d> getting GID for HCA <%d:%d> port <%d:%d>", result, hca->ca, hca_count, - port->index, node_info.num_port); + port->index, 
node_info.phys_port_cnt); goto error; } } Index: src/linux-kernel/infiniband/include/ib_verbs.h =================================================================== --- src/linux-kernel/infiniband/include/ib_verbs.h (revision 696) +++ src/linux-kernel/infiniband/include/ib_verbs.h (working copy) @@ -47,6 +47,159 @@ } global; }; +enum ib_device_cap_flags { + IB_DEVICE_RESIZE_MAX_WR = 1, + IB_DEVICE_BAD_PKEY_CNTR = (1<<1), + IB_DEVICE_BAD_QKEY_CNTR = (1<<2), + IB_DEVICE_RAW_MULTI = (1<<3), + IB_DEVICE_AUTO_PATH_MIG = (1<<4), + IB_DEVICE_CHANGE_PHY_PORT = (1<<5), + IB_DEVICE_UD_AV_PORT_ENFORCE = (1<<6), + IB_DEVICE_CURR_QP_STATE_MOD = (1<<7), + IB_DEVICE_SHUTDOWN_PORT = (1<<8), + IB_DEVICE_INIT_TYPE = (1<<9), + IB_DEVICE_PORT_ACTIVE_EVENT = (1<<10), + IB_DEVICE_SYS_IMAGE_GUID = (1<<11), + IB_DEVICE_RC_RNR_NAK_GEN = (1<<12), + IB_DEVICE_SRQ_RESIZE = (1<<13), + IB_DEVICE_N_NOTIFY_CQ = (1<<14), + IB_DEVICE_RQ_SIG_TYPE = (1<<15) +}; + +enum ib_atomic_cap { + IB_ATOMIC_NONE, + IB_ATOMIC_HCA, + IB_ATOMIC_GLOB +}; + +struct ib_device_attr { + u64 fw_ver; + u64 node_guid; + u64 sys_image_guid; + u64 max_mr_size; + u64 page_size_cap; + u32 vendor_id; + u32 vendor_part_id; + u32 hw_ver; + int max_qp; + int max_qp_wr; + int device_cap_flags; + int max_sge; + int max_sge_rd; + int max_cq; + int max_cqe; + int max_mr; + int max_pd; + int phys_port_cnt; + int max_qp_rd_atom; + int max_ee_rd_atom; + int max_res_rd_atom; + int max_qp_init_rd_atom; + int max_ee_init_rd_atom; + enum ib_atomic_cap atomic_cap; + int max_ee; + int max_rdd; + int max_mw; + int max_raw_ipv6_qp; + int max_raw_ethy_qp; + int max_mcast_grp; + int max_mcast_qp_attach; + int max_total_mcast_qp_attach; + int max_ah; + int max_fmr; + int max_map_per_fmr; + int max_srq; + int max_srq_wr; + int max_srq_sge; + u16 max_pkeys; + u8 local_ca_ack_delay; +}; + +enum ib_mtu { + IB_MTU_256 = 1, + IB_MTU_512 = 2, + IB_MTU_1024 = 3, + IB_MTU_2048 = 4, + IB_MTU_4096 = 5 +}; + +enum ib_static_rate { + IB_STATIC_RATE_FULL = 0, + 
+	IB_STATIC_RATE_12X_TO_4X = 2,
+	IB_STATIC_RATE_4X_TO_1X  = 3,
+	IB_STATIC_RATE_12X_TO_1X = 11
+};
+
+enum ib_port_state {
+	IB_PORT_NOP          = 0,
+	IB_PORT_DOWN         = 1,
+	IB_PORT_INIT         = 2,
+	IB_PORT_ARMED        = 3,
+	IB_PORT_ACTIVE       = 4,
+	IB_PORT_ACTIVE_DEFER = 5
+};
+
+enum ib_port_cap_flags {
+	IB_PORT_SM                        = (1<<31),
+	IB_PORT_NOTICE_SUP                = (1<<30),
+	IB_PORT_TRAP_SUP                  = (1<<29),
+	IB_PORT_AUTO_MIGR_SUP             = (1<<27),
+	IB_PORT_SL_MAP_SUP                = (1<<26),
+	IB_PORT_MKEY_NVRAM                = (1<<25),
+	IB_PORT_PKEY_NVRAM                = (1<<24),
+	IB_PORT_LED_INFO_SUP              = (1<<23),
+	IB_PORT_SM_DISABLED               = (1<<22),
+	IB_PORT_SYS_IMAGE_GUID_SUP        = (1<<21),
+	IB_PORT_PKEY_SW_EXT_PORT_TRAP_SUP = (1<<20),
+	IB_PORT_CM_SUP                    = (1<<16),
+	IB_PORT_SNMP_TUNNEL_SUP           = (1<<15),
+	IB_PORT_REINIT_SUP                = (1<<14),
+	IB_PORT_DEVICE_MGMT_SUP           = (1<<13),
+	IB_PORT_VENDOR_CLASS_SUP          = (1<<12),
+	IB_PORT_DR_NOTICE_SUP             = (1<<11),
+	IB_PORT_PORT_NOTICE_SUP           = (1<<10),
+	IB_PORT_BOOT_MGMT_SUP             = (1<<9)
+};
+
+struct ib_port_attr {
+	enum ib_port_state state;
+	enum ib_mtu        max_mtu;
+	enum ib_mtu        active_mtu;
+	int                gid_tbl_len;
+	u32                port_cap_flags;
+	u32                max_msg_sz;
+	u32                bad_pkey_cntr;
+	u32                qkey_viol_cntr;
+	u16                pkey_tbl_len;
+	u16                lid;
+	u16                sm_lid;
+	u8                 lmc;
+	u8                 max_vl_num;
+	u8                 sm_sl;
+	u8                 subnet_timeout;
+	u8                 init_type_reply;
+};
+
+enum ib_device_modify_flags {
+	IB_DEVICE_MODIFY_SYS_IMAGE_GUID = 1
+};
+
+struct ib_device_modify {
+	u64 sys_image_guid;
+};
+
+enum ib_port_modify_flags {
+	IB_PORT_SHUTDOWN        = 1,
+	IB_PORT_INIT_TYPE       = (1<<2),
+	IB_PORT_RESET_QKEY_CNTR = (1<<3)
+};
+
+struct ib_port_modify {
+	u32 set_port_cap_mask;
+	u32 clr_port_cap_mask;
+	u8  init_type;
+};
+
 struct ib_global_route {
 	union ib_gid dgid;
 	u32          flow_label;
@@ -228,14 +381,6 @@
 	IB_QPS_ERR
 };
 
-enum ib_mtu {
-	IB_MTU_256  = 1,
-	IB_MTU_512  = 2,
-	IB_MTU_1024 = 3,
-	IB_MTU_2048 = 4,
-	IB_MTU_4096 = 5
-};
-
 enum ib_mig_state {
 	IB_MIG_MIGRATED,
 	IB_MIG_REARM,
@@ -426,6 +571,8 @@
 	u32 rkey;
 };
 
+#define IB_DEVICE_NAME_MAX 64
+
 struct ib_device {
 	struct module  *owner;
 	struct pci_dev *dma_device;
@@ -438,12 +585,22 @@
 	void *mad;
 	u32   flags;
 
-	ib_device_query_func  device_query;
-	ib_device_modify_func device_modify;
-	ib_port_query_func    port_query;
-	ib_port_modify_func   port_modify;
-	ib_pkey_query_func    pkey_query;
-	ib_gid_query_func     gid_query;
+	int (*query_device)(struct ib_device *device,
+			    struct ib_device_attr *device_attr);
+	int (*query_port)(struct ib_device *device,
+			  u8 port_num,
+			  struct ib_port_attr *port_attr);
+	int (*query_gid)(struct ib_device *device,
+			 u8 port_num, int index,
+			 union ib_gid *gid);
+	int (*query_pkey)(struct ib_device *device,
+			  u8 port_num, u16 index, u16 *pkey);
+	int (*modify_device)(struct ib_device *device,
+			     int device_modify_mask,
+			     struct ib_device_modify *device_modify);
+	int (*modify_port)(struct ib_device *device,
+			   u8 port_num, int port_modify_mask,
+			   struct ib_port_modify *port_modify);
 	struct ib_pd * (*alloc_pd)(struct ib_device *device);
 	int (*dealloc_pd)(struct ib_pd *pd);
 	struct ib_ah * (*create_ah)(struct ib_pd *pd,
@@ -521,6 +678,26 @@
 	struct class_device class_dev;
 };
 
+int ib_query_device(struct ib_device *device,
+		    struct ib_device_attr *device_attr);
+
+int ib_query_port(struct ib_device *device,
+		  u8 port_num, struct ib_port_attr *port_attr);
+
+int ib_query_gid(struct ib_device *device,
+		 u8 port_num, int index, union ib_gid *gid);
+
+int ib_query_pkey(struct ib_device *device,
+		  u8 port_num, u16 index, u16 *pkey);
+
+int ib_modify_device(struct ib_device *device,
+		     int device_modify_mask,
+		     struct ib_device_modify *device_modify);
+
+int ib_modify_port(struct ib_device *device,
+		   u8 port_num, int port_modify_mask,
+		   struct ib_port_modify *port_modify);
+
 struct ib_pd *ib_alloc_pd(struct ib_device *device);
 int ib_dealloc_pd(struct ib_pd *pd);
Index: src/linux-kernel/infiniband/include/ts_ib_core_types.h
===================================================================
--- src/linux-kernel/infiniband/include/ts_ib_core_types.h	(revision 696)
+++ src/linux-kernel/infiniband/include/ts_ib_core_types.h	(working copy)
@@ -50,60 +50,6 @@
 struct ib_device;
 
-enum ib_port_state {
-	IB_PORT_STATE_NOP    = 0,
-	IB_PORT_STATE_DOWN   = 1,
-	IB_PORT_STATE_INIT   = 2,
-	IB_PORT_STATE_ARMED  = 3,
-	IB_PORT_STATE_ACTIVE = 4
-};
-
-enum ib_mad_result {
-	IB_MAD_RESULT_FAILURE  = 0,      // (!SUCCESS is the important flag)
-	IB_MAD_RESULT_SUCCESS  = 1 << 0, // MAD was successfully processed
-	IB_MAD_RESULT_REPLY    = 1 << 1, // Reply packet needs to be sent
-	IB_MAD_RESULT_CONSUMED = 1 << 2  // Packet consumed: stop processing
-};
-
-struct ib_port_properties {
-	int                max_mtu;
-	__u32              max_message_size;
-	__u16              lid;
-	__u8               lmc;
-	enum ib_port_state port_state;
-	int                gid_table_length;
-	int                pkey_table_length;
-	int                max_vl;
-	__u32              bad_pkey_counter;
-	__u32              qkey_violation_counter;
-	__u8               init_type_reply;
-	__u16              sm_lid;
-	tTS_IB_SL          sm_sl;
-	tTS_IB_TIME        subnet_timeout;
-	__u32              capability_mask;
-};
-
-enum ib_port_properties_mask {
-	IB_PORT_SHUTDOWN_PORT                  = 1 << 0,
-	IB_PORT_INIT_TYPE                      = 1 << 1,
-	IB_PORT_QKEY_VIOLATION_COUNTER_RESET   = 1 << 2,
-	IB_PORT_IS_SM                          = 1 << 3,
-	IB_PORT_IS_SNMP_TUNNELING_SUPPORTED    = 1 << 4,
-	IB_PORT_IS_DEVICE_MANAGEMENT_SUPPORTED = 1 << 5,
-	IB_PORT_IS_VENDOR_CLASS_SUPPORTED      = 1 << 6
-};
-
-struct ib_port_changes {
-	enum ib_port_properties_mask valid_fields;
-	__u8 init_type;
-	int  shutdown:1;
-	int  qkey_violation_counter_reset:1;
-	int  is_sm:1;
-	int  is_snmp_tunneling_supported:1;
-	int  is_device_management_supported:1;
-	int  is_vendor_class_supported:1;
-};
-
 enum ib_rate {
 	IB_RATE_2GB5 = 2,
 	IB_RATE_10GB = 3,
@@ -126,67 +72,6 @@
 	__u8 preference;
 };
 
-#define IB_DEVICE_NAME_MAX 64
-
-enum ib_atomic_support {
-	IB_NO_ATOMIC_OPS,
-	IB_ATOMIC_HCA,
-	IB_ATOMIC_ALL
-};
-
-struct ib_device_properties {
-	char        name[IB_DEVICE_NAME_MAX];
-	char       *provider;
-	__u32       vendor_id;
-	__u16       device_id;
-	__u32       hw_rev;
-	__u64       fw_rev;
-	int         max_qp;
-	int         max_wr_per_qp;
-	int         max_wr_per_post;
-	int         max_sg_per_wr;
-	int         max_sg_per_wr_rd;
-	int         max_cq;
-	int         max_mr;
-	__u64       max_mr_size;
-	int         max_pd;
-	int         page_size_cap;
-	int         num_port;
-	int         max_pkey;
-	tTS_IB_TIME local_ca_ack_delay;
-	int         max_responder_per_qp;
-	int         max_responder_per_eec;
-	int         max_responder_per_hca;
-	int         max_initiator_per_qp;
-	int         max_initiator_per_eec;
-	enum ib_atomic_support atomic_support;
-	int         max_eec;
-	int         max_rdd;
-	int         max_mw;
-	int         max_raw_ipv6_qp;
-	int         max_raw_ethertype_qp;
-	int         max_mcg;
-	int         max_mc_qp;
-	int         max_qp_per_mcg;
-	int         max_ah;
-	int         max_fmr;
-	int         max_map_per_fmr;
-	tTS_IB_GUID node_guid;
-	int         is_switch:1;
-	int         ah_port_num_check:1;
-	int         rnr_nak_supported:1;
-	int         port_shutdown_supported:1;
-	int         init_type_supported:1;
-	int         port_active_event_supported:1;
-	int         system_image_guid_supported:1;
-	int         bad_pkey_counter_supported:1;
-	int         qkey_violation_counter_supported;
-	int         modify_wr_num_supported:1;
-	int         raw_multicast_supported:1;
-	int         apm_supported:1;
-	int         qp_port_change_supported:1;
-};
-
 #ifdef __KERNEL__
 
 enum ib_async_event {
@@ -203,7 +88,7 @@
 	IB_LOCAL_EEC_CATASTROPHIC_ERROR,
 	IB_LOCAL_CATASTROPHIC_ERROR,
 	IB_PORT_ERROR,
-	IB_PORT_ACTIVE,
+	IB_EVENT_PORT_ACTIVE,
 	IB_LID_CHANGE,
 	IB_PKEY_CHANGE,
 };
@@ -228,17 +113,6 @@
 
 #define IB_MULTICAST_QPN 0xffffff
 
-enum ib_static_rate {
-	IB_STATIC_RATE_FULL      = 0,
-	IB_STATIC_RATE_4X_TO_1X  = 3,
-	IB_STATIC_RATE_12X_TO_4X = 2,
-	IB_STATIC_RATE_12X_TO_1X = 11
-};
-
-enum ib_device_properties_mask {
-	IB_DEVICE_SYSTEM_IMAGE_GUID = 1 << 0
-};
-
 /* structures */
 
 enum {
@@ -253,11 +127,6 @@
 	struct list_head list;
 };
 
-struct ib_device_changes {
-	enum ib_device_properties_mask valid_fields;
-	tTS_IB_GUID system_image_guid;
-};
-
 struct ib_sm_path {
 	u16       sm_lid;
 	tTS_IB_SL sm_sl;
@@ -268,27 +137,15 @@
 	u8  lmc;
 };
 
-typedef int (*ib_device_query_func)(struct ib_device *device,
-				    struct ib_device_properties *properties);
-typedef int (*ib_device_modify_func)(struct ib_device *device,
-				     struct ib_device_changes *properties);
-typedef int (*ib_port_query_func)(struct ib_device *device,
-				  tTS_IB_PORT port,
-				  struct ib_port_properties *properties);
-typedef int (*ib_port_modify_func)(struct ib_device *device,
-				   tTS_IB_PORT port,
-				   struct ib_port_changes *properties);
-typedef int (*ib_pkey_query_func)(struct ib_device *device,
-				  tTS_IB_PORT port,
-				  int index,
-				  u16 *pkey);
-typedef int (*ib_gid_query_func)(struct ib_device *device,
-				 tTS_IB_PORT port,
-				 int index,
-				 tTS_IB_GID gid);
-
 struct ib_mad;
 
+enum ib_mad_result {
+	IB_MAD_RESULT_FAILURE  = 0,      // (!SUCCESS is the important flag)
+	IB_MAD_RESULT_SUCCESS  = 1 << 0, // MAD was successfully processed
+	IB_MAD_RESULT_REPLY    = 1 << 1, // Reply packet needs to be sent
+	IB_MAD_RESULT_CONSUMED = 1 << 2  // Packet consumed: stop processing
+};
+
 typedef enum ib_mad_result (*ib_mad_process_func)(struct ib_device *device,
 						  int ignore_mkey,
 						  struct ib_mad *in_mad,
Index: src/linux-kernel/infiniband/include/ts_ib_core.h
===================================================================
--- src/linux-kernel/infiniband/include/ts_ib_core.h	(revision 696)
+++ src/linux-kernel/infiniband/include/ts_ib_core.h	(working copy)
@@ -43,26 +43,6 @@
 int ib_device_notifier_register(struct ib_device_notifier *notifier);
 int ib_device_notifier_deregister(struct ib_device_notifier *notifier);
 
-int ib_device_properties_get(struct ib_device *device,
-			     struct ib_device_properties *properties);
-int ib_device_properties_set(struct ib_device *device,
-			     struct ib_device_changes *properties);
-int ib_port_properties_get(struct ib_device *device,
-			   tTS_IB_PORT port,
-			   struct ib_port_properties *properties);
-int ib_port_properties_set(struct ib_device *device,
-			   tTS_IB_PORT port,
-			   struct ib_port_changes *properties);
-
-int ib_pkey_entry_get(struct ib_device *device,
-		      tTS_IB_PORT port,
-		      int index,
-		      u16 *pkey);
-int ib_gid_entry_get(struct ib_device *device,
-		     tTS_IB_PORT port,
-		     int index,
-		     tTS_IB_GID gid);
-
 int ib_async_event_handler_register(struct ib_async_event_record *record,
 				    ib_async_event_handler_func function,
 				    void *arg,
@@ -71,9 +51,9 @@
 int ib_cached_node_guid_get(struct ib_device *device,
 			    tTS_IB_GUID node_guid);
-int ib_cached_port_properties_get(struct ib_device *device,
-				  tTS_IB_PORT port,
-				  struct ib_port_properties *properties);
+int ib_cached_port_properties_get(struct ib_device *device,
+				  tTS_IB_PORT port,
+				  struct ib_port_attr *properties);
 int ib_cached_sm_path_get(struct ib_device *device,
 			  tTS_IB_PORT port,
 			  struct ib_sm_path *sm_path);
Index: src/linux-kernel/infiniband/include/ts_ib_useraccess.h
===================================================================
--- src/linux-kernel/infiniband/include/ts_ib_useraccess.h	(revision 692)
+++ src/linux-kernel/infiniband/include/ts_ib_useraccess.h	(working copy)
@@ -63,12 +63,13 @@
 struct ib_get_port_info_ioctl {
 	tTS_IB_PORT port;
-	struct ib_port_properties port_info;
+	struct ib_port_attr port_info;
 };
 
 struct ib_set_port_info_ioctl {
 	tTS_IB_PORT port;
-	struct ib_port_changes port_info;
+	int port_modify_mask;
+	struct ib_port_modify port_info;
 };
 
 struct ib_mad_process_ioctl {
@@ -82,13 +83,13 @@
 };
 
 struct ib_gid_entry_ioctl {
-	tTS_IB_PORT  port;
-	int          index;
-	tTS_IB_GID   gid_entry;
+	u8           port;
+	int          index;
+	union ib_gid gid_entry;
 };
 
 struct ib_get_dev_info_ioctl {
-	struct ib_device_properties dev_info;
+	struct ib_device_attr dev_info;
 };
 
 /* Old useraccess module used magic 0xbb; we change it here so
Index: src/linux-kernel/infiniband/core/useraccess_main.c
===================================================================
--- src/linux-kernel/infiniband/core/useraccess_main.c	(revision 692)
+++ src/linux-kernel/infiniband/core/useraccess_main.c	(working copy)
@@ -100,35 +100,27 @@
 static int user_close(struct inode *inode, struct file *filp)
 {
 	struct ib_useraccess_private *priv = TS_IB_USER_PRIV_FROM_FILE(filp);
-	struct ib_port_changes prop = { 0 };
+	struct ib_port_modify prop = { 0 };
 	int port;
 
 	for (port = 0; port <= TS_USERACCESS_MAX_PORTS_PER_DEVICE; ++port) {
 		/* Undo any port capability changes from this process */
-		prop.valid_fields = 0;
-		if (priv->port_cap_count[port][IB_PORT_CAP_SM]) {
-			prop.valid_fields |= IB_PORT_IS_SM;
-		}
+		if (priv->port_cap_count[port][IB_PORT_CAP_SM])
+			prop.clr_port_cap_mask |= IB_PORT_SM;
 
-		if (priv->port_cap_count[port][IB_PORT_CAP_SNMP_TUN]) {
-			prop.valid_fields |=
-				IB_PORT_IS_SNMP_TUNNELING_SUPPORTED;
-		}
+		if (priv->port_cap_count[port][IB_PORT_CAP_SNMP_TUN])
+			prop.clr_port_cap_mask |= IB_PORT_SNMP_TUNNEL_SUP;
 
-		if (priv->port_cap_count[port][IB_PORT_CAP_DEV_MGMT]) {
-			prop.valid_fields |=
-				IB_PORT_IS_DEVICE_MANAGEMENT_SUPPORTED;
-		}
+		if (priv->port_cap_count[port][IB_PORT_CAP_DEV_MGMT])
+			prop.clr_port_cap_mask |= IB_PORT_DEVICE_MGMT_SUP;
 
-		if (priv->port_cap_count[port][IB_PORT_CAP_VEND_CLASS]) {
-			prop.valid_fields |=
-				IB_PORT_IS_VENDOR_CLASS_SUPPORTED;
-		}
+		if (priv->port_cap_count[port][IB_PORT_CAP_VEND_CLASS])
+			prop.clr_port_cap_mask |= IB_PORT_VENDOR_CLASS_SUP;
 
-		if (prop.valid_fields) {
-			ib_port_properties_set(priv->device->ib_device, port,
-					       &prop);
+		if (prop.clr_port_cap_mask) {
+			ib_modify_port(priv->device->ib_device, port,
+				       0, &prop);
 		}
 	}
Index: src/linux-kernel/infiniband/core/useraccess_ioctl.c
===================================================================
--- src/linux-kernel/infiniband/core/useraccess_ioctl.c	(revision 692)
+++ src/linux-kernel/infiniband/core/useraccess_ioctl.c	(working copy)
@@ -87,9 +87,9 @@
 		return -EFAULT;
 	}
 
-	ret = ib_port_properties_get(priv->device->ib_device,
-				     get_port_info_ioctl.port,
-				     &get_port_info_ioctl.port_info);
+	ret = ib_query_port(priv->device->ib_device,
+			    get_port_info_ioctl.port,
+			    &get_port_info_ioctl.port_info);
 
 	if (ret) {
 		return -EFAULT;
@@ -118,102 +118,82 @@
 		return -EINVAL;
 	}
 
-	if (set_port_info_ioctl.port_info.valid_fields & IB_PORT_IS_SM) {
-		if (set_port_info_ioctl.port_info.is_sm) {
-			if (priv->port_cap_count[port][IB_PORT_CAP_SM]++) {
-				/* already set, don't set it again */
-				set_port_info_ioctl.port_info.valid_fields &=
-					~IB_PORT_IS_SM;
-			}
-		} else {
-			if (!priv->port_cap_count[port][IB_PORT_CAP_SM]) {
-				/* can't decrement count below 0 */
-				return -EINVAL;
-			} else if (--priv->
-				   port_cap_count[port][IB_PORT_CAP_SM]) {
-				/* still set, don't clear it yet */
-				set_port_info_ioctl.port_info.valid_fields &=
-					~IB_PORT_IS_SM;
-			}
-		}
+	if (set_port_info_ioctl.port_info.set_port_cap_mask &
+	    IB_PORT_SM)
+		if (priv->port_cap_count[port][IB_PORT_CAP_SM]++)
+			/* already set, don't set it again */
+			set_port_info_ioctl.port_info.set_port_cap_mask &=
+				~IB_PORT_SM;
+
+	if (set_port_info_ioctl.port_info.clr_port_cap_mask &
+	    IB_PORT_SM) {
+		if (!priv->port_cap_count[port][IB_PORT_CAP_SM])
+			/* can't decrement count below 0 */
+			return -EINVAL;
+		else if (--priv->port_cap_count[port][IB_PORT_CAP_SM])
+			/* still set, don't clear it yet */
+			set_port_info_ioctl.port_info.clr_port_cap_mask &=
+				~IB_PORT_SM;
 	}
 
-	if (set_port_info_ioctl.port_info.valid_fields &
-	    IB_PORT_IS_SNMP_TUNNELING_SUPPORTED) {
-		if (set_port_info_ioctl.port_info.is_snmp_tunneling_supported) {
-			if (priv->
-			    port_cap_count[port][IB_PORT_CAP_SNMP_TUN]++) {
-				/* already set, don't set it again */
-				set_port_info_ioctl.port_info.valid_fields &=
-					~IB_PORT_IS_SNMP_TUNNELING_SUPPORTED;
-			}
-		} else {
-			if (!priv->
-			    port_cap_count[port][IB_PORT_CAP_SNMP_TUN]) {
-				/* can't decrement count below 0 */
-				return -EINVAL;
-			} else if (--priv->
-				   port_cap_count[port]
-				   [IB_PORT_CAP_SNMP_TUN]) {
-				/* still set, don't clear it yet */
-				set_port_info_ioctl.port_info.valid_fields &=
-					~IB_PORT_IS_SNMP_TUNNELING_SUPPORTED;
-			}
-		}
+	if (set_port_info_ioctl.port_info.set_port_cap_mask &
+	    IB_PORT_SNMP_TUNNEL_SUP)
+		if (priv->port_cap_count[port][IB_PORT_CAP_SNMP_TUN]++)
+			/* already set, don't set it again */
+			set_port_info_ioctl.port_info.set_port_cap_mask &=
+				~IB_PORT_SNMP_TUNNEL_SUP;
+
+	if (set_port_info_ioctl.port_info.clr_port_cap_mask &
+	    IB_PORT_SNMP_TUNNEL_SUP) {
+		if (!priv->port_cap_count[port][IB_PORT_CAP_SNMP_TUN])
+			/* can't decrement count below 0 */
+			return -EINVAL;
+		else if (--priv->port_cap_count[port][IB_PORT_CAP_SNMP_TUN])
+			/* still set, don't clear it yet */
+			set_port_info_ioctl.port_info.clr_port_cap_mask &=
+				~IB_PORT_SNMP_TUNNEL_SUP;
 	}
 
-	if (set_port_info_ioctl.port_info.valid_fields &
-	    IB_PORT_IS_DEVICE_MANAGEMENT_SUPPORTED) {
-		if (set_port_info_ioctl.port_info.
-		    is_device_management_supported) {
-			if (priv->
-			    port_cap_count[port][IB_PORT_CAP_DEV_MGMT]++) {
-				/* already set, don't set it again */
-				set_port_info_ioctl.port_info.valid_fields &=
-					~IB_PORT_IS_DEVICE_MANAGEMENT_SUPPORTED;
-			}
-		} else {
-			if (!priv->
-			    port_cap_count[port][IB_PORT_CAP_DEV_MGMT]) {
-				/* can't decrement count below 0 */
-				return -EINVAL;
-			} else if (--priv->
-				   port_cap_count[port]
-				   [IB_PORT_CAP_DEV_MGMT]) {
-				/* still set, don't clear it yet */
-				set_port_info_ioctl.port_info.valid_fields &=
-					~IB_PORT_IS_DEVICE_MANAGEMENT_SUPPORTED;
-			}
-		}
+	if (set_port_info_ioctl.port_info.set_port_cap_mask &
+	    IB_PORT_DEVICE_MGMT_SUP)
+		if (priv->port_cap_count[port][IB_PORT_CAP_DEV_MGMT]++)
+			/* already set, don't set it again */
+			set_port_info_ioctl.port_info.set_port_cap_mask &=
+				~IB_PORT_DEVICE_MGMT_SUP;
+
+	if (set_port_info_ioctl.port_info.clr_port_cap_mask &
+	    IB_PORT_DEVICE_MGMT_SUP) {
+		if (!priv->port_cap_count[port][IB_PORT_CAP_DEV_MGMT])
+			/* can't decrement count below 0 */
+			return -EINVAL;
+		else if (--priv->port_cap_count[port][IB_PORT_CAP_DEV_MGMT])
+			/* still set, don't clear it yet */
+			set_port_info_ioctl.port_info.clr_port_cap_mask &=
+				~IB_PORT_DEVICE_MGMT_SUP;
 	}
 
-	if (set_port_info_ioctl.port_info.valid_fields &
-	    IB_PORT_IS_VENDOR_CLASS_SUPPORTED) {
-		if (set_port_info_ioctl.port_info.is_vendor_class_supported) {
-			if (priv->
-			    port_cap_count[port][IB_PORT_CAP_VEND_CLASS]++) {
-				/* already set, don't set it again */
-				set_port_info_ioctl.port_info.valid_fields &=
-					~IB_PORT_IS_VENDOR_CLASS_SUPPORTED;
-			}
-		} else {
-			if (!priv->
-			    port_cap_count[port][IB_PORT_CAP_VEND_CLASS]) {
-				/* can't decrement count below 0 */
-				return -EINVAL;
-			} else if (--priv->
-				   port_cap_count[port]
-				   [IB_PORT_CAP_VEND_CLASS]) {
-				/* still set, don't clear it yet */
-				set_port_info_ioctl.port_info.valid_fields &=
-					~IB_PORT_IS_VENDOR_CLASS_SUPPORTED;
-			}
-		}
+	if (set_port_info_ioctl.port_info.set_port_cap_mask &
+	    IB_PORT_VENDOR_CLASS_SUP)
+		if (priv->port_cap_count[port][IB_PORT_CAP_VEND_CLASS]++)
+			/* already set, don't set it again */
+			set_port_info_ioctl.port_info.set_port_cap_mask &=
+				~IB_PORT_VENDOR_CLASS_SUP;
+
+	if (set_port_info_ioctl.port_info.clr_port_cap_mask &
+	    IB_PORT_VENDOR_CLASS_SUP) {
+		if (!priv->port_cap_count[port][IB_PORT_CAP_VEND_CLASS])
+			/* can't decrement count below 0 */
+			return -EINVAL;
+		else if (--priv->port_cap_count[port][IB_PORT_CAP_VEND_CLASS])
+			/* still set, don't clear it yet */
+			set_port_info_ioctl.port_info.clr_port_cap_mask &=
+				~IB_PORT_VENDOR_CLASS_SUP;
 	}
 
-	return ib_port_properties_set(priv->device->ib_device,
-				      set_port_info_ioctl.port,
-				      &set_port_info_ioctl.port_info);
+	return ib_modify_port(priv->device->ib_device,
+			      set_port_info_ioctl.port,
+			      set_port_info_ioctl.port_modify_mask,
+			      &set_port_info_ioctl.port_info);
 }
 
 static int
@@ -251,8 +231,8 @@
 		return -EFAULT;
 	}
 
-	ret = ib_gid_entry_get(priv->device->ib_device, gid_ioctl.port,
-			       gid_ioctl.index, gid_ioctl.gid_entry);
+	ret = ib_query_gid(priv->device->ib_device, gid_ioctl.port,
+			   gid_ioctl.index, &gid_ioctl.gid_entry);
 
 	if (ret) {
 		return ret;
@@ -328,8 +308,8 @@
 	if (!get_dev_info_ioctl)
 		return -ENOMEM;
 
-	ret = ib_device_properties_get(priv->device->ib_device,
-				       &get_dev_info_ioctl->dev_info);
+	ret = ib_query_device(priv->device->ib_device,
+			      &get_dev_info_ioctl->dev_info);
 
 	if (ret)
 		goto out;
Index: src/linux-kernel/infiniband/core/core_proc.c
===================================================================
--- src/linux-kernel/infiniband/core/core_proc.c	(revision 692)
+++ src/linux-kernel/infiniband/core/core_proc.c	(working copy)
@@ -97,25 +97,25 @@
 static int
 ib_dev_info_seq_show(struct seq_file *file, void *iter_ptr)
 {
-	struct ib_device_properties prop;
-	struct ib_device *proc_device = file->private;
+	struct ib_device_attr prop;
+	struct ib_device *proc_device = file->private;
 
 	seq_printf(file, "name: %s\n", proc_device->name);
 	seq_printf(file, "provider: %s\n", proc_device->provider);
 
-	if (proc_device->device_query(proc_device, &prop))
+	if (proc_device->query_device(proc_device, &prop))
 		return 0;
 
 	seq_printf(file, "node GUID: %04x:%04x:%04x:%04x\n",
-		   be16_to_cpu(((u16 *) prop.node_guid)[0]),
-		   be16_to_cpu(((u16 *) prop.node_guid)[1]),
-		   be16_to_cpu(((u16 *) prop.node_guid)[2]),
-		   be16_to_cpu(((u16 *) prop.node_guid)[3]));
-	seq_printf(file, "ports: %d\n", prop.num_port);
+		   be16_to_cpu(((u16 *) &prop.node_guid)[0]),
+		   be16_to_cpu(((u16 *) &prop.node_guid)[1]),
+		   be16_to_cpu(((u16 *) &prop.node_guid)[2]),
+		   be16_to_cpu(((u16 *) &prop.node_guid)[3]));
+	seq_printf(file, "ports: %d\n", prop.phys_port_cnt);
 	seq_printf(file, "vendor ID: 0x%x\n", prop.vendor_id);
-	seq_printf(file, "device ID: 0x%x\n", prop.device_id);
-	seq_printf(file, "HW revision: 0x%x\n", prop.hw_rev);
-	seq_printf(file, "FW revision: 0x%" TS_U64_FMT "x\n", prop.fw_rev);
+	seq_printf(file, "device ID: 0x%x\n", prop.vendor_part_id);
+	seq_printf(file, "HW revision: 0x%x\n", prop.hw_ver);
+	seq_printf(file, "FW revision: 0x%" TS_U64_FMT "x\n", prop.fw_ver);
 
 	return 0;
 }
@@ -167,28 +167,29 @@
 static int ib_port_info_seq_show(struct seq_file *file, void *iter_ptr)
 {
-	struct ib_port_properties prop;
+	struct ib_port_attr prop;
 	struct ib_port_proc *proc_port = file->private;
 
-	if (proc_port->device->port_query(proc_port->device, proc_port->port_num, &prop)) {
+	if (proc_port->device->query_port(proc_port->device,
+					  proc_port->port_num, &prop)) {
 		return 0;
 	}
 
 	seq_printf(file, "state: ");
-	switch (prop.port_state) {
-	case IB_PORT_STATE_NOP:
+	switch (prop.state) {
+	case IB_PORT_NOP:
 		seq_printf(file, "NOP\n");
 		break;
-	case IB_PORT_STATE_DOWN:
+	case IB_PORT_DOWN:
 		seq_printf(file, "DOWN\n");
 		break;
-	case IB_PORT_STATE_INIT:
+	case IB_PORT_INIT:
 		seq_printf(file, "INITIALIZE\n");
 		break;
-	case IB_PORT_STATE_ARMED:
+	case IB_PORT_ARMED:
 		seq_printf(file, "ARMED\n");
 		break;
-	case IB_PORT_STATE_ACTIVE:
+	case IB_PORT_ACTIVE:
 		seq_printf(file, "ACTIVE\n");
 		break;
 	default:
@@ -201,7 +202,7 @@
 	seq_printf(file, "SM LID: 0x%04x\n", prop.sm_lid);
 	seq_printf(file, "SM SL: 0x%04x\n", prop.sm_sl);
 	seq_printf(file, "Capabilities: ");
-	if (prop.capability_mask) {
+	if (prop.port_cap_flags) {
 		static const char *cap_name[] = {
 			[1] = "IsSM",
 			[2] = "IsNoticeSupported",
@@ -227,7 +228,7 @@
 		int f = 0;
 
 		for (i = 0; i < ARRAY_SIZE(cap_name); ++i) {
-			if (prop.capability_mask & (1 << i)) {
+			if (prop.port_cap_flags & (1 << i)) {
 				if (f++) {
 					seq_puts(file, " ");
 				}
Index: src/linux-kernel/infiniband/core/core_cache.c
===================================================================
--- src/linux-kernel/infiniband/core/core_cache.c	(revision 696)
+++ src/linux-kernel/infiniband/core/core_cache.c	(working copy)
@@ -46,7 +46,7 @@
 int ib_cached_port_properties_get(struct ib_device *device,
 				  tTS_IB_PORT port,
-				  struct ib_port_properties *properties)
+				  struct ib_port_attr *properties)
 {
 	struct ib_device_private *priv;
 	unsigned int seq;
@@ -60,7 +60,7 @@
 		seq = read_seqcount_begin(&priv->port_data[port].lock);
 		memcpy(properties, &priv->port_data[port].properties,
-		       sizeof (struct ib_port_properties));
+		       sizeof (struct ib_port_attr));
 	} while (read_seqcount_retry(&priv->port_data[port].lock, seq));
 
 	return 0;
@@ -126,13 +126,13 @@
 	if (port < priv->start_port || port > priv->end_port)
 		return -EINVAL;
 
-	if (index < 0 || index >= priv->port_data[port].properties.gid_table_length)
+	if (index < 0 || index >= priv->port_data[port].properties.gid_tbl_len)
 		return -EINVAL;
 
 	do {
 		seq = read_seqcount_begin(&priv->port_data[port].lock);
 		memcpy(gid,
-		       priv->port_data[port].gid_table[index],
+		       priv->port_data[port].gid_table[index].raw,
 		       sizeof (tTS_IB_GID));
 	} while (read_seqcount_retry(&priv->port_data[port].lock, seq));
@@ -161,10 +161,10 @@
 			seq = read_seqcount_begin(&priv->port_data[p].lock);
 			f = 0;
 			for (j = 0;
-			     j < priv->port_data[p].properties.gid_table_length;
+			     j < priv->port_data[p].properties.gid_tbl_len;
 			     ++j) {
 				if (!memcmp(gid,
-					    priv->port_data[p].gid_table[j],
+					    priv->port_data[p].gid_table[j].raw,
 					    sizeof (tTS_IB_GID))) {
 					f = 1;
 					break;
@@ -212,7 +212,7 @@
 	if (port < priv->start_port || port > priv->end_port)
 		return -EINVAL;
 
-	if (index < 0 || index >= priv->port_data[port].properties.pkey_table_length)
+	if (index < 0 || index >= priv->port_data[port].properties.pkey_tbl_len)
 		return -EINVAL;
 
 	do {
@@ -242,7 +242,7 @@
 	do {
 		seq = read_seqcount_begin(&priv->port_data[port].lock);
 		found = -1;
-		for (i = 0; i < priv->port_data[port].properties.pkey_table_length; ++i) {
+		for (i = 0; i < priv->port_data[port].properties.pkey_tbl_len; ++i) {
 			if ((priv->port_data[port].pkey_table[i] & 0x7fff) ==
 			    (pkey & 0x7fff)) {
 				found = i;
@@ -263,7 +263,7 @@
 int ib_cache_setup(struct ib_device *device)
 {
 	struct ib_device_private *priv = device->core;
-	struct ib_port_properties prop;
+	struct ib_port_attr prop;
 	int p;
 	int ret;
 
@@ -274,23 +274,23 @@
 	for (p = priv->start_port; p <= priv->end_port; ++p) {
 		seqcount_init(&priv->port_data[p].lock);
-		ret = device->port_query(device, p, &prop);
+		ret = device->query_port(device, p, &prop);
 		if (ret) {
 			TS_REPORT_WARN(MOD_KERNEL_IB,
-				       "port_query failed for %s",
+				       "query_port failed for %s",
 				       device->name);
 			goto error;
 		}
 
-		priv->port_data[p].gid_table_alloc_length = prop.gid_table_length;
-		priv->port_data[p].gid_table = kmalloc(prop.gid_table_length * sizeof (tTS_IB_GID),
+		priv->port_data[p].gid_table_alloc_length = prop.gid_tbl_len;
+		priv->port_data[p].gid_table = kmalloc(prop.gid_tbl_len * sizeof (tTS_IB_GID),
						       GFP_KERNEL);
 		if (!priv->port_data[p].gid_table) {
 			ret = -ENOMEM;
 			goto error;
 		}
 
-		priv->port_data[p].pkey_table_alloc_length = prop.pkey_table_length;
-		priv->port_data[p].pkey_table = kmalloc(prop.pkey_table_length * sizeof (u16),
+		priv->port_data[p].pkey_table_alloc_length = prop.pkey_tbl_len;
+		priv->port_data[p].pkey_table = kmalloc(prop.pkey_tbl_len * sizeof (u16),
							GFP_KERNEL);
 		if (!priv->port_data[p].pkey_table) {
 			ret = -ENOMEM;
@@ -327,8 +327,8 @@
 {
 	struct ib_device_private *priv = device->core;
 	struct ib_port_data *info = &priv->port_data[port];
-	struct ib_port_properties *tprops = NULL;
-	tTS_IB_GID *tgid = NULL;
+	struct ib_port_attr *tprops = NULL;
+	union ib_gid *tgid = NULL;
 	u16 *tpkey = NULL;
 	int i;
 	int ret;
@@ -341,43 +341,42 @@
 	if (!tprops)
 		goto out;
 
-	ret = device->port_query(device, port, tprops);
+	ret = device->query_port(device, port, tprops);
 	if (ret) {
 		TS_REPORT_WARN(MOD_KERNEL_IB,
-			       "port_query failed (%d) for %s",
+			       "query_port failed (%d) for %s",
 			       ret, device->name);
 		goto out;
 	}
 
-	tprops->gid_table_length = min(tprops->gid_table_length,
-				       info->gid_table_alloc_length);
-	tgid = kmalloc(tprops->gid_table_length * sizeof (tTS_IB_GID),
-		       GFP_KERNEL);
+	tprops->gid_tbl_len = min(tprops->gid_tbl_len,
+				  info->gid_table_alloc_length);
+	tgid = kmalloc(tprops->gid_tbl_len * sizeof *tgid, GFP_KERNEL);
 	if (!tgid)
 		goto out;
 
-	for (i = 0; i < tprops->gid_table_length; ++i) {
-		ret = device->gid_query(device, port, i, tgid[i]);
+	for (i = 0; i < tprops->gid_tbl_len; ++i) {
+		ret = device->query_gid(device, port, i, tgid + i);
 		if (ret) {
 			TS_REPORT_WARN(MOD_KERNEL_IB,
-				       "gid_query failed (%d) for %s (index %d)",
+				       "query_gid failed (%d) for %s (index %d)",
				       ret, device->name, i);
 			goto out;
 		}
 	}
 
-	tprops->pkey_table_length = min(tprops->pkey_table_length,
-					info->pkey_table_alloc_length);
-	tpkey = kmalloc(tprops->pkey_table_length * sizeof (u16),
+	tprops->pkey_tbl_len = min(tprops->pkey_tbl_len,
+				   info->pkey_table_alloc_length);
+	tpkey = kmalloc(tprops->pkey_tbl_len * sizeof (u16),
			GFP_KERNEL);
 	if (!tpkey)
 		goto out;
 
-	for (i = 0; i < tprops->pkey_table_length; ++i) {
-		ret = device->pkey_query(device, port, i, &tpkey[i]);
+	for (i = 0; i < tprops->pkey_tbl_len; ++i) {
+		ret = device->query_pkey(device, port, i, &tpkey[i]);
 		if (ret) {
 			TS_REPORT_WARN(MOD_KERNEL_IB,
-				       "pkey_query failed (%d) for %s, port %d, index %d",
+				       "query_pkey failed (%d) for %s, port %d, index %d",
				       ret, device->name, port, i);
 			goto out;
 		}
@@ -394,9 +393,9 @@
 	info->port_lid.lmc = info->properties.lmc;
 
 	memcpy(info->gid_table, tgid,
-	       tprops->gid_table_length * sizeof(tTS_IB_GID));
+	       tprops->gid_tbl_len * sizeof *tgid);
 	memcpy(info->pkey_table, tpkey,
-	       tprops->pkey_table_length * sizeof (u16));
+	       tprops->pkey_tbl_len * sizeof *tpkey);
 
 	write_seqcount_end(&info->lock);
Index: src/linux-kernel/infiniband/core/generate_pkt_access.pl
===================================================================
--- src/linux-kernel/infiniband/core/generate_pkt_access.pl	(revision 692)
+++ src/linux-kernel/infiniband/core/generate_pkt_access.pl	(working copy)
@@ -142,7 +142,7 @@
 #ifndef _TS_${class_type}_TYPES_H
 #define _TS_${class_type}_TYPES_H
 
-#include "ts_ib_core_types.h"
+#include
 #ifdef __KERNEL__
 #include
 #endif
Index: src/linux-kernel/infiniband/core/core_priv.h
===================================================================
--- src/linux-kernel/infiniband/core/core_priv.h	(revision 696)
+++ src/linux-kernel/infiniband/core/core_priv.h	(working copy)
@@ -61,12 +61,12 @@
 	int port_cap_count[IB_PORT_CAP_NUM];
 
 	seqcount_t lock;
-	struct ib_port_properties properties;
+	struct ib_port_attr properties;
 	struct ib_sm_path sm_path;
 	struct ib_port_lid port_lid;
 
 	int gid_table_alloc_length;
-	int pkey_table_alloc_length;
-	tTS_IB_GID *gid_table;
+	u16 pkey_table_alloc_length;
+	union ib_gid *gid_table;
 	u16 *pkey_table;
 };
Index: src/linux-kernel/infiniband/core/core_async.c
===================================================================
--- src/linux-kernel/infiniband/core/core_async.c	(revision 696)
+++ src/linux-kernel/infiniband/core/core_async.c	(working copy)
@@ -153,7 +153,6 @@
 {
 	struct ib_async_event_list *event;
 	struct ib_device_private *priv = event_record->device->core;
-	unsigned long flags = 0; /* initialize to shut up gcc */
 
 	switch (event_table[event_record->event].mod) {
 	default:
Index: src/linux-kernel/infiniband/core/mad_main.c
===================================================================
--- src/linux-kernel/infiniband/core/mad_main.c	(revision 692)
+++ src/linux-kernel/infiniband/core/mad_main.c	(working copy)
@@ -159,36 +159,37 @@
 static int ib_mad_init_one(struct ib_device *device)
 {
-	struct ib_mad_private *priv;
-	struct ib_device_properties prop;
-	int ret;
+	struct ib_mad_private *priv;
+	struct ib_device_attr prop;
+	int ret;
 
-	ret = ib_device_properties_get(device, &prop);
+	ret = ib_query_device(device, &prop);
 	if (ret)
 		return ret;
 
 	TS_TRACE(MOD_KERNEL_IB, T_VERY_VERBOSE, TRACE_KERNEL_IB_GEN,
-		 "Setting up device %s (%s), %d ports",
-		 prop.name, prop.provider, prop.num_port);
+		 "Setting up device %s, %d ports",
+		 device->name, prop.phys_port_cnt);
 
 	priv = kmalloc(sizeof *priv, GFP_KERNEL);
 	if (!priv) {
 		TS_REPORT_WARN(MOD_KERNEL_IB,
			       "Couldn't allocate private structure for %s",
-			       prop.name);
+			       device->name);
 		return -ENOMEM;
 	}
 
 	device->mad = priv;
 
 	priv->ib_dev = device;
-	priv->num_port = (prop.is_switch) ? 1 : prop.num_port;
+	priv->num_port = device->flags & IB_DEVICE_IS_SWITCH ?
+		1 : prop.phys_port_cnt;
 
 	priv->pd = ib_alloc_pd(device);
 	if (IS_ERR(priv->pd)) {
 		TS_REPORT_FATAL(MOD_KERNEL_IB,
				"Failed to allocate PD for %s",
-				prop.name);
+				device->name);
 		goto error;
 	}
@@ -201,7 +202,7 @@
 		if (IS_ERR(priv->cq)) {
 			TS_REPORT_FATAL(MOD_KERNEL_IB,
					"Failed to allocate CQ for %s",
-					prop.name);
+					device->name);
 			goto error_free_pd;
 		}
 	}
@@ -215,7 +216,7 @@
 	if (ib_mad_register_memory(priv->pd, &priv->mr, &priv->lkey)) {
 		TS_REPORT_FATAL(MOD_KERNEL_IB,
				"Failed to allocate MR for %s",
-				prop.name);
+				device->name);
 		goto error_free_cq;
 	}
 
@@ -226,7 +227,7 @@
 	if (ret) {
 		TS_REPORT_WARN(MOD_KERNEL_IB,
			       "Couldn't start completion thread for %s",
-			       prop.name);
+			       device->name);
 		goto error_free_mr;
 	}
 
@@ -234,11 +235,11 @@
 		int start_port, end_port;
 		int p, q, i;
 
-		if (prop.is_switch) {
+		if (device->flags & IB_DEVICE_IS_SWITCH) {
 			start_port = end_port = 0;
 		} else {
 			start_port = 1;
-			end_port = prop.num_port;
+			end_port = prop.phys_port_cnt;
 		}
 
 		for (p = 0; p <= IB_MAD_MAX_PORTS_PER_DEVICE; ++p) {
Index: src/linux-kernel/infiniband/core/core_device.c
===================================================================
--- src/linux-kernel/infiniband/core/core_device.c	(revision 696)
+++ src/linux-kernel/infiniband/core/core_device.c	(working copy)
@@ -44,10 +44,10 @@
 	size_t offset;
 	char *name;
 } mandatory_table[] = {
-	IB_MANDATORY_FUNC(device_query),
-	IB_MANDATORY_FUNC(port_query),
-	IB_MANDATORY_FUNC(pkey_query),
-	IB_MANDATORY_FUNC(gid_query),
+	IB_MANDATORY_FUNC(query_device),
+	IB_MANDATORY_FUNC(query_port),
+	IB_MANDATORY_FUNC(query_pkey),
+	IB_MANDATORY_FUNC(query_gid),
 	IB_MANDATORY_FUNC(alloc_pd),
 	IB_MANDATORY_FUNC(dealloc_pd),
 	IB_MANDATORY_FUNC(create_ah),
@@ -125,7 +125,7 @@
 int ib_device_register(struct ib_device *device)
 {
 	struct ib_device_private *priv;
-	struct ib_device_properties prop;
+	struct ib_device_attr prop;
 	int ret;
 	int p;
 
@@ -152,21 +152,21 @@
 	*priv = (struct ib_device_private) { 0 };
 
-	ret = device->device_query(device, &prop);
+	ret = device->query_device(device, &prop);
 	if (ret) {
 		TS_REPORT_WARN(MOD_KERNEL_IB,
-			       "device_query failed for %s",
+			       "query_device failed for %s",
			       device->name);
 		goto out_free;
 	}
 
-	memcpy(priv->node_guid, prop.node_guid, sizeof (tTS_IB_GUID));
+	memcpy(priv->node_guid, &prop.node_guid, sizeof (tTS_IB_GUID));
 
-	if (prop.is_switch) {
+	if (device->flags & IB_DEVICE_IS_SWITCH) {
 		priv->start_port = priv->end_port = 0;
 	} else {
 		priv->start_port = 1;
-		priv->end_port = prop.num_port;
+		priv->end_port = prop.phys_port_cnt;
 	}
 
 	priv->port_data = kmalloc((priv->end_port + 1) * sizeof (struct ib_port_data),
@@ -207,7 +207,7 @@
 		goto out_free_cache;
 	}
 
-	ret = ib_proc_setup(device, prop.is_switch);
+	ret = ib_proc_setup(device, !!(device->flags & IB_DEVICE_IS_SWITCH));
 	if (ret) {
 		TS_REPORT_WARN(MOD_KERNEL_IB,
			       "Couldn't create /proc dir for %s",
@@ -344,116 +344,52 @@
 }
 EXPORT_SYMBOL(ib_device_notifier_deregister);
 
-int ib_device_properties_get(struct ib_device *device,
-			     struct ib_device_properties *properties)
+int ib_query_device(struct ib_device *device,
+		    struct ib_device_attr *device_attr)
 {
-	return device->device_query ? device->device_query(device, properties) : -ENOSYS;
+	return device->query_device(device, device_attr);
 }
-EXPORT_SYMBOL(ib_device_properties_get);
+EXPORT_SYMBOL(ib_query_device);
 
-int ib_device_properties_set(struct ib_device *device,
-			     struct ib_device_changes *properties)
+int ib_query_port(struct ib_device *device,
+		  u8 port_num,
+		  struct ib_port_attr *port_attr)
 {
-	return device->device_modify ? device->device_modify(device, properties) : -ENOSYS;
+	return device->query_port(device, port_num, port_attr);
 }
+EXPORT_SYMBOL(ib_query_port);
 
-int ib_port_properties_get(struct ib_device *device,
-			   tTS_IB_PORT port,
-			   struct ib_port_properties *properties)
+int ib_query_gid(struct ib_device *device,
+		 u8 port_num, int index, union ib_gid *gid)
 {
-	return device->port_query ? device->port_query(device, port, properties) : -ENOSYS;
+	return device->query_gid(device, port_num, index, gid);
 }
-EXPORT_SYMBOL(ib_port_properties_get);
+EXPORT_SYMBOL(ib_query_gid);
 
-int ib_port_properties_set(struct ib_device *device,
-			   tTS_IB_PORT port,
-			   struct ib_port_changes *properties)
+int ib_query_pkey(struct ib_device *device,
+		  u8 port_num, u16 index, u16 *pkey)
 {
-	struct ib_device_private *priv;
-	struct ib_port_changes prop_set;
-	unsigned long flags;
-
-	priv = device->core;
-
-	if (port < priv->start_port || port > priv->end_port) {
-		return -EINVAL;
-	}
-
-	prop_set = *properties;
-
-	spin_lock_irqsave(&priv->port_data[port].port_cap_lock, flags);
-
-	if (properties->valid_fields & IB_PORT_IS_SM) {
-		priv->port_data[port].port_cap_count[IB_PORT_CAP_SM] +=
-			2 * !!properties->is_sm - 1;
-		if (priv->port_data[port].port_cap_count[IB_PORT_CAP_SM] < 0) {
-			TS_REPORT_WARN(MOD_KERNEL_IB,
-				       "'is SM' cap count decremented below 0");
-			priv->port_data[port].port_cap_count[IB_PORT_CAP_SM] = 0;
-		}
-		prop_set.is_sm =
-			!!priv->port_data[port].port_cap_count[IB_PORT_CAP_SM];
-	}
-
-	if (properties->valid_fields & IB_PORT_IS_SNMP_TUNNELING_SUPPORTED) {
-		priv->port_data[port].port_cap_count[IB_PORT_CAP_SNMP_TUN] +=
-			2 * !!properties->is_snmp_tunneling_supported - 1;
-		if (priv->port_data[port].port_cap_count[IB_PORT_CAP_SNMP_TUN] < 0) {
-			TS_REPORT_WARN(MOD_KERNEL_IB,
-				       "'is SNMP tunneling supported' cap count decremented below 0");
-			priv->port_data[port].port_cap_count[IB_PORT_CAP_SNMP_TUN] = 0;
-		}
-		prop_set.is_snmp_tunneling_supported =
-			!!priv->port_data[port].port_cap_count[IB_PORT_CAP_SNMP_TUN];
-	}
-
-	if (properties->valid_fields & IB_PORT_IS_DEVICE_MANAGEMENT_SUPPORTED) {
-		priv->port_data[port].port_cap_count[IB_PORT_CAP_DEV_MGMT] +=
-			2 * !!properties->is_device_management_supported - 1;
-		if (priv->port_data[port].port_cap_count[IB_PORT_CAP_DEV_MGMT] < 0) {
-			TS_REPORT_WARN(MOD_KERNEL_IB,
-				       "'is device management supported' cap count decremented below 0");
-			priv->port_data[port].port_cap_count[IB_PORT_CAP_DEV_MGMT] = 0;
-		}
-		prop_set.is_device_management_supported =
-			!!priv->port_data[port].port_cap_count[IB_PORT_CAP_DEV_MGMT];
-	}
-
-	if (properties->valid_fields & IB_PORT_IS_VENDOR_CLASS_SUPPORTED) {
-		priv->port_data[port].port_cap_count[IB_PORT_CAP_VEND_CLASS] +=
-			2 * !!properties->is_vendor_class_supported - 1;
-		if (priv->port_data[port].port_cap_count[IB_PORT_CAP_VEND_CLASS] < 0) {
-			TS_REPORT_WARN(MOD_KERNEL_IB,
-				       "'is vendor class supported' cap count decremented below 0");
-			priv->port_data[port].port_cap_count[IB_PORT_CAP_VEND_CLASS] = 0;
-		}
-		prop_set.is_vendor_class_supported =
-			!!priv->port_data[port].port_cap_count[IB_PORT_CAP_VEND_CLASS];
-	}
-
-	spin_unlock_irqrestore(&priv->port_data[port].port_cap_lock, flags);
-
-	return device->port_modify ? device->port_modify(device, port, &prop_set) : -ENOSYS;
+	return device->query_pkey(device, port_num, index, pkey);
 }
-EXPORT_SYMBOL(ib_port_properties_set);
+EXPORT_SYMBOL(ib_query_pkey);
 
-int ib_pkey_entry_get(struct ib_device *device,
-		      tTS_IB_PORT port,
-		      int index,
-		      u16 *pkey)
+int ib_modify_device(struct ib_device *device,
		     int device_modify_mask,
+		     struct ib_device_modify *device_modify)
 {
-	return device->pkey_query ? device->pkey_query(device, port, index, pkey) : -ENOSYS;
+	return device->modify_device(device, device_modify_mask,
				     device_modify);
 }
-EXPORT_SYMBOL(ib_pkey_entry_get);
+EXPORT_SYMBOL(ib_modify_device);
 
-int ib_gid_entry_get(struct ib_device *device,
-		     tTS_IB_PORT port,
-		     int index,
-		     tTS_IB_GID gid)
+int ib_modify_port(struct ib_device *device,
		   u8 port_num, int port_modify_mask,
+		   struct ib_port_modify *port_modify)
 {
-	return device->gid_query ? device->gid_query(device, port, index, gid) : -ENOSYS;
+	return device->modify_port(device, port_num, port_modify_mask,
				   port_modify);
 }
-EXPORT_SYMBOL(ib_gid_entry_get);
+EXPORT_SYMBOL(ib_modify_port);
 
 /* Local Variables:
Index: src/linux-kernel/infiniband/hw/mthca/mthca_provider.c
===================================================================
--- src/linux-kernel/infiniband/hw/mthca/mthca_provider.c	(revision 692)
+++ src/linux-kernel/infiniband/hw/mthca/mthca_provider.c	(working copy)
@@ -34,8 +34,8 @@
 	IB_SMP_ATTRIB_PKEY_TABLE = 0x0016
 };
 
-static int mthca_device_query(struct ib_device *ibdev,
-			      struct ib_device_properties *props)
+static int mthca_query_device(struct ib_device *ibdev,
+			      struct ib_device_attr *props)
 {
 	struct ib_mad *in_mad = NULL;
 	struct ib_mad *out_mad = NULL;
@@ -47,11 +47,8 @@
 	if (!in_mad || !out_mad)
 		goto out;
 
-	props->num_port = to_mdev(ibdev)->limits.num_ports;
-	strlcpy(props->name, ibdev->name, IB_DEVICE_NAME_MAX);
-	props->provider = ibdev->provider;
-	props->fw_rev = to_mdev(ibdev)->fw_ver;
-	props->is_switch = 0;
+	props->phys_port_cnt = to_mdev(ibdev)->limits.num_ports;
+	props->fw_ver = to_mdev(ibdev)->fw_ver;
 
 	memset(in_mad, 0, sizeof *in_mad);
 	in_mad->format_version = 1;
@@ -70,11 +67,11 @@
 		goto out;
 	}
 
-	props->vendor_id = be32_to_cpup((u32 *) (out_mad->payload + 76)) &
+	props->vendor_id = be32_to_cpup((u32 *) (out_mad->payload + 76)) &
 		0xffffff;
-	props->device_id = be16_to_cpup((u16 *) (out_mad->payload + 70));
-	props->hw_rev = be16_to_cpup((u16 *) (out_mad->payload + 72));
-	memcpy(props->node_guid, out_mad->payload + 52, 8);
+	props->vendor_part_id = be16_to_cpup((u16 *) (out_mad->payload + 70));
+	props->hw_ver = be16_to_cpup((u16 *) (out_mad->payload + 72));
+	memcpy(&props->node_guid, out_mad->payload + 52, 8);
 
 	err = 0;
 out:
@@ -83,9 +80,8 @@
 	return err;
 }
 
-static int mthca_port_query(struct ib_device *ibdev,
-			    u8 port,
-			    struct ib_port_properties *props)
+static int mthca_query_port(struct ib_device *ibdev,
+			    u8
port, struct ib_port_attr *props) { struct ib_mad *in_mad = NULL; struct ib_mad *out_mad = NULL; @@ -119,11 +115,11 @@ props->lmc = (*(u8 *) (out_mad->payload + 74)) & 0x7; props->sm_lid = be16_to_cpup((u16 *) (out_mad->payload + 58)); props->sm_sl = (*(u8 *) (out_mad->payload + 76)) & 0xf; - props->port_state = (*(u8 *) (out_mad->payload + 72)) & 0xf; - props->capability_mask = be32_to_cpup((u32 *) (out_mad->payload + 60)); - props->gid_table_length = to_mdev(ibdev)->limits.gid_table_len; - props->pkey_table_length = to_mdev(ibdev)->limits.pkey_table_len; - props->qkey_violation_counter = be16_to_cpup((u16 *) (out_mad->payload + 88)); + props->state = (*(u8 *) (out_mad->payload + 72)) & 0xf; + props->port_cap_flags = be32_to_cpup((u32 *) (out_mad->payload + 60)); + props->gid_tbl_len = to_mdev(ibdev)->limits.gid_table_len; + props->pkey_tbl_len = to_mdev(ibdev)->limits.pkey_table_len; + props->qkey_viol_cntr = be16_to_cpup((u16 *) (out_mad->payload + 88)); out: kfree(in_mad); @@ -131,17 +127,15 @@ return err; } -static int mthca_port_modify(struct ib_device *ibdev, - tTS_IB_PORT port, - struct ib_port_changes *props) +static int mthca_modify_port(struct ib_device *ibdev, + u8 port, int port_modify_mask, + struct ib_port_modify *props) { return 0; } -static int mthca_pkey_query(struct ib_device *ibdev, - u8 port, - int index, - u16 *pkey) +static int mthca_query_pkey(struct ib_device *ibdev, + u8 port, int index, u16 *pkey) { struct ib_mad *in_mad = NULL; struct ib_mad *out_mad = NULL; @@ -179,10 +173,8 @@ return err; } -static int mthca_gid_query(struct ib_device *ibdev, - u8 port, - int index, - u8 gid[16]) +static int mthca_query_gid(struct ib_device *ibdev, u8 port, + int index, union ib_gid *gid) { struct ib_mad *in_mad = NULL; struct ib_mad *out_mad = NULL; @@ -212,7 +204,7 @@ goto out; } - memcpy(gid, out_mad->payload + 48, 8); + memcpy(gid->raw, out_mad->payload + 48, 8); memset(in_mad, 0, sizeof *in_mad); in_mad->format_version = 1; @@ -232,7 +224,7 @@ 
goto out; } - memcpy(gid + 8, out_mad->payload + 40 + (index % 8) * 16, 8); + memcpy(gid->raw + 8, out_mad->payload + 40 + (index % 8) * 16, 8); out: kfree(in_mad); @@ -528,11 +520,11 @@ dev->ib_dev.owner = THIS_MODULE; dev->ib_dev.dma_device = dev->pdev; dev->ib_dev.provider = "mthca"; - dev->ib_dev.device_query = mthca_device_query; - dev->ib_dev.port_query = mthca_port_query; - dev->ib_dev.port_modify = mthca_port_modify; - dev->ib_dev.pkey_query = mthca_pkey_query; - dev->ib_dev.gid_query = mthca_gid_query; + dev->ib_dev.query_device = mthca_query_device; + dev->ib_dev.query_port = mthca_query_port; + dev->ib_dev.modify_port = mthca_modify_port; + dev->ib_dev.query_pkey = mthca_query_pkey; + dev->ib_dev.query_gid = mthca_query_gid; dev->ib_dev.alloc_pd = mthca_alloc_pd; dev->ib_dev.dealloc_pd = mthca_dealloc_pd; dev->ib_dev.create_ah = mthca_ah_create; Index: src/linux-kernel/infiniband/hw/mthca/mthca_eq.c =================================================================== --- src/linux-kernel/infiniband/hw/mthca/mthca_eq.c (revision 692) +++ src/linux-kernel/infiniband/hw/mthca/mthca_eq.c (working copy) @@ -199,7 +199,7 @@ active ? "active" : "down", port); record.device = &dev->ib_dev; - record.event = active ? IB_PORT_ACTIVE : IB_PORT_ERROR; + record.event = active ? IB_EVENT_PORT_ACTIVE : IB_PORT_ERROR; record.modifier.port = port; ib_async_event_dispatch(&record); From iod00d at hp.com Mon Aug 30 18:38:36 2004 From: iod00d at hp.com (Grant Grundler) Date: Mon, 30 Aug 2004 18:38:36 -0700 Subject: [openib-general] mthca v. 
current ib_verbs In-Reply-To: <20040810.131751.01369461.itoumsn@nttdata.co.jp> References: <52smaw7zcs.fsf@topspin.com> <20040810.095807.60847859.itoumsn@nttdata.co.jp> <52oelj946h.fsf@topspin.com> <20040810.131751.01369461.itoumsn@nttdata.co.jp> Message-ID: <20040831013836.GC27631@cup.hp.com> On Tue, Aug 10, 2004 at 01:17:51PM +0900, Masanori ITOH wrote: > By the way, why not define macros like the following instead of static > inline functions? > > #define ib_poll_cq(cq, num_entries, wc) cq->device->poll_cq(cq, num_entries, wc); Macros don't provide type checking. Using static inline lets the compiler check the types for us. grant From iod00d at hp.com Mon Aug 30 18:52:42 2004 From: iod00d at hp.com (Grant Grundler) Date: Mon, 30 Aug 2004 18:52:42 -0700 Subject: [openib-general] Re: [PATCH] remove ts_kernel_trace from client_query* In-Reply-To: <52oekzc7oi.fsf@topspin.com> References: <1092438376.32316.30.camel@localhost> <52r7qay0zh.fsf@topspin.com> <1092672183.2752.4.camel@duffman> <52k6vzoxla.fsf@topspin.com> <1092678729.2752.12.camel@duffman> <52n00kpcv1.fsf@topspin.com> <1093380548.13962.15.camel@duffman> <52acwjdvdz.fsf@topspin.com> <20040825195034.GB2399@mellanox.co.il> <52oekzc7oi.fsf@topspin.com> Message-ID: <20040831015242.GE27631@cup.hp.com> [ still catching up from email while on vacation...this goose could already be cooked... ] On Wed, Aug 25, 2004 at 12:57:17PM -0700, Roland Dreier wrote: > Michael> Wouldn't compile-time be enough? If you want to change > Michael> the setting, you could just add -DDEBUG or something in > Michael> the makefile and rebuild the relevant module. > > We've found it very useful for debugging to be able to turn on debug > output after a problem is detected without having to disturb the system. It could be another one of the "CONFIG_EMBEDDED" options. 
> Michael> I'm not sure multiple debug levels are really useful - > Michael> from experience it's hard to define in which debug level > Michael> each print belongs, so you end up enabling the maximum > Michael> verbosity, anyway. > > Yes, I agree, although perhaps two levels (DEBUG and VERBOSEDEBUG, say) > might be useful. I strongly prefer setting up the debug flags based on features. I.e., track memory usage with one flag, setup/teardown of connections with another, etc. grant From iod00d at hp.com Mon Aug 30 19:02:51 2004 From: iod00d at hp.com (Grant Grundler) Date: Mon, 30 Aug 2004 19:02:51 -0700 Subject: [openib-general] ib_req_ncomp_notif in core_ layer In-Reply-To: <20040811173849.GA29669@mellanox.co.il> References: <20040804084138.GA29136@mellanox.co.il> <20040804230200.GB548@cup.hp.com> <20040805125242.GH2640@mellanox.co.il> <20040805073226.497bb020.mshefty@ichips.intel.com> <52llgth9wr.fsf@topspin.com> <20040810062044.GA6645@mellanox.co.il> <20040810075751.07f19460.mshefty@ichips.intel.com> <20040811173849.GA29669@mellanox.co.il> Message-ID: <20040831020251.GF27631@cup.hp.com> On Wed, Aug 11, 2004 at 08:38:49PM +0300, Michael S. Tsirkin wrote: > Performance is hardware-dependent, anyway. Michael, I'll go trolling and assert "Performance is SW-dependent" is more correct. Apologies for pulling your comment out of context...but that statement is an extreme simplification that is misleading. Yes, it's only partially correct. > So, why is it a good idea to have all clients do something like > if (req_ncomp_notif()) req_comp_notif() > > Why not make this part of the layer already? req_ncomp_notif() can be implemented as a NOP and the compiler can optimize away the test (either to always call or never call req_comp_notif()). 
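[ The stub approach described above can be sketched in ordinary userspace C; the names below (ib_req_ncomp_notif, arm_cq_notification, etc.) are illustrative stand-ins, not the real verbs API. When the "unsupported" stub is a static inline returning a constant, the fallback test in every caller folds away at compile time: ]

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical sketch: on hardware without N-completion notification
 * support, the verbs layer installs a constant-returning static inline
 * stub. Callers keep one portable pattern; the compiler removes the
 * branch when the stub's result is a compile-time constant. */

static int comp_notif_calls; /* counts how often the fallback ran */

/* Build targeting hardware WITHOUT the feature: a pure NOP stub. */
static inline int ib_req_ncomp_notif(void)
{
	return -ENOSYS; /* no device access at all */
}

/* The ordinary single-completion notification request. */
static inline int ib_req_comp_notif(void)
{
	return ++comp_notif_calls;
}

/* The client-side pattern Michael described, written just once.
 * Returns the fallback count so the behavior is observable. */
static inline int arm_cq_notification(void)
{
	if (ib_req_ncomp_notif() != 0) /* constant here: branch vanishes */
		ib_req_comp_notif();
	return comp_notif_calls;
}
```

[ On hardware with real support, ib_req_ncomp_notif() would instead call through the device method table, and the same caller code works unchanged. ]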
grant From roland at topspin.com Mon Aug 30 20:21:04 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 30 Aug 2004 20:21:04 -0700 Subject: [openib-general] [PATCH] consolidate verbs functions in ib_verbs.c Message-ID: <521xho56xr.fsf@topspin.com> This follows what Sean did and moves a lot of verbs functions into a single ib_verbs.c file (which is still only 399 lines so it seems pretty manageable). I also merged in the differences in implementation, with SRQ handling in create_qp/destroy_qp and PD handling in rereg_phys_mr. I even went a little farther and didn't inline verbs like modify_ah (if create_ah is out-of-line, it seems reasonable for modify_ah to be the same). - R. Index: src/linux-kernel/infiniband/include/ib_verbs.h =================================================================== --- src/linux-kernel/infiniband/include/ib_verbs.h (revision 701) +++ src/linux-kernel/infiniband/include/ib_verbs.h (working copy) @@ -540,6 +540,13 @@ atomic_t usecnt; /* count number of work queues */ }; +struct ib_srq { + struct ib_device *device; + struct ib_pd *pd; + void *srq_context; + atomic_t usecnt; +}; + struct ib_qp { struct ib_device *device; struct ib_pd *pd; @@ -629,7 +636,7 @@ struct ib_recv_wr *recv_wr, struct ib_recv_wr **bad_recv_wr); struct ib_cq * (*create_cq)(struct ib_device *device, - int *cqe); + int cqe); int (*destroy_cq)(struct ib_cq *cq); int (*resize_cq)(struct ib_cq *cq, int *cqe); int (*poll_cq)(struct ib_cq *cq, int num_entries, Index: src/linux-kernel/infiniband/core/core_mw.c =================================================================== --- src/linux-kernel/infiniband/core/core_mw.c (revision 652) +++ src/linux-kernel/infiniband/core/core_mw.c (working copy) @@ -1,65 +0,0 @@ -/* - * This software is available to you under a choice of one of two - * licenses. 
You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available at - * , or the OpenIB.org BSD - * license, available in the LICENSE.TXT file accompanying this - * software. These details are also available at - * . - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - * Copyright (c) 2004 Topspin Communications. All rights reserved. - * - * $Id$ - */ - -#include - -#include "core_priv.h" - -struct ib_mw *ib_alloc_mw(struct ib_pd *pd) -{ - struct ib_mw *mw; - - if (!pd->device->alloc_mw) - return ERR_PTR(-ENOSYS); - - mw = pd->device->alloc_mw(pd); - if (!IS_ERR(mw)) { - mw->device = pd->device; - mw->pd = pd; - atomic_inc(&pd->usecnt); - } - - return mw; -} -EXPORT_SYMBOL(ib_alloc_mw); - -int ib_dealloc_mw(struct ib_mw *mw) -{ - struct ib_pd *pd; - int ret; - - pd = mw->pd; - ret = mw->device->dealloc_mw(mw); - if (!ret) - atomic_dec(&pd->usecnt); - - return ret; -} -EXPORT_SYMBOL(ib_dealloc_mw); - -/* - * Local Variables: - * c-file-style: "linux" - * indent-tabs-mode: t - * End: - */ Index: src/linux-kernel/infiniband/core/Makefile =================================================================== --- src/linux-kernel/infiniband/core/Makefile (revision 696) +++ src/linux-kernel/infiniband/core/Makefile (working copy) @@ -34,17 +34,10 @@ pm_export.o \ header_main.o \ header_ud.o \ + ib_verbs.c \ core_main.o \ core_device.o \ - core_pd.o \ - core_ah.o \ - core_qp.o \ - core_cq.o \ - core_mr.o \ - core_fmr.o \ core_fmr_pool.o \ - core_mw.o \ - core_mcast.o \ core_async.o \ 
core_cache.o \ core_proc.o Index: src/linux-kernel/infiniband/core/core_fmr.c =================================================================== --- src/linux-kernel/infiniband/core/core_fmr.c (revision 652) +++ src/linux-kernel/infiniband/core/core_fmr.c (working copy) @@ -1,81 +0,0 @@ -/* - This software is available to you under a choice of one of two - licenses. You may choose to be licensed under the terms of the GNU - General Public License (GPL) Version 2, available at - , or the OpenIB.org BSD - license, available in the LICENSE.TXT file accompanying this - software. These details are also available at - . - - THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - SOFTWARE. - - Copyright (c) 2004 Topspin Communications. All rights reserved. 
- - $Id$ -*/ - -#include -#include -#include - -#include "core_priv.h" - -struct ib_fmr *ib_alloc_fmr(struct ib_pd *pd, - int mr_access_flags, - struct ib_fmr_attr *fmr_attr) -{ - struct ib_fmr *fmr; - - if (!pd->device->alloc_fmr) - return ERR_PTR(-ENOSYS); - - fmr = pd->device->alloc_fmr(pd, mr_access_flags, fmr_attr); - if (!IS_ERR(fmr)) { - fmr->device = pd->device; - fmr->pd = pd; - atomic_inc(&pd->usecnt); - } - - return fmr; -} -EXPORT_SYMBOL(ib_alloc_fmr); - -int ib_unmap_fmr(struct list_head *fmr_list) -{ - struct ib_fmr *fmr; - - if (list_empty(fmr_list)) - return 0; - - fmr = list_entry(fmr_list->next, struct ib_fmr, list); - return fmr->device->unmap_fmr(fmr_list); -} -EXPORT_SYMBOL(ib_unmap_fmr); - -int ib_dealloc_fmr(struct ib_fmr *fmr) -{ - struct ib_pd *pd; - int ret; - - pd = fmr->pd; - ret = fmr->device->dealloc_fmr(fmr); - if (!ret) - atomic_dec(&pd->usecnt); - - return ret; -} -EXPORT_SYMBOL(ib_dealloc_fmr); - -/* - Local Variables: - c-file-style: "linux" - indent-tabs-mode: t - End: -*/ Index: src/linux-kernel/infiniband/core/core_ah.c =================================================================== --- src/linux-kernel/infiniband/core/core_ah.c (revision 652) +++ src/linux-kernel/infiniband/core/core_ah.c (working copy) @@ -1,79 +0,0 @@ -/* - This software is available to you under a choice of one of two - licenses. You may choose to be licensed under the terms of the GNU - General Public License (GPL) Version 2, available at - , or the OpenIB.org BSD - license, available in the LICENSE.TXT file accompanying this - software. These details are also available at - . - - THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - SOFTWARE. - - Copyright (c) 2004 Topspin Communications. All rights reserved. - - $Id$ -*/ - -#include - -#include "core_priv.h" - -struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr) -{ - struct ib_ah *ah; - - ah = pd->device->create_ah(pd, ah_attr); - - if (!IS_ERR(ah)) { - ah->device = pd->device; - ah->pd = pd; - atomic_inc(&pd->usecnt); - } - - return ah; -} -EXPORT_SYMBOL(ib_create_ah); - -int ib_modify_ah(struct ib_ah *ah, struct ib_ah_attr *ah_attr) -{ - return ah->device->modify_ah ? - ah->device->modify_ah(ah, ah_attr) : - -ENOSYS; -} -EXPORT_SYMBOL(ib_modify_ah); - -int ib_query_ah(struct ib_ah *ah, struct ib_ah_attr *ah_attr) -{ - return ah->device->query_ah ? - ah->device->query_ah(ah, ah_attr) : - -ENOSYS; -} -EXPORT_SYMBOL(ib_query_ah); - -int ib_destroy_ah(struct ib_ah *ah) -{ - struct ib_pd *pd; - int ret; - - pd = ah->pd; - ret = ah->device->destroy_ah(ah); - if (!ret) - atomic_dec(&pd->usecnt); - - return ret; -} -EXPORT_SYMBOL(ib_destroy_ah); - -/* - Local Variables: - c-file-style: "linux" - indent-tabs-mode: t - End: -*/ Index: src/linux-kernel/infiniband/core/core_cq.c =================================================================== --- src/linux-kernel/infiniband/core/core_cq.c (revision 652) +++ src/linux-kernel/infiniband/core/core_cq.c (working copy) @@ -1,78 +0,0 @@ -/* - This software is available to you under a choice of one of two - licenses. You may choose to be licensed under the terms of the GNU - General Public License (GPL) Version 2, available at - , or the OpenIB.org BSD - license, available in the LICENSE.TXT file accompanying this - software. These details are also available at - . 
- - THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - SOFTWARE. - - Copyright (c) 2004 Topspin Communications. All rights reserved. - - $Id$ -*/ - -#include - -#include "core_priv.h" - -struct ib_cq *ib_create_cq(struct ib_device *device, - ib_comp_handler comp_handler, - void *cq_context, int cqe) -{ - struct ib_cq *cq; - - cq = device->create_cq(device, &cqe); - - if (!IS_ERR(cq)) { - cq->device = device; - cq->comp_handler = comp_handler; - cq->context = cq_context; - cq->cqe = cqe; - atomic_set(&cq->usecnt, 0); - } - - return cq; -} -EXPORT_SYMBOL(ib_create_cq); - -int ib_destroy_cq(struct ib_cq *cq) -{ - if (atomic_read(&cq->usecnt)) - return -EBUSY; - - return cq->device->destroy_cq(cq); -} -EXPORT_SYMBOL(ib_destroy_cq); - -int ib_resize_cq(struct ib_cq *cq, - int cqe) -{ - int ret; - - if (!cq->device->resize_cq) - return -ENOSYS; - - ret = cq->device->resize_cq(cq, &cqe); - if (!ret) - cq->cqe = cqe; - - return ret; -} -EXPORT_SYMBOL(ib_resize_cq); - -/* - Local Variables: - c-file-style: "linux" - indent-tabs-mode: t - End: -*/ Index: src/linux-kernel/infiniband/core/core_mr.c =================================================================== --- src/linux-kernel/infiniband/core/core_mr.c (revision 652) +++ src/linux-kernel/infiniband/core/core_mr.c (working copy) @@ -1,98 +0,0 @@ -/* - This software is available to you under a choice of one of two - licenses. 
You may choose to be licensed under the terms of the GNU - General Public License (GPL) Version 2, available at - , or the OpenIB.org BSD - license, available in the LICENSE.TXT file accompanying this - software. These details are also available at - . - - THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - SOFTWARE. - - Copyright (c) 2004 Topspin Communications. All rights reserved. - - $Id$ -*/ - -#include -#include - -#include - -#include "core_priv.h" - -struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd, - struct ib_phys_buf *phys_buf_array, - int num_phys_buf, - int mr_access_flags, - u64 *iova_start) -{ - struct ib_mr *mr; - - mr = pd->device->reg_phys_mr(pd, phys_buf_array, num_phys_buf, - mr_access_flags, iova_start); - - if (!IS_ERR(mr)) { - mr->device = pd->device; - mr->pd = pd; - atomic_inc(&pd->usecnt); - atomic_set(&mr->usecnt, 0); - } - - return mr; -} -EXPORT_SYMBOL(ib_reg_phys_mr); - -int ib_rereg_phys_mr(struct ib_mr *mr, - int mr_rereg_mask, - struct ib_pd *pd, - struct ib_phys_buf *phys_buf_array, - int num_phys_buf, - int mr_access_flags, - u64 *iova_start) -{ - return mr->device->rereg_phys_mr ? - mr->device->rereg_phys_mr(mr, mr_rereg_mask, pd, - phys_buf_array, num_phys_buf, - mr_access_flags, iova_start) : - -ENOSYS; -} -EXPORT_SYMBOL(ib_rereg_phys_mr); - -int ib_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr) -{ - return mr->device->query_mr ? 
- mr->device->query_mr(mr, mr_attr) : -ENOSYS; -} -EXPORT_SYMBOL(ib_query_mr); - -int ib_dereg_mr(struct ib_mr *mr) -{ - struct ib_pd *pd; - int ret; - - if (atomic_read(&mr->usecnt)) - return -EBUSY; - - pd = mr->pd; - ret = mr->device->dereg_mr(mr); - if (!ret) - atomic_dec(&pd->usecnt); - - return ret; -} -EXPORT_SYMBOL(ib_dereg_mr); - -/* - Local Variables: - c-file-style: "linux" - indent-tabs-mode: t - End: -*/ Index: src/linux-kernel/infiniband/core/core_qp.c =================================================================== --- src/linux-kernel/infiniband/core/core_qp.c (revision 652) +++ src/linux-kernel/infiniband/core/core_qp.c (working copy) @@ -1,96 +0,0 @@ -/* - This software is available to you under a choice of one of two - licenses. You may choose to be licensed under the terms of the GNU - General Public License (GPL) Version 2, available at - , or the OpenIB.org BSD - license, available in the LICENSE.TXT file accompanying this - software. These details are also available at - . - - THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - SOFTWARE. - - Copyright (c) 2004 Topspin Communications. All rights reserved. 
- - $Id$ -*/ - -#include - -#include "core_priv.h" - -struct ib_qp *ib_create_qp(struct ib_pd *pd, - struct ib_qp_init_attr *qp_init_attr, - struct ib_qp_cap *qp_cap) -{ - struct ib_qp *qp; - - qp = pd->device->create_qp(pd, qp_init_attr, qp_cap); - - if (!IS_ERR(qp)) { - qp->device = pd->device; - qp->pd = pd; - qp->send_cq = qp_init_attr->send_cq; - qp->recv_cq = qp_init_attr->recv_cq; - atomic_inc(&pd->usecnt); - atomic_inc(&qp_init_attr->send_cq->usecnt); - atomic_inc(&qp_init_attr->recv_cq->usecnt); - } - - return qp; -} -EXPORT_SYMBOL(ib_create_qp); - -int ib_modify_qp(struct ib_qp *qp, - struct ib_qp_attr *qp_attr, - int qp_attr_mask, - struct ib_qp_cap *qp_cap) -{ - return qp->device->modify_qp(qp, qp_attr, qp_attr_mask, qp_cap); -} -EXPORT_SYMBOL(ib_modify_qp); - -int ib_query_qp(struct ib_qp *qp, - struct ib_qp_attr *qp_attr, - int qp_attr_mask, - struct ib_qp_init_attr *qp_init_attr) -{ - return qp->device->query_qp ? - qp->device->query_qp(qp, qp_attr, qp_attr_mask, qp_init_attr) : - -ENOSYS; -} -EXPORT_SYMBOL(ib_query_qp); - -int ib_destroy_qp(struct ib_qp *qp) -{ - struct ib_pd *pd; - struct ib_cq *scq, *rcq; - int ret; - - pd = qp->pd; - scq = qp->send_cq; - rcq = qp->recv_cq; - - ret = qp->device->destroy_qp(qp); - if (!ret) { - atomic_dec(&pd->usecnt); - atomic_dec(&scq->usecnt); - atomic_dec(&rcq->usecnt); - } - - return ret; -} -EXPORT_SYMBOL(ib_destroy_qp); - -/* - Local Variables: - c-file-style: "linux" - indent-tabs-mode: t - End: -*/ Index: src/linux-kernel/infiniband/core/core_pd.c =================================================================== --- src/linux-kernel/infiniband/core/core_pd.c (revision 652) +++ src/linux-kernel/infiniband/core/core_pd.c (working copy) @@ -1,60 +0,0 @@ -/* - This software is available to you under a choice of one of two - licenses. 
You may choose to be licensed under the terms of the GNU - General Public License (GPL) Version 2, available at - , or the OpenIB.org BSD - license, available in the LICENSE.TXT file accompanying this - software. These details are also available at - . - - THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - SOFTWARE. - - Copyright (c) 2004 Topspin Communications. All rights reserved. - - $Id$ -*/ - -#include -#include - -#include - -#include "core_priv.h" - -struct ib_pd *ib_alloc_pd(struct ib_device *device) -{ - struct ib_pd *pd; - - pd = device->alloc_pd(device); - - if (!IS_ERR(pd)) { - pd->device = device; - atomic_set(&pd->usecnt, 0); - } - - return pd; -} -EXPORT_SYMBOL(ib_alloc_pd); - -int ib_dealloc_pd(struct ib_pd *pd) -{ - if (atomic_read(&pd->usecnt)) - return -EBUSY; - - return pd->device->dealloc_pd(pd); -} -EXPORT_SYMBOL(ib_dealloc_pd); - -/* - Local Variables: - c-file-style: "linux" - indent-tabs-mode: t - End: -*/ Index: src/linux-kernel/infiniband/core/core_mcast.c =================================================================== --- src/linux-kernel/infiniband/core/core_mcast.c (revision 652) +++ src/linux-kernel/infiniband/core/core_mcast.c (working copy) @@ -1,49 +0,0 @@ -/* - This software is available to you under a choice of one of two - licenses. You may choose to be licensed under the terms of the GNU - General Public License (GPL) Version 2, available at - , or the OpenIB.org BSD - license, available in the LICENSE.TXT file accompanying this - software. These details are also available at - . 
- - THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - SOFTWARE. - - Copyright (c) 2004 Topspin Communications. All rights reserved. - - $Id$ -*/ - -#include - -#include "core_priv.h" - -int ib_attach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid) -{ - return qp->device->attach_mcast ? - qp->device->attach_mcast(qp, gid, lid) : - -ENOSYS; -} -EXPORT_SYMBOL(ib_attach_mcast); - -int ib_detach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid) -{ - return qp->device->detach_mcast ? - qp->device->detach_mcast(qp, gid, lid) : - -ENOSYS; -} -EXPORT_SYMBOL(ib_detach_mcast); - -/* - Local Variables: - c-file-style: "linux" - indent-tabs-mode: t - End: -*/ Index: src/linux-kernel/infiniband/core/ib_verbs.c =================================================================== --- src/linux-kernel/infiniband/core/ib_verbs.c (revision 0) +++ src/linux-kernel/infiniband/core/ib_verbs.c (revision 0) @@ -0,0 +1,398 @@ +/* + This software is available to you under a choice of one of two + licenses. You may choose to be licensed under the terms of the GNU + General Public License (GPL) Version 2, available at + , or the OpenIB.org BSD + license, available in the LICENSE.TXT file accompanying this + software. These details are also available at + . + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + SOFTWARE. + + Copyright (c) 2004 Mellanox Technologies Ltd. All rights reserved. + Copyright (c) 2004 Infinicon Corporation. All rights reserved. + Copyright (c) 2004 Intel Corporation. All rights reserved. + Copyright (c) 2004 Topspin Corporation. All rights reserved. + Copyright (c) 2004 Voltaire Corporation. All rights reserved. +*/ + +#include +#include + +/* Protection domains */ + +struct ib_pd *ib_alloc_pd(struct ib_device *device) +{ + struct ib_pd *pd; + + pd = device->alloc_pd(device); + + if (!IS_ERR(pd)) { + pd->device = device; + atomic_set(&pd->usecnt, 0); + } + + return pd; +} +EXPORT_SYMBOL(ib_alloc_pd); + +int ib_dealloc_pd(struct ib_pd *pd) +{ + if (atomic_read(&pd->usecnt)) + return -EBUSY; + + return pd->device->dealloc_pd(pd); +} +EXPORT_SYMBOL(ib_dealloc_pd); + +/* Address handles */ + +struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr) +{ + struct ib_ah *ah; + + ah = pd->device->create_ah(pd, ah_attr); + + if (!IS_ERR(ah)) { + ah->device = pd->device; + ah->pd = pd; + atomic_inc(&pd->usecnt); + } + + return ah; +} +EXPORT_SYMBOL(ib_create_ah); + +int ib_modify_ah(struct ib_ah *ah, struct ib_ah_attr *ah_attr) +{ + return ah->device->modify_ah ? + ah->device->modify_ah(ah, ah_attr) : + -ENOSYS; +} +EXPORT_SYMBOL(ib_modify_ah); + +int ib_query_ah(struct ib_ah *ah, struct ib_ah_attr *ah_attr) +{ + return ah->device->query_ah ? 
+ ah->device->query_ah(ah, ah_attr) : + -ENOSYS; +} +EXPORT_SYMBOL(ib_query_ah); + +int ib_destroy_ah(struct ib_ah *ah) +{ + struct ib_pd *pd; + int ret; + + pd = ah->pd; + ret = ah->device->destroy_ah(ah); + if (!ret) + atomic_dec(&pd->usecnt); + + return ret; +} +EXPORT_SYMBOL(ib_destroy_ah); + +/* Queue pairs */ + +struct ib_qp *ib_create_qp(struct ib_pd *pd, + struct ib_qp_init_attr *qp_init_attr, + struct ib_qp_cap *qp_cap) +{ + struct ib_qp *qp; + + qp = pd->device->create_qp(pd, qp_init_attr, qp_cap); + + if (!IS_ERR(qp)) { + qp->device = pd->device; + qp->pd = pd; + qp->send_cq = qp_init_attr->send_cq; + qp->recv_cq = qp_init_attr->recv_cq; + qp->srq = qp_init_attr->srq; + qp->qp_context = qp_init_attr->qp_context; + atomic_inc(&pd->usecnt); + atomic_inc(&qp_init_attr->send_cq->usecnt); + atomic_inc(&qp_init_attr->recv_cq->usecnt); + if (qp_init_attr->srq) + atomic_inc(&qp_init_attr->srq->usecnt); + } + + return qp; +} +EXPORT_SYMBOL(ib_create_qp); + +int ib_modify_qp(struct ib_qp *qp, + struct ib_qp_attr *qp_attr, + int qp_attr_mask, + struct ib_qp_cap *qp_cap) +{ + return qp->device->modify_qp(qp, qp_attr, qp_attr_mask, qp_cap); +} +EXPORT_SYMBOL(ib_modify_qp); + +int ib_query_qp(struct ib_qp *qp, + struct ib_qp_attr *qp_attr, + int qp_attr_mask, + struct ib_qp_init_attr *qp_init_attr) +{ + return qp->device->query_qp ? 
+ qp->device->query_qp(qp, qp_attr, qp_attr_mask, qp_init_attr) : + -ENOSYS; +} +EXPORT_SYMBOL(ib_query_qp); + +int ib_destroy_qp(struct ib_qp *qp) +{ + struct ib_pd *pd; + struct ib_cq *scq, *rcq; + struct ib_srq *srq; + int ret; + + pd = qp->pd; + scq = qp->send_cq; + rcq = qp->recv_cq; + srq = qp->srq; + + ret = qp->device->destroy_qp(qp); + if (!ret) { + atomic_dec(&pd->usecnt); + atomic_dec(&scq->usecnt); + atomic_dec(&rcq->usecnt); + if (srq) + atomic_dec(&srq->usecnt); + } + + return ret; +} +EXPORT_SYMBOL(ib_destroy_qp); + +/* Completion queues */ + +struct ib_cq *ib_create_cq(struct ib_device *device, + ib_comp_handler comp_handler, + void *cq_context, int cqe) +{ + struct ib_cq *cq; + + cq = device->create_cq(device, cqe); + + if (!IS_ERR(cq)) { + cq->device = device; + cq->comp_handler = comp_handler; + cq->context = cq_context; + atomic_set(&cq->usecnt, 0); + } + + return cq; +} +EXPORT_SYMBOL(ib_create_cq); + +int ib_destroy_cq(struct ib_cq *cq) +{ + if (atomic_read(&cq->usecnt)) + return -EBUSY; + + return cq->device->destroy_cq(cq); +} +EXPORT_SYMBOL(ib_destroy_cq); + +int ib_resize_cq(struct ib_cq *cq, + int cqe) +{ + int ret; + + if (!cq->device->resize_cq) + return -ENOSYS; + + ret = cq->device->resize_cq(cq, &cqe); + if (!ret) + cq->cqe = cqe; + + return ret; +} +EXPORT_SYMBOL(ib_resize_cq); + +/* Memory regions */ + +struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + int mr_access_flags, + u64 *iova_start) +{ + struct ib_mr *mr; + + mr = pd->device->reg_phys_mr(pd, phys_buf_array, num_phys_buf, + mr_access_flags, iova_start); + + if (!IS_ERR(mr)) { + mr->device = pd->device; + mr->pd = pd; + atomic_inc(&pd->usecnt); + atomic_set(&mr->usecnt, 0); + } + + return mr; +} +EXPORT_SYMBOL(ib_reg_phys_mr); + +int ib_rereg_phys_mr(struct ib_mr *mr, + int mr_rereg_mask, + struct ib_pd *pd, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + int mr_access_flags, + u64 *iova_start) +{ + struct ib_pd *old_pd; +
int ret; + + if (!mr->device->rereg_phys_mr) + return -ENOSYS; + + if (atomic_read(&mr->usecnt)) + return -EBUSY; + + old_pd = mr->pd; + + ret = mr->device->rereg_phys_mr(mr, mr_rereg_mask, pd, + phys_buf_array, num_phys_buf, + mr_access_flags, iova_start); + + if (!ret && (mr_rereg_mask & IB_MR_REREG_PD)) { + atomic_dec(&old_pd->usecnt); + atomic_inc(&pd->usecnt); + } + + return ret; +} +EXPORT_SYMBOL(ib_rereg_phys_mr); + +int ib_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr) +{ + return mr->device->query_mr ? + mr->device->query_mr(mr, mr_attr) : -ENOSYS; +} +EXPORT_SYMBOL(ib_query_mr); + +int ib_dereg_mr(struct ib_mr *mr) +{ + struct ib_pd *pd; + int ret; + + if (atomic_read(&mr->usecnt)) + return -EBUSY; + + pd = mr->pd; + ret = mr->device->dereg_mr(mr); + if (!ret) + atomic_dec(&pd->usecnt); + + return ret; +} +EXPORT_SYMBOL(ib_dereg_mr); + +/* Memory windows */ + +struct ib_mw *ib_alloc_mw(struct ib_pd *pd) +{ + struct ib_mw *mw; + + if (!pd->device->alloc_mw) + return ERR_PTR(-ENOSYS); + + mw = pd->device->alloc_mw(pd); + if (!IS_ERR(mw)) { + mw->device = pd->device; + mw->pd = pd; + atomic_inc(&pd->usecnt); + } + + return mw; +} +EXPORT_SYMBOL(ib_alloc_mw); + +int ib_dealloc_mw(struct ib_mw *mw) +{ + struct ib_pd *pd; + int ret; + + pd = mw->pd; + ret = mw->device->dealloc_mw(mw); + if (!ret) + atomic_dec(&pd->usecnt); + + return ret; +} +EXPORT_SYMBOL(ib_dealloc_mw); + +/* "Fast" memory regions */ + +struct ib_fmr *ib_alloc_fmr(struct ib_pd *pd, + int mr_access_flags, + struct ib_fmr_attr *fmr_attr) +{ + struct ib_fmr *fmr; + + if (!pd->device->alloc_fmr) + return ERR_PTR(-ENOSYS); + + fmr = pd->device->alloc_fmr(pd, mr_access_flags, fmr_attr); + if (!IS_ERR(fmr)) { + fmr->device = pd->device; + fmr->pd = pd; + atomic_inc(&pd->usecnt); + } + + return fmr; +} +EXPORT_SYMBOL(ib_alloc_fmr); + +int ib_unmap_fmr(struct list_head *fmr_list) +{ + struct ib_fmr *fmr; + + if (list_empty(fmr_list)) + return 0; + + fmr = list_entry(fmr_list->next, struct ib_fmr, list); + return
fmr->device->unmap_fmr(fmr_list); +} +EXPORT_SYMBOL(ib_unmap_fmr); + +int ib_dealloc_fmr(struct ib_fmr *fmr) +{ + struct ib_pd *pd; + int ret; + + pd = fmr->pd; + ret = fmr->device->dealloc_fmr(fmr); + if (!ret) + atomic_dec(&pd->usecnt); + + return ret; +} +EXPORT_SYMBOL(ib_dealloc_fmr); + +/* Multicast groups */ + +int ib_attach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid) +{ + return qp->device->attach_mcast ? + qp->device->attach_mcast(qp, gid, lid) : + -ENOSYS; +} +EXPORT_SYMBOL(ib_attach_mcast); + +int ib_detach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid) +{ + return qp->device->detach_mcast ? + qp->device->detach_mcast(qp, gid, lid) : + -ENOSYS; +} +EXPORT_SYMBOL(ib_detach_mcast); + Property changes on: src/linux-kernel/infiniband/core/ib_verbs.c ___________________________________________________________________ Name: svn:keywords + Id Index: src/linux-kernel/infiniband/hw/mthca/mthca_provider.c =================================================================== --- src/linux-kernel/infiniband/hw/mthca/mthca_provider.c (revision 701) +++ src/linux-kernel/infiniband/hw/mthca/mthca_provider.c (working copy) @@ -361,7 +361,7 @@ return 0; } -static struct ib_cq *mthca_create_cq(struct ib_device *ibdev, int *entries) +static struct ib_cq *mthca_create_cq(struct ib_device *ibdev, int entries) { struct mthca_cq *cq; int nent; @@ -371,7 +371,7 @@ if (!cq) return ERR_PTR(-ENOMEM); - for (nent = 1; nent < *entries; nent <<= 1) + for (nent = 1; nent < entries; nent <<= 1) ; /* nothing */ err = mthca_init_cq(to_mdev(ibdev), nent, cq); @@ -379,7 +379,7 @@ kfree(cq); cq = ERR_PTR(err); } else - *entries = nent; + cq->cqe = nent; return (struct ib_cq *) cq; } From iod00d at hp.com Mon Aug 30 20:39:32 2004 From: iod00d at hp.com (Grant Grundler) Date: Mon, 30 Aug 2004 20:39:32 -0700 Subject: [openib-general] [PATCH] mthca updates (2.6.8 dependent) In-Reply-To: <52pt5tznfe.fsf@topspin.com> References: <52pt5tznfe.fsf@topspin.com> Message-ID: 
<20040831033932.GH27631@cup.hp.com> On Sat, Aug 14, 2004 at 09:36:53AM -0700, Roland Dreier wrote: ... > Finally, I changed the logic to unconditionally reset the HCA during > initialization, since I've found that the previous initialization > logic may not work when handed an HCA in an unknown state. This is good. Different firmware (BIOS, EFI, OpenBoot, etc) may leave the adapter in a slightly different state. I've been bitten by this twice in the past 18 months. > MSI-X seems to improve performance somewhat, because it allows the > driver to register three separate interrupt vectors and avoid having > to do a slow MMIO read of the event cause register in the interrupt > handler. I don't know if there's much point to plain MSI, because > Linux only allows a driver to have a single interrupt even in MSI mode. Well, MSI was intended to allow devices to use multiple consecutive vectors. I forgot Linux only allows one and why. MSI is useful with one if the driver can avoid all MMIO reads in the normal interrupt case. MSI guarantees strong ordering of DMA to host mem and delivery of the interrupt. grant From iod00d at hp.com Mon Aug 30 20:44:03 2004 From: iod00d at hp.com (Grant Grundler) Date: Mon, 30 Aug 2004 20:44:03 -0700 Subject: [openib-general] [PATCH] mthca updates (2.6.8 dependent) In-Reply-To: <20040816173303.GA15087@mellanox.co.il> References: <506C3D7B14CDD411A52C00025558DED605C6879C@mtlex01.yok.mtl.com> <521xi7qdwv.fsf@topspin.com> <20040816173303.GA15087@mellanox.co.il> Message-ID: <20040831034403.GI27631@cup.hp.com> On Mon, Aug 16, 2004 at 08:33:03PM +0300, Michael S. Tsirkin wrote: > > It seems this would apply to standard INTx mode as well. Do you know > > why Mellanox didn't use this in THCA? In any case, I'll have to > > benchmark this approach. > > It does apply to the standard INTx mode. However with standard > INTx mode there is more of a chance that by the time you peek > the EQ, the EQ is empty and you end up getting an extra interrupt. Erm.
I wonder if the EQ is empty because the interrupt was delivered to the CPU before the DMA hit host mem. MSI would avoid this race condition. grant From iod00d at hp.com Mon Aug 30 21:00:02 2004 From: iod00d at hp.com (Grant Grundler) Date: Mon, 30 Aug 2004 21:00:02 -0700 Subject: [openib-general] [PATCH] mthca updates (2.6.8 dependent) In-Reply-To: <506C3D7B14CDD411A52C00025558DED605C688FD@mtlex01.yok.mtl.com> References: <506C3D7B14CDD411A52C00025558DED605C688FD@mtlex01.yok.mtl.com> Message-ID: <20040831040002.GJ27631@cup.hp.com> On Mon, Aug 16, 2004 at 09:27:13PM +0300, Dror Goldenberg wrote: > In PCI/PCIX, the interrupt is a wire, so it is not guaranteed that by the > time you get the interrupt, the EQE will be waiting in memory. > This is because the interrupt goes on a separate wire from HCA to interrupt > controller, while data goes up the PCI bridges. > Therefore it is required to perform a PIO read to flush all > posted writes flying upstream. Dror, I'm pretty sure you understand the issues but are using confusing terminology: o posted write. CPU does not wait for completion of write to IO device o PIO write. Programmed IO - CPU write to IO device. May or may not be posted, which typically depends on chipset and which "space" (MMIO vs I/O Port) is the target. o PIO read. Programmed IO - CPU read stalls until completion (may be MMIO or I/O port space). o DMA write: IO Device write to host memory (aka upstream) o DMA read: Device command to retrieve data from host memory (downstream) o DMA read return: completion portion of DMA read command (upstream) A PIO Read "flushes" inflight DMA writes from a CPU perspective because the CPU stalls until the PIO read completes. > In PCI-Express, the interrupt is a message, so it will work. The interrupt > will just flush the data to the memory because it maintains ordering with > posted writes upstream. The MSI/MSI-X interrupt doesn't do anything.
The interrupt transaction is just another DMA Write and must follow the PCI ordering rules like any other DMA write. The destination address is just not a regular host memory location. > In the current driver, since it's PCI and PCI-Express we > don't do it. In the new mode for Arbel we may do it. > When you do MSI/MSI-X, then architecturally it is guaranteed that by the > time you get the interrupt, the data already waits for you in memory. yes. FYI, PARISC *only* supports transaction-based interrupts. The CPU has no interrupt lines going to it. On IA64, I believe the same is true (for PCI) because the "Local SAPIC" is integrated into the CPU (same silicon). hth, grant From roland at topspin.com Mon Aug 30 21:10:35 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 30 Aug 2004 21:10:35 -0700 Subject: [openib-general] [PATCH] mthca updates (2.6.8 dependent) In-Reply-To: <20040831033932.GH27631@cup.hp.com> (Grant Grundler's message of "Mon, 30 Aug 2004 20:39:32 -0700") References: <52pt5tznfe.fsf@topspin.com> <20040831033932.GH27631@cup.hp.com> Message-ID: <52sma43q2s.fsf@topspin.com> Grant> MSI is useful with one if the driver can avoid all MMIO Grant> reads in the normal interrupt case. MSI guarantees strong Grant> ordering of DMA to host mem and delivery of the interrupt. I agree that MSI messages are strongly ordered with respect to other writes coming from a device. However, I couldn't find anything explicit in the PCI spec that says that the interrupt can't pass an earlier write to memory once the MSI hits the host bridge. In other words, the interrupt message will definitely hit the chipset after any writes that were initiated before it, but I didn't see anywhere it's guaranteed that the CPU will see those writes before it sees the interrupt.
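The INTx-versus-MSI flushing behavior being debated in this thread can be modeled with a toy sketch. Everything below is hypothetical user-space simulation, not kernel or driver API: a device "DMA write" first lands in a posted-write buffer and only later becomes visible in simulated host memory.

```c
/* Toy model of posted DMA writes vs. interrupt delivery.
 * All names are invented for illustration only. */

static unsigned int host_eq[4];     /* simulated host memory (event queue) */
static unsigned int posted_val;     /* DMA write still in flight           */
static int posted_idx = -1;

/* Device DMAs an event-queue entry.  With wired INTx, the interrupt can
 * reach the CPU while this write is still sitting in a posted buffer. */
void device_post_eqe(int idx, unsigned int val)
{
    posted_idx = idx;
    posted_val = val;
}

static void flush_posted_writes(void)
{
    if (posted_idx >= 0) {
        host_eq[posted_idx] = posted_val;
        posted_idx = -1;
    }
}

/* A PIO/MMIO read stalls the CPU until it completes, and its completion
 * forces earlier posted DMA writes to memory -- the "flush" described
 * above. */
unsigned int mmio_read_event_cause(void)
{
    flush_posted_writes();
    return 0x1;                     /* pretend event-cause bits */
}

/* An MSI message is itself a DMA write, ordered behind the EQE write,
 * so the EQE is already visible by the time the interrupt is seen. */
void device_send_msi(void)
{
    flush_posted_writes();
}

/* CPU load from simulated host memory. */
unsigned int read_eqe(int idx)
{
    return host_eq[idx];
}
```

In this model, an INTx-style handler that skips the `mmio_read_event_cause()` call can observe a stale (empty) event queue, while the MSI path cannot; that is the race window the extra PIO read exists to close.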
The Intel E7500 Xeon chipset PCI bridge datasheet does say that an interrupt message causes the bridge to flush its write buffers to preserve precisely this ordering, but I don't know whether everyone followed or will follow this example (I'm sure we'll see many more MSI implementations on PCI Express). - R. From roland at topspin.com Mon Aug 30 21:12:36 2004 From: roland at topspin.com (Roland Dreier) Date: Mon, 30 Aug 2004 21:12:36 -0700 Subject: [openib-general] [PATCH] mthca updates (2.6.8 dependent) In-Reply-To: <20040831040002.GJ27631@cup.hp.com> (Grant Grundler's message of "Mon, 30 Aug 2004 21:00:02 -0700") References: <506C3D7B14CDD411A52C00025558DED605C688FD@mtlex01.yok.mtl.com> <20040831040002.GJ27631@cup.hp.com> Message-ID: <52oeks3pzf.fsf@topspin.com> Grant> FYI, PARISC *only* support transaction based interrupts. Grant> The CPU has no interrupt lines going to it. On IA64, I Grant> believe the same is true (for PCI) because the "Local Grant> SAPIC" is integrated into the CPU (same silicon). This must just be between the CPU and chipset... surely these platforms support conventional PCI devices. (Even current Intel x86 CPUs use interrupt messages rather than interrupt wires to get interrupts from their host bridges) - R. From iod00d at hp.com Mon Aug 30 22:10:47 2004 From: iod00d at hp.com (Grant Grundler) Date: Mon, 30 Aug 2004 22:10:47 -0700 Subject: [openib-general] [PATCH] mthca updates (2.6.8 dependent) In-Reply-To: <52oeks3pzf.fsf@topspin.com> References: <506C3D7B14CDD411A52C00025558DED605C688FD@mtlex01.yok.mtl.com> <20040831040002.GJ27631@cup.hp.com> <52oeks3pzf.fsf@topspin.com> Message-ID: <20040831051047.GP27631@cup.hp.com> On Mon, Aug 30, 2004 at 09:12:36PM -0700, Roland Dreier wrote: > Grant> FYI, PARISC *only* support transaction based interrupts. > Grant> The CPU has no interrupt lines going to it. On IA64, I > Grant> believe the same is true (for PCI) because the "Local > Grant> SAPIC" is integrated into the CPU (same silicon). 
> > This must just be between the CPU and chipset... surely these > platforms support conventional PCI devices. Conventional PCI IRQs are converted to transaction based interrupts on both PARISC and IA64 by the IO SAPIC (or some other agent on behalf of the PCI device). > (Even current Intel x86 > CPUs use interrupt messages rather than interrupt wires to get > interrupts from their host bridges) yes - I believe it's the same thing but requires IO xAPIC (IIRC). grant From iod00d at hp.com Mon Aug 30 22:22:46 2004 From: iod00d at hp.com (Grant Grundler) Date: Mon, 30 Aug 2004 22:22:46 -0700 Subject: [openib-general] [PATCH] mthca updates (2.6.8 dependent) In-Reply-To: <52sma43q2s.fsf@topspin.com> References: <52pt5tznfe.fsf@topspin.com> <20040831033932.GH27631@cup.hp.com> <52sma43q2s.fsf@topspin.com> Message-ID: <20040831052246.GQ27631@cup.hp.com> On Mon, Aug 30, 2004 at 09:10:35PM -0700, Roland Dreier wrote: > Grant> MSI is useful with one if the driver can avoid all MMIO > Grant> read in the normal interrupt case. MSI guarantees strong > Grant> ordering of DMA to host mem and delivery of the interrupt. > > I agree that MSI messages are strongly ordered with respect to other > writes coming from a device. However, I couldn't find anything > explicit in the PCI spec that says that the interrupt can't pass an > earlier write to memory once the MSI hits the host bridge. MSI transaction is a DMA write from the chipset perspective. It is subject to the same ordering constraints as any other DMA write (or PCI bus transaction for that matter). > In other > words the interrupt message will definitely hit the chipset after any > writes that were initiated before it, but I didn't see anywhere it's > guaranteed that the CPU will see those writes before it sees the > interrupt. I think it assumes the interrupt controller (Local xAPIC) is inside the cache coherency domain. ie once a DMA write hits the coherency domain, ordering no longer is a (PCI) problem. 
ie we know transactions targeting memory will reach the coherency domain before MSI and thus will be visible to the CPU before the MSI is delivered. > The Intel E7500 Xeon chipset PCI bridge datasheet does say that an > interrupt message causes the bridge to flush its write buffers to > preserve precisely this ordering, but I don't know whether everyone > followed or will follow this example (I'm sure we'll see many more MSI > implementations on PCI Express). Maybe the flush is required because of DMA write coalescing in the E7500 chipset? Ie any DMA writes which can't be coalesced will cause this kind of a flushing behavior. I don't really know, since I don't know how the E7500 handles cache coherency. And yes, I'm certain some chipsets get DMA write coalescing wrong. Look at drivers/net/tg3.c and search for TG3_FLAG_MBOX_WRITE_REORDER in tg3_get_invariants(). grant From halr at voltaire.com Tue Aug 31 04:39:17 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 31 Aug 2004 07:39:17 -0400 Subject: [openib-general] PATCH creation of ib_verbs.c file In-Reply-To: <20040827104003.3282b00e.mshefty@ichips.intel.com> References: <20040827104003.3282b00e.mshefty@ichips.intel.com> Message-ID: <1093952357.1832.4.camel@localhost.localdomain> For all the functions moved to ib_verbs.c, can (just) their definitions be added back in to ib_verbs.h ? -- Hal From mshefty at ichips.intel.com Tue Aug 31 08:12:25 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 31 Aug 2004 08:12:25 -0700 Subject: [openib-general] [PATCH] consolidate verbs functions in ib_verbs.c In-Reply-To: <521xho56xr.fsf@topspin.com> References: <521xho56xr.fsf@topspin.com> Message-ID: <20040831081225.7a36cd2a.mshefty@ichips.intel.com> On Mon, 30 Aug 2004 20:21:04 -0700 Roland Dreier wrote: > I even went a little farther and didn't inline verbs like modify_ah > (if create_ah is out-of-line, it seems reasonable for modify_ah to be > the same).
I thought about this, but left those calls inline, since they're a single line. I can go either way though. From mshefty at ichips.intel.com Tue Aug 31 08:45:58 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 31 Aug 2004 08:45:58 -0700 Subject: [openib-general] PATCH creation of ib_verbs.c file In-Reply-To: <1093952357.1832.4.camel@localhost.localdomain> References: <20040827104003.3282b00e.mshefty@ichips.intel.com> <1093952357.1832.4.camel@localhost.localdomain> Message-ID: <20040831084558.00d687eb.mshefty@ichips.intel.com> On Tue, 31 Aug 2004 07:39:17 -0400 Hal Rosenstock wrote: > For all the functions moved to ib_verbs.c, can (just) their definitions > be added back in to ib_verbs.h ? Here's a patch to include all definitions in ib_verbs.h. I also renamed ib_free_fmr to ib_dealloc_fmr based on comments from Roland. - Sean Index: ib_verbs.h =================================================================== --- ib_verbs.h (revision 700) +++ ib_verbs.h (working copy) @@ -708,7 +708,7 @@ struct ib_phys_buf *phys_buf_array, int num_phys_buf); int (*unmap_fmr)(struct ib_fmr **fmr_array, int fmr_cnt); - int (*free_fmr)(struct ib_fmr *fmr); + int (*dealloc_fmr)(struct ib_fmr *fmr); int (*attach_mcast)(struct ib_qp *qp, union ib_gid *gid, u16 lid); int (*detach_mcast)(struct ib_qp *qp, union ib_gid *gid, @@ -780,6 +780,13 @@ port_modify); } +struct ib_pd *ib_alloc_pd(struct ib_device *device); + +int ib_dealloc_pd(struct ib_pd *pd); + +struct ib_ah *ib_create_ah(struct ib_pd *pd, + struct ib_ah_attr *ah_attr); + static inline int ib_modify_ah(struct ib_ah *ah, struct ib_ah_attr *ah_attr) { @@ -792,6 +799,12 @@ return ah->device->query_ah(ah, ah_attr); } +int ib_destroy_ah(struct ib_ah *ah); + +struct ib_qp *ib_create_qp(struct ib_pd *pd, + struct ib_qp_init_attr *qp_init_attr, + struct ib_qp_cap *qp_cap); + static inline int ib_modify_qp(struct ib_qp *qp, struct ib_qp_attr *qp_attr, int qp_attr_mask, @@ -808,6 +821,12 @@ return qp->device->query_qp(qp, 
qp_attr, qp_attr_mask, qp_init_attr); } +int ib_destroy_qp(struct ib_qp *qp); + +struct ib_srq *ib_create_srq(struct ib_pd *pd, + void *srq_context, + struct ib_srq_attr *srq_attr); + static inline int ib_query_srq(struct ib_srq *srq, struct ib_srq_attr *srq_attr) { @@ -829,18 +848,45 @@ return srq->device->post_srq(srq, recv_wr, bad_recv_wr); } +int ib_destroy_srq(struct ib_srq *srq); + +struct ib_cq *ib_create_cq(struct ib_device *device, + ib_comp_handler comp_handler, + void *cq_context, + int cqe); + static inline int ib_resize_cq(struct ib_cq *cq, int cqe) { return cq->device->resize_cq(cq, cqe); } +int ib_destroy_cq(struct ib_cq *cq); + +struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + int mr_access_flags, + u64 *iova_start); + static inline int ib_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr) { return mr->device->query_mr(mr, mr_attr); } +int ib_dereg_mr(struct ib_mr *mr); + +int ib_rereg_phys_mr(struct ib_mr *mr, + int mr_rereg_mask, + struct ib_pd *pd, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + int mr_access_flags, + u64 *iova_start); + +struct ib_mw *ib_alloc_mw(struct ib_pd *pd); + static inline int ib_bind_mw(struct ib_qp *qp, struct ib_mw *mw, struct ib_mw_bind *mw_bind) @@ -848,6 +894,12 @@ return mw->device->bind_mw(qp, mw, mw_bind); } +int ib_dealloc_mw(struct ib_mw *mw); + +struct ib_fmr *ib_alloc_fmr(struct ib_pd *pd, + int mr_access_flags, + struct ib_fmr_attr *fmr_attr); + static inline int ib_map_fmr(struct ib_fmr *fmr, void *addr, u64 size) @@ -862,6 +914,8 @@ return fmr_array[0]->device->unmap_fmr(fmr_array, fmr_cnt); } +int ib_dealloc_fmr(struct ib_fmr *fmr); + static inline int ib_attach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid) Index: ib_verbs.c =================================================================== --- ib_verbs.c (revision 700) +++ ib_verbs.c (working copy) @@ -322,17 +322,17 @@ } EXPORT_SYMBOL(ib_alloc_fmr); -int ib_free_fmr(struct 
ib_fmr *fmr) +int ib_dealloc_fmr(struct ib_fmr *fmr) { struct ib_pd *pd; int ret; pd = fmr->pd; - ret = fmr->device->free_fmr(fmr); + ret = fmr->device->dealloc_fmr(fmr); if (!ret) atomic_dec(&pd->usecnt); return ret; } -EXPORT_SYMBOL(ib_free_fmr); +EXPORT_SYMBOL(ib_dealloc_fmr); From halr at voltaire.com Tue Aug 31 10:29:27 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 31 Aug 2004 13:29:27 -0400 Subject: [openib-general] PATCH creation of ib_verbs.c file In-Reply-To: <20040831084558.00d687eb.mshefty@ichips.intel.com> References: <20040827104003.3282b00e.mshefty@ichips.intel.com> <1093952357.1832.4.camel@localhost.localdomain> <20040831084558.00d687eb.mshefty@ichips.intel.com> Message-ID: <1093973366.1830.11.camel@localhost.localdomain> On Tue, 2004-08-31 at 11:45, Sean Hefty wrote: > Here's a patch to include all definitions in ib_verbs.h. I also renamed ib_free_fmr to > ib_dealloc_fmr based on comments from Roland. Here's another patch to this to define IS_ERR and also remove cast of int to pointer in return from ib_create_srq. Index: ib_verbs.c =================================================================== --- ib_verbs.c (revision 704) +++ ib_verbs.c (working copy) @@ -23,7 +23,7 @@ Copyright (c) 2004 Voltaire Corporation. All rights reserved. */ -#include +#include #include struct ib_pd *ib_alloc_pd(struct ib_device *device) @@ -141,7 +141,7 @@ struct ib_srq *srq; if (!pd->device->create_srq) - return -ENOSYS; + return ERR_PTR(-ENOSYS); srq = pd->device->create_srq(pd, srq_context, srq_attr); From halr at voltaire.com Tue Aug 31 11:37:47 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 31 Aug 2004 14:37:47 -0400 Subject: [openib-general] ib_mad.h Message-ID: <1093977466.1836.16.camel@localhost.localdomain> Hi, Should SM and GS class (and versions) definitions be added to ib_mad.h ? 
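For reference, the SM and GS class values in question are fixed management class codes from the InfiniBand spec. A sketch of what such definitions might look like follows; the `IB_MGMT_*` names and the exact values here are my recollection of the spec, so verify them against the IBA MAD chapter before relying on them.

```c
/* Hypothetical sketch of SM/GS class definitions for ib_mad.h.
 * Values are the InfiniBand management class codes (verify against
 * the IBA spec before use). */
enum {
    IB_MGMT_BASE_VERSION              = 0x01,

    IB_MGMT_CLASS_SUBN_LID_ROUTED     = 0x01,  /* SM, LID routed       */
    IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE = 0x81,  /* SM, directed route   */
    IB_MGMT_CLASS_SUBN_ADM            = 0x03,  /* SA                   */
    IB_MGMT_CLASS_PERF_MGMT           = 0x04,  /* PM                   */
    IB_MGMT_CLASS_BM                  = 0x05,  /* baseboard management */
    IB_MGMT_CLASS_DEVICE_MGMT         = 0x06,  /* DM                   */
    IB_MGMT_CLASS_CM                  = 0x07,  /* communication mgmt   */
    IB_MGMT_CLASS_SNMP                = 0x08,  /* SNMP tunneling       */
};
```

Note that the two SM classes (0x01 and 0x81) are distinct codes, which is what drives the registration question discussed later in this thread.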
Also, a commentary typo: Index: ib_mad.h =================================================================== --- ib_mad.h (revision 704) +++ ib_mad.h (working copy) @@ -103,7 +103,7 @@ * @device - Reference to device registration is on. * @qp - Reference to QP used for sending and receiving MADs. * @recv_handler - Callback handler for a received MAD. - * @send_handler - Callback hander for a sent MAD. + * @send_handler - Callback handler for a sent MAD. * @context - User-specified context associated with this registration. * @hi_tid - Access layer assigned transaction ID for this client. * Unsolicited MADs sent by this client will have the upper 32-bits -- Hal From mshefty at ichips.intel.com Tue Aug 31 10:41:07 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 31 Aug 2004 10:41:07 -0700 Subject: [openib-general] Re: ib_mad.h In-Reply-To: <1093977466.1836.16.camel@localhost.localdomain> References: <1093977466.1836.16.camel@localhost.localdomain> Message-ID: <20040831104107.4c98b113.mshefty@ichips.intel.com> On Tue, 31 Aug 2004 14:37:47 -0400 Hal Rosenstock wrote: > Hi, > > Should SM and GS class (and versions) definitions be added to ib_mad.h ? Sounds good. If you have a patch, I can apply it. Otherwise, I'll try to add these in as I get time. > Also, a commentary typo: applied. thanks for catching this From halr at voltaire.com Tue Aug 31 11:47:52 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 31 Aug 2004 14:47:52 -0400 Subject: [openib-general] Re: ib_mad.h In-Reply-To: <20040831104107.4c98b113.mshefty@ichips.intel.com> References: <1093977466.1836.16.camel@localhost.localdomain> <20040831104107.4c98b113.mshefty@ichips.intel.com> Message-ID: <1093978071.1830.23.camel@localhost.localdomain> On Tue, 2004-08-31 at 13:41, Sean Hefty wrote: > On Tue, 31 Aug 2004 14:37:47 -0400 > Hal Rosenstock wrote: > > > Hi, > > > > Should SM and GS class (and versions) definitions be added to ib_mad.h ? > > Sounds good. 
If you have a patch, I can apply it. Otherwise, I'll try to add these in as I get time. I'll be happy to supply a patch for this. (I always wonder whether to do this first or float the idea... > > > Also, a commentary typo: > > applied. thanks for catching this Thanks. -- Hal From halr at voltaire.com Tue Aug 31 12:03:10 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 31 Aug 2004 15:03:10 -0400 Subject: [openib-general] ib_mad.h: ib_mad_reg_req question Message-ID: <1093978990.1837.45.camel@localhost.localdomain> Currently, ib_mad_reg_req includes the mgmt_class as follows: struct ib_mad_reg_req { u8 mgmt_class; ... }; There are 2 SM classes: one for direct routed and the other for LID routed. Should these be handled separately (requiring 2 registrations for a client to get them all) or treated "special" and allow one registration to get them ? To me, this is a 2 part question: 1. Is there ever a need to get just one of these classes ? (I don't think so but want to be sure...) 2. If not, then is it acceptable to muddy the interface this way ? -- Hal From roland.list at gmail.com Tue Aug 31 12:13:12 2004 From: roland.list at gmail.com (Roland Dreier) Date: Tue, 31 Aug 2004 12:13:12 -0700 Subject: [openib-general] ib_mad.h: ib_mad_reg_req question In-Reply-To: <1093978990.1837.45.camel@localhost.localdomain> References: <1093978990.1837.45.camel@localhost.localdomain> Message-ID: > There are 2 SM classes: one for direct routed and the other for LID > routed. Should these be handled separately (requiring 2 registrations > for a client to get them all) or treated "special" and allow one > registration to get them ? > To me, this is a 2 part question: > 1. Is there ever a need to get just one of these classes ? (I don't > think so but want to be sure...) Well, it's impossible to predict what users might want to do, so we should never say never. 
In fact I can imagine an SM that wants to split the directed route discovery into a different process/thread from LID routed MAD handling, and I don't see a good reason to force a client like this to do the splitting between DR and LR MADs when the core MAD layer could do it perfectly well. > 2. If not, then is it acceptable to muddy the interface this way ? I think the gain (client saves one registration call) is minimal, and is outweighed by the loss in flexibility. Also, I think the confusion generated by having a "magic" interface where one class acts differently from all other classes probably makes it not worth it to people who would use the interface. So I guess my vote on this feature would be "no" :) - R. From mshefty at ichips.intel.com Tue Aug 31 11:22:40 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 31 Aug 2004 11:22:40 -0700 Subject: [openib-general] PATCH creation of ib_verbs.c file In-Reply-To: <1093973366.1830.11.camel@localhost.localdomain> References: <20040827104003.3282b00e.mshefty@ichips.intel.com> <1093952357.1832.4.camel@localhost.localdomain> <20040831084558.00d687eb.mshefty@ichips.intel.com> <1093973366.1830.11.camel@localhost.localdomain> Message-ID: <20040831112240.3599bb8b.mshefty@ichips.intel.com> On Tue, 31 Aug 2004 13:29:27 -0400 Hal Rosenstock wrote: > Here's another patch to this to define IS_ERR and also remove cast of > int to pointer in return from ib_create_srq. Thanks! applied. From mshefty at ichips.intel.com Tue Aug 31 11:34:04 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 31 Aug 2004 11:34:04 -0700 Subject: [openib-general] ib_mad.h: ib_mad_reg_req question In-Reply-To: <1093978990.1837.45.camel@localhost.localdomain> References: <1093978990.1837.45.camel@localhost.localdomain> Message-ID: <20040831113404.45a65e0d.mshefty@ichips.intel.com> On Tue, 31 Aug 2004 15:03:10 -0400 Hal Rosenstock wrote: > To me, this is a 2 part question: > 1. Is there ever a need to get just one of these classes ? 
(I don't > think so but want to be sure...) > 2. If not, then is it acceptable to muddy the interface this way ? My first thought is that changing the API isn't worth the gain. As an aside question, does the SM receive unsolicited directed route MADs? I always thought of directed route MADs as being initiated by the SM, basically for configuration purposes, and then routed locally (via ib_process_mad) at the remote side. - Sean From roland at topspin.com Tue Aug 31 12:39:37 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 31 Aug 2004 12:39:37 -0700 Subject: [openib-general] ib_mad.h: ib_mad_reg_req question In-Reply-To: <20040831113404.45a65e0d.mshefty@ichips.intel.com> (Sean Hefty's message of "Tue, 31 Aug 2004 11:34:04 -0700") References: <1093978990.1837.45.camel@localhost.localdomain> <20040831113404.45a65e0d.mshefty@ichips.intel.com> Message-ID: <52brgr3xmu.fsf@topspin.com> Sean> As an aside question, does the SM receive unsolicited Sean> directed route MADs? I always thought of directed route Sean> MADs as being initiated by the SM, basically for Sean> configuration purposes, and then routed locally (via Sean> ib_process_mad) at the remote side. It shouldn't -- I'm pretty sure there's a compliance statement that says only the SM may originate DR SMPs (obviously other nodes are allowed to return their responses). - R. From halr at voltaire.com Tue Aug 31 12:41:51 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 31 Aug 2004 15:41:51 -0400 Subject: [openib-general] ib_mad.h: ib_mad_reg_req question In-Reply-To: <20040831113404.45a65e0d.mshefty@ichips.intel.com> References: <1093978990.1837.45.camel@localhost.localdomain> <20040831113404.45a65e0d.mshefty@ichips.intel.com> Message-ID: <1093981311.1830.83.camel@localhost.localdomain> On Tue, 2004-08-31 at 14:34, Sean Hefty wrote: > On Tue, 31 Aug 2004 15:03:10 -0400 > Hal Rosenstock wrote: > > > To me, this is a 2 part question: > > 1. Is there ever a need to get just one of these classes ?
(I don't > > think so but want to be sure...) > > 2. If not, then is it acceptable to muddy the interface this way ? > > My first thought is that changing the API isn't worth the gain. OK. > As an aside question, does the SM receive unsolicited directed route MADs? Currently no, but there has been discussion of allowing DR traps (not just LR traps, as is the case now, meaning up to and including 1.2). > I always thought of directed route MADs as being initiated by the SM, > basically for configuration purposes, and then routed locally (via ib_process_mad) > at the remote side. -- Hal From halr at voltaire.com Tue Aug 31 12:44:23 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 31 Aug 2004 15:44:23 -0400 Subject: [openib-general] ib_mad.h: ib_mad_reg_req question In-Reply-To: References: <1093978990.1837.45.camel@localhost.localdomain> Message-ID: <1093981462.1830.87.camel@localhost.localdomain> On Tue, 2004-08-31 at 15:13, Roland Dreier wrote: > > There are 2 SM classes: one for direct routed and the other for LID > > routed. Should these be handled separately (requiring 2 registrations > > for a client to get them all) or treated "special" and allow one > > registration to get them ? > > > To me, this is a 2 part question: > > 1. Is there ever a need to get just one of these classes ? (I don't > > think so but want to be sure...) > > Well, it's impossible to predict what users might want to do, so we > should never say never. In fact I can imagine an SM that wants to > split the directed route discovery into a different process/thread > from LID routed MAD handling, and I don't see a good reason to force a > client like this to do the splitting between DR and LR MADs when the > core MAD layer could do it perfectly well. > > > 2. If not, then is it acceptable to muddy the interface this way ? > > I think the gain (client saves one registration call) is minimal, and > is outweighed by the loss in flexibility.
Also, I think the confusion > generated by having a "magic" interface where one class acts > differently from all other classes probably makes it not worth it to > people who would use the interface. > > So I guess my vote on this feature would be "no" :) Thanks. I'm convinced this is not worth it. -- Hal From halr at voltaire.com Tue Aug 31 12:48:38 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 31 Aug 2004 15:48:38 -0400 Subject: [openib-general] ib_mad.h: ib_mad_reg_req question In-Reply-To: <52brgr3xmu.fsf@topspin.com> References: <1093978990.1837.45.camel@localhost.localdomain> <20040831113404.45a65e0d.mshefty@ichips.intel.com> <52brgr3xmu.fsf@topspin.com> Message-ID: <1093981718.1830.89.camel@localhost.localdomain> On Tue, 2004-08-31 at 15:39, Roland Dreier wrote: > Sean> As an aside question, does the SM receive unsolicited > Sean> directed route MADs? I always thought of directed route > Sean> MADs as being initiated by the SM, basically for > Sean> configuration purposes, and then routed locally (via > Sean> ib_process_mad) at the remote side. > > It shouldn't -- I'm pretty sure there's a compliance statement that > says only the SM may originate DR SMPs (obviously other nodes are > allowed to return their responses). That would be C14-5 (and would need changing if DR traps were ever to be added). 
-- Hal From halr at voltaire.com Tue Aug 31 14:36:34 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 31 Aug 2004 17:36:34 -0400 Subject: [openib-general] DAPL for openib In-Reply-To: <20040825142517.70ec64a9.mshefty@ichips.intel.com> References: <1093359001.1831.28.camel@localhost.localdomain> <52zn4kr0p1.fsf@topspin.com> <20040824085223.71efa553.mshefty@ichips.intel.com> <52eklwqxrn.fsf@topspin.com> <20040825194344.GA2399@mellanox.co.il> <52smabc7r2.fsf@topspin.com> <20040825202141.GA2672@mellanox.co.il> <20040825134356.7491f403.mshefty@ichips.intel.com> <20040825215558.GB2829@mellanox.co.il> <52brgydgnk.fsf@topspin.com> <20040825221338.GC2829@mellanox.co.il> <20040825142517.70ec64a9.mshefty@ichips.intel.com> Message-ID: <1093988194.1836.127.camel@localhost.localdomain> On Wed, 2004-08-25 at 17:25, Sean Hefty wrote: > If you use separate tables for version, class, and method, and let them grow dynamically, > they can be substantially smaller. E.g. the version table would likely be a single entry > referencing a class table. For most classes this is true. For CM, there might be a couple of versions supported. > The class table is about 8 entries long, This of course depends on what services are running on the node. > but requires remapping class 0x81 to index 0. Based on the recent email exchange about not combining the DR/LR SM registrations, this remapping won't be done. (This is a minor alteration to Sean's point about memory consumption.) > The method array is only needed if a client doesn't register to > receive all unsolicited MADs for a specific class. This means that for any class, the IB MAD layer needs to know which methods are used for unsolicited MADs.
By class, here are the potential unsolicited methods:

SM: trap
SA client: report
SA: all requests (get, set, gettable, getmulti, gettrace, delete)
CM: send
PM: none
BM: send, trap, report
DevMgt: trap, report
SNMP: send
vendor & application: send, trap

> Something like this should let you dispatch to a single client.
> To support a client viewing the MAD, you'd just need linked lists at any point
> along the dispatch path.

From halr at voltaire.com Tue Aug 31 14:39:37 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 31 Aug 2004 17:39:37 -0400 Subject: [openib-general] [PATCH] ib_mad.h: Add in IB management class definitions Message-ID: <1093988376.1830.131.camel@localhost.localdomain> ib_mad.h: Add in IB management class definitions Vendor and application class definitions are still pending.

Index: ib_mad.h
===================================================================
--- ib_mad.h	(revision 707)
+++ ib_mad.h	(working copy)
@@ -28,6 +28,16 @@
 #include "ib_verbs.h"
+/* Management classes */
+#define IB_MGMT_CLASS_SUBN_LID_ROUTED		0x01
+#define IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE	0x81
+#define IB_MGMT_CLASS_SUBN_ADM			0x03
+#define IB_MGMT_CLASS_PERF			0x04
+#define IB_MGMT_CLASS_BM			0x05
+#define IB_MGMT_CLASS_DEV_MGT			0x06
+#define IB_MGMT_CLASS_COM_MGT			0x07
+#define IB_MGMT_CLASS_SNMP			0x08
+
 #define IB_QP0 0
 #define IB_QP1 cpu_to_be32(1)
 #define IB_QP1_QKEY cpu_to_be32(0x80010000)

From mshefty at ichips.intel.com Tue Aug 31 14:09:09 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 31 Aug 2004 14:09:09 -0700 Subject: [openib-general] DAPL for openib In-Reply-To: <1093988194.1836.127.camel@localhost.localdomain> References: <1093359001.1831.28.camel@localhost.localdomain> <52zn4kr0p1.fsf@topspin.com> <20040824085223.71efa553.mshefty@ichips.intel.com> <52eklwqxrn.fsf@topspin.com> <20040825194344.GA2399@mellanox.co.il> <52smabc7r2.fsf@topspin.com> <20040825202141.GA2672@mellanox.co.il> <20040825134356.7491f403.mshefty@ichips.intel.com>
<20040825215558.GB2829@mellanox.co.il> <52brgydgnk.fsf@topspin.com> <20040825221338.GC2829@mellanox.co.il> <20040825142517.70ec64a9.mshefty@ichips.intel.com> <1093988194.1836.127.camel@localhost.localdomain> Message-ID: <20040831140909.7c63f866.mshefty@ichips.intel.com> On Tue, 31 Aug 2004 17:36:34 -0400 Hal Rosenstock wrote: > For most classes this is true. For CM, there might be a couple of > versions supported. Still, the table is small, consuming only a few bytes. > > The class table is about 8 entries long, > > This of course depends on what services are running on the node. Correct. There are only a handful of defined classes. The table could be larger if there were several vendor defined classes. Worst case is a table with 256 entries, with the table duplicated for each version. > > The method array is only needed if a client doesn't register to > > receive all unsolicited MADs for a specific class. > > This means that for any class, the IB MAD layer needs to know > which methods are used for unsolicited MADs. I don't think that the MAD layer needs to know this information. It needs to know if a received MAD was sent in response to a previously sent one. In general, this means checking the response bit. If not, it will route the MAD to whichever client is registered to receive unsolicited MADs that match the MADs version / class / method / attribute / etc. If the MAD is a response, it can route based on the transaction ID. Of course there are exceptions to the rule of simply checking the response bit, like RMPP. 
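[Editorial note: Sean's routing rule above can be sketched as follows. This is an illustrative fragment, not the proposed implementation — the struct layout and function names are invented for the example — though the R bit (0x80 in the method field) and the TID-based response matching follow the thread.]

```c
#include <assert.h>
#include <stdint.h>

/* Minimal MAD header fields for illustration; the real header has more. */
struct mad_hdr {
	uint8_t  mgmt_class;
	uint8_t  class_version;
	uint8_t  method;
	uint64_t tid;
};

#define METHOD_RESP_BIT 0x80	/* the R bit in the method field */

enum route { ROUTE_BY_TID, ROUTE_BY_REGISTRATION };

/* First-cut routing decision, per the discussion: a response is matched
 * back to the original requester via its transaction ID; anything else
 * goes to whichever client registered for this version/class/method.
 * RMPP is a known exception and is ignored here. */
static enum route classify(const struct mad_hdr *mad)
{
	return (mad->method & METHOD_RESP_BIT) ?
		ROUTE_BY_TID : ROUTE_BY_REGISTRATION;
}

/* One convention mentioned later in the thread: the high 32 bits of
 * the TID identify the sending client. */
static uint32_t tid_client(const struct mad_hdr *mad)
{
	return (uint32_t)(mad->tid >> 32);
}
```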
From halr at voltaire.com Tue Aug 31 15:23:59 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 31 Aug 2004 18:23:59 -0400 Subject: [openib-general] DAPL for openib In-Reply-To: <20040831140909.7c63f866.mshefty@ichips.intel.com> References: <1093359001.1831.28.camel@localhost.localdomain> <52zn4kr0p1.fsf@topspin.com> <20040824085223.71efa553.mshefty@ichips.intel.com> <52eklwqxrn.fsf@topspin.com> <20040825194344.GA2399@mellanox.co.il> <52smabc7r2.fsf@topspin.com> <20040825202141.GA2672@mellanox.co.il> <20040825134356.7491f403.mshefty@ichips.intel.com> <20040825215558.GB2829@mellanox.co.il> <52brgydgnk.fsf@topspin.com> <20040825221338.GC2829@mellanox.co.il> <20040825142517.70ec64a9.mshefty@ichips.intel.com> <1093988194.1836.127.camel@localhost.localdomain> <20040831140909.7c63f866.mshefty@ichips.intel.com> Message-ID: <1093991038.1836.160.camel@localhost.localdomain> On Tue, 2004-08-31 at 17:09, Sean Hefty wrote: > > > The method array is only needed if a client doesn't register to > > > receive all unsolicited MADs for a specific class. > > > > This means that for any class, the IB MAD layer needs to know > > which methods are used for unsolicited MADs. > > I don't think that the MAD layer needs to know this information. Then can you remind me why we are passing in the ib_mad_reg_req structure which contains the method array bitmask ? (see below) > It needs to know if a received MAD was sent in response to a previously sent one. > In general, this means checking the response bit. If not, it will route the MAD to whichever client > is registered to receive unsolicited MADs that match the MADs version / class / method / attribute / etc. Right and we don't go down to attribute right now but may need to. Isn't the method here the reason for the method bitmask ? This seems to be at odds with the previous statement about the MAD layer not needing to know this. That would be true for responses but not unsolicited requests. Is that what you meant ? 
> If the MAD is a response, it can route based on the transaction ID. Based on the high 32 bits of the TID. > Of course there are exceptions to the rule of simply checking the response bit, like RMPP. Are there any exceptions you are aware of off the top of your head ? Thanks. -- Hal From timur.tabi at ammasso.com Tue Aug 31 15:35:29 2004 From: timur.tabi at ammasso.com (Timur Tabi) Date: Tue, 31 Aug 2004 17:35:29 -0500 Subject: [openib-general] get_user_pages() vs. sys_mlock() and 2.6 kernel Message-ID: <4134FD31.4060301@ammasso.com> What is the reason that sys_mlock() is used instead of get_user_pages()? I know that sys_mlock() is used because other methods of locking pages didn't really lock the pages (i.e. there were still situations where the page would be swapped out). Does get_user_pages() have that problem also? If so, has anyone checked to see if it's been fixed in the 2.6 kernel? -- Timur Tabi Staff Software Engineer timur.tabi at ammasso.com From mshefty at ichips.intel.com Tue Aug 31 14:43:01 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 31 Aug 2004 14:43:01 -0700 Subject: [openib-general] DAPL for openib In-Reply-To: <1093991038.1836.160.camel@localhost.localdomain> References: <1093359001.1831.28.camel@localhost.localdomain> <52zn4kr0p1.fsf@topspin.com> <20040824085223.71efa553.mshefty@ichips.intel.com> <52eklwqxrn.fsf@topspin.com> <20040825194344.GA2399@mellanox.co.il> <52smabc7r2.fsf@topspin.com> <20040825202141.GA2672@mellanox.co.il> <20040825134356.7491f403.mshefty@ichips.intel.com> <20040825215558.GB2829@mellanox.co.il> <52brgydgnk.fsf@topspin.com> <20040825221338.GC2829@mellanox.co.il> <20040825142517.70ec64a9.mshefty@ichips.intel.com> <1093988194.1836.127.camel@localhost.localdomain> <20040831140909.7c63f866.mshefty@ichips.intel.com> <1093991038.1836.160.camel@localhost.localdomain> Message-ID: <20040831144301.606f0ede.mshefty@ichips.intel.com> On Tue, 31 Aug 2004 18:23:59 -0400 Hal Rosenstock wrote: > > > This means that for any
class, the IB MAD layer needs to know > > > which methods are used for unsolicited MADs. > > > > I don't think that the MAD layer needs to know this information. > > Then can you remind me why we are passing in the ib_mad_reg_req > structure which contains the method array bitmask ? (see below) >... > > Right and we don't go down to attribute right now but may need to. Isn't > the method here the reason for the method bitmask ? This seems to be > at odds with the previous statement about the MAD layer not needing to > know this. That would be true for responses but not unsolicited > requests. Is that what you meant ? I don't think we're matching our terminology. The MAD layer needs to know how to route, and will operate based on class/version/method/etc., but doesn't need to know the specifics for any given class. I.e. it routes based on the values of given fields, not their meaning, wherever possible. The registration process is intended to provide the MAD layer a set of values that it uses to route with. The MAD layer shouldn't care what those values mean. This is what I meant when I said that the MAD layer doesn't need to "know" which methods are unsolicited. > > Of course there are exceptions to the rule of simply checking the response bit, like RMPP. > > Are there any exceptions you are aware of off the top of your head ? I *think* just RMPP, which doesn't use the response bit in all cases (or has the bit flipped). This is really an internal issue inside the MAD layer, but may affect the implementation of how MAD routing is done. From jdaley at systemfabricworks.com Tue Aug 31 15:35:05 2004 From: jdaley at systemfabricworks.com (Jan Daley) Date: Tue, 31 Aug 2004 17:35:05 -0500 Subject: [openib-general] RE: openib-general Digest, Vol 2, Issue 81 In-Reply-To: <20040831222108.E76D42283DA@openib.ca.sandia.gov> Message-ID: <000001c48faa$c78b1650$6b01a8c0@maverick> There is one case in which a SM can receive an unsolicited directed route MAD. 
The master SM could receive a directed route SubnGet(SMInfo) from another SM on the subnet that is in the discovering state. Jan Daley System Fabric Works (512) 343-6101 x 13 ------------------------------ Message: 4 Date: Tue, 31 Aug 2004 11:34:04 -0700 From: Sean Hefty Subject: Re: [openib-general] ib_mad.h: ib_mad_reg_req question To: Hal Rosenstock Cc: openib-general at openib.org Message-ID: <20040831113404.45a65e0d.mshefty at ichips.intel.com> Content-Type: text/plain; charset=US-ASCII On Tue, 31 Aug 2004 15:03:10 -0400 Hal Rosenstock wrote: > To me, this is a 2 part question: > 1. Is there ever a need to get just one of these classes ? (I don't > think so but want to be sure...) > 2. If not, then is it acceptable to muddy the interface this way ? My first thought is that changing the API isn't worth the gain. As an aside question, does the SM receive unsolicited directed route MADs? I always thought of directed route MADs as being initiated by the SM, basically for configuration purposes, and then routed locally (via ib_process_mad) at the remote side. - Sean From mshefty at ichips.intel.com Tue Aug 31 14:59:23 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 31 Aug 2004 14:59:23 -0700 Subject: [openib-general] Re: [PATCH] ib_mad.h: Add in IB management class definitions In-Reply-To: <1093988376.1830.131.camel@localhost.localdomain> References: <1093988376.1830.131.camel@localhost.localdomain> Message-ID: <20040831145923.661c12f6.mshefty@ichips.intel.com> On Tue, 31 Aug 2004 17:39:37 -0400 Hal Rosenstock wrote: > ib_mad.h: Add in IB management class definitions > Vendor and application class definitions are still pending. Thanks! I've integrated into my working repository, but some nit-picky comments below...

> +/* Management classes */
> +#define IB_MGMT_CLASS_PERF 0x04
> +#define IB_MGMT_CLASS_BM 0x05
> +#define IB_MGMT_CLASS_DEV_MGT 0x06
> +#define IB_MGMT_CLASS_COM_MGT 0x07

I've gone back and forth on the names here.
The names closest to the spec would be what you have: PERF, BM, DEV_MGT, and COM_MGT. For API consistency, we use MGMT, instead of MGT, and DEVICE instead of DEV. And I'm guessing that the resulting CM API will use "cm" or "conn" in its name. Anyway, I'm inclined to go with: PERF_MGMT (or PM), BM, DEVICE_MGMT (or DM), and CM. Does anyone care or have an opinion?
- Sean From halr at voltaire.com Tue Aug 31 16:20:09 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 31 Aug 2004 19:20:09 -0400 Subject: [openib-general] RE: openib-general Digest, Vol 2, Issue 81 In-Reply-To: <000001c48faa$c78b1650$6b01a8c0@maverick> References: <000001c48faa$c78b1650$6b01a8c0@maverick> Message-ID: <1093994408.1836.163.camel@localhost.localdomain> On Tue, 2004-08-31 at 18:35, Jan Daley wrote: > There is one case in which a SM can receive an unsolicited directed > route MAD. The master SM could receive a directed route SubnGet(SMInfo) > from another SM on the subnet that is in the discovering state. It could also be LR (as well as DR) SubnGet(SMInfo) for when the standbys are polling the master. -- Hal From halr at voltaire.com Tue Aug 31 16:25:35 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 31 Aug 2004 19:25:35 -0400 Subject: [Fwd: Re: [openib-general] RE: openib-general Digest, Vol 2, Issue 81] Message-ID: <1093994735.1836.168.camel@localhost.localdomain> One other exception case is the one which Roland has brought up which is Tavor specific: locally generated SM traps (LR SM traps with SLID 0). -----Forwarded Message----- From: Hal Rosenstock To: Jan Daley Cc: openib-general at openib.org Subject: Re: [openib-general] RE: openib-general Digest, Vol 2, Issue 81 Date: Tue, 31 Aug 2004 19:20:09 -0400 On Tue, 2004-08-31 at 18:35, Jan Daley wrote: > There is one case in which a SM can receive an unsolicited directed > route MAD. The master SM could receive a directed route SubnGet(SMInfo) > from another SM on the subnet that is in the discovering state. It could also be LR (as well as DR) SubnGet(SMInfo) for when the standbys are polling the master.
-- Hal From halr at voltaire.com Tue Aug 31 16:38:56 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 31 Aug 2004 19:38:56 -0400 Subject: [openib-general] DAPL for openib In-Reply-To: <20040831144301.606f0ede.mshefty@ichips.intel.com> References: <1093359001.1831.28.camel@localhost.localdomain> <52zn4kr0p1.fsf@topspin.com> <20040824085223.71efa553.mshefty@ichips.intel.com> <52eklwqxrn.fsf@topspin.com> <20040825194344.GA2399@mellanox.co.il> <52smabc7r2.fsf@topspin.com> <20040825202141.GA2672@mellanox.co.il> <20040825134356.7491f403.mshefty@ichips.intel.com> <20040825215558.GB2829@mellanox.co.il> <52brgydgnk.fsf@topspin.com> <20040825221338.GC2829@mellanox.co.il> <20040825142517.70ec64a9.mshefty@ichips.intel.com> <1093988194.1836.127.camel@localhost.localdomain> <20040831140909.7c63f866.mshefty@ichips.intel.com> <1093991038.1836.160.camel@localhost.localdomain> <20040831144301.606f0ede.mshefty@ichips.intel.com> Message-ID: <1093995535.1837.181.camel@localhost.localdomain> On Tue, 2004-08-31 at 17:43, Sean Hefty wrote: > I don't think we're matching our terminology. The MAD layer needs to know how to route, > and will operate based on class/version/method/etc., but doesn't need to know the specifics > for any given class. I.e. it routes based on the values of given fields, not their meaning, > wherever possible. > > The registration process is intended to provide the MAD layer a set of values > that it uses to route with. The MAD layer shouldn't care what those values mean. > This is what I meant when I said that the MAD layer doesn't need to "know" which > methods are unsolicited. Understood. What got me started on this was the following comment: The method array is only needed if a client doesn't register to receive all unsolicited MADs for a specific class. 
I think the above comment describes the base routing requirement rather than how we are doing this: with the current definition of ib_mad_reg_req, there is no way for a client to not specify the method array for a class. > I *think* just RMPP, which doesn't use the response bit in all cases (or has the bit flipped). > This is really an internal issue inside the MAD layer, but may affect the implementation of how > MAD routing is done. I'm going to defer this and worry about it more when I get (back) to RMPP, which won't be for a little while yet. -- Hal From roland at topspin.com Tue Aug 31 16:45:02 2004 From: roland at topspin.com (Roland Dreier) Date: Tue, 31 Aug 2004 16:45:02 -0700 Subject: [Fwd: Re: [openib-general] RE: openib-general Digest, Vol 2, Issue 81] In-Reply-To: <1093994735.1836.168.camel@localhost.localdomain> (Hal Rosenstock's message of "Tue, 31 Aug 2004 19:25:35 -0400") References: <1093994735.1836.168.camel@localhost.localdomain> Message-ID: <52vfey3m9t.fsf@topspin.com> Hal> One other exception case is the one which Roland has brought Hal> up which is Tavor specific: locally generated SM traps (LR SM Hal> traps with SLID 0). This isn't an exception to the rule about DR SMPs, though (since the traps are LID routed). - R. From halr at voltaire.com Tue Aug 31 17:09:41 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 31 Aug 2004 20:09:41 -0400 Subject: [Fwd: Re: [openib-general] RE: openib-general Digest, Vol 2, Issue 81] In-Reply-To: <52vfey3m9t.fsf@topspin.com> References: <1093994735.1836.168.camel@localhost.localdomain> <52vfey3m9t.fsf@topspin.com> Message-ID: <1093997381.1837.188.camel@localhost.localdomain> On Tue, 2004-08-31 at 19:45, Roland Dreier wrote: > Hal> One other exception case is the one which Roland has brought > Hal> up which is Tavor specific: locally generated SM traps (LR SM > Hal> traps with SLID 0). > > This isn't an exception to the rule about DR SMPs, though (since the > traps are LID routed).
Right, I meant unsolicited methods in addition to the ones I had listed. Sorry for the confusion. -- Hal
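[Editorial note: as a summary of the thread, Hal's per-class list of potentially unsolicited methods could be encoded in a small lookup, using the class values from the posted ib_mad.h patch. The method opcodes used here (Send 0x03, Trap 0x05, Report 0x06) are the standard IBA values; the function itself is only an illustration, and the SA class plus the SMInfo and Tavor-trap exceptions discussed above are deliberately left out.]

```c
#include <assert.h>

/* Management class values from the ib_mad.h patch in this thread */
#define IB_MGMT_CLASS_SUBN_LID_ROUTED		0x01
#define IB_MGMT_CLASS_SUBN_ADM			0x03
#define IB_MGMT_CLASS_PERF			0x04
#define IB_MGMT_CLASS_BM			0x05
#define IB_MGMT_CLASS_DEV_MGT			0x06
#define IB_MGMT_CLASS_COM_MGT			0x07
#define IB_MGMT_CLASS_SNMP			0x08
#define IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE	0x81

/* Standard IBA method opcodes (subset) */
#define IB_METHOD_SEND		0x03
#define IB_METHOD_TRAP		0x05
#define IB_METHOD_REPORT	0x06

/* Methods that may arrive unsolicited, per Hal's per-class list.
 * SA (SUBN_ADM) is omitted: all of its request methods qualify.
 * The exceptions discussed in the thread (SubnGet(SMInfo) between
 * SMs, Tavor's locally generated LR SM traps) are not captured. */
static unsigned unsolicited_methods(unsigned char mgmt_class)
{
	switch (mgmt_class) {
	case IB_MGMT_CLASS_SUBN_LID_ROUTED:
	case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE:
		return 1u << IB_METHOD_TRAP;
	case IB_MGMT_CLASS_COM_MGT:
	case IB_MGMT_CLASS_SNMP:
		return 1u << IB_METHOD_SEND;
	case IB_MGMT_CLASS_BM:
		return (1u << IB_METHOD_SEND) | (1u << IB_METHOD_TRAP) |
		       (1u << IB_METHOD_REPORT);
	case IB_MGMT_CLASS_DEV_MGT:
		return (1u << IB_METHOD_TRAP) | (1u << IB_METHOD_REPORT);
	case IB_MGMT_CLASS_PERF:	/* PM: none */
	default:
		return 0;
	}
}
```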