From yaronh at voltaire.com Thu Jul 29 08:44:57 2004
From: yaronh at voltaire.com (Yaron Haviv)
Date: Thu, 29 Jul 2004 18:44:57 +0300
Subject: [openib-general] gen2 dev branch
Message-ID: <35EA21F54A45CB47B879F21A91F4862F18AA6F@taurus.voltaire.com>

On Thursday, July 29, 2004 5:31 PM, Roland Dreier wrote:
>     Yaron> The rather simplistic QP1 approach suggested by Roland
>     Yaron> cannot work for others who did implement and built on top
>     Yaron> of functionality such as RMPP and Redirect, not to mention
>     Yaron> its poor scalability that cannot service the deployments we
>     Yaron> are used to.
>
> Why can't it work?  I think the fact that Topspin's drivers work on
> 500+ node fabrics with such a simple-minded MAD layer is a strong
> argument against other layers, which look over-engineered and
> inflexible to me.

As far as I know, you are not working with multipathing, with multiple
HCAs per server, or with tons of CPUs per node in many-node
configurations. This requires multiple SA consumers, RMPP for GetTable
and MultiPathQuery, and very high SA performance. I also don't know what
you implemented for Trap and Report support, or how you use it in such
large configurations.

I don't see where the over-engineering is in our/Todd's suggestion.
There are only a few params in the API; we just suggest putting more
things in a common place, while you are suggesting scattering them all
over and taking us backwards.
I do know it is different from your implementation; maybe that is the
reason for the strong resistance :)

The gsi.h file was already changed to accommodate your genuine memory
management concerns. We appreciate any productive suggestions, and some
of the work we are doing now is to adapt the implementation to it.
(A few numbers: our SM manages configurations with thousands of ports,
we have systems with 512 CPUs on a single system and dozens of HCAs in
each, and we are staging a cluster with over 10,000 CPUs if you read the
news, so we may have learned one or two things about IB scalability and
efficiency the hard way. It's not all about how the code looks.)

>     Yaron> I didn't hear from Roland any reason why the proposed gsi
>     Yaron> doesn't answer his ULP's functional requirements, I believe
>     Yaron> it does, and he can benefit from the
>     Yaron> added value in future as well.
>
> I'm very concerned about the lack of flexibility in the proposed API.
> For example, only allowing one manager to register per GSI class
> wouldn't work well for a subnet manager that wants to handle
> multicast queries in one thread and path record requests in another
> thread.

First, that's not how OpenSM works, or is planned to work in the near
future, and it is more important to make the simple use cases more
efficient by eliminating additional filtering and extra copies. But as I
mentioned before, the attribute field can be added if you really need it
for your ULPs (it does not change the gsi model, it just adds a param).

I'm open to hearing other real use cases you feel are not met by our
proposal, and why you think it is complex. I think the TID-based demux
for clients is much simpler than the multi-filter, multi-copy approach
you suggest.

The Redirect code can be #if 0 initially if you are concerned about
having it there.

Also, can you describe how we can test our cleaned gsi version with
mthca and the new core? Will you put it in tree as Sean suggested ?
Is there an .h file with additional access api's beyond verbs you can
point us to ?

Yaron

From roland at topspin.com Thu Jul 29 09:08:03 2004
From: roland at topspin.com (Roland Dreier)
Date: Thu, 29 Jul 2004 09:08:03 -0700
Subject: [openib-general] gen2 dev branch
In-Reply-To: <35EA21F54A45CB47B879F21A91F4862F18AA6F@taurus.voltaire.com>
	(Yaron Haviv's message of "Thu, 29 Jul 2004 18:44:57 +0300")
References: <35EA21F54A45CB47B879F21A91F4862F18AA6F@taurus.voltaire.com>
Message-ID: <52vfg6vlss.fsf@topspin.com>

    Yaron> Also can you describe how can we test our cleaned gsi
    Yaron> version with mthca and the new core ?  Will you put it in
    Yaron> tree as Sean suggested ?  Is there an .h file with
    Yaron> additional access api's beyond verbs you can point us to ?

    svn cp https://openib.org/svn/gen2/branches/roland-merge https://openib.org/svn/gen2/branches/YOUR-BRANCH-NAME
    svn co https://openib.org/svn/gen2/branches/YOUR-BRANCH-NAME

and enjoy...

Obviously since we haven't finished defining ib_verbs.h, my branch is
not up-to-date with the (as yet nonexistent) new verbs API.

All the include files are in src/linux-kernel/infiniband/include.

 - R.

From robert.j.woodruff at intel.com Thu Jul 29 09:23:10 2004
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Thu, 29 Jul 2004 09:23:10 -0700
Subject: [openib-general] gen2 dev branch
Message-ID: <1AC79F16F5C5284499BB9591B33D6F00CA630B@orsmsx408>

>I don't need write access unless no one else wants to touch IPoIB. I still only have a
>high level view of IB and don't understand the nuances of all the VAPI interfaces being
>discussed.

Which version of IPoIB are we going to start with to port to the new
verbs API ? and who is going to be the maintainer ?
Looks like Grant is offering to help if no one else has the cycles.
woody

From trimmer at infiniconsys.com Thu Jul 29 09:26:56 2004
From: trimmer at infiniconsys.com (Rimmer, Todd)
Date: Thu, 29 Jul 2004 12:26:56 -0400
Subject: [openib-general] mpirun_rsh help
Message-ID: <08628CA53C6CBA4ABAFB9E808A5214CB02294B96@mercury.infiniconsys.com>

> -----Original Message-----
> From: Parul Bhatt [mailto:parul_sunil at yahoo.com]
> Sent: Thursday, July 29, 2004 10:57 AM
> I do understand looking at the error messages that
> VAPI is not SUCCESSFUL, but what I do not understand
> is "Got completion with error,
> code=VAPI_RETRY_EXC_ERR,
> vendor code=81
> PMB-MPI1: viacheck.c:2104: viutil_spinandwaitcq:"
>
> What we are missing? Can someone inform/help us with
> this? We are using LINUX operating system with
> 2.4.... Kernel.

This error means the InfiniBand Reliable Connection retry count was
exceeded. Most likely you have a marginal cable or port on the hardware.

Todd R.

From trimmer at infiniconsys.com Thu Jul 29 09:29:33 2004
From: trimmer at infiniconsys.com (Rimmer, Todd)
Date: Thu, 29 Jul 2004 12:29:33 -0400
Subject: [openib-general] VAPI like API
Message-ID: <08628CA53C6CBA4ABAFB9E808A5214CB02294B97@mercury.infiniconsys.com>

> -----Original Message-----
> From: Roland Dreier [mailto:roland at topspin.com]
> Sent: Thursday, July 29, 2004 10:50 AM
> To: openib-general at openib.org
> Subject: Re: [openib-general] VAPI like API
>
>
>     Sean> If not ib_device, I'm guessing that there will be some
>     Sean> structure per client in the access layer...?  I do agree
>     Sean> that having separate send and receive handlers would be
>     Sean> better than a single completion handler.  So, at a minimum,
>     Sean> I'd vote for _at least_ two completion handlers per client.
>
> There's no fundamental reason for the AL to have a per consumer
> structure right now, although we still need to work out how to tie
> into the kernel device model.
> By the way, I don't think the IB spec's
> concept of enumerating HCAs and then having the consumer open the HCAs
> it likes works well in the kernel.  That's sort of like the old PCI
> driver model, and it doesn't work well in a dynamic world where
> devices can come and go.  We should rather think in terms of consumers
> registering with the AL and getting called back when devices appear or
> disappear (and I'm hoping the driver model can do this for us without
> having to create our own mechanism).
>
> BTW, if we're going to have two completion handlers, it seems we might
> as well have the handler be per CQ (since it seems worse to me to have
> an index/pointer stored in struct ib_cq that's just used to look up
> the actual function pointer to call).

Yes, in our implementation the HCA drivers call into the access layer on
PCI device discovery. ULPs can register with the access layer for
callbacks when CAs appear (and/or the state of a link changes).

This does not mean that open is a bad call; the response to these
notifications was for the access layer or ULP to open the CA and begin
performing operations against it.

Todd R.

From roland at topspin.com Thu Jul 29 09:30:29 2004
From: roland at topspin.com (Roland Dreier)
Date: Thu, 29 Jul 2004 09:30:29 -0700
Subject: [openib-general] gen2 dev branch
In-Reply-To: <1AC79F16F5C5284499BB9591B33D6F00CA630B@orsmsx408>
	(Robert J. Woodruff's message of "Thu, 29 Jul 2004 09:23:10 -0700")
References: <1AC79F16F5C5284499BB9591B33D6F00CA630B@orsmsx408>
Message-ID: <52hdrqvkre.fsf@topspin.com>

    Robert> Which version of IPoIB are we going to start with to port
    Robert> to the new verbs API ?  and who is going to be the
    Robert> maintainer ?

I am porting our IPoIB driver as I update mthca so that I have
something to test with.  However the more ULPs the merrier so I don't
think we have to pick only one IPoIB driver.

 - R.
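
The registration model discussed in the "VAPI like API" messages above
(consumers register with the access layer and are called back as devices
appear or disappear, instead of enumerating and opening HCAs) can be
sketched in plain user-space C. This is only an illustration under stated
assumptions: every name below (example_device, example_consumer,
example_register_consumer, and so on) is hypothetical and does not come
from any posted header, and a kernel version would need locking and would
probably hang off the driver model rather than fixed-size arrays.

```c
#include <stddef.h>

#define EXAMPLE_MAX_DEVICES   8
#define EXAMPLE_MAX_CONSUMERS 8

/* Hypothetical stand-in for a channel adapter known to the access layer. */
struct example_device {
	const char *name;
};

/* Hypothetical consumer (ULP) with per-device add/remove callbacks. */
struct example_consumer {
	void (*add)(struct example_device *dev, void *ctx);
	void (*remove)(struct example_device *dev, void *ctx);
	void *ctx;
};

static struct example_device   *example_devices[EXAMPLE_MAX_DEVICES];
static struct example_consumer *example_consumers[EXAMPLE_MAX_CONSUMERS];

/* Register a consumer and replay add() for devices that already exist. */
static int example_register_consumer(struct example_consumer *c)
{
	size_t i, slot = EXAMPLE_MAX_CONSUMERS;

	for (i = 0; i < EXAMPLE_MAX_CONSUMERS; i++) {
		if (!example_consumers[i]) {
			slot = i;
			break;
		}
	}
	if (slot == EXAMPLE_MAX_CONSUMERS)
		return -1;
	example_consumers[slot] = c;

	for (i = 0; i < EXAMPLE_MAX_DEVICES; i++)
		if (example_devices[i])
			c->add(example_devices[i], c->ctx);
	return 0;
}

/* Called by the (hypothetical) HCA driver on device discovery. */
static int example_device_added(struct example_device *dev)
{
	size_t i, slot = EXAMPLE_MAX_DEVICES;

	for (i = 0; i < EXAMPLE_MAX_DEVICES; i++) {
		if (!example_devices[i]) {
			slot = i;
			break;
		}
	}
	if (slot == EXAMPLE_MAX_DEVICES)
		return -1;
	example_devices[slot] = dev;

	for (i = 0; i < EXAMPLE_MAX_CONSUMERS; i++)
		if (example_consumers[i])
			example_consumers[i]->add(dev, example_consumers[i]->ctx);
	return 0;
}

/* Called when a device goes away; unknown devices are ignored. */
static void example_device_removed(struct example_device *dev)
{
	size_t i;

	for (i = 0; i < EXAMPLE_MAX_DEVICES; i++) {
		if (example_devices[i] != dev)
			continue;
		example_devices[i] = NULL;
		for (i = 0; i < EXAMPLE_MAX_CONSUMERS; i++)
			if (example_consumers[i])
				example_consumers[i]->remove(dev, example_consumers[i]->ctx);
		return;
	}
}

/* Sample callbacks: ctx points at an int counting devices currently seen. */
static void example_track_add(struct example_device *dev, void *ctx)
{
	(void)dev;
	++*(int *)ctx;
}

static void example_track_remove(struct example_device *dev, void *ctx)
{
	(void)dev;
	--*(int *)ctx;
}
```

Replaying add() at registration time means a consumer sees the same
callbacks whether it loads before or after the HCA driver, which is the
property that the enumerate-then-open model lacks.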
From roland at topspin.com Thu Jul 29 09:32:51 2004
From: roland at topspin.com (Roland Dreier)
Date: Thu, 29 Jul 2004 09:32:51 -0700
Subject: [openib-general] Register virtual for kernel API?
In-Reply-To: <20040728212124.79f515a0.mshefty@ichips.intel.com>
	(Sean Hefty's message of "Wed, 28 Jul 2004 21:21:24 -0700")
References: <52d62fyfe7.fsf@topspin.com>
	<20040728212124.79f515a0.mshefty@ichips.intel.com>
Message-ID: <52brhyvkng.fsf@topspin.com>

    Sean> My initial thought is to agree.  Doesn't iwarp only support
    Sean> physical registration?  (Meaning, if so, then the iwarp
    Sean> designers didn't find it necessary either.)

OK, here's a patch removing ib_reg_mr, ib_rereg_mr and ib_reg_smr (I
don't think shared memory makes sense in the kernel).  I also changed
ib_rereg_phys_mr to return an int, since I assume it will update the
struct ib_mr * passed to it without allocating a new one.

 - R.

Index: ib_verbs.h
===================================================================
--- ib_verbs.h	(revision 539)
+++ ib_verbs.h	(working copy)
@@ -616,13 +616,6 @@

 int ib_destroy_cq(struct ib_cq *cq);

-struct ib_mr *ib_reg_mr(struct ib_pd *pd,
-                        void *addr,
-                        u64 size,
-                        int mr_access_flags,
-                        u32 *lkey,
-                        u32 *rkey);
-
 struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd,
                              struct ib_phys_buf *phys_buf_array,
                              int num_phys_buf,
@@ -636,32 +629,16 @@

 int ib_dereg_mr(struct ib_mr *mr);

-struct ib_mr *ib_rereg_mr(struct ib_mr *mr,
-                          int mr_rereg_mask,
-                          struct ib_pd *pd,
-                          void *addr,
-                          u64 size,
-                          int mr_access_flags,
-                          u32 *lkey,
-                          u32 *rkey);
+int ib_rereg_phys_mr(struct ib_mr *mr,
+                     int mr_rereg_mask,
+                     struct ib_pd *pd,
+                     struct ib_phys_buf *phys_buf_array,
+                     int num_phys_buf,
+                     int mr_access_flags,
+                     u64 *iova_start,
+                     u32 *lkey,
+                     u32 *rkey);

-struct ib_mr *ib_rereg_phys_mr(struct ib_mr *mr,
-                               int mr_rereg_mask,
-                               struct ib_pd *pd,
-                               struct ib_phys_buf *phys_buf_array,
-                               int num_phys_buf,
-                               int mr_access_flags,
-                               u64 *iova_start,
-                               u32 *lkey,
-                               u32 *rkey);
-
-struct ib_mr *ib_reg_smr(struct ib_mr *mr,
-                         struct ib_pd *pd,
-                         int mr_access_flags,
-                         u64 *iova_start,
-                         u32 *lkey,
-                         u32 *rkey);
-
 struct ib_mw *ib_alloc_mw(struct ib_pd *pd,
                           u32 *rkey);


From mshefty at ichips.intel.com Thu Jul 29 08:59:36 2004
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 29 Jul 2004 08:59:36 -0700
Subject: [openib-general] VAPI like API
In-Reply-To: <521xiux3zd.fsf@topspin.com>
References: <001801c46e5c$a0120a80$6401a8c0@comcast.net>
	<52fz7m6799.fsf@topspin.com>
	<00b601c46e72$90c25b00$6401a8c0@comcast.net>
	<52y8le4q6q.fsf@topspin.com>
	<000801c46f41$f751b120$6401a8c0@comcast.net>
	<20040727145917.00000a91@mshefty-mobl.amr.corp.intel.com>
	<52acxl2aya.fsf@topspin.com>
	<20040728113736.00001722@mshefty-mobl.amr.corp.intel.com>
	<52n01jwhjk.fsf@topspin.com>
	<20040728205732.77512f42.mshefty@ichips.intel.com>
	<521xiux3zd.fsf@topspin.com>
Message-ID: <20040729085936.193724c6.mshefty@ichips.intel.com>

On Thu, 29 Jul 2004 07:49:58 -0700
Roland Dreier wrote:

> There's no fundamental reason for the AL to have a per consumer
> structure right now, although we still need to work out how to tie
> into the kernel device model.  By the way, I don't think the IB spec's
> concept of enumerating HCAs and then having the consumer open the HCAs
> it likes works well in the kernel.  That's sort of like the old PCI
> driver model, and it doesn't work well in a dynamic world where
> devices can come and go.  We should rather think in terms of consumers
> registering with the AL and getting called back when devices appear or
> disappear (and I'm hoping the driver model can do this for us without
> having to create our own mechanism).

I'm fine with thinking this way.  But wouldn't the registration with AL
result in a per client structure, or are you hoping that this would be
hidden by the normal driver model?

> BTW, if we're going to have two completion handlers, it seems we might
> as well have the handler be per CQ (since it seems worse to me to have
> an index/pointer stored in struct ib_cq that's just used to look up
> the actual function pointer to call).

Thinking about this after sending the message, it seems that if you
wanted to allow two handlers per client, then the struct ib_cq would
need at least a u8 index field.  So, I'm tending to agree here.

From mshefty at ichips.intel.com Thu Jul 29 09:04:36 2004
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 29 Jul 2004 09:04:36 -0700
Subject: [openib-general] VAPI like API
In-Reply-To: <00a401c47574$295f8b00$6401a8c0@comcast.net>
References: <08628CA53C6CBA4ABAFB9E808A5214CB016974D0@mercury.infiniconsys.com>
	<20040728151500.0529c1af.mshefty@ichips.intel.com>
	<00a401c47574$295f8b00$6401a8c0@comcast.net>
Message-ID: <20040729090436.219e871b.mshefty@ichips.intel.com>

On Thu, 29 Jul 2004 09:58:42 -0400
Hal Rosenstock wrote:

> Sean Hefty wrote:
> > On Wed, 28 Jul 2004 08:17:40 -0400
> > "Rimmer, Todd" wrote:
> >> A few more updates:
> >> - added neighbor_mtu to ib_device_port.
>
> Would ib_port be sufficient as a name rather than ib_device_port ?

I'm okay with this.  Any objections?

> >> max_mtu is the hardware
> >> capability while neighbor_mtu is the configured active MTU for a
> >> Armed/Active port.
> >
> > Do we need to distinguish between the HW versus active MTUs?  Seems
> > like only the active MTU is usable.
>
> Yes, only the active (neighbor) MTU is usable but the max_mtu is used
> by the SM to determine a compatible neighbor MTU at both ends of each
> link.

Okay - I'll add an active_mtu field to ib_[device]_port.  (I like the
name active better than neighbor, since the value should be the minimum
of the local and remote MTUs.)
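
To make the max_mtu / active_mtu distinction concrete, here is a minimal
sketch of the "minimum of the local and remote MTUs" rule. The names
(example_mtu, example_port, example_active_mtu, example_mtu_bytes) are
hypothetical and are not taken from ib_verbs.h; the numeric codes follow
the IB MTU encoding (1 = 256 bytes up to 5 = 4096 bytes). A real SM may
of course configure a smaller value than this upper bound.

```c
/* IB-style MTU codes: 1 = 256 bytes, 2 = 512, ... 5 = 4096. */
enum example_mtu {
	EXAMPLE_MTU_256  = 1,
	EXAMPLE_MTU_512  = 2,
	EXAMPLE_MTU_1024 = 3,
	EXAMPLE_MTU_2048 = 4,
	EXAMPLE_MTU_4096 = 5
};

/* Hypothetical stand-in for the port attributes under discussion. */
struct example_port {
	enum example_mtu max_mtu;    /* hardware capability */
	enum example_mtu active_mtu; /* value configured for the link */
};

/* Largest MTU usable on a link: the smaller of the two ends' capabilities. */
static enum example_mtu example_active_mtu(enum example_mtu local_max,
					   enum example_mtu remote_max)
{
	return local_max < remote_max ? local_max : remote_max;
}

/* Convert an MTU code to bytes; returns -1 for codes outside 1..5. */
static int example_mtu_bytes(enum example_mtu mtu)
{
	if (mtu < EXAMPLE_MTU_256 || mtu > EXAMPLE_MTU_4096)
		return -1;
	return 128 << mtu;
}
```

Because the codes are ordered by size, the comparison can be done on the
encoded values directly, without converting to bytes first.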
From mshefty at ichips.intel.com Thu Jul 29 09:08:53 2004
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 29 Jul 2004 09:08:53 -0700
Subject: [openib-general] VAPI like API
In-Reply-To: <009101c47572$d538af80$6401a8c0@comcast.net>
References: <08628CA53C6CBA4ABAFB9E808A5214CB016974D0@mercury.infiniconsys.com>
	<20040728151500.0529c1af.mshefty@ichips.intel.com>
	<20040728210359.76748bb5.mshefty@ichips.intel.com>
	<009101c47572$d538af80$6401a8c0@comcast.net>
Message-ID: <20040729090853.5d6f0096.mshefty@ichips.intel.com>

On Thu, 29 Jul 2004 09:49:11 -0400
Hal Rosenstock wrote:

> I think it's more than debug printing. It's anything that makes any
> of this visible to a user (e.g. network management applications, etc.).

For any application to make use of these values, they must all be in the
same format.  The current implementations require converting *some* of
the values from host to network order (or vice-versa), while others are
already converted.  The data in MADs will be in network order, while we
have a choice at the access layer API.

From halr at voltaire.com Thu Jul 29 10:41:56 2004
From: halr at voltaire.com (Hal Rosenstock)
Date: Thu, 29 Jul 2004 13:41:56 -0400
Subject: [openib-general] gen2 dev branch
References: <35EA21F54A45CB47B879F21A91F4862F18AA39@taurus.voltaire.com>
	<526586x4uv.fsf@topspin.com>
Message-ID: <019801c47593$591f91e0$6401a8c0@comcast.net>

Roland Dreier wrote:
> I'm very concerned about the lack of flexibility in the proposed API.
> For example, only allowing one manager to register per GSI class
> wouldn't work well for a subnet manager that wants to handle multicast
> queries in one thread and path record requests in another thread.

The GSI API can readily accommodate attribute as an additional parameter
if that is a requirement (as has been previously stated).

> Here's another way of putting it.
> I think we should aim for:
>
>     pid_t fork(void);
>
> And right now I'm afraid we have:
>
>     BOOL CreateProcess(
>       LPCTSTR lpApplicationName,
>       LPTSTR lpCommandLine,
>       LPSECURITY_ATTRIBUTES lpProcessAttributes,
>       LPSECURITY_ATTRIBUTES lpThreadAttributes,
>       BOOL bInheritHandles,
>       DWORD dwCreationFlags,
>       LPVOID lpEnvironment,
>       LPCTSTR lpCurrentDirectory,
>       LPSTARTUPINFO lpStartupInfo,
>       LPPROCESS_INFORMATION lpProcessInformation
>     );

I don't think that the above is a fair comparison. It seems to me it's
more like the contrast of registering a MAD filter against the register
GSI class call.

    int ib_mad_handler_register(struct ib_mad_filter *filter,
                                ib_mad_dispatch_func function,
                                void *arg,
                                tTS_IB_MAD_FILTER_HANDLE *handle);

where

    struct ib_mad_filter {
            struct ib_device       *device;
            tTS_IB_PORT             port;
            __u32                   qpn;
            uint8_t                 mgmt_class;
            uint8_t                 r_method;
            uint16_t                attribute_id;
            tTS_IB_MAD_DIRECTION    direction;
            tTS_IB_MAD_FILTER_MASK  mask;
            char                    name[TS_IB_MAD_FILTER_NAME_MAX];
    };

    int gsi_reg_class(u8 mgmt_class,
                      u8 mgmt_class_version,
                      char *hca_name,
                      u8 port,
                      enum ib_gsi_reg_flags reg_flags,
                      ib_gsi_send_comp_handler send_comp_handler,
                      ib_gsi_recv_handler recv_handler,
                      void *context,
                      void **handle);

Some of the parameters to the register GSI class call could be collapsed
under a single parameter, as has been done in the MAD filter, if it is
the width of the API that is the issue, but I don't think that's the core
of the issue.

-- Hal

From mshefty at ichips.intel.com Thu Jul 29 10:29:45 2004
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 29 Jul 2004 10:29:45 -0700
Subject: [openib-general] VAPI like API
In-Reply-To: <506C3D7B14CDD411A52C00025558DED60585B9A6@mtlex01.yok.mtl.com>
References: <506C3D7B14CDD411A52C00025558DED60585B9A6@mtlex01.yok.mtl.com>
Message-ID: <20040729102945.1f6e7bc6.mshefty@ichips.intel.com>

On Thu, 29 Jul 2004 09:55:32 +0300
Dror Goldenberg wrote:

> What's wrong with the name ? They *are* really fast memory regions.

I should have said "fast memory registration" instead of regions.  And I
didn't say that anything was wrong with the name, just that VAPI and the
spec used similar names for different registration processes.

> I think that the names will never conflict.

I agree.  I was only suggesting a name change to help avoid confusion,
but I didn't have anything in mind.  I was going to leave the calls as
they are for now.

From robert.j.woodruff at intel.com Thu Jul 29 13:25:25 2004
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Thu, 29 Jul 2004 13:25:25 -0700
Subject: [openib-general] gen2 dev branch
Message-ID: <1AC79F16F5C5284499BB9591B33D6F00CA630F@orsmsx408>

Sean wrote,
>I'd like to start forming a development branch for the gen2 code.  Any thoughts or ideas
>on the layout (besides making it easy to eventually drop into the kernel)?

>I think starting with just mthca and the access layer would work for now, until we can
>get far enough along to get everything compiling, and then loading/unloading.  Then
>ipoib can join the fun after we have all the required access layer features.

We could create something like
https://openib.org/svn/gen2/branches/accesslayer
and start to fill it in, initially with the header files and then pull
code over as it is converted or developed. Then once we have something
that works, we can rename it to
https://openib.org/svn/gen2/accesslayer

From yaronh at voltaire.com Thu Jul 29 13:39:39 2004
From: yaronh at voltaire.com (Yaron Haviv)
Date: Thu, 29 Jul 2004 23:39:39 +0300
Subject: [openib-general] gen2 dev branch
Message-ID: <35EA21F54A45CB47B879F21A91F4862F18AA7B@taurus.voltaire.com>

On Thursday, July 29, 2004 11:25 PM, Woodruff, Robert J wrote:
> Sean wrote,
>> I'd like to start forming a development branch for the gen2 code.
>> Any thoughts or ideas on the layout (besides making it easy to
>> eventually drop into the kernel)?
>
>> I think starting with just mthca and the access layer would work for
>> now, until we can get far enough along to get everything compiling,
>> and then loading/unloading.  Then ipoib can join the fun after we
>> have all the required access layer features.
>
> We could create something like
> https://openib.org/svn/gen2/branches/accesslayer
> and start to fill it in, initially with the header files and then
> pull code over as it is converted or developed. Then once we have
> something that works, we can rename it to
> https://openib.org/svn/gen2/accesslayer
>

Sounds good, or we can even call it gen2/accesslayer from day one.
Anyway, all the ULPs, the CM, etc. can just link to that.

Yaron

From halr at voltaire.com Thu Jul 29 14:00:26 2004
From: halr at voltaire.com (Hal Rosenstock)
Date: Thu, 29 Jul 2004 17:00:26 -0400
Subject: [openib-general] gen2 dev branch
References: <1AC79F16F5C5284499BB9591B33D6F00CA630F@orsmsx408>
Message-ID: <02e901c475af$13cdd680$6401a8c0@comcast.net>

Woodruff, Robert J wrote:
> We could create something like
> https://openib.org/svn/gen2/branches/accesslayer
> and start to fill it in, initially with the header files and then pull
> code over as it is converted or developed. Then once we have something
> that works, we can rename it
> to
> https://openib.org/svn/gen2/accesslayer

I think there are currently only access layer candidates :-) One of
those is some portion of
https://openib.org/svn/gen2/branches/roland-merge/src/linux-kernel/infiniband/core/.
I think there is only one other. If that is the case, we can stage it
there but it will need to be moved if it gets adopted as this will not be
its position in the tree for Linux inclusion. I would expect the tree to
resemble Roland's tree.

-- Hal

From mshefty at ichips.intel.com Thu Jul 29 13:59:09 2004
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 29 Jul 2004 13:59:09 -0700
Subject: [openib-general] VAPI like API
In-Reply-To: <08628CA53C6CBA4ABAFB9E808A5214CB02294B91@mercury.infiniconsys.com>
References: <08628CA53C6CBA4ABAFB9E808A5214CB02294B91@mercury.infiniconsys.com>
Message-ID: <20040729135909.7eb04b3e.mshefty@ichips.intel.com>

On Thu, 29 Jul 2004 09:45:46 -0400
"Rimmer, Todd" wrote:

> How do you see odd sized fields like the multicast gid group id being handled?
> Also when we get to actual MAD processing there are a lot of bitfields in the MAD/CM/SA packets.

My preference would be to use functions to get/set values in MADs that
are of odd sizes.  I would prefer to use bit masks, rather than bit
fields.  But, I'll defer to using whatever method is most commonly used
in the networking stack today.

> Personally I have debugged so many missing ntoh type bugs in the past that

This is why I'm saying that everything should be in network order.  You
get a LID or QP number from a path record/CM MAD/etc., then turn around
and use it unmodified in the access layer API.  Remove or reduce how
many times ntoh must be called, and the number of bugs should be
removed/reduced accordingly.

> We took a layered byteswapping approach.

This sounds error prone to me; however, I can understand why you might
take this approach with the MAD headers.

From yaronh at voltaire.com Thu Jul 29 15:05:58 2004
From: yaronh at voltaire.com (Yaron Haviv)
Date: Fri, 30 Jul 2004 01:05:58 +0300
Subject: [openib-general] gen2 dev branch
Message-ID: <35EA21F54A45CB47B879F21A91F4862F18AA81@taurus.voltaire.com>

On Thursday, July 29, 2004 7:32 AM, Roland Dreier wrote:
> stay out of this, but I still have the feeling that we are going in a
> bad direction here, putting too much policy into an overly complex,
> inflexible API.
> Rather than trying to decide in advance exactly what
> consumers should do and then adding enough flags and knobs to the API
> to implement exactly what we've decided, I think we should try and
> come up with a minimal mechanism to implement what we know we need.
> For example, we need to multiplex QP0/QP1, we want to have common
> RMPP code, etc.

Let's review the real differences between the two approaches as I see
them.

Our/Todd/Eitan approach registers one server per
hca/port/class/ver/flags (and optionally attrib). For clients, the
responses are directed based on the MSB of the TID (client code), and
traps/reports can have one or more registrations.

Roland's approach is to have a chain of filters
(hca/port/qpn/class/method/attrib/dir/mask/name) that forwards the MAD
to multiple consumers based on the filter list. The consumer must copy
the MAD; if, e.g., there are multiple SA clients, then each one will get
the MAD and will need to decide whether it is his MAD (I assume that's
also the reason attrib is needed in such an approach, not because of a
futuristic SM).

So I don't see how our demux approach is more "complex"; to me it looks
simpler and more transparent. I believe Sean supported the TID-based
client demux approach previously.

Since our approach is GSI MAD aware, it can also perform RMPP internally
(transparently). A client may send a MAD request to a server, and when a
redirect response arrives at the GSI, the GSI automatically resends the
request to the appropriate new target (without implementing this for
every consumer).

Our implementation also includes some performance optimizations that we
are removing now (for "simplicity"), such as Addr_Hndl caching and mem
caching; we got the message that performance is less important than
"simplicity" on this list.
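
The TID-based client demux described above can be sketched as follows.
This is a hedged illustration only, not the posted gsi code: the message
says only that responses are directed by the "MSB of TID (client code)",
so the 32/32 split of the 64-bit transaction ID, the table size, and all
names here (example_gsi_client, example_make_tid,
example_dispatch_response, ...) are assumptions made for the example.

```c
#include <stdint.h>
#include <stddef.h>

#define EXAMPLE_MAX_CLIENTS 16

typedef void (*example_recv_fn)(uint64_t tid, void *context);

/* Hypothetical per-client registration record. */
struct example_gsi_client {
	example_recv_fn recv;
	void *context;
};

/* Slot 0 is left unused so that a TID with a zero client code is invalid. */
static struct example_gsi_client example_gsi_clients[EXAMPLE_MAX_CLIENTS];

/* Register a client; returns its client code (>= 1) or -1 if the table is full. */
static int example_register_client(example_recv_fn recv, void *context)
{
	uint32_t id;

	for (id = 1; id < EXAMPLE_MAX_CLIENTS; id++) {
		if (!example_gsi_clients[id].recv) {
			example_gsi_clients[id].recv = recv;
			example_gsi_clients[id].context = context;
			return (int)id;
		}
	}
	return -1;
}

/* Build a request TID: client code in the upper 32 bits (assumed split). */
static uint64_t example_make_tid(uint32_t client_id, uint32_t seq)
{
	return ((uint64_t)client_id << 32) | seq;
}

/* Route a response to exactly one client, using only the TID. */
static int example_dispatch_response(uint64_t tid)
{
	uint32_t id = (uint32_t)(tid >> 32);

	if (id == 0 || id >= EXAMPLE_MAX_CLIENTS || !example_gsi_clients[id].recv)
		return -1;
	example_gsi_clients[id].recv(tid, example_gsi_clients[id].context);
	return 0;
}

/* Sample receive handler: context points at an int hit counter. */
static void example_count_recv(uint64_t tid, void *context)
{
	(void)tid;
	++*(int *)context;
}
```

The point of the design is that a response is delivered with one table
lookup and no copy, whereas a filter chain would offer the same MAD to
every matching consumer and leave each to decide whether it is theirs.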

For people who need RMPP, the best approach is to make it transparent
(also requested by Sean in the past), with a simple on/off flag, and not
by using libraries (which are not simpler and require the same amount of
code).

If/when people need Redirect support, the only reasonable place to
implement it is in the GSI layer, rather than duplicating it across the
code. As I mentioned, we can initially start with the redirect code
under #if 0.

All in all, I believe it's a better and simpler approach that
incorporates inputs from Roland, Todd, and Eitan (the open way).

Yaron

From mshefty at ichips.intel.com Thu Jul 29 14:20:58 2004
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 29 Jul 2004 14:20:58 -0700
Subject: [openib-general] ib_verbs.h updated
Message-ID: <20040729142058.5c4f90af.mshefty@ichips.intel.com>

I think I've collected all the input from the various e-mails now and
updated ib_verbs.h accordingly.  Some issues that I have listed that are
still open include:

* determine host/network order for various fields
* determine desired values for static rate
* re-examine hca query routines
* see if resize QP is supported by API

From roland at topspin.com Thu Jul 29 15:52:12 2004
From: roland at topspin.com (Roland Dreier)
Date: Thu, 29 Jul 2004 15:52:12 -0700
Subject: [openib-general] gen2 dev branch
In-Reply-To: <35EA21F54A45CB47B879F21A91F4862F18AA81@taurus.voltaire.com>
	(Yaron Haviv's message of "Fri, 30 Jul 2004 01:05:58 +0300")
References: <35EA21F54A45CB47B879F21A91F4862F18AA81@taurus.voltaire.com>
Message-ID: <527jsmtoir.fsf@topspin.com>

    Yaron> Roland approach is to have a chain of filters
    Yaron> (hca/port/qpn/class/method/attrib/dir/mask/name) that
    Yaron> forwards the MAD to multiple consumers based on the filter
    Yaron> list, the consumer must copy the MAD, if for e.g.
    Yaron> there are multiple SA clients then each one will get the
    Yaron> MAD and will need to decide if it's his MAD (I assume
    Yaron> that's also the reason attrib is needed in such approach,
    Yaron> not because of futuristic SM)

I guess I should have been clearer in my description.  Nothing forces
the consumer to copy the MAD unless it wants to keep the MAD around for
later.  Also nothing prevents us from implementing an SA layer on top of
the basic MAD layer that handles demultiplexing multiple consumers, etc.
(in fact that is how the Topspin stack works and what I would expect we
would want to do).

I'm not too concerned about the details of what I proposed; I would just
like to see a general layer that gives us the flexibility to experiment
and deal with future requirements.

In any case I remember now why I gave up on this discussion the first
time around.

 - Roland

From halr at voltaire.com Thu Jul 29 17:14:38 2004
From: halr at voltaire.com (Hal Rosenstock)
Date: Thu, 29 Jul 2004 20:14:38 -0400
Subject: [openib-general] ib_verbs.h nit
Message-ID: <03af01c475ca$34d78680$6401a8c0@comcast.net>

struct ib_mad; is no longer needed (since ib_process_local_mad was
excised).

From openib-in at polstra.com Thu Jul 29 18:37:48 2004
From: openib-in at polstra.com (John Polstra)
Date: Thu, 29 Jul 2004 18:37:48 -0700 (PDT)
Subject: [openib-general] [PATCH] Nasty bug in ipoib
Message-ID:

I stumbled onto a nasty bug in the ipoib code.  The call to
kmem_cache_create() for the _tsIp2prLinkRoot.user_req cache specifies
the element size as the size of the pointer rather than the size of the
structures that are stored in the cache.  The attached patch (relative
to the gen2 branch) fixes it.

I think this should be fixed in the gen1 branch as well, since it can
scribble on kernel memory that doesn't belong to it.
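
The class of bug described here (a pointer typedef and a struct typedef
with near-identical names, with sizeof applied to the wrong one) can be
reproduced in a few lines of standalone C. The type names below are
hypothetical stand-ins that only mirror the naming pattern of
tIP2PR_USER_REQ / tIP2PR_USER_REQ_STRUCT; the structure contents are
invented for the illustration.

```c
#include <stddef.h>

/* Hypothetical request structure; the field layout is invented. */
struct example_user_req {
	char payload[64];
	int  status;
};

/* Two typedefs that differ only by a suffix, as in the ipoib code. */
typedef struct example_user_req  EXAMPLE_USER_REQ_STRUCT; /* the object  */
typedef struct example_user_req *EXAMPLE_USER_REQ;        /* a pointer   */

/* What the buggy call passed as the cache element size. */
static size_t example_wrong_elem_size(void)
{
	return sizeof(EXAMPLE_USER_REQ);        /* size of a pointer */
}

/* What the cache actually needs to hold one element. */
static size_t example_right_elem_size(void)
{
	return sizeof(EXAMPLE_USER_REQ_STRUCT); /* size of the structure */
}
```

An allocator sized with the first value hands out pointer-sized objects,
so every write past the first few bytes of a request lands in memory the
cache does not own. Writing the size as sizeof(*ptr) on the variable
being allocated avoids depending on the typedef name at all.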

John
-------------- next part --------------
Index: ip2pr_link.c
===================================================================
--- ip2pr_link.c	(revision 544)
+++ ip2pr_link.c	(working copy)
@@ -2188,7 +2188,7 @@
 	} /* if */

 	_tsIp2prLinkRoot.user_req = kmem_cache_create("Ip2prUserReq",
-						      sizeof(tIP2PR_USER_REQ),
+						      sizeof(tIP2PR_USER_REQ_STRUCT),
 						      0, SLAB_HWCACHE_ALIGN,
 						      NULL, NULL);
 	if (NULL == _tsIp2prLinkRoot.user_req) {

From halr at voltaire.com Fri Jul 30 04:15:19 2004
From: halr at voltaire.com (Hal Rosenstock)
Date: Fri, 30 Jul 2004 07:15:19 -0400
Subject: [openib-general] ib_query/modify_hca_xxx in ib_verbs
Message-ID: <00ae01c47626$809e2860$6401a8c0@comcast.net>

Should the ib_query/modify_hca_xxx calls now be
ib_query/modify_device_xxx calls ?

ib_query_hca_cap
ib_query_hca_port_prop
ib_query_hca_gid_tbl
ib_query_hca_pkey_tbl
ib_modify_hca_attr

From roland at topspin.com Fri Jul 30 06:34:05 2004
From: roland at topspin.com (Roland Dreier)
Date: Fri, 30 Jul 2004 06:34:05 -0700
Subject: [openib-general] [PATCH] MR API changes
Message-ID: <521xitty9e.fsf@topspin.com>

This set of changes (also committed to gen2/branches/roland-merge)
implements the MR API changes.  The code in mthca_reg_phys_mr() for
figuring out alignment/page size is a little ugly and may still have
bugs but it works for the case we actually use: register all of lowmem
in one HCA page.

 - R.

Index: src/linux-kernel/infiniband/ulp/ipoib/ipoib_verbs.c
===================================================================
--- src/linux-kernel/infiniband/ulp/ipoib/ipoib_verbs.c	(revision 530)
+++ src/linux-kernel/infiniband/ulp/ipoib/ipoib_verbs.c	(working copy)
@@ -220,31 +220,19 @@
 {
 	/* XXX we assume physical memory starts at address 0. */
-	struct ib_physical_buffer buffer_list = {
-		.address = 0,
-		.size    = 1
+	struct ib_phys_buf buffer_list = {
+		.addr = 0,
+		.size = (unsigned long) high_memory - PAGE_OFFSET
 	};
 	uint64_t dummy_iova = 0;
-	unsigned long tsize = (unsigned long)high_memory - PAGE_OFFSET;
 	u32 rkey;

-	/*
-	 * Make our region have size the size of low memory rounded
-	 * up to the next power of 2 (so we use as few TPT entries
-	 * as possible)
-	 */
-	while (tsize) {
-		buffer_list.size <<= 1;
-		tsize >>= 1;
-	}
-
-	if (ib_memory_register_physical(priv->pd, &buffer_list,
-					1, /* list_len */
-					&dummy_iova, buffer_list.size,
-					0, /* iova_offset */
-					IB_ACCESS_LOCAL_WRITE,
-					&priv->mr,
-					&priv->lkey, &rkey)) {
+	priv->mr = ib_reg_phys_mr(priv->pd, &buffer_list,
+				  1, /* list_len */
+				  IB_MR_LOCAL_WRITE,
+				  &dummy_iova,
+				  &priv->lkey, &rkey);
+	if (IS_ERR(priv->mr)) {
 		TS_REPORT_FATAL(MOD_IB_NET,
 				"%s: ib_memory_register_physical failed",
 				dev->name);
@@ -276,9 +264,9 @@
 		clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
 	}

-	if (ib_memory_deregister(priv->mr))
+	if (ib_dereg_mr(priv->mr))
 		TS_REPORT_WARN(MOD_IB_NET,
-			       "%s: ib_memory_deregister failed", dev->name);
+			       "%s: ib_dereg_mr failed", dev->name);

 	if (ib_destroy_cq(priv->cq))
 		TS_REPORT_WARN(MOD_IB_NET,
Index: src/linux-kernel/infiniband/ulp/srp/srp_host.c
===================================================================
--- src/linux-kernel/infiniband/ulp/srp/srp_host.c	(revision 531)
+++ src/linux-kernel/infiniband/ulp/srp/srp_host.c	(working copy)
@@ -522,10 +522,11 @@
  */
 int srp_host_alloc_pkts(srp_target_t * target)
 {
-	int pkt_num;
+	struct ib_phys_buf buffer_list;
+	u64 iova;
 	srp_pkt_t *srp_pkt;
 	void *srp_pkt_data;
-	int status;
+	int pkt_num;
 	int max_num_pkts;
 	srp_host_hca_params_t *hca;
 	int cq_entries;
@@ -605,20 +606,25 @@
 		if (ib_req_notify_cq(target->cqr_hndl[hca_index], IB_CQ_NEXT_COMP))
 			goto CQ_MR_FAIL;

-		status = ib_memory_register(hca->pd_hndl,
-					    target->srp_pkt_data_area,
-					    max_num_pkts * srp_cmd_pkt_size,
-					    IB_ACCESS_LOCAL_WRITE |
-					    IB_ACCESS_REMOTE_READ,
-					    &target->
-					    srp_pkt_data_mhndl[hca_index],
-					    &target->l_key[hca_index],
-					    &target->r_key[hca_index]);
+		/* XXX UGH -- use pci_alloc_consistent? */
+		buffer_list.addr = virt_to_phys(target->srp_pkt_data_area);
+		buffer_list.size = max_num_pkts * srp_cmd_pkt_size;
+		iova = (unsigned long) target->srp_pkt_data_area;

-		if (status != 0) {
+		target->srp_pkt_data_mhndl[hca_index] =
+			ib_reg_phys_mr(hca->pd_hndl, &buffer_list,
+				       1, /*list_len */
+				       IB_MR_LOCAL_WRITE |
+				       IB_MR_REMOTE_READ,
+				       &iova,
+				       &target->l_key[hca_index],
+				       &target->r_key[hca_index]);
+
+		if (IS_ERR(target->srp_pkt_data_mhndl[hca_index])) {
 			TS_REPORT_FATAL(MOD_SRPTP,
 					"Memory registration failed: %d",
-					status);
+					PTR_ERR(target->srp_pkt_data_mhndl[hca_index]));
+			target->srp_pkt_data_mhndl[hca_index] = NULL;
 			goto CQ_MR_FAIL;
 		}
 	}
@@ -660,8 +666,8 @@
 		if (hca->valid == FALSE)
 			break;

-		ib_memory_deregister(target->srp_pkt_data_mhndl[hca_index]);
-
+		if (target->srp_pkt_data_mhndl[hca_index])
+			ib_dereg_mr(target->srp_pkt_data_mhndl[hca_index]);
 		if (target->cqr_hndl[hca_index])
 			ib_destroy_cq(target->cqr_hndl[hca_index]);
 		if (target->cqs_hndl[hca_index])
@@ -695,7 +701,7 @@
 		if (hca->valid == FALSE)
 			break;

-		ib_memory_deregister(target->srp_pkt_data_mhndl[hca_index]);
+		ib_dereg_mr(target->srp_pkt_data_mhndl[hca_index]);
 		ib_destroy_cq(target->cqr_hndl[hca_index]);
 		ib_destroy_cq(target->cqs_hndl[hca_index]);

Index: src/linux-kernel/infiniband/ulp/srp/srptp.c
===================================================================
--- src/linux-kernel/infiniband/ulp/srp/srptp.c	(revision 530)
+++ src/linux-kernel/infiniband/ulp/srp/srptp.c	(working copy)
@@ -50,9 +50,9 @@
 static void cq_send_handler(struct ib_wc *cq_entry, srp_target_t *target)
 {
 	switch (cq_entry->status) {
-	case IB_COMPLETION_STATUS_SUCCESS:
+	case IB_WC_SUCCESS:

-		if (cq_entry->opcode != IB_COMPLETION_OP_SEND) {
+		if (cq_entry->opcode != IB_WC_SEND) {
 			TS_REPORT_FATAL(MOD_SRPTP, "Wrong Opcode");
 			return;
 		}
@@ -60,7 +60,7 @@
srp_send_done(cq_entry->wr_id, target); break; - case IB_COMPLETION_STATUS_WORK_REQUEST_FLUSHED_ERROR: + case IB_WC_WR_FLUSH_ERR: TS_REPORT_STAGE(MOD_SRPTP, "Send WR_FLUSH_ERR wr_id %d", cq_entry->wr_id); @@ -96,9 +96,9 @@ int status; switch (cq_entry->status) { - case IB_COMPLETION_STATUS_SUCCESS: + case IB_WC_SUCCESS: - if (cq_entry->opcode != IB_COMPLETION_OP_RECEIVE) { + if (cq_entry->opcode != IB_WC_RECV) { TS_REPORT_FATAL(MOD_SRPTP, "Wrong Opcode"); return; } @@ -113,7 +113,7 @@ srp_recv(cq_entry->wr_id, target); break; - case IB_COMPLETION_STATUS_WORK_REQUEST_FLUSHED_ERROR: + case IB_WC_WR_FLUSH_ERR: TS_REPORT_STAGE(MOD_SRPTP, "Recv WR_FLUSH_ERR wr_id %d", cq_entry->wr_id); Index: src/linux-kernel/infiniband/ulp/sdp/sdp_write.c =================================================================== --- src/linux-kernel/infiniband/ulp/sdp/sdp_write.c (revision 530) +++ src/linux-kernel/infiniband/ulp/sdp/sdp_write.c (working copy) @@ -49,10 +49,10 @@ /* * error handling */ - if (IB_COMPLETION_STATUS_SUCCESS != comp->status) { + if (IB_WC_SUCCESS != comp->status) { switch (comp->status) { - case IB_COMPLETION_STATUS_WORK_REQUEST_FLUSHED_ERROR: + case IB_WC_WR_FLUSH_ERR: /* * clear posted buffers from error'd queue */ Index: src/linux-kernel/infiniband/ulp/sdp/sdp_rcvd.c =================================================================== --- src/linux-kernel/infiniband/ulp/sdp/sdp_rcvd.c (revision 530) +++ src/linux-kernel/infiniband/ulp/sdp/sdp_rcvd.c (working copy) @@ -1376,10 +1376,10 @@ /* * error handling */ - if (IB_COMPLETION_STATUS_SUCCESS != comp->status) { + if (IB_WC_SUCCESS != comp->status) { switch (comp->status) { - case IB_COMPLETION_STATUS_WORK_REQUEST_FLUSHED_ERROR: + case IB_WC_WR_FLUSH_ERR: /* * clear posted buffers from error'd queue */ Index: src/linux-kernel/infiniband/ulp/sdp/sdp_read.c =================================================================== --- src/linux-kernel/infiniband/ulp/sdp/sdp_read.c (revision 530) +++ 
src/linux-kernel/infiniband/ulp/sdp/sdp_read.c (working copy) @@ -126,10 +126,10 @@ /* * error handling */ - if (IB_COMPLETION_STATUS_SUCCESS != comp->status) { + if (IB_WC_SUCCESS != comp->status) { switch (comp->status) { - case IB_COMPLETION_STATUS_WORK_REQUEST_FLUSHED_ERROR: + case IB_WC_WR_FLUSH_ERR: /* * clear posted buffers from error'd queue */ Index: src/linux-kernel/infiniband/ulp/sdp/sdp_conn.c =================================================================== --- src/linux-kernel/infiniband/ulp/sdp/sdp_conn.c (revision 530) +++ src/linux-kernel/infiniband/ulp/sdp/sdp_conn.c (working copy) @@ -1893,7 +1893,7 @@ #ifdef _TS_SDP_AIO_SUPPORT tTS_IB_FMR_POOL_PARAM_STRUCT fmr_param_s; #endif - struct ib_physical_buffer buffer_list; + struct ib_phys_buf buffer_list; struct ib_device_properties node_info; struct ib_device *hca_handle; struct sdev_hca_port *port; @@ -1971,22 +1971,19 @@ /* * memory registration */ - buffer_list.address = 0; + buffer_list.addr = 0; buffer_list.size = (unsigned long)high_memory - PAGE_OFFSET; hca->iova = 0; - result = ib_memory_register_physical(hca->pd, - &buffer_list, - 1, /* list_len */ - &hca->iova, - (unsigned long)(high_memory - PAGE_OFFSET), - 0, /* iova_offset */ - IB_ACCESS_LOCAL_WRITE, - &hca->mem_h, - &hca->l_key, &hca->r_key); - if (0 != result) { - + hca->mem_h = ib_reg_phys_mr(hca->pd, + &buffer_list, + 1, /* list_len */ + IB_ACCESS_LOCAL_WRITE, + &hca->iova, + &hca->l_key, &hca->r_key); + if (IS_ERR(hca->mem_h)) { + result = PTR_ERR(hca->mem_h); TS_TRACE(MOD_LNX_SDP, T_TERSE, TRACE_FLOW_FATAL, "INIT: Error <%d> registering HCA <%x:%d> memory.", result, hca_handle, hca_count); @@ -2097,7 +2094,7 @@ if (hca->mem_h) { - (void)ib_memory_deregister(hca->mem_h); + (void)ib_dereg_mr(hca->mem_h); } if (hca->pd) { Index: src/linux-kernel/infiniband/ulp/sdp/sdp_sent.c =================================================================== --- src/linux-kernel/infiniband/ulp/sdp/sdp_sent.c (revision 530) +++ 
src/linux-kernel/infiniband/ulp/sdp/sdp_sent.c (working copy) @@ -445,10 +445,10 @@ /* * error handling */ - if (IB_COMPLETION_STATUS_SUCCESS != comp->status) { + if (IB_WC_SUCCESS != comp->status) { switch (comp->status) { - case IB_COMPLETION_STATUS_WORK_REQUEST_FLUSHED_ERROR: + case IB_WC_WR_FLUSH_ERR: /* * clear posted buffers from error'd queue */ Index: src/linux-kernel/infiniband/include/ib_verbs.h =================================================================== --- src/linux-kernel/infiniband/include/ib_verbs.h (revision 530) +++ src/linux-kernel/infiniband/include/ib_verbs.h (working copy) @@ -70,7 +70,11 @@ IB_WC_COMP_SWAP, IB_WC_FETCH_ADD, IB_WC_BIND_MW, - IB_WC_RECV, +/* + * Set value of IB_WC_RECV so consumers can test if a completion is a + * receive by testing (opcode & IB_WC_RECV). + */ + IB_WC_RECV = 1 << 7, IB_WC_RECV_RDMA_WITH_IMM }; @@ -95,9 +99,37 @@ IB_CQ_NEXT_COMP }; +enum ib_mr_access_flags { + IB_MR_LOCAL_WRITE = 1, + IB_MR_REMOTE_WRITE = (1<<1), + IB_MR_REMOTE_READ = (1<<2), + IB_MR_REMOTE_ATOMIC = (1<<3), + IB_MR_MW_BIND = (1<<4) +}; + +struct ib_phys_buf { + u64 addr; + u64 size; +}; + +struct ib_mr_attr { + struct ib_pd *pd; + u64 device_virt_addr; + u64 size; + int mr_access_flags; + u32 lkey; + u32 rkey; +}; + +enum ib_mr_rereg_flags { + IB_MR_REREG_TRANS = 1, + IB_MR_REREG_PD = (1<<1), + IB_MR_REREG_ACCESS = (1<<2) +}; + struct ib_pd { struct ib_device *device; - atomic_t usecnt; + atomic_t usecnt; /* count all resources */ }; typedef void (*ib_comp_handler)(struct ib_cq *cq, void *cq_context); @@ -107,9 +139,15 @@ ib_comp_handler comp_handler; void * context; int cqe; - atomic_t usecnt; + atomic_t usecnt; /* count number of work queues */ }; +struct ib_mr { + struct ib_device *device; + struct ib_pd *pd; + atomic_t usecnt; /* count number of MWs */ +}; + struct ib_device { IB_DECLARE_MAGIC @@ -153,9 +191,25 @@ enum ib_cq_notify cq_notify); int (*req_ncomp_notif)(struct ib_cq *cq, int wc_cnt); - ib_mr_register_func mr_register; - 
ib_mr_register_physical_func mr_register_physical; - ib_mr_deregister_func mr_deregister; + struct ib_mr * (*reg_phys_mr)(struct ib_pd *pd, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + int mr_access_flags, + u64 *iova_start, + u32 *lkey, + u32 *rkey); + int (*query_mr)(struct ib_mr *mr, + struct ib_mr_attr *mr_attr); + int (*dereg_mr)(struct ib_mr *mr); + int (*rereg_phys_mr)(struct ib_mr *mr, + int mr_rereg_mask, + struct ib_pd *pd, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + int mr_access_flags, + u64 *iova_start, + u32 *lkey, + u32 *rkey); ib_mw_create_func mw_create; ib_mw_destroy_func mw_destroy; ib_mw_bind_func mw_bind; @@ -214,6 +268,27 @@ int ib_req_ncomp_notif(struct ib_cq *cq, int wc_cnt); +struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + int mr_access_flags, + u64 *iova_start, + u32 *lkey, + u32 *rkey); + +int ib_rereg_phys_mr(struct ib_mr *mr, + int mr_rereg_mask, + struct ib_pd *pd, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + int mr_access_flags, + u64 *iova_start, + u32 *lkey, + u32 *rkey); + +int ib_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr); +int ib_dereg_mr(struct ib_mr *mr); + #endif /* __KERNEL __ */ /* XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX */ Index: src/linux-kernel/infiniband/include/ts_ib_core_types.h =================================================================== --- src/linux-kernel/infiniband/include/ts_ib_core_types.h (revision 530) +++ src/linux-kernel/infiniband/include/ts_ib_core_types.h (working copy) @@ -211,40 +211,6 @@ IB_OP_MEMORY_WINDOW_BIND }; -enum ib_completion_op { - IB_COMPLETION_OP_RECEIVE, - IB_COMPLETION_OP_RDMA_WRITE_RECEIVE, - IB_COMPLETION_OP_SEND, - IB_COMPLETION_OP_RDMA_WRITE, - IB_COMPLETION_OP_RDMA_READ, - IB_COMPLETION_OP_COMPARE_SWAP, - IB_COMPLETION_OP_FETCH_ADD, - IB_COMPLETION_OP_MEMORY_WINDOW_BIND, -}; - -enum ib_completion_status { - IB_COMPLETION_STATUS_SUCCESS, - 
IB_COMPLETION_STATUS_LOCAL_LENGTH_ERROR, - IB_COMPLETION_STATUS_LOCAL_QP_OPERATION_ERROR, - IB_COMPLETION_STATUS_LOCAL_EEC_OPERATION_ERROR, - IB_COMPLETION_STATUS_LOCAL_PROTECTION_ERROR, - IB_COMPLETION_STATUS_WORK_REQUEST_FLUSHED_ERROR, - IB_COMPLETION_STATUS_MEMORY_WINDOW_BIND_ERROR, - IB_COMPLETION_STATUS_BAD_RESPONSE_ERROR, - IB_COMPLETION_STATUS_LOCAL_ACCESS_ERROR, - IB_COMPLETION_STATUS_REMOTE_INVALID_REQUEST_ERROR, - IB_COMPLETION_STATUS_REMOTE_ACCESS_ERORR, - IB_COMPLETION_STATUS_REMOTE_OPERATION_ERROR, - IB_COMPLETION_STATUS_TRANSPORT_RETRY_COUNTER_EXCEEDED, - IB_COMPLETION_STATUS_RNR_RETRY_COUNTER_EXCEEDED, - IB_COMPLETION_STATUS_LOCAL_RDD_VIOLATION_ERROR, - IB_COMPLETION_STATUS_REMOTE_INVALID_RD_REQUEST, - IB_COMPLETION_STATUS_REMOTE_ABORTED_ERROR, - IB_COMPLETION_STATUS_INVALID_EEC_NUMBER, - IB_COMPLETION_STATUS_INVALID_EEC_STATE, - IB_COMPLETION_STATUS_UNKNOWN_ERROR -}; - enum ib_async_event { IB_QP_PATH_MIGRATED, IB_EEC_PATH_MIGRATED, @@ -287,14 +253,6 @@ void *private; }; -struct ib_mr { - IB_DECLARE_MAGIC - struct ib_device *device; - u32 lkey; - u32 rkey; - void *private; -}; - struct ib_fmr_pool; /* actual definition in core_fmr.c */ struct ib_fmr { @@ -590,11 +548,6 @@ int fence:1; }; -struct ib_physical_buffer { - u64 address; - u64 size; -}; - struct ib_fmr_pool_param { int max_pages_per_fmr; enum ib_memory_access access; @@ -658,20 +611,6 @@ typedef int (*ib_receive_post_func)(struct ib_qp *qp, struct ib_receive_param *param, int num_work_requests); -typedef int (*ib_mr_register_func)(struct ib_pd *pd, - void *start_address, - u64 buffer_size, - enum ib_memory_access access, - struct ib_mr *mr); -typedef int (*ib_mr_register_physical_func)(struct ib_pd *pd, - struct ib_physical_buffer *buffer_list, - int list_len, - u64 *io_virtual_address, - u64 buffer_size, - u64 iova_offset, - enum ib_memory_access access, - struct ib_mr *mr); -typedef int (*ib_mr_deregister_func)(struct ib_mr *mr); typedef int (*ib_mw_create_func)(struct ib_pd *pd, struct 
ib_mw **mw, u32 *rkey); Index: src/linux-kernel/infiniband/include/ts_ib_core.h =================================================================== --- src/linux-kernel/infiniband/include/ts_ib_core.h (revision 530) +++ src/linux-kernel/infiniband/include/ts_ib_core.h (working copy) @@ -100,25 +100,6 @@ return qp->device->receive_post(qp, param, num_work_requests); } -int ib_memory_register(struct ib_pd *pd, - void *start_address, - uint64_t buffer_size, - enum ib_memory_access access, - struct ib_mr **memory, - u32 *lkey, - u32 *rkey); -int ib_memory_register_physical(struct ib_pd *pd, - struct ib_physical_buffer *buffer_list, - int list_len, - uint64_t *io_virtual_address, - uint64_t buffer_size, - uint64_t iova_offset, - enum ib_memory_access access, - struct ib_mr **mr, - u32 *lkey, - u32 *rkey); -int ib_memory_deregister(struct ib_mr *memory); - int ib_mw_create(struct ib_pd *pd, struct ib_mw **mw, u32 *rkey); Index: src/linux-kernel/infiniband/core/mad_ib.c =================================================================== --- src/linux-kernel/infiniband/core/mad_ib.c (revision 530) +++ src/linux-kernel/infiniband/core/mad_ib.c (working copy) @@ -23,7 +23,6 @@ */ #include "mad_priv.h" -#include "mad_mem_compat.h" #include "ts_kernel_trace.h" #include "ts_kernel_services.h" Index: src/linux-kernel/infiniband/core/mad_mem_compat.h =================================================================== --- src/linux-kernel/infiniband/core/mad_mem_compat.h (revision 521) +++ src/linux-kernel/infiniband/core/mad_mem_compat.h (working copy) @@ -1,71 +0,0 @@ -/* - This software is available to you under a choice of one of two - licenses. You may choose to be licensed under the terms of the GNU - General Public License (GPL) Version 2, available at - , or the OpenIB.org BSD - license, available in the LICENSE.TXT file accompanying this - software. These details are also available at - . 
- - THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - SOFTWARE. - - Copyright (c) 2004 Topspin Communications. All rights reserved. - - $Id$ -*/ - -#ifndef _MAD_MEM_COMPAT_H -#define _MAD_MEM_COMPAT_H - -/* Need the definition of high_memory: */ -#include - -static inline int ib_mad_register_memory(struct ib_pd *pd, - struct ib_mr **mr, - u32 *lkey) -{ - u32 rkey; - u64 iova = 0; - struct ib_physical_buffer buffer_list; - int result; - - buffer_list.address = 0; - - /* make our region have size the size of low memory rounded up to - the next power of 2, so we use as few TPT entries as possible - and don't confuse the verbs driver when lowmem has an odd size - (cf bug 1921) */ - for (buffer_list.size = 1; - buffer_list.size < (unsigned long) high_memory - PAGE_OFFSET; - buffer_list.size <<= 1) { - /* nothing */ - } - - result = ib_memory_register_physical(pd, - &buffer_list, - 1, /* list_len */ - &iova, - buffer_list.size, - 0, /* iova_offset */ - IB_ACCESS_LOCAL_WRITE, - mr, - lkey, - &rkey); - if (result) - TS_REPORT_WARN(MOD_KERNEL_IB, - "ib_memory_register_physical failed " - "size 0x%016" TS_U64_FMT "x, iova 0x%016" TS_U64_FMT "x" - " (return code %d)", - buffer_list.size, iova, result); - - return result; -} - -#endif /* _MAD_COMPAT_H */ Index: src/linux-kernel/infiniband/core/core_cq.c =================================================================== --- src/linux-kernel/infiniband/core/core_cq.c (revision 530) +++ src/linux-kernel/infiniband/core/core_cq.c (working copy) @@ -21,17 +21,13 @@ $Id$ */ -#include "core_priv.h" - -#include 
"ts_kernel_trace.h" -#include "ts_kernel_services.h" - #include #include #include -#include +#include "core_priv.h" + struct ib_cq *ib_create_cq(struct ib_device *device, ib_comp_handler comp_handler, void *cq_context, int *cqe) Index: src/linux-kernel/infiniband/core/core_mr.c =================================================================== --- src/linux-kernel/infiniband/core/core_mr.c (revision 521) +++ src/linux-kernel/infiniband/core/core_mr.c (working copy) @@ -21,118 +21,79 @@ $Id$ */ -#include "core_priv.h" - -#include "ts_kernel_trace.h" -#include "ts_kernel_services.h" - #include #include #include -#include -int ib_memory_register(struct ib_pd *pd, - void *start_address, - u64 buffer_size, - enum ib_memory_access access, - struct ib_mr **mr_handle, - u32 *lkey, - u32 *rkey) +#include "core_priv.h" + +struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + int mr_access_flags, + u64 *iova_start, + u32 *lkey, + u32 *rkey) { struct ib_mr *mr; - int ret; - if (!pd->device->mr_register) { - return -ENOSYS; - } + mr = pd->device->reg_phys_mr(pd, phys_buf_array, num_phys_buf, + mr_access_flags, iova_start, lkey, rkey); - mr = kmalloc(sizeof *mr, GFP_KERNEL); - if (!mr) { - return -ENOMEM; - } - - ret = pd->device->mr_register(pd, start_address, buffer_size, access, mr); - - if (!ret) { - IB_SET_MAGIC(mr, MR); + if (!IS_ERR(mr)) { mr->device = pd->device; - *mr_handle = mr; - *lkey = mr->lkey; - *rkey = mr->rkey; - } else { - kfree(mr); + mr->pd = pd; + atomic_inc(&pd->usecnt); + atomic_set(&mr->usecnt, 0); } - return ret; + return mr; } -EXPORT_SYMBOL(ib_memory_register); +EXPORT_SYMBOL(ib_reg_phys_mr); -int ib_memory_register_physical(struct ib_pd *pd, - struct ib_physical_buffer *buffer_list, - int list_len, - u64 *io_virtual_address, - u64 buffer_size, - u64 iova_offset, - enum ib_memory_access access, - struct ib_mr **mr_handle, - u32 *lkey, - u32 *rkey) +int ib_rereg_phys_mr(struct ib_mr *mr, + int 
mr_rereg_mask, + struct ib_pd *pd, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + int mr_access_flags, + u64 *iova_start, + u32 *lkey, + u32 *rkey) { - struct ib_mr *mr; - int ret; + return mr->device->rereg_phys_mr ? + mr->device->rereg_phys_mr(mr, mr_rereg_mask, pd, + phys_buf_array, num_phys_buf, + mr_access_flags, iova_start, + lkey, rkey) : + -ENOSYS; +} +EXPORT_SYMBOL(ib_rereg_phys_mr); - if (!pd->device->mr_register_physical) { - return -ENOSYS; - } - - mr = kmalloc(sizeof *mr, GFP_KERNEL); - if (!mr) { - return -ENOMEM; - } - - ret = pd->device->mr_register_physical(pd, - buffer_list, - list_len, - io_virtual_address, - buffer_size, - iova_offset, - access, - mr); - - if (!ret) { - IB_SET_MAGIC(mr, MR); - mr->device = pd->device; - *mr_handle = mr; - *lkey = mr->lkey; - *rkey = mr->rkey; - } else { - kfree(mr); - } - - return ret; +int ib_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr) +{ + return mr->device->query_mr ? + mr->device->query_mr(mr, mr_attr) : -ENOSYS; } -EXPORT_SYMBOL(ib_memory_register_physical); +EXPORT_SYMBOL(ib_query_mr); -int ib_memory_deregister(struct ib_mr *mr) +int ib_dereg_mr(struct ib_mr *mr) { - int ret; + struct ib_pd *pd; + int ret; - IB_CHECK_MAGIC(mr, MR); + if (atomic_read(&mr->usecnt)) + return -EBUSY; - if (!mr->device->mr_deregister) { - return -ENOSYS; - } + pd = mr->pd; + ret = mr->device->dereg_mr(mr); + if (!ret) + atomic_dec(&pd->usecnt); - ret = mr->device->mr_deregister(mr); - if (!ret) { - IB_CLEAR_MAGIC(mr); - kfree(mr); - } - return ret; } -EXPORT_SYMBOL(ib_memory_deregister); +EXPORT_SYMBOL(ib_dereg_mr); /* Local Variables: Index: src/linux-kernel/infiniband/core/mad_main.c =================================================================== --- src/linux-kernel/infiniband/core/mad_main.c (revision 530) +++ src/linux-kernel/infiniband/core/mad_main.c (working copy) @@ -24,13 +24,7 @@ #include #include "mad_priv.h" -#include "mad_mem_compat.h" -#if defined(CONFIG_INFINIBAND_MELLANOX_HCA) || \ - 
defined(CONFIG_INFINIBAND_MELLANOX_HCA_MODULE) -#include "ts_ib_tavor_provider.h" -#endif - #include "ts_kernel_trace.h" #include "ts_kernel_services.h" @@ -40,6 +34,9 @@ #include #include +/* Need the definition of high_memory: */ +#include + #ifdef CONFIG_KMOD #include #endif @@ -50,6 +47,31 @@ kmem_cache_t *mad_cache; +static inline int ib_mad_register_memory(struct ib_pd *pd, + struct ib_mr **mr, + u32 *lkey) +{ + u32 rkey; + u64 iova = 0; + struct ib_phys_buf buffer_list = { + .addr = 0, + .size = (unsigned long) high_memory - PAGE_OFFSET + }; + + *mr = ib_reg_phys_mr(pd, &buffer_list, + 1, /* list_len */ + IB_MR_LOCAL_WRITE, + &iova, lkey, &rkey); + if (IS_ERR(*mr)) + TS_REPORT_WARN(MOD_KERNEL_IB, + "ib_reg_phys_mr failed " + "size 0x%016" TS_U64_FMT "x, iova 0x%016" TS_U64_FMT "x" + " (return code %d)", + buffer_list.size, iova, PTR_ERR(*mr)); + + return IS_ERR(*mr) ? PTR_ERR(*mr) : 0; +} + static int ib_mad_qp_create(struct ib_device *device, tTS_IB_PORT port, u32 qpn) @@ -277,7 +299,7 @@ } error_free_mr: - ib_memory_deregister(priv->mr); + ib_dereg_mr(priv->mr); error_free_cq: ib_destroy_cq(priv->cq); @@ -317,7 +339,7 @@ } } - ib_memory_deregister(priv->mr); + ib_dereg_mr(priv->mr); ib_destroy_cq(priv->cq); ib_dealloc_pd(priv->pd); Index: src/linux-kernel/infiniband/core/core_device.c =================================================================== --- src/linux-kernel/infiniband/core/core_device.c (revision 530) +++ src/linux-kernel/infiniband/core/core_device.c (working copy) @@ -61,8 +61,8 @@ IB_MANDATORY_FUNC(destroy_cq), IB_MANDATORY_FUNC(poll_cq), IB_MANDATORY_FUNC(req_notify_cq), - IB_MANDATORY_FUNC(mr_register_physical), - IB_MANDATORY_FUNC(mr_deregister) + IB_MANDATORY_FUNC(reg_phys_mr), + IB_MANDATORY_FUNC(dereg_mr) }; int i; Index: src/linux-kernel/infiniband/core/core_pd.c =================================================================== --- src/linux-kernel/infiniband/core/core_pd.c (revision 521) +++ 
src/linux-kernel/infiniband/core/core_pd.c (working copy) @@ -21,17 +21,13 @@ $Id$ */ -#include "core_priv.h" - -#include "ts_kernel_trace.h" -#include "ts_kernel_services.h" - #include #include #include -#include +#include "core_priv.h" + struct ib_pd *ib_alloc_pd(struct ib_device *device) { struct ib_pd *pd; Index: src/linux-kernel/infiniband/hw/mthca/mthca_dev.h =================================================================== --- src/linux-kernel/infiniband/hw/mthca/mthca_dev.h (revision 530) +++ src/linux-kernel/infiniband/hw/mthca/mthca_dev.h (working copy) @@ -291,8 +291,7 @@ int mthca_mr_alloc_phys(struct mthca_dev *dev, u32 pd, u64 *buffer_list, int buffer_size_shift, int list_len, u64 iova, u64 total_size, - u64 iova_offset, u32 access, - struct mthca_mr *mr); + u32 access, struct mthca_mr *mr); void mthca_free_mr(struct mthca_dev *dev, struct mthca_mr *mr); int mthca_poll_cq(struct ib_cq *ibcq, int num_entries, Index: src/linux-kernel/infiniband/hw/mthca/mthca_provider.c =================================================================== --- src/linux-kernel/infiniband/hw/mthca/mthca_provider.c (revision 530) +++ src/linux-kernel/infiniband/hw/mthca/mthca_provider.c (working copy) @@ -415,79 +415,122 @@ return 0; } -static int mthca_mr_register_physical(struct ib_pd *pd, - struct ib_physical_buffer *buffer_list, - int list_len, - uint64_t *io_virtual_address, - uint64_t buffer_size, - uint64_t iova_offset, - enum ib_memory_access acc, - struct ib_mr *mr) +static struct ib_mr *mthca_reg_phys_mr(struct ib_pd *pd, + struct ib_phys_buf *buffer_list, + int num_phys_buf, + int acc, + u64 *iova_start, + u32 *lkey, + u32 *rkey) { + struct mthca_mr *mr; u64 *page_list; + u64 total_size; + u64 mask; int shift; int npages; u32 access; int err = -ENOMEM; - int i; + int i, j, n; - /* We only support one buffer for now */ - if (list_len > 1 || buffer_list[0].size != buffer_size) - return -EINVAL; + /* First check that we have enough alignment */ + if ((*iova_start & 
PAGE_MASK) != (buffer_list[0].addr & PAGE_MASK)) + return ERR_PTR(-EINVAL); - for (shift = 1; shift < 31 && 1ULL << shift < buffer_size; ++shift) - ; /* nothing */ + if (num_phys_buf > 1 && + ((buffer_list[0].addr + buffer_list[0].size) & PAGE_MASK)) + return ERR_PTR(-EINVAL); - npages = (buffer_size + (1ULL << shift) - 1) >> shift; + mask = 0; + total_size = 0; + for (i = 0; i < num_phys_buf; ++i) { + if (buffer_list[i].addr & PAGE_MASK) + return ERR_PTR(-EINVAL); + if (i != 0 && i != num_phys_buf - 1 && + (buffer_list[i].size & PAGE_MASK)) + return ERR_PTR(-EINVAL); + total_size += buffer_list[i].size; + if (i > 0) + mask |= buffer_list[i].addr; + } + + /* Find largest page shift we can use to cover buffers */ + for (shift = PAGE_SHIFT; shift < 31; ++shift) + if (num_phys_buf > 1) { + if ((1ULL << shift) & mask) + break; + } else { + if (1ULL << shift >= + buffer_list[0].size + + (buffer_list[0].addr & ((1ULL << shift) - 1))) + break; + } + + buffer_list[0].size += buffer_list[0].addr & ((1ULL << shift) - 1); + buffer_list[0].addr &= ~0ull << shift; + + mr = kmalloc(sizeof *mr, GFP_KERNEL); + if (!mr) + return ERR_PTR(-ENOMEM); + + npages = 0; + for (i = 0; i < num_phys_buf; ++i) + npages += (buffer_list[i].size + (1ULL << shift) - 1) >> shift; + if (!npages) - return 0; + return (struct ib_mr *) mr; page_list = kmalloc(npages * sizeof *page_list, GFP_KERNEL); - if (!page_list) - return -ENOMEM; + if (!page_list) { + kfree(mr); + return ERR_PTR(-ENOMEM); + } - for (i = 0; i < npages; ++i) - page_list[i] = buffer_list[0].address + ((u64) i << shift); + n = 0; + for (i = 0; i < num_phys_buf; ++i) + for (j = 0; + j < (buffer_list[i].size + (1ULL << shift) - 1) >> shift; + ++j) + page_list[n++] = buffer_list[i].addr + ((u64) j << shift); - mr->private = kmalloc(sizeof (struct mthca_mr), GFP_KERNEL); - if (!mr->private) - goto out; - access = - (acc & IB_ACCESS_REMOTE_ATOMIC ? MTHCA_MPT_FLAG_ATOMIC : 0) | - (acc & IB_ACCESS_REMOTE_WRITE ? 
MTHCA_MPT_FLAG_REMOTE_WRITE : 0) | - (acc & IB_ACCESS_REMOTE_READ ? MTHCA_MPT_FLAG_REMOTE_READ : 0) | - (acc & IB_ACCESS_LOCAL_WRITE ? MTHCA_MPT_FLAG_LOCAL_WRITE : 0) | + (acc & IB_MR_REMOTE_ATOMIC ? MTHCA_MPT_FLAG_ATOMIC : 0) | + (acc & IB_MR_REMOTE_WRITE ? MTHCA_MPT_FLAG_REMOTE_WRITE : 0) | + (acc & IB_MR_REMOTE_READ ? MTHCA_MPT_FLAG_REMOTE_READ : 0) | + (acc & IB_MR_LOCAL_WRITE ? MTHCA_MPT_FLAG_LOCAL_WRITE : 0) | MTHCA_MPT_FLAG_LOCAL_READ; mthca_dbg(to_mdev(pd->device), "Registering memory at %llx (iova %llx) " "in PD %x; shift %d, npages %d.\n", - (unsigned long long) buffer_list[0].address, - (unsigned long long) *io_virtual_address, + (unsigned long long) buffer_list[0].addr, + (unsigned long long) *iova_start, ((struct mthca_pd *) pd)->pd_num, shift, npages); err = mthca_mr_alloc_phys(to_mdev(pd->device), ((struct mthca_pd *) pd)->pd_num, page_list, shift, npages, - *io_virtual_address, buffer_size, - iova_offset, access, mr->private); + *iova_start, total_size, + access, mr); - if (err) - kfree(mr->private); + if (err) { + kfree(mr); + mr = ERR_PTR(err); + goto out; + } - mr->lkey = mr->rkey = ((struct mthca_mr *) mr->private)->key; + *lkey = *rkey = mr->key; - out: +out: kfree(page_list); - return err; + return (struct ib_mr *) mr; } -static int mthca_mr_deregister(struct ib_mr *mr) +static int mthca_dereg_mr(struct ib_mr *mr) { - mthca_free_mr(to_mdev(mr->device), mr->private); - kfree(mr->private); + mthca_free_mr(to_mdev(mr->device), (struct mthca_mr *) mr); + kfree(mr); return 0; } @@ -516,8 +559,8 @@ dev->ib_dev.destroy_cq = mthca_destroy_cq; dev->ib_dev.poll_cq = mthca_poll_cq; dev->ib_dev.req_notify_cq = mthca_req_notify_cq; - dev->ib_dev.mr_register_physical = mthca_mr_register_physical; - dev->ib_dev.mr_deregister = mthca_mr_deregister; + dev->ib_dev.reg_phys_mr = mthca_reg_phys_mr; + dev->ib_dev.dereg_mr = mthca_dereg_mr; dev->ib_dev.multicast_attach = mthca_multicast_attach; dev->ib_dev.multicast_detach = mthca_multicast_detach; 
dev->ib_dev.mad_process = mthca_process_mad; Index: src/linux-kernel/infiniband/hw/mthca/mthca_provider.h =================================================================== --- src/linux-kernel/infiniband/hw/mthca/mthca_provider.h (revision 541) +++ src/linux-kernel/infiniband/hw/mthca/mthca_provider.h (working copy) @@ -39,6 +39,7 @@ }; struct mthca_mr { + struct ib_mr ibmr; u32 key; int order; u32 first_seg; Index: src/linux-kernel/infiniband/hw/mthca/mthca_cq.c =================================================================== --- src/linux-kernel/infiniband/hw/mthca/mthca_cq.c (revision 543) +++ src/linux-kernel/infiniband/hw/mthca/mthca_cq.c (working copy) @@ -636,7 +636,7 @@ err = mthca_mr_alloc_phys(dev, dev->driver_pd.pd_num, dma_list, shift, npages, - 0, size, 0, + 0, size, MTHCA_MPT_FLAG_LOCAL_WRITE | MTHCA_MPT_FLAG_LOCAL_READ, &cq->mr); Index: src/linux-kernel/infiniband/hw/mthca/mthca_eq.c =================================================================== --- src/linux-kernel/infiniband/hw/mthca/mthca_eq.c (revision 521) +++ src/linux-kernel/infiniband/hw/mthca/mthca_eq.c (working copy) @@ -389,7 +389,7 @@ err = mthca_mr_alloc_phys(dev, dev->driver_pd.pd_num, dma_list, PAGE_SHIFT, npages, - 0, npages * PAGE_SIZE, 0, + 0, npages * PAGE_SIZE, MTHCA_MPT_FLAG_LOCAL_WRITE | MTHCA_MPT_FLAG_LOCAL_READ, &eq->mr); Index: src/linux-kernel/infiniband/hw/mthca/TODO =================================================================== --- src/linux-kernel/infiniband/hw/mthca/TODO (revision 521) +++ src/linux-kernel/infiniband/hw/mthca/TODO (working copy) @@ -3,10 +3,8 @@ immediate operations in send queues. APM support: ib_mthca's QP modify does not set alternate path fields in QP context. - Full support for asynchronous events: only port change - asynchronous events will be dispatched. - Full support for physical memory registration: Physical memory - registration is limited to a single buffer. 
+  Full support for asynchronous events: dispatching CQ and various
+    unaffiliated errors still needs to be implemented.
   UD address vectors in HCA memory with DDR hidden: Even if the HCA
     does not expose its memory via a PCI BAR, the ACCESS_DDR
     firmware command could be used to write UD address vectors
@@ -19,8 +17,6 @@
   FMR support: ib_mthca does not support Mellanox-style "fast memory
     regions" as used by SDP and SRP.
   SRQ support: ib_mthca does not support shared receive queues.
-  Virtual memory registration: the register virtual memory region
-    verb is not implemented.
   Native Arbel support: Support for Arbel in native mode needs to
     be added, including at least memory-free mode and base memory
     management extensions. Or should native Arbel and Tavor have
Index: src/linux-kernel/infiniband/hw/mthca/mthca_mr.c
===================================================================
--- src/linux-kernel/infiniband/hw/mthca/mthca_mr.c (revision 521)
+++ src/linux-kernel/infiniband/hw/mthca/mthca_mr.c (working copy)
@@ -161,8 +161,7 @@
 int mthca_mr_alloc_phys(struct mthca_dev *dev, u32 pd,
                         u64 *buffer_list, int buffer_size_shift,
                         int list_len, u64 iova, u64 total_size,
-                        u64 iova_offset, u32 access,
-                        struct mthca_mr *mr)
+                        u32 access, struct mthca_mr *mr)
 {
         void *mailbox;
         u64 *mtt_entry;
Index: src/linux-kernel/infiniband/hw/mthca/mthca_qp.c
===================================================================
--- src/linux-kernel/infiniband/hw/mthca/mthca_qp.c (revision 541)
+++ src/linux-kernel/infiniband/hw/mthca/mthca_qp.c (working copy)
@@ -816,7 +816,7 @@
         err = mthca_mr_alloc_phys(dev, qp->pd->pd_num,
                                   dma_list, shift, npages,
-                                  0, size, 0,
+                                  0, size,
                                   MTHCA_MPT_FLAG_LOCAL_WRITE |
                                   MTHCA_MPT_FLAG_LOCAL_READ,
                                   &qp->mr);

From roland at topspin.com Fri Jul 30 06:37:48 2004
From: roland at topspin.com (Roland Dreier)
Date: Fri, 30 Jul 2004 06:37:48 -0700
Subject: [openib-general] [PATCH] Nasty bug in ipoib
In-Reply-To: (John Polstra's message of "Thu, 29 Jul 2004 18:37:48 -0700
 (PDT)")
References:
Message-ID: <52wu0lsjir.fsf@topspin.com>

Thanks, committed to my gen2 branch. Not sure why I never hit this,
since I run with CONFIG_DEBUG_SLAB a lot of the time...

 - R.

From halr at voltaire.com Fri Jul 30 07:09:01 2004
From: halr at voltaire.com (Hal Rosenstock)
Date: Fri, 30 Jul 2004 10:09:01 -0400
Subject: [openib-general] Updated gsi.h
Message-ID: <077d01c4763e$c49d2760$6401a8c0@comcast.net>

In https://openib.org/svn/trunk/contrib/voltaire/access/gsi.h,
gsi_register/redir_class changed to take a pointer to the ib_device
struct rather than a device name.

From Tom.Duffy at Sun.COM Fri Jul 30 10:44:29 2004
From: Tom.Duffy at Sun.COM (Tom Duffy)
Date: Fri, 30 Jul 2004 10:44:29 -0700
Subject: [openib-general] [PATCH] Fix underlying problem with using typedefs [WAS: Nasty bug]
In-Reply-To:
References:
Message-ID: <1091209468.2772.1081.camel@localhost>

On Thu, 2004-07-29 at 18:37, John Polstra wrote:
> I stumbled onto a nasty bug in the ipoib code. The call to
> kmem_cache_create() for the _tsIp2prLinkRoot.user_req cache specifies
> the element size as the size of the pointer rather than the size of
> the structures that are stored in the cache. The attached patch
> (relative to the gen2 branch) fixes it. I think this should be fixed
> in the gen1 branch as well, since it can scribble on kernel memory
> that doesn't belong to it.

This patch cleans up the underlying pointer typedefs that cause this
kind of crap to crop up. It also switches to the standard Linux basic
types.

Index: drivers/infiniband/ulp/ipoib/ip2pr_priv.h
===================================================================
--- drivers/infiniband/ulp/ipoib/ip2pr_priv.h (revision 547)
+++ drivers/infiniband/ulp/ipoib/ip2pr_priv.h (working copy)
@@ -92,46 +92,43 @@
 /*
  * IPoIB hardware address.
*/ -struct tIP2PR_IPOIB_ADDR_STRUCT { - tUINT32 qpn; /* MSB = reserved, low 3 bytes=QPN */ +struct ip2pr_ipoib_addr { + u32 qpn; /* MSB = reserved, low 3 bytes=QPN */ union { - tUINT8 all[16]; + u8 all[16]; struct { - tUINT64 high; - tUINT64 low; + u64 high; + u64 low; } s; } gid; } __attribute__ ((packed)); -typedef struct tIP2PR_IPOIB_ADDR_STRUCT tIP2PR_IPOIB_ADDR_STRUCT, - *tIP2PR_IPOIB_ADDR; + /* * The two src and dst addresses are the same size as the GRH. */ -struct tIP2PR_IPOIB_HDR_STRUCT { - tIP2PR_IPOIB_ADDR_STRUCT src; - tIP2PR_IPOIB_ADDR_STRUCT dst; - tUINT16 proto; - tUINT16 reserved; +struct ip2pr_ipoib_hdr { + struct ip2pr_ipoib_addr src; + struct ip2pr_ipoib_addr dst; + u16 proto; + u16 reserved; } __attribute__ ((packed)); /* * Ethernet/IPoIB pseudo ARP header, used by out IPoIB driver. */ -struct tIP2PR_IPOIB_ARP_STRUCT { - tUINT16 addr_type; /* format of hardware address */ - tUINT16 proto_type; /* format of protocol address */ - tUINT8 addr_len; /* length of hardware address */ - tUINT8 proto_len; /* length of protocol address */ - tUINT16 cmd; /* ARP opcode (command) */ +struct ip2pr_ipoib_arp { + u16 addr_type; /* format of hardware address */ + u16 proto_type; /* format of protocol address */ + u8 addr_len; /* length of hardware address */ + u8 proto_len; /* length of protocol address */ + u16 cmd; /* ARP opcode (command) */ /* * begin ethernet */ - tUINT8 src_hw[ETH_ALEN]; - tUINT32 src_ip; - tUINT8 dst_hw[ETH_ALEN]; - tUINT32 dst_ip; + u8 src_hw[ETH_ALEN]; + u32 src_ip; + u8 dst_hw[ETH_ALEN]; + u32 dst_ip; } __attribute__ ((packed)); -typedef struct tIP2PR_IPOIB_ARP_STRUCT tIP2PR_IPOIB_ARP_STRUCT, - *tIP2PR_IPOIB_ARP; typedef enum { IP2PR_LOCK_HELD = 0, @@ -193,98 +190,90 @@ /* * tables */ -typedef struct tIP2PR_IPOIB_WAIT_STRUCT tIP2PR_IPOIB_WAIT_STRUCT, - *tIP2PR_IPOIB_WAIT; + /* * wait for an ARP event to complete. 
*/ -struct tIP2PR_IPOIB_WAIT_STRUCT { - tINT8 type; /* ip2pr or gid2pr */ +struct ip2pr_ipoib_wait { + s8 type; /* ip2pr or gid2pr */ tIP2PR_PATH_LOOKUP_ID plid; /* request identifier */ tPTR func; /* callback function for completion */ tPTR arg; /* user argument */ struct net_device *dev; /* ipoib device */ tTS_KERNEL_TIMER_STRUCT timer; /* retry timer */ - tUINT8 retry; /* retry counter */ - tUINT8 flags; /* usage flags */ - tUINT8 state; /* current state */ - tUINT8 hw[ETH_ALEN]; /* hardware address */ - tUINT32 src_addr; /* requested address. */ - tUINT32 dst_addr; /* requested address. */ - tUINT32 gw_addr; /* next hop IP address */ - tUINT8 local_rt; /* local route only */ - tINT32 bound_dev; /* bound device interface */ + u8 retry; /* retry counter */ + u8 flags; /* usage flags */ + u8 state; /* current state */ + u8 hw[ETH_ALEN]; /* hardware address */ + u32 src_addr; /* requested address. */ + u32 dst_addr; /* requested address. */ + u32 gw_addr; /* next hop IP address */ + u8 local_rt; /* local route only */ + s32 bound_dev; /* bound device interface */ tTS_IB_GID src_gid; /* source GID */ tTS_IB_GID dst_gid; /* destination GID */ u16 pkey; /* pkey to use */ tTS_IB_PORT hw_port; /* hardware port */ struct ib_device *ca; /* hardware HCA */ - tUINT32 prev_timeout; /* timeout value for pending request */ + u32 prev_timeout; /* timeout value for pending request */ tTS_IB_CLIENT_QUERY_TID tid; /* path record lookup transactionID */ spinlock_t lock; - tIP2PR_IPOIB_WAIT next; /* next element in wait list. */ - tIP2PR_IPOIB_WAIT *p_next; /* previous next element in list */ + struct ip2pr_ipoib_wait *next; /* next element in wait list. */ + struct ip2pr_ipoib_wait **p_next; /* previous next element in list */ struct work_struct arp_completion; -}; /* tIP2PR_IPOIB_WAIT_STRUCT */ +}; -typedef struct tIP2PR_PATH_ELEMENT_STRUCT tIP2PR_PATH_ELEMENT_STRUCT, - *tIP2PR_PATH_ELEMENT; /* * wait for an ARP event to complete. 
*/ -struct tIP2PR_PATH_ELEMENT_STRUCT { - tUINT32 src_addr; /* requested address. */ - tUINT32 dst_addr; /* requested address. */ - tUINT32 usage; /* last used time. */ +struct ip2pr_path_element { + u32 src_addr; /* requested address. */ + u32 dst_addr; /* requested address. */ + u32 usage; /* last used time. */ tTS_IB_PORT hw_port; /* source port */ struct ib_device *ca; /* hardware HCA */ struct ib_path_record path_s; /* path structure */ - tIP2PR_PATH_ELEMENT next; /* next element in wait list. */ - tIP2PR_PATH_ELEMENT *p_next; /* previous next element in list */ -}; /* tIP2PR_PATH_ELEMENT_STRUCT */ + struct ip2pr_path_element *next; /* next element in wait list. */ + struct ip2pr_path_element **p_next; /* previous next element in list */ +}; -struct tIP2PR_USER_REQ_STRUCT { +struct ip2pr_user_req { struct ib_path_record path_record; - tINT32 status; + s32 status; struct ib_device *device; tTS_IB_PORT port; struct semaphore sem; }; -typedef struct tIP2PR_USER_REQ_STRUCT tIP2PR_USER_REQ_STRUCT, *tIP2PR_USER_REQ; /* * List of Path records cached on a port on a hca */ -typedef struct tIP2PR_GID_PR_ELEMENT_STRUCT tIP2PR_GID_PR_ELEMENT_STRUCT, - *tIP2PR_GID_PR_ELEMENT; -struct tIP2PR_GID_PR_ELEMENT_STRUCT { +struct ip2pr_gid_pr_element { struct ib_path_record path_record; - tUINT32 usage; /* last used time. */ - tIP2PR_GID_PR_ELEMENT next; - tIP2PR_GID_PR_ELEMENT *p_next; + u32 usage; /* last used time. 
*/ + struct ip2pr_gid_pr_element *next; + struct ip2pr_gid_pr_element **p_next; }; /* * List of Source GID's */ -typedef struct tIP2PR_SGID_ELEMENT_STRUCT tIP2PR_SGID_ELEMENT_STRUCT, - *tIP2PR_SGID_ELEMENT; -struct tIP2PR_SGID_ELEMENT_STRUCT { +struct ip2pr_sgid_element { tTS_IB_GID gid; struct ib_device *ca; tTS_IB_PORT port; enum ib_port_state port_state; int gid_index; - tIP2PR_GID_PR_ELEMENT pr_list; - tIP2PR_SGID_ELEMENT next; /* next element in the GID list */ - tIP2PR_SGID_ELEMENT *p_next; /* previous next element in the list */ + struct ip2pr_gid_pr_element *pr_list; + struct ip2pr_sgid_element *next; /* next element in the GID list */ + struct ip2pr_sgid_element **p_next; /* previous next element in the list */ }; -struct tIP2PR_LINK_ROOT_STRUCT { +struct ip2pr_link_root { /* * waiting for resolution table */ - tIP2PR_IPOIB_WAIT wait_list; + struct ip2pr_ipoib_wait *wait_list; kmem_cache_t *wait_cache; spinlock_t wait_lock; int max_retries; @@ -296,7 +285,7 @@ /* * path record cache list. 
*/ - tIP2PR_PATH_ELEMENT path_list; + struct ip2pr_path_element *path_list; kmem_cache_t *path_cache; spinlock_t path_lock; /* @@ -307,14 +296,12 @@ /* * source gid list */ - tIP2PR_SGID_ELEMENT src_gid_list; + struct ip2pr_sgid_element *src_gid_list; kmem_cache_t *src_gid_cache; kmem_cache_t *gid_pr_cache; spinlock_t gid_lock; -}; /* tIP2PR_LINK_ROOT_STRUCT */ -typedef struct tIP2PR_LINK_ROOT_STRUCT tIP2PR_LINK_ROOT_STRUCT, - *tIP2PR_LINK_ROOT; +}; #define TS_EXPECT(mod, expr) #define TS_CHECK_NULL(value, result) Index: drivers/infiniband/ulp/ipoib/ip2pr_link.c =================================================================== --- drivers/infiniband/ulp/ipoib/ip2pr_link.c (revision 547) +++ drivers/infiniband/ulp/ipoib/ip2pr_link.c (working copy) @@ -35,7 +35,7 @@ static unsigned int ip2pr_path_timeout = 0; static unsigned int ip2pr_total_fail = 0; -static tIP2PR_LINK_ROOT_STRUCT _tsIp2prLinkRoot = { +static struct ip2pr_link_root _tsIp2prLinkRoot = { wait_list:NULL, path_list:NULL, wait_lock:SPIN_LOCK_UNLOCKED, @@ -47,7 +47,7 @@ gid_lock:SPIN_LOCK_UNLOCKED }; -tINT32 _tsIp2PrnDelete(tIP2PR_GID_PR_ELEMENT pr_elmt); +s32 _tsIp2PrnDelete(struct ip2pr_gid_pr_element *pr_elmt); static tTS_IB_GID nullgid = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; @@ -62,8 +62,9 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*.._tsIp2prPathElementLookup -- lookup a path record entry */ -static tIP2PR_PATH_ELEMENT _tsIp2prPathElementLookup(tUINT32 ip_addr) { - tIP2PR_PATH_ELEMENT path_elmt; +static struct ip2pr_path_element *_tsIp2prPathElementLookup(u32 ip_addr) +{ + struct ip2pr_path_element *path_elmt; for (path_elmt = _tsIp2prLinkRoot.path_list; NULL != path_elmt; path_elmt = path_elmt->next) { @@ -79,13 +80,12 @@ /* ========================================================================= */ /*.._tsIp2prPathElementCreate -- create an entry for a path record 
element */ -static tINT32 _tsIp2prPathElementCreate - (tUINT32 dst_addr, - tUINT32 src_addr, - tTS_IB_PORT hw_port, - struct ib_device *ca, - struct ib_path_record *path_r, tIP2PR_PATH_ELEMENT * return_elmt) { - tIP2PR_PATH_ELEMENT path_elmt; +static s32 _tsIp2prPathElementCreate(u32 dst_addr, u32 src_addr, + tTS_IB_PORT hw_port, struct ib_device *ca, + struct ib_path_record *path_r, + struct ip2pr_path_element **return_elmt) +{ + struct ip2pr_path_element *path_elmt; unsigned long flags; TS_CHECK_NULL(path_r, -EINVAL); @@ -98,7 +98,7 @@ return -ENOMEM; } /* if */ - memset(path_elmt, 0, sizeof(tIP2PR_PATH_ELEMENT_STRUCT)); + memset(path_elmt, 0, sizeof(struct ip2pr_path_element)); spin_lock_irqsave(&_tsIp2prLinkRoot.path_lock, flags); path_elmt->next = _tsIp2prLinkRoot.path_list; @@ -127,7 +127,8 @@ /* ========================================================================= */ /*.._tsIp2prPathElementDestroy -- destroy an entry for a path record element */ -static tINT32 _tsIp2prPathElementDestroy(tIP2PR_PATH_ELEMENT path_elmt) { +static s32 _tsIp2prPathElementDestroy(struct ip2pr_path_element *path_elmt) +{ unsigned long flags; TS_CHECK_NULL(path_elmt, -EINVAL); @@ -154,9 +155,11 @@ /* ========================================================================= */ /*.._tsIp2prPathLookupComplete -- complete the resolution of a path record */ -static tINT32 _tsIp2prPathLookupComplete - (tIP2PR_PATH_LOOKUP_ID plid, - tINT32 status, tIP2PR_PATH_ELEMENT path_elmt, tPTR funcptr, tPTR arg) { +static s32 _tsIp2prPathLookupComplete(tIP2PR_PATH_LOOKUP_ID plid, + s32 status, + struct ip2pr_path_element *path_elmt, + tPTR funcptr, tPTR arg) +{ tIP2PR_PATH_LOOKUP_FUNC func = (tIP2PR_PATH_LOOKUP_FUNC) funcptr; TS_CHECK_NULL(func, -EINVAL); @@ -184,8 +187,8 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*.._tsIp2prIpoibWaitDestroy -- destroy an entry for an outstanding 
request */ -static tINT32 _tsIp2prIpoibWaitDestroy - (tIP2PR_IPOIB_WAIT ipoib_wait, IP2PR_USE_LOCK use_lock) { +static s32 _tsIp2prIpoibWaitDestroy + (struct ip2pr_ipoib_wait *ipoib_wait, IP2PR_USE_LOCK use_lock) { unsigned long flags = 0; TS_CHECK_NULL(ipoib_wait, -EINVAL); @@ -214,9 +217,10 @@ /* ========================================================================= */ /*.._tsIp2prIpoibWaitTimeout -- timeout function for link resolution */ -static void _tsIp2prIpoibWaitTimeout(tPTR arg) { - tIP2PR_IPOIB_WAIT ipoib_wait = (tIP2PR_IPOIB_WAIT) arg; - tINT32 result; +static void _tsIp2prIpoibWaitTimeout(tPTR arg) +{ + struct ip2pr_ipoib_wait *ipoib_wait = (struct ip2pr_ipoib_wait *)arg; + s32 result; if (NULL == ipoib_wait) { @@ -294,21 +298,19 @@ /* ========================================================================= */ /*.._tsIp2prIpoibWaitCreate -- create an entry for an outstanding request */ -static tIP2PR_IPOIB_WAIT _tsIp2prIpoibWaitCreate - (tIP2PR_PATH_LOOKUP_ID plid, - tUINT32 dst_addr, - tUINT32 src_addr, - tUINT8 localroute, - tINT32 bound_dev_if, - tIP2PR_PATH_LOOKUP_FUNC func, tPTR arg, tINT32 ltype) { - tIP2PR_IPOIB_WAIT ipoib_wait; +static struct ip2pr_ipoib_wait * +_tsIp2prIpoibWaitCreate(tIP2PR_PATH_LOOKUP_ID plid, u32 dst_addr, u32 src_addr, + u8 localroute, u32 bound_dev_if, + tIP2PR_PATH_LOOKUP_FUNC func, tPTR arg, s32 ltype) +{ + struct ip2pr_ipoib_wait *ipoib_wait; TS_CHECK_NULL(_tsIp2prLinkRoot.wait_cache, NULL); ipoib_wait = kmem_cache_alloc(_tsIp2prLinkRoot.wait_cache, SLAB_ATOMIC); if (NULL != ipoib_wait) { - memset(ipoib_wait, 0, sizeof(tIP2PR_IPOIB_WAIT_STRUCT)); + memset(ipoib_wait, 0, sizeof(struct ip2pr_ipoib_wait)); /* * start timer only for IP to PR lookups @@ -349,7 +351,8 @@ /* ========================================================================= */ /*.._tsIp2prIpoibWaitListInsert -- insert an entry into the wait list */ -static tINT32 _tsIp2prIpoibWaitListInsert(tIP2PR_IPOIB_WAIT ipoib_wait) { +static s32 
_tsIp2prIpoibWaitListInsert(struct ip2pr_ipoib_wait *ipoib_wait) +{ unsigned long flags; TS_CHECK_NULL(ipoib_wait, -EINVAL); @@ -385,9 +388,11 @@ /* ========================================================================= */ /*.._tsIp2prIpoibWaitPlidLookup -- lookup an entry for an outstanding request */ -static tIP2PR_IPOIB_WAIT tsIp2prIpoibWaitPlidLookup(tIP2PR_PATH_LOOKUP_ID plid) { +static struct ip2pr_ipoib_wait * +tsIp2prIpoibWaitPlidLookup(tIP2PR_PATH_LOOKUP_ID plid) +{ unsigned long flags; - tIP2PR_IPOIB_WAIT ipoib_wait; + struct ip2pr_ipoib_wait *ipoib_wait; spin_lock_irqsave(&_tsIp2prLinkRoot.wait_lock, flags); for (ipoib_wait = _tsIp2prLinkRoot.wait_list; @@ -405,11 +410,12 @@ /* ========================================================================= */ /*..tsIp2prPathElementTableDump - dump the path record element table to proc */ -tINT32 tsIp2prPathElementTableDump - (tSTR buffer, tINT32 max_size, tINT32 start_index, long *end_index) { - tIP2PR_PATH_ELEMENT path_elmt; - tINT32 counter = 0; - tINT32 offset = 0; +s32 tsIp2prPathElementTableDump(tSTR buffer, s32 max_size, s32 start_index, + long *end_index) +{ + struct ip2pr_path_element *path_elmt; + s32 counter = 0; + s32 offset = 0; unsigned long flags; TS_CHECK_NULL(buffer, -EINVAL); @@ -448,12 +454,12 @@ ((path_elmt->dst_addr >> 16) & 0xff), ((path_elmt->dst_addr >> 24) & 0xff), (unsigned long long) - be64_to_cpu(*(tUINT64 *) path_elmt-> + be64_to_cpu(*(u64 *) path_elmt-> path_s.dgid), (unsigned long long) - be64_to_cpu(*(tUINT64 *) + be64_to_cpu(*(u64 *) (path_elmt->path_s.dgid + - sizeof(tUINT64))), + sizeof(u64))), path_elmt->path_s.dlid, path_elmt->path_s.slid, path_elmt->path_s.pkey, @@ -473,11 +479,13 @@ /* ========================================================================= */ /*..tsIp2prIpoibWaitTableDump - dump the address resolution wait table to proc */ -tINT32 tsIp2prIpoibWaitTableDump - (tSTR buffer, tINT32 max_size, tINT32 start_index, long *end_index) { - tIP2PR_IPOIB_WAIT 
ipoib_wait; - tINT32 counter = 0; - tINT32 offset = 0; +s32 +tsIp2prIpoibWaitTableDump(tSTR buffer, s32 max_size, s32 start_index, + long *end_index) +{ + struct ip2pr_ipoib_wait *ipoib_wait; + s32 counter = 0; + s32 offset = 0; unsigned long flags; TS_CHECK_NULL(buffer, -EINVAL); @@ -530,11 +538,10 @@ } /* tsIp2prIpoibWaitTableDump */ /* ..tsIp2prProcReadInt. dump integer value to /proc file */ -tINT32 tsIp2prProcReadInt(tSTR buffer, - tINT32 max_size, - tINT32 start_index, long *end_index, int val) +s32 tsIp2prProcReadInt(tSTR buffer, s32 max_size, s32 start_index, + long *end_index, int val) { - tINT32 offset = 0; + s32 offset = 0; TS_CHECK_NULL(buffer, -EINVAL); TS_CHECK_NULL(end_index, -EINVAL); @@ -547,9 +554,8 @@ } /* ..tsIp2prProcMaxRetriesRead. dump current retry value */ -tINT32 tsIp2prProcRetriesRead(tSTR buffer, - tINT32 max_size, - tINT32 start_index, long *end_index) +s32 tsIp2prProcRetriesRead(tSTR buffer, s32 max_size, s32 start_index, + long *end_index) { return (tsIp2prProcReadInt(buffer, @@ -559,9 +565,8 @@ } /* ..tsIp2prProcTimeoutRead. dump current timeout value */ -tINT32 tsIp2prProcTimeoutRead(tSTR buffer, - tINT32 max_size, - tINT32 start_index, long *end_index) +s32 tsIp2prProcTimeoutRead(tSTR buffer, s32 max_size, s32 start_index, + long *end_index) { return (tsIp2prProcReadInt(buffer, @@ -571,9 +576,8 @@ } /* ..tsIp2prProcBackoutRead. dump current backout value */ -tINT32 tsIp2prProcBackoffRead(tSTR buffer, - tINT32 max_size, - tINT32 start_index, long *end_index) +s32 tsIp2prProcBackoffRead(tSTR buffer, s32 max_size, s32 start_index, + long *end_index) { return (tsIp2prProcReadInt(buffer, @@ -583,9 +587,8 @@ } /* ..tsIp2prProcCacheTimeoutRead. 
dump current cache timeout value */ -tINT32 tsIp2prProcCacheTimeoutRead(tSTR buffer, - tINT32 max_size, - tINT32 start_index, long *end_index) +s32 tsIp2prProcCacheTimeoutRead(tSTR buffer, s32 max_size, s32 start_index, + long *end_index) { return (tsIp2prProcReadInt(buffer, @@ -595,8 +598,8 @@ } /* ..tsIp2prProcMaxRetriesRead. dump current retry value */ -tINT32 tsIp2prProcTotalReq(tSTR buffer, - tINT32 max_size, tINT32 start_index, long *end_index) +s32 tsIp2prProcTotalReq(tSTR buffer, s32 max_size, s32 start_index, + long *end_index) { return (tsIp2prProcReadInt(buffer, @@ -605,9 +608,8 @@ } /* ..tsIp2prProcMaxRetriesRead. dump current retry value */ -tINT32 tsIp2prProcArpTimeout(tSTR buffer, - tINT32 max_size, - tINT32 start_index, long *end_index) +s32 tsIp2prProcArpTimeout(tSTR buffer, s32 max_size, s32 start_index, + long *end_index) { return (tsIp2prProcReadInt(buffer, @@ -616,9 +618,8 @@ } /* ..tsIp2prProcMaxRetriesRead. dump current retry value */ -tINT32 tsIp2prProcPathTimeout(tSTR buffer, - tINT32 max_size, - tINT32 start_index, long *end_index) +s32 tsIp2prProcPathTimeout(tSTR buffer, s32 max_size, s32 start_index, + long *end_index) { return (tsIp2prProcReadInt(buffer, @@ -627,9 +628,8 @@ } /* ..tsIp2prProcMaxRetriesRead. 
dump current retry value */ -tINT32 tsIp2prProcTotalFail(tSTR buffer, - tINT32 max_size, - tINT32 start_index, long *end_index) +s32 tsIp2prProcTotalFail(tSTR buffer, s32 max_size, s32 start_index, + long *end_index) { return (tsIp2prProcReadInt(buffer, @@ -726,12 +726,13 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*.._tsIp2prPathRecordComplete -- path lookup complete, save result */ -static tINT32 _tsIp2prPathRecordComplete - (tTS_IB_CLIENT_QUERY_TID tid, - tINT32 status, struct ib_path_record *path, tINT32 remaining, tPTR arg) { - tIP2PR_IPOIB_WAIT ipoib_wait = (tIP2PR_IPOIB_WAIT) arg; - tIP2PR_PATH_ELEMENT path_elmt = NULL; - tINT32 result; +static s32 _tsIp2prPathRecordComplete(tTS_IB_CLIENT_QUERY_TID tid, s32 status, + struct ib_path_record *path, + s32 remaining, tPTR arg) +{ + struct ip2pr_ipoib_wait *ipoib_wait = (struct ip2pr_ipoib_wait *) arg; + struct ip2pr_path_element *path_elmt = NULL; + s32 result; TS_CHECK_NULL(ipoib_wait, -EINVAL); TS_CHECK_NULL(path, -EINVAL); @@ -803,9 +804,9 @@ TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_INOUT, "POST: Path record lookup complete. 
<%016llx:%016llx:%d>", - be64_to_cpu(*(tUINT64 *) path->dgid), - be64_to_cpu(*(tUINT64 *) - (path->dgid + sizeof(tUINT64))), + be64_to_cpu(*(u64 *) path->dgid), + be64_to_cpu(*(u64 *) + (path->dgid + sizeof(u64))), path->dlid); result = _tsIp2prPathElementCreate(ipoib_wait->dst_addr, @@ -867,10 +868,11 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*.._tsIp2prLinkFindComplete -- complete the resolution of an ip address */ -static tINT32 _tsIp2prLinkFindComplete - (tIP2PR_IPOIB_WAIT ipoib_wait, tINT32 status, IP2PR_USE_LOCK use_lock) { - tINT32 result = 0; - tINT32 expect; +static s32 _tsIp2prLinkFindComplete(struct ip2pr_ipoib_wait *ipoib_wait, + s32 status, IP2PR_USE_LOCK use_lock) +{ + s32 result = 0; + s32 expect; unsigned long flags; TS_CHECK_NULL(ipoib_wait, -EINVAL); @@ -968,7 +970,8 @@ /* ========================================================================= */ /*.._tsIp2prArpQuery -- query arp cache */ -static int tsIp2prArpQuery(tIP2PR_IPOIB_WAIT ipoib_wait, tUINT32 * state) { +static int tsIp2prArpQuery(struct ip2pr_ipoib_wait *ipoib_wait, u32 * state) +{ struct neighbour *neigh; extern struct neigh_table arp_tbl; @@ -990,9 +993,10 @@ /* ========================================================================= */ /*.._tsIp2prLinkFind -- resolve an ip address to a ipoib link address. */ -static tINT32 _tsIp2prLinkFind(tIP2PR_IPOIB_WAIT ipoib_wait) { - tINT32 result; - tUINT32 state; +static s32 _tsIp2prLinkFind(struct ip2pr_ipoib_wait *ipoib_wait) +{ + s32 result; + u32 state; struct rtable *rt; char devname[20]; int i; @@ -1223,11 +1227,12 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*.._tsIp2prArpRecvComplete -- receive all ARP packets. 
*/ -static void _tsIp2prArpRecvComplete(tPTR arg) { - tIP2PR_IPOIB_WAIT ipoib_wait; - tIP2PR_IPOIB_WAIT next_wait; - tUINT32 ip_addr = (unsigned long)arg; - tINT32 result; +static void _tsIp2prArpRecvComplete(tPTR arg) +{ + struct ip2pr_ipoib_wait *ipoib_wait; + struct ip2pr_ipoib_wait *next_wait; + u32 ip_addr = (unsigned long)arg; + s32 result; unsigned long flags; TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_INOUT, @@ -1276,12 +1281,12 @@ /* ========================================================================= */ /*.._tsIp2prArpRecv -- receive all ARP packets. */ -static tINT32 _tsIp2prArpRecv(struct sk_buff *skb, struct net_device *dev, - struct packet_type *pt) +static s32 _tsIp2prArpRecv(struct sk_buff *skb, struct net_device *dev, + struct packet_type *pt) { - tIP2PR_IPOIB_WAIT ipoib_wait; - tIP2PR_IPOIB_ARP arp_hdr; - tINT32 counter; + struct ip2pr_ipoib_wait *ipoib_wait; + struct ip2pr_ipoib_arp *arp_hdr; + s32 counter; unsigned long flags; struct work_struct *tqp = NULL; @@ -1289,7 +1294,7 @@ TS_CHECK_NULL(skb, -EINVAL); TS_CHECK_NULL(skb->nh.raw, -EINVAL); - arp_hdr = (tIP2PR_IPOIB_ARP) skb->nh.raw; + arp_hdr = (struct ip2pr_ipoib_arp *) skb->nh.raw; #if 0 TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_INOUT, @@ -1364,11 +1369,11 @@ static void _tsIp2prAsyncEventFunc(struct ib_async_event_record *record, void *arg) { - tIP2PR_PATH_ELEMENT path_elmt; - tINT32 result; - tIP2PR_SGID_ELEMENT sgid_elmt; + struct ip2pr_path_element *path_elmt; + s32 result; + struct ip2pr_sgid_element *sgid_elmt; unsigned long flags; - tIP2PR_GID_PR_ELEMENT prn_elmt; + struct ip2pr_gid_pr_element *prn_elmt; if (NULL == record) { @@ -1432,12 +1437,13 @@ /* ========================================================================= */ /*.._tsIp2prPathSweepTimerFunc --sweep path cache to reap old entries. 
*/ -static void _tsIp2prPathSweepTimerFunc(tPTR arg) { - tIP2PR_PATH_ELEMENT path_elmt; - tIP2PR_PATH_ELEMENT next_elmt; - tINT32 result; - tIP2PR_SGID_ELEMENT sgid_elmt; - tIP2PR_GID_PR_ELEMENT prn_elmt, next_prn; +static void _tsIp2prPathSweepTimerFunc(tPTR arg) +{ + struct ip2pr_path_element *path_elmt; + struct ip2pr_path_element *next_elmt; + s32 result; + struct ip2pr_sgid_element *sgid_elmt; + struct ip2pr_gid_pr_element *prn_elmt, *next_prn; /* cache_timeout of zero implies static path records. */ if (_tsIp2prLinkRoot.cache_timeout) { @@ -1448,7 +1454,7 @@ while (NULL != path_elmt) { next_elmt = path_elmt->next; if (!((_tsIp2prLinkRoot.cache_timeout * HZ) > - (tINT32) (jiffies - path_elmt->usage))) { + (s32) (jiffies - path_elmt->usage))) { TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_INOUT, @@ -1472,7 +1478,7 @@ while (NULL != prn_elmt) { next_prn = prn_elmt->next; if (!((_tsIp2prLinkRoot.cache_timeout * HZ) > - (tINT32) (jiffies - prn_elmt->usage))) { + (s32) (jiffies - prn_elmt->usage))) { TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_INOUT, @@ -1502,16 +1508,14 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*..tsSdpPathRecordLookup -- resolve an ip address to a path record */ -tINT32 tsIp2prPathRecordLookup(tUINT32 dst_addr, /* NBO */ - tUINT32 src_addr, /* NBO */ - tUINT8 localroute, - tINT32 bound_dev_if, - tIP2PR_PATH_LOOKUP_FUNC func, - tPTR arg, tIP2PR_PATH_LOOKUP_ID * plid) { - tIP2PR_PATH_ELEMENT path_elmt; - tIP2PR_IPOIB_WAIT ipoib_wait; - tINT32 result = 0; - tINT32 expect; +s32 tsIp2prPathRecordLookup(u32 dst_addr, u32 src_addr, u8 localroute, + s32 bound_dev_if, tIP2PR_PATH_LOOKUP_FUNC func, + tPTR arg, tIP2PR_PATH_LOOKUP_ID * plid) +{ + struct ip2pr_path_element *path_elmt; + struct ip2pr_ipoib_wait *ipoib_wait; + s32 result = 0; + s32 expect; TS_CHECK_NULL(plid, -EINVAL); TS_CHECK_NULL(func, -EINVAL); @@ -1579,9 +1583,10 @@ /* 
========================================================================= */ /*..tsIp2prPathRecordCancel -- cancel a lookup for an address. */ -tINT32 tsIp2prPathRecordCancel(tIP2PR_PATH_LOOKUP_ID plid) { - tIP2PR_IPOIB_WAIT ipoib_wait; - tINT32 result; +s32 tsIp2prPathRecordCancel(tIP2PR_PATH_LOOKUP_ID plid) +{ + struct ip2pr_ipoib_wait *ipoib_wait; + s32 result; if (TS_IP2PR_PATH_LOOKUP_INVALID == plid) { @@ -1622,9 +1627,10 @@ } /* tsIp2prPathRecordCancel */ /*..tsGid2prCancel -- cancel a lookup for an address. */ -tINT32 tsGid2prCancel(tIP2PR_PATH_LOOKUP_ID plid) { - tIP2PR_IPOIB_WAIT ipoib_wait; - tINT32 result; +s32 tsGid2prCancel(tIP2PR_PATH_LOOKUP_ID plid) +{ + struct ip2pr_ipoib_wait *ipoib_wait; + s32 result; if (TS_IP2PR_PATH_LOOKUP_INVALID == plid) { @@ -1672,12 +1678,12 @@ /* ========================================================================= */ /*.._tsIp2prGidCacheLookup -- Lookup for GID in cache */ -tINT32 _tsIp2prGidCacheLookup - (tTS_IB_GID src_gid, - tTS_IB_GID dst_gid, - struct ib_path_record *path_record, tIP2PR_SGID_ELEMENT * gid_node) { - tIP2PR_SGID_ELEMENT sgid_elmt; - tIP2PR_GID_PR_ELEMENT prn_elmt; +s32 _tsIp2prGidCacheLookup(tTS_IB_GID src_gid, tTS_IB_GID dst_gid, + struct ib_path_record *path_record, + struct ip2pr_sgid_element **gid_node) +{ + struct ip2pr_sgid_element *sgid_elmt; + struct ip2pr_gid_pr_element *prn_elmt; unsigned long flags; *gid_node = NULL; @@ -1739,8 +1745,10 @@ /* ========================================================================= */ /*.._tsIp2prSrcGidNodeGet -- */ -tINT32 _tsIp2prSrcGidNodeGet(tTS_IB_GID src_gid, tIP2PR_SGID_ELEMENT * gid_node) { - tIP2PR_SGID_ELEMENT sgid_elmt; +s32 _tsIp2prSrcGidNodeGet(tTS_IB_GID src_gid, + struct ip2pr_sgid_element **gid_node) +{ + struct ip2pr_sgid_element *sgid_elmt; unsigned long flags; *gid_node = NULL; @@ -1761,11 +1769,12 @@ /* ========================================================================= */ /*.._tsIp2prGidElementAdd -- Add one node to Source GID 
List. */ -tINT32 _tsIp2prGidElementAdd - (tIP2PR_IPOIB_WAIT ipoib_wait, struct ib_path_record *path_record) { +s32 _tsIp2prGidElementAdd(struct ip2pr_ipoib_wait *ipoib_wait, + struct ib_path_record *path_record) +{ unsigned long flags; - tIP2PR_SGID_ELEMENT gid_node = NULL; - tIP2PR_GID_PR_ELEMENT prn_elmt; + struct ip2pr_sgid_element *gid_node = NULL; + struct ip2pr_gid_pr_element *prn_elmt; if (_tsIp2prSrcGidNodeGet(ipoib_wait->src_gid, &gid_node)) { return (-EINVAL); @@ -1798,7 +1807,8 @@ return (0); } -tINT32 _tsIp2PrnDelete(tIP2PR_GID_PR_ELEMENT prn_elmt) { +s32 _tsIp2PrnDelete(struct ip2pr_gid_pr_element *prn_elmt) +{ if (NULL != prn_elmt->p_next) { if (NULL != prn_elmt->next) { @@ -1817,9 +1827,10 @@ /* ========================================================================= */ /*.._tsIp2prSrcGidDelete -- Cleanup one node in Source GID List. */ -tINT32 _tsIp2prSrcGidDelete(tIP2PR_SGID_ELEMENT sgid_elmt) { +s32 _tsIp2prSrcGidDelete(struct ip2pr_sgid_element *sgid_elmt) +{ unsigned long flags; - tIP2PR_GID_PR_ELEMENT prn_elmt; + struct ip2pr_gid_pr_element *prn_elmt; spin_lock_irqsave(&_tsIp2prLinkRoot.gid_lock, flags); @@ -1850,11 +1861,11 @@ /* ========================================================================= */ /*.._tsIp2prSrcGidAdd -- Add one node to Source GID List. 
*/ -tINT32 _tsIp2prSrcGidAdd(struct ib_device *hca_device, - tTS_IB_PORT port, - enum ib_port_state port_state) +s32 _tsIp2prSrcGidAdd(struct ib_device *hca_device, + tTS_IB_PORT port, + enum ib_port_state port_state) { - tIP2PR_SGID_ELEMENT sgid_elmt; + struct ip2pr_sgid_element *sgid_elmt; unsigned long flags; sgid_elmt = @@ -1865,7 +1876,7 @@ return (-ENOMEM); } - memset(sgid_elmt, 0, sizeof(tIP2PR_SGID_ELEMENT_STRUCT)); + memset(sgid_elmt, 0, sizeof(struct ip2pr_sgid_element)); if (ib_gid_entry_get(hca_device, port, 0, sgid_elmt->gid)) { kmem_cache_free(_tsIp2prLinkRoot.src_gid_cache, sgid_elmt); return (-EFAULT); @@ -1899,11 +1910,12 @@ /* ========================================================================= */ /*.._tsGid2prComplete -- path lookup complete, save result */ -static tINT32 _tsGid2prComplete - (tTS_IB_CLIENT_QUERY_TID tid, - tINT32 status, struct ib_path_record *path, tINT32 remaining, tPTR arg) { - tINT32 result; - tIP2PR_IPOIB_WAIT ipoib_wait = (tIP2PR_IPOIB_WAIT) arg; +static s32 _tsGid2prComplete(tTS_IB_CLIENT_QUERY_TID tid, s32 status, + struct ib_path_record *path, s32 remaining, + tPTR arg) +{ + s32 result; + struct ip2pr_ipoib_wait *ipoib_wait = (struct ip2pr_ipoib_wait *) arg; tGID2PR_LOOKUP_FUNC func; if (tid != ipoib_wait->tid) { @@ -1962,14 +1974,13 @@ /* ========================================================================= */ /*..tsGid2prLookup -- Resolve a destination GD to Path Record */ -tINT32 tsGid2prLookup - (tTS_IB_GID src_gid, - tTS_IB_GID dst_gid, - u16 pkey, - tGID2PR_LOOKUP_FUNC funcptr, tPTR arg, tIP2PR_PATH_LOOKUP_ID * plid) { - tIP2PR_SGID_ELEMENT gid_node; - tINT32 result; - tIP2PR_IPOIB_WAIT ipoib_wait; +s32 tsGid2prLookup(tTS_IB_GID src_gid, tTS_IB_GID dst_gid, u16 pkey, + tGID2PR_LOOKUP_FUNC funcptr, tPTR arg, + tIP2PR_PATH_LOOKUP_ID * plid) +{ + struct ip2pr_sgid_element *gid_node; + s32 result; + struct ip2pr_ipoib_wait *ipoib_wait; struct ib_path_record path_record; tGID2PR_LOOKUP_FUNC func; @@ -2052,9 
+2063,10 @@ /* ========================================================================= */ /*..tsIp2prSrcGidCleanup -- Cleanup the Source GID List. */ -tINT32 tsIp2prSrcGidCleanup(void) { - tIP2PR_SGID_ELEMENT sgid_elmt; - tINT32 result; +s32 tsIp2prSrcGidCleanup(void) +{ + struct ip2pr_sgid_element *sgid_elmt; + s32 result; while (NULL != (sgid_elmt = _tsIp2prLinkRoot.src_gid_list)) { @@ -2070,8 +2082,9 @@ /* ========================================================================= */ /*..tsIp2prSrcGidInit -- initialize the Source GID List. */ -tINT32 tsIp2prSrcGidInit(void) { - tINT32 result = 0; +s32 tsIp2prSrcGidInit(void) +{ + s32 result = 0; int i, j; struct ib_device *hca_device; struct ib_device_properties dev_prop; @@ -2079,7 +2092,7 @@ _tsIp2prLinkRoot.src_gid_cache = kmem_cache_create("Ip2prSrcGidList", sizeof - (tIP2PR_SGID_ELEMENT_STRUCT), + (struct ip2pr_sgid_element), 0, SLAB_HWCACHE_ALIGN, NULL, NULL); @@ -2092,7 +2105,7 @@ /* if */ _tsIp2prLinkRoot.gid_pr_cache = kmem_cache_create("Ip2prGidPrList", sizeof - (tIP2PR_GID_PR_ELEMENT_STRUCT), + (struct ip2pr_gid_pr_element), 0, SLAB_HWCACHE_ALIGN, NULL, NULL); if (NULL == _tsIp2prLinkRoot.gid_pr_cache) { @@ -2138,8 +2151,9 @@ /* ========================================================================= */ /*..tsIp2prLinkAddrInit -- initialize the advertisment caches. 
*/ -tINT32 tsIp2prLinkAddrInit(void) { - tINT32 result = 0; +s32 tsIp2prLinkAddrInit(void) +{ + s32 result = 0; struct ib_async_event_record evt_rec; int i; struct ib_device *hca_device; @@ -2161,7 +2175,7 @@ */ _tsIp2prLinkRoot.wait_cache = kmem_cache_create("Ip2prIpoibWait", sizeof - (tIP2PR_IPOIB_WAIT_STRUCT), + (struct ip2pr_ipoib_wait), 0, SLAB_HWCACHE_ALIGN, NULL, NULL); if (NULL == _tsIp2prLinkRoot.wait_cache) { @@ -2175,7 +2189,7 @@ /* if */ _tsIp2prLinkRoot.path_cache = kmem_cache_create("Ip2prPathLookup", sizeof - (tIP2PR_PATH_ELEMENT_STRUCT), + (struct ip2pr_path_element), 0, SLAB_HWCACHE_ALIGN, NULL, NULL); if (NULL == _tsIp2prLinkRoot.path_cache) { @@ -2188,7 +2202,8 @@ } /* if */ _tsIp2prLinkRoot.user_req = kmem_cache_create("Ip2prUserReq", - sizeof(tIP2PR_USER_REQ_STRUCT), + sizeof + (struct ip2pr_user_req), 0, SLAB_HWCACHE_ALIGN, NULL, NULL); if (NULL == _tsIp2prLinkRoot.user_req) { @@ -2286,10 +2301,11 @@ /* ========================================================================= */ /*..tsIp2prLinkAddrCleanup -- cleanup the advertisment caches. 
*/ -tINT32 tsIp2prLinkAddrCleanup(void) { - tIP2PR_PATH_ELEMENT path_elmt; - tIP2PR_IPOIB_WAIT ipoib_wait; - tUINT32 result; +s32 tsIp2prLinkAddrCleanup(void) +{ + struct ip2pr_path_element *path_elmt; + struct ip2pr_ipoib_wait *ipoib_wait; + u32 result; int i; TS_CHECK_NULL(_tsIp2prLinkRoot.wait_cache, -EINVAL); @@ -2346,20 +2362,18 @@ /* ========================================================================= */ /*..tsIp2prCbInternal -- Callback for IP to Path Record Lookup */ -static tINT32 _tsIp2prCbInternal - (tIP2PR_PATH_LOOKUP_ID plid, - tINT32 status, - tUINT32 src_addr, - tUINT32 dst_addr, - tTS_IB_PORT hw_port, - struct ib_device *ca, struct ib_path_record *path, tPTR usr_arg) { - tIP2PR_USER_REQ ureq; +static s32 _tsIp2prCbInternal(tIP2PR_PATH_LOOKUP_ID plid, s32 status, + u32 src_addr, u32 dst_addr, tTS_IB_PORT hw_port, + struct ib_device *ca, struct ib_path_record *path, + tPTR usr_arg) +{ + struct ip2pr_user_req *ureq; if (usr_arg == NULL) { TS_REPORT_WARN(MOD_IP2PR, "Called with a NULL usr_arg"); return -1; } - ureq = (tIP2PR_USER_REQ) usr_arg; + ureq = (struct ip2pr_user_req*) usr_arg; ureq->status = status; if (0 == status) { @@ -2373,18 +2387,17 @@ /* ========================================================================= */ /*..tsIp2prCbInternal -- Callback for Gid to Path Record Lookup */ -static tINT32 _tsGid2prCbInternal - (tIP2PR_PATH_LOOKUP_ID plid, - tINT32 status, - tTS_IB_PORT hw_port, - struct ib_device *ca, struct ib_path_record *path, tPTR usr_arg) { - tIP2PR_USER_REQ ureq; +static s32 _tsGid2prCbInternal(tIP2PR_PATH_LOOKUP_ID plid, s32 status, + tTS_IB_PORT hw_port, struct ib_device *ca, + struct ib_path_record *path, tPTR usr_arg) +{ + struct ip2pr_user_req *ureq; if (usr_arg == NULL) { TS_REPORT_WARN(MOD_IP2PR, "Called with a NULL usr_arg"); return -1; } - ureq = (tIP2PR_USER_REQ) usr_arg; + ureq = (struct ip2pr_user_req *)usr_arg; ureq->status = status; ureq->port = hw_port; ureq->device = ca; @@ -2400,10 +2413,11 @@ /* 
========================================================================= */ /*..tsIp2prUserLookup -- Process a IP to Path Record lookup ioctl request */ -tINT32 _tsIp2prUserLookup(unsigned long arg) { - tIP2PR_USER_REQ ureq; +s32 _tsIp2prUserLookup(unsigned long arg) +{ + struct ip2pr_user_req *ureq; tIP2PR_LOOKUP_PARAM_STRUCT param; - tINT32 status; + s32 status; tIP2PR_PATH_LOOKUP_ID plid; if (0 == arg) { @@ -2452,11 +2466,12 @@ /* ========================================================================= */ /*..tsGid2prUserLookup -- Process a Gid to Path Record lookup ioctl request */ -tINT32 _tsGid2prUserLookup(unsigned long arg) { - tIP2PR_USER_REQ ureq; +s32 _tsGid2prUserLookup(unsigned long arg) +{ + struct ip2pr_user_req *ureq; tGID2PR_LOOKUP_PARAM_STRUCT param; tGID2PR_LOOKUP_PARAM upa; - tINT32 status; + s32 status; tIP2PR_PATH_LOOKUP_ID plid; if (0 == arg) { From greg at kroah.com Fri Jul 30 11:03:53 2004 From: greg at kroah.com (Greg KH) Date: Fri, 30 Jul 2004 11:03:53 -0700 Subject: [openib-general] [PATCH] Fix underlying problem with using typedefs [WAS: Nasty bug] In-Reply-To: <1091209468.2772.1081.camel@localhost> References: <1091209468.2772.1081.camel@localhost> Message-ID: <20040730180353.GA29306@kroah.com> On Fri, Jul 30, 2004 at 10:44:29AM -0700, Tom Duffy wrote: > + struct ip2pr_path_element *path_elmt; > unsigned long flags; > > TS_CHECK_NULL(path_r, -EINVAL); > @@ -98,7 +98,7 @@ > return -ENOMEM; > } > /* if */ > - memset(path_elmt, 0, sizeof(tIP2PR_PATH_ELEMENT_STRUCT)); > + memset(path_elmt, 0, sizeof(struct ip2pr_path_element)); No, that should be: memset(path_elmt, 0, sizeof(*path_elmt)); to prevent any future change of that variable type causing a problem. Please change your patch to use this style. 
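A minimal sketch of the idiom requested above, assuming a hypothetical struct path_element that only stands in for the real ip2pr structures (the names and layout here are illustrative, not taken from the patch). Sizing through the object itself keeps memset() and memcmp() correct if the variable's type later changes:

```c
#include <string.h>

/* Hypothetical stand-in for a cached element; not the real ip2pr layout. */
struct path_element {
	unsigned int src_addr;
	unsigned int dst_addr;
	unsigned char gid[16];
	struct path_element *next;
};

/* Zero the object through the pointer's own type: if the type of elmt
 * changes later, the size passed to memset() follows automatically. */
static void path_element_clear(struct path_element *elmt)
{
	memset(elmt, 0, sizeof(*elmt));
}

/* Compare GIDs using the size of the array member. Note that sizeof(gid)
 * on the parameter would be wrong here: it yields the size of a pointer,
 * and the same holds for a parameter declared with an array typedef,
 * because array parameters decay to pointers. */
static int path_element_gid_equal(const struct path_element *elmt,
				  const unsigned char *gid)
{
	return memcmp(elmt->gid, gid, sizeof(elmt->gid)) == 0;
}
```

The same caveat would apply to any helper that receives a GID as an array-typed parameter: the size has to come from a real array object or from the type, not from the parameter.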
thanks, greg k-h From mshefty at ichips.intel.com Fri Jul 30 10:20:22 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 30 Jul 2004 10:20:22 -0700 Subject: [openib-general] ib_query/modify_hca_xxx in ib_verbs In-Reply-To: <00ae01c47626$809e2860$6401a8c0@comcast.net> References: <00ae01c47626$809e2860$6401a8c0@comcast.net> Message-ID: <20040730102022.7f0e4e0b.mshefty@ichips.intel.com> On Fri, 30 Jul 2004 07:15:19 -0400 Hal Rosenstock wrote: > Should the ib_query/modify_hca_xxx calls now be ib_query/modify_device_xxx > calls ? Here's a patch that renames these calls and simplifies the query pkey/gid routines. Unless there are objections, I will update the file. - Sean Index: ib_verbs.h =================================================================== --- ib_verbs.h (revision 544) +++ ib_verbs.h (working copy) @@ -534,28 +534,26 @@ IB_CQ_NEXT_COMP }; -int ib_query_hca_cap(struct ib_device *device, - struct ib_device_cap *device_cap); +int ib_query_device(struct ib_device *device, + struct ib_device_cap *device_cap); -int ib_query_hca_port_prop(struct ib_device *device, - u8 port_num, - struct ib_port *port); - -int ib_query_hca_gid_tbl(struct ib_device *device, - u8 port_num, - int tbl_len_in, - int *tbl_len_out, - union ib_gid *gid_tbl); - -int ib_query_hca_pkey_tbl(struct ib_device *device, - u8 port_num, - int tbl_len_in, - int *tbl_len_out, - u16 *pkey_tbl); - -int ib_modify_hca_attr(struct ib_device *device, - u8 port_num, - int device_attr_flags); +int ib_query_port(struct ib_device *device, + u8 port_num, + struct ib_port *port); + +int ib_query_gid(struct ib_device *device, + u8 port_num, + int index, + union ib_gid *gid); + +int ib_query_pkey(struct ib_device *device, + u8 port_num, + u16 index, + u16 *pkey); + +int ib_modify_device(struct ib_device *device, + u8 port_num, + int device_attr_flags); struct ib_pd *ib_alloc_pd(struct ib_device *device); From Tom.Duffy at Sun.COM Fri Jul 30 11:42:05 2004 From: Tom.Duffy at Sun.COM (Tom Duffy) Date: 
Fri, 30 Jul 2004 11:42:05 -0700 Subject: [openib-general] [PATCH] [FIXED] Fix underlying problem with using typedefs In-Reply-To: <20040730180353.GA29306@kroah.com> References: <1091209468.2772.1081.camel@localhost> <20040730180353.GA29306@kroah.com> Message-ID: <1091212925.2772.1148.camel@localhost> On Fri, 2004-07-30 at 11:03, Greg KH wrote: > No, that should be: > memset(path_elmt, 0, sizeof(*path_elmt)); > > to prevent any future change of that variable type causing a problem. > > Please change your patch to use this style. OK, I changed it throughout ip2pr_link.c Index: drivers/infiniband/ulp/ipoib/ip2pr_priv.h =================================================================== --- drivers/infiniband/ulp/ipoib/ip2pr_priv.h (revision 547) +++ drivers/infiniband/ulp/ipoib/ip2pr_priv.h (working copy) @@ -92,46 +92,43 @@ /* * IPoIB hardware address. */ -struct tIP2PR_IPOIB_ADDR_STRUCT { - tUINT32 qpn; /* MSB = reserved, low 3 bytes=QPN */ +struct ip2pr_ipoib_addr { + u32 qpn; /* MSB = reserved, low 3 bytes=QPN */ union { - tUINT8 all[16]; + u8 all[16]; struct { - tUINT64 high; - tUINT64 low; + u64 high; + u64 low; } s; } gid; } __attribute__ ((packed)); -typedef struct tIP2PR_IPOIB_ADDR_STRUCT tIP2PR_IPOIB_ADDR_STRUCT, - *tIP2PR_IPOIB_ADDR; + /* * The two src and dst addresses are the same size as the GRH. */ -struct tIP2PR_IPOIB_HDR_STRUCT { - tIP2PR_IPOIB_ADDR_STRUCT src; - tIP2PR_IPOIB_ADDR_STRUCT dst; - tUINT16 proto; - tUINT16 reserved; +struct ip2pr_ipoib_hdr { + struct ip2pr_ipoib_addr src; + struct ip2pr_ipoib_addr dst; + u16 proto; + u16 reserved; } __attribute__ ((packed)); /* * Ethernet/IPoIB pseudo ARP header, used by out IPoIB driver. 
*/ -struct tIP2PR_IPOIB_ARP_STRUCT { - tUINT16 addr_type; /* format of hardware address */ - tUINT16 proto_type; /* format of protocol address */ - tUINT8 addr_len; /* length of hardware address */ - tUINT8 proto_len; /* length of protocol address */ - tUINT16 cmd; /* ARP opcode (command) */ +struct ip2pr_ipoib_arp { + u16 addr_type; /* format of hardware address */ + u16 proto_type; /* format of protocol address */ + u8 addr_len; /* length of hardware address */ + u8 proto_len; /* length of protocol address */ + u16 cmd; /* ARP opcode (command) */ /* * begin ethernet */ - tUINT8 src_hw[ETH_ALEN]; - tUINT32 src_ip; - tUINT8 dst_hw[ETH_ALEN]; - tUINT32 dst_ip; + u8 src_hw[ETH_ALEN]; + u32 src_ip; + u8 dst_hw[ETH_ALEN]; + u32 dst_ip; } __attribute__ ((packed)); -typedef struct tIP2PR_IPOIB_ARP_STRUCT tIP2PR_IPOIB_ARP_STRUCT, - *tIP2PR_IPOIB_ARP; typedef enum { IP2PR_LOCK_HELD = 0, @@ -193,98 +190,90 @@ /* * tables */ -typedef struct tIP2PR_IPOIB_WAIT_STRUCT tIP2PR_IPOIB_WAIT_STRUCT, - *tIP2PR_IPOIB_WAIT; + /* * wait for an ARP event to complete. */ -struct tIP2PR_IPOIB_WAIT_STRUCT { - tINT8 type; /* ip2pr or gid2pr */ +struct ip2pr_ipoib_wait { + s8 type; /* ip2pr or gid2pr */ tIP2PR_PATH_LOOKUP_ID plid; /* request identifier */ tPTR func; /* callback function for completion */ tPTR arg; /* user argument */ struct net_device *dev; /* ipoib device */ tTS_KERNEL_TIMER_STRUCT timer; /* retry timer */ - tUINT8 retry; /* retry counter */ - tUINT8 flags; /* usage flags */ - tUINT8 state; /* current state */ - tUINT8 hw[ETH_ALEN]; /* hardware address */ - tUINT32 src_addr; /* requested address. */ - tUINT32 dst_addr; /* requested address. */ - tUINT32 gw_addr; /* next hop IP address */ - tUINT8 local_rt; /* local route only */ - tINT32 bound_dev; /* bound device interface */ + u8 retry; /* retry counter */ + u8 flags; /* usage flags */ + u8 state; /* current state */ + u8 hw[ETH_ALEN]; /* hardware address */ + u32 src_addr; /* requested address. 
*/ + u32 dst_addr; /* requested address. */ + u32 gw_addr; /* next hop IP address */ + u8 local_rt; /* local route only */ + s32 bound_dev; /* bound device interface */ tTS_IB_GID src_gid; /* source GID */ tTS_IB_GID dst_gid; /* destination GID */ u16 pkey; /* pkey to use */ tTS_IB_PORT hw_port; /* hardware port */ struct ib_device *ca; /* hardware HCA */ - tUINT32 prev_timeout; /* timeout value for pending request */ + u32 prev_timeout; /* timeout value for pending request */ tTS_IB_CLIENT_QUERY_TID tid; /* path record lookup transactionID */ spinlock_t lock; - tIP2PR_IPOIB_WAIT next; /* next element in wait list. */ - tIP2PR_IPOIB_WAIT *p_next; /* previous next element in list */ + struct ip2pr_ipoib_wait *next; /* next element in wait list. */ + struct ip2pr_ipoib_wait **p_next; /* previous next element in list */ struct work_struct arp_completion; -}; /* tIP2PR_IPOIB_WAIT_STRUCT */ +}; -typedef struct tIP2PR_PATH_ELEMENT_STRUCT tIP2PR_PATH_ELEMENT_STRUCT, - *tIP2PR_PATH_ELEMENT; /* * wait for an ARP event to complete. */ -struct tIP2PR_PATH_ELEMENT_STRUCT { - tUINT32 src_addr; /* requested address. */ - tUINT32 dst_addr; /* requested address. */ - tUINT32 usage; /* last used time. */ +struct ip2pr_path_element { + u32 src_addr; /* requested address. */ + u32 dst_addr; /* requested address. */ + u32 usage; /* last used time. */ tTS_IB_PORT hw_port; /* source port */ struct ib_device *ca; /* hardware HCA */ struct ib_path_record path_s; /* path structure */ - tIP2PR_PATH_ELEMENT next; /* next element in wait list. */ - tIP2PR_PATH_ELEMENT *p_next; /* previous next element in list */ -}; /* tIP2PR_PATH_ELEMENT_STRUCT */ + struct ip2pr_path_element *next; /* next element in wait list. 
*/ + struct ip2pr_path_element **p_next; /* previous next element in list */ +}; -struct tIP2PR_USER_REQ_STRUCT { +struct ip2pr_user_req { struct ib_path_record path_record; - tINT32 status; + s32 status; struct ib_device *device; tTS_IB_PORT port; struct semaphore sem; }; -typedef struct tIP2PR_USER_REQ_STRUCT tIP2PR_USER_REQ_STRUCT, *tIP2PR_USER_REQ; /* * List of Path records cached on a port on a hca */ -typedef struct tIP2PR_GID_PR_ELEMENT_STRUCT tIP2PR_GID_PR_ELEMENT_STRUCT, - *tIP2PR_GID_PR_ELEMENT; -struct tIP2PR_GID_PR_ELEMENT_STRUCT { +struct ip2pr_gid_pr_element { struct ib_path_record path_record; - tUINT32 usage; /* last used time. */ - tIP2PR_GID_PR_ELEMENT next; - tIP2PR_GID_PR_ELEMENT *p_next; + u32 usage; /* last used time. */ + struct ip2pr_gid_pr_element *next; + struct ip2pr_gid_pr_element **p_next; }; /* * List of Source GID's */ -typedef struct tIP2PR_SGID_ELEMENT_STRUCT tIP2PR_SGID_ELEMENT_STRUCT, - *tIP2PR_SGID_ELEMENT; -struct tIP2PR_SGID_ELEMENT_STRUCT { +struct ip2pr_sgid_element { tTS_IB_GID gid; struct ib_device *ca; tTS_IB_PORT port; enum ib_port_state port_state; int gid_index; - tIP2PR_GID_PR_ELEMENT pr_list; - tIP2PR_SGID_ELEMENT next; /* next element in the GID list */ - tIP2PR_SGID_ELEMENT *p_next; /* previous next element in the list */ + struct ip2pr_gid_pr_element *pr_list; + struct ip2pr_sgid_element *next; /* next element in the GID list */ + struct ip2pr_sgid_element **p_next; /* previous next element in the list */ }; -struct tIP2PR_LINK_ROOT_STRUCT { +struct ip2pr_link_root { /* * waiting for resolution table */ - tIP2PR_IPOIB_WAIT wait_list; + struct ip2pr_ipoib_wait *wait_list; kmem_cache_t *wait_cache; spinlock_t wait_lock; int max_retries; @@ -296,7 +285,7 @@ /* * path record cache list. 
*/ - tIP2PR_PATH_ELEMENT path_list; + struct ip2pr_path_element *path_list; kmem_cache_t *path_cache; spinlock_t path_lock; /* @@ -307,14 +296,12 @@ /* * source gid list */ - tIP2PR_SGID_ELEMENT src_gid_list; + struct ip2pr_sgid_element *src_gid_list; kmem_cache_t *src_gid_cache; kmem_cache_t *gid_pr_cache; spinlock_t gid_lock; -}; /* tIP2PR_LINK_ROOT_STRUCT */ -typedef struct tIP2PR_LINK_ROOT_STRUCT tIP2PR_LINK_ROOT_STRUCT, - *tIP2PR_LINK_ROOT; +}; #define TS_EXPECT(mod, expr) #define TS_CHECK_NULL(value, result) Index: drivers/infiniband/ulp/ipoib/ip2pr_link.c =================================================================== --- drivers/infiniband/ulp/ipoib/ip2pr_link.c (revision 547) +++ drivers/infiniband/ulp/ipoib/ip2pr_link.c (working copy) @@ -35,7 +35,7 @@ static unsigned int ip2pr_path_timeout = 0; static unsigned int ip2pr_total_fail = 0; -static tIP2PR_LINK_ROOT_STRUCT _tsIp2prLinkRoot = { +static struct ip2pr_link_root _tsIp2prLinkRoot = { wait_list:NULL, path_list:NULL, wait_lock:SPIN_LOCK_UNLOCKED, @@ -47,7 +47,7 @@ gid_lock:SPIN_LOCK_UNLOCKED }; -tINT32 _tsIp2PrnDelete(tIP2PR_GID_PR_ELEMENT pr_elmt); +s32 _tsIp2PrnDelete(struct ip2pr_gid_pr_element *pr_elmt); static tTS_IB_GID nullgid = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; @@ -62,8 +62,9 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*.._tsIp2prPathElementLookup -- lookup a path record entry */ -static tIP2PR_PATH_ELEMENT _tsIp2prPathElementLookup(tUINT32 ip_addr) { - tIP2PR_PATH_ELEMENT path_elmt; +static struct ip2pr_path_element *_tsIp2prPathElementLookup(u32 ip_addr) +{ + struct ip2pr_path_element *path_elmt; for (path_elmt = _tsIp2prLinkRoot.path_list; NULL != path_elmt; path_elmt = path_elmt->next) { @@ -79,13 +80,12 @@ /* ========================================================================= */ /*.._tsIp2prPathElementCreate -- create an entry for a path record 
element */ -static tINT32 _tsIp2prPathElementCreate - (tUINT32 dst_addr, - tUINT32 src_addr, - tTS_IB_PORT hw_port, - struct ib_device *ca, - struct ib_path_record *path_r, tIP2PR_PATH_ELEMENT * return_elmt) { - tIP2PR_PATH_ELEMENT path_elmt; +static s32 _tsIp2prPathElementCreate(u32 dst_addr, u32 src_addr, + tTS_IB_PORT hw_port, struct ib_device *ca, + struct ib_path_record *path_r, + struct ip2pr_path_element **return_elmt) +{ + struct ip2pr_path_element *path_elmt; unsigned long flags; TS_CHECK_NULL(path_r, -EINVAL); @@ -98,7 +98,7 @@ return -ENOMEM; } /* if */ - memset(path_elmt, 0, sizeof(tIP2PR_PATH_ELEMENT_STRUCT)); + memset(path_elmt, 0, sizeof(*path_elmt)); spin_lock_irqsave(&_tsIp2prLinkRoot.path_lock, flags); path_elmt->next = _tsIp2prLinkRoot.path_list; @@ -118,7 +118,7 @@ path_elmt->hw_port = hw_port; path_elmt->ca = ca; path_elmt->usage = jiffies; - memcpy(&path_elmt->path_s, path_r, sizeof(struct ib_path_record)); + memcpy(&path_elmt->path_s, path_r, sizeof(*path_r)); *return_elmt = path_elmt; @@ -127,7 +127,8 @@ /* ========================================================================= */ /*.._tsIp2prPathElementDestroy -- destroy an entry for a path record element */ -static tINT32 _tsIp2prPathElementDestroy(tIP2PR_PATH_ELEMENT path_elmt) { +static s32 _tsIp2prPathElementDestroy(struct ip2pr_path_element *path_elmt) +{ unsigned long flags; TS_CHECK_NULL(path_elmt, -EINVAL); @@ -154,9 +155,11 @@ /* ========================================================================= */ /*.._tsIp2prPathLookupComplete -- complete the resolution of a path record */ -static tINT32 _tsIp2prPathLookupComplete - (tIP2PR_PATH_LOOKUP_ID plid, - tINT32 status, tIP2PR_PATH_ELEMENT path_elmt, tPTR funcptr, tPTR arg) { +static s32 _tsIp2prPathLookupComplete(tIP2PR_PATH_LOOKUP_ID plid, + s32 status, + struct ip2pr_path_element *path_elmt, + tPTR funcptr, tPTR arg) +{ tIP2PR_PATH_LOOKUP_FUNC func = (tIP2PR_PATH_LOOKUP_FUNC) funcptr; TS_CHECK_NULL(func, -EINVAL); @@ -184,8 
+187,8 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*.._tsIp2prIpoibWaitDestroy -- destroy an entry for an outstanding request */ -static tINT32 _tsIp2prIpoibWaitDestroy - (tIP2PR_IPOIB_WAIT ipoib_wait, IP2PR_USE_LOCK use_lock) { +static s32 _tsIp2prIpoibWaitDestroy + (struct ip2pr_ipoib_wait *ipoib_wait, IP2PR_USE_LOCK use_lock) { unsigned long flags = 0; TS_CHECK_NULL(ipoib_wait, -EINVAL); @@ -214,9 +217,10 @@ /* ========================================================================= */ /*.._tsIp2prIpoibWaitTimeout -- timeout function for link resolution */ -static void _tsIp2prIpoibWaitTimeout(tPTR arg) { - tIP2PR_IPOIB_WAIT ipoib_wait = (tIP2PR_IPOIB_WAIT) arg; - tINT32 result; +static void _tsIp2prIpoibWaitTimeout(tPTR arg) +{ + struct ip2pr_ipoib_wait *ipoib_wait = (struct ip2pr_ipoib_wait *)arg; + s32 result; if (NULL == ipoib_wait) { @@ -294,21 +298,19 @@ /* ========================================================================= */ /*.._tsIp2prIpoibWaitCreate -- create an entry for an outstanding request */ -static tIP2PR_IPOIB_WAIT _tsIp2prIpoibWaitCreate - (tIP2PR_PATH_LOOKUP_ID plid, - tUINT32 dst_addr, - tUINT32 src_addr, - tUINT8 localroute, - tINT32 bound_dev_if, - tIP2PR_PATH_LOOKUP_FUNC func, tPTR arg, tINT32 ltype) { - tIP2PR_IPOIB_WAIT ipoib_wait; +static struct ip2pr_ipoib_wait * +_tsIp2prIpoibWaitCreate(tIP2PR_PATH_LOOKUP_ID plid, u32 dst_addr, u32 src_addr, + u8 localroute, s32 bound_dev_if, + tIP2PR_PATH_LOOKUP_FUNC func, tPTR arg, s32 ltype) +{ + struct ip2pr_ipoib_wait *ipoib_wait; TS_CHECK_NULL(_tsIp2prLinkRoot.wait_cache, NULL); ipoib_wait = kmem_cache_alloc(_tsIp2prLinkRoot.wait_cache, SLAB_ATOMIC); if (NULL != ipoib_wait) { - memset(ipoib_wait, 0, sizeof(tIP2PR_IPOIB_WAIT_STRUCT)); + memset(ipoib_wait, 0, sizeof(*ipoib_wait)); /* * start timer only for IP to PR lookups @@ -349,7 +351,8 @@ /* 
========================================================================= */ /*.._tsIp2prIpoibWaitListInsert -- insert an entry into the wait list */ -static tINT32 _tsIp2prIpoibWaitListInsert(tIP2PR_IPOIB_WAIT ipoib_wait) { +static s32 _tsIp2prIpoibWaitListInsert(struct ip2pr_ipoib_wait *ipoib_wait) +{ unsigned long flags; TS_CHECK_NULL(ipoib_wait, -EINVAL); @@ -385,9 +388,11 @@ /* ========================================================================= */ /*.._tsIp2prIpoibWaitPlidLookup -- lookup an entry for an outstanding request */ -static tIP2PR_IPOIB_WAIT tsIp2prIpoibWaitPlidLookup(tIP2PR_PATH_LOOKUP_ID plid) { +static struct ip2pr_ipoib_wait * +tsIp2prIpoibWaitPlidLookup(tIP2PR_PATH_LOOKUP_ID plid) +{ unsigned long flags; - tIP2PR_IPOIB_WAIT ipoib_wait; + struct ip2pr_ipoib_wait *ipoib_wait; spin_lock_irqsave(&_tsIp2prLinkRoot.wait_lock, flags); for (ipoib_wait = _tsIp2prLinkRoot.wait_list; @@ -405,11 +410,12 @@ /* ========================================================================= */ /*..tsIp2prPathElementTableDump - dump the path record element table to proc */ -tINT32 tsIp2prPathElementTableDump - (tSTR buffer, tINT32 max_size, tINT32 start_index, long *end_index) { - tIP2PR_PATH_ELEMENT path_elmt; - tINT32 counter = 0; - tINT32 offset = 0; +s32 tsIp2prPathElementTableDump(tSTR buffer, s32 max_size, s32 start_index, + long *end_index) +{ + struct ip2pr_path_element *path_elmt; + s32 counter = 0; + s32 offset = 0; unsigned long flags; TS_CHECK_NULL(buffer, -EINVAL); @@ -448,12 +454,12 @@ ((path_elmt->dst_addr >> 16) & 0xff), ((path_elmt->dst_addr >> 24) & 0xff), (unsigned long long) - be64_to_cpu(*(tUINT64 *) path_elmt-> + be64_to_cpu(*(u64 *) path_elmt-> path_s.dgid), (unsigned long long) - be64_to_cpu(*(tUINT64 *) + be64_to_cpu(*(u64 *) (path_elmt->path_s.dgid + - sizeof(tUINT64))), + sizeof(u64))), path_elmt->path_s.dlid, path_elmt->path_s.slid, path_elmt->path_s.pkey, @@ -473,11 +479,13 @@ /* 
========================================================================= */ /*..tsIp2prIpoibWaitTableDump - dump the address resolution wait table to proc */ -tINT32 tsIp2prIpoibWaitTableDump - (tSTR buffer, tINT32 max_size, tINT32 start_index, long *end_index) { - tIP2PR_IPOIB_WAIT ipoib_wait; - tINT32 counter = 0; - tINT32 offset = 0; +s32 +tsIp2prIpoibWaitTableDump(tSTR buffer, s32 max_size, s32 start_index, + long *end_index) +{ + struct ip2pr_ipoib_wait *ipoib_wait; + s32 counter = 0; + s32 offset = 0; unsigned long flags; TS_CHECK_NULL(buffer, -EINVAL); @@ -530,11 +538,10 @@ } /* tsIp2prIpoibWaitTableDump */ /* ..tsIp2prProcReadInt. dump integer value to /proc file */ -tINT32 tsIp2prProcReadInt(tSTR buffer, - tINT32 max_size, - tINT32 start_index, long *end_index, int val) +s32 tsIp2prProcReadInt(tSTR buffer, s32 max_size, s32 start_index, + long *end_index, int val) { - tINT32 offset = 0; + s32 offset = 0; TS_CHECK_NULL(buffer, -EINVAL); TS_CHECK_NULL(end_index, -EINVAL); @@ -547,9 +554,8 @@ } /* ..tsIp2prProcMaxRetriesRead. dump current retry value */ -tINT32 tsIp2prProcRetriesRead(tSTR buffer, - tINT32 max_size, - tINT32 start_index, long *end_index) +s32 tsIp2prProcRetriesRead(tSTR buffer, s32 max_size, s32 start_index, + long *end_index) { return (tsIp2prProcReadInt(buffer, @@ -559,9 +565,8 @@ } /* ..tsIp2prProcTimeoutRead. dump current timeout value */ -tINT32 tsIp2prProcTimeoutRead(tSTR buffer, - tINT32 max_size, - tINT32 start_index, long *end_index) +s32 tsIp2prProcTimeoutRead(tSTR buffer, s32 max_size, s32 start_index, + long *end_index) { return (tsIp2prProcReadInt(buffer, @@ -571,9 +576,8 @@ } /* ..tsIp2prProcBackoutRead. dump current backout value */ -tINT32 tsIp2prProcBackoffRead(tSTR buffer, - tINT32 max_size, - tINT32 start_index, long *end_index) +s32 tsIp2prProcBackoffRead(tSTR buffer, s32 max_size, s32 start_index, + long *end_index) { return (tsIp2prProcReadInt(buffer, @@ -583,9 +587,8 @@ } /* ..tsIp2prProcCacheTimeoutRead. 
dump current cache timeout value */ -tINT32 tsIp2prProcCacheTimeoutRead(tSTR buffer, - tINT32 max_size, - tINT32 start_index, long *end_index) +s32 tsIp2prProcCacheTimeoutRead(tSTR buffer, s32 max_size, s32 start_index, + long *end_index) { return (tsIp2prProcReadInt(buffer, @@ -595,8 +598,8 @@ } /* ..tsIp2prProcMaxRetriesRead. dump current retry value */ -tINT32 tsIp2prProcTotalReq(tSTR buffer, - tINT32 max_size, tINT32 start_index, long *end_index) +s32 tsIp2prProcTotalReq(tSTR buffer, s32 max_size, s32 start_index, + long *end_index) { return (tsIp2prProcReadInt(buffer, @@ -605,9 +608,8 @@ } /* ..tsIp2prProcMaxRetriesRead. dump current retry value */ -tINT32 tsIp2prProcArpTimeout(tSTR buffer, - tINT32 max_size, - tINT32 start_index, long *end_index) +s32 tsIp2prProcArpTimeout(tSTR buffer, s32 max_size, s32 start_index, + long *end_index) { return (tsIp2prProcReadInt(buffer, @@ -616,9 +618,8 @@ } /* ..tsIp2prProcMaxRetriesRead. dump current retry value */ -tINT32 tsIp2prProcPathTimeout(tSTR buffer, - tINT32 max_size, - tINT32 start_index, long *end_index) +s32 tsIp2prProcPathTimeout(tSTR buffer, s32 max_size, s32 start_index, + long *end_index) { return (tsIp2prProcReadInt(buffer, @@ -627,9 +628,8 @@ } /* ..tsIp2prProcMaxRetriesRead. 
dump current retry value */ -tINT32 tsIp2prProcTotalFail(tSTR buffer, - tINT32 max_size, - tINT32 start_index, long *end_index) +s32 tsIp2prProcTotalFail(tSTR buffer, s32 max_size, s32 start_index, + long *end_index) { return (tsIp2prProcReadInt(buffer, @@ -645,8 +645,8 @@ char kernel_buf[256]; int ret; - if (count > sizeof kernel_buf) { - count = sizeof kernel_buf; + if (count > sizeof(kernel_buf)) { + count = sizeof(kernel_buf); } if (copy_from_user(kernel_buf, buffer, count)) { @@ -726,12 +726,13 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*.._tsIp2prPathRecordComplete -- path lookup complete, save result */ -static tINT32 _tsIp2prPathRecordComplete - (tTS_IB_CLIENT_QUERY_TID tid, - tINT32 status, struct ib_path_record *path, tINT32 remaining, tPTR arg) { - tIP2PR_IPOIB_WAIT ipoib_wait = (tIP2PR_IPOIB_WAIT) arg; - tIP2PR_PATH_ELEMENT path_elmt = NULL; - tINT32 result; +static s32 _tsIp2prPathRecordComplete(tTS_IB_CLIENT_QUERY_TID tid, s32 status, + struct ib_path_record *path, + s32 remaining, tPTR arg) +{ + struct ip2pr_ipoib_wait *ipoib_wait = (struct ip2pr_ipoib_wait *) arg; + struct ip2pr_path_element *path_elmt = NULL; + s32 result; TS_CHECK_NULL(ipoib_wait, -EINVAL); TS_CHECK_NULL(path, -EINVAL); @@ -803,9 +804,9 @@ TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_INOUT, "POST: Path record lookup complete. 
<%016llx:%016llx:%d>", - be64_to_cpu(*(tUINT64 *) path->dgid), - be64_to_cpu(*(tUINT64 *) - (path->dgid + sizeof(tUINT64))), + be64_to_cpu(*(u64 *) path->dgid), + be64_to_cpu(*(u64 *) + (path->dgid + sizeof(u64))), path->dlid); result = _tsIp2prPathElementCreate(ipoib_wait->dst_addr, @@ -867,10 +868,11 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*.._tsIp2prLinkFindComplete -- complete the resolution of an ip address */ -static tINT32 _tsIp2prLinkFindComplete - (tIP2PR_IPOIB_WAIT ipoib_wait, tINT32 status, IP2PR_USE_LOCK use_lock) { - tINT32 result = 0; - tINT32 expect; +static s32 _tsIp2prLinkFindComplete(struct ip2pr_ipoib_wait *ipoib_wait, + s32 status, IP2PR_USE_LOCK use_lock) +{ + s32 result = 0; + s32 expect; unsigned long flags; TS_CHECK_NULL(ipoib_wait, -EINVAL); @@ -968,7 +970,8 @@ /* ========================================================================= */ /*.._tsIp2prArpQuery -- query arp cache */ -static int tsIp2prArpQuery(tIP2PR_IPOIB_WAIT ipoib_wait, tUINT32 * state) { +static int tsIp2prArpQuery(struct ip2pr_ipoib_wait *ipoib_wait, u32 * state) +{ struct neighbour *neigh; extern struct neigh_table arp_tbl; @@ -990,9 +993,10 @@ /* ========================================================================= */ /*.._tsIp2prLinkFind -- resolve an ip address to a ipoib link address. */ -static tINT32 _tsIp2prLinkFind(tIP2PR_IPOIB_WAIT ipoib_wait) { - tINT32 result; - tUINT32 state; +static s32 _tsIp2prLinkFind(struct ip2pr_ipoib_wait *ipoib_wait) +{ + s32 result; + u32 state; struct rtable *rt; char devname[20]; int i; @@ -1223,11 +1227,12 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*.._tsIp2prArpRecvComplete -- receive all ARP packets. 
*/ -static void _tsIp2prArpRecvComplete(tPTR arg) { - tIP2PR_IPOIB_WAIT ipoib_wait; - tIP2PR_IPOIB_WAIT next_wait; - tUINT32 ip_addr = (unsigned long)arg; - tINT32 result; +static void _tsIp2prArpRecvComplete(tPTR arg) +{ + struct ip2pr_ipoib_wait *ipoib_wait; + struct ip2pr_ipoib_wait *next_wait; + u32 ip_addr = (unsigned long)arg; + s32 result; unsigned long flags; TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_INOUT, @@ -1276,12 +1281,12 @@ /* ========================================================================= */ /*.._tsIp2prArpRecv -- receive all ARP packets. */ -static tINT32 _tsIp2prArpRecv(struct sk_buff *skb, struct net_device *dev, - struct packet_type *pt) +static s32 _tsIp2prArpRecv(struct sk_buff *skb, struct net_device *dev, + struct packet_type *pt) { - tIP2PR_IPOIB_WAIT ipoib_wait; - tIP2PR_IPOIB_ARP arp_hdr; - tINT32 counter; + struct ip2pr_ipoib_wait *ipoib_wait; + struct ip2pr_ipoib_arp *arp_hdr; + s32 counter; unsigned long flags; struct work_struct *tqp = NULL; @@ -1289,7 +1294,7 @@ TS_CHECK_NULL(skb, -EINVAL); TS_CHECK_NULL(skb->nh.raw, -EINVAL); - arp_hdr = (tIP2PR_IPOIB_ARP) skb->nh.raw; + arp_hdr = (struct ip2pr_ipoib_arp *) skb->nh.raw; #if 0 TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_INOUT, @@ -1364,11 +1369,11 @@ static void _tsIp2prAsyncEventFunc(struct ib_async_event_record *record, void *arg) { - tIP2PR_PATH_ELEMENT path_elmt; - tINT32 result; - tIP2PR_SGID_ELEMENT sgid_elmt; + struct ip2pr_path_element *path_elmt; + s32 result; + struct ip2pr_sgid_element *sgid_elmt; unsigned long flags; - tIP2PR_GID_PR_ELEMENT prn_elmt; + struct ip2pr_gid_pr_element *prn_elmt; if (NULL == record) { @@ -1412,7 +1417,7 @@ /* for now zero it. 
Will get it, when user queries */ memcpy(sgid_elmt->gid, nullgid, - sizeof(tTS_IB_GID)); + sizeof(nullgid)); } /* clear the Gid pr cache */ while (NULL != (prn_elmt = sgid_elmt->pr_list)) { @@ -1432,12 +1437,13 @@ /* ========================================================================= */ /*.._tsIp2prPathSweepTimerFunc --sweep path cache to reap old entries. */ -static void _tsIp2prPathSweepTimerFunc(tPTR arg) { - tIP2PR_PATH_ELEMENT path_elmt; - tIP2PR_PATH_ELEMENT next_elmt; - tINT32 result; - tIP2PR_SGID_ELEMENT sgid_elmt; - tIP2PR_GID_PR_ELEMENT prn_elmt, next_prn; +static void _tsIp2prPathSweepTimerFunc(tPTR arg) +{ + struct ip2pr_path_element *path_elmt; + struct ip2pr_path_element *next_elmt; + s32 result; + struct ip2pr_sgid_element *sgid_elmt; + struct ip2pr_gid_pr_element *prn_elmt, *next_prn; /* cache_timeout of zero implies static path records. */ if (_tsIp2prLinkRoot.cache_timeout) { @@ -1448,7 +1454,7 @@ while (NULL != path_elmt) { next_elmt = path_elmt->next; if (!((_tsIp2prLinkRoot.cache_timeout * HZ) > - (tINT32) (jiffies - path_elmt->usage))) { + (s32) (jiffies - path_elmt->usage))) { TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_INOUT, @@ -1472,7 +1478,7 @@ while (NULL != prn_elmt) { next_prn = prn_elmt->next; if (!((_tsIp2prLinkRoot.cache_timeout * HZ) > - (tINT32) (jiffies - prn_elmt->usage))) { + (s32) (jiffies - prn_elmt->usage))) { TS_TRACE(MOD_IP2PR, T_VERY_VERBOSE, TRACE_FLOW_INOUT, @@ -1502,16 +1508,14 @@ /* --------------------------------------------------------------------- */ /* ========================================================================= */ /*..tsSdpPathRecordLookup -- resolve an ip address to a path record */ -tINT32 tsIp2prPathRecordLookup(tUINT32 dst_addr, /* NBO */ - tUINT32 src_addr, /* NBO */ - tUINT8 localroute, - tINT32 bound_dev_if, - tIP2PR_PATH_LOOKUP_FUNC func, - tPTR arg, tIP2PR_PATH_LOOKUP_ID * plid) { - tIP2PR_PATH_ELEMENT path_elmt; - tIP2PR_IPOIB_WAIT ipoib_wait; - tINT32 result = 0; - tINT32 
expect; +s32 tsIp2prPathRecordLookup(u32 dst_addr, u32 src_addr, u8 localroute, + s32 bound_dev_if, tIP2PR_PATH_LOOKUP_FUNC func, + tPTR arg, tIP2PR_PATH_LOOKUP_ID * plid) +{ + struct ip2pr_path_element *path_elmt; + struct ip2pr_ipoib_wait *ipoib_wait; + s32 result = 0; + s32 expect; TS_CHECK_NULL(plid, -EINVAL); TS_CHECK_NULL(func, -EINVAL); @@ -1579,9 +1583,10 @@ /* ========================================================================= */ /*..tsIp2prPathRecordCancel -- cancel a lookup for an address. */ -tINT32 tsIp2prPathRecordCancel(tIP2PR_PATH_LOOKUP_ID plid) { - tIP2PR_IPOIB_WAIT ipoib_wait; - tINT32 result; +s32 tsIp2prPathRecordCancel(tIP2PR_PATH_LOOKUP_ID plid) +{ + struct ip2pr_ipoib_wait *ipoib_wait; + s32 result; if (TS_IP2PR_PATH_LOOKUP_INVALID == plid) { @@ -1622,9 +1627,10 @@ } /* tsIp2prPathRecordCancel */ /*..tsGid2prCancel -- cancel a lookup for an address. */ -tINT32 tsGid2prCancel(tIP2PR_PATH_LOOKUP_ID plid) { - tIP2PR_IPOIB_WAIT ipoib_wait; - tINT32 result; +s32 tsGid2prCancel(tIP2PR_PATH_LOOKUP_ID plid) +{ + struct ip2pr_ipoib_wait *ipoib_wait; + s32 result; if (TS_IP2PR_PATH_LOOKUP_INVALID == plid) { @@ -1672,12 +1678,12 @@ /* ========================================================================= */ /*.._tsIp2prGidCacheLookup -- Lookup for GID in cache */ -tINT32 _tsIp2prGidCacheLookup - (tTS_IB_GID src_gid, - tTS_IB_GID dst_gid, - struct ib_path_record *path_record, tIP2PR_SGID_ELEMENT * gid_node) { - tIP2PR_SGID_ELEMENT sgid_elmt; - tIP2PR_GID_PR_ELEMENT prn_elmt; +s32 _tsIp2prGidCacheLookup(tTS_IB_GID src_gid, tTS_IB_GID dst_gid, + struct ib_path_record *path_record, + struct ip2pr_sgid_element **gid_node) +{ + struct ip2pr_sgid_element *sgid_elmt; + struct ip2pr_gid_pr_element *prn_elmt; unsigned long flags; *gid_node = NULL; @@ -1691,8 +1697,7 @@ * gid in the async handler had failed. Try to get it now. 
*/ if (0 == - memcmp(sgid_elmt->gid, nullgid, - sizeof(tTS_IB_GID))) { + memcmp(sgid_elmt->gid, nullgid, sizeof(nullgid))) { if (ib_gid_entry_get(sgid_elmt->ca, sgid_elmt->port, 0, sgid_elmt->gid)) { @@ -1707,20 +1712,17 @@ /* we have a valid GID */ if (0 == - memcmp(sgid_elmt->gid, src_gid, - sizeof(tTS_IB_GID))) { + memcmp(sgid_elmt->gid, src_gid, sizeof(sgid_elmt->gid))) { *gid_node = sgid_elmt; for (prn_elmt = sgid_elmt->pr_list; NULL != prn_elmt; prn_elmt = prn_elmt->next) { if (0 == memcmp(&prn_elmt->path_record.dgid, - dst_gid, - sizeof(tTS_IB_GID))) { + dst_gid, sizeof(prn_elmt->path_record.dgid))) { memcpy(path_record, &prn_elmt->path_record, - sizeof - (struct ib_path_record)); + sizeof(*path_record)); prn_elmt->usage = jiffies; spin_unlock_irqrestore @@ -1739,15 +1741,17 @@ /* ========================================================================= */ /*.._tsIp2prSrcGidNodeGet -- */ -tINT32 _tsIp2prSrcGidNodeGet(tTS_IB_GID src_gid, tIP2PR_SGID_ELEMENT * gid_node) { - tIP2PR_SGID_ELEMENT sgid_elmt; +s32 _tsIp2prSrcGidNodeGet(tTS_IB_GID src_gid, + struct ip2pr_sgid_element **gid_node) +{ + struct ip2pr_sgid_element *sgid_elmt; unsigned long flags; *gid_node = NULL; spin_lock_irqsave(&_tsIp2prLinkRoot.gid_lock, flags); for (sgid_elmt = _tsIp2prLinkRoot.src_gid_list; NULL != sgid_elmt; sgid_elmt = sgid_elmt->next) { - if (0 == memcmp(sgid_elmt->gid, src_gid, sizeof(tTS_IB_GID))) { + if (0 == memcmp(sgid_elmt->gid, src_gid, sizeof(sgid_elmt->gid))) { *gid_node = sgid_elmt; spin_unlock_irqrestore(&_tsIp2prLinkRoot.gid_lock, flags); @@ -1761,11 +1765,12 @@ /* ========================================================================= */ /*.._tsIp2prGidElementAdd -- Add one node to Source GID List. 
*/ -tINT32 _tsIp2prGidElementAdd - (tIP2PR_IPOIB_WAIT ipoib_wait, struct ib_path_record *path_record) { +s32 _tsIp2prGidElementAdd(struct ip2pr_ipoib_wait *ipoib_wait, + struct ib_path_record *path_record) +{ unsigned long flags; - tIP2PR_SGID_ELEMENT gid_node = NULL; - tIP2PR_GID_PR_ELEMENT prn_elmt; + struct ip2pr_sgid_element *gid_node = NULL; + struct ip2pr_gid_pr_element *prn_elmt; if (_tsIp2prSrcGidNodeGet(ipoib_wait->src_gid, &gid_node)) { return (-EINVAL); @@ -1778,7 +1783,7 @@ return (-ENOMEM); } memcpy(&prn_elmt->path_record, path_record, - sizeof(struct ib_path_record)); + sizeof(*path_record)); /* * Insert into the ccache list @@ -1798,7 +1803,8 @@ return (0); } -tINT32 _tsIp2PrnDelete(tIP2PR_GID_PR_ELEMENT prn_elmt) { +s32 _tsIp2PrnDelete(struct ip2pr_gid_pr_element *prn_elmt) +{ if (NULL != prn_elmt->p_next) { if (NULL != prn_elmt->next) { @@ -1817,9 +1823,10 @@ /* ========================================================================= */ /*.._tsIp2prSrcGidDelete -- Cleanup one node in Source GID List. */ -tINT32 _tsIp2prSrcGidDelete(tIP2PR_SGID_ELEMENT sgid_elmt) { +s32 _tsIp2prSrcGidDelete(struct ip2pr_sgid_element *sgid_elmt) +{ unsigned long flags; - tIP2PR_GID_PR_ELEMENT prn_elmt; + struct ip2pr_gid_pr_element *prn_elmt; spin_lock_irqsave(&_tsIp2prLinkRoot.gid_lock, flags); @@ -1850,11 +1857,11 @@ /* ========================================================================= */ /*.._tsIp2prSrcGidAdd -- Add one node to Source GID List. 
*/ -tINT32 _tsIp2prSrcGidAdd(struct ib_device *hca_device, - tTS_IB_PORT port, - enum ib_port_state port_state) +s32 _tsIp2prSrcGidAdd(struct ib_device *hca_device, + tTS_IB_PORT port, + enum ib_port_state port_state) { - tIP2PR_SGID_ELEMENT sgid_elmt; + struct ip2pr_sgid_element *sgid_elmt; unsigned long flags; sgid_elmt = @@ -1865,7 +1872,7 @@ return (-ENOMEM); } - memset(sgid_elmt, 0, sizeof(tIP2PR_SGID_ELEMENT_STRUCT)); + memset(sgid_elmt, 0, sizeof(*sgid_elmt)); if (ib_gid_entry_get(hca_device, port, 0, sgid_elmt->gid)) { kmem_cache_free(_tsIp2prLinkRoot.src_gid_cache, sgid_elmt); return (-EFAULT); @@ -1899,11 +1906,12 @@ /* ========================================================================= */ /*.._tsGid2prComplete -- path lookup complete, save result */ -static tINT32 _tsGid2prComplete - (tTS_IB_CLIENT_QUERY_TID tid, - tINT32 status, struct ib_path_record *path, tINT32 remaining, tPTR arg) { - tINT32 result; - tIP2PR_IPOIB_WAIT ipoib_wait = (tIP2PR_IPOIB_WAIT) arg; +static s32 _tsGid2prComplete(tTS_IB_CLIENT_QUERY_TID tid, s32 status, + struct ib_path_record *path, s32 remaining, + tPTR arg) +{ + s32 result; + struct ip2pr_ipoib_wait *ipoib_wait = (struct ip2pr_ipoib_wait *) arg; tGID2PR_LOOKUP_FUNC func; if (tid != ipoib_wait->tid) { @@ -1962,14 +1970,13 @@ /* ========================================================================= */ /*..tsGid2prLookup -- Resolve a destination GD to Path Record */ -tINT32 tsGid2prLookup - (tTS_IB_GID src_gid, - tTS_IB_GID dst_gid, - u16 pkey, - tGID2PR_LOOKUP_FUNC funcptr, tPTR arg, tIP2PR_PATH_LOOKUP_ID * plid) { - tIP2PR_SGID_ELEMENT gid_node; - tINT32 result; - tIP2PR_IPOIB_WAIT ipoib_wait; +s32 tsGid2prLookup(tTS_IB_GID src_gid, tTS_IB_GID dst_gid, u16 pkey, + tGID2PR_LOOKUP_FUNC funcptr, tPTR arg, + tIP2PR_PATH_LOOKUP_ID * plid) +{ + struct ip2pr_sgid_element *gid_node; + s32 result; + struct ip2pr_ipoib_wait *ipoib_wait; struct ib_path_record path_record; tGID2PR_LOOKUP_FUNC func; @@ -2020,8 +2027,8 @@ 
ipoib_wait->ca = gid_node->ca; ipoib_wait->hw_port = gid_node->port; ipoib_wait->pkey = pkey; - memcpy(ipoib_wait->src_gid, src_gid, sizeof(tTS_IB_GID)); - memcpy(ipoib_wait->dst_gid, dst_gid, sizeof(tTS_IB_GID)); + memcpy(ipoib_wait->src_gid, src_gid, sizeof(src_gid)); + memcpy(ipoib_wait->dst_gid, dst_gid, sizeof(dst_gid)); result = _tsIp2prIpoibWaitListInsert(ipoib_wait); if (0 > result) { @@ -2052,9 +2059,10 @@ /* ========================================================================= */ /*..tsIp2prSrcGidCleanup -- Cleanup the Source GID List. */ -tINT32 tsIp2prSrcGidCleanup(void) { - tIP2PR_SGID_ELEMENT sgid_elmt; - tINT32 result; +s32 tsIp2prSrcGidCleanup(void) +{ + struct ip2pr_sgid_element *sgid_elmt; + s32 result; while (NULL != (sgid_elmt = _tsIp2prLinkRoot.src_gid_list)) { @@ -2070,8 +2078,9 @@ /* ========================================================================= */ /*..tsIp2prSrcGidInit -- initialize the Source GID List. */ -tINT32 tsIp2prSrcGidInit(void) { - tINT32 result = 0; +s32 tsIp2prSrcGidInit(void) +{ + s32 result = 0; int i, j; struct ib_device *hca_device; struct ib_device_properties dev_prop; @@ -2079,7 +2088,7 @@ _tsIp2prLinkRoot.src_gid_cache = kmem_cache_create("Ip2prSrcGidList", sizeof - (tIP2PR_SGID_ELEMENT_STRUCT), + (struct ip2pr_sgid_element), 0, SLAB_HWCACHE_ALIGN, NULL, NULL); @@ -2092,7 +2101,7 @@ /* if */ _tsIp2prLinkRoot.gid_pr_cache = kmem_cache_create("Ip2prGidPrList", sizeof - (tIP2PR_GID_PR_ELEMENT_STRUCT), + (struct ip2pr_gid_pr_element), 0, SLAB_HWCACHE_ALIGN, NULL, NULL); if (NULL == _tsIp2prLinkRoot.gid_pr_cache) { @@ -2138,8 +2147,9 @@ /* ========================================================================= */ /*..tsIp2prLinkAddrInit -- initialize the advertisment caches. 
*/ -tINT32 tsIp2prLinkAddrInit(void) { - tINT32 result = 0; +s32 tsIp2prLinkAddrInit(void) +{ + s32 result = 0; struct ib_async_event_record evt_rec; int i; struct ib_device *hca_device; @@ -2161,7 +2171,7 @@ */ _tsIp2prLinkRoot.wait_cache = kmem_cache_create("Ip2prIpoibWait", sizeof - (tIP2PR_IPOIB_WAIT_STRUCT), + (struct ip2pr_ipoib_wait), 0, SLAB_HWCACHE_ALIGN, NULL, NULL); if (NULL == _tsIp2prLinkRoot.wait_cache) { @@ -2175,7 +2185,7 @@ /* if */ _tsIp2prLinkRoot.path_cache = kmem_cache_create("Ip2prPathLookup", sizeof - (tIP2PR_PATH_ELEMENT_STRUCT), + (struct ip2pr_path_element), 0, SLAB_HWCACHE_ALIGN, NULL, NULL); if (NULL == _tsIp2prLinkRoot.path_cache) { @@ -2188,7 +2198,8 @@ } /* if */ _tsIp2prLinkRoot.user_req = kmem_cache_create("Ip2prUserReq", - sizeof(tIP2PR_USER_REQ_STRUCT), + sizeof + (struct ip2pr_user_req), 0, SLAB_HWCACHE_ALIGN, NULL, NULL); if (NULL == _tsIp2prLinkRoot.user_req) { @@ -2286,10 +2297,11 @@ /* ========================================================================= */ /*..tsIp2prLinkAddrCleanup -- cleanup the advertisment caches. 
*/ -tINT32 tsIp2prLinkAddrCleanup(void) { - tIP2PR_PATH_ELEMENT path_elmt; - tIP2PR_IPOIB_WAIT ipoib_wait; - tUINT32 result; +s32 tsIp2prLinkAddrCleanup(void) +{ + struct ip2pr_path_element *path_elmt; + struct ip2pr_ipoib_wait *ipoib_wait; + u32 result; int i; TS_CHECK_NULL(_tsIp2prLinkRoot.wait_cache, -EINVAL); @@ -2346,26 +2358,23 @@ /* ========================================================================= */ /*..tsIp2prCbInternal -- Callback for IP to Path Record Lookup */ -static tINT32 _tsIp2prCbInternal - (tIP2PR_PATH_LOOKUP_ID plid, - tINT32 status, - tUINT32 src_addr, - tUINT32 dst_addr, - tTS_IB_PORT hw_port, - struct ib_device *ca, struct ib_path_record *path, tPTR usr_arg) { - tIP2PR_USER_REQ ureq; +static s32 _tsIp2prCbInternal(tIP2PR_PATH_LOOKUP_ID plid, s32 status, + u32 src_addr, u32 dst_addr, tTS_IB_PORT hw_port, + struct ib_device *ca, struct ib_path_record *path, + tPTR usr_arg) +{ + struct ip2pr_user_req *ureq; if (usr_arg == NULL) { TS_REPORT_WARN(MOD_IP2PR, "Called with a NULL usr_arg"); return -1; } - ureq = (tIP2PR_USER_REQ) usr_arg; + ureq = (struct ip2pr_user_req*) usr_arg; ureq->status = status; - if (0 == status) { - memcpy(&ureq->path_record, path, - sizeof(struct ib_path_record)); - } + if (0 == status) + memcpy(&ureq->path_record, path, sizeof(*path)); + up(&ureq->sem); /* wake up sleeping process */ return (0); @@ -2373,26 +2382,24 @@ /* ========================================================================= */ /*..tsIp2prCbInternal -- Callback for Gid to Path Record Lookup */ -static tINT32 _tsGid2prCbInternal - (tIP2PR_PATH_LOOKUP_ID plid, - tINT32 status, - tTS_IB_PORT hw_port, - struct ib_device *ca, struct ib_path_record *path, tPTR usr_arg) { - tIP2PR_USER_REQ ureq; +static s32 _tsGid2prCbInternal(tIP2PR_PATH_LOOKUP_ID plid, s32 status, + tTS_IB_PORT hw_port, struct ib_device *ca, + struct ib_path_record *path, tPTR usr_arg) +{ + struct ip2pr_user_req *ureq; if (usr_arg == NULL) { TS_REPORT_WARN(MOD_IP2PR, "Called with a 
NULL usr_arg"); return -1; } - ureq = (tIP2PR_USER_REQ) usr_arg; + ureq = (struct ip2pr_user_req *)usr_arg; ureq->status = status; ureq->port = hw_port; ureq->device = ca; - if (0 == status) { - memcpy(&ureq->path_record, path, - sizeof(struct ib_path_record)); - } + if (0 == status) + memcpy(&ureq->path_record, path, sizeof(*path)); + up(&ureq->sem); /* wake up sleeping process */ return (0); @@ -2400,17 +2407,17 @@ /* ========================================================================= */ /*..tsIp2prUserLookup -- Process a IP to Path Record lookup ioctl request */ -tINT32 _tsIp2prUserLookup(unsigned long arg) { - tIP2PR_USER_REQ ureq; +s32 _tsIp2prUserLookup(unsigned long arg) +{ + struct ip2pr_user_req *ureq; tIP2PR_LOOKUP_PARAM_STRUCT param; - tINT32 status; + s32 status; tIP2PR_PATH_LOOKUP_ID plid; if (0 == arg) { return (-EINVAL); } - if (copy_from_user(&param, (tIP2PR_LOOKUP_PARAM) arg, - sizeof(tIP2PR_LOOKUP_PARAM_STRUCT))) { + if (copy_from_user(&param, (tIP2PR_LOOKUP_PARAM) arg, sizeof(param))) { return (-EFAULT); } if (NULL == param.path_record) { @@ -2444,7 +2451,7 @@ } copy_to_user(param.path_record, &ureq->path_record, - sizeof(struct ib_path_record)); + sizeof(*param.path_record)); kmem_cache_free(_tsIp2prLinkRoot.user_req, ureq); return (0); @@ -2452,11 +2459,12 @@ /* ========================================================================= */ /*..tsGid2prUserLookup -- Process a Gid to Path Record lookup ioctl request */ -tINT32 _tsGid2prUserLookup(unsigned long arg) { - tIP2PR_USER_REQ ureq; +s32 _tsGid2prUserLookup(unsigned long arg) +{ + struct ip2pr_user_req *ureq; tGID2PR_LOOKUP_PARAM_STRUCT param; tGID2PR_LOOKUP_PARAM upa; - tINT32 status; + s32 status; tIP2PR_PATH_LOOKUP_ID plid; if (0 == arg) { @@ -2464,7 +2472,7 @@ } if (copy_from_user(&param, (tGID2PR_LOOKUP_PARAM) arg, - sizeof(tGID2PR_LOOKUP_PARAM_STRUCT))) { + sizeof(param))) { return (-EFAULT); } @@ -2498,10 +2506,10 @@ } upa = (tGID2PR_LOOKUP_PARAM) arg; - copy_to_user(&upa->device,
&ureq->device, sizeof (struct ib_device *)); - copy_to_user(&upa->port, &ureq->port, sizeof(tTS_IB_PORT)); + copy_to_user(&upa->device, &ureq->device, sizeof(upa->device)); + copy_to_user(&upa->port, &ureq->port, sizeof(upa->port)); copy_to_user(param.path_record, &ureq->path_record, - sizeof(struct ib_path_record)); + sizeof(*param.path_record)); kmem_cache_free(_tsIp2prLinkRoot.user_req, ureq); return (0); From halr at voltaire.com Fri Jul 30 11:53:06 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 30 Jul 2004 14:53:06 -0400 Subject: [openib-general] more ib_verbs.h nits Message-ID: <01b001c47666$73fb3040$6401a8c0@comcast.net> enum ib_qp_attr_mask { IB_QP_STATE = 1, IB_QP_EN_SQD_ASYNC_NOTIFY = (1<<1), IB_QP_REMOTE_ATOMIC_FLAGS = (1<<3), Should IB_QP_REMOTE_ATOMIC_FLAGS be 1<<2 and so on for the bits after that one ? Also, should ib_get_special_qp be called ib_create_special_qp ? -- Hal From mshefty at ichips.intel.com Fri Jul 30 10:57:59 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 30 Jul 2004 10:57:59 -0700 Subject: [openib-general] more ib_verbs.h nits In-Reply-To: <01b001c47666$73fb3040$6401a8c0@comcast.net> References: <01b001c47666$73fb3040$6401a8c0@comcast.net> Message-ID: <20040730105759.38b41760.mshefty@ichips.intel.com> On Fri, 30 Jul 2004 14:53:06 -0400 Hal Rosenstock wrote: > enum ib_qp_attr_mask { > IB_QP_STATE = 1, > IB_QP_EN_SQD_ASYNC_NOTIFY = (1<<1), > IB_QP_REMOTE_ATOMIC_FLAGS = (1<<3), > > Should IB_QP_REMOTE_ATOMIC_FLAGS be 1<<2 and so on for the bits after that > one ? I had removed the mask that was (1<<2) and didn't update the other values. I was just going to insert a new entry at that location. (E.g. I need to add support to resize the QP.) > Also, should ib_get_special_qp be called ib_create_special_qp ? I used "get" to better match the terminology given in the spec. (It's probably what VAPI used as well.) Btw, please continue to send nits like these.
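The gap discussed above can be illustrated with a minimal C sketch. Only the three enumerator values are taken from the thread; the helper functions first_free_bit() and is_single_bit() are hypothetical names introduced here for illustration and are not part of ib_verbs.h.

```c
#include <assert.h>

/* The three values quoted in the thread; bit 2 is the hole left behind. */
enum ib_qp_attr_mask {
	IB_QP_STATE               = 1,
	IB_QP_EN_SQD_ASYNC_NOTIFY = (1 << 1),
	/* (1 << 2) is unused after the old entry was removed */
	IB_QP_REMOTE_ATOMIC_FLAGS = (1 << 3)
};

/* Hypothetical helper: nonzero if exactly one bit is set in v. */
static int is_single_bit(unsigned int v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* Hypothetical helper: lowest bit position not set in 'used', or -1. */
static int first_free_bit(unsigned int used)
{
	int bit;

	for (bit = 0; bit < 32; bit++)
		if (!(used & (1u << bit)))
			return bit;
	return -1;
}

/* Checks that the flags are distinct single bits and locates the hole. */
static int example_hole_position(void)
{
	unsigned int used = IB_QP_STATE | IB_QP_EN_SQD_ASYNC_NOTIFY |
			    IB_QP_REMOTE_ATOMIC_FLAGS;

	assert(is_single_bit(IB_QP_STATE));
	assert(is_single_bit(IB_QP_EN_SQD_ASYNC_NOTIFY));
	assert(is_single_bit(IB_QP_REMOTE_ATOMIC_FLAGS));
	return first_free_bit(used);
}
```

A hole like this is harmless as long as every flag remains a distinct single bit, which is why a new entry can later be slotted into (1<<2) without renumbering the rest.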
From mshefty at ichips.intel.com Fri Jul 30 11:29:41 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 30 Jul 2004 11:29:41 -0700 Subject: [openib-general] gen2 dev branch In-Reply-To: <52acxix5j8.fsf@topspin.com> References: <20040728144612.3dc1eebc.mshefty@ichips.intel.com> <52r7qvwi1d.fsf@topspin.com> <20040728213645.042a4315.mshefty@ichips.intel.com> <52acxix5j8.fsf@topspin.com> Message-ID: <20040730112941.661ac601.mshefty@ichips.intel.com> On Thu, 29 Jul 2004 07:16:27 -0700 Roland Dreier wrote: > Sean> Ah - didn't realize you were compiling and testing already. > > Yup... those patches I've been posting are also tested (and checked > into https://openib.org/svn/gen2/branches/roland-merge/). I started taking a look at the code under the tree mentioned above and bringing some of the changes back into ib_verbs.h. Some questions: Which layer were you expecting to perform reference counting? Which layer were you expecting to set the values in struct ib_xxx? From Tom.Duffy at Sun.COM Fri Jul 30 13:34:15 2004 From: Tom.Duffy at Sun.COM (Tom Duffy) Date: Fri, 30 Jul 2004 13:34:15 -0700 Subject: [openib-general] reply-to munging Message-ID: <1091219655.3942.1263.camel@localhost> Is there a reason why reply-to munging has been turned off? -tduffy P.S. I did it manually for this mail. From halr at voltaire.com Fri Jul 30 14:36:01 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 30 Jul 2004 17:36:01 -0400 Subject: [openib-general] ib_post_send ib_send_wr question Message-ID: <01d101c4767d$367bf760$6401a8c0@comcast.net> I may have missed this in the email thread on this. struct ib_send_wr { ... struct { struct ib_ah *ah; u32 remote_qpn; u32 remote_qkey; u16 pkey_index; } ud; } wr; }; When sending UD, there is a pkey_index included in the structure. In VAPI/EVAPI, there were two post sends: the normal VAPI one which does not take a PKey index and the EVAPI one for the GSI which does take a PKey index. 
It looks like we have collapsed the two into one, but there is no way to indicate whether the PKey index is present or not. I think a flag is needed for this to indicate whether the index is present or not, otherwise it always needs to be supplied. Am I missing something ?. -- Hal From mshefty at ichips.intel.com Fri Jul 30 13:48:33 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 30 Jul 2004 13:48:33 -0700 Subject: [openib-general] ib_post_send ib_send_wr question In-Reply-To: <01d101c4767d$367bf760$6401a8c0@comcast.net> References: <01d101c4767d$367bf760$6401a8c0@comcast.net> Message-ID: <20040730134833.490948fd.mshefty@ichips.intel.com> On Fri, 30 Jul 2004 17:36:01 -0400 Hal Rosenstock wrote: > I may have missed this in the email thread on this. > > struct ib_send_wr { > ... > struct { > struct ib_ah *ah; > u32 remote_qpn; > u32 remote_qkey; > u16 pkey_index; > } ud; > } wr; > }; > > When sending UD, there is a pkey_index included in the structure. > > In VAPI/EVAPI, there were two post sends: the normal VAPI one which does not > take a PKey index and the EVAPI one for the GSI which does take a PKey > index. It looks like we have collapsed the two into one, but there is no way > to indicate whether the PKey index is present or not. I think a flag is > needed for this to indicate whether the index is present or not, otherwise > it always needs to be supplied. Am I missing something ?. Doesn't the hardware expect a pkey for any send posted to QP1? If so, we'll know based on which QP the send is posted. 
We should be able to add an assertion similar to this: ASSERT( qp->qp_num == IB_QP1 || wr->pkey_index == 0 ); From halr at voltaire.com Fri Jul 30 15:11:14 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 30 Jul 2004 18:11:14 -0400 Subject: [openib-general] ib_post_send ib_send_wr question References: <01d101c4767d$367bf760$6401a8c0@comcast.net> <20040730134833.490948fd.mshefty@ichips.intel.com> Message-ID: <01e001c47682$21c1edc0$6401a8c0@comcast.net> Sean Hefty wrote: > Doesn't the hardware expect a pkey for any send posted to QP1? If > so, we'll know based on which QP the send is posted. We should be > able to add an assertion similar to this: > > ASSERT( qp->qp_num == IB_QP1 || wr->pkey_index == 0 ); I think you meant && rather than ||. Are indices 1 based rather than 0 based (so 0 can mean invalid) ? I would rather have a flag to indicate the presence or absence of the pkey_index in the ud structure. -- Hal From mshefty at ichips.intel.com Fri Jul 30 14:21:42 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 30 Jul 2004 14:21:42 -0700 Subject: [openib-general] ib_post_send ib_send_wr question In-Reply-To: <01e001c47682$21c1edc0$6401a8c0@comcast.net> References: <01d101c4767d$367bf760$6401a8c0@comcast.net> <20040730134833.490948fd.mshefty@ichips.intel.com> <01e001c47682$21c1edc0$6401a8c0@comcast.net> Message-ID: <20040730142142.566aa27c.mshefty@ichips.intel.com> On Fri, 30 Jul 2004 18:11:14 -0400 Hal Rosenstock wrote: > > ASSERT( qp->qp_num == IB_QP1 || wr->pkey_index == 0 ); > > I think you meant && rather than ||. The OR was intended. The pkey index must either be set to 0 or the QP must be 1. > Are indices 1 based rather than 0 based (so 0 can mean invalid) ? > I would rather have a flag to indicate the presence or absence of the > pkey_index in the ud structure. I think the flag is the qp_num. If it's IB_QP1, then the pkey index must be set. If it's not IB_QP1, then it should be 0 (or is ignored). 
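The difference between the || and && forms of the proposed check can be seen in a small C sketch. The constant EXAMPLE_QP1 and both function names are assumptions made here for illustration; the thread itself only shows the one-line ASSERT and the IB_QP1 name.

```c
#include <assert.h>

/* Assumption for this sketch: the GSI queue pair is QP number 1. */
#define EXAMPLE_QP1 1u

/*
 * The condition as posted: a send is acceptable if it is on QP1
 * (any pkey_index allowed) or if pkey_index is left at 0.
 */
static int pkey_check_or(unsigned int qp_num, unsigned int pkey_index)
{
	return qp_num == EXAMPLE_QP1 || pkey_index == 0;
}

/*
 * The && variant: acceptable only on QP1 with pkey_index 0, which
 * would reject every send on a non-QP1 queue pair.
 */
static int pkey_check_and(unsigned int qp_num, unsigned int pkey_index)
{
	return qp_num == EXAMPLE_QP1 && pkey_index == 0;
}
```

With the || form, QP1 may carry any index and other QPs must leave it at 0; only the combination of a non-QP1 queue pair with a nonzero index trips the assertion. With the && form, QP1 would be limited to index 0 and ordinary UD QPs could never post a send.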
From halr at voltaire.com Fri Jul 30 15:37:08 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 30 Jul 2004 18:37:08 -0400 Subject: [openib-general] ib_post_send ib_send_wr question References: <01d101c4767d$367bf760$6401a8c0@comcast.net> <20040730134833.490948fd.mshefty@ichips.intel.com> <01e001c47682$21c1edc0$6401a8c0@comcast.net> <20040730142142.566aa27c.mshefty@ichips.intel.com> Message-ID: <01ee01c47685$c058f5c0$6401a8c0@comcast.net> Sean Hefty wrote: > On Fri, 30 Jul 2004 18:11:14 -0400 > Hal Rosenstock wrote: > >>> ASSERT( qp->qp_num == IB_QP1 || wr->pkey_index == 0 ); >> >> I think you meant && rather than ||. > > The OR was intended. The pkey index must either be set to 0 or the > QP must be 1. I think the pkey_index needs to be ignored if it is not QP1. This should be noted in the API and something the driver will need to take care of. I'm wondering about redirected QPs :-) Are they locked on a single PKey index or do they need to work the same way QP1 does ? -- Hal From mshefty at ichips.intel.com Fri Jul 30 14:44:19 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 30 Jul 2004 14:44:19 -0700 Subject: [openib-general] ib_post_send ib_send_wr question In-Reply-To: <01ee01c47685$c058f5c0$6401a8c0@comcast.net> References: <01d101c4767d$367bf760$6401a8c0@comcast.net> <20040730134833.490948fd.mshefty@ichips.intel.com> <01e001c47682$21c1edc0$6401a8c0@comcast.net> <20040730142142.566aa27c.mshefty@ichips.intel.com> <01ee01c47685$c058f5c0$6401a8c0@comcast.net> Message-ID: <20040730144419.3446bc5a.mshefty@ichips.intel.com> On Fri, 30 Jul 2004 18:37:08 -0400 Hal Rosenstock wrote: > Sean Hefty wrote: > I think the pkey_index needs to be ignored if it is not QP1. This should > be noted in the API and something the driver will need to take care of. agreed > I'm wondering about redirected QPs :-) Are they locked on a single PKey > index or do they need to work the same way QP1 does ? 
I believe that only QP1 has the ability to accept packets from any partition, and hence, use any PKey. I'm not sure that we can assume that hardware will provide this feature on any given QP. I *think* redirection needs to go to separate QPs for separate PKeys. From halr at voltaire.com Fri Jul 30 16:07:24 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 30 Jul 2004 19:07:24 -0400 Subject: [openib-general] ib_post_send ib_send_wr question References: <01d101c4767d$367bf760$6401a8c0@comcast.net> <20040730134833.490948fd.mshefty@ichips.intel.com> <01e001c47682$21c1edc0$6401a8c0@comcast.net> <20040730142142.566aa27c.mshefty@ichips.intel.com> <01ee01c47685$c058f5c0$6401a8c0@comcast.net> <20040730144419.3446bc5a.mshefty@ichips.intel.com> Message-ID: <021601c47689$fa70a560$6401a8c0@comcast.net> Sean Hefty wrote: > I believe that only QP1 has the ability to accept packets from any > partition, and hence, use any PKey. I'm not sure that we can assume > that hardware will provide this feature on any given QP. I *think* > redirection needs to go to separate QPs for separate PKeys. Yes, you are right. The special PKey matching is only for QP1. It also needs to be able to send on any of the port's PKeys. Other QPs are "locked" on a single PKey index. C10-133: Packets sent from the Send Queue of a GSI QP shall attach a P_Key associated with that QP, just as a P_Key is associated with nonmanagement QPs. 
-- Hal From mshefty at ichips.intel.com Fri Jul 30 15:07:23 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 30 Jul 2004 15:07:23 -0700 Subject: [openib-general] ib_query/modify_hca_xxx in ib_verbs In-Reply-To: <20040730102022.7f0e4e0b.mshefty@ichips.intel.com> References: <00ae01c47626$809e2860$6401a8c0@comcast.net> <20040730102022.7f0e4e0b.mshefty@ichips.intel.com> Message-ID: <20040730150723.77bb9aad.mshefty@ichips.intel.com> On Fri, 30 Jul 2004 10:20:22 -0700 Sean Hefty wrote: > On Fri, 30 Jul 2004 07:15:19 -0400 > Hal Rosenstock wrote: > > > Should the ib_query/modify_hca_xxx calls now be ib_query/modify_device_xxx > > calls ? > > Here's a patch that renames these calls and simplifies the query pkey/gid routines. Unless there are objections, I will update the file. I have committed this change. From mshefty at ichips.intel.com Fri Jul 30 15:17:13 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 30 Jul 2004 15:17:13 -0700 Subject: [openib-general] Update from merging in Roland's changes... Message-ID: <20040730151713.52bc6b55.mshefty@ichips.intel.com> I've updated ib_verbs based on changes that Roland had in his copy of the file. Specifically, I've updated the ib_xxx structures to include pointers to referenced items, added a usecnt field, and converted the functions from prototypes into static inline routines. The patch for this update is listed below. 
- Sean Index: ib_verbs.h =================================================================== --- ib_verbs.h (revision 547) +++ ib_verbs.h (working copy) @@ -26,6 +26,11 @@ #if !defined( IB_VERBS_H ) #define IB_VERBS_H +#include +#include + +struct ib_device; + enum ib_event_type { IB_EVENT_CQ_ERR, IB_EVENT_QP_FATAL, @@ -60,39 +65,56 @@ struct ib_pd { struct ib_device *device; + atomic_t usecnt; }; struct ib_ah { struct ib_device *device; + struct ib_pd *pd; + atomic_t usecnt; }; struct ib_cq { struct ib_device *device; ib_comp_handler comp_handler; void *cq_context; + int cqe; + atomic_t usecnt; }; struct ib_srq { struct ib_device *device; + struct ib_pd *pd; void *srq_context; + atomic_t usecnt; }; struct ib_qp { struct ib_device *device; + struct ib_pd *pd; + struct ib_cq *send_cq; + struct ib_cq *recv_cq; void *qp_context; u32 qp_num; + atomic_t usecnt; }; struct ib_mr { struct ib_device *device; + struct ib_pd *pd; + atomic_t usecnt; }; struct ib_mw { struct ib_device *device; + struct ib_pd *pd; + atomic_t usecnt; }; struct ib_fmr { struct ib_device *device; + struct ib_pd *pd; + atomic_t usecnt; }; enum ib_device_cap_flags { @@ -212,6 +234,7 @@ struct ib_port { enum ib_port_state state; enum ib_mtu max_mtu; + enum ib_mtu active_mtu; int port_cap_flags; int gid_tbl_len; u32 max_msg_sz; @@ -374,7 +397,6 @@ }; struct ib_srq_attr { - void *srq_context; int max_wr; int max_sge; int srq_limit; @@ -394,7 +416,6 @@ }; struct ib_mr_attr { - struct ib_pd *pd; u64 device_virt_addr; u64 size; int mr_access_flags; @@ -462,7 +483,7 @@ struct ib_ah *ah; u32 remote_qpn; u32 remote_qkey; - u16 pkey_index; + u16 pkey_index; /* valid for GSI only */ } ud; } wr; }; @@ -534,184 +555,338 @@ IB_CQ_NEXT_COMP }; -int ib_query_hca_cap(struct ib_device *device, - struct ib_device_cap *device_cap); - -int ib_query_hca_port_prop(struct ib_device *device, - u8 port_num, - struct ib_port *port); - -int ib_query_hca_gid_tbl(struct ib_device *device, - u8 port_num, - int tbl_len_in, - int 
*tbl_len_out, - union ib_gid *gid_tbl); - -int ib_query_hca_pkey_tbl(struct ib_device *device, - u8 port_num, - int tbl_len_in, - int *tbl_len_out, - u16 *pkey_tbl); - -int ib_modify_hca_attr(struct ib_device *device, - u8 port_num, - int device_attr_flags); - -struct ib_pd *ib_alloc_pd(struct ib_device *device); - -int ib_dealloc_pd(struct ib_pd *pd); - -struct ib_ah *ib_create_ah(struct ib_pd *pd, - struct ib_ah_attr *ah_attr); - -int ib_modify_ah(struct ib_ah *ah, - struct ib_ah_attr *ah_attr); - -int ib_query_ah(struct ib_ah *ah, - struct ib_ah_attr *ah_attr); - -int ib_destroy_ah(struct ib_ah *ah); - -struct ib_qp *ib_create_qp(struct ib_pd *pd, - struct ib_qp_init_attr *qp_init_attr, - struct ib_qp_cap *qp_cap); - -int ib_modify_qp(struct ib_qp *qp, - struct ib_qp_attr *qp_attr, - int qp_attr_mask, - struct ib_qp_cap *qp_cap ); - -int ib_query_qp(struct ib_qp *qp, - struct ib_qp_attr *qp_attr, - int qp_attr_mask, - struct ib_qp_init_attr *qp_init_attr); - -int ib_destroy_qp(struct ib_qp *qp); - -struct ib_qp *ib_get_special_qp(struct ib_pd *pd, - u8 port_num, - enum ib_qp_type qp_type, - struct ib_qp_init_attr *qp_init_attr, - struct ib_qp_cap *qp_cap); - -struct ib_srq *ib_create_srq(struct ib_pd *pd, - struct ib_srq_attr *srq_attr); - -int ib_query_srq(struct ib_srq *srq, - struct ib_pd **pd, - struct ib_srq_attr *srq_attr); - -int ib_modify_srq(struct ib_srq *srq, - struct ib_pd *pd, - struct ib_srq_attr *srq_attr, - int srq_attr_mask); - -int ib_destroy_srq(struct ib_srq *srq); - -int ib_post_srq(struct ib_srq *srq, - struct ib_recv_wr *recv_wr, - struct ib_recv_wr **bad_recv_wr); - -struct ib_cq *ib_create_cq(struct ib_device *device, - ib_comp_handler comp_handler, - void *cq_context, - int *cqe); - -int ib_query_cq(struct ib_cq *cq, - void *cq_context, - int *cqe); - -int ib_resize_cq(struct ib_cq *cq, - int *cqe); - -int ib_destroy_cq(struct ib_cq *cq); +static inline int ib_query_device(struct ib_device *device, + struct ib_device_cap *device_cap) +{ 
+ return device->query_device(device, device_cap); +} + +static inline int ib_query_port(struct ib_device *device, + u8 port_num, + struct ib_port *port) +{ + return device->query_port(device, port_num, port); +} + +static inline int ib_query_gid(struct ib_device *device, + u8 port_num, + int index, + union ib_gid *gid) +{ + return device->query_gid(device, port_num, index, gid); +} + +static inline int ib_query_pkey(struct ib_device *device, + u8 port_num, + u16 index, + u16 *pkey) +{ + return device->query_pkey(device, port_num, index, pkey); +} + +static inline int ib_modify_device(struct ib_device *device, + u8 port_num, + int device_attr_flags) +{ + return device->modify_device(device, port_num, device_attr_flags); +} + +static inline struct ib_pd *ib_alloc_pd(struct ib_device *device) +{ + return device->alloc_pd(device); +} + +static inline int ib_dealloc_pd(struct ib_pd *pd) +{ + return pd->device->dealloc_pd(pd); +} + +static inline struct ib_ah *ib_create_ah(struct ib_pd *pd, + struct ib_ah_attr *ah_attr) +{ + return pd->device->create_ah(pd, ah_attr); +} + +static inline int ib_modify_ah(struct ib_ah *ah, + struct ib_ah_attr *ah_attr) +{ + return ah->device->modify_ah(ah, ah_attr); +} + +static inline int ib_query_ah(struct ib_ah *ah, + struct ib_ah_attr *ah_attr) +{ + return ah->device->query_ah(ah, ah_attr); +} + +static inline int ib_destroy_ah(struct ib_ah *ah) +{ + return ah->device->destroy_ah(ah); +} + +static inline struct ib_qp *ib_create_qp(struct ib_pd *pd, + struct ib_qp_init_attr *qp_init_attr, + struct ib_qp_cap *qp_cap) +{ + return pd->device->create_qp(pd, qp_init_attr, qp_cap); +} + +static inline int ib_modify_qp(struct ib_qp *qp, + struct ib_qp_attr *qp_attr, + int qp_attr_mask, + struct ib_qp_cap *qp_cap ) +{ + return qp->device->modify_qp(qp, qp_attr, qp_attr_mask, qp_cap); +} + +static inline int ib_query_qp(struct ib_qp *qp, + struct ib_qp_attr *qp_attr, + int qp_attr_mask, + struct ib_qp_init_attr *qp_init_attr) +{ + return 
qp->device->query_qp(qp, qp_attr, qp_attr_mask, qp_init_attr); +} + +static inline int ib_destroy_qp(struct ib_qp *qp) +{ + return qp->device->destroy_qp(qp); +} + +static inline struct ib_qp *ib_get_special_qp(struct ib_pd *pd, + u8 port_num, + enum ib_qp_type qp_type, + struct ib_qp_init_attr *qp_init_attr, + struct ib_qp_cap *qp_cap) +{ + return pd->device->get_special_qp(pd, port_num, qp_type, + qp_init_attr, qp_cap); +} + +static inline struct ib_srq *ib_create_srq(struct ib_pd *pd, + void *srq_context, + struct ib_srq_attr *srq_attr) +{ + return pd->device->create_srq(pd, srq_attr); +} + +static inline int ib_query_srq(struct ib_srq *srq, + struct ib_srq_attr *srq_attr) +{ + return srq->device->query_srq(srq, srq_attr); +} + +static inline int ib_modify_srq(struct ib_srq *srq, + struct ib_pd *pd, + struct ib_srq_attr *srq_attr, + int srq_attr_mask) +{ + return srq->device->modify_srq(srq, pd, srq_attr, srq_attr_mask); +} + +static inline int ib_destroy_srq(struct ib_srq *srq) +{ + return srq->device->destroy_srq(srq); +} + +static inline int ib_post_srq(struct ib_srq *srq, + struct ib_recv_wr *recv_wr, + struct ib_recv_wr **bad_recv_wr) +{ + return srq->device->post_srq(srq, recv_wr, bad_recv_wr); +} + +static inline struct ib_cq *ib_create_cq(struct ib_device *device, + ib_comp_handler comp_handler, + void *cq_context, + int *cqe) +{ + return device->create_cq(device, comp_handler, cq_context, cqe); +} + +static inline int ib_resize_cq(struct ib_cq *cq, + int *cqe) +{ + return cq->device->resize_cq(cq, cqe); +} + +static inline int ib_destroy_cq(struct ib_cq *cq) +{ + return cq->device->destroy_cq(cq); +} /* in functions below iova_start is in/out parameter */ -struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd, - struct ib_phys_buf *phys_buf_array, - int num_phys_buf, - int mr_access_flags, - u64 *iova_start, +static inline struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + int mr_access_flags, + u64
*iova_start, + u32 *lkey, + u32 *rkey) +{ + return pd->device->reg_phys_mr(pd, phys_buf_array, num_phys_buf, + mr_access_flags, iova_start, lkey, rkey); +} + +static inline int ib_query_mr(struct ib_mr *mr, + struct ib_mr_attr *mr_attr) +{ + return mr->device->query_mr(mr, mr_attr); +} + +static inline int ib_dereg_mr(struct ib_mr *mr) +{ + return mr->device->dereg_mr(mr); +} + +static inline int ib_rereg_phys_mr(struct ib_mr *mr, + int mr_rereg_mask, + struct ib_pd *pd, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + int mr_access_flags, + u64 *iova_start, + u32 *lkey, + u32 *rkey) +{ + return mr->device->rereg_phys_mr(mr, mr_rereg_mask, pd, phys_buf_array, + num_phys_buf, mr_access_flags, + iova_start, lkey, rkey); +} + +static inline struct ib_mw *ib_alloc_mw(struct ib_pd *pd, + u32 *rkey) +{ + return pd->device->alloc_mw(pd, rkey); +} + +static inline int ib_query_mw(struct ib_mw *mw, + u32 *rkey, + struct ib_pd **pd) +{ + return mw->device->query_mw(mw, rkey, pd); +} + +static inline int ib_bind_mw(struct ib_qp *qp, + struct ib_mw *mw, + struct ib_mw_bind *mw_bind, + u32 *rkey) +{ + return mw->device->bind_mw(qp, mw, mw_bind, rkey); +} + +static inline int ib_dealloc_mw(struct ib_mw *mw) +{ + return mw->device->dealloc_mw(mw); +} + +static inline struct ib_fmr *ib_alloc_fmr(struct ib_pd *pd, + int mr_access_flags, + struct ib_fmr_attr *fmr_attr) +{ + return pd->device->alloc_fmr(pd, mr_access_flags, fmr_attr); +} + +static inline int ib_map_fmr(struct ib_fmr *fmr, + void *addr, + u64 size, u32 *lkey, - u32 *rkey); - -int ib_query_mr(struct ib_mr *mr, - struct ib_mr_attr *mr_attr); - -int ib_dereg_mr(struct ib_mr *mr); - -int ib_rereg_phys_mr(struct ib_mr *mr, - int mr_rereg_mask, - struct ib_pd *pd, - struct ib_phys_buf *phys_buf_array, - int num_phys_buf, - int mr_access_flags, - u64 *iova_start, - u32 *lkey, - u32 *rkey); - -struct ib_mw *ib_alloc_mw(struct ib_pd *pd, - u32 *rkey); - -int ib_query_mw(struct ib_mw *mw, - u32 *rkey, - struct ib_pd
**pd); - -int ib_bind_mw(struct ib_qp *qp, - struct ib_mw *mw, - struct ib_mw_bind *mw_bind, - u32 *rkey); - -int ib_dealloc_mw(struct ib_mw *mw); - -struct ib_fmr *ib_alloc_fmr(struct ib_pd *pd, - int mr_access_flags, - struct ib_fmr_attr *fmr_attr); - -int ib_map_fmr(struct ib_fmr *fmr, - void *addr, - u64 size, - u32 *lkey, - u32 *rkey); - -int ib_map_phys_fmr(struct ib_fmr *fmr, - struct ib_phys_buf *phys_buf_array, - int num_phys_buf, - u32 *lkey, - u32 *rkey); - -int ib_unmap_fmr(struct ib_fmr **fmr_array, - int fmr_cnt); - -int ib_free_fmr(struct ib_fmr *fmr); - -int ib_attach_mcast(struct ib_qp *qp, - union ib_gid *gid, - u16 lid); - -int ib_detach_mcast(struct ib_qp *qp, - union ib_gid *gid, - u16 lid); - -int ib_post_send(struct ib_qp *qp, - struct ib_send_wr *send_wr, - struct ib_send_wr **bad_send_wr); - -int ib_post_recv(struct ib_qp *qp, - struct ib_recv_wr *recv_wr, - struct ib_recv_wr **bad_recv_wr); - -int ib_poll_cq(struct ib_cq *cq, - int num_entries, - struct ib_wc *wc_array); - -int ib_peek_cq(struct ib_cq *cq, - int wc_cnt); - -int ib_req_notify_cq(struct ib_cq *cq, - enum ib_cq_notify cq_notify); - -int ib_req_n_notify_cq(struct ib_cq *cq, - int wc_cnt); + u32 *rkey) +{ + return fmr->device->map_fmr(fmr, addr, size, lkey, rkey); +} + +static inline int ib_map_phys_fmr(struct ib_fmr *fmr, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + u32 *lkey, + u32 *rkey) +{ + return fmr->device->map_phys_fmr(fmr, phys_buf_array, num_phys_buf, + lkey, rkey); +} + +/* Need to discuss this... */ +static inline int ib_unmap_fmr(struct ib_fmr **fmr_array, + int fmr_cnt) +{ + /* Requires all FMRs to come from same device. 
*/ + return fmr_array[0]->device->unmap_fmr(fmr_array, fmr_cnt); +} + +static inline int ib_free_fmr(struct ib_fmr *fmr) +{ + return fmr->device->free_fmr(fmr); +} + +static inline int ib_attach_mcast(struct ib_qp *qp, + union ib_gid *gid, + u16 lid) +{ + return qp->device->attach_mcast(qp, gid, lid); +} + +static inline int ib_detach_mcast(struct ib_qp *qp, + union ib_gid *gid, + u16 lid) +{ + return qp->device->detach_mcast(qp, gid, lid); +} + +static inline int ib_post_send(struct ib_qp *qp, + struct ib_send_wr *send_wr, + struct ib_send_wr **bad_send_wr) +{ + return qp->device->post_send(qp, send_wr, bad_send_wr); +} + +static inline int ib_post_recv(struct ib_qp *qp, + struct ib_recv_wr *recv_wr, + struct ib_recv_wr **bad_recv_wr) +{ + return qp->device->post_recv(qp, recv_wr, bad_recv_wr); +} + +/** + * ib_poll_cq - poll a CQ for completion(s) + * @cq:the CQ being polled + * @num_entries:maximum number of completions to return + * @wc:array of at least @num_entries &struct ib_wc where completions + * will be returned + * + * Poll a CQ for (possibly multiple) completions. If the return value + * is < 0, an error occurred. If the return value is >= 0, it is the + * number of completions returned. If the return value is + * non-negative and < num_entries, then the CQ was emptied. + */ +static inline int ib_poll_cq(struct ib_cq *cq, + int num_entries, + struct ib_wc *wc_array) +{ + return cq->device->poll_cq(cq, num_entries, wc); +} + +static inline int ib_peek_cq(struct ib_cq *cq, + int wc_cnt) +{ + return cq->device->peek_cq(cq, wc_cnt); +} + +/** + * ib_req_notify_cq - request completion notification + * @cq:the CQ to generate an event for + * @cq_notify:%IB_CQ_SOLICITED for next solicited event, + * %IB_CQ_NEXT_COMP for any completion. 
+ */ +static inline int ib_req_notify_cq(struct ib_cq *cq, + enum ib_cq_notify cq_notify) +{ + return cq->device->req_notify_cq(cq, cq_notify); +} + +static inline int ib_req_n_notify_cq(struct ib_cq *cq, + int wc_cnt) +{ + return cq->device->req_n_notify_cq(cq, wc_cnt); +} #endif /* IB_VERBS_H */ From mshefty at ichips.intel.com Fri Jul 30 15:32:53 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 30 Jul 2004 15:32:53 -0700 Subject: [openib-general] Update from merging in Roland's changes... In-Reply-To: <20040730151713.52bc6b55.mshefty@ichips.intel.com> References: <20040730151713.52bc6b55.mshefty@ichips.intel.com> Message-ID: <20040730153253.038876fe.mshefty@ichips.intel.com> On Fri, 30 Jul 2004 15:17:13 -0700 Sean Hefty wrote: > I've updated ib_verbs based on changes that Roland had in his copy of the file. Specifically, I've updated the ib_xxx structures to include pointers to referenced items, added a usecnt field, and converted the functions from prototypes into static inline routines. The patch for this update is listed below. Here's a separate patch to include the lkey/rkey in ib_mr, ib_mw, and ib_fmr. With these fields included in the structures, the lkey/rkey parameters can be removed from several calls. I have not committed these changes. I would like to reach an agreement whether or not we want these changes. 
- Sean Index: ib_verbs.h =================================================================== --- ib_verbs.h (revision 548) +++ ib_verbs.h (working copy) @@ -102,18 +102,23 @@ struct ib_mr { struct ib_device *device; struct ib_pd *pd; + u32 lkey; + u32 rkey; atomic_t usecnt; }; struct ib_mw { struct ib_device *device; struct ib_pd *pd; + u32 rkey; atomic_t usecnt; }; struct ib_fmr { struct ib_device *device; struct ib_pd *pd; + u32 lkey; + u32 rkey; atomic_t usecnt; }; @@ -419,8 +424,6 @@ u64 device_virt_addr; u64 size; int mr_access_flags; - u32 lkey; - u32 rkey; }; enum ib_mr_rereg_flags { @@ -719,12 +722,10 @@ struct ib_phys_buf *phys_buf_array, int num_phys_buf, int mr_access_flags, - u64 *iova_start, - u32 *lkey, - u32 *rkey) + u64 *iova_start) { return pd->device->reg_phys_mr(pd, phys_buf_array, num_phys_buf, - mr_access_flags, iova_start, lkey, rkey); + mr_access_flags, iova_start); } static inline int ib_query_mr(struct ib_mr *mr, @@ -744,34 +745,23 @@ struct ib_phys_buf *phys_buf_array, int num_phys_buf, int mr_access_flags, - u64 *iova_start, - u32 *lkey, - u32 *rkey) + u64 *iova_start) { return mr->device->rereg_phys_mr(mr, mr_rereg_mask, pd, phys_buf_array, num_phys_buf, mr_access_flags, - iova_start, lkey, rkey); + iova_start); } -static inline struct ib_mw *ib_alloc_mw(struct ib_pd *pd, - u32 *rkey) +static inline struct ib_mw *ib_alloc_mw(struct ib_pd *pd) { - return pd->device->allow_mw(pd, rkey); -} - -static inline int ib_query_mw(struct ib_mw *mw, - u32 *rkey, - struct ib_pd **pd) -{ - return mw->device->query_mw(mw, rkey, pd); + return pd->device->alloc_mw(pd); } static inline int ib_bind_mw(struct ib_qp *qp, struct ib_mw *mw, - struct ib_mw_bind *mw_bind, - u32 *rkey) + struct ib_mw_bind *mw_bind) { - return mw->device->bind_mw(qp, mw, mw_bind, rkey); + return mw->device->bind_mw(qp, mw, mw_bind); } static inline int ib_dealloc_mw(struct ib_mw *mw) @@ -788,21 +778,16 @@ static inline int ib_map_fmr(struct ib_fmr *fmr, void *addr, - u64 size, - 
u32 *lkey, - u32 *rkey) + u64 size) { - return fmr->device->map_fmr(fmr, addr, size, lkey, rkey); + return fmr->device->map_fmr(fmr, addr, size); } static inline int ib_map_phys_fmr(struct ib_fmr *fmr, struct ib_phys_buf *phys_buf_array, - int num_phys_buf, - u32 *lkey, - u32 *rkey) + int num_phys_buf) { - return fmr->device->map_phys_fmr(fmr, phys_buf_array, num_phys_buf, - lkey, rkey); + return fmr->device->map_phys_fmr(fmr, phys_buf_array, num_phys_buf); } /* Need to discuss this... */ From halr at voltaire.com Fri Jul 30 16:43:26 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 30 Jul 2004 19:43:26 -0400 Subject: [openib-general] Update from merging in Roland's changes... References: <20040730151713.52bc6b55.mshefty@ichips.intel.com> Message-ID: <022c01c4768f$03795260$6401a8c0@comcast.net> Sean Hefty wrote: > I've updated ib_verbs based on changes that Roland had in his copy of > the file. Specifically, I've updated the ib_xxx structures to > include pointers to referenced items, added a usecnt field, and > converted the functions from prototypes into static inline routines. > The patch for this update is listed below. Can you pick up the ib_device struct too (so device->xxxx can be dereferenced) ? Also, a few typos (aka nits) :-) > +static inline struct ib_srq *ib_create_srq(struct ib_pd *pd, > + void *srq_context; s.b. void *srq_context, > + struct ib_srq_attr *srq_attr) > +/** > + * ib_poll_cq - poll a CQ for completion(s) > + * @cq:the CQ being polled > + * @num_entries:maximum number of completions to return > + * @wc:array of at least @num_entries &struct ib_wc where completions > + * will be returned > + * > + * Poll a CQ for (possibly multiple) completions. If the return > value + * is < 0, an error occurred. If the return value is >= 0, it > is the + * number of completions returned. If the return value is > + * non-negative and < num_entries, then the CQ was emptied. 
> + */ > +static inline int ib_poll_cq(struct ib_cq *cq, > + int num_entries, > + struct ib_wc *wc_array) > +{ > + return cq->device->poll_cq(cq, num_entries, wc); s.b. return cq->device->poll_cq(cq, num_entries, wc_array); > +} From halr at voltaire.com Fri Jul 30 17:05:37 2004 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 30 Jul 2004 20:05:37 -0400 Subject: [openib-general] Update from merging in Roland's changes... Message-ID: <024601c47692$1cf2e640$6401a8c0@comcast.net> Sean Hefty wrote: > I've updated ib_verbs based on changes that Roland had in his copy of > the file. Specifically, I've updated the ib_xxx structures to > include pointers to referenced items, added a usecnt field, and > converted the functions from prototypes into static inline routines. > The patch for this update is listed below. One more question: > +static inline struct ib_srq *ib_create_srq(struct ib_pd *pd, > + void *srq_context; s.b. void *srq_context, > + struct ib_srq_attr *srq_attr) +{ + return pd->device->create_srq(pd, srq_attr); +} Shouldn't this call also pass the srq_context in ? From mshefty at ichips.intel.com Fri Jul 30 18:48:02 2004 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 30 Jul 2004 18:48:02 -0700 Subject: [openib-general] Update from merging in Roland's changes... In-Reply-To: <022c01c4768f$03795260$6401a8c0@comcast.net> References: <20040730151713.52bc6b55.mshefty@ichips.intel.com> <022c01c4768f$03795260$6401a8c0@comcast.net> Message-ID: <20040730184802.4af39618.mshefty@ichips.intel.com> On Fri, 30 Jul 2004 19:43:26 -0400 Hal Rosenstock wrote: > Can you pick up the ib_device struct too (so device->xxxx can be > dereferenced) ? It's on my todo list... > Also, a few typos (aka nits) :-) > > +static inline struct ib_srq *ib_create_srq(struct ib_pd *pd, > > + void *srq_context; > > s.b. 
> void *srq_context, fixed both issues > > +static inline int ib_poll_cq(struct ib_cq *cq, > > + int num_entries, > > + struct ib_wc *wc_array) > > +{ > > + return cq->device->poll_cq(cq, num_entries, wc); > > s.b. > return cq->device->poll_cq(cq, num_entries, wc_array); > > > +} fixed From yaronh at voltaire.com Sat Jul 31 01:53:53 2004 From: yaronh at voltaire.com (Yaron Haviv) Date: Sat, 31 Jul 2004 11:53:53 +0300 Subject: [openib-general] gen2 dev branch Message-ID: <35EA21F54A45CB47B879F21A91F4862F18AA8B@taurus.voltaire.com> On Friday, July 30, 2004 1:52 AM, Roland Dreier wrote: > Yaron> Roland approach is to have a chain of filters > Yaron> (hca/port/qpn/class/method/attrib/dir/mask/name) that > Yaron> forwards the MAD to multiple consumers based on the filter > Yaron> list, the consumer must copy the MAD, if for e.g. there are > Yaron> multiple SA clients than each one will get the MAD and will > Yaron> need to decide if its his MAD (I assume that's also the > Yaron> reason attrib is needed in such approach, not because of > Yaron> futuristic SM) > > I guess I should have been clearer in my description. Nothing forces > the consumer to copy the MAD unless it wants to keep the MAD around > for later. Also nothing prevents us from implementing an SA layer on > top of the basic MAD layer that handles demultiplexing multiple > consumers, etc. I don't see how having two layers of demux+filters for SA and a daisy chain of code is more "simple" than just issuing the callback based on the TID (client ID); the SA query part can be as little as a MAD template on the send and a retry if no response arrives >(in fact that is how the Topspin stack works and what > I would expect we would want to do).
I sense that's your main reason for resistance, and it is not a valid claim; the essence of Matt's proposal is that no one's stack is more important than the other, and you agreed in the past that SA/GSI is not your strong part, so we should use something better > I'm not too concerned about the details of what I proposed; I would > just like to see a general layer that gives us the flexibility to > experiment and deal with future requirements. None of the smart guys on this list, including you, has found even one use case that cannot be met by our approach; if you see one please mention it and we will address it, or don't use this argument > In any case I remember > now why I gave up on this discussion the first time around. You gave up since we don't accept baseless claims, only REAL use cases and examples; just saying your approach is "Simple" is a subjective statement we don't agree with Yaron From roland at topspin.com Sat Jul 31 09:56:39 2004 From: roland at topspin.com (Roland Dreier) Date: Sat, 31 Jul 2004 09:56:39 -0700 Subject: [openib-general] gen2 dev branch In-Reply-To: <20040730112941.661ac601.mshefty@ichips.intel.com> (Sean Hefty's message of "Fri, 30 Jul 2004 11:29:41 -0700") References: <20040728144612.3dc1eebc.mshefty@ichips.intel.com> <52r7qvwi1d.fsf@topspin.com> <20040728213645.042a4315.mshefty@ichips.intel.com> <52acxix5j8.fsf@topspin.com> <20040730112941.661ac601.mshefty@ichips.intel.com> Message-ID: <52zn5gqfnc.fsf@topspin.com> Sean> Which layer were you expecting to perform reference Sean> counting? Sean> Which layer were you expecting to set the values in struct Sean> ib_xxx? I've been doing both in the device-independent part of the access layer. It could be changed easily but I figured device-independent common code didn't belong in the device driver. - R.
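What Roland describes here, the device-independent layer filling in the common struct ib_xxx fields and doing the reference counting around the driver call, might look roughly like the sketch below. It is only an illustration under stated assumptions: it is user-space C with C11 atomics standing in for the kernel's atomic_t, the usecnt field on struct ib_pd and the -EBUSY check in ib_dealloc_pd are assumed rather than taken from the patch, and the toy_* driver functions are hypothetical.

```c
/* Hypothetical user-space sketch: struct and function names mirror the
 * ib_verbs.h patch in this thread, but every body here is an assumption. */
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct ib_pd;
struct ib_mw;

/* Device method table, reduced to what this sketch needs. */
struct ib_device {
	struct ib_mw *(*alloc_mw)(struct ib_pd *pd);
	int (*dealloc_mw)(struct ib_mw *mw);
	int (*dealloc_pd)(struct ib_pd *pd);
};

struct ib_pd {
	struct ib_device *device;
	atomic_int usecnt;	/* assumed: number of objects still using this PD */
};

struct ib_mw {
	struct ib_device *device;
	struct ib_pd *pd;
	uint32_t rkey;
};

/* Common layer: call the driver, then set the common fields and take a
 * reference on the PD, so the driver does not have to. */
static inline struct ib_mw *ib_alloc_mw(struct ib_pd *pd)
{
	struct ib_mw *mw = pd->device->alloc_mw(pd);

	if (mw != NULL) {
		mw->device = pd->device;
		mw->pd = pd;
		atomic_fetch_add(&pd->usecnt, 1);
	}
	return mw;
}

static inline int ib_dealloc_mw(struct ib_mw *mw)
{
	struct ib_pd *pd = mw->pd;	/* saved before the driver frees mw */
	int ret = mw->device->dealloc_mw(mw);

	if (ret == 0)
		atomic_fetch_sub(&pd->usecnt, 1);
	return ret;
}

/* Common layer refuses to release a PD that is still referenced. */
static inline int ib_dealloc_pd(struct ib_pd *pd)
{
	if (atomic_load(&pd->usecnt) != 0)
		return -EBUSY;
	return pd->device->dealloc_pd(pd);
}

/* Hypothetical toy driver: it only allocates and programs the rkey. */
static struct ib_mw *toy_alloc_mw(struct ib_pd *pd)
{
	struct ib_mw *mw = calloc(1, sizeof(*mw));

	(void)pd;
	if (mw != NULL)
		mw->rkey = 0x1234;
	return mw;
}

static int toy_dealloc_mw(struct ib_mw *mw)
{
	free(mw);
	return 0;
}

static int toy_dealloc_pd(struct ib_pd *pd)
{
	(void)pd;
	return 0;
}

static struct ib_device toy_device = {
	.alloc_mw = toy_alloc_mw,
	.dealloc_mw = toy_dealloc_mw,
	.dealloc_pd = toy_dealloc_pd,
};
```

With this split the driver never touches usecnt or the back-pointers, which seems to be the point of keeping device-independent common code out of the device driver.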
From roland at topspin.com Sat Jul 31 10:09:30 2004 From: roland at topspin.com (Roland Dreier) Date: Sat, 31 Jul 2004 10:09:30 -0700 Subject: [openib-general] gen2 dev branch In-Reply-To: <35EA21F54A45CB47B879F21A91F4862F18AA8B@taurus.voltaire.com> (Yaron Haviv's message of "Sat, 31 Jul 2004 11:53:53 +0300") References: <35EA21F54A45CB47B879F21A91F4862F18AA8B@taurus.voltaire.com> Message-ID: <52r7qsqf1x.fsf@topspin.com> Roland> (in fact that is how the Topspin stack works and what I Roland> would expect we would want to do). Yaron> I sense that's your main reason for resistance, it is not a Yaron> valid claim, the essence of Matt's proposal is that no one's Yaron> stack is more important than the other, you agreed in the Yaron> past that SA/GSI is not your strong part so we should use Yaron> something better Your interpretation is completely wrong. I was just saying that it is not a requirement to code everything to the lowest level API that I am proposing, and giving the Topspin stack as an example of how my design would be used. Remember, we are building a "stack" so it makes sense to layer things. In any case you are pushing not just Voltaire's design but Voltaire's code as well so this is a ridiculous complaint coming from you. Yaron> No one of the smart guys on this list including you found Yaron> even a use case that cannot be met by our approach, if you Yaron> see one please mention it, we will address it, or don't use Yaron> this argument What do you mean? I've come up with several use cases and your response is always that you'll add yet another special case to the API. For example, I suggested being able to snoop every MAD. I could also imagine someone wanting to snoop every CM MAD or every SA MAD for debugging those components. The restriction of one consumer per class doesn't look right to me either -- there could be multiple network management applications all trying to use PM. I could go on but I don't think it's worth it.
I just want to provide mechanism and not policy. - Roland From yaronh at voltaire.com Sat Jul 31 15:17:04 2004 From: yaronh at voltaire.com (Yaron Haviv) Date: Sun, 1 Aug 2004 01:17:04 +0300 Subject: [openib-general] gen2 dev branch Message-ID: <35EA21F54A45CB47B879F21A91F4862F18AA8F@taurus.voltaire.com> On Saturday, July 31, 2004 8:10 PM, Roland Dreier wrote: > Roland> (in fact that is how the Topspin stack works and what I > Roland> would expect we would want to do). > > Yaron> I sense that's your main reason for resistance, it is not a > Yaron> valid claim, the essence of Matt's proposal is that no one's > Yaron> stack is more important than the other, you agreed in the > Yaron> past that SA/GSI is not your strong part so we should use > Yaron> something better > > In any case you are pushing not just Voltaire's design but Voltaire's > code as well so this is a ridiculous complaint coming from you. We didn't resist the tons of code pushed by Topspin, and it isn't all that perfect or better than ours. The suggested gsi.h is different from our original gsi; the code has also been drastically changed to incorporate elements suggested by you (slabs, ...) and by Todd/Eitan/Sean, and to remove performance and other optimizations (so it would be more "Simple"). Those changes were made after people gave constructive feedback. The idea in Matt's proposal is that code is contributed by a few and not just by one. Since the suggested approach can support all your needs today, I suggest that instead of fighting it you accept it, and give some credit to others who have a fair share of experience; your strong resistance cannot be interpreted in too many ways > What do you mean? I've come up with several use cases and your > response is always that you'll add yet another special case to the > API.
You made a few remarks: you wanted dynamic memory allocation instead of pools; it was accepted, the API was changed, and the code is being worked on by Hal to support it > For example, I suggested being able to snoop every MAD Personally I'm ok with just having debug prints for that when people change the verbosity level, but we did accept the requirement and added an API for it without arguing about it. You also made a somewhat extreme case for needing the attribute field; anyway, it's not changing the model, it's just adding a parameter. I think more filters is not better, but if others on this list think it's a real requirement and not just an attempt to make a case, we will add it to the API. > The restriction of one consumer per > class doesn't look right to me either -- there could be multiple > network management applications all trying to use PM. There can be more than one consumer per class in our proposal; the limit is one server per class. PM is a client model (it issues PM requests), demuxed by client ID, so you can have as many as you want, and in such an example it is much more efficient to use our approach than the multi-layer, multi-filter, daisy-chain approach you suggest. Do you see a case with several servers (accepting requests) listening on the same hca/port/class/ver/attrib? Did we miss or fail to address any requirement you have made so far? > I just want to provide mechanism and not policy. > I don't see where the policy vs. mechanism issue is; we suggest a mechanism that works and performs in the most extreme cases possible, enables more functionality, is more scalable and easier to work with, and the only counter argument I see is that the MAD snooping is not flexible. I still think you don't have a real reason to object to our model so aggressively: most people on this list support it, it can work with your code, and we are quite open to incorporating any useful suggestions as long as they are focused and productive.
Yaron From roland at topspin.com Sat Jul 31 18:46:50 2004 From: roland at topspin.com (Roland Dreier) Date: Sat, 31 Jul 2004 18:46:50 -0700 Subject: [openib-general] gen2 dev branch In-Reply-To: <35EA21F54A45CB47B879F21A91F4862F18AA8F@taurus.voltaire.com> (Yaron Haviv's message of "Sun, 1 Aug 2004 01:17:04 +0300") References: <35EA21F54A45CB47B879F21A91F4862F18AA8F@taurus.voltaire.com> Message-ID: <52ekmrr5o5.fsf@topspin.com> Yaron> I don't see where is the policy vs mechanism, we suggest a Yaron> mechanism that works and performs in the most extreme cases Yaron> possible , enable more functionality, more scalable, and Yaron> easier to work with, and the only counter arguments I see Yaron> is that the MAD snooping is not flexible Yaron> I still think you don't have a real reason to object to our Yaron> model so aggressively, most people in this list support it, Yaron> it can work with your code, and we are quite open to Yaron> incorporate any useful suggestions as long as they are Yaron> focused and productive. I guess I've said enough times that I'm not comfortable with this design. I don't like the assumption that there is only one consumer for each MAD. I don't like the assumption that every MAD on QP1 will follow all the GSI rules (especially since CM doesn't really follow all the GSI rules). I don't like building a general kernel GSI layer when the SA client is the only known consumer. I don't like to treat QP0 and QP1 so differently. However I agree that your approach can be made to work for today's applications, so if Sean is comfortable with it I will go along. In any case we can always replace it later if it turns out I'm right. - Roland
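For readers following the argument, the client/server split Yaron describes (responses demultiplexed to a client by an ID carried in the TID, and at most one server per management class for incoming requests) could be sketched roughly as below. This is not the proposed gsi.h API, which does not appear in this thread: every name here is hypothetical, placing the client ID in the upper 32 bits of the TID is an assumption made for illustration, and the snooping case Roland raises is not modelled at all. The only detail taken from the InfiniBand MAD format is that the top bit of the method field marks a response.

```c
/* Hypothetical sketch of TID-based client demux plus one server per class.
 * None of these names come from the gsi.h proposal discussed above. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAD_METHOD_RESPONSE 0x80	/* R bit of the MAD method field */
#define MAX_CLIENTS 8			/* arbitrary limit for the sketch */
#define MAX_CLASSES 256			/* mgmt_class is an 8-bit field */

struct mad {
	uint8_t mgmt_class;
	uint8_t method;
	uint64_t tid;
};

typedef void (*mad_handler)(const struct mad *mad, void *context);

struct consumer {
	mad_handler handler;
	void *context;
};

struct dispatcher {
	struct consumer clients[MAX_CLIENTS];	/* indexed by client ID */
	struct consumer servers[MAX_CLASSES];	/* indexed by mgmt_class */
};

/* Returns the new client ID, or -1 if the table is full. */
static int register_client(struct dispatcher *d, mad_handler h, void *ctx)
{
	for (int i = 0; i < MAX_CLIENTS; i++) {
		if (d->clients[i].handler == NULL) {
			d->clients[i] = (struct consumer){ h, ctx };
			return i;
		}
	}
	return -1;
}

/* Returns 0 on success, -1 if the class already has a server. */
static int register_server(struct dispatcher *d, uint8_t mgmt_class,
			   mad_handler h, void *ctx)
{
	if (d->servers[mgmt_class].handler != NULL)
		return -1;
	d->servers[mgmt_class] = (struct consumer){ h, ctx };
	return 0;
}

/* Assumed layout: client ID in the upper 32 bits, sequence in the lower. */
static uint64_t make_tid(int client_id, uint32_t seq)
{
	return ((uint64_t)(uint32_t)client_id << 32) | seq;
}

/* Returns 1 if the MAD reached a consumer, 0 if it was dropped. */
static int dispatch(const struct dispatcher *d, const struct mad *mad)
{
	const struct consumer *c;

	if (mad->method & MAD_METHOD_RESPONSE) {
		uint32_t id = (uint32_t)(mad->tid >> 32);

		if (id >= MAX_CLIENTS)
			return 0;
		c = &d->clients[id];		/* one table lookup, no filters */
	} else {
		c = &d->servers[mad->mgmt_class];
	}
	if (c->handler == NULL)
		return 0;
	c->handler(mad, c->context);
	return 1;
}

/* Example handler: counts the MADs delivered to it. */
static void count_mad(const struct mad *mad, void *context)
{
	(void)mad;
	++*(int *)context;
}
```

In this shape any number of clients can share a class, because each response goes to exactly one client, while a second register_server() call for the same class fails. That one-consumer-per-MAD property is what Yaron presents as the efficiency win and what Roland objects to as a restriction.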