From halr at voltaire.com Fri Jul 1 03:27:14 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 01 Jul 2005 06:27:14 -0400
Subject: [openib-general] [PATCH] user_mad: Add receive side RMPP support
In-Reply-To: <1120167637.4371.630.camel@hal.voltaire.com>
References: <1120149968.4371.81.camel@hal.voltaire.com> <52d5q365uo.fsf@topspin.com> <42C464E6.8040406@ichips.intel.com> <1120167637.4371.630.camel@hal.voltaire.com>
Message-ID: <1120213633.4371.2295.camel@hal.voltaire.com>

On Thu, 2005-06-30 at 17:40, Hal Rosenstock wrote:
> > Given that, I think that it makes more sense to add a length field to the
> > ib_user_mad_hdr.
>
> That's my conclusion too. ABI change :-(

Is there a need for any backward compatibility? Specifically, should the
current ABI version (4) be supported as well as the one this will be
bumped to (5), or just the new one?

-- Hal

From mst at mellanox.co.il Fri Jul 1 05:07:00 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 1 Jul 2005 15:07:00 +0300
Subject: [openib-general] Re: [PATCH (draft)] sdp: fix aio/sync completion race
In-Reply-To: <20050630181636.A29268@topspin.com>
References: <20050506175415.I26823@topspin.com> <20050510143232.GF2474@mellanox.co.il> <20050512151051.C23734@topspin.com> <52fywst4z3.fsf@topspin.com> <20050513135145.GA320@mellanox.co.il> <20050513100149.B24123@topspin.com> <20050525202128.GA7463@mellanox.co.il> <20050525172954.A2979@topspin.com> <20050628201857.GA5987@mellanox.co.il> <20050630181636.A29268@topspin.com>
Message-ID: <20050701120700.GA10889@mellanox.co.il>

Quoting r. Libor Michalek :
> Subject: Re: [PATCH (draft)] sdp: fix aio/sync completion race
>
> On Tue, Jun 28, 2005 at 11:18:57PM +0300, Michael S. Tsirkin wrote:
> > Libor, here's a stub at solving an old problem.
> > Most of the patch is passing sdp_opt to iocb complete and cancel functions.
> >
> > I didn't yet test it, and I won't have the time today,
> > but maybe you could tell me whether I'm going in the right
> > direction with this.
>
> I'm wondering if we should prevent a new call from proceeding until
> all IOCBs have been completed, like the patch you have, or if we should
> force the creation of an IOCB when there are any IOCBs outstanding,
> even if the call can be processed synchronously.
>
> Remember the original problem was IOCB completes in the work queue
> completing after a call to send/recv that was processed synchronously.
> Either forcing an IOCB or waiting on the lock should prevent the
> ordering issue.

You mean using zcopy for packets below zcopy threshold?

> I think either case means passing the sdp_opt to iocb_complete. In
> do_iocb_complete you cannot call conn_unlock unconditionally since
> the lock.users variable is only incremented when in_atomic() or
> irqs_disabled(). Also, I think the conn reference needs to be incremented
> before creating the work request, and decremented upon completion since
> nothing prevents it from going away before the thread is scheduled.
>

Libor, thanks for the comments.
I'll work on addressing them.

-- 
MST

From jlentini at netapp.com Fri Jul 1 07:18:51 2005
From: jlentini at netapp.com (James Lentini)
Date: Fri, 1 Jul 2005 10:18:51 -0400 (EDT)
Subject: [openib-general] Re: [PATCH] kDAPL: remove redundant kfree checks
In-Reply-To: <1120172491.29522.28.camel@duffman>
References: <1120172491.29522.28.camel@duffman>
Message-ID:

On Thu, 30 Jun 2005, Tom Duffy wrote:

tduffy> kfree() already checks for NULL. No need to do it twice.
tduffy>
tduffy> Signed-off-by: Tom Duffy

Committed in revision 2765.
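For readers following the kfree() cleanup above, the same idea can be sketched in user space, where the C standard gives free() the same guarantee: calling it with a NULL pointer does nothing. The structure and function names below are hypothetical and are not taken from the kDAPL source; the block only shows the before and after shape of such a patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical structure for illustration; not taken from the kDAPL source. */
struct demo_ep {
	char *name;
	char *private_data;
};

/* Before: every pointer is tested before it is released. */
static void demo_ep_release_checked(struct demo_ep *ep)
{
	assert(ep != NULL);
	if (ep->name != NULL)
		free(ep->name);
	if (ep->private_data != NULL)
		free(ep->private_data);
	ep->name = NULL;
	ep->private_data = NULL;
}

/* After: free(NULL) is defined to do nothing, so the tests can go. */
static void demo_ep_release(struct demo_ep *ep)
{
	assert(ep != NULL);
	free(ep->name);
	free(ep->private_data);
	ep->name = NULL;
	ep->private_data = NULL;
}
```

Both functions behave identically; the second simply relies on the allocator's own NULL check, which is the point of the patch.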
From caitlinb at siliquent.com Fri Jul 1 08:59:06 2005
From: caitlinb at siliquent.com (Caitlin Bestler)
Date: Fri, 1 Jul 2005 08:59:06 -0700
Subject: [openib-general] Making gen2 transport neutral
Message-ID: <8508251A6FC08A489844A94261D3693A057CA1@fiona.siliquent.com>

While waiting to see if any of the champions of the union
form appear, I'd like to ask what the enum equivalent of
the non-union struct strategy would look like.

I'm guessing it would take something like

enum rdma_xyz {
    /* common values */
    RDMA_XYZ_A,
    RDMA_XYZ_B,
    RDMA_XYZ_C,
    /* IB specific values */
    IB_XYZ_D,
    IB_XYZ_E,
    /* iWARP specific values */
    IWARP_XYZ_F = RDMA_XYZ_C+1,
    IWARP_XYZ_G
};

to

enum rdma_xyz { /* common values */
    RDMA_XYZ_A,
    RDMA_XYZ_B,
    RDMA_XYZ_C,
    RDMA_XYZ_LIM /* must be last */
};

enum ib_xyz { /* extends rdma_xyz */
    IB_XYZ_D = RDMA_XYZ_LIM,
    IB_XYZ_E
};

enum iwarp_xyz { /* extends rdma_xyz */
    IWARP_XYZ_F = RDMA_XYZ_LIM,
    IWARP_XYZ_G
};

That way the transport dependent enums could even
be in transport dependent .h files, and only the
common ones would have to be in the main header file.
That would be very friendly for adding a third
transport at some later date (I still don't think
it will happen, but it would make it easier).

> -----Original Message-----
> From: Christoph Hellwig [mailto:hch at lst.de]
> Sent: Thursday, June 30, 2005 9:03 AM
> To: Caitlin Bestler
> Cc: openib-general; rdma-developers at lists.sourceforge.net
> Subject: Re: [openib-general] Making gen2 transport neutral
>
> On Thu, Jun 30, 2005 at 08:52:54AM -0700, Caitlin Bestler wrote:
> > structs:
> >     typically "struct ib_xyz" is transformed as follows:
> >
> >     struct ib_xyz {
> >         /* Only IB specific fields remain */
> >         /* In some cases fields have been split, because
> >          * iWARP allows two things to vary that IB had
> >          * locked together. SGE limits are the primary
> >          * example of this. iWARP can have different
> >          * limits on SGE size for each type of message
> >          */
> >     };
> >
> >     struct iwarp_xyz {
> >         /* equivalent iWARP specific fields */
> >     };
> >
> >     struct rdma_xyz {
> >         /* Transport neutral fields. Typically
> >          * a subset of what was in struct ib_xyz before
> >          */
> >         union {
> >             struct ib_xyz ib;
> >             struct iwarp_xyz iwarp;
> >         } xpt;
> >     };
>
> wrong way around, but we had that before. It should be
>
> struct ib_foo {
>     struct rdma_foo common;
>     ...
> };
>
> struct iwarp_foo {
>     struct rdma_foo common;
>     ...
> };
>
> see filesystem and network protocol private data for example
> where we historically did it that union way and it didn't
> work out at all long-term.
>
> > I am assuming that returning a "not supported" error for irrelevant
> > verbs is an acceptable burden for all providers.
>
> Even better make the methods implementing them optional and
> let the upper layer return EOPNOTSUPP when it's not implemented.
>

From hch at lst.de Fri Jul 1 09:02:51 2005
From: hch at lst.de (Christoph Hellwig)
Date: Fri, 1 Jul 2005 18:02:51 +0200
Subject: [openib-general] Making gen2 transport neutral
In-Reply-To: <8508251A6FC08A489844A94261D3693A057CA1@fiona.siliquent.com>
References: <8508251A6FC08A489844A94261D3693A057CA1@fiona.siliquent.com>
Message-ID: <20050701160251.GA9380@lst.de>

On Fri, Jul 01, 2005 at 08:59:06AM -0700, Caitlin Bestler wrote:
> to
>
> enum rdma_xyz { /* common values */
>     RDMA_XYZ_A,
>     RDMA_XYZ_B,
>     RDMA_XYZ_C,
>     RDMA_XYZ_LIM /* must be last */
> };
>
> enum ib_xyz { /* extends rdma_xyz */
>     IB_XYZ_D = RDMA_XYZ_LIM,
>     IB_XYZ_E
> };
>
> enum iwarp_xyz { /* extends rdma_xyz */
>     IWARP_XYZ_F = RDMA_XYZ_LIM,
>     IWARP_XYZ_G
> };
>
> That way the transport dependent enums could even
> be in transport dependent .h files, and only the
> common ones would have to be in the main header file.
> That would be very friendly for adding a third
> transport at some later date (I still don't think
> it will happen, but it would make it easier).

looks fine.

From caitlinb at siliquent.com Fri Jul 1 09:14:46 2005
From: caitlinb at siliquent.com (Caitlin Bestler)
Date: Fri, 1 Jul 2005 09:14:46 -0700
Subject: [openib-general] gen2/rnic-pi differences
Message-ID: <8508251A6FC08A489844A94261D3693A057CA2@fiona.siliquent.com>

The wr_id field is the "minimal support" that I was
referring to (that and the context on callbacks).

DAPL and IT-API really need more than that to optimally
implement features such as:
    storing the ULP's 64-bit cookie.

    controlling "notification status" on a per-DTO level
    rather than just always leaving the CQ on a "next event"
    basis.

    implementing graceful disconnect (disconnect after
    the last send request completes).

    allowing the ULP to flush a request through as
    a marker.

    returning the EP or RMR pointer with the completion.

The 64 bit wr_id is fully consumed just implementing
the first option.

Without further support the Access Layer (DAT/IT-API/whatever)
must create its own parallel data structure to shadow the
work request (a DTO_COOKIE in the reference implementation).

Such a structure has been found in every DAT implementation
that I am aware of to date for the simple reason that the
verb layers have not offered much support beyond the 64-bit
work request id and a context field on callbacks.

But it is in fact very easy to solve all of these problems
at the verb layer, even if there is no hardware support.

RNIC-PI defined several verb layer features that if supported
eliminate the need for a DTO_COOKIE. If the information can
be integrated with existing verb layer structures it is a
major improvement in efficiency, at worst case it merely
requires the verb layer to implement the same workarounds
that the Access Layer is already forced to use.

These features are:
    all verb layer objects have a consumer supplied
    identifier (os_data) that is used to identify that
    object back to the consumer in all completions and
    callbacks. So instead of getting the QPID you get
    the EP pointer (assuming that is what you supplied).

    Three flags are identified per work request that can
    be ignored, passed through or fully implemented. They
    are Local Solicited, Consumer0 and Consumer1. If 'Local
    Solicited' is defined it means that completion of the
    work request should be treated as though it were a
    solicited event (i.e., it qualifies for 'next solicited
    event' callback notification).

With these changes there is no need for a DTO_COOKIE,
which reduces the number of memory touches, simplifies
code, and may eliminate associated locking to protect
the DTO_COOKIE resource itself.

> -----Original Message-----
> From: James Lentini [mailto:jlentini at netapp.com]
> Sent: Thursday, June 30, 2005 3:18 PM
> To: Caitlin Bestler
> Cc: openib-general; rdma-developers at lists.sourceforge.net
> Subject: Re: [openib-general] gen2/rnic-pi differences
>
>
>
> On Thu, 30 Jun 2005, Caitlin Bestler wrote:
>
> > os_data / Identification of Consumer Objects
> >
> > gen2 provides minimal support for identification of RDMA
> > resources using consumer supplied handles. A user-supplied
> > context is available in callbacks, but not in work completions.
>
> Gen2 work completions store a user-supplied context in the
> ib_wc structure's wr_id field.
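The two features described in the message above (a consumer-supplied os_data on each object, and three per-request flags passed through to the completion) might look roughly like the following. This is only a sketch under stated assumptions: every type, field and constant name here is invented for illustration, and none of them is the actual gen2 or RNIC-PI definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-work-request flags, named after the three in the message. */
#define DEMO_WR_LOCAL_SOLICITED 0x1u
#define DEMO_WR_CONSUMER0       0x2u
#define DEMO_WR_CONSUMER1       0x4u

/* Hypothetical queue pair: os_data is whatever the consumer supplied,
 * for example a pointer to its own endpoint (EP) object. */
struct demo_qp {
	uint32_t qp_num;
	void *os_data;
};

/* Hypothetical work request and work completion. */
struct demo_wr {
	uint64_t wr_id;
	unsigned int flags;
};

struct demo_wc {
	uint64_t wr_id;
	void *qp_os_data;    /* consumer's handle instead of a bare QP number */
	unsigned int flags;  /* flags passed through from the work request */
};

/* Fill in a completion for a finished work request on the given QP. */
static void demo_complete(const struct demo_qp *qp, const struct demo_wr *wr,
			  struct demo_wc *wc)
{
	wc->wr_id = wr->wr_id;
	wc->qp_os_data = qp->os_data;
	wc->flags = wr->flags;
}

/* Nonzero if the completion should count as a solicited event locally. */
static int demo_wc_local_solicited(const struct demo_wc *wc)
{
	return (wc->flags & DEMO_WR_LOCAL_SOLICITED) != 0;
}
```

With something like this the consumer reads its EP pointer and its two private flag bits straight out of the completion, and the full 64-bit wr_id stays available for the ULP's cookie.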
From ftillier at silverstorm.com Fri Jul 1 09:37:56 2005
From: ftillier at silverstorm.com (Fab Tillier)
Date: Fri, 1 Jul 2005 09:37:56 -0700
Subject: [openib-general] gen2/rnic-pi differences
In-Reply-To: <8508251A6FC08A489844A94261D3693A057CA2@fiona.siliquent.com>
Message-ID: <000501c57e5b$3e287650$6501a8c0@infiniconsys.com>

> From: Caitlin Bestler [mailto:caitlinb at siliquent.com]
> Sent: Friday, July 01, 2005 9:15 AM
>
> RNIC-PI defined several verb layer features that if supported
> eliminate the need for a DTO_COOKIE. If the information can
> be integrated with existing verb layer structures it is a
> major improvement in efficiency, at worst case it merely
> requires the verb layer to implement the same workarounds
> that the Access Layer is already forced to use.
>
> These features are:
>     all verb layer objects have a consumer supplied
>     identifier (os_data) that is used to identify that
>     object back to the consumer in all completions and
>     callbacks. So instead of getting the QPID you get
>     the EP pointer (assuming that is what you supplied).

This should be really easy to implement for Mellanox HCAs - the mthca driver
already has to resolve the QP structure when processing completions, and getting
the user's QP context and including it in the work completion should be a
trivial addition (for someone familiar with the code base).

>     Three flags are identified per work request that can
>     be ignored, passed through or fully implemented. They
>     are Local Solicited, Consumer0 and Consumer1. If 'Local
>     Solicited' is defined it means that completion of the
>     work request should be treated as though it were a
>     solicited event (i.e., it qualifies for 'next solicited
>     event' callback notification).

Would these flags be returned in the work completion? I don't know if I quite
understand what you're requesting here. Do Consumer0 and Consumer1 represent
bits in a flags field?

In the Mellanox HCA implementation, the 64-bit work request ID is stored by the
driver and recovered upon a completion. Basically, the HCA driver maintains
DTO_COOKIE-like information for each work request already. Due to this lookup
requirement, the information stored per work request could be arbitrarily large
if so desired. I don't know if that holds true for the PathScale HCA hardware,
though - if it doesn't it would require an additional lookup in the HCA driver
to get back to this information, in effect pushing the DTO_COOKIE concept into
the HCA driver rather than leaving it in the consumer.

- Fab

From mshefty at ichips.intel.com Fri Jul 1 09:42:19 2005
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 01 Jul 2005 09:42:19 -0700
Subject: [openib-general] gen2/rnic-pi differences
In-Reply-To: <8508251A6FC08A489844A94261D3693A057CA2@fiona.siliquent.com>
References: <8508251A6FC08A489844A94261D3693A057CA2@fiona.siliquent.com>
Message-ID: <42C5726B.1070604@ichips.intel.com>

Caitlin Bestler wrote:
> The wr_id field is the "minimal support" that I was
> referring to (that and the context on callbacks).
>
> DAPL and IT-API really need more than that to optimally
> implement features such as:
>     storing the ULP's 64-bit cookie.
>
>     controlling "notification status" on a per-DTO level
>     rather than just always leaving the CQ on a "next event"
>     basis.
>
>     implementing graceful disconnect (disconnect after
>     the last send request completes).
>
>     allowing the ULP to flush a request through as
>     a marker.
>
>     returning the EP or RMR pointer with the completion.
>
> The 64 bit wr_id is fully consumed just implementing
> the first option.
>
> Without further support the Access Layer (DAT/IT-API/whatever)
> must create its own parallel data structure to shadow the
> work request (a DTO_COOKIE in the reference implementation).

The assumption being made here is that there needs to be another abstraction
layer on top of the existing core layer. I don't think that an optimal
implementation would want this.

- Sean

From caitlin.bestler at gmail.com Fri Jul 1 09:44:26 2005
From: caitlin.bestler at gmail.com (Caitlin Bestler)
Date: Fri, 1 Jul 2005 09:44:26 -0700
Subject: [openib-general] gen2/rnic-pi differences
In-Reply-To: <000501c57e5b$3e287650$6501a8c0@infiniconsys.com>
References: <8508251A6FC08A489844A94261D3693A057CA2@fiona.siliquent.com> <000501c57e5b$3e287650$6501a8c0@infiniconsys.com>
Message-ID: <469958e0050701094493bc8ea@mail.gmail.com>

On 7/1/05, Fab Tillier wrote:
> > From: Caitlin Bestler [mailto:caitlinb at siliquent.com]
> > Sent: Friday, July 01, 2005 9:15 AM
> >
> > RNIC-PI defined several verb layer features that if supported
> > eliminate the need for a DTO_COOKIE. If the information can
> > be integrated with existing verb layer structures it is a
> > major improvement in efficiency, at worst case it merely
> > requires the verb layer to implement the same workarounds
> > that the Access Layer is already forced to use.
> >
> > These features are:
> >     all verb layer objects have a consumer supplied
> >     identifier (os_data) that is used to identify that
> >     object back to the consumer in all completions and
> >     callbacks. So instead of getting the QPID you get
> >     the EP pointer (assuming that is what you supplied).
>
> This should be really easy to implement for Mellanox HCAs - the mthca driver
> already has to resolve the QP structure when processing completions, and getting
> the user's QP context and including it in the work completion should be a
> trivial addition (for someone familiar with the code base).
>

That was the basic assumption, that the verb layer drivers could
implement these features more easily than DAT/IT-API could because
they would simply require fields in existing data structures rather
than totally separate parallel data structures.

> >     Three flags are identified per work request that can
> >     be ignored, passed through or fully implemented. They
> >     are Local Solicited, Consumer0 and Consumer1. If 'Local
> >     Solicited' is defined it means that completion of the
> >     work request should be treated as though it were a
> >     solicited event (i.e., it qualifies for 'next solicited
> >     event' callback notification).
>
> Would these flags be returned in the work completion? I don't know if I quite
> understand what you're requesting here. Do Consumer0 and Consumer1 represent
> bits in a flags field?
>

Yes. The three flags end up being pass-throughs to the work
completion, although the Local Solicited bit can be merged with
the Remote Solicited bit without harm (you don't really need to
know which end set the bit).

> In the Mellanox HCA implementation, the 64-bit work request ID is stored by the
> driver and recovered upon a completion. Basically, the HCA driver maintains
> DTO_COOKIE-like information for each work request already. Due to this lookup
> requirement, the information stored per work request could be arbitrarily large
> if so desired. I don't know if that holds true for the PathScale HCA hardware,
> though - if it doesn't it would require an additional lookup in the HCA driver
> to get back to this information, in effect pushing the DTO_COOKIE concept into
> the HCA driver rather than leaving it in the consumer.
>

Exactly. Any implementation that has access to the original send
queue when reaping the work completion can easily extend the
64-bit work request ID with additional data. I suspect that this
is a very common implementation because there isn't much purpose
in actually sending the 64-bit ID out over the system bus just
to have it be shipped back in the completion. Why pay for storage
on the card to hold the 64-bit wr id for every pending work request?
And why pay to ship it across the system bus twice?
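The host-side bookkeeping discussed above can be pictured as a small shadow ring kept next to the send queue: the driver records the wr_id (and anything else it likes) in host memory when a request is posted, and looks it up by queue index when the completion is reaped. The sketch below is an assumption-laden illustration, not code from mthca or any other driver: all names are invented, the queue depth is fixed, and completions are assumed to arrive in posting order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SQ_DEPTH 8u            /* hypothetical depth, a power of two */
#define DEMO_FLAG_CONSUMER0 0x2u    /* hypothetical pass-through flag bit */

/* Per-request data kept in host memory only; never sent to the adapter. */
struct demo_sq_slot {
	uint64_t wr_id;
	void *consumer_ctx;
	unsigned int flags;
};

struct demo_sq {
	struct demo_sq_slot slot[DEMO_SQ_DEPTH];
	unsigned int head;   /* count of requests posted */
	unsigned int tail;   /* count of requests completed */
};

/* Record a request; returns the slot index, or -1 if the queue is full. */
static int demo_sq_post(struct demo_sq *sq, uint64_t wr_id, void *ctx,
			unsigned int flags)
{
	unsigned int idx;

	if (sq->head - sq->tail == DEMO_SQ_DEPTH)
		return -1;
	idx = sq->head & (DEMO_SQ_DEPTH - 1);
	sq->slot[idx].wr_id = wr_id;
	sq->slot[idx].consumer_ctx = ctx;
	sq->slot[idx].flags = flags;
	sq->head++;
	return (int)idx;
}

/* Recover the stored data for the slot index the hardware reported. */
static const struct demo_sq_slot *demo_sq_complete(struct demo_sq *sq,
						   unsigned int hw_index)
{
	const struct demo_sq_slot *s = &sq->slot[hw_index & (DEMO_SQ_DEPTH - 1)];

	sq->tail++;          /* assumes in-order completion */
	return s;
}
```

Because the slot already exists for the wr_id, widening it with a context pointer or flag bits costs a few bytes of host memory and nothing on the adapter or the bus.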
From caitlin.bestler at gmail.com Fri Jul 1 09:53:06 2005
From: caitlin.bestler at gmail.com (Caitlin Bestler)
Date: Fri, 1 Jul 2005 09:53:06 -0700
Subject: [openib-general] gen2/rnic-pi differences
In-Reply-To: <42C5726B.1070604@ichips.intel.com>
References: <8508251A6FC08A489844A94261D3693A057CA2@fiona.siliquent.com> <42C5726B.1070604@ichips.intel.com>
Message-ID: <469958e005070109534d6c8407@mail.gmail.com>

On 7/1/05, Sean Hefty wrote:
> Caitlin Bestler wrote:
>
> The assumption being made here is that there needs to be another abstraction
> layer on top of the existing core layer. I don't think that an optimal
> implementation would want this.
>

I believe that any application using the core layer directly would
have the same need to identify *its* context efficiently when
processing a work completion.

The QP ID does not truly meet that requirement. It is an arbitrary
integer selected by another software package. You have to add
your own reverse indexing or use the work request ID to fully
identify your context, which means that it cannot also be promised
to a higher layer.

So I think this problem is generic to *any* middleware consumer,
whether it is DAPL, IT-API, MPI, the communications layer of a
database, RPC, etc. The Work Request ID by itself will typically
leave any middleware consumer forced to create a "DTO Cookie"
type solution.

And while I believe that we should *allow* consumers to use the
core layer directly, I do not believe we should *encourage* it.
Middleware layers such as DAPL, IT-API or an MPI messaging
system enable the application to focus on application issues
rather than on wire issues. That's why kDAPL has been used,
and why we should be concerned with enabling efficient middleware
even if we allow consumers to bypass it.
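The "DTO Cookie" workaround mentioned above usually amounts to the following: the middleware allocates a small record holding the upper layer's 64-bit cookie plus its own context, and passes the address of that record as the work request ID. The sketch below uses invented names and plain malloc(); a real implementation would more likely draw cookies from a preallocated, possibly locked, pool, which is the overhead the thread is trying to avoid.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical shadow record; one is needed per outstanding request. */
struct demo_dto_cookie {
	uint64_t ulp_cookie;  /* the 64-bit value promised back to the ULP */
	void *ep;             /* middleware endpoint that owns the request */
};

/* Posting side: wrap the ULP cookie and EP, and return the value to use
 * as the verbs-level wr_id. Returns 0 if no memory is available. */
static uint64_t demo_cookie_wrap(uint64_t ulp_cookie, void *ep)
{
	struct demo_dto_cookie *c = malloc(sizeof(*c));

	if (c == NULL)
		return 0;
	c->ulp_cookie = ulp_cookie;
	c->ep = ep;
	return (uint64_t)(uintptr_t)c;
}

/* Completion side: recover both values from the wr_id and free the record. */
static void demo_cookie_unwrap(uint64_t wr_id, uint64_t *ulp_cookie, void **ep)
{
	struct demo_dto_cookie *c = (struct demo_dto_cookie *)(uintptr_t)wr_id;

	assert(c != NULL);
	*ulp_cookie = c->ulp_cookie;
	*ep = c->ep;
	free(c);
}
```

Every request then costs an extra allocation and an extra pointer chase on completion, purely because the wr_id is the only value the verb layer hands back.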
From halr at voltaire.com Fri Jul 1 09:58:50 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 01 Jul 2005 12:58:50 -0400
Subject: [openib-general] Service Record support in SA query
Message-ID: <1120237130.4371.3616.camel@hal.voltaire.com>

Hi Roland,

Any more comments on the Service Record support in SA query patch or is
this OK to commit ?

Thanks.

-- Hal

From mshefty at ichips.intel.com Fri Jul 1 10:22:20 2005
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 01 Jul 2005 10:22:20 -0700
Subject: [openib-general] gen2/rnic-pi differences
In-Reply-To: <469958e005070109534d6c8407@mail.gmail.com>
References: <8508251A6FC08A489844A94261D3693A057CA2@fiona.siliquent.com> <42C5726B.1070604@ichips.intel.com> <469958e005070109534d6c8407@mail.gmail.com>
Message-ID: <42C57BCC.3050005@ichips.intel.com>

Caitlin Bestler wrote:
> And while I believe that we should *allow* consumers to use the
> core layer directly, I do not believe we should *encourage* it.
> Middleware layers such as DAPL, IT-API or an MPI messaging
> system enable the application to focus on application issues
> rather than on wire issues. That's why kDAPL has been used,
> and why we should be concerned with enabling efficient middleware
> even if we allow consumers to bypass it.

The core layer should be designed to encourage, not discourage, its use,
otherwise it has the wrong abstraction.

- Sean

From halr at voltaire.com Fri Jul 1 10:24:37 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 01 Jul 2005 13:24:37 -0400
Subject: [openib-general] [oops] recent opensm crash
In-Reply-To: <1120169407.29522.17.camel@duffman>
References: <1120169407.29522.17.camel@duffman>
Message-ID: <1120238676.4371.3718.camel@hal.voltaire.com>

On Thu, 2005-06-30 at 18:10, Tom Duffy wrote:
> #0 stack_dump () at src/stack.c:72
> 72 if (!__builtin_frame_address(2))
> (gdb) bt
> #0 stack_dump () at src/stack.c:72
> #1 0x00002aaaaacbd1a6 in handler (x=11) at src/stack.c:151
> #2
> #3 __osm_sm_mad_ctrl_send_err_cb (bind_context=0x550dd8, p_madw=0x567820)
>    at osm_sm_mad_ctrl.c:832
> #4 0x00002aaaaaaaeeed in osm_vendor_send (h_bind=0x586920, p_madw=0x567820,
>    resp_expected=1) at osm_vendor_ibumad.c:889

I found one problem associated with this and just checked in a patch.
I'm not sure whether there is another one behind this or not. Any
reliable way to recreate this ?

-- Hal > #5 0x000000000042ef72 in __osm_vl15_poller (p_ptr=0x552620) at osm_madw.h:933 > #6 0x00002aaaaadc911e in __cl_thread_wrapper (arg=0x0) at cl_thread.c:61 > #7 0x00000036d28060aa in start_thread () from /lib64/tls/libpthread.so.0 > #8 0x00000036d19c53d3 in clone () from /lib64/tls/libc.so.6 > #9 0x0000000000000000 in ?? () > > I was bringing up and down an node when this happened. > > Attached are the last 500 lines from osm.log. > > -tduffy > > ______________________________________________________________________ > > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x1 (SubnGet) > status..................0x0 > hop_ptr.................0x0 > hop_count...............0x1 > trans_id................0x16c6a > attr_id.................0x16 (P_KeyTable) > resv....................0x0 > attr_mod................0x50000 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1] > Return path: [0][0] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > Jun 30 14:59:00 [43806960] -> osm_vendor_send: [ > Jun 30 14:59:00 [43005960] -> PortInfo dump: > port number.............0x8 > node_guid...............0x000617000000000d > port_guid...............0x000617000000000d > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x7 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x0 > link_speed_supported....0x1 > port_state..............DOWN > state_info2.............0x22 > m_key_protect_bits......0x0 > 
lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x10 > vl_cap..................0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x20 > vl_arb_low_cap..........0x20 > mtu_cap.................0x5 > vl_stall_life...........0x8 > vl_enforce..............0x10 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0xFF > Jun 30 14:59:00 [43005960] -> Capabilities Mask: > Jun 30 14:59:00 [43005960] -> __osm_pi_rcv_process_switch_port: [ > Jun 30 14:59:00 [43005960] -> __osm_pi_rcv_process_switch_port: ] > Jun 30 14:59:00 [43005960] -> __osm_pi_rcv_get_pkey_slvl_vla_tables: [ > Jun 30 14:59:00 [43005960] -> osm_physp_has_pkey: [ > Jun 30 14:59:00 [43005960] -> osm_req_get: [ > Jun 30 14:59:00 [43005960] -> osm_mad_pool_get: [ > Jun 30 14:59:00 [43806960] -> __osm_mtl_send_callback: Completed Sending Request MADW: 0x5b0ac0. > Jun 30 14:59:00 [43806960] -> osm_vendor_send: ] > Jun 30 14:59:00 [43005960] -> osm_vendor_get: [ > Jun 30 14:59:00 [44808960] -> osm_vendor_put: [ > Jun 30 14:59:00 [44808960] -> osm_vendor_put: Retiring UMAD 0x591a50. > Jun 30 14:59:00 [44808960] -> osm_vendor_put: ] > Jun 30 14:59:00 [44808960] -> osm_mad_pool_put: ] > Jun 30 14:59:00 [43806960] -> __osm_vl15_poller: 1 on wire, 11 outstanding, 10 unicasts sent, 88641 sent total. > Jun 30 14:59:00 [44808960] -> __osm_sm_mad_ctrl_process_get_resp: Posting Dispatcher message OSM_MSG_MAD_NODE_INFO. > Jun 30 14:59:00 [44808960] -> __osm_sm_mad_ctrl_process_get_resp: ] > Jun 30 14:59:00 [44808960] -> __osm_sm_mad_ctrl_rcv_callback: ] > Jun 30 14:59:00 [43005960] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x567428, size = 256. 
> Jun 30 14:59:00 [44808960] -> osm_mad_pool_get: [ > Jun 30 14:59:00 [44808960] -> osm_vendor_get: [ > Jun 30 14:59:00 [43005960] -> osm_vendor_get: Acquired UMAD 0x592620, size = 256. > Jun 30 14:59:00 [43005960] -> osm_vendor_get: ] > Jun 30 14:59:00 [44808960] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x560268, size = 256. > Jun 30 14:59:00 [44808960] -> osm_vendor_get: Acquired UMAD 0x591a50, size = 256. > Jun 30 14:59:00 [44808960] -> osm_vendor_get: ] > Jun 30 14:59:00 [43005960] -> osm_mad_pool_get: Acquired p_madw = 0x567410, p_mad = 0x592654, size = 256. > Jun 30 14:59:00 [43005960] -> osm_mad_pool_get: ] > Jun 30 14:59:00 [44808960] -> osm_mad_pool_get: Acquired p_madw = 0x560250, p_mad = 0x591a84, size = 256. > Jun 30 14:59:00 [44808960] -> osm_mad_pool_get: ] > Jun 30 14:59:00 [44808960] -> __osm_sm_mad_ctrl_rcv_callback: [ > Jun 30 14:59:00 [44808960] -> __osm_sm_mad_ctrl_rcv_callback: 88641 QP0 MADs received. > Jun 30 14:59:00 [43005960] -> osm_req_get: Getting P_KeyTable (0x16), modifier = 0x80000, TID = 0x16c6d. 
> Jun 30 14:59:00 [43005960] -> osm_vl15_post: [ > Jun 30 14:59:00 [43005960] -> osm_vl15_post: Servicing p_madw = 0x567410 (mad 0x592654 req 1) > Jun 30 14:59:00 [44808960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > status..................0x8000 > hop_ptr.................0x0 > hop_count...............0x1 > trans_id................0x16c6a > attr_id.................0x16 (P_KeyTable) > resv....................0x0 > attr_mod................0x50000 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1] > Return path: [0][7] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > Jun 30 14:59:00 [44808960] -> __osm_sm_mad_ctrl_process_get_resp: [ > Jun 30 14:59:00 [44808960] -> __osm_sm_mad_ctrl_update_wire_stats: [ > Jun 30 14:59:00 [43005960] -> osm_vl15_post: 1 MADs on wire, 12 MADs outstanding. > Jun 30 14:59:00 [43005960] -> osm_vl15_poll: [ > Jun 30 14:59:00 [43005960] -> osm_vl15_poll: Signalling poller thread. > Jun 30 14:59:00 [44808960] -> __osm_sm_mad_ctrl_update_wire_stats: 0 SMPs on the wire, 12 outstanding. > Jun 30 14:59:00 [44808960] -> osm_vl15_poll: [ > Jun 30 14:59:00 [44808960] -> osm_vl15_poll: Signalling poller thread. > Jun 30 14:59:00 [44808960] -> osm_vl15_poll: ] > Jun 30 14:59:00 [44808960] -> __osm_sm_mad_ctrl_update_wire_stats: ] > Jun 30 14:59:00 [44808960] -> osm_mad_pool_put: [ > Jun 30 14:59:00 [44808960] -> osm_mad_pool_put: Releasing p_madw = 0x5b0ac0, p_mad = 0x5930e4. > Jun 30 14:59:00 [44808960] -> osm_vendor_put: [ > Jun 30 14:59:00 [44808960] -> osm_vendor_put: Retiring UMAD 0x5930b0. 
> Jun 30 14:59:00 [44808960] -> osm_vendor_put: ] > Jun 30 14:59:00 [44808960] -> osm_mad_pool_put: ] > Jun 30 14:59:00 [43806960] -> __osm_vl15_poller: Servicing p_madw = 0x567820 (mad 0x591bc4 req 1) > Jun 30 14:59:00 [44808960] -> __osm_sm_mad_ctrl_process_get_resp: Posting Dispatcher message OSM_MSG_MAD_PKEY. > Jun 30 14:59:00 [44808960] -> __osm_sm_mad_ctrl_process_get_resp: ] > Jun 30 14:59:00 [44808960] -> __osm_sm_mad_ctrl_rcv_callback: ] > Jun 30 14:59:00 [43806960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x1 (SubnGet) > status..................0x0 > hop_ptr.................0x0 > hop_count...............0x1 > trans_id................0x16c6b > attr_id.................0x16 (P_KeyTable) > resv....................0x0 > attr_mod................0x60000 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1] > Return path: [0][0] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > Jun 30 14:59:00 [43005960] -> osm_vl15_poll: ] > Jun 30 14:59:00 [43806960] -> osm_vendor_send: [ > Jun 30 14:59:00 [43005960] -> osm_vl15_post: ] > Jun 30 14:59:00 [43005960] -> osm_req_get: ] > Jun 30 14:59:00 [43005960] -> osm_physp_has_pkey: ] > Jun 30 14:59:00 [43005960] -> __osm_pi_rcv_get_pkey_slvl_vla_tables: ] > Jun 30 14:59:00 [43005960] -> osm_pi_rcv_process: ] > Jun 30 14:59:00 [44808960] -> osm_mad_pool_get: [ > Jun 30 14:59:00 [44808960] -> osm_vendor_get: [ > Jun 30 14:59:00 [44808960] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x5b0ad8, size = 256. > Jun 30 14:59:00 [44808960] -> osm_vendor_get: Acquired UMAD 0x5930b0, size = 256. 
> Jun 30 14:59:00 [44808960] -> osm_vendor_get: ] > Jun 30 14:59:00 [44808960] -> osm_mad_pool_get: Acquired p_madw = 0x5b0ac0, p_mad = 0x5930e4, size = 256. > Jun 30 14:59:00 [44808960] -> osm_mad_pool_get: ] > Jun 30 14:59:00 [44808960] -> __osm_sm_mad_ctrl_rcv_callback: [ > Jun 30 14:59:00 [44808960] -> __osm_sm_mad_ctrl_rcv_callback: 88642 QP0 MADs received. > Jun 30 14:59:00 [43806960] -> osm_vendor_send: Send failed -5 (Success). > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_disp_done_callback: [ > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: [ > Jun 30 14:59:00 [43806960] -> __osm_sm_mad_ctrl_send_err_cb: [ > Jun 30 14:59:00 [44808960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > status..................0x8000 > hop_ptr.................0x0 > hop_count...............0x1 > trans_id................0x16c6b > attr_id.................0x16 (P_KeyTable) > resv....................0x0 > attr_mod................0x60000 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1] > Return path: [0][7] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > Jun 30 14:59:00 [44808960] -> __osm_sm_mad_ctrl_process_get_resp: [ > Jun 30 14:59:00 [44808960] -> __osm_sm_mad_ctrl_update_wire_stats: [ > Jun 30 14:59:00 [44808960] -> __osm_sm_mad_ctrl_update_wire_stats: 0 SMPs on the wire, 12 outstanding. > Jun 30 14:59:00 [44808960] -> osm_vl15_poll: [ > Jun 30 14:59:00 [44808960] -> osm_vl15_poll: Signalling poller thread. 
> Jun 30 14:59:00 [44808960] -> osm_vl15_poll: ] > Jun 30 14:59:00 [44808960] -> __osm_sm_mad_ctrl_update_wire_stats: ] > Jun 30 14:59:00 [44808960] -> osm_mad_pool_put: [ > Jun 30 14:59:00 [44808960] -> osm_mad_pool_put: Releasing p_madw = 0x567820, p_mad = 0x591bc4. > Jun 30 14:59:00 [44808960] -> osm_vendor_put: [ > Jun 30 14:59:00 [44808960] -> osm_vendor_put: Retiring UMAD 0x591b90. > Jun 30 14:59:00 [44808960] -> osm_vendor_put: ] > Jun 30 14:59:00 [44808960] -> osm_mad_pool_put: ] > Jun 30 14:59:00 [44808960] -> __osm_sm_mad_ctrl_process_get_resp: Posting Dispatcher message OSM_MSG_MAD_PKEY. > Jun 30 14:59:00 [44808960] -> __osm_sm_mad_ctrl_process_get_resp: ] > Jun 30 14:59:00 [44808960] -> __osm_sm_mad_ctrl_rcv_callback: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: Retiring MAD with TID = 0x16c62. > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: [ > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: Releasing p_madw = 0x5b0370, p_mad = 0x5b1244. > Jun 30 14:59:00 [43005960] -> osm_vendor_put: [ > Jun 30 14:59:00 [43005960] -> osm_vendor_put: Retiring UMAD 0x5b1210. > Jun 30 14:59:00 [43005960] -> osm_vendor_put: ] > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: 11 QP0 MADs outstanding. > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_disp_done_callback: ] > Jun 30 14:59:00 [43005960] -> osm_pkey_rcv_process: [ > Jun 30 14:59:00 [43005960] -> osm_pkey_rcv_process: Got GetResp(PKey) block:0 port_num 0 with GUID = 0x617000000000d for parent node GUID = 0x617000000000d, TID = 0x16c63. 
> Jun 30 14:59:00 [43005960] -> P_Key table dump: > port_guid...........0x000617000000000d > block_num...........0x0 > port_num............0x0 > P_Key Table: 0XFFFF | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | > Jun 30 14:59:00 [43005960] -> osm_pkey_rcv_process: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_disp_done_callback: [ > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: [ > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: Retiring MAD with TID = 0x16c63. > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: [ > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: Releasing p_madw = 0x55df60, p_mad = 0x5b0c74. > Jun 30 14:59:00 [43005960] -> osm_vendor_put: [ > Jun 30 14:59:00 [43005960] -> osm_vendor_put: Retiring UMAD 0x5b0c40. > Jun 30 14:59:00 [43005960] -> osm_vendor_put: ] > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: 10 QP0 MADs outstanding. > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_disp_done_callback: ] > Jun 30 14:59:00 [43005960] -> osm_pkey_rcv_process: [ > Jun 30 14:59:00 [43005960] -> osm_pkey_rcv_process: Got GetResp(PKey) block:0 port_num 1 with GUID = 0x617000000000d for parent node GUID = 0x617000000000d, TID = 0x16c64. 
> Jun 30 14:59:00 [43005960] -> P_Key table dump: > port_guid...........0x000617000000000d > block_num...........0x0 > port_num............0x1 > P_Key Table: 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | > Jun 30 14:59:00 [43005960] -> osm_pkey_rcv_process: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_disp_done_callback: [ > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: [ > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: Retiring MAD with TID = 0x16c64. > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: [ > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: Releasing p_madw = 0x5b0920, p_mad = 0x5916c4. > Jun 30 14:59:00 [43005960] -> osm_vendor_put: [ > Jun 30 14:59:00 [43005960] -> osm_vendor_put: Retiring UMAD 0x591690. > Jun 30 14:59:00 [43005960] -> osm_vendor_put: ] > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: 9 QP0 MADs outstanding. > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_disp_done_callback: ] > Jun 30 14:59:00 [43005960] -> osm_pkey_rcv_process: [ > Jun 30 14:59:00 [43005960] -> osm_pkey_rcv_process: Got GetResp(PKey) block:0 port_num 2 with GUID = 0x617000000000d for parent node GUID = 0x617000000000d, TID = 0x16c65. 
> Jun 30 14:59:00 [43005960] -> P_Key table dump: > port_guid...........0x000617000000000d > block_num...........0x0 > port_num............0x2 > P_Key Table: 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | > Jun 30 14:59:00 [43005960] -> osm_pkey_rcv_process: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_disp_done_callback: [ > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: [ > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: Retiring MAD with TID = 0x16c65. > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: [ > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: Releasing p_madw = 0x55add0, p_mad = 0x591d04. > Jun 30 14:59:00 [43005960] -> osm_vendor_put: [ > Jun 30 14:59:00 [43005960] -> osm_vendor_put: Retiring UMAD 0x591cd0. > Jun 30 14:59:00 [43005960] -> osm_vendor_put: ] > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: 8 QP0 MADs outstanding. 
> Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_disp_done_callback: ] > Jun 30 14:59:00 [43005960] -> osm_ni_rcv_process: [ > Jun 30 14:59:00 [43005960] -> NodeInfo dump: > base_version............0x1 > class_version...........0x1 > node_type...............Channel Adapter > num_ports...............0x2 > sys_guid................0x0002c9000100d050 > node_guid...............0x0002c901097624c0 > port_guid...............0x0002c901097624c1 > partition_cap...........0x40 > device_id...............0x5A44 > revision................0xA1 > port_num................0x1 > vendor_id...............0x2C9 > Jun 30 14:59:00 [43005960] -> __osm_ni_rcv_process_existing: [ > Jun 30 14:59:00 [43005960] -> __osm_ni_rcv_process_existing: Rediscovered Channel Adapter node 0x2c901097624c0 > TID = 0x16c66, discovered 0 times already. > Jun 30 14:59:00 [43005960] -> __osm_ni_rcv_process_existing_ca: [ > Jun 30 14:59:00 [43005960] -> __osm_ni_rcv_process_ca_port: [ > Jun 30 14:59:00 [43005960] -> osm_req_get: [ > Jun 30 14:59:00 [43005960] -> osm_mad_pool_get: [ > Jun 30 14:59:00 [43005960] -> osm_vendor_get: [ > Jun 30 14:59:00 [43005960] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x55ade8, size = 256. > Jun 30 14:59:00 [43005960] -> osm_vendor_get: Acquired UMAD 0x5b1210, size = 256. > Jun 30 14:59:00 [43005960] -> osm_vendor_get: ] > Jun 30 14:59:00 [43005960] -> osm_mad_pool_get: Acquired p_madw = 0x55add0, p_mad = 0x5b1244, size = 256. > Jun 30 14:59:00 [43005960] -> osm_mad_pool_get: ] > Jun 30 14:59:00 [43005960] -> osm_req_get: Getting PortInfo (0x15), modifier = 0x1, TID = 0x16c6e. > Jun 30 14:59:00 [43005960] -> osm_vl15_post: [ > Jun 30 14:59:00 [43005960] -> osm_vl15_post: Servicing p_madw = 0x55add0 (mad 0x5b1244 req 1) > Jun 30 14:59:00 [43005960] -> osm_vl15_post: 0 MADs on wire, 9 MADs outstanding. 
> Jun 30 14:59:00 [43005960] -> osm_vl15_poll: [ > Jun 30 14:59:00 [43005960] -> osm_vl15_poll: Signalling poller thread. > Jun 30 14:59:00 [43005960] -> osm_vl15_poll: ] > Jun 30 14:59:00 [43005960] -> osm_vl15_post: ] > Jun 30 14:59:00 [43005960] -> osm_req_get: ] > Jun 30 14:59:00 [43005960] -> __osm_ni_rcv_process_ca_port: ] > Jun 30 14:59:00 [43005960] -> __osm_ni_rcv_process_existing_ca: ] > Jun 30 14:59:00 [43005960] -> __osm_ni_rcv_set_links: [ > Jun 30 14:59:00 [43005960] -> __osm_ni_rcv_set_links: Link already exists. > Jun 30 14:59:00 [43005960] -> __osm_ni_rcv_set_links: ] > Jun 30 14:59:00 [43005960] -> __osm_ni_rcv_process_existing: ] > Jun 30 14:59:00 [43005960] -> osm_ni_rcv_process: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_disp_done_callback: [ > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: [ > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: Retiring MAD with TID = 0x16c66. > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: [ > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: Releasing p_madw = 0x5b09f0, p_mad = 0x592fa4. > Jun 30 14:59:00 [43005960] -> osm_vendor_put: [ > Jun 30 14:59:00 [43005960] -> osm_vendor_put: Retiring UMAD 0x592f70. > Jun 30 14:59:00 [43005960] -> osm_vendor_put: ] > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: 8 QP0 MADs outstanding. > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_disp_done_callback: ] > Jun 30 14:59:00 [43005960] -> osm_pkey_rcv_process: [ > Jun 30 14:59:00 [43005960] -> osm_pkey_rcv_process: Got GetResp(PKey) block:0 port_num 3 with GUID = 0x617000000000d for parent node GUID = 0x617000000000d, TID = 0x16c67. 
> Jun 30 14:59:00 [43005960] -> P_Key table dump: > port_guid...........0x000617000000000d > block_num...........0x0 > port_num............0x3 > P_Key Table: 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | > Jun 30 14:59:00 [43005960] -> osm_pkey_rcv_process: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_disp_done_callback: [ > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: [ > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: Retiring MAD with TID = 0x16c67. > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: [ > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: Releasing p_madw = 0x561f90, p_mad = 0x592e64. > Jun 30 14:59:00 [43005960] -> osm_vendor_put: [ > Jun 30 14:59:00 [43005960] -> osm_vendor_put: Retiring UMAD 0x592e30. > Jun 30 14:59:00 [43005960] -> osm_vendor_put: ] > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: 7 QP0 MADs outstanding. > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_disp_done_callback: ] > Jun 30 14:59:00 [43005960] -> osm_pkey_rcv_process: [ > Jun 30 14:59:00 [43005960] -> osm_pkey_rcv_process: Got GetResp(PKey) block:0 port_num 4 with GUID = 0x617000000000d for parent node GUID = 0x617000000000d, TID = 0x16c68. 
> Jun 30 14:59:00 [43005960] -> P_Key table dump: > port_guid...........0x000617000000000d > block_num...........0x0 > port_num............0x4 > P_Key Table: 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | > Jun 30 14:59:00 [43005960] -> osm_pkey_rcv_process: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_disp_done_callback: [ > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: [ > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: Retiring MAD with TID = 0x16c68. > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: [ > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: Releasing p_madw = 0x5b0850, p_mad = 0x591804. > Jun 30 14:59:00 [43005960] -> osm_vendor_put: [ > Jun 30 14:59:00 [43005960] -> osm_vendor_put: Retiring UMAD 0x5917d0. > Jun 30 14:59:00 [43005960] -> osm_vendor_put: ] > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: 6 QP0 MADs outstanding. 
> Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_disp_done_callback: ] > Jun 30 14:59:00 [43005960] -> osm_ni_rcv_process: [ > Jun 30 14:59:00 [43005960] -> NodeInfo dump: > base_version............0x1 > class_version...........0x1 > node_type...............Channel Adapter > num_ports...............0x2 > sys_guid................0x0002c90109765633 > node_guid...............0x0002c90109765630 > port_guid...............0x0002c90109765631 > partition_cap...........0x20 > device_id...............0x5A44 > revision................0xA1 > port_num................0x1 > vendor_id...............0x2C9 > Jun 30 14:59:00 [43005960] -> __osm_ni_rcv_process_existing: [ > Jun 30 14:59:00 [43005960] -> __osm_ni_rcv_process_existing: Rediscovered Channel Adapter node 0x2c90109765630 > TID = 0x16c69, discovered 0 times already. > Jun 30 14:59:00 [43005960] -> __osm_ni_rcv_process_existing_ca: [ > Jun 30 14:59:00 [43005960] -> __osm_ni_rcv_process_ca_port: [ > Jun 30 14:59:00 [43005960] -> osm_req_get: [ > Jun 30 14:59:00 [43005960] -> osm_mad_pool_get: [ > Jun 30 14:59:00 [43005960] -> osm_vendor_get: [ > Jun 30 14:59:00 [43005960] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x5b0868, size = 256. > Jun 30 14:59:00 [43005960] -> osm_vendor_get: Acquired UMAD 0x5b0c40, size = 256. > Jun 30 14:59:00 [43005960] -> osm_vendor_get: ] > Jun 30 14:59:00 [43005960] -> osm_mad_pool_get: Acquired p_madw = 0x5b0850, p_mad = 0x5b0c74, size = 256. > Jun 30 14:59:00 [43005960] -> osm_mad_pool_get: ] > Jun 30 14:59:00 [43005960] -> osm_req_get: Getting PortInfo (0x15), modifier = 0x1, TID = 0x16c6f. > Jun 30 14:59:00 [43005960] -> osm_vl15_post: [ > Jun 30 14:59:00 [43005960] -> osm_vl15_post: Servicing p_madw = 0x5b0850 (mad 0x5b0c74 req 1) > Jun 30 14:59:00 [43005960] -> osm_vl15_post: 0 MADs on wire, 7 MADs outstanding. 
> Jun 30 14:59:00 [43005960] -> osm_vl15_poll: [ > Jun 30 14:59:00 [43005960] -> osm_vl15_poll: Signalling poller thread. > Jun 30 14:59:00 [43005960] -> osm_vl15_poll: ] > Jun 30 14:59:00 [43005960] -> osm_vl15_post: ] > Jun 30 14:59:00 [43005960] -> osm_req_get: ] > Jun 30 14:59:00 [43005960] -> __osm_ni_rcv_process_ca_port: ] > Jun 30 14:59:00 [43005960] -> __osm_ni_rcv_process_existing_ca: ] > Jun 30 14:59:00 [43005960] -> __osm_ni_rcv_set_links: [ > Jun 30 14:59:00 [43005960] -> __osm_ni_rcv_set_links: Link already exists. > Jun 30 14:59:00 [43005960] -> __osm_ni_rcv_set_links: ] > Jun 30 14:59:00 [43005960] -> __osm_ni_rcv_process_existing: ] > Jun 30 14:59:00 [43005960] -> osm_ni_rcv_process: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_disp_done_callback: [ > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: [ > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: Retiring MAD with TID = 0x16c69. > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: [ > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: Releasing p_madw = 0x5b02a0, p_mad = 0x592144. > Jun 30 14:59:00 [43005960] -> osm_vendor_put: [ > Jun 30 14:59:00 [43005960] -> osm_vendor_put: Retiring UMAD 0x592110. > Jun 30 14:59:00 [43005960] -> osm_vendor_put: ] > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: 6 QP0 MADs outstanding. > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_disp_done_callback: ] > Jun 30 14:59:00 [43005960] -> osm_pkey_rcv_process: [ > Jun 30 14:59:00 [43005960] -> osm_pkey_rcv_process: Got GetResp(PKey) block:0 port_num 5 with GUID = 0x617000000000d for parent node GUID = 0x617000000000d, TID = 0x16c6a. 
> Jun 30 14:59:00 [43005960] -> P_Key table dump: > port_guid...........0x000617000000000d > block_num...........0x0 > port_num............0x5 > P_Key Table: 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | > Jun 30 14:59:00 [43005960] -> osm_pkey_rcv_process: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_disp_done_callback: [ > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: [ > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: Retiring MAD with TID = 0x16c6a. > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: [ > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: Releasing p_madw = 0x560250, p_mad = 0x592514. > Jun 30 14:59:00 [43005960] -> osm_vendor_put: [ > Jun 30 14:59:00 [43005960] -> osm_vendor_put: Retiring UMAD 0x5924e0. > Jun 30 14:59:00 [43005960] -> osm_vendor_put: ] > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: 5 QP0 MADs outstanding. > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_disp_done_callback: ] > Jun 30 14:59:00 [43005960] -> osm_pkey_rcv_process: [ > Jun 30 14:59:00 [43005960] -> osm_pkey_rcv_process: Got GetResp(PKey) block:0 port_num 6 with GUID = 0x617000000000d for parent node GUID = 0x617000000000d, TID = 0x16c6b. 
> Jun 30 14:59:00 [43005960] -> P_Key table dump: > port_guid...........0x000617000000000d > block_num...........0x0 > port_num............0x6 > P_Key Table: 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | 0X0 | > Jun 30 14:59:00 [43005960] -> osm_pkey_rcv_process: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_disp_done_callback: [ > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: [ > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: Retiring MAD with TID = 0x16c6b. > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: [ > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: Releasing p_madw = 0x5b0ac0, p_mad = 0x591a84. > Jun 30 14:59:00 [43005960] -> osm_vendor_put: [ > Jun 30 14:59:00 [43005960] -> osm_vendor_put: Retiring UMAD 0x591a50. > Jun 30 14:59:00 [43005960] -> osm_vendor_put: ] > Jun 30 14:59:00 [43005960] -> osm_mad_pool_put: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: 4 QP0 MADs outstanding. > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_retire_trans_mad: ] > Jun 30 14:59:00 [43005960] -> __osm_sm_mad_ctrl_disp_done_callback: ] > Jun 30 14:59:00 [43806960] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_SUCCESS). 
> > ______________________________________________________________________ > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From halr at voltaire.com Fri Jul 1 11:31:26 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Jul 2005 14:31:26 -0400 Subject: [openib-general] [PATCH v3] user_mad: Add receive side RMPP support Message-ID: <1120242686.4371.3804.camel@hal.voltaire.com> user_mad: Add receive side RMPP support This change involves another ABI_VERSION change. [NOTE: Please do not commit. This needs to be coordinated with a change to userspace.] Signed-off-by: Hal Rosenstock Index: infiniband/include/ib_user_mad.h =================================================================== --- infiniband/include/ib_user_mad.h (revision 2757) +++ infiniband/include/ib_user_mad.h (working copy) @@ -43,7 +43,7 @@ * Increment this value if any changes that break userspace ABI * compatibility are made. 
*/ -#define IB_USER_MAD_ABI_VERSION 4 +#define IB_USER_MAD_ABI_VERSION 5 /* * Make sure that all structs defined in this file remain laid out so @@ -78,6 +78,7 @@ __u32 status; __u32 timeout_ms; __u32 retries; + __u32 length; __u32 qpn; __u32 qkey; __u16 lid; Index: infiniband/core/user_mad.c =================================================================== --- infiniband/core/user_mad.c (revision 2760) +++ infiniband/core/user_mad.c (working copy) @@ -176,7 +176,7 @@ if (mad_recv_wc->wc->status != IB_WC_SUCCESS) goto out; - length = 256; /* until RMPP is supported */ + length = mad_recv_wc->mad_len; packet = kmalloc(sizeof *packet + length, GFP_KERNEL); if (!packet) goto out; @@ -184,8 +184,10 @@ memset(packet, 0, sizeof *packet + length); packet->length = length; - memcpy(packet->mad.data, mad_recv_wc->recv_buf.mad, length); + ib_coalesce_recv_mad(mad_recv_wc, packet->mad.data); + packet->mad.hdr.status = 0; + packet->mad.hdr.length = length + sizeof (struct ib_user_mad); packet->mad.hdr.qpn = cpu_to_be32(mad_recv_wc->wc->src_qp); packet->mad.hdr.lid = cpu_to_be16(mad_recv_wc->wc->slid); packet->mad.hdr.sl = mad_recv_wc->wc->sl; @@ -214,7 +216,7 @@ struct ib_umad_packet *packet; ssize_t ret; - if (count < sizeof (struct ib_user_mad) + 256) /* until RMPP supported */ + if (count < sizeof (struct ib_user_mad) + sizeof (struct ib_mad)) return -EINVAL; spin_lock_irq(&file->recv_lock); @@ -237,9 +239,14 @@ spin_unlock_irq(&file->recv_lock); - if (count < packet->length + sizeof (struct ib_user_mad)) - ret = -EINVAL; - else if (copy_to_user(buf, &packet->mad, + if (count < packet->length + sizeof (struct ib_user_mad)) { + /* Return length needed (and first RMPP segment) if too small */ + if (copy_to_user(buf, &packet->mad, + sizeof (struct ib_user_mad) + sizeof (struct ib_mad))) + ret = -EFAULT; + else + ret = -ENOSPC; + } else if (copy_to_user(buf, &packet->mad, packet->length + sizeof (struct ib_user_mad))) ret = -EFAULT; else Index: docs/user_mad.txt 
=================================================================== --- docs/user_mad.txt (revision 2757) +++ docs/user_mad.txt (working copy) @@ -28,16 +28,38 @@ Receiving MADs - MADs are received using read(). The receive side does not currently - support RMPP so the buffer passed to read() must be at least one + MADs are received using read(). The receive side now supports + RMPP. The buffer passed to read() must be at least one struct ib_user_mad + 256 bytes. For example: + If the buffer passed is not large enough to hold the received + MAD (RMPP), the errno is set to ENOSPC and the length of the + buffer needed is set in mad.length. + + Example for normal MAD (non RMPP) reads: struct ib_user_mad *mad; mad = malloc(sizeof *mad + 256); ret = read(fd, mad, sizeof *mad + 256); - if (ret != sizeof mad + 256) + if (ret != sizeof mad + 256) { perror("read"); + free(mad); + } + Example for RMPP reads: + struct ib_user_mad *mad; + mad = malloc(sizeof *mad + 256); + ret = read(fd, mad, sizeof *mad + 256); + if (ret == -ENOSPC)) { + length = mad.length; + free(mad); + mad = malloc(sizeof *mad + length); + ret = read(fd, mad, sizeof *mad + length); + } + if (ret < 0) { + perror("read"); + free(mad); + } + In addition to the actual MAD contents, the other struct ib_user_mad fields will be filled in with information on the received MAD. For example, the remote LID will be in mad.lid. From mshefty at ichips.intel.com Fri Jul 1 14:41:11 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 01 Jul 2005 14:41:11 -0700 Subject: [openib-general] sending RMPP non-data messages Message-ID: <42C5B877.5040901@ichips.intel.com> Looking at the kernel RMPP implementation, I think that we want to ensure that an application doesn't send a non-DATA RMPP MAD. The RMPP code would interpret such a MAD as being internally generated when it wasn't. 
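A minimal sketch of the kind of check described above, assuming the send path can look at the RMPP header before handing the MAD to the RMPP code. The struct and the names rmpp_hdr_sketch, RMPP_* and check_rmpp_send() are hypothetical and for illustration only; they are not the kernel's definitions. The type values (DATA = 1, ACK = 2, STOP = 3, ABORT = 4) and the Active flag in the low bits of the rtime/flags byte follow the RMPP header layout.

```c
#include <stdint.h>

/* Hypothetical constants mirroring the RMPP header encoding. */
#define RMPP_TYPE_DATA   1
#define RMPP_TYPE_ACK    2
#define RMPP_TYPE_STOP   3
#define RMPP_TYPE_ABORT  4
#define RMPP_FLAG_ACTIVE 0x01   /* low 3 bits of rmpp_rtime_flags are flags */

/* Hypothetical, simplified view of the first RMPP header bytes. */
struct rmpp_hdr_sketch {
	uint8_t rmpp_version;
	uint8_t rmpp_type;
	uint8_t rmpp_rtime_flags;
	uint8_t rmpp_status;
};

/*
 * Return 0 if an application-supplied MAD may be sent, -1 if it should
 * be rejected because it is an active RMPP MAD of a type other than DATA
 * (ACK, STOP and ABORT are assumed to be generated internally only).
 */
int check_rmpp_send(const struct rmpp_hdr_sketch *hdr)
{
	if (!(hdr->rmpp_rtime_flags & RMPP_FLAG_ACTIVE))
		return 0;               /* not an RMPP MAD, nothing to check */
	if (hdr->rmpp_type != RMPP_TYPE_DATA)
		return -1;              /* non-DATA RMPP MAD from an application */
	return 0;
}
```

Which error code a real implementation would return to the caller, and where exactly the check would live, are left open here.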
 - Sean From jlentini at netapp.com Fri Jul 1 14:46:13 2005 From: jlentini at netapp.com (James Lentini) Date: Fri, 1 Jul 2005 17:46:13 -0400 (EDT) Subject: [openib-general] comments on DAT registry in OpenIB In-Reply-To: <20050630104845.GA11393@lst.de> References: <20050630104845.GA11393@lst.de> Message-ID: On Thu, 30 Jun 2005, Christoph Hellwig wrote: > Could you please stop that committee crap? > > James, what do you think about doing an s/DAT/RDMA/ and s/dat/rdma/ > on the code so we can stop this endless mess? If we explain the changes we're making, there should be no need to do that. > In the end it won't look like dat anyway, and the sooner we make > that absolutely clear the less idiocy like this is going to happen. From jlentini at netapp.com Fri Jul 1 14:50:13 2005 From: jlentini at netapp.com (James Lentini) Date: Fri, 1 Jul 2005 17:50:13 -0400 (EDT) Subject: [openib-general] comments on DAT registry in OpenIB In-Reply-To: <52slyzakdv.fsf@topspin.com> References: <1AC79F16F5C5284499BB9591B33D6F0004D1F645@orsmsx408> <52slyzakdv.fsf@topspin.com> Message-ID: On Thu, 30 Jun 2005, Roland Dreier wrote: > Robert> I think that your suggestion to s/DAT/RDMA makes sense, > Robert> since this code is quickly becoming "the" RDMA transport > Robert> independent interface for Linux, rather than trying to > Robert> RNIC-PI unionize the IB core layer to make it support both > Robert> IB and iWarp. > > I disagree. It doesn't make sense to me for us to add an abstraction > layer on top of another abstraction layer -- let's just fix the first > abstraction layer. > > If we follow the approach of changing the name of DAT to RDMA and then > putting it in the kernel, we end up with a stack that looks like: > > upper layer protocol <-> RDMA midlayer <-> IB RDMA provider <-> IB midlayer <-> IB low-level driver I'd note that the RDMA midlayer above is very thin (plus or minus 1000 lines for headers and source files).
 > > Let's just evolve the IB midlayer so the picture can be more sensible: > > upper layer protocol <-> RDMA midlayer <-> IB low-level driver > > - R. From halr at voltaire.com Fri Jul 1 15:00:59 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Jul 2005 18:00:59 -0400 Subject: [openib-general] Re: sending RMPP non-data messages In-Reply-To: <42C5B877.5040901@ichips.intel.com> References: <42C5B877.5040901@ichips.intel.com> Message-ID: <1120255259.4389.7.camel@hal.voltaire.com> On Fri, 2005-07-01 at 17:41, Sean Hefty wrote: > Looking at the kernel RMPP implementation, I think that we want to ensure > that an application doesn't send a non-DATA RMPP MAD. The RMPP code would > interpret such a MAD as being internally generated when it wasn't. Right; sounds like a good safety check to me. There should be no need for an application to generate anything other than DATA. ACKs, etc. are all hidden and handled rather nicely by your implementation. If there were some reason to terminate an in progress RMPP send, would that be handled by a cancel operation ? -- Hal From rolandd at cisco.com Fri Jul 1 16:36:18 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 01 Jul 2005 16:36:18 -0700 Subject: [openib-general] Service Record support in SA query In-Reply-To: <1120237130.4371.3616.camel@hal.voltaire.com> (Hal Rosenstock's message of "01 Jul 2005 12:58:50 -0400") References: <1120237130.4371.3616.camel@hal.voltaire.com> Message-ID: <52wtoayu8d.fsf@topspin.com> Hal> Hi Roland, Any more comments on the Service Record support in Hal> SA query patch or is this OK to commit ? I haven't really had a chance to read it. I'll get to it by Tuesday. - R.
From halr at voltaire.com Fri Jul 1 16:42:14 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Jul 2005 19:42:14 -0400 Subject: [openib-general] [PATCH] [udapl dtest]: Fix broken statement Message-ID: <1120261202.4389.32.camel@hal.voltaire.com> [udapl dtest]: Fix broken statement Signed-off-by: Hal Rosenstock Index: dtest.c =================================================================== --- dtest.c (revision 2769) +++ dtest.c (working copy) @@ -298,8 +298,7 @@ ep_attr.qos = 0; ep_attr.recv_completion_flags = 0; ep_attr.max_recv_dtos = MSG_BUF_COUNT + (burst*3); - ep_attr.max_request_dtos = MSG_BUF_COUNT + (burst*3) - MAX_RDMA_RD; + ep_attr.max_request_dtos = MSG_BUF_COUNT + (burst*3) + MAX_RDMA_RD; ep_attr.max_recv_iov = 1; ep_attr.max_request_iov = 1; ep_attr.max_rdma_read_in = MAX_RDMA_RD; From libor at topspin.com Fri Jul 1 17:25:32 2005 From: libor at topspin.com (Libor Michalek) Date: Fri, 1 Jul 2005 17:25:32 -0700 Subject: [openib-general] Re: uCM create connection ID In-Reply-To: <42C4A3F2.2020000@ichips.intel.com>; from ardavis@ichips.intel.com on Thu, Jun 30, 2005 at 07:01:22PM -0700 References: <42C1BA15.7060205@ichips.intel.com> <20050629111038.G26240@topspin.com> <42C2F991.60005@ichips.intel.com> <20050629181054.J26240@topspin.com> <42C4A3F2.2020000@ichips.intel.com> Message-ID: <20050701172532.A11410@topspin.com> On Thu, Jun 30, 2005 at 07:01:22PM -0700, Arlin Davis wrote: > Libor Michalek wrote: > > > The listen id is in the req rcvd event. (event->param.req_rcvd.listen_id) > >Do you mean that it is not being set correctly? > > > > > Ok, I didn't look deep enough. It is set correctly and the polling seems > to be working. > > The uDAPL code is now connecting properly but I am having difficulty > setting the QP states properly without the ib_cm_init_qp_attr() call. > Any chance of providing this call in uCM? 
 To recreate that call, I was going to expand the ib_cm_attr_id() call that's currently there, to retrieve the available connection information, and then provide an ib_cm_init_qp_attr()-like wrapper around it. -Libor From halr at voltaire.com Sat Jul 2 04:20:39 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Jul 2005 07:20:39 -0400 Subject: [openib-general] mstflint doesn't build on x86_64 Message-ID: <1120303238.4389.754.camel@hal.voltaire.com> Hi, mstflint doesn't build on x86_64. Here is the version of g++: g++ -v Reading specs from /usr/lib/gcc/x86_64-redhat-linux/3.4.2/specs Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --disable-checking --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,objc,java,f77 --enable-java-awt=gtk --host=x86_64-redhat-linux Thread model: posix gcc version 3.4.2 20041017 (Red Hat 3.4.2-6.fc3) Thanks. -- Hal g++ -O2 -g -I.
-fno-exceptions -Wall flint.cpp -o mstflint flint.cpp:598: error: expected `0' before "" flint.cpp:598: error: invalid initializer for virtual method `virtual bool Flash::read(u_int32_t, u_int32_t*)' flint.cpp:598: error: expected `;' before "" flint.cpp:615: error: expected `0' before "" flint.cpp:615: error: invalid initializer for virtual method `virtual bool Flash::wait_ready(const char*)' flint.cpp:615: error: expected `;' before "" flint.cpp:673: error: expected `0' before "" flint.cpp:673: error: invalid initializer for virtual method `virtual bool Flash::init_gpios()' flint.cpp:673: error: expected `;' before "" flint.cpp:675: error: expected `0' before "" flint.cpp:675: error: invalid initializer for virtual method `virtual bool Flash::get_cmd_set()' flint.cpp:675: error: expected `;' before "" flint.cpp:680: error: expected `0' before "" flint.cpp:680: error: invalid initializer for virtual method `virtual bool Flash::write_internal(u_int32_t, u_int8_t)' flint.cpp:680: error: expected `;' before "" flint.cpp:695: error: expected `0' before "" flint.cpp:695: error: invalid initializer for virtual method `virtual bool Flash::CmdSet::write(u_int32_t, void*, int, bool, bool)' flint.cpp:695: error: expected `;' before "" flint.cpp:698: error: expected `0' before "" flint.cpp:698: error: invalid initializer for virtual method `virtual bool Flash::CmdSet::erase_sector(u_int32_t)' flint.cpp:698: error: expected `;' before "" flint.cpp:700: error: expected `0' before "" flint.cpp:700: error: invalid initializer for virtual method `virtual bool Flash::CmdSet::reset()' flint.cpp:700: error: expected `;' before "" flint.cpp:766: error: expected `0' before "" flint.cpp:766: error: invalid initializer for virtual method `virtual bool Flash::set_bank_int(u_int32_t)' flint.cpp:766: error: expected `;' before "" flint.cpp: In member function `virtual bool Flash::erase_sector(u_int32_t)': flint.cpp:620: error: 'class Flash::CmdSet' has no member named 'erase_sector' 
flint.cpp: In member function `virtual bool Flash::open(const char*, bool)': flint.cpp:1264: error: `init_gpios' undeclared (first use this function) flint.cpp:1264: error: (Each undeclared identifier is reported only once for each function it appears in.) flint.cpp:1268: error: `get_cmd_set' undeclared (first use this function) flint.cpp:1272: error: 'class Flash::CmdSet' has no member named 'reset' flint.cpp: In member function `virtual bool Flash::read(u_int32_t, void*, int, bool)': flint.cpp:1344: error: no matching function for call to `Flash::read(u_int32_t&, u_int32_t*)' flint.cpp:1324: note: candidates are: virtual bool Flash::read(u_int32_t, void*, int, bool) flint.cpp: In member function `virtual bool Flash::write(u_int32_t, void*, int, bool, bool)': flint.cpp:1388: error: 'class Flash::CmdSet' has no member named 'write' flint.cpp: In member function `virtual bool Flash::write(u_int32_t, u_int32_t)': flint.cpp:1413: error: no matching function for call to `Flash::read(u_int32_t&, u_int32_t*)' flint.cpp:1324: note: candidates are: virtual bool Flash::read(u_int32_t, void*, int, bool) flint.cpp: In member function `bool Flash::set_bank(u_int32_t)': flint.cpp:1432: error: `set_bank_int' undeclared (first use this function) flint.cpp: In function `bool repair(Flash&, int, int, bool)': flint.cpp:3146: error: no matching function for call to `Flash::read(unsigned int, u_int32_t*)' flint.cpp:1324: note: candidates are: virtual bool Flash::read(u_int32_t, void*, int, bool) flint.cpp:3154: error: no matching function for call to `Flash::read(unsigned int, u_int32_t*)' flint.cpp:1324: note: candidates are: virtual bool Flash::read(u_int32_t, void*, int, bool) flint.cpp: In function `bool FailSafe_burn_internal(Flash&, void*, int, bool)': flint.cpp:3321: error: no matching function for call to `Flash::read(unsigned int, u_int32_t*)' flint.cpp:1324: note: candidates are: virtual bool Flash::read(u_int32_t, void*, int, bool) flint.cpp: In function `bool 
FailSafe_burn(Flash&, void*, int, bool, bool)': flint.cpp:3378: error: no matching function for call to `Flash::read(int, u_int32_t*)' flint.cpp:1324: note: candidates are: virtual bool Flash::read(u_int32_t, void*, int, bool) flint.cpp:3440: error: no matching function for call to `Flash::read(unsigned int, u_int32_t*)' flint.cpp:1324: note: candidates are: virtual bool Flash::read(u_int32_t, void*, int, bool) flint.cpp:3541: error: no matching function for call to `Flash::read(unsigned int, u_int32_t*)' flint.cpp:1324: note: candidates are: virtual bool Flash::read(u_int32_t, void*, int, bool) flint.cpp: In function `void TerminationHandler(int)': flint.cpp:4686: error: 'class Flash' has no member named 'wait_ready' flint.cpp: In function `int main(int, char**)': flint.cpp:5153: error: no matching function for call to `Flash::read(u_int32_t&, u_int32_t*)' flint.cpp:1324: note: candidates are: virtual bool Flash::read(u_int32_t, void*, int, bool) make: *** [mstflint] Error 1 From johannes at erdfelt.com Sat Jul 2 09:37:12 2005 From: johannes at erdfelt.com (Johannes Erdfelt) Date: Sat, 2 Jul 2005 09:37:12 -0700 Subject: [openib-general] mstflint doesn't build on x86_64 In-Reply-To: <1120303238.4389.754.camel@hal.voltaire.com> References: <1120303238.4389.754.camel@hal.voltaire.com> Message-ID: <20050702163712.GA1388@sventech.com> On Sat, Jul 02, 2005, Hal Rosenstock wrote: > mstflint doesn't build on x86_64. Here is the version of g++: > g++ -v > Reading specs from /usr/lib/gcc/x86_64-redhat-linux/3.4.2/specs > Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --disable-checking --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,objc,java,f77 --enable-java-awt=gtk --host=x86_64-redhat-linux > Thread model: posix > gcc version 3.4.2 20041017 (Red Hat 3.4.2-6.fc3) It's not an x86-64 problem, it's a C++ problem. 
mstflint uses NULL where it should be using 0 for the function prototypes
in the class definitions. Older gcc versions were a bit more lenient, but
it has tightened up to meet the C++ specs recently.

I just recently ran into this problem and I have a patch, but it's at the
office. If someone doesn't get around to it before Tuesday, I'll send it
in.

JE

From eitan at mellanox.co.il Sat Jul 2 12:38:51 2005
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Sat, 2 Jul 2005 22:38:51 +0300
Subject: [openib-general] IBDM - IB DataModel build issues fixed
Message-ID: <506C3D7B14CDD411A52C00025558DED607C30401@mtlex01.yok.mtl.com>

Hi Hal,

(IBDM is a flexible data model for describing arbitrary IB systems and
then instantiating them in a topology. Then based on this topology an
in-memory data model is built and various checking, simulation and
analysis procedures are provided.)

Could not find your last mail with the issue.
But I was able to update the IBDM code at:
https://openib.org/svn/gen2/utils/src/linux-user/ibdm

I was able to check out and build it with no issues,
but only after installing:
autoconf 2.59
automake 1.9.3

EZ

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

From mst at mellanox.co.il Sat Jul 2 14:23:28 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 3 Jul 2005 00:23:28 +0300
Subject: [openib-general] Re: mstflint doesn't build on x86_64
In-Reply-To: <1120303238.4389.754.camel@hal.voltaire.com>
References: <1120303238.4389.754.camel@hal.voltaire.com>
Message-ID: <20050702212328.GA21801@mellanox.co.il>

Quoting r. Hal Rosenstock :
> Subject: mstflint doesn't build on x86_64
>
> Hi,
>
> mstflint doesn't build on x86_64. Here is the version of g++:
> g++ -v
> Reading specs from /usr/lib/gcc/x86_64-redhat-linux/3.4.2/specs
> Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --disable-checking --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,objc,java,f77 --enable-java-awt=gtk --host=x86_64-redhat-linux
> Thread model: posix
> gcc version 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)
>
> Thanks.
>
> -- Hal

Probably more to do with gcc version than the architecture.
Does the following patch help?

Index: flint.cpp
===================================================================
--- flint.cpp (revision 2744)
+++ flint.cpp (working copy)
@@ -580,7 +580,7 @@ class Flash : public FBase {
 public:
     Flash(u_int32_t log2_bank_size) :
     _mf(0),
-    _cmd_set(NULL),
+    _cmd_set(0),
     _curr_bank(0xffffffff),
     _log2_bank_size(log2_bank_size)
     {}
@@ -595,7 +595,7 @@ public:
     virtual void close ();

     virtual bool read (u_int32_t addr,
-                       u_int32_t *data) = NULL;
+                       u_int32_t *data) = 0;

     virtual bool read (u_int32_t addr,
                        void* data,
@@ -612,7 +612,7 @@ public:
     get_size () {return _cfi_data.device_size ?
                  _cfi_data.device_size : (u_int32_t)MAX_FLASH;}

-    virtual bool wait_ready (const char* msg = NULL) = NULL;
+    virtual bool wait_ready (const char* msg = 0) = 0;

     // Write and Erase functions are performed by the Command Set

@@ -670,14 +670,14 @@ protected:
     virtual bool lock (bool retry=true);
     virtual bool unlock ();

-    virtual bool init_gpios () = NULL;
+    virtual bool init_gpios () = 0;

-    virtual bool get_cmd_set () = NULL;
+    virtual bool get_cmd_set () = 0;

     bool set_bank (u_int32_t addr);

     virtual bool write_internal(u_int32_t addr,
-                                u_int8_t data) = NULL;
+                                u_int8_t data) = 0;

     bool write_internal(u_int32_t addr,
                         u_int8_t* data,
@@ -692,12 +692,12 @@ protected:
                         void* data,
                         int cnt,
                         bool noerase = false,
-                        bool noverify = false) = NULL;
+                        bool noverify = false) = 0;

-    //virtual bool unlock_bypass (bool unlock) = NULL;
-    virtual bool erase_sector (u_int32_t addr) = NULL;
+    //virtual bool unlock_bypass (bool unlock) = 0;
+    virtual bool erase_sector (u_int32_t addr) = 0;

-    virtual bool reset () = NULL;
+    virtual bool reset () = 0;

 protected:

@@ -763,7 +763,7 @@ protected:
     bool print_cfi_info ( const cfi_query *q );

-    virtual bool set_bank_int (u_int32_t bank) = NULL;
+    virtual bool set_bank_int (u_int32_t bank) = 0;
     u_int32_t bank_mask () {return((1 << _log2_bank_size) -1 );}

     mfile *_mf;
@@ -829,7 +829,7 @@ public:
                 int len,
                 bool verbose=false) {return Flash::read(addr, data, len, verbose);}

-    virtual bool wait_ready (const char* msg = NULL);
+    virtual bool wait_ready (const char* msg = 0);

     bool unlock_bypass (bool unlock);

@@ -4100,7 +4100,7 @@ bool patchVSD(FImage& f, char *vsd1, cha

     strncpy(&vsd[0], vsd1, VSD_OFFS);

-    if (new_psid == NULL) {
+    if (new_psid == 0) {
         // New psid is not explicitly given - take it from image
         memcpy(image_psid, (char*)ps->psid, sizeof(ps->psid));

@@ -4383,13 +4383,13 @@ Flash* get_serial_flash(mfile* mf) {
                flash_type_str[flash_type]);
     }

-    return NULL;
+    return 0;
 }

 Flash* get_flash(const char* device) {
-    Flash* f = NULL;
+    Flash* f = 0;

     //
     // Check device ID. Allocate flash accordingly
@@ -4400,7 +4400,7 @@ Flash* get_flash(const char* device) {
     mfile* mf = mopen(device);
     if (!mf) {
         printf("*** ERROR *** Can't open %s: %s\n", device, strerror(errno));
-        return NULL;
+        return 0;
     }

     if (mread4(mf, 0xf0014, &dev_id) != 4) return false;
@@ -4658,7 +4658,7 @@ void usage(const char *sname, bool full
 // Signal handlers
 //

-Flash* g_flash = NULL;
+Flash* g_flash = 0;

 int g_signals_for_termination[] = {
     SIGINT,
@@ -4679,7 +4679,7 @@ void TerminationHandler (int signum)

     signal (signum, SIG_DFL);

-    if (g_flash != NULL) {
+    if (g_flash != 0) {
         report("\n Received signal %d. Cleaning up ...", signum);
         fflush(stdout);
         sleep(1); // let erase sector end
@@ -4756,7 +4756,7 @@ int main(int ac, char *av[])
     auto_ptr tmp( get_flash(device));
     f = tmp;

-    if (f.get() == NULL) {
+    if (f.get() == 0) {
         printf("*** ERROR *** Can't get flash type using device %s\n", device);
         rc = 1; goto done;
     }

-- 
MST

From panda at cse.ohio-state.edu Sat Jul 2 21:42:48 2005
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Sun, 3 Jul 2005 00:42:48 -0400 (EDT)
Subject: [openib-general] Announcing the release of MVAPICH2 0.6.5 with uDAPL support
Message-ID: <200507030442.j634gmfF002860@xi.cse.ohio-state.edu>

The MVAPICH (MPI over InfiniBand) team at the Ohio State University is
pleased to announce the release of MVAPICH2 0.6.5 for multiple
platforms (EM64T, G5, IA-32, IA-64, and Opteron) and network
interfaces (PCI-X and PCI-Express, including the new mem-free cards).

In addition to the optimized design over the VAPI interface for
InfiniBand, MVAPICH2 0.6.5 also supports an optimized design over the
uDAPL interface so that users can take advantage of the MPI-2
functionalities available in MVAPICH2 on networks supporting a uDAPL
provider library. The uDAPL interface of MVAPICH2 0.6.5 has been tested
with InfiniBand (IBGD uDAPL), Myrinet (DAPL-GM beta), and Ammasso GigE
(Ammasso uDAPL).
Through the uDAPL interface, it provides portability across different
networks while delivering high performance.

MVAPICH2 0.6.5 is being distributed as a single integrated package
(with the latest MPICH2 1.0.1 and MVICH). It can be downloaded with a
`single click' and installed. It is available under BSD license.

MVAPICH/MVAPICH2 software is being used by more than 230 organizations
world-wide (in 27 countries) to extract the potential of InfiniBand
networking technology for designing high-end computing systems and
servers. It is also being distributed by many IBA vendors in their
software distributions.

This new release has the following features:

- MPI-2 functionalities (one-sided, collectives, datatype)
- all MPI-1 functionalities
- optimized one-sided operations (Get, Put, and Accumulate)
- support for active and passive synchronization
- optimized two-sided operations
- uDAPL support (tested for InfiniBand, Myrinet, and Ammasso GigE)
- scalable job start-up
- optimized and tuned for the above platforms and different
  network interfaces (PCI-X and PCI-Express)
- single code base for all of the above platforms
- memory efficient scaling modes for medium and large clusters

Other features of this release include:

- Excellent performance: For two-sided operations, MVAPICH2 0.6.5
  with VAPI interface delivers 5.0 microsec latency (with the switch),
  up to 965 MB/sec unidirectional bandwidth, and up to 1725 MB/sec
  bidirectional bandwidth on EM64T system with PCI-Express. For
  one-sided Put operation, MVAPICH2 0.6.5 delivers 6.98 microsec
  latency (with the switch), and up to 972 MB/sec unidirectional
  bandwidth on the above platform.
- With uDAPL interface, MVAPICH2 0.6.5 delivers latency for small
  messages very close (within 1.0 microsec) to that of the uDAPL
  library. It delivers bandwidth very close to that provided by the
  VAPI interface.
- Detailed performance numbers for two-sided and all one-sided
  operations on various platforms and interconnects using VAPI and
  uDAPL interfaces are available on the project's web page.
- A set of benchmarks to evaluate one-sided operations
  (Put, Get, and Accumulate)
- An enhanced and detailed `User and Tuning Guide' to assist users:
    - to install this package on different platforms with both
      interfaces (VAPI and uDAPL) and different options
    - to vary different parameters of the MPI installation to
      extract maximum performance and achieve scalability,
      especially on large-scale systems.

You are welcome to download the MVAPICH2 0.6.5 package and access
relevant information from the following URL:

http://nowlab.cis.ohio-state.edu/projects/mpi-iba/

Our upcoming release will include a high performance design of
MVAPICH2 with MPICH2 1.0.2 and uDAPL support.

All feedback, including bug reports and hints for performance tuning,
is welcome. Please send an e-mail to mvapich-help at cse.ohio-state.edu.

Thanks,

MVAPICH Team at OSU/NBCL

----------

PS: If you would like to be removed from this mailing list, please
send an e-mail to mvapich_request at cse.ohio-state.edu.

From rep.nop at aon.at Sun Jul 3 15:42:56 2005
From: rep.nop at aon.at (Bernhard Fischer)
Date: Mon, 4 Jul 2005 00:42:56 +0200
Subject: [openib-general] Re: [openib-commits] r2776 - gen2/utils/src/linux-user/ibdm/datamodel
In-Reply-To: <20050702192533.51E6922834D@openib.ca.sandia.gov>
References: <20050702192533.51E6922834D@openib.ca.sandia.gov>
Message-ID: <20050703224256.GC5997@aon.at>

On Sat, Jul 02, 2005 at 12:25:33PM -0700, eitan at openib.org wrote:
>Modified: gen2/utils/src/linux-user/ibdm/datamodel/Fabric.cpp

Eitan,

- fix a few typos.
Signed-off-by: Bernhard Fischer

-------------- next part --------------
Index: utils/src/linux-user/ibdm/datamodel/ibdm.i
===================================================================
--- utils/src/linux-user/ibdm/datamodel/ibdm.i (revision 2778)
+++ utils/src/linux-user/ibdm/datamodel/ibdm.i (working copy)
@@ -780,7 +780,7 @@ int FabricUtilsVerboseLevel;

 IBDM exposes some of its internal objects. The objects
 identifiers returned by the various function calls are formatted
- acording to the following rules:
+ according to the following rules:
 Fabric: fabric:
 System: system::
 SysPort: sysport:::
Index: utils/src/linux-user/ibdm/datamodel/Fabric.cpp
===================================================================
--- utils/src/linux-user/ibdm/datamodel/Fabric.cpp (revision 2778)
+++ utils/src/linux-user/ibdm/datamodel/Fabric.cpp (working copy)
@@ -34,7 +34,7 @@
 /*
 IB Fabric Data Model

-This file hodls implementation of the data model classes and methods
+This file holds implementation of the data model classes and methods

 */

@@ -119,8 +119,8 @@ IBPort::connect (IBPort *p_otherPort,
     if (p_remotePort != p_otherPort) {
         cout << "-W- Disconnecting: "
              << p_remotePort->getName() << " previously connected to:"
-             << p_remotePort->getName()
-             << " whil econnecting:" << p_otherPort->getName() << endl;
+             << p_remotePort->getName()
+             << " while connecting:" << p_otherPort->getName() << endl;
         // the other side should be cleaned only if points here
         if (p_remotePort->p_remotePort == this) {
             p_remotePort->p_remotePort = NULL;
@@ -580,7 +580,7 @@ IBSystem::getSysPort(string name) {
 // constructor:
 IBSystem::IBSystem(string n, class IBFabric *p_fab, string t) {
     if (p_fab->getSystem(n)) {
-        cerr << "Can' deal with double allocation of same system!" << endl;
+        cerr << "Can't deal with double allocation of same system!" << endl;
         abort();
     }
     name = n;
@@ -679,7 +679,7 @@ IBSystem::removeBoard (string boardName)

     // Warn if no match:
     if (matchedNodes.empty()) {
-        cout << "-W-(RemoveBoard) Fail to find any node in:"
+        cout << "-W- removeBoard : Fail to find any node in:"
              << sysNodePrefix << " while removing:" << boardName << endl;
         return 1;
     }
@@ -1636,7 +1636,7 @@ IBFabric::parseFdbFile(string fn) {
         //cout << "-W- Ignoring line:" << sLine << endl;
     }

-    cout << "-I- Defined " << fdbLines << " fdb entires for:"
+    cout << "-I- Defined " << fdbLines << " fdb entries for:"
          << switches << " switches" << endl;
     f.close();
     return anyErr;
@@ -1715,7 +1715,7 @@ IBFabric::parseMCFdbFile(string fn) {
         //cout << "-W- Ignoring line:" << sLine << endl;
     }

-    cout << "-I- Defined " << fdbLines << " Multicast Fdb entires for:"
+    cout << "-I- Defined " << fdbLines << " Multicast Fdb entries for:"
          << switches << " switches" << endl;
     f.close();
     return anyErr;
Index: utils/src/linux-user/IBMgtSim/src/sma.cpp
===================================================================
--- utils/src/linux-user/IBMgtSim/src/sma.cpp (revision 2778)
+++ utils/src/linux-user/IBMgtSim/src/sma.cpp (working copy)
@@ -904,7 +904,7 @@ int IBMSSma::setIBPortBaseLid(
     unsigned int portLidIndex;

     MSG_ENTER_FUNC;
-    MSGREG(inf0, 'I', "Seting base_lid for node:$ port:$ to $", "setIBPortBaseLid");
+    MSGREG(inf0, 'I', "Setting base_lid for node:$ port:$ to $", "setIBPortBaseLid");
     pNode = pSimNode->getIBNode();
     pPort = pNode->getPort(portNum);
     if (! pPort)
@@ -925,7 +925,7 @@ int IBMSSma::setIBPortBaseLid(
     /* make sure the vector of port by lid has enough entries */
     if (pNode->p_fabric->PortByLid.size() <= base_lid)
     {
-        /* we add 20 entires each time */
+        /* we add 20 entries each time */
         pNode->p_fabric->PortByLid.resize(base_lid+20);
         for ( portLidIndex = pNode->p_fabric->PortByLid.size();
               portLidIndex < base_lid + 20;

From rep.nop at aon.at Sun Jul 3 15:51:04 2005
From: rep.nop at aon.at (Bernhard Fischer)
Date: Mon, 4 Jul 2005 00:51:04 +0200
Subject: [openib-general] [PATCH][osm] commentary typos
Message-ID: <20050703225104.GD5997@aon.at>

Hi Hal,

- commentary typos

Signed-off-by: Bernhard Fischer

-------------- next part --------------
Index: trunk/src/userspace/management/osm/opensm/osm_state_mgr.c
===================================================================
--- trunk/src/userspace/management/osm/opensm/osm_state_mgr.c (revision 2778)
+++ trunk/src/userspace/management/osm/opensm/osm_state_mgr.c (working copy)
@@ -518,7 +518,7 @@ __osm_state_mgr_reset_node_count(
 {
     osm_log( p_mgr->p_log, OSM_LOG_DEBUG,
              "__osm_state_mgr_reset_node_count: "
-             "Reseting discovery count for node 0x%" PRIx64 ".\n",
+             "Resetting discovery count for node 0x%" PRIx64 ".\n",
              cl_ntoh64( osm_node_get_node_guid( p_node ) ));
 }

@@ -539,7 +539,7 @@ __osm_state_mgr_reset_port_count(
 {
     osm_log( p_mgr->p_log, OSM_LOG_DEBUG,
              "__osm_state_mgr_reset_port_count: "
-             "Reseting discovery count for port 0x%" PRIx64 ".\n",
+             "Resetting discovery count for port 0x%" PRIx64 ".\n",
              cl_ntoh64( osm_port_get_guid( p_port ) ));
 }

@@ -560,7 +560,7 @@ __osm_state_mgr_reset_switch_count(
 {
     osm_log( p_mgr->p_log, OSM_LOG_DEBUG,
              "__osm_state_mgr_reset_switch_count: "
-             "Reseting discovery count for switch 0x%" PRIx64 ".\n",
+             "Resetting discovery count for switch 0x%" PRIx64 ".\n",
              cl_ntoh64( osm_node_get_node_guid( p_sw->p_node ) ));
 }

Index: trunk/src/userspace/management/osm/opensm/osm_drop_mgr.c
===================================================================
--- trunk/src/userspace/management/osm/opensm/osm_drop_mgr.c (revision 2778)
+++ trunk/src/userspace/management/osm/opensm/osm_drop_mgr.c (working copy)
@@ -271,7 +271,7 @@ __osm_drop_mgr_remove_port(
         osm_port_discovery_count_reset( p_remote_port );
         osm_log( p_mgr->p_log, OSM_LOG_DEBUG,
                  "__osm_drop_mgr_remove_port: "
-                 "reseting discovery count of node: "
+                 "resetting discovery count of node: "
                  "0x%016" PRIx64 " port num:%u.\n",
                  cl_ntoh64( osm_node_get_node_guid( p_remote_node ) ),
                  remote_port_num );
Index: trunk/src/userspace/management/osm/include/complib/cl_timer.h
===================================================================
--- trunk/src/userspace/management/osm/include/complib/cl_timer.h (revision 2778)
+++ trunk/src/userspace/management/osm/include/complib/cl_timer.h (working copy)
@@ -331,7 +331,7 @@ cl_timer_trim(
 * remaining time when the timer is set.
 *
 * If the new interval time is less than the remaining time, cl_timer_trim
-* implicitly stops the timer before reseting it.
+* implicitly stops the timer before resetting it.
 *
 * If the timer is reset, it is guaranteed to expire no sooner than the
 * new interval, but may take longer to expire.
Index: trunk/src/userspace/management/osm/complib/cl_event.c
===================================================================
--- trunk/src/userspace/management/osm/complib/cl_event.c (revision 2778)
+++ trunk/src/userspace/management/osm/complib/cl_event.c (working copy)
@@ -102,7 +102,7 @@ cl_event_signal(
     cl_spinlock_acquire( &p_event->spinlock );
     p_event->signaled = TRUE;

-    /* Wake up one or all depending on whether the event is auto-reseting. */
+    /* Wake up one or all depending on whether the event is auto-resetting. */
     if( p_event->manual_reset )
         pthread_cond_broadcast( &p_event->condvar );
     else

From eitan at mellanox.co.il Mon Jul 4 00:20:16 2005
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Mon, 4 Jul 2005 10:20:16 +0300
Subject: [openib-general] RE: [openib-commits] r2776 - gen2/utils/src/linux-user/ibdm/datamodel
Message-ID: <506C3D7B14CDD411A52C00025558DED607C3040A@mtlex01.yok.mtl.com>

Thanks. Committed as 2779.

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

> -----Original Message-----
> From: Bernhard Fischer [mailto:rep.nop at aon.at]
> Sent: Monday, July 04, 2005 1:43 AM
> To: Eitan Zahavi
> Cc: openib-general at openib.org
> Subject: Re: [openib-commits] r2776 - gen2/utils/src/linux-user/ibdm/datamodel
>
> On Sat, Jul 02, 2005 at 12:25:33PM -0700, eitan at openib.org wrote:
> >Modified: gen2/utils/src/linux-user/ibdm/datamodel/Fabric.cpp
>
> Eitan,
>
> - fix a few typos.
>
> Signed-off-by: Bernhard Fischer

From tomduffy at gmail.com Mon Jul 4 09:59:18 2005
From: tomduffy at gmail.com (Tom Duffy)
Date: Mon, 4 Jul 2005 12:59:18 -0400
Subject: [openib-general] [oops] recent opensm crash
In-Reply-To: <1120238676.4371.3718.camel@hal.voltaire.com>
References: <1120169407.29522.17.camel@duffman> <1120238676.4371.3718.camel@hal.voltaire.com>
Message-ID: <9d3b7de705070409594e2bf25b@mail.gmail.com>

On 01 Jul 2005 13:24:37 -0400, Hal Rosenstock wrote:
> I found one problem associated with this and just checked in a patch.
> I'm not sure whether there is another one behind this or not. Any
> reliable way to recreate this ?

No, it happened after being up for weeks one time when I was bringing
an interface down and back up again. I will report if I get another
crash later.

Thanks,

-tduffy

From mst at mellanox.co.il Mon Jul 4 11:40:38 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 4 Jul 2005 21:40:38 +0300
Subject: [openib-general] Re: [PATCH] updates to copyright
In-Reply-To:
References:
Message-ID: <20050704184038.GA16650@mellanox.co.il>

The following patch adds copyright statements for Mellanox
to files that I've touched and that are missing them.

---

Add Mellanox copyright to files modified by Michael S. Tsirkin.

Signed-off-by: Michael S. Tsirkin

Index: infiniband/core/uverbs_mem.c
===================================================================
--- infiniband/core/uverbs_mem.c (revision 2780)
+++ infiniband/core/uverbs_mem.c (working copy)
@@ -1,6 +1,7 @@
 /*
  * Copyright (c) 2005 Topspin Communications. All rights reserved.
  * Copyright (c) 2005 Cisco Systems. All rights reserved.
+ * Copyright (c) 2005 Mellanox Technologies. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
Index: infiniband/core/uverbs_main.c
===================================================================
--- infiniband/core/uverbs_main.c (revision 2780)
+++ infiniband/core/uverbs_main.c (working copy)
@@ -1,6 +1,7 @@
 /*
  * Copyright (c) 2005 Topspin Communications. All rights reserved.
  * Copyright (c) 2005 Cisco Systems. All rights reserved.
+ * Copyright (c) 2005 Mellanox Technologies. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
Index: infiniband/core/uverbs.h
===================================================================
--- infiniband/core/uverbs.h (revision 2780)
+++ infiniband/core/uverbs.h (working copy)
@@ -1,6 +1,7 @@
 /*
  * Copyright (c) 2005 Topspin Communications. All rights reserved.
  * Copyright (c) 2005 Cisco Systems. All rights reserved.
+ * Copyright (c) 2005 Mellanox Technologies. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU

-- 
MST

From rep.nop at aon.at Mon Jul 4 11:51:10 2005
From: rep.nop at aon.at (Bernhard Fischer)
Date: Mon, 4 Jul 2005 20:51:10 +0200
Subject: [openib-general] RE: [openib-commits] r2776 - gen2/utils/src/linux-user/ibdm/datamodel
In-Reply-To: <506C3D7B14CDD411A52C00025558DED607C3040A@mtlex01.yok.mtl.com>
References: <506C3D7B14CDD411A52C00025558DED607C3040A@mtlex01.yok.mtl.com>
Message-ID: <20050704185110.GE5997@aon.at>

On Mon, Jul 04, 2005 at 10:20:16AM +0300, Eitan Zahavi wrote:
>Thanks. Committed as 2779.

I'm sorry, but I overlooked another occurrence of acor; attached.

Would you be so kind and do these in ibdm and IBMgtSim? TIA.
s/teh/the/g
s/eb/be/g
s/fo/of/g

this may or may not work for you:
egrep -ri "([[:space:]](fo|teh|eb)[[:space:]])" .

- fix a few typos.

Signed-off-by: Bernhard Fischer

-------------- next part --------------
Index: utils/src/linux-user/IBMgtSim/src/ibdm.i
===================================================================
--- utils/src/linux-user/IBMgtSim/src/ibdm.i (revision 2780)
+++ utils/src/linux-user/IBMgtSim/src/ibdm.i (working copy)
@@ -732,7 +732,7 @@ int FabricUtilsVerboseLevel;

 IBDM exposes some of its internal objects. The objects
 identifiers returned by the various function calls are formatted
- acording to the following rules:
+ according to the following rules:
 Fabric: fabric:
 System: system::
 SysPort: sysport:::
Index: utils/src/linux-user/IBMgtSim/utils/RunSimTest
===================================================================
--- utils/src/linux-user/IBMgtSim/utils/RunSimTest (revision 2780)
+++ utils/src/linux-user/IBMgtSim/utils/RunSimTest (working copy)
@@ -112,18 +112,18 @@ proc Help {} {
 # OpenSM log file analyzer
 #
 # Continuously monitor the OpenSM log file and generate a log of all the
-# events reported in teh log file.
+# events reported in the log file.
 #
 # On any event - it scans through the list of callbacks to be invoked
-# and calls them acordingly. The list of callabcks is in osmLogCallbacks(logFile)
+# and calls them accordingly. The list of callbacks is in osmLogCallbacks(logFile)
 #
-# The log fo all events is accoumulated in the global list:
+# The log of all events is accoumulated in the global list:
 # osmEventLog(logFile)
 #
 # The format of the event log list entry is:
 #