From halr at voltaire.com Sat Jul 1 03:43:18 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Jul 2006 06:43:18 -0400 Subject: [openib-general] Reloading of partition policy In-Reply-To: <44A568A8.1020908@3leafnetworks.com> References: <44A46775.1000507@3leafnetworks.com> <1151639421.4478.746.camel@hal.voltaire.com> <1151680218.4478.28538.camel@hal.voltaire.com> <44A568A8.1020908@3leafnetworks.com> Message-ID: <1151750596.4478.77252.camel@hal.voltaire.com> On Fri, 2006-06-30 at 14:08, Venkatesh Babu wrote: > The document doesn't describe the scenario where nodes are added/deleted > from the partition table. I raised this issue because it could be an > important use case. If this can be achieved without restarting the > OpenSM, it is good. It already does. > Just one more clarification - sending HUP signal doesn't cause OpenSM > failover to other standby one right ? Sending the HUP signal to OpenSM in master state causes the partition file to be reread and a heavy sweep to be invoked. This does not cause a (SM) failover. -- Hal > VBabu > > Hal Rosenstock wrote: > > On Thu, 2006-06-29 at 23:50, Hal Rosenstock wrote: > > > >> On Thu, 2006-06-29 at 19:51, Venkatesh Babu wrote: > >> > >>> I was reviewing partition-config.txt and OpenSM_PKey_Mgr.txt and had the > >>> following comment - > >>> > >>> If we need to add/delete a node to/from a partition we need to update > >>> the file > >>> > >>> /etc/osm-partitions.txt > >>> > >>> and restart the OpenSM. According to the docs there no way we can do > >>> this without restarting the OpenSM. > >>> > > > > I just looked at those documents and couldn't find what you were > > referring to. Can you be more specific ? > > > > -- Hal > > > > > >>> It would be useful to add new feature to reload the partition table > >>> after making the changes. > >>> > >> Partitions can be deleted and the new partitions enforced via issuing > >> kill -HUP to the OpenSM without restarting now. 
> >>
> >> The document is (already) out of date :-( I will update it shortly.
> >>
> >> -- Hal
> >>
> >>
> >>> VBabu
> >>>
> >>> _______________________________________________
> >>> openib-general mailing list
> >>> openib-general at openib.org
> >>> http://openib.org/mailman/listinfo/openib-general
> >>>
> >>> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> >>>
> >>>
> >>
> >>
> >
> >

From mst at mellanox.co.il Sat Jul 1 12:38:18 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sat, 1 Jul 2006 22:38:18 +0300
Subject: [openib-general] ipath patch series a-comin', but no IB maintainer to shepherd them
In-Reply-To:
References:
Message-ID: <20060701193818.GA10904@mellanox.co.il>

Quoting r. Roland Dreier :
> Subject: Re: [openib-general] ipath patch series a-comin', but no IB maintainer to shepherd them
>
> > Further, in the hope that this will help keep things reasonably stable till
> > Roland comes back, and help everyone see what's being merged, I have
> > created a git branch for all things infiniband going into 2.6.18.
> >
> > You can get at it here:
> > git://www.mellanox.co.il/~git/infiniband mst-for-2.6.18
>
> Thanks for doing this ... however www.mellanox.co.il doesn't seem to
> have the git port open:
>
> fatal: unable to connect a socket (Connection refused)
> fetch-pack from 'git://www.mellanox.co.il/~git/infiniband' failed.

Hmm. Worked for me ... Does ftp.mellanox.co.il work instead?

--
MST

From mst at mellanox.co.il Sat Jul 1 12:43:23 2006
From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Sat, 1 Jul 2006 22:43:23 +0300 Subject: [openib-general] [PATCH 0 of 39] ipath - bug fixes, performanceenhancements, and portability improvements In-Reply-To: <1151686831.2194.7.camel@localhost.localdomain> References: <1151686831.2194.7.camel@localhost.localdomain> Message-ID: <20060701194323.GB10904@mellanox.co.il> Quoting r. Bryan O'Sullivan : > Subject: Re: [PATCH 0 of 39] ipath - bug fixes, performanceenhancements,and portability improvements > > On Fri, 2006-06-30 at 19:31 +0300, Michael S. Tsirkin wrote: > > > OK, next week I'll put these into my tree, too. > > Thanks. The first 37 are in -mm; the last two you can drop until I sort > them out. What about patch 28? -- MST From eitan at mellanox.co.il Sat Jul 1 13:21:18 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: 01 Jul 2006 23:21:18 +0300 Subject: [openib-general] [PATCH] osm: pkey manager fail to clear unused peer block Message-ID: <86ejx56pq9.fsf@mtl066.yok.mtl.com> Hi Hal Similar to the failure to clear HCA port pkey block if it was empty. The code for peer port does depend on the neighbor HCA block to exist. So the fix is very similar: use an empty block for comparing with the current peer port in this case. 
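
The comparison described above can be sketched outside OpenSM roughly as
follows. This is a minimal illustration, not the OpenSM code: the type
pkey_block_t, the constant ENTRIES_PER_BLOCK and the function
peer_block_needs_update() are hypothetical names chosen for the example.
When no new block exists for an index, an all-zero block stands in for it,
so a peer block that still holds stale entries compares as different.

```c
#include <stdint.h>
#include <string.h>

#define ENTRIES_PER_BLOCK 32  /* assumed block size, for illustration only */

/* Hypothetical stand-in for one P_Key table block. */
typedef struct {
    uint16_t pkey_entry[ENTRIES_PER_BLOCK];
} pkey_block_t;

/*
 * Return 1 if the peer's block must be rewritten, 0 if it already matches.
 * new_block may be NULL when nothing is assigned to this block index;
 * peer_block may be NULL when the peer's current content is unknown.
 */
static int peer_block_needs_update(const pkey_block_t *new_block,
                                   const pkey_block_t *peer_block)
{
    static const pkey_block_t empty_block = { { 0 } };

    if (!new_block)
        new_block = &empty_block;   /* compare against "all cleared" */

    if (!peer_block)
        return 1;                   /* nothing known: send the block */

    return memcmp(peer_block, new_block, sizeof(*peer_block)) != 0;
}
```

With this shape, a missing new block no longer skips the comparison, so a
stale peer block is still detected and cleared.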
Eitan Signed-off-by: Eitan Zahavi Index: osm/opensm/osm_pkey_mgr.c =================================================================== --- osm/opensm/osm_pkey_mgr.c (revision 8313) +++ osm/opensm/osm_pkey_mgr.c (working copy) @@ -404,6 +404,7 @@ pkey_mgr_update_peer_port( uint16_t peer_max_blocks; ib_api_status_t status = IB_SUCCESS; boolean_t ret_val = FALSE; + ib_pkey_table_t empty_block = {.pkey_entry = {0}, }; p_physp = osm_port_get_default_phys_ptr( p_port ); if (!osm_physp_is_valid( p_physp )) @@ -452,6 +453,9 @@ pkey_mgr_update_peer_port( for (block_index = 0; block_index < p_pkey_tbl->used_blocks; block_index++) { block = osm_pkey_tbl_new_block_get( p_pkey_tbl, block_index ); + if (!block) + block = &empty_block; + peer_block = osm_pkey_tbl_block_get( p_peer_pkey_tbl, block_index ); if ( !peer_block || memcmp( peer_block, block, sizeof( *peer_block ) ) ) { From mst at mellanox.co.il Sat Jul 1 13:09:32 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sat, 1 Jul 2006 23:09:32 +0300 Subject: [openib-general] ipath patch series a-comin', but no IB maintainer to shepherd them In-Reply-To: References: Message-ID: <20060701200932.GA11167@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: ipath patch series a-comin', but no IB maintainer to shepherd them > > > Further, in the hope that this will help keep things reasonably stable till > > Roland comes back, and help everyone see what's being merged, I have > > created a git branch for all things infiniband going into 2.6.18. > > > > You can get at it here: > > git://www.mellanox.co.il/~git/infiniband mst-for-2.6.18 > > Thanks for doing this ... however www.mellanox.co.il doesn't seem to > have the git port open: > > fatal: unable to connect a socket (Connection refused) > fetch-pack from 'git://www.mellanox.co.il/~git/infiniband' failed. OK, should be fixed now, thanks for letting me know. 
-- MST From akpm at osdl.org Sat Jul 1 15:35:42 2006 From: akpm at osdl.org (Andrew Morton) Date: Sat, 1 Jul 2006 15:35:42 -0700 Subject: [openib-general] [PATCH 0 of 39] ipath - bug fixes, performanceenhancements, and portability improvements In-Reply-To: <20060701194323.GB10904@mellanox.co.il> References: <1151686831.2194.7.camel@localhost.localdomain> <20060701194323.GB10904@mellanox.co.il> Message-ID: <20060701153542.73ac101e.akpm@osdl.org> On Sat, 1 Jul 2006 22:43:23 +0300 "Michael S. Tsirkin" wrote: > Quoting r. Bryan O'Sullivan : > > Subject: Re: [PATCH 0 of 39] ipath - bug fixes, performanceenhancements,and portability improvements > > > > On Fri, 2006-06-30 at 19:31 +0300, Michael S. Tsirkin wrote: > > > > > OK, next week I'll put these into my tree, too. > > > > Thanks. The first 37 are in -mm; the last two you can drop until I sort > > them out. > > What about patch 28? > I sent 1-27,29-37 to Linus and he has merged them. But I held #28 back based on Roland's concerns. From mst at mellanox.co.il Sun Jul 2 00:18:11 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 2 Jul 2006 10:18:11 +0300 Subject: [openib-general] ipath patch series a-comin', but no IB maintainer to shepherd them In-Reply-To: <1151538893.13430.43.camel@obsidian> References: <1151538893.13430.43.camel@obsidian> Message-ID: <20060702071810.GU19300@mellanox.co.il> Quoting r. Bryan O'Sullivan : > Subject: ipath patch series a-comin', but no IB maintainer to shepherd them > > Hi, Andrew - > > I have a pile of patches for the ipath driver that I'd like to get in > during the "open season" window. Roland has his hands full with diapers > and other sprog paraphernalia as of a few days ago, so I doubt he'll see > this message soon, much less care about the patches. > > Given Roland's presumed unavailability, would the appropriate thing be > to drop the patches into -mm and then push them along to Linus, or what? 
> > http://openib.org/bugzilla/show_bug.cgi?id=159 tziporet at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |WONTFIX ------- Comment #1 from tziporet at mellanox.co.il 2006-07-02 04:22 ------- Item 25 from FM #35398: Gen2 is compliant. In Gen2, the driver does not do any modify_qp commands for LAP/APR. Rather, it relies on the application to manage alternate path migration. This is done via procedures in libibcm/src/cm.c, libibverbs/src/marshall.c, linux-kernel/infiniband/core/cm.c, core/ucm.c, core/uverbs_marshall.c Thus, for example, user-space app can call ib_cm_send_lap() to send the new alt path. The receiver can do ib_cm_poll() to see if there is any event waiting to be pulled to user space. The user-space app can then call ib_cm_get_event() to get the LAP event (including the new alternate path info), and can send the reply using ib_cm_send_apr(). Responsibility for actually performing the modify-qp is left to the user application. This is more in keeping with the IB Spec. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From halr at voltaire.com Mon Jul 3 02:45:40 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 03 Jul 2006 05:45:40 -0400 Subject: [openib-general] [PATCH] osm: pkey manager fail to clear unused peer block In-Reply-To: <86ejx56pq9.fsf@mtl066.yok.mtl.com> References: <86ejx56pq9.fsf@mtl066.yok.mtl.com> Message-ID: <1151919934.4478.200353.camel@hal.voltaire.com> Hi Eitan, On Sat, 2006-07-01 at 16:21, Eitan Zahavi wrote: > Hi Hal > > Similar to the failure to clear HCA port pkey block if it was empty. > The code for peer port does depend on the neighbor HCA block to exist. > So the fix is very similar: use an empty block for comparing with the > current peer port in this case. > > Eitan > > Signed-off-by: Eitan Zahavi Thanks. Applied. 
-- Hal From ogerlitz at voltaire.com Mon Jul 3 03:06:21 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 03 Jul 2006 13:06:21 +0300 Subject: [openib-general] ucma into kernel.org In-Reply-To: <20060627224857.GE5398@mellanox.co.il> References: <44A1B3F7.7090504@ichips.intel.com> <20060627224857.GE5398@mellanox.co.il> Message-ID: <44A8EC1D.9010605@voltaire.com> Michael S. Tsirkin wrote: > Quoting r. Sean Hefty : >>> Can you drive these enhancements only to 2.6.18. >> I would like these features in OFED 1.1 as well. However, there are no users >> of those new interfaces in 2.6.18 that would justify their inclusion. > > I think setting the number of retries and timeout in CMA might be useful for > iSER as well. Or, what do you think? Currently i don't plan to have iSER use these APIs. Sean - looking on the cma/ucma APIs i see that the kernel APIs are not in place yet (eg the equivalent of rdma_set/get_options) or in the kernel the CMA consumer is expected to call directly the APIs exported by rdma_cm_ib.h? Or. From dotanb at mellanox.co.il Mon Jul 3 02:57:33 2006 From: dotanb at mellanox.co.il (Dotan Barak) Date: Mon, 3 Jul 2006 12:57:33 +0300 Subject: [openib-general] [diags] compilation warning fix on 32 bit machines Message-ID: <200607031257.33792.dotanb@mellanox.co.il> Compilation warning on 32 bit machines was fixed. 
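
For readers unfamiliar with this class of warning, here is a small sketch
(not the diags source; the function name mask_slot_byte() is hypothetical).
A 64-bit hex constant written without a suffix may draw a "constant is too
large for long" style warning from some compilers on 32-bit targets. An
explicit ULL suffix, as added by the one-line change in this patch, states
the intended type directly.

```c
#include <stdint.h>

/*
 * Clear bits 24..31 of a 64-bit GUID while keeping every other bit.
 * The ULL suffix gives the constant type unsigned long long regardless
 * of how wide "long" is on the target.
 */
static uint64_t mask_slot_byte(uint64_t guid)
{
    return guid & 0xffffffff00ffffffULL;
}
```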
Signed-off-by: Dotan Barak Index: last_stable/src/userspace/management/diags/src/grouping.c =================================================================== --- last_stable.orig/src/userspace/management/diags/src/grouping.c 2006-07-02 18:09:41.000000000 +0300 +++ last_stable/src/userspace/management/diags/src/grouping.c 2006-07-03 11:50:18.000000000 +0300 @@ -93,7 +93,7 @@ static uint64_t topspin_chassisguid(uint { /* Byte 3 in system image GUID is chassis type, and */ /* Byte 4 is location ID (slot) so just mask off byte 4 */ - return guid & 0xffffffff00ffffff; + return guid & 0xffffffff00ffffffULL; } static uint64_t get_chassisguid(uint64_t guid, uint32_t vendid) From halr at voltaire.com Mon Jul 3 03:15:02 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 03 Jul 2006 06:15:02 -0400 Subject: [openib-general] [diags] compilation warning fix on 32 bit machines In-Reply-To: <200607031257.33792.dotanb@mellanox.co.il> References: <200607031257.33792.dotanb@mellanox.co.il> Message-ID: <1151921701.4478.201696.camel@hal.voltaire.com> On Mon, 2006-07-03 at 05:57, Dotan Barak wrote: > Compilation warning on 32 bit machines was fixed. > > Signed-off-by: Dotan Barak Thanks. Applied. -- Hal From ogerlitz at voltaire.com Mon Jul 3 03:34:54 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 03 Jul 2006 13:34:54 +0300 Subject: [openib-general] design for communication established affiliated asynchronous event handling In-Reply-To: <44A40468.9070600@ichips.intel.com> References: <44A40468.9070600@ichips.intel.com> Message-ID: <44A8F2CE.6060204@voltaire.com> Sean Hefty wrote: > Rimmer, Todd wrote: >> The CM would open the CA, provide its async event callback routine and >> perform a special register_cm() verbs call. Of course most CM traffic >> would occur on the GSI QP, so this open CA instance was only for this >> purpose. 
This special verb was only available in kernel space (avoiding
>> security issue of application stealing CM interface and because our CM
>> was in the kernel anyway).
>
> Thanks for the info. I'm considering this sort of approach.

OK, so you opt for a change that will have the whole solution running
within the ibstack core (hw driver / core / cm) - the CM gets an async
event which makes it synthesize an RTU and act on it.

So we went down from CMA level handling to CM level handling and it would
work for both user and kernel consumers. This comes at the price of having
to change the verbs access layer for the CM to register on QP async events.

Again, also with this solution the ULP has to be aware of CQ completions
related to a QP on which the ESTABLISHED event was not yet delivered on the
associated CMA ID.

Sounds good. In fact our gen1 stack was using this solution as well,
relying on the VAPI driver feature of delivering affiliated async events
to all the kernel consumers (the async event ***handler*** was not
associated with a specific QP)

Or.

From dotanb at mellanox.co.il Mon Jul 3 03:41:35 2006
From: dotanb at mellanox.co.il (Dotan Barak)
Date: Mon, 3 Jul 2006 13:41:35 +0300
Subject: [openib-general] [librdmacm] rping: compilation warning on 64 bit machines was fixed
Message-ID: <200607031341.36056.dotanb@mellanox.co.il>

Compilation warning on 64 bit machines was fixed.
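
A short sketch of the kind of mismatch involved, assuming (this is not
stated in the message) that the lower bound is a sizeof expression and
therefore has type size_t. The names MIN_BUFSIZE, MAX_BUFSIZE and
format_size_error() are hypothetical. The sketch uses the C99 length
modifier "z"; the patch in this message uses the older glibc spelling %Zd.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical limits; the lower bound is a sizeof expression (size_t). */
#define MIN_BUFSIZE sizeof("header")
#define MAX_BUFSIZE 65536

/* Format the range error into out; returns what snprintf returns. */
static int format_size_error(char *out, size_t outlen, int size)
{
    /* %zu matches size_t on both 32-bit and 64-bit targets (C99). */
    return snprintf(out, outlen,
                    "Invalid size %d (valid range is %zu to %d)\n",
                    size, MIN_BUFSIZE, MAX_BUFSIZE);
}
```

Printing the size_t value with plain %d happens to work where int and
size_t have the same width, which is why the warning only shows up on
64-bit builds.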
Signed-off-by: Dotan Barak Index: last_stable/src/userspace/librdmacm/examples/rping.c =================================================================== --- last_stable.orig/src/userspace/librdmacm/examples/rping.c 2006-07-02 18:09:41.000000000 +0300 +++ last_stable/src/userspace/librdmacm/examples/rping.c 2006-07-03 13:38:20.000000000 +0300 @@ -1025,7 +1025,7 @@ int main(int argc, char *argv[]) if ((cb->size < RPING_MIN_BUFSIZE) || (cb->size > (RPING_BUFSIZE - 1))) { fprintf(stderr, "Invalid size %d " - "(valid range is %d to %d)\n", + "(valid range is %Zd to %d)\n", cb->size, RPING_MIN_BUFSIZE, RPING_BUFSIZE); ret = EINVAL; } else From halr at voltaire.com Mon Jul 3 05:46:34 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 03 Jul 2006 08:46:34 -0400 Subject: [openib-general] [PATCH] OpenSM/osmtest: Fix display of key violation counters in osmtest_validate_port_data Message-ID: <1151930792.4478.208450.camel@hal.voltaire.com> OpenSM/osmtest: Fix display of key violation counters in osmtest_validate_port_data Signed-off-by: Hal Rosenstock Index: osmtest/osmtest.c =================================================================== --- osmtest/osmtest.c (revision 8335) +++ osmtest/osmtest.c (working copy) @@ -3919,8 +3919,8 @@ osmtest_validate_port_data( IN osmtest_t "Field mismatch port LID 0x%X Num:0x%X\n" "\t\t\t\tExpected m_key_violations 0x%X, received 0x%X\n", cl_ntoh16( p_rec->lid ), p_rec->port_num, - p_port->rec.port_info.m_key_violations, - p_rec->port_info.m_key_violations ); + cl_ntoh16( p_port->rec.port_info.m_key_violations ), + cl_ntoh16( p_rec->port_info.m_key_violations ) ); status = IB_ERROR; goto Exit; } @@ -3933,8 +3933,8 @@ osmtest_validate_port_data( IN osmtest_t "Field mismatch port LID 0x%X Num:0x%X\n" "\t\t\t\tExpected p_key_violations 0x%X, received 0x%X\n", cl_ntoh16( p_rec->lid ), p_rec->port_num, - p_port->rec.port_info.p_key_violations, - p_rec->port_info.p_key_violations ); + cl_ntoh16( p_port->rec.port_info.p_key_violations ), + 
cl_ntoh16( p_rec->port_info.p_key_violations ) ); status = IB_ERROR; goto Exit; } @@ -3947,8 +3947,8 @@ osmtest_validate_port_data( IN osmtest_t "Field mismatch port LID 0x%X Num:0x%X\n" "\t\t\t\tExpected q_key_violations 0x%X, received 0x%X\n", cl_ntoh16( p_rec->lid ), p_rec->port_num, - p_port->rec.port_info.q_key_violations, - p_rec->port_info.q_key_violations ); + cl_ntoh16( p_port->rec.port_info.q_key_violations ), + cl_ntoh16( p_rec->port_info.q_key_violations ) ); status = IB_ERROR; goto Exit; } From bugzilla-daemon at openib.org Mon Jul 3 06:19:38 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Mon, 3 Jul 2006 06:19:38 -0700 (PDT) Subject: [openib-general] [Bug 165] New: IBWSD.DLL crashes when accessing g_pm_stat.pdata[INTR_TOTAL] Message-ID: <20060703131938.B4285228742@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=165 Summary: IBWSD.DLL crashes when accessing g_pm_stat.pdata[INTR_TOTAL] Product: OpenFabrics Windows Version: unspecified Platform: X86-64 OS/Version: Other Status: NEW Severity: blocker Priority: P4 Component: WSD AssignedTo: bugzilla at openib.org ReportedBy: guyc at voltaire.com Scenario: .NET 2 application ontop WSD The app itself has performance counters. The following line: InterlockedIncrement64( &g_pm_stat.pdata[INTR_TOTAL] ); Cause a crash in the function ib_cq_thread() When compiling WSD without performance counter, the app runs succssfully. I couldn't debug the application (yet). ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From halr at voltaire.com Mon Jul 3 07:17:13 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 03 Jul 2006 10:17:13 -0400 Subject: [openib-general] [PATCH] [TRIVIAL] OpenSM/osm_sm_mad_ctrl.c: Eliminate dead code in __osm_sm_mad_ctrl_process_get_resp Message-ID: <1151936232.4533.2336.camel@hal.voltaire.com> OpenSM/osm_sm_mad_ctrl.c: Eliminate dead code in __osm_sm_mad_ctrl_process_get_resp Signed-off-by: Hal Rosenstock Index: opensm/osm_sm_mad_ctrl.c =================================================================== --- opensm/osm_sm_mad_ctrl.c (revision 8341) +++ opensm/osm_sm_mad_ctrl.c (working copy) @@ -356,13 +356,6 @@ __osm_sm_mad_ctrl_process_get_resp( goto Exit; } } - else - { - /* - We received a response for which there is no recipient!! - */ - CL_ASSERT( FALSE ); - } Exit: OSM_LOG_EXIT( p_ctrl->p_log ); From halr at voltaire.com Mon Jul 3 10:04:47 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 03 Jul 2006 13:04:47 -0400 Subject: [openib-general] [PATCH] [MINOR] OpenSM/osm_helper: Add support for dump of SAInformInfoRecord Message-ID: <1151946286.4533.10134.camel@hal.voltaire.com> OpenSM/osm_helper: Add support for dump of SAInformInfoRecord Signed-off-by: Hal Rosenstock Index: include/opensm/osm_helper.h =================================================================== --- include/opensm/osm_helper.h (revision 8343) +++ include/opensm/osm_helper.h (working copy) @@ -265,6 +265,12 @@ osm_dump_inform_info( IN const osm_log_level_t log_level ); void +osm_dump_inform_info_record( + IN osm_log_t* const p_log, + IN const ib_inform_info_record_t* const p_iir, + IN const osm_log_level_t log_level ); + +void osm_dump_pkey_block( IN osm_log_t* const p_log, IN uint64_t port_guid, Index: opensm/osm_helper.c =================================================================== --- opensm/osm_helper.c (revision 8343) +++ opensm/osm_helper.c (working copy) @@ -1254,6 +1254,101 @@ osm_dump_inform_info( 
/********************************************************************** **********************************************************************/ void +osm_dump_inform_info_record( + IN osm_log_t* const p_log, + IN const ib_inform_info_record_t* const p_iir, + IN const osm_log_level_t log_level ) +{ + uint32_t qpn; + uint8_t resp_time_val; + + ib_inform_info_get_qpn_resp_time(p_iir->inform_info.g_or_v.generic.qpn_resp_time_val, + &qpn, &resp_time_val); + + if( osm_log_is_active( p_log, log_level ) ) + { + + ib_inform_info_get_qpn_resp_time(p_iir->inform_info.g_or_v.generic.qpn_resp_time_val, + &qpn, &resp_time_val); + + if (p_iir->inform_info.is_generic) + { + osm_log( p_log, log_level, + "InformInfo Record dump:\n" + "\t\t\t\tRID\n" + "\t\t\t\tSubscriberGID...........0x%016" PRIx64 " : " + "0x%016" PRIx64 "\n" + "\t\t\t\tSubscriberEnum..........0x%X\n" + "\t\t\t\tInformInfo dump:\n" + "\t\t\t\tgid.....................0x%016" PRIx64 ",%016" PRIx64 "\n" + "\t\t\t\tlid_range_begin.........0x%X\n" + "\t\t\t\tlid_range_end...........0x%X\n" + "\t\t\t\tis_generic..............0x%X\n" + "\t\t\t\tsubscribe...............0x%X\n" + "\t\t\t\ttrap_type...............0x%X\n" + "\t\t\t\ttrap_num................%u\n" + "\t\t\t\tqpn.....................0x%06X\n" + "\t\t\t\tresp_time_val...........0x%X\n" + "\t\t\t\tnode_type...............0x%06X\n" + "", + cl_ntoh64( p_iir->subscriber_gid.unicast.prefix ), + cl_ntoh64( p_iir->subscriber_gid.unicast.interface_id ), + cl_ntoh16( p_iir->subscriber_enum ), + cl_ntoh64( p_iir->inform_info.gid.unicast.prefix ), + cl_ntoh64( p_iir->inform_info.gid.unicast.interface_id ), + cl_ntoh16( p_iir->inform_info.lid_range_begin ), + cl_ntoh16( p_iir->inform_info.lid_range_end ), + p_iir->inform_info.is_generic, + p_iir->inform_info.subscribe, + cl_ntoh16( p_iir->inform_info.trap_type ), + cl_ntoh16( p_iir->inform_info.g_or_v.generic.trap_num ), + cl_ntoh32(qpn), + resp_time_val, + cl_ntoh32(ib_inform_info_get_node_type( &p_iir->inform_info )) + ); 
+ } + else + { + osm_log( p_log, log_level, + "InformInfo Record dump:\n" + "\t\t\t\tInformInfo dump:\n" + "\t\t\t\tRID\n" + "\t\t\t\tSubscriberGID...........0x%016" PRIx64 " : " + "0x%016" PRIx64 "\n" + "\t\t\t\tSubscriberEnum..........0x%X\n" + "\t\t\t\tgid.....................0x%016" PRIx64 ",%016" PRIx64 "\n" + "\t\t\t\tlid_range_begin.........0x%X\n" + "\t\t\t\tlid_range_end...........0x%X\n" + "\t\t\t\tis_generic..............0x%X\n" + "\t\t\t\tsubscribe...............0x%X\n" + "\t\t\t\ttrap_type...............0x%X\n" + "\t\t\t\tdev_id..................0x%X\n" + "\t\t\t\tqpn.....................0x%06X\n" + "\t\t\t\tresp_time_val...........0x%X\n" + "\t\t\t\tvendor_id...............0x%06X\n" + "", + cl_ntoh64( p_iir->subscriber_gid.unicast.prefix ), + cl_ntoh64( p_iir->subscriber_gid.unicast.interface_id ), + cl_ntoh16( p_iir->subscriber_enum ), + cl_ntoh64( p_iir->inform_info.gid.unicast.prefix ), + cl_ntoh64( p_iir->inform_info.gid.unicast.interface_id ), + cl_ntoh16( p_iir->inform_info.lid_range_begin ), + cl_ntoh16( p_iir->inform_info.lid_range_end ), + p_iir->inform_info.is_generic, + p_iir->inform_info.subscribe, + cl_ntoh16( p_iir->inform_info.trap_type ), + cl_ntoh16( p_iir->inform_info.g_or_v.vend.dev_id ), + cl_ntoh32(qpn), + resp_time_val, + cl_ntoh32(ib_inform_info_get_node_type( &p_iir->inform_info )) + ); + } + } +} + +/********************************************************************** + **********************************************************************/ +void osm_dump_link_record( IN osm_log_t* const p_log, IN const ib_link_record_t* const p_lr, From eitan at mellanox.co.il Mon Jul 3 10:18:51 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 3 Jul 2006 20:18:51 +0300 Subject: [openib-general] [PATCH] [MINOR] OpenSM/osm_helper: Add support for dump of SAInformInfoRecord Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30236894E@mtlexch01.mtl.com> Sure - thanks Eitan Zahavi From anton at samba.org Mon Jul 3 15:25:06 2006 From: anton 
at samba.org (Anton Blanchard) Date: Tue, 4 Jul 2006 08:25:06 +1000 Subject: [openib-general] [PATCH 38 of 39] IB/ipath - More changes to support InfiniPath on PowerPC 970 systems In-Reply-To: <20060629.150417.78710870.davem@davemloft.net> References: <20060629.145319.71091846.davem@davemloft.net> <1151618499.10886.26.camel@chalcedony.pathscale.com> <20060629.150417.78710870.davem@davemloft.net> Message-ID: <20060703222506.GD31081@krispykreme> Hi, > Please fix the generic code if it doesn't provide the facility > you need at the moment. Don't shoe horn it into your driver > just to make up for that. Ive had 3 drivers asking for write combining recently so I agree this is a good idea. How about ioremap_wc as suggested by Willy: http://marc.theaimsgroup.com/?l=linux-kernel&m=114374741828040&w=2 Anton From bugzilla-daemon at openib.org Mon Jul 3 15:30:59 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Mon, 3 Jul 2006 15:30:59 -0700 (PDT) Subject: [openib-general] [Bug 165] IBWSD.DLL crashes when accessing g_pm_stat.pdata[INTR_TOTAL] Message-ID: <20060703223059.97186228742@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=165 ------- Comment #1 from ftillier at silverstorm.com 2006-07-03 15:30 ------- (In reply to comment #0) > The following line: > InterlockedIncrement64( &g_pm_stat.pdata[INTR_TOTAL] ); > Cause a crash in the function ib_cq_thread() > When compiling WSD without performance counter, the app runs succssfully. > I couldn't debug the application (yet). Can you provide a backlog, or any information about the crash? Information like the exception code and the address of the variable passed to the function would help. - Fab ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From zach.brown at oracle.com Mon Jul 3 15:50:20 2006
From: zach.brown at oracle.com (Zach Brown)
Date: Mon, 3 Jul 2006 15:50:20 -0700 (PDT)
Subject: [openib-general] [PATCH] mthca: initialize send and receive queue locks separately
Message-ID: <20060703225019.7379.96075.sendpatchset@tetsuo.zabbo.net>

mthca: initialize send and receive queue locks separately

lockdep identifies a lock by the call site of its initialization. By
initializing the send and receive queue locks in mthca_wq_init() we confuse
lockdep. It warns that the ordered acquisition of both locks in
mthca_modify_qp() is a recursive acquisition of one lock:

=============================================
[ INFO: possible recursive locking detected ]
---------------------------------------------
modprobe/1192 is trying to acquire lock:
 (&wq->lock){....}, at: [] mthca_modify_qp+0x60/0xa7b [ib_mthca]
but task is already holding lock:
 (&wq->lock){....}, at: [] mthca_modify_qp+0x53/0xa7b [ib_mthca]

Initializing the locks separately in mthca_alloc_qp_common() stops the warning
and will let lockdep enforce proper ordering on paths that acquire both locks.
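
The call-site keying described above can be modelled in plain C. This is a
userspace sketch under stated assumptions, not kernel code: struct
class_key, struct model_lock, MODEL_LOCK_INIT() and the helper functions are
all hypothetical names. The only idea carried over is that an init macro
holding a static object gives every textual call site its own key.

```c
/* Hypothetical userspace model; none of these names are kernel APIs. */
struct class_key { char unused; };

struct model_lock {
    const struct class_key *key;   /* identifies the lock's class */
};

/*
 * Each textual use of this macro gets its own static key, so every
 * lock initialized through the same call site shares one class.
 */
#define MODEL_LOCK_INIT(l)                      \
    do {                                        \
        static struct class_key site_key;       \
        (l)->key = &site_key;                   \
    } while (0)

struct model_wq { struct model_lock lock; };

/* One call site: every queue initialized here lands in the same class. */
static void wq_init_common(struct model_wq *wq)
{
    MODEL_LOCK_INIT(&wq->lock);
}

/* Two call sites: send and receive locks get distinct classes. */
static void qp_init_separate(struct model_wq *sq, struct model_wq *rq)
{
    MODEL_LOCK_INIT(&sq->lock);
    MODEL_LOCK_INIT(&rq->lock);
}

/* Return 1 if both queue locks belong to the same class. */
static int same_class(const struct model_wq *a, const struct model_wq *b)
{
    return a->lock.key == b->lock.key;
}
```

In this model, two queues set up through wq_init_common() are
indistinguishable by class, which mirrors the "recursive locking" report,
while qp_init_separate() keeps them apart.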
Signed-off-by: Zach Brown --- drivers/infiniband/hw/mthca/mthca_qp.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) Index: 2.6.17-mm6/drivers/infiniband/hw/mthca/mthca_qp.c =================================================================== --- 2.6.17-mm6.orig/drivers/infiniband/hw/mthca/mthca_qp.c 2006-07-03 08:41:16.000000000 -0400 +++ 2.6.17-mm6/drivers/infiniband/hw/mthca/mthca_qp.c 2006-07-03 10:05:52.000000000 -0400 @@ -224,7 +224,7 @@ static void mthca_wq_init(struct mthca_wq *wq) { - spin_lock_init(&wq->lock); + /* mthca_alloc_qp_common() initializes the locks */ wq->next_ind = 0; wq->last_comp = wq->max - 1; wq->head = 0; @@ -1114,6 +1114,9 @@ qp->sq_policy = send_policy; mthca_wq_init(&qp->sq); mthca_wq_init(&qp->rq); + /* these are initialized separately so lockdep can tell them apart */ + spin_lock_init(&qp->sq.lock); + spin_lock_init(&qp->rq.lock); ret = mthca_map_memfree(dev, qp); if (ret) From mst at mellanox.co.il Mon Jul 3 21:55:47 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 4 Jul 2006 07:55:47 +0300 Subject: [openib-general] [PATCH 38 of 39] IB/ipath - More changes to support InfiniPath on PowerPC 970 systems In-Reply-To: <20060703222506.GD31081@krispykreme> References: <20060629.145319.71091846.davem@davemloft.net> <1151618499.10886.26.camel@chalcedony.pathscale.com> <20060629.150417.78710870.davem@davemloft.net> <20060703222506.GD31081@krispykreme> Message-ID: <20060704045547.GA4325@mellanox.co.il> Quoting r. Anton Blanchard : > Subject: Re: [PATCH 38 of 39] IB/ipath - More changes to support InfiniPath on PowerPC 970 systems > > > Hi, > > > Please fix the generic code if it doesn't provide the facility > > you need at the moment. Don't shoe horn it into your driver > > just to make up for that. > > Ive had 3 drivers asking for write combining recently so I agree this is > a good idea. How about ioremap_wc as suggested by Willy: > > http://marc.theaimsgroup.com/?l=linux-kernel&m=114374741828040&w=2 Hmm ... 
I think ioremap_wc isn't sufficient by itself to make drivers that use
write-combined memory portable. Here's another thread on a related subject:
http://lkml.org/lkml/2006/2/24/347

--
MST

From mst at mellanox.co.il Tue Jul 4 00:03:28 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 4 Jul 2006 10:03:28 +0300
Subject: [openib-general] [PATCH] mthca: initialize send and receive queue locks separately
In-Reply-To: <20060703225019.7379.96075.sendpatchset@tetsuo.zabbo.net>
References: <20060703225019.7379.96075.sendpatchset@tetsuo.zabbo.net>
Message-ID: <20060704070328.GG21049@mellanox.co.il>

Quoting r. Zach Brown :
> Subject: [PATCH] mthca: initialize send and receive queue locks separately
>
> mthca: initialize send and receive queue locks separately
>
> lockdep identifies a lock by the call site of its initialization. By
> initializing the send and receive queue locks in mthca_wq_init() we confuse
> lockdep. It warns that the ordered acquisition of both locks in
> mthca_modify_qp() is a recursive acquisition of one lock:
>
> =============================================
> [ INFO: possible recursive locking detected ]
> ---------------------------------------------
> modprobe/1192 is trying to acquire lock:
> (&wq->lock){....}, at: [] mthca_modify_qp+0x60/0xa7b [ib_mthca]
> but task is already holding lock:
> (&wq->lock){....}, at: [] mthca_modify_qp+0x53/0xa7b [ib_mthca]

Is this mthca code unique? Wouldn't it be better to teach lockdep about this
scenario somehow?

> Initializing the locks separately in mthca_alloc_qp_common() stops the warning
> and will let lockdep enforce proper ordering on paths that acquire both locks.
>
> Signed-off-by: Zach Brown

This moves code out of a common function and so results in code duplication
and has memory cost.
> --- > > drivers/infiniband/hw/mthca/mthca_qp.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > Index: 2.6.17-mm6/drivers/infiniband/hw/mthca/mthca_qp.c > =================================================================== > --- 2.6.17-mm6.orig/drivers/infiniband/hw/mthca/mthca_qp.c 2006-07-03 08:41:16.000000000 -0400 > +++ 2.6.17-mm6/drivers/infiniband/hw/mthca/mthca_qp.c 2006-07-03 10:05:52.000000000 -0400 > @@ -224,7 +224,7 @@ > > static void mthca_wq_init(struct mthca_wq *wq) > { > - spin_lock_init(&wq->lock); > + /* mthca_alloc_qp_common() initializes the locks */ > wq->next_ind = 0; > wq->last_comp = wq->max - 1; > wq->head = 0; And then we'll have to remember to update this comment when lock is moved to another place? > @@ -1114,6 +1114,9 @@ > qp->sq_policy = send_policy; > mthca_wq_init(&qp->sq); > mthca_wq_init(&qp->rq); > + /* these are initialized separately so lockdep can tell them apart */ > + spin_lock_init(&qp->sq.lock); > + spin_lock_init(&qp->rq.lock); > > ret = mthca_map_memfree(dev, qp); > if (ret) > Looks wrong, to me. Is it a good idea to fix correct code? Assuming its important, can we maybe add some annotations to make lockdep shut up, instead? -- MST From mingo at elte.hu Tue Jul 4 01:56:53 2006 From: mingo at elte.hu (Ingo Molnar) Date: Tue, 4 Jul 2006 10:56:53 +0200 Subject: [openib-general] [PATCH] mthca: initialize send and receive queue locks separately In-Reply-To: <20060704070328.GG21049@mellanox.co.il> References: <20060703225019.7379.96075.sendpatchset@tetsuo.zabbo.net> <20060704070328.GG21049@mellanox.co.il> Message-ID: <20060704085653.GA13426@elte.hu> * Michael S. Tsirkin wrote: > > Initializing the locks separately in mthca_alloc_qp_common() stops the warning > > and will let lockdep enforce proper ordering on paths that acquire both locks. > > > > Signed-off-by: Zach Brown > > This moves code out of a common function and so results in code > duplication and has memory cost. 
the patch below does the same via the lockdep_set_class() method, which has no cost on non-lockdep kernels. Ingo ----------------> Subject: lockdep: annotate drivers/infiniband/hw/mthca/mthca_qp.c From: Ingo Molnar annotate mthca queue locks: split them into send and receive locks. (both can be held at once, but there is ordering between them which lockdep enforces) Has no effect on non-lockdep kernels. Signed-off-by: Ingo Molnar --- drivers/infiniband/hw/mthca/mthca_qp.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) Index: linux/drivers/infiniband/hw/mthca/mthca_qp.c =================================================================== --- linux.orig/drivers/infiniband/hw/mthca/mthca_qp.c +++ linux/drivers/infiniband/hw/mthca/mthca_qp.c @@ -222,9 +222,15 @@ static void *get_send_wqe(struct mthca_q (PAGE_SIZE - 1)); } -static void mthca_wq_init(struct mthca_wq *wq) +/* + * Send and receive queues for two different lock classes: + */ +static struct lock_class_key mthca_wq_send_lock_class, mthca_wq_recv_lock_class; + +static void mthca_wq_init(struct mthca_wq *wq, struct lock_class_key *key) { spin_lock_init(&wq->lock); + lockdep_set_class(&wq->lock, key); wq->next_ind = 0; wq->last_comp = wq->max - 1; wq->head = 0; @@ -845,10 +851,10 @@ int mthca_modify_qp(struct ib_qp *ibqp, mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq), qp->qpn, qp->ibqp.srq ? 
to_msrq(qp->ibqp.srq) : NULL); - mthca_wq_init(&qp->sq); + mthca_wq_init(&qp->sq, &mthca_wq_send_lock_class); qp->sq.last = get_send_wqe(qp, qp->sq.max - 1); - mthca_wq_init(&qp->rq); + mthca_wq_init(&qp->rq, &mthca_wq_recv_lock_class); qp->rq.last = get_recv_wqe(qp, qp->rq.max - 1); if (mthca_is_memfree(dev)) { @@ -1112,8 +1118,8 @@ static int mthca_alloc_qp_common(struct qp->atomic_rd_en = 0; qp->resp_depth = 0; qp->sq_policy = send_policy; - mthca_wq_init(&qp->sq); - mthca_wq_init(&qp->rq); + mthca_wq_init(&qp->sq, &mthca_wq_send_lock_class); + mthca_wq_init(&qp->rq, &mthca_wq_recv_lock_class); ret = mthca_map_memfree(dev, qp); if (ret) From mst at mellanox.co.il Tue Jul 4 02:42:19 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 4 Jul 2006 12:42:19 +0300 Subject: [openib-general] [PATCH] mthca: initialize send and receive queue locks separately In-Reply-To: <20060704085653.GA13426@elte.hu> References: <20060704085653.GA13426@elte.hu> Message-ID: <20060704094219.GO21049@mellanox.co.il> Quoting r. Ingo Molnar : > Subject: Re: [PATCH] mthca: initialize send and receive queue locks separately > > > * Michael S. Tsirkin wrote: > > > > Initializing the locks separately in mthca_alloc_qp_common() stops the warning > > > and will let lockdep enforce proper ordering on paths that acquire both locks. > > > > > > Signed-off-by: Zach Brown > > > > This moves code out of a common function and so results in code > > duplication and has memory cost. > > the patch below does the same via the lockdep_set_class() method, which > has no cost on non-lockdep kernels. > > Ingo > > ----------------> > Subject: lockdep: annotate drivers/infiniband/hw/mthca/mthca_qp.c > From: Ingo Molnar > > annotate mthca queue locks: split them into send and receive locks. > > (both can be held at once, but there is ordering between them which > lockdep enforces) I find this capability of lockdep very useful. > Has no effect on non-lockdep kernels. Hmm ... 
adding parameters to function still has text cost, I think. No? > Signed-off-by: Ingo Molnar > --- > drivers/infiniband/hw/mthca/mthca_qp.c | 16 +++++++++++----- > 1 file changed, 11 insertions(+), 5 deletions(-) > > Index: linux/drivers/infiniband/hw/mthca/mthca_qp.c > =================================================================== > --- linux.orig/drivers/infiniband/hw/mthca/mthca_qp.c > +++ linux/drivers/infiniband/hw/mthca/mthca_qp.c > @@ -222,9 +222,15 @@ static void *get_send_wqe(struct mthca_q > (PAGE_SIZE - 1)); > } > > -static void mthca_wq_init(struct mthca_wq *wq) > +/* > + * Send and receive queues for two different lock classes: > + */ > +static struct lock_class_key mthca_wq_send_lock_class, mthca_wq_recv_lock_class; > + Does this still have a small cost in data size on non-lockdep kernels, as well? If yes, maybe some typedef/macro magic can be used to put this struct in an unused elf section for such kernels? > +static void mthca_wq_init(struct mthca_wq *wq, struct lock_class_key *key) > { > spin_lock_init(&wq->lock); > + lockdep_set_class(&wq->lock, key); > wq->next_ind = 0; > wq->last_comp = wq->max - 1; > wq->head = 0; > @@ -845,10 +851,10 @@ int mthca_modify_qp(struct ib_qp *ibqp, > mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq), qp->qpn, > qp->ibqp.srq ? 
to_msrq(qp->ibqp.srq) : NULL); > > - mthca_wq_init(&qp->sq); > + mthca_wq_init(&qp->sq, &mthca_wq_send_lock_class); > qp->sq.last = get_send_wqe(qp, qp->sq.max - 1); > > - mthca_wq_init(&qp->rq); > + mthca_wq_init(&qp->rq, &mthca_wq_recv_lock_class); > qp->rq.last = get_recv_wqe(qp, qp->rq.max - 1); > > if (mthca_is_memfree(dev)) { > @@ -1112,8 +1118,8 @@ static int mthca_alloc_qp_common(struct > qp->atomic_rd_en = 0; > qp->resp_depth = 0; > qp->sq_policy = send_policy; > - mthca_wq_init(&qp->sq); > - mthca_wq_init(&qp->rq); > + mthca_wq_init(&qp->sq, &mthca_wq_send_lock_class); > + mthca_wq_init(&qp->rq, &mthca_wq_recv_lock_class); > > ret = mthca_map_memfree(dev, qp); > if (ret) > > I'm pretty sure this still adds to code footprint due to extra function parameters even on non-lockdep kernels. Will the following work? @@ -1112,8 +1118,8 @@ static int mthca_alloc_qp_common(struct qp->atomic_rd_en = 0; qp->resp_depth = 0; qp->sq_policy = send_policy; mthca_wq_init(&qp->sq); + lockdep_set_class(&qp->sq.lock, &mthca_wq_send_lock_class); mthca_wq_init(&qp->rq); + lockdep_set_class(&qp->rq.lock, &mthca_wq_recv_lock_class); -- MST From AparnaA at KPITCummins.com Tue Jul 4 04:28:16 2006 From: AparnaA at KPITCummins.com (Aparna Argade) Date: Tue, 4 Jul 2006 16:58:16 +0530 Subject: [openib-general] Using ITAPI to communicate with IB subsystem Message-ID: <4A1BE23A7B777442B60F4B4916AE0F130CB95429@sohm.kpit.com> Hi, I am newly working on SRP protocol and I have to use IT-API to communicate with IB(infiniband) subsystem. I have read ITAPI documentation at top level. But I am not getting clear idea about how exactly I can use ITAPIs to communicate with IB subsystem. Do I need to completely replace calls (in ib_srp.c) like ib_send_cm_reg, ib_post_recv, ib_post_send etc. calls with equivalent APIs in ITAPI? Are there any useful guidelines/tips/samples for using ITAPI to communicate with IB subsystem? Thanks in advance. 
Regards,
Aparna

From mingo at elte.hu Tue Jul 4 04:56:56 2006
From: mingo at elte.hu (Ingo Molnar)
Date: Tue, 4 Jul 2006 13:56:56 +0200
Subject: [openib-general] [PATCH] mthca: initialize send and receive queue locks separately
In-Reply-To: <20060704094219.GO21049@mellanox.co.il>
References: <20060704085653.GA13426@elte.hu> <20060704094219.GO21049@mellanox.co.il>
Message-ID: <20060704115656.GA1539@elte.hu>

* Michael S. Tsirkin wrote:

> > Has no effect on non-lockdep kernels.
>
> Hmm ... adding parameters to function still has text cost, I think. No?

it shouldnt - it's a static function and the parameter is unused _and_
is of a type that is zero-size [on non-lockdep kernels] - gcc ought to
be able to optimize it out.

	Ingo

From dotanb at mellanox.co.il Tue Jul 4 05:36:48 2006
From: dotanb at mellanox.co.il (Dotan Barak)
Date: Tue, 4 Jul 2006 15:36:48 +0300
Subject: [openib-general] [libmthca] patch to check the value of cqe in create_cq/modify_cq
Message-ID: <200607041536.48985.dotanb@mellanox.co.il>

Passing huge size values to create_cq/resize_cq causes a hang in
align_cq_size. Fix this by validating input, similar to what we do
for create_qp.

Signed-off-by: Dotan Barak
Signed-off-by: Michael S.
Tsirkin

Index: last_stable/src/userspace/libmthca/src/verbs.c
===================================================================
--- last_stable.orig/src/userspace/libmthca/src/verbs.c 2006-07-04 15:22:20.000000000 +0300
+++ last_stable/src/userspace/libmthca/src/verbs.c 2006-07-04 15:23:49.000000000 +0300
@@ -174,6 +174,10 @@ struct ibv_cq *mthca_create_cq(struct ib
 	struct mthca_cq *cq;
 	int ret;

+	/* Sanity check CQ size before proceeding */
+	if (cqe > 131072)
+		return NULL;
+
 	cq = malloc(sizeof *cq);
 	if (!cq)
 		return NULL;
@@ -264,6 +268,10 @@ int mthca_resize_cq(struct ibv_cq *ibcq,
 	int old_cqe;
 	int ret;

+	/* Sanity check CQ size before proceeding */
+	if (cqe > 131072)
+		return EINVAL;
+
 	pthread_spin_lock(&cq->lock);

 	cqe = align_cq_size(cqe);

From mst at mellanox.co.il Tue Jul 4 05:52:22 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 4 Jul 2006 15:52:22 +0300
Subject: [openib-general] [PATCH] mthca: initialize send and receive queue locks separately
In-Reply-To: <20060704115656.GA1539@elte.hu>
References: <20060704115656.GA1539@elte.hu>
Message-ID: <20060704125222.GZ21049@mellanox.co.il>

Quoting r. Ingo Molnar :
> Subject: Re: [PATCH] mthca: initialize send and receive queue locks separately
>
>
> * Michael S. Tsirkin wrote:
>
> > > Has no effect on non-lockdep kernels.
> >
> > Hmm ... adding parameters to function still has text cost, I think. No?
>
> it shouldnt - it's a static function and the parameter is unused _and_
> is of a type that is zero-size [on non-lockdep kernels] - gcc ought to
> be able to optimize it out.

Ingo, you are right, and I just checked this with several gcc versions.
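
For reference, the property being checked here can be sketched in user space. The names below are hypothetical stand-ins, not the kernel's definitions, and the sketch assumes GNU C, where an empty struct has size zero; that is how a lock class key can cost no data on a kernel built without lockdep. Whether the unused argument also disappears from the generated code still depends on the compiler and optimisation level.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the kernel's lock class key on a build
 * without lockdep: an empty struct, which GNU C gives size zero.
 */
struct demo_lock_class_key { };

/* Simplified, hypothetical work queue; the real struct mthca_wq differs. */
struct demo_wq {
	int      lock;       /* placeholder for a spinlock */
	unsigned max;
	unsigned next_ind;
	unsigned last_comp;
	unsigned head;
	unsigned tail;
};

/* One key per lock class: send queues and receive queues. */
static struct demo_lock_class_key demo_send_class, demo_recv_class;

/* With lockdep disabled the annotation expands to nothing useful. */
#define demo_lockdep_set_class(lock, key) \
	do { (void)(lock); (void)(key); } while (0)

/* Same shape as the annotated init helper: the key only feeds the annotation. */
static inline void demo_wq_init(struct demo_wq *wq,
				struct demo_lock_class_key *key)
{
	assert(wq != NULL);
	wq->lock = 0;
	demo_lockdep_set_class(&wq->lock, key);
	wq->next_ind  = 0;
	wq->last_comp = wq->max - 1;
	wq->head      = 0;
	wq->tail      = 0;
}
```

Calling demo_wq_init(&sq, &demo_send_class) and demo_wq_init(&rq, &demo_recv_class) initialises both queues identically; only a lockdep-enabled build would see the two keys as distinct classes.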
More power to you :)

--
MST

From bugzilla-daemon at openib.org Tue Jul 4 06:02:32 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Tue, 4 Jul 2006 06:02:32 -0700 (PDT)
Subject: [openib-general] [Bug 167] New: HPL test failed over WSD
Message-ID: <20060704130232.BD771228742@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=167

           Summary: HPL test failed over WSD
           Product: OpenFabrics Windows
           Version: unspecified
          Platform: X86-64
        OS/Version: Other
            Status: NEW
          Severity: blocker
          Priority: P2
         Component: WSD
        AssignedTo: bugzilla at openib.org
        ReportedBy: evgeniyge at voltaire.com
                CC: evgeniyge at voltaire.com

scenario: 12 processes
command line: mpiexec -hosts 12 host1 host2 ..... host12 hpl.exe

hpl parameters(HPL.dat) :

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
10000        Ns
1            # of NBs
220          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
3            Ps
4            Qs
16.0         threshold
1            # of panel fact
0            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
2            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
0            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
0            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From mst at mellanox.co.il Tue Jul 4 05:59:57 2006
From: mst at mellanox.co.il (Michael S.
Tsirkin)
Date: Tue, 4 Jul 2006 15:59:57 +0300
Subject: [openib-general] [PATCH updated] libmthca: create_cq/resize_cq input validation
In-Reply-To: <200607041536.48985.dotanb@mellanox.co.il>
References: <200607041536.48985.dotanb@mellanox.co.il>
Message-ID: <20060704125957.GA21049@mellanox.co.il>

Here's an updated version of Dotan's patch (that was line wrapped).

---

Passing huge size values to create_cq/resize_cq causes a hang in
align_cq_size. Fix this by validating input, similar to what we do
for create_qp.

Signed-off-by: Dotan Barak
Signed-off-by: Michael S. Tsirkin

Index: last_stable/src/userspace/libmthca/src/verbs.c
===================================================================
--- last_stable.orig/src/userspace/libmthca/src/verbs.c 2006-07-04 15:22:20.000000000 +0300
+++ last_stable/src/userspace/libmthca/src/verbs.c 2006-07-04 15:23:49.000000000 +0300
@@ -174,6 +174,9 @@ struct ibv_cq *mthca_create_cq(struct ib
 	struct mthca_cq *cq;
 	int ret;

+	if (cqe > 131072)
+		return NULL;
+
 	cq = malloc(sizeof *cq);
 	if (!cq)
 		return NULL;
@@ -264,6 +268,9 @@ int mthca_resize_cq(struct ibv_cq *ibcq,
 	int old_cqe;
 	int ret;

+	if (cqe > 131072)
+		return EINVAL;
+
 	pthread_spin_lock(&cq->lock);

 	cqe = align_cq_size(cqe);

--
MST

From halr at voltaire.com Tue Jul 4 06:04:02 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 04 Jul 2006 09:04:02 -0400
Subject: [openib-general] [PATCH] OpenSM: Support LMC > 0 on enhanced switch port 0
Message-ID: <1152018240.4533.66836.camel@hal.voltaire.com>

OpenSM: Support LMC > 0 on enhanced switch port 0

Allow enhanced switch port 0 (ESP0) to have a non zero LMC. Use the
configured subnet wide LMC for this. Modifications were necessary to
the LID assignment and routing to support this.

Also, add an option to the configuration to use LMC configured for
subnet for enhanced switch port 0 or set it to 0 even if a non zero
LMC is configured for the subnet. The default is currently the
latter option.
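
To make the LMC arithmetic concrete: a port with LMC n answers to 2^n consecutive LIDs, starting at a base LID aligned to 2^n. The following is only a sketch of that arithmetic and of the ESP0 policy described above; the function names are hypothetical and this is not OpenSM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of LIDs a port owns for a given LMC (a 3-bit field, 0..7). */
static inline uint16_t demo_lmc_num_lids(uint8_t lmc)
{
	assert(lmc <= 7);
	return (uint16_t)(1u << lmc);
}

/* Mask that clears the low 'lmc' bits of a LID. */
static inline uint16_t demo_lmc_mask(uint8_t lmc)
{
	assert(lmc <= 7);
	return (uint16_t)(0xffffu << lmc);
}

/* A base LID is usable only if it is aligned to the LMC range size. */
static inline bool demo_lid_is_aligned(uint16_t base_lid, uint8_t lmc)
{
	return (base_lid & demo_lmc_mask(lmc)) == base_lid;
}

/* Highest LID in the range that starts at base_lid. */
static inline uint16_t demo_lmc_max_lid(uint16_t base_lid, uint8_t lmc)
{
	return (uint16_t)(base_lid + demo_lmc_num_lids(lmc) - 1);
}

/*
 * LMC to program on switch port 0: the subnet LMC only when the port is
 * an enhanced port 0 and the lmc_esp0 option is set, otherwise 0.
 */
static inline uint8_t demo_sp0_lmc(uint8_t subnet_lmc, bool sp0_enhanced,
				   bool lmc_esp0)
{
	return (sp0_enhanced && lmc_esp0) ? subnet_lmc : 0;
}
```

With a subnet LMC of 2, an enhanced port 0 with base LID 8 would cover LIDs 8 through 11 when lmc_esp0 is TRUE, and only LID 8 with the default setting.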
Signed-off-by: Hal Rosenstock Index: include/opensm/osm_subnet.h =================================================================== --- include/opensm/osm_subnet.h (revision 8357) +++ include/opensm/osm_subnet.h (working copy) @@ -237,6 +237,7 @@ typedef struct _osm_subn_opt uint32_t transaction_timeout; uint8_t sm_priority; uint8_t lmc; + boolean_t lmc_esp0; uint8_t max_op_vls; boolean_t reassign_lids; boolean_t reassign_lfts; @@ -314,12 +315,17 @@ typedef struct _osm_subn_opt * lmc * The LMC value used on this subnet. * +* lmc_esp0 +* Whether LMC value used on subnet should be used for +* enhanced switch port 0 or not. If TRUE, it is used. +* Otherwise (the default), LMC is set to 0 for ESP0. +* * max_op_vls * Limit the maximal operational VLs. default is 1. * * reassign_lids * If TRUE cause all lids to be re-assigend. -* Otherwise (the default) +* Otherwise (the default), * OpenSM always tries to preserve as LIDs as much as possible. * * reassign_lfts Index: opensm/osm_lid_mgr.c =================================================================== --- opensm/osm_lid_mgr.c (revision 8357) +++ opensm/osm_lid_mgr.c (working copy) @@ -94,6 +94,7 @@ #include #include #include +#include #include #include #include @@ -351,6 +352,7 @@ __osm_lid_mgr_init_sweep( osm_lid_mgr_range_t *p_range = NULL; osm_port_t *p_port; cl_qmap_t *p_port_guid_tbl; + osm_switch_t *p_sw; uint8_t lmc_num_lids = (uint8_t)(1 << p_mgr->p_subn->opt.lmc); uint16_t lmc_mask; uint16_t req_lid, num_lids; @@ -436,7 +438,19 @@ __osm_lid_mgr_init_sweep( IB_NODE_TYPE_SWITCH ) num_lids = lmc_num_lids; else - num_lids = 1; + { + /* Determine if enhanced switch port 0 */ + p_sw = osm_get_switch_by_guid(p_mgr->p_subn, + osm_node_get_node_guid(osm_port_get_parent_node(p_port))); + if (p_mgr->p_subn->opt.lmc_esp0 && osm_switch_is_sp0_enhanced(p_sw)) + { + num_lids = lmc_num_lids; + } + else + { + num_lids = 1; + } + } if ((num_lids != 1) && (((db_min_lid & lmc_mask) != db_min_lid) || @@ -539,7 +553,18 @@ 
__osm_lid_mgr_init_sweep( } else { - num_lids = 1; + /* Determine if enhanced switch port 0 */ + p_sw = osm_get_switch_by_guid(p_mgr->p_subn, + osm_node_get_node_guid(osm_port_get_parent_node(p_port))); + if (p_mgr->p_subn->opt.lmc_esp0 && osm_switch_is_sp0_enhanced(p_sw)) + { + disc_max_lid = disc_min_lid + lmc_num_lids - 1; + num_lids = lmc_num_lids; + } + else + { + num_lids = 1; + } } /* Make sure the lid is aligned */ @@ -798,6 +823,7 @@ __osm_lid_mgr_get_port_lid( uint8_t num_lids = (1 << p_mgr->p_subn->opt.lmc); int lid_changed = 0; uint16_t lmc_mask; + osm_switch_t *p_sw; OSM_LOG_ENTER( p_mgr->p_log, __osm_lid_mgr_get_port_lid ); @@ -809,10 +835,18 @@ __osm_lid_mgr_get_port_lid( /* get the lid from the guid2lid */ guid = cl_ntoh64( osm_port_get_guid( p_port ) ); - /* if the port is a switch then we only need one lid */ + /* if the port is a base switch port 0 then we only need one lid */ if( osm_node_get_type( osm_port_get_parent_node( p_port ) ) == IB_NODE_TYPE_SWITCH ) - num_lids = 1; + { + /* Determine if base switch port 0 */ + p_sw = osm_get_switch_by_guid(p_mgr->p_subn, + osm_node_get_node_guid(osm_port_get_parent_node(p_port))); + if (!osm_switch_is_sp0_enhanced(p_sw)) + { + num_lids = 1; + } + } /* if the port matches the guid2lid */ if (!osm_db_guid2lid_get( p_mgr->p_g2l, guid, &min_lid, &max_lid)) @@ -954,6 +988,7 @@ __osm_lid_mgr_set_physp_pi( const ib_port_info_t* p_old_pi; osm_madw_context_t context; osm_node_t* p_node; + osm_switch_t* p_sw; ib_api_status_t status; uint8_t mtu; uint8_t op_vls; @@ -1182,6 +1217,18 @@ __osm_lid_mgr_set_physp_pi( "__osm_lid_mgr_set_physp_pi: " "Updating neighbor_mtu on switch port 0 to:%u\n", ib_port_info_get_neighbor_mtu( p_pi ) ); + + /* Determine if enhanced switch port 0 and if so set LMC */ + p_sw = osm_get_switch_by_guid( p_mgr->p_subn, p_node->node_info.node_guid ); + if (osm_switch_is_sp0_enhanced(p_sw)) + { + /* M_KeyProtectBits are always zero */ + p_pi->mkey_lmc = p_mgr->p_subn->opt.lmc; + /* Check to 
see if the value we are setting is different than + the value in the port_info. If it is, turn on send_set flag */ + if (memcmp( &p_pi->mkey_lmc, &p_old_pi->mkey_lmc, sizeof(p_pi->mkey_lmc) )) + send_set = TRUE; + } } context.pi_context.node_guid = osm_node_get_node_guid( p_node ); Index: opensm/osm_link_mgr.c =================================================================== --- opensm/osm_link_mgr.c (revision 8357) +++ opensm/osm_link_mgr.c (working copy) @@ -126,6 +126,7 @@ __osm_link_mgr_set_physp_pi( uint8_t port_num; uint8_t mtu; uint8_t op_vls; + boolean_t esp0 = FALSE; boolean_t send_set = FALSE; osm_physp_t *p_remote_physp; @@ -172,6 +173,7 @@ __osm_link_mgr_set_physp_pi( } goto Exit; } + esp0 = TRUE; } /* @@ -236,7 +238,15 @@ __osm_link_mgr_set_physp_pi( sizeof(p_pi->m_key_lease_period) )) send_set = TRUE; - p_pi->mkey_lmc = p_mgr->p_subn->opt.lmc; + if (esp0 == FALSE) + p_pi->mkey_lmc = p_mgr->p_subn->opt.lmc; + else + { + if (p_mgr->p_subn->opt.lmc_esp0) + p_pi->mkey_lmc = p_mgr->p_subn->opt.lmc; + else + p_pi->mkey_lmc = 0; + } if (memcmp( &p_pi->mkey_lmc, &p_old_pi->mkey_lmc, sizeof(p_pi->mkey_lmc) )) send_set = TRUE; Index: opensm/osm_subnet.c =================================================================== --- opensm/osm_subnet.c (revision 8357) +++ opensm/osm_subnet.c (working copy) @@ -449,6 +449,7 @@ osm_subn_set_default_opt( p_opt->max_msg_fifo_timeout = 50*OSM_DEFAULT_TRANS_TIMEOUT_MILLISEC; p_opt->sm_priority = OSM_DEFAULT_SM_PRIORITY; p_opt->lmc = OSM_DEFAULT_LMC; + p_opt->lmc_esp0 = FALSE; p_opt->max_op_vls = OSM_DEFAULT_MAX_OP_VLS; p_opt->reassign_lids = FALSE; p_opt->reassign_lfts = TRUE; @@ -823,6 +824,10 @@ osm_subn_parse_conf_file( "lmc", p_key, p_val, &p_opts->lmc); + __osm_subn_opts_unpack_boolean( + "lmc_esp0", + p_key, p_val, &p_opts->lmc_esp0); + __osm_subn_opts_unpack_uint8( "max_op_vls", p_key, p_val, &p_opts->max_op_vls); @@ -1009,6 +1014,10 @@ osm_subn_write_conf_file( "subnet_prefix 0x%016" PRIx64 "\n\n" "# The LMC value 
used on this subnet\n" "lmc %u\n\n" + "# lmc_esp0 determines whether LMC value used on subnet is used for\n" + "#enhanced switch port 0. If TRUE, LMC value for subnet is used for\n" + "#ESP0. Otherwise, LMC value for ESP0s is 0.\n" + "lmc_esp0 %s\n\n" "# The code of maximal time a packet can live in a switch\n" "# The actual time is 4.096usec * 2^\n" "# The value 0x14 disables this mechanism\n" @@ -1045,6 +1054,7 @@ osm_subn_write_conf_file( cl_ntoh64(p_opts->sm_key), cl_ntoh64(p_opts->subnet_prefix), p_opts->lmc, + p_opts->lmc_esp0 ? "TRUE" : "FALSE", p_opts->packet_life_time, p_opts->vl_stall_count, p_opts->leaf_vl_stall_count, Index: opensm/osm_ucast_mgr.c =================================================================== --- opensm/osm_ucast_mgr.c (revision 8357) +++ opensm/osm_ucast_mgr.c (working copy) @@ -351,7 +351,7 @@ static void __osm_ucast_mgr_dump_tables( } /********************************************************************** - Add each switch's own LID to its LID matrix. + Add each switch's own LID(s) to its LID matrix. 
**********************************************************************/ static void __osm_ucast_mgr_process_hop_0( @@ -361,8 +361,9 @@ __osm_ucast_mgr_process_hop_0( osm_switch_t* const p_sw = (osm_switch_t*)p_map_item; osm_ucast_mgr_t* const p_mgr = (osm_ucast_mgr_t*)context; osm_node_t *p_node; - uint16_t lid_ho; + uint16_t lid_ho, base_lid_ho, max_lid_ho; cl_status_t status; + uint8_t lmc; OSM_LOG_ENTER( p_mgr->p_log, __osm_ucast_mgr_process_hop_0 ); @@ -377,27 +378,35 @@ __osm_ucast_mgr_process_hop_0( */ osm_switch_prepare_path_rebuild( p_sw ); - lid_ho = cl_ntoh16( osm_node_get_base_lid( p_node, 0 ) ); + base_lid_ho = cl_ntoh16( osm_node_get_base_lid( p_node, 0 ) ); + if (osm_switch_is_sp0_enhanced( p_sw )) + lmc = osm_node_get_lmc( p_node, 0 ); + else + lmc = 0; + max_lid_ho = (uint16_t)( base_lid_ho + (1 << lmc) - 1 ); - if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) ) + for (lid_ho = base_lid_ho; lid_ho <= max_lid_ho; lid_ho++) { - osm_log( p_mgr->p_log, OSM_LOG_DEBUG, - "__osm_ucast_mgr_process_hop_0: " - "Processing switch GUID 0x%" PRIx64 ", LID 0x%X\n", - cl_ntoh64( osm_node_get_node_guid( p_node ) ), - lid_ho ); - } + if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) ) + { + osm_log( p_mgr->p_log, OSM_LOG_DEBUG, + "__osm_ucast_mgr_process_hop_0: " + "Processing switch GUID 0x%" PRIx64 ", LID 0x%X\n", + cl_ntoh64( osm_node_get_node_guid( p_node ) ), + lid_ho ); + } - status = osm_switch_set_hops( p_sw, lid_ho, 0, 0 ); - if( status != CL_SUCCESS ) - { - osm_log( p_mgr->p_log, OSM_LOG_ERROR, - "__osm_ucast_mgr_process_hop_0: ERR 3A02: " - "Setting hop count failed (%s) for " - "switch GUID 0x%" PRIx64 ", LID 0x%X\n", - CL_STATUS_MSG( status ), - cl_ntoh64( osm_node_get_node_guid( p_node ) ), - lid_ho ); + status = osm_switch_set_hops( p_sw, lid_ho, 0, 0 ); + if( status != CL_SUCCESS ) + { + osm_log( p_mgr->p_log, OSM_LOG_ERROR, + "__osm_ucast_mgr_process_hop_0: ERR 3A02: " + "Setting hop count failed (%s) for " + "switch GUID 0x%" PRIx64 ", LID 
0x%X\n",
+               CL_STATUS_MSG( status ),
+               cl_ntoh64( osm_node_get_node_guid( p_node ) ),
+               lid_ho );
+    }
   }

   OSM_LOG_EXIT( p_mgr->p_log );
@@ -1095,7 +1104,7 @@ osm_ucast_mgr_process(
            "Starting switches Min Hop Table Assignment\n");

   /*
-    Set the switch matrices for each switch's own port 0 LID,
+    Set the switch matrices for each switch's own port 0 LID(s)
     then set the lid matrices for the each switch's leaf nodes.
   */
   cl_qmap_apply_func( p_sw_guid_tbl,

From bugzilla-daemon at openib.org Tue Jul 4 06:15:52 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Tue, 4 Jul 2006 06:15:52 -0700 (PDT)
Subject: [openib-general] [Bug 167] HPL test failed over WSD
Message-ID: <20060704131552.D6B59228743@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=167

evgeniyge at voltaire.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|bugzilla at openib.org       |ftillier at silverstorm.com

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From dotanb at mellanox.co.il Tue Jul 4 07:50:22 2006
From: dotanb at mellanox.co.il (Dotan Barak)
Date: Tue, 4 Jul 2006 17:50:22 +0300
Subject: [openib-general] [ucm] device file of the ucm is not being created
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30271BBB0@mtlexch01.mtl.com>

Hi Sean.

I have the following configuration:

*************************************************************
Host Architecture : x86_64
Linux Distribution: Red Hat Enterprise Linux AS release 4 (Nahant Update 3)
Kernel Version    : 2.6.9-34.ELsmp
Memory size       : 4037868 kB
Driver Version    : OFED-1.0.1-pre3
HCA ID(s)         : mthca0
HCA model(s)      : 25208
FW version(s)     : 4.7.600
Board(s)          : MT_00A0000001
*************************************************************

in the file openib.conf, i enabled the ucm and restarted the driver.
when i tried to use the ucm, i noticed that the device file of the ucm
wasn't created.

here are the udev rules:

KERNEL="umad*", NAME="infiniband/%k"
KERNEL="issm*", NAME="infiniband/%k"
KERNEL="ucm*", NAME="infiniband/%k", MODE="0666"
KERNEL="uverbs*", NAME="infiniband/%k", MODE="0666"
KERNEL="uat", NAME="infiniband/%k", MODE="0666"
KERNEL="ucma", NAME="infiniband/%k", MODE="0666"
KERNEL="rdma_cm", NAME="infiniband/%k", MODE="0666"

do you know what the problem is?

thanks

Dotan Barak
Software Verification Engineer
Mellanox Technologies
Tel: +972-4-9097200 Ext: 231
Fax: +972-4-9593245
P.O. Box 86 Yokneam 20692 ISRAEL.
Home: +972-77-8841095
Cell: 052-4222383

From zach.brown at oracle.com Tue Jul 4 09:38:49 2006
From: zach.brown at oracle.com (Zach Brown)
Date: Tue, 04 Jul 2006 09:38:49 -0700
Subject: [openib-general] [PATCH] mthca: initialize send and receive queue locks separately
In-Reply-To: <20060704070328.GG21049@mellanox.co.il>
References: <20060703225019.7379.96075.sendpatchset@tetsuo.zabbo.net> <20060704070328.GG21049@mellanox.co.il>
Message-ID: <44AA9999.3060308@oracle.com>

> This moves code out of a common function and so results in code duplication
> and has memory cost.

Of course. I don't care which trade-offs you prefer to maintain as long
as the driver stops yelling at me as the machine boots. That patch was
what Arjan suggested as the simplest.

Also, while looking at this I saw that the locks are being
re-initialized from mthca_modify_qp(). Is that just a side-effect of
relying on mthca_wq_init() to reset the non-lock members? If you're
concerned about microoptimization it seems like this could be avoided.

- z

From mst at mellanox.co.il Tue Jul 4 09:52:34 2006
From: mst at mellanox.co.il (Michael S.
Tsirkin)
Date: Tue, 4 Jul 2006 19:52:34 +0300
Subject: [openib-general] [PATCH] mthca: initialize send and receive queue locks separately
In-Reply-To: <44AA9999.3060308@oracle.com>
References: <44AA9999.3060308@oracle.com>
Message-ID: <20060704165234.GA5935@mellanox.co.il>

Quoting r. Zach Brown :
> Is that just a side-effect of
> relying on mthca_wq_init() to reset the non-lock members?

Yes.

> If you're
> concerned about microoptimization it seems like this could be avoided.

This is off the fast path so I think Roland's idea was we should only
care about code size. But might be worth thinking this over, anyway.
Thanks!

--
MST

From rdreier at cisco.com Tue Jul 4 13:39:34 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 04 Jul 2006 13:39:34 -0700
Subject: [openib-general] [PATCH] mthca: initialize send and receive queue locks separately
In-Reply-To: <44AA9999.3060308@oracle.com> (Zach Brown's message of "Tue, 04 Jul 2006 09:38:49 -0700")
References: <20060703225019.7379.96075.sendpatchset@tetsuo.zabbo.net> <20060704070328.GG21049@mellanox.co.il> <44AA9999.3060308@oracle.com>
Message-ID:

    Zach> Also, while looking at this I saw that the locks are being
    Zach> re-initialized from mthca_modify_qp(). Is that just a
    Zach> side-effect of relying on mthca_wq_init() to reset the
    Zach> non-lock members? If you're concerned about
    Zach> microoptimization it seems like this could be avoided.

I think that is actually a very minor bug you've found. If someone
were posting a work request at the same time as they transitioned a QP
to reset (which is a legitimate if not sensible thing to do), then the
spinlock could get reinitialized while it was held. Which would be
bad.

So I think I like your original patch the best.

 - R.

From mst at mellanox.co.il Tue Jul 4 20:07:45 2006
From: mst at mellanox.co.il (Michael S.
Tsirkin)
Date: Wed, 5 Jul 2006 06:07:45 +0300
Subject: [openib-general] [PATCH] mthca: initialize send and receive queue locks separately
In-Reply-To:
References:
Message-ID: <20060705030745.GA20709@mellanox.co.il>

Quoting r. Roland Dreier :
> I think that is actually a very minor bug you've found. If someone
> were posting a work request at the same time as they transitioned a QP
> to reset (which is a legitimate if not sensible thing to do), then the
> spinlock could get reinitialized while it was held. Which would be
> bad.

I think you can't post work requests to a QP in reset state, since IB
spec forbids this. If you do, it seems bad things will happen anyway,
for example head/tail pointers may get out of sync as mthca_wq_init
clears these.

--
MST

From bpradip at in.ibm.com Wed Jul 5 02:31:01 2006
From: bpradip at in.ibm.com (Pradipta Kumar Banerjee)
Date: Wed, 5 Jul 2006 15:01:01 +0530
Subject: [openib-general] [PATCH 0/2] perftest: enhancement to rdma_bw to allow use of RDMA CM
Message-ID: <20060705093054.GA15470@harry-potter.in.ibm.com>

This patchset allows rdma_bw to use RDMA CM. This patch tries to address
the comments from Michael Tsirkin on the earlier posted patch by Steve
Wise. See the thread below for history.

http://thread.gmane.org/gmane.linux.drivers.openib/25745/focus=25747

This patchset consists of changes to rdma_bw and the Makefile.
1 - rdma_bw.c changes
2 - Makefile changes

Signed-off-by: Pradipta Kumar Banerjee
---

Thanks,
Pradipta Kumar

From bpradip at in.ibm.com Wed Jul 5 02:38:38 2006
From: bpradip at in.ibm.com (Pradipta Kumar Banerjee)
Date: Wed, 5 Jul 2006 15:08:38 +0530
Subject: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bw to allow use of RDMA CM
Message-ID: <20060705093835.GB15470@harry-potter.in.ibm.com>

This patch contains changes to rdma_bw.c to allow use of RDMA CM.
This has been successfully tested with Ammasso iWARP cards and IBM eHCA
IB cards.
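
For readers new to RDMA CM: on the active (client) side, a connection is normally brought up by waiting for three events in order, namely address resolved, route resolved and connection established, and treating anything else as an error. The sketch below models only that ordering as a small state machine. The enum and function names are hypothetical and it does not call librdmacm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event names mirroring the CM events a client waits for. */
enum demo_cm_event {
	DEMO_EV_ADDR_RESOLVED,
	DEMO_EV_ROUTE_RESOLVED,
	DEMO_EV_ESTABLISHED,
	DEMO_EV_OTHER
};

/* Hypothetical client-side connection states. */
enum demo_cm_state {
	DEMO_ST_RESOLVING_ADDR,
	DEMO_ST_RESOLVING_ROUTE,
	DEMO_ST_CONNECTING,
	DEMO_ST_CONNECTED,
	DEMO_ST_FAILED
};

/* Advance one step; any unexpected event moves the client to FAILED. */
static inline enum demo_cm_state demo_cm_step(enum demo_cm_state st,
					      enum demo_cm_event ev)
{
	switch (st) {
	case DEMO_ST_RESOLVING_ADDR:
		return ev == DEMO_EV_ADDR_RESOLVED ?
			DEMO_ST_RESOLVING_ROUTE : DEMO_ST_FAILED;
	case DEMO_ST_RESOLVING_ROUTE:
		return ev == DEMO_EV_ROUTE_RESOLVED ?
			DEMO_ST_CONNECTING : DEMO_ST_FAILED;
	case DEMO_ST_CONNECTING:
		return ev == DEMO_EV_ESTABLISHED ?
			DEMO_ST_CONNECTED : DEMO_ST_FAILED;
	default:
		/* CONNECTED and FAILED are terminal in this sketch. */
		return st;
	}
}

/* Feed a sequence of events through the state machine. */
static inline enum demo_cm_state demo_cm_run(const enum demo_cm_event *evs,
					     size_t n)
{
	enum demo_cm_state st = DEMO_ST_RESOLVING_ADDR;
	size_t i;

	assert(evs != NULL || n == 0);
	for (i = 0; i < n; i++)
		st = demo_cm_step(st, evs[i]);
	return st;
}
```

In a real client each step would also acknowledge the event and release resources on the failure path; this sketch leaves that out.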
Summary of changes

# Added an option (-c|--cma) to enable use of RDMA CM

# Tried to localise the changes to the pp_* routines (as pointed out by
  Michael earlier) as much as possible. No separate *_cma routines as in
  the earlier patch by Steve.

# Added a new structure (struct pp_data) containing the user parameters
  as well as other data required by most of the routines. This makes it
  convenient to pass the parameters between various routines.

# All outputs to stdout/stderr are prefixed with the process-id. This
  helps to sort the output when multiple servers/clients are run from
  the same machine

Signed-off-by: Pradipta Kumar Banerjee
---
Index: perftest/rdma_bw.c ============================================================================= --- rdma_bw.c.org 2006-07-05 12:51:06.000000000 +0530 +++ rdma_bw.c 2006-07-05 13:10:15.000000000 +0530 @@ -53,19 +53,23 @@ #include #include +#include #include "get_clock.h" #define PINGPONG_RDMA_WRID 3 static int page_size; +static pid_t pid; struct pingpong_context { struct ibv_context *context; struct ibv_pd *pd; struct ibv_mr *mr; - struct ibv_cq *cq; + struct ibv_cq *rcq; + struct ibv_cq *scq; struct ibv_qp *qp; + struct ibv_comp_channel *ch; void *buf; unsigned size; int tx_depth; @@ -81,6 +85,29 @@ struct pingpong_dest { unsigned long long vaddr; }; +struct pp_data { + int port; + int ib_port; + unsigned size; + int tx_depth; + int use_cma; + int sockfd; + char *servername; + struct pingpong_dest my_dest; + struct pingpong_dest *rem_dest; + struct ibv_device *ib_dev; + struct rdma_event_channel *cm_channel; + struct rdma_cm_id *cm_id; + +}; + +static void pp_post_recv(struct pingpong_context *); +static void pp_wait_for_done(struct pingpong_context *); +static void pp_send_done(struct pingpong_context *); +static void pp_wait_for_start(struct pingpong_context *); +static void pp_send_start(struct pingpong_context *); +static void pp_close_cma(struct pp_data ); +static struct pingpong_context *pp_init_ctx(void *, struct pp_data
*); static uint16_t pp_get_local_lid(struct pingpong_context *ctx, int port) { @@ -92,7 +119,7 @@ static uint16_t pp_get_local_lid(struct return attr.lid; } -static int pp_client_connect(const char *servername, int port) +static struct pingpong_context *pp_client_connect(struct pp_data *data) { struct addrinfo *res, *t; struct addrinfo hints = { @@ -102,74 +129,186 @@ static int pp_client_connect(const char char *service; int n; int sockfd = -1; + struct rdma_cm_event *event; + struct sockaddr_in sin; + struct pingpong_context *ctx = NULL; + struct rdma_conn_param conn_param; - asprintf(&service, "%d", port); - n = getaddrinfo(servername, service, &hints, &res); + asprintf(&service, "%d", data->port); + n = getaddrinfo(data->servername, service, &hints, &res); if (n < 0) { - fprintf(stderr, "%s for %s:%d\n", gai_strerror(n), servername, port); - return n; + fprintf(stderr, "%d:%s: %s for %s:%d\n", + pid, __func__, gai_strerror(n), + data->servername, data->port); + goto err4; } - for (t = res; t; t = t->ai_next) { - sockfd = socket(t->ai_family, t->ai_socktype, t->ai_protocol); - if (sockfd >= 0) { - if (!connect(sockfd, t->ai_addr, t->ai_addrlen)) - break; - close(sockfd); - sockfd = -1; + if (data->use_cma) { + sin.sin_addr.s_addr = ((struct sockaddr_in*)res->ai_addr)->sin_addr.s_addr; + sin.sin_family = AF_INET; + sin.sin_port = htons(data->port); + if (rdma_resolve_addr(data->cm_id, NULL, + (struct sockaddr *)&sin, 2000)) { + fprintf(stderr, "%d:%s: rdma_resolve_addr failed\n", + pid, __func__ ); + goto err2; + } + + if (rdma_get_cm_event(data->cm_channel, &event)) + goto err2; + + if (event->event != RDMA_CM_EVENT_ADDR_RESOLVED) { + fprintf(stderr, "%d:%s: unexpected CM event %d\n", + pid, __func__, event->event); + goto err1; + } + rdma_ack_cm_event(event); + + if (rdma_resolve_route(data->cm_id, 2000)) { + fprintf(stderr, "%d:%s: rdma_resolve_route failed\n", + pid, __func__); + goto err2; + } + + if (rdma_get_cm_event(data->cm_channel, &event)) + goto err2; 
+ + if (event->event != RDMA_CM_EVENT_ROUTE_RESOLVED) { + fprintf(stderr, "%d:%s: unexpected CM event %d\n", + pid, __func__, event->event); + rdma_ack_cm_event(event); + goto err1; + } + rdma_ack_cm_event(event); + ctx = pp_init_ctx(data->cm_id, data); + if (!ctx) { + fprintf(stderr, "%d:%s: pp_init_ctx failed\n", pid, __func__); + goto err2; + } + data->my_dest.psn = lrand48() & 0xffffff; + data->my_dest.qpn = 0; + data->my_dest.rkey = ctx->mr->rkey; + data->my_dest.vaddr = (uintptr_t)ctx->buf + ctx->size; + + memset(&conn_param, 0, sizeof conn_param); + conn_param.responder_resources = 1; + conn_param.initiator_depth = 1; + conn_param.retry_count = 5; + conn_param.private_data = &data->my_dest; + conn_param.private_data_len = sizeof(data->my_dest); + + if (rdma_connect(data->cm_id, &conn_param)) { + fprintf(stderr, "%d:%s: rdma_connect failure\n", pid, __func__); + goto err2; + } + + if (rdma_get_cm_event(data->cm_channel, &event)) + goto err2; + + if (event->event != RDMA_CM_EVENT_ESTABLISHED) { + fprintf(stderr, "%d:%s: unexpected CM event %d\n", + pid, __func__, event->event); + goto err1; + } + if (!event->private_data || + (event->private_data_len < sizeof(*data->rem_dest))) { + fprintf(stderr, "%d:%s: bad private data ptr %p len %d\n", + pid, __func__, event->private_data, + event->private_data_len); + goto err1; } + data->rem_dest = malloc(sizeof *data->rem_dest); + if (!data->rem_dest) + goto err1; + + memcpy(data->rem_dest, event->private_data, + sizeof(*data->rem_dest)); + rdma_ack_cm_event(event); + + } else { + for (t = res; t; t = t->ai_next) { + sockfd = socket(t->ai_family, t->ai_socktype, + t->ai_protocol); + if (sockfd >= 0) { + if (!connect(sockfd, t->ai_addr, t->ai_addrlen)) + break; + close(sockfd); + sockfd = -1; + } + } + if (sockfd < 0) { + fprintf(stderr, "%d:%s: Couldn't connect to %s:%d\n", + pid, __func__, data->servername, data->port); + goto err3; + } + ctx = pp_init_ctx(data->ib_dev, data); + if (!ctx) + goto err3; + data->sockfd = 
sockfd; } freeaddrinfo(res); + return ctx; + +err1: + rdma_ack_cm_event(event); +err2: + rdma_destroy_id(data->cm_id); + rdma_destroy_event_channel(data->cm_channel); +err3: + freeaddrinfo(res); +err4: + return NULL; - if (sockfd < 0) { - fprintf(stderr, "Couldn't connect to %s:%d\n", servername, port); - return sockfd; - } - return sockfd; } -struct pingpong_dest * pp_client_exch_dest(int sockfd, - const struct pingpong_dest *my_dest) +static int pp_client_exch_dest(struct pp_data *data) { - struct pingpong_dest *rem_dest = NULL; char msg[sizeof "0000:000000:000000:00000000:0000000000000000"]; int parsed; - - sprintf(msg, "%04x:%06x:%06x:%08x:%016Lx", my_dest->lid, my_dest->qpn, - my_dest->psn,my_dest->rkey,my_dest->vaddr); - if (write(sockfd, msg, sizeof msg) != sizeof msg) { - perror("client write"); - fprintf(stderr, "Couldn't send local address\n"); - goto out; - } - - if (read(sockfd, msg, sizeof msg) != sizeof msg) { - perror("client read"); - fprintf(stderr, "Couldn't read remote address\n"); - goto out; - } - - rem_dest = malloc(sizeof *rem_dest); - if (!rem_dest) - goto out; - - parsed = sscanf(msg, "%x:%x:%x:%x:%Lx", &rem_dest->lid, &rem_dest->qpn, - &rem_dest->psn,&rem_dest->rkey,&rem_dest->vaddr); - - if (parsed != 5) { - fprintf(stderr, "Couldn't parse line <%.*s>\n",(int)sizeof msg, - msg); - free(rem_dest); - rem_dest = NULL; - goto out; + + if (!data->use_cma) { + sprintf(msg, "%04x:%06x:%06x:%08x:%016Lx", data->my_dest.lid, + data->my_dest.qpn, data->my_dest.psn, + data->my_dest.rkey, data->my_dest.vaddr); + if (write(data->sockfd, msg, sizeof msg) != sizeof msg) { + perror("client write"); + fprintf(stderr, "%d:%s: Couldn't send local address\n", + pid, __func__); + goto err; + } + + if (read(data->sockfd, msg, sizeof msg) != sizeof msg) { + perror("client read"); + fprintf(stderr, "%d:%s: Couldn't read remote address\n", + pid, __func__); + goto err; + } + + if (data->rem_dest != NULL) + free(data->rem_dest); + data->rem_dest = malloc(sizeof 
*data->rem_dest); + if (!data->rem_dest) + goto err; + + parsed = sscanf(msg, "%x:%x:%x:%x:%Lx", &data->rem_dest->lid, + &data->rem_dest->qpn, &data->rem_dest->psn, + &data->rem_dest->rkey, &data->rem_dest->vaddr); + + if (parsed != 5) { + fprintf(stderr, "%d:%s: Couldn't parse line <%.*s>\n", + pid, __func__, (int)sizeof msg, msg); + free(data->rem_dest); + goto err; + } } -out: - return rem_dest; + return 0; +err: + return 1; } -int pp_server_connect(int port) +static struct pingpong_context *pp_server_connect(struct pp_data *data) { struct addrinfo *res, *t; struct addrinfo hints = { @@ -180,201 +319,338 @@ int pp_server_connect(int port) char *service; int sockfd = -1, connfd; int n; - - asprintf(&service, "%d", port); - n = getaddrinfo(NULL, service, &hints, &res); - - if (n < 0) { - fprintf(stderr, "%s for port %d\n", gai_strerror(n), port); - return n; - } - - for (t = res; t; t = t->ai_next) { - sockfd = socket(t->ai_family, t->ai_socktype, t->ai_protocol); - if (sockfd >= 0) { - n = 1; - - setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &n, sizeof n); - - if (!bind(sockfd, t->ai_addr, t->ai_addrlen)) - break; + struct rdma_cm_event *event; + struct sockaddr_in sin; + struct pingpong_context *ctx = NULL; + struct rdma_cm_id *child_cm_id; + struct rdma_conn_param conn_param; + + asprintf(&service, "%d", data->port); + if ( (n = getaddrinfo(NULL, service, &hints, &res)) < 0 ) { + fprintf(stderr, "%d:%s: %s for port %d\n", pid, __func__, + gai_strerror(n), data->port); + goto err5; + } + + if (data->use_cma) { + sin.sin_addr.s_addr = 0; + sin.sin_family = AF_INET; + sin.sin_port = htons(data->port); + if (rdma_bind_addr(data->cm_id, (struct sockaddr *)&sin)) { + fprintf(stderr, "%d:%s: rdma_bind_addr failed\n", pid, __func__); + goto err3; + } + + if (rdma_listen(data->cm_id, 0)) { + fprintf(stderr, "%d:%s: rdma_listen failed\n", pid, __func__); + goto err3; + } + + if (rdma_get_cm_event(data->cm_channel, &event)) + goto err3; + + if (event->event != 
RDMA_CM_EVENT_CONNECT_REQUEST) { + fprintf(stderr, "%d:%s: bad event waiting for connect request %d\n", + pid, __func__, event->event); + goto err2; + } + + if (!event->private_data || + (event->private_data_len < sizeof(*data->rem_dest))) { + fprintf(stderr, "%d:%s: bad private data len %d\n", pid, + __func__, event->private_data_len); + goto err2; + } + + data->rem_dest = malloc(sizeof *data->rem_dest); + if (!data->rem_dest) + goto err2; + + memcpy(data->rem_dest, event->private_data, sizeof(*data->rem_dest)); + + child_cm_id = (struct rdma_cm_id *)event->id; + ctx = pp_init_ctx(child_cm_id, data); + if (!ctx) { + free(data->rem_dest); + goto err1; + } + data->my_dest.psn = lrand48() & 0xffffff; + data->my_dest.qpn = 0; + data->my_dest.rkey = ctx->mr->rkey; + data->my_dest.vaddr = (uintptr_t)ctx->buf + ctx->size; + + memset(&conn_param, 0, sizeof conn_param); + conn_param.responder_resources = 1; + conn_param.initiator_depth = 1; + conn_param.private_data = &data->my_dest; + conn_param.private_data_len = sizeof(data->my_dest); + if (rdma_accept(child_cm_id, &conn_param)) { + fprintf(stderr, "%d:%s: rdma_accept failed\n", pid, __func__); + goto err1; + } + rdma_ack_cm_event(event); + if (rdma_get_cm_event(data->cm_channel, &event)) { + fprintf(stderr, "%d:%s: rdma_get_cm_event error\n", pid, __func__); + rdma_destroy_id(child_cm_id); + goto err3; + } + if (event->event != RDMA_CM_EVENT_ESTABLISHED) { + fprintf(stderr, "%d:%s: bad event waiting for established %d\n", + pid, __func__, event->event); + goto err1; + } + rdma_ack_cm_event(event); + } else { + for (t = res; t; t = t->ai_next) { + sockfd = socket(t->ai_family, t->ai_socktype, t->ai_protocol); + if (sockfd >= 0) { + n = 1; + + setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &n, sizeof n); + + if (!bind(sockfd, t->ai_addr, t->ai_addrlen)) + break; + close(sockfd); + sockfd = -1; + } + } + + if (sockfd < 0) { + fprintf(stderr, "%d:%s: Couldn't listen to port %d\n", pid, + __func__, data->port); + goto err4; + 
} + + listen(sockfd, 1); + connfd = accept(sockfd, NULL, 0); + if (connfd < 0) { + perror("server accept"); + fprintf(stderr, "%d:%s: accept() failed\n", pid, __func__); close(sockfd); - sockfd = -1; + goto err4; } - } - - freeaddrinfo(res); + + close(sockfd); - if (sockfd < 0) { - fprintf(stderr, "Couldn't listen to port %d\n", port); - return sockfd; + ctx = pp_init_ctx(data->ib_dev, data); + if (!ctx) + goto err4; + data->sockfd = connfd; } + freeaddrinfo(res); + return ctx; - listen(sockfd, 1); - connfd = accept(sockfd, NULL, 0); - if (connfd < 0) { - perror("server accept"); - fprintf(stderr, "accept() failed\n"); - close(sockfd); - return connfd; - } +err1: + rdma_destroy_id(child_cm_id); +err2: + rdma_ack_cm_event(event); +err3: + rdma_destroy_id(data->cm_id); + rdma_destroy_event_channel(data->cm_channel); +err4: + freeaddrinfo(res); +err5: + return NULL; - close(sockfd); - return connfd; } -static struct pingpong_dest *pp_server_exch_dest(int connfd, const struct pingpong_dest *my_dest) +static int pp_server_exch_dest(struct pp_data *data) { char msg[sizeof "0000:000000:000000:00000000:0000000000000000"]; - struct pingpong_dest *rem_dest = NULL; int parsed; int n; + + if (!data->use_cma) { + n = read(data->sockfd, msg, sizeof msg); + if (n != sizeof msg) { + perror("server read"); + fprintf(stderr, "%d:%s: %d/%d Couldn't read remote address\n", + pid, __func__, n, (int) sizeof msg); + goto err; + } + + if (data->rem_dest != NULL) + free(data->rem_dest); + data->rem_dest = malloc(sizeof *data->rem_dest); + if (!data->rem_dest) + goto err; + + parsed = sscanf(msg, "%x:%x:%x:%x:%Lx", &data->rem_dest->lid, + &data->rem_dest->qpn, &data->rem_dest->psn, + &data->rem_dest->rkey, &data->rem_dest->vaddr); + if (parsed != 5) { + fprintf(stderr, "%d:%s: Couldn't parse line <%.*s>\n", pid, + __func__, (int)sizeof msg, msg); + free(data->rem_dest); + goto err; + } + + sprintf(msg, "%04x:%06x:%06x:%08x:%016Lx", data->my_dest.lid, + data->my_dest.qpn, data->my_dest.psn, 
+ data->my_dest.rkey, data->my_dest.vaddr); + if (write(data->sockfd, msg, sizeof msg) != sizeof msg) { + perror("server write"); + fprintf(stderr, "%d:%s: Couldn't send local address\n", + pid, __func__); + free(data->rem_dest); + goto err; + } + } - n = read(connfd, msg, sizeof msg); - if (n != sizeof msg) { - perror("server read"); - fprintf(stderr, "%d/%d: Couldn't read remote address\n", n, (int) sizeof msg); - goto out; - } - - rem_dest = malloc(sizeof *rem_dest); - if (!rem_dest) - goto out; - - parsed = sscanf(msg, "%x:%x:%x:%x:%Lx", &rem_dest->lid, &rem_dest->qpn, - &rem_dest->psn, &rem_dest->rkey, &rem_dest->vaddr); - if (parsed != 5) { - fprintf(stderr, "Couldn't parse line <%.*s>\n",(int)sizeof msg, - msg); - free(rem_dest); - rem_dest = NULL; - goto out; - } - - sprintf(msg, "%04x:%06x:%06x:%08x:%016Lx", my_dest->lid, my_dest->qpn, - my_dest->psn, my_dest->rkey, my_dest->vaddr); - if (write(connfd, msg, sizeof msg) != sizeof msg) { - perror("server write"); - fprintf(stderr, "Couldn't send local address\n"); - free(rem_dest); - rem_dest = NULL; - goto out; - } -out: - return rem_dest; -} - -static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, - unsigned size, - int tx_depth, int port) + return 0; +err: + return 1; +} + +static struct pingpong_context *pp_init_ctx(void *ptr, struct pp_data *data) { struct pingpong_context *ctx; + struct ibv_device *ib_dev; + struct rdma_cm_id *cm_id; ctx = malloc(sizeof *ctx); if (!ctx) return NULL; - ctx->size = size; - ctx->tx_depth = tx_depth; + ctx->size = data->size; + ctx->tx_depth = data->tx_depth; - ctx->buf = memalign(page_size, size * 2); + ctx->buf = memalign(page_size, ctx->size * 2); if (!ctx->buf) { - fprintf(stderr, "Couldn't allocate work buf.\n"); + fprintf(stderr, "%d:%s: Couldn't allocate work buf.\n", + pid, __func__); return NULL; } - memset(ctx->buf, 0, size * 2); + memset(ctx->buf, 0, ctx->size * 2); - ctx->context = ibv_open_device(ib_dev); - if (!ctx->context) { - 
fprintf(stderr, "Couldn't get context for %s\n", - ibv_get_device_name(ib_dev)); - return NULL; + if (data->use_cma) { + cm_id = (struct rdma_cm_id *)ptr; + ctx->context = cm_id->verbs; + if (!ctx->context) { + fprintf(stderr, "%d:%s: Unbound cm_id!!\n", pid, + __func__); + return NULL; + } + + } else { + ib_dev = (struct ibv_device *)ptr; + ctx->context = ibv_open_device(ib_dev); + if (!ctx->context) { + fprintf(stderr, "%d:%s: Couldn't get context for %s\n", + pid, __func__, ibv_get_device_name(ib_dev)); + return NULL; + } } ctx->pd = ibv_alloc_pd(ctx->context); if (!ctx->pd) { - fprintf(stderr, "Couldn't allocate PD\n"); + fprintf(stderr, "%d:%s: Couldn't allocate PD\n", pid, __func__); return NULL; } /* We dont really want IBV_ACCESS_LOCAL_WRITE, but IB spec says: * The Consumer is not allowed to assign Remote Write or Remote Atomic to * a Memory Region that has not been assigned Local Write. */ - ctx->mr = ibv_reg_mr(ctx->pd, ctx->buf, size * 2, + ctx->mr = ibv_reg_mr(ctx->pd, ctx->buf, ctx->size * 2, IBV_ACCESS_REMOTE_WRITE | IBV_ACCESS_LOCAL_WRITE); if (!ctx->mr) { - fprintf(stderr, "Couldn't allocate MR\n"); + fprintf(stderr, "%d:%s: Couldn't allocate MR\n", pid, __func__); return NULL; } - ctx->cq = ibv_create_cq(ctx->context, tx_depth, NULL, NULL, 0); - if (!ctx->cq) { - fprintf(stderr, "Couldn't create CQ\n"); + + ctx->ch = ibv_create_comp_channel(ctx->context); + if (!ctx->ch) { + fprintf(stderr, "%d:%s: Couldn't create comp channel\n", pid, + __func__); return NULL; } - { - struct ibv_qp_init_attr attr = { - .send_cq = ctx->cq, - .recv_cq = ctx->cq, - .cap = { - .max_send_wr = tx_depth, - /* Work around: driver doesnt support - * recv_wr = 0 */ - .max_recv_wr = 1, - .max_send_sge = 1, - .max_recv_sge = 1, - .max_inline_data = 0 - }, - .qp_type = IBV_QPT_RC - }; + ctx->rcq = ibv_create_cq(ctx->context, 1, NULL, NULL, 0); + if (!ctx->rcq) { + fprintf(stderr, "%d:%s: Couldn't create recv CQ\n", pid, + __func__); + return NULL; + } - ctx->qp = 
ibv_create_qp(ctx->pd, &attr); - if (!ctx->qp) { - fprintf(stderr, "Couldn't create QP\n"); - return NULL; - } + ctx->scq = ibv_create_cq(ctx->context, ctx->tx_depth, ctx, ctx->ch, 0); + if (!ctx->scq) { + fprintf(stderr, "%d:%s: Couldn't create send CQ\n", pid, + __func__); + return NULL; } - { - struct ibv_qp_attr attr; - attr.qp_state = IBV_QPS_INIT; - attr.pkey_index = 0; - attr.port_num = port; - attr.qp_access_flags = IBV_ACCESS_REMOTE_WRITE; - - if (ibv_modify_qp(ctx->qp, &attr, - IBV_QP_STATE | - IBV_QP_PKEY_INDEX | - IBV_QP_PORT | - IBV_QP_ACCESS_FLAGS)) { - fprintf(stderr, "Failed to modify QP to INIT\n"); + struct ibv_qp_init_attr attr = { + .send_cq = ctx->scq, + .recv_cq = ctx->rcq, + .cap = { + .max_send_wr = ctx->tx_depth, + /* Work around: driver doesnt support + * recv_wr = 0 */ + .max_recv_wr = 1, + .max_send_sge = 1, + .max_recv_sge = 1, + .max_inline_data = 0 + }, + .qp_type = IBV_QPT_RC + }; + + if (data->use_cma) { + if (rdma_create_qp(cm_id, ctx->pd, &attr)) { + fprintf(stderr, "%d:%s: Couldn't create QP\n", pid, __func__); return NULL; } + ctx->qp = cm_id->qp; + pp_post_recv(ctx); + return ctx; + } else { + ctx->qp = ibv_create_qp(ctx->pd, &attr); + if (!ctx->qp) { + fprintf(stderr, "%d:%s: Couldn't create QP\n", pid, __func__); + return NULL; + } + { + struct ibv_qp_attr attr; + + attr.qp_state = IBV_QPS_INIT; + attr.pkey_index = 0; + attr.port_num = data->ib_port; + attr.qp_access_flags = IBV_ACCESS_REMOTE_WRITE; + + if (ibv_modify_qp(ctx->qp, &attr, + IBV_QP_STATE | + IBV_QP_PKEY_INDEX | + IBV_QP_PORT | + IBV_QP_ACCESS_FLAGS)) { + fprintf(stderr, "%d:%s: Failed to modify QP to INIT\n", + pid, __func__); + return NULL; + } + } + + return ctx; } - - return ctx; + } -static int pp_connect_ctx(struct pingpong_context *ctx, int port, int my_psn, - struct pingpong_dest *dest) +static int pp_connect_ctx(struct pingpong_context *ctx, struct pp_data data) { struct ibv_qp_attr attr; memset(&attr, 0, sizeof attr); attr.qp_state = IBV_QPS_RTR; 
attr.path_mtu = IBV_MTU_2048; - attr.dest_qp_num = dest->qpn; - attr.rq_psn = dest->psn; + attr.dest_qp_num = data.rem_dest->qpn; + attr.rq_psn = data.rem_dest->psn; attr.max_dest_rd_atomic = 1; attr.min_rnr_timer = 12; attr.ah_attr.is_global = 0; - attr.ah_attr.dlid = dest->lid; + attr.ah_attr.dlid = data.rem_dest->lid; attr.ah_attr.sl = 0; attr.ah_attr.src_path_bits = 0; - attr.ah_attr.port_num = port; + attr.ah_attr.port_num = data.ib_port; if (ibv_modify_qp(ctx->qp, &attr, IBV_QP_STATE | IBV_QP_AV | @@ -383,7 +659,7 @@ static int pp_connect_ctx(struct pingpon IBV_QP_RQ_PSN | IBV_QP_MAX_DEST_RD_ATOMIC | IBV_QP_MIN_RNR_TIMER)) { - fprintf(stderr, "Failed to modify QP to RTR\n"); + fprintf(stderr, "%d:%s: Failed to modify QP to RTR\n", pid, __func__); return 1; } @@ -391,7 +667,7 @@ static int pp_connect_ctx(struct pingpon attr.timeout = 14; attr.retry_cnt = 7; attr.rnr_retry = 7; - attr.sq_psn = my_psn; + attr.sq_psn = data.my_dest.psn; attr.max_rd_atomic = 1; if (ibv_modify_qp(ctx->qp, &attr, IBV_QP_STATE | @@ -400,13 +676,172 @@ static int pp_connect_ctx(struct pingpon IBV_QP_RNR_RETRY | IBV_QP_SQ_PSN | IBV_QP_MAX_QP_RD_ATOMIC)) { - fprintf(stderr, "Failed to modify QP to RTS\n"); + fprintf(stderr, "%d:%s: Failed to modify QP to RTS\n", pid, __func__); return 1; } return 0; } +static void pp_post_recv(struct pingpong_context *ctx) +{ + struct ibv_sge list; + struct ibv_recv_wr wr, *bad_wr; + int rc; + + list.addr = (uintptr_t) ctx->buf; + list.length = 1; + list.lkey = ctx->mr->lkey; + wr.next = NULL; + wr.wr_id = 0xdeadbeef; + wr.sg_list = &list; + wr.num_sge = 1; + + rc = ibv_post_recv(ctx->qp, &wr, &bad_wr); + if (rc) { + perror("ibv_post_recv"); + fprintf(stderr, "%d:%s: ibv_post_recv failed %d\n", pid, + __func__, rc); + } +} + +static void pp_wait_for_done(struct pingpong_context *ctx) +{ + struct ibv_wc wc; + int ne; + + do { + usleep(500); + ne = ibv_poll_cq(ctx->rcq, 1, &wc); + } while (ne == 0); + + if (wc.status) + fprintf(stderr, "%d:%s: bad wc 
status %d\n", pid, __func__, + wc.status); + if (!(wc.opcode & IBV_WC_RECV)) + fprintf(stderr, "%d:%s: bad wc opcode %d\n", pid, __func__, + wc.opcode); + if (wc.wr_id != 0xdeadbeef) + fprintf(stderr, "%d:%s: bad wc wr_id 0x%x\n", pid, __func__, + (int)wc.wr_id); +} + +static void pp_send_done(struct pingpong_context *ctx) +{ + struct ibv_send_wr *bad_wr; + struct ibv_wc wc; + int ne; + + ctx->list.addr = (uintptr_t) ctx->buf; + ctx->list.length = 1; + ctx->list.lkey = ctx->mr->lkey; + ctx->wr.wr_id = 0xcafebabe; + ctx->wr.sg_list = &ctx->list; + ctx->wr.num_sge = 1; + ctx->wr.opcode = IBV_WR_SEND; + ctx->wr.send_flags = IBV_SEND_SIGNALED; + ctx->wr.next = NULL; + if (ibv_post_send(ctx->qp, &ctx->wr, &bad_wr)) { + fprintf(stderr, "%d:%s: ibv_post_send failed\n", pid, __func__); + return; + } + do { + usleep(500); + ne = ibv_poll_cq(ctx->scq, 1, &wc); + } while (ne == 0); + + if (wc.status) + fprintf(stderr, "%d:%s: bad wc status %d\n", pid, __func__, + wc.status); + if (wc.opcode != IBV_WC_SEND) + fprintf(stderr, "%d:%s: bad wc opcode %d\n", pid, __func__, + wc.opcode); + if (wc.wr_id != 0xcafebabe) + fprintf(stderr, "%d:%s: bad wc wr_id 0x%x\n", pid, __func__, + (int)wc.wr_id); +} + +static void pp_wait_for_start(struct pingpong_context *ctx) +{ + struct ibv_wc wc; + int ne; + + do { + usleep(500); + ne = ibv_poll_cq(ctx->rcq, 1, &wc); + } while (ne == 0); + + if (wc.status) + fprintf(stderr, "%d:%s: bad wc status %d\n", pid, __func__, + wc.status); + if (!(wc.opcode & IBV_WC_RECV)) + fprintf(stderr, "%d:%s: bad wc opcode %d\n", pid, __func__, + wc.opcode); + if (wc.wr_id != 0xdeadbeef) + fprintf(stderr, "%d:%s: bad wc wr_id 0x%x\n", pid, __func__, + (int)wc.wr_id); + pp_post_recv(ctx); +} + +static void pp_send_start(struct pingpong_context *ctx) +{ + struct ibv_send_wr *bad_wr; + struct ibv_wc wc; + int ne; + + ctx->list.addr = (uintptr_t) ctx->buf; + ctx->list.length = 1; + ctx->list.lkey = ctx->mr->lkey; + ctx->wr.wr_id = 0xabbaabba; + ctx->wr.sg_list = 
&ctx->list; + ctx->wr.num_sge = 1; + ctx->wr.opcode = IBV_WR_SEND; + ctx->wr.send_flags = IBV_SEND_SIGNALED; + ctx->wr.next = NULL; + if (ibv_post_send(ctx->qp, &ctx->wr, &bad_wr)) { + fprintf(stderr, "%d:%s: ibv_post_send failed\n", pid, __func__); + return; + } + do { + usleep(500); + ne = ibv_poll_cq(ctx->scq, 1, &wc); + } while (ne == 0); + + if (wc.status) + fprintf(stderr, "%d:%s: bad wc status %d\n", pid, __func__, + wc.status); + if (wc.opcode != IBV_WC_SEND) + fprintf(stderr, "%d:%s: bad wc opcode %d\n", pid, __func__, + wc.opcode); + if (wc.wr_id != 0xabbaabba) + fprintf(stderr, "%d:%s: bad wc wr_id 0x%x\n", pid, __func__, + (int)wc.wr_id); +} + +static void pp_close_cma(struct pp_data data) +{ + struct rdma_cm_event *event; + int rc; + + if (data.servername) { + rc = rdma_disconnect(data.cm_id); + if (rc) { + perror("rdma_disconnect"); + fprintf(stderr, "%d:%s: rdma disconnect error\n", pid, + __func__); + return; + } + } + + rdma_get_cm_event(data.cm_channel, &event); + if (event->event != RDMA_CM_EVENT_DISCONNECTED) + fprintf(stderr, "%d:%s: unexpected event during disconnect %d\n", + pid, __func__, event->event); + rdma_ack_cm_event(event); + rdma_destroy_id(data.cm_id); + rdma_destroy_event_channel(data.cm_channel); +} + static void usage(const char *argv0) { printf("Usage:\n"); @@ -421,6 +856,7 @@ static void usage(const char *argv0) printf(" -t, --tx-depth= size of tx queue (default 100)\n"); printf(" -n, --iters= number of exchanges (at least 2, default 1000)\n"); printf(" -b, --bidirectional measure bidirectional bandwidth (default unidirectional)\n"); + printf(" -c, --cma use RDMA CM\n"); } static void print_report(unsigned int iters, unsigned size, int duplex, @@ -452,43 +888,46 @@ static void print_report(unsigned int it tsize = duplex ? 
2 : 1; tsize = tsize * size; - printf("Bandwidth peak (#%d to #%d): %g MB/sec\n", + printf("\n%d: Bandwidth peak (#%d to #%d): %g MB/sec\n", pid, opt_posted, opt_completed, tsize * cycles_to_units / opt_delta / 0x100000); - printf("Bandwidth average: %g MB/sec\n", + printf("%d: Bandwidth average: %g MB/sec\n", pid, tsize * iters * cycles_to_units / (tcompleted[iters - 1] - tposted[0]) / 0x100000); - printf("Service Demand peak (#%d to #%d): %ld cycles/KB\n", + printf("%d: Service Demand peak (#%d to #%d): %ld cycles/KB\n", pid, opt_posted, opt_completed, (unsigned long)opt_delta * 1024 / tsize); - printf("Service Demand Avg : %ld cycles/KB\n", + printf("%d: Service Demand Avg : %ld cycles/KB\n", pid, (unsigned long)(tcompleted[iters - 1] - tposted[0]) * - 1024 / (tsize * iters)); + 1024 / (tsize * iters)); } int main(int argc, char *argv[]) { struct ibv_device **dev_list; - struct ibv_device *ib_dev; - struct pingpong_context *ctx; - struct pingpong_dest my_dest; - struct pingpong_dest *rem_dest; + struct pingpong_context *ctx = NULL; char *ib_devname = NULL; - char *servername = NULL; - int port = 18515; - int ib_port = 1; - long long size = 65536; - int tx_depth = 100; int iters = 1000; int scnt, ccnt; - int sockfd; int duplex = 0; struct ibv_qp *qp; - - cycles_t *tposted; - cycles_t *tcompleted; + cycles_t *tposted; + cycles_t *tcompleted; + struct pp_data data = { + .port = 18515, + .ib_port = 1, + .size = 65536, + .tx_depth = 100, + .use_cma = 0, + .servername = NULL, + .rem_dest = NULL, + .ib_dev = NULL, + .cm_channel = NULL, + .cm_id = NULL + + }; /* Parameter parsing. 
*/ while (1) { @@ -502,17 +941,18 @@ int main(int argc, char *argv[]) { .name = "iters", .has_arg = 1, .val = 'n' }, { .name = "tx-depth", .has_arg = 1, .val = 't' }, { .name = "bidirectional", .has_arg = 0, .val = 'b' }, + { .name = "cma", .has_arg = 0, .val = 'c' }, { 0 } }; - c = getopt_long(argc, argv, "p:d:i:s:n:t:b", long_options, NULL); + c = getopt_long(argc, argv, "p:d:i:s:n:t:bc", long_options, NULL); if (c == -1) break; switch (c) { case 'p': - port = strtol(optarg, NULL, 0); - if (port < 0 || port > 65535) { + data.port = strtol(optarg, NULL, 0); + if (data.port < 0 || data.port > 65535) { usage(argv[0]); return 1; } @@ -523,24 +963,24 @@ int main(int argc, char *argv[]) break; case 'i': - ib_port = strtol(optarg, NULL, 0); - if (ib_port < 0) { + data.ib_port = strtol(optarg, NULL, 0); + if (data.ib_port < 0) { usage(argv[0]); return 1; } break; case 's': - size = strtoll(optarg, NULL, 0); - if (size < 1 || size > UINT_MAX / 2) { + data.size = strtoll(optarg, NULL, 0); + if (data.size < 1 || data.size > UINT_MAX / 2) { usage(argv[0]); return 1; } break; case 't': - tx_depth = strtol(optarg, NULL, 0); - if (tx_depth < 1) { usage(argv[0]); return 1; } + data.tx_depth = strtol(optarg, NULL, 0); + if (data.tx_depth < 1) { usage(argv[0]); return 1; } break; case 'n': @@ -552,19 +992,13 @@ int main(int argc, char *argv[]) break; - case 'l': - tx_depth = strtol(optarg, NULL, 0); - if (tx_depth < 1) { - usage(argv[0]); - return 1; - } - - break; - case 'b': duplex = 1; break; + case 'c': + data.use_cma = 1; + break; default: usage(argv[0]); return 1; @@ -572,103 +1006,156 @@ int main(int argc, char *argv[]) } if (optind == argc - 1) - servername = strdupa(argv[optind]); + data.servername = strdupa(argv[optind]); else if (optind < argc) { usage(argv[0]); return 1; } - + /* Get the PID and prepend it to every output on stdout/stderr + * This helps to parse output when multiple client/server are + * run from single host + */ + pid = getpid(); + + printf("%d: | 
port=%d | ib_port=%d | size=%d | tx_depth=%d | iters=%d | duplex=%d | cma=%d |\n", + pid, data.port, data.ib_port, data.size, data.tx_depth, + iters, duplex, data.use_cma); + /* Done with parameter parsing. Perform setup. */ - srand48(getpid() * time(NULL)); + srand48(pid * time(NULL)); page_size = sysconf(_SC_PAGESIZE); - dev_list = ibv_get_device_list(NULL); - - if (!ib_devname) { - ib_dev = dev_list[0]; - if (!ib_dev) { - fprintf(stderr, "No IB devices found\n"); + if (data.use_cma) { + data.cm_channel = rdma_create_event_channel(); + if (!data.cm_channel) { + fprintf(stderr, "%d:%s: rdma_create_event_channel failed\n", + pid, __func__); return 1; } - } else { - for (; (ib_dev = *dev_list); ++dev_list) - if (!strcmp(ibv_get_device_name(ib_dev), ib_devname)) - break; - if (!ib_dev) { - fprintf(stderr, "IB device %s not found\n", ib_devname); + if (rdma_create_id(data.cm_channel, &data.cm_id, NULL)) { + fprintf(stderr, "%d:%s: rdma_create_id failed\n", + pid, __func__); return 1; } - } - - ctx = pp_init_ctx(ib_dev, size, tx_depth, ib_port); - if (!ctx) - return 1; - - /* Create connection between client and server. - * We do it by exchanging data over a TCP socket connection. */ - - my_dest.lid = pp_get_local_lid(ctx, ib_port); - my_dest.qpn = ctx->qp->qp_num; - my_dest.psn = lrand48() & 0xffffff; - if (!my_dest.lid) { - fprintf(stderr, "Local lid 0x0 detected. 
Is an SM running?\n"); - return 1; - } - my_dest.rkey = ctx->mr->rkey; - my_dest.vaddr = (uintptr_t)ctx->buf + ctx->size; - - printf(" local address: LID %#04x, QPN %#06x, PSN %#06x " - "RKey %#08x VAddr %#016Lx\n", - my_dest.lid, my_dest.qpn, my_dest.psn, - my_dest.rkey, my_dest.vaddr); - - if (servername) { - sockfd = pp_client_connect(servername, port); - if (sockfd < 0) - return 1; - rem_dest = pp_client_exch_dest(sockfd, &my_dest); + + if (data.servername) { + ctx = pp_client_connect(&data); + if (!ctx) + return 1; + } else { + ctx = pp_server_connect(&data); + if (!ctx) + return 1; + } } else { - sockfd = pp_server_connect(port); - if (sockfd < 0) + dev_list = ibv_get_device_list(NULL); + + if (!ib_devname) { + data.ib_dev = dev_list[0]; + if (!data.ib_dev) { + fprintf(stderr, "%d:%s: No IB devices found\n", + pid, __func__); + return 1; + } + } else { + for (; (data.ib_dev = *dev_list); ++dev_list) + if (!strcmp(ibv_get_device_name(data.ib_dev), ib_devname)) + break; + if (!data.ib_dev) { + fprintf(stderr, "%d:%s: IB device %s not found\n", + pid, __func__, ib_devname); + return 1; + } + } + if (data.servername) { + ctx = pp_client_connect(&data); + if (!ctx) + return 1; + } else { + ctx = pp_server_connect(&data); + if (!ctx) + return 1; + } + data.my_dest.lid = pp_get_local_lid(ctx, data.ib_port); + if (!data.my_dest.lid) { + fprintf(stderr, "%d:%s: Local lid 0x0 detected. Is an SM running?\n", + pid, __func__); return 1; - rem_dest = pp_server_exch_dest(sockfd, &my_dest); + } + data.my_dest.qpn = ctx->qp->qp_num; + data.my_dest.psn = lrand48() & 0xffffff; + data.my_dest.rkey = ctx->mr->rkey; + data.my_dest.vaddr = (uintptr_t)ctx->buf + ctx->size; + + /* Create connection between client and server. + * We do it by exchanging data over a TCP socket connection. 
*/ + if (data.servername) { + if (pp_client_exch_dest(&data)) + return 1; + } else { + if (pp_server_exch_dest(&data)) + return 1; + } } - if (!rem_dest) - return 1; - - printf(" remote address: LID %#04x, QPN %#06x, PSN %#06x, " - "RKey %#08x VAddr %#016Lx\n", - rem_dest->lid, rem_dest->qpn, rem_dest->psn, - rem_dest->rkey, rem_dest->vaddr); - - if (pp_connect_ctx(ctx, ib_port, my_dest.psn, rem_dest)) - return 1; + printf("%d: Local address: LID %#04x, QPN %#06x, PSN %#06x " + "RKey %#08x VAddr %#016Lx\n", pid, + data.my_dest.lid, data.my_dest.qpn, data.my_dest.psn, + data.my_dest.rkey, data.my_dest.vaddr); + + printf("%d: Remote address: LID %#04x, QPN %#06x, PSN %#06x, " + "RKey %#08x VAddr %#016Lx\n\n", pid, + data.rem_dest->lid, data.rem_dest->qpn, data.rem_dest->psn, + data.rem_dest->rkey, data.rem_dest->vaddr); + + if (data.use_cma) { + /* + * Synch up and force the server to wait for the client to send + * the first message (MPA requirement). + */ + if (data.servername) { + pp_send_start(ctx); + } else { + pp_wait_for_start(ctx); + } - /* An additional handshake is required *after* moving qp to RTR. - Arbitrarily reuse exch_dest for this purpose. */ - if (servername) { - rem_dest = pp_client_exch_dest(sockfd, &my_dest); } else { - rem_dest = pp_server_exch_dest(sockfd, &my_dest); + if (pp_connect_ctx(ctx, data)) + return 1; + + /* An additional handshake is required *after* moving qp to RTR. + Arbitrarily reuse exch_dest for this purpose. 
*/ + if (data.servername) { + if (pp_client_exch_dest(&data)) + return 1; + } else { + if (pp_server_exch_dest(&data)) + return 1; + } } /* For half duplex tests, server just waits for client to exit */ - if (!servername && !duplex) { - rem_dest = pp_server_exch_dest(sockfd, &my_dest); - write(sockfd, "done", sizeof "done"); - close(sockfd); + if (!data.servername && !duplex) { + if (data.use_cma) { + pp_wait_for_done(ctx); + pp_send_done(ctx); + pp_close_cma(data); + } else { + pp_server_exch_dest(&data); + write(data.sockfd, "done", sizeof "done"); + close(data.sockfd); + } return 0; } ctx->list.addr = (uintptr_t) ctx->buf; ctx->list.length = ctx->size; ctx->list.lkey = ctx->mr->lkey; - ctx->wr.wr.rdma.remote_addr = rem_dest->vaddr; - ctx->wr.wr.rdma.rkey = rem_dest->rkey; + ctx->wr.wr.rdma.remote_addr = data.rem_dest->vaddr; + ctx->wr.wr.rdma.rkey = data.rem_dest->rkey; ctx->wr.wr_id = PINGPONG_RDMA_WRID; ctx->wr.sg_list = &ctx->list; ctx->wr.num_sge = 1; @@ -699,13 +1186,13 @@ int main(int argc, char *argv[]) while (scnt < iters || ccnt < iters) { - while (scnt < iters && scnt - ccnt < tx_depth) { + while (scnt < iters && scnt - ccnt < data.tx_depth) { struct ibv_send_wr *bad_wr; tposted[scnt] = get_cycles(); if (ibv_post_send(qp, &ctx->wr, &bad_wr)) { - fprintf(stderr, "Couldn't post send: scnt=%d\n", - scnt); + fprintf(stderr, "%d:%s: Couldn't post send: scnt=%d\n", + pid, __func__, scnt); return 1; } ++scnt; @@ -715,38 +1202,46 @@ int main(int argc, char *argv[]) struct ibv_wc wc; int ne; do { - ne = ibv_poll_cq(ctx->cq, 1, &wc); + ne = ibv_poll_cq(ctx->scq, 1, &wc); } while (ne == 0); tcompleted[ccnt] = get_cycles(); if (ne < 0) { - fprintf(stderr, "poll CQ failed %d\n", ne); + fprintf(stderr, "%d:%s: poll CQ failed %d\n", pid, + __func__, ne); return 1; } if (wc.status != IBV_WC_SUCCESS) { - fprintf(stderr, "Completion wth error at %s:\n", - servername ? 
"client" : "server"); - fprintf(stderr, "Failed status %d: wr_id %d\n", - wc.status, (int) wc.wr_id); - fprintf(stderr, "scnt=%d, ccnt=%d\n", - scnt, ccnt); + fprintf(stderr, "%d:%s: Completion with error at %s:\n", + pid, __func__, data.servername ? "client" : "server"); + fprintf(stderr, "%d:%s: Failed status %d: wr_id %d\n", + pid, __func__, wc.status, (int) wc.wr_id); + fprintf(stderr, "%d:%s: scnt=%d, ccnt=%d\n", + pid, __func__, scnt, ccnt); return 1; } ccnt += 1; } } - if (servername) { - rem_dest = pp_client_exch_dest(sockfd, &my_dest); + if (data.use_cma) { + /* This is racy when duplex mode is used*/ + pp_send_done(ctx); + pp_wait_for_done(ctx); + pp_close_cma(data); } else { - rem_dest = pp_server_exch_dest(sockfd, &my_dest); + if (data.servername) + pp_client_exch_dest(&data); + else + pp_server_exch_dest(&data); + + write(data.sockfd, "done", sizeof "done"); + close(data.sockfd); + } - - write(sockfd, "done", sizeof "done"); - close(sockfd); - - print_report(iters, size, duplex, tposted, tcompleted); + + print_report(iters, data.size, duplex, tposted, tcompleted); free(tposted); free(tcompleted); From bpradip at in.ibm.com Wed Jul 5 02:50:16 2006 From: bpradip at in.ibm.com (Pradipta Kumar Banerjee) Date: Wed, 5 Jul 2006 15:20:16 +0530 Subject: [openib-general] [PATCH 2/2] perftest: enhancement to rdma_bw to allow use of RDMA CM Message-ID: <20060705095015.GC15470@harry-potter.in.ibm.com> This contains changes to the Makefile to properly build rdma_bw. Includes the librdmacm library. 
Signed-off-by: Pradipta Kumar Banerjee --- Index: perftest/Makefile ============================================================================= --- Makefile.org 2006-07-05 15:12:06.000000000 +0530 +++ Makefile 2006-07-05 15:12:17.000000000 +0530 @@ -10,7 +10,7 @@ EXTRA_HEADERS = get_clock.h LOADLIBES += LDFLAGS += -${TESTS}: LOADLIBES += -libverbs +${TESTS}: LOADLIBES += -libverbs -lrdmacm ${TESTS} ${UTILS}: %: %.c ${EXTRA_FILES} ${EXTRA_HEADERS} $(CC) $(CPPFLAGS) $(CFLAGS) $(LDFLAGS) $< ${EXTRA_FILES} $(LOADLIBES) $(LDLIBS) -o ib_$@ From mst at mellanox.co.il Wed Jul 5 04:29:13 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 5 Jul 2006 14:29:13 +0300 Subject: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bw to allow use of RDMA CM In-Reply-To: <20060705093835.GB15470@harry-potter.in.ibm.com> References: <20060705093835.GB15470@harry-potter.in.ibm.com> Message-ID: <20060705112913.GB21557@mellanox.co.il> Thanks for following up on this, on the surface looks good. I'm busy at the moment, but a quick question: Quoting r. Pradipta Kumar Banerjee : > +static void pp_wait_for_done(struct pingpong_context *ctx) > +{ > + struct ibv_wc wc; > + int ne; > + > + do { > + usleep(500); > + ne = ibv_poll_cq(ctx->rcq, 1, &wc); > + } while (ne == 0); I wonder whether we can get rid of the rcq and an extra send at the end by simply closing the QP, and ignoring the possible completion with error for the last round. I think this would be a nice simplification. What do you think? -- MST From bpradip at in.ibm.com Wed Jul 5 05:31:50 2006 From: bpradip at in.ibm.com (Pradipta Kumar Banerjee) Date: Wed, 05 Jul 2006 18:01:50 +0530 Subject: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bw to allow use of RDMA CM In-Reply-To: <20060705112913.GB21557@mellanox.co.il> References: <20060705093835.GB15470@harry-potter.in.ibm.com> <20060705112913.GB21557@mellanox.co.il> Message-ID: <44ABB136.30307@in.ibm.com> Michael S. 
Tsirkin wrote: > Thanks for following up on this, on the surface looks good. > I'm busy at the moment, but a quick question: > > Quoting r. Pradipta Kumar Banerjee : >> +static void pp_wait_for_done(struct pingpong_context *ctx) >> +{ >> + struct ibv_wc wc; >> + int ne; >> + >> + do { >> + usleep(500); >> + ne = ibv_poll_cq(ctx->rcq, 1, &wc); >> + } while (ne == 0); > > I wonder whether we can get rid of the rcq and an extra send at the > end by simply closing the QP, and ignoring the possible completion with error > for the last round. I think this would be a nice simplification. > What do you think? > Indeed we can get rid of the extra send at the end (pp_wait_for_done and pp_send_done can be removed). As for the rcq, I need to think about it. It's being used to synchronize the client and the server so that the client sends the first message. Thanks, Pradipta Kumar. From halr at voltaire.com Wed Jul 5 06:08:57 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Jul 2006 09:08:57 -0400 Subject: [openib-general] [PATCH][MINOR] OpenSM/SA: Indicate unsupported method/attribute rather than invalid request in SA response Message-ID: <1152104935.4533.132054.camel@hal.voltaire.com> OpenSM/SA: Indicate unsupported method/attribute rather than invalid request in SA response When method/attribute is unsupported, indicate this error rather than more generic invalid request error. 
Signed-off-by: Hal Rosenstock Index: opensm/osm_sa_guidinfo_record.c =================================================================== --- opensm/osm_sa_guidinfo_record.c (revision 8365) +++ opensm/osm_sa_guidinfo_record.c (working copy) @@ -445,7 +445,7 @@ osm_gir_rcv_process( "osm_gir_rcv_process: ERR 5105: " "Unsupported Method (%s)\n", ib_get_sa_method_str( p_rcvd_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); goto Exit; } Index: opensm/osm_sa_vlarb_record.c =================================================================== --- opensm/osm_sa_vlarb_record.c (revision 8365) +++ opensm/osm_sa_vlarb_record.c (working copy) @@ -372,7 +372,7 @@ osm_vlarb_rec_rcv_process( "osm_vlarb_rec_rcv_process: ERR 2A05: " "Unsupported Method (%s)\n", ib_get_sa_method_str( p_rcvd_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); goto Exit; } Index: opensm/osm_sa_multipath_record.c =================================================================== --- opensm/osm_sa_multipath_record.c (revision 8365) +++ opensm/osm_sa_multipath_record.c (working copy) @@ -1444,7 +1444,7 @@ osm_mpr_rcv_process( { osm_log( p_rcv->p_log, OSM_LOG_ERROR, "osm_mpr_rcv_process: ERR 4510: " - "Invalid request as RMPP_FLAG_ACTIVE is not set\n" ); + "Invalid request since RMPP_FLAG_ACTIVE is not set\n" ); osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID ); goto Exit; } @@ -1454,7 +1454,7 @@ osm_mpr_rcv_process( "osm_mpr_rcv_process: ERR 4513: " "Unsupported Method (%s)\n", ib_get_sa_method_str( p_sa_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); goto Exit; } Index: opensm/osm_sa_lft_record.c 
=================================================================== --- opensm/osm_sa_lft_record.c (revision 8365) +++ opensm/osm_sa_lft_record.c (working copy) @@ -353,7 +353,7 @@ osm_lftr_rcv_process( "osm_lftr_rcv_process: ERR 4408: " "Unsupported Method (%s)\n", ib_get_sa_method_str( p_rcvd_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); goto Exit; } Index: opensm/osm_sa_service_record.c =================================================================== --- opensm/osm_sa_service_record.c (revision 8365) +++ opensm/osm_sa_service_record.c (working copy) @@ -1085,7 +1085,7 @@ osm_sr_rcv_process( { osm_log( p_rcv->p_log, OSM_LOG_VERBOSE, "osm_sr_rcv_process: " - "Component Mask check failed\n" ); + "Component Mask check failed for set request\n" ); osm_sa_send_error( p_rcv->p_resp, p_madw, sa_status ); goto Exit; } @@ -1097,7 +1097,7 @@ osm_sr_rcv_process( { osm_log( p_rcv->p_log, OSM_LOG_DEBUG, "osm_sr_rcv_process: " - "Component Mask check failed\n" ); + "Component Mask check failed for delete request\n" ); osm_sa_send_error( p_rcv->p_resp, p_madw, sa_status ); goto Exit; } @@ -1110,8 +1110,9 @@ osm_sr_rcv_process( default: osm_log( p_rcv->p_log, OSM_LOG_DEBUG, "osm_sr_rcv_process: " - "Bad Method (%s)\n", ib_get_sa_method_str( p_sa_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, sa_status ); + "Unsupported Method (%s)\n", + ib_get_sa_method_str( p_sa_mad->method ) ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); break; } Index: opensm/osm_sa_portinfo_record.c =================================================================== --- opensm/osm_sa_portinfo_record.c (revision 8365) +++ opensm/osm_sa_portinfo_record.c (working copy) @@ -657,7 +657,7 @@ osm_pir_rcv_process( "osm_pir_rcv_process: ERR 2105: " "Unsupported Method (%s)\n", ib_get_sa_method_str( p_rcvd_mad->method ) ); - osm_sa_send_error( 
p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); goto Exit; } Index: opensm/osm_sa_pkey_record.c =================================================================== --- opensm/osm_sa_pkey_record.c (revision 8365) +++ opensm/osm_sa_pkey_record.c (working copy) @@ -367,7 +367,7 @@ osm_pkey_rec_rcv_process( "osm_pkey_rec_rcv_process: ERR 4605: " "Unsupported Method (%s)\n", ib_get_sa_method_str( p_rcvd_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); goto Exit; } Index: opensm/osm_sa_slvl_record.c =================================================================== --- opensm/osm_sa_slvl_record.c (revision 8365) +++ opensm/osm_sa_slvl_record.c (working copy) @@ -352,7 +352,7 @@ osm_slvl_rec_rcv_process( "osm_slvl_rec_rcv_process: ERR 2604: " "Unsupported Method (%s)\n", ib_get_sa_method_str( p_rcvd_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); goto Exit; } Index: opensm/osm_sa_sminfo_record.c =================================================================== --- opensm/osm_sa_sminfo_record.c (revision 8365) +++ opensm/osm_sa_sminfo_record.c (working copy) @@ -168,7 +168,7 @@ osm_smir_rcv_process( "osm_smir_rcv_process: ERR 2804: " "Unsupported Method (%s)\n", ib_get_sa_method_str( p_sa_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); goto Exit; } Index: opensm/osm_sa_informinfo.c =================================================================== --- opensm/osm_sa_informinfo.c (revision 8370) +++ opensm/osm_sa_informinfo.c (working copy) @@ -361,11 +361,7 @@ osm_infr_rcv_process_set_method( #if 0 if( osm_log_is_active( 
p_rcv->p_log, OSM_LOG_DEBUG ) ) - { - osm_dump_inform_info( p_rcv->p_log, - p_recvd_inform_info, - OSM_LOG_DEBUG ); - } + osm_dump_inform_info( p_rcv->p_log, p_recvd_inform_info, OSM_LOG_DEBUG ); #endif /* Grab the lock */ @@ -534,25 +530,25 @@ osm_infr_rcv_process( IN const osm_madw_t* const p_madw ) { ib_sa_mad_t *p_sa_mad; - ib_net16_t sa_status = IB_SA_MAD_STATUS_REQ_INVALID; OSM_LOG_ENTER( p_rcv->p_log, osm_infr_rcv_process ); CL_ASSERT( p_madw ); + p_sa_mad = osm_madw_get_sa_mad_ptr( p_madw ); - switch (p_sa_mad->method) + CL_ASSERT( p_sa_mad->attr_id == IB_MAD_ATTR_INFORM_INFO ); + + if (p_sa_mad->method != IB_MAD_METHOD_SET) { - case IB_MAD_METHOD_SET: - osm_infr_rcv_process_set_method( p_rcv, p_madw ); - break; - default: osm_log( p_rcv->p_log, OSM_LOG_DEBUG, "osm_infr_rcv_process: " - "Bad Method (%s)\n", ib_get_sa_method_str( p_sa_mad->method )); - osm_sa_send_error( p_rcv->p_resp, p_madw, sa_status ); - break; + "Unsupported Method (%s)\n", + ib_get_sa_method_str( p_sa_mad->method ) ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); } + osm_infr_rcv_process_set_method( p_rcv, p_madw ); + OSM_LOG_EXIT( p_rcv->p_log ); } Index: opensm/osm_sa_node_record.c =================================================================== --- opensm/osm_sa_node_record.c (revision 8365) +++ opensm/osm_sa_node_record.c (working copy) @@ -445,7 +445,7 @@ osm_nr_rcv_process( "osm_nr_rcv_process: ERR 1D05: " "Unsupported Method (%s)\n", ib_get_sa_method_str( p_rcvd_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); goto Exit; } Index: opensm/osm_sa_link_record.c =================================================================== --- opensm/osm_sa_link_record.c (revision 8365) +++ opensm/osm_sa_link_record.c (working copy) @@ -723,7 +723,7 @@ osm_lr_rcv_process( "osm_lr_rcv_process: ERR 1804: " "Unsupported Method (%s)\n", 
ib_get_sa_method_str( p_sa_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); goto Exit; } Index: opensm/osm_sa_mcmember_record.c =================================================================== --- opensm/osm_sa_mcmember_record.c (revision 8365) +++ opensm/osm_sa_mcmember_record.c (working copy) @@ -2372,7 +2372,7 @@ osm_mcmr_rcv_process( "osm_mcmr_rcv_process: ERR 1B21: " "Unsupported Method (%s)\n", ib_get_sa_method_str( p_sa_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, sa_status ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); break; } Index: opensm/Makefile.am =================================================================== --- opensm/Makefile.am (revision 8365) +++ opensm/Makefile.am (working copy) @@ -31,7 +31,11 @@ $(srcdir)/../include/opensm/osm_svn_revi fi endif +if OSMV_OPENIB +libopensm_la_CFLAGS = -Wall $(OSMV_CFLAGS) -DVENDOR_RMPP_SUPPORT -DDUAL_SIDED_RMPP $(DBGFLAGS) -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 +else libopensm_la_CFLAGS = -Wall $(OSMV_CFLAGS) -DVENDOR_RMPP_SUPPORT $(DBGFLAGS) -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 +endif if HAVE_LD_VERSION_SCRIPT libopensm_version_script = -Wl,--version-script=$(srcdir)/libopensm.map @@ -87,8 +91,13 @@ opensm_SOURCES = main.c osm_console.c os osm_ucast_mgr.c osm_ucast_updn.c osm_ucast_file.c \ osm_vl15intf.c osm_vl_arb_rcv.c \ osm_vl_arb_rcv_ctrl.c st.c +if OSMV_OPENIB +opensm_CFLAGS = -Wall $(OSMV_CFLAGS) -fno-strict-aliasing -DVENDOR_RMPP_SUPPORT -DDUAL_SIDED_RMPP $(DBGFLAGS) -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 +opensm_CXXFLAGS = -Wall $(OSMV_CFLAGS) -DVENDOR_RMPP_SUPPORT -DDUAL_SIDED_RMPP $(DBGFLAGS) -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 +else opensm_CFLAGS = -Wall $(OSMV_CFLAGS) -fno-strict-aliasing -DVENDOR_RMPP_SUPPORT $(DBGFLAGS) -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 opensm_CXXFLAGS = -Wall $(OSMV_CFLAGS) -DVENDOR_RMPP_SUPPORT 
$(DBGFLAGS) -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 +endif # for linking with the simulator client library we have to use g++: if OSMV_SIM Index: opensm/osm_sa_path_record.c =================================================================== --- opensm/osm_sa_path_record.c (revision 8365) +++ opensm/osm_sa_path_record.c (working copy) @@ -1684,7 +1684,7 @@ osm_pr_rcv_process( "osm_pr_rcv_process: ERR 1F17: " "Unsupported Method (%s)\n", ib_get_sa_method_str( p_sa_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); goto Exit; } From halr at voltaire.com Wed Jul 5 06:17:03 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Jul 2006 09:17:03 -0400 Subject: [openib-general] [PATCH][MINOR] OpenSM/SA: Indicate unsupported method/attribute rather than invalid request in SA response Message-ID: <1152105421.4533.132311.camel@hal.voltaire.com> OpenSM/SA: Indicate unsupported method/attribute rather than invalid request in SA response When method/attribute is unsupported, indicate this error rather than more generic invalid request error. 
Signed-off-by: Hal Rosenstock Index: opensm/osm_sa_guidinfo_record.c =================================================================== --- opensm/osm_sa_guidinfo_record.c (revision 8365) +++ opensm/osm_sa_guidinfo_record.c (working copy) @@ -445,7 +445,7 @@ osm_gir_rcv_process( "osm_gir_rcv_process: ERR 5105: " "Unsupported Method (%s)\n", ib_get_sa_method_str( p_rcvd_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); goto Exit; } Index: opensm/osm_sa_informinfo.c =================================================================== --- opensm/osm_sa_informinfo.c (revision 8370) +++ opensm/osm_sa_informinfo.c (working copy) @@ -361,11 +361,7 @@ osm_infr_rcv_process_set_method( #if 0 if( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) - { - osm_dump_inform_info( p_rcv->p_log, - p_recvd_inform_info, - OSM_LOG_DEBUG ); - } + osm_dump_inform_info( p_rcv->p_log, p_recvd_inform_info, OSM_LOG_DEBUG ); #endif /* Grab the lock */ @@ -534,25 +530,25 @@ osm_infr_rcv_process( IN const osm_madw_t* const p_madw ) { ib_sa_mad_t *p_sa_mad; - ib_net16_t sa_status = IB_SA_MAD_STATUS_REQ_INVALID; OSM_LOG_ENTER( p_rcv->p_log, osm_infr_rcv_process ); CL_ASSERT( p_madw ); + p_sa_mad = osm_madw_get_sa_mad_ptr( p_madw ); - switch (p_sa_mad->method) + CL_ASSERT( p_sa_mad->attr_id == IB_MAD_ATTR_INFORM_INFO ); + + if (p_sa_mad->method != IB_MAD_METHOD_SET) { - case IB_MAD_METHOD_SET: - osm_infr_rcv_process_set_method( p_rcv, p_madw ); - break; - default: osm_log( p_rcv->p_log, OSM_LOG_DEBUG, "osm_infr_rcv_process: " - "Bad Method (%s)\n", ib_get_sa_method_str( p_sa_mad->method )); - osm_sa_send_error( p_rcv->p_resp, p_madw, sa_status ); - break; + "Unsupported Method (%s)\n", + ib_get_sa_method_str( p_sa_mad->method ) ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); } + osm_infr_rcv_process_set_method( p_rcv, p_madw ); + OSM_LOG_EXIT( 
p_rcv->p_log ); } Index: opensm/osm_sa_lft_record.c =================================================================== --- opensm/osm_sa_lft_record.c (revision 8365) +++ opensm/osm_sa_lft_record.c (working copy) @@ -353,7 +353,7 @@ osm_lftr_rcv_process( "osm_lftr_rcv_process: ERR 4408: " "Unsupported Method (%s)\n", ib_get_sa_method_str( p_rcvd_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); goto Exit; } Index: opensm/osm_sa_link_record.c =================================================================== --- opensm/osm_sa_link_record.c (revision 8365) +++ opensm/osm_sa_link_record.c (working copy) @@ -723,7 +723,7 @@ osm_lr_rcv_process( "osm_lr_rcv_process: ERR 1804: " "Unsupported Method (%s)\n", ib_get_sa_method_str( p_sa_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); goto Exit; } Index: opensm/osm_sa_mcmember_record.c =================================================================== --- opensm/osm_sa_mcmember_record.c (revision 8365) +++ opensm/osm_sa_mcmember_record.c (working copy) @@ -2372,7 +2372,7 @@ osm_mcmr_rcv_process( "osm_mcmr_rcv_process: ERR 1B21: " "Unsupported Method (%s)\n", ib_get_sa_method_str( p_sa_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, sa_status ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); break; } Index: opensm/osm_sa_multipath_record.c =================================================================== --- opensm/osm_sa_multipath_record.c (revision 8365) +++ opensm/osm_sa_multipath_record.c (working copy) @@ -1444,7 +1444,7 @@ osm_mpr_rcv_process( { osm_log( p_rcv->p_log, OSM_LOG_ERROR, "osm_mpr_rcv_process: ERR 4510: " - "Invalid request as RMPP_FLAG_ACTIVE is not set\n" ); + "Invalid request since RMPP_FLAG_ACTIVE is not set\n" ); 
osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID ); goto Exit; } @@ -1454,7 +1454,7 @@ osm_mpr_rcv_process( "osm_mpr_rcv_process: ERR 4513: " "Unsupported Method (%s)\n", ib_get_sa_method_str( p_sa_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); goto Exit; } Index: opensm/osm_sa_node_record.c =================================================================== --- opensm/osm_sa_node_record.c (revision 8365) +++ opensm/osm_sa_node_record.c (working copy) @@ -445,7 +445,7 @@ osm_nr_rcv_process( "osm_nr_rcv_process: ERR 1D05: " "Unsupported Method (%s)\n", ib_get_sa_method_str( p_rcvd_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); goto Exit; } Index: opensm/osm_sa_path_record.c =================================================================== --- opensm/osm_sa_path_record.c (revision 8365) +++ opensm/osm_sa_path_record.c (working copy) @@ -1684,7 +1684,7 @@ osm_pr_rcv_process( "osm_pr_rcv_process: ERR 1F17: " "Unsupported Method (%s)\n", ib_get_sa_method_str( p_sa_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); goto Exit; } Index: opensm/osm_sa_pkey_record.c =================================================================== --- opensm/osm_sa_pkey_record.c (revision 8365) +++ opensm/osm_sa_pkey_record.c (working copy) @@ -367,7 +367,7 @@ osm_pkey_rec_rcv_process( "osm_pkey_rec_rcv_process: ERR 4605: " "Unsupported Method (%s)\n", ib_get_sa_method_str( p_rcvd_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); goto Exit; } Index: opensm/osm_sa_portinfo_record.c 
=================================================================== --- opensm/osm_sa_portinfo_record.c (revision 8365) +++ opensm/osm_sa_portinfo_record.c (working copy) @@ -657,7 +657,7 @@ osm_pir_rcv_process( "osm_pir_rcv_process: ERR 2105: " "Unsupported Method (%s)\n", ib_get_sa_method_str( p_rcvd_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); goto Exit; } Index: opensm/osm_sa_service_record.c =================================================================== --- opensm/osm_sa_service_record.c (revision 8365) +++ opensm/osm_sa_service_record.c (working copy) @@ -1085,7 +1085,7 @@ osm_sr_rcv_process( { osm_log( p_rcv->p_log, OSM_LOG_VERBOSE, "osm_sr_rcv_process: " - "Component Mask check failed\n" ); + "Component Mask check failed for set request\n" ); osm_sa_send_error( p_rcv->p_resp, p_madw, sa_status ); goto Exit; } @@ -1097,7 +1097,7 @@ osm_sr_rcv_process( { osm_log( p_rcv->p_log, OSM_LOG_DEBUG, "osm_sr_rcv_process: " - "Component Mask check failed\n" ); + "Component Mask check failed for delete request\n" ); osm_sa_send_error( p_rcv->p_resp, p_madw, sa_status ); goto Exit; } @@ -1110,8 +1110,9 @@ osm_sr_rcv_process( default: osm_log( p_rcv->p_log, OSM_LOG_DEBUG, "osm_sr_rcv_process: " - "Bad Method (%s)\n", ib_get_sa_method_str( p_sa_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, sa_status ); + "Unsupported Method (%s)\n", + ib_get_sa_method_str( p_sa_mad->method ) ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); break; } Index: opensm/osm_sa_slvl_record.c =================================================================== --- opensm/osm_sa_slvl_record.c (revision 8365) +++ opensm/osm_sa_slvl_record.c (working copy) @@ -352,7 +352,7 @@ osm_slvl_rec_rcv_process( "osm_slvl_rec_rcv_process: ERR 2604: " "Unsupported Method (%s)\n", ib_get_sa_method_str( p_rcvd_mad->method ) ); - 
osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); goto Exit; } Index: opensm/osm_sa_sminfo_record.c =================================================================== --- opensm/osm_sa_sminfo_record.c (revision 8365) +++ opensm/osm_sa_sminfo_record.c (working copy) @@ -168,7 +168,7 @@ osm_smir_rcv_process( "osm_smir_rcv_process: ERR 2804: " "Unsupported Method (%s)\n", ib_get_sa_method_str( p_sa_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); goto Exit; } Index: opensm/osm_sa_vlarb_record.c =================================================================== --- opensm/osm_sa_vlarb_record.c (revision 8365) +++ opensm/osm_sa_vlarb_record.c (working copy) @@ -372,7 +372,7 @@ osm_vlarb_rec_rcv_process( "osm_vlarb_rec_rcv_process: ERR 2A05: " "Unsupported Method (%s)\n", ib_get_sa_method_str( p_rcvd_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); goto Exit; } From swise at opengridcomputing.com Wed Jul 5 06:24:10 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 05 Jul 2006 08:24:10 -0500 Subject: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bw to allow use of RDMA CM In-Reply-To: <20060705112913.GB21557@mellanox.co.il> References: <20060705093835.GB15470@harry-potter.in.ibm.com> <20060705112913.GB21557@mellanox.co.il> Message-ID: <1152105850.2316.3.camel@stevo-desktop> On Wed, 2006-07-05 at 14:29 +0300, Michael S. Tsirkin wrote: > Thanks for following up on this, on the surface looks good. > I'm busy at the moment, but a quick question: > > Quoting r. 
Pradipta Kumar Banerjee : > > +static void pp_wait_for_done(struct pingpong_context *ctx) > > +{ > > + struct ibv_wc wc; > > + int ne; > > + > > + do { > > + usleep(500); > > + ne = ibv_poll_cq(ctx->rcq, 1, &wc); > > + } while (ne == 0); > > I wonder whether we can get rid of the rcq and an extra send at the > end by simply closing the QP, and ignoring the possible completion with error > for the last round. I think this would be a nice simplification. > What do you think? > For iWARP, the ULP must quiesce the QP before moving it to CLOSING (i.e., before doing an rdma_disconnect()). By quiesce, I mean make sure there are no outstanding SQ WRs and no outstanding RDMA reads in progress. That's why there is this message exchange at the end of the test. For instance, in the bi-directional bandwidth tests, one side could finish the test and issue an rdma_disconnect() while the other end still has pending RDMA writes. This is an error condition in iWARP. So they sync up at the end with a message exchange before disconnecting. Make sense? 
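The ordering described above (drain the send queue, wait for the peer's "done" message, and only then disconnect) can be sketched without any RDMA hardware. This is a minimal model under stated assumptions, not the rdma_bw code: `struct side`, `poll_one()`, `quiesce()` and `try_disconnect()` are hypothetical names made up for illustration, and `poll_one()` merely stands in for reaping a send completion from a CQ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one end of the test; not a libibverbs type. */
struct side {
	int posted;      /* send-queue work requests posted so far */
	int completed;   /* send completions reaped so far */
	bool peer_done;  /* true once the peer's "done" message has arrived */
	bool connected;  /* true until the connection is torn down */
};

/* Stand-in for polling the send CQ: reaps at most one completion. */
static int poll_one(struct side *s)
{
	if (s->completed < s->posted) {
		s->completed++;
		return 1;
	}
	return 0;
}

/* Drain the send queue; returns how many completions were reaped. */
static int quiesce(struct side *s)
{
	int reaped = 0;

	assert(s->completed <= s->posted);
	while (s->completed < s->posted)
		reaped += poll_one(s);
	return reaped;
}

/*
 * Disconnect only when nothing is outstanding locally and the peer has
 * reported that it is finished. Returns 0 on success, -1 if refused.
 */
static int try_disconnect(struct side *s)
{
	if (s->completed != s->posted || !s->peer_done)
		return -1;
	s->connected = false;
	return 0;
}
```

In this model a side that still has work requests in flight, or that has not yet heard from its peer, is refused the disconnect; that is the condition the end-of-test message exchange is meant to guarantee.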
From halr at voltaire.com Wed Jul 5 07:31:13 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Jul 2006 10:31:13 -0400 Subject: [openib-general] [PATCH] osmtest/osmtest.c: Fix status decode in ib_get_mad_status_str Message-ID: <1152109871.4533.135615.camel@hal.voltaire.com> osmtest/osmtest.c: Fix status decode in ib_get_mad_status_str Signed-off-by: Hal Rosenstock Index: osmtest/osmtest.c =================================================================== --- osmtest/osmtest.c (revision 8370) +++ osmtest/osmtest.c (working copy) @@ -284,7 +284,7 @@ ib_get_mad_status_str( IN const ib_mad_t strcat( &line[offset], ib_mad_status_str_redirect ); offset += sizeof( ib_mad_status_str_redirect ) - 1; } - if( status & IB_MAD_STATUS_UNSUP_CLASS_VER ) + if( ( status & IB_MAD_STATUS_INVALID_FIELD ) == IB_MAD_STATUS_UNSUP_CLASS_VER ) { if( !first ) { @@ -295,7 +295,7 @@ ib_get_mad_status_str( IN const ib_mad_t strcat( &line[offset], ib_mad_status_str_unsup_class_ver ); offset += sizeof( ib_mad_status_str_unsup_class_ver ) - 1; } - if( status & IB_MAD_STATUS_UNSUP_METHOD ) + if( ( status & IB_MAD_STATUS_INVALID_FIELD ) == IB_MAD_STATUS_UNSUP_METHOD ) { if( !first ) { @@ -306,7 +306,7 @@ ib_get_mad_status_str( IN const ib_mad_t strcat( &line[offset], ib_mad_status_str_unsup_method ); offset += sizeof( ib_mad_status_str_unsup_method ) - 1; } - if( status & IB_MAD_STATUS_UNSUP_METHOD_ATTR ) + if( (status & IB_MAD_STATUS_INVALID_FIELD ) == IB_MAD_STATUS_UNSUP_METHOD_ATTR ) { if( !first ) { @@ -317,7 +317,7 @@ ib_get_mad_status_str( IN const ib_mad_t strcat( &line[offset], ib_mad_status_str_unsup_method_attr ); offset += sizeof( ib_mad_status_str_unsup_method_attr ) - 1; } - if( status & IB_MAD_STATUS_INVALID_FIELD ) + if( ( status & IB_MAD_STATUS_INVALID_FIELD ) == IB_MAD_STATUS_INVALID_FIELD ) { if( !first ) { From Don.Albert at Bull.com Wed Jul 5 08:30:38 2006 From: Don.Albert at Bull.com (Don.Albert at Bull.com) Date: Wed, 5 Jul 2006 08:30:38 -0700 Subject: 
[openib-general] [openfabrics-ewg] Link Initialization problem and hangs in MTHCA on OFED-1.0 In-Reply-To: Message-ID: If anyone is keeping score, the problem we were having with a Mellanox MT25204 DDR HCA apparently initializing correctly but failing to receive any MAD datagrams from the link is now solved. It turned out to be some interaction between the BIOS and the latest OFED releases. The machine in question is an Intel SE7525GP2 motherboard (7525 chip set). Updating to version P10 of the BIOS firmware fixed the problem. Thanks to Hal Rosenstock for his advice and suggestions. -Don Albert- From halr at voltaire.com Wed Jul 5 08:41:07 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Jul 2006 11:41:07 -0400 Subject: [openib-general] [PATCH] RMPP: add Dual-sided RMPP support In-Reply-To: <000001c69cc2$9f7ac710$4e268686@amr.corp.intel.com> References: <000001c69cc2$9f7ac710$4e268686@amr.corp.intel.com> Message-ID: <1152113775.4533.138583.camel@hal.voltaire.com> On Fri, 2006-06-30 at 23:58, Sean Hefty wrote: > Add support for dual-sided RMPP transfers. > > The implementation assumes that any RMPP request that requires a response uses > DS RMPP. Based on the RMPP start-up scenarios defined by the spec, this should > be a valid assumption. That is, there is no start-up scenario defined where > an RMPP request is followed by a non-RMPP response. By having this assumption, > we avoid any API changes. > > In order for a node that supports DS RMPP to communicate with one that does not, > RMPP responses assume a new window size of 1 if a DS ACK has not been received. > (By DS ACK, I'm referring to the ACK of the final ACK to the request.) This > is a slight spec deviation, but is necessary to allow communication with nodes > that do not generate the DS ACK. It also handles the case when a response is > sent after the request state has been discarded. 
> > Signed-off-by: Sean Hefty > --- > This was tested by running grmpp between OpenFabric nodes running with and > without DS RMPP support. Additional testing is desirable before committing, > since it affects all MADs using RMPP. This worked for me in the various configurations I tried testing SA MultiPathRecord (one real world consumer of DS RMPP). -- Hal From mike.heffner at evergrid.com Wed Jul 5 09:15:05 2006 From: mike.heffner at evergrid.com (Mike Heffner) Date: Wed, 05 Jul 2006 12:15:05 -0400 Subject: [openib-general] Error messages for ibverbs failures Message-ID: <44ABE589.704@evergrid.com> All, Is there a way to get detailed error messages for failures in the ibverbs API? Something similar to perror/strerror or VAPI_strerror from VAPI? Cheers, Mike -- Mike Heffner EverGrid Software Blacksburg, VA USA Voice: (540) 443-3500 #603 From halr at voltaire.com Wed Jul 5 10:32:58 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Jul 2006 13:32:58 -0400 Subject: [openib-general] [PATCH] OpenSM/ib_types.h: Define component masks for SMInfo and InformInfo records Message-ID: <1152120777.4533.143624.camel@hal.voltaire.com> OpenSM/ib_types.h: Define component masks for SMInfo and InformInfo records Signed-off-by: Hal Rosenstock Index: include/iba/ib_types.h =================================================================== --- include/iba/ib_types.h (revision 8390) +++ include/iba/ib_types.h (working copy) @@ -2459,6 +2459,35 @@ typedef struct _ib_path_rec #define IB_MPR_COMPMASK_DGIDCOUNT (CL_HTON64(((uint64_t)1)<<20)) #define IB_MPR_COMPMASK_RESV4 (CL_HTON64(((uint64_t)1)<<21)) +/* SMInfo Record Component Masks */ +#define IB_SMIR_COMPMASK_LID (CL_HTON64(((uint64_t)1)<<0)) +#define IB_SMIR_COMPMASK_RESV0 (CL_HTON64(((uint64_t)1)<<1)) +#define IB_SMIR_COMPMASK_GUID (CL_HTON64(((uint64_t)1)<<2)) +#define IB_SMIR_COMPMASK_SMKEY (CL_HTON64(((uint64_t)1)<<3)) +#define IB_SMIR_COMPMASK_ACTCOUNT (CL_HTON64(((uint64_t)1)<<4)) +#define IB_SMIR_COMPMASK_PRIORITY 
(CL_HTON64(((uint64_t)1)<<5)) +#define IB_SMIR_COMPMASK_SMSTATE (CL_HTON64(((uint64_t)1)<<6)) + +/* InformInfo Record Component Masks */ +#define IB_IIR_COMPMASK_SUBSCRIBERGID (CL_HTON64(((uint64_t)1)<<0)) +#define IB_IIR_COMPMASK_ENUM (CL_HTON64(((uint64_t)1)<<1)) +#define IB_IIR_COMPMASK_RESV0 (CL_HTON64(((uint64_t)1)<<2)) +#define IB_IIR_COMPMASK_GID (CL_HTON64(((uint64_t)1)<<3)) +#define IB_IIR_COMPMASK_LIDRANGEBEGIN (CL_HTON64(((uint64_t)1)<<4)) +#define IB_IIR_COMPMASK_LIDRANGEEND (CL_HTON64(((uint64_t)1)<<5)) +#define IB_IIR_COMPMASK_RESV1 (CL_HTON64(((uint64_t)1)<<6)) +#define IB_IIR_COMPMASK_ISGENERIC (CL_HTON64(((uint64_t)1)<<7)) +#define IB_IIR_COMPMASK_SUBSCRIBE (CL_HTON64(((uint64_t)1)<<8)) +#define IB_IIR_COMPMASK_TYPE (CL_HTON64(((uint64_t)1)<<9)) +#define IB_IIR_COMPMASK_TRAPNUMB (CL_HTON64(((uint64_t)1)<<10)) +#define IB_IIR_COMPMASK_DEVICEID (CL_HTON64(((uint64_t)1)<<10)) +#define IB_IIR_COMPMASK_QPN (CL_HTON64(((uint64_t)1)<<11)) +#define IB_IIR_COMPMASK_RESV2 (CL_HTON64(((uint64_t)1)<<12)) +#define IB_IIR_COMPMASK_RESPTIME (CL_HTON64(((uint64_t)1)<<13)) +#define IB_IIR_COMPMASK_RESV3 (CL_HTON64(((uint64_t)1)<<14)) +#define IB_IIR_COMPMASK_PRODTYPE (CL_HTON64(((uint64_t)1)<<15)) +#define IB_IIR_COMPMASK_VENDID (CL_HTON64(((uint64_t)1)<<15)) + /****f* IBA Base: Types/ib_path_rec_init_local * NAME * ib_path_rec_init_local From bugzilla-daemon at openib.org Wed Jul 5 11:05:34 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Wed, 5 Jul 2006 11:05:34 -0700 (PDT) Subject: [openib-general] [Bug 159] OFED1.0: Missing interfaces Message-ID: <20060705180534.D09EF228742@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=159 ------- Comment #2 from venkatesh.babu at 3leafnetworks.com 2006-07-05 11:05 ------- Ok, now I understand how ib_cm_path_migrate() can be implemented in application. Can you comment on the tsIbInServiceNoticeHandler() interface also. 
Is there some mechanism to achieve the same functionality in Gen2 ? (In reply to comment #1) > Item 25 from FM #35398: > Gen2 is compliant. In Gen2, the driver does not do any modify_qp commands for > LAP/APR. Rather, it relies on the application to manage alternate path > migration. > > This is done via procedures in libibcm/src/cm.c, libibverbs/src/marshall.c, > linux-kernel/infiniband/core/cm.c, core/ucm.c, core/uverbs_marshall.c > > Thus, for example, user-space app can call ib_cm_send_lap() to send the new alt > path. The receiver can do ib_cm_poll() to see if there is any event waiting to > be pulled to user space. The user-space app can then call ib_cm_get_event() > to get the LAP event (including the new alternate path info), and can send the > reply using ib_cm_send_apr(). > Responsibility for actually performing the modify-qp is left to the user > application. > > This is more in keeping with the IB Spec. > ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mst at mellanox.co.il Wed Jul 5 11:02:31 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 5 Jul 2006 21:02:31 +0300 Subject: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bw to allow use of RDMA CM In-Reply-To: <44ABB136.30307@in.ibm.com> References: <44ABB136.30307@in.ibm.com> Message-ID: <20060705180231.GA22390@mellanox.co.il> Quoting r. Pradipta Kumar Banerjee : > Subject: Re: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bw to allow use of RDMA CM > > Michael S. Tsirkin wrote: > > Thanks for following up on this, on the surface looks good. > > I'm busy at the moment, but a quick question: > > > > Quoting r. 
Pradipta Kumar Banerjee : > >> +static void pp_wait_for_done(struct pingpong_context *ctx) > >> +{ > >> + struct ibv_wc wc; > >> + int ne; > >> + > >> + do { > >> + usleep(500); > >> + ne = ibv_poll_cq(ctx->rcq, 1, &wc); > >> + } while (ne == 0); > > > > I wander whether we can get rid of the rcq and an extra send at the > > end by simply closing the QP, and ignoring the possible completion with error > > for the last round. I think this would be a nice simplification. > > What do you think? > > > Indeed we can get rid of the extra send at the end (pp_wait_for_done and > pp_send_done can be removed). As for the rcq need to think about it. Its being > used to synchronize the client and the server so that the client sends the first > message. Why do we need an RC send for this? CM or socket message should be enough, no? -- MST From halr at voltaire.com Wed Jul 5 11:03:42 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Jul 2006 14:03:42 -0400 Subject: [openib-general] [PATCH] osmtest/osmtest.c: Add SMInfo record test Message-ID: <1152122620.4533.144975.camel@hal.voltaire.com> osmtest/osmtest.c: Add SMInfo record test Signed-off-by: Hal Rosenstock Index: osmtest/osmtest.c =================================================================== --- osmtest/osmtest.c (revision 8390) +++ osmtest/osmtest.c (working copy) @@ -4667,6 +4667,84 @@ osmtest_get_lft_rec_by_lid( IN osmtest_t OSM_LOG_EXIT( &p_osmt->log ); return ( status ); } + +/********************************************************************** + **********************************************************************/ +ib_api_status_t +osmtest_sminfo_record_request( + IN osmtest_t * const p_osmt, + IN OUT osmtest_req_context_t * const p_context ) +{ + ib_api_status_t status = IB_SUCCESS; + osmv_user_query_t user; + osmv_query_req_t req; + ib_sminfo_record_t record; + ib_mad_t *p_mad; + + OSM_LOG_ENTER( &p_osmt->log, osmtest_sminfo_record_request ); + + /* + * Do a blocking query for these records in the 
subnet. + * The result is returned in the result field of the caller's + * context structure. + * + * The query structures are locals. + */ + memset( &req, 0, sizeof( req ) ); + memset( &user, 0, sizeof( user ) ); + memset( &record, 0, sizeof( record ) ); + + p_context->p_osmt = p_osmt; + user.attr_id = IB_MAD_ATTR_SMINFO_RECORD; + user.attr_offset = cl_ntoh16( ( uint16_t ) ( sizeof( record ) >> 3 ) ); + user.p_attr = &record; + + req.query_type = OSMV_QUERY_USER_DEFINED; + req.timeout_ms = p_osmt->opt.transaction_timeout; + req.retry_cnt = p_osmt->opt.retry_count; + + req.flags = OSM_SA_FLAGS_SYNC; + req.query_context = p_context; + req.pfn_query_cb = osmtest_query_res_cb; + req.p_query_input = &user; + req.sm_key = 0; + + status = osmv_query_sa( p_osmt->h_bind, &req ); + if( status != IB_SUCCESS ) + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_sminfo_record_request: ERR 008C: " + "ib_query failed (%s)\n", ib_get_err_str( status ) ); + goto Exit; + } + + status = p_context->result.status; + + if( status != IB_SUCCESS ) + { + if (status != IB_INVALID_PARAMETER) + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_sminfo_record_request: ERR 008D: " + "ib_query failed (%s)\n", ib_get_err_str( status ) ); + } + if( status == IB_REMOTE_ERROR ) + { + p_mad = osm_madw_get_mad_ptr( p_context->result.p_result_madw ); + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_sminfo_record_request: " + "Remote error = %s\n", + ib_get_mad_status_str( p_mad )); + + status = (ib_net16_t) (p_mad->status & IB_SMP_STATUS_MASK ); + } + goto Exit; + } + + Exit: + OSM_LOG_EXIT( &p_osmt->log ); + return ( status ); +} #endif /********************************************************************** @@ -5451,6 +5529,12 @@ osmtest_validate_against_db( IN osmtest_ if ( status != IB_SUCCESS ) goto Exit; + /* SMInfoRecord test */ + memset( &context, 0, sizeof( context ) ); + status = osmtest_sminfo_record_request( p_osmt, &context ); + if ( status != IB_SUCCESS ) + goto Exit; + if (lmc != 
0) { test_lid = cl_ntoh16( p_osmt->local_port.lid + 1 ); From halr at voltaire.com Wed Jul 5 11:15:04 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Jul 2006 14:15:04 -0400 Subject: [openib-general] [PATCH] OpenSM/osm_sa_sminfo_record.c: In osm_smir_rcv_process, move pkey match to be ahead of allocating response MAD which actually eliminates the leak of the response MAD on pkey not matching Message-ID: <1152123299.4533.145507.camel@hal.voltaire.com> OpenSM/osm_sa_sminfo_record.c: In osm_smir_rcv_process, move pkey match to be ahead of allocating response MAD which actually eliminates the leak of the response MAD on pkey not matching Signed-off-by: Hal Rosenstock Index: opensm/osm_sa_sminfo_record.c =================================================================== --- opensm/osm_sa_sminfo_record.c (revision 8391) +++ opensm/osm_sa_sminfo_record.c (working copy) @@ -172,6 +172,19 @@ osm_smir_rcv_process( goto Exit; } + /* check the matching of pkeys with the local physp the SM is on. */ + local_guid = p_rcv->p_subn->sm_port_guid; + local_port = (osm_port_t*)cl_qmap_get( &p_rcv->p_subn->port_guid_tbl, local_guid ); + if (FALSE == + osm_physp_share_pkey( p_rcv->p_log, p_req_physp, + osm_port_get_default_phys_ptr( local_port ) ) ) + { + osm_log(p_rcv->p_log, OSM_LOG_ERROR, + "osm_smir_rcv_process: ERR 2802: " + "Cannot get SMInfo record due to pkey violation\n" ); + goto Exit; + } + /* * Get a MAD to reply. Address of Mad is in the received mad_wrapper */ @@ -187,19 +200,6 @@ osm_smir_rcv_process( goto Exit; } - /* check the matching of pkeys with the local physp the SM is on. 
*/ - local_guid = p_rcv->p_subn->sm_port_guid; - local_port = (osm_port_t*)cl_qmap_get( &p_rcv->p_subn->port_guid_tbl, local_guid ); - if (FALSE == - osm_physp_share_pkey( p_rcv->p_log, p_req_physp, - osm_port_get_default_phys_ptr( local_port ) ) ) - { - osm_log(p_rcv->p_log, OSM_LOG_ERROR, - "osm_smir_rcv_process: ERR 2802: " - "Cannot get SMInfo record due to pkey violation\n" ); - goto Exit; - } - p_resp_sa_mad = osm_madw_get_sa_mad_ptr( p_resp_madw ); p_resp_sminfo_rec = (ib_sminfo_record_t*)ib_sa_mad_get_payload_ptr( p_resp_sa_mad ); From halr at voltaire.com Wed Jul 5 11:27:36 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Jul 2006 14:27:36 -0400 Subject: [openib-general] [PATCH] OpenSM/osm_sa_sminfo_record.c: In osm_smir_rcv_process, send response as RMPP if request was GetTable Message-ID: <1152124056.4533.146084.camel@hal.voltaire.com> OpenSM/osm_sa_sminfo_record.c: In osm_smir_rcv_process, send response as RMPP if request was GetTable Signed-off-by: Hal Rosenstock Index: opensm/osm_sa_sminfo_record.c =================================================================== --- opensm/osm_sa_sminfo_record.c (revision 8392) +++ opensm/osm_sa_sminfo_record.c (working copy) @@ -224,10 +224,15 @@ osm_smir_rcv_process( /* Copy the MAD header back into the response mad */ memcpy( p_resp_sa_mad, p_sa_mad, IB_SA_MAD_HDR_SIZE ); + if( p_resp_sa_mad->method == IB_MAD_METHOD_GETTABLE ) + { + p_resp_sa_mad->rmpp_flags = IB_RMPP_FLAG_ACTIVE; + /* Fill in the offset (paylen will be done by the rmpp SAR) */ + p_resp_sa_mad->attr_offset = + ib_get_attr_offset( sizeof(ib_sminfo_record_t) ); + } + p_resp_sa_mad->method |= IB_MAD_METHOD_RESP_MASK; - /* Fill in the offset (paylen will be done by the rmpp SAR) */ - p_resp_sa_mad->attr_offset = - ib_get_attr_offset( sizeof(ib_sminfo_record_t) ); /* C15-0.1.5 - always return SM_Key = 0 (table 185 p 884) */ p_resp_sa_mad->sm_key = 0; From swise at opengridcomputing.com Wed Jul 5 11:54:39 2006 From: swise at 
opengridcomputing.com (Steve Wise) Date: Wed, 05 Jul 2006 13:54:39 -0500 Subject: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bw to allow use of RDMA CM In-Reply-To: <20060705180231.GA22390@mellanox.co.il> References: <44ABB136.30307@in.ibm.com> <20060705180231.GA22390@mellanox.co.il> Message-ID: <1152125679.12325.13.camel@stevo-desktop> On Wed, 2006-07-05 at 21:02 +0300, Michael S. Tsirkin wrote: > Quoting r. Pradipta Kumar Banerjee : > > Subject: Re: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bw to allow use of RDMA CM > > > > Michael S. Tsirkin wrote: > > > Thanks for following up on this, on the surface looks good. > > > I'm busy at the moment, but a quick question: > > > > > > Quoting r. Pradipta Kumar Banerjee : > > >> +static void pp_wait_for_done(struct pingpong_context *ctx) > > >> +{ > > >> + struct ibv_wc wc; > > >> + int ne; > > >> + > > >> + do { > > >> + usleep(500); > > >> + ne = ibv_poll_cq(ctx->rcq, 1, &wc); > > >> + } while (ne == 0); > > > > > > I wander whether we can get rid of the rcq and an extra send at the > > > end by simply closing the QP, and ignoring the possible completion with error > > > for the last round. I think this would be a nice simplification. > > > What do you think? > > > > > Indeed we can get rid of the extra send at the end (pp_wait_for_done and > > pp_send_done can be removed). As for the rcq need to think about it. Its being > > used to synchronize the client and the server so that the client sends the first > > message. > > Why do we need an RC send for this? CM or socket message should be enough, no? > There's no CM message for this. We could use the socket/tcp connection, but the current patch doesn't even set up the TCP connection if you use the --cma flag. So the best and easiest thing to do is use SEND. Why is this an issue? What's wrong with using the QP send/recv to sync up? Steve..
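The pp_wait_for_done loop quoted above polls the receive CQ until a completion shows up, so a lost sync message would leave the test spinning forever. A minimal sketch of a bounded variant follows. It is not taken from the patch: poll_once_fn is a hypothetical stand-in for a call such as ibv_poll_cq(ctx->rcq, 1, &wc), and struct fake_cq exists only so the helper can be tried without hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ibv_poll_cq(cq, 1, &wc): returns the number of
 * completions found (0 if none yet) or a negative value on error. */
typedef int (*poll_once_fn)(void *cq);

/* Poll until a completion or an error shows up, giving up after max_polls
 * attempts. Returns >0 on completion, <0 on poll error, 0 on timeout. */
static int wait_for_done_bounded(poll_once_fn poll_once, void *cq,
                                 unsigned max_polls)
{
    unsigned i;

    assert(poll_once != NULL);
    for (i = 0; i < max_polls; i++) {
        int ne = poll_once(cq);

        if (ne != 0)
            return ne;
        /* The quoted loop pauses with usleep(500) around each poll;
         * the pause is left out here to keep the sketch portable. */
    }
    return 0;
}

/* Fake completion queue (assumption, for illustration only):
 * reports one completion after polls_left empty polls. */
struct fake_cq {
    unsigned polls_left;
};

static int fake_poll_once(void *cq)
{
    struct fake_cq *f = cq;

    if (f->polls_left == 0)
        return 1;
    f->polls_left--;
    return 0;
}
```

A caller would treat a return of 0 as "peer never sent its sync message" and a negative return as a CQ error, instead of looping indefinitely.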
From sean.hefty at intel.com Wed Jul 5 12:16:30 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 5 Jul 2006 12:16:30 -0700 Subject: [openib-general] ucma into kernel.org In-Reply-To: <44A8EC1D.9010605@voltaire.com> Message-ID: <000601c6a067$856bada0$17d0180a@amr.corp.intel.com> >Sean - looking on the cma/ucma APIs i see that the kernel APIs are not >in place yet (eg the equivalent of rdma_set/get_options) or in the >kernel the CMA consumer is expected to call directly the APIs exported >by rdma_cm_ib.h? Kernel clients are expected to call the APIs directly. - Sean From mst at mellanox.co.il Wed Jul 5 12:13:03 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 5 Jul 2006 22:13:03 +0300 Subject: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bwto allow use of RDMA CM In-Reply-To: <1152125679.12325.13.camel@stevo-desktop> References: <1152125679.12325.13.camel@stevo-desktop> Message-ID: <20060705191303.GA22777@mellanox.co.il> Quoting r. Steve Wise : > Why is this an issue? What's wrong with using the QP send/recv to sync up? I'd like code to be as simple as possible. I'm not developing a real protocol here, remember. What is the message doing, again? -- MST From swise at opengridcomputing.com Wed Jul 5 12:26:45 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 05 Jul 2006 14:26:45 -0500 Subject: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bwto allow use of RDMA CM In-Reply-To: <20060705191303.GA22777@mellanox.co.il> References: <1152125679.12325.13.camel@stevo-desktop> <20060705191303.GA22777@mellanox.co.il> Message-ID: <1152127605.12325.24.camel@stevo-desktop> On Wed, 2006-07-05 at 22:13 +0300, Michael S. Tsirkin wrote: > Quoting r. Steve Wise : > > Why is this an issue? What's wrong with using the QP send/recv to sync up? > > I'd like code to be as simple as possible. I'm not developing a real protocol > here, remember. What is the message doing, again? 
> It allows the client and server to sync up indicating both sides are done running the test. From mst at mellanox.co.il Wed Jul 5 12:28:50 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 5 Jul 2006 22:28:50 +0300 Subject: [openib-general] ucma into kernel.org In-Reply-To: <000601c6a067$856bada0$17d0180a@amr.corp.intel.com> References: <000601c6a067$856bada0$17d0180a@amr.corp.intel.com> Message-ID: <20060705192850.GA23077@mellanox.co.il> Quoting r. Sean Hefty : > Subject: RE: [openib-general] ucma into kernel.org > > >Sean - looking on the cma/ucma APIs i see that the kernel APIs are not > >in place yet (eg the equivalent of rdma_set/get_options) or in the > >kernel the CMA consumer is expected to call directly the APIs exported > >by rdma_cm_ib.h? > > Kernel clients are expected to call the APIs directly. But are the options IB specific? Can they be geeralized to work for all transports? -- MST From mst at mellanox.co.il Wed Jul 5 12:30:27 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 5 Jul 2006 22:30:27 +0300 Subject: [openib-general] [PATCH 1/2] perftest: enhancement tordma_bwto allow use of RDMA CM In-Reply-To: <1152127605.12325.24.camel@stevo-desktop> References: <1152127605.12325.24.camel@stevo-desktop> Message-ID: <20060705193026.GB23077@mellanox.co.il> Quoting r. Steve Wise : > > I'd like code to be as simple as possible. I'm not developing a real > > protocol here, remember. What is the message doing, again? > > > > It allows the client and server to sync up indicating both sides are > done running the test. > We just agreed with Pradipta Kumar Banerjee this message is not needed at all - they can just exit. I think there was an additional message at the beginning, that's what we were discussing. -- MST From troy at scl.ameslab.gov Wed Jul 5 17:13:25 2006 From: troy at scl.ameslab.gov (Troy Benjegerdes) Date: Wed, 5 Jul 2006 19:13:25 -0500 Subject: [openib-general] libehca.conf required, but no documentation.. 
Message-ID: <20060706001325.GA6865@minbar-g5.scl.ameslab.gov> Please provide some documentation for libehca.conf. There is not even a README in userspace/libehca. Also, what firmware are the people doing ehca development using? Most versions I have tried seem to have some sort of nasty issue if the SM doesn't bring the port active fast enough, which seems like some kind of fundamental design flaw to me. Could someone please explain this so I understand what is supposed to be happening? From sean.hefty at intel.com Wed Jul 5 21:09:08 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 5 Jul 2006 21:09:08 -0700 Subject: [openib-general] ucma into kernel.org In-Reply-To: <20060705192850.GA23077@mellanox.co.il> Message-ID: <000001c6a0b1$ee316d30$2bd1180a@amr.corp.intel.com> >But are the options IB specific? Can they be geeralized to work for all >transports? The options in rdma_cm_ib are IB specific (get/set path records and IB CM timeout values). - Sean From mst at mellanox.co.il Wed Jul 5 21:23:44 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 6 Jul 2006 07:23:44 +0300 Subject: [openib-general] ucma into kernel.org In-Reply-To: <000001c6a0b1$ee316d30$2bd1180a@amr.corp.intel.com> References: <000001c6a0b1$ee316d30$2bd1180a@amr.corp.intel.com> Message-ID: <20060706042344.GA5453@mellanox.co.il> Quoting r. Sean Hefty : > Subject: RE: [openib-general] ucma into kernel.org > > >But are the options IB specific? Can they be geeralized to work for all > >transports? > > The options in rdma_cm_ib are IB specific (get/set path records and IB CM > timeout values). Timeout values and retry count seem like something that is generic enough. At least for retry count TCP has TCP_SYNCNT, so this looks generic to me. No? With regard to timeout, can CMA use the ARP roundtrip time to estimate it, instead of asking the user or assuming hardcoded values?
-- MST From ogerlitz at voltaire.com Wed Jul 5 21:52:48 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 06 Jul 2006 07:52:48 +0300 Subject: [openib-general] ucma into kernel.org In-Reply-To: <20060706042344.GA5453@mellanox.co.il> References: <000001c6a0b1$ee316d30$2bd1180a@amr.corp.intel.com> <20060706042344.GA5453@mellanox.co.il> Message-ID: <44AC9720.8040104@voltaire.com> Michael S. Tsirkin wrote: > With regard to timeout, can CMA use the ARP roundtrip time to estimate it, > instead of asking the user or assuming hardcoded values? ARP resolution might be cached by the OS - so the CMA can not count on the round trip time Or. From bpradip at in.ibm.com Wed Jul 5 23:06:56 2006 From: bpradip at in.ibm.com (Pradipta Kumar Banerjee) Date: Thu, 06 Jul 2006 11:36:56 +0530 Subject: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bw to allow use of RDMA CM In-Reply-To: <1152105850.2316.3.camel@stevo-desktop> References: <20060705093835.GB15470@harry-potter.in.ibm.com> <20060705112913.GB21557@mellanox.co.il> <1152105850.2316.3.camel@stevo-desktop> Message-ID: <44ACA880.5090200@in.ibm.com> Steve Wise wrote: > On Wed, 2006-07-05 at 14:29 +0300, Michael S. Tsirkin wrote: >> Thanks for following up on this, on the surface looks good. >> I'm busy at the moment, but a quick question: >> >> Quoting r. Pradipta Kumar Banerjee : >>> +static void pp_wait_for_done(struct pingpong_context *ctx) >>> +{ >>> + struct ibv_wc wc; >>> + int ne; >>> + >>> + do { >>> + usleep(500); >>> + ne = ibv_poll_cq(ctx->rcq, 1, &wc); >>> + } while (ne == 0); >> I wander whether we can get rid of the rcq and an extra send at the >> end by simply closing the QP, and ignoring the possible completion with error >> for the last round. I think this would be a nice simplification. >> What do you think? >> > > For IWARP, the ULP must quiesce the QP before moving it to CLOSING (ie > before doing an rdma_disconnect()). 
By quiesce, I mean make sure there > are no outstanding SQ WRs and no outstanding RDMA reads in progress. > That's why there is this message exchange at the end of the test. > > For instance, in the bi-directional bandwidth tests, one side could > finish the test and issue an rdma_disconnect() while the other end still > has pending rdma writes. This is an error condition in iWARP. So they > sync up at the end with a message exchange before disconnecting. > > Make sense? Steve, Is it necessary to handle this error condition for measuring bandwidth ? Once the bandwidth computation is done, let the application terminate, which should anyway flush the queues. Thanks, Pradipta Kumar. > > > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From mst at mellanox.co.il Wed Jul 5 23:20:51 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 6 Jul 2006 09:20:51 +0300 Subject: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bw to allow use of RDMA CM In-Reply-To: <1152105850.2316.3.camel@stevo-desktop> References: <20060705093835.GB15470@harry-potter.in.ibm.com> <20060705112913.GB21557@mellanox.co.il> <1152105850.2316.3.camel@stevo-desktop> Message-ID: <20060706062051.GB5874@mellanox.co.il> Quoting r. Steve Wise : > For instance, in the bi-directional bandwidth tests, one side could > finish the test and issue an rdma_disconnect() while the other end still > has pending rdma writes. This is an error condition in iWARP. So they > sync up at the end with a message exchange before disconnecting. > > Make sense? I see. Is this the only message exchange there is? 
-- MST From dotanb at mellanox.co.il Wed Jul 5 23:43:54 2006 From: dotanb at mellanox.co.il (Dotan Barak) Date: Thu, 6 Jul 2006 09:43:54 +0300 Subject: [openib-general] Error messages for ibverbs failures In-Reply-To: <44ABE589.704@evergrid.com> References: <44ABE589.704@evergrid.com> Message-ID: <200607060943.54850.dotanb@mellanox.co.il> Hi Mike. On Wednesday 05 July 2006 19:15, Mike Heffner wrote: > Is there a way to get detailed error messages for failures in the > ibverbs API? Something similar to perror/strerror or VAPI_strerror from > VAPI? As far as I know, there isn't any way to get a detailed description of the problem. Sometimes, when a verb returns an error, errno contains the cause (this depends on where the error occurred: user space or kernel). There aren't any IB-specific error messages in errno; for example, EINVAL can mean an invalid LID, port, or any other parameter ... Dotan From bpradip at in.ibm.com Wed Jul 5 23:50:28 2006 From: bpradip at in.ibm.com (Pradipta Kumar Banerjee) Date: Thu, 06 Jul 2006 12:20:28 +0530 Subject: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bw to allow use of RDMA CM In-Reply-To: <20060705180231.GA22390@mellanox.co.il> References: <44ABB136.30307@in.ibm.com> <20060705180231.GA22390@mellanox.co.il> Message-ID: <44ACB2B4.9040306@in.ibm.com> Michael S. Tsirkin wrote: > Quoting r. Pradipta Kumar Banerjee : >> Subject: Re: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bw to allow use of RDMA CM >> >> Michael S. Tsirkin wrote: >>> Thanks for following up on this, on the surface looks good. >>> I'm busy at the moment, but a quick question: >>> >>> Quoting r.
Pradipta Kumar Banerjee : >>>> +static void pp_wait_for_done(struct pingpong_context *ctx) >>>> +{ >>>> + struct ibv_wc wc; >>>> + int ne; >>>> + >>>> + do { >>>> + usleep(500); >>>> + ne = ibv_poll_cq(ctx->rcq, 1, &wc); >>>> + } while (ne == 0); >>> I wander whether we can get rid of the rcq and an extra send at the >>> end by simply closing the QP, and ignoring the possible completion with error >>> for the last round. I think this would be a nice simplification. >>> What do you think? >>> >> Indeed we can get rid of the extra send at the end (pp_wait_for_done and >> pp_send_done can be removed). As for the rcq need to think about it. Its being >> used to synchronize the client and the server so that the client sends the first >> message. > > Why do we need an RC send for this? CM or socket message should be enough, no? > Here is a case where socket message doesn't suffice - Ammasso iWARP cards create two interfaces - one the usual ethernet interface (ethX) and another iWARP interface (iwX). The RDMA protocol is run over the iWARP interface which doesn't allow for normal socket based communication. So if we want to use sockets for exchanging the setup info we need to add one more option to the rdma_bw.c, which will allow to use the IP addr bound to the ethX interface for socket based communication and the IP addr bound to the iWARP interface for RDMA operations. IMO using rcq seems to be a generic and better solution. What do you say ? Thanks, Pradipta Kumar. From mst at mellanox.co.il Thu Jul 6 00:16:41 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 6 Jul 2006 10:16:41 +0300 Subject: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bw to allow use of RDMA CM In-Reply-To: <44ACB2B4.9040306@in.ibm.com> References: <44ACB2B4.9040306@in.ibm.com> Message-ID: <20060706071641.GA7338@mellanox.co.il> Quoting r. Pradipta Kumar Banerjee : > IMO using rcq seems to be a generic and better solution. Hmm, I see. 
Need to document the message format then. We are only passing the vaddr there, right? -- MST From mst at mellanox.co.il Thu Jul 6 00:51:37 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 6 Jul 2006 10:51:37 +0300 Subject: [openib-general] ucma into kernel.org In-Reply-To: <44AC9720.8040104@voltaire.com> References: <44AC9720.8040104@voltaire.com> Message-ID: <20060706075137.GB7338@mellanox.co.il> Quoting r. Or Gerlitz : > > With regard to timeout, can CMA use the ARP roundtrip time to estimate it, > > instead of asking the user or assuming hardcoded values? > > ARP resolution might be cached by the OS - so the CMA can not count on > the round trip time It's a problem, I agree, but hard-coding timeouts still does not make sense to me - I honestly don't see how an application will know which value to use here, since the roundtrip really depends on the topology. Any ideas on how this can be handled correctly? Does CMA at least back off exponentially on timeout? -- MST From mst at mellanox.co.il Thu Jul 6 00:58:01 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 6 Jul 2006 10:58:01 +0300 Subject: [openib-general] ucma into kernel.org In-Reply-To: <20060706042344.GA5453@mellanox.co.il> References: <20060706042344.GA5453@mellanox.co.il> Message-ID: <20060706075801.GC7338@mellanox.co.il> Quoting r. Michael S. Tsirkin : > > The options in rdma_cm_ib are IB specific (get/set path records and IB CM > > timeout values). > > Timeout values and retry count seem like something that is generic enough. > At least for retry count TCP has TCP_SYNCNT, so this looks generic to me. No? Thinking about this some more, how about changing max_cm_retries to max_request_retries, and adding this in rdma_cm as a generic option? Sean? I think we also need to do something about the timeouts, but I don't yet have any ideas on what this could be, so this had probably better stay IB specific for now.
-- MST From ogerlitz at voltaire.com Thu Jul 6 01:05:43 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 06 Jul 2006 11:05:43 +0300 Subject: [openib-general] ucma into kernel.org In-Reply-To: <20060706075137.GB7338@mellanox.co.il> References: <44AC9720.8040104@voltaire.com> <20060706075137.GB7338@mellanox.co.il> Message-ID: <44ACC457.9000707@voltaire.com> Michael S. Tsirkin wrote: > It's a problem, I agree, but hard-coding timeouts still does not make sense > to me - I honestly don't see how will an application know which value to > use here, since the roundtrip really depends on the topology. > Any ideas on how this can be handled correctly? Does CMA at least back off > exponentially on timeout? From our experience on order K nodes cluster, we did not have issues with CM traffic, but: the CM traffic was not NxN but rather NxM where N was (say) 1K and M was (say) 16, the app being cluster file system - Lustre /VIBNAL which is the Lustre IB layer for the voltaire gen1 stack. As for NxN CM/CMA consumers, i recall it has been mentioned on this list that CM timeouts/retries had to be changed to have (say) N=128 nodes (ranks?) operating fine with Intel MPI using uDAPL. Sean - have you been into the loop of analyzing /debugging @ this site? Can you confirm **this** was the issue which made the setup broken and working when you enlarged/changed things (what? and from which value to which value?) Without any relevant (non) use case i don't think there's a need to spend energy on code to generate the correct timeouts/retries for this or that setting. Or. From mst at mellanox.co.il Thu Jul 6 01:19:42 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 6 Jul 2006 11:19:42 +0300 Subject: [openib-general] ucma into kernel.org In-Reply-To: <44ACC457.9000707@voltaire.com> References: <44ACC457.9000707@voltaire.com> Message-ID: <20060706081942.GA7925@mellanox.co.il> Quoting r. Or Gerlitz : > Subject: Re: ucma into kernel.org > > Michael S. 
Tsirkin wrote: > > It's a problem, I agree, but hard-coding timeouts still does not make sense > > to me - I honestly don't see how will an application know which value to > > use here, since the roundtrip really depends on the topology. > > > Any ideas on how this can be handled correctly? Does CMA at least back off > > exponentially on timeout? > > From our experience on order K nodes cluster, we did not have issues > with CM traffic, but: the CM traffic was not NxN but rather NxM where N > was (say) 1K and M was (say) 16, the app being cluster file system - > Lustre /VIBNAL which is the Lustre IB layer for the voltaire gen1 stack. Not sure what you mean by "did not have issues with CM traffic". Did no packets get lost? Did you run any other traffic on the same fabric, concurrently? I also don't really see how do gen1 tests have any bearing on gen2 CMA. > As for NxN CM/CMA consumers, i recall it has been mentioned on this list > that CM timeouts/retries had to be changed to have (say) N=128 nodes > (ranks?) operating fine with Intel MPI using uDAPL. > > Sean - have you been into the loop of analyzing /debugging @ this site? > > Can you confirm **this** was the issue which made the setup broken and > working when you enlarged/changed things (what? and from which value to > which value?) What I am saying that giving the application control over the timeouts seems more like a workaround than a solution. > Without any relevant (non) use case i don't think there's a need to > spend energy on code to generate the correct timeouts/retries for this > or that setting. I think apps already have control over retry count - witness TCP_SYNCNT. As for the timeouts - I think you are right that's why we need something adaptive, users won't have the energy to tune these per network/application. 
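To make the "back off exponentially" and "something adaptive" idea concrete, here is a minimal sketch of one possible retry schedule. It is an assumption for illustration, not a description of what the gen2 CM or CMA does. It relies only on the InfiniBand timeout encoding, where a 5-bit exponent n stands for 4.096 us * 2^n; the function names and the cap policy are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* InfiniBand encodes these timeouts as a 5-bit exponent n,
 * meaning 4.096 us * 2^n. 4.096 us is 4096 ns. */
static uint64_t ib_timeout_ns(unsigned exponent)
{
    assert(exponent <= 31);
    return (uint64_t)4096 << exponent;
}

/* Hypothetical policy: every retry doubles the wait by bumping the
 * exponent, and the exponent is capped at max_exponent. */
static uint64_t backoff_timeout_ns(unsigned base_exponent, unsigned attempt,
                                   unsigned max_exponent)
{
    unsigned exponent;

    assert(base_exponent <= max_exponent && max_exponent <= 31);
    if (attempt > max_exponent - base_exponent)
        exponent = max_exponent;
    else
        exponent = base_exponent + attempt;
    return ib_timeout_ns(exponent);
}
```

With base_exponent 10 and max_exponent 14, the first attempt waits about 4.2 ms, each retry doubles that, and the wait stops growing at about 67 ms. An application would then only pick a starting point and a ceiling rather than one fixed value that has to suit every topology.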
-- MST From ogerlitz at voltaire.com Thu Jul 6 01:34:09 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 06 Jul 2006 11:34:09 +0300 Subject: [openib-general] ucma into kernel.org In-Reply-To: <20060706081942.GA7925@mellanox.co.il> References: <44ACC457.9000707@voltaire.com> <20060706081942.GA7925@mellanox.co.il> Message-ID: <44ACCB01.1080709@voltaire.com> Michael S. Tsirkin wrote: > Quoting r. Or Gerlitz : >> From our experience on order K nodes cluster, we did not have issues >> with CM traffic, but: the CM traffic was not NxN but rather NxM where N >> was (say) 1K and M was (say) 16, the app being cluster file system - >> Lustre /VIBNAL which is the Lustre IB layer for the voltaire gen1 stack. > > Not sure what you mean by "did not have issues with CM traffic". Did no packets > get lost? Did you run any other traffic on the same fabric, concurrently? I > also don't really see how do gen1 tests have any bearing on gen2 CMA. The relevancy is simple: + it is IB + there are order K IB CM connections + it is in production, with the main IB apps being MPI && cluster fs I can't tell if part of the acceptance was "reconnecting" the file system with much other IB traffic in the air. Or. From mst at mellanox.co.il Thu Jul 6 01:57:24 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 6 Jul 2006 11:57:24 +0300 Subject: [openib-general] ucma into kernel.org In-Reply-To: <44ACCB01.1080709@voltaire.com> References: <44ACCB01.1080709@voltaire.com> Message-ID: <20060706085724.GA7969@mellanox.co.il> Quoting r. Or Gerlitz : > Subject: Re: ucma into kernel.org > > Michael S. Tsirkin wrote: > > Quoting r. Or Gerlitz : > >> From our experience on order K nodes cluster, we did not have issues > >> with CM traffic, but: the CM traffic was not NxN but rather NxM where N > >> was (say) 1K and M was (say) 16, the app being cluster file system - > >> Lustre /VIBNAL which is the Lustre IB layer for the voltaire gen1 stack. 
> > > > Not sure what you mean by "did not have issues with CM traffic". Did no > > packets get lost? Did you run any other traffic on the same fabric, > > concurrently? I also don't really see how do gen1 tests have any bearing on > > gen2 CMA. > > The relevancy is simple: > > + it is IB > + there are order K IB CM connections > + it is in production, with the main IB apps being MPI && cluster fs > > I can't tell if part of the acceptance was "reconnecting" the file > system with much other IB traffic in the air. > > Or. So what are you saying? What did you learn from this system? What does this say about CMA timeouts? That any timeout value is as good as any other? That packets are never lost? This is the part that I am not getting. -- MST From ogerlitz at voltaire.com Thu Jul 6 02:13:49 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 06 Jul 2006 12:13:49 +0300 Subject: [openib-general] ucma into kernel.org In-Reply-To: <20060706085724.GA7969@mellanox.co.il> References: <44ACCB01.1080709@voltaire.com> <20060706085724.GA7969@mellanox.co.il> Message-ID: <44ACD44D.3020209@voltaire.com> Michael S. Tsirkin wrote: > So what are you saying? What did you learn from this system? What does this say > about CMA timeouts? That any timeout value is as good as any other? That > packets are never lost? This is the part that I am not getting. I have learned that without tuning the CM timeouts/retries running N x m (m <= M) CM re-connection-ing in parallel worked fine (eg it was part of the acceptance to kill a lustre server and have the all the clients reconnect to the ghost of this server). I can check the values if anyone can think it may be of help. Or. From mst at mellanox.co.il Thu Jul 6 02:18:36 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 6 Jul 2006 12:18:36 +0300 Subject: [openib-general] ucma into kernel.org In-Reply-To: <44ACD44D.3020209@voltaire.com> References: <44ACD44D.3020209@voltaire.com> Message-ID: <20060706091836.GB8925@mellanox.co.il> Quoting r. Or Gerlitz : > Subject: Re: ucma into kernel.org > > Michael S. Tsirkin wrote: > > So what are you saying? What did you learn from this system? What does this > > say about CMA timeouts? That any timeout value is as good as any other? That > > packets are never lost? This is the part that I am not getting. > > I have learned that without tuning the CM timeouts/retries running > N x m (m <= M) CM re-connection-ing in parallel worked fine (eg it was > part of the acceptance to kill a lustre server and have the all the > clients reconnect to the ghost of this server). OK, so that's one application. But this does not tell us much unless we know how gen1 CM managed timeouts as compared to gen2, does it? > I can check the values if anyone can think it may be of help. I guess we need this info if we are to learn anything from your setup. -- MST From bpradip at in.ibm.com Thu Jul 6 02:23:30 2006 From: bpradip at in.ibm.com (Pradipta Kumar Banerjee) Date: Thu, 06 Jul 2006 14:53:30 +0530 Subject: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bw to allow use of RDMA CM In-Reply-To: <20060706071641.GA7338@mellanox.co.il> References: <44ACB2B4.9040306@in.ibm.com> <20060706071641.GA7338@mellanox.co.il> Message-ID: <44ACD692.1010802@in.ibm.com> Michael S. Tsirkin wrote: > Quoting r. Pradipta Kumar Banerjee : >> IMO using rcq seems to be a generic and better solution. > > Hmm, I see. Need to document the message format then. > We are only pasing the vaddr there, right? > Michael, Actually 'rcq' is being used for handling the 'start' and 'done' messages. As for the lid, qpn, psn, rkey and vaddr, these gets exchanged as part of the rdma_listen/rdma_connect calls. See pp_server_connect and pp_client_connect. 
OTOH I tried testing rdma_bw on Ammasso iWARP without exchanging the 'start' and 'done' messages and it worked. I am not sure if this is the right thing to do. Maybe Steve can throw more light on this. Thanks, Pradipta From yipeeyipeeyipeeyipee at yahoo.com Thu Jul 6 04:01:36 2006 From: yipeeyipeeyipeeyipee at yahoo.com (yipee) Date: Thu, 6 Jul 2006 11:01:36 +0000 (UTC) Subject: [openib-general] node_guid and ports guid Message-ID: Hi, Is there any defined relation between a node's guid and its ports' guids? On my HCAs I see that the 2 least significant bits of the node guids are always zero and port 1's guid equals the node guid + 1. Does this always have to be the case? Thanks, y From ftillier at silverstorm.com Thu Jul 6 04:19:56 2006 From: ftillier at silverstorm.com (Fabian Tillier) Date: Thu, 6 Jul 2006 07:19:56 -0400 Subject: [openib-general] node_guid and ports guid In-Reply-To: References: Message-ID: <79ae2f320607060419o175d66d1wab2c7b680359c6c4@mail.gmail.com> On 7/6/06, yipee wrote: > Hi, > > Is there any defined relation between a node's guid and its ports' guids? > On my HCAs I see that the 2 least significant bits of the node guids are always zero and port > 1's guid equals the node guid + 1. Does this always have to be the case? No, GUID assignments are up to the vendor - the only requirement is that the OUI of the vendor be the first 3 bytes. The next 5 bytes can be anything the vendor decides.
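As a rough illustration of the layout described above, here is a small C sketch that splits a 64-bit GUID into the 3-byte vendor OUI and the 5-byte vendor-assigned part. It assumes the GUID is already in host byte order; the helper names and the sample GUID values are made up for the example and do not come from any OpenIB header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not part of any OpenIB header. The GUID is
 * assumed to be in host byte order (i.e. already converted from the
 * big-endian wire format). */

/* Top 3 bytes: the vendor's OUI. */
static inline uint32_t guid_oui(uint64_t guid)
{
	return (uint32_t)(guid >> 40);
}

/* Low 5 bytes: whatever the vendor chose to put there. */
static inline uint64_t guid_vendor_part(uint64_t guid)
{
	return guid & 0xFFFFFFFFFFULL;
}

/* Two GUIDs belong to the same vendor range if their OUIs match;
 * nothing else about their relation is guaranteed. */
static inline int guid_same_vendor(uint64_t a, uint64_t b)
{
	return guid_oui(a) == guid_oui(b);
}

/* Example with made-up GUIDs: a node GUID and a "node GUID + 1" port
 * GUID share an OUI, but that +1 pattern is only a vendor convention. */
static inline void guid_example(void)
{
	uint64_t node = 0x0002c90200001234ULL;
	uint64_t port = node + 1;

	assert(guid_oui(node) == 0x0002c9);
	assert(guid_vendor_part(node) == 0x0200001234ULL);
	assert(guid_same_vendor(node, port));
}
```

Code that needs to pair a port with its node should therefore use the reported node GUID and port GUIDs directly rather than deriving one from the other.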
- Fab From halr at voltaire.com Thu Jul 6 04:15:49 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Jul 2006 07:15:49 -0400 Subject: [openib-general] [PATCH] [MINOR] OpenSM/osm_helper: Add support for SMInfoRecord dumping Message-ID: <1152184357.4533.190813.camel@hal.voltaire.com> OpenSM/osm_helper: Add support for SMInfoRecord dumping Signed-off-by: Hal Rosenstock Index: include/opensm/osm_helper.h =================================================================== --- include/opensm/osm_helper.h (revision 8419) +++ include/opensm/osm_helper.h (working copy) @@ -271,6 +271,12 @@ osm_dump_inform_info_record( IN const osm_log_level_t log_level ); void +osm_dump_sm_info_record( + IN osm_log_t* const p_log, + IN const ib_sminfo_record_t* const p_smir, + IN const osm_log_level_t log_level ); + +void osm_dump_pkey_block( IN osm_log_t* const p_log, IN uint64_t port_guid, Index: opensm/osm_helper.c =================================================================== --- opensm/osm_helper.c (revision 8419) +++ opensm/osm_helper.c (working copy) @@ -1549,6 +1549,39 @@ osm_dump_sm_info( /********************************************************************** **********************************************************************/ void +osm_dump_sm_info_record( + IN osm_log_t* const p_log, + IN const ib_sminfo_record_t* const p_smir, + IN const osm_log_level_t log_level ) +{ + if( osm_log_is_active( p_log, log_level ) ) + { + osm_log( p_log, OSM_LOG_DEBUG, + "SMInfo Record dump:\n" + "\t\t\t\tRID\n" + "\t\t\t\tLid.....................0x%X\n" + "\t\t\t\tReserved................0x%X\n" + "\t\t\t\tSMInfo dump:\n" + "\t\t\t\tguid....................0x%016" PRIx64 "\n" + "\t\t\t\tsm_key..................0x%016" PRIx64 "\n" + "\t\t\t\tact_count...............%u\n" + "\t\t\t\tpriority................%u\n" + "\t\t\t\tsm_state................%u\n" + "", + cl_ntoh16( p_smir->lid ), + cl_ntoh16( p_smir->resv0 ), + cl_ntoh64( p_smir->sm_info.guid ), + cl_ntoh64( 
p_smir->sm_info.sm_key ), + cl_ntoh32( p_smir->sm_info.act_count ), + ib_sminfo_get_priority( &p_smir->sm_info ), + ib_sminfo_get_state( &p_smir->sm_info ) + ); + } +} + +/********************************************************************** + **********************************************************************/ +void osm_dump_notice( IN osm_log_t* const p_log, IN const ib_mad_notice_attr_t *p_ntci, Index: opensm/libopensm.map =================================================================== --- opensm/libopensm.map (revision 8419) +++ opensm/libopensm.map (working copy) @@ -32,6 +32,7 @@ OPENSM_1.0 { osm_dump_slvl_map_table; osm_dump_vl_arb_table; osm_dump_sm_info; + osm_dump_sm_info_record; osm_dump_notice; osm_dump_dr_smp; osm_dump_sa_mad; From halr at voltaire.com Thu Jul 6 04:19:07 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Jul 2006 07:19:07 -0400 Subject: [openib-general] node_guid and ports guid In-Reply-To: References: Message-ID: <1152184497.4533.190920.camel@hal.voltaire.com> On Thu, 2006-07-06 at 07:01, yipee wrote: > Hi, > > Is there any defined relation between a node's guid and its ports' guid? > On my HCA's I see that the 2 lsb bits of the node guids are always zero and port > 1 's guid equals the node guid + 1. Does this always has to be the case? No, it is not an IBA requirement although vendors may and do do something like this. -- Hal > > > Thanks, > y From mst at mellanox.co.il Thu Jul 6 07:31:35 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 6 Jul 2006 17:31:35 +0300 Subject: [openib-general] Kernel Oops related to IPoIB (multicast module?)
In-Reply-To: <44A19F6D.7060008@ichips.intel.com> References: <44A19F6D.7060008@ichips.intel.com> Message-ID: <20060706143134.GA10491@mellanox.co.il> Quoting r. Sean Hefty : > > The SA query interface always invokes a callback, regardless if a call > > succeeds. So if a call to ib_sa_mcmmember_rec_set() fails (which happens in > > this case because the SM is down), the user's callback is still invoked. > > The multicast module is coded assuming that an immediate failure does not > > result in a callback, so the callback is unexpected, which throws off the > > reference counting. > > I've committed a patch that should hopefully fix this problem. > > The problem was that a return code of 0 from the SA query calls should have > been treated as valid, rather than an error. I'm still seeing hangs on unload, unfortunately. This is headless box so I couldn't get any more data on this hang as yet. -- MST From swise at opengridcomputing.com Thu Jul 6 07:43:21 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 06 Jul 2006 09:43:21 -0500 Subject: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bw to allow use of RDMA CM In-Reply-To: <20060706062051.GB5874@mellanox.co.il> References: <20060705093835.GB15470@harry-potter.in.ibm.com> <20060705112913.GB21557@mellanox.co.il> <1152105850.2316.3.camel@stevo-desktop> <20060706062051.GB5874@mellanox.co.il> Message-ID: <1152197001.12579.4.camel@stevo-desktop> On Thu, 2006-07-06 at 09:20 +0300, Michael S. Tsirkin wrote: > Quoting r. Steve Wise : > > For instance, in the bi-directional bandwidth tests, one side could > > finish the test and issue an rdma_disconnect() while the other end still > > has pending rdma writes. This is an error condition in iWARP. So they > > sync up at the end with a message exchange before disconnecting. > > > > Make sense? > > I see. Is this the only message exchange there is? 
There are 2 message exchanges to comply with IWARP spec: 1) an exchange right after connection setup to force the client to be the first side to send an RDMA message. So the server waits for a RECV completion and the client posts a "lets start" SEND. 2) an exchange at the end of the test to ensure all RDMA operations are done on the QPs. Both client and server post a "I'm done" SEND and await a "I'm done" completion before disconnecting. Steve. From mst at mellanox.co.il Thu Jul 6 07:57:57 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 6 Jul 2006 17:57:57 +0300 Subject: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bw to allow use ofRDMA CM In-Reply-To: <1152197001.12579.4.camel@stevo-desktop> References: <1152197001.12579.4.camel@stevo-desktop> Message-ID: <20060706145757.GC10513@mellanox.co.il> Quoting r. Steve Wise : > Subject: Re: [PATCH 1/2] perftest: enhancement to rdma_bw to allow use ofRDMA CM > > On Thu, 2006-07-06 at 09:20 +0300, Michael S. Tsirkin wrote: > > Quoting r. Steve Wise : > > > For instance, in the bi-directional bandwidth tests, one side could > > > finish the test and issue an rdma_disconnect() while the other end still > > > has pending rdma writes. This is an error condition in iWARP. So they > > > sync up at the end with a message exchange before disconnecting. > > > > > > Make sense? > > > > I see. Is this the only message exchange there is? > > > There are 2 message exchanges to comply with IWARP spec: > > 1) an exchange right after connection setup to force the client to be > the first side to send an RDMA message. So the server waits for a RECV > completion and the client posts a "lets start" SEND. I don't get this one. Why can't we just look at command line arguments for this? Whoever got the remote IP is the client. What do you mean by "force client to be the first"? > 2) an exchange at the end of the test to ensure all RDMA operations are > done on the QPs. 
Both client and server post a "I'm done" SEND and > await a "I'm done" completion before disconnecting. Yes, I think I get this part, and I agree its best to do it this way for both IB and iWarp. -- MST From swise at opengridcomputing.com Thu Jul 6 08:34:28 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 06 Jul 2006 10:34:28 -0500 Subject: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bw to allow use ofRDMA CM In-Reply-To: <20060706145757.GC10513@mellanox.co.il> References: <1152197001.12579.4.camel@stevo-desktop> <20060706145757.GC10513@mellanox.co.il> Message-ID: <1152200068.12579.9.camel@stevo-desktop> On Thu, 2006-07-06 at 17:57 +0300, Michael S. Tsirkin wrote: > Quoting r. Steve Wise : > > Subject: Re: [PATCH 1/2] perftest: enhancement to rdma_bw to allow use ofRDMA CM > > > > On Thu, 2006-07-06 at 09:20 +0300, Michael S. Tsirkin wrote: > > > Quoting r. Steve Wise : > > > > For instance, in the bi-directional bandwidth tests, one side could > > > > finish the test and issue an rdma_disconnect() while the other end still > > > > has pending rdma writes. This is an error condition in iWARP. So they > > > > sync up at the end with a message exchange before disconnecting. > > > > > > > > Make sense? > > > > > > I see. Is this the only message exchange there is? > > > > > > There are 2 message exchanges to comply with IWARP spec: > > > > 1) an exchange right after connection setup to force the client to be > > the first side to send an RDMA message. So the server waits for a RECV > > completion and the client posts a "lets start" SEND. > > I don't get this one. Why can't we just look at command line arguments for > this? Whoever got the remote IP is the client. What do you mean by "force > client to be the first"? This is an iWARP spec/compliance issue. The client MUST send the first RDMA message. The server CANNOT send an RDMA message until one is received from the client. We've discussed this before on the openib list. 
(If you want gory details on this, I'll find the original thread where we discussed this). I'm not arguing that this is a reasonable restriction, but it is what it is. And if the perftests are going to run over rnics, they should have this code... Steve. From eitan at mellanox.co.il Thu Jul 6 08:53:48 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Thu, 6 Jul 2006 18:53:48 +0300 Subject: [openib-general] [PATCH] [MINOR] OpenSM/osm_helper: Add support for SMInfoRecorddumping Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30236898A@mtlexch01.mtl.com> Sure thanks Eitan Zahavi From mst at mellanox.co.il Thu Jul 6 08:57:12 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 6 Jul 2006 18:57:12 +0300 Subject: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bw to allow useofRDMA CM In-Reply-To: <1152200068.12579.9.camel@stevo-desktop> References: <1152200068.12579.9.camel@stevo-desktop> Message-ID: <20060706155712.GA10828@mellanox.co.il> Quoting r. Steve Wise : > Subject: Re: [PATCH 1/2] perftest: enhancement to rdma_bw to allow useofRDMA CM > > On Thu, 2006-07-06 at 17:57 +0300, Michael S. Tsirkin wrote: > > Quoting r. Steve Wise : > > > Subject: Re: [PATCH 1/2] perftest: enhancement to rdma_bw to allow use ofRDMA CM > > > > > > On Thu, 2006-07-06 at 09:20 +0300, Michael S. Tsirkin wrote: > > > > Quoting r. Steve Wise : > > > > > For instance, in the bi-directional bandwidth tests, one side could > > > > > finish the test and issue an rdma_disconnect() while the other end still > > > > > has pending rdma writes. This is an error condition in iWARP. So they > > > > > sync up at the end with a message exchange before disconnecting. > > > > > > > > > > Make sense? > > > > > > > > I see. Is this the only message exchange there is? > > > > > > > > > There are 2 message exchanges to comply with IWARP spec: > > > > > > 1) an exchange right after connection setup to force the client to be > > > the first side to send an RDMA message. 
So the server waits for a RECV > > > completion and the client posts a "lets start" SEND. > > > > I don't get this one. Why can't we just look at command line arguments for > > this? Whoever got the remote IP is the client. What do you mean by "force > > client to be the first"? > > This is an iWARP spec/compliance issue. The client MUST send the first > RDMA message. The server CANNOT send an RDMA message until one is > received from the client. We've discussed this before on the openib > list. So, what you do is post a 0-sized RDMA and then a SEND right after that as part of the setup? I see. > (If you want gory details on this, I'll find the original thread > where we discussed this). Might be an interesting read. -- MST From swise at opengridcomputing.com Thu Jul 6 09:17:43 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 06 Jul 2006 11:17:43 -0500 Subject: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bw to allow useofRDMA CM In-Reply-To: <20060706155712.GA10828@mellanox.co.il> References: <1152200068.12579.9.camel@stevo-desktop> <20060706155712.GA10828@mellanox.co.il> Message-ID: <1152202663.12579.16.camel@stevo-desktop> > So, what you do is post a 0-sized RDMA and then a SEND right after that > as part of the setup? I see. > > > (If you want gory details on this, I'll find the original thread > > where we discussed this). > > Might be an interesting read. Probably not ;-) here's the thread discussing this in the context of dapltest: http://thread.gmane.org/gmane.linux.drivers.openib/24778/focus=24778 Here's the MPA IETF draft that requires the active side to send the first RDMA FPDU: http://www.ietf.org/internet-drafts/draft-ietf-rddp-mpa-03.txt Steve. 
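The two sync points discussed in this thread can be modelled as a pair of predicates. The following is only a toy sketch under stated assumptions: it makes no verbs or rdma_cm calls, it is not the rdma_bw code, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the two sync points; no verbs calls are made and all
 * names are illustrative, not taken from perftest. */
enum role { ROLE_CLIENT, ROLE_SERVER };

struct conn_state {
	enum role role;
	bool start_sent;   /* client: our "start" SEND has been posted   */
	bool start_seen;   /* server: RECV completed for client's "start" */
	bool done_sent;    /* our "done" SEND has been posted             */
	bool done_seen;    /* RECV completed for the peer's "done"        */
};

/* Sync point 1: the active (client) side sends the first message; the
 * passive (server) side must not post RDMA until it has received it. */
static inline bool may_post_rdma(const struct conn_state *c)
{
	return c->role == ROLE_CLIENT ? c->start_sent : c->start_seen;
}

/* Sync point 2: disconnect only after both sides have said "done", so
 * that no RDMA operations are still pending on the other end. */
static inline bool may_disconnect(const struct conn_state *c)
{
	return c->done_sent && c->done_seen;
}

/* Walk a server through the sequence described above. */
static inline void handshake_example(void)
{
	struct conn_state srv = { .role = ROLE_SERVER };

	assert(!may_post_rdma(&srv));   /* nothing from the client yet */
	srv.start_seen = true;
	assert(may_post_rdma(&srv));

	srv.done_sent = true;
	assert(!may_disconnect(&srv));  /* peer may still have writes pending */
	srv.done_seen = true;
	assert(may_disconnect(&srv));
}
```

In a real test these flags would be driven by SEND and RECV completions polled from the completion queue.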
From mshefty at ichips.intel.com Thu Jul 6 09:17:44 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 06 Jul 2006 09:17:44 -0700 Subject: [openib-general] ucma into kernel.org In-Reply-To: <20060706081942.GA7925@mellanox.co.il> References: <44ACC457.9000707@voltaire.com> <20060706081942.GA7925@mellanox.co.il> Message-ID: <44AD37A8.8060602@ichips.intel.com> Michael S. Tsirkin wrote: > What I am saying that giving the application control over the timeouts > seems more like a workaround than a solution. The CM timeout depends on both the round trip time, as well as the time it takes the remote service to respond to the connection request. The errors that we've seen were related to the amount of time that it took the remote service to respond, which was on the order of 30+ seconds. I believe that this is an application specific problem that requires the application to determine the proper timeout. Note that the underlying IB CM timeout already includes timeout information based on the obtained SA path record. The exposed values are application response timeout values only. For the retry count, I'm not aware of any way to programmatically determine the proper retry count. Additionally, CM timeouts are linear, not exponentially increasing. - Sean From mst at mellanox.co.il Thu Jul 6 09:19:09 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 6 Jul 2006 19:19:09 +0300 Subject: [openib-general] RFC: CMA backlog (was Re: CMA backlog) In-Reply-To: <44A1B68C.9030806@ichips.intel.com> References: <44A1B68C.9030806@ichips.intel.com> Message-ID: <20060706161909.GA11535@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: RFC: CMA backlog (was Re: CMA backlog) > > Michael S. Tsirkin wrote: > > Looks good to me. Please go ahead, then I'll use this in SDP and test this way. > > Committed in 8261. > How do you look at merging this for 2.6.18? 
this is more or less a bugfix: it seems that if we get ENOMEM trying to allocate memory to send response MAD does not make sense anyway. -- MST From swise at opengridcomputing.com Thu Jul 6 09:24:12 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 06 Jul 2006 11:24:12 -0500 Subject: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bw to allow useofRDMA CM In-Reply-To: <1152202663.12579.16.camel@stevo-desktop> References: <1152200068.12579.9.camel@stevo-desktop> <20060706155712.GA10828@mellanox.co.il> <1152202663.12579.16.camel@stevo-desktop> Message-ID: <1152203052.12579.18.camel@stevo-desktop> > Here's the MPA IETF draft that requires the active side to send the > first RDMA FPDU: > > http://www.ietf.org/internet-drafts/draft-ietf-rddp-mpa-03.txt > Oops, try this: http://www.ietf.org/internet-drafts/draft-ietf-rddp-mpa-05.txt From mst at mellanox.co.il Thu Jul 6 09:41:45 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 6 Jul 2006 19:41:45 +0300 Subject: [openib-general] [PATCH] mthca: fix static rate in mthca_ah_query Message-ID: <20060706164145.GA11904@mellanox.co.il> Roland, please review: git://www.mellanox.co.il/~git/infiniband mst-for-2.6.18 This has the following patch: IB/mthca: fix static rate returned by mthca_ah_query -- commit 8b9395d3b67af87ce4c218af06a259a8af246d75 Author: Jack Morgenstein Date: Thu Jul 6 19:33:25 2006 +0300 mthca_ah_query returns the static rate of the address handle in internal mthca format. Fix it to use rate encoding from enum ib_rate, which is what users expect. Signed-off-by: Jack Morgenstein Signed-off-by: Michael S.
Tsirkin Index: latest/drivers/infiniband/hw/mthca/mthca_av.c =================================================================== --- latest.orig/drivers/infiniband/hw/mthca/mthca_av.c 2006-07-02 18:00:34.000000000 +0300 +++ latest/drivers/infiniband/hw/mthca/mthca_av.c 2006-07-05 13:50:06.000000000 +0300 @@ -303,9 +303,10 @@ int mthca_ah_query(struct ib_ah *ibah, s memset(attr, 0, sizeof *attr); attr->dlid = be16_to_cpu(ah->av->dlid); attr->sl = be32_to_cpu(ah->av->sl_tclass_flowlabel) >> 28; - attr->static_rate = ah->av->msg_sr & 0x7; - attr->src_path_bits = ah->av->g_slid & 0x7F; attr->port_num = be32_to_cpu(ah->av->port_pd) >> 24; + attr->static_rate = mthca_rate_to_ib(dev, ah->av->msg_sr & 0x7, + attr->port_num); + attr->src_path_bits = ah->av->g_slid & 0x7F; attr->ah_flags = mthca_ah_grh_present(ah) ? IB_AH_GRH : 0; if (attr->ah_flags) { -- MST From mst at mellanox.co.il Thu Jul 6 09:49:11 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 6 Jul 2006 19:49:11 +0300 Subject: [openib-general] ucma into kernel.org In-Reply-To: <44AD37A8.8060602@ichips.intel.com> References: <44AD37A8.8060602@ichips.intel.com> Message-ID: <20060706164911.GA11926@mellanox.co.il> Quoting r. Sean Hefty : > For the retry count, I'm not aware of any way to > programmatically determine the proper retry count. TCP sockets just expose this to application through the TCP_SYNCNT option. Which leads again to my suggestion: since both TCP and IB CM have this, let us change max_cm_retries to max_request_retries, and add this in rdma_cm as a generic option. What do our iWarp experts say? Sean? -- MST From mst at mellanox.co.il Thu Jul 6 10:11:06 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 6 Jul 2006 20:11:06 +0300 Subject: [openib-general] [PATCH] mthca: comment fix Message-ID: <20060706171106.GA12226@mellanox.co.il> comment in mthca_qp.c makes it seem lockdep is the only reason WQ locks should be initialized separately, but as Roland pointed out, there are other reasons, e.g. that mthca_wq_init is called from modify qp as well. Signed-off-by: Michael S. Tsirkin diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c index 490fc78..2f3917e 100644 --- a/drivers/infiniband/hw/mthca/mthca_qp.c +++ b/drivers/infiniband/hw/mthca/mthca_qp.c @@ -1114,7 +1114,7 @@ static int mthca_alloc_qp_common(struct qp->sq_policy = send_policy; mthca_wq_init(&qp->sq); mthca_wq_init(&qp->rq); - /* these are initialized separately so lockdep can tell them apart */ + spin_lock_init(&qp->sq.lock); spin_lock_init(&qp->rq.lock); -- MST From mshefty at ichips.intel.com Thu Jul 6 10:14:03 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 06 Jul 2006 10:14:03 -0700 Subject: [openib-general] [ucm] device file of the ucm is not being created In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30271BBB0@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30271BBB0@mtlexch01.mtl.com> Message-ID: <44AD44DB.4070201@ichips.intel.com> Dotan Barak wrote: > KERNEL="ucma", NAME="infiniband/%k", MODE="0666" > KERNEL="rdma_cm", NAME="infiniband/%k", MODE="0666" > > > do you know that is the problem? The ucma should be in /sys/class/misc/rdma_cm. - Sean From mshefty at ichips.intel.com Thu Jul 6 10:53:08 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 06 Jul 2006 10:53:08 -0700 Subject: [openib-general] ucma into kernel.org In-Reply-To: <20060706164911.GA11926@mellanox.co.il> References: <44AD37A8.8060602@ichips.intel.com> <20060706164911.GA11926@mellanox.co.il> Message-ID: <44AD4E04.2030908@ichips.intel.com> Michael S. 
Tsirkin wrote: > TCP sockets just expose this to application through the > TCP_SYNCNT option. > > Which leads again to my suggestion: since both TCP and IB CM have this, let us > change max_cm_retries to max_request_retries, and add this in rdma_cm as a > generic option. I'm not against adding in generic option for this, but the retry values have different limits. IB is limited to a maximum of 15, while TCP has a maximum of 255. - Sean From bpradip at in.ibm.com Thu Jul 6 11:11:25 2006 From: bpradip at in.ibm.com (Pradipta Kumar Banerjee) Date: Thu, 06 Jul 2006 23:41:25 +0530 Subject: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bw to allow use of RDMA CM In-Reply-To: <1152197001.12579.4.camel@stevo-desktop> References: <20060705093835.GB15470@harry-potter.in.ibm.com> <20060705112913.GB21557@mellanox.co.il> <1152105850.2316.3.camel@stevo-desktop> <20060706062051.GB5874@mellanox.co.il> <1152197001.12579.4.camel@stevo-desktop> Message-ID: <44AD524D.5050600@in.ibm.com> Steve Wise wrote: > On Thu, 2006-07-06 at 09:20 +0300, Michael S. Tsirkin wrote: >> Quoting r. Steve Wise : >>> For instance, in the bi-directional bandwidth tests, one side could >>> finish the test and issue an rdma_disconnect() while the other end still >>> has pending rdma writes. This is an error condition in iWARP. So they >>> sync up at the end with a message exchange before disconnecting. >>> >>> Make sense? >> I see. Is this the only message exchange there is? > > > There are 2 message exchanges to comply with IWARP spec: > > 1) an exchange right after connection setup to force the client to be > the first side to send an RDMA message. So the server waits for a RECV > completion and the client posts a "lets start" SEND. Steve, Thanks for clarifying this. However I have a doubt. Wouldn't server doing a rdma_listen() and the client doing a rdma_connect() suffice ? 
I am going through the iWARP spec to understand why this first exchange is required and why a listen/connect sequence is not enough. Any pointers will be really helpful. Thanks, Pradipta > > 2) an exchange at the end of the test to ensure all RDMA operations are > done on the QPs. Both client and server post a "I'm done" SEND and > await a "I'm done" completion before disconnecting. > > > Steve. > > From eitan at mellanox.co.il Thu Jul 6 11:11:46 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Thu, 6 Jul 2006 21:11:46 +0300 Subject: [openib-general] ib_types.h and Win/Linux consolidation Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30236898F@mtlexch01.mtl.com> Hi All, I have been approached by several people asking where one can get a header file defining the IBTA "wire" protocol. My response was: "Ohh we got it all coded in osm/include/iba/ib_types.h". "But that thing is so far down the tree I do not consider it official" was the answer. So the point is clear: if we are missing such a complete IBTA header file and people are actually looking for where the wire protocol is defined, why shouldn't we promote ib_types.h to the main include directory? Another issue with ib_types.h: apparently the WinIB (OpenIB Windows) version and the Linux version are a little different. The major difference is that the Windows version is spec 1.1 compliant while the Linux version supports spec 1.2. Another difference is that some "verbs" or core oriented definitions found their way into the WinIB version. I hope we can clean those up and have a merged version in place. Thanks Eitan Zahavi
From swise at opengridcomputing.com Thu Jul 6 11:31:16 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 06 Jul 2006 13:31:16 -0500 Subject: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bw to allow use of RDMA CM In-Reply-To: <44AD524D.5050600@in.ibm.com> References: <20060705093835.GB15470@harry-potter.in.ibm.com> <20060705112913.GB21557@mellanox.co.il> <1152105850.2316.3.camel@stevo-desktop> <20060706062051.GB5874@mellanox.co.il> <1152197001.12579.4.camel@stevo-desktop> <44AD524D.5050600@in.ibm.com> Message-ID: <1152210676.12579.24.camel@stevo-desktop> On Thu, 2006-07-06 at 23:41 +0530, Pradipta Kumar Banerjee wrote: > Steve Wise wrote: > > On Thu, 2006-07-06 at 09:20 +0300, Michael S. Tsirkin wrote: > >> Quoting r. Steve Wise : > >>> For instance, in the bi-directional bandwidth tests, one side could > >>> finish the test and issue an rdma_disconnect() while the other end still > >>> has pending rdma writes. This is an error condition in iWARP. So they > >>> sync up at the end with a message exchange before disconnecting. > >>> > >>> Make sense? > >> I see. Is this the only message exchange there is? > > > > > > There are 2 message exchanges to comply with IWARP spec: > > > > 1) an exchange right after connection setup to force the client to be > > the first side to send an RDMA message. So the server waits for a RECV > > completion and the client posts a "lets start" SEND. > Steve, > Thanks for clarifying this. However I have a doubt. Wouldn't server doing a > rdma_listen() and the client doing a rdma_connect() suffice ? > I am going through the iWARP spec to understand why this first exchange is > required and why a listen/connect sequence is not enough. Any pointers will be > really helpful. > read the mpa spec. see section 7...
http://www.ietf.org/internet-drafts/draft-ietf-rddp-mpa-05.txt From mshefty at ichips.intel.com Thu Jul 6 11:31:44 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 06 Jul 2006 11:31:44 -0700 Subject: [openib-general] ib_types.h and Win/Linux consolidation In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30236898F@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30236898F@mtlexch01.mtl.com> Message-ID: <44AD5710.2030707@ichips.intel.com> Eitan Zahavi wrote: > I have been approached by several people asking for where does one gets > a header > file defining the IBTA “wire” protocol. I'm not sure that it makes sense to try to have a single header file that defines all wire data structures. For example, I don't think that any module outside of the IB CM cares about the CM wire formats. Getting and setting values in wire format is also non-trivial for many of the structures definitions. The approach used so far has been to pack/unpack the structures under another API, like the sa_query. > So the point is clear: If we are missing such a complete IBTA H file and > people are actually looking for where the wire protocol is being defined > why shouldn’t we promote ib_types.h to the main include directory? What wire structures are missing from the include files in include/rdma? - Sean From halr at voltaire.com Thu Jul 6 12:03:02 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Jul 2006 15:03:02 -0400 Subject: [openib-general] [Openib-windows] ib_types.h and Win/Linux consolidation In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30236898F@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30236898F@mtlexch01.mtl.com> Message-ID: <1152212580.4533.211703.camel@hal.voltaire.com> Hi Eitan, On Thu, 2006-07-06 at 14:11, Eitan Zahavi wrote: > Hi All, > > I have been approached by several people asking for where does one > gets a header > > file defining the IBTA “wire” protocol. 
> > My response was: “Ohh we got it all coded in > osm/include/iba/ib_types.h”. Actually I was thinking the opposite: that this file was way too big and should be broken up into smaller more manageable pieces. In this usage, I think all is referring to all MADs. ib_types.h supports SM and SA MADs and some other MADs but not all MADs. > “But that thing is so down the tree I do not consider as official” was > the answer. It is currently used by management and utils. utils/linux-user could be moved. What would be the new proposed location for this ? > So the point is clear: If we are missing such a complete IBTA H file > and people are actually looking for where the wire protocol is being > defined why shouldn’t we promote ib_types.h to the main include > directory? > > Another issue with ib_types.h : > > Apparently the WinIB (OpenIB windows) version and the Linux version > are a little different. > > Major changes are that the Windows version is a spec 1.1 compliant and > the Linux is supporting version 1.2. Why is Windows 1.1 and not 1.2 compliant ? > Another difference is the fact some “verbs” or core oriented > definitions found their way into the WinIB version. Another issue is that ib_types.h requires some complib things too. > I hope we can clean those up and have a merged version in place. Perhaps but is this a real requirement ? 
-- Hal > Thanks > > Eitan Zahavi > > > > ______________________________________________________________________ > > _______________________________________________ > openib-windows mailing list > openib-windows at openib.org > http://openib.org/mailman/listinfo/openib-windows From ftillier at silverstorm.com Thu Jul 6 12:22:48 2006 From: ftillier at silverstorm.com (Fabian Tillier) Date: Thu, 6 Jul 2006 15:22:48 -0400 Subject: [openib-general] [Openib-windows] ib_types.h and Win/Linux consolidation In-Reply-To: <1152212580.4533.211703.camel@hal.voltaire.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30236898F@mtlexch01.mtl.com> <1152212580.4533.211703.camel@hal.voltaire.com> Message-ID: <79ae2f320607061222r4f9d2643k265516fb990777db@mail.gmail.com> On 06 Jul 2006 15:03:02 -0400, Hal Rosenstock wrote: > Hi Eitan, > > On Thu, 2006-07-06 at 14:11, Eitan Zahavi wrote: > > Hi All, > > > > I have been approached by several people asking for where does one > > gets a header > > > > file defining the IBTA "wire" protocol. > > > > My response was: "Ohh we got it all coded in > > osm/include/iba/ib_types.h". > > Actually I was thinking the opposite: that this file was way too big and > should be broken up into smaller more manageable pieces. > > In this usage, I think all is referring to all MADs. ib_types.h supports > SM and SA MADs and some other MADs but not all MADs. > > > "But that thing is so down the tree I do not consider as official" was > > the answer. > > It is currently used by management and utils. utils/linux-user could be > moved. > > What would be the new proposed location for this ? > > > So the point is clear: If we are missing such a complete IBTA H file > > and people are actually looking for where the wire protocol is being > > defined why shouldn't we promote ib_types.h to the main include > > directory? > > > > Another issue with ib_types.h : > > > > Apparently the WinIB (OpenIB windows) version and the Linux version > > are a little different. 
> > > > Major changes are that the Windows version is a spec 1.1 compliant and > > the Linux is supporting version 1.2. > > Why is Windows 1.1 and not 1.2 compliant ? I'm not sure what the deficiencies are here - Windows doesn't have SRQ support, or IB 1.2 FMR support, but other than that the on-wire MAD stuff should all be fine. What's missing? > > Another difference is the fact some "verbs" or core oriented > > definitions found their way into the WinIB version. > > Another issue is that ib_types.h requires some complib things too. > > > I hope we can clean those up and have a merged version in place. > > Perhaps but is this a real requirement ? I think that it would be a mistake to share a file like this between Windows and Linux, as that requires some sort of abstraction for types and packing. Since abstractions like this won't fly in LKML, it would become the burden of the Windows stack to try to get things to be portable. I don't support such an uneven burden being placed on me. - Fab From troy at scl.ameslab.gov Thu Jul 6 12:25:49 2006 From: troy at scl.ameslab.gov (Troy Benjegerdes) Date: Thu, 6 Jul 2006 14:25:49 -0500 Subject: [openib-general] [ANNOUNCE] NetPIPE 3.7 release candidate 1 Message-ID: <20060706192549.GB6865@minbar-g5.scl.ameslab.gov> I am preparing to release an update to the NetPIPE benchmark ( http://scl.ameslab.gov/Projects/NetPIPE/NetPIPE.html ), and I would very much like to hear some feedback on the OpenIB verbs implementation (NPibv), and take any patches to make it build on Windows as well. I would also like hear from anyone that has run NetPIPE on RDMA ethernet hardware, since I currently don't have any to test with. There is a tarball available at http://scl.ameslab.gov/~troy/NetPIPE-3.7-rc1.tar.gz and a mercurial repository at: http://source.scl.ameslab.gov/hg/netpipe3-dev If you have any interest in NetPIPE, please take a look and send any comments and/or patches to netpipe at scl.ameslab.gov Thanks. 
From halr at voltaire.com Thu Jul 6 12:40:43 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Jul 2006 15:40:43 -0400 Subject: [openib-general] [Openib-windows] ib_types.h and Win/Linux consolidation In-Reply-To: <79ae2f320607061222r4f9d2643k265516fb990777db@mail.gmail.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30236898F@mtlexch01.mtl.com> <1152212580.4533.211703.camel@hal.voltaire.com> <79ae2f320607061222r4f9d2643k265516fb990777db@mail.gmail.com> Message-ID: <1152214620.4533.213199.camel@hal.voltaire.com> Hi Fab, On Thu, 2006-07-06 at 15:22, Fabian Tillier wrote: > On 06 Jul 2006 15:03:02 -0400, Hal Rosenstock wrote: > > Hi Eitan, > > > > On Thu, 2006-07-06 at 14:11, Eitan Zahavi wrote: > > > Hi All, > > > > > > I have been approached by several people asking for where does one > > > gets a header > > > > > > file defining the IBTA "wire" protocol. > > > > > > My response was: "Ohh we got it all coded in > > > osm/include/iba/ib_types.h". > > > > Actually I was thinking the opposite: that this file was way too big and > > should be broken up into smaller more manageable pieces. > > > > In this usage, I think all is referring to all MADs. ib_types.h supports > > SM and SA MADs and some other MADs but not all MADs. > > > > > "But that thing is so down the tree I do not consider as official" was > > > the answer. > > > > It is currently used by management and utils. utils/linux-user could be > > moved. > > > > What would be the new proposed location for this ? > > > > > So the point is clear: If we are missing such a complete IBTA H file > > > and people are actually looking for where the wire protocol is being > > > defined why shouldn't we promote ib_types.h to the main include > > > directory? > > > > > > Another issue with ib_types.h : > > > > > > Apparently the WinIB (OpenIB windows) version and the Linux version > > > are a little different. 
> > > > > > Major changes are that the Windows version is a spec 1.1 compliant and > > > the Linux is supporting version 1.2. > > > > Why is Windows 1.1 and not 1.2 compliant ? > > I'm not sure what the deficiencies are here - Windows doesn't have SRQ > support, or IB 1.2 FMR support, Linux doesn't have real 1.2 FMR support either. > but other than that the on-wire MAD > stuff should all be fine. > > What's missing? I'm not familiar with the Windows support so can't comment on the specifics. Eitan ? > > > Another difference is the fact some "verbs" or core oriented > > > definitions found their way into the WinIB version. > > > > Another issue is that ib_types.h requires some complib things too. > > > > > I hope we can clean those up and have a merged version in place. > > > > Perhaps but is this a real requirement ? > > I think that it would be a mistake to share a file like this between > Windows and Linux, as that requires some sort of abstraction for types > and packing. Since abstractions like this won't fly in LKML, This is not an LKML file. It's a userspace file. > it would > become the burden of the Windows stack to try to get things to be > portable. I don't support such an uneven burden being placed on me. I'm neither for or against this as I do not understand what this means (e.g. what are the specific changes). In the past, Windows changes have been rolled back into the Linux user support for OpenSM. However, I am also wary of any "baggage" here which limits what can be done. As I have stated before, IMO OpenIB OpenSM is currently accomodating Windows as opposed to it being a firm requirement that it be supported. I would prefer the two not to fork but if OpenSM on Linux is impaired for some reason a split might be necessary. One area where there has been difficulty in this in the past has been the direct use of pthreads versus the threading API in complib. 
-- Hal > - Fab From halr at voltaire.com Thu Jul 6 12:54:52 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Jul 2006 15:54:52 -0400 Subject: [openib-general] [PATCH][MINOR] OpenSM/osm_switch.c: In osm_switch_init, handle osm_fwd_tbl_init failures Message-ID: <1152215582.4533.213875.camel@hal.voltaire.com> OpenSM/osm_switch.c: In osm_switch_init, handle osm_fwd_tbl_init failures Signed-off-by: Hal Rosenstock Index: opensm/osm_switch.c =================================================================== --- opensm/osm_switch.c (revision 8435) +++ opensm/osm_switch.c (working copy) @@ -103,6 +103,8 @@ osm_switch_init( goto Exit; status = osm_fwd_tbl_init( &p_sw->fwd_tbl, p_si ); + if( status != IB_SUCCESS ) + goto Exit; p_sw->p_prof = malloc( sizeof(*p_sw->p_prof) * num_ports ); if( p_sw->p_prof == NULL ) From Thomas.Talpey at netapp.com Thu Jul 6 13:08:23 2006 From: Thomas.Talpey at netapp.com (Talpey, Thomas) Date: Thu, 06 Jul 2006 16:08:23 -0400 Subject: [openib-general] Fwd: WG Action: Conclusion of IP over InfiniBand (ipoib) Message-ID: <7.0.1.0.2.20060706160808.048c65a8@netapp.com> FYI... > ---------- Forwarded Message ---------- >To: ietf-announce at ietf.org >From: IESG Secretary >Date: Wed, 05 Jul 2006 15:50:01 -0400 >Cc: ipoverib at ietf.org, "H.K. Jerry Chu" , > Bill Strahm >Subject: WG Action: Conclusion of IP over InfiniBand (ipoib) >List-Id: ietf-announce.ietf.org >List-Post: >List-Help: >List-Subscribe: , > > >The IP over InfiniBand WG (ipoib) in the Internet Area has concluded. > >The IESG contact persons are Jari Arkko and Mark Townsley. > >+++ > >The IPOIB working group has completed its main task of >defining how to run IP over InfiniBand. It has published >three RFCs and a fourth one is in the RFC Editor's queue, >soon to become an RFC as well. > >There are some additional work items in the milestone >plan, a set of MIBs. But after reviewing the status and >activity in the group it seems best to close the WG. 
>There are a few individuals who are still interested in
>pursuing a part of the MIB work, and they are encouraged
>to submit their work as an AD sponsored document, when
>the work is completed.
>
>The mailing list for the group will remain active.
>
>_______________________________________________
>IETF-Announce mailing list
>IETF-Announce at ietf.org
>https://www1.ietf.org/mailman/listinfo/ietf-announce
> ---------- End of Forwarded Message ----------

From mst at mellanox.co.il Thu Jul 6 13:12:28 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 6 Jul 2006 23:12:28 +0300
Subject: [openib-general] ucma into kernel.org
In-Reply-To: <44AD4E04.2030908@ichips.intel.com>
References: <44AD4E04.2030908@ichips.intel.com>
Message-ID: <20060706201228.GA12873@mellanox.co.il>

Quoting r. Sean Hefty :
> Subject: Re: ucma into kernel.org
>
> Michael S. Tsirkin wrote:
> > TCP sockets just expose this to application through the
> > TCP_SYNCNT option.
> >
> > Which leads again to my suggestion: since both TCP and IB CM have this, let us
> > change max_cm_retries to max_request_retries, and add this in rdma_cm as a
> > generic option.
>
> I'm not against adding in generic option for this, but the retry values have
> different limits. IB is limited to a maximum of 15, while TCP has a maximum of 255.

What limits IB retry count to 15? I would expect it to be arbitrary. Note I am
talking about CM REQ retries that we do in software and that are not passed in
any message.

--
MST

From mshefty at ichips.intel.com Thu Jul 6 13:27:19 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 06 Jul 2006 13:27:19 -0700
Subject: [openib-general] ucma into kernel.org
In-Reply-To: <20060706201228.GA12873@mellanox.co.il>
References: <44AD4E04.2030908@ichips.intel.com> <20060706201228.GA12873@mellanox.co.il>
Message-ID: <44AD7227.2050903@ichips.intel.com>

Michael S. Tsirkin wrote:
> What limits IB retry count to 15? I would expect it to be arbitrary. Note I am
> talking about CM REQ retries that we do in software and that are not passed in
> any message.

Max CM retries is a 4-bit value carried in the REQ indicating the number of
times that a REQ, REP, or DREQ can be retried. See 12.7.27. I would expect
software to adhere to this value.

- Sean

From ftillier at silverstorm.com Thu Jul 6 13:30:51 2006
From: ftillier at silverstorm.com (Fabian Tillier)
Date: Thu, 6 Jul 2006 16:30:51 -0400
Subject: [openib-general] ucma into kernel.org
In-Reply-To: <20060706201228.GA12873@mellanox.co.il>
References: <44AD4E04.2030908@ichips.intel.com> <20060706201228.GA12873@mellanox.co.il>
Message-ID: <79ae2f320607061330wf194b88i57d3724469d0ad86@mail.gmail.com>

On 7/6/06, Michael S. Tsirkin wrote:
> Quoting r. Sean Hefty :
> > Subject: Re: ucma into kernel.org
> >
> > Michael S. Tsirkin wrote:
> > > TCP sockets just expose this to application through the
> > > TCP_SYNCNT option.
> > >
> > > Which leads again to my suggestion: since both TCP and IB CM have this, let us
> > > change max_cm_retries to max_request_retries, and add this in rdma_cm as a
> > > generic option.
> >
> > I'm not against adding in generic option for this, but the retry values have
> > different limits. IB is limited to a maximum of 15, while TCP has a maximum of 255.
>
> What limits IB retry count to 15? I would expect it to be arbitrary. Note I am
> talking about CM REQ retries that we do in software and that are not passed in
> any message.

The CM REQ includes a "Max CM Retries" field, 4 bits wide, that specifies
the maximum number of retries that can be done for REQ, REP, and DREQ
before a REJ is sent to indicate that the connection is aborted.
- Fab

From benh at kernel.crashing.org Thu Jul 6 15:37:01 2006
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Fri, 07 Jul 2006 08:37:01 +1000
Subject: [openib-general] [PATCH 38 of 39] IB/ipath - More changes to support InfiniPath on PowerPC 970 systems
In-Reply-To:
References:
Message-ID: <1152225421.9862.12.camel@localhost.localdomain>

> +#if defined(__powerpc__)
> +	/* There isn't a generic way to specify writethrough mappings */
> +	pgprot_val(vma->vm_page_prot) |= _PAGE_NO_CACHE;
> +	pgprot_val(vma->vm_page_prot) |= _PAGE_WRITETHRU;
> +	pgprot_val(vma->vm_page_prot) &= ~_PAGE_GUARDED;
> +#endif

I don't see any case where having both NO_CACHE and WRITE_THRU can be
legal... It's one or the other.

> +/**
> + * ipath_unordered_wc - indicate whether write combining is ordered
> + *
> + * PowerPC systems (at least those in the 970 processor family)
> + * write partially filled store buffers in address order, but will write
> + * completely filled store buffers in "random" order, and therefore must
> + * have serialization for correctness with current InfiniPath chips.
> + *
> + */
> +int ipath_unordered_wc(void)
> +{
> +	return 1;
> +}

How is the above providing any kind of serialisation ?

Ben.
From johnip at sgi.com Thu Jul 6 16:00:36 2006
From: johnip at sgi.com (John Partridge)
Date: Thu, 06 Jul 2006 18:00:36 -0500
Subject: [openib-general] [Bug 146] OFED-1.0 DAPL fails to build on SLES10 on IA64 with IA64_FETCHADD error
Message-ID: <44AD9614.5090907@sgi.com>

James,

I have had the opportunity to look into the CONFIG_NR_CPUS compile error,
and I think the issue is that dapl_osd.h (line 83) includes asm/system.h
(I can't see a reason why). This in turn includes asm/percpu.h, which in
turn includes linux/threads.h, which does:

#ifdef CONFIG_SMP
#define NR_CPUS CONFIG_NR_CPUS
#else

and of course CONFIG_NR_CPUS is defined in the kernel include file
include/linux/autoconf.h.

I have tried compiling without asm/system.h included in dapl_osd.h, but I
come up against the 'IA64_FETCHADD' undeclared error.

Do you know why asm/system.h is included in dapl_osd.h ?

Thanks
John

--
John Partridge

Silicon Graphics Inc
Tel: 651-683-3428
Vnet: 233-3428
E-Mail: johnip at sgi.com

From bos at pathscale.com Thu Jul 6 16:34:27 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 06 Jul 2006 16:34:27 -0700
Subject: [openib-general] [PATCH 38 of 39] IB/ipath - More changes to support InfiniPath on PowerPC 970 systems
In-Reply-To: <1152225421.9862.12.camel@localhost.localdomain>
References: <1152225421.9862.12.camel@localhost.localdomain>
Message-ID: <1152228867.24748.3.camel@localhost.localdomain>

On Fri, 2006-07-07 at 08:37 +1000, Benjamin Herrenschmidt wrote:
> > +int ipath_unordered_wc(void)
> > +{
> > +	return 1;
> > +}
>
> How is the above providing any kind of serialisation ?

It's not intended to; it tells the *caller* whether to do it.
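
For readers following the thread, here is a minimal sketch of the pattern Bryan describes: a predicate that only reports whether write-combining stores may be reordered, with the caller issuing the barrier. It is not the ipath driver's code. The names unordered_wc(), flush_wc(), flush_count and write_buffer() are hypothetical, and __sync_synchronize() (a GCC/Clang builtin full barrier) is an assumption standing in for whatever barrier the driver really uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ipath_unordered_wc(): nonzero means the
 * platform may drain full write-combining buffers out of order. */
static int unordered_wc(void)
{
	return 1;
}

/* Counts barriers issued, only so the sketch can be checked in a test. */
static unsigned int flush_count;

/* Hypothetical flush: a full memory barrier via a GCC/Clang builtin. */
static void flush_wc(void)
{
	__sync_synchronize();
	flush_count++;
}

/*
 * Copy nwords words into dst, keeping the final word ordered after all
 * earlier ones when the platform does not guarantee that ordering.
 */
static void write_buffer(volatile uint32_t *dst, const uint32_t *src,
			 size_t nwords)
{
	size_t i;

	if (nwords == 0)
		return;

	/* Everything except the last word. */
	for (i = 0; i + 1 < nwords; i++)
		dst[i] = src[i];

	/* The predicate only reports; the caller does the serialising. */
	if (unordered_wc())
		flush_wc();

	/* The last word goes out only after the barrier above. */
	dst[nwords - 1] = src[nwords - 1];

	if (unordered_wc())
		flush_wc();
}
```

On a platform where unordered_wc() returned 0, the same caller would skip both barriers, which is the point of keeping the predicate separate from the flush.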
References: <1152225421.9862.12.camel@localhost.localdomain> <1152228867.24748.3.camel@localhost.localdomain>
Message-ID: <1152229145.9862.21.camel@localhost.localdomain>

On Thu, 2006-07-06 at 16:34 -0700, Bryan O'Sullivan wrote:
> On Fri, 2006-07-07 at 08:37 +1000, Benjamin Herrenschmidt wrote:
>
> > > +int ipath_unordered_wc(void)
> > > +{
> > > +	return 1;
> > > +}
> >
> > How is the above providing any kind of serialisation ?
>
> It's not intended to; it tells the *caller* whether to do it.

Ah ok. What barrier do you use for that ?

Ben.

From gnn at neville-neil.com Thu Jul 6 23:38:46 2006
From: gnn at neville-neil.com (George V. Neville-Neil)
Date: Fri, 07 Jul 2006 15:38:46 +0900
Subject: [openib-general] [Netpipe] [ANNOUNCE] NetPIPE 3.7 release candidate 1
In-Reply-To: <20060706192549.GB6865@minbar-g5.scl.ameslab.gov>
References: <20060706192549.GB6865@minbar-g5.scl.ameslab.gov>
Message-ID:

At Thu, 6 Jul 2006 14:25:49 -0500,
Troy Benjegerdes wrote:
>
> I am preparing to release an update to the NetPIPE benchmark
> ( http://scl.ameslab.gov/Projects/NetPIPE/NetPIPE.html ), and I would
> very much like to hear some feedback on the OpenIB verbs implementation
> (NPibv), and take any patches to make it build on Windows as well.
>
> I would also like hear from anyone that has run NetPIPE on RDMA
> ethernet hardware, since I currently don't have any to test with.
>
> There is a tarball available at
> http://scl.ameslab.gov/~troy/NetPIPE-3.7-rc1.tar.gz
>
> and a mercurial repository at:
>
> http://source.scl.ameslab.gov/hg/netpipe3-dev
>
> If you have any interest in NetPIPE, please take a look and send any
> comments and/or patches to netpipe at scl.ameslab.gov
>

I've tested the tarball with SCTP and TCP6 on Mac and FreeBSD. Seems
fine.

Thanks,
George

From mst at mellanox.co.il Fri Jul 7 02:11:47 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 7 Jul 2006 12:11:47 +0300
Subject: [openib-general] ucma into kernel.org
In-Reply-To: <44AD7227.2050903@ichips.intel.com>
References: <44AD7227.2050903@ichips.intel.com>
Message-ID: <20060707091147.GC27746@mellanox.co.il>

Quoting r. Sean Hefty :
> Max CM retries is a 4-bit value carried in the REQ indicating the number of
> times that a REQ, REP, or DREQ can be retried. See 12.7.27. I would expect
> software to adhere to this value.

Hmm. How can SDP implement TCP_SYNCNT then? I would like to retry as many
times as requested.

--
MST

From mshefty at ichips.intel.com Fri Jul 7 09:48:53 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 07 Jul 2006 09:48:53 -0700
Subject: [openib-general] ucma into kernel.org
In-Reply-To: <20060707091147.GC27746@mellanox.co.il>
References: <44AD7227.2050903@ichips.intel.com> <20060707091147.GC27746@mellanox.co.il>
Message-ID: <44AE9075.8060801@ichips.intel.com>

Michael S. Tsirkin wrote:
>>Max CM retries is a 4-bit value carried in the REQ indicating the number of
>>times that a REQ, REP, or DREQ can be retried. See 12.7.27. I would expect
>>software to adhere to this value.
>
> Hmm. How can SDP implement TCP_SYNCNT then? I would like to retry as many
> times as requested.

My guess is that you would need to retry the connection attempt using a
different connection identifier, which unfortunately will look like a new
connection attempt to the remote side. I'm not sure how much that will
matter though.

This sounds like an architectural issue with SDP... In practice, I doubt
that SYNCNT would ever be set higher than about 5, so it may not be that
much of a real problem.
- Sean From troy at scl.ameslab.gov Fri Jul 7 09:53:36 2006 From: troy at scl.ameslab.gov (Troy Benjegerdes) Date: Fri, 07 Jul 2006 11:53:36 -0500 Subject: [openib-general] [ANNOUNCE] NetPIPE 3.7 release candidate 1 In-Reply-To: <44AE3644.1000008@in.ibm.com> References: <20060706192549.GB6865@minbar-g5.scl.ameslab.gov> <44AE3644.1000008@in.ibm.com> Message-ID: <44AE9190.3010605@scl.ameslab.gov> Pradipta Kumar Banerjee wrote: > Troy Benjegerdes wrote: > >> I am preparing to release an update to the NetPIPE benchmark >> ( http://scl.ameslab.gov/Projects/NetPIPE/NetPIPE.html ), and I would >> very much like to hear some feedback on the OpenIB verbs implementation >> (NPibv), and take any patches to make it build on Windows as well. >> >> I would also like hear from anyone that has run NetPIPE on RDMA ethernet >> hardware, since I currently don't have any to test with. > > Troy, > This unfortunately doesn't work with the iWARP devices. I tested it > on Ammasso iWARP cards. Ammasso iWARP cards create two interfaces - > one the usual ethernet interface (ethX) and another iWARP interface > (iwX). > The RDMA protocol is run over the iWARP interface which doesn't allow > for normal socket based communication. Hence bind/listen etc is on the > ethX interface whereas for RDMA these should be on the iwX interface. > Using RDMA CM solves the problem. Its a generic solution for both IB > and iWARP. > ( For more information look at the following thread > http://thread.gmane.org/gmane.linux.drivers.openib/27629/focus=27629 ) > > If you are interested I can try making the relevant changes to NetPIPE > and send it across to you. > > Thanks, > Pradipta Kumar. Is the connection manager api considered 'stable' yet? Last I knew it was still undergoing a lot of development. Wouldn't it be possible to adapt the existing tcp connection setup code to work for Iwarp without using the connection manager? Most of the time when I test IB networks, I just bind to ethernet eth0 interfaces anyway. 
From halr at voltaire.com Fri Jul 7 10:32:15 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 07 Jul 2006 13:32:15 -0400
Subject: [openib-general] [PATCH][MINOR]OpenSM/osm_trap_rcv.c:
Message-ID: <1152292871.4533.272707.camel@hal.voltaire.com>

OpenSM/osm_trap_rcv.c: In __osm_trap_rcv_process_request, display port
number for SM traps 129-131

Signed-off-by: Hal Rosenstock

Index: opensm/osm_trap_rcv.c
===================================================================
--- opensm/osm_trap_rcv.c	(revision 8447)
+++ opensm/osm_trap_rcv.c	(working copy)
@@ -399,16 +399,35 @@ __osm_trap_rcv_process_request(
   /* Print some info about the incoming Trap */
   if (ib_notice_is_generic(p_ntci))
   {
-    osm_log( p_rcv->p_log, OSM_LOG_ERROR,
-             "__osm_trap_rcv_process_request: "
-             "Received Generic Notice type:0x%02X num:%u Producer:%u "
-             "from LID:0x%04X TID:0x%016" PRIx64 "\n",
-             ib_notice_get_type(p_ntci),
-             cl_ntoh16(p_ntci->g_or_v.generic.trap_num),
-             cl_ntoh32(ib_notice_get_prod_type(p_ntci)),
-             cl_hton16(source_lid),
-             cl_ntoh64(p_smp->trans_id)
-             );
+    if ((p_ntci->g_or_v.generic.trap_num == CL_HTON16(129)) ||
+        (p_ntci->g_or_v.generic.trap_num == CL_HTON16(130)) ||
+        (p_ntci->g_or_v.generic.trap_num == CL_HTON16(131)))
+    {
+      osm_log( p_rcv->p_log, OSM_LOG_ERROR,
+               "__osm_trap_rcv_process_request: "
+               "Received Generic Notice type:0x%02X num:%u Producer:%u "
+               "from LID:0x%04X Port %d TID:0x%016" PRIx64 "\n",
+               ib_notice_get_type(p_ntci),
+               cl_ntoh16(p_ntci->g_or_v.generic.trap_num),
+               cl_ntoh32(ib_notice_get_prod_type(p_ntci)),
+               cl_hton16(source_lid),
+               p_ntci->data_details.ntc_129_131.port_num,
+               cl_ntoh64(p_smp->trans_id)
+               );
+    }
+    else
+    {
+      osm_log( p_rcv->p_log, OSM_LOG_ERROR,
+               "__osm_trap_rcv_process_request: "
+               "Received Generic Notice type:0x%02X num:%u Producer:%u "
+               "from LID:0x%04X TID:0x%016" PRIx64 "\n",
+               ib_notice_get_type(p_ntci),
+               cl_ntoh16(p_ntci->g_or_v.generic.trap_num),
+               cl_ntoh32(ib_notice_get_prod_type(p_ntci)),
+               cl_hton16(source_lid),
+               cl_ntoh64(p_smp->trans_id)
+               );
+    }
   }
   else
   {

From mshefty at ichips.intel.com Fri Jul 7 10:41:53 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 07 Jul 2006 10:41:53 -0700
Subject: [openib-general] [ANNOUNCE] NetPIPE 3.7 release candidate 1
In-Reply-To: <44AE9190.3010605@scl.ameslab.gov>
References: <20060706192549.GB6865@minbar-g5.scl.ameslab.gov> <44AE3644.1000008@in.ibm.com> <44AE9190.3010605@scl.ameslab.gov>
Message-ID: <44AE9CE1.2080808@ichips.intel.com>

Troy Benjegerdes wrote:
> Is the connection manager api considered 'stable' yet? Last I knew it
> was still undergoing a lot of development.

The connection manager API is relatively stable, and has been for a while.
Changes to the API have come in the form of new options and calls, rather
than changes to the existing calls.

> Wouldn't it be possible to adapt the existing tcp connection setup code
> to work for Iwarp without using the connection manager? Most of the time
> when I test IB networks, I just bind to ethernet eth0 interfaces anyway.

My understanding is that it's not possible to connect iWarp QPs without
using the connection manager. For IB, it's possible to connect over sockets
and exchange QP information over a TCP connection, but the same is not true
for iWarp.

- Sean

From ravinandan.arakali at neterion.com Fri Jul 7 11:04:45 2006
From: ravinandan.arakali at neterion.com (Ravinandan Arakali)
Date: Fri, 7 Jul 2006 11:04:45 -0700
Subject: [openib-general] ping problem with ammasso cards(iWARP interface)
Message-ID: <000b01c6a1ef$d4e745e0$4710100a@pc.s2io.com>

Hi,
I have two systems connected back-to-back with ammasso cards. I have
configured and built the libraries and drivers as mentioned in
https://openib.org/tiki/tiki-index.php?page=Install+OpenIB+for+Ammasso1100

I can ping on the regular ethX interface but both rping and krping on iwarp
interfaces fail (no packets sent/received).
Following is the configuration: System1: eth interface: 17.1.1.101 iwarp interface: 17.2.2.101 System2: eth interface: 17.1.1.102 iwarp interface: 17.2.2.102 On System2 I run # rping -s -vV -C100 -a 0.0.0.0 -p 9999 On System1 I run # rping -c -vV -C100 -a 17.2.2.102 -p 9999 Sometimes I have noticed that when client is started, the server core dumps. Other times, there is no core dump but there's no traffic flowing. Any help is appreciated. Thanks, Ravi From swise at opengridcomputing.com Fri Jul 7 11:07:07 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 07 Jul 2006 13:07:07 -0500 Subject: [openib-general] ping problem with ammasso cards(iWARP interface) In-Reply-To: <000b01c6a1ef$d4e745e0$4710100a@pc.s2io.com> References: <000b01c6a1ef$d4e745e0$4710100a@pc.s2io.com> Message-ID: <1152295627.18332.26.camel@stevo-desktop> adding -d to rping might give more debug info... Also if the server side core dumps, please get a stack trace and lets see whats up... Steve. On Fri, 2006-07-07 at 11:04 -0700, Ravinandan Arakali wrote: > Hi, > I have two systems connected back-to-back with ammasso cards. I have > configured and built the libraries and drivers as mentioned in > https://openib.org/tiki/tiki-index.php?page=Install+OpenIB+for+Ammasso1100 > > I can ping on the regular ethX interface but both rping and krping on iwarp > interfaces fail (no packets sent/received). > Following is the configuration: > System1: > eth interface: 17.1.1.101 > iwarp interface: 17.2.2.101 > > System2: > eth interface: 17.1.1.102 > iwarp interface: 17.2.2.102 > > On System2 I run > # rping -s -vV -C100 -a 0.0.0.0 -p 9999 > > On System1 I run > # rping -c -vV -C100 -a 17.2.2.102 -p 9999 > > Sometimes I have noticed that when client is started, the server core dumps. > Other times, there is no core dump but there's no traffic flowing. > > Any help is appreciated. 
> > Thanks, > Ravi From ravinandan.arakali at neterion.com Fri Jul 7 11:20:58 2006 From: ravinandan.arakali at neterion.com (Ravinandan Arakali) Date: Fri, 7 Jul 2006 11:20:58 -0700 Subject: [openib-general] ping problem with ammasso cards(iWARP interface) In-Reply-To: <1152295627.18332.26.camel@stevo-desktop> Message-ID: <000c01c6a1f2$18a34a70$4710100a@pc.s2io.com> I added -d to rping command. I am not seeing the core dump on this run. Following is the output(and it blocks at this point). openfab2:~ # rping -s -vV -C100 -d -a 0.0.0.0 -p 9999 ipaddr (0.0.0.0) port 9999 created cm_id 0x804e6e0 rdma_bind_addr successful rdma_listen openfab:~ # rping -c -vV -C100 -d -a 17.2.2.102 -p 9999 ipaddr (17.2.2.102) port 9999 created cm_id 0x506b00 cma_event type 0 cma_id 0x506b00 (parent) cma_event type 2 cma_id 0x506b00 (parent) rdma_resolve_addr - rdma_resolve_route successful -----Original Message----- From: Steve Wise [mailto:swise at opengridcomputing.com] Sent: Friday, July 07, 2006 11:07 AM To: ravinandan.arakali at neterion.com Cc: openib-general at openib.org Subject: Re: ping problem with ammasso cards(iWARP interface) adding -d to rping might give more debug info... Also if the server side core dumps, please get a stack trace and lets see whats up... Steve. On Fri, 2006-07-07 at 11:04 -0700, Ravinandan Arakali wrote: > Hi, > I have two systems connected back-to-back with ammasso cards. I have > configured and built the libraries and drivers as mentioned in > https://openib.org/tiki/tiki-index.php?page=Install+OpenIB+for+Ammasso1100 > > I can ping on the regular ethX interface but both rping and krping on iwarp > interfaces fail (no packets sent/received). 
> Following is the configuration: > System1: > eth interface: 17.1.1.101 > iwarp interface: 17.2.2.101 > > System2: > eth interface: 17.1.1.102 > iwarp interface: 17.2.2.102 > > On System2 I run > # rping -s -vV -C100 -a 0.0.0.0 -p 9999 > > On System1 I run > # rping -c -vV -C100 -a 17.2.2.102 -p 9999 > > Sometimes I have noticed that when client is started, the server core dumps. > Other times, there is no core dump but there's no traffic flowing. > > Any help is appreciated. > > Thanks, > Ravi From swise at opengridcomputing.com Fri Jul 7 11:30:17 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 07 Jul 2006 13:30:17 -0500 Subject: [openib-general] ping problem with ammasso cards(iWARP interface) In-Reply-To: <000c01c6a1f2$18a34a70$4710100a@pc.s2io.com> References: <000c01c6a1f2$18a34a70$4710100a@pc.s2io.com> Message-ID: <1152297018.18332.44.camel@stevo-desktop> Just to be clear: openfab2's rdma address is 17.2.2.102? On Fri, 2006-07-07 at 11:20 -0700, Ravinandan Arakali wrote: > I added -d to rping command. I am not seeing the core dump on > this run. Following is the output(and it blocks at this point). > > openfab2:~ # rping -s -vV -C100 -d -a 0.0.0.0 -p 9999 > ipaddr (0.0.0.0) > port 9999 > created cm_id 0x804e6e0 > rdma_bind_addr successful > rdma_listen > > openfab:~ # rping -c -vV -C100 -d -a 17.2.2.102 -p 9999 > ipaddr (17.2.2.102) > port 9999 > created cm_id 0x506b00 > cma_event type 0 cma_id 0x506b00 (parent) > cma_event type 2 cma_id 0x506b00 (parent) > rdma_resolve_addr - rdma_resolve_route successful > > > -----Original Message----- > From: Steve Wise [mailto:swise at opengridcomputing.com] > Sent: Friday, July 07, 2006 11:07 AM > To: ravinandan.arakali at neterion.com > Cc: openib-general at openib.org > Subject: Re: ping problem with ammasso cards(iWARP interface) > > > adding -d to rping might give more debug info... > > Also if the server side core dumps, please get a stack trace and lets > see whats up... > > Steve. 
> > > On Fri, 2006-07-07 at 11:04 -0700, Ravinandan Arakali wrote: > > Hi, > > I have two systems connected back-to-back with ammasso cards. I have > > configured and built the libraries and drivers as mentioned in > > https://openib.org/tiki/tiki-index.php?page=Install+OpenIB+for+Ammasso1100 > > > > I can ping on the regular ethX interface but both rping and krping on > iwarp > > interfaces fail (no packets sent/received). > > Following is the configuration: > > System1: > > eth interface: 17.1.1.101 > > iwarp interface: 17.2.2.101 > > > > System2: > > eth interface: 17.1.1.102 > > iwarp interface: 17.2.2.102 > > > > On System2 I run > > # rping -s -vV -C100 -a 0.0.0.0 -p 9999 > > > > On System1 I run > > # rping -c -vV -C100 -a 17.2.2.102 -p 9999 > > > > Sometimes I have noticed that when client is started, the server core > dumps. > > Other times, there is no core dump but there's no traffic flowing. > > > > Any help is appreciated. > > > > Thanks, > > Ravi From ravinandan.arakali at neterion.com Fri Jul 7 11:36:42 2006 From: ravinandan.arakali at neterion.com (Ravinandan Arakali) Date: Fri, 7 Jul 2006 11:36:42 -0700 Subject: [openib-general] ping problem with ammasso cards(iWARP interface) In-Reply-To: <1152297018.18332.44.camel@stevo-desktop> Message-ID: <000d01c6a1f4$4b4f05c0$4710100a@pc.s2io.com> Yes, that's correct. Output of openfab and openfab2. 
openfab:~ # ifconfig eth0 Link encap:Ethernet HWaddr 00:E0:81:2A:B5:F5 inet addr:172.10.1.251 Bcast:172.10.1.255 Mask:255.255.255.0 inet6 addr: fe80::2e0:81ff:fe2a:b5f5/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:2963 errors:0 dropped:0 overruns:0 frame:0 TX packets:389 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:273099 (266.6 Kb) TX bytes:77406 (75.5 Kb) eth1 Link encap:Ethernet HWaddr 00:0D:B2:00:06:40 inet addr:17.1.1.101 Bcast:17.1.1.255 Mask:255.255.255.0 inet6 addr: fe80::20d:b2ff:fe00:640/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:4 errors:0 dropped:0 overruns:0 frame:0 TX packets:10 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:316 (316.0 b) TX bytes:748 (748.0 b) Interrupt:185 iw1 Link encap:Ethernet HWaddr 00:0D:B2:00:06:41 inet addr:17.2.2.101 Mask:255.255.255.0 inet6 addr: fe80::20d:b2ff:fe00:641/64 Scope:Link UP RUNNING NOARP MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:122 errors:0 dropped:0 overruns:0 frame:0 TX packets:122 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:12503 (12.2 Kb) TX bytes:12503 (12.2 Kb) openfab2:/usr/src/linux-2.6.16.20 # ifconfig eth1 Link encap:Ethernet HWaddr 00:06:5B:F2:B1:24 inet addr:172.10.1.249 Bcast:172.10.1.255 Mask:255.255.255.0 inet6 addr: fe80::206:5bff:fef2:b124/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:3454 errors:0 dropped:0 overruns:0 frame:0 TX packets:728 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:100 RX bytes:327164 (319.4 Kb) TX bytes:129892 (126.8 Kb) Base address:0xdce0 Memory:fdbe0000-fdc00000 eth3 Link encap:Ethernet 
HWaddr 00:0D:B2:00:12:8A inet addr:17.1.1.102 Bcast:17.1.1.255 Mask:255.255.255.0 inet6 addr: fe80::20d:b2ff:fe00:128a/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:7 errors:0 dropped:0 overruns:0 frame:0 TX packets:10 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:516 (516.0 b) TX bytes:748 (748.0 b) Interrupt:193 iw0 Link encap:Ethernet HWaddr 00:0D:B2:00:12:8B inet addr:17.2.2.102 Mask:255.255.255.0 inet6 addr: fe80::20d:b2ff:fe00:128b/64 Scope:Link UP RUNNING NOARP MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:8 errors:0 dropped:0 overruns:0 frame:0 TX packets:8 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:672 (672.0 b) TX bytes:672 (672.0 b) -----Original Message----- From: Steve Wise [mailto:swise at opengridcomputing.com] Sent: Friday, July 07, 2006 11:30 AM To: ravinandan.arakali at neterion.com Cc: openib-general at openib.org Subject: RE: ping problem with ammasso cards(iWARP interface) Just to be clear: openfab2's rdma address is 17.2.2.102? On Fri, 2006-07-07 at 11:20 -0700, Ravinandan Arakali wrote: > I added -d to rping command. I am not seeing the core dump on > this run. Following is the output(and it blocks at this point). 
> > openfab2:~ # rping -s -vV -C100 -d -a 0.0.0.0 -p 9999 > ipaddr (0.0.0.0) > port 9999 > created cm_id 0x804e6e0 > rdma_bind_addr successful > rdma_listen > > openfab:~ # rping -c -vV -C100 -d -a 17.2.2.102 -p 9999 > ipaddr (17.2.2.102) > port 9999 > created cm_id 0x506b00 > cma_event type 0 cma_id 0x506b00 (parent) > cma_event type 2 cma_id 0x506b00 (parent) > rdma_resolve_addr - rdma_resolve_route successful > > > -----Original Message----- > From: Steve Wise [mailto:swise at opengridcomputing.com] > Sent: Friday, July 07, 2006 11:07 AM > To: ravinandan.arakali at neterion.com > Cc: openib-general at openib.org > Subject: Re: ping problem with ammasso cards(iWARP interface) > > > adding -d to rping might give more debug info... > > Also if the server side core dumps, please get a stack trace and lets > see whats up... > > Steve. > > > On Fri, 2006-07-07 at 11:04 -0700, Ravinandan Arakali wrote: > > Hi, > > I have two systems connected back-to-back with ammasso cards. I have > > configured and built the libraries and drivers as mentioned in > > https://openib.org/tiki/tiki-index.php?page=Install+OpenIB+for+Ammasso1100 > > > > I can ping on the regular ethX interface but both rping and krping on > iwarp > > interfaces fail (no packets sent/received). > > Following is the configuration: > > System1: > > eth interface: 17.1.1.101 > > iwarp interface: 17.2.2.101 > > > > System2: > > eth interface: 17.1.1.102 > > iwarp interface: 17.2.2.102 > > > > On System2 I run > > # rping -s -vV -C100 -a 0.0.0.0 -p 9999 > > > > On System1 I run > > # rping -c -vV -C100 -a 17.2.2.102 -p 9999 > > > > Sometimes I have noticed that when client is started, the server core > dumps. > > Other times, there is no core dump but there's no traffic flowing. > > > > Any help is appreciated. 
> > > > Thanks, > > Ravi From bpradip at in.ibm.com Fri Jul 7 11:36:47 2006 From: bpradip at in.ibm.com (Pradipta Kumar Banerjee) Date: Sat, 08 Jul 2006 00:06:47 +0530 Subject: [openib-general] ping problem with ammasso cards(iWARP interface) In-Reply-To: <000c01c6a1f2$18a34a70$4710100a@pc.s2io.com> References: <000c01c6a1f2$18a34a70$4710100a@pc.s2io.com> Message-ID: <44AEA9BF.3070106@in.ibm.com> Ravinandan Arakali wrote: > I added -d to rping command. I am not seeing the core dump on > this run. Following is the output(and it blocks at this point). > > openfab2:~ # rping -s -vV -C100 -d -a 0.0.0.0 -p 9999 > ipaddr (0.0.0.0) > port 9999 > created cm_id 0x804e6e0 > rdma_bind_addr successful > rdma_listen > > openfab:~ # rping -c -vV -C100 -d -a 17.2.2.102 -p 9999 > ipaddr (17.2.2.102) > port 9999 > created cm_id 0x506b00 > cma_event type 0 cma_id 0x506b00 (parent) > cma_event type 2 cma_id 0x506b00 (parent) > rdma_resolve_addr - rdma_resolve_route successful > > > -----Original Message----- > From: Steve Wise [mailto:swise at opengridcomputing.com] > Sent: Friday, July 07, 2006 11:07 AM > To: ravinandan.arakali at neterion.com > Cc: openib-general at openib.org > Subject: Re: ping problem with ammasso cards(iWARP interface) > > > adding -d to rping might give more debug info... > > Also if the server side core dumps, please get a stack trace and lets > see whats up... > > Steve. > > > On Fri, 2006-07-07 at 11:04 -0700, Ravinandan Arakali wrote: >> Hi, >> I have two systems connected back-to-back with ammasso cards. I have >> configured and built the libraries and drivers as mentioned in >> https://openib.org/tiki/tiki-index.php?page=Install+OpenIB+for+Ammasso1100 >> >> I can ping on the regular ethX interface but both rping and krping on > iwarp >> interfaces fail (no packets sent/received). 
>> Following is the configuration: >> System1: >> eth interface: 17.1.1.101 >> iwarp interface: 17.2.2.101 >> >> System2: >> eth interface: 17.1.1.102 >> iwarp interface: 17.2.2.102 >> >> On System2 I run >> # rping -s -vV -C100 -a 0.0.0.0 -p 9999 >> >> On System1 I run >> # rping -c -vV -C100 -a 17.2.2.102 -p 9999 >> >> Sometimes I have noticed that when client is started, the server core > dumps. >> Other times, there is no core dump but there's no traffic flowing. Ravi, Did you try with the 'krping' (kernel module) utility ? This is the same as 'rping' but it's a kernel module instead of a user space application. Also hope you are using the latest stack. Also enable debugging for the Ammasso driver (modprobe rdma_krping debug=3) and send us the dmesg output. Thanks, Pradipta Kumar >> >> Any help is appreciated. >> >> Thanks, >> Ravi > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From sean.hefty at intel.com Fri Jul 7 11:30:56 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 7 Jul 2006 11:30:56 -0700 Subject: [openib-general] user_mad check question Message-ID: <000101c6a1f3$7c7783d0$8698070a@amr.corp.intel.com> The following check in user_mad is done when sending a MAD. /* * If userspace is generating a request that will generate a * response, we need to make sure the high-order part of the * transaction ID matches the agent being used to send the * MAD. */ method = ((struct ib_mad_hdr *) packet->msg->mad)->method; if (!(method & IB_MGMT_METHOD_RESP) && method != IB_MGMT_METHOD_TRAP_REPRESS && method != IB_MGMT_METHOD_SEND) { tid = &((struct ib_mad_hdr *) packet->msg->mad)->tid; *tid = cpu_to_be64(((u64) agent->hi_tid) << 32 | (be64_to_cpup(tid) & 0xffffffff)); } Does anyone know the reason for the last check (method != SEND)? 
Isn't it still a requirement that the TID be unique? I'm trying to add some validation checks against MADs sent from userspace to protect against a client sending non-data RMPP MADs, duplicate MADs, etc., and the above check looks incorrect to me. - Sean From ravinandan.arakali at neterion.com Fri Jul 7 11:43:23 2006 From: ravinandan.arakali at neterion.com (Ravinandan Arakali) Date: Fri, 7 Jul 2006 11:43:23 -0700 Subject: [openib-general] ping problem with ammasso cards(iWARP interface) In-Reply-To: <44AEA9BF.3070106@in.ibm.com> Message-ID: <000e01c6a1f5$3a67d2e0$4710100a@pc.s2io.com> Pradipta kumar, I had tried krping earlier but same result as with rping. I believe I am using latest stack since it was checked out from https://openib.org/svn/gen2/trunk I will try bumping up debug level on rdma_krping. Do you want the dmesg output while running rping or krping ? Ravi -----Original Message----- From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] Sent: Friday, July 07, 2006 11:37 AM To: Ravinandan Arakali Cc: openib-general at openib.org Subject: Re: [openib-general] ping problem with ammasso cards(iWARP interface) Ravinandan Arakali wrote: > I added -d to rping command. I am not seeing the core dump on > this run. Following is the output(and it blocks at this point). 
> > openfab2:~ # rping -s -vV -C100 -d -a 0.0.0.0 -p 9999 > ipaddr (0.0.0.0) > port 9999 > created cm_id 0x804e6e0 > rdma_bind_addr successful > rdma_listen > > openfab:~ # rping -c -vV -C100 -d -a 17.2.2.102 -p 9999 > ipaddr (17.2.2.102) > port 9999 > created cm_id 0x506b00 > cma_event type 0 cma_id 0x506b00 (parent) > cma_event type 2 cma_id 0x506b00 (parent) > rdma_resolve_addr - rdma_resolve_route successful > > > -----Original Message----- > From: Steve Wise [mailto:swise at opengridcomputing.com] > Sent: Friday, July 07, 2006 11:07 AM > To: ravinandan.arakali at neterion.com > Cc: openib-general at openib.org > Subject: Re: ping problem with ammasso cards(iWARP interface) > > > adding -d to rping might give more debug info... > > Also if the server side core dumps, please get a stack trace and lets > see whats up... > > Steve. > > > On Fri, 2006-07-07 at 11:04 -0700, Ravinandan Arakali wrote: >> Hi, >> I have two systems connected back-to-back with ammasso cards. I have >> configured and built the libraries and drivers as mentioned in >> https://openib.org/tiki/tiki-index.php?page=Install+OpenIB+for+Ammasso1100 >> >> I can ping on the regular ethX interface but both rping and krping on > iwarp >> interfaces fail (no packets sent/received). >> Following is the configuration: >> System1: >> eth interface: 17.1.1.101 >> iwarp interface: 17.2.2.101 >> >> System2: >> eth interface: 17.1.1.102 >> iwarp interface: 17.2.2.102 >> >> On System2 I run >> # rping -s -vV -C100 -a 0.0.0.0 -p 9999 >> >> On System1 I run >> # rping -c -vV -C100 -a 17.2.2.102 -p 9999 >> >> Sometimes I have noticed that when client is started, the server core > dumps. >> Other times, there is no core dump but there's no traffic flowing. Ravi, Did you try with the 'krping' (kernel module) utility ? This is the same as 'rping' but it's a kernel module instead of a user space application. Also hope you are using the latest stack. 
Also enable debugging for the Ammasso driver (modprobe rdma_krping debug=3) and send us the dmesg output. Thanks, Pradipta Kumar >> >> Any help is appreciated. >> >> Thanks, >> Ravi > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From bpradip at in.ibm.com Fri Jul 7 11:47:59 2006 From: bpradip at in.ibm.com (Pradipta Kumar Banerjee) Date: Sat, 08 Jul 2006 00:17:59 +0530 Subject: [openib-general] [ANNOUNCE] NetPIPE 3.7 release candidate 1 In-Reply-To: <44AE9CE1.2080808@ichips.intel.com> References: <20060706192549.GB6865@minbar-g5.scl.ameslab.gov> <44AE3644.1000008@in.ibm.com> <44AE9190.3010605@scl.ameslab.gov> <44AE9CE1.2080808@ichips.intel.com> Message-ID: <44AEAC5F.5000602@in.ibm.com> Sean Hefty wrote: > Troy Benjegerdes wrote: >> Is the connection manager api considered 'stable' yet? Last I knew it >> was still undergoing a lot of development. > > The connection manager API is relatively stable, and has been for a while. > Changes to the API have come in the form of new options and calls, rather than > changes to the existing calls. True. We have been using it for quite some time for all our work on iWarp. > >> Wouldn't it be possible to adapt the existing tcp connection setup code >> to work for Iwarp without using the connection manager? Most of the time >> when I test IB networks, I just bind to ethernet eth0 interfaces anyway. > > My understanding is that it's not possible to connect iWarp QPs without using > the connection manager. For IB, it's possible to connect over sockets and > exchange QP information over a TCP connection, but the same is not true for iWarp. > > - Sean Troy, the NetPIPE code needs to use the cm for successful working with iwarp. 
Maybe the changes can be in a separate file altogether (like cm.c as in ibv.c) and have a separate option in the Makefile to enable the same. Thanks, Pradipta Kumar > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From bpradip at in.ibm.com Fri Jul 7 11:57:26 2006 From: bpradip at in.ibm.com (Pradipta Kumar Banerjee) Date: Sat, 08 Jul 2006 00:27:26 +0530 Subject: [openib-general] ping problem with ammasso cards(iWARP interface) In-Reply-To: <000e01c6a1f5$3a67d2e0$4710100a@pc.s2io.com> References: <000e01c6a1f5$3a67d2e0$4710100a@pc.s2io.com> Message-ID: <44AEAE96.9050803@in.ibm.com> Ravinandan Arakali wrote: > Pradipta kumar, > I had tried krping earlier but same result as with rping. > I believe I am using latest stack since it was checked out from > https://openib.org/svn/gen2/trunk For iWARP, use this: https://openib.org/svn/gen2/branches/iwarp I am not sure why, but the main trunk no longer contains iwarp specific code. > > I will try bumping up debug level on rdma_krping. Do you > want the dmesg output while running rping or krping ? Anything will do. Also, I only mentioned rdma_krping; please do the same for the ammasso driver (modprobe iw_c2 debug=16) too. > > Ravi > > -----Original Message----- > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] > Sent: Friday, July 07, 2006 11:37 AM > To: Ravinandan Arakali > Cc: openib-general at openib.org > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP > interface) > > > Ravinandan Arakali wrote: >> I added -d to rping command. I am not seeing the core dump on >> this run. Following is the output(and it blocks at this point).
>> >> openfab2:~ # rping -s -vV -C100 -d -a 0.0.0.0 -p 9999 >> ipaddr (0.0.0.0) >> port 9999 >> created cm_id 0x804e6e0 >> rdma_bind_addr successful >> rdma_listen >> >> openfab:~ # rping -c -vV -C100 -d -a 17.2.2.102 -p 9999 >> ipaddr (17.2.2.102) >> port 9999 >> created cm_id 0x506b00 >> cma_event type 0 cma_id 0x506b00 (parent) >> cma_event type 2 cma_id 0x506b00 (parent) >> rdma_resolve_addr - rdma_resolve_route successful >> >> >> -----Original Message----- >> From: Steve Wise [mailto:swise at opengridcomputing.com] >> Sent: Friday, July 07, 2006 11:07 AM >> To: ravinandan.arakali at neterion.com >> Cc: openib-general at openib.org >> Subject: Re: ping problem with ammasso cards(iWARP interface) >> >> >> adding -d to rping might give more debug info... >> >> Also if the server side core dumps, please get a stack trace and lets >> see whats up... >> >> Steve. >> >> >> On Fri, 2006-07-07 at 11:04 -0700, Ravinandan Arakali wrote: >>> Hi, >>> I have two systems connected back-to-back with ammasso cards. I have >>> configured and built the libraries and drivers as mentioned in >>> > https://openib.org/tiki/tiki-index.php?page=Install+OpenIB+for+Ammasso1100 >>> I can ping on the regular ethX interface but both rping and krping on >> iwarp >>> interfaces fail (no packets sent/received). >>> Following is the configuration: >>> System1: >>> eth interface: 17.1.1.101 >>> iwarp interface: 17.2.2.101 >>> >>> System2: >>> eth interface: 17.1.1.102 >>> iwarp interface: 17.2.2.102 >>> >>> On System2 I run >>> # rping -s -vV -C100 -a 0.0.0.0 -p 9999 >>> >>> On System1 I run >>> # rping -c -vV -C100 -a 17.2.2.102 -p 9999 >>> >>> Sometimes I have noticed that when client is started, the server core >> dumps. >>> Other times, there is no core dump but there's no traffic flowing. > > Ravi, > Did you try with the 'krping' (kernel module) utility ? This is the same > as > 'rping' but it's a kernel module instead of a user space application. 
> Also hope you are using the latest stack. > Also enable debugging for the Ammasso driver (modprobe rdma_krping debug=3) > and > send us the dmesg output. > > Thanks, > Pradipta Kumar > >>> Any help is appreciated. >>> >>> Thanks, >>> Ravi >> >> _______________________________________________ >> openib-general mailing list >> openib-general at openib.org >> http://openib.org/mailman/listinfo/openib-general >> >> To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general >> > From swise at opengridcomputing.com Fri Jul 7 12:02:15 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 07 Jul 2006 14:02:15 -0500 Subject: [openib-general] ping problem with ammasso cards(iWARP interface) In-Reply-To: <000e01c6a1f5$3a67d2e0$4710100a@pc.s2io.com> References: <000e01c6a1f5$3a67d2e0$4710100a@pc.s2io.com> Message-ID: <1152298935.11928.4.camel@stevo-desktop> On Fri, 2006-07-07 at 11:43 -0700, Ravinandan Arakali wrote: > Pradipta kumar, > I had tried krping earlier but same result as with rping. > I believe I am using latest stack since it was checked out from > https://openib.org/svn/gen2/trunk > Um, that's not the iwarp branch. It's: https://openib.org/svn/gen2/branches/iwarp Steve. From swise at opengridcomputing.com Fri Jul 7 12:03:03 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 07 Jul 2006 14:03:03 -0500 Subject: [openib-general] ping problem with ammasso cards(iWARP interface) In-Reply-To: <44AEAE96.9050803@in.ibm.com> References: <000e01c6a1f5$3a67d2e0$4710100a@pc.s2io.com> <44AEAE96.9050803@in.ibm.com> Message-ID: <1152298983.11928.6.camel@stevo-desktop> On Sat, 2006-07-08 at 00:27 +0530, Pradipta Kumar Banerjee wrote: > Ravinandan Arakali wrote: > > Pradipta kumar, > > I had tried krping earlier but same result as with rping.
> > I believe I am using latest stack since it was checked out from > > https://openib.org/svn/gen2/trunk > For iWARP use this https://openib.org/svn/gen2/branches/iwarp > > I am not sure why, but the main trunk no longer contain iwarp specific code. > The main trunk has never contained iwarp-specific code. We're working on integrating it in so we don't need an iwarp branch... Steve. From ravinandan.arakali at neterion.com Fri Jul 7 12:03:00 2006 From: ravinandan.arakali at neterion.com (Ravinandan Arakali) Date: Fri, 7 Jul 2006 12:03:00 -0700 Subject: [openib-general] ping problem with ammasso cards(iWARP interface) In-Reply-To: <44AEAE96.9050803@in.ibm.com> Message-ID: <001001c6a1f7$f7ae09d0$4710100a@pc.s2io.com> Sorry, I gave the wrong URL earlier. I checked out from: svn co -----Original Message----- From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] Sent: Friday, July 07, 2006 11:57 AM To: ravinandan.arakali at neterion.com Cc: openib-general at openib.org Subject: Re: [openib-general] ping problem with ammasso cards(iWARP interface) Ravinandan Arakali wrote: > Pradipta kumar, > I had tried krping earlier but same result as with rping. > I believe I am using latest stack since it was checked out from > https://openib.org/svn/gen2/trunk For iWARP use this https://openib.org/svn/gen2/branches/iwarp I am not sure why, but the main trunk no longer contain iwarp specific code. > > I will try bumping up debug level on rdma_krping. Do you > want the dmesg output while running rping or krping ? anything will do. 
Also I only mentioned for rdma_krping, please do the same for the ammasso driver (modprobe iw_c2 debug=16) too > > Ravi > > -----Original Message----- > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] > Sent: Friday, July 07, 2006 11:37 AM > To: Ravinandan Arakali > Cc: openib-general at openib.org > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP > interface) > > > Ravinandan Arakali wrote: >> I added -d to rping command. I am not seeing the core dump on >> this run. Following is the output(and it blocks at this point). >> >> openfab2:~ # rping -s -vV -C100 -d -a 0.0.0.0 -p 9999 >> ipaddr (0.0.0.0) >> port 9999 >> created cm_id 0x804e6e0 >> rdma_bind_addr successful >> rdma_listen >> >> openfab:~ # rping -c -vV -C100 -d -a 17.2.2.102 -p 9999 >> ipaddr (17.2.2.102) >> port 9999 >> created cm_id 0x506b00 >> cma_event type 0 cma_id 0x506b00 (parent) >> cma_event type 2 cma_id 0x506b00 (parent) >> rdma_resolve_addr - rdma_resolve_route successful >> >> >> -----Original Message----- >> From: Steve Wise [mailto:swise at opengridcomputing.com] >> Sent: Friday, July 07, 2006 11:07 AM >> To: ravinandan.arakali at neterion.com >> Cc: openib-general at openib.org >> Subject: Re: ping problem with ammasso cards(iWARP interface) >> >> >> adding -d to rping might give more debug info... >> >> Also if the server side core dumps, please get a stack trace and lets >> see whats up... >> >> Steve. >> >> >> On Fri, 2006-07-07 at 11:04 -0700, Ravinandan Arakali wrote: >>> Hi, >>> I have two systems connected back-to-back with ammasso cards. I have >>> configured and built the libraries and drivers as mentioned in >>> > https://openib.org/tiki/tiki-index.php?page=Install+OpenIB+for+Ammasso1100 >>> I can ping on the regular ethX interface but both rping and krping on >> iwarp >>> interfaces fail (no packets sent/received). 
>>> Following is the configuration: >>> System1: >>> eth interface: 17.1.1.101 >>> iwarp interface: 17.2.2.101 >>> >>> System2: >>> eth interface: 17.1.1.102 >>> iwarp interface: 17.2.2.102 >>> >>> On System2 I run >>> # rping -s -vV -C100 -a 0.0.0.0 -p 9999 >>> >>> On System1 I run >>> # rping -c -vV -C100 -a 17.2.2.102 -p 9999 >>> >>> Sometimes I have noticed that when client is started, the server core >> dumps. >>> Other times, there is no core dump but there's no traffic flowing. > > Ravi, > Did you try with the 'krping' (kernel module) utility ? This is the same > as > 'rping' but it's a kernel module instead of a user space application. > Also hope you are using the latest stack. > Also enable debugging for the Ammasso driver (modprobe rdma_krping debug=3) > and > send us the dmesg output. > > Thanks, > Pradipta Kumar > >>> Any help is appreciated. >>> >>> Thanks, >>> Ravi >> >> _______________________________________________ >> openib-general mailing list >> openib-general at openib.org >> http://openib.org/mailman/listinfo/openib-general >> >> To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general >> > From bpradip at in.ibm.com Fri Jul 7 12:30:18 2006 From: bpradip at in.ibm.com (Pradipta Kumar Banerjee) Date: Sat, 08 Jul 2006 01:00:18 +0530 Subject: [openib-general] ping problem with ammasso cards(iWARP interface) In-Reply-To: <000c01c6a1f2$18a34a70$4710100a@pc.s2io.com> References: <000c01c6a1f2$18a34a70$4710100a@pc.s2io.com> Message-ID: <44AEB64A.1000903@in.ibm.com> Ravinandan Arakali wrote: > I added -d to rping command. I am not seeing the core dump on > this run. Following is the output(and it blocks at this point). 
> > openfab2:~ # rping -s -vV -C100 -d -a 0.0.0.0 -p 9999 > ipaddr (0.0.0.0) > port 9999 > created cm_id 0x804e6e0 > rdma_bind_addr successful > rdma_listen > > openfab:~ # rping -c -vV -C100 -d -a 17.2.2.102 -p 9999 > ipaddr (17.2.2.102) > port 9999 > created cm_id 0x506b00 > cma_event type 0 cma_id 0x506b00 (parent) > cma_event type 2 cma_id 0x506b00 (parent) > rdma_resolve_addr - rdma_resolve_route successful After this the client should have printed something like created pd 0x.... created channel 0x.... It seems the client is stuck while allocating pd..(rping.c = rping_setup_qp -> ibv_alloc_pd) Is this the complete output you have pasted? Don't you get any errors on the screen ? Thanks Pradipta > > > -----Original Message----- > From: Steve Wise [mailto:swise at opengridcomputing.com] > Sent: Friday, July 07, 2006 11:07 AM > To: ravinandan.arakali at neterion.com > Cc: openib-general at openib.org > Subject: Re: ping problem with ammasso cards(iWARP interface) > > > adding -d to rping might give more debug info... > > Also if the server side core dumps, please get a stack trace and lets > see whats up... > > Steve. > > > On Fri, 2006-07-07 at 11:04 -0700, Ravinandan Arakali wrote: >> Hi, >> I have two systems connected back-to-back with ammasso cards. I have >> configured and built the libraries and drivers as mentioned in >> https://openib.org/tiki/tiki-index.php?page=Install+OpenIB+for+Ammasso1100 >> >> I can ping on the regular ethX interface but both rping and krping on > iwarp >> interfaces fail (no packets sent/received). >> Following is the configuration: >> System1: >> eth interface: 17.1.1.101 >> iwarp interface: 17.2.2.101 >> >> System2: >> eth interface: 17.1.1.102 >> iwarp interface: 17.2.2.102 >> >> On System2 I run >> # rping -s -vV -C100 -a 0.0.0.0 -p 9999 >> >> On System1 I run >> # rping -c -vV -C100 -a 17.2.2.102 -p 9999 >> >> Sometimes I have noticed that when client is started, the server core > dumps. 
>> Other times, there is no core dump but there's no traffic flowing. >> >> Any help is appreciated. >> >> Thanks, >> Ravi > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From trimmer at silverstorm.com Fri Jul 7 12:54:46 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Fri, 7 Jul 2006 15:54:46 -0400 Subject: [openib-general] user_mad check question In-Reply-To: <000101c6a1f3$7c7783d0$8698070a@amr.corp.intel.com> Message-ID: Todd Rimmer Chief Systems Architect SilverStorm Technologies Voice: 610-233-4852 Fax: 610-233-4777 TRimmer at SilverStorm.com www.SilverStorm.com > From: Sean Hefty > Sent: Friday, July 07, 2006 2:31 PM > > The following check in user_mad is done when sending a MAD. > > /* > * If userspace is generating a request that will generate a > * response, we need to make sure the high-order part of the > * transaction ID matches the agent being used to send the > * MAD. > */ > method = ((struct ib_mad_hdr *) packet->msg->mad)->method; > > if (!(method & IB_MGMT_METHOD_RESP) && > method != IB_MGMT_METHOD_TRAP_REPRESS && > method != IB_MGMT_METHOD_SEND) { > tid = &((struct ib_mad_hdr *) packet->msg->mad)->tid; > *tid = cpu_to_be64(((u64) agent->hi_tid) << 32 | > (be64_to_cpup(tid) & 0xffffffff)); > } > > Does anyone know the reason for the last check (method != SEND)? Isn't it > still > a requirement that the TID be unique? > > I'm trying to add some validation checks against MADs sent from userspace > to > protect against a client sending non-data RMPP MADs, duplicate MADs, etc., > and > the above check looks incorrect to me. 
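The check quoted above can be sketched in standalone C to make its effect on each method easier to see. This is only an illustration under stated assumptions, not the user_mad code: `needs_hi_tid`, `stamp_tid`, `outgoing_tid` and `tid_sketch_selfcheck` are hypothetical helper names, the method values are assumed to match the kernel's `IB_MGMT_METHOD_*` definitions, and the byte-order conversion (`cpu_to_be64` / `be64_to_cpup`) is left out, so the TID is handled as a host-order integer.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the kernel's IB_MGMT_METHOD_* values. */
#define METHOD_GET           0x01
#define METHOD_SEND          0x03
#define METHOD_TRAP_REPRESS  0x07
#define METHOD_RESP_BIT      0x80
#define METHOD_GET_RESP      0x81

/* Hypothetical helper: 1 if the quoted check would rewrite the TID. */
int needs_hi_tid(uint8_t method)
{
    return !(method & METHOD_RESP_BIT) &&
           method != METHOD_TRAP_REPRESS &&
           method != METHOD_SEND;
}

/* Hypothetical helper: put the agent's hi_tid in the upper 32 bits and
 * keep the lower 32 bits chosen by userspace (host byte order here). */
uint64_t stamp_tid(uint32_t hi_tid, uint64_t tid)
{
    return ((uint64_t)hi_tid << 32) | (tid & 0xffffffffULL);
}

/* Hypothetical helper: the TID that would go out on the wire. */
uint64_t outgoing_tid(uint8_t method, uint32_t hi_tid, uint64_t tid)
{
    return needs_hi_tid(method) ? stamp_tid(hi_tid, tid) : tid;
}

/* Small self-check showing the four cases discussed in the thread. */
void tid_sketch_selfcheck(void)
{
    const uint64_t user_tid = 0xAAAABBBBCCCCDDDDULL;

    /* A Get is a request, so the upper half is replaced by hi_tid. */
    assert(outgoing_tid(METHOD_GET, 0x12, user_tid) == 0x00000012CCCCDDDDULL);

    /* Responses, TrapRepress and Send keep the TID userspace supplied. */
    assert(outgoing_tid(METHOD_GET_RESP, 0x12, user_tid) == user_tid);
    assert(outgoing_tid(METHOD_TRAP_REPRESS, 0x12, user_tid) == user_tid);
    assert(outgoing_tid(METHOD_SEND, 0x12, user_tid) == user_tid);
}
```

Under these assumptions a Send passes through with whatever TID userspace supplied, which is the case the question is about: the sketch has no way to tell whether a given Send is a new request or a response.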
While the TID can be appropriate for a SEND (it depends on management class, some classes could choose to always use 0), this code fragment cannot be sure if the SEND is a new request or a response to an existing request. Hence it cannot be certain if it should modify the TID. I have not studied this whole code path, but it might hint at a problem when handling MAD classes which use SEND instead of REQ/RESP type messages. Namely, the hi_tid is not inserted for SEND. Does other code on the receive path expect hi_tid? One such class which can use SEND to hold requests and their responses is the BMA. So can the SNMP tunnel and the CM (although CM is not usually done from user space). For the BMA case, a request can be: Class=BM Method=SEND AttributeModifier has BM RESPONSE bit set Todd Rimmer From ravinandan.arakali at neterion.com Fri Jul 7 13:02:04 2006 From: ravinandan.arakali at neterion.com (Ravinandan Arakali) Date: Fri, 7 Jul 2006 13:02:04 -0700 Subject: [openib-general] ping problem with ammasso cards(iWARP interface) In-Reply-To: <44AEAE96.9050803@in.ibm.com> Message-ID: <001101c6a200$38929cb0$4710100a@pc.s2io.com> Attached are the outputs when rping and krping are run. The dmesg output when krping is run is quite verbose. Hopefully, you can shed some light by looking at it. The rping server core dumped as soon as the client was run. Thanks, Ravi -----Original Message----- From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] Sent: Friday, July 07, 2006 11:57 AM To: ravinandan.arakali at neterion.com Cc: openib-general at openib.org Subject: Re: [openib-general] ping problem with ammasso cards(iWARP interface) Ravinandan Arakali wrote: > Pradipta kumar, > I had tried krping earlier but same result as with rping. > I believe I am using latest stack since it was checked out from > https://openib.org/svn/gen2/trunk For iWARP use this https://openib.org/svn/gen2/branches/iwarp I am not sure why, but the main trunk no longer contain iwarp specific code.
> > I will try bumping up debug level on rdma_krping. Do you > want the dmesg output while running rping or krping ? anything will do. Also I only mentioned for rdma_krping, please do the same for the ammasso driver (modprobe iw_c2 debug=16) too > > Ravi > > -----Original Message----- > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] > Sent: Friday, July 07, 2006 11:37 AM > To: Ravinandan Arakali > Cc: openib-general at openib.org > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP > interface) > > > Ravinandan Arakali wrote: >> I added -d to rping command. I am not seeing the core dump on >> this run. Following is the output(and it blocks at this point). >> >> openfab2:~ # rping -s -vV -C100 -d -a 0.0.0.0 -p 9999 >> ipaddr (0.0.0.0) >> port 9999 >> created cm_id 0x804e6e0 >> rdma_bind_addr successful >> rdma_listen >> >> openfab:~ # rping -c -vV -C100 -d -a 17.2.2.102 -p 9999 >> ipaddr (17.2.2.102) >> port 9999 >> created cm_id 0x506b00 >> cma_event type 0 cma_id 0x506b00 (parent) >> cma_event type 2 cma_id 0x506b00 (parent) >> rdma_resolve_addr - rdma_resolve_route successful >> >> >> -----Original Message----- >> From: Steve Wise [mailto:swise at opengridcomputing.com] >> Sent: Friday, July 07, 2006 11:07 AM >> To: ravinandan.arakali at neterion.com >> Cc: openib-general at openib.org >> Subject: Re: ping problem with ammasso cards(iWARP interface) >> >> >> adding -d to rping might give more debug info... >> >> Also if the server side core dumps, please get a stack trace and lets >> see whats up... >> >> Steve. >> >> >> On Fri, 2006-07-07 at 11:04 -0700, Ravinandan Arakali wrote: >>> Hi, >>> I have two systems connected back-to-back with ammasso cards. 
I have >>> configured and built the libraries and drivers as mentioned in >>> > https://openib.org/tiki/tiki-index.php?page=Install+OpenIB+for+Ammasso1100 >>> I can ping on the regular ethX interface but both rping and krping on >> iwarp >>> interfaces fail (no packets sent/received). >>> Following is the configuration: >>> System1: >>> eth interface: 17.1.1.101 >>> iwarp interface: 17.2.2.101 >>> >>> System2: >>> eth interface: 17.1.1.102 >>> iwarp interface: 17.2.2.102 >>> >>> On System2 I run >>> # rping -s -vV -C100 -a 0.0.0.0 -p 9999 >>> >>> On System1 I run >>> # rping -c -vV -C100 -a 17.2.2.102 -p 9999 >>> >>> Sometimes I have noticed that when client is started, the server core >> dumps. >>> Other times, there is no core dump but there's no traffic flowing. > > Ravi, > Did you try with the 'krping' (kernel module) utility ? This is the same > as > 'rping' but it's a kernel module instead of a user space application. > Also hope you are using the latest stack. > Also enable debugging for the Ammasso driver (modprobe rdma_krping debug=3) > and > send us the dmesg output. > > Thanks, > Pradipta Kumar > >>> Any help is appreciated. >>> >>> Thanks, >>> Ravi >> >> _______________________________________________ >> openib-general mailing list >> openib-general at openib.org >> http://openib.org/mailman/listinfo/openib-general >> >> To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general >> > -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: rping_output.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: krping_output.txt URL: From bpradip at in.ibm.com Fri Jul 7 13:28:45 2006 From: bpradip at in.ibm.com (Pradipta Kumar Banerjee) Date: Sat, 08 Jul 2006 01:58:45 +0530 Subject: [openib-general] ping problem with ammasso cards(iWARP interface) In-Reply-To: <001101c6a200$38929cb0$4710100a@pc.s2io.com> References: <001101c6a200$38929cb0$4710100a@pc.s2io.com> Message-ID: <44AEC3FD.7050604@in.ibm.com> Ravinandan Arakali wrote: > Attached are the outputs when rping and krping are run. > The dmesg output when krping is run is quite verbose. > Hopefully, you can throw some light by looking at it. > The rping server core dumped as soon as the client was run. Ravi, Use the following for krping. You will be able to see the ping data using dmesg. echo "server,port=9999,addr=17.2.2.102,validate,verbose" > /proc/krping echo "client,port=9999,addr=17.2.2.102,validate,verbose" > /proc/krping As for the rping, can u run rping under gdb (gdb librdmacm/example/.libs/rping) and send the backtrace on segfault. Thanks, Pradipta > > Thanks, > Ravi > > >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From swise at opengridcomputing.com Fri Jul 7 13:47:03 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 07 Jul 2006 15:47:03 -0500 Subject: [openib-general] ping problem with ammasso cards(iWARP interface) In-Reply-To: <001101c6a200$38929cb0$4710100a@pc.s2io.com> References: <001101c6a200$38929cb0$4710100a@pc.s2io.com> Message-ID: <1152305223.11928.9.camel@stevo-desktop> By the way, what HW platform, distro, kernel are you using? Steve. On Fri, 2006-07-07 at 13:02 -0700, Ravinandan Arakali wrote: > Attached are the outputs when rping and krping are run. > The dmesg output when krping is run is quite verbose. > Hopefully, you can throw some light by looking at it. > The rping server core dumped as soon as the client was run. 
From swise at opengridcomputing.com Fri Jul 7 13:49:26 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 07 Jul 2006 15:49:26 -0500 Subject: [openib-general] ping problem with ammasso cards(iWARP interface) In-Reply-To: <000d01c6a1f4$4b4f05c0$4710100a@pc.s2io.com> References: <000d01c6a1f4$4b4f05c0$4710100a@pc.s2io.com> Message-ID: <1152305366.11928.12.camel@stevo-desktop> > > openfab2:/usr/src/linux-2.6.16.20 # ifconfig > eth1 Link encap:Ethernet HWaddr 00:06:5B:F2:B1:24 > inet addr:172.10.1.249 Bcast:172.10.1.255 Mask:255.255.255.0 > inet6 addr: fe80::206:5bff:fef2:b124/64 Scope:Link > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > RX packets:3454 errors:0 dropped:0 overruns:0 frame:0 > TX packets:728 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:100 > RX bytes:327164 (319.4 Kb) TX bytes:129892 (126.8 Kb) > Base address:0xdce0 Memory:fdbe0000-fdc00000 > > eth3 Link encap:Ethernet HWaddr 00:0D:B2:00:12:8A > inet addr:17.1.1.102 Bcast:17.1.1.255 Mask:255.255.255.0 > inet6 addr: fe80::20d:b2ff:fe00:128a/64 Scope:Link > UP BROADCAST RUNNING MULTICAST
MTU:1500 Metric:1 > RX packets:7 errors:0 dropped:0 overruns:0 frame:0 > TX packets:10 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:1000 > RX bytes:516 (516.0 b) TX bytes:748 (748.0 b) > Interrupt:193 > > iw0 Link encap:Ethernet HWaddr 00:0D:B2:00:12:8B > inet addr:17.2.2.102 Mask:255.255.255.0 > inet6 addr: fe80::20d:b2ff:fe00:128b/64 Scope:Link > UP RUNNING NOARP MTU:1500 Metric:1 > RX packets:0 errors:0 dropped:0 overruns:0 frame:0 > TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:0 > RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) > The above is strange. It should be eth3 and iw3. I wonder if the fact that its iw0 is screwing things up. From mshefty at ichips.intel.com Fri Jul 7 13:59:30 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 07 Jul 2006 13:59:30 -0700 Subject: [openib-general] user_mad check question In-Reply-To: References: Message-ID: <44AECB32.6040300@ichips.intel.com> Rimmer, Todd wrote: > While the TID can be appropriate for a SEND (it depends on management > class, some classes could chose to always use 0), this code fragment > cannot be sure if the SEND is a new request or a response to an existing > request. Hence it cannot be certain if it should modify the TID. If the TID is not modified, then there's no guarantee that the send is unique across multiple agents. I guess what I need to determine is when is it wrong to modify the TID of a send. My interpretation of C13-18.1.1 is that sends must also be unique for TID, SGID, and class. I know that the spec states that classes that do not use the TID should set it to 0, but are there any classes defined that do this? > I have not studied this whole code path, but it might hint at a problem > when handling MAD classes which use SEND instead of REQ/RESP type > messages. > Namely the hi_tid is not inserted for SEND. Does other code on the > receive path expect hi_tid? The hi_tid is used to route a response to the correct agent. 
But since sends wouldn't have the response bit set anyway, routing would need to be done based solely on the class. This is what is done for the CM. Can sends ever be used with RMPP? > One such class which can use SEND to hold requests and their responses > is the BMA. So can the SNMP tunnel and the CM (although CM is not > usually done from user space). For the BMA case, a request can be: > Class=BM > Method=SEND > AttributeModifier has BM RESPONSE bit set Ugh. This makes it really hard to determine if a send is a new operation. - Sean From ravinandan.arakali at neterion.com Fri Jul 7 14:05:38 2006 From: ravinandan.arakali at neterion.com (Ravinandan Arakali) Date: Fri, 7 Jul 2006 14:05:38 -0700 Subject: [openib-general] ping problem with ammasso cards(iWARPinterface) In-Reply-To: <1152305223.11928.9.camel@stevo-desktop> Message-ID: <001701c6a209$197ff760$4710100a@pc.s2io.com> openfab is an opteron(dual CPU) running SLES10 upgraded to 2.6.16.20 kernel. openfab2 is a 32-bit xeon system running SLES10 RC2.5(2.6.16.18 kernel). -----Original Message----- From: Steve Wise [mailto:swise at opengridcomputing.com] Sent: Friday, July 07, 2006 1:47 PM To: Ravinandan Arakali Cc: bpradip at in.ibm.com; openib-general at openib.org Subject: Re: [openib-general] ping problem with ammasso cards(iWARPinterface) By the way, what HW platform, distro, kernel are you using? Steve. On Fri, 2006-07-07 at 13:02 -0700, Ravinandan Arakali wrote: > Attached are the outputs when rping and krping are run. > The dmesg output when krping is run is quite verbose. > Hopefully, you can throw some light by looking at it. > The rping server core dumped as soon as the client was run. 
From jlentini at netapp.com Fri Jul 7 14:45:55 2006 From: jlentini at netapp.com (James Lentini) Date: Fri, 7 Jul 2006 17:45:55 -0400 (EDT) Subject: [openib-general] [Bug 146] OFED-1.0 DAPL fails to build on SLES10 on IA64 with IA64_FETCHADD error In-Reply-To: <44AD9614.5090907@sgi.com> References: <44AD9614.5090907@sgi.com> Message-ID: On Thu, 6 Jul 2006, John Partridge wrote: > James, > > I have had the opportunity to look into the CONFIG_NR_CPUS compile error > and I think the issue is that in dapl_osd.h (line 83) asm/system.h has > been included (I can't see a reason why) this in turn includes > asm/percpu.h which in turn includes linux/threads.h which does :- > > #ifdef CONFIG_SMP > #define NR_CPUS CONFIG_NR_CPUS > #else > > and of course CONFIG_NR_CPUS is defined in the kernel include file > include/linux/autoconf.h > > I have tried compiling without asm/system.h included in dapl_osd.h > but I com up against the IA64_FETCHADD undeclared error. > > Do you know why asm/system.h is included in dapl_osd.h ? This is included on IA64 and PPC systems.
Since we have not done testing on IA64 or PPC, I'm certain that this was a
contribution for an IA64 or PPC DAPL user. For that reason, I'm not certain
why the asm/system.h was included. It is certainly possible that it is
necessary on PPC but not IA64.

As for the IA64_FETCHADD error, I had a typo in the patch I sent you.
Could you please re-test with this:

Index: dapl/udapl/linux/dapl_osd.h
===================================================================
--- dapl/udapl/linux/dapl_osd.h (revision 8253)
+++ dapl/udapl/linux/dapl_osd.h (working copy)
@@ -80,7 +80,6 @@
 #if defined(__ia64__) || defined(__PPC64__)
 #include
-#include
 #endif
 /* Useful debug definitions */
Index: Makefile.am
===================================================================
--- Makefile.am (revision 8253)
+++ Makefile.am (working copy)
@@ -1,10 +1,11 @@
 # $Id: $
+OSFLAGS = -DOS_RELEASE=$(shell expr `uname -r | cut -f1 -d.` \* 65536 + `uname -r | cut -f2 -d.`)
 # Check for RedHat, needed for ia64 udapl atomic operations (IA64_FETCHADD syntax)
 if OS_RHEL
-OSFLAGS=-DREDHAT_EL4
+OSFLAGS += -DREDHAT_EL4
 else
-OSFLAGS=
+OSFLAGS +=
 endif
 if DEBUG

From trimmer at silverstorm.com Fri Jul 7 15:11:58 2006
From: trimmer at silverstorm.com (Rimmer, Todd)
Date: Fri, 7 Jul 2006 18:11:58 -0400
Subject: [openib-general] user_mad check question
In-Reply-To: <44AECB32.6040300@ichips.intel.com>
Message-ID:

> From: Sean
>
>
> Can sends ever be used with RMPP?

Yes, however no present classes define this, but that does not rule out
future classes. 13.6.2 indicates the method field may be used by an
RMPP class in any way.

>
> > One such class which can use SEND to hold requests and their responses
> > is the BMA. So can the SNMP tunnel and the CM (although CM is not
> > usually done from user space). For the BMA case, a request can be:
> > Class=BM
> > Method=SEND
> > AttributeModifier has BM RESPONSE bit set
>
> Ugh. This makes it really hard to determine if a send is a new operation.

The BM case can be tested for specifically; however, future classes or
vendor-specific classes are tougher.

The approach we took in our proprietary stack was to classify messages
as requests/responses based on presently defined classes (BM was the only
special case needed). We permitted any consumer to issue requests (and
created unique TIDs which helped the stack direct the response back to
the client). For a consumer to get inbound requests or any other
unsolicited messages (such as SEND), it had to register with the stack
for that class, and we only permitted one such consumer to register per
class. This permitted those messages to be directed solely based on
class. In this context a consumer could be kernel or user level. For
example, the CM and PMA were each kernel-level consumers. A user-level
consumer was also possible.

In this context we defined a request as:
    (R bit = 0 && ! TRAP_REPRESS && ! SEND)
    || (Class=BM && SEND && AttributeModifier BM Response bit not set)

We defined a response as:
    ((R bit set || TRAP_REPRESS) && ! SEND)
    || (Class=BM && SEND && AttributeModifier BM Response bit set)

While similar, these are not strictly complements of each other. There
are some messages (such as SEND for the CM) which will be classified as
neither a request nor a response. For such messages the TID cannot be
safely altered, and those messages would be routed only by class to
exactly one consumer so registered.

This approach allowed multiple clients for a given standard class (such
as the BM, PM, etc.) to exist on a given node. However, only a single
agent could run on a node. Vendor-specific classes choosing to use SEND
would be limited to a single client OR agent per node. To date we have
not found any cases where this limitation was an issue.
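
The two definitions above can be sketched as a pair of predicates. This is only an illustration of the rules as stated in this message, not code from any stack: `struct mad_key`, `mad_is_request`, `mad_is_response` and `mad_classify` are hypothetical names, the method and class values are assumed from the InfiniBand management definitions, and the position of the BM Response flag within AttributeModifier (low bit here) is an assumption that should be checked against the specification.

```c
#include <stdint.h>

/* Method and class values assumed from the IB management definitions. */
#define METHOD_SEND         0x03
#define METHOD_TRAP_REPRESS 0x07
#define METHOD_RESP_BIT     0x80   /* the "R bit" */
#define CLASS_BM            0x05
#define CLASS_CM            0x07

/* Assumption: the BM Response flag is the low bit of AttributeModifier. */
#define BM_ATTR_MOD_RESP    0x1u

enum mad_kind { MAD_REQUEST, MAD_RESPONSE, MAD_NEITHER };

/* Only the header fields the rules need; attr_mod is in host byte order. */
struct mad_key {
    uint8_t  mgmt_class;
    uint8_t  method;
    uint32_t attr_mod;
};

int mad_is_bm_send(const struct mad_key *m)
{
    return m->mgmt_class == CLASS_BM && m->method == METHOD_SEND;
}

/* (R bit = 0 && !TRAP_REPRESS && !SEND) || (BM SEND without Response bit) */
int mad_is_request(const struct mad_key *m)
{
    int r_bit = (m->method & METHOD_RESP_BIT) != 0;

    return (!r_bit && m->method != METHOD_TRAP_REPRESS &&
            m->method != METHOD_SEND) ||
           (mad_is_bm_send(m) && !(m->attr_mod & BM_ATTR_MOD_RESP));
}

/* ((R bit set || TRAP_REPRESS) && !SEND) || (BM SEND with Response bit) */
int mad_is_response(const struct mad_key *m)
{
    int r_bit = (m->method & METHOD_RESP_BIT) != 0;

    return ((r_bit || m->method == METHOD_TRAP_REPRESS) &&
            m->method != METHOD_SEND) ||
           (mad_is_bm_send(m) && (m->attr_mod & BM_ATTR_MOD_RESP));
}

/* Anything matching neither rule (e.g. a CM SEND) is routed by class only. */
enum mad_kind mad_classify(const struct mad_key *m)
{
    if (mad_is_request(m))
        return MAD_REQUEST;
    if (mad_is_response(m))
        return MAD_RESPONSE;
    return MAD_NEITHER;
}
```

Keeping the third outcome explicit reflects the point that the two rules are not complements: a MAD classified as `MAD_NEITHER` is the case where the TID is left untouched.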

Since the above algorithm for request/response identification is a
little tricky and subject to possible extension in the future (as new
classes are defined), I would recommend putting this in an inline
function or macro alongside the datatypes for MAD packets.

Todd Rimmer

From ravinandan.arakali at neterion.com Fri Jul 7 15:21:44 2006
From: ravinandan.arakali at neterion.com (Ravinandan Arakali)
Date: Fri, 7 Jul 2006 15:21:44 -0700
Subject: [openib-general] ping problem with ammasso cards(iWARP interface)
In-Reply-To: <44AEC3FD.7050604@in.ibm.com>
Message-ID: <001a01c6a213$bb5efa40$4710100a@pc.s2io.com>

Pradipta,
Following is the output from gdb after the core dump. I have also
copy-pasted the gdb output from the client system.

Attached is the dmesg output when the krping test is run in verbose mode.
The ping data on the sender (client) seems okay. The content is shifted
forward by one character for each packet. On the receiver, after receiving
ping pkt 9, it seems to jump to pkt no. 1935. Not sure if that is because
messages can be lost while writing to /var/log/messages?

-----------------------------------------
(gdb) run -s -vV -C100 -d -a 0.0.0.0 -p 9999
Starting program: /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -s -vV -C100 -d -a 0.0.0.0 -p 9999
[Thread debugging using libthread_db enabled]
[New Thread -1210054992 (LWP 3668)]
ipaddr (0.0.0.0)
port 9999
created cm_id 0x804e6e0
[New Thread -1210057824 (LWP 3671)]
rdma_bind_addr successful
rdma_listen
cma_event type 4 cma_id 0x804e968 (child)
child cma 0x804e968

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1210054992 (LWP 3668)] rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 514 cb->pd = ibv_alloc_pd(cm_id->verbs); (gdb) bt #0 rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 #1 0x0804a716 in main (argc=9, argv=Cannot access memory at address 0x6 ) at examples/rping.c:767 (gdb) --------------------------------- (gdb) run -c -vV -C100 -d -a 17.2.2.102 -p 9999 Starting program: /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -c -vV -C100 -d -a 17.2.2.102 -p 9999 [Thread debugging using libthread_db enabled] [New Thread 47388824908032 (LWP 4620)] ipaddr (17.2.2.102) port 9999 created cm_id 0x506b00 [New Thread 1082132800 (LWP 4623)] cma_event type 0 cma_id 0x506b00 (parent) cma_event type 2 cma_id 0x506b00 (parent) rdma_resolve_addr - rdma_resolve_route successful created pd 0x506e60 created channel 0x506e80 created cq 0x506ea0 created qp 0x506f40 rping_setup_buffers called on cb 0x505010 allocated & registered buffers... [New Thread 1090525504 (LWP 4624)] cq_thread started. -----Original Message----- From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] Sent: Friday, July 07, 2006 1:29 PM To: Ravinandan Arakali Cc: bpradip at in.ibm.com; openib-general at openib.org Subject: Re: [openib-general] ping problem with ammasso cards(iWARP interface) Ravinandan Arakali wrote: > Attached are the outputs when rping and krping are run. > The dmesg output when krping is run is quite verbose. > Hopefully, you can throw some light by looking at it. > The rping server core dumped as soon as the client was run. Ravi, Use the following for krping. You will be able to see the ping data using dmesg. echo "server,port=9999,addr=17.2.2.102,validate,verbose" > /proc/krping echo "client,port=9999,addr=17.2.2.102,validate,verbose" > /proc/krping As for the rping, can u run rping under gdb (gdb librdmacm/example/.libs/rping) and send the backtrace on segfault. 
Thanks,
Pradipta

>
> Thanks,
> Ravi
>
>
>> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: krping_verbose.txt
URL:

From troy at scl.ameslab.gov Fri Jul 7 15:55:57 2006
From: troy at scl.ameslab.gov (Troy Benjegerdes)
Date: Fri, 07 Jul 2006 17:55:57 -0500
Subject: [openib-general] [ANNOUNCE] NetPIPE 3.7 release candidate 1
In-Reply-To: <44AEAC5F.5000602@in.ibm.com>
References: <20060706192549.GB6865@minbar-g5.scl.ameslab.gov> <44AE3644.1000008@in.ibm.com> <44AE9190.3010605@scl.ameslab.gov> <44AE9CE1.2080808@ichips.intel.com> <44AEAC5F.5000602@in.ibm.com>
Message-ID: <44AEE67D.3070705@scl.ameslab.gov>

Pradipta Kumar Banerjee wrote:
> Sean Hefty wrote:
>> Troy Benjegerdes wrote:
>>> Is the connection manager api considered 'stable' yet? Last I knew
>>> it was still undergoing a lot of development.
>>
>> The connection manager API is relatively stable, and has been for a
>> while. Changes to the API have come in the form of new options and
>> calls, rather than changes to the existing calls.
> True. We have been using it for quite some time for all our work on
> iWarp.
>
>>
>>> Wouldn't it be possible to adapt the existing tcp connection setup
>>> code to work for Iwarp without using the connection manager? Most of
>>> the time when I test IB networks, I just bind to ethernet eth0
>>> interfaces anyway.
>>
>> My understanding is that it's not possible to connect iWarp QPs
>> without using the connection manager. For IB, it's possible to
>> connect over sockets and exchange QP information over a TCP
>> connection, but the same is not true for iWarp.
>>
>> - Sean
> Troy, the NetPIPE code needs to use the cm for successful working with
> iwarp. Maybe the changes can be in a separate file altogether (like
> cm.c as in ibv.c) and have a separate option in the Makefile to enable
> the same.
>
>

In the future, I'd kinda like to deprecate the old out-of-band IP QP
connection setup, but it would be nice to separate the code into
something like 'ib-cm.c'.

I would be happy to take a patch to use the CM instead of the IP
connection setup we do now. I'm also interested in how to use the CM for
things like the OpenFabrics verbs port of PVFS we are working on (which
works on IB right now).

From bpradip at in.ibm.com Fri Jul 7 23:19:34 2006
From: bpradip at in.ibm.com (Pradipta Kumar Banerjee)
Date: Sat, 08 Jul 2006 11:49:34 +0530
Subject: [openib-general] ping problem with ammasso cards(iWARP interface)
In-Reply-To: <001a01c6a213$bb5efa40$4710100a@pc.s2io.com>
References: <001a01c6a213$bb5efa40$4710100a@pc.s2io.com>
Message-ID: <44AF4E76.4090502@in.ibm.com>

Ravinandan Arakali wrote:
> Pradipta,
> Following is the output from gdb after core dump. I have also copy-pasted
> the gdb output on client system.
>
> Attached is the dmesg output when krping test is run in verbose mode.
> The ping data on the sender(client) seems okay. The content is shifted
> forward by one character for each packet. On receiver, after receiving ping
> pkt 9, it seems to jump to pkt no. 1935. Not sure if it's because messages
> can be lost during writing to /var/log/messages ?

krping is indeed working! Using 'verbose' allows you to see the ping data.
When not using 'verbose' you see only 'send/recv' messages.

>
> -----------------------------------------
> (gdb) run -s -vV -C100 -d -a 0.0.0.0 -p 9999
> Starting program:
> /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -s -vV -C100 -d -a
> 0.0.0.0 -p 9999
> [Thread debugging using libthread_db enabled]
> [New Thread -1210054992 (LWP 3668)]
> ipaddr (0.0.0.0)
> port 9999
> created cm_id 0x804e6e0
> [New Thread -1210057824 (LWP 3671)]
> rdma_bind_addr successful
> rdma_listen
> cma_event type 4 cma_id 0x804e968 (child)
> child cma 0x804e968
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread -1210054992 (LWP 3668)] > rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 > 514 cb->pd = ibv_alloc_pd(cm_id->verbs); > (gdb) bt > #0 rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 > #1 0x0804a716 in main (argc=9, argv=Cannot access memory at address 0x6 > ) at examples/rping.c:767 > (gdb) > > --------------------------------- > (gdb) run -c -vV -C100 -d -a 17.2.2.102 -p 9999 > Starting program: > /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -c -vV -C100 -d -a > 17.2.2.102 -p 9999 > [Thread debugging using libthread_db enabled] > [New Thread 47388824908032 (LWP 4620)] > ipaddr (17.2.2.102) > port 9999 > created cm_id 0x506b00 > [New Thread 1082132800 (LWP 4623)] > cma_event type 0 cma_id 0x506b00 (parent) > cma_event type 2 cma_id 0x506b00 (parent) > rdma_resolve_addr - rdma_resolve_route successful > created pd 0x506e60 > created channel 0x506e80 > created cq 0x506ea0 > created qp 0x506f40 > rping_setup_buffers called on cb 0x505010 > allocated & registered buffers... > [New Thread 1090525504 (LWP 4624)] > cq_thread started. > > From mst at mellanox.co.il Thu Jul 6 02:28:34 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 6 Jul 2006 12:28:34 +0300 Subject: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bw to allow use of RDMA CM In-Reply-To: <44ACD692.1010802@in.ibm.com> References: <44ACD692.1010802@in.ibm.com> Message-ID: <20060706092834.GA9116@mellanox.co.il> Quoting r. Pradipta Kumar Banerjee : > Subject: Re: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bw to allow use of RDMA CM > > Michael S. Tsirkin wrote: > > Quoting r. Pradipta Kumar Banerjee : > >> IMO using rcq seems to be a generic and better solution. > > > > Hmm, I see. Need to document the message format then. > > We are only pasing the vaddr there, right? > > > > Michael, > Actually 'rcq' is being used for handling the 'start' and 'done' messages. 
> As for the lid, qpn, psn, rkey and vaddr, these get exchanged as part of the
> rdma_listen/rdma_connect calls. See pp_server_connect and pp_client_connect.
> OTOH I tried testing rdma_bw on Ammasso iWARP without exchanging the 'start' and
> 'done' messages and it worked. I am not sure if this is the right thing to do.
> Maybe Steve can throw more light on this.

This makes sense. But why do we need the start message then?

--
MST

From mst at mellanox.co.il Sun Jul 9 02:05:21 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 9 Jul 2006 12:05:21 +0300
Subject: [openib-general] [PATCH] fmr pool: remove unnecessary pointer dereference
Message-ID: <20060709090521.GB2609@mellanox.co.il>

Any objections to the following cleanup patch?

---

ib_fmr_pool_map_phys gets the virtual address by pointer but never writes there,
and users (e.g. srp) seem to assume this and ignore the value returned.

This patch cleans up the API to get the VA by value, and updates all users.

Signed-off-by: Michael S.
Tsirkin diff --git a/include/rdma/ib_fmr_pool.h b/include/rdma/ib_fmr_pool.h index 4ace54c..00dadbf 100644 --- a/include/rdma/ib_fmr_pool.h +++ b/include/rdma/ib_fmr_pool.h @@ -88,7 +88,7 @@ int ib_flush_fmr_pool(struct ib_fmr_pool struct ib_pool_fmr *ib_fmr_pool_map_phys(struct ib_fmr_pool *pool_handle, u64 *page_list, int list_len, - u64 *io_virtual_address); + u64 io_virtual_address); int ib_fmr_pool_unmap(struct ib_pool_fmr *fmr); diff --git a/drivers/infiniband/core/fmr_pool.c b/drivers/infiniband/core/fmr_pool.c index 615fe9c..86a3b2d 100644 --- a/drivers/infiniband/core/fmr_pool.c +++ b/drivers/infiniband/core/fmr_pool.c @@ -426,7 +426,7 @@ EXPORT_SYMBOL(ib_flush_fmr_pool); struct ib_pool_fmr *ib_fmr_pool_map_phys(struct ib_fmr_pool *pool_handle, u64 *page_list, int list_len, - u64 *io_virtual_address) + u64 io_virtual_address) { struct ib_fmr_pool *pool = pool_handle; struct ib_pool_fmr *fmr; @@ -440,7 +440,7 @@ struct ib_pool_fmr *ib_fmr_pool_map_phys fmr = ib_fmr_cache_lookup(pool, page_list, list_len, - *io_virtual_address); + io_virtual_address); if (fmr) { /* found in cache */ ++fmr->ref_count; @@ -464,7 +464,7 @@ struct ib_pool_fmr *ib_fmr_pool_map_phys spin_unlock_irqrestore(&pool->pool_lock, flags); result = ib_map_phys_fmr(fmr->fmr, page_list, list_len, - *io_virtual_address); + io_virtual_address); if (result) { spin_lock_irqsave(&pool->pool_lock, flags); @@ -481,7 +481,7 @@ struct ib_pool_fmr *ib_fmr_pool_map_phys fmr->ref_count = 1; if (pool->cache_bucket) { - fmr->io_virtual_address = *io_virtual_address; + fmr->io_virtual_address = io_virtual_address; fmr->page_list_len = list_len; memcpy(fmr->page_list, page_list, list_len * sizeof(*page_list)); diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c index ff117bb..72febf1 100644 --- a/drivers/infiniband/ulp/iser/iser_verbs.c +++ b/drivers/infiniband/ulp/iser/iser_verbs.c @@ -594,7 +594,7 @@ int iser_reg_page_vec(struct iser_conn mem = 
ib_fmr_pool_map_phys(ib_conn->fmr_pool, page_list, page_vec->length, - &io_addr); + io_addr); if (IS_ERR(mem)) { status = (int)PTR_ERR(mem); diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 4e22afe..9bf182a 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -615,7 +615,7 @@ static int srp_map_fmr(struct srp_device (sg_dma_address(&scat[i]) & dev->fmr_page_mask) + j; req->fmr = ib_fmr_pool_map_phys(dev->fmr_pool, - dma_pages, page_cnt, &io_addr); + dma_pages, page_cnt, io_addr); if (IS_ERR(req->fmr)) { ret = PTR_ERR(req->fmr); goto out; -- MST From dotanb at mellanox.co.il Sun Jul 9 05:16:21 2006 From: dotanb at mellanox.co.il (Dotan Barak) Date: Sun, 9 Jul 2006 15:16:21 +0300 Subject: [openib-general] [ucm] device file of the ucm is not being created In-Reply-To: <44AD44DB.4070201@ichips.intel.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30271BBB0@mtlexch01.mtl.com> <44AD44DB.4070201@ichips.intel.com> Message-ID: <200607091516.21239.dotanb@mellanox.co.il> On Thursday 06 July 2006 20:14, Sean Hefty wrote: > Dotan Barak wrote: > > KERNEL="ucma", NAME="infiniband/%k", MODE="0666" > > KERNEL="rdma_cm", NAME="infiniband/%k", MODE="0666" > > do you know that is the problem? > > The ucma should be in /sys/class/misc/rdma_cm. 
Sometimes, the ucmatose is being executed; when we execute it on 10.4.3.142,
the first test case that uses the rdma_cm fails. Here is a dump from our logs:

Daemon command line: /usr/local/bin/ucmatose
Client command line: /usr/local/bin/ucmatose 11.4.3.142

=======================+ Daemon Output +====================
CMA: unable to open /dev/infiniband/rdma_cm
failed to create event channel

=======================+ Client Output +====================
cmatose: starting client
cmatose: connecting
cmatose: event: 8, error: 8
receiving data transfers
sending replies
data transfers complete
test complete
return status 0

The strange thing is that the next test that tries to use the rdma_cm passes ...

thanks
Dotan

From eitan at mellanox.co.il Sun Jul 9 07:35:47 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Sun, 9 Jul 2006 17:35:47 +0300
Subject: [openib-general] ib_types.h and Win/Linux consolidation
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3027A81C4@mtlexch01.mtl.com>

Hi Sean, Hal, Fab,

I did the obvious diff...
Attached is the results.

To me most of the differences seem trivial to merge.

Eitan Zahavi
Senior Engineering Director, Software Architect
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

> -----Original Message-----
> From: Sean Hefty [mailto:mshefty at ichips.intel.com]
> Sent: Thursday, July 06, 2006 9:32 PM
> To: Eitan Zahavi
> Cc: openib-general at openib.org; openib-windows at openib.org; Erez Cohen
> Subject: Re: [openib-general] ib_types.h and Win/Linux consolidation
>
> Eitan Zahavi wrote:
> > I have been approached by several people asking for where does one
> > gets a header file defining the IBTA "wire" protocol.
>
> I'm not sure that it makes sense to try to have a single header file that defines all wire
> data structures. For example, I don't think that any module outside of the IB CM
> cares about the CM wire formats.
>
> Getting and setting values in wire format is also non-trivial for many of the structures
> definitions. The approach used so far has been to pack/unpack the structures under
> another API, like the sa_query.
>
> > So the point is clear: If we are missing such a complete IBTA H file
> > and people are actually looking for where the wire protocol is being
> > defined why shouldn't we promote ib_types.h to the main include directory?
>
> What wire structures are missing from the include files in include/rdma?
>
> - Sean
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ib_types.win2ofed.diff.tgz
Type: application/x-compressed
Size: 34824 bytes
Desc: ib_types.win2ofed.diff.tgz
URL:

From dotanb at mellanox.co.il Sun Jul 9 08:26:00 2006
From: dotanb at mellanox.co.il (Dotan Barak)
Date: Sun, 9 Jul 2006 18:26:00 +0300
Subject: [openib-general] [ANNOUNCE] NetPIPE 3.7 release candidate 1
In-Reply-To: <20060706192549.GB6865@minbar-g5.scl.ameslab.gov>
References: <20060706192549.GB6865@minbar-g5.scl.ameslab.gov>
Message-ID: <200607091826.01151.dotanb@mellanox.co.il>

Hi.

On Thursday 06 July 2006 22:25, Troy Benjegerdes wrote:
> I am preparing to release an update to the NetPIPE benchmark
> ( http://scl.ameslab.gov/Projects/NetPIPE/NetPIPE.html ), and I would
> very much like to hear some feedback on the OpenIB verbs implementation
> (NPibv), and take any patches to make it build on Windows as well.

I downloaded and executed NetPIPE after compiling it for openib (using the ibv parameter)
over Mellanox HCAs (with OFED 1.0).

When I executed the test with the default parameters, everything was fine.
When I tried to play a little bit with the parameters, I saw several issues:

1) When I tried to change the test type + completion mode, I got a seg fault:

w125 # ./NPibv
Preposting asynchronous receives (required for Infiniband)
NetPIPE: partial read of buffer address in AfterAlignmentInit: Success

w126 tmp]# ./NPibv -h 11.4.3.125 -t send_recv -c event
Using Send/Receive communications
Using VAPI event completion
Preposting asynchronous receives (required for Infiniband)
Segmentation fault

2) When I tried to change the port number, I got a seg fault:

w125 # ./NPibv
Preposting asynchronous receives (required for Infiniband)

w126 # ./NPibv -h 11.4.3.125 -P 2
Segmentation fault

3) In the read completion type there is a "vapi_poll" type.
What is the meaning of this type over the gen2 driver?

thanks
Dotan

From sean.hefty at intel.com Sun Jul 9 22:56:44 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Sun, 9 Jul 2006 22:56:44 -0700
Subject: [openib-general] [Openib-windows] ib_types.h and Win/Linux consolidation
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E3027A81C4@mtlexch01.mtl.com>
Message-ID: <000001c6a3e5$9ffca210$dad8180a@amr.corp.intel.com>

>Hi Sean, Hal, Fab,
>
>I did the obvious diff...
>Attached is the results.
>
>To me most of the differences seem trivial to merge.

Can you please resend as inline text, or at the very least a plain text
attachment?

- Sean

From krkumar2 at in.ibm.com Mon Jul 10 03:06:58 2006
From: krkumar2 at in.ibm.com (Krishna Kumar)
Date: Mon, 10 Jul 2006 15:36:58 +0530
Subject: [openib-general] [RFC] [PATCH 1/7] ibrdmaverbs config files.
Message-ID: <1152526018.27956.9.camel@localhost.localdomain>

diff -ruNp ORG/librdmaverbs/autogen.sh NEW/librdmaverbs/autogen.sh
--- ORG/librdmaverbs/autogen.sh 1969-12-31 16:00:00.000000000 -0800
+++ NEW/librdmaverbs/autogen.sh 2006-07-10 18:07:45.000000000 -0700
@@ -0,0 +1,8 @@
+#!
/bin/sh + +set -x +aclocal -I config +libtoolize --force --copy +autoheader +automake --foreign --add-missing --copy +autoconf diff -ruNp ORG/librdmaverbs/librdmaverbs.spec.in NEW/librdmaverbs/librdmaverbs.spec.in --- ORG/librdmaverbs/librdmaverbs.spec.in 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/librdmaverbs.spec.in 2006-07-10 18:07:46.000000000 -0700 @@ -0,0 +1,106 @@ +# $Id: librdmaverbs.spec.in 7631 2006-06-02 19:53:25Z swise $ + +%define ver @VERSION@ + +Name: librdmaverbs +Version: 1.1 +Release: 0.1.pre1%{?dist} +Summary: A library for direct userspace use of InfiniBand + +Group: System Environment/Libraries +License: GPL/BSD +Url: http://openib.org/ +Source: http://openib.org/downloads/librdmaverbs-1.1-pre1.tar.gz +BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id_u} -n) + +%description +librdmaverbs is a library that allows userspace processes to use +InfiniBand/iWARP "verbs" as described in the InfiniBand Architecture +Specification. This includes direct hardware access for fast path +operations. + +For this library to be useful, a device-specific plug-in module should +also be installed. + +%package devel +Summary: Development files for the librdmaverbs library +Group: System Environment/Libraries + +%description devel +Static libraries and header files for the librdmaverbs verbs library. + +%package utils +Summary: Examples for the librdmaverbs library +Group: System Environment/Libraries +Requires: %{name} = %{version}-%{release} + +%description utils +Useful librdmaverbs1 example programs such as rdma_devinfo, which +displays information about InfiniBand devices. 
+ +%prep +%setup -q -n %{name}-%{ver} + +%build +%configure +make %{?_smp_mflags} + +%install +rm -rf $RPM_BUILD_ROOT +%makeinstall +# remove unpackaged files from the buildroot +rm -f $RPM_BUILD_ROOT%{_libdir}/*.la + +%clean +rm -rf $RPM_BUILD_ROOT + +%post -p /sbin/ldconfig +%postun -p /sbin/ldconfig + +%files +%defattr(-,root,root,-) +%{_libdir}/librdmaverbs*.so.* +%doc AUTHORS COPYING ChangeLog README + +%files devel +%defattr(-,root,root,-) +%{_libdir}/lib*.so +%{_libdir}/*.a +%{_includedir}/* + +%files utils +%defattr(-,root,root,-) +%{_bindir}/* +%{_mandir}/man1/* + +%changelog +* Mon May 22 2006 Roland Dreier - 1.1-0.1.pre1 +- New upstream release +- Remove dependency on libsysfs, since it is no longer used + +* Thu May 4 2006 Roland Dreier - 1.0.4-1 +- New upstream release + +* Mon Mar 14 2006 Roland Dreier - 1.0.3-1 +- New upstream release + +* Mon Mar 13 2006 Roland Dreier - 1.0.1-1 +- New upstream release + +* Thu Feb 16 2006 Roland Dreier - 1.0-1 +- New upstream release + +* Wed Feb 15 2006 Roland Dreier - 1.0-0.5.rc7 +- New upstream release + +* Sun Jan 22 2006 Roland Dreier - 1.0-0.4.rc6 +- New upstream release + +* Tue Oct 25 2005 Roland Dreier - 1.0-0.3.rc5 +- New upstream release + +* Wed Oct 5 2005 Roland Dreier - 1.0-0.2.rc4 +- Update to upstream 1.0-rc4 release + +* Mon Sep 26 2005 Roland Dreier - 1.0-0.1.rc3 +- Initial attempt at Fedora Extras-compliant spec file diff -ruNp ORG/librdmaverbs/configure.in NEW/librdmaverbs/configure.in --- ORG/librdmaverbs/configure.in 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/configure.in 2006-07-10 18:07:46.000000000 -0700 @@ -0,0 +1,37 @@ +dnl Process this file with autoconf to produce a configure script. 
+ +AC_PREREQ(2.57) +AC_INIT(librdmaverbs, 1.1-pre1, openib-general at openib.org) +AC_CONFIG_SRCDIR([src/rdmaverbs.h]) +AC_CONFIG_AUX_DIR(config) +AM_CONFIG_HEADER(config.h) +AM_INIT_AUTOMAKE(librdmaverbs, 1.1-pre1) + +AM_PROG_LIBTOOL + +dnl Checks for programs +AC_PROG_CC + +dnl Checks for libraries +AC_CHECK_LIB(dl, dlsym, [], + AC_MSG_ERROR([dlsym() not found. librdmaverbs requires libdl.])) +AC_CHECK_LIB(pthread, pthread_mutex_init, [], + AC_MSG_ERROR([pthread_mutex_init() not found. librdmaverbs requires libpthread.])) + +dnl Checks for header files. +AC_HEADER_STDC + +dnl Checks for typedefs, structures, and compiler characteristics. +AC_C_CONST + +AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script, + if test -n "`$LD --help < /dev/null 2>/dev/null | grep version-script`"; then + ac_cv_version_script=yes + else + ac_cv_version_script=no + fi) + +AM_CONDITIONAL(HAVE_LD_VERSION_SCRIPT, test "$ac_cv_version_script" = "yes") + +AC_CONFIG_FILES([Makefile librdmaverbs.spec]) +AC_OUTPUT From halr at voltaire.com Mon Jul 10 03:07:34 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 10 Jul 2006 06:07:34 -0400 Subject: [openib-general] [PATCH] OpenSM/osm_vendor_sa_api.h: Add MCMemberRecord query result support Message-ID: <1152457325.4508.16395.camel@hal.voltaire.com> OpenSM/osm_vendor_sa_api.h: Add MCMemberRecord query result support Signed-off-by: Hal Rosenstock Index: osm/include/vendor/osm_vendor_sa_api.h =================================================================== --- osm/include/vendor/osm_vendor_sa_api.h (revision 8462) +++ osm/include/vendor/osm_vendor_sa_api.h (working copy) @@ -609,6 +609,47 @@ osmv_get_query_svc_rec( * osmv_query_res_t, osm_madw_t, osmv_get_query_result, ib_service_record_t *****/ +/****f* OpenSM Vendor SA Client/osmv_get_query_mc_rec +* NAME +* osmv_get_query_mc_rec +* +* DESCRIPTION +* Retrieves a multicast record result from a MAD returned by a call to +* osmv_query_sa(). 
+* +* SYNOPSIS +*/ +static inline ib_member_rec_t* +osmv_get_query_mc_rec( + IN osm_madw_t *p_result_madw, + IN uint32_t result_index ) +{ + ib_sa_mad_t *p_sa_mad; + + CL_ASSERT( p_result_madw ); + p_sa_mad = (ib_sa_mad_t*)osm_madw_get_mad_ptr( p_result_madw ); + CL_ASSERT( p_sa_mad && p_sa_mad->attr_id == IB_MAD_ATTR_MCMEMBER_RECORD ); + + return( (ib_member_rec_t*)osmv_get_query_result( p_result_madw, + result_index ) ); +} +/* +* PARAMETERS +* p_result_madw +* [in] This is a reference to the MAD returned as a result of the +* query. +* +* result_index +* [in] A zero-based index indicating which result to return. +* +* NOTES +* This call returns a pointer to the start of a service record result from +* a call to osmv_query_sa(). +* +* SEE ALSO +* osmv_query_res_t, osm_madw_t, osmv_get_query_result, ib_member_rec_t +*****/ + /****f* OpenSM Vendor SA Client/osmv_pfn_query_cb_t * NAME * osmv_pfn_query_cb_t From halr at voltaire.com Mon Jul 10 03:07:38 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 10 Jul 2006 06:07:38 -0400 Subject: [openib-general] [PATCH] diags/saquery: Add support to query multicast groups Message-ID: <1152457330.4508.16397.camel@hal.voltaire.com> diags/saquery: Add support to query multicast groups Signed-off-by: Hal Rosenstock Index: diags/src/saquery.c =================================================================== --- diags/src/saquery.c (revision 8461) +++ diags/src/saquery.c (working copy) @@ -181,6 +181,26 @@ print_path_record(ib_path_rec_t *p_pr) } static void +print_multicast_group_record(ib_member_rec_t *p_mcmr) +{ + printf("\nMCMemberRecord group dump:\n" + "\t\t\t\tMGID....................0x%016" PRIx64 " : " + "0x%016" PRIx64 "\n" + "\t\t\t\tMlid....................0x%X\n" + "\t\t\t\tMtu.....................0x%X\n" + "\t\t\t\tpkey....................0x%X\n" + "\t\t\t\tRate....................0x%X\n" + "", + cl_ntoh64( p_mcmr->mgid.unicast.prefix ), + cl_ntoh64( p_mcmr->mgid.unicast.interface_id ), + cl_ntoh16( 
p_mcmr->mlid ), + p_mcmr->mtu, + cl_ntoh16( p_mcmr->pkey ), + p_mcmr->rate + ); +} + +static void return_mad(void) { /* @@ -276,6 +296,26 @@ print_path_records(osm_bind_handle_t bin return (status); } +static ib_api_status_t +print_multicast_group_records(osm_bind_handle_t bind_handle) +{ + int i = 0; + ib_member_rec_t *mcast_record = NULL; + ib_net16_t attr_offset = ib_get_attr_offset(sizeof(*mcast_record)); + ib_api_status_t status; + + status = get_all_records(bind_handle, IB_MAD_ATTR_MCMEMBER_RECORD, attr_offset); + if (status != IB_SUCCESS) + return (status); + + for (i = 0; i < result.result_cnt; i++) { + mcast_record = osmv_get_query_mc_rec(result.p_result_madw, i); + print_multicast_group_record(mcast_record); + } + return_mad(); + return (status); +} + static osm_bind_handle_t get_bind_handle(void) { @@ -349,6 +389,7 @@ usage(void) fprintf(stderr, " -N get NodeRecord info\n"); fprintf(stderr, " -L Return just the Lid of the name specified\n"); fprintf(stderr, " -G Return just the Guid of the name specified\n"); + fprintf(stderr, " -g get multicast group info\n"); exit(-1); } @@ -358,12 +399,13 @@ main(int argc, char **argv) int ch = 0; osm_bind_handle_t bind_handle; - static char const str_opts[] = "PNLGdh"; + static char const str_opts[] = "PNLGgdh"; static const struct option long_opts [] = { {"P", 0, 0, 'P'}, {"N", 0, 0, 'N'}, {"L", 0, 0, 'L'}, {"G", 0, 0, 'G'}, + {"g", 0, 0, 'g'}, {"d", 0, 0, 'd'}, {"help", 0, 0, 'h'}, { } @@ -385,6 +427,9 @@ main(int argc, char **argv) case 'G': node_print_desc = GUID_ONLY; break; + case 'g': + query_type = IB_MAD_ATTR_MCMEMBER_RECORD; + break; case 'd': osm_debug = 1; break; @@ -408,6 +453,9 @@ main(int argc, char **argv) case IB_MAD_ATTR_PATH_RECORD: print_path_records(bind_handle); break; + case IB_MAD_ATTR_MCMEMBER_RECORD: + print_multicast_group_records(bind_handle); + break; default: fprintf(stderr, "Unknown query type %d\n", query_type); break; From halr at voltaire.com Mon Jul 10 03:07:43 2006 From: halr at 
voltaire.com (Hal Rosenstock) Date: 10 Jul 2006 06:07:43 -0400 Subject: [openib-general] [PATCH] OpenSM/SA client API: Support attribute modifer in user defined requests Message-ID: <1152457335.4508.16399.camel@hal.voltaire.com> OpenSM/SA client API: Support attribute modifer in user defined requests Signed-off-by: Hal Rosenstock Index: include/vendor/osm_vendor_sa_api.h =================================================================== --- include/vendor/osm_vendor_sa_api.h (revision 8461) +++ include/vendor/osm_vendor_sa_api.h (working copy) @@ -192,6 +192,7 @@ typedef struct _osmv_user_query uint8_t method; ib_net16_t attr_id; ib_net16_t attr_offset; + ib_net32_t attr_mod; ib_net64_t comp_mask; void *p_attr; } osmv_user_query_t; @@ -209,6 +210,9 @@ typedef struct _osmv_user_query * this value by passing in the sizeof( attribute ) into the * ib_get_attr_offset() routine. * +* attr_mod +* Attribute modifier for query request. +* * comp_mask * Indicates the attribute components that are specified for the * query. Index: libvendor/osm_vendor_mlx_sa.c =================================================================== --- libvendor/osm_vendor_mlx_sa.c (revision 8461) +++ libvendor/osm_vendor_mlx_sa.c (working copy) @@ -452,6 +452,7 @@ typedef struct _osmv_sa_mad_data uint8_t method; ib_net16_t attr_id; ib_net16_t attr_offset; + ib_net32_t attr_mod; ib_net64_t comp_mask; void *p_attr; } osmv_sa_mad_data_t; @@ -465,6 +466,9 @@ typedef struct _osmv_sa_mad_data * attr_offset * Offset as defined by RMPP * + * attr_mod + * Attribute modifier + * * comp_mask * The component mask of the query * @@ -548,7 +552,7 @@ __osmv_send_sa_req( p_sa_mad_data->method, /* method */ cl_hton64( ( uint64_t ) trans_id ),/* tid */ p_sa_mad_data->attr_id, /* attr id */ - 0 /* attr mod */ + p_sa_mad_data->attr_mod /* attr mod */ ); /* Set the query information. */ @@ -640,6 +644,7 @@ osmv_query_sa( /* Set the request information. 
*/ sa_mad_data.method = IB_MAD_METHOD_GETTABLE; + sa_mad_data.attr_mod = 0; /* Set the MAD attributes and component mask correctly. */ switch ( p_query_req->query_type ) @@ -652,6 +657,7 @@ osmv_query_sa( if (p_user_query->method) sa_mad_data.method = p_user_query->method; sa_mad_data.attr_offset = p_user_query->attr_offset; sa_mad_data.attr_id = p_user_query->attr_id; + sa_mad_data.attr_mod = p_user_query->attr_mod; sa_mad_data.comp_mask = p_user_query->comp_mask; sa_mad_data.p_attr = p_user_query->p_attr; break; Index: libvendor/osm_vendor_ibumad_sa.c =================================================================== --- libvendor/osm_vendor_ibumad_sa.c (revision 8461) +++ libvendor/osm_vendor_ibumad_sa.c (working copy) @@ -444,6 +444,7 @@ typedef struct _osmv_sa_mad_data uint8_t method; ib_net16_t attr_id; ib_net16_t attr_offset; + ib_net32_t attr_mod; ib_net64_t comp_mask; void *p_attr; } osmv_sa_mad_data_t; @@ -457,6 +458,9 @@ typedef struct _osmv_sa_mad_data * attr_offset * Offset as defined by RMPP * + * attr_mod + * Attribute modifier + * * comp_mask * The component mask of the query * @@ -540,7 +544,7 @@ __osmv_send_sa_req( p_sa_mad_data->method, /* method */ cl_hton64( ( uint64_t ) trans_id ),/* tid */ p_sa_mad_data->attr_id, /* attr id */ - 0 /* attr mod */ + p_sa_mad_data->attr_mod /* attr mod */ ); /* Set the query information. */ @@ -635,6 +639,7 @@ osmv_query_sa( /* Set the request information. */ sa_mad_data.method = IB_MAD_METHOD_GETTABLE; + sa_mad_data.attr_mod = 0; /* Set the MAD attributes and component mask correctly. 
*/ switch ( p_query_req->query_type ) @@ -648,6 +653,7 @@ osmv_query_sa( sa_mad_data.method = p_user_query->method; sa_mad_data.attr_offset = p_user_query->attr_offset; sa_mad_data.attr_id = p_user_query->attr_id; + sa_mad_data.attr_mod = p_user_query->attr_mod; sa_mad_data.comp_mask = p_user_query->comp_mask; sa_mad_data.p_attr = p_user_query->p_attr; break; Index: libvendor/libosmvendor.ver =================================================================== --- libvendor/libosmvendor.ver (revision 8461) +++ libvendor/libosmvendor.ver (working copy) @@ -6,4 +6,4 @@ # API_REV - advance on any added API # RUNNING_REV - advance any change to the vendor files # AGE - number of backward versions the API still supports -LIBVERSION=1:0:0 +LIBVERSION=2:0:0 From halr at voltaire.com Mon Jul 10 03:07:46 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 10 Jul 2006 06:07:46 -0400 Subject: [openib-general] [PATCH] diags/saquery: Add SA PortInfoRecord:CapabilityMask IsSM query Message-ID: <1152457341.4508.16401.camel@hal.voltaire.com> diags/saquery: Add SA PortInfoRecord:CapabilityMask IsSM query This patch is predicated on SA client API change for user defined queries supporting attribute modifier. 
Signed-off-by: Hal Rosenstock Index: src/saquery.c =================================================================== --- src/saquery.c (revision 8461) +++ src/saquery.c (working copy) @@ -181,6 +181,26 @@ print_path_record(ib_path_rec_t *p_pr) } static void +print_portinfo_record(ib_portinfo_record_t *p_pir) +{ + const ib_port_info_t * const p_pi = &p_pir->port_info; + + printf("\nPortInfoRecord dump:\n" + "\t\t\t\tEndPortLid..............0x%X\n" + "\t\t\t\tPortNum.................0x%X\n" + "\t\t\t\tbase_lid................0x%X\n" + "\t\t\t\tmaster_sm_base_lid......0x%X\n" + "\t\t\t\tcapability_mask.........0x%X\n" + "", + cl_ntoh16(p_pir->lid), + p_pir->port_num, + cl_ntoh16( p_pi->base_lid ), + cl_ntoh16( p_pi->master_sm_base_lid ), + cl_ntoh32( p_pi->capability_mask ) + ); +} + +static void return_mad(void) { /* @@ -233,6 +253,51 @@ get_all_records(osm_bind_handle_t bind_h return (status); } +/* + * Get the portinfo records available with IsSM CapabilityMask bit on. + */ +static ib_api_status_t +get_issm_records(osm_bind_handle_t bind_handle) +{ + ib_api_status_t status; + osmv_query_req_t req; + osmv_user_query_t user; + ib_portinfo_record_t attr; + + memset( &req, 0, sizeof( req ) ); + memset( &user, 0, sizeof( user ) ); + memset( &attr, 0, sizeof ( attr ) ); + attr.port_info.capability_mask = IB_PORT_CAP_IS_SM; + + user.attr_id = IB_MAD_ATTR_PORTINFO_RECORD; + user.attr_offset = ib_get_attr_offset(sizeof(ib_portinfo_record_t)); + user.attr_mod = cl_ntoh32(1 << 31); /* enhanced query */ + user.comp_mask = IB_PIR_COMPMASK_CAPMASK; + user.p_attr = &attr; + + req.query_type = OSMV_QUERY_USER_DEFINED; + req.timeout_ms = 100; + req.retry_cnt = 1; + req.flags = OSM_SA_FLAGS_SYNC; + req.query_context = NULL; + req.pfn_query_cb = query_res_cb; + req.p_query_input = &user; + req.sm_key = 0; + + if ((status = osmv_query_sa(bind_handle, &req)) != IB_SUCCESS) { + fprintf(stderr, "Query SA failed: %s\n", + ib_get_err_str(status)); + return (status); + } + + if 
(result.status != IB_SUCCESS) { + fprintf(stderr, "Query result returned: %s\n", + ib_get_err_str(result.status)); + return (result.status); + } + return (status); +} + static ib_api_status_t print_node_records(osm_bind_handle_t bind_handle) { @@ -276,6 +341,25 @@ print_path_records(osm_bind_handle_t bin return (status); } +ib_api_status_t +print_portinfo_records(osm_bind_handle_t bind_handle) +{ + int i = 0; + ib_portinfo_record_t *portinfo_record = NULL; + ib_api_status_t status; + + status = get_issm_records(bind_handle); + if (status != IB_SUCCESS) + return (status); + + for (i = 0; i < result.result_cnt; i++) { + portinfo_record = osmv_get_query_portinfo_rec(result.p_result_madw, i); + print_portinfo_record(portinfo_record); + } + return_mad(); + return (status); +} + static osm_bind_handle_t get_bind_handle(void) { @@ -342,13 +426,14 @@ clean_up(void) static void usage(void) { - fprintf(stderr, "Usage: %s [-h -d -P -N -L -G][]\n", argv0); + fprintf(stderr, "Usage: %s [-h -d -P -N -L -G -s][]\n", argv0); fprintf(stderr, " Queries node records by default\n"); fprintf(stderr, " -d enable debugging\n"); fprintf(stderr, " -P get PathRecord info\n"); fprintf(stderr, " -N get NodeRecord info\n"); fprintf(stderr, " -L Return just the Lid of the name specified\n"); fprintf(stderr, " -G Return just the Guid of the name specified\n"); + fprintf(stderr, " -s Return the PortInfoRecords with isSM capability mask bit on\n"); exit(-1); } @@ -358,12 +443,13 @@ main(int argc, char **argv) int ch = 0; osm_bind_handle_t bind_handle; - static char const str_opts[] = "PNLGdh"; + static char const str_opts[] = "PNLGsdh"; static const struct option long_opts [] = { {"P", 0, 0, 'P'}, {"N", 0, 0, 'N'}, {"L", 0, 0, 'L'}, {"G", 0, 0, 'G'}, + {"s", 0, 0, 's'}, {"d", 0, 0, 'd'}, {"help", 0, 0, 'h'}, { } @@ -385,6 +471,9 @@ main(int argc, char **argv) case 'G': node_print_desc = GUID_ONLY; break; + case 's': + query_type = IB_MAD_ATTR_PORTINFO_RECORD; + break; case 'd': osm_debug = 1; 
break; @@ -408,6 +497,9 @@ main(int argc, char **argv) case IB_MAD_ATTR_PATH_RECORD: print_path_records(bind_handle); break; + case IB_MAD_ATTR_PORTINFO_RECORD: + print_portinfo_records(bind_handle); + break; default: fprintf(stderr, "Unknown query type %d\n", query_type); break; From krkumar2 at in.ibm.com Mon Jul 10 03:12:34 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Mon, 10 Jul 2006 15:42:34 +0530 Subject: [openib-general] [RFC] [PATCH 3/7] ibrdmaverbs hdr files Message-ID: <1152526354.28026.1.camel@localhost.localdomain> diff -ruNp ORG/librdmaverbs/include/rdma/arch.h NEW/librdmaverbs/include/rdma/arch.h --- ORG/librdmaverbs/include/rdma/arch.h 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/include/rdma/arch.h 2006-07-10 18:07:46.000000000 -0700 @@ -0,0 +1,95 @@ +/* + * Copyright (c) 2005 Topspin Communications. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id: arch.h 6987 2006-05-08 15:18:51Z tom $ + */ + +#ifndef INFINIBAND_ARCH_H +#define INFINIBAND_ARCH_H + +#include +#include + +#if __BYTE_ORDER == __LITTLE_ENDIAN +static inline uint64_t htonll(uint64_t x) { return bswap_64(x); } +static inline uint64_t ntohll(uint64_t x) { return bswap_64(x); } +#elif __BYTE_ORDER == __BIG_ENDIAN +static inline uint64_t htonll(uint64_t x) { return x; } +static inline uint64_t ntohll(uint64_t x) { return x; } +#else +#error __BYTE_ORDER is neither __LITTLE_ENDIAN nor __BIG_ENDIAN +#endif + +/* + * Architecture-specific defines. Currently, an architecture is + * required to implement the following operations: + * + * mb() - memory barrier. No loads or stores may be reordered across + * this macro by either the compiler or the CPU. + */ + +#if defined(__i386__) + +#define mb() asm volatile("" ::: "memory") + +#elif defined(__x86_64__) + +#define mb() asm volatile("" ::: "memory") + +#elif defined(__PPC64__) + +#define mb() asm volatile("sync" ::: "memory") + +#elif defined(__ia64__) + +#define mb() asm volatile("mf" ::: "memory") + +#elif defined(__PPC__) + +#define mb() asm volatile("sync" ::: "memory") + +#elif defined(__sparc_v9__) + +#define mb() asm volatile("membar #LoadLoad | #LoadStore | #StoreStore | #StoreLoad" ::: "memory") + +#elif defined(__sparc__) + +#define mb() asm volatile("sync" ::: "memory") + +#else + +#warning No architecture specific defines found. Using generic implementation. 
+ +#define mb() asm volatile("" ::: "memory") + +#endif + +#endif /* INFINIBAND_ARCH_H */ diff -ruNp ORG/librdmaverbs/include/rdma/driver.h NEW/librdmaverbs/include/rdma/driver.h --- ORG/librdmaverbs/include/rdma/driver.h 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/include/rdma/driver.h 2006-07-10 18:07:46.000000000 -0700 @@ -0,0 +1,146 @@ +/* + * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved. + * Copyright (c) 2005, 2006 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2005 PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + * $Id: driver.h 7631 2006-06-02 19:53:25Z swise $ + */ + +#ifndef INFINIBAND_DRIVER_H +#define INFINIBAND_DRIVER_H + +#include +#include + +#ifdef __cplusplus +# define BEGIN_C_DECLS extern "C" { +# define END_C_DECLS } +#else /* !__cplusplus */ +# define BEGIN_C_DECLS +# define END_C_DECLS +#endif /* __cplusplus */ + +/* + * Device-specific drivers should declare their device init function + * as below (the name must be "openib_driver_init"): + * + * struct rdma_device *rdma_driver_init(const char *uverbs_sys_path, + * int abi_version); + * + * librdmaverbs will call each driver's rdma_driver_init() function once + * for each InfiniBand device. If the device is one that the driver + * can support, it should return a struct rdma_device * with the ops + * member filled in. If the driver does not support the device, it + * should return NULL from openib_driver_init(). + */ + +typedef struct rdma_device *(*rdma_driver_init_func)(const char *, int); + +int rdma_cmd_get_context(struct rdma_context *context, struct rdma_get_context *cmd, + size_t cmd_size, struct rdma_get_context_resp *resp, + size_t resp_size); +int rdma_cmd_query_device(struct rdma_context *context, + struct rdma_device_attr *device_attr, + uint64_t *raw_fw_ver, + struct rdma_query_device *cmd, size_t cmd_size); +int rdma_cmd_query_port(struct rdma_context *context, uint8_t port_num, + struct rdma_port_attr *port_attr, + struct rdma_query_port *cmd, size_t cmd_size); +int rdma_cmd_query_gid(struct rdma_context *context, uint8_t port_num, + int index, union rdma_gid *gid); +int rdma_cmd_query_pkey(struct rdma_context *context, uint8_t port_num, + int index, uint16_t *pkey); +int rdma_cmd_alloc_pd(struct rdma_context *context, struct rdma_pd *pd, + struct rdma_alloc_pd *cmd, size_t cmd_size, + struct rdma_alloc_pd_resp *resp, size_t resp_size); +int rdma_cmd_dealloc_pd(struct rdma_pd *pd); +int rdma_cmd_reg_mr(struct rdma_pd *pd, void *addr, size_t length, + uint64_t hca_va, enum rdma_access_flags 
access, + struct rdma_mr *mr, struct rdma_reg_mr *cmd, + size_t cmd_size); +int rdma_cmd_dereg_mr(struct rdma_mr *mr); +int rdma_cmd_create_cq(struct rdma_context *context, int cqe, + struct rdma_comp_channel *channel, + int comp_vector, struct rdma_cq *cq, + struct rdma_create_cq *cmd, size_t cmd_size, + struct rdma_create_cq_resp *resp, size_t resp_size); +int rdma_cmd_poll_cq(struct rdma_cq *cq, int ne, struct rdma_wc *wc); +int rdma_cmd_req_notify_cq(struct rdma_cq *cq, int solicited_only); +int rdma_cmd_resize_cq(struct rdma_cq *cq, int cqe, + struct rdma_resize_cq *cmd, size_t cmd_size); +int rdma_cmd_destroy_cq(struct rdma_cq *cq); + +int rdma_cmd_create_srq(struct rdma_pd *pd, + struct rdma_srq *srq, struct rdma_srq_init_attr *attr, + struct rdma_create_srq *cmd, size_t cmd_size, + struct rdma_create_srq_resp *resp, size_t resp_size); +int rdma_cmd_modify_srq(struct rdma_srq *srq, + struct rdma_srq_attr *srq_attr, + enum rdma_srq_attr_mask srq_attr_mask, + struct rdma_modify_srq *cmd, size_t cmd_size); +int rdma_cmd_query_srq(struct rdma_srq *srq, + struct rdma_srq_attr *srq_attr, + struct rdma_query_srq *cmd, size_t cmd_size); +int rdma_cmd_destroy_srq(struct rdma_srq *srq); + +int rdma_cmd_create_qp(struct rdma_pd *pd, + struct rdma_qp *qp, struct rdma_qp_init_attr *attr, + struct rdma_create_qp *cmd, size_t cmd_size, + struct rdma_create_qp_resp *resp, size_t resp_size); +int rdma_cmd_query_qp(struct rdma_qp *qp, struct rdma_qp_attr *qp_attr, + enum rdma_qp_attr_mask attr_mask, + struct rdma_qp_init_attr *qp_init_attr, + struct rdma_query_qp *cmd, size_t cmd_size); +int rdma_cmd_modify_qp(struct rdma_qp *qp, struct rdma_qp_attr *attr, + enum rdma_qp_attr_mask attr_mask, + struct rdma_modify_qp *cmd, size_t cmd_size); +int rdma_cmd_destroy_qp(struct rdma_qp *qp); +int rdma_cmd_post_send(struct rdma_qp *ibqp, struct rdma_send_wr *wr, + struct rdma_send_wr **bad_wr); +int rdma_cmd_post_recv(struct rdma_qp *ibqp, struct rdma_recv_wr *wr, + struct 
rdma_recv_wr **bad_wr); +int rdma_cmd_post_srq_recv(struct rdma_srq *srq, struct rdma_recv_wr *wr, + struct rdma_recv_wr **bad_wr); +int rdma_cmd_create_ah(struct rdma_pd *pd, struct rdma_ah *ah, + struct rdma_ah_attr *attr); +int rdma_cmd_destroy_ah(struct rdma_ah *ah); +int rdma_cmd_attach_mcast(struct rdma_qp *qp, union rdma_gid *gid, uint16_t lid); +int rdma_cmd_detach_mcast(struct rdma_qp *qp, union rdma_gid *gid, uint16_t lid); + +/* + * sysfs helper functions + */ +const char *rdma_get_sysfs_path(void); + +int rdma_read_sysfs_file(const char *dir, const char *file, + char *buf, size_t size); + +#endif /* INFINIBAND_DRIVER_H */ diff -ruNp ORG/librdmaverbs/include/rdma/kern-abi.h NEW/librdmaverbs/include/rdma/kern-abi.h --- ORG/librdmaverbs/include/rdma/kern-abi.h 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/include/rdma/kern-abi.h 2006-07-10 18:07:46.000000000 -0700 @@ -0,0 +1,881 @@ +/* + * Copyright (c) 2005 Topspin Communications. All rights reserved. + * Copyright (c) 2005, 2006 Cisco Systems. All rights reserved. + * Copyright (c) 2005 PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id: kern-abi.h 5640 2006-03-06 21:36:57Z tom $ + */ + +#ifndef KERN_ABI_H +#define KERN_ABI_H + +#include + +/* + * This file must be kept in sync with the kernel's version of + * drivers/infiniband/include/ib_user_verbs.h + */ + +/* + * The minimum and maximum kernel ABI that we can handle. + */ +#define RDMA_USER_VERBS_MIN_ABI_VERSION 1 +#define RDMA_USER_VERBS_MAX_ABI_VERSION 6 + +enum { + RDMA_USER_VERBS_CMD_GET_CONTEXT, + RDMA_USER_VERBS_CMD_QUERY_DEVICE, + RDMA_USER_VERBS_CMD_QUERY_PORT, + RDMA_USER_VERBS_CMD_ALLOC_PD, + RDMA_USER_VERBS_CMD_DEALLOC_PD, + RDMA_USER_VERBS_CMD_CREATE_AH, + RDMA_USER_VERBS_CMD_MODIFY_AH, + RDMA_USER_VERBS_CMD_QUERY_AH, + RDMA_USER_VERBS_CMD_DESTROY_AH, + RDMA_USER_VERBS_CMD_REG_MR, + RDMA_USER_VERBS_CMD_REG_SMR, + RDMA_USER_VERBS_CMD_REREG_MR, + RDMA_USER_VERBS_CMD_QUERY_MR, + RDMA_USER_VERBS_CMD_DEREG_MR, + RDMA_USER_VERBS_CMD_ALLOC_MW, + RDMA_USER_VERBS_CMD_BIND_MW, + RDMA_USER_VERBS_CMD_DEALLOC_MW, + RDMA_USER_VERBS_CMD_CREATE_COMP_CHANNEL, + RDMA_USER_VERBS_CMD_CREATE_CQ, + RDMA_USER_VERBS_CMD_RESIZE_CQ, + RDMA_USER_VERBS_CMD_DESTROY_CQ, + RDMA_USER_VERBS_CMD_POLL_CQ, + RDMA_USER_VERBS_CMD_PEEK_CQ, + RDMA_USER_VERBS_CMD_REQ_NOTIFY_CQ, + RDMA_USER_VERBS_CMD_CREATE_QP, + RDMA_USER_VERBS_CMD_QUERY_QP, + RDMA_USER_VERBS_CMD_MODIFY_QP, + RDMA_USER_VERBS_CMD_DESTROY_QP, + RDMA_USER_VERBS_CMD_POST_SEND, + RDMA_USER_VERBS_CMD_POST_RECV, + RDMA_USER_VERBS_CMD_ATTACH_MCAST, + RDMA_USER_VERBS_CMD_DETACH_MCAST, + 
RDMA_USER_VERBS_CMD_CREATE_SRQ, + RDMA_USER_VERBS_CMD_MODIFY_SRQ, + RDMA_USER_VERBS_CMD_QUERY_SRQ, + RDMA_USER_VERBS_CMD_DESTROY_SRQ, + RDMA_USER_VERBS_CMD_POST_SRQ_RECV +}; + +/* + * Make sure that all structs defined in this file remain laid out so + * that they pack the same way on 32-bit and 64-bit architectures (to + * avoid incompatibility between 32-bit userspace and 64-bit kernels). + * Specifically: + * - Do not use pointer types -- pass pointers in __u64 instead. + * - Make sure that any structure larger than 4 bytes is padded to a + * multiple of 8 bytes. Otherwise the structure size will be + * different between 32-bit and 64-bit architectures. + */ + +struct rdma_kern_async_event { + __u64 element; + __u32 event_type; + __u32 reserved; +}; + +struct rdma_comp_event { + __u64 cq_handle; +}; + +/* + * All commands from userspace should start with a __u32 command field + * followed by __u16 in_words and out_words fields (which give the + * length of the command block and response buffer if any in 32-bit + * words). The kernel driver will read these fields first and read + * the rest of the command struct based on these values.
+ */ + +struct rdma_query_params { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; +}; + +struct rdma_query_params_resp { + __u32 num_cq_events; +}; + +struct rdma_get_context { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u64 driver_data[0]; +}; + +struct rdma_get_context_resp { + __u32 async_fd; + __u32 num_comp_vectors; +}; + +struct rdma_query_device { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u64 driver_data[0]; +}; + +struct rdma_query_device_resp { + __u64 fw_ver; + __u64 node_guid; + __u64 sys_image_guid; + __u64 max_mr_size; + __u64 page_size_cap; + __u32 vendor_id; + __u32 vendor_part_id; + __u32 hw_ver; + __u32 max_qp; + __u32 max_qp_wr; + __u32 device_cap_flags; + __u32 max_sge; + __u32 max_sge_rd; + __u32 max_cq; + __u32 max_cqe; + __u32 max_mr; + __u32 max_pd; + __u32 max_qp_rd_atom; + __u32 max_ee_rd_atom; + __u32 max_res_rd_atom; + __u32 max_qp_init_rd_atom; + __u32 max_ee_init_rd_atom; + __u32 atomic_cap; + __u32 max_ee; + __u32 max_rdd; + __u32 max_mw; + __u32 max_raw_ipv6_qp; + __u32 max_raw_ethy_qp; + __u32 max_mcast_grp; + __u32 max_mcast_qp_attach; + __u32 max_total_mcast_qp_attach; + __u32 max_ah; + __u32 max_fmr; + __u32 max_map_per_fmr; + __u32 max_srq; + __u32 max_srq_wr; + __u32 max_srq_sge; + __u16 max_pkeys; + __u8 local_ca_ack_delay; + __u8 phys_port_cnt; + __u8 reserved[4]; +}; + +struct rdma_query_port { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u8 port_num; + __u8 reserved[7]; + __u64 driver_data[0]; +}; + +struct rdma_query_port_resp { + __u32 port_cap_flags; + __u32 max_msg_sz; + __u32 bad_pkey_cntr; + __u32 qkey_viol_cntr; + __u32 gid_tbl_len; + __u16 pkey_tbl_len; + __u16 lid; + __u16 sm_lid; + __u8 state; + __u8 max_mtu; + __u8 active_mtu; + __u8 lmc; + __u8 max_vl_num; + __u8 sm_sl; + __u8 subnet_timeout; + __u8 init_type_reply; + __u8 active_width; + __u8 active_speed; + __u8 phys_state; + __u8 
reserved[3]; +}; + +struct rdma_alloc_pd { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u64 driver_data[0]; +}; + +struct rdma_alloc_pd_resp { + __u32 pd_handle; +}; + +struct rdma_dealloc_pd { + __u32 command; + __u16 in_words; + __u16 out_words; + __u32 pd_handle; +}; + +struct rdma_reg_mr { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u64 start; + __u64 length; + __u64 hca_va; + __u32 pd_handle; + __u32 access_flags; + __u64 driver_data[0]; +}; + +struct rdma_reg_mr_resp { + __u32 mr_handle; + __u32 lkey; + __u32 rkey; +}; + +struct rdma_dereg_mr { + __u32 command; + __u16 in_words; + __u16 out_words; + __u32 mr_handle; +}; + +struct rdma_create_comp_channel { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; +}; + +struct rdma_create_comp_channel_resp { + __u32 fd; +}; + +struct rdma_create_cq { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u64 user_handle; + __u32 cqe; + __u32 comp_vector; + __s32 comp_channel; + __u32 reserved; + __u64 driver_data[0]; +}; + +struct rdma_create_cq_resp { + __u32 cq_handle; + __u32 cqe; +}; + +struct rdma_kern_wc { + __u64 wr_id; + __u32 status; + __u32 opcode; + __u32 vendor_err; + __u32 byte_len; + __u32 imm_data; + __u32 qp_num; + __u32 src_qp; + __u32 wc_flags; + __u16 pkey_index; + __u16 slid; + __u8 sl; + __u8 dlid_path_bits; + __u8 port_num; + __u8 reserved; +}; + +struct rdma_poll_cq { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u32 cq_handle; + __u32 ne; +}; + +struct rdma_poll_cq_resp { + __u32 count; + __u32 reserved; + struct rdma_kern_wc wc[0]; +}; + +struct rdma_req_notify_cq { + __u32 command; + __u16 in_words; + __u16 out_words; + __u32 cq_handle; + __u32 solicited; +}; + +struct rdma_resize_cq { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u32 cq_handle; + __u32 cqe; + __u64 driver_data[0]; +}; + +struct rdma_resize_cq_resp { + __u32 
cqe; +}; + +struct rdma_destroy_cq { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u32 cq_handle; + __u32 reserved; +}; + +struct rdma_destroy_cq_resp { + __u32 comp_events_reported; + __u32 async_events_reported; +}; + +struct rdma_kern_global_route { + __u8 dgid[16]; + __u32 flow_label; + __u8 sgid_index; + __u8 hop_limit; + __u8 traffic_class; + __u8 reserved; +}; + +struct rdma_kern_ah_attr { + struct rdma_kern_global_route grh; + __u16 dlid; + __u8 sl; + __u8 src_path_bits; + __u8 static_rate; + __u8 is_global; + __u8 port_num; + __u8 reserved; +}; + +struct rdma_kern_qp_attr { + __u32 qp_attr_mask; + __u32 qp_state; + __u32 cur_qp_state; + __u32 path_mtu; + __u32 path_mig_state; + __u32 qkey; + __u32 rq_psn; + __u32 sq_psn; + __u32 dest_qp_num; + __u32 qp_access_flags; + + struct rdma_kern_ah_attr ah_attr; + struct rdma_kern_ah_attr alt_ah_attr; + + /* ib_qp_cap */ + __u32 max_send_wr; + __u32 max_recv_wr; + __u32 max_send_sge; + __u32 max_recv_sge; + __u32 max_inline_data; + + __u16 pkey_index; + __u16 alt_pkey_index; + __u8 en_sqd_async_notify; + __u8 sq_draining; + __u8 max_rd_atomic; + __u8 max_dest_rd_atomic; + __u8 min_rnr_timer; + __u8 port_num; + __u8 timeout; + __u8 retry_cnt; + __u8 rnr_retry; + __u8 alt_port_num; + __u8 alt_timeout; + __u8 reserved[5]; +}; + +struct rdma_create_qp { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u64 user_handle; + __u32 pd_handle; + __u32 send_cq_handle; + __u32 recv_cq_handle; + __u32 srq_handle; + __u32 max_send_wr; + __u32 max_recv_wr; + __u32 max_send_sge; + __u32 max_recv_sge; + __u32 max_inline_data; + __u8 sq_sig_all; + __u8 qp_type; + __u8 is_srq; + __u8 reserved; + __u64 driver_data[0]; +}; + +struct rdma_create_qp_resp { + __u32 qp_handle; + __u32 qpn; + __u32 max_send_wr; + __u32 max_recv_wr; + __u32 max_send_sge; + __u32 max_recv_sge; + __u32 max_inline_data; + __u32 reserved; +}; + +struct rdma_qp_dest { + __u8 dgid[16]; + __u32 flow_label; + 
__u16 dlid; + __u16 reserved; + __u8 sgid_index; + __u8 hop_limit; + __u8 traffic_class; + __u8 sl; + __u8 src_path_bits; + __u8 static_rate; + __u8 is_global; + __u8 port_num; +}; + +struct rdma_query_qp { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u32 qp_handle; + __u32 attr_mask; + __u64 driver_data[0]; +}; + +struct rdma_query_qp_resp { + struct rdma_qp_dest dest; + struct rdma_qp_dest alt_dest; + __u32 max_send_wr; + __u32 max_recv_wr; + __u32 max_send_sge; + __u32 max_recv_sge; + __u32 max_inline_data; + __u32 qkey; + __u32 rq_psn; + __u32 sq_psn; + __u32 dest_qp_num; + __u32 qp_access_flags; + __u16 pkey_index; + __u16 alt_pkey_index; + __u8 qp_state; + __u8 cur_qp_state; + __u8 path_mtu; + __u8 path_mig_state; + __u8 en_sqd_async_notify; + __u8 max_rd_atomic; + __u8 max_dest_rd_atomic; + __u8 min_rnr_timer; + __u8 port_num; + __u8 timeout; + __u8 retry_cnt; + __u8 rnr_retry; + __u8 alt_port_num; + __u8 alt_timeout; + __u8 sq_sig_all; + __u8 reserved[5]; + __u64 driver_data[0]; +}; + +struct rdma_modify_qp { + __u32 command; + __u16 in_words; + __u16 out_words; + struct rdma_qp_dest dest; + struct rdma_qp_dest alt_dest; + __u32 qp_handle; + __u32 attr_mask; + __u32 qkey; + __u32 rq_psn; + __u32 sq_psn; + __u32 dest_qp_num; + __u32 qp_access_flags; + __u16 pkey_index; + __u16 alt_pkey_index; + __u8 qp_state; + __u8 cur_qp_state; + __u8 path_mtu; + __u8 path_mig_state; + __u8 en_sqd_async_notify; + __u8 max_rd_atomic; + __u8 max_dest_rd_atomic; + __u8 min_rnr_timer; + __u8 port_num; + __u8 timeout; + __u8 retry_cnt; + __u8 rnr_retry; + __u8 alt_port_num; + __u8 alt_timeout; + __u8 reserved[2]; + __u64 driver_data[0]; +}; + +struct rdma_destroy_qp { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u32 qp_handle; + __u32 reserved; +}; + +struct rdma_destroy_qp_resp { + __u32 events_reported; +}; + +struct rdma_kern_send_wr { + __u64 wr_id; + __u32 num_sge; + __u32 opcode; + __u32 send_flags; + __u32 
imm_data; + union { + struct { + __u64 remote_addr; + __u32 rkey; + __u32 reserved; + } rdma; + struct { + __u64 remote_addr; + __u64 compare_add; + __u64 swap; + __u32 rkey; + __u32 reserved; + } atomic; + struct { + __u32 ah; + __u32 remote_qpn; + __u32 remote_qkey; + __u32 reserved; + } ud; + } wr; +}; + +struct rdma_post_send { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u32 qp_handle; + __u32 wr_count; + __u32 sge_count; + __u32 wqe_size; + struct rdma_kern_send_wr send_wr[0]; +}; + +struct rdma_post_send_resp { + __u32 bad_wr; +}; + +struct rdma_kern_recv_wr { + __u64 wr_id; + __u32 num_sge; + __u32 reserved; +}; + +struct rdma_post_recv { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u32 qp_handle; + __u32 wr_count; + __u32 sge_count; + __u32 wqe_size; + struct rdma_kern_recv_wr recv_wr[0]; +}; + +struct rdma_post_recv_resp { + __u32 bad_wr; +}; + +struct rdma_post_srq_recv { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u32 srq_handle; + __u32 wr_count; + __u32 sge_count; + __u32 wqe_size; + struct rdma_kern_recv_wr recv_wr[0]; +}; + +struct rdma_post_srq_recv_resp { + __u32 bad_wr; +}; + +struct rdma_create_ah { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u64 user_handle; + __u32 pd_handle; + __u32 reserved; + struct rdma_kern_ah_attr attr; +}; + +struct rdma_create_ah_resp { + __u32 handle; +}; + +struct rdma_destroy_ah { + __u32 command; + __u16 in_words; + __u16 out_words; + __u32 ah_handle; +}; + +struct rdma_attach_mcast { + __u32 command; + __u16 in_words; + __u16 out_words; + __u8 gid[16]; + __u32 qp_handle; + __u16 mlid; + __u16 reserved; + __u64 driver_data[0]; +}; + +struct rdma_detach_mcast { + __u32 command; + __u16 in_words; + __u16 out_words; + __u8 gid[16]; + __u32 qp_handle; + __u16 mlid; + __u16 reserved; + __u64 driver_data[0]; +}; + +struct rdma_create_srq { + __u32 command; + __u16 in_words; + __u16 out_words; + 
__u64 response; + __u64 user_handle; + __u32 pd_handle; + __u32 max_wr; + __u32 max_sge; + __u32 srq_limit; + __u64 driver_data[0]; +}; + +struct rdma_create_srq_resp { + __u32 srq_handle; + __u32 max_wr; + __u32 max_sge; + __u32 reserved; +}; + +struct rdma_modify_srq { + __u32 command; + __u16 in_words; + __u16 out_words; + __u32 srq_handle; + __u32 attr_mask; + __u32 max_wr; + __u32 srq_limit; + __u64 driver_data[0]; +}; + +struct rdma_query_srq { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u32 srq_handle; + __u32 reserved; + __u64 driver_data[0]; +}; + +struct rdma_query_srq_resp { + __u32 max_wr; + __u32 max_sge; + __u32 srq_limit; + __u32 reserved; +}; + +struct rdma_destroy_srq { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u32 srq_handle; + __u32 reserved; +}; + +struct rdma_destroy_srq_resp { + __u32 events_reported; +}; + +/* + * Compatibility with older ABI versions + */ + +enum { + RDMA_USER_VERBS_CMD_QUERY_PARAMS_V2, + RDMA_USER_VERBS_CMD_GET_CONTEXT_V2, + RDMA_USER_VERBS_CMD_QUERY_DEVICE_V2, + RDMA_USER_VERBS_CMD_QUERY_PORT_V2, + RDMA_USER_VERBS_CMD_QUERY_GID_V2, + RDMA_USER_VERBS_CMD_QUERY_PKEY_V2, + RDMA_USER_VERBS_CMD_ALLOC_PD_V2, + RDMA_USER_VERBS_CMD_DEALLOC_PD_V2, + RDMA_USER_VERBS_CMD_CREATE_AH_V2, + RDMA_USER_VERBS_CMD_MODIFY_AH_V2, + RDMA_USER_VERBS_CMD_QUERY_AH_V2, + RDMA_USER_VERBS_CMD_DESTROY_AH_V2, + RDMA_USER_VERBS_CMD_REG_MR_V2, + RDMA_USER_VERBS_CMD_REG_SMR_V2, + RDMA_USER_VERBS_CMD_REREG_MR_V2, + RDMA_USER_VERBS_CMD_QUERY_MR_V2, + RDMA_USER_VERBS_CMD_DEREG_MR_V2, + RDMA_USER_VERBS_CMD_ALLOC_MW_V2, + RDMA_USER_VERBS_CMD_BIND_MW_V2, + RDMA_USER_VERBS_CMD_DEALLOC_MW_V2, + RDMA_USER_VERBS_CMD_CREATE_CQ_V2, + RDMA_USER_VERBS_CMD_RESIZE_CQ_V2, + RDMA_USER_VERBS_CMD_DESTROY_CQ_V2, + RDMA_USER_VERBS_CMD_POLL_CQ_V2, + RDMA_USER_VERBS_CMD_PEEK_CQ_V2, + RDMA_USER_VERBS_CMD_REQ_NOTIFY_CQ_V2, + RDMA_USER_VERBS_CMD_CREATE_QP_V2, + RDMA_USER_VERBS_CMD_QUERY_QP_V2, + 
RDMA_USER_VERBS_CMD_MODIFY_QP_V2, + RDMA_USER_VERBS_CMD_DESTROY_QP_V2, + RDMA_USER_VERBS_CMD_POST_SEND_V2, + RDMA_USER_VERBS_CMD_POST_RECV_V2, + RDMA_USER_VERBS_CMD_ATTACH_MCAST_V2, + RDMA_USER_VERBS_CMD_DETACH_MCAST_V2, + RDMA_USER_VERBS_CMD_CREATE_SRQ_V2, + RDMA_USER_VERBS_CMD_MODIFY_SRQ_V2, + RDMA_USER_VERBS_CMD_QUERY_SRQ_V2, + RDMA_USER_VERBS_CMD_DESTROY_SRQ_V2, + RDMA_USER_VERBS_CMD_POST_SRQ_RECV_V2, + /* + * Set commands that didn't exist to -1 so that our compile-time + * opcode trick in RDMA_INIT_CMD() doesn't break. + */ + RDMA_USER_VERBS_CMD_CREATE_COMP_CHANNEL_V2 = -1, +}; + +struct rdma_destroy_cq_v1 { + __u32 command; + __u16 in_words; + __u16 out_words; + __u32 cq_handle; +}; + +struct rdma_destroy_qp_v1 { + __u32 command; + __u16 in_words; + __u16 out_words; + __u32 qp_handle; +}; + +struct rdma_destroy_srq_v1 { + __u32 command; + __u16 in_words; + __u16 out_words; + __u32 srq_handle; +}; + +struct rdma_get_context_v2 { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u64 cq_fd_tab; + __u64 driver_data[0]; +}; + +struct rdma_create_cq_v2 { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u64 user_handle; + __u32 cqe; + __u32 event_handler; + __u64 driver_data[0]; +}; + +struct rdma_modify_srq_v3 { + __u32 command; + __u16 in_words; + __u16 out_words; + __u32 srq_handle; + __u32 attr_mask; + __u32 max_wr; + __u32 max_sge; + __u32 srq_limit; + __u32 reserved; + __u64 driver_data[0]; +}; + +struct rdma_create_qp_resp_v3 { + __u32 qp_handle; + __u32 qpn; +}; + +struct rdma_create_qp_resp_v4 { + __u32 qp_handle; + __u32 qpn; + __u32 max_send_wr; + __u32 max_recv_wr; + __u32 max_send_sge; + __u32 max_recv_sge; + __u32 max_inline_data; +}; + +struct rdma_create_srq_resp_v5 { + __u32 srq_handle; +}; + +#endif /* KERN_ABI_H */ diff -ruNp ORG/librdmaverbs/include/rdma/marshall.h NEW/librdmaverbs/include/rdma/marshall.h --- ORG/librdmaverbs/include/rdma/marshall.h 1969-12-31 16:00:00.000000000 -0800 +++
NEW/librdmaverbs/include/rdma/marshall.h 2006-07-10 18:07:46.000000000 -0700 @@ -0,0 +1,62 @@ +/* + * Copyright (c) 2005 Intel Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#ifndef INFINIBAND_MARSHALL_H +#define INFINIBAND_MARSHALL_H + +#include +#include +#include +#include + +#ifdef __cplusplus +# define BEGIN_C_DECLS extern "C" { +# define END_C_DECLS } +#else /* !__cplusplus */ +# define BEGIN_C_DECLS +# define END_C_DECLS +#endif /* __cplusplus */ + +BEGIN_C_DECLS + +void rdma_copy_qp_attr_from_kern(struct rdma_qp_attr *dst, + struct rdma_kern_qp_attr *src); + +void rdma_copy_path_rec_from_kern(struct rdma_sa_path_rec *dst, + struct rdma_kern_path_rec *src); + +void rdma_copy_path_rec_to_kern(struct rdma_kern_path_rec *dst, + struct rdma_sa_path_rec *src); + +END_C_DECLS + +#endif /* INFINIBAND_MARSHALL_H */ diff -ruNp ORG/librdmaverbs/include/rdma/opcode.h NEW/librdmaverbs/include/rdma/opcode.h --- ORG/librdmaverbs/include/rdma/opcode.h 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/include/rdma/opcode.h 2006-07-10 18:07:46.000000000 -0700 @@ -0,0 +1,149 @@ +/* + * Copyright (c) 2005 Topspin Communications. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id: opcode.h 1989 2005-03-14 20:25:13Z roland $ + */ + +#ifndef INFINIBAND_OPCODE_H +#define INFINIBAND_OPCODE_H + +/* + * This macro cleans up the definitions of constants for BTH opcodes. + * It is used to define constants such as RDMA_OPCODE_UD_SEND_ONLY, + * which becomes RDMA_OPCODE_UD + RDMA_OPCODE_SEND_ONLY, and this gives + * the correct value. + * + * In short, user code should use the constants defined using the + * macro rather than worrying about adding together other constants. 
+*/ +#define RDMA_OPCODE(transport, op) \ + RDMA_OPCODE_ ## transport ## _ ## op = \ + RDMA_OPCODE_ ## transport + RDMA_OPCODE_ ## op + +enum { + /* transport types -- just used to define real constants */ + RDMA_OPCODE_RC = 0x00, + RDMA_OPCODE_UC = 0x20, + RDMA_OPCODE_RD = 0x40, + RDMA_OPCODE_UD = 0x60, + + /* operations -- just used to define real constants */ + RDMA_OPCODE_SEND_FIRST = 0x00, + RDMA_OPCODE_SEND_MIDDLE = 0x01, + RDMA_OPCODE_SEND_LAST = 0x02, + RDMA_OPCODE_SEND_LAST_WITH_IMMEDIATE = 0x03, + RDMA_OPCODE_SEND_ONLY = 0x04, + RDMA_OPCODE_SEND_ONLY_WITH_IMMEDIATE = 0x05, + RDMA_OPCODE_RDMA_WRITE_FIRST = 0x06, + RDMA_OPCODE_RDMA_WRITE_MIDDLE = 0x07, + RDMA_OPCODE_RDMA_WRITE_LAST = 0x08, + RDMA_OPCODE_RDMA_WRITE_LAST_WITH_IMMEDIATE = 0x09, + RDMA_OPCODE_RDMA_WRITE_ONLY = 0x0a, + RDMA_OPCODE_RDMA_WRITE_ONLY_WITH_IMMEDIATE = 0x0b, + RDMA_OPCODE_RDMA_READ_REQUEST = 0x0c, + RDMA_OPCODE_RDMA_READ_RESPONSE_FIRST = 0x0d, + RDMA_OPCODE_RDMA_READ_RESPONSE_MIDDLE = 0x0e, + RDMA_OPCODE_RDMA_READ_RESPONSE_LAST = 0x0f, + RDMA_OPCODE_RDMA_READ_RESPONSE_ONLY = 0x10, + RDMA_OPCODE_ACKNOWLEDGE = 0x11, + RDMA_OPCODE_ATOMIC_ACKNOWLEDGE = 0x12, + RDMA_OPCODE_COMPARE_SWAP = 0x13, + RDMA_OPCODE_FETCH_ADD = 0x14, + + /* real constants follow -- see the comment above the RDMA_OPCODE() + macro for more details */ + + /* RC */ + RDMA_OPCODE(RC, SEND_FIRST), + RDMA_OPCODE(RC, SEND_MIDDLE), + RDMA_OPCODE(RC, SEND_LAST), + RDMA_OPCODE(RC, SEND_LAST_WITH_IMMEDIATE), + RDMA_OPCODE(RC, SEND_ONLY), + RDMA_OPCODE(RC, SEND_ONLY_WITH_IMMEDIATE), + RDMA_OPCODE(RC, RDMA_WRITE_FIRST), + RDMA_OPCODE(RC, RDMA_WRITE_MIDDLE), + RDMA_OPCODE(RC, RDMA_WRITE_LAST), + RDMA_OPCODE(RC, RDMA_WRITE_LAST_WITH_IMMEDIATE), + RDMA_OPCODE(RC, RDMA_WRITE_ONLY), + RDMA_OPCODE(RC, RDMA_WRITE_ONLY_WITH_IMMEDIATE), + RDMA_OPCODE(RC, RDMA_READ_REQUEST), + RDMA_OPCODE(RC, RDMA_READ_RESPONSE_FIRST), + RDMA_OPCODE(RC, RDMA_READ_RESPONSE_MIDDLE), + RDMA_OPCODE(RC, RDMA_READ_RESPONSE_LAST), + RDMA_OPCODE(RC,
RDMA_READ_RESPONSE_ONLY), + RDMA_OPCODE(RC, ACKNOWLEDGE), + RDMA_OPCODE(RC, ATOMIC_ACKNOWLEDGE), + RDMA_OPCODE(RC, COMPARE_SWAP), + RDMA_OPCODE(RC, FETCH_ADD), + + /* UC */ + RDMA_OPCODE(UC, SEND_FIRST), + RDMA_OPCODE(UC, SEND_MIDDLE), + RDMA_OPCODE(UC, SEND_LAST), + RDMA_OPCODE(UC, SEND_LAST_WITH_IMMEDIATE), + RDMA_OPCODE(UC, SEND_ONLY), + RDMA_OPCODE(UC, SEND_ONLY_WITH_IMMEDIATE), + RDMA_OPCODE(UC, RDMA_WRITE_FIRST), + RDMA_OPCODE(UC, RDMA_WRITE_MIDDLE), + RDMA_OPCODE(UC, RDMA_WRITE_LAST), + RDMA_OPCODE(UC, RDMA_WRITE_LAST_WITH_IMMEDIATE), + RDMA_OPCODE(UC, RDMA_WRITE_ONLY), + RDMA_OPCODE(UC, RDMA_WRITE_ONLY_WITH_IMMEDIATE), + + /* RD */ + RDMA_OPCODE(RD, SEND_FIRST), + RDMA_OPCODE(RD, SEND_MIDDLE), + RDMA_OPCODE(RD, SEND_LAST), + RDMA_OPCODE(RD, SEND_LAST_WITH_IMMEDIATE), + RDMA_OPCODE(RD, SEND_ONLY), + RDMA_OPCODE(RD, SEND_ONLY_WITH_IMMEDIATE), + RDMA_OPCODE(RD, RDMA_WRITE_FIRST), + RDMA_OPCODE(RD, RDMA_WRITE_MIDDLE), + RDMA_OPCODE(RD, RDMA_WRITE_LAST), + RDMA_OPCODE(RD, RDMA_WRITE_LAST_WITH_IMMEDIATE), + RDMA_OPCODE(RD, RDMA_WRITE_ONLY), + RDMA_OPCODE(RD, RDMA_WRITE_ONLY_WITH_IMMEDIATE), + RDMA_OPCODE(RD, RDMA_READ_REQUEST), + RDMA_OPCODE(RD, RDMA_READ_RESPONSE_FIRST), + RDMA_OPCODE(RD, RDMA_READ_RESPONSE_MIDDLE), + RDMA_OPCODE(RD, RDMA_READ_RESPONSE_LAST), + RDMA_OPCODE(RD, RDMA_READ_RESPONSE_ONLY), + RDMA_OPCODE(RD, ACKNOWLEDGE), + RDMA_OPCODE(RD, ATOMIC_ACKNOWLEDGE), + RDMA_OPCODE(RD, COMPARE_SWAP), + RDMA_OPCODE(RD, FETCH_ADD), + + /* UD */ + RDMA_OPCODE(UD, SEND_ONLY), + RDMA_OPCODE(UD, SEND_ONLY_WITH_IMMEDIATE) +}; + +#endif /* INFINIBAND_OPCODE_H */ diff -ruNp ORG/librdmaverbs/include/rdma/sa-kern-abi.h NEW/librdmaverbs/include/rdma/sa-kern-abi.h --- ORG/librdmaverbs/include/rdma/sa-kern-abi.h 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/include/rdma/sa-kern-abi.h 2006-07-10 18:07:46.000000000 -0700 @@ -0,0 +1,65 @@ +/* + * Copyright (c) 2005 Intel Corporation. All rights reserved. 
+ * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef INFINIBAND_SA_KERN_ABI_H +#define INFINIBAND_SA_KERN_ABI_H + +#include + +/* + * Obsolete, deprecated names. Will be removed in libibverbs 1.1. 
+ */ +#define ib_kern_path_rec rdma_kern_path_rec + +struct rdma_kern_path_rec { + __u8 dgid[16]; + __u8 sgid[16]; + __u16 dlid; + __u16 slid; + __u32 raw_traffic; + __u32 flow_label; + __u32 reversible; + __u32 mtu; + __u16 pkey; + __u8 hop_limit; + __u8 traffic_class; + __u8 numb_path; + __u8 sl; + __u8 mtu_selector; + __u8 rate_selector; + __u8 rate; + __u8 packet_life_time_selector; + __u8 packet_life_time; + __u8 preference; +}; + +#endif /* INFINIBAND_SA_KERN_ABI_H */ diff -ruNp ORG/librdmaverbs/include/rdma/sa.h NEW/librdmaverbs/include/rdma/sa.h --- ORG/librdmaverbs/include/rdma/sa.h 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/include/rdma/sa.h 2006-07-10 18:07:46.000000000 -0700 @@ -0,0 +1,102 @@ +/* + * Copyright (c) 2004 Topspin Communications. All rights reserved. + * Copyright (c) 2005 Voltaire, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id: sa.h 2616 2005-06-15 15:22:39Z halr $ + */ + +#ifndef INFINIBAND_SA_H +#define INFINIBAND_SA_H + +#include + +struct rdma_sa_path_rec { + /* reserved */ + /* reserved */ + union rdma_gid dgid; + union rdma_gid sgid; + uint16_t dlid; + uint16_t slid; + int raw_traffic; + /* reserved */ + uint32_t flow_label; + uint8_t hop_limit; + uint8_t traffic_class; + int reversible; + uint8_t numb_path; + uint16_t pkey; + /* reserved */ + uint8_t sl; + uint8_t mtu_selector; + uint8_t mtu; + uint8_t rate_selector; + uint8_t rate; + uint8_t packet_life_time_selector; + uint8_t packet_life_time; + uint8_t preference; +}; + +struct rdma_sa_mcmember_rec { + union rdma_gid mgid; + union rdma_gid port_gid; + uint32_t qkey; + uint16_t mlid; + uint8_t mtu_selector; + uint8_t mtu; + uint8_t traffic_class; + uint16_t pkey; + uint8_t rate_selector; + uint8_t rate; + uint8_t packet_life_time_selector; + uint8_t packet_life_time; + uint8_t sl; + uint32_t flow_label; + uint8_t hop_limit; + uint8_t scope; + uint8_t join_state; + int proxy_join; +}; + +struct rdma_sa_service_rec { + uint64_t id; + union rdma_gid gid; + uint16_t pkey; + /* uint16_t resv; */ + uint32_t lease; + uint8_t key[16]; + uint8_t name[64]; + uint8_t data8[16]; + uint16_t data16[8]; + uint32_t data32[4]; + uint64_t data64[2]; +}; + +#endif /* INFINIBAND_SA_H */ diff -ruNp ORG/librdmaverbs/include/rdma/verbs.h NEW/librdmaverbs/include/rdma/verbs.h --- ORG/librdmaverbs/include/rdma/verbs.h 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/include/rdma/verbs.h 2006-07-10 18:07:46.000000000 -0700 @@ -0,0 +1,1026 @@ +/* + * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved. + * Copyright (c) 2004 Intel Corporation. 
All rights reserved. + * Copyright (c) 2005, 2006 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2005 PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + * $Id: verbs.h 7631 2006-06-02 19:53:25Z swise $ + */ + +#ifndef INFINIBAND_VERBS_H +#define INFINIBAND_VERBS_H + +#include +#include + +#ifdef __cplusplus +# define BEGIN_C_DECLS extern "C" { +# define END_C_DECLS } +#else /* !__cplusplus */ +# define BEGIN_C_DECLS +# define END_C_DECLS +#endif /* __cplusplus */ + +#if __GNUC__ >= 3 +# define __attribute_const __attribute__((const)) +#else +# define __attribute_const +#endif + +BEGIN_C_DECLS + +union rdma_gid { + uint8_t raw[16]; + struct { + uint64_t subnet_prefix; + uint64_t interface_id; + } global; +}; + +enum rdma_node_type { + RDMA_NODE_UNKNOWN=-1, + RDMA_NODE_CA = 1, + RDMA_NODE_SWITCH, + RDMA_NODE_ROUTER, + RDMA_NODE_RNIC +}; + +enum rdma_transport_type { + RDMA_TRANSPORT_UNKNOWN=0, + RDMA_TRANSPORT_IB=1, + RDMA_TRANSPORT_IWARP=2 +}; + +enum rdma_device_cap_flags { + RDMA_DEVICE_RESIZE_MAX_WR = 1, + RDMA_DEVICE_BAD_PKEY_CNTR = 1 << 1, + RDMA_DEVICE_BAD_QKEY_CNTR = 1 << 2, + RDMA_DEVICE_RAW_MULTI = 1 << 3, + RDMA_DEVICE_AUTO_PATH_MIG = 1 << 4, + RDMA_DEVICE_CHANGE_PHY_PORT = 1 << 5, + RDMA_DEVICE_UD_AV_PORT_ENFORCE = 1 << 6, + RDMA_DEVICE_CURR_QP_STATE_MOD = 1 << 7, + RDMA_DEVICE_SHUTDOWN_PORT = 1 << 8, + RDMA_DEVICE_INIT_TYPE = 1 << 9, + RDMA_DEVICE_PORT_ACTIVE_EVENT = 1 << 10, + RDMA_DEVICE_SYS_IMAGE_GUID = 1 << 11, + RDMA_DEVICE_RC_RNR_NAK_GEN = 1 << 12, + RDMA_DEVICE_SRQ_RESIZE = 1 << 13, + RDMA_DEVICE_N_NOTIFY_CQ = 1 << 14 +}; + +enum rdma_atomic_cap { + RDMA_ATOMIC_NONE, + RDMA_ATOMIC_HCA, + RDMA_ATOMIC_GLOB +}; + +struct rdma_device_attr { + char fw_ver[64]; + uint64_t node_guid; + uint64_t sys_image_guid; + uint64_t max_mr_size; + uint64_t page_size_cap; + uint32_t vendor_id; + uint32_t vendor_part_id; + uint32_t hw_ver; + int max_qp; + int max_qp_wr; + int device_cap_flags; + int max_sge; + int max_sge_rd; + int max_cq; + int max_cqe; + int max_mr; + int max_pd; + int max_qp_rd_atom; + int max_ee_rd_atom; + int max_res_rd_atom; + int max_qp_init_rd_atom; + int max_ee_init_rd_atom; + enum 
rdma_atomic_cap atomic_cap; + int max_ee; + int max_rdd; + int max_mw; + int max_raw_ipv6_qp; + int max_raw_ethy_qp; + int max_mcast_grp; + int max_mcast_qp_attach; + int max_total_mcast_qp_attach; + int max_ah; + int max_fmr; + int max_map_per_fmr; + int max_srq; + int max_srq_wr; + int max_srq_sge; + uint16_t max_pkeys; + uint8_t local_ca_ack_delay; + uint8_t phys_port_cnt; +}; + +enum rdma_mtu { + RDMA_MTU_256 = 1, + RDMA_MTU_512 = 2, + RDMA_MTU_1024 = 3, + RDMA_MTU_2048 = 4, + RDMA_MTU_4096 = 5 +}; + +enum rdma_port_state { + RDMA_PORT_NOP = 0, + RDMA_PORT_DOWN = 1, + RDMA_PORT_INIT = 2, + RDMA_PORT_ARMED = 3, + RDMA_PORT_ACTIVE = 4, + RDMA_PORT_ACTIVE_DEFER = 5 +}; + +struct rdma_port_attr { + enum rdma_port_state state; + enum rdma_mtu max_mtu; + enum rdma_mtu active_mtu; + int gid_tbl_len; + uint32_t port_cap_flags; + uint32_t max_msg_sz; + uint32_t bad_pkey_cntr; + uint32_t qkey_viol_cntr; + uint16_t pkey_tbl_len; + uint16_t lid; + uint16_t sm_lid; + uint8_t lmc; + uint8_t max_vl_num; + uint8_t sm_sl; + uint8_t subnet_timeout; + uint8_t init_type_reply; + uint8_t active_width; + uint8_t active_speed; + uint8_t phys_state; +}; + +enum rdma_event_type { + RDMA_EVENT_CQ_ERR, + RDMA_EVENT_QP_FATAL, + RDMA_EVENT_QP_REQ_ERR, + RDMA_EVENT_QP_ACCESS_ERR, + RDMA_EVENT_COMM_EST, + RDMA_EVENT_SQ_DRAINED, + RDMA_EVENT_PATH_MIG, + RDMA_EVENT_PATH_MIG_ERR, + RDMA_EVENT_DEVICE_FATAL, + RDMA_EVENT_PORT_ACTIVE, + RDMA_EVENT_PORT_ERR, + RDMA_EVENT_LID_CHANGE, + RDMA_EVENT_PKEY_CHANGE, + RDMA_EVENT_SM_CHANGE, + RDMA_EVENT_SRQ_ERR, + RDMA_EVENT_SRQ_LIMIT_REACHED, + RDMA_EVENT_QP_LAST_WQE_REACHED, + RDMA_EVENT_CLIENT_REREGISTER +}; + +struct rdma_async_event { + union { + struct rdma_cq *cq; + struct rdma_qp *qp; + struct rdma_srq *srq; + int port_num; + } element; + enum rdma_event_type event_type; +}; + +enum rdma_wc_status { + RDMA_WC_SUCCESS, + RDMA_WC_LOC_LEN_ERR, + RDMA_WC_LOC_QP_OP_ERR, + RDMA_WC_LOC_EEC_OP_ERR, + RDMA_WC_LOC_PROT_ERR, + RDMA_WC_WR_FLUSH_ERR, + 
RDMA_WC_MW_BIND_ERR, + RDMA_WC_BAD_RESP_ERR, + RDMA_WC_LOC_ACCESS_ERR, + RDMA_WC_REM_INV_REQ_ERR, + RDMA_WC_REM_ACCESS_ERR, + RDMA_WC_REM_OP_ERR, + RDMA_WC_RETRY_EXC_ERR, + RDMA_WC_RNR_RETRY_EXC_ERR, + RDMA_WC_LOC_RDD_VIOL_ERR, + RDMA_WC_REM_INV_RD_REQ_ERR, + RDMA_WC_REM_ABORT_ERR, + RDMA_WC_INV_EECN_ERR, + RDMA_WC_INV_EEC_STATE_ERR, + RDMA_WC_FATAL_ERR, + RDMA_WC_RESP_TIMEOUT_ERR, + RDMA_WC_GENERAL_ERR +}; + +enum rdma_wc_opcode { + RDMA_WC_SEND, + RDMA_WC_RDMA_WRITE, + RDMA_WC_RDMA_READ, + RDMA_WC_COMP_SWAP, + RDMA_WC_FETCH_ADD, + RDMA_WC_BIND_MW, +/* + * Set value of RDMA_WC_RECV so consumers can test if a completion is a + * receive by testing (opcode & RDMA_WC_RECV). + */ + RDMA_WC_RECV = 1 << 7, + RDMA_WC_RECV_RDMA_WITH_IMM +}; + +enum rdma_wc_flags { + RDMA_WC_GRH = 1 << 0, + RDMA_WC_WITH_IMM = 1 << 1 +}; + +struct rdma_wc { + uint64_t wr_id; + enum rdma_wc_status status; + enum rdma_wc_opcode opcode; + uint32_t vendor_err; + uint32_t byte_len; + uint32_t imm_data; /* in network byte order */ + uint32_t qp_num; + uint32_t src_qp; + enum rdma_wc_flags wc_flags; + uint16_t pkey_index; + uint16_t slid; + uint8_t sl; + uint8_t dlid_path_bits; +}; + +enum rdma_access_flags { + RDMA_ACCESS_LOCAL_WRITE = 1, + RDMA_ACCESS_REMOTE_WRITE = (1<<1), + RDMA_ACCESS_REMOTE_READ = (1<<2), + RDMA_ACCESS_REMOTE_ATOMIC = (1<<3), + RDMA_ACCESS_MW_BIND = (1<<4) +}; + +struct rdma_pd { + struct rdma_context *context; + uint32_t handle; +}; + +struct rdma_mr { + struct rdma_context *context; + struct rdma_pd *pd; + uint32_t handle; + uint32_t lkey; + uint32_t rkey; +}; + +struct rdma_global_route { + union rdma_gid dgid; + uint32_t flow_label; + uint8_t sgid_index; + uint8_t hop_limit; + uint8_t traffic_class; +}; + +enum rdma_rate { + RDMA_RATE_MAX = 0, + RDMA_RATE_2_5_GBPS = 2, + RDMA_RATE_5_GBPS = 5, + RDMA_RATE_10_GBPS = 3, + RDMA_RATE_20_GBPS = 6, + RDMA_RATE_30_GBPS = 4, + RDMA_RATE_40_GBPS = 7, + RDMA_RATE_60_GBPS = 8, + RDMA_RATE_80_GBPS = 9, + RDMA_RATE_120_GBPS = 10 +}; + 
+/** + * rdma_rate_to_mult - Convert the IB rate enum to a multiple of the + * base rate of 2.5 Gbit/sec. For example, RDMA_RATE_5_GBPS will be + * converted to 2, since 5 Gbit/sec is 2 * 2.5 Gbit/sec. + * @rate: rate to convert. + */ +int rdma_rate_to_mult(enum rdma_rate rate) __attribute_const; + +/** + * mult_to_rdma_rate - Convert a multiple of 2.5 Gbit/sec to an IB rate enum. + * @mult: multiple to convert. + */ +enum rdma_rate mult_to_rdma_rate(int mult) __attribute_const; + +struct rdma_ah_attr { + struct rdma_global_route grh; + uint16_t dlid; + uint8_t sl; + uint8_t src_path_bits; + uint8_t static_rate; + uint8_t is_global; + uint8_t port_num; +}; + +enum rdma_srq_attr_mask { + RDMA_SRQ_MAX_WR = 1 << 0, + RDMA_SRQ_LIMIT = 1 << 1 +}; + +struct rdma_srq_attr { + uint32_t max_wr; + uint32_t max_sge; + uint32_t srq_limit; +}; + +struct rdma_srq_init_attr { + void *srq_context; + struct rdma_srq_attr attr; +}; + +enum rdma_qp_type { + RDMA_QPT_RC = 2, + RDMA_QPT_UC, + RDMA_QPT_UD +}; + +struct rdma_qp_cap { + uint32_t max_send_wr; + uint32_t max_recv_wr; + uint32_t max_send_sge; + uint32_t max_recv_sge; + uint32_t max_inline_data; +}; + +struct rdma_qp_init_attr { + void *qp_context; + struct rdma_cq *send_cq; + struct rdma_cq *recv_cq; + struct rdma_srq *srq; + struct rdma_qp_cap cap; + enum rdma_qp_type qp_type; + int sq_sig_all; +}; + +enum rdma_qp_attr_mask { + RDMA_QP_STATE = 1 << 0, + RDMA_QP_CUR_STATE = 1 << 1, + RDMA_QP_EN_SQD_ASYNC_NOTIFY = 1 << 2, + RDMA_QP_ACCESS_FLAGS = 1 << 3, + RDMA_QP_PKEY_INDEX = 1 << 4, + RDMA_QP_PORT = 1 << 5, + RDMA_QP_QKEY = 1 << 6, + RDMA_QP_AV = 1 << 7, + RDMA_QP_PATH_MTU = 1 << 8, + RDMA_QP_TIMEOUT = 1 << 9, + RDMA_QP_RETRY_CNT = 1 << 10, + RDMA_QP_RNR_RETRY = 1 << 11, + RDMA_QP_RQ_PSN = 1 << 12, + RDMA_QP_MAX_QP_RD_ATOMIC = 1 << 13, + RDMA_QP_ALT_PATH = 1 << 14, + RDMA_QP_MIN_RNR_TIMER = 1 << 15, + RDMA_QP_SQ_PSN = 1 << 16, + RDMA_QP_MAX_DEST_RD_ATOMIC = 1 << 17, + RDMA_QP_PATH_MIG_STATE = 1 << 18, + RDMA_QP_CAP = 1 << 
19, + RDMA_QP_DEST_QPN = 1 << 20 +}; + +enum rdma_qp_state { + RDMA_QPS_RESET, + RDMA_QPS_INIT, + RDMA_QPS_RTR, + RDMA_QPS_RTS, + RDMA_QPS_SQD, + RDMA_QPS_SQE, + RDMA_QPS_ERR +}; + +enum rdma_mig_state { + RDMA_MIG_MIGRATED, + RDMA_MIG_REARM, + RDMA_MIG_ARMED +}; + +struct rdma_qp_attr { + enum rdma_qp_state qp_state; + enum rdma_qp_state cur_qp_state; + enum rdma_mtu path_mtu; + enum rdma_mig_state path_mig_state; + uint32_t qkey; + uint32_t rq_psn; + uint32_t sq_psn; + uint32_t dest_qp_num; + int qp_access_flags; + struct rdma_qp_cap cap; + struct rdma_ah_attr ah_attr; + struct rdma_ah_attr alt_ah_attr; + uint16_t pkey_index; + uint16_t alt_pkey_index; + uint8_t en_sqd_async_notify; + uint8_t sq_draining; + uint8_t max_rd_atomic; + uint8_t max_dest_rd_atomic; + uint8_t min_rnr_timer; + uint8_t port_num; + uint8_t timeout; + uint8_t retry_cnt; + uint8_t rnr_retry; + uint8_t alt_port_num; + uint8_t alt_timeout; +}; + +enum rdma_wr_opcode { + RDMA_WR_RDMA_WRITE, + RDMA_WR_RDMA_WRITE_WITH_IMM, + RDMA_WR_SEND, + RDMA_WR_SEND_WITH_IMM, + RDMA_WR_RDMA_READ, + RDMA_WR_ATOMIC_CMP_AND_SWP, + RDMA_WR_ATOMIC_FETCH_AND_ADD +}; + +enum rdma_send_flags { + RDMA_SEND_FENCE = 1 << 0, + RDMA_SEND_SIGNALED = 1 << 1, + RDMA_SEND_SOLICITED = 1 << 2, + RDMA_SEND_INLINE = 1 << 3 +}; + +struct rdma_sge { + uint64_t addr; + uint32_t length; + uint32_t lkey; +}; + +struct rdma_send_wr { + struct rdma_send_wr *next; + uint64_t wr_id; + struct rdma_sge *sg_list; + int num_sge; + enum rdma_wr_opcode opcode; + enum rdma_send_flags send_flags; + uint32_t imm_data; /* in network byte order */ + union { + struct { + uint64_t remote_addr; + uint32_t rkey; + } rdma; + struct { + uint64_t remote_addr; + uint64_t compare_add; + uint64_t swap; + uint32_t rkey; + } atomic; + struct { + struct rdma_ah *ah; + uint32_t remote_qpn; + uint32_t remote_qkey; + } ud; + } wr; +}; + +struct rdma_recv_wr { + struct rdma_recv_wr *next; + uint64_t wr_id; + struct rdma_sge *sg_list; + int num_sge; +}; + +struct 
rdma_srq { + struct rdma_context *context; + void *srq_context; + struct rdma_pd *pd; + uint32_t handle; + + pthread_mutex_t mutex; + pthread_cond_t cond; + uint32_t events_completed; +}; + +struct rdma_qp { + struct rdma_context *context; + void *qp_context; + struct rdma_pd *pd; + struct rdma_cq *send_cq; + struct rdma_cq *recv_cq; + struct rdma_srq *srq; + uint32_t handle; + uint32_t qp_num; + enum rdma_qp_state state; + enum rdma_qp_type qp_type; + + pthread_mutex_t mutex; + pthread_cond_t cond; + uint32_t events_completed; +}; + +struct rdma_comp_channel { + int fd; +}; + +struct rdma_cq { + struct rdma_context *context; + void *cq_context; + uint32_t handle; + int cqe; + + pthread_mutex_t mutex; + pthread_cond_t cond; + uint32_t comp_events_completed; + uint32_t async_events_completed; +}; + +struct rdma_ah { + struct rdma_context *context; + struct rdma_pd *pd; + uint32_t handle; +}; + +struct rdma_device; +struct rdma_context; + +struct rdma_device_ops { + struct rdma_context * (*alloc_context)(struct rdma_device *device, int cmd_fd); + void (*free_context)(struct rdma_context *context); +}; + +enum { + RDMA_SYSFS_NAME_MAX = 64, + RDMA_SYSFS_PATH_MAX = 256 +}; + +struct rdma_device { + struct rdma_driver *driver; + enum rdma_node_type node_type; + struct rdma_device_ops ops; + /* Name of underlying kernel IB device, eg "mthca0" */ + char name[RDMA_SYSFS_NAME_MAX]; + /* Name of uverbs device, eg "uverbs0" */ + char dev_name[RDMA_SYSFS_NAME_MAX]; + /* Path to infiniband_verbs class device in sysfs */ + char dev_path[RDMA_SYSFS_PATH_MAX]; + /* Path to infiniband class device in sysfs */ + char ibdev_path[RDMA_SYSFS_PATH_MAX]; +}; + +struct rdma_context_ops { + int (*query_device)(struct rdma_context *context, + struct rdma_device_attr *device_attr); + int (*query_port)(struct rdma_context *context, uint8_t port_num, + struct rdma_port_attr *port_attr); + struct rdma_pd * (*alloc_pd)(struct rdma_context *context); + int (*dealloc_pd)(struct rdma_pd *pd); + 
struct rdma_mr * (*reg_mr)(struct rdma_pd *pd, void *addr, size_t length, + enum rdma_access_flags access); + int (*dereg_mr)(struct rdma_mr *mr); + struct rdma_cq * (*create_cq)(struct rdma_context *context, int cqe, + struct rdma_comp_channel *channel, + int comp_vector); + int (*poll_cq)(struct rdma_cq *cq, int num_entries, struct rdma_wc *wc); + int (*req_notify_cq)(struct rdma_cq *cq, int solicited_only); + void (*cq_event)(struct rdma_cq *cq); + int (*resize_cq)(struct rdma_cq *cq, int cqe); + int (*destroy_cq)(struct rdma_cq *cq); + struct rdma_srq * (*create_srq)(struct rdma_pd *pd, + struct rdma_srq_init_attr *srq_init_attr); + int (*modify_srq)(struct rdma_srq *srq, + struct rdma_srq_attr *srq_attr, + enum rdma_srq_attr_mask srq_attr_mask); + int (*query_srq)(struct rdma_srq *srq, + struct rdma_srq_attr *srq_attr); + int (*destroy_srq)(struct rdma_srq *srq); + int (*post_srq_recv)(struct rdma_srq *srq, + struct rdma_recv_wr *recv_wr, + struct rdma_recv_wr **bad_recv_wr); + struct rdma_qp * (*create_qp)(struct rdma_pd *pd, struct rdma_qp_init_attr *attr); + int (*query_qp)(struct rdma_qp *qp, struct rdma_qp_attr *attr, + enum rdma_qp_attr_mask attr_mask, + struct rdma_qp_init_attr *init_attr); + int (*modify_qp)(struct rdma_qp *qp, struct rdma_qp_attr *attr, + enum rdma_qp_attr_mask attr_mask); + int (*destroy_qp)(struct rdma_qp *qp); + int (*post_send)(struct rdma_qp *qp, struct rdma_send_wr *wr, + struct rdma_send_wr **bad_wr); + int (*post_recv)(struct rdma_qp *qp, struct rdma_recv_wr *wr, + struct rdma_recv_wr **bad_wr); + struct rdma_ah * (*create_ah)(struct rdma_pd *pd, struct rdma_ah_attr *attr); + int (*destroy_ah)(struct rdma_ah *ah); + int (*attach_mcast)(struct rdma_qp *qp, union rdma_gid *gid, + uint16_t lid); + int (*detach_mcast)(struct rdma_qp *qp, union rdma_gid *gid, + uint16_t lid); +}; + +struct rdma_context { + struct rdma_device *device; + struct rdma_context_ops ops; + int cmd_fd; + int async_fd; + int num_comp_vectors; + void 
*abi_compat; +}; + +/** + * rdma_get_device_list - Get list of IB devices currently available + * @num_devices: optional. if non-NULL, set to the number of devices + * returned in the array. + * + * Return a NULL-terminated array of IB devices. The array can be + * released with rdma_free_device_list(). + */ +struct rdma_device **rdma_get_device_list(int *num_devices); + +/** + * rdma_free_device_list - Free list from rdma_get_device_list() + * + * Free an array of devices returned from rdma_get_device_list(). Once + * the array is freed, pointers to devices that were not opened with + * rdma_open_device() are no longer valid. Client code must open all + * devices it intends to use before calling rdma_free_device_list(). + */ +void rdma_free_device_list(struct rdma_device **list); + +/** + * rdma_get_device_name - Return kernel device name + */ +const char *rdma_get_device_name(struct rdma_device *device); + +/** + * rdma_get_device_guid - Return device's node GUID + */ +uint64_t rdma_get_device_guid(struct rdma_device *device); + +/** + * rdma_get_transport_type - Return device's network transport type + */ +static inline enum rdma_transport_type +rdma_get_transport_type(struct rdma_context *context) +{ + if (!context->device) + return RDMA_TRANSPORT_UNKNOWN; + + switch (context->device->node_type) { + case RDMA_NODE_CA: + case RDMA_NODE_SWITCH: + case RDMA_NODE_ROUTER: + return RDMA_TRANSPORT_IB; + case RDMA_NODE_RNIC: + return RDMA_TRANSPORT_IWARP; + default: + return RDMA_TRANSPORT_UNKNOWN; + } +} + +/** + * rdma_get_node_type - Return device's node type + */ +static inline enum rdma_node_type +rdma_get_node_type(struct rdma_context *context) +{ + if (!context->device) + return RDMA_NODE_UNKNOWN; + + return context->device->node_type; +} + +/** + * rdma_open_device - Initialize device for use + */ +struct rdma_context *rdma_open_device(struct rdma_device *device); + +/** + * rdma_close_device - Release device + */ +int rdma_close_device(struct rdma_context 
*context); + +/** + * rdma_get_async_event - Get next async event + * @event: Pointer to use to return async event + * + * All async events returned by rdma_get_async_event() must eventually + * be acknowledged with rdma_ack_async_event(). + */ +int rdma_get_async_event(struct rdma_context *context, + struct rdma_async_event *event); + +/** + * rdma_ack_async_event - Acknowledge an async event + * @event: Event to be acknowledged. + * + * All async events which are returned by rdma_get_async_event() must + * be acknowledged. To avoid races, destroying an object (CQ, SRQ or + * QP) will wait for all affiliated events to be acknowledged, so + * there should be a one-to-one correspondence between acks and + * successful gets. + */ +void rdma_ack_async_event(struct rdma_async_event *event); + +/** + * rdma_query_device - Get device properties + */ +int rdma_query_device(struct rdma_context *context, + struct rdma_device_attr *device_attr); + +/** + * rdma_query_port - Get port properties + */ +int rdma_query_port(struct rdma_context *context, uint8_t port_num, + struct rdma_port_attr *port_attr); + +/** + * rdma_query_gid - Get a GID table entry + */ +int rdma_query_gid(struct rdma_context *context, uint8_t port_num, + int index, union rdma_gid *gid); + +/** + * rdma_query_pkey - Get a P_Key table entry + */ +int rdma_query_pkey(struct rdma_context *context, uint8_t port_num, + int index, uint16_t *pkey); + +/** + * rdma_alloc_pd - Allocate a protection domain + */ +struct rdma_pd *rdma_alloc_pd(struct rdma_context *context); + +/** + * rdma_dealloc_pd - Free a protection domain + */ +int rdma_dealloc_pd(struct rdma_pd *pd); + +/** + * rdma_reg_mr - Register a memory region + */ +struct rdma_mr *rdma_reg_mr(struct rdma_pd *pd, void *addr, + size_t length, enum rdma_access_flags access); + +/** + * rdma_dereg_mr - Deregister a memory region + */ +int rdma_dereg_mr(struct rdma_mr *mr); + +/** + * rdma_create_comp_channel - Create a completion event channel + */ +struct 
rdma_comp_channel *rdma_create_comp_channel(struct rdma_context *context); + +/** + * rdma_destroy_comp_channel - Destroy a completion event channel + */ +int rdma_destroy_comp_channel(struct rdma_comp_channel *channel); + +/** + * rdma_create_cq - Create a completion queue + * @context - Context CQ will be attached to + * @cqe - Minimum number of entries required for CQ + * @cq_context - Consumer-supplied context returned for completion events + * @channel - Completion channel where completion events will be queued. + * May be NULL if completion events will not be used. + * @comp_vector - Completion vector used to signal completion events. + * Must be >= 0 and < context->num_comp_vectors. + */ +struct rdma_cq *rdma_create_cq(struct rdma_context *context, int cqe, + void *cq_context, + struct rdma_comp_channel *channel, + int comp_vector); + +/** + * rdma_resize_cq - Modifies the capacity of the CQ. + * @cq: The CQ to resize. + * @cqe: The minimum size of the CQ. + * + * Users can examine the cq structure to determine the actual CQ size. + */ +int rdma_resize_cq(struct rdma_cq *cq, int cqe); + +/** + * rdma_destroy_cq - Destroy a completion queue + */ +int rdma_destroy_cq(struct rdma_cq *cq); + +/** + * rdma_get_cq_event - Read next CQ event + * @channel: Channel to get next event from. + * @cq: Used to return pointer to CQ. + * @cq_context: Used to return consumer-supplied CQ context. + * + * All completion events returned by rdma_get_cq_event() must + * eventually be acknowledged with rdma_ack_cq_events(). + */ +int rdma_get_cq_event(struct rdma_comp_channel *channel, + struct rdma_cq **cq, void **cq_context); + +/** + * rdma_ack_cq_events - Acknowledge CQ completion events + * @cq: CQ to acknowledge events for + * @nevents: Number of events to acknowledge. + * + * All completion events which are returned by rdma_get_cq_event() must + * be acknowledged. 
To avoid races, rdma_destroy_cq() will wait for + * all completion events to be acknowledged, so there should be a + * one-to-one correspondence between acks and successful gets. An + * application may accumulate multiple completion events and + * acknowledge them in a single call to rdma_ack_cq_events() by passing + * the number of events to ack in @nevents. + */ +void rdma_ack_cq_events(struct rdma_cq *cq, unsigned int nevents); + +/** + * rdma_poll_cq - Poll a CQ for work completions + * @cq:the CQ being polled + * @num_entries:maximum number of completions to return + * @wc:array of at least @num_entries of &struct rdma_wc where completions + * will be returned + * + * Poll a CQ for (possibly multiple) completions. If the return value + * is < 0, an error occurred. If the return value is >= 0, it is the + * number of completions returned. If the return value is + * non-negative and strictly less than num_entries, then the CQ was + * emptied. + */ +static inline int rdma_poll_cq(struct rdma_cq *cq, int num_entries, struct rdma_wc *wc) +{ + return cq->context->ops.poll_cq(cq, num_entries, wc); +} + +/** + * rdma_req_notify_cq - Request completion notification on a CQ. An + * event will be added to the completion channel associated with the + * CQ when an entry is added to the CQ. + * @cq: The completion queue to request notification for. + * @solicited_only: If non-zero, an event will be generated only for + * the next solicited CQ entry. If zero, any CQ entry, solicited or + * not, will generate an event. + */ +static inline int rdma_req_notify_cq(struct rdma_cq *cq, int solicited_only) +{ + return cq->context->ops.req_notify_cq(cq, solicited_only); +} + +/** + * rdma_create_srq - Creates a SRQ associated with the specified protection + * domain. + * @pd: The protection domain associated with the SRQ. + * @srq_init_attr: A list of initial attributes required to create the SRQ. 
+ * + * srq_attr->max_wr and srq_attr->max_sge are read the determine the + * requested size of the SRQ, and set to the actual values allocated + * on return. If rdma_create_srq() succeeds, then max_wr and max_sge + * will always be at least as large as the requested values. + */ +struct rdma_srq *rdma_create_srq(struct rdma_pd *pd, + struct rdma_srq_init_attr *srq_init_attr); + +/** + * rdma_modify_srq - Modifies the attributes for the specified SRQ. + * @srq: The SRQ to modify. + * @srq_attr: On input, specifies the SRQ attributes to modify. On output, + * the current values of selected SRQ attributes are returned. + * @srq_attr_mask: A bit-mask used to specify which attributes of the SRQ + * are being modified. + * + * The mask may contain RDMA_SRQ_MAX_WR to resize the SRQ and/or + * RDMA_SRQ_LIMIT to set the SRQ's limit and request notification when + * the number of receives queued drops below the limit. + */ +int rdma_modify_srq(struct rdma_srq *srq, + struct rdma_srq_attr *srq_attr, + enum rdma_srq_attr_mask srq_attr_mask); + +/** + * rdma_query_srq - Returns the attribute list and current values for the + * specified SRQ. + * @srq: The SRQ to query. + * @srq_attr: The attributes of the specified SRQ. + */ +int rdma_query_srq(struct rdma_srq *srq, struct rdma_srq_attr *srq_attr); + +/** + * rdma_destroy_srq - Destroys the specified SRQ. + * @srq: The SRQ to destroy. + */ +int rdma_destroy_srq(struct rdma_srq *srq); + +/** + * rdma_post_srq_recv - Posts a list of work requests to the specified SRQ. + * @srq: The SRQ to post the work request on. + * @recv_wr: A list of work requests to post on the receive queue. + * @bad_recv_wr: On an immediate failure, this parameter will reference + * the work request that failed to be posted on the QP. 
+ */ +static inline int rdma_post_srq_recv(struct rdma_srq *srq, + struct rdma_recv_wr *recv_wr, + struct rdma_recv_wr **bad_recv_wr) +{ + return srq->context->ops.post_srq_recv(srq, recv_wr, bad_recv_wr); +} + +/** + * rdmav_create_qp - Create a queue pair. + */ +struct rdma_qp *rdmav_create_qp(struct rdma_pd *pd, + struct rdma_qp_init_attr *qp_init_attr); + +/** + * rdma_modify_qp - Modify a queue pair. + */ +int rdma_modify_qp(struct rdma_qp *qp, struct rdma_qp_attr *attr, + enum rdma_qp_attr_mask attr_mask); + +/** + * rdma_query_qp - Returns the attribute list and current values for the + * specified QP. + * @qp: The QP to query. + * @attr: The attributes of the specified QP. + * @attr_mask: A bit-mask used to select specific attributes to query. + * @init_attr: Additional attributes of the selected QP. + * + * The qp_attr_mask may be used to limit the query to gathering only the + * selected attributes. + */ +int rdma_query_qp(struct rdma_qp *qp, struct rdma_qp_attr *attr, + enum rdma_qp_attr_mask attr_mask, + struct rdma_qp_init_attr *init_attr); + +/** + * rdmav_destroy_qp - Destroy a queue pair. + */ +int rdmav_destroy_qp(struct rdma_qp *qp); + +/** + * rdma_post_send - Post a list of work requests to a send queue. + */ +static inline int rdma_post_send(struct rdma_qp *qp, struct rdma_send_wr *wr, + struct rdma_send_wr **bad_wr) +{ + return qp->context->ops.post_send(qp, wr, bad_wr); +} + +/** + * rdma_post_recv - Post a list of work requests to a receive queue. + */ +static inline int rdma_post_recv(struct rdma_qp *qp, struct rdma_recv_wr *wr, + struct rdma_recv_wr **bad_wr) +{ + return qp->context->ops.post_recv(qp, wr, bad_wr); +} + +/** + * rdma_create_ah - Create an address handle. + */ +struct rdma_ah *rdma_create_ah(struct rdma_pd *pd, struct rdma_ah_attr *attr); + +/** + * rdma_destroy_ah - Destroy an address handle. + */ +int rdma_destroy_ah(struct rdma_ah *ah); + +/** + * rdma_attach_mcast - Attaches the specified QP to a multicast group. 
+ * @qp: QP to attach to the multicast group. The QP must be a UD QP. + * @gid: Multicast group GID. + * @lid: Multicast group LID in host byte order. + * + * In order to route multicast packets correctly, subnet + * administration must have created the multicast group and configured + * the fabric appropriately. The port associated with the specified + * QP must also be a member of the multicast group. + */ +int rdma_attach_mcast(struct rdma_qp *qp, union rdma_gid *gid, uint16_t lid); + +/** + * rdma_detach_mcast - Detaches the specified QP from a multicast group. + * @qp: QP to detach from the multicast group. + * @gid: Multicast group GID. + * @lid: Multicast group LID in host byte order. + */ +int rdma_detach_mcast(struct rdma_qp *qp, union rdma_gid *gid, uint16_t lid); + +END_C_DECLS + +# undef __attribute_const + +#endif /* INFINIBAND_VERBS_H */ From krkumar2 at in.ibm.com Mon Jul 10 03:06:54 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Mon, 10 Jul 2006 15:36:54 +0530 Subject: [openib-general] [RFC] [PATCH 0/7] Transport Neutral Verbs Proposal Message-ID: <1152526014.27956.8.camel@localhost.localdomain> This patchset is a proposal to create a new librdmaverbs library (similar to libibverbs) with the intention to remove transport specific knowledge from the library. Patch 1/7 - librdmaverbs config files. Patch 2/7 - librdmaverbs config files 2. Patch 3/7 - librdmaverbs header files. Patch 4/7 - librdmaverbs src files. Patch 5/7 - librdmaverbs libtool files. Patch 6/7 - librdmacm changes to use new API. Patch 7/7 - libamso changes to use new API. Please provide comments. 
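To make the transport-neutral intent of the header patch concrete, here is a minimal sketch of how a consumer could branch on transport rather than on IB-specific node types. It assumes the enum values shown in the rdma/verbs.h patch above and copies them locally so it compiles without librdmaverbs. node_to_transport() and transport_name() are hypothetical helper names used only for illustration; they are not part of the proposal. The mapping itself mirrors the inline rdma_get_transport_type() from the header.

```c
/* Local copies of two enums from the proposed rdma/verbs.h, so that this
 * sketch builds without librdmaverbs installed. */
enum rdma_node_type {
	RDMA_NODE_UNKNOWN = -1,
	RDMA_NODE_CA = 1,
	RDMA_NODE_SWITCH,
	RDMA_NODE_ROUTER,
	RDMA_NODE_RNIC
};

enum rdma_transport_type {
	RDMA_TRANSPORT_UNKNOWN = 0,
	RDMA_TRANSPORT_IB = 1,
	RDMA_TRANSPORT_IWARP = 2
};

/* Hypothetical helper (not in the patch): same mapping as the inline
 * rdma_get_transport_type(), but keyed directly on the node type. */
enum rdma_transport_type node_to_transport(enum rdma_node_type node_type)
{
	switch (node_type) {
	case RDMA_NODE_CA:
	case RDMA_NODE_SWITCH:
	case RDMA_NODE_ROUTER:
		/* All IB node types share the InfiniBand transport. */
		return RDMA_TRANSPORT_IB;
	case RDMA_NODE_RNIC:
		/* RNICs are iWARP devices. */
		return RDMA_TRANSPORT_IWARP;
	default:
		return RDMA_TRANSPORT_UNKNOWN;
	}
}

/* Hypothetical helper (not in the patch): printable transport name. */
const char *transport_name(enum rdma_transport_type transport)
{
	switch (transport) {
	case RDMA_TRANSPORT_IB:
		return "InfiniBand";
	case RDMA_TRANSPORT_IWARP:
		return "iWARP";
	default:
		return "unknown";
	}
}
```

With the real library, a caller would presumably pass an opened context to rdma_get_transport_type() instead of a bare node type; the sketch only isolates the mapping so it can be checked on its own.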
Thanks, - KK From krkumar2 at in.ibm.com Mon Jul 10 03:18:11 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Mon, 10 Jul 2006 15:48:11 +0530 Subject: [openib-general] [RFC] [PATCH 2/7] ibrdmaverbs config files 2 Message-ID: <1152526691.28048.2.camel@localhost.localdomain> I am sending this as an attachment (bzip2) as it is large and I am getting a "glibc free" error for large files :( -------------- next part -------------- A non-text attachment was scrubbed... Name: diff.rdma.config_dir.bz2 Type: application/x-bzip Size: 67209 bytes Desc: not available URL: From krkumar2 at in.ibm.com Mon Jul 10 03:22:25 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Mon, 10 Jul 2006 15:52:25 +0530 Subject: [openib-general] [RFC] [PATCH 2/7] ibrdmaverbs src files Message-ID: <1152526945.28048.7.camel@localhost.localdomain> This library provides equivalent functionality to libibverbs, but changes the data types and verb APIs to be transport neutral. This patch contains the source files. Signed-off-by: Krishna Kumar diff -ruNp ORG/librdmaverbs/src/cmd.c NEW/librdmaverbs/src/cmd.c --- ORG/librdmaverbs/src/cmd.c 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/src/cmd.c 2006-07-10 18:07:47.000000000 -0700 @@ -0,0 +1,1060 @@ +/* + * Copyright (c) 2005 Topspin Communications. All rights reserved. + * Copyright (c) 2005 PathScale, Inc. All rights reserved. + * Copyright (c) 2006 Cisco Systems, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses.
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + * $Id: cmd.c 7631 2006-06-02 19:53:25Z swise $ + */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include +#include +#include +#include + +#include "rdmaverbs.h" + +static int rdma_cmd_get_context_v2(struct rdma_context *context, + struct rdma_get_context *new_cmd, + size_t new_cmd_size, + struct rdma_get_context_resp *resp, + size_t resp_size) +{ + struct rdma_abi_compat_v2 *t; + struct rdma_get_context_v2 *cmd; + size_t cmd_size; + uint32_t cq_fd; + + t = malloc(sizeof *t); + if (!t) + return ENOMEM; + pthread_mutex_init(&t->in_use, NULL); + + cmd_size = sizeof *cmd + new_cmd_size - sizeof *new_cmd; + cmd = alloca(cmd_size); + memcpy(cmd->driver_data, new_cmd->driver_data, new_cmd_size - sizeof *new_cmd); + + RDMA_INIT_CMD_RESP(cmd, cmd_size, GET_CONTEXT, resp, resp_size); + cmd->cq_fd_tab = (uintptr_t) &cq_fd; + + if (write(context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + context->async_fd = resp->async_fd; + context->num_comp_vectors = 1; + t->channel.fd = cq_fd; + context->abi_compat = t; + + return 0; +} + +int rdma_cmd_get_context(struct rdma_context *context, struct rdma_get_context *cmd, + size_t cmd_size, struct rdma_get_context_resp *resp, + size_t resp_size) +{ + if (abi_ver <= 2) + return rdma_cmd_get_context_v2(context, cmd, cmd_size, resp, resp_size); + + RDMA_INIT_CMD_RESP(cmd, cmd_size, GET_CONTEXT, resp, resp_size); + + if (write(context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + context->async_fd = resp->async_fd; + context->num_comp_vectors = resp->num_comp_vectors; + + return 0; +} + +int rdma_cmd_query_device(struct rdma_context *context, + struct rdma_device_attr *device_attr, + uint64_t *raw_fw_ver, + struct rdma_query_device *cmd, size_t cmd_size) +{ + struct rdma_query_device_resp resp; + + RDMA_INIT_CMD_RESP(cmd, cmd_size, QUERY_DEVICE, &resp, sizeof resp); + + if (write(context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + memset(device_attr->fw_ver, 0, 
sizeof device_attr->fw_ver); + *raw_fw_ver = resp.fw_ver; + device_attr->node_guid = resp.node_guid; + device_attr->sys_image_guid = resp.sys_image_guid; + device_attr->max_mr_size = resp.max_mr_size; + device_attr->page_size_cap = resp.page_size_cap; + device_attr->vendor_id = resp.vendor_id; + device_attr->vendor_part_id = resp.vendor_part_id; + device_attr->hw_ver = resp.hw_ver; + device_attr->max_qp = resp.max_qp; + device_attr->max_qp_wr = resp.max_qp_wr; + device_attr->device_cap_flags = resp.device_cap_flags; + device_attr->max_sge = resp.max_sge; + device_attr->max_sge_rd = resp.max_sge_rd; + device_attr->max_cq = resp.max_cq; + device_attr->max_cqe = resp.max_cqe; + device_attr->max_mr = resp.max_mr; + device_attr->max_pd = resp.max_pd; + device_attr->max_qp_rd_atom = resp.max_qp_rd_atom; + device_attr->max_ee_rd_atom = resp.max_ee_rd_atom; + device_attr->max_res_rd_atom = resp.max_res_rd_atom; + device_attr->max_qp_init_rd_atom = resp.max_qp_init_rd_atom; + device_attr->max_ee_init_rd_atom = resp.max_ee_init_rd_atom; + device_attr->atomic_cap = resp.atomic_cap; + device_attr->max_ee = resp.max_ee; + device_attr->max_rdd = resp.max_rdd; + device_attr->max_mw = resp.max_mw; + device_attr->max_raw_ipv6_qp = resp.max_raw_ipv6_qp; + device_attr->max_raw_ethy_qp = resp.max_raw_ethy_qp; + device_attr->max_mcast_grp = resp.max_mcast_grp; + device_attr->max_mcast_qp_attach = resp.max_mcast_qp_attach; + device_attr->max_total_mcast_qp_attach = resp.max_total_mcast_qp_attach; + device_attr->max_ah = resp.max_ah; + device_attr->max_fmr = resp.max_fmr; + device_attr->max_map_per_fmr = resp.max_map_per_fmr; + device_attr->max_srq = resp.max_srq; + device_attr->max_srq_wr = resp.max_srq_wr; + device_attr->max_srq_sge = resp.max_srq_sge; + device_attr->max_pkeys = resp.max_pkeys; + device_attr->local_ca_ack_delay = resp.local_ca_ack_delay; + device_attr->phys_port_cnt = resp.phys_port_cnt; + + return 0; +} + +int rdma_cmd_query_port(struct rdma_context *context, uint8_t 
port_num, + struct rdma_port_attr *port_attr, + struct rdma_query_port *cmd, size_t cmd_size) +{ + struct rdma_query_port_resp resp; + + RDMA_INIT_CMD_RESP(cmd, cmd_size, QUERY_PORT, &resp, sizeof resp); + cmd->port_num = port_num; + + if (write(context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + port_attr->state = resp.state; + port_attr->max_mtu = resp.max_mtu; + port_attr->active_mtu = resp.active_mtu; + port_attr->gid_tbl_len = resp.gid_tbl_len; + port_attr->port_cap_flags = resp.port_cap_flags; + port_attr->max_msg_sz = resp.max_msg_sz; + port_attr->bad_pkey_cntr = resp.bad_pkey_cntr; + port_attr->qkey_viol_cntr = resp.qkey_viol_cntr; + port_attr->pkey_tbl_len = resp.pkey_tbl_len; + port_attr->lid = resp.lid; + port_attr->sm_lid = resp.sm_lid; + port_attr->lmc = resp.lmc; + port_attr->max_vl_num = resp.max_vl_num; + port_attr->sm_sl = resp.sm_sl; + port_attr->subnet_timeout = resp.subnet_timeout; + port_attr->init_type_reply = resp.init_type_reply; + port_attr->active_width = resp.active_width; + port_attr->active_speed = resp.active_speed; + port_attr->phys_state = resp.phys_state; + + return 0; +} + +int rdma_cmd_alloc_pd(struct rdma_context *context, struct rdma_pd *pd, + struct rdma_alloc_pd *cmd, size_t cmd_size, + struct rdma_alloc_pd_resp *resp, size_t resp_size) +{ + RDMA_INIT_CMD_RESP(cmd, cmd_size, ALLOC_PD, resp, resp_size); + + if (write(context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + pd->handle = resp->pd_handle; + + return 0; +} + +int rdma_cmd_dealloc_pd(struct rdma_pd *pd) +{ + struct rdma_dealloc_pd cmd; + + RDMA_INIT_CMD(&cmd, sizeof cmd, DEALLOC_PD); + cmd.pd_handle = pd->handle; + + if (write(pd->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) + return errno; + + return 0; +} + +int rdma_cmd_reg_mr(struct rdma_pd *pd, void *addr, size_t length, + uint64_t hca_va, enum rdma_access_flags access, + struct rdma_mr *mr, struct rdma_reg_mr *cmd, + size_t cmd_size) +{ + struct rdma_reg_mr_resp resp; + + 
RDMA_INIT_CMD_RESP(cmd, cmd_size, REG_MR, &resp, sizeof resp); + + cmd->start = (uintptr_t) addr; + cmd->length = length; + cmd->hca_va = hca_va; + cmd->pd_handle = pd->handle; + cmd->access_flags = access; + + if (write(pd->context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + mr->handle = resp.mr_handle; + mr->lkey = resp.lkey; + mr->rkey = resp.rkey; + + return 0; +} + +int rdma_cmd_dereg_mr(struct rdma_mr *mr) +{ + struct rdma_dereg_mr cmd; + + RDMA_INIT_CMD(&cmd, sizeof cmd, DEREG_MR); + cmd.mr_handle = mr->handle; + + if (write(mr->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) + return errno; + + return 0; +} + +static int rdma_cmd_create_cq_v2(struct rdma_context *context, int cqe, + struct rdma_cq *cq, + struct rdma_create_cq *new_cmd, size_t new_cmd_size, + struct rdma_create_cq_resp *resp, size_t resp_size) +{ + struct rdma_create_cq_v2 *cmd; + size_t cmd_size; + + cmd_size = sizeof *cmd + new_cmd_size - sizeof *new_cmd; + cmd = alloca(cmd_size); + memcpy(cmd->driver_data, new_cmd->driver_data, new_cmd_size - sizeof *new_cmd); + + RDMA_INIT_CMD_RESP(cmd, cmd_size, CREATE_CQ, resp, resp_size); + cmd->user_handle = (uintptr_t) cq; + cmd->cqe = cqe; + cmd->event_handler = 0; + + if (write(context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + cq->handle = resp->cq_handle; + cq->cqe = resp->cqe; + + return 0; +} + +int rdma_cmd_create_cq(struct rdma_context *context, int cqe, + struct rdma_comp_channel *channel, + int comp_vector, struct rdma_cq *cq, + struct rdma_create_cq *cmd, size_t cmd_size, + struct rdma_create_cq_resp *resp, size_t resp_size) +{ + if (abi_ver <= 2) + return rdma_cmd_create_cq_v2(context, cqe, cq, + cmd, cmd_size, resp, resp_size); + + RDMA_INIT_CMD_RESP(cmd, cmd_size, CREATE_CQ, resp, resp_size); + cmd->user_handle = (uintptr_t) cq; + cmd->cqe = cqe; + cmd->comp_vector = comp_vector; + cmd->comp_channel = channel ? 
channel->fd : -1; + cmd->reserved = 0; + + if (write(context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + cq->handle = resp->cq_handle; + cq->cqe = resp->cqe; + + return 0; +} + +int rdma_cmd_poll_cq(struct rdma_cq *ibcq, int ne, struct rdma_wc *wc) +{ + struct rdma_poll_cq cmd; + struct rdma_poll_cq_resp *resp; + int i; + int rsize; + int ret; + + rsize = sizeof *resp + ne * sizeof(struct rdma_kern_wc); + resp = malloc(rsize); + if (!resp) + return -1; + + RDMA_INIT_CMD_RESP(&cmd, sizeof cmd, POLL_CQ, resp, rsize); + cmd.cq_handle = ibcq->handle; + cmd.ne = ne; + + if (write(ibcq->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) { + ret = -1; + goto out; + } + + for (i = 0; i < resp->count; i++) { + wc[i].wr_id = resp->wc[i].wr_id; + wc[i].status = resp->wc[i].status; + wc[i].opcode = resp->wc[i].opcode; + wc[i].vendor_err = resp->wc[i].vendor_err; + wc[i].byte_len = resp->wc[i].byte_len; + wc[i].imm_data = resp->wc[i].imm_data; + wc[i].qp_num = resp->wc[i].qp_num; + wc[i].src_qp = resp->wc[i].src_qp; + wc[i].wc_flags = resp->wc[i].wc_flags; + wc[i].pkey_index = resp->wc[i].pkey_index; + wc[i].slid = resp->wc[i].slid; + wc[i].sl = resp->wc[i].sl; + wc[i].dlid_path_bits = resp->wc[i].dlid_path_bits; + } + + ret = resp->count; + +out: + free(resp); + return ret; +} + +int rdma_cmd_req_notify_cq(struct rdma_cq *ibcq, int solicited_only) +{ + struct rdma_req_notify_cq cmd; + + RDMA_INIT_CMD(&cmd, sizeof cmd, REQ_NOTIFY_CQ); + cmd.cq_handle = ibcq->handle; + cmd.solicited = !!solicited_only; + + if (write(ibcq->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) + return errno; + + return 0; +} + +int rdma_cmd_resize_cq(struct rdma_cq *cq, int cqe, + struct rdma_resize_cq *cmd, size_t cmd_size) +{ + struct rdma_resize_cq_resp resp; + + RDMA_INIT_CMD_RESP(cmd, cmd_size, RESIZE_CQ, &resp, sizeof resp); + cmd->cq_handle = cq->handle; + cmd->cqe = cqe; + + if (write(cq->context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + cq->cqe = resp.cqe; + + 
return 0; +} + +static int rdma_cmd_destroy_cq_v1(struct rdma_cq *cq) +{ + struct rdma_destroy_cq_v1 cmd; + + RDMA_INIT_CMD(&cmd, sizeof cmd, DESTROY_CQ); + cmd.cq_handle = cq->handle; + + if (write(cq->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) + return errno; + + return 0; +} + +int rdma_cmd_destroy_cq(struct rdma_cq *cq) +{ + struct rdma_destroy_cq cmd; + struct rdma_destroy_cq_resp resp; + + if (abi_ver == 1) + return rdma_cmd_destroy_cq_v1(cq); + + RDMA_INIT_CMD_RESP(&cmd, sizeof cmd, DESTROY_CQ, &resp, sizeof resp); + cmd.cq_handle = cq->handle; + + if (write(cq->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) + return errno; + + pthread_mutex_lock(&cq->mutex); + while (cq->comp_events_completed != resp.comp_events_reported || + cq->async_events_completed != resp.async_events_reported) + pthread_cond_wait(&cq->cond, &cq->mutex); + pthread_mutex_unlock(&cq->mutex); + + return 0; +} + +int rdma_cmd_create_srq(struct rdma_pd *pd, + struct rdma_srq *srq, struct rdma_srq_init_attr *attr, + struct rdma_create_srq *cmd, size_t cmd_size, + struct rdma_create_srq_resp *resp, size_t resp_size) +{ + RDMA_INIT_CMD_RESP(cmd, cmd_size, CREATE_SRQ, resp, resp_size); + cmd->user_handle = (uintptr_t) srq; + cmd->pd_handle = pd->handle; + cmd->max_wr = attr->attr.max_wr; + cmd->max_sge = attr->attr.max_sge; + cmd->srq_limit = attr->attr.srq_limit; + + if (write(pd->context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + srq->handle = resp->srq_handle; + + if (abi_ver > 5) { + attr->attr.max_wr = resp->max_wr; + attr->attr.max_sge = resp->max_sge; + } else { + struct rdma_create_srq_resp_v5 *resp_v5 = + (struct rdma_create_srq_resp_v5 *) resp; + + memmove((void *) resp + sizeof *resp, + (void *) resp_v5 + sizeof *resp_v5, + resp_size - sizeof *resp); + } + + return 0; +} + +static int rdma_cmd_modify_srq_v3(struct rdma_srq *srq, + struct rdma_srq_attr *srq_attr, + enum rdma_srq_attr_mask srq_attr_mask, + struct rdma_modify_srq *new_cmd, + size_t new_cmd_size) 
+{ + struct rdma_modify_srq_v3 *cmd; + size_t cmd_size; + + cmd_size = sizeof *cmd + new_cmd_size - sizeof *new_cmd; + cmd = alloca(cmd_size); + memcpy(cmd->driver_data, new_cmd->driver_data, new_cmd_size - sizeof *new_cmd); + + RDMA_INIT_CMD(cmd, cmd_size, MODIFY_SRQ); + + cmd->srq_handle = srq->handle; + cmd->attr_mask = srq_attr_mask; + cmd->max_wr = srq_attr->max_wr; + cmd->srq_limit = srq_attr->srq_limit; + cmd->max_sge = 0; + cmd->reserved = 0; + + if (write(srq->context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + return 0; +} + +int rdma_cmd_modify_srq(struct rdma_srq *srq, + struct rdma_srq_attr *srq_attr, + enum rdma_srq_attr_mask srq_attr_mask, + struct rdma_modify_srq *cmd, size_t cmd_size) +{ + if (abi_ver == 3) + return rdma_cmd_modify_srq_v3(srq, srq_attr, srq_attr_mask, + cmd, cmd_size); + + RDMA_INIT_CMD(cmd, cmd_size, MODIFY_SRQ); + + cmd->srq_handle = srq->handle; + cmd->attr_mask = srq_attr_mask; + cmd->max_wr = srq_attr->max_wr; + cmd->srq_limit = srq_attr->srq_limit; + + if (write(srq->context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + return 0; +} + +int rdma_cmd_query_srq(struct rdma_srq *srq, struct rdma_srq_attr *srq_attr, + struct rdma_query_srq *cmd, size_t cmd_size) +{ + struct rdma_query_srq_resp resp; + + RDMA_INIT_CMD_RESP(cmd, cmd_size, QUERY_SRQ, &resp, sizeof resp); + cmd->srq_handle = srq->handle; + + if (write(srq->context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + srq_attr->max_wr = resp.max_wr; + srq_attr->max_sge = resp.max_sge; + srq_attr->srq_limit = resp.srq_limit; + + return 0; +} + +static int rdma_cmd_destroy_srq_v1(struct rdma_srq *srq) +{ + struct rdma_destroy_srq_v1 cmd; + + RDMA_INIT_CMD(&cmd, sizeof cmd, DESTROY_SRQ); + cmd.srq_handle = srq->handle; + + if (write(srq->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) + return errno; + + return 0; +} + +int rdma_cmd_destroy_srq(struct rdma_srq *srq) +{ + struct rdma_destroy_srq cmd; + struct rdma_destroy_srq_resp resp; + + if 
(abi_ver == 1) + return rdma_cmd_destroy_srq_v1(srq); + + RDMA_INIT_CMD_RESP(&cmd, sizeof cmd, DESTROY_SRQ, &resp, sizeof resp); + cmd.srq_handle = srq->handle; + + if (write(srq->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) + return errno; + + pthread_mutex_lock(&srq->mutex); + while (srq->events_completed != resp.events_reported) + pthread_cond_wait(&srq->cond, &srq->mutex); + pthread_mutex_unlock(&srq->mutex); + + return 0; +} + +int rdma_cmd_create_qp(struct rdma_pd *pd, + struct rdma_qp *qp, struct rdma_qp_init_attr *attr, + struct rdma_create_qp *cmd, size_t cmd_size, + struct rdma_create_qp_resp *resp, size_t resp_size) +{ + RDMA_INIT_CMD_RESP(cmd, cmd_size, CREATE_QP, resp, resp_size); + + cmd->user_handle = (uintptr_t) qp; + cmd->pd_handle = pd->handle; + cmd->send_cq_handle = attr->send_cq->handle; + cmd->recv_cq_handle = attr->recv_cq->handle; + cmd->srq_handle = attr->srq ? attr->srq->handle : 0; + cmd->max_send_wr = attr->cap.max_send_wr; + cmd->max_recv_wr = attr->cap.max_recv_wr; + cmd->max_send_sge = attr->cap.max_send_sge; + cmd->max_recv_sge = attr->cap.max_recv_sge; + cmd->max_inline_data = attr->cap.max_inline_data; + cmd->sq_sig_all = attr->sq_sig_all; + cmd->qp_type = attr->qp_type; + cmd->is_srq = !!attr->srq; + + if (write(pd->context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + qp->handle = resp->qp_handle; + qp->qp_num = resp->qpn; + + if (abi_ver > 3) { + attr->cap.max_recv_sge = resp->max_recv_sge; + attr->cap.max_send_sge = resp->max_send_sge; + attr->cap.max_recv_wr = resp->max_recv_wr; + attr->cap.max_send_wr = resp->max_send_wr; + attr->cap.max_inline_data = resp->max_inline_data; + } + + if (abi_ver == 4) { + struct rdma_create_qp_resp_v4 *resp_v4 = + (struct rdma_create_qp_resp_v4 *) resp; + + memmove((void *) resp + sizeof *resp, + (void *) resp_v4 + sizeof *resp_v4, + resp_size - sizeof *resp); + } else if (abi_ver <= 3) { + struct rdma_create_qp_resp_v3 *resp_v3 = + (struct rdma_create_qp_resp_v3 *) resp; + + 
memmove((void *) resp + sizeof *resp, + (void *) resp_v3 + sizeof *resp_v3, + resp_size - sizeof *resp); + } + + return 0; +} + +int rdma_cmd_query_qp(struct rdma_qp *qp, struct rdma_qp_attr *attr, + enum rdma_qp_attr_mask attr_mask, + struct rdma_qp_init_attr *init_attr, + struct rdma_query_qp *cmd, size_t cmd_size) +{ + struct rdma_query_qp_resp resp; + + RDMA_INIT_CMD_RESP(cmd, cmd_size, QUERY_QP, &resp, sizeof resp); + cmd->qp_handle = qp->handle; + cmd->attr_mask = attr_mask; + + if (write(qp->context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + attr->qkey = resp.qkey; + attr->rq_psn = resp.rq_psn; + attr->sq_psn = resp.sq_psn; + attr->dest_qp_num = resp.dest_qp_num; + attr->qp_access_flags = resp.qp_access_flags; + attr->pkey_index = resp.pkey_index; + attr->alt_pkey_index = resp.alt_pkey_index; + attr->qp_state = resp.qp_state; + attr->cur_qp_state = resp.cur_qp_state; + attr->path_mtu = resp.path_mtu; + attr->path_mig_state = resp.path_mig_state; + attr->en_sqd_async_notify = resp.en_sqd_async_notify; + attr->max_rd_atomic = resp.max_rd_atomic; + attr->max_dest_rd_atomic = resp.max_dest_rd_atomic; + attr->min_rnr_timer = resp.min_rnr_timer; + attr->port_num = resp.port_num; + attr->timeout = resp.timeout; + attr->retry_cnt = resp.retry_cnt; + attr->rnr_retry = resp.rnr_retry; + attr->alt_port_num = resp.alt_port_num; + attr->alt_timeout = resp.alt_timeout; + attr->cap.max_send_wr = resp.max_send_wr; + attr->cap.max_recv_wr = resp.max_recv_wr; + attr->cap.max_send_sge = resp.max_send_sge; + attr->cap.max_recv_sge = resp.max_recv_sge; + attr->cap.max_inline_data = resp.max_inline_data; + + memcpy(attr->ah_attr.grh.dgid.raw, resp.dest.dgid, 16); + attr->ah_attr.grh.flow_label = resp.dest.flow_label; + attr->ah_attr.dlid = resp.dest.dlid; + attr->ah_attr.grh.sgid_index = resp.dest.sgid_index; + attr->ah_attr.grh.hop_limit = resp.dest.hop_limit; + attr->ah_attr.grh.traffic_class = resp.dest.traffic_class; + attr->ah_attr.sl = resp.dest.sl; + 
attr->ah_attr.src_path_bits = resp.dest.src_path_bits; + attr->ah_attr.static_rate = resp.dest.static_rate; + attr->ah_attr.is_global = resp.dest.is_global; + attr->ah_attr.port_num = resp.dest.port_num; + + memcpy(attr->alt_ah_attr.grh.dgid.raw, resp.alt_dest.dgid, 16); + attr->alt_ah_attr.grh.flow_label = resp.alt_dest.flow_label; + attr->alt_ah_attr.dlid = resp.alt_dest.dlid; + attr->alt_ah_attr.grh.sgid_index = resp.alt_dest.sgid_index; + attr->alt_ah_attr.grh.hop_limit = resp.alt_dest.hop_limit; + attr->alt_ah_attr.grh.traffic_class = resp.alt_dest.traffic_class; + attr->alt_ah_attr.sl = resp.alt_dest.sl; + attr->alt_ah_attr.src_path_bits = resp.alt_dest.src_path_bits; + attr->alt_ah_attr.static_rate = resp.alt_dest.static_rate; + attr->alt_ah_attr.is_global = resp.alt_dest.is_global; + attr->alt_ah_attr.port_num = resp.alt_dest.port_num; + + init_attr->qp_context = qp->qp_context; + init_attr->send_cq = qp->send_cq; + init_attr->recv_cq = qp->recv_cq; + init_attr->srq = qp->srq; + init_attr->qp_type = qp->qp_type; + init_attr->cap.max_send_wr = resp.max_send_wr; + init_attr->cap.max_recv_wr = resp.max_recv_wr; + init_attr->cap.max_send_sge = resp.max_send_sge; + init_attr->cap.max_recv_sge = resp.max_recv_sge; + init_attr->cap.max_inline_data = resp.max_inline_data; + init_attr->sq_sig_all = resp.sq_sig_all; + + return 0; +} + +int rdma_cmd_modify_qp(struct rdma_qp *qp, struct rdma_qp_attr *attr, + enum rdma_qp_attr_mask attr_mask, + struct rdma_modify_qp *cmd, size_t cmd_size) +{ + RDMA_INIT_CMD(cmd, cmd_size, MODIFY_QP); + + cmd->qp_handle = qp->handle; + cmd->attr_mask = attr_mask; + cmd->qkey = attr->qkey; + cmd->rq_psn = attr->rq_psn; + cmd->sq_psn = attr->sq_psn; + cmd->dest_qp_num = attr->dest_qp_num; + cmd->qp_access_flags = attr->qp_access_flags; + cmd->pkey_index = attr->pkey_index; + cmd->alt_pkey_index = attr->alt_pkey_index; + cmd->qp_state = attr->qp_state; + cmd->cur_qp_state = attr->cur_qp_state; + cmd->path_mtu = attr->path_mtu; + 
cmd->path_mig_state = attr->path_mig_state; + cmd->en_sqd_async_notify = attr->en_sqd_async_notify; + cmd->max_rd_atomic = attr->max_rd_atomic; + cmd->max_dest_rd_atomic = attr->max_dest_rd_atomic; + cmd->min_rnr_timer = attr->min_rnr_timer; + cmd->port_num = attr->port_num; + cmd->timeout = attr->timeout; + cmd->retry_cnt = attr->retry_cnt; + cmd->rnr_retry = attr->rnr_retry; + cmd->alt_port_num = attr->alt_port_num; + cmd->alt_timeout = attr->alt_timeout; + + memcpy(cmd->dest.dgid, attr->ah_attr.grh.dgid.raw, 16); + cmd->dest.flow_label = attr->ah_attr.grh.flow_label; + cmd->dest.dlid = attr->ah_attr.dlid; + cmd->dest.sgid_index = attr->ah_attr.grh.sgid_index; + cmd->dest.hop_limit = attr->ah_attr.grh.hop_limit; + cmd->dest.traffic_class = attr->ah_attr.grh.traffic_class; + cmd->dest.sl = attr->ah_attr.sl; + cmd->dest.src_path_bits = attr->ah_attr.src_path_bits; + cmd->dest.static_rate = attr->ah_attr.static_rate; + cmd->dest.is_global = attr->ah_attr.is_global; + cmd->dest.port_num = attr->ah_attr.port_num; + + memcpy(cmd->alt_dest.dgid, attr->alt_ah_attr.grh.dgid.raw, 16); + cmd->alt_dest.flow_label = attr->alt_ah_attr.grh.flow_label; + cmd->alt_dest.dlid = attr->alt_ah_attr.dlid; + cmd->alt_dest.sgid_index = attr->alt_ah_attr.grh.sgid_index; + cmd->alt_dest.hop_limit = attr->alt_ah_attr.grh.hop_limit; + cmd->alt_dest.traffic_class = attr->alt_ah_attr.grh.traffic_class; + cmd->alt_dest.sl = attr->alt_ah_attr.sl; + cmd->alt_dest.src_path_bits = attr->alt_ah_attr.src_path_bits; + cmd->alt_dest.static_rate = attr->alt_ah_attr.static_rate; + cmd->alt_dest.is_global = attr->alt_ah_attr.is_global; + cmd->alt_dest.port_num = attr->alt_ah_attr.port_num; + + if (write(qp->context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + return 0; +} + +static int rdma_cmd_destroy_qp_v1(struct rdma_qp *qp) +{ + struct rdma_destroy_qp_v1 cmd; + + RDMA_INIT_CMD(&cmd, sizeof cmd, DESTROY_QP); + cmd.qp_handle = qp->handle; + + if (write(qp->context->cmd_fd, &cmd, sizeof cmd) 
!= sizeof cmd) + return errno; + + return 0; +} + +int rdma_cmd_post_send(struct rdma_qp *ibqp, struct rdma_send_wr *wr, + struct rdma_send_wr **bad_wr) +{ + struct rdma_post_send *cmd; + struct rdma_post_send_resp resp; + struct rdma_send_wr *i; + struct rdma_kern_send_wr *n, *tmp; + struct rdma_sge *s; + unsigned wr_count = 0; + unsigned sge_count = 0; + int cmd_size; + int ret = 0; + + for (i = wr; i; i = i->next) { + wr_count++; + sge_count += i->num_sge; + } + + cmd_size = sizeof *cmd + wr_count * sizeof *n + sge_count * sizeof *s; + cmd = alloca(cmd_size); + + RDMA_INIT_CMD_RESP(cmd, cmd_size, POST_SEND, &resp, sizeof resp); + cmd->qp_handle = ibqp->handle; + cmd->wr_count = wr_count; + cmd->sge_count = sge_count; + cmd->wqe_size = sizeof *n; + + n = (struct rdma_kern_send_wr *) ((void *) cmd + sizeof *cmd); + s = (struct rdma_sge *) (n + wr_count); + + tmp = n; + for (i = wr; i; i = i->next) { + tmp->wr_id = i->wr_id; + tmp->num_sge = i->num_sge; + tmp->opcode = i->opcode; + tmp->send_flags = i->send_flags; + tmp->imm_data = i->imm_data; + if (ibqp->qp_type == RDMA_QPT_UD) { + tmp->wr.ud.ah = i->wr.ud.ah->handle; + tmp->wr.ud.remote_qpn = i->wr.ud.remote_qpn; + tmp->wr.ud.remote_qkey = i->wr.ud.remote_qkey; + } else { + switch(i->opcode) { + case RDMA_WR_RDMA_WRITE: + case RDMA_WR_RDMA_WRITE_WITH_IMM: + case RDMA_WR_RDMA_READ: + tmp->wr.rdma.remote_addr = + i->wr.rdma.remote_addr; + tmp->wr.rdma.rkey = i->wr.rdma.rkey; + break; + case RDMA_WR_ATOMIC_CMP_AND_SWP: + case RDMA_WR_ATOMIC_FETCH_AND_ADD: + tmp->wr.atomic.remote_addr = + i->wr.atomic.remote_addr; + tmp->wr.atomic.compare_add = + i->wr.atomic.compare_add; + tmp->wr.atomic.swap = i->wr.atomic.swap; + tmp->wr.atomic.rkey = i->wr.atomic.rkey; + break; + default: + break; + } + } + + if (tmp->num_sge) { + memcpy(s, i->sg_list, tmp->num_sge * sizeof *s); + s += tmp->num_sge; + } + + tmp++; + } + + resp.bad_wr = 0; + if (write(ibqp->context->cmd_fd, cmd, cmd_size) != cmd_size) + ret = errno; + + wr_count 
= resp.bad_wr; + if (wr_count) { + i = wr; + while (--wr_count) + i = i->next; + *bad_wr = i; + } + + return ret; +} + +int rdma_cmd_post_recv(struct rdma_qp *ibqp, struct rdma_recv_wr *wr, + struct rdma_recv_wr **bad_wr) +{ + struct rdma_post_recv *cmd; + struct rdma_post_recv_resp resp; + struct rdma_recv_wr *i; + struct rdma_kern_recv_wr *n, *tmp; + struct rdma_sge *s; + unsigned wr_count = 0; + unsigned sge_count = 0; + int cmd_size; + int ret = 0; + + for (i = wr; i; i = i->next) { + wr_count++; + sge_count += i->num_sge; + } + + cmd_size = sizeof *cmd + wr_count * sizeof *n + sge_count * sizeof *s; + cmd = alloca(cmd_size); + + RDMA_INIT_CMD_RESP(cmd, cmd_size, POST_RECV, &resp, sizeof resp); + cmd->qp_handle = ibqp->handle; + cmd->wr_count = wr_count; + cmd->sge_count = sge_count; + cmd->wqe_size = sizeof *n; + + n = (struct rdma_kern_recv_wr *) ((void *) cmd + sizeof *cmd); + s = (struct rdma_sge *) (n + wr_count); + + tmp = n; + for (i = wr; i; i = i->next) { + tmp->wr_id = i->wr_id; + tmp->num_sge = i->num_sge; + + if (tmp->num_sge) { + memcpy(s, i->sg_list, tmp->num_sge * sizeof *s); + s += tmp->num_sge; + } + + tmp++; + } + + resp.bad_wr = 0; + if (write(ibqp->context->cmd_fd, cmd, cmd_size) != cmd_size) + ret = errno; + + wr_count = resp.bad_wr; + if (wr_count) { + i = wr; + while (--wr_count) + i = i->next; + *bad_wr = i; + } + + return ret; +} + +int rdma_cmd_post_srq_recv(struct rdma_srq *srq, struct rdma_recv_wr *wr, + struct rdma_recv_wr **bad_wr) +{ + struct rdma_post_srq_recv *cmd; + struct rdma_post_srq_recv_resp resp; + struct rdma_recv_wr *i; + struct rdma_kern_recv_wr *n, *tmp; + struct rdma_sge *s; + unsigned wr_count = 0; + unsigned sge_count = 0; + int cmd_size; + int ret = 0; + + for (i = wr; i; i = i->next) { + wr_count++; + sge_count += i->num_sge; + } + + cmd_size = sizeof *cmd + wr_count * sizeof *n + sge_count * sizeof *s; + cmd = alloca(cmd_size); + + RDMA_INIT_CMD_RESP(cmd, cmd_size, POST_SRQ_RECV, &resp, sizeof resp); + 
cmd->srq_handle = srq->handle; + cmd->wr_count = wr_count; + cmd->sge_count = sge_count; + cmd->wqe_size = sizeof *n; + + n = (struct rdma_kern_recv_wr *) ((void *) cmd + sizeof *cmd); + s = (struct rdma_sge *) (n + wr_count); + + tmp = n; + for (i = wr; i; i = i->next) { + tmp->wr_id = i->wr_id; + tmp->num_sge = i->num_sge; + + if (tmp->num_sge) { + memcpy(s, i->sg_list, tmp->num_sge * sizeof *s); + s += tmp->num_sge; + } + + tmp++; + } + + resp.bad_wr = 0; + if (write(srq->context->cmd_fd, cmd, cmd_size) != cmd_size) + ret = errno; + + wr_count = resp.bad_wr; + if (wr_count) { + i = wr; + while (--wr_count) + i = i->next; + *bad_wr = i; + } + + return ret; +} + +int rdma_cmd_create_ah(struct rdma_pd *pd, struct rdma_ah *ah, + struct rdma_ah_attr *attr) +{ + struct rdma_create_ah cmd; + struct rdma_create_ah_resp resp; + + RDMA_INIT_CMD_RESP(&cmd, sizeof cmd, CREATE_AH, &resp, sizeof resp); + cmd.user_handle = (uintptr_t) ah; + cmd.pd_handle = pd->handle; + cmd.attr.dlid = attr->dlid; + cmd.attr.sl = attr->sl; + cmd.attr.src_path_bits = attr->src_path_bits; + cmd.attr.static_rate = attr->static_rate; + cmd.attr.is_global = attr->is_global; + cmd.attr.port_num = attr->port_num; + cmd.attr.grh.flow_label = attr->grh.flow_label; + cmd.attr.grh.sgid_index = attr->grh.sgid_index; + cmd.attr.grh.hop_limit = attr->grh.hop_limit; + cmd.attr.grh.traffic_class = attr->grh.traffic_class; + memcpy(cmd.attr.grh.dgid, attr->grh.dgid.raw, 16); + + if (write(pd->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) + return errno; + + ah->handle = resp.handle; + + return 0; +} + +int rdma_cmd_destroy_ah(struct rdma_ah *ah) +{ + struct rdma_destroy_ah cmd; + + RDMA_INIT_CMD(&cmd, sizeof cmd, DESTROY_AH); + cmd.ah_handle = ah->handle; + + if (write(ah->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) + return errno; + + return 0; +} + +int rdma_cmd_destroy_qp(struct rdma_qp *qp) +{ + struct rdma_destroy_qp cmd; + struct rdma_destroy_qp_resp resp; + + if (abi_ver == 1) + return 
rdma_cmd_destroy_qp_v1(qp); + + RDMA_INIT_CMD_RESP(&cmd, sizeof cmd, DESTROY_QP, &resp, sizeof resp); + cmd.qp_handle = qp->handle; + + if (write(qp->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) + return errno; + + pthread_mutex_lock(&qp->mutex); + while (qp->events_completed != resp.events_reported) + pthread_cond_wait(&qp->cond, &qp->mutex); + pthread_mutex_unlock(&qp->mutex); + + return 0; +} + +int rdma_cmd_attach_mcast(struct rdma_qp *qp, union rdma_gid *gid, uint16_t lid) +{ + struct rdma_attach_mcast cmd; + + RDMA_INIT_CMD(&cmd, sizeof cmd, ATTACH_MCAST); + memcpy(cmd.gid, gid->raw, sizeof cmd.gid); + cmd.qp_handle = qp->handle; + cmd.mlid = lid; + + if (write(qp->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) + return errno; + + return 0; +} + +int rdma_cmd_detach_mcast(struct rdma_qp *qp, union rdma_gid *gid, uint16_t lid) +{ + struct rdma_detach_mcast cmd; + + RDMA_INIT_CMD(&cmd, sizeof cmd, DETACH_MCAST); + memcpy(cmd.gid, gid->raw, sizeof cmd.gid); + cmd.qp_handle = qp->handle; + cmd.mlid = lid; + + if (write(qp->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) + return errno; + + return 0; +} diff -ruNp ORG/librdmaverbs/src/device.c NEW/librdmaverbs/src/device.c --- ORG/librdmaverbs/src/device.c 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/src/device.c 2006-07-10 18:07:47.000000000 -0700 @@ -0,0 +1,270 @@ +/* + * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved. + * Copyright (c) 2006 Cisco Systems, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + * $Id: device.c 7631 2006-06-02 19:53:25Z swise $ + */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include "rdmaverbs.h" + +static pthread_mutex_t device_list_lock = PTHREAD_MUTEX_INITIALIZER; +static int num_devices; +static struct rdma_device **device_list; + +struct rdma_device **rdma_get_device_list(int *num) +{ + struct rdma_device **l; + int i; + + pthread_mutex_lock(&device_list_lock); + + if (!num_devices) + num_devices = rdmaverbs_init(&device_list); + + l = calloc(num_devices + 1, sizeof (struct rdma_device *)); + for (i = 0; l && i < num_devices; ++i) + l[i] = device_list[i]; + + pthread_mutex_unlock(&device_list_lock); + + if (num) + *num = l ? num_devices : 0; + + return l; +} + +void rdma_free_device_list(struct rdma_device **list) +{ + free(list); +} + +const char *rdma_get_device_name(struct rdma_device *device) +{ + return device->name; +} + +uint64_t rdma_get_device_guid(struct rdma_device *device) +{ + char attr[24]; + uint64_t guid = 0; + uint16_t parts[4]; + int i; + + if (rdma_read_sysfs_file(device->ibdev_path, "node_guid", + attr, sizeof attr) < 0) + return 0; + + if (sscanf(attr, "%hx:%hx:%hx:%hx", + parts, parts + 1, parts + 2, parts + 3) != 4) + return 0; + + for (i = 0; i < 4; ++i) + guid = (guid << 16) | parts[i]; + + return htonll(guid); +} + +static enum rdma_node_type query_node_type(struct rdma_device *device) +{ + char node_desc[24]; + char node_str[24]; + int node_type = RDMA_NODE_UNKNOWN; + + if (rdma_read_sysfs_file(device->ibdev_path, "node_type", + node_desc, sizeof(node_desc)) < 0) + return RDMA_NODE_UNKNOWN; + + sscanf(node_desc, "%d: %s\n", (int*)&node_type, node_str); + return (enum rdma_node_type) node_type; +} + +struct rdma_context *rdma_open_device(struct rdma_device *device) +{ + char *devpath; + int cmd_fd; + struct rdma_context *context; + + asprintf(&devpath, "/dev/infiniband/%s", device->dev_name); + + /* + * We'll only
be doing writes, but we need O_RDWR in case the + * provider needs to mmap() the file. + */ + cmd_fd = open(devpath, O_RDWR); + free(devpath); + + if (cmd_fd < 0) + return NULL; + + device->node_type = query_node_type(device); + + context = device->ops.alloc_context(device, cmd_fd); + if (!context) + goto err; + + context->device = device; + context->cmd_fd = cmd_fd; + + return context; + +err: + close(cmd_fd); + + return NULL; +} + +int rdma_close_device(struct rdma_context *context) +{ + int async_fd = context->async_fd; + int cmd_fd = context->cmd_fd; + int cq_fd = -1; + + if (abi_ver <= 2) { + struct rdma_abi_compat_v2 *t = context->abi_compat; + cq_fd = t->channel.fd; + free(context->abi_compat); + } + + context->device->ops.free_context(context); + + close(async_fd); + close(cmd_fd); + if (abi_ver <= 2) + close(cq_fd); + + return 0; +} + +int rdma_get_async_event(struct rdma_context *context, + struct rdma_async_event *event) +{ + struct rdma_kern_async_event ev; + + if (read(context->async_fd, &ev, sizeof ev) != sizeof ev) + return -1; + + event->event_type = ev.event_type; + + switch (event->event_type) { + case RDMA_EVENT_CQ_ERR: + event->element.cq = (void *) (uintptr_t) ev.element; + break; + + case RDMA_EVENT_QP_FATAL: + case RDMA_EVENT_QP_REQ_ERR: + case RDMA_EVENT_QP_ACCESS_ERR: + case RDMA_EVENT_COMM_EST: + case RDMA_EVENT_SQ_DRAINED: + case RDMA_EVENT_PATH_MIG: + case RDMA_EVENT_PATH_MIG_ERR: + case RDMA_EVENT_QP_LAST_WQE_REACHED: + event->element.qp = (void *) (uintptr_t) ev.element; + break; + + case RDMA_EVENT_SRQ_ERR: + case RDMA_EVENT_SRQ_LIMIT_REACHED: + event->element.srq = (void *) (uintptr_t) ev.element; + break; + + default: + event->element.port_num = ev.element; + break; + } + + return 0; +} + +void rdma_ack_async_event(struct rdma_async_event *event) +{ + switch (event->event_type) { + case RDMA_EVENT_CQ_ERR: + { + struct rdma_cq *cq = event->element.cq; + + pthread_mutex_lock(&cq->mutex); + ++cq->async_events_completed; + 
pthread_cond_signal(&cq->cond); + pthread_mutex_unlock(&cq->mutex); + + return; + } + + case RDMA_EVENT_QP_FATAL: + case RDMA_EVENT_QP_REQ_ERR: + case RDMA_EVENT_QP_ACCESS_ERR: + case RDMA_EVENT_COMM_EST: + case RDMA_EVENT_SQ_DRAINED: + case RDMA_EVENT_PATH_MIG: + case RDMA_EVENT_PATH_MIG_ERR: + case RDMA_EVENT_QP_LAST_WQE_REACHED: + { + struct rdma_qp *qp = event->element.qp; + + pthread_mutex_lock(&qp->mutex); + ++qp->events_completed; + pthread_cond_signal(&qp->cond); + pthread_mutex_unlock(&qp->mutex); + + return; + } + + case RDMA_EVENT_SRQ_ERR: + case RDMA_EVENT_SRQ_LIMIT_REACHED: + { + struct rdma_srq *srq = event->element.srq; + + pthread_mutex_lock(&srq->mutex); + ++srq->events_completed; + pthread_cond_signal(&srq->cond); + pthread_mutex_unlock(&srq->mutex); + + return; + } + + default: + return; + } +} diff -ruNp ORG/librdmaverbs/src/init.c NEW/librdmaverbs/src/init.c --- ORG/librdmaverbs/src/init.c 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/src/init.c 2006-07-10 18:07:47.000000000 -0700 @@ -0,0 +1,274 @@ +/* + * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved. + * Copyright (c) 2006 Cisco Systems, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id: init.c 7631 2006-06-02 19:53:25Z swise $ + */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "rdmaverbs.h" + +#ifndef OPENRDMA_DRIVER_PATH_ENV +# define OPENRDMA_DRIVER_PATH_ENV "OPENRDMA_DRIVER_PATH" +#endif + +HIDDEN int abi_ver; + +static char default_path[] = DRIVER_PATH; +static const char *user_path; + +static struct rdma_driver *driver_list; + +static void load_driver(char *so_path) +{ + void *dlhandle; + rdma_driver_init_func init_func; + struct rdma_driver *driver; + + dlhandle = dlopen(so_path, RTLD_NOW); + if (!dlhandle) { + fprintf(stderr, PFX "Warning: couldn't load driver %s: %s\n", + so_path, dlerror()); + return; + } + + dlerror(); + init_func = dlsym(dlhandle, "rdma_driver_init"); + if (dlerror() != NULL || !init_func) { + dlclose(dlhandle); + return; + } + + driver = malloc(sizeof *driver); + if (!driver) { + fprintf(stderr, PFX "Fatal: couldn't allocate driver for %s\n", so_path); + dlclose(dlhandle); + return; + } + + driver->init_func = init_func; + driver->next = driver_list; + driver_list = driver; +} + +static void find_drivers(char *dir) +{ + size_t len = strlen(dir); + glob_t so_glob; + char *pat; + int ret; + int 
i; + + if (!len) + return; + + while (len && dir[len - 1] == '/') + dir[--len] = '\0'; + + asprintf(&pat, "%s/*.so", dir); + + ret = glob(pat, 0, NULL, &so_glob); + free(pat); + + if (ret) { + if (ret != GLOB_NOMATCH) + fprintf(stderr, PFX "Warning: couldn't search %s\n", dir); + return; + } + + for (i = 0; i < so_glob.gl_pathc; ++i) + load_driver(so_glob.gl_pathv[i]); + + globfree(&so_glob); +} + +static struct rdma_device *init_drivers(const char *class_path, + const char *dev_name) +{ + struct rdma_driver *driver; + struct rdma_device *dev; + int abi_ver = 0; + char sys_path[RDMA_SYSFS_PATH_MAX]; + char ibdev_name[RDMA_SYSFS_NAME_MAX]; + char value[8]; + + snprintf(sys_path, sizeof sys_path, "%s/%s", + class_path, dev_name); + + if (rdma_read_sysfs_file(sys_path, "abi_version", value, sizeof value) > 0) + abi_ver = strtol(value, NULL, 10); + + if (rdma_read_sysfs_file(sys_path, "ibdev", ibdev_name, sizeof ibdev_name) < 0) { + fprintf(stderr, PFX "Warning: no ibdev class attr for %s\n", + sys_path); + return NULL; + } + + for (driver = driver_list; driver; driver = driver->next) { + dev = driver->init_func(sys_path, abi_ver); + if (!dev) + continue; + + dev->driver = driver; + strcpy(dev->dev_path, sys_path); + snprintf(dev->ibdev_path, RDMA_SYSFS_PATH_MAX, "%s/class/infiniband/%s", + rdma_get_sysfs_path(), ibdev_name); + strcpy(dev->dev_name, dev_name); + strcpy(dev->name, ibdev_name); + + return dev; + } + + fprintf(stderr, PFX "Warning: no userspace device-specific driver found for %s\n" + " driver search path: ", dev_name); + if (user_path) + fprintf(stderr, "%s:", user_path); + fprintf(stderr, "%s\n", default_path); + + return NULL; +} + +static int check_abi_version(const char *path) +{ + char value[8]; + + if (rdma_read_sysfs_file(path, "class/infiniband_verbs/abi_version", + value, sizeof value) < 0) { + fprintf(stderr, PFX "Fatal: couldn't read uverbs ABI version.\n"); + return -1; + } + + abi_ver = strtol(value, NULL, 10); + + if (abi_ver <
RDMA_USER_VERBS_MIN_ABI_VERSION || + abi_ver > RDMA_USER_VERBS_MAX_ABI_VERSION) { + fprintf(stderr, PFX "Fatal: kernel ABI version %d " + "doesn't match library version %d.\n", + abi_ver, RDMA_USER_VERBS_MAX_ABI_VERSION); + return -1; + } + + return 0; +} + +HIDDEN int rdmaverbs_init(struct rdma_device ***list) +{ + const char *sysfs_path; + char *wr_path, *dir; + char class_path[RDMA_SYSFS_PATH_MAX]; + DIR *class_dir; + struct dirent *dent; + struct rdma_device *device; + struct rdma_device **new_list; + int num_devices = 0; + int list_size = 0; + + *list = NULL; + + if (rdma_init_mem_map()) + return 0; + + find_drivers(default_path); + + /* + * Only follow use path passed in through the calling user's + * environment if we're not running SUID. + */ + if (getuid() == geteuid()) { + user_path = getenv(OPENRDMA_DRIVER_PATH_ENV); + if (user_path) { + wr_path = strdupa(user_path); + while ((dir = strsep(&wr_path, ";:"))) + find_drivers(dir); + } + } + + /* + * Now check if a driver is statically linked. Since we push + * drivers onto our driver list, the last driver we find will + * be the first one we try. + */ + load_driver(NULL); + + sysfs_path = rdma_get_sysfs_path(); + if (!sysfs_path) { + fprintf(stderr, PFX "Fatal: couldn't find sysfs mount.\n"); + return 0; + } + + if (check_abi_version(sysfs_path)) + return 0; + + snprintf(class_path, sizeof class_path, "%s/class/infiniband_verbs", + sysfs_path); + class_dir = opendir(class_path); + if (!class_dir) { + fprintf(stderr, PFX "Fatal: couldn't open sysfs class " + "directory '%s'.\n", class_path); + return 0; + } + + while ((dent = readdir(class_dir))) { + if (dent->d_name[0] == '.' || dent->d_type == DT_REG) + continue; + + device = init_drivers(class_path, dent->d_name); + if (!device) + continue; + + if (list_size <= num_devices) { + list_size = list_size ? 
list_size * 2 : 1; + new_list = realloc(*list, list_size * sizeof (struct rdma_device *)); + if (!new_list) + goto out; + *list = new_list; + } + + (*list)[num_devices++] = device; + } + + closedir(class_dir); + +out: + return num_devices; +} diff -ruNp ORG/librdmaverbs/src/librdmaverbs.map NEW/librdmaverbs/src/librdmaverbs.map --- ORG/librdmaverbs/src/librdmaverbs.map 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/src/librdmaverbs.map 2006-07-10 18:07:47.000000000 -0700 @@ -0,0 +1,76 @@ +IBVERBS_1.0 { + global: + rdma_get_device_list; + rdma_free_device_list; + rdma_get_device_name; + rdma_get_device_guid; + rdma_open_device; + rdma_close_device; + rdma_get_async_event; + rdma_ack_async_event; + rdma_query_device; + rdma_query_port; + rdma_query_gid; + rdma_query_pkey; + rdma_alloc_pd; + rdma_dealloc_pd; + rdma_reg_mr; + rdma_dereg_mr; + rdma_create_comp_channel; + rdma_destroy_comp_channel; + rdma_create_cq; + rdma_resize_cq; + rdma_destroy_cq; + rdma_get_cq_event; + rdma_ack_cq_events; + rdma_create_srq; + rdma_modify_srq; + rdma_query_srq; + rdma_destroy_srq; + rdmav_create_qp; + rdma_query_qp; + rdma_modify_qp; + rdmav_destroy_qp; + rdma_create_ah; + rdma_destroy_ah; + rdma_attach_mcast; + rdma_detach_mcast; + rdma_cmd_get_context; + rdma_cmd_query_device; + rdma_cmd_query_port; + rdma_cmd_query_gid; + rdma_cmd_query_pkey; + rdma_cmd_alloc_pd; + rdma_cmd_dealloc_pd; + rdma_cmd_reg_mr; + rdma_cmd_dereg_mr; + rdma_cmd_create_cq; + rdma_cmd_poll_cq; + rdma_cmd_req_notify_cq; + rdma_cmd_resize_cq; + rdma_cmd_destroy_cq; + rdma_cmd_create_srq; + rdma_cmd_modify_srq; + rdma_cmd_query_srq; + rdma_cmd_destroy_srq; + rdma_cmd_create_qp; + rdma_cmd_query_qp; + rdma_cmd_modify_qp; + rdma_cmd_destroy_qp; + rdma_cmd_post_send; + rdma_cmd_post_recv; + rdma_cmd_post_srq_recv; + rdma_cmd_create_ah; + rdma_cmd_destroy_ah; + rdma_cmd_attach_mcast; + rdma_cmd_detach_mcast; + rdma_copy_qp_attr_from_kern; + rdma_copy_path_rec_from_kern; + rdma_copy_path_rec_to_kern; + 
rdma_rate_to_mult; + mult_to_rdma_rate; + rdma_get_sysfs_path; + rdma_read_sysfs_file; + + local: *; +}; diff -ruNp ORG/librdmaverbs/src/marshall.c NEW/librdmaverbs/src/marshall.c --- ORG/librdmaverbs/src/marshall.c 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/src/marshall.c 2006-07-10 18:07:47.000000000 -0700 @@ -0,0 +1,142 @@ +/* + * Copyright (c) 2005 Intel Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include + +#include + +static void rdma_copy_ah_attr_from_kern(struct rdma_ah_attr *dst, + struct rdma_kern_ah_attr *src) +{ + memcpy(dst->grh.dgid.raw, src->grh.dgid, sizeof dst->grh.dgid); + dst->grh.flow_label = src->grh.flow_label; + dst->grh.sgid_index = src->grh.sgid_index; + dst->grh.hop_limit = src->grh.hop_limit; + dst->grh.traffic_class = src->grh.traffic_class; + + dst->dlid = src->dlid; + dst->sl = src->sl; + dst->src_path_bits = src->src_path_bits; + dst->static_rate = src->static_rate; + dst->is_global = src->is_global; + dst->port_num = src->port_num; +} + +void rdma_copy_qp_attr_from_kern(struct rdma_qp_attr *dst, + struct rdma_kern_qp_attr *src) +{ + dst->cur_qp_state = src->cur_qp_state; + dst->path_mtu = src->path_mtu; + dst->path_mig_state = src->path_mig_state; + dst->qkey = src->qkey; + dst->rq_psn = src->rq_psn; + dst->sq_psn = src->sq_psn; + dst->dest_qp_num = src->dest_qp_num; + dst->qp_access_flags = src->qp_access_flags; + + dst->cap.max_send_wr = src->max_send_wr; + dst->cap.max_recv_wr = src->max_recv_wr; + dst->cap.max_send_sge = src->max_send_sge; + dst->cap.max_recv_sge = src->max_recv_sge; + dst->cap.max_inline_data = src->max_inline_data; + + rdma_copy_ah_attr_from_kern(&dst->ah_attr, &src->ah_attr); + rdma_copy_ah_attr_from_kern(&dst->alt_ah_attr, &src->alt_ah_attr); + + dst->pkey_index = src->pkey_index; + dst->alt_pkey_index = src->alt_pkey_index; + dst->en_sqd_async_notify = src->en_sqd_async_notify; + dst->sq_draining = src->sq_draining; + dst->max_rd_atomic = src->max_rd_atomic; + dst->max_dest_rd_atomic = src->max_dest_rd_atomic; + dst->min_rnr_timer = src->min_rnr_timer; + dst->port_num = src->port_num; + dst->timeout = src->timeout; + dst->retry_cnt = src->retry_cnt; + dst->rnr_retry = src->rnr_retry; + dst->alt_port_num = src->alt_port_num; + dst->alt_timeout = src->alt_timeout; +} + +void rdma_copy_path_rec_from_kern(struct rdma_sa_path_rec *dst, + 
struct rdma_kern_path_rec *src) +{ + memcpy(dst->dgid.raw, src->dgid, sizeof dst->dgid); + memcpy(dst->sgid.raw, src->sgid, sizeof dst->sgid); + + dst->dlid = src->dlid; + dst->slid = src->slid; + dst->raw_traffic = src->raw_traffic; + dst->flow_label = src->flow_label; + dst->hop_limit = src->hop_limit; + dst->traffic_class = src->traffic_class; + dst->reversible = src->reversible; + dst->numb_path = src->numb_path; + dst->pkey = src->pkey; + dst->sl = src->sl; + dst->mtu_selector = src->mtu_selector; + dst->mtu = src->mtu; + dst->rate_selector = src->rate_selector; + dst->rate = src->rate; + dst->packet_life_time = src->packet_life_time; + dst->preference = src->preference; + dst->packet_life_time_selector = src->packet_life_time_selector; +} + +void rdma_copy_path_rec_to_kern(struct rdma_kern_path_rec *dst, + struct rdma_sa_path_rec *src) +{ + memcpy(dst->dgid, src->dgid.raw, sizeof src->dgid); + memcpy(dst->sgid, src->sgid.raw, sizeof src->sgid); + + dst->dlid = src->dlid; + dst->slid = src->slid; + dst->raw_traffic = src->raw_traffic; + dst->flow_label = src->flow_label; + dst->hop_limit = src->hop_limit; + dst->traffic_class = src->traffic_class; + dst->reversible = src->reversible; + dst->numb_path = src->numb_path; + dst->pkey = src->pkey; + dst->sl = src->sl; + dst->mtu_selector = src->mtu_selector; + dst->mtu = src->mtu; + dst->rate_selector = src->rate_selector; + dst->rate = src->rate; + dst->packet_life_time = src->packet_life_time; + dst->preference = src->preference; + dst->packet_life_time_selector = src->packet_life_time_selector; +} diff -ruNp ORG/librdmaverbs/src/memory.c NEW/librdmaverbs/src/memory.c --- ORG/librdmaverbs/src/memory.c 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/src/memory.c 2006-07-10 18:07:47.000000000 -0700 @@ -0,0 +1,258 @@ +/* + * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id: memory.c 6987 2006-05-08 15:18:51Z tom $ + */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include +#include + +#include "rdmaverbs.h" + +/* + * We keep a linked list of page ranges that have been locked along with a + * reference count to manage overlapping registrations, etc. + * + * Eventually we should turn this into an RB-tree or something similar + * to avoid the O(n) cost of registering/unregistering memory. 
+ */ + +struct rdma_mem_node { + struct rdma_mem_node *prev, *next; + uintptr_t start, end; + int refcnt; +}; + +static struct { + struct rdma_mem_node *first; + pthread_mutex_t mutex; + uintptr_t page_size; +} mem_map; + +int rdma_init_mem_map(void) +{ + struct rdma_mem_node *node = NULL; + + node = malloc(sizeof *node); + if (!node) + goto fail; + + node->prev = node->next = NULL; + node->start = 0; + node->end = UINTPTR_MAX; + node->refcnt = 0; + + mem_map.first = node; + + mem_map.page_size = sysconf(_SC_PAGESIZE); + if (mem_map.page_size < 0) + goto fail; + + if (pthread_mutex_init(&mem_map.mutex, NULL)) + goto fail; + + return 0; + +fail: + if (node) + free(node); + + return -1; +} + +static struct rdma_mem_node *__mm_find_first(uintptr_t start, uintptr_t end) +{ + struct rdma_mem_node *node = mem_map.first; + + while (node) { + if ((node->start <= start && node->end >= start) || + (node->start <= end && node->end >= end)) + break; + node = node->next; + } + + return node; +} + +static struct rdma_mem_node *__mm_prev(struct rdma_mem_node *node) +{ + return node->prev; +} + +static struct rdma_mem_node *__mm_next(struct rdma_mem_node *node) +{ + return node->next; +} + +static void __mm_add(struct rdma_mem_node *node, + struct rdma_mem_node *new) +{ + new->prev = node; + new->next = node->next; + node->next = new; + if (new->next) + new->next->prev = new; +} + +static void __mm_remove(struct rdma_mem_node *node) +{ + /* Never have to remove the first node, so we can use prev */ + node->prev->next = node->next; + if (node->next) + node->next->prev = node->prev; +} + +int rdma_lock_range(void *base, size_t size) +{ + uintptr_t start, end; + struct rdma_mem_node *node, *tmp; + int ret = 0; + + if (!size) + return 0; + + start = (uintptr_t) base & ~(mem_map.page_size - 1); + end = ((uintptr_t) (base + size + mem_map.page_size - 1) & + ~(mem_map.page_size - 1)) - 1; + + pthread_mutex_lock(&mem_map.mutex); + + node = __mm_find_first(start, end); + + if (node->start 
< start) { + tmp = malloc(sizeof *tmp); + if (!tmp) { + ret = -1; + goto out; + } + + tmp->start = start; + tmp->end = node->end; + tmp->refcnt = node->refcnt; + node->end = start - 1; + + __mm_add(node, tmp); + node = tmp; + } + + while (node->start <= end) { + if (node->end > end) { + tmp = malloc(sizeof *tmp); + if (!tmp) { + ret = -1; + goto out; + } + + tmp->start = end + 1; + tmp->end = node->end; + tmp->refcnt = node->refcnt; + node->end = end; + + __mm_add(node, tmp); + } + + + if (node->refcnt++ == 0) { + ret = mlock((void *) node->start, + node->end - node->start + 1); + if (ret) + goto out; + } + + node = __mm_next(node); + } + +out: + pthread_mutex_unlock(&mem_map.mutex); + + return ret; +} + +int rdma_unlock_range(void *base, size_t size) +{ + uintptr_t start, end; + struct rdma_mem_node *node, *tmp; + int ret = 0; + + if (!size) + return 0; + + start = (uintptr_t) base & ~(mem_map.page_size - 1); + end = ((uintptr_t) (base + size + mem_map.page_size - 1) & + ~(mem_map.page_size - 1)) - 1; + + pthread_mutex_lock(&mem_map.mutex); + + node = __mm_find_first(start, end); + + if (node->start != start) { + ret = -1; + goto out; + } + + while (node && node->end <= end) { + if (--node->refcnt == 0) { + ret = munlock((void *) node->start, + node->end - node->start + 1); + } + + if (__mm_prev(node) && node->refcnt == __mm_prev(node)->refcnt) { + __mm_prev(node)->end = node->end; + tmp = __mm_prev(node); + __mm_remove(node); + node = tmp; + } + + node = __mm_next(node); + } + + if (node && node->refcnt == __mm_prev(node)->refcnt) { + __mm_prev(node)->end = node->end; + tmp = __mm_prev(node); + __mm_remove(node); + } + + if (node->end != end) { + ret = -1; + goto out; + } + +out: + pthread_mutex_unlock(&mem_map.mutex); + + return ret; +} diff -ruNp ORG/librdmaverbs/src/rdmaverbs.h NEW/librdmaverbs/src/rdmaverbs.h --- ORG/librdmaverbs/src/rdmaverbs.h 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/src/rdmaverbs.h 2006-07-10 18:07:47.000000000 -0700 @@ 
-0,0 +1,87 @@ +/* + * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ + +#ifndef RDMA_VERBS_H +#define RDMA_VERBS_H + +#include + +#include + +#define HIDDEN __attribute__((visibility ("hidden"))) + +#define INIT __attribute__((constructor)) +#define FINI __attribute__((destructor)) + +#define PFX "librdmaverbs: " + +struct rdma_driver { + rdma_driver_init_func init_func; + struct rdma_driver *next; +}; + +struct rdma_abi_compat_v2 { + struct rdma_comp_channel channel; + pthread_mutex_t in_use; +}; + +extern HIDDEN int abi_ver; + +extern HIDDEN int rdmaverbs_init(struct rdma_device ***list); + +extern HIDDEN int rdma_init_mem_map(void); +extern HIDDEN int rdma_lock_range(void *base, size_t size); +extern HIDDEN int rdma_unlock_range(void *base, size_t size); + +#define RDMA_INIT_CMD(cmd, size, opcode) \ + do { \ + if (abi_ver > 2) \ + (cmd)->command = RDMA_USER_VERBS_CMD_##opcode; \ + else \ + (cmd)->command = RDMA_USER_VERBS_CMD_##opcode##_V2; \ + (cmd)->in_words = (size) / 4; \ + (cmd)->out_words = 0; \ + } while (0) + +#define RDMA_INIT_CMD_RESP(cmd, size, opcode, out, outsize) \ + do { \ + if (abi_ver > 2) \ + (cmd)->command = RDMA_USER_VERBS_CMD_##opcode; \ + else \ + (cmd)->command = RDMA_USER_VERBS_CMD_##opcode##_V2; \ + (cmd)->in_words = (size) / 4; \ + (cmd)->out_words = (outsize) / 4; \ + (cmd)->response = (uintptr_t) (out); \ + } while (0) + +#endif /* RDMA_VERBS_H */ diff -ruNp ORG/librdmaverbs/src/sysfs.c NEW/librdmaverbs/src/sysfs.c --- ORG/librdmaverbs/src/sysfs.c 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/src/sysfs.c 2006-07-10 18:07:47.000000000 -0700 @@ -0,0 +1,104 @@ +/* + * Copyright (c) 2006 Cisco Systems, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id$ + */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include +#include +#include +#include +#include + +#include "rdmaverbs.h" + +static char *sysfs_path; + +const char *rdma_get_sysfs_path(void) +{ + char *env = NULL; + + if (sysfs_path) + return sysfs_path; + + /* + * Only follow use path passed in through the calling user's + * environment if we're not running SUID. 
+ */ + if (getuid() == geteuid()) + env = getenv("SYSFS_PATH"); + + if (env) { + int len; + + sysfs_path = strndup(env, RDMA_SYSFS_PATH_MAX); + len = strlen(sysfs_path); + while (len > 0 && sysfs_path[len - 1] == '/') { + --len; + sysfs_path[len] = '\0'; + } + } else + sysfs_path = "/sys"; + + return sysfs_path; +} + +int rdma_read_sysfs_file(const char *dir, const char *file, + char *buf, size_t size) +{ + char *path; + int fd; + int len; + + asprintf(&path, "%s/%s", dir, file); + + fd = open(path, O_RDONLY); + if (fd < 0) { + free(path); + return -1; + } + + len = read(fd, buf, size); + + close(fd); + free(path); + + if (len > 0 && buf[len - 1] == '\n') + buf[--len] = '\0'; + + return len; +} diff -ruNp ORG/librdmaverbs/src/verbs.c NEW/librdmaverbs/src/verbs.c --- ORG/librdmaverbs/src/verbs.c 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/src/verbs.c 2006-07-10 18:07:47.000000000 -0700 @@ -0,0 +1,408 @@ +/* + * Copyright (c) 2005 Topspin Communications. All rights reserved. + * Copyright (c) 2006 Cisco Systems, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id: verbs.c 7631 2006-06-02 19:53:25Z swise $ + */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include +#include +#include + +#include "rdmaverbs.h" + +int rdma_rate_to_mult(enum rdma_rate rate) +{ + switch (rate) { + case RDMA_RATE_2_5_GBPS: return 1; + case RDMA_RATE_5_GBPS: return 2; + case RDMA_RATE_10_GBPS: return 4; + case RDMA_RATE_20_GBPS: return 8; + case RDMA_RATE_30_GBPS: return 12; + case RDMA_RATE_40_GBPS: return 16; + case RDMA_RATE_60_GBPS: return 24; + case RDMA_RATE_80_GBPS: return 32; + case RDMA_RATE_120_GBPS: return 48; + default: return -1; + } +} + +enum rdma_rate mult_to_rdma_rate(int mult) +{ + switch (mult) { + case 1: return RDMA_RATE_2_5_GBPS; + case 2: return RDMA_RATE_5_GBPS; + case 4: return RDMA_RATE_10_GBPS; + case 8: return RDMA_RATE_20_GBPS; + case 12: return RDMA_RATE_30_GBPS; + case 16: return RDMA_RATE_40_GBPS; + case 24: return RDMA_RATE_60_GBPS; + case 32: return RDMA_RATE_80_GBPS; + case 48: return RDMA_RATE_120_GBPS; + default: return RDMA_RATE_MAX; + } +} + +int rdma_query_device(struct rdma_context *context, + struct rdma_device_attr *device_attr) +{ + return context->ops.query_device(context, device_attr); +} + +int rdma_query_port(struct rdma_context *context, uint8_t port_num, + struct rdma_port_attr *port_attr) +{ + return context->ops.query_port(context, port_num, port_attr); +} + +int rdma_query_gid(struct rdma_context *context, uint8_t port_num, + int index, union rdma_gid *gid) 
+{ + char name[24]; + char attr[41]; + uint16_t val; + int i; + + snprintf(name, sizeof name, "ports/%d/gids/%d", port_num, index); + + if (rdma_read_sysfs_file(context->device->ibdev_path, name, + attr, sizeof attr) < 0) + return -1; + + for (i = 0; i < 8; ++i) { + if (sscanf(attr + i * 5, "%hx", &val) != 1) + return -1; + gid->raw[i * 2 ] = val >> 8; + gid->raw[i * 2 + 1] = val & 0xff; + } + + return 0; +} + +int rdma_query_pkey(struct rdma_context *context, uint8_t port_num, + int index, uint16_t *pkey) +{ + char name[24]; + char attr[8]; + uint16_t val; + + snprintf(name, sizeof name, "ports/%d/pkeys/%d", port_num, index); + + if (rdma_read_sysfs_file(context->device->ibdev_path, name, + attr, sizeof attr) < 0) + return -1; + + if (sscanf(attr, "%hx", &val) != 1) + return -1; + + *pkey = htons(val); + return 0; +} + +struct rdma_pd *rdma_alloc_pd(struct rdma_context *context) +{ + struct rdma_pd *pd; + + pd = context->ops.alloc_pd(context); + if (pd) + pd->context = context; + + return pd; +} + +int rdma_dealloc_pd(struct rdma_pd *pd) +{ + return pd->context->ops.dealloc_pd(pd); +} + +struct rdma_mr *rdma_reg_mr(struct rdma_pd *pd, void *addr, + size_t length, enum rdma_access_flags access) +{ + struct rdma_mr *mr; + + mr = pd->context->ops.reg_mr(pd, addr, length, access); + if (mr) { + mr->context = pd->context; + mr->pd = pd; + } + + return mr; +} + +int rdma_dereg_mr(struct rdma_mr *mr) +{ + return mr->context->ops.dereg_mr(mr); +} + +static struct rdma_comp_channel *rdma_create_comp_channel_v2(struct rdma_context *context) +{ + struct rdma_abi_compat_v2 *t = context->abi_compat; + static int warned; + + if (!pthread_mutex_trylock(&t->in_use)) + return &t->channel; + + if (!warned) { + fprintf(stderr, PFX "Warning: kernel's ABI version %d limits capacity.\n" + " Only one completion channel can be created per context.\n", + abi_ver); + ++warned; + } + + return NULL; +} + +struct rdma_comp_channel *rdma_create_comp_channel(struct rdma_context *context) +{ + 
struct rdma_comp_channel *channel; + struct rdma_create_comp_channel cmd; + struct rdma_create_comp_channel_resp resp; + + if (abi_ver <= 2) + return rdma_create_comp_channel_v2(context); + + channel = malloc(sizeof *channel); + if (!channel) + return NULL; + + RDMA_INIT_CMD_RESP(&cmd, sizeof cmd, CREATE_COMP_CHANNEL, &resp, sizeof resp); + if (write(context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) { + free(channel); + return NULL; + } + + channel->fd = resp.fd; + + return channel; +} + +static int rdma_destroy_comp_channel_v2(struct rdma_comp_channel *channel) +{ + struct rdma_abi_compat_v2 *t = (struct rdma_abi_compat_v2 *) channel; + pthread_mutex_unlock(&t->in_use); + return 0; +} + +int rdma_destroy_comp_channel(struct rdma_comp_channel *channel) +{ + if (abi_ver <= 2) + return rdma_destroy_comp_channel_v2(channel); + + close(channel->fd); + free(channel); + + return 0; +} + +struct rdma_cq *rdma_create_cq(struct rdma_context *context, int cqe, void *cq_context, + struct rdma_comp_channel *channel, int comp_vector) +{ + struct rdma_cq *cq = context->ops.create_cq(context, cqe, channel, + comp_vector); + + if (cq) { + cq->context = context; + cq->cq_context = cq_context; + cq->comp_events_completed = 0; + cq->async_events_completed = 0; + pthread_mutex_init(&cq->mutex, NULL); + pthread_cond_init(&cq->cond, NULL); + } + + return cq; +} + +int rdma_resize_cq(struct rdma_cq *cq, int cqe) +{ + if (!cq->context->ops.resize_cq) + return ENOSYS; + + return cq->context->ops.resize_cq(cq, cqe); +} + +int rdma_destroy_cq(struct rdma_cq *cq) +{ + return cq->context->ops.destroy_cq(cq); +} + + +int rdma_get_cq_event(struct rdma_comp_channel *channel, + struct rdma_cq **cq, void **cq_context) +{ + struct rdma_comp_event ev; + + if (read(channel->fd, &ev, sizeof ev) != sizeof ev) + return -1; + + *cq = (struct rdma_cq *) (uintptr_t) ev.cq_handle; + *cq_context = (*cq)->cq_context; + + if ((*cq)->context->ops.cq_event) + (*cq)->context->ops.cq_event(*cq); + + return 0; +} + 
+void rdma_ack_cq_events(struct rdma_cq *cq, unsigned int nevents) +{ + pthread_mutex_lock(&cq->mutex); + cq->comp_events_completed += nevents; + pthread_cond_signal(&cq->cond); + pthread_mutex_unlock(&cq->mutex); +} + +struct rdma_srq *rdma_create_srq(struct rdma_pd *pd, + struct rdma_srq_init_attr *srq_init_attr) +{ + struct rdma_srq *srq; + + if (!pd->context->ops.create_srq) + return NULL; + + srq = pd->context->ops.create_srq(pd, srq_init_attr); + if (srq) { + srq->context = pd->context; + srq->srq_context = srq_init_attr->srq_context; + srq->pd = pd; + srq->events_completed = 0; + pthread_mutex_init(&srq->mutex, NULL); + pthread_cond_init(&srq->cond, NULL); + } + + return srq; +} + +int rdma_modify_srq(struct rdma_srq *srq, + struct rdma_srq_attr *srq_attr, + enum rdma_srq_attr_mask srq_attr_mask) +{ + return srq->context->ops.modify_srq(srq, srq_attr, srq_attr_mask); +} + +int rdma_query_srq(struct rdma_srq *srq, struct rdma_srq_attr *srq_attr) +{ + return srq->context->ops.query_srq(srq, srq_attr); +} + +int rdma_destroy_srq(struct rdma_srq *srq) +{ + return srq->context->ops.destroy_srq(srq); +} + +struct rdma_qp *rdmav_create_qp(struct rdma_pd *pd, + struct rdma_qp_init_attr *qp_init_attr) +{ + struct rdma_qp *qp = pd->context->ops.create_qp(pd, qp_init_attr); + + if (qp) { + qp->context = pd->context; + qp->qp_context = qp_init_attr->qp_context; + qp->pd = pd; + qp->send_cq = qp_init_attr->send_cq; + qp->recv_cq = qp_init_attr->recv_cq; + qp->srq = qp_init_attr->srq; + qp->qp_type = qp_init_attr->qp_type; + qp->events_completed = 0; + pthread_mutex_init(&qp->mutex, NULL); + pthread_cond_init(&qp->cond, NULL); + } + + return qp; +} + +int rdma_query_qp(struct rdma_qp *qp, struct rdma_qp_attr *attr, + enum rdma_qp_attr_mask attr_mask, + struct rdma_qp_init_attr *init_attr) +{ + int ret; + + ret = qp->context->ops.query_qp(qp, attr, attr_mask, init_attr); + if (ret) + return ret; + + if (attr_mask & RDMA_QP_STATE) + qp->state = attr->qp_state; + + return 0; 
+} + +int rdma_modify_qp(struct rdma_qp *qp, struct rdma_qp_attr *attr, + enum rdma_qp_attr_mask attr_mask) +{ + int ret; + + ret = qp->context->ops.modify_qp(qp, attr, attr_mask); + if (ret) + return ret; + + if (attr_mask & RDMA_QP_STATE) + qp->state = attr->qp_state; + + return 0; +} + +int rdmav_destroy_qp(struct rdma_qp *qp) +{ + return qp->context->ops.destroy_qp(qp); +} + +struct rdma_ah *rdma_create_ah(struct rdma_pd *pd, struct rdma_ah_attr *attr) +{ + struct rdma_ah *ah = pd->context->ops.create_ah(pd, attr); + + if (ah) { + ah->context = pd->context; + ah->pd = pd; + } + + return ah; +} + +int rdma_destroy_ah(struct rdma_ah *ah) +{ + return ah->context->ops.destroy_ah(ah); +} + +int rdma_attach_mcast(struct rdma_qp *qp, union rdma_gid *gid, uint16_t lid) +{ + return qp->context->ops.attach_mcast(qp, gid, lid); +} + +int rdma_detach_mcast(struct rdma_qp *qp, union rdma_gid *gid, uint16_t lid) +{ + return qp->context->ops.detach_mcast(qp, gid, lid); +} From krkumar2 at in.ibm.com Mon Jul 10 03:24:16 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Mon, 10 Jul 2006 15:54:16 +0530 Subject: [openib-general] [RFC] [PATCH 2/7] ibrdmaverbs src files In-Reply-To: <1152526945.28048.7.camel@localhost.localdomain> References: <1152526945.28048.7.camel@localhost.localdomain> Message-ID: <1152527056.28048.10.camel@localhost.localdomain> Sorry, this should be [PATCH 4/7]. I also seem to have missed the "Signed-off-by" line on the earlier patches. Evolution kept failing with a "glibc free" error, and after repeated retries I forgot to add it by the time the individual mails were sent out :-) Thanks, - KK On Mon, 2006-07-10 at 15:52 +0530, Krishna Kumar wrote: > This library provides equivalent functionality to > libibverbs, but changes the data types and verb > APIs to be transport neutral. This patch contains > the source files.
> > Signed-off-by: Krishna Kumar > > diff -ruNp ORG/librdmaverbs/src/cmd.c NEW/librdmaverbs/src/cmd.c > --- ORG/librdmaverbs/src/cmd.c 1969-12-31 16:00:00.000000000 -0800 > +++ NEW/librdmaverbs/src/cmd.c 2006-07-10 18:07:47.000000000 -0700 > @@ -0,0 +1,1060 @@ > +/* > + * Copyright (c) 2005 Topspin Communications. All rights reserved. > + * Copyright (c) 2005 PathScale, Inc. All rights reserved. > + * Copyright (c) 2006 Cisco Systems, Inc. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE.
> + * > + * $Id: cmd.c 7631 2006-06-02 19:53:25Z swise $ > + */ > + > +#if HAVE_CONFIG_H > +# include > +#endif /* HAVE_CONFIG_H */ > + > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include "rdmaverbs.h" > + > +static int rdma_cmd_get_context_v2(struct rdma_context *context, > + struct rdma_get_context *new_cmd, > + size_t new_cmd_size, > + struct rdma_get_context_resp *resp, > + size_t resp_size) > +{ > + struct rdma_abi_compat_v2 *t; > + struct rdma_get_context_v2 *cmd; > + size_t cmd_size; > + uint32_t cq_fd; > + > + t = malloc(sizeof *t); > + if (!t) > + return ENOMEM; > + pthread_mutex_init(&t->in_use, NULL); > + > + cmd_size = sizeof *cmd + new_cmd_size - sizeof *new_cmd; > + cmd = alloca(cmd_size); > + memcpy(cmd->driver_data, new_cmd->driver_data, new_cmd_size - sizeof *new_cmd); > + > + RDMA_INIT_CMD_RESP(cmd, cmd_size, GET_CONTEXT, resp, resp_size); > + cmd->cq_fd_tab = (uintptr_t) &cq_fd; > + > + if (write(context->cmd_fd, cmd, cmd_size) != cmd_size) > + return errno; > + > + context->async_fd = resp->async_fd; > + context->num_comp_vectors = 1; > + t->channel.fd = cq_fd; > + context->abi_compat = t; > + > + return 0; > +} > + > +int rdma_cmd_get_context(struct rdma_context *context, struct rdma_get_context *cmd, > + size_t cmd_size, struct rdma_get_context_resp *resp, > + size_t resp_size) > +{ > + if (abi_ver <= 2) > + return rdma_cmd_get_context_v2(context, cmd, cmd_size, resp, resp_size); > + > + RDMA_INIT_CMD_RESP(cmd, cmd_size, GET_CONTEXT, resp, resp_size); > + > + if (write(context->cmd_fd, cmd, cmd_size) != cmd_size) > + return errno; > + > + context->async_fd = resp->async_fd; > + context->num_comp_vectors = resp->num_comp_vectors; > + > + return 0; > +} > + > +int rdma_cmd_query_device(struct rdma_context *context, > + struct rdma_device_attr *device_attr, > + uint64_t *raw_fw_ver, > + struct rdma_query_device *cmd, size_t cmd_size) > +{ > + struct rdma_query_device_resp resp; > + > +
RDMA_INIT_CMD_RESP(cmd, cmd_size, QUERY_DEVICE, &resp, sizeof resp); > + > + if (write(context->cmd_fd, cmd, cmd_size) != cmd_size) > + return errno; > + > + memset(device_attr->fw_ver, 0, sizeof device_attr->fw_ver); > + *raw_fw_ver = resp.fw_ver; > + device_attr->node_guid = resp.node_guid; > + device_attr->sys_image_guid = resp.sys_image_guid; > + device_attr->max_mr_size = resp.max_mr_size; > + device_attr->page_size_cap = resp.page_size_cap; > + device_attr->vendor_id = resp.vendor_id; > + device_attr->vendor_part_id = resp.vendor_part_id; > + device_attr->hw_ver = resp.hw_ver; > + device_attr->max_qp = resp.max_qp; > + device_attr->max_qp_wr = resp.max_qp_wr; > + device_attr->device_cap_flags = resp.device_cap_flags; > + device_attr->max_sge = resp.max_sge; > + device_attr->max_sge_rd = resp.max_sge_rd; > + device_attr->max_cq = resp.max_cq; > + device_attr->max_cqe = resp.max_cqe; > + device_attr->max_mr = resp.max_mr; > + device_attr->max_pd = resp.max_pd; > + device_attr->max_qp_rd_atom = resp.max_qp_rd_atom; > + device_attr->max_ee_rd_atom = resp.max_ee_rd_atom; > + device_attr->max_res_rd_atom = resp.max_res_rd_atom; > + device_attr->max_qp_init_rd_atom = resp.max_qp_init_rd_atom; > + device_attr->max_ee_init_rd_atom = resp.max_ee_init_rd_atom; > + device_attr->atomic_cap = resp.atomic_cap; > + device_attr->max_ee = resp.max_ee; > + device_attr->max_rdd = resp.max_rdd; > + device_attr->max_mw = resp.max_mw; > + device_attr->max_raw_ipv6_qp = resp.max_raw_ipv6_qp; > + device_attr->max_raw_ethy_qp = resp.max_raw_ethy_qp; > + device_attr->max_mcast_grp = resp.max_mcast_grp; > + device_attr->max_mcast_qp_attach = resp.max_mcast_qp_attach; > + device_attr->max_total_mcast_qp_attach = resp.max_total_mcast_qp_attach; > + device_attr->max_ah = resp.max_ah; > + device_attr->max_fmr = resp.max_fmr; > + device_attr->max_map_per_fmr = resp.max_map_per_fmr; > + device_attr->max_srq = resp.max_srq; > + device_attr->max_srq_wr = resp.max_srq_wr; > +
device_attr->max_srq_sge = resp.max_srq_sge; > + device_attr->max_pkeys = resp.max_pkeys; > + device_attr->local_ca_ack_delay = resp.local_ca_ack_delay; > + device_attr->phys_port_cnt = resp.phys_port_cnt; > + > + return 0; > +} > + > +int rdma_cmd_query_port(struct rdma_context *context, uint8_t port_num, > + struct rdma_port_attr *port_attr, > + struct rdma_query_port *cmd, size_t cmd_size) > +{ > + struct rdma_query_port_resp resp; > + > + RDMA_INIT_CMD_RESP(cmd, cmd_size, QUERY_PORT, &resp, sizeof resp); > + cmd->port_num = port_num; > + > + if (write(context->cmd_fd, cmd, cmd_size) != cmd_size) > + return errno; > + > + port_attr->state = resp.state; > + port_attr->max_mtu = resp.max_mtu; > + port_attr->active_mtu = resp.active_mtu; > + port_attr->gid_tbl_len = resp.gid_tbl_len; > + port_attr->port_cap_flags = resp.port_cap_flags; > + port_attr->max_msg_sz = resp.max_msg_sz; > + port_attr->bad_pkey_cntr = resp.bad_pkey_cntr; > + port_attr->qkey_viol_cntr = resp.qkey_viol_cntr; > + port_attr->pkey_tbl_len = resp.pkey_tbl_len; > + port_attr->lid = resp.lid; > + port_attr->sm_lid = resp.sm_lid; > + port_attr->lmc = resp.lmc; > + port_attr->max_vl_num = resp.max_vl_num; > + port_attr->sm_sl = resp.sm_sl; > + port_attr->subnet_timeout = resp.subnet_timeout; > + port_attr->init_type_reply = resp.init_type_reply; > + port_attr->active_width = resp.active_width; > + port_attr->active_speed = resp.active_speed; > + port_attr->phys_state = resp.phys_state; > + > + return 0; > +} > + > +int rdma_cmd_alloc_pd(struct rdma_context *context, struct rdma_pd *pd, > + struct rdma_alloc_pd *cmd, size_t cmd_size, > + struct rdma_alloc_pd_resp *resp, size_t resp_size) > +{ > + RDMA_INIT_CMD_RESP(cmd, cmd_size, ALLOC_PD, resp, resp_size); > + > + if (write(context->cmd_fd, cmd, cmd_size) != cmd_size) > + return errno; > + > + pd->handle = resp->pd_handle; > + > + return 0; > +} > + > +int rdma_cmd_dealloc_pd(struct rdma_pd *pd) > +{ > + struct rdma_dealloc_pd cmd; > + > + 
RDMA_INIT_CMD(&cmd, sizeof cmd, DEALLOC_PD); > + cmd.pd_handle = pd->handle; > + > + if (write(pd->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) > + return errno; > + > + return 0; > +} > + > +int rdma_cmd_reg_mr(struct rdma_pd *pd, void *addr, size_t length, > + uint64_t hca_va, enum rdma_access_flags access, > + struct rdma_mr *mr, struct rdma_reg_mr *cmd, > + size_t cmd_size) > +{ > + struct rdma_reg_mr_resp resp; > + > + RDMA_INIT_CMD_RESP(cmd, cmd_size, REG_MR, &resp, sizeof resp); > + > + cmd->start = (uintptr_t) addr; > + cmd->length = length; > + cmd->hca_va = hca_va; > + cmd->pd_handle = pd->handle; > + cmd->access_flags = access; > + > + if (write(pd->context->cmd_fd, cmd, cmd_size) != cmd_size) > + return errno; > + > + mr->handle = resp.mr_handle; > + mr->lkey = resp.lkey; > + mr->rkey = resp.rkey; > + > + return 0; > +} > + > +int rdma_cmd_dereg_mr(struct rdma_mr *mr) > +{ > + struct rdma_dereg_mr cmd; > + > + RDMA_INIT_CMD(&cmd, sizeof cmd, DEREG_MR); > + cmd.mr_handle = mr->handle; > + > + if (write(mr->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) > + return errno; > + > + return 0; > +} > + > +static int rdma_cmd_create_cq_v2(struct rdma_context *context, int cqe, > + struct rdma_cq *cq, > + struct rdma_create_cq *new_cmd, size_t new_cmd_size, > + struct rdma_create_cq_resp *resp, size_t resp_size) > +{ > + struct rdma_create_cq_v2 *cmd; > + size_t cmd_size; > + > + cmd_size = sizeof *cmd + new_cmd_size - sizeof *new_cmd; > + cmd = alloca(cmd_size); > + memcpy(cmd->driver_data, new_cmd->driver_data, new_cmd_size - sizeof *new_cmd); > + > + RDMA_INIT_CMD_RESP(cmd, cmd_size, CREATE_CQ, resp, resp_size); > + cmd->user_handle = (uintptr_t) cq; > + cmd->cqe = cqe; > + cmd->event_handler = 0; > + > + if (write(context->cmd_fd, cmd, cmd_size) != cmd_size) > + return errno; > + > + cq->handle = resp->cq_handle; > + cq->cqe = resp->cqe; > + > + return 0; > +} > + > +int rdma_cmd_create_cq(struct rdma_context *context, int cqe, > + struct
rdma_comp_channel *channel, > + int comp_vector, struct rdma_cq *cq, > + struct rdma_create_cq *cmd, size_t cmd_size, > + struct rdma_create_cq_resp *resp, size_t resp_size) > +{ > + if (abi_ver <= 2) > + return rdma_cmd_create_cq_v2(context, cqe, cq, > + cmd, cmd_size, resp, resp_size); > + > + RDMA_INIT_CMD_RESP(cmd, cmd_size, CREATE_CQ, resp, resp_size); > + cmd->user_handle = (uintptr_t) cq; > + cmd->cqe = cqe; > + cmd->comp_vector = comp_vector; > + cmd->comp_channel = channel ? channel->fd : -1; > + cmd->reserved = 0; > + > + if (write(context->cmd_fd, cmd, cmd_size) != cmd_size) > + return errno; > + > + cq->handle = resp->cq_handle; > + cq->cqe = resp->cqe; > + > + return 0; > +} > + > +int rdma_cmd_poll_cq(struct rdma_cq *ibcq, int ne, struct rdma_wc *wc) > +{ > + struct rdma_poll_cq cmd; > + struct rdma_poll_cq_resp *resp; > + int i; > + int rsize; > + int ret; > + > + rsize = sizeof *resp + ne * sizeof(struct rdma_kern_wc); > + resp = malloc(rsize); > + if (!resp) > + return -1; > + > + RDMA_INIT_CMD_RESP(&cmd, sizeof cmd, POLL_CQ, resp, rsize); > + cmd.cq_handle = ibcq->handle; > + cmd.ne = ne; > + > + if (write(ibcq->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) { > + ret = -1; > + goto out; > + } > + > + for (i = 0; i < resp->count; i++) { > + wc[i].wr_id = resp->wc[i].wr_id; > + wc[i].status = resp->wc[i].status; > + wc[i].opcode = resp->wc[i].opcode; > + wc[i].vendor_err = resp->wc[i].vendor_err; > + wc[i].byte_len = resp->wc[i].byte_len; > + wc[i].imm_data = resp->wc[i].imm_data; > + wc[i].qp_num = resp->wc[i].qp_num; > + wc[i].src_qp = resp->wc[i].src_qp; > + wc[i].wc_flags = resp->wc[i].wc_flags; > + wc[i].pkey_index = resp->wc[i].pkey_index; > + wc[i].slid = resp->wc[i].slid; > + wc[i].sl = resp->wc[i].sl; > + wc[i].dlid_path_bits = resp->wc[i].dlid_path_bits; > + } > + > + ret = resp->count; > + > +out: > + free(resp); > + return ret; > +} > + > +int rdma_cmd_req_notify_cq(struct rdma_cq *ibcq, int solicited_only) > +{ > + struct 
rdma_req_notify_cq cmd; > + > + RDMA_INIT_CMD(&cmd, sizeof cmd, REQ_NOTIFY_CQ); > + cmd.cq_handle = ibcq->handle; > + cmd.solicited = !!solicited_only; > + > + if (write(ibcq->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) > + return errno; > + > + return 0; > +} > + > +int rdma_cmd_resize_cq(struct rdma_cq *cq, int cqe, > + struct rdma_resize_cq *cmd, size_t cmd_size) > +{ > + struct rdma_resize_cq_resp resp; > + > + RDMA_INIT_CMD_RESP(cmd, cmd_size, RESIZE_CQ, &resp, sizeof resp); > + cmd->cq_handle = cq->handle; > + cmd->cqe = cqe; > + > + if (write(cq->context->cmd_fd, cmd, cmd_size) != cmd_size) > + return errno; > + > + cq->cqe = resp.cqe; > + > + return 0; > +} > + > +static int rdma_cmd_destroy_cq_v1(struct rdma_cq *cq) > +{ > + struct rdma_destroy_cq_v1 cmd; > + > + RDMA_INIT_CMD(&cmd, sizeof cmd, DESTROY_CQ); > + cmd.cq_handle = cq->handle; > + > + if (write(cq->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) > + return errno; > + > + return 0; > +} > + > +int rdma_cmd_destroy_cq(struct rdma_cq *cq) > +{ > + struct rdma_destroy_cq cmd; > + struct rdma_destroy_cq_resp resp; > + > + if (abi_ver == 1) > + return rdma_cmd_destroy_cq_v1(cq); > + > + RDMA_INIT_CMD_RESP(&cmd, sizeof cmd, DESTROY_CQ, &resp, sizeof resp); > + cmd.cq_handle = cq->handle; > + > + if (write(cq->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) > + return errno; > + > + pthread_mutex_lock(&cq->mutex); > + while (cq->comp_events_completed != resp.comp_events_reported || > + cq->async_events_completed != resp.async_events_reported) > + pthread_cond_wait(&cq->cond, &cq->mutex); > + pthread_mutex_unlock(&cq->mutex); > + > + return 0; > +} > + > +int rdma_cmd_create_srq(struct rdma_pd *pd, > + struct rdma_srq *srq, struct rdma_srq_init_attr *attr, > + struct rdma_create_srq *cmd, size_t cmd_size, > + struct rdma_create_srq_resp *resp, size_t resp_size) > +{ > + RDMA_INIT_CMD_RESP(cmd, cmd_size, CREATE_SRQ, resp, resp_size); > + cmd->user_handle = (uintptr_t) srq; > + cmd->pd_handle = 
pd->handle; > + cmd->max_wr = attr->attr.max_wr; > + cmd->max_sge = attr->attr.max_sge; > + cmd->srq_limit = attr->attr.srq_limit; > + > + if (write(pd->context->cmd_fd, cmd, cmd_size) != cmd_size) > + return errno; > + > + srq->handle = resp->srq_handle; > + > + if (abi_ver > 5) { > + attr->attr.max_wr = resp->max_wr; > + attr->attr.max_sge = resp->max_sge; > + } else { > + struct rdma_create_srq_resp_v5 *resp_v5 = > + (struct rdma_create_srq_resp_v5 *) resp; > + > + memmove((void *) resp + sizeof *resp, > + (void *) resp_v5 + sizeof *resp_v5, > + resp_size - sizeof *resp); > + } > + > + return 0; > +} > + > +static int rdma_cmd_modify_srq_v3(struct rdma_srq *srq, > + struct rdma_srq_attr *srq_attr, > + enum rdma_srq_attr_mask srq_attr_mask, > + struct rdma_modify_srq *new_cmd, > + size_t new_cmd_size) > +{ > + struct rdma_modify_srq_v3 *cmd; > + size_t cmd_size; > + > + cmd_size = sizeof *cmd + new_cmd_size - sizeof *new_cmd; > + cmd = alloca(cmd_size); > + memcpy(cmd->driver_data, new_cmd->driver_data, new_cmd_size - sizeof *new_cmd); > + > + RDMA_INIT_CMD(cmd, cmd_size, MODIFY_SRQ); > + > + cmd->srq_handle = srq->handle; > + cmd->attr_mask = srq_attr_mask; > + cmd->max_wr = srq_attr->max_wr; > + cmd->srq_limit = srq_attr->srq_limit; > + cmd->max_sge = 0; > + cmd->reserved = 0; > + > + if (write(srq->context->cmd_fd, cmd, cmd_size) != cmd_size) > + return errno; > + > + return 0; > +} > + > +int rdma_cmd_modify_srq(struct rdma_srq *srq, > + struct rdma_srq_attr *srq_attr, > + enum rdma_srq_attr_mask srq_attr_mask, > + struct rdma_modify_srq *cmd, size_t cmd_size) > +{ > + if (abi_ver == 3) > + return rdma_cmd_modify_srq_v3(srq, srq_attr, srq_attr_mask, > + cmd, cmd_size); > + > + RDMA_INIT_CMD(cmd, cmd_size, MODIFY_SRQ); > + > + cmd->srq_handle = srq->handle; > + cmd->attr_mask = srq_attr_mask; > + cmd->max_wr = srq_attr->max_wr; > + cmd->srq_limit = srq_attr->srq_limit; > + > + if (write(srq->context->cmd_fd, cmd, cmd_size) != cmd_size) > + return errno; > +
> + return 0; > +} > + > +int rdma_cmd_query_srq(struct rdma_srq *srq, struct rdma_srq_attr *srq_attr, > + struct rdma_query_srq *cmd, size_t cmd_size) > +{ > + struct rdma_query_srq_resp resp; > + > + RDMA_INIT_CMD_RESP(cmd, cmd_size, QUERY_SRQ, &resp, sizeof resp); > + cmd->srq_handle = srq->handle; > + > + if (write(srq->context->cmd_fd, cmd, cmd_size) != cmd_size) > + return errno; > + > + srq_attr->max_wr = resp.max_wr; > + srq_attr->max_sge = resp.max_sge; > + srq_attr->srq_limit = resp.srq_limit; > + > + return 0; > +} > + > +static int rdma_cmd_destroy_srq_v1(struct rdma_srq *srq) > +{ > + struct rdma_destroy_srq_v1 cmd; > + > + RDMA_INIT_CMD(&cmd, sizeof cmd, DESTROY_SRQ); > + cmd.srq_handle = srq->handle; > + > + if (write(srq->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) > + return errno; > + > + return 0; > +} > + > +int rdma_cmd_destroy_srq(struct rdma_srq *srq) > +{ > + struct rdma_destroy_srq cmd; > + struct rdma_destroy_srq_resp resp; > + > + if (abi_ver == 1) > + return rdma_cmd_destroy_srq_v1(srq); > + > + RDMA_INIT_CMD_RESP(&cmd, sizeof cmd, DESTROY_SRQ, &resp, sizeof resp); > + cmd.srq_handle = srq->handle; > + > + if (write(srq->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) > + return errno; > + > + pthread_mutex_lock(&srq->mutex); > + while (srq->events_completed != resp.events_reported) > + pthread_cond_wait(&srq->cond, &srq->mutex); > + pthread_mutex_unlock(&srq->mutex); > + > + return 0; > +} > + > +int rdma_cmd_create_qp(struct rdma_pd *pd, > + struct rdma_qp *qp, struct rdma_qp_init_attr *attr, > + struct rdma_create_qp *cmd, size_t cmd_size, > + struct rdma_create_qp_resp *resp, size_t resp_size) > +{ > + RDMA_INIT_CMD_RESP(cmd, cmd_size, CREATE_QP, resp, resp_size); > + > + cmd->user_handle = (uintptr_t) qp; > + cmd->pd_handle = pd->handle; > + cmd->send_cq_handle = attr->send_cq->handle; > + cmd->recv_cq_handle = attr->recv_cq->handle; > + cmd->srq_handle = attr->srq ?
attr->srq->handle : 0; > + cmd->max_send_wr = attr->cap.max_send_wr; > + cmd->max_recv_wr = attr->cap.max_recv_wr; > + cmd->max_send_sge = attr->cap.max_send_sge; > + cmd->max_recv_sge = attr->cap.max_recv_sge; > + cmd->max_inline_data = attr->cap.max_inline_data; > + cmd->sq_sig_all = attr->sq_sig_all; > + cmd->qp_type = attr->qp_type; > + cmd->is_srq = !!attr->srq; > + > + if (write(pd->context->cmd_fd, cmd, cmd_size) != cmd_size) > + return errno; > + > + qp->handle = resp->qp_handle; > + qp->qp_num = resp->qpn; > + > + if (abi_ver > 3) { > + attr->cap.max_recv_sge = resp->max_recv_sge; > + attr->cap.max_send_sge = resp->max_send_sge; > + attr->cap.max_recv_wr = resp->max_recv_wr; > + attr->cap.max_send_wr = resp->max_send_wr; > + attr->cap.max_inline_data = resp->max_inline_data; > + } > + > + if (abi_ver == 4) { > + struct rdma_create_qp_resp_v4 *resp_v4 = > + (struct rdma_create_qp_resp_v4 *) resp; > + > + memmove((void *) resp + sizeof *resp, > + (void *) resp_v4 + sizeof *resp_v4, > + resp_size - sizeof *resp); > + } else if (abi_ver <= 3) { > + struct rdma_create_qp_resp_v3 *resp_v3 = > + (struct rdma_create_qp_resp_v3 *) resp; > + > + memmove((void *) resp + sizeof *resp, > + (void *) resp_v3 + sizeof *resp_v3, > + resp_size - sizeof *resp); > + } > + > + return 0; > +} > + > +int rdma_cmd_query_qp(struct rdma_qp *qp, struct rdma_qp_attr *attr, > + enum rdma_qp_attr_mask attr_mask, > + struct rdma_qp_init_attr *init_attr, > + struct rdma_query_qp *cmd, size_t cmd_size) > +{ > + struct rdma_query_qp_resp resp; > + > + RDMA_INIT_CMD_RESP(cmd, cmd_size, QUERY_QP, &resp, sizeof resp); > + cmd->qp_handle = qp->handle; > + cmd->attr_mask = attr_mask; > + > + if (write(qp->context->cmd_fd, cmd, cmd_size) != cmd_size) > + return errno; > + > + attr->qkey = resp.qkey; > + attr->rq_psn = resp.rq_psn; > + attr->sq_psn = resp.sq_psn; > + attr->dest_qp_num = resp.dest_qp_num; > + attr->qp_access_flags = resp.qp_access_flags; > + attr->pkey_index = resp.pkey_index; > + 
attr->alt_pkey_index = resp.alt_pkey_index; > + attr->qp_state = resp.qp_state; > + attr->cur_qp_state = resp.cur_qp_state; > + attr->path_mtu = resp.path_mtu; > + attr->path_mig_state = resp.path_mig_state; > + attr->en_sqd_async_notify = resp.en_sqd_async_notify; > + attr->max_rd_atomic = resp.max_rd_atomic; > + attr->max_dest_rd_atomic = resp.max_dest_rd_atomic; > + attr->min_rnr_timer = resp.min_rnr_timer; > + attr->port_num = resp.port_num; > + attr->timeout = resp.timeout; > + attr->retry_cnt = resp.retry_cnt; > + attr->rnr_retry = resp.rnr_retry; > + attr->alt_port_num = resp.alt_port_num; > + attr->alt_timeout = resp.alt_timeout; > + attr->cap.max_send_wr = resp.max_send_wr; > + attr->cap.max_recv_wr = resp.max_recv_wr; > + attr->cap.max_send_sge = resp.max_send_sge; > + attr->cap.max_recv_sge = resp.max_recv_sge; > + attr->cap.max_inline_data = resp.max_inline_data; > + > + memcpy(attr->ah_attr.grh.dgid.raw, resp.dest.dgid, 16); > + attr->ah_attr.grh.flow_label = resp.dest.flow_label; > + attr->ah_attr.dlid = resp.dest.dlid; > + attr->ah_attr.grh.sgid_index = resp.dest.sgid_index; > + attr->ah_attr.grh.hop_limit = resp.dest.hop_limit; > + attr->ah_attr.grh.traffic_class = resp.dest.traffic_class; > + attr->ah_attr.sl = resp.dest.sl; > + attr->ah_attr.src_path_bits = resp.dest.src_path_bits; > + attr->ah_attr.static_rate = resp.dest.static_rate; > + attr->ah_attr.is_global = resp.dest.is_global; > + attr->ah_attr.port_num = resp.dest.port_num; > + > + memcpy(attr->alt_ah_attr.grh.dgid.raw, resp.alt_dest.dgid, 16); > + attr->alt_ah_attr.grh.flow_label = resp.alt_dest.flow_label; > + attr->alt_ah_attr.dlid = resp.alt_dest.dlid; > + attr->alt_ah_attr.grh.sgid_index = resp.alt_dest.sgid_index; > + attr->alt_ah_attr.grh.hop_limit = resp.alt_dest.hop_limit; > + attr->alt_ah_attr.grh.traffic_class = resp.alt_dest.traffic_class; > + attr->alt_ah_attr.sl = resp.alt_dest.sl; > + attr->alt_ah_attr.src_path_bits = resp.alt_dest.src_path_bits; > + 
attr->alt_ah_attr.static_rate = resp.alt_dest.static_rate; > + attr->alt_ah_attr.is_global = resp.alt_dest.is_global; > + attr->alt_ah_attr.port_num = resp.alt_dest.port_num; > + > + init_attr->qp_context = qp->qp_context; > + init_attr->send_cq = qp->send_cq; > + init_attr->recv_cq = qp->recv_cq; > + init_attr->srq = qp->srq; > + init_attr->qp_type = qp->qp_type; > + init_attr->cap.max_send_wr = resp.max_send_wr; > + init_attr->cap.max_recv_wr = resp.max_recv_wr; > + init_attr->cap.max_send_sge = resp.max_send_sge; > + init_attr->cap.max_recv_sge = resp.max_recv_sge; > + init_attr->cap.max_inline_data = resp.max_inline_data; > + init_attr->sq_sig_all = resp.sq_sig_all; > + > + return 0; > +} > + > +int rdma_cmd_modify_qp(struct rdma_qp *qp, struct rdma_qp_attr *attr, > + enum rdma_qp_attr_mask attr_mask, > + struct rdma_modify_qp *cmd, size_t cmd_size) > +{ > + RDMA_INIT_CMD(cmd, cmd_size, MODIFY_QP); > + > + cmd->qp_handle = qp->handle; > + cmd->attr_mask = attr_mask; > + cmd->qkey = attr->qkey; > + cmd->rq_psn = attr->rq_psn; > + cmd->sq_psn = attr->sq_psn; > + cmd->dest_qp_num = attr->dest_qp_num; > + cmd->qp_access_flags = attr->qp_access_flags; > + cmd->pkey_index = attr->pkey_index; > + cmd->alt_pkey_index = attr->alt_pkey_index; > + cmd->qp_state = attr->qp_state; > + cmd->cur_qp_state = attr->cur_qp_state; > + cmd->path_mtu = attr->path_mtu; > + cmd->path_mig_state = attr->path_mig_state; > + cmd->en_sqd_async_notify = attr->en_sqd_async_notify; > + cmd->max_rd_atomic = attr->max_rd_atomic; > + cmd->max_dest_rd_atomic = attr->max_dest_rd_atomic; > + cmd->min_rnr_timer = attr->min_rnr_timer; > + cmd->port_num = attr->port_num; > + cmd->timeout = attr->timeout; > + cmd->retry_cnt = attr->retry_cnt; > + cmd->rnr_retry = attr->rnr_retry; > + cmd->alt_port_num = attr->alt_port_num; > + cmd->alt_timeout = attr->alt_timeout; > + > + memcpy(cmd->dest.dgid, attr->ah_attr.grh.dgid.raw, 16); > + cmd->dest.flow_label = attr->ah_attr.grh.flow_label; > + cmd->dest.dlid 
= attr->ah_attr.dlid; > + cmd->dest.sgid_index = attr->ah_attr.grh.sgid_index; > + cmd->dest.hop_limit = attr->ah_attr.grh.hop_limit; > + cmd->dest.traffic_class = attr->ah_attr.grh.traffic_class; > + cmd->dest.sl = attr->ah_attr.sl; > + cmd->dest.src_path_bits = attr->ah_attr.src_path_bits; > + cmd->dest.static_rate = attr->ah_attr.static_rate; > + cmd->dest.is_global = attr->ah_attr.is_global; > + cmd->dest.port_num = attr->ah_attr.port_num; > + > + memcpy(cmd->alt_dest.dgid, attr->alt_ah_attr.grh.dgid.raw, 16); > + cmd->alt_dest.flow_label = attr->alt_ah_attr.grh.flow_label; > + cmd->alt_dest.dlid = attr->alt_ah_attr.dlid; > + cmd->alt_dest.sgid_index = attr->alt_ah_attr.grh.sgid_index; > + cmd->alt_dest.hop_limit = attr->alt_ah_attr.grh.hop_limit; > + cmd->alt_dest.traffic_class = attr->alt_ah_attr.grh.traffic_class; > + cmd->alt_dest.sl = attr->alt_ah_attr.sl; > + cmd->alt_dest.src_path_bits = attr->alt_ah_attr.src_path_bits; > + cmd->alt_dest.static_rate = attr->alt_ah_attr.static_rate; > + cmd->alt_dest.is_global = attr->alt_ah_attr.is_global; > + cmd->alt_dest.port_num = attr->alt_ah_attr.port_num; > + > + if (write(qp->context->cmd_fd, cmd, cmd_size) != cmd_size) > + return errno; > + > + return 0; > +} > + > +static int rdma_cmd_destroy_qp_v1(struct rdma_qp *qp) > +{ > + struct rdma_destroy_qp_v1 cmd; > + > + RDMA_INIT_CMD(&cmd, sizeof cmd, DESTROY_QP); > + cmd.qp_handle = qp->handle; > + > + if (write(qp->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) > + return errno; > + > + return 0; > +} > + > +int rdma_cmd_post_send(struct rdma_qp *ibqp, struct rdma_send_wr *wr, > + struct rdma_send_wr **bad_wr) > +{ > + struct rdma_post_send *cmd; > + struct rdma_post_send_resp resp; > + struct rdma_send_wr *i; > + struct rdma_kern_send_wr *n, *tmp; > + struct rdma_sge *s; > + unsigned wr_count = 0; > + unsigned sge_count = 0; > + int cmd_size; > + int ret = 0; > + > + for (i = wr; i; i = i->next) { > + wr_count++; > + sge_count += i->num_sge; > + } > + > + 
cmd_size = sizeof *cmd + wr_count * sizeof *n + sge_count * sizeof *s; > + cmd = alloca(cmd_size); > + > + RDMA_INIT_CMD_RESP(cmd, cmd_size, POST_SEND, &resp, sizeof resp); > + cmd->qp_handle = ibqp->handle; > + cmd->wr_count = wr_count; > + cmd->sge_count = sge_count; > + cmd->wqe_size = sizeof *n; > + > + n = (struct rdma_kern_send_wr *) ((void *) cmd + sizeof *cmd); > + s = (struct rdma_sge *) (n + wr_count); > + > + tmp = n; > + for (i = wr; i; i = i->next) { > + tmp->wr_id = i->wr_id; > + tmp->num_sge = i->num_sge; > + tmp->opcode = i->opcode; > + tmp->send_flags = i->send_flags; > + tmp->imm_data = i->imm_data; > + if (ibqp->qp_type == RDMA_QPT_UD) { > + tmp->wr.ud.ah = i->wr.ud.ah->handle; > + tmp->wr.ud.remote_qpn = i->wr.ud.remote_qpn; > + tmp->wr.ud.remote_qkey = i->wr.ud.remote_qkey; > + } else { > + switch(i->opcode) { > + case RDMA_WR_RDMA_WRITE: > + case RDMA_WR_RDMA_WRITE_WITH_IMM: > + case RDMA_WR_RDMA_READ: > + tmp->wr.rdma.remote_addr = > + i->wr.rdma.remote_addr; > + tmp->wr.rdma.rkey = i->wr.rdma.rkey; > + break; > + case RDMA_WR_ATOMIC_CMP_AND_SWP: > + case RDMA_WR_ATOMIC_FETCH_AND_ADD: > + tmp->wr.atomic.remote_addr = > + i->wr.atomic.remote_addr; > + tmp->wr.atomic.compare_add = > + i->wr.atomic.compare_add; > + tmp->wr.atomic.swap = i->wr.atomic.swap; > + tmp->wr.atomic.rkey = i->wr.atomic.rkey; > + break; > + default: > + break; > + } > + } > + > + if (tmp->num_sge) { > + memcpy(s, i->sg_list, tmp->num_sge * sizeof *s); > + s += tmp->num_sge; > + } > + > + tmp++; > + } > + > + resp.bad_wr = 0; > + if (write(ibqp->context->cmd_fd, cmd, cmd_size) != cmd_size) > + ret = errno; > + > + wr_count = resp.bad_wr; > + if (wr_count) { > + i = wr; > + while (--wr_count) > + i = i->next; > + *bad_wr = i; > + } > + > + return ret; > +} > + > +int rdma_cmd_post_recv(struct rdma_qp *ibqp, struct rdma_recv_wr *wr, > + struct rdma_recv_wr **bad_wr) > +{ > + struct rdma_post_recv *cmd; > + struct rdma_post_recv_resp resp; > + struct rdma_recv_wr *i; > + 
struct rdma_kern_recv_wr *n, *tmp; > + struct rdma_sge *s; > + unsigned wr_count = 0; > + unsigned sge_count = 0; > + int cmd_size; > + int ret = 0; > + > + for (i = wr; i; i = i->next) { > + wr_count++; > + sge_count += i->num_sge; > + } > + > + cmd_size = sizeof *cmd + wr_count * sizeof *n + sge_count * sizeof *s; > + cmd = alloca(cmd_size); > + > + RDMA_INIT_CMD_RESP(cmd, cmd_size, POST_RECV, &resp, sizeof resp); > + cmd->qp_handle = ibqp->handle; > + cmd->wr_count = wr_count; > + cmd->sge_count = sge_count; > + cmd->wqe_size = sizeof *n; > + > + n = (struct rdma_kern_recv_wr *) ((void *) cmd + sizeof *cmd); > + s = (struct rdma_sge *) (n + wr_count); > + > + tmp = n; > + for (i = wr; i; i = i->next) { > + tmp->wr_id = i->wr_id; > + tmp->num_sge = i->num_sge; > + > + if (tmp->num_sge) { > + memcpy(s, i->sg_list, tmp->num_sge * sizeof *s); > + s += tmp->num_sge; > + } > + > + tmp++; > + } > + > + resp.bad_wr = 0; > + if (write(ibqp->context->cmd_fd, cmd, cmd_size) != cmd_size) > + ret = errno; > + > + wr_count = resp.bad_wr; > + if (wr_count) { > + i = wr; > + while (--wr_count) > + i = i->next; > + *bad_wr = i; > + } > + > + return ret; > +} > + > +int rdma_cmd_post_srq_recv(struct rdma_srq *srq, struct rdma_recv_wr > *wr, > + struct rdma_recv_wr **bad_wr) > +{ > + struct rdma_post_srq_recv *cmd; > + struct rdma_post_srq_recv_resp resp; > + struct rdma_recv_wr *i; > + struct rdma_kern_recv_wr *n, *tmp; > + struct rdma_sge *s; > + unsigned wr_count = 0; > + unsigned sge_count = 0; > + int cmd_size; > + int ret = 0; > + > + for (i = wr; i; i = i->next) { > + wr_count++; > + sge_count += i->num_sge; > + } > + > + cmd_size = sizeof *cmd + wr_count * sizeof *n + sge_count * sizeof *s; > + cmd = alloca(cmd_size); > + > + RDMA_INIT_CMD_RESP(cmd, cmd_size, POST_SRQ_RECV, &resp, sizeof resp); > + cmd->srq_handle = srq->handle; > + cmd->wr_count = wr_count; > + cmd->sge_count = sge_count; > + cmd->wqe_size = sizeof *n; > + > + n = (struct rdma_kern_recv_wr *) ((void *) 
cmd + sizeof *cmd); > + s = (struct rdma_sge *) (n + wr_count); > + > + tmp = n; > + for (i = wr; i; i = i->next) { > + tmp->wr_id = i->wr_id; > + tmp->num_sge = i->num_sge; > + > + if (tmp->num_sge) { > + memcpy(s, i->sg_list, tmp->num_sge * sizeof *s); > + s += tmp->num_sge; > + } > + > + tmp++; > + } > + > + resp.bad_wr = 0; > + if (write(srq->context->cmd_fd, cmd, cmd_size) != cmd_size) > + ret = errno; > + > + wr_count = resp.bad_wr; > + if (wr_count) { > + i = wr; > + while (--wr_count) > + i = i->next; > + *bad_wr = i; > + } > + > + return ret; > +} > + > +int rdma_cmd_create_ah(struct rdma_pd *pd, struct rdma_ah *ah, > + struct rdma_ah_attr *attr) > +{ > + struct rdma_create_ah cmd; > + struct rdma_create_ah_resp resp; > + > + RDMA_INIT_CMD_RESP(&cmd, sizeof cmd, CREATE_AH, &resp, sizeof resp); > + cmd.user_handle = (uintptr_t) ah; > + cmd.pd_handle = pd->handle; > + cmd.attr.dlid = attr->dlid; > + cmd.attr.sl = attr->sl; > + cmd.attr.src_path_bits = attr->src_path_bits; > + cmd.attr.static_rate = attr->static_rate; > + cmd.attr.is_global = attr->is_global; > + cmd.attr.port_num = attr->port_num; > + cmd.attr.grh.flow_label = attr->grh.flow_label; > + cmd.attr.grh.sgid_index = attr->grh.sgid_index; > + cmd.attr.grh.hop_limit = attr->grh.hop_limit; > + cmd.attr.grh.traffic_class = attr->grh.traffic_class; > + memcpy(cmd.attr.grh.dgid, attr->grh.dgid.raw, 16); > + > + if (write(pd->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) > + return errno; > + > + ah->handle = resp.handle; > + > + return 0; > +} > + > +int rdma_cmd_destroy_ah(struct rdma_ah *ah) > +{ > + struct rdma_destroy_ah cmd; > + > + RDMA_INIT_CMD(&cmd, sizeof cmd, DESTROY_AH); > + cmd.ah_handle = ah->handle; > + > + if (write(ah->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) > + return errno; > + > + return 0; > +} > + > +int rdma_cmd_destroy_qp(struct rdma_qp *qp) > +{ > + struct rdma_destroy_qp cmd; > + struct rdma_destroy_qp_resp resp; > + > + if (abi_ver == 1) > + return 
rdma_cmd_destroy_qp_v1(qp); > + > + RDMA_INIT_CMD_RESP(&cmd, sizeof cmd, DESTROY_QP, &resp, sizeof resp); > + cmd.qp_handle = qp->handle; > + > + if (write(qp->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) > + return errno; > + > + pthread_mutex_lock(&qp->mutex); > + while (qp->events_completed != resp.events_reported) > + pthread_cond_wait(&qp->cond, &qp->mutex); > + pthread_mutex_unlock(&qp->mutex); > + > + return 0; > +} > + > +int rdma_cmd_attach_mcast(struct rdma_qp *qp, union rdma_gid *gid, > uint16_t lid) > +{ > + struct rdma_attach_mcast cmd; > + > + RDMA_INIT_CMD(&cmd, sizeof cmd, ATTACH_MCAST); > + memcpy(cmd.gid, gid->raw, sizeof cmd.gid); > + cmd.qp_handle = qp->handle; > + cmd.mlid = lid; > + > + if (write(qp->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) > + return errno; > + > + return 0; > +} > + > +int rdma_cmd_detach_mcast(struct rdma_qp *qp, union rdma_gid *gid, > uint16_t lid) > +{ > + struct rdma_detach_mcast cmd; > + > + RDMA_INIT_CMD(&cmd, sizeof cmd, DETACH_MCAST); > + memcpy(cmd.gid, gid->raw, sizeof cmd.gid); > + cmd.qp_handle = qp->handle; > + cmd.mlid = lid; > + > + if (write(qp->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) > + return errno; > + > + return 0; > +} > diff -ruNp ORG/librdmaverbs/src/device.c NEW/librdmaverbs/src/device.c > --- ORG/librdmaverbs/src/device.c 1969-12-31 16:00:00.000000000 -0800 > +++ NEW/librdmaverbs/src/device.c 2006-07-10 18:07:47.000000000 -0700 > @@ -0,0 +1,270 @@ > +/* > + * Copyright (c) 2004, 2005 Topspin Communications. All rights > reserved. > + * Copyright (c) 2006 Cisco Systems, Inc. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. 
You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. 
> + * > + * $Id: device.c 7631 2006-06-02 19:53:25Z swise $ > + */ > + > +#if HAVE_CONFIG_H > +# include > +#endif /* HAVE_CONFIG_H */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > + > +#include "rdmaverbs.h" > + > +static pthread_mutex_t device_list_lock = PTHREAD_MUTEX_INITIALIZER; > +static int num_devices; > +static struct rdma_device **device_list; > + > +struct rdma_device **rdma_get_device_list(int *num) > +{ > + struct rdma_device **l; > + int i; > + > + pthread_mutex_lock(&device_list_lock); > + > + if (!num_devices) > + num_devices = rdmaverbs_init(&device_list); > + > + l = calloc(num_devices + 1, sizeof (struct rdma_device *)); > + for (i = 0; l && i < num_devices; ++i) > + l[i] = device_list[i]; > + > + pthread_mutex_unlock(&device_list_lock); > + > + if (num) > + *num = l ? num_devices : 0; > + > + return l; > +} > + > +void rdma_free_device_list(struct rdma_device **list) > +{ > + free(list); > +} > + > +const char *rdma_get_device_name(struct rdma_device *device) > +{ > + return device->name; > +} > + > +uint64_t rdma_get_device_guid(struct rdma_device *device) > +{ > + char attr[24]; > + uint64_t guid = 0; > + uint16_t parts[4]; > + int i; > + > + if (rdma_read_sysfs_file(device->ibdev_path, "node_guid", > + attr, sizeof attr) < 0) > + return 0; > + > + if (sscanf(attr, "%hx:%hx:%hx:%hx", > + parts, parts + 1, parts + 2, parts + 3) != 4) > + return 0; > + > + for (i = 0; i < 4; ++i) > + guid = (guid << 16) | parts[i]; > + > + return htonll(guid); > +} > + > +static enum rdma_node_type query_node_type(struct rdma_device *device) > +{ > + char node_desc[24]; > + char node_str[24]; > + int node_type; > + > + if (rdma_read_sysfs_file(device->ibdev_path, "node_type", > + node_desc, sizeof(node_desc)) < 0) > + return RDMA_NODE_UNKNOWN; > + > + sscanf(node_desc, "%d: %s\n", (int*)&node_type, node_str); > + return (enum rdma_node_type) node_type; > +} > + > +struct rdma_context 
*rdma_open_device(struct rdma_device *device) > +{ > + char *devpath; > + int cmd_fd; > + struct rdma_context *context; > + > + asprintf(&devpath, "/dev/infiniband/%s", device->dev_name); > + > + /* > + * We'll only be doing writes, but we need O_RDWR in case the > + * provider needs to mmap() the file. > + */ > + cmd_fd = open(devpath, O_RDWR); > + free(devpath); > + > + if (cmd_fd < 0) > + return NULL; > + > + device->node_type = query_node_type(device); > + > + context = device->ops.alloc_context(device, cmd_fd); > + if (!context) > + goto err; > + > + context->device = device; > + context->cmd_fd = cmd_fd; > + > + return context; > + > +err: > + close(cmd_fd); > + > + return NULL; > +} > + > +int rdma_close_device(struct rdma_context *context) > +{ > + int async_fd = context->async_fd; > + int cmd_fd = context->cmd_fd; > + int cq_fd = -1; > + > + if (abi_ver <= 2) { > + struct rdma_abi_compat_v2 *t = context->abi_compat; > + cq_fd = t->channel.fd; > + free(context->abi_compat); > + } > + > + context->device->ops.free_context(context); > + > + close(async_fd); > + close(cmd_fd); > + if (abi_ver <= 2) > + close(cq_fd); > + > + return 0; > +} > + > +int rdma_get_async_event(struct rdma_context *context, > + struct rdma_async_event *event) > +{ > + struct rdma_kern_async_event ev; > + > + if (read(context->async_fd, &ev, sizeof ev) != sizeof ev) > + return -1; > + > + event->event_type = ev.event_type; > + > + switch (event->event_type) { > + case RDMA_EVENT_CQ_ERR: > + event->element.cq = (void *) (uintptr_t) ev.element; > + break; > + > + case RDMA_EVENT_QP_FATAL: > + case RDMA_EVENT_QP_REQ_ERR: > + case RDMA_EVENT_QP_ACCESS_ERR: > + case RDMA_EVENT_COMM_EST: > + case RDMA_EVENT_SQ_DRAINED: > + case RDMA_EVENT_PATH_MIG: > + case RDMA_EVENT_PATH_MIG_ERR: > + case RDMA_EVENT_QP_LAST_WQE_REACHED: > + event->element.qp = (void *) (uintptr_t) ev.element; > + break; > + > + case RDMA_EVENT_SRQ_ERR: > + case RDMA_EVENT_SRQ_LIMIT_REACHED: > + event->element.srq = (void 
*) (uintptr_t) ev.element; > + break; > + > + default: > + event->element.port_num = ev.element; > + break; > + } > + > + return 0; > +} > + > +void rdma_ack_async_event(struct rdma_async_event *event) > +{ > + switch (event->event_type) { > + case RDMA_EVENT_CQ_ERR: > + { > + struct rdma_cq *cq = event->element.cq; > + > + pthread_mutex_lock(&cq->mutex); > + ++cq->async_events_completed; > + pthread_cond_signal(&cq->cond); > + pthread_mutex_unlock(&cq->mutex); > + > + return; > + } > + > + case RDMA_EVENT_QP_FATAL: > + case RDMA_EVENT_QP_REQ_ERR: > + case RDMA_EVENT_QP_ACCESS_ERR: > + case RDMA_EVENT_COMM_EST: > + case RDMA_EVENT_SQ_DRAINED: > + case RDMA_EVENT_PATH_MIG: > + case RDMA_EVENT_PATH_MIG_ERR: > + case RDMA_EVENT_QP_LAST_WQE_REACHED: > + { > + struct rdma_qp *qp = event->element.qp; > + > + pthread_mutex_lock(&qp->mutex); > + ++qp->events_completed; > + pthread_cond_signal(&qp->cond); > + pthread_mutex_unlock(&qp->mutex); > + > + return; > + } > + > + case RDMA_EVENT_SRQ_ERR: > + case RDMA_EVENT_SRQ_LIMIT_REACHED: > + { > + struct rdma_srq *srq = event->element.srq; > + > + pthread_mutex_lock(&srq->mutex); > + ++srq->events_completed; > + pthread_cond_signal(&srq->cond); > + pthread_mutex_unlock(&srq->mutex); > + > + return; > + } > + > + default: > + return; > + } > +} > diff -ruNp ORG/librdmaverbs/src/init.c NEW/librdmaverbs/src/init.c > --- ORG/librdmaverbs/src/init.c 1969-12-31 16:00:00.000000000 -0800 > +++ NEW/librdmaverbs/src/init.c 2006-07-10 18:07:47.000000000 -0700 > @@ -0,0 +1,274 @@ > +/* > + * Copyright (c) 2004, 2005 Topspin Communications. All rights > reserved. > + * Copyright (c) 2006 Cisco Systems, Inc. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. 
You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. 
> + * > + * $Id: init.c 7631 2006-06-02 19:53:25Z swise $ > + */ > + > +#if HAVE_CONFIG_H > +# include > +#endif /* HAVE_CONFIG_H */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include "rdmaverbs.h" > + > +#ifndef OPENRDMA_DRIVER_PATH_ENV > +# define OPENRDMA_DRIVER_PATH_ENV "OPENRDMA_DRIVER_PATH" > +#endif > + > +HIDDEN int abi_ver; > + > +static char default_path[] = DRIVER_PATH; > +static const char *user_path; > + > +static struct rdma_driver *driver_list; > + > +static void load_driver(char *so_path) > +{ > + void *dlhandle; > + rdma_driver_init_func init_func; > + struct rdma_driver *driver; > + > + dlhandle = dlopen(so_path, RTLD_NOW); > + if (!dlhandle) { > + fprintf(stderr, PFX "Warning: couldn't load driver %s: %s\n", > + so_path, dlerror()); > + return; > + } > + > + dlerror(); > + init_func = dlsym(dlhandle, "rdma_driver_init"); > + if (dlerror() != NULL || !init_func) { > + dlclose(dlhandle); > + return; > + } > + > + driver = malloc(sizeof *driver); > + if (!driver) { > + fprintf(stderr, PFX "Fatal: couldn't allocate driver for %s\n", so_path); > + dlclose(dlhandle); > + return; > + } > + > + driver->init_func = init_func; > + driver->next = driver_list; > + driver_list = driver; > +} > + > +static void find_drivers(char *dir) > +{ > + size_t len = strlen(dir); > + glob_t so_glob; > + char *pat; > + int ret; > + int i; > + > + if (!len) > + return; > + > + while (len && dir[len - 1] == '/') > + dir[--len] = '\0'; > + > + asprintf(&pat, "%s/*.so", dir); > + > + ret = glob(pat, 0, NULL, &so_glob); > + > + if (ret) { > + if (ret != GLOB_NOMATCH) > + fprintf(stderr, PFX "Warning: couldn't search %s\n", pat); > + free(pat); > + return; > + } > + > + free(pat); > + > + for (i = 0; i < so_glob.gl_pathc; ++i) > + load_driver(so_glob.gl_pathv[i]); > + > + globfree(&so_glob); > +} > + > +static struct rdma_device *init_drivers(const char *class_path, > + const char *dev_name) > +{ > + struct rdma_driver *driver; > 
+ struct rdma_device *dev; > + int abi_ver = 0; > + char sys_path[RDMA_SYSFS_PATH_MAX]; > + char ibdev_name[RDMA_SYSFS_NAME_MAX]; > + char value[8]; > + > + snprintf(sys_path, sizeof sys_path, "%s/%s", > + class_path, dev_name); > + > + if (rdma_read_sysfs_file(sys_path, "abi_version", value, sizeof value) > 0) > + abi_ver = strtol(value, NULL, 10); > + > + if (rdma_read_sysfs_file(sys_path, "ibdev", ibdev_name, sizeof ibdev_name) < 0) { > + fprintf(stderr, PFX "Warning: no ibdev class attr for %s\n", > + sys_path); > + return NULL; > + } > + > + for (driver = driver_list; driver; driver = driver->next) { > + dev = driver->init_func(sys_path, abi_ver); > + if (!dev) > + continue; > + > + dev->driver = driver; > + strcpy(dev->dev_path, sys_path); > + snprintf(dev->ibdev_path, RDMA_SYSFS_PATH_MAX, "%s/class/infiniband/%s", > + rdma_get_sysfs_path(), ibdev_name); > + strcpy(dev->dev_name, dev_name); > + strcpy(dev->name, ibdev_name); > + > + return dev; > + } > + > + fprintf(stderr, PFX "Warning: no userspace device-specific driver found for %s\n" > + " driver search path: ", dev_name); > + if (user_path) > + fprintf(stderr, "%s:", user_path); > + fprintf(stderr, "%s\n", default_path); > + > + return NULL; > +} > + > +static int check_abi_version(const char *path) > +{ > + char value[8]; > + > + if (rdma_read_sysfs_file(path, "class/infiniband_verbs/abi_version", > + value, sizeof value) < 0) { > + fprintf(stderr, PFX "Fatal: couldn't read uverbs ABI version.\n"); > + return -1; > + } > + > + abi_ver = strtol(value, NULL, 10); > + > + if (abi_ver < RDMA_USER_VERBS_MIN_ABI_VERSION || > + abi_ver > RDMA_USER_VERBS_MAX_ABI_VERSION) { > + fprintf(stderr, PFX "Fatal: kernel ABI version %d " > + "doesn't match library version %d.\n", > + abi_ver, RDMA_USER_VERBS_MAX_ABI_VERSION); > + return -1; > + } > + > + return 0; > +} > + > +HIDDEN int rdmaverbs_init(struct rdma_device ***list) > +{ > + const char *sysfs_path; > + char *wr_path, *dir; > + char 
class_path[RDMA_SYSFS_PATH_MAX]; > + DIR *class_dir; > + struct dirent *dent; > + struct rdma_device *device; > + struct rdma_device **new_list; > + int num_devices = 0; > + int list_size = 0; > + > + *list = NULL; > + > + if (rdma_init_mem_map()) > + return 0; > + > + find_drivers(default_path); > + > + /* > + * Only follow use path passed in through the calling user's > + * environment if we're not running SUID. > + */ > + if (getuid() == geteuid()) { > + user_path = getenv(OPENRDMA_DRIVER_PATH_ENV); > + if (user_path) { > + wr_path = strdupa(user_path); > + while ((dir = strsep(&wr_path, ";:"))) > + find_drivers(dir); > + } > + } > + > + /* > + * Now check if a driver is statically linked. Since we push > + * drivers onto our driver list, the last driver we find will > + * be the first one we try. > + */ > + load_driver(NULL); > + > + sysfs_path = rdma_get_sysfs_path(); > + if (!sysfs_path) { > + fprintf(stderr, PFX "Fatal: couldn't find sysfs mount.\n"); > + return 0; > + } > + > + if (check_abi_version(sysfs_path)) > + return 0; > + > + snprintf(class_path, sizeof class_path, "%s/class/infiniband_verbs", > + sysfs_path); > + class_dir = opendir(class_path); > + if (!class_dir) { > + fprintf(stderr, PFX "Fatal: couldn't open sysfs class " > + "directory '%s'.\n", class_path); > + return 0; > + } > + > + while ((dent = readdir(class_dir))) { > + if (dent->d_name[0] == '.' || dent->d_type == DT_REG) > + continue; > + > + device = init_drivers(class_path, dent->d_name); > + if (!device) > + continue; > + > + if (list_size <= num_devices) { > + list_size = list_size ? 
list_size * 2 : 1; > + new_list = realloc(*list, list_size * sizeof (struct rdma_device > *)); > + if (!new_list) > + goto out; > + *list = new_list; > + } > + > + (*list)[num_devices++] = device; > + } > + > + closedir(class_dir); > + > +out: > + return num_devices; > +} > diff -ruNp ORG/librdmaverbs/src/librdmaverbs.map > NEW/librdmaverbs/src/librdmaverbs.map > --- ORG/librdmaverbs/src/librdmaverbs.map 1969-12-31 16:00:00.000000000 > -0800 > +++ NEW/librdmaverbs/src/librdmaverbs.map 2006-07-10 18:07:47.000000000 > -0700 > @@ -0,0 +1,76 @@ > +IBVERBS_1.0 { > + global: > + rdma_get_device_list; > + rdma_free_device_list; > + rdma_get_device_name; > + rdma_get_device_guid; > + rdma_open_device; > + rdma_close_device; > + rdma_get_async_event; > + rdma_ack_async_event; > + rdma_query_device; > + rdma_query_port; > + rdma_query_gid; > + rdma_query_pkey; > + rdma_alloc_pd; > + rdma_dealloc_pd; > + rdma_reg_mr; > + rdma_dereg_mr; > + rdma_create_comp_channel; > + rdma_destroy_comp_channel; > + rdma_create_cq; > + rdma_resize_cq; > + rdma_destroy_cq; > + rdma_get_cq_event; > + rdma_ack_cq_events; > + rdma_create_srq; > + rdma_modify_srq; > + rdma_query_srq; > + rdma_destroy_srq; > + rdmav_create_qp; > + rdma_query_qp; > + rdma_modify_qp; > + rdmav_destroy_qp; > + rdma_create_ah; > + rdma_destroy_ah; > + rdma_attach_mcast; > + rdma_detach_mcast; > + rdma_cmd_get_context; > + rdma_cmd_query_device; > + rdma_cmd_query_port; > + rdma_cmd_query_gid; > + rdma_cmd_query_pkey; > + rdma_cmd_alloc_pd; > + rdma_cmd_dealloc_pd; > + rdma_cmd_reg_mr; > + rdma_cmd_dereg_mr; > + rdma_cmd_create_cq; > + rdma_cmd_poll_cq; > + rdma_cmd_req_notify_cq; > + rdma_cmd_resize_cq; > + rdma_cmd_destroy_cq; > + rdma_cmd_create_srq; > + rdma_cmd_modify_srq; > + rdma_cmd_query_srq; > + rdma_cmd_destroy_srq; > + rdma_cmd_create_qp; > + rdma_cmd_query_qp; > + rdma_cmd_modify_qp; > + rdma_cmd_destroy_qp; > + rdma_cmd_post_send; > + rdma_cmd_post_recv; > + rdma_cmd_post_srq_recv; > + rdma_cmd_create_ah; 
> + rdma_cmd_destroy_ah; > + rdma_cmd_attach_mcast; > + rdma_cmd_detach_mcast; > + rdma_copy_qp_attr_from_kern; > + rdma_copy_path_rec_from_kern; > + rdma_copy_path_rec_to_kern; > + rdma_rate_to_mult; > + mult_to_rdma_rate; > + rdma_get_sysfs_path; > + rdma_read_sysfs_file; > + > + local: *; > +}; > diff -ruNp ORG/librdmaverbs/src/marshall.c > NEW/librdmaverbs/src/marshall.c > --- ORG/librdmaverbs/src/marshall.c 1969-12-31 16:00:00.000000000 -0800 > +++ NEW/librdmaverbs/src/marshall.c 2006-07-10 18:07:47.000000000 -0700 > @@ -0,0 +1,142 @@ > +/* > + * Copyright (c) 2005 Intel Corporation. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. > + */ > + > +#if HAVE_CONFIG_H > +# include > +#endif /* HAVE_CONFIG_H */ > + > +#include > + > +#include > + > +static void rdma_copy_ah_attr_from_kern(struct rdma_ah_attr *dst, > + struct rdma_kern_ah_attr *src) > +{ > + memcpy(dst->grh.dgid.raw, src->grh.dgid, sizeof dst->grh.dgid); > + dst->grh.flow_label = src->grh.flow_label; > + dst->grh.sgid_index = src->grh.sgid_index; > + dst->grh.hop_limit = src->grh.hop_limit; > + dst->grh.traffic_class = src->grh.traffic_class; > + > + dst->dlid = src->dlid; > + dst->sl = src->sl; > + dst->src_path_bits = src->src_path_bits; > + dst->static_rate = src->static_rate; > + dst->is_global = src->is_global; > + dst->port_num = src->port_num; > +} > + > +void rdma_copy_qp_attr_from_kern(struct rdma_qp_attr *dst, > + struct rdma_kern_qp_attr *src) > +{ > + dst->cur_qp_state = src->cur_qp_state; > + dst->path_mtu = src->path_mtu; > + dst->path_mig_state = src->path_mig_state; > + dst->qkey = src->qkey; > + dst->rq_psn = src->rq_psn; > + dst->sq_psn = src->sq_psn; > + dst->dest_qp_num = src->dest_qp_num; > + dst->qp_access_flags = src->qp_access_flags; > + > + dst->cap.max_send_wr = src->max_send_wr; > + dst->cap.max_recv_wr = src->max_recv_wr; > + dst->cap.max_send_sge = src->max_send_sge; > + dst->cap.max_recv_sge = src->max_recv_sge; > + dst->cap.max_inline_data = src->max_inline_data; > + > + rdma_copy_ah_attr_from_kern(&dst->ah_attr, &src->ah_attr); > + rdma_copy_ah_attr_from_kern(&dst->alt_ah_attr, &src->alt_ah_attr); > + > + dst->pkey_index = src->pkey_index; > + dst->alt_pkey_index = src->alt_pkey_index; > + dst->en_sqd_async_notify = src->en_sqd_async_notify; > + dst->sq_draining = src->sq_draining; > + dst->max_rd_atomic = 
src->max_rd_atomic; > + dst->max_dest_rd_atomic = src->max_dest_rd_atomic; > + dst->min_rnr_timer = src->min_rnr_timer; > + dst->port_num = src->port_num; > + dst->timeout = src->timeout; > + dst->retry_cnt = src->retry_cnt; > + dst->rnr_retry = src->rnr_retry; > + dst->alt_port_num = src->alt_port_num; > + dst->alt_timeout = src->alt_timeout; > +} > + > +void rdma_copy_path_rec_from_kern(struct rdma_sa_path_rec *dst, > + struct rdma_kern_path_rec *src) > +{ > + memcpy(dst->dgid.raw, src->dgid, sizeof dst->dgid); > + memcpy(dst->sgid.raw, src->sgid, sizeof dst->sgid); > + > + dst->dlid = src->dlid; > + dst->slid = src->slid; > + dst->raw_traffic = src->raw_traffic; > + dst->flow_label = src->flow_label; > + dst->hop_limit = src->hop_limit; > + dst->traffic_class = src->traffic_class; > + dst->reversible = src->reversible; > + dst->numb_path = src->numb_path; > + dst->pkey = src->pkey; > + dst->sl = src->sl; > + dst->mtu_selector = src->mtu_selector; > + dst->mtu = src->mtu; > + dst->rate_selector = src->rate_selector; > + dst->rate = src->rate; > + dst->packet_life_time = src->packet_life_time; > + dst->preference = src->preference; > + dst->packet_life_time_selector = src->packet_life_time_selector; > +} > + > +void rdma_copy_path_rec_to_kern(struct rdma_kern_path_rec *dst, > + struct rdma_sa_path_rec *src) > +{ > + memcpy(dst->dgid, src->dgid.raw, sizeof src->dgid); > + memcpy(dst->sgid, src->sgid.raw, sizeof src->sgid); > + > + dst->dlid = src->dlid; > + dst->slid = src->slid; > + dst->raw_traffic = src->raw_traffic; > + dst->flow_label = src->flow_label; > + dst->hop_limit = src->hop_limit; > + dst->traffic_class = src->traffic_class; > + dst->reversible = src->reversible; > + dst->numb_path = src->numb_path; > + dst->pkey = src->pkey; > + dst->sl = src->sl; > + dst->mtu_selector = src->mtu_selector; > + dst->mtu = src->mtu; > + dst->rate_selector = src->rate_selector; > + dst->rate = src->rate; > + dst->packet_life_time = src->packet_life_time; > + 
dst->preference = src->preference; > + dst->packet_life_time_selector = src->packet_life_time_selector; > +} > diff -ruNp ORG/librdmaverbs/src/memory.c NEW/librdmaverbs/src/memory.c > --- ORG/librdmaverbs/src/memory.c 1969-12-31 16:00:00.000000000 -0800 > +++ NEW/librdmaverbs/src/memory.c 2006-07-10 18:07:47.000000000 -0700 > @@ -0,0 +1,258 @@ > +/* > + * Copyright (c) 2004, 2005 Topspin Communications. All rights > reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. 
> + * > + * $Id: memory.c 6987 2006-05-08 15:18:51Z tom $ > + */ > + > +#if HAVE_CONFIG_H > +# include > +#endif /* HAVE_CONFIG_H */ > + > +#include > +#include > +#include > +#include > + > +#include "rdmaverbs.h" > + > +/* > + * We keep a linked list of page ranges that have been locked along > with a > + * reference count to manage overlapping registrations, etc. > + * > + * Eventually we should turn this into an RB-tree or something similar > + * to avoid the O(n) cost of registering/unregistering memory. > + */ > + > +struct rdma_mem_node { > + struct rdma_mem_node *prev, *next; > + uintptr_t start, end; > + int refcnt; > +}; > + > +static struct { > + struct rdma_mem_node *first; > + pthread_mutex_t mutex; > + uintptr_t page_size; > +} mem_map; > + > +int rdma_init_mem_map(void) > +{ > + struct rdma_mem_node *node = NULL; > + > + node = malloc(sizeof *node); > + if (!node) > + goto fail; > + > + node->prev = node->next = NULL; > + node->start = 0; > + node->end = UINTPTR_MAX; > + node->refcnt = 0; > + > + mem_map.first = node; > + > + mem_map.page_size = sysconf(_SC_PAGESIZE); > + if (mem_map.page_size < 0) > + goto fail; > + > + if (pthread_mutex_init(&mem_map.mutex, NULL)) > + goto fail; > + > + return 0; > + > +fail: > + if (node) > + free(node); > + > + return -1; > +} > + > +static struct rdma_mem_node *__mm_find_first(uintptr_t start, uintptr_t > end) > +{ > + struct rdma_mem_node *node = mem_map.first; > + > + while (node) { > + if ((node->start <= start && node->end >= start) || > + (node->start <= end && node->end >= end)) > + break; > + node = node->next; > + } > + > + return node; > +} > + > +static struct rdma_mem_node *__mm_prev(struct rdma_mem_node *node) > +{ > + return node->prev; > +} > + > +static struct rdma_mem_node *__mm_next(struct rdma_mem_node *node) > +{ > + return node->next; > +} > + > +static void __mm_add(struct rdma_mem_node *node, > + struct rdma_mem_node *new) > +{ > + new->prev = node; > + new->next = node->next; > + 
node->next = new; > + if (new->next) > + new->next->prev = new; > +} > + > +static void __mm_remove(struct rdma_mem_node *node) > +{ > + /* Never have to remove the first node, so we can use prev */ > + node->prev->next = node->next; > + if (node->next) > + node->next->prev = node->prev; > +} > + > +int rdma_lock_range(void *base, size_t size) > +{ > + uintptr_t start, end; > + struct rdma_mem_node *node, *tmp; > + int ret = 0; > + > + if (!size) > + return 0; > + > + start = (uintptr_t) base & ~(mem_map.page_size - 1); > + end = ((uintptr_t) (base + size + mem_map.page_size - 1) & > + ~(mem_map.page_size - 1)) - 1; > + > + pthread_mutex_lock(&mem_map.mutex); > + > + node = __mm_find_first(start, end); > + > + if (node->start < start) { > + tmp = malloc(sizeof *tmp); > + if (!tmp) { > + ret = -1; > + goto out; > + } > + > + tmp->start = start; > + tmp->end = node->end; > + tmp->refcnt = node->refcnt; > + node->end = start - 1; > + > + __mm_add(node, tmp); > + node = tmp; > + } > + > + while (node->start <= end) { > + if (node->end > end) { > + tmp = malloc(sizeof *tmp); > + if (!tmp) { > + ret = -1; > + goto out; > + } > + > + tmp->start = end + 1; > + tmp->end = node->end; > + tmp->refcnt = node->refcnt; > + node->end = end; > + > + __mm_add(node, tmp); > + } > + > + > + if (node->refcnt++ == 0) { > + ret = mlock((void *) node->start, > + node->end - node->start + 1); > + if (ret) > + goto out; > + } > + > + node = __mm_next(node); > + } > + > +out: > + pthread_mutex_unlock(&mem_map.mutex); > + > + return ret; > +} > + > +int rdma_unlock_range(void *base, size_t size) > +{ > + uintptr_t start, end; > + struct rdma_mem_node *node, *tmp; > + int ret = 0; > + > + if (!size) > + return 0; > + > + start = (uintptr_t) base & ~(mem_map.page_size - 1); > + end = ((uintptr_t) (base + size + mem_map.page_size - 1) & > + ~(mem_map.page_size - 1)) - 1; > + > + pthread_mutex_lock(&mem_map.mutex); > + > + node = __mm_find_first(start, end); > + > + if (node->start != start) { > 
+ ret = -1; > + goto out; > + } > + > + while (node && node->end <= end) { > + if (--node->refcnt == 0) { > + ret = munlock((void *) node->start, > + node->end - node->start + 1); > + } > + > + if (__mm_prev(node) && node->refcnt == __mm_prev(node)->refcnt) { > + __mm_prev(node)->end = node->end; > + tmp = __mm_prev(node); > + __mm_remove(node); > + node = tmp; > + } > + > + node = __mm_next(node); > + } > + > + if (node && node->refcnt == __mm_prev(node)->refcnt) { > + __mm_prev(node)->end = node->end; > + tmp = __mm_prev(node); > + __mm_remove(node); > + } > + > + if (node->end != end) { > + ret = -1; > + goto out; > + } > + > +out: > + pthread_mutex_unlock(&mem_map.mutex); > + > + return ret; > +} > diff -ruNp ORG/librdmaverbs/src/rdmaverbs.h > NEW/librdmaverbs/src/rdmaverbs.h > --- ORG/librdmaverbs/src/rdmaverbs.h 1969-12-31 16:00:00.000000000 -0800 > +++ NEW/librdmaverbs/src/rdmaverbs.h 2006-07-10 18:07:47.000000000 -0700 > @@ -0,0 +1,87 @@ > +/* > + * Copyright (c) 2004, 2005 Topspin Communications. All rights > reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. 
> + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. > + * > + */ > + > +#ifndef RDMA_VERBS_H > +#define RDMA_VERBS_H > + > +#include > + > +#include > + > +#define HIDDEN __attribute__((visibility ("hidden"))) > + > +#define INIT __attribute__((constructor)) > +#define FINI __attribute__((destructor)) > + > +#define PFX "librdmaverbs: " > + > +struct rdma_driver { > + rdma_driver_init_func init_func; > + struct rdma_driver *next; > +}; > + > +struct rdma_abi_compat_v2 { > + struct rdma_comp_channel channel; > + pthread_mutex_t in_use; > +}; > + > +extern HIDDEN int abi_ver; > + > +extern HIDDEN int rdmaverbs_init(struct rdma_device ***list); > + > +extern HIDDEN int rdma_init_mem_map(void); > +extern HIDDEN int rdma_lock_range(void *base, size_t size); > +extern HIDDEN int rdma_unlock_range(void *base, size_t size); > + > +#define RDMA_INIT_CMD(cmd, size, opcode) \ > + do { \ > + if (abi_ver > 2) \ > + (cmd)->command = RDMA_USER_VERBS_CMD_##opcode; \ > + else \ > + (cmd)->command = RDMA_USER_VERBS_CMD_##opcode##_V2; \ > + (cmd)->in_words = (size) / 4; \ > + (cmd)->out_words = 0; \ > + } while (0) > + > +#define RDMA_INIT_CMD_RESP(cmd, size, opcode, out, outsize) \ > + do { \ > + if (abi_ver > 2) \ > + (cmd)->command = RDMA_USER_VERBS_CMD_##opcode; \ > + else \ > + (cmd)->command = RDMA_USER_VERBS_CMD_##opcode##_V2; \ > + (cmd)->in_words = (size) / 4; \ > + (cmd)->out_words = (outsize) / 4; \ > + (cmd)->response = (uintptr_t) (out); \ > + } while (0) > + > +#endif /* RDMA_VERBS_H */ > diff -ruNp 
ORG/librdmaverbs/src/sysfs.c NEW/librdmaverbs/src/sysfs.c > --- ORG/librdmaverbs/src/sysfs.c 1969-12-31 16:00:00.000000000 -0800 > +++ NEW/librdmaverbs/src/sysfs.c 2006-07-10 18:07:47.000000000 -0700 > @@ -0,0 +1,104 @@ > +/* > + * Copyright (c) 2006 Cisco Systems, Inc. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. 
> + * > + * $Id$ > + */ > + > +#if HAVE_CONFIG_H > +# include > +#endif /* HAVE_CONFIG_H */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include "rdmaverbs.h" > + > +static char *sysfs_path; > + > +const char *rdma_get_sysfs_path(void) > +{ > + char *env = NULL; > + > + if (sysfs_path) > + return sysfs_path; > + > + /* > + * Only follow use path passed in through the calling user's > + * environment if we're not running SUID. > + */ > + if (getuid() == geteuid()) > + env = getenv("SYSFS_PATH"); > + > + if (env) { > + int len; > + > + sysfs_path = strndup(env, RDMA_SYSFS_PATH_MAX); > + len = strlen(sysfs_path); > + while (len > 0 && sysfs_path[len - 1] == '/') { > + --len; > + sysfs_path[len] = '\0'; > + } > + } else > + sysfs_path = "/sys"; > + > + return sysfs_path; > +} > + > +int rdma_read_sysfs_file(const char *dir, const char *file, > + char *buf, size_t size) > +{ > + char *path; > + int fd; > + int len; > + > + asprintf(&path, "%s/%s", dir, file); > + > + fd = open(path, O_RDONLY); > + if (fd < 0) { > + free(path); > + return -1; > + } > + > + len = read(fd, buf, size); > + > + close(fd); > + free(path); > + > + if (len > 0 && buf[len - 1] == '\n') > + buf[--len] = '\0'; > + > + return len; > +} > diff -ruNp ORG/librdmaverbs/src/verbs.c NEW/librdmaverbs/src/verbs.c > --- ORG/librdmaverbs/src/verbs.c 1969-12-31 16:00:00.000000000 -0800 > +++ NEW/librdmaverbs/src/verbs.c 2006-07-10 18:07:47.000000000 -0700 > @@ -0,0 +1,408 @@ > +/* > + * Copyright (c) 2005 Topspin Communications. All rights reserved. > + * Copyright (c) 2006 Cisco Systems, Inc. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. 
You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. 
> + * > + * $Id: verbs.c 7631 2006-06-02 19:53:25Z swise $ > + */ > + > +#if HAVE_CONFIG_H > +# include > +#endif /* HAVE_CONFIG_H */ > + > +#include > +#include > +#include > +#include > +#include > + > +#include "rdmaverbs.h" > + > +int rdma_rate_to_mult(enum rdma_rate rate) > +{ > + switch (rate) { > + case RDMA_RATE_2_5_GBPS: return 1; > + case RDMA_RATE_5_GBPS: return 2; > + case RDMA_RATE_10_GBPS: return 4; > + case RDMA_RATE_20_GBPS: return 8; > + case RDMA_RATE_30_GBPS: return 12; > + case RDMA_RATE_40_GBPS: return 16; > + case RDMA_RATE_60_GBPS: return 24; > + case RDMA_RATE_80_GBPS: return 32; > + case RDMA_RATE_120_GBPS: return 48; > + default: return -1; > + } > +} > + > +enum rdma_rate mult_to_rdma_rate(int mult) > +{ > + switch (mult) { > + case 1: return RDMA_RATE_2_5_GBPS; > + case 2: return RDMA_RATE_5_GBPS; > + case 4: return RDMA_RATE_10_GBPS; > + case 8: return RDMA_RATE_20_GBPS; > + case 12: return RDMA_RATE_30_GBPS; > + case 16: return RDMA_RATE_40_GBPS; > + case 24: return RDMA_RATE_60_GBPS; > + case 32: return RDMA_RATE_80_GBPS; > + case 48: return RDMA_RATE_120_GBPS; > + default: return RDMA_RATE_MAX; > + } > +} > + > +int rdma_query_device(struct rdma_context *context, > + struct rdma_device_attr *device_attr) > +{ > + return context->ops.query_device(context, device_attr); > +} > + > +int rdma_query_port(struct rdma_context *context, uint8_t port_num, > + struct rdma_port_attr *port_attr) > +{ > + return context->ops.query_port(context, port_num, port_attr); > +} > + > +int rdma_query_gid(struct rdma_context *context, uint8_t port_num, > + int index, union rdma_gid *gid) > +{ > + char name[24]; > + char attr[41]; > + uint16_t val; > + int i; > + > + snprintf(name, sizeof name, "ports/%d/gids/%d", port_num, index); > + > + if (rdma_read_sysfs_file(context->device->ibdev_path, name, > + attr, sizeof attr) < 0) > + return -1; > + > + for (i = 0; i < 8; ++i) { > + if (sscanf(attr + i * 5, "%hx", &val) != 1) > + return -1; > + gid->raw[i * 2 ] 
= val >> 8; > + gid->raw[i * 2 + 1] = val & 0xff; > + } > + > + return 0; > +} > + > +int rdma_query_pkey(struct rdma_context *context, uint8_t port_num, > + int index, uint16_t *pkey) > +{ > + char name[24]; > + char attr[8]; > + uint16_t val; > + > + snprintf(name, sizeof name, "ports/%d/pkeys/%d", port_num, index); > + > + if (rdma_read_sysfs_file(context->device->ibdev_path, name, > + attr, sizeof attr) < 0) > + return -1; > + > + if (sscanf(attr, "%hx", &val) != 1) > + return -1; > + > + *pkey = htons(val); > + return 0; > +} > + > +struct rdma_pd *rdma_alloc_pd(struct rdma_context *context) > +{ > + struct rdma_pd *pd; > + > + pd = context->ops.alloc_pd(context); > + if (pd) > + pd->context = context; > + > + return pd; > +} > + > +int rdma_dealloc_pd(struct rdma_pd *pd) > +{ > + return pd->context->ops.dealloc_pd(pd); > +} > + > +struct rdma_mr *rdma_reg_mr(struct rdma_pd *pd, void *addr, > + size_t length, enum rdma_access_flags access) > +{ > + struct rdma_mr *mr; > + > + mr = pd->context->ops.reg_mr(pd, addr, length, access); > + if (mr) { > + mr->context = pd->context; > + mr->pd = pd; > + } > + > + return mr; > +} > + > +int rdma_dereg_mr(struct rdma_mr *mr) > +{ > + return mr->context->ops.dereg_mr(mr); > +} > + > +static struct rdma_comp_channel *rdma_create_comp_channel_v2(struct > rdma_context *context) > +{ > + struct rdma_abi_compat_v2 *t = context->abi_compat; > + static int warned; > + > + if (!pthread_mutex_trylock(&t->in_use)) > + return &t->channel; > + > + if (!warned) { > + fprintf(stderr, PFX "Warning: kernel's ABI version %d limits > capacity.\n" > + " Only one completion channel can be created per context.\n", > + abi_ver); > + ++warned; > + } > + > + return NULL; > +} > + > +struct rdma_comp_channel *rdma_create_comp_channel(struct rdma_context > *context) > +{ > + struct rdma_comp_channel *channel; > + struct rdma_create_comp_channel cmd; > + struct rdma_create_comp_channel_resp resp; > + > + if (abi_ver <= 2) > + return 
rdma_create_comp_channel_v2(context); > + > + channel = malloc(sizeof *channel); > + if (!channel) > + return NULL; > + > + RDMA_INIT_CMD_RESP(&cmd, sizeof cmd, CREATE_COMP_CHANNEL, &resp, > sizeof resp); > + if (write(context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) { > + free(channel); > + return NULL; > + } > + > + channel->fd = resp.fd; > + > + return channel; > +} > + > +static int rdma_destroy_comp_channel_v2(struct rdma_comp_channel > *channel) > +{ > + struct rdma_abi_compat_v2 *t = (struct rdma_abi_compat_v2 *) channel; > + pthread_mutex_unlock(&t->in_use); > + return 0; > +} > + > +int rdma_destroy_comp_channel(struct rdma_comp_channel *channel) > +{ > + if (abi_ver <= 2) > + return rdma_destroy_comp_channel_v2(channel); > + > + close(channel->fd); > + free(channel); > + > + return 0; > +} > + > +struct rdma_cq *rdma_create_cq(struct rdma_context *context, int cqe, > void *cq_context, > + struct rdma_comp_channel *channel, int comp_vector) > +{ > + struct rdma_cq *cq = context->ops.create_cq(context, cqe, channel, > + comp_vector); > + > + if (cq) { > + cq->context = context; > + cq->cq_context = cq_context; > + cq->comp_events_completed = 0; > + cq->async_events_completed = 0; > + pthread_mutex_init(&cq->mutex, NULL); > + pthread_cond_init(&cq->cond, NULL); > + } > + > + return cq; > +} > + > +int rdma_resize_cq(struct rdma_cq *cq, int cqe) > +{ > + if (!cq->context->ops.resize_cq) > + return ENOSYS; > + > + return cq->context->ops.resize_cq(cq, cqe); > +} > + > +int rdma_destroy_cq(struct rdma_cq *cq) > +{ > + return cq->context->ops.destroy_cq(cq); > +} > + > + > +int rdma_get_cq_event(struct rdma_comp_channel *channel, > + struct rdma_cq **cq, void **cq_context) > +{ > + struct rdma_comp_event ev; > + > + if (read(channel->fd, &ev, sizeof ev) != sizeof ev) > + return -1; > + > + *cq = (struct rdma_cq *) (uintptr_t) ev.cq_handle; > + *cq_context = (*cq)->cq_context; > + > + if ((*cq)->context->ops.cq_event) > + (*cq)->context->ops.cq_event(*cq); > + > 
+ return 0; > +} > + > +void rdma_ack_cq_events(struct rdma_cq *cq, unsigned int nevents) > +{ > + pthread_mutex_lock(&cq->mutex); > + cq->comp_events_completed += nevents; > + pthread_cond_signal(&cq->cond); > + pthread_mutex_unlock(&cq->mutex); > +} > + > +struct rdma_srq *rdma_create_srq(struct rdma_pd *pd, > + struct rdma_srq_init_attr *srq_init_attr) > +{ > + struct rdma_srq *srq; > + > + if (!pd->context->ops.create_srq) > + return NULL; > + > + srq = pd->context->ops.create_srq(pd, srq_init_attr); > + if (srq) { > + srq->context = pd->context; > + srq->srq_context = srq_init_attr->srq_context; > + srq->pd = pd; > + srq->events_completed = 0; > + pthread_mutex_init(&srq->mutex, NULL); > + pthread_cond_init(&srq->cond, NULL); > + } > + > + return srq; > +} > + > +int rdma_modify_srq(struct rdma_srq *srq, > + struct rdma_srq_attr *srq_attr, > + enum rdma_srq_attr_mask srq_attr_mask) > +{ > + return srq->context->ops.modify_srq(srq, srq_attr, srq_attr_mask); > +} > + > +int rdma_query_srq(struct rdma_srq *srq, struct rdma_srq_attr > *srq_attr) > +{ > + return srq->context->ops.query_srq(srq, srq_attr); > +} > + > +int rdma_destroy_srq(struct rdma_srq *srq) > +{ > + return srq->context->ops.destroy_srq(srq); > +} > + > +struct rdma_qp *rdmav_create_qp(struct rdma_pd *pd, > + struct rdma_qp_init_attr *qp_init_attr) > +{ > + struct rdma_qp *qp = pd->context->ops.create_qp(pd, qp_init_attr); > + > + if (qp) { > + qp->context = pd->context; > + qp->qp_context = qp_init_attr->qp_context; > + qp->pd = pd; > + qp->send_cq = qp_init_attr->send_cq; > + qp->recv_cq = qp_init_attr->recv_cq; > + qp->srq = qp_init_attr->srq; > + qp->qp_type = qp_init_attr->qp_type; > + qp->events_completed = 0; > + pthread_mutex_init(&qp->mutex, NULL); > + pthread_cond_init(&qp->cond, NULL); > + } > + > + return qp; > +} > + > +int rdma_query_qp(struct rdma_qp *qp, struct rdma_qp_attr *attr, > + enum rdma_qp_attr_mask attr_mask, > + struct rdma_qp_init_attr *init_attr) > +{ > + int ret; > + > 
+ ret = qp->context->ops.query_qp(qp, attr, attr_mask, init_attr); > + if (ret) > + return ret; > + > + if (attr_mask & RDMA_QP_STATE) > + qp->state = attr->qp_state; > + > + return 0; > +} > + > +int rdma_modify_qp(struct rdma_qp *qp, struct rdma_qp_attr *attr, > + enum rdma_qp_attr_mask attr_mask) > +{ > + int ret; > + > + ret = qp->context->ops.modify_qp(qp, attr, attr_mask); > + if (ret) > + return ret; > + > + if (attr_mask & RDMA_QP_STATE) > + qp->state = attr->qp_state; > + > + return 0; > +} > + > +int rdmav_destroy_qp(struct rdma_qp *qp) > +{ > + return qp->context->ops.destroy_qp(qp); > +} > + > +struct rdma_ah *rdma_create_ah(struct rdma_pd *pd, struct rdma_ah_attr > *attr) > +{ > + struct rdma_ah *ah = pd->context->ops.create_ah(pd, attr); > + > + if (ah) { > + ah->context = pd->context; > + ah->pd = pd; > + } > + > + return ah; > +} > + > +int rdma_destroy_ah(struct rdma_ah *ah) > +{ > + return ah->context->ops.destroy_ah(ah); > +} > + > +int rdma_attach_mcast(struct rdma_qp *qp, union rdma_gid *gid, uint16_t > lid) > +{ > + return qp->context->ops.attach_mcast(qp, gid, lid); > +} > + > +int rdma_detach_mcast(struct rdma_qp *qp, union rdma_gid *gid, uint16_t > lid) > +{ > + return qp->context->ops.detach_mcast(qp, gid, lid); > +} > From krkumar2 at in.ibm.com Mon Jul 10 03:28:22 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Mon, 10 Jul 2006 15:58:22 +0530 Subject: [openib-general] [RFC] [PATCH 6/7] librdmacm changes to use new API Message-ID: <1152527302.28079.1.camel@localhost.localdomain> Changes the librdmacm library to use the librdmaverbs data types and verb API's. The example programs are also modified to use the new verb API's. 
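For anyone porting their own code the same way, the rename applied throughout the diff below can be sketched roughly as follows. This is only an illustration and is not part of the patch: ibv_to_rdma_name() is a hypothetical helper, and the rule it encodes is inferred from the hunks in this series (ibv_ becomes rdma_, IBV_ becomes RDMA_, and the verbs-level QP create/destroy calls appear as rdmav_create_qp()/rdmav_destroy_qp(), presumably because librdmacm already has rdma_create_qp()/rdma_destroy_qp()).

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of librdmaverbs or of this patch: map a
 * libibverbs identifier to the name this patch series appears to use.
 * Returns 0 on success, -1 if the name has no ibv_/IBV_ prefix or the
 * result does not fit in dst.
 */
static int ibv_to_rdma_name(const char *ibv_name, char *dst, size_t dst_len)
{
	const char *prefix = "rdma_";
	int n;

	/*
	 * Assumption drawn from the diff: the verbs-level QP calls get an
	 * "rdmav_" prefix, since the CM library already uses the plain
	 * rdma_create_qp()/rdma_destroy_qp() names.
	 */
	if (!strcmp(ibv_name, "ibv_create_qp") ||
	    !strcmp(ibv_name, "ibv_destroy_qp"))
		prefix = "rdmav_";
	else if (!strncmp(ibv_name, "IBV_", 4))
		prefix = "RDMA_";	/* enum constants and flags */
	else if (strncmp(ibv_name, "ibv_", 4))
		return -1;		/* not a libibverbs name */

	/* Skip the 4-character old prefix and prepend the new one. */
	n = snprintf(dst, dst_len, "%s%s", prefix, ibv_name + 4);
	return (n < 0 || (size_t) n >= dst_len) ? -1 : 0;
}
```

Called with "ibv_reg_mr" and a 32-byte buffer, the helper would fill in "rdma_reg_mr"; "IBV_QPT_RC" becomes "RDMA_QPT_RC". Names outside the ibv_/IBV_ namespace are rejected rather than guessed at.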
Signed-off-by: Krishna Kumar

diff -ruNp ORG/librdmacm/configure.in NEW/librdmacm/configure.in
--- ORG/librdmacm/configure.in 2006-07-09 22:12:17.000000000 -0700
+++ NEW/librdmacm/configure.in 2006-07-10 18:07:45.000000000 -0700
@@ -27,8 +27,8 @@ if test "$disable_libcheck" != "yes"
 then
 AC_CHECK_LIB(sysfs, sysfs_open_class, [],
     AC_MSG_ERROR([sysfs_open_class() not found. librdmacm requires libsysfs.]))
-AC_CHECK_LIB(ibverbs, ibv_get_device_list, [],
-    AC_MSG_ERROR([ibv_get_device_list() not found. librdmacm requires libibverbs.]))
+AC_CHECK_LIB(rdmaverbs, rdma_get_device_list, [],
+    AC_MSG_ERROR([rdma_get_device_list() not found. librdmacm requires librdmaverbs.]))
 fi

 dnl Checks for header files.
@@ -36,8 +36,8 @@ if test "$disable_libcheck" != "yes"
 then
 AC_CHECK_HEADER(sysfs/libsysfs.h, [],
     AC_MSG_ERROR([ not found. librdmacm requires libsysfs.]))
-AC_CHECK_HEADER(infiniband/verbs.h, [],
-    AC_MSG_ERROR([ not found. Is libibverbs installed?]))
+AC_CHECK_HEADER(rdma/verbs.h, [],
+    AC_MSG_ERROR([ not found.
Is librdmaverbs installed?])) fi AC_HEADER_STDC diff -ruNp ORG/librdmacm/examples/cmatose.c NEW/librdmacm/examples/cmatose.c --- ORG/librdmacm/examples/cmatose.c 2006-07-09 22:12:17.000000000 -0700 +++ NEW/librdmacm/examples/cmatose.c 2006-07-10 18:07:45.000000000 -0700 @@ -62,9 +62,9 @@ struct cmatest_node { int id; struct rdma_cm_id *cma_id; int connected; - struct ibv_pd *pd; - struct ibv_cq *cq; - struct ibv_mr *mr; + struct rdma_pd *pd; + struct rdma_cq *cq; + struct rdma_mr *mr; void *mem; }; @@ -100,8 +100,8 @@ static int create_message(struct cmatest printf("failed message allocation\n"); return -1; } - node->mr = ibv_reg_mr(node->pd, node->mem, message_size, - IBV_ACCESS_LOCAL_WRITE); + node->mr = rdma_reg_mr(node->pd, node->mem, message_size, + RDMA_ACCESS_LOCAL_WRITE); if (!node->mr) { printf("failed to reg MR\n"); goto err; @@ -114,10 +114,10 @@ err: static int init_node(struct cmatest_node *node) { - struct ibv_qp_init_attr init_qp_attr; + struct rdma_qp_init_attr init_qp_attr; int cqe, ret; - node->pd = ibv_alloc_pd(node->cma_id->verbs); + node->pd = rdma_alloc_pd(node->cma_id->verbs); if (!node->pd) { ret = -ENOMEM; printf("cmatose: unable to allocate PD\n"); @@ -125,7 +125,7 @@ static int init_node(struct cmatest_node } cqe = message_count ? 
message_count * 2 : 2; - node->cq = ibv_create_cq(node->cma_id->verbs, cqe, node, 0, 0); + node->cq = rdma_create_cq(node->cma_id->verbs, cqe, node, 0, 0); if (!node->cq) { ret = -ENOMEM; printf("cmatose: unable to create CQ\n"); @@ -139,7 +139,7 @@ static int init_node(struct cmatest_node init_qp_attr.cap.max_recv_sge = 1; init_qp_attr.qp_context = node; init_qp_attr.sq_sig_all = 1; - init_qp_attr.qp_type = IBV_QPT_RC; + init_qp_attr.qp_type = RDMA_QPT_RC; init_qp_attr.send_cq = node->cq; init_qp_attr.recv_cq = node->cq; ret = rdma_create_qp(node->cma_id, node->pd, &init_qp_attr); @@ -159,8 +159,8 @@ out: static int post_recvs(struct cmatest_node *node) { - struct ibv_recv_wr recv_wr, *recv_failure; - struct ibv_sge sge; + struct rdma_recv_wr recv_wr, *recv_failure; + struct rdma_sge sge; int i, ret = 0; if (!message_count) @@ -176,7 +176,7 @@ static int post_recvs(struct cmatest_nod sge.addr = (uintptr_t) node->mem; for (i = 0; i < message_count && !ret; i++ ) { - ret = ibv_post_recv(node->cma_id->qp, &recv_wr, &recv_failure); + ret = rdma_post_recv(node->cma_id->qp, &recv_wr, &recv_failure); if (ret) { printf("failed to post receives: %d\n", ret); break; @@ -187,8 +187,8 @@ static int post_recvs(struct cmatest_nod static int post_sends(struct cmatest_node *node) { - struct ibv_send_wr send_wr, *bad_send_wr; - struct ibv_sge sge; + struct rdma_send_wr send_wr, *bad_send_wr; + struct rdma_sge sge; int i, ret = 0; if (!node->connected || !message_count) @@ -197,7 +197,7 @@ static int post_sends(struct cmatest_nod send_wr.next = NULL; send_wr.sg_list = &sge; send_wr.num_sge = 1; - send_wr.opcode = IBV_WR_SEND; + send_wr.opcode = RDMA_WR_SEND; send_wr.send_flags = 0; send_wr.wr_id = (unsigned long)node; @@ -206,7 +206,7 @@ static int post_sends(struct cmatest_nod sge.addr = (uintptr_t) node->mem; for (i = 0; i < message_count && !ret; i++) { - ret = ibv_post_send(node->cma_id->qp, &send_wr, &bad_send_wr); + ret = rdma_post_send(node->cma_id->qp, &send_wr, 
&bad_send_wr); if (ret) printf("failed to post sends: %d\n", ret); } @@ -350,15 +350,15 @@ static void destroy_node(struct cmatest_ rdma_destroy_qp(node->cma_id); if (node->cq) - ibv_destroy_cq(node->cq); + rdma_destroy_cq(node->cq); if (node->mem) { - ibv_dereg_mr(node->mr); + rdma_dereg_mr(node->mr); free(node->mem); } if (node->pd) - ibv_dealloc_pd(node->pd); + rdma_dealloc_pd(node->pd); /* Destroy the RDMA ID after all device resources */ rdma_destroy_id(node->cma_id); @@ -404,7 +404,7 @@ static void destroy_nodes(void) static int poll_cqs(void) { - struct ibv_wc wc[8]; + struct rdma_wc wc[8]; int done, i, ret; for (i = 0; i < connections; i++) { @@ -412,7 +412,7 @@ static int poll_cqs(void) continue; for (done = 0; done < message_count; done += ret) { - ret = ibv_poll_cq(test.nodes[i].cq, 8, wc); + ret = rdma_poll_cq(test.nodes[i].cq, 8, wc); if (ret < 0) { printf("cmatose: failed polling CQ: %d\n", ret); return ret; diff -ruNp ORG/librdmacm/examples/query.c NEW/librdmacm/examples/query.c --- ORG/librdmacm/examples/query.c 2006-07-09 22:12:17.000000000 -0700 +++ NEW/librdmacm/examples/query.c 2006-07-10 18:07:45.000000000 -0700 @@ -83,8 +83,8 @@ int main(int argc, char *argv[]) } printf("node_type=%d\n", - ibv_get_transport_type(cm_id->verbs)); + rdma_get_transport_type(cm_id->verbs)); printf("transport_type=%d\n", - ibv_get_transport_type(cm_id->verbs)); + rdma_get_transport_type(cm_id->verbs)); return 0; } diff -ruNp ORG/librdmacm/examples/rping.c NEW/librdmacm/examples/rping.c --- ORG/librdmacm/examples/rping.c 2006-07-09 22:12:17.000000000 -0700 +++ NEW/librdmacm/examples/rping.c 2006-07-10 18:07:45.000000000 -0700 @@ -111,32 +111,32 @@ struct rping_rdma_info { struct rping_cb { int server; /* 0 iff client */ pthread_t cqthread; - struct ibv_comp_channel *channel; - struct ibv_cq *cq; - struct ibv_pd *pd; - struct ibv_qp *qp; + struct rdma_comp_channel *channel; + struct rdma_cq *cq; + struct rdma_pd *pd; + struct rdma_qp *qp; - struct ibv_recv_wr rq_wr; 
/* recv work request record */ - struct ibv_sge recv_sgl; /* recv single SGE */ + struct rdma_recv_wr rq_wr; /* recv work request record */ + struct rdma_sge recv_sgl; /* recv single SGE */ struct rping_rdma_info recv_buf;/* malloc'd buffer */ - struct ibv_mr *recv_mr; /* MR associated with this buffer */ + struct rdma_mr *recv_mr; /* MR associated with this buffer */ - struct ibv_send_wr sq_wr; /* send work requrest record */ - struct ibv_sge send_sgl; + struct rdma_send_wr sq_wr; /* send work requrest record */ + struct rdma_sge send_sgl; struct rping_rdma_info send_buf;/* single send buf */ - struct ibv_mr *send_mr; + struct rdma_mr *send_mr; - struct ibv_send_wr rdma_sq_wr; /* rdma work request record */ - struct ibv_sge rdma_sgl; /* rdma single SGE */ + struct rdma_send_wr rdma_sq_wr; /* rdma work request record */ + struct rdma_sge rdma_sgl; /* rdma single SGE */ char *rdma_buf; /* used as rdma sink */ - struct ibv_mr *rdma_mr; + struct rdma_mr *rdma_mr; uint32_t remote_rkey; /* remote guys RKEY */ uint64_t remote_addr; /* remote guys TO */ uint32_t remote_len; /* remote guys LEN */ char *start_buf; /* rdma read src */ - struct ibv_mr *start_mr; + struct rdma_mr *start_mr; enum test_state state; /* used for cond/signalling */ sem_t sem; @@ -232,7 +232,7 @@ static int rping_cma_event_handler(struc return ret; } -static int server_recv(struct rping_cb *cb, struct ibv_wc *wc) +static int server_recv(struct rping_cb *cb, struct rdma_wc *wc) { if (wc->byte_len != sizeof(cb->recv_buf)) { fprintf(stderr, "Received bogus data, size %d\n", wc->byte_len); @@ -253,7 +253,7 @@ static int server_recv(struct rping_cb * return 0; } -static int client_recv(struct rping_cb *cb, struct ibv_wc *wc) +static int client_recv(struct rping_cb *cb, struct rdma_wc *wc) { if (wc->byte_len != sizeof(cb->recv_buf)) { fprintf(stderr, "Received bogus data, size %d\n", wc->byte_len); @@ -270,39 +270,39 @@ static int client_recv(struct rping_cb * static int rping_cq_event_handler(struct 
rping_cb *cb) { - struct ibv_wc wc; - struct ibv_recv_wr *bad_wr; + struct rdma_wc wc; + struct rdma_recv_wr *bad_wr; int ret; - while ((ret = ibv_poll_cq(cb->cq, 1, &wc)) == 1) { + while ((ret = rdma_poll_cq(cb->cq, 1, &wc)) == 1) { ret = 0; if (wc.status) { fprintf(stderr, "cq completion failed status %d\n", wc.status); - if (wc.status != IBV_WC_WR_FLUSH_ERR) + if (wc.status != RDMA_WC_WR_FLUSH_ERR) ret = -1; goto error; } switch (wc.opcode) { - case IBV_WC_SEND: + case RDMA_WC_SEND: DEBUG_LOG("send completion\n"); break; - case IBV_WC_RDMA_WRITE: + case RDMA_WC_RDMA_WRITE: DEBUG_LOG("rdma write completion\n"); cb->state = RDMA_WRITE_COMPLETE; sem_post(&cb->sem); break; - case IBV_WC_RDMA_READ: + case RDMA_WC_RDMA_READ: DEBUG_LOG("rdma read completion\n"); cb->state = RDMA_READ_COMPLETE; sem_post(&cb->sem); break; - case IBV_WC_RECV: + case RDMA_WC_RECV: DEBUG_LOG("recv completion\n"); ret = cb->server ? server_recv(cb, &wc) : client_recv(cb, &wc); @@ -311,7 +311,7 @@ static int rping_cq_event_handler(struct goto error; } - ret = ibv_post_recv(cb->qp, &cb->rq_wr, &bad_wr); + ret = rdma_post_recv(cb->qp, &cb->rq_wr, &bad_wr); if (ret) { fprintf(stderr, "post recv error: %d\n", ret); goto error; @@ -374,14 +374,14 @@ static void rping_setup_wr(struct rping_ cb->send_sgl.length = sizeof cb->send_buf; cb->send_sgl.lkey = cb->send_mr->lkey; - cb->sq_wr.opcode = IBV_WR_SEND; - cb->sq_wr.send_flags = IBV_SEND_SIGNALED; + cb->sq_wr.opcode = RDMA_WR_SEND; + cb->sq_wr.send_flags = RDMA_SEND_SIGNALED; cb->sq_wr.sg_list = &cb->send_sgl; cb->sq_wr.num_sge = 1; cb->rdma_sgl.addr = (uint64_t) (unsigned long) cb->rdma_buf; cb->rdma_sgl.lkey = cb->rdma_mr->lkey; - cb->rdma_sq_wr.send_flags = IBV_SEND_SIGNALED; + cb->rdma_sq_wr.send_flags = RDMA_SEND_SIGNALED; cb->rdma_sq_wr.sg_list = &cb->rdma_sgl; cb->rdma_sq_wr.num_sge = 1; } @@ -392,14 +392,14 @@ static int rping_setup_buffers(struct rp DEBUG_LOG("rping_setup_buffers called on cb %p\n", cb); - cb->recv_mr = ibv_reg_mr(cb->pd, 
&cb->recv_buf, sizeof cb->recv_buf, - IBV_ACCESS_LOCAL_WRITE); + cb->recv_mr = rdma_reg_mr(cb->pd, &cb->recv_buf, sizeof cb->recv_buf, + RDMA_ACCESS_LOCAL_WRITE); if (!cb->recv_mr) { fprintf(stderr, "recv_buf reg_mr failed\n"); return errno; } - cb->send_mr = ibv_reg_mr(cb->pd, &cb->send_buf, sizeof cb->send_buf, 0); + cb->send_mr = rdma_reg_mr(cb->pd, &cb->send_buf, sizeof cb->send_buf, 0); if (!cb->send_mr) { fprintf(stderr, "send_buf reg_mr failed\n"); ret = errno; @@ -413,10 +413,10 @@ static int rping_setup_buffers(struct rp goto err2; } - cb->rdma_mr = ibv_reg_mr(cb->pd, cb->rdma_buf, cb->size, - IBV_ACCESS_LOCAL_WRITE | - IBV_ACCESS_REMOTE_READ | - IBV_ACCESS_REMOTE_WRITE); + cb->rdma_mr = rdma_reg_mr(cb->pd, cb->rdma_buf, cb->size, + RDMA_ACCESS_LOCAL_WRITE | + RDMA_ACCESS_REMOTE_READ | + RDMA_ACCESS_REMOTE_WRITE); if (!cb->rdma_mr) { fprintf(stderr, "rdma_buf reg_mr failed\n"); ret = errno; @@ -431,10 +431,10 @@ static int rping_setup_buffers(struct rp goto err4; } - cb->start_mr = ibv_reg_mr(cb->pd, cb->start_buf, cb->size, - IBV_ACCESS_LOCAL_WRITE | - IBV_ACCESS_REMOTE_READ | - IBV_ACCESS_REMOTE_WRITE); + cb->start_mr = rdma_reg_mr(cb->pd, cb->start_buf, cb->size, + RDMA_ACCESS_LOCAL_WRITE | + RDMA_ACCESS_REMOTE_READ | + RDMA_ACCESS_REMOTE_WRITE); if (!cb->start_mr) { fprintf(stderr, "start_buf reg_mr failed\n"); ret = errno; @@ -449,32 +449,32 @@ static int rping_setup_buffers(struct rp err5: free(cb->start_buf); err4: - ibv_dereg_mr(cb->rdma_mr); + rdma_dereg_mr(cb->rdma_mr); err3: free(cb->rdma_buf); err2: - ibv_dereg_mr(cb->send_mr); + rdma_dereg_mr(cb->send_mr); err1: - ibv_dereg_mr(cb->recv_mr); + rdma_dereg_mr(cb->recv_mr); return ret; } static void rping_free_buffers(struct rping_cb *cb) { DEBUG_LOG("rping_free_buffers called on cb %p\n", cb); - ibv_dereg_mr(cb->recv_mr); - ibv_dereg_mr(cb->send_mr); - ibv_dereg_mr(cb->rdma_mr); + rdma_dereg_mr(cb->recv_mr); + rdma_dereg_mr(cb->send_mr); + rdma_dereg_mr(cb->rdma_mr); free(cb->rdma_buf); if 
(!cb->server) { - ibv_dereg_mr(cb->start_mr); + rdma_dereg_mr(cb->start_mr); free(cb->start_buf); } } static int rping_create_qp(struct rping_cb *cb) { - struct ibv_qp_init_attr init_attr; + struct rdma_qp_init_attr init_attr; int ret; memset(&init_attr, 0, sizeof(init_attr)); @@ -482,7 +482,7 @@ static int rping_create_qp(struct rping_ init_attr.cap.max_recv_wr = 2; init_attr.cap.max_recv_sge = 1; init_attr.cap.max_send_sge = 1; - init_attr.qp_type = IBV_QPT_RC; + init_attr.qp_type = RDMA_QPT_RC; init_attr.send_cq = cb->cq; init_attr.recv_cq = cb->cq; @@ -501,43 +501,43 @@ static int rping_create_qp(struct rping_ static void rping_free_qp(struct rping_cb *cb) { - ibv_destroy_qp(cb->qp); - ibv_destroy_cq(cb->cq); - ibv_destroy_comp_channel(cb->channel); - ibv_dealloc_pd(cb->pd); + rdmav_destroy_qp(cb->qp); + rdma_destroy_cq(cb->cq); + rdma_destroy_comp_channel(cb->channel); + rdma_dealloc_pd(cb->pd); } static int rping_setup_qp(struct rping_cb *cb, struct rdma_cm_id *cm_id) { int ret; - cb->pd = ibv_alloc_pd(cm_id->verbs); + cb->pd = rdma_alloc_pd(cm_id->verbs); if (!cb->pd) { - fprintf(stderr, "ibv_alloc_pd failed\n"); + fprintf(stderr, "rdma_alloc_pd failed\n"); return errno; } DEBUG_LOG("created pd %p\n", cb->pd); - cb->channel = ibv_create_comp_channel(cm_id->verbs); + cb->channel = rdma_create_comp_channel(cm_id->verbs); if (!cb->channel) { - fprintf(stderr, "ibv_create_comp_channel failed\n"); + fprintf(stderr, "rdma_create_comp_channel failed\n"); ret = errno; goto err1; } DEBUG_LOG("created channel %p\n", cb->channel); - cb->cq = ibv_create_cq(cm_id->verbs, RPING_SQ_DEPTH * 2, cb, + cb->cq = rdma_create_cq(cm_id->verbs, RPING_SQ_DEPTH * 2, cb, cb->channel, 0); if (!cb->cq) { - fprintf(stderr, "ibv_create_cq failed\n"); + fprintf(stderr, "rdma_create_cq failed\n"); ret = errno; goto err2; } DEBUG_LOG("created cq %p\n", cb->cq); - ret = ibv_req_notify_cq(cb->cq, 0); + ret = rdma_req_notify_cq(cb->cq, 0); if (ret) { - fprintf(stderr, "ibv_create_cq failed\n"); 
+ fprintf(stderr, "rdma_create_cq failed\n"); ret = errno; goto err3; } @@ -551,11 +551,11 @@ static int rping_setup_qp(struct rping_c return 0; err3: - ibv_destroy_cq(cb->cq); + rdma_destroy_cq(cb->cq); err2: - ibv_destroy_comp_channel(cb->channel); + rdma_destroy_comp_channel(cb->channel); err1: - ibv_dealloc_pd(cb->pd); + rdma_dealloc_pd(cb->pd); return ret; } @@ -581,35 +581,35 @@ static void *cm_thread(void *arg) static void *cq_thread(void *arg) { struct rping_cb *cb = arg; - struct ibv_cq *ev_cq; + struct rdma_cq *ev_cq; void *ev_ctx; int ret; DEBUG_LOG("cq_thread started.\n"); while (1) { - ret = ibv_get_cq_event(cb->channel, &ev_cq, &ev_ctx); + ret = rdma_get_cq_event(cb->channel, &ev_cq, &ev_ctx); if (ret) { fprintf(stderr, "Failed to get cq event!\n"); exit(ret); } if (ev_cq != cb->cq) { - fprintf(stderr, "Unkown CQ!\n"); + fprintf(stderr, "Unknown CQ!\n"); exit(-1); } - ret = ibv_req_notify_cq(cb->cq, 0); + ret = rdma_req_notify_cq(cb->cq, 0); if (ret) { fprintf(stderr, "Failed to set notify!\n"); exit(ret); } ret = rping_cq_event_handler(cb); - ibv_ack_cq_events(cb->cq, 1); + rdma_ack_cq_events(cb->cq, 1); if (ret) exit(ret); } } -static void rping_format_send(struct rping_cb *cb, char *buf, struct ibv_mr *mr) +static void rping_format_send(struct rping_cb *cb, char *buf, struct rdma_mr *mr) { struct rping_rdma_info *info = &cb->send_buf; @@ -623,7 +623,7 @@ static void rping_format_send(struct rpi static int rping_test_server(struct rping_cb *cb) { - struct ibv_send_wr *bad_wr; + struct rdma_send_wr *bad_wr; int ret; while (1) { @@ -639,12 +639,12 @@ static int rping_test_server(struct rpin DEBUG_LOG("server received sink adv\n"); /* Issue RDMA Read. 
*/ - cb->rdma_sq_wr.opcode = IBV_WR_RDMA_READ; + cb->rdma_sq_wr.opcode = RDMA_WR_RDMA_READ; cb->rdma_sq_wr.wr.rdma.rkey = cb->remote_rkey; cb->rdma_sq_wr.wr.rdma.remote_addr = cb->remote_addr; cb->rdma_sq_wr.sg_list->length = cb->remote_len; - ret = ibv_post_send(cb->qp, &cb->rdma_sq_wr, &bad_wr); + ret = rdma_post_send(cb->qp, &cb->rdma_sq_wr, &bad_wr); if (ret) { fprintf(stderr, "post send error %d\n", ret); break; @@ -666,7 +666,7 @@ static int rping_test_server(struct rpin printf("server ping data: %s\n", cb->rdma_buf); /* Tell client to continue */ - ret = ibv_post_send(cb->qp, &cb->sq_wr, &bad_wr); + ret = rdma_post_send(cb->qp, &cb->sq_wr, &bad_wr); if (ret) { fprintf(stderr, "post send error %d\n", ret); break; @@ -684,7 +684,7 @@ static int rping_test_server(struct rpin DEBUG_LOG("server received sink adv\n"); /* RDMA Write echo data */ - cb->rdma_sq_wr.opcode = IBV_WR_RDMA_WRITE; + cb->rdma_sq_wr.opcode = RDMA_WR_RDMA_WRITE; cb->rdma_sq_wr.wr.rdma.rkey = cb->remote_rkey; cb->rdma_sq_wr.wr.rdma.remote_addr = cb->remote_addr; cb->rdma_sq_wr.sg_list->length = strlen(cb->rdma_buf) + 1; @@ -693,7 +693,7 @@ static int rping_test_server(struct rpin cb->rdma_sq_wr.sg_list->addr, cb->rdma_sq_wr.sg_list->length); - ret = ibv_post_send(cb->qp, &cb->rdma_sq_wr, &bad_wr); + ret = rdma_post_send(cb->qp, &cb->rdma_sq_wr, &bad_wr); if (ret) { fprintf(stderr, "post send error %d\n", ret); break; @@ -710,7 +710,7 @@ static int rping_test_server(struct rpin DEBUG_LOG("server rdma write complete \n"); /* Tell client to begin again */ - ret = ibv_post_send(cb->qp, &cb->sq_wr, &bad_wr); + ret = rdma_post_send(cb->qp, &cb->sq_wr, &bad_wr); if (ret) { fprintf(stderr, "post send error %d\n", ret); break; @@ -757,7 +757,7 @@ static int rping_bind_server(struct rpin static int rping_run_server(struct rping_cb *cb) { - struct ibv_recv_wr *bad_wr; + struct rdma_recv_wr *bad_wr; int ret; ret = rping_bind_server(cb); @@ -776,9 +776,9 @@ static int rping_run_server(struct rping goto 
err1; } - ret = ibv_post_recv(cb->qp, &cb->rq_wr, &bad_wr); + ret = rdma_post_recv(cb->qp, &cb->rq_wr, &bad_wr); if (ret) { - fprintf(stderr, "ibv_post_recv failed: %d\n", ret); + fprintf(stderr, "rdma_post_recv failed: %d\n", ret); goto err2; } @@ -804,7 +804,7 @@ err1: static int rping_test_client(struct rping_cb *cb) { int ping, start, cc, i, ret = 0; - struct ibv_send_wr *bad_wr; + struct rdma_send_wr *bad_wr; unsigned char c; start = 65; @@ -825,7 +825,7 @@ static int rping_test_client(struct rpin cb->start_buf[cb->size - 1] = 0; rping_format_send(cb, cb->start_buf, cb->start_mr); - ret = ibv_post_send(cb->qp, &cb->sq_wr, &bad_wr); + ret = rdma_post_send(cb->qp, &cb->sq_wr, &bad_wr); if (ret) { fprintf(stderr, "post send error %d\n", ret); break; @@ -841,7 +841,7 @@ static int rping_test_client(struct rpin } rping_format_send(cb, cb->rdma_buf, cb->rdma_mr); - ret = ibv_post_send(cb->qp, &cb->sq_wr, &bad_wr); + ret = rdma_post_send(cb->qp, &cb->sq_wr, &bad_wr); if (ret) { fprintf(stderr, "post send error %d\n", ret); break; @@ -926,7 +926,7 @@ static int rping_bind_client(struct rpin static int rping_run_client(struct rping_cb *cb) { - struct ibv_recv_wr *bad_wr; + struct rdma_recv_wr *bad_wr; int ret; ret = rping_bind_client(cb); @@ -945,9 +945,9 @@ static int rping_run_client(struct rping goto err1; } - ret = ibv_post_recv(cb->qp, &cb->rq_wr, &bad_wr); + ret = rdma_post_recv(cb->qp, &cb->rq_wr, &bad_wr); if (ret) { - fprintf(stderr, "ibv_post_recv failed: %d\n", ret); + fprintf(stderr, "rdma_post_recv failed: %d\n", ret); goto err2; } diff -ruNp ORG/librdmacm/include/rdma/rdma_cma.h NEW/librdmacm/include/rdma/rdma_cma.h --- ORG/librdmacm/include/rdma/rdma_cma.h 2006-07-09 22:12:15.000000000 -0700 +++ NEW/librdmacm/include/rdma/rdma_cma.h 2006-07-10 18:07:45.000000000 -0700 @@ -32,8 +32,8 @@ #include #include -#include -#include +#include +#include /* * Upon receiving a device removal event, users must destroy the associated @@ -62,12 +62,12 @@ enum { /* IB 
specific option names for get/set. */ enum { - IB_PATH_OPTIONS = 1, + RDMA_PATH_OPTIONS = 1, }; struct ib_addr { - union ibv_gid sgid; - union ibv_gid dgid; + union rdma_gid sgid; + union rdma_gid dgid; uint16_t pkey; }; @@ -81,7 +81,7 @@ struct rdma_addr { struct rdma_route { struct rdma_addr addr; - struct ibv_sa_path_rec *path_rec; + struct rdma_sa_path_rec *path_rec; int num_paths; }; @@ -90,10 +90,10 @@ struct rdma_event_channel { }; struct rdma_cm_id { - struct ibv_context *verbs; + struct rdma_context *verbs; struct rdma_event_channel *channel; void *context; - struct ibv_qp *qp; + struct rdma_qp *qp; struct rdma_route route; uint8_t port_num; }; @@ -179,8 +179,8 @@ int rdma_resolve_route(struct rdma_cm_id * QPs allocated to an rdma_cm_id will automatically be transitioned by the CMA * through their states. */ -int rdma_create_qp(struct rdma_cm_id *id, struct ibv_pd *pd, - struct ibv_qp_init_attr *qp_init_attr); +int rdma_create_qp(struct rdma_cm_id *id, struct rdma_pd *pd, + struct rdma_qp_init_attr *qp_init_attr); /** * rdma_destroy_qp - Deallocate the QP associated with the specified RDMA diff -ruNp ORG/librdmacm/include/rdma/rdma_cma_abi.h NEW/librdmacm/include/rdma/rdma_cma_abi.h --- ORG/librdmacm/include/rdma/rdma_cma_abi.h 2006-07-09 22:12:15.000000000 -0700 +++ NEW/librdmacm/include/rdma/rdma_cma_abi.h 2006-07-10 18:07:45.000000000 -0700 @@ -33,7 +33,7 @@ #ifndef RDMA_CMA_ABI_H #define RDMA_CMA_ABI_H -#include +#include /* * This file must be kept in sync with the kernel's version of rdma_user_cm.h @@ -113,7 +113,7 @@ struct ucma_abi_query_route { struct ucma_abi_query_route_resp { __u64 node_guid; - struct ibv_kern_path_rec ib_route[2]; + struct rdma_kern_path_rec ib_route[2]; struct sockaddr_in6 src_addr; struct sockaddr_in6 dst_addr; __u32 num_paths; diff -ruNp ORG/librdmacm/src/cma.c NEW/librdmacm/src/cma.c --- ORG/librdmacm/src/cma.c 2006-07-09 22:12:15.000000000 -0700 +++ NEW/librdmacm/src/cma.c 2006-07-10 18:07:45.000000000 -0700 @@ -51,7 
+51,7 @@ #include -#include +#include #include #include @@ -102,7 +102,7 @@ do { } while (0) struct cma_device { - struct ibv_context *verbs; + struct rdma_context *verbs; uint64_t guid; int port_cnt; }; @@ -117,7 +117,7 @@ struct cma_id_private { uint32_t handle; }; -static struct ibv_device **dev_list; +static struct rdma_device **dev_list; static struct dlist *cma_dev_list; static pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER; static int ucma_initialized; @@ -132,7 +132,7 @@ static void ucma_cleanup(void) if (cma_dev_list) { dlist_for_each_data(cma_dev_list, cma_dev, struct cma_device) - ibv_close_device(cma_dev->verbs); + rdma_close_device(cma_dev->verbs); dlist_destroy(cma_dev_list); cma_dev_list = NULL; @@ -184,7 +184,7 @@ static int ucma_init(void) { int i; struct cma_device *cma_dev; - struct ibv_device_attr attr; + struct rdma_device_attr attr; int ret; pthread_mutex_lock(&mut); @@ -201,7 +201,7 @@ static int ucma_init(void) goto err; } - dev_list = ibv_get_device_list(NULL); + dev_list = rdma_get_device_list(NULL); if (!dev_list) { printf("CMA: unable to get RDMA device liste\n"); ret = -ENODEV; @@ -215,15 +215,15 @@ static int ucma_init(void) goto err; } - cma_dev->guid = ibv_get_device_guid(dev_list[i]); - cma_dev->verbs = ibv_open_device(dev_list[i]); + cma_dev->guid = rdma_get_device_guid(dev_list[i]); + cma_dev->verbs = rdma_open_device(dev_list[i]); if (!cma_dev->verbs) { printf("CMA: unable to open RDMA device\n"); ret = -ENODEV; goto err; } - ret = ibv_query_device(cma_dev->verbs, &attr); + ret = rdma_query_device(cma_dev->verbs, &attr); if (ret) { printf("CMA: unable to query RDMA device\n"); goto err; @@ -240,7 +240,7 @@ err: ucma_cleanup(); pthread_mutex_unlock(&mut); if (dev_list) - ibv_free_device_list(dev_list); + rdma_free_device_list(dev_list); return ret; } @@ -429,7 +429,7 @@ static int ucma_query_route(struct rdma_ id->route.num_paths = resp->num_paths; for (i = 0; i < resp->num_paths; i++) - 
ibv_copy_path_rec_from_kern(&id->route.path_rec[i], + rdma_copy_path_rec_from_kern(&id->route.path_rec[i], &resp->ib_route[i]); } @@ -528,11 +528,11 @@ int rdma_resolve_route(struct rdma_cm_id return 0; } -static int rdma_init_qp_attr(struct rdma_cm_id *id, struct ibv_qp_attr *qp_attr, +static int rdma_init_qp_attr(struct rdma_cm_id *id, struct rdma_qp_attr *qp_attr, int *qp_attr_mask) { struct ucma_abi_init_qp_attr *cmd; - struct ibv_kern_qp_attr *resp; + struct rdma_kern_qp_attr *resp; struct cma_id_private *id_priv; void *msg; int ret, size; @@ -546,70 +546,70 @@ static int rdma_init_qp_attr(struct rdma if (ret != size) return (ret > 0) ? -ENODATA : ret; - ibv_copy_qp_attr_from_kern(qp_attr, resp); + rdma_copy_qp_attr_from_kern(qp_attr, resp); *qp_attr_mask = resp->qp_attr_mask; return 0; } static int ucma_modify_qp_rtr(struct rdma_cm_id *id) { - struct ibv_qp_attr qp_attr; + struct rdma_qp_attr qp_attr; int qp_attr_mask, ret; if (!id->qp) return -EINVAL; /* Need to update QP attributes from default values. 
*/ - qp_attr.qp_state = IBV_QPS_INIT; + qp_attr.qp_state = RDMA_QPS_INIT; ret = rdma_init_qp_attr(id, &qp_attr, &qp_attr_mask); if (ret) return ret; - ret = ibv_modify_qp(id->qp, &qp_attr, qp_attr_mask); + ret = rdma_modify_qp(id->qp, &qp_attr, qp_attr_mask); if (ret) return ret; - qp_attr.qp_state = IBV_QPS_RTR; + qp_attr.qp_state = RDMA_QPS_RTR; ret = rdma_init_qp_attr(id, &qp_attr, &qp_attr_mask); if (ret) return ret; - return ibv_modify_qp(id->qp, &qp_attr, qp_attr_mask); + return rdma_modify_qp(id->qp, &qp_attr, qp_attr_mask); } static int ucma_modify_qp_rts(struct rdma_cm_id *id) { - struct ibv_qp_attr qp_attr; + struct rdma_qp_attr qp_attr; int qp_attr_mask, ret; - qp_attr.qp_state = IBV_QPS_RTS; + qp_attr.qp_state = RDMA_QPS_RTS; ret = rdma_init_qp_attr(id, &qp_attr, &qp_attr_mask); if (ret) return ret; - return ibv_modify_qp(id->qp, &qp_attr, qp_attr_mask); + return rdma_modify_qp(id->qp, &qp_attr, qp_attr_mask); } static int ucma_modify_qp_sqd(struct rdma_cm_id *id) { - struct ibv_qp_attr qp_attr; + struct rdma_qp_attr qp_attr; if (!id->qp) return 0; - qp_attr.qp_state = IBV_QPS_SQD; - return ibv_modify_qp(id->qp, &qp_attr, IBV_QP_STATE); + qp_attr.qp_state = RDMA_QPS_SQD; + return rdma_modify_qp(id->qp, &qp_attr, RDMA_QP_STATE); } static int ucma_modify_qp_err(struct rdma_cm_id *id) { - struct ibv_qp_attr qp_attr; + struct rdma_qp_attr qp_attr; if (!id->qp) return 0; - qp_attr.qp_state = IBV_QPS_ERR; - return ibv_modify_qp(id->qp, &qp_attr, IBV_QP_STATE); + qp_attr.qp_state = RDMA_QPS_ERR; + return rdma_modify_qp(id->qp, &qp_attr, RDMA_QP_STATE); } static int ucma_find_pkey(struct cma_device *cma_dev, uint8_t port_num, @@ -619,7 +619,7 @@ static int ucma_find_pkey(struct cma_dev uint16_t chk_pkey; for (i = 0, ret = 0; !ret; i++) { - ret = ibv_query_pkey(cma_dev->verbs, port_num, i, &chk_pkey); + ret = rdma_query_pkey(cma_dev->verbs, port_num, i, &chk_pkey); if (!ret && pkey == chk_pkey) { *pkey_index = (uint16_t) i; return 0; @@ -629,9 +629,9 @@ static 
int ucma_find_pkey(struct cma_dev return -EINVAL; } -static int ucma_init_ib_qp(struct cma_id_private *id_priv, struct ibv_qp *qp) +static int ucma_init_ib_qp(struct cma_id_private *id_priv, struct rdma_qp *qp) { - struct ibv_qp_attr qp_attr; + struct rdma_qp_attr qp_attr; struct ib_addr *ibaddr; int ret; @@ -642,24 +642,24 @@ static int ucma_init_ib_qp(struct cma_id return ret; qp_attr.port_num = id_priv->id.port_num; - qp_attr.qp_state = IBV_QPS_INIT; - qp_attr.qp_access_flags = IBV_ACCESS_LOCAL_WRITE; - return ibv_modify_qp(qp, &qp_attr, IBV_QP_STATE | IBV_QP_ACCESS_FLAGS | - IBV_QP_PKEY_INDEX | IBV_QP_PORT); + qp_attr.qp_state = RDMA_QPS_INIT; + qp_attr.qp_access_flags = RDMA_ACCESS_LOCAL_WRITE; + return rdma_modify_qp(qp, &qp_attr, RDMA_QP_STATE | RDMA_QP_ACCESS_FLAGS | + RDMA_QP_PKEY_INDEX | RDMA_QP_PORT); } -int rdma_create_qp(struct rdma_cm_id *id, struct ibv_pd *pd, - struct ibv_qp_init_attr *qp_init_attr) +int rdma_create_qp(struct rdma_cm_id *id, struct rdma_pd *pd, + struct rdma_qp_init_attr *qp_init_attr) { struct cma_id_private *id_priv; - struct ibv_qp *qp; + struct rdma_qp *qp; int ret; id_priv = container_of(id, struct cma_id_private, id); if (id->verbs != pd->context) return -EINVAL; - qp = ibv_create_qp(pd, qp_init_attr); + qp = rdmav_create_qp(pd, qp_init_attr); if (!qp) return -ENOMEM; @@ -670,18 +670,18 @@ int rdma_create_qp(struct rdma_cm_id *id id->qp = qp; return 0; err: - ibv_destroy_qp(qp); + rdmav_destroy_qp(qp); return ret; } void rdma_destroy_qp(struct rdma_cm_id *id) { - ibv_destroy_qp(id->qp); + rdmav_destroy_qp(id->qp); } static void ucma_copy_conn_param_to_kern(struct ucma_abi_conn_param *dst, struct rdma_conn_param *src, - struct ibv_qp *qp) + struct rdma_qp *qp) { dst->qp_num = qp->qp_num; dst->qp_type = qp->qp_type; @@ -797,11 +797,11 @@ int rdma_disconnect(struct rdma_cm_id *i void *msg; int ret, size; - switch (ibv_get_transport_type(id->verbs)) { - case IBV_TRANSPORT_IB: + switch (rdma_get_transport_type(id->verbs)) { + case 
RDMA_TRANSPORT_IB: ret = ucma_modify_qp_err(id); break; - case IBV_TRANSPORT_IWARP: + case RDMA_TRANSPORT_IWARP: ret = ucma_modify_qp_sqd(id); break; default: --- ORG/librdmaverbs/Makefile.am 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/Makefile.am 2006-07-10 22:08:54.000000000 -0700 @@ -0,0 +1,40 @@ +# $Id: Makefile.am 7631 2006-06-02 19:53:25Z swise $ + +INCLUDES = -I$(srcdir)/include + +lib_LTLIBRARIES = src/librdmaverbs.la + +AM_CFLAGS = -g -Wall -D_GNU_SOURCE + +src_librdmaverbs_la_CFLAGS = -g -Wall -D_GNU_SOURCE -DDRIVER_PATH= \"$(libdir)/rdma\" + +if HAVE_LD_VERSION_SCRIPT + librdmaverbs_version_script = -Wl,--version-script= $(srcdir)/src/librdmaverbs.map +else + librdmaverbs_version_script = +endif + +src_librdmaverbs_la_SOURCES = src/cmd.c src/device.c src/init.c src/marshall.c \ + src/memory.c src/sysfs.c src/verbs.c +src_librdmaverbs_la_LDFLAGS = -version-info 2 -export-dynamic \ + $(librdmaverbs_version_script) +src_librdmaverbs_la_DEPENDENCIES = $(srcdir)/src/librdmaverbs.map + +librdmaverbsincludedir = $(includedir)/rdma + +librdmaverbsinclude_HEADERS = include/rdma/arch.h include/rdma/driver.h \ + include/rdma/kern-abi.h include/rdma/opcode.h include/rdma/verbs.h \ + include/rdma/sa-kern-abi.h include/rdma/sa.h include/rdma/marshall.h + +DEBIAN = debian/changelog debian/compat debian/control debian/copyright \ + debian/ibverbs-utils.install debian/librdmaverbs2.install \ + debian/librdmaverbs2.postinst debian/librdmaverbs-dev.install \ + debian/rules + +EXTRA_DIST = include/rdma/driver.h include/rdma/kern-abi.h \ + include/rdma/opcode.h include/rdma/verbs.h include/rdma/marshall.h \ + include/rdma/sa-kern-abi.h include/rdma/sa.h \ + src/rdmaverbs.h src/librdmaverbs.map librdmaverbs.spec.in + +dist-hook: librdmaverbs.spec + cp librdmaverbs.spec $(distdir) From mst at mellanox.co.il Mon Jul 10 04:06:18 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Mon, 10 Jul 2006 14:06:18 +0300 Subject: [openib-general] [PATCH] IB/mthca: fix static rate returned by mthca_ah_query Message-ID: <20060710110618.GC24705@mellanox.co.il> Andrew, this has been out for a couple of days - could you drop this into -mm / push this along to Linus, please? ----- Forwarded message from "Michael S. Tsirkin" ----- Please review: git://www.mellanox.co.il/~git/infiniband mst-for-2.6.18 This has the following patch: IB/mthca: fix static rate returned by mthca_ah_query -- commit 8b9395d3b67af87ce4c218af06a259a8af246d75 Author: Jack Morgenstein Date: Thu Jul 6 19:33:25 2006 +0300 mthca_ah_query returns the static rate of the address handle in internal mthca format. Fix it to use rate encoding from enum ib_rate, which is what users expect. Signed-off-by: Jack Morgenstein Signed-off-by: Michael S. Tsirkin Index: latest/drivers/infiniband/hw/mthca/mthca_av.c =================================================================== --- latest.orig/drivers/infiniband/hw/mthca/mthca_av.c 2006-07-02 18:00:34.000000000 +0300 +++ latest/drivers/infiniband/hw/mthca/mthca_av.c 2006-07-05 13:50:06.000000000 +0300 @@ -303,9 +303,10 @@ int mthca_ah_query(struct ib_ah *ibah, s memset(attr, 0, sizeof *attr); attr->dlid = be16_to_cpu(ah->av->dlid); attr->sl = be32_to_cpu(ah->av->sl_tclass_flowlabel) >> 28; - attr->static_rate = ah->av->msg_sr & 0x7; - attr->src_path_bits = ah->av->g_slid & 0x7F; attr->port_num = be32_to_cpu(ah->av->port_pd) >> 24; + attr->static_rate = mthca_rate_to_ib(dev, ah->av->msg_sr & 0x7, + attr->port_num); + attr->src_path_bits = ah->av->g_slid & 0x7F; attr->ah_flags = mthca_ah_grh_present(ah) ? IB_AH_GRH : 0; if (attr->ah_flags) { -- MST From mst at mellanox.co.il Mon Jul 10 04:14:12 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 10 Jul 2006 14:14:12 +0300 Subject: [openib-general] [PATCH] IB/mthca: comment fix Message-ID: <20060710111412.GD24705@mellanox.co.il> Hi Andrew, Here's a cosmetic patch for IB/mthca. 
Please drop it into -mm and on. --- The comment in mthca_qp.c makes it seem that lockdep is the only reason WQ locks should be initialized separately, but as Zach Brown and Roland pointed out, there are other reasons, e.g. that mthca_wq_init is called from modify qp as well. Signed-off-by: Michael S. Tsirkin diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c index 490fc78..2f3917e 100644 --- a/drivers/infiniband/hw/mthca/mthca_qp.c +++ b/drivers/infiniband/hw/mthca/mthca_qp.c @@ -1114,7 +1114,7 @@ static int mthca_alloc_qp_common(struct qp->sq_policy = send_policy; mthca_wq_init(&qp->sq); mthca_wq_init(&qp->rq); - /* these are initialized separately so lockdep can tell them apart */ + spin_lock_init(&qp->sq.lock); spin_lock_init(&qp->rq.lock); -- MST From krkumar2 at in.ibm.com Mon Jul 10 03:34:05 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Mon, 10 Jul 2006 16:04:05 +0530 Subject: [openib-general] [RFC] [PATCH 5/7] librdmaverbs libtool Message-ID: <1152527645.28099.6.camel@localhost.localdomain> Library building support. Sending it as an attachment, as otherwise I get a glibc error followed by a mailer hang. Signed-off-by: Krishna Kumar -------------- next part -------------- A non-text attachment was scrubbed... Name: diff.rdma_libtool.bz2 Type: application/x-bzip Size: 41950 bytes Desc: not available URL: From krkumar2 at in.ibm.com Mon Jul 10 03:31:06 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Mon, 10 Jul 2006 16:01:06 +0530 Subject: [openib-general] [RFC] [PATCH 7/7] libamso library changes Message-ID: <1152527466.28099.2.camel@localhost.localdomain> Changes the libamso user-mode library to use librdmaverbs data types and APIs. 
Signed-off-by: Krishna Kumar diff -ruNp ORG/libamso/Makefile.am NEW/libamso/Makefile.am --- ORG/libamso/Makefile.am 2006-07-09 22:12:11.000000000 -0700 +++ NEW/libamso/Makefile.am 2006-07-10 18:07:44.000000000 -0700 @@ -1,6 +1,6 @@ # $Id: $ -amsolibdir = $(libdir)/infiniband +amsolibdir = $(libdir)/rdma amsolib_LTLIBRARIES = src/amso.la diff -ruNp ORG/libamso/configure.in NEW/libamso/configure.in --- ORG/libamso/configure.in 2006-07-09 22:12:11.000000000 -0700 +++ NEW/libamso/configure.in 2006-07-10 21:02:06.000000000 -0700 @@ -12,13 +12,13 @@ dnl Checks for programs AC_PROG_CC dnl Checks for libraries -AC_CHECK_LIB(ibverbs, ibv_get_device_list, [], - AC_MSG_ERROR([ibv_get_device_list() not found. libmthca requires libibverbs.])) +AC_CHECK_LIB(rdmaverbs, rdma_get_device_list, [], + AC_MSG_ERROR([rdma_get_device_list() not found. libamso requires librdmaverbs.])) dnl Checks for header files. AC_CHECK_HEADERS(sysfs/libsysfs.h) -AC_CHECK_HEADER(infiniband/driver.h, [], - AC_MSG_ERROR([<infiniband/driver.h> not found. Is libibverbs installed?])) +AC_CHECK_HEADER(rdma/driver.h, [], + AC_MSG_ERROR([<rdma/driver.h> not found. Is librdmaverbs installed?])) AC_HEADER_STDC dnl Checks for typedefs, structures, and compiler characteristics. 
@@ -26,7 +26,7 @@ AC_C_CONST AC_CHECK_SIZEOF(long) dnl Checks for library functions -AC_CHECK_FUNCS(ibv_read_sysfs_file) +AC_CHECK_FUNCS(rdma_read_sysfs_file) AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script, if test -n "`$LD --help < /dev/null 2>/dev/null | grep version-script`"; then diff -ruNp ORG/libamso/libamso.spec.in NEW/libamso/libamso.spec.in --- ORG/libamso/libamso.spec.in 2006-07-09 22:12:11.000000000 -0700 +++ NEW/libamso/libamso.spec.in 2006-07-10 18:07:44.000000000 -0700 @@ -39,18 +39,18 @@ make %{?_smp_mflags} rm -rf $RPM_BUILD_ROOT %makeinstall # remove unpackaged files from the buildroot -rm -f $RPM_BUILD_ROOT%{_libdir}/infiniband/*.la +rm -f $RPM_BUILD_ROOT%{_libdir}/rdma/*.la %clean rm -rf $RPM_BUILD_ROOT %files %defattr(-,root,root,-) -%{_libdir}/infiniband/amso.so +%{_libdir}/rdma/amso.so %doc AUTHORS COPYING ChangeLog README %files devel %defattr(-,root,root,-) -%{_libdir}/infiniband/amso.a +%{_libdir}/rdma/amso.a %changelog diff -ruNp ORG/libamso/src/amso-abi.h NEW/libamso/src/amso-abi.h --- ORG/libamso/src/amso-abi.h 2006-07-09 22:12:10.000000000 -0700 +++ NEW/libamso/src/amso-abi.h 2006-07-10 18:07:44.000000000 -0700 @@ -32,23 +32,23 @@ #ifndef AMSO_ABI_H #define AMSO_ABI_H -#include +#include struct amso_alloc_ucontext_resp { - struct ibv_get_context_resp ibv_resp; + struct rdma_get_context_resp rdma_resp; }; struct amso_alloc_pd_resp { - struct ibv_alloc_pd_resp ibv_resp; + struct rdma_alloc_pd_resp rdma_resp; }; struct amso_create_cq { - struct ibv_create_cq ibv_cmd; + struct rdma_create_cq rdma_cmd; }; struct amso_create_cq_resp { - struct ibv_create_cq_resp ibv_resp; + struct rdma_create_cq_resp rdma_resp; __u32 cqid; __u32 entries; __u64 physaddr; /* library mmaps this to get addressability */ @@ -56,11 +56,11 @@ struct amso_create_cq_resp { }; struct amso_create_qp { - struct ibv_create_qp ibv_cmd; + struct rdma_create_qp rdma_cmd; }; struct amso_create_qp_resp { - struct ibv_create_qp_resp ibv_resp; + struct 
rdma_create_qp_resp rdma_resp; __u32 qpid; __u32 entries; /* actual number of entries after creation */ __u64 physaddr; /* library mmaps this to get addressability */ diff -ruNp ORG/libamso/src/amso.c NEW/libamso/src/amso.c --- ORG/libamso/src/amso.c 2006-07-09 22:12:10.000000000 -0700 +++ NEW/libamso/src/amso.c 2006-07-10 18:07:44.000000000 -0700 @@ -44,7 +44,7 @@ #include #endif -#ifndef HAVE_IBV_READ_SYSFS_FILE +#ifndef HAVE_RDMA_READ_SYSFS_FILE #include #include #include @@ -69,7 +69,7 @@ struct { HCA(AMSO, 1100, 1100), }; -static struct ibv_context_ops amso_ctx_ops = { +static struct rdma_context_ops amso_ctx_ops = { .query_device = amso_query_device, .query_port = amso_query_port, .alloc_pd = amso_alloc_pd, @@ -92,50 +92,50 @@ static struct ibv_context_ops amso_ctx_o .detach_mcast = amso_detach_mcast }; -static struct ibv_context *amso_alloc_context(struct ibv_device *ibdev, +static struct rdma_context *amso_alloc_context(struct rdma_device *ibdev, int cmd_fd) { struct amso_context *context; - struct ibv_get_context cmd; + struct rdma_get_context cmd; struct amso_alloc_ucontext_resp resp; context = malloc(sizeof *context); if (!context) return NULL; - context->ibv_ctx.cmd_fd = cmd_fd; + context->rdma_ctx.cmd_fd = cmd_fd; - if (ibv_cmd_get_context(&context->ibv_ctx, &cmd, sizeof cmd, - &resp.ibv_resp, sizeof resp)) + if (rdma_cmd_get_context(&context->rdma_ctx, &cmd, sizeof cmd, + &resp.rdma_resp, sizeof resp)) goto err_free; - context->ibv_ctx.device = ibdev; - context->ibv_ctx.ops = amso_ctx_ops; - context->ibv_ctx.ops.req_notify_cq = amso_arm_cq; - context->ibv_ctx.ops.cq_event = NULL; - context->ibv_ctx.ops.post_send = amso_post_send; - context->ibv_ctx.ops.post_recv = amso_post_recv; - context->ibv_ctx.ops.post_srq_recv = amso_post_srq_recv; + context->rdma_ctx.device = ibdev; + context->rdma_ctx.ops = amso_ctx_ops; + context->rdma_ctx.ops.req_notify_cq = amso_arm_cq; + context->rdma_ctx.ops.cq_event = NULL; + context->rdma_ctx.ops.post_send = 
amso_post_send; + context->rdma_ctx.ops.post_recv = amso_post_recv; + context->rdma_ctx.ops.post_srq_recv = amso_post_srq_recv; - return &context->ibv_ctx; + return &context->rdma_ctx; err_free: free(context); return NULL; } -static void amso_free_context(struct ibv_context *ibctx) +static void amso_free_context(struct rdma_context *ibctx) { struct amso_context *context = to_amso_ctx(ibctx); free(context); } -static struct ibv_device_ops amso_dev_ops = { +static struct rdma_device_ops amso_dev_ops = { .alloc_context = amso_alloc_context, .free_context = amso_free_context }; -struct ibv_device *ibv_driver_init(const char *uverbs_sys_path, +struct rdma_device *rdma_driver_init(const char *uverbs_sys_path, int abi_version) { char value[8]; @@ -143,12 +143,12 @@ struct ibv_device *ibv_driver_init(const unsigned vendor, device; int i; - if (ibv_read_sysfs_file(uverbs_sys_path, "device/vendor", + if (rdma_read_sysfs_file(uverbs_sys_path, "device/vendor", value, sizeof value) < 0) return NULL; sscanf(value, "%i", &vendor); - if (ibv_read_sysfs_file(uverbs_sys_path, "device/device", + if (rdma_read_sysfs_file(uverbs_sys_path, "device/device", value, sizeof value) < 0) return NULL; sscanf(value, "%i", &device); @@ -167,23 +167,23 @@ found: return NULL; } - dev->ibv_dev.ops = amso_dev_ops; + dev->rdma_dev.ops = amso_dev_ops; dev->hca_type = hca_table[i].type; dev->page_size = sysconf(_SC_PAGESIZE); - return &dev->ibv_dev; + return &dev->rdma_dev; } #ifdef HAVE_SYSFS_LIBSYSFS_H -struct ibv_device *openib_driver_init(struct sysfs_class_device *sysdev) +struct rdma_device *openib_driver_init(struct sysfs_class_device *sysdev) { int abi_ver = 0; char value[8]; - if (ibv_read_sysfs_file(sysdev->path, "abi_version", + if (rdma_read_sysfs_file(sysdev->path, "abi_version", value, sizeof value) > 0) abi_ver = strtol(value, NULL, 10); - return ibv_driver_init(sysdev->path, abi_ver); + return rdma_driver_init(sysdev->path, abi_ver); } #endif /* HAVE_SYSFS_LIBSYSFS_H */ diff -ruNp 
ORG/libamso/src/amso.h NEW/libamso/src/amso.h --- ORG/libamso/src/amso.h 2006-07-09 22:12:10.000000000 -0700 +++ NEW/libamso/src/amso.h 2006-07-10 18:07:44.000000000 -0700 @@ -32,8 +32,8 @@ #ifndef AMSO_H #define AMSO_H -#include -#include +#include +#include #define HIDDEN __attribute__((visibility ("hidden"))) @@ -44,21 +44,21 @@ enum amso_hca_type { }; struct amso_device { - struct ibv_device ibv_dev; + struct rdma_device rdma_dev; enum amso_hca_type hca_type; int page_size; }; struct amso_context { - struct ibv_context ibv_ctx; + struct rdma_context rdma_ctx; }; struct amso_pd { - struct ibv_pd ibv_pd; + struct rdma_pd rdma_pd; }; struct amso_cq { - struct ibv_cq ibv_cq; + struct rdma_cq rdma_cq; __u32 cqid; __u32 entries; __u64 physaddr; @@ -66,7 +66,7 @@ struct amso_cq { }; struct amso_qp { - struct ibv_qp ibv_qp; + struct rdma_qp rdma_qp; __u32 qpid; __u32 entries; __u64 physaddr; @@ -76,81 +76,81 @@ struct amso_qp { #define to_amso_xxx(xxx, type) \ ((struct amso_##type *) \ - ((void *) ib##xxx - offsetof(struct amso_##type, ibv_##xxx))) + ((void *) ib##xxx - offsetof(struct amso_##type, rdma_##xxx))) -static inline struct amso_device *to_amso_dev(struct ibv_device *ibdev) +static inline struct amso_device *to_amso_dev(struct rdma_device *ibdev) { return to_amso_xxx(dev, device); } -static inline struct amso_context *to_amso_ctx(struct ibv_context *ibctx) +static inline struct amso_context *to_amso_ctx(struct rdma_context *ibctx) { return to_amso_xxx(ctx, context); } -static inline struct amso_pd *to_amso_pd(struct ibv_pd *ibpd) +static inline struct amso_pd *to_amso_pd(struct rdma_pd *ibpd) { return to_amso_xxx(pd, pd); } -static inline struct amso_cq *to_amso_cq(struct ibv_cq *ibcq) +static inline struct amso_cq *to_amso_cq(struct rdma_cq *ibcq) { return to_amso_xxx(cq, cq); } -static inline struct amso_qp *to_amso_qp(struct ibv_qp *ibqp) +static inline struct amso_qp *to_amso_qp(struct rdma_qp *ibqp) { return to_amso_xxx(qp, qp); } -extern int 
amso_query_device(struct ibv_context *context, - struct ibv_device_attr *attr); -extern int amso_query_port(struct ibv_context *context, uint8_t port, - struct ibv_port_attr *attr); - -extern struct ibv_pd *amso_alloc_pd(struct ibv_context *context); -extern int amso_free_pd(struct ibv_pd *pd); - -extern struct ibv_mr *amso_reg_mr(struct ibv_pd *pd, void *addr, - size_t length, enum ibv_access_flags access); -extern int amso_dereg_mr(struct ibv_mr *mr); +extern int amso_query_device(struct rdma_context *context, + struct rdma_device_attr *attr); +extern int amso_query_port(struct rdma_context *context, uint8_t port, + struct rdma_port_attr *attr); + +extern struct rdma_pd *amso_alloc_pd(struct rdma_context *context); +extern int amso_free_pd(struct rdma_pd *pd); + +extern struct rdma_mr *amso_reg_mr(struct rdma_pd *pd, void *addr, + size_t length, enum rdma_access_flags access); +extern int amso_dereg_mr(struct rdma_mr *mr); -struct ibv_cq *amso_create_cq(struct ibv_context *context, int cqe, - struct ibv_comp_channel *channel, +struct rdma_cq *amso_create_cq(struct rdma_context *context, int cqe, + struct rdma_comp_channel *channel, int comp_vector); -extern int amso_resize_cq(struct ibv_cq *cq, int cqe); -extern int amso_destroy_cq(struct ibv_cq *cq); -extern int amso_poll_cq(struct ibv_cq *cq, int ne, struct ibv_wc *wc); -extern int amso_arm_cq(struct ibv_cq *cq, int solicited); -extern void amso_cq_event(struct ibv_cq *cq); +extern int amso_resize_cq(struct rdma_cq *cq, int cqe); +extern int amso_destroy_cq(struct rdma_cq *cq); +extern int amso_poll_cq(struct rdma_cq *cq, int ne, struct rdma_wc *wc); +extern int amso_arm_cq(struct rdma_cq *cq, int solicited); +extern void amso_cq_event(struct rdma_cq *cq); extern void amso_init_cq_buf(struct amso_cq *cq, int nent); -extern struct ibv_srq *amso_create_srq(struct ibv_pd *pd, - struct ibv_srq_init_attr *attr); -extern int amso_modify_srq(struct ibv_srq *srq, - struct ibv_srq_attr *attr, - enum ibv_srq_attr_mask 
mask); -extern int amso_destroy_srq(struct ibv_srq *srq); -extern int amso_post_srq_recv(struct ibv_srq *ibsrq, - struct ibv_recv_wr *wr, - struct ibv_recv_wr **bad_wr); - -extern struct ibv_qp *amso_create_qp(struct ibv_pd *pd, - struct ibv_qp_init_attr *attr); -extern int amso_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, - enum ibv_qp_attr_mask attr_mask); -extern int amso_destroy_qp(struct ibv_qp *qp); -extern int amso_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr, - struct ibv_send_wr **bad_wr); -extern int amso_post_recv(struct ibv_qp *ibqp, struct ibv_recv_wr *wr, - struct ibv_recv_wr **bad_wr); -extern struct ibv_ah *amso_create_ah(struct ibv_pd *pd, - struct ibv_ah_attr *ah_attr); -extern int amso_destroy_ah(struct ibv_ah *ah); -extern int amso_attach_mcast(struct ibv_qp *qp, union ibv_gid *gid, +extern struct rdma_srq *amso_create_srq(struct rdma_pd *pd, + struct rdma_srq_init_attr *attr); +extern int amso_modify_srq(struct rdma_srq *srq, + struct rdma_srq_attr *attr, + enum rdma_srq_attr_mask mask); +extern int amso_destroy_srq(struct rdma_srq *srq); +extern int amso_post_srq_recv(struct rdma_srq *ibsrq, + struct rdma_recv_wr *wr, + struct rdma_recv_wr **bad_wr); + +extern struct rdma_qp *amso_create_qp(struct rdma_pd *pd, + struct rdma_qp_init_attr *attr); +extern int amso_modify_qp(struct rdma_qp *qp, struct rdma_qp_attr *attr, + enum rdma_qp_attr_mask attr_mask); +extern int amso_destroy_qp(struct rdma_qp *qp); +extern int amso_post_send(struct rdma_qp *ibqp, struct rdma_send_wr *wr, + struct rdma_send_wr **bad_wr); +extern int amso_post_recv(struct rdma_qp *ibqp, struct rdma_recv_wr *wr, + struct rdma_recv_wr **bad_wr); +extern struct rdma_ah *amso_create_ah(struct rdma_pd *pd, + struct rdma_ah_attr *ah_attr); +extern int amso_destroy_ah(struct rdma_ah *ah); +extern int amso_attach_mcast(struct rdma_qp *qp, union rdma_gid *gid, uint16_t lid); -extern int amso_detach_mcast(struct ibv_qp *qp, union ibv_gid *gid, +extern int 
amso_detach_mcast(struct rdma_qp *qp, union rdma_gid *gid, uint16_t lid); #endif /* AMSO_H */ diff -ruNp ORG/libamso/src/amso.map NEW/libamso/src/amso.map --- ORG/libamso/src/amso.map 2006-07-09 22:12:10.000000000 -0700 +++ NEW/libamso/src/amso.map 2006-07-10 18:07:44.000000000 -0700 @@ -1,6 +1,6 @@ { global: - ibv_driver_init; + rdma_driver_init; openib_driver_init; local: *; }; diff -ruNp ORG/libamso/src/cq.c NEW/libamso/src/cq.c --- ORG/libamso/src/cq.c 2006-07-09 22:12:10.000000000 -0700 +++ NEW/libamso/src/cq.c 2006-07-10 18:07:45.000000000 -0700 @@ -37,21 +37,21 @@ #include #include -#include +#include #include "amso.h" #include "amso-abi.h" -int amso_poll_cq(struct ibv_cq *ibcq, int ne, struct ibv_wc *wc) +int amso_poll_cq(struct rdma_cq *ibcq, int ne, struct rdma_wc *wc) { - return ibv_cmd_poll_cq(ibcq, ne, wc); + return rdma_cmd_poll_cq(ibcq, ne, wc); } -int amso_arm_cq(struct ibv_cq *cq, int solicited) +int amso_arm_cq(struct rdma_cq *cq, int solicited) { - return ibv_cmd_req_notify_cq(cq, solicited); + return rdma_cmd_req_notify_cq(cq, solicited); } diff -ruNp ORG/libamso/src/qp.c NEW/libamso/src/qp.c --- ORG/libamso/src/qp.c 2006-07-09 22:12:10.000000000 -0700 +++ NEW/libamso/src/qp.c 2006-07-10 18:07:45.000000000 -0700 @@ -41,15 +41,15 @@ #include "amso.h" #include -int amso_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr, - struct ibv_send_wr **bad_wr) +int amso_post_send(struct rdma_qp *ibqp, struct rdma_send_wr *wr, + struct rdma_send_wr **bad_wr) { - return ibv_cmd_post_send(ibqp, wr, bad_wr); + return rdma_cmd_post_send(ibqp, wr, bad_wr); } -int amso_post_recv(struct ibv_qp *ibqp, struct ibv_recv_wr *wr, - struct ibv_recv_wr **bad_wr) +int amso_post_recv(struct rdma_qp *ibqp, struct rdma_recv_wr *wr, + struct rdma_recv_wr **bad_wr) { - return ibv_cmd_post_recv(ibqp, wr, bad_wr); + return rdma_cmd_post_recv(ibqp, wr, bad_wr); } diff -ruNp ORG/libamso/src/verbs.c NEW/libamso/src/verbs.c --- ORG/libamso/src/verbs.c 2006-07-09 22:12:10.000000000 
-0700 +++ NEW/libamso/src/verbs.c 2006-07-10 18:07:45.000000000 -0700 @@ -46,15 +46,15 @@ #include "amso-abi.h" -int amso_query_device(struct ibv_context *context, struct ibv_device_attr *attr) +int amso_query_device(struct rdma_context *context, struct rdma_device_attr *attr) { - struct ibv_query_device cmd; + struct rdma_query_device cmd; uint64_t raw_fw_ver; unsigned major, minor, sub_minor; int ret; ret = - ibv_cmd_query_device(context, attr, &raw_fw_ver, &cmd, sizeof cmd); + rdma_cmd_query_device(context, attr, &raw_fw_ver, &cmd, sizeof cmd); if (ret) return ret; @@ -68,17 +68,17 @@ int amso_query_device(struct ibv_context return 0; } -int amso_query_port(struct ibv_context *context, uint8_t port, - struct ibv_port_attr *attr) +int amso_query_port(struct rdma_context *context, uint8_t port, + struct rdma_port_attr *attr) { - struct ibv_query_port cmd; + struct rdma_query_port cmd; - return ibv_cmd_query_port(context, port, attr, &cmd, sizeof cmd); + return rdma_cmd_query_port(context, port, attr, &cmd, sizeof cmd); } -struct ibv_pd *amso_alloc_pd(struct ibv_context *context) +struct rdma_pd *amso_alloc_pd(struct rdma_context *context) { - struct ibv_alloc_pd cmd; + struct rdma_alloc_pd cmd; struct amso_alloc_pd_resp resp; struct amso_pd *pd; @@ -86,20 +86,20 @@ struct ibv_pd *amso_alloc_pd(struct ibv_ if (!pd) return NULL; - if (ibv_cmd_alloc_pd(context, &pd->ibv_pd, &cmd, sizeof cmd, - &resp.ibv_resp, sizeof resp)) { + if (rdma_cmd_alloc_pd(context, &pd->rdma_pd, &cmd, sizeof cmd, + &resp.rdma_resp, sizeof resp)) { free(pd); return NULL; } - return &pd->ibv_pd; + return &pd->rdma_pd; } -int amso_free_pd(struct ibv_pd *pd) +int amso_free_pd(struct rdma_pd *pd) { int ret; - ret = ibv_cmd_dealloc_pd(pd); + ret = rdma_cmd_dealloc_pd(pd); if (ret) return ret; @@ -107,18 +107,18 @@ int amso_free_pd(struct ibv_pd *pd) return 0; } -static struct ibv_mr *__amso_reg_mr(struct ibv_pd *pd, void *addr, +static struct rdma_mr *__amso_reg_mr(struct rdma_pd *pd, void *addr, 
size_t length, uint64_t hca_va, - enum ibv_access_flags access) + enum rdma_access_flags access) { - struct ibv_mr *mr; - struct ibv_reg_mr cmd; + struct rdma_mr *mr; + struct rdma_reg_mr cmd; mr = malloc(sizeof *mr); if (!mr) return NULL; - if (ibv_cmd_reg_mr(pd, addr, length, hca_va, + if (rdma_cmd_reg_mr(pd, addr, length, hca_va, access, mr, &cmd, sizeof cmd)) { free(mr); return NULL; @@ -127,17 +127,17 @@ static struct ibv_mr *__amso_reg_mr(stru return mr; } -struct ibv_mr *amso_reg_mr(struct ibv_pd *pd, void *addr, - size_t length, enum ibv_access_flags access) +struct rdma_mr *amso_reg_mr(struct rdma_pd *pd, void *addr, + size_t length, enum rdma_access_flags access) { return __amso_reg_mr(pd, addr, length, (uintptr_t) addr, access); } -int amso_dereg_mr(struct ibv_mr *mr) +int amso_dereg_mr(struct rdma_mr *mr) { int ret; - ret = ibv_cmd_dereg_mr(mr); + ret = rdma_cmd_dereg_mr(mr); if (ret) return ret; @@ -145,8 +145,8 @@ int amso_dereg_mr(struct ibv_mr *mr) return 0; } -struct ibv_cq *amso_create_cq(struct ibv_context *context, int cqe, - struct ibv_comp_channel *channel, int comp_vector) +struct rdma_cq *amso_create_cq(struct rdma_context *context, int cqe, + struct rdma_comp_channel *channel, int comp_vector) { struct amso_create_cq cmd; struct amso_create_cq_resp resp; @@ -157,9 +157,9 @@ struct ibv_cq *amso_create_cq(struct ibv if (!cq) return NULL; - ret = ibv_cmd_create_cq(context, cqe, channel, comp_vector, - &cq->ibv_cq, &cmd.ibv_cmd, sizeof cmd, - &resp.ibv_resp, sizeof resp); + ret = rdma_cmd_create_cq(context, cqe, channel, comp_vector, + &cq->rdma_cq, &cmd.rdma_cmd, sizeof cmd, + &resp.rdma_resp, sizeof resp); if (ret) goto err; @@ -170,7 +170,7 @@ struct ibv_cq *amso_create_cq(struct ibv MAP_SHARED, context->cmd_fd, cq->physaddr); #endif - return &cq->ibv_cq; + return &cq->rdma_cq; err: @@ -179,12 +179,12 @@ err: return NULL; } -int amso_resize_cq(struct ibv_cq *cq, int cqe) +int amso_resize_cq(struct rdma_cq *cq, int cqe) { int ret; - struct 
ibv_resize_cq cmd; + struct rdma_resize_cq cmd; - ret = ibv_cmd_resize_cq(cq, cqe, &cmd, sizeof cmd); + ret = rdma_cmd_resize_cq(cq, cqe, &cmd, sizeof cmd); if (ret) return ret; /* We will need to unmap and remap when we implement user mode */ @@ -192,41 +192,41 @@ int amso_resize_cq(struct ibv_cq *cq, in return 0; } -int amso_destroy_cq(struct ibv_cq *cq) +int amso_destroy_cq(struct rdma_cq *cq) { int ret; - ret = ibv_cmd_destroy_cq(cq); + ret = rdma_cmd_destroy_cq(cq); if (ret) return ret; return 0; } -struct ibv_srq *amso_create_srq(struct ibv_pd *pd, - struct ibv_srq_init_attr *attr) +struct rdma_srq *amso_create_srq(struct rdma_pd *pd, + struct rdma_srq_init_attr *attr) { return (void *) -ENOSYS; } -int amso_modify_srq(struct ibv_srq *srq, - struct ibv_srq_attr *attr, enum ibv_srq_attr_mask attr_mask) +int amso_modify_srq(struct rdma_srq *srq, + struct rdma_srq_attr *attr, enum rdma_srq_attr_mask attr_mask) { return -ENOSYS; } -int amso_destroy_srq(struct ibv_srq *srq) +int amso_destroy_srq(struct rdma_srq *srq) { return -ENOSYS; } -int amso_post_srq_recv(struct ibv_srq *ibsrq, - struct ibv_recv_wr *wr, struct ibv_recv_wr **bad_wr) +int amso_post_srq_recv(struct rdma_srq *ibsrq, + struct rdma_recv_wr *wr, struct rdma_recv_wr **bad_wr) { return -ENOSYS; } -struct ibv_qp *amso_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *attr) +struct rdma_qp *amso_create_qp(struct rdma_pd *pd, struct rdma_qp_init_attr *attr) { struct amso_create_qp cmd; struct amso_create_qp_resp resp; @@ -244,8 +244,8 @@ struct ibv_qp *amso_create_qp(struct ibv if (!qp) return NULL; - ret = ibv_cmd_create_qp(pd, &qp->ibv_qp, attr, &cmd.ibv_cmd, sizeof cmd, - &resp.ibv_resp, sizeof resp); + ret = rdma_cmd_create_qp(pd, &qp->rdma_qp, attr, &cmd.rdma_cmd, sizeof cmd, + &resp.rdma_resp, sizeof resp); if (ret) goto err; @@ -253,26 +253,26 @@ struct ibv_qp *amso_create_qp(struct ibv qp->physaddr = resp.physaddr; #endif - return &qp->ibv_qp; + return &qp->rdma_qp; err: free(qp); return NULL; 
} -int amso_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, - enum ibv_qp_attr_mask attr_mask) +int amso_modify_qp(struct rdma_qp *qp, struct rdma_qp_attr *attr, + enum rdma_qp_attr_mask attr_mask) { - struct ibv_modify_qp cmd; + struct rdma_modify_qp cmd; - return ibv_cmd_modify_qp(qp, attr, attr_mask, &cmd, sizeof cmd); + return rdma_cmd_modify_qp(qp, attr, attr_mask, &cmd, sizeof cmd); } -int amso_destroy_qp(struct ibv_qp *qp) +int amso_destroy_qp(struct rdma_qp *qp) { int ret; - ret = ibv_cmd_destroy_qp(qp); + ret = rdma_cmd_destroy_qp(qp); if (ret) return ret; @@ -281,22 +281,22 @@ int amso_destroy_qp(struct ibv_qp *qp) return 0; } -struct ibv_ah *amso_create_ah(struct ibv_pd *pd, struct ibv_ah_attr *attr) +struct rdma_ah *amso_create_ah(struct rdma_pd *pd, struct rdma_ah_attr *attr) { return (void *) -ENOSYS; } -int amso_destroy_ah(struct ibv_ah *ah) +int amso_destroy_ah(struct rdma_ah *ah) { return -ENOSYS; } -int amso_attach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid) +int amso_attach_mcast(struct rdma_qp *qp, union rdma_gid *gid, uint16_t lid) { return -ENOSYS; } -int amso_detach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid) +int amso_detach_mcast(struct rdma_qp *qp, union rdma_gid *gid, uint16_t lid) { return -ENOSYS; } From krkumar2 at in.ibm.com Mon Jul 10 03:39:45 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Mon, 10 Jul 2006 16:09:45 +0530 Subject: [openib-general] [RFC] [PATCH 4/7] ibrdmaverbs src files In-Reply-To: <1152527056.28048.10.camel@localhost.localdomain> References: <1152526945.28048.7.camel@localhost.localdomain> <1152527056.28048.10.camel@localhost.localdomain> Message-ID: <1152527985.28182.2.camel@localhost.localdomain> Resending this with correct subject. Also set "Preformat". Thanks, - KK This library provides equivalent functionality to libibverbs, but changes the data types and verb API's to be transport neutral. This patch contains the source files. 
Signed-off-by: Krishna Kumar diff -ruNp ORG/librdmaverbs/src/cmd.c NEW/librdmaverbs/src/cmd.c --- ORG/librdmaverbs/src/cmd.c 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/src/cmd.c 2006-07-10 18:07:47.000000000 -0700 @@ -0,0 +1,1060 @@ +/* + * Copyright (c) 2005 Topspin Communications. All rights reserved. + * Copyright (c) 2005 PathScale, Inc. All rights reserved. + * Copyright (c) 2006 Cisco Systems, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE.
+ * + * $Id: cmd.c 7631 2006-06-02 19:53:25Z swise $ + */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include +#include +#include +#include + +#include "rdmaverbs.h" + +static int rdma_cmd_get_context_v2(struct rdma_context *context, + struct rdma_get_context *new_cmd, + size_t new_cmd_size, + struct rdma_get_context_resp *resp, + size_t resp_size) +{ + struct rdma_abi_compat_v2 *t; + struct rdma_get_context_v2 *cmd; + size_t cmd_size; + uint32_t cq_fd; + + t = malloc(sizeof *t); + if (!t) + return ENOMEM; + pthread_mutex_init(&t->in_use, NULL); + + cmd_size = sizeof *cmd + new_cmd_size - sizeof *new_cmd; + cmd = alloca(cmd_size); + memcpy(cmd->driver_data, new_cmd->driver_data, new_cmd_size - sizeof *new_cmd); + + RDMA_INIT_CMD_RESP(cmd, cmd_size, GET_CONTEXT, resp, resp_size); + cmd->cq_fd_tab = (uintptr_t) &cq_fd; + + if (write(context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + context->async_fd = resp->async_fd; + context->num_comp_vectors = 1; + t->channel.fd = cq_fd; + context->abi_compat = t; + + return 0; +} + +int rdma_cmd_get_context(struct rdma_context *context, struct rdma_get_context *cmd, + size_t cmd_size, struct rdma_get_context_resp *resp, + size_t resp_size) +{ + if (abi_ver <= 2) + return rdma_cmd_get_context_v2(context, cmd, cmd_size, resp, resp_size); + + RDMA_INIT_CMD_RESP(cmd, cmd_size, GET_CONTEXT, resp, resp_size); + + if (write(context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + context->async_fd = resp->async_fd; + context->num_comp_vectors = resp->num_comp_vectors; + + return 0; +} + +int rdma_cmd_query_device(struct rdma_context *context, + struct rdma_device_attr *device_attr, + uint64_t *raw_fw_ver, + struct rdma_query_device *cmd, size_t cmd_size) +{ + struct rdma_query_device_resp resp; + + RDMA_INIT_CMD_RESP(cmd, cmd_size, QUERY_DEVICE, &resp, sizeof resp); + + if (write(context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + memset(device_attr->fw_ver, 0, 
sizeof device_attr->fw_ver); + *raw_fw_ver = resp.fw_ver; + device_attr->node_guid = resp.node_guid; + device_attr->sys_image_guid = resp.sys_image_guid; + device_attr->max_mr_size = resp.max_mr_size; + device_attr->page_size_cap = resp.page_size_cap; + device_attr->vendor_id = resp.vendor_id; + device_attr->vendor_part_id = resp.vendor_part_id; + device_attr->hw_ver = resp.hw_ver; + device_attr->max_qp = resp.max_qp; + device_attr->max_qp_wr = resp.max_qp_wr; + device_attr->device_cap_flags = resp.device_cap_flags; + device_attr->max_sge = resp.max_sge; + device_attr->max_sge_rd = resp.max_sge_rd; + device_attr->max_cq = resp.max_cq; + device_attr->max_cqe = resp.max_cqe; + device_attr->max_mr = resp.max_mr; + device_attr->max_pd = resp.max_pd; + device_attr->max_qp_rd_atom = resp.max_qp_rd_atom; + device_attr->max_ee_rd_atom = resp.max_ee_rd_atom; + device_attr->max_res_rd_atom = resp.max_res_rd_atom; + device_attr->max_qp_init_rd_atom = resp.max_qp_init_rd_atom; + device_attr->max_ee_init_rd_atom = resp.max_ee_init_rd_atom; + device_attr->atomic_cap = resp.atomic_cap; + device_attr->max_ee = resp.max_ee; + device_attr->max_rdd = resp.max_rdd; + device_attr->max_mw = resp.max_mw; + device_attr->max_raw_ipv6_qp = resp.max_raw_ipv6_qp; + device_attr->max_raw_ethy_qp = resp.max_raw_ethy_qp; + device_attr->max_mcast_grp = resp.max_mcast_grp; + device_attr->max_mcast_qp_attach = resp.max_mcast_qp_attach; + device_attr->max_total_mcast_qp_attach = resp.max_total_mcast_qp_attach; + device_attr->max_ah = resp.max_ah; + device_attr->max_fmr = resp.max_fmr; + device_attr->max_map_per_fmr = resp.max_map_per_fmr; + device_attr->max_srq = resp.max_srq; + device_attr->max_srq_wr = resp.max_srq_wr; + device_attr->max_srq_sge = resp.max_srq_sge; + device_attr->max_pkeys = resp.max_pkeys; + device_attr->local_ca_ack_delay = resp.local_ca_ack_delay; + device_attr->phys_port_cnt = resp.phys_port_cnt; + + return 0; +} + +int rdma_cmd_query_port(struct rdma_context *context, uint8_t 
port_num, + struct rdma_port_attr *port_attr, + struct rdma_query_port *cmd, size_t cmd_size) +{ + struct rdma_query_port_resp resp; + + RDMA_INIT_CMD_RESP(cmd, cmd_size, QUERY_PORT, &resp, sizeof resp); + cmd->port_num = port_num; + + if (write(context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + port_attr->state = resp.state; + port_attr->max_mtu = resp.max_mtu; + port_attr->active_mtu = resp.active_mtu; + port_attr->gid_tbl_len = resp.gid_tbl_len; + port_attr->port_cap_flags = resp.port_cap_flags; + port_attr->max_msg_sz = resp.max_msg_sz; + port_attr->bad_pkey_cntr = resp.bad_pkey_cntr; + port_attr->qkey_viol_cntr = resp.qkey_viol_cntr; + port_attr->pkey_tbl_len = resp.pkey_tbl_len; + port_attr->lid = resp.lid; + port_attr->sm_lid = resp.sm_lid; + port_attr->lmc = resp.lmc; + port_attr->max_vl_num = resp.max_vl_num; + port_attr->sm_sl = resp.sm_sl; + port_attr->subnet_timeout = resp.subnet_timeout; + port_attr->init_type_reply = resp.init_type_reply; + port_attr->active_width = resp.active_width; + port_attr->active_speed = resp.active_speed; + port_attr->phys_state = resp.phys_state; + + return 0; +} + +int rdma_cmd_alloc_pd(struct rdma_context *context, struct rdma_pd *pd, + struct rdma_alloc_pd *cmd, size_t cmd_size, + struct rdma_alloc_pd_resp *resp, size_t resp_size) +{ + RDMA_INIT_CMD_RESP(cmd, cmd_size, ALLOC_PD, resp, resp_size); + + if (write(context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + pd->handle = resp->pd_handle; + + return 0; +} + +int rdma_cmd_dealloc_pd(struct rdma_pd *pd) +{ + struct rdma_dealloc_pd cmd; + + RDMA_INIT_CMD(&cmd, sizeof cmd, DEALLOC_PD); + cmd.pd_handle = pd->handle; + + if (write(pd->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) + return errno; + + return 0; +} + +int rdma_cmd_reg_mr(struct rdma_pd *pd, void *addr, size_t length, + uint64_t hca_va, enum rdma_access_flags access, + struct rdma_mr *mr, struct rdma_reg_mr *cmd, + size_t cmd_size) +{ + struct rdma_reg_mr_resp resp; + + 
RDMA_INIT_CMD_RESP(cmd, cmd_size, REG_MR, &resp, sizeof resp); + + cmd->start = (uintptr_t) addr; + cmd->length = length; + cmd->hca_va = hca_va; + cmd->pd_handle = pd->handle; + cmd->access_flags = access; + + if (write(pd->context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + mr->handle = resp.mr_handle; + mr->lkey = resp.lkey; + mr->rkey = resp.rkey; + + return 0; +} + +int rdma_cmd_dereg_mr(struct rdma_mr *mr) +{ + struct rdma_dereg_mr cmd; + + RDMA_INIT_CMD(&cmd, sizeof cmd, DEREG_MR); + cmd.mr_handle = mr->handle; + + if (write(mr->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) + return errno; + + return 0; +} + +static int rdma_cmd_create_cq_v2(struct rdma_context *context, int cqe, + struct rdma_cq *cq, + struct rdma_create_cq *new_cmd, size_t new_cmd_size, + struct rdma_create_cq_resp *resp, size_t resp_size) +{ + struct rdma_create_cq_v2 *cmd; + size_t cmd_size; + + cmd_size = sizeof *cmd + new_cmd_size - sizeof *new_cmd; + cmd = alloca(cmd_size); + memcpy(cmd->driver_data, new_cmd->driver_data, new_cmd_size - sizeof *new_cmd); + + RDMA_INIT_CMD_RESP(cmd, cmd_size, CREATE_CQ, resp, resp_size); + cmd->user_handle = (uintptr_t) cq; + cmd->cqe = cqe; + cmd->event_handler = 0; + + if (write(context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + cq->handle = resp->cq_handle; + cq->cqe = resp->cqe; + + return 0; +} + +int rdma_cmd_create_cq(struct rdma_context *context, int cqe, + struct rdma_comp_channel *channel, + int comp_vector, struct rdma_cq *cq, + struct rdma_create_cq *cmd, size_t cmd_size, + struct rdma_create_cq_resp *resp, size_t resp_size) +{ + if (abi_ver <= 2) + return rdma_cmd_create_cq_v2(context, cqe, cq, + cmd, cmd_size, resp, resp_size); + + RDMA_INIT_CMD_RESP(cmd, cmd_size, CREATE_CQ, resp, resp_size); + cmd->user_handle = (uintptr_t) cq; + cmd->cqe = cqe; + cmd->comp_vector = comp_vector; + cmd->comp_channel = channel ? 
channel->fd : -1; + cmd->reserved = 0; + + if (write(context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + cq->handle = resp->cq_handle; + cq->cqe = resp->cqe; + + return 0; +} + +int rdma_cmd_poll_cq(struct rdma_cq *ibcq, int ne, struct rdma_wc *wc) +{ + struct rdma_poll_cq cmd; + struct rdma_poll_cq_resp *resp; + int i; + int rsize; + int ret; + + rsize = sizeof *resp + ne * sizeof(struct rdma_kern_wc); + resp = malloc(rsize); + if (!resp) + return -1; + + RDMA_INIT_CMD_RESP(&cmd, sizeof cmd, POLL_CQ, resp, rsize); + cmd.cq_handle = ibcq->handle; + cmd.ne = ne; + + if (write(ibcq->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) { + ret = -1; + goto out; + } + + for (i = 0; i < resp->count; i++) { + wc[i].wr_id = resp->wc[i].wr_id; + wc[i].status = resp->wc[i].status; + wc[i].opcode = resp->wc[i].opcode; + wc[i].vendor_err = resp->wc[i].vendor_err; + wc[i].byte_len = resp->wc[i].byte_len; + wc[i].imm_data = resp->wc[i].imm_data; + wc[i].qp_num = resp->wc[i].qp_num; + wc[i].src_qp = resp->wc[i].src_qp; + wc[i].wc_flags = resp->wc[i].wc_flags; + wc[i].pkey_index = resp->wc[i].pkey_index; + wc[i].slid = resp->wc[i].slid; + wc[i].sl = resp->wc[i].sl; + wc[i].dlid_path_bits = resp->wc[i].dlid_path_bits; + } + + ret = resp->count; + +out: + free(resp); + return ret; +} + +int rdma_cmd_req_notify_cq(struct rdma_cq *ibcq, int solicited_only) +{ + struct rdma_req_notify_cq cmd; + + RDMA_INIT_CMD(&cmd, sizeof cmd, REQ_NOTIFY_CQ); + cmd.cq_handle = ibcq->handle; + cmd.solicited = !!solicited_only; + + if (write(ibcq->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) + return errno; + + return 0; +} + +int rdma_cmd_resize_cq(struct rdma_cq *cq, int cqe, + struct rdma_resize_cq *cmd, size_t cmd_size) +{ + struct rdma_resize_cq_resp resp; + + RDMA_INIT_CMD_RESP(cmd, cmd_size, RESIZE_CQ, &resp, sizeof resp); + cmd->cq_handle = cq->handle; + cmd->cqe = cqe; + + if (write(cq->context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + cq->cqe = resp.cqe; + + 
return 0; +} + +static int rdma_cmd_destroy_cq_v1(struct rdma_cq *cq) +{ + struct rdma_destroy_cq_v1 cmd; + + RDMA_INIT_CMD(&cmd, sizeof cmd, DESTROY_CQ); + cmd.cq_handle = cq->handle; + + if (write(cq->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) + return errno; + + return 0; +} + +int rdma_cmd_destroy_cq(struct rdma_cq *cq) +{ + struct rdma_destroy_cq cmd; + struct rdma_destroy_cq_resp resp; + + if (abi_ver == 1) + return rdma_cmd_destroy_cq_v1(cq); + + RDMA_INIT_CMD_RESP(&cmd, sizeof cmd, DESTROY_CQ, &resp, sizeof resp); + cmd.cq_handle = cq->handle; + + if (write(cq->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) + return errno; + + pthread_mutex_lock(&cq->mutex); + while (cq->comp_events_completed != resp.comp_events_reported || + cq->async_events_completed != resp.async_events_reported) + pthread_cond_wait(&cq->cond, &cq->mutex); + pthread_mutex_unlock(&cq->mutex); + + return 0; +} + +int rdma_cmd_create_srq(struct rdma_pd *pd, + struct rdma_srq *srq, struct rdma_srq_init_attr *attr, + struct rdma_create_srq *cmd, size_t cmd_size, + struct rdma_create_srq_resp *resp, size_t resp_size) +{ + RDMA_INIT_CMD_RESP(cmd, cmd_size, CREATE_SRQ, resp, resp_size); + cmd->user_handle = (uintptr_t) srq; + cmd->pd_handle = pd->handle; + cmd->max_wr = attr->attr.max_wr; + cmd->max_sge = attr->attr.max_sge; + cmd->srq_limit = attr->attr.srq_limit; + + if (write(pd->context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + srq->handle = resp->srq_handle; + + if (abi_ver > 5) { + attr->attr.max_wr = resp->max_wr; + attr->attr.max_sge = resp->max_sge; + } else { + struct rdma_create_srq_resp_v5 *resp_v5 = + (struct rdma_create_srq_resp_v5 *) resp; + + memmove((void *) resp + sizeof *resp, + (void *) resp_v5 + sizeof *resp_v5, + resp_size - sizeof *resp); + } + + return 0; +} + +static int rdma_cmd_modify_srq_v3(struct rdma_srq *srq, + struct rdma_srq_attr *srq_attr, + enum rdma_srq_attr_mask srq_attr_mask, + struct rdma_modify_srq *new_cmd, + size_t new_cmd_size) 
+{ + struct rdma_modify_srq_v3 *cmd; + size_t cmd_size; + + cmd_size = sizeof *cmd + new_cmd_size - sizeof *new_cmd; + cmd = alloca(cmd_size); + memcpy(cmd->driver_data, new_cmd->driver_data, new_cmd_size - sizeof *new_cmd); + + RDMA_INIT_CMD(cmd, cmd_size, MODIFY_SRQ); + + cmd->srq_handle = srq->handle; + cmd->attr_mask = srq_attr_mask; + cmd->max_wr = srq_attr->max_wr; + cmd->srq_limit = srq_attr->srq_limit; + cmd->max_sge = 0; + cmd->reserved = 0; + + if (write(srq->context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + return 0; +} + +int rdma_cmd_modify_srq(struct rdma_srq *srq, + struct rdma_srq_attr *srq_attr, + enum rdma_srq_attr_mask srq_attr_mask, + struct rdma_modify_srq *cmd, size_t cmd_size) +{ + if (abi_ver == 3) + return rdma_cmd_modify_srq_v3(srq, srq_attr, srq_attr_mask, + cmd, cmd_size); + + RDMA_INIT_CMD(cmd, cmd_size, MODIFY_SRQ); + + cmd->srq_handle = srq->handle; + cmd->attr_mask = srq_attr_mask; + cmd->max_wr = srq_attr->max_wr; + cmd->srq_limit = srq_attr->srq_limit; + + if (write(srq->context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + return 0; +} + +int rdma_cmd_query_srq(struct rdma_srq *srq, struct rdma_srq_attr *srq_attr, + struct rdma_query_srq *cmd, size_t cmd_size) +{ + struct rdma_query_srq_resp resp; + + RDMA_INIT_CMD_RESP(cmd, cmd_size, QUERY_SRQ, &resp, sizeof resp); + cmd->srq_handle = srq->handle; + + if (write(srq->context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + srq_attr->max_wr = resp.max_wr; + srq_attr->max_sge = resp.max_sge; + srq_attr->srq_limit = resp.srq_limit; + + return 0; +} + +static int rdma_cmd_destroy_srq_v1(struct rdma_srq *srq) +{ + struct rdma_destroy_srq_v1 cmd; + + RDMA_INIT_CMD(&cmd, sizeof cmd, DESTROY_SRQ); + cmd.srq_handle = srq->handle; + + if (write(srq->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) + return errno; + + return 0; +} + +int rdma_cmd_destroy_srq(struct rdma_srq *srq) +{ + struct rdma_destroy_srq cmd; + struct rdma_destroy_srq_resp resp; + + if 
(abi_ver == 1) + return rdma_cmd_destroy_srq_v1(srq); + + RDMA_INIT_CMD_RESP(&cmd, sizeof cmd, DESTROY_SRQ, &resp, sizeof resp); + cmd.srq_handle = srq->handle; + + if (write(srq->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) + return errno; + + pthread_mutex_lock(&srq->mutex); + while (srq->events_completed != resp.events_reported) + pthread_cond_wait(&srq->cond, &srq->mutex); + pthread_mutex_unlock(&srq->mutex); + + return 0; +} + +int rdma_cmd_create_qp(struct rdma_pd *pd, + struct rdma_qp *qp, struct rdma_qp_init_attr *attr, + struct rdma_create_qp *cmd, size_t cmd_size, + struct rdma_create_qp_resp *resp, size_t resp_size) +{ + RDMA_INIT_CMD_RESP(cmd, cmd_size, CREATE_QP, resp, resp_size); + + cmd->user_handle = (uintptr_t) qp; + cmd->pd_handle = pd->handle; + cmd->send_cq_handle = attr->send_cq->handle; + cmd->recv_cq_handle = attr->recv_cq->handle; + cmd->srq_handle = attr->srq ? attr->srq->handle : 0; + cmd->max_send_wr = attr->cap.max_send_wr; + cmd->max_recv_wr = attr->cap.max_recv_wr; + cmd->max_send_sge = attr->cap.max_send_sge; + cmd->max_recv_sge = attr->cap.max_recv_sge; + cmd->max_inline_data = attr->cap.max_inline_data; + cmd->sq_sig_all = attr->sq_sig_all; + cmd->qp_type = attr->qp_type; + cmd->is_srq = !!attr->srq; + + if (write(pd->context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + qp->handle = resp->qp_handle; + qp->qp_num = resp->qpn; + + if (abi_ver > 3) { + attr->cap.max_recv_sge = resp->max_recv_sge; + attr->cap.max_send_sge = resp->max_send_sge; + attr->cap.max_recv_wr = resp->max_recv_wr; + attr->cap.max_send_wr = resp->max_send_wr; + attr->cap.max_inline_data = resp->max_inline_data; + } + + if (abi_ver == 4) { + struct rdma_create_qp_resp_v4 *resp_v4 = + (struct rdma_create_qp_resp_v4 *) resp; + + memmove((void *) resp + sizeof *resp, + (void *) resp_v4 + sizeof *resp_v4, + resp_size - sizeof *resp); + } else if (abi_ver <= 3) { + struct rdma_create_qp_resp_v3 *resp_v3 = + (struct rdma_create_qp_resp_v3 *) resp; + + 
memmove((void *) resp + sizeof *resp, + (void *) resp_v3 + sizeof *resp_v3, + resp_size - sizeof *resp); + } + + return 0; +} + +int rdma_cmd_query_qp(struct rdma_qp *qp, struct rdma_qp_attr *attr, + enum rdma_qp_attr_mask attr_mask, + struct rdma_qp_init_attr *init_attr, + struct rdma_query_qp *cmd, size_t cmd_size) +{ + struct rdma_query_qp_resp resp; + + RDMA_INIT_CMD_RESP(cmd, cmd_size, QUERY_QP, &resp, sizeof resp); + cmd->qp_handle = qp->handle; + cmd->attr_mask = attr_mask; + + if (write(qp->context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + attr->qkey = resp.qkey; + attr->rq_psn = resp.rq_psn; + attr->sq_psn = resp.sq_psn; + attr->dest_qp_num = resp.dest_qp_num; + attr->qp_access_flags = resp.qp_access_flags; + attr->pkey_index = resp.pkey_index; + attr->alt_pkey_index = resp.alt_pkey_index; + attr->qp_state = resp.qp_state; + attr->cur_qp_state = resp.cur_qp_state; + attr->path_mtu = resp.path_mtu; + attr->path_mig_state = resp.path_mig_state; + attr->en_sqd_async_notify = resp.en_sqd_async_notify; + attr->max_rd_atomic = resp.max_rd_atomic; + attr->max_dest_rd_atomic = resp.max_dest_rd_atomic; + attr->min_rnr_timer = resp.min_rnr_timer; + attr->port_num = resp.port_num; + attr->timeout = resp.timeout; + attr->retry_cnt = resp.retry_cnt; + attr->rnr_retry = resp.rnr_retry; + attr->alt_port_num = resp.alt_port_num; + attr->alt_timeout = resp.alt_timeout; + attr->cap.max_send_wr = resp.max_send_wr; + attr->cap.max_recv_wr = resp.max_recv_wr; + attr->cap.max_send_sge = resp.max_send_sge; + attr->cap.max_recv_sge = resp.max_recv_sge; + attr->cap.max_inline_data = resp.max_inline_data; + + memcpy(attr->ah_attr.grh.dgid.raw, resp.dest.dgid, 16); + attr->ah_attr.grh.flow_label = resp.dest.flow_label; + attr->ah_attr.dlid = resp.dest.dlid; + attr->ah_attr.grh.sgid_index = resp.dest.sgid_index; + attr->ah_attr.grh.hop_limit = resp.dest.hop_limit; + attr->ah_attr.grh.traffic_class = resp.dest.traffic_class; + attr->ah_attr.sl = resp.dest.sl; + 
attr->ah_attr.src_path_bits = resp.dest.src_path_bits; + attr->ah_attr.static_rate = resp.dest.static_rate; + attr->ah_attr.is_global = resp.dest.is_global; + attr->ah_attr.port_num = resp.dest.port_num; + + memcpy(attr->alt_ah_attr.grh.dgid.raw, resp.alt_dest.dgid, 16); + attr->alt_ah_attr.grh.flow_label = resp.alt_dest.flow_label; + attr->alt_ah_attr.dlid = resp.alt_dest.dlid; + attr->alt_ah_attr.grh.sgid_index = resp.alt_dest.sgid_index; + attr->alt_ah_attr.grh.hop_limit = resp.alt_dest.hop_limit; + attr->alt_ah_attr.grh.traffic_class = resp.alt_dest.traffic_class; + attr->alt_ah_attr.sl = resp.alt_dest.sl; + attr->alt_ah_attr.src_path_bits = resp.alt_dest.src_path_bits; + attr->alt_ah_attr.static_rate = resp.alt_dest.static_rate; + attr->alt_ah_attr.is_global = resp.alt_dest.is_global; + attr->alt_ah_attr.port_num = resp.alt_dest.port_num; + + init_attr->qp_context = qp->qp_context; + init_attr->send_cq = qp->send_cq; + init_attr->recv_cq = qp->recv_cq; + init_attr->srq = qp->srq; + init_attr->qp_type = qp->qp_type; + init_attr->cap.max_send_wr = resp.max_send_wr; + init_attr->cap.max_recv_wr = resp.max_recv_wr; + init_attr->cap.max_send_sge = resp.max_send_sge; + init_attr->cap.max_recv_sge = resp.max_recv_sge; + init_attr->cap.max_inline_data = resp.max_inline_data; + init_attr->sq_sig_all = resp.sq_sig_all; + + return 0; +} + +int rdma_cmd_modify_qp(struct rdma_qp *qp, struct rdma_qp_attr *attr, + enum rdma_qp_attr_mask attr_mask, + struct rdma_modify_qp *cmd, size_t cmd_size) +{ + RDMA_INIT_CMD(cmd, cmd_size, MODIFY_QP); + + cmd->qp_handle = qp->handle; + cmd->attr_mask = attr_mask; + cmd->qkey = attr->qkey; + cmd->rq_psn = attr->rq_psn; + cmd->sq_psn = attr->sq_psn; + cmd->dest_qp_num = attr->dest_qp_num; + cmd->qp_access_flags = attr->qp_access_flags; + cmd->pkey_index = attr->pkey_index; + cmd->alt_pkey_index = attr->alt_pkey_index; + cmd->qp_state = attr->qp_state; + cmd->cur_qp_state = attr->cur_qp_state; + cmd->path_mtu = attr->path_mtu; + 
cmd->path_mig_state = attr->path_mig_state; + cmd->en_sqd_async_notify = attr->en_sqd_async_notify; + cmd->max_rd_atomic = attr->max_rd_atomic; + cmd->max_dest_rd_atomic = attr->max_dest_rd_atomic; + cmd->min_rnr_timer = attr->min_rnr_timer; + cmd->port_num = attr->port_num; + cmd->timeout = attr->timeout; + cmd->retry_cnt = attr->retry_cnt; + cmd->rnr_retry = attr->rnr_retry; + cmd->alt_port_num = attr->alt_port_num; + cmd->alt_timeout = attr->alt_timeout; + + memcpy(cmd->dest.dgid, attr->ah_attr.grh.dgid.raw, 16); + cmd->dest.flow_label = attr->ah_attr.grh.flow_label; + cmd->dest.dlid = attr->ah_attr.dlid; + cmd->dest.sgid_index = attr->ah_attr.grh.sgid_index; + cmd->dest.hop_limit = attr->ah_attr.grh.hop_limit; + cmd->dest.traffic_class = attr->ah_attr.grh.traffic_class; + cmd->dest.sl = attr->ah_attr.sl; + cmd->dest.src_path_bits = attr->ah_attr.src_path_bits; + cmd->dest.static_rate = attr->ah_attr.static_rate; + cmd->dest.is_global = attr->ah_attr.is_global; + cmd->dest.port_num = attr->ah_attr.port_num; + + memcpy(cmd->alt_dest.dgid, attr->alt_ah_attr.grh.dgid.raw, 16); + cmd->alt_dest.flow_label = attr->alt_ah_attr.grh.flow_label; + cmd->alt_dest.dlid = attr->alt_ah_attr.dlid; + cmd->alt_dest.sgid_index = attr->alt_ah_attr.grh.sgid_index; + cmd->alt_dest.hop_limit = attr->alt_ah_attr.grh.hop_limit; + cmd->alt_dest.traffic_class = attr->alt_ah_attr.grh.traffic_class; + cmd->alt_dest.sl = attr->alt_ah_attr.sl; + cmd->alt_dest.src_path_bits = attr->alt_ah_attr.src_path_bits; + cmd->alt_dest.static_rate = attr->alt_ah_attr.static_rate; + cmd->alt_dest.is_global = attr->alt_ah_attr.is_global; + cmd->alt_dest.port_num = attr->alt_ah_attr.port_num; + + if (write(qp->context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + return 0; +} + +static int rdma_cmd_destroy_qp_v1(struct rdma_qp *qp) +{ + struct rdma_destroy_qp_v1 cmd; + + RDMA_INIT_CMD(&cmd, sizeof cmd, DESTROY_QP); + cmd.qp_handle = qp->handle; + + if (write(qp->context->cmd_fd, &cmd, sizeof cmd) 
!= sizeof cmd) + return errno; + + return 0; +} + +int rdma_cmd_post_send(struct rdma_qp *ibqp, struct rdma_send_wr *wr, + struct rdma_send_wr **bad_wr) +{ + struct rdma_post_send *cmd; + struct rdma_post_send_resp resp; + struct rdma_send_wr *i; + struct rdma_kern_send_wr *n, *tmp; + struct rdma_sge *s; + unsigned wr_count = 0; + unsigned sge_count = 0; + int cmd_size; + int ret = 0; + + for (i = wr; i; i = i->next) { + wr_count++; + sge_count += i->num_sge; + } + + cmd_size = sizeof *cmd + wr_count * sizeof *n + sge_count * sizeof *s; + cmd = alloca(cmd_size); + + RDMA_INIT_CMD_RESP(cmd, cmd_size, POST_SEND, &resp, sizeof resp); + cmd->qp_handle = ibqp->handle; + cmd->wr_count = wr_count; + cmd->sge_count = sge_count; + cmd->wqe_size = sizeof *n; + + n = (struct rdma_kern_send_wr *) ((void *) cmd + sizeof *cmd); + s = (struct rdma_sge *) (n + wr_count); + + tmp = n; + for (i = wr; i; i = i->next) { + tmp->wr_id = i->wr_id; + tmp->num_sge = i->num_sge; + tmp->opcode = i->opcode; + tmp->send_flags = i->send_flags; + tmp->imm_data = i->imm_data; + if (ibqp->qp_type == RDMA_QPT_UD) { + tmp->wr.ud.ah = i->wr.ud.ah->handle; + tmp->wr.ud.remote_qpn = i->wr.ud.remote_qpn; + tmp->wr.ud.remote_qkey = i->wr.ud.remote_qkey; + } else { + switch(i->opcode) { + case RDMA_WR_RDMA_WRITE: + case RDMA_WR_RDMA_WRITE_WITH_IMM: + case RDMA_WR_RDMA_READ: + tmp->wr.rdma.remote_addr = + i->wr.rdma.remote_addr; + tmp->wr.rdma.rkey = i->wr.rdma.rkey; + break; + case RDMA_WR_ATOMIC_CMP_AND_SWP: + case RDMA_WR_ATOMIC_FETCH_AND_ADD: + tmp->wr.atomic.remote_addr = + i->wr.atomic.remote_addr; + tmp->wr.atomic.compare_add = + i->wr.atomic.compare_add; + tmp->wr.atomic.swap = i->wr.atomic.swap; + tmp->wr.atomic.rkey = i->wr.atomic.rkey; + break; + default: + break; + } + } + + if (tmp->num_sge) { + memcpy(s, i->sg_list, tmp->num_sge * sizeof *s); + s += tmp->num_sge; + } + + tmp++; + } + + resp.bad_wr = 0; + if (write(ibqp->context->cmd_fd, cmd, cmd_size) != cmd_size) + ret = errno; + + wr_count 
= resp.bad_wr; + if (wr_count) { + i = wr; + while (--wr_count) + i = i->next; + *bad_wr = i; + } + + return ret; +} + +int rdma_cmd_post_recv(struct rdma_qp *ibqp, struct rdma_recv_wr *wr, + struct rdma_recv_wr **bad_wr) +{ + struct rdma_post_recv *cmd; + struct rdma_post_recv_resp resp; + struct rdma_recv_wr *i; + struct rdma_kern_recv_wr *n, *tmp; + struct rdma_sge *s; + unsigned wr_count = 0; + unsigned sge_count = 0; + int cmd_size; + int ret = 0; + + for (i = wr; i; i = i->next) { + wr_count++; + sge_count += i->num_sge; + } + + cmd_size = sizeof *cmd + wr_count * sizeof *n + sge_count * sizeof *s; + cmd = alloca(cmd_size); + + RDMA_INIT_CMD_RESP(cmd, cmd_size, POST_RECV, &resp, sizeof resp); + cmd->qp_handle = ibqp->handle; + cmd->wr_count = wr_count; + cmd->sge_count = sge_count; + cmd->wqe_size = sizeof *n; + + n = (struct rdma_kern_recv_wr *) ((void *) cmd + sizeof *cmd); + s = (struct rdma_sge *) (n + wr_count); + + tmp = n; + for (i = wr; i; i = i->next) { + tmp->wr_id = i->wr_id; + tmp->num_sge = i->num_sge; + + if (tmp->num_sge) { + memcpy(s, i->sg_list, tmp->num_sge * sizeof *s); + s += tmp->num_sge; + } + + tmp++; + } + + resp.bad_wr = 0; + if (write(ibqp->context->cmd_fd, cmd, cmd_size) != cmd_size) + ret = errno; + + wr_count = resp.bad_wr; + if (wr_count) { + i = wr; + while (--wr_count) + i = i->next; + *bad_wr = i; + } + + return ret; +} + +int rdma_cmd_post_srq_recv(struct rdma_srq *srq, struct rdma_recv_wr *wr, + struct rdma_recv_wr **bad_wr) +{ + struct rdma_post_srq_recv *cmd; + struct rdma_post_srq_recv_resp resp; + struct rdma_recv_wr *i; + struct rdma_kern_recv_wr *n, *tmp; + struct rdma_sge *s; + unsigned wr_count = 0; + unsigned sge_count = 0; + int cmd_size; + int ret = 0; + + for (i = wr; i; i = i->next) { + wr_count++; + sge_count += i->num_sge; + } + + cmd_size = sizeof *cmd + wr_count * sizeof *n + sge_count * sizeof *s; + cmd = alloca(cmd_size); + + RDMA_INIT_CMD_RESP(cmd, cmd_size, POST_SRQ_RECV, &resp, sizeof resp); + 
cmd->srq_handle = srq->handle; + cmd->wr_count = wr_count; + cmd->sge_count = sge_count; + cmd->wqe_size = sizeof *n; + + n = (struct rdma_kern_recv_wr *) ((void *) cmd + sizeof *cmd); + s = (struct rdma_sge *) (n + wr_count); + + tmp = n; + for (i = wr; i; i = i->next) { + tmp->wr_id = i->wr_id; + tmp->num_sge = i->num_sge; + + if (tmp->num_sge) { + memcpy(s, i->sg_list, tmp->num_sge * sizeof *s); + s += tmp->num_sge; + } + + tmp++; + } + + resp.bad_wr = 0; + if (write(srq->context->cmd_fd, cmd, cmd_size) != cmd_size) + ret = errno; + + wr_count = resp.bad_wr; + if (wr_count) { + i = wr; + while (--wr_count) + i = i->next; + *bad_wr = i; + } + + return ret; +} + +int rdma_cmd_create_ah(struct rdma_pd *pd, struct rdma_ah *ah, + struct rdma_ah_attr *attr) +{ + struct rdma_create_ah cmd; + struct rdma_create_ah_resp resp; + + RDMA_INIT_CMD_RESP(&cmd, sizeof cmd, CREATE_AH, &resp, sizeof resp); + cmd.user_handle = (uintptr_t) ah; + cmd.pd_handle = pd->handle; + cmd.attr.dlid = attr->dlid; + cmd.attr.sl = attr->sl; + cmd.attr.src_path_bits = attr->src_path_bits; + cmd.attr.static_rate = attr->static_rate; + cmd.attr.is_global = attr->is_global; + cmd.attr.port_num = attr->port_num; + cmd.attr.grh.flow_label = attr->grh.flow_label; + cmd.attr.grh.sgid_index = attr->grh.sgid_index; + cmd.attr.grh.hop_limit = attr->grh.hop_limit; + cmd.attr.grh.traffic_class = attr->grh.traffic_class; + memcpy(cmd.attr.grh.dgid, attr->grh.dgid.raw, 16); + + if (write(pd->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) + return errno; + + ah->handle = resp.handle; + + return 0; +} + +int rdma_cmd_destroy_ah(struct rdma_ah *ah) +{ + struct rdma_destroy_ah cmd; + + RDMA_INIT_CMD(&cmd, sizeof cmd, DESTROY_AH); + cmd.ah_handle = ah->handle; + + if (write(ah->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) + return errno; + + return 0; +} + +int rdma_cmd_destroy_qp(struct rdma_qp *qp) +{ + struct rdma_destroy_qp cmd; + struct rdma_destroy_qp_resp resp; + + if (abi_ver == 1) + return 
rdma_cmd_destroy_qp_v1(qp); + + RDMA_INIT_CMD_RESP(&cmd, sizeof cmd, DESTROY_QP, &resp, sizeof resp); + cmd.qp_handle = qp->handle; + + if (write(qp->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) + return errno; + + pthread_mutex_lock(&qp->mutex); + while (qp->events_completed != resp.events_reported) + pthread_cond_wait(&qp->cond, &qp->mutex); + pthread_mutex_unlock(&qp->mutex); + + return 0; +} + +int rdma_cmd_attach_mcast(struct rdma_qp *qp, union rdma_gid *gid, uint16_t lid) +{ + struct rdma_attach_mcast cmd; + + RDMA_INIT_CMD(&cmd, sizeof cmd, ATTACH_MCAST); + memcpy(cmd.gid, gid->raw, sizeof cmd.gid); + cmd.qp_handle = qp->handle; + cmd.mlid = lid; + + if (write(qp->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) + return errno; + + return 0; +} + +int rdma_cmd_detach_mcast(struct rdma_qp *qp, union rdma_gid *gid, uint16_t lid) +{ + struct rdma_detach_mcast cmd; + + RDMA_INIT_CMD(&cmd, sizeof cmd, DETACH_MCAST); + memcpy(cmd.gid, gid->raw, sizeof cmd.gid); + cmd.qp_handle = qp->handle; + cmd.mlid = lid; + + if (write(qp->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) + return errno; + + return 0; +} diff -ruNp ORG/librdmaverbs/src/device.c NEW/librdmaverbs/src/device.c --- ORG/librdmaverbs/src/device.c 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/src/device.c 2006-07-10 18:07:47.000000000 -0700 @@ -0,0 +1,270 @@ +/* + * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved. + * Copyright (c) 2006 Cisco Systems, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + * $Id: device.c 7631 2006-06-02 19:53:25Z swise $ + */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include "rdmaverbs.h" + +static pthread_mutex_t device_list_lock = PTHREAD_MUTEX_INITIALIZER; +static int num_devices; +static struct rdma_device **device_list; + +struct rdma_device **rdma_get_device_list(int *num) +{ + struct rdma_device **l; + int i; + + pthread_mutex_lock(&device_list_lock); + + if (!num_devices) + num_devices = rdmaverbs_init(&device_list); + + l = calloc(num_devices + 1, sizeof (struct rdma_device *)); + for (i = 0; l && i < num_devices; ++i) + l[i] = device_list[i]; + + pthread_mutex_unlock(&device_list_lock); + + if (num) + *num = l ? num_devices : 0; + + return l; +} + +void rdma_free_device_list(struct rdma_device **list) +{ + free(list); +} + +const char *rdma_get_device_name(struct rdma_device *device) +{ + return device->name; +} + +uint64_t rdma_get_device_guid(struct rdma_device *device) +{ + char attr[24]; + uint64_t guid = 0; + uint16_t parts[4]; + int i; + + if (rdma_read_sysfs_file(device->ibdev_path, "node_guid", + attr, sizeof attr) < 0) + return 0; + + if (sscanf(attr, "%hx:%hx:%hx:%hx", + parts, parts + 1, parts + 2, parts + 3) != 4) + return 0; + + for (i = 0; i < 4; ++i) + guid = (guid << 16) | parts[i]; + + return htonll(guid); +} + +static enum rdma_node_type query_node_type(struct rdma_device *device) +{ + char node_desc[24]; + char node_str[24]; + int node_type; + + if (rdma_read_sysfs_file(device->ibdev_path, "node_type", + node_desc, sizeof(node_desc)) < 0) + return RDMA_NODE_UNKNOWN; + + sscanf(node_desc, "%d: %s\n", (int*)&node_type, node_str); + return (enum rdma_node_type) node_type; +} + +struct rdma_context *rdma_open_device(struct rdma_device *device) +{ + char *devpath; + int cmd_fd; + struct rdma_context *context; + + asprintf(&devpath, "/dev/infiniband/%s", device->dev_name); + + /* + * We'll only 
be doing writes, but we need O_RDWR in case the + * provider needs to mmap() the file. + */ + cmd_fd = open(devpath, O_RDWR); + free(devpath); + + if (cmd_fd < 0) + return NULL; + + device->node_type = query_node_type(device); + + context = device->ops.alloc_context(device, cmd_fd); + if (!context) + goto err; + + context->device = device; + context->cmd_fd = cmd_fd; + + return context; + +err: + close(cmd_fd); + + return NULL; +} + +int rdma_close_device(struct rdma_context *context) +{ + int async_fd = context->async_fd; + int cmd_fd = context->cmd_fd; + int cq_fd = -1; + + if (abi_ver <= 2) { + struct rdma_abi_compat_v2 *t = context->abi_compat; + cq_fd = t->channel.fd; + free(context->abi_compat); + } + + context->device->ops.free_context(context); + + close(async_fd); + close(cmd_fd); + if (abi_ver <= 2) + close(cq_fd); + + return 0; +} + +int rdma_get_async_event(struct rdma_context *context, + struct rdma_async_event *event) +{ + struct rdma_kern_async_event ev; + + if (read(context->async_fd, &ev, sizeof ev) != sizeof ev) + return -1; + + event->event_type = ev.event_type; + + switch (event->event_type) { + case RDMA_EVENT_CQ_ERR: + event->element.cq = (void *) (uintptr_t) ev.element; + break; + + case RDMA_EVENT_QP_FATAL: + case RDMA_EVENT_QP_REQ_ERR: + case RDMA_EVENT_QP_ACCESS_ERR: + case RDMA_EVENT_COMM_EST: + case RDMA_EVENT_SQ_DRAINED: + case RDMA_EVENT_PATH_MIG: + case RDMA_EVENT_PATH_MIG_ERR: + case RDMA_EVENT_QP_LAST_WQE_REACHED: + event->element.qp = (void *) (uintptr_t) ev.element; + break; + + case RDMA_EVENT_SRQ_ERR: + case RDMA_EVENT_SRQ_LIMIT_REACHED: + event->element.srq = (void *) (uintptr_t) ev.element; + break; + + default: + event->element.port_num = ev.element; + break; + } + + return 0; +} + +void rdma_ack_async_event(struct rdma_async_event *event) +{ + switch (event->event_type) { + case RDMA_EVENT_CQ_ERR: + { + struct rdma_cq *cq = event->element.cq; + + pthread_mutex_lock(&cq->mutex); + ++cq->async_events_completed; + 
pthread_cond_signal(&cq->cond); + pthread_mutex_unlock(&cq->mutex); + + return; + } + + case RDMA_EVENT_QP_FATAL: + case RDMA_EVENT_QP_REQ_ERR: + case RDMA_EVENT_QP_ACCESS_ERR: + case RDMA_EVENT_COMM_EST: + case RDMA_EVENT_SQ_DRAINED: + case RDMA_EVENT_PATH_MIG: + case RDMA_EVENT_PATH_MIG_ERR: + case RDMA_EVENT_QP_LAST_WQE_REACHED: + { + struct rdma_qp *qp = event->element.qp; + + pthread_mutex_lock(&qp->mutex); + ++qp->events_completed; + pthread_cond_signal(&qp->cond); + pthread_mutex_unlock(&qp->mutex); + + return; + } + + case RDMA_EVENT_SRQ_ERR: + case RDMA_EVENT_SRQ_LIMIT_REACHED: + { + struct rdma_srq *srq = event->element.srq; + + pthread_mutex_lock(&srq->mutex); + ++srq->events_completed; + pthread_cond_signal(&srq->cond); + pthread_mutex_unlock(&srq->mutex); + + return; + } + + default: + return; + } +} diff -ruNp ORG/librdmaverbs/src/init.c NEW/librdmaverbs/src/init.c --- ORG/librdmaverbs/src/init.c 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/src/init.c 2006-07-10 18:07:47.000000000 -0700 @@ -0,0 +1,274 @@ +/* + * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved. + * Copyright (c) 2006 Cisco Systems, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id: init.c 7631 2006-06-02 19:53:25Z swise $ + */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "rdmaverbs.h" + +#ifndef OPENRDMA_DRIVER_PATH_ENV +# define OPENRDMA_DRIVER_PATH_ENV "OPENRDMA_DRIVER_PATH" +#endif + +HIDDEN int abi_ver; + +static char default_path[] = DRIVER_PATH; +static const char *user_path; + +static struct rdma_driver *driver_list; + +static void load_driver(char *so_path) +{ + void *dlhandle; + rdma_driver_init_func init_func; + struct rdma_driver *driver; + + dlhandle = dlopen(so_path, RTLD_NOW); + if (!dlhandle) { + fprintf(stderr, PFX "Warning: couldn't load driver %s: %s\n", + so_path, dlerror()); + return; + } + + dlerror(); + init_func = dlsym(dlhandle, "rdma_driver_init"); + if (dlerror() != NULL || !init_func) { + dlclose(dlhandle); + return; + } + + driver = malloc(sizeof *driver); + if (!driver) { + fprintf(stderr, PFX "Fatal: couldn't allocate driver for %s\n", so_path); + dlclose(dlhandle); + return; + } + + driver->init_func = init_func; + driver->next = driver_list; + driver_list = driver; +} + +static void find_drivers(char *dir) +{ + size_t len = strlen(dir); + glob_t so_glob; + char *pat; + int ret; + int 
i; + + if (!len) + return; + + while (len && dir[len - 1] == '/') + dir[--len] = '\0'; + + asprintf(&pat, "%s/*.so", dir); + + ret = glob(pat, 0, NULL, &so_glob); + free(pat); + + if (ret) { + if (ret != GLOB_NOMATCH) + fprintf(stderr, PFX "Warning: couldn't search %s\n", dir); + return; + } + + for (i = 0; i < so_glob.gl_pathc; ++i) + load_driver(so_glob.gl_pathv[i]); + + globfree(&so_glob); +} + +static struct rdma_device *init_drivers(const char *class_path, + const char *dev_name) +{ + struct rdma_driver *driver; + struct rdma_device *dev; + int abi_ver = 0; + char sys_path[RDMA_SYSFS_PATH_MAX]; + char ibdev_name[RDMA_SYSFS_NAME_MAX]; + char value[8]; + + snprintf(sys_path, sizeof sys_path, "%s/%s", + class_path, dev_name); + + if (rdma_read_sysfs_file(sys_path, "abi_version", value, sizeof value) > 0) + abi_ver = strtol(value, NULL, 10); + + if (rdma_read_sysfs_file(sys_path, "ibdev", ibdev_name, sizeof ibdev_name) < 0) { + fprintf(stderr, PFX "Warning: no ibdev class attr for %s\n", + sys_path); + return NULL; + } + + for (driver = driver_list; driver; driver = driver->next) { + dev = driver->init_func(sys_path, abi_ver); + if (!dev) + continue; + + dev->driver = driver; + strcpy(dev->dev_path, sys_path); + snprintf(dev->ibdev_path, RDMA_SYSFS_PATH_MAX, "%s/class/infiniband/%s", + rdma_get_sysfs_path(), ibdev_name); + strcpy(dev->dev_name, dev_name); + strcpy(dev->name, ibdev_name); + + return dev; + } + + fprintf(stderr, PFX "Warning: no userspace device-specific driver found for %s\n" + " driver search path: ", dev_name); + if (user_path) + fprintf(stderr, "%s:", user_path); + fprintf(stderr, "%s\n", default_path); + + return NULL; +} + +static int check_abi_version(const char *path) +{ + char value[8]; + + if (rdma_read_sysfs_file(path, "class/infiniband_verbs/abi_version", + value, sizeof value) < 0) { + fprintf(stderr, PFX "Fatal: couldn't read uverbs ABI version.\n"); + return -1; + } + + abi_ver = strtol(value, NULL, 10); + + if (abi_ver < 
RDMA_USER_VERBS_MIN_ABI_VERSION || + abi_ver > RDMA_USER_VERBS_MAX_ABI_VERSION) { + fprintf(stderr, PFX "Fatal: kernel ABI version %d " + "doesn't match library version %d.\n", + abi_ver, RDMA_USER_VERBS_MAX_ABI_VERSION); + return -1; + } + + return 0; +} + +HIDDEN int rdmaverbs_init(struct rdma_device ***list) +{ + const char *sysfs_path; + char *wr_path, *dir; + char class_path[RDMA_SYSFS_PATH_MAX]; + DIR *class_dir; + struct dirent *dent; + struct rdma_device *device; + struct rdma_device **new_list; + int num_devices = 0; + int list_size = 0; + + *list = NULL; + + if (rdma_init_mem_map()) + return 0; + + find_drivers(default_path); + + /* + * Only follow the path passed in through the calling user's + * environment if we're not running SUID. + */ + if (getuid() == geteuid()) { + user_path = getenv(OPENRDMA_DRIVER_PATH_ENV); + if (user_path) { + wr_path = strdupa(user_path); + while ((dir = strsep(&wr_path, ";:"))) + find_drivers(dir); + } + } + + /* + * Now check if a driver is statically linked. Since we push + * drivers onto our driver list, the last driver we find will + * be the first one we try. + */ + load_driver(NULL); + + sysfs_path = rdma_get_sysfs_path(); + if (!sysfs_path) { + fprintf(stderr, PFX "Fatal: couldn't find sysfs mount.\n"); + return 0; + } + + if (check_abi_version(sysfs_path)) + return 0; + + snprintf(class_path, sizeof class_path, "%s/class/infiniband_verbs", + sysfs_path); + class_dir = opendir(class_path); + if (!class_dir) { + fprintf(stderr, PFX "Fatal: couldn't open sysfs class " + "directory '%s'.\n", class_path); + return 0; + } + + while ((dent = readdir(class_dir))) { + if (dent->d_name[0] == '.' || dent->d_type == DT_REG) + continue; + + device = init_drivers(class_path, dent->d_name); + if (!device) + continue; + + if (list_size <= num_devices) { + list_size = list_size ? 
list_size * 2 : 1; + new_list = realloc(*list, list_size * sizeof (struct rdma_device *)); + if (!new_list) + goto out; + *list = new_list; + } + + (*list)[num_devices++] = device; + } + +out: + closedir(class_dir); + + return num_devices; +} diff -ruNp ORG/librdmaverbs/src/librdmaverbs.map NEW/librdmaverbs/src/librdmaverbs.map --- ORG/librdmaverbs/src/librdmaverbs.map 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/src/librdmaverbs.map 2006-07-10 18:07:47.000000000 -0700 @@ -0,0 +1,76 @@ +IBVERBS_1.0 { + global: + rdma_get_device_list; + rdma_free_device_list; + rdma_get_device_name; + rdma_get_device_guid; + rdma_open_device; + rdma_close_device; + rdma_get_async_event; + rdma_ack_async_event; + rdma_query_device; + rdma_query_port; + rdma_query_gid; + rdma_query_pkey; + rdma_alloc_pd; + rdma_dealloc_pd; + rdma_reg_mr; + rdma_dereg_mr; + rdma_create_comp_channel; + rdma_destroy_comp_channel; + rdma_create_cq; + rdma_resize_cq; + rdma_destroy_cq; + rdma_get_cq_event; + rdma_ack_cq_events; + rdma_create_srq; + rdma_modify_srq; + rdma_query_srq; + rdma_destroy_srq; + rdmav_create_qp; + rdma_query_qp; + rdma_modify_qp; + rdmav_destroy_qp; + rdma_create_ah; + rdma_destroy_ah; + rdma_attach_mcast; + rdma_detach_mcast; + rdma_cmd_get_context; + rdma_cmd_query_device; + rdma_cmd_query_port; + rdma_cmd_query_gid; + rdma_cmd_query_pkey; + rdma_cmd_alloc_pd; + rdma_cmd_dealloc_pd; + rdma_cmd_reg_mr; + rdma_cmd_dereg_mr; + rdma_cmd_create_cq; + rdma_cmd_poll_cq; + rdma_cmd_req_notify_cq; + rdma_cmd_resize_cq; + rdma_cmd_destroy_cq; + rdma_cmd_create_srq; + rdma_cmd_modify_srq; + rdma_cmd_query_srq; + rdma_cmd_destroy_srq; + rdma_cmd_create_qp; + rdma_cmd_query_qp; + rdma_cmd_modify_qp; + rdma_cmd_destroy_qp; + rdma_cmd_post_send; + rdma_cmd_post_recv; + rdma_cmd_post_srq_recv; + rdma_cmd_create_ah; + rdma_cmd_destroy_ah; + rdma_cmd_attach_mcast; + rdma_cmd_detach_mcast; + rdma_copy_qp_attr_from_kern; + rdma_copy_path_rec_from_kern; + rdma_copy_path_rec_to_kern; + 
rdma_rate_to_mult; + mult_to_rdma_rate; + rdma_get_sysfs_path; + rdma_read_sysfs_file; + + local: *; +}; diff -ruNp ORG/librdmaverbs/src/marshall.c NEW/librdmaverbs/src/marshall.c --- ORG/librdmaverbs/src/marshall.c 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/src/marshall.c 2006-07-10 18:07:47.000000000 -0700 @@ -0,0 +1,142 @@ +/* + * Copyright (c) 2005 Intel Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include + +#include + +static void rdma_copy_ah_attr_from_kern(struct rdma_ah_attr *dst, + struct rdma_kern_ah_attr *src) +{ + memcpy(dst->grh.dgid.raw, src->grh.dgid, sizeof dst->grh.dgid); + dst->grh.flow_label = src->grh.flow_label; + dst->grh.sgid_index = src->grh.sgid_index; + dst->grh.hop_limit = src->grh.hop_limit; + dst->grh.traffic_class = src->grh.traffic_class; + + dst->dlid = src->dlid; + dst->sl = src->sl; + dst->src_path_bits = src->src_path_bits; + dst->static_rate = src->static_rate; + dst->is_global = src->is_global; + dst->port_num = src->port_num; +} + +void rdma_copy_qp_attr_from_kern(struct rdma_qp_attr *dst, + struct rdma_kern_qp_attr *src) +{ + dst->cur_qp_state = src->cur_qp_state; + dst->path_mtu = src->path_mtu; + dst->path_mig_state = src->path_mig_state; + dst->qkey = src->qkey; + dst->rq_psn = src->rq_psn; + dst->sq_psn = src->sq_psn; + dst->dest_qp_num = src->dest_qp_num; + dst->qp_access_flags = src->qp_access_flags; + + dst->cap.max_send_wr = src->max_send_wr; + dst->cap.max_recv_wr = src->max_recv_wr; + dst->cap.max_send_sge = src->max_send_sge; + dst->cap.max_recv_sge = src->max_recv_sge; + dst->cap.max_inline_data = src->max_inline_data; + + rdma_copy_ah_attr_from_kern(&dst->ah_attr, &src->ah_attr); + rdma_copy_ah_attr_from_kern(&dst->alt_ah_attr, &src->alt_ah_attr); + + dst->pkey_index = src->pkey_index; + dst->alt_pkey_index = src->alt_pkey_index; + dst->en_sqd_async_notify = src->en_sqd_async_notify; + dst->sq_draining = src->sq_draining; + dst->max_rd_atomic = src->max_rd_atomic; + dst->max_dest_rd_atomic = src->max_dest_rd_atomic; + dst->min_rnr_timer = src->min_rnr_timer; + dst->port_num = src->port_num; + dst->timeout = src->timeout; + dst->retry_cnt = src->retry_cnt; + dst->rnr_retry = src->rnr_retry; + dst->alt_port_num = src->alt_port_num; + dst->alt_timeout = src->alt_timeout; +} + +void rdma_copy_path_rec_from_kern(struct rdma_sa_path_rec *dst, + 
struct rdma_kern_path_rec *src) +{ + memcpy(dst->dgid.raw, src->dgid, sizeof dst->dgid); + memcpy(dst->sgid.raw, src->sgid, sizeof dst->sgid); + + dst->dlid = src->dlid; + dst->slid = src->slid; + dst->raw_traffic = src->raw_traffic; + dst->flow_label = src->flow_label; + dst->hop_limit = src->hop_limit; + dst->traffic_class = src->traffic_class; + dst->reversible = src->reversible; + dst->numb_path = src->numb_path; + dst->pkey = src->pkey; + dst->sl = src->sl; + dst->mtu_selector = src->mtu_selector; + dst->mtu = src->mtu; + dst->rate_selector = src->rate_selector; + dst->rate = src->rate; + dst->packet_life_time = src->packet_life_time; + dst->preference = src->preference; + dst->packet_life_time_selector = src->packet_life_time_selector; +} + +void rdma_copy_path_rec_to_kern(struct rdma_kern_path_rec *dst, + struct rdma_sa_path_rec *src) +{ + memcpy(dst->dgid, src->dgid.raw, sizeof src->dgid); + memcpy(dst->sgid, src->sgid.raw, sizeof src->sgid); + + dst->dlid = src->dlid; + dst->slid = src->slid; + dst->raw_traffic = src->raw_traffic; + dst->flow_label = src->flow_label; + dst->hop_limit = src->hop_limit; + dst->traffic_class = src->traffic_class; + dst->reversible = src->reversible; + dst->numb_path = src->numb_path; + dst->pkey = src->pkey; + dst->sl = src->sl; + dst->mtu_selector = src->mtu_selector; + dst->mtu = src->mtu; + dst->rate_selector = src->rate_selector; + dst->rate = src->rate; + dst->packet_life_time = src->packet_life_time; + dst->preference = src->preference; + dst->packet_life_time_selector = src->packet_life_time_selector; +} diff -ruNp ORG/librdmaverbs/src/memory.c NEW/librdmaverbs/src/memory.c --- ORG/librdmaverbs/src/memory.c 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/src/memory.c 2006-07-10 18:07:47.000000000 -0700 @@ -0,0 +1,258 @@ +/* + * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id: memory.c 6987 2006-05-08 15:18:51Z tom $ + */ + +#if HAVE_CONFIG_H +# include <config.h> +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include +#include + +#include "rdmaverbs.h" + +/* + * We keep a linked list of page ranges that have been locked along with a + * reference count to manage overlapping registrations, etc. + * + * Eventually we should turn this into an RB-tree or something similar + * to avoid the O(n) cost of registering/unregistering memory. 
+ */ + +struct rdma_mem_node { + struct rdma_mem_node *prev, *next; + uintptr_t start, end; + int refcnt; +}; + +static struct { + struct rdma_mem_node *first; + pthread_mutex_t mutex; + uintptr_t page_size; +} mem_map; + +int rdma_init_mem_map(void) +{ + struct rdma_mem_node *node = NULL; + + node = malloc(sizeof *node); + if (!node) + goto fail; + + node->prev = node->next = NULL; + node->start = 0; + node->end = UINTPTR_MAX; + node->refcnt = 0; + + mem_map.first = node; + + mem_map.page_size = sysconf(_SC_PAGESIZE); + if (mem_map.page_size < 0) + goto fail; + + if (pthread_mutex_init(&mem_map.mutex, NULL)) + goto fail; + + return 0; + +fail: + if (node) + free(node); + + return -1; +} + +static struct rdma_mem_node *__mm_find_first(uintptr_t start, uintptr_t end) +{ + struct rdma_mem_node *node = mem_map.first; + + while (node) { + if ((node->start <= start && node->end >= start) || + (node->start <= end && node->end >= end)) + break; + node = node->next; + } + + return node; +} + +static struct rdma_mem_node *__mm_prev(struct rdma_mem_node *node) +{ + return node->prev; +} + +static struct rdma_mem_node *__mm_next(struct rdma_mem_node *node) +{ + return node->next; +} + +static void __mm_add(struct rdma_mem_node *node, + struct rdma_mem_node *new) +{ + new->prev = node; + new->next = node->next; + node->next = new; + if (new->next) + new->next->prev = new; +} + +static void __mm_remove(struct rdma_mem_node *node) +{ + /* Never have to remove the first node, so we can use prev */ + node->prev->next = node->next; + if (node->next) + node->next->prev = node->prev; +} + +int rdma_lock_range(void *base, size_t size) +{ + uintptr_t start, end; + struct rdma_mem_node *node, *tmp; + int ret = 0; + + if (!size) + return 0; + + start = (uintptr_t) base & ~(mem_map.page_size - 1); + end = ((uintptr_t) (base + size + mem_map.page_size - 1) & + ~(mem_map.page_size - 1)) - 1; + + pthread_mutex_lock(&mem_map.mutex); + + node = __mm_find_first(start, end); + + if (node->start 
< start) { + tmp = malloc(sizeof *tmp); + if (!tmp) { + ret = -1; + goto out; + } + + tmp->start = start; + tmp->end = node->end; + tmp->refcnt = node->refcnt; + node->end = start - 1; + + __mm_add(node, tmp); + node = tmp; + } + + while (node->start <= end) { + if (node->end > end) { + tmp = malloc(sizeof *tmp); + if (!tmp) { + ret = -1; + goto out; + } + + tmp->start = end + 1; + tmp->end = node->end; + tmp->refcnt = node->refcnt; + node->end = end; + + __mm_add(node, tmp); + } + + + if (node->refcnt++ == 0) { + ret = mlock((void *) node->start, + node->end - node->start + 1); + if (ret) + goto out; + } + + node = __mm_next(node); + } + +out: + pthread_mutex_unlock(&mem_map.mutex); + + return ret; +} + +int rdma_unlock_range(void *base, size_t size) +{ + uintptr_t start, end; + struct rdma_mem_node *node, *tmp; + int ret = 0; + + if (!size) + return 0; + + start = (uintptr_t) base & ~(mem_map.page_size - 1); + end = ((uintptr_t) (base + size + mem_map.page_size - 1) & + ~(mem_map.page_size - 1)) - 1; + + pthread_mutex_lock(&mem_map.mutex); + + node = __mm_find_first(start, end); + + if (node->start != start) { + ret = -1; + goto out; + } + + while (node && node->end <= end) { + if (--node->refcnt == 0) { + ret = munlock((void *) node->start, + node->end - node->start + 1); + } + + if (__mm_prev(node) && node->refcnt == __mm_prev(node)->refcnt) { + __mm_prev(node)->end = node->end; + tmp = __mm_prev(node); + __mm_remove(node); + node = tmp; + } + + node = __mm_next(node); + } + + if (node && node->refcnt == __mm_prev(node)->refcnt) { + __mm_prev(node)->end = node->end; + tmp = __mm_prev(node); + __mm_remove(node); + } + + if (node->end != end) { + ret = -1; + goto out; + } + +out: + pthread_mutex_unlock(&mem_map.mutex); + + return ret; +} diff -ruNp ORG/librdmaverbs/src/rdmaverbs.h NEW/librdmaverbs/src/rdmaverbs.h --- ORG/librdmaverbs/src/rdmaverbs.h 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/src/rdmaverbs.h 2006-07-10 18:07:47.000000000 -0700 @@ 
-0,0 +1,87 @@ +/* + * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ + +#ifndef RDMA_VERBS_H +#define RDMA_VERBS_H + +#include + +#include + +#define HIDDEN __attribute__((visibility ("hidden"))) + +#define INIT __attribute__((constructor)) +#define FINI __attribute__((destructor)) + +#define PFX "librdmaverbs: " + +struct rdma_driver { + rdma_driver_init_func init_func; + struct rdma_driver *next; +}; + +struct rdma_abi_compat_v2 { + struct rdma_comp_channel channel; + pthread_mutex_t in_use; +}; + +extern HIDDEN int abi_ver; + +extern HIDDEN int rdmaverbs_init(struct rdma_device ***list); + +extern HIDDEN int rdma_init_mem_map(void); +extern HIDDEN int rdma_lock_range(void *base, size_t size); +extern HIDDEN int rdma_unlock_range(void *base, size_t size); + +#define RDMA_INIT_CMD(cmd, size, opcode) \ + do { \ + if (abi_ver > 2) \ + (cmd)->command = RDMA_USER_VERBS_CMD_##opcode; \ + else \ + (cmd)->command = RDMA_USER_VERBS_CMD_##opcode##_V2; \ + (cmd)->in_words = (size) / 4; \ + (cmd)->out_words = 0; \ + } while (0) + +#define RDMA_INIT_CMD_RESP(cmd, size, opcode, out, outsize) \ + do { \ + if (abi_ver > 2) \ + (cmd)->command = RDMA_USER_VERBS_CMD_##opcode; \ + else \ + (cmd)->command = RDMA_USER_VERBS_CMD_##opcode##_V2; \ + (cmd)->in_words = (size) / 4; \ + (cmd)->out_words = (outsize) / 4; \ + (cmd)->response = (uintptr_t) (out); \ + } while (0) + +#endif /* RDMA_VERBS_H */ diff -ruNp ORG/librdmaverbs/src/sysfs.c NEW/librdmaverbs/src/sysfs.c --- ORG/librdmaverbs/src/sysfs.c 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/src/sysfs.c 2006-07-10 18:07:47.000000000 -0700 @@ -0,0 +1,104 @@ +/* + * Copyright (c) 2006 Cisco Systems, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id$ + */ + +#if HAVE_CONFIG_H +# include <config.h> +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include +#include +#include +#include +#include + +#include "rdmaverbs.h" + +static char *sysfs_path; + +const char *rdma_get_sysfs_path(void) +{ + char *env = NULL; + + if (sysfs_path) + return sysfs_path; + + /* + * Only follow use path passed in through the calling user's + * environment if we're not running SUID. 
+ */ + if (getuid() == geteuid()) + env = getenv("SYSFS_PATH"); + + if (env) { + int len; + + sysfs_path = strndup(env, RDMA_SYSFS_PATH_MAX); + len = strlen(sysfs_path); + while (len > 0 && sysfs_path[len - 1] == '/') { + --len; + sysfs_path[len] = '\0'; + } + } else + sysfs_path = "/sys"; + + return sysfs_path; +} + +int rdma_read_sysfs_file(const char *dir, const char *file, + char *buf, size_t size) +{ + char *path; + int fd; + int len; + + asprintf(&path, "%s/%s", dir, file); + + fd = open(path, O_RDONLY); + if (fd < 0) { + free(path); + return -1; + } + + len = read(fd, buf, size); + + close(fd); + free(path); + + if (len > 0 && buf[len - 1] == '\n') + buf[--len] = '\0'; + + return len; +} diff -ruNp ORG/librdmaverbs/src/verbs.c NEW/librdmaverbs/src/verbs.c --- ORG/librdmaverbs/src/verbs.c 1969-12-31 16:00:00.000000000 -0800 +++ NEW/librdmaverbs/src/verbs.c 2006-07-10 18:07:47.000000000 -0700 @@ -0,0 +1,408 @@ +/* + * Copyright (c) 2005 Topspin Communications. All rights reserved. + * Copyright (c) 2006 Cisco Systems, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id: verbs.c 7631 2006-06-02 19:53:25Z swise $ + */ + +#if HAVE_CONFIG_H +# include <config.h> +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include +#include +#include + +#include "rdmaverbs.h" + +int rdma_rate_to_mult(enum rdma_rate rate) +{ + switch (rate) { + case RDMA_RATE_2_5_GBPS: return 1; + case RDMA_RATE_5_GBPS: return 2; + case RDMA_RATE_10_GBPS: return 4; + case RDMA_RATE_20_GBPS: return 8; + case RDMA_RATE_30_GBPS: return 12; + case RDMA_RATE_40_GBPS: return 16; + case RDMA_RATE_60_GBPS: return 24; + case RDMA_RATE_80_GBPS: return 32; + case RDMA_RATE_120_GBPS: return 48; + default: return -1; + } +} + +enum rdma_rate mult_to_rdma_rate(int mult) +{ + switch (mult) { + case 1: return RDMA_RATE_2_5_GBPS; + case 2: return RDMA_RATE_5_GBPS; + case 4: return RDMA_RATE_10_GBPS; + case 8: return RDMA_RATE_20_GBPS; + case 12: return RDMA_RATE_30_GBPS; + case 16: return RDMA_RATE_40_GBPS; + case 24: return RDMA_RATE_60_GBPS; + case 32: return RDMA_RATE_80_GBPS; + case 48: return RDMA_RATE_120_GBPS; + default: return RDMA_RATE_MAX; + } +} + +int rdma_query_device(struct rdma_context *context, + struct rdma_device_attr *device_attr) +{ + return context->ops.query_device(context, device_attr); +} + +int rdma_query_port(struct rdma_context *context, uint8_t port_num, + struct rdma_port_attr *port_attr) +{ + return context->ops.query_port(context, port_num, port_attr); +} + +int rdma_query_gid(struct rdma_context *context, uint8_t port_num, + int index, union rdma_gid *gid) 
+{ + char name[24]; + char attr[41]; + uint16_t val; + int i; + + snprintf(name, sizeof name, "ports/%d/gids/%d", port_num, index); + + if (rdma_read_sysfs_file(context->device->ibdev_path, name, + attr, sizeof attr) < 0) + return -1; + + for (i = 0; i < 8; ++i) { + if (sscanf(attr + i * 5, "%hx", &val) != 1) + return -1; + gid->raw[i * 2 ] = val >> 8; + gid->raw[i * 2 + 1] = val & 0xff; + } + + return 0; +} + +int rdma_query_pkey(struct rdma_context *context, uint8_t port_num, + int index, uint16_t *pkey) +{ + char name[24]; + char attr[8]; + uint16_t val; + + snprintf(name, sizeof name, "ports/%d/pkeys/%d", port_num, index); + + if (rdma_read_sysfs_file(context->device->ibdev_path, name, + attr, sizeof attr) < 0) + return -1; + + if (sscanf(attr, "%hx", &val) != 1) + return -1; + + *pkey = htons(val); + return 0; +} + +struct rdma_pd *rdma_alloc_pd(struct rdma_context *context) +{ + struct rdma_pd *pd; + + pd = context->ops.alloc_pd(context); + if (pd) + pd->context = context; + + return pd; +} + +int rdma_dealloc_pd(struct rdma_pd *pd) +{ + return pd->context->ops.dealloc_pd(pd); +} + +struct rdma_mr *rdma_reg_mr(struct rdma_pd *pd, void *addr, + size_t length, enum rdma_access_flags access) +{ + struct rdma_mr *mr; + + mr = pd->context->ops.reg_mr(pd, addr, length, access); + if (mr) { + mr->context = pd->context; + mr->pd = pd; + } + + return mr; +} + +int rdma_dereg_mr(struct rdma_mr *mr) +{ + return mr->context->ops.dereg_mr(mr); +} + +static struct rdma_comp_channel *rdma_create_comp_channel_v2(struct rdma_context *context) +{ + struct rdma_abi_compat_v2 *t = context->abi_compat; + static int warned; + + if (!pthread_mutex_trylock(&t->in_use)) + return &t->channel; + + if (!warned) { + fprintf(stderr, PFX "Warning: kernel's ABI version %d limits capacity.\n" + " Only one completion channel can be created per context.\n", + abi_ver); + ++warned; + } + + return NULL; +} + +struct rdma_comp_channel *rdma_create_comp_channel(struct rdma_context *context) +{ + 
struct rdma_comp_channel *channel; + struct rdma_create_comp_channel cmd; + struct rdma_create_comp_channel_resp resp; + + if (abi_ver <= 2) + return rdma_create_comp_channel_v2(context); + + channel = malloc(sizeof *channel); + if (!channel) + return NULL; + + RDMA_INIT_CMD_RESP(&cmd, sizeof cmd, CREATE_COMP_CHANNEL, &resp, sizeof resp); + if (write(context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) { + free(channel); + return NULL; + } + + channel->fd = resp.fd; + + return channel; +} + +static int rdma_destroy_comp_channel_v2(struct rdma_comp_channel *channel) +{ + struct rdma_abi_compat_v2 *t = (struct rdma_abi_compat_v2 *) channel; + pthread_mutex_unlock(&t->in_use); + return 0; +} + +int rdma_destroy_comp_channel(struct rdma_comp_channel *channel) +{ + if (abi_ver <= 2) + return rdma_destroy_comp_channel_v2(channel); + + close(channel->fd); + free(channel); + + return 0; +} + +struct rdma_cq *rdma_create_cq(struct rdma_context *context, int cqe, void *cq_context, + struct rdma_comp_channel *channel, int comp_vector) +{ + struct rdma_cq *cq = context->ops.create_cq(context, cqe, channel, + comp_vector); + + if (cq) { + cq->context = context; + cq->cq_context = cq_context; + cq->comp_events_completed = 0; + cq->async_events_completed = 0; + pthread_mutex_init(&cq->mutex, NULL); + pthread_cond_init(&cq->cond, NULL); + } + + return cq; +} + +int rdma_resize_cq(struct rdma_cq *cq, int cqe) +{ + if (!cq->context->ops.resize_cq) + return ENOSYS; + + return cq->context->ops.resize_cq(cq, cqe); +} + +int rdma_destroy_cq(struct rdma_cq *cq) +{ + return cq->context->ops.destroy_cq(cq); +} + + +int rdma_get_cq_event(struct rdma_comp_channel *channel, + struct rdma_cq **cq, void **cq_context) +{ + struct rdma_comp_event ev; + + if (read(channel->fd, &ev, sizeof ev) != sizeof ev) + return -1; + + *cq = (struct rdma_cq *) (uintptr_t) ev.cq_handle; + *cq_context = (*cq)->cq_context; + + if ((*cq)->context->ops.cq_event) + (*cq)->context->ops.cq_event(*cq); + + return 0; +} + 
+void rdma_ack_cq_events(struct rdma_cq *cq, unsigned int nevents) +{ + pthread_mutex_lock(&cq->mutex); + cq->comp_events_completed += nevents; + pthread_cond_signal(&cq->cond); + pthread_mutex_unlock(&cq->mutex); +} + +struct rdma_srq *rdma_create_srq(struct rdma_pd *pd, + struct rdma_srq_init_attr *srq_init_attr) +{ + struct rdma_srq *srq; + + if (!pd->context->ops.create_srq) + return NULL; + + srq = pd->context->ops.create_srq(pd, srq_init_attr); + if (srq) { + srq->context = pd->context; + srq->srq_context = srq_init_attr->srq_context; + srq->pd = pd; + srq->events_completed = 0; + pthread_mutex_init(&srq->mutex, NULL); + pthread_cond_init(&srq->cond, NULL); + } + + return srq; +} + +int rdma_modify_srq(struct rdma_srq *srq, + struct rdma_srq_attr *srq_attr, + enum rdma_srq_attr_mask srq_attr_mask) +{ + return srq->context->ops.modify_srq(srq, srq_attr, srq_attr_mask); +} + +int rdma_query_srq(struct rdma_srq *srq, struct rdma_srq_attr *srq_attr) +{ + return srq->context->ops.query_srq(srq, srq_attr); +} + +int rdma_destroy_srq(struct rdma_srq *srq) +{ + return srq->context->ops.destroy_srq(srq); +} + +struct rdma_qp *rdmav_create_qp(struct rdma_pd *pd, + struct rdma_qp_init_attr *qp_init_attr) +{ + struct rdma_qp *qp = pd->context->ops.create_qp(pd, qp_init_attr); + + if (qp) { + qp->context = pd->context; + qp->qp_context = qp_init_attr->qp_context; + qp->pd = pd; + qp->send_cq = qp_init_attr->send_cq; + qp->recv_cq = qp_init_attr->recv_cq; + qp->srq = qp_init_attr->srq; + qp->qp_type = qp_init_attr->qp_type; + qp->events_completed = 0; + pthread_mutex_init(&qp->mutex, NULL); + pthread_cond_init(&qp->cond, NULL); + } + + return qp; +} + +int rdma_query_qp(struct rdma_qp *qp, struct rdma_qp_attr *attr, + enum rdma_qp_attr_mask attr_mask, + struct rdma_qp_init_attr *init_attr) +{ + int ret; + + ret = qp->context->ops.query_qp(qp, attr, attr_mask, init_attr); + if (ret) + return ret; + + if (attr_mask & RDMA_QP_STATE) + qp->state = attr->qp_state; + + return 0; 
+} + +int rdma_modify_qp(struct rdma_qp *qp, struct rdma_qp_attr *attr, + enum rdma_qp_attr_mask attr_mask) +{ + int ret; + + ret = qp->context->ops.modify_qp(qp, attr, attr_mask); + if (ret) + return ret; + + if (attr_mask & RDMA_QP_STATE) + qp->state = attr->qp_state; + + return 0; +} + +int rdmav_destroy_qp(struct rdma_qp *qp) +{ + return qp->context->ops.destroy_qp(qp); +} + +struct rdma_ah *rdma_create_ah(struct rdma_pd *pd, struct rdma_ah_attr *attr) +{ + struct rdma_ah *ah = pd->context->ops.create_ah(pd, attr); + + if (ah) { + ah->context = pd->context; + ah->pd = pd; + } + + return ah; +} + +int rdma_destroy_ah(struct rdma_ah *ah) +{ + return ah->context->ops.destroy_ah(ah); +} + +int rdma_attach_mcast(struct rdma_qp *qp, union rdma_gid *gid, uint16_t lid) +{ + return qp->context->ops.attach_mcast(qp, gid, lid); +} + +int rdma_detach_mcast(struct rdma_qp *qp, union rdma_gid *gid, uint16_t lid) +{ + return qp->context->ops.detach_mcast(qp, gid, lid); +} From eitan at mellanox.co.il Mon Jul 10 03:54:26 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 10 Jul 2006 13:54:26 +0300 Subject: [openib-general] [Openib-windows] ib_types.h and Win/Linux consolidation Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3027A83C5@mtlexch01.mtl.com> > Can you please resend as inline text, or at the very least a plain text attachment? Enjoy: 2c2,3 < * Copyright (c) 2005 SilverStorm Technologies. All rights reserved. --- > * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. > * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. 5,6c6,10 < * This software is available to you under the OpenIB.org BSD license < * below: --- > * This software is available to you under a choice of one of two > * licenses. 
You may choose to be licensed under the terms of the GNU > * General Public License (GPL) Version 2, available from the file > * COPYING in the main directory of this source tree, or the > * OpenIB.org BSD license below: 30c34 < * $Id: ib_types.h 381 2006-06-08 17:10:14Z ftillier $ --- > * $Id: ib_types.h 8291 2006-06-29 18:53:06Z halr $ 36a41 > #include 40,54d44 < #ifdef CL_KERNEL < #define AL_EXPORT < #define AL_API < #define AL_INLINE static inline < #else < #if defined( EXPORT_AL_SYMBOLS ) < #define AL_EXPORT __declspec(dllexport) < #else < #define AL_EXPORT __declspec(dllimport) < #endif < #define AL_API __stdcall < #define AL_INLINE AL_EXPORT inline < #endif /* CL_KERNEL */ < < 56,57c46,50 < extern "C" < { --- > # define BEGIN_C_DECLS extern "C" { > # define END_C_DECLS } > #else /* !__cplusplus */ > # define BEGIN_C_DECLS > # define END_C_DECLS 59a53 > BEGIN_C_DECLS 222c216 < #define IB_DEFAULT_SUBNET_PREFIX (CL_HTON64(CL_CONST64(0xFE80000000000000))) --- > #define IB_DEFAULT_SUBNET_PREFIX (CL_HTON64(0xFE80000000000000ULL)) 365c359 < #define IB_PKEY_ENTRIES_MAX (IB_PKEY_MAX_BLOCKS * IB_PKEY_BLOCK_SIZE) --- > #define IB_PKEY_ENTRIES_MAX (IB_PKEY_MAX_BLOCKS * IB_NUM_PKEY_ELEMENTS_IN_BLOCK) 377c371 < #define IB_PKEY_BASE_MASK (CL_NTOH16(0x7FFF)) --- > #define IB_PKEY_BASE_MASK (CL_HTON16(0x7FFF)) 391a386,397 > /****d* IBA Base: Constants/IB_DEFAULT_PARTIAL_PKEY > * NAME > * IB_DEFAULT_PARTIAL_PKEY > * > * DESCRIPTION > * 0x7FFF in network order > * > * SOURCE > */ > #define IB_DEFAULT_PARTIAL_PKEY (CL_HTON16(0x7FFF)) > /**********/ > 511a518,541 > /****d* IBA Base: Constants/IB_MCLASS_DEV_ADM > * NAME > * IB_MCLASS_DEV_ADM > * > * DESCRIPTION > * Subnet Management Class, Device Administration > * > * SOURCE > */ > #define IB_MCLASS_DEV_ADM 0x10 > /**********/ > > /****d* IBA Base: Constants/IB_MCLASS_BIS > * NAME > * IB_MCLASS_BIS > * > * DESCRIPTION > * Subnet Management Class, BIS > * > * SOURCE > */ > #define IB_MCLASS_BIS 0x12 > /**********/ > 541c571 < * 
Indicitates if the Class Code if a vendor specific class from --- > * Indicates if the Class Code if a vendor specific class from 573c603 < * Indicitates if the Class Code if a vendor specific class from --- > * Indicates if the Class Code if a vendor specific class from 606c636 < * Indicitates if the Class Code if a vendor specific class --- > * Indicates if the Class Code if a vendor specific class 631a662,693 > /****f* IBA Base: Types/ib_class_is_rmpp > * NAME > * ib_class_is_rmpp > * > * DESCRIPTION > * Indicates if the Class Code supports RMPP > * > * SYNOPSIS > */ > static inline boolean_t > ib_class_is_rmpp( > IN const uint8_t class_code ) > { > return( (class_code == IB_MCLASS_SUBN_ADM) || > (class_code == IB_MCLASS_DEV_MGMT) || > (class_code == IB_MCLASS_DEV_ADM) || > (class_code == IB_MCLASS_BIS) || > ib_class_is_vendor_specific_high( class_code ) ); > } > /* > * PARAMETERS > * class_code > * [in] The Management Datagram Class Code > * > * RETURN VALUE > * TRUE if the class supports RMPP > * FALSE otherwise. 
> * > * NOTES > * > *********/ > 902d963 < /*********/ 1100c1161 < * SmInfoRecord attribute (15.2.5) --- > * SMInfoRecord attribute (15.2.5) 1106a1168,1179 > /****d* IBA Base: Constants/IB_MAD_ATTR_GUIDINFO_RECORD > * NAME > * IB_MAD_ATTR_GUIDINFO_RECORD > * > * DESCRIPTION > * GuidInfoRecord attribute (15.2.5) > * > * SOURCE > */ > #define IB_MAD_ATTR_GUIDINFO_RECORD (CL_NTOH16(0x0030)) > /**********/ > 1117a1191 > 1128a1203 > 1139a1215 > 1145c1221 < * LinearForwardingRecord attribute (15.2.5.6) --- > * LinearForwardingTableRecord attribute (15.2.5.6) 1150a1227 > 1156c1233 < * P-KEY table attribute (15.2.5) --- > * PKEY Table Record attribute (15.2.5) 1161a1239 > 1172a1251 > 1178c1257 < * VL Arbitration Table attribute (15.2.5) --- > * VL Arbitration Table Record attribute (15.2.5) 1183a1263 > 1189c1269 < * VSLtoL Map Table attribute (15.2.5) --- > * SLtoVL Mapping Table Record attribute (15.2.5) 1194a1275 > 1205a1287 > 1216a1299 > 1222c1305 < * MultiPath attribute (15.2.5) --- > * MultiPathRecord attribute (15.2.5) 1227a1311 > 1233c1317 < * Service Association attribute (15.2.5) --- > * Service Association Record attribute (15.2.5) 1238a1323 > 1250c1335,1336 < /****d* IBA Base: Constants/IB_MAD_ATTR_IOC_PROFILE --- > > /****d* IBA Base: Constants/IB_MAD_ATTR_IO_CONTROLLER_PROFILE 1252c1338 < * IB_MAD_ATTR_IOC_PROFILE --- > * IB_MAD_ATTR_IO_CONTROLLER_PROFILE 1259c1345 < #define IB_MAD_ATTR_IOC_PROFILE (CL_NTOH16(0x0011)) --- > #define IB_MAD_ATTR_IO_CONTROLLER_PROFILE (CL_NTOH16(0x0011)) 1260a1347 > 1271a1359 > 1282a1371 > 1293a1383 > 1304a1395 > 1315a1407 > 1327c1419,1420 < /*** IBA Base: Constants/IB_MAD_ATTR_SVC_ASSOCIATION_RECORD --- > > /****d* IBA Base: Constants/IB_MAD_ATTR_SVC_ASSOCIATION_RECORD 1332c1425 < * Service Association attribute (15.2.5) --- > * Service Association Record attribute (15.2.5) 1337a1431 > 1348a1443 > 1359a1455 > 1382a1479 > 1393a1491 > 1404a1503 > 1419c1518 < /****d* IBA Base: Constants/IB_MTU_TYPE --- > /****d* IBA Base: 
Constants/IB_MTU_LEN_TYPE 1421c1520 < * IB_MTU_TYPE --- > * IB_MTU_LEN_TYPE 1434,1438c1533,1541 < #define IB_MTU_256 1 < #define IB_MTU_512 2 < #define IB_MTU_1024 3 < #define IB_MTU_2048 4 < #define IB_MTU_4096 5 --- > #define IB_MTU_LEN_256 1 > #define IB_MTU_LEN_512 2 > #define IB_MTU_LEN_1024 3 > #define IB_MTU_LEN_2048 4 > #define IB_MTU_LEN_4096 5 > > #define IB_MIN_MTU IB_MTU_LEN_256 > #define IB_MAX_MTU IB_MTU_LEN_4096 > 1509c1612 < * Mask for the selector field for path record MTU, rate --- > * Mask for the selector field for path record MTU, rate, 1514a1618,1628 > /****d* IBA Base: Constants/IB_MULTIPATH_REC_SELECTOR_MASK > * NAME > * IB_MULTIPATH_REC_SELECTOR_MASK > * > * DESCRIPTION > * Mask for the selector field for multipath record MTU, rate, > * and packet lifetime. > * > * SOURCE > */ > #define IB_MULTIPATH_REC_SELECTOR_MASK 0xC0 1521c1635 < * Mask for the base value field for path record MTU, rate --- > * Mask for the base value field for path record MTU, rate, 1527a1642,1653 > /****d* IBA Base: Constants/IB_MULTIPATH_REC_BASE_MASK > * NAME > * IB_MULTIPATH_REC_BASE_MASK > * > * DESCRIPTION > * Mask for the base value field for multipath record MTU, rate, > * and packet lifetime. 
> * > * SOURCE > */ > #define IB_MULTIPATH_REC_BASE_MASK 0x3F > /**********/ 1534c1660 < * Definitions are from the InfiniBand Architecture Specification v1.1 --- > * Definitions are from the InfiniBand Architecture Specification v1.2 1616c1742 < AL_INLINE const char* AL_API --- > static inline const char* 1618c1744 < IN uint8_t node_type ) --- > IN uint32_t node_type ) 1620c1746 < if( node_type >= IB_NODE_TYPE_ROUTER ) --- > if( node_type >= IB_NOTICE_NODE_TYPE_ROUTER ) 1658c1784 < AL_INLINE const char* AL_API --- > static inline const char* 1668c1794 < * port_state --- > * node_type 1689c1815 < AL_INLINE const uint8_t AL_API --- > static inline uint8_t 1744c1870 < AL_INLINE ib_net16_t AL_API --- > static inline ib_net16_t 1767c1893 < * Indicitates if the port is a full member of the parition. --- > * Indicates if the port is a full member of the parition. 1771c1897 < AL_INLINE boolean_t AL_API --- > static inline boolean_t 1791a1918,1950 > /****f* IBA Base: Types/ib_pkey_is_invalid > * NAME > * ib_pkey_is_invalid > * > * DESCRIPTION > * Returns TRUE if the given P_Key is an invalid P_Key > * C10-116: the CI shall regard a P_Key as invalid if its low-order > * 15 bits are all zero... > * > * SYNOPSIS > */ > static inline boolean_t > ib_pkey_is_invalid( > IN const ib_net16_t pkey ) > { > if (ib_pkey_get_base(pkey) == 0x0000) > return TRUE; > > return FALSE; > } > /* > * PARAMETERS > * pkey > * [in] P_Key value > * > * RETURN VALUE > * Returns the base P_Key value with the membership bit stripped. 
> * > * NOTES > * > * SEE ALSO > *********/ > 1835c1994 < AL_INLINE boolean_t AL_API --- > static inline boolean_t 1851c2010 < AL_INLINE void AL_API --- > static inline void 1884c2043 < AL_INLINE ib_net64_t AL_API --- > static inline ib_net64_t 1948,1949c2107 < CL_HTON64( CL_CONST64(0xFFFFFFFFFFFF0000) ) ) == < CL_HTON64( CL_CONST64(0xFEC0000000000000) ) ); --- > CL_HTON64( 0xFFFFFFFFFFFF0000ULL ) ) == CL_HTON64( 0xFEC0000000000000ULL ) ); 1974c2132 < AL_INLINE ib_net64_t AL_API --- > static inline ib_net64_t 1994,2021d2151 < /****s* IBA Base: Types/ib_field32_t < * NAME < * ib_field32_t < * < * DESCRIPTION < * Represents a 32-bit field, and allows access as a 32-bit network byte < * ordered or a 4-byte array. < * < * SYNOPSIS < */ < #include < typedef union _ib_field32_t < { < net32_t val; < uint8_t bytes[4]; < < } PACK_SUFFIX ib_field32_t; < #include < /* < * FIELDS < * val < * Full field value. < * < * bytes < * Byte array representing the field. The byte array provides identical < * access independently from CPU byte-ordering. < *********/ < 2033c2163 < typedef __declspec(align(8)) struct _ib_path_rec --- > typedef struct _ib_path_rec 2035c2165 < uint64_t resv0; --- > uint8_t resv0[8]; 2040c2170 < ib_field32_t hop_flow_raw; --- > ib_net32_t hop_flow_raw; 2049,2050c2179 < uint16_t resv1; < uint32_t resv2; --- > uint8_t resv2[6]; 2078c2207,2208 < * In queries, maximum number of paths to return. --- > * Reversible path - 1 bit to say if path is reversible. > * num_path [6:0] In queries, maximum number of paths to return. 2103,2105d2232 < * resv1 < * Reserved bytes. 
< *
2111,2132c2238,2258
< /* Port Record Component Masks */
< #define IB_PR_COMPMASK_LID (CL_HTON64(1<<0))
< #define IB_PR_COMPMASK_DGID (CL_HTON64(1<<2))
< #define IB_PR_COMPMASK_SGID (CL_HTON64(1<<3))
< #define IB_PR_COMPMASK_DLID (CL_HTON64(1<<4))
< #define IB_PR_COMPMASK_SLID (CL_HTON64(1<<5))
< #define IB_PR_COMPMASK_RAWTRAFIC (CL_HTON64(1<<6))
< #define IB_PR_COMPMASK_RESV0 (CL_HTON64(1<<7))
< #define IB_PR_COMPMASK_FLOWLABEL (CL_HTON64(1<<8))
< #define IB_PR_COMPMASK_HOPLIMIT (CL_HTON64(1<<9))
< #define IB_PR_COMPMASK_TCLASS (CL_HTON64(1<<10))
< #define IB_PR_COMPMASK_REVERSIBLE (CL_HTON64(1<<11))
< #define IB_PR_COMPMASK_NUM_PATH (CL_HTON64(1<<12))
< #define IB_PR_COMPMASK_PKEY (CL_HTON64(1<<13))
< #define IB_PR_COMPMASK_RESV1 (CL_HTON64(1<<14))
< #define IB_PR_COMPMASK_SL (CL_HTON64(1<<15))
< #define IB_PR_COMPMASK_MTU_SEL (CL_HTON64(1<<16))
< #define IB_PR_COMPMASK_MTU (CL_HTON64(1<<17))
< #define IB_PR_COMPMASK_RATE_SEL (CL_HTON64(1<<18))
< #define IB_PR_COMPMASK_RATE (CL_HTON64(1<<19))
< #define IB_PR_COMPMASK_PKTLIFETIME_SEL (CL_HTON64(1<<20))
< #define IB_PR_COMPMASK_PFTLIFETIME (CL_HTON64(1<<21))
---
> /* Path Record Component Masks */
> #define IB_PR_COMPMASK_DGID (CL_HTON64(((uint64_t)1)<<2))
> #define IB_PR_COMPMASK_SGID (CL_HTON64(((uint64_t)1)<<3))
> #define IB_PR_COMPMASK_DLID (CL_HTON64(((uint64_t)1)<<4))
> #define IB_PR_COMPMASK_SLID (CL_HTON64(((uint64_t)1)<<5))
> #define IB_PR_COMPMASK_RAWTRAFFIC (CL_HTON64(((uint64_t)1)<<6))
> #define IB_PR_COMPMASK_RESV0 (CL_HTON64(((uint64_t)1)<<7))
> #define IB_PR_COMPMASK_FLOWLABEL (CL_HTON64(((uint64_t)1)<<8))
> #define IB_PR_COMPMASK_HOPLIMIT (CL_HTON64(((uint64_t)1)<<9))
> #define IB_PR_COMPMASK_TCLASS (CL_HTON64(((uint64_t)1)<<10))
> #define IB_PR_COMPMASK_REVERSIBLE (CL_HTON64(((uint64_t)1)<<11))
> #define IB_PR_COMPMASK_NUMBPATH (CL_HTON64(((uint64_t)1)<<12))
> #define IB_PR_COMPMASK_PKEY (CL_HTON64(((uint64_t)1)<<13))
> #define IB_PR_COMPMASK_RESV1 (CL_HTON64(((uint64_t)1)<<14))
> #define IB_PR_COMPMASK_SL (CL_HTON64(((uint64_t)1)<<15))
> #define IB_PR_COMPMASK_MTUSELEC (CL_HTON64(((uint64_t)1)<<16))
> #define IB_PR_COMPMASK_MTU (CL_HTON64(((uint64_t)1)<<17))
> #define IB_PR_COMPMASK_RATESELEC (CL_HTON64(((uint64_t)1)<<18))
> #define IB_PR_COMPMASK_RATE (CL_HTON64(((uint64_t)1)<<19))
> #define IB_PR_COMPMASK_PKTLIFETIMESELEC (CL_HTON64(((uint64_t)1)<<20))
> #define IB_PR_COMPMASK_PKTLIFETIME (CL_HTON64(((uint64_t)1)<<21))
2135,2138c2261,2264
< #define IB_LR_COMPMASK_FROM_LID (CL_HTON64(1<<0))
< #define IB_LR_COMPMASK_FROM_PORT (CL_HTON64(1<<1))
< #define IB_LR_COMPMASK_TO_PORT (CL_HTON64(1<<2))
< #define IB_LR_COMPMASK_TO_LID (CL_HTON64(1<<3))
---
> #define IB_LR_COMPMASK_FROM_LID (CL_HTON64(((uint64_t)1)<<0))
> #define IB_LR_COMPMASK_FROM_PORT (CL_HTON64(((uint64_t)1)<<1))
> #define IB_LR_COMPMASK_TO_PORT (CL_HTON64(((uint64_t)1)<<2))
> #define IB_LR_COMPMASK_TO_LID (CL_HTON64(((uint64_t)1)<<3))
2141,2143c2267,2270
< #define IB_VLA_COMPMASK_LID (CL_HTON64(1<<0))
< #define IB_VLA_COMPMASK_OUT_PORT (CL_HTON64(1<<1))
< #define IB_VLA_COMPMASK_BLOCK (CL_HTON64(1<<2))
---
> #define IB_VLA_COMPMASK_LID (CL_HTON64(((uint64_t)1)<<0))
> #define IB_VLA_COMPMASK_OUT_PORT (CL_HTON64(((uint64_t)1)<<1))
> #define IB_VLA_COMPMASK_BLOCK (CL_HTON64(((uint64_t)1)<<2))
>
2145,2166c2272,2301
< #define IB_SLVL_COMPMASK_LID (CL_HTON64(1<<0))
< #define IB_SLVL_COMPMASK_IN_PORT (CL_HTON64(1<<1))
< #define IB_SLVL_COMPMASK_OUT_PORT (CL_HTON64(1<<2))
< /* LFT Record MASKS */
< #define IB_LFTR_COMPMASK_LID (CL_HTON64(1<<0))
< #define IB_LFTR_COMPMASK_BLOCK (CL_HTON64(1<<1))
< /* ModeInfo Record MASKS */
< #define IB_NR_COMPMASK_LID (CL_HTON64(1<<0))
< #define IB_NR_COMPMASK_RESERVED1 (CL_HTON64(1<<1))
< #define IB_NR_COMPMASK_BASEVERSION (CL_HTON64(1<<2))
< #define IB_NR_COMPMASK_CLASSVERSION (CL_HTON64(1<<3))
< #define IB_NR_COMPMASK_NODETYPE (CL_HTON64(1<<4))
< #define IB_NR_COMPMASK_NUMPORTS (CL_HTON64(1<<5))
< #define IB_NR_COMPMASK_SYSIMAGEGUID (CL_HTON64(1<<6))
< #define IB_NR_COMPMASK_NODEGUID (CL_HTON64(1<<7))
< #define IB_NR_COMPMASK_PORTGUID (CL_HTON64(1<<8))
< #define IB_NR_COMPMASK_PARTCAP (CL_HTON64(1<<9))
< #define IB_NR_COMPMASK_DEVID (CL_HTON64(1<<10))
< #define IB_NR_COMPMASK_REV (CL_HTON64(1<<11))
< #define IB_NR_COMPMASK_PORTNUM (CL_HTON64(1<<12))
< #define IB_NR_COMPMASK_VENDID (CL_HTON64(1<<13))
< #define IB_NR_COMPMASK_NODEDESC (CL_HTON64(1<<14))
---
> #define IB_SLVL_COMPMASK_LID (CL_HTON64(((uint64_t)1)<<0))
> #define IB_SLVL_COMPMASK_IN_PORT (CL_HTON64(((uint64_t)1)<<1))
> #define IB_SLVL_COMPMASK_OUT_PORT (CL_HTON64(((uint64_t)1)<<2))
>
> /* P_Key Table Record Masks */
> #define IB_PKEY_COMPMASK_LID (CL_HTON64(((uint64_t)1)<<0))
> #define IB_PKEY_COMPMASK_BLOCK (CL_HTON64(((uint64_t)1)<<1))
> #define IB_PKEY_COMPMASK_PORT (CL_HTON64(((uint64_t)1)<<2))
>
> /* LFT Record Masks */
> #define IB_LFTR_COMPMASK_LID (CL_HTON64(((uint64_t)1)<<0))
> #define IB_LFTR_COMPMASK_BLOCK (CL_HTON64(((uint64_t)1)<<1))
>
> /* NodeInfo Record Masks */
> #define IB_NR_COMPMASK_LID (CL_HTON64(((uint64_t)1)<<0))
> #define IB_NR_COMPMASK_RESERVED1 (CL_HTON64(((uint64_t)1)<<1))
> #define IB_NR_COMPMASK_BASEVERSION (CL_HTON64(((uint64_t)1)<<2))
> #define IB_NR_COMPMASK_CLASSVERSION (CL_HTON64(((uint64_t)1)<<3))
> #define IB_NR_COMPMASK_NODETYPE (CL_HTON64(((uint64_t)1)<<4))
> #define IB_NR_COMPMASK_NUMPORTS (CL_HTON64(((uint64_t)1)<<5))
> #define IB_NR_COMPMASK_SYSIMAGEGUID (CL_HTON64(((uint64_t)1)<<6))
> #define IB_NR_COMPMASK_NODEGUID (CL_HTON64(((uint64_t)1)<<7))
> #define IB_NR_COMPMASK_PORTGUID (CL_HTON64(((uint64_t)1)<<8))
> #define IB_NR_COMPMASK_PARTCAP (CL_HTON64(((uint64_t)1)<<9))
> #define IB_NR_COMPMASK_DEVID (CL_HTON64(((uint64_t)1)<<10))
> #define IB_NR_COMPMASK_REV (CL_HTON64(((uint64_t)1)<<11))
> #define IB_NR_COMPMASK_PORTNUM (CL_HTON64(((uint64_t)1)<<12))
> #define IB_NR_COMPMASK_VENDID (CL_HTON64(((uint64_t)1)<<13))
> #define IB_NR_COMPMASK_NODEDESC (CL_HTON64(((uint64_t)1)<<14))
>
2168,2199c2303,2334
< #define IB_SR_COMPMASK_SID (CL_HTON64(1<<0))
< #define IB_SR_COMPMASK_SGID (CL_HTON64(1<<1))
< #define IB_SR_COMPMASK_SPKEY (CL_HTON64(1<<2))
< #define IB_SR_COMPMASK_RES1 (CL_HTON64(1<<3))
< #define IB_SR_COMPMASK_SLEASE (CL_HTON64(1<<4))
< #define IB_SR_COMPMASK_SKEY (CL_HTON64(1<<5))
< #define IB_SR_COMPMASK_SNAME (CL_HTON64(1<<6))
< #define IB_SR_COMPMASK_SDATA8_0 (CL_HTON64(1<<7))
< #define IB_SR_COMPMASK_SDATA8_1 (CL_HTON64(1<<8))
< #define IB_SR_COMPMASK_SDATA8_2 (CL_HTON64(1<<9))
< #define IB_SR_COMPMASK_SDATA8_3 (CL_HTON64(1<<10))
< #define IB_SR_COMPMASK_SDATA8_4 (CL_HTON64(1<<11))
< #define IB_SR_COMPMASK_SDATA8_5 (CL_HTON64(1<<12))
< #define IB_SR_COMPMASK_SDATA8_6 (CL_HTON64(1<<13))
< #define IB_SR_COMPMASK_SDATA8_7 (CL_HTON64(1<<14))
< #define IB_SR_COMPMASK_SDATA8_8 (CL_HTON64(1<<15))
< #define IB_SR_COMPMASK_SDATA8_9 (CL_HTON64(1<<16))
< #define IB_SR_COMPMASK_SDATA8_10 (CL_HTON64(1<<17))
< #define IB_SR_COMPMASK_SDATA8_11 (CL_HTON64(1<<18))
< #define IB_SR_COMPMASK_SDATA8_12 (CL_HTON64(1<<19))
< #define IB_SR_COMPMASK_SDATA8_13 (CL_HTON64(1<<20))
< #define IB_SR_COMPMASK_SDATA8_14 (CL_HTON64(1<<21))
< #define IB_SR_COMPMASK_SDATA8_15 (CL_HTON64(1<<22))
< #define IB_SR_COMPMASK_SDATA16_0 (CL_HTON64(1<<23))
< #define IB_SR_COMPMASK_SDATA16_1 (CL_HTON64(1<<24))
< #define IB_SR_COMPMASK_SDATA16_2 (CL_HTON64(1<<25))
< #define IB_SR_COMPMASK_SDATA16_3 (CL_HTON64(1<<26))
< #define IB_SR_COMPMASK_SDATA16_4 (CL_HTON64(1<<27))
< #define IB_SR_COMPMASK_SDATA16_5 (CL_HTON64(1<<28))
< #define IB_SR_COMPMASK_SDATA16_6 (CL_HTON64(1<<29))
< #define IB_SR_COMPMASK_SDATA16_7 (CL_HTON64(1<<30))
< #define IB_SR_COMPMASK_SDATA32_0 (CL_HTON64(1<<31))
---
> #define IB_SR_COMPMASK_SID (CL_HTON64(((uint64_t)1)<<0))
> #define IB_SR_COMPMASK_SGID (CL_HTON64(((uint64_t)1)<<1))
> #define IB_SR_COMPMASK_SPKEY (CL_HTON64(((uint64_t)1)<<2))
> #define IB_SR_COMPMASK_RES1 (CL_HTON64(((uint64_t)1)<<3))
> #define IB_SR_COMPMASK_SLEASE (CL_HTON64(((uint64_t)1)<<4))
> #define IB_SR_COMPMASK_SKEY (CL_HTON64(((uint64_t)1)<<5))
> #define IB_SR_COMPMASK_SNAME (CL_HTON64(((uint64_t)1)<<6))
> #define IB_SR_COMPMASK_SDATA8_0 (CL_HTON64(((uint64_t)1)<<7))
> #define IB_SR_COMPMASK_SDATA8_1 (CL_HTON64(((uint64_t)1)<<8))
> #define IB_SR_COMPMASK_SDATA8_2 (CL_HTON64(((uint64_t)1)<<9))
> #define IB_SR_COMPMASK_SDATA8_3 (CL_HTON64(((uint64_t)1)<<10))
> #define IB_SR_COMPMASK_SDATA8_4 (CL_HTON64(((uint64_t)1)<<11))
> #define IB_SR_COMPMASK_SDATA8_5 (CL_HTON64(((uint64_t)1)<<12))
> #define IB_SR_COMPMASK_SDATA8_6 (CL_HTON64(((uint64_t)1)<<13))
> #define IB_SR_COMPMASK_SDATA8_7 (CL_HTON64(((uint64_t)1)<<14))
> #define IB_SR_COMPMASK_SDATA8_8 (CL_HTON64(((uint64_t)1)<<15))
> #define IB_SR_COMPMASK_SDATA8_9 (CL_HTON64(((uint64_t)1)<<16))
> #define IB_SR_COMPMASK_SDATA8_10 (CL_HTON64(((uint64_t)1)<<17))
> #define IB_SR_COMPMASK_SDATA8_11 (CL_HTON64(((uint64_t)1)<<18))
> #define IB_SR_COMPMASK_SDATA8_12 (CL_HTON64(((uint64_t)1)<<19))
> #define IB_SR_COMPMASK_SDATA8_13 (CL_HTON64(((uint64_t)1)<<20))
> #define IB_SR_COMPMASK_SDATA8_14 (CL_HTON64(((uint64_t)1)<<21))
> #define IB_SR_COMPMASK_SDATA8_15 (CL_HTON64(((uint64_t)1)<<22))
> #define IB_SR_COMPMASK_SDATA16_0 (CL_HTON64(((uint64_t)1)<<23))
> #define IB_SR_COMPMASK_SDATA16_1 (CL_HTON64(((uint64_t)1)<<24))
> #define IB_SR_COMPMASK_SDATA16_2 (CL_HTON64(((uint64_t)1)<<25))
> #define IB_SR_COMPMASK_SDATA16_3 (CL_HTON64(((uint64_t)1)<<26))
> #define IB_SR_COMPMASK_SDATA16_4 (CL_HTON64(((uint64_t)1)<<27))
> #define IB_SR_COMPMASK_SDATA16_5 (CL_HTON64(((uint64_t)1)<<28))
> #define IB_SR_COMPMASK_SDATA16_6 (CL_HTON64(((uint64_t)1)<<29))
> #define IB_SR_COMPMASK_SDATA16_7 (CL_HTON64(((uint64_t)1)<<30))
> #define IB_SR_COMPMASK_SDATA32_0 (CL_HTON64(((uint64_t)1)<<31))
2204a2340
>
2206,2253c2342,2391
< #define IB_PIR_COMPMASK_LID (CL_HTON64(1<<0))
< #define IB_PIR_COMPMASK_PORTNUM (CL_HTON64(1<<1))
< #define IB_PIR_COMPMASK_RESV1 (CL_HTON64(1<<2))
< #define IB_PIR_COMPMASK_MKEY (CL_HTON64(1<<3))
< #define IB_PIR_COMPMASK_GIDPRE (CL_HTON64(1<<4))
< #define IB_PIR_COMPMASK_BASELID (CL_HTON64(1<<5))
< #define IB_PIR_COMPMASK_SMLID (CL_HTON64(1<<6))
< #define IB_PIR_COMPMASK_CAPMASK (CL_HTON64(1<<7))
< #define IB_PIR_COMPMASK_DIAGCODE (CL_HTON64(1<<8))
< #define IB_PIR_COMPMASK_MKEYLEASEPRD (CL_HTON64(1<<9))
< #define IB_PIR_COMPMASK_LOCALPORTNUM (CL_HTON64(1<<10))
< #define IB_PIR_COMPMASK_LNKWIDTHSUPPORT (CL_HTON64(1<<11))
< #define IB_PIR_COMPMASK_LNKWIDTHACTIVE (CL_HTON64(1<<12))
< #define IB_PIR_COMPMASK_LINKWIDTHENABLED (CL_HTON64(1<<13))
< #define IB_PIR_COMPMASK_LNKSPEEDSUPPORT (CL_HTON64(1<<14))
< #define IB_PIR_COMPMASK_PORTSTATE (CL_HTON64(1<<15))
< #define IB_PIR_COMPMASK_PORTPHYSTATE (CL_HTON64(1<<16))
< #define IB_PIR_COMPMASK_LINKDWNDFLTSTATE (CL_HTON64(1<<17))
< #define IB_PIR_COMPMASK_MKEYPROTBITS (CL_HTON64(1<<18))
< #define IB_PIR_COMPMASK_LMC (CL_HTON64(1<<19))
< #define IB_PIR_COMPMASK_LINKSPEEDACTIVE (CL_HTON64(1<<20))
< #define IB_PIR_COMPMASK_LINKSPEEDENABLE (CL_HTON64(1<<21))
< #define IB_PIR_COMPMASK_NEIGHBORMTU (CL_HTON64(1<<22))
< #define IB_PIR_COMPMASK_MASTERSMSL (CL_HTON64(1<<23))
< #define IB_PIR_COMPMASK_VLCAP (CL_HTON64(1<<24))
< #define IB_PIR_COMPMASK_INITTYPE (CL_HTON64(1<<25))
< #define IB_PIR_COMPMASK_VLHIGHLIMIT (CL_HTON64(1<<26))
< #define IB_PIR_COMPMASK_VLARBHIGHCAP (CL_HTON64(1<<27))
< #define IB_PIR_COMPMASK_VLARBLOWCAP (CL_HTON64(1<<28))
< #define IB_PIR_COMPMASK_INITTYPEREPLY (CL_HTON64(1<<29))
< #define IB_PIR_COMPMASK_MTUCAP (CL_HTON64(1<<30))
< #define IB_PIR_COMPMASK_VLSTALLCNT (CL_HTON64(1<<31))
< #define IB_PIR_COMPMASK_HOQLIFE (CL_HTON64(((uint64_t)1)<<32))
< #define IB_PIR_COMPMASK_OPVLS (CL_HTON64(((uint64_t)1)<<33))
< #define IB_PIR_COMPMASK_PARENFIN (CL_HTON64(((uint64_t)1)<<34))
< #define IB_PIR_COMPMASK_PARENFOUT (CL_HTON64(((uint64_t)1)<<35))
< #define IB_PIR_COMPMASK_FILTERRAWIN (CL_HTON64(((uint64_t)1)<<36))
< #define IB_PIR_COMPMASK_FILTERRAWOUT (CL_HTON64(((uint64_t)1)<<37))
< #define IB_PIR_COMPMASK_MKEYVIO (CL_HTON64(((uint64_t)1)<<38))
< #define IB_PIR_COMPMASK_PKEYVIO (CL_HTON64(((uint64_t)1)<<39))
< #define IB_PIR_COMPMASK_QKEYVIO (CL_HTON64(((uint64_t)1)<<40))
< #define IB_PIR_COMPMASK_GUIDCAP (CL_HTON64(((uint64_t)1)<<41))
< #define IB_PIR_COMPMASK_RESV2 (CL_HTON64(((uint64_t)1)<<42))
< #define IB_PIR_COMPMASK_SUBNTO (CL_HTON64(((uint64_t)1)<<43))
< #define IB_PIR_COMPMASK_RESV3 (CL_HTON64(((uint64_t)1)<<44))
< #define IB_PIR_COMPMASK_RESPTIME (CL_HTON64(((uint64_t)1)<<45))
< #define IB_PIR_COMPMASK_LOCALPHYERR (CL_HTON64(((uint64_t)1)<<46))
< #define IB_PIR_COMPMASK_OVERRUNERR (CL_HTON64(((uint64_t)1)<<47))
---
> #define IB_PIR_COMPMASK_LID (CL_HTON64(((uint64_t)1)<<0))
> #define IB_PIR_COMPMASK_PORTNUM (CL_HTON64(((uint64_t)1)<<1))
> #define IB_PIR_COMPMASK_RESV1 (CL_HTON64(((uint64_t)1)<<2))
> #define IB_PIR_COMPMASK_MKEY (CL_HTON64(((uint64_t)1)<<3))
> #define IB_PIR_COMPMASK_GIDPRE (CL_HTON64(((uint64_t)1)<<4))
> #define IB_PIR_COMPMASK_BASELID (CL_HTON64(((uint64_t)1)<<5))
> #define IB_PIR_COMPMASK_SMLID (CL_HTON64(((uint64_t)1)<<6))
> #define IB_PIR_COMPMASK_CAPMASK (CL_HTON64(((uint64_t)1)<<7))
> #define IB_PIR_COMPMASK_DIAGCODE (CL_HTON64(((uint64_t)1)<<8))
> #define IB_PIR_COMPMASK_MKEYLEASEPRD (CL_HTON64(((uint64_t)1)<<9))
> #define IB_PIR_COMPMASK_LOCALPORTNUM (CL_HTON64(((uint64_t)1)<<10))
> #define IB_PIR_COMPMASK_LINKWIDTHENABLED (CL_HTON64(((uint64_t)1)<<11))
> #define IB_PIR_COMPMASK_LNKWIDTHSUPPORT (CL_HTON64(((uint64_t)1)<<12))
> #define IB_PIR_COMPMASK_LNKWIDTHACTIVE (CL_HTON64(((uint64_t)1)<<13))
> #define IB_PIR_COMPMASK_LNKSPEEDSUPPORT (CL_HTON64(((uint64_t)1)<<14))
> #define IB_PIR_COMPMASK_PORTSTATE (CL_HTON64(((uint64_t)1)<<15))
> #define IB_PIR_COMPMASK_PORTPHYSTATE (CL_HTON64(((uint64_t)1)<<16))
> #define IB_PIR_COMPMASK_LINKDWNDFLTSTATE (CL_HTON64(((uint64_t)1)<<17))
> #define IB_PIR_COMPMASK_MKEYPROTBITS (CL_HTON64(((uint64_t)1)<<18))
> #define IB_PIR_COMPMASK_RESV2 (CL_HTON64(((uint64_t)1)<<19))
> #define IB_PIR_COMPMASK_LMC (CL_HTON64(((uint64_t)1)<<20))
> #define IB_PIR_COMPMASK_LINKSPEEDACTIVE (CL_HTON64(((uint64_t)1)<<21))
> #define IB_PIR_COMPMASK_LINKSPEEDENABLE (CL_HTON64(((uint64_t)1)<<22))
> #define IB_PIR_COMPMASK_NEIGHBORMTU (CL_HTON64(((uint64_t)1)<<23))
> #define IB_PIR_COMPMASK_MASTERSMSL (CL_HTON64(((uint64_t)1)<<24))
> #define IB_PIR_COMPMASK_VLCAP (CL_HTON64(((uint64_t)1)<<25))
> #define IB_PIR_COMPMASK_INITTYPE (CL_HTON64(((uint64_t)1)<<26))
> #define IB_PIR_COMPMASK_VLHIGHLIMIT (CL_HTON64(((uint64_t)1)<<27))
> #define IB_PIR_COMPMASK_VLARBHIGHCAP (CL_HTON64(((uint64_t)1)<<28))
> #define IB_PIR_COMPMASK_VLARBLOWCAP (CL_HTON64(((uint64_t)1)<<29))
> #define IB_PIR_COMPMASK_INITTYPEREPLY (CL_HTON64(((uint64_t)1)<<30))
> #define IB_PIR_COMPMASK_MTUCAP (CL_HTON64(((uint64_t)1)<<31))
> #define IB_PIR_COMPMASK_VLSTALLCNT (CL_HTON64(((uint64_t)1)<<32))
> #define IB_PIR_COMPMASK_HOQLIFE (CL_HTON64(((uint64_t)1)<<33))
> #define IB_PIR_COMPMASK_OPVLS (CL_HTON64(((uint64_t)1)<<34))
> #define IB_PIR_COMPMASK_PARENFIN (CL_HTON64(((uint64_t)1)<<35))
> #define IB_PIR_COMPMASK_PARENFOUT (CL_HTON64(((uint64_t)1)<<36))
> #define IB_PIR_COMPMASK_FILTERRAWIN (CL_HTON64(((uint64_t)1)<<37))
> #define IB_PIR_COMPMASK_FILTERRAWOUT (CL_HTON64(((uint64_t)1)<<38))
> #define IB_PIR_COMPMASK_MKEYVIO (CL_HTON64(((uint64_t)1)<<39))
> #define IB_PIR_COMPMASK_PKEYVIO (CL_HTON64(((uint64_t)1)<<40))
> #define IB_PIR_COMPMASK_QKEYVIO (CL_HTON64(((uint64_t)1)<<41))
> #define IB_PIR_COMPMASK_GUIDCAP (CL_HTON64(((uint64_t)1)<<42))
> #define IB_PIR_COMPMASK_RESV3 (CL_HTON64(((uint64_t)1)<<43))
> #define IB_PIR_COMPMASK_SUBNTO (CL_HTON64(((uint64_t)1)<<44))
> #define IB_PIR_COMPMASK_RESV4 (CL_HTON64(((uint64_t)1)<<45))
> #define IB_PIR_COMPMASK_RESPTIME (CL_HTON64(((uint64_t)1)<<46))
> #define IB_PIR_COMPMASK_LOCALPHYERR (CL_HTON64(((uint64_t)1)<<47))
> #define IB_PIR_COMPMASK_OVERRUNERR (CL_HTON64(((uint64_t)1)<<48))
>
2255,2273c2393,2449
< #define IB_MCR_COMPMASK_GID (CL_HTON64(1<<0))
< #define IB_MCR_COMPMASK_MGID (CL_HTON64(1<<0))
< #define IB_MCR_COMPMASK_PORT_GID (CL_HTON64(1<<1))
< #define IB_MCR_COMPMASK_QKEY (CL_HTON64(1<<2))
< #define IB_MCR_COMPMASK_MLID (CL_HTON64(1<<3))
< #define IB_MCR_COMPMASK_MTU_SEL (CL_HTON64(1<<4))
< #define IB_MCR_COMPMASK_MTU (CL_HTON64(1<<5))
< #define IB_MCR_COMPMASK_TCLASS (CL_HTON64(1<<6))
< #define IB_MCR_COMPMASK_PKEY (CL_HTON64(1<<7))
< #define IB_MCR_COMPMASK_RATE_SEL (CL_HTON64(1<<8))
< #define IB_MCR_COMPMASK_RATE (CL_HTON64(1<<9))
< #define IB_MCR_COMPMASK_LIFE_SEL (CL_HTON64(1<<10))
< #define IB_MCR_COMPMASK_LIFE (CL_HTON64(1<<11))
< #define IB_MCR_COMPMASK_SL (CL_HTON64(1<<12))
< #define IB_MCR_COMPMASK_FLOW (CL_HTON64(1<<13))
< #define IB_MCR_COMPMASK_HOP (CL_HTON64(1<<14))
< #define IB_MCR_COMPMASK_SCOPE (CL_HTON64(1<<15))
< #define IB_MCR_COMPMASK_JOIN_STATE (CL_HTON64(1<<16))
< #define IB_MCR_COMPMASK_PROXY (CL_HTON64(1<<17))
---
> #define IB_MCR_COMPMASK_GID (CL_HTON64(((uint64_t)1)<<0))
> #define IB_MCR_COMPMASK_MGID (CL_HTON64(((uint64_t)1)<<0))
> #define IB_MCR_COMPMASK_PORT_GID (CL_HTON64(((uint64_t)1)<<1))
> #define IB_MCR_COMPMASK_QKEY (CL_HTON64(((uint64_t)1)<<2))
> #define IB_MCR_COMPMASK_MLID (CL_HTON64(((uint64_t)1)<<3))
> #define IB_MCR_COMPMASK_MTU_SEL (CL_HTON64(((uint64_t)1)<<4))
> #define IB_MCR_COMPMASK_MTU (CL_HTON64(((uint64_t)1)<<5))
> #define IB_MCR_COMPMASK_TCLASS (CL_HTON64(((uint64_t)1)<<6))
> #define IB_MCR_COMPMASK_PKEY (CL_HTON64(((uint64_t)1)<<7))
> #define IB_MCR_COMPMASK_RATE_SEL (CL_HTON64(((uint64_t)1)<<8))
> #define IB_MCR_COMPMASK_RATE (CL_HTON64(((uint64_t)1)<<9))
> #define IB_MCR_COMPMASK_LIFE_SEL (CL_HTON64(((uint64_t)1)<<10))
> #define IB_MCR_COMPMASK_LIFE (CL_HTON64(((uint64_t)1)<<11))
> #define IB_MCR_COMPMASK_SL (CL_HTON64(((uint64_t)1)<<12))
> #define IB_MCR_COMPMASK_FLOW (CL_HTON64(((uint64_t)1)<<13))
> #define IB_MCR_COMPMASK_HOP (CL_HTON64(((uint64_t)1)<<14))
> #define IB_MCR_COMPMASK_SCOPE (CL_HTON64(((uint64_t)1)<<15))
> #define IB_MCR_COMPMASK_JOIN_STATE (CL_HTON64(((uint64_t)1)<<16))
> #define IB_MCR_COMPMASK_PROXY (CL_HTON64(((uint64_t)1)<<17))
>
> /* GUID Info Record Component Masks */
> #define IB_GIR_COMPMASK_LID (CL_HTON64(((uint64_t)1)<<0))
> #define IB_GIR_COMPMASK_BLOCKNUM (CL_HTON64(((uint64_t)1)<<1))
> #define IB_GIR_COMPMASK_RESV1 (CL_HTON64(((uint64_t)1)<<2))
> #define IB_GIR_COMPMASK_RESV2 (CL_HTON64(((uint64_t)1)<<3))
> #define IB_GIR_COMPMASK_GID0 (CL_HTON64(((uint64_t)1)<<4))
> #define IB_GIR_COMPMASK_GID1 (CL_HTON64(((uint64_t)1)<<5))
> #define IB_GIR_COMPMASK_GID2 (CL_HTON64(((uint64_t)1)<<6))
> #define IB_GIR_COMPMASK_GID3 (CL_HTON64(((uint64_t)1)<<7))
> #define IB_GIR_COMPMASK_GID4 (CL_HTON64(((uint64_t)1)<<8))
> #define IB_GIR_COMPMASK_GID5 (CL_HTON64(((uint64_t)1)<<9))
> #define IB_GIR_COMPMASK_GID6 (CL_HTON64(((uint64_t)1)<<10))
> #define IB_GIR_COMPMASK_GID7 (CL_HTON64(((uint64_t)1)<<11))
>
> /* MultiPath Record Component Masks */
> #define IB_MPR_COMPMASK_RAWTRAFFIC (CL_HTON64(((uint64_t)1)<<0))
> #define IB_MPR_COMPMASK_RESV0 (CL_HTON64(((uint64_t)1)<<1))
> #define IB_MPR_COMPMASK_FLOWLABEL (CL_HTON64(((uint64_t)1)<<2))
> #define IB_MPR_COMPMASK_HOPLIMIT (CL_HTON64(((uint64_t)1)<<3))
> #define IB_MPR_COMPMASK_TCLASS (CL_HTON64(((uint64_t)1)<<4))
> #define IB_MPR_COMPMASK_REVERSIBLE (CL_HTON64(((uint64_t)1)<<5))
> #define IB_MPR_COMPMASK_NUMBPATH (CL_HTON64(((uint64_t)1)<<6))
> #define IB_MPR_COMPMASK_PKEY (CL_HTON64(((uint64_t)1)<<7))
> #define IB_MPR_COMPMASK_RESV1 (CL_HTON64(((uint64_t)1)<<8))
> #define IB_MPR_COMPMASK_SL (CL_HTON64(((uint64_t)1)<<9))
> #define IB_MPR_COMPMASK_MTUSELEC (CL_HTON64(((uint64_t)1)<<10))
> #define IB_MPR_COMPMASK_MTU (CL_HTON64(((uint64_t)1)<<11))
> #define IB_MPR_COMPMASK_RATESELEC (CL_HTON64(((uint64_t)1)<<12))
> #define IB_MPR_COMPMASK_RATE (CL_HTON64(((uint64_t)1)<<13))
> #define IB_MPR_COMPMASK_PKTLIFETIMESELEC (CL_HTON64(((uint64_t)1)<<14))
> #define IB_MPR_COMPMASK_PKTLIFETIME (CL_HTON64(((uint64_t)1)<<15))
> #define IB_MPR_COMPMASK_RESV2 (CL_HTON64(((uint64_t)1)<<16))
> #define IB_MPR_COMPMASK_INDEPSELEC (CL_HTON64(((uint64_t)1)<<17))
> #define IB_MPR_COMPMASK_RESV3 (CL_HTON64(((uint64_t)1)<<18))
> #define IB_MPR_COMPMASK_SGIDCOUNT (CL_HTON64(((uint64_t)1)<<19))
> #define IB_MPR_COMPMASK_DGIDCOUNT (CL_HTON64(((uint64_t)1)<<20))
> #define IB_MPR_COMPMASK_RESV4 (CL_HTON64(((uint64_t)1)<<21))
2284c2460
< AL_INLINE void AL_API
---
> static inline void
2287,2300c2463,2476
< IN const ib_gid_t* const p_dgid,
< IN const ib_gid_t* const p_sgid,
< IN const ib_net16_t dlid,
< IN const ib_net16_t slid,
< IN const uint8_t num_path,
< IN const ib_net16_t pkey,
< IN const uint8_t sl,
< IN const uint8_t mtu_selector,
< IN const uint8_t mtu,
< IN const uint8_t rate_selector,
< IN const uint8_t rate,
< IN const uint8_t pkt_life_selector,
< IN const uint8_t pkt_life,
< IN const uint8_t preference )
---
> IN ib_gid_t* const p_dgid,
> IN ib_gid_t* const p_sgid,
> IN ib_net16_t dlid,
> IN ib_net16_t slid,
> IN uint8_t num_path,
> IN ib_net16_t pkey,
> IN uint8_t sl,
> IN uint8_t mtu_selector,
> IN uint8_t mtu,
> IN uint8_t rate_selector,
> IN uint8_t rate,
> IN uint8_t pkt_life_selector,
> IN uint8_t pkt_life,
> IN uint8_t preference )
2319c2495
< p_rec->hop_flow_raw.val = 0;
---
> p_rec->hop_flow_raw = 0;
2322,2324c2498,2500
< p_rec->resv0 = 0;
< p_rec->resv1 = 0;
< p_rec->resv2 = 0;
---
> *((uint64_t*)p_rec->resv0) = 0;
> *((uint32_t*)p_rec->resv2) = 0;
> *((uint16_t*)p_rec->resv2 + 2) = 0;
2344c2520,2521
< * [in] In queries, maximum number of paths to return.
---
> * [in] Reversible path - 1 bit to say if path is reversible.
> * num_path [6:0] In queries, maximum number of paths to return.
2383a2561,2589
> /****f* IBA Base: Types/ib_path_rec_num_path
> * NAME
> * ib_path_rec_num_path
> *
> * DESCRIPTION
> * Get max number of paths to return.
> *
> * SYNOPSIS
> */
> static inline uint8_t
> ib_path_rec_num_path(
> IN const ib_path_rec_t* const p_rec )
> {
> return( p_rec->num_path &0x7F );
> }
> /*
> * PARAMETERS
> * p_rec
> * [in] Pointer to the path record object.
> *
> * RETURN VALUES
> * Maximum number of paths to return for each unique SGID_DGID combination.
> *
> * NOTES
> *
> * SEE ALSO
> * ib_path_rec_t
> *********/
>
2393c2599
< AL_INLINE uint8_t AL_API
---
> static inline uint8_t
2405,2411c2611
< * Encoded path MTU.
< * 1: 256
< * 2: 512
< * 3: 1024
< * 4: 2048
< * 5: 4096
< * others: reserved
---
> * SL.
2428c2628
< AL_INLINE uint8_t AL_API
---
> static inline uint8_t
2463c2663
< AL_INLINE uint8_t AL_API
---
> static inline uint8_t
2496c2696
< AL_INLINE uint8_t AL_API
---
> static inline uint8_t
2511a2712,2717
> * 5: 5 Gb/sec.
> * 6: 20 Gb/sec.
> * 7: 40 Gb/sec.
> * 8: 60 Gb/sec.
> * 9: 80 Gb/sec.
> * 10: 120 Gb/sec.
2529c2735
< AL_INLINE uint8_t AL_API
---
> static inline uint8_t
2562c2768
< AL_INLINE uint8_t AL_API
---
> static inline uint8_t
2574c2780
< * Encoded path pkt_life = 4.096 µsec * 2 PacketLifeTime.
---
> * Encoded path pkt_life = 4.096 µsec * 2 ** PacketLifeTime.
2591c2797
< AL_INLINE uint8_t AL_API
---
> static inline uint8_t
2624c2830
< AL_INLINE net32_t AL_API
---
> static inline uint32_t
2628c2834
< return( cl_hton32( (cl_ntoh32(p_rec->hop_flow_raw.val) >> 8) & 0x000FFFFF ) );
---
> return( ((cl_ntoh32(p_rec->hop_flow_raw) >> 8) & 0x000FFFFF) );
2653c2859
< AL_INLINE uint8_t AL_API
---
> static inline uint8_t
2657c2863
< return( p_rec->hop_flow_raw.bytes[3] );
---
> return( (uint8_t)(p_rec->hop_flow_raw & 0x000000FF ) );
2673,2714d2878
< /****f* IBA Base: Types/ib_path_rec_set_hop_flow_raw
< * NAME
< * ib_path_rec_set_hop_flow_raw
< *
< * DESCRIPTION
< * Sets the hop limit, flow label, and raw traffic bits of a path record.
< *
< * SYNOPSIS
< */
< AL_INLINE void AL_API
< ib_path_rec_set_hop_flow_raw(
< OUT ib_path_rec_t* const p_rec,
< IN const uint8_t hop_limit,
< IN const net32_t flow_lbl,
< IN const boolean_t raw )
< {
< p_rec->hop_flow_raw.val = (cl_ntoh32( flow_lbl ) & 0x000FFFFF) << 8;
< if( raw )
< p_rec->hop_flow_raw.val |= 0x80000000;
< p_rec->hop_flow_raw.val = cl_hton32( p_rec->hop_flow_raw.val );
< p_rec->hop_flow_raw.bytes[3] = hop_limit;
< }
< /*
< * PARAMETERS
< * p_rec
< * Pointer to the path record whose hop limit, flow label, and rab
< * traffic fields to set.
< *
< * hop_limit
< * Hop limit to set in the path record.
< *
< * flow_lbl
< * Flow label, in network byte order, to set in the path record.
< *
< * raw
< * Boolean flag to indicate whether the path record is for raw traffic.
< *
< * SEE ALSO
< * ib_path_rec_t
< *********/
<
<
2776c2940,2941
< ib_net32_t resp_time_val;
---
> uint8_t reserved[3];
> uint8_t resp_time_val;
2894c3059
< AL_INLINE uint8_t AL_API
---
> static inline uint8_t
2922c3087
< AL_INLINE uint8_t AL_API
---
> static inline uint8_t
2963d3127
<
3041c3205
< AL_INLINE void AL_API
---
> static inline void
3104c3268
< AL_INLINE void AL_API
---
> static inline void
3149c3313
< AL_INLINE boolean_t AL_API
---
> static inline boolean_t
3206c3370
< AL_INLINE boolean_t AL_API
---
> static inline boolean_t
3231c3395
< AL_INLINE void AL_API
---
> static inline void
3241c3405
< AL_INLINE uint8_t AL_API
---
> static inline uint8_t
3259c3423,3424
< #define IB_SMP_DIRECTION (CL_HTON16(0x8000))
---
> #define IB_SMP_DIRECTION_HO 0x8000
> #define IB_SMP_DIRECTION (CL_HTON16(IB_SMP_DIRECTION_HO))
3271c3436,3437
< #define IB_SMP_STATUS_MASK (CL_HTON16(0x7FFF))
---
> #define IB_SMP_STATUS_MASK_HO 0x7FFF
> #define IB_SMP_STATUS_MASK (CL_HTON16(IB_SMP_STATUS_MASK_HO))
3376c3542
< AL_INLINE ib_net16_t AL_API
---
> static inline ib_net16_t
3405c3571
< AL_INLINE boolean_t AL_API
---
> static inline boolean_t
3433c3599
< AL_INLINE boolean_t AL_API
---
> static inline boolean_t
3466c3632
< AL_INLINE void AL_API
---
> static inline void
3496c3662
< cl_memclr( p_smp->resv1,
---
> memset( p_smp->resv1, 0,
3503c3669
< cl_memcpy( &p_smp->initial_path, path_out,
---
> memcpy( &p_smp->initial_path, path_out,
3552c3718
< AL_INLINE void* AL_API
---
> static inline void*
3645,3647c3811
<
<
< AL_INLINE uint32_t AL_API
---
> static inline uint32_t
3654c3818
< AL_INLINE ib_net16_t AL_API
---
> static inline ib_net16_t
3658,3660d3821
< if( attr_size & 0x07 )
< return( cl_hton16( (uint16_t)(attr_size >> 3) + 1 ) );
< else
3664d3824
<
3674c3834
< AL_INLINE void* AL_API
---
> static inline void*
3711c3871
< AL_INLINE uint8_t AL_API
---
> static inline uint8_t
3741c3901
< AL_INLINE ib_net32_t AL_API
---
> static inline ib_net32_t
3810,3811c3970,3971
< uint8_t state_info1; // LinkSpeedSupported and PortState
< uint8_t state_info2; // PortPhysState and LinkDownDefaultState
---
> uint8_t state_info1; /* LinkSpeedSupported and PortState */
> uint8_t state_info2; /* PortPhysState and LinkDownDefaultState */
3813c3973
< uint8_t link_speed; // LinkSpeedEnabled and LinkSpeedActive
---
> uint8_t link_speed; /* LinkSpeedEnabled and LinkSpeedActive */
3815c3975
< uint8_t vl_cap; // VlCap and InitType
---
> uint8_t vl_cap; /* VLCap and InitType */
3826c3986,3987
< uint8_t subnet_timeout;
---
> uint8_t subnet_timeout; /* cli_rereg(1b), resrv(
> 2b), timeout(5b) */
3835a3997
> #define IB_PORT_LMC_MAX 0x07
3841a4004,4006
> #define IB_PORT_PHYS_STATE_MASK 0xF0
> #define IB_PORT_PHYS_STATE_SHIFT 4
> #define IB_PORT_LNKDWNDFTSTATE_MASK 0x0F
3854c4019
< #define IB_PORT_CAP_HAS_SYS_GUID (CL_NTOH32(0x00000800))
---
> #define IB_PORT_CAP_HAS_SYS_IMG_GUID (CL_NTOH32(0x00000800))
3861c4026
< #define IB_PORT_CAP_HAS_REINIT (CL_NTOH32(0x00040000))
---
> #define IB_PORT_CAP_REINIT (CL_NTOH32(0x00040000))
3867,3868c4032,4033
< #define IB_PORT_CAP_HAS_LINK_TRIP (CL_NTOH32(0x01000000))
< #define IB_PORT_CAP_HAS_CLI_REREG (CL_NTOH32(0x02000000))
---
> #define IB_PORT_CAP_HAS_LINK_RT_LATENCY (CL_NTOH32(0x01000000))
> #define IB_PORT_CAP_HAS_CLIENT_REREG (CL_NTOH32(0x02000000))
3885c4050
< AL_INLINE uint8_t AL_API
---
> static inline uint8_t
3912c4077
< AL_INLINE void AL_API
---
> static inline void
3943c4108
< AL_INLINE uint8_t AL_API
---
> static inline uint8_t
3960a4126,4152
> /****f* IBA Base: Types/ib_port_info_get_init_type
> * NAME
> * ib_port_info_get_init_type
> *
> * DESCRIPTION
> * Gets the init type of a port.
> *
> * SYNOPSIS
> */
> static inline uint8_t
> ib_port_info_get_init_type(
> IN const ib_port_info_t* const p_pi)
> {
> return (uint8_t) (p_pi->vl_cap & 0x0F);
> }
> /*
> * PARAMETERS
> * p_pi
> * [in] Pointer to a PortInfo attribute.
> *
> * RETURN VALUES
> * InitType field
> *
> * NOTES
> *
> * SEE ALSO
> *********/
3970c4162
< AL_INLINE uint8_t AL_API
---
> static inline uint8_t
3997c4189
< AL_INLINE void AL_API
---
> static inline void
4028c4220
< AL_INLINE void AL_API
---
> static inline void
4057c4249
< AL_INLINE uint8_t AL_API
---
> static inline uint8_t
4088c4280
< AL_INLINE void AL_API
---
> static inline void
4114c4306
< /****f* IBA Base: Types/ib_port_info_get_link_speed_active
---
> /****f* IBA Base: Types/ib_port_info_get_port_phys_state
4116c4308
< * ib_port_info_get_link_speed_active
---
> * ib_port_info_get_port_phys_state
4119c4311
< * Returns the Link Speed Active value assigned to this port.
---
> * Returns the encoded value for the port physical state.
4123,4124c4315,4316
< AL_INLINE uint8_t AL_API
< ib_port_info_get_link_speed_active(
---
> static inline uint8_t
> ib_port_info_get_port_phys_state(
4127,4128c4319,4321
< return( (uint8_t)((p_pi->link_speed & IB_PORT_LINK_SPEED_ACTIVE_MASK) >>
< IB_PORT_LINK_SPEED_SHIFT) );
---
> return( (uint8_t)((p_pi->state_info2 &
> IB_PORT_PHYS_STATE_MASK) >>
> IB_PORT_PHYS_STATE_SHIFT) );
4136c4329
< * Returns the link speed active value assigned to this port.
---
> * Returns the encoded value for the port physical state.
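The hunks above introduce IB_PORT_PHYS_STATE_MASK, IB_PORT_PHYS_STATE_SHIFT and IB_PORT_LNKDWNDFTSTATE_MASK and use the first two in ib_port_info_get_port_phys_state. As a rough illustration of how state_info2 splits into two 4-bit fields, here is a minimal standalone sketch. The struct port_info_sketch type, the sketch_* helpers and phys_state_self_check are hypothetical names introduced for this example and are not part of ib_types.h; only the three mask and shift values are taken from the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Mask and shift values copied from the diff above. */
#define IB_PORT_PHYS_STATE_MASK      0xF0
#define IB_PORT_PHYS_STATE_SHIFT     4
#define IB_PORT_LNKDWNDFTSTATE_MASK  0x0F

/* Hypothetical stand-in holding only the one PortInfo byte used here. */
struct port_info_sketch {
    uint8_t state_info2; /* PortPhysState (high nibble), LinkDownDefaultState (low nibble) */
};

/* Extract the high nibble, same expression as the getter in the diff. */
static inline uint8_t
sketch_get_port_phys_state(const struct port_info_sketch *p_pi)
{
    return (uint8_t)((p_pi->state_info2 & IB_PORT_PHYS_STATE_MASK) >>
                     IB_PORT_PHYS_STATE_SHIFT);
}

/* Extract the low nibble using the new LinkDownDefaultState mask. */
static inline uint8_t
sketch_get_link_down_def_state(const struct port_info_sketch *p_pi)
{
    return (uint8_t)(p_pi->state_info2 & IB_PORT_LNKDWNDFTSTATE_MASK);
}

/* Small self-check: 0x52 holds physical state 5 and default state 2. */
void phys_state_self_check(void)
{
    struct port_info_sketch pi = { 0x52 };

    assert(sketch_get_port_phys_state(&pi) == 5);
    assert(sketch_get_link_down_def_state(&pi) == 2);
}
```

Calling phys_state_self_check() from any main() exercises both helpers; neither field disturbs the other because the two masks do not overlap.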
4142a4336,4467
> /****f* IBA Base: Types/ib_port_info_set_port_phys_state.
> * NAME
> * ib_port_info_set_port_phys_state
> *
> * DESCRIPTION
> * Given an integer of the port physical state,
> * Set the appropriate bits in state_info2
> *
> * SYNOPSIS
> */
> static inline void
> ib_port_info_set_port_phys_state(
> IN uint8_t const phys_state,
> IN ib_port_info_t* p_pi )
> {
> p_pi->state_info2 =
> ( ~IB_PORT_PHYS_STATE_MASK & p_pi->state_info2 ) |
> ( IB_PORT_PHYS_STATE_MASK &
> (phys_state << IB_PORT_PHYS_STATE_SHIFT) );
> }
> /*
> * PARAMETERS
> * phys_state
> * [in] port physical state.
> *
> * p_pi
> * [in] Pointer to a PortInfo attribute.
> *
> * RETURN VALUES
> * This function does not return a value.
> *
> * NOTES
> *
> * SEE ALSO
> *********/
>
> /****f* IBA Base: Types/ib_port_info_get_link_down_def_state
> * NAME
> * ib_port_info_get_link_down_def_state
> *
> * DESCRIPTION
> * Returns the link down default state.
> *
> * SYNOPSIS
> */
> static inline uint8_t
> ib_port_info_get_link_down_def_state(
> IN const ib_port_info_t* const p_pi )
> {
> return( (uint8_t)(p_pi->state_info2 & IB_PORT_LNKDWNDFTSTATE_MASK) );
> }
> /*
> * PARAMETERS
> * p_pi
> * [in] Pointer to a PortInfo attribute.
> *
> * RETURN VALUES
> * link down default state of the port.
> *
> * NOTES
> *
> * SEE ALSO
> *********/
>
> /****f* IBA Base: Types/ib_port_info_set_link_down_def_state
> * NAME
> * ib_port_info_set_link_down_def_state
> *
> * DESCRIPTION
> * Sets the link down default state of the port.
> *
> * SYNOPSIS
> */
> static inline void
> ib_port_info_set_link_down_def_state(
> IN ib_port_info_t* const p_pi,
> IN const uint8_t link_dwn_state )
> {
> p_pi->state_info2 = (uint8_t)((p_pi->state_info2 & 0xF0) | link_dwn_state );
> }
> /*
> * PARAMETERS
> * p_pi
> * [in] Pointer to a PortInfo attribute.
> *
> * link_dwn_state
> * [in] Link down default state of the port.
> *
> * RETURN VALUES
> * None.
> *
> * NOTES
> *
> * SEE ALSO
> *********/
>
> /****f* IBA Base: Types/ib_port_info_get_link_speed_active
> * NAME
> * ib_port_info_get_link_speed_active
> *
> * DESCRIPTION
> * Returns the Link Speed Active value assigned to this port.
> *
> * SYNOPSIS
> */
> static inline uint8_t
> ib_port_info_get_link_speed_active(
> IN const ib_port_info_t* const p_pi )
> {
> return( (uint8_t)((p_pi->link_speed &
> IB_PORT_LINK_SPEED_ACTIVE_MASK) >>
> IB_PORT_LINK_SPEED_SHIFT) );
> }
> /*
> * PARAMETERS
> * p_pi
> * [in] Pointer to a PortInfo attribute.
> *
> * RETURN VALUES
> * Returns the link speed active value assigned to this port.
> *
> * NOTES
> *
> * SEE ALSO
> *********/
>
> #define IB_LINK_WIDTH_ACTIVE_1X 1
> #define IB_LINK_WIDTH_ACTIVE_4X 2
> #define IB_LINK_WIDTH_ACTIVE_12X 8
> #define IB_LINK_SPEED_ACTIVE_2_5 1
> #define IB_LINK_SPEED_ACTIVE_5 2
> #define IB_LINK_SPEED_ACTIVE_10 4
4144,4151c4469
< #define IB_LINK_WIDTH_ACTIVE_1X 1
< #define IB_LINK_WIDTH_ACTIVE_4X 2
< #define IB_LINK_WIDTH_ACTIVE_12X 8
<
< #define IB_LINK_SPEED_ACTIVE_2_5 1
< #define IB_LINK_SPEED_ACTIVE_5 2
< #define IB_LINK_SPEED_ACTIVE_10 4
<
---
> /* following v1 ver1.2 p901 */
4161a4480,4481
> #define IB_MIN_RATE IB_PATH_RECORD_RATE_2_5_GBS
> #define IB_MAX_RATE IB_PATH_RECORD_RATE_120_GBS
4172c4492,4493
< AL_INLINE uint8_t AL_API
---
>
> static inline uint8_t
4176,4177c4497,4502
< switch( p_pi->link_width_active * p_pi->link_width_active *
< ib_port_info_get_link_speed_active( p_pi ) )
---
> uint8_t rate = 0;
>
> switch (ib_port_info_get_link_speed_active(p_pi))
> {
> case IB_LINK_SPEED_ACTIVE_2_5:
> switch (p_pi->link_width_active)
4179,4180c4504,4510
< case 1:
< return IB_PATH_RECORD_RATE_2_5_GBS;
---
> case IB_LINK_WIDTH_ACTIVE_1X:
> rate = IB_PATH_RECORD_RATE_2_5_GBS;
> break;
>
> case IB_LINK_WIDTH_ACTIVE_4X:
> rate = IB_PATH_RECORD_RATE_10_GBS;
> break;
4182,4183c4512,4514
< case 2:
< return IB_PATH_RECORD_RATE_5_GBS;
---
> case IB_LINK_WIDTH_ACTIVE_12X:
> rate = IB_PATH_RECORD_RATE_30_GBS;
> break;
4185,4186c4516,4526
< case 4:
< return IB_PATH_RECORD_RATE_10_GBS;
---
> default:
> rate = IB_PATH_RECORD_RATE_2_5_GBS;
> break;
> }
> break;
> case IB_LINK_SPEED_ACTIVE_5:
> switch (p_pi->link_width_active)
> {
> case IB_LINK_WIDTH_ACTIVE_1X:
> rate = IB_PATH_RECORD_RATE_5_GBS;
> break;
4188,4189c4528,4530
< case 8:
< return IB_PATH_RECORD_RATE_20_GBS;
---
> case IB_LINK_WIDTH_ACTIVE_4X:
> rate = IB_PATH_RECORD_RATE_20_GBS;
> break;
4191,4192c4532,4534
< case 16:
< return IB_PATH_RECORD_RATE_40_GBS;
---
> case IB_LINK_WIDTH_ACTIVE_12X:
> rate = IB_PATH_RECORD_RATE_60_GBS;
> break;
4194,4195c4536,4546
< case 64:
< return IB_PATH_RECORD_RATE_30_GBS;
---
> default:
> rate = IB_PATH_RECORD_RATE_5_GBS;
> break;
> }
> break;
> case IB_LINK_SPEED_ACTIVE_10:
> switch (p_pi->link_width_active)
> {
> case IB_LINK_WIDTH_ACTIVE_1X:
> rate = IB_PATH_RECORD_RATE_10_GBS;
> break;
4197,4198c4548,4550
< case 128:
< return IB_PATH_RECORD_RATE_60_GBS;
---
> case IB_LINK_WIDTH_ACTIVE_4X:
> rate = IB_PATH_RECORD_RATE_40_GBS;
> break;
4200,4201c4552,4554
< case 256:
< return IB_PATH_RECORD_RATE_120_GBS;
---
> case IB_LINK_WIDTH_ACTIVE_12X:
> rate =IB_PATH_RECORD_RATE_120_GBS;
> break;
4204c4557,4563
< return IB_PATH_RECORD_RATE_2_5_GBS;
---
> rate = IB_PATH_RECORD_RATE_10_GBS;
> break;
> }
> break;
> default:
> rate = IB_PATH_RECORD_RATE_2_5_GBS;
> break;
4205a4565,4566
>
> return rate;
4229c4590
< AL_INLINE uint8_t AL_API
---
> static inline uint8_t
4238c4599
< //link_width_supported = 1: 1x
---
> /* link_width_supported = 1: 1x */
4242c4603
< //link_width_supported = 3: 1x or 4x
---
> /* link_width_supported = 3: 1x or 4x */
4254c4615
< //link_width_supported = 11: 1x or 4x or 12x
---
> /* link_width_supported = 11: 1x or 4x or 12x */
4291d4651
<
4301c4661
< AL_INLINE uint8_t AL_API
---
> static inline uint8_t
4328c4688
< AL_INLINE uint8_t AL_API
---
> static inline uint8_t
4355c4715
< AL_INLINE void AL_API
---
> static inline void
4379a4740,4798
> /****f* IBA Base: Types/ib_port_info_get_master_smsl > * NAME > * ib_port_info_get_master_smsl > * > * DESCRIPTION > * Returns the encoded value for the Master SMSL at this port. > * > * SYNOPSIS > */ > static inline uint8_t > ib_port_info_get_master_smsl( > IN const ib_port_info_t* const p_pi ) > { > return (uint8_t) (p_pi->mtu_smsl & 0x0F); > } > /* > * PARAMETERS > * p_pi > * [in] Pointer to a PortInfo attribute. > * > * RETURN VALUES > * Returns the encoded value for the Master SMSL at this port. > * > * NOTES > * > * SEE ALSO > *********/ > /****f* IBA Base: Types/ib_port_info_set_master_smsl > * NAME > * ib_port_info_set_master_smsl > * > * DESCRIPTION > * Sets the Master SMSL value in the PortInfo attribute. > * > * SYNOPSIS > */ > static inline void > ib_port_info_set_master_smsl( > IN ib_port_info_t* const p_pi, > IN const uint8_t smsl ) > { > p_pi->mtu_smsl = (uint8_t)((p_pi->mtu_smsl & 0xF0) | smsl ); > } > /* > * PARAMETERS > * p_pi > * [in] Pointer to a PortInfo attribute. > * > * mtu > * [in] Encoded Master SMSL value to set > * > * RETURN VALUES > * None. > * > * NOTES > * > * SEE ALSO > *********/ > 4389c4808 < AL_INLINE void AL_API --- > static inline void 4395c4814,4816 < p_pi->subnet_timeout = (uint8_t)(timeout & 0x1F); --- > p_pi->subnet_timeout = > (uint8_t)( > (p_pi->subnet_timeout & 0x80) | (timeout & 0x1F)); 4413c4834 < /****f* IBA Base: Types/ib_port_info_get_lmc --- > /****f* IBA Base: Types/ib_port_info_set_client_rereg 4415c4836 < * ib_port_info_get_lmc --- > * ib_port_info_set_client_rereg 4418c4839 < * Returns the LMC value assigned to this port. --- > * Sets the encoded client reregistration bit value in the PortInfo attribute. 
4422,4424c4843,4846
< AL_INLINE uint8_t AL_API
< ib_port_info_get_lmc(
< IN const ib_port_info_t* const p_pi )
---
> static inline void
> ib_port_info_set_client_rereg(
> IN ib_port_info_t* const p_pi,
> IN const uint8_t client_rereg )
4426c4848,4851
< return( (uint8_t)(p_pi->mkey_lmc & IB_PORT_LMC_MASK) );
---
> CL_ASSERT( client_rereg <= 0x1 );
> p_pi->subnet_timeout =
> (uint8_t)(
> (p_pi->subnet_timeout & 0x1F) | ((client_rereg << 7) & 0x80));
4432a4858,4860
> * client_rereg
> * [in] Client reregistration value to set (either 1 or 0).
> *
4434c4862
< * Returns the LMC value assigned to this port.
---
> * None.
4440c4868,4869
< /****f* IBA Base: Types/ib_port_info_set_lmc
---
>
> /****f* IBA Base: Types/ib_port_info_get_timeout
4442c4871
< * ib_port_info_set_lmc
---
> * ib_port_info_get_timeout
4445c4874
< * Sets the LMC value in the PortInfo attribute.
---
> * Gets the encoded subnet timeout value in the PortInfo attribute.
4449,4452c4878,4880
< AL_INLINE void AL_API
< ib_port_info_set_lmc(
< IN ib_port_info_t* const p_pi,
< IN const uint8_t lmc )
---
> static inline uint8_t
> ib_port_info_get_timeout(
> IN ib_port_info_t const* p_pi )
4454,4455c4882
< CL_ASSERT( lmc <= 0x7 );
< p_pi->mkey_lmc = (uint8_t)((p_pi->mkey_lmc & 0xF8) | lmc);
---
> return(p_pi->subnet_timeout & 0x1F );
4462,4464d4888
< * lmc
< * [in] LMC value to set, must be less than 7.
< *
4466c4890
< * None.
---
> * The encoded timeout value
4472c4896,4897
< /****f* IBA Base: Types/ib_port_info_get_mpb
---
>
> /****f* IBA Base: Types/ib_port_info_get_client_rereg
4474c4899
< * ib_port_info_get_mpb
---
> * ib_port_info_get_client_rereg
4477c4902
< * Returns the M_Key protect bits assigned to this port.
---
> * Gets the encoded client reregistration bit value in the PortInfo attribute.
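For readers following the subnet_timeout hunks: the patch keeps the encoded timeout in bits 4..0 and the client reregistration flag in bit 7 of the same byte, so each setter has to preserve the other field. A minimal standalone sketch of that packing, assuming a one-field stand-in struct (port_info_sketch_t and the sketch_* names are hypothetical and not part of ib_types.h):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the single PortInfo byte discussed here. */
typedef struct {
    uint8_t subnet_timeout;   /* bit 7: client rereg, bits 4..0: timeout */
} port_info_sketch_t;

/* Replace the 5-bit timeout while keeping the client rereg bit. */
static inline void sketch_set_timeout(port_info_sketch_t *p, uint8_t timeout)
{
    p->subnet_timeout =
        (uint8_t)((p->subnet_timeout & 0x80) | (timeout & 0x1F));
}

/* Replace the client rereg bit while keeping the 5-bit timeout. */
static inline void sketch_set_client_rereg(port_info_sketch_t *p, uint8_t rereg)
{
    assert(rereg <= 0x1);
    p->subnet_timeout =
        (uint8_t)((p->subnet_timeout & 0x1F) | ((rereg << 7) & 0x80));
}

static inline uint8_t sketch_get_timeout(const port_info_sketch_t *p)
{
    return (uint8_t)(p->subnet_timeout & 0x1F);
}

static inline uint8_t sketch_get_client_rereg(const port_info_sketch_t *p)
{
    return (uint8_t)((p->subnet_timeout & 0x80) >> 7);
}
```

The old one-line setter overwrote the whole byte with (timeout & 0x1F), which would silently clear the reregistration bit; masking with 0x80 first is what the 4395c4814,4816 hunk adds.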
4481,4483c4906,4908
< AL_INLINE uint8_t AL_API
< ib_port_info_get_mpb(
< IN const ib_port_info_t* const p_pi )
---
> static inline uint8_t
> ib_port_info_get_client_rereg(
> IN ib_port_info_t const* p_pi )
4485,4486c4910
< return( (uint8_t)((p_pi->mkey_lmc & IB_PORT_MPB_MASK) >>
< IB_PORT_MPB_SHIFT) );
---
> return ( (p_pi->subnet_timeout & 0x80 ) >> 7);
4490c4914
< * p_ni
---
> * p_pi
4494c4918
< * Returns the M_Key protect bits assigned to this port.
---
> * Client reregistration value (either 1 or 0).
4501c4925
< /****f* IBA Base: Types/ib_port_info_set_mpb
---
> /****f* IBA Base: Types/ib_port_info_set_hoq_lifetime
4503c4927
< * ib_port_info_set_mpb
---
> * ib_port_info_set_hoq_lifetime
4506c4930,4931
< * Set the M_Key protect bits of this port.
---
> * Sets the Head of Queue Lifetime for which a packet can live in the head
> * of VL queue
4510,4513c4935,4938
< AL_INLINE void AL_API
< ib_port_info_set_mpb(
< IN ib_port_info_t* p_pi,
< IN uint8_t mpb )
---
> static inline void
> ib_port_info_set_hoq_lifetime(
> IN ib_port_info_t* const p_pi,
> IN const uint8_t hoq_life )
4515,4517c4940,4941
< p_pi->mkey_lmc =
< ((p_pi->mkey_lmc & ~IB_PORT_MPB_MASK) |
< (mpb << IB_PORT_MPB_SHIFT));
---
> p_pi->vl_stall_life = (uint8_t)((hoq_life & 0x1f) |
> (p_pi->vl_stall_life & 0xe0));
4521,4523c4945
< * mpb
< * [in] M_Key protect bits
< * p_ni
---
> * p_pi
4525a4948,4950
> * hoq_life
> * [in] Encoded lifetime value to set
> *
4526a4952
> * None.
4532a4959,4974
> /****f* IBA Base: Types/ib_port_info_get_hoq_lifetime
> * NAME
> * ib_port_info_get_hoq_lifetime
> *
> * DESCRIPTION
> * Gets the Head of Queue Lifetime for which a packet can live in the head
> * of VL queue
> *
> * SYNOPSIS
> */
> static inline uint8_t
> ib_port_info_get_hoq_lifetime(
> IN const ib_port_info_t* const p_pi )
> {
> return( (uint8_t)(p_pi->vl_stall_life & 0x1f) );
> }
4534c4976,4987
< typedef uint8_t ib_svc_name_t[64];
---
> /*
> * PARAMETERS
> * p_pi
> * [in] Pointer to a PortInfo attribute.
> * > * RETURN VALUES > * Encoded lifetime value > * > * NOTES > * > * SEE ALSO > *********/ 4536,4537c4989,5002 < #include < typedef struct _ib_service_record --- > /****f* IBA Base: Types/ib_port_info_set_vl_stall_count > * NAME > * ib_port_info_set_vl_stall_count > * > * DESCRIPTION > * Sets the VL Stall Count which define the number of contiguous > * HLL (hoq) drops that will put the VL into stalled mode. > * > * SYNOPSIS > */ > static inline void > ib_port_info_set_vl_stall_count( > IN ib_port_info_t* const p_pi, > IN const uint8_t vl_stall_count ) 4539,4591c5004,5010 < ib_net64_t service_id; < ib_gid_t service_gid; < ib_net16_t service_pkey; < ib_net16_t resv; < ib_net32_t service_lease; < uint8_t service_key[16]; < ib_svc_name_t service_name; < uint8_t service_data8[16]; < ib_net16_t service_data16[8]; < ib_net32_t service_data32[4]; < ib_net64_t service_data64[2]; < < } PACK_SUFFIX ib_service_record_t; < #include < < #include < typedef struct _ib_portinfo_record < { < ib_net16_t lid; < uint8_t port_num; < uint8_t resv; < ib_port_info_t port_info; < uint8_t pad[6]; < < } PACK_SUFFIX ib_portinfo_record_t; < #include < < #include < typedef struct _ib_link_record < { < ib_net16_t from_lid; < uint8_t from_port_num; < uint8_t to_port_num; < ib_net16_t to_lid; < uint8_t pad[2]; < < } PACK_SUFFIX ib_link_record_t; < #include < < #include < typedef struct _ib_sminfo_record < { < ib_net16_t lid; < uint16_t resv0; < ib_sm_info_t sm_info; < uint8_t pad[7]; < < } PACK_SUFFIX ib_sminfo_record_t; < #include < < /****s* IBA Base: Types/ib_lft_record_t < * NAME < * ib_lft_record_t --- > p_pi->vl_stall_life = (uint8_t)((p_pi->vl_stall_life & 0x1f) | > ((vl_stall_count << 5) & 0xe0)); > } > /* > * PARAMETERS > * p_pi > * [in] Pointer to a PortInfo attribute. 4593,4594c5012,5013 < * DESCRIPTION < * IBA defined LinearForwardingTable. 
(14.2.5.6) --- > * vl_stall_count > * [in] value to set 4596,4612c5015,5016 < * SYNOPSIS < */ < #include < typedef struct _ib_lft_record < { < ib_net16_t lid; < ib_net16_t block_num; < uint32_t resv0; < uint8_t lft[64]; < < } PACK_SUFFIX ib_lft_record_t; < #include < /************/ < < /****s* IBA Base: Types/ib_switch_info_t < * NAME < * ib_switch_info_t --- > * RETURN VALUES > * None. 4614,4615c5018 < * DESCRIPTION < * IBA defined SwitchInfo. (14.2.5.4) --- > * NOTES 4617,4649c5020,5021 < * SYNOPSIS < */ < #include < typedef struct _ib_switch_info < { < ib_net16_t lin_cap; < ib_net16_t rand_cap; < ib_net16_t mcast_cap; < ib_net16_t lin_top; < uint8_t def_port; < uint8_t def_mcast_pri_port; < uint8_t def_mcast_not_port; < uint8_t life_state; < ib_net16_t lids_per_port; < ib_net16_t enforce_cap; < uint8_t flags; < < } PACK_SUFFIX ib_switch_info_t; < #include < /************/ < < #include < typedef struct _ib_switch_info_record < { < ib_net16_t lid; < uint16_t resv0; < ib_switch_info_t switch_info; < uint8_t pad[3]; < < } PACK_SUFFIX ib_switch_info_record_t; < #include < < #define IB_SWITCH_PSC 0x04 --- > * SEE ALSO > *********/ 4651c5023 < /****f* IBA Base: Types/ib_switch_info_get_state_change --- > /****f* IBA Base: Types/ib_port_info_get_vl_stall_count 4653c5025 < * ib_switch_info_get_state_change --- > * ib_port_info_get_vl_stall_count 4656c5028,5029 < * Returns the value of the state change flag. 
--- > * Gets the VL Stall Count which define the number of contiguous > * HLL (hoq) drops that will put the VL into stalled mode 4660,4662c5033,5035 < AL_INLINE boolean_t AL_API < ib_switch_info_get_state_change( < IN const ib_switch_info_t* const p_si ) --- > static inline uint8_t > ib_port_info_get_vl_stall_count( > IN const ib_port_info_t* const p_pi ) 4664c5037 < return( (p_si->life_state & IB_SWITCH_PSC) == IB_SWITCH_PSC ); --- > return( (uint8_t)(p_pi->vl_stall_life & 0xe0) >> 5); 4665a5039 > 4668,4669c5042,5043 < * p_si < * [in] Pointer to a SwitchInfo attribute. --- > * p_pi > * [in] Pointer to a PortInfo attribute. 4672c5046 < * Returns the value of the state change flag. --- > * vl stall count 4679c5053 < /****f* IBA Base: Types/ib_switch_info_clear_state_change --- > /****f* IBA Base: Types/ib_port_info_get_lmc 4681c5055 < * ib_switch_info_clear_state_change --- > * ib_port_info_get_lmc 4684c5058 < * Clears the switch's state change bit. --- > * Returns the LMC value assigned to this port. 4688,4690c5062,5064 < AL_INLINE void AL_API < ib_switch_info_clear_state_change( < IN ib_switch_info_t* const p_si ) --- > static inline uint8_t > ib_port_info_get_lmc( > IN const ib_port_info_t* const p_pi ) 4692c5066 < p_si->life_state = (uint8_t)(p_si->life_state & 0xFB); --- > return( (uint8_t)(p_pi->mkey_lmc & IB_PORT_LMC_MASK) ); 4696c5070 < * p_ni --- > * p_pi 4707c5081 < /****s* IBA Base: Types/ib_guid_info_t --- > /****f* IBA Base: Types/ib_port_info_set_lmc 4709c5083 < * ib_guid_info_t --- > * ib_port_info_set_lmc 4712c5086 < * IBA defined GuidInfo. (14.2.5.5) --- > * Sets the LMC value in the PortInfo attribute. 
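The hoq_lifetime and vl_stall_count accessors added above share the vl_stall_life byte: the lifetime sits in bits 4..0 (mask 0x1f) and the stall count in bits 7..5 (mask 0xe0). A small sketch of that layout, assuming a one-field stand-in struct (vl_sketch_t and the sketch_* names are hypothetical, chosen only for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the vl_stall_life byte of PortInfo. */
typedef struct {
    uint8_t vl_stall_life;   /* bits 7..5: stall count, bits 4..0: HOQ life */
} vl_sketch_t;

/* Store the 5-bit HOQ lifetime, leaving the stall count untouched. */
static inline void sketch_set_hoq_lifetime(vl_sketch_t *p, uint8_t hoq_life)
{
    p->vl_stall_life =
        (uint8_t)((hoq_life & 0x1f) | (p->vl_stall_life & 0xe0));
}

/* Store the 3-bit stall count, leaving the HOQ lifetime untouched. */
static inline void sketch_set_vl_stall_count(vl_sketch_t *p, uint8_t count)
{
    assert(count <= 0x7);
    p->vl_stall_life =
        (uint8_t)((p->vl_stall_life & 0x1f) | ((count << 5) & 0xe0));
}

static inline uint8_t sketch_get_hoq_lifetime(const vl_sketch_t *p)
{
    return (uint8_t)(p->vl_stall_life & 0x1f);
}

static inline uint8_t sketch_get_vl_stall_count(const vl_sketch_t *p)
{
    /* Mask first, then shift, then narrow the result. */
    return (uint8_t)((p->vl_stall_life & 0xe0) >> 5);
}
```

The range assert on the stall count is an addition of this sketch; the patch itself relies on the (<< 5) & 0xe0 mask to discard out-of-range bits.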
4716,4719c5090,5093 < #define GUID_TABLE_MAX_ENTRIES 8 < < #include < typedef struct _ib_guid_info --- > static inline void > ib_port_info_set_lmc( > IN ib_port_info_t* const p_pi, > IN const uint8_t lmc ) 4721,4729c5095,5101 < ib_net64_t guid[GUID_TABLE_MAX_ENTRIES]; < < } PACK_SUFFIX ib_guid_info_t; < #include < /************/ < < /****s* IBA Base: Types/ib_pkey_table_info_t < * NAME < * ib_pkey_table_info_t --- > CL_ASSERT( lmc <= IB_PORT_LMC_MAX ); > p_pi->mkey_lmc = (uint8_t)((p_pi->mkey_lmc & 0xF8) | lmc); > } > /* > * PARAMETERS > * p_pi > * [in] Pointer to a PortInfo attribute. 4731,4732c5103,5104 < * DESCRIPTION < * IBA defined PKey table. (14.2.5.7) --- > * lmc > * [in] LMC value to set, must be less than 7. 4734,4745c5106,5112 < * SYNOPSIS < */ < #define PKEY_TABLE_MAX_ENTRIES 32 < < #include < typedef struct _ib_pkey_table_info < { < ib_net16_t pkey[PKEY_TABLE_MAX_ENTRIES]; < < } PACK_SUFFIX ib_pkey_table_info_t; < #include < /************/ --- > * RETURN VALUES > * None. > * > * NOTES > * > * SEE ALSO > *********/ 4747,4748c5114 < #define IB_MAX_NUM_VLS 16 < /****s* IBA Base: Types/ib_slvl_table_t --- > /****f* IBA Base: Types/ib_port_info_get_link_speed_enabled 4750c5116 < * ib_slvl_table_t --- > * ib_port_info_get_link_speed_enabled 4753c5119 < * IBA defined SL2VL Mapping Table Attribute. (14.2.5.8) --- > * Returns the link speed enabled value assigned to this port. 4757,4758c5123,5125 < #include < typedef struct _ib_slvl_table --- > static inline uint8_t > ib_port_info_get_link_speed_enabled( > IN const ib_port_info_t* const p_pi ) 4760,4766c5127,5141 < uint8_t vl_table[IB_MAX_NUM_VLS/2]; < < } PACK_SUFFIX ib_slvl_table_t; < #include < /************/ < < /****f* IBA Base: Types/ib_slvl_table_get_vl --- > return( (uint8_t)(p_pi->link_speed & IB_PORT_LINK_SPEED_ENABLED_MASK) ); > } > /* > * PARAMETERS > * p_pi > * [in] Pointer to a PortInfo attribute. > * > * RETURN VALUES > * Port state. 
> * > * NOTES > * > * SEE ALSO > *********/ > /****f* IBA Base: Types/ib_port_info_set_link_speed_enabled 4768c5143 < * ib_slvl_table_get_vl --- > * ib_port_info_set_link_speed_enabled 4771c5146 < * Retrieves the VL for a given SL from an SL to VL mapping table. --- > * Sets the link speed enabled value in the PortInfo attribute. 4775,4778c5150,5153 < AL_INLINE uint8_t AL_API < ib_slvl_table_get_vl( < IN const ib_slvl_table_t* const p_slvl_tbl, < IN const uint8_t sl ) --- > static inline void > ib_port_info_set_link_speed_enabled( > IN ib_port_info_t* const p_pi, > IN const uint8_t link_speed_enabled ) 4780,4789c5155 < uint8_t vl; < < /* There are two VL's per byte. */ < vl = p_slvl_tbl->vl_table[sl/2]; < /* If odd, shift down 4 bits. */ < if( sl % 2 ) < vl >>= 4; < < /* Mask off upper bits and return. */ < return vl & 0x0F; --- > p_pi->link_speed = (uint8_t)((p_pi->link_speed & 0xF0) | link_speed_enabled ); 4793,4794c5159,5160 < * p_slvl_tbl < * [in] Pointer to the SL to VL mapping table from which to return the VL. --- > * p_pi > * [in] Pointer to a PortInfo attribute. 4796,4797c5162,5163 < * sl < * [in] SL in the table for which to return the VL. --- > * link_speed_enabled > * [in] link speed enabled value to set. 4800c5166,5168 < * Returns the VL value for the specified SL in the provided table. --- > * None. > * > * NOTES 4803d5170 < * ib_slvl_table_t, ib_slvl_table_set_vl 4806c5173 < /****f* IBA Base: Types/ib_slvl_table_set_vl --- > /****f* IBA Base: Types/ib_port_info_get_mpb 4808c5175 < * ib_slvl_table_set_vl --- > * ib_port_info_get_mpb 4811c5178 < * Sets the VL for a given SL in an SL to VL mapping table. --- > * Returns the M_Key protect bits assigned to this port. 4815,4832c5182,5184 < AL_INLINE void AL_API < ib_slvl_table_set_vl( < IN OUT ib_slvl_table_t* const p_slvl_tbl, < IN const uint8_t sl, < IN const uint8_t vl ) < { < uint8_t entry; < < /* Get the current value for the byte in which the VL is stored. 
*/ < entry = p_slvl_tbl->vl_table[sl/2]; < < /* Clear the appropriate bits and set the new VL value. */ < if( sl % 2 ) < { < entry &= 0x0F; < entry |= ((vl & 0x0F) << 4); < } < else --- > static inline uint8_t > ib_port_info_get_mpb( > IN const ib_port_info_t* const p_pi ) 4834,4838c5186,5187 < entry &= 0xF0; < entry |= (vl & 0x0F); < } < /* Store the updated entry back into the table. */ < p_slvl_tbl->vl_table[sl/2] = entry; --- > return( (uint8_t)((p_pi->mkey_lmc & IB_PORT_MPB_MASK) >> > IB_PORT_MPB_SHIFT) ); 4842,4849c5191,5192 < * slvl_tbl < * [in/out] Pointer to the SL to VL mapping table in which to store the VL. < * < * sl < * [in] SL in the table for which to store the VL. < * < * vl < * [in] VL to store at the specifed SL. --- > * p_ni > * [in] Pointer to a PortInfo attribute. 4852c5195,5197 < * This function does not return a value. --- > * Returns the M_Key protect bits assigned to this port. > * > * NOTES 4855d5199 < * ib_slvl_table_t, ib_slvl_table_get_vl 4858c5202 < /****s* IBA Base: Types/ib_slvl_table_record_t --- > /****f* IBA Base: Types/ib_port_info_set_mpb 4860c5204 < * ib_slvl_table_record_t --- > * ib_port_info_set_mpb 4863c5207 < * IBA defined Sl to VL Mapping Table Record for SA Query. (15.2.5.4) --- > * Set the M_Key protect bits of this port. 
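The LMC and M_Key protect bit accessors moved by these hunks both operate on the mkey_lmc byte. A rough sketch of how the two fields coexist follows. The mask and shift values are assumptions: this excerpt does not define IB_PORT_LMC_MASK, IB_PORT_MPB_MASK or IB_PORT_MPB_SHIFT, so the sketch assumes 0x07, 0xC0 and 6 (the 0x07 is at least consistent with the 0xF8 clear mask used by ib_port_info_set_lmc). All SKETCH_* and sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; the real definitions are not shown in this excerpt. */
#define SKETCH_LMC_MASK  0x07
#define SKETCH_MPB_MASK  0xC0
#define SKETCH_MPB_SHIFT 6

/* Hypothetical stand-in for the mkey_lmc byte of PortInfo. */
typedef struct {
    uint8_t mkey_lmc;   /* bits 7..6: protect bits, bits 2..0: LMC */
} mkey_sketch_t;

/* Store the 3-bit LMC, leaving the upper bits untouched. */
static inline void sketch_set_lmc(mkey_sketch_t *p, uint8_t lmc)
{
    assert(lmc <= SKETCH_LMC_MASK);
    p->mkey_lmc = (uint8_t)((p->mkey_lmc & 0xF8) | lmc);
}

/* Store the 2-bit protect field; masking the shifted value keeps
 * an oversized argument from spilling into the other bits. */
static inline void sketch_set_mpb(mkey_sketch_t *p, uint8_t mpb)
{
    p->mkey_lmc = (uint8_t)((~SKETCH_MPB_MASK & p->mkey_lmc) |
                            (SKETCH_MPB_MASK & (mpb << SKETCH_MPB_SHIFT)));
}

static inline uint8_t sketch_get_lmc(const mkey_sketch_t *p)
{
    return (uint8_t)(p->mkey_lmc & SKETCH_LMC_MASK);
}

static inline uint8_t sketch_get_mpb(const mkey_sketch_t *p)
{
    return (uint8_t)((p->mkey_lmc & SKETCH_MPB_MASK) >> SKETCH_MPB_SHIFT);
}
```

With these assumed masks an LMC of 7 is accepted, which matches the CL_ASSERT in the patch rather than the "must be less than 7" wording in its comment.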
4867,4868c5211,5214 < #include < typedef struct _ib_slvl_table_record --- > static inline void > ib_port_info_set_mpb( > IN ib_port_info_t* p_pi, > IN uint8_t mpb ) 4870,4880c5216,5233 < ib_net16_t lid; // for CA: lid of port, for switch lid of port 0 < uint8_t in_port_num; // reserved for CA's < uint8_t out_port_num; // reserved for CA's < uint32_t resv; < ib_slvl_table_t slvl_tbl; < < } PACK_SUFFIX ib_slvl_table_record_t; < #include < /************/ < < /****s* IBA Base: Types/ib_vl_arb_element_t --- > p_pi->mkey_lmc = > (~IB_PORT_MPB_MASK & p_pi->mkey_lmc) | > ( IB_PORT_MPB_MASK & (mpb << IB_PORT_MPB_SHIFT) ); > } > /* > * PARAMETERS > * mpb > * [in] M_Key protect bits > * p_ni > * [in] Pointer to a PortInfo attribute. > * > * RETURN VALUES > * > * NOTES > * > * SEE ALSO > *********/ > /****f* IBA Base: Types/ib_port_info_get_local_phy_err_thd 4882c5235 < * ib_vl_arb_element_t --- > * ib_port_info_get_local_phy_err_thd 4885c5238 < * IBA defined VL Arbitration Table Element. (14.2.5.9) --- > * Returns the Phy Link Threshold 4889,4890c5242,5244 < #include < typedef struct _ib_vl_arb_element --- > static inline uint8_t > ib_port_info_get_local_phy_err_thd( > IN const ib_port_info_t* const p_pi ) 4892,4899c5246,5260 < uint8_t res_vl; < uint8_t weight; < < } PACK_SUFFIX ib_vl_arb_element_t; < #include < /************/ < < /****f* IBA Base: Types/ib_vl_arb_element_get_vl --- > return (uint8_t)( (p_pi->error_threshold & 0xF0) >> 4); > } > /* > * PARAMETERS > * p_pi > * [in] Pointer to a PortInfo attribute. > * > * RETURN VALUES > * Returns the Phy Link error threshold assigned to this port. > * > * NOTES > * > * SEE ALSO > *********/ > /****f* IBA Base: Types/ib_port_info_get_overrun_err_thd 4901c5262 < * ib_vl_arb_element_get_vl --- > * ib_port_info_get_local_overrun_err_thd 4904c5265 < * Retrieves the VL from a VL arbitration table element. 
--- > * Returns the Credits Overrun Errors Threshold 4908,4910c5269,5271 < AL_INLINE uint8_t AL_API < ib_vl_arb_element_get_vl( < IN const ib_vl_arb_element_t vl_arb_element ) --- > static inline uint8_t > ib_port_info_get_overrun_err_thd( > IN const ib_port_info_t* const p_pi ) 4912c5273 < return (vl_arb_element.res_vl >> 4); --- > return (uint8_t)(p_pi->error_threshold & 0x0F); 4916,4917c5277,5278 < * vl_arb_element < * [in] VL arbitration table element from which to return the VL. --- > * p_pi > * [in] Pointer to a PortInfo attribute. 4920c5281,5283 < * Returns the VL value for the specified VL arbitration table element. --- > * Returns the Credits Overrun errors threshold assigned to this port. > * > * NOTES 4923d5285 < * vl_arb_element, ib_vl_arb_element_set_vl 4926c5288 < /****f* IBA Base: Types/ib_vl_arb_element_set_vl --- > /****f* IBA Base: Types/ib_port_info_set_phy_and_overrun_err_thd 4928c5290 < * ib_vl_arb_element_set_vl --- > * ib_port_info_set_phy_and_overrun_err_thd 4931c5293 < * Retrieves the VL from a VL arbitration table element. --- > * Sets the Phy Link and Credits Overrun Errors Threshold 4935,4938c5297,5301 < AL_INLINE void AL_API < ib_vl_arb_element_set_vl( < IN OUT ib_vl_arb_element_t* const p_vl_arb_element, < IN const uint8_t vl ) --- > static inline void > ib_port_info_set_phy_and_overrun_err_thd( > IN ib_port_info_t* const p_pi, > IN uint8_t phy_threshold, > IN uint8_t overrun_threshold ) 4940c5303,5304 < p_vl_arb_element->res_vl = vl << 4; --- > p_pi->error_threshold = > (uint8_t)( ((phy_threshold & 0x0F) << 4) | (overrun_threshold & 0x0F) ); 4944,4945c5308,5309 < * vl_arb_element < * [in/out] VL arbitration table element in which to store the VL. --- > * p_pi > * [in] Pointer to a PortInfo attribute. 4947,4948c5311,5315 < * vl < * [in] VL to store in the specified element. 
--- > * phy_threshold > * [in] Physical Link Errors Threshold above which Trap 129 is generated > * > * overrun_threshold > * [in] Credits overrun Errors Threshold above which Trap 129 is generated 4951c5318,5320 < * This function does not return a value. --- > * None. > * > * NOTES 4954d5322 < * vl_arb_element, ib_vl_arb_element_get_vl 4957,4958c5325,5380 < #define IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK 32 < /****s* IBA Base: Types/ib_vl_arb_table_t --- > typedef uint8_t ib_svc_name_t[64]; > > #include > typedef struct _ib_service_record > { > ib_net64_t service_id; > ib_gid_t service_gid; > ib_net16_t service_pkey; > ib_net16_t resv; > ib_net32_t service_lease; > uint8_t service_key[16]; > ib_svc_name_t service_name; > uint8_t service_data8[16]; > ib_net16_t service_data16[8]; > ib_net32_t service_data32[4]; > ib_net64_t service_data64[2]; > > } PACK_SUFFIX ib_service_record_t; > #include > > #include > typedef struct _ib_portinfo_record > { > ib_net16_t lid; > uint8_t port_num; > uint8_t resv; > ib_port_info_t port_info; > uint8_t pad[6]; > > } PACK_SUFFIX ib_portinfo_record_t; > #include > > #include > typedef struct _ib_link_record > { > ib_net16_t from_lid; > uint8_t from_port_num; > uint8_t to_port_num; > ib_net16_t to_lid; > uint8_t pad[2]; > > } PACK_SUFFIX ib_link_record_t; > #include > > #include > typedef struct _ib_sminfo_record > { > ib_net16_t lid; > uint16_t resv0; > ib_sm_info_t sm_info; > uint8_t pad[7]; > > } PACK_SUFFIX ib_sminfo_record_t; > #include > > /****s* IBA Base: Types/ib_lft_record_t 4960c5382 < * ib_vl_arb_table_t --- > * ib_lft_record_t 4963c5385 < * IBA defined VL Arbitration Table. (14.2.5.9) --- > * IBA defined LinearForwardingTable. 
(14.2.5.6) 4968c5390 < typedef struct _ib_vl_arb_table --- > typedef struct _ib_lft_record 4970,4972c5392,5396 < ib_vl_arb_element_t vl_entry[IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK]; < < } PACK_SUFFIX ib_vl_arb_table_t; --- > ib_net16_t lid; > ib_net16_t block_num; > uint32_t resv0; > uint8_t lft[64]; > } PACK_SUFFIX ib_lft_record_t; 4976c5400 < /****s* IBA Base: Types/ib_vl_arb_table_record_t --- > /****s* IBA Base: Types/ib_switch_info_t 4978c5402 < * ib_vl_arb_table_record_t --- > * ib_switch_info_t 4981c5405 < * IBA defined VL Arbitration Table Record for SA Query. (15.2.5.9) --- > * IBA defined SwitchInfo. (14.2.5.4) 4986c5410 < typedef struct _ib_vl_arb_table_record --- > typedef struct _ib_switch_info 4988,4992c5412,5422 < ib_net16_t lid; // for CA: lid of port, for switch lid of port 0 < uint8_t port_num; < uint8_t block_num; < uint32_t reserved; < ib_vl_arb_table_t vl_arb_tbl; --- > ib_net16_t lin_cap; > ib_net16_t rand_cap; > ib_net16_t mcast_cap; > ib_net16_t lin_top; > uint8_t def_port; > uint8_t def_mcast_pri_port; > uint8_t def_mcast_not_port; > uint8_t life_state; > ib_net16_t lids_per_port; > ib_net16_t enforce_cap; > uint8_t flags; 4994c5424 < } PACK_SUFFIX ib_vl_arb_table_record_t; --- > } PACK_SUFFIX ib_switch_info_t; 4998c5428,5441 < /****s* IBA Base: Types/ib_grh_t --- > #include > typedef struct _ib_switch_info_record > { > ib_net16_t lid; > uint16_t resv0; > ib_switch_info_t switch_info; > uint8_t pad[3]; > > } PACK_SUFFIX ib_switch_info_record_t; > #include > > #define IB_SWITCH_PSC 0x04 > > /****f* IBA Base: Types/ib_switch_info_get_state_change 5000c5443 < * ib_grh_t --- > * ib_switch_info_get_state_change 5003c5446 < * Global route header information received with unreliable datagram messages --- > * Returns the value of the state change flag. 
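The state change helpers that follow test and clear one bit of SwitchInfo's life_state byte, using the IB_SWITCH_PSC value 0x04 defined above. A minimal sketch of that test-and-clear pattern, assuming a one-field stand-in struct (switch_sketch_t and the sketch_* names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SWITCH_PSC 0x04   /* same value as IB_SWITCH_PSC in the patch */

/* Hypothetical stand-in for the life_state byte of SwitchInfo. */
typedef struct {
    uint8_t life_state;
} switch_sketch_t;

/* True when the port state change bit is set. */
static inline bool sketch_get_state_change(const switch_sketch_t *p)
{
    return (p->life_state & SKETCH_SWITCH_PSC) == SKETCH_SWITCH_PSC;
}

/* Clear only the state change bit; 0xFB is ~0x04 within one byte. */
static inline void sketch_clear_state_change(switch_sketch_t *p)
{
    p->life_state = (uint8_t)(p->life_state & 0xFB);
}
```

Writing the clear mask as (uint8_t)~SKETCH_SWITCH_PSC would tie it to the define instead of repeating the literal 0xFB; the patch uses the literal.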
5007,5008c5450,5452 < #include < typedef struct _ib_grh --- > static inline boolean_t > ib_switch_info_get_state_change( > IN const ib_switch_info_t* const p_si ) 5010,5019c5454,5467 < ib_net32_t ver_class_flow; < ib_net16_t resv1; < uint8_t resv2; < uint8_t hop_limit; < ib_gid_t src_gid; < ib_gid_t dest_gid; < < } PACK_SUFFIX ib_grh_t; < #include < /************/ --- > return( (p_si->life_state & IB_SWITCH_PSC) == IB_SWITCH_PSC ); > } > /* > * PARAMETERS > * p_si > * [in] Pointer to a SwitchInfo attribute. > * > * RETURN VALUES > * Returns the value of the state change flag. > * > * NOTES > * > * SEE ALSO > *********/ 5021c5469 < /****f* IBA Base: Types/ib_grh_get_ver_class_flow --- > /****f* IBA Base: Types/ib_switch_info_clear_state_change 5023c5471 < * ib_grh_get_ver_class_flow --- > * ib_switch_info_clear_state_change 5026c5474 < * Get encoded version, traffic class and flow label in grh --- > * Clears the switch's state change bit. 5030,5035c5478,5480 < AL_INLINE void AL_API < ib_grh_get_ver_class_flow( < IN const ib_net32_t ver_class_flow, < OUT uint8_t* const p_ver OPTIONAL, < OUT uint8_t* const p_tclass OPTIONAL, < OUT net32_t* const p_flow_lbl OPTIONAL ) --- > static inline void > ib_switch_info_clear_state_change( > IN ib_switch_info_t* const p_si ) 5037,5048c5482 < ib_net32_t tmp_ver_class_flow; < < tmp_ver_class_flow = cl_ntoh32( ver_class_flow ); < < if (p_ver) < *p_ver = (uint8_t)(tmp_ver_class_flow >> 28); < < if (p_tclass) < *p_tclass = (uint8_t)(tmp_ver_class_flow >> 20); < < if (p_flow_lbl) < *p_flow_lbl = (ver_class_flow & CL_HTON32( 0x000FFFFF )); --- > p_si->life_state = (uint8_t)(p_si->life_state & 0xFB); 5052,5053c5486,5487 < * ver_class_flow < * [in] the version, traffic class and flow label info. --- > * p_ni > * [in] Pointer to a PortInfo attribute. 5056,5063c5490 < * p_ver < * [out] pointer to the version info. < * < * p_tclass < * [out] pointer to the traffic class info. 
< * < * p_flow_lbl < * [out] pointer to the flow label info --- > * Returns the LMC value assigned to this port. 5068d5494 < * ib_grh_t 5071c5497,5498 < /****f* IBA Base: Types/ib_grh_set_ver_class_flow --- > > /****f* IBA Base: Types/ib_switch_info_is_enhanced_port0 5073c5500 < * ib_grh_set_ver_class_flow --- > * ib_switch_info_is_enhanced_port0 5076c5503,5505 < * Set encoded version, traffic class and flow label in grh --- > * Returns TRUE if the enhancedPort0 bit is on (meaning the switch > * port zero supports enhanced functions). > * Returns FALSE otherwise. 5080,5084c5509,5511 < AL_INLINE ib_net32_t AL_API < ib_grh_set_ver_class_flow( < IN const uint8_t ver, < IN const uint8_t tclass, < IN const net32_t flow_lbl ) --- > static inline boolean_t > ib_switch_info_is_enhanced_port0( > IN const ib_switch_info_t* const p_si ) 5086,5090c5513 < ib_net32_t ver_class_flow; < < ver_class_flow = cl_hton32( (ver << 28) | (tclass << 20) ); < ver_class_flow |= (flow_lbl & CL_HTON32( 0x000FFFFF )); < return (ver_class_flow); --- > return( (p_si->flags & 0x08) == 0x08 ); 5094,5101c5517,5518 < * ver < * [in] the version info. < * < * tclass < * [in] the traffic class info. < * < * flow_lbl < * [in] the flow label info --- > * p_si > * [in] Pointer to a SwitchInfo attribute. 5104,5105c5521 < * ver_class_flow < * [out] the version, traffic class and flow label info. --- > * Returns TRUE if the switch supports enhanced port 0. FALSE otherwise. 5110d5525 < * ib_grh_t 5113c5528 < /****s* IBA Base: Types/ib_member_rec_t --- > /****s* IBA Base: Types/ib_guid_info_t 5115c5530 < * ib_member_rec_t --- > * ib_guid_info_t 5118,5119c5533 < * Multicast member record, used to create, join, and leave multicast < * groups. --- > * IBA defined GuidInfo. 
(14.2.5.5) 5122a5537,5538 > #define GUID_TABLE_MAX_ENTRIES 8 > 5124c5540 < typedef struct _ib_member_rec --- > typedef struct _ib_guid_info 5126,5130c5542,5564 < ib_gid_t mgid; < ib_gid_t port_gid; < ib_net32_t qkey; < ib_net16_t mlid; < uint8_t mtu; --- > ib_net64_t guid[GUID_TABLE_MAX_ENTRIES]; > > } PACK_SUFFIX ib_guid_info_t; > #include > /************/ > > #include > typedef struct _ib_guidinfo_record > { > ib_net16_t lid; > uint8_t block_num; > uint8_t resv; > uint32_t reserved; > ib_guid_info_t guid_info; > } PACK_SUFFIX ib_guidinfo_record_t; > #include > > #define IB_MULTIPATH_MAX_GIDS 11 /* Support max that can fit into first MAD (for now) */ > > #include > typedef struct _ib_multipath_rec_t > { > ib_net32_t hop_flow_raw; 5131a5566 > uint8_t num_path; 5132a5568,5570 > uint8_t resv0; > uint8_t sl; > uint8_t mtu; 5135,5141c5573,5579 < ib_net32_t sl_flow_hop; < uint8_t scope_state; < uint8_t proxy_join; < uint8_t reserved[2]; < uint8_t pad[4]; < < } PACK_SUFFIX ib_member_rec_t; --- > uint8_t resv1; > uint8_t independence; /* formerly resv2 */ > uint8_t sgid_count; > uint8_t dgid_count; > uint8_t resv3[7]; > ib_gid_t gids[IB_MULTIPATH_MAX_GIDS]; > } PACK_SUFFIX ib_multipath_rec_t; 5145,5146c5583,5584 < * mgid < * Multicast GID address for this multicast group. --- > * hop_flow_raw > * Global routing parameters: hop count, flow label and raw bit. 5148,5149c5586,5587 < * port_gid < * Valid GID of the endpoint joining this multicast group. --- > * tclass > * Another global routing parameter. 5151,5152c5589,5592 < * requestor_gid < * GID of the endpoint making this request on hehave of port_gid. --- > * num_path > * Reversible path - 1 bit to say if path is reversible. > * num_path [6:0] In queries, maximum number of paths to return. > * In responses, undefined. 5154,5155c5594,5595 < * qkey < * Q_Key to be used by this multicast group. --- > * pkey > * Partition key (P_Key) to use on this path. 
5157,5158c5597,5598 < * mlid < * Multicast LID for this multicast group. --- > * sl > * Service level to use on this path. 5162,5168d5601 < * < * tclass < * Another global routing parameter. < * < * pkey < * Partition key (P_Key) to use for this member. < * 5175,5233c5608,5610 < * sl_flow_hop < * Global routing parameters: service level, hop count, and flow label. < * < * scope_state < * MGID scope and JoinState of multicast request. < * < * proxy_join < * Enables others in the Partition to proxy add/remove from the group < * < * SEE ALSO < *********/ < < /****f* IBA Base: Types/ib_member_get_sl_flow_hop < * NAME < * ib_member_get_sl_flow_hop < * < * DESCRIPTION < * Get encoded sl flow label and hop limit < * < * SYNOPSIS < */ < AL_INLINE void AL_API < ib_member_get_sl_flow_hop( < IN const ib_net32_t sl_flow_hop, < OUT uint8_t* const p_sl OPTIONAL, < OUT net32_t* const p_flow_lbl OPTIONAL, < OUT uint8_t* const p_hop OPTIONAL ) < { < ib_net32_t tmp_sl_flow_hop; < < if (p_sl) < *p_sl = (uint8_t)(sl_flow_hop & 0x0f); < < tmp_sl_flow_hop = sl_flow_hop >> 4; < < if (p_flow_lbl) < *p_flow_lbl = (uint32_t)(tmp_sl_flow_hop & 0xffffff); < < tmp_sl_flow_hop = tmp_sl_flow_hop >> 20; < < if (p_hop) < *p_hop = (uint8_t)(tmp_sl_flow_hop & 0xff); < } < /* < * PARAMETERS < * sl_flow_hop < * [in] the sl flow label and hop limit of MC Group < * < * RETURN VALUES < * p_sl < * [out] pointer to the service level < * < * p_flow_lbl < * [out] pointer to the flow label info < * < * p_hop < * [out] pointer to the hop count limit. < * < * NOTES --- > * preference > * Indicates the relative merit of this path versus other path > * records returned from the SA. Lower numbers are better. 
5236d5612 < * ib_member_rec_t 5239c5615 < /****f* IBA Base: Types/ib_member_set_sl_flow_hop --- > /****f* IBA Base: Types/ib_multipath_rec_num_path 5241c5617 < * ib_member_set_sl_flow_hop --- > * ib_multipath_rec_num_path 5244c5620 < * Set encoded version, sl flow and hop --- > * Get max number of paths to return. 5248,5261c5624,5628 < AL_INLINE ib_net32_t AL_API < ib_member_set_sl_flow_hop( < IN const uint8_t sl, < IN const net32_t flow_lbl, < IN const uint8_t hop_limit ) < { < ib_net32_t sl_flow_hop; < < sl_flow_hop = sl; < sl_flow_hop <<= 20; < sl_flow_hop |= (cl_ntoh32( flow_lbl ) & 0x000FFFFF); < sl_flow_hop <<= 8; < sl_flow_hop |= hop_limit; < return cl_hton32(sl_flow_hop); --- > static inline uint8_t > ib_multipath_rec_num_path( > IN const ib_multipath_rec_t* const p_rec ) > { > return( p_rec->num_path &0x7F ); 5265,5272c5632,5633 < * sl < * [in] the service level. < * < * flow_lbl < * [in] the flow label info < * < * hop_limit < * [in] the hop limit. --- > * p_rec > * [in] Pointer to the multipath record object. 5275,5276c5636 < * sl_flow_hop < * [out] the sl flow label and hop limit --- > * Maximum number of paths to return for each unique SGID_DGID combination. 5281c5641 < * ib_member_rec_t --- > * ib_multipath_rec_t 5284c5644 < /****f* IBA Base: Types/ib_member_get_scope --- > /****f* IBA Base: Types/ib_multipath_rec_sl 5286c5646 < * ib_member_get_scope --- > * ib_multipath_rec_sl 5289c5649 < * Get encoded MGID scope --- > * Get multipath service level. 5293,5295c5653,5655 < AL_INLINE uint8_t AL_API < ib_member_get_scope( < IN const uint8_t scope_state ) --- > static inline uint8_t > ib_multipath_rec_sl( > IN const ib_multipath_rec_t* const p_rec ) 5297c5657 < return (scope_state >> 4); --- > return( (uint8_t)((cl_ntoh16( p_rec->sl )) & 0xF) ); 5301,5302c5661,5662 < * scope_state < * [in] the scope and state --- > * p_rec > * [in] Pointer to the multipath record object. 5305c5665,5667 < * Encoded scope. --- > * SL. 
> * > * NOTES 5308c5670 < * ib_member_rec_t --- > * ib_multipath_rec_t 5311c5673 < /****f* IBA Base: Types/ib_member_get_state --- > /****f* IBA Base: Types/ib_multipath_rec_mtu 5313c5675 < * ib_member_get_state --- > * ib_multipath_rec_mtu 5316c5678 < * Get encoded MGID JoinState --- > * Get encoded path MTU. 5320,5322c5682,5684 < AL_INLINE uint8_t AL_API < ib_member_get_state( < IN const uint8_t scope_state ) --- > static inline uint8_t > ib_multipath_rec_mtu( > IN const ib_multipath_rec_t* const p_rec ) 5324c5686 < return (scope_state & 0x0f); --- > return( (uint8_t)(p_rec->mtu & IB_MULTIPATH_REC_BASE_MASK) ); 5328,5329c5690,5691 < * scope_state < * [in] the scope and state --- > * p_rec > * [in] Pointer to the multipath record object. 5332c5694,5702 < * Encoded JoinState --- > * Encoded path MTU. > * 1: 256 > * 2: 512 > * 3: 1024 > * 4: 2048 > * 5: 4096 > * others: reserved > * > * NOTES 5335c5705 < * ib_member_rec_t --- > * ib_multipath_rec_t 5338c5708 < /****f* IBA Base: Types/ib_member_get_scope_state --- > /****f* IBA Base: Types/ib_multipath_rec_mtu_sel 5340c5710 < * ib_member_get_scope_state --- > * ib_multipath_rec_mtu_sel 5343c5713 < * Get encoded MGID scope and JoinState --- > * Get encoded multipath MTU selector. 5347,5351c5717,5719 < AL_INLINE void AL_API < ib_member_get_scope_state( < IN const uint8_t scope_state, < OUT uint8_t* const p_scope, < OUT uint8_t* const p_state ) --- > static inline uint8_t > ib_multipath_rec_mtu_sel( > IN const ib_multipath_rec_t* const p_rec ) 5353,5357c5721 < if (p_scope) < *p_scope = ib_member_get_scope( scope_state ); < < if (p_state) < *p_state = ib_member_get_state( scope_state ); --- > return( (uint8_t)((p_rec->mtu & IB_MULTIPATH_REC_SELECTOR_MASK) >> 6) ); 5361,5362c5725,5726 < * scope_state < * [in] the scope and state --- > * p_rec > * [in] Pointer to the multipath record object. 
5365,5369c5729,5733 < * p_scope < * [out] pointer to the MGID scope < * < * p_state < * [out] pointer to the join state --- > * Encoded path MTU selector value (for queries). > * 0: greater than MTU specified > * 1: less than MTU specified > * 2: exactly the MTU specified > * 3: largest MTU available 5374c5738 < * ib_member_rec_t --- > * ib_multipath_rec_t 5377c5741 < /****f* IBA Base: Types/ib_member_set_scope --- > /****f* IBA Base: Types/ib_multipath_rec_rate 5379c5743 < * ib_member_set_scope --- > * ib_multipath_rec_rate 5382c5746 < * Set encoded scope of a MCR. --- > * Get encoded multipath rate. 5386,5389c5750,5752 < AL_INLINE void AL_API < ib_member_set_scope( < IN OUT uint8_t* const p_scope_state, < IN const uint8_t scope ) --- > static inline uint8_t > ib_multipath_rec_rate( > IN const ib_multipath_rec_t* const p_rec ) 5391,5394c5754 < CL_ASSERT( scope <= 0x0F ); < /* Scope is MS 4-bits. */ < *p_scope_state &= 0xF0; < *p_scope_state |= (scope << 4); --- > return( (uint8_t)(p_rec->rate & IB_MULTIPATH_REC_BASE_MASK) ); 5398,5402c5758,5759 < * scope_state < * [in/out] Pointer to the MCR scope_state field. < * < * scope < * [in] The desired scope. --- > * p_rec > * [in] Pointer to the multipath record object. 5405c5762,5768 < * This function does not return a value. --- > * Encoded multipath rate. > * 2: 2.5 Gb/sec. > * 3: 10 Gb/sec. > * 4: 30 Gb/sec. > * others: reserved > * > * NOTES 5408c5771 < * ib_member_rec_t --- > * ib_multipath_rec_t 5411c5774 < /****f* IBA Base: Types/ib_member_set_state --- > /****f* IBA Base: Types/ib_multipath_rec_rate_sel 5413c5776 < * ib_member_set_state --- > * ib_multipath_rec_rate_sel 5416c5779 < * Set encoded JoinState of a MCR. --- > * Get encoded multipath rate selector. 
5420,5423c5783,5785 < AL_INLINE void AL_API < ib_member_set_state( < IN OUT uint8_t* const p_scope_state, < IN const uint8_t state ) --- > static inline uint8_t > ib_multipath_rec_rate_sel( > IN const ib_multipath_rec_t* const p_rec ) 5425,5428c5787 < CL_ASSERT( state <= 0x0F ); < /* State is LS 4-bits. */ < *p_scope_state &= 0x0F; < *p_scope_state |= (state & 0x0F); --- > return( (uint8_t)((p_rec->rate & IB_MULTIPATH_REC_SELECTOR_MASK) >> 6) ); 5432,5436c5791,5792 < * scope_state < * [in/out] Pointer to the MCR scope_state field to modify. < * < * state < * [in] the JoinState --- > * p_rec > * [in] Pointer to the multipath record object. 5439c5795,5801 < * This function does not return a value. --- > * Encoded path rate selector value (for queries). > * 0: greater than rate specified > * 1: less than rate specified > * 2: exactly the rate specified > * 3: largest rate available > * > * NOTES 5442c5804 < * ib_member_rec_t --- > * ib_multipath_rec_t 5445c5807 < /****f* IBA Base: Types/ib_member_set_scope_state --- > /****f* IBA Base: Types/ib_multipath_rec_pkt_life 5447c5809 < * ib_member_set_scope_state --- > * ib_multipath_rec_pkt_life 5450c5812 < * Set encoded version, MGID scope and JoinState --- > * Get encoded multipath pkt_life. 5454,5457c5816,5818 < AL_INLINE uint8_t AL_API < ib_member_set_scope_state( < IN const uint8_t scope, < IN const uint8_t state ) --- > static inline uint8_t > ib_multipath_rec_pkt_life( > IN const ib_multipath_rec_t* const p_rec ) 5459,5460c5820 < /* Scope is MS 4-bits, state is LS 4-bits */ < return ((scope << 4) | (state & 0xF)); --- > return( (uint8_t)(p_rec->pkt_life & IB_MULTIPATH_REC_BASE_MASK) ); 5464,5468c5824,5825 < * scope < * [in] the MGID scope < * < * state < * [in] the JoinState --- > * p_rec > * [in] Pointer to the multipath record object. 5471,5472c5828 < * scope_state < * [out] the encoded one --- > * Encoded multipath pkt_life = 4.096 µsec * 2 ** PacketLifeTime. 
5477c5833 < * ib_member_rec_t --- > * ib_multipath_rec_t 5480c5836 < /****f* IBA Base: Types/ib_member_set_join_state --- > /****f* IBA Base: Types/ib_multipath_rec_pkt_life_sel 5482c5838 < * ib_member_set_join_state --- > * ib_multipath_rec_pkt_life_sel 5485c5841 < * Set JoinState --- > * Get encoded multipath pkt_lifetime selector. 5489,5492c5845,5847 < AL_INLINE void AL_API < ib_member_set_join_state( < IN OUT ib_member_rec_t *p_mc_rec, < IN const uint8_t state ) --- > static inline uint8_t > ib_multipath_rec_pkt_life_sel( > IN const ib_multipath_rec_t* const p_rec ) 5494,5495c5849 < p_mc_rec->scope_state &= 0xF0; < p_mc_rec->scope_state |= (state & 0x0F); --- > return( (uint8_t)((p_rec->pkt_life & IB_MULTIPATH_REC_SELECTOR_MASK) >> 6 )); 5499,5503c5853,5854 < * p_mc_rec < * [in] pointer to the member record < * < * state < * [in] the JoinState --- > * p_rec > * [in] Pointer to the multipath record object. 5506c5857,5861 < * NONE --- > * Encoded path pkt_lifetime selector value (for queries). > * 0: greater than packet lifetime specified > * 1: less than packet lifetime specified > * 2: exactly the packet lifetime specified > * 3: smallest packet lifetime available 5511c5866 < * ib_member_rec_t --- > * ib_multipath_rec_t 5514,5515c5869,5877 < /* < * Join State Codes: --- > #define IB_NUM_PKEY_ELEMENTS_IN_BLOCK 32 > /****s* IBA Base: Types/ib_pkey_table_t > * NAME > * ib_pkey_table_t > * > * DESCRIPTION > * IBA defined PKey table.
(14.2.5.7) > * > * SYNOPSIS 5517,5519d5878 < #define IB_MC_REC_STATE_FULL_MEMBER 0x01 < #define IB_MC_REC_STATE_NON_MEMBER 0x02 < #define IB_MC_REC_STATE_SEND_ONLY_MEMBER 0x04 5521,5529c5880,5883 < /* < * Generic MAD notice types < */ < #define IB_NOTICE_TYPE_FATAL 0x00 < #define IB_NOTICE_TYPE_URGENT 0x01 < #define IB_NOTICE_TYPE_SECURITY 0x02 < #define IB_NOTICE_TYPE_SUBN_MGMT 0x03 < #define IB_NOTICE_TYPE_INFO 0x04 < #define IB_NOTICE_TYPE_EMPTY 0x7F --- > #include > typedef struct _ib_pkey_table > { > ib_net16_t pkey_entry[IB_NUM_PKEY_ELEMENTS_IN_BLOCK]; 5530a5885,5887 > } PACK_SUFFIX ib_pkey_table_t; > #include > /************/ 5531a5889,5897 > /****s* IBA Base: Types/ib_pkey_table_record_t > * NAME > * ib_pkey_table_record_t > * > * DESCRIPTION > * IBA defined P_Key Table Record for SA Query. (15.2.5.11) > * > * SYNOPSIS > */ 5533c5899 < typedef struct _ib_mad_notice_attr --- > typedef struct _ib_pkey_table_record 5535,5545c5901,5906 < /* is_generic:1, type:7, producer type or vendor id:24 */ < net32_t combo1; < /* trap number or device id, depending on is_generic. */ < net16_t combo2; < < net16_t issuer_lid; < /* notice toggle:1, notice_count:15 */ < net16_t combo3; < < uint8_t data_details[54]; < ib_gid_t issuer_gid; --- > ib_net16_t lid; // for CA: lid of port, for switch lid of port 0 > uint16_t block_num; > uint8_t port_num; // for switch: port number, for CA: reserved > uint8_t reserved1; > uint16_t reserved2; > ib_pkey_table_t pkey_tbl; 5547c5908 < } PACK_SUFFIX ib_mad_notice_attr_t; --- > } PACK_SUFFIX ib_pkey_table_record_t; 5548a5910 > /************/ 5550c5912,5914 < /****f* IBA Base: Types/ib_notice_get_generic --- > #define IB_DROP_VL 15 > #define IB_MAX_NUM_VLS 16 > /****s* IBA Base: Types/ib_slvl_table_t 5552c5916 < * ib_notice_get_generic --- > * ib_slvl_table_t 5555c5919 < * Retrieves whether a notice trap is generic. --- > * IBA defined SL2VL Mapping Table Attribute. 
(14.2.5.8) 5559,5561c5923,5924 < AL_INLINE boolean_t AL_API < ib_notice_get_generic( < IN const ib_mad_notice_attr_t* const p_notice_attr ) --- > #include > typedef struct _ib_slvl_table 5563,5574c5926,5933 < if( cl_ntoh32( p_notice_attr->combo1 ) & 0x00000001 ) < return TRUE; < return FALSE; < } < /* < * PARAMETERS < * p_notice_attr < * [in] Pointer to the notice attribute structure for which to return < * whether it is generic or not. < * < * RETURN VALUES < * Returns TRUE if the notice is generic. --- > uint8_t raw_vl_by_sl[IB_MAX_NUM_VLS/2]; > } PACK_SUFFIX ib_slvl_table_t; > #include > /************/ > > /****s* IBA Base: Types/ib_slvl_table_record_t > * NAME > * ib_slvl_table_record_t 5576c5935,5936 < * Returns FALSE if the notice is vendor specific. --- > * DESCRIPTION > * IBA defined SL to VL Mapping Table Record for SA Query. (15.2.5.4) 5578,5580c5938,5947 < * SEE ALSO < * ib_mad_notice_attr_t, ib_notice_set_generic < *********/ --- > * SYNOPSIS > */ > #include > typedef struct _ib_slvl_table_record > { > ib_net16_t lid; // for CA: lid of port, for switch lid of port 0 > uint8_t in_port_num; // reserved for CAs > uint8_t out_port_num; // reserved for CAs > uint32_t resv; > ib_slvl_table_t slvl_tbl; 5582c5949,5953 < /****f* IBA Base: Types/ib_notice_set_generic --- > } PACK_SUFFIX ib_slvl_table_record_t; > #include > /************/ > > /****f* IBA Base: Types/ib_slvl_table_set 5584c5955 < * ib_notice_set_generic --- > * ib_slvl_table_set 5587c5958 < * Sets whether a notice trap is generic. --- > * Set slvl table entry. 
5591,5594c5962,5966 < AL_INLINE void AL_API < ib_notice_set_generic( < IN OUT ib_mad_notice_attr_t* const p_notice_attr, < IN const boolean_t is_generic ) --- > static inline void > ib_slvl_table_set( > IN ib_slvl_table_t* p_slvl_tbl, > IN uint8_t sl_index, > IN uint8_t vl ) 5596c5968,5970 < uint32_t val; --- > uint8_t idx = sl_index/2; > CL_ASSERT(vl <= 15); > CL_ASSERT(sl_index <= 15); 5598,5600c5972,5976 < val = cl_ntoh32( p_notice_attr->combo1 ); < if( is_generic ) < val |= 0x00000001; --- > if (sl_index%2) > { > /* this is an odd sl. Need to update the ls bits */ > p_slvl_tbl->raw_vl_by_sl[idx] = ( p_slvl_tbl->raw_vl_by_sl[idx] & 0xF0 ) | vl ; > } 5602,5603c5978,5981 < val &= 0xFFFFFFFE; < p_notice_attr->combo1 = cl_hton32( val ); --- > { > /* this is an even sl. Need to update the ms bits */ > p_slvl_tbl->raw_vl_by_sl[idx] = ( vl << 4 ) | ( p_slvl_tbl->raw_vl_by_sl[idx] & 0x0F ); > } 5607,5609c5985,5989 < * p_notice_attr < * [in/out] Pointer to the notice attribute structure for which to set < * the generic bit. --- > * p_slvl_tbl > * [in] pointer to ib_slvl_table_t object. > * > * sl_index > * [in] the sl index in the table to be updated. 5611,5612c5991,5992 < * is_generic < * [in] TRUE if the notice is generic, FALSE if vendor specific. --- > * vl > * [in] the vl value to update for that sl. 5615c5995,5997 < * This function does not return a value. --- > * None > * > * NOTES 5618c6000 < * ib_mad_notice_attr_t, ib_notice_get_generic --- > * ib_slvl_table_t 5621c6003 < /****f* IBA Base: Types/ib_notice_get_type --- > /****f* IBA Base: Types/ib_slvl_table_get 5623c6005 < * ib_notice_get_type --- > * ib_slvl_table_get 5626c6008 < * Retrieves the type of a notice trap. --- > * Get slvl table entry. 
5630,5632c6012,6020 < AL_INLINE uint8_t AL_API < ib_notice_get_type( < IN const ib_mad_notice_attr_t* const p_notice_attr ) --- > static inline uint8_t > ib_slvl_table_get( > IN const ib_slvl_table_t* p_slvl_tbl, > IN uint8_t sl_index ) > { > uint8_t idx = sl_index/2; > CL_ASSERT(sl_index <= 15); > > if (sl_index%2) 5634c6022,6029 < return (uint8_t)((cl_ntoh32( p_notice_attr->combo1 ) >> 1) & 0x0000007F); --- > /* this is an odd sl. Need to return the ls bits. */ > return ( p_slvl_tbl->raw_vl_by_sl[idx] & 0x0F ); > } > else > { > /* this is an even sl. Need to return the ms bits. */ > return ( (p_slvl_tbl->raw_vl_by_sl[idx] & 0xF0) >> 4 ); > } 5638,5639c6033,6037 < * p_notice_attr < * [in] Pointer to the notice attribute structure whose type to return. --- > * p_slvl_tbl > * [in] pointer to ib_slvl_table_t object. > * > * sl_index > * [in] the sl index in the table whose value should be returned. 5642c6040,6042 < * Returns the type of the notice. --- > * vl for the requested sl_index. > * > * NOTES 5645c6045 < * ib_mad_notice_attr_t, ib_notice_set_type --- > * ib_slvl_table_t 5648c6048 < /****f* IBA Base: Types/ib_notice_set_type --- > /****s* IBA Base: Types/ib_vl_arb_element_t 5650c6050 < * ib_notice_set_type --- > * ib_vl_arb_element_t 5653c6053 < * Sets the type of a notice trap. --- > * IBA defined VL Arbitration Table Element. (14.2.5.9) 5657,5660c6057,6058 < AL_INLINE void AL_API < ib_notice_set_type( < IN OUT ib_mad_notice_attr_t* const p_notice_attr, < IN const uint8_t type ) --- > #include > typedef struct _ib_vl_arb_element 5662c6060,6061 < uint32_t val; --- > uint8_t vl; > uint8_t weight; 5664,5684c6063,6065 < val = cl_ntoh32( p_notice_attr->combo1 ); < /* Clear the type. */ < val &= 0xFFFFFF01; < /* Set new value. */ < val |= (((uint32_t)(type & 0x7F)) << 1); < p_notice_attr->combo1 = cl_hton32( val ); < } < /* < * PARAMETERS < * p_notice_attr < * [in/out] Pointer to the notice attribute structure whose type to set. 
< * < * type < * [in] Type of notice trap. < * < * RETURN VALUES < * This function does not return a value. < * < * SEE ALSO < * ib_mad_notice_attr_t, ib_notice_get_type < *********/ --- > } PACK_SUFFIX ib_vl_arb_element_t; > #include > /************/ 5686c6067,6068 < /****f* IBA Base: Types/ib_notice_get_prod_type --- > #define IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK 32 > /****s* IBA Base: Types/ib_vl_arb_table_t 5688c6070 < * ib_notice_get_prod_type --- > * ib_vl_arb_table_t 5691c6073 < * Retrieves the producer type from a generic notice trap. --- > * IBA defined VL Arbitration Table. (14.2.5.9) 5695,5697c6077,6078 < AL_INLINE uint32_t AL_API < ib_notice_get_prod_type( < IN const ib_mad_notice_attr_t* const p_notice_attr ) --- > #include > typedef struct _ib_vl_arb_table 5699,5712c6080,6084 < return (cl_ntoh32( p_notice_attr->combo1 ) >> 8); < } < /* < * PARAMETERS < * p_notice_attr < * [in] Pointer to the notice attribute structure whose < * prducer type to return. < * < * RETURN VALUES < * Returns the producer type of the notice, in host byte order. < * < * SEE ALSO < * ib_mad_notice_attr_t, ib_notice_set_prod_type < *********/ --- > ib_vl_arb_element_t vl_entry[IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK]; > > } PACK_SUFFIX ib_vl_arb_table_t; > #include > /************/ 5714c6086 < /****f* IBA Base: Types/ib_notice_set_prod_type --- > /****s* IBA Base: Types/ib_vl_arb_table_record_t 5716c6088 < * ib_notice_set_prod_type --- > * ib_vl_arb_table_record_t 5719c6091 < * Sets the producer type of a generic notice trap. --- > * IBA defined VL Arbitration Table Record for SA Query. 
(15.2.5.9) 5723,5726c6095,6096 < AL_INLINE void AL_API < ib_notice_set_prod_type( < IN OUT ib_mad_notice_attr_t* const p_notice_attr, < IN const uint32_t prod_type ) --- > #include > typedef struct _ib_vl_arb_table_record 5728c6098,6106 < uint32_t val; --- > ib_net16_t lid; // for CA: lid of port, for switch lid of port 0 > uint8_t port_num; > uint8_t block_num; > uint32_t reserved; > ib_vl_arb_table_t vl_arb_tbl; > > } PACK_SUFFIX ib_vl_arb_table_record_t; > #include > /************/ 5730,5736d6107 < val = cl_ntoh32( p_notice_attr->combo1 ); < /* Clear the type. */ < val &= 0x000000FF; < /* Set new value. */ < val |= (prod_type << 8); < p_notice_attr->combo1 = cl_hton32( val ); < } 5738,5751c6109,6119 < * PARAMETERS < * p_notice_attr < * [in/out] Pointer to the notice attribute structure < * whose producer type to set. < * < * prod_type < * [in] Producer type of notice trap. < * < * RETURN VALUES < * This function does not return a value. < * < * SEE ALSO < * ib_mad_notice_attr_t, ib_notice_get_prod_type < *********/ --- > * Global route header information received with unreliable datagram messages > */ > #include > typedef struct _ib_grh > { > ib_net32_t ver_class_flow; > ib_net16_t resv1; > uint8_t resv2; > uint8_t hop_limit; > ib_gid_t src_gid; > ib_gid_t dest_gid; 5753c6121,6124 < /****f* IBA Base: Types/ib_notice_get_vend_id --- > } PACK_SUFFIX ib_grh_t; > #include > > /****f* IBA Base: Types/ib_grh_get_ver_class_flow 5755c6126 < * ib_notice_get_vend_id --- > * ib_grh_get_ver_class_flow 5758c6129 < * Retrieves the vendor ID from a vendor specific notice trap. 
--- > * Get encoded version, traffic class and flow label in grh 5762,5764c6133,6138 < AL_INLINE uint32_t AL_API < ib_notice_get_vend_id( < IN const ib_mad_notice_attr_t* const p_notice_attr ) --- > static inline void > ib_grh_get_ver_class_flow( > IN const ib_net32_t ver_class_flow, > OUT uint8_t* const p_ver, > OUT uint8_t* const p_tclass, > OUT uint32_t* const p_flow_lbl ) 5766c6140,6153 < return ib_notice_get_prod_type( p_notice_attr ); --- > ib_net32_t tmp_ver_class_flow; > > if (p_ver) > *p_ver = (uint8_t)(ver_class_flow & 0x0f); > > tmp_ver_class_flow = ver_class_flow >> 4; > > if (p_tclass) > *p_tclass = (uint8_t)(tmp_ver_class_flow & 0xff); > > tmp_ver_class_flow = tmp_ver_class_flow >> 8; > > if (p_flow_lbl) > *p_flow_lbl = tmp_ver_class_flow & 0xfffff; 5770,5772c6157,6158 < * p_notice_attr < * [in] Pointer to the notice attribute structure whose < * vendor ID to return. --- > * ver_class_flow > * [in] the version, traffic class and flow label info. 5775c6161,6170 < * Returns the vendor ID of the notice, in host byte order. --- > * p_ver > * [out] pointer to the version info. > * > * p_tclass > * [out] pointer to the traffic class info. > * > * p_flow_lbl > * [out] pointer to the flow label info > * > * NOTES 5778c6173 < * ib_mad_notice_attr_t, ib_notice_set_vend_id --- > * ib_grh_t 5781c6176 < /****f* IBA Base: Types/ib_notice_set_vend_id --- > /****f* IBA Base: Types/ib_grh_set_ver_class_flow 5783c6178 < * ib_notice_set_vend_id --- > * ib_grh_set_ver_class_flow 5786c6181 < * Sets the vendor ID of a vendor specific notice trap. 
--- > * Set encoded version, traffic class and flow label in grh 5790,5793c6185,6189 < AL_INLINE void AL_API < ib_notice_set_vend_id( < IN OUT ib_mad_notice_attr_t* const p_notice_attr, < IN const uint32_t vend_id ) --- > static inline ib_net32_t > ib_grh_set_ver_class_flow( > IN const uint8_t ver, > IN const uint8_t tclass, > IN const uint32_t flow_lbl ) 5795c6191,6198 < ib_notice_set_prod_type( p_notice_attr, vend_id ); --- > ib_net32_t ver_class_flow; > > ver_class_flow = flow_lbl; > ver_class_flow = ver_class_flow << 8; > ver_class_flow = ver_class_flow | tclass; > ver_class_flow = ver_class_flow << 4; > ver_class_flow = ver_class_flow | ver; > return (ver_class_flow); 5799,5801c6202,6203 < * p_notice_attr < * [in/out] Pointer to the notice attribute structure < * whose vendor ID to set. --- > * ver > * [in] the version info. 5803,5804c6205,6209 < * vend_id < * [in] Vendor ID of notice trap. --- > * tclass > * [in] the traffic class info. > * > * flow_lbl > * [in] the flow label info 5807c6212,6215 < * This function does not return a value. --- > * ver_class_flow > * [out] the version, traffic class and flow label info. > * > * NOTES 5810c6218 < * ib_mad_notice_attr_t, ib_notice_get_vend_id --- > * ib_grh_t 5813c6221 < /****f* IBA Base: Types/ib_notice_get_trap_num --- > /****s* IBA Base: Types/ib_member_rec_t 5815c6223 < * ib_notice_get_trap_num --- > * ib_member_rec_t 5818c6226,6227 < * Retrieves the trap number from a generic notice trap. --- > * Multicast member record, used to create, join, and leave multicast > * groups. 
5822,5824c6231,6232 < AL_INLINE uint16_t AL_API < ib_notice_get_trap_num( < IN const ib_mad_notice_attr_t* const p_notice_attr ) --- > #include > typedef struct _ib_member_rec 5826,5827c6234,6250 < return cl_ntoh16( p_notice_attr->combo2 ); < } --- > ib_gid_t mgid; > ib_gid_t port_gid; > ib_net32_t qkey; > ib_net16_t mlid; > uint8_t mtu; > uint8_t tclass; > ib_net16_t pkey; > uint8_t rate; > uint8_t pkt_life; > ib_net32_t sl_flow_hop; > uint8_t scope_state; > uint8_t proxy_join:1; > uint8_t reserved[2]; > uint8_t pad[4]; > > } PACK_SUFFIX ib_member_rec_t; > #include 5829,5832c6252,6254 < * PARAMETERS < * p_notice_attr < * [in] Pointer to the notice attribute structure whose < * trap number to return. --- > * FIELDS > * mgid > * Multicast GID address for this multicast group. 5834,5835c6256,6257 < * RETURN VALUES < * Returns the vendor ID of the notice, in host byte order. --- > * port_gid > * Valid GID of the endpoint joining this multicast group. 5837,5843c6259,6260 < * SEE ALSO < * ib_mad_notice_attr_t, ib_notice_set_trap_num < *********/ < < /****f* IBA Base: Types/ib_notice_set_trap_num < * NAME < * ib_notice_set_trap_num --- > * qkey > * Q_Key to be used by this multicast group. 5845,5846c6262,6263 < * DESCRIPTION < * Sets the trap number of a generic notice trap. --- > * mlid > * Multicast LID for this multicast group. 5848,5861c6265,6266 < * SYNOPSIS < */ < AL_INLINE void AL_API < ib_notice_set_trap_num( < IN OUT ib_mad_notice_attr_t* const p_notice_attr, < IN const uint16_t trap_num ) < { < p_notice_attr->combo2 = cl_hton16( trap_num ); < } < /* < * PARAMETERS < * p_notice_attr < * [in/out] Pointer to the notice attribute structure < * whose trap number to set. --- > * mtu > * MTU and MTU selector fields to use on this path 5863,5864c6268,6269 < * trap_num < * [in] Trap number to set. --- > * tclass > * Another global routing parameter. 5866,5867c6271,6272 < * RETURN VALUES < * This function does not return a value.
--- > * pkey > * Partition key (P_Key) to use for this member. 5869,5875c6274,6275 < * SEE ALSO < * ib_mad_notice_attr_t, ib_notice_get_trap_num < *********/ < < /****f* IBA Base: Types/ib_notice_get_dev_id < * NAME < * ib_notice_get_dev_id --- > * rate > * Rate and rate selector fields to use on this path. 5877,5878c6277,6278 < * DESCRIPTION < * Retrieves the device ID from a vendor specific notice trap. --- > * pkt_life > * Packet lifetime 5880,5892c6280,6281 < * SYNOPSIS < */ < AL_INLINE uint16_t AL_API < ib_notice_get_dev_id( < IN const ib_mad_notice_attr_t* const p_notice_attr ) < { < return ib_notice_get_trap_num( p_notice_attr ); < } < /* < * PARAMETERS < * p_notice_attr < * [in] Pointer to the notice attribute structure whose < * device ID to return. --- > * sl_flow_hop > * Global routing parameters: service level, hop count, and flow label. 5894,5895c6283,6287 < * RETURN VALUES < * Returns the vendor ID of the notice, in host byte order. --- > * scope_state > * MGID scope and JoinState of multicast request. > * > * proxy_join > * Enables others in the Partition to proxy add/remove from the group 5898d6289 < * ib_mad_notice_attr_t, ib_notice_set_dev_id 5901c6292 < /****f* IBA Base: Types/ib_notice_set_dev_id --- > /****f* IBA Base: Types/ib_member_get_sl_flow_hop 5903c6294 < * ib_notice_set_dev_id --- > * ib_member_get_sl_flow_hop 5906c6297 < * Sets the producer type of a vendor specific notice trap. 
--- > * Get encoded sl, flow label, and hop limit 5910,5913c6301,6306 < AL_INLINE void AL_API < ib_notice_set_dev_id( < IN OUT ib_mad_notice_attr_t* const p_notice_attr, < IN const uint16_t dev_id ) --- > static inline void > ib_member_get_sl_flow_hop( > IN const ib_net32_t sl_flow_hop, > OUT uint8_t* const p_sl, > OUT uint32_t* const p_flow_lbl, > OUT uint8_t* const p_hop ) 5915c6308,6320 < ib_notice_set_trap_num( p_notice_attr, dev_id ); --- > uint32_t tmp; > > tmp = cl_ntoh32(sl_flow_hop); > if (p_hop) > *p_hop = (uint8_t)tmp; > tmp >>= 8; > > if (p_flow_lbl) > *p_flow_lbl = (uint32_t)(tmp & 0xfffff); > tmp >>= 20; > > if (p_sl) > *p_sl = (uint8_t)tmp; 5919,5924c6324,6325 < * p_notice_attr < * [in/out] Pointer to the notice attribute structure < * whose device ID to set. < * < * dev_id < * [in] Device ID of notice trap. --- > * sl_flow_hop > * [in] the sl, flow label, and hop limit of MC Group 5927c6328,6337 < * This function does not return a value. --- > * p_sl > * [out] pointer to the service level > * > * p_flow_lbl > * [out] pointer to the flow label info > * > * p_hop > * [out] pointer to the hop count limit. > * > * NOTES 5930c6340 < * ib_mad_notice_attr_t, ib_notice_get_dev_id --- > * ib_member_rec_t 5933c6343 < /****f* IBA Base: Types/ib_notice_get_toggle --- > /****f* IBA Base: Types/ib_member_set_sl_flow_hop 5935c6345 < * ib_notice_get_toggle --- > * ib_member_set_sl_flow_hop 5938c6348 < * Retrieves the notice toggle bit from a notice trap. 
--- > * Set encoded sl, flow label, and hop limit 5942,5944c6352,6356 < AL_INLINE boolean_t AL_API < ib_notice_get_toggle( < IN const ib_mad_notice_attr_t* const p_notice_attr ) --- > static inline ib_net32_t > ib_member_set_sl_flow_hop( > IN const uint8_t sl, > IN const uint32_t flow_label, > IN const uint8_t hop_limit ) 5946c6358,6361 < return (cl_ntoh16( p_notice_attr->combo3 ) & 0x0001); --- > uint32_t tmp; > > tmp = (sl << 28) | ((flow_label & 0xfffff) << 8) | hop_limit; > return cl_hton32(tmp); 5950,5952c6365,6372 < * p_notice_attr < * [in] Pointer to the notice attribute structure whose < * notice toggle bit value to return. --- > * sl > * [in] the service level. > * > * flow_lbl > * [in] the flow label info > * > * hop_limit > * [in] the hop limit. 5955c6375,6376 < * Returns TRUE if the notice toggle bit of the notice is set. --- > * sl_flow_hop > * [out] the sl, flow label, and hop limit 5957c6378 < * Returns FALSE otherwise. --- > * NOTES 5960c6381 < * ib_mad_notice_attr_t, ib_notice_set_toggle --- > * ib_member_rec_t 5963c6384 < /****f* IBA Base: Types/ib_notice_set_toggle --- > /****f* IBA Base: Types/ib_member_get_scope_state 5965c6386 < * ib_notice_set_toggle --- > * ib_member_get_scope_state 5968c6389 < * Sets the notice toggle bit of a notice trap. 
--- > * Get encoded MGID scope and JoinState 5972,5975c6393,6397 < AL_INLINE void AL_API < ib_notice_set_toggle( < IN OUT ib_mad_notice_attr_t* const p_notice_attr, < IN const boolean_t toggle_val ) --- > static inline void > ib_member_get_scope_state( > IN const uint8_t scope_state, > OUT uint8_t* const p_scope, > OUT uint8_t* const p_state ) 5977,5983c6399,6408 < uint16_t val; < val = cl_ntoh16( p_notice_attr->combo3 ); < if( toggle_val ) < val |= 0x0001; < else < val &= 0xFFFE; < p_notice_attr->combo3 = cl_hton16( val ); --- > uint8_t tmp_scope_state; > > if (p_state) > *p_state = (uint8_t)(scope_state & 0x0f); > > tmp_scope_state = scope_state >> 4; > > if (p_scope) > *p_scope = (uint8_t)(tmp_scope_state & 0x0f); > 5987,5993c6412,6413 < * p_notice_attr < * [in/out] Pointer to the notice attribute structure < * whose notice toggle bit to set or clear. < * < * toggle_val < * [in] Boolean value indicating whether the toggle bit of the notice < * should be set or cleared. --- > * scope_state > * [in] the scope and state 5996c6416,6422 < * This function does not return a value. --- > * p_scope > * [out] pointer to the MGID scope > * > * p_state > * [out] pointer to the join state > * > * NOTES 5999c6425 < * ib_mad_notice_attr_t, ib_notice_get_toggle --- > * ib_member_rec_t 6002c6428 < /****f* IBA Base: Types/ib_notice_get_count --- > /****f* IBA Base: Types/ib_member_set_scope_state 6004c6430 < * ib_notice_get_count --- > * ib_member_set_scope_state 6007c6433 < * Retrieves the notice toggle count from a notice trap. 
--- > * Set encoded version, MGID scope and JoinState 6011,6013c6437,6440 < AL_INLINE boolean_t AL_API < ib_notice_get_count( < IN const ib_mad_notice_attr_t* const p_notice_attr ) --- > static inline uint8_t > ib_member_set_scope_state( > IN const uint8_t scope, > IN const uint8_t state ) 6015c6442,6447 < return ((cl_ntoh16( p_notice_attr->combo3 ) & 0xFFFE) >> 1); --- > uint8_t scope_state; > > scope_state = scope; > scope_state = scope_state << 4; > scope_state = scope_state | state; > return (scope_state); 6019,6021c6451,6455 < * p_notice_attr < * [in] Pointer to the notice attribute structure whose < * notice toggle count to return. --- > * scope > * [in] the MGID scope > * > * state > * [in] the JoinState 6024c6458,6461 < * Returns the notice toggle count of the notice. --- > * scope_state > * [out] the encoded one > * > * NOTES 6027c6464 < * ib_mad_notice_attr_t, ib_notice_set_count --- > * ib_member_rec_t 6030c6467 < /****f* IBA Base: Types/ib_notice_set_count --- > /****f* IBA Base: Types/ib_member_set_join_state 6032c6469 < * ib_notice_set_count --- > * ib_member_set_join_state 6035c6472 < * Sets the toggle count of a notice trap. --- > * Set JoinState 6039,6042c6476,6479 < AL_INLINE void AL_API < ib_notice_set_count( < IN OUT ib_mad_notice_attr_t* const p_notice_attr, < IN const uint16_t toggle_cnt ) --- > static inline void > ib_member_set_join_state( > IN OUT ib_member_rec_t *p_mc_rec, > IN const uint8_t state ) 6044,6048c6481,6482 < uint16_t val; < val = cl_ntoh16( p_notice_attr->combo3 ); < val &= 0x0001; < val |= (toggle_cnt << 1); < p_notice_attr->combo3 = cl_hton16( val ); --- > /* keep the scope as it is */ > p_mc_rec->scope_state = (p_mc_rec->scope_state & 0xF0) | (0x0f & state); 6052,6054c6486,6487 < * p_notice_attr < * [in/out] Pointer to the notice attribute structure < * whose device ID to set. --- > * p_mc_rec > * [in] pointer to the member record 6056,6057c6489,6490 < * toggle_cnt < * [in] Toggle count value of the notice. 
--- > * state > * [in] the JoinState 6060c6493,6495 < * This function does not return a value. --- > * NONE > * > * NOTES 6063c6498 < * ib_mad_notice_attr_t, ib_notice_get_count --- > * ib_member_rec_t 6065a6501,6518 > /* > * Join State Codes: > */ > #define IB_MC_REC_STATE_FULL_MEMBER 0x01 > #define IB_MC_REC_STATE_NON_MEMBER 0x02 > #define IB_MC_REC_STATE_SEND_ONLY_MEMBER 0x04 > > /* > * Generic MAD notice types > */ > #define IB_NOTICE_TYPE_FATAL 0x00 > #define IB_NOTICE_TYPE_URGENT 0x01 > #define IB_NOTICE_TYPE_SECURITY 0x02 > #define IB_NOTICE_TYPE_SUBN_MGMT 0x03 > #define IB_NOTICE_TYPE_INFO 0x04 > #define IB_NOTICE_TYPE_EMPTY 0x7F > > 6067c6520 < typedef struct _ib_inform_info --- > typedef struct _ib_mad_notice_attr // Total Size calc Accumulated 6069c6522 < ib_gid_t gid; --- > uint8_t generic_type; // 1 1 6071,6078c6524,6531 < ib_net16_t lid_range_begin; < ib_net16_t lid_range_end; < // EZ: not v1.1 ib_net16_t pkey; < ib_net16_t reserved; < uint8_t is_generic; < uint8_t subscribe; < // EZ: not v1.1 ib_net16_t class_range; < ib_net16_t trap_type; --- > union _notice_g_or_v > { > struct _notice_generic // 5 6 > { > uint8_t prod_type_msb; > ib_net16_t prod_type_lsb; > ib_net16_t trap_num; > } PACK_SUFFIX generic; 6080,6085c6533,6539 < /* trap num or dev_id */ < ib_net16_t combo1; < /* QPN:24, resv:3, resp_time_val:5 */ < ib_net32_t combo2; < /* resv:8, producer type or vendor id:24 */ < ib_net32_t combo3; --- > struct _notice_vend > { > uint8_t vend_id_msb; > ib_net16_t vend_id_lsb; > ib_net16_t dev_id; > } PACK_SUFFIX vend; > } g_or_v; 6087,6088c6541,6565 < } PACK_SUFFIX ib_inform_info_t; < #include --- > ib_net16_t issuer_lid; // 2 8 > ib_net16_t toggle_count; // 2 10 > > union _data_details // 54 64 > { > struct _raw_data > { > uint8_t details[54]; > } PACK_SUFFIX raw_data; > > struct _ntc_64_67 > { > uint8_t res[6]; > ib_gid_t gid; // the Node or Multicast Group that came in/out > } PACK_SUFFIX ntc_64_67; > > struct _ntc_128 { > ib_net16_t sw_lid; // the 
sw lid of which link state changed > } PACK_SUFFIX ntc_128; > > struct _ntc_129_131 { > ib_net16_t pad; > ib_net16_t lid; // lid and port number of the violation > uint8_t port_num; > } PACK_SUFFIX ntc_129_131; 6089a6567,6629 > struct _ntc_144 { > ib_net16_t pad1; > ib_net16_t lid; // lid where capability mask changed > ib_net16_t pad2; > ib_net32_t new_cap_mask; // new capability mask > } PACK_SUFFIX ntc_144; > > struct _ntc_145 { > ib_net16_t pad1; > ib_net16_t lid; // lid where sys guid changed > ib_net16_t pad2; > ib_net64_t new_sys_guid; // new system image guid > } PACK_SUFFIX ntc_145; > > struct _ntc_256 { // total: 54 > ib_net16_t pad1; // 2 > ib_net16_t lid; // 2 > ib_net16_t pad2; // 2 > uint8_t method; // 1 > uint8_t pad3; // 1 > ib_net16_t attr_id; // 2 > ib_net32_t attr_mod; // 4 > ib_net64_t mkey; // 8 > uint8_t dr_slid; // 1 > uint8_t dr_trunc_hop; // 1 > uint8_t dr_rtn_path[30]; // 30 > } PACK_SUFFIX ntc_256; > > struct _ntc_257_258 // violation of p/q_key // 49 > { > ib_net16_t pad1; // 2 > ib_net16_t lid1; // 2 > ib_net16_t lid2; // 2 > ib_net32_t key; // 2 > uint8_t sl; // 1 > ib_net32_t qp1; // 4 > ib_net32_t qp2; // 4 > ib_gid_t gid1; // 16 > ib_gid_t gid2; // 16 > } PACK_SUFFIX ntc_257_258; > > struct _ntc_259 // p/q_key violation with sw info 53 > { > ib_net16_t data_valid; // 2 > ib_net16_t lid1; // 2 > ib_net16_t lid2; // 2 > ib_net32_t key; // 4 > uint8_t sl; // 1 > ib_net32_t qp1; // 4 > uint8_t qp2_msb; // 1 > ib_net16_t qp2_lsb; // 2 > ib_gid_t gid1; // 16 > ib_gid_t gid2; // 16 > ib_net16_t sw_lid; // 2 > uint8_t port_no; // 1 > } PACK_SUFFIX ntc_259; > > } data_details; > > ib_gid_t issuer_gid; // 16 80 > > } PACK_SUFFIX ib_mad_notice_attr_t; > #include 6091c6631 < /****f* IBA Base: Types/ib_inform_get_trap_num --- > /****f* IBA Base: Types/ib_notice_is_generic 6093c6633 < * ib_inform_get_trap_num --- > * ib_notice_is_generic 6096c6636 < * Retrieves the trap number from an inform info structure. 
--- > * Check if the notice is generic 6100,6102c6640,6642 < AL_INLINE uint16_t AL_API < ib_inform_get_trap_num( < IN const ib_inform_info_t* const p_inform_info ) --- > static inline boolean_t > ib_notice_is_generic( > IN const ib_mad_notice_attr_t *p_ntc) 6104c6644 < return cl_ntoh16( p_inform_info->combo1 ); --- > return (p_ntc->generic_type & 0x80); 6108,6110c6648,6649 < * p_inform_info < * [in] Pointer to the inform info structure whose < * trap number to return. --- > * p_ntc > * [in] Pointer to the notice MAD attribute 6113c6652 < * Returns the trap number of the infrom info, in host byte order. --- > * TRUE if mad is generic 6116c6655 < * ib_inform_info_t, ib_inform_set_trap_num --- > * ib_mad_notice_attr_t 6119c6658 < /****f* IBA Base: Types/ib_inform_set_trap_num --- > /****f* IBA Base: Types/ib_notice_get_type 6121c6660 < * ib_inform_set_trap_num --- > * ib_notice_get_type 6124c6663 < * Sets the trap number of an inform info structure. --- > * Get the notice type 6128,6131c6667,6669 < AL_INLINE void AL_API < ib_inform_set_trap_num( < IN OUT ib_inform_info_t* const p_inform_info, < IN const uint16_t trap_num ) --- > static inline uint8_t > ib_notice_get_type( > IN const ib_mad_notice_attr_t *p_ntc) 6133c6671 < p_inform_info->combo1 = cl_hton16( trap_num ); --- > return p_ntc->generic_type & 0x7f; 6137,6142c6675,6676 < * p_inform_info < * [in/out] Pointer to the inform info structure < * whose trap number to set. < * < * trap_num < * [in] Trap number to set. --- > * p_ntc > * [in] Pointer to the notice MAD attribute 6145c6679 < * This function does not return a value. --- > * TRUE if mad is generic 6148c6682 < * ib_inform_info_t, ib_inform_get_trap_num --- > * ib_mad_notice_attr_t 6151c6685 < /****f* IBA Base: Types/ib_inform_get_dev_id --- > /****f* IBA Base: Types/ib_notice_get_prod_type 6153c6687 < * ib_inform_get_dev_id --- > * ib_notice_get_prod_type 6156c6690 < * Retrieves the device ID from a vendor specific inform trap. 
--- > * Get the notice Producer Type of Generic Notice 6160,6162c6694,6696 < AL_INLINE uint16_t AL_API < ib_inform_get_dev_id( < IN const ib_inform_info_t* const p_inform_info ) --- > static inline ib_net32_t > ib_notice_get_prod_type( > IN const ib_mad_notice_attr_t *p_ntc) 6164c6698,6701 < return ib_inform_get_trap_num( p_inform_info ); --- > uint32_t pt; > pt = cl_ntoh16(p_ntc->g_or_v.generic.prod_type_lsb) | > (p_ntc->g_or_v.generic.prod_type_msb << 16); > return cl_hton32(pt); 6168,6170c6705,6706 < * p_inform_info < * [in] Pointer to the inform info structure whose < * device ID to return. --- > * p_ntc > * [in] Pointer to the notice MAD attribute 6173c6709 < * Returns the vendor ID of the inform info, in host byte order. --- > * The producer type 6176c6712 < * ib_inform_info_t, ib_inform_set_dev_id --- > * ib_mad_notice_attr_t 6179c6715 < /****f* IBA Base: Types/ib_inform_set_dev_id --- > /****f* IBA Base: Types/ib_notice_set_prod_type 6181c6717 < * ib_inform_set_dev_id --- > * ib_notice_set_prod_type 6184c6720 < * Sets the producer type of a vendor specific inform trap. --- > * Set the notice Producer Type of Generic Notice 6188,6191c6724,6727 < AL_INLINE void AL_API < ib_inform_set_dev_id( < IN OUT ib_inform_info_t* const p_inform_info, < IN const uint16_t dev_id ) --- > static inline void > ib_notice_set_prod_type( > IN ib_mad_notice_attr_t *p_ntc, > IN ib_net32_t prod_type_val) 6193c6729,6731 < ib_inform_set_trap_num( p_inform_info, dev_id ); --- > uint32_t ptv = cl_ntoh32(prod_type_val); > p_ntc->g_or_v.generic.prod_type_lsb = cl_hton16((uint16_t)(ptv & 0x0000ffff)); > p_ntc->g_or_v.generic.prod_type_msb = (uint8_t)( (ptv & 0x00ff0000) >> 16); 6197,6199c6735,6736 < * p_inform_info < * [in/out] Pointer to the inform info structure < * whose device ID to set. --- > * p_ntc > * [in] Pointer to the notice MAD attribute 6201,6202c6738,6739 < * dev_id < * [in] Device ID of inform trap. 
--- > * prod_type > * [in] The producer Type code 6205c6742 < * This function does not return a value. --- > * None 6208c6745 < * ib_inform_info_t, ib_inform_get_dev_id --- > * ib_mad_notice_attr_t 6211c6748 < /****f* IBA Base: Types/ib_inform_get_qpn --- > /****f* IBA Base: Types/ib_notice_get_vend_id 6213c6750 < * ib_inform_get_qpn --- > * ib_notice_get_vend_id 6216c6753 < * Retrieves the QPN from an inform info structure. --- > * Get the Vendor Id of Vendor type Notice 6220,6222c6757,6759 < AL_INLINE net32_t AL_API < ib_inform_get_qpn( < IN const ib_inform_info_t* const p_inform_info ) --- > static inline ib_net32_t > ib_notice_get_vend_id( > IN const ib_mad_notice_attr_t *p_ntc) 6224c6761,6764 < return (p_inform_info->combo2 & CL_NTOH32( 0x00FFFFFF )); --- > uint32_t vi; > vi = cl_ntoh16(p_ntc->g_or_v.vend.vend_id_lsb) | > (p_ntc->g_or_v.vend.vend_id_msb << 16); > return cl_hton32(vi); 6228,6230c6768,6769 < * p_inform_info < * [in] Pointer to the inform info structure whose < * QPN to return. --- > * p_ntc > * [in] Pointer to the notice MAD attribute 6233c6772 < * Returns the QPN of the infrom info. --- > * The Vendor Id of Vendor type Notice 6236c6775 < * ib_inform_info_t, ib_inform_set_qpn --- > * ib_mad_notice_attr_t 6239c6778 < /****f* IBA Base: Types/ib_inform_set_qpn --- > /****f* IBA Base: Types/ib_notice_set_vend_id 6241c6780 < * ib_inform_set_qpn --- > * ib_notice_set_vend_id 6244c6783 < * Sets the QPN of an inform info structure. 
--- > * Set the notice Producer Type of Generic Notice 6248,6251c6787,6790 < AL_INLINE void AL_API < ib_inform_set_qpn( < IN OUT ib_inform_info_t* const p_inform_info, < IN const net32_t qpn ) --- > static inline void > ib_notice_set_vend_id( > IN ib_mad_notice_attr_t *p_ntc, > IN ib_net32_t vend_id) 6253,6254c6792,6794 < p_inform_info->combo2 &= CL_NTOH32( 0xFF000000 ); < p_inform_info->combo2 |= (qpn & CL_NTOH32( 0x00FFFFFF )); --- > uint32_t vi = cl_ntoh32(vend_id); > p_ntc->g_or_v.vend.vend_id_lsb = cl_hton16((uint16_t)(vi & 0x0000ffff)); > p_ntc->g_or_v.vend.vend_id_msb = (uint8_t)((vi & 0x00ff0000) >> 16); 6258,6260c6798,6799 < * p_inform_info < * [in/out] Pointer to the inform info structure < * whose QPN to set. --- > * p_ntc > * [in] Pointer to the notice MAD attribute 6262,6263c6801,6802 < * qpn < * [in] QPN of the inform info. --- > * vend_id > * [in] The producer Type code 6266c6805 < * This function does not return a value. --- > * None 6269c6808 < * ib_inform_info_t, ib_inform_get_qpn --- > * ib_mad_notice_attr_t 6272,6283c6811,6812 < /****f* IBA Base: Types/ib_inform_get_resp_time_val < * NAME < * ib_inform_get_resp_time_val < * < * DESCRIPTION < * Retrieves the response time value from an inform info structure. < * < * SYNOPSIS < */ < AL_INLINE uint8_t AL_API < ib_inform_get_resp_time_val( < IN const ib_inform_info_t* const p_inform_info ) --- > #include > typedef struct _ib_inform_info 6285,6298c6814,6839 < return (uint8_t)(cl_ntoh32( p_inform_info->combo2 ) >> 27); < } < /* < * PARAMETERS < * p_inform_info < * [in] Pointer to the inform info structure whose < * response time value to return. < * < * RETURN VALUES < * Returns the response time value of the infrom info. 
< * < * SEE ALSO < * ib_inform_info_t, ib_inform_set_resp_time_val < *********/ --- > ib_gid_t gid; > ib_net16_t lid_range_begin; > ib_net16_t lid_range_end; > ib_net16_t reserved1; > uint8_t is_generic; > uint8_t subscribe; > ib_net16_t trap_type; > union _inform_g_or_v > { > struct _inform_generic > { > ib_net16_t trap_num; > ib_net32_t qpn_resp_time_val; > uint8_t reserved2; > uint8_t node_type_msb; > ib_net16_t node_type_lsb; > } PACK_SUFFIX generic; > > struct _inform_vend > { > ib_net16_t dev_id; > ib_net32_t qpn_resp_time_val; > uint8_t reserved2; > uint8_t vendor_id_msb; > ib_net16_t vendor_id_lsb; > } PACK_SUFFIX vend; 6300c6841,6846 < /****f* IBA Base: Types/ib_inform_set_resp_time_val --- > } PACK_SUFFIX g_or_v; > > } PACK_SUFFIX ib_inform_info_t; > #include > > /****f* IBA Base: Types/ib_inform_info_get_qpn_resp_time 6302c6848 < * ib_inform_set_resp_time_val --- > * ib_inform_info_get_qpn_resp_time 6305c6851 < * Sets the response time value of an inform info structure. --- > * Get QPN of the inform info 6309,6312c6855,6859 < AL_INLINE void AL_API < ib_inform_set_resp_time_val( < IN OUT ib_inform_info_t* const p_inform_info, < IN const uint8_t resp_time_val ) --- > static inline void > ib_inform_info_get_qpn_resp_time( > IN const ib_net32_t qpn_resp_time_val, > OUT ib_net32_t* const p_qpn, > OUT uint8_t* const p_resp_time_val ) 6314c6861 < uint32_t val; --- > uint32_t tmp = cl_ntoh32(qpn_resp_time_val); 6316,6319c6863,6867 < val = cl_ntoh32( p_inform_info->combo2 ); < val &= 0x07FFFFFF; < val |= (resp_time_val << 27); < p_inform_info->combo2 = cl_hton32( val ); --- > if (p_qpn) > *p_qpn = cl_hton32((tmp & 0xffffff00) >> 8); > > if (p_resp_time_val) > *p_resp_time_val = (uint8_t)(tmp & 0x0000001f); 6323,6328c6871,6872 < * p_inform_info < * [in/out] Pointer to the inform info structure < * whose response time value to set. < * < * resp_time_val < * [in] Response time value of the inform info. 
--- > * qpn_resp_time_val > * [in] the qpn and resp time val from the mad 6331,6339c6875,6876 < * This function does not return a value. < * < * SEE ALSO < * ib_inform_info_t, ib_inform_get_resp_time_val < *********/ < < /****f* IBA Base: Types/ib_inform_get_prod_type < * NAME < * ib_inform_get_prod_type --- > * p_qpn > * [out] pointer to the qpn 6341,6356c6878,6879 < * DESCRIPTION < * Retrieves the producer type from an inform info structure. < * < * SYNOPSIS < */ < AL_INLINE uint32_t AL_API < ib_inform_get_prod_type( < IN const ib_inform_info_t* const p_inform_info ) < { < return (cl_ntoh32( p_inform_info->combo3 ) >> 8); < } < /* < * PARAMETERS < * p_inform_info < * [in] Pointer to the inform info structure whose < * prducer type to return. --- > * p_state > * [out] pointer to the resp time val 6358,6359c6881 < * RETURN VALUES < * Returns the producer type of the infrom info, in host byte order. --- > * NOTES 6362c6884 < * ib_inform_info_t, ib_inform_set_prod_type --- > * ib_inform_info_t 6365c6887 < /****f* IBA Base: Types/ib_inform_set_prod_type --- > /****f* IBA Base: Types/ib_inform_info_set_qpn 6367c6889 < * ib_inform_set_prod_type --- > * ib_inform_info_set_qpn 6370c6892 < * Sets the producer type of an inform info structure. --- > * Set the QPN of the inform info 6374,6377c6896,6899 < AL_INLINE void AL_API < ib_inform_set_prod_type( < IN OUT ib_inform_info_t* const p_inform_info, < IN const uint32_t prod_type ) --- > static inline void > ib_inform_info_set_qpn( > IN ib_inform_info_t *p_ii, > IN ib_net32_t const qpn) 6379c6901,6906 < p_inform_info->combo3 = cl_hton32( prod_type << 8 ); --- > uint32_t tmp = cl_ntoh32(p_ii->g_or_v.generic.qpn_resp_time_val); > p_ii->g_or_v.generic.qpn_resp_time_val = > cl_hton32( > (tmp & 0x000000ff) | > ((cl_ntoh32(qpn) << 8) & 0xffffff00) > ); 6383,6388d6909 < * p_inform_info < * [in/out] Pointer to the inform info structure < * whose producer type to set. < * < * prod_type < * [in] Producer type of inform trap. 
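Reading the added lines of ib_inform_info_get_qpn_resp_time and ib_inform_info_set_qpn above, the 32-bit qpn_resp_time_val field appears to carry the QPN in its upper 24 bits and the response time value in its low 5 bits, with the whole word stored in network byte order. A minimal sketch of that packing follows. The helper names pack_qpn_resp_time and unpack_qpn_resp_time are hypothetical, POSIX htonl/ntohl stand in for cl_hton32/cl_ntoh32, and (unlike the patch, which hands the QPN back in network order) both values are kept in host order here.

```c
#include <arpa/inet.h>  /* htonl/ntohl: assumed stand-ins for cl_hton32/cl_ntoh32 */
#include <assert.h>
#include <stdint.h>

/*
 * Assumed layout, inferred from the masks in the diff:
 *   bits 31..8  QPN (24 bits)
 *   bits  7..5  reserved
 *   bits  4..0  response time value (5 bits)
 * Both helpers below are hypothetical and are not part of ib_types.h.
 */
static inline uint32_t pack_qpn_resp_time(uint32_t qpn, uint8_t resp_time)
{
    uint32_t host;

    assert(qpn <= 0x00ffffffu);   /* QPN must fit in 24 bits */
    assert(resp_time <= 0x1fu);   /* response time value must fit in 5 bits */

    host = (qpn << 8) | resp_time;
    return htonl(host);           /* convert to wire (network) order */
}

static inline void unpack_qpn_resp_time(uint32_t wire,
                                        uint32_t *p_qpn,
                                        uint8_t *p_resp_time)
{
    uint32_t host = ntohl(wire);  /* back to host order before masking */

    if (p_qpn)
        *p_qpn = host >> 8;                       /* upper 24 bits */
    if (p_resp_time)
        *p_resp_time = (uint8_t)(host & 0x1fu);   /* low 5 bits */
}
```

Doing the shift and mask in host order and swapping only at the boundary keeps the helpers independent of the machine's endianness, which is the same pattern the patch follows with cl_ntoh32 on entry and cl_hton32 on exit.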
6390,6391c6911 < * RETURN VALUES < * This function does not return a value. --- > * NOTES 6394c6914 < * ib_inform_info_t, ib_inform_get_prod_type --- > * ib_inform_info_t 6397c6917 < /****f* IBA Base: Types/ib_inform_get_vend_id --- > /****f* IBA Base: Types/ib_inform_info_get_node_type 6399c6919 < * ib_inform_get_vend_id --- > * ib_inform_info_get_node_type 6402c6922 < * Retrieves the vendor ID from an inform info structure. --- > * Get Node Type of the Inform Info 6406,6408c6926,6928 < AL_INLINE uint32_t AL_API < ib_inform_get_vend_id( < IN const ib_inform_info_t* const p_inform_info ) --- > static inline ib_net32_t > ib_inform_info_get_node_type( > IN const ib_inform_info_t *p_inf) 6410c6930,6933 < return ib_inform_get_prod_type( p_inform_info ); --- > uint32_t nt; > nt = cl_ntoh16(p_inf->g_or_v.generic.node_type_lsb) | > (p_inf->g_or_v.generic.node_type_msb << 16); > return cl_hton32(nt); 6414,6416c6937,6938 < * p_inform_info < * [in] Pointer to the inform info structure whose < * vendor ID to return. --- > * p_inf > * [in] pointer to an inform info 6419c6941,6943 < * Returns the vendor ID of the infrom info, in host byte order. --- > * The node type > * > * NOTES 6422c6946 < * ib_inform_info_t, ib_inform_set_vend_id --- > * ib_inform_info_t 6425c6949 < /****f* IBA Base: Types/ib_inform_set_vend_id --- > /****f* IBA Base: Types/ib_inform_info_get_vend_id 6427c6951 < * ib_inform_set_vend_id --- > * ib_inform_info_get_vend_id 6430c6954 < * Sets the vendor ID of an inform info structure. 
--- > * Get Node Type of the Inform Info 6434,6437c6958,6960 < AL_INLINE void AL_API < ib_inform_set_vend_id( < IN OUT ib_inform_info_t* const p_inform_info, < IN const uint32_t vend_id ) --- > static inline ib_net32_t > ib_inform_info_get_vend_id( > IN const ib_inform_info_t *p_inf) 6439c6962,6965 < ib_inform_set_prod_type( p_inform_info, vend_id ); --- > uint32_t vi; > vi = cl_ntoh16(p_inf->g_or_v.vend.vendor_id_lsb) | > (p_inf->g_or_v.vend.vendor_id_msb << 16); > return cl_hton32(vi); 6443,6448c6969,6970 < * p_inform_info < * [in/out] Pointer to the inform info structure < * whose vendor ID to set. < * < * vend_id < * [in] Vendor ID of inform trap. --- > * p_inf > * [in] pointer to an inform info 6451c6973,6975 < * This function does not return a value. --- > * The node type > * > * NOTES 6454c6978 < * ib_mad_inform_info_t, ib_inform_get_vend_id --- > * ib_inform_info_t 6457d6980 < 6471,6472c6994,6995 < net16_t subscriber_enum; < uint16_t reserved[3]; --- > ib_net16_t subscriber_enum; > uint8_t reserved[6]; 6477c7000 < /********/ --- > 6504c7027 < ib_mad_t hdr; --- > ib_mad_t header; 6514c7037 < * hdr --- > * header 6545d7067 < #define IB_DM_MAX_CTRL 0xFF; 6586c7108 < AL_INLINE uint8_t AL_API --- > static inline uint8_t 6615c7137 < AL_INLINE uint8_t AL_API --- > static inline uint8_t 6644c7166 < AL_INLINE uint8_t AL_API --- > static inline uint8_t 6649,6654c7171,7175 < if( !slot ) < return SLOT_DOES_NOT_EXIST; < else if( slot-- & 0x01 ) < return (p_iou_info->controller_list[slot >> 1] >> 4); < else < return (p_iou_info->controller_list[slot >> 1] & 0x0F); --- > if( slot >= IB_DM_CTRL_LIST_SIZE ) return SLOT_DOES_NOT_EXIST; > else return (int8_t) > ( (slot%2) ? > ((p_iou_info->controller_list[slot/2] & 0xf0) >> 4 ): > (p_iou_info->controller_list[slot/2] & 0x0f) ); 6662c7183 < * [in] 1-based slot number of the IOC slot to check. --- > * [in] Pointer to the IO Unit information structure. 6665,6666c7186 < * Returns the encoded value for the desired slot. 
Possible values are < * SLOT_DOES_NOT_EXIST, IOC_NOT_INSTALLED, and IOC_INSTALLED. --- > * OptionROM field of the IO Unit information. 6669d7188 < * The input slot number is 1-based, not zero based. 6685c7204 < typedef __declspec(align(8)) struct _ib_ioc_profile --- > typedef struct _ib_ioc_profile 6794c7313 < AL_INLINE uint32_t AL_API --- > static inline uint32_t 6802c7321 < AL_INLINE void AL_API --- > static inline void 6811,6826d7329 < AL_INLINE uint32_t AL_API < ib_ioc_profile_get_subsys_vend_id( < IN const ib_ioc_profile_t* const p_ioc_profile ) < { < return( cl_ntoh32(p_ioc_profile->subsys_vend_id) >> 8 ); < } < < < AL_INLINE void AL_API < ib_ioc_profile_set_subsys_vend_id( < IN ib_ioc_profile_t* const p_ioc_profile, < IN const uint32_t subsys_vend_id ) < { < p_ioc_profile->subsys_id = (cl_hton32(subsys_vend_id) << 8); < } < 6886,6891c7389 < /****f* IBA Bases: Types/ib_dm_get_slot_lo_hi < * DESCRIPTION < * Returns the IOC slot number, and the lower and upper bound of the < * service entries given the attribute modifier of ServiceEntries response. < */ < AL_INLINE void AL_API --- > static inline void 6894,6896c7392,7394 < OUT uint8_t *const p_slot OPTIONAL, < OUT uint8_t *const p_lo OPTIONAL, < OUT uint8_t *const p_hi OPTIONAL ) --- > OUT uint8_t *const p_slot, > OUT uint8_t *const p_lo, > OUT uint8_t *const p_hi ) 6901c7399 < *p_slot = (uint8_t)( ( tmp_slot_lo_hi >> 16 ) ); --- > *p_slot = (uint8_t)( ( tmp_slot_lo_hi >> 16 ) & 0x0f ); 6904c7402 < *p_hi = (uint8_t)( ( tmp_slot_lo_hi >> 8 ) ); --- > *p_hi = (uint8_t)( ( tmp_slot_lo_hi >> 8 ) & 0xff ); 6907c7405 < *p_lo = (uint8_t)( ( tmp_slot_lo_hi >> 0 ) ); --- > *p_lo = (uint8_t)( ( tmp_slot_lo_hi >> 0 ) & 0xff ); 6909,6924d7406 < /* SEE ALSO < * ib_dm_set_slot_lo_hi < ********/ < < /****f* IBA Bases: Types/ib_dm_set_slot_lo_hi < * DESCRIPTION < * Joins the IOC slot number, and the lower and upper bound of the service < * entries and returns it. 
< */ < AL_INLINE net32_t AL_API < ib_dm_set_slot_lo_hi( < IN const uint8_t slot, < IN const uint8_t lo, < IN const uint8_t hi ) < { < uint32_t tmp; 6926,6933d7407 < tmp = slot << 16; < tmp |= hi << 8; < tmp |= lo; < return cl_hton32( tmp ); < } < /* SEE ALSO < * ib_dm_get_slot_lo_hi < ********/ 6936c7410 < * Information describing an I/O controller --- > * IBA defined information describing an I/O controller 6938c7412 < #pragma warning(disable:4324) --- > #include 6941,6948c7415,7420 < net64_t chassis_guid; < uint8_t chassis_slot; < net64_t iou_guid; < uint8_t iou_slot; < ib_ioc_profile_t profile; < < } ib_ioc_info_t; < #pragma warning(default:4324) --- > ib_net64_t module_guid; > ib_net64_t iou_guid; > ib_ioc_profile_t ioc_profile; > ib_net64_t access_key; > uint16_t initiators_conf; > uint8_t resv[38]; 6949a7422,7423 > } PACK_SUFFIX ib_ioc_info_t; > #include 6954a7429 > #define IB_MCLASS_CM_VER_1 1 6959,6969c7434,7456 < #define IB_REQ_PDATA_SIZE 92 < #define IB_MRA_PDATA_SIZE 222 < #define IB_REJ_PDATA_SIZE 148 < #define IB_REP_PDATA_SIZE 196 < #define IB_RTU_PDATA_SIZE 224 < #define IB_LAP_PDATA_SIZE 168 < #define IB_APR_PDATA_SIZE 148 < #define IB_DREQ_PDATA_SIZE 220 < #define IB_DREP_PDATA_SIZE 224 < #define IB_SIDR_REQ_PDATA_SIZE 216 < #define IB_SIDR_REP_PDATA_SIZE 136 --- > #define IB_REQ_PDATA_SIZE_VER2 92 > #define IB_MRA_PDATA_SIZE_VER2 222 > #define IB_REJ_PDATA_SIZE_VER2 148 > #define IB_REP_PDATA_SIZE_VER2 196 > #define IB_RTU_PDATA_SIZE_VER2 224 > #define IB_LAP_PDATA_SIZE_VER2 168 > #define IB_APR_PDATA_SIZE_VER2 148 > #define IB_DREQ_PDATA_SIZE_VER2 220 > #define IB_DREP_PDATA_SIZE_VER2 224 > #define IB_SIDR_REQ_PDATA_SIZE_VER2 216 > #define IB_SIDR_REP_PDATA_SIZE_VER2 136 > > #define IB_REQ_PDATA_SIZE_VER1 92 > #define IB_MRA_PDATA_SIZE_VER1 222 > #define IB_REJ_PDATA_SIZE_VER1 148 > #define IB_REP_PDATA_SIZE_VER1 204 > #define IB_RTU_PDATA_SIZE_VER1 224 > #define IB_LAP_PDATA_SIZE_VER1 168 > #define IB_APR_PDATA_SIZE_VER1 151 > #define 
IB_DREQ_PDATA_SIZE_VER1 220 > #define IB_DREP_PDATA_SIZE_VER1 224 > #define IB_SIDR_REQ_PDATA_SIZE_VER1 216 > #define IB_SIDR_REP_PDATA_SIZE_VER1 140 7022a7510,7511 > > #define IB_REJ_SERVICE_HANDOFF CL_HTON16(65535) 7048c7537 < #define IB_AP_INVALID_QPN 6 --- > #define IB_AP_INVALID_QPN_EECN 6 7084c7573 < typedef uint8_t ib_sidr_status_t; --- > typedef uint16_t ib_sidr_status_t; 7093,7186d7581 < /************/ < /****s* IBA Base: Types/ib_gmp_t < * NAME < * ib_gmp_t < * < * DESCRIPTION < * IBA defined GMP MAD format. (16.1.1) < * < * SYNOPSIS < */ < #define IB_GMP_DATA_SIZE 200 < < #include < typedef struct _ib_gmp < { < uint8_t base_ver; < uint8_t mgmt_class; < uint8_t class_ver; < uint8_t method; < ib_net16_t status; < ib_net16_t resv; < ib_net64_t trans_id; < ib_net16_t attr_id; < ib_net16_t resv1; < ib_net32_t attr_mod; < uint8_t resv2[40]; < uint8_t data[IB_GMP_DATA_SIZE]; < < } PACK_SUFFIX ib_gmp_t; < #include < /**********/ < #define IB_GMP_MAD_HDR_SIZE (sizeof(ib_gmp_t) - IB_GMP_DATA_SIZE) < < < < /************/ < /****s* IBA Base: Types/ib_port_counters_t < * NAME < * ib_gmp_t < * < * DESCRIPTION < * IBA defined PortCounters MAD format. 
(16.1.3.5) < * < * SYNOPSIS < */ < #include < typedef struct _ib_port_counters < { < uint8_t reserved0; < uint8_t port_select; < ib_net16_t counter_select; < ib_net16_t symbol_error_counter; < uint8_t link_error_recovery_counter; < uint8_t link_down_counter; < ib_net16_t port_rcv_errors; < ib_net16_t port_rcv_remote_physical_errors; < ib_net16_t port_rcv_switch_relay_errors; < ib_net16_t port_xmit_discard; < uint8_t port_xmit_constraint_errors; < uint8_t port_rcv_constraint_errors; < uint8_t reserved1; < /* uint4_t excessive_buffer_overrun_errors; < uint4_t local_link_integrity_errors; */ < uint8_t lli_errors_exc_buf_errors; < ib_net16_t reserved2; < ib_net16_t vl15_dropped; < ib_net32_t port_xmit_data; < ib_net32_t port_rcv_data; < ib_net32_t port_xmit_pkts; < ib_net32_t port_rcv_pkts; < < } PACK_SUFFIX ib_port_counters_t; < #include < < < #define IB_COUNTER_SYMBOL_ERROR CL_NTOH16(1<<0) < #define IB_COUNTER_LINK_RECOVERY_ERROR CL_NTOH16(1<<1) < #define IB_COUNTER_LINK_DOWN CL_NTOH16(1<<2) < #define IB_COUNTER_RCV_ERROR CL_NTOH16(1<<3) < #define IB_COUNTERT_RCV_RMT_PHY_ERROR CL_NTOH16(1<<4) < #define IB_COUNTER_RCV_SWITCH_RELAY_ERROR CL_NTOH16(1<<5) < #define IB_COUNTER_XMIT_DISCARD CL_NTOH16(1<<6) < #define IB_COUNTER_XMIT_CONSTRAIN CL_NTOH16(1<<7) < #define IB_COUNTER_RCV_CONSTRAIN CL_NTOH16(1<<8) < #define IB_COUNTER_LINK_INTEG_ERROR CL_NTOH16(1<<9) < #define IB_COUNTER_EXECE_BUF_ERROR CL_NTOH16(1<<10) < #define IB_COUNTER_VL15_DROP CL_NTOH16(1<<11) < #define IB_COUNTER_XMIT_DATA CL_NTOH16(1<<12) < #define IB_COUNTER_XMIT_PKT CL_NTOH16(1<<13) < #define IB_COUNTER_RCV_DATA CL_NTOH16(1<<14) < #define IB_COUNTER_RCV_PKT CL_NTOH16(1<<15) < #define IB_COUNTER_ALL 0xff < < 7192,7200c7587,7596 < typedef struct _ib_ca* __ptr64 ib_ca_handle_t; < typedef struct _ib_pd* __ptr64 ib_pd_handle_t; < typedef struct _ib_mr* __ptr64 ib_mr_handle_t; < typedef struct _mlnx_fmr* __ptr64 mlnx_fmr_handle_t; < typedef struct _ib_mw* __ptr64 ib_mw_handle_t; < typedef struct _ib_qp* 
__ptr64 ib_qp_handle_t; < typedef struct _ib_cq* __ptr64 ib_cq_handle_t; < typedef struct _ib_av* __ptr64 ib_av_handle_t; < typedef struct _ib_mcast* __ptr64 ib_mcast_handle_t; --- > typedef struct _ib_ca *ib_ca_handle_t; > typedef struct _ib_pd *ib_pd_handle_t; > typedef struct _ib_rdd *ib_rdd_handle_t; > typedef struct _ib_mr *ib_mr_handle_t; > typedef struct _ib_mw *ib_mw_handle_t; > typedef struct _ib_qp *ib_qp_handle_t; > typedef struct _ib_eec *ib_eec_handle_t; > typedef struct _ib_cq *ib_cq_handle_t; > typedef struct _ib_av *ib_av_handle_t; > typedef struct _ib_mcast *ib_mcast_handle_t; 7233a7630 > IB_INVALID_EEC_STATE, 7250a7648 > IB_INVALID_EEC_HANDLE, 7254d7651 < IB_INVALID_FMR_HANDLE, 7255a7653 > IB_INVALID_RDD_HANDLE, 7264a7663 > IB_EE_IN_TIMEWAIT, 7267,7269d7665 < IB_INVALID_INDEX, < IB_NO_MATCH, < IB_PENDING, 7275c7671 < --- > extern const char* ib_error_str[]; 7286c7682 < AL_EXPORT const char* AL_API --- > static inline const char* 7288c7684,7689 < IN ib_api_status_t status ); --- > IN ib_api_status_t status ) > { > if( status > IB_UNKNOWN_ERROR ) > status = IB_UNKNOWN_ERROR; > return( ib_error_str[status] ); > } 7323a7725,7727 > IB_AE_EEC_FATAL, > IB_AE_EEC_COMM, > IB_AE_EEC_APM, 7334a7739 > IB_AE_EEC_APM_ERROR, 7339d7743 < IB_AE_CLIENT_REREGISTER, 7346c7750 < * An error occurred when accessing the send queue of the QP. --- > * An error occurred when accessing the send queue of the QP or EEC. 7355c7759 < * An error occurred when accessing the receive queue of the QP. --- > * An error occurred when accessing the receive queue of the QP or EEC. 7372a7777,7791 > * IB_AE_EEC_FATAL > * If reliable datagram service is supported, this event indicates that > * a catastrophic error occurred while accessing or processing the EEC > * that prevents reporting of completions. 
> * > * IB_AE_EEC_COMM > * If reliable datagram service is supported, this event indicates that > * the first packet has arrived for the receive work queue where the > * EEC is still in the RTR state. > * > * IB_AE_EEC_APM > * If reliable datagram service and alternate path migration is supported, > * this event indicates that the EEC connection has migrated to the > * alternate path. > * 7414a7834,7838 > * IB_AE_EEC_APM_ERROR > * If reliable datagram service and alternate path migration is supported, > * this event indicates that an incoming path migration request to this > * EEC was not accepted. > * 7429,7431d7852 < * IB_AE_CLIENT_REREGISTER < * The SM idicate to client to reregister its SA records. < * 7438c7859 < --- > extern const char* ib_async_event_str[]; 7449c7870 < AL_EXPORT const char* AL_API --- > static inline const char* 7451c7872,7877 < IN ib_async_event_t event ); --- > IN ib_async_event_t event ) > { > if( event > IB_AE_UNKNOWN ) > event = IB_AE_UNKNOWN; > return( ib_async_event_str[event] ); > } 7485c7911 < void* __ptr64 context; --- > void *context; 7595,7599d8020 < boolean_t ipd; < boolean_t pkey_switch_ext_port; < boolean_t bm; < boolean_t link_rtl; < boolean_t client_reregister; 7703,7704c8124,8125 < ib_gid_t* __ptr64 p_gid_table; < ib_net16_t* __ptr64 p_pkey_table; --- > ib_gid_t *p_gid_table; > ib_net16_t *p_pkey_table; 7729d8149 < uint64_t fw_ver; 7756a8177 > uint8_t max_eec_resp_res; 7759a8181,8184 > uint8_t max_eec_init_depth; > > uint32_t max_eecs; > uint32_t max_rdds; 7767,7768d8191 < uint32_t max_fmr; < uint32_t max_map_per_fmr; 7798,7799c8221,8222 < uint32_t* __ptr64 p_page_size; < ib_port_attr_t* __ptr64 p_port_attr; --- > uint32_t *p_page_size; > ib_port_attr_t *p_port_attr; 7816,7818d8238 < * fw_ver < * Device Firmware version. < * 7870,7871c8290,8292 < * Maximum limit on number of responder resources for incomming RDMA < * operations on QPs. 
--- > * max_eec_resp_res > * Maximum limit on number of responder resources for incoming RDMA > * operations, on QPs and EEC's respectively. 7878c8299,8300 < * Maximimum initiator depth per QP for initiating RDMA reads and --- > * max_eec_init_depth > * Maximimum initiator depth per QP or EEC for initiating RDMA reads and 7880a8303,8308 > * max_eecs > * Maximimum number of EEC's supported by the HCA. > * > * max_rdds > * Maximum number of Reliable datagram domains supported. > * 7913c8341 < * Indicates ability to change primary port for a QP during a --- > * Indicates ability to change primary port for a QP or EEC during a 7974c8402 < AL_EXPORT ib_ca_attr_t* AL_API --- > ib_ca_attr_t* 7977,8014c8405 < IN const ib_ca_attr_t* const p_src ); < /* < * PARAMETERS < * p_dest < * Pointer to the buffer that is the destination of the copy. < * < * p_src < * Pointer to the CA attributes to copy. < * < * RETURN VALUE < * Pointer to the copied CA attributes. < * < * NOTES < * The buffer pointed to by the p_dest parameter must be at least the size < * specified in the size field of the buffer pointed to by p_src. < * < * SEE ALSO < * ib_ca_attr_t, ib_dup_ca_attr, ib_free_ca_attr < *****/ < < < /****d* Access Layer/ib_pd_type_t < * NAME < * ib_pd_type_t < * < * DESCRIPTION < * Indicates the type of protection domain being allocated. < * < * SYNOPSIS < */ < typedef enum _ib_pd_type < { < IB_PDT_NORMAL, < IB_PDT_ALIAS, < IB_PDT_SQP, < IB_PDT_UD < < } ib_pd_type_t; --- > IN const ib_ca_attr_t* const p_src ); 8016,8018c8407,8412 < * VALUES < * IB_PDT_NORMAL < * Protection domain for all non-aliased QPs. --- > * PARAMETERS > * p_dest > * Pointer to the buffer that is the destination of the copy. > * > * p_src > * Pointer to the CA attributes to copy. 8020,8021c8414,8415 < * IB_PDT_ALIAS < * Protection domain for IB_QPT_QP0_ALIAS and IB_QPT_QP1_ALIAS QPs. --- > * RETURN VALUE > * Pointer to the copied CA attributes. 
8023,8024c8417,8419 < * IB_PDT_SQP < * Protection domain for special queue pair usage. --- > * NOTES > * The buffer pointed to by the p_dest parameter must be at least the size > * specified in the size field of the buffer pointed to by p_src. 8026,8027c8421,8422 < * IB_PDT_UD < * Protection domain for UD queue pair usage. --- > * SEE ALSO > * ib_ca_attr_t, ib_dup_ca_attr, ib_free_ca_attr 8081c8476,8477 < IB_QPT_UNRELIABLE_DGRM = 3, /* Purposefully skip RDD type. */ --- > IB_QPT_RELIABLE_DGRM = 2, /* Matches CM REQ transport type */ > IB_QPT_UNRELIABLE_DGRM, 8098a8495,8497 > * IB_QPT_RELIABLE_DGRM > * Reliable, datagram queue pair. > * 8211a8611,8612 > ib_rdd_handle_t h_rdd; > 8227a8629,8633 > * h_rdd > * A handle to a reliable datagram domain to associate with the queue > * pair. This field is ignored if the queue pair is not a reliable > * datagram type queue pair. > * 8291d8696 < uint32_t sq_max_inline; 8300a8706 > ib_rdd_handle_t h_rdd; 8322,8326c8728,8729 < * This is a handle to a protection domain associated with the QP. < * < * sq_max_inline < * Maximum payload that can be inlined directly in a WQE, eliminating < * protection checks and additional DMA operations. --- > * This is a handle to a protection domain associated with the queue > * pair, or NULL if the queue pair is type IB_QPT_RELIABLE_DGRM. 8384a8788,8797 > struct _qp_reset > { > /* > * Time, in milliseconds, that the QP needs to spend in the > * time wait state before being reused. > */ > uint32_t timewait; > > } reset; > 8386a8800 > ib_qp_opts_t opts; 8400d8813 < uint8_t rnr_nak_timeout; 8408a8822 > uint8_t rnr_nak_timeout; 8416a8831 > uint8_t rnr_nak_timeout; 8421d8835 < uint8_t rnr_nak_timeout; 8454c8868 < /****d* Access Layer/ib_wr_type_t --- > /****s* Access Layer/ib_eec_attr_t 8456c8870 < * ib_wr_type_t --- > * ib_eec_attr_t 8459c8873 < * Identifies the type of work request posted to a queue pair. --- > * Information about an end-to-end context. 
8463c8877 < typedef enum _ib_wr_type_t --- > typedef struct _ib_eec_attr 8465,8470c8879,8881 < WR_SEND = 1, < WR_RDMA_WRITE, < WR_RDMA_READ, < WR_COMPARE_SWAP, < WR_FETCH_ADD, < WR_UNKNOWN --- > ib_qp_state_t state; > ib_rdd_handle_t h_rdd; > ib_net32_t local_eecn; 8472,8473c8883,8899 < } ib_wr_type_t; < /*****/ --- > ib_net32_t sq_psn; > ib_net32_t rq_psn; > uint8_t primary_port; > uint16_t pkey_index; > uint32_t resp_res; > ib_net32_t remote_eecn; > uint32_t init_depth; > uint32_t dest_num; // ??? What is this? > ib_av_attr_t primary_av; > ib_av_attr_t alternate_av; > ib_apm_state_t apm_state; > > } ib_eec_attr_t; > /* > * SEE ALSO > * ib_qp_state_t, ib_av_attr_t, ib_apm_state_t > *****/ 8476c8902 < /****f* IBA Base: Types/ib_get_wr_type_str --- > /****d* Access Layer/ib_eec_opts_t 8478c8904 < * ib_get_wr_type_str --- > * ib_eec_opts_t 8481c8907 < * Returns a string for the specified work request type --- > * Optional fields supplied in the modify EEC operation. 8485,8488c8911,8919 < AL_EXPORT const char* AL_API < ib_get_wr_type_str( < IN uint8_t wr_type ); < --- > typedef uint32_t ib_eec_opts_t; > #define IB_MOD_EEC_ALTERNATE_AV 0x00000001 > #define IB_MOD_EEC_PKEY 0x00000002 > #define IB_MOD_EEC_APM_STATE 0x00000004 > #define IB_MOD_EEC_PRIMARY_AV 0x00000008 > #define IB_MOD_EEC_RNR 0x00000010 > #define IB_MOD_EEC_RESP_RES 0x00000020 > #define IB_MOD_EEC_OUTSTANDING 0x00000040 > #define IB_MOD_EEC_PRIMARY_PORT 0x00000080 8490,8493c8921,8924 < * PARAMETERS < * wr_type < * [in] Encoded work request type as defined in the < work request attribute. --- > * NOTES > * > * > *****/ 8495,8496c8926,8928 < * RETURN VALUES < * Pointer to the work request type string. --- > /****s* Access Layer/ib_eec_mod_t > * NAME > * ib_eec_mod_t 8498c8930,8932 < * NOTES --- > * DESCRIPTION > * Information needed to change the state of an end-to-end context through > * the ib_modify_eec function. 
8499a8934,8989 > * SYNOPSIS > */ > typedef struct _ib_eec_mod > { > ib_qp_state_t req_state; > > union _eec_state > { > struct _eec_init > { > uint8_t primary_port; > uint16_t pkey_index; > > } init; > > struct _eec_rtr > { > ib_net32_t rq_psn; > ib_net32_t remote_eecn; > ib_av_attr_t primary_av; > uint8_t resp_res; > > ib_eec_opts_t opts; > ib_av_attr_t alternate_av; > uint16_t pkey_index; > > } rtr; > > struct _eec_rts > { > ib_net32_t sq_psn; > uint8_t retry_cnt; > uint8_t rnr_retry_cnt; > uint8_t local_ack_timeout; > uint8_t init_depth; > > ib_eec_opts_t opts; > ib_av_attr_t alternate_av; > ib_apm_state_t apm_state; > > ib_av_attr_t primary_av; > uint16_t pkey_index; > uint8_t primary_port; > > } rts; > > struct _eec_sqd > { > boolean_t sqd_event; > > } sqd; > > } state; > > } ib_eec_mod_t; > /* 8500a8991,8996 > * ib_qp_state_t, ib_av_attr_t, ib_apm_state_t > *****/ > > > /****d* Access Layer/ib_wr_type_t > * NAME 8502c8998,9013 < *********/ --- > * > * DESCRIPTION > * Identifies the type of work request posted to a queue pair. > * > * SYNOPSIS > */ > typedef enum _ib_wr_type_t > { > WR_SEND = 1, > WR_RDMA_WRITE, > WR_RDMA_READ, > WR_COMPARE_SWAP, > WR_FETCH_ADD > > } ib_wr_type_t; > /*****/ 8518c9029 < uint64_t vaddr; --- > void *vaddr; 8543a9055,9056 > > 8570a9084 > * 8575a9090 > * 8597c9112 < struct _ib_send_wr* __ptr64 p_next; --- > struct _ib_send_wr *p_next; 8602c9117 < ib_local_ds_t* __ptr64 ds_array; --- > ib_local_ds_t *ds_array; 8612,8613d9126 < uint16_t pkey_index; < void* __ptr64 rsvd; 8616a9130,9137 > struct _send_rd > { > ib_net32_t remote_qp; > ib_net32_t remote_qkey; > ib_net32_t eecn; > > } rd; > 8641c9162 < net32_t rkey; --- > uint32_t rkey; 8689,8693c9210,9215 < * dgrm.ud.pkey_index < * The pkey index for this send work request. This is valid only < * for IB_QPT_QP1 and IB_QPT_QP1_ALIAS QP types. The work request < * is posted to using this pkey index build the GMP's BTH instead < * of the QP's pkey. 
--- > * dgrm.rd.remote_qp > * Identifies the destination queue pair of a reliable datagram send > * operation. > * > * dgrm.rd.remote_qkey > * The qkey for the destination queue pair. 8695,8696c9217,9219 < * dgrm.ud.rsvd < * Reserved for use by the Access Layer. --- > * dgrm.rd.eecn > * The local end-to-end context number to use with the reliable datagram > * send operation. 8764c9287 < struct _ib_recv_wr* __ptr64 p_next; --- > struct _ib_recv_wr *p_next; 8767c9290 < ib_local_ds_t* __ptr64 ds_array; --- > ib_local_ds_t *ds_array; 8810c9333 < net32_t current_rkey; --- > uint32_t current_rkey; 8856a9380 > IB_WCS_LOCAL_EEC_OP_ERR, 8865,8867c9389,9391 < IB_WCS_BAD_RESP_ERR, < IB_WCS_LOCAL_ACCESS_ERR, < IB_WCS_GENERAL_ERR, --- > IB_WCS_REM_INVALID_RD_REQ_ERR, > IB_WCS_INVALID_EECN, > IB_WCS_INVALID_EEC_STATE, 8894a9419,9423 > * IB_WCS_LOCAL_EEC_OP_ERR > * An internal EEC consistency error was generated while processing > * this work request. This may indicate that the EEC was in an incorrect > * state for the requested operation. > * 8931,8938c9460,9468 < * IB_WCS_BAD_RESP_ERR, < * An unexpected transport layer opcode was returned < * by the responder. < * < * IB_WCS_LOCAL_ACCESS_ERR, < * A protection error occurred on a local data buffer < * during the processing of a RDMA Write with Immediate Data < * operation sent from the remote node. --- > * IB_WCS_REM_INVALID_RD_REQ_ERR, > * Responder detected an invalid RD message. This may be the result of an > * invalid qkey or an RDD mismatch. > * > * IB_WCS_INVALID_EECN > * An invalid EE context number was detected. > * > * IB_WCS_INVALID_EEC_STATE > * The EEC was in an invalid state for the specified request. 
8947,8950d9476 < * < * IB_WCS_GENERAL_ERR, < * Any other error < * 8953a9480,9481 > extern const char* ib_wc_status_str[]; > 8964c9492 < AL_EXPORT const char* AL_API --- > static inline const char* 8966c9494,9499 < IN ib_wc_status_t wc_status ); --- > IN ib_wc_status_t wc_status ) > { > if( wc_status > IB_WCS_UNKNOWN ) > wc_status = IB_WCS_UNKNOWN; > return( ib_wc_status_str[wc_status] ); > } 8999,9000c9532 < IB_WC_RECV_RDMA_WRITE, < IB_WC_UNKNOWN --- > IB_WC_RECV_RDMA_WRITE 9006,9031d9537 < /****f* IBA Base: Types/ib_get_wc_type_str < * NAME < * ib_get_wc_type_str < * < * DESCRIPTION < * Returns a string for the specified work completion type. < * < * SYNOPSIS < */ < AL_EXPORT const char* AL_API < ib_get_wc_type_str( < IN ib_wc_type_t wc_type ); < /* < * PARAMETERS < * wc_type < * [in] work completion type value < * < * RETURN VALUES < * Pointer to the work completion type description string. < * < * NOTES < * < * SEE ALSO < *********/ < < 9077c9583 < struct _ib_wc* __ptr64 p_next; --- > struct _ib_wc *p_next; 9105a9612,9621 > struct _wc_rd > { > ib_net32_t remote_eecn; > ib_net32_t remote_qp; > ib_net16_t remote_lid; > uint8_t remote_sl; > uint32_t free_cnt; > > } rd; > 9140a9657 > * 9172,9173c9689,9690 < * The pkey index of the source queue pair. This is valid only for < * IB_QPT_QP1 and IB_QPT_QP1_ALIAS QP types. --- > * The pkey index for the source queue pair. This is valid only for > * GSI type QP's. 9183a9701,9717 > * recv.rd.remote_eecn > * The remote end-to-end context number that sent the received message. > * > * recv.rd.remote_qp > * Identifies the source queue pair of a received message. > * > * recv.rd.remote_lid > * The source LID of the received message. > * > * recv.rd.remote_sl > * The service level used by the source of the received message. > * > * recv.rd.free_cnt > * The number of available entries in the completion queue. 
Reliable > * datagrams may complete out of order, so this field may be used to > * determine the number of additional completions that may occur. > * 9229c9763 < void* __ptr64 vaddr; --- > void *vaddr; 9249,9320d9782 < #ifdef CL_KERNEL < < /****s* Access Layer/mlnx_fmr_create_t < * NAME < * mlnx_fmr_create_t < * < * DESCRIPTION < * Information required to create a Mellanox fast memory region. < * < * SYNOPSIS < */ < typedef struct _mlnx_fmr_create < { < int max_pages; < int max_maps; < uint8_t page_size; < ib_access_t access_ctrl; < < } mlnx_fmr_create_t; < /* < * FIELDS < * max_pages < * max pages in the region. < * < * max_maps < * max times, the region can be mapped before remapping. < * < * page_size < * log2 of the page size (e.g. 12 for 4KB). < * < * access_ctrl < * Access rights of the registered region. < * < * NOTES < * This is a Mellanox specific extension to verbs. < * < * SEE ALSO < * ib_access_t < *****/ < < #endif < < /****s* Access Layer/ib_phys_range_t < * NAME < * ib_phys_range_t < * < * DESCRIPTION < * Information describing a physical memory range. < * < * SYNOPSIS < */ < typedef struct _ib_phys_range < { < uint64_t base_addr; < uint64_t size; < < } ib_phys_range_t; < /* < * FIELDS < * base_addr < * Physical address of the base of the memory range. < * < * size < * size, in bytes, of the memory range. < * < * NOTES < * The base address must be start and end on an HCA-supported page boundary. < * < * SEE ALSO < * ib_phys_create_t < *********/ < 9334,9335c9796,9797 < uint32_t num_ranges; < ib_phys_range_t* __ptr64 range_array; --- > uint32_t num_bufs; > uint64_t *buf_array; 9337c9799 < uint32_t hca_page_size; --- > uint32_t page_size; 9342d9803 < * FIELDS 9346,9347c9807,9808 < * num_ranges < * Number of ib_phys_range structures listed in the specified range array. --- > * num_bufs > * Number of buffers listed in the specified buffer array. 
9349,9350c9810,9811 < * range_array < * An array of ib_phys_range structures to be registered as a single memory --- > * buf_array > * An array of physical buffers to be registered as a single memory 9354,9355c9815,9816 < * The offset into the first physical memory range of the specified memory < * region on which to start the virtual address. --- > * The offset into the first physical page of the specified memory > * region to start the virtual address. 9357,9358c9818,9819 < * hca_page_size < * The HCA page size to use to register the memory. --- > * page_size > * The physical page size of the memory being registered. 9380,9383c9841,9844 < uint64_t local_lb; < uint64_t local_ub; < uint64_t remote_lb; < uint64_t remote_ub; --- > void *local_lb; > void *local_ub; > void *remote_lb; > void *remote_ub; 9385,9386c9846,9847 < net32_t lkey; < net32_t rkey; --- > uint32_t lkey; > uint32_t rkey; 9396,9397c9857 < * memory access. This is always a 64-bit quantity to support registering < * more than 4GB of memory on 32-bit systems with PAE. --- > * memory access. 9401,9402c9861 < * memory access. This is always a 64-bit quantity to support registering < * more than 4GB of memory on 32-bit systems with PAE. --- > * memory access. 9406,9407c9865 < * memory access. This is always a 64-bit quantity to support registering < * more than 4GB of memory on 32-bit systems with PAE. --- > * memory access. 9411,9412c9869 < * memory access. This is always a 64-bit quantity to support registering < * more than 4GB of memory on 32-bit systems with PAE. --- > * memory access. 9465d9921 < #define IB_CA_MOD_IS_CLIENT_REREGISTER_SUPPORTED 0x00800000 9544,9547d9999 < * < * IB_CA_MOD_IS_CLIENT_REREGISTER_SUPPORTED < * Used to modify the system image GUID for the port. 
< *
9661a10114
> IN OUT void* p_buf OPTIONAL;
9663c10116
< IN uint32_t buf_info;
---
> IN OUT uint32_t num_bytes_ret;
9665,9666d10117
< OUT uint32_t num_bytes_ret;
< IN OUT void* __ptr64 p_buf OPTIONAL;
9674,9684d10124
< * status
< * The completion status from the verbs provider. This field should be
< * initialize to indicate an error to allow detection and cleanup in
< * case a communication error occurs between user-mode and kernel-mode.
< *
< * buf_size
< * The size of the buffer in bytes.
< *
< * buf_info
< * Additional buffer information
< *
9690a10131,10133
> * buf_size
> * The size of the buffer in bytes.
> *
9696a10140,10144
> * status
> * The completion status from the verbs provider. This field should be
> * initialize to indicate an error to allow detection and cleanup in
> * case a communication error occurs between user-mode and kernel-mode.
> *
9703a10152
> END_C_DECLS
9705,9710c10154
< #ifdef __cplusplus
< } /* extern "C" */
< #endif /* __cplusplus */
<
<
< #endif /* __IB_TYPES_H__ */
---
> #endif // __IB_TYPES_H__

From mst at mellanox.co.il Mon Jul 10 05:18:32 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 10 Jul 2006 15:18:32 +0300
Subject: [openib-general] [PATCH] IB/mthca: don't use privileged uar for kernel access
Message-ID: <20060710121832.GH24705@mellanox.co.il>

Make kernel use UAR2 instead of UAR1 for hardware access:
this adds sanity checking from the hardware side, without any performance cost.

Signed-off-by: Michael S.
Tsirkin

diff --git a/drivers/infiniband/hw/mthca/mthca_uar.c b/drivers/infiniband/hw/mthca/mthca_uar.c
index 8e92198..8b72848 100644
--- a/drivers/infiniband/hw/mthca/mthca_uar.c
+++ b/drivers/infiniband/hw/mthca/mthca_uar.c
@@ -60,7 +60,7 @@ int mthca_init_uar_table(struct mthca_de
 	ret = mthca_alloc_init(&dev->uar_table.alloc,
 			       dev->limits.num_uars,
 			       dev->limits.num_uars - 1,
-			       dev->limits.reserved_uars);
+			       dev->limits.reserved_uars + 1);
 	if (ret)
 		return ret;
 

-- 
MST

From mst at mellanox.co.il Mon Jul 10 05:43:47 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 10 Jul 2006 15:43:47 +0300
Subject: [openib-general] [PATCH upstream] IB/cm: drop REQ when out of memory
In-Reply-To: <000101c69a37$fb0deb30$e598070a@amr.corp.intel.com>
References: <000101c69a37$fb0deb30$e598070a@amr.corp.intel.com>
Message-ID: <20060710124347.GI24705@mellanox.co.il>

I plan to send the following (from SVN trunk rev 8261) upstream to Andrew.
Comments?

---

If a user of the IB CM returns -ENOMEM from their connection callback,
simply drop the incoming REQ - do not attempt to send a reject.
This should allow the sender to retry the request.

Signed-off-by: Michael S. Tsirkin
Signed-off-by: Sean Hefty

Index: l/drivers/infiniband/core/cm.c
===================================================================
--- l/drivers/infiniband/core/cm.c (revision 8224)
+++ l/drivers/infiniband/core/cm.c (working copy)
@@ -702,7 +702,7 @@ static void cm_reset_to_idle(struct cm_i
 	}
 }
 
-void ib_destroy_cm_id(struct ib_cm_id *cm_id)
+static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
 {
 	struct cm_id_private *cm_id_priv;
 	struct cm_work *work;
@@ -736,12 +736,22 @@ retest:
 			       sizeof cm_id_priv->av.port->cm_dev->ca_guid,
 			       NULL, 0);
 		break;
+	case IB_CM_REQ_RCVD:
+		if (err == -ENOMEM) {
+			/* Do not reject to allow future retries. */
+			cm_reset_to_idle(cm_id_priv);
+			spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+		} else {
+			spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+			ib_send_cm_rej(cm_id, IB_CM_REJ_CONSUMER_DEFINED,
+				       NULL, 0, NULL, 0);
+		}
+		break;
 	case IB_CM_MRA_REQ_RCVD:
 	case IB_CM_REP_SENT:
 	case IB_CM_MRA_REP_RCVD:
 		ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
 		/* Fall through */
-	case IB_CM_REQ_RCVD:
 	case IB_CM_MRA_REQ_SENT:
 	case IB_CM_REP_RCVD:
 	case IB_CM_MRA_REP_SENT:
@@ -776,6 +786,11 @@ retest:
 	kfree(cm_id_priv->private_data);
 	kfree(cm_id_priv);
 }
+
+void ib_destroy_cm_id(struct ib_cm_id *cm_id)
+{
+	cm_destroy_id(cm_id, 0);
+}
 EXPORT_SYMBOL(ib_destroy_cm_id);
 
 int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
@@ -1164,7 +1179,7 @@ static void cm_process_work(struct cm_id
 	}
 	cm_deref_id(cm_id_priv);
 	if (ret)
-		ib_destroy_cm_id(&cm_id_priv->id);
+		cm_destroy_id(&cm_id_priv->id, ret);
 }
 
 static void cm_format_mra(struct cm_mra_msg *mra_msg,

-- 
MST

From mst at mellanox.co.il Mon Jul 10 05:47:55 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 10 Jul 2006 15:47:55 +0300
Subject: [openib-general] [PATCH upstream] IB/addr: gid structure alignment fix
Message-ID: <20060710124755.GJ24705@mellanox.co.il>

I plan to send the following (from SVN r8265) upstream to Andrew.
Comments?

---

The device address contains unsigned character arrays, which contain raw GID
addresses. The GIDs may not be naturally aligned, so do not cast them to
structures or unions.

Signed-off-by: Sean Hefty
Signed-off-by: Michael S.
Tsirkin

Index: gitcma/include/rdma/ib_addr.h
===================================================================
--- gitcma.orig/include/rdma/ib_addr.h 2006-07-09 23:41:27.000000000 +0300
+++ gitcma/include/rdma/ib_addr.h 2006-07-09 23:51:23.000000000 +0300
@@ -89,9 +89,10 @@ static inline void ib_addr_set_pkey(stru
 	dev_addr->broadcast[9] = (unsigned char) pkey;
 }
 
-static inline union ib_gid *ib_addr_get_sgid(struct rdma_dev_addr *dev_addr)
+static inline void ib_addr_get_sgid(struct rdma_dev_addr *dev_addr,
+				    union ib_gid *gid)
 {
-	return (union ib_gid *) (dev_addr->src_dev_addr + 4);
+	memcpy(gid, dev_addr->src_dev_addr + 4, sizeof *gid);
 }
 
 static inline void ib_addr_set_sgid(struct rdma_dev_addr *dev_addr,
@@ -100,9 +101,10 @@ static inline void ib_addr_set_sgid(stru
 	memcpy(dev_addr->src_dev_addr + 4, gid, sizeof *gid);
 }
 
-static inline union ib_gid *ib_addr_get_dgid(struct rdma_dev_addr *dev_addr)
+static inline void ib_addr_get_dgid(struct rdma_dev_addr *dev_addr,
+				    union ib_gid *gid)
 {
-	return (union ib_gid *) (dev_addr->dst_dev_addr + 4);
+	memcpy(gid, dev_addr->dst_dev_addr + 4, sizeof *gid);
 }
 
 static inline void ib_addr_set_dgid(struct rdma_dev_addr *dev_addr,
Index: gitcma/drivers/infiniband/core/cma.c
===================================================================
--- gitcma.orig/drivers/infiniband/core/cma.c 2006-07-09 23:41:26.000000000 +0300
+++ gitcma/drivers/infiniband/core/cma.c 2006-07-09 23:51:23.000000000 +0300
@@ -262,14 +262,14 @@ static void cma_detach_from_dev(struct r
 static int cma_acquire_ib_dev(struct rdma_id_private *id_priv)
 {
 	struct cma_device *cma_dev;
-	union ib_gid *gid;
+	union ib_gid gid;
 	int ret = -ENODEV;
 
-	gid = ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr);
+	ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr, &gid),
 
 	mutex_lock(&lock);
 	list_for_each_entry(cma_dev, &dev_list, list) {
-		ret = ib_find_cached_gid(cma_dev->device, gid,
+		ret = ib_find_cached_gid(cma_dev->device, &gid,
 					 &id_priv->id.port_num, NULL);
 		if (!ret) {
 			cma_attach_to_dev(id_priv, cma_dev);
@@ -1134,8 +1134,8 @@ static int cma_query_ib_route(struct rdm
 	struct ib_sa_path_rec path_rec;
 
 	memset(&path_rec, 0, sizeof path_rec);
-	path_rec.sgid = *ib_addr_get_sgid(addr);
-	path_rec.dgid = *ib_addr_get_dgid(addr);
+	ib_addr_get_sgid(addr, &path_rec.sgid);
+	ib_addr_get_dgid(addr, &path_rec.dgid);
 	path_rec.pkey = cpu_to_be16(ib_addr_get_pkey(addr));
 	path_rec.numb_path = 1;
 
@@ -1263,7 +1263,7 @@ static int cma_bind_loopback(struct rdma
 {
 	struct cma_device *cma_dev;
 	struct ib_port_attr port_attr;
-	union ib_gid *gid;
+	union ib_gid gid;
 	u16 pkey;
 	int ret;
 	u8 p;
@@ -1284,8 +1284,7 @@ static int cma_bind_loopback(struct rdma
 	}
 
 port_found:
-	gid = ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr);
-	ret = ib_get_cached_gid(cma_dev->device, p, 0, gid);
+	ret = ib_get_cached_gid(cma_dev->device, p, 0, &gid);
 	if (ret)
 		goto out;
 
@@ -1293,6 +1292,7 @@ port_found:
 	if (ret)
 		goto out;
 
+	ib_addr_set_sgid(&id_priv->id.route.addr.dev_addr, &gid);
 	ib_addr_set_pkey(&id_priv->id.route.addr.dev_addr, pkey);
 	id_priv->id.port_num = p;
 	cma_attach_to_dev(id_priv, cma_dev);
@@ -1339,6 +1339,7 @@ static int cma_resolve_loopback(struct r
 {
 	struct cma_work *work;
 	struct sockaddr_in *src_in, *dst_in;
+	union ib_gid gid;
 	int ret;
 
 	work = kzalloc(sizeof *work, GFP_KERNEL);
@@ -1351,8 +1352,8 @@ static int cma_resolve_loopback(struct r
 		goto err;
 	}
 
-	ib_addr_set_dgid(&id_priv->id.route.addr.dev_addr,
-			 ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr));
+	ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr, &gid);
+	ib_addr_set_dgid(&id_priv->id.route.addr.dev_addr, &gid);
 
 	if (cma_zero_addr(&id_priv->id.route.addr.src_addr)) {
 		src_in = (struct sockaddr_in *)&id_priv->id.route.addr.src_addr;

-- 
MST

From ftillier at silverstorm.com Mon Jul 10 06:21:35 2006
From: ftillier at silverstorm.com (Fabian Tillier)
Date: Mon, 10 Jul 2006 09:21:35 -0400
Subject: [openib-general] [Openib-windows] ib_types.h and Win/Linux consolidation
In-Reply-To:
<6AB138A2AB8C8E4A98B9C0C3D52670E3027A83C5@mtlexch01.mtl.com>
References: <6AB138A2AB8C8E4A98B9C0C3D52670E3027A83C5@mtlexch01.mtl.com>
Message-ID: <79ae2f320607100621n479ba1e3n5ea0a1679c73603f@mail.gmail.com>

Hi Eitan,

A lot of the changes are whitespace, comments, and repeated things
like using the AL_API and AL_INLINE macros versus static inline.
Could you filter these out and send out what the actual changes that
matter are? I quickly lost interest here.

Also, the AL_API and AL_INLINE stuff cannot go away in Windows, so if
this is an issue for Linux OpenSM, then we should stop now.

- Fab

On 7/10/06, Eitan Zahavi wrote:
> > Can you please resend as inline text, or at the very least a plain text attachment?
>
> Enjoy:
>
> 2c2,3
> < * Copyright (c) 2005 SilverStorm Technologies. All rights reserved.
> ---
> > * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved.
> > * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
> 5,6c6,10
> < * This software is available to you under the OpenIB.org BSD license
> < * below:
> ---
> > * This software is available to you under a choice of one of two
> > * licenses.
You may choose to be licensed under the terms of the GNU > > * General Public License (GPL) Version 2, available from the file > > * COPYING in the main directory of this source tree, or the > > * OpenIB.org BSD license below: > 30c34 > < * $Id: ib_types.h 381 2006-06-08 17:10:14Z ftillier $ > --- > > * $Id: ib_types.h 8291 2006-06-29 18:53:06Z halr $ > 36a41 > > #include > 40,54d44 > < #ifdef CL_KERNEL > < #define AL_EXPORT > < #define AL_API > < #define AL_INLINE static inline > < #else > < #if defined( EXPORT_AL_SYMBOLS ) > < #define AL_EXPORT __declspec(dllexport) > < #else > < #define AL_EXPORT __declspec(dllimport) > < #endif > < #define AL_API __stdcall > < #define AL_INLINE AL_EXPORT inline > < #endif /* CL_KERNEL */ > < > < > 56,57c46,50 > < extern "C" > < { > --- > > # define BEGIN_C_DECLS extern "C" { > > # define END_C_DECLS } > > #else /* !__cplusplus */ > > # define BEGIN_C_DECLS > > # define END_C_DECLS > 59a53 > > BEGIN_C_DECLS > 222c216 > < #define IB_DEFAULT_SUBNET_PREFIX (CL_HTON64(CL_CONST64(0xFE80000000000000))) > --- > > #define IB_DEFAULT_SUBNET_PREFIX (CL_HTON64(0xFE80000000000000ULL)) > 365c359 > < #define IB_PKEY_ENTRIES_MAX (IB_PKEY_MAX_BLOCKS * IB_PKEY_BLOCK_SIZE) > --- > > #define IB_PKEY_ENTRIES_MAX (IB_PKEY_MAX_BLOCKS * IB_NUM_PKEY_ELEMENTS_IN_BLOCK) > 377c371 > < #define IB_PKEY_BASE_MASK (CL_NTOH16(0x7FFF)) > --- > > #define IB_PKEY_BASE_MASK (CL_HTON16(0x7FFF)) > 391a386,397 > > /****d* IBA Base: Constants/IB_DEFAULT_PARTIAL_PKEY > > * NAME > > * IB_DEFAULT_PARTIAL_PKEY > > * > > * DESCRIPTION > > * 0x7FFF in network order > > * > > * SOURCE > > */ > > #define IB_DEFAULT_PARTIAL_PKEY (CL_HTON16(0x7FFF)) > > /**********/ > > > 511a518,541 > > /****d* IBA Base: Constants/IB_MCLASS_DEV_ADM > > * NAME > > * IB_MCLASS_DEV_ADM > > * > > * DESCRIPTION > > * Subnet Management Class, Device Administration > > * > > * SOURCE > > */ > > #define IB_MCLASS_DEV_ADM 0x10 > > /**********/ > > > > /****d* IBA Base: Constants/IB_MCLASS_BIS > > * 
NAME > > * IB_MCLASS_BIS > > * > > * DESCRIPTION > > * Subnet Management Class, BIS > > * > > * SOURCE > > */ > > #define IB_MCLASS_BIS 0x12 > > /**********/ > > > 541c571 > < * Indicitates if the Class Code if a vendor specific class from > --- > > * Indicates if the Class Code if a vendor specific class from > 573c603 > < * Indicitates if the Class Code if a vendor specific class from > --- > > * Indicates if the Class Code if a vendor specific class from > 606c636 > < * Indicitates if the Class Code if a vendor specific class > --- > > * Indicates if the Class Code if a vendor specific class > 631a662,693 > > /****f* IBA Base: Types/ib_class_is_rmpp > > * NAME > > * ib_class_is_rmpp > > * > > * DESCRIPTION > > * Indicates if the Class Code supports RMPP > > * > > * SYNOPSIS > > */ > > static inline boolean_t > > ib_class_is_rmpp( > > IN const uint8_t class_code ) > > { > > return( (class_code == IB_MCLASS_SUBN_ADM) || > > (class_code == IB_MCLASS_DEV_MGMT) || > > (class_code == IB_MCLASS_DEV_ADM) || > > (class_code == IB_MCLASS_BIS) || > > ib_class_is_vendor_specific_high( class_code ) ); > > } > > /* > > * PARAMETERS > > * class_code > > * [in] The Management Datagram Class Code > > * > > * RETURN VALUE > > * TRUE if the class supports RMPP > > * FALSE otherwise. 
> > * > > * NOTES > > * > > *********/ > > > 902d963 > < /*********/ > 1100c1161 > < * SmInfoRecord attribute (15.2.5) > --- > > * SMInfoRecord attribute (15.2.5) > 1106a1168,1179 > > /****d* IBA Base: Constants/IB_MAD_ATTR_GUIDINFO_RECORD > > * NAME > > * IB_MAD_ATTR_GUIDINFO_RECORD > > * > > * DESCRIPTION > > * GuidInfoRecord attribute (15.2.5) > > * > > * SOURCE > > */ > > #define IB_MAD_ATTR_GUIDINFO_RECORD (CL_NTOH16(0x0030)) > > /**********/ > > > 1117a1191 > > > 1128a1203 > > > 1139a1215 > > > 1145c1221 > < * LinearForwardingRecord attribute (15.2.5.6) > --- > > * LinearForwardingTableRecord attribute (15.2.5.6) > 1150a1227 > > > 1156c1233 > < * P-KEY table attribute (15.2.5) > --- > > * PKEY Table Record attribute (15.2.5) > 1161a1239 > > > 1172a1251 > > > 1178c1257 > < * VL Arbitration Table attribute (15.2.5) > --- > > * VL Arbitration Table Record attribute (15.2.5) > 1183a1263 > > > 1189c1269 > < * VSLtoL Map Table attribute (15.2.5) > --- > > * SLtoVL Mapping Table Record attribute (15.2.5) > 1194a1275 > > > 1205a1287 > > > 1216a1299 > > > 1222c1305 > < * MultiPath attribute (15.2.5) > --- > > * MultiPathRecord attribute (15.2.5) > 1227a1311 > > > 1233c1317 > < * Service Association attribute (15.2.5) > --- > > * Service Association Record attribute (15.2.5) > 1238a1323 > > > 1250c1335,1336 > < /****d* IBA Base: Constants/IB_MAD_ATTR_IOC_PROFILE > --- > > > > /****d* IBA Base: Constants/IB_MAD_ATTR_IO_CONTROLLER_PROFILE > 1252c1338 > < * IB_MAD_ATTR_IOC_PROFILE > --- > > * IB_MAD_ATTR_IO_CONTROLLER_PROFILE > 1259c1345 > < #define IB_MAD_ATTR_IOC_PROFILE (CL_NTOH16(0x0011)) > --- > > #define IB_MAD_ATTR_IO_CONTROLLER_PROFILE (CL_NTOH16(0x0011)) > 1260a1347 > > > 1271a1359 > > > 1282a1371 > > > 1293a1383 > > > 1304a1395 > > > 1315a1407 > > > 1327c1419,1420 > < /*** IBA Base: Constants/IB_MAD_ATTR_SVC_ASSOCIATION_RECORD > --- > > > > /****d* IBA Base: Constants/IB_MAD_ATTR_SVC_ASSOCIATION_RECORD > 1332c1425 > < * Service Association attribute (15.2.5) > 
--- > > * Service Association Record attribute (15.2.5) > 1337a1431 > > > 1348a1443 > > > 1359a1455 > > > 1382a1479 > > > 1393a1491 > > > 1404a1503 > > > 1419c1518 > < /****d* IBA Base: Constants/IB_MTU_TYPE > --- > > /****d* IBA Base: Constants/IB_MTU_LEN_TYPE > 1421c1520 > < * IB_MTU_TYPE > --- > > * IB_MTU_LEN_TYPE > 1434,1438c1533,1541 > < #define IB_MTU_256 1 > < #define IB_MTU_512 2 > < #define IB_MTU_1024 3 > < #define IB_MTU_2048 4 > < #define IB_MTU_4096 5 > --- > > #define IB_MTU_LEN_256 1 > > #define IB_MTU_LEN_512 2 > > #define IB_MTU_LEN_1024 3 > > #define IB_MTU_LEN_2048 4 > > #define IB_MTU_LEN_4096 5 > > > > #define IB_MIN_MTU IB_MTU_LEN_256 > > #define IB_MAX_MTU IB_MTU_LEN_4096 > > > 1509c1612 > < * Mask for the selector field for path record MTU, rate > --- > > * Mask for the selector field for path record MTU, rate, > 1514a1618,1628 > > /****d* IBA Base: Constants/IB_MULTIPATH_REC_SELECTOR_MASK > > * NAME > > * IB_MULTIPATH_REC_SELECTOR_MASK > > * > > * DESCRIPTION > > * Mask for the selector field for multipath record MTU, rate, > > * and packet lifetime. > > * > > * SOURCE > > */ > > #define IB_MULTIPATH_REC_SELECTOR_MASK 0xC0 > 1521c1635 > < * Mask for the base value field for path record MTU, rate > --- > > * Mask for the base value field for path record MTU, rate, > 1527a1642,1653 > > /****d* IBA Base: Constants/IB_MULTIPATH_REC_BASE_MASK > > * NAME > > * IB_MULTIPATH_REC_BASE_MASK > > * > > * DESCRIPTION > > * Mask for the base value field for multipath record MTU, rate, > > * and packet lifetime. 
> > * > > * SOURCE > > */ > > #define IB_MULTIPATH_REC_BASE_MASK 0x3F > > /**********/ > 1534c1660 > < * Definitions are from the InfiniBand Architecture Specification v1.1 > --- > > * Definitions are from the InfiniBand Architecture Specification v1.2 > 1616c1742 > < AL_INLINE const char* AL_API > --- > > static inline const char* > 1618c1744 > < IN uint8_t node_type ) > --- > > IN uint32_t node_type ) > 1620c1746 > < if( node_type >= IB_NODE_TYPE_ROUTER ) > --- > > if( node_type >= IB_NOTICE_NODE_TYPE_ROUTER ) > 1658c1784 > < AL_INLINE const char* AL_API > --- > > static inline const char* > 1668c1794 > < * port_state > --- > > * node_type > 1689c1815 > < AL_INLINE const uint8_t AL_API > --- > > static inline uint8_t > 1744c1870 > < AL_INLINE ib_net16_t AL_API > --- > > static inline ib_net16_t > 1767c1893 > < * Indicitates if the port is a full member of the parition. > --- > > * Indicates if the port is a full member of the parition. > 1771c1897 > < AL_INLINE boolean_t AL_API > --- > > static inline boolean_t > 1791a1918,1950 > > /****f* IBA Base: Types/ib_pkey_is_invalid > > * NAME > > * ib_pkey_is_invalid > > * > > * DESCRIPTION > > * Returns TRUE if the given P_Key is an invalid P_Key > > * C10-116: the CI shall regard a P_Key as invalid if its low-order > > * 15 bits are all zero... > > * > > * SYNOPSIS > > */ > > static inline boolean_t > > ib_pkey_is_invalid( > > IN const ib_net16_t pkey ) > > { > > if (ib_pkey_get_base(pkey) == 0x0000) > > return TRUE; > > > > return FALSE; > > } > > /* > > * PARAMETERS > > * pkey > > * [in] P_Key value > > * > > * RETURN VALUE > > * Returns the base P_Key value with the membership bit stripped. 
> > * > > * NOTES > > * > > * SEE ALSO > > *********/ > > > 1835c1994 > < AL_INLINE boolean_t AL_API > --- > > static inline boolean_t > 1851c2010 > < AL_INLINE void AL_API > --- > > static inline void > 1884c2043 > < AL_INLINE ib_net64_t AL_API > --- > > static inline ib_net64_t > 1948,1949c2107 > < CL_HTON64( CL_CONST64(0xFFFFFFFFFFFF0000) ) ) == > < CL_HTON64( CL_CONST64(0xFEC0000000000000) ) ); > --- > > CL_HTON64( 0xFFFFFFFFFFFF0000ULL ) ) == CL_HTON64( 0xFEC0000000000000ULL ) ); > 1974c2132 > < AL_INLINE ib_net64_t AL_API > --- > > static inline ib_net64_t > 1994,2021d2151 > < /****s* IBA Base: Types/ib_field32_t > < * NAME > < * ib_field32_t > < * > < * DESCRIPTION > < * Represents a 32-bit field, and allows access as a 32-bit network byte > < * ordered or a 4-byte array. > < * > < * SYNOPSIS > < */ > < #include > < typedef union _ib_field32_t > < { > < net32_t val; > < uint8_t bytes[4]; > < > < } PACK_SUFFIX ib_field32_t; > < #include > < /* > < * FIELDS > < * val > < * Full field value. > < * > < * bytes > < * Byte array representing the field. The byte array provides identical > < * access independently from CPU byte-ordering. > < *********/ > < > 2033c2163 > < typedef __declspec(align(8)) struct _ib_path_rec > --- > > typedef struct _ib_path_rec > 2035c2165 > < uint64_t resv0; > --- > > uint8_t resv0[8]; > 2040c2170 > < ib_field32_t hop_flow_raw; > --- > > ib_net32_t hop_flow_raw; > 2049,2050c2179 > < uint16_t resv1; > < uint32_t resv2; > --- > > uint8_t resv2[6]; > 2078c2207,2208 > < * In queries, maximum number of paths to return. > --- > > * Reversible path - 1 bit to say if path is reversible. > > * num_path [6:0] In queries, maximum number of paths to return. > 2103,2105d2232 > < * resv1 > < * Reserved bytes. 
> < * > 2111,2132c2238,2258 > < /* Port Record Component Masks */ > < #define IB_PR_COMPMASK_LID (CL_HTON64(1<<0)) > < #define IB_PR_COMPMASK_DGID (CL_HTON64(1<<2)) > < #define IB_PR_COMPMASK_SGID (CL_HTON64(1<<3)) > < #define IB_PR_COMPMASK_DLID (CL_HTON64(1<<4)) > < #define IB_PR_COMPMASK_SLID (CL_HTON64(1<<5)) > < #define IB_PR_COMPMASK_RAWTRAFIC (CL_HTON64(1<<6)) > < #define IB_PR_COMPMASK_RESV0 (CL_HTON64(1<<7)) > < #define IB_PR_COMPMASK_FLOWLABEL (CL_HTON64(1<<8)) > < #define IB_PR_COMPMASK_HOPLIMIT (CL_HTON64(1<<9)) > < #define IB_PR_COMPMASK_TCLASS (CL_HTON64(1<<10)) > < #define IB_PR_COMPMASK_REVERSIBLE (CL_HTON64(1<<11)) > < #define IB_PR_COMPMASK_NUM_PATH (CL_HTON64(1<<12)) > < #define IB_PR_COMPMASK_PKEY (CL_HTON64(1<<13)) > < #define IB_PR_COMPMASK_RESV1 (CL_HTON64(1<<14)) > < #define IB_PR_COMPMASK_SL (CL_HTON64(1<<15)) > < #define IB_PR_COMPMASK_MTU_SEL (CL_HTON64(1<<16)) > < #define IB_PR_COMPMASK_MTU (CL_HTON64(1<<17)) > < #define IB_PR_COMPMASK_RATE_SEL (CL_HTON64(1<<18)) > < #define IB_PR_COMPMASK_RATE (CL_HTON64(1<<19)) > < #define IB_PR_COMPMASK_PKTLIFETIME_SEL (CL_HTON64(1<<20)) > < #define IB_PR_COMPMASK_PFTLIFETIME (CL_HTON64(1<<21)) > --- > > /* Path Record Component Masks */ > > #define IB_PR_COMPMASK_DGID (CL_HTON64(((uint64_t)1)<<2)) > > #define IB_PR_COMPMASK_SGID (CL_HTON64(((uint64_t)1)<<3)) > > #define IB_PR_COMPMASK_DLID (CL_HTON64(((uint64_t)1)<<4)) > > #define IB_PR_COMPMASK_SLID (CL_HTON64(((uint64_t)1)<<5)) > > #define IB_PR_COMPMASK_RAWTRAFFIC (CL_HTON64(((uint64_t)1)<<6)) > > #define IB_PR_COMPMASK_RESV0 (CL_HTON64(((uint64_t)1)<<7)) > > #define IB_PR_COMPMASK_FLOWLABEL (CL_HTON64(((uint64_t)1)<<8)) > > #define IB_PR_COMPMASK_HOPLIMIT (CL_HTON64(((uint64_t)1)<<9)) > > #define IB_PR_COMPMASK_TCLASS (CL_HTON64(((uint64_t)1)<<10)) > > #define IB_PR_COMPMASK_REVERSIBLE (CL_HTON64(((uint64_t)1)<<11)) > > #define IB_PR_COMPMASK_NUMBPATH (CL_HTON64(((uint64_t)1)<<12)) > > #define IB_PR_COMPMASK_PKEY (CL_HTON64(((uint64_t)1)<<13)) > 
> #define IB_PR_COMPMASK_RESV1 (CL_HTON64(((uint64_t)1)<<14)) > > #define IB_PR_COMPMASK_SL (CL_HTON64(((uint64_t)1)<<15)) > > #define IB_PR_COMPMASK_MTUSELEC (CL_HTON64(((uint64_t)1)<<16)) > > #define IB_PR_COMPMASK_MTU (CL_HTON64(((uint64_t)1)<<17)) > > #define IB_PR_COMPMASK_RATESELEC (CL_HTON64(((uint64_t)1)<<18)) > > #define IB_PR_COMPMASK_RATE (CL_HTON64(((uint64_t)1)<<19)) > > #define IB_PR_COMPMASK_PKTLIFETIMESELEC (CL_HTON64(((uint64_t)1)<<20)) > > #define IB_PR_COMPMASK_PKTLIFETIME (CL_HTON64(((uint64_t)1)<<21)) > 2135,2138c2261,2264 > < #define IB_LR_COMPMASK_FROM_LID (CL_HTON64(1<<0)) > < #define IB_LR_COMPMASK_FROM_PORT (CL_HTON64(1<<1)) > < #define IB_LR_COMPMASK_TO_PORT (CL_HTON64(1<<2)) > < #define IB_LR_COMPMASK_TO_LID (CL_HTON64(1<<3)) > --- > > #define IB_LR_COMPMASK_FROM_LID (CL_HTON64(((uint64_t)1)<<0)) > > #define IB_LR_COMPMASK_FROM_PORT (CL_HTON64(((uint64_t)1)<<1)) > > #define IB_LR_COMPMASK_TO_PORT (CL_HTON64(((uint64_t)1)<<2)) > > #define IB_LR_COMPMASK_TO_LID (CL_HTON64(((uint64_t)1)<<3)) > 2141,2143c2267,2270 > < #define IB_VLA_COMPMASK_LID (CL_HTON64(1<<0)) > < #define IB_VLA_COMPMASK_OUT_PORT (CL_HTON64(1<<1)) > < #define IB_VLA_COMPMASK_BLOCK (CL_HTON64(1<<2)) > --- > > #define IB_VLA_COMPMASK_LID (CL_HTON64(((uint64_t)1)<<0)) > > #define IB_VLA_COMPMASK_OUT_PORT (CL_HTON64(((uint64_t)1)<<1)) > > #define IB_VLA_COMPMASK_BLOCK (CL_HTON64(((uint64_t)1)<<2)) > > > 2145,2166c2272,2301 > < #define IB_SLVL_COMPMASK_LID (CL_HTON64(1<<0)) > < #define IB_SLVL_COMPMASK_IN_PORT (CL_HTON64(1<<1)) > < #define IB_SLVL_COMPMASK_OUT_PORT (CL_HTON64(1<<2)) > < /* LFT Record MASKS */ > < #define IB_LFTR_COMPMASK_LID (CL_HTON64(1<<0)) > < #define IB_LFTR_COMPMASK_BLOCK (CL_HTON64(1<<1)) > < /* ModeInfo Record MASKS */ > < #define IB_NR_COMPMASK_LID (CL_HTON64(1<<0)) > < #define IB_NR_COMPMASK_RESERVED1 (CL_HTON64(1<<1)) > < #define IB_NR_COMPMASK_BASEVERSION (CL_HTON64(1<<2)) > < #define IB_NR_COMPMASK_CLASSVERSION (CL_HTON64(1<<3)) > < #define 
IB_NR_COMPMASK_NODETYPE (CL_HTON64(1<<4)) > < #define IB_NR_COMPMASK_NUMPORTS (CL_HTON64(1<<5)) > < #define IB_NR_COMPMASK_SYSIMAGEGUID (CL_HTON64(1<<6)) > < #define IB_NR_COMPMASK_NODEGUID (CL_HTON64(1<<7)) > < #define IB_NR_COMPMASK_PORTGUID (CL_HTON64(1<<8)) > < #define IB_NR_COMPMASK_PARTCAP (CL_HTON64(1<<9)) > < #define IB_NR_COMPMASK_DEVID (CL_HTON64(1<<10)) > < #define IB_NR_COMPMASK_REV (CL_HTON64(1<<11)) > < #define IB_NR_COMPMASK_PORTNUM (CL_HTON64(1<<12)) > < #define IB_NR_COMPMASK_VENDID (CL_HTON64(1<<13)) > < #define IB_NR_COMPMASK_NODEDESC (CL_HTON64(1<<14)) > --- > > #define IB_SLVL_COMPMASK_LID (CL_HTON64(((uint64_t)1)<<0)) > > #define IB_SLVL_COMPMASK_IN_PORT (CL_HTON64(((uint64_t)1)<<1)) > > #define IB_SLVL_COMPMASK_OUT_PORT (CL_HTON64(((uint64_t)1)<<2)) > > > > /* P_Key Table Record Masks */ > > #define IB_PKEY_COMPMASK_LID (CL_HTON64(((uint64_t)1)<<0)) > > #define IB_PKEY_COMPMASK_BLOCK (CL_HTON64(((uint64_t)1)<<1)) > > #define IB_PKEY_COMPMASK_PORT (CL_HTON64(((uint64_t)1)<<2)) > > > > /* LFT Record Masks */ > > #define IB_LFTR_COMPMASK_LID (CL_HTON64(((uint64_t)1)<<0)) > > #define IB_LFTR_COMPMASK_BLOCK (CL_HTON64(((uint64_t)1)<<1)) > > > > /* NodeInfo Record Masks */ > > #define IB_NR_COMPMASK_LID (CL_HTON64(((uint64_t)1)<<0)) > > #define IB_NR_COMPMASK_RESERVED1 (CL_HTON64(((uint64_t)1)<<1)) > > #define IB_NR_COMPMASK_BASEVERSION (CL_HTON64(((uint64_t)1)<<2)) > > #define IB_NR_COMPMASK_CLASSVERSION (CL_HTON64(((uint64_t)1)<<3)) > > #define IB_NR_COMPMASK_NODETYPE (CL_HTON64(((uint64_t)1)<<4)) > > #define IB_NR_COMPMASK_NUMPORTS (CL_HTON64(((uint64_t)1)<<5)) > > #define IB_NR_COMPMASK_SYSIMAGEGUID (CL_HTON64(((uint64_t)1)<<6)) > > #define IB_NR_COMPMASK_NODEGUID (CL_HTON64(((uint64_t)1)<<7)) > > #define IB_NR_COMPMASK_PORTGUID (CL_HTON64(((uint64_t)1)<<8)) > > #define IB_NR_COMPMASK_PARTCAP (CL_HTON64(((uint64_t)1)<<9)) > > #define IB_NR_COMPMASK_DEVID (CL_HTON64(((uint64_t)1)<<10)) > > #define IB_NR_COMPMASK_REV 
(CL_HTON64(((uint64_t)1)<<11)) > > #define IB_NR_COMPMASK_PORTNUM (CL_HTON64(((uint64_t)1)<<12)) > > #define IB_NR_COMPMASK_VENDID (CL_HTON64(((uint64_t)1)<<13)) > > #define IB_NR_COMPMASK_NODEDESC (CL_HTON64(((uint64_t)1)<<14)) > > > 2168,2199c2303,2334 > < #define IB_SR_COMPMASK_SID (CL_HTON64(1<<0)) > < #define IB_SR_COMPMASK_SGID (CL_HTON64(1<<1)) > < #define IB_SR_COMPMASK_SPKEY (CL_HTON64(1<<2)) > < #define IB_SR_COMPMASK_RES1 (CL_HTON64(1<<3)) > < #define IB_SR_COMPMASK_SLEASE (CL_HTON64(1<<4)) > < #define IB_SR_COMPMASK_SKEY (CL_HTON64(1<<5)) > < #define IB_SR_COMPMASK_SNAME (CL_HTON64(1<<6)) > < #define IB_SR_COMPMASK_SDATA8_0 (CL_HTON64(1<<7)) > < #define IB_SR_COMPMASK_SDATA8_1 (CL_HTON64(1<<8)) > < #define IB_SR_COMPMASK_SDATA8_2 (CL_HTON64(1<<9)) > < #define IB_SR_COMPMASK_SDATA8_3 (CL_HTON64(1<<10)) > < #define IB_SR_COMPMASK_SDATA8_4 (CL_HTON64(1<<11)) > < #define IB_SR_COMPMASK_SDATA8_5 (CL_HTON64(1<<12)) > < #define IB_SR_COMPMASK_SDATA8_6 (CL_HTON64(1<<13)) > < #define IB_SR_COMPMASK_SDATA8_7 (CL_HTON64(1<<14)) > < #define IB_SR_COMPMASK_SDATA8_8 (CL_HTON64(1<<15)) > < #define IB_SR_COMPMASK_SDATA8_9 (CL_HTON64(1<<16)) > < #define IB_SR_COMPMASK_SDATA8_10 (CL_HTON64(1<<17)) > < #define IB_SR_COMPMASK_SDATA8_11 (CL_HTON64(1<<18)) > < #define IB_SR_COMPMASK_SDATA8_12 (CL_HTON64(1<<19)) > < #define IB_SR_COMPMASK_SDATA8_13 (CL_HTON64(1<<20)) > < #define IB_SR_COMPMASK_SDATA8_14 (CL_HTON64(1<<21)) > < #define IB_SR_COMPMASK_SDATA8_15 (CL_HTON64(1<<22)) > < #define IB_SR_COMPMASK_SDATA16_0 (CL_HTON64(1<<23)) > < #define IB_SR_COMPMASK_SDATA16_1 (CL_HTON64(1<<24)) > < #define IB_SR_COMPMASK_SDATA16_2 (CL_HTON64(1<<25)) > < #define IB_SR_COMPMASK_SDATA16_3 (CL_HTON64(1<<26)) > < #define IB_SR_COMPMASK_SDATA16_4 (CL_HTON64(1<<27)) > < #define IB_SR_COMPMASK_SDATA16_5 (CL_HTON64(1<<28)) > < #define IB_SR_COMPMASK_SDATA16_6 (CL_HTON64(1<<29)) > < #define IB_SR_COMPMASK_SDATA16_7 (CL_HTON64(1<<30)) > < #define IB_SR_COMPMASK_SDATA32_0 (CL_HTON64(1<<31)) > 
--- > > #define IB_SR_COMPMASK_SID (CL_HTON64(((uint64_t)1)<<0)) > > #define IB_SR_COMPMASK_SGID (CL_HTON64(((uint64_t)1)<<1)) > > #define IB_SR_COMPMASK_SPKEY (CL_HTON64(((uint64_t)1)<<2)) > > #define IB_SR_COMPMASK_RES1 (CL_HTON64(((uint64_t)1)<<3)) > > #define IB_SR_COMPMASK_SLEASE (CL_HTON64(((uint64_t)1)<<4)) > > #define IB_SR_COMPMASK_SKEY (CL_HTON64(((uint64_t)1)<<5)) > > #define IB_SR_COMPMASK_SNAME (CL_HTON64(((uint64_t)1)<<6)) > > #define IB_SR_COMPMASK_SDATA8_0 (CL_HTON64(((uint64_t)1)<<7)) > > #define IB_SR_COMPMASK_SDATA8_1 (CL_HTON64(((uint64_t)1)<<8)) > > #define IB_SR_COMPMASK_SDATA8_2 (CL_HTON64(((uint64_t)1)<<9)) > > #define IB_SR_COMPMASK_SDATA8_3 (CL_HTON64(((uint64_t)1)<<10)) > > #define IB_SR_COMPMASK_SDATA8_4 (CL_HTON64(((uint64_t)1)<<11)) > > #define IB_SR_COMPMASK_SDATA8_5 (CL_HTON64(((uint64_t)1)<<12)) > > #define IB_SR_COMPMASK_SDATA8_6 (CL_HTON64(((uint64_t)1)<<13)) > > #define IB_SR_COMPMASK_SDATA8_7 (CL_HTON64(((uint64_t)1)<<14)) > > #define IB_SR_COMPMASK_SDATA8_8 (CL_HTON64(((uint64_t)1)<<15)) > > #define IB_SR_COMPMASK_SDATA8_9 (CL_HTON64(((uint64_t)1)<<16)) > > #define IB_SR_COMPMASK_SDATA8_10 (CL_HTON64(((uint64_t)1)<<17)) > > #define IB_SR_COMPMASK_SDATA8_11 (CL_HTON64(((uint64_t)1)<<18)) > > #define IB_SR_COMPMASK_SDATA8_12 (CL_HTON64(((uint64_t)1)<<19)) > > #define IB_SR_COMPMASK_SDATA8_13 (CL_HTON64(((uint64_t)1)<<20)) > > #define IB_SR_COMPMASK_SDATA8_14 (CL_HTON64(((uint64_t)1)<<21)) > > #define IB_SR_COMPMASK_SDATA8_15 (CL_HTON64(((uint64_t)1)<<22)) > > #define IB_SR_COMPMASK_SDATA16_0 (CL_HTON64(((uint64_t)1)<<23)) > > #define IB_SR_COMPMASK_SDATA16_1 (CL_HTON64(((uint64_t)1)<<24)) > > #define IB_SR_COMPMASK_SDATA16_2 (CL_HTON64(((uint64_t)1)<<25)) > > #define IB_SR_COMPMASK_SDATA16_3 (CL_HTON64(((uint64_t)1)<<26)) > > #define IB_SR_COMPMASK_SDATA16_4 (CL_HTON64(((uint64_t)1)<<27)) > > #define IB_SR_COMPMASK_SDATA16_5 (CL_HTON64(((uint64_t)1)<<28)) > > #define IB_SR_COMPMASK_SDATA16_6 (CL_HTON64(((uint64_t)1)<<29)) > > 
#define IB_SR_COMPMASK_SDATA16_7 (CL_HTON64(((uint64_t)1)<<30)) > > #define IB_SR_COMPMASK_SDATA32_0 (CL_HTON64(((uint64_t)1)<<31)) > 2204a2340 > > > 2206,2253c2342,2391 > < #define IB_PIR_COMPMASK_LID (CL_HTON64(1<<0)) > < #define IB_PIR_COMPMASK_PORTNUM (CL_HTON64(1<<1)) > < #define IB_PIR_COMPMASK_RESV1 (CL_HTON64(1<<2)) > < #define IB_PIR_COMPMASK_MKEY (CL_HTON64(1<<3)) > < #define IB_PIR_COMPMASK_GIDPRE (CL_HTON64(1<<4)) > < #define IB_PIR_COMPMASK_BASELID (CL_HTON64(1<<5)) > < #define IB_PIR_COMPMASK_SMLID (CL_HTON64(1<<6)) > < #define IB_PIR_COMPMASK_CAPMASK (CL_HTON64(1<<7)) > < #define IB_PIR_COMPMASK_DIAGCODE (CL_HTON64(1<<8)) > < #define IB_PIR_COMPMASK_MKEYLEASEPRD (CL_HTON64(1<<9)) > < #define IB_PIR_COMPMASK_LOCALPORTNUM (CL_HTON64(1<<10)) > < #define IB_PIR_COMPMASK_LNKWIDTHSUPPORT (CL_HTON64(1<<11)) > < #define IB_PIR_COMPMASK_LNKWIDTHACTIVE (CL_HTON64(1<<12)) > < #define IB_PIR_COMPMASK_LINKWIDTHENABLED (CL_HTON64(1<<13)) > < #define IB_PIR_COMPMASK_LNKSPEEDSUPPORT (CL_HTON64(1<<14)) > < #define IB_PIR_COMPMASK_PORTSTATE (CL_HTON64(1<<15)) > < #define IB_PIR_COMPMASK_PORTPHYSTATE (CL_HTON64(1<<16)) > < #define IB_PIR_COMPMASK_LINKDWNDFLTSTATE (CL_HTON64(1<<17)) > < #define IB_PIR_COMPMASK_MKEYPROTBITS (CL_HTON64(1<<18)) > < #define IB_PIR_COMPMASK_LMC (CL_HTON64(1<<19)) > < #define IB_PIR_COMPMASK_LINKSPEEDACTIVE (CL_HTON64(1<<20)) > < #define IB_PIR_COMPMASK_LINKSPEEDENABLE (CL_HTON64(1<<21)) > < #define IB_PIR_COMPMASK_NEIGHBORMTU (CL_HTON64(1<<22)) > < #define IB_PIR_COMPMASK_MASTERSMSL (CL_HTON64(1<<23)) > < #define IB_PIR_COMPMASK_VLCAP (CL_HTON64(1<<24)) > < #define IB_PIR_COMPMASK_INITTYPE (CL_HTON64(1<<25)) > < #define IB_PIR_COMPMASK_VLHIGHLIMIT (CL_HTON64(1<<26)) > < #define IB_PIR_COMPMASK_VLARBHIGHCAP (CL_HTON64(1<<27)) > < #define IB_PIR_COMPMASK_VLARBLOWCAP (CL_HTON64(1<<28)) > < #define IB_PIR_COMPMASK_INITTYPEREPLY (CL_HTON64(1<<29)) > < #define IB_PIR_COMPMASK_MTUCAP (CL_HTON64(1<<30)) > < #define IB_PIR_COMPMASK_VLSTALLCNT 
(CL_HTON64(1<<31)) > < #define IB_PIR_COMPMASK_HOQLIFE (CL_HTON64(((uint64_t)1)<<32)) > < #define IB_PIR_COMPMASK_OPVLS (CL_HTON64(((uint64_t)1)<<33)) > < #define IB_PIR_COMPMASK_PARENFIN (CL_HTON64(((uint64_t)1)<<34)) > < #define IB_PIR_COMPMASK_PARENFOUT (CL_HTON64(((uint64_t)1)<<35)) > < #define IB_PIR_COMPMASK_FILTERRAWIN (CL_HTON64(((uint64_t)1)<<36)) > < #define IB_PIR_COMPMASK_FILTERRAWOUT (CL_HTON64(((uint64_t)1)<<37)) > < #define IB_PIR_COMPMASK_MKEYVIO (CL_HTON64(((uint64_t)1)<<38)) > < #define IB_PIR_COMPMASK_PKEYVIO (CL_HTON64(((uint64_t)1)<<39)) > < #define IB_PIR_COMPMASK_QKEYVIO (CL_HTON64(((uint64_t)1)<<40)) > < #define IB_PIR_COMPMASK_GUIDCAP (CL_HTON64(((uint64_t)1)<<41)) > < #define IB_PIR_COMPMASK_RESV2 (CL_HTON64(((uint64_t)1)<<42)) > < #define IB_PIR_COMPMASK_SUBNTO (CL_HTON64(((uint64_t)1)<<43)) > < #define IB_PIR_COMPMASK_RESV3 (CL_HTON64(((uint64_t)1)<<44)) > < #define IB_PIR_COMPMASK_RESPTIME (CL_HTON64(((uint64_t)1)<<45)) > < #define IB_PIR_COMPMASK_LOCALPHYERR (CL_HTON64(((uint64_t)1)<<46)) > < #define IB_PIR_COMPMASK_OVERRUNERR (CL_HTON64(((uint64_t)1)<<47)) > --- > > #define IB_PIR_COMPMASK_LID (CL_HTON64(((uint64_t)1)<<0)) > > #define IB_PIR_COMPMASK_PORTNUM (CL_HTON64(((uint64_t)1)<<1)) > > #define IB_PIR_COMPMASK_RESV1 (CL_HTON64(((uint64_t)1)<<2)) > > #define IB_PIR_COMPMASK_MKEY (CL_HTON64(((uint64_t)1)<<3)) > > #define IB_PIR_COMPMASK_GIDPRE (CL_HTON64(((uint64_t)1)<<4)) > > #define IB_PIR_COMPMASK_BASELID (CL_HTON64(((uint64_t)1)<<5)) > > #define IB_PIR_COMPMASK_SMLID (CL_HTON64(((uint64_t)1)<<6)) > > #define IB_PIR_COMPMASK_CAPMASK (CL_HTON64(((uint64_t)1)<<7)) > > #define IB_PIR_COMPMASK_DIAGCODE (CL_HTON64(((uint64_t)1)<<8)) > > #define IB_PIR_COMPMASK_MKEYLEASEPRD (CL_HTON64(((uint64_t)1)<<9)) > > #define IB_PIR_COMPMASK_LOCALPORTNUM (CL_HTON64(((uint64_t)1)<<10)) > > #define IB_PIR_COMPMASK_LINKWIDTHENABLED (CL_HTON64(((uint64_t)1)<<11)) > > #define IB_PIR_COMPMASK_LNKWIDTHSUPPORT (CL_HTON64(((uint64_t)1)<<12)) > > #define 
IB_PIR_COMPMASK_LNKWIDTHACTIVE (CL_HTON64(((uint64_t)1)<<13)) > > #define IB_PIR_COMPMASK_LNKSPEEDSUPPORT (CL_HTON64(((uint64_t)1)<<14)) > > #define IB_PIR_COMPMASK_PORTSTATE (CL_HTON64(((uint64_t)1)<<15)) > > #define IB_PIR_COMPMASK_PORTPHYSTATE (CL_HTON64(((uint64_t)1)<<16)) > > #define IB_PIR_COMPMASK_LINKDWNDFLTSTATE (CL_HTON64(((uint64_t)1)<<17)) > > #define IB_PIR_COMPMASK_MKEYPROTBITS (CL_HTON64(((uint64_t)1)<<18)) > > #define IB_PIR_COMPMASK_RESV2 (CL_HTON64(((uint64_t)1)<<19)) > > #define IB_PIR_COMPMASK_LMC (CL_HTON64(((uint64_t)1)<<20)) > > #define IB_PIR_COMPMASK_LINKSPEEDACTIVE (CL_HTON64(((uint64_t)1)<<21)) > > #define IB_PIR_COMPMASK_LINKSPEEDENABLE (CL_HTON64(((uint64_t)1)<<22)) > > #define IB_PIR_COMPMASK_NEIGHBORMTU (CL_HTON64(((uint64_t)1)<<23)) > > #define IB_PIR_COMPMASK_MASTERSMSL (CL_HTON64(((uint64_t)1)<<24)) > > #define IB_PIR_COMPMASK_VLCAP (CL_HTON64(((uint64_t)1)<<25)) > > #define IB_PIR_COMPMASK_INITTYPE (CL_HTON64(((uint64_t)1)<<26)) > > #define IB_PIR_COMPMASK_VLHIGHLIMIT (CL_HTON64(((uint64_t)1)<<27)) > > #define IB_PIR_COMPMASK_VLARBHIGHCAP (CL_HTON64(((uint64_t)1)<<28)) > > #define IB_PIR_COMPMASK_VLARBLOWCAP (CL_HTON64(((uint64_t)1)<<29)) > > #define IB_PIR_COMPMASK_INITTYPEREPLY (CL_HTON64(((uint64_t)1)<<30)) > > #define IB_PIR_COMPMASK_MTUCAP (CL_HTON64(((uint64_t)1)<<31)) > > #define IB_PIR_COMPMASK_VLSTALLCNT (CL_HTON64(((uint64_t)1)<<32)) > > #define IB_PIR_COMPMASK_HOQLIFE (CL_HTON64(((uint64_t)1)<<33)) > > #define IB_PIR_COMPMASK_OPVLS (CL_HTON64(((uint64_t)1)<<34)) > > #define IB_PIR_COMPMASK_PARENFIN (CL_HTON64(((uint64_t)1)<<35)) > > #define IB_PIR_COMPMASK_PARENFOUT (CL_HTON64(((uint64_t)1)<<36)) > > #define IB_PIR_COMPMASK_FILTERRAWIN (CL_HTON64(((uint64_t)1)<<37)) > > #define IB_PIR_COMPMASK_FILTERRAWOUT (CL_HTON64(((uint64_t)1)<<38)) > > #define IB_PIR_COMPMASK_MKEYVIO (CL_HTON64(((uint64_t)1)<<39)) > > #define IB_PIR_COMPMASK_PKEYVIO (CL_HTON64(((uint64_t)1)<<40)) > > #define IB_PIR_COMPMASK_QKEYVIO 
(CL_HTON64(((uint64_t)1)<<41)) > > #define IB_PIR_COMPMASK_GUIDCAP (CL_HTON64(((uint64_t)1)<<42)) > > #define IB_PIR_COMPMASK_RESV3 (CL_HTON64(((uint64_t)1)<<43)) > > #define IB_PIR_COMPMASK_SUBNTO (CL_HTON64(((uint64_t)1)<<44)) > > #define IB_PIR_COMPMASK_RESV4 (CL_HTON64(((uint64_t)1)<<45)) > > #define IB_PIR_COMPMASK_RESPTIME (CL_HTON64(((uint64_t)1)<<46)) > > #define IB_PIR_COMPMASK_LOCALPHYERR (CL_HTON64(((uint64_t)1)<<47)) > > #define IB_PIR_COMPMASK_OVERRUNERR (CL_HTON64(((uint64_t)1)<<48)) > > > 2255,2273c2393,2449 > < #define IB_MCR_COMPMASK_GID (CL_HTON64(1<<0)) > < #define IB_MCR_COMPMASK_MGID (CL_HTON64(1<<0)) > < #define IB_MCR_COMPMASK_PORT_GID (CL_HTON64(1<<1)) > < #define IB_MCR_COMPMASK_QKEY (CL_HTON64(1<<2)) > < #define IB_MCR_COMPMASK_MLID (CL_HTON64(1<<3)) > < #define IB_MCR_COMPMASK_MTU_SEL (CL_HTON64(1<<4)) > < #define IB_MCR_COMPMASK_MTU (CL_HTON64(1<<5)) > < #define IB_MCR_COMPMASK_TCLASS (CL_HTON64(1<<6)) > < #define IB_MCR_COMPMASK_PKEY (CL_HTON64(1<<7)) > < #define IB_MCR_COMPMASK_RATE_SEL (CL_HTON64(1<<8)) > < #define IB_MCR_COMPMASK_RATE (CL_HTON64(1<<9)) > < #define IB_MCR_COMPMASK_LIFE_SEL (CL_HTON64(1<<10)) > < #define IB_MCR_COMPMASK_LIFE (CL_HTON64(1<<11)) > < #define IB_MCR_COMPMASK_SL (CL_HTON64(1<<12)) > < #define IB_MCR_COMPMASK_FLOW (CL_HTON64(1<<13)) > < #define IB_MCR_COMPMASK_HOP (CL_HTON64(1<<14)) > < #define IB_MCR_COMPMASK_SCOPE (CL_HTON64(1<<15)) > < #define IB_MCR_COMPMASK_JOIN_STATE (CL_HTON64(1<<16)) > < #define IB_MCR_COMPMASK_PROXY (CL_HTON64(1<<17)) > --- > > #define IB_MCR_COMPMASK_GID (CL_HTON64(((uint64_t)1)<<0)) > > #define IB_MCR_COMPMASK_MGID (CL_HTON64(((uint64_t)1)<<0)) > > #define IB_MCR_COMPMASK_PORT_GID (CL_HTON64(((uint64_t)1)<<1)) > > #define IB_MCR_COMPMASK_QKEY (CL_HTON64(((uint64_t)1)<<2)) > > #define IB_MCR_COMPMASK_MLID (CL_HTON64(((uint64_t)1)<<3)) > > #define IB_MCR_COMPMASK_MTU_SEL (CL_HTON64(((uint64_t)1)<<4)) > > #define IB_MCR_COMPMASK_MTU (CL_HTON64(((uint64_t)1)<<5)) > > #define 
IB_MCR_COMPMASK_TCLASS (CL_HTON64(((uint64_t)1)<<6)) > > #define IB_MCR_COMPMASK_PKEY (CL_HTON64(((uint64_t)1)<<7)) > > #define IB_MCR_COMPMASK_RATE_SEL (CL_HTON64(((uint64_t)1)<<8)) > > #define IB_MCR_COMPMASK_RATE (CL_HTON64(((uint64_t)1)<<9)) > > #define IB_MCR_COMPMASK_LIFE_SEL (CL_HTON64(((uint64_t)1)<<10)) > > #define IB_MCR_COMPMASK_LIFE (CL_HTON64(((uint64_t)1)<<11)) > > #define IB_MCR_COMPMASK_SL (CL_HTON64(((uint64_t)1)<<12)) > > #define IB_MCR_COMPMASK_FLOW (CL_HTON64(((uint64_t)1)<<13)) > > #define IB_MCR_COMPMASK_HOP (CL_HTON64(((uint64_t)1)<<14)) > > #define IB_MCR_COMPMASK_SCOPE (CL_HTON64(((uint64_t)1)<<15)) > > #define IB_MCR_COMPMASK_JOIN_STATE (CL_HTON64(((uint64_t)1)<<16)) > > #define IB_MCR_COMPMASK_PROXY (CL_HTON64(((uint64_t)1)<<17)) > > > > /* GUID Info Record Component Masks */ > > #define IB_GIR_COMPMASK_LID (CL_HTON64(((uint64_t)1)<<0)) > > #define IB_GIR_COMPMASK_BLOCKNUM (CL_HTON64(((uint64_t)1)<<1)) > > #define IB_GIR_COMPMASK_RESV1 (CL_HTON64(((uint64_t)1)<<2)) > > #define IB_GIR_COMPMASK_RESV2 (CL_HTON64(((uint64_t)1)<<3)) > > #define IB_GIR_COMPMASK_GID0 (CL_HTON64(((uint64_t)1)<<4)) > > #define IB_GIR_COMPMASK_GID1 (CL_HTON64(((uint64_t)1)<<5)) > > #define IB_GIR_COMPMASK_GID2 (CL_HTON64(((uint64_t)1)<<6)) > > #define IB_GIR_COMPMASK_GID3 (CL_HTON64(((uint64_t)1)<<7)) > > #define IB_GIR_COMPMASK_GID4 (CL_HTON64(((uint64_t)1)<<8)) > > #define IB_GIR_COMPMASK_GID5 (CL_HTON64(((uint64_t)1)<<9)) > > #define IB_GIR_COMPMASK_GID6 (CL_HTON64(((uint64_t)1)<<10)) > > #define IB_GIR_COMPMASK_GID7 (CL_HTON64(((uint64_t)1)<<11)) > > > > /* MultiPath Record Component Masks */ > > #define IB_MPR_COMPMASK_RAWTRAFFIC (CL_HTON64(((uint64_t)1)<<0)) > > #define IB_MPR_COMPMASK_RESV0 (CL_HTON64(((uint64_t)1)<<1)) > > #define IB_MPR_COMPMASK_FLOWLABEL (CL_HTON64(((uint64_t)1)<<2)) > > #define IB_MPR_COMPMASK_HOPLIMIT (CL_HTON64(((uint64_t)1)<<3)) > > #define IB_MPR_COMPMASK_TCLASS (CL_HTON64(((uint64_t)1)<<4)) > > #define IB_MPR_COMPMASK_REVERSIBLE 
(CL_HTON64(((uint64_t)1)<<5)) > > #define IB_MPR_COMPMASK_NUMBPATH (CL_HTON64(((uint64_t)1)<<6)) > > #define IB_MPR_COMPMASK_PKEY (CL_HTON64(((uint64_t)1)<<7)) > > #define IB_MPR_COMPMASK_RESV1 (CL_HTON64(((uint64_t)1)<<8)) > > #define IB_MPR_COMPMASK_SL (CL_HTON64(((uint64_t)1)<<9)) > > #define IB_MPR_COMPMASK_MTUSELEC (CL_HTON64(((uint64_t)1)<<10)) > > #define IB_MPR_COMPMASK_MTU (CL_HTON64(((uint64_t)1)<<11)) > > #define IB_MPR_COMPMASK_RATESELEC (CL_HTON64(((uint64_t)1)<<12)) > > #define IB_MPR_COMPMASK_RATE (CL_HTON64(((uint64_t)1)<<13)) > > #define IB_MPR_COMPMASK_PKTLIFETIMESELEC (CL_HTON64(((uint64_t)1)<<14)) > > #define IB_MPR_COMPMASK_PKTLIFETIME (CL_HTON64(((uint64_t)1)<<15)) > > #define IB_MPR_COMPMASK_RESV2 (CL_HTON64(((uint64_t)1)<<16)) > > #define IB_MPR_COMPMASK_INDEPSELEC (CL_HTON64(((uint64_t)1)<<17)) > > #define IB_MPR_COMPMASK_RESV3 (CL_HTON64(((uint64_t)1)<<18)) > > #define IB_MPR_COMPMASK_SGIDCOUNT (CL_HTON64(((uint64_t)1)<<19)) > > #define IB_MPR_COMPMASK_DGIDCOUNT (CL_HTON64(((uint64_t)1)<<20)) > > #define IB_MPR_COMPMASK_RESV4 (CL_HTON64(((uint64_t)1)<<21)) > 2284c2460 > < AL_INLINE void AL_API > --- > > static inline void > 2287,2300c2463,2476 > < IN const ib_gid_t* const p_dgid, > < IN const ib_gid_t* const p_sgid, > < IN const ib_net16_t dlid, > < IN const ib_net16_t slid, > < IN const uint8_t num_path, > < IN const ib_net16_t pkey, > < IN const uint8_t sl, > < IN const uint8_t mtu_selector, > < IN const uint8_t mtu, > < IN const uint8_t rate_selector, > < IN const uint8_t rate, > < IN const uint8_t pkt_life_selector, > < IN const uint8_t pkt_life, > < IN const uint8_t preference ) > --- > > IN ib_gid_t* const p_dgid, > > IN ib_gid_t* const p_sgid, > > IN ib_net16_t dlid, > > IN ib_net16_t slid, > > IN uint8_t num_path, > > IN ib_net16_t pkey, > > IN uint8_t sl, > > IN uint8_t mtu_selector, > > IN uint8_t mtu, > > IN uint8_t rate_selector, > > IN uint8_t rate, > > IN uint8_t pkt_life_selector, > > IN uint8_t pkt_life, > > IN uint8_t 
preference ) > 2319c2495 > < p_rec->hop_flow_raw.val = 0; > --- > > p_rec->hop_flow_raw = 0; > 2322,2324c2498,2500 > < p_rec->resv0 = 0; > < p_rec->resv1 = 0; > < p_rec->resv2 = 0; > --- > > *((uint64_t*)p_rec->resv0) = 0; > > *((uint32_t*)p_rec->resv2) = 0; > > *((uint16_t*)p_rec->resv2 + 2) = 0; > 2344c2520,2521 > < * [in] In queries, maximum number of paths to return. > --- > > * [in] Reversible path - 1 bit to say if path is reversible. > > * num_path [6:0] In queries, maximum number of paths to return. > 2383a2561,2589 > > /****f* IBA Base: Types/ib_path_rec_num_path > > * NAME > > * ib_path_rec_num_path > > * > > * DESCRIPTION > > * Get max number of paths to return. > > * > > * SYNOPSIS > > */ > > static inline uint8_t > > ib_path_rec_num_path( > > IN const ib_path_rec_t* const p_rec ) > > { > > return( p_rec->num_path &0x7F ); > > } > > /* > > * PARAMETERS > > * p_rec > > * [in] Pointer to the path record object. > > * > > * RETURN VALUES > > * Maximum number of paths to return for each unique SGID_DGID combination. > > * > > * NOTES > > * > > * SEE ALSO > > * ib_path_rec_t > > *********/ > > > 2393c2599 > < AL_INLINE uint8_t AL_API > --- > > static inline uint8_t > 2405,2411c2611 > < * Encoded path MTU. > < * 1: 256 > < * 2: 512 > < * 3: 1024 > < * 4: 2048 > < * 5: 4096 > < * others: reserved > --- > > * SL. > 2428c2628 > < AL_INLINE uint8_t AL_API > --- > > static inline uint8_t > 2463c2663 > < AL_INLINE uint8_t AL_API > --- > > static inline uint8_t > 2496c2696 > < AL_INLINE uint8_t AL_API > --- > > static inline uint8_t > 2511a2712,2717 > > * 5: 5 Gb/sec. > > * 6: 20 Gb/sec. > > * 7: 40 Gb/sec. > > * 8: 60 Gb/sec. > > * 9: 80 Gb/sec. > > * 10: 120 Gb/sec. > 2529c2735 > < AL_INLINE uint8_t AL_API > --- > > static inline uint8_t > 2562c2768 > < AL_INLINE uint8_t AL_API > --- > > static inline uint8_t > 2574c2780 > < * Encoded path pkt_life = 4.096 µsec * 2 PacketLifeTime. > --- > > * Encoded path pkt_life = 4.096 µsec * 2 ** PacketLifeTime. 
> 2591c2797 > < AL_INLINE uint8_t AL_API > --- > > static inline uint8_t > 2624c2830 > < AL_INLINE net32_t AL_API > --- > > static inline uint32_t > 2628c2834 > < return( cl_hton32( (cl_ntoh32(p_rec->hop_flow_raw.val) >> 8) & 0x000FFFFF ) ); > --- > > return( ((cl_ntoh32(p_rec->hop_flow_raw) >> 8) & 0x000FFFFF) ); > 2653c2859 > < AL_INLINE uint8_t AL_API > --- > > static inline uint8_t > 2657c2863 > < return( p_rec->hop_flow_raw.bytes[3] ); > --- > > return( (uint8_t)(p_rec->hop_flow_raw & 0x000000FF ) ); > 2673,2714d2878 > < /****f* IBA Base: Types/ib_path_rec_set_hop_flow_raw > < * NAME > < * ib_path_rec_set_hop_flow_raw > < * > < * DESCRIPTION > < * Sets the hop limit, flow label, and raw traffic bits of a path record. > < * > < * SYNOPSIS > < */ > < AL_INLINE void AL_API > < ib_path_rec_set_hop_flow_raw( > < OUT ib_path_rec_t* const p_rec, > < IN const uint8_t hop_limit, > < IN const net32_t flow_lbl, > < IN const boolean_t raw ) > < { > < p_rec->hop_flow_raw.val = (cl_ntoh32( flow_lbl ) & 0x000FFFFF) << 8; > < if( raw ) > < p_rec->hop_flow_raw.val |= 0x80000000; > < p_rec->hop_flow_raw.val = cl_hton32( p_rec->hop_flow_raw.val ); > < p_rec->hop_flow_raw.bytes[3] = hop_limit; > < } > < /* > < * PARAMETERS > < * p_rec > < * Pointer to the path record whose hop limit, flow label, and rab > < * traffic fields to set. > < * > < * hop_limit > < * Hop limit to set in the path record. > < * > < * flow_lbl > < * Flow label, in network byte order, to set in the path record. > < * > < * raw > < * Boolean flag to indicate whether the path record is for raw traffic. 
> < * > < * SEE ALSO > < * ib_path_rec_t > < *********/ > < > < > 2776c2940,2941 > < ib_net32_t resp_time_val; > --- > > uint8_t reserved[3]; > > uint8_t resp_time_val; > 2894c3059 > < AL_INLINE uint8_t AL_API > --- > > static inline uint8_t > 2922c3087 > < AL_INLINE uint8_t AL_API > --- > > static inline uint8_t > 2963d3127 > < > 3041c3205 > < AL_INLINE void AL_API > --- > > static inline void > 3104c3268 > < AL_INLINE void AL_API > --- > > static inline void > 3149c3313 > < AL_INLINE boolean_t AL_API > --- > > static inline boolean_t > 3206c3370 > < AL_INLINE boolean_t AL_API > --- > > static inline boolean_t > 3231c3395 > < AL_INLINE void AL_API > --- > > static inline void > 3241c3405 > < AL_INLINE uint8_t AL_API > --- > > static inline uint8_t > 3259c3423,3424 > < #define IB_SMP_DIRECTION (CL_HTON16(0x8000)) > --- > > #define IB_SMP_DIRECTION_HO 0x8000 > > #define IB_SMP_DIRECTION (CL_HTON16(IB_SMP_DIRECTION_HO)) > 3271c3436,3437 > < #define IB_SMP_STATUS_MASK (CL_HTON16(0x7FFF)) > --- > > #define IB_SMP_STATUS_MASK_HO 0x7FFF > > #define IB_SMP_STATUS_MASK (CL_HTON16(IB_SMP_STATUS_MASK_HO)) > 3376c3542 > < AL_INLINE ib_net16_t AL_API > --- > > static inline ib_net16_t > 3405c3571 > < AL_INLINE boolean_t AL_API > --- > > static inline boolean_t > 3433c3599 > < AL_INLINE boolean_t AL_API > --- > > static inline boolean_t > 3466c3632 > < AL_INLINE void AL_API > --- > > static inline void > 3496c3662 > < cl_memclr( p_smp->resv1, > --- > > memset( p_smp->resv1, 0, > 3503c3669 > < cl_memcpy( &p_smp->initial_path, path_out, > --- > > memcpy( &p_smp->initial_path, path_out, > 3552c3718 > < AL_INLINE void* AL_API > --- > > static inline void* > 3645,3647c3811 > < > < > < AL_INLINE uint32_t AL_API > --- > > static inline uint32_t > 3654c3818 > < AL_INLINE ib_net16_t AL_API > --- > > static inline ib_net16_t > 3658,3660d3821 > < if( attr_size & 0x07 ) > < return( cl_hton16( (uint16_t)(attr_size >> 3) + 1 ) ); > < else > 3664d3824 > < > 3674c3834 > < AL_INLINE void* 
AL_API > --- > > static inline void* > 3711c3871 > < AL_INLINE uint8_t AL_API > --- > > static inline uint8_t > 3741c3901 > < AL_INLINE ib_net32_t AL_API > --- > > static inline ib_net32_t > 3810,3811c3970,3971 > < uint8_t state_info1; // LinkSpeedSupported and PortState > < uint8_t state_info2; // PortPhysState and LinkDownDefaultState > --- > > uint8_t state_info1; /* LinkSpeedSupported and PortState */ > > uint8_t state_info2; /* PortPhysState and LinkDownDefaultState */ > 3813c3973 > < uint8_t link_speed; // LinkSpeedEnabled and LinkSpeedActive > --- > > uint8_t link_speed; /* LinkSpeedEnabled and LinkSpeedActive */ > 3815c3975 > < uint8_t vl_cap; // VlCap and InitType > --- > > uint8_t vl_cap; /* VLCap and InitType */ > 3826c3986,3987 > < uint8_t subnet_timeout; > --- > > uint8_t subnet_timeout; /* cli_rereg(1b), resrv( > > 2b), timeout(5b) */ > 3835a3997 > > #define IB_PORT_LMC_MAX 0x07 > 3841a4004,4006 > > #define IB_PORT_PHYS_STATE_MASK 0xF0 > > #define IB_PORT_PHYS_STATE_SHIFT 4 > > #define IB_PORT_LNKDWNDFTSTATE_MASK 0x0F > 3854c4019 > < #define IB_PORT_CAP_HAS_SYS_GUID (CL_NTOH32(0x00000800)) > --- > > #define IB_PORT_CAP_HAS_SYS_IMG_GUID (CL_NTOH32(0x00000800)) > 3861c4026 > < #define IB_PORT_CAP_HAS_REINIT (CL_NTOH32(0x00040000)) > --- > > #define IB_PORT_CAP_REINIT (CL_NTOH32(0x00040000)) > 3867,3868c4032,4033 > < #define IB_PORT_CAP_HAS_LINK_TRIP (CL_NTOH32(0x01000000)) > < #define IB_PORT_CAP_HAS_CLI_REREG (CL_NTOH32(0x02000000)) > --- > > #define IB_PORT_CAP_HAS_LINK_RT_LATENCY (CL_NTOH32(0x01000000)) > > #define IB_PORT_CAP_HAS_CLIENT_REREG (CL_NTOH32(0x02000000)) > 3885c4050 > < AL_INLINE uint8_t AL_API > --- > > static inline uint8_t > 3912c4077 > < AL_INLINE void AL_API > --- > > static inline void > 3943c4108 > < AL_INLINE uint8_t AL_API > --- > > static inline uint8_t > 3960a4126,4152 > > /****f* IBA Base: Types/ib_port_info_get_init_type > > * NAME > > * ib_port_info_get_init_type > > * > > * DESCRIPTION > > * Gets the init type of a port. 
> > * > > * SYNOPSIS > > */ > > static inline uint8_t > > ib_port_info_get_init_type( > > IN const ib_port_info_t* const p_pi) > > { > > return (uint8_t) (p_pi->vl_cap & 0x0F); > > } > > /* > > * PARAMETERS > > * p_pi > > * [in] Pointer to a PortInfo attribute. > > * > > * RETURN VALUES > > * InitType field > > * > > * NOTES > > * > > * SEE ALSO > > *********/ > 3970c4162 > < AL_INLINE uint8_t AL_API > --- > > static inline uint8_t > 3997c4189 > < AL_INLINE void AL_API > --- > > static inline void > 4028c4220 > < AL_INLINE void AL_API > --- > > static inline void > 4057c4249 > < AL_INLINE uint8_t AL_API > --- > > static inline uint8_t > 4088c4280 > < AL_INLINE void AL_API > --- > > static inline void > 4114c4306 > < /****f* IBA Base: Types/ib_port_info_get_link_speed_active > --- > > /****f* IBA Base: Types/ib_port_info_get_port_phys_state > 4116c4308 > < * ib_port_info_get_link_speed_active > --- > > * ib_port_info_get_port_phys_state > 4119c4311 > < * Returns the Link Speed Active value assigned to this port. > --- > > * Returns the encoded value for the port physical state. > 4123,4124c4315,4316 > < AL_INLINE uint8_t AL_API > < ib_port_info_get_link_speed_active( > --- > > static inline uint8_t > > ib_port_info_get_port_phys_state( > 4127,4128c4319,4321 > < return( (uint8_t)((p_pi->link_speed & IB_PORT_LINK_SPEED_ACTIVE_MASK) >> > < IB_PORT_LINK_SPEED_SHIFT) ); > --- > > return( (uint8_t)((p_pi->state_info2 & > > IB_PORT_PHYS_STATE_MASK) >> > > IB_PORT_PHYS_STATE_SHIFT) ); > 4136c4329 > < * Returns the link speed active value assigned to this port. > --- > > * Returns the encoded value for the port physical state. > 4142a4336,4467 > > /****f* IBA Base: Types/ib_port_info_set_port_phys_state. 
> > * NAME > > * ib_port_info_set_port_phys_state > > * > > * DESCRIPTION > > * Given an integer of the port physical state, > > * Set the appropriate bits in state_info2 > > * > > * SYNOPSIS > > */ > > static inline void > > ib_port_info_set_port_phys_state( > > IN uint8_t const phys_state, > > IN ib_port_info_t* p_pi ) > > { > > p_pi->state_info2 = > > ( ~IB_PORT_PHYS_STATE_MASK & p_pi->state_info2 ) | > > ( IB_PORT_PHYS_STATE_MASK & > > (phys_state << IB_PORT_PHYS_STATE_SHIFT) ); > > } > > /* > > * PARAMETERS > > * phys_state > > * [in] port physical state. > > * > > * p_pi > > * [in] Pointer to a PortInfo attribute. > > * > > * RETURN VALUES > > * This function does not return a value. > > * > > * NOTES > > * > > * SEE ALSO > > *********/ > > > > /****f* IBA Base: Types/ib_port_info_get_link_down_def_state > > * NAME > > * ib_port_info_get_link_down_def_state > > * > > * DESCRIPTION > > * Returns the link down default state. > > * > > * SYNOPSIS > > */ > > static inline uint8_t > > ib_port_info_get_link_down_def_state( > > IN const ib_port_info_t* const p_pi ) > > { > > return( (uint8_t)(p_pi->state_info2 & IB_PORT_LNKDWNDFTSTATE_MASK) ); > > } > > /* > > * PARAMETERS > > * p_pi > > * [in] Pointer to a PortInfo attribute. > > * > > * RETURN VALUES > > * link down default state of the port. > > * > > * NOTES > > * > > * SEE ALSO > > *********/ > > > > /****f* IBA Base: Types/ib_port_info_set_link_down_def_state > > * NAME > > * ib_port_info_set_link_down_def_state > > * > > * DESCRIPTION > > * Sets the link down default state of the port. > > * > > * SYNOPSIS > > */ > > static inline void > > ib_port_info_set_link_down_def_state( > > IN ib_port_info_t* const p_pi, > > IN const uint8_t link_dwn_state ) > > { > > p_pi->state_info2 = (uint8_t)((p_pi->state_info2 & 0xF0) | link_dwn_state ); > > } > > /* > > * PARAMETERS > > * p_pi > > * [in] Pointer to a PortInfo attribute. > > * > > * link_dwn_state > > * [in] Link down default state of the port. 
> > * > > * RETURN VALUES > > * None. > > * > > * NOTES > > * > > * SEE ALSO > > *********/ > > > > /****f* IBA Base: Types/ib_port_info_get_link_speed_active > > * NAME > > * ib_port_info_get_link_speed_active > > * > > * DESCRIPTION > > * Returns the Link Speed Active value assigned to this port. > > * > > * SYNOPSIS > > */ > > static inline uint8_t > > ib_port_info_get_link_speed_active( > > IN const ib_port_info_t* const p_pi ) > > { > > return( (uint8_t)((p_pi->link_speed & > > IB_PORT_LINK_SPEED_ACTIVE_MASK) >> > > IB_PORT_LINK_SPEED_SHIFT) ); > > } > > /* > > * PARAMETERS > > * p_pi > > * [in] Pointer to a PortInfo attribute. > > * > > * RETURN VALUES > > * Returns the link speed active value assigned to this port. > > * > > * NOTES > > * > > * SEE ALSO > > *********/ > > > > #define IB_LINK_WIDTH_ACTIVE_1X 1 > > #define IB_LINK_WIDTH_ACTIVE_4X 2 > > #define IB_LINK_WIDTH_ACTIVE_12X 8 > > #define IB_LINK_SPEED_ACTIVE_2_5 1 > > #define IB_LINK_SPEED_ACTIVE_5 2 > > #define IB_LINK_SPEED_ACTIVE_10 4 > 4144,4151c4469 > < #define IB_LINK_WIDTH_ACTIVE_1X 1 > < #define IB_LINK_WIDTH_ACTIVE_4X 2 > < #define IB_LINK_WIDTH_ACTIVE_12X 8 > < > < #define IB_LINK_SPEED_ACTIVE_2_5 1 > < #define IB_LINK_SPEED_ACTIVE_5 2 > < #define IB_LINK_SPEED_ACTIVE_10 4 > < > --- > > /* following v1 ver1.2 p901 */ > 4161a4480,4481 > > #define IB_MIN_RATE IB_PATH_RECORD_RATE_2_5_GBS > > #define IB_MAX_RATE IB_PATH_RECORD_RATE_120_GBS > 4172c4492,4493 > < AL_INLINE uint8_t AL_API > --- > > > > static inline uint8_t > 4176,4177c4497,4502 > < switch( p_pi->link_width_active * p_pi->link_width_active * > < ib_port_info_get_link_speed_active( p_pi ) ) > --- > > uint8_t rate = 0; > > > > switch (ib_port_info_get_link_speed_active(p_pi)) > > { > > case IB_LINK_SPEED_ACTIVE_2_5: > > switch (p_pi->link_width_active) > 4179,4180c4504,4510 > < case 1: > < return IB_PATH_RECORD_RATE_2_5_GBS; > --- > > case IB_LINK_WIDTH_ACTIVE_1X: > > rate = IB_PATH_RECORD_RATE_2_5_GBS; > > break; > > > > case 
IB_LINK_WIDTH_ACTIVE_4X: > > rate = IB_PATH_RECORD_RATE_10_GBS; > > break; > 4182,4183c4512,4514 > < case 2: > < return IB_PATH_RECORD_RATE_5_GBS; > --- > > case IB_LINK_WIDTH_ACTIVE_12X: > > rate = IB_PATH_RECORD_RATE_30_GBS; > > break; > 4185,4186c4516,4526 > < case 4: > < return IB_PATH_RECORD_RATE_10_GBS; > --- > > default: > > rate = IB_PATH_RECORD_RATE_2_5_GBS; > > break; > > } > > break; > > case IB_LINK_SPEED_ACTIVE_5: > > switch (p_pi->link_width_active) > > { > > case IB_LINK_WIDTH_ACTIVE_1X: > > rate = IB_PATH_RECORD_RATE_5_GBS; > > break; > 4188,4189c4528,4530 > < case 8: > < return IB_PATH_RECORD_RATE_20_GBS; > --- > > case IB_LINK_WIDTH_ACTIVE_4X: > > rate = IB_PATH_RECORD_RATE_20_GBS; > > break; > 4191,4192c4532,4534 > < case 16: > < return IB_PATH_RECORD_RATE_40_GBS; > --- > > case IB_LINK_WIDTH_ACTIVE_12X: > > rate = IB_PATH_RECORD_RATE_60_GBS; > > break; > 4194,4195c4536,4546 > < case 64: > < return IB_PATH_RECORD_RATE_30_GBS; > --- > > default: > > rate = IB_PATH_RECORD_RATE_5_GBS; > > break; > > } > > break; > > case IB_LINK_SPEED_ACTIVE_10: > > switch (p_pi->link_width_active) > > { > > case IB_LINK_WIDTH_ACTIVE_1X: > > rate = IB_PATH_RECORD_RATE_10_GBS; > > break; > 4197,4198c4548,4550 > < case 128: > < return IB_PATH_RECORD_RATE_60_GBS; > --- > > case IB_LINK_WIDTH_ACTIVE_4X: > > rate = IB_PATH_RECORD_RATE_40_GBS; > > break; > 4200,4201c4552,4554 > < case 256: > < return IB_PATH_RECORD_RATE_120_GBS; > --- > > case IB_LINK_WIDTH_ACTIVE_12X: > > rate =IB_PATH_RECORD_RATE_120_GBS; > > break; > 4204c4557,4563 > < return IB_PATH_RECORD_RATE_2_5_GBS; > --- > > rate = IB_PATH_RECORD_RATE_10_GBS; > > break; > > } > > break; > > default: > > rate = IB_PATH_RECORD_RATE_2_5_GBS; > > break; > 4205a4565,4566 > > > > return rate; > 4229c4590 > < AL_INLINE uint8_t AL_API > --- > > static inline uint8_t > 4238c4599 > < //link_width_supported = 1: 1x > --- > > /* link_width_supported = 1: 1x */ > 4242c4603 > < //link_width_supported = 3: 1x or 4x > --- > > /* 
link_width_supported = 3: 1x or 4x */ > 4254c4615 > < //link_width_supported = 11: 1x or 4x or 12x > --- > > /* link_width_supported = 11: 1x or 4x or 12x */ > 4291d4651 > < > 4301c4661 > < AL_INLINE uint8_t AL_API > --- > > static inline uint8_t > 4328c4688 > < AL_INLINE uint8_t AL_API > --- > > static inline uint8_t > 4355c4715 > < AL_INLINE void AL_API > --- > > static inline void > 4379a4740,4798 > > /****f* IBA Base: Types/ib_port_info_get_master_smsl > > * NAME > > * ib_port_info_get_master_smsl > > * > > * DESCRIPTION > > * Returns the encoded value for the Master SMSL at this port. > > * > > * SYNOPSIS > > */ > > static inline uint8_t > > ib_port_info_get_master_smsl( > > IN const ib_port_info_t* const p_pi ) > > { > > return (uint8_t) (p_pi->mtu_smsl & 0x0F); > > } > > /* > > * PARAMETERS > > * p_pi > > * [in] Pointer to a PortInfo attribute. > > * > > * RETURN VALUES > > * Returns the encoded value for the Master SMSL at this port. > > * > > * NOTES > > * > > * SEE ALSO > > *********/ > > /****f* IBA Base: Types/ib_port_info_set_master_smsl > > * NAME > > * ib_port_info_set_master_smsl > > * > > * DESCRIPTION > > * Sets the Master SMSL value in the PortInfo attribute. > > * > > * SYNOPSIS > > */ > > static inline void > > ib_port_info_set_master_smsl( > > IN ib_port_info_t* const p_pi, > > IN const uint8_t smsl ) > > { > > p_pi->mtu_smsl = (uint8_t)((p_pi->mtu_smsl & 0xF0) | smsl ); > > } > > /* > > * PARAMETERS > > * p_pi > > * [in] Pointer to a PortInfo attribute. > > * > > * mtu > > * [in] Encoded Master SMSL value to set > > * > > * RETURN VALUES > > * None. 
> > * > > * NOTES > > * > > * SEE ALSO > > *********/ > > > 4389c4808 > < AL_INLINE void AL_API > --- > > static inline void > 4395c4814,4816 > < p_pi->subnet_timeout = (uint8_t)(timeout & 0x1F); > --- > > p_pi->subnet_timeout = > > (uint8_t)( > > (p_pi->subnet_timeout & 0x80) | (timeout & 0x1F)); > 4413c4834 > < /****f* IBA Base: Types/ib_port_info_get_lmc > --- > > /****f* IBA Base: Types/ib_port_info_set_client_rereg > 4415c4836 > < * ib_port_info_get_lmc > --- > > * ib_port_info_set_client_rereg > 4418c4839 > < * Returns the LMC value assigned to this port. > --- > > * Sets the encoded client reregistration bit value in the PortInfo attribute. > 4422,4424c4843,4846 > < AL_INLINE uint8_t AL_API > < ib_port_info_get_lmc( > < IN const ib_port_info_t* const p_pi ) > --- > > static inline void > > ib_port_info_set_client_rereg( > > IN ib_port_info_t* const p_pi, > > IN const uint8_t client_rereg ) > 4426c4848,4851 > < return( (uint8_t)(p_pi->mkey_lmc & IB_PORT_LMC_MASK) ); > --- > > CL_ASSERT( client_rereg <= 0x1 ); > > p_pi->subnet_timeout = > > (uint8_t)( > > (p_pi->subnet_timeout & 0x1F) | ((client_rereg << 7) & 0x80)); > 4432a4858,4860 > > * client_rereg > > * [in] Client reregistration value to set (either 1 or 0). > > * > 4434c4862 > < * Returns the LMC value assigned to this port. > --- > > * None. > 4440c4868,4869 > < /****f* IBA Base: Types/ib_port_info_set_lmc > --- > > > > /****f* IBA Base: Types/ib_port_info_get_timeout > 4442c4871 > < * ib_port_info_set_lmc > --- > > * ib_port_info_get_timeout > 4445c4874 > < * Sets the LMC value in the PortInfo attribute. > --- > > * Gets the encoded subnet timeout value in the PortInfo attribute. 
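The set_timeout and set_client_rereg hunks above both write the same subnet_timeout byte: the low five bits hold the encoded timeout and bit 7 holds the client reregistration flag, so each setter has to preserve the other field. Here is a minimal standalone sketch of that packing. The demo_* names are made up for illustration and operate on a bare uint8_t rather than on the real ib_port_info_t, and plain assert() stands in for CL_ASSERT.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PortInfo accessors; they work on a
 * plain byte instead of ib_port_info_t. */

/* Write the 5-bit encoded timeout, keeping bit 7 (client rereg). */
static inline uint8_t demo_set_timeout(uint8_t field, uint8_t timeout)
{
	return (uint8_t)((field & 0x80) | (timeout & 0x1F));
}

/* Write the client reregistration bit, keeping the 5-bit timeout. */
static inline uint8_t demo_set_client_rereg(uint8_t field, uint8_t rereg)
{
	assert(rereg <= 0x1);
	return (uint8_t)((field & 0x1F) | ((rereg << 7) & 0x80));
}

/* Read back the 5-bit encoded timeout. */
static inline uint8_t demo_get_timeout(uint8_t field)
{
	return (uint8_t)(field & 0x1F);
}

/* Read back the client reregistration bit as 0 or 1. */
static inline uint8_t demo_get_client_rereg(uint8_t field)
{
	return (uint8_t)((field & 0x80) >> 7);
}
```

Starting from a zeroed byte, setting the reregistration bit and a timeout of 0x12 in either order gives 0x92, and clearing the bit afterwards leaves 0x12.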
> 4449,4452c4878,4880 > < AL_INLINE void AL_API > < ib_port_info_set_lmc( > < IN ib_port_info_t* const p_pi, > < IN const uint8_t lmc ) > --- > > static inline uint8_t > > ib_port_info_get_timeout( > > IN ib_port_info_t const* p_pi ) > 4454,4455c4882 > < CL_ASSERT( lmc <= 0x7 ); > < p_pi->mkey_lmc = (uint8_t)((p_pi->mkey_lmc & 0xF8) | lmc); > --- > > return(p_pi->subnet_timeout & 0x1F ); > 4462,4464d4888 > < * lmc > < * [in] LMC value to set, must be less than 7. > < * > 4466c4890 > < * None. > --- > > * The encoded timeout value > 4472c4896,4897 > < /****f* IBA Base: Types/ib_port_info_get_mpb > --- > > > > /****f* IBA Base: Types/ib_port_info_get_client_rereg > 4474c4899 > < * ib_port_info_get_mpb > --- > > * ib_port_info_get_client_rereg > 4477c4902 > < * Returns the M_Key protect bits assigned to this port. > --- > > * Gets the encoded client reregistration bit value in the PortInfo attribute. > 4481,4483c4906,4908 > < AL_INLINE uint8_t AL_API > < ib_port_info_get_mpb( > < IN const ib_port_info_t* const p_pi ) > --- > > static inline uint8_t > > ib_port_info_get_client_rereg( > > IN ib_port_info_t const* p_pi ) > 4485,4486c4910 > < return( (uint8_t)((p_pi->mkey_lmc & IB_PORT_MPB_MASK) >> > < IB_PORT_MPB_SHIFT) ); > --- > > return ( (p_pi->subnet_timeout & 0x80 ) >> 7); > 4490c4914 > < * p_ni > --- > > * p_pi > 4494c4918 > < * Returns the M_Key protect bits assigned to this port. > --- > > * Client reregistration value (either 1 or 0). > 4501c4925 > < /****f* IBA Base: Types/ib_port_info_set_mpb > --- > > /****f* IBA Base: Types/ib_port_info_set_hoq_lifetime > 4503c4927 > < * ib_port_info_set_mpb > --- > > * ib_port_info_set_hoq_lifetime > 4506c4930,4931 > < * Set the M_Key protect bits of this port. 
> --- > > * Sets the Head of Queue Lifetime for which a packet can live in the head > > * of VL queue > 4510,4513c4935,4938 > < AL_INLINE void AL_API > < ib_port_info_set_mpb( > < IN ib_port_info_t* p_pi, > < IN uint8_t mpb ) > --- > > static inline void > > ib_port_info_set_hoq_lifetime( > > IN ib_port_info_t* const p_pi, > > IN const uint8_t hoq_life ) > 4515,4517c4940,4941 > < p_pi->mkey_lmc = > < ((p_pi->mkey_lmc & ~IB_PORT_MPB_MASK) | > < (mpb << IB_PORT_MPB_SHIFT)); > --- > > p_pi->vl_stall_life = (uint8_t)((hoq_life & 0x1f) | > > (p_pi->vl_stall_life & 0xe0)); > 4521,4523c4945 > < * mpb > < * [in] M_Key protect bits > < * p_ni > --- > > * p_pi > 4525a4948,4950 > > * hoq_life > > * [in] Encoded lifetime value to set > > * > 4526a4952 > > * None. > 4532a4959,4974 > > /****f* IBA Base: Types/ib_port_info_get_hoq_lifetime > > * NAME > > * ib_port_info_get_hoq_lifetime > > * > > * DESCRIPTION > > * Gets the Head of Queue Lifetime for which a packet can live in the head > > * of VL queue > > * > > * SYNOPSIS > > */ > > static inline uint8_t > > ib_port_info_get_hoq_lifetime( > > IN const ib_port_info_t* const p_pi ) > > { > > return( (uint8_t)(p_pi->vl_stall_life & 0x1f) ); > > } > 4534c4976,4987 > < typedef uint8_t ib_svc_name_t[64]; > --- > > /* > > * PARAMETERS > > * p_pi > > * [in] Pointer to a PortInfo attribute. > > * > > * RETURN VALUES > > * Encoded lifetime value > > * > > * NOTES > > * > > * SEE ALSO > > *********/ > 4536,4537c4989,5002 > < #include > < typedef struct _ib_service_record > --- > > /****f* IBA Base: Types/ib_port_info_set_vl_stall_count > > * NAME > > * ib_port_info_set_vl_stall_count > > * > > * DESCRIPTION > > * Sets the VL Stall Count which define the number of contiguous > > * HLL (hoq) drops that will put the VL into stalled mode. 
> > * > > * SYNOPSIS > > */ > > static inline void > > ib_port_info_set_vl_stall_count( > > IN ib_port_info_t* const p_pi, > > IN const uint8_t vl_stall_count ) > 4539,4591c5004,5010 > < ib_net64_t service_id; > < ib_gid_t service_gid; > < ib_net16_t service_pkey; > < ib_net16_t resv; > < ib_net32_t service_lease; > < uint8_t service_key[16]; > < ib_svc_name_t service_name; > < uint8_t service_data8[16]; > < ib_net16_t service_data16[8]; > < ib_net32_t service_data32[4]; > < ib_net64_t service_data64[2]; > < > < } PACK_SUFFIX ib_service_record_t; > < #include > < > < #include > < typedef struct _ib_portinfo_record > < { > < ib_net16_t lid; > < uint8_t port_num; > < uint8_t resv; > < ib_port_info_t port_info; > < uint8_t pad[6]; > < > < } PACK_SUFFIX ib_portinfo_record_t; > < #include > < > < #include > < typedef struct _ib_link_record > < { > < ib_net16_t from_lid; > < uint8_t from_port_num; > < uint8_t to_port_num; > < ib_net16_t to_lid; > < uint8_t pad[2]; > < > < } PACK_SUFFIX ib_link_record_t; > < #include > < > < #include > < typedef struct _ib_sminfo_record > < { > < ib_net16_t lid; > < uint16_t resv0; > < ib_sm_info_t sm_info; > < uint8_t pad[7]; > < > < } PACK_SUFFIX ib_sminfo_record_t; > < #include > < > < /****s* IBA Base: Types/ib_lft_record_t > < * NAME > < * ib_lft_record_t > --- > > p_pi->vl_stall_life = (uint8_t)((p_pi->vl_stall_life & 0x1f) | > > ((vl_stall_count << 5) & 0xe0)); > > } > > /* > > * PARAMETERS > > * p_pi > > * [in] Pointer to a PortInfo attribute. > 4593,4594c5012,5013 > < * DESCRIPTION > < * IBA defined LinearForwardingTable. 
(14.2.5.6) > --- > > * vl_stall_count > > * [in] value to set > 4596,4612c5015,5016 > < * SYNOPSIS > < */ > < #include > < typedef struct _ib_lft_record > < { > < ib_net16_t lid; > < ib_net16_t block_num; > < uint32_t resv0; > < uint8_t lft[64]; > < > < } PACK_SUFFIX ib_lft_record_t; > < #include > < /************/ > < > < /****s* IBA Base: Types/ib_switch_info_t > < * NAME > < * ib_switch_info_t > --- > > * RETURN VALUES > > * None. > 4614,4615c5018 > < * DESCRIPTION > < * IBA defined SwitchInfo. (14.2.5.4) > --- > > * NOTES > 4617,4649c5020,5021 > < * SYNOPSIS > < */ > < #include > < typedef struct _ib_switch_info > < { > < ib_net16_t lin_cap; > < ib_net16_t rand_cap; > < ib_net16_t mcast_cap; > < ib_net16_t lin_top; > < uint8_t def_port; > < uint8_t def_mcast_pri_port; > < uint8_t def_mcast_not_port; > < uint8_t life_state; > < ib_net16_t lids_per_port; > < ib_net16_t enforce_cap; > < uint8_t flags; > < > < } PACK_SUFFIX ib_switch_info_t; > < #include > < /************/ > < > < #include > < typedef struct _ib_switch_info_record > < { > < ib_net16_t lid; > < uint16_t resv0; > < ib_switch_info_t switch_info; > < uint8_t pad[3]; > < > < } PACK_SUFFIX ib_switch_info_record_t; > < #include > < > < #define IB_SWITCH_PSC 0x04 > --- > > * SEE ALSO > > *********/ > 4651c5023 > < /****f* IBA Base: Types/ib_switch_info_get_state_change > --- > > /****f* IBA Base: Types/ib_port_info_get_vl_stall_count > 4653c5025 > < * ib_switch_info_get_state_change > --- > > * ib_port_info_get_vl_stall_count > 4656c5028,5029 > < * Returns the value of the state change flag. 
> --- > > * Gets the VL Stall Count which define the number of contiguous > > * HLL (hoq) drops that will put the VL into stalled mode > 4660,4662c5033,5035 > < AL_INLINE boolean_t AL_API > < ib_switch_info_get_state_change( > < IN const ib_switch_info_t* const p_si ) > --- > > static inline uint8_t > > ib_port_info_get_vl_stall_count( > > IN const ib_port_info_t* const p_pi ) > 4664c5037 > < return( (p_si->life_state & IB_SWITCH_PSC) == IB_SWITCH_PSC ); > --- > > return( (uint8_t)(p_pi->vl_stall_life & 0xe0) >> 5); > 4665a5039 > > > 4668,4669c5042,5043 > < * p_si > < * [in] Pointer to a SwitchInfo attribute. > --- > > * p_pi > > * [in] Pointer to a PortInfo attribute. > 4672c5046 > < * Returns the value of the state change flag. > --- > > * vl stall count > 4679c5053 > < /****f* IBA Base: Types/ib_switch_info_clear_state_change > --- > > /****f* IBA Base: Types/ib_port_info_get_lmc > 4681c5055 > < * ib_switch_info_clear_state_change > --- > > * ib_port_info_get_lmc > 4684c5058 > < * Clears the switch's state change bit. > --- > > * Returns the LMC value assigned to this port. > 4688,4690c5062,5064 > < AL_INLINE void AL_API > < ib_switch_info_clear_state_change( > < IN ib_switch_info_t* const p_si ) > --- > > static inline uint8_t > > ib_port_info_get_lmc( > > IN const ib_port_info_t* const p_pi ) > 4692c5066 > < p_si->life_state = (uint8_t)(p_si->life_state & 0xFB); > --- > > return( (uint8_t)(p_pi->mkey_lmc & IB_PORT_LMC_MASK) ); > 4696c5070 > < * p_ni > --- > > * p_pi > 4707c5081 > < /****s* IBA Base: Types/ib_guid_info_t > --- > > /****f* IBA Base: Types/ib_port_info_set_lmc > 4709c5083 > < * ib_guid_info_t > --- > > * ib_port_info_set_lmc > 4712c5086 > < * IBA defined GuidInfo. (14.2.5.5) > --- > > * Sets the LMC value in the PortInfo attribute. 
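The hoq_lifetime and vl_stall_count accessors added above share the vl_stall_life byte: bits 4..0 carry the encoded HOQ lifetime and bits 7..5 carry the VL stall count. A small sketch of the same masks on a bare byte, with hypothetical demo_* names (nothing here is taken from the real header beyond the mask values):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers on a bare byte laid out like vl_stall_life:
 * bits 7..5 = VL stall count, bits 4..0 = HOQ lifetime. */

static inline uint8_t demo_set_hoq_lifetime(uint8_t field, uint8_t hoq_life)
{
	return (uint8_t)((hoq_life & 0x1F) | (field & 0xE0));
}

static inline uint8_t demo_set_vl_stall_count(uint8_t field, uint8_t count)
{
	return (uint8_t)((field & 0x1F) | ((count << 5) & 0xE0));
}

static inline uint8_t demo_get_hoq_lifetime(uint8_t field)
{
	return (uint8_t)(field & 0x1F);
}

static inline uint8_t demo_get_vl_stall_count(uint8_t field)
{
	/* Parenthesised so the cast covers the shifted value. */
	return (uint8_t)((field & 0xE0) >> 5);
}

/* Usage: pack both fields, read them back, then change only one. */
void demo_check_vl_stall_life(void)
{
	uint8_t f = demo_set_hoq_lifetime(0, 0x13);
	f = demo_set_vl_stall_count(f, 7);
	assert(f == 0xF3);
	assert(demo_get_hoq_lifetime(f) == 0x13);
	assert(demo_get_vl_stall_count(f) == 7);
	f = demo_set_vl_stall_count(f, 2);
	assert(f == 0x53);
}
```

In the patch's getter the cast binds to the masked value before the shift; the result is the same after conversion to the uint8_t return type, so the extra parentheses here are only a readability choice.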
> 4716,4719c5090,5093 > < #define GUID_TABLE_MAX_ENTRIES 8 > < > < #include > < typedef struct _ib_guid_info > --- > > static inline void > > ib_port_info_set_lmc( > > IN ib_port_info_t* const p_pi, > > IN const uint8_t lmc ) > 4721,4729c5095,5101 > < ib_net64_t guid[GUID_TABLE_MAX_ENTRIES]; > < > < } PACK_SUFFIX ib_guid_info_t; > < #include > < /************/ > < > < /****s* IBA Base: Types/ib_pkey_table_info_t > < * NAME > < * ib_pkey_table_info_t > --- > > CL_ASSERT( lmc <= IB_PORT_LMC_MAX ); > > p_pi->mkey_lmc = (uint8_t)((p_pi->mkey_lmc & 0xF8) | lmc); > > } > > /* > > * PARAMETERS > > * p_pi > > * [in] Pointer to a PortInfo attribute. > 4731,4732c5103,5104 > < * DESCRIPTION > < * IBA defined PKey table. (14.2.5.7) > --- > > * lmc > > * [in] LMC value to set, must be less than 7. > 4734,4745c5106,5112 > < * SYNOPSIS > < */ > < #define PKEY_TABLE_MAX_ENTRIES 32 > < > < #include > < typedef struct _ib_pkey_table_info > < { > < ib_net16_t pkey[PKEY_TABLE_MAX_ENTRIES]; > < > < } PACK_SUFFIX ib_pkey_table_info_t; > < #include > < /************/ > --- > > * RETURN VALUES > > * None. > > * > > * NOTES > > * > > * SEE ALSO > > *********/ > 4747,4748c5114 > < #define IB_MAX_NUM_VLS 16 > < /****s* IBA Base: Types/ib_slvl_table_t > --- > > /****f* IBA Base: Types/ib_port_info_get_link_speed_enabled > 4750c5116 > < * ib_slvl_table_t > --- > > * ib_port_info_get_link_speed_enabled > 4753c5119 > < * IBA defined SL2VL Mapping Table Attribute. (14.2.5.8) > --- > > * Returns the link speed enabled value assigned to this port. 
> 4757,4758c5123,5125 > < #include > < typedef struct _ib_slvl_table > --- > > static inline uint8_t > > ib_port_info_get_link_speed_enabled( > > IN const ib_port_info_t* const p_pi ) > 4760,4766c5127,5141 > < uint8_t vl_table[IB_MAX_NUM_VLS/2]; > < > < } PACK_SUFFIX ib_slvl_table_t; > < #include > < /************/ > < > < /****f* IBA Base: Types/ib_slvl_table_get_vl > --- > > return( (uint8_t)(p_pi->link_speed & IB_PORT_LINK_SPEED_ENABLED_MASK) ); > > } > > /* > > * PARAMETERS > > * p_pi > > * [in] Pointer to a PortInfo attribute. > > * > > * RETURN VALUES > > * Port state. > > * > > * NOTES > > * > > * SEE ALSO > > *********/ > > /****f* IBA Base: Types/ib_port_info_set_link_speed_enabled > 4768c5143 > < * ib_slvl_table_get_vl > --- > > * ib_port_info_set_link_speed_enabled > 4771c5146 > < * Retrieves the VL for a given SL from an SL to VL mapping table. > --- > > * Sets the link speed enabled value in the PortInfo attribute. > 4775,4778c5150,5153 > < AL_INLINE uint8_t AL_API > < ib_slvl_table_get_vl( > < IN const ib_slvl_table_t* const p_slvl_tbl, > < IN const uint8_t sl ) > --- > > static inline void > > ib_port_info_set_link_speed_enabled( > > IN ib_port_info_t* const p_pi, > > IN const uint8_t link_speed_enabled ) > 4780,4789c5155 > < uint8_t vl; > < > < /* There are two VL's per byte. */ > < vl = p_slvl_tbl->vl_table[sl/2]; > < /* If odd, shift down 4 bits. */ > < if( sl % 2 ) > < vl >>= 4; > < > < /* Mask off upper bits and return. */ > < return vl & 0x0F; > --- > > p_pi->link_speed = (uint8_t)((p_pi->link_speed & 0xF0) | link_speed_enabled ); > 4793,4794c5159,5160 > < * p_slvl_tbl > < * [in] Pointer to the SL to VL mapping table from which to return the VL. > --- > > * p_pi > > * [in] Pointer to a PortInfo attribute. > 4796,4797c5162,5163 > < * sl > < * [in] SL in the table for which to return the VL. > --- > > * link_speed_enabled > > * [in] link speed enabled value to set. 
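One thing that stands out in ib_port_info_set_link_speed_enabled as quoted above: the argument is OR-ed into the byte without a mask, so a value wider than four bits would also change the upper nibble. Whether any caller can pass such a value is not something the diff shows. The sketch below only contrasts the quoted expression with a hypothetical masked variant on a bare byte; the demo_* names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Same expression as in the patch: the argument is OR-ed in unmasked. */
static inline uint8_t demo_set_lse_unmasked(uint8_t link_speed, uint8_t enabled)
{
	return (uint8_t)((link_speed & 0xF0) | enabled);
}

/* Hypothetical defensive variant: confine the argument to the low nibble. */
static inline uint8_t demo_set_lse_masked(uint8_t link_speed, uint8_t enabled)
{
	return (uint8_t)((link_speed & 0xF0) | (enabled & 0x0F));
}

void demo_check_lse(void)
{
	/* In-range values behave identically. */
	assert(demo_set_lse_unmasked(0x20, 0x03) == 0x23);
	assert(demo_set_lse_masked(0x20, 0x03) == 0x23);
	/* An out-of-range value alters the upper nibble only in the unmasked form. */
	assert(demo_set_lse_unmasked(0x20, 0x13) == 0x33);
	assert(demo_set_lse_masked(0x20, 0x13) == 0x23);
}
```

The same pattern appears in ib_port_info_set_master_smsl earlier in the diff, where smsl is combined with (mtu_smsl & 0xF0) in the same way.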
> 4800c5166,5168 > < * Returns the VL value for the specified SL in the provided table. > --- > > * None. > > * > > * NOTES > 4803d5170 > < * ib_slvl_table_t, ib_slvl_table_set_vl > 4806c5173 > < /****f* IBA Base: Types/ib_slvl_table_set_vl > --- > > /****f* IBA Base: Types/ib_port_info_get_mpb > 4808c5175 > < * ib_slvl_table_set_vl > --- > > * ib_port_info_get_mpb > 4811c5178 > < * Sets the VL for a given SL in an SL to VL mapping table. > --- > > * Returns the M_Key protect bits assigned to this port. > 4815,4832c5182,5184 > < AL_INLINE void AL_API > < ib_slvl_table_set_vl( > < IN OUT ib_slvl_table_t* const p_slvl_tbl, > < IN const uint8_t sl, > < IN const uint8_t vl ) > < { > < uint8_t entry; > < > < /* Get the current value for the byte in which the VL is stored. */ > < entry = p_slvl_tbl->vl_table[sl/2]; > < > < /* Clear the appropriate bits and set the new VL value. */ > < if( sl % 2 ) > < { > < entry &= 0x0F; > < entry |= ((vl & 0x0F) << 4); > < } > < else > --- > > static inline uint8_t > > ib_port_info_get_mpb( > > IN const ib_port_info_t* const p_pi ) > 4834,4838c5186,5187 > < entry &= 0xF0; > < entry |= (vl & 0x0F); > < } > < /* Store the updated entry back into the table. */ > < p_slvl_tbl->vl_table[sl/2] = entry; > --- > > return( (uint8_t)((p_pi->mkey_lmc & IB_PORT_MPB_MASK) >> > > IB_PORT_MPB_SHIFT) ); > 4842,4849c5191,5192 > < * slvl_tbl > < * [in/out] Pointer to the SL to VL mapping table in which to store the VL. > < * > < * sl > < * [in] SL in the table for which to store the VL. > < * > < * vl > < * [in] VL to store at the specifed SL. > --- > > * p_ni > > * [in] Pointer to a PortInfo attribute. > 4852c5195,5197 > < * This function does not return a value. > --- > > * Returns the M_Key protect bits assigned to this port. 
> > * > > * NOTES > 4855d5199 > < * ib_slvl_table_t, ib_slvl_table_get_vl > 4858c5202 > < /****s* IBA Base: Types/ib_slvl_table_record_t > --- > > /****f* IBA Base: Types/ib_port_info_set_mpb > 4860c5204 > < * ib_slvl_table_record_t > --- > > * ib_port_info_set_mpb > 4863c5207 > < * IBA defined Sl to VL Mapping Table Record for SA Query. (15.2.5.4) > --- > > * Set the M_Key protect bits of this port. > 4867,4868c5211,5214 > < #include > < typedef struct _ib_slvl_table_record > --- > > static inline void > > ib_port_info_set_mpb( > > IN ib_port_info_t* p_pi, > > IN uint8_t mpb ) > 4870,4880c5216,5233 > < ib_net16_t lid; // for CA: lid of port, for switch lid of port 0 > < uint8_t in_port_num; // reserved for CA's > < uint8_t out_port_num; // reserved for CA's > < uint32_t resv; > < ib_slvl_table_t slvl_tbl; > < > < } PACK_SUFFIX ib_slvl_table_record_t; > < #include > < /************/ > < > < /****s* IBA Base: Types/ib_vl_arb_element_t > --- > > p_pi->mkey_lmc = > > (~IB_PORT_MPB_MASK & p_pi->mkey_lmc) | > > ( IB_PORT_MPB_MASK & (mpb << IB_PORT_MPB_SHIFT) ); > > } > > /* > > * PARAMETERS > > * mpb > > * [in] M_Key protect bits > > * p_ni > > * [in] Pointer to a PortInfo attribute. > > * > > * RETURN VALUES > > * > > * NOTES > > * > > * SEE ALSO > > *********/ > > /****f* IBA Base: Types/ib_port_info_get_local_phy_err_thd > 4882c5235 > < * ib_vl_arb_element_t > --- > > * ib_port_info_get_local_phy_err_thd > 4885c5238 > < * IBA defined VL Arbitration Table Element. 
(14.2.5.9) > --- > > * Returns the Phy Link Threshold > 4889,4890c5242,5244 > < #include > < typedef struct _ib_vl_arb_element > --- > > static inline uint8_t > > ib_port_info_get_local_phy_err_thd( > > IN const ib_port_info_t* const p_pi ) > 4892,4899c5246,5260 > < uint8_t res_vl; > < uint8_t weight; > < > < } PACK_SUFFIX ib_vl_arb_element_t; > < #include > < /************/ > < > < /****f* IBA Base: Types/ib_vl_arb_element_get_vl > --- > > return (uint8_t)( (p_pi->error_threshold & 0xF0) >> 4); > > } > > /* > > * PARAMETERS > > * p_pi > > * [in] Pointer to a PortInfo attribute. > > * > > * RETURN VALUES > > * Returns the Phy Link error threshold assigned to this port. > > * > > * NOTES > > * > > * SEE ALSO > > *********/ > > /****f* IBA Base: Types/ib_port_info_get_overrun_err_thd > 4901c5262 > < * ib_vl_arb_element_get_vl > --- > > * ib_port_info_get_local_overrun_err_thd > 4904c5265 > < * Retrieves the VL from a VL arbitration table element. > --- > > * Returns the Credits Overrun Errors Threshold > 4908,4910c5269,5271 > < AL_INLINE uint8_t AL_API > < ib_vl_arb_element_get_vl( > < IN const ib_vl_arb_element_t vl_arb_element ) > --- > > static inline uint8_t > > ib_port_info_get_overrun_err_thd( > > IN const ib_port_info_t* const p_pi ) > 4912c5273 > < return (vl_arb_element.res_vl >> 4); > --- > > return (uint8_t)(p_pi->error_threshold & 0x0F); > 4916,4917c5277,5278 > < * vl_arb_element > < * [in] VL arbitration table element from which to return the VL. > --- > > * p_pi > > * [in] Pointer to a PortInfo attribute. > 4920c5281,5283 > < * Returns the VL value for the specified VL arbitration table element. > --- > > * Returns the Credits Overrun errors threshold assigned to this port. 
> > * > > * NOTES > 4923d5285 > < * vl_arb_element, ib_vl_arb_element_set_vl > 4926c5288 > < /****f* IBA Base: Types/ib_vl_arb_element_set_vl > --- > > /****f* IBA Base: Types/ib_port_info_set_phy_and_overrun_err_thd > 4928c5290 > < * ib_vl_arb_element_set_vl > --- > > * ib_port_info_set_phy_and_overrun_err_thd > 4931c5293 > < * Retrieves the VL from a VL arbitration table element. > --- > > * Sets the Phy Link and Credits Overrun Errors Threshold > 4935,4938c5297,5301 > < AL_INLINE void AL_API > < ib_vl_arb_element_set_vl( > < IN OUT ib_vl_arb_element_t* const p_vl_arb_element, > < IN const uint8_t vl ) > --- > > static inline void > > ib_port_info_set_phy_and_overrun_err_thd( > > IN ib_port_info_t* const p_pi, > > IN uint8_t phy_threshold, > > IN uint8_t overrun_threshold ) > 4940c5303,5304 > < p_vl_arb_element->res_vl = vl << 4; > --- > > p_pi->error_threshold = > > (uint8_t)( ((phy_threshold & 0x0F) << 4) | (overrun_threshold & 0x0F) ); > 4944,4945c5308,5309 > < * vl_arb_element > < * [in/out] VL arbitration table element in which to store the VL. > --- > > * p_pi > > * [in] Pointer to a PortInfo attribute. > 4947,4948c5311,5315 > < * vl > < * [in] VL to store in the specified element. > --- > > * phy_threshold > > * [in] Physical Link Errors Threshold above which Trap 129 is generated > > * > > * overrun_threshold > > * [in] Credits overrun Errors Threshold above which Trap 129 is generated > 4951c5318,5320 > < * This function does not return a value. > --- > > * None. 
> > * > > * NOTES > 4954d5322 > < * vl_arb_element, ib_vl_arb_element_get_vl > 4957,4958c5325,5380 > < #define IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK 32 > < /****s* IBA Base: Types/ib_vl_arb_table_t > --- > > typedef uint8_t ib_svc_name_t[64]; > > > > #include > > typedef struct _ib_service_record > > { > > ib_net64_t service_id; > > ib_gid_t service_gid; > > ib_net16_t service_pkey; > > ib_net16_t resv; > > ib_net32_t service_lease; > > uint8_t service_key[16]; > > ib_svc_name_t service_name; > > uint8_t service_data8[16]; > > ib_net16_t service_data16[8]; > > ib_net32_t service_data32[4]; > > ib_net64_t service_data64[2]; > > > > } PACK_SUFFIX ib_service_record_t; > > #include > > > > #include > > typedef struct _ib_portinfo_record > > { > > ib_net16_t lid; > > uint8_t port_num; > > uint8_t resv; > > ib_port_info_t port_info; > > uint8_t pad[6]; > > > > } PACK_SUFFIX ib_portinfo_record_t; > > #include > > > > #include > > typedef struct _ib_link_record > > { > > ib_net16_t from_lid; > > uint8_t from_port_num; > > uint8_t to_port_num; > > ib_net16_t to_lid; > > uint8_t pad[2]; > > > > } PACK_SUFFIX ib_link_record_t; > > #include > > > > #include > > typedef struct _ib_sminfo_record > > { > > ib_net16_t lid; > > uint16_t resv0; > > ib_sm_info_t sm_info; > > uint8_t pad[7]; > > > > } PACK_SUFFIX ib_sminfo_record_t; > > #include > > > > /****s* IBA Base: Types/ib_lft_record_t > 4960c5382 > < * ib_vl_arb_table_t > --- > > * ib_lft_record_t > 4963c5385 > < * IBA defined VL Arbitration Table. (14.2.5.9) > --- > > * IBA defined LinearForwardingTable. 
(14.2.5.6) > 4968c5390 > < typedef struct _ib_vl_arb_table > --- > > typedef struct _ib_lft_record > 4970,4972c5392,5396 > < ib_vl_arb_element_t vl_entry[IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK]; > < > < } PACK_SUFFIX ib_vl_arb_table_t; > --- > > ib_net16_t lid; > > ib_net16_t block_num; > > uint32_t resv0; > > uint8_t lft[64]; > > } PACK_SUFFIX ib_lft_record_t; > 4976c5400 > < /****s* IBA Base: Types/ib_vl_arb_table_record_t > --- > > /****s* IBA Base: Types/ib_switch_info_t > 4978c5402 > < * ib_vl_arb_table_record_t > --- > > * ib_switch_info_t > 4981c5405 > < * IBA defined VL Arbitration Table Record for SA Query. (15.2.5.9) > --- > > * IBA defined SwitchInfo. (14.2.5.4) > 4986c5410 > < typedef struct _ib_vl_arb_table_record > --- > > typedef struct _ib_switch_info > 4988,4992c5412,5422 > < ib_net16_t lid; // for CA: lid of port, for switch lid of port 0 > < uint8_t port_num; > < uint8_t block_num; > < uint32_t reserved; > < ib_vl_arb_table_t vl_arb_tbl; > --- > > ib_net16_t lin_cap; > > ib_net16_t rand_cap; > > ib_net16_t mcast_cap; > > ib_net16_t lin_top; > > uint8_t def_port; > > uint8_t def_mcast_pri_port; > > uint8_t def_mcast_not_port; > > uint8_t life_state; > > ib_net16_t lids_per_port; > > ib_net16_t enforce_cap; > > uint8_t flags; > 4994c5424 > < } PACK_SUFFIX ib_vl_arb_table_record_t; > --- > > } PACK_SUFFIX ib_switch_info_t; > 4998c5428,5441 > < /****s* IBA Base: Types/ib_grh_t > --- > > #include > > typedef struct _ib_switch_info_record > > { > > ib_net16_t lid; > > uint16_t resv0; > > ib_switch_info_t switch_info; > > uint8_t pad[3]; > > > > } PACK_SUFFIX ib_switch_info_record_t; > > #include > > > > #define IB_SWITCH_PSC 0x04 > > > > /****f* IBA Base: Types/ib_switch_info_get_state_change > 5000c5443 > < * ib_grh_t > --- > > * ib_switch_info_get_state_change > 5003c5446 > < * Global route header information received with unreliable datagram messages > --- > > * Returns the value of the state change flag. 
> 5007,5008c5450,5452 > < #include > < typedef struct _ib_grh > --- > > static inline boolean_t > > ib_switch_info_get_state_change( > > IN const ib_switch_info_t* const p_si ) > 5010,5019c5454,5467 > < ib_net32_t ver_class_flow; > < ib_net16_t resv1; > < uint8_t resv2; > < uint8_t hop_limit; > < ib_gid_t src_gid; > < ib_gid_t dest_gid; > < > < } PACK_SUFFIX ib_grh_t; > < #include > < /************/ > --- > > return( (p_si->life_state & IB_SWITCH_PSC) == IB_SWITCH_PSC ); > > } > > /* > > * PARAMETERS > > * p_si > > * [in] Pointer to a SwitchInfo attribute. > > * > > * RETURN VALUES > > * Returns the value of the state change flag. > > * > > * NOTES > > * > > * SEE ALSO > > *********/ > 5021c5469 > < /****f* IBA Base: Types/ib_grh_get_ver_class_flow > --- > > /****f* IBA Base: Types/ib_switch_info_clear_state_change > 5023c5471 > < * ib_grh_get_ver_class_flow > --- > > * ib_switch_info_clear_state_change > 5026c5474 > < * Get encoded version, traffic class and flow label in grh > --- > > * Clears the switch's state change bit. > 5030,5035c5478,5480 > < AL_INLINE void AL_API > < ib_grh_get_ver_class_flow( > < IN const ib_net32_t ver_class_flow, > < OUT uint8_t* const p_ver OPTIONAL, > < OUT uint8_t* const p_tclass OPTIONAL, > < OUT net32_t* const p_flow_lbl OPTIONAL ) > --- > > static inline void > > ib_switch_info_clear_state_change( > > IN ib_switch_info_t* const p_si ) > 5037,5048c5482 > < ib_net32_t tmp_ver_class_flow; > < > < tmp_ver_class_flow = cl_ntoh32( ver_class_flow ); > < > < if (p_ver) > < *p_ver = (uint8_t)(tmp_ver_class_flow >> 28); > < > < if (p_tclass) > < *p_tclass = (uint8_t)(tmp_ver_class_flow >> 20); > < > < if (p_flow_lbl) > < *p_flow_lbl = (ver_class_flow & CL_HTON32( 0x000FFFFF )); > --- > > p_si->life_state = (uint8_t)(p_si->life_state & 0xFB); > 5052,5053c5486,5487 > < * ver_class_flow > < * [in] the version, traffic class and flow label info. > --- > > * p_ni > > * [in] Pointer to a PortInfo attribute. 
> 5056,5063c5490 > < * p_ver > < * [out] pointer to the version info. > < * > < * p_tclass > < * [out] pointer to the traffic class info. > < * > < * p_flow_lbl > < * [out] pointer to the flow label info > --- > > * Returns the LMC value assigned to this port. > 5068d5494 > < * ib_grh_t > 5071c5497,5498 > < /****f* IBA Base: Types/ib_grh_set_ver_class_flow > --- > > > > /****f* IBA Base: Types/ib_switch_info_is_enhanced_port0 > 5073c5500 > < * ib_grh_set_ver_class_flow > --- > > * ib_switch_info_is_enhanced_port0 > 5076c5503,5505 > < * Set encoded version, traffic class and flow label in grh > --- > > * Returns TRUE if the enhancedPort0 bit is on (meaning the switch > > * port zero supports enhanced functions). > > * Returns FALSE otherwise. > 5080,5084c5509,5511 > < AL_INLINE ib_net32_t AL_API > < ib_grh_set_ver_class_flow( > < IN const uint8_t ver, > < IN const uint8_t tclass, > < IN const net32_t flow_lbl ) > --- > > static inline boolean_t > > ib_switch_info_is_enhanced_port0( > > IN const ib_switch_info_t* const p_si ) > 5086,5090c5513 > < ib_net32_t ver_class_flow; > < > < ver_class_flow = cl_hton32( (ver << 28) | (tclass << 20) ); > < ver_class_flow |= (flow_lbl & CL_HTON32( 0x000FFFFF )); > < return (ver_class_flow); > --- > > return( (p_si->flags & 0x08) == 0x08 ); > 5094,5101c5517,5518 > < * ver > < * [in] the version info. > < * > < * tclass > < * [in] the traffic class info. > < * > < * flow_lbl > < * [in] the flow label info > --- > > * p_si > > * [in] Pointer to a SwitchInfo attribute. > 5104,5105c5521 > < * ver_class_flow > < * [out] the version, traffic class and flow label info. > --- > > * Returns TRUE if the switch supports enhanced port 0. FALSE otherwise. 
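The SwitchInfo helpers in this part of the diff are single-bit tests: the state change flag is IB_SWITCH_PSC (0x04) in life_state, cleared with the literal 0xFB, and the enhanced port 0 check masks flags with 0x08. A minimal sketch of the three operations on bare bytes follows. The DEMO_* and demo_* names are hypothetical; only the mask values come from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SWITCH_PSC     0x04 /* same value as IB_SWITCH_PSC in the patch */
#define DEMO_ENHANCED_PORT0 0x08 /* mask used by ib_switch_info_is_enhanced_port0 */

/* Hypothetical helpers on bare bytes rather than ib_switch_info_t. */
static inline int demo_get_state_change(uint8_t life_state)
{
	return (life_state & DEMO_SWITCH_PSC) == DEMO_SWITCH_PSC;
}

static inline uint8_t demo_clear_state_change(uint8_t life_state)
{
	/* ~0x04 truncated to 8 bits is 0xFB, the literal used in the patch. */
	return (uint8_t)(life_state & (uint8_t)~DEMO_SWITCH_PSC);
}

static inline int demo_is_enhanced_port0(uint8_t flags)
{
	return (flags & DEMO_ENHANCED_PORT0) == DEMO_ENHANCED_PORT0;
}

/* Usage: clearing the flag leaves every other bit of the byte alone. */
void demo_check_switch_flags(void)
{
	uint8_t life_state = 0xFF;
	assert(demo_get_state_change(life_state));
	life_state = demo_clear_state_change(life_state);
	assert(life_state == 0xFB);
	assert(!demo_get_state_change(life_state));
	assert(demo_is_enhanced_port0(0x08));
	assert(!demo_is_enhanced_port0(0x04));
}
```

Deriving the clear mask from the flag constant, instead of writing 0xFB by hand, keeps the two from drifting apart if the flag position ever changes.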
> 5110d5525 > < * ib_grh_t > 5113c5528 > < /****s* IBA Base: Types/ib_member_rec_t > --- > > /****s* IBA Base: Types/ib_guid_info_t > 5115c5530 > < * ib_member_rec_t > --- > > * ib_guid_info_t > 5118,5119c5533 > < * Multicast member record, used to create, join, and leave multicast > < * groups. > --- > > * IBA defined GuidInfo. (14.2.5.5) > 5122a5537,5538 > > #define GUID_TABLE_MAX_ENTRIES 8 > > > 5124c5540 > < typedef struct _ib_member_rec > --- > > typedef struct _ib_guid_info > 5126,5130c5542,5564 > < ib_gid_t mgid; > < ib_gid_t port_gid; > < ib_net32_t qkey; > < ib_net16_t mlid; > < uint8_t mtu; > --- > > ib_net64_t guid[GUID_TABLE_MAX_ENTRIES]; > > > > } PACK_SUFFIX ib_guid_info_t; > > #include > > /************/ > > > > #include > > typedef struct _ib_guidinfo_record > > { > > ib_net16_t lid; > > uint8_t block_num; > > uint8_t resv; > > uint32_t reserved; > > ib_guid_info_t guid_info; > > } PACK_SUFFIX ib_guidinfo_record_t; > > #include > > > > #define IB_MULTIPATH_MAX_GIDS 11 /* Support max that can fit into first MAD (for now) */ > > > > #include > > typedef struct _ib_multipath_rec_t > > { > > ib_net32_t hop_flow_raw; > 5131a5566 > > uint8_t num_path; > 5132a5568,5570 > > uint8_t resv0; > > uint8_t sl; > > uint8_t mtu; > 5135,5141c5573,5579 > < ib_net32_t sl_flow_hop; > < uint8_t scope_state; > < uint8_t proxy_join; > < uint8_t reserved[2]; > < uint8_t pad[4]; > < > < } PACK_SUFFIX ib_member_rec_t; > --- > > uint8_t resv1; > > uint8_t independence; /* formerly resv2 */ > > uint8_t sgid_count; > > uint8_t dgid_count; > > uint8_t resv3[7]; > > ib_gid_t gids[IB_MULTIPATH_MAX_GIDS]; > > } PACK_SUFFIX ib_multipath_rec_t; > 5145,5146c5583,5584 > < * mgid > < * Multicast GID address for this multicast group. > --- > > * hop_flow_raw > > * Global routing parameters: hop count, flow label and raw bit. > 5148,5149c5586,5587 > < * port_gid > < * Valid GID of the endpoint joining this multicast group. > --- > > * tclass > > * Another global routing parameter. 
> 5151,5152c5589,5592 > < * requestor_gid > < * GID of the endpoint making this request on hehave of port_gid. > --- > > * num_path > > * Reversible path - 1 bit to say if path is reversible. > > * num_path [6:0] In queries, maximum number of paths to return. > > * In responses, undefined. > 5154,5155c5594,5595 > < * qkey > < * Q_Key to be used by this multicast group. > --- > > * pkey > > * Partition key (P_Key) to use on this path. > 5157,5158c5597,5598 > < * mlid > < * Multicast LID for this multicast group. > --- > > * sl > > * Service level to use on this path. > 5162,5168d5601 > < * > < * tclass > < * Another global routing parameter. > < * > < * pkey > < * Partition key (P_Key) to use for this member. > < * > 5175,5233c5608,5610 > < * sl_flow_hop > < * Global routing parameters: service level, hop count, and flow label. > < * > < * scope_state > < * MGID scope and JoinState of multicast request. > < * > < * proxy_join > < * Enables others in the Partition to proxy add/remove from the group > < * > < * SEE ALSO > < *********/ > < > < /****f* IBA Base: Types/ib_member_get_sl_flow_hop > < * NAME > < * ib_member_get_sl_flow_hop > < * > < * DESCRIPTION > < * Get encoded sl flow label and hop limit > < * > < * SYNOPSIS > < */ > < AL_INLINE void AL_API > < ib_member_get_sl_flow_hop( > < IN const ib_net32_t sl_flow_hop, > < OUT uint8_t* const p_sl OPTIONAL, > < OUT net32_t* const p_flow_lbl OPTIONAL, > < OUT uint8_t* const p_hop OPTIONAL ) > < { > < ib_net32_t tmp_sl_flow_hop; > < > < if (p_sl) > < *p_sl = (uint8_t)(sl_flow_hop & 0x0f); > < > < tmp_sl_flow_hop = sl_flow_hop >> 4; > < > < if (p_flow_lbl) > < *p_flow_lbl = (uint32_t)(tmp_sl_flow_hop & 0xffffff); > < > < tmp_sl_flow_hop = tmp_sl_flow_hop >> 20; > < > < if (p_hop) > < *p_hop = (uint8_t)(tmp_sl_flow_hop & 0xff); > < } > < /* > < * PARAMETERS > < * sl_flow_hop > < * [in] the sl flow label and hop limit of MC Group > < * > < * RETURN VALUES > < * p_sl > < * [out] pointer to the service level > < * > < * 
p_flow_lbl > < * [out] pointer to the flow label info > < * > < * p_hop > < * [out] pointer to the hop count limit. > < * > < * NOTES > --- > > * preference > > * Indicates the relative merit of this path versus other path > > * records returned from the SA. Lower numbers are better. > 5236d5612 > < * ib_member_rec_t > 5239c5615 > < /****f* IBA Base: Types/ib_member_set_sl_flow_hop > --- > > /****f* IBA Base: Types/ib_multipath_rec_num_path > 5241c5617 > < * ib_member_set_sl_flow_hop > --- > > * ib_multipath_rec_num_path > 5244c5620 > < * Set encoded version, sl flow and hop > --- > > * Get max number of paths to return. > 5248,5261c5624,5628 > < AL_INLINE ib_net32_t AL_API > < ib_member_set_sl_flow_hop( > < IN const uint8_t sl, > < IN const net32_t flow_lbl, > < IN const uint8_t hop_limit ) > < { > < ib_net32_t sl_flow_hop; > < > < sl_flow_hop = sl; > < sl_flow_hop <<= 20; > < sl_flow_hop |= (cl_ntoh32( flow_lbl ) & 0x000FFFFF); > < sl_flow_hop <<= 8; > < sl_flow_hop |= hop_limit; > < return cl_hton32(sl_flow_hop); > --- > > static inline uint8_t > > ib_multipath_rec_num_path( > > IN const ib_multipath_rec_t* const p_rec ) > > { > > return( p_rec->num_path &0x7F ); > 5265,5272c5632,5633 > < * sl > < * [in] the service level. > < * > < * flow_lbl > < * [in] the flow label info > < * > < * hop_limit > < * [in] the hop limit. > --- > > * p_rec > > * [in] Pointer to the multipath record object. > 5275,5276c5636 > < * sl_flow_hop > < * [out] the sl flow label and hop limit > --- > > * Maximum number of paths to return for each unique SGID_DGID combination. > 5281c5641 > < * ib_member_rec_t > --- > > * ib_multipath_rec_t > 5284c5644 > < /****f* IBA Base: Types/ib_member_get_scope > --- > > /****f* IBA Base: Types/ib_multipath_rec_sl > 5286c5646 > < * ib_member_get_scope > --- > > * ib_multipath_rec_sl > 5289c5649 > < * Get encoded MGID scope > --- > > * Get multipath service level. 
> 5293,5295c5653,5655 > < AL_INLINE uint8_t AL_API > < ib_member_get_scope( > < IN const uint8_t scope_state ) > --- > > static inline uint8_t > > ib_multipath_rec_sl( > > IN const ib_multipath_rec_t* const p_rec ) > 5297c5657 > < return (scope_state >> 4); > --- > > return( (uint8_t)((cl_ntoh16( p_rec->sl )) & 0xF) ); > 5301,5302c5661,5662 > < * scope_state > < * [in] the scope and state > --- > > * p_rec > > * [in] Pointer to the multipath record object. > 5305c5665,5667 > < * Encoded scope. > --- > > * SL. > > * > > * NOTES > 5308c5670 > < * ib_member_rec_t > --- > > * ib_multipath_rec_t > 5311c5673 > < /****f* IBA Base: Types/ib_member_get_state > --- > > /****f* IBA Base: Types/ib_multipath_rec_mtu > 5313c5675 > < * ib_member_get_state > --- > > * ib_multipath_rec_mtu > 5316c5678 > < * Get encoded MGID JoinState > --- > > * Get encoded path MTU. > 5320,5322c5682,5684 > < AL_INLINE uint8_t AL_API > < ib_member_get_state( > < IN const uint8_t scope_state ) > --- > > static inline uint8_t > > ib_multipath_rec_mtu( > > IN const ib_multipath_rec_t* const p_rec ) > 5324c5686 > < return (scope_state & 0x0f); > --- > > return( (uint8_t)(p_rec->mtu & IB_MULTIPATH_REC_BASE_MASK) ); > 5328,5329c5690,5691 > < * scope_state > < * [in] the scope and state > --- > > * p_rec > > * [in] Pointer to the multipath record object. > 5332c5694,5702 > < * Encoded JoinState > --- > > * Encoded path MTU. > > * 1: 256 > > * 2: 512 > > * 3: 1024 > > * 4: 2048 > > * 5: 4096 > > * others: reserved > > * > > * NOTES > 5335c5705 > < * ib_member_rec_t > --- > > * ib_multipath_rec_t > 5338c5708 > < /****f* IBA Base: Types/ib_member_get_scope_state > --- > > /****f* IBA Base: Types/ib_multipath_rec_mtu_sel > 5340c5710 > < * ib_member_get_scope_state > --- > > * ib_multipath_rec_mtu_sel > 5343c5713 > < * Get encoded MGID scope and JoinState > --- > > * Get encoded multipath MTU selector. 
> 5347,5351c5717,5719 > < AL_INLINE void AL_API > < ib_member_get_scope_state( > < IN const uint8_t scope_state, > < OUT uint8_t* const p_scope, > < OUT uint8_t* const p_state ) > --- > > static inline uint8_t > > ib_multipath_rec_mtu_sel( > > IN const ib_multipath_rec_t* const p_rec ) > 5353,5357c5721 > < if (p_scope) > < *p_scope = ib_member_get_scope( scope_state ); > < > < if (p_state) > < *p_state = ib_member_get_state( scope_state ); > --- > > return( (uint8_t)((p_rec->mtu & IB_MULTIPATH_REC_SELECTOR_MASK) >> 6) ); > 5361,5362c5725,5726 > < * scope_state > < * [in] the scope and state > --- > > * p_rec > > * [in] Pointer to the multipath record object. > 5365,5369c5729,5733 > < * p_scope > < * [out] pointer to the MGID scope > < * > < * p_state > < * [out] pointer to the join state > --- > > * Encoded path MTU selector value (for queries). > > * 0: greater than MTU specified > > * 1: less than MTU specified > > * 2: exactly the MTU specified > > * 3: largest MTU available > 5374c5738 > < * ib_member_rec_t > --- > > * ib_multipath_rec_t > 5377c5741 > < /****f* IBA Base: Types/ib_member_set_scope > --- > > /****f* IBA Base: Types/ib_multipath_rec_rate > 5379c5743 > < * ib_member_set_scope > --- > > * ib_multipath_rec_rate > 5382c5746 > < * Set encoded scope of a MCR. > --- > > * Get encoded multipath rate. > 5386,5389c5750,5752 > < AL_INLINE void AL_API > < ib_member_set_scope( > < IN OUT uint8_t* const p_scope_state, > < IN const uint8_t scope ) > --- > > static inline uint8_t > > ib_multipath_rec_rate( > > IN const ib_multipath_rec_t* const p_rec ) > 5391,5394c5754 > < CL_ASSERT( scope <= 0x0F ); > < /* Scope is MS 4-bits. */ > < *p_scope_state &= 0xF0; > < *p_scope_state |= (scope << 4); > --- > > return( (uint8_t)(p_rec->rate & IB_MULTIPATH_REC_BASE_MASK) ); > 5398,5402c5758,5759 > < * scope_state > < * [in/out] Pointer to the MCR scope_state field. > < * > < * scope > < * [in] The desired scope. 
> --- > > * p_rec > > * [in] Pointer to the multipath record object. > 5405c5762,5768 > < * This function does not return a value. > --- > > * Encoded multipath rate. > > * 2: 2.5 Gb/sec. > > * 3: 10 Gb/sec. > > * 4: 30 Gb/sec. > > * others: reserved > > * > > * NOTES > 5408c5771 > < * ib_member_rec_t > --- > > * ib_multipath_rec_t > 5411c5774 > < /****f* IBA Base: Types/ib_member_set_state > --- > > /****f* IBA Base: Types/ib_multipath_rec_rate_sel > 5413c5776 > < * ib_member_set_state > --- > > * ib_multipath_rec_rate_sel > 5416c5779 > < * Set encoded JoinState of a MCR. > --- > > * Get encoded multipath rate selector. > 5420,5423c5783,5785 > < AL_INLINE void AL_API > < ib_member_set_state( > < IN OUT uint8_t* const p_scope_state, > < IN const uint8_t state ) > --- > > static inline uint8_t > > ib_multipath_rec_rate_sel( > > IN const ib_multipath_rec_t* const p_rec ) > 5425,5428c5787 > < CL_ASSERT( state <= 0x0F ); > < /* State is LS 4-bits. */ > < *p_scope_state &= 0x0F; > < *p_scope_state |= (state & 0x0F); > --- > > return( (uint8_t)((p_rec->rate & IB_MULTIPATH_REC_SELECTOR_MASK) >> 6) ); > 5432,5436c5791,5792 > < * scope_state > < * [in/out] Pointer to the MCR scope_state field to modify. > < * > < * state > < * [in] the JoinState > --- > > * p_rec > > * [in] Pointer to the multipath record object. > 5439c5795,5801 > < * This function does not return a value. > --- > > * Encoded path rate selector value (for queries). > > * 0: greater than rate specified > > * 1: less than rate specified > > * 2: exactly the rate specified > > * 3: largest rate available > > * > > * NOTES > 5442c5804 > < * ib_member_rec_t > --- > > * ib_multipath_rec_t > 5445c5807 > < /****f* IBA Base: Types/ib_member_set_scope_state > --- > > /****f* IBA Base: Types/ib_multipath_rec_pkt_life > 5447c5809 > < * ib_member_set_scope_state > --- > > * ib_multipath_rec_pkt_life > 5450c5812 > < * Set encoded version, MGID scope and JoinState > --- > > * Get encoded multipath pkt_life. 
> 5454,5457c5816,5818 > < AL_INLINE uint8_t AL_API > < ib_member_set_scope_state( > < IN const uint8_t scope, > < IN const uint8_t state ) > --- > > static inline uint8_t > > ib_multipath_rec_pkt_life( > > IN const ib_multipath_rec_t* const p_rec ) > 5459,5460c5820 > < /* Scope is MS 4-bits, state is LS 4-bits */ > < return ((scope << 4) | (state & 0xF)); > --- > > return( (uint8_t)(p_rec->pkt_life & IB_MULTIPATH_REC_BASE_MASK) ); > 5464,5468c5824,5825 > < * scope > < * [in] the MGID scope > < * > < * state > < * [in] the JoinState > --- > > * p_rec > > * [in] Pointer to the multipath record object. > 5471,5472c5828 > < * scope_state > < * [out] the encoded one > --- > > * Encoded multipath pkt_life = 4.096 µsec * 2 ** PacketLifeTime. > 5477c5833 > < * ib_member_rec_t > --- > > * ib_multipath_rec_t > 5480c5836 > < /****f* IBA Base: Types/ib_member_set_join_state > --- > > /****f* IBA Base: Types/ib_multipath_rec_pkt_life_sel > 5482c5838 > < * ib_member_set_join_state > --- > > * ib_multipath_rec_pkt_life_sel > 5485c5841 > < * Set JoinState > --- > > * Get encoded multipath pkt_lifetime selector. > 5489,5492c5845,5847 > < AL_INLINE void AL_API > < ib_member_set_join_state( > < IN OUT ib_member_rec_t *p_mc_rec, > < IN const uint8_t state ) > --- > > static inline uint8_t > > ib_multipath_rec_pkt_life_sel( > > IN const ib_multipath_rec_t* const p_rec ) > 5494,5495c5849 > < p_mc_rec->scope_state &= 0xF0; > < p_mc_rec->scope_state |= (state & 0x0F); > --- > > return( (uint8_t)((p_rec->pkt_life & IB_MULTIPATH_REC_SELECTOR_MASK) >> 6 )); > 5499,5503c5853,5854 > < * p_mc_rec > < * [in] pointer to the member record > < * > < * state > < * [in] the JoinState > --- > > * p_rec > > * [in] Pointer to the multipath record object. > 5506c5857,5861 > < * NONE > --- > > * Encoded path pkt_lifetime selector value (for queries).
> > * 0: greater than rate specified > > * 1: less than rate specified > > * 2: exactly the rate specified > > * 3: smallest packet lifetime available > 5511c5866 > < * ib_member_rec_t > --- > > * ib_multipath_rec_t > 5514,5515c5869,5877 > < /* > < * Join State Codes: > --- > > #define IB_NUM_PKEY_ELEMENTS_IN_BLOCK 32 > > /****s* IBA Base: Types/ib_pkey_table_t > > * NAME > > * ib_pkey_table_t > > * > > * DESCRIPTION > > * IBA defined PKey table. (14.2.5.7) > > * > > * SYNOPSIS > 5517,5519d5878 > < #define IB_MC_REC_STATE_FULL_MEMBER 0x01 > < #define IB_MC_REC_STATE_NON_MEMBER 0x02 > < #define IB_MC_REC_STATE_SEND_ONLY_MEMBER 0x04 > 5521,5529c5880,5883 > < /* > < * Generic MAD notice types > < */ > < #define IB_NOTICE_TYPE_FATAL 0x00 > < #define IB_NOTICE_TYPE_URGENT 0x01 > < #define IB_NOTICE_TYPE_SECURITY 0x02 > < #define IB_NOTICE_TYPE_SUBN_MGMT 0x03 > < #define IB_NOTICE_TYPE_INFO 0x04 > < #define IB_NOTICE_TYPE_EMPTY 0x7F > --- > > #include > > typedef struct _ib_pkey_table > > { > > ib_net16_t pkey_entry[IB_NUM_PKEY_ELEMENTS_IN_BLOCK]; > 5530a5885,5887 > > } PACK_SUFFIX ib_pkey_table_t; > > #include > > /************/ > 5531a5889,5897 > > /****s* IBA Base: Types/ib_pkey_table_record_t > > * NAME > > * ib_pkey_table_record_t > > * > > * DESCRIPTION > > * IBA defined P_Key Table Record for SA Query. (15.2.5.11) > > * > > * SYNOPSIS > > */ > 5533c5899 > < typedef struct _ib_mad_notice_attr > --- > > typedef struct _ib_pkey_table_record > 5535,5545c5901,5906 > < /* is_generic:1, type:7, producer type or vendor id:24 */ > < net32_t combo1; > < /* trap number or device id, depending on is_generic. 
*/ > < net16_t combo2; > < > < net16_t issuer_lid; > < /* notice toggle:1, notice_count:15 */ > < net16_t combo3; > < > < uint8_t data_details[54]; > < ib_gid_t issuer_gid; > --- > > ib_net16_t lid; // for CA: lid of port, for switch lid of port 0 > > uint16_t block_num; > > uint8_t port_num; // for switch: port number, for CA: reserved > > uint8_t reserved1; > > uint16_t reserved2; > > ib_pkey_table_t pkey_tbl; > 5547c5908 > < } PACK_SUFFIX ib_mad_notice_attr_t; > --- > > } PACK_SUFFIX ib_pkey_table_record_t; > 5548a5910 > > /************/ > 5550c5912,5914 > < /****f* IBA Base: Types/ib_notice_get_generic > --- > > #define IB_DROP_VL 15 > > #define IB_MAX_NUM_VLS 16 > > /****s* IBA Base: Types/ib_slvl_table_t > 5552c5916 > < * ib_notice_get_generic > --- > > * ib_slvl_table_t > 5555c5919 > < * Retrieves whether a notice trap is generic. > --- > > * IBA defined SL2VL Mapping Table Attribute. (14.2.5.8) > 5559,5561c5923,5924 > < AL_INLINE boolean_t AL_API > < ib_notice_get_generic( > < IN const ib_mad_notice_attr_t* const p_notice_attr ) > --- > > #include > > typedef struct _ib_slvl_table > 5563,5574c5926,5933 > < if( cl_ntoh32( p_notice_attr->combo1 ) & 0x00000001 ) > < return TRUE; > < return FALSE; > < } > < /* > < * PARAMETERS > < * p_notice_attr > < * [in] Pointer to the notice attribute structure for which to return > < * whether it is generic or not. > < * > < * RETURN VALUES > < * Returns TRUE if the notice is generic. > --- > > uint8_t raw_vl_by_sl[IB_MAX_NUM_VLS/2]; > > } PACK_SUFFIX ib_slvl_table_t; > > #include > > /************/ > > > > /****s* IBA Base: Types/ib_slvl_table_record_t > > * NAME > > * ib_slvl_table_record_t > 5576c5935,5936 > < * Returns FALSE if the notice is vendor specific. > --- > > * DESCRIPTION > > * IBA defined SL to VL Mapping Table Record for SA Query. 
(15.2.5.4) > 5578,5580c5938,5947 > < * SEE ALSO > < * ib_mad_notice_attr_t, ib_notice_set_generic > < *********/ > --- > > * SYNOPSIS > > */ > > #include > > typedef struct _ib_slvl_table_record > > { > > ib_net16_t lid; // for CA: lid of port, for switch lid of port 0 > > uint8_t in_port_num; // reserved for CAs > > uint8_t out_port_num; // reserved for CAs > > uint32_t resv; > > ib_slvl_table_t slvl_tbl; > 5582c5949,5953 > < /****f* IBA Base: Types/ib_notice_set_generic > --- > > } PACK_SUFFIX ib_slvl_table_record_t; > > #include > > /************/ > > > > /****f* IBA Base: Types/ib_slvl_table_set > 5584c5955 > < * ib_notice_set_generic > --- > > * ib_slvl_table_set > 5587c5958 > < * Sets whether a notice trap is generic. > --- > > * Set slvl table entry. > 5591,5594c5962,5966 > < AL_INLINE void AL_API > < ib_notice_set_generic( > < IN OUT ib_mad_notice_attr_t* const p_notice_attr, > < IN const boolean_t is_generic ) > --- > > static inline void > > ib_slvl_table_set( > > IN ib_slvl_table_t* p_slvl_tbl, > > IN uint8_t sl_index, > > IN uint8_t vl ) > 5596c5968,5970 > < uint32_t val; > --- > > uint8_t idx = sl_index/2; > > CL_ASSERT(vl <= 15); > > CL_ASSERT(sl_index <= 15); > 5598,5600c5972,5976 > < val = cl_ntoh32( p_notice_attr->combo1 ); > < if( is_generic ) > < val |= 0x00000001; > --- > > if (sl_index%2) > > { > > /* this is an odd sl. Need to update the ls bits */ > > p_slvl_tbl->raw_vl_by_sl[idx] = ( p_slvl_tbl->raw_vl_by_sl[idx] & 0xF0 ) | vl ; > > } > 5602,5603c5978,5981 > < val &= 0xFFFFFFFE; > < p_notice_attr->combo1 = cl_hton32( val ); > --- > > { > > /* this is an even sl. Need to update the ms bits */ > > p_slvl_tbl->raw_vl_by_sl[idx] = ( vl << 4 ) | ( p_slvl_tbl->raw_vl_by_sl[idx] & 0x0F ); > > } > 5607,5609c5985,5989 > < * p_notice_attr > < * [in/out] Pointer to the notice attribute structure for which to set > < * the generic bit. > --- > > * p_slvl_tbl > > * [in] pointer to ib_slvl_table_t object. 
> > * > > * sl_index > > * [in] the sl index in the table to be updated. > 5611,5612c5991,5992 > < * is_generic > < * [in] TRUE if the notice is generic, FALSE if vendor specific. > --- > > * vl > > * [in] the vl value to update for that sl. > 5615c5995,5997 > < * This function does not return a value. > --- > > * None > > * > > * NOTES > 5618c6000 > < * ib_mad_notice_attr_t, ib_notice_get_generic > --- > > * ib_slvl_table_t > 5621c6003 > < /****f* IBA Base: Types/ib_notice_get_type > --- > > /****f* IBA Base: Types/ib_slvl_table_get > 5623c6005 > < * ib_notice_get_type > --- > > * ib_slvl_table_get > 5626c6008 > < * Retrieves the type of a notice trap. > --- > > * Get slvl table entry. > 5630,5632c6012,6020 > < AL_INLINE uint8_t AL_API > < ib_notice_get_type( > < IN const ib_mad_notice_attr_t* const p_notice_attr ) > --- > > static inline uint8_t > > ib_slvl_table_get( > > IN const ib_slvl_table_t* p_slvl_tbl, > > IN uint8_t sl_index ) > > { > > uint8_t idx = sl_index/2; > > CL_ASSERT(sl_index <= 15); > > > > if (sl_index%2) > 5634c6022,6029 > < return (uint8_t)((cl_ntoh32( p_notice_attr->combo1 ) >> 1) & 0x0000007F); > --- > > /* this is an odd sl. Need to return the ls bits. */ > > return ( p_slvl_tbl->raw_vl_by_sl[idx] & 0x0F ); > > } > > else > > { > > /* this is an even sl. Need to return the ms bits. */ > > return ( (p_slvl_tbl->raw_vl_by_sl[idx] & 0xF0) >> 4 ); > > } > 5638,5639c6033,6037 > < * p_notice_attr > < * [in] Pointer to the notice attribute structure whose type to return. > --- > > * p_slvl_tbl > > * [in] pointer to ib_slvl_table_t object. > > * > > * sl_index > > * [in] the sl index in the table whose value should be returned. > 5642c6040,6042 > < * Returns the type of the notice. > --- > > * vl for the requested sl_index. 
> > * > > * NOTES > 5645c6045 > < * ib_mad_notice_attr_t, ib_notice_set_type > --- > > * ib_slvl_table_t > 5648c6048 > < /****f* IBA Base: Types/ib_notice_set_type > --- > > /****s* IBA Base: Types/ib_vl_arb_element_t > 5650c6050 > < * ib_notice_set_type > --- > > * ib_vl_arb_element_t > 5653c6053 > < * Sets the type of a notice trap. > --- > > * IBA defined VL Arbitration Table Element. (14.2.5.9) > 5657,5660c6057,6058 > < AL_INLINE void AL_API > < ib_notice_set_type( > < IN OUT ib_mad_notice_attr_t* const p_notice_attr, > < IN const uint8_t type ) > --- > > #include > > typedef struct _ib_vl_arb_element > 5662c6060,6061 > < uint32_t val; > --- > > uint8_t vl; > > uint8_t weight; > 5664,5684c6063,6065 > < val = cl_ntoh32( p_notice_attr->combo1 ); > < /* Clear the type. */ > < val &= 0xFFFFFF01; > < /* Set new value. */ > < val |= (((uint32_t)(type & 0x7F)) << 1); > < p_notice_attr->combo1 = cl_hton32( val ); > < } > < /* > < * PARAMETERS > < * p_notice_attr > < * [in/out] Pointer to the notice attribute structure whose type to set. > < * > < * type > < * [in] Type of notice trap. > < * > < * RETURN VALUES > < * This function does not return a value. > < * > < * SEE ALSO > < * ib_mad_notice_attr_t, ib_notice_get_type > < *********/ > --- > > } PACK_SUFFIX ib_vl_arb_element_t; > > #include > > /************/ > 5686c6067,6068 > < /****f* IBA Base: Types/ib_notice_get_prod_type > --- > > #define IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK 32 > > /****s* IBA Base: Types/ib_vl_arb_table_t > 5688c6070 > < * ib_notice_get_prod_type > --- > > * ib_vl_arb_table_t > 5691c6073 > < * Retrieves the producer type from a generic notice trap. > --- > > * IBA defined VL Arbitration Table. 
(14.2.5.9) > 5695,5697c6077,6078 > < AL_INLINE uint32_t AL_API > < ib_notice_get_prod_type( > < IN const ib_mad_notice_attr_t* const p_notice_attr ) > --- > > #include > > typedef struct _ib_vl_arb_table > 5699,5712c6080,6084 > < return (cl_ntoh32( p_notice_attr->combo1 ) >> 8); > < } > < /* > < * PARAMETERS > < * p_notice_attr > < * [in] Pointer to the notice attribute structure whose > < * prducer type to return. > < * > < * RETURN VALUES > < * Returns the producer type of the notice, in host byte order. > < * > < * SEE ALSO > < * ib_mad_notice_attr_t, ib_notice_set_prod_type > < *********/ > --- > > ib_vl_arb_element_t vl_entry[IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK]; > > > > } PACK_SUFFIX ib_vl_arb_table_t; > > #include > > /************/ > 5714c6086 > < /****f* IBA Base: Types/ib_notice_set_prod_type > --- > > /****s* IBA Base: Types/ib_vl_arb_table_record_t > 5716c6088 > < * ib_notice_set_prod_type > --- > > * ib_vl_arb_table_record_t > 5719c6091 > < * Sets the producer type of a generic notice trap. > --- > > * IBA defined VL Arbitration Table Record for SA Query. (15.2.5.9) > 5723,5726c6095,6096 > < AL_INLINE void AL_API > < ib_notice_set_prod_type( > < IN OUT ib_mad_notice_attr_t* const p_notice_attr, > < IN const uint32_t prod_type ) > --- > > #include > > typedef struct _ib_vl_arb_table_record > 5728c6098,6106 > < uint32_t val; > --- > > ib_net16_t lid; // for CA: lid of port, for switch lid of port 0 > > uint8_t port_num; > > uint8_t block_num; > > uint32_t reserved; > > ib_vl_arb_table_t vl_arb_tbl; > > > > } PACK_SUFFIX ib_vl_arb_table_record_t; > > #include > > /************/ > 5730,5736d6107 > < val = cl_ntoh32( p_notice_attr->combo1 ); > < /* Clear the type. */ > < val &= 0x000000FF; > < /* Set new value. */ > < val |= (prod_type << 8); > < p_notice_attr->combo1 = cl_hton32( val ); > < } > 5738,5751c6109,6119 > < * PARAMETERS > < * p_notice_attr > < * [in/out] Pointer to the notice attribute structure > < * whose producer type to set. 
> < * > < * prod_type > < * [in] Producer type of notice trap. > < * > < * RETURN VALUES > < * This function does not return a value. > < * > < * SEE ALSO > < * ib_mad_notice_attr_t, ib_notice_get_prod_type > < *********/ > --- > > * Global route header information received with unreliable datagram messages > > */ > > #include > > typedef struct _ib_grh > > { > > ib_net32_t ver_class_flow; > > ib_net16_t resv1; > > uint8_t resv2; > > uint8_t hop_limit; > > ib_gid_t src_gid; > > ib_gid_t dest_gid; > 5753c6121,6124 > < /****f* IBA Base: Types/ib_notice_get_vend_id > --- > > } PACK_SUFFIX ib_grh_t; > > #include > > > > /****f* IBA Base: Types/ib_grh_get_ver_class_flow > 5755c6126 > < * ib_notice_get_vend_id > --- > > * ib_grh_get_ver_class_flow > 5758c6129 > < * Retrieves the vendor ID from a vendor specific notice trap. > --- > > * Get encoded version, traffic class and flow label in grh > 5762,5764c6133,6138 > < AL_INLINE uint32_t AL_API > < ib_notice_get_vend_id( > < IN const ib_mad_notice_attr_t* const p_notice_attr ) > --- > > static inline void > > ib_grh_get_ver_class_flow( > > IN const ib_net32_t ver_class_flow, > > OUT uint8_t* const p_ver, > > OUT uint8_t* const p_tclass, > > OUT uint32_t* const p_flow_lbl ) > 5766c6140,6153 > < return ib_notice_get_prod_type( p_notice_attr ); > --- > > ib_net32_t tmp_ver_class_flow; > > > > if (p_ver) > > *p_ver = (uint8_t)(ver_class_flow & 0x0f); > > > > tmp_ver_class_flow = ver_class_flow >> 4; > > > > if (p_tclass) > > *p_tclass = (uint8_t)(tmp_ver_class_flow & 0xff); > > > > tmp_ver_class_flow = tmp_ver_class_flow >> 8; > > > > if (p_flow_lbl) > > *p_flow_lbl = tmp_ver_class_flow & 0xfffff; > 5770,5772c6157,6158 > < * p_notice_attr > < * [in] Pointer to... 
> > [Message clipped] From ftillier at silverstorm.com Mon Jul 10 06:23:28 2006 From: ftillier at silverstorm.com (Fabian Tillier) Date: Mon, 10 Jul 2006 09:23:28 -0400 Subject: [openib-general] [Openib-windows] ib_types.h and Win/Linux consolidation In-Reply-To: <79ae2f320607100621n479ba1e3n5ea0a1679c73603f@mail.gmail.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E3027A83C5@mtlexch01.mtl.com> <79ae2f320607100621n479ba1e3n5ea0a1679c73603f@mail.gmail.com> Message-ID: <79ae2f320607100623t74bc63f2mf6f800599c174868@mail.gmail.com> On 7/10/06, Fabian Tillier improperly quoted a huge message. I want to apologize to everyone for not clipping the original mail - I hit send too soon. - Fab From mst at mellanox.co.il Mon Jul 10 08:18:24 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 10 Jul 2006 18:18:24 +0300 Subject: [openib-general] [PATCH updated] IB/mthca: comment fix In-Reply-To: <20060710111412.GD24705@mellanox.co.il> References: <20060710111412.GD24705@mellanox.co.il> Message-ID: <20060710151824.GL24705@mellanox.co.il> OK, the following makes it clear what the function does, from its name. More comments? --- After recent changes, mthca_wq_init does not actually initialize the WQ as it used to - it simply resets all index fields to their initial values. So, let's rename it to mthca_wq_reset. Signed-off-by: Michael S.
Tsirkin diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c index 490fc78..cd8b672 100644 --- a/drivers/infiniband/hw/mthca/mthca_qp.c +++ b/drivers/infiniband/hw/mthca/mthca_qp.c @@ -222,9 +222,8 @@ static void *get_send_wqe(struct mthca_q (PAGE_SIZE - 1)); } -static void mthca_wq_init(struct mthca_wq *wq) +static void mthca_wq_reset(struct mthca_wq *wq) { - /* mthca_alloc_qp_common() initializes the locks */ wq->next_ind = 0; wq->last_comp = wq->max - 1; wq->head = 0; @@ -845,10 +844,10 @@ int mthca_modify_qp(struct ib_qp *ibqp, mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq), qp->qpn, qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL); - mthca_wq_init(&qp->sq); + mthca_wq_reset(&qp->sq); qp->sq.last = get_send_wqe(qp, qp->sq.max - 1); - mthca_wq_init(&qp->rq); + mthca_wq_reset(&qp->rq); qp->rq.last = get_recv_wqe(qp, qp->rq.max - 1); if (mthca_is_memfree(dev)) { @@ -1112,9 +1111,9 @@ static int mthca_alloc_qp_common(struct qp->atomic_rd_en = 0; qp->resp_depth = 0; qp->sq_policy = send_policy; - mthca_wq_init(&qp->sq); - mthca_wq_init(&qp->rq); - /* these are initialized separately so lockdep can tell them apart */ + mthca_wq_reset(&qp->sq); + mthca_wq_reset(&qp->rq); + spin_lock_init(&qp->sq.lock); spin_lock_init(&qp->rq.lock); -- MST From halr at voltaire.com Mon Jul 10 08:37:45 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 10 Jul 2006 11:37:45 -0400 Subject: [openib-general] [PATCH] OpenSM/osm_sa_path_record.c: Support SL in PathRecord requests based on whether or not QoS is enabled Message-ID: <1152545863.4508.92349.camel@hal.voltaire.com> OpenSM/osm_sa_path_record.c: Support SL in PathRecord requests based on whether or not QoS is enabled Signed-off-by: Hal Rosenstock Index: opensm/osm_sa_path_record.c =================================================================== --- opensm/osm_sa_path_record.c (revision 8470) +++ opensm/osm_sa_path_record.c (working copy) @@ -548,7 +548,10 @@ __osm_pr_rcv_get_path_parms( 
p_parms->mtu = mtu; p_parms->rate = rate; p_parms->pkt_life = pkt_life; - p_parms->sl = OSM_DEFAULT_SL; + if ( comp_mask & IB_PR_COMPMASK_SL ) + p_parms->sl = ib_path_rec_sl( p_pr ); + else + p_parms->sl = OSM_DEFAULT_SL; if( comp_mask & IB_PR_COMPMASK_RAWTRAFFIC && cl_ntoh32( p_pr->hop_flow_raw ) & ( 1<<31 ) ) @@ -987,15 +990,19 @@ __osm_pr_rcv_get_end_points( into the endpoints. */ - if( comp_mask & IB_PR_COMPMASK_SL ) + /* Check for default SL only if QoS not enabled */ + if ( p_rcv->p_subn->opt.no_qos ) { - if( p_pr->sl != OSM_DEFAULT_SL ) + if( comp_mask & IB_PR_COMPMASK_SL ) { - if ( p_sa_mad->method == IB_MAD_METHOD_GET ) - sa_status = IB_SA_MAD_STATUS_NO_RECORDS; - *pp_src_port = 0; - *pp_dest_port = 0; - goto Exit; + if( ib_path_rec_sl( p_pr ) != OSM_DEFAULT_SL ) + { + if ( p_sa_mad->method == IB_MAD_METHOD_GET ) + sa_status = IB_SA_MAD_STATUS_NO_RECORDS; + *pp_src_port = 0; + *pp_dest_port = 0; + goto Exit; + } } } @@ -1414,7 +1421,7 @@ __osm_pr_match_mgrp_attributes( if( comp_mask & IB_PR_COMPMASK_SL ) { - if( ( p_pr->sl & 0xf ) != sl ) + if( ib_path_rec_sl( p_pr ) != sl ) goto Exit; } From halr at voltaire.com Mon Jul 10 08:41:26 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 10 Jul 2006 11:41:26 -0400 Subject: [openib-general] [PATCH] OpenSM: Support configurable SL per partition Message-ID: <1152546085.4508.92522.camel@hal.voltaire.com> OpenSM: Support configurable SL per partition Signed-off-by: Sasha Khapyorsky Signed-off-by: Hal Rosenstock Index: include/opensm/osm_partition.h =================================================================== --- include/opensm/osm_partition.h (revision 8470) +++ include/opensm/osm_partition.h (working copy) @@ -102,6 +102,7 @@ typedef struct _osm_prtn { cl_map_item_t map_item; uint16_t pkey; + uint8_t sl; cl_map_t full_guid_tbl; cl_map_t part_guid_tbl; char name[32]; @@ -114,6 +115,9 @@ typedef struct _osm_prtn * pkey * The IBA defined P_KEY of this Partition. 
* +* sl +* The Service Level (SL) associated with this Partiton. +* * port_guid_tbl * Container of pointers to all Port objects in the Partition, * indexed by port GUID. Index: opensm/osm_prtn.c =================================================================== --- opensm/osm_prtn.c (revision 8470) +++ opensm/osm_prtn.c (working copy) @@ -79,6 +79,7 @@ osm_prtn_t* osm_prtn_new( memset(p, 0, sizeof(*p)); p->pkey = pkey; + p->sl = OSM_DEFAULT_SL; cl_map_construct(&p->full_guid_tbl); cl_map_init(&p->full_guid_tbl, 32); cl_map_construct(&p->part_guid_tbl); @@ -220,7 +221,7 @@ ib_api_status_t osm_prtn_add_mcgroup(osm mc_rec.pkey = pkey; mc_rec.rate = rate ? rate : 0x3; /* 10Gb/sec */ mc_rec.pkt_life = OSM_DEFAULT_SUBNET_TIMEOUT; - mc_rec.sl_flow_hop = OSM_DEFAULT_SL << 28; + mc_rec.sl_flow_hop = ib_member_set_sl_flow_hop(p->sl, 0, 0); /* Note: scope needs to be consistent with MGID */ mc_rec.scope_state = 0x21; Index: opensm/osm_prtn_config.c =================================================================== --- opensm/osm_prtn_config.c (revision 8470) +++ opensm/osm_prtn_config.c (working copy) @@ -54,6 +54,7 @@ #include #include +#include #include #include #include @@ -82,7 +83,7 @@ struct part_conf { osm_log_t *p_log; osm_subn_t *p_subn; osm_prtn_t *p_prtn; - unsigned is_ipoib, mtu, rate; + unsigned is_ipoib, mtu, rate, sl; }; @@ -122,6 +123,16 @@ static int partition_create(unsigned lin if (!conf->p_prtn) return -1; + if (conf->p_subn->opt.no_qos) { + if (conf->sl != OSM_DEFAULT_SL) { + osm_log(conf->p_log, OSM_LOG_ERROR, + "partition_create: Overriding SL %d to default SL %d on partition %s as QoS not enabled\n", + conf->sl, OSM_DEFAULT_SL, name); + conf->sl = OSM_DEFAULT_SL; + } + } + conf->p_prtn->sl = conf->sl & 0xf; + if (conf->is_ipoib) osm_prtn_add_mcgroup(conf->p_log, conf->p_subn, conf->p_prtn, conf->is_ipoib, conf->rate, conf->mtu); @@ -145,6 +156,15 @@ static int partition_add_flag(unsigned l PARSEWARN(conf->p_log, lineno, "flag \'rate\' requires valid 
value" " - skipped.\n"); + } else if (!strncmp(flag, "sl", len)) { + unsigned sl; + + if (!val || (sl = strtoul(val, NULL, 0)) == 0 || sl > 15) + PARSEWARN(conf->p_log, lineno, + "flag \'sl\' requires valid value" + " - skipped.\n"); + else + conf->sl = sl; } else { PARSEWARN(conf->p_log, lineno, "unrecognized partition flag \'%s\'" @@ -254,6 +274,8 @@ static struct part_conf *new_part_conf(o conf->p_log = p_log; conf->p_subn = p_subn; conf->p_prtn = NULL; + conf->is_ipoib = 0; + conf->sl = OSM_DEFAULT_SL; return conf; } Index: doc/partition-config.txt =================================================================== --- doc/partition-config.txt (revision 8470) +++ doc/partition-config.txt (working copy) @@ -46,6 +46,7 @@ ipoib - indicates that this partiti result IPoIB capable MC group will be created. rate= - specifies rate for this IPoIB MC group (default is 3 (10GBps)) mtu= - specifies MTU for this IPoIB MC group (default is 4 (2048)) +sl= - specifies SL for this IPoIB MC group (default is 0) Note that values for 'rate' and 'mtu' should be specified as defined in IBTA specification (for example mtu=4 for 2048). 
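For readers following the 'sl' flag handling in the patch above, here is a minimal stand-alone sketch of the two steps involved: validating an SL value taken from a flag string, and packing SL, flow label and hop limit into the 32-bit sl_flow_hop word (SL in the top 4 bits, flow label in the next 20, hop limit in the low 8, matching the shift by 28 that the patch replaces). The names sketch_parse_sl, sketch_pack_sl_flow_hop and SKETCH_MAX_SL are hypothetical and are not part of OpenSM. The parser checks the strtoul end pointer so that an explicit "0" can be told apart from unparsable text; as posted, the patch's "== 0" test appears to treat both the same way. Byte-order conversion is left to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_SL 15u	/* SL is a 4-bit field (hypothetical name) */

/*
 * Parse the value of an "sl=<n>" flag.
 * Returns 0 and stores the SL on success, -1 if the text is missing,
 * is not a complete number, or falls outside 0..15.
 */
static int sketch_parse_sl(const char *val, unsigned *p_sl)
{
	char *end;
	unsigned long n;

	if (!val || !*val)
		return -1;

	errno = 0;
	n = strtoul(val, &end, 0);	/* base 0: accepts decimal, hex, octal */
	if (errno || end == val || *end != '\0' || n > SKETCH_MAX_SL)
		return -1;

	*p_sl = (unsigned)n;
	return 0;
}

/*
 * Pack SL (4 bits), flow label (20 bits) and hop limit (8 bits) into one
 * 32-bit word in host byte order. Converting to network order is the
 * caller's job in this sketch.
 */
static uint32_t sketch_pack_sl_flow_hop(uint8_t sl, uint32_t flow_lbl,
					uint8_t hop_limit)
{
	assert(sl <= SKETCH_MAX_SL);
	return ((uint32_t)sl << 28) | ((flow_lbl & 0xFFFFFu) << 8) | hop_limit;
}
```

With this layout an SL of 3 with zero flow label and hop limit packs to 0x30000000, and a partition line that omits the flag would simply keep whatever default the caller chose.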
From halr at voltaire.com Mon Jul 10 09:02:01 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 10 Jul 2006 12:02:01 -0400 Subject: [openib-general] [PATCH] OpenSM/osm_sa_multipath_record.c: Support SL in MultiPathRecord requests based on whether or not QoS is enabled Message-ID: <1152547320.4508.93534.camel@hal.voltaire.com> OpenSM/osm_sa_multipath_record.c: Support SL in MultiPathRecord requests based on whether or not QoS is enabled Signed-off-by: Hal Rosenstock Index: opensm/osm_sa_multipath_record.c =================================================================== --- opensm/osm_sa_multipath_record.c (revision 8475) +++ opensm/osm_sa_multipath_record.c (working copy) @@ -198,6 +198,18 @@ __osm_mpr_rcv_get_path_parms( else required_sl = OSM_DEFAULT_SL; + /* Check for default SL only if QoS not enabled */ + if ( p_rcv->p_subn->opt.no_qos ) { + if ( required_sl != OSM_DEFAULT_SL) { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_mpr_rcv_get_path_parms: ERR 4511: " + "Requested SL %d not default SL %d and QoS not enabled\n", + required_sl, OSM_DEFAULT_SL); + status = IB_NOT_FOUND; + goto Exit; + } + } + if ( comp_mask & IB_MPR_COMPMASK_PKEY ) { required_pkey = p_mpr->pkey; if ( !osm_physp_has_pkey( p_rcv->p_log, required_pkey, p_physp ) || From jlentini at netapp.com Mon Jul 10 09:35:30 2006 From: jlentini at netapp.com (James Lentini) Date: Mon, 10 Jul 2006 12:35:30 -0400 (EDT) Subject: [openib-general] [RFC] [PATCH 3/7] ibrdmaverbs hdr files In-Reply-To: <1152526354.28026.1.camel@localhost.localdomain> References: <1152526354.28026.1.camel@localhost.localdomain> Message-ID: What are the differences between this and the libibverbs? Is it only that the ibv_ prefix was changed to rdma_? Long term, I don't think it makes sense having two libraries with the exact same functionality, but different function names. Replacing ibv_ with rdma_ would be ideal, but disruptive to current users. 
Could you provide a backwards compatibility header (one that mapped each ibv_ function to its equivalent rdma_ function)? One additional comment below: On Mon, 10 Jul 2006, Krishna Kumar wrote: > diff -ruNp ORG/librdmaverbs/include/rdma/arch.h NEW/librdmaverbs/include/rdma/arch.h > --- ORG/librdmaverbs/include/rdma/arch.h 1969-12-31 16:00:00.000000000 -0800 > +++ NEW/librdmaverbs/include/rdma/arch.h 2006-07-10 18:07:46.000000000 -0700 > @@ -0,0 +1,95 @@ > +/* > + * Copyright (c) 2005 Topspin Communications. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. 
> + * > + * $Id: arch.h 6987 2006-05-08 15:18:51Z tom $ > + */ > + > +#ifndef INFINIBAND_ARCH_H > +#define INFINIBAND_ARCH_H I wouldn't prefix all of your include guards with INFINIBAND_. Not only is it confusing, but it collides with the libibverbs headers. I'd suggest using RDMA_ instead. From mshefty at ichips.intel.com Mon Jul 10 09:44:22 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 10 Jul 2006 09:44:22 -0700 Subject: [openib-general] [Openib-windows] ib_types.h and Win/Linux consolidation In-Reply-To: <79ae2f320607100621n479ba1e3n5ea0a1679c73603f@mail.gmail.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E3027A83C5@mtlexch01.mtl.com> <79ae2f320607100621n479ba1e3n5ea0a1679c73603f@mail.gmail.com> Message-ID: <44B283E6.1020405@ichips.intel.com> Fabian Tillier wrote: > Could you filter these out and send out what the actual changes > that matter are? I quickly lost interest here. Can you also use the -up diff format? - Sean From mshefty at ichips.intel.com Mon Jul 10 09:46:37 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 10 Jul 2006 09:46:37 -0700 Subject: [openib-general] [PATCH upstream] IB/cm: drop REQ when out of memory In-Reply-To: <20060710124347.GI24705@mellanox.co.il> References: <000101c69a37$fb0deb30$e598070a@amr.corp.intel.com> <20060710124347.GI24705@mellanox.co.il> Message-ID: <44B2846D.8040108@ichips.intel.com> Michael S. Tsirkin wrote: > I plan to send the following (from SVN trunk rev 8261) upstream to Andrew. > Comments? This is fine. - Sean From mshefty at ichips.intel.com Mon Jul 10 09:48:36 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 10 Jul 2006 09:48:36 -0700 Subject: [openib-general] [PATCH upstream] IB/addr: gid structure alignment fix In-Reply-To: <20060710124755.GJ24705@mellanox.co.il> References: <20060710124755.GJ24705@mellanox.co.il> Message-ID: <44B284E4.3060304@ichips.intel.com> Michael S. Tsirkin wrote: > I plan to send the following (from SVN r8265) upstream to Andrew. > Comments?
looks fine - thanks for separating these changes out - Sean From rdreier at cisco.com Mon Jul 10 09:57:03 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 10 Jul 2006 09:57:03 -0700 Subject: [openib-general] [RFC] [PATCH 3/7] ibrdmaverbs hdr files In-Reply-To: (James Lentini's message of "Mon, 10 Jul 2006 12:35:30 -0400 (EDT)") References: <1152526354.28026.1.camel@localhost.localdomain> Message-ID: James> Long term, I don't think it makes sense having two James> libraries with the exact same functionality, but different James> function names. Replacing ibv_ with rdma_ would be ideal, James> but disruptive to current users. Could you provide a James> backwards compatibility header (one that mapped each ibv_ James> function to its equivalent rdma_ function)? Agreed. I think that the way forward is to provide a libibverbs release that has both rdma_ and ibv_ prefixed identifiers, and also provides both and headers, etc. Then in the next major libibverbs release, the ibv_ versions of identifiers can be deprecated, and finally removed in the next major release (which could be renamed to librdmaverbs). Following that route is something that I've been meaning to work on, and I'd be happy to accept patches in that direction. But a big bang approach that breaks the world is not a good idea I don't think. - R. From halr at voltaire.com Mon Jul 10 10:10:44 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 10 Jul 2006 13:10:44 -0400 Subject: [openib-general] user_mad check question In-Reply-To: <000101c6a1f3$7c7783d0$8698070a@amr.corp.intel.com> References: <000101c6a1f3$7c7783d0$8698070a@amr.corp.intel.com> Message-ID: <1152551443.11136.2563.camel@hal.voltaire.com> On Fri, 2006-07-07 at 14:30, Sean Hefty wrote: > The following check in user_mad is done when sending a MAD. 
> > /* > * If userspace is generating a request that will generate a > * response, we need to make sure the high-order part of the > * transaction ID matches the agent being used to send the > * MAD. > */ > method = ((struct ib_mad_hdr *) packet->msg->mad)->method; > > if (!(method & IB_MGMT_METHOD_RESP) && > method != IB_MGMT_METHOD_TRAP_REPRESS && > method != IB_MGMT_METHOD_SEND) { > tid = &((struct ib_mad_hdr *) packet->msg->mad)->tid; > *tid = cpu_to_be64(((u64) agent->hi_tid) << 32 | > (be64_to_cpup(tid) & 0xffffffff)); > } > > Does anyone know the reason for the last check (method != SEND)? Because there is no required response to a send method. > Isn't it still a requirement that the TID be unique? I'm not sure if duplicate detection is based on the TID or some upper layer protocol but if send is an operation under C13-18.1.1, each different send is a different TID. Some SENDs have "responses" which are sent by SEND method and others don't. I'm not sure if the matching is transaction ID based or not. -- Hal > I'm trying to add some validation checks against MADs sent from userspace to > protect against a client sending non-data RMPP MADs, duplicate MADs, etc., and > the above check looks incorrect to me. > > - Sean From halr at voltaire.com Mon Jul 10 10:11:35 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 10 Jul 2006 13:11:35 -0400 Subject: [openib-general] user_mad check question In-Reply-To: <44AECB32.6040300@ichips.intel.com> References: <44AECB32.6040300@ichips.intel.com> Message-ID: <1152551495.11136.2621.camel@hal.voltaire.com> On Fri, 2006-07-07 at 16:59, Sean Hefty wrote: [snip...] > Can sends ever be used with RMPP? There's nothing that precludes this in the IBA although I do not believe this is currently used by any class.
-- Hal From mshefty at ichips.intel.com Mon Jul 10 10:15:33 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 10 Jul 2006 10:15:33 -0700 Subject: [openib-general] user_mad check question In-Reply-To: References: Message-ID: <44B28B35.9020405@ichips.intel.com> Rimmer, Todd wrote: > We defined a response as: > ((R bit set || TRAP_REPRESS) && ! SEND) || (Class=BM && SEND && > AttributeModifier BM Response bit set) At this point, I'm leaning towards setting the upper bits of the TID for all MADs that are not responses. (This is for usermode only, so kernel agents such as the CM are left as is.) I think that a response can be identified as: Response bit set || trap repress || (class=BM && attribute modifier response bit set) The last check is missing from current code, and the SEND checks listed above seem unnecessary. - Sean From sean.hefty at intel.com Mon Jul 10 10:39:25 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 10 Jul 2006 10:39:25 -0700 Subject: [openib-general] user_mad check question In-Reply-To: <1152551443.11136.2563.camel@hal.voltaire.com> Message-ID: <000001c6a447$c97c5600$ff0da8c0@amr.corp.intel.com> >> Does anyone know the reason for the last check (method != SEND)? > >Because there is no required response to a send method. > >> Isn't it still a requirement that the TID be unique? > >I'm not sure if duplicate detection is based on the TID or some upper >layer protocol but if send is an operation under C13-18.1.1, each >different send is a different TID. Some SENDs have "responses" which are >sent by SEND method and others don't. I'm not sure if the matching is >transaction ID based or not. If sends are included in C13-18.1.1, then the upper bits of the TID need to be set to guarantee that they are unique across multiple agents. If sends are excluded from C13-18.1.1, then I believe that this would also exclude sends from ever being used for RMPP. 
- Sean From mst at mellanox.co.il Mon Jul 10 11:07:01 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 10 Jul 2006 21:07:01 +0300 Subject: [openib-general] [PATCH] srp: fix fmr error handling Message-ID: <20060710180701.GA29641@mellanox.co.il> I plan to send the following upstream. Comments? --- From: Vu Pham srp_unmap_data assumes req->fmr is NULL if the request is not mapped, so we must clean it out in case of an error. Signed-off-by: Vu Pham Signed-off-by: Michael S. Tsirkin diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 4e22afe..8f472e7 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -615,9 +615,10 @@ static int srp_map_fmr(struct srp_device (sg_dma_address(&scat[i]) & dev->fmr_page_mask) + j; req->fmr = ib_fmr_pool_map_phys(dev->fmr_pool, dma_pages, page_cnt, &io_addr); if (IS_ERR(req->fmr)) { ret = PTR_ERR(req->fmr); + req->fmr = NULL; goto out; } -- MST From halr at voltaire.com Mon Jul 10 11:06:24 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 10 Jul 2006 14:06:24 -0400 Subject: [openib-general] [PATCH] OpenSM/osm_vendor_sa_api.h: Add InformInfoRecord query result support Message-ID: <1152554784.11136.5011.camel@hal.voltaire.com> OpenSM/osm_vendor_sa_api.h: Add InformInfoRecord query result support Signed-off-by: Hal Rosenstock Index: include/vendor/osm_vendor_sa_api.h =================================================================== --- include/vendor/osm_vendor_sa_api.h (revision 8475) +++ include/vendor/osm_vendor_sa_api.h (working copy) @@ -654,6 +654,47 @@ osmv_get_query_mc_rec( * osmv_query_res_t, osm_madw_t, osmv_get_query_result, ib_member_rec_t *****/ +/****f* OpenSM Vendor SA Client/osmv_get_query_inform_info_rec +* NAME +* osmv_get_query_inform_info_rec +* +* DESCRIPTION +* Retrieves an InformInfo record result from a MAD returned by +* a call to osmv_query_sa(). 
+* +* SYNOPSIS +*/ +static inline ib_inform_info_record_t* +osmv_get_query_inform_info_rec( + IN osm_madw_t *p_result_madw, + IN uint32_t result_index ) +{ + ib_sa_mad_t *p_sa_mad; + + CL_ASSERT( p_result_madw ); + p_sa_mad = (ib_sa_mad_t*)osm_madw_get_mad_ptr( p_result_madw ); + CL_ASSERT( p_sa_mad && p_sa_mad->attr_id == IB_MAD_ATTR_INFORM_INFO_RECORD ); + + return( (ib_inform_info_record_t*)osmv_get_query_result( p_result_madw, + result_index ) ); +} +/* +* PARAMETERS +* p_result_madw +* [in] This is a reference to the MAD returned as a result of the +* query. +* +* result_index +* [in] A zero-based index indicating which result to return. +* +* NOTES +* This call returns a pointer to the start of a service record result from +* a call to osmv_query_sa(). +* +* SEE ALSO +* osmv_query_res_t, osm_madw_t, osmv_get_query_result, ib_inform_info_record_t +*****/ + /****f* OpenSM Vendor SA Client/osmv_pfn_query_cb_t * NAME * osmv_pfn_query_cb_t From mst at mellanox.co.il Mon Jul 10 11:14:06 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 10 Jul 2006 21:14:06 +0300 Subject: [openib-general] [PATCH] IB/cm: drop REQ when out of memory Message-ID: <20060710181406.GC29641@mellanox.co.il> Hello Andrew! Could you please drop the following in -mm and on to Linus? --- If a user of the IB CM returns -ENOMEM from their connection callback, simply drop the incoming REQ - do not attempt to send a reject. This should allow the sender to retry the request. Signed-off-by: Michael S. 
Tsirkin Signed-off-by: Sean Hefty Index: l/drivers/infiniband/core/cm.c =================================================================== --- l/drivers/infiniband/core/cm.c (revision 8224) +++ l/drivers/infiniband/core/cm.c (working copy) @@ -702,7 +702,7 @@ static void cm_reset_to_idle(struct cm_i } } -void ib_destroy_cm_id(struct ib_cm_id *cm_id) +static void cm_destroy_id(struct ib_cm_id *cm_id, int err) { struct cm_id_private *cm_id_priv; struct cm_work *work; @@ -736,12 +736,22 @@ retest: sizeof cm_id_priv->av.port->cm_dev->ca_guid, NULL, 0); break; + case IB_CM_REQ_RCVD: + if (err == -ENOMEM) { + /* Do not reject to allow future retries. */ + cm_reset_to_idle(cm_id_priv); + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + } else { + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + ib_send_cm_rej(cm_id, IB_CM_REJ_CONSUMER_DEFINED, + NULL, 0, NULL, 0); + } + break; case IB_CM_MRA_REQ_RCVD: case IB_CM_REP_SENT: case IB_CM_MRA_REP_RCVD: ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg); /* Fall through */ - case IB_CM_REQ_RCVD: case IB_CM_MRA_REQ_SENT: case IB_CM_REP_RCVD: case IB_CM_MRA_REP_SENT: @@ -776,6 +786,11 @@ retest: kfree(cm_id_priv->private_data); kfree(cm_id_priv); } + +void ib_destroy_cm_id(struct ib_cm_id *cm_id) +{ + cm_destroy_id(cm_id, 0); +} EXPORT_SYMBOL(ib_destroy_cm_id); int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask, @@ -1164,7 +1179,7 @@ static void cm_process_work(struct cm_id } cm_deref_id(cm_id_priv); if (ret) - ib_destroy_cm_id(&cm_id_priv->id); + cm_destroy_id(&cm_id_priv->id, ret); } static void cm_format_mra(struct cm_mra_msg *mra_msg, -- MST From mst at
mellanox.co.il (Michael S. Tsirkin) Date: Mon, 10 Jul 2006 21:18:25 +0300 Subject: [openib-general] [PATCH] fmr pool: remove unnecessary pointer dereference Message-ID: <20060710181825.GE29641@mellanox.co.il> I plan to send the following upstream - killing unnecessary code is always good. Comments? --- ib_fmr_pool_map_phys gets the virtual address by pointer but never writes there, and users (e.g. srp) seem to assume this and ignore the value returned. This patch cleans up the API to get the VA by value, and updates all users. Signed-off-by: Michael S. Tsirkin diff --git a/include/rdma/ib_fmr_pool.h b/include/rdma/ib_fmr_pool.h index 4ace54c..00dadbf 100644 --- a/include/rdma/ib_fmr_pool.h +++ b/include/rdma/ib_fmr_pool.h @@ -88,7 +88,7 @@ int ib_flush_fmr_pool(struct ib_fmr_pool struct ib_pool_fmr *ib_fmr_pool_map_phys(struct ib_fmr_pool *pool_handle, u64 *page_list, int list_len, - u64 *io_virtual_address); + u64 io_virtual_address); int ib_fmr_pool_unmap(struct ib_pool_fmr *fmr); diff --git a/drivers/infiniband/core/fmr_pool.c b/drivers/infiniband/core/fmr_pool.c index 615fe9c..86a3b2d 100644 --- a/drivers/infiniband/core/fmr_pool.c +++ b/drivers/infiniband/core/fmr_pool.c @@ -426,7 +426,7 @@ EXPORT_SYMBOL(ib_flush_fmr_pool); struct ib_pool_fmr *ib_fmr_pool_map_phys(struct ib_fmr_pool *pool_handle, u64 *page_list, int list_len, - u64 *io_virtual_address) + u64 io_virtual_address) { struct ib_fmr_pool *pool = pool_handle; struct ib_pool_fmr *fmr; @@ -440,7 +440,7 @@ struct ib_pool_fmr *ib_fmr_pool_map_phys fmr = ib_fmr_cache_lookup(pool, page_list, list_len, - *io_virtual_address); + io_virtual_address); if (fmr) { /* found in cache */ ++fmr->ref_count; @@ -464,7 +464,7 @@ struct ib_pool_fmr *ib_fmr_pool_map_phys spin_unlock_irqrestore(&pool->pool_lock, flags); result = ib_map_phys_fmr(fmr->fmr, page_list, list_len, - *io_virtual_address); + io_virtual_address); if (result) { spin_lock_irqsave(&pool->pool_lock, flags); @@ -481,7 +481,7 @@ struct ib_pool_fmr 
*ib_fmr_pool_map_phys fmr->ref_count = 1; if (pool->cache_bucket) { - fmr->io_virtual_address = *io_virtual_address; + fmr->io_virtual_address = io_virtual_address; fmr->page_list_len = list_len; memcpy(fmr->page_list, page_list, list_len * sizeof(*page_list)); diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c index ff117bb..72febf1 100644 --- a/drivers/infiniband/ulp/iser/iser_verbs.c +++ b/drivers/infiniband/ulp/iser/iser_verbs.c @@ -594,7 +594,7 @@ int iser_reg_page_vec(struct iser_conn mem = ib_fmr_pool_map_phys(ib_conn->fmr_pool, page_list, page_vec->length, - &io_addr); + io_addr); if (IS_ERR(mem)) { status = (int)PTR_ERR(mem); diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 4e22afe..9bf182a 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -615,7 +615,7 @@ static int srp_map_fmr(struct srp_device (sg_dma_address(&scat[i]) & dev->fmr_page_mask) + j; req->fmr = ib_fmr_pool_map_phys(dev->fmr_pool, - dma_pages, page_cnt, &io_addr); + dma_pages, page_cnt, io_addr); if (IS_ERR(req->fmr)) { ret = PTR_ERR(req->fmr); goto out; -- MST From mst at mellanox.co.il Mon Jul 10 11:15:47 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 10 Jul 2006 21:15:47 +0300 Subject: [openib-general] [PATCH] IB/addr: gid structure alignment fix Message-ID: <20060710181547.GD29641@mellanox.co.il> Hello, Andrew! Could you please drop the following in -mm and on to Linus? --- The device address contains unsigned character arrays, which contain raw GID addresses. The GIDs may not be naturally aligned, so do not cast them to structures or unions. Signed-off-by: Sean Hefty Signed-off-by: Michael S. 
Tsirkin Index: gitcma/include/rdma/ib_addr.h =================================================================== --- gitcma.orig/include/rdma/ib_addr.h 2006-07-09 23:41:27.000000000 +0300 +++ gitcma/include/rdma/ib_addr.h 2006-07-09 23:51:23.000000000 +0300 @@ -89,9 +89,10 @@ static inline void ib_addr_set_pkey(stru dev_addr->broadcast[9] = (unsigned char) pkey; } -static inline union ib_gid *ib_addr_get_sgid(struct rdma_dev_addr *dev_addr) +static inline void ib_addr_get_sgid(struct rdma_dev_addr *dev_addr, + union ib_gid *gid) { - return (union ib_gid *) (dev_addr->src_dev_addr + 4); + memcpy(gid, dev_addr->src_dev_addr + 4, sizeof *gid); } static inline void ib_addr_set_sgid(struct rdma_dev_addr *dev_addr, @@ -100,9 +101,10 @@ static inline void ib_addr_set_sgid(stru memcpy(dev_addr->src_dev_addr + 4, gid, sizeof *gid); } -static inline union ib_gid *ib_addr_get_dgid(struct rdma_dev_addr *dev_addr) +static inline void ib_addr_get_dgid(struct rdma_dev_addr *dev_addr, + union ib_gid *gid) { - return (union ib_gid *) (dev_addr->dst_dev_addr + 4); + memcpy(gid, dev_addr->dst_dev_addr + 4, sizeof *gid); } static inline void ib_addr_set_dgid(struct rdma_dev_addr *dev_addr, Index: gitcma/drivers/infiniband/core/cma.c =================================================================== --- gitcma.orig/drivers/infiniband/core/cma.c 2006-07-09 23:41:26.000000000 +0300 +++ gitcma/drivers/infiniband/core/cma.c 2006-07-09 23:51:23.000000000 +0300 @@ -262,14 +262,14 @@ static void cma_detach_from_dev(struct r static int cma_acquire_ib_dev(struct rdma_id_private *id_priv) { struct cma_device *cma_dev; - union ib_gid *gid; + union ib_gid gid; int ret = -ENODEV; - gid = ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr); + ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr, &gid), mutex_lock(&lock); list_for_each_entry(cma_dev, &dev_list, list) { - ret = ib_find_cached_gid(cma_dev->device, gid, + ret = ib_find_cached_gid(cma_dev->device, &gid, &id_priv->id.port_num, NULL); if 
(!ret) { cma_attach_to_dev(id_priv, cma_dev); @@ -1134,8 +1134,8 @@ static int cma_query_ib_route(struct rdm struct ib_sa_path_rec path_rec; memset(&path_rec, 0, sizeof path_rec); - path_rec.sgid = *ib_addr_get_sgid(addr); - path_rec.dgid = *ib_addr_get_dgid(addr); + ib_addr_get_sgid(addr, &path_rec.sgid); + ib_addr_get_dgid(addr, &path_rec.dgid); path_rec.pkey = cpu_to_be16(ib_addr_get_pkey(addr)); path_rec.numb_path = 1; @@ -1263,7 +1263,7 @@ static int cma_bind_loopback(struct rdma { struct cma_device *cma_dev; struct ib_port_attr port_attr; - union ib_gid *gid; + union ib_gid gid; u16 pkey; int ret; u8 p; @@ -1284,8 +1284,7 @@ static int cma_bind_loopback(struct rdma } port_found: - gid = ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr); - ret = ib_get_cached_gid(cma_dev->device, p, 0, gid); + ret = ib_get_cached_gid(cma_dev->device, p, 0, &gid); if (ret) goto out; @@ -1293,6 +1292,7 @@ port_found: if (ret) goto out; + ib_addr_set_sgid(&id_priv->id.route.addr.dev_addr, &gid); ib_addr_set_pkey(&id_priv->id.route.addr.dev_addr, pkey); id_priv->id.port_num = p; cma_attach_to_dev(id_priv, cma_dev); @@ -1339,6 +1339,7 @@ static int cma_resolve_loopback(struct r { struct cma_work *work; struct sockaddr_in *src_in, *dst_in; + union ib_gid gid; int ret; work = kzalloc(sizeof *work, GFP_KERNEL); @@ -1351,8 +1352,8 @@ static int cma_resolve_loopback(struct r goto err; } - ib_addr_set_dgid(&id_priv->id.route.addr.dev_addr, - ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr)); + ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr, &gid); + ib_addr_set_dgid(&id_priv->id.route.addr.dev_addr, &gid); if (cma_zero_addr(&id_priv->id.route.addr.src_addr)) { src_in = (struct sockaddr_in *)&id_priv->id.route.addr.src_addr; -- MST ----- End forwarded
message ----- -- MST From ravinandan.arakali at neterion.com Mon Jul 10 11:22:43 2006 From: ravinandan.arakali at neterion.com (Ravinandan Arakali) Date: Mon, 10 Jul 2006 11:22:43 -0700 Subject: [openib-general] ping problem with ammasso cards(iWARP interface) In-Reply-To: <44AF4E76.4090502@in.ibm.com> Message-ID: <001401c6a44d$d5e06d90$4710100a@pc.s2io.com> Pradipta, Okay, thanks.. Initially, I was not sure since I don't remember non-zero values in /proc/krping. When I re-ran the krping test, I see following output openfab2:~ # cat /proc/krping 1-amso0 891376 55711 891376 55711 1782720 27855 1782784 27856 As you mentioned, the RDMA traffic seems to be flowing indeed ! Any idea why rping is dumping core ? Has any testing been done using SDP with ammasso cards ? Regards, Ravi -----Original Message----- From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] Sent: Friday, July 07, 2006 11:20 PM To: Ravinandan Arakali Cc: Leonid. Grossman (E-mail); sivakumar.subramani at neterion.com; openib-general at openib.org Subject: Re: [openib-general] ping problem with ammasso cards(iWARP interface) Ravinandan Arakali wrote: > Pradipta, > Following is the output from gdb after core dump. I have also copy-pasted > the gdb output on client system. > > Attached is the dmesg output when krping test is run in verbose mode. > The ping data on the sender(client) seems okay. The content is shifted > forward by one character for each packet. On receiver, after receiving ping > pkt 9, it seems to jump to pkt no. 1935. Not sure if it's because messages > can be lost during writing to /var/log/messages ? krping is indeed working!!...Using 'verbose' allows you to see the ping data. When not using 'verbose' you see only 'send/recv' messages. 
> > ----------------------------------------- > (gdb) run -s -vV -C100 -d -a 0.0.0.0 -p 9999 > Starting program: > /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -s -vV -C100 -d -a > 0.0.0.0 -p 9999 > [Thread debugging using libthread_db enabled] > [New Thread -1210054992 (LWP 3668)] > ipaddr (0.0.0.0) > port 9999 > created cm_id 0x804e6e0 > [New Thread -1210057824 (LWP 3671)] > rdma_bind_addr successful > rdma_listen > cma_event type 4 cma_id 0x804e968 (child) > child cma 0x804e968 > > Program received signal SIGSEGV, Segmentation fault. > [Switching to Thread -1210054992 (LWP 3668)] > rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 > 514 cb->pd = ibv_alloc_pd(cm_id->verbs); > (gdb) bt > #0 rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 > #1 0x0804a716 in main (argc=9, argv=Cannot access memory at address 0x6 > ) at examples/rping.c:767 > (gdb) > > --------------------------------- > (gdb) run -c -vV -C100 -d -a 17.2.2.102 -p 9999 > Starting program: > /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -c -vV -C100 -d -a > 17.2.2.102 -p 9999 > [Thread debugging using libthread_db enabled] > [New Thread 47388824908032 (LWP 4620)] > ipaddr (17.2.2.102) > port 9999 > created cm_id 0x506b00 > [New Thread 1082132800 (LWP 4623)] > cma_event type 0 cma_id 0x506b00 (parent) > cma_event type 2 cma_id 0x506b00 (parent) > rdma_resolve_addr - rdma_resolve_route successful > created pd 0x506e60 > created channel 0x506e80 > created cq 0x506ea0 > created qp 0x506f40 > rping_setup_buffers called on cb 0x505010 > allocated & registered buffers... > [New Thread 1090525504 (LWP 4624)] > cq_thread started. 
> > From trimmer at silverstorm.com Mon Jul 10 11:24:49 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Mon, 10 Jul 2006 14:24:49 -0400 Subject: [openib-general] user_mad check question In-Reply-To: <44B28B35.9020405@ichips.intel.com> Message-ID: > From: Sean Hefty [mailto:mshefty at ichips.intel.com] > Sent: Monday, July 10, 2006 1:16 PM > > Rimmer, Todd wrote: > > We defined a response as: > > ((R bit set || TRAP_REPRESS) && ! SEND) || (Class=BM && SEND && > > AttributeModifier BM Response bit set) > > At this point, I'm leaning towards setting the upper bits of the TID for > all > MADs that are not responses. (This is for usermode only, so kernel agents > such > as the CM are left as is.) I disagree, this implies a non-symmetric translation of the TID for SENDs (ie. it would be translated on the outbound SEND but not on any corresponding inbound SEND which might be a reply). The CM and BMA established the precedent for a SEND based protocol where TID was important and class specific information implied request/response status. I recommend only translating the TID on output MADs for messages we understand well enough to ensure the reverse translation would occur when the response arrived back to the sender (BMA is one such case, non-SEND messages is another). > > I think that a response can be identified as: > > Response bit set || trap repress || > (class=BM && attribute modifier response bit set) > > The last check is missing from current code, and the SEND checks listed > above > seem unnecessary. The test above for BM should include && Method==SEND. The BMA protocol also uses Get/Set and other methods, this response bit test is only applicable to the SEND method for the BMA. 
Todd Rimmer From sean.hefty at intel.com Mon Jul 10 11:51:58 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 10 Jul 2006 11:51:58 -0700 Subject: [openib-general] user_mad check question In-Reply-To: Message-ID: <000101c6a451$ebef69c0$ff0da8c0@amr.corp.intel.com> >I disagree, this implies a non-symmetric translation of the TID for >SENDs (ie. it would be translated on the outbound SEND but not on any >corresponding inbound SEND which might be a reply). The CM and BMA >established the precedent for a SEND based protocol where TID was >important and class specific information implied request/response >status. I'm not following you here. Any MAD that is not a response must have a unique TID when going out of the same port as another MAD of the same class. If different agents are generating sends for the same class, then that uniqueness is enforced by setting of the upper 32-bits of the TID. Kernel agents are expected to do this themselves. Userspace agents have this done for them. >I recommend only translating the TID on output MADs for messages we >understand well enough to ensure the reverse translation would occur >when the response arrived back to the sender (BMA is one such case, >non-SEND messages is another). No reverse translation is done. >The test above for BM should include && Method==SEND. The BMA protocol >also uses Get/Set and other methods, this response bit test is only >applicable to the SEND method for the BMA. It looked like the attribute modifier is always zero except when it is a response. - Sean From rdreier at cisco.com Mon Jul 10 11:58:48 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 10 Jul 2006 11:58:48 -0700 Subject: [openib-general] [PATCH] fmr pool: remove unnecessary pointer dereference In-Reply-To: <20060710181825.GE29641@mellanox.co.il> (Michael S. 
Tsirkin's message of "Mon, 10 Jul 2006 21:18:25 +0300") References: <20060710181825.GE29641@mellanox.co.il> Message-ID: Yes, looks like a good idea since the underlying FMR API doesn't change the IOVA. Acked-by: Roland Dreier From or.gerlitz at gmail.com Mon Jul 10 12:37:27 2006 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Mon, 10 Jul 2006 21:37:27 +0200 Subject: [openib-general] [PATCH] fmr pool: remove unnecessary pointer dereference In-Reply-To: References: <20060710181825.GE29641@mellanox.co.il> Message-ID: <15ddcffd0607101237ge2d31fcgae6f704bcbc0ae22@mail.gmail.com> On 7/10/06, Roland Dreier wrote: > Yes, looks like a good idea since the underlying FMR API doesn't > change the IOVA. So this means that a ULP can not map on the same time these two page sets: and suggesting the verbs layer to have A being the IOVA at the HCA IOMMU (eg MPT/MTT in the mellanox case), and getting some A* to be used for the second map? Such a mapping might be possible eg under zero copy (direct IO) test doing two reads on parallel. Or. From mst at mellanox.co.il Mon Jul 10 12:38:14 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 10 Jul 2006 22:38:14 +0300 Subject: [openib-general] [PATCH] srp: fix fmr error handling Message-ID: <20060710193814.GA30521@mellanox.co.il> Andrew, could you pls drop the following in -mm and on to Linus? --- From: Vu Pham srp_unmap_data assumes req->fmr is NULL if the request is not mapped, so we must clean it out in case of an error. Signed-off-by: Vu Pham Signed-off-by: Michael S. 
Tsirkin Acked-by: Roland Dreier diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 4e22afe..8f472e7 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -615,9 +615,10 @@ static int srp_map_fmr(struct srp_device (sg_dma_address(&scat[i]) & dev->fmr_page_mask) + j; req->fmr = ib_fmr_pool_map_phys(dev->fmr_pool, dma_pages, page_cnt, &io_addr); if (IS_ERR(req->fmr)) { ret = PTR_ERR(req->fmr); + req->fmr = NULL; goto out; } -- MST ----- End forwarded message ----- -- MST From mst at mellanox.co.il Mon Jul 10 12:44:09 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 10 Jul 2006 22:44:09 +0300 Subject: [openib-general] [PATCH] fmr pool: remove unnecessary pointer dereference In-Reply-To: <15ddcffd0607101237ge2d31fcgae6f704bcbc0ae22@mail.gmail.com> References: <15ddcffd0607101237ge2d31fcgae6f704bcbc0ae22@mail.gmail.com> Message-ID: <20060710194409.GB30521@mellanox.co.il> Quoting r. Or Gerlitz : > Subject: Re: [openib-general] [PATCH] fmr pool: remove unnecessary pointer dereference > > On 7/10/06, Roland Dreier wrote: > > Yes, looks like a good idea since the underlying FMR API doesn't > > change the IOVA. > > So this means that a ULP can not map on the same time these two page sets: > and suggesting the verbs layer to have A being the IOVA > at the HCA IOMMU (eg MPT/MTT in the mellanox case), and getting some > A* to be used for the second map? > > Such a mapping might be possible eg under zero copy (direct IO) test > doing two reads on parallel. Sorry, I don't understand what you are trying to do, or how it's related to the patch. Example?
-- 
MST

From rdreier at cisco.com Mon Jul 10 13:25:01 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 10 Jul 2006 13:25:01 -0700
Subject: [openib-general] [PATCH] fmr pool: remove unnecessary pointer dereference
In-Reply-To: <15ddcffd0607101237ge2d31fcgae6f704bcbc0ae22@mail.gmail.com> (Or Gerlitz's message of "Mon, 10 Jul 2006 21:37:27 +0200")
References: <20060710181825.GE29641@mellanox.co.il> <15ddcffd0607101237ge2d31fcgae6f704bcbc0ae22@mail.gmail.com>
Message-ID:

    Or> So this means that a ULP can not map on the same time these
    Or> two page sets: and suggesting the verbs layer to
    Or> have A being the IOVA at the HCA IOMMU (eg MPT/MTT in the
    Or> mellanox case), and getting some A* to be used for the second
    Or> map?

I don't follow. The FMR implementation will always use the IOVA
passed in by the consumer, so a ULP can always map whatever page sets
it wants at whatever IOVA it wants (subject to alignment restrictions,
of course).

So there's no point in making the IOVA be an output parameter from the
FMR pool implementation, since the IOVA from the consumer will never
be changed.

 - R.

From swise at opengridcomputing.com Mon Jul 10 13:40:03 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 10 Jul 2006 15:40:03 -0500
Subject: [openib-general] ping problem with ammasso cards(iWARP interface)
In-Reply-To: <001a01c6a213$bb5efa40$4710100a@pc.s2io.com>
References: <001a01c6a213$bb5efa40$4710100a@pc.s2io.com>
Message-ID: <1152564003.6377.4.camel@stevo-desktop>

Hey guys,

I lost this thread for a day due to a mailer problem, but I just noticed
something screwy with the iwarp branch. I accidentally checked in
config.h and config.h.in for librdma.

I'm wondering if that's causing problems?

maybe nuke those files, redo autogen/configure/make/make install and see
if that helps?

Steve.

On Fri, 2006-07-07 at 15:21 -0700, Ravinandan Arakali wrote:
> Pradipta,
> Following is the output from gdb after core dump. I have also copy-pasted
> the gdb output on client system.
>
> Attached is the dmesg output when krping test is run in verbose mode.
> The ping data on the sender(client) seems okay. The content is shifted
> forward by one character for each packet. On receiver, after receiving ping
> pkt 9, it seems to jump to pkt no. 1935. Not sure if it's because messages
> can be lost during writing to /var/log/messages ?
>
> -----------------------------------------
> (gdb) run -s -vV -C100 -d -a 0.0.0.0 -p 9999
> Starting program:
> /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -s -vV -C100 -d -a
> 0.0.0.0 -p 9999
> [Thread debugging using libthread_db enabled]
> [New Thread -1210054992 (LWP 3668)]
> ipaddr (0.0.0.0)
> port 9999
> created cm_id 0x804e6e0
> [New Thread -1210057824 (LWP 3671)]
> rdma_bind_addr successful
> rdma_listen
> cma_event type 4 cma_id 0x804e968 (child)
> child cma 0x804e968
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread -1210054992 (LWP 3668)]
> rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514
> 514 cb->pd = ibv_alloc_pd(cm_id->verbs);
> (gdb) bt
> #0 rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514
> #1 0x0804a716 in main (argc=9, argv=Cannot access memory at address 0x6
> ) at examples/rping.c:767
> (gdb)
>
> ---------------------------------
> (gdb) run -c -vV -C100 -d -a 17.2.2.102 -p 9999
> Starting program:
> /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -c -vV -C100 -d -a
> 17.2.2.102 -p 9999
> [Thread debugging using libthread_db enabled]
> [New Thread 47388824908032 (LWP 4620)]
> ipaddr (17.2.2.102)
> port 9999
> created cm_id 0x506b00
> [New Thread 1082132800 (LWP 4623)]
> cma_event type 0 cma_id 0x506b00 (parent)
> cma_event type 2 cma_id 0x506b00 (parent)
> rdma_resolve_addr - rdma_resolve_route successful
> created pd 0x506e60
> created channel 0x506e80
> created cq 0x506ea0
> created qp 0x506f40
> rping_setup_buffers called on cb 0x505010
> allocated & registered buffers...
> [New Thread 1090525504 (LWP 4624)]
> cq_thread started.
>
>
> -----Original Message-----
> From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com]
> Sent: Friday, July 07, 2006 1:29 PM
> To: Ravinandan Arakali
> Cc: bpradip at in.ibm.com; openib-general at openib.org
> Subject: Re: [openib-general] ping problem with ammasso cards(iWARP
> interface)
>
>
> Ravinandan Arakali wrote:
> > Attached are the outputs when rping and krping are run.
> > The dmesg output when krping is run is quite verbose.
> > Hopefully, you can throw some light by looking at it.
> > The rping server core dumped as soon as the client was run.
>
> Ravi,
> Use the following for krping. You will be able to see the ping data using
> dmesg.
>
> echo "server,port=9999,addr=17.2.2.102,validate,verbose" > /proc/krping
> echo "client,port=9999,addr=17.2.2.102,validate,verbose" > /proc/krping
>
> As for the rping, can u run rping under gdb (gdb
> librdmacm/example/.libs/rping)
> and send the backtrace on segfault.
>
> Thanks,
> Pradipta
> >
> > Thanks,
> > Ravi
> >
> >
> >> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

From mst at mellanox.co.il Mon Jul 10 14:15:42 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 11 Jul 2006 00:15:42 +0300
Subject: [openib-general] [PATCHv2] IB/mthca: comment fix
In-Reply-To: <20060710151824.GL24705@mellanox.co.il>
References: <20060710111412.GD24705@mellanox.co.il> <20060710151824.GL24705@mellanox.co.il>
Message-ID: <20060710211542.GA30898@mellanox.co.il>

OK, this is fine with both Arjan van de Ven and Roland Dreier, so -
Andrew, could you take this into -mm please?
---

After recent changes, mthca_wq_init does not actually initialize the WQ as it
used to - it simply resets all index fields to their initial values. So, let's
rename it to mthca_wq_reset.

Signed-off-by: Michael S. Tsirkin

diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c
index 490fc78..cd8b672 100644
--- a/drivers/infiniband/hw/mthca/mthca_qp.c
+++ b/drivers/infiniband/hw/mthca/mthca_qp.c
@@ -222,9 +222,8 @@ static void *get_send_wqe(struct mthca_q
 		 (PAGE_SIZE - 1));
 }
 
-static void mthca_wq_init(struct mthca_wq *wq)
+static void mthca_wq_reset(struct mthca_wq *wq)
 {
-	/* mthca_alloc_qp_common() initializes the locks */
 	wq->next_ind = 0;
 	wq->last_comp = wq->max - 1;
 	wq->head = 0;
@@ -845,10 +844,10 @@ int mthca_modify_qp(struct ib_qp *ibqp,
 		mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq), qp->qpn,
 			       qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL);
 
-		mthca_wq_init(&qp->sq);
+		mthca_wq_reset(&qp->sq);
 		qp->sq.last = get_send_wqe(qp, qp->sq.max - 1);
 
-		mthca_wq_init(&qp->rq);
+		mthca_wq_reset(&qp->rq);
 		qp->rq.last = get_recv_wqe(qp, qp->rq.max - 1);
 
 		if (mthca_is_memfree(dev)) {
@@ -1112,9 +1111,9 @@ static int mthca_alloc_qp_common(struct
 	qp->atomic_rd_en = 0;
 	qp->resp_depth = 0;
 	qp->sq_policy = send_policy;
-	mthca_wq_init(&qp->sq);
-	mthca_wq_init(&qp->rq);
-	/* these are initialized separately so lockdep can tell them apart */
+	mthca_wq_reset(&qp->sq);
+	mthca_wq_reset(&qp->rq);
+
 	spin_lock_init(&qp->sq.lock);
 	spin_lock_init(&qp->rq.lock);
 
-- 
MST

From sashak at voltaire.com Mon Jul 10 14:23:40 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Tue, 11 Jul 2006 00:23:40 +0300
Subject: [openib-general] [PATCH] opensm: ibnetdiscover: fix switch info size
Message-ID: <20060710212340.8083.60717.stgit@sashak.voltaire.com>

This fixes overflow of buffer used for switchinfo in ibnetdiscover.

Signed-off-by: Sasha Khapyorsky
---

 diags/src/ibnetdiscover.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/diags/src/ibnetdiscover.c b/diags/src/ibnetdiscover.c
index 021f93f..a71bab7 100644
--- a/diags/src/ibnetdiscover.c
+++ b/diags/src/ibnetdiscover.c
@@ -124,7 +124,7 @@ int
 get_node(Node *node, Port *port, ib_portid_t *portid)
 {
 	char portinfo[64];
-	char switchinfo[32];
+	char switchinfo[64];
 	void *pi = portinfo, *ni = node->nodeinfo, *nd = node->nodedesc;
 	void *si = switchinfo;
 

From zach.brown at oracle.com Mon Jul 10 14:20:19 2006
From: zach.brown at oracle.com (Zach Brown)
Date: Mon, 10 Jul 2006 14:20:19 -0700
Subject: [openib-general] [PATCHv2] IB/mthca: comment fix
In-Reply-To: <20060710211542.GA30898@mellanox.co.il>
References: <20060710111412.GD24705@mellanox.co.il> <20060710151824.GL24705@mellanox.co.il> <20060710211542.GA30898@mellanox.co.il>
Message-ID: <44B2C493.4060007@oracle.com>

Michael S. Tsirkin wrote:
> OK, this is fine with both Arjan van de Ven and Roland Dreier, so -
> Andrew, could you take this into -mm please?

> Signed-off-by: Michael S. Tsirkin

Acked-by: Zach Brown

From halr at voltaire.com Mon Jul 10 14:26:24 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 10 Jul 2006 17:26:24 -0400
Subject: [openib-general] [PATCH] opensm: ibnetdiscover: fix switch info size
In-Reply-To: <20060710212340.8083.60717.stgit@sashak.voltaire.com>
References: <20060710212340.8083.60717.stgit@sashak.voltaire.com>
Message-ID: <1152566783.13048.6371.camel@hal.voltaire.com>

On Mon, 2006-07-10 at 17:23, Sasha Khapyorsky wrote:
> This fixes overflow of buffer used for switchinfo in ibnetdiscover.
>
> Signed-off-by: Sasha Khapyorsky

Thanks. Applied.
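
A minimal sketch of the overflow the switchinfo patch closes, assuming the
usual 64-byte SMP attribute payload. SMP_DATA_SIZE, fake_smp_query() and
buffer_is_safe() are hypothetical stand-ins for illustration only, not the
libibmad API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: an SMP attribute (NodeInfo, PortInfo, SwitchInfo, ...) is
 * always delivered as a full 64-byte data block. */
#define SMP_DATA_SIZE 64

/* Hypothetical stand-in for an SMP query. Like a real query it hands back
 * the whole payload regardless of how many bytes the attribute really uses.
 * Returns the number of bytes written, or 0 if the buffer is too small. */
static size_t fake_smp_query(void *rcvbuf, size_t buflen)
{
	unsigned char payload[SMP_DATA_SIZE];

	assert(rcvbuf != NULL);
	memset(payload, 0xab, sizeof(payload));
	if (buflen < sizeof(payload))
		return 0;	/* an unchecked copy here is the overflow */
	memcpy(rcvbuf, payload, sizeof(payload));
	return sizeof(payload);
}

/* Returns 1 when a receive buffer of this size can hold a full payload. */
static int buffer_is_safe(size_t buflen)
{
	return buflen >= SMP_DATA_SIZE;
}
```

With these helpers a 32-byte buffer is rejected and a 64-byte one is filled
completely, which is the difference between the old and new switchinfo
declarations.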
-- Hal

From halr at voltaire.com Mon Jul 10 14:32:46 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 10 Jul 2006 17:32:46 -0400
Subject: [openib-general] [PATCH] libibmad: Support MFT and Notice/Trap fields
Message-ID: <1152567165.13048.6696.camel@hal.voltaire.com>

libibmad: Support MFT and Notice/Trap fields

Signed-off-by: Sasha Khapyorsky
Signed-off-by: Hal Rosenstock

Index: include/infiniband/mad.h
===================================================================
--- include/infiniband/mad.h (revision 8361)
+++ include/infiniband/mad.h (working copy)
@@ -152,6 +152,12 @@ enum GSI_ATTR_ID {
 	IB_GSI_ATTR_LAST,
 };
 
+enum TRAP_NUM_ID {
+	IB_TRAP_128,
+
+	IB_TRAP_LAST
+};
+
 #define IB_VENDOR_OPENIB_PING_CLASS (IB_VENDOR_RANGE2_START_CLASS + 2)
 #define IB_VENDOR_OPENIB_SYSSTAT_CLASS (IB_VENDOR_RANGE2_START_CLASS + 3)
 #define IB_OPENIB_OUI (0x001405)
@@ -363,11 +369,28 @@ enum MAD_FIELDS {
 	IB_LINEAR_FORW_TBL_F,
 
 	/*
+	 * SwitchMulticastForwardingTable fields:
+	 */
+	IB_MULTICAST_FORW_TBL_F,
+
+	/*
 	 * NodeDescription fields:
 	 */
 	IB_NODE_DESC_F,
 
 	/*
+	 * Notice/Trap fields
+	 */
+	IB_NOTICE_IS_GENERIC_F,
+	IB_NOTICE_TYPE_F,
+	IB_NOTICE_PRODUCER_F,
+	IB_NOTICE_TRAP_NUMBER_F,
+	IB_NOTICE_ISSUER_LID_F,
+	IB_NOTICE_TOGGLE_F,
+	IB_NOTICE_COUNT_F,
+	IB_NOTICE_DATA_LID_F,
+
+	/*
 	 * GS Performance
 	 */
 	IB_PC_FIRST_F,
Index: src/fields.c
===================================================================
--- src/fields.c (revision 8361)
+++ src/fields.c (working copy)
@@ -203,6 +203,23 @@ ib_field_t ib_mad_f [] = {
 	[IB_LINEAR_FORW_TBL_F] {0, 512, "LinearForwTbl", mad_dump_array},
 
 	/*
+	 * SwitchMulticastForwardingTable fields:
+	 */
+	[IB_MULTICAST_FORW_TBL_F] {0, 512, "MulticastForwTbl", mad_dump_array},
+
+	/*
+	 * Notice/Trap fields
+	 */
+	[IB_NOTICE_IS_GENERIC_F] {BITSOFFS(0,1), "NoticeIsGeneric", mad_dump_uint},
+	[IB_NOTICE_TYPE_F] {BITSOFFS(1,7), "NoticeType", mad_dump_uint},
+	[IB_NOTICE_PRODUCER_F] {BITSOFFS(8,24), "NoticeProducerType", mad_dump_node_type},
+	[IB_NOTICE_TRAP_NUMBER_F] {BITSOFFS(32,16), "NoticeTrapNumber", mad_dump_uint},
+	[IB_NOTICE_ISSUER_LID_F] {BITSOFFS(48,16), "NoticeIssuerLID", mad_dump_uint},
+	[IB_NOTICE_TOGGLE_F] {BITSOFFS(64,1), "NoticeToggle", mad_dump_uint},
+	[IB_NOTICE_COUNT_F] {BITSOFFS(65,15), "NoticeCount", mad_dump_uint},
+	[IB_NOTICE_DATA_LID_F] {BITSOFFS(80,16), "NoticeDataLID", mad_dump_uint},
+
+	/*
 	 * NodeDescription fields:
 	 */
 	[IB_NODE_DESC_F] {0, 64*8, "NodeDesc", mad_dump_string},

From weiny2 at llnl.gov Mon Jul 10 14:44:29 2006
From: weiny2 at llnl.gov (Ira Weiny)
Date: Mon, 10 Jul 2006 14:44:29 -0700
Subject: [openib-general] rdma_cm callback event private data length == 0
Message-ID: <20060710144429.0b117007.weiny2@llnl.gov>

We have run into a bug here using Lustre which uses the rdma_cm interface.
When nodes crash and come back up they try to connect to the "servers" and the
"servers" were sending back an IB_CM_REJ_CONSUMER_DEFINED message with a
private data structure of length 7. However, the client would see a length of
0 for the private data.

At first Eric and I thought that the core was sending the REJ message without
private data. However I have found that the message is from the Lustre ULP
and does in fact have the 7 bytes of data in it.

The "problem" with the private_data_len being 0 appears to be in the
cma_ib_handler function. The following is a patch which simply tells the user
the private data length for the REJ message. Lustre, which checks this length,
then happily gets its data. Is this a bug which needs to be fixed for all the
CM messages? Or is it incorrect to look at this length to determine if there
is private data included in the message? Since the ib_cm_event structure does
not have a length for this data I don't know how else to set this value?
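
The symptom described above can be sketched roughly as follows. This is an
illustration under stated assumptions, not the cma_ib_handler code: the event
enum, both functions and the size constants are hypothetical names, and the
byte counts (196 for REP, 148 for REJ) are assumed values for the fixed-size
private data areas of those CM messages.

```c
#include <stddef.h>

/* Hypothetical event codes, loosely modelled on the CM events discussed
 * here; these are not the kernel's enum values. */
enum cm_event { CM_REP_RECEIVED, CM_REJ_RECEIVED, CM_OTHER };

/* Assumed sizes, in bytes, of the fixed private data areas. */
#define REP_PRIVATE_DATA_SIZE 196
#define REJ_PRIVATE_DATA_SIZE 148

/* Before the fix: the REJ case never sets the length, so the consumer is
 * told a reject carries 0 bytes even though the data pointer is valid. */
static size_t private_data_len_buggy(enum cm_event ev)
{
	size_t len = 0;

	switch (ev) {
	case CM_REP_RECEIVED:
		len = REP_PRIVATE_DATA_SIZE;
		break;
	case CM_REJ_RECEIVED:
		break;		/* length left at 0 */
	default:
		break;
	}
	return len;
}

/* After the fix: the REJ case reports the full fixed-size area. The wire
 * format does not say how much of it is meaningful, so the consumer must
 * interpret the contents itself. */
static size_t private_data_len_fixed(enum cm_event ev)
{
	size_t len = 0;

	switch (ev) {
	case CM_REP_RECEIVED:
		len = REP_PRIVATE_DATA_SIZE;
		break;
	case CM_REJ_RECEIVED:
		len = REJ_PRIVATE_DATA_SIZE;
		break;
	default:
		break;
	}
	return len;
}
```

A consumer that gates on the length, as Lustre does, sees nothing in the
first variant and the whole area in the second.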
Thanks in advance,
Ira Weiny
LLNL

Index: openib/infiniband/core/cma.c
===================================================================
--- openib/infiniband/core/cma.c (revision 2508)
+++ openib/infiniband/core/cma.c (working copy)
@@ -814,6 +826,7 @@
 		cma_modify_qp_err(&id_priv->id);
 		status = ib_event->param.rej_rcvd.reason;
 		event = RDMA_CM_EVENT_REJECTED;
+		private_data_len = IB_CM_REJ_PRIVATE_DATA_SIZE;
 		break;
 	default:
 		printk(KERN_ERR "RDMA CMA: unexpected IB CM event: %d",
@@ -821,6 +834,11 @@
 		goto out;
 	}
 
+	if (ib_event->event == IB_CM_REJ_RECEIVED)
+	{
+		printk(KERN_CRIT "REJECT (private_data_len = %d)\n",
+		       private_data_len);
+	}
 	ret = cma_notify_user(id_priv, event, status, ib_event->private_data,
 			      private_data_len);
 	if (ret) {

From mshefty at ichips.intel.com Mon Jul 10 14:50:01 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 10 Jul 2006 14:50:01 -0700
Subject: [openib-general] [PATCH] libibmad: Support MFT and Notice/Trap fields
In-Reply-To: <1152567165.13048.6696.camel@hal.voltaire.com>
References: <1152567165.13048.6696.camel@hal.voltaire.com>
Message-ID: <44B2CB89.7020307@ichips.intel.com>

Hal Rosenstock wrote:
> +enum TRAP_NUM_ID {
> + IB_TRAP_128,
> +
> + IB_TRAP_LAST
> +};

Should TRAP_128 be defined as 0?

- Sean

From mshefty at ichips.intel.com Mon Jul 10 15:03:26 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 10 Jul 2006 15:03:26 -0700
Subject: [openib-general] rdma_cm callback event private data length == 0
In-Reply-To: <20060710144429.0b117007.weiny2@llnl.gov>
References: <20060710144429.0b117007.weiny2@llnl.gov>
Message-ID: <44B2CEAE.1070902@ichips.intel.com>

Ira Weiny wrote:
> The "problem" with the private_data_len being 0 appears to be in the
> cma_ib_handler function. The following is a patch which simply tells the user
> the private data length for the REJ message. Lustre, which checks this length,
> then happily gets its data. Is this a bug which needs to be fixed for all the
> CM messages? Or is it incorrect to look at this length to determine if there
> is private data included in the message? Since the ib_cm_event structure does
> not have a length for this data I don't know how else to set this value?

This is a bug. The private_data_len should have been set as done in your patch.

(The IB CM protocol does not indicate if any of the private data is valid, so
must be passed through to the consumer as if the entire data is valid.)

I've committed a fix in svn 8483 based on your patch (minus the if-statement).

Thanks,
- Sean

From mst at mellanox.co.il Mon Jul 10 15:03:09 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 11 Jul 2006 01:03:09 +0300
Subject: [openib-general] rdma_cm callback event private data length == 0
In-Reply-To: <20060710144429.0b117007.weiny2@llnl.gov>
References: <20060710144429.0b117007.weiny2@llnl.gov>
Message-ID: <20060710220309.GB30898@mellanox.co.il>

Hi!

> @@ -821,6 +834,11 @@
> goto out;
> }
>
> + if (ib_event->event == IB_CM_REJ_RECEIVED)
> + {
> + printk(KERN_CRIT "REJECT (private_data_len = %d)\n",
> + private_data_len);
> + }

Not sure why is this KERN_CRIT?
Also, pls take a look at Documentation/CodingStyle, Chapter 3: Placing Braces

For more detail http://www.mellanox.com/mst/boring.txt

-- 
MST

From pradeep at us.ibm.com Mon Jul 10 15:06:47 2006
From: pradeep at us.ibm.com (Pradeep Satyanarayana)
Date: Mon, 10 Jul 2006 15:06:47 -0700
Subject: [openib-general] Minor edits to installation cheat sheet needed
Message-ID:

I was experimenting with udev and realized that there some minor errors in the
OpenIB Wiki Installation Cheat Sheet. This causes the udev rules to fail and the
devices are not created as expected.

In the section "Adding udev rules" changes akin to the following need to be made-

from:
KERNEL="uverbs*", NAME="infiniband/%k"
KERNEL="ucm*", NAME="infiniband/%k", MODE="0666"
KERNEL="rdma_cm", NAME="infiniband/%k", MODE="0666"

to:

KERNEL=="uverbs*", NAME="infiniband/%k"
KERNEL=="ucm*", NAME="infiniband/%k", MODE="0666"
KERNEL=="rdma_cm", NAME="infiniband/%k", MODE="0666"

Pradeep
pradeep at us.ibm.com

From ravinandan.arakali at neterion.com Mon Jul 10 15:18:05 2006
From: ravinandan.arakali at neterion.com (Ravinandan Arakali)
Date: Mon, 10 Jul 2006 15:18:05 -0700
Subject: [openib-general] ping problem with ammasso cards(iWARPinterface)
In-Reply-To: <1152564003.6377.4.camel@stevo-desktop>
Message-ID: <002301c6a46e$b80b3d60$4710100a@pc.s2io.com>

I regenerated librdma after removing config.h and config.h.in but still
see the rping crash.

Ravi

-----Original Message-----
From: Steve Wise [mailto:swise at opengridcomputing.com]
Sent: Monday, July 10, 2006 1:40 PM
To: Ravinandan Arakali
Cc: bpradip at in.ibm.com; openib-general at openib.org
Subject: Re: [openib-general] ping problem with ammasso
cards(iWARPinterface)

Hey guys,

I lost this thread for a day due to a mailer problem, but I just noticed
something screwy with the iwarp branch. I accidentally checked in
config.h and config.h.in for librdma.

I'm wondering if that's causing problems?

maybe nuke those files, redo autogen/configure/make/make install and see
if that helps?

Steve.

From mshefty at ichips.intel.com Mon Jul 10 15:20:48 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 10 Jul 2006 15:20:48 -0700
Subject: [openib-general] rdma_cm callback event private data length == 0
In-Reply-To: <20060710220309.GB30898@mellanox.co.il>
References: <20060710144429.0b117007.weiny2@llnl.gov> <20060710220309.GB30898@mellanox.co.il>
Message-ID: <44B2D2C0.4040701@ichips.intel.com>

Michael S. Tsirkin wrote:
>>+ if (ib_event->event == IB_CM_REJ_RECEIVED)
>>+ {
>>+ printk(KERN_CRIT "REJECT (private_data_len = %d)\n",
>>+ private_data_len);
>>+ }
>
>
> Not sure why is this KERN_CRIT?
> Also, pls take a look at Documentation/CodingStyle, Chapter 3: Placing Braces
>
> For more detail http://www.mellanox.com/mst/boring.txt

To be clear, I didn't commit this portion of the patch to svn.

- Sean

From mst at mellanox.co.il Mon Jul 10 15:19:02 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 11 Jul 2006 01:19:02 +0300
Subject: [openib-general] infiniband patch series (was Re: ipath patch series a-comin', but no IB maintainer to shepherd them)
Message-ID: <20060710221902.GB32328@mellanox.co.il>

Quoting r. Michael S. Tsirkin :
> Yes, -mm seems like a good way to get more review.

Andrew, am I using the right format to send things upstream to you?
There's really a set of independent patches, so it didn't make sense to me
to batch them up in a series. OK?

Maybe the addition of the git tree (below) serves to clarify things.

> Further, in the hope that this will help keep things reasonably stable till
> Roland comes back, and help everyone see what's being merged, I have
> created a git branch for all things infiniband going into 2.6.18.
>
> You can get at it here:
> git://www.mellanox.co.il/~git/infiniband mst-for-2.6.18

BTW, all outstanding infiniband patches intended for upstream are currently
there.

Here's the list:

Jack Morgenstein:
      IB/mthca: fix mthca_ah_query static rate format

Michael S. Tsirkin:
      IB/cm: drop REQ when out of memory
      IB/mthca: comment fix

Sean Hefty:
      IB/addr: gid structure alignment fix

Vu Pham:
      IB/srp: fix fmr error handling

-- 
MST

From swise at opengridcomputing.com Mon Jul 10 15:21:24 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 10 Jul 2006 17:21:24 -0500
Subject: [openib-general] ping problem with ammasso cards(iWARPinterface)
In-Reply-To: <002301c6a46e$b80b3d60$4710100a@pc.s2io.com>
References: <002301c6a46e$b80b3d60$4710100a@pc.s2io.com>
Message-ID: <1152570084.10833.1.camel@stevo-desktop>

Did you regenerate libamso too?
I had even worse garbage in libamso (Makefile, for instance, was checked in). :-(

On Mon, 2006-07-10 at 15:18 -0700, Ravinandan Arakali wrote:
> I regenerated librdma after removing config.h and config.h.in but still
> see the rping crash.
>
> Ravi

From mst at mellanox.co.il Mon Jul 10 15:23:59 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 11 Jul 2006 01:23:59 +0300
Subject: [openib-general] rdma_cm callback event private data length == 0
In-Reply-To: <44B2CEAE.1070902@ichips.intel.com>
References: <44B2CEAE.1070902@ichips.intel.com>
Message-ID: <20060710222359.GC32328@mellanox.co.il>

Quoting r. Sean Hefty :
> This is a bug. The private_data_len should have been set as done in your
> patch.
>
> (The IB CM protocol does not indicate if any of the private data is valid, so
> must be passed through to the consumer as if the entire data is valid.)
>
> I've committed a fix in svn 8483 based on your patch (minus the if-statement).
>
> Thanks,
> - Sean

I guess we need this upstream, don't we?

-- 
MST

From mshefty at ichips.intel.com Mon Jul 10 15:27:19 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 10 Jul 2006 15:27:19 -0700
Subject: [openib-general] rdma_cm callback event private data length == 0
In-Reply-To: <20060710222359.GC32328@mellanox.co.il>
References: <44B2CEAE.1070902@ichips.intel.com> <20060710222359.GC32328@mellanox.co.il>
Message-ID: <44B2D447.9050905@ichips.intel.com>

Michael S. Tsirkin wrote:
> I guess we need this upstream, don't we?

Yes - I made a note to forward out a patch, but if you just pull it from svn
8483 to merge into your git tree, that would work.

- Sean

From mst at mellanox.co.il Mon Jul 10 15:26:38 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 11 Jul 2006 01:26:38 +0300
Subject: [openib-general] Minor edits to installation cheat sheet needed
In-Reply-To:
References:
Message-ID: <20060710222638.GD32328@mellanox.co.il>

Quoting r. Pradeep Satyanarayana :
> Subject: Minor edits to installation cheat sheet needed
>
>
> I was experimenting with udev and realized that there some minor errors in the OpenIB Wiki Installation Cheat Sheet. This causes the udev rules to fail and the
> devices are not created as expected.
>
> In the section "Adding udev rules" changes akin to the following need to be made-
>
> from:
> KERNEL="uverbs*", NAME="infiniband/%k"
> KERNEL="ucm*", NAME="infiniband/%k", MODE="0666"
> KERNEL="rdma_cm", NAME="infiniband/%k", MODE="0666"
>
> to:
>
> KERNEL=="uverbs*", NAME="infiniband/%k"
> KERNEL=="ucm*", NAME="infiniband/%k", MODE="0666"
> KERNEL=="rdma_cm", NAME="infiniband/%k", MODE="0666"

This works for me as documented in wiki (with a single =).
Might this depend on udev version?
-- MST From mst at mellanox.co.il Mon Jul 10 15:43:42 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 11 Jul 2006 01:43:42 +0300 Subject: [openib-general] [PATCH] IB/cm: set private data length for reject messages Message-ID: <20060710224342.GA496@mellanox.co.il> Hi Andrew, Here's another infiniband patch that needs to go upstream. --- From: "Ira Weiny" Set private data length for reject messages to the correct size. Fix from openib svn r8483. Signed-off-by: Sean Hefty Signed-off-by: Michael S. Tsirkin Index: openib/drivers/infiniband/core/cma.c =================================================================== --- openib/drivers/infiniband/core/cma.c (revision 8482) +++ openib/drivers/infiniband/core/cma.c (revision 8483) @@ -906,6 +906,7 @@ static int cma_ib_handler(struct ib_cm_i cma_modify_qp_err(&id_priv->id); status = ib_event->param.rej_rcvd.reason; event = RDMA_CM_EVENT_REJECTED; + private_data_len = IB_CM_REJ_PRIVATE_DATA_SIZE; break; default: printk(KERN_ERR "RDMA CMA: unexpected IB CM event: %d", -- MST From mst at mellanox.co.il Mon Jul 10 15:10:11 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 11 Jul 2006 01:10:11 +0300 Subject: [openib-general] [PATCH fixed] srp: fix fmr error handling In-Reply-To: <20060710193814.GA30521@mellanox.co.il> References: <20060710193814.GA30521@mellanox.co.il> Message-ID: <20060710221011.GA32328@mellanox.co.il> Sorry, the last patch I sent was corrupted, here's an updated version: --- commit 7289361d5f81463e4abb27334773750279547e8a Author: Vu Pham Date: Mon Jul 10 22:38:14 2006 +0300 [PATCH] srp: fix fmr error handling srp_unmap_data assumes req->fmr is NULL if the request is not mapped, so we must clean it out in case of an error. Signed-off-by: Vu Pham Signed-off-by: Michael S. 
Tsirkin Acked-by: Roland Dreier diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 4e22afe..6191180 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -618,6 +618,7 @@ static int srp_map_fmr(struct srp_device dma_pages, page_cnt, &io_addr); if (IS_ERR(req->fmr)) { ret = PTR_ERR(req->fmr); + req->fmr = NULL; goto out; } -- MST From pradeep at us.ibm.com Mon Jul 10 15:51:21 2006 From: pradeep at us.ibm.com (Pradeep Satyanarayana) Date: Mon, 10 Jul 2006 15:51:21 -0700 Subject: [openib-general] Minor edits to installation cheat sheet needed In-Reply-To: <20060710222638.GD32328@mellanox.co.il> Message-ID: "Michael S. Tsirkin" wrote on 07/10/2006 03:26:38 PM: > Quoting r. Pradeep Satyanarayana : > > Subject: Minor edits to installation cheat sheet needed > > > > > > I was experimenting with udev and realized that there some minor > errors in the OpenIB Wiki Installation Cheat Sheet. This causes the > udev rules to fail and the > > devices are not created as expected. > > > > In the section "Adding udev rules" changes akin to the following > need to be made- > > > > from: > > KERNEL="uverbs*", NAME="infiniband/%k" > > KERNEL="ucm*", NAME="infiniband/%k", MODE="0666" > > KERNEL="rdma_cm", NAME="infiniband/%k", MODE="0666" > > > > to: > > > > KERNEL=="uverbs*", NAME="infiniband/%k" > > KERNEL=="ucm*", NAME="infiniband/%k", MODE="0666" > > KERNEL=="rdma_cm", NAME="infiniband/%k", MODE="0666" > > This works for me as documented in wiki (with a single =). > Might this depend on udev version? Hmmm....quite possibly. Here is what I have : udevinfo -V udevinfo, version 094 > > -- > MST Pradeep pradeep at us.ibm.com From mst at mellanox.co.il Mon Jul 10 15:54:51 2006 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Tue, 11 Jul 2006 01:54:51 +0300 Subject: [openib-general] Minor edits to installation cheat sheet needed In-Reply-To: References: Message-ID: <20060710225451.GA923@mellanox.co.il> Quoting r. Pradeep Satyanarayana : > > This works for me as documented in wiki (with a single =). > > Might this depend on udev version? > > Hmmm....quite possibly. Here is what I have : > udevinfo -V > udevinfo, version 094 Which distro? Here's e.g. RHAS4 U3: # udevinfo -V udevinfo, version 039 well, I guess we need both formats in wiki. go ahead and add yours as well. -- MST From weiny2 at llnl.gov Mon Jul 10 16:10:45 2006 From: weiny2 at llnl.gov (Ira Weiny) Date: Mon, 10 Jul 2006 16:10:45 -0700 Subject: [openib-general] rdma_cm callback event private data length == 0 In-Reply-To: <44B2D447.9050905@ichips.intel.com> References: <44B2CEAE.1070902@ichips.intel.com> <20060710222359.GC32328@mellanox.co.il> <44B2D447.9050905@ichips.intel.com> Message-ID: <20060710161045.363d0f17.weiny2@llnl.gov> Do the other events need this as well? ie REQ, REP, RTU, DREQ, DREP? Ira On Mon, 10 Jul 2006 15:27:19 -0700 Sean Hefty wrote: > Michael S. Tsirkin wrote: > > I guess we need this upstream, don't we? > > Yes - I made a note to forward out a patch, but if you just pull it > from svn 8483 to merge into your git tree, that would work. > > - Sean From mshefty at ichips.intel.com Mon Jul 10 16:16:11 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 10 Jul 2006 16:16:11 -0700 Subject: [openib-general] rdma_cm callback event private data length == 0 In-Reply-To: <20060710161045.363d0f17.weiny2@llnl.gov> References: <44B2CEAE.1070902@ichips.intel.com> <20060710222359.GC32328@mellanox.co.il> <44B2D447.9050905@ichips.intel.com> <20060710161045.363d0f17.weiny2@llnl.gov> Message-ID: <44B2DFBB.7000901@ichips.intel.com> Ira Weiny wrote: > Do the other events need this as well? > > ie REQ, REP, RTU, DREQ, DREP? The REQ and REP set the private data size already. 
Private data is not reported to the user through the RDMA CM for the other events. As a general rule, an application cannot be guaranteed to receive private data except in REQ and REP messages when a connection succeeds. - Sean From mshefty at ichips.intel.com Mon Jul 10 16:22:17 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 10 Jul 2006 16:22:17 -0700 Subject: [openib-general] [ucm] device file of the ucm is not being created In-Reply-To: <200607091516.21239.dotanb@mellanox.co.il> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30271BBB0@mtlexch01.mtl.com> <44AD44DB.4070201@ichips.intel.com> <200607091516.21239.dotanb@mellanox.co.il> Message-ID: <44B2E129.2000505@ichips.intel.com> Dotan Barak wrote: > On Thursday 06 July 2006 20:14, Sean Hefty wrote: > >>Dotan Barak wrote: >> >>>KERNEL="ucma", NAME="infiniband/%k", MODE="0666" >>>KERNEL="rdma_cm", NAME="infiniband/%k", MODE="0666" >>>do you know that is the problem? >> >>The ucma should be in /sys/class/misc/rdma_cm. To clarify, I believe that the rdma_cm uses different device files depending on whether you're using the backport patches. Older kernels do not have the misc device, so the backport patches relocate the file under the infiniband directory. > =======================+ Client Output +==================== > cmatose: starting client > cmatose: connecting > cmatose: event: 8, error: 8 This looks like an error trying to connect to a server before it has started running. - Sean From ravinandan.arakali at neterion.com Mon Jul 10 16:23:38 2006 From: ravinandan.arakali at neterion.com (Ravinandan Arakali) Date: Mon, 10 Jul 2006 16:23:38 -0700 Subject: [openib-general] ping problem with ammassocards(iWARPinterface) In-Reply-To: <1152570084.10833.1.camel@stevo-desktop> Message-ID: <002801c6a477$e024cf10$4710100a@pc.s2io.com> Yes, I did, and still see the crash. 
-----Original Message----- From: Steve Wise [mailto:swise at opengridcomputing.com] Sent: Monday, July 10, 2006 3:21 PM To: ravinandan.arakali at neterion.com Cc: bpradip at in.ibm.com; openib-general at openib.org Subject: RE: [openib-general] ping problem with ammassocards(iWARPinterface) Did you regenerate libamso too? I had even worse garbage in libamso (Makefile, for instance, was checked in). :-( On Mon, 2006-07-10 at 15:18 -0700, Ravinandan Arakali wrote: > I regenerated librdma after removing config.h and config.h.in but still > see the rping crash. > > Ravi > > -----Original Message----- > From: Steve Wise [mailto:swise at opengridcomputing.com] > Sent: Monday, July 10, 2006 1:40 PM > To: Ravinandan Arakali > Cc: bpradip at in.ibm.com; openib-general at openib.org > Subject: Re: [openib-general] ping problem with ammasso > cards(iWARPinterface) > > > Hey guys, > > I lost this thread for a day due to a mailer problem, but I just noticed > something screwy with the iwarp branch. I accidentally checked in > config.h and config.h.in for librdma. > > I'm wondering if that's causing problems? > > maybe nuke those files, redo autogen/configure/make/make install and see > if that helps? > > > Steve. > > > > On Fri, 2006-07-07 at 15:21 -0700, Ravinandan Arakali wrote: > > Pradipta, > > Following is the output from gdb after core dump. I have also copy-pasted > > the gdb output on client system. > > > > Attached is the dmesg output when krping test is run in verbose mode. > > The ping data on the sender(client) seems okay. The content is shifted > > forward by one character for each packet. On receiver, after receiving > ping > > pkt 9, it seems to jump to pkt no. 1935. Not sure if it's because messages > > can be lost during writing to /var/log/messages ? 
> > > > ----------------------------------------- > > (gdb) run -s -vV -C100 -d -a 0.0.0.0 -p 9999 > > Starting program: > > /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -s -vV -C100 -d -a > > 0.0.0.0 -p 9999 > > [Thread debugging using libthread_db enabled] > > [New Thread -1210054992 (LWP 3668)] > > ipaddr (0.0.0.0) > > port 9999 > > created cm_id 0x804e6e0 > > [New Thread -1210057824 (LWP 3671)] > > rdma_bind_addr successful > > rdma_listen > > cma_event type 4 cma_id 0x804e968 (child) > > child cma 0x804e968 > > > > Program received signal SIGSEGV, Segmentation fault. > > [Switching to Thread -1210054992 (LWP 3668)] > > rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 > > 514 cb->pd = ibv_alloc_pd(cm_id->verbs); > > (gdb) bt > > #0 rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 > > #1 0x0804a716 in main (argc=9, argv=Cannot access memory at address 0x6 > > ) at examples/rping.c:767 > > (gdb) > > > > --------------------------------- > > (gdb) run -c -vV -C100 -d -a 17.2.2.102 -p 9999 > > Starting program: > > /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -c -vV -C100 -d -a > > 17.2.2.102 -p 9999 > > [Thread debugging using libthread_db enabled] > > [New Thread 47388824908032 (LWP 4620)] > > ipaddr (17.2.2.102) > > port 9999 > > created cm_id 0x506b00 > > [New Thread 1082132800 (LWP 4623)] > > cma_event type 0 cma_id 0x506b00 (parent) > > cma_event type 2 cma_id 0x506b00 (parent) > > rdma_resolve_addr - rdma_resolve_route successful > > created pd 0x506e60 > > created channel 0x506e80 > > created cq 0x506ea0 > > created qp 0x506f40 > > rping_setup_buffers called on cb 0x505010 > > allocated & registered buffers... > > [New Thread 1090525504 (LWP 4624)] > > cq_thread started. 
> > > > > > -----Original Message----- > > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] > > Sent: Friday, July 07, 2006 1:29 PM > > To: Ravinandan Arakali > > Cc: bpradip at in.ibm.com; openib-general at openib.org > > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP > > interface) > > > > > > Ravinandan Arakali wrote: > > > Attached are the outputs when rping and krping are run. > > > The dmesg output when krping is run is quite verbose. > > > Hopefully, you can throw some light by looking at it. > > > The rping server core dumped as soon as the client was run. > > > > Ravi, > > Use the following for krping. You will be able to see the ping data > using > > dmesg. > > > > echo "server,port=9999,addr=17.2.2.102,validate,verbose" > /proc/krping > > echo "client,port=9999,addr=17.2.2.102,validate,verbose" > /proc/krping > > > > As for the rping, can u run rping under gdb (gdb > > librdmacm/example/.libs/rping) > > and send the backtrace on segfault. > > > > Thanks, > > Pradipta > > > > > > Thanks, > > > Ravi > > > > > > > > >> To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general From bugzilla-daemon at openib.org Mon Jul 10 16:33:01 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Mon, 10 Jul 2006 16:33:01 -0700 (PDT) Subject: [openib-general] [Bug 159] OFED1.0: Missing interfaces Message-ID: <20060710233301.0FCD0228742@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=159 venkatesh.babu at 3leafnetworks.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |REOPENED Resolution|WONTFIX | ------- Comment #3 from venkatesh.babu at 
3leafnetworks.com 2006-07-10 16:33 ------- I am reopening the bug because there was no explanation for the missing tsIbInServiceNoticeHandler() interface according to comment #2. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at openib.org Mon Jul 10 16:55:32 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Mon, 10 Jul 2006 16:55:32 -0700 (PDT) Subject: [openib-general] [Bug 172] New: Need an interface to load alternate path to RC QP Message-ID: <20060710235532.7F0FF228742@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=172 Summary: Need an interface to load alternate path to RC QP Product: OpenFabrics Linux Version: gen2 Platform: All OS/Version: All Status: NEW Severity: critical Priority: P2 Component: IB Core AssignedTo: bugzilla at openib.org ReportedBy: venkatesh.babu at 3leafnetworks.com An RC QP can initially be set up with primary and alternate paths, but after a failover, ib_modify_qp() must be used to load a new alternate path. This interface requires the address handle (struct ib_ah_attr) of the alternate path. The user needs to get an address handle for the alternate path record (struct ib_sa_path_rec). Only the CM's internal function cm_init_av_by_path() can do this and also initialize the alt_av field of its internal struct cm_id_private. So a new interface needs to be defined to get this functionality. I have implemented this interface similar to ib_cm_init_qp_attr() and tested it. I can simulate the failover back and forth between the two ports a couple of times.
*** cm.c 2006-09-30 15:52:44.000000000 -0700 --- /root/src/OFED-1.0/SOURCES/openib-1.0.orig/src/linux-kernel/infiniband/core/cm.c 2006-05-15 10:07:01.000000000 -0700 *************** *** 3241,3286 **** } EXPORT_SYMBOL(ib_cm_init_qp_attr); - int ib_cm_init_rearm_attr(struct ib_cm_id *cm_id, - struct ib_qp_attr *qp_attr, - int *qp_attr_mask, - struct ib_sa_path_rec *alternate_path) - { - struct cm_id_private *cm_id_priv; - unsigned long flags; - int ret; - - cm_id_priv = container_of(cm_id, struct cm_id_private, id); - - if (alternate_path) { - ret = cm_init_av_by_path(alternate_path, &cm_id_priv->alt_av); - if (ret) - return ret; - } - - spin_lock_irqsave(&cm_id_priv->lock, flags); - switch (cm_id_priv->id.state) { - case IB_CM_ESTABLISHED: - if (cm_id_priv->alt_av.ah_attr.dlid) { - qp_attr->alt_port_num = - cm_id_priv->alt_av.port->port_num; - qp_attr->alt_ah_attr = cm_id_priv->alt_av.ah_attr; - qp_attr->path_mig_state = IB_MIG_REARM; - - *qp_attr_mask = IB_QP_ALT_PATH | IB_QP_PATH_MIG_STATE; - } - ret = 0; - break; - default: - ret = -EINVAL; - break; - } - spin_unlock_irqrestore(&cm_id_priv->lock, flags); - - return ret; - } - EXPORT_SYMBOL(ib_cm_init_rearm_attr); - static void cm_add_one(struct ib_device *device) { struct cm_device *cm_dev; --- 3239,3244 ---- *** ib_cm.h 2006-09-30 16:05:05.000000000 -0700 --- /root/src/OFED-1.0/SOURCES/openib-1.0.orig/src/linux-kernel/infiniband/include/rdma/ib_cm.h 2006-04-10 10:50:19.000000000 -0700 *************** *** 521,547 **** int *qp_attr_mask); /** - * ib_cm_init_rearm_attr - Initializes the QP attributes for use in - * transitioning to rearm state. - * @cm_id: Communication identifier associated with the QP attributes to - * initialize. - * @qp_attr: On input, specifies the desired QP state. On output, the - * mandatory and desired optional attributes will be set in order to - * modify the QP to the specified state. 
- * @qp_attr_mask: The QP attribute mask that may be used to transition the - * QP to the specified state. - * @alternate_path: Rearm the QP with this alternate path. - * - * This call will set all required attributes for the given transition, along - * with known optional attributes. Users may override the attributes returned- * from this call before calling ib_modify_qp. - */ - int ib_cm_init_rearm_attr(struct ib_cm_id *cm_id, - struct ib_qp_attr *qp_attr, - int *qp_attr_mask, - struct ib_sa_path_rec *alternate_path); - - /** * ib_send_cm_apr - Sends an alternate path response message in response to * a load alternate path request. * @cm_id: Connection identifier associated with the alternate path response. --- 521,526 ---- ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at openib.org Mon Jul 10 17:10:06 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Mon, 10 Jul 2006 17:10:06 -0700 (PDT) Subject: [openib-general] [Bug 172] Need an interface to load alternate path to RC QP Message-ID: <20060711001006.95FE8228742@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=172 ------- Comment #1 from sean.hefty at intel.com 2006-07-10 17:10 ------- ib_sa.h contains a call ib_init_ah_from_path() that will initialize address handle attributes based on a path record. The alternate path information in the IB CM should probably change based on a call to ib_send_cm_lap(), which loads a new alternate path. Alternate path support in the IB CM was not tested, so likely has missing features needed to make it work correctly. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
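Archive note on Bug 172: the decision logic of the proposed ib_cm_init_rearm_attr() can be tried outside the kernel. What follows is only a minimal userspace sketch, not kernel code. Every type, constant and function prefixed model_ or MODEL_ is a hypothetical stand-in invented for illustration; the real definitions live in the kernel's rdma headers. It mirrors two things from the patch above: only an established connection may be rearmed, and the ALT_PATH and PATH_MIG_STATE mask bits are set only when the alternate path has a non-zero DLID. It assumes the alternate path is passed in directly, whereas the patch stores it in the CM's private alt_av.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's CM and QP definitions. */
enum model_cm_state  { MODEL_CM_IDLE, MODEL_CM_ESTABLISHED };
enum model_mig_state { MODEL_MIG_MIGRATED, MODEL_MIG_REARM, MODEL_MIG_ARMED };

enum {
	MODEL_QP_ALT_PATH       = 1 << 0,
	MODEL_QP_PATH_MIG_STATE = 1 << 1,
};

/* Reduced view of a resolved alternate path. */
struct model_path {
	uint16_t dlid;
	uint8_t  port_num;
};

/* Reduced view of the QP attributes touched by a rearm. */
struct model_qp_attr {
	uint16_t             alt_dlid;
	uint8_t              alt_port_num;
	enum model_mig_state path_mig_state;
};

/*
 * Fill in the attributes and mask needed to rearm path migration.
 * Returns 0 on success and -EINVAL if the connection is not established.
 * Unlike the patch, the mask is cleared up front so that a caller never
 * sees a stale value when no alternate path is available.
 */
int model_init_rearm_attr(enum model_cm_state state,
			  const struct model_path *alt,
			  struct model_qp_attr *attr, int *mask)
{
	assert(attr != NULL && mask != NULL);

	*mask = 0;
	if (state != MODEL_CM_ESTABLISHED)
		return -EINVAL;

	/* A zero DLID is treated as "no alternate path loaded". */
	if (alt != NULL && alt->dlid != 0) {
		attr->alt_dlid = alt->dlid;
		attr->alt_port_num = alt->port_num;
		attr->path_mig_state = MODEL_MIG_REARM;
		*mask = MODEL_QP_ALT_PATH | MODEL_QP_PATH_MIG_STATE;
	}
	return 0;
}
```

In the real code the caller would pass the resulting attributes and mask on to ib_modify_qp(); whether the alternate address vector should instead come from ib_init_ah_from_path() or be updated by ib_send_cm_lap(), as comment #1 suggests, is left open here.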
From sean.hefty at intel.com Mon Jul 10 17:24:55 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 10 Jul 2006 17:24:55 -0700 Subject: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1 Message-ID: <000301c6a480$6f7cb210$ff0da8c0@amr.corp.intel.com> Enhance validation of MADs sent by userspace clients for spec compliance with C13-18.1.1 (duplicate requests / responses). Also verify that RMPP MADs are data only, to avoid a userspace app causing a kernel crash by sending non-data MADs. Signed-off-by: Sean Hefty --- NOTE: This changes a check in ib_umad_write(), which avoided setting the TID for SEND MADs. This patch is similar to that provided by Jack Mogenstein. See: http://openib.org/pipermail/openib-general/2006-May/021975.html The difference is that this patch is limited to checking MADs issued from userspace only, and avoids accessing MAD data buffers while they are mapped. Index: include/rdma/ib_mad.h =================================================================== --- include/rdma/ib_mad.h (revision 8484) +++ include/rdma/ib_mad.h (working copy) @@ -75,6 +75,7 @@ #define IB_MGMT_METHOD_TRAP_REPRESS 0x07 #define IB_MGMT_METHOD_RESP 0x80 +#define IB_BM_ATTR_MOD_RESP cpu_to_be32(1) #define IB_MGMT_MAX_METHODS 128 @@ -247,6 +248,18 @@ struct ib_mad_send_buf { }; /** + * ib_response_mad - Returns if the specified MAD has been generated in + * response to a sent request or trap. + */ +static inline int ib_response_mad(struct ib_mad *mad) +{ + return ((mad->mad_hdr.method & IB_MGMT_METHOD_RESP) || + (mad->mad_hdr.method == IB_MGMT_METHOD_TRAP_REPRESS) || + ((mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_BM) && + (mad->mad_hdr.attr_mod & IB_BM_ATTR_MOD_RESP))); +} + +/** * ib_get_rmpp_resptime - Returns the RMPP response time. * @rmpp_hdr: An RMPP header. 
*/ Index: core/user_mad.c =================================================================== --- core/user_mad.c (revision 8484) +++ core/user_mad.c (working copy) @@ -112,8 +112,10 @@ struct ib_umad_device { struct ib_umad_file { struct ib_umad_port *port; struct list_head recv_list; + struct list_head send_list; struct list_head port_list; spinlock_t recv_lock; + spinlock_t send_lock; wait_queue_head_t recv_wait; struct ib_mad_agent *agent[IB_UMAD_MAX_AGENTS]; int agents_dead; @@ -177,12 +179,21 @@ static int queue_packet(struct ib_umad_f return ret; } +static void dequeue_send(struct ib_umad_file *file, + struct ib_umad_packet *packet) + { + spin_lock_irq(&file->send_lock); + list_del(&packet->list); + spin_unlock_irq(&file->send_lock); + } + static void send_handler(struct ib_mad_agent *agent, struct ib_mad_send_wc *send_wc) { struct ib_umad_file *file = agent->context; struct ib_umad_packet *packet = send_wc->send_buf->context[0]; + dequeue_send(file, packet); ib_destroy_ah(packet->msg->ah); ib_free_send_mad(packet->msg); @@ -370,6 +381,51 @@ static int copy_rmpp_mad(struct ib_mad_s return 0; } +static int same_destination(struct ib_user_mad_hdr *hdr1, + struct ib_user_mad_hdr *hdr2) +{ + if (!hdr1->grh_present && !hdr2->grh_present) + return (hdr1->lid == hdr2->lid); + + if (hdr1->grh_present && hdr2->grh_present) + return memcmp(hdr1->gid, hdr2->gid, 16); + + return 0; +} + +static int is_duplicate(struct ib_umad_file *file, + struct ib_umad_packet *packet) +{ + struct ib_umad_packet *sent_packet; + struct ib_mad_hdr *sent_hdr, *hdr; + + hdr = (struct ib_mad_hdr *) packet->mad.data; + list_for_each_entry(sent_packet, &file->send_list, list) { + sent_hdr = (struct ib_mad_hdr *) sent_packet->mad.data; + + if ((hdr->tid != sent_hdr->tid) || + (hdr->mgmt_class != sent_hdr->mgmt_class)) + continue; + + /* + * No need to be overly clever here. If two new operations have + * the same TID, reject the second as a duplicate. 
This is more + * restrictive than required by the spec. + */ + if (!ib_response_mad((struct ib_mad *) hdr)) { + if (!ib_response_mad((struct ib_mad *) sent_hdr)) + return 1; + continue; + } else if (!ib_response_mad((struct ib_mad *) sent_hdr)) + continue; + + if (same_destination(&packet->mad.hdr, &sent_packet->mad.hdr)) + return 1; + } + + return 0; +} + static ssize_t ib_umad_write(struct file *filp, const char __user *buf, size_t count, loff_t *pos) { @@ -379,7 +435,6 @@ static ssize_t ib_umad_write(struct file struct ib_ah_attr ah_attr; struct ib_ah *ah; struct ib_rmpp_mad *rmpp_mad; - u8 method; __be64 *tid; int ret, data_len, hdr_len, copy_offset, rmpp_active; @@ -438,6 +493,11 @@ static ssize_t ib_umad_write(struct file copy_offset = IB_MGMT_RMPP_HDR; rmpp_active = ib_get_rmpp_flags(&rmpp_mad->rmpp_hdr) & IB_MGMT_RMPP_FLAG_ACTIVE; + if (rmpp_active && + rmpp_mad->rmpp_hdr.rmpp_type != IB_MGMT_RMPP_TYPE_DATA) { + ret = -EINVAL; + goto err_ah; + } } data_len = count - sizeof (struct ib_user_mad) - hdr_len; @@ -473,28 +533,36 @@ static ssize_t ib_umad_write(struct file } /* - * If userspace is generating a request that will generate a - * response, we need to make sure the high-order part of the - * transaction ID matches the agent being used to send the - * MAD. + * Set the high-order part of the transaction ID to make MADs from + * different agents unique, and allow routing responses back to the + * original requestor. 
*/ - method = ((struct ib_mad_hdr *) packet->msg->mad)->method; - - if (!(method & IB_MGMT_METHOD_RESP) && - method != IB_MGMT_METHOD_TRAP_REPRESS && - method != IB_MGMT_METHOD_SEND) { + if (!ib_response_mad(packet->msg->mad)) { tid = &((struct ib_mad_hdr *) packet->msg->mad)->tid; *tid = cpu_to_be64(((u64) agent->hi_tid) << 32 | (be64_to_cpup(tid) & 0xffffffff)); + rmpp_mad->mad_hdr.tid = *tid; + } + + spin_lock_irq(&file->send_lock); + ret = is_duplicate(file, packet); + if (!ret) + list_add_tail(&packet->list, &file->send_list); + spin_unlock_irq(&file->send_lock); + if (ret) { + ret = -EINVAL; + goto err_msg; } ret = ib_post_send_mad(packet->msg, NULL); if (ret) - goto err_msg; + goto err_send; up_read(&file->port->mutex); return count; +err_send: + dequeue_send(file, packet); err_msg: ib_free_send_mad(packet->msg); err_ah: @@ -657,7 +725,9 @@ static int ib_umad_open(struct inode *in } spin_lock_init(&file->recv_lock); + spin_lock_init(&file->send_lock); INIT_LIST_HEAD(&file->recv_list); + INIT_LIST_HEAD(&file->send_list); init_waitqueue_head(&file->recv_wait); file->port = port; Index: core/mad.c =================================================================== --- core/mad.c (revision 8484) +++ core/mad.c (working copy) @@ -570,13 +570,6 @@ int ib_unregister_mad_agent(struct ib_ma } EXPORT_SYMBOL(ib_unregister_mad_agent); -static inline int response_mad(struct ib_mad *mad) -{ - /* Trap represses are responses although response bit is reset */ - return ((mad->mad_hdr.method == IB_MGMT_METHOD_TRAP_REPRESS) || - (mad->mad_hdr.method & IB_MGMT_METHOD_RESP)); -} - static void dequeue_mad(struct ib_mad_list_head *mad_list) { struct ib_mad_queue *mad_queue; @@ -723,7 +716,7 @@ static int handle_outgoing_dr_smp(struct switch (ret) { case IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY: - if (response_mad(&mad_priv->mad.mad) && + if (ib_response_mad(&mad_priv->mad.mad) && mad_agent_priv->agent.recv_handler) { local->mad_priv = mad_priv; local->recv_mad_agent = 
mad_agent_priv; @@ -1551,7 +1544,7 @@ find_mad_agent(struct ib_mad_port_privat unsigned long flags; spin_lock_irqsave(&port_priv->reg_lock, flags); - if (response_mad(mad)) { + if (ib_response_mad(mad)) { u32 hi_tid; struct ib_mad_agent_private *entry; @@ -1799,7 +1792,7 @@ static void ib_mad_complete_recv(struct } /* Complete corresponding request */ - if (response_mad(mad_recv_wc->recv_buf.mad)) { + if (ib_response_mad(mad_recv_wc->recv_buf.mad)) { spin_lock_irqsave(&mad_agent_priv->lock, flags); mad_send_wr = ib_find_send_mad(mad_agent_priv, mad_recv_wc); if (!mad_send_wr) { From johann.george at qlogic.com Mon Jul 10 17:59:00 2006 From: johann.george at qlogic.com (Johann George) Date: Mon, 10 Jul 2006 17:59:00 -0700 Subject: [openib-general] which patches are needed for 2.6.17 kernel? Message-ID: <20060711005900.GA29613@cuprite.pathscale.com> I'm trying to compile the latest kernel (2.6.17.4) with the latest OpenFabrics SVN (8483) and notice that ulp/iser does not compile (missing scsi/libiscsi.h, etc.). What patches are currently needed to make the current kernel compile with the current SVN? Thanks. Johann From krkumar2 at in.ibm.com Mon Jul 10 20:48:50 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Tue, 11 Jul 2006 09:18:50 +0530 Subject: [openib-general] [RFC] [PATCH 2/7] ibrdmaverbs config files 2 In-Reply-To: <1152589394.8881.9.camel@localhost.localdomain> References: <1152589394.8881.9.camel@localhost.localdomain> Message-ID: <1152589730.8881.15.camel@localhost.localdomain> Hi Roland, > Agreed. I think that the way forward is to provide a libibverbs > release that has both rdma_ and ibv_ prefixed identifiers, and also > provides both and headers, etc. > Then in the next major libibverbs release, the ibv_ versions of > identifiers can be deprecated, and finally removed in the next major > release (which could be renamed to librdmaverbs).
Actually what you are suggesting is almost the same as what I have done - instead of creating different APIs in the same library, I have created a new library to do the same. > Following that route is something that I've been meaning to work on, > and I'd be happy to accept patches in that direction. But a big bang > approach that breaks the world is not a good idea I don't think. The intention was never to break the existing applications, since I am not suggesting to remove libibverbs immediately. The intention is: when all applications are converted to use the new API, then the libibverbs can be removed. Does that sound reasonable ? Otherwise I am OK to change it wrt your idea. Thanks, - KK From krkumar2 at in.ibm.com Mon Jul 10 20:50:19 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Tue, 11 Jul 2006 09:20:19 +0530 Subject: [openib-general] [RFC] [PATCH 3/7] ibrdmaverbs hdr files In-Reply-To: <1152589394.8881.9.camel@localhost.localdomain> References: <1152589394.8881.9.camel@localhost.localdomain> Message-ID: <1152589819.8881.16.camel@localhost.localdomain> Hi James, > I wouldn't prefix all of your include guards with INFINIBAND_. Not only > is it confusing, but it collides with the libibverbs headers. I'd > suggest using RDMA_ instead. Correct, thanks for pointing it out. - KK From akpm at osdl.org Mon Jul 10 21:11:51 2006 From: akpm at osdl.org (Andrew Morton) Date: Mon, 10 Jul 2006 21:11:51 -0700 Subject: [openib-general] infiniband patch series (was Re: ipath patch series a-comin', but no IB maintainer to shepherd them) In-Reply-To: <20060710221902.GB32328@mellanox.co.il> References: <20060710221902.GB32328@mellanox.co.il> Message-ID: <20060710211151.4213742b.akpm@osdl.org> On Tue, 11 Jul 2006 01:19:02 +0300 "Michael S. Tsirkin" wrote: > Quoting r. Michael S. Tsirkin : > > Yes, -mm seems like a good way to get more review. > > Andrew, am I using the right format to send things upstream to you?
> There's really a set of independent patches, so it didn't make sense > to me to batch them up in a series. OK? Sure. Although I am a little surprised to be be receiving them while Roland is in taking-time-off-but-not-really-doing-so mode. But whatever - I'll put them in the for-2.6.18 queue, thanks. From bpradip at in.ibm.com Mon Jul 10 21:44:43 2006 From: bpradip at in.ibm.com (Pradipta Kumar Banerjee) Date: Tue, 11 Jul 2006 10:14:43 +0530 Subject: [openib-general] ping problem with ammasso cards(iWARP interface) In-Reply-To: <001401c6a44d$d5e06d90$4710100a@pc.s2io.com> References: <001401c6a44d$d5e06d90$4710100a@pc.s2io.com> Message-ID: <44B32CBB.9060201@in.ibm.com> Ravinandan Arakali wrote: > Pradipta, > Okay, thanks.. Initially, I was not sure since I don't remember non-zero > values in /proc/krping. When I re-ran the krping test, I see following > output > openfab2:~ # cat /proc/krping > 1-amso0 891376 55711 891376 55711 1782720 27855 1782784 27856 > > As you mentioned, the RDMA traffic seems to be flowing indeed ! > Any idea why rping is dumping core ? I had noticed such crashes, but doing a clean install of the libraries solved the problem. Thanks, Pradipta Kumar. > > Has any testing been done using SDP with ammasso cards ? > > Regards, > Ravi > > > -----Original Message----- From mst at mellanox.co.il Mon Jul 10 22:00:55 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 11 Jul 2006 08:00:55 +0300 Subject: [openib-general] infiniband patch series (was Re: ipath patch series a-comin', but no IB maintainer to shepherd them) In-Reply-To: <20060710211151.4213742b.akpm@osdl.org> References: <20060710211151.4213742b.akpm@osdl.org> Message-ID: <20060711050055.GA16064@mellanox.co.il> Quoting r. Andrew Morton : > Sure. Although I am a little surprised to be be receiving them while > Roland is in taking-time-off-but-not-really-doing-so mode. 
Well, I don't know what's up either, but Roland acked patches explicitly so I figured that's what he wants, too. -- MST From bpradip at in.ibm.com Mon Jul 10 22:22:12 2006 From: bpradip at in.ibm.com (Pradipta Kumar Banerjee) Date: Tue, 11 Jul 2006 10:52:12 +0530 Subject: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bw to allow use of RDMA CM In-Reply-To: <20060706092834.GA9116@mellanox.co.il> References: <44ACD692.1010802@in.ibm.com> <20060706092834.GA9116@mellanox.co.il> Message-ID: <44B33584.7020504@in.ibm.com> Michael S. Tsirkin wrote: > Quoting r. Pradipta Kumar Banerjee : >> Subject: Re: [openib-general] [PATCH 1/2] perftest: enhancement to rdma_bw to allow use of RDMA CM >> >> Michael S. Tsirkin wrote: >>> Quoting r. Pradipta Kumar Banerjee : >>>> IMO using rcq seems to be a generic and better solution. >>> Hmm, I see. Need to document the message format then. >>> We are only pasing the vaddr there, right? >>> >> Michael, >> Actually 'rcq' is being used for handling the 'start' and 'done' messages. >> As for the lid, qpn, psn, rkey and vaddr, these gets exchanged as part of the >> rdma_listen/rdma_connect calls. See pp_server_connect and pp_client_connect. >> OTH I tried testing rdma_bw on Ammasso iWARP without exchanging the 'start' and >> 'done' messages and it worked. I am not sure if this is the right thing to do. >> Maybe Steve can throw more light on this. > > This makes sense. But why do we need the start message then? > Hi Michael, As per my understanding of the mpa/iwarp spec, here is the reason for the 'start' message . The receipt of RDMA_CM_EVENT_ESTABLISHED event by the user mode application, as part of the rdma_listen/rdma_accept/rdma_connect sequence, is an indication that the MPA startup sequence has successfully completed. Taking the example of the cxgb3 driver - the driver generates the IW_CM_EVENT_ESTABLISHED event and goes to the FPDU mode only after successful completion of the MPA startup sequence. 
This event eventually gets passed to the user as RDMA_CM_EVENT_ESTABLISHED. (function cm_conn_est_handler in core/iwcm.c, function cma_iw_handler in core/cma.c, function established_upcall in hw/cxgb3/iwch_cm.c)

Now, as per the MPA spec, the complete startup negotiation is over only when the initiator sends the first FPDU frame and the responder receives it.

[ http://www.ietf.org/internet-drafts/draft-ietf-rddp-mpa-05.txt 7.1.2 Connection Startup Rules 4. MPA Responder mode implementations MUST receive and validate at least one FPDU before sending any FPDUs or Markers. Note: this requirement is present to allow the Initiator time to get its receiver into Full Operation before an FPDU arrives, avoiding potential race conditions at the Initiator. This was also subject to some debate in the work group before rough consensus was reached. Eliminating this requirement would allow faster startup in some types of applications. However, that would also make certain implementations (particularly "dual stack") much harder. ]

So the 'pp_send_start/pp_wait_for_start' actually takes care of the above requirement, initiator (client) sending a 'start message' as FPDU and the responder (server) receiving the same correctly.

____Client____                        ____Server____
                                      Waits for MPA Request Frame
Sends MPA Request Frame
Waits for incoming MPA Reply Frame
                                      Receives MPA Request Frame
                                      Enables FPDU decoding (doesn't send any FPDUs)
Receives MPA Reply Frame

< The above communication sequence is taken care of by the rdma_listen - rdma_accept/rdma_connect calls and subsequent generation of RDMA_CM_EVENT_ESTABLISHED event >

Send first FPDU
                                      Receive first FPDU

< The above communication sequence is taken care of by the pp_wait_for_start/pp_send_start calls >

Hope I have made myself clear without generating further confusion :-)

Thanks, Pradipta Kumar.

From mst at mellanox.co.il Mon Jul 10 22:51:34 2006 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Tue, 11 Jul 2006 08:51:34 +0300 Subject: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1 In-Reply-To: <000301c6a480$6f7cb210$ff0da8c0@amr.corp.intel.com> References: <000301c6a480$6f7cb210$ff0da8c0@amr.corp.intel.com> Message-ID: <20060711055134.GB16513@mellanox.co.il> Quoting r. Sean Hefty : > Subject: [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1 > > Enhance validation of MADs sent by userspace clients for spec > compliance with C13-18.1.1 (duplicate requests / responses). Also > verify that RMPP MADs are data only, to avoid a userspace app > causing a kernel crash by sending non-data MADs. > > Signed-off-by: Sean Hefty This will be 2.6.18 material, right? -- MST From eitan at mellanox.co.il Mon Jul 10 23:59:37 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 11 Jul 2006 09:59:37 +0300 Subject: [openib-general] [Openib-windows] ib_types.h and Win/Linux consolidation Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3027A8627@mtlexch01.mtl.com> Guys, The files are in SVN: https://openib.org/svn/gen2/trunk/src/userspace/management/osm/include/iba/ib_types.h svn://openib.tc.cornell.edu/gen1/trunk/inc/iba/ib_types.h You can use whatever diff you like ... Eitan Zahavi Senior Engineering Director, Software Architect Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Sean Hefty [mailto:mshefty at ichips.intel.com] > Sent: Monday, July 10, 2006 7:44 PM > To: Fabian Tillier > Cc: Eitan Zahavi; Sean Hefty; openib-windows at openib.org; openib- > general at openib.org > Subject: Re: [Openib-windows] [openib-general] ib_types.h and Win/Linux > consolidation > > Fabian Tillier wrote: > > Could you filter these out and send out what the actual changes > > thatmatter are? I quickly lost interest here. > > Can you also use the -up diff format? 
> > - Sean From ogerlitz at voltaire.com Tue Jul 11 00:02:59 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 11 Jul 2006 10:02:59 +0300 Subject: [openib-general] few notes about last night (and elsewhere) IB upstream traffic Message-ID: <44B34D23.4020409@voltaire.com> Hey Michael, The way IB patches are pushed upstream along the last (first to IB...) kernel cycles is that patches are gathered in a branch at Roland's GIT tree @kernel.org and then once or twice along the lifetime of feature or bug fixes merge window, Roland sends an email to Linus with CC to lkml and openib asking him to pull a patch set. I strongly prefer this work mode among other things as of the following reasons: - it allows developers to easily keep track on what's went upstream and when by just saving this 1 or 2 emails per window - Roland tree under kernel.org is accessed via http and supports fancy and very helpful browsing - it creates much less traffic which is practically noise at lkml (why are you sending each patch to netdev as well??? other than making these patches confuse my mailer rules i don't see any reason for that) - it confines to the linux hierarchic maintaining model, Roland is the IB maintainer and he sends Linus ***bunch*** of patches, ***once in a while*** Also if not urgent (eg the bug is killer and the kernel is about to be out in few hours) i think its wrong to send fixes directly from the oven to Linus; patches should reside in the IB git tree for a while, were they can be pulled/compiled and tested by IB developers. To summarize, i suggest to revert to the work mode that has been set by Roland so far, and have it done by Roland. I did not see any note from him that he will not keep on doing the excellent job he did so far, only mentioned that he would be somehow slow for some time as of good and happy reasons. Also, and sorry to do that in public, please slow down. Or. Andrew Morton wrote: > On Tue, 11 Jul 2006 01:19:02 +0300 > "Michael S. 
Tsirkin" wrote: > >> Quoting r. Michael S. Tsirkin : >>> Yes, -mm seems like a good way to get more review. >> Andrew, am I using the right format to send things upstream to you? >> There's really a set of independent patches, so it didn't make sense >> to me to batch them up in a series. OK? > > Sure. Although I am a little surprised to be be receiving them while > Roland is in taking-time-off-but-not-really-doing-so mode. > > But whatever - I'll put them in the for-2.6.18 queue, thanks. From eitan at mellanox.co.il Tue Jul 11 00:06:16 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 11 Jul 2006 10:06:16 +0300 Subject: [openib-general] [PATCH] OpenSM: Support configurable SL per partition Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3027A8631@mtlexch01.mtl.com> Thanks. We were anxiously waiting for this patch. I will be testing it once committed. Eitan Zahavi From yipeeyipeeyipeeyipee at yahoo.com Tue Jul 11 00:24:17 2006 From: yipeeyipeeyipeeyipee at yahoo.com (yipee) Date: Tue, 11 Jul 2006 07:24:17 +0000 (UTC) Subject: [openib-general] =?utf-8?q?openSM__-_IS=5FSM_capability_mask_prob?= =?utf-8?q?lem?= Message-ID: Hi, On one of my IB setups I get the following error from openSM: osm_vendor_set_sm: ERR 5431: setting IS_SM capability mask failed; errno 2 what's this IS_SM capability mask? what might cause its setting to fail? 
Thanks, y From ogerlitz at voltaire.com Tue Jul 11 00:30:52 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 11 Jul 2006 10:30:52 +0300 Subject: [openib-general] [PATCH] fmr pool: remove unnecessary pointer dereference In-Reply-To: References: <20060710181825.GE29641@mellanox.co.il> <15ddcffd0607101237ge2d31fcgae6f704bcbc0ae22@mail.gmail.com> Message-ID: <44B353AC.9010506@voltaire.com> Roland Dreier wrote: > Or> So this means that a ULP can not map on the same time these > Or> two page sets: and suggesting the verbs layer to > Or> have A being the IOVA at the HCA IOMMU (eg MPT/MTT in the > Or> mellanox case), and getting some A* to be used for the second > Or> map? > > I don't follow. The FMR implementation will always use the IOVA > passed in by the consumer, so a ULP can always map whatever page sets > it wants at whatever IOVA it wants (subject to alignment restrictions, > of course). So there's no point in making the IOVA be an output > parameter from the FMR pool implementation, since the IOVA from the > consumer will never be changed. OK, i was confused to think that the ib verbs layer, namely ib_map_phys_fmr gets u64 *iova where i see now it gets u64 iova, so indeed there's no point with the FMR pool have the iova being a pointer in its API, sure, you can go a head and commit this fix allover, to iser as well, i don' think its 2.6.18 material since it does not fix any bug. As for my example from above, i was thinking and looking now in the Mellanox PRM proves this thought to be wrong, that as of the HCA the IOVA provided to the FMR map verb i just as ***suggestion*** to the actual VA used for this mapping (this suggested VA had a restriction on the "fmr page size" lower bits etc etc). So the HCA can set another VA and reports it back through the driver as an out param of the FMR map verb. 
This would allow for mapping the same page concurrently (as the first one) in two maps, which for itself does not make much sense, the only example i was able to think of is user space someone that wants to read into the same page from two reading flows in parallel with direct IO, never mind. Or. From eitan at mellanox.co.il Tue Jul 11 00:46:43 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 11 Jul 2006 10:46:43 +0300 Subject: [openib-general] openSM - IS_SM capability mask problem Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3027A8679@mtlexch01.mtl.com> You probably have another SM already running on your machine. The error means that OpenSM failed to set the local port IS_SM capability mask bit (which say there is an SM running on that port). If you do not have another SM running on the port you should probably restart the driver as the ref count on the IS_SM bit got crazy. Hope this helps... Eitan Zahavi > -----Original Message----- > From: openib-general-bounces at openib.org [mailto:openib-general- > bounces at openib.org] On Behalf Of yipee > Sent: Tuesday, July 11, 2006 10:24 AM > To: openib-general at openib.org > Subject: [openib-general] openSM - IS_SM capability mask problem > > Hi, > > On one of my IB setups I get the following error from openSM: > osm_vendor_set_sm: ERR 5431: setting IS_SM capability mask failed; errno 2 > > what's this IS_SM capability mask? what might cause its setting to fail? > > > Thanks, > y > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From ogerlitz at voltaire.com Tue Jul 11 01:54:26 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 11 Jul 2006 11:54:26 +0300 Subject: [openib-general] which patches are needed for 2.6.17 kernel? 
In-Reply-To: <20060711005900.GA29613@cuprite.pathscale.com> References: <20060711005900.GA29613@cuprite.pathscale.com> Message-ID: <44B36742.6050500@voltaire.com> Johann George wrote: > I'm trying to compile the lastest kernel (2.6.17.4) with the latest > OpenFabrics SVN (8483) and notice that ulp/iser does not compile (missing > scsi/libiscsi.h, etc.). What patches are currently needed to make the > current kernel compile with the current SVN? Thanks. As noted in the iser wiki, please use the patches from https://openib.org/svn/gen2/branches/backport/2.6.17 Or. From mst at mellanox.co.il Tue Jul 11 02:55:30 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 11 Jul 2006 12:55:30 +0300 Subject: [openib-general] few notes about last night (and elsewhere) IB upstream traffic In-Reply-To: <44B34D23.4020409@voltaire.com> References: <44B34D23.4020409@voltaire.com> Message-ID: <20060711095530.GA17857@mellanox.co.il> Quoting r. Or Gerlitz : > - Roland tree under kernel.org is accessed via http and supports fancy > and very helpful browsing I think that's gitweb. Get it here: http://packages.debian.org/testing/devel/gitweb -- MST From halr at voltaire.com Tue Jul 11 02:52:58 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 11 Jul 2006 05:52:58 -0400 Subject: [openib-general] openSM - IS_SM capability mask problem In-Reply-To: References: Message-ID: <1152611577.13048.41435.camel@hal.voltaire.com> On Tue, 2006-07-11 at 03:24, yipee wrote: > Hi, > > On one of my IB setups I get the following error from openSM: > osm_vendor_set_sm: ERR 5431: setting IS_SM capability mask failed; errno 2 > > what's this IS_SM capability mask? what might cause its setting to fail? It's not the setting which is failing. You are likely not using an SM which supports this (it is an enhanced capability defined in a 1.2 erratum). Are you running a recent OpenSM or something else ? 
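For reference, the trailing "errno 2" in the quoted log line can be decoded with the C library; on Linux, 2 is ENOENT ("No such file or directory"). A minimal sketch follows. describe_errno is a made-up helper name for illustration, not an OpenSM or libibumad function, and decoding the number says nothing by itself about which file or device was involved.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not part of OpenSM): return the C library's
 * description of an errno value such as the 2 printed in the log line.
 */
const char *describe_errno(int err)
{
    const char *msg = strerror(err);

    assert(msg != NULL);    /* strerror() is expected to return a string */
    return msg;
}
```

Calling describe_errno(2) on a Linux system gives the text for ENOENT.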
-- Hal

> Thanks,
> y
>
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>

From halr at voltaire.com Tue Jul 11 03:17:45 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 11 Jul 2006 06:17:45 -0400
Subject: [openib-general] [PATCH] libibmad: Support MFT and Notice/Trap fields
In-Reply-To: <44B2CB89.7020307@ichips.intel.com>
References: <1152567165.13048.6696.camel@hal.voltaire.com> <44B2CB89.7020307@ichips.intel.com>
Message-ID: <1152613063.13048.42638.camel@hal.voltaire.com>

On Mon, 2006-07-10 at 17:50, Sean Hefty wrote:
> Hal Rosenstock wrote:
> > +enum TRAP_NUM_ID {
> > +	IB_TRAP_128,
> > +
> > +	IB_TRAP_LAST
> > +};
>
> Should TRAP_128 be defined as 0?

Good catch. This is being removed from this include for now.

-- Hal

> - Sean

From halr at voltaire.com Tue Jul 11 03:18:11 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 11 Jul 2006 06:18:11 -0400
Subject: [openib-general] [PATCH][TRIVIAL] libibmad: Remove recently added TRAP_NUM_ID enum
Message-ID: <1152613065.13048.42640.camel@hal.voltaire.com>

libibmad: Remove recently added TRAP_NUM_ID enum

Signed-off-by: Hal Rosenstock

Index: include/infiniband/mad.h
===================================================================
--- include/infiniband/mad.h (revision 8488)
+++ include/infiniband/mad.h (working copy)
@@ -152,12 +152,6 @@ enum GSI_ATTR_ID {
 	IB_GSI_ATTR_LAST
 };
 
-enum TRAP_NUM_ID {
-	IB_TRAP_128,
-
-	IB_TRAP_LAST
-};
-
 #define IB_VENDOR_OPENIB_PING_CLASS (IB_VENDOR_RANGE2_START_CLASS + 2)
 #define IB_VENDOR_OPENIB_SYSSTAT_CLASS (IB_VENDOR_RANGE2_START_CLASS + 3)
 #define IB_OPENIB_OUI (0x001405)

From mst at mellanox.co.il Tue Jul 11 03:59:30 2006
From: mst at mellanox.co.il (Michael S.
Tsirkin)
Date: Tue, 11 Jul 2006 13:59:30 +0300
Subject: [openib-general] [PATCH] libmthca: fix range checking for mthca_create_srq
Message-ID: <20060711105930.GA18101@mellanox.co.il>

If SRQ max_wr is too high, srq buf size calculation
srq->buf_size = srq->max << srq->wqe_shift
overflows, resulting in crash on buffer access.

The simplest solution is to limit max_wr since practically hardware limits us to less than 1 << 16 WRs.

Signed-off-by: Michael S. Tsirkin

Index: openib/src/userspace/libmthca/src/verbs.c
===================================================================
--- openib/src/userspace/libmthca/src/verbs.c (revision 8484)
+++ openib/src/userspace/libmthca/src/verbs.c (working copy)
@@ -372,7 +372,7 @@ struct ibv_srq *mthca_create_srq(struct
 	int ret;
 
 	/* Sanity check SRQ size before proceeding */
-	if (attr->attr.max_wr > 16 << 20 || attr->attr.max_sge > 64)
+	if (attr->attr.max_wr >= 1 << 16 || attr->attr.max_sge > 64)
 		return NULL;
 
 	srq = malloc(sizeof *srq);

-- 
MST

From moshek at voltaire.com Tue Jul 11 06:03:05 2006
From: moshek at voltaire.com (Moshe Kazir)
Date: Tue, 11 Jul 2006 16:03:05 +0300
Subject: [openib-general] Compilation issues on rhel4 u3 ppc64 sysfs.o
Message-ID:

Vladimir,

I'm trying the last tarball OFED svn Rev=8031 on IBM ppc64 redhat as4 U 3. I compiled the pciutils & sysfsutils rpm's with -m64 as you wrote. I face tvflash.c compile error. Do I need to compile more rpm's to 64 bit ?
Moshe ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com -----Original Message----- From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Vladimir Sokolovsky Sent: Thursday, May 25, 2006 12:49 PM To: Scott Weitzenkamp (sweitzen) Cc: openib-general at openib.org Subject: Re: [openib-general] Compilation issues on rhel4 u3 ppc64 sysfs.o In OFED-1.0-rc5 all binaries and libraries will be compiled on *ppc64 *with *-m64* flag. This requires sysfsutils and sysfsutils-devel 64-bit RPM to be installed (in order to build libibverbs). Also pciutils and pciutils-devel 64-bit required for tvflash package. libsdp will be built both 32 and 64 bit libraries. Note: in order to build sysfsutils 64-bit RPM run: CC="gcc -m64" rpmbuild --rebuild sysfsutils-1.3.0-1.2.1.src.rpm (This was tested on Fedora C4 PPC64) Regards, Vladimir Scott Weitzenkamp (sweitzen) wrote: > I know Vlad made some changes for rc5 in this area, at least for > libsdp, not sure if other libs got changed as well. > > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > > > ------------------------------------------------------------------------ > *From:* Paul [mailto:paul.lundin at gmail.com] > *Sent:* Wednesday, May 24, 2006 11:00 AM > *To:* Scott Weitzenkamp (sweitzen) > *Cc:* openib-general at openib.org > *Subject:* Re: [openib-general] Compilation issues on rhel4 u3 > ppc64 sysfs.o > > Scott, > Upon further inspection the build.sh and install.sh scripts > built 32bit libraries and binaries. If I export CFLAGS (and the > like) to include -m64 then the build dies while looking for a > 64bit libsysfs. rhel4 u3 does not include a ppc64 sysfsutils, nor > have I been able to find an actual 64bit version of it. Is there a > workaround for getting things to build actual ppc64 > binaries/libraries ? 
> > The actual error is: > checking for dlsym in -ldl... yes > checking for pthread_mutex_init in -lpthread... yes > checking for sysfs_open_class in -lsysfs... no > configure: error: sysfs_open_class() not found. libibverbs > requires libsysfs. > _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From yipeeyipeeyipeeyipee at yahoo.com Tue Jul 11 06:27:25 2006 From: yipeeyipeeyipeeyipee at yahoo.com (yipee) Date: Tue, 11 Jul 2006 13:27:25 +0000 (UTC) Subject: [openib-general] =?utf-8?q?openSM__-_IS=5FSM_capability_mask_prob?= =?utf-8?q?lem?= References: <1152611577.13048.41435.camel@hal.voltaire.com> Message-ID: Hal Rosenstock voltaire.com> writes: [snip] > It's not the setting which is failing. You are likely not using an SM > which supports this (it is an enhanced capability defined in a 1.2 > erratum). Are you running a recent OpenSM or something else ? > I'm running a 1.1 openSM on a 2.6.14 kernel. Is there a bug in this version that causes this problem? Should I upgrade to a newer one? Would it be compatible with the 2.6.14 kernel modules/userspace libraries? thanks, y From ftillier at silverstorm.com Tue Jul 11 07:47:49 2006 From: ftillier at silverstorm.com (Fabian Tillier) Date: Tue, 11 Jul 2006 10:47:49 -0400 Subject: [openib-general] [Openib-windows] ib_types.h and Win/Linux consolidation In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E3027A8627@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E3027A8627@mtlexch01.mtl.com> Message-ID: <79ae2f320607110747y7dbd083ai4aff9625cd546375@mail.gmail.com> Hi Eitan, On 7/11/06, Eitan Zahavi wrote: > Guys, > > The files are in SVN: > https://openib.org/svn/gen2/trunk/src/userspace/management/osm/include/iba/ib_types.h > svn://openib.tc.cornell.edu/gen1/trunk/inc/iba/ib_types.h > > You can use whatever diff you like ... 
No diff tool that I know of will filter out the trivial changes that make up most of the patch from what I have seen. I don't have time to spend on this now, so if you'd like me to look at this, I'll need you to do some of the work to make it easier for me to digest. Cheers, - Fab
From mshefty at ichips.intel.com Tue Jul 11 09:45:46 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 11 Jul 2006 09:45:46 -0700 Subject: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1 In-Reply-To: <20060711055134.GB16513@mellanox.co.il> References: <000301c6a480$6f7cb210$ff0da8c0@amr.corp.intel.com> <20060711055134.GB16513@mellanox.co.il> Message-ID: <44B3D5BA.2090109@ichips.intel.com> Michael S. Tsirkin wrote: > This will be 2.6.18 material, right? I want to get some wider testing of the patch before pushing upstream, but I consider this a bug fix that we should try to push into 2.6.18. - Sean From mshefty at ichips.intel.com Tue Jul 11 09:59:20 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 11 Jul 2006 09:59:20 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ibrdmaverbs config files 2 In-Reply-To: <1152589730.8881.15.camel@localhost.localdomain> References: <1152589394.8881.9.camel@localhost.localdomain> <1152589730.8881.15.camel@localhost.localdomain> Message-ID: <44B3D8E8.2060206@ichips.intel.com> Krishna Kumar wrote: > The intention was never to break the existing applications, since I am > not suggesting to remove libibverbs immediately. The intention is : when > all applications are converted to use the new API, then the libibverbs > can be removed. > > Does that sound reasonable ? Otherwise I am OK to change it wrt to your > idea. By keeping all APIs to a single library with compatibility code, it's easier to convert to the new APIs.
The changes can occur one call at a time, versus converting everything in an application at once. There's only one library to download, build, and install. Fixes to the library don't need to duplicated. Etc. - Sean From mst at mellanox.co.il Tue Jul 11 10:22:16 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 11 Jul 2006 20:22:16 +0300 Subject: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1 In-Reply-To: <44B3D5BA.2090109@ichips.intel.com> References: <44B3D5BA.2090109@ichips.intel.com> Message-ID: <20060711172216.GA19137@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1 > > Michael S. Tsirkin wrote: > > This will be 2.6.18 material, right? > > I want to get some wider testing of the patch before pushing upstream. This goes without saying. > I consider this a bug fix that we should try to push into 2.6.18. Right. Is this in SVN at the moment? -- MST From sean.hefty at intel.com Tue Jul 11 10:39:36 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 11 Jul 2006 10:39:36 -0700 Subject: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1 In-Reply-To: <20060711172216.GA19137@mellanox.co.il> Message-ID: <000001c6a510$fabd2f40$e598070a@amr.corp.intel.com> >> I consider this a bug fix that we should try to push into 2.6.18. > >Right. Is this in SVN at the moment? I haven't checked this in yet. I was just about to run some additional tests before doing this. My biggest concern is that modifying the TID for SENDs may cause an issue with some application. - Sean From rdreier at cisco.com Tue Jul 11 11:44:19 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 11 Jul 2006 11:44:19 -0700 Subject: [openib-general] infiniband patch series References: <20060710211151.4213742b.akpm@osdl.org> <20060711050055.GA16064@mellanox.co.il> Message-ID: Andrew> Sure. 
Although I am a little surprised to be be receiving Andrew> them while Roland is in Andrew> taking-time-off-but-not-really-doing-so mode. Michael> Well, I don't know what's up either, but Roland acked Michael> patches explicitly so I figured that's what he wants, Michael> too. I'm in steal-10-minutes-to-read-email-every-now-and-then-mode, so I'd rather let someone else handle all the patch merging etc. I'll be back for real after Ottawa I guess... - R. From mst at mellanox.co.il Tue Jul 11 11:54:45 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 11 Jul 2006 21:54:45 +0300 Subject: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1 In-Reply-To: <000001c6a510$fabd2f40$e598070a@amr.corp.intel.com> References: <000001c6a510$fabd2f40$e598070a@amr.corp.intel.com> Message-ID: <20060711185445.GC19137@mellanox.co.il> Quoting r. Sean Hefty : > Subject: RE: [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1 > > >> I consider this a bug fix that we should try to push into 2.6.18. > > > >Right. Is this in SVN at the moment? > > I haven't checked this in yet. I was just about to run some additional tests > before doing this. My biggest concern is that modifying the TID for SENDs may > cause an issue with some application. How about we check that application has put 0 in high 32 bit, and return an error if it did not? -- MST From sean.hefty at intel.com Tue Jul 11 12:08:14 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 11 Jul 2006 12:08:14 -0700 Subject: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1 In-Reply-To: <20060711185445.GC19137@mellanox.co.il> Message-ID: <000101c6a51d$5c5d70a0$e598070a@amr.corp.intel.com> >> I haven't checked this in yet. I was just about to run some additional tests >> before doing this. My biggest concern is that modifying the TID for SENDs >may >> cause an issue with some application. 
> >How about we check that application has put 0 in high 32 bit, and return >an error if it did not? This should work if we can determine whether the SEND is a new operation or a response to some other request. After further testing, this patch breaks osmtest as a result of modifying the TID for a SEND. I could just ignore SENDs like the existing code did, but I'm not convinced that that's the right thing to do. - Sean From johann.george at qlogic.com Tue Jul 11 12:13:05 2006 From: johann.george at qlogic.com (Johann George) Date: Tue, 11 Jul 2006 12:13:05 -0700 Subject: [openib-general] which patches are needed for 2.6.17 kernel? In-Reply-To: <44B36742.6050500@voltaire.com> References: <20060711005900.GA29613@cuprite.pathscale.com> <44B36742.6050500@voltaire.com> Message-ID: <20060711191305.GA16852@cuprite.pathscale.com> > As noted in the iser wiki, please use the patches from Thanks for updating the wiki. Still having problems compiling: CC [M] drivers/infiniband/core/addr.o In file included from drivers/infiniband/core/addr.c:38: drivers/infiniband/include/rdma/ib_addr.h:43: error: field 'dev_type' has incomplete type drivers/infiniband/core/addr.c: In function 'copy_addr': drivers/infiniband/core/addr.c:95: error: 'RDMA_NODE_IB_CA' undeclared (first use in this function) drivers/infiniband/core/addr.c:95: error: (Each undeclared identifier is reported only once drivers/infiniband/core/addr.c:95: error: for each function it appears in.) drivers/infiniband/core/addr.c:98: error: 'RDMA_NODE_RNIC' undeclared (first use in this function) make[3]: *** [drivers/infiniband/core/addr.o] Error 1 make[2]: *** [drivers/infiniband/core] Error 2 make[1]: *** [drivers/infiniband] Error 2 From mshefty at ichips.intel.com Tue Jul 11 12:17:15 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 11 Jul 2006 12:17:15 -0700 Subject: [openib-general] which patches are needed for 2.6.17 kernel? 
In-Reply-To: <20060711191305.GA16852@cuprite.pathscale.com> References: <20060711005900.GA29613@cuprite.pathscale.com> <44B36742.6050500@voltaire.com> <20060711191305.GA16852@cuprite.pathscale.com> Message-ID: <44B3F93B.7090809@ichips.intel.com> Johann George wrote: > Still having problems compiling: > > CC [M] drivers/infiniband/core/addr.o In file included from > drivers/infiniband/core/addr.c:38: > drivers/infiniband/include/rdma/ib_addr.h:43: error: field 'dev_type' has > incomplete type drivers/infiniband/core/addr.c: In function 'copy_addr': > drivers/infiniband/core/addr.c:95: error: 'RDMA_NODE_IB_CA' undeclared (first > use in this function) It looks like you're pulling old header files. Did you delete or rename the standard linux/include/rdma directory? - Sean From rdreier at cisco.com Tue Jul 11 12:29:23 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 11 Jul 2006 12:29:23 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ibrdmaverbs config files 2 References: <1152589394.8881.9.camel@localhost.localdomain> <1152589730.8881.15.camel@localhost.localdomain> Message-ID: Krishna> Actually what you are suggesting is almost the same as Krishna> what I have done - instead of creating different API's in Krishna> the same library, I have created a new library to do the Krishna> same. The problem with that is then there are two libraries to maintain, fixes have to be merged twice, etc. It's much better to follow an evolutionary path. - R. From or.gerlitz at gmail.com Tue Jul 11 12:55:35 2006 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Tue, 11 Jul 2006 21:55:35 +0200 Subject: [openib-general] which patches are needed for 2.6.17 kernel? 
In-Reply-To: <44B3F93B.7090809@ichips.intel.com> References: <20060711005900.GA29613@cuprite.pathscale.com> <44B36742.6050500@voltaire.com> <20060711191305.GA16852@cuprite.pathscale.com> <44B3F93B.7090809@ichips.intel.com> Message-ID: <15ddcffd0607111255u2df65128oe1f63135b2770d17@mail.gmail.com> On 7/11/06, Sean Hefty wrote: > Johann George wrote: > > Still having problems compiling: > > > > CC [M] drivers/infiniband/core/addr.o In file included from > > drivers/infiniband/core/addr.c:38: > > drivers/infiniband/include/rdma/ib_addr.h:43: error: field 'dev_type' has > > incomplete type drivers/infiniband/core/addr.c: In function 'copy_addr': > > drivers/infiniband/core/addr.c:95: error: 'RDMA_NODE_IB_CA' undeclared (first > > use in this function) > > It looks like you're pulling old header files. Did you delete or rename the > standard linux/include/rdma directory? Indeed. Note that when you want to replace a kernel's IB code with what's in SVN, it's a must to replace both drivers/infiniband AND include/rdma. A common method is to move drivers/infiniband and include/rdma aside, put the SVN code under drivers/infiniband, and make include/rdma a soft link to ../drivers/infiniband/include/rdma. Or. From mst at mellanox.co.il Tue Jul 11 12:55:47 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 11 Jul 2006 22:55:47 +0300 Subject: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1 In-Reply-To: <000101c6a51d$5c5d70a0$e598070a@amr.corp.intel.com> References: <000101c6a51d$5c5d70a0$e598070a@amr.corp.intel.com> Message-ID: <20060711195547.GA21546@mellanox.co.il> Quoting r. Sean Hefty : > Subject: RE: [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1 > > >> I haven't checked this in yet. I was just about to run some additional tests > >> before doing this. My biggest concern is that modifying the TID for SENDs > >may > >> cause an issue with some application.
> >
> >How about we check that application has put 0 in high 32 bit, and return
> >an error if it did not?
>
> This should work if we can determine whether the SEND is a new operation or a
> response to some other request.

Isn't ib_response_mad doing this?

> After further testing, this patch breaks osmtest as a result of modifying the
> TID for a SEND.

What does the test do?

> I could just ignore SENDs like the existing code did, but I'm
> not convinced that that's the right thing to do.

What does Jack's patch do?

-- 
MST

From zach.brown at oracle.com Tue Jul 11 13:10:48 2006
From: zach.brown at oracle.com (Zach Brown)
Date: Tue, 11 Jul 2006 13:10:48 -0700
Subject: [openib-general] ipoib lockdep warning
Message-ID: <44B405C8.4040706@oracle.com>

I get an awfully verbose lockdep warning when I bring up ipoib devices and wanted to make sure it didn't go unreported. I've put it up at the following URL:

http://oss.oracle.com/~zab/ipoib-multicast-lockdep-warning.txt

I looked into it a bit and it seems to be on to something. There might be a AB, BC, CA ordering deadlock.

AB: priv->lock is held while acquiring query_idr.lock

ipoib_mcast_send()
  spin_lock(&priv->lock);
  ipoib_mcast_sendonly_join()
    ib_sa_mcmember_rec_query()
      send_mad()
        idr_pre_get(&query_idr)
          spin_lock(&idp->lock);

BC: query_idr.lock is taken with interrupts enabled and so is implicitly ordered before dev->_xmit_lock which is taken in interrupt context.

ipoib_mcast_join_task()
  ipoib_mcast_join()
    ib_sa_mcmember_rec_query()
      send_mad()
        idr_pre_get(&query_idr)
          spin_lock(&idp->lock)

CA: dev->_xmit_lock is held while acquiring priv->lock. This triggers the lockdep warning that adding the dep between dev->_xmit_lock and priv->lock connects a soft-irq-safe lock to a soft-irq-unsafe one.

ipoib_mcast_restart_task()
  local_irq_save(flags);
  netif_tx_lock(dev)
    spin_lock(&dev->_xmit_lock);
  spin_lock(&priv->lock);

I can imagine all sorts of potential fixes (block ints when calling idr?
reorder the acquisition in ipoib_mcast_restart_task()?) but I'm operating on a partial view of the paths here so I wasn't comfortable suggesting a fix. I wouldn't be surprised to hear that there are circumstances that both lockdep and I don't know about that stop this from being a problem :). In any case, it'd be fantastic if someone who knows this code could sit down with lockdep and some ipoib regression suite to shake out lockdep's complaints. - z From johann.george at qlogic.com Tue Jul 11 13:14:06 2006 From: johann.george at qlogic.com (Johann George) Date: Tue, 11 Jul 2006 13:14:06 -0700 Subject: [openib-general] which patches are needed for 2.6.17 kernel? In-Reply-To: <44B3F93B.7090809@ichips.intel.com> References: <20060711005900.GA29613@cuprite.pathscale.com> <44B36742.6050500@voltaire.com> <20060711191305.GA16852@cuprite.pathscale.com> <44B3F93B.7090809@ichips.intel.com> Message-ID: <20060711201406.GA17993@cuprite.pathscale.com> > It looks like you're pulling old header files. Did you delete or rename the > standard linux/include/rdma directory? Yes. I moved it temporarily and forgot to put it back. All seems to be fine now. Thanks. Johann From sean.hefty at intel.com Tue Jul 11 13:07:46 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 11 Jul 2006 13:07:46 -0700 Subject: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1 In-Reply-To: <20060711195547.GA21546@mellanox.co.il> Message-ID: <000301c6a525$ad0ed040$e598070a@amr.corp.intel.com> >> This should work if we can determine whether the SEND is a new operation or a >> response to some other request. > >Isn't ib_response_mad doing this? It should. We just need to ensure that it identifies the MADs correctly. >> After further testing, this patch breaks osmtest as a result of modifying the >> TID for a SEND. > >What does the test do? Hmm... I just reran the test, and it worked now. Now I'm really confused as to what the problem was.
The SM / SA classes don't use SENDs, so I was struggling to understand the issue. Maybe I'm running into some sort of SM conflict that takes a while to resolve itself... >> I could just ignore SENDs like the existing code did, but I'm >> not convinced that that's the right thing to do. > >What does Jack's patch do? Jack's patch operated beneath the MAD interface, so left the TIDs unchanged. It relied on user_mad to set the TIDs correctly. - Sean From pmullaney at novell.com Tue Jul 11 14:14:33 2006 From: pmullaney at novell.com (Patrick Mullaney) Date: Tue, 11 Jul 2006 15:14:33 -0600 Subject: [openib-general] [PATCH] OpenSM: Support configurable SL per partition Message-ID: <44B3C059020000C700012B8C@sinclair.provo.novell.com> This will avoid an invalid warning about service level value if sl=0 is used in the partition config file. Can you include something like it in your original patch? Thanks. Pat -------------------------------------------------------------------------------------------------- --- opensm.old/osm_prtn_config.c 2006-07-11 16:06:33.000000000 -0400 +++ opensm/osm_prtn_config.c 2006-07-11 16:08:00.000000000 -0400 @@ -52,6 +52,7 @@ #include #include #include +#include #include #include @@ -159,7 +160,7 @@ static int partition_add_flag(unsigned l } else if (!strncmp(flag, "sl", len)) { unsigned sl; - if (!val || (sl = strtoul(val, NULL, 0)) == 0 || sl > 15) + if (!val || (sl = strtoul(val, NULL, 0)) == UINT_MAX || sl > 15) PARSEWARN(conf->p_log, lineno, "flag \'sl\' requires valid value" " - skipped.\n"); From mst at mellanox.co.il Tue Jul 11 14:16:20 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 12 Jul 2006 00:16:20 +0300 Subject: [openib-general] ipoib lockdep warning In-Reply-To: <44B405C8.4040706@oracle.com> References: <44B405C8.4040706@oracle.com> Message-ID: <20060711211620.GB21546@mellanox.co.il> Quoting r.
Zach Brown : > BC: > > query_idr.lock is taken with interrupts enabled and so is implicitly > ordered before dev->_xmit_lock which is taken in interrupt context. > > ipoib_mcast_join_task() > ipoib_mcast_join() > ib_sa_mcmember_rec_query() > send_mad() > idr_pre_get(&query_idr) > spin_lock(&idp->lock) Got to check, but if that's true we have a simple deadlock here: ib_sa_mcmember_rec_query might get called from interrupt context as well, deadlocking on idp->lock? Sean? > I can imagine all sorts of potential fixes (block ints when calling idr? > reorder acquiry in ipoib_mcast_restart_task()?) but I'm operating on a > partial view of the paths here so I wasn't comfortable suggesting a fix. > I wouldn't be surprised to hear that there are circumstances that both > lockdep and I don't know about that stop this from being a problem :). Awesome, thanks for the analysis! Your help is very much appreciated. -- MST From sean.hefty at intel.com Tue Jul 11 14:40:32 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 11 Jul 2006 14:40:32 -0700 Subject: [openib-general] ipoib lockdep warning In-Reply-To: <20060711211620.GB21546@mellanox.co.il> Message-ID: <000401c6a532$a3017f50$e598070a@amr.corp.intel.com> >Quoting r. Zach Brown : >> BC: >> >> query_idr.lock is taken with interrupts enabled and so is implicitly >> ordered before dev->_xmit_lock which is taken in interrupt context. >> >> ipoib_mcast_join_task() >> ipoib_mcast_join() >> ib_sa_mcmember_rec_query() >> send_mad() >> idr_pre_get(&query_idr) >> spin_lock(&idp->lock) > >Got to check, but if that's true we have a simple deadlock here: >ib_sa_mcmember_rec_query might get called from interrupt >context as well, deadlocking on idp->lock? > >Sean? As a side note, I believe that this is the upstream code and does not include the latest multicast changes. I'm not sure if anything calls into the sa_query interfaces from interrupt context, but I doubt it. 
From my brief look at the initially reported issue, I can't determine if there's an actual problem without studying the ipoib code more. - Sean From zach.brown at oracle.com Tue Jul 11 14:50:55 2006 From: zach.brown at oracle.com (Zach Brown) Date: Tue, 11 Jul 2006 14:50:55 -0700 Subject: [openib-general] ipoib lockdep warning In-Reply-To: <000401c6a532$a3017f50$e598070a@amr.corp.intel.com> References: <000401c6a532$a3017f50$e598070a@amr.corp.intel.com> Message-ID: <44B41D3F.7010409@oracle.com> > As a side note, I believe that this is the upstream code and does not include > the latest multicast changes. Indeed, I should have mentioned in my report that I was running 2.6.17-mm6. - z From johnip at sgi.com Tue Jul 11 14:55:55 2006 From: johnip at sgi.com (John Partridge) Date: Tue, 11 Jul 2006 16:55:55 -0500 Subject: [openib-general] [Bug 146] OFED-1.0 DAPL fails to build on SLES10 on IA64 with IA64_FETCHADD error In-Reply-To: References: <44AD9614.5090907@sgi.com> Message-ID: <44B41E6B.6070703@sgi.com> James Lentini wrote: > This is included on IA64 and PPC systems. Since we have not done > testing on IA64 or PPC, I'm certain that this was a contribution for a > IA64 or PPC DAPL user. For that reason, I'm not certain why the > asm/system.h was included. It is certainly possible that it is > necessary on PPC but not IA64. > James, Thanks your patch has fixed the build errors. But I do suggest we leave system.h included for PPC until we know we don't need it (see my patch I sent to you last week). The resulting build from your last patch has been installed and we are in the process of DAPL tests now. I do know that the libdat works with Intel MPI (although we had to manually create a symlink from libdat.so.1 to libdat.so - should this not already exist?) I do have one question about how the dapl RPM's are organized, we are creating a DAPL interface for the ccNUMA xpmem on SGI Altix systems. At the moment we have a libdat and libdapl(xpmem). 
It is our objective to use the OFED-1.0 libdat. As libdat will be used for non-InfiniBand interfaces, I don't quite understand why libdat (dat.conf) is not a separate RPM, with the libdapl interfaces installed in a separate RPM of their own. Would this not make more sense? Thanks John -- John Partridge Silicon Graphics Inc Tel: 651-683-3428 Vnet: 233-3428 E-Mail: johnip at sgi.com From rdreier at cisco.com Tue Jul 11 15:54:26 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 11 Jul 2006 15:54:26 -0700 Subject: [openib-general] ipoib lockdep warning References: <44B405C8.4040706@oracle.com> Message-ID: No time to really look at this in detail, but I think the issue is a slightly bogus conversion to netif_tx_lock(). Can you try this patch and see if things are better? diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index ab40488..ddd1946 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -820,9 +820,8 @@ void ipoib_mcast_restart_task(void *dev_ ipoib_mcast_stop_thread(dev, 0); - local_irq_save(flags); netif_tx_lock(dev); - spin_lock(&priv->lock); + spin_lock_irqsave(&priv->lock, flags); /* * Unfortunately, the networking core only gives us a list of all of @@ -893,9 +892,8 @@ void ipoib_mcast_restart_task(void *dev_ } } - spin_unlock(&priv->lock); + spin_unlock_irqrestore(&priv->lock, flags); netif_tx_unlock(dev); - local_irq_restore(flags); /* We have to cancel outside of the spinlock */ list_for_each_entry_safe(mcast, tmcast, &remove_list, list) { From mshefty at ichips.intel.com Tue Jul 11 15:55:54 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 11 Jul 2006 15:55:54 -0700 Subject: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1 In-Reply-To: <000301c6a480$6f7cb210$ff0da8c0@amr.corp.intel.com> References: <000301c6a480$6f7cb210$ff0da8c0@amr.corp.intel.com> Message-ID:
<44B42C7A.9030207@ichips.intel.com> Sean Hefty wrote: > @@ -438,6 +493,11 @@ static ssize_t ib_umad_write(struct file > copy_offset = IB_MGMT_RMPP_HDR; > rmpp_active = ib_get_rmpp_flags(&rmpp_mad->rmpp_hdr) & > IB_MGMT_RMPP_FLAG_ACTIVE; > + if (rmpp_active && > + rmpp_mad->rmpp_hdr.rmpp_type != IB_MGMT_RMPP_TYPE_DATA) { > + ret = -EINVAL; > + goto err_ah; > + } It appears that this check results in discarding RMPP responses from OpenSM. For now, I'm going to remove this check from the patch. (My testing seems to work fine then.) Other issues that I've been running into appear to be related to a combination of timing issues running the tests too quickly after starting opensm and running multiple copies of opensm on different systems. - Sean From zach.brown at oracle.com Tue Jul 11 16:27:10 2006 From: zach.brown at oracle.com (Zach Brown) Date: Tue, 11 Jul 2006 16:27:10 -0700 Subject: [openib-general] ipoib lockdep warning In-Reply-To: References: <44B405C8.4040706@oracle.com> Message-ID: <44B433CE.1030103@oracle.com> Roland Dreier wrote: > No time to really look at this in detail, but I think the issue is a > slightly bogus conversion to netif_tx_lock(). Can you try this patch > and see if things are better? > - local_irq_save(flags); > netif_tx_lock(dev); > - spin_lock(&priv->lock); > + spin_lock_irqsave(&priv->lock, flags); Hmm, won't that hold dev->_xmit_lock with interrupts enabled? That seems like it'd deadlock with dev_watchdog grabbing it from a softirq? But setting that aside, I don't think this will help as it doesn't change the order that the locks are acquired in this path. This can still be racing with other cpus that are each in the other two paths (query_idr.lock -> softirq -> dev->_xmit_lock, priv->lock -> query_idr.lock). The local state of interrupt masking on this cpu doesn't stop the other cpus from each grabbing the first lock in their path and then trying to grab the second.
Imagine that they all race to grab the first (A, B, C) and succeed and then all get stuck spinning on their second lock (B, C, A). Maybe you could get the priv->lock here before dev->_xmit_lock. Then we'd have AB, BC, AC, and that's OK. I'm going to guess that this won't work because other paths have dev->_xmit_lock -> priv->lock ordering. Another possibility is masking interrupts when getting query_idr.lock. That drops the implicit dependency between query_idr.lock and dev->_xmit_lock and gives us AB, B, CA -- which is fine. But it means blocking interrupts while in idr_pre_get(), which allocates, so the allocation would have to be GFP_ATOMIC and would be more likely to fail. That's my reading, anyway. - z From mshefty at ichips.intel.com Tue Jul 11 16:41:16 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 11 Jul 2006 16:41:16 -0700 Subject: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1 In-Reply-To: <20060711055134.GB16513@mellanox.co.il> References: <000301c6a480$6f7cb210$ff0da8c0@amr.corp.intel.com> <20060711055134.GB16513@mellanox.co.il> Message-ID: <44B4371C.7020809@ichips.intel.com> Michael S. Tsirkin wrote: > This will be 2.6.18 material, right? I've committed this to svn. Assuming that further testing goes without a hitch, you can pull the actual check-in from revision 8498 to push upstream. - Sean From yates2 at llnl.gov Tue Jul 11 16:39:21 2006 From: yates2 at llnl.gov (Kim Yates) Date: Tue, 11 Jul 2006 16:39:21 -0700 Subject: [openib-general] Call for topics for September OFA developers workshop Message-ID: It's time to start planning for the next Developers' workshop. The plan is to have it on September 28-29, following (overlapping somewhat) Intel's IDF in San Francisco. Please send me suggestions for topics you'd like to see addressed, speakers or activities you'd like to see on the agenda, etc., and I'll collect them and pass them along for discussion.
Recall the agenda from the last developers' workshop is available here: http://openfabrics.org/conference/spring2006sonoma/ ________________________________________________________ Kim Yates LLNL Center for Applied Scientific Computing kimyates at llnl.gov (925) 424-3642 From rdreier at cisco.com Tue Jul 11 16:43:18 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 11 Jul 2006 16:43:18 -0700 Subject: [openib-general] ipoib lockdep warning In-Reply-To: <44B433CE.1030103@oracle.com> (Zach Brown's message of "Tue, 11 Jul 2006 16:27:10 -0700") References: <44B405C8.4040706@oracle.com> <44B433CE.1030103@oracle.com> Message-ID: Hmm, good point. It sort of seems to me like the idr interfaces are broken by design. Internally, lib/idr.c uses bare spin_lock(&idp->lock) with no interrupt disabling or anything in both the idr_pre_get() and idr_get_new() code paths. But idr_pre_get() is supposed to be called in a context that can sleep, while idr_get_new() is supposed to be called with locks held to serialize things (at least according to http://lwn.net/Articles/103209/). So, ugh... maybe the best thing to do is change lib/idr.c to use spin_lock_irqsave() internally? - R. From zach.brown at oracle.com Tue Jul 11 16:53:27 2006 From: zach.brown at oracle.com (Zach Brown) Date: Tue, 11 Jul 2006 16:53:27 -0700 Subject: [openib-general] ipoib lockdep warning In-Reply-To: References: <44B405C8.4040706@oracle.com> <44B433CE.1030103@oracle.com> Message-ID: <44B439F7.3090008@oracle.com> Roland Dreier wrote: > Hmm, good point. > > It sort of seems to me like the idr interfaces are broken by design. > Internally, lib/idr.c uses bare spin_lock(&idp->lock) with no > interrupt disabling or anything in both the idr_pre_get() and > idr_get_new() code paths. I wasn't thrilled to see that either. We seem to have a fair precedent (list.h, rbtree, etc) for leaving serialization to callers. > So, ugh... maybe the best thing to do is change lib/idr.c to use > spin_lock_irqsave() internally? 
I dunno, it seems to have had _irq() locking in the past? From the comment at the top: * Modified by George Anzinger to reuse immediately and to use * find bit instructions. Also removed _irq on spinlocks. - z From rdreier at cisco.com Tue Jul 11 17:06:10 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 11 Jul 2006 17:06:10 -0700 Subject: [openib-general] ipoib lockdep warning In-Reply-To: <44B439F7.3090008@oracle.com> (Zach Brown's message of "Tue, 11 Jul 2006 16:53:27 -0700") References: <44B405C8.4040706@oracle.com> <44B433CE.1030103@oracle.com> <44B439F7.3090008@oracle.com> Message-ID: > > So, ugh... maybe the best thing to do is change lib/idr.c to use > > spin_lock_irqsave() internally? > > I dunno, it seems to have had _irq() locking in the past? From the > comment at the top: > > * Modified by George Anzinger to reuse immediately and to use > * find bit instructions. Also removed _irq on spinlocks. Well, _irq would be no good, because we might want to call idr stuff with interrupts disabled. But making idr internally _irqsave seems like the right fix to me. I think the real issue here is that the sa_query.c stuff wants to use the idr mechanism to assign "query ids", and other modules want to be able to start queries from any context. So if idr uses bare spin_lock internally, then sa_query.c has no choice but to wrap all idr calls inside spin_lock_irqsave and do all allocation with GFP_ATOMIC, which doesn't seem very nice. - R. From rdreier at cisco.com Tue Jul 11 19:33:07 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 11 Jul 2006 19:33:07 -0700 Subject: [openib-general] ipoib lockdep warning In-Reply-To: <44B405C8.4040706@oracle.com> (Zach Brown's message of "Tue, 11 Jul 2006 13:10:48 -0700") References: <44B405C8.4040706@oracle.com> Message-ID: OK, the patch to lib/idr.c below is at least one way to fix this. 
However, with that applied I get the lockdep warning below, which seems to be a false positive -- I'm not sure what the right fix is, but the blame really seems to fall on udp_ioctl for poking into the sk_buff_head lock itself. Ingo and/or Arjan, any thoughts on the idr.c change or the sk_buff_head warning? Thanks, Roland Here's the idr.c change: diff --git a/lib/idr.c b/lib/idr.c index 4d09681..16d2143 100644 --- a/lib/idr.c +++ b/lib/idr.c @@ -38,14 +38,15 @@ static kmem_cache_t *idr_layer_cache; static struct idr_layer *alloc_layer(struct idr *idp) { struct idr_layer *p; + unsigned long flags; - spin_lock(&idp->lock); + spin_lock_irqsave(&idp->lock, flags); if ((p = idp->id_free)) { idp->id_free = p->ary[0]; idp->id_free_cnt--; p->ary[0] = NULL; } - spin_unlock(&idp->lock); + spin_unlock_irqrestore(&idp->lock, flags); return(p); } @@ -59,12 +60,14 @@ static void __free_layer(struct idr *idp static void free_layer(struct idr *idp, struct idr_layer *p) { + unsigned long flags; + /* * Depends on the return element being zeroed. */ - spin_lock(&idp->lock); + spin_lock_irqsave(&idp->lock, flags); __free_layer(idp, p); - spin_unlock(&idp->lock); + spin_unlock_irqrestore(&idp->lock, flags); } /** @@ -168,6 +171,7 @@ static int idr_get_new_above_int(struct { struct idr_layer *p, *new; int layers, v, id; + unsigned long flags; id = starting_id; build_up: @@ -191,14 +195,14 @@ build_up: * The allocation failed. If we built part of * the structure tear it down. 
*/ - spin_lock(&idp->lock); + spin_lock_irqsave(&idp->lock, flags); for (new = p; p && p != idp->top; new = p) { p = p->ary[0]; new->ary[0] = NULL; new->bitmap = new->count = 0; __free_layer(idp, new); } - spin_unlock(&idp->lock); + spin_unlock_irqrestore(&idp->lock, flags); return -1; } new->ary[0] = p; And here's the warning I get, which appears to be a false positive: ====================================================== [ INFO: hard-safe -> hard-unsafe lock order detected ] ------------------------------------------------------ swapper/0 [HC0[0]:SC1[2]:HE0:SE0] is trying to acquire: (&skb_queue_lock_key){-+..}, at: [] skb_queue_tail+0x1d/0x47 and this task is already holding: (&priv->lock){.+..}, at: [] ipoib_mcast_send+0x29/0x413 [ib_ipoib] which would create a new lock dependency: (&priv->lock){.+..} -> (&skb_queue_lock_key){-+..} but this new dependency connects a hard-irq-safe lock: (&priv->tx_lock){+...} ... which became hard-irq-safe at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irqsave+0x2b/0x3c [] ipoib_ib_completion+0x340/0x3ff [ib_ipoib] [] mthca_cq_completion+0x65/0x6b [ib_mthca] [] mthca_eq_int+0x70/0x3d2 [ib_mthca] [] mthca_arbel_interrupt+0x3b/0x98 [ib_mthca] [] handle_IRQ_event+0x28/0x64 [] __do_IRQ+0xac/0x117 [] do_IRQ+0xf7/0x108 [] common_interrupt+0x64/0x65 to a hard-irq-unsafe lock: (&skb_queue_lock_key){-+..} ... which became hard-irq-unsafe at: ... 
[] lock_acquire+0x4a/0x69 [] _spin_lock_bh+0x26/0x33 [] udp_ioctl+0x46/0x87 [] inet_ioctl+0x8c/0x8f [] sock_ioctl+0x1c0/0x1ea [] do_ioctl+0x26/0x74 [] vfs_ioctl+0x24a/0x264 [] sys_ioctl+0x41/0x68 [] system_call+0x7d/0x83 other info that might help us debug this: 2 locks held by swapper/0: #0: (&priv->tx_lock){+...}, at: [] ipoib_start_xmit+0x42/0x66d [ib_ipoib] #1: (&priv->lock){.+..}, at: [] ipoib_mcast_send+0x29/0x413 [ib_ipoib] the hard-irq-safe lock's dependencies: -> (&priv->tx_lock){+...} ops: 0 { initial-use at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irq+0x28/0x35 [] ipoib_mcast_join_finish+0x340/0x379 [ib_ipoib] [] ipoib_mcast_join_complete+0xa2/0x294 [ib_ipoib] [] ib_sa_mcmember_rec_callback+0x4b/0x57 [ib_sa] [] recv_handler+0x3d/0x4a [ib_sa] [] ib_mad_completion_handler+0x3df/0x5e4 [ib_mad] [] run_workqueue+0xa0/0xf7 [] worker_thread+0xee/0x122 [] kthread+0xd0/0xfb [] child_rip+0x7/0x12 in-hardirq-W at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irqsave+0x2b/0x3c [] ipoib_ib_completion+0x340/0x3ff [ib_ipoib] [] mthca_cq_completion+0x65/0x6b [ib_mthca] [] mthca_eq_int+0x70/0x3d2 [ib_mthca] [] mthca_arbel_interrupt+0x3b/0x98 [ib_mthca] [] handle_IRQ_event+0x28/0x64 [] __do_IRQ+0xac/0x117 [] do_IRQ+0xf7/0x108 [] common_interrupt+0x64/0x65 } ... 
key at: [] __key.23652+0x0/0xffffffffffff804f [ib_ipoib] -> (&priv->lock){.+..} ops: 0 { initial-use at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irq+0x28/0x35 [] ipoib_mcast_start_thread+0x6e/0x87 [ib_ipoib] [] ipoib_ib_dev_up+0x52/0x58 [ib_ipoib] [] ipoib_open+0x65/0x102 [ib_ipoib] [] dev_open+0x3a/0x80 [] dev_change_flags+0x65/0x139 [] devinet_ioctl+0x240/0x5e2 [] inet_ioctl+0x70/0x8f [] sock_ioctl+0x1c0/0x1ea [] do_ioctl+0x26/0x74 [] vfs_ioctl+0x24a/0x264 [] sys_ioctl+0x41/0x68 [] system_call+0x7d/0x83 in-softirq-W at: [] lock_acquire+0x4a/0x69 [] _spin_lock+0x21/0x2e [] ipoib_mcast_send+0x28/0x413 [ib_ipoib] [] ipoib_start_xmit+0x361/0x66d [ib_ipoib] [] dev_hard_start_xmit+0x1ab/0x221 [] __qdisc_run+0xfa/0x1cd [] dev_queue_xmit+0x136/0x263 [] neigh_connected_output+0xae/0xc7 [] ip6_output2+0x254/0x28c [ipv6] [] ip6_output+0x7e2/0x7f8 [ipv6] [] ndisc_send_ns+0x38f/0x4c1 [ipv6] [] addrconf_dad_timer+0xfb/0x11e [ipv6] [] run_timer_softirq+0x150/0x1dc [] __do_softirq+0x6b/0xf7 [] call_softirq+0x1d/0x28 [] irq_exit+0x56/0x59 [] smp_apic_timer_interrupt+0x59/0x5f [] apic_timer_interrupt+0x69/0x70 } ... key at: [] __key.23651+0x0/0xffffffffffff8047 [ib_ipoib] -> (&idr_lock){....} ops: 0 { initial-use at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irqsave+0x2b/0x3c [] send_mad+0x39/0x141 [ib_sa] [] ib_sa_mcmember_rec_query+0x147/0x17c [ib_sa] [] ipoib_mcast_join+0x15f/0x1f2 [ib_ipoib] [] ipoib_mcast_join_task+0x24b/0x2e7 [ib_ipoib] [] run_workqueue+0xa0/0xf7 [] worker_thread+0xee/0x122 [] kthread+0xd0/0xfb [] child_rip+0x7/0x12 } ... 
key at: [] __key.17990+0x0/0xffffffffffffc8c2 [ib_sa] -> (query_idr.lock){....} ops: 0 { initial-use at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irqsave+0x2b/0x3c [] free_layer+0x1c/0x40 [] idr_pre_get+0x31/0x42 [] send_mad+0x25/0x141 [ib_sa] [] ib_sa_mcmember_rec_query+0x147/0x17c [ib_sa] [] ipoib_mcast_join+0x15f/0x1f2 [ib_ipoib] [] ipoib_mcast_join_task+0x24b/0x2e7 [ib_ipoib] [] run_workqueue+0xa0/0xf7 [] worker_thread+0xee/0x122 [] kthread+0xd0/0xfb [] child_rip+0x7/0x12 } ... key at: [] query_idr+0x30/0xffffffffffffce4a [ib_sa] ... acquired at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irqsave+0x2b/0x3c [] alloc_layer+0x18/0x4d [] idr_get_new_above_int+0x37/0x225 [] idr_get_new+0xf/0x2f [] send_mad+0x4f/0x141 [ib_sa] [] ib_sa_mcmember_rec_query+0x147/0x17c [ib_sa] [] ipoib_mcast_join+0x15f/0x1f2 [ib_ipoib] [] ipoib_mcast_join_task+0x24b/0x2e7 [ib_ipoib] [] run_workqueue+0xa0/0xf7 [] worker_thread+0xee/0x122 [] kthread+0xd0/0xfb [] child_rip+0x7/0x12 ... acquired at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irqsave+0x2b/0x3c [] ib_sa_cancel_query+0x1b/0x73 [ib_sa] [] wait_for_mcast_join+0x35/0xd2 [ib_ipoib] [] ipoib_mcast_stop_thread+0xa6/0xe6 [ib_ipoib] [] ipoib_mcast_restart_task+0x4b/0x3d9 [ib_ipoib] [] run_workqueue+0xa0/0xf7 [] worker_thread+0xee/0x122 [] kthread+0xd0/0xfb [] child_rip+0x7/0x12 -> (&mad_agent_priv->lock){....} ops: 0 { initial-use at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irqsave+0x2b/0x3c [] ib_post_send_mad+0x3e0/0x512 [ib_mad] [] agent_send_response+0x11b/0x155 [ib_mad] [] ib_mad_completion_handler+0x2ea/0x5e4 [ib_mad] [] run_workqueue+0xa0/0xf7 [] worker_thread+0xee/0x122 [] kthread+0xd0/0xfb [] child_rip+0x7/0x12 } ... 
key at: [] __key.17415+0x0/0xffffffffffffaf74 [ib_mad] -> (base_lock_keys + cpu){++..} ops: 0 { initial-use at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irq+0x28/0x35 [] run_timer_softirq+0x41/0x1dc [] __do_softirq+0x6b/0xf7 [] call_softirq+0x1d/0x28 [] irq_exit+0x56/0x59 [] do_IRQ+0xfc/0x108 [] common_interrupt+0x64/0x65 in-hardirq-W at: [] lock_acquire+0x4a/0x69 [] _spin_lock+0x21/0x2e [] __mod_timer+0x93/0xc8 [] mod_timer+0x2c/0x2f [] i8042_interrupt+0x2a/0x222 [] handle_IRQ_event+0x28/0x64 [] __do_IRQ+0xac/0x117 [] do_IRQ+0xf7/0x108 [] common_interrupt+0x64/0x65 [<7ffffffffffffffd>] 0x7ffffffffffffffd [] call_softirq+0x1d/0x28 [] irq_exit+0x56/0x59 [] smp_apic_timer_interrupt+0x59/0x5f [] apic_timer_interrupt+0x69/0x70 in-softirq-W at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irq+0x28/0x35 [] run_timer_softirq+0x41/0x1dc [] __do_softirq+0x6b/0xf7 [] call_softirq+0x1d/0x28 [] irq_exit+0x56/0x59 [] do_IRQ+0xfc/0x108 [] common_interrupt+0x64/0x65 } ... key at: [] base_lock_keys+0x0/0x20 ... 
acquired at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irqsave+0x2b/0x3c [] lock_timer_base+0x21/0x47 [] try_to_del_timer_sync+0x14/0x62 [] del_timer_sync+0x11/0x1a [] ib_mad_complete_send_wr+0x10b/0x1f8 [ib_mad] [] ib_mad_send_done_handler+0x125/0x17a [ib_mad] [] ib_mad_completion_handler+0x571/0x5e4 [ib_mad] [] run_workqueue+0xa0/0xf7 [] worker_thread+0xee/0x122 [] kthread+0xd0/0xfb [] child_rip+0x7/0x12 -> (base_lock_keys + cpu#4){++..} ops: 0 { initial-use at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irq+0x28/0x35 [] run_timer_softirq+0x41/0x1dc [] __do_softirq+0x6b/0xf7 [] call_softirq+0x1d/0x28 [] irq_exit+0x56/0x59 [] smp_apic_timer_interrupt+0x59/0x5f [] apic_timer_interrupt+0x69/0x70 in-hardirq-W at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irqsave+0x2b/0x3c [] lock_timer_base+0x21/0x47 [] del_timer+0x1e/0x5f [] scsi_delete_timer+0x12/0x5d [] scsi_done+0xd/0x1e [] ata_scsi_qc_complete+0xb1/0xc3 [] __ata_qc_complete+0x218/0x225 [] ata_qc_complete+0xcf/0xd5 [] ata_hsm_qc_complete+0x1c6/0x1d8 [] ata_hsm_move+0x618/0x638 [] ata_interrupt+0x161/0x1ae [] handle_IRQ_event+0x28/0x64 [] __do_IRQ+0xac/0x117 [] do_IRQ+0xf7/0x108 [] common_interrupt+0x64/0x65 in-softirq-W at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irq+0x28/0x35 [] run_timer_softirq+0x41/0x1dc [] __do_softirq+0x6b/0xf7 [] call_softirq+0x1d/0x28 [] irq_exit+0x56/0x59 [] smp_apic_timer_interrupt+0x59/0x5f [] apic_timer_interrupt+0x69/0x70 } ... key at: [] base_lock_keys+0x18/0x20 ... 
acquired at: [] lock_acquire+0x4a/0x69 [] _spin_lock+0x21/0x2e [] __mod_timer+0x93/0xc8 [] queue_delayed_work+0x75/0x7d [] wait_for_response+0xdf/0xe8 [ib_mad] [] ib_mad_complete_send_wr+0xc1/0x1f8 [ib_mad] [] ib_mad_send_done_handler+0x125/0x17a [ib_mad] [] ib_mad_completion_handler+0x571/0x5e4 [ib_mad] [] run_workqueue+0xa0/0xf7 [] worker_thread+0xee/0x122 [] kthread+0xd0/0xfb [] child_rip+0x7/0x12 -> (base_lock_keys + cpu#2){++..} ops: 0 { initial-use at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irq+0x28/0x35 [] run_timer_softirq+0x41/0x1dc [] __do_softirq+0x6b/0xf7 [] call_softirq+0x1d/0x28 [] irq_exit+0x56/0x59 [] smp_apic_timer_interrupt+0x59/0x5f [] apic_timer_interrupt+0x69/0x70 in-hardirq-W at: [] lock_acquire+0x4a/0x69 [] _spin_lock+0x21/0x2e [] __mod_timer+0x93/0xc8 [] mod_timer+0x2c/0x2f [] i8042_interrupt+0x2a/0x222 [] handle_IRQ_event+0x28/0x64 [] __do_IRQ+0xac/0x117 [] do_IRQ+0xf7/0x108 [] common_interrupt+0x64/0x65 in-softirq-W at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irq+0x28/0x35 [] run_timer_softirq+0x41/0x1dc [] __do_softirq+0x6b/0xf7 [] call_softirq+0x1d/0x28 [] irq_exit+0x56/0x59 [] smp_apic_timer_interrupt+0x59/0x5f [] apic_timer_interrupt+0x69/0x70 } ... key at: [] base_lock_keys+0x8/0x20 ... acquired at: [] lock_acquire+0x4a/0x69 [] _spin_lock+0x21/0x2e [] __mod_timer+0x93/0xc8 [] queue_delayed_work+0x75/0x7d [] wait_for_response+0xdf/0xe8 [ib_mad] [] ib_mad_complete_send_wr+0xc1/0x1f8 [ib_mad] [] ib_mad_send_done_handler+0x125/0x17a [ib_mad] [] ib_mad_completion_handler+0x571/0x5e4 [ib_mad] [] run_workqueue+0xa0/0xf7 [] worker_thread+0xee/0x122 [] kthread+0xd0/0xfb [] child_rip+0x7/0x12 ... 
acquired at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irqsave+0x2b/0x3c [] ib_modify_mad+0x27/0x14c [ib_mad] [] ib_cancel_mad+0xa/0xd [ib_mad] [] ib_sa_cancel_query+0x6a/0x73 [ib_sa] [] wait_for_mcast_join+0x35/0xd2 [ib_ipoib] [] ipoib_mcast_stop_thread+0xa6/0xe6 [ib_ipoib] [] ipoib_mcast_restart_task+0x4b/0x3d9 [ib_ipoib] [] run_workqueue+0xa0/0xf7 [] worker_thread+0xee/0x122 [] kthread+0xd0/0xfb [] child_rip+0x7/0x12 -> (modlist_lock){.+..} ops: 0 { initial-use at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irqsave+0x2b/0x3c [] module_text_address+0x15/0x3b [] __register_kprobe+0x2bc/0x2da [] register_kprobe+0xc/0xf [] arch_init_kprobes+0xf/0x12 [] init_kprobes+0x3f/0x52 [] init+0x143/0x308 [] child_rip+0x7/0x12 in-softirq-W at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irqsave+0x2b/0x3c [] is_module_address+0x14/0x80 [] static_obj+0x6c/0x79 [] lockdep_init_map+0x7d/0xc3 [] __spin_lock_init+0x2e/0x58 [] ipoib_mcast_alloc+0x97/0xce [ib_ipoib] [] ipoib_mcast_send+0x109/0x413 [ib_ipoib] [] ipoib_start_xmit+0x361/0x66d [ib_ipoib] [] dev_hard_start_xmit+0x1ab/0x221 [] __qdisc_run+0xfa/0x1cd [] dev_queue_xmit+0x136/0x263 [] neigh_connected_output+0xae/0xc7 [] ip6_output2+0x254/0x28c [ipv6] [] ip6_output+0x7e2/0x7f8 [ipv6] [] ndisc_send_rs+0x33d/0x46f [ipv6] [] addrconf_dad_completed+0x90/0xe2 [ipv6] [] addrconf_dad_timer+0x74/0x11e [ipv6] [] run_timer_softirq+0x150/0x1dc [] __do_softirq+0x6b/0xf7 [] call_softirq+0x1d/0x28 [] irq_exit+0x56/0x59 [] smp_apic_timer_interrupt+0x59/0x5f [] apic_timer_interrupt+0x69/0x70 } ... key at: [] modlist_lock+0x18/0x40 ... 
acquired at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irqsave+0x2b/0x3c [] is_module_address+0x14/0x80 [] static_obj+0x6c/0x79 [] lockdep_init_map+0x7d/0xc3 [] __spin_lock_init+0x2e/0x58 [] ipoib_mcast_alloc+0x97/0xce [ib_ipoib] [] ipoib_mcast_restart_task+0x174/0x3d9 [ib_ipoib] [] run_workqueue+0xa0/0xf7 [] worker_thread+0xee/0x122 [] kthread+0xd0/0xfb [] child_rip+0x7/0x12 -> (&qp->sq.lock){.+..} ops: 0 { initial-use at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irq+0x28/0x35 [] mthca_modify_qp+0x67/0xd7b [ib_mthca] [] ib_modify_qp+0xc/0xf [ib_core] [] ib_mad_init_device+0x2db/0x55c [ib_mad] [] ib_register_device+0x20d/0x300 [ib_core] [] mthca_register_device+0x3e0/0x430 [ib_mthca] [] mthca_init_one+0xbc2/0xcbc [ib_mthca] [] pci_device_probe+0x4b/0x72 [] driver_probe_device+0x59/0xaf [] __driver_attach+0x58/0x90 [] bus_for_each_dev+0x48/0x7a [] driver_attach+0x1b/0x1e [] bus_add_driver+0x74/0x112 [] driver_register+0x8c/0x91 [] __pci_register_driver+0x60/0x84 [] 0xffffffff88022016 [] sys_init_module+0x174c/0x1884 [] system_call+0x7d/0x83 in-softirq-W at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irqsave+0x2b/0x3c [] mthca_arbel_post_send+0x2f/0x5be [ib_mthca] [] ipoib_send+0x117/0x1d6 [ib_ipoib] [] ipoib_mcast_send+0x3fb/0x413 [ib_ipoib] [] ipoib_start_xmit+0x361/0x66d [ib_ipoib] [] dev_hard_start_xmit+0x1ab/0x221 [] __qdisc_run+0xfa/0x1cd [] dev_queue_xmit+0x136/0x263 [] neigh_connected_output+0xae/0xc7 [] ip6_output2+0x254/0x28c [ipv6] [] ip6_output+0x7e2/0x7f8 [ipv6] [] ndisc_send_ns+0x38f/0x4c1 [ipv6] [] addrconf_dad_timer+0xfb/0x11e [ipv6] [] run_timer_softirq+0x150/0x1dc [] __do_softirq+0x6b/0xf7 [] call_softirq+0x1d/0x28 [] irq_exit+0x56/0x59 [] smp_apic_timer_interrupt+0x59/0x5f [] apic_timer_interrupt+0x69/0x70 } ... 
key at: [] __key.20374+0x0/0xffffffffffff231a [ib_mthca] -> (&qp->rq.lock){+...} ops: 0 { initial-use at: [] lock_acquire+0x4a/0x69 [] _spin_lock+0x21/0x2e [] mthca_modify_qp+0x76/0xd7b [ib_mthca] [] ib_modify_qp+0xc/0xf [ib_core] [] ib_mad_init_device+0x2db/0x55c [ib_mad] [] ib_register_device+0x20d/0x300 [ib_core] [] mthca_register_device+0x3e0/0x430 [ib_mthca] [] mthca_init_one+0xbc2/0xcbc [ib_mthca] [] pci_device_probe+0x4b/0x72 [] driver_probe_device+0x59/0xaf [] __driver_attach+0x58/0x90 [] bus_for_each_dev+0x48/0x7a [] driver_attach+0x1b/0x1e [] bus_add_driver+0x74/0x112 [] driver_register+0x8c/0x91 [] __pci_register_driver+0x60/0x84 [] 0xffffffff88022016 [] sys_init_module+0x174c/0x1884 [] system_call+0x7d/0x83 in-hardirq-W at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irqsave+0x2b/0x3c [] mthca_arbel_post_receive+0x35/0x23a [ib_mthca] [] ipoib_ib_post_receive+0x81/0xf9 [ib_ipoib] [] ipoib_ib_completion+0x286/0x3ff [ib_ipoib] [] mthca_cq_completion+0x65/0x6b [ib_mthca] [] mthca_eq_int+0x70/0x3d2 [ib_mthca] [] mthca_arbel_interrupt+0x3b/0x98 [ib_mthca] [] handle_IRQ_event+0x28/0x64 [] __do_IRQ+0xac/0x117 [] do_IRQ+0xf7/0x108 [] common_interrupt+0x64/0x65 } ... key at: [] __key.20375+0x0/0xffffffffffff2322 [ib_mthca] ... acquired at: [] lock_acquire+0x4a/0x69 [] _spin_lock+0x21/0x2e [] mthca_modify_qp+0x76/0xd7b [ib_mthca] [] ib_modify_qp+0xc/0xf [ib_core] [] ib_mad_init_device+0x2db/0x55c [ib_mad] [] ib_register_device+0x20d/0x300 [ib_core] [] mthca_register_device+0x3e0/0x430 [ib_mthca] [] mthca_init_one+0xbc2/0xcbc [ib_mthca] [] pci_device_probe+0x4b/0x72 [] driver_probe_device+0x59/0xaf [] __driver_attach+0x58/0x90 [] bus_for_each_dev+0x48/0x7a [] driver_attach+0x1b/0x1e [] bus_add_driver+0x74/0x112 [] driver_register+0x8c/0x91 [] __pci_register_driver+0x60/0x84 [] 0xffffffff88022016 [] sys_init_module+0x174c/0x1884 [] system_call+0x7d/0x83 ... 
acquired at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irqsave+0x2b/0x3c [] mthca_arbel_post_send+0x2f/0x5be [ib_mthca] [] ipoib_send+0x117/0x1d6 [ib_ipoib] [] ipoib_mcast_send+0x3fb/0x413 [ib_ipoib] [] ipoib_start_xmit+0x361/0x66d [ib_ipoib] [] dev_hard_start_xmit+0x1ab/0x221 [] __qdisc_run+0xfa/0x1cd [] dev_queue_xmit+0x136/0x263 [] neigh_connected_output+0xae/0xc7 [] ip6_output2+0x254/0x28c [ipv6] [] ip6_output+0x7e2/0x7f8 [ipv6] [] ndisc_send_ns+0x38f/0x4c1 [ipv6] [] addrconf_dad_timer+0xfb/0x11e [ipv6] [] run_timer_softirq+0x150/0x1dc [] __do_softirq+0x6b/0xf7 [] call_softirq+0x1d/0x28 [] irq_exit+0x56/0x59 [] smp_apic_timer_interrupt+0x59/0x5f [] apic_timer_interrupt+0x69/0x70 ... acquired at: [] lock_acquire+0x4a/0x69 [] _spin_lock+0x21/0x2e [] ipoib_mcast_send+0x28/0x413 [ib_ipoib] [] ipoib_start_xmit+0x361/0x66d [ib_ipoib] [] dev_hard_start_xmit+0x1ab/0x221 [] __qdisc_run+0xfa/0x1cd [] dev_queue_xmit+0x136/0x263 [] neigh_connected_output+0xae/0xc7 [] ip6_output2+0x254/0x28c [ipv6] [] ip6_output+0x7e2/0x7f8 [ipv6] [] ndisc_send_ns+0x38f/0x4c1 [ipv6] [] addrconf_dad_timer+0xfb/0x11e [ipv6] [] run_timer_softirq+0x150/0x1dc [] __do_softirq+0x6b/0xf7 [] call_softirq+0x1d/0x28 [] irq_exit+0x56/0x59 [] smp_apic_timer_interrupt+0x59/0x5f [] apic_timer_interrupt+0x69/0x70 the hard-irq-unsafe lock's dependencies: -> (&skb_queue_lock_key){-+..} ops: 0 { initial-use at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irqsave+0x2b/0x3c [] skb_queue_tail+0x1c/0x47 [] netlink_broadcast+0x212/0x2ea [] kobject_uevent+0x3c9/0x43d [] store_uevent+0x16/0x1e [] class_device_attr_store+0x1b/0x1e [] sysfs_write_file+0xb7/0xe4 [] vfs_write+0xad/0x154 [] sys_write+0x46/0x6f [] system_call+0x7d/0x83 in-softirq-W at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irqsave+0x2b/0x3c [] skb_dequeue+0x18/0x5c [] skb_queue_purge+0x18/0x26 [] neigh_timer_handler+0x26f/0x346 [] run_timer_softirq+0x150/0x1dc [] __do_softirq+0x6b/0xf7 [] call_softirq+0x1d/0x28 [] irq_exit+0x56/0x59 [] 
smp_apic_timer_interrupt+0x59/0x5f [] apic_timer_interrupt+0x69/0x70 hardirq-on-W at: [] lock_acquire+0x4a/0x69 [] _spin_lock_bh+0x26/0x33 [] udp_ioctl+0x46/0x87 [] inet_ioctl+0x8c/0x8f [] sock_ioctl+0x1c0/0x1ea [] do_ioctl+0x26/0x74 [] vfs_ioctl+0x24a/0x264 [] sys_ioctl+0x41/0x68 [] system_call+0x7d/0x83 } ... key at: [] skb_queue_lock_key+0x0/0x18 stack backtrace: Call Trace: [] show_trace+0xaa/0x238 [] dump_stack+0x13/0x15 [] check_usage+0x282/0x293 [] __lock_acquire+0x85c/0xa29 [] lock_acquire+0x4b/0x69 [] _spin_lock_irqsave+0x2c/0x3c [] skb_queue_tail+0x1d/0x47 [] :ib_ipoib:ipoib_mcast_send+0x19a/0x413 [] :ib_ipoib:ipoib_start_xmit+0x362/0x66d [] dev_hard_start_xmit+0x1ac/0x221 [] __qdisc_run+0xfb/0x1cd [] dev_queue_xmit+0x137/0x263 [] neigh_connected_output+0xaf/0xc7 [] :ipv6:ip6_output2+0x255/0x28c [] :ipv6:ip6_output+0x7e3/0x7f8 [] :ipv6:ndisc_send_rs+0x33e/0x46f [] :ipv6:addrconf_dad_completed+0x91/0xe2 [] :ipv6:addrconf_dad_timer+0x75/0x11e [] run_timer_softirq+0x151/0x1dc [] __do_softirq+0x6c/0xf7 [] call_softirq+0x1e/0x28 ib0: no IPv6 routers present From tziporet at mellanox.co.il Tue Jul 11 22:53:07 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Wed, 12 Jul 2006 08:53:07 +0300 Subject: [openib-general] OFED 1.1 release - schedule and features Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7421@mtlexch01.mtl.com> Hi All, I wish to start the release process of OFED 1.1. I would like that we will have a meeting next Monday to review this proposal of the release features and schedule. 
If possible I wish to move the meeting hour from 9am PST to 11am or 11:30am PST Tziporet ------------------------------------------------------------------------ --------- Schedule: Target release date: 24-Aug Intermediate milestones: * Development: now - 31-Jul * Create 1.1 branch of user level code and rc1: 24-Jul * Features freeze (rc2): 31-Jul * Code freeze (rc-x): 18- Aug Features: * OS: * Novell: - SLES 9.0 SP3* - SLES10 (official release)* * Redhat: - Redhat EL4 up2 - Redhat EL4 up3 * kernel.org: - Kernel 2.6.17* * changes from last release Note: Fedora C4 and SuSE Pro 10 were dropped from the list since I have not seen so many customers requesting them. We will keep the backport patches for these OSes and make sure OFED compile and loaded properly but will not do full QA cycle. Please reply if this is acceptable * General changes: - lib32 on 64 bits systems - Add madeye utility - Kernel code based on 2.6.18 - Bug fixes * Core: - Set options in CMA & uCMA (needed for Intel MPI) - HCA fatal - full flow support - Huge pages support * OSM: - Partition Manager (Pkey) - Pre-computed routing load from file * SDP: - Beta quality - Improved latency - Improved bandwidth of small messages (by implementing the Naggle algorithm) - Support the backlog parameter in the listen call - Interoperability with other SDP implementations - support sending/receiving out of band data * SRP: - GA quality - DM (Device Mapper) - for high availability - Basic failover/failback testing with daemon+srp+XVM/MPP and Engenio target * IPoIB - Performance tuning - Bonding - for high availability * uDAPL: - Scalability features needed for Intel MPI - take from trunk * Arlin & James - please reply if there are more features needed. 
* OSU - MVAPICH - Based on 0.97 (we will not move to 0.98 since we tested it and found it is less stable then 0.97) - Message coalescing * Open MPI - TBD from Jeff * MPI tests: - Replace to the new test versions from LLNL, Intel, OSU * iSER - Any update Voltaire will drive to kernel 2.6.18 * RDS: - TBD - Oracle and SilverStorm should decide what should be in. Tziporet Koren Software Director Mellanox Technologies mailto: tziporet at mellanox.co.il Tel +972-4-9097200, ext 380 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ogerlitz at voltaire.com Wed Jul 12 00:13:44 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 12 Jul 2006 10:13:44 +0300 Subject: [openib-general] link to the ipoib charter page pointed by the kernel IB docs is broken Message-ID: <44B4A128.6010900@voltaire.com> Hi Hal, I think you were involved in setting/updating the pointers from the IPoIB kernel docs to the IETF website... I see now that the link pointed by drivers/infiniband/ulp/ipoib/Kconfig and Documentation/infiniband/ipoib is broken, i can find many copies of it eg http://mirror.switch.ch/ftp/doc/ietf/ipoib/ipoib-charter.txt but the original one http://www.ietf.org/html.charters/ipoib-charter.html does not work. Or. From ogerlitz at voltaire.com Wed Jul 12 00:21:28 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 12 Jul 2006 10:21:28 +0300 Subject: [openib-general] OFED 1.1 release - schedule and features In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7421@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7421@mtlexch01.mtl.com> Message-ID: <44B4A2F8.3000609@voltaire.com> Tziporet Koren wrote: > I wish to start the release process of OFED 1.1. > • IPoIB > – Bonding - for high availability Can you point me to the person/company which is working on this? I've started to look on it as well so we can exchange ideas and join forces. 
> • iSER > – Any update Voltaire will drive to kernel 2.6.18 OK, please use the iser code in the for-2.6.18 branch of Roland's GIT tree @kernel.org Or. From sweitzen at cisco.com Wed Jul 12 00:27:14 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 12 Jul 2006 00:27:14 -0700 Subject: [openib-general] OFED 1.1 release - schedule and features Message-ID: For SDP, I would like to see "improved stability" (maybe you have this in mind under "beta quality"), also how about "AIO support"? The rest of the list looks good. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems ________________________________ From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Tziporet Koren Sent: Tuesday, July 11, 2006 10:53 PM To: OpenFabricsEWG Cc: openib Subject: [openib-general] OFED 1.1 release - schedule and features Hi All, I wish to start the release process of OFED 1.1. I would like that we will have a meeting next Monday to review this proposal of the release features and schedule. If possible I wish to move the meeting hour from 9am PST to 11am or 11:30am PST Tziporet ------------------------------------------------------------------------ --------- Schedule: Target release date: 24-Aug Intermediate milestones: * Development: now - 31-Jul * Create 1.1 branch of user level code and rc1: 24-Jul * Features freeze (rc2): 31-Jul * Code freeze (rc-x): 18- Aug Features: * OS: * Novell: - SLES 9.0 SP3* - SLES10 (official release)* * Redhat: - Redhat EL4 up2 - Redhat EL4 up3 * kernel.org: - Kernel 2.6.17* * changes from last release Note: Fedora C4 and SuSE Pro 10 were dropped from the list since I have not seen so many customers requesting them. We will keep the backport patches for these OSes and make sure OFED compile and loaded properly but will not do full QA cycle. 
Please reply if this is acceptable * General changes: - lib32 on 64 bits systems - Add madeye utility - Kernel code based on 2.6.18 - Bug fixes * Core: - Set options in CMA & uCMA (needed for Intel MPI) - HCA fatal - full flow support - Huge pages support * OSM: - Partition Manager (Pkey) - Pre-computed routing load from file * SDP: - Beta quality - Improved latency - Improved bandwidth of small messages (by implementing the Naggle algorithm) - Support the backlog parameter in the listen call - Interoperability with other SDP implementations - support sending/receiving out of band data * SRP: - GA quality - DM (Device Mapper) - for high availability - Basic failover/failback testing with daemon+srp+XVM/MPP and Engenio target * IPoIB - Performance tuning - Bonding - for high availability * uDAPL: - Scalability features needed for Intel MPI - take from trunk * Arlin & James - please reply if there are more features needed. * OSU - MVAPICH - Based on 0.97 (we will not move to 0.98 since we tested it and found it is less stable then 0.97) - Message coalescing * Open MPI - TBD from Jeff * MPI tests: - Replace to the new test versions from LLNL, Intel, OSU * iSER - Any update Voltaire will drive to kernel 2.6.18 * RDS: - TBD - Oracle and SilverStorm should decide what should be in. Tziporet Koren Software Director Mellanox Technologies mailto: tziporet at mellanox.co.il Tel +972-4-9097200, ext 380 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tziporet at mellanox.co.il Wed Jul 12 00:58:12 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Wed, 12 Jul 2006 10:58:12 +0300 Subject: [openib-general] OFED 1.1 release - schedule and features In-Reply-To: <44B4A2F8.3000609@voltaire.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7421@mtlexch01.mtl.com> <44B4A2F8.3000609@voltaire.com> Message-ID: <44B4AB94.1080405@mellanox.co.il> Or Gerlitz wrote: >> • IPoIB >> – Bonding - for high availability > > Can you point me to the person/company which is working on this? I've > started to look on it as well so we can exchange ideas and join forces. Vlad and Eitan from Mellanox are working on this > > >> • iSER >> – Any update Voltaire will drive to kernel 2.6.18 > > OK, please use the iser code in the for-2.6.18 branch of Roland's GIT > tree @kernel.org This is what we are doing Tziporet From tziporet at mellanox.co.il Wed Jul 12 00:59:27 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Wed, 12 Jul 2006 10:59:27 +0300 Subject: [openib-general] OFED 1.1 release - schedule and features In-Reply-To: References: Message-ID: <44B4ABDF.9030301@mellanox.co.il> Scott Weitzenkamp (sweitzen) wrote: > For SDP, I would like to see "improved stability" (maybe you have this > in mind under "beta quality"), also how about "AIO support"? The rest > of the list looks good. > Yes - beta quality means improved stability. AIO is not planned for 1.1 (schedule issue). If needed we can add it to 1.2 Tziporet From mst at mellanox.co.il Wed Jul 12 01:40:11 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 12 Jul 2006 11:40:11 +0300 Subject: [openib-general] [PATCH 0/2] perftest: enhancement to rdma_bw to allow use of RDMA CM In-Reply-To: <20060705093054.GA15470@harry-potter.in.ibm.com> References: <20060705093054.GA15470@harry-potter.in.ibm.com> Message-ID: <20060712084011.GA18466@mellanox.co.il> Quoting r.
Pradipta Kumar Banerjee : > Subject: [PATCH 0/2] perftest: enhancement to rdma_bw to allow use of RDMA CM > > This patchset allows rdma_bw to use RDMA CM. This patch tries to address the > comments from Michael Tsirkin on the earlier posted patch by Steve Wise. > See the thread below for history. Thanks - checked in revision 8498. -- MST From krkumar2 at in.ibm.com Wed Jul 12 02:30:37 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Wed, 12 Jul 2006 15:00:37 +0530 Subject: [openib-general] [RFC] [PATCH 2/7] ibrdmaverbs config files 2 Message-ID: <1152696637.4636.19.camel@localhost.localdomain> > The problem with that is then there are two libraries to maintain, > fixes have to be merged twice, etc. It's much better to follow an > evolutionary path. Thanks for the feedback. OK, I will make the changes and re-submit. From mingo at elte.hu Wed Jul 12 02:38:20 2006 From: mingo at elte.hu (Ingo Molnar) Date: Wed, 12 Jul 2006 11:38:20 +0200 Subject: [openib-general] ipoib lockdep warning In-Reply-To: References: <44B405C8.4040706@oracle.com> <44B433CE.1030103@oracle.com> Message-ID: <20060712093820.GA9218@elte.hu> * Roland Dreier wrote: > Hmm, good point. > > It sort of seems to me like the idr interfaces are broken by design. [...] > So, ugh... maybe the best thing to do is change lib/idr.c to use > spin_lock_irqsave() internally? i agree that the IDR subsystem should be irq-safe if GFP_ATOMIC is passed in. So the _irqsave()/_irqrestore() fix should be done. But i also think that you should avoid using GFP_ATOMIC for any sort of reliable IO path and push as much work into process context as possible. Is it acceptable for your infiniband IO model to fail with -ENOMEM if GFP_ATOMIC happens to fail, and is the IO retried transparently? 
Ingo From halr at voltaire.com Wed Jul 12 03:32:33 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Jul 2006 06:32:33 -0400 Subject: [openib-general] [PATCH] OpenSM: Support configurable SL per partition In-Reply-To: <44B3C059020000C700012B8C@sinclair.provo.novell.com> References: <44B3C059020000C700012B8C@sinclair.provo.novell.com> Message-ID: <1152700352.13048.112347.camel@hal.voltaire.com> Hi Pat, On Tue, 2006-07-11 at 17:14, Patrick Mullaney wrote: > This will avoid an invalid warning about service level value if sl=0 is > used in the partition config file. Can you include something like it in > your original patch? Yes, SL 0 is valid so this warning should not be output. I will include this patch when I commit this functionality which will hopefully be over the next few days. Thanks for catching this! -- Hal > > Thanks. > Pat > > -------------------------------------------------------------------------------------------------- > > --- opensm.old/osm_prtn_config.c 2006-07-11 16:06:33.000000000 > -0400 > +++ opensm/osm_prtn_config.c 2006-07-11 16:08:00.000000000 -0400 > @@ -52,6 +52,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -159,7 +160,7 @@ static int partition_add_flag(unsigned l > } else if (!strncmp(flag, "sl", len)) { > unsigned sl; > > - if (!val || (sl = strtoul(val, NULL, 0)) == 0 || sl > > 15) > + if (!val || (sl = strtoul(val, NULL, 0)) == UINT_MAX || > sl > 15) > PARSEWARN(conf->p_log, lineno, > "flag \'sl\' requires valid value" > " - skipped.\n"); > From halr at voltaire.com Wed Jul 12 03:36:55 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Jul 2006 06:36:55 -0400 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 release - schedule and features In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7421@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7421@mtlexch01.mtl.com> Message-ID: <1152700614.13048.112582.camel@hal.voltaire.com> On Wed, 2006-07-12 at 01:53, Tziporet Koren 
wrote: > Hi All, > > > > I wish to start the release process of OFED 1.1. > > I would like that we will have a meeting next Monday to review this > proposal of the release features and schedule. > > If possible I wish to move the meeting hour from 9am PST to 11am or > 11:30am PST > > > > Tziporet > > > > --------------------------------------------------------------------------------- > > > > Schedule: > > Target release date: 24-Aug > > Intermediate milestones: > > • Development: now – 31-Jul > > • Create 1.1 branch of user level code and rc1: 24-Jul > > • Features freeze (rc2): 31-Jul > > • Code freeze (rc-x): 18- Aug > > > > Features: > > • OS: > > • Novell: > > – SLES 9.0 SP3* > > – SLES10 (official release)* > > • Redhat: > > – Redhat EL4 up2 > > – Redhat EL4 up3 > > • kernel.org: > > – Kernel 2.6.17* > > * changes from last release > > > > Note: Fedora C4 and SuSE Pro 10 were dropped from the list since I > have not seen so many customers requesting them. > > We will keep the backport patches for these OSes and make sure OFED > compile and loaded properly but will not do full QA cycle. > > Please reply if this is acceptable > > > > • General changes: > > – lib32 on 64 bits systems > > – Add madeye utility > > – Kernel code based on 2.6.18 > > – Bug fixes > > • Core: > > – Set options in CMA & uCMA (needed for Intel MPI) > > – HCA fatal - full flow support > > – Huge pages support > > • OSM: > > – Partition Manager (Pkey) Also, primitive QoS support. 
> – Pre-computed routing load from file Also, diags: Add saquery tool Enhancement to ibnetdiscover tool with grouping function > • SDP: > > – Beta quality > > – Improved latency > > – Improved bandwidth of small messages (by implementing the > Naggle algorithm) > > – Support the backlog parameter in the listen call > > – Interoperability with other SDP implementations > > – support sending/receiving out of band data > > • SRP: > > – GA quality > > – DM (Device Mapper) - for high availability > > – Basic failover/failback testing with daemon+srp+XVM/MPP and > Engenio target > > • IPoIB > > – Performance tuning > > – Bonding - for high availability > > • uDAPL: > > – Scalability features needed for Intel MPI – take from trunk > > • Arlin & James – please reply if there are more features > needed. > > • OSU - MVAPICH > > – Based on 0.97 (we will not move to 0.98 since we tested it > and found it is less stable then 0.97) > > – Message coalescing > > • Open MPI > > – TBD from Jeff > > • MPI tests: > > – Replace to the new test versions from LLNL, Intel, OSU > > • iSER > > – Any update Voltaire will drive to kernel 2.6.18 > > • RDS: > > – TBD – Oracle and SilverStorm should decide what should be in. 
> > > > > > Tziporet Koren > > Software Director > > Mellanox Technologies > > mailto: tziporet at mellanox.co.il > Tel +972-4-9097200, ext 380 > > > > > > ______________________________________________________________________ > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg From halr at voltaire.com Wed Jul 12 03:46:43 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Jul 2006 06:46:43 -0400 Subject: [openib-general] link to the ipoib charter page pointed by the kernel IB docs is broken In-Reply-To: <44B4A128.6010900@voltaire.com> References: <44B4A128.6010900@voltaire.com> Message-ID: <1152701198.13048.113048.camel@hal.voltaire.com> Hi Or, On Wed, 2006-07-12 at 03:13, Or Gerlitz wrote: > Hi Hal, > > I think you were involved in setting/updating the pointers from the > IPoIB kernel docs to the IETF website... > I see now that the link pointed by drivers/infiniband/ulp/ipoib/Kconfig > and Documentation/infiniband/ipoib is broken, i can find many copies of > it eg http://mirror.switch.ch/ftp/doc/ietf/ipoib/ipoib-charter.txt but > the original one http://www.ietf.org/html.charters/ipoib-charter.html > does not work. That is likely due to the IPoIB WG was recently disbanded (as of the Montreal IETF this week). Those pointers/files need updating now. -- Hal > Or. > From tziporet at mellanox.co.il Wed Jul 12 03:51:16 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Wed, 12 Jul 2006 13:51:16 +0300 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 release - schedule and features In-Reply-To: <1152700614.13048.112582.camel@hal.voltaire.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7421@mtlexch01.mtl.com> <1152700614.13048.112582.camel@hal.voltaire.com> Message-ID: <44B4D424.5030003@mellanox.co.il> Hal Rosenstock wrote: >> • OSM: >> >> – Partition Manager (Pkey) >> > > Also, primitive QoS support. 
> > >> – Pre-computed routing load from file >> > > Also, diags: > > Add saquery tool > > Enhancement to ibnetdiscover tool with grouping function > OK - I will update my plans with these features. BTW - I count on you to be the owner of madeye Tziporet From mst at mellanox.co.il Wed Jul 12 04:09:55 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 12 Jul 2006 14:09:55 +0300 Subject: [openib-general] ipoib lockdep warning In-Reply-To: <20060712093820.GA9218@elte.hu> References: <20060712093820.GA9218@elte.hu> Message-ID: <20060712110955.GB18466@mellanox.co.il> Quoting r. Ingo Molnar : > But i also think that you should avoid using GFP_ATOMIC for any sort of > reliable IO path and push as much work into process context as possible. > Is it acceptable for your infiniband IO model to fail with -ENOMEM if > GFP_ATOMIC happens to fail, and is the IO retried transparently? Yes, this is true for users that pass GFP_ATOMIC to sa_query, at least. But might not be so for other users: send_mad in sa_query actually gets gfp_flags parameter, but for some reason it does not pass it to idr_pre_get, which means even sa query done with GFP_KERNEL flag is likely to fail. Sean, it seems we need something like the following - what do you think? -- Avoid bogus out of memory errors: fix sa_query to actually pass gfp_mask supplied by the user to idr_pre_get. Signed-off-by: Michael S.
Tsirkin diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index e911c99..aeda484 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -488,13 +488,13 @@ static void init_mad(struct ib_sa_mad *m spin_unlock_irqrestore(&tid_lock, flags); } -static int send_mad(struct ib_sa_query *query, int timeout_ms) +static int send_mad(struct ib_sa_query *query, int timeout_ms, gfp_t gfp_mask) { unsigned long flags; int ret, id; retry: - if (!idr_pre_get(&query_idr, GFP_ATOMIC)) + if (!idr_pre_get(&query_idr, gfp_mask)) return -ENOMEM; spin_lock_irqsave(&idr_lock, flags); ret = idr_get_new(&query_idr, query, &id); @@ -630,7 +630,7 @@ int ib_sa_path_rec_get(struct ib_device *sa_query = &query->sa_query; - ret = send_mad(&query->sa_query, timeout_ms); + ret = send_mad(&query->sa_query, timeout_ms, gfp_mask); if (ret < 0) goto err2; @@ -752,7 +752,7 @@ int ib_sa_service_rec_query(struct ib_de *sa_query = &query->sa_query; - ret = send_mad(&query->sa_query, timeout_ms); + ret = send_mad(&query->sa_query, timeout_ms, gfp_mask); if (ret < 0) goto err2; @@ -844,7 +844,7 @@ int ib_sa_mcmember_rec_query(struct ib_d *sa_query = &query->sa_query; - ret = send_mad(&query->sa_query, timeout_ms); + ret = send_mad(&query->sa_query, timeout_ms, gfp_mask); if (ret < 0) goto err2; -- MST From halr at voltaire.com Wed Jul 12 04:11:02 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Jul 2006 07:11:02 -0400 Subject: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1 In-Reply-To: <000301c6a525$ad0ed040$e598070a@amr.corp.intel.com> References: <000301c6a525$ad0ed040$e598070a@amr.corp.intel.com> Message-ID: <1152702660.13048.114238.camel@hal.voltaire.com> On Tue, 2006-07-11 at 16:07, Sean Hefty wrote: [snip...] > >> After further testing, this patch breaks osmtest as a result of modifying the > >> TID for a SEND. > > > >What does the test do? > > Hmm... 
I just reran the test, and it worked now. Now I'm really confused as to > what the problem was. The SM / SA classes don't use SENDs, so I was struggling > to understand the issue. That's correct and furthermore I don't think osmtest does any send method operations. The only thing it would do that for would be a negative test (to see that the SM/SA rejects an unknown method for a class). > Maybe I'm hitting into some sort of SM conflict that > takes a while to resolve itself... Rather than conflict, there is a caveat in when to run osmtest. I'll respond with the specifics in a response to a subsequent email of yours. -- Hal From halr at voltaire.com Wed Jul 12 04:13:48 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Jul 2006 07:13:48 -0400 Subject: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1 In-Reply-To: <44B42C7A.9030207@ichips.intel.com> References: <000301c6a480$6f7cb210$ff0da8c0@amr.corp.intel.com> <44B42C7A.9030207@ichips.intel.com> Message-ID: <1152702826.13048.114411.camel@hal.voltaire.com> On Tue, 2006-07-11 at 18:55, Sean Hefty wrote: > Other issues that I've been running into appear to be related to a combination > of timing issues running the tests too quickly after starting opensm Yes, if opensm has not gotten far enough, osmtest will fail. The SM must initialize the subnet and get to the point where it is ready to process SA MADs. Just to be sure, what osmtest failure did you see ? > and running multiple copies of opensm on different systems. Not sure what that would fail. The other SMs should be standbys. I can't think of what would fail in osmtest off the top of my head but haven't tried this yet but am now about to.
-- Hal From halr at voltaire.com Wed Jul 12 04:23:54 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Jul 2006 07:23:54 -0400 Subject: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1 In-Reply-To: <1152702826.13048.114411.camel@hal.voltaire.com> References: <000301c6a480$6f7cb210$ff0da8c0@amr.corp.intel.com> <44B42C7A.9030207@ichips.intel.com> <1152702826.13048.114411.camel@hal.voltaire.com> Message-ID: <1152703433.13048.114866.camel@hal.voltaire.com> On Wed, 2006-07-12 at 07:13, Hal Rosenstock wrote: > > and running multiple copies of opensm on different systems. > > Not sure what that would fail. The other SMs should be standbys. I can't > think of what would fail in osmtest off the top of my head but haven't > tried this yet but am now about to. I think I see what might be going on. To confirm, can you increase the transaction timeout on the osmtest side and see if that helps ? Something like: osmtest -f a -t 1000 -- Hal From halr at voltaire.com Wed Jul 12 05:29:56 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Jul 2006 08:29:56 -0400 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 release - schedule and features In-Reply-To: <44B4D424.5030003@mellanox.co.il> References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7421@mtlexch01.mtl.com> <1152700614.13048.112582.camel@hal.voltaire.com> <44B4D424.5030003@mellanox.co.il> Message-ID: <1152707394.13048.118178.camel@hal.voltaire.com> On Wed, 2006-07-12 at 06:51, Tziporet Koren wrote: > Hal Rosenstock wrote: > >> • OSM: > >> > >> – Partition Manager (Pkey) > >> > > > > Also, primitive QoS support. > > > > > >> – Pre-computed routing load from file > >> > > > > Also, diags: > > > > Add saquery tool > > > > Enhancement to ibnetdiscover tool with grouping function > > > > OK - I will update my plans with these features. Thanks. > BTW - I count on you to be the owner of madeye We can cover SLES10 and RHEL4 on x86_64 and x86.
Can Mellanox or someone else pick up the other "holes" in the matrix ? -- Hal From halr at voltaire.com Wed Jul 12 05:34:50 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Jul 2006 08:34:50 -0400 Subject: [openib-general] openSM - IS_SM capability mask problem In-Reply-To: References: <1152611577.13048.41435.camel@hal.voltaire.com> Message-ID: <1152707687.13048.118412.camel@hal.voltaire.com> On Tue, 2006-07-11 at 09:27, yipee wrote: > Hal Rosenstock voltaire.com> writes: > [snip] > > It's not the setting which is failing. You are likely not using an SM > > which supports this (it is an enhanced capability defined in a 1.2 > > erratum). Are you running a recent OpenSM or something else ? > > > > I'm running a 1.1 openSM on a 2.6.14 kernel. Is there a bug in this > version that causes this problem? It is lacking this 1.2 erratum support which is a feature enhancement. > Should I upgrade to a newer one? I don't know how to answer this question for you but if an old one doesn't do what you want/need, then it might behoove you to upgrade. Note that this feature is supported in the OFED 1.0 version of OpenSM as well as the trunk version so that gives you a couple of options depending on some other factors. > Would it be compatible with the 2.6.14 kernel > modules/userspace libraries? Yes (I'm pretty sure). The user_mad API has not changed in quite some time now. What ABI version is 2.6.14 ? 
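For the ABI question just asked, one possible way to check the user_mad ABI version on a running system is to read it from sysfs rather than from the headers. This is only a sketch under assumptions: it assumes the ib_umad module is loaded and exposes the version at /sys/class/infiniband_mad/abi_version, and the helper name read_abi_version is hypothetical, not part of libibumad.

```c
#include <stdio.h>

/* Assumed sysfs location of the user_mad ABI version (needs ib_umad loaded). */
#define UMAD_ABI_PATH "/sys/class/infiniband_mad/abi_version"

/*
 * Hypothetical helper: read a decimal ABI version from the given file.
 * Returns the version, or -1 if the file is missing or cannot be parsed.
 */
static int read_abi_version(const char *path)
{
	FILE *f = fopen(path, "r");
	int version;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &version) != 1)
		version = -1;
	fclose(f);
	return version;
}
```

Calling read_abi_version(UMAD_ABI_PATH) and comparing the result with the ABI version the userspace management library was built against would show whether the two sides agree; a return of -1 most likely means the module is not loaded.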
-- Hal > > thanks, > y > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From yipeeyipeeyipeeyipee at yahoo.com Wed Jul 12 06:13:15 2006 From: yipeeyipeeyipeeyipee at yahoo.com (yipeeyipee yipeeyipee) Date: Wed, 12 Jul 2006 06:13:15 -0700 (PDT) Subject: [openib-general] openSM - IS_SM capability mask problem In-Reply-To: <1152707687.13048.118412.camel@hal.voltaire.com> Message-ID: <20060712131315.24542.qmail@web35705.mail.mud.yahoo.com> --- Hal Rosenstock wrote: [snip] Should this IS_SM bit in port attributes be supported in the switch hardware? > Yes (I'm pretty sure). The user_mad API has not > changed in quite some > time now. What ABI version is 2.6.14 ? I don't know where to check this. __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From halr at voltaire.com Wed Jul 12 06:08:05 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Jul 2006 09:08:05 -0400 Subject: [openib-general] [PATCH] OpenSM: Support configurable SL per partition In-Reply-To: <44B3C059020000C700012B8C@sinclair.provo.novell.com> References: <44B3C059020000C700012B8C@sinclair.provo.novell.com> Message-ID: <1152709684.13048.120033.camel@hal.voltaire.com> Hi again Pat, On Tue, 2006-07-11 at 17:14, Patrick Mullaney wrote: > This will avoid an invalid warning about service level value if sl=0 is > used in the partition config file. Can you include something like it in > your original patch? > > Thanks. 
> Pat > > -------------------------------------------------------------------------------------------------- > > --- opensm.old/osm_prtn_config.c 2006-07-11 16:06:33.000000000 > -0400 > +++ opensm/osm_prtn_config.c 2006-07-11 16:08:00.000000000 -0400 > @@ -52,6 +52,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -159,7 +160,7 @@ static int partition_add_flag(unsigned l > } else if (!strncmp(flag, "sl", len)) { > unsigned sl; > > - if (!val || (sl = strtoul(val, NULL, 0)) == 0 || sl > > 15) > + if (!val || (sl = strtoul(val, NULL, 0)) == UINT_MAX Should this be UINT_MAX or ULONG_MAX ? -- Hal > || > sl > 15) > PARSEWARN(conf->p_log, lineno, > "flag \'sl\' requires valid value" > " - skipped.\n"); > From halr at voltaire.com Wed Jul 12 06:39:46 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Jul 2006 09:39:46 -0400 Subject: [openib-general] openSM - IS_SM capability mask problem In-Reply-To: <20060712131315.24542.qmail@web35705.mail.mud.yahoo.com> References: <20060712131315.24542.qmail@web35705.mail.mud.yahoo.com> Message-ID: <1152711585.13048.121652.camel@hal.voltaire.com> On Wed, 2006-07-12 at 09:13, yipeeyipee yipeeyipee wrote: > --- Hal Rosenstock wrote: > > [snip] > Should this IS_SM bit in port attributes be supported > in the switch hardware? If you are running an SM on your switch, the IS_SM bit would be on for port 0. Otherwise not. > > Yes (I'm pretty sure). The user_mad API has not > > changed in quite some > > time now. What ABI version is 2.6.14 ? > > I don't know where to check this. One of two places: 1. include/rdma/ib_user_mad.h: IB_USER_MAD_ABI_VERSION 2. libibumad/include/infiniband/umad.h:IB_UMAD_ABI_VERSION I checked the #1 in 2.6.14 and it is version 5 which is fine. The management upgrade is totally compatible. Of course, none of the kernel improvements/fixes in the MAD space are part of this unless you backported them. -- Hal > > > __________________________________________________ > Do You Yahoo!? 
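As a complement to the two header locations Hal lists above, the user_mad ABI version of the running kernel can usually also be read at run time. The sketch below is only an illustration: it assumes the kernel exports the value at /sys/class/infiniband_mad/abi_version (the location libibumad is understood to check), and the helper name read_umad_abi_version is made up for this example.

```c
#include <stdio.h>

/* Hypothetical helper (not part of libibumad): read a decimal ABI version
 * from a sysfs-style file. Returns the version (>= 0) on success, or -1 if
 * the file cannot be opened or does not start with a non-negative integer. */
int read_umad_abi_version(const char *path)
{
    FILE *f = fopen(path, "r");
    int version = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &version) != 1 || version < 0)
        version = -1;
    fclose(f);
    return version;
}
```

Calling read_umad_abi_version("/sys/class/infiniband_mad/abi_version") with ib_umad loaded should then report the same number as IB_USER_MAD_ABI_VERSION in the kernel headers (5 for 2.6.14, per Hal's check); a return of -1 most likely means the module is not loaded or the path differs on that kernel.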
> Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com From pmullaney at novell.com Wed Jul 12 07:26:41 2006 From: pmullaney at novell.com (Patrick Mullaney) Date: Wed, 12 Jul 2006 08:26:41 -0600 Subject: [openib-general] [PATCH] OpenSM: Support configurable SL per partition Message-ID: <44B4B241020000C700012C08@sinclair.provo.novell.com> Hi Hal, Yea, I think it should be ULONG_MAX and your local variable sl should also be an unsigned long. Its precision will get truncated on the assignment to conf->sl but no problem due to the range being limited(0-15). Pat >>> Hal Rosenstock 07/12/06 7:08 AM >>> Hi again Pat, On Tue, 2006-07-11 at 17:14, Patrick Mullaney wrote: > This will avoid an invalid warning about service level value if sl=0 is > used in the partition config file. Can you include something like it in > your original patch? > > Thanks. > Pat > > -------------------------------------------------------------------------------------------------- > > --- opensm.old/osm_prtn_config.c 2006-07-11 16:06:33.000000000 > -0400 > +++ opensm/osm_prtn_config.c 2006-07-11 16:08:00.000000000 -0400 > @@ -52,6 +52,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -159,7 +160,7 @@ static int partition_add_flag(unsigned l > } else if (!strncmp(flag, "sl", len)) { > unsigned sl; > > - if (!val || (sl = strtoul(val, NULL, 0)) == 0 || sl > > 15) > + if (!val || (sl = strtoul(val, NULL, 0)) == UINT_MAX Should this be UINT_MAX or ULONG_MAX ? 
-- Hal > || > sl > 15) > PARSEWARN(conf->p_log, lineno, > "flag \'sl\' requires valid value" > " - skipped.\n"); > From jlentini at netapp.com Wed Jul 12 09:17:03 2006 From: jlentini at netapp.com (James Lentini) Date: Wed, 12 Jul 2006 12:17:03 -0400 (EDT) Subject: [openib-general] [Bug 146] OFED-1.0 DAPL fails to build on SLES10 on IA64 with IA64_FETCHADD error In-Reply-To: <44B41E6B.6070703@sgi.com> References: <44AD9614.5090907@sgi.com> <44B41E6B.6070703@sgi.com> Message-ID: On Tue, 11 Jul 2006, John Partridge wrote: > The resulting build from your last patch has been installed and we > are in the process of DAPL tests now. I do know that the libdat > works with Intel MPI (although we had to manually create a symlink > from libdat.so.1 to libdat.so - should this not already exist?) Did you install dapl rpm or dapl-devel rpm? The dapl rpm should have libdat.so.1 inside it. > I do have one question about how the dapl RPM's are organized, we are > creating a DAPL interface for the ccNUMA xpmem on SGI Altix systems. > At the moment we have a libdat and libdapl(xpmem). It is our objective > to use the OFED-1.0 libdat, as libdat will be used for non-infiniband > interfaces I don't quite understand why libdat (dat.conf) are not a > separate RPM and the libdapl interfaces installed in a separate RPM ? > Would his not make more sense ? The DAPL provider library can have an arbitrary name. You should name your DAPL provider something unique (e.g. libdaplxpmem). From jlentini at netapp.com Wed Jul 12 09:17:25 2006 From: jlentini at netapp.com (James Lentini) Date: Wed, 12 Jul 2006 12:17:25 -0400 (EDT) Subject: [openib-general] [Bug 146] OFED-1.0 DAPL fails to build on SLES10 on IA64 with IA64_FETCHADD error In-Reply-To: <44B41E6B.6070703@sgi.com> References: <44AD9614.5090907@sgi.com> <44B41E6B.6070703@sgi.com> Message-ID: On Tue, 11 Jul 2006, John Partridge wrote: > James Lentini wrote: > > This is included on IA64 and PPC systems. 
Since we have not done testing on > > IA64 or PPC, I'm certain that this was a contribution for a IA64 or PPC DAPL > > user. For that reason, I'm not certain why the asm/system.h was included. It > > is certainly possible that it is necessary on PPC but not IA64. > > > > James, > > Thanks your patch has fixed the build errors. I've checked in the patch below to solve this problem:

Index: dapl/udapl/linux/dapl_osd.h
===================================================================
--- dapl/udapl/linux/dapl_osd.h (revision 8253)
+++ dapl/udapl/linux/dapl_osd.h (working copy)
@@ -80,6 +80,8 @@
 #if defined(__ia64__) || defined(__PPC64__)
 #include
+#endif
+#if defined(__PPC64__)
 #include
 #endif
Index: Makefile.am
===================================================================
--- Makefile.am (revision 8253)
+++ Makefile.am (working copy)
@@ -1,10 +1,11 @@
 # $Id: $
+OSFLAGS = -DOS_RELEASE=$(shell expr `uname -r | cut -f1 -d.` \* 65536 + `uname -r | cut -f2 -d.`)
 # Check for RedHat, needed for ia64 udapl atomic operations (IA64_FETCHADD syntax)
 if OS_RHEL
-OSFLAGS=-DREDHAT_EL4
+OSFLAGS += -DREDHAT_EL4
 else
-OSFLAGS=
+OSFLAGS +=
 endif
 if DEBUG

From krause at cup.hp.com Wed Jul 12 09:22:30 2006 From: krause at cup.hp.com (Michael Krause) Date: Wed, 12 Jul 2006 09:22:30 -0700 Subject: [openib-general] OFED 1.1 release - schedule and features In-Reply-To: <44B4ABDF.9030301@mellanox.co.il> References: <44B4ABDF.9030301@mellanox.co.il> Message-ID: <6.2.0.14.2.20060712092034.1007a230@esmail.cup.hp.com> At 12:59 AM 7/12/2006, Tziporet Koren wrote: >Scott Weitzenkamp (sweitzen) wrote: > > For SDP, I would like to see "improved stability" (maybe you have this > > in mind under "beta quality"), also how about "AIO support"? The rest > > of the list looks good. > > >Yes - beta quality means improved stability. >AIO is not planed for 1.1 (schedule issue).
If needed we can add it to 1.2 Would be nice if people thought about implementing the Sockets API Extensions from the OpenGroup. They provide explicit memory management and async communications which will allow SDP performance to be fully exploited. The benefits go beyond what is found in AIO or on other OS such as Windows. If one were to extend slightly to have explicit RDMA Read and Write from the Sockets API, then it would be quite possible to eliminate SDP entirely for new applications leaving SDP strictly for legacy Sockets environments. Mike >Tziporet From mshefty at ichips.intel.com Wed Jul 12 09:31:53 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 12 Jul 2006 09:31:53 -0700 Subject: [openib-general] ipoib lockdep warning In-Reply-To: <20060712110955.GB18466@mellanox.co.il> References: <20060712093820.GA9218@elte.hu> <20060712110955.GB18466@mellanox.co.il> Message-ID: <44B523F9.1060501@ichips.intel.com> Michael S. Tsirkin wrote: > Yes, this is true for users that pass GFP_ATOMIC to sa_query, at least. But > might not be so for other users: send_mad in sa_query actually gets gfp_flags > parameter, but for some reason it does not pass it to idr_pre_get, which means > even sa query done with GFP_KERNEL flag is likely to fail. > > Sean, it seems we need something like the following - what do you think? I noticed this same thing looking at the code yesterday. I can't think of any reason why your patch wouldn't work.
- Sean From johnip at sgi.com Wed Jul 12 09:35:27 2006 From: johnip at sgi.com (John Partridge) Date: Wed, 12 Jul 2006 11:35:27 -0500 Subject: [openib-general] [Bug 146] OFED-1.0 DAPL fails to build on SLES10 on IA64 with IA64_FETCHADD error In-Reply-To: References: <44AD9614.5090907@sgi.com> <44B41E6B.6070703@sgi.com> Message-ID: <44B524CF.1020906@sgi.com> James Lentini wrote: > > On Tue, 11 Jul 2006, John Partridge wrote: > > >>The resulting build from your last patch has been installed and we >>are in the process of DAPL tests now. I do know that the libdat >>works with Intel MPI (although we had to manually create a symlink >>from libdat.so.1 to libdat.so - should this not already exist?) > > > Did you install dapl rpm or dapl-devel rpm? The dapl rpm should have > libdat.so.1 inside it. I installed the dapl rpm. I do have libdat.so.1 but I also expect a symlink to libdat.so which does not exist (Intel MPI appears to need it) I also noticed that the dat.conf points to /usr/local/ofed/lib/libdaplcma.so but there is no symlink in the /usr/local/ofed/lib directory for it, I do have the libdaplcma.so.1 am I missing something here ? It's not a huge problem one can always create a symlink, but I'm just concerned I have something messed in my rpm. > > >>I do have one question about how the dapl RPM's are organized, we are >>creating a DAPL interface for the ccNUMA xpmem on SGI Altix systems. >>At the moment we have a libdat and libdapl(xpmem). It is our objective >>to use the OFED-1.0 libdat, as libdat will be used for non-infiniband >>interfaces I don't quite understand why libdat (dat.conf) are not a >>separate RPM and the libdapl interfaces installed in a separate RPM ? >>Would his not make more sense ? > > > The DAPL provider library can have an arbitrary name. You should name > your DAPL provider something unique (e.g. libdaplxpmem). OK that makes sense, but my point was should the libdat be a seperate rpm from the libdapl libs ? 
Thanks John -- John Partridge Silicon Graphics Inc Tel: 651-683-3428 Vnet: 233-3428 E-Mail: johnip at sgi.com From troy at scl.ameslab.gov Wed Jul 12 09:40:23 2006 From: troy at scl.ameslab.gov (Troy Benjegerdes) Date: Wed, 12 Jul 2006 11:40:23 -0500 Subject: [openib-general] ehca issues, again Message-ID: <44B525F7.3030203@scl.ameslab.gov> This is the latest svn ehca code, 2.6.17 kernel. Can I also request that the EHCA driver print out what PHYP firmware it is known to work with, just like mthca prints out a warning if the mellanox card firmware is out of date? And while I'm asking about PHYP, what version are the ehca developer running?

p5l6:~# modprobe hcad_mod nr_ports=1
eHCA Infiniband Device Driver (Rel.: SVNEHCA_0009)
xics_enable_irq: irq=36868: ibm_int_on returned -3
p5l6:~#
p5l6:~# ibv_devinfo
libibverbs: Fatal: couldn't read uverbs ABI version.
No IB devices found
p5l6:~# modprobe ib_uverbs
p5l6:~# ibv_devinfo
hca_id: ehca0
        node_guid: 0002:5500:0003:5a00
        sys_image_guid: 0000:0000:0000:0000
        vendor_id: 0x5076
        vendor_part_id: 0
        hw_ver: 0x1000002
        phys_port_cnt: 1
                port: 1
                        state: PORT_DOWN (1)
                        max_mtu: 2048 (4)
                        active_mtu: 2048 (4)
                        sm_lid: 0
                        port_lid: 0
                        port_lmc: 0x00
p5l6:~# ibv_rc_pingpong
PU0003 000e0091:ehca_hcall_7arg_7ret HCAD_ERROR opcode=168 ret=ffffffffffffffd3 arg1=1000000003000004 arg2=2000000000000006 arg3=ac0000000000000 arg4=e7a9b000 arg5=0 arg6=0 arg7=0 out1=0 out2=0 out3=0 out4=0 out5=0 out6=80000000000dad58 out7=0
PU0003 000b04c9:internal_modify_qp HCAD_ERROR hipz_h_modify_qp() failed rc=ffffffffffffffd3 ehca_qp=c0000000e3626c80 qp_num=6
Failed to modify QP to INIT
p5l6:~# modprobe ib_umad
PU0007 000b0073:ehca_define_sqp HCAD_ERROR Port 1 is not active.
PU0007 000b0258:ehca_create_qp HCAD_ERROR ehca_define_sqp() failed rc=ffffffffffffffff
PU0007 000b02c0:ehca_create_qp <<< failed ret=ffffffea
ib_mad: Couldn't create ib_mad QP1
ib_mad: Couldn't open ehca0 port 1

From mshefty at ichips.intel.com Wed Jul 12 09:41:01 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 12 Jul 2006 09:41:01 -0700 Subject: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1 In-Reply-To: <1152703433.13048.114866.camel@hal.voltaire.com> References: <000301c6a480$6f7cb210$ff0da8c0@amr.corp.intel.com> <44B42C7A.9030207@ichips.intel.com> <1152702826.13048.114411.camel@hal.voltaire.com> <1152703433.13048.114866.camel@hal.voltaire.com> Message-ID: <44B5261D.4070600@ichips.intel.com> Hal Rosenstock wrote: >>>and running multiple copies of opensm on different systems. >> >>Not sure what that would fail. The other SMs should be standbys. I can't >>think of what would fail in osmtest off the top of my head but haven't >>tried this yet but am now about to. I was starting / stopping openSM on different systems soon before running the tests. Whenever I started it on one of my systems, it would always become the master and reassign LIDs. > I think I see what might be going on. To confirm, can you increase the > transaction timeout on the osmtest side and see if that helps ? > Something like: > > osmtest -f a -t 1000 Thanks - I'll try to rerun the tests and see if I get similar behavior. The failure I was seeing was while running osmtest -f c. If I'm remembering correctly, the second query of that test wasn't getting a response.
- Sean From jeff.broughton at qlogic.com Wed Jul 12 09:48:03 2006 From: jeff.broughton at qlogic.com (Jeff Broughton) Date: Wed, 12 Jul 2006 09:48:03 -0700 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 release - schedule and features Message-ID: <6DB5B58A8E5AB846A7B3B3BFF1B4315A2BDF88@AVEXCH1.qlogic.org> Mike, The whole purpose of SDP is to make sockets go faster without having to have the applications modified. This is what the customers want. I've heard this time and time again, across a wide spectrum of customers. Modifying the sockets API is just defining yet another RDMA API, and we have so many already.... -Jeff ________________________________ From: openfabrics-ewg-bounces at openib.org [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Michael Krause Sent: Wednesday, July 12, 2006 9:23 AM To: Tziporet Koren; Scott Weitzenkamp (sweitzen) Cc: OpenFabricsEWG; openib Subject: Re: [openfabrics-ewg] [openib-general] OFED 1.1 release - schedule and features At 12:59 AM 7/12/2006, Tziporet Koren wrote: Scott Weitzenkamp (sweitzen) wrote: > For SDP, I would like to see "improved stability" (maybe you have this > in mind under "beta quality"), also how about "AIO support"? The rest > of the list looks good. > Yes - beta quality means improved stability. AIO is not planed for 1.1 (schedule issue). If needed we can add it to 1.2 Would be nice if people thought about implementing the Sockets API Extensions from the OpenGroup. They provide explicit memory management and async communications which will allow SDP performance to be fully exploited. The benefits go beyond what is found in AIO or on other OS such as Windows. If one were to extend slightly to have explicit RDMA Read and Write from the Sockets API, then it would be quite possible to eliminate SDP entirely for new applications leaving SDP strictly for legacy Sockets environments. 
Mike Tziporet _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -------------- next part -------------- An HTML attachment was scrubbed... URL: From bpradip at in.ibm.com Wed Jul 12 10:30:47 2006 From: bpradip at in.ibm.com (Pradipta Kumar Banerjee) Date: Wed, 12 Jul 2006 23:00:47 +0530 Subject: [openib-general] ping problem with ammasso cards(iWARP interface) In-Reply-To: <001401c6a44d$d5e06d90$4710100a@pc.s2io.com> References: <001401c6a44d$d5e06d90$4710100a@pc.s2io.com> Message-ID: <44B531C7.9070302@in.ibm.com> Ravinandan, Do you still see the rping crash? Thanks, Pradipta Kumar. Ravinandan Arakali wrote: > Pradipta, > Okay, thanks.. Initially, I was not sure since I don't remember non-zero > values in /proc/krping. When I re-ran the krping test, I see following > output > openfab2:~ # cat /proc/krping > 1-amso0 891376 55711 891376 55711 1782720 27855 1782784 27856 > > As you mentioned, the RDMA traffic seems to be flowing indeed ! > Any idea why rping is dumping core ? > > Has any testing been done using SDP with ammasso cards ? > > Regards, > Ravi > > > -----Original Message----- > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] > Sent: Friday, July 07, 2006 11:20 PM > To: Ravinandan Arakali > Cc: Leonid. Grossman (E-mail); sivakumar.subramani at neterion.com; > openib-general at openib.org > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP > interface) > > > Ravinandan Arakali wrote: >> Pradipta, >> Following is the output from gdb after core dump. I have also copy-pasted >> the gdb output on client system. >> >> Attached is the dmesg output when krping test is run in verbose mode. >> The ping data on the sender(client) seems okay. The content is shifted >> forward by one character for each packet. 
On receiver, after receiving > ping >> pkt 9, it seems to jump to pkt no. 1935. Not sure if it's because messages >> can be lost during writing to /var/log/messages ? > krping is indeed working!!...Using 'verbose' allows you to see the ping > data. > When not using 'verbose' you see only 'send/recv' messages. >> ----------------------------------------- >> (gdb) run -s -vV -C100 -d -a 0.0.0.0 -p 9999 >> Starting program: >> /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -s -vV -C100 -d -a >> 0.0.0.0 -p 9999 >> [Thread debugging using libthread_db enabled] >> [New Thread -1210054992 (LWP 3668)] >> ipaddr (0.0.0.0) >> port 9999 >> created cm_id 0x804e6e0 >> [New Thread -1210057824 (LWP 3671)] >> rdma_bind_addr successful >> rdma_listen >> cma_event type 4 cma_id 0x804e968 (child) >> child cma 0x804e968 >> >> Program received signal SIGSEGV, Segmentation fault. >> [Switching to Thread -1210054992 (LWP 3668)] >> rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 >> 514 cb->pd = ibv_alloc_pd(cm_id->verbs); >> (gdb) bt >> #0 rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 >> #1 0x0804a716 in main (argc=9, argv=Cannot access memory at address 0x6 >> ) at examples/rping.c:767 >> (gdb) >> >> --------------------------------- >> (gdb) run -c -vV -C100 -d -a 17.2.2.102 -p 9999 >> Starting program: >> /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -c -vV -C100 -d -a >> 17.2.2.102 -p 9999 >> [Thread debugging using libthread_db enabled] >> [New Thread 47388824908032 (LWP 4620)] >> ipaddr (17.2.2.102) >> port 9999 >> created cm_id 0x506b00 >> [New Thread 1082132800 (LWP 4623)] >> cma_event type 0 cma_id 0x506b00 (parent) >> cma_event type 2 cma_id 0x506b00 (parent) >> rdma_resolve_addr - rdma_resolve_route successful >> created pd 0x506e60 >> created channel 0x506e80 >> created cq 0x506ea0 >> created qp 0x506f40 >> rping_setup_buffers called on cb 0x505010 >> allocated & registered buffers... 
>> [New Thread 1090525504 (LWP 4624)] >> cq_thread started. >> >> > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From halr at voltaire.com Wed Jul 12 10:16:14 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Jul 2006 13:16:14 -0400 Subject: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1 In-Reply-To: <44B5261D.4070600@ichips.intel.com> References: <000301c6a480$6f7cb210$ff0da8c0@amr.corp.intel.com> <44B42C7A.9030207@ichips.intel.com> <1152702826.13048.114411.camel@hal.voltaire.com> <1152703433.13048.114866.camel@hal.voltaire.com> <44B5261D.4070600@ichips.intel.com> Message-ID: <1152724571.13048.131846.camel@hal.voltaire.com> On Wed, 2006-07-12 at 12:41, Sean Hefty wrote: > Hal Rosenstock wrote: > >>>and running multiple copies of opensm on different systems. > >> > >>Not sure what that would fail. The other SMs should be standbys. I can't > >>think of what would fail in osmtest off the top of my head but haven't > >>tried this yet but am now about to. > > I was starting / stopping openSM on different systems soon before running the > tests. Not sure I quite understand the sequencing. > Whenever I started it on one of my systems, it would always become the > master and reassign LIDs. Reassigning LIDs is disruptive. However, this should be made to work. You can avoid this by synchronizing the /var/cache/osm/guid2lid file. > > I think I see what might be going on. To confirm, can you increase the > > transaction timeout on the osmtest side and see if that helps ? > > Something like: > > > > osmtest -f a -t 1000 > > Thanks - I'll try to rerun the tests and see if I get similar behavior. > > The failure I was seeing was while running osmtest -f c. If -c fails, there's no use in running a -a. 
Your failure is a different scenario than I was envisioning. > If I'm remembering > correctly, the second query of that test wasn't getting a response. Can you run with -V and send me the output ? I want to recreate this so I understand what is going on. -- Hal > - Sean From ravinandan.arakali at neterion.com Wed Jul 12 10:55:01 2006 From: ravinandan.arakali at neterion.com (Ravinandan Arakali) Date: Wed, 12 Jul 2006 10:55:01 -0700 Subject: [openib-general] ping problem with ammasso cards(iWARP interface) In-Reply-To: <44B531C7.9070302@in.ibm.com> Message-ID: <000601c6a5dc$4cbeaaf0$4710100a@pc.s2io.com> I haven't yet tried with a fresh installation. But I did notice that, compared to my current tree, several files under libamso and couple of files under librdma have been removed by Steve Wise. I did the same change to my tree and rebuilt librdmacm.so and amso.so but still see the same crash. Ravi -----Original Message----- From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] Sent: Wednesday, July 12, 2006 10:31 AM To: Ravinandan Arakali Cc: openib-general at openib.org Subject: Re: [openib-general] ping problem with ammasso cards(iWARP interface) Ravinandan, Do you still see the rping crash? Thanks, Pradipta Kumar. Ravinandan Arakali wrote: > Pradipta, > Okay, thanks.. Initially, I was not sure since I don't remember non-zero > values in /proc/krping. When I re-ran the krping test, I see following > output > openfab2:~ # cat /proc/krping > 1-amso0 891376 55711 891376 55711 1782720 27855 1782784 27856 > > As you mentioned, the RDMA traffic seems to be flowing indeed ! > Any idea why rping is dumping core ? > > Has any testing been done using SDP with ammasso cards ? > > Regards, > Ravi > > > -----Original Message----- > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] > Sent: Friday, July 07, 2006 11:20 PM > To: Ravinandan Arakali > Cc: Leonid. 
Grossman (E-mail); sivakumar.subramani at neterion.com; > openib-general at openib.org > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP > interface) > > > Ravinandan Arakali wrote: >> Pradipta, >> Following is the output from gdb after core dump. I have also copy-pasted >> the gdb output on client system. >> >> Attached is the dmesg output when krping test is run in verbose mode. >> The ping data on the sender(client) seems okay. The content is shifted >> forward by one character for each packet. On receiver, after receiving > ping >> pkt 9, it seems to jump to pkt no. 1935. Not sure if it's because messages >> can be lost during writing to /var/log/messages ? > krping is indeed working!!...Using 'verbose' allows you to see the ping > data. > When not using 'verbose' you see only 'send/recv' messages. >> ----------------------------------------- >> (gdb) run -s -vV -C100 -d -a 0.0.0.0 -p 9999 >> Starting program: >> /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -s -vV -C100 -d -a >> 0.0.0.0 -p 9999 >> [Thread debugging using libthread_db enabled] >> [New Thread -1210054992 (LWP 3668)] >> ipaddr (0.0.0.0) >> port 9999 >> created cm_id 0x804e6e0 >> [New Thread -1210057824 (LWP 3671)] >> rdma_bind_addr successful >> rdma_listen >> cma_event type 4 cma_id 0x804e968 (child) >> child cma 0x804e968 >> >> Program received signal SIGSEGV, Segmentation fault. 
>> [Switching to Thread -1210054992 (LWP 3668)] >> rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 >> 514 cb->pd = ibv_alloc_pd(cm_id->verbs); >> (gdb) bt >> #0 rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 >> #1 0x0804a716 in main (argc=9, argv=Cannot access memory at address 0x6 >> ) at examples/rping.c:767 >> (gdb) >> >> --------------------------------- >> (gdb) run -c -vV -C100 -d -a 17.2.2.102 -p 9999 >> Starting program: >> /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -c -vV -C100 -d -a >> 17.2.2.102 -p 9999 >> [Thread debugging using libthread_db enabled] >> [New Thread 47388824908032 (LWP 4620)] >> ipaddr (17.2.2.102) >> port 9999 >> created cm_id 0x506b00 >> [New Thread 1082132800 (LWP 4623)] >> cma_event type 0 cma_id 0x506b00 (parent) >> cma_event type 2 cma_id 0x506b00 (parent) >> rdma_resolve_addr - rdma_resolve_route successful >> created pd 0x506e60 >> created channel 0x506e80 >> created cq 0x506ea0 >> created qp 0x506f40 >> rping_setup_buffers called on cb 0x505010 >> allocated & registered buffers... >> [New Thread 1090525504 (LWP 4624)] >> cq_thread started. >> >> > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From sean.hefty at intel.com Wed Jul 12 10:58:33 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 12 Jul 2006 10:58:33 -0700 Subject: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1 In-Reply-To: <1152724571.13048.131846.camel@hal.voltaire.com> Message-ID: <000001c6a5dc$ca72ff00$ff0da8c0@amr.corp.intel.com> >> I was starting / stopping openSM on different systems soon before running the >> tests. > >Not sure I quite understand the sequencing. I was being somewhat random, just trying to stress things. 
How quickly will one SM take over for another after one dies? >Can you run with -V and send me the output ? I want to recreate this so >I understand what is going on. I'm having trouble re-creating the error at the moment, but I isolated my test systems from our larger cluster. I will need to reconnect to the cluster and see if I can cause the error again. - Sean From rdreier at cisco.com Wed Jul 12 11:56:37 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 12 Jul 2006 11:56:37 -0700 Subject: [openib-general] ipoib lockdep warning References: <20060712093820.GA9218@elte.hu> <20060712110955.GB18466@mellanox.co.il> Message-ID: > Avoid bogus out out memory errors: fix sa_query to actually pass gfp_mask > supplied by the user to idr_pre_get. Yes, this looks right to me. - R. From rdreier at cisco.com Wed Jul 12 12:01:37 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 12 Jul 2006 12:01:37 -0700 Subject: [openib-general] ipoib lockdep warning References: <44B405C8.4040706@oracle.com> <1152686978.3217.14.camel@laptopd505.fenrus.org> Message-ID: > this does not have to be a false positive! > It is not legal to take ANY non-hardirq safe lock after having taken a > lock that's used in hardirq context. > (having said that the skb_queue_tail lock needs a special treatment for > some real false positives; Linus merged that already) ... > Now this assumes your queue is shared with the networking stack, Right, understood -- but ipoib has a private skb queue. Which is why I said it was a false positive. Thanks, Roland From rdreier at cisco.com Wed Jul 12 12:06:37 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 12 Jul 2006 12:06:37 -0700 Subject: [openib-general] ipoib lockdep warning References: <44B405C8.4040706@oracle.com> <44B433CE.1030103@oracle.com> <20060712093820.GA9218@elte.hu> Message-ID: > i agree that the IDR subsystem should be irq-safe if GFP_ATOMIC is > passed in. So the _irqsave()/_irqrestore() fix should be done. 
OK, I will send the idr change to Andrew. > But i also think that you should avoid using GFP_ATOMIC for any sort of > reliable IO path and push as much work into process context as possible. > Is it acceptable for your infiniband IO model to fail with -ENOMEM if > GFP_ATOMIC happens to fail, and is the IO retried transparently? Yes, I think it's OK. This idr use is in an inherently unreliable path. With that said, as Michael pointed out, we can change things to use GFP_ATOMIC less. Thanks, Roland From panda at cse.ohio-state.edu Wed Jul 12 12:07:54 2006 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Wed, 12 Jul 2006 15:07:54 -0400 (EDT) Subject: [openib-general] OFED 1.1 release - schedule and features In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7421@mtlexch01.mtl.com> from "Tziporet Koren" at Jul 12, 2006 08:53:07 AM Message-ID: <200607121907.k6CJ7sc8000554@xi.cse.ohio-state.edu> Tziporet, > - Based on 0.97 (we will not move to 0.98 since we tested it and > found it is less stable then 0.97) Could you please indicate which version of 0.9.8 you tested and what are the exact problems you have faced. Please note that 0.9.8 has not been formally released yet. What is currently available is 0.9.8-RC0. We have been continuing our testing. Since the 0.9.8-RC0 release, there have been several fixes applied to the trunk based on our internal testing and feedbacks received from others. We plan to make a formal release soon. Please try the latest version available from the trunk. If you continue to see problems, please let us know and we will be happy to fix these problems before the release. 
Thanks, DK From rdreier at cisco.com Wed Jul 12 12:16:37 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 12 Jul 2006 12:16:37 -0700 Subject: [openib-general] OFED 1.1 release - schedule and features References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7421@mtlexch01.mtl.com> Message-ID: To me this schedule seems too short to expect real new features like > - HCA fatal - full flow support > * IPoIB > - Bonding - for high availability that have had no work done (in public at least) yet to be integrated. If 1.1 is going to go to code freeze in 19 days then I don't see how these features can be finished, let alone stabilized in time. In my opinion it would be better to focus on polishing what was already in OFED 1.0 for OFED 1.1. Of course we should work on valuable features like IPoIB bonding, but I don't think they should be tied to OFED 1.1. - R. From rdreier at cisco.com Wed Jul 12 12:16:38 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 12 Jul 2006 12:16:38 -0700 Subject: [openib-general] link to the ipoib charter page pointed by the kernel IB docs is broken References: <44B4A128.6010900@voltaire.com> Message-ID: > I see now that the link pointed by drivers/infiniband/ulp/ipoib/Kconfig > and Documentation/infiniband/ipoib is broken, i can find many copies of > it eg http://mirror.switch.ch/ftp/doc/ietf/ipoib/ipoib-charter.txt but > the original one http://www.ietf.org/html.charters/ipoib-charter.html > does not work. Probably best to delete those links now that the ipoib working group has shut down. Feel free to send a patch... - R. 
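Editorial sketch for the "[PATCH] OpenSM: Support configurable SL per partition" thread earlier in this archive, where the question was whether the strtoul() result should be compared against UINT_MAX or ULONG_MAX. Comparing against either constant only catches overflow; strtoul() also returns 0 when no digits were converted, so once sl=0 is accepted, a value such as "abc" would be accepted as well. The helper below is a hypothetical stand-alone illustration (parse_sl is not an OpenSM function) that uses the end pointer and errno instead of a sentinel comparison.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper, not the OpenSM code: parse a service level in 0..15.
 * Returns 0 and stores the value in *sl on success, -1 on invalid input. */
int parse_sl(const char *val, unsigned *sl)
{
    char *end;
    unsigned long v;

    if (!val || *val == '\0')
        return -1;

    errno = 0;
    v = strtoul(val, &end, 0);        /* base 0: decimal, 0x hex, 0 octal */

    if (end == val || *end != '\0')   /* no digits at all, or trailing junk */
        return -1;
    if (errno == ERANGE || v > 15)    /* overflow, or outside the SL range */
        return -1;

    *sl = (unsigned)v;                /* safe: v is known to be 0..15 here */
    return 0;
}
```

Keeping the intermediate value as unsigned long, as Pat suggests, means the range check happens before any narrowing, so the UINT_MAX versus ULONG_MAX question does not arise.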
From venkatesh.babu at 3leafnetworks.com Wed Jul 12 13:20:17 2006 From: venkatesh.babu at 3leafnetworks.com (Venkatesh Babu) Date: Wed, 12 Jul 2006 13:20:17 -0700 Subject: [openib-general] APM: QP migration state change when failover triggered by hw Message-ID: <44B55981.6040408@3leafnetworks.com> According to the IB spec, either the verbs consumer or the CI can initiate the path migration, but it doesn't describe the policy for who should initiate the path migration and doesn't clearly define who should change the state in each case. If the HCA (mthca0 - MT25208) supports automatic path migration when a port error (for example pulling the cable) occurs, who is responsible for changing the QP state from "Armed" to "Migrated"? Is it the CI or the verbs consumer? With the same MT25208 HCA, using the Gen1 code, this APM functionality works fine. When the cable is removed from the port corresponding to the primary path, the CI migrates to the alternate path, changes the QP state from "Armed" to "Migrated", and sends an event IB_QP_PATH_MIGRATED to the verbs consumer. With OFED 1.0, when the cable is removed from the port corresponding to the primary path, the CI sends an event IB_EVENT_PATH_MIG, but it does not change the state to "Migrated" and does not migrate to the alternate path. So the traffic doesn't resume on the alternate path. With OFED 1.0, when the verbs consumer initiates the failover by explicitly changing the state from "Armed" to "Migrated", the QP correctly migrates to the alternate path and traffic resumes on it.
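To make the consumer-initiated case concrete, here is a minimal, self-contained sketch of the "Armed" to "Migrated" transition described above. It is a toy model only: the names model_qp, mig_state and model_force_migration are hypothetical and are not the libibverbs or mthca definitions. With libibverbs, the corresponding consumer-side step would presumably be an ibv_modify_qp() call using the IBV_QP_PATH_MIG_STATE mask with path_mig_state set to IBV_MIG_MIGRATED.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the APM path migration states; these are
 * illustration-only names, not the libibverbs enum. */
enum mig_state { MIG_MIGRATED, MIG_REARM, MIG_ARMED };

/* Hypothetical stand-in for a QP that has an alternate path loaded. */
struct model_qp {
    enum mig_state state;
    int active_path;        /* 0 = primary path, 1 = alternate path */
};

/*
 * Consumer-initiated failover: only legal while the QP is Armed.
 * On success, traffic moves to the alternate path and the state
 * becomes Migrated. Returns 0 on success, -1 if the QP is not Armed.
 */
static int model_force_migration(struct model_qp *qp)
{
    assert(qp != NULL);

    if (qp->state != MIG_ARMED)
        return -1;

    qp->active_path = 1;
    qp->state = MIG_MIGRATED;
    return 0;
}
```

In this model a second request on an already migrated QP is rejected; whether a real CI reports an error in that case is not something this thread establishes.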
VBabu From halr at voltaire.com Wed Jul 12 13:23:41 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Jul 2006 16:23:41 -0400 Subject: [openib-general] [PATCH] [MINOR] OpenSM/SA: Minor reordering of SA rcv_process functions to be more consistent Message-ID: <1152735803.13048.140956.camel@hal.voltaire.com> OpenSM/SA: Minor reordering of SA rcv_process functions to be more consistent Also, some cosmetic changes Signed-off-by: Hal Rosenstock Index: opensm/osm_sa_guidinfo_record.c =================================================================== --- opensm/osm_sa_guidinfo_record.c (revision 8491) +++ opensm/osm_sa_guidinfo_record.c (working copy) @@ -426,6 +426,18 @@ osm_gir_rcv_process( CL_ASSERT( p_rcvd_mad->attr_id == IB_MAD_ATTR_GUIDINFO_RECORD ); + /* we only support SubnAdmGet and SubnAdmGetTable methods */ + if ( (p_rcvd_mad->method != IB_MAD_METHOD_GET) && + (p_rcvd_mad->method != IB_MAD_METHOD_GETTABLE) ) + { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "osm_gir_rcv_process: ERR 5105: " + "Unsupported Method (%s)\n", + ib_get_sa_method_str( p_rcvd_mad->method ) ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); + goto Exit; + } + /* update the requester physical port. 
*/ p_req_physp = osm_get_physp_by_mad_addr(p_rcv->p_log, p_rcv->p_subn, @@ -438,17 +450,6 @@ osm_gir_rcv_process( goto Exit; } - if ( (p_rcvd_mad->method != IB_MAD_METHOD_GET) && - (p_rcvd_mad->method != IB_MAD_METHOD_GETTABLE) ) - { - osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "osm_gir_rcv_process: ERR 5105: " - "Unsupported Method (%s)\n", - ib_get_sa_method_str( p_rcvd_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); - goto Exit; - } - if ( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) osm_dump_guidinfo_record( p_rcv->p_log, p_rcvd_rec, OSM_LOG_DEBUG ); Index: opensm/osm_sa_lft_record.c =================================================================== --- opensm/osm_sa_lft_record.c (revision 8491) +++ opensm/osm_sa_lft_record.c (working copy) @@ -303,8 +303,8 @@ __osm_lftr_rcv_by_comp_mask( **********************************************************************/ void osm_lftr_rcv_process( - IN osm_lftr_rcv_t* const p_rcv, - IN const osm_madw_t* const p_madw ) + IN osm_lftr_rcv_t* const p_rcv, + IN const osm_madw_t* const p_madw ) { const ib_sa_mad_t* p_rcvd_mad; const ib_lft_record_t* p_rcvd_rec; @@ -333,6 +333,17 @@ osm_lftr_rcv_process( CL_ASSERT( p_rcvd_mad->attr_id == IB_MAD_ATTR_LFT_RECORD ); + /* we only support SubnAdmGet and SubnAdmGetTable methods */ + if ( (p_rcvd_mad->method != IB_MAD_METHOD_GET) && + (p_rcvd_mad->method != IB_MAD_METHOD_GETTABLE) ) { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "osm_lftr_rcv_process: ERR 4408: " + "Unsupported Method (%s)\n", + ib_get_sa_method_str( p_rcvd_mad->method ) ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); + goto Exit; + } + /* update the requester physical port. 
*/ p_req_physp = osm_get_physp_by_mad_addr(p_rcv->p_log, p_rcv->p_subn, @@ -345,16 +356,6 @@ osm_lftr_rcv_process( goto Exit; } - if ( (p_rcvd_mad->method != IB_MAD_METHOD_GET) && - (p_rcvd_mad->method != IB_MAD_METHOD_GETTABLE) ) { - osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "osm_lftr_rcv_process: ERR 4408: " - "Unsupported Method (%s)\n", - ib_get_sa_method_str( p_rcvd_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); - goto Exit; - } - cl_qlist_init( &rec_list ); context.p_rcvd_rec = p_rcvd_rec; Index: opensm/osm_sa_link_record.c =================================================================== --- opensm/osm_sa_link_record.c (revision 8491) +++ opensm/osm_sa_link_record.c (working copy) @@ -716,7 +716,9 @@ osm_lr_rcv_process( p_sa_mad = osm_madw_get_sa_mad_ptr( p_madw ); p_lr = (ib_link_record_t*)ib_sa_mad_get_payload_ptr( p_sa_mad ); - /* we only support SubnAdmGet and SubnAdmGetTable Methods */ + CL_ASSERT( p_sa_mad->attr_id == IB_MAD_ATTR_LINK_RECORD ); + + /* we only support SubnAdmGet and SubnAdmGetTable methods */ if ( (p_sa_mad->method != IB_MAD_METHOD_GET) && (p_sa_mad->method != IB_MAD_METHOD_GETTABLE)) { osm_log( p_rcv->p_log, OSM_LOG_ERROR, @@ -727,8 +729,6 @@ osm_lr_rcv_process( goto Exit; } - CL_ASSERT( p_sa_mad->attr_id == IB_MAD_ATTR_LINK_RECORD ); - /* update the requester physical port. 
*/ p_req_physp = osm_get_physp_by_mad_addr(p_rcv->p_log, p_rcv->p_subn, Index: opensm/osm_sa_mcmember_record.c =================================================================== --- opensm/osm_sa_mcmember_record.c (revision 8491) +++ opensm/osm_sa_mcmember_record.c (working copy) @@ -1386,7 +1386,7 @@ osm_mcmr_rcv_create_new_mgrp( Process a request for leaving the group **********************************************************************/ static void -osm_mcmr_rcv_leave_mgrp( +__osm_mcmr_rcv_leave_mgrp( IN osm_mcmr_recv_t* const p_rcv, IN const osm_madw_t* const p_madw ) { @@ -1403,7 +1403,7 @@ osm_mcmr_rcv_leave_mgrp( uint8_t port_join_state; uint8_t new_join_state; - OSM_LOG_ENTER( p_rcv->p_log, osm_mcmr_rcv_leave_mgrp ); + OSM_LOG_ENTER( p_rcv->p_log, __osm_mcmr_rcv_leave_mgrp ); p_mgrp = NULL; p_sa_mad = osm_madw_get_sa_mad_ptr( p_madw ); @@ -1443,7 +1443,7 @@ osm_mcmr_rcv_leave_mgrp( if (new_join_state) { osm_log( p_rcv->p_log, OSM_LOG_DEBUG, - "osm_mcmr_rcv_leave_mgrp: " + "__osm_mcmr_rcv_leave_mgrp: " "After update JoinState != 0. 
Updating from 0x%X to 0x%X\n", port_join_state, new_join_state @@ -1466,7 +1466,7 @@ osm_mcmr_rcv_leave_mgrp( if(status != IB_SUCCESS) { osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "osm_mcmr_rcv_leave_mgrp: ERR 1B09: " + "__osm_mcmr_rcv_leave_mgrp: ERR 1B09: " "osm_sm_mcgrp_leave failed\n" ); } @@ -1479,7 +1479,7 @@ osm_mcmr_rcv_leave_mgrp( { CL_PLOCK_RELEASE( p_rcv->p_lock ); osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "osm_mcmr_rcv_leave_mgrp: ERR 1B25: " + "__osm_mcmr_rcv_leave_mgrp: ERR 1B25: " "Received an invalid delete request on " "MGID: 0x%016" PRIx64 " : " "0x%016" PRIx64 " for " @@ -1499,7 +1499,7 @@ osm_mcmr_rcv_leave_mgrp( { CL_PLOCK_RELEASE( p_rcv->p_lock ); osm_log( p_rcv->p_log, OSM_LOG_DEBUG, - "osm_mcmr_rcv_leave_mgrp: " + "__osm_mcmr_rcv_leave_mgrp: " "Multicast group not present failed\n" ); sa_status = IB_SA_MAD_STATUS_REQ_INVALID; osm_sa_send_error( p_rcv->p_resp, p_madw, sa_status ); @@ -1519,9 +1519,8 @@ osm_mcmr_rcv_leave_mgrp( /********************************************************************** Handle a join (or create) request **********************************************************************/ -static -void -osm_mcmr_rcv_join_mgrp( +static void +__osm_mcmr_rcv_join_mgrp( IN osm_mcmr_recv_t* const p_rcv, IN const osm_madw_t* const p_madw ) { @@ -1542,7 +1541,7 @@ osm_mcmr_rcv_join_mgrp( osm_mcast_req_type_t req_type; uint8_t join_state; - OSM_LOG_ENTER( p_rcv->p_log, osm_mcmr_rcv_join_mgrp ); + OSM_LOG_ENTER( p_rcv->p_log, __osm_mcmr_rcv_join_mgrp ); p_mgrp = NULL; p_sa_mad = osm_madw_get_sa_mad_ptr( p_madw ); @@ -1556,7 +1555,7 @@ osm_mcmr_rcv_join_mgrp( if ( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) { osm_log( p_rcv->p_log, OSM_LOG_DEBUG, - "osm_mcmr_rcv_join_mgrp: " + "__osm_mcmr_rcv_join_mgrp: " "Dump of incoming record\n" ); osm_dump_mc_record( p_rcv->p_log, &mcmember_rec, OSM_LOG_DEBUG ); } @@ -1572,7 +1571,7 @@ osm_mcmr_rcv_join_mgrp( CL_PLOCK_RELEASE( p_rcv->p_lock ); osm_log( p_rcv->p_log, OSM_LOG_DEBUG, - 
"osm_mcmr_rcv_join_mgrp: " + "__osm_mcmr_rcv_join_mgrp: " "Unknown port GUID 0x%016" PRIx64 "\n", portguid ); sa_status = IB_SA_MAD_STATUS_REQ_INVALID; @@ -1598,7 +1597,7 @@ osm_mcmr_rcv_join_mgrp( CL_PLOCK_RELEASE( p_rcv->p_lock ); osm_log( p_rcv->p_log, OSM_LOG_DEBUG, - "osm_mcmr_rcv_join_mgrp: " + "__osm_mcmr_rcv_join_mgrp: " "port and requester don't share pkey\n" ); sa_status = IB_SA_MAD_STATUS_REQ_INVALID; osm_sa_send_error( p_rcv->p_resp, p_madw, sa_status ); @@ -1617,7 +1616,7 @@ osm_mcmr_rcv_join_mgrp( { CL_PLOCK_RELEASE( p_rcv->p_lock ); osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "osm_mcmr_rcv_join_mgrp: ERR 1B10: " + "__osm_mcmr_rcv_join_mgrp: ERR 1B10: " "Provided Join State != FullMember - required for create, " "MGID: 0x%016" PRIx64 " : " "0x%016" PRIx64 "\n", @@ -1651,7 +1650,7 @@ osm_mcmr_rcv_join_mgrp( else { osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "osm_mcmr_rcv_join_mgrp: ERR 1B11: " + "__osm_mcmr_rcv_join_mgrp: ERR 1B11: " "method = %s, " "scope_state = 0x%x, " "component mask = 0x%016" PRIx64 ", " @@ -1715,7 +1714,7 @@ osm_mcmr_rcv_join_mgrp( if (!valid) { osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "osm_mcmr_rcv_join_mgrp: ERR 1B12: " + "__osm_mcmr_rcv_join_mgrp: ERR 1B12: " "__validate_more_comp_fields, __validate_port_caps, " "or JoinState = 0 failed from port 0x%016" PRIx64 ", " "sending IB_SA_MAD_STATUS_REQ_INVALID\n", @@ -1748,7 +1747,7 @@ osm_mcmr_rcv_join_mgrp( if (!valid) { osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "osm_mcmr_rcv_join_mgrp: ERR 1B13: " + "__osm_mcmr_rcv_join_mgrp: ERR 1B13: " "__validate_modify failed, " "sending IB_SA_MAD_STATUS_REQ_INVALID\n" ); @@ -1803,7 +1802,7 @@ osm_mcmr_rcv_join_mgrp( if (status != IB_SUCCESS) { osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "osm_mcmr_rcv_join_mgrp: ERR 1B14: " + "__osm_mcmr_rcv_join_mgrp: ERR 1B14: " "osm_sm_mcgrp_join failed, " "sending IB_SA_MAD_STATUS_NO_RESOURCES\n" ); @@ -2070,15 +2069,15 @@ __osm_sa_mcm_by_comp_mask_cb( 
/********************************************************************** Handle a query request **********************************************************************/ -void -osm_mcmr_query_mgrp(IN osm_mcmr_recv_t* const p_rcv, - IN const osm_madw_t* const p_madw) +static void +__osm_mcmr_query_mgrp(IN osm_mcmr_recv_t* const p_rcv, + IN const osm_madw_t* const p_madw) { - const ib_sa_mad_t* p_rcvd_mad; + const ib_sa_mad_t* p_rcvd_mad; const ib_member_rec_t* p_rcvd_rec; cl_qlist_t rec_list; osm_madw_t* p_resp_madw; - ib_sa_mad_t* p_resp_sa_mad; + ib_sa_mad_t* p_resp_sa_mad; ib_member_rec_t* p_resp_rec; uint32_t num_rec, pre_trim_num_rec; #ifndef VENDOR_RMPP_SUPPORT @@ -2087,23 +2086,17 @@ osm_mcmr_query_mgrp(IN osm_mcmr_recv_t* uint32_t i; osm_sa_mcmr_search_ctxt_t context; osm_mcmr_item_t* p_rec_item; - ib_api_status_t status; + ib_api_status_t status; ib_net64_t comp_mask; osm_physp_t* p_req_physp; boolean_t trusted_req; - CL_ASSERT( p_rcv ); - - OSM_LOG_ENTER( p_rcv->p_log, osm_mcmr_query_mgrp ); - - CL_ASSERT( p_madw ); + OSM_LOG_ENTER( p_rcv->p_log, __osm_mcmr_query_mgrp ); p_rcvd_mad = osm_madw_get_sa_mad_ptr( p_madw ); p_rcvd_rec = (ib_member_rec_t*)ib_sa_mad_get_payload_ptr( p_rcvd_mad ); comp_mask = p_rcvd_mad->comp_mask; - CL_ASSERT( p_rcvd_mad->attr_id == IB_MAD_ATTR_MCMEMBER_RECORD ); - /* if sm_key is not zero and does not match we never get here see main SA receiver @@ -2117,7 +2110,7 @@ osm_mcmr_query_mgrp(IN osm_mcmr_recv_t* if (p_req_physp == NULL) { osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "osm_mcmr_query_mgrp: ERR 1B04: " + "__osm_mcmr_query_mgrp: ERR 1B04: " "Cannot find requester physical port\n" ); goto Exit; } @@ -2149,7 +2142,7 @@ osm_mcmr_query_mgrp(IN osm_mcmr_recv_t* if ( (p_rcvd_mad->method == IB_MAD_METHOD_GET) && (num_rec > 1)) { osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "osm_mcmr_query_mgrp: ERR 1B05: " + "__osm_mcmr_query_mgrp: ERR 1B05: " "Got more than one record for SubnAdmGet (%u)\n", num_rec ); osm_sa_send_error( p_rcv->p_resp, 
p_madw, @@ -2172,7 +2165,7 @@ osm_mcmr_query_mgrp(IN osm_mcmr_recv_t* if (trim_num_rec < num_rec) { osm_log( p_rcv->p_log, OSM_LOG_VERBOSE, - "osm_mcmr_query_mgrp: " + "__osm_mcmr_query_mgrp: " "Number of records:%u trimmed to:%u to fit in one MAD\n", num_rec, trim_num_rec ); num_rec = trim_num_rec; @@ -2180,7 +2173,7 @@ osm_mcmr_query_mgrp(IN osm_mcmr_recv_t* #endif osm_log( p_rcv->p_log, OSM_LOG_DEBUG, - "osm_mcmr_query_mgrp: " + "__osm_mcmr_query_mgrp: " "Returning %u records\n", num_rec ); if ((p_rcvd_mad->method == IB_MAD_METHOD_GET) && (num_rec == 0)) @@ -2200,7 +2193,7 @@ osm_mcmr_query_mgrp(IN osm_mcmr_recv_t* if( !p_resp_madw ) { osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "osm_mcmr_query_mgrp: ERR 1B16: " + "__osm_mcmr_query_mgrp: ERR 1B16: " "osm_mad_pool_get failed\n" ); for( i = 0; i < num_rec; i++ ) @@ -2278,7 +2271,7 @@ osm_mcmr_query_mgrp(IN osm_mcmr_recv_t* if(status != IB_SUCCESS) { osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "osm_mcmr_query_mgrp: ERR 1B17: " + "__osm_mcmr_query_mgrp: ERR 1B17: " "osm_vendor_send status = %s\n", ib_get_err_str(status) ); goto Exit; @@ -2310,6 +2303,8 @@ osm_mcmr_rcv_process( p_recvd_mcmember_rec = (ib_member_rec_t*)ib_sa_mad_get_payload_ptr( p_sa_mad ); + CL_ASSERT( p_sa_mad->attr_id == IB_MAD_ATTR_MCMEMBER_RECORD ); + switch (p_sa_mad->method) { case IB_MAD_METHOD_SET: @@ -2338,7 +2333,7 @@ osm_mcmr_rcv_process( /* * Join or Create Multicast Group */ - osm_mcmr_rcv_join_mgrp(p_rcv, p_madw); + __osm_mcmr_rcv_join_mgrp(p_rcv, p_madw); break; case IB_MAD_METHOD_DELETE: valid = __check_join_comp_mask(p_sa_mad->comp_mask); @@ -2358,14 +2353,14 @@ osm_mcmr_rcv_process( /* * Leave Multicast Group */ - osm_mcmr_rcv_leave_mgrp(p_rcv, p_madw); + __osm_mcmr_rcv_leave_mgrp(p_rcv, p_madw); break; case IB_MAD_METHOD_GET: case IB_MAD_METHOD_GETTABLE: /* * Querying a Multicast Group */ - osm_mcmr_query_mgrp(p_rcv, p_madw); + __osm_mcmr_query_mgrp(p_rcv, p_madw); break; default: osm_log( p_rcv->p_log, OSM_LOG_ERROR, Index: 
opensm/osm_sa_multipath_record.c =================================================================== --- opensm/osm_sa_multipath_record.c (revision 8491) +++ opensm/osm_sa_multipath_record.c (working copy) @@ -1424,17 +1424,6 @@ osm_mpr_rcv_process( CL_ASSERT( p_madw ); - /* update the requester physical port. */ - requester_port = osm_get_port_by_mad_addr( p_rcv->p_log, p_rcv->p_subn, - osm_madw_get_mad_addr_ptr( p_madw ) ); - if ( requester_port == NULL ) - { - osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "osm_mpr_rcv_process: ERR 4517: " - "Cannot find requester physical port\n" ); - goto Exit; - } - p_sa_mad = osm_madw_get_sa_mad_ptr( p_madw ); p_mpr = (ib_multipath_rec_t*)ib_sa_mad_get_payload_ptr( p_sa_mad ); @@ -1449,6 +1438,7 @@ osm_mpr_rcv_process( goto Exit; } + /* we only support SubnAdmGetMulti method */ if ( p_sa_mad->method != IB_MAD_METHOD_GETMULTI ) { osm_log( p_rcv->p_log, OSM_LOG_ERROR, "osm_mpr_rcv_process: ERR 4513: " @@ -1458,6 +1448,17 @@ osm_mpr_rcv_process( goto Exit; } + /* update the requester physical port. 
*/ + requester_port = osm_get_port_by_mad_addr( p_rcv->p_log, p_rcv->p_subn, + osm_madw_get_mad_addr_ptr( p_madw ) ); + if ( requester_port == NULL ) + { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "osm_mpr_rcv_process: ERR 4517: " + "Cannot find requester physical port\n" ); + goto Exit; + } + if ( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) osm_dump_multipath_record( p_rcv->p_log, p_mpr, OSM_LOG_DEBUG ); Index: opensm/osm_sa_node_record.c =================================================================== --- opensm/osm_sa_node_record.c (revision 8491) +++ opensm/osm_sa_node_record.c (working copy) @@ -427,6 +427,17 @@ osm_nr_rcv_process( CL_ASSERT( p_rcvd_mad->attr_id == IB_MAD_ATTR_NODE_RECORD ); + /* we only support SubnAdmGet and SubnAdmGetTable methods */ + if ( (p_rcvd_mad->method != IB_MAD_METHOD_GET) && + (p_rcvd_mad->method != IB_MAD_METHOD_GETTABLE) ) { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "osm_nr_rcv_process: ERR 1D05: " + "Unsupported Method (%s)\n", + ib_get_sa_method_str( p_rcvd_mad->method ) ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); + goto Exit; + } + /* update the requester physical port. 
*/ p_req_physp = osm_get_physp_by_mad_addr(p_rcv->p_log, p_rcv->p_subn, @@ -439,16 +450,6 @@ osm_nr_rcv_process( goto Exit; } - if ( (p_rcvd_mad->method != IB_MAD_METHOD_GET) && - (p_rcvd_mad->method != IB_MAD_METHOD_GETTABLE) ) { - osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "osm_nr_rcv_process: ERR 1D05: " - "Unsupported Method (%s)\n", - ib_get_sa_method_str( p_rcvd_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); - goto Exit; - } - if ( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) osm_dump_node_record( p_rcv->p_log, p_rcvd_rec, OSM_LOG_DEBUG ); Index: opensm/osm_sa_path_record.c =================================================================== --- opensm/osm_sa_path_record.c (revision 8491) +++ opensm/osm_sa_path_record.c (working copy) @@ -1414,7 +1414,7 @@ __osm_pr_match_mgrp_attributes( if( comp_mask & IB_PR_COMPMASK_SL ) { - if( ( p_pr->sl & 0xf ) != sl ) + if( ib_path_rec_sl( p_pr ) != sl ) goto Exit; } @@ -1422,7 +1422,7 @@ __osm_pr_match_mgrp_attributes( if( ( comp_mask & IB_PR_COMPMASK_NUMBPATH ) && ( p_sa_mad->method != IB_MAD_METHOD_GET ) ) { - if( ( p_pr->num_path & 0x7f ) == 0 ) + if( ib_path_rec_num_path( p_pr ) == 0 ) goto Exit; } @@ -1665,22 +1665,12 @@ osm_pr_rcv_process( CL_ASSERT( p_madw ); - /* update the requester physical port. 
*/ - requester_port = osm_get_port_by_mad_addr( p_rcv->p_log, p_rcv->p_subn, - osm_madw_get_mad_addr_ptr( p_madw ) ); - if( requester_port == NULL ) - { - osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "osm_pr_rcv_process: ERR 1F16: " - "Cannot find requester physical port\n" ); - goto Exit; - } - p_sa_mad = osm_madw_get_sa_mad_ptr( p_madw ); p_pr = (ib_path_rec_t*)ib_sa_mad_get_payload_ptr( p_sa_mad ); CL_ASSERT( p_sa_mad->attr_id == IB_MAD_ATTR_PATH_RECORD ); + /* we only support SubnAdmGet and SubnAdmGetTable methods */ if ((p_sa_mad->method != IB_MAD_METHOD_GET) && (p_sa_mad->method != IB_MAD_METHOD_GETTABLE)) { osm_log( p_rcv->p_log, OSM_LOG_ERROR, @@ -1691,6 +1681,17 @@ osm_pr_rcv_process( goto Exit; } + /* update the requester physical port. */ + requester_port = osm_get_port_by_mad_addr( p_rcv->p_log, p_rcv->p_subn, + osm_madw_get_mad_addr_ptr( p_madw ) ); + if( requester_port == NULL ) + { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "osm_pr_rcv_process: ERR 1F16: " + "Cannot find requester physical port\n" ); + goto Exit; + } + if( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) osm_dump_path_record( p_rcv->p_log, p_pr, OSM_LOG_DEBUG ); Index: opensm/osm_sa_pkey_record.c =================================================================== --- opensm/osm_sa_pkey_record.c (revision 8491) +++ opensm/osm_sa_pkey_record.c (working copy) @@ -348,18 +348,7 @@ osm_pkey_rec_rcv_process( CL_ASSERT( p_rcvd_mad->attr_id == IB_MAD_ATTR_PKEY_TBL_RECORD ); - /* update the requester physical port. 
*/ - p_req_physp = osm_get_physp_by_mad_addr(p_rcv->p_log, - p_rcv->p_subn, - osm_madw_get_mad_addr_ptr(p_madw) ); - if (p_req_physp == NULL) - { - osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "osm_pkey_rec_rcv_process: ERR 4604: " - "Cannot find requester physical port\n" ); - goto Exit; - } - + /* we only support SubnAdmGet and SubnAdmGetTable methods */ if ( (p_rcvd_mad->method != IB_MAD_METHOD_GET) && (p_rcvd_mad->method != IB_MAD_METHOD_GETTABLE) ) { @@ -370,7 +359,7 @@ osm_pkey_rec_rcv_process( osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); goto Exit; } - + /* p922 - P_KeyTableRecords shall only be provided in response to trusted requests. @@ -388,6 +377,18 @@ osm_pkey_rec_rcv_process( goto Exit; } + /* update the requester physical port. */ + p_req_physp = osm_get_physp_by_mad_addr(p_rcv->p_log, + p_rcv->p_subn, + osm_madw_get_mad_addr_ptr(p_madw) ); + if (p_req_physp == NULL) + { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "osm_pkey_rec_rcv_process: ERR 4604: " + "Cannot find requester physical port\n" ); + goto Exit; + } + p_pkey = (ib_pkey_table_t*)ib_sa_mad_get_payload_ptr( p_rcvd_mad ); cl_qlist_init( &rec_list ); Index: opensm/osm_sa_portinfo_record.c =================================================================== --- opensm/osm_sa_portinfo_record.c (revision 8491) +++ opensm/osm_sa_portinfo_record.c (working copy) @@ -638,6 +638,18 @@ osm_pir_rcv_process( CL_ASSERT( p_rcvd_mad->attr_id == IB_MAD_ATTR_PORTINFO_RECORD ); + /* we only support SubnAdmGet and SubnAdmGetTable methods */ + if ( (p_rcvd_mad->method != IB_MAD_METHOD_GET) && + (p_rcvd_mad->method != IB_MAD_METHOD_GETTABLE) ) + { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "osm_pir_rcv_process: ERR 2105: " + "Unsupported Method (%s)\n", + ib_get_sa_method_str( p_rcvd_mad->method ) ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); + goto Exit; + } + /* update the requester physical port. 
*/ p_req_physp = osm_get_physp_by_mad_addr(p_rcv->p_log, p_rcv->p_subn, @@ -650,17 +662,6 @@ osm_pir_rcv_process( goto Exit; } - if ( (p_rcvd_mad->method != IB_MAD_METHOD_GET) && - (p_rcvd_mad->method != IB_MAD_METHOD_GETTABLE) ) - { - osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "osm_pir_rcv_process: ERR 2105: " - "Unsupported Method (%s)\n", - ib_get_sa_method_str( p_rcvd_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); - goto Exit; - } - if ( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) osm_dump_portinfo_record( p_rcv->p_log, p_rcvd_rec, OSM_LOG_DEBUG ); Index: opensm/osm_sa_service_record.c =================================================================== --- opensm/osm_sa_service_record.c (revision 8491) +++ opensm/osm_sa_service_record.c (working copy) @@ -795,6 +795,8 @@ osm_sr_rcv_process_get_method( p_recvd_service_rec = (ib_service_record_t*)ib_sa_mad_get_payload_ptr( p_sa_mad ); + CL_ASSERT( p_sa_mad->attr_id == IB_MAD_ATTR_SERVICE_RECORD ); + if( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) { osm_dump_service_record( p_rcv->p_log, Index: opensm/osm_sa_slvl_record.c =================================================================== --- opensm/osm_sa_slvl_record.c (revision 8491) +++ opensm/osm_sa_slvl_record.c (working copy) @@ -333,6 +333,18 @@ osm_slvl_rec_rcv_process( CL_ASSERT( p_rcvd_mad->attr_id == IB_MAD_ATTR_SLVL_RECORD ); + /* we only support SubnAdmGet and SubnAdmGetTable methods */ + if ( (p_rcvd_mad->method != IB_MAD_METHOD_GET) && + (p_rcvd_mad->method != IB_MAD_METHOD_GETTABLE) ) + { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "osm_slvl_rec_rcv_process: ERR 2604: " + "Unsupported Method (%s)\n", + ib_get_sa_method_str( p_rcvd_mad->method ) ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); + goto Exit; + } + /* update the requester physical port. 
*/ p_req_physp = osm_get_physp_by_mad_addr(p_rcv->p_log, p_rcv->p_subn, @@ -345,17 +357,6 @@ osm_slvl_rec_rcv_process( goto Exit; } - if ( (p_rcvd_mad->method != IB_MAD_METHOD_GET) && - (p_rcvd_mad->method != IB_MAD_METHOD_GETTABLE) ) - { - osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "osm_slvl_rec_rcv_process: ERR 2604: " - "Unsupported Method (%s)\n", - ib_get_sa_method_str( p_rcvd_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); - goto Exit; - } - p_slvl_tbl = (ib_slvl_table_t*)ib_sa_mad_get_payload_ptr( p_rcvd_mad ); cl_qlist_init( &rec_list ); Index: opensm/osm_sa_sminfo_record.c =================================================================== --- opensm/osm_sa_sminfo_record.c (revision 8491) +++ opensm/osm_sa_sminfo_record.c (working copy) @@ -149,6 +149,18 @@ osm_smir_rcv_process( CL_ASSERT( p_sa_mad->attr_id == IB_MAD_ATTR_SMINFO_RECORD ); + /* we only support SubnAdmGet and SubnAdmGetTable methods */ + if ( (p_sa_mad->method != IB_MAD_METHOD_GET) && + (p_sa_mad->method != IB_MAD_METHOD_GETTABLE) ) + { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "osm_smir_rcv_process: ERR 2804: " + "Unsupported Method (%s)\n", + ib_get_sa_method_str( p_sa_mad->method ) ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); + goto Exit; + } + /* update the requester physical port. 
*/ p_req_physp = osm_get_physp_by_mad_addr(p_rcv->p_log, p_rcv->p_subn, @@ -161,17 +173,6 @@ osm_smir_rcv_process( goto Exit; } - if ( (p_sa_mad->method != IB_MAD_METHOD_GET) && - (p_sa_mad->method != IB_MAD_METHOD_GETTABLE) ) - { - osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "osm_smir_rcv_process: ERR 2804: " - "Unsupported Method (%s)\n", - ib_get_sa_method_str( p_sa_mad->method ) ); - osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); - goto Exit; - } - if ( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) osm_dump_sm_info_record( p_rcv->p_log, p_sminfo_rec, OSM_LOG_DEBUG ); Index: opensm/osm_sa_vlarb_record.c =================================================================== --- opensm/osm_sa_vlarb_record.c (revision 8491) +++ opensm/osm_sa_vlarb_record.c (working copy) @@ -345,18 +345,6 @@ osm_vlarb_rec_rcv_process( OSM_LOG_ENTER( p_rcv->p_log, osm_vlarb_rec_rcv_process ); - /* update the requester physical port. */ - p_req_physp = osm_get_physp_by_mad_addr(p_rcv->p_log, - p_rcv->p_subn, - osm_madw_get_mad_addr_ptr(p_madw) ); - if (p_req_physp == NULL) - { - osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "osm_vlarb_rec_rcv_process: ERR 2A04: " - "Cannot find requester physical port\n" ); - goto Exit; - } - CL_ASSERT( p_madw ); p_rcvd_mad = osm_madw_get_sa_mad_ptr( p_madw ); @@ -365,6 +353,7 @@ osm_vlarb_rec_rcv_process( CL_ASSERT( p_rcvd_mad->attr_id == IB_MAD_ATTR_VLARB_RECORD ); + /* we only support SubnAdmGet and SubnAdmGetTable methods */ if ( (p_rcvd_mad->method != IB_MAD_METHOD_GET) && (p_rcvd_mad->method != IB_MAD_METHOD_GETTABLE) ) { @@ -376,6 +365,18 @@ osm_vlarb_rec_rcv_process( goto Exit; } + /* update the requester physical port. 
*/ + p_req_physp = osm_get_physp_by_mad_addr(p_rcv->p_log, + p_rcv->p_subn, + osm_madw_get_mad_addr_ptr(p_madw) ); + if (p_req_physp == NULL) + { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "osm_vlarb_rec_rcv_process: ERR 2A04: " + "Cannot find requester physical port\n" ); + goto Exit; + } + p_vl_arb = (ib_vl_arb_table_t*)ib_sa_mad_get_payload_ptr( p_rcvd_mad ); cl_qlist_init( &rec_list ); From ardavis at ichips.intel.com Wed Jul 12 13:36:41 2006 From: ardavis at ichips.intel.com (Arlin Davis) Date: Wed, 12 Jul 2006 13:36:41 -0700 Subject: [openib-general] OFED 1.1 release - schedule and features In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7421@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7421@mtlexch01.mtl.com> Message-ID: <44B55D59.7070605@ichips.intel.com> Tziporet Koren wrote: > > • Core: > > – Set options in CMA & uCMA (needed for Intel MPI) > > – HCA fatal - full flow support > > – Huge pages support > > > • uDAPL: > > – Scalability features needed for Intel MPI – take from trunk > > • Arlin & James – please reply if there are more features needed. > The latest uDAPL from the trunk and uCMA set option support is sufficient. Thanks, -arlin From rdreier at cisco.com Wed Jul 12 13:45:12 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 12 Jul 2006 13:45:12 -0700 Subject: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant References: <44B405C8.4040706@oracle.com> <44B433CE.1030103@oracle.com> <20060712093820.GA9218@elte.hu> Message-ID: Currently, the code in lib/idr.c uses a bare spin_lock(&idp->lock) to do internal locking. This is a nasty trap for code that might call idr functions from different contexts; for example, it seems perfectly reasonable to call idr_get_new() from process context and idr_remove() from interrupt context -- but with the current locking this would lead to a potential deadlock. The simplest fix for this is to just convert the idr locking to use spin_lock_irqsave(). 
In particular, this fixes a very complicated locking issue detected by lockdep, involving the ib_ipoib driver's priv->lock and dev->_xmit_lock, which get involved with the ib_sa module's query_idr.lock. Cc: Arjan van de Ven Cc: Ingo Molnar Cc: Zach Brown , Signed-off-by: Roland Dreier --- diff --git a/lib/idr.c b/lib/idr.c index 4d09681..16d2143 100644 --- a/lib/idr.c +++ b/lib/idr.c @@ -38,14 +38,15 @@ static kmem_cache_t *idr_layer_cache; static struct idr_layer *alloc_layer(struct idr *idp) { struct idr_layer *p; + unsigned long flags; - spin_lock(&idp->lock); + spin_lock_irqsave(&idp->lock, flags); if ((p = idp->id_free)) { idp->id_free = p->ary[0]; idp->id_free_cnt--; p->ary[0] = NULL; } - spin_unlock(&idp->lock); + spin_unlock_irqrestore(&idp->lock, flags); return(p); } @@ -59,12 +60,14 @@ static void __free_layer(struct idr *idp static void free_layer(struct idr *idp, struct idr_layer *p) { + unsigned long flags; + /* * Depends on the return element being zeroed. */ - spin_lock(&idp->lock); + spin_lock_irqsave(&idp->lock, flags); __free_layer(idp, p); - spin_unlock(&idp->lock); + spin_unlock_irqrestore(&idp->lock, flags); } /** @@ -168,6 +171,7 @@ static int idr_get_new_above_int(struct { struct idr_layer *p, *new; int layers, v, id; + unsigned long flags; id = starting_id; build_up: @@ -191,14 +195,14 @@ build_up: * The allocation failed. If we built part of * the structure tear it down. */ - spin_lock(&idp->lock); + spin_lock_irqsave(&idp->lock, flags); for (new = p; p && p != idp->top; new = p) { p = p->ary[0]; new->ary[0] = NULL; new->bitmap = new->count = 0; __free_layer(idp, new); } - spin_unlock(&idp->lock); + spin_unlock_irqrestore(&idp->lock, flags); return -1; } new->ary[0] = p; From mst at mellanox.co.il Wed Jul 12 13:46:01 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 12 Jul 2006 23:46:01 +0300 Subject: [openib-general] OFED 1.1 release - schedule and features In-Reply-To: <44B55D59.7070605@ichips.intel.com> References: <44B55D59.7070605@ichips.intel.com> Message-ID: <20060712204601.GB21725@mellanox.co.il> Quoting r. Arlin Davis : > The latest uDAPL from the trunk and uCMA set option support is sufficient. Which options do you set? Retry/timeout or path as well? -- MST From ardavis at ichips.intel.com Wed Jul 12 13:51:09 2006 From: ardavis at ichips.intel.com (Arlin Davis) Date: Wed, 12 Jul 2006 13:51:09 -0700 Subject: [openib-general] OFED 1.1 release - schedule and features In-Reply-To: <20060712204601.GB21725@mellanox.co.il> References: <44B55D59.7070605@ichips.intel.com> <20060712204601.GB21725@mellanox.co.il> Message-ID: <44B560BD.6070702@ichips.intel.com> Michael S. Tsirkin wrote: >Quoting r. Arlin Davis : > > >>The latest uDAPL from the trunk and uCMA set option support is sufficient. >> >> > >Which options do you set? Retry/timeout or path as well? > > > Just retry/timeout. From ardavis at ichips.intel.com Wed Jul 12 13:52:17 2006 From: ardavis at ichips.intel.com (Arlin Davis) Date: Wed, 12 Jul 2006 13:52:17 -0700 Subject: [openib-general] [Bug 146] OFED-1.0 DAPL fails to build on SLES10 on IA64 with IA64_FETCHADD error In-Reply-To: <44B524CF.1020906@sgi.com> References: <44AD9614.5090907@sgi.com> <44B41E6B.6070703@sgi.com> <44B524CF.1020906@sgi.com> Message-ID: <44B56101.5020105@ichips.intel.com> John Partridge wrote: > >I installed the dapl rpm. I do have libdat.so.1 but I also expect a >symlink to libdat.so which does not exist (Intel MPI appears to need it) > >I also noticed that the dat.conf points to >/usr/local/ofed/lib/libdaplcma.so but there is no symlink in the >/usr/local/ofed/lib directory for it, I do have the libdaplcma.so.1 >am I missing something here ? > > The links should be built during the RPM install. What RPM's are you using to install? Did you modify the dapl rpm? 
-arlin From afriedle at open-mpi.org Wed Jul 12 14:06:00 2006 From: afriedle at open-mpi.org (Andrew Friedley) Date: Wed, 12 Jul 2006 14:06:00 -0700 Subject: [openib-general] multicast Message-ID: <44B56438.2000604@open-mpi.org> I'm trying to understand how the ibverbs multicast API works, but I'm not sure how multicast groups are created. I understand that ibv_attach_mcast() and ibv_detach_mcast() are used to leave/join a particular multicast group, but IB architecture spec indicates a group must be created first. How do I do this? Actually, is there any multicast sample code anywhere? Thanks, Andrew From mingo at elte.hu Wed Jul 12 14:14:43 2006 From: mingo at elte.hu (Ingo Molnar) Date: Wed, 12 Jul 2006 23:14:43 +0200 Subject: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant In-Reply-To: References: <44B405C8.4040706@oracle.com> <44B433CE.1030103@oracle.com> <20060712093820.GA9218@elte.hu> Message-ID: <20060712211443.GA10944@elte.hu> * Roland Dreier wrote: > Currently, the code in lib/idr.c uses a bare spin_lock(&idp->lock) to > do internal locking. This is a nasty trap for code that might call > idr functions from different contexts; for example, it seems perfectly > reasonable to call idr_get_new() from process context and idr_remove() > from interrupt context -- but with the current locking this would lead > to a potential deadlock. > > The simplest fix for this is to just convert the idr locking to use > spin_lock_irqsave(). > > In particular, this fixes a very complicated locking issue detected by > lockdep, involving the ib_ipoib driver's priv->lock and > dev->_xmit_lock, which get involved with the ib_sa module's > query_idr.lock. 
> > Cc: Arjan van de Ven > Cc: Ingo Molnar Acked-by: Ingo Molnar Ingo From mshefty at ichips.intel.com Wed Jul 12 14:23:55 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 12 Jul 2006 14:23:55 -0700 Subject: [openib-general] multicast In-Reply-To: <44B56438.2000604@open-mpi.org> References: <44B56438.2000604@open-mpi.org> Message-ID: <44B5686B.8000708@ichips.intel.com> Andrew Friedley wrote: > I'm trying to understand how the ibverbs multicast API works, but I'm > not sure how multicast groups are created. I understand that > ibv_attach_mcast() and ibv_detach_mcast() are used to leave/join a > particular multicast group, but IB architecture spec indicates a group > must be created first. How do I do this? > > Actually, is there any multicast sample code anywhere? Before you can call ibv_attach_mcast(), you must first join the multicast group by sending a multicast join request to the SA. The join request may create the multicast group if it doesn't already exist, and the request has the necessary information. You can use the rdma_join_multicast() API in the rdma_cm to join a multicast group and attach the group to a QP. If the group has not already been created, it will be created for you, with the group parameters based on that used by ipoib. The rdma_join_multicast() call will invoke ibv_attach_mcast() for you once the join operation completes. There is sample code for joining multicast groups using this API in userspace/librdmacm/examples/mckey.c. If you want more details on multicast group creation, you'll need to see the spec, section 12.2.5.17. Code to create / join a multicast group can be found in the kernel cma.c module - see cma_join_ib_multicast(). You may want to see if rdma_join_multicast() meets your needs first. If it does not, please let me know what additional functionality you need. 
- Sean From halr at voltaire.com Wed Jul 12 14:49:56 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Jul 2006 17:49:56 -0400 Subject: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1 In-Reply-To: <000001c6a5dc$ca72ff00$ff0da8c0@amr.corp.intel.com> References: <000001c6a5dc$ca72ff00$ff0da8c0@amr.corp.intel.com> Message-ID: <1152740984.13048.144995.camel@hal.voltaire.com> On Wed, 2006-07-12 at 13:58, Sean Hefty wrote: > >> I was starting / stopping openSM on different systems soon before running the > >> tests. > > > >Not sure I quite understand the sequencing. > > I was being somewhat random, just trying to stress things. > How quickly will one SM take over for another after one dies? With the default sminfo_polling_timeout of 10 seconds and default polling_retry_number of 4, so the total handoff time should be around 40 seconds. I just did that experiment with 2 SMs and saw that as well. > >Can you run with -V and send me the output ? I want to recreate this so > >I understand what is going on. > > I'm having trouble re-creating the error at the moment, but I isolated my test > systems from our larger cluster. I will need to reconnect to the cluster and > see if I can cause the error again. That's another difference. I've never run osmtest in a large subnet. -- Hal > - Sean From krause at cup.hp.com Wed Jul 12 14:59:25 2006 From: krause at cup.hp.com (Michael Krause) Date: Wed, 12 Jul 2006 14:59:25 -0700 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 release - schedule and features In-Reply-To: <6DB5B58A8E5AB846A7B3B3BFF1B4315A2BDF88@AVEXCH1.qlogic.org> References: <6DB5B58A8E5AB846A7B3B3BFF1B4315A2BDF88@AVEXCH1.qlogic.org> Message-ID: <6.2.0.14.2.20060712144615.1008f360@esmail.cup.hp.com> At 09:48 AM 7/12/2006, Jeff Broughton wrote: >Mike, > >The whole purpose of SDP is to make sockets go faster without having to >have the applications modified. This is what the customers want. 
I've >heard this time and time again, across a wide spectrum of customers. I am well aware of this. However, Linux / Unix do not support async communications which severely limits the potential performance benefits of SDP. When we wrote the SDP specification it was fully understood that optimal performance is achieved through async communications. We spent considerable time constructing SDP to support both synchronous and asynchronous communication paradigms which there are many applications that would benefit. Customers want to be able to use RDMA interconnects without recompilation and through the use of SDP and shared libraries this is certainly practical to execute. Developers however are not the same as customers and it is developers who would benefit from the Sockets extensions and this would in turn benefit customers. >Modifying the sockets API is just defining yet another RDMA API, and we >have so many already.... I disagree. This effort has distilled the API to basically one for RDMA developers. Applications are supported over this via either MPI or Sockets. It seems rather self limiting to think the traditional BSD synchronous Sockets API is all the world should be able to use when it comes to Sockets. Sockets developers could easily incorporate the extensions into their applications providing them with improved designs and flexibility without having to learn about RDMA itself. If the couple of calls necessary to extend this API to support direct RDMA would allow them to eliminate SDP entirely, well, that has benefits that go beyond just its all Sockets; it also eliminates the IP cloud that hovers over SDP licensing. Something that many developers and customers would appreciate. 
In the end, this effort could choose to progress Sockets technology and extend the number of developers and applications that can achieve optimal performance with only minor knowledge growth or they can live with the limitations of the BSD Sockets API and either accept performance loss or be forced to jump through the hoops of using other rather niche or obscure API to accomplish what is possible with a small number of Sockets extensions which were defined by people with years of experience implementing Sockets and working with application developers. Mike > >-Jeff > > >---------- >From: openfabrics-ewg-bounces at openib.org >[mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Michael Krause >Sent: Wednesday, July 12, 2006 9:23 AM >To: Tziporet Koren; Scott Weitzenkamp (sweitzen) >Cc: OpenFabricsEWG; openib >Subject: Re: [openfabrics-ewg] [openib-general] OFED 1.1 release - >schedule and features > >At 12:59 AM 7/12/2006, Tziporet Koren wrote: >>Scott Weitzenkamp (sweitzen) wrote: >> > For SDP, I would like to see "improved stability" (maybe you have this >> > in mind under "beta quality"), also how about "AIO support"? The rest >> > of the list looks good. >> > >>Yes - beta quality means improved stability. >>AIO is not planed for 1.1 (schedule issue). If needed we can add it to 1.2 > >Would be nice if people thought about implementing the Sockets API >Extensions from the OpenGroup. They provide explicit memory management >and async communications which will allow SDP performance to be fully >exploited. The benefits go beyond what is found in AIO or on other OS >such as Windows. If one were to extend slightly to have explicit RDMA >Read and Write from the Sockets API, then it would be quite possible to >eliminate SDP entirely for new applications leaving SDP strictly for >legacy Sockets environments. 
>
>Mike
>
>
>>Tziporet

From mshefty at ichips.intel.com Wed Jul 12 15:36:23 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 12 Jul 2006 15:36:23 -0700
Subject: [openib-general] openSM failover / failback issue?
In-Reply-To: <1152740984.13048.144995.camel@hal.voltaire.com>
References: <000001c6a5dc$ca72ff00$ff0da8c0@amr.corp.intel.com> <1152740984.13048.144995.camel@hal.voltaire.com>
Message-ID: <44B57967.1030706@ichips.intel.com>

Hal Rosenstock wrote:
> With the default sminfo_polling_timeout of 10 seconds and default
> polling_retry_number of 4, so the total handoff time should be around 40
> seconds. I just did that experiment with 2 SMs and saw that as well.

Okay - I narrowed down the test case to something reproducible.

I have 2 systems connected into Woody's cluster. I'm not sure how many systems
are in Woody's cluster, but probably around 8. OpenSM is running on one of the
systems in the cluster. If I run osmtest from either of my two systems, it
works fine.

If I start openSM on one of my systems, it becomes the master SM. The LIDs on
my systems are reassigned. If I run osmtest from either of my two systems, it
still works fine.

If I kill openSM on my system, then run osmtest -f c, I get a failure: Error on
query (IB_TIMEOUT). It looks like a CLASS_PORT_INFO query, but the query is
going to my now dead opensm system.

At this point, if I unload / reload ib_mthca on either of my systems, Woody's SM
kicks in, reassigns my systems' LIDs, and osmtest starts working again.

I don't know if this is an HCA firmware issue, switch issue, or openSM issue.
I don't think it's related to my changes or osmtest at this point.
- Sean

From ftillier at silverstorm.com Wed Jul 12 15:49:07 2006
From: ftillier at silverstorm.com (Fabian Tillier)
Date: Wed, 12 Jul 2006 18:49:07 -0400
Subject: [openib-general] [openfabrics-ewg] OFED 1.1 release - schedule and features
In-Reply-To: <6.2.0.14.2.20060712144615.1008f360@esmail.cup.hp.com>
References: <6DB5B58A8E5AB846A7B3B3BFF1B4315A2BDF88@AVEXCH1.qlogic.org> <6.2.0.14.2.20060712144615.1008f360@esmail.cup.hp.com>
Message-ID: <79ae2f320607121549h24e3112ak7de9a9c50005bb9f@mail.gmail.com>

Hi Mike,

On 7/12/06, Michael Krause wrote:
>
> At 09:48 AM 7/12/2006, Jeff Broughton wrote:
>
>> Modifying the sockets API is just defining yet another RDMA API, and we have
>> so many already....
>
> I disagree. This effort has distilled the API to basically one for RDMA
> developers. Applications are supported over this via either MPI or Sockets.

There's been a lot of effort to make the RDMA verbs easy to use. With
the RDMA CM, socket-like connection semantics can be used to establish
the connection between QPs. The connection establishment is the hard
part - doing I/O is trivial in comparison.

The verbs and RDMA CM have nothing to do with MPI. If an application
is going to be RDMA aware, I don't see any reason it shouldn't just use
the verbs directly and use the RDMA CM to establish the connections.

> It seems rather self limiting to think the traditional BSD synchronous
> Sockets API is all the world should be able to use when it comes to Sockets.
> Sockets developers could easily incorporate the extensions into their
> applications providing them with improved designs and flexibility without
> having to learn about RDMA itself.

Wait, you want applications to be able to register memory and issue
RDMA operations, but not have to learn about RDMA? How does that make
sense?

> If the couple of calls necessary to > extend this API to support direct RDMA would allow them to eliminate SDP > entirely, well, that has benefits that go beyond just its all Sockets; For a socket implementation to support RDMA, the socket must have an underlying RDMA QP. This means that if you want the application to not have to be verbs-aware, you can't really get rid of SDP - you're just extending SDP to let the application have a part in memory registration and RDMA, while still supporting the traditional BSD operations. This is IMO more complex than just letting applications interface directly with verbs, especially since the SDP implementation will size the QP for its own use, without a means for negotiating with the user so that you don't cause buffer overruns. > it also eliminates the IP cloud that hovers over SDP licensing. Something > that many developers and customers would appreciate. I believe that Microsoft's IP claims only apply to SDP over IB -- I don't believe SDP over iWarp is affected. I don't know how the RDMA verbs moving towards a hardware independent (wrt IB vs. iWarp) affects the IP claims, but it should certainly make things interesting if a single SDP code base can work over both IB and iWarp. - Fab From ralphc at pathscale.com Wed Jul 12 16:29:27 2006 From: ralphc at pathscale.com (Ralph Campbell) Date: Wed, 12 Jul 2006 16:29:27 -0700 Subject: [openib-general] Suggestions for how to remove bus_to_virt() Message-ID: <1152746967.4572.263.camel@brick.pathscale.com> I have been looking at how to eliminate the bus_to_virt() and phys_to_virt() calls used by the ib_ipath driver. I am looking for suggestions on how to proceed. The current IB core to IB device driver interface relies on a kernel module being able to call ib_get_dma_mr() to allocate a memory region which represents all of device addressable memory. The kernel module is then expected to call dma_map_single(), dma_map_sg(), etc. 
to convert physical or virtual addresses into device addresses. If the system has an IOMMU, there may be several physical pages mapped to a single contiguous device address region. This device address and length (possibly an array of them) is then passed to the IB device driver so the IB device can DMA data to or from memory. The ib_ipath driver cannot tell the HW to DMA data directly to the device (IOMMU) addresses and must copy the data. This means the driver needs to reverse the IOMMU mapping and somehow obtain kernel virtual addresses so it can memcpy() the data to the correct location. Currently, the ib_ipath driver requires that the mapping be one-to-one since there is no practical way to reverse IOMMU mappings. I believe it is generally agreed that trying to change the dma_map_* interface to include functions of this sort is not the right approach to take. One solution is to change the IB device driver interface so that kernel virtual addresses are passed to the IB device driver and the device driver is responsible for calling dma_map_single(), etc. I believe this will be unacceptable to the OpenFabrics community since it would require the driver to allocate large amounts of memory (#QPs * #MaxWRs * sizeof(dma_addr_t + length)) to store the information needed to undo the mapping when the DMA is complete. The current IB code allocates the storage for dma_unmap_single(), etc. as extra elements in structures already needed so it isn't a large overhead and it is based on the actual number of requests posted instead of the maximums allowed. Another solution is to change the IB device driver interface to add a function which tells the caller what type of addresses the device expects. Kernel modules would then be required to pass either a dma_map_xxx() address or a kernel virtual address based on the driver's preference. 
The current set of IB consumers either start with kmalloc/vmalloc memory (such as the MAD layer) or a list of physical pages (such as ISER and SRP). The current code could therefore be fairly easily changed except for ISER/SRP when a struct page doesn't have a direct kernel address (high pages) and would need to call kmap()/kunmap() in that case. I plan to implement this last approach unless someone has a better idea. I would like to get some "buy-in" before I spend a lot time coding only to be rejected when finished. From ravinandan.arakali at neterion.com Wed Jul 12 16:39:08 2006 From: ravinandan.arakali at neterion.com (Ravinandan Arakali) Date: Wed, 12 Jul 2006 16:39:08 -0700 Subject: [openib-general] ping problem with ammasso cards(iWARP interface) In-Reply-To: <44B531C7.9070302@in.ibm.com> Message-ID: <001601c6a60c$5fa64b70$4710100a@pc.s2io.com> Also, I am trying to run some of the iwarp bandwidth/latency tests (available under directory perftest). The first thing to do here is to run opensm. When I run opensm (with debug level 10), I get the following error. Any idea what needs to be done to get this working ? 

openfab2:/tmp/ib/src/userspace # opensm -d 10
-------------------------------------------------
OpenSM Rev:openib-1.2.0
Command Line Arguments:
 d level = 0xa
 Log File: /var/log/osm.log
-------------------------------------------------
OpenSM Rev:openib-1.2.0

Using default GUID 0x0
Error: Could not get port guid
Exiting SM

openfab2:/tmp/ib/src/userspace # cat /var/log/osm.log
Jul 12 08:35:04 718914 [B7E518C0] -> OpenSM Rev:openib-1.2.0
Jul 12 08:35:04 719111 [0000] -> OpenSM Rev:openib-1.2.0

Jul 12 08:35:04 721381 [B7E518C0] -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind
Jul 12 08:35:04 721702 [0000] -> Exiting SM

-----Original Message-----
From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com]
Sent: Wednesday, July 12, 2006 10:31 AM
To: Ravinandan Arakali
Cc: openib-general at openib.org
Subject: Re: [openib-general] ping problem with ammasso cards(iWARP interface)

Ravinandan,
Do you still see the rping crash?

Thanks,
Pradipta Kumar.

Ravinandan Arakali wrote:
> Pradipta,
> Okay, thanks.. Initially, I was not sure since I don't remember non-zero
> values in /proc/krping. When I re-ran the krping test, I see following
> output
> openfab2:~ # cat /proc/krping
> 1-amso0 891376 55711 891376 55711 1782720 27855 1782784 27856
>
> As you mentioned, the RDMA traffic seems to be flowing indeed !
> Any idea why rping is dumping core ?
>
> Has any testing been done using SDP with ammasso cards ?
>
> Regards,
> Ravi
>
>
> -----Original Message-----
> From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com]
> Sent: Friday, July 07, 2006 11:20 PM
> To: Ravinandan Arakali
> Cc: Leonid. Grossman (E-mail); sivakumar.subramani at neterion.com;
> openib-general at openib.org
> Subject: Re: [openib-general] ping problem with ammasso cards(iWARP
> interface)
>
>
> Ravinandan Arakali wrote:
>> Pradipta,
>> Following is the output from gdb after core dump. I have also copy-pasted
>> >> Attached is the dmesg output when krping test is run in verbose mode. >> The ping data on the sender(client) seems okay. The content is shifted >> forward by one character for each packet. On receiver, after receiving > ping >> pkt 9, it seems to jump to pkt no. 1935. Not sure if it's because messages >> can be lost during writing to /var/log/messages ? > krping is indeed working!!...Using 'verbose' allows you to see the ping > data. > When not using 'verbose' you see only 'send/recv' messages. >> ----------------------------------------- >> (gdb) run -s -vV -C100 -d -a 0.0.0.0 -p 9999 >> Starting program: >> /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -s -vV -C100 -d -a >> 0.0.0.0 -p 9999 >> [Thread debugging using libthread_db enabled] >> [New Thread -1210054992 (LWP 3668)] >> ipaddr (0.0.0.0) >> port 9999 >> created cm_id 0x804e6e0 >> [New Thread -1210057824 (LWP 3671)] >> rdma_bind_addr successful >> rdma_listen >> cma_event type 4 cma_id 0x804e968 (child) >> child cma 0x804e968 >> >> Program received signal SIGSEGV, Segmentation fault. 
>> [Switching to Thread -1210054992 (LWP 3668)]
>> rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514
>> 514 cb->pd = ibv_alloc_pd(cm_id->verbs);
>> (gdb) bt
>> #0 rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514
>> #1 0x0804a716 in main (argc=9, argv=Cannot access memory at address 0x6
>> ) at examples/rping.c:767
>> (gdb)
>>
>> ---------------------------------
>> (gdb) run -c -vV -C100 -d -a 17.2.2.102 -p 9999
>> Starting program:
>> /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -c -vV -C100 -d -a
>> 17.2.2.102 -p 9999
>> [Thread debugging using libthread_db enabled]
>> [New Thread 47388824908032 (LWP 4620)]
>> ipaddr (17.2.2.102)
>> port 9999
>> created cm_id 0x506b00
>> [New Thread 1082132800 (LWP 4623)]
>> cma_event type 0 cma_id 0x506b00 (parent)
>> cma_event type 2 cma_id 0x506b00 (parent)
>> rdma_resolve_addr - rdma_resolve_route successful
>> created pd 0x506e60
>> created channel 0x506e80
>> created cq 0x506ea0
>> created qp 0x506f40
>> rping_setup_buffers called on cb 0x505010
>> allocated & registered buffers...
>> [New Thread 1090525504 (LWP 4624)]
>> cq_thread started.
>>
>>
>
>

From davem at davemloft.net Wed Jul 12 16:40:48 2006
From: davem at davemloft.net (David Miller)
Date: Wed, 12 Jul 2006 16:40:48 -0700 (PDT)
Subject: [openib-general] Suggestions for how to remove bus_to_virt()
In-Reply-To: <1152746967.4572.263.camel@brick.pathscale.com>
References: <1152746967.4572.263.camel@brick.pathscale.com>
Message-ID: <20060712.164048.51855175.davem@davemloft.net>

From: Ralph Campbell
Date: Wed, 12 Jul 2006 16:29:27 -0700

> Currently, the ib_ipath driver requires that the mapping be
> one-to-one since there is no practical way to reverse IOMMU
> mappings.

You can maintain a hash table that maps DMA addresses back to kernel
mappings. Depending upon your situation, you can optimize this to use
very small keys if you have some kind of other identification method
for your buffers.

That would be for dynamic mappings. If you were using consistent DMA
memory, which I gather you're not, you could use the PCI DMA pool
mechanism.

From afriedle at open-mpi.org Wed Jul 12 17:01:41 2006
From: afriedle at open-mpi.org (Andrew Friedley)
Date: Wed, 12 Jul 2006 17:01:41 -0700
Subject: [openib-general] multicast
In-Reply-To: <44B5686B.8000708@ichips.intel.com>
References: <44B56438.2000604@open-mpi.org> <44B5686B.8000708@ichips.intel.com>
Message-ID: <44B58D65.9000607@open-mpi.org>

Sean Hefty wrote:
> Andrew Friedley wrote:
>
>> I'm trying to understand how the ibverbs multicast API works, but I'm
>> not sure how multicast groups are created. I understand that
>> ibv_attach_mcast() and ibv_detach_mcast() are used to leave/join a
>> particular multicast group, but IB architecture spec indicates a group
>> must be created first. How do I do this?
>>
>> Actually, is there any multicast sample code anywhere?
>
>
> Before you can call ibv_attach_mcast(), you must first join the
> multicast group by sending a multicast join request to the SA. The join
> request may create the multicast group if it doesn't already exist, and
> the request has the necessary information.
>
> You can use the rdma_join_multicast() API in the rdma_cm to join a
> multicast group and attach the group to a QP. If the group has not
> already been created, it will be created for you, with the group
> parameters based on that used by ipoib. The rdma_join_multicast() call
> will invoke ibv_attach_mcast() for you once the join operation
> completes. There is sample code for joining multicast groups using this
> API in userspace/librdmacm/examples/mckey.c.

Thanks for the explanation/pointer. This looks like it will work,
though I have a concern I'll discuss below.

> If you want more details on multicast group creation, you'll need to see > the spec, section 12.2.5.17. Code to create / join a multicast group > can be found in the kernel cma.c module - see cma_join_ib_multicast(). > > You may want to see if rdma_join_multicast() meets your needs first. If > it does not, please let me know what additional functionality you need. I'm concerned about how rdma_cm abstracts HCAs. It looks like I can use the src_addr argument to rdma_resolve_addr() to select which IP address/HCA (assuming one IP per HCA), but how can I enumerate the available HCAs? This is important for a number of reasons - one, so that I can pass on the available IP addresses to MPI peers out of band. It's also important to know which HCA's are available in the system, and to be able to select which HCA to use when connecting to a peer. This allows us to implement things like load balancing and failover. Matt Leininger suggested looking at the IB CM as an alternative, as it gives more low-level control. Am I missing something, or does the IB CM not handle multicast like the RDMA CM? Andrew From rdreier at cisco.com Wed Jul 12 17:11:26 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 12 Jul 2006 17:11:26 -0700 Subject: [openib-general] Suggestions for how to remove bus_to_virt() In-Reply-To: <1152746967.4572.263.camel@brick.pathscale.com> (Ralph Campbell's message of "Wed, 12 Jul 2006 16:29:27 -0700") References: <1152746967.4572.263.camel@brick.pathscale.com> Message-ID: > One solution is to change the IB device driver interface so that > kernel virtual addresses are passed to the IB device driver and > the device driver is responsible for calling dma_map_single(), etc. > I believe this will be unacceptable to the OpenFabrics community Actually it's worse than unacceptable -- I don't see how this can work at all. 
The problem is that addresses are not just passed directly to the local HCA; they also might be embedded in protocol messages that are sent to a remote HCA and then used by the remote HCA to initiate RDMA. For example, the SRP driver uses ib_get_dma_mr() to get an R_Key, which it then sends to the target along with a DMA address. The target uses that R_Key/address to RDMA data directly to or from the host. There's no good way for the low-level driver to handle the DMA mapping, since the address is embedded in a protocol message that the low-level driver knows nothing about. > Another solution is to change the IB device driver interface to add > a function which tells the caller what type of addresses the device > expects. Kernel modules would then be required to pass either a > dma_map_xxx() address or a kernel virtual address based on the > driver's preference. > The current set of IB consumers either start with kmalloc/vmalloc > memory (such as the MAD layer) or a list of physical pages > (such as ISER and SRP). The current code could therefore be > fairly easily changed except for ISER/SRP when a struct page > doesn't have a direct kernel address (high pages) and would > need to call kmap()/kunmap() in that case. I have a few problems with this: first, it's unfortunate that every consumer needs two code paths to handle the two possibilities; second, I don't see how it handles the case of SRP's use of the ib_get_dma_mr() R_Key as above anyway; third, expecting consumers to kmap pages for a long time across work request execution is a bad idea. Maybe the least bad solution would be to add rdma_XXX wrappers around the dma mapping functions that RDMA consumers use; then most low-level drivers could just pass them through to the DMA mapping API, while the ipath driver could handle things itself. 
The problem with that is that it ends up wrapping a huge API -- for example, you need both dma_map_single and dma_map_sg at least, plus someone might want to use dma_alloc_coherent memory, not to mention the dma_pool stuff, etc. A cleaner solution would be to make the dma_ API really use the device it's passed anyway, and allow drivers to override the standard PCI stuff nicely. But that would be major surgery, I guess. (BTW, vmalloc memory should not be used for DMA, since it's not guaranteed to be DMA-able -- so anyone doing that is just wrong) - R. From davem at davemloft.net Wed Jul 12 17:40:13 2006 From: davem at davemloft.net (David Miller) Date: Wed, 12 Jul 2006 17:40:13 -0700 (PDT) Subject: [openib-general] Suggestions for how to remove bus_to_virt() In-Reply-To: References: <1152746967.4572.263.camel@brick.pathscale.com> Message-ID: <20060712.174013.95062313.davem@davemloft.net> From: Roland Dreier Date: Wed, 12 Jul 2006 17:11:26 -0700 > A cleaner solution would be to make the dma_ API really use the device > it's passed anyway, and allow drivers to override the standard PCI > stuff nicely. But that would be major surgery, I guess. Clean but expensive, you should not force the rest of the kernel to eat the cost of something you want to do when it's totally unnecessary for most other users. For example, x86 never needs to do anything other than a direct virt_to_phys translation to produce a DMA address, no matter what bus the device is on. It's a single simple integer adjustment that can be done inline in about 2 or 3 instructions at most. Once you start allowing overrides then even x86 starts to eat the stupid costs of dereferencing some kind of device ops method. That doesn't make any sense, and that's why the DMA API works the way it does now. It's a platform or bus operation, not a device one. If you need device level DMA mapping semantics, create them for your device type. This is what USB does, btw. 
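
A minimal userspace sketch of the idea raised earlier in this thread, where
a driver keeps its own table that maps DMA addresses back to kernel
mappings. This is only an illustration under stated assumptions, not code
from ib_ipath or any other driver: the type model_dma_addr_t and the
revmap_* names are hypothetical, and lookups are assumed to use exactly the
address that was returned when the buffer was mapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's dma_addr_t. */
typedef uint64_t model_dma_addr_t;

#define REVMAP_BUCKETS 256	/* must be a power of two */

struct revmap_entry {
	model_dma_addr_t dma;		/* key: address handed to the device */
	void *vaddr;			/* value: CPU address of the same buffer */
	size_t len;			/* length of the mapped region */
	struct revmap_entry *next;	/* chain within one bucket */
};

struct dma_revmap {
	struct revmap_entry *bucket[REVMAP_BUCKETS];
};

static unsigned revmap_hash(model_dma_addr_t dma)
{
	/* Buffers tend to be aligned, so fold higher bits into the index. */
	return (unsigned)((dma >> 6) ^ (dma >> 14)) & (REVMAP_BUCKETS - 1);
}

/* Record a mapping. Returns 0 on success, -1 if allocation fails. */
int revmap_insert(struct dma_revmap *m, model_dma_addr_t dma,
		  void *vaddr, size_t len)
{
	struct revmap_entry *e;
	unsigned h;

	assert(m != NULL);
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	h = revmap_hash(dma);
	e->dma = dma;
	e->vaddr = vaddr;
	e->len = len;
	e->next = m->bucket[h];
	m->bucket[h] = e;
	return 0;
}

/* Return the CPU address recorded for dma, or NULL if none is recorded. */
void *revmap_lookup(const struct dma_revmap *m, model_dma_addr_t dma)
{
	const struct revmap_entry *e;

	assert(m != NULL);
	for (e = m->bucket[revmap_hash(dma)]; e; e = e->next)
		if (e->dma == dma)
			return e->vaddr;
	return NULL;
}

/* Forget a mapping. Returns 0 if it was found and removed, -1 otherwise. */
int revmap_remove(struct dma_revmap *m, model_dma_addr_t dma)
{
	struct revmap_entry **pp;

	assert(m != NULL);
	pp = &m->bucket[revmap_hash(dma)];
	while (*pp) {
		if ((*pp)->dma == dma) {
			struct revmap_entry *dead = *pp;

			*pp = dead->next;
			free(dead);
			return 0;
		}
		pp = &(*pp)->next;
	}
	return -1;
}
```

In a real driver the insert and remove calls would presumably sit next to
the map and unmap operations, and the table would need locking that is
safe in whatever contexts post and complete work requests.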

From sashak at voltaire.com Wed Jul 12 17:45:37 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Thu, 13 Jul 2006 03:45:37 +0300
Subject: [openib-general] [PATCH] OpenSM: Support configurable SL per partition
In-Reply-To: <44B3C059020000C700012B8C@sinclair.provo.novell.com>
References: <44B3C059020000C700012B8C@sinclair.provo.novell.com>
Message-ID: <20060713004537.GF17182@sashak.voltaire.com>

Hi Pat,

On 15:14 Tue 11 Jul , Patrick Mullaney wrote:
> This will avoid an invalid warning about service level value if sl=0 is
> used in the partition config file.

Yes, this is a wrong warning, but the original goal of this check was to
catch a non-numeric string. I think something like this may be better:

diff --git a/osm/opensm/osm_prtn_config.c b/osm/opensm/osm_prtn_config.c
index bd15513..5647ca8 100644
--- a/osm/opensm/osm_prtn_config.c
+++ b/osm/opensm/osm_prtn_config.c
@@ -150,7 +150,9 @@ static int partition_add_flag(unsigned l
 				  " - skipped.\n");
 	} else if (!strncmp(flag, "sl", len)) {
 		unsigned sl;
-		if (!val || (sl = strtoul(val, NULL, 0)) == 0 || sl > 15)
+		char *end;
+		if (!val || !*val || (sl = strtoul(val, &end, 0)) > 15 ||
+		    (*end && !isspace(*end)))
 			PARSEWARN(conf->p_log, lineno,
 				  "flag \'sl\' requires valid value"
 				  " - skipped.\n");

The overflow case should be filtered by the sl > 15 check.

Sasha

> Can you include something like it in
> your original patch?
>
> Thanks.
> Pat
>
> --------------------------------------------------------------------------------------------------
>
> --- opensm.old/osm_prtn_config.c 2006-07-11 16:06:33.000000000
> -0400
> +++ opensm/osm_prtn_config.c 2006-07-11 16:08:00.000000000 -0400
> @@ -52,6 +52,7 @@
> #include
> #include
> #include
> +#include
>
> #include
> #include
> @@ -159,7 +160,7 @@ static int partition_add_flag(unsigned l
> } else if (!strncmp(flag, "sl", len)) {
> unsigned sl;
>
> - if (!val || (sl = strtoul(val, NULL, 0)) == 0 || sl >
> 15)
> + if (!val || (sl = strtoul(val, NULL, 0)) == UINT_MAX ||
> sl > 15)
> PARSEWARN(conf->p_log, lineno,
> "flag \'sl\' requires valid value"
> " - skipped.\n");
>
>

From akpm at osdl.org Wed Jul 12 18:30:49 2006
From: akpm at osdl.org (Andrew Morton)
Date: Wed, 12 Jul 2006 18:30:49 -0700
Subject: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant
In-Reply-To:
References: <44B405C8.4040706@oracle.com> <44B433CE.1030103@oracle.com> <20060712093820.GA9218@elte.hu>
Message-ID: <20060712183049.bcb6c404.akpm@osdl.org>

On Wed, 12 Jul 2006 13:45:12 -0700
Roland Dreier wrote:

> Currently, the code in lib/idr.c uses a bare spin_lock(&idp->lock) to
> do internal locking. This is a nasty trap for code that might call
> idr functions from different contexts; for example, it seems perfectly
> reasonable to call idr_get_new() from process context and idr_remove()
> from interrupt context -- but with the current locking this would lead
> to a potential deadlock.
>
> The simplest fix for this is to just convert the idr locking to use
> spin_lock_irqsave().
> > In particular, this fixes a very complicated locking issue detected by > lockdep, involving the ib_ipoib driver's priv->lock and dev->_xmit_lock, > which get involved with the ib_sa module's query_idr.lock. Sigh. It was always a mistake (of the kernel programming 101 type) to put any locking at all in the idr code. At some stage we need to weed it all out and move it to callers. Your fix is yet more fallout from that mistake. From halr at voltaire.com Wed Jul 12 18:45:47 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Jul 2006 21:45:47 -0400 Subject: [openib-general] openSM failover / failback issue? In-Reply-To: <44B57967.1030706@ichips.intel.com> References: <000001c6a5dc$ca72ff00$ff0da8c0@amr.corp.intel.com> <1152740984.13048.144995.camel@hal.voltaire.com> <44B57967.1030706@ichips.intel.com> Message-ID: <1152755146.13048.156462.camel@hal.voltaire.com> On Wed, 2006-07-12 at 18:36, Sean Hefty wrote: > Hal Rosenstock wrote: > > With the default sminfo_polling_timeout of 10 seconds and default > > polling_retry_number of 4, so the total handoff time should be around 40 > > seconds. I just did that experiment with 2 SMs and saw that as well. > > Okay - I narrowed down the test case to something reproducible. > > I have 2 systems connected into Woody's cluster. I'm not sure how many systems > are in Woody's cluster, but probably around 8. OpenSM is running on one of the > systems in the cluster. If I run osmtest from either of my two systems, it > works fine. > > If I start openSM on one of my systems, it becomes the master SM. The LIDs on > my systems are reassigned. If I run osmtest from either of my two systems, it > still works fine. > > If I kill openSM on my system, then run osmtest -f c, I get a failure: Error on > query (IB_TIMEOUT). It looks like a CLASS_PORT_INFO query, Yes, that's the first SA query that osmtest makes. > but the query is going to my now dead opensm system. What does ibstat/ibstatus say for the SMLID on the osmtest machine ? 
What about the OpenSM machine ? > At this point, if I unload / reload ib_mthca on either of my systems, Woody's SM > kicks in, reassigns my systems' LIDs, and osmtest starts working again. to Woody's SM. > I don't know if this is an HCA firmware issues, switch issue, or openSM issue. > I don't think it's related to my changes or osmtest at this point. I'll see if I can reproduce this tomorrow. Also, can you send me the guid2lid files from the 3 SMs ? -- Hal > - Sean From halr at voltaire.com Wed Jul 12 18:55:03 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Jul 2006 21:55:03 -0400 Subject: [openib-general] ping problem with ammasso cards(iWARP interface) In-Reply-To: <001601c6a60c$5fa64b70$4710100a@pc.s2io.com> References: <001601c6a60c$5fa64b70$4710100a@pc.s2io.com> Message-ID: <1152755701.13048.156865.camel@hal.voltaire.com> Ravinandan, On Wed, 2006-07-12 at 19:39, Ravinandan Arakali wrote: > Also, I am trying to run some of the iwarp bandwidth/latency tests > (available under directory perftest). > The first thing to do here is to run opensm. You don't need to run OpenSM for iWARP. > When I run opensm (with debug > level 10), I get the following error. Any idea what needs to be done to get > this working ? > > openfab2:/tmp/ib/src/userspace # opensm -d 10 > ------------------------------------------------- > OpenSM Rev:openib-1.2.0 > Command Line Arguments: > d level = 0xa > Log File: /var/log/osm.log > ------------------------------------------------- > OpenSM Rev:openib-1.2.0 > > Using default GUID 0x0 > Error: Could not get port guid > Exiting SM > > openfab2:/tmp/ib/src/userspace # cat /var/log/osm.log > Jul 12 08:35:04 718914 [B7E518C0] -> OpenSM Rev:openib-1.2.0 > Jul 12 08:35:04 719111 [0000] -> OpenSM Rev:openib-1.2.0 > > Jul 12 08:35:04 721381 [B7E518C0] -> osm_sa_mad_ctrl_unbind: ERR 1A11: No > previous bind > Jul 12 08:35:04 721702 [0000] -> Exiting SM Do you have an IB device in the system ? 
That is what it could be complaining about in its own arcane way. -- Hal > > > -----Original Message----- > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] > Sent: Wednesday, July 12, 2006 10:31 AM > To: Ravinandan Arakali > Cc: openib-general at openib.org > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP > interface) > > > Ravinandan, > Do you still see the rping crash? > > Thanks, > Pradipta Kumar. > > Ravinandan Arakali wrote: > > Pradipta, > > Okay, thanks.. Initially, I was not sure since I don't remember non-zero > > values in /proc/krping. When I re-ran the krping test, I see following > > output > > openfab2:~ # cat /proc/krping > > 1-amso0 891376 55711 891376 55711 1782720 27855 1782784 27856 > > > > As you mentioned, the RDMA traffic seems to be flowing indeed ! > > Any idea why rping is dumping core ? > > > > Has any testing been done using SDP with ammasso cards ? > > > > Regards, > > Ravi > > > > > > -----Original Message----- > > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] > > Sent: Friday, July 07, 2006 11:20 PM > > To: Ravinandan Arakali > > Cc: Leonid. Grossman (E-mail); sivakumar.subramani at neterion.com; > > openib-general at openib.org > > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP > > interface) > > > > > > Ravinandan Arakali wrote: > >> Pradipta, > >> Following is the output from gdb after core dump. I have also copy-pasted > >> the gdb output on client system. > >> > >> Attached is the dmesg output when krping test is run in verbose mode. > >> The ping data on the sender(client) seems okay. The content is shifted > >> forward by one character for each packet. On receiver, after receiving > > ping > >> pkt 9, it seems to jump to pkt no. 1935. Not sure if it's because > messages > >> can be lost during writing to /var/log/messages ? > > krping is indeed working!!...Using 'verbose' allows you to see the ping > > data. 
> > When not using 'verbose' you see only 'send/recv' messages. > >> ----------------------------------------- > >> (gdb) run -s -vV -C100 -d -a 0.0.0.0 -p 9999 > >> Starting program: > >> /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -s -vV -C100 -d -a > >> 0.0.0.0 -p 9999 > >> [Thread debugging using libthread_db enabled] > >> [New Thread -1210054992 (LWP 3668)] > >> ipaddr (0.0.0.0) > >> port 9999 > >> created cm_id 0x804e6e0 > >> [New Thread -1210057824 (LWP 3671)] > >> rdma_bind_addr successful > >> rdma_listen > >> cma_event type 4 cma_id 0x804e968 (child) > >> child cma 0x804e968 > >> > >> Program received signal SIGSEGV, Segmentation fault. > >> [Switching to Thread -1210054992 (LWP 3668)] > >> rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 > >> 514 cb->pd = ibv_alloc_pd(cm_id->verbs); > >> (gdb) bt > >> #0 rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 > >> #1 0x0804a716 in main (argc=9, argv=Cannot access memory at address 0x6 > >> ) at examples/rping.c:767 > >> (gdb) > >> > >> --------------------------------- > >> (gdb) run -c -vV -C100 -d -a 17.2.2.102 -p 9999 > >> Starting program: > >> /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -c -vV -C100 -d -a > >> 17.2.2.102 -p 9999 > >> [Thread debugging using libthread_db enabled] > >> [New Thread 47388824908032 (LWP 4620)] > >> ipaddr (17.2.2.102) > >> port 9999 > >> created cm_id 0x506b00 > >> [New Thread 1082132800 (LWP 4623)] > >> cma_event type 0 cma_id 0x506b00 (parent) > >> cma_event type 2 cma_id 0x506b00 (parent) > >> rdma_resolve_addr - rdma_resolve_route successful > >> created pd 0x506e60 > >> created channel 0x506e80 > >> created cq 0x506ea0 > >> created qp 0x506f40 > >> rping_setup_buffers called on cb 0x505010 > >> allocated & registered buffers... > >> [New Thread 1090525504 (LWP 4624)] > >> cq_thread started. 
From sean.hefty at intel.com Wed Jul 12 19:52:31 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 12 Jul 2006 19:52:31 -0700 Subject: [openib-general] openSM failover / failback issue? In-Reply-To: <1152755146.13048.156462.camel@hal.voltaire.com> Message-ID: <000001c6a627$63237000$19cc180a@amr.corp.intel.com> >> I don't know if this is an HCA firmware issues, switch issue, or openSM >issue. >> I don't think it's related to my changes or osmtest at this point. > >I'll see if I can reproduce this tomorrow. > >Also, can you send me the guid2lid files from the 3 SMs ? I'll send this tomorrow. Before reloading mthca, the SMLID still shows as LID 1, which is still assigned to my system. After reloading mthca, my LIDs change back to their original values. This seems similar to the problem that I saw a long time ago, when I needed to physically unplug my cables to kickstart the SM into seeing my nodes.. - Sean From sean.hefty at intel.com Wed Jul 12 21:17:49 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 12 Jul 2006 21:17:49 -0700 Subject: [openib-general] multicast In-Reply-To: <44B58D65.9000607@open-mpi.org> Message-ID: <000101c6a633$4dbaaba0$19cc180a@amr.corp.intel.com> >I'm concerned about how rdma_cm abstracts HCAs. It looks like I can use >the src_addr argument to rdma_resolve_addr() to select which IP >address/HCA (assuming one IP per HCA), but how can I enumerate the >available HCAs?
The HCA / RDMA device abstraction is there for device hotplug, but the verb call to enumerate HCAs is still usable if you want to get a list of all HCAs in the system. You will likely have one IP address per port, rather than per HCA. You probably want to distinguish between locally assigned IP addresses (those given to ipoib devices - ib0, etc.), versus multicast IP addresses, and verify that your multicast routing tables direct traffic out of ipoib IP addresses, rather than Ethernet IP addresses. The IB multicast groups will base their local routing the same as the true IP multicast groups. >This is important for a number of reasons - one, so that I can pass on >the available IP addresses to MPI peers out of band. It's also >important to know which HCA's are available in the system, and to be >able to select which HCA to use when connecting to a peer. This allows >us to implement things like load balancing and failover. HCA / port selection can be controlled by selecting a specific IP address, and you can configure your multicast routing tables to direct traffic out any desired port. You should have the same control over using a specific HCA / port; only the type of address used to identify the port changes. I might be able to make things a little easier by adding some sort of call that identifies all RDMA IP addresses in the system. You could test for this today by calling rdma_bind_addr() on all IP addresses assigned to the system. This doesn't really help with multicast addresses though, since you don't bind to them... I'm not clear on what you mean about passing available IP addresses to MPI peers, or why it's done out of band. Are you talking about IP addresses of the local ipoib devices? Multicast IP addresses? By out of band, do you mean over a socket, as opposed to an IB connection? >Matt Leininger suggested looking at the IB CM as an alternative, as it >gives more low-level control. 
Am I missing something, or does the IB CM >not handle multicast like the RDMA CM? IB multicast groups require SA interaction, and are not associated with the IB CM. What control do you feel that the RDMA CM is missing? - Sean From bpradip at in.ibm.com Wed Jul 12 22:20:52 2006 From: bpradip at in.ibm.com (Pradipta Kumar Banerjee) Date: Thu, 13 Jul 2006 10:50:52 +0530 Subject: [openib-general] ping problem with ammasso cards(iWARP interface) In-Reply-To: <001601c6a60c$5fa64b70$4710100a@pc.s2io.com> References: <001601c6a60c$5fa64b70$4710100a@pc.s2io.com> Message-ID: <44B5D834.30505@in.ibm.com> Ravinandan Arakali wrote: > Also, I am trying to run some of the iwarp bandwidth/latency tests > (available under directory perftest). > The first thing to do here is to run opensm. When I run opensm (with debug You do not need opensm for iwarp. You will be able to use only rdma_bw and rdma_lat from the iwarp branch or rdma_bw from the main trunk. Rest are _ib_only_ . Thanks, Pradipta Kumar. > level 10), I get the following error. Any idea what needs to be done to get > this working ? 
> > openfab2:/tmp/ib/src/userspace # opensm -d 10 > ------------------------------------------------- > OpenSM Rev:openib-1.2.0 > Command Line Arguments: > d level = 0xa > Log File: /var/log/osm.log > ------------------------------------------------- > OpenSM Rev:openib-1.2.0 > > Using default GUID 0x0 > Error: Could not get port guid > Exiting SM > > openfab2:/tmp/ib/src/userspace # cat /var/log/osm.log > Jul 12 08:35:04 718914 [B7E518C0] -> OpenSM Rev:openib-1.2.0 > Jul 12 08:35:04 719111 [0000] -> OpenSM Rev:openib-1.2.0 > > Jul 12 08:35:04 721381 [B7E518C0] -> osm_sa_mad_ctrl_unbind: ERR 1A11: No > previous bind > Jul 12 08:35:04 721702 [0000] -> Exiting SM > > > > > > -----Original Message----- > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] > Sent: Wednesday, July 12, 2006 10:31 AM > To: Ravinandan Arakali > Cc: openib-general at openib.org > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP > interface) > > > Ravinandan, > Do you still see the rping crash? > > Thanks, > Pradipta Kumar. > > Ravinandan Arakali wrote: >> Pradipta, >> Okay, thanks.. Initially, I was not sure since I don't remember non-zero >> values in /proc/krping. 
When I re-ran the krping test, I see following >> output >> openfab2:~ # cat /proc/krping >> 1-amso0 891376 55711 891376 55711 1782720 27855 1782784 27856 >> From muli at il.ibm.com Wed Jul 12 22:46:58 2006 From: muli at il.ibm.com (Muli Ben-Yehuda) Date: Thu, 13 Jul 2006 08:46:58 +0300 Subject: [openib-general] Suggestions for how to remove bus_to_virt() In-Reply-To: <20060712.174013.95062313.davem@davemloft.net> References: <1152746967.4572.263.camel@brick.pathscale.com> <20060712.174013.95062313.davem@davemloft.net> Message-ID: <20060713054658.GC5096@rhun.ibm.com> On Wed, Jul 12, 2006 at 05:40:13PM -0700, David Miller wrote: > From: Roland Dreier > Date: Wed, 12 Jul 2006 17:11:26 -0700 > > > A cleaner solution would be to make the dma_ API really use the device > > it's passed anyway, and allow drivers to override the standard PCI > > stuff nicely. But that would be major surgery, I guess. > > Clean but expensive, you should not force the rest of the kernel > to eat the cost of something you want to do when it's totally > unnecessary for most other users. > > For example, x86 never needs to do anything other than a direct > virt_to_phys translation to produce a DMA address, no matter what > bus the device is on. It's a single simple integer adjustment > that can be done inline in about 2 or 3 instructions at most. It's possible that even x86 will support multiple IOMMUs in the future - for example, the Calgary IOMMU support we recently added to x86-64 could be modified to work on plain x86 as well. I like the idea of a per-device DMA-API implementation, but only if it can be done in a way that is zero cost to the majority of the users of the API. We already have dynamic dma_ops on x86-64 to support nommu, swiotlb, gart and Calgary cleanly, extending it to use a per-device dma-ops isn't too difficult. 
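To make the "zero cost to the majority" idea concrete, here is a minimal user-space sketch in C. It is not kernel code and not the actual DMA API: every name in it (struct dma_ops, struct device, DIRECT_MAP_OFFSET, IOMMU_WINDOW_BASE, the map functions) is hypothetical and chosen only for illustration. The assumption is that a device carries an optional ops pointer, and that a NULL pointer means "use the plain direct translation", so devices that never install an override pay only one predictable branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration only: none of these names are the kernel's DMA API. */
typedef uint64_t dma_addr_t;

struct device;

struct dma_ops {
    dma_addr_t (*map_single)(struct device *dev, void *ptr);
};

struct device {
    const char *name;
    const struct dma_ops *dma_ops;   /* NULL means "use the default mapping" */
};

/* Stand-in for a simple virt_to_phys-style adjustment (assumed constant). */
#define DIRECT_MAP_OFFSET ((dma_addr_t)0x1000u)
/* Stand-in for an address window handed out by an IOMMU (assumed constant). */
#define IOMMU_WINDOW_BASE ((dma_addr_t)0x80000000u)

/* Default path: one integer adjustment, no indirection. */
static dma_addr_t direct_map_single(struct device *dev, void *ptr)
{
    (void)dev;
    return (dma_addr_t)(uintptr_t)ptr + DIRECT_MAP_OFFSET;
}

/* Override path: pretend the IOMMU remaps the page into its own window. */
static dma_addr_t iommu_map_single(struct device *dev, void *ptr)
{
    (void)dev;
    return IOMMU_WINDOW_BASE | ((uintptr_t)ptr & 0xfff);
}

static const struct dma_ops iommu_ops = { .map_single = iommu_map_single };

/* Dispatch: devices without an override never touch a function pointer. */
static inline dma_addr_t dma_map_single(struct device *dev, void *ptr)
{
    assert(dev != NULL);
    if (!dev->dma_ops)
        return direct_map_single(dev, ptr);
    return dev->dma_ops->map_single(dev, ptr);
}
```

In this sketch a device that leaves dma_ops as NULL takes the direct path, while a device behind an IOMMU points dma_ops at its own table. Whether one extra branch is cheap enough to count as "zero cost" on a given architecture is exactly the trade-off discussed in this thread, and the sketch does not settle it.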
Cheers, Muli From olson at unixfolk.com Wed Jul 12 23:01:03 2006 From: olson at unixfolk.com (Dave Olson) Date: Wed, 12 Jul 2006 23:01:03 -0700 (PDT) Subject: [openib-general] [PATCH 38 of 39] IB/ipath - More changes to support InfiniPath on PowerPC 970 systems In-Reply-To: References: Message-ID: On Thu, 6 Jul 2006, Benjamin Herrenschmidt wrote: | On Thu, 2006-07-06 at 16:34 -0700, Bryan O'Sullivan wrote: | > On Fri, 2006-07-07 at 08:37 +1000, Benjamin Herrenschmidt wrote: | > | > > > +int ipath_unordered_wc(void) | > > > +{ | > > > + return 1; | > > > +} | > > | > > How is the above providing any kind of serialisation ? | > | > It's not intended to; it tells the *caller* whether to do it. | | Ah ok. What barrier do you use for that ? on x86_64, the sfence instruction, on others, wmb(), in the kernel. In user-land, pretty much the same underlying assembly instructions. Dave Olson olson at unixfolk.com http://www.unixfolk.com/dave From stefanr at s5r6.in-berlin.de Thu Jul 13 00:45:53 2006 From: stefanr at s5r6.in-berlin.de (Stefan Richter) Date: Thu, 13 Jul 2006 09:45:53 +0200 Subject: [openib-general] Suggestions for how to remove bus_to_virt() In-Reply-To: <20060712.174013.95062313.davem@davemloft.net> References: <1152746967.4572.263.camel@brick.pathscale.com> <20060712.174013.95062313.davem@davemloft.net> Message-ID: <44B5FA31.9030309@s5r6.in-berlin.de> David Miller wrote: > If you need device level DMA mapping semantics, create them for your > device type. This is what USB does, btw. Ralph, two other examples where drivers provide some sort of address lookup are: - drivers/ieee1394/dma.[hc] AFAIK this deals with housekeeping of ringbuffers as used by 1394 controllers for isochronous transmit and receive. Users of this little API are dv1394, video1394, ohci1394. 
- patch "dc395x: dynamically map scatter-gather for PIO" by Guennadi Liakhovetski, http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=cdb8c2a6d848deb9eeefffff42974478fbb51b8c This mapping is not specific to SCSI. The user is a driver which mixes PIO and DMA. I don't know if these have any similarity to your requirements though. (I too need to come up with either a portable replacement of bus_to_virt or with a fundamentally different implementation but haven't started my project yet. This occurrence of bus_to_virt is in drivers/ieee1394/sbp2 but #ifdef'd out by default.) -- Stefan Richter -=====-=-==- -=== -==-- http://arcgraph.de/sr/ From ogerlitz at voltaire.com Thu Jul 13 01:00:39 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 13 Jul 2006 11:00:39 +0300 (IDT) Subject: [openib-general] [PATCH] IB/ipoib: remove broken link from Kconfig and documentation Message-ID: remove references to the IPoIB IETF working group as it has been closed. Signed-off-by: Or Gerlitz diff --git a/drivers/infiniband/ulp/ipoib/Kconfig b/drivers/infiniband/ulp/ipoib/Kconfig index 13d6d01..d74653d 100644 --- a/drivers/infiniband/ulp/ipoib/Kconfig +++ b/drivers/infiniband/ulp/ipoib/Kconfig @@ -6,8 +6,7 @@ config INFINIBAND_IPOIB transports IP packets over InfiniBand so you can use your IB device as a fancy NIC. - The IPoIB protocol is defined by the IETF ipoib working - group: . 
+ See Documentation/infiniband/ipoib.txt for more information config INFINIBAND_IPOIB_DEBUG bool "IP-over-InfiniBand debugging" if EMBEDDED diff --git a/Documentation/infiniband/ipoib.txt b/Documentation/infiniband/ipoib.txt index 1870355..864ff32 100644 --- a/Documentation/infiniband/ipoib.txt +++ b/Documentation/infiniband/ipoib.txt @@ -51,8 +51,6 @@ Debugging Information References - IETF IP over InfiniBand (ipoib) Working Group - http://ietf.org/html.charters/ipoib-charter.html Transmission of IP over InfiniBand (IPoIB) (RFC 4391) http://ietf.org/rfc/rfc4391.txt IP over InfiniBand (IPoIB) Architecture (RFC 4392) From ogerlitz at voltaire.com Thu Jul 13 01:00:55 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 13 Jul 2006 11:00:55 +0300 (IDT) Subject: [openib-general] [PATCH] IB/ipoib: remove broken link from Kconfig and documentation In-Reply-To: References: Message-ID: Roland - is this 2.6.18 (bugfixes) material? We have a fix for iSER's Kconfig and i wonder if its fine to send it now also. Or. From dotanb at mellanox.co.il Thu Jul 13 00:58:06 2006 From: dotanb at mellanox.co.il (Dotan Barak) Date: Thu, 13 Jul 2006 10:58:06 +0300 Subject: [openib-general] there are compilation warnings in complib Message-ID: <200607131058.06590.dotanb@mellanox.co.il> Hi. I noticed that there are warnings during the compilation of the gen2 driver. machine props: ************************************************************* Host Architecture : x86_64 Linux Distribution: Red Hat Enterprise Linux AS release 4 (Nahant Update 2) Kernel Version : 2.6.9-22.ELsmp Memory size : 4037872 kB Driver Version : gen2_linux-20060711-1731 (REV=8496) HCA ID(s) : mthca0 HCA model(s) : 25218 FW version(s) : 5.1.921 Board(s) : MT_0150000001 ************************************************************* compilation log: make[1]: Entering directory `/tmp/openib_gen2/last_stable/src/userspace/management/osm/complib' if /bin/sh ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. 
-I./../include -Wall -Werror -g -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 -MT libosmcomp_la-cl_memory.lo -MD -MP -MF ".deps/libosmcomp_la-cl_memory.Tpo" -c -o libosmcomp_la-cl_memory.lo `test -f 'cl_memory.c' || echo './'`cl_memory.c; \ then mv -f ".deps/libosmcomp_la-cl_memory.Tpo" ".deps/libosmcomp_la-cl_memory.Plo"; else rm -f ".deps/libosmcomp_la-cl_memory.Tpo"; exit 1; fi gcc -DHAVE_CONFIG_H -I. -I. -I. -I./../include -Wall -Werror -g -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 -MT libosmcomp_la-cl_memory.lo -MD -MP -MF .deps/libosmcomp_la-cl_memory.Tpo -c cl_memory.c -fPIC -DPIC -o .libs/libosmcomp_la-cl_memory.o In file included from cl_memory.c:55: ./../include/complib/cl_memtrack.h:99: warning: `cl_mem_tracker_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:82) cl_memory.c:74: warning: `cl_mem_tracker_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:82) cl_memory.c: In function `__cl_mem_track_start': cl_memory.c:106: warning: `cl_mem_tracker_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:82) cl_memory.c:106: warning: `cl_mem_tracker_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:82) cl_memory.c: In function `__cl_mem_track_stop': cl_memory.c:143: warning: `cl_mem_display' is deprecated (declared at ./../include/complib/cl_memory.h:139) cl_memory.c:152: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) cl_memory.c:152: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) cl_memory.c:159: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) cl_memory.c:159: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) cl_memory.c: In function `cl_mem_display': cl_memory.c:192: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) cl_memory.c:206: warning: `cl_malloc_hdr_t' is deprecated (declared at 
./../include/complib/cl_memtrack.h:96) cl_memory.c:206: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) cl_memory.c: In function `cl_mem_check': cl_memory.c:229: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) cl_memory.c:245: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) cl_memory.c:245: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) cl_memory.c: In function `__cl_malloc_trk': cl_memory.c:297: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) cl_memory.c:306: warning: `cl_mem_check' is deprecated (declared at cl_memory.c:223) cl_memory.c:354: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) cl_memory.c:354: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) cl_memory.c:359: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) cl_memory.c: In function `__cl_zalloc_trk': cl_memory.c:403: warning: `__cl_malloc_trk' is deprecated (declared at cl_memory.c:296) cl_memory.c: In function `__cl_find_mem': cl_memory.c:430: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) cl_memory.c:433: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) cl_memory.c:433: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) cl_memory.c: In function `__cl_free_trk': cl_memory.c:448: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) cl_memory.c:454: warning: `cl_mem_check' is deprecated (declared at cl_memory.c:223) cl_memory.c:476: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) cl_memory.c:476: warning: `cl_malloc_hdr_t' is deprecated 
(declared at ./../include/complib/cl_memtrack.h:96) make[1]: *** [libosmcomp_la-cl_memory.lo] Error 1 make[1]: Leaving directory `/tmp/openib_gen2/last_stable/src/userspace/management/osm/complib' Can you please fix them? thanks Dotan From mst at mellanox.co.il Thu Jul 13 01:05:49 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 13 Jul 2006 11:05:49 +0300 Subject: [openib-general] fix limit event range check for memfree Message-ID: <20060713080549.GA20223@mellanox.co.il> Hi, Roland! Here's a small mthca patch from Dotan Barak. --- mthca: fix limit event range check for memfree. Signed-off-by: Dotan Barak Signed-off-by: Michael S. Tsirkin Index: openib/drivers/infiniband/hw/mthca/mthca_srq.c =================================================================== --- openib/drivers/infiniband/hw/mthca/mthca_srq.c (revision 8189) +++ openib/drivers/infiniband/hw/mthca/mthca_srq.c (working copy) @@ -370,7 +370,8 @@ int mthca_modify_srq(struct ib_srq *ibsr return -EINVAL; if (attr_mask & IB_SRQ_LIMIT) { - if (attr->srq_limit > srq->max) + u32 max_wr = (mthca_is_memfree(dev)) ? srq->max - 1 : srq->max; + if (attr->srq_limit > max_wr) return -EINVAL; mutex_lock(&srq->mutex); -- MST From mst at mellanox.co.il Thu Jul 13 01:15:39 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 13 Jul 2006 11:15:39 +0300 Subject: [openib-general] [PATCH] IB/ipoib: remove broken link from Kconfig and documentation In-Reply-To: References: Message-ID: <20060713081539.GB20223@mellanox.co.il> Quoting r. Or Gerlitz : > Subject: Re: [PATCH] IB/ipoib: remove broken link from Kconfig and documentation > > Roland - is this 2.6.18 (bugfixes) material? Looks safe enough :) > We have a fix for iSER's Kconfig and i wonder if its fine to send it now also. Where? Was this iSER Kconfig patch posted on openib? lkml? I can't find anything in the archives ... 
BTW, if there's something you intend for 2.6.18, pls Cc me, I keep a tree of IB updates destined for 2.6.18 and do some testing on it. I don't have an iSER target but at least I can compile, with multiple compilers. -- MST From gil at mellanox.co.il Thu Jul 13 02:05:41 2006 From: gil at mellanox.co.il (Gil Bloch) Date: Thu, 13 Jul 2006 12:05:41 +0300 Subject: [openib-general] OFED 1.1 release - schedule and features In-Reply-To: <200607121907.k6CJ7sc8000554@xi.cse.ohio-state.edu> References: <200607121907.k6CJ7sc8000554@xi.cse.ohio-state.edu> Message-ID: <44B60CE5.8010504@mellanox.co.il> Dhabaleswar Panda wrote: > Tziporet, > > >> - Based on 0.97 (we will not move to 0.98 since we tested it and >> found it is less stable then 0.97) >> > > Could you please indicate which version of 0.9.8 you tested and what > are the exact problems you have faced. > We are running 0.9.8-RC0 and we did not find specific problems with it. The problem is that we did not have the chance to run enough testings yet, and as you indicated, this is RC0 and some changes / fixes are expected. Thus, for the coming release we prefer to use the well-tested 0.9.7 and we will take the 0.9.8 (or any successor if released early enough). > Please note that 0.9.8 has not been formally released yet. What is > currently available is 0.9.8-RC0. We have been continuing our testing. > Since the 0.9.8-RC0 release, there have been several fixes applied to > the trunk based on our internal testing and feedbacks received from > others. We plan to make a formal release soon. Please try the latest > version available from the trunk. If you continue to see problems, > please let us know and we will be happy to fix these problems before > the release. > We will let you know if we find any problem with the 0.9.8. 
Regards, Gil > Thanks, > > DK From ogerlitz at voltaire.com Thu Jul 13 02:21:56 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 13 Jul 2006 12:21:56 +0300 Subject: [openib-general] [PATCH] IB/ipoib: remove broken link from Kconfig and documentation In-Reply-To: <20060713081539.GB20223@mellanox.co.il> References: <20060713081539.GB20223@mellanox.co.il> Message-ID: <44B610B4.4090705@voltaire.com> Michael S. Tsirkin wrote: > Quoting r. Or Gerlitz : >> We have a fix for iSER's Kconfig and i wonder if its fine to send it now also. > Where? Was this iSER Kconfig patch posted on openib? lkml? I can't find anything > in the archives ... Since I was thinking it's not 2.6.18 material, I just applied it in the svn, no more. Or. From tziporet at mellanox.co.il Thu Jul 13 02:57:47 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Thu, 13 Jul 2006 12:57:47 +0300 Subject: [openib-general] OFED 1.1 release - schedule and features In-Reply-To: References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7421@mtlexch01.mtl.com> Message-ID: <44B6191B.4060807@mellanox.co.il> Roland Dreier wrote: > To me this schedule seems too short to expect real new features like > Since the technical details were not in this mail, some features seem like a lot of changes while they are not (to be explained). > > - HCA fatal - full flow support > We just want to use the current flow that exists today and test it, so I can declare it's working (this flow was not tested thoroughly in OFED 1.0). We also want to add a user level daemon that will catch the fatal event and restart the driver once it happens. So, I guess we will find some bugs in the flow but no new features should be implemented.
> > * IPoIB > > - Bonding - for high availability > > that have had no work done (in public at least) yet to be integrated. > If 1.1 is going to go to code freeze in 19 days then I don't see how > these features can be finished, let alone stabilized in time. > The solution we wish to add is not the bonding kernel module but a user level application that will handle failover between IB interfaces (sorry for not being accurate in the feature definition). So in any case it will not harm stability of the existing code. The add-on functionality may be less stable (beta) and not GA like the other modules. > In my opinion it would be better to focus on polishing what was > already in OFED 1.0 for OFED 1.1. Of course we should work on > valuable features like IPoIB bonding, but I don't think they should be > tied to OFED 1.1. In general I agree and this is the motivation behind most changes and features. BTW - we will need your help regarding SRP daemon changes that were implemented by Ishai for OFED 1.0 but did not make it in then. Tziporet From ogerlitz at voltaire.com Thu Jul 13 03:43:38 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 13 Jul 2006 13:43:38 +0300 Subject: [openib-general] OFED 1.1 release - schedule and features In-Reply-To: <44B6191B.4060807@mellanox.co.il> References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7421@mtlexch01.mtl.com> <44B6191B.4060807@mellanox.co.il> Message-ID: <44B623DA.20506@voltaire.com> Tziporet Koren wrote: > Roland Dreier wrote: > Since the technical details were not in this mail some features seems a >> > * IPoIB >> > - Bonding - for high availability >> that have had no work done (in public at least) yet to be integrated. >> If 1.1 is going to go to code freeze in 19 days then I don't see how >> these features can be finished, let alone stabilized in time.
> The solution we wish to add is a not the bonding kernel module but a > user level application that will handle failover between IB interfaces > (sorry for not being accurate in the feature definition). So if the link which ib0 maps to is DOWN you move the ib0 IPv4 address to another device whose link is UP (eg ib1) and you somehow have ib1 send a gratuitous ARP? Or. From dotanb at mellanox.co.il Thu Jul 13 03:48:12 2006 From: dotanb at mellanox.co.il (Dotan Barak) Date: Thu, 13 Jul 2006 13:48:12 +0300 Subject: [openib-general] there are compilation warnings in complib In-Reply-To: <1152786296.13048.181809.camel@hal.voltaire.com> References: <200607131058.06590.dotanb@mellanox.co.il> <1152786296.13048.181809.camel@hal.voltaire.com> Message-ID: <200607131348.12599.dotanb@mellanox.co.il> On Thursday 13 July 2006 13:25, Hal Rosenstock wrote: > > ./../include/complib/cl_memtrack.h:99: warning: `cl_mem_tracker_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:82) > > cl_memory.c:74: warning: `cl_mem_tracker_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:82) > > cl_memory.c: In function `__cl_mem_track_start': > > cl_memory.c:106: warning: `cl_mem_tracker_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:82) > > cl_memory.c:106: warning: `cl_mem_tracker_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:82) > > cl_memory.c: In function `__cl_mem_track_stop': > > cl_memory.c:143: warning: `cl_mem_display' is deprecated (declared at ./../include/complib/cl_memory.h:139) > > cl_memory.c:152: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) > > cl_memory.c:152: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) > > cl_memory.c:159: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) > > cl_memory.c:159: warning: `cl_malloc_hdr_t' is deprecated (declared at 
./../include/complib/cl_memtrack.h:96) > > These warnings are expected as those routines and structures are being > deprecated. There was a thread on openib-general on using malloc/free > directly which is what is now done in (at least in the Linux complib; I > don't know if this is being carried over to the Windows one as that is a > separate implementation). > > -- Hal > If another c file or library would have used this header file, so those warning messages were understandable, but we get those warnings when we compile the comp_lib ... Dotan From halr at voltaire.com Thu Jul 13 03:25:58 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Jul 2006 06:25:58 -0400 Subject: [openib-general] there are compilation warnings in complib In-Reply-To: <200607131058.06590.dotanb@mellanox.co.il> References: <200607131058.06590.dotanb@mellanox.co.il> Message-ID: <1152786296.13048.181809.camel@hal.voltaire.com> On Thu, 2006-07-13 at 03:58, Dotan Barak wrote: > Hi. > > I noticed that there are warnings during the compilation of the gen2 driver. > > machine props: > ************************************************************* > Host Architecture : x86_64 > Linux Distribution: Red Hat Enterprise Linux AS release 4 (Nahant Update 2) > Kernel Version : 2.6.9-22.ELsmp > Memory size : 4037872 kB > Driver Version : gen2_linux-20060711-1731 (REV=8496) > HCA ID(s) : mthca0 > HCA model(s) : 25218 > FW version(s) : 5.1.921 > Board(s) : MT_0150000001 > ************************************************************* > > compilation log: > make[1]: Entering directory `/tmp/openib_gen2/last_stable/src/userspace/management/osm/complib' > if /bin/sh ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. 
-I./../include -Wall -Werror -g -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 -MT libosmcomp_la-cl_memory.lo -MD -MP -MF ".deps/libosmcomp_la-cl_memory.Tpo" -c -o libosmcomp_la-cl_memory.lo `test -f 'cl_memory.c' || echo './'`cl_memory.c; \ > then mv -f ".deps/libosmcomp_la-cl_memory.Tpo" ".deps/libosmcomp_la-cl_memory.Plo"; else rm -f ".deps/libosmcomp_la-cl_memory.Tpo"; exit 1; fi > gcc -DHAVE_CONFIG_H -I. -I. -I. -I./../include -Wall -Werror -g -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 -MT libosmcomp_la-cl_memory.lo -MD -MP -MF .deps/libosmcomp_la-cl_memory.Tpo -c cl_memory.c -fPIC -DPIC -o .libs/libosmcomp_la-cl_memory.o > In file included from cl_memory.c:55: > ./../include/complib/cl_memtrack.h:99: warning: `cl_mem_tracker_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:82) > cl_memory.c:74: warning: `cl_mem_tracker_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:82) > cl_memory.c: In function `__cl_mem_track_start': > cl_memory.c:106: warning: `cl_mem_tracker_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:82) > cl_memory.c:106: warning: `cl_mem_tracker_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:82) > cl_memory.c: In function `__cl_mem_track_stop': > cl_memory.c:143: warning: `cl_mem_display' is deprecated (declared at ./../include/complib/cl_memory.h:139) > cl_memory.c:152: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) > cl_memory.c:152: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) > cl_memory.c:159: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) > cl_memory.c:159: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) > cl_memory.c: In function `cl_mem_display': > cl_memory.c:192: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) > cl_memory.c:206: warning: `cl_malloc_hdr_t' is 
deprecated (declared at ./../include/complib/cl_memtrack.h:96) > cl_memory.c:206: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) > cl_memory.c: In function `cl_mem_check': > cl_memory.c:229: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) > cl_memory.c:245: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) > cl_memory.c:245: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) > cl_memory.c: In function `__cl_malloc_trk': > cl_memory.c:297: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) > cl_memory.c:306: warning: `cl_mem_check' is deprecated (declared at cl_memory.c:223) > cl_memory.c:354: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) > cl_memory.c:354: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) > cl_memory.c:359: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) > cl_memory.c: In function `__cl_zalloc_trk': > cl_memory.c:403: warning: `__cl_malloc_trk' is deprecated (declared at cl_memory.c:296) > cl_memory.c: In function `__cl_find_mem': > cl_memory.c:430: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) > cl_memory.c:433: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) > cl_memory.c:433: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) > cl_memory.c: In function `__cl_free_trk': > cl_memory.c:448: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) > cl_memory.c:454: warning: `cl_mem_check' is deprecated (declared at cl_memory.c:223) > cl_memory.c:476: warning: `cl_malloc_hdr_t' is deprecated (declared at 
./../include/complib/cl_memtrack.h:96) > cl_memory.c:476: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) > make[1]: *** [libosmcomp_la-cl_memory.lo] Error 1 > make[1]: Leaving directory `/tmp/openib_gen2/last_stable/src/userspace/management/osm/complib' > > > Can you please fix them? These warnings are expected as those routines and structures are being deprecated. There was a thread on openib-general on using malloc/free directly which is what is now done in (at least in the Linux complib; I don't know if this is being carried over to the Windows one as that is a separate implementation). -- Hal From mst at mellanox.co.il Thu Jul 13 04:52:57 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 13 Jul 2006 14:52:57 +0300 Subject: [openib-general] there are compilation warnings in complib In-Reply-To: <1152786296.13048.181809.camel@hal.voltaire.com> References: <1152786296.13048.181809.camel@hal.voltaire.com> Message-ID: <20060713115257.GA21190@mellanox.co.il> Quoting r. Hal Rosenstock : > > In file included from cl_memory.c:55: > > ./../include/complib/cl_memtrack.h:99: warning: `cl_mem_tracker_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:82) > > cl_memory.c:74: warning: `cl_mem_tracker_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:82) > > > > Can you please fix them? > > These warnings are expected as those routines and structures are being > deprecated. There was a thread on openib-general on using malloc/free > directly which is what is now done in (at least in the Linux complib; I > don't know if this is being carried over to the Windows one as that is a > separate implementation). So, one deprecated routine is including another? I suggest compiling cl_memory.c with -Wno-deprecated then. 
-- MST From halr at voltaire.com Thu Jul 13 05:11:01 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Jul 2006 08:11:01 -0400 Subject: [openib-general] there are compilation warnings in complib In-Reply-To: <200607131348.12599.dotanb@mellanox.co.il> References: <200607131058.06590.dotanb@mellanox.co.il> <1152786296.13048.181809.camel@hal.voltaire.com> <200607131348.12599.dotanb@mellanox.co.il> Message-ID: <1152792650.13048.186653.camel@hal.voltaire.com> On Thu, 2006-07-13 at 06:48, Dotan Barak wrote: > On Thursday 13 July 2006 13:25, Hal Rosenstock wrote: > > > ./../include/complib/cl_memtrack.h:99: warning: `cl_mem_tracker_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:82) > > > cl_memory.c:74: warning: `cl_mem_tracker_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:82) > > > cl_memory.c: In function `__cl_mem_track_start': > > > cl_memory.c:106: warning: `cl_mem_tracker_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:82) > > > cl_memory.c:106: warning: `cl_mem_tracker_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:82) > > > cl_memory.c: In function `__cl_mem_track_stop': > > > cl_memory.c:143: warning: `cl_mem_display' is deprecated (declared at ./../include/complib/cl_memory.h:139) > > > cl_memory.c:152: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) > > > cl_memory.c:152: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) > > > cl_memory.c:159: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) > > > cl_memory.c:159: warning: `cl_malloc_hdr_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:96) > > > > These warnings are expected as those routines and structures are being > > deprecated. 
There was a thread on openib-general on using malloc/free > > directly which is what is now done in (at least in the Linux complib; I > > don't know if this is being carried over to the Windows one as that is a > > separate implementation). > > > > -- Hal > > > > If another c file or library would have used this header file, so those warning messages > were understandable, but we get those warnings when we compile the comp_lib ... I'm unaware of a clean way to make that happen. -- Hal > Dotan From halr at voltaire.com Thu Jul 13 06:15:30 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Jul 2006 09:15:30 -0400 Subject: [openib-general] there are compilation warnings in complib In-Reply-To: <20060713115257.GA21190@mellanox.co.il> References: <1152786296.13048.181809.camel@hal.voltaire.com> <20060713115257.GA21190@mellanox.co.il> Message-ID: <1152796508.13048.189352.camel@hal.voltaire.com> On Thu, 2006-07-13 at 07:52, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock : > > > In file included from cl_memory.c:55: > > > ./../include/complib/cl_memtrack.h:99: warning: `cl_mem_tracker_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:82) > > > cl_memory.c:74: warning: `cl_mem_tracker_t' is deprecated (declared at ./../include/complib/cl_memtrack.h:82) > > > > > > Can you please fix them? > > > > These warnings are expected as those routines and structures are being > > deprecated. There was a thread on openib-general on using malloc/free > > directly which is what is now done in (at least in the Linux complib; I > > don't know if this is being carried over to the Windows one as that is a > > separate implementation). > > So, one deprecated routine is including another? Not exactly. This is the implementation of the deprecated routines using the deprecated structures. > I suggest compiling cl_memory.c with -Wno-deprecated then. Thanks. -Wno-deprecated-declarations should be sufficient for this.
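For readers following this thread, here is a minimal sketch of one alternative to a per-file compiler flag. GCC spells the option discussed above -Wno-deprecated-declarations (with a trailing "s"). Another approach, assumed here and not taken from complib, is to make the deprecation attribute conditional, so that the library's own implementation file can define and call the deprecated routines without warnings while outside users of the header still see them. All names below (DEMO_BUILDING_LIB, DEMO_DEPRECATED, demo_malloc_trk and friends) are hypothetical and exist only for illustration.

```c
/* Hypothetical sketch; none of these names come from complib. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Assumption: the library's own build defines DEMO_BUILDING_LIB (normally
 * with -DDEMO_BUILDING_LIB in its Makefile; it is defined inline here only
 * so that the sketch compiles on its own). Users of the header do not
 * define it and therefore still get the deprecation warnings.
 */
#define DEMO_BUILDING_LIB 1

#ifdef DEMO_BUILDING_LIB
#define DEMO_DEPRECATED /* expands to nothing inside the implementation */
#else
#define DEMO_DEPRECATED __attribute__((deprecated))
#endif

/* Tracking header stored in front of each allocation (simplified: no
 * list linkage and no extra alignment padding). */
typedef struct demo_malloc_hdr {
	size_t size;
} demo_malloc_hdr_t DEMO_DEPRECATED;

void *demo_malloc_trk(size_t size) DEMO_DEPRECATED;
size_t demo_size_trk(const void *p) DEMO_DEPRECATED;
void demo_free_trk(void *p) DEMO_DEPRECATED;

/* Allocate size bytes plus a header that records the requested size. */
void *demo_malloc_trk(size_t size)
{
	demo_malloc_hdr_t *hdr = malloc(sizeof(*hdr) + size);

	if (hdr == NULL)
		return NULL;
	hdr->size = size;
	return hdr + 1; /* caller sees the memory just past the header */
}

/* Return the size recorded by demo_malloc_trk() for this pointer. */
size_t demo_size_trk(const void *p)
{
	assert(p != NULL);
	return ((const demo_malloc_hdr_t *)p - 1)->size;
}

/* Release a block obtained from demo_malloc_trk(); NULL is ignored. */
void demo_free_trk(void *p)
{
	if (p != NULL)
		free((demo_malloc_hdr_t *)p - 1);
}
```

With the define removed, a GCC-compatible compiler would be expected to warn at each use of the deprecated type and functions, which is the behaviour outside callers should keep seeing. Whether this is preferable to passing the warning option for a single file is a judgment call for the maintainers.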
-- Hal From rep.nop at aon.at Thu Jul 13 06:40:53 2006 From: rep.nop at aon.at (Bernhard Fischer) Date: Thu, 13 Jul 2006 15:40:53 +0200 Subject: [openib-general] [patch] update uverbs_event_get_sb() to new get_sb_pseudo() Message-ID: <20060713134053.GA16556@aon.at> Hi, get_sb_pseudo was changed but uverbs was not updated. PS: should nowadays the in-kernel impl be used instead of the one in svn? Signed-off-by: Bernhard Fischer -------------- next part -------------- Index: uverbs_main.c =================================================================== --- uverbs_main.c (revision 8507) +++ uverbs_main.c (working copy) @@ -816,10 +816,10 @@ } static struct super_block *uverbs_event_get_sb(struct file_system_type *fs_type, int flags, - const char *dev_name, void *data) + const char *dev_name, void *data, struct vfsmount *mnt) { return get_sb_pseudo(fs_type, "infinibandevent:", NULL, - INFINIBANDEVENTFS_MAGIC); + INFINIBANDEVENTFS_MAGIC, mnt); } static struct file_system_type uverbs_event_fs = { From moshek at voltaire.com Thu Jul 13 06:51:43 2006 From: moshek at voltaire.com (Moshe Kazir) Date: Thu, 13 Jul 2006 16:51:43 +0300 Subject: [openib-general] SLES9 SP3 support was added Message-ID: I'm trying to use OFED 1.0 on sles9 SP3 PPC64 . OFED 1.0 requires sysfsutils to be installed . I tried to compile & install sysfsutils-1.2.0-4.src.rpm But found that it conflicts with udev--021-36 rpm Anyone knows to work around this problem ? Also, OFED 1.0 is svn rev 8031 while patches/2.6.5-7.244 includes some svn 8111 files. Was OFED 1.0 (svn 8031) tested on SLES9 SP3 ? 
Moshe ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com -----Original Message----- From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Tziporet Koren Sent: Thursday, June 22, 2006 4:20 PM To: openfabrics-ewg at openib.org; OPENIB Subject: [openib-general] SLES9 SP3 support was added Hi All, We have added support for SLES9 SP3 that can be used with OFED 1.0. The kernel modules supported are: * mthca * core * CM & CMA * IPoIB * SRP All user level apps and libraries are working too. CPU Architectures supported: * x86 * x86_64 * ia64 The backport patches are available at: https://openib.org/svn/gen2/branches/1.0/ofed/patches/2.6.5-7.244/ There is also a need to take the updated configure and install.sh that add SLES9 specific support. There are no other changes in the package beside these. Is there a need to create a package (1.0.1) with SLES9 support? Tziporet _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From swise at opengridcomputing.com Thu Jul 13 06:55:05 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 13 Jul 2006 08:55:05 -0500 Subject: [openib-general] ping problem with ammasso cards(iWARP interface) In-Reply-To: <001601c6a60c$5fa64b70$4710100a@pc.s2io.com> References: <001601c6a60c$5fa64b70$4710100a@pc.s2io.com> Message-ID: <1152798905.25964.10.camel@stevo-desktop> Are you trying to run this over iwarp? It doesn't need an SM... For the perftests rdma_lat and rdma_bw in the iwarp branch, use the --cma flag. Steve. On Wed, 2006-07-12 at 16:39 -0700, Ravinandan Arakali wrote: > Also, I am trying to run some of the iwarp bandwidth/latency tests > (available under directory perftest). 
> The first thing to do here is to run opensm. When I run opensm (with debug > level 10), I get the following error. Any idea what needs to be done to get > this working ? > > openfab2:/tmp/ib/src/userspace # opensm -d 10 > ------------------------------------------------- > OpenSM Rev:openib-1.2.0 > Command Line Arguments: > d level = 0xa > Log File: /var/log/osm.log > ------------------------------------------------- > OpenSM Rev:openib-1.2.0 > > Using default GUID 0x0 > Error: Could not get port guid > Exiting SM > > openfab2:/tmp/ib/src/userspace # cat /var/log/osm.log > Jul 12 08:35:04 718914 [B7E518C0] -> OpenSM Rev:openib-1.2.0 > Jul 12 08:35:04 719111 [0000] -> OpenSM Rev:openib-1.2.0 > > Jul 12 08:35:04 721381 [B7E518C0] -> osm_sa_mad_ctrl_unbind: ERR 1A11: No > previous bind > Jul 12 08:35:04 721702 [0000] -> Exiting SM > > > > > > -----Original Message----- > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] > Sent: Wednesday, July 12, 2006 10:31 AM > To: Ravinandan Arakali > Cc: openib-general at openib.org > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP > interface) > > > Ravinandan, > Do you still see the rping crash? > > Thanks, > Pradipta Kumar. > > Ravinandan Arakali wrote: > > Pradipta, > > Okay, thanks.. Initially, I was not sure since I don't remember non-zero > > values in /proc/krping. When I re-ran the krping test, I see following > > output > > openfab2:~ # cat /proc/krping > > 1-amso0 891376 55711 891376 55711 1782720 27855 1782784 27856 > > > > As you mentioned, the RDMA traffic seems to be flowing indeed ! > > Any idea why rping is dumping core ? > > > > Has any testing been done using SDP with ammasso cards ? > > > > Regards, > > Ravi > > > > > > -----Original Message----- > > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] > > Sent: Friday, July 07, 2006 11:20 PM > > To: Ravinandan Arakali > > Cc: Leonid. 
Grossman (E-mail); sivakumar.subramani at neterion.com; > > openib-general at openib.org > > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP > > interface) > > > > > > Ravinandan Arakali wrote: > >> Pradipta, > >> Following is the output from gdb after core dump. I have also copy-pasted > >> the gdb output on client system. > >> > >> Attached is the dmesg output when krping test is run in verbose mode. > >> The ping data on the sender(client) seems okay. The content is shifted > >> forward by one character for each packet. On receiver, after receiving > > ping > >> pkt 9, it seems to jump to pkt no. 1935. Not sure if it's because > messages > >> can be lost during writing to /var/log/messages ? > > krping is indeed working!!...Using 'verbose' allows you to see the ping > > data. > > When not using 'verbose' you see only 'send/recv' messages. > >> ----------------------------------------- > >> (gdb) run -s -vV -C100 -d -a 0.0.0.0 -p 9999 > >> Starting program: > >> /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -s -vV -C100 -d -a > >> 0.0.0.0 -p 9999 > >> [Thread debugging using libthread_db enabled] > >> [New Thread -1210054992 (LWP 3668)] > >> ipaddr (0.0.0.0) > >> port 9999 > >> created cm_id 0x804e6e0 > >> [New Thread -1210057824 (LWP 3671)] > >> rdma_bind_addr successful > >> rdma_listen > >> cma_event type 4 cma_id 0x804e968 (child) > >> child cma 0x804e968 > >> > >> Program received signal SIGSEGV, Segmentation fault. 
> >> [Switching to Thread -1210054992 (LWP 3668)] > >> rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 > >> 514 cb->pd = ibv_alloc_pd(cm_id->verbs); > >> (gdb) bt > >> #0 rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 > >> #1 0x0804a716 in main (argc=9, argv=Cannot access memory at address 0x6 > >> ) at examples/rping.c:767 > >> (gdb) > >> > >> --------------------------------- > >> (gdb) run -c -vV -C100 -d -a 17.2.2.102 -p 9999 > >> Starting program: > >> /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -c -vV -C100 -d -a > >> 17.2.2.102 -p 9999 > >> [Thread debugging using libthread_db enabled] > >> [New Thread 47388824908032 (LWP 4620)] > >> ipaddr (17.2.2.102) > >> port 9999 > >> created cm_id 0x506b00 > >> [New Thread 1082132800 (LWP 4623)] > >> cma_event type 0 cma_id 0x506b00 (parent) > >> cma_event type 2 cma_id 0x506b00 (parent) > >> rdma_resolve_addr - rdma_resolve_route successful > >> created pd 0x506e60 > >> created channel 0x506e80 > >> created cq 0x506ea0 > >> created qp 0x506f40 > >> rping_setup_buffers called on cb 0x505010 > >> allocated & registered buffers... > >> [New Thread 1090525504 (LWP 4624)] > >> cq_thread started. From mst at mellanox.co.il Thu Jul 13 07:15:18 2006 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Thu, 13 Jul 2006 17:15:18 +0300 Subject: [openib-general] [patch] update uverbs_event_get_sb() to new get_sb_pseudo() In-Reply-To: <20060713134053.GA16556@aon.at> References: <20060713134053.GA16556@aon.at> Message-ID: <20060713141518.GE21190@mellanox.co.il> Quoting r. Bernhard Fischer : > Subject: [patch] update uverbs_event_get_sb() to new get_sb_pseudo() > > Hi, > > get_sb_pseudo was changed but uverbs was not updated. svn only supports latest kernel.org release. > > PS: should nowadays the in-kernel impl be used instead of the one in > svn? > > Signed-off-by: Bernhard Fischer Please test 2.6.18-rc1 if possible. There's also a way to port just the infiniband bits from 2.6.18 to older kernels if you want to. -- MST From tziporet at mellanox.co.il Thu Jul 13 07:32:27 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Thu, 13 Jul 2006 17:32:27 +0300 Subject: [openib-general] SLES9 SP3 support was added In-Reply-To: References: Message-ID: <44B6597B.3010807@mellanox.co.il> Moshe Kazir wrote: > I'm trying to use OFED 1.0 on sles9 SP3 PPC64 . > > OFED 1.0 requires sysfsutils to be installed . > > I tried to compile & install sysfsutils-1.2.0-4.src.rpm > > But found that it conflicts with udev--021-36 rpm > > Anyone knows to work around this problem ? > > Also, > > OFED 1.0 is svn rev 8031 while patches/2.6.5-7.244 includes some svn > 8111 files. > > Was OFED 1.0 (svn 8031) tested on SLES9 SP3 ? > > Moshe > > > OFED 1.0 will not work on SLES9 SP3 You need to use OFED 1.0.1 for this. Vlad will place this package on the releases place in svn so you can use it. 
Tziporet From halr at voltaire.com Thu Jul 13 07:07:04 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Jul 2006 10:07:04 -0400 Subject: [openib-general] IPoIB Path Rerequests and Compliance Message-ID: <1152799621.13048.191884.camel@hal.voltaire.com> Hi Roland, It appears to me that if for some reason IPoIB rerequests a path that has already been resolved, it has filled in values for fields in the request which are not to be honored by the SA as their component mask bits are not on. While the SA responds properly to this since it ignores anything which does not have the comp mask bit set, it is technically a non compliance. Is this worth fixing ? If so, do you want a patch for this ? (Also, I haven't yet checked any other ULPs for a similar issue). I also see some "funny" values in some of these fields as well. -- Hal From halr at voltaire.com Thu Jul 13 07:26:13 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Jul 2006 10:26:13 -0400 Subject: [openib-general] openSM failover / failback issue? In-Reply-To: <1152755146.13048.156462.camel@hal.voltaire.com> References: <000001c6a5dc$ca72ff00$ff0da8c0@amr.corp.intel.com> <1152740984.13048.144995.camel@hal.voltaire.com> <44B57967.1030706@ichips.intel.com> <1152755146.13048.156462.camel@hal.voltaire.com> Message-ID: <1152800767.13048.192761.camel@hal.voltaire.com> On Wed, 2006-07-12 at 21:45, Hal Rosenstock wrote: [snip...] > > I don't know if this is an HCA firmware issues, switch issue, or openSM issue. > > I don't think it's related to my changes or osmtest at this point. > > I'll see if I can reproduce this tomorrow. I've followed your scenario numerous times and haven't been able to reproduce this failure so I can look at it :-( Not sure what the differences are yet... 
-- Hal From krause at cup.hp.com Thu Jul 13 08:13:12 2006 From: krause at cup.hp.com (Michael Krause) Date: Thu, 13 Jul 2006 08:13:12 -0700 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 release - schedule and features In-Reply-To: <79ae2f320607121549h24e3112ak7de9a9c50005bb9f@mail.gmail.co m> References: <6DB5B58A8E5AB846A7B3B3BFF1B4315A2BDF88@AVEXCH1.qlogic.org> <6.2.0.14.2.20060712144615.1008f360@esmail.cup.hp.com> <79ae2f320607121549h24e3112ak7de9a9c50005bb9f@mail.gmail.com> Message-ID: <6.2.0.14.2.20060713080120.101c5008@esmail.cup.hp.com> At 03:49 PM 7/12/2006, Fabian Tillier wrote: >Hi Mike, > >On 7/12/06, Michael Krause wrote: >> >>At 09:48 AM 7/12/2006, Jeff Broughton wrote: >> >>>Modifying the sockets API is just defining yet another RDMA API, and we have >>>so many already.... >> >>I disagree. This effort has distilled the API to basically one for RDMA >>developers. Applications are supported over this via either MPI or Sockets. > >There's been a lot of effort to make the RDMA verbs easy to use. With >the RDMA CM, socket-like connection semantics can be used to establish >the connection between QPs. The connection establishment is the hard >part - doing I/O is trivial in comparisson. This verbs and RDMA CM >have nothing to do with MPI. > >If an application is going to be RDMA aware, I don't see any reason it >shouldn't just use the verbs directly and use the RDMA CM to establish >the connections. What's your point? It seems you are in agreement that there is a single RDMA API that people can use. >> It seems rather self limiting to think the traditional BSD synchronous >>Sockets API is all the world should be able to use when it comes to Sockets. >> Sockets developers could easily incorporate the extensions into their >>applications providing them with improved designs and flexibility without >>having to learn about RDMA itself. 
> >Wait, you want applications to be able to register memory and issue >RDMA operations, but not have to learn about RDMA? How does that make >sense? The Sockets API extensions allow developers to register memory. That has been a desire by many when it comes to SDP or copy avoidance technology as it optimizes the performance path by eliminating the need to do per op registration. For many applications which already known working sets, they can use this to enable the OS and underlying infrastructure to take advantage of this fact to improve performance and quality of the solution. The extensions provide the async communications and event collection mechanisms to also improve performance over the rather limiting select / poll supported by Sockets today. It currently does not support explicit RDMA but it is rather trivial to add such calls and remove the need to interject SDP if desired. The benefits of such new API extensions are there for those that want to eliminate one more ULP with its unfortunate IP cloud over head. >> If the couple of calls necessary to >>extend this API to support direct RDMA would allow them to eliminate SDP >>entirely, well, that has benefits that go beyond just its all Sockets; > >For a socket implementation to support RDMA, the socket must have an >underlying RDMA QP. This means that if you want the application to >not have to be verbs-aware, you can't really get rid of SDP - you're >just extending SDP to let the application have a part in memory >registration and RDMA, while still supporting the traditional BSD >operations. This is IMO more complex than just letting applications >interface directly with verbs, especially since the SDP implementation >will size the QP for its own use, without a means for negotiating with >the user so that you don't cause buffer overruns. Please take a look at the API extensions. I never stated that one gets rid of SDP unless one adds the RDMA-explicit calls. 
As for complexity, well, the goal is to extend to Sockets developers the optimal communication paradigm already available on OS such as Windows without having to leave with the same unfortunate constraints imposed by the OS. The same logic applies to extending the benefits derived from MPI which supports async communications as well as put / get semantics which would be analogous to the additional RDMA interfaces I referenced. I find it strange that people would argue against improving the Sockets developer's tool suite when the benefits are already proven elsewhere within the industry and even within this open source effort. Giving the millions of Sockets developers the choice of a set of extensions that work over both RDMA and traditional network stacks seems like a no brainer. Trying to force them to use a native RDMA API even if semantically similar to Sockets seems like a poor path to pursue. Leave the RDMA API to the middleware providers and those that need to be close the metal. >>it also eliminates the IP cloud that hovers over SDP licensing. Something >>that many developers and customers would appreciate. > >I believe that Microsoft's IP claims only apply to SDP over IB -- I >don't believe SDP over iWarp is affected. I don't know how the RDMA >verbs moving towards a hardware independent (wrt IB vs. iWarp) affects >the IP claims, but it should certainly make things interesting if a >single SDP code base can work over both IB and iWarp. SDP is SDP and it isn't just restricted to IB. I'll leave it to the lawyers to sort it out but having a single SDP with minor code execution path deltas for the IB-specifics isn't that hard to construct. It has been done on other OS already. Mike -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rep.nop at aon.at Thu Jul 13 08:32:07 2006 From: rep.nop at aon.at (Bernhard Fischer) Date: Thu, 13 Jul 2006 17:32:07 +0200 Subject: [openib-general] [patch] update uverbs_event_get_sb() to new get_sb_pseudo() In-Reply-To: <20060713141518.GE21190@mellanox.co.il> References: <20060713134053.GA16556@aon.at> <20060713141518.GE21190@mellanox.co.il> Message-ID: <20060713153207.GA18483@aon.at> On Thu, Jul 13, 2006 at 05:15:18PM +0300, Michael S. Tsirkin wrote: >Quoting r. Bernhard Fischer : >> Subject: [patch] update uverbs_event_get_sb() to new get_sb_pseudo() >> >> Hi, >> >> get_sb_pseudo was changed but uverbs was not updated. > >svn only supports latest kernel.org release. AFAICT, i have 2.6.18-rc1-git6 (fetched about 1 hour ago..) grep get_sb_pseudo include/linux/fs.h -A2 extern int get_sb_pseudo(struct file_system_type *, char *, struct super_operations *ops, unsigned long, struct vfsmount *mnt); openib gen2 has grep get_sb_pseudo ../cvs/openib.gen2/upstream/gen2/trunk/src/linux-kernel/infiniband/core/uverbs_main.c -A1 return get_sb_pseudo(fs_type, "infinibandevent:", NULL, INFINIBANDEVENTFS_MAGIC); > >> >> PS: should nowadays the in-kernel impl be used instead of the one in >> svn? >> >> Signed-off-by: Bernhard Fischer > >Please test 2.6.18-rc1 if possible. There's also a way to port just >the infiniband bits from 2.6.18 to older kernels if you want to. I copied src/linux-kernel/infiniband into the drivers/ dir like i always did (after rm'ing the in-kernel drivers/infiniband dir). Neither uverbs_main.c did compile (hence the trivial patchlet in this thread) nor ipoib (complaining about xmit_lock not in netdev or the like). Is using the linux-kernel/infiniband driver from svn still the proper thing to do? Does ipoib from svn build for the current kernel? hm.. Could it be that the current git repo is not supported but only the last "prepatch"? From mst at mellanox.co.il Thu Jul 13 08:33:44 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 13 Jul 2006 18:33:44 +0300 Subject: [openib-general] [PATCH] fmr pool: remove unnecessary pointer dereference In-Reply-To: <20060709090521.GB2609@mellanox.co.il> References: <20060709090521.GB2609@mellanox.co.il> Message-ID: <20060713153344.GA22648@mellanox.co.il> Andrew, could you please drop the following into -mm and on to Linus? --- ib_fmr_pool_map_phys gets the virtual address by pointer but never writes there, and users (e.g. srp) seem to assume this and ignore the value returned. This patch cleans up the API to get the VA by value, and updates all users. Signed-off-by: Michael S. Tsirkin Acked-by: Roland Dreier diff --git a/include/rdma/ib_fmr_pool.h b/include/rdma/ib_fmr_pool.h index 4ace54c..00dadbf 100644 --- a/include/rdma/ib_fmr_pool.h +++ b/include/rdma/ib_fmr_pool.h @@ -88,7 +88,7 @@ int ib_flush_fmr_pool(struct ib_fmr_pool struct ib_pool_fmr *ib_fmr_pool_map_phys(struct ib_fmr_pool *pool_handle, u64 *page_list, int list_len, - u64 *io_virtual_address); + u64 io_virtual_address); int ib_fmr_pool_unmap(struct ib_pool_fmr *fmr); diff --git a/drivers/infiniband/core/fmr_pool.c b/drivers/infiniband/core/fmr_pool.c index 615fe9c..86a3b2d 100644 --- a/drivers/infiniband/core/fmr_pool.c +++ b/drivers/infiniband/core/fmr_pool.c @@ -426,7 +426,7 @@ EXPORT_SYMBOL(ib_flush_fmr_pool); struct ib_pool_fmr *ib_fmr_pool_map_phys(struct ib_fmr_pool *pool_handle, u64 *page_list, int list_len, - u64 *io_virtual_address) + u64 io_virtual_address) { struct ib_fmr_pool *pool = pool_handle; struct ib_pool_fmr *fmr; @@ -440,7 +440,7 @@ struct ib_pool_fmr *ib_fmr_pool_map_phys fmr = ib_fmr_cache_lookup(pool, page_list, list_len, - *io_virtual_address); + io_virtual_address); if (fmr) { /* found in cache */ ++fmr->ref_count; @@ -464,7 +464,7 @@ struct ib_pool_fmr *ib_fmr_pool_map_phys spin_unlock_irqrestore(&pool->pool_lock, flags); result = ib_map_phys_fmr(fmr->fmr, page_list, list_len, - *io_virtual_address); + io_virtual_address); if (result) { 
spin_lock_irqsave(&pool->pool_lock, flags); @@ -481,7 +481,7 @@ struct ib_pool_fmr *ib_fmr_pool_map_phys fmr->ref_count = 1; if (pool->cache_bucket) { - fmr->io_virtual_address = *io_virtual_address; + fmr->io_virtual_address = io_virtual_address; fmr->page_list_len = list_len; memcpy(fmr->page_list, page_list, list_len * sizeof(*page_list)); diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c index ff117bb..72febf1 100644 --- a/drivers/infiniband/ulp/iser/iser_verbs.c +++ b/drivers/infiniband/ulp/iser/iser_verbs.c @@ -594,7 +594,7 @@ int iser_reg_page_vec(struct iser_conn mem = ib_fmr_pool_map_phys(ib_conn->fmr_pool, page_list, page_vec->length, - &io_addr); + io_addr); if (IS_ERR(mem)) { status = (int)PTR_ERR(mem); diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 4e22afe..9bf182a 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -615,7 +615,7 @@ static int srp_map_fmr(struct srp_device (sg_dma_address(&scat[i]) & dev->fmr_page_mask) + j; req->fmr = ib_fmr_pool_map_phys(dev->fmr_pool, - dma_pages, page_cnt, &io_addr); + dma_pages, page_cnt, io_addr); if (IS_ERR(req->fmr)) { ret = PTR_ERR(req->fmr); goto out; -- MST From rep.nop at aon.at Thu Jul 13 08:35:56 2006 From: rep.nop at aon.at (Bernhard Fischer) Date: Thu, 13 Jul 2006 17:35:56 +0200 Subject: [openib-general] [patch] update uverbs_event_get_sb() to new get_sb_pseudo() In-Reply-To: <20060713153207.GA18483@aon.at> References: <20060713134053.GA16556@aon.at> <20060713141518.GE21190@mellanox.co.il> <20060713153207.GA18483@aon.at> Message-ID: <20060713153556.GA18776@aon.at> On Thu, Jul 13, 2006 at 05:32:07PM +0200, Bernhard Fischer wrote: >On Thu, Jul 13, 2006 at 05:15:18PM +0300, Michael S. Tsirkin wrote: >>Quoting r. 
Bernhard Fischer : >>> Subject: [patch] update uverbs_event_get_sb() to new get_sb_pseudo() >>> >>> Hi, >>> >>> get_sb_pseudo was changed but uverbs was not updated. >> >>svn only supports latest kernel.org release. >AFAICT, i have 2.6.18-rc1-git6 (fetched about 1 hour ago..) duh. Sorry for the noise. From mst at mellanox.co.il Thu Jul 13 08:34:56 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 13 Jul 2006 18:34:56 +0300 Subject: [openib-general] [patch] update uverbs_event_get_sb() to new get_sb_pseudo() In-Reply-To: <20060713153207.GA18483@aon.at> References: <20060713153207.GA18483@aon.at> Message-ID: <20060713153456.GA23056@mellanox.co.il> Quoting r. Bernhard Fischer : > Is using the linux-kernel/infiniband driver from svn still the proper > thing to do? It depends on what you want to do I guess :) > Does ipoib from svn build for the current kernel? > hm.. Could it be that the current git repo is not supported but only the > last "prepatch"? Only the last release is supported by svn, that is 2.6.17.x at the moment. -- MST From rdreier at cisco.com Thu Jul 13 08:42:47 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 13 Jul 2006 08:42:47 -0700 Subject: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant References: <44B405C8.4040706@oracle.com> <44B433CE.1030103@oracle.com> <20060712093820.GA9218@elte.hu> <20060712183049.bcb6c404.akpm@osdl.org> Message-ID: > Sigh. It was always a mistake (of the kernel programming 101 type) to put > any locking at all in the idr code. At some stage we need to weed it all > out and move it to callers. > > Your fix is yet more fallout from that mistake. Agreed. Consider me on the hook to fix this up in a better way once my life is a little saner. Maybe I'll try to cook something up on the plane ride to Ottawa. Anyway you can punch me in the stomach if I don't have something in time for 2.6.19. - R. 
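[Editorial sketch, not part of the original message.] The point made in the idr thread above, that locking belongs in the callers rather than inside the ID allocator, can be illustrated with a small userspace analogy. This is not the kernel idr API: `id_table`, `id_table_alloc`, `id_table_free`, `query_ids` and `query_ids_register` are hypothetical names, and a pthread mutex merely stands in for whatever lock a kernel caller would choose.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define ID_TABLE_SLOTS 8

/* Hypothetical ID table that does no locking of its own. */
struct id_table {
    void *slot[ID_TABLE_SLOTS];   /* NULL means the ID is free */
};

/*
 * Store ptr in the lowest free slot and return its ID, or -1 if the
 * table is full. The caller must hold the lock protecting the table.
 */
static int id_table_alloc(struct id_table *t, void *ptr)
{
    assert(ptr != NULL);          /* NULL is reserved for "free" */
    for (int id = 0; id < ID_TABLE_SLOTS; id++) {
        if (t->slot[id] == NULL) {
            t->slot[id] = ptr;
            return id;
        }
    }
    return -1;
}

/* Release an ID; out-of-range IDs are ignored. Same locking rule. */
static void id_table_free(struct id_table *t, int id)
{
    if (id >= 0 && id < ID_TABLE_SLOTS)
        t->slot[id] = NULL;
}

/* Example caller: it owns the lock and decides how it is taken. */
struct query_ids {
    pthread_mutex_t lock;
    struct id_table table;
};

static int query_ids_register(struct query_ids *q, void *query)
{
    int id;

    pthread_mutex_lock(&q->lock);
    id = id_table_alloc(&q->table, query);
    pthread_mutex_unlock(&q->lock);
    return id;
}
```

Because the table itself never takes a lock, a caller that runs in a context with stricter requirements can pick a different lock type without the allocator having to change.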
From mst at mellanox.co.il Thu Jul 13 08:50:25 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 13 Jul 2006 18:50:25 +0300 Subject: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1 In-Reply-To: <44B4371C.7020809@ichips.intel.com> References: <44B4371C.7020809@ichips.intel.com> Message-ID: <20060713155025.GB22648@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1 > > Michael S. Tsirkin wrote: > > This will be 2.6.18 material, right? > > I've committed this to svn. Assuming that further testing goes without a > hitch, you can pull the actual check-in from revision 8498 to push upstream. OK, we'll see after Ottawa I guess - it's a necessary fix, but need to let it run for a while. -- MST From mst at mellanox.co.il Thu Jul 13 08:55:30 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 13 Jul 2006 18:55:30 +0300 Subject: [openib-general] [PATCH] IB/core: use correct gfp_mask in sa_query In-Reply-To: <20060712110955.GB18466@mellanox.co.il> References: <20060712093820.GA9218@elte.hu> <20060712110955.GB18466@mellanox.co.il> Message-ID: <20060713155530.GC22648@mellanox.co.il> Andrew, could you please drop the following in -mm and on to Linus? -- Avoid bogus out of memory errors: fix sa_query to actually pass gfp_mask supplied by the user to idr_pre_get. Signed-off-by: Michael S. 
Tsirkin Acked-by: "Sean Hefty" Acked-by: "Roland Dreier" diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index e911c99..aeda484 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -488,13 +488,13 @@ static void init_mad(struct ib_sa_mad *m spin_unlock_irqrestore(&tid_lock, flags); } -static int send_mad(struct ib_sa_query *query, int timeout_ms) +static int send_mad(struct ib_sa_query *query, int timeout_ms, gfp_t gfp_mask) { unsigned long flags; int ret, id; retry: - if (!idr_pre_get(&query_idr, GFP_ATOMIC)) + if (!idr_pre_get(&query_idr, gfp_mask)) return -ENOMEM; spin_lock_irqsave(&idr_lock, flags); ret = idr_get_new(&query_idr, query, &id); @@ -630,7 +630,7 @@ int ib_sa_path_rec_get(struct ib_device *sa_query = &query->sa_query; - ret = send_mad(&query->sa_query, timeout_ms); + ret = send_mad(&query->sa_query, timeout_ms, gfp_mask); if (ret < 0) goto err2; @@ -752,7 +752,7 @@ int ib_sa_service_rec_query(struct ib_de *sa_query = &query->sa_query; - ret = send_mad(&query->sa_query, timeout_ms); + ret = send_mad(&query->sa_query, timeout_ms, gfp_mask); if (ret < 0) goto err2; @@ -844,7 +844,7 @@ int ib_sa_mcmember_rec_query(struct ib_d *sa_query = &query->sa_query; - ret = send_mad(&query->sa_query, timeout_ms); + ret = send_mad(&query->sa_query, timeout_ms, gfp_mask); if (ret < 0) goto err2; -- MST From rdreier at cisco.com Thu Jul 13 09:02:47 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 13 Jul 2006 09:02:47 -0700 Subject: [openib-general] Suggestions for how to remove bus_to_virt() References: <1152746967.4572.263.camel@brick.pathscale.com> <20060712.174013.95062313.davem@davemloft.net> Message-ID: > > A cleaner solution would be to make the dma_ API really use the device > > it's passed anyway, and allow drivers to override the standard PCI > > stuff nicely. But that would be major surgery, I guess. 
> Clean but expensive, you should not force the rest of the kernel > to eat the cost of something you want to do when it's totally > unnecessary for most other users. OK, fair enough. > For example, x86 never needs to do anything other than a direct > virt_to_phys translation to produce a DMA address, no matter what > bus the device is on. It's a single simple integer adjustment > that can be done inline in about 2 or 3 instructions at most. Except x86 needs to handle systems with IOMMUs now... > If you need device level DMA mapping semantics, create them for your > device type. This is what USB does, btw. Makes sense -- Ralph, I would suggest looking at USB as a model. - R. From afriedle at open-mpi.org Thu Jul 13 09:21:34 2006 From: afriedle at open-mpi.org (Andrew Friedley) Date: Thu, 13 Jul 2006 09:21:34 -0700 Subject: [openib-general] multicast In-Reply-To: <000101c6a633$4dbaaba0$19cc180a@amr.corp.intel.com> References: <000101c6a633$4dbaaba0$19cc180a@amr.corp.intel.com> Message-ID: <44B6730E.8030207@open-mpi.org> Sean Hefty wrote: >>I'm concerned about how rdma_cm abstracts HCAs. It looks like I can use >>the src_addr argument to rdma_resolve_addr() to select which IP >>address/HCA (assuming one IP per HCA), but how can I enumerate the >>available HCAs? > > > The HCA / RDMA device abstraction is there for device hotplug, but the verb call > to enumerate HCAs is still usable if you want to get a list of all HCAs in the > system. > > You will likely have one IP address per port, rather than per HCA. You probably > want to distinguish between locally assigned IP addresses (those given to ipoib > devices - ib0, etc.), versus multicast IP addresses, and verify that your > multicast routing tables direct traffic out of ipoib IP addresses, rather than > Ethernet IP addresses. The IB multicast groups will base their local routing > the same as the true IP multicast groups. Yes - I'm actually talking about a separate issue here. 
It looks like using the RDMA CM for multicast is going to require using it for all of my connection management, so I'm looking at what that entails. Currently I'm using only ibverbs and Open MPI's runtime environment layer. > >>This is important for a number of reasons - one, so that I can pass on >>the available IP addresses to MPI peers out of band. It's also >>important to know which HCA's are available in the system, and to be >>able to select which HCA to use when connecting to a peer. This allows >>us to implement things like load balancing and failover. > > > HCA / port selection can be controlled by selecting a specific IP address, and > you can configure your multicast routing tables to direct traffic out any > desired port. You should have the same control over using a specific HCA / > port; only the type of address used to identify the port changes. > > I might be able to make things a little easier by adding some sort of call that > identifies all RDMA IP addresses in the system. You could test for this today > by calling rdma_bind_addr() on all IP addresses assigned to the system. This > doesn't really help with multicast addresses though, since you don't bind to > them... That would be very nice - Open MPI already supports enumeration of IP interfaces (which I could do rdma_bind_addr() on as you suggested) in a portable fashion, but I think being able to get this via RDMA CM is a better general solution. Right about the multicast addresses - should have made it clear that I was talking unicast IP. I understand RDMA CM is a generic CM intended for other types of devices (ie iWARP), not just infiniband. Will all of these devices be supported under the ibverbs interface? I'm thinking it would be a problem if we're picking up interfaces that don't support ibverbs, then try to use ibverbs to communicate over them. > I'm not clear on what you mean about passing available IP addresses to MPI > peers, or why it's done out of band. 
Are you talking about IP addresses of the > local ipoib devices? Multicast IP addresses? By out of band, do you mean over > a socket, as opposed to an IB connection? Sorry - I'm talking about IP addresses of the local ipoib devices, or whatever sort of addressing structure a particular network uses. Yes, we currently send this information out of band, over TCP. Our network initialization works like this - we have modules written for each type of network (TCP, infiniband, GM, etc). In the first initialization stage for each module, available interfaces are enumerated, initialized, and addressing information for each interface is made available to our runtime environment layer. This addressing information is exchanged among all peers in the MPI job via TCP (I believe we have a framework for supporting other methods, but only TCP is currently implemented). Finally, each network module takes all the peer addresses for its network and sets up any necessary data structures for communicating with each of those peers. > >>Matt Leininger suggested looking at the IB CM as an alternative, as it >>gives more low-level control. Am I missing something, or does the IB CM >>not handle multicast like the RDMA CM? > > > IB multicast groups require SA interaction, and are not associated with the IB > CM. What control do you feel that the RDMA CM is missing? At the moment, I'm more concerned about how the RDMA CM API fits with Open MPI (which I think it will, just need to re-think connection management). In the future though, one thing that comes to mind is control of dynamic/multipath routing. 
Andrew From ralphc at pathscale.com Thu Jul 13 09:37:54 2006 From: ralphc at pathscale.com (Ralph Campbell) Date: Thu, 13 Jul 2006 09:37:54 -0700 Subject: [openib-general] Suggestions for how to remove bus_to_virt() In-Reply-To: References: <1152746967.4572.263.camel@brick.pathscale.com> <20060712.174013.95062313.davem@davemloft.net> Message-ID: <1152808674.4572.282.camel@brick.pathscale.com> Thanks to all for the pointers and suggestions. It will probably take me a while to follow up on these and make another proposal. On Thu, 2006-07-13 at 09:02 -0700, Roland Dreier wrote: > > > A cleaner solution would be to make the dma_ API really use the device > > > it's passed anyway, and allow drivers to override the standard PCI > > > stuff nicely. But that would be major surgery, I guess. > > > Clean but expensive, you should not force the rest of the kernel > > to eat the cost of something you want to do when it's totally > > unnecessary for most other users. > > OK, fair enough. > > > For example, x86 never needs to do anything other than a direct > > virt_to_phys translation to produce a DMA address, no matter what > > bus the device is on. It's a single simple integer adjustment > > that can be done inline in about 2 or 3 instructions at most. > > Except x86 needs to handle systems with IOMMUs now... > > > If you need device level DMA mapping semantics, create them for your > > device type. This is what USB does, btw. > > Makes sense -- Ralph, I would suggest looking at USB as a model. > > - R. From sean.hefty at intel.com Thu Jul 13 09:45:20 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 13 Jul 2006 09:45:20 -0700 Subject: [openib-general] multicast In-Reply-To: <44B6730E.8030207@open-mpi.org> Message-ID: <000201c6a69b$ba9b9d20$05d5180a@amr.corp.intel.com> >Yes - I'm actually talking about a separate issue here. 
It looks like >using the RDMA CM for multicast is going to require using it for all of >my connection management, so I'm looking at what that entails. >Currently I'm using only ibverbs and Open MPI's runtime environment layer. The RDMA CM is designed so that you can handle RC, UD, and multicast communication through it using a single addressing scheme (IP). It lets you mix UD and multicast traffic on the same QP. >That would be very nice - Open MPI already supports enumeration of IP >interfaces (which I could do rdma_bind_addr() on as you suggested) in a >portable fashion, but I think being able to get this via RDMA CM is a >better general solution. I see two reasonable solutions here. One is to provide an enumeration of RDMA IP addresses. The second is to use the existing methods to enumerate all IP addresses, then provide a check to determine if an IP address is usable as an RDMA address. The current code does the latter; although, not necessarily in an intuitive way. >I understand RDMA CM is a generic CM intended for other types of devices >(ie iWARP), not just infiniband. Will all of these devices be supported >under the ibverbs interface? I'm thinking it would be a problem if >we're picking up interfaces that don't support ibverbs, then try to use >ibverbs to communicate over them. The intent is that, yes, all RDMA devices should fit under the ibverbs (to be renamed rdma_verbs) interface. Changes would be made to ibverbs to accommodate new devices as they come along. >Sorry - I'm talking about IP addresses of the local ipoib devices, or >whatever sort of addressing structure a particular network uses. Yes, >we currently send this information out of band, over TCP. If you're using IP addresses for all RDMA devices, do you still need to exchange this information (which is also going over an IP address)? Is this to get more control over the routes used for connections? Do you do this even when using standard IP over Ethernet? 
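[Editorial sketch, not part of the original message.] The second approach described above (enumerate all local IP addresses with existing methods, then check each one for RDMA usability) might look roughly like the following. It uses the standard `getifaddrs()` call for enumeration; the check itself is left as a caller-supplied predicate, because the exact test is an assumption here. In a real program the predicate could, as suggested earlier in the thread, attempt `rdma_bind_addr()` on the address. `rdma_usable_fn`, `count_rdma_addrs`, `accept_all` and `accept_none` are hypothetical names.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <ifaddrs.h>
#include <stddef.h>

/* Hypothetical predicate: nonzero if addr is usable as an RDMA address. */
typedef int (*rdma_usable_fn)(const struct sockaddr *addr);

/*
 * Walk every local interface address, keep the IPv4 ones, and count
 * those the predicate accepts. Returns -1 if enumeration itself fails.
 */
static int count_rdma_addrs(rdma_usable_fn is_usable)
{
    struct ifaddrs *list, *ifa;
    int count = 0;

    if (getifaddrs(&list) != 0)
        return -1;

    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr == NULL)
            continue;                      /* interface without an address */
        if (ifa->ifa_addr->sa_family != AF_INET)
            continue;                      /* this sketch handles IPv4 only */
        if (is_usable(ifa->ifa_addr))
            count++;
    }

    freeifaddrs(list);
    return count;
}

/* Trivial predicates, useful only for exercising the loop. */
static int accept_all(const struct sockaddr *addr)  { (void)addr; return 1; }
static int accept_none(const struct sockaddr *addr) { (void)addr; return 0; }
```

Keeping the check behind a function pointer means the enumeration code stays the same whether the test is done by the application or is later provided by the RDMA CM itself.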
>At the moment, I'm more concerned about how the RDMA CM API fits with >Open MPI (which I think it will, just need to re-think connection >management). In the future though, one thing that comes to mind is >control of dynamic/multipath routing. You have greater control over paths by using rdma_get_option / rdma_set_option. There are options available to retrieve a list of all possible path records from the local node to a specified destination. You can then manually set which path you want a connection to use. Use of these options require that the local SA cache be enabled. - Sean From afriedle at open-mpi.org Thu Jul 13 10:34:05 2006 From: afriedle at open-mpi.org (Andrew Friedley) Date: Thu, 13 Jul 2006 10:34:05 -0700 Subject: [openib-general] multicast In-Reply-To: <000201c6a69b$ba9b9d20$05d5180a@amr.corp.intel.com> References: <000201c6a69b$ba9b9d20$05d5180a@amr.corp.intel.com> Message-ID: <44B6840D.7030604@open-mpi.org> Sean Hefty wrote: > I see two reasonable solutions here. One is to provide an enumeration of RDMA > IP addresses. The second is to use the existing methods to enumerate all IP > addresses, then provide a check to determine if an IP address is usable as an > RDMA address. The current code does the latter; although, not necessarily in an > intuitive way. These actually seem like the same sort of solution, the difference being who does the work. > >>I understand RDMA CM is a generic CM intended for other types of devices >>(ie iWARP), not just infiniband. Will all of these devices be supported >>under the ibverbs interface? I'm thinking it would be a problem if >>we're picking up interfaces that don't support ibverbs, then try to use >>ibverbs to communicate over them. > > > The intent is that, yes, all RDMA devices should fit under the ibverbs (to be > renamed rdma_verbs) interface. Changes would be made to ibverbs to accommodate > new devices as they come along. Very good. 
> If you're using IP addresses for all RDMA devices, do you still need to exchange > this information (which is also going over an IP address)? Is this to get more > control over the routes used for connections? Do you do this even when using > standard IP over Ethernet? Yes and no - it's certainly not *required*, but that is how the abstraction has been set up. We don't make the assumption that all networks in use support IP - this allows us to support networks like MVAPI and GM. Yes, in a sense, it does give more control over routes. Yes, even when TCP/IP is being used for MPI communication, as all networks we support are abstracted into a single API. We then do things like striping, failover, and support disjoint networks (that is, we don't assume one peer can communicate to another via all interfaces) irrespective of the network in use. Andrew From ravinandan.arakali at neterion.com Thu Jul 13 11:20:39 2006 From: ravinandan.arakali at neterion.com (Ravinandan Arakali) Date: Thu, 13 Jul 2006 11:20:39 -0700 Subject: [openib-general] ping problem with ammasso cards(iWARPinterface) In-Reply-To: <1152798905.25964.10.camel@stevo-desktop> Message-ID: <001001c6a6a9$0c239460$4710100a@pc.s2io.com> With the --cma option, I don't see the error about running SM. But there's no connection established. openfab2:/tmp/ib/src/userspace/perftest # ./rdma_lat --cma pp_server_connect_cma starting server openfab:/tmp/ib/src/userspace/perftest # ./rdma_lat --cma 17.2.2.102 pp_client_connect_cma starting client pp_client_connect_cma/856 unexpected CM event 8 pp_client_connect_cma NOT connected! pp_connect_cma(17.2.2.102,18515) failed! There are no messages in dmesg either. 
Ravi -----Original Message----- From: Steve Wise [mailto:swise at opengridcomputing.com] Sent: Thursday, July 13, 2006 6:55 AM To: Ravinandan Arakali Cc: bpradip at in.ibm.com; openib-general at openib.org Subject: Re: [openib-general] ping problem with ammasso cards(iWARPinterface) Are you trying to run this over iwarp? It doesn't need an SM... For the perftests rdma_lat and rdma_bw in the iwarp branch, use the --cma flag. Steve. On Wed, 2006-07-12 at 16:39 -0700, Ravinandan Arakali wrote: > Also, I am trying to run some of the iwarp bandwidth/latency tests > (available under directory perftest). > The first thing to do here is to run opensm. When I run opensm (with debug > level 10), I get the following error. Any idea what needs to be done to get > this working ? > > openfab2:/tmp/ib/src/userspace # opensm -d 10 > ------------------------------------------------- > OpenSM Rev:openib-1.2.0 > Command Line Arguments: > d level = 0xa > Log File: /var/log/osm.log > ------------------------------------------------- > OpenSM Rev:openib-1.2.0 > > Using default GUID 0x0 > Error: Could not get port guid > Exiting SM > > openfab2:/tmp/ib/src/userspace # cat /var/log/osm.log > Jul 12 08:35:04 718914 [B7E518C0] -> OpenSM Rev:openib-1.2.0 > Jul 12 08:35:04 719111 [0000] -> OpenSM Rev:openib-1.2.0 > > Jul 12 08:35:04 721381 [B7E518C0] -> osm_sa_mad_ctrl_unbind: ERR 1A11: No > previous bind > Jul 12 08:35:04 721702 [0000] -> Exiting SM > > > > > > -----Original Message----- > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] > Sent: Wednesday, July 12, 2006 10:31 AM > To: Ravinandan Arakali > Cc: openib-general at openib.org > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP > interface) > > > Ravinandan, > Do you still see the rping crash? > > Thanks, > Pradipta Kumar. > > Ravinandan Arakali wrote: > > Pradipta, > > Okay, thanks.. Initially, I was not sure since I don't remember non-zero > > values in /proc/krping. 
When I re-ran the krping test, I see following > > output > > openfab2:~ # cat /proc/krping > > 1-amso0 891376 55711 891376 55711 1782720 27855 1782784 27856 > > > > As you mentioned, the RDMA traffic seems to be flowing indeed ! > > Any idea why rping is dumping core ? > > > > Has any testing been done using SDP with ammasso cards ? > > > > Regards, > > Ravi > > > > > > -----Original Message----- > > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] > > Sent: Friday, July 07, 2006 11:20 PM > > To: Ravinandan Arakali > > Cc: Leonid. Grossman (E-mail); sivakumar.subramani at neterion.com; > > openib-general at openib.org > > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP > > interface) > > > > > > Ravinandan Arakali wrote: > >> Pradipta, > >> Following is the output from gdb after core dump. I have also copy-pasted > >> the gdb output on client system. > >> > >> Attached is the dmesg output when krping test is run in verbose mode. > >> The ping data on the sender(client) seems okay. The content is shifted > >> forward by one character for each packet. On receiver, after receiving > > ping > >> pkt 9, it seems to jump to pkt no. 1935. Not sure if it's because > messages > >> can be lost during writing to /var/log/messages ? > > krping is indeed working!!...Using 'verbose' allows you to see the ping > > data. > > When not using 'verbose' you see only 'send/recv' messages. 
> >> ----------------------------------------- > >> (gdb) run -s -vV -C100 -d -a 0.0.0.0 -p 9999 > >> Starting program: > >> /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -s -vV -C100 -d -a > >> 0.0.0.0 -p 9999 > >> [Thread debugging using libthread_db enabled] > >> [New Thread -1210054992 (LWP 3668)] > >> ipaddr (0.0.0.0) > >> port 9999 > >> created cm_id 0x804e6e0 > >> [New Thread -1210057824 (LWP 3671)] > >> rdma_bind_addr successful > >> rdma_listen > >> cma_event type 4 cma_id 0x804e968 (child) > >> child cma 0x804e968 > >> > >> Program received signal SIGSEGV, Segmentation fault. > >> [Switching to Thread -1210054992 (LWP 3668)] > >> rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 > >> 514 cb->pd = ibv_alloc_pd(cm_id->verbs); > >> (gdb) bt > >> #0 rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 > >> #1 0x0804a716 in main (argc=9, argv=Cannot access memory at address 0x6 > >> ) at examples/rping.c:767 > >> (gdb) > >> > >> --------------------------------- > >> (gdb) run -c -vV -C100 -d -a 17.2.2.102 -p 9999 > >> Starting program: > >> tmp/ib/src/userspace/librdmacm/examples/.libs/rping -c -vV -C100 -d -a > >> 17.2.2.102 -p 9999 > >> [Thread debugging using libthread_db enabled] > >> [New Thread 47388824908032 (LWP 4620)] > >> ipaddr (17.2.2.102) > >> port 9999 > >> created cm_id 0x506b00 > >> [New Thread 1082132800 (LWP 4623)] > >> cma_event type 0 cma_id 0x506b00 (parent) > >> cma_event type 2 cma_id 0x506b00 (parent) > >> rdma_resolve_addr - rdma_resolve_route successful > >> created pd 0x506e60 > >> created channel 0x506e80 > >> created cq 0x506ea0 > >> created qp 0x506f40 > >> rping_setup_buffers called on cb 0x505010 > >> allocated & registered buffers... > >> [New Thread 1090525504 (LWP 4624)] > >> cq_thread started. 
> >> > >> > > > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > > > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From swise at opengridcomputing.com Thu Jul 13 11:27:00 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 13 Jul 2006 13:27:00 -0500 Subject: [openib-general] ping problem with ammasso cards(iWARPinterface) In-Reply-To: <001001c6a6a9$0c239460$4710100a@pc.s2io.com> References: <001001c6a6a9$0c239460$4710100a@pc.s2io.com> Message-ID: <1152815220.25964.35.camel@stevo-desktop> I guess this isn't surprising since rping doesn't work for you either. Something fundamental is screwed up on your user side methinks... CM event 8 == RDMA_CM_EVENT_REJECTED which means either the server side wasn't listening on the appropriate TCP port, or the server process did an rdma_reject(). I'm guessing it's the former... You could use tcpdump to see if the connection request is getting RST by the remote side. On Thu, 2006-07-13 at 11:20 -0700, Ravinandan Arakali wrote: > With the --cma option, I don't see the error about running SM. > But there's no connection established. > > openfab2:/tmp/ib/src/userspace/perftest # ./rdma_lat --cma > pp_server_connect_cma starting server > > openfab:/tmp/ib/src/userspace/perftest # ./rdma_lat --cma 17.2.2.102 > pp_client_connect_cma starting client > pp_client_connect_cma/856 unexpected CM event 8 > pp_client_connect_cma NOT connected! > pp_connect_cma(17.2.2.102,18515) failed! > > There are no messages in dmesg either. 
> > Ravi > > -----Original Message----- > From: Steve Wise [mailto:swise at opengridcomputing.com] > Sent: Thursday, July 13, 2006 6:55 AM > To: Ravinandan Arakali > Cc: bpradip at in.ibm.com; openib-general at openib.org > Subject: Re: [openib-general] ping problem with ammasso > cards(iWARPinterface) > > > Are you trying to run this over iwarp? It doesn't need an SM... > > For the perftests rdma_lat and rdma_bw in the iwarp branch, use the > --cma flag. > > Steve. > > > On Wed, 2006-07-12 at 16:39 -0700, Ravinandan Arakali wrote: > > Also, I am trying to run some of the iwarp bandwidth/latency tests > > (available under directory perftest). > > The first thing to do here is to run opensm. When I run opensm (with debug > > level 10), I get the following error. Any idea what needs to be done to > get > > this working ? > > > > openfab2:/tmp/ib/src/userspace # opensm -d 10 > > ------------------------------------------------- > > OpenSM Rev:openib-1.2.0 > > Command Line Arguments: > > d level = 0xa > > Log File: /var/log/osm.log > > ------------------------------------------------- > > OpenSM Rev:openib-1.2.0 > > > > Using default GUID 0x0 > > Error: Could not get port guid > > Exiting SM > > > > openfab2:/tmp/ib/src/userspace # cat /var/log/osm.log > > Jul 12 08:35:04 718914 [B7E518C0] -> OpenSM Rev:openib-1.2.0 > > Jul 12 08:35:04 719111 [0000] -> OpenSM Rev:openib-1.2.0 > > > > Jul 12 08:35:04 721381 [B7E518C0] -> osm_sa_mad_ctrl_unbind: ERR 1A11: No > > previous bind > > Jul 12 08:35:04 721702 [0000] -> Exiting SM > > > > > > > > > > > > -----Original Message----- > > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] > > Sent: Wednesday, July 12, 2006 10:31 AM > > To: Ravinandan Arakali > > Cc: openib-general at openib.org > > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP > > interface) > > > > > > Ravinandan, > > Do you still see the rping crash? > > > > Thanks, > > Pradipta Kumar. 
> > > > Ravinandan Arakali wrote: > > > Pradipta, > > > Okay, thanks.. Initially, I was not sure since I don't remember non-zero > > > values in /proc/krping. When I re-ran the krping test, I see following > > > output > > > openfab2:~ # cat /proc/krping > > > 1-amso0 891376 55711 891376 55711 1782720 27855 1782784 27856 > > > > > > As you mentioned, the RDMA traffic seems to be flowing indeed ! > > > Any idea why rping is dumping core ? > > > > > > Has any testing been done using SDP with ammasso cards ? > > > > > > Regards, > > > Ravi > > > > > > > > > -----Original Message----- > > > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] > > > Sent: Friday, July 07, 2006 11:20 PM > > > To: Ravinandan Arakali > > > Cc: Leonid. Grossman (E-mail); sivakumar.subramani at neterion.com; > > > openib-general at openib.org > > > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP > > > interface) > > > > > > > > > Ravinandan Arakali wrote: > > >> Pradipta, > > >> Following is the output from gdb after core dump. I have also > copy-pasted > > >> the gdb output on client system. > > >> > > >> Attached is the dmesg output when krping test is run in verbose mode. > > >> The ping data on the sender(client) seems okay. The content is shifted > > >> forward by one character for each packet. On receiver, after receiving > > > ping > > >> pkt 9, it seems to jump to pkt no. 1935. Not sure if it's because > > messages > > >> can be lost during writing to /var/log/messages ? > > > krping is indeed working!!...Using 'verbose' allows you to see the ping > > > data. > > > When not using 'verbose' you see only 'send/recv' messages. 
> > >> ----------------------------------------- > > >> (gdb) run -s -vV -C100 -d -a 0.0.0.0 -p 9999 > > >> Starting program: > > >> /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -s -vV -C100 -d -a > > >> 0.0.0.0 -p 9999 > > >> [Thread debugging using libthread_db enabled] > > >> [New Thread -1210054992 (LWP 3668)] > > >> ipaddr (0.0.0.0) > > >> port 9999 > > >> created cm_id 0x804e6e0 > > >> [New Thread -1210057824 (LWP 3671)] > > >> rdma_bind_addr successful > > >> rdma_listen > > >> cma_event type 4 cma_id 0x804e968 (child) > > >> child cma 0x804e968 > > >> > > >> Program received signal SIGSEGV, Segmentation fault. > > >> [Switching to Thread -1210054992 (LWP 3668)] > > >> rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 > > >> 514 cb->pd = ibv_alloc_pd(cm_id->verbs); > > >> (gdb) bt > > >> #0 rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 > > >> #1 0x0804a716 in main (argc=9, argv=Cannot access memory at address > 0x6 > > >> ) at examples/rping.c:767 > > >> (gdb) > > >> > > >> --------------------------------- > > >> (gdb) run -c -vV -C100 -d -a 17.2.2.102 -p 9999 > > >> Starting program: > > >> > tmp/ib/src/userspace/librdmacm/examples/.libs/rping -c -vV -C100 -d -a > > >> 17.2.2.102 -p 9999 > > >> [Thread debugging using libthread_db enabled] > > >> [New Thread 47388824908032 (LWP 4620)] > > >> ipaddr (17.2.2.102) > > >> port 9999 > > >> created cm_id 0x506b00 > > >> [New Thread 1082132800 (LWP 4623)] > > >> cma_event type 0 cma_id 0x506b00 (parent) > > >> cma_event type 2 cma_id 0x506b00 (parent) > > >> rdma_resolve_addr - rdma_resolve_route successful > > >> created pd 0x506e60 > > >> created channel 0x506e80 > > >> created cq 0x506ea0 > > >> created qp 0x506f40 > > >> rping_setup_buffers called on cb 0x505010 > > >> allocated & registered buffers... > > >> [New Thread 1090525504 (LWP 4624)] > > >> cq_thread started. 
> > >> > > >> > > > > > > > > > _______________________________________________ > > > openib-general mailing list > > > openib-general at openib.org > > > http://openib.org/mailman/listinfo/openib-general > > > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > > > > > > > > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > > > From rep.nop at aon.at Thu Jul 13 11:51:16 2006 From: rep.nop at aon.at (Bernhard Fischer) Date: Thu, 13 Jul 2006 20:51:16 +0200 Subject: [openib-general] ibv_reg_mr fails: bad address [was: Re: Problem with mca_mpool_openib_register - Cannot allocate memory] In-Reply-To: <4496BE90.40607@princeton.edu> References: <4496BE90.40607@princeton.edu> Message-ID: <20060713185116.GA21861@aon.at> On Mon, Jun 19, 2006 at 11:11:12AM -0400, Bill Wichser wrote: >Running the openib stack from Redhat on a 2.6.9-34.ELsmp kernel, dual >Xeon. Running with openmpi v1.0.2 compiled w/gcc. > >While we still have the problem with btl_openib_endpoint.c returning 0 >byte(s) for max inline data, and realize that another IB stack addresses >this, another problem when running across more than a single host pops >up generating huge amounts of error messages. 
>
>The errors go something like this:
>
>mca_mpool_openib_register: ibv_reg_mr(0x2ac2622000,1052672) failed with
>error: Cannot allocate memory
>[0,1,1][btl_openib.c:496:mca_btl_openib_prepare_dst]
>mpool_register(0x2ac2622040,1048576) failed: base 0x2ac2222040 lb 0
>offset 4194304

while 8MB (as Bill stated below) is likely to fail, i'm getting:

[x86-64n001:07622] mca_mpool_openib_register: ibv_reg_mr(0x717f2000,2113536) failed with error: Bad address
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0x717f1ff8

two boxes involved, each with 2GB memory, my ulimits are ok:
$ ulimit -l;rsh 10.100.0.44 "ulimit -l"
unlimited
unlimited

Any hint on this one?
TIA,
Bernhard

$ cat /sys/class/infiniband/mthca0/fw_ver
4.7.400
# lspci -vvxxx -s 01:00.0
01:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex HCA (Tavor compatibility mode) (rev a0)
	Subsystem: Mellanox Technologies MT25208 InfiniHost III Ex HCA (Tavor compatibility mode)
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-

>We fixed the /etc/security/limits.conf problem but I don't know what to
>do about this one. The job seems to complete without error on 2 nodes
>(4 processors) but to scale any larger just generates megabyte files of
>these types of error messages.
>
>Any insights for this problem? All searches lead me to the limits.conf
>which we have set to 8192. These are 8G machines if that makes any

Bill, 8192 is just 8MB. See
http://www.open-mpi.org/faq/?category=infiniband#ib-locked-pages
and also make sure to have
session required pam_limits.so
in your rsh, rlogin and rexec files in pam.d

>difference.
> >Thanks, >Bill From swise at opengridcomputing.com Thu Jul 13 12:10:05 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 13 Jul 2006 14:10:05 -0500 Subject: [openib-general] ping problem with ammasso cards(iWARPinterface) In-Reply-To: <1152815220.25964.35.camel@stevo-desktop> References: <001001c6a6a9$0c239460$4710100a@pc.s2io.com> <1152815220.25964.35.camel@stevo-desktop> Message-ID: <1152817805.25964.40.camel@stevo-desktop> By the way, does this failure happen immediately or after some period of time? On Thu, 2006-07-13 at 13:27 -0500, Steve Wise wrote: > I guess this isn't surprising since rping doesn't work for you either. > Something fundamental is screwed up on your user side methinks... > > CM event 8 == RDMA_CM_EVENT_REJECTED which means either the server side > wasn't listening on the appropriate TCP port, or the server process did > an rdma_reject(). I'm guessing its the former... > > You could use tcpdmp and to see if the connection request is getting RST > by the remote side. > > > > > On Thu, 2006-07-13 at 11:20 -0700, Ravinandan Arakali wrote: > > With the --cma option, I don't see the error about running SM. > > But there's no connection established. > > > > openfab2:/tmp/ib/src/userspace/perftest # ./rdma_lat --cma > > pp_server_connect_cma starting server > > > > openfab:/tmp/ib/src/userspace/perftest # ./rdma_lat --cma 17.2.2.102 > > pp_client_connect_cma starting client > > pp_client_connect_cma/856 unexpected CM event 8 > > pp_client_connect_cma NOT connected! > > pp_connect_cma(17.2.2.102,18515) failed! > > > > There are no messages in dmesg either. > > > > Ravi > > > > -----Original Message----- > > From: Steve Wise [mailto:swise at opengridcomputing.com] > > Sent: Thursday, July 13, 2006 6:55 AM > > To: Ravinandan Arakali > > Cc: bpradip at in.ibm.com; openib-general at openib.org > > Subject: Re: [openib-general] ping problem with ammasso > > cards(iWARPinterface) > > > > > > Are you trying to run this over iwarp? 
It doesn't need an SM... > > > > For the perftests rdma_lat and rdma_bw in the iwarp branch, use the > > --cma flag. > > > > Steve. > > > > > > On Wed, 2006-07-12 at 16:39 -0700, Ravinandan Arakali wrote: > > > Also, I am trying to run some of the iwarp bandwidth/latency tests > > > (available under directory perftest). > > > The first thing to do here is to run opensm. When I run opensm (with debug > > > level 10), I get the following error. Any idea what needs to be done to > > get > > > this working ? > > > > > > openfab2:/tmp/ib/src/userspace # opensm -d 10 > > > ------------------------------------------------- > > > OpenSM Rev:openib-1.2.0 > > > Command Line Arguments: > > > d level = 0xa > > > Log File: /var/log/osm.log > > > ------------------------------------------------- > > > OpenSM Rev:openib-1.2.0 > > > > > > Using default GUID 0x0 > > > Error: Could not get port guid > > > Exiting SM > > > > > > openfab2:/tmp/ib/src/userspace # cat /var/log/osm.log > > > Jul 12 08:35:04 718914 [B7E518C0] -> OpenSM Rev:openib-1.2.0 > > > Jul 12 08:35:04 719111 [0000] -> OpenSM Rev:openib-1.2.0 > > > > > > Jul 12 08:35:04 721381 [B7E518C0] -> osm_sa_mad_ctrl_unbind: ERR 1A11: No > > > previous bind > > > Jul 12 08:35:04 721702 [0000] -> Exiting SM > > > > > > > > > > > > > > > > > > -----Original Message----- > > > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] > > > Sent: Wednesday, July 12, 2006 10:31 AM > > > To: Ravinandan Arakali > > > Cc: openib-general at openib.org > > > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP > > > interface) > > > > > > > > > Ravinandan, > > > Do you still see the rping crash? > > > > > > Thanks, > > > Pradipta Kumar. > > > > > > Ravinandan Arakali wrote: > > > > Pradipta, > > > > Okay, thanks.. Initially, I was not sure since I don't remember non-zero > > > > values in /proc/krping. 
When I re-ran the krping test, I see following > > > > output > > > > openfab2:~ # cat /proc/krping > > > > 1-amso0 891376 55711 891376 55711 1782720 27855 1782784 27856 > > > > > > > > As you mentioned, the RDMA traffic seems to be flowing indeed ! > > > > Any idea why rping is dumping core ? > > > > > > > > Has any testing been done using SDP with ammasso cards ? > > > > > > > > Regards, > > > > Ravi > > > > > > > > > > > > -----Original Message----- > > > > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] > > > > Sent: Friday, July 07, 2006 11:20 PM > > > > To: Ravinandan Arakali > > > > Cc: Leonid. Grossman (E-mail); sivakumar.subramani at neterion.com; > > > > openib-general at openib.org > > > > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP > > > > interface) > > > > > > > > > > > > Ravinandan Arakali wrote: > > > >> Pradipta, > > > >> Following is the output from gdb after core dump. I have also > > copy-pasted > > > >> the gdb output on client system. > > > >> > > > >> Attached is the dmesg output when krping test is run in verbose mode. > > > >> The ping data on the sender(client) seems okay. The content is shifted > > > >> forward by one character for each packet. On receiver, after receiving > > > > ping > > > >> pkt 9, it seems to jump to pkt no. 1935. Not sure if it's because > > > messages > > > >> can be lost during writing to /var/log/messages ? > > > > krping is indeed working!!...Using 'verbose' allows you to see the ping > > > > data. > > > > When not using 'verbose' you see only 'send/recv' messages. 
> > > >> ----------------------------------------- > > > >> (gdb) run -s -vV -C100 -d -a 0.0.0.0 -p 9999 > > > >> Starting program: > > > >> /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -s -vV -C100 -d -a > > > >> 0.0.0.0 -p 9999 > > > >> [Thread debugging using libthread_db enabled] > > > >> [New Thread -1210054992 (LWP 3668)] > > > >> ipaddr (0.0.0.0) > > > >> port 9999 > > > >> created cm_id 0x804e6e0 > > > >> [New Thread -1210057824 (LWP 3671)] > > > >> rdma_bind_addr successful > > > >> rdma_listen > > > >> cma_event type 4 cma_id 0x804e968 (child) > > > >> child cma 0x804e968 > > > >> > > > >> Program received signal SIGSEGV, Segmentation fault. > > > >> [Switching to Thread -1210054992 (LWP 3668)] > > > >> rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 > > > >> 514 cb->pd = ibv_alloc_pd(cm_id->verbs); > > > >> (gdb) bt > > > >> #0 rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 > > > >> #1 0x0804a716 in main (argc=9, argv=Cannot access memory at address > > 0x6 > > > >> ) at examples/rping.c:767 > > > >> (gdb) > > > >> > > > >> --------------------------------- > > > >> (gdb) run -c -vV -C100 -d -a 17.2.2.102 -p 9999 > > > >> Starting program: > > > >> > > tmp/ib/src/userspace/librdmacm/examples/.libs/rping -c -vV -C100 -d -a > > > >> 17.2.2.102 -p 9999 > > > >> [Thread debugging using libthread_db enabled] > > > >> [New Thread 47388824908032 (LWP 4620)] > > > >> ipaddr (17.2.2.102) > > > >> port 9999 > > > >> created cm_id 0x506b00 > > > >> [New Thread 1082132800 (LWP 4623)] > > > >> cma_event type 0 cma_id 0x506b00 (parent) > > > >> cma_event type 2 cma_id 0x506b00 (parent) > > > >> rdma_resolve_addr - rdma_resolve_route successful > > > >> created pd 0x506e60 > > > >> created channel 0x506e80 > > > >> created cq 0x506ea0 > > > >> created qp 0x506f40 > > > >> rping_setup_buffers called on cb 0x505010 > > > >> allocated & registered buffers... > > > >> [New Thread 1090525504 (LWP 4624)] > > > >> cq_thread started. 
> > > >> > > > >> > > > > > > > > > > > > _______________________________________________ > > > > openib-general mailing list > > > > openib-general at openib.org > > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > > To unsubscribe, please visit > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > > > > > > > > > > > _______________________________________________ > > > openib-general mailing list > > > openib-general at openib.org > > > http://openib.org/mailman/listinfo/openib-general > > > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > > > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From bpradip at in.ibm.com Thu Jul 13 12:30:38 2006 From: bpradip at in.ibm.com (Pradipta Kumar Banerjee) Date: Fri, 14 Jul 2006 01:00:38 +0530 Subject: [openib-general] ping problem with ammasso cards(iWARPinterface) In-Reply-To: <1152815220.25964.35.camel@stevo-desktop> References: <001001c6a6a9$0c239460$4710100a@pc.s2io.com> <1152815220.25964.35.camel@stevo-desktop> Message-ID: <44B69F5E.9080107@in.ibm.com> Steve Wise wrote: > I guess this isn't surprising since rping doesn't work for you either. > Something fundamental is screwed up on your user side methinks... True.. Ravinandan why don't you do one thing. If you are running on an x86 platform, send me (private mail) the rping binary, and all the libraries (libibverbs, libamso, libibcm, librdmacm). Thanks, Pradipta Kumar. > > CM event 8 == RDMA_CM_EVENT_REJECTED which means either the server side > wasn't listening on the appropriate TCP port, or the server process did > an rdma_reject(). I'm guessing its the former... > > You could use tcpdmp and to see if the connection request is getting RST > by the remote side. 
> > > > > On Thu, 2006-07-13 at 11:20 -0700, Ravinandan Arakali wrote: >> With the --cma option, I don't see the error about running SM. >> But there's no connection established. >> >> openfab2:/tmp/ib/src/userspace/perftest # ./rdma_lat --cma >> pp_server_connect_cma starting server >> >> openfab:/tmp/ib/src/userspace/perftest # ./rdma_lat --cma 17.2.2.102 >> pp_client_connect_cma starting client >> pp_client_connect_cma/856 unexpected CM event 8 >> pp_client_connect_cma NOT connected! >> pp_connect_cma(17.2.2.102,18515) failed! >> >> There are no messages in dmesg either. >> >> Ravi >> >> -----Original Message----- >> From: Steve Wise [mailto:swise at opengridcomputing.com] >> Sent: Thursday, July 13, 2006 6:55 AM >> To: Ravinandan Arakali >> Cc: bpradip at in.ibm.com; openib-general at openib.org >> Subject: Re: [openib-general] ping problem with ammasso >> cards(iWARPinterface) From ravinandan.arakali at neterion.com Thu Jul 13 12:35:30 2006 From: ravinandan.arakali at neterion.com (Ravinandan Arakali) Date: Thu, 13 Jul 2006 12:35:30 -0700 Subject: [openib-general] ping problem with ammassocards(iWARPinterface) In-Reply-To: <1152817805.25964.40.camel@stevo-desktop> Message-ID: <001901c6a6b3$80750fb0$4710100a@pc.s2io.com> After some period of time(say about 20-30 seconds). Right now, I am trying to rebuild the libraries with latest code from iwarp branch. Ravi -----Original Message----- From: Steve Wise [mailto:swise at opengridcomputing.com] Sent: Thursday, July 13, 2006 12:10 PM To: ravinandan.arakali at neterion.com Cc: bpradip at in.ibm.com; openib-general at openib.org Subject: Re: [openib-general] ping problem with ammassocards(iWARPinterface) By the way, does this failure happen immediately or after some period of time? On Thu, 2006-07-13 at 13:27 -0500, Steve Wise wrote: > I guess this isn't surprising since rping doesn't work for you either. > Something fundamental is screwed up on your user side methinks... 
> > CM event 8 == RDMA_CM_EVENT_REJECTED which means either the server side > wasn't listening on the appropriate TCP port, or the server process did > an rdma_reject(). I'm guessing its the former... > > You could use tcpdmp and to see if the connection request is getting RST > by the remote side. > > > > > On Thu, 2006-07-13 at 11:20 -0700, Ravinandan Arakali wrote: > > With the --cma option, I don't see the error about running SM. > > But there's no connection established. > > > > openfab2:/tmp/ib/src/userspace/perftest # ./rdma_lat --cma > > pp_server_connect_cma starting server > > > > openfab:/tmp/ib/src/userspace/perftest # ./rdma_lat --cma 17.2.2.102 > > pp_client_connect_cma starting client > > pp_client_connect_cma/856 unexpected CM event 8 > > pp_client_connect_cma NOT connected! > > pp_connect_cma(17.2.2.102,18515) failed! > > > > There are no messages in dmesg either. > > > > Ravi > > > > -----Original Message----- > > From: Steve Wise [mailto:swise at opengridcomputing.com] > > Sent: Thursday, July 13, 2006 6:55 AM > > To: Ravinandan Arakali > > Cc: bpradip at in.ibm.com; openib-general at openib.org > > Subject: Re: [openib-general] ping problem with ammasso > > cards(iWARPinterface) > > > > > > Are you trying to run this over iwarp? It doesn't need an SM... > > > > For the perftests rdma_lat and rdma_bw in the iwarp branch, use the > > --cma flag. > > > > Steve. > > > > > > On Wed, 2006-07-12 at 16:39 -0700, Ravinandan Arakali wrote: > > > Also, I am trying to run some of the iwarp bandwidth/latency tests > > > (available under directory perftest). > > > The first thing to do here is to run opensm. When I run opensm (with debug > > > level 10), I get the following error. Any idea what needs to be done to > > get > > > this working ? 
> > > > > > openfab2:/tmp/ib/src/userspace # opensm -d 10 > > > ------------------------------------------------- > > > OpenSM Rev:openib-1.2.0 > > > Command Line Arguments: > > > d level = 0xa > > > Log File: /var/log/osm.log > > > ------------------------------------------------- > > > OpenSM Rev:openib-1.2.0 > > > > > > Using default GUID 0x0 > > > Error: Could not get port guid > > > Exiting SM > > > > > > openfab2:/tmp/ib/src/userspace # cat /var/log/osm.log > > > Jul 12 08:35:04 718914 [B7E518C0] -> OpenSM Rev:openib-1.2.0 > > > Jul 12 08:35:04 719111 [0000] -> OpenSM Rev:openib-1.2.0 > > > > > > Jul 12 08:35:04 721381 [B7E518C0] -> osm_sa_mad_ctrl_unbind: ERR 1A11: No > > > previous bind > > > Jul 12 08:35:04 721702 [0000] -> Exiting SM > > > > > > > > > > > > > > > > > > -----Original Message----- > > > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] > > > Sent: Wednesday, July 12, 2006 10:31 AM > > > To: Ravinandan Arakali > > > Cc: openib-general at openib.org > > > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP > > > interface) > > > > > > > > > Ravinandan, > > > Do you still see the rping crash? > > > > > > Thanks, > > > Pradipta Kumar. > > > > > > Ravinandan Arakali wrote: > > > > Pradipta, > > > > Okay, thanks.. Initially, I was not sure since I don't remember non-zero > > > > values in /proc/krping. When I re-ran the krping test, I see following > > > > output > > > > openfab2:~ # cat /proc/krping > > > > 1-amso0 891376 55711 891376 55711 1782720 27855 1782784 27856 > > > > > > > > As you mentioned, the RDMA traffic seems to be flowing indeed ! > > > > Any idea why rping is dumping core ? > > > > > > > > Has any testing been done using SDP with ammasso cards ? > > > > > > > > Regards, > > > > Ravi > > > > > > > > > > > > -----Original Message----- > > > > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] > > > > Sent: Friday, July 07, 2006 11:20 PM > > > > To: Ravinandan Arakali > > > > Cc: Leonid. 
Grossman (E-mail); sivakumar.subramani at neterion.com; > > > > openib-general at openib.org > > > > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP > > > > interface) > > > > > > > > > > > > Ravinandan Arakali wrote: > > > >> Pradipta, > > > >> Following is the output from gdb after core dump. I have also > > copy-pasted > > > >> the gdb output on client system. > > > >> > > > >> Attached is the dmesg output when krping test is run in verbose mode. > > > >> The ping data on the sender(client) seems okay. The content is shifted > > > >> forward by one character for each packet. On receiver, after receiving > > > > ping > > > >> pkt 9, it seems to jump to pkt no. 1935. Not sure if it's because > > > messages > > > >> can be lost during writing to /var/log/messages ? > > > > krping is indeed working!!...Using 'verbose' allows you to see the ping > > > > data. > > > > When not using 'verbose' you see only 'send/recv' messages. > > > >> ----------------------------------------- > > > >> (gdb) run -s -vV -C100 -d -a 0.0.0.0 -p 9999 > > > >> Starting program: > > > >> /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -s -vV -C100 -d -a > > > >> 0.0.0.0 -p 9999 > > > >> [Thread debugging using libthread_db enabled] > > > >> [New Thread -1210054992 (LWP 3668)] > > > >> ipaddr (0.0.0.0) > > > >> port 9999 > > > >> created cm_id 0x804e6e0 > > > >> [New Thread -1210057824 (LWP 3671)] > > > >> rdma_bind_addr successful > > > >> rdma_listen > > > >> cma_event type 4 cma_id 0x804e968 (child) > > > >> child cma 0x804e968 > > > >> > > > >> Program received signal SIGSEGV, Segmentation fault. 
> > > >> [Switching to Thread -1210054992 (LWP 3668)] > > > >> rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 > > > >> 514 cb->pd = ibv_alloc_pd(cm_id->verbs); > > > >> (gdb) bt > > > >> #0 rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 > > > >> #1 0x0804a716 in main (argc=9, argv=Cannot access memory at address > > 0x6 > > > >> ) at examples/rping.c:767 > > > >> (gdb) > > > >> > > > >> --------------------------------- > > > >> (gdb) run -c -vV -C100 -d -a 17.2.2.102 -p 9999 > > > >> Starting program: > > > >> > > tmp/ib/src/userspace/librdmacm/examples/.libs/rping -c -vV -C100 -d -a > > > >> 17.2.2.102 -p 9999 > > > >> [Thread debugging using libthread_db enabled] > > > >> [New Thread 47388824908032 (LWP 4620)] > > > >> ipaddr (17.2.2.102) > > > >> port 9999 > > > >> created cm_id 0x506b00 > > > >> [New Thread 1082132800 (LWP 4623)] > > > >> cma_event type 0 cma_id 0x506b00 (parent) > > > >> cma_event type 2 cma_id 0x506b00 (parent) > > > >> rdma_resolve_addr - rdma_resolve_route successful > > > >> created pd 0x506e60 > > > >> created channel 0x506e80 > > > >> created cq 0x506ea0 > > > >> created qp 0x506f40 > > > >> rping_setup_buffers called on cb 0x505010 > > > >> allocated & registered buffers... > > > >> [New Thread 1090525504 (LWP 4624)] > > > >> cq_thread started. 
> > > >> > > > >> > > > > > > > > > > > > _______________________________________________ > > > > openib-general mailing list > > > > openib-general at openib.org > > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > > To unsubscribe, please visit > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > > > > > > > > > > > _______________________________________________ > > > openib-general mailing list > > > openib-general at openib.org > > > http://openib.org/mailman/listinfo/openib-general > > > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > > > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From rdreier at cisco.com Thu Jul 13 13:08:47 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 13 Jul 2006 13:08:47 -0700 Subject: [openib-general] OFED 1.1 release - schedule and features In-Reply-To: <44B6191B.4060807@mellanox.co.il> (Tziporet Koren's message of "Thu, 13 Jul 2006 12:57:47 +0300") References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7421@mtlexch01.mtl.com> <44B6191B.4060807@mellanox.co.il> Message-ID: > BTW - we will need your help regarding SRP daemon changes that were > implemented by Ishai for OFED 1.0 but have not make it then. Hmm, I haven't seen any SRP daemon stuff that looked suitable yet. Is there going to be an updated version sent out soon? - R. 
From rdreier at cisco.com Thu Jul 13 13:10:44 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 13 Jul 2006 13:10:44 -0700
Subject: [openib-general] OFED 1.1 release - schedule and features
In-Reply-To: <44B623DA.20506@voltaire.com> (Or Gerlitz's message of "Thu, 13 Jul 2006 13:43:38 +0300")
References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7421@mtlexch01.mtl.com> <44B6191B.4060807@mellanox.co.il> <44B623DA.20506@voltaire.com>
Message-ID:

> So if the link which ib0 maps to is DOWN you move the ib0 IPv4 address
> to another device whose link is UP (eg ib1) and you somehow have ib1
> send a gratuitous ARP?

I think there may be a problem in the way IPoIB deals with gratuitous
ARPs. Because if a neighbour structure is updated by the networking
core, there's no way for IPoIB to know about that and update the
associated IB path.

Has anyone actually tried this failover approach?

 - R.

From rdreier at cisco.com Thu Jul 13 13:11:42 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 13 Jul 2006 13:11:42 -0700
Subject: [openib-general] IPoIB Path Rerequests and Compliance
In-Reply-To: <1152799621.13048.191884.camel@hal.voltaire.com> (Hal Rosenstock's message of "13 Jul 2006 10:07:04 -0400")
References: <1152799621.13048.191884.camel@hal.voltaire.com>
Message-ID:

> It appears to me that if for some reason IPoIB rerequests a path that
> has already been resolved, it has filled in values for fields in the
> request which are not to be honored by the SA as their component mask
> bits are not on. While the SA responds properly to this since it ignores
> anything which does not have the comp mask bit set, it is technically a
> non compliance. Is this worth fixing ? If so, do you want a patch for
> this ? (Also, I haven't yet checked any other ULPs for a similar issue).

I'm not sure this is worth fixing but I guess the right place to fix
it would be in the sa query module (so that you don't have to look at
other ULPs).

 - R.
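
[Editor's sketch] One way to picture the fix discussed above, assuming the intent is simply to clear every request field whose component-mask bit is off before the query goes out. The struct, field names and mask bits below (toy_path_rec, TOY_COMP_*) are made up for illustration; they are not the real SA path record layout, the real component-mask values, or the kernel's sa query API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical component-mask bits; NOT the real SA path record values. */
#define TOY_COMP_DLID	(1u << 0)
#define TOY_COMP_SLID	(1u << 1)
#define TOY_COMP_PKEY	(1u << 2)

/* Hypothetical, heavily reduced stand-in for a path record request. */
struct toy_path_rec {
	uint16_t dlid;
	uint16_t slid;
	uint16_t pkey;
};

/*
 * Zero every field that the component mask does not select, so that a
 * re-request built from an already resolved record carries no stale
 * values in fields the SA is told to ignore.
 */
void toy_scrub_unmasked(struct toy_path_rec *rec, uint32_t comp_mask)
{
	assert(rec != NULL);

	if (!(comp_mask & TOY_COMP_DLID))
		rec->dlid = 0;
	if (!(comp_mask & TOY_COMP_SLID))
		rec->slid = 0;
	if (!(comp_mask & TOY_COMP_PKEY))
		rec->pkey = 0;
}
```

Doing this once in a common query path, rather than in each ULP, matches the suggestion that the sa query module is the right place for such a fix.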
From mst at mellanox.co.il Thu Jul 13 13:30:49 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 13 Jul 2006 23:30:49 +0300
Subject: [openib-general] OFED 1.1 release - schedule and features
In-Reply-To:
References:
Message-ID: <20060713203049.GA24900@mellanox.co.il>

Quoting r. Roland Dreier :
> Subject: Re: OFED 1.1 release - schedule and features
>
> > So if the link which ib0 maps to is DOWN you move the ib0 IPv4 address
> > to another device whose link is UP (eg ib1) and you somehow have ib1
> > send a gratuitous ARP?
>
> I think there may be a problem in the way IPoIB deals with gratuitous
> ARPs. Because if a neighbour structure is updated by the networking
> core, there's no way for IPoIB to know about that and update the
> associated IB path.
> Has anyone actually tried this failover approach?

I think we discussed this once - if we see the issue, a single memcmp
will probably solve this.

--
MST

From akpm at osdl.org Thu Jul 13 13:54:46 2006
From: akpm at osdl.org (Andrew Morton)
Date: Thu, 13 Jul 2006 13:54:46 -0700
Subject: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant
In-Reply-To:
References: <44B405C8.4040706@oracle.com> <44B433CE.1030103@oracle.com> <20060712093820.GA9218@elte.hu> <20060712183049.bcb6c404.akpm@osdl.org>
Message-ID: <20060713135446.5e2c6dd5.akpm@osdl.org>

On Thu, 13 Jul 2006 08:42:47 -0700 Roland Dreier wrote:

> > Sigh. It was always a mistake (of the kernel programming 101 type) to put
> > any locking at all in the idr code. At some stage we need to weed it all
> > out and move it to callers.
> >
> > Your fix is yet more fallout from that mistake.
>
> Agreed. Consider me on the hook to fix this up in a better way once
> my life is a little saner. Maybe I'll try to cook something up on the
> plane ride to Ottawa.
>

I suspect it'll get really ugly. It's a container library which needs to
allocate memory when items are added, like the radix-tree.
Either it needs to assume GFP_ATOMIC, which is bad and can easily fail
or it does weird things like radix_tree_preload().

The basic problem is:

	idr_pre_get(GFP_KERNEL);
	spin_lock(my_lock);
	idr_get_new(..);

which is racy, because some other CPU could have got in there and consumed
some of the pool which was precharged by idr_pre_get().

It's wildly improbable that it'll actually fail. It requires all of:

a) that the race occur

b) that the racing thread consume an appreciable amount of the pool

c) that this thread also consume an appreciable amount (such that the
   total of both exceeds the pool size).

d) that a (needs to be added) GFP_ATOMIC attempt to replenish the pool
   inside idr_get_new() fails.

From rdreier at cisco.com Thu Jul 13 14:03:21 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 13 Jul 2006 14:03:21 -0700
Subject: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant
In-Reply-To: <20060713135446.5e2c6dd5.akpm@osdl.org> (Andrew Morton's message of "Thu, 13 Jul 2006 13:54:46 -0700")
References: <44B405C8.4040706@oracle.com> <44B433CE.1030103@oracle.com> <20060712093820.GA9218@elte.hu> <20060712183049.bcb6c404.akpm@osdl.org> <20060713135446.5e2c6dd5.akpm@osdl.org>
Message-ID:

> I suspect it'll get really ugly. It's a container library which needs to
> allocate memory when items are added, like the radix-tree. Either it needs
> to assume GFP_ATOMIC, which is bad and can easily fail or it does weird
> things like radix_tree_preload().

Actually I don't think it has to be too bad.
We could tweak the
interface a little bit so that consumers do something like:

	struct idr_layer *layer = NULL;	/* opaque */

	retry:
		spin_lock(&my_idr_lock);
		ret = idr_get_new(&my_idr, ptr, &id, layer);
		spin_unlock(&my_idr_lock);

		if (ret == -EAGAIN) {
			layer = idr_alloc_layer(&my_idr, GFP_KERNEL);
			if (!IS_ERR(layer))
				goto retry;
		}

in other words make the consumer responsible for passing in new memory
that can be used for a new entry (or freed if other entries have
become free in the meantime).

 - R.

From arjan at infradead.org Thu Jul 13 14:05:46 2006
From: arjan at infradead.org (Arjan van de Ven)
Date: Thu, 13 Jul 2006 23:05:46 +0200
Subject: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant
In-Reply-To:
References: <44B405C8.4040706@oracle.com> <44B433CE.1030103@oracle.com> <20060712093820.GA9218@elte.hu> <20060712183049.bcb6c404.akpm@osdl.org> <20060713135446.5e2c6dd5.akpm@osdl.org>
Message-ID: <1152824747.3024.92.camel@laptopd505.fenrus.org>

On Thu, 2006-07-13 at 14:03 -0700, Roland Dreier wrote:
> > I suspect it'll get really ugly. It's a container library which needs to
> > allocate memory when items are added, like the radix-tree. Either it needs
> > to assume GFP_ATOMIC, which is bad and can easily fail or it does weird
> > things like radix_tree_preload().
>
> Actually I don't think it has to be too bad. We could tweak the
> interface a little bit so that consumers do something like:
>
>

it does get harder if this is needed for your IB device to do more work,
so that your swap device on your IB can take more IO's to free up ram..
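
[Editor's sketch] The interface Roland proposes above (the caller supplies spare memory and retries on -EAGAIN) can be pictured in user space roughly as follows. Everything here is hypothetical: toy_idr, toy_layer, toy_get_new(), toy_alloc_layer() and toy_insert() are stand-ins invented for the sketch, a pthread mutex stands in for the spinlock, and calloc() stands in for a GFP_KERNEL allocation. It is not the kernel idr code and makes no claim about how idr would actually be changed.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

#define SLOTS_PER_LAYER	4
#define MAX_LAYERS	8

/* Hypothetical stand-in for struct idr_layer: a fixed block of slots. */
struct toy_layer {
	void *slot[SLOTS_PER_LAYER];
};

/* Hypothetical stand-in for struct idr; locking is left to the caller. */
struct toy_idr {
	struct toy_layer *layer[MAX_LAYERS];
	int nlayers;
};

static pthread_mutex_t toy_lock = PTHREAD_MUTEX_INITIALIZER;

/* The only allocating step; called without the lock held. */
struct toy_layer *toy_alloc_layer(void)
{
	return calloc(1, sizeof(struct toy_layer));
}

/*
 * Call with the lock held; never allocates.  Returns 0 and sets *id on
 * success, -EAGAIN if a spare layer is needed but none was passed in,
 * -ENOSPC if the table is full.  A spare that turns out to be unneeded
 * is freed here, so the caller never has to track it.
 */
int toy_get_new(struct toy_idr *idr, void *ptr, int *id,
		struct toy_layer *spare)
{
	int l, s;

	assert(ptr != NULL);	/* NULL marks a free slot */

	for (l = 0; l < idr->nlayers; l++) {
		for (s = 0; s < SLOTS_PER_LAYER; s++) {
			if (!idr->layer[l]->slot[s]) {
				idr->layer[l]->slot[s] = ptr;
				*id = l * SLOTS_PER_LAYER + s;
				free(spare);	/* room appeared meanwhile */
				return 0;
			}
		}
	}
	if (idr->nlayers == MAX_LAYERS) {
		free(spare);
		return -ENOSPC;
	}
	if (!spare)
		return -EAGAIN;

	/* Consume the caller's spare layer as the new top layer. */
	spare->slot[0] = ptr;
	idr->layer[idr->nlayers] = spare;
	*id = idr->nlayers * SLOTS_PER_LAYER;
	idr->nlayers++;
	return 0;
}

/* Consumer side: try under the lock, allocate outside it, retry. */
int toy_insert(struct toy_idr *idr, void *ptr, int *id)
{
	struct toy_layer *spare = NULL;
	int ret;

	for (;;) {
		pthread_mutex_lock(&toy_lock);
		ret = toy_get_new(idr, ptr, id, spare);
		pthread_mutex_unlock(&toy_lock);
		if (ret != -EAGAIN)
			return ret;	/* spare was consumed or freed */

		spare = toy_alloc_layer();
		if (!spare)
			return -ENOMEM;
	}
}
```

In this sketch another thread can only use up free slots, never the spare itself, so a retry that carries a spare cannot come back with -EAGAIN again. That is the property the pre-charged shared pool in the racy sequence lacks.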
From akpm at osdl.org Thu Jul 13 14:43:41 2006
From: akpm at osdl.org (Andrew Morton)
Date: Thu, 13 Jul 2006 14:43:41 -0700
Subject: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant
In-Reply-To:
References: <44B405C8.4040706@oracle.com> <44B433CE.1030103@oracle.com> <20060712093820.GA9218@elte.hu> <20060712183049.bcb6c404.akpm@osdl.org> <20060713135446.5e2c6dd5.akpm@osdl.org>
Message-ID: <20060713144341.97d4f771.akpm@osdl.org>

On Thu, 13 Jul 2006 14:03:21 -0700 Roland Dreier wrote:

> > I suspect it'll get really ugly. It's a container library which needs to
> > allocate memory when items are added, like the radix-tree. Either it needs
> > to assume GFP_ATOMIC, which is bad and can easily fail or it does weird
> > things like radix_tree_preload().
>
> Actually I don't think it has to be too bad. We could tweak the
> interface a little bit so that consumers do something like:
>
>	struct idr_layer *layer = NULL;	/* opaque */
>
>	retry:
>		spin_lock(&my_idr_lock);
>		ret = idr_get_new(&my_idr, ptr, &id, layer);
>		spin_unlock(&my_idr_lock);
>
>		if (ret == -EAGAIN) {
>			layer = idr_alloc_layer(&my_idr, GFP_KERNEL);
>			if (!IS_ERR(layer))
>				goto retry;
>		}
>
> in other words make the consumer responsible for passing in new memory
> that can be used for a new entry (or freed if other entries have
> become free in the meantime).
>

Good point, a try-again loop would work. Do we really need the caller to
maintain a cache? I suspect something like

drat:
	if (idr_pre_get(GFP_KERNEL) == ENOMEM)
		give_up();
	spin_lock();
	ret = idr_get_new();
	spin_unlock();
	if (ret == ENOMEM)
		goto drat;

would do it.

From swise at opengridcomputing.com Thu Jul 13 14:58:35 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 13 Jul 2006 16:58:35 -0500
Subject: [openib-general] wiki update - howto for Chelsio's T3 RNIC
Message-ID: <1152827915.25964.61.camel@stevo-desktop>

All,

I added a quick "how to" on the wiki for the Chelsio T3 rnic...

Steve.
From sean.hefty at intel.com Thu Jul 13 15:29:36 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 13 Jul 2006 15:29:36 -0700 Subject: [openib-general] [PATCH] librdmacm: remove dependency on sysfs Message-ID: <000001c6a6cb$d266eb50$ff0da8c0@amr.corp.intel.com> Remove libsysfs usage from librdmacm. Signed-off-by: Sean Hefty ---Index: configure.in =================================================================== --- configure.in (revision 8215) +++ configure.in (working copy) @@ -25,8 +25,6 @@ AC_CHECK_SIZEOF(long) dnl Checks for libraries if test "$disable_libcheck" != "yes" then -AC_CHECK_LIB(sysfs, sysfs_open_class, [], - AC_MSG_ERROR([sysfs_open_class() not found. librdmacm requires libsysfs.])) AC_CHECK_LIB(ibverbs, ibv_get_device_list, [], AC_MSG_ERROR([ibv_get_device_list() not found. librdmacm requires libibverbs.])) fi @@ -34,8 +32,6 @@ fi dnl Checks for header files. if test "$disable_libcheck" != "yes" then -AC_CHECK_HEADER(sysfs/libsysfs.h, [], - AC_MSG_ERROR([ not found. librdmacm requires libsysfs.])) AC_CHECK_HEADER(infiniband/verbs.h, [], AC_MSG_ERROR([ not found. 
Is libibverbs installed?])) fi Index: src/cma.c =================================================================== --- src/cma.c (revision 8215) +++ src/cma.c (working copy) @@ -49,8 +49,7 @@ #include #include -#include - +#include #include #include #include @@ -118,10 +117,9 @@ struct cma_id_private { uint32_t handle; }; -static struct ibv_device **dev_list; -static struct dlist *cma_dev_list; +static struct cma_device *cma_dev_array; +static int cma_dev_cnt; static pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER; -static int ucma_initialized; static int abi_ver = RDMA_USER_CM_MAX_ABI_VERSION; #define container_of(ptr, type, field) \ @@ -129,92 +127,67 @@ static int abi_ver = RDMA_USER_CM_MAX_AB static void ucma_cleanup(void) { - struct cma_device *cma_dev; - - if (cma_dev_list) { - dlist_for_each_data(cma_dev_list, cma_dev, struct cma_device) - ibv_close_device(cma_dev->verbs); + if (cma_dev_cnt) { + while (cma_dev_cnt) + ibv_close_device(cma_dev_array[--cma_dev_cnt].verbs); - dlist_destroy(cma_dev_list); - cma_dev_list = NULL; + free(cma_dev_array); + cma_dev_cnt = 0; } } static int check_abi_version(void) { - char path[256]; - struct sysfs_attribute *attr; - int ret = -1; - - if (sysfs_get_mnt_path(path, sizeof path)) { - fprintf(stderr, "librdmacm: couldn't find sysfs mount.\n"); - return -ENODEV; - } - - strncat(path, "/class/misc/rdma_cm/abi_version", sizeof path); + char value[8]; - attr = sysfs_open_attribute(path); - if (!attr) { - fprintf(stderr, "librdmacm: couldn't open rdma_cm ABI version.\n"); - return -ENOSYS; + if (ibv_read_sysfs_file(ibv_get_sysfs_path(), + "class/misc/rdma_cm/abi_version", + value, sizeof value) < 0) { + fprintf(stderr, "librdmacm: couldn't read ABI version.\n"); + return -1; } - if (sysfs_read_attribute(attr)) { - fprintf(stderr, "librdmacm: couldn't read rdma_cm ABI version.\n"); - goto out; - } - - abi_ver = strtol(attr->value, NULL, 10); - + abi_ver = strtol(value, NULL, 10); if (abi_ver < RDMA_USER_CM_MIN_ABI_VERSION || 
abi_ver > RDMA_USER_CM_MAX_ABI_VERSION) { fprintf(stderr, "librdmacm: kernel ABI version %d " "doesn't match library version %d.\n", abi_ver, RDMA_USER_CM_MAX_ABI_VERSION); - goto out; + return -1; } - - ret = 0; - -out: - sysfs_close_attribute(attr); - return ret; + return 0; } static int ucma_init(void) { - int i; + struct ibv_device **dev_list = NULL; struct cma_device *cma_dev; struct ibv_device_attr attr; - int ret; + int i, ret; pthread_mutex_lock(&mut); - if (ucma_initialized) + if (cma_dev_cnt) goto out; ret = check_abi_version(); if (ret) goto err; - cma_dev_list = dlist_new(sizeof *cma_dev); - if (!cma_dev_list) { - ret = -ENOMEM; - goto err; - } - - dev_list = ibv_get_device_list(NULL); + dev_list = ibv_get_device_list(&cma_dev_cnt); if (!dev_list) { printf("CMA: unable to get RDMA device list\n"); ret = -ENODEV; goto err; } + cma_dev_array = malloc(sizeof *cma_dev * cma_dev_cnt); + if (!cma_dev_array) { + ret = -ENOMEM; + goto err; + } + for (i = 0; dev_list[i]; ++i) { - cma_dev = malloc(sizeof *cma_dev); - if (!cma_dev) { - ret = -ENOMEM; - goto err; - } + cma_dev = &cma_dev_array[i]; cma_dev->guid = ibv_get_device_guid(dev_list[i]); cma_dev->verbs = ibv_open_device(dev_list[i]); @@ -231,9 +204,7 @@ static int ucma_init(void) } cma_dev->port_cnt = attr.phys_port_cnt; - dlist_push(cma_dev_list, cma_dev); } - ucma_initialized = 1; out: pthread_mutex_unlock(&mut); return 0; @@ -254,7 +225,7 @@ struct rdma_event_channel *rdma_create_e { struct rdma_event_channel *channel; - if (!ucma_initialized && ucma_init()) + if (!cma_dev_cnt && ucma_init()) return NULL; channel = malloc(sizeof *channel); @@ -281,13 +252,16 @@ void rdma_destroy_event_channel(struct r static int ucma_get_device(struct cma_id_private *id_priv, uint64_t guid) { struct cma_device *cma_dev; + int i; - dlist_for_each_data(cma_dev_list, cma_dev, struct cma_device) + for (i = 0; i < cma_dev_cnt; i++) { + cma_dev = &cma_dev_array[i]; if (cma_dev->guid == guid) { id_priv->cma_dev = cma_dev; 
id_priv->id.verbs = cma_dev->verbs; return 0; } + } return -ENODEV; } @@ -370,7 +344,7 @@ int rdma_create_id(struct rdma_event_cha void *msg; int ret, size; - ret = ucma_initialized ? 0 : ucma_init(); + ret = cma_dev_cnt ? 0 : ucma_init(); if (ret) return ret; @@ -1104,7 +1078,7 @@ int rdma_get_cm_event(struct rdma_event_ void *msg; int ret, size; - ret = ucma_initialized ? 0 : ucma_init(); + ret = cma_dev_cnt ? 0 : ucma_init(); if (ret) return ret; From sean.hefty at intel.com Thu Jul 13 16:31:56 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 13 Jul 2006 16:31:56 -0700 Subject: [openib-general] [PATCH] [RFC] librdmacm: expose device list to users Message-ID: <000101c6a6d4$87fe8600$ff0da8c0@amr.corp.intel.com> The following patch adds calls to the userspace RDMA CM to return its list of RDMA devices. The calls are similar to ibv_get_device_list() / ibv_free_device_list(). Currently, RDMA device contexts are handed to the user only after they create an rdma_cm_id and bind it to a local device. By exposing the device list to the user, it makes it easier for the user to allocate device specific resources (such as PDs, CQs, etc.) that are shared among multiple rdma_cm_id's. Signed-off-by: Sean Hefty --- Index: include/rdma/rdma_cma.h =================================================================== --- include/rdma/rdma_cma.h (revision 8215) +++ include/rdma/rdma_cma.h (working copy) @@ -332,4 +332,20 @@ static inline uint16_t rdma_get_dst_port ((struct sockaddr_in *) &id->route.addr.dst_addr)->sin_port; } +/** + * rdma_get_devices - Get list of RDMA devices currently available. + * @num_devices: If non-NULL, set to the number of devices returned. + * + * Return a NULL-terminated array of opened RDMA devices. Callers can use this + * routine to allocate resources on specific RDMA devices that will be shared + * across multiple rdma_cm_id's. + * The array must be released by calling rdma_free_devices(). 
+ */ +struct ibv_context **rdma_get_devices(int *num_devices); + +/** + * rdma_free_devices - Frees the list of devices returned by rdma_get_devices(). + */ +void rdma_free_devices(struct ibv_context **list); + #endif /* RDMA_CMA_H */ Index: src/cma.c =================================================================== --- src/cma.c (revision 8517) +++ src/cma.c (working copy) @@ -216,6 +216,32 @@ err: return ret; } +struct ibv_context **rdma_get_devices(int *num_devices) +{ + struct ibv_context **devs = NULL; + int i; + + if (!cma_dev_cnt && ucma_init()) + goto out; + + devs = malloc(sizeof *devs * (cma_dev_cnt + 1)); + if (!devs) + goto out; + + for (i = 0; i < cma_dev_cnt; i++) + devs[i] = cma_dev_array[i].verbs; + devs[i] = NULL; +out: + if (num_devices) + *num_devices = devs ? cma_dev_cnt : 0; + return devs; +} + +void rdma_free_devices(struct ibv_context **list) +{ + free(list); +} + static void __attribute__((destructor)) rdma_cma_fini(void) { ucma_cleanup(); Index: src/librdmacm.map =================================================================== --- src/librdmacm.map (revision 8215) +++ src/librdmacm.map (working copy) @@ -21,5 +21,7 @@ RDMACM_1.0 { rdma_get_dst_attr; rdma_join_multicast; rdma_leave_multicast; + rdma_get_devices; + rdma_free_devices; local: *; }; From louis.laborde at hp.com Thu Jul 13 17:14:54 2006 From: louis.laborde at hp.com (Louis Laborde) Date: Thu, 13 Jul 2006 17:14:54 -0700 Subject: [openib-general] wiki update - howto for Chelsio's T3 RNIC In-Reply-To: <1152827915.25964.61.camel@stevo-desktop> References: <1152827915.25964.61.camel@stevo-desktop> Message-ID: <44B6E1FE.8090303@hp.com> Steve Wise wrote: > All, > > I added a quick "how to" on the wiki for the Chelsio T3 rnic... > > Steve. 
> > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > Steve, I am trying to bring up Chelsio T3 rnic on IA64 platform following your "how to". Everything went fine (svn checkout, patching, building, rebooting) until I tried to load iw_cxgb3. Any idea? Thanks, Louis +---------------------------------------------------------+ | Louis LABORDE e-mail: louis.laborde at hp.com | | HP Cupertino SISL phone: (408) 447-3649 | +---------------------------------------------------------+ [root at iblp0044 ~]# modprobe iw_cxgb3 Segmentation fault [root at iblp0044 ~]# dmesg ... ... eth2: Chelsio T320 2x10000BaseX RNIC (rev 0) PCI-X 133MHz/64-bit MSI-X eth2: 128MB CM, 256MB PMTX, 256MB PMRX eth3: Chelsio T320 2x10000BaseX RNIC (rev 0) PCI-X 133MHz/64-bit MSI-X Unable to handle kernel paging request at virtual address 0000000000248000 modprobe[3348]: Oops 8804682956800 [1] Modules linked in: iw_cxgb3 cxgb3c ib_umad ib_ucm ib_uverbs ib_sa ib_cm ib_mad ib_core cxgb3 Pid: 3348, CPU 0, comm: modprobe psr : 00001010081a6018 ifs : 8000000000000183 ip : [] Not tainted ip is at memset+0x240/0x420 unat: 0000000000000000 pfs : 0000000000000593 rsc : 0000000000000003 rnat: 0000002080000000 bsps: ffffffff80000000 pr : 0000000005550519 ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f csd : 0000000000000000 ssd : 0000000000000000 b0 : a00000020022f540 b6 : a0000002000d44e0 b7 : a0000002000b9f00 f6 : 1003e0000000000000000 f7 : 1003e6db6db6db6db6db7 f8 : 1003e00000000071b5ed6 f9 : 1003e0000000000000000 f10 : 1003e0000000000020000 f11 : 1003e0000000000000000 r1 : a000000100c80920 r2 : e0000040f98034f8 r3 : a000000100a99f98 r8 : 0000000000248000 r9 : 6db6db6db6db6db7 r10 : 00000000071b5ed6 r11 : 0000000038daf6b0 r12 : e000004043897e30 r13 : e000004043890000 r14 : e0000040f98034f8 
r15 : 00000040fa3e8000 r16 : a000000200248000 r17 : a0007fffc7200000 r18 : a000000100a9c220 r19 : a000000100a9c220 r20 : 00000040fffb8000 r21 : 0000000000000010 r22 : 0000000000000800 r23 : 0000000000000007 r24 : 0000000000248000 r25 : 00000040fc020000 r26 : a000000202248000 r27 : 0000000000248010 r28 : 0000000000288000 r29 : e0000040fc020000 r30 : 0000000000000000 r31 : 00000000000007ff Call Trace: [] show_stack+0x50/0xa0 sp=e0000040438979c0 bsp=e0000040438912a8 [] show_regs+0x820/0x840 sp=e000004043897b90 bsp=e000004043891260 [] die+0x1d0/0x2e0 sp=e000004043897b90 bsp=e000004043891218 [] ia64_do_page_fault+0x8e0/0xa00 sp=e000004043897bb0 bsp=e0000040438911b8 [] ia64_leave_kernel+0x0/0x280 sp=e000004043897c60 bsp=e0000040438911b8 [] memset+0x240/0x420 sp=e000004043897e30 bsp=e0000040438911a0 [] open_rnic_toe+0x140/0x620 [iw_cxgb3] sp=e000004043897e30 bsp=e000004043891148 [] t3c_register_client+0x140/0x1e0 [cxgb3c] sp=e000004043897e30 bsp=e000004043891118 [] iwch_init_module+0xc0/0x100 [iw_cxgb3] sp=e000004043897e30 bsp=e000004043891100 [] sys_init_module+0x250/0x520 sp=e000004043897e30 bsp=e000004043891088 [] ia64_ret_from_syscall+0x0/0x20 sp=e000004043897e30 bsp=e000004043891088 [] ia64_ivt+0xffffffff00010640/0x400 sp=e000004043898000 bsp=e000004043891088 BUG: modprobe/3348, lock held at task exit time! [a000000200129440] {t3cdev_db_lock} .. held by: modprobe: 3348 [e000004043890000, 116] ... 
acquired at: t3c_register_client+0x30/0x1e0 [cxgb3c] From rdreier at cisco.com Thu Jul 13 17:18:14 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 13 Jul 2006 17:18:14 -0700 Subject: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant References: <44B405C8.4040706@oracle.com> <44B433CE.1030103@oracle.com> <20060712093820.GA9218@elte.hu> <20060712183049.bcb6c404.akpm@osdl.org> <20060713135446.5e2c6dd5.akpm@osdl.org> <1152824747.3024.92.camel@laptopd505.fenrus.org> Message-ID: Arjan> it does get harder if this is needed for your IB device to Arjan> do more work, so that your swap device on your IB can take Arjan> more IO's to free up ram.. That's the classic problem, but it's more a matter of the consumer using GFP_NOIO in the right places. - R. From louis.laborde at hp.com Thu Jul 13 18:00:46 2006 From: louis.laborde at hp.com (Louis Laborde) Date: Thu, 13 Jul 2006 18:00:46 -0700 Subject: [openib-general] wiki update - howto for Chelsio's T3 RNIC In-Reply-To: <1152827915.25964.61.camel@stevo-desktop> References: <1152827915.25964.61.camel@stevo-desktop> Message-ID: <44B6ECBE.6040208@hp.com> Steve Wise wrote: > All, > > I added a quick "how to" on the wiki for the Chelsio T3 rnic... > > Steve. > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > Steve, The two steps below fail: # (cd librdmacm && ./autogen.sh && ./configure && make && make install) # (cd libcxgb3 && ./autogen.sh && ./configure && make && make install) with: checking for ibv_get_device_list in -libverbs... no configure: error: ibv_get_device_list() not found. librdmacm requires libibverbs. checking for ibv_get_device_list in -libverbs... no configure: error: ibv_get_device_list() not found. libmthca requires libibverbs. 
Should the configure script do "-libibverbs" instead of "-libverbs" ?

Thanks,
Louis

+---------------------------------------------------------+
| Louis LABORDE e-mail: louis.laborde at hp.com |
| HP Cupertino SISL phone: (408) 447-3649 |
+---------------------------------------------------------+

From rdreier at cisco.com Thu Jul 13 18:08:17 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 13 Jul 2006 18:08:17 -0700
Subject: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant
References: <44B405C8.4040706@oracle.com> <44B433CE.1030103@oracle.com> <20060712093820.GA9218@elte.hu> <20060712183049.bcb6c404.akpm@osdl.org> <20060713135446.5e2c6dd5.akpm@osdl.org> <20060713144341.97d4f771.akpm@osdl.org>
Message-ID:

> Good point, a try-again loop would work. Do we really need the caller to
> maintain a cache? I suspect something like
>
> drat:
> 	if (idr_pre_get(GFP_KERNEL) == ENOMEM)
> 		give_up();
> 	spin_lock();
> 	ret = idr_get_new();
> 	spin_unlock();
> 	if (ret == ENOMEM)
> 		goto drat;
>
> would do it.

The problem (for my tiny brain at least) is that I don't know where
idr_pre_get() can put the memory it allocates if there's no lock in
the idr structure -- how do you maintain internal consistency if no
locks are held when filling the cache?

Having the caller hold a chunk of memory in a stack variable was the
trick I came up with to get around that.

 - R.

From akpm at osdl.org Thu Jul 13 18:18:35 2006
From: akpm at osdl.org (Andrew Morton)
Date: Thu, 13 Jul 2006 18:18:35 -0700
Subject: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant
In-Reply-To:
References: <44B405C8.4040706@oracle.com> <44B433CE.1030103@oracle.com> <20060712093820.GA9218@elte.hu> <20060712183049.bcb6c404.akpm@osdl.org> <20060713135446.5e2c6dd5.akpm@osdl.org> <20060713144341.97d4f771.akpm@osdl.org>
Message-ID: <20060713181835.ad5eeff6.akpm@osdl.org>

On Thu, 13 Jul 2006 18:08:17 -0700
Roland Dreier wrote:

> > Good point, a try-again loop would work.
> > Do we really need the caller to
> > maintain a cache? I suspect something like
> >
> > drat:
> > 	if (idr_pre_get(GFP_KERNEL) == ENOMEM)
> > 		give_up();
> > 	spin_lock();
> > 	ret = idr_get_new();
> > 	spin_unlock();
> > 	if (ret == ENOMEM)
> > 		goto drat;
> >
> > would do it.
>
> The problem (for my tiny brain at least) is that I don't know where
> idr_pre_get() can put the memory it allocates if there's no lock in
> the idr structure -- how do you maintain internal consistency if no
> locks are held when filling the cache?

argh. Aren't you supposed to be on vacation or something?

> Having the caller hold a chunk of memory in a stack variable was the
> trick I came up with to get around that.

Yes, that certainly works.

From pmullaney at novell.com Thu Jul 13 18:28:06 2006
From: pmullaney at novell.com (Patrick Mullaney)
Date: Thu, 13 Jul 2006 19:28:06 -0600
Subject: [openib-general] [PATCH] OpenSM: Support configurable SL per partition
Message-ID: <44B69EC6020000C700012E4A@sinclair.provo.novell.com>

Hi Sasha,

I agree. Your check is best.

Thanks.
Pat

Sorry if this is a duplicate I got a bounce earlier today.

>>> Sasha Khapyorsky 07/12/06 6:45 PM >>>
Hi Pat,

On 15:14 Tue 11 Jul , Patrick Mullaney wrote:
> This will avoid an invalid warning about service level value if sl=0 is
> used in the partition config file.

Yes, this is wrong warning, but original goal of this check was to catch
non-numeric string. Think something like this may be better:
Think something like this may be better: diff --git a/osm/opensm/osm_prtn_config.c b/osm/opensm/osm_prtn_config.c index bd15513..5647ca8 100644 --- a/osm/opensm/osm_prtn_config.c +++ b/osm/opensm/osm_prtn_config.c @@ -150,7 +150,9 @@ static int partition_add_flag(unsigned l " - skipped.\n"); } else if (!strncmp(flag, "sl", len)) { unsigned sl; - if (!val || (sl = strtoul(val, NULL, 0)) == 0 || sl > 15) + char *end; + if (!val || !*val || (sl = strtoul(val, &end, 0)) > 15 || + (*end && !isspace(*end))) PARSEWARN(conf->p_log, lineno, "flag \'sl\' requires valid value" " - skipped.\n"); The overflow case should be filtered by sl > 15 check. Sasha > Can you include something like it in > your original patch? > > Thanks. > Pat > > -------------------------------------------------------------------------------------------------- > > --- opensm.old/osm_prtn_config.c 2006-07-11 16:06:33.000000000 > -0400 > +++ opensm/osm_prtn_config.c 2006-07-11 16:08:00.000000000 -0400 > @@ -52,6 +52,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -159,7 +160,7 @@ static int partition_add_flag(unsigned l > } else if (!strncmp(flag, "sl", len)) { > unsigned sl; > > - if (!val || (sl = strtoul(val, NULL, 0)) == 0 || sl > > 15) > + if (!val || (sl = strtoul(val, NULL, 0)) == UINT_MAX || > sl > 15) > PARSEWARN(conf->p_log, lineno, > "flag \'sl\' requires valid value" > " - skipped.\n"); > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From akpm at osdl.org Thu Jul 13 18:30:47 2006 From: akpm at osdl.org (Andrew Morton) Date: Thu, 13 Jul 2006 18:30:47 -0700 Subject: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant In-Reply-To: <20060713181835.ad5eeff6.akpm@osdl.org> References: <44B405C8.4040706@oracle.com> <44B433CE.1030103@oracle.com> 
<20060712093820.GA9218@elte.hu> <20060712183049.bcb6c404.akpm@osdl.org> <20060713135446.5e2c6dd5.akpm@osdl.org> <20060713144341.97d4f771.akpm@osdl.org> <20060713181835.ad5eeff6.akpm@osdl.org> Message-ID: <20060713183047.642bd9e6.akpm@osdl.org> On Thu, 13 Jul 2006 18:18:35 -0700 Andrew Morton wrote: > > Having the caller hold a chunk of memory in a stack variable was the > > trick I came up with to get around that. > > Yes, that certainly works. Problem is, I think, you'll need to preallocate IDR_FREE_MAX items. And then free them all again when none of them were consumed (usual). Yes, storing the preallocated nodes in the idr itself requires locking. But that locking is 100% private to the IDR implementation. It locks only the preload list and not the user's stuff. radix_tree_preload() effectively does this. Except the preload list is kernel-wide. It's split across CPUs and uses local_irq_disable/preempt_disable locking tricks as a performance optimisation. But conceptually it's the same. Simply copying that would give something which is known to work... It seems like a large amount of fuss, but when you think about it the problem isn't simple. From somenath at veritas.com Thu Jul 13 19:25:35 2006 From: somenath at veritas.com (somenath) Date: Thu, 13 Jul 2006 19:25:35 -0700 Subject: [openib-general] is wc valid if ib_poll_cq() returns zero Message-ID: <44B7009F.7000204@veritas.com> two Qs refers to openib kernel stack found in Redhat distribution (RHEL4 update 3) (redhat kernel version: 2.4.18-14 #1): 1. if ib_poll_cq(cq, 1, &wc) returns zero, does wc contain a valid entry? [just for reference what happend in gen1 stack following code is from gen1 code where if similar routine returned zero send_entry was still valid... if ( target->state == TARGET_ACTIVE_CONNECTION ) { status = ib_cq_poll( target->active_conn->cqs_hndl, &send_entry ); if ( status == 0 ) { cq_send_handler( NULL, &send_entry, target ); } } ] 2. 
why is the io completion routine called when ib_poll_cq() returns zero?
is there some error happening here?

thanks, som.

From arjan at infradead.org Thu Jul 13 23:20:26 2006
From: arjan at infradead.org (Arjan van de Ven)
Date: Fri, 14 Jul 2006 08:20:26 +0200
Subject: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant
In-Reply-To:
References: <44B405C8.4040706@oracle.com> <44B433CE.1030103@oracle.com> <20060712093820.GA9218@elte.hu> <20060712183049.bcb6c404.akpm@osdl.org> <20060713135446.5e2c6dd5.akpm@osdl.org> <1152824747.3024.92.camel@laptopd505.fenrus.org>
Message-ID: <1152858026.3159.7.camel@laptopd505.fenrus.org>

On Thu, 2006-07-13 at 17:18 -0700, Roland Dreier wrote:
> Arjan> it does get harder if this is needed for your IB device to
> Arjan> do more work, so that your swap device on your IB can take
> Arjan> more IO's to free up ram..
>
> That's the classic problem, but it's more a matter of the consumer
> using GFP_NOIO in the right places.

GFP_NOIO isn't going to save you in the cases where the memory really is
running low and you need the memory to do more IO...
From halr at voltaire.com Fri Jul 14 04:10:28 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Jul 2006 07:10:28 -0400 Subject: [openib-general] [PATCHv2] OpenSM/osm_sa_path_record.c: Support SL in PathRecord requests based on whether or not QoS is enabled Message-ID: <1152875427.4642.1132.camel@hal.voltaire.com> OpenSM/osm_sa_path_record.c: Support SL in PathRecord requests based on whether or not QoS is enabled Signed-off-by: Hal Rosenstock Index: opensm/osm_sa_path_record.c =================================================================== --- opensm/osm_sa_path_record.c (revision 8507) +++ opensm/osm_sa_path_record.c (working copy) @@ -66,6 +66,7 @@ #include #include #include +#include #define OSM_PR_RCV_POOL_MIN_SIZE 64 #define OSM_PR_RCV_POOL_GROW_SIZE 64 @@ -164,6 +165,7 @@ __osm_pr_rcv_get_path_parms( const osm_physp_t* p_physp; const osm_physp_t* p_dest_physp; const osm_switch_t* p_sw; + const osm_prtn_t* p_prtn; const ib_port_info_t* p_pi; const cl_qmap_t* p_sw_tbl; ib_api_status_t status = IB_SUCCESS; @@ -174,6 +176,7 @@ __osm_pr_rcv_get_path_parms( uint8_t required_mtu; uint8_t required_rate; uint8_t required_pkt_life; + uint8_t sl; ib_net16_t dest_lid; OSM_LOG_ENTER( p_rcv->p_log, __osm_pr_rcv_get_path_parms ); @@ -548,7 +551,6 @@ __osm_pr_rcv_get_path_parms( p_parms->mtu = mtu; p_parms->rate = rate; p_parms->pkt_life = pkt_life; - p_parms->sl = OSM_DEFAULT_SL; if( comp_mask & IB_PR_COMPMASK_RAWTRAFFIC && cl_ntoh32( p_pr->hop_flow_raw ) & ( 1<<31 ) ) @@ -560,8 +562,9 @@ __osm_pr_rcv_get_path_parms( { osm_log( p_rcv->p_log, OSM_LOG_ERROR, "__osm_pr_rcv_get_path_parms: ERR 1F1A: " - "Ports do not share specified P_Key\n"); + "Ports do not share specified P_Key 0x%4x\n", cl_ntoh16(pkey)); status = IB_NOT_FOUND; + goto Exit; } } else @@ -573,9 +576,32 @@ __osm_pr_rcv_get_path_parms( "__osm_pr_rcv_get_path_parms: ERR 1F1B: " "Ports do not have any shared P_Keys\n"); status = IB_NOT_FOUND; + goto Exit; } } + + p_prtn = (osm_prtn_t 
*)cl_qmap_get(&p_rcv->p_subn->prtn_pkey_tbl, + cl_ntoh16(pkey & ~0x8000)); + if ( p_prtn == (osm_prtn_t *)cl_qmap_end(&p_rcv->p_subn->prtn_pkey_tbl) ) + { + /* this may be possible when pkey tables are created somehow in + previous runs or things are going wrong here */ + sl = OSM_DEFAULT_SL; + osm_log( p_rcv->p_log, OSM_LOG_VERBOSE, + "__osm_pr_rcv_get_path_parms: ERR 1F1C: " + "No partition found for P_Key 0x%04x - using default SL %d\n", cl_ntoh16(pkey), sl ); + } + else + sl = p_prtn->sl; + + if ( ( comp_mask & IB_PR_COMPMASK_SL ) && ib_path_rec_sl( p_pr ) != sl ) + { + status = IB_NOT_FOUND; + goto Exit; + } + p_parms->pkey = pkey; + p_parms->sl = sl; Exit: OSM_LOG_EXIT( p_rcv->p_log ); @@ -613,7 +639,7 @@ __osm_pr_rcv_build_pr( p_pr->slid = cl_hton16( src_lid_ho ); p_pr->pkey = p_parms->pkey; - p_pr->sl = p_parms->sl; + p_pr->sl = cl_hton16(p_parms->sl); p_pr->mtu = (uint8_t)(p_parms->mtu | 0x80); p_pr->rate = (uint8_t)(p_parms->rate | 0x80); @@ -987,18 +1013,6 @@ __osm_pr_rcv_get_end_points( into the endpoints. 
*/ - if( comp_mask & IB_PR_COMPMASK_SL ) - { - if( p_pr->sl != OSM_DEFAULT_SL ) - { - if ( p_sa_mad->method == IB_MAD_METHOD_GET ) - sa_status = IB_SA_MAD_STATUS_NO_RECORDS; - *pp_src_port = 0; - *pp_dest_port = 0; - goto Exit; - } - } - if( comp_mask & IB_PR_COMPMASK_SGID ) { *pp_src_port = (osm_port_t*)cl_qmap_get( @@ -1414,7 +1428,7 @@ __osm_pr_match_mgrp_attributes( if( comp_mask & IB_PR_COMPMASK_SL ) { - if( ( p_pr->sl & 0xf ) != sl ) + if( ib_path_rec_sl( p_pr ) != sl ) goto Exit; } @@ -1422,7 +1436,7 @@ __osm_pr_match_mgrp_attributes( if( ( comp_mask & IB_PR_COMPMASK_NUMBPATH ) && ( p_sa_mad->method != IB_MAD_METHOD_GET ) ) { - if( ( p_pr->num_path & 0x7f ) == 0 ) + if( ib_path_rec_num_path( p_pr ) == 0 ) goto Exit; } From halr at voltaire.com Fri Jul 14 04:11:00 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Jul 2006 07:11:00 -0400 Subject: [openib-general] [PATCHv2] OpenSM: Support configurable SL per partition Message-ID: <1152875460.4642.1197.camel@hal.voltaire.com> OpenSM: Support configurable SL per partition Signed-off-by: Sasha Khapyorsky Signed-off-by: Hal Rosenstock Index: include/opensm/osm_partition.h =================================================================== --- include/opensm/osm_partition.h (revision 8520) +++ include/opensm/osm_partition.h (working copy) @@ -102,6 +102,7 @@ { cl_map_item_t map_item; uint16_t pkey; + uint8_t sl; cl_map_t full_guid_tbl; cl_map_t part_guid_tbl; char name[32]; @@ -114,6 +115,9 @@ * pkey * The IBA defined P_KEY of this Partition. * +* sl +* The Service Level (SL) associated with this Partiton. +* * port_guid_tbl * Container of pointers to all Port objects in the Partition, * indexed by port GUID. 
Index: opensm/osm_prtn.c =================================================================== --- opensm/osm_prtn.c (revision 8520) +++ opensm/osm_prtn.c (working copy) @@ -79,6 +79,7 @@ memset(p, 0, sizeof(*p)); p->pkey = pkey; + p->sl = OSM_DEFAULT_SL; cl_map_construct(&p->full_guid_tbl); cl_map_init(&p->full_guid_tbl, 32); cl_map_construct(&p->part_guid_tbl); @@ -220,7 +221,7 @@ mc_rec.pkey = pkey; mc_rec.rate = rate ? rate : 0x3; /* 10Gb/sec */ mc_rec.pkt_life = OSM_DEFAULT_SUBNET_TIMEOUT; - mc_rec.sl_flow_hop = OSM_DEFAULT_SL << 28; + mc_rec.sl_flow_hop = ib_member_set_sl_flow_hop(p->sl, 0, 0); /* Note: scope needs to be consistent with MGID */ mc_rec.scope_state = 0x21; Index: opensm/osm_prtn_config.c =================================================================== --- opensm/osm_prtn_config.c (revision 8520) +++ opensm/osm_prtn_config.c (working copy) @@ -52,8 +52,10 @@ #include #include #include +#include #include +#include #include #include #include @@ -82,7 +84,7 @@ osm_log_t *p_log; osm_subn_t *p_subn; osm_prtn_t *p_prtn; - unsigned is_ipoib, mtu, rate; + unsigned is_ipoib, mtu, rate, sl; }; @@ -122,6 +124,16 @@ if (!conf->p_prtn) return -1; + if (conf->p_subn->opt.no_qos) { + if (conf->sl != OSM_DEFAULT_SL) { + osm_log(conf->p_log, OSM_LOG_ERROR, + "partition_create: Overriding SL %d to default SL %d on partition %s as QoS not enabled\n", + conf->sl, OSM_DEFAULT_SL, name); + conf->sl = OSM_DEFAULT_SL; + } + } + conf->p_prtn->sl = conf->sl; + if (conf->is_ipoib) osm_prtn_add_mcgroup(conf->p_log, conf->p_subn, conf->p_prtn, conf->is_ipoib, conf->rate, conf->mtu); @@ -145,6 +157,17 @@ PARSEWARN(conf->p_log, lineno, "flag \'rate\' requires valid value" " - skipped.\n"); + } else if (!strncmp(flag, "sl", len)) { + unsigned sl; + char *end; + + if (!val || !*val || (sl = strtoul(val, &end, 0)) > 15 || + (*end && !isspace(*end))) + PARSEWARN(conf->p_log, lineno, + "flag \'sl\' requires valid value" + " - skipped.\n"); + else + conf->sl = sl; } else { 
PARSEWARN(conf->p_log, lineno, "unrecognized partition flag \'%s\'" @@ -254,6 +277,8 @@ conf->p_log = p_log; conf->p_subn = p_subn; conf->p_prtn = NULL; + conf->is_ipoib = 0; + conf->sl = OSM_DEFAULT_SL; return conf; } Index: doc/partition-config.txt =================================================================== --- doc/partition-config.txt (revision 8520) +++ doc/partition-config.txt (working copy) @@ -46,6 +46,7 @@ result IPoIB capable MC group will be created. rate= - specifies rate for this IPoIB MC group (default is 3 (10GBps)) mtu= - specifies MTU for this IPoIB MC group (default is 4 (2048)) +sl= - specifies SL for this IPoIB MC group (default is 0) Note that values for 'rate' and 'mtu' should be specified as defined in IBTA specification (for example mtu=4 for 2048). From bpradip at in.ibm.com Fri Jul 14 05:53:31 2006 From: bpradip at in.ibm.com (Pradipta Kumar Banerjee) Date: Fri, 14 Jul 2006 18:23:31 +0530 Subject: [openib-general] [PATCH] [RFC] librdmacm: expose device list to users In-Reply-To: <000101c6a6d4$87fe8600$ff0da8c0@amr.corp.intel.com> References: <000101c6a6d4$87fe8600$ff0da8c0@amr.corp.intel.com> Message-ID: <44B793CB.2080508@in.ibm.com> Sean Hefty wrote: > The following patch adds calls to the userspace RDMA CM to return its list > of RDMA devices. The calls are similar to ibv_get_device_list() / > ibv_free_device_list(). Thanks Sean for adding this functionality. This was needed. Thanks, Pradipta Kumar. > > Currently, RDMA device contexts are handed to the user only after they > create an rdma_cm_id and bind it to a local device. By exposing the device > list to the user, it makes it easier for the user to allocate device > specific resources (such as PDs, CQs, etc.) that are shared among multiple > rdma_cm_id's. 
> > Signed-off-by: Sean Hefty > --- > Index: include/rdma/rdma_cma.h > =================================================================== > --- include/rdma/rdma_cma.h (revision 8215) > +++ include/rdma/rdma_cma.h (working copy) > @@ -332,4 +332,20 @@ static inline uint16_t rdma_get_dst_port > ((struct sockaddr_in *) &id->route.addr.dst_addr)->sin_port; > } > > +/** > + * rdma_get_devices - Get list of RDMA devices currently available. > + * @num_devices: If non-NULL, set to the number of devices returned. > + * > + * Return a NULL-terminated array of opened RDMA devices. Callers can use this > + * routine to allocate resources on specific RDMA devices that will be shared > + * across multiple rdma_cm_id's. > + * The array must be released by calling rdma_free_devices(). > + */ > +struct ibv_context **rdma_get_devices(int *num_devices); > + > +/** > + * rdma_free_devices - Frees the list of devices returned by rdma_get_devices(). > + */ > +void rdma_free_devices(struct ibv_context **list); > + > #endif /* RDMA_CMA_H */ > Index: src/cma.c > =================================================================== > --- src/cma.c (revision 8517) > +++ src/cma.c (working copy) > @@ -216,6 +216,32 @@ err: > return ret; > } > > +struct ibv_context **rdma_get_devices(int *num_devices) > +{ > + struct ibv_context **devs = NULL; > + int i; > + > + if (!cma_dev_cnt && ucma_init()) > + goto out; > + > + devs = malloc(sizeof *devs * (cma_dev_cnt + 1)); > + if (!devs) > + goto out; > + > + for (i = 0; i < cma_dev_cnt; i++) > + devs[i] = cma_dev_array[i].verbs; > + devs[i] = NULL; > +out: > + if (num_devices) > + *num_devices = devs ? 
cma_dev_cnt : 0; > + return devs; > +} > + > +void rdma_free_devices(struct ibv_context **list) > +{ > + free(list); > +} > + > static void __attribute__((destructor)) rdma_cma_fini(void) > { > ucma_cleanup(); > Index: src/librdmacm.map > =================================================================== > --- src/librdmacm.map (revision 8215) > +++ src/librdmacm.map (working copy) > @@ -21,5 +21,7 @@ RDMACM_1.0 { > rdma_get_dst_attr; > rdma_join_multicast; > rdma_leave_multicast; > + rdma_get_devices; > + rdma_free_devices; > local: *; > }; > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From swise at opengridcomputing.com Fri Jul 14 06:43:28 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 14 Jul 2006 08:43:28 -0500 Subject: [openib-general] wiki update - howto for Chelsio's T3 RNIC In-Reply-To: <44B6ECBE.6040208@hp.com> References: <1152827915.25964.61.camel@stevo-desktop> <44B6ECBE.6040208@hp.com> Message-ID: <1152884608.26840.21.camel@stevo-desktop> > > > > Steve, > > The two steps below fail: > > # (cd librdmacm && ./autogen.sh && ./configure && make && make install) > # (cd libcxgb3 && ./autogen.sh && ./configure && make && make install) > > > with: > checking for ibv_get_device_list in -libverbs... no > configure: error: ibv_get_device_list() not found. librdmacm requires libibverbs. > checking for ibv_get_device_list in -libverbs... no > configure: error: ibv_get_device_list() not found. libmthca requires libibverbs. > > Should the configure script do "-libibverbs" instead of "-libverbs" ? > no. The problem is your library search path doesn't include /usr/local/lib. Or, you didn't do an ldconfig after building libibverbs. Make sure /usr/local/lib is specified in /etc/ld.so.conf or in a file in /etc/ld.so.conf.d. Then run ldconfig. 
You can verify that it's working by running ldconfig -v and seeing libibverbs in the path... From swise at opengridcomputing.com Fri Jul 14 06:54:12 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 14 Jul 2006 08:54:12 -0500 Subject: [openib-general] wiki update - howto for Chelsio's T3 RNIC In-Reply-To: <44B6E1FE.8090303@hp.com> References: <1152827915.25964.61.camel@stevo-desktop> <44B6E1FE.8090303@hp.com> Message-ID: <1152885252.26840.30.camel@stevo-desktop> Erf. I see the bug. SCE (stewpid coding error :). The patch below will solve the crash, but the issue really is that we're failing to allocate our data structures. To work around this for now, I suggest you patch the cxgb3 src code to reduce the number of supported objects in core/cxio_hal.h. Namely T3_MAX_NUM_RI, T3_MAX_NUM_QP, etc... For the crash try this: ---- Index: iwch.c =================================================================== --- iwch.c (revision 8481) +++ iwch.c (working copy) @@ -65,7 +65,8 @@ static inline void *vzmalloc(int size) { void *p = vmalloc(size); - memset(p, 0, size); + if (p) + memset(p, 0, size); return p; } Steve. > [root at iblp0044 ~]# modprobe iw_cxgb3 > Segmentation fault > > [root at iblp0044 ~]# dmesg > ... > > ... 
> eth2: Chelsio T320 2x10000BaseX RNIC (rev 0) PCI-X 133MHz/64-bit MSI-X > eth2: 128MB CM, 256MB PMTX, 256MB PMRX > eth3: Chelsio T320 2x10000BaseX RNIC (rev 0) PCI-X 133MHz/64-bit MSI-X > Unable to handle kernel paging request at virtual address 0000000000248000 > modprobe[3348]: Oops 8804682956800 [1] > Modules linked in: iw_cxgb3 cxgb3c ib_umad ib_ucm ib_uverbs ib_sa ib_cm ib_mad ib_core cxgb3 > > Pid: 3348, CPU 0, comm: modprobe > psr : 00001010081a6018 ifs : 8000000000000183 ip : [] Not tainted > ip is at memset+0x240/0x420 > unat: 0000000000000000 pfs : 0000000000000593 rsc : 0000000000000003 > rnat: 0000002080000000 bsps: ffffffff80000000 pr : 0000000005550519 > ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f > csd : 0000000000000000 ssd : 0000000000000000 > b0 : a00000020022f540 b6 : a0000002000d44e0 b7 : a0000002000b9f00 > f6 : 1003e0000000000000000 f7 : 1003e6db6db6db6db6db7 > f8 : 1003e00000000071b5ed6 f9 : 1003e0000000000000000 > f10 : 1003e0000000000020000 f11 : 1003e0000000000000000 > r1 : a000000100c80920 r2 : e0000040f98034f8 r3 : a000000100a99f98 > r8 : 0000000000248000 r9 : 6db6db6db6db6db7 r10 : 00000000071b5ed6 > r11 : 0000000038daf6b0 r12 : e000004043897e30 r13 : e000004043890000 > r14 : e0000040f98034f8 r15 : 00000040fa3e8000 r16 : a000000200248000 > r17 : a0007fffc7200000 r18 : a000000100a9c220 r19 : a000000100a9c220 > r20 : 00000040fffb8000 r21 : 0000000000000010 r22 : 0000000000000800 > r23 : 0000000000000007 r24 : 0000000000248000 r25 : 00000040fc020000 > r26 : a000000202248000 r27 : 0000000000248010 r28 : 0000000000288000 > r29 : e0000040fc020000 r30 : 0000000000000000 r31 : 00000000000007ff > > Call Trace: > [] show_stack+0x50/0xa0 > sp=e0000040438979c0 bsp=e0000040438912a8 > [] show_regs+0x820/0x840 > sp=e000004043897b90 bsp=e000004043891260 > [] die+0x1d0/0x2e0 > sp=e000004043897b90 bsp=e000004043891218 > [] ia64_do_page_fault+0x8e0/0xa00 > sp=e000004043897bb0 bsp=e0000040438911b8 > [] ia64_leave_kernel+0x0/0x280 > 
sp=e000004043897c60 bsp=e0000040438911b8 > [] memset+0x240/0x420 > sp=e000004043897e30 bsp=e0000040438911a0 > [] open_rnic_toe+0x140/0x620 [iw_cxgb3] > sp=e000004043897e30 bsp=e000004043891148 > [] t3c_register_client+0x140/0x1e0 [cxgb3c] > sp=e000004043897e30 bsp=e000004043891118 > [] iwch_init_module+0xc0/0x100 [iw_cxgb3] > sp=e000004043897e30 bsp=e000004043891100 > [] sys_init_module+0x250/0x520 > sp=e000004043897e30 bsp=e000004043891088 > [] ia64_ret_from_syscall+0x0/0x20 > sp=e000004043897e30 bsp=e000004043891088 > [] ia64_ivt+0xffffffff00010640/0x400 > sp=e000004043898000 bsp=e000004043891088 > BUG: modprobe/3348, lock held at task exit time! > [a000000200129440] {t3cdev_db_lock} > .. held by: modprobe: 3348 [e000004043890000, 116] > ... acquired at: t3c_register_client+0x30/0x1e0 [cxgb3c] > > From swise at opengridcomputing.com Fri Jul 14 07:03:51 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 14 Jul 2006 09:03:51 -0500 Subject: [openib-general] wiki update - howto for Chelsio's T3 RNIC In-Reply-To: <1152884608.26840.21.camel@stevo-desktop> References: <1152827915.25964.61.camel@stevo-desktop> <44B6ECBE.6040208@hp.com> <1152884608.26840.21.camel@stevo-desktop> Message-ID: <1152885831.26840.32.camel@stevo-desktop> BTW: I updated the wiki to mention this issue. On Fri, 2006-07-14 at 08:43 -0500, Steve Wise wrote: > > > > > > > Steve, > > > > The two steps below fail: > > > > # (cd librdmacm && ./autogen.sh && ./configure && make && make install) > > # (cd libcxgb3 && ./autogen.sh && ./configure && make && make install) > > > > > > with: > > checking for ibv_get_device_list in -libverbs... no > > configure: error: ibv_get_device_list() not found. librdmacm requires libibverbs. > > checking for ibv_get_device_list in -libverbs... no > > configure: error: ibv_get_device_list() not found. libmthca requires libibverbs. > > > > Should the configure script do "-libibverbs" instead of "-libverbs" ? > > > > no. 
The problem is your library search path doesn't > include /usr/local/lib. Or, you didn't do an ldconfig after building > libibverbs. Make sure /usr/local/lib is specified in /etc/ld.so.conf > or in a file in /etc/ld.so.conf.d. Then run ldconfig. You can verify > that its working be running ldconfig -v and seeing libibverbs in the > path... > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From minich at ornl.gov Fri Jul 14 07:13:57 2006 From: minich at ornl.gov (Makia Minich) Date: Fri, 14 Jul 2006 10:13:57 -0400 Subject: [openib-general] Problem with svn checkout Message-ID: >From the archive, I see that back in March someone was having a svn checkout problem (http://openib.org/pipermail/openib-general/2006-March/019134.html). Was there ever a solution? Because I'm having an issue doing a fresh checkout of the trunk: svn: In directory 'trunk/src/userspace/mpi/mvapich-gen2/www/www1' svn: Can't copy 'trunk/src/userspace/mpi/mvapich-gen2/www/www1/.svn/tmp/text-base/mpicc.html .svn-base' to 'trunk/src/userspace/mpi/mvapich-gen2/www/www1/mpicc.html.tmp': No such file or directory Hopefully I just didn't find the answer and it's a simple solution. Thanks, Makia From minich at ornl.gov Fri Jul 14 07:22:38 2006 From: minich at ornl.gov (Makia Minich) Date: Fri, 14 Jul 2006 10:22:38 -0400 Subject: [openib-general] Problem with svn checkout In-Reply-To: Message-ID: Hmmm... Nevermind, it would appear that this is an issue with my subversion client on OS X. I guess I need to figure out what is wrong there. Sorry for the disruption. On 7/14/06 10:13 AM, "Makia Minich" wrote: >> From the archive, I see that back in March someone was having a svn checkout > problem (http://openib.org/pipermail/openib-general/2006-March/019134.html). > Was there ever a solution? 
Because I'm having an issue doing a fresh > checkout of the trunk: > > svn: In directory 'trunk/src/userspace/mpi/mvapich-gen2/www/www1' > svn: Can't copy > 'trunk/src/userspace/mpi/mvapich-gen2/www/www1/.svn/tmp/text-base/mpicc.html > .svn-base' to > 'trunk/src/userspace/mpi/mvapich-gen2/www/www1/mpicc.html.tmp': No such file > or directory > > Hopefully I just didn't find the answer and it's a simple solution. > > Thanks, > Makia > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From tom at opengridcomputing.com Fri Jul 14 07:57:12 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Fri, 14 Jul 2006 09:57:12 -0500 Subject: [openib-general] Usermode build weirdness on Suse10 64b Message-ID: <1152889032.7671.12.camel@trinity.ogc.int> I'm getting broken autogen.sh behavior on SuSE 10 64b. Basically, the autogen.sh generates a ./configure file that has a syntax error that prevents it from successfully determining sizeof(long). A workaround is to move the AC_CHECK_SIZEOF(long) line in the configure.in file up to just below AC_PROG_CC and everything works fine. This seems pretty busted to me, but maybe I'm doing something very wrong. It also looks like libibat is no longer used, but it's mentioned in the install cheat notes wiki. Thanks, Tom From mshefty at ichips.intel.com Fri Jul 14 09:17:50 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 14 Jul 2006 09:17:50 -0700 Subject: [openib-general] [PATCH] [RFC] librdmacm: expose device list to users In-Reply-To: <44B793CB.2080508@in.ibm.com> References: <000101c6a6d4$87fe8600$ff0da8c0@amr.corp.intel.com> <44B793CB.2080508@in.ibm.com> Message-ID: <44B7C3AE.3010302@ichips.intel.com> Pradipta Kumar Banerjee wrote: > Thanks Sean for adding this functionality. This was needed. This was committed to svn 8523.
- Sean From mshefty at ichips.intel.com Fri Jul 14 09:50:01 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 14 Jul 2006 09:50:01 -0700 Subject: [openib-general] is wc valid if ib_poll_cq() returns zero In-Reply-To: <44B7009F.7000204@veritas.com> References: <44B7009F.7000204@veritas.com> Message-ID: <44B7CB39.6060401@ichips.intel.com> somenath wrote: > 1. if ib_poll_cq(cq, 1, &wc) returns zero, does wc contain a valid entry? no * Poll a CQ for (possibly multiple) completions. If the return value * is < 0, an error occurred. If the return value is >= 0, it is the * number of completions returned. If the return value is * non-negative and strictly less than num_entries, then the CQ was * emptied. - Sean From somenath at veritas.com Fri Jul 14 10:01:55 2006 From: somenath at veritas.com (somenath) Date: Fri, 14 Jul 2006 10:01:55 -0700 Subject: [openib-general] is wc valid if ib_poll_cq() returns zero In-Reply-To: <44B7CB39.6060401@ichips.intel.com> References: <44B7009F.7000204@veritas.com> <44B7CB39.6060401@ichips.intel.com> Message-ID: <44B7CE03.2070806@veritas.com> Sean Hefty wrote: > somenath wrote: > >> 1. if ib_poll_cq(cq, 1, &wc) returns zero, does wc contain a valid >> entry? > > > no > > * Poll a CQ for (possibly multiple) completions. If the return value > * is < 0, an error occurred. If the return value is >= 0, it is the > * number of completions returned. If the return value is > * non-negative and strictly less than num_entries, then the CQ was > * emptied. > > - Sean Thanks Sean! As per the above definiton "no error" occurred since return value is 0.. And as per your clarification, wc doesn't contain a valid entry. So my next question: 2. why is the io completion routine called when ib_poll_cq() returns zero? does this kind of notification contain any information? is there some error happening here? what are some possible problem areas? any wild guess...? anyone else seen this kind of stuff? 
( I get send completion and recv completion both your 0 return value). thanks, som. From mshefty at ichips.intel.com Fri Jul 14 10:09:07 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 14 Jul 2006 10:09:07 -0700 Subject: [openib-general] is wc valid if ib_poll_cq() returns zero In-Reply-To: <44B7CE03.2070806@veritas.com> References: <44B7009F.7000204@veritas.com> <44B7CB39.6060401@ichips.intel.com> <44B7CE03.2070806@veritas.com> Message-ID: <44B7CFB3.3080107@ichips.intel.com> somenath wrote: > 2. why is the io completion routine called when ib_poll_cq() returns > zero? does this kind of notification contain any information? > is there some error happening here? what are some possible problem areas? > any wild guess...? Can you clarify what's happening? Are you calling ib_poll_cq() from your ib_comp_handler() and not finding a completion? - Sean From trimmer at silverstorm.com Fri Jul 14 10:20:47 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Fri, 14 Jul 2006 13:20:47 -0400 Subject: [openib-general] is wc valid if ib_poll_cq() returns zero In-Reply-To: <44B7CFB3.3080107@ichips.intel.com> Message-ID: > From: Sean Hefty > somenath wrote: > > 2. why is the io completion routine called when ib_poll_cq() returns > > zero? does this kind of notification contain any information? > > is there some error happening here? what are some possible problem > areas? > > any wild guess...? > > Can you clarify what's happening? Are you calling ib_poll_cq() from your > ib_comp_handler() and not finding a completion? Be aware that an IB completion handler should be coded as: while poll_cq returns a completion process completion rearm CQ while poll_cq returns a completion process completion Per the IBTA spec, rearm CQ is not required to generate an event for completions already on the CQ, just for newly arriving ones (Mellanox HCAs will generate a completion event if any CQEs remain on the CQ). 
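The poll / rearm / poll shape described above can be sketched in plain userspace C. This is only a rough illustration under stated assumptions: struct fake_cq and the fake_* helpers are hypothetical stand-ins that imitate the return convention of ib_poll_cq(cq, 1, &wc) and the effect of ib_req_notify_cq(); they are not the kernel API, and the "late" field merely simulates a completion that races with the rearm.

```c
/* Hypothetical stand-in for a completion queue; NOT the kernel's struct ib_cq. */
struct fake_cq {
    int pending; /* completions currently sitting on the queue */
    int armed;   /* nonzero once a notification has been requested */
    int late;    /* completions that land while the handler is rearming */
};

/* Imitates ib_poll_cq(cq, 1, &wc): returns 1 if one completion was
 * removed from the queue, 0 if the queue was empty. */
static int fake_poll_cq(struct fake_cq *cq)
{
    if (cq->pending == 0)
        return 0;
    cq->pending--;
    return 1;
}

/* Imitates rearming the CQ. To model the race, any "late" completions
 * arrive on the queue just before the arm takes effect, so no new event
 * is guaranteed for them. */
static void fake_req_notify_cq(struct fake_cq *cq)
{
    cq->pending += cq->late;
    cq->late = 0;
    cq->armed = 1;
}

/* Handler shape: drain, rearm, then drain again to catch completions
 * that raced with the rearm. Returns how many completions were processed. */
static int fake_comp_handler(struct fake_cq *cq)
{
    int processed = 0;

    while (fake_poll_cq(cq) > 0)
        processed++;            /* "process completion" */

    fake_req_notify_cq(cq);     /* "rearm CQ" */

    while (fake_poll_cq(cq) > 0)
        processed++;            /* completions that raced with the rearm */

    return processed;
}
```

With two completions already queued and one arriving during the rearm, the handler processes all three and leaves the queue empty and armed. Dropping the second loop would leave the late completion on the queue with no event guaranteed for it.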
Due to various race situations between the HCA and the software/ULP, there are some valid cases where a completion event could occur after poll_cq has already processed the completion. Todd Rimmer From somenath at veritas.com Fri Jul 14 10:33:27 2006 From: somenath at veritas.com (somenath) Date: Fri, 14 Jul 2006 10:33:27 -0700 Subject: [openib-general] is wc valid if ib_poll_cq() returns zero In-Reply-To: <44B7CFB3.3080107@ichips.intel.com> References: <44B7009F.7000204@veritas.com> <44B7CB39.6060401@ichips.intel.com> <44B7CE03.2070806@veritas.com> <44B7CFB3.3080107@ichips.intel.com> Message-ID: <44B7D567.1010901@veritas.com> Sean Hefty wrote: > somenath wrote: > >> 2. why is the io completion routine called when ib_poll_cq() returns >> zero? does this kind of notification contain any information? >> is there some error happening here? what are some possible problem >> areas? >> any wild guess...? > > > Can you clarify what's happening? Are you calling ib_poll_cq() from > your ib_comp_handler() and not finding a completion? > > - Sean That's right Sean. I establish a RC connection and pre-post a buffer of size 4K in one side, and try to send 4K packet from the other side. each side gets a completion event when ib_poll_cq() rets zero. ib_recv_comp_hanlder(cq, arg) { struct ib_wc wc; ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); //i check for errors, there is no error; count = ib_poll_cq(cq, 1, &wc); // here I find count = 0 } same happens to send_comp_handler too.. so, everytime I get a cq notification, I get a zero entry (send and recv completion event occurs on respective nodes). thanks, som. From somenath at veritas.com Fri Jul 14 10:39:04 2006 From: somenath at veritas.com (somenath) Date: Fri, 14 Jul 2006 10:39:04 -0700 Subject: [openib-general] is wc valid if ib_poll_cq() returns zero In-Reply-To: References: Message-ID: <44B7D6B8.8020109@veritas.com> Rimmer, Todd wrote: >>From: Sean Hefty >>somenath wrote: >> >> >>>2. 
why is the io completion routine called when ib_poll_cq() returns >>>zero? does this kind of notification contain any information? >>>is there some error happening here? what are some possible problem >>> >>> >>areas? >> >> >>>any wild guess...? >>> >>> >>Can you clarify what's happening? Are you calling ib_poll_cq() from >> >> >your > > >>ib_comp_handler() and not finding a completion? >> >> > >Be aware that an IB completion handler should be coded as: > >while poll_cq returns a completion > process completion >rearm CQ > > that's what I am doing, to be more specific: rearm CQ; while (ib_poll_cq(cq, 1, &wc) > 0) { process completion(); } is that what you meant? >while poll_cq returns a completion > process completion > >Per the IBTA spec, rearm CQ is not required to generate an event for >completions already on the CQ, just for newly arriving ones (Mellanox >HCAs will generate a completion event if any CQEs remain on the CQ). > >Due to various race situations between the HCA and the software/ULP, >there are some valid cases where a completion event could occur after >poll_cq has already processed the completion. > > in my case, poll_cq() always returns 0, so I never get a valid wc entry... >Todd Rimmer > > From trimmer at silverstorm.com Fri Jul 14 10:50:42 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Fri, 14 Jul 2006 13:50:42 -0400 Subject: [openib-general] is wc valid if ib_poll_cq() returns zero In-Reply-To: <44B7D6B8.8020109@veritas.com> Message-ID: > From: somenath [mailto:somenath at veritas.com] > Rimmer, Todd wrote: > > >Be aware that an IB completion handler should be coded as: > > > >while poll_cq returns a completion > > process completion > >rearm CQ > > > > > > that's what I am doing, to be more specific: > > rearm CQ; > while (ib_poll_cq(cq, 1, &wc) > 0) { > process completion(); > } > > is that what you meant? 
> > >while poll_cq returns a completion > > process completion > > > in my case, poll_cq() always returns 0, so I never get a valid wc entry... > I'm not sure why you are never getting a valid wc entry, however by coding it as you have, on mellanox hardware, you will always get an extra completion event. You should code it as: while (ib_poll_cq(cq, 1, &wc) > 0) { process completion(); } rearm CQ while (ib_poll_cq(cq, 1, &wc) > 0) { process completion(); } If you do the rearm CQ first, mellanox HCAs will find there are CQEs still on the CQ, and generate another completion event. Hence you should always poll 1st, then rearm, then poll again to make sure there were no CQEs arriving racing with the rearm. Todd Rimmer From somenath at veritas.com Fri Jul 14 10:51:25 2006 From: somenath at veritas.com (somenath) Date: Fri, 14 Jul 2006 10:51:25 -0700 Subject: [openib-general] is wc valid if ib_poll_cq() returns zero In-Reply-To: <44B7D6B8.8020109@veritas.com> References: <44B7D6B8.8020109@veritas.com> Message-ID: <44B7D99D.5090907@veritas.com> somenath wrote: >Rimmer, Todd wrote: > > > >>>From: Sean Hefty >>>somenath wrote: >>> >>> >>> >>> >>>>2. why is the io completion routine called when ib_poll_cq() returns >>>>zero? does this kind of notification contain any information? >>>>is there some error happening here? what are some possible problem >>>> >>>> >>>> >>>> >>>areas? >>> >>> >>> >>> >>>>any wild guess...? >>>> >>>> >>>> >>>> >>>Can you clarify what's happening? Are you calling ib_poll_cq() from >>> >>> >>> >>> >>your >> >> >> >> >>>ib_comp_handler() and not finding a completion? >>> >>> >>> >>> >>Be aware that an IB completion handler should be coded as: >> >>while poll_cq returns a completion >> process completion >>rearm CQ >> >> >> >> > >that's what I am doing, to be more specific: > >rearm CQ; >while (ib_poll_cq(cq, 1, &wc) > 0) { > process completion(); >} > >is that what you meant? 
> > just to make sure I conveyed the exact thing I meant, if I change the above code as follows: while (ib_poll_cq(cq, 1, &wc) > 0) { process completion(); } rearm CQ; then I just get notification once, and don't get any futher notifications...so I assume rearm CQ should be done even if ib_poll_cq() returns zero. thanks, som. > > >>while poll_cq returns a completion >> process completion >> >>Per the IBTA spec, rearm CQ is not required to generate an event for >>completions already on the CQ, just for newly arriving ones (Mellanox >>HCAs will generate a completion event if any CQEs remain on the CQ). >> >>Due to various race situations between the HCA and the software/ULP, >>there are some valid cases where a completion event could occur after >>poll_cq has already processed the completion. >> >> >> >> > >in my case, poll_cq() always returns 0, so I never get a valid wc entry... > > > >>Todd Rimmer >> >> >> >> > > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > From trimmer at silverstorm.com Fri Jul 14 10:56:59 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Fri, 14 Jul 2006 13:56:59 -0400 Subject: [openib-general] is wc valid if ib_poll_cq() returns zero In-Reply-To: <44B7D99D.5090907@veritas.com> Message-ID: > From: somenath [mailto:somenath at veritas.com] > > just to make sure I conveyed the exact thing I meant, if I change > the above code as follows: > > while (ib_poll_cq(cq, 1, &wc) > 0) { > process completion(); > } > rearm CQ; > > then I just get notification once, and don't get any futher > notifications...so I assume rearm CQ should be done even if > ib_poll_cq() returns zero. Yes, if you don't rearm you would never get another completion for the CQ. 
However it is not yet clear why you are not finding the initial completion when your 1st callback invoked poll_cq. Todd Rimmer From mshefty at ichips.intel.com Fri Jul 14 11:17:50 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 14 Jul 2006 11:17:50 -0700 Subject: [openib-general] is wc valid if ib_poll_cq() returns zero In-Reply-To: <44B7D99D.5090907@veritas.com> References: <44B7D6B8.8020109@veritas.com> <44B7D99D.5090907@veritas.com> Message-ID: <44B7DFCE.7000908@ichips.intel.com> somenath wrote: > just to make sure I conveyed the exact thing I meant, if I change > the above code as follows: > > while (ib_poll_cq(cq, 1, &wc) > 0) { > process completion(); > } > rearm CQ; > > then I just get notification once, and don't get any futher > notifications...so I assume rearm CQ should be done even if > ib_poll_cq() returns zero. Hmm... what HCA cards and svn version are you using? Mellanox cards generate new events if any completions remain on the CQ. So, if ib_poll_cq() isn't finding any completions, then another event should be generated. If ib_poll_cq() is returning a completion, then it sounds like there's a bug with ib_poll_cq() returning the wrong number of completions. - Sean From somenath at veritas.com Fri Jul 14 11:34:31 2006 From: somenath at veritas.com (somenath) Date: Fri, 14 Jul 2006 11:34:31 -0700 Subject: [openib-general] is wc valid if ib_poll_cq() returns zero In-Reply-To: <44B7DFCE.7000908@ichips.intel.com> References: <44B7D6B8.8020109@veritas.com> <44B7D99D.5090907@veritas.com> <44B7DFCE.7000908@ichips.intel.com> Message-ID: <44B7E3B7.3010802@veritas.com> Sean Hefty wrote: > somenath wrote: > >> just to make sure I conveyed the exact thing I meant, if I change >> the above code as follows: >> >> while (ib_poll_cq(cq, 1, &wc) > 0) { >> process completion(); >> } >> rearm CQ; >> >> then I just get notification once, and don't get any futher >> notifications...so I assume rearm CQ should be done even if >> ib_poll_cq() returns zero. 
> > > Hmm... what HCA cards and svn version are you using? Mellanox cards > generate new events if any completions remain on the CQ. So, if > ib_poll_cq() isn't finding any completions, then another event should > be generated. If ib_poll_cq() is returning a completion, then it > sounds like there's a bug with ib_poll_cq() returning the wrong number > of completions. > > - Sean hardware config: PCI express machine with PCI-Express cards of Mellanox HCA. Exact same configuration was working with gen1 stack (no change in the hardware setup). you can find rest of the details here: =================================================== [root at alekhine bin]# ./ibstat CA 'mthca0' CA type: MT25208 (MT23108 compat mode) Number of ports: 2 Firmware version: 4.6.0 Hardware version: a0 Node GUID: 0x0005ad0000039abc System image GUID: 0x0005ad000100d050 Port 1: State: Active Physical state: LinkUp Rate: 10 Base lid: 114 LMC: 0 SM lid: 1 Capability mask: 0x00510a68 Port GUID: 0x0005ad0000039abd Port 2: State: Down Physical state: Polling Rate: 10 Base lid: 0 LMC: 0 SM lid: 0 Capability mask: 0x00510a68 Port GUID: 0x0005ad0000039abe ========================================================= I am using the version of gen2 code as distributed in RH version 4 update 3 since I am trying to get a port on this version of the distribution. (so don't know the exact svn version they pulled from, I just use the RPM's in their binary distribution). thanks, som. From ralphc at pathscale.com Fri Jul 14 11:59:25 2006 From: ralphc at pathscale.com (Ralph Campbell) Date: Fri, 14 Jul 2006 11:59:25 -0700 Subject: [openib-general] [PATCH 28 of 39] IB/ipath - Fixes a bug where our delay for EEPROM no longer works due to compiler reordering Message-ID: <1152903565.4572.350.camel@brick.pathscale.com> > > > static void i2c_wait_for_writes(struct ipath_devdata *dd) > > > { > > > + mb(); > > > (void)ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch); > > > } > > That's a bit weird. 
I wouldn't have expected the compiler to muck around > > with a readl(). > I never liked this patch. The last time it came up there were > conflicting answers about whether it was a code generation bug or a > real issue talking to hardware or what. At the least I think this > merits a big comment explain what's going on -- and even better would > be really understanding the bug that's being fixed so that we're > confident it is indeed a real fix. > > - R. I haven't been able to reproduce the bug with -Os even on the systems where it used to fail so I guess this is a non-issue. Please ignore this patch. From louis.laborde at hp.com Fri Jul 14 12:07:04 2006 From: louis.laborde at hp.com (Louis Laborde) Date: Fri, 14 Jul 2006 12:07:04 -0700 Subject: [openib-general] wiki update - howto for Chelsio's T3 RNIC In-Reply-To: <1152884608.26840.21.camel@stevo-desktop> References: <1152827915.25964.61.camel@stevo-desktop> <44B6ECBE.6040208@hp.com> <1152884608.26840.21.camel@stevo-desktop> Message-ID: <44B7EB58.30209@hp.com> Steve Wise wrote: >> Steve, >> >> The two steps below fail: >> >> # (cd librdmacm && ./autogen.sh && ./configure && make && make install) >> # (cd libcxgb3 && ./autogen.sh && ./configure && make && make install) >> >> >> with: >> checking for ibv_get_device_list in -libverbs... no >> configure: error: ibv_get_device_list() not found. librdmacm requires libibverbs. >> checking for ibv_get_device_list in -libverbs... no >> configure: error: ibv_get_device_list() not found. libmthca requires libibverbs. >> >> Should the configure script do "-libibverbs" instead of "-libverbs" ? >> > > no. The problem is your library search path doesn't > include /usr/local/lib. Or, you didn't do an ldconfig after building > libibverbs. Make sure /usr/local/lib is specified in /etc/ld.so.conf > or in a file in /etc/ld.so.conf.d. Then run ldconfig. You can verify > that its working be running ldconfig -v and seeing libibverbs in the > path... 
> > > Steve, I found my problem: I am using RHEL-U3 which comes with other version of the libibverbs library in package libibverbs-devel. This library is installed in /usr/lib and does not define the function ibv_get_device_list. I removed the package and dependencies with: rpm -e udapl udapl-devel libibverbs libibverbs-utils libibverbs-devel Is there a way to have the old and new openib libs/tools coexist? Thanks, Louis +---------------------------------------------------------+ | Louis LABORDE e-mail: louis.laborde at hp.com | | HP Cupertino SISL phone: (408) 447-3649 | +---------------------------------------------------------+ From mshefty at ichips.intel.com Fri Jul 14 12:11:18 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 14 Jul 2006 12:11:18 -0700 Subject: [openib-general] is wc valid if ib_poll_cq() returns zero In-Reply-To: <44B7E3B7.3010802@veritas.com> References: <44B7D6B8.8020109@veritas.com> <44B7D99D.5090907@veritas.com> <44B7DFCE.7000908@ichips.intel.com> <44B7E3B7.3010802@veritas.com> Message-ID: <44B7EC56.6060706@ichips.intel.com> Can you also post your code, including the completion handler routines and QP creation / initialization sections? - Sean From somenath at veritas.com Fri Jul 14 12:53:18 2006 From: somenath at veritas.com (somenath) Date: Fri, 14 Jul 2006 12:53:18 -0700 Subject: [openib-general] is wc valid if ib_poll_cq() returns zero In-Reply-To: <44B7EC56.6060706@ichips.intel.com> References: <44B7D6B8.8020109@veritas.com> <44B7D99D.5090907@veritas.com> <44B7DFCE.7000908@ichips.intel.com> <44B7E3B7.3010802@veritas.com> <44B7EC56.6060706@ichips.intel.com> Message-ID: <44B7F62E.7050001@veritas.com> Sean Hefty wrote: > Can you also post your code, including the completion handler routines > and QP creation / initialization sections? > > - Sean curretnly this is what I am trying to test, just to get one notification successfully with a proper wc element. thanks, som. 
int recvio_complete( struct ib_cq *cq, void *passed_arg) { io_complete(cq, passed_arg); } int sendio_complete( struct ib_cq *cq, void *passed_arg) { io_complete(cq, passed_arg); } int io_complete( struct ib_cq *cq, void *passed_arg) { xxx_connection_t *arg = passed_arg; xxx_status_t stat = xxx_st_ok; struct ib_wc wc; int count = 0; if (count = ib_poll_cq(cq, 1, &wc) < 0) { stat = xxx_st_error; arg = NULL; goto error; } if(stat = ib_req_notify_cq(cq, IB_CQ_NEXT_COMP)) { stat = xxx_st_error; goto error; } if (count == 0) { stat = xxx_st_ok; goto error; } if ((arg != NULL) && (cq != NULL)) { io_process(conn_handle); } else { stat = xxx_st_error; goto error; } error: return stat; } xxx_status_t xxx_create_qpairs( xxx_connection_t *conn_handle) { xxx_status_t xxx_stat = xxx_st_ok; int ib_stat = 0; int entries_send, entries_recv; struct ib_qp_init_attr qp_attr = { 0 }; unsigned long flags; conn_handle->llp_send_cq = ib_create_cq( xxx_openib.xxx_device, sendio_complete, xxx_async_cb, conn_handle, XXX_SEND_CQ_DEPTH); if (IS_ERR(conn_handle->llp_send_cq)) { ib_stat = PTR_ERR(conn_handle->llp_send_cq); xxx_stat = map_ib_to_xxx_stat(ib_stat); goto free_out; } conn_handle->llp_recv_cq = ib_create_cq( xxx_openib.xxx_device, recvio_complete, xxx_async_cb, conn_handle, XXX_RECV_CQ_DEPTH); if (IS_ERR(conn_handle->llp_recv_cq)) { ib_stat = PTR_ERR(conn_handle->llp_recv_cq); xxx_stat = map_ib_to_xxx_stat(ib_stat); ib_destroy_cq(conn_handle->llp_send_cq); goto free_out; } if(ib_stat = ib_req_notify_cq(conn_handle->llp_send_cq, IB_CQ_NEXT_COMP)) { xxx_stat = map_ib_to_xxx_stat(ib_stat); ib_destroy_cq(conn_handle->llp_send_cq); ib_destroy_cq(conn_handle->llp_recv_cq); goto free_out; } if(ib_stat = ib_req_notify_cq(conn_handle->llp_recv_cq, IB_CQ_NEXT_COMP)) { xxx_stat = map_ib_to_xxx_stat(ib_stat); ib_destroy_cq(conn_handle->llp_recv_cq); goto free_out; } memset(&qp_attr, 0, sizeof qp_attr); qp_attr.event_handler = xxx_qevent_cb; qp_attr.qp_context = conn_handle; qp_attr.send_cq = 
conn_handle->llp_send_cq; qp_attr.recv_cq = conn_handle->llp_recv_cq; qp_attr.cap.max_send_wr = XXX_SENDQ_DEPTH; qp_attr.cap.max_recv_wr = XXX_RECVQ_DEPTH; qp_attr.cap.max_send_sge = 1; qp_attr.cap.max_recv_sge = 1; qp_attr.sq_sig_type = IB_SIGNAL_ALL_WR; qp_attr.qp_type = IB_QPT_RC; qp_attr.port_num = HCA_PRM_PORT; /* special QP types only */ conn_handle->llp_qp = ib_create_qp( xxx_openib.xxx_global_pd, &qp_attr); if (IS_ERR(conn_handle->llp_qp)) { ib_destroy_cq(conn_handle->llp_send_cq); ib_destroy_cq(conn_handle->llp_recv_cq); ib_stat = PTR_ERR(conn_handle->llp_qp); xxx_stat = map_ib_to_xxx_stat(ib_stat); goto free_out; } xxx_modifyqp_init(conn_handle); free_out: return (xxx_stat); } From louis.laborde at hp.com Fri Jul 14 12:57:58 2006 From: louis.laborde at hp.com (Louis Laborde) Date: Fri, 14 Jul 2006 12:57:58 -0700 Subject: [openib-general] wiki update - howto for Chelsio's T3 RNIC In-Reply-To: <1152827915.25964.61.camel@stevo-desktop> References: <1152827915.25964.61.camel@stevo-desktop> Message-ID: <44B7F746.1060706@hp.com> Steve Wise wrote: > All, > > I added a quick "how to" on the wiki for the Chelsio T3 rnic... > > Steve. > Steve, I finally got rping working with Chelsio T3 on IA64 platform. Which tools should I use to do bandwidth and latency measurements? Is it covered in some FAQ?
Thanks, Louis +---------------------------------------------------------+ | Louis LABORDE e-mail: louis.laborde at hp.com | | HP Cupertino SISL phone: (408) 447-3649 | +---------------------------------------------------------+ From swise at opengridcomputing.com Fri Jul 14 13:47:11 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 14 Jul 2006 15:47:11 -0500 Subject: [openib-general] wiki update - howto for Chelsio's T3 RNIC References: <1152827915.25964.61.camel@stevo-desktop> <44B7F746.1060706@hp.com> Message-ID: <001401c6a786$af80d4f0$020010ac@haggard> > > Steve, > > I finally got rping working with Chelsio T3 on IA64 platform. > > Which tools should I use to do bandwidth and latency measurements? > Is it covered in some FAQ? > No FAQ exists. But in the iwarp branch, the userspace/perftest dir has rdma_lat and rdma_bw which do rdma write latency and bandwidth tests. Use the --cma flag to force using the rdma cma for connection setup. Like this. For a test to crank out 10000 rdma writes of size 30000: server side: rdmabw --cma --size=30000 client side: rdmabw --cma --size=30000 --iters=10000 Ask Chelsio for other perf benchmark programs. They haven't released these other programs yet, but you could probably get them directly from Chelsio... NOTE: Large IO sizes don't work. Keep it under 60000B. Steve. From swise at opengridcomputing.com Fri Jul 14 13:55:58 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 14 Jul 2006 15:55:58 -0500 Subject: [openib-general] wiki update - howto for Chelsio's T3 RNIC References: <1152827915.25964.61.camel@stevo-desktop> <44B6ECBE.6040208@hp.com> <1152884608.26840.21.camel@stevo-desktop> <44B7EB58.30209@hp.com> Message-ID: <006c01c6a787$e89796b0$020010ac@haggard> > Steve, > > I found my problem: I am using RHEL-U3 which comes with other version > of > the libibverbs library in package libibverbs-devel. This library is > installed > in /usr/lib and does not define the function ibv_get_device_list. 
> > I removed the package and dependencies with: > rpm -e udapl udapl-devel libibverbs libibverbs-utils libibverbs-devel > > Is there a way to have the old and new openib libs/tools coexist? > Well, you could probably set things up so certain processes get one set of libs and certain processes get the other set by tweaking LIBPATH or whatever. But that's probably not a good idea since they share the same kernel subsystem. The iwarp branch kernel hasn't been tested with non-iwarp-branch libs for instance... The iwarp branch should work with the IB devices however. So I'm not sure why you would want both sets of libs? Steve. From mshefty at ichips.intel.com Fri Jul 14 14:01:44 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 14 Jul 2006 14:01:44 -0700 Subject: [openib-general] is wc valid if ib_poll_cq() returns zero In-Reply-To: <44B7F62E.7050001@veritas.com> References: <44B7D6B8.8020109@veritas.com> <44B7D99D.5090907@veritas.com> <44B7DFCE.7000908@ichips.intel.com> <44B7E3B7.3010802@veritas.com> <44B7EC56.6060706@ichips.intel.com> <44B7F62E.7050001@veritas.com> Message-ID: <44B80638.8020804@ichips.intel.com> somenath wrote: > int > io_complete( > struct ib_cq *cq, > void *passed_arg) > { > xxx_connection_t *arg = passed_arg; > xxx_status_t stat = xxx_st_ok; > struct ib_wc wc; > int count = 0; > > if (count = ib_poll_cq(cq, 1, &wc) < 0) { I think this evaluates ib_poll_cq(..) < 0 before doing the assignment. Since the expression evaluates to false, count is assigned 0. 
Can you try modifying this to: if ((count = ib_poll_cq(..)) < 0) - Sean From somenath at veritas.com Fri Jul 14 14:35:24 2006 From: somenath at veritas.com (somenath) Date: Fri, 14 Jul 2006 14:35:24 -0700 Subject: [openib-general] is wc valid if ib_poll_cq() returns zero In-Reply-To: <44B80638.8020804@ichips.intel.com> References: <44B7D6B8.8020109@veritas.com> <44B7D99D.5090907@veritas.com> <44B7DFCE.7000908@ichips.intel.com> <44B7E3B7.3010802@veritas.com> <44B7EC56.6060706@ichips.intel.com> <44B7F62E.7050001@veritas.com> <44B80638.8020804@ichips.intel.com> Message-ID: <44B80E1C.5020201@veritas.com> Sean Hefty wrote: > somenath wrote: > >> int >> io_complete( >> struct ib_cq *cq, >> void *passed_arg) >> { >> xxx_connection_t *arg = passed_arg; >> xxx_status_t stat = xxx_st_ok; >> struct ib_wc wc; >> int count = 0; >> >> if (count = ib_poll_cq(cq, 1, &wc) < 0) { > > > I think this evaluates ib_poll_cq(..) < 0 before doing the > assignment. Since the expression evaluates to false, count is > assigned 0. Can you try modifying this to: > > if ((count = ib_poll_cq(..)) < 0) > > - Sean I added that stuff, but it didn't make a difference...it still returned 0.... thanks anyway for suggesting.... looks like, I may be getting a valid entry even if ib_poll_cq() is returning zero... is that possible? I am going to try it anyway, even if the ib_poll_cq() returns zero, go ahead and try to use the wc.. dump of wc entry from recv side, looks like many are valid entries (like wr_id, length, op etc..) wr_id=0x10118163c00 status=0x0 op=0x80 vendor_err=0x246 byte_len=0x2028 imm_data=0xa0367eb8 qp_num=0x404 src_qp=0x68 wc_flags=0x0 pkey_index=0x0 slid=0x0 sl=0x0 dlid_path_bits=0x0 port_num=0xff som. 
From mshefty at ichips.intel.com Fri Jul 14 14:55:30 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 14 Jul 2006 14:55:30 -0700 Subject: [openib-general] is wc valid if ib_poll_cq() returns zero In-Reply-To: <44B80E1C.5020201@veritas.com> References: <44B7D6B8.8020109@veritas.com> <44B7D99D.5090907@veritas.com> <44B7DFCE.7000908@ichips.intel.com> <44B7E3B7.3010802@veritas.com> <44B7EC56.6060706@ichips.intel.com> <44B7F62E.7050001@veritas.com> <44B80638.8020804@ichips.intel.com> <44B80E1C.5020201@veritas.com> Message-ID: <44B812D2.1030606@ichips.intel.com> somenath wrote: >> I think this evaluates ib_poll_cq(..) < 0 before doing the >> assignment. Since the expression evaluates to false, count is >> assigned 0. Can you try modifying this to: >> >> if ((count = ib_poll_cq(..)) < 0) >> >> - Sean > > > I added that stuff, but it didn't make a difference...it still returned > 0.... If ib_poll_cq() is truly returning 0, but with a valid wc, then this is a bug. (I'm pretty sure that the additional parens are needed in any case, and you'll also want to swap your ib_poll_cq and ib_req_notify calls back around.) I have never seen this issue, and I'm sure that we would have heard about it if others had. Nothing obvious jumped out at me when looking at the mthca completion code. Maybe you've tried this, but can you break the count = ib_poll_cq() out from the if statement, print count and the wc structure immediately before and after the call, and post the results? > looks like, I may be getting a valid entry even if ib_poll_cq() is > returning zero... > is that possible? You should not be getting a valid wc if the call returns 0. For most applications this would result in a lost completion. > dump of wc entry from recv side, looks like many are valid entries (like > wr_id, length, op etc..) This could just be left over data from the stack, but... > wr_id=0x10118163c00 status=0x0 op=0x80 Does the wr_id match the value that you set on the work request? 
(I'm assuming that it does based on your previous comment.) The opcode does match IB_WC_RECV. - Sean From somenath at veritas.com Fri Jul 14 15:24:30 2006 From: somenath at veritas.com (somenath) Date: Fri, 14 Jul 2006 15:24:30 -0700 Subject: [openib-general] is wc valid if ib_poll_cq() returns zero In-Reply-To: <44B812D2.1030606@ichips.intel.com> References: <44B7D6B8.8020109@veritas.com> <44B7D99D.5090907@veritas.com> <44B7DFCE.7000908@ichips.intel.com> <44B7E3B7.3010802@veritas.com> <44B7EC56.6060706@ichips.intel.com> <44B7F62E.7050001@veritas.com> <44B80638.8020804@ichips.intel.com> <44B80E1C.5020201@veritas.com> <44B812D2.1030606@ichips.intel.com> Message-ID: <44B8199E.2020406@veritas.com> Sean Hefty wrote: > somenath wrote: > >>> I think this evaluates ib_poll_cq(..) < 0 before doing the >>> assignment. Since the expression evaluates to false, count is >>> assigned 0. Can you try modifying this to: >>> >>> if ((count = ib_poll_cq(..)) < 0) >>> >>> - Sean >> >> >> >> I added that stuff, but it didn't make a difference...it still >> returned 0.... > > > If ib_poll_cq() is truly returning 0, but with a valid wc, then this > is a bug. (I'm pretty sure that the additional parens are needed in > any case, and you'll also want to swap your ib_poll_cq and > ib_req_notify calls back around.) I have never seen this issue, and > I'm sure that we would have heard about it if others had. Nothing > obvious jumped out at me when looking at the mthca completion code. > > Maybe you've tried this, but can you break the count = ib_poll_cq() > out from the if statement, print count and the wc structure > immediately before and after the call, and post the results? I tried this code too...(an retrying just now) and seeing it still returns 0... count = ib_poll_cq(cq, 1, &wc); if (count < 0) { stat = xxx_st_error; goto error; //it has never gone to error from this location. 
} I am ready to try any other suggestions and will change this portion of the code too, but currently I am just trying to get a valid wc handle with ib_poll_cq() returning 1!!! that never happens in anyway I try. > >> looks like, I may be getting a valid entry even if ib_poll_cq() is >> returning zero... >> is that possible? > > > You should not be getting a valid wc if the call returns 0. For most > applications this would result in a lost completion. > >> dump of wc entry from recv side, looks like many are valid entries >> (like wr_id, length, op etc..) > > > This could just be left over data from the stack, but... > >> wr_id=0x10118163c00 status=0x0 op=0x80 > > > Does the wr_id match the value that you set on the work request? (I'm > assuming that it does based on your previous comment.) The opcode > does match IB_WC_RECV. > ' yes, wr_id, lenght, opcode, status match... next, i will check the data .... > - Sean From ralphc at pathscale.com Fri Jul 14 15:27:07 2006 From: ralphc at pathscale.com (Ralph Campbell) Date: Fri, 14 Jul 2006 15:27:07 -0700 Subject: [openib-general] Suggestions for how to remove bus_to_virt() In-Reply-To: <20060713054658.GC5096@rhun.ibm.com> References: <1152746967.4572.263.camel@brick.pathscale.com> <20060712.174013.95062313.davem@davemloft.net> <20060713054658.GC5096@rhun.ibm.com> Message-ID: <1152916027.4572.391.camel@brick.pathscale.com> On Thu, 2006-07-13 at 08:46 +0300, Muli Ben-Yehuda wrote: > On Wed, Jul 12, 2006 at 05:40:13PM -0700, David Miller wrote: > > From: Roland Dreier > > Date: Wed, 12 Jul 2006 17:11:26 -0700 > > > > > A cleaner solution would be to make the dma_ API really use the device > > > it's passed anyway, and allow drivers to override the standard PCI > > > stuff nicely. But that would be major surgery, I guess. > > > > Clean but expensive, you should not force the rest of the kernel > > to eat the cost of something you want to do when it's totally > > unnecessary for most other users. 
> > > > For example, x86 never needs to do anything other than a direct > > virt_to_phys translation to produce a DMA address, no matter what > > bus the device is on. It's a single simple integer adjustment > > that can be done inline in about 2 or 3 instructions at most. > > It's possible that even x86 will support multiple IOMMUs in the future > - for example, the Calgary IOMMU support we recently added to x86-64 > could be modified to work on plain x86 as well. > > I like the idea of a per-device DMA-API implementation, but only if it > can be done in a way that is zero cost to the majority of the users of > the API. We already have dynamic dma_ops on x86-64 to support nommu, > swiotlb, gart and Calgary cleanly, extending it to use a per-device > dma-ops isn't too difficult. > > Cheers, > Muli A per-device DMA-API would solve my problem. It would be a fairly invasive changeset though. The basic idea would be to add a struct dma_mapping_ops * to struct device and change all the inline dma_* routines to something like: static inline dma_addr_t dma_map_single(struct device *hwdev, void *ptr, size_t size, int direction) { BUG_ON(!valid_dma_direction(direction)); return hwdev->dma_ops ? hwdev->dma_ops->map_single(hwdev, ptr, size, direction) : dma_ops->map_single(hwdev, ptr, size, direction); } Note that the current design only supports one IOMMU type in a system. This could support multiple IOMMU types at the same time. Another possibility is to only do this for the infiniband subsystem. The idea would be to replace calls to dma_* with ib_dma_* which would be defined as above. 
From davem at davemloft.net Fri Jul 14 15:35:03 2006 From: davem at davemloft.net (David Miller) Date: Fri, 14 Jul 2006 15:35:03 -0700 (PDT) Subject: [openib-general] Suggestions for how to remove bus_to_virt() In-Reply-To: <1152916027.4572.391.camel@brick.pathscale.com> References: <20060712.174013.95062313.davem@davemloft.net> <20060713054658.GC5096@rhun.ibm.com> <1152916027.4572.391.camel@brick.pathscale.com> Message-ID: <20060714.153503.123972494.davem@davemloft.net> From: Ralph Campbell Date: Fri, 14 Jul 2006 15:27:07 -0700 > Note that the current design only supports one IOMMU type in a system. > This could support multiple IOMMU types at the same time. This is not true, the framework allows multiply such types and in fact Sparc64 takes advantage of this. We have about 4 or 5 different PCI controllers, and the IOMMUs are slightly different in each. Even with the standard PCI DMA mapping calls, we can gather the platform private information necessary to program the IOMMU appropriately for a given chipset. The dma_mapping_ops idea will never get accepted by folks like Linus, for reasons I've outlined in previous emails in this thread. So, it's best to look elsewhere for solutions to your problem, such as the ideas used by the USB and IEE1394 device layers. From ravinandan.arakali at neterion.com Fri Jul 14 15:36:47 2006 From: ravinandan.arakali at neterion.com (Ravinandan Arakali) Date: Fri, 14 Jul 2006 15:36:47 -0700 Subject: [openib-general] ping problem with ammassocards(iWARPinterface) In-Reply-To: <1152817805.25964.40.camel@stevo-desktop> Message-ID: <001701c6a795$fe1c96d0$4710100a@pc.s2io.com> As Pradipta suggested, I rebuilt the libraries by removing the optimization(-O2 flag) from Makefile. Now, I don't see the core dump but there's no connection established with rping. This is similar to the failure I am seeing with rdma_lat test. 
BTW, when I start the rping in server mode, at say port 9999, should I expect to see an entity listening on that port number when I do "netstat -an". Currently, I don't see that. Ravi -----Original Message----- From: Steve Wise [mailto:swise at opengridcomputing.com] Sent: Thursday, July 13, 2006 12:10 PM To: ravinandan.arakali at neterion.com Cc: bpradip at in.ibm.com; openib-general at openib.org Subject: Re: [openib-general] ping problem with ammassocards(iWARPinterface) By the way, does this failure happen immediately or after some period of time? On Thu, 2006-07-13 at 13:27 -0500, Steve Wise wrote: > I guess this isn't surprising since rping doesn't work for you either. > Something fundamental is screwed up on your user side methinks... > > CM event 8 == RDMA_CM_EVENT_REJECTED which means either the server side > wasn't listening on the appropriate TCP port, or the server process did > an rdma_reject(). I'm guessing its the former... > > You could use tcpdmp and to see if the connection request is getting RST > by the remote side. > > > > > On Thu, 2006-07-13 at 11:20 -0700, Ravinandan Arakali wrote: > > With the --cma option, I don't see the error about running SM. > > But there's no connection established. > > > > openfab2:/tmp/ib/src/userspace/perftest # ./rdma_lat --cma > > pp_server_connect_cma starting server > > > > openfab:/tmp/ib/src/userspace/perftest # ./rdma_lat --cma 17.2.2.102 > > pp_client_connect_cma starting client > > pp_client_connect_cma/856 unexpected CM event 8 > > pp_client_connect_cma NOT connected! > > pp_connect_cma(17.2.2.102,18515) failed! > > > > There are no messages in dmesg either. 
> > > > Ravi > > > > -----Original Message----- > > From: Steve Wise [mailto:swise at opengridcomputing.com] > > Sent: Thursday, July 13, 2006 6:55 AM > > To: Ravinandan Arakali > > Cc: bpradip at in.ibm.com; openib-general at openib.org > > Subject: Re: [openib-general] ping problem with ammasso > > cards(iWARPinterface) > > > > > > Are you trying to run this over iwarp? It doesn't need an SM... > > > > For the perftests rdma_lat and rdma_bw in the iwarp branch, use the > > --cma flag. > > > > Steve. > > > > > > On Wed, 2006-07-12 at 16:39 -0700, Ravinandan Arakali wrote: > > > Also, I am trying to run some of the iwarp bandwidth/latency tests > > > (available under directory perftest). > > > The first thing to do here is to run opensm. When I run opensm (with debug > > > level 10), I get the following error. Any idea what needs to be done to > > get > > > this working ? > > > > > > openfab2:/tmp/ib/src/userspace # opensm -d 10 > > > ------------------------------------------------- > > > OpenSM Rev:openib-1.2.0 > > > Command Line Arguments: > > > d level = 0xa > > > Log File: /var/log/osm.log > > > ------------------------------------------------- > > > OpenSM Rev:openib-1.2.0 > > > > > > Using default GUID 0x0 > > > Error: Could not get port guid > > > Exiting SM > > > > > > openfab2:/tmp/ib/src/userspace # cat /var/log/osm.log > > > Jul 12 08:35:04 718914 [B7E518C0] -> OpenSM Rev:openib-1.2.0 > > > Jul 12 08:35:04 719111 [0000] -> OpenSM Rev:openib-1.2.0 > > > > > > Jul 12 08:35:04 721381 [B7E518C0] -> osm_sa_mad_ctrl_unbind: ERR 1A11: No > > > previous bind > > > Jul 12 08:35:04 721702 [0000] -> Exiting SM > > > > > > > > > > > > > > > > > > -----Original Message----- > > > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] > > > Sent: Wednesday, July 12, 2006 10:31 AM > > > To: Ravinandan Arakali > > > Cc: openib-general at openib.org > > > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP > > > interface) > > > > > > > > > 
Ravinandan, > > > Do you still see the rping crash? > > > > > > Thanks, > > > Pradipta Kumar. > > > > > > Ravinandan Arakali wrote: > > > > Pradipta, > > > > Okay, thanks.. Initially, I was not sure since I don't remember non-zero > > > > values in /proc/krping. When I re-ran the krping test, I see following > > > > output > > > > openfab2:~ # cat /proc/krping > > > > 1-amso0 891376 55711 891376 55711 1782720 27855 1782784 27856 > > > > > > > > As you mentioned, the RDMA traffic seems to be flowing indeed ! > > > > Any idea why rping is dumping core ? > > > > > > > > Has any testing been done using SDP with ammasso cards ? > > > > > > > > Regards, > > > > Ravi > > > > > > > > > > > > -----Original Message----- > > > > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] > > > > Sent: Friday, July 07, 2006 11:20 PM > > > > To: Ravinandan Arakali > > > > Cc: Leonid. Grossman (E-mail); sivakumar.subramani at neterion.com; > > > > openib-general at openib.org > > > > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP > > > > interface) > > > > > > > > > > > > Ravinandan Arakali wrote: > > > >> Pradipta, > > > >> Following is the output from gdb after core dump. I have also > > copy-pasted > > > >> the gdb output on client system. > > > >> > > > >> Attached is the dmesg output when krping test is run in verbose mode. > > > >> The ping data on the sender(client) seems okay. The content is shifted > > > >> forward by one character for each packet. On receiver, after receiving > > > > ping > > > >> pkt 9, it seems to jump to pkt no. 1935. Not sure if it's because > > > messages > > > >> can be lost during writing to /var/log/messages ? > > > > krping is indeed working!!...Using 'verbose' allows you to see the ping > > > > data. > > > > When not using 'verbose' you see only 'send/recv' messages. 
> > > >> ----------------------------------------- > > > >> (gdb) run -s -vV -C100 -d -a 0.0.0.0 -p 9999 > > > >> Starting program: > > > >> /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -s -vV -C100 -d -a > > > >> 0.0.0.0 -p 9999 > > > >> [Thread debugging using libthread_db enabled] > > > >> [New Thread -1210054992 (LWP 3668)] > > > >> ipaddr (0.0.0.0) > > > >> port 9999 > > > >> created cm_id 0x804e6e0 > > > >> [New Thread -1210057824 (LWP 3671)] > > > >> rdma_bind_addr successful > > > >> rdma_listen > > > >> cma_event type 4 cma_id 0x804e968 (child) > > > >> child cma 0x804e968 > > > >> > > > >> Program received signal SIGSEGV, Segmentation fault. > > > >> [Switching to Thread -1210054992 (LWP 3668)] > > > >> rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 > > > >> 514 cb->pd = ibv_alloc_pd(cm_id->verbs); > > > >> (gdb) bt > > > >> #0 rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 > > > >> #1 0x0804a716 in main (argc=9, argv=Cannot access memory at address > > 0x6 > > > >> ) at examples/rping.c:767 > > > >> (gdb) > > > >> > > > >> --------------------------------- > > > >> (gdb) run -c -vV -C100 -d -a 17.2.2.102 -p 9999 > > > >> Starting program: > > > >> > > tmp/ib/src/userspace/librdmacm/examples/.libs/rping -c -vV -C100 -d -a > > > >> 17.2.2.102 -p 9999 > > > >> [Thread debugging using libthread_db enabled] > > > >> [New Thread 47388824908032 (LWP 4620)] > > > >> ipaddr (17.2.2.102) > > > >> port 9999 > > > >> created cm_id 0x506b00 > > > >> [New Thread 1082132800 (LWP 4623)] > > > >> cma_event type 0 cma_id 0x506b00 (parent) > > > >> cma_event type 2 cma_id 0x506b00 (parent) > > > >> rdma_resolve_addr - rdma_resolve_route successful > > > >> created pd 0x506e60 > > > >> created channel 0x506e80 > > > >> created cq 0x506ea0 > > > >> created qp 0x506f40 > > > >> rping_setup_buffers called on cb 0x505010 > > > >> allocated & registered buffers... > > > >> [New Thread 1090525504 (LWP 4624)] > > > >> cq_thread started. 
> > > >> > > > >> > > > > > > > > > > > > _______________________________________________ > > > > openib-general mailing list > > > > openib-general at openib.org > > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > > To unsubscribe, please visit > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > > > > > > > > > > > _______________________________________________ > > > openib-general mailing list > > > openib-general at openib.org > > > http://openib.org/mailman/listinfo/openib-general > > > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > > > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From ralphc at pathscale.com Fri Jul 14 16:45:19 2006 From: ralphc at pathscale.com (Ralph Campbell) Date: Fri, 14 Jul 2006 16:45:19 -0700 Subject: [openib-general] Suggestions for how to remove bus_to_virt() In-Reply-To: <20060714.153503.123972494.davem@davemloft.net> References: <20060712.174013.95062313.davem@davemloft.net> <20060713054658.GC5096@rhun.ibm.com> <1152916027.4572.391.camel@brick.pathscale.com> <20060714.153503.123972494.davem@davemloft.net> Message-ID: <1152920719.4572.398.camel@brick.pathscale.com> On Fri, 2006-07-14 at 15:35 -0700, David Miller wrote: > From: Ralph Campbell > Date: Fri, 14 Jul 2006 15:27:07 -0700 > > > Note that the current design only supports one IOMMU type in a system. > > This could support multiple IOMMU types at the same time. > > This is not true, the framework allows multiply such types > and in fact Sparc64 takes advantage of this. We have about > 4 or 5 different PCI controllers, and the IOMMUs are slightly > different in each. I see. 
It looks like dma_map_single() is an inline call to pci_map_single() which is a function call that can then look at the device and tell what IOMMU function to call. > Even with the standard PCI DMA mapping calls, we can gather the > platform private information necessary to program the IOMMU > appropriately for a given chipset. > > The dma_mapping_ops idea will never get accepted by folks like Linus, > for reasons I've outlined in previous emails in this thread. So, it's > best to look elsewhere for solutions to your problem, such as the > ideas used by the USB and IEE1394 device layers. The USB code won't work in my case because the USB system is the one doing the memory allocation and IOMMU setup so it can remember the kernel virtual address or physical pages used to create the mapping. In my case, the infiniband (SRP) code is doing the mapping and only passing the dma_addr_t to the device driver at which point I have no way to convert it back to a kernel virtual address. I need to either change the IB device API to include mapping functions or intercept the dma_* functions so I can save the inputs. From swise at opengridcomputing.com Fri Jul 14 20:08:09 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 14 Jul 2006 22:08:09 -0500 Subject: [openib-general] ping problem with ammassocards(iWARPinterface) References: <001701c6a795$fe1c96d0$4710100a@pc.s2io.com> Message-ID: <002301c6a7bb$e70d4f90$020010ac@haggard> ----- Original Message ----- From: "Ravinandan Arakali" To: "'Steve Wise'" Cc: ; ; "Leonid. Grossman (E-mail)" Sent: Friday, July 14, 2006 5:36 PM Subject: RE: [openib-general] ping problem with ammassocards(iWARPinterface) > As Pradipta suggested, I rebuilt the libraries by removing > the optimization(-O2 flag) from Makefile. Now, I don't see the > core dump but there's no connection established with rping. > This is similar to the failure I am seeing with rdma_lat test. 
> > BTW, when I start the rping in server mode, at say port 9999, > should I expect to see an entity listening on that port number > when I do "netstat -an". Currently, I don't see that. > No, netstat doesn't show rdma information.... From stefanr at s5r6.in-berlin.de Sat Jul 15 06:42:32 2006 From: stefanr at s5r6.in-berlin.de (Stefan Richter) Date: Sat, 15 Jul 2006 15:42:32 +0200 Subject: [openib-general] Suggestions for how to remove bus_to_virt() In-Reply-To: <1152920719.4572.398.camel@brick.pathscale.com> References: <20060712.174013.95062313.davem@davemloft.net> <20060713054658.GC5096@rhun.ibm.com> <1152916027.4572.391.camel@brick.pathscale.com> <20060714.153503.123972494.davem@davemloft.net> <1152920719.4572.398.camel@brick.pathscale.com> Message-ID: <44B8F0C8.70103@s5r6.in-berlin.de> Ralph Campbell wrote: > On Fri, 2006-07-14 at 15:35 -0700, David Miller wrote: ... >> The dma_mapping_ops idea will never get accepted by folks like Linus, >> for reasons I've outlined in previous emails in this thread. So, it's >> best to look elsewhere for solutions to your problem, such as the >> ideas used by the USB and IEE1394 device layers. > > The USB code won't work in my case because the USB system is > the one doing the memory allocation and IOMMU setup so it > can remember the kernel virtual address or physical pages used > to create the mapping. Side note: The same is true with the DMA stuff in the ieee1394 subsystem. And the SCSI subsystem doesn't allocate (all) buffers but leaves DMA mapping and unmapping to the low-level drivers --- i.e. Ralph can't rip bus_to_virt replacements from there either, because: > In my case, the infiniband (SRP) code is doing the mapping and > only passing the dma_addr_t to the device driver at which point > I have no way to convert it back to a kernel virtual address. > I need to either change the IB device API to include mapping functions > or intercept the dma_* functions so I can save the inputs. 
On the other hand, ieee1394/dma is the rather obvious example of a generic layer which keeps book of virtual address and bus address of mapped memory regions, for above or below layers to use as they need. Ralph, do you think you can arrange your required API change as a pure _extension_ of the IB API? I.e. add fields to data structs or add fields to callback templates or add calls into the SRP layer... (I haven't bothered to look at the API yet.) -- Stefan Richter -=====-=-==- -=== -==== http://arcgraph.de/sr/ From swise at opengridcomputing.com Sat Jul 15 08:50:05 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Sat, 15 Jul 2006 10:50:05 -0500 Subject: [openib-general] ping problem with ammassocards(iWARPinterface) In-Reply-To: <002301c6a7bb$e70d4f90$020010ac@haggard> References: <001701c6a795$fe1c96d0$4710100a@pc.s2io.com> <002301c6a7bb$e70d4f90$020010ac@haggard> Message-ID: <1152978605.4658.22.camel@stevo-desktop> Why don't you mirror the ethernet ports in question on your switch and run tcpdump to capture the packets? That would give us a clue as to how far the connection setup is getting... NOTE: You cannot use tcpdump on the iw interface and expect to see the packets. You need to run it on a 3rd system... Steve. On Fri, 2006-07-14 at 22:08 -0500, Steve Wise wrote: > ----- Original Message ----- > From: "Ravinandan Arakali" > To: "'Steve Wise'" > Cc: ; ; "Leonid. Grossman > (E-mail)" > Sent: Friday, July 14, 2006 5:36 PM > Subject: RE: [openib-general] ping problem with > ammassocards(iWARPinterface) > > > > As Pradipta suggested, I rebuilt the libraries by removing > > the optimization(-O2 flag) from Makefile. Now, I don't see the > > core dump but there's no connection established with rping. > > This is similar to the failure I am seeing with rdma_lat test. > > > > BTW, when I start the rping in server mode, at say port 9999, > > should I expect to see an entity listening on that port number > > when I do "netstat -an". 
Currently, I don't see that. > > > > No, netstat doesn't show rdma information.... > From or.gerlitz at gmail.com Sat Jul 15 13:40:49 2006 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Sat, 15 Jul 2006 22:40:49 +0200 Subject: [openib-general] [PATCH] fmr pool: remove unnecessary pointer dereference In-Reply-To: <20060713153344.GA22648@mellanox.co.il> References: <20060709090521.GB2609@mellanox.co.il> <20060713153344.GA22648@mellanox.co.il> Message-ID: <15ddcffd0607151340p4c6b3a3dpa3ad9f368b252177@mail.gmail.com> On 7/13/06, Michael S. Tsirkin wrote: > Andrew, could you please drop the following into -mm and on to Linus? Have you committed this to the openib SVN? Or. From mst at mellanox.co.il Sun Jul 16 06:12:42 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 16 Jul 2006 16:12:42 +0300 Subject: [openib-general] [PATCH] librdmacm: add missing include Message-ID: <20060716131242.GA16108@mellanox.co.il> cma.c uses the offsetof macro, so it must include stddef.h to pull in the definition. Index: openib/src/userspace/librdmacm/src/cma.c =================================================================== --- openib/src/userspace/librdmacm/src/cma.c (revision 8501) +++ openib/src/userspace/librdmacm/src/cma.c (working copy) @@ -48,6 +48,7 @@ #include #include #include +#include <stddef.h> #include -- MST From mst at mellanox.co.il Sun Jul 16 07:10:18 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 16 Jul 2006 17:10:18 +0300 Subject: [openib-general] svn updated from upstream In-Reply-To: <15ddcffd0607151340p4c6b3a3dpa3ad9f368b252177@mail.gmail.com> References: <15ddcffd0607151340p4c6b3a3dpa3ad9f368b252177@mail.gmail.com> Message-ID: <20060716141018.GA16128@mellanox.co.il> Quoting r.
Or Gerlitz : > Subject: Re: [PATCH] fmr pool: remove unnecessary pointer dereference > > On 7/13/06, Michael S. Tsirkin wrote: > > Andrew, could you please drop the following into -mm and on to Linus? OK, by popular demand :) I have put the following patches from upstream in svn: r8530 | mst | 2006-07-16 17:05:12 +0300 (Sun, 16 Jul 2006) | 5 lines Merge from upstream: remove an extra pointer dereference from the fmr interface Signed-off-by: Michael S. Tsirkin ------------------------------------------------------------------------ r8529 | mst | 2006-07-16 17:01:20 +0300 (Sun, 16 Jul 2006) | 8 lines mthca_ah_query returns the static rate of the address handle in internal mthca format. Fix it to use rate encoding from enum ib_rate, which is what users expect. Signed-off-by: Jack Morgenstein Signed-off-by: Michael S. Tsirkin ------------------------------------------------------------------------ r8528 | mst | 2006-07-16 17:00:10 +0300 (Sun, 16 Jul 2006) | 5 lines IB/sa: merge from upstream: use gfp_mask specified by user for idr allocations Signed-off-by: Michael S. Tsirkin ------------------------------------------------------------------------ r8527 | mst | 2006-07-16 16:54:05 +0300 (Sun, 16 Jul 2006) | 12 lines Already applied in 2.6.18. [PATCH] srp: fix fmr error handling srp_unmap_data assumes req->fmr is NULL if the request is not mapped, so we must clean it out in case of an error. Signed-off-by: Vu Pham Signed-off-by: Michael S. Tsirkin Acked-by: Roland Dreier ------------------------------------------------------------------------ There were also lockdep-related fixes in mthca but I haven't had the time to tackle them yet.
-- MST From eitan at mellanox.co.il Sun Jul 16 10:18:00 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Sun, 16 Jul 2006 20:18:00 +0300 Subject: [openib-general] OFED 1.1 release - schedule and features Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3028430B0@mtlexch01.mtl.com> Hi Roland, We are trying this approach and will probably be done with it tomorrow. So I guess Vlad will be able to update the group soon. Eitan Zahavi > -----Original Message----- > From: Roland Dreier > Sent: Thursday, July 13, 2006 11:11 PM > To: Or Gerlitz > Cc: Tziporet Koren; OpenFabricsEWG; openib > Subject: Re: [openib-general] OFED 1.1 release - schedule and features > > > So if the link which ib0 maps to is DOWN you move the ib0 IPv4 address > to > another device whose link is UP (eg ib1) and you somehow have ib1 > send a > gratuitous ARP? > > I think there may be a problem in the way IPoIB deals with gratuitous ARPs. Because > if a neighbour structure is updated by the networking core, there's no way for IPoIB > to know about that and update the associated IB path. > > Has anyone actually tried this failover approach? > > - R. From anton at samba.org Mon Jul 17 02:38:53 2006 From: anton at samba.org (Anton Blanchard) Date: Mon, 17 Jul 2006 19:38:53 +1000 Subject: [openib-general] Fix some suspicious ppc64 code in dapl In-Reply-To: References: <20060604002200.GB986@krispykreme> Message-ID: <20060717093853.GA17769@krispykreme> Hi, > Thank you Anton.
Could you reply with a signed off by line? I'll > properly attribute this fix to you in the commit log. Sorry, I let this one slip. Here it is. Anton -- I was reading through the ppc64-specific code in dapl/ and noticed some suspicious inline assembly. - EIEIO_ON_SMP and ISYNC_ON_SMP are in-kernel UP build optimisations; we shouldn't export them to userspace. Replace them with lwsync and isync. - The comment says it's implementing cmpxchg64 but in fact it's implementing cmpxchg32. Fix the comment. Signed-off-by: Anton Blanchard Index: dapl/udapl/linux/dapl_osd.h =================================================================== --- dapl/udapl/linux/dapl_osd.h (revision 7621) +++ dapl/udapl/linux/dapl_osd.h (working copy) @@ -238,14 +238,13 @@ #endif /* __ia64__ */ #elif defined(__PPC64__) __asm__ __volatile__ ( - EIEIO_ON_SMP -"1: lwarx %0,0,%2 # __cmpxchg_u64\n\ - cmpd 0,%0,%3\n\ +" lwsync\n\ +1: lwarx %0,0,%2 # __cmpxchg_u32\n\ + cmpw 0,%0,%3\n\ bne- 2f\n\ stwcx. %4,0,%2\n\ - bne- 1b" - ISYNC_ON_SMP - "\n\ + bne- 1b\n\ + isync\n\ 2:" : "=&r" (current_value), "=m" (*v) : "r" (v), "r" (match_value), "r" (new_value), "m" (*v) From k_mahesh85 at yahoo.co.in Mon Jul 17 05:01:20 2006 From: k_mahesh85 at yahoo.co.in (keshetti mahesh) Date: Mon, 17 Jul 2006 13:01:20 +0100 (BST) Subject: [openib-general] affiliated asynchronous errors Message-ID: <20060717120120.91563.qmail@web8324.mail.in.yahoo.com> Hi all, I have a query. If an affiliated asynchronous error occurs while there are still outstanding work requests on the work queue, are flush error completions generated for those work requests or not? Suppose SDP does not register any event handler when creating the QP. If an affiliated asynchronous error happens, how will SDP come to know about it, and what happens to the outstanding work requests on the QP? Thanks, Mahesh
From halr at voltaire.com Mon Jul 17 06:46:03 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 17 Jul 2006 09:46:03 -0400 Subject: [openib-general] [PATCH][MINOR] OpenSM/osm_sa_path_record.c: In __osm_pr_rcv_get_lid_pair_path, only get reversible path parameters if requested Message-ID: <1153143963.4540.12995.camel@hal.voltaire.com> OpenSM/osm_sa_path_record.c: In __osm_pr_rcv_get_lid_pair_path, only get reversible path parameters if requested Signed-off-by: Hal Rosenstock Index: opensm/osm_sa_path_record.c =================================================================== --- opensm/osm_sa_path_record.c (revision 8538) +++ opensm/osm_sa_path_record.c (working copy) @@ -694,6 +694,12 @@ __osm_pr_rcv_get_lid_pair_path( see Vol1 Ver1.2 p900 l16 */ if( comp_mask & IB_PR_COMPMASK_REVERSIBLE ) + { + /* now try the reversible path */ + rev_path_status = __osm_pr_rcv_get_path_parms( p_rcv, p_pr, p_dest_port, + p_src_port, src_lid_ho, + comp_mask, &rev_path_parms ); + path_parms.reversible = (rev_path_status == IB_SUCCESS); if( (! path_parms.reversible && ( p_pr->num_path & 0x80 ) ) ) { osm_log( p_rcv->p_log, OSM_LOG_DEBUG, @@ -703,7 +709,8 @@ __osm_pr_rcv_get_lid_pair_path( cl_qlock_pool_put( &p_rcv->pr_pool, &p_pr_item->pool_item ); p_pr_item = NULL; goto Exit; - }; + } + } __osm_pr_rcv_build_pr( p_rcv, p_src_port, p_dest_port, src_lid_ho, dest_lid_ho, preference, &path_parms, From mst at mellanox.co.il Mon Jul 17 07:03:49 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 17 Jul 2006 17:03:49 +0300 Subject: [openib-general] [PATCH] ipoib: fix address update handling (was Re: OFED 1.1 release - schedule and features) In-Reply-To: References: Message-ID: <20060717140349.GA19080@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: OFED 1.1 release - schedule and features > > > So if the link which ib0 maps to is DOWN you move the ib0 IPv4 address > > to another device whose link is UP (eg ib1) and you somehow have ib1 > > send a gratuitous ARP? > > I think there may be a problem in the way IPoIB deals with gratuitous > ARPs. Because if a neighbour structure is updated by the networking > core, there's no way for IPoIB to know about that and update the > associated IB path. > > Has anyone actually tried this failover approach? > > - R. OK, we've seen the problem here - and here's a patch to fix it. Seems to work fine here - I'll let it run for a day just to make sure. How does it look? --- The neighbour ha field may get updated without destroying the neighbour. In this case, the ha field gets out of sync with the address handle stored in ipoib_neigh->ah, with the result that the ah field would point to an incorrect path, resulting in all packets being lost. Signed-off-by: Michael S. Tsirkin diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 3f89f5e..474aa21 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -212,6 +212,7 @@ struct ipoib_path { struct ipoib_neigh { struct ipoib_ah *ah; + union ib_gid dgid; struct sk_buff_head queue; struct neighbour *neighbour; diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 1c6ea1c..cf71d2a 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -404,6 +404,8 @@ static void path_rec_completion(int stat list_for_each_entry(neigh, &path->neigh_list, list) { kref_get(&path->ah->ref); neigh->ah = path->ah; + memcpy(&neigh->dgid.raw, &path->pathrec.dgid.raw, + sizeof(union ib_gid)); while ((skb = __skb_dequeue(&neigh->queue))) __skb_queue_tail(&skqueue, skb); @@ -510,6 +512,8 @@ static void neigh_add_path(struct sk_buf if (path->ah) {
Roland Dreier : > Subject: Re: OFED 1.1 release - schedule and features > > > So if the link which ib0 maps to is DOWN you move the ib0 IPv4 address > > to another device whose link is UP (eg ib1) and you somehow have ib1 > > send a gratuitous ARP? > > I think there may be a problem in the way IPoIB deals with gratuitous > ARPs. Because if a neighbour structure is updated by the networking > core, there's no way for IPoIB to know about that and update the > associated IB path. > > Has anyone actually tried this failover approach? > > - R. OK, we've seen the problem here - and here's a patch to fix it. Seems to work fine here - I'll let it run for a day just to make sure. How does it look? --- The neighbour ha field may get updated without destroying the neighbour. In this case, the ha field gets out of sync with the address handle stored in ipoib_neigh->ah, with the result that the ah field would point to an incorrect path, resulting in all packets being lost. Signed-off-by: Michael S. Tsirkin diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 3f89f5e..474aa21 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -212,6 +212,7 @@ struct ipoib_path { struct ipoib_neigh { struct ipoib_ah *ah; + union ib_gid dgid; struct sk_buff_head queue; struct neighbour *neighbour; diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 1c6ea1c..cf71d2a 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -404,6 +404,8 @@ static void path_rec_completion(int stat list_for_each_entry(neigh, &path->neigh_list, list) { kref_get(&path->ah->ref); neigh->ah = path->ah; + memcpy(&neigh->dgid.raw, &path->pathrec.dgid.raw, + sizeof(union ib_gid)); while ((skb = __skb_dequeue(&neigh->queue))) __skb_queue_tail(&skqueue, skb); @@ -510,6 +512,8 @@ static void neigh_add_path(struct sk_buf if (path->ah) { 
kref_get(&path->ah->ref); neigh->ah = path->ah; + memcpy(&neigh->dgid.raw, &path->pathrec.dgid.raw, + sizeof(union ib_gid)); ipoib_send(dev, skb, path->ah, be32_to_cpup((__be32 *) skb->dst->neighbour->ha)); @@ -633,6 +637,25 @@ static int ipoib_start_xmit(struct sk_bu neigh = *to_ipoib_neigh(skb->dst->neighbour); if (likely(neigh->ah)) { + if (unlikely(memcmp(&neigh->dgid.raw, + skb->dst->neighbour->ha + 4, + sizeof(union ib_gid)))) { + spin_lock(&priv->lock); + /* + * It's safe to call ipoib_put_ah() inside + * priv->lock here, because we know that + * path->ah will always hold one more reference, + * so ipoib_put_ah() will never do more than + * decrement the ref count. + */ + ipoib_put_ah(neigh->ah); + list_del(&neigh->list); + ipoib_neigh_free(neigh); + spin_unlock(&priv->lock); + ipoib_path_lookup(skb, dev); + goto out; + } + ipoib_send(dev, skb, neigh->ah, be32_to_cpup((__be32 *) skb->dst->neighbour->ha)); goto out; -- MST From mst at mellanox.co.il Mon Jul 17 07:32:38 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 17 Jul 2006 17:32:38 +0300 Subject: [openib-general] problem report: uninterruptible sleep in ib_uverbs Message-ID: <20060717143238.GB19080@mellanox.co.il> Hi! With recent git code, an application that allocated a huge number of PDs got blocked during cleanup. Here's part of the sysrq dump: gen2_basic D ffff8100d9e9be58 0 4977 4846 (NOTLB) ffff8100d9e9be58 ffff81011f0901e0 ffff81000000e400 ffff81000102fed8 0000000000000002 000000020000000a ffff81011bf32730 ffff81011bf32908 0000000000009372 000138bb2a26cb8a ffff81011c0dc0c0 0000000000000000 Call Trace: [] __down_write_nested+0x82/0x9a [] :ib_uverbs:idr_write_uobj+0x19/0x2e [] :ib_uverbs:ib_uverbs_dealloc_pd+0x41/0xd2 [] :ib_uverbs:ib_uverbs_write+0xc4/0xd3 [<000000000000000c>] I didn't observe this before the locking changes in core. -- MST From mst at mellanox.co.il Mon Jul 17 07:40:36 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Mon, 17 Jul 2006 17:40:36 +0300 Subject: [openib-general] problem report: uninterruptible sleep in ib_uverbs In-Reply-To: <20060717143238.GB19080@mellanox.co.il> References: <20060717143238.GB19080@mellanox.co.il> Message-ID: <20060717144036.GA19459@mellanox.co.il> Quoting r. Michael S. Tsirkin : > Subject: problem report: uninterruptible sleep in ib_uverbs > > Hi! > With recent git code, an application that allocated a huge number of PDs got > blocked during cleanup. Sorry, that should have been: a huge number of SRQs. -- MST From minich at ornl.gov Mon Jul 17 08:00:52 2006 From: minich at ornl.gov (Makia Minich) Date: Mon, 17 Jul 2006 11:00:52 -0400 Subject: [openib-general] OFED and SuSE issue Message-ID: I think that I've gone through and successfully built everything I needed to against the SuSE kernel, but I'm running into an issue when I load ib_mthca: ib_mthca: Unknown symbol bad_dma_address This is an x86_64 box with SuSE Enterprise 9 (2.6.5 kernel). Does anyone have a pointer on what I'm missing? -- Makia Minich From mst at mellanox.co.il Mon Jul 17 08:02:05 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 17 Jul 2006 18:02:05 +0300 Subject: [openib-general] RFC: deadlock in uverbs_cmd.c Message-ID: <20060717150205.GB19459@mellanox.co.il> Hello! uverbs_cmd currently locks both send and receive cq with down_write during ib_uverbs_create_qp operation. It seems that it is easy to create a deadlock here by creating 2 QPs: QP1 - send completions go to CQ1, receive completions go to CQ2 QP2 - send completions go to CQ2, receive completions go to CQ1 Comments? Further, Roland, would you mind writing up something about the lock nesting in uverbs? -- MST From rdreier at cisco.com Mon Jul 17 08:04:56 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 17 Jul 2006 08:04:56 -0700 Subject: [openib-general] RFC: deadlock in uverbs_cmd.c In-Reply-To: <20060717150205.GB19459@mellanox.co.il> (Michael S. 
Tsirkin's message of "Mon, 17 Jul 2006 18:02:05 +0300") References: <20060717150205.GB19459@mellanox.co.il> Message-ID: Michael> Hello! uverbs_cmd currently locks both send and receive Michael> cq with down_write during ib_uverbs_create_qp operation. It does? It seems that the code in question is: scq = idr_read_cq(cmd.send_cq_handle, file->ucontext); rcq = idr_read_cq(cmd.recv_cq_handle, file->ucontext); which should use down_read() and be safe. Michael> Comments? Further, Roland, would you mind writing up Michael> something about the lock nesting in uverbs? OK, I'll add that to my todo list. - R. From rdreier at cisco.com Mon Jul 17 08:06:07 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 17 Jul 2006 08:06:07 -0700 Subject: [openib-general] problem report: uninterruptible sleep in ib_uverbs In-Reply-To: <20060717143238.GB19080@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 17 Jul 2006 17:32:38 +0300") References: <20060717143238.GB19080@mellanox.co.il> Message-ID: Michael> Hi! With recent git code, an application that allocated Michael> a huge number of PDs got blocked during cleanup. Here's Michael> part of the sysrq dump: Hmm, I'll take a look. With sufficient mutex debugging options turned on, you should get a list of who is holding which locks from sysrq-t, which would be helpful info. - R. From rdreier at cisco.com Mon Jul 17 08:08:16 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 17 Jul 2006 08:08:16 -0700 Subject: [openib-general] problem report: uninterruptible sleep in ib_uverbs In-Reply-To: <20060717143238.GB19080@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 17 Jul 2006 17:32:38 +0300") References: <20060717143238.GB19080@mellanox.co.il> Message-ID: BTW, can this be reproduced with gen2_basic from svn? What parameters (if any) are necessary to duplicate it? - R. From mst at mellanox.co.il Mon Jul 17 08:12:01 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Mon, 17 Jul 2006 18:12:01 +0300 Subject: [openib-general] problem report: uninterruptible sleep in ib_uverbs In-Reply-To: <20060717144036.GA19459@mellanox.co.il> References: <20060717144036.GA19459@mellanox.co.il> Message-ID: <20060717151201.GD19459@mellanox.co.il> Quoting r. Michael S. Tsirkin : > Subject: Re: problem report: uninterruptible sleep in ib_uverbs > > Quoting r. Michael S. Tsirkin : > > Subject: problem report: uninterruptible sleep in ib_uverbs > > > > Hi! > > With recent git code, an application that allocated a huge number of PDs got > > blocked during cleanup. > > Sorry, that should have been: a huge number of SRQs. Looks like uverbs_cmd didn't unlock PD in some error paths. Comments? -- uverbs_cmd: left some mutexes locked on error. Signed-off-by: Michael S. Tsirkin diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index bdf5d50..0371806 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -1775,7 +1775,7 @@ ssize_t ib_uverbs_create_ah(struct ib_uv ah = ib_create_ah(pd, &attr); if (IS_ERR(ah)) { ret = PTR_ERR(ah); - goto err; + goto err_put; } ah->uobject = uobj; @@ -1811,6 +1811,9 @@ err_copy: err_destroy: ib_destroy_ah(ah); +err_put: + put_pd_read(pd); + err: put_uobj_write(uobj); return ret; @@ -1984,7 +1987,7 @@ ssize_t ib_uverbs_create_srq(struct ib_u srq = pd->device->create_srq(pd, &attr, &udata); if (IS_ERR(srq)) { ret = PTR_ERR(srq); - goto err; + goto err_put; } srq->device = pd->device; @@ -2029,6 +2032,9 @@ err_copy: err_destroy: ib_destroy_srq(srq); +err_put: + put_pd_read(pd); + err: put_uobj_write(&obj->uobject); return ret; -- MST From mst at mellanox.co.il Mon Jul 17 08:10:04 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 17 Jul 2006 18:10:04 +0300 Subject: [openib-general] RFC: deadlock in uverbs_cmd.c In-Reply-To: References: Message-ID: <20060717151004.GC19459@mellanox.co.il> Quoting r. 
Roland Dreier : > Subject: Re: RFC: deadlock in uverbs_cmd.c > > Michael> Hello! uverbs_cmd currently locks both send and receive > Michael> cq with down_write during ib_uverbs_create_qp operation. > > It does? It seems that the code in question is: > > scq = idr_read_cq(cmd.send_cq_handle, file->ucontext); > rcq = idr_read_cq(cmd.recv_cq_handle, file->ucontext); > > which should use down_read() and be safe. Bzzt. Sorry. > Michael> Comments? Further, Roland, would you mind writing up > Michael> something about the lock nesting in uverbs? > > OK, I'll add that to my todo list. > > - R. > -- MST From mst at mellanox.co.il Mon Jul 17 08:13:51 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 17 Jul 2006 18:13:51 +0300 Subject: [openib-general] problem report: uninterruptible sleep in ib_uverbs In-Reply-To: References: Message-ID: <20060717151351.GE19459@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: problem report: uninterruptible sleep in ib_uverbs > > BTW, can this be reproduced with gen2_basic from svn? What parameters > (if any) are necessary to duplicate it? Donnu. Dotan here will upload this test shortly. 
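For readers following the thread: the error-path fix proposed for this hang (jump to a new err_put label so the PD read reference is dropped before the uobject is released) can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct pd, struct obj, create_resource() and the get/put helpers below are hypothetical stand-ins that use counters in place of real locks. They are not the uverbs code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects (assumption, not uverbs types). */
struct pd  { int readers; };     /* number of read references currently held */
struct obj { int write_held; };  /* 1 while the object is write-locked */

static void get_pd_read(struct pd *pd)   { pd->readers++; }
static void put_pd_read(struct pd *pd)   { pd->readers--; }
static void get_obj_write(struct obj *o) { o->write_held = 1; }
static void put_obj_write(struct obj *o) { o->write_held = 0; }

/* Simulated resource creation: 0 on success, -1 on failure. */
static int create_resource(int should_fail) { return should_fail ? -1 : 0; }

int create_with_unwind(struct pd *pd, struct obj *o, int should_fail)
{
	int ret;

	assert(o != NULL);
	get_obj_write(o);

	if (pd == NULL) {
		ret = -1;
		goto err;        /* nothing taken on the pd yet */
	}
	get_pd_read(pd);

	ret = create_resource(should_fail);
	if (ret)
		goto err_put;    /* "goto err" here would leak the pd reference */

	/* Success: release in reverse order of acquisition. */
	put_pd_read(pd);
	put_obj_write(o);
	return 0;

err_put:
	put_pd_read(pd);     /* undo get_pd_read() */
err:
	put_obj_write(o);    /* undo get_obj_write() */
	return ret;
}
```

With a zeroed pd and obj, create_with_unwind(&pd, &obj, 1) returns -1 and leaves pd.readers at 0. If the failure branch jumped to err instead, the counter would stay at 1, which in this model is the leaked read lock that later blocks a writer.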
-- MST From halr at voltaire.com Mon Jul 17 08:09:22 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 17 Jul 2006 11:09:22 -0400 Subject: [openib-general] {PATCH][MINOR] OpenSM/osm_sa_path_record.c: In __osm_pr_rcv_get_path_parms, fix endian of port GUID and indicate port number in some osm_log messages Message-ID: <1153148961.4540.13388.camel@hal.voltaire.com> OpenSM/osm_sa_path_record.c: In __osm_pr_rcv_get_path_parms, fix endian of port GUID and indicate port number in some osm_log messages Signed-off-by: Hal Rosenstock Index: opensm/osm_sa_path_record.c =================================================================== --- opensm/osm_sa_path_record.c (revision 8540) +++ opensm/osm_sa_path_record.c (working copy) @@ -322,9 +322,11 @@ __osm_pr_rcv_get_path_parms( { osm_log( p_rcv->p_log, OSM_LOG_DEBUG, "__osm_pr_rcv_get_path_parms: " - "New smallest MTU = %u at intervening port 0x%016" PRIx64 "\n", + "New smallest MTU = %u at intervening port 0x%016" PRIx64 + " port num: 0x%X\n", mtu, - osm_physp_get_port_guid( p_physp ) ); + cl_ntoh64( osm_physp_get_port_guid( p_physp ) ), + osm_physp_get_port_num( p_physp ) ); } } @@ -335,9 +337,11 @@ __osm_pr_rcv_get_path_parms( { osm_log( p_rcv->p_log, OSM_LOG_DEBUG, "__osm_pr_rcv_get_path_parms: " - "New smallest rate = %u at intervening port 0x%016" PRIx64 "\n", + "New smallest rate = %u at intervening port 0x%016" PRIx64 + " port num: 0x%X\n", rate, - osm_physp_get_port_guid( p_physp ) ); + cl_ntoh64( osm_physp_get_port_guid( p_physp ) ), + osm_physp_get_port_num( p_physp ) ); } } @@ -369,9 +373,11 @@ __osm_pr_rcv_get_path_parms( { osm_log( p_rcv->p_log, OSM_LOG_DEBUG, "__osm_pr_rcv_get_path_parms: " - "New smallest MTU = %u at intervening port 0x%016" PRIx64 "\n", + "New smallest MTU = %u at intervening port 0x%016" PRIx64 + " port num: 0x%X\n", mtu, - osm_physp_get_port_guid( p_physp ) ); + cl_ntoh64( osm_physp_get_port_guid( p_physp ) ), + osm_physp_get_port_num( p_physp ) ); } } @@ -382,9 +388,11 @@ 
__osm_pr_rcv_get_path_parms( { osm_log( p_rcv->p_log, OSM_LOG_DEBUG, "__osm_pr_rcv_get_path_parms: " - "New smallest rate = %u at intervening port 0x%016" PRIx64 "\n", + "New smallest rate = %u at intervening port 0x%016" PRIx64 + " port num: 0x%X\n", rate, - osm_physp_get_port_guid( p_physp ) ); + cl_ntoh64( osm_physp_get_port_guid( p_physp ) ), + osm_physp_get_port_num( p_physp ) ); } } From mst at mellanox.co.il Mon Jul 17 08:20:51 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 17 Jul 2006 18:20:51 +0300 Subject: [openib-general] [PATCH] fix error handling in uverbs_cmd Message-ID: <20060717152051.GA20065@mellanox.co.il> Reposting. The following seems to fix the problems in destroy_pd I reported earlier. --- uverbs_cmd: left some mutexes locked on error. Signed-off-by: Michael S. Tsirkin diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index bdf5d50..0371806 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -1775,7 +1775,7 @@ ssize_t ib_uverbs_create_ah(struct ib_uv ah = ib_create_ah(pd, &attr); if (IS_ERR(ah)) { ret = PTR_ERR(ah); - goto err; + goto err_put; } ah->uobject = uobj; @@ -1811,6 +1811,9 @@ err_copy: err_destroy: ib_destroy_ah(ah); +err_put: + put_pd_read(pd); + err: put_uobj_write(uobj); return ret; @@ -1984,7 +1987,7 @@ ssize_t ib_uverbs_create_srq(struct ib_u srq = pd->device->create_srq(pd, &attr, &udata); if (IS_ERR(srq)) { ret = PTR_ERR(srq); - goto err; + goto err_put; } srq->device = pd->device; @@ -2029,6 +2032,9 @@ err_copy: err_destroy: ib_destroy_srq(srq); +err_put: + put_pd_read(pd); + err: put_uobj_write(&obj->uobject); return ret; -- MST From dotanb at mellanox.co.il Mon Jul 17 08:24:42 2006 From: dotanb at mellanox.co.il (Dotan Barak) Date: Mon, 17 Jul 2006 18:24:42 +0300 Subject: [openib-general] problem report: uninterruptible sleep in ib_uverbs In-Reply-To: <20060717151351.GE19459@mellanox.co.il> References: 
<20060717151351.GE19459@mellanox.co.il> Message-ID: <200607171824.42789.dotanb@mellanox.co.il> On Monday 17 July 2006 18:13, Michael S. Tsirkin wrote: > Quoting r. Roland Dreier : > > Subject: Re: problem report: uninterruptible sleep in ib_uverbs > > > > BTW, can this be reproduced with gen2_basic from svn? What parameters > > (if any) are necessary to duplicate it? > > Donnu. Dotan here will upload this test shortly. Hi. I just updated gen2_basic with our most recent version (there weren't many changes). Command line: % ./gen2_basic -c=SRQ -t=9 Dotan From rdreier at cisco.com Mon Jul 17 08:33:32 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 17 Jul 2006 08:33:32 -0700 Subject: [openib-general] [PATCH] fix error handling in uverbs_cmd In-Reply-To: <20060717152051.GA20065@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 17 Jul 2006 18:20:51 +0300") References: <20060717152051.GA20065@mellanox.co.il> Message-ID: Looks like a correct fix. Thanks, applied and queued for 2.6.18 (I guess I'm back). - R. From suri at baymicrosystems.com Mon Jul 17 08:45:20 2006 From: suri at baymicrosystems.com (Suresh Shelvapille) Date: Mon, 17 Jul 2006 11:45:20 -0400 Subject: [openib-general] Perf_main for RDMA In-Reply-To: Message-ID: <200607171545.k6HFjMfU032696@mail.baymicrosystems.com> Folks: Has anyone run perf_main (RDMA) tests simulating high-latency networks? I am trying to simulate RDMA over WAN and am looking for some benchmarks to compare with. Any pointers to links would be great.
Many thanks in advance, Suri From rdreier at cisco.com Mon Jul 17 08:57:09 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 17 Jul 2006 08:57:09 -0700 Subject: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant In-Reply-To: <20060713183047.642bd9e6.akpm@osdl.org> (Andrew Morton's message of "Thu, 13 Jul 2006 18:30:47 -0700") References: <44B405C8.4040706@oracle.com> <44B433CE.1030103@oracle.com> <20060712093820.GA9218@elte.hu> <20060712183049.bcb6c404.akpm@osdl.org> <20060713135446.5e2c6dd5.akpm@osdl.org> <20060713144341.97d4f771.akpm@osdl.org> <20060713181835.ad5eeff6.akpm@osdl.org> <20060713183047.642bd9e6.akpm@osdl.org> Message-ID: > Problem is, I think, you'll need to preallocate IDR_FREE_MAX items. And > then free them all again when none of them were consumed (usual). Actually I think it's OK if we just pass in no more than one extra layer for each try to add something with idr_get_new(). In the worst case, this leads to a lot of extra calls to idr_get_new(), but in the usual case it's fine. I'm including a lightly tested big patch with all my idr changes for comments -- I'll split it up into a form more suitable for merging. (Some of the changes are unrelated and obviously good, eg using kmem_cache_zalloc() instead of a slab cache with a constructor that does memset(0)). I'm not sure I'm thrilled with this approach, but it does seem to be a net win. With an allyesconfig with debugging options turned off (so spinlocks shrink back down to 8 bytes), I get the following: text data bss dec hex filename 24347759 5971210 2322176 32641145 1f21079 vmlinux.old 24347370 5970474 2320704 32638548 1f20654 vmlinux.new Most of the savings comes from ocfs2, which has a static array of 255 structures that each contain an idr -- so removing the lock from struct idr saves 255 * 8 = 2040 bytes. 
However, even without factoring that in, this does seem to be a net win: add/remove: 2/4 grow/shrink: 23/51 up/down: 719/-3215 (-2496) function old new delta idr_get_new_above 38 554 +516 dm_create 957 1000 +43 ipath_init_one 3294 3329 +35 get_layer - 32 +32 idr_alloc_layer - 16 +16 sd_probe 871 881 +10 rtc_device_register 493 503 +10 proc_register 277 286 +9 mmc_add_host_sysfs 126 135 +9 idr_add_uobj 80 85 +5 cma_alloc_port 224 229 +5 sys_timer_create 876 880 +4 set_anon_super 173 177 +4 sctp_process_init 1312 1316 +4 ib_ucm_ctx_alloc 197 201 +4 ib_create_cm_id 287 290 +3 proc_mkdir_mode 95 97 +2 vfs_kern_mount 279 280 +1 send_mad 325 326 +1 proc_symlink 141 142 +1 proc_file_write 40 41 +1 kill_block_super 56 57 +1 get_sb_single 175 176 +1 free_proc_entry 108 109 +1 deactivate_super 126 127 +1 proc_readdir 353 352 -1 proc_getattr 40 39 -1 get_sb_nodev 150 149 -1 o2net_send_message_vec 2032 2030 -2 hwmon_device_register 198 196 -2 create_proc_entry 170 168 -2 __put_super_and_need_restart 54 52 -2 v9fs_read_work 1424 1421 -3 v9fs_mux_init 1333 1330 -3 v9fs_mux_flush_cb 303 300 -3 v9fs_mux_destroy 369 366 -3 v9fs_mux_cancel 382 379 -3 o2net_init 441 438 -3 inotify_add_watch 285 280 -5 v9fs_session_init 1490 1484 -6 unnamed_dev_idr 32 24 -8 unit_table 32 24 -8 tcp_ps 32 24 -8 sdp_ps 32 24 -8 sd_index_idr 32 24 -8 sctp_assocs_id 32 24 -8 rtc_idr 32 24 -8 query_idr 32 24 -8 proc_inum_idr 32 24 -8 posix_timers_id 32 24 -8 mmc_host_idr 32 24 -8 lpfc_hba_index 32 24 -8 ib_uverbs_srq_idr 32 24 -8 ib_uverbs_qp_idr 32 24 -8 ib_uverbs_pd_idr 32 24 -8 ib_uverbs_mw_idr 32 24 -8 ib_uverbs_mr_idr 32 24 -8 ib_uverbs_cq_idr 32 24 -8 ib_uverbs_ah_idr 32 24 -8 i2c_adapter_idr 32 24 -8 hwmon_idr 32 24 -8 ctx_id_table 32 24 -8 cm 112 104 -8 allocated_ptys 32 24 -8 _minor_idr 32 24 -8 i2c_add_adapter 494 484 -10 idr_cache_ctor 12 - -12 v9fs_get_idpool 149 134 -15 ib_cm_init 237 220 -17 free_layer 55 34 -21 idr_init 88 62 -26 idr_get_new 40 13 -27 idr_remove 357 328 -29 infinipath_init 
227 197 -30 lpfc_pci_probe_one 2752 2710 -42 ptmx_open 427 379 -48 idr_pre_get 59 - -59 alloc_layer 66 - -66 idr_get_new_above_int 533 - -533 o2net_nodes 99960 97920 -2040 --- diff --git a/arch/powerpc/mm/mmu_context_64.c b/arch/powerpc/mm/mmu_context_64.c index 90a06ac..85d6a03 100644 --- a/arch/powerpc/mm/mmu_context_64.c +++ b/arch/powerpc/mm/mmu_context_64.c @@ -26,20 +26,21 @@ static DEFINE_IDR(mmu_context_idr); int init_new_context(struct task_struct *tsk, struct mm_struct *mm) { + struct idr_layer *layer = NULL; int index; int err; again: - if (!idr_pre_get(&mmu_context_idr, GFP_KERNEL)) - return -ENOMEM; - spin_lock(&mmu_context_lock); - err = idr_get_new_above(&mmu_context_idr, NULL, 1, &index); + err = idr_get_new_above(&mmu_context_idr, NULL, 1, &index, layer); spin_unlock(&mmu_context_lock); - if (err == -EAGAIN) + if (err == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) + return -ENOMEM; goto again; - else if (err) + } else if (err) return err; if (index > MAX_CONTEXT) { diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c index bfdb902..0ae099e 100644 --- a/drivers/char/tty_io.c +++ b/drivers/char/tty_io.c @@ -2135,6 +2135,7 @@ #ifdef CONFIG_UNIX98_PTYS static int ptmx_open(struct inode * inode, struct file * filp) { struct tty_struct *tty; + struct idr_layer *layer = NULL; int retval; int index; int idr_ret; @@ -2143,24 +2144,27 @@ static int ptmx_open(struct inode * inod /* find a device that is not in use. 
*/ down(&allocated_ptys_lock); - if (!idr_pre_get(&allocated_ptys, GFP_KERNEL)) { - up(&allocated_ptys_lock); - return -ENOMEM; - } - idr_ret = idr_get_new(&allocated_ptys, NULL, &index); - if (idr_ret < 0) { - up(&allocated_ptys_lock); - if (idr_ret == -EAGAIN) - return -ENOMEM; - return -EIO; - } - if (index >= pty_limit) { - idr_remove(&allocated_ptys, index); - up(&allocated_ptys_lock); - return -EIO; - } + do { + idr_ret = idr_get_new(&allocated_ptys, NULL, &index, layer); + if (idr_ret == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) { + idr_ret = -ENOMEM; + break; + } + continue; + } + + if (index >= pty_limit) { + idr_remove(&allocated_ptys, index); + idr_ret = -EIO; + } + } while (idr_ret == -EAGAIN); up(&allocated_ptys_lock); + if (idr_ret) + return idr_ret == -EAGAIN ? -ENOMEM : -EIO; + mutex_lock(&tty_mutex); retval = init_dev(ptm_driver, index, &tty); mutex_unlock(&tty_mutex); diff --git a/drivers/hwmon/hwmon.c b/drivers/hwmon/hwmon.c index 106fa01..82d6d04 100644 --- a/drivers/hwmon/hwmon.c +++ b/drivers/hwmon/hwmon.c @@ -38,20 +38,21 @@ static DEFINE_SPINLOCK(idr_lock); */ struct class_device *hwmon_device_register(struct device *dev) { + struct idr_layer *layer = NULL; struct class_device *cdev; int id, err; again: - if (unlikely(idr_pre_get(&hwmon_idr, GFP_KERNEL) == 0)) - return ERR_PTR(-ENOMEM); - spin_lock(&idr_lock); - err = idr_get_new(&hwmon_idr, NULL, &id); + err = idr_get_new(&hwmon_idr, NULL, &id, layer); spin_unlock(&idr_lock); - if (unlikely(err == -EAGAIN)) + if (unlikely(err == -EAGAIN)) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) + return ERR_PTR(-ENOMEM); goto again; - else if (unlikely(err)) + } else if (unlikely(err)) return ERR_PTR(err); id = id & MAX_ID_MASK; diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c index 9cb277d..f8aa8ea 100644 --- a/drivers/i2c/i2c-core.c +++ b/drivers/i2c/i2c-core.c @@ -151,22 +151,24 @@ static struct device_attribute dev_attr_ int i2c_add_adapter(struct 
i2c_adapter *adap) { int id, res = 0; + struct idr_layer *layer = NULL; struct list_head *item; struct i2c_driver *driver; mutex_lock(&core_lists); - if (idr_pre_get(&i2c_adapter_idr, GFP_KERNEL) == 0) { - res = -ENOMEM; - goto out_unlock; - } - - res = idr_get_new(&i2c_adapter_idr, adap, &id); - if (res < 0) { - if (res == -EAGAIN) +again: + res = idr_get_new(&i2c_adapter_idr, adap, &id, layer); + if (res == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) { res = -ENOMEM; - goto out_unlock; + goto out_unlock; + } + goto again; } + if (res < 0) + goto out_unlock; adap->nr = id & MAX_ID_MASK; mutex_init(&adap->bus_lock); diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index f85c97f..cf14d01 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -298,6 +298,7 @@ static int cm_init_av_by_path(struct ib_ static int cm_alloc_id(struct cm_id_private *cm_id_priv) { + struct idr_layer *layer = NULL; unsigned long flags; int ret; static int next_id; @@ -305,9 +306,15 @@ static int cm_alloc_id(struct cm_id_priv do { spin_lock_irqsave(&cm.lock, flags); ret = idr_get_new_above(&cm.local_id_table, cm_id_priv, next_id++, - (__force int *) &cm_id_priv->id.local_id); + (__force int *) &cm_id_priv->id.local_id, + layer); spin_unlock_irqrestore(&cm.lock, flags); - } while( (ret == -EAGAIN) && idr_pre_get(&cm.local_id_table, GFP_KERNEL) ); + if (ret == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) + ret = -ENOMEM; + } + } while (ret == -EAGAIN); return ret; } @@ -3347,7 +3354,6 @@ static int __init ib_cm_init(void) cm.remote_qp_table = RB_ROOT; cm.remote_sidr_table = RB_ROOT; idr_init(&cm.local_id_table); - idr_pre_get(&cm.local_id_table, GFP_KERNEL); cm.wq = create_workqueue("ib_cm"); if (!cm.wq) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index d6f99d5..314b150 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -1433,6 +1433,7 @@ static 
void cma_bind_port(struct rdma_bi static int cma_alloc_port(struct idr *ps, struct rdma_id_private *id_priv, unsigned short snum) { + struct idr_layer *layer = NULL; struct rdma_bind_list *bind_list; int port, start, ret; @@ -1443,8 +1444,13 @@ static int cma_alloc_port(struct idr *ps start = snum ? snum : sysctl_local_port_range[0]; do { - ret = idr_get_new_above(ps, bind_list, start, &port); - } while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL)); + ret = idr_get_new_above(ps, bind_list, start, &port, layer); + if (ret == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) + ret = -ENOMEM; + } + } while (ret == -EAGAIN); if (ret) goto err; diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index aeda484..824b652 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -490,17 +490,20 @@ static void init_mad(struct ib_sa_mad *m static int send_mad(struct ib_sa_query *query, int timeout_ms, gfp_t gfp_mask) { + struct idr_layer *layer = NULL; unsigned long flags; int ret, id; retry: - if (!idr_pre_get(&query_idr, gfp_mask)) - return -ENOMEM; spin_lock_irqsave(&idr_lock, flags); - ret = idr_get_new(&query_idr, query, &id); + ret = idr_get_new(&query_idr, query, &id, layer); spin_unlock_irqrestore(&idr_lock, flags); - if (ret == -EAGAIN) - goto retry; + if (ret == -EAGAIN) { + layer = idr_alloc_layer(gfp_mask); + if (layer) + goto retry; + ret = -ENOMEM; + } if (ret) return ret; diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c index c1c6fda..22d5e24 100644 --- a/drivers/infiniband/core/ucm.c +++ b/drivers/infiniband/core/ucm.c @@ -173,6 +173,7 @@ static void ib_ucm_cleanup_events(struct static struct ib_ucm_context *ib_ucm_ctx_alloc(struct ib_ucm_file *file) { + struct idr_layer *layer = NULL; struct ib_ucm_context *ctx; int result; @@ -186,13 +187,15 @@ static struct ib_ucm_context *ib_ucm_ctx INIT_LIST_HEAD(&ctx->events); do { - result = 
idr_pre_get(&ctx_id_table, GFP_KERNEL); - if (!result) - goto error; - mutex_lock(&ctx_id_mutex); - result = idr_get_new(&ctx_id_table, ctx, &ctx->id); + result = idr_get_new(&ctx_id_table, ctx, &ctx->id, layer); mutex_unlock(&ctx_id_mutex); + + if (result == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) + result = -ENOMEM; + } } while (result == -EAGAIN); if (result) diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index bdf5d50..71dea88 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -109,18 +109,20 @@ static void put_uobj_write(struct ib_uob static int idr_add_uobj(struct idr *idr, struct ib_uobject *uobj) { + struct idr_layer *layer = NULL; int ret; -retry: - if (!idr_pre_get(idr, GFP_KERNEL)) - return -ENOMEM; + do { + spin_lock(&ib_uverbs_idr_lock); + ret = idr_get_new(idr, uobj, &uobj->id, layer); + spin_unlock(&ib_uverbs_idr_lock); - spin_lock(&ib_uverbs_idr_lock); - ret = idr_get_new(idr, uobj, &uobj->id); - spin_unlock(&ib_uverbs_idr_lock); - - if (ret == -EAGAIN) - goto retry; + if (ret == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) + ret = -ENOMEM; + } + } while (ret == -EAGAIN); return ret; } diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index 823131d..6f2a711 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -168,15 +168,11 @@ static void ipath_free_devdata(struct pc static struct ipath_devdata *ipath_alloc_devdata(struct pci_dev *pdev) { + struct idr_layer *layer = NULL; unsigned long flags; struct ipath_devdata *dd; int ret; - if (!idr_pre_get(&unit_table, GFP_KERNEL)) { - dd = ERR_PTR(-ENOMEM); - goto bail; - } - dd = vmalloc(sizeof(*dd)); if (!dd) { dd = ERR_PTR(-ENOMEM); @@ -187,7 +183,19 @@ static struct ipath_devdata *ipath_alloc spin_lock_irqsave(&ipath_devs_lock, flags); - ret = idr_get_new(&unit_table, dd, 
&dd->ipath_unit); + do { + ret = idr_get_new(&unit_table, dd, &dd->ipath_unit, layer); + if (ret == -EAGAIN) { + spin_unlock_irqrestore(&ipath_devs_lock, flags); + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) { + dd = ERR_PTR(-ENOMEM); + goto bail; + } + spin_lock_irqsave(&ipath_devs_lock, flags); + } + } while (ret == -EAGAIN); + if (ret < 0) { printk(KERN_ERR IPATH_DRV_NAME ": Could not allocate unit ID: error %d\n", -ret); @@ -1754,10 +1762,6 @@ static int __init infinipath_init(void) * the PCI subsystem. */ idr_init(&unit_table); - if (!idr_pre_get(&unit_table, GFP_KERNEL)) { - ret = -ENOMEM; - goto bail; - } ret = pci_register_driver(&ipath_driver); if (ret < 0) { @@ -1780,7 +1784,7 @@ static int __init infinipath_init(void) goto bail_group; } - goto bail; + return ret; bail_group: ipath_driver_remove_group(&ipath_driver.driver); @@ -1791,7 +1795,6 @@ bail_pci: bail_unit: idr_destroy(&unit_table); -bail: return ret; } diff --git a/drivers/md/dm.c b/drivers/md/dm.c index c99bf9f..e2dd20a 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -807,23 +807,30 @@ static void free_minor(int minor) */ static int specific_minor(struct mapped_device *md, int minor) { + struct idr_layer *layer = NULL; int r, m; if (minor >= (1 << MINORBITS)) return -EINVAL; - r = idr_pre_get(&_minor_idr, GFP_KERNEL); - if (!r) - return -ENOMEM; - spin_lock(&_minor_lock); +again: if (idr_find(&_minor_idr, minor)) { r = -EBUSY; goto out; } - r = idr_get_new_above(&_minor_idr, MINOR_ALLOCED, minor, &m); + r = idr_get_new_above(&_minor_idr, MINOR_ALLOCED, minor, &m, layer); + if (r == -EAGAIN) { + spin_unlock(&_minor_lock); + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) + return -ENOMEM; + + spin_lock(&_minor_lock); + goto again; + } if (r) goto out; @@ -840,18 +847,21 @@ out: static int next_free_minor(struct mapped_device *md, int *minor) { + struct idr_layer *layer = NULL; int r, m; - r = idr_pre_get(&_minor_idr, GFP_KERNEL); - if (!r) - return -ENOMEM; - +again: 
spin_lock(&_minor_lock); - - r = idr_get_new(&_minor_idr, MINOR_ALLOCED, &m); - if (r) { - goto out; + r = idr_get_new(&_minor_idr, MINOR_ALLOCED, &m, layer); + if (r == -EAGAIN) { + spin_unlock(&_minor_lock); + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) + return -ENOMEM; + goto again; } + if (r) + goto out; if (m >= (1 << MINORBITS)) { idr_remove(&_minor_idr, m); diff --git a/drivers/mmc/mmc_sysfs.c b/drivers/mmc/mmc_sysfs.c index a2a35fd..bf61ff4 100644 --- a/drivers/mmc/mmc_sysfs.c +++ b/drivers/mmc/mmc_sysfs.c @@ -280,14 +280,19 @@ struct mmc_host *mmc_alloc_host_sysfs(in */ int mmc_add_host_sysfs(struct mmc_host *host) { + struct idr_layer *layer = NULL; int err; - if (!idr_pre_get(&mmc_host_idr, GFP_KERNEL)) - return -ENOMEM; - +again: spin_lock(&mmc_host_lock); - err = idr_get_new(&mmc_host_idr, host, &host->index); + err = idr_get_new(&mmc_host_idr, host, &host->index, layer); spin_unlock(&mmc_host_lock); + if (err == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) + return -ENOMEM; + goto again; + } if (err) return err; diff --git a/drivers/rtc/class.c b/drivers/rtc/class.c index 1cb61a7..3388059 100644 --- a/drivers/rtc/class.c +++ b/drivers/rtc/class.c @@ -42,19 +42,21 @@ struct rtc_device *rtc_device_register(c struct rtc_class_ops *ops, struct module *owner) { + struct idr_layer *layer = NULL; struct rtc_device *rtc; int id, err; - if (idr_pre_get(&rtc_idr, GFP_KERNEL) == 0) { - err = -ENOMEM; - goto exit; - } - - +again: mutex_lock(&idr_lock); - err = idr_get_new(&rtc_idr, NULL, &id); + err = idr_get_new(&rtc_idr, NULL, &id, layer); mutex_unlock(&idr_lock); + if (err == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) + return ERR_PTR(-ENOMEM); + goto again; + } if (err < 0) goto exit; diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c index 81755a3..401608b 100644 --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -1421,6 +1421,7 @@ lpfc_pci_probe_one(struct 
pci_dev *pdev, struct lpfc_hba *phba; struct lpfc_sli *psli; struct lpfc_iocbq *iocbq_entry = NULL, *iocbq_next = NULL; + struct idr_layer *layer = NULL; unsigned long bar0map_len, bar2map_len; int error = -ENODEV, retval; int i; @@ -1443,10 +1444,16 @@ lpfc_pci_probe_one(struct pci_dev *pdev, phba->pcidev = pdev; /* Assign an unused board number */ - if (!idr_pre_get(&lpfc_hba_index, GFP_KERNEL)) - goto out_put_host; - - error = idr_get_new(&lpfc_hba_index, NULL, &phba->brd_no); +again: + error = idr_get_new(&lpfc_hba_index, NULL, &phba->brd_no, layer); + if (error == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) { + error = -ENOMEM; + goto out_put_host; + } + goto again; + } if (error) goto out_put_host; diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 3225d31..fe504be 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -1617,6 +1617,7 @@ static int sd_probe(struct device *dev) struct scsi_device *sdp = to_scsi_device(dev); struct scsi_disk *sdkp; struct gendisk *gd; + struct idr_layer *layer = NULL; u32 index; int error; @@ -1636,13 +1637,19 @@ static int sd_probe(struct device *dev) if (!gd) goto out_free; - if (!idr_pre_get(&sd_index_idr, GFP_KERNEL)) - goto out_put; - +again: spin_lock(&sd_index_lock); - error = idr_get_new(&sd_index_idr, NULL, &index); + error = idr_get_new(&sd_index_idr, NULL, &index, layer); spin_unlock(&sd_index_lock); + if (error == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) { + error = -ENOMEM; + goto out_put; + } + goto again; + } if (index >= SD_MAX_DISKS) error = -EBUSY; if (error) diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c index 22f7ccd..d03c3ba 100644 --- a/fs/9p/v9fs.c +++ b/fs/9p/v9fs.c @@ -198,25 +198,26 @@ struct v9fs_session_info *v9fs_inode2v9s int v9fs_get_idpool(struct v9fs_idpool *p) { + struct idr_layer *layer = NULL; int i = 0; int error; retry: - if (idr_pre_get(&p->pool, GFP_KERNEL) == 0) - return 0; - if (down_interruptible(&p->lock) == -EINTR) { eprintk(KERN_WARNING, 
"Interrupted while locking\n"); return -1; } /* no need to store exactly p, we just need something non-null */ - error = idr_get_new(&p->pool, p, &i); + error = idr_get_new(&p->pool, p, &i, layer); up(&p->lock); - if (error == -EAGAIN) + if (error == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) + return -1; goto retry; - else if (error) + } else if (error) return -1; return i; diff --git a/fs/inotify.c b/fs/inotify.c index 723836a..7e12bed 100644 --- a/fs/inotify.c +++ b/fs/inotify.c @@ -131,12 +131,16 @@ EXPORT_SYMBOL_GPL(put_inotify_watch); static int inotify_handle_get_wd(struct inotify_handle *ih, struct inotify_watch *watch) { + struct idr_layer *layer = NULL; int ret; do { - if (unlikely(!idr_pre_get(&ih->idr, GFP_KERNEL))) - return -ENOSPC; - ret = idr_get_new_above(&ih->idr, watch, ih->last_wd+1, &watch->wd); + ret = idr_get_new_above(&ih->idr, watch, ih->last_wd+1, &watch->wd, layer); + if (ret == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) + ret = -ENOSPC; + } } while (ret == -EAGAIN); if (likely(!ret)) diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c index b650efa..f9d1b04 100644 --- a/fs/ocfs2/cluster/tcp.c +++ b/fs/ocfs2/cluster/tcp.c @@ -175,19 +175,21 @@ static u8 o2net_num_from_nn(struct o2net static int o2net_prep_nsw(struct o2net_node *nn, struct o2net_status_wait *nsw) { + struct idr_layer *layer = NULL; int ret = 0; do { - if (!idr_pre_get(&nn->nn_status_idr, GFP_ATOMIC)) { - ret = -EAGAIN; - break; - } spin_lock(&nn->nn_lock); - ret = idr_get_new(&nn->nn_status_idr, nsw, &nsw->ns_id); + ret = idr_get_new(&nn->nn_status_idr, nsw, &nsw->ns_id, layer); if (ret == 0) list_add_tail(&nsw->ns_node_item, &nn->nn_status_list); spin_unlock(&nn->nn_lock); + if (ret == -EAGAIN) { + layer = idr_alloc_layer(GFP_ATOMIC); + if (!layer) + ret = -ENOMEM; + } } while (ret == -EAGAIN); if (ret == 0) { diff --git a/fs/proc/generic.c b/fs/proc/generic.c index 4ba0300..60bdd42 100644 --- a/fs/proc/generic.c +++ 
b/fs/proc/generic.c @@ -318,19 +318,20 @@ #define PROC_DYNAMIC_FIRST 0xF0000000UL */ static unsigned int get_inode_number(void) { + struct idr_layer *layer = NULL; int i, inum = 0; int error; retry: - if (idr_pre_get(&proc_inum_idr, GFP_KERNEL) == 0) - return 0; - spin_lock(&proc_inum_lock); - error = idr_get_new(&proc_inum_idr, NULL, &i); + error = idr_get_new(&proc_inum_idr, NULL, &i, layer); spin_unlock(&proc_inum_lock); - if (error == -EAGAIN) + if (error == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) + return 0; goto retry; - else if (error) + } else if (error) return 0; inum = (i & MAX_ID_MASK) + PROC_DYNAMIC_FIRST; diff --git a/fs/super.c b/fs/super.c index 6d4e817..67455f5 100644 --- a/fs/super.c +++ b/fs/super.c @@ -607,19 +607,20 @@ static DEFINE_SPINLOCK(unnamed_dev_lock) int set_anon_super(struct super_block *s, void *data) { + struct idr_layer *layer = NULL; int dev; int error; retry: - if (idr_pre_get(&unnamed_dev_idr, GFP_ATOMIC) == 0) - return -ENOMEM; spin_lock(&unnamed_dev_lock); - error = idr_get_new(&unnamed_dev_idr, NULL, &dev); + error = idr_get_new(&unnamed_dev_idr, NULL, &dev, layer); spin_unlock(&unnamed_dev_lock); - if (error == -EAGAIN) - /* We raced and lost with another CPU. 
*/ + if (error == -EAGAIN) { + layer = idr_alloc_layer(GFP_ATOMIC); + if (!layer) + return -ENOMEM; goto retry; - else if (error) + } else if (error) return -EAGAIN; if ((dev & MAX_ID_MASK) == (1 << MINORBITS)) { diff --git a/include/linux/idr.h b/include/linux/idr.h index 8268034..de34c4e 100644 --- a/include/linux/idr.h +++ b/include/linux/idr.h @@ -44,7 +44,7 @@ #define MAX_ID_MASK (MAX_ID_BIT - 1) #define MAX_LEVEL (MAX_ID_SHIFT + IDR_BITS - 1) / IDR_BITS /* Number of id_layer structs to leave in free list */ -#define IDR_FREE_MAX MAX_LEVEL + MAX_LEVEL +#define IDR_FREE_MAX (MAX_LEVEL + MAX_LEVEL) struct idr_layer { unsigned long bitmap; /* A zero bit means "space here" */ @@ -57,7 +57,6 @@ struct idr { struct idr_layer *id_free; int layers; int id_free_cnt; - spinlock_t lock; }; #define IDR_INIT(name) \ @@ -66,7 +65,6 @@ #define IDR_INIT(name) \ .id_free = NULL, \ .layers = 0, \ .id_free_cnt = 0, \ - .lock = __SPIN_LOCK_UNLOCKED(name.lock), \ } #define DEFINE_IDR(name) struct idr name = IDR_INIT(name) @@ -75,9 +73,10 @@ #define DEFINE_IDR(name) struct idr name */ void *idr_find(struct idr *idp, int id); -int idr_pre_get(struct idr *idp, gfp_t gfp_mask); -int idr_get_new(struct idr *idp, void *ptr, int *id); -int idr_get_new_above(struct idr *idp, void *ptr, int starting_id, int *id); +struct idr_layer *idr_alloc_layer(gfp_t gfp_mask); +int idr_get_new(struct idr *idp, void *ptr, int *id, struct idr_layer *layer); +int idr_get_new_above(struct idr *idp, void *ptr, int starting_id, int *id, + struct idr_layer *layer); void *idr_replace(struct idr *idp, void *ptr, int id); void idr_remove(struct idr *idp, int id); void idr_destroy(struct idr *idp); diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c index ac6dc87..18891b2 100644 --- a/kernel/posix-timers.c +++ b/kernel/posix-timers.c @@ -434,6 +434,7 @@ sys_timer_create(const clockid_t which_c struct sigevent __user *timer_event_spec, timer_t __user * created_timer_id) { + struct idr_layer *layer = NULL; 
int error = 0; struct k_itimer *new_timer = NULL; int new_timer_id; @@ -451,17 +452,16 @@ sys_timer_create(const clockid_t which_c spin_lock_init(&new_timer->it_lock); retry: - if (unlikely(!idr_pre_get(&posix_timers_id, GFP_KERNEL))) { - error = -EAGAIN; - goto out; - } spin_lock_irq(&idr_lock); error = idr_get_new(&posix_timers_id, (void *) new_timer, - &new_timer_id); + &new_timer_id, layer); spin_unlock_irq(&idr_lock); - if (error == -EAGAIN) - goto retry; - else if (error) { + if (error == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (layer) + goto retry; + } + if (error) { /* * Wierd looking, but we return EAGAIN if the IDR is * full (proper POSIX return value for this) diff --git a/lib/idr.c b/lib/idr.c index 16d2143..187ce4d 100644 --- a/lib/idr.c +++ b/lib/idr.c @@ -35,65 +35,39 @@ #include static kmem_cache_t *idr_layer_cache; -static struct idr_layer *alloc_layer(struct idr *idp) +static struct idr_layer *get_layer(struct idr *idp) { struct idr_layer *p; - unsigned long flags; - spin_lock_irqsave(&idp->lock, flags); - if ((p = idp->id_free)) { + p = idp->id_free; + if (p) { idp->id_free = p->ary[0]; idp->id_free_cnt--; p->ary[0] = NULL; } - spin_unlock_irqrestore(&idp->lock, flags); - return(p); -} -/* only called when idp->lock is held */ -static void __free_layer(struct idr *idp, struct idr_layer *p) -{ - p->ary[0] = idp->id_free; - idp->id_free = p; - idp->id_free_cnt++; + return p; } static void free_layer(struct idr *idp, struct idr_layer *p) { - unsigned long flags; - - /* - * Depends on the return element being zeroed. 
- */ - spin_lock_irqsave(&idp->lock, flags); - __free_layer(idp, p); - spin_unlock_irqrestore(&idp->lock, flags); + if (idp->id_free_cnt < IDR_FREE_MAX) { + p->ary[0] = idp->id_free; + idp->id_free = p; + idp->id_free_cnt++; + } else + kmem_cache_free(idr_layer_cache, p); } /** - * idr_pre_get - reserver resources for idr allocation - * @idp: idr handle + * idr_alloc_layer - reserve resources for idr allocation * @gfp_mask: memory allocation flags - * - * This function should be called prior to locking and calling the - * following function. It preallocates enough memory to satisfy - * the worst possible allocation. - * - * If the system is REALLY out of memory this function returns 0, - * otherwise 1. */ -int idr_pre_get(struct idr *idp, gfp_t gfp_mask) +struct idr_layer *idr_alloc_layer(gfp_t gfp_mask) { - while (idp->id_free_cnt < IDR_FREE_MAX) { - struct idr_layer *new; - new = kmem_cache_alloc(idr_layer_cache, gfp_mask); - if (new == NULL) - return (0); - free_layer(idp, new); - } - return 1; + return kmem_cache_zalloc(idr_layer_cache, gfp_mask); } -EXPORT_SYMBOL(idr_pre_get); +EXPORT_SYMBOL(idr_alloc_layer); static int sub_alloc(struct idr *idp, void *ptr, int *starting_id) { @@ -136,7 +110,7 @@ static int sub_alloc(struct idr *idp, vo * Create the layer below if it is missing. */ if (!p->ary[m]) { - if (!(new = alloc_layer(idp))) + if (!(new = get_layer(idp))) return -1; p->ary[m] = new; p->count++; @@ -171,14 +145,13 @@ static int idr_get_new_above_int(struct { struct idr_layer *p, *new; int layers, v, id; - unsigned long flags; id = starting_id; build_up: p = idp->top; layers = idp->layers; if (unlikely(!p)) { - if (!(p = alloc_layer(idp))) + if (!(p = get_layer(idp))) return -1; layers = 1; } @@ -190,19 +163,17 @@ build_up: layers++; if (!p->count) continue; - if (!(new = alloc_layer(idp))) { + if (!(new = get_layer(idp))) { /* * The allocation failed. If we built part of * the structure tear it down. 
*/ - spin_lock_irqsave(&idp->lock, flags); for (new = p; p && p != idp->top; new = p) { p = p->ary[0]; new->ary[0] = NULL; new->bitmap = new->count = 0; - __free_layer(idp, new); + free_layer(idp, new); } - spin_unlock_irqrestore(&idp->lock, flags); return -1; } new->ary[0] = p; @@ -216,7 +187,7 @@ build_up: v = sub_alloc(idp, ptr, &id); if (v == -2) goto build_up; - return(v); + return v; } /** @@ -225,20 +196,25 @@ build_up: * @ptr: pointer you want associated with the ide * @start_id: id to start search at * @id: pointer to the allocated handle + * @layer: pointer to extra storage * * This is the allocate id function. It should be called with any * required locks. * * If memory is required, it will return -EAGAIN, you should unlock - * and go back to the idr_pre_get() call. If the idr is full, it will - * return -ENOSPC. + * and call idr_alloc_layer() to get a new layer to pass in as the + * @layer parameter. If the idr is full, it will return -ENOSPC. * * @id returns a value in the range 0 ... 0x7fffffff */ -int idr_get_new_above(struct idr *idp, void *ptr, int starting_id, int *id) +int idr_get_new_above(struct idr *idp, void *ptr, int starting_id, int *id, + struct idr_layer *layer) { int rv; + if (layer) + free_layer(idp, layer); + rv = idr_get_new_above_int(idp, ptr, starting_id); /* * This is a cheap hack until the IDR code can be fixed to @@ -260,33 +236,20 @@ EXPORT_SYMBOL(idr_get_new_above); * @idp: idr handle * @ptr: pointer you want associated with the ide * @id: pointer to the allocated handle + * @layer: pointer to extra storage * * This is the allocate id function. It should be called with any * required locks. * * If memory is required, it will return -EAGAIN, you should unlock - * and go back to the idr_pre_get() call. If the idr is full, it will - * return -ENOSPC. + * and call idr_alloc_layer() to get a new layer to pass in as the + * @layer parameter. If the idr is full, it will return -ENOSPC. * * @id returns a value in the range 0 ... 
0x7fffffff */ -int idr_get_new(struct idr *idp, void *ptr, int *id) +int idr_get_new(struct idr *idp, void *ptr, int *id, struct idr_layer *layer) { - int rv; - - rv = idr_get_new_above_int(idp, ptr, 0); - /* - * This is a cheap hack until the IDR code can be fixed to - * return proper error values. - */ - if (rv < 0) { - if (rv == -1) - return -EAGAIN; - else /* Will be -3 */ - return -ENOSPC; - } - *id = rv; - return 0; + return idr_get_new_above(idp, ptr, 0, id, layer); } EXPORT_SYMBOL(idr_get_new); @@ -349,11 +312,6 @@ void idr_remove(struct idr *idp, int id) idp->top = p; --idp->layers; } - while (idp->id_free_cnt >= IDR_FREE_MAX) { - p = alloc_layer(idp); - kmem_cache_free(idr_layer_cache, p); - return; - } } EXPORT_SYMBOL(idr_remove); @@ -364,7 +322,7 @@ EXPORT_SYMBOL(idr_remove); void idr_destroy(struct idr *idp) { while (idp->id_free_cnt) { - struct idr_layer *p = alloc_layer(idp); + struct idr_layer *p = get_layer(idp); kmem_cache_free(idr_layer_cache, p); } } @@ -445,17 +403,11 @@ void *idr_replace(struct idr *idp, void } EXPORT_SYMBOL(idr_replace); -static void idr_cache_ctor(void * idr_layer, kmem_cache_t *idr_layer_cache, - unsigned long flags) -{ - memset(idr_layer, 0, sizeof(struct idr_layer)); -} - static int init_id_cache(void) { if (!idr_layer_cache) idr_layer_cache = kmem_cache_create("idr_layer_cache", - sizeof(struct idr_layer), 0, 0, idr_cache_ctor, NULL); + sizeof(struct idr_layer), 0, 0, NULL, NULL); return 0; } @@ -470,6 +422,5 @@ void idr_init(struct idr *idp) { init_id_cache(); memset(idp, 0, sizeof(struct idr)); - spin_lock_init(&idp->lock); } EXPORT_SYMBOL(idr_init); diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c index 2a87736..fc712c4 100644 --- a/net/sctp/sm_make_chunk.c +++ b/net/sctp/sm_make_chunk.c @@ -1928,6 +1928,7 @@ int sctp_process_init(struct sctp_associ * association. 
 	 */
 	if (!asoc->temp) {
+		struct idr_layer *layer = NULL;
 		int assoc_id;
 		int error;
 
@@ -1937,15 +1938,16 @@ int sctp_process_init(struct sctp_associ
 			goto clean_up;
 
 	retry:
-		if (unlikely(!idr_pre_get(&sctp_assocs_id, gfp)))
-			goto clean_up;
 		spin_lock_bh(&sctp_assocs_id_lock);
 		error = idr_get_new_above(&sctp_assocs_id, (void *)asoc, 1,
-					  &assoc_id);
+					  &assoc_id, layer);
 		spin_unlock_bh(&sctp_assocs_id_lock);
-		if (error == -EAGAIN)
-			goto retry;
-		else if (error)
+		if (error == -EAGAIN) {
+			layer = idr_alloc_layer(gfp);
+			if (layer)
+				goto retry;
+		}
+		if (error)
 			goto clean_up;
 
 		asoc->assoc_id = (sctp_assoc_t) assoc_id;

From gregkh at suse.de Mon Jul 17 09:25:31 2006
From: gregkh at suse.de (Greg KH)
Date: Mon, 17 Jul 2006 09:25:31 -0700
Subject: [openib-general] [patch 02/45] IB/mthca: restore missing PCI registers after reset
In-Reply-To: <20060717162452.GA4829@kroah.com>
References: <20060717160652.408007000@blue.kroah.org>
Message-ID: <20060717162531.GC4829@kroah.com>

-stable review patch. If anyone has any objections, please let us know.

------------------
mthca does not restore the following PCI-X/PCI Express registers after reset:
  PCI-X device: PCI-X command register
  PCI-X bridge: upstream and downstream split transaction registers
  PCI Express : PCI Express device control and link control registers

This causes instability and/or bad performance on systems where one of
these registers is set to a non-default value by BIOS.

Signed-off-by: Michael S. Tsirkin
Signed-off-by: Chris Wright
Signed-off-by: Greg Kroah-Hartman

---
 drivers/infiniband/hw/mthca/mthca_reset.c | 59 ++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

--- linux-2.6.17.2.orig/drivers/infiniband/hw/mthca/mthca_reset.c
+++ linux-2.6.17.2/drivers/infiniband/hw/mthca/mthca_reset.c
@@ -49,6 +49,12 @@ int mthca_reset(struct mthca_dev *mdev)
 	u32 *hca_header = NULL;
 	u32 *bridge_header = NULL;
 	struct pci_dev *bridge = NULL;
+	int bridge_pcix_cap = 0;
+	int hca_pcie_cap = 0;
+	int hca_pcix_cap = 0;
+
+	u16 devctl;
+	u16 linkctl;
 
 #define MTHCA_RESET_OFFSET 0xf0010
 #define MTHCA_RESET_VALUE swab32(1)
@@ -110,6 +116,9 @@ int mthca_reset(struct mthca_dev *mdev)
 		}
 	}
 
+	hca_pcix_cap = pci_find_capability(mdev->pdev, PCI_CAP_ID_PCIX);
+	hca_pcie_cap = pci_find_capability(mdev->pdev, PCI_CAP_ID_EXP);
+
 	if (bridge) {
 		bridge_header = kmalloc(256, GFP_KERNEL);
 		if (!bridge_header) {
@@ -129,6 +138,13 @@ int mthca_reset(struct mthca_dev *mdev)
 				goto out;
 			}
 		}
+		bridge_pcix_cap = pci_find_capability(bridge, PCI_CAP_ID_PCIX);
+		if (!bridge_pcix_cap) {
+			err = -ENODEV;
+			mthca_err(mdev, "Couldn't locate HCA bridge "
+				  "PCI-X capability, aborting.\n");
+			goto out;
+		}
 	}
 
 	/* actually hit reset */
@@ -178,6 +194,20 @@ int mthca_reset(struct mthca_dev *mdev)
 good:
 	/* Now restore the PCI headers */
 	if (bridge) {
+		if (pci_write_config_dword(bridge, bridge_pcix_cap + 0x8,
+				 bridge_header[(bridge_pcix_cap + 0x8) / 4])) {
+			err = -ENODEV;
+			mthca_err(mdev, "Couldn't restore HCA bridge Upstream "
+				  "split transaction control, aborting.\n");
+			goto out;
+		}
+		if (pci_write_config_dword(bridge, bridge_pcix_cap + 0xc,
+				 bridge_header[(bridge_pcix_cap + 0xc) / 4])) {
+			err = -ENODEV;
+			mthca_err(mdev, "Couldn't restore HCA bridge Downstream "
+				  "split transaction control, aborting.\n");
+			goto out;
+		}
 		/*
 		 * Bridge control register is at 0x3e, so we'll
 		 * naturally restore it last in this loop.
@@ -203,6 +233,35 @@ good:
 		}
 	}
 
+	if (hca_pcix_cap) {
+		if (pci_write_config_dword(mdev->pdev, hca_pcix_cap,
+				 hca_header[hca_pcix_cap / 4])) {
+			err = -ENODEV;
+			mthca_err(mdev, "Couldn't restore HCA PCI-X "
+				  "command register, aborting.\n");
+			goto out;
+		}
+	}
+
+	if (hca_pcie_cap) {
+		devctl = hca_header[(hca_pcie_cap + PCI_EXP_DEVCTL) / 4];
+		if (pci_write_config_word(mdev->pdev, hca_pcie_cap + PCI_EXP_DEVCTL,
+				 devctl)) {
+			err = -ENODEV;
+			mthca_err(mdev, "Couldn't restore HCA PCI Express "
+				  "Device Control register, aborting.\n");
+			goto out;
+		}
+		linkctl = hca_header[(hca_pcie_cap + PCI_EXP_LNKCTL) / 4];
+		if (pci_write_config_word(mdev->pdev, hca_pcie_cap + PCI_EXP_LNKCTL,
+				 linkctl)) {
+			err = -ENODEV;
+			mthca_err(mdev, "Couldn't restore HCA PCI Express "
+				  "Link control register, aborting.\n");
+			goto out;
+		}
+	}
+
 	for (i = 0; i < 16; ++i) {
 		if (i * 4 == PCI_COMMAND)
 			continue;
--

From mshefty at ichips.intel.com Mon Jul 17 09:39:08 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 17 Jul 2006 09:39:08 -0700
Subject: [openib-general] [PATCH] librdmacm: add missing include
In-Reply-To: <20060716131242.GA16108@mellanox.co.il>
References: <20060716131242.GA16108@mellanox.co.il>
Message-ID: <44BBBD2C.30007@ichips.intel.com>

Michael S. Tsirkin wrote:
> cma.c uses offsetof macro, so it must include stddef.h to pull in the definition.

Thanks - committed.
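
For readers of the archive, here is a minimal sketch of the point made in the message above: offsetof is defined by stddef.h, so a file that uses it should include that header directly rather than rely on another header pulling it in. The structure and helper names below are hypothetical and are not taken from cma.c.

```c
#include <stddef.h>	/* defines offsetof and size_t */

/* Hypothetical structure, for illustration only (not from cma.c). */
struct example_msg {
	int  id;
	char payload[16];
};

/* Return the byte offset of the payload field within struct example_msg. */
static size_t payload_offset(void)
{
	return offsetof(struct example_msg, payload);
}
```

Without the stddef.h include, this only compiles if some other included header happens to provide offsetof, which is presumably the portability problem the patch addresses.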

From halr at voltaire.com Mon Jul 17 09:43:00 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 17 Jul 2006 12:43:00 -0400
Subject: [openib-general] [PATCH][MINOR] OpenSM/osm_sa_multipath_record.c: Fix endian of port GUID and indicate port number in some osm_log messages
Message-ID: <1153154579.4540.13865.camel@hal.voltaire.com>

OpenSM/osm_sa_multipath_record.c: Fix endian of port GUID and indicate
port number in some osm_log messages

Signed-off-by: Hal Rosenstock

Index: opensm/osm_sa_multipath_record.c
===================================================================
--- opensm/osm_sa_multipath_record.c (revision 8549)
+++ opensm/osm_sa_multipath_record.c (working copy)
@@ -207,8 +207,8 @@ __osm_mpr_rcv_get_path_parms(
              "path not found for PKEY = 0x%x\n"
              "\t\tsrc %Lx dst %Lx\n",
              required_pkey,
-             osm_physp_get_port_guid( p_physp ),
-             osm_physp_get_port_guid( p_dest_physp ) );
+             cl_ntoh64( osm_physp_get_port_guid( p_physp ) ),
+             cl_ntoh64( osm_physp_get_port_guid( p_dest_physp ) ) );
 
     status = IB_NOT_FOUND;
     goto Exit;
@@ -352,9 +352,11 @@ __osm_mpr_rcv_get_path_parms(
       {
         osm_log( p_rcv->p_log, OSM_LOG_DEBUG,
                  "__osm_mpr_rcv_get_path_parms: "
-                 "New smallest MTU = %u at intervening port 0x%016" PRIx64 "\n",
+                 "New smallest MTU = %u at intervening port 0x%016" PRIx64
+                 " port num 0x%X\n",
                  mtu,
-                 osm_physp_get_port_guid( p_physp ) );
+                 cl_ntoh64( osm_physp_get_port_guid( p_physp ) ),
+                 osm_physp_get_port_num( p_physp ) );
       }
     }
 
@@ -365,9 +367,11 @@ __osm_mpr_rcv_get_path_parms(
       {
         osm_log( p_rcv->p_log, OSM_LOG_DEBUG,
                  "__osm_mpr_rcv_get_path_parms: "
-                 "New smallest rate = %u at intervening port 0x%016" PRIx64 "\n",
+                 "New smallest rate = %u at intervening port 0x%016" PRIx64
+                 " port num 0x%X\n",
                  rate,
-                 osm_physp_get_port_guid( p_physp ) );
+                 cl_ntoh64( osm_physp_get_port_guid( p_physp ) ),
+                 osm_physp_get_port_num( p_physp ) );
       }
     }
 
@@ -399,7 +403,7 @@ __osm_mpr_rcv_get_path_parms(
                "__osm_mpr_rcv_get_path_parms: Path not found for SL %d\n"
                "\t\tin_port_num %d port_guid %Lx\n",
                required_sl, in_port_num,
-               osm_physp_get_port_guid( p_physp ) );
+               cl_ntoh64( osm_physp_get_port_guid( p_physp ) ) );
       status = IB_NOT_FOUND;
       goto Exit;
     }
@@ -414,9 +418,11 @@ __osm_mpr_rcv_get_path_parms(
       {
         osm_log( p_rcv->p_log, OSM_LOG_DEBUG,
                  "__osm_mpr_rcv_get_path_parms: "
-                 "New smallest MTU = %u at intervening port 0x%016" PRIx64 "\n",
+                 "New smallest MTU = %u at intervening port 0x%016" PRIx64
+                 " port num 0x%X\n",
                  mtu,
-                 osm_physp_get_port_guid( p_physp ) );
+                 cl_ntoh64( osm_physp_get_port_guid( p_physp ) ),
+                 osm_physp_get_port_num( p_physp ) );
       }
     }
 
@@ -427,9 +433,11 @@ __osm_mpr_rcv_get_path_parms(
       {
         osm_log( p_rcv->p_log, OSM_LOG_DEBUG,
                  "__osm_mpr_rcv_get_path_parms: "
-                 "New smallest rate = %u at intervening port 0x%016" PRIx64 "\n",
+                 "New smallest rate = %u at intervening port 0x%016" PRIx64
+                 " port num 0x%X\n",
                  rate,
-                 osm_physp_get_port_guid( p_physp ) );
+                 cl_ntoh64( osm_physp_get_port_guid( p_physp ) ),
+                 osm_physp_get_port_num( p_physp ) );
       }
     }
 
@@ -449,7 +457,7 @@ __osm_mpr_rcv_get_path_parms(
                "__osm_mpr_rcv_get_path_parms: "
                "New smallest MTU = %u at destination port 0x%016" PRIx64 "\n",
                mtu,
-               osm_physp_get_port_guid( p_physp ) );
+               cl_ntoh64( osm_physp_get_port_guid( p_physp ) ) );
     }
   }
 
@@ -462,7 +470,7 @@ __osm_mpr_rcv_get_path_parms(
                "__osm_mpr_rcv_get_path_parms: "
                "New smallest rate = %u at destination port 0x%016" PRIx64 "\n",
                rate,
-               osm_physp_get_port_guid( p_physp ) );
+               cl_ntoh64( osm_physp_get_port_guid( p_physp ) ) );
     }
   }
 

From mst at mellanox.co.il Mon Jul 17 10:27:59 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 17 Jul 2006 20:27:59 +0300
Subject: [openib-general] [PATCH] fix error handling in uverbs_cmd
In-Reply-To:
References:
Message-ID: <20060717172759.GA21388@mellanox.co.il>

Quoting r. Roland Dreier :
> Subject: Re: [PATCH] fix error handling in uverbs_cmd
>
> Looks like a correct fix.
>
> Thanks, applied and queued for 2.6.18 (I guess I'm back).

Great! It's good to have you back.

There are a couple of fixes posted previously that I think make sense to send
upstream, so you might want to consider as well:
	mthca: fix limit event range check for memfree.
and
	Make kernel use UAR2 instead of UAR1 for hardware access

There's a MAD validation patch by Sean and an ipoib patch by me that are under
testing but are, I think, needed for 2.6.18. I am running stress tests on
2.6.18-rc2 + these 2 patches now and once that stabilizes I expect they'll be
ready for upstream, hopefully by the end of the kernel summit. Note that both
address long standing issues already discussed some months ago.

There are also a couple of userspace fixes - have you seen them or should I
repost?

BTW, is it helpful for you if I continue to publish patches through a git tree
as well? I can put all kernel patches for review on a git branch if that helps.

--
MST

From rdreier at cisco.com Mon Jul 17 12:56:44 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 17 Jul 2006 12:56:44 -0700
Subject: [openib-general] [PATCH] fix error handling in uverbs_cmd
References: <20060717172759.GA21388@mellanox.co.il>
Message-ID:

    Michael> There are a couple of fixes posted previously that I
    Michael> think make sense to send upstream, so you might want to
    Michael> consider as well:
    Michael> mthca: fix limit event range check for memfree.

Yes, I will merge that.

    Michael> Make kernel use UAR2 instead of UAR1 for hardware access

Seems like 2.6.19 material but I'll queue it up.

    Michael> There are also a couple of userspace fixes - have you
    Michael> seen them or should I repost?

Yes, I have them and will merge them soon.

    Michael> BTW, is it helpful for you if I continue to publish
    Michael> patches through a git tree as well? I can put all kernel
    Michael> patches for review on a git branch if that helps.

It's somewhat useful but not a big deal if you don't want to do it.
It's up to you depending on how much effort it is for you.

 - R.
From rdreier at cisco.com Mon Jul 17 13:21:41 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 17 Jul 2006 13:21:41 -0700 Subject: [openib-general] fix limit event range check for memfree In-Reply-To: <20060713080549.GA20223@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 13 Jul 2006 11:05:49 +0300") References: <20060713080549.GA20223@mellanox.co.il> Message-ID: Thanks, applied. From halr at voltaire.com Mon Jul 17 13:17:31 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 17 Jul 2006 16:17:31 -0400 Subject: [openib-general] [PATCH] OpenSM/osm_sa_mcmember_record.c: In __mgrp_request_is_realizable, set default selectors to exactly Message-ID: <1153167451.4540.15096.camel@hal.voltaire.com> OpenSM/osm_sa_mcmember_record.c: In __mgrp_request_is_realizable, set default selectors to exactly Signed-off-by: Hal Rosenstock Index: opensm/osm_sa_mcmember_record.c =================================================================== --- opensm/osm_sa_mcmember_record.c (revision 8559) +++ opensm/osm_sa_mcmember_record.c (working copy) @@ -1063,9 +1063,9 @@ __mgrp_request_is_realizable( IN ib_member_rec_t * p_mcm_rec, IN const osm_physp_t* const p_physp) { - uint8_t mtu_sel; + uint8_t mtu_sel = 2; /* exactly */ uint8_t mtu_required, mtu, port_mtu; - uint8_t rate_sel; + uint8_t rate_sel = 2; /* exactly */ uint8_t rate_required, rate, port_rate; osm_log_t *p_log = p_rcv->p_log; ib_port_info_t *p_pi = NULL; From arlin.r.davis at intel.com Mon Jul 17 13:31:13 2006 From: arlin.r.davis at intel.com (Arlin Davis) Date: Mon, 17 Jul 2006 13:31:13 -0700 Subject: [openib-general] [PATCH] uDAPL cma provider, errno reporting on create thread during open Message-ID: Added errno reporting (message and return codes) during open to help diagnose create thread issues. 
Signed-off by: Arlin Davis ardavis at ichips.intel.com Index: openib_cma/dapl_ib_util.c =================================================================== --- openib_cma/dapl_ib_util.c (revision 8559) +++ openib_cma/dapl_ib_util.c (working copy) @@ -212,6 +212,7 @@ DAT_RETURN dapls_ib_open_hca(IN IB_HCA_N struct rdma_cm_id *cm_id; union ibv_gid *gid; int ret; + DAT_RETURN dat_status; dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " open_hca: %s - %p\n", hca_name, hca_ptr); @@ -225,8 +226,9 @@ DAT_RETURN dapls_ib_open_hca(IN IB_HCA_N } dapl_os_unlock(&g_hca_lock); - if (dapli_ib_thread_init()) - return DAT_INTERNAL_ERROR; + dat_status = dapli_ib_thread_init(); + if (dat_status != DAT_SUCCESS) + return dat_status; /* HCA name will be hostname or IP address */ if (getipaddr((char*)hca_name, @@ -557,10 +559,10 @@ DAT_RETURN dapls_ib_setup_async_callback return DAT_SUCCESS; } -int dapli_ib_thread_init(void) +DAT_RETURN dapli_ib_thread_init(void) { long opts; - DAT_RETURN ret; + DAT_RETURN dat_status; dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " ib_thread_init(%d)\n", getpid()); @@ -568,31 +570,27 @@ int dapli_ib_thread_init(void) dapl_os_lock(&g_hca_lock); if (g_ib_thread_state != IB_THREAD_INIT) { dapl_os_unlock(&g_hca_lock); - return 0; + return DAT_SUCCESS; } /* uCMA events non-blocking */ opts = fcntl(g_cm_events->fd, F_GETFL); /* uCMA */ if (opts < 0 || fcntl(g_cm_events->fd, F_SETFL, opts | O_NONBLOCK) < 0) { - dapl_dbg_log (DAPL_DBG_TYPE_ERR, - " dapl_ib_init: ERR with uCMA FD\n" ); dapl_os_unlock(&g_hca_lock); - return 1; + return(dapl_convert_errno(errno, "create_thread ERR: cm_fd")); } g_ib_thread_state = IB_THREAD_CREATE; dapl_os_unlock(&g_hca_lock); /* create thread to process inbound connect request */ - ret = dapl_os_thread_create(dapli_thread, NULL, &g_ib_thread); - if (ret != DAT_SUCCESS) - { - dapl_dbg_log(DAPL_DBG_TYPE_ERR, - " ib_thread_init: failed to create thread\n"); - return 1; - } - + dat_status = dapl_os_thread_create(dapli_thread, NULL, &g_ib_thread); + if 
(dat_status != DAT_SUCCESS) + return(dapl_convert_errno(errno, + "create_thread ERR:" + " check resource limits")); + /* wait for thread to start */ dapl_os_lock(&g_hca_lock); while (g_ib_thread_state != IB_THREAD_RUN) { @@ -609,7 +607,8 @@ int dapli_ib_thread_init(void) dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " ib_thread_init(%d) exit\n",getpid()); - return 0; + + return DAT_SUCCESS; } void dapli_ib_thread_destroy(void) Index: openib_cma/dapl_ib_util.h =================================================================== --- openib_cma/dapl_ib_util.h (revision 8559) +++ openib_cma/dapl_ib_util.h (working copy) @@ -265,7 +265,7 @@ typedef uint32_t ib_shm_transport_t; int32_t dapls_ib_init (void); int32_t dapls_ib_release (void); void dapli_thread(void *arg); -int dapli_ib_thread_init(void); +DAT_RETURN dapli_ib_thread_init(void); void dapli_ib_thread_destroy(void); void dapli_cma_event_cb(void); void dapli_cq_event_cb(struct _ib_hca_transport *hca); From mst at mellanox.co.il Mon Jul 17 13:48:36 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 17 Jul 2006 23:48:36 +0300 Subject: [openib-general] [PATCH] fix error handling in uverbs_cmd In-Reply-To: References: Message-ID: <20060717204836.GC21682@mellanox.co.il> Quoting r. Roland Dreier : > Michael> Make kernel use UAR2 instead of UAR1 for hardware access > > Seems like 2.6.19 material but I'll queue it up. Oh, fine, you Acked that at some point so I thought we want it for 2.6.18 but not a big deal. -- MST From jlentini at netapp.com Mon Jul 17 14:24:08 2006 From: jlentini at netapp.com (James Lentini) Date: Mon, 17 Jul 2006 17:24:08 -0400 (EDT) Subject: [openib-general] Fix some suspicious ppc64 code in dapl In-Reply-To: <20060717093853.GA17769@krispykreme> References: <20060604002200.GB986@krispykreme> <20060717093853.GA17769@krispykreme> Message-ID: On Mon, 17 Jul 2006, Anton Blanchard wrote: > > Hi, > > > Thank you Anton. Could you replying with a signed off by line? 
I'll > > properly attribute this fix to you in the commit log. > > Sorry, I let this one slip. Here it is. > > Anton > > -- > > I was reading through the ppc64 specific code in dapl/ and noticed some > suspicious inline assembly. > > - EIEIO_ON_SMP and ISYNC_ON_SMP are in-kernel UP build optimisations, we > shouldn't export them to userspace. Replace them with lwsync and isync. > - The comment says it's implementing cmpxchg64 but in fact it's > implementing cmpxchg32. Fix the comment. Thanks! Committed in revision 8562. From mamidala at cse.ohio-state.edu Mon Jul 17 15:15:36 2006 From: mamidala at cse.ohio-state.edu (amith rajith mamidala) Date: Mon, 17 Jul 2006 18:15:36 -0400 (EDT) Subject: [openib-general] [PATCH] [RFC] librdmacm: expose device list to users for iWARP Message-ID: Hi, We were wondering if a similar patch is applied to the iWARP side. This functionality is very useful, Thanks, Amith ---------- Forwarded message ---------- Date: Thu, 13 Jul 2006 16:31:56 -0700 From: Sean Hefty To: openib-general at openib.org Cc: 'amith rajith mamidala' Subject: [PATCH] [RFC] librdmacm: expose device list to users The following patch adds calls to the userspace RDMA CM to return its list of RDMA devices. The calls are similar to ibv_get_device_list() / ibv_free_device_list(). Currently, RDMA device contexts are handed to the user only after they create an rdma_cm_id and bind it to a local device. Exposing the device list makes it easier for the user to allocate device-specific resources (such as PDs, CQs, etc.) that are shared among multiple rdma_cm_id's.
Signed-off-by: Sean Hefty --- Index: include/rdma/rdma_cma.h =================================================================== --- include/rdma/rdma_cma.h (revision 8215) +++ include/rdma/rdma_cma.h (working copy) @@ -332,4 +332,20 @@ static inline uint16_t rdma_get_dst_port ((struct sockaddr_in *) &id->route.addr.dst_addr)->sin_port; } +/** + * rdma_get_devices - Get list of RDMA devices currently available. + * @num_devices: If non-NULL, set to the number of devices returned. + * + * Return a NULL-terminated array of opened RDMA devices. Callers can use this + * routine to allocate resources on specific RDMA devices that will be shared + * across multiple rdma_cm_id's. + * The array must be released by calling rdma_free_devices(). + */ +struct ibv_context **rdma_get_devices(int *num_devices); + +/** + * rdma_free_devices - Frees the list of devices returned by rdma_get_devices(). + */ +void rdma_free_devices(struct ibv_context **list); + #endif /* RDMA_CMA_H */ Index: src/cma.c =================================================================== --- src/cma.c (revision 8517) +++ src/cma.c (working copy) @@ -216,6 +216,32 @@ err: return ret; } +struct ibv_context **rdma_get_devices(int *num_devices) +{ + struct ibv_context **devs = NULL; + int i; + + if (!cma_dev_cnt && ucma_init()) + goto out; + + devs = malloc(sizeof *devs * (cma_dev_cnt + 1)); + if (!devs) + goto out; + + for (i = 0; i < cma_dev_cnt; i++) + devs[i] = cma_dev_array[i].verbs; + devs[i] = NULL; +out: + if (num_devices) + *num_devices = devs ? 
cma_dev_cnt : 0; + return devs; +} + +void rdma_free_devices(struct ibv_context **list) +{ + free(list); +} + static void __attribute__((destructor)) rdma_cma_fini(void) { ucma_cleanup(); Index: src/librdmacm.map =================================================================== --- src/librdmacm.map (revision 8215) +++ src/librdmacm.map (working copy) @@ -21,5 +21,7 @@ RDMACM_1.0 { rdma_get_dst_attr; rdma_join_multicast; rdma_leave_multicast; + rdma_get_devices; + rdma_free_devices; local: *; }; From mshefty at ichips.intel.com Mon Jul 17 15:29:43 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 17 Jul 2006 15:29:43 -0700 Subject: [openib-general] [PATCH] [RFC] librdmacm: expose device list to users for iWARP In-Reply-To: References: Message-ID: <44BC0F57.9080309@ichips.intel.com> amith rajith mamidala wrote: > We were wondering if a similar patch is applied to the iWARP side. This > functionality is very useful, This patch should work regardless of device type. - Sean From mshefty at ichips.intel.com Mon Jul 17 15:33:35 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 17 Jul 2006 15:33:35 -0700 Subject: [openib-general] affiliated asynchronous errors In-Reply-To: <20060717120120.91563.qmail@web8324.mail.in.yahoo.com> References: <20060717120120.91563.qmail@web8324.mail.in.yahoo.com> Message-ID: <44BC103F.9060506@ichips.intel.com> keshetti mahesh wrote: > if an affiliated asynchronous error occurs and still there are some > outstanding workrequests on the workqueue then whether flush error > completion is generated for those completions or not? Outstanding work requests should complete with flushed status. > suppose in SDP it is not registering any event handler while creating > QP. so if any affiliated asynchronous error happens then how do the SDP > will come to know about that and what happens to the outstanding work > requests on the QP? 
I guess that SDP could detect an error when it polls the CQ and sees a non-success completion status. - Sean From ardavis at ichips.intel.com Mon Jul 17 16:12:40 2006 From: ardavis at ichips.intel.com (Arlin Davis) Date: Mon, 17 Jul 2006 16:12:40 -0700 Subject: [openib-general] [PATCH] uDAPL cma provider, errno reporting on create thread during open In-Reply-To: References: Message-ID: <44BC1968.5020004@ichips.intel.com> Arlin Davis wrote: >Added errno reporting (message and return codes) during open to help diagnose create thread issues. > > > Committed revision 8565. From ralphc at pathscale.com Mon Jul 17 18:18:36 2006 From: ralphc at pathscale.com (Ralph Campbell) Date: Mon, 17 Jul 2006 18:18:36 -0700 Subject: [openib-general] [PATCH] IB/ipath - Fix a data corruption problem where error packets are processed anyway Message-ID: <1153185516.4572.431.camel@brick.pathscale.com> This patch fixes a problem where certain error packets are passed to the Infiniband layer for processing even though the packet actually was received with an error. Signed-off-by: Ralph Campbell diff -r 1b8d63e34819 drivers/infiniband/hw/ipath/ipath_driver.c --- a/drivers/infiniband/hw/ipath/ipath_driver.c Sat Jul 15 22:03:19 2006 +0000 +++ b/drivers/infiniband/hw/ipath/ipath_driver.c Mon Jul 17 17:26:48 2006 -0700 @@ -859,6 +859,38 @@ static void ipath_rcv_layer(struct ipath __ipath_layer_rcv_lid(dd, hdr); } +static void ipath_rcv_hdrerr(struct ipath_devdata *dd, + u32 eflags, + u32 l, + u32 etail, + u64 *rc) +{ + char emsg[128]; + struct ipath_message_header *hdr; + + get_rhf_errstring(eflags, emsg, sizeof emsg); + hdr = (struct ipath_message_header *)&rc[1]; + ipath_cdbg(PKT, "RHFerrs %x hdrqtail=%x typ=%u " + "tlen=%x opcode=%x egridx=%x: %s\n", + eflags, l, + ipath_hdrget_rcv_type((__le32 *) rc), + ipath_hdrget_length_in_bytes((__le32 *) rc), + be32_to_cpu(hdr->bth[0]) >> 24, + etail, emsg); + + /* Count local link integrity errors. 
*/ + if (eflags & (INFINIPATH_RHF_H_ICRCERR | INFINIPATH_RHF_H_VCRCERR)) { + u8 n = (dd->ipath_ibcctrl >> + INFINIPATH_IBCC_PHYERRTHRESHOLD_SHIFT) & + INFINIPATH_IBCC_PHYERRTHRESHOLD_MASK; + + if (++dd->ipath_lli_counter > n) { + dd->ipath_lli_counter = 0; + dd->ipath_lli_errors++; + } + } +} + /* * ipath_kreceive - receive a packet * @dd: the infinipath device @@ -875,7 +907,6 @@ void ipath_kreceive(struct ipath_devdata struct ipath_message_header *hdr; u32 eflags, i, etype, tlen, pkttot = 0, updegr=0, reloop=0; static u64 totcalls; /* stats, may eventually remove */ - char emsg[128]; if (!dd->ipath_hdrqtailptr) { ipath_dev_err(dd, @@ -938,26 +969,9 @@ reloop: "%x\n", etype); } - if (eflags & ~(INFINIPATH_RHF_H_TIDERR | - INFINIPATH_RHF_H_IHDRERR)) { - get_rhf_errstring(eflags, emsg, sizeof emsg); - ipath_cdbg(PKT, "RHFerrs %x hdrqtail=%x typ=%u " - "tlen=%x opcode=%x egridx=%x: %s\n", - eflags, l, etype, tlen, bthbytes[0], - ipath_hdrget_index((__le32 *) rc), emsg); - /* Count local link integrity errors. */ - if (eflags & (INFINIPATH_RHF_H_ICRCERR | - INFINIPATH_RHF_H_VCRCERR)) { - u8 n = (dd->ipath_ibcctrl >> - INFINIPATH_IBCC_PHYERRTHRESHOLD_SHIFT) & - INFINIPATH_IBCC_PHYERRTHRESHOLD_MASK; - - if (++dd->ipath_lli_counter > n) { - dd->ipath_lli_counter = 0; - dd->ipath_lli_errors++; - } - } - } else if (etype == RCVHQ_RCV_TYPE_NON_KD) { + if (unlikely(eflags)) + ipath_rcv_hdrerr(dd, eflags, l, etail, rc); + else if (etype == RCVHQ_RCV_TYPE_NON_KD) { int ret = __ipath_verbs_rcv(dd, rc + 1, ebuf, tlen); if (ret == -ENODEV) @@ -981,25 +995,7 @@ reloop: else if (etype == RCVHQ_RCV_TYPE_EXPECTED) ipath_dbg("Bug: Expected TID, opcode %x; ignored\n", be32_to_cpu(hdr->bth[0]) & 0xff); - else if (eflags & (INFINIPATH_RHF_H_TIDERR | - INFINIPATH_RHF_H_IHDRERR)) { - /* - * This is a type 3 packet, only the LRH is in the - * rcvhdrq, the rest of the header is in the eager - * buffer. 
- */ - u8 opcode; - if (ebuf) { - bthbytes = (u8 *) ebuf; - opcode = *bthbytes; - } - else - opcode = 0; - get_rhf_errstring(eflags, emsg, sizeof emsg); - ipath_dbg("Err %x (%s), opcode %x, egrbuf %x, " - "len %x\n", eflags, emsg, opcode, etail, - tlen); - } else { + else { /* * error packet, type of error unknown. * Probably type 3, but we don't know, so don't From ralphc at pathscale.com Mon Jul 17 18:19:54 2006 From: ralphc at pathscale.com (Ralph Campbell) Date: Mon, 17 Jul 2006 18:19:54 -0700 Subject: [openib-general] IB/ipath - Fix ib_ipath driver to work with SRP Message-ID: <1153185594.4572.433.camel@brick.pathscale.com> I am still working on a proposal to remove the phys_to_virt() calls in the ib_ipath driver. In the mean time, this patch allows SRP to work by fixing the Rkey check and conversion from IB address to kernel virtual address. It also returns the correct page size for FMRs. Signed-off-by: Ralph Campbell diff -r f2b5df59d33d drivers/infiniband/hw/ipath/ipath_keys.c --- a/drivers/infiniband/hw/ipath/ipath_keys.c Mon Jul 17 17:27:08 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_keys.c Mon Jul 17 17:34:16 2006 -0700 @@ -197,6 +197,21 @@ int ipath_rkey_ok(struct ipath_ibdev *de size_t off; int ret; + /* + * We use RKEY == zero for physical addresses + * (see ipath_get_dma_mr). 
+ */ + if (rkey == 0) { + sge->mr = NULL; + sge->vaddr = phys_to_virt(vaddr); + sge->length = len; + sge->sge_length = len; + ss->sg_list = NULL; + ss->num_sge = 1; + ret = 1; + goto bail; + } + mr = rkt->table[(rkey >> (32 - ib_ipath_lkey_table_size))]; if (unlikely(mr == NULL || mr->lkey != rkey)) { ret = 0; diff -r f2b5df59d33d drivers/infiniband/hw/ipath/ipath_verbs.c --- a/drivers/infiniband/hw/ipath/ipath_verbs.c Mon Jul 17 17:27:08 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Mon Jul 17 17:35:36 2006 -0700 @@ -627,6 +627,7 @@ static int ipath_query_device(struct ib_ props->device_cap_flags = IB_DEVICE_BAD_PKEY_CNTR | IB_DEVICE_BAD_QKEY_CNTR | IB_DEVICE_SHUTDOWN_PORT | IB_DEVICE_SYS_IMAGE_GUID; + props->page_size_cap = PAGE_SIZE; props->vendor_id = ipath_layer_get_vendorid(dev->dd); props->vendor_part_id = ipath_layer_get_deviceid(dev->dd); props->hw_ver = ipath_layer_get_pcirev(dev->dd); From ralphc at pathscale.com Mon Jul 17 18:21:24 2006 From: ralphc at pathscale.com (Ralph Campbell) Date: Mon, 17 Jul 2006 18:21:24 -0700 Subject: [openib-general] IB/ipath ipath_skip_sge() can break if num_sge > 1 Message-ID: <1153185684.4572.435.camel@brick.pathscale.com> ipath_skip_sge() doesn't exactly duplicate the side effects of ipath_copy_sge() if num_sge > 1 since it doesn't decrement ss->num_sge. This could result in the sg_list being accessed out of bounds. Since ipath_skip_sge() is almost always called with num_sge == 1, the original "optimization" is almost never used. 
Signed-off-by: Ralph Campbell diff -r 62120593e7d6 drivers/infiniband/hw/ipath/ipath_verbs.c --- a/drivers/infiniband/hw/ipath/ipath_verbs.c Mon Jul 17 17:36:13 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Mon Jul 17 17:42:52 2006 -0700 @@ -191,10 +191,6 @@ void ipath_skip_sge(struct ipath_sge_sta { struct ipath_sge *sge = &ss->sge; - while (length > sge->sge_length) { - length -= sge->sge_length; - ss->sge = *ss->sg_list++; - } while (length) { u32 len = sge->length; From k_mahesh85 at yahoo.co.in Mon Jul 17 22:17:02 2006 From: k_mahesh85 at yahoo.co.in (keshetti mahesh) Date: Tue, 18 Jul 2006 06:17:02 +0100 (BST) Subject: [openib-general] affiliated asynchronous errors In-Reply-To: <44BC103F.9060506@ichips.intel.com> Message-ID: <20060718051702.24097.qmail@web8321.mail.in.yahoo.com> Suppose a work request is in progress and, for some reason, a catastrophic error occurs. I assume the driver will then report it to that QP as an affiliated asynchronous error (IB_EVENT_QP_FATAL), and there will not be any completion corresponding to that work request (or will there be?). Am I right, or am I missing something? Now, as there is no event handler for the QP (as in the case of SDP), how does the ULP come to know about it? -Mahesh Sean Hefty wrote: keshetti mahesh wrote: > if an affiliated asynchronous error occurs and still there are some > outstanding workrequests on the workqueue then whether flush error > completion is generated for those completions or not? Outstanding work requests should complete with flushed status. > suppose in SDP it is not registering any event handler while creating > QP. so if any affiliated asynchronous error happens then how do the SDP > will come to know about that and what happens to the outstanding work > requests on the QP? I guess that SDP could detect an error when it polls the CQ and sees a non-success completion status. - Sean
From dotanb at mellanox.co.il Mon Jul 17 23:14:36 2006 From: dotanb at mellanox.co.il (Dotan Barak) Date: Tue, 18 Jul 2006 09:14:36 +0300 Subject: [openib-general] affiliated asynchronous errors In-Reply-To: <20060718051702.24097.qmail@web8321.mail.in.yahoo.com> References: <20060718051702.24097.qmail@web8321.mail.in.yahoo.com> Message-ID: <200607180914.36367.dotanb@mellanox.co.il> Hi. On Tuesday 18 July 2006 08:17, keshetti mahesh wrote: > > Suppose a work request is in progress and, for some reason, a catastrophic error occurs. I assume the driver will then report it to that QP as an affiliated asynchronous error (IB_EVENT_QP_FATAL), > and there will not be any completion corresponding to that work request (or will there be?). > Am I right, or am I missing something? Do you know the reason for this catastrophic error? Most of the time, when you get this event, your CQ is in error (CQ overrun). > > Now, as there is no event handler for the QP (as in the case of SDP), how does the ULP come to know about it? You will get an affiliated event + a completion with error. I'm not familiar with the SDP code, but as far as I know, the module that handles the IB resources should set the async event handler for all of its resources in order to know their state (for error and notification events). Dotan
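To make the suggestion in the "affiliated asynchronous errors" thread above concrete (a consumer with no QP event handler can still notice a failure when it polls the CQ and sees a non-success completion status), here is a minimal sketch in plain C. It is not SDP code and it does not use the ib_verbs API: every toy_* name is a hypothetical stand-in invented for illustration. It also assumes the provider does report outstanding work requests with a non-success status, which is exactly the point being debated in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration only; NOT the ib_verbs types. */
enum toy_wc_status { TOY_WC_SUCCESS = 0, TOY_WC_FLUSH_ERR, TOY_WC_OTHER_ERR };

struct toy_wc {
	unsigned long wr_id;		/* caller's cookie for the work request */
	enum toy_wc_status status;	/* result reported by the toy provider */
};

struct toy_cq {
	const struct toy_wc *entries;	/* completions waiting to be polled */
	size_t count;
	size_t next;			/* index of the next entry to hand out */
};

/* Return 1 and fill *wc if a completion is available, 0 if the CQ is empty. */
static int toy_poll_cq(struct toy_cq *cq, struct toy_wc *wc)
{
	assert(cq && wc);
	if (cq->next >= cq->count)
		return 0;
	*wc = cq->entries[cq->next++];
	return 1;
}

/*
 * Drain the CQ. Returns the number of successful completions and sets
 * *broken if any completion carries a non-success status. For a consumer
 * that registered no async event handler, this is the only error signal
 * it can observe.
 */
static int toy_drain_cq(struct toy_cq *cq, int *broken)
{
	struct toy_wc wc;
	int ok = 0;

	assert(broken);
	*broken = 0;
	while (toy_poll_cq(cq, &wc)) {
		if (wc.status == TOY_WC_SUCCESS) {
			ok++;
			continue;
		}
		/* Remember the failure; keep draining so nothing is left queued. */
		*broken = 1;
		fprintf(stderr, "wr_id %lu: completion status %d\n",
			wc.wr_id, (int)wc.status);
	}
	return ok;
}
```

A caller would stop posting new work and start tearing the connection down once *broken is set. If the provider generates no completions at all after the error, this loop never learns about it, so registering an async event handler as well, as Dotan suggests, avoids depending on that behaviour.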
From k_mahesh85 at yahoo.co.in Tue Jul 18 01:51:09 2006 From: k_mahesh85 at yahoo.co.in (keshetti mahesh) Date: Tue, 18 Jul 2006 09:51:09 +0100 (BST) Subject: [openib-general] affiliated asynchronous errors In-Reply-To: <44BC103F.9060506@ichips.intel.com> Message-ID: <20060718085109.18746.qmail@web8324.mail.in.yahoo.com> Sean Hefty wrote: keshetti mahesh wrote: >> if an affiliated asynchronous error occurs and still there are some >> outstanding workrequests on the workqueue then whether flush error >> completion is generated for those completions or not? >Outstanding work requests should complete with flushed status. But the above statement contradicts this statement from the IB spec: "For QPs: • For Affiliated Asynchronous Errors, it may not be possible to continue to process Work Requests. In this case, outstanding Work Requests will not be completed. • When handling the error notification, it is the responsibility of the Consumer to ensure that all error processing has completed prior to forcing the QP to reset." (page 461, vol-1.2) What I understand from the above is that no completion will be generated for the work request that is in progress when an affiliated asynchronous error occurs. >> suppose in SDP it is not registering any event handler while creating >> QP. so if any affiliated asynchronous error happens then how do the SDP >> will come to know about that and what happens to the outstanding work >> requests on the QP? >I guess that SDP could detect an error when it polls the CQ and sees a non-success completion status. If there is no completion, what will SDP find when it polls? >- Sean -Mahesh
From jackm at mellanox.co.il Tue Jul 18 02:31:51 2006 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Tue, 18 Jul 2006 12:31:51 +0300 Subject: [openib-general] [PATCH] RMPP: add Dual-sided RMPP support In-Reply-To: <000001c69cc2$9f7ac710$4e268686@amr.corp.intel.com> References: <000001c69cc2$9f7ac710$4e268686@amr.corp.intel.com> Message-ID: <200607181231.52254.jackm@mellanox.co.il> Sorry for the delay in responding. It seems to me that there are some bugs here. This is based solely on code review, so I may have erred. See my comments below. On Saturday 01 July 2006 06:58, Sean Hefty wrote: > =================================================================== > > +static void ack_ds_ack(struct ib_mad_agent_private *agent, > + struct ib_mad_recv_wc *recv_wc) > +{ .... > + rmpp_mad->rmpp_hdr.seg_num = 0; The seg_num value will be set to 1 within ib_post_send_mad/ib_send_rmpp_mad -- see line 869 of mad_rmpp.c: if (rmpp_mad->rmpp_hdr.rmpp_type != IB_MGMT_RMPP_TYPE_DATA) { mad_send_wr->seg_num = 1; return IB_RMPP_RESULT_INTERNAL; } In this case, the rmpp_type is NOT DATA, but is ACK, so seg_num will be set to 1. > + rmpp_mad->rmpp_hdr.paylen_newwin = cpu_to_be32(1); > + > + ret = ib_post_send_mad(msg, NULL); > + if (ret) { > + ib_destroy_ah(msg->ah); > + ib_free_send_mad(msg); > + } > +} > + > static void process_rmpp_ack(struct ib_mad_agent_private *agent, ... > + if (!mad_send_wr) { The "if" below will never be taken, since seg_num has been set to 1 in the DS Ack packet.
> + if (!seg_num) > + process_ds_ack(agent, mad_recv_wc, newwin); > + goto out; /* Unmatched or DS RMPP ACK */ > + } > + (mad_send_wr->timeout) on sending side is ALWAYS non-zero (see function send_next_seg() in mad_rmpp.c, line 574): /* 2 seconds for an ACK until we can find the packet lifetime */ timeout = mad_send_wr->send_buf.timeout_ms; if (!timeout || timeout > 2000) mad_send_wr->timeout = msecs_to_jiffies(2000); Therefore, it seems to me that every RMPP transaction will be double-sided, since the sending side will send the ack_ds_ack below. > + if ((mad_send_wr->last_ack == mad_send_wr->send_buf.seg_count) && > + (mad_send_wr->timeout)) { > + spin_unlock_irqrestore(&agent->lock, flags); > + ack_ds_ack(agent, mad_recv_wc); > + return; /* Repeated ACK for DS RMPP transaction */ > + } > I also failed to see where the second side of the double-sided RMPP transaction starts up (sending side). Jack From ogerlitz at voltaire.com Tue Jul 18 02:43:54 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 18 Jul 2006 12:43:54 +0300 Subject: [openib-general] svn updated from upstream In-Reply-To: <20060716141018.GA16128@mellanox.co.il> References: <15ddcffd0607151340p4c6b3a3dpa3ad9f368b252177@mail.gmail.com> <20060716141018.GA16128@mellanox.co.il> Message-ID: <44BCAD5A.1070300@voltaire.com> Michael S. Tsirkin wrote: > Quoting r. Or Gerlitz : >> Subject: Re: [PATCH] fmr pool: remove unnecessary pointer dereference >> >> On 7/13/06, Michael S. Tsirkin wrote: >>> Andrew, could you please drop the following into -mm and on to Linus? > > OK, by popular demand :) I have put the following patches from upstream in svn: Roland, Michael, Do we have leftovers patches (eg the lockdep related once mentioned below) that were only sent upstream through Andrew to Linus and are not in the SVN? (they are also not in yours (Roland) git tree but you can pull them). Till we decide something else, the SVN is a super set of the upstream code, correct? Or. 
> > r8530 | mst | 2006-07-16 17:05:12 +0300 (Sun, 16 Jul 2006) | 5 lines > > Merge from upstream: remove an extra pointer deference from the fmr interface > > Signed-off-by: Michael S. Tsirkin > > > ------------------------------------------------------------------------ > r8529 | mst | 2006-07-16 17:01:20 +0300 (Sun, 16 Jul 2006) | 8 lines > > mthca_ah_query returs the static rate of the address handle in internal mthc > format. fix it to use rate encoding from enum ib_rate, which is what users > expect. > > Signed-off-by: Jack Morgenstein > Signed-off-by: Michael S. Tsirkin > > ------------------------------------------------------------------------ > r8528 | mst | 2006-07-16 17:00:10 +0300 (Sun, 16 Jul 2006) | 5 lines > > IB/sa: merge from upstream: use gfp_mask specified by user for idr allocations > > Signed-off-by: Michael S. Tsirkin > > > ------------------------------------------------------------------------ > r8527 | mst | 2006-07-16 16:54:05 +0300 (Sun, 16 Jul 2006) | 12 lines > > Already applied in 2.6.18. > > [PATCH] srp: fix fmr error handling > > srp_unmap_data assumes req->fmr is NULL if the request is not mapped, > so we must clean it out in case of an error. > > Signed-off-by: Vu Pham > Signed-off-by: Michael S. Tsirkin > Acked-by: Roland Dreier > > ------------------------------------------------------------------------ > > There were also lockdep-related fixes in mthca but I haven't the time to > tackle them yet. 
> From halr at voltaire.com Tue Jul 18 02:54:21 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Jul 2006 05:54:21 -0400 Subject: [openib-general] [PATCH] OpenSM/complib: Eliminate deprecated warnings Message-ID: <1153216460.4540.20885.camel@hal.voltaire.com> OpenSM/complib: Eliminate deprecated warnings Signed-off-by: Hal Rosenstock Index: complib/Makefile.am =================================================================== --- complib/Makefile.am (revision 8561) +++ complib/Makefile.am (working copy) @@ -10,7 +10,7 @@ else DBGFLAGS = -g endif -libosmcomp_la_CFLAGS = -Wall $(DBGFLAGS) -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 +libosmcomp_la_CFLAGS = -Wall $(DBGFLAGS) -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 -Wno-deprecated-declarations if HAVE_LD_VERSION_SCRIPT libosmcomp_version_script = -Wl,--version-script=$(srcdir)/libosmcomp.map From rdreier at cisco.com Tue Jul 18 04:57:04 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 18 Jul 2006 04:57:04 -0700 Subject: [openib-general] IB/ipath ipath_skip_sge() can break if num_sge > 1 In-Reply-To: <1153185684.4572.435.camel@brick.pathscale.com> (Ralph Campbell's message of "Mon, 17 Jul 2006 18:21:24 -0700") References: <1153185684.4572.435.camel@brick.pathscale.com> Message-ID: Thanks, queued all three patches. From ogerlitz at voltaire.com Tue Jul 18 05:11:46 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 18 Jul 2006 15:11:46 +0300 (IDT) Subject: [openib-general] iscsi dependent iser updates for 2.6.18 Message-ID: Roland, We had recently two changes in iser which were going hand in hand (dependent) with changes in drivers/scsi/libiscsi, so they are pushed upstream by Mike Christie through James tree. The first patch made it to 2.6.18-rc1 and corresponds to svn 8485 so it would appear in the for-2.6.18 branch of your tree once you pull Linus tree. The second patch (actually two patches) should be send ~today by Mike to linux-scsi and corresponds to svn 8486/8500 and svn 8567 Or. 
------------------------------------------------------------------------ r8567 | ogerlitz | 2006-07-18 13:28:56 +0300 (Tue, 18 Jul 2006) | 6 lines don't set conn->recv_lock to null in ep_disconnect, since conn_stop might be called later (eg in login timeout flow) and attempt to use this lock. Signed-off-by: Erez Zilber Signed-off-by: Or Gerlitz ------------------------------------------------------------------------ r8500 | ogerlitz | 2006-07-12 13:40:44 +0300 (Wed, 12 Jul 2006) | 6 lines bugfix for the change of r8486, iscsi_conn_teardown frees the iscsi transport conn (cls_conn) and libiscsi conn (cls_conn->dd_data) so it must be called after iscsi_iser_conn_terminate which references them. Signed-off-by: Or Gerlitz ------------------------------------------------------------------------ r8486 | ogerlitz | 2006-07-11 12:46:17 +0300 (Tue, 11 Jul 2006) | 5 lines adopt iser to the removal of the terminate_conn iscsi transport callbacks; bug fix for the case of ep_disconnect being called following login failure Signed-off-by: Or Gerlitz ------------------------------------------------------------------------ r8485 | ogerlitz | 2006-07-11 12:38:21 +0300 (Tue, 11 Jul 2006) | 6 lines ported iser to the libiscsi get/set_param calls, fixed few minor minor bugs related to iscsi caps and params Signed-off-by: Or Gerlitz Signed-off-by: Mike Christie From rdreier at cisco.com Tue Jul 18 05:35:08 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 18 Jul 2006 05:35:08 -0700 Subject: [openib-general] iscsi dependent iser updates for 2.6.18 In-Reply-To: (Or Gerlitz's message of "Tue, 18 Jul 2006 15:11:46 +0300 (IDT)") References: Message-ID: OK, that's fine -- I assume there's nothing I actually have to do? - R. 
From ogerlitz at voltaire.com Tue Jul 18 05:53:54 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 18 Jul 2006 15:53:54 +0300 Subject: [openib-general] iscsi dependent iser updates for 2.6.18 In-Reply-To: References: Message-ID: <44BCD9E2.4070002@voltaire.com> Roland Dreier wrote: > OK, that's fine -- I assume there's nothing I actually have to do? OFED is built from the for-2.6.18 branch of your tree, as it is assumed that your tree-at-this-branch contains the most stable ib code; so you just need to pull Linus tree to have the first patch merged at your tree and pull again once James pushes the second/third patches upstream. Or. From mst at mellanox.co.il Tue Jul 18 05:54:46 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 18 Jul 2006 15:54:46 +0300 Subject: [openib-general] [PATCH] fix error handling in uverbs_cmd In-Reply-To: References: Message-ID: <20060718125446.GB17709@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] fix error handling in uverbs_cmd > > Michael> There are a couple of fixes posted previously that I > Michael> think makes sense to send upstream, so you might want to > Michael> consider as well: > > Michael> mthca: fix limit event range check for memfree. > > Yes, I will merge that. BTW, Ishai just pointed me to another outstanding bugfix patch in srp: http://openib.org/pipermail/openib-general/2006-June/022969.html I didn't look into this at any depth, but just from the subject it looks important, and harmless enough for 2.6.18. Right? -- MST From mst at mellanox.co.il Tue Jul 18 06:07:25 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 18 Jul 2006 16:07:25 +0300 Subject: [openib-general] iscsi dependent iser updates for 2.6.18 In-Reply-To: <44BCD9E2.4070002@voltaire.com> References: <44BCD9E2.4070002@voltaire.com> Message-ID: <20060718130725.GC17709@mellanox.co.il> Quoting r. 
Or Gerlitz : > Subject: Re: iscsi dependent iser updates for 2.6.18 > > Roland Dreier wrote: > > OK, that's fine -- I assume there's nothing I actually have to do? > > OFED is built from the for-2.6.18 branch of your tree, as it is assumed > that your tree-at-this-branch contains the most stable ib code; Not really, OFED normally just pulls from upstream, although it is of course easy to pull for-2.6.18 - I just want to minimise merge hassles if a commit goes upstream in a modified form. > So you just need to pull Linus tree to have the first patch merged at your > tree and pull again once James pushes the second/third patches upstream. > > Or. I guess pulling from Linus is always a good idea :) -- MST From arne.redlich at xiranet.com Tue Jul 18 06:53:35 2006 From: arne.redlich at xiranet.com (Arne Redlich) Date: Tue, 18 Jul 2006 15:53:35 +0200 Subject: [openib-general] [PATCH 0/2] ib_cm: fix REJ on cm_id destruction Message-ID: <1153230815.6972.95.camel@confield.dd.xiranet.com> This patchset against the svn trunk fixes the CM's behaviour in case of errors during REQ message processing. Currently, this will lead to the cm_id being destroyed after sending (in most cases) a IB_CM_CONSUMER_REJ. The latter is not always desirable because the remote side might actually rely on the "correct" reject reason. #1 prepares (actually: cleans up) cm_destroy_id(). #2 rejects REQs with invalid GIDs using an appropriate REJ reason. Depends on #1. Please note that these patches are merely compile tested. However, a different version against the current mainline kernel behaves as expected. 
Thanks, Arne -- Arne Redlich Xiranet Communications GmbH From arne.redlich at xiranet.com Tue Jul 18 06:53:41 2006 From: arne.redlich at xiranet.com (Arne Redlich) Date: Tue, 18 Jul 2006 15:53:41 +0200 Subject: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID In-Reply-To: <1153225007.6972.56.camel@confield.dd.xiranet.com> References: <1153225007.6972.56.camel@confield.dd.xiranet.com> Message-ID: <1153230821.6972.98.camel@confield.dd.xiranet.com> Reject a REQ containing invalid GID(s) with appropriate reason codes instead of IB_CM_CONSUMER_REJ. Signed-off-by: Arne Redlich Index: infiniband/core/cm.c =================================================================== --- infiniband/core/cm.c (revision 8556) +++ infiniband/core/cm.c (working copy) @@ -1337,7 +1337,7 @@ static int cm_req_handler(struct cm_work struct ib_cm_id *cm_id; struct cm_id_private *cm_id_priv, *listen_cm_id_priv; struct cm_req_msg *req_msg; - int ret; + int ret, reject; req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad; @@ -1354,6 +1354,7 @@ static int cm_req_handler(struct cm_work id.local_id); if (IS_ERR(cm_id_priv->timewait_info)) { ret = PTR_ERR(cm_id_priv->timewait_info); + reject = 1; goto error1; } cm_id_priv->timewait_info->work.remote_id = req_msg->local_comm_id; @@ -1363,6 +1364,7 @@ static int cm_req_handler(struct cm_work listen_cm_id_priv = cm_match_req(work, cm_id_priv); if (!listen_cm_id_priv) { ret = -EINVAL; + reject = 0; goto error2; } @@ -1373,12 +1375,21 @@ static int cm_req_handler(struct cm_work cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]); ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av); - if (ret) + if (ret) { + cm_issue_rej(work->port, work->mad_recv_wc, IB_CM_INVALID_GID, + CM_MSG_RESPONSE_REQ, NULL, 0); + reject = 0; goto error3; + } if (req_msg->alt_local_lid) { ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av); - if (ret) + if (ret) { + cm_issue_rej(work->port, work->mad_recv_wc, + 
IB_CM_INVALID_ALT_GID, CM_MSG_RESPONSE_REQ, + NULL, 0); + reject = 0; goto error3; + } } cm_id_priv->tid = req_msg->hdr.tid; cm_id_priv->timeout_ms = cm_convert_to_ms( @@ -1405,7 +1416,7 @@ error3: atomic_dec(&cm_id_priv->refcount cm_cleanup_timewait(cm_id_priv->timewait_info); error2: kfree(cm_id_priv->timewait_info); cm_id_priv->timewait_info = NULL; -error1: ib_destroy_cm_id(&cm_id_priv->id); +error1: cm_destroy_id(&cm_id_priv->id, reject); return ret; } From arne.redlich at xiranet.com Tue Jul 18 06:53:39 2006 From: arne.redlich at xiranet.com (Arne Redlich) Date: Tue, 18 Jul 2006 15:53:39 +0200 Subject: [openib-general] [PATCH 1/2] ib_cm: cm_destroy_id() cleanup In-Reply-To: <1153225007.6972.56.camel@confield.dd.xiranet.com> References: <1153225007.6972.56.camel@confield.dd.xiranet.com> Message-ID: <1153230819.6972.97.camel@confield.dd.xiranet.com> In its current incarnation, cm_destroy_id() will not send a REJ if fed a magic number (err = -ENOMEM). This patch replaces this magic number with a more generic "reject" parameter. Signed-off-by: Arne Redlich Index: infiniband/core/cm.c =================================================================== --- infiniband/core/cm.c (revision 8556) +++ infiniband/core/cm.c (working copy) @@ -702,7 +702,7 @@ static void cm_reset_to_idle(struct cm_i } } -static void cm_destroy_id(struct ib_cm_id *cm_id, int err) +static void cm_destroy_id(struct ib_cm_id *cm_id, int reject) { struct cm_id_private *cm_id_priv; struct cm_work *work; @@ -737,14 +737,15 @@ retest: NULL, 0); break; case IB_CM_REQ_RCVD: - if (err == -ENOMEM) { + if (reject) { + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + ib_send_cm_rej(cm_id, IB_CM_REJ_CONSUMER_DEFINED, + NULL, 0, NULL, 0); + } else { /* Do not reject to allow future retries. 
*/ cm_reset_to_idle(cm_id_priv); spin_unlock_irqrestore(&cm_id_priv->lock, flags); } else { - spin_unlock_irqrestore(&cm_id_priv->lock, flags); - ib_send_cm_rej(cm_id, IB_CM_REJ_CONSUMER_DEFINED, - NULL, 0, NULL, 0); } break; case IB_CM_MRA_REQ_RCVD: @@ -789,7 +790,7 @@ retest: void ib_destroy_cm_id(struct ib_cm_id *cm_id) { - cm_destroy_id(cm_id, 0); + cm_destroy_id(cm_id, 1); } EXPORT_SYMBOL(ib_destroy_cm_id); @@ -1179,7 +1180,7 @@ static void cm_process_work(struct cm_id } cm_deref_id(cm_id_priv); if (ret) - cm_destroy_id(&cm_id_priv->id, ret); + cm_destroy_id(&cm_id_priv->id, (ret == -ENOMEM) ? 0 : 1); } static void cm_format_mra(struct cm_mra_msg *mra_msg, From rdreier at cisco.com Tue Jul 18 06:59:36 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 18 Jul 2006 06:59:36 -0700 Subject: [openib-general] [PATCH 1/2] ib_cm: cm_destroy_id() cleanup In-Reply-To: <1153230819.6972.97.camel@confield.dd.xiranet.com> (Arne Redlich's message of "Tue, 18 Jul 2006 15:53:39 +0200") References: <1153225007.6972.56.camel@confield.dd.xiranet.com> <1153230819.6972.97.camel@confield.dd.xiranet.com> Message-ID: > + cm_destroy_id(&cm_id_priv->id, (ret == -ENOMEM) ? 0 : 1); This is rather obfuscated. How about just cm_destroy_id(&cm_id_priv->id, ret != -ENOMEM); - R. From arne.redlich at xiranet.com Tue Jul 18 07:16:36 2006 From: arne.redlich at xiranet.com (Arne Redlich) Date: Tue, 18 Jul 2006 16:16:36 +0200 Subject: [openib-general] [PATCH 1/2] ib_cm: cm_destroy_id() cleanup In-Reply-To: References: <1153225007.6972.56.camel@confield.dd.xiranet.com> <1153230819.6972.97.camel@confield.dd.xiranet.com> Message-ID: <1153232196.6972.106.camel@confield.dd.xiranet.com> On Tuesday, 18 Jul 2006 at 06:59 -0700, Roland Dreier wrote: > > + cm_destroy_id(&cm_id_priv->id, (ret == -ENOMEM) ? 0 : 1); > > This is rather obfuscated. How about just > > cm_destroy_id(&cm_id_priv->id, ret != -ENOMEM); > > - R. Sure. Fixed below for your convenience.
Would you consider pushing these to 2.6.18? If so, I'd happily provide patches against rc1 or your git tree. Thanks, Arne -- In its current incarnation, cm_destroy_id() will not send a REJ if fed a magic number (err = -ENOMEM). This patch replaces this magic number with a more generic "reject" parameter. Signed-off-by: Arne Redlich Index: infiniband/core/cm.c =================================================================== --- infiniband/core/cm.c (revision 8565) +++ infiniband/core/cm.c (working copy) @@ -702,7 +702,7 @@ static void cm_reset_to_idle(struct cm_i } } -static void cm_destroy_id(struct ib_cm_id *cm_id, int err) +static void cm_destroy_id(struct ib_cm_id *cm_id, int reject) { struct cm_id_private *cm_id_priv; struct cm_work *work; @@ -737,14 +737,15 @@ retest: NULL, 0); break; case IB_CM_REQ_RCVD: - if (err == -ENOMEM) { + if (reject) { + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + ib_send_cm_rej(cm_id, IB_CM_REJ_CONSUMER_DEFINED, + NULL, 0, NULL, 0); + } else { /* Do not reject to allow future retries. 
*/ cm_reset_to_idle(cm_id_priv); spin_unlock_irqrestore(&cm_id_priv->lock, flags); } else { - spin_unlock_irqrestore(&cm_id_priv->lock, flags); - ib_send_cm_rej(cm_id, IB_CM_REJ_CONSUMER_DEFINED, - NULL, 0, NULL, 0); } break; case IB_CM_MRA_REQ_RCVD: @@ -789,7 +790,7 @@ retest: void ib_destroy_cm_id(struct ib_cm_id *cm_id) { - cm_destroy_id(cm_id, 0); + cm_destroy_id(cm_id, 1); } EXPORT_SYMBOL(ib_destroy_cm_id); @@ -1179,7 +1180,7 @@ static void cm_process_work(struct cm_id } cm_deref_id(cm_id_priv); if (ret) - cm_destroy_id(&cm_id_priv->id, ret); + cm_destroy_id(&cm_id_priv->id, ret != -ENOMEM); } static void cm_format_mra(struct cm_mra_msg *mra_msg, From swise at opengridcomputing.com Tue Jul 18 07:20:17 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 18 Jul 2006 09:20:17 -0500 Subject: [openib-general] [PATCH] [RFC] librdmacm: expose device list to users for iWARP In-Reply-To: References: Message-ID: <1153232417.22538.7.camel@stevo-desktop> On Mon, 2006-07-17 at 18:15 -0400, amith rajith mamidala wrote: > Hi, > > We were wondering if a similar patch is applied to the iWARP side. This > functionality is very useful, > I haven't pulled this into the iwarp branch yet. I'm busy right now with other things. Maybe you could apply the patch to the iwarp branch and test it? Then post a patch for the iwarp branch and I'll pull it in. I'm not sure at this point how different the iwarp branch is from the main trunk... Steve. 
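A minimal sketch (not part of the posted patch) of the decision that the revised "[PATCH 1/2] ib_cm: cm_destroy_id() cleanup" encodes for the IB_CM_REQ_RCVD case: a nonzero "reject" flag means a REJ is sent, zero means the id is quietly reset so the peer can retry, and the caller derives the flag with ret != -ENOMEM. The enum and both function names are hypothetical stand-ins for illustration, not symbols from cm.c.

```c
#include <errno.h>

/* Hypothetical names: what the REQ_RCVD branch ends up doing. */
enum req_rcvd_action {
	ACTION_RESET_TO_IDLE,	/* no REJ; corresponds to cm_reset_to_idle() */
	ACTION_SEND_REJ		/* corresponds to sending a consumer-defined REJ */
};

/*
 * Caller side: turn the handler's return value into the flag.
 * Same truth table as (ret == -ENOMEM) ? 0 : 1.
 */
static int reject_flag_from_ret(int ret)
{
	return ret != -ENOMEM;
}

/* Callee side: one if/else pair, keyed only on the flag. */
static enum req_rcvd_action req_rcvd_action(int reject)
{
	if (reject)
		return ACTION_SEND_REJ;
	/* Do not reject, to allow future retries. */
	return ACTION_RESET_TO_IDLE;
}
```

With this split, the -ENOMEM convention lives at the single call site and the callee only sees a boolean.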
From rdreier at cisco.com Tue Jul 18 07:19:47 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 18 Jul 2006 07:19:47 -0700 Subject: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID In-Reply-To: <1153230821.6972.98.camel@confield.dd.xiranet.com> (Arne Redlich's message of "Tue, 18 Jul 2006 15:53:41 +0200") References: <1153225007.6972.56.camel@confield.dd.xiranet.com> <1153230821.6972.98.camel@confield.dd.xiranet.com> Message-ID: > + if (ret) { > + cm_issue_rej(work->port, work->mad_recv_wc, IB_CM_INVALID_GID, > + CM_MSG_RESPONSE_REQ, NULL, 0); > + reject = 0; > goto error3; > + } This seems rather awkward -- send a REJ and then set reject to 0? Maybe it would be better if cm_destroy_id() took an actual reject code rather than just a flag? - R. From mst at mellanox.co.il Tue Jul 18 07:31:56 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 18 Jul 2006 17:31:56 +0300 Subject: [openib-general] [PATCH 1/2] ib_cm: cm_destroy_id() cleanup In-Reply-To: <1153232196.6972.106.camel@confield.dd.xiranet.com> References: <1153232196.6972.106.camel@confield.dd.xiranet.com> Message-ID: <20060718143156.GA18551@mellanox.co.il> Quoting r. Arne Redlich : > Subject: Re: [PATCH 1/2] ib_cm: cm_destroy_id() cleanup > > On Tuesday, 18 Jul 2006 at 06:59 -0700, Roland Dreier wrote: > > > + cm_destroy_id(&cm_id_priv->id, (ret == -ENOMEM) ? 0 : 1); > > > > This is rather obfuscated. How about just > > > > cm_destroy_id(&cm_id_priv->id, ret != -ENOMEM); > > > > - R. > > Sure. Fixed below for your convenience. > Would you consider pushing these to 2.6.18? If so, I'd happily provide > patches against rc1 or your git tree. > > Thanks, > Arne Well, Sean is the judge here, but what's the motivation for this patch? > -- > > In its current incarnation, cm_destroy_id() will not send a REJ if fed a > magic number (err = -ENOMEM). This patch replaces this magic number with > a more generic "reject" parameter.
> > Signed-off-by: Arne Redlich Are you sure this is a good idea? cm_destroy_id is after all an internal function so I don't see why it must be generic. -ENOMEM is the value user returns to avoid rej, so it seems cleaner to just do the check where it's actually needed and avoid extra branches - and I expect this will be easier to extend if we have more codes. > Index: infiniband/core/cm.c > =================================================================== > --- infiniband/core/cm.c (revision 8565) > +++ infiniband/core/cm.c (working copy) > @@ -702,7 +702,7 @@ static void cm_reset_to_idle(struct cm_i > } > } > > -static void cm_destroy_id(struct ib_cm_id *cm_id, int err) > +static void cm_destroy_id(struct ib_cm_id *cm_id, int reject) > { > struct cm_id_private *cm_id_priv; > struct cm_work *work; > @@ -737,14 +737,15 @@ retest: > NULL, 0); > break; > case IB_CM_REQ_RCVD: > - if (err == -ENOMEM) { > + if (reject) { > + spin_unlock_irqrestore(&cm_id_priv->lock, flags); > + ib_send_cm_rej(cm_id, IB_CM_REJ_CONSUMER_DEFINED, > + NULL, 0, NULL, 0); > + } else { > /* Do not reject to allow future retries. */ > cm_reset_to_idle(cm_id_priv); > spin_unlock_irqrestore(&cm_id_priv->lock, flags); > } else { > - spin_unlock_irqrestore(&cm_id_priv->lock, flags); > - ib_send_cm_rej(cm_id, IB_CM_REJ_CONSUMER_DEFINED, > - NULL, 0, NULL, 0); > } This looks like "if {} else {} else {}" - am I missing something? 
> break; > case IB_CM_MRA_REQ_RCVD: > @@ -789,7 +790,7 @@ retest: > > void ib_destroy_cm_id(struct ib_cm_id *cm_id) > { > - cm_destroy_id(cm_id, 0); > + cm_destroy_id(cm_id, 1); > } > EXPORT_SYMBOL(ib_destroy_cm_id); > > @@ -1179,7 +1180,7 @@ static void cm_process_work(struct cm_id > } > cm_deref_id(cm_id_priv); > if (ret) > - cm_destroy_id(&cm_id_priv->id, ret); > + cm_destroy_id(&cm_id_priv->id, ret != -ENOMEM); > } > > static void cm_format_mra(struct cm_mra_msg *mra_msg, -- MST From rdreier at cisco.com Tue Jul 18 07:42:21 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 18 Jul 2006 07:42:21 -0700 Subject: [openib-general] [PATCH 1/2] ib_cm: cm_destroy_id() cleanup In-Reply-To: <20060718143156.GA18551@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 18 Jul 2006 17:31:56 +0300") References: <1153232196.6972.106.camel@confield.dd.xiranet.com> <20060718143156.GA18551@mellanox.co.il> Message-ID: > > Would you consider pushing these to 2.6.18? If so, I'd happily provide > > patches against rc1 or your git tree. > Well, Sean is the judge here, but what's the motivation for this patch? Yes, exactly. Also, after looking at this, it seems like there must be a better way to get the CM to send correct reject codes. - R. From mst at mellanox.co.il Tue Jul 18 07:43:59 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 18 Jul 2006 17:43:59 +0300 Subject: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID In-Reply-To: <1153230821.6972.98.camel@confield.dd.xiranet.com> References: <1153230821.6972.98.camel@confield.dd.xiranet.com> Message-ID: <20060718144359.GB18551@mellanox.co.il> Quoting r. Arne Redlich : > Reject a REQ containing invalid GID(s) with appropriate reason codes > instead of IB_CM_CONSUMER_REJ. > > Signed-off-by: Arne Redlich Are there actual applications that can use the exact reject status? I also wonder what should the status be if there are multiple errors in a REQ.
> @@ -1373,12 +1375,21 @@ static int cm_req_handler(struct cm_work > > cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]); > ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av); > - if (ret) > + if (ret) { > + cm_issue_rej(work->port, work->mad_recv_wc, IB_CM_INVALID_GID, > + CM_MSG_RESPONSE_REQ, NULL, 0); > + reject = 0; > goto error3; > + } > if (req_msg->alt_local_lid) { > ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av); > - if (ret) > + if (ret) { > + cm_issue_rej(work->port, work->mad_recv_wc, > + IB_CM_INVALID_ALT_GID, CM_MSG_RESPONSE_REQ, > + NULL, 0); > + reject = 0; > goto error3; > + } > } Hmm ... it looks like cm_init_av_by_path can also fail if ib_find_cached_pkey returns an error - is it right that your patch will return invalid gid in this case? Maybe the right thing to do is to 1. Make cm_init_av_by_path return a more specific error in case of GID mismatch. ENXIO might be a good fit, but we can always add our own 2. Teach cm_destroy_id to send invalid gid reject on this error -- MST From arne.redlich at xiranet.com Tue Jul 18 07:57:07 2006 From: arne.redlich at xiranet.com (Arne Redlich) Date: Tue, 18 Jul 2006 16:57:07 +0200 Subject: [openib-general] [PATCH 1/2] ib_cm: cm_destroy_id() cleanup In-Reply-To: <20060718143156.GA18551@mellanox.co.il> References: <1153232196.6972.106.camel@confield.dd.xiranet.com> <20060718143156.GA18551@mellanox.co.il> Message-ID: <1153234627.6972.119.camel@confield.dd.xiranet.com> On Tuesday, 18 Jul 2006 at 17:31 +0300, Michael S. Tsirkin wrote: > Quoting r. Arne Redlich : > > Subject: Re: [PATCH 1/2] ib_cm: cm_destroy_id() cleanup > > > > On Tuesday, 18 Jul 2006 at 06:59 -0700, Roland Dreier wrote: > > > > + cm_destroy_id(&cm_id_priv->id, (ret == -ENOMEM) ? 0 : 1); > > > > > > This is rather obfuscated. How about just > > > > > > cm_destroy_id(&cm_id_priv->id, ret != -ENOMEM); > > > > > > - R. > > > > Sure. Fixed below for your convenience.
> > Would you consider pushing these to 2.6.18? If so, I'd happily provide > > patches against rc1 or your git tree. > > > > Thanks, > > Arne > > Well, Sean is the judge here, but what's the motivation for this patch? It's basically the foundation for patch #2 - return a *sensible* REJ reason and not just IB_CONSUMER_REJ. > > -- > > > > In its current incarnation, cm_destroy_id() will not send a REJ if fed a > > magic number (err = -ENOMEM). This patch replaces this magic number with > > a more generic "reject" parameter. > > > > Signed-off-by: Arne Redlich > > Are you sure this is a good idea? cm_destroy_id is after all an internal > function so I don't see why it must be generic. Because it's just plain easier to handle such code - just looking at the parameters will tell you what it does despite some magic value you'll have to track down. While this is imperative for public APIs, it's still nice for private ones, no? > -ENOMEM is the value user > returns to avoid rej, so it seems cleaner to just do the check where it's > actually needed and avoid extra branches - and I expect this will be easier to > extend if we have more codes. > > > Index: infiniband/core/cm.c > > =================================================================== > > --- infiniband/core/cm.c (revision 8565) > > +++ infiniband/core/cm.c (working copy) > > @@ -702,7 +702,7 @@ static void cm_reset_to_idle(struct cm_i > > } > > } > > > > -static void cm_destroy_id(struct ib_cm_id *cm_id, int err) > > +static void cm_destroy_id(struct ib_cm_id *cm_id, int reject) > > { > > struct cm_id_private *cm_id_priv; > > struct cm_work *work; > > @@ -737,14 +737,15 @@ retest: > > NULL, 0); > > break; > > case IB_CM_REQ_RCVD: > > - if (err == -ENOMEM) { > > + if (reject) { > > + spin_unlock_irqrestore(&cm_id_priv->lock, flags); > > + ib_send_cm_rej(cm_id, IB_CM_REJ_CONSUMER_DEFINED, > > + NULL, 0, NULL, 0); > > + } else { > > /* Do not reject to allow future retries. 
*/ > > cm_reset_to_idle(cm_id_priv); > > spin_unlock_irqrestore(&cm_id_priv->lock, flags); > > } else { > > - spin_unlock_irqrestore(&cm_id_priv->lock, flags); > > - ib_send_cm_rej(cm_id, IB_CM_REJ_CONSUMER_DEFINED, > > - NULL, 0, NULL, 0); > > } > > This looks like "if {} else {} else {}" - am I missing something? Oops, you're right. Don't know how this one slipped in? Arne -- Arne Redlich Xiranet Communications GmbH From minich at ornl.gov Tue Jul 18 07:58:26 2006 From: minich at ornl.gov (Makia Minich) Date: Tue, 18 Jul 2006 10:58:26 -0400 Subject: [openib-general] Problem with OFED on XT3 Message-ID: First, a little bit about what I'm trying to do (hoping that someone becomes interested enough to keep reading), and then the problem. I'm currently tasked with getting some form of infiniband up and running on a service node of the Cray XT3. Because the XT3 is currently shipping with SuSE9 (with the 2.6.5 based kernel) I decided to go with the OFED 1.0.1 release to see out of the box what is going to happen. Because of the system layout, I'm unable to change out the kernel, so there were some minor OFED source tweaks that I needed to perform (attached) to satisfy some missing symbols. On loading modules, I was seemingly successful loading everything up to and including ib_ipoib. Ifconfig showed the ib0 and ib1 devices available, and /sys/class/infiniband showed link to the subnet manager was in place. 
Attempting to assign an ip-address to the interface proved to be too much, as the node kernel panicked with the following: general protection fault: 0000 [1] CPU 0 Pid: 11258, comm: ifconfig Tainted: P U (2.6.5-7.252-ss ) RIP: 0010:[] {__kfree_skb+173} RSP: 0018:00000100c3cf3af8 EFLAGS: 00010286 RAX: 1b6012ffffffff00 RBX: 0000000000000000 RCX: ffffffffffffffe8 RDX: 0000000000000000 RSI: ffffffff80421ba0 RDI: 0000010005cfd340 RBP: 00000100e0c97480 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffff4 R13: ffffffff8029eeb0 R14: 0000000000000000 R15: 0000000000000003 FS: 0000002a9588e0a0(0000) GS:ffffffff80514b40(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000002a9576650c CR3: 0000000000101000 CR4: 00000000000006e0 Process ifconfig (pid: 11258, threadinfo 00000100c3cf2000, task 00000100c3eba580) Stack: 0000000000000003 00000100c281f000 00000100e0c97480 ffffffff802ab825 00000100c281f000 ffffffff8029ef78 0000000000000000 00000100c281f000 0000000000000003 ffffffff802a86e3 Call Trace: {noop_enqueue+37} {dev_queue_xmit+200} {nf_hook_slow+227} {dev_queue_xmit+0} {igmp6_send+724} {fib6_walk_continue+192} {fib6_clean_node+0} {igmp6_join_group+51} {igmp6_group_added+191} {addrconf_prefix_route+225} {mld_del_delrec+117} {ipv6_dev_mc_inc+486} {addrconf_join_solict+59} {addrconf_dad_start+28} {addrconf_add_linklocal+43} {addrconf_dev_config+229} {addrconf_notify+123} {notifier_call_chain+31} {dev_open+261} {dev_change_flags+95} {devinet_ioctl+756} {inet_ioctl+87} {sock_ioctl+577} {sys_ioctl+532} {error_exit+0} {system_call+124} Code: ff 08 0f 94 c2 84 d2 74 09 48 8b 01 48 89 c7 ff 50 08 48 89 RIP {__kfree_skb+173} RSP <00000100c3cf3af8> <0>Kernel panic: Aiee, killing interrupt handler! In interrupt handler - not syncing Due to a lack of system dumps, I'm hoping that someone might have seen a similar panic and might offer some things to try to resolve this issue. Thanks... 
-- Makia Minich National Center for Computation Science Oak Ridge National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: cray_diffs.patch Type: application/octet-stream Size: 9107 bytes Desc: not available URL: From arne.redlich at xiranet.com Tue Jul 18 08:09:06 2006 From: arne.redlich at xiranet.com (Arne Redlich) Date: Tue, 18 Jul 2006 17:09:06 +0200 Subject: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID In-Reply-To: <20060718144359.GB18551@mellanox.co.il> References: <1153230821.6972.98.camel@confield.dd.xiranet.com> <20060718144359.GB18551@mellanox.co.il> Message-ID: <1153235346.6972.133.camel@confield.dd.xiranet.com> On Tuesday, 18 Jul 2006 at 17:43 +0300, Michael S. Tsirkin wrote: > Quoting r. Arne Redlich : > > Reject a REQ containing invalid GID(s) with appropriate reason codes > > instead of IB_CM_CONSUMER_REJ. > > > > Signed-off-by: Arne Redlich > > Are there actual applications that can use the exact reject status? Yep - the Gen1 SRP initiator does. It sends a REQ with an invalid DGID. If rejected with the correct code (INVALID GID), it will retry after looking up the GID. > I also > wonder what should the status be if there are multiple errors in a REQ. The first one discovered, no? > > @@ -1373,12 +1375,21 @@ static int cm_req_handler(struct cm_work > > > > cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]); > > ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av); > > - if (ret) > > + if (ret) { > > + cm_issue_rej(work->port, work->mad_recv_wc, IB_CM_INVALID_GID, > > + CM_MSG_RESPONSE_REQ, NULL, 0); > > + reject = 0; > > goto error3; > > + } > > if (req_msg->alt_local_lid) { > > ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av); > > - if (ret) > > + if (ret) { > > + cm_issue_rej(work->port, work->mad_recv_wc, > > + IB_CM_INVALID_ALT_GID, CM_MSG_RESPONSE_REQ, > > + NULL, 0); > > + reject = 0; > > goto error3; > > + } > > } > > Hmm ...
it looks like cm_init_av_by_path can also fail if ib_find_cached_pkey > returns an error - is it right that your patch will return invalid gid > in this case? > > Maybe the right thing to do is to > 1. Make cm_init_av_by_path return a more specific error in case of GID mismatch. > ENXIO might be a good fit, but we can always add our own Agreed. I've indeed overlooked it in the initial patch. > 2. Teach cm_destroy_id to send invalid gid reject on this error Hmm, I'd rather agree with Roland that passing a rej reason to cm_destroy_id() would be the cleanest way (while I attempted to go the easiest) - my brain cannot cope with too many layers of indirection that easily. ;-) Thanks, Arne -- Arne Redlich Xiranet Communications GmbH From mst at mellanox.co.il Tue Jul 18 08:19:52 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 18 Jul 2006 18:19:52 +0300 Subject: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID In-Reply-To: <1153235346.6972.133.camel@confield.dd.xiranet.com> References: <1153235346.6972.133.camel@confield.dd.xiranet.com> Message-ID: <20060718151952.GA19060@mellanox.co.il> Quoting r. Arne Redlich : > > > Reject a REQ containing invalid GID(s) with appropriate reason codes > > > instead of IB_CM_CONSUMER_REJ. > > > > > > Signed-off-by: Arne Redlich > > > > Are there actual applications that can use the exact reject status? > > Yep - the Gen1 SRP initiator does. You are implementing an SRP target on top of gen2 then? > It sends a REQ with an invalid DGID. > If rejected with the correct code (INVALID GID), it will retry after > looking up the GID. Interesting. Why is it doing this? Any idea where it got the wrong GID in the first place? Would other ULPs need such handling? > > I also > > wonder what should the status be if there are multiple errors in a REQ. > > The first one discovered, no? Or maybe the most severe one? The spec seems fuzzy in this respect ...
-- MST From halr at voltaire.com Tue Jul 18 08:09:39 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Jul 2006 11:09:39 -0400 Subject: [openib-general] [PATCH][TRIVIAL] OpenSM/osm_link_mgr.c: Eliminate duplicated line of code Message-ID: <1153235379.4540.22955.camel@hal.voltaire.com> OpenSM/osm_link_mgr.c: Eliminate duplicated line of code Signed-off-by: Hal Rosenstock Index: opensm/osm_link_mgr.c =================================================================== --- opensm/osm_link_mgr.c (revision 8572) +++ opensm/osm_link_mgr.c (working copy) @@ -180,7 +180,6 @@ __osm_link_mgr_set_physp_pi( PAST THIS POINT WE ARE HANDLING EITHER A NON PORT 0 OR ENHANCED PORT 0 */ - p_node = osm_physp_get_node_ptr( p_physp ); p_old_pi = osm_physp_get_port_info_ptr( p_physp ); memset( payload, 0, IB_SMP_DATA_SIZE ); From mshefty at ichips.intel.com Tue Jul 18 09:54:08 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 18 Jul 2006 09:54:08 -0700 Subject: [openib-general] [PATCH 1/2] ib_cm: cm_destroy_id() cleanup In-Reply-To: <1153230819.6972.97.camel@confield.dd.xiranet.com> References: <1153225007.6972.56.camel@confield.dd.xiranet.com> <1153230819.6972.97.camel@confield.dd.xiranet.com> Message-ID: <44BD1230.6000303@ichips.intel.com> Arne Redlich wrote: > In its current incarnation, cm_destroy_id() will not send a REJ if fed a > magic number (err = -ENOMEM). This patch replaces this magic number with > a more generic "reject" parameter. The intent here is that if the user destroys their cm_id in the middle of connection establishment, then the reject code is consumer reject. The cm_destroy_id() interface is generic, taking an error code as input. Since we don't know what state the cm_id is in when the call is made, I'm not sure that changing to a reject code is any more generic. That said, the err parameter is currently only used to reject a connection. See other response for an additional comment. 
- Sean From halr at voltaire.com Tue Jul 18 09:03:25 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Jul 2006 12:03:25 -0400 Subject: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID In-Reply-To: <20060718151952.GA19060@mellanox.co.il> References: <1153235346.6972.133.camel@confield.dd.xiranet.com> <20060718151952.GA19060@mellanox.co.il> Message-ID: <1153238605.4540.23436.camel@hal.voltaire.com> On Tue, 2006-07-18 at 11:19, Michael S. Tsirkin wrote: [snip...] > > > I also > > > wonder what should the status be if there are multiple errors in a REQ. > > > > The first one discovered, no? > > Or maybe the most severe one? The spec seems fuzzy in this respect ... There is no defined precedence of errors for CM in the IBA as far as I know so it's up to the implementer as to which one to return in the presence of multiple errors. -- Hal From mshefty at ichips.intel.com Tue Jul 18 10:09:22 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 18 Jul 2006 10:09:22 -0700 Subject: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID In-Reply-To: <1153230821.6972.98.camel@confield.dd.xiranet.com> References: <1153225007.6972.56.camel@confield.dd.xiranet.com> <1153230821.6972.98.camel@confield.dd.xiranet.com> Message-ID: <44BD15C2.9010602@ichips.intel.com> Arne Redlich wrote: > @@ -1354,6 +1354,7 @@ static int cm_req_handler(struct cm_work > id.local_id); > if (IS_ERR(cm_id_priv->timewait_info)) { > ret = PTR_ERR(cm_id_priv->timewait_info); > + reject = 1; > goto error1; > } An error here indicates that we were unable to allocate memory to record the incoming connection request. I didn't send a reject in this case to allow it to be retried. (We are unlikely to be able to send a reject if this allocation fails.) If there's a need to send a reject in this case, we should call cm_issue_rej(). I don't think that we can send one from cm_destroy_id() unless the cm_id has been initialized further.
> @@ -1373,12 +1375,21 @@ static int cm_req_handler(struct cm_work > > cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]); > ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av); > - if (ret) > + if (ret) { > + cm_issue_rej(work->port, work->mad_recv_wc, IB_CM_INVALID_GID, > + CM_MSG_RESPONSE_REQ, NULL, 0); Change to IB_CM_REJ_INVALID_GID. > + reject = 0; > goto error3; > + } > if (req_msg->alt_local_lid) { > ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av); > - if (ret) > + if (ret) { > + cm_issue_rej(work->port, work->mad_recv_wc, > + IB_CM_INVALID_ALT_GID, CM_MSG_RESPONSE_REQ, > + NULL, 0); IB_CM_REJ_INVALID_ALT_GID. - Sean From mshefty at ichips.intel.com Tue Jul 18 10:14:47 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 18 Jul 2006 10:14:47 -0700 Subject: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID In-Reply-To: <20060718144359.GB18551@mellanox.co.il> References: <1153230821.6972.98.camel@confield.dd.xiranet.com> <20060718144359.GB18551@mellanox.co.il> Message-ID: <44BD1707.3050503@ichips.intel.com> Michael S. Tsirkin wrote: >> ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av); >>- if (ret) >>+ if (ret) { >>+ cm_issue_rej(work->port, work->mad_recv_wc, >>+ IB_CM_INVALID_ALT_GID, CM_MSG_RESPONSE_REQ, >>+ NULL, 0); >>+ reject = 0; >> goto error3; >>+ } >> } > > > Hmm ... it looks like cm_init_av_by_path can also fail if ib_find_cached_pkey > returns an error - is it right that your patch will return invalid gid > in this case? > > Maybe the right thing to do is to > 1. Make cm_init_av_by_path return a more specific error in case of GID mismatch. > ENXIO might be a good fit, but we can always add our own > 2. Teach cm_destroy_id to send invalid gid reject on this error I'm not sure what the correct reject message would be for an invalid pkey... I agree that being more specific would be good though. 
- Sean From mshefty at ichips.intel.com Tue Jul 18 10:55:27 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 18 Jul 2006 10:55:27 -0700 Subject: [openib-general] [PATCH] RMPP: add Dual-sided RMPP support In-Reply-To: <200607181231.52254.jackm@mellanox.co.il> References: <000001c69cc2$9f7ac710$4e268686@amr.corp.intel.com> <200607181231.52254.jackm@mellanox.co.il> Message-ID: <44BD208F.6060307@ichips.intel.com> Jack Morgenstein wrote: >>+static void ack_ds_ack(struct ib_mad_agent_private *agent, >>+ struct ib_mad_recv_wc *recv_wc) >>+{ > > .... > > >>+ rmpp_mad->rmpp_hdr.seg_num = 0; This is the value sent on the wire in the header... > > The seg_num value will be set to 1 within ib_post_send_mad/ib_send_rmpp_mad -- > see line 869 of mad_rmpp.c: > if (rmpp_mad->rmpp_hdr.rmpp_type != IB_MGMT_RMPP_TYPE_DATA) { > mad_send_wr->seg_num = 1; > return IB_RMPP_RESULT_INTERNAL; > } > > In this case, the rmpp_type is NOT DATA, but is ACK, so seg_num will be set to > 1. ...and this is used to retrieve the correct payload segment sent with the MAD. (I _think_ it ends up being used in ib_get_payload(), but I'm not completely sure.) You should see seg_num = 0 in the turn-around ACK with an analyzer or madeye. > I also failed to see where the second side of the double-sided RMPP > transaction starts up (sending side). Are you referring to the side that sent the request? Assuming that's the case, then the existing code to reassemble the response was sufficient. This is because we send an initial window size of 1 in our turn-around ACK, which is the assumed window size for other RMPP transfers. - Sean From mst at mellanox.co.il Tue Jul 18 11:24:18 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 18 Jul 2006 21:24:18 +0300 Subject: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID In-Reply-To: <44BD1707.3050503@ichips.intel.com> References: <44BD1707.3050503@ichips.intel.com> Message-ID: <20060718182418.GB19613@mellanox.co.il> Quoting r. 
Sean Hefty : > Subject: Re: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID > > Michael S. Tsirkin wrote: > >> ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av); > >>- if (ret) > >>+ if (ret) { > >>+ cm_issue_rej(work->port, work->mad_recv_wc, > >>+ IB_CM_INVALID_ALT_GID, CM_MSG_RESPONSE_REQ, > >>+ NULL, 0); > >>+ reject = 0; > >> goto error3; > >>+ } > >> } > > > > > > Hmm ... it looks like cm_init_av_by_path can also fail if ib_find_cached_pkey > > returns an error - is it right that your patch will return invalid gid > > in this case? > > > > Maybe the right thing to do is to > > 1. Make cm_init_av_by_path return a more specific error in case of GID > > mismatch. ENXIO might be a good fit, but we can always add our own > > 2. Teach cm_destroy_id to send invalid gid reject on this error > > I'm not sure what the correct reject message would be for an invalid pkey... > > I agree that being more specific would be good though. > > - Sean > By the way, AFAIK by design cache might be out of sync with actual hardware. Roland, could you confirm this pls? So if we look things up in cache and they are not there there must be a retry strategy which is missing if we reject the connection. A quick solution would be to force cache update before reject, or query device directly. Comments? -- MST From swise at opengridcomputing.com Tue Jul 18 11:48:47 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 18 Jul 2006 13:48:47 -0500 Subject: [openib-general] [PATCH Round 4 0/3][RFC] Network Event Notifier Mechanism Message-ID: <20060718184847.25455.88191.stgit@dell3.ogc.int> All, I'm posting this one more time for a definitive decision on pulling in this netevent notifier patch. I've included in this patchset changes to the Infiniband Core to use netevents instead of packet snooping to discover IPoIB ARP changes. See patch 3/3 for the Infiniband changes. Thanks, Steve. 
Round 4 Changes: - changed drivers/infiniband/core/addr.c to use netevents for discovering IPoIB ARP events. Round 3 Changes: - changed netlink msg for neighbour change to (RTM_NEIGHUPD) - added netlink msg for PMTU change events (RTM_ROUTEUPD) - added netlink messages for redirect (RTM_DELROUTE + RTM_NEWROUTE) - tested neighbour change events via netlink for ipv4 and ipv6. - tested redirect change events via netlink for ipv4. Round 2 Changes: - cleaned up event structures per review feedback. - began integration with netlink (see neighbour changes in patch 2). - added IPv6 support. ------ This patch implements a mechanism that allows interested clients to register for notification of certain network events. The intended use is to allow RDMA devices (linux/drivers/infiniband) to be notified of neighbour updates, ICMP redirects, path MTU changes, and route changes. The reason these devices need update events is because they typically cache this information in hardware and need to be notified when this information has been updated. For information on RDMA protocols, see: http://www.ietf.org/html.charters/rddp-charter.html. The key events of interest are: - neighbour mac address change - routing redirect (the next hop neighbour changes for a dst_entry) - path mtu change (the path mtu for a dst_entry changes). - route add/deletes NOTE: These new netevents are also passed up to user space via netlink. We would like to get this or similar functionality included in 2.6.19 and request comments. This patchset consists of 3 patches: 1) New files implementing the Network Event Notifier 2) Core network changes to generate network event notifications 3) Cleanup ib_addr modules to use the netevent patch Signed-off-by: Tom Tucker Signed-off-by: Steve Wise From swise at opengridcomputing.com Tue Jul 18 11:48:57 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 18 Jul 2006 13:48:57 -0500 Subject: [openib-general] [PATCH Round 4 1/3] Network Event Notifier Mechanism. 
In-Reply-To: <20060718184847.25455.88191.stgit@dell3.ogc.int> References: <20060718184847.25455.88191.stgit@dell3.ogc.int> Message-ID: <20060718184857.25455.75816.stgit@dell3.ogc.int> This patch uses notifier blocks to implement a network event notifier mechanism. Clients register their callback function by calling register_netevent_notifier() like this: static struct notifier_block nb = { .notifier_call = my_callback_func }; ... register_netevent_notifier(&nb); --- include/net/netevent.h | 49 +++++++++++++++++++++++++++++++++++ net/core/netevent.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 117 insertions(+), 0 deletions(-) diff --git a/include/net/netevent.h b/include/net/netevent.h new file mode 100644 index 0000000..22214c8 --- /dev/null +++ b/include/net/netevent.h @@ -0,0 +1,49 @@ +#ifndef _NET_EVENT_H +#define _NET_EVENT_H + +/* + * Generic netevent notifiers + * + * Authors: + * Tom Tucker + * + * Changes: + */ + +#ifdef __KERNEL__ + +#include + +/* + * Generic route info structure. 
+ * + * Family Data ptr type + * -------------------------------- + * AF_INET - struct fib_info * + * AF_INET6 - struct rt6_info * + * AF_DECnet - struct dn_route * + */ +struct netevent_route_info { + u16 family; + void *data; +}; + +struct netevent_redirect { + struct dst_entry *old; + struct dst_entry *new; +}; + +enum netevent_notif_type { + NETEVENT_NEIGH_UPDATE = 1, /* arg is struct neighbour ptr */ + NETEVENT_ROUTE_ADD, /* arg is struct netevent_route_info ptr */ + NETEVENT_ROUTE_DEL, /* arg is struct netevent_route_info ptr */ + NETEVENT_PMTU_UPDATE, /* arg is struct dst_entry ptr */ + NETEVENT_REDIRECT, /* arg is struct netevent_redirect ptr */ +}; + +extern int register_netevent_notifier(struct notifier_block *nb); +extern int unregister_netevent_notifier(struct notifier_block *nb); +extern int call_netevent_notifiers(unsigned long val, void *v); + +#endif +#endif diff --git a/net/core/netevent.c b/net/core/netevent.c new file mode 100644 index 0000000..e995751 --- /dev/null +++ b/net/core/netevent.c @@ -0,0 +1,68 @@ +/* + * Network event notifiers + * + * Authors: + * Tom Tucker + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + * Fixes: + */ + +#include +#include + +static ATOMIC_NOTIFIER_HEAD(netevent_notif_chain); + +/** + * register_netevent_notifier - register a netevent notifier block + * @nb: notifier + * + * Register a notifier to be called when a netevent occurs. + * The notifier passed is linked into the kernel structures and must + * not be reused until it has been unregistered. A negative errno code + * is returned on a failure. 
+ */ +int register_netevent_notifier(struct notifier_block *nb) +{ + int err; + + err = atomic_notifier_chain_register(&netevent_notif_chain, nb); + return err; +} + +/** + * unregister_netevent_notifier - unregister a netevent notifier block + * @nb: notifier + * + * Unregister a notifier previously registered by + * register_netevent_notifier(). The notifier is unlinked from the + * kernel structures and may then be reused. A negative errno code + * is returned on a failure. + */ + +int unregister_netevent_notifier(struct notifier_block *nb) +{ + return atomic_notifier_chain_unregister(&netevent_notif_chain, nb); +} + +/** + * call_netevent_notifiers - call all netevent notifier blocks + * @val: value passed unmodified to notifier function + * @v: pointer passed unmodified to notifier function + * + * Call all netevent notifier blocks. Parameters and return value + * are as for notifier_call_chain(). + */ + +int call_netevent_notifiers(unsigned long val, void *v) +{ + return atomic_notifier_call_chain(&netevent_notif_chain, val, v); +} + +EXPORT_SYMBOL_GPL(register_netevent_notifier); +EXPORT_SYMBOL_GPL(unregister_netevent_notifier); +EXPORT_SYMBOL_GPL(call_netevent_notifiers); From swise at opengridcomputing.com Tue Jul 18 11:49:07 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 18 Jul 2006 13:49:07 -0500 Subject: [openib-general] [PATCH Round 4 2/3] Core network changes to support network event notification. In-Reply-To: <20060718184847.25455.88191.stgit@dell3.ogc.int> References: <20060718184847.25455.88191.stgit@dell3.ogc.int> Message-ID: <20060718184907.25455.8174.stgit@dell3.ogc.int> This patch adds netevent and netlink calls for neighbour change, route add/del, pmtu change, and routing redirect events. Netlink Details: Neighbour change events are broadcast as a new ndmsg type RTM_NEIGHUPD. Path mtu change events are broadcast as a new rtmsg type RTM_ROUTEUPD.
Routing redirect events are broadcast as a pair of rtmsgs, RTM_DELROUTE and RTM_NEWROUTE. --- include/linux/rtnetlink.h | 4 ++ net/core/Makefile | 2 + net/core/neighbour.c | 37 ++++++++++++++++--- net/ipv4/fib_semantics.c | 9 +++++ net/ipv4/route.c | 86 ++++++++++++++++++++++++++++++++++++++++++-- net/ipv6/route.c | 87 +++++++++++++++++++++++++++++++++++++++++++++ 6 files changed, 213 insertions(+), 12 deletions(-) diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h index facd9ee..340ca4f 100644 --- a/include/linux/rtnetlink.h +++ b/include/linux/rtnetlink.h @@ -35,6 +35,8 @@ #define RTM_NEWROUTE RTM_NEWROUTE #define RTM_DELROUTE RTM_DELROUTE RTM_GETROUTE, #define RTM_GETROUTE RTM_GETROUTE + RTM_ROUTEUPD, +#define RTM_ROUTEUPD RTM_ROUTEUPD RTM_NEWNEIGH = 28, #define RTM_NEWNEIGH RTM_NEWNEIGH @@ -42,6 +44,8 @@ #define RTM_NEWNEIGH RTM_NEWNEIGH #define RTM_DELNEIGH RTM_DELNEIGH RTM_GETNEIGH, #define RTM_GETNEIGH RTM_GETNEIGH + RTM_NEIGHUPD, +#define RTM_NEIGHUPD RTM_NEIGHUPD RTM_NEWRULE = 32, #define RTM_NEWRULE RTM_NEWRULE diff --git a/net/core/Makefile b/net/core/Makefile index e9bd246..2645ba4 100644 --- a/net/core/Makefile +++ b/net/core/Makefile @@ -7,7 +7,7 @@ obj-y := sock.o request_sock.o skbuff.o obj-$(CONFIG_SYSCTL) += sysctl_net_core.o -obj-y += dev.o ethtool.o dev_mcast.o dst.o \ +obj-y += dev.o ethtool.o dev_mcast.o dst.o netevent.o \ neighbour.o rtnetlink.o utils.o link_watch.o filter.o obj-$(CONFIG_XFRM) += flow.o diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 7ad681f..11c7643 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -29,9 +29,11 @@ #include #include #include #include +#include #include #include #include +#include #define NEIGH_DEBUG 1 @@ -58,6 +60,7 @@ static void neigh_app_notify(struct neig #endif static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev); void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev); +static void rtm_neigh_change(struct neighbour *n); 
static struct neigh_table *neigh_tables; #ifdef CONFIG_PROC_FS @@ -754,6 +757,7 @@ #endif neigh->nud_state = NUD_STALE; neigh->updated = jiffies; neigh_suspect(neigh); + notify = 1; } } else if (state & NUD_DELAY) { if (time_before_eq(now, @@ -762,6 +766,7 @@ #endif neigh->nud_state = NUD_REACHABLE; neigh->updated = jiffies; neigh_connect(neigh); + notify = 1; next = neigh->confirmed + neigh->parms->reachable_time; } else { NEIGH_PRINTK2("neigh %p is probed.\n", neigh); @@ -819,6 +824,8 @@ #endif out: write_unlock(&neigh->lock); } + if (notify) + rtm_neigh_change(neigh); #ifdef CONFIG_ARPD if (notify && neigh->parms->app_probes) @@ -926,9 +933,7 @@ int neigh_update(struct neighbour *neigh { u8 old; int err; -#ifdef CONFIG_ARPD int notify = 0; -#endif struct net_device *dev; int update_isrouter = 0; @@ -948,9 +953,7 @@ #endif neigh_suspect(neigh); neigh->nud_state = new; err = 0; -#ifdef CONFIG_ARPD notify = old & NUD_VALID; -#endif goto out; } @@ -1022,9 +1025,7 @@ #endif if (!(new & NUD_CONNECTED)) neigh->confirmed = jiffies - (neigh->parms->base_reachable_time << 1); -#ifdef CONFIG_ARPD notify = 1; -#endif } if (new == old) goto out; @@ -1055,7 +1056,11 @@ out: (neigh->flags | NTF_ROUTER) : (neigh->flags & ~NTF_ROUTER); } + write_unlock_bh(&neigh->lock); + + if (notify) + rtm_neigh_change(neigh); #ifdef CONFIG_ARPD if (notify && neigh->parms->app_probes) neigh_app_notify(neigh); @@ -2369,9 +2374,27 @@ static void neigh_app_notify(struct neig NETLINK_CB(skb).dst_group = RTNLGRP_NEIGH; netlink_broadcast(rtnl, skb, 0, RTNLGRP_NEIGH, GFP_ATOMIC); } - #endif /* CONFIG_ARPD */ +static void rtm_neigh_change(struct neighbour *n) +{ + struct nlmsghdr *nlh; + int size = NLMSG_SPACE(sizeof(struct ndmsg) + 256); + struct sk_buff *skb = alloc_skb(size, GFP_ATOMIC); + + call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); + if (!skb) + return; + + if (neigh_fill_info(skb, n, 0, 0, RTM_NEIGHUPD, 0) < 0) { + kfree_skb(skb); + return; + } + nlh = (struct nlmsghdr *)skb->data; + 
NETLINK_CB(skb).dst_group = RTNLGRP_NEIGH; + netlink_broadcast(rtnl, skb, 0, RTNLGRP_NEIGH, GFP_ATOMIC); +} + #ifdef CONFIG_SYSCTL static struct neigh_sysctl_table { diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index 5f87533..33d8a83 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -44,6 +44,7 @@ #include #include #include #include +#include #include "fib_lookup.h" @@ -279,6 +280,14 @@ void rtmsg_fib(int event, u32 key, struc struct sk_buff *skb; u32 pid = req ? req->pid : n->nlmsg_pid; int size = NLMSG_SPACE(sizeof(struct rtmsg)+256); + struct netevent_route_info nri; + int netevent; + + nri.family = AF_INET; + nri.data = &fa->fa_info; + netevent = event == RTM_NEWROUTE ? NETEVENT_ROUTE_ADD + : NETEVENT_ROUTE_DEL; + call_netevent_notifiers(netevent, &nri); skb = alloc_skb(size, GFP_KERNEL); if (!skb) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 2dc6dbb..18879e6 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -104,6 +104,7 @@ #include #include #include #include +#include #ifdef CONFIG_SYSCTL #include #endif @@ -151,6 +152,8 @@ static struct dst_entry *ipv4_negative_a static void ipv4_link_failure(struct sk_buff *skb); static void ip_rt_update_pmtu(struct dst_entry *dst, u32 mtu); static int rt_garbage_collect(void); +static int rt_fill_info(struct sk_buff *skb, u32 pid, u32 seq, int event, + int nowait, unsigned int flags, unsigned int prot); static struct dst_ops ipv4_dst_ops = { @@ -1117,6 +1120,52 @@ static void rt_del(unsigned hash, struct spin_unlock_bh(rt_hash_lock_addr(hash)); } +static void rtm_redirect(struct rtable *old, struct rtable *new) +{ + struct netevent_redirect netevent; + struct sk_buff *skb; + int err; + + netevent.old = &old->u.dst; + netevent.new = &new->u.dst; + + /* notify netevent subscribers */ + call_netevent_notifiers(NETEVENT_REDIRECT, &netevent); + + /* Post NETLINK messages: RTM_DELROUTE for old route, + RTM_NEWROUTE for new route */ + skb = alloc_skb(NLMSG_GOODSIZE, 
GFP_ATOMIC); + if (!skb) + return; + skb->mac.raw = skb->nh.raw = skb->data; + skb->dst = &old->u.dst; + NETLINK_CB(skb).dst_pid = 0; + + err = rt_fill_info(skb, 0, 0, RTM_DELROUTE, 1, 0, RTPROT_UNSPEC); + if (err <= 0) + goto out_free; + + netlink_broadcast(rtnl, skb, 0, RTNLGRP_IPV4_ROUTE, GFP_ATOMIC); + + skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC); + if (!skb) + return; + skb->mac.raw = skb->nh.raw = skb->data; + skb->dst = &new->u.dst; + NETLINK_CB(skb).dst_pid = 0; + + err = rt_fill_info(skb, 0, 0, RTM_NEWROUTE, 1, 0, RTPROT_REDIRECT); + if (err <= 0) + goto out_free; + + netlink_broadcast(rtnl, skb, 0, RTNLGRP_IPV4_ROUTE, GFP_ATOMIC); + return; + +out_free: + kfree_skb(skb); + return; +} + void ip_rt_redirect(u32 old_gw, u32 daddr, u32 new_gw, u32 saddr, struct net_device *dev) { @@ -1216,6 +1265,8 @@ void ip_rt_redirect(u32 old_gw, u32 dadd rt_drop(rt); goto do_next; } + + rtm_redirect(rth, rt); rt_del(hash, rth); if (!rt_intern_hash(hash, rt, &rt)) @@ -1442,6 +1493,32 @@ unsigned short ip_rt_frag_needed(struct return est_mtu ? 
: new_mtu; } +static void rtm_pmtu_update(struct rtable *rt) +{ + struct sk_buff *skb; + int err; + + call_netevent_notifiers(NETEVENT_PMTU_UPDATE, &rt->u.dst); + + skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC); + if (!skb) + return; + skb->mac.raw = skb->nh.raw = skb->data; + skb->dst = &rt->u.dst; + NETLINK_CB(skb).dst_pid = 0; + + err = rt_fill_info(skb, 0, 0, RTM_ROUTEUPD, 1, 0, RTPROT_UNSPEC); + if (err <= 0) + goto out_free; + + netlink_broadcast(rtnl, skb, 0, RTNLGRP_IPV4_ROUTE, GFP_ATOMIC); + return; + +out_free: + kfree_skb(skb); + return; +} + static void ip_rt_update_pmtu(struct dst_entry *dst, u32 mtu) { if (dst->metrics[RTAX_MTU-1] > mtu && mtu >= 68 && @@ -1452,6 +1529,7 @@ static void ip_rt_update_pmtu(struct dst } dst->metrics[RTAX_MTU-1] = mtu; dst_set_expires(dst, ip_rt_mtu_expires); + rtm_pmtu_update((struct rtable *)dst); } } @@ -2627,7 +2705,7 @@ int ip_route_output_key(struct rtable ** } static int rt_fill_info(struct sk_buff *skb, u32 pid, u32 seq, int event, - int nowait, unsigned int flags) + int nowait, unsigned int flags, unsigned int prot) { struct rtable *rt = (struct rtable*)skb->dst; struct rtmsg *r; @@ -2646,7 +2724,7 @@ #endif r->rtm_table = RT_TABLE_MAIN; r->rtm_type = rt->rt_type; r->rtm_scope = RT_SCOPE_UNIVERSE; - r->rtm_protocol = RTPROT_UNSPEC; + r->rtm_protocol = prot; r->rtm_flags = (rt->rt_flags & ~0xFFFF) | RTM_F_CLONED; if (rt->rt_flags & RTCF_NOTIFY) r->rtm_flags |= RTM_F_NOTIFY; @@ -2792,7 +2870,7 @@ int inet_rtm_getroute(struct sk_buff *in NETLINK_CB(skb).dst_pid = NETLINK_CB(in_skb).pid; err = rt_fill_info(skb, NETLINK_CB(in_skb).pid, nlh->nlmsg_seq, - RTM_NEWROUTE, 0, 0); + RTM_NEWROUTE, 0, 0, RTPROT_UNSPEC); if (!err) goto out_free; if (err < 0) { @@ -2830,7 +2908,7 @@ int ip_rt_dump(struct sk_buff *skb, str skb->dst = dst_clone(&rt->u.dst); if (rt_fill_info(skb, NETLINK_CB(cb->skb).pid, cb->nlh->nlmsg_seq, RTM_NEWROUTE, - 1, NLM_F_MULTI) <= 0) { + 1, NLM_F_MULTI, RTPROT_UNSPEC) <= 0) { dst_release(xchg(&skb->dst, 
NULL)); rcu_read_unlock_bh(); goto done; diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 87c39c9..a2b1d53 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -53,6 +53,7 @@ #include #include #include #include +#include #include @@ -96,6 +97,10 @@ static int ip6_pkt_discard(struct sk_bu static int ip6_pkt_discard_out(struct sk_buff *skb); static void ip6_link_failure(struct sk_buff *skb); static void ip6_rt_update_pmtu(struct dst_entry *dst, u32 mtu); +static int rt6_fill_node(struct sk_buff *skb, struct rt6_info *rt, + struct in6_addr *dst, struct in6_addr *src, + int iif, int type, u32 pid, u32 seq, + int prefix, unsigned int flags); #ifdef CONFIG_IPV6_ROUTE_INFO static struct rt6_info *rt6_add_route_info(struct in6_addr *prefix, int prefixlen, @@ -731,6 +736,32 @@ static void ip6_link_failure(struct sk_b } } +static void rtm_pmtu_update(struct rt6_info *rt) +{ + struct sk_buff *skb; + int err; + + call_netevent_notifiers(NETEVENT_PMTU_UPDATE, &rt->u.dst); + + skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC); + if (!skb) + return; + skb->mac.raw = skb->nh.raw = skb->data; + skb->dst = &rt->u.dst; + NETLINK_CB(skb).dst_pid = 0; + + err = rt6_fill_node(skb, rt, NULL, NULL, 0, RTM_ROUTEUPD, 0, 0, 0, 0); + if (err <= 0) + goto out_free; + + netlink_broadcast(rtnl, skb, 0, RTNLGRP_IPV6_ROUTE, GFP_ATOMIC); + return; + +out_free: + kfree_skb(skb); + return; +} + static void ip6_rt_update_pmtu(struct dst_entry *dst, u32 mtu) { struct rt6_info *rt6 = (struct rt6_info*)dst; @@ -742,6 +773,7 @@ static void ip6_rt_update_pmtu(struct ds dst->metrics[RTAX_FEATURES-1] |= RTAX_FEATURE_ALLFRAG; } dst->metrics[RTAX_MTU-1] = mtu; + rtm_pmtu_update(rt6); } } @@ -907,6 +939,7 @@ int ip6_route_add(struct in6_rtmsg *rtms struct net_device *dev = NULL; struct inet6_dev *idev = NULL; int addr_type; + struct netevent_route_info nri; rta = (struct rtattr **) _rtattr; @@ -1085,6 +1118,9 @@ install_route: rt->u.dst.metrics[RTAX_ADVMSS-1] = ipv6_advmss(dst_mtu(&rt->u.dst)); rt->u.dst.dev 
= dev; rt->rt6i_idev = idev; + nri.family = AF_INET6; + nri.data = rt; + call_netevent_notifiers(NETEVENT_ROUTE_ADD, &nri); return ip6_ins_rt(rt, nlh, _rtattr, req); out: @@ -1116,6 +1152,7 @@ static int ip6_route_del(struct in6_rtms struct fib6_node *fn; struct rt6_info *rt; int err = -ESRCH; + struct netevent_route_info nri; read_lock_bh(&rt6_lock); @@ -1137,6 +1174,10 @@ static int ip6_route_del(struct in6_rtms continue; dst_hold(&rt->u.dst); read_unlock_bh(&rt6_lock); + + nri.family = AF_INET6; + nri.data = rt; + call_netevent_notifiers(NETEVENT_ROUTE_DEL, &nri); return ip6_del_rt(rt, nlh, _rtattr, req); } @@ -1146,6 +1187,50 @@ static int ip6_route_del(struct in6_rtms return err; } +static void rtm_redirect(struct rt6_info *old, struct rt6_info *new) +{ + struct netevent_redirect netevent; + struct sk_buff *skb; + int err; + + netevent.old = &old->u.dst; + netevent.new = &new->u.dst; + call_netevent_notifiers(NETEVENT_REDIRECT, &netevent); + + /* Post NETLINK messages: RTM_DELROUTE for old route, + RTM_NEWROUTE for new route */ + skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC); + if (!skb) + return; + skb->mac.raw = skb->nh.raw = skb->data; + NETLINK_CB(skb).dst_pid = 0; + NETLINK_CB(skb).dst_group = RTNLGRP_IPV6_ROUTE; + + err = rt6_fill_node(skb, old, NULL, NULL, 0, RTM_DELROUTE, 0, 0, 0, 0); + if (err <= 0) + goto out_free; + + netlink_broadcast(rtnl, skb, 0, RTNLGRP_IPV6_ROUTE, GFP_ATOMIC); + + skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC); + if (!skb) + return; + skb->mac.raw = skb->nh.raw = skb->data; + NETLINK_CB(skb).dst_pid = 0; + NETLINK_CB(skb).dst_group = RTNLGRP_IPV6_ROUTE; + + err = rt6_fill_node(skb, new, NULL, NULL, 0, RTM_NEWROUTE, 0, 0, 0, 0); + if (err <= 0) + goto out_free; + + netlink_broadcast(rtnl, skb, 0, RTNLGRP_IPV6_ROUTE, GFP_ATOMIC); + return; + +out_free: + kfree_skb(skb); + return; +} + /* * Handle redirects */ @@ -1252,6 +1337,8 @@ restart: if (ip6_ins_rt(nrt, NULL, NULL, NULL)) goto out; + rtm_redirect(rt, nrt); + if 
(rt->rt6i_flags&RTF_CACHE) { ip6_del_rt(rt, NULL, NULL, NULL); return; From swise at opengridcomputing.com Tue Jul 18 11:49:17 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 18 Jul 2006 13:49:17 -0500 Subject: [openib-general] [PATCH Round 4 3/3] Cleanup ib_addr module to use the netevent patch. In-Reply-To: <20060718184847.25455.88191.stgit@dell3.ogc.int> References: <20060718184847.25455.88191.stgit@dell3.ogc.int> Message-ID: <20060718184917.25455.75688.stgit@dell3.ogc.int> --- drivers/infiniband/core/addr.c | 30 ++++++++++++++---------------- 1 files changed, 14 insertions(+), 16 deletions(-) diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index d294bbc..1205e80 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -35,6 +35,7 @@ #include #include #include #include +#include #include MODULE_AUTHOR("Sean Hefty"); @@ -326,25 +327,22 @@ void rdma_addr_cancel(struct rdma_dev_ad } EXPORT_SYMBOL(rdma_addr_cancel); -static int addr_arp_recv(struct sk_buff *skb, struct net_device *dev, - struct packet_type *pkt, struct net_device *orig_dev) +static int netevent_callback(struct notifier_block *self, unsigned long event, + void *ctx) { - struct arphdr *arp_hdr; + if (event == NETEVENT_NEIGH_UPDATE) { + struct neighbour *neigh = ctx; - arp_hdr = (struct arphdr *) skb->nh.raw; - - if (arp_hdr->ar_op == htons(ARPOP_REQUEST) || - arp_hdr->ar_op == htons(ARPOP_REPLY)) - set_timeout(jiffies); - - kfree_skb(skb); + if (neigh->dev->type == ARPHRD_INFINIBAND && + (neigh->nud_state & NUD_VALID)) { + set_timeout(jiffies); + } + } return 0; } -static struct packet_type addr_arp = { - .type = __constant_htons(ETH_P_ARP), - .func = addr_arp_recv, - .af_packet_priv = (void*) 1, +static struct notifier_block nb = { + .notifier_call = netevent_callback }; static int addr_init(void) @@ -353,13 +351,13 @@ static int addr_init(void) if (!addr_wq) return -ENOMEM; - dev_add_pack(&addr_arp); + 
register_netevent_notifier(&nb); return 0; } static void addr_cleanup(void) { - dev_remove_pack(&addr_arp); + unregister_netevent_notifier(&nb); destroy_workqueue(addr_wq); } From sean.hefty at intel.com Tue Jul 18 11:47:59 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 18 Jul 2006 11:47:59 -0700 Subject: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID In-Reply-To: <20060718182418.GB19613@mellanox.co.il> Message-ID: <000101c6aa9a$b1371360$e598070a@amr.corp.intel.com> >By the way, AFAIK by design cache might be out of sync with actual hardware. >Roland, could you confirm this pls? > >So if we look things up in cache and they are not there there must be >a retry strategy which is missing if we reject the connection. >A quick solution would be to force cache update before reject, >or query device directly. If we're trying to look up a specific value, and it's not found in the cache, my preference would be for the cache code to update itself and retry the check before returning. - Sean From mst at mellanox.co.il Tue Jul 18 12:09:53 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 18 Jul 2006 22:09:53 +0300 Subject: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID In-Reply-To: <000101c6aa9a$b1371360$e598070a@amr.corp.intel.com> References: <000101c6aa9a$b1371360$e598070a@amr.corp.intel.com> Message-ID: <20060718190953.GC19613@mellanox.co.il> Quoting r. Sean Hefty : > Subject: RE: [PATCH 2/2] ib_cm: fix REJ due to invalid GID > > >By the way, AFAIK by design cache might be out of sync with actual hardware. > >Roland, could you confirm this pls? > > > >So if we look things up in cache and they are not there there must be > >a retry strategy which is missing if we reject the connection. > >A quick solution would be to force cache update before reject, > >or query device directly. 
> > If we're trying to look up a specific value, and it's not found in the cache, my > preference would be for the cache code to update itself and retry the check > before returning. That's impossible without an API change since cache queries could be used from interrupt context as well, while a cache update might sleep. My preference would be to add a new set of APIs that wrap the existing cache and do what you propose - this way they can be added where required and not all over the code. -- MST From mshefty at ichips.intel.com Tue Jul 18 12:21:07 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 18 Jul 2006 12:21:07 -0700 Subject: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID In-Reply-To: <1153235346.6972.133.camel@confield.dd.xiranet.com> References: <1153230821.6972.98.camel@confield.dd.xiranet.com> <20060718144359.GB18551@mellanox.co.il> <1153235346.6972.133.camel@confield.dd.xiranet.com> Message-ID: <44BD34A3.6030009@ichips.intel.com> Arne Redlich wrote: > Yep - the Gen1 SRP initiator does. It sends a REQ with an invalid DGID. > If rejected with the correct code (INVALID GID), it will retry after > looking up the GID. Didn't it have a DGID from a path record that it had to get before sending the REQ? >>> cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]); >>> ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av); >>>- if (ret) >>>+ if (ret) { >>>+ cm_issue_rej(work->port, work->mad_recv_wc, IB_CM_INVALID_GID, >>>+ CM_MSG_RESPONSE_REQ, NULL, 0); >>>+ reject = 0; >>> goto error3; >>>+ } >>> if (req_msg->alt_local_lid) { >>> ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av); >>>- if (ret) >>>+ if (ret) { >>>+ cm_issue_rej(work->port, work->mad_recv_wc, >>>+ IB_CM_INVALID_ALT_GID, CM_MSG_RESPONSE_REQ, >>>+ NULL, 0); >>>+ reject = 0; >>> goto error3; I missed this before, but the reject message must also contain a valid GID in the ARI data. I'm working on a patch based on the changes above to try to fix this.
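As a rough illustration of what "a valid GID in the ARI data" could look like, here is a small userspace sketch. It is not the patch being worked on and does not use the real ib_cm structures: `struct sketch_rej`, `sketch_fill_gid_rej()` and the 72-byte ARI bound are assumptions made for the example. It only shows a 16-byte GID being copied into the additional rejection information alongside the reason code, instead of passing NULL and 0 as in the quoted hunk.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_GID_LEN	16	/* an IB GID is 128 bits */
#define SKETCH_ARI_MAX	72	/* assumed upper bound on ARI length */

/* Hypothetical, simplified view of a reject message. */
struct sketch_rej {
	uint16_t reason;		/* reject reason code */
	uint8_t  ari_len;		/* number of valid bytes in ari[] */
	uint8_t  ari[SKETCH_ARI_MAX];	/* additional rejection information */
};

/*
 * Fill a reject for an invalid (primary or alternate) GID.  The ARI
 * carries a GID that the rejecting port would accept, so the sender
 * has something to retry with.  Returns 0 on success, -1 on bad input.
 */
static int sketch_fill_gid_rej(struct sketch_rej *rej, uint16_t reason,
			       const uint8_t valid_gid[SKETCH_GID_LEN])
{
	if (!rej || !valid_gid)
		return -1;

	memset(rej, 0, sizeof(*rej));
	rej->reason = reason;
	memcpy(rej->ari, valid_gid, SKETCH_GID_LEN);
	rej->ari_len = SKETCH_GID_LEN;
	return 0;
}
```

In the real code the GID would presumably come from the local port's GID table; that lookup is left out here.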
- Sean From halr at voltaire.com Tue Jul 18 12:11:04 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Jul 2006 15:11:04 -0400 Subject: [openib-general] [PATCH][MINOR]OpenSM/osm_sa_path_record.c: In __osm_pr_rcv_get_port_pair_paths, use ib_path_rec_num_path routine Message-ID: <1153249863.4540.25241.camel@hal.voltaire.com> OpenSM/osm_sa_path_record.c: In __osm_pr_rcv_get_port_pair_paths, use ib_path_rec_num_path routine Signed-off-by: Hal Rosenstock Index: opensm/osm_sa_path_record.c =================================================================== --- opensm/osm_sa_path_record.c (revision 8575) +++ opensm/osm_sa_path_record.c (working copy) @@ -874,7 +874,7 @@ __osm_pr_rcv_get_port_pair_paths( /* If SubnAdmGet, assume NumbPaths 1 (1.2 erratum) */ if( p_sa_mad->method != IB_MAD_METHOD_GET ) if( comp_mask & IB_PR_COMPMASK_NUMBPATH ) - iterations = p_pr->num_path & 0x7F; + iterations = ib_path_rec_num_path( p_pr ); else iterations = (uintn_t)(-1); else From rdreier at cisco.com Tue Jul 18 12:37:22 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 18 Jul 2006 12:37:22 -0700 Subject: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID References: <44BD1707.3050503@ichips.intel.com> <20060718182418.GB19613@mellanox.co.il> Message-ID: Michael> By the way, AFAIK by design cache might be out of sync Michael> with actual hardware. Roland, could you confirm this Michael> pls? Yes, that's true. But even without cache, since the update of GID and P_Key tables is asynchronous, there's nothing that prevents the result of a query of GID or P_Key from being wrong before the query function returns to the caller. So the cache is not really the problem. - R.
From halr at voltaire.com Tue Jul 18 12:52:37 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Jul 2006 15:52:37 -0400 Subject: [openib-general] [PATCH] osmtest/osmtest.c: Exercise SA PathRecord multicast destination code Message-ID: <1153252356.4540.25637.camel@hal.voltaire.com> osmtest/osmtest.c: Exercise SA PathRecord multicast destination code Signed-off-by: Hal Rosenstock Index: osmtest/osmtest.c =================================================================== --- osmtest/osmtest.c (revision 8575) +++ osmtest/osmtest.c (working copy) @@ -1006,6 +1006,78 @@ osmtest_get_path_rec_by_guid_pair( IN os return ( status ); } +/********************************************************************** + **********************************************************************/ +static ib_api_status_t +osmtest_get_path_rec_by_gid_pair( IN osmtest_t * const p_osmt, + IN ib_gid_t sgid, + IN ib_gid_t dgid, + IN osmtest_req_context_t *p_context) +{ + cl_status_t status = IB_SUCCESS; + osmv_query_req_t req; + osmv_gid_pair_t gid_pair; + + OSM_LOG_ENTER( &p_osmt->log, osmtest_get_path_rec_by_gid_pair); + + memset( p_context, 0, sizeof( *p_context ) ); + + p_context->p_osmt = p_osmt; + req.timeout_ms = p_osmt->opt.transaction_timeout; + req.retry_cnt = p_osmt->opt.retry_count; + req.flags = OSM_SA_FLAGS_SYNC; + req.query_context = p_context; + req.pfn_query_cb = osmtest_query_res_cb; + + req.query_type = OSMV_QUERY_PATH_REC_BY_GIDS; + + gid_pair.dest_gid = dgid; + gid_pair.src_gid = sgid; + + req.p_query_input = &gid_pair; + req.sm_key = 0; + + osm_log( &p_osmt->log, OSM_LOG_VERBOSE, + "osmtest_get_path_rec_by_gid_pair: " + "Query for path from 0x%016" PRIx64 " 0x%016" PRIx64 " to 0x%016" PRIx64 " 0x%016" PRIx64"\n", + sgid.unicast.prefix, sgid.unicast.interface_id, + dgid.unicast.prefix, dgid.unicast.interface_id ); + + status = osmv_query_sa( p_osmt->h_bind, &req ); + if( status != IB_SUCCESS ) + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, +
"osmtest_get_path_rec_by_gid_pair: ERR 006A: " + "ib_query failed (%s)\n", ib_get_err_str( status ) ); + goto Exit; + } + + status = (*p_context).result.status; + + + if( status != IB_SUCCESS ) + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_path_rec_by_gid_pair: ERR 006B: " + "ib_query failed (%s)\n", ib_get_err_str( status ) ); + + if( status == IB_REMOTE_ERROR ) + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_path_rec_by_gid_pair: " + "Remote error = %s\n", + ib_get_mad_status_str( osm_madw_get_mad_ptr + ( (*p_context).result.p_result_madw ) ) ); + } + goto Exit; + } + + Exit: + + OSM_LOG_EXIT( &p_osmt->log ); + return ( status ); +} + #if defined (VENDOR_RMPP_SUPPORT) && defined (DUAL_SIDED_RMPP) /********************************************************************** **********************************************************************/ @@ -5326,6 +5398,7 @@ static ib_api_status_t osmtest_validate_against_db( IN osmtest_t * const p_osmt ) { ib_api_status_t status = IB_SUCCESS; + ib_gid_t portgid, mgid; #ifdef VENDOR_RMPP_SUPPORT ib_net64_t sm_key; ib_net16_t test_lid; @@ -5348,6 +5421,15 @@ osmtest_validate_against_db( IN osmtest_ if( status != IB_SUCCESS ) goto Exit; + /* Exercise SA PathRecord multicast destination code */ + memset( &context, 0, sizeof( context ) ); + ib_gid_set_default( &portgid, portguid ); + /* Set IPoIB broadcast MGID */ + mgid.unicast.prefix = CL_HTON64(0xff12401bffff0000ULL); + mgid.unicast.interface_id = CL_HTON64(0x00000000ffffffffULL); + /* Can't check status as don't know whether port is running IPoIB */ + osmtest_get_path_rec_by_gid_pair( p_osmt, portgid, mgid, &context); + #if defined (VENDOR_RMPP_SUPPORT) && defined (DUAL_SIDED_RMPP) memset( &context, 0, sizeof( context ) ); memset( &request, 0, sizeof( request ) ); From halr at voltaire.com Tue Jul 18 13:51:34 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Jul 2006 16:51:34 -0400 Subject: [openib-general] [PATCH] OpenSM/doc: Add 
current-routing.txt Message-ID: <1153255892.4540.26224.camel@hal.voltaire.com> OpenSM/doc: Add current-routing.txt This document is a description of the current routing algorithms/methods implemented by OpenSM. Written with copious help from Eitan Zahavi and Sasha Khapyorsky Signed-off-by: Hal Rosenstock Index: doc/current-routing.txt =================================================================== --- doc/current-routing.txt (revision 0) +++ doc/current-routing.txt (revision 0) @@ -0,0 +1,147 @@ +Current OpenSM Routing +7/18/06 + +OpenSM offers two routing engines: + +1. Min Hop Algorithm - based on the minimum hops to each node where the +path length is optimized. + +2. UPDN Unicast routing algorithm - also based on the minimum hops to each +node, but it is constrained to ranking rules. This algorithm should be chosen +if the subnet is not a pure Fat Tree, and deadlock may occur due to a +loop in the subnet. + +OpenSM now also offers a file method which can load routes from a table. See +modular-routing.txt for more information on this. + +The basic routing algorithm is comprised of two stages: +1. MinHop matrix calculation + How many hops are required to get from each port to each LID ? + The algorithm to fill these tables is different if you run standard +(min hop) or Up/Down. + For standard routing, a "relaxation" algorithm is used to propagate +min hop from every destination LID through neighbor switches + For Up/Down routing, a BFS from every target is used. The BFS tracks link +direction (up or down) and avoid steps that will perform up after a down +step was used. + +2. Once MinHop matrices exist, each switch is visited and for each target LID a +decision is made as to what port should be used to get to that LID. + This step is common to standard and Up/Down routing. Each port has a +counter counting the number of target LIDs going through it. 
+ When there are multiple alternative ports with same MinHop to a LID, +the one with less previously assigned ports is selected. + If LMC > 0, more checks are added: Within each group of LIDs assigned to +same target port, + a. use only ports which have same MinHop + b. first prefer the ones that go to different systemImageGuid (then +the previous LID of the same LMC group) + c. if none - prefer those who go through other NodeGuid + d. fall back to the number of paths method (if all go to same node). + + +Effect of Topology Changes + +OpenSM will preserve existing routing in any case where there is no change in +the fabric switches unless the -r (--reassign_lids) option is specified. + +-r +--reassign_lids + This option causes OpenSM to reassign LIDs to all + end nodes. Specifying -r on a running subnet + may disrupt subnet traffic. + Without -r, OpenSM attempts to preserve existing + LID assignments resolving multiple use of same LID. + +If a link is added or removed, OpenSM does not recalculate +the routes that do not have to change. A route has to change +if the port is no longer UP or no longer the MinHop. When routing changes +are performed, the same algorithm for balancing the routes is invoked. + +In the case of using the file based routing, any topology changes are +currently ignored The 'file' routing engine just loads the LFTs from the file +specified, with no reaction to real topology. Obviously, this will not be able +to recheck LIDs (by GUID) for disconnected nodes, and LFTs for non-existent +switches will be skipped. Multicast is not affected by 'file' routing engine +(this uses min hop tables). + + +Min Hop Algorithm + +The Min Hop algorithm is invoked when neither UPDN or the file method are +specified. + +The Min Hop algorithm is divided into two stages: computation of +min-hop tables on every switch and LFT output port assignment. Link +subscription is also equalized with the ability to override based on +port GUID. 
The latter is supplied by: + +-i +-ignore-guids + This option provides the means to define a set of ports + (by guids) that will be ignored by the link load + equalization algorithm. + +LMC awareness routes based on (remote) system or switch basis. + + +Purpose of UPDN Algorithm + +The UPDN algorithm is designed to prevent deadlocks from occurring in loops +of the subnet. A loop-deadlock is a situation in which it is no longer +possible to send data between any two hosts connected through the loop. As +such, the UPDN routing algorithm should be used if the subnet is not a pure +Fat Tree, and one of its loops may experience a deadlock (due, for example, +to high pressure). + +The UPDN algorithm is based on the following main stages: + +1. Auto-detect root nodes - based on the HCA hop length from any switch in +the subnet, a statistical histogram is built for each switch (hop num vs +number of occurrences). If the histogram reflects a specific column (higher +than others) for a certain node, then it is marked as a root node. Since +the algorithm is statistical, it may not find any root nodes. The list of +the root nodes found by this auto-detect stage is used by the ranking +process stage. + + Note 1: The user can override the node list manually. + Note 2: If this stage cannot find any root nodes, and the user did not + specify a guid list file, OpenSM defaults back to the Min Hop + routing algorithm. + +2. Ranking process - All root switch nodes (found in stage 1) are assigned +a rank of 0. Using the BFS algorithm, the rest of the switch nodes in the +subnet are ranked incrementally. This ranking aids in the process of enforcing +rules that ensure loop-free paths. + +3. Min Hop Table setting - after ranking is done, a BFS algorithm is run from +each (HCA or switch) node in the subnet. During the BFS process, the FDB table +of each switch node traversed by BFS is updated, in reference to the starting +node, based on the ranking rules and guid values. 
+ +At the end of the process, the updated FDB tables ensure loop-free paths +through the subnet. + + +UPDN Algorithm Usage + +Activation through OpenSM + +Use '-R updn' option (instead of old '-u') to activate the UPDN algorithm. +Use `-a ' for adding an UPDN guid file that contains the +root nodes for ranking. +If the `-a' option is not used, OpenSM uses its auto-detect root nodes +algorithm. + +Notes on the guid list file: +1. A valid guid file specifies one guid in each line. Lines with an invalid +format will be discarded. +2. The user should specify the root switch guids. However, it is also +possible to specify HCA guids; OpenSM will use the guid of the switch (if +it exists) that connects the HCA to the subnet as a root node. + + +To learn more about deadlock-free routing, see the article +"Deadlock Free Message Routing in Multiprocessor Interconnection Networks" +by William J Dally and Charles L Seitz (1985). + From arlin.r.davis at intel.com Tue Jul 18 14:59:30 2006 From: arlin.r.davis at intel.com (Arlin Davis) Date: Tue, 18 Jul 2006 14:59:30 -0700 Subject: [openib-general] [PATCH] uDAPL - OpenIB-cma: added consumer wakeup mechanism for cq wait objects Message-ID: Fix for Bug 158. Add support for dat_evd_set_unwaitable on a DTO EVD. 
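The consumer wakeup idea named in the subject line, a pipe polled alongside the CQ event channel, can be sketched on its own. This is a minimal standalone illustration and not the uDAPL code: the names wait_obj, wait_obj_create, wait_obj_destroy, wait_obj_wakeup, and wait_obj_wait are hypothetical, and any readable file descriptor stands in for the completion channel's fd.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical wait object: an event fd plus a pipe used only for wakeups. */
struct wait_obj {
    int event_fd;   /* stand-in for the completion channel's fd */
    int pipe[2];    /* pipe[0] is polled, pipe[1] is written to wake the waiter */
};

/* Returns 0 on success, -1 on failure (errno is set by pipe()). */
static int wait_obj_create(struct wait_obj *w, int event_fd)
{
    w->event_fd = event_fd;
    return pipe(w->pipe);
}

/* Closes only the wakeup pipe; the caller owns event_fd. */
static void wait_obj_destroy(struct wait_obj *w)
{
    close(w->pipe[0]);
    close(w->pipe[1]);
}

/* Wake a thread blocked in wait_obj_wait(); returns 0 on success. */
static int wait_obj_wakeup(struct wait_obj *w)
{
    return write(w->pipe[1], "w", 1) == 1 ? 0 : -1;
}

/*
 * Block until the event fd is readable, a wakeup arrives, or the timeout
 * (in milliseconds, -1 for infinite) expires.
 * Returns 1 for an event, 2 for a wakeup, 0 for a timeout, -1 on error.
 */
static int wait_obj_wait(struct wait_obj *w, int timeout_ms)
{
    struct pollfd fds[2] = {
        { .fd = w->event_fd, .events = POLLIN, .revents = 0 },
        { .fd = w->pipe[0],  .events = POLLIN, .revents = 0 },
    };
    int n;
    char c;

    /* Restart the call only when a signal interrupted it. */
    do {
        n = poll(fds, 2, timeout_ms);
    } while (n == -1 && errno == EINTR);

    if (n <= 0)
        return n;
    if (fds[0].revents & POLLIN)
        return 1;
    if (fds[1].revents & POLLIN) {
        /* Drain one wakeup byte so the next wait blocks again. */
        if (read(w->pipe[0], &c, 1) != 1)
            return -1;
        return 2;
    }
    return -1;
}
```

Two choices in this sketch are assumptions rather than requirements: the wakeup byte is drained so a later wait does not return immediately, and poll() is retried only on EINTR so that other errors reach the caller.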
Signed-off by: Arlin Davis ardavis at ichips.intel.com Index: dapl/common/dapl_evd_util.c =================================================================== --- dapl/common/dapl_evd_util.c (revision 8559) +++ dapl/common/dapl_evd_util.c (working copy) @@ -239,7 +239,6 @@ dapls_evd_alloc ( #ifdef CQ_WAIT_OBJECT /* Create CQ wait object; no CNO and data stream type */ - evd_ptr->cq_wait_obj_handle = 0; if (( cno_ptr == NULL ) && ((evd_flags & ~ (DAT_EVD_DTO_FLAG|DAT_EVD_RMR_BIND_FLAG)) == 0 )) { Index: dapl/openib_cma/dapl_ib_qp.c =================================================================== --- dapl/openib_cma/dapl_ib_qp.c (revision 8559) +++ dapl/openib_cma/dapl_ib_qp.c (working copy) @@ -93,7 +93,7 @@ DAT_RETURN dapls_ib_qp_alloc(IN DAPL_IA ia_ptr->hca_ptr->ib_trans.ib_cq; #ifdef CQ_WAIT_OBJECT if (rcv_evd->cq_wait_obj_handle) - channel = rcv_evd->cq_wait_obj_handle; + channel = rcv_evd->cq_wait_obj_handle->events; #endif /* Call IB verbs to create CQ */ rcv_cq = ibv_create_cq(ia_ptr->hca_ptr->ib_hca_handle, Index: dapl/openib_cma/dapl_ib_util.h =================================================================== --- dapl/openib_cma/dapl_ib_util.h (revision 8565) +++ dapl/openib_cma/dapl_ib_util.h (working copy) @@ -106,7 +106,15 @@ typedef uint16_t ib_hca_port_t; typedef uint32_t ib_comp_handle_t; #ifdef CQ_WAIT_OBJECT -typedef struct ibv_comp_channel *ib_wait_obj_handle_t; + +/* CQ event channel, plus pipe to enable consumer wakeup */ +typedef struct _ib_wait_obj_handle +{ + struct ibv_comp_channel *events; + int pipe[2]; + +} *ib_wait_obj_handle_t; + #endif /* Definitions */ Index: dapl/openib_cma/dapl_ib_cq.c =================================================================== --- dapl/openib_cma/dapl_ib_cq.c (revision 8559) +++ dapl/openib_cma/dapl_ib_cq.c (working copy) @@ -240,7 +240,7 @@ dapls_ib_cq_alloc(IN DAPL_IA *ia_ptr, #ifdef CQ_WAIT_OBJECT if (evd_ptr->cq_wait_obj_handle) - channel = evd_ptr->cq_wait_obj_handle; + channel = 
evd_ptr->cq_wait_obj_handle->events; #endif /* Call IB verbs to create CQ */ @@ -299,7 +299,7 @@ dapls_ib_cq_resize(IN DAPL_IA *ia_ptr, #ifdef CQ_WAIT_OBJECT if (evd_ptr->cq_wait_obj_handle) - channel = evd_ptr->cq_wait_obj_handle; + channel = evd_ptr->cq_wait_obj_handle->events; #endif /* Call IB verbs to create CQ */ @@ -444,14 +444,36 @@ dapls_ib_wait_object_create(IN DAPL_EVD " cq_object_create: (%p,%p)\n", evd_ptr, p_cq_wait_obj_handle ); - /* set cq_wait object to evd_ptr */ *p_cq_wait_obj_handle = + dapl_os_alloc(sizeof(struct _ib_wait_obj_handle)); + + if (*p_cq_wait_obj_handle == NULL) + return(dapl_convert_errno(ENOMEM," wait_object_create")); + + dapl_os_memzero(*p_cq_wait_obj_handle, + sizeof(struct _ib_wait_obj_handle)); + + /* create pipe for waking up work thread */ + if (pipe((*p_cq_wait_obj_handle)->pipe)) + goto bail; + + /* set cq_wait object to evd_ptr */ + (*p_cq_wait_obj_handle)->events = ibv_create_comp_channel( evd_ptr->header.owner_ia->hca_ptr->ib_hca_handle); + if ((*p_cq_wait_obj_handle)->events == NULL) + goto bail; + return DAT_SUCCESS; +bail: + dapl_os_free(*p_cq_wait_obj_handle, + sizeof(struct _ib_wait_obj_handle)); + + return(dapl_convert_errno(errno," wait_object_create")); } + DAT_RETURN dapls_ib_wait_object_destroy(IN ib_wait_obj_handle_t p_cq_wait_obj_handle) { @@ -459,7 +481,10 @@ dapls_ib_wait_object_destroy(IN ib_wait_ " cq_object_destroy: wait_obj=%p\n", p_cq_wait_obj_handle ); - ibv_destroy_comp_channel(p_cq_wait_obj_handle); + ibv_destroy_comp_channel(p_cq_wait_obj_handle->events); + + dapl_os_free(p_cq_wait_obj_handle, + sizeof(struct _ib_wait_obj_handle)); return DAT_SUCCESS; } @@ -471,7 +496,8 @@ dapls_ib_wait_object_wakeup (IN ib_wait_ " cq_object_wakeup: wait_obj=%p\n", p_cq_wait_obj_handle ); - /* no wake up mechanism */ + /* write to pipe for wake up */ + write(p_cq_wait_obj_handle->pipe[1], "w", sizeof "w"); return DAT_SUCCESS; } @@ -483,30 +509,36 @@ dapls_ib_wait_object_wait(IN ib_wait_obj struct ibv_cq *ibv_cq = 
NULL; int status = 0; int timeout_ms = -1; - struct pollfd cq_fd = { - .fd = p_cq_wait_obj_handle->fd, - .events = POLLIN, - .revents = 0 - }; + struct pollfd ufds[2]; dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " cq_object_wait: CQ channel %p time %d\n", p_cq_wait_obj_handle, timeout ); + + /* setup cq event channel and pipe fd for consumer wakeup */ + ufds[0].fd = p_cq_wait_obj_handle->events->fd; + ufds[0].events = POLLIN; + ufds[0].revents = 0; + ufds[1].fd = p_cq_wait_obj_handle->pipe[0]; + ufds[1].events = POLLIN; + ufds[1].revents = 0; /* uDAPL timeout values in usecs */ if (timeout != DAT_TIMEOUT_INFINITE) timeout_ms = timeout/1000; /* restart syscall */ - while ((status = poll(&cq_fd, 1, timeout_ms)) == -1 ) + while ((status = poll(ufds, 2, timeout_ms)) == -1 ) if (errno == EINTR) continue; /* returned event */ if (status > 0) { - if (!ibv_get_cq_event(p_cq_wait_obj_handle, - &ibv_cq, (void*)&evd_ptr)) { - ibv_ack_cq_events(ibv_cq, 1); + if (ufds[0].revents == POLLIN) { + if (!ibv_get_cq_event(p_cq_wait_obj_handle->events, + &ibv_cq, (void*)&evd_ptr)) { + ibv_ack_cq_events(ibv_cq, 1); + } } status = 0; From sean.hefty at intel.com Tue Jul 18 15:09:21 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 18 Jul 2006 15:09:21 -0700 Subject: [openib-general] [PATCH] ib_cm: fix REJ due to invalid GID In-Reply-To: <44BD34A3.6030009@ichips.intel.com> Message-ID: <000201c6aab6$d267ddf0$e598070a@amr.corp.intel.com> The following patch should set the reject code properly when rejecting a request that contains an invalid GID. A suitable GID is returned by the IB CM in the additional reject information (ARI). 
Signed-off-by: Sean Hefty --- Index: cm.c =================================================================== --- cm.c (revision 8484) +++ cm.c (working copy) @@ -1372,12 +1372,23 @@ static int cm_req_handler(struct cm_work cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]); ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av); - if (ret) + if (ret) { + ib_get_cached_gid(work->port->cm_dev->device, + work->port->port_num, 0, &work->path[0].sgid); + cm_issue_rej(work->port, work->mad_recv_wc, + IB_CM_REJ_INVALID_GID, CM_MSG_RESPONSE_REQ, + &work->path[0].sgid, sizeof work->path[0].sgid); goto error3; + } if (req_msg->alt_local_lid) { ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av); - if (ret) + if (ret) { + cm_issue_rej(work->port, work->mad_recv_wc, + IB_CM_REJ_INVALID_ALT_GID, + CM_MSG_RESPONSE_REQ, &work->path[0].sgid, + sizeof work->path[0].sgid); goto error3; + } } cm_id_priv->tid = req_msg->hdr.tid; cm_id_priv->timeout_ms = cm_convert_to_ms( From ravinandan.arakali at neterion.com Tue Jul 18 17:30:26 2006 From: ravinandan.arakali at neterion.com (Ravinandan Arakali) Date: Tue, 18 Jul 2006 17:30:26 -0700 Subject: [openib-general] ping problem with ammassocards(iWARPinterface) Message-ID: <000b01c6aaca$885b5340$4710100a@pc.s2io.com> Steve/Pradipta, Without the -O2 option, rping is now working ! Earlier, I did not realize that the cable was yanked out. Thanks for all the help.. Ravi -----Original Message----- From: Ravinandan Arakali [mailto:ravinandan.arakali at neterion.com] Sent: Friday, July 14, 2006 3:37 PM To: 'Steve Wise' Cc: 'bpradip at in.ibm.com'; 'openib-general at openib.org'; Leonid. Grossman (E-mail) Subject: RE: [openib-general] ping problem with ammassocards(iWARPinterface) As Pradipta suggested, I rebuilt the libraries by removing the optimization(-O2 flag) from Makefile. Now, I don't see the core dump but there's no connection established with rping. This is similar to the failure I am seeing with rdma_lat test. 
BTW, when I start the rping in server mode, at say port 9999, should I expect to see an entity listening on that port number when I do "netstat -an"? Currently, I don't see that. Ravi -----Original Message----- From: Steve Wise [mailto:swise at opengridcomputing.com] Sent: Thursday, July 13, 2006 12:10 PM To: ravinandan.arakali at neterion.com Cc: bpradip at in.ibm.com; openib-general at openib.org Subject: Re: [openib-general] ping problem with ammassocards(iWARPinterface) By the way, does this failure happen immediately or after some period of time? On Thu, 2006-07-13 at 13:27 -0500, Steve Wise wrote: > I guess this isn't surprising since rping doesn't work for you either. > Something fundamental is screwed up on your user side methinks... > > CM event 8 == RDMA_CM_EVENT_REJECTED which means either the server side > wasn't listening on the appropriate TCP port, or the server process did > an rdma_reject(). I'm guessing it's the former... > > You could use tcpdump to see if the connection request is getting RST > by the remote side. > > > > > On Thu, 2006-07-13 at 11:20 -0700, Ravinandan Arakali wrote: > > With the --cma option, I don't see the error about running SM. > > But there's no connection established. > > > > openfab2:/tmp/ib/src/userspace/perftest # ./rdma_lat --cma > > pp_server_connect_cma starting server > > > > openfab:/tmp/ib/src/userspace/perftest # ./rdma_lat --cma 17.2.2.102 > > pp_client_connect_cma starting client > > pp_client_connect_cma/856 unexpected CM event 8 > > pp_client_connect_cma NOT connected! > > pp_connect_cma(17.2.2.102,18515) failed! > > > > There are no messages in dmesg either.
> > > > Ravi > > > > -----Original Message----- > > From: Steve Wise [mailto:swise at opengridcomputing.com] > > Sent: Thursday, July 13, 2006 6:55 AM > > To: Ravinandan Arakali > > Cc: bpradip at in.ibm.com; openib-general at openib.org > > Subject: Re: [openib-general] ping problem with ammasso > > cards(iWARPinterface) > > > > > > Are you trying to run this over iwarp? It doesn't need an SM... > > > > For the perftests rdma_lat and rdma_bw in the iwarp branch, use the > > --cma flag. > > > > Steve. > > > > > > On Wed, 2006-07-12 at 16:39 -0700, Ravinandan Arakali wrote: > > > Also, I am trying to run some of the iwarp bandwidth/latency tests > > > (available under directory perftest). > > > The first thing to do here is to run opensm. When I run opensm (with debug > > > level 10), I get the following error. Any idea what needs to be done to > > get > > > this working ? > > > > > > openfab2:/tmp/ib/src/userspace # opensm -d 10 > > > ------------------------------------------------- > > > OpenSM Rev:openib-1.2.0 > > > Command Line Arguments: > > > d level = 0xa > > > Log File: /var/log/osm.log > > > ------------------------------------------------- > > > OpenSM Rev:openib-1.2.0 > > > > > > Using default GUID 0x0 > > > Error: Could not get port guid > > > Exiting SM > > > > > > openfab2:/tmp/ib/src/userspace # cat /var/log/osm.log > > > Jul 12 08:35:04 718914 [B7E518C0] -> OpenSM Rev:openib-1.2.0 > > > Jul 12 08:35:04 719111 [0000] -> OpenSM Rev:openib-1.2.0 > > > > > > Jul 12 08:35:04 721381 [B7E518C0] -> osm_sa_mad_ctrl_unbind: ERR 1A11: No > > > previous bind > > > Jul 12 08:35:04 721702 [0000] -> Exiting SM > > > > > > > > > > > > > > > > > > -----Original Message----- > > > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] > > > Sent: Wednesday, July 12, 2006 10:31 AM > > > To: Ravinandan Arakali > > > Cc: openib-general at openib.org > > > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP > > > interface) > > > > > > > > > 
Ravinandan, > > > Do you still see the rping crash? > > > > > > Thanks, > > > Pradipta Kumar. > > > > > > Ravinandan Arakali wrote: > > > > Pradipta, > > > > Okay, thanks.. Initially, I was not sure since I don't remember non-zero > > > > values in /proc/krping. When I re-ran the krping test, I see following > > > > output > > > > openfab2:~ # cat /proc/krping > > > > 1-amso0 891376 55711 891376 55711 1782720 27855 1782784 27856 > > > > > > > > As you mentioned, the RDMA traffic seems to be flowing indeed ! > > > > Any idea why rping is dumping core ? > > > > > > > > Has any testing been done using SDP with ammasso cards ? > > > > > > > > Regards, > > > > Ravi > > > > > > > > > > > > -----Original Message----- > > > > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] > > > > Sent: Friday, July 07, 2006 11:20 PM > > > > To: Ravinandan Arakali > > > > Cc: Leonid. Grossman (E-mail); sivakumar.subramani at neterion.com; > > > > openib-general at openib.org > > > > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP > > > > interface) > > > > > > > > > > > > Ravinandan Arakali wrote: > > > >> Pradipta, > > > >> Following is the output from gdb after core dump. I have also > > copy-pasted > > > >> the gdb output on client system. > > > >> > > > >> Attached is the dmesg output when krping test is run in verbose mode. > > > >> The ping data on the sender(client) seems okay. The content is shifted > > > >> forward by one character for each packet. On receiver, after receiving > > > > ping > > > >> pkt 9, it seems to jump to pkt no. 1935. Not sure if it's because > > > messages > > > >> can be lost during writing to /var/log/messages ? > > > > krping is indeed working!!...Using 'verbose' allows you to see the ping > > > > data. > > > > When not using 'verbose' you see only 'send/recv' messages. 
> > > >> ----------------------------------------- > > > >> (gdb) run -s -vV -C100 -d -a 0.0.0.0 -p 9999 > > > >> Starting program: > > > >> /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -s -vV -C100 -d -a > > > >> 0.0.0.0 -p 9999 > > > >> [Thread debugging using libthread_db enabled] > > > >> [New Thread -1210054992 (LWP 3668)] > > > >> ipaddr (0.0.0.0) > > > >> port 9999 > > > >> created cm_id 0x804e6e0 > > > >> [New Thread -1210057824 (LWP 3671)] > > > >> rdma_bind_addr successful > > > >> rdma_listen > > > >> cma_event type 4 cma_id 0x804e968 (child) > > > >> child cma 0x804e968 > > > >> > > > >> Program received signal SIGSEGV, Segmentation fault. > > > >> [Switching to Thread -1210054992 (LWP 3668)] > > > >> rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 > > > >> 514 cb->pd = ibv_alloc_pd(cm_id->verbs); > > > >> (gdb) bt > > > >> #0 rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 > > > >> #1 0x0804a716 in main (argc=9, argv=Cannot access memory at address > > 0x6 > > > >> ) at examples/rping.c:767 > > > >> (gdb) > > > >> > > > >> --------------------------------- > > > >> (gdb) run -c -vV -C100 -d -a 17.2.2.102 -p 9999 > > > >> Starting program: > > > >> > > tmp/ib/src/userspace/librdmacm/examples/.libs/rping -c -vV -C100 -d -a > > > >> 17.2.2.102 -p 9999 > > > >> [Thread debugging using libthread_db enabled] > > > >> [New Thread 47388824908032 (LWP 4620)] > > > >> ipaddr (17.2.2.102) > > > >> port 9999 > > > >> created cm_id 0x506b00 > > > >> [New Thread 1082132800 (LWP 4623)] > > > >> cma_event type 0 cma_id 0x506b00 (parent) > > > >> cma_event type 2 cma_id 0x506b00 (parent) > > > >> rdma_resolve_addr - rdma_resolve_route successful > > > >> created pd 0x506e60 > > > >> created channel 0x506e80 > > > >> created cq 0x506ea0 > > > >> created qp 0x506f40 > > > >> rping_setup_buffers called on cb 0x505010 > > > >> allocated & registered buffers... > > > >> [New Thread 1090525504 (LWP 4624)] > > > >> cq_thread started. 
> > > >> > > > >> > > > > > > > > > > > > _______________________________________________ > > > > openib-general mailing list > > > > openib-general at openib.org > > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > > To unsubscribe, please visit > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > > > > > > > > > > > _______________________________________________ > > > openib-general mailing list > > > openib-general at openib.org > > > http://openib.org/mailman/listinfo/openib-general > > > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > > > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From arne.redlich at xiranet.com Wed Jul 19 00:11:28 2006 From: arne.redlich at xiranet.com (Arne Redlich) Date: Wed, 19 Jul 2006 09:11:28 +0200 Subject: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID In-Reply-To: <44BD15C2.9010602@ichips.intel.com> References: <1153225007.6972.56.camel@confield.dd.xiranet.com> <1153230821.6972.98.camel@confield.dd.xiranet.com> <44BD15C2.9010602@ichips.intel.com> Message-ID: <1153293088.6972.147.camel@confield.dd.xiranet.com> Am Dienstag, den 18.07.2006, 10:09 -0700 schrieb Sean Hefty: > Arne Redlich wrote: > > @@ -1354,6 +1354,7 @@ static int cm_req_handler(struct cm_work > > id.local_id); > > if (IS_ERR(cm_id_priv->timewait_info)) { > > ret = PTR_ERR(cm_id_priv->timewait_info); > > + reject = 1; > > goto error1; > > } > > An error here indicates that we were unable to allocate memory to record the > incoming connection request. I didn't send a reject in this case to allow it to > be retried. (The chances of being able to send a reject if this allocation > fails is unlikely.) 
> > If there's a need to send a reject in this case, we should call cm_issue_rej(). > I don't think that we can send one from cm_destroy_id() unless the cm_id has > been initialized further. This was merely to initialize "reject" (=> "err = 0" using the old code), otherwise preserving the original behaviour. > > @@ -1373,12 +1375,21 @@ static int cm_req_handler(struct cm_work > > > > cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]); > > ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av); > > - if (ret) > > + if (ret) { > > + cm_issue_rej(work->port, work->mad_recv_wc, IB_CM_INVALID_GID, > > + CM_MSG_RESPONSE_REQ, NULL, 0); > > Change to IB_CM_REJ_INVALID_GID. > > > + reject = 0; > > goto error3; > > + } > > if (req_msg->alt_local_lid) { > > ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av); > > - if (ret) > > + if (ret) { > > + cm_issue_rej(work->port, work->mad_recv_wc, > > + IB_CM_INVALID_ALT_GID, CM_MSG_RESPONSE_REQ, > > + NULL, 0); > > IB_CM_REJ_INVALID_ALT_GID. Of course. I really don't know where my mind was when submitting this apparently half baked patch instead of the final version which gets (at least) this one right and also doesn't have the "if - else - else" oddity pointed out by Michael. Sorry. Thanks, Arne -- Arne Redlich Xiranet Communications GmbH From arne.redlich at xiranet.com Wed Jul 19 00:32:43 2006 From: arne.redlich at xiranet.com (Arne Redlich) Date: Wed, 19 Jul 2006 09:32:43 +0200 Subject: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID In-Reply-To: <44BD34A3.6030009@ichips.intel.com> References: <1153230821.6972.98.camel@confield.dd.xiranet.com> <20060718144359.GB18551@mellanox.co.il> <1153235346.6972.133.camel@confield.dd.xiranet.com> <44BD34A3.6030009@ichips.intel.com> Message-ID: <1153294363.6972.151.camel@confield.dd.xiranet.com> Am Dienstag, den 18.07.2006, 12:21 -0700 schrieb Sean Hefty: > Arne Redlich wrote: > > Yep - the Gen1 SRP initiator does. 
It sends a REQ with an invalid DGID. > > If rejected with the correct code (INVALID GID), it will retry after > > looking up the GID. > > Didn't it have a DGID from a path record that it had to get before sending the REQ? If I'm not completely mistaken, the path lookup is only done after receiving an IB_CM_REJ_INVALID_GID. I have no idea why it is done this way, though. Arne -- Arne Redlich Xiranet Communications GmbH From mst at mellanox.co.il Wed Jul 19 00:43:27 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 19 Jul 2006 10:43:27 +0300 Subject: [openib-general] [PATCH] ib_cm: fix REJ due to invalid GID In-Reply-To: <000201c6aab6$d267ddf0$e598070a@amr.corp.intel.com> References: <000201c6aab6$d267ddf0$e598070a@amr.corp.intel.com> Message-ID: <20060719074327.GA15325@mellanox.co.il> Quoting r. Sean Hefty : > Subject: [PATCH] ib_cm: fix REJ due to invalid GID > > The following patch should set the reject code properly when > rejecting a request that contains an invalid GID. A suitable > GID is returned by the IB CM in the additional reject information > (ARI). > > Signed-off-by: Sean Hefty > --- > Index: cm.c > =================================================================== > --- cm.c (revision 8484) > +++ cm.c (working copy) > @@ -1372,12 +1372,23 @@ static int cm_req_handler(struct cm_work > > cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]); > ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av); > - if (ret) > + if (ret) { > + ib_get_cached_gid(work->port->cm_dev->device, > + work->port->port_num, 0, &work->path[0].sgid); > + cm_issue_rej(work->port, work->mad_recv_wc, > + IB_CM_REJ_INVALID_GID, CM_MSG_RESPONSE_REQ, > + &work->path[0].sgid, sizeof work->path[0].sgid); > goto error3; > + } Should we be worried that this also sends IB_CM_REJ_INVALID_GID if find_cached_pkey fails? 
-- MST From arne.redlich at xiranet.com Wed Jul 19 00:46:07 2006 From: arne.redlich at xiranet.com (Arne Redlich) Date: Wed, 19 Jul 2006 09:46:07 +0200 Subject: [openib-general] [PATCH] ib_cm: fix REJ due to invalid GID In-Reply-To: <000201c6aab6$d267ddf0$e598070a@amr.corp.intel.com> References: <000201c6aab6$d267ddf0$e598070a@amr.corp.intel.com> Message-ID: <1153295167.6972.156.camel@confield.dd.xiranet.com> Am Dienstag, den 18.07.2006, 15:09 -0700 schrieb Sean Hefty: > The following patch should set the reject code properly when > rejecting a request that contains an invalid GID. A suitable > GID is returned by the IB CM in the additional reject information > (ARI). > > Signed-off-by: Sean Hefty > > --- > Index: cm.c > =================================================================== > --- cm.c (revision 8484) > +++ cm.c (working copy) > @@ -1372,12 +1372,23 @@ static int cm_req_handler(struct cm_work > > cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]); > ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av); > - if (ret) > + if (ret) { > + ib_get_cached_gid(work->port->cm_dev->device, > + work->port->port_num, 0, &work->path[0].sgid); > + cm_issue_rej(work->port, work->mad_recv_wc, > + IB_CM_REJ_INVALID_GID, CM_MSG_RESPONSE_REQ, > + &work->path[0].sgid, sizeof work->path[0].sgid); > goto error3; Won't this lead to a second REJ (IB_CM_REJ_CONSUMER_DEFINED) being issued by cm_destroy_id()? That's actually what I tried to prevent using the "reject" param. Arne -- Arne Redlich Xiranet Communications GmbH From mst at mellanox.co.il Wed Jul 19 00:44:58 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 19 Jul 2006 10:44:58 +0300 Subject: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID In-Reply-To: References: Message-ID: <20060719074458.GB15325@mellanox.co.il> Quoting r. 
Roland Dreier : > Subject: Re: [PATCH 2/2] ib_cm: fix REJ due to invalid GID > > Michael> By the way, AFAIK by design cache might be out of sync > Michael> with actual hardware. Roland, could you confirm this > Michael> pls? > > Yes, that's true. But even without cache, since the update of GID and > P_Key tables is asynchronous, there's nothing that prevents the result > of a query of GID or P_Key from being wrong before the query function > returns to the caller. So the cache is not really the problem. Hmm. This implies that the requesting side must handle this: CMA must retry a connection if it gets invalid GID reject. -- MST From halr at voltaire.com Wed Jul 19 03:26:47 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Jul 2006 06:26:47 -0400 Subject: [openib-general] [PATCHv3 0/3] OpenSM: Support configurable SL per partition Message-ID: <1153304806.4540.34261.camel@hal.voltaire.com> OpenSM: Support configurable SL per partition The following patch series adds support for configurable SL per partition in the partition manager as well as the underlying SA (path and multipath) record support. Signed-off-by: Hal Rosenstock From halr at voltaire.com Wed Jul 19 03:26:55 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Jul 2006 06:26:55 -0400 Subject: [openib-general] [PATCHv3 1/3] OpenSM: Support configurable SL per partition Message-ID: <1153304815.4540.34263.camel@hal.voltaire.com> OpenSM: Support configurable SL per partition Signed-off-by: Sasha Khapyorsky Signed-off-by: Hal Rosenstock Index: include/opensm/osm_partition.h =================================================================== --- include/opensm/osm_partition.h (revision 8574) +++ include/opensm/osm_partition.h (working copy) @@ -102,6 +102,7 @@ typedef struct _osm_prtn { cl_map_item_t map_item; uint16_t pkey; + uint8_t sl; cl_map_t full_guid_tbl; cl_map_t part_guid_tbl; char name[32]; @@ -114,6 +115,9 @@ typedef struct _osm_prtn * pkey * The IBA defined P_KEY of this Partition. 
* +* sl +* The Service Level (SL) associated with this Partiton. +* * port_guid_tbl * Container of pointers to all Port objects in the Partition, * indexed by port GUID. Index: opensm/osm_prtn.c =================================================================== --- opensm/osm_prtn.c (revision 8574) +++ opensm/osm_prtn.c (working copy) @@ -79,6 +79,7 @@ osm_prtn_t* osm_prtn_new( memset(p, 0, sizeof(*p)); p->pkey = pkey; + p->sl = OSM_DEFAULT_SL; cl_map_construct(&p->full_guid_tbl); cl_map_init(&p->full_guid_tbl, 32); cl_map_construct(&p->part_guid_tbl); @@ -220,7 +221,7 @@ ib_api_status_t osm_prtn_add_mcgroup(osm mc_rec.pkey = pkey; mc_rec.rate = rate ? rate : 0x3; /* 10Gb/sec */ mc_rec.pkt_life = OSM_DEFAULT_SUBNET_TIMEOUT; - mc_rec.sl_flow_hop = OSM_DEFAULT_SL << 28; + mc_rec.sl_flow_hop = ib_member_set_sl_flow_hop(p->sl, 0, 0); /* Note: scope needs to be consistent with MGID */ mc_rec.scope_state = 0x21; Index: opensm/osm_prtn_config.c =================================================================== --- opensm/osm_prtn_config.c (revision 8574) +++ opensm/osm_prtn_config.c (working copy) @@ -52,8 +52,10 @@ #include #include #include +#include #include +#include #include #include #include @@ -82,7 +84,7 @@ struct part_conf { osm_log_t *p_log; osm_subn_t *p_subn; osm_prtn_t *p_prtn; - unsigned is_ipoib, mtu, rate; + unsigned is_ipoib, mtu, rate, sl; }; @@ -122,6 +124,16 @@ static int partition_create(unsigned lin if (!conf->p_prtn) return -1; + if (conf->p_subn->opt.no_qos) { + if (conf->sl != OSM_DEFAULT_SL) { + osm_log(conf->p_log, OSM_LOG_ERROR, + "partition_create: Overriding SL %d to default SL %d on partition %s as QoS not enabled\n", + conf->sl, OSM_DEFAULT_SL, name); + conf->sl = OSM_DEFAULT_SL; + } + } + conf->p_prtn->sl = conf->sl; + if (conf->is_ipoib) osm_prtn_add_mcgroup(conf->p_log, conf->p_subn, conf->p_prtn, conf->is_ipoib, conf->rate, conf->mtu); @@ -145,6 +157,17 @@ static int partition_add_flag(unsigned l PARSEWARN(conf->p_log, lineno, "flag 
\'rate\' requires valid value" " - skipped.\n"); + } else if (!strncmp(flag, "sl", len)) { + unsigned sl; + char *end; + + if (!val || !*val || (sl = strtoul(val, &end, 0)) > 15 || + (*end && !isspace(*end))) + PARSEWARN(conf->p_log, lineno, + "flag \'sl\' requires valid value" + " - skipped.\n"); + else + conf->sl = sl; } else { PARSEWARN(conf->p_log, lineno, "unrecognized partition flag \'%s\'" @@ -254,6 +277,8 @@ static struct part_conf *new_part_conf(o conf->p_log = p_log; conf->p_subn = p_subn; conf->p_prtn = NULL; + conf->is_ipoib = 0; + conf->sl = OSM_DEFAULT_SL; return conf; } Index: doc/partition-config.txt =================================================================== --- doc/partition-config.txt (revision 8574) +++ doc/partition-config.txt (working copy) @@ -46,6 +46,7 @@ ipoib - indicates that this partiti result IPoIB capable MC group will be created. rate= - specifies rate for this IPoIB MC group (default is 3 (10GBps)) mtu= - specifies MTU for this IPoIB MC group (default is 4 (2048)) +sl= - specifies SL for this IPoIB MC group (default is 0) Note that values for 'rate' and 'mtu' should be specified as defined in IBTA specification (for example mtu=4 for 2048). 
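[Editorial sketch, not part of the posted patch.] The sl= handling in partition_add_flag() above accepts a value only if strtoul() yields a number no greater than 15 and whatever follows the number is either the end of the string or whitespace. The standalone C fragment below illustrates that check in isolation. The helper name parse_sl() and its return convention are hypothetical and do not exist in OpenSM. It also adds one check the patch does not make: it rejects input in which strtoul() consumed no digits at all, which the patch as posted would appear to treat as SL 0 when the value is only whitespace.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not an OpenSM function): validates an "sl=" value
 * the way the patch's partition_add_flag() does.
 * Returns 0 and stores the SL in *sl_out on success; returns -1 if the
 * value is missing, has no digits, exceeds 15, or has trailing junk.
 */
static int parse_sl(const char *val, unsigned *sl_out)
{
	char *end;
	unsigned long sl;

	if (!val || !*val)
		return -1;

	/* Base 0: decimal, 0x-prefixed hex and 0-prefixed octal are accepted. */
	sl = strtoul(val, &end, 0);

	/* Extra check, an assumption beyond the patch: no digits consumed. */
	if (end == val)
		return -1;

	/* An IB Service Level is a 4-bit field, so 15 is the maximum. */
	if (sl > 15)
		return -1;

	/* Only end-of-string or whitespace may follow the number. */
	if (*end && !isspace((unsigned char)*end))
		return -1;

	*sl_out = (unsigned)sl;
	return 0;
}
```

With this sketch, "5", "0xf" and "7 " parse to 5, 15 and 7 respectively, while "16", "3x" and an empty string are rejected, which matches the cases where the patch would emit its "flag 'sl' requires valid value" warning.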
From halr at voltaire.com Wed Jul 19 03:27:02 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Jul 2006 06:27:02 -0400 Subject: [openib-general] [PATCHv3 2/3] OpenSM/osm_sa_path_record.c: Support SL in PathRecord requests Message-ID: <1153304818.4540.34265.camel@hal.voltaire.com> OpenSM/osm_sa_path_record.c: Support SL in PathRecord requests Signed-off-by: Hal Rosenstock Index: opensm/osm_sa_path_record.c =================================================================== --- opensm/osm_sa_path_record.c (revision 8579) +++ opensm/osm_sa_path_record.c (working copy) @@ -66,6 +66,7 @@ #include #include #include +#include #define OSM_PR_RCV_POOL_MIN_SIZE 64 #define OSM_PR_RCV_POOL_GROW_SIZE 64 @@ -164,6 +165,7 @@ __osm_pr_rcv_get_path_parms( const osm_physp_t* p_physp; const osm_physp_t* p_dest_physp; const osm_switch_t* p_sw; + const osm_prtn_t* p_prtn; const ib_port_info_t* p_pi; const cl_qmap_t* p_sw_tbl; ib_api_status_t status = IB_SUCCESS; @@ -174,6 +176,7 @@ __osm_pr_rcv_get_path_parms( uint8_t required_mtu; uint8_t required_rate; uint8_t required_pkt_life; + uint8_t sl; ib_net16_t dest_lid; OSM_LOG_ENTER( p_rcv->p_log, __osm_pr_rcv_get_path_parms ); @@ -556,7 +559,6 @@ __osm_pr_rcv_get_path_parms( p_parms->mtu = mtu; p_parms->rate = rate; p_parms->pkt_life = pkt_life; - p_parms->sl = OSM_DEFAULT_SL; if( comp_mask & IB_PR_COMPMASK_RAWTRAFFIC && cl_ntoh32( p_pr->hop_flow_raw ) & ( 1<<31 ) ) @@ -568,8 +570,9 @@ __osm_pr_rcv_get_path_parms( { osm_log( p_rcv->p_log, OSM_LOG_ERROR, "__osm_pr_rcv_get_path_parms: ERR 1F1A: " - "Ports do not share specified P_Key\n"); + "Ports do not share specified PKey 0x%04x\n", cl_ntoh16(pkey)); status = IB_NOT_FOUND; + goto Exit; } } else @@ -579,11 +582,34 @@ __osm_pr_rcv_get_path_parms( { osm_log( p_rcv->p_log, OSM_LOG_ERROR, "__osm_pr_rcv_get_path_parms: ERR 1F1B: " - "Ports do not have any shared P_Keys\n"); + "Ports do not have any shared PKeys\n"); status = IB_NOT_FOUND; + goto Exit; } } + + p_prtn = (osm_prtn_t 
*)cl_qmap_get(&p_rcv->p_subn->prtn_pkey_tbl, + pkey & cl_ntoh16((uint16_t)~0x8000)); + if ( p_prtn == (osm_prtn_t *)cl_qmap_end(&p_rcv->p_subn->prtn_pkey_tbl) ) + { + /* this may be possible when pkey tables are created somehow in + previous runs or things are going wrong here */ + sl = OSM_DEFAULT_SL; + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_pr_rcv_get_path_parms: ERR 1F1C: " + "No partition found for PKey 0x%04x - using default SL %d\n", cl_ntoh16(pkey), sl ); + } + else + sl = p_prtn->sl; + + if ( ( comp_mask & IB_PR_COMPMASK_SL ) && ib_path_rec_sl( p_pr ) != sl ) + { + status = IB_NOT_FOUND; + goto Exit; + } + p_parms->pkey = pkey; + p_parms->sl = sl; Exit: OSM_LOG_EXIT( p_rcv->p_log ); @@ -621,7 +647,7 @@ __osm_pr_rcv_build_pr( p_pr->slid = cl_hton16( src_lid_ho ); p_pr->pkey = p_parms->pkey; - p_pr->sl = p_parms->sl; + p_pr->sl = cl_hton16(p_parms->sl); p_pr->mtu = (uint8_t)(p_parms->mtu | 0x80); p_pr->rate = (uint8_t)(p_parms->rate | 0x80); @@ -1002,18 +1028,6 @@ __osm_pr_rcv_get_end_points( into the endpoints. 
*/ - if( comp_mask & IB_PR_COMPMASK_SL ) - { - if( p_pr->sl != OSM_DEFAULT_SL ) - { - if ( p_sa_mad->method == IB_MAD_METHOD_GET ) - sa_status = IB_SA_MAD_STATUS_NO_RECORDS; - *pp_src_port = 0; - *pp_dest_port = 0; - goto Exit; - } - } - if( comp_mask & IB_PR_COMPMASK_SGID ) { *pp_src_port = (osm_port_t*)cl_qmap_get( @@ -1780,7 +1794,7 @@ osm_pr_rcv_process( if ( p_mgrp ) { /* Make sure the rest of the PathRecord matches the MC group attributes */ - status = __osm_pr_match_mgrp_attributes( p_rcv, p_madw, p_mgrp); + status = __osm_pr_match_mgrp_attributes( p_rcv, p_madw, p_mgrp ); if ( status == IB_SUCCESS ) { p_pr_item = (osm_pr_item_t*)cl_qlock_pool_get( &p_rcv->pr_pool ); @@ -1812,7 +1826,7 @@ osm_pr_rcv_process( /* SL, Hop Limit, and Flow Label */ ib_member_get_sl_flow_hop( p_mgrp->mcmember_rec.sl_flow_hop, &sl, &flow_label, &hop_limit ); - p_pr_item->path_rec.sl = sl; + p_pr_item->path_rec.sl = cl_hton16( sl ); p_pr_item->path_rec.hop_flow_raw = (uint32_t)(hop_limit) | (flow_label << 8); From halr at voltaire.com Wed Jul 19 03:27:07 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Jul 2006 06:27:07 -0400 Subject: [openib-general] [PATCHv2 3/3] OpenSM/osm_sa_multipath_record.c: Support SL in MultiPathRecord requests Message-ID: <1153304822.4540.34267.camel@hal.voltaire.com> OpenSM/osm_sa_multipath_record.c: Support SL in MultiPathRecord requests Signed-off-by: Hal Rosenstock Index: opensm/osm_sa_multipath_record.c =================================================================== --- opensm/osm_sa_multipath_record.c (revision 8574) +++ opensm/osm_sa_multipath_record.c (working copy) @@ -62,6 +62,7 @@ #include #include #include +#include #include #include #include @@ -164,6 +165,7 @@ __osm_mpr_rcv_get_path_parms( const osm_physp_t* p_physp; const osm_physp_t* p_dest_physp; const osm_switch_t* p_sw; + const osm_prtn_t* p_prtn; const ib_port_info_t* p_pi; const cl_qmap_t* p_sw_tbl; ib_slvl_table_t* p_slvl_tbl; @@ -193,28 +195,55 @@ 
__osm_mpr_rcv_get_path_parms( mtu = ib_port_info_get_neighbor_mtu( p_pi ); rate = ib_port_info_compute_rate( p_pi ); - if ( comp_mask & IB_MPR_COMPMASK_SL ) - required_sl = ib_multipath_rec_sl( p_mpr ); - else - required_sl = OSM_DEFAULT_SL; - - if ( comp_mask & IB_MPR_COMPMASK_PKEY ) { + if ( comp_mask & IB_MPR_COMPMASK_RAWTRAFFIC && + cl_ntoh32( p_mpr->hop_flow_raw ) & ( 1<<31 ) ) + required_pkey = 0; + else if ( comp_mask & IB_MPR_COMPMASK_PKEY ) { required_pkey = p_mpr->pkey; - if ( !osm_physp_has_pkey( p_rcv->p_log, required_pkey, p_physp ) || - !osm_physp_has_pkey( p_rcv->p_log, required_pkey, p_dest_physp ) ) { - osm_log( p_rcv->p_log, OSM_LOG_VERBOSE, - "__osm_mpr_rcv_get_path_parms: " - "path not found for PKey = 0x%04x\n" + if( !osm_physp_share_this_pkey( p_physp, p_dest_physp, required_pkey ) ) { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_mpr_rcv_get_path_parms: ERR 4518: " + "Ports do not share specified PKey 0x%04x\n" "\t\tsrc %Lx dst %Lx\n", cl_ntoh16( required_pkey ), cl_ntoh64( osm_physp_get_port_guid( p_physp ) ), cl_ntoh64( osm_physp_get_port_guid( p_dest_physp ) ) ); - status = IB_NOT_FOUND; goto Exit; } - } else - required_pkey = IB_DEFAULT_PKEY; + } else { + required_pkey = osm_physp_find_common_pkey( p_physp, p_dest_physp ); + if ( !required_pkey ) { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_mpr_rcv_get_path_parms: ERR 4519: " + "Ports do not have any shared PKeys\n" + "\t\tsrc %Lx dst %Lx\n", + cl_ntoh64( osm_physp_get_port_guid( p_physp ) ), + cl_ntoh64( osm_physp_get_port_guid( p_dest_physp ) ) ); + status = IB_NOT_FOUND; + goto Exit; + } + } + + p_prtn = (osm_prtn_t *)cl_qmap_get(&p_rcv->p_subn->prtn_pkey_tbl, + required_pkey & cl_ntoh16((uint16_t)~0x8000)); + if ( p_prtn == (osm_prtn_t *)cl_qmap_end(&p_rcv->p_subn->prtn_pkey_tbl) ) + { + /* this may be possible when pkey tables are created somehow in + previous runs or things are going wrong here */ + required_sl = OSM_DEFAULT_SL; + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + 
"__osm_mpr_rcv_get_path_parms: ERR 451A: " + "No partition found for PKey 0x%04x - using default SL %d\n", cl_ntoh16(required_pkey), required_sl ); + } + else + required_sl = p_prtn->sl; + + if ( ( comp_mask & IB_MPR_COMPMASK_SL ) && ib_multipath_rec_sl( p_mpr ) != required_sl ) + { + status = IB_NOT_FOUND; + goto Exit; + } /* Walk the subnet object from source to destination, @@ -641,7 +670,7 @@ __osm_mpr_rcv_build_pr( p_pr->slid = cl_hton16( src_lid_ho ); p_pr->pkey = p_parms->pkey; - p_pr->sl = p_parms->sl; + p_pr->sl = cl_hton16( p_parms->sl ); p_pr->mtu = (uint8_t)( p_parms->mtu | 0x80 ); p_pr->rate = (uint8_t)( p_parms->rate | 0x80 ); From ogerlitz at voltaire.com Wed Jul 19 04:07:00 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 19 Jul 2006 14:07:00 +0300 Subject: [openib-general] IPoIB "bonding" solution for OFED 1.1 (was re: [PATCH] ipoib: fix address update handling (was Re: OFED 1.1 release - schedule and features)) In-Reply-To: <20060717140349.GA19080@mellanox.co.il> References: <20060717140349.GA19080@mellanox.co.il> Message-ID: <44BE1254.8070403@voltaire.com> Michael S. Tsirkin wrote: > Quoting r. Roland Dreier : >> Subject: Re: OFED 1.1 release - schedule and features >> >> > So if the link which ib0 maps to is DOWN you move the ib0 IPv4 address >> > to another device whose link is UP (eg ib1) and you somehow have ib1 >> > send a gratuitous ARP? >> >> I think there may be a problem in the way IPoIB deals with gratuitous >> ARPs. Because if a neighbour structure is updated by the networking >> core, there's no way for IPoIB to know about that and update the >> associated IB path. >> Has anyone actually tried this failover approach? > OK, we've seen the problem here - and here's a patch to fix it. > Seems to work fine here - I'll let it run for a day just to make sure. Hi Tziporet, Do you have an initial drop of the bonding solution planned for OFED 1.1 that is ready to see the daylight? if not, when is this expected? 
As i mentioned to you, we are investigating few possible ways to implement HA for IPoIB and want to examine your approach as well. Or. From halr at voltaire.com Wed Jul 19 04:57:26 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Jul 2006 07:57:26 -0400 Subject: [openib-general] [PATCH][MINOR]OpenSM/osm_qos.c: In vl_high_limit_update, use ib_port_info_set_state_no_change routine Message-ID: <1153310244.4540.34877.camel@hal.voltaire.com> OpenSM/osm_qos.c: In vl_high_limit_update, use ib_port_info_set_state_no_change routine Signed-off-by: Hal Rosenstock Index: opensm/osm_qos.c =================================================================== --- opensm/osm_qos.c (revision 8584) +++ opensm/osm_qos.c (working copy) @@ -247,8 +247,7 @@ static ib_api_status_t vl_high_limit_upd memcpy(payload, p_pi, sizeof(ib_port_info_t)); p_pi = (ib_port_info_t *) payload; - p_pi->state_info2 = 0; - ib_port_info_set_port_state(p_pi, IB_LINK_NO_CHANGE); + ib_port_info_set_state_no_change(p_pi); p_pi->vl_high_limit = qcfg->vl_high_limit; From halr at voltaire.com Wed Jul 19 05:13:36 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Jul 2006 08:13:36 -0400 Subject: [openib-general] [PATCH] OpenSM/QoS: Better SLtoVLMappingTable determination Message-ID: <1153311215.4540.35013.camel@hal.voltaire.com> OpenSM/QoS: Better SLtoVLMappingTable determination On a switch, if VLCap 1 indicating 1 data VL, determine whether SLtoVLMappingTable exists from port 0's capability mask. If more than 1 data VL, SLtoVLMappingTable must be supported. For HCA and router ports, just check the capability mask. 
Signed-off-by: Hal Rosenstock Index: opensm/osm_qos.c =================================================================== --- opensm/osm_qos.c (revision 8586) +++ opensm/osm_qos.c (working copy) @@ -197,27 +197,33 @@ static ib_api_status_t sl2vl_update_tabl cl_hton32(attr_mod), CL_DISP_MSGID_NONE, &context); } -static ib_api_status_t sl2vl_update(osm_req_t * p_req, +static ib_api_status_t sl2vl_update(osm_req_t * p_req, osm_port_t * p_port, osm_physp_t * p, unsigned port_num, const struct qos_config *qcfg) { ib_api_status_t status; unsigned i, num_ports; - ib_port_info_t *p_pi = osm_physp_get_port_info_ptr(p); + ib_port_info_t *p_pi; + osm_physp_t *p_physp; if (!p_pi) return IB_ERROR; - if (port_num == 0) { + if (osm_node_get_type(osm_physp_get_node_ptr(p)) == IB_NODE_TYPE_SWITCH) { + if (ib_port_info_get_vl_cap(p_pi) == 1) { + /* Check port 0's capability mask */ + p_physp = osm_port_get_default_phys_ptr(p_port); + p_pi = osm_physp_get_port_info_ptr(p_physp); + if (!(p_pi->capability_mask & IB_PORT_CAP_HAS_SL_MAP)) + return IB_SUCCESS; + } + num_ports = osm_node_get_num_physp(osm_physp_get_node_ptr(p)); + } else { + p_pi = osm_physp_get_port_info_ptr(p); if (!(p_pi->capability_mask & IB_PORT_CAP_HAS_SL_MAP)) return IB_SUCCESS; - } else if (ib_port_info_get_vl_cap(p_pi) == 1) - return IB_SUCCESS; - - if (osm_node_get_type(osm_physp_get_node_ptr(p)) == IB_NODE_TYPE_SWITCH) - num_ports = osm_node_get_num_physp(osm_physp_get_node_ptr(p)); - else num_ports = 1; + } for (i = 0; i < num_ports; i++) { status = @@ -267,7 +273,8 @@ static ib_api_status_t vl_high_limit_upd } static ib_api_status_t qos_physp_setup(osm_log_t * p_log, osm_req_t * p_req, - osm_physp_t * p, unsigned port_num, + osm_port_t * p_port, osm_physp_t * p, + unsigned port_num, const struct qos_config *qcfg) { ib_api_status_t status; @@ -297,7 +304,7 @@ static ib_api_status_t qos_physp_setup(o } /* setup SL2VL tables */ - status = sl2vl_update(p_req, p, port_num, qcfg); + status = sl2vl_update(p_req, 
p_port, p, port_num, qcfg); if (status != IB_SUCCESS) { osm_log(p_log, OSM_LOG_ERROR, "qos_physp_setup: ERR 6203 : " @@ -356,7 +363,7 @@ osm_signal_t osm_qos_setup(osm_opensm_t continue; status = qos_physp_setup(&p_osm->log, &p_osm->sm.req, - p_physp, i, &swe_config); + p_port, p_physp, i, &swe_config); } /* skip base port 0 */ p_sw = osm_get_switch_by_guid(&p_osm->subn, @@ -377,7 +384,7 @@ osm_signal_t osm_qos_setup(osm_opensm_t continue; status = qos_physp_setup(&p_osm->log, &p_osm->sm.req, - p_physp, 0, cfg); + p_port, p_physp, 0, cfg); } cl_plock_release(&p_osm->lock); From halr at voltaire.com Wed Jul 19 06:51:10 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Jul 2006 09:51:10 -0400 Subject: [openib-general] [PATCH][MINOR] OpenSM: Simplify some endian conversions in osm_prtn.c and osm_pkey_mgr.c Message-ID: <1153317067.4540.35854.camel@hal.voltaire.com> OpenSM: Simplify some endian conversions in osm_prtn.c and osm_pkey_mgr.c Signed-off-by: Hal Rosenstock Index: opensm/osm_prtn.c =================================================================== --- opensm/osm_prtn.c (revision 8586) +++ opensm/osm_prtn.c (working copy) @@ -208,7 +208,7 @@ ib_api_status_t osm_prtn_add_mcgroup(osm osm_sa_t *p_sa = &p_subn->p_osm->sa; ib_api_status_t status = IB_SUCCESS; - pkey = cl_hton16(cl_ntoh16(p->pkey) | 0x8000); + pkey = p->pkey | cl_hton16(0x8000); memset(&mc_rec, 0, sizeof(mc_rec)); @@ -272,7 +272,7 @@ osm_prtn_t *osm_prtn_make_new(osm_log_t return NULL; if (cl_ntoh16(pkey) & 0x8000) { - pkey = cl_hton16(cl_ntoh16(pkey) & ~0x8000); + pkey &= cl_hton16(~0x8000); osm_log(p_log, OSM_LOG_VERBOSE, "osm_prtn_make_new: pkey stripped for" " partition \'%s\' (0x%04x)\n", Index: opensm/osm_pkey_mgr.c =================================================================== --- opensm/osm_pkey_mgr.c (revision 8586) +++ opensm/osm_pkey_mgr.c (working copy) @@ -172,7 +172,7 @@ pkey_mgr_process_partition_table( osm_physp_t *p_physp; if (full) - pkey = cl_hton16( cl_ntoh16( pkey 
) | 0x8000 ); + pkey |= cl_hton16( 0x8000 ); i_next = cl_map_head( p_tbl ); while ( i_next != cl_map_end( p_tbl ) ) From mst at mellanox.co.il Wed Jul 19 07:30:17 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 19 Jul 2006 17:30:17 +0300 Subject: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1 In-Reply-To: <000301c6a480$6f7cb210$ff0da8c0@amr.corp.intel.com> References: <000301c6a480$6f7cb210$ff0da8c0@amr.corp.intel.com> Message-ID: <20060719143017.GA15384@mellanox.co.il> This patch looks good and worked well here for several days. An earlier version of this patch (by Jack) has also shipped with OFED 1.0. As was discussed earlier, I suggest queueing this for 2.6.18 - without this patch, we observe that RMPP transactions are sometimes aborted because of duplicate packets (which result typically from retries after ACK got lost). Acked-by: Michael S. Tsirkin Acked-by: Jack Morgenstein Quoting r. Sean Hefty : Subject: [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1 Enhance validation of MADs sent by userspace clients for spec compliance with C13-18.1.1 (duplicate requests / responses). Also verify that RMPP MADs are data only, to avoid a userspace app causing a kernel crash by sending non-data MADs. Signed-off-by: Sean Hefty -- MST From swise at opengridcomputing.com Wed Jul 19 07:33:39 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 19 Jul 2006 09:33:39 -0500 Subject: [openib-general] ping problem with ammassocards(iWARPinterface) In-Reply-To: <000b01c6aaca$885b5340$4710100a@pc.s2io.com> References: <000b01c6aaca$885b5340$4710100a@pc.s2io.com> Message-ID: <1153319619.18035.1.camel@stevo-desktop> Cool! I'm busy now with other things, but it sounds like there's some bugs in rping... On Tue, 2006-07-18 at 17:30 -0700, Ravinandan Arakali wrote: > Steve/Pradipta, > Without the -O2 option, rping is now working ! 
> > Earlier, I did not realize that the cable was yanked out. > Thanks for all the help.. > > Ravi > -----Original Message----- > From: Ravinandan Arakali [mailto:ravinandan.arakali at neterion.com] > Sent: Friday, July 14, 2006 3:37 PM > To: 'Steve Wise' > Cc: 'bpradip at in.ibm.com'; 'openib-general at openib.org'; Leonid. Grossman > (E-mail) > Subject: RE: [openib-general] ping problem with > ammassocards(iWARPinterface) > > > As Pradipta suggested, I rebuilt the libraries by removing > the optimization(-O2 flag) from Makefile. Now, I don't see the > core dump but there's no connection established with rping. > This is similar to the failure I am seeing with rdma_lat test. > > BTW, when I start the rping in server mode, at say port 9999, > should I expect to see an entity listening on that port number > when I do "netstat -an". Currently, I don't see that. > > Ravi > > -----Original Message----- > From: Steve Wise [mailto:swise at opengridcomputing.com] > Sent: Thursday, July 13, 2006 12:10 PM > To: ravinandan.arakali at neterion.com > Cc: bpradip at in.ibm.com; openib-general at openib.org > Subject: Re: [openib-general] ping problem with > ammassocards(iWARPinterface) > > > By the way, does this failure happen immediately or after some period of > time? > > > On Thu, 2006-07-13 at 13:27 -0500, Steve Wise wrote: > > I guess this isn't surprising since rping doesn't work for you either. > > Something fundamental is screwed up on your user side methinks... > > > > CM event 8 == RDMA_CM_EVENT_REJECTED which means either the server side > > wasn't listening on the appropriate TCP port, or the server process did > > an rdma_reject(). I'm guessing its the former... > > > > You could use tcpdmp and to see if the connection request is getting RST > > by the remote side. > > > > > > > > > > On Thu, 2006-07-13 at 11:20 -0700, Ravinandan Arakali wrote: > > > With the --cma option, I don't see the error about running SM. > > > But there's no connection established. 
> > > > > > openfab2:/tmp/ib/src/userspace/perftest # ./rdma_lat --cma > > > pp_server_connect_cma starting server > > > > > > openfab:/tmp/ib/src/userspace/perftest # ./rdma_lat --cma 17.2.2.102 > > > pp_client_connect_cma starting client > > > pp_client_connect_cma/856 unexpected CM event 8 > > > pp_client_connect_cma NOT connected! > > > pp_connect_cma(17.2.2.102,18515) failed! > > > > > > There are no messages in dmesg either. > > > > > > Ravi > > > > > > -----Original Message----- > > > From: Steve Wise [mailto:swise at opengridcomputing.com] > > > Sent: Thursday, July 13, 2006 6:55 AM > > > To: Ravinandan Arakali > > > Cc: bpradip at in.ibm.com; openib-general at openib.org > > > Subject: Re: [openib-general] ping problem with ammasso > > > cards(iWARPinterface) > > > > > > > > > Are you trying to run this over iwarp? It doesn't need an SM... > > > > > > For the perftests rdma_lat and rdma_bw in the iwarp branch, use the > > > --cma flag. > > > > > > Steve. > > > > > > > > > On Wed, 2006-07-12 at 16:39 -0700, Ravinandan Arakali wrote: > > > > Also, I am trying to run some of the iwarp bandwidth/latency tests > > > > (available under directory perftest). > > > > The first thing to do here is to run opensm. When I run opensm (with > debug > > > > level 10), I get the following error. Any idea what needs to be done > to > > > get > > > > this working ? 
> > > > > > > > openfab2:/tmp/ib/src/userspace # opensm -d 10 > > > > ------------------------------------------------- > > > > OpenSM Rev:openib-1.2.0 > > > > Command Line Arguments: > > > > d level = 0xa > > > > Log File: /var/log/osm.log > > > > ------------------------------------------------- > > > > OpenSM Rev:openib-1.2.0 > > > > > > > > Using default GUID 0x0 > > > > Error: Could not get port guid > > > > Exiting SM > > > > > > > > openfab2:/tmp/ib/src/userspace # cat /var/log/osm.log > > > > Jul 12 08:35:04 718914 [B7E518C0] -> OpenSM Rev:openib-1.2.0 > > > > Jul 12 08:35:04 719111 [0000] -> OpenSM Rev:openib-1.2.0 > > > > > > > > Jul 12 08:35:04 721381 [B7E518C0] -> osm_sa_mad_ctrl_unbind: ERR 1A11: > No > > > > previous bind > > > > Jul 12 08:35:04 721702 [0000] -> Exiting SM > > > > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] > > > > Sent: Wednesday, July 12, 2006 10:31 AM > > > > To: Ravinandan Arakali > > > > Cc: openib-general at openib.org > > > > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP > > > > interface) > > > > > > > > > > > > Ravinandan, > > > > Do you still see the rping crash? > > > > > > > > Thanks, > > > > Pradipta Kumar. > > > > > > > > Ravinandan Arakali wrote: > > > > > Pradipta, > > > > > Okay, thanks.. Initially, I was not sure since I don't remember > non-zero > > > > > values in /proc/krping. When I re-ran the krping test, I see > following > > > > > output > > > > > openfab2:~ # cat /proc/krping > > > > > 1-amso0 891376 55711 891376 55711 1782720 27855 1782784 27856 > > > > > > > > > > As you mentioned, the RDMA traffic seems to be flowing indeed ! > > > > > Any idea why rping is dumping core ? > > > > > > > > > > Has any testing been done using SDP with ammasso cards ? 
> > > > > > > > > > Regards, > > > > > Ravi > > > > > > > > > > > > > > > -----Original Message----- > > > > > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] > > > > > Sent: Friday, July 07, 2006 11:20 PM > > > > > To: Ravinandan Arakali > > > > > Cc: Leonid. Grossman (E-mail); sivakumar.subramani at neterion.com; > > > > > openib-general at openib.org > > > > > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP > > > > > interface) > > > > > > > > > > > > > > > Ravinandan Arakali wrote: > > > > >> Pradipta, > > > > >> Following is the output from gdb after core dump. I have also > > > copy-pasted > > > > >> the gdb output on client system. > > > > >> > > > > >> Attached is the dmesg output when krping test is run in verbose > mode. > > > > >> The ping data on the sender(client) seems okay. The content is > shifted > > > > >> forward by one character for each packet. On receiver, after > receiving > > > > > ping > > > > >> pkt 9, it seems to jump to pkt no. 1935. Not sure if it's because > > > > messages > > > > >> can be lost during writing to /var/log/messages ? > > > > > krping is indeed working!!...Using 'verbose' allows you to see the > ping > > > > > data. > > > > > When not using 'verbose' you see only 'send/recv' messages. > > > > >> ----------------------------------------- > > > > >> (gdb) run -s -vV -C100 -d -a 0.0.0.0 -p 9999 > > > > >> Starting program: > > > > >> > /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -s -vV -C100 -d -a > > > > >> 0.0.0.0 -p 9999 > > > > >> [Thread debugging using libthread_db enabled] > > > > >> [New Thread -1210054992 (LWP 3668)] > > > > >> ipaddr (0.0.0.0) > > > > >> port 9999 > > > > >> created cm_id 0x804e6e0 > > > > >> [New Thread -1210057824 (LWP 3671)] > > > > >> rdma_bind_addr successful > > > > >> rdma_listen > > > > >> cma_event type 4 cma_id 0x804e968 (child) > > > > >> child cma 0x804e968 > > > > >> > > > > >> Program received signal SIGSEGV, Segmentation fault. 
> > > > >> [Switching to Thread -1210054992 (LWP 3668)] > > > > >> rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 > > > > >> 514 cb->pd = ibv_alloc_pd(cm_id->verbs); > > > > >> (gdb) bt > > > > >> #0 rping_setup_qp (cb=0x0, cm_id=0x804e968) at > examples/rping.c:514 > > > > >> #1 0x0804a716 in main (argc=9, argv=Cannot access memory at > address > > > 0x6 > > > > >> ) at examples/rping.c:767 > > > > >> (gdb) > > > > >> > > > > >> --------------------------------- > > > > >> (gdb) run -c -vV -C100 -d -a 17.2.2.102 -p 9999 > > > > >> Starting program: > > > > >> > > > tmp/ib/src/userspace/librdmacm/examples/.libs/rping -c -vV -C100 -d -a > > > > >> 17.2.2.102 -p 9999 > > > > >> [Thread debugging using libthread_db enabled] > > > > >> [New Thread 47388824908032 (LWP 4620)] > > > > >> ipaddr (17.2.2.102) > > > > >> port 9999 > > > > >> created cm_id 0x506b00 > > > > >> [New Thread 1082132800 (LWP 4623)] > > > > >> cma_event type 0 cma_id 0x506b00 (parent) > > > > >> cma_event type 2 cma_id 0x506b00 (parent) > > > > >> rdma_resolve_addr - rdma_resolve_route successful > > > > >> created pd 0x506e60 > > > > >> created channel 0x506e80 > > > > >> created cq 0x506ea0 > > > > >> created qp 0x506f40 > > > > >> rping_setup_buffers called on cb 0x505010 > > > > >> allocated & registered buffers... > > > > >> [New Thread 1090525504 (LWP 4624)] > > > > >> cq_thread started. 
From mst at mellanox.co.il Wed Jul 19 07:58:04 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 19 Jul 2006 17:58:04 +0300 Subject: [openib-general] 2 patches in mst-for-2.6.18 Message-ID: <20060719145804.GB15384@mellanox.co.il> Roland, I have put the following patches in my mst-for-2.6.18 tree: $git log --pretty=short origin..mst-for-2.6.18 commit e882c87183b09edaf827d887d7897d2f73212392 Author: Michael S. Tsirkin [PATCH] IB/ipoib: fix packet loss after hardware address update commit 6ee6eba4d3160150a43fa5215a24639c6fb1cd0b Author: Sean Hefty [PATCH] IB/mad: validate MADs for spec compliance C13-18.1.1 I suggest queueing them both for 2.6.18. You can get it here git://www.mellanox.co.il/~git/infiniband mst-for-2.6.18 Both patches have been posted on the list previously, and have been tested for several days here in Mellanox.
-- MST From rdreier at cisco.com Wed Jul 19 08:40:09 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 19 Jul 2006 08:40:09 -0700 Subject: [openib-general] 2 patches in mst-for-2.6.18 In-Reply-To: <20060719145804.GB15384@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 19 Jul 2006 17:58:04 +0300") References: <20060719145804.GB15384@mellanox.co.il> Message-ID: > [PATCH] IB/ipoib: fix packet loss after hardware address update Hm, OK I guess. Are we sure that the memcmp in the data path is OK? > [PATCH] IB/mad: validate MADs for spec compliance C13-18.1.1 Seems fine but I wonder why the ib_response_mad function moved into a public header. Also, how does this patch do this: > Also verify that RMPP MADs are data only, to avoid a userspace > app causing a kernel crash by sending non-data MADs. what is the crash? And how does adding a check for the BM class fix this? - R. From halr at voltaire.com Wed Jul 19 08:32:49 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Jul 2006 11:32:49 -0400 Subject: [openib-general] [PATCH][TRIVIAL] OpenSM/osm_sa_service_record.c: Some trivial cleanups Message-ID: <1153323168.4540.36658.camel@hal.voltaire.com> OpenSM/osm_sa_service_record.c: Some trivial cleanups Assert on attribute ID where most common Also, eliminate unused variable in osm_sr_rcv_process Signed-off-by: Hal Rosenstock Index: opensm/osm_sa_service_record.c =================================================================== --- opensm/osm_sa_service_record.c (revision 8589) +++ opensm/osm_sa_service_record.c (working copy) @@ -791,8 +791,6 @@ osm_sr_rcv_process_get_method( p_recvd_service_rec = (ib_service_record_t*)ib_sa_mad_get_payload_ptr( p_sa_mad ); - CL_ASSERT( p_sa_mad->attr_id == IB_MAD_ATTR_SERVICE_RECORD ); - if( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) { osm_dump_service_record( p_rcv->p_log, @@ -1060,7 +1058,6 @@ osm_sr_rcv_process( { ib_sa_mad_t * p_sa_mad; ib_net16_t sa_status = IB_SA_MAD_STATUS_REQ_INVALID; - 
ib_service_record_t* p_recvd_service_rec; boolean_t valid; OSM_LOG_ENTER( p_rcv->p_log, osm_sr_rcv_process ); @@ -1068,8 +1065,8 @@ osm_sr_rcv_process( CL_ASSERT( p_madw ); p_sa_mad = osm_madw_get_sa_mad_ptr( p_madw ); - p_recvd_service_rec = - (ib_service_record_t*)ib_sa_mad_get_payload_ptr( p_sa_mad ); + + CL_ASSERT( p_sa_mad->attr_id == IB_MAD_ATTR_SERVICE_RECORD ); switch (p_sa_mad->method) { From eitan at mellanox.co.il Wed Jul 19 09:25:53 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 19 Jul 2006 19:25:53 +0300 Subject: [openib-general] [PATCHv2 3/3] OpenSM/osm_sa_multipath_record.c: Support SL inMultiPathRecord requests Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3028D8CDC@mtlexch01.mtl.com> Sure. Did you catch the P_Key issues using a test? Eitan From mshefty at ichips.intel.com Wed Jul 19 09:38:25 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 19 Jul 2006 09:38:25 -0700 Subject: [openib-general] 2 patches in mst-for-2.6.18 In-Reply-To: References: <20060719145804.GB15384@mellanox.co.il> Message-ID: <44BE6001.2090505@ichips.intel.com> Roland Dreier wrote: > > [PATCH] IB/mad: validate MADs for spec compliance C13-18.1.1 > > Seems fine but I wonder why the ib_response_mad function moved into a > public header. This functionality was implemented / called from several places. Private implementations have been replaced with a single public call. > Also, how does this patch do this: > > > Also verify that RMPP MADs are data only, to avoid a userspace > > app causing a kernel crash by sending non-data MADs. > > what is the crash? And how does adding a check for the BM class fix this? I think this is just an invalid comment. The user MAD code in the kernel already prevented a userspace app from sending non-data MADs. I removed this comment when performing the actual commit to svn (revision 8498). Michael, did you pull in the patch from svn, or that posted to the list? There could have been minor differences between them. 
- Sean From mshefty at ichips.intel.com Wed Jul 19 09:45:21 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 19 Jul 2006 09:45:21 -0700 Subject: [openib-general] [PATCH] ib_cm: fix REJ due to invalid GID In-Reply-To: <20060719074327.GA15325@mellanox.co.il> References: <000201c6aab6$d267ddf0$e598070a@amr.corp.intel.com> <20060719074327.GA15325@mellanox.co.il> Message-ID: <44BE61A1.50602@ichips.intel.com> Michael S. Tsirkin wrote: > Should we be worried that this also sends IB_CM_REJ_INVALID_GID if > find_cached_pkey fails? I'm not sure that an invalid pkey necessarily maps better to any other reject code... - Sean From mshefty at ichips.intel.com Wed Jul 19 09:55:21 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 19 Jul 2006 09:55:21 -0700 Subject: [openib-general] [PATCH] ib_cm: fix REJ due to invalid GID In-Reply-To: <1153295167.6972.156.camel@confield.dd.xiranet.com> References: <000201c6aab6$d267ddf0$e598070a@amr.corp.intel.com> <1153295167.6972.156.camel@confield.dd.xiranet.com> Message-ID: <44BE63F9.7030006@ichips.intel.com> Arne Redlich wrote: > Won't this lead to a second REJ (IB_CM_REJ_CONSUMER_DEFINED) being > issued by cm_destroy_id()? That's actually what I tried to prevent using > the "reject" param. Hmm... it does. I missed that in my traces. - Sean From mst at mellanox.co.il Wed Jul 19 10:05:22 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 19 Jul 2006 20:05:22 +0300 Subject: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID In-Reply-To: <1153294363.6972.151.camel@confield.dd.xiranet.com> References: <1153294363.6972.151.camel@confield.dd.xiranet.com> Message-ID: <20060719170522.GA16573@mellanox.co.il> Quoting r. Arne Redlich : > Subject: Re: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID > > Am Dienstag, den 18.07.2006, 12:21 -0700 schrieb Sean Hefty: > > Arne Redlich wrote: > > > Yep - the Gen1 SRP initiator does. It sends a REQ with an invalid DGID. 
> > > If rejected with the correct code (INVALID GID), it will retry after > > > looking up the GID. > > > > Didn't it have a DGID from a path record that it had to get before sending the REQ? > > If I'm not completely mistaken, the path lookup is only done after > receiving an IB_CM_REJ_INVALID_GID. I have no idea why it is done this > way, though. > > Arne Er ... how does it send the REQ then? -- MST From somenath at veritas.com Wed Jul 19 10:17:36 2006 From: somenath at veritas.com (somenath) Date: Wed, 19 Jul 2006 10:17:36 -0700 Subject: [openib-general] openib tools to update mellnox firmware Message-ID: <44BE6930.1080105@veritas.com> I see these messages while booting my system: Jul 17 10:35:37 alekhine kernel: ib_mthca 0000:03:00.0: HCA FW version 4.6.0 is old (4.7.0 is current). Jul 17 10:35:37 alekhine kernel: ib_mthca 0000:03:00.0: If you have problems, try updating your HCA FW. Where can I find the latest firmware for ib_mthca (PCI Express card) and the tools to update it? thanks, som. From somenath at veritas.com Wed Jul 19 10:16:26 2006 From: somenath at veritas.com (somenath) Date: Wed, 19 Jul 2006 10:16:26 -0700 Subject: [openib-general] API clarification: ib_get_dma_mr() Message-ID: <44BE68EA.9050809@veritas.com> The only documentation I can find on this API is: /** * ib_get_dma_mr - Returns a memory region for system memory that is * usable for DMA. * @pd: The protection domain associated with the memory region. * @mr_access_flags: Specifies the memory access rights. */ I see there is no address associated with it, so I assume this is some kind of reserved key stuff. Now my questions are: 1. it basically returns l_key and r_key: can those keys be used with any phys/bus addresses? (addresses returned by dma_map_single(), dma_map_xxx(), virt_to_phys() etc) 2. can the same l_key and r_key be used for different addresses at the same time?
I mean, can one just call the API once to get a pair of l_key and r_key and use them for all post_recv and send ops (including RDMAs)? My next question is about the following API (it also applies to the similar fast memory registration API): struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd, struct ib_phys_buf *phys_buf_array, int num_phys_buf, int mr_access_flags, u64 *iova_start); q1: is there any use for the iova_start returned by this API? phys_buf_array, l_key and r_key are only used in subsequent APIs. q2: is there any merit in using this API (or the one with fast memory reg) vs ib_get_dma_mr()? I see the advantage that it's a little bit more secure in that it gives a separate l_key/r_key for a given phys address. Is there any other benefit? thanks, som. From ardavis at ichips.intel.com Wed Jul 19 10:15:43 2006 From: ardavis at ichips.intel.com (Arlin Davis) Date: Wed, 19 Jul 2006 10:15:43 -0700 Subject: [openib-general] [PATCH] uDAPL - OpenIB-cma: added consumer wakeup mechanism for cq wait objects In-Reply-To: References: Message-ID: <44BE68BF.3050009@ichips.intel.com> Arlin Davis wrote: >Fix for Bug 158. Add support for dat_evd_set_unwaitable on a DTO EVD. > > > Committed revision 8592. From mst at mellanox.co.il Wed Jul 19 10:18:51 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 19 Jul 2006 20:18:51 +0300 Subject: [openib-general] 2 patches in mst-for-2.6.18 In-Reply-To: References: Message-ID: <20060719171851.GB16573@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [openib-general] 2 patches in mst-for-2.6.18 > > > [PATCH] IB/ipoib: fix packet loss after hardware address update > > Hm, OK I guess. Are we sure that the memcmp in the data path is OK? I did some measurements with netperf and did not notice any change. > > [PATCH] IB/mad: validate MADs for spec compliance C13-18.1.1 > > Seems fine but I wonder why the ib_response_mad function moved into a > public header. Oops, looks like I picked some wrong bits from svn.
Will redo the patch tomorrow - or try picking the patch from svn by yourself. > Also, how does this patch do this: > > > Also verify that RMPP MADs are data only, to avoid a userspace > > app causing a kernel crash by sending non-data MADs. > > what is the crash? And how does adding a check for the BM class fix this? I'll let Sean address these. -- MST From narravul at cse.ohio-state.edu Wed Jul 19 10:28:13 2006 From: narravul at cse.ohio-state.edu (Sundeep Narravula) Date: Wed, 19 Jul 2006 13:28:13 -0400 (EDT) Subject: [openib-general] Problem running rping on chelsio T3 adapters Message-ID: Hi, We are using kernel 2.6.16.3 and installed the latest code/drivers from the iwarp branch. Currently, we are seeing an address resolution problem while using rping. [narravul at d1-as4 examples]$ ./rping -c -VvC100 -a 150.10.11.200 -p 9999 libibverbs: Warning: couldn't load driver /usr/local/lib/infiniband/mthca.so: /usr/local/lib/infiniband/mthca.so: undefined symbol: ibv_cmd_query_pkey cma event 1, error -110 The server hangs and cannot be killed. Both the nodes are connected to the switch and I am able to ping the local ip addresses. Is there anything we are missing? Thanks, --Sundeep. From sashak at voltaire.com Wed Jul 19 10:40:38 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 19 Jul 2006 20:40:38 +0300 Subject: [openib-general] [PATCHv2] OpenSM: Support configurable SL per partition In-Reply-To: <1152875460.4642.1197.camel@hal.voltaire.com> References: <1152875460.4642.1197.camel@hal.voltaire.com> Message-ID: <20060719174038.GA11657@sashak.voltaire.com> Hi Hal, The minor comment is below. 
On 07:11 Fri 14 Jul , Hal Rosenstock wrote: > OpenSM: Support configurable SL per partition > > Signed-off-by: Sasha Khapyorsky > Signed-off-by: Hal Rosenstock > > Index: include/opensm/osm_partition.h > =================================================================== > --- include/opensm/osm_partition.h (revision 8520) > +++ include/opensm/osm_partition.h (working copy) > @@ -102,6 +102,7 @@ > { > cl_map_item_t map_item; > uint16_t pkey; > + uint8_t sl; > cl_map_t full_guid_tbl; > cl_map_t part_guid_tbl; > char name[32]; > @@ -114,6 +115,9 @@ > * pkey > * The IBA defined P_KEY of this Partition. > * > +* sl > +* The Service Level (SL) associated with this Partiton. > +* > * port_guid_tbl > * Container of pointers to all Port objects in the Partition, > * indexed by port GUID. > Index: opensm/osm_prtn.c > =================================================================== > --- opensm/osm_prtn.c (revision 8520) > +++ opensm/osm_prtn.c (working copy) > @@ -79,6 +79,7 @@ > > memset(p, 0, sizeof(*p)); > p->pkey = pkey; > + p->sl = OSM_DEFAULT_SL; > cl_map_construct(&p->full_guid_tbl); > cl_map_init(&p->full_guid_tbl, 32); > cl_map_construct(&p->part_guid_tbl); > @@ -220,7 +221,7 @@ > mc_rec.pkey = pkey; > mc_rec.rate = rate ? rate : 0x3; /* 10Gb/sec */ > mc_rec.pkt_life = OSM_DEFAULT_SUBNET_TIMEOUT; > - mc_rec.sl_flow_hop = OSM_DEFAULT_SL << 28; > + mc_rec.sl_flow_hop = ib_member_set_sl_flow_hop(p->sl, 0, 0); > /* Note: scope needs to be consistent with MGID */ > mc_rec.scope_state = 0x21; > > Index: opensm/osm_prtn_config.c > =================================================================== > --- opensm/osm_prtn_config.c (revision 8520) > +++ opensm/osm_prtn_config.c (working copy) > @@ -52,8 +52,10 @@ > #include > #include > #include > +#include ^^^^^^^^^ I guess you don't need to include limits.h now, since MAX_ constants are not used. 
Sasha > > #include > +#include > #include > #include > #include > @@ -82,7 +84,7 @@ > osm_log_t *p_log; > osm_subn_t *p_subn; > osm_prtn_t *p_prtn; > - unsigned is_ipoib, mtu, rate; > + unsigned is_ipoib, mtu, rate, sl; > }; > > > @@ -122,6 +124,16 @@ > if (!conf->p_prtn) > return -1; > > + if (conf->p_subn->opt.no_qos) { > + if (conf->sl != OSM_DEFAULT_SL) { > + osm_log(conf->p_log, OSM_LOG_ERROR, > + "partition_create: Overriding SL %d to default SL %d on partition %s as QoS not enabled\n", > + conf->sl, OSM_DEFAULT_SL, name); > + conf->sl = OSM_DEFAULT_SL; > + } > + } > + conf->p_prtn->sl = conf->sl; > + > if (conf->is_ipoib) > osm_prtn_add_mcgroup(conf->p_log, conf->p_subn, conf->p_prtn, > conf->is_ipoib, conf->rate, conf->mtu); > @@ -145,6 +157,17 @@ > PARSEWARN(conf->p_log, lineno, > "flag \'rate\' requires valid value" > " - skipped.\n"); > + } else if (!strncmp(flag, "sl", len)) { > + unsigned sl; > + char *end; > + > + if (!val || !*val || (sl = strtoul(val, &end, 0)) > 15 || > + (*end && !isspace(*end))) > + PARSEWARN(conf->p_log, lineno, > + "flag \'sl\' requires valid value" > + " - skipped.\n"); > + else > + conf->sl = sl; > } else { > PARSEWARN(conf->p_log, lineno, > "unrecognized partition flag \'%s\'" > @@ -254,6 +277,8 @@ > conf->p_log = p_log; > conf->p_subn = p_subn; > conf->p_prtn = NULL; > + conf->is_ipoib = 0; > + conf->sl = OSM_DEFAULT_SL; > return conf; > } > > Index: doc/partition-config.txt > =================================================================== > --- doc/partition-config.txt (revision 8520) > +++ doc/partition-config.txt (working copy) > @@ -46,6 +46,7 @@ > result IPoIB capable MC group will be created. 
> rate= - specifies rate for this IPoIB MC group (default is 3 (10GBps)) > mtu= - specifies MTU for this IPoIB MC group (default is 4 (2048)) > +sl= - specifies SL for this IPoIB MC group (default is 0) > > Note that values for 'rate' and 'mtu' should be specified as defined in > IBTA specification (for example mtu=4 for 2048). > > > From halr at voltaire.com Wed Jul 19 10:46:14 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Jul 2006 13:46:14 -0400 Subject: [openib-general] [PATCHv2] OpenSM: Support configurable SL per partition In-Reply-To: <20060719174038.GA11657@sashak.voltaire.com> References: <1152875460.4642.1197.camel@hal.voltaire.com> <20060719174038.GA11657@sashak.voltaire.com> Message-ID: <1153331173.4540.37933.camel@hal.voltaire.com> Hi Sasha, On Wed, 2006-07-19 at 13:40, Sasha Khapyorsky wrote: > Hi Hal, > > The minor comment is below. [snip...] > > Index: opensm/osm_prtn_config.c > > =================================================================== > > --- opensm/osm_prtn_config.c (revision 8520) > > +++ opensm/osm_prtn_config.c (working copy) > > @@ -52,8 +52,10 @@ > > #include > > #include > > #include > > +#include > ^^^^^^^^^ > I guess you don't need to include limits.h now, since MAX_ constants are > not used. Thanks. Removed in r8595. -- Hal From swise at opengridcomputing.com Wed Jul 19 11:16:17 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 19 Jul 2006 13:16:17 -0500 Subject: [openib-general] Problem running rping on chelsio T3 adapters In-Reply-To: References: Message-ID: <1153332977.18035.28.camel@stevo-desktop> What does your network configuration look like? I.e., send ifconfig output on both systems. Send the output of lsmod on both systems. Does krping work? Can you ssh between the 2 systems using the T3 ip addresses? Do the following libs exist in /usr/local/lib: libibverbs.* librdmacm.* libcxgb3.* Run ldconfig -v and make sure libibverbs is only found in /usr/local/lib.
If you have OFED or some other rpm package with libibverbs installed, you'll have problems. For some reason it looks to be trying to load the MTHCA Infiniband library. Do you have IB cards in the system? Steve. On Wed, 2006-07-19 at 13:28 -0400, Sundeep Narravula wrote: > Hi, > > We are using kernel 2.6.16.3 and installed the latest code/drivers from > the iwarp branch. Currently, we are seeing an address resolution problem > while using rping. > > [narravul at d1-as4 examples]$ ./rping -c -VvC100 -a 150.10.11.200 -p 9999 > libibverbs: Warning: couldn't load driver > /usr/local/lib/infiniband/mthca.so: /usr/local/lib/infiniband/mthca.so: > undefined symbol: ibv_cmd_query_pkey > cma event 1, error -110 > > The server hangs and cannot be killed. > > Both the nodes are connected to the switch and I am able to ping the local > ip addresses. Is there anything we are missing? > > Thanks, > --Sundeep. From bpradip at in.ibm.com Wed Jul 19 11:55:52 2006 From: bpradip at in.ibm.com (Pradipta Kumar Banerjee) Date: Thu, 20 Jul 2006 00:25:52 +0530 Subject: [openib-general] ping problem with ammassocards(iWARPinterface) In-Reply-To: <000b01c6aaca$885b5340$4710100a@pc.s2io.com> References: <000b01c6aaca$885b5340$4710100a@pc.s2io.com> Message-ID: <44BE8038.8010104@in.ibm.com> Ravinandan Arakali wrote: > Steve/Pradipta, > Without the -O2 option, rping is now working ! > Good to know that!! But still we need to find out the root cause of the problem. Thanks, Pradipta > Earlier, I did not realize that the cable was yanked out. > Thanks for all the help..
> > Ravi > -----Original Message----- > From: Ravinandan Arakali [mailto:ravinandan.arakali at neterion.com] > Sent: Friday, July 14, 2006 3:37 PM > To: 'Steve Wise' > Cc: 'bpradip at in.ibm.com'; 'openib-general at openib.org'; Leonid. Grossman > (E-mail) > Subject: RE: [openib-general] ping problem with > ammassocards(iWARPinterface) > > > As Pradipta suggested, I rebuilt the libraries by removing > the optimization(-O2 flag) from Makefile. Now, I don't see the > core dump but there's no connection established with rping. > This is similar to the failure I am seeing with rdma_lat test. > > BTW, when I start the rping in server mode, at say port 9999, > should I expect to see an entity listening on that port number > when I do "netstat -an". Currently, I don't see that. > > Ravi > > -----Original Message----- > From: Steve Wise [mailto:swise at opengridcomputing.com] > Sent: Thursday, July 13, 2006 12:10 PM > To: ravinandan.arakali at neterion.com > Cc: bpradip at in.ibm.com; openib-general at openib.org > Subject: Re: [openib-general] ping problem with > ammassocards(iWARPinterface) > > > By the way, does this failure happen immediately or after some period of > time? > > > On Thu, 2006-07-13 at 13:27 -0500, Steve Wise wrote: >> I guess this isn't surprising since rping doesn't work for you either. >> Something fundamental is screwed up on your user side methinks... >> >> CM event 8 == RDMA_CM_EVENT_REJECTED which means either the server side >> wasn't listening on the appropriate TCP port, or the server process did >> an rdma_reject(). I'm guessing it's the former... >> >> You could use tcpdump to see if the connection request is getting RST >> by the remote side. >> >> >> >> >> On Thu, 2006-07-13 at 11:20 -0700, Ravinandan Arakali wrote: >>> With the --cma option, I don't see the error about running SM. >>> But there's no connection established.
>>> >>> openfab2:/tmp/ib/src/userspace/perftest # ./rdma_lat --cma >>> pp_server_connect_cma starting server >>> >>> openfab:/tmp/ib/src/userspace/perftest # ./rdma_lat --cma 17.2.2.102 >>> pp_client_connect_cma starting client >>> pp_client_connect_cma/856 unexpected CM event 8 >>> pp_client_connect_cma NOT connected! >>> pp_connect_cma(17.2.2.102,18515) failed! >>> >>> There are no messages in dmesg either. >>> >>> Ravi >>> >>> -----Original Message----- >>> From: Steve Wise [mailto:swise at opengridcomputing.com] >>> Sent: Thursday, July 13, 2006 6:55 AM >>> To: Ravinandan Arakali >>> Cc: bpradip at in.ibm.com; openib-general at openib.org >>> Subject: Re: [openib-general] ping problem with ammasso >>> cards(iWARPinterface) >>> >>> >>> Are you trying to run this over iwarp? It doesn't need an SM... >>> >>> For the perftests rdma_lat and rdma_bw in the iwarp branch, use the >>> --cma flag. >>> >>> Steve. >>> >>> >>> On Wed, 2006-07-12 at 16:39 -0700, Ravinandan Arakali wrote: >>>> Also, I am trying to run some of the iwarp bandwidth/latency tests >>>> (available under directory perftest). >>>> The first thing to do here is to run opensm. When I run opensm (with > debug >>>> level 10), I get the following error. Any idea what needs to be done > to >>> get >>>> this working ? 
>>>> >>>> openfab2:/tmp/ib/src/userspace # opensm -d 10 >>>> ------------------------------------------------- >>>> OpenSM Rev:openib-1.2.0 >>>> Command Line Arguments: >>>> d level = 0xa >>>> Log File: /var/log/osm.log >>>> ------------------------------------------------- >>>> OpenSM Rev:openib-1.2.0 >>>> >>>> Using default GUID 0x0 >>>> Error: Could not get port guid >>>> Exiting SM >>>> >>>> openfab2:/tmp/ib/src/userspace # cat /var/log/osm.log >>>> Jul 12 08:35:04 718914 [B7E518C0] -> OpenSM Rev:openib-1.2.0 >>>> Jul 12 08:35:04 719111 [0000] -> OpenSM Rev:openib-1.2.0 >>>> >>>> Jul 12 08:35:04 721381 [B7E518C0] -> osm_sa_mad_ctrl_unbind: ERR 1A11: > No >>>> previous bind >>>> Jul 12 08:35:04 721702 [0000] -> Exiting SM >>>> >>>> >>>> >>>> >>>> >>>> -----Original Message----- >>>> From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] >>>> Sent: Wednesday, July 12, 2006 10:31 AM >>>> To: Ravinandan Arakali >>>> Cc: openib-general at openib.org >>>> Subject: Re: [openib-general] ping problem with ammasso cards(iWARP >>>> interface) >>>> >>>> >>>> Ravinandan, >>>> Do you still see the rping crash? >>>> >>>> Thanks, >>>> Pradipta Kumar. >>>> >>>> Ravinandan Arakali wrote: >>>>> Pradipta, >>>>> Okay, thanks.. Initially, I was not sure since I don't remember > non-zero >>>>> values in /proc/krping. When I re-ran the krping test, I see > following >>>>> output >>>>> openfab2:~ # cat /proc/krping >>>>> 1-amso0 891376 55711 891376 55711 1782720 27855 1782784 27856 >>>>> >>>>> As you mentioned, the RDMA traffic seems to be flowing indeed ! >>>>> Any idea why rping is dumping core ? >>>>> >>>>> Has any testing been done using SDP with ammasso cards ? >>>>> >>>>> Regards, >>>>> Ravi >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com] >>>>> Sent: Friday, July 07, 2006 11:20 PM >>>>> To: Ravinandan Arakali >>>>> Cc: Leonid. 
Grossman (E-mail); sivakumar.subramani at neterion.com; >>>>> openib-general at openib.org >>>>> Subject: Re: [openib-general] ping problem with ammasso cards(iWARP >>>>> interface) >>>>> >>>>> >>>>> Ravinandan Arakali wrote: >>>>>> Pradipta, >>>>>> Following is the output from gdb after core dump. I have also >>> copy-pasted >>>>>> the gdb output on client system. >>>>>> >>>>>> Attached is the dmesg output when krping test is run in verbose > mode. >>>>>> The ping data on the sender(client) seems okay. The content is > shifted >>>>>> forward by one character for each packet. On receiver, after > receiving >>>>> ping >>>>>> pkt 9, it seems to jump to pkt no. 1935. Not sure if it's because >>>> messages >>>>>> can be lost during writing to /var/log/messages ? >>>>> krping is indeed working!!...Using 'verbose' allows you to see the > ping >>>>> data. >>>>> When not using 'verbose' you see only 'send/recv' messages. >>>>>> ----------------------------------------- >>>>>> (gdb) run -s -vV -C100 -d -a 0.0.0.0 -p 9999 >>>>>> Starting program: >>>>>> > /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -s -vV -C100 -d -a >>>>>> 0.0.0.0 -p 9999 >>>>>> [Thread debugging using libthread_db enabled] >>>>>> [New Thread -1210054992 (LWP 3668)] >>>>>> ipaddr (0.0.0.0) >>>>>> port 9999 >>>>>> created cm_id 0x804e6e0 >>>>>> [New Thread -1210057824 (LWP 3671)] >>>>>> rdma_bind_addr successful >>>>>> rdma_listen >>>>>> cma_event type 4 cma_id 0x804e968 (child) >>>>>> child cma 0x804e968 >>>>>> >>>>>> Program received signal SIGSEGV, Segmentation fault. 
>>>>>> [Switching to Thread -1210054992 (LWP 3668)] >>>>>> rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514 >>>>>> 514 cb->pd = ibv_alloc_pd(cm_id->verbs); >>>>>> (gdb) bt >>>>>> #0 rping_setup_qp (cb=0x0, cm_id=0x804e968) at > examples/rping.c:514 >>>>>> #1 0x0804a716 in main (argc=9, argv=Cannot access memory at > address >>> 0x6 >>>>>> ) at examples/rping.c:767 >>>>>> (gdb) >>>>>> >>>>>> --------------------------------- >>>>>> (gdb) run -c -vV -C100 -d -a 17.2.2.102 -p 9999 >>>>>> Starting program: >>>>>> >>> tmp/ib/src/userspace/librdmacm/examples/.libs/rping -c -vV -C100 -d -a >>>>>> 17.2.2.102 -p 9999 >>>>>> [Thread debugging using libthread_db enabled] >>>>>> [New Thread 47388824908032 (LWP 4620)] >>>>>> ipaddr (17.2.2.102) >>>>>> port 9999 >>>>>> created cm_id 0x506b00 >>>>>> [New Thread 1082132800 (LWP 4623)] >>>>>> cma_event type 0 cma_id 0x506b00 (parent) >>>>>> cma_event type 2 cma_id 0x506b00 (parent) >>>>>> rdma_resolve_addr - rdma_resolve_route successful >>>>>> created pd 0x506e60 >>>>>> created channel 0x506e80 >>>>>> created cq 0x506ea0 >>>>>> created qp 0x506f40 >>>>>> rping_setup_buffers called on cb 0x505010 >>>>>> allocated & registered buffers... >>>>>> [New Thread 1090525504 (LWP 4624)] >>>>>> cq_thread started. 
From halr at voltaire.com Wed Jul 19 11:42:06 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Jul 2006 14:42:06 -0400 Subject: [openib-general] [PATCHv2 3/3] OpenSM/osm_sa_multipath_record.c: Support SL inMultiPathRecord requests In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E3028D8CDC@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E3028D8CDC@mtlexch01.mtl.com> Message-ID: <1153334518.4540.38474.camel@hal.voltaire.com> On Wed, 2006-07-19 at 12:25, Eitan Zahavi wrote: > Sure. Did you catch the P_Key issues using a test? No; that part of the changes was to make the multipath record code look more like the path record code. -- Hal > Eitan > From sean.hefty at intel.com Wed Jul 19 11:54:11 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 19 Jul 2006 11:54:11 -0700 Subject: [openib-general] [PATCH v2] ib_cm: fix REJ due to invalid GID In-Reply-To: <1153295167.6972.156.camel@confield.dd.xiranet.com> Message-ID: <000001c6ab64$b97798e0$e598070a@amr.corp.intel.com> Let's try this again...
Once we're past cm_match_req(), we can just call into the main API to reject the connection, which will set the state of the cm_id correctly. I looked at changing cm_destroy_id(), but didn't like the idea of trying to have the destroy call set the ARI data. - Sean --- Index: cm.c =================================================================== --- cm.c (revision 8590) +++ cm.c (working copy) @@ -1353,7 +1353,7 @@ static int cm_req_handler(struct cm_work id.local_id); if (IS_ERR(cm_id_priv->timewait_info)) { ret = PTR_ERR(cm_id_priv->timewait_info); - goto error1; + goto destroy; } cm_id_priv->timewait_info->work.remote_id = req_msg->local_comm_id; cm_id_priv->timewait_info->remote_ca_guid = req_msg->local_ca_guid; @@ -1362,7 +1362,8 @@ static int cm_req_handler(struct cm_work listen_cm_id_priv = cm_match_req(work, cm_id_priv); if (!listen_cm_id_priv) { ret = -EINVAL; - goto error2; + kfree(cm_id_priv->timewait_info); + goto destroy; } cm_id_priv->id.cm_handler = listen_cm_id_priv->id.cm_handler; @@ -1372,12 +1373,22 @@ static int cm_req_handler(struct cm_work cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]); ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av); - if (ret) - goto error3; + if (ret) { + ib_get_cached_gid(work->port->cm_dev->device, + work->port->port_num, 0, &work->path[0].sgid); + ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID, + &work->path[0].sgid, sizeof work->path[0].sgid, + NULL, 0); + goto rejected; + } if (req_msg->alt_local_lid) { ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av); - if (ret) - goto error3; + if (ret) { + ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_ALT_GID, + &work->path[0].sgid, + sizeof work->path[0].sgid, NULL, 0); + goto rejected; + } } cm_id_priv->tid = req_msg->hdr.tid; cm_id_priv->timeout_ms = cm_convert_to_ms( @@ -1399,12 +1410,11 @@ static int cm_req_handler(struct cm_work cm_deref_id(listen_cm_id_priv); return 0; -error3: atomic_dec(&cm_id_priv->refcount); +rejected: + 
atomic_dec(&cm_id_priv->refcount); cm_deref_id(listen_cm_id_priv); - cm_cleanup_timewait(cm_id_priv->timewait_info); -error2: kfree(cm_id_priv->timewait_info); - cm_id_priv->timewait_info = NULL; -error1: ib_destroy_cm_id(&cm_id_priv->id); +destroy: + ib_destroy_cm_id(&cm_id_priv->id); return ret; } From halr at voltaire.com Wed Jul 19 12:15:37 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Jul 2006 15:15:37 -0400 Subject: [openib-general] [PATCH][MINOR] OpenSM/osm_helper.c: Support InformInfoRecord attribute ID decode Message-ID: <1153336534.4540.38766.camel@hal.voltaire.com> OpenSM/osm_helper.c: Support InformInfoRecord attribute ID decode Signed-off-by: Hal Rosenstock Index: opensm/osm_helper.c =================================================================== --- opensm/osm_helper.c (revision 8596) +++ opensm/osm_helper.c (working copy) @@ -257,10 +257,194 @@ const char* const __ib_sa_attr_str[] = "TraceRecord", /* 39 */ "MultiPathRecord", /* 3A */ "ServiceAssociationRecord", /* 3B */ - "UNKNOWN" /* 3C - always highest value */ + "UNKNOWN", /* 3C */ + "UNKNOWN", /* 3D */ + "UNKNOWN", /* 3E */ + "UNKNOWN", /* 3F */ + "UNKNOWN", /* 40 */ + "UNKNOWN", /* 41 */ + "UNKNOWN", /* 42 */ + "UNKNOWN", /* 43 */ + "UNKNOWN", /* 44 */ + "UNKNOWN", /* 45 */ + "UNKNOWN", /* 46 */ + "UNKNOWN", /* 47 */ + "UNKNOWN", /* 48 */ + "UNKNOWN", /* 49 */ + "UNKNOWN", /* 4A */ + "UNKNOWN", /* 4B */ + "UNKNOWN", /* 4C */ + "UNKNOWN", /* 4D */ + "UNKNOWN", /* 4E */ + "UNKNOWN", /* 4F */ + "UNKNOWN", /* 50 */ + "UNKNOWN", /* 51 */ + "UNKNOWN", /* 52 */ + "UNKNOWN", /* 53 */ + "UNKNOWN", /* 54 */ + "UNKNOWN", /* 55 */ + "UNKNOWN", /* 56 */ + "UNKNOWN", /* 57 */ + "UNKNOWN", /* 58 */ + "UNKNOWN", /* 59 */ + "UNKNOWN", /* 5A */ + "UNKNOWN", /* 5B */ + "UNKNOWN", /* 5C */ + "UNKNOWN", /* 5D */ + "UNKNOWN", /* 5E */ + "UNKNOWN", /* 5F */ + "UNKNOWN", /* 60 */ + "UNKNOWN", /* 61 */ + "UNKNOWN", /* 62 */ + "UNKNOWN", /* 63 */ + "UNKNOWN", /* 64 */ + "UNKNOWN", /* 65 */ + "UNKNOWN", /* 
66 */ + "UNKNOWN", /* 67 */ + "UNKNOWN", /* 68 */ + "UNKNOWN", /* 69 */ + "UNKNOWN", /* 6A */ + "UNKNOWN", /* 6B */ + "UNKNOWN", /* 6C */ + "UNKNOWN", /* 6D */ + "UNKNOWN", /* 6E */ + "UNKNOWN", /* 6F */ + "UNKNOWN", /* 70 */ + "UNKNOWN", /* 71 */ + "UNKNOWN", /* 72 */ + "UNKNOWN", /* 73 */ + "UNKNOWN", /* 74 */ + "UNKNOWN", /* 75 */ + "UNKNOWN", /* 76 */ + "UNKNOWN", /* 77 */ + "UNKNOWN", /* 78 */ + "UNKNOWN", /* 79 */ + "UNKNOWN", /* 7A */ + "UNKNOWN", /* 7B */ + "UNKNOWN", /* 7C */ + "UNKNOWN", /* 7D */ + "UNKNOWN", /* 7E */ + "UNKNOWN", /* 7F */ + "UNKNOWN", /* 80 */ + "UNKNOWN", /* 81 */ + "UNKNOWN", /* 82 */ + "UNKNOWN", /* 83 */ + "UNKNOWN", /* 84 */ + "UNKNOWN", /* 85 */ + "UNKNOWN", /* 86 */ + "UNKNOWN", /* 87 */ + "UNKNOWN", /* 88 */ + "UNKNOWN", /* 89 */ + "UNKNOWN", /* 8A */ + "UNKNOWN", /* 8B */ + "UNKNOWN", /* 8C */ + "UNKNOWN", /* 8D */ + "UNKNOWN", /* 8E */ + "UNKNOWN", /* 8F */ + "UNKNOWN", /* 90 */ + "UNKNOWN", /* 91 */ + "UNKNOWN", /* 92 */ + "UNKNOWN", /* 93 */ + "UNKNOWN", /* 94 */ + "UNKNOWN", /* 95 */ + "UNKNOWN", /* 96 */ + "UNKNOWN", /* 97 */ + "UNKNOWN", /* 98 */ + "UNKNOWN", /* 99 */ + "UNKNOWN", /* 9A */ + "UNKNOWN", /* 9B */ + "UNKNOWN", /* 9C */ + "UNKNOWN", /* 9D */ + "UNKNOWN", /* 9E */ + "UNKNOWN", /* 9F */ + "UNKNOWN", /* A0 */ + "UNKNOWN", /* A1 */ + "UNKNOWN", /* A2 */ + "UNKNOWN", /* A3 */ + "UNKNOWN", /* A4 */ + "UNKNOWN", /* A5 */ + "UNKNOWN", /* A6 */ + "UNKNOWN", /* A7 */ + "UNKNOWN", /* A8 */ + "UNKNOWN", /* A9 */ + "UNKNOWN", /* AA */ + "UNKNOWN", /* AB */ + "UNKNOWN", /* AC */ + "UNKNOWN", /* AD */ + "UNKNOWN", /* AE */ + "UNKNOWN", /* AF */ + "UNKNOWN", /* B0 */ + "UNKNOWN", /* B1 */ + "UNKNOWN", /* B2 */ + "UNKNOWN", /* B3 */ + "UNKNOWN", /* B4 */ + "UNKNOWN", /* B5 */ + "UNKNOWN", /* B6 */ + "UNKNOWN", /* B7 */ + "UNKNOWN", /* B8 */ + "UNKNOWN", /* B9 */ + "UNKNOWN", /* BA */ + "UNKNOWN", /* BB */ + "UNKNOWN", /* BC */ + "UNKNOWN", /* BD */ + "UNKNOWN", /* BE */ + "UNKNOWN", /* BF */ + "UNKNOWN", /* C0 */ + "UNKNOWN", 
/* C1 */ + "UNKNOWN", /* C2 */ + "UNKNOWN", /* C3 */ + "UNKNOWN", /* C4 */ + "UNKNOWN", /* C5 */ + "UNKNOWN", /* C6 */ + "UNKNOWN", /* C7 */ + "UNKNOWN", /* C8 */ + "UNKNOWN", /* C9 */ + "UNKNOWN", /* CA */ + "UNKNOWN", /* CB */ + "UNKNOWN", /* CC */ + "UNKNOWN", /* CD */ + "UNKNOWN", /* CE */ + "UNKNOWN", /* CF */ + "UNKNOWN", /* D0 */ + "UNKNOWN", /* D1 */ + "UNKNOWN", /* D2 */ + "UNKNOWN", /* D3 */ + "UNKNOWN", /* D4 */ + "UNKNOWN", /* D5 */ + "UNKNOWN", /* D6 */ + "UNKNOWN", /* D7 */ + "UNKNOWN", /* D8 */ + "UNKNOWN", /* D9 */ + "UNKNOWN", /* DA */ + "UNKNOWN", /* DB */ + "UNKNOWN", /* DC */ + "UNKNOWN", /* DD */ + "UNKNOWN", /* DE */ + "UNKNOWN", /* DF */ + "UNKNOWN", /* E0 */ + "UNKNOWN", /* E1 */ + "UNKNOWN", /* E2 */ + "UNKNOWN", /* E3 */ + "UNKNOWN", /* E4 */ + "UNKNOWN", /* E5 */ + "UNKNOWN", /* E6 */ + "UNKNOWN", /* E7 */ + "UNKNOWN", /* E8 */ + "UNKNOWN", /* E9 */ + "UNKNOWN", /* EA */ + "UNKNOWN", /* EB */ + "UNKNOWN", /* EC */ + "UNKNOWN", /* ED */ + "UNKNOWN", /* EE */ + "UNKNOWN", /* EF */ + "UNKNOWN", /* F0 */ + "UNKNOWN", /* F1 */ + "UNKNOWN", /* F2 */ + "InformInfoRecord", /* F3 */ + "UNKNOWN" /* F4 - always highest value */ }; -#define OSM_SA_ATTR_STR_UNKNOWN_VAL 0x3C +#define OSM_SA_ATTR_STR_UNKNOWN_VAL 0xF4 /********************************************************************** From halr at voltaire.com Wed Jul 19 13:33:49 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Jul 2006 16:33:49 -0400 Subject: [openib-general] [PATCH][MINOR] OpenSM/osm_sa_informinfo.c: In osm_infr_rcv_process_set_method, if sending error, don't call osm_infr_rcv_process_set_method Message-ID: <1153341227.4540.39419.camel@hal.voltaire.com> OpenSM/osm_sa_informinfo.c: In osm_infr_rcv_process_set_method, if sending error, don't call osm_infr_rcv_process_set_method Also, fix typo in osm_log message Signed-off-by: Hal Rosenstock Index: opensm/osm_sa_informinfo.c =================================================================== --- opensm/osm_sa_informinfo.c 
(revision 8600) +++ opensm/osm_sa_informinfo.c (working copy) @@ -501,7 +501,7 @@ osm_infr_rcv_process_set_method( /* No Such Item - So Error */ osm_log( p_rcv->p_log, OSM_LOG_ERROR, "osm_infr_rcv_process_set_method: ERR 4307: " - "Failed to UnSubscribe to non exiting inform object\n"); + "Failed to UnSubscribe to non existing inform object\n"); /* o13-13.1.1: we need to set the subscribe bit to 0 */ p_recvd_inform_info->subscribe = 0; @@ -546,9 +546,11 @@ osm_infr_rcv_process( "Unsupported Method (%s)\n", ib_get_sa_method_str( p_sa_mad->method ) ); osm_sa_send_error( p_rcv->p_resp, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR ); + goto Exit; } osm_infr_rcv_process_set_method( p_rcv, p_madw ); + Exit: OSM_LOG_EXIT( p_rcv->p_log ); } From halr at voltaire.com Wed Jul 19 13:05:03 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Jul 2006 16:05:03 -0400 Subject: [openib-general] [PATCH] OpenSM/ib_types.h: Pad ib_inform_info_record_t to be modulo 8 in size Message-ID: <1153339501.4540.39193.camel@hal.voltaire.com> OpenSM/ib_types.h: Pad ib_inform_info_record_t to be modulo 8 in size so that attribute offset is calculated properly Signed-off-by: Hal Rosenstock Index: include/iba/ib_types.h =================================================================== --- include/iba/ib_types.h (revision 8598) +++ include/iba/ib_types.h (working copy) @@ -7087,6 +7087,7 @@ typedef struct _ib_inform_info_record ib_net16_t subscriber_enum; uint8_t reserved[6]; ib_inform_info_t inform_info; + uint8_t pad[4]; } PACK_SUFFIX ib_inform_info_record_t; #include From halr at voltaire.com Wed Jul 19 15:21:12 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Jul 2006 18:21:12 -0400 Subject: [openib-general] [PATCH][TRIVIAL]OpenSM/osm_sa_informinfo.c: Remove unneeded line of code in __osm_infr_rcv_respond Message-ID: <1153347671.4540.40378.camel@hal.voltaire.com> OpenSM/osm_sa_informinfo.c: Remove unneeded line of code in __osm_infr_rcv_respond Signed-off-by: Hal Rosenstock Index: 
opensm/osm_sa_informinfo.c =================================================================== --- opensm/osm_sa_informinfo.c (revision 8606) +++ opensm/osm_sa_informinfo.c (working copy) @@ -312,8 +312,6 @@ __osm_infr_rcv_respond( p_resp_sa_mad->method = IB_MAD_METHOD_GET_RESP; /* C15-0.1.5 - always return SM_Key = 0 (table 185 p 884) */ p_resp_sa_mad->sm_key = 0; - p_resp_sa_mad->paylen_newwin = - cl_hton32( sizeof(ib_inform_info_record_t) ); p_resp_infr = (ib_inform_info_t*)ib_sa_mad_get_payload_ptr( p_resp_sa_mad ); From halr at voltaire.com Wed Jul 19 15:31:17 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Jul 2006 18:31:17 -0400 Subject: [openib-general] [PATCH][TRIVIAL] OpenSM/osm_sa_class_port_info.c: Eliminate unneeded line of code in __osm_cpi_rcv_respond Message-ID: <1153348269.4540.40467.camel@hal.voltaire.com> OpenSM/osm_sa_class_port_info.c: Eliminate unneeded line of code in __osm_cpi_rcv_respond Signed-off-by: Hal Rosenstock Index: opensm/osm_sa_class_port_info.c =================================================================== --- opensm/osm_sa_class_port_info.c (revision 8606) +++ opensm/osm_sa_class_port_info.c (working copy) @@ -160,7 +160,6 @@ __osm_cpi_rcv_respond( p_resp_sa_mad->method |= IB_MAD_METHOD_RESP_MASK; /* C15-0.1.5 - always return SM_Key = 0 (table 185 p 884) */ p_resp_sa_mad->sm_key = 0; - p_resp_sa_mad->paylen_newwin = 0; p_resp_cpi = (ib_class_port_info_t*)ib_sa_mad_get_payload_ptr( p_resp_sa_mad ); From yates2 at llnl.gov Wed Jul 19 16:57:46 2006 From: yates2 at llnl.gov (Kim Yates) Date: Wed, 19 Jul 2006 16:57:46 -0700 Subject: [openib-general] Draft topics for OFA Developers' workshop Message-ID: <03709DA6-89E9-4369-BA59-383546CC247D@llnl.gov> Thanks to those who sent suggestions for topics, and to Bill Boas, Matt Leininger, and Helen Chen for helping come up with this draft list. 
At this point not considering scheduling or presenters, here's the tentative list of topics, in no particular order, for the next OFA Developers' workshop. Please let me know of any suggestions for additions or changes. Plans for IFED Rel. 2.0 Virtualization Inter-subnet routing Intra-subnet routing Quality of Service Congestion control Security MPI RDS iSCSI ISER SRP SDP Storage Diagnostics Interoperability OpenFabrics on Windows User needs would be addressed primarily in the IBTA+OFA joint DevCon. Before the workshop, Bill Boas and Matt Leininger will lead an effort to come up with a shared technical lexicon, to reduce confusion over terms that have widely different meanings to different groups. ________________________________________________________ Kim Yates LLNL Center for Applied Scientific Computing kimyates at llnl.gov (925) 424-3642 From dotanb at mellanox.co.il Wed Jul 19 23:29:58 2006 From: dotanb at mellanox.co.il (Dotan Barak) Date: Thu, 20 Jul 2006 09:29:58 +0300 Subject: [openib-general] [librdmacm] ABI file in not being created by cma module, but the user level library tries to use it Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3028D8D5E@mtlexch01.mtl.com> Hi. I executed the example ucmatose and got the following failure: # ucmatose librdmacm: couldn't read ABI version. failed to create event channel it seems that the cma kernel module don't create any ABI file, but the user level library tries to search for it anyway. here are the machine/driver props: ************************************************************* Host Architecture : x86_64 Linux Distribution: Red Hat Enterprise Linux AS release 4 (Nahant Update 2) Kernel Version : 2.6.9-22.ELsmp Memory size : 4037872 kB Driver Version : gen2_linux-20060719-1758 (REV=8587) (kernel part: git for kernel 2.6.18) HCA ID(s) : mthca0 HCA model(s) : 25218 FW version(s) : 5.1.921 Board(s) : MT_0150000001 ************************************************************* did you see this problem before? 
thanks Dotan Barak Software Verification Engineer Mellanox Technologies Tel: +972-4-9097200 Ext: 231 Fax: +972-4-9593245 P.O. Box 86 Yokneam 20692 ISRAEL. Home: +972-77-8841095 Cell: 052-4222383 -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.hefty at intel.com Wed Jul 19 23:51:19 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 19 Jul 2006 23:51:19 -0700 Subject: [openib-general] [librdmacm] ABI file in not being created by cma module, but the user level library tries to use it In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E3028D8D5E@mtlexch01.mtl.com> Message-ID: <000001c6abc8$e7d431d0$29d8180a@amr.corp.intel.com> I executed the example ucmatose and got the following failure: # ucmatose librdmacm: couldn't read ABI version. failed to create event channel it seems that the cma kernel module don't create any ABI file, but the user level library tries to search for it anyway. here are the machine/driver props: ************************************************************* Host Architecture : x86_64 Linux Distribution: Red Hat Enterprise Linux AS release 4 (Nahant Update 2) Kernel Version : 2.6.9-22.ELsmp Memory size : 4037872 kB Driver Version : gen2_linux-20060719-1758 (REV=8587) (kernel part: git for kernel 2.6.18) HCA ID(s) : mthca0 HCA model(s) : 25218 FW version(s) : 5.1.921 Board(s) : MT_0150000001 ************************************************************* did you see this problem before? This looks like a potential backport issue. The ABI file is usually created under misc/rdma_cm, but I believe that the backport patch moves the rdma_cm directory under infiniband. (The misc directory doesn't exist.) The library typically looks under misc/rdma_cm. (Note that very old versions of the RDMA CM did not create an ABI file, so it's possible that you have that version.) If I remember correctly, the event channel code went into the CMA in May. Can you tell me what version of the ucma and librdmacm you are using? 
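The lookup described above can be sketched roughly as follows. This is a minimal sketch and not librdmacm's actual code: the file name abi_version, the sysfs root passed in by the caller, and the second candidate path (one reading of "the backport patch moves the rdma_cm directory under infiniband") are assumptions, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Read a single integer ABI version from the file at `path`.
 * Returns the version, or -1 if the file is missing or unparsable. */
static int read_abi_version(const char *path)
{
    FILE *f;
    int version = -1;

    assert(path != NULL);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%d", &version) != 1)
        version = -1;
    fclose(f);
    return version;
}

/* Try each candidate location under `sysfs_root` in turn and return the
 * first ABI version found, or -1 if none of the files can be read. */
static int find_rdma_cm_abi(const char *sysfs_root)
{
    static const char *const candidates[] = {
        "class/misc/rdma_cm/abi_version",       /* assumed usual location */
        "class/infiniband/rdma_cm/abi_version", /* assumed backport location */
    };
    char path[512];
    size_t i;

    assert(sysfs_root != NULL);
    for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
        int v;

        snprintf(path, sizeof(path), "%s/%s", sysfs_root, candidates[i]);
        v = read_abi_version(path);
        if (v >= 0) {
            printf("found ABI version %d at %s\n", v, path);
            return v;
        }
    }
    return -1;
}
```

Calling find_rdma_cm_abi("/sys") on the affected machine would show whether the file exists in either place. A return of -1 would be consistent with the "couldn't read ABI version" message, although it cannot by itself distinguish a moved file from a module that is not loaded.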
- Sean -------------- next part -------------- An HTML attachment was scrubbed... URL: From narravul at cse.ohio-state.edu Thu Jul 20 00:05:35 2006 From: narravul at cse.ohio-state.edu (Sundeep Narravula) Date: Thu, 20 Jul 2006 03:05:35 -0400 (EDT) Subject: [openib-general] Problem running rping on chelsio T3 adapters In-Reply-To: <1153332977.18035.28.camel@stevo-desktop> Message-ID: Hi, > What does your network configuration look like? IE send ifconfig output > on both systems. > Send the output of lsmod on both systems. I have attached both these for both the systems. > Does krping work? No. It fails with the same error. > Can you ssh between the 2 systems using the T3 ip addresses? No. > Do the following libs exist in /usr/local/lib: > libibverbs.* > librmdacm.* These exist. > libcxgb3.* This is not in /usr/local/lib. /usr/local/lib/infiniband/cxgb3.* exist though. > Do ldconfig -v and make sure libibverbs is only found in /usr/local/lib. > If you have OFED or some other rpm package with libibverbs installed, > you'll have problems. Checked this. Only one exists. > For some reason it looks to be trying to load the MTHCA Infiniband > library. Do you have IB cards in the system? Yes. We do have IB nics on the system. --Sundeep. > > Steve. > > > On Wed, 2006-07-19 at 13:28 -0400, Sundeep Narravula wrote: > > Hi, > > > > We are using kernel 2.6.16.3 and installed the latest code/drivers from > > the iwarp branch. Currently, we are seeing an address resolution problem > > while using rping. > > > > [narravul at d1-as4 examples]$ ./rping -c -VvC100 -a 150.10.11.200 -p 9999 > > libibverbs: Warning: couldn't load driver > > /usr/local/lib/infiniband/mthca.so: /usr/local/lib/infiniband/mthca.so: > > undefined symbol: ibv_cmd_query_pkey > > cma event 1, error -110 > > > > The server hangs and cannot be killed. > > > > Both the nodes are connected to the switch and I am able to ping the local > > ip addresses. Is there anything we are missing? 
> > > > Thanks, > > --Sundeep. > > > > > > > > > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > -------------- next part -------------- Module Size Used by rdma_krping 21512 0 rdma_ucm 16384 0 ib_ucm 19972 0 ib_uverbs 38312 2 rdma_ucm,ib_ucm ib_umad 18992 0 rdma_cm 28036 2 rdma_krping,rdma_ucm iw_cm 13316 1 rdma_cm ib_addr 11524 1 rdma_cm ib_cm 35844 2 ib_ucm,rdma_cm ib_local_sa 15880 2 rdma_ucm,rdma_cm findex 8192 1 ib_local_sa ib_sa 17060 2 rdma_cm,ib_local_sa ib_mad 38816 4 ib_umad,ib_cm,ib_local_sa,ib_sa iw_cxgb3 76244 0 cxgb3c 24112 1 iw_cxgb3 ib_core 49536 11 rdma_krping,ib_ucm,ib_uverbs,ib_umad,rdma_cm,iw_cm,ib_cm,ib_local_sa,ib_sa,ib_mad,iw_cxgb3 nfs 201324 2 lockd 60424 2 nfs ipv6 248416 16 parport_pc 30148 1 lp 16456 0 parport 38728 2 parport_pc,lp autofs4 23044 1 sunrpc 142012 3 nfs,lockd dm_mod 57624 0 button 10896 0 battery 13700 0 ac 9092 0 ohci_hcd 23556 0 i2c_piix4 12816 0 i2c_core 25088 1 i2c_piix4 cxgb3 91664 1 cxgb3c e1000 104884 0 tg3 98052 0 floppy 61252 0 -------------- next part -------------- Module Size Used by rdma_krping 21512 0 rdma_ucm 16384 0 ib_ucm 19972 0 ib_uverbs 38312 2 rdma_ucm,ib_ucm ib_umad 18992 0 rdma_cm 28036 2 rdma_krping,rdma_ucm iw_cm 13316 1 rdma_cm ib_addr 11524 1 rdma_cm ib_cm 35844 2 ib_ucm,rdma_cm ib_local_sa 15880 2 rdma_ucm,rdma_cm findex 8192 1 ib_local_sa ib_sa 17060 2 rdma_cm,ib_local_sa ib_mad 38816 4 ib_umad,ib_cm,ib_local_sa,ib_sa iw_cxgb3 76244 0 cxgb3c 24112 1 iw_cxgb3 ib_core 49536 11 rdma_krping,ib_ucm,ib_uverbs,ib_umad,rdma_cm,iw_cm,ib_cm,ib_local_sa,ib_sa,ib_mad,iw_cxgb3 nfs 201324 2 lockd 60424 2 nfs ipv6 248416 16 parport_pc 30148 1 lp 16456 0 parport 38728 2 parport_pc,lp autofs4 23044 1 sunrpc 142012 3 nfs,lockd dm_mod 57624 0 button 10896 0 battery 13700 0 ac 9092 0 ohci_hcd 23556 0 
i2c_piix4 12816 0 i2c_core 25088 1 i2c_piix4 cxgb3 91664 1 cxgb3c e1000 104884 0 tg3 98052 0 floppy 61252 0 -------------- next part -------------- eth0 Link encap:Ethernet HWaddr 00:30:48:52:78:14 inet addr:192.168.103.10 Bcast:192.168.255.255 Mask:255.255.0.0 inet6 addr: fe80::230:48ff:fe52:7814/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:22678 errors:0 dropped:0 overruns:0 frame:0 TX packets:13466 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:2198016 (2.0 MiB) TX bytes:1334264 (1.2 MiB) Base address:0x8f00 Memory:ee1a0000-ee1c0000 eth2 Link encap:Ethernet HWaddr 00:07:43:01:10:61 inet addr:150.11.11.101 Bcast:150.11.255.255 Mask:255.255.0.0 inet6 addr: fe80::207:43ff:fe01:1061/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) Interrupt:185 Memory:febff000-febfffff eth3 Link encap:Ethernet HWaddr 00:07:43:01:10:62 inet addr:150.10.11.101 Bcast:150.10.255.255 Mask:255.255.0.0 UP BROADCAST MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) Interrupt:185 Memory:febff000-febfffff lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:56 errors:0 dropped:0 overruns:0 frame:0 TX packets:56 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:4252 (4.1 KiB) TX bytes:4252 (4.1 KiB) -------------- next part -------------- eth0 Link encap:Ethernet HWaddr 00:30:48:52:84:A2 inet addr:192.168.103.9 Bcast:192.168.255.255 Mask:255.255.0.0 inet6 addr: fe80::230:48ff:fe52:84a2/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:11787 errors:0 dropped:0 
overruns:0 frame:0 TX packets:2547 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:1093128 (1.0 MiB) TX bytes:228608 (223.2 KiB) Base address:0x8f00 Memory:ee1a0000-ee1c0000 eth2 Link encap:Ethernet HWaddr 00:07:43:01:10:5F inet addr:150.11.11.100 Bcast:150.11.255.255 Mask:255.255.0.0 inet6 addr: fe80::207:43ff:fe01:105f/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:1 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:0 (0.0 b) TX bytes:94 (94.0 b) Interrupt:185 Memory:febff000-febfffff eth3 Link encap:Ethernet HWaddr 00:07:43:01:10:60 inet addr:150.10.11.100 Bcast:150.10.255.255 Mask:255.255.0.0 UP BROADCAST MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) Interrupt:185 Memory:febff000-febfffff lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:56 errors:0 dropped:0 overruns:0 frame:0 TX packets:56 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:4252 (4.1 KiB) TX bytes:4252 (4.1 KiB) From mst at mellanox.co.il Thu Jul 20 00:48:04 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 20 Jul 2006 10:48:04 +0300 Subject: [openib-general] testing connection take 3 Message-ID: <20060720074804.GA10536@mellanox.co.il> sorry -- MST From mst at mellanox.co.il Thu Jul 20 01:26:48 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 20 Jul 2006 11:26:48 +0300 Subject: [openib-general] 2 patches in mst-for-2.6.18 In-Reply-To: References: <20060719145804.GB15384@mellanox.co.il> Message-ID: <20060720082648.GA4778@mellanox.co.il> Quoting r. 
Roland Dreier : > > [PATCH] IB/mad: validate MADs for spec compliance C13-18.1.1 > > Seems fine but I wonder why the ib_response_mad function moved into a > public header. OK, the problem was that a file using it was missing from a commit. Sorry. I have updated the tree at git://www.mellanox.co.il/~git/infiniband mst-for-2.6.18 It now has: >git log --pretty=short origin..mst-for-2.6.18 commit 7534914e26bd847d61e8de6db3ce4fe83b67299e Author: Sean Hefty [PATCH] IB/mad: validate MADs for spec compliance C13-18.1.1 commit 864cb0bc598c9c6ece816d6bc74efb54e0ef6b70 Author: Michael S. Tsirkin [PATCH] IB/ipoib: fix packet loss after hardware address update -- MST From krkumar2 at in.ibm.com Thu Jul 20 02:23:13 2006 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Thu, 20 Jul 2006 14:53:13 +0530 Subject: [openib-general] [RFC] [PATCH 3/7] ibrdmaverbs hdr files In-Reply-To: Message-ID: Hi, Are you are OK with : 1. Changing all non-exported routines in libibverbs to the new API. All static functions can be changed immediately instead of going the "deprecation" way. 2. Adding new duplicate API's for those API's that are exported to other libraries. Is this the right thing to do and acceptable ? Thanks, - KK Roland Dreier wrote on 07/10/2006 10:27:03 PM: > James> Long term, I don't think it makes sense having two > James> libraries with the exact same functionality, but different > James> function names. Replacing ibv_ with rdma_ would be ideal, > James> but disruptive to current users. Could you provide a > James> backwards compatibility header (one that mapped each ibv_ > James> function to its equivalent rdma_ function)? > > Agreed. I think that the way forward is to provide a libibverbs > release that has both rdma_ and ibv_ prefixed identifiers, and also > provides both and headers, etc. 
> Then in the next major libibverbs release, the ibv_ versions of > identifiers can be deprecated, and finally removed in the next major > release (which could be renamed to librdmaverbs). > > Following that route is something that I've been meaning to work on, > and I'd be happy to accept patches in that direction. But a big bang > approach that breaks the world is not a good idea I don't think. > > - R. From mst at mellanox.co.il Thu Jul 20 02:25:06 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 20 Jul 2006 12:25:06 +0300 Subject: [openib-general] [PATCH] ib_cm: fix REJ due to invalid GID In-Reply-To: <44BE61A1.50602@ichips.intel.com> References: <44BE61A1.50602@ichips.intel.com> Message-ID: <20060720092506.GB27323@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [PATCH] ib_cm: fix REJ due to invalid GID > > Michael S. Tsirkin wrote: > > Should we be worried that this also sends IB_CM_REJ_INVALID_GID if > > find_cached_pkey fails? > > I'm not sure that an invalid pkey necessarily maps better to any other reject > code... It is not the syndrom that I'm worried about, but rather the fact that we are giving out GID information. Is this a security issue? Should we worry about that? -- MST From dotanb at mellanox.co.il Thu Jul 20 03:19:47 2006 From: dotanb at mellanox.co.il (Dotan Barak) Date: Thu, 20 Jul 2006 13:19:47 +0300 Subject: [openib-general] [librdmacm] ABI file in not being created by cma module, but the user level library tries to use it Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3028D8E74@mtlexch01.mtl.com> I executed the example ucmatose and got the following failure: # ucmatose librdmacm: couldn't read ABI version. failed to create event channel it seems that the cma kernel module don't create any ABI file, but the user level library tries to search for it anyway. did you see this problem before? This looks like a potential backport issue. 
The ABI file is usually created under misc/rdma_cm, but I believe that the backport patch moves the rdma_cm directory under infiniband. (The misc directory doesn't exist.) The library typically looks under misc/rdma_cm. (Note that very old versions of the RDMA CM did not create an ABI file, so it's possible that you have that version.) If I remember correctly, the event channel code went into the CMA in May. Can you tell me what version of the ucma and librdmacm you are using? [Dotan Barak] in the kernel code: cma.c: #define CMA_VERSION 0x00 in the librdmacm header files: rdma_cma_abi.h:#define RDMA_USER_CM_MIN_ABI_VERSION 1 rdma_cma_abi.h:#define RDMA_USER_CM_MAX_ABI_VERSION 2 Did you mean those version numbers? Dotan -------------- next part -------------- An HTML attachment was scrubbed... URL: 
From vlad at mellanox.co.il Thu Jul 20 04:26:34 2006 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Thu, 20 Jul 2006 14:26:34 +0300 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 release - schedule and features In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E3028430B0@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E3028430B0@mtlexch01.mtl.com> Message-ID: <44BF686A.7060707@mellanox.co.il> Hi Or, I am working on IPoIB failover. I tried Michael Tsirkin's patch for ipoib (updating neighbor structure) and it fixes the issue Roland was talking about. Meanwhile I have tested the following flow: Setup description: host1 - 2 IB ports connected to IB switch. ib0: 11.0.0.1 ib1: 12.0.0.1 host2 - port 1 connected to the IB switch. ib0: 11.0.0.2 opensm over port1 Flow description: - ping host2 -> 11.0.0.1 (passed) - set port1 of the host1 to 'DOWN' state (disconnect the port from IB subnet) - ping host2 -> 11.0.0.1 (failed) - ifconfig ib0 0.0.0.0 (on host1) - ifconfig ib1 11.0.0.1 (on host1) - arping -I ib1 11.0.0.1 (on host1) - ping host2 -> 11.0.0.1 (passed) arping in this case was not really necessary because ping issues ARP requests by itself. Also, I have tested ssh over IPoIB with the same flow. In this case arping also wasn't necessary, but it makes the update of neighbors with the new MAC address (of the ib1 interface) quicker. Note: I updated the original arping to be able to send broadcast using ipv4_bcast_addr. We should decide about the initial configuration of the IPoIB interfaces for high availability: should they be in different IP subnets or stay in the same one. Regards, Vladimir Eitan Zahavi wrote: > Hi Roland, > > We are trying this approach and will probably be done with it tomorrow. > So I guess Vlad will be able to update the group soon. 
> > Eitan Zahavi > > >> -----Original Message----- >> From: Roland Dreier >> Sent: Thursday, July 13, 2006 11:11 PM >> To: Or Gerlitz >> Cc: Tziporet Koren; OpenFabricsEWG; openib >> Subject: Re: [openib-general] OFED 1.1 release - schedule and features >> >> > So if the link which ib0 maps to is DOWN you move the ib0 IPv4 >> > address > to > >> another device whose link is UP (eg ib1) and you somehow have ib1 > >> > send a > >> gratuitous ARP? >> >> I think there may be a problem in the way IPoIB deals with gratuitous >> > ARPs. Because > >> if a neighbour structure is updated by the networking core, there's no >> > way for IPoIB > >> to know about that and update the associated IB path. >> >> Has anyone actually tried this failover approach? >> >> - R. >> > > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg > > From tziporet at mellanox.co.il Thu Jul 20 05:12:19 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Thu, 20 Jul 2006 15:12:19 +0300 Subject: [openib-general] IPoIB "bonding" solution for OFED 1.1 (was re: [PATCH] ipoib: fix address update handling (was Re: OFED 1.1 release - schedule and features)) In-Reply-To: <44BE1254.8070403@voltaire.com> References: <20060717140349.GA19080@mellanox.co.il> <44BE1254.8070403@voltaire.com> Message-ID: <44BF7323.7010103@mellanox.co.il> Or Gerlitz wrote: > Hi Tziporet, > > Do you have an initial drop of the bonding solution planned for OFED 1.1 > that is ready to see the daylight? if not, when is this expected? > > As i mentioned to you, we are investigating few possible ways to > implement HA for IPoIB and want to examine your approach as well. > > Or. > > > Vlad already answered. We will be happy for any help in this area. 
Tziporet From krkumar2 at in.ibm.com Thu Jul 20 05:28:05 2006 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Thu, 20 Jul 2006 17:58:05 +0530 Subject: [openib-general] [RFC] [PATCH 3/7] ibrdmaverbs hdr files In-Reply-To: Message-ID: To clarify, I am attempting to not duplicate any function, though data structures will get duplicated (with the knowledge that one set will be removed at some stage). The idea is to define all rdma_* functions and make the exported ibv_ functions call these rdma_ exported API's. (eg : ibv_cmd_create_qp(...) { return rdma_cmd_create_qp(); } , etc). The un-exported ibv_ routines can be modified without any such action. This also ensures that if a bug is found in creation of a qp, the fix will go into the "base" function - rdma_cmd_create_qp() and the same fix doesn't have to be applied to ibv_cmd_create_qp. Hope this clarifies my intention. thanks, - KK openib-general-bounces at openib.org wrote on 07/20/2006 02:53:13 PM: > Hi, > > Are you are OK with : > > 1. Changing all non-exported routines in libibverbs to the new API. > All static functions can be changed immediately instead of going > the "deprecation" way. > > 2. Adding new duplicate API's for those API's that are exported to > other libraries. > > Is this the right thing to do and acceptable ? > > Thanks, > > - KK > > > Roland Dreier wrote on 07/10/2006 10:27:03 PM: > > > James> Long term, I don't think it makes sense having two > > James> libraries with the exact same functionality, but different > > James> function names. Replacing ibv_ with rdma_ would be ideal, > > James> but disruptive to current users. Could you provide a > > James> backwards compatibility header (one that mapped each ibv_ > > James> function to its equivalent rdma_ function)? > > > > Agreed. I think that the way forward is to provide a libibverbs > > release that has both rdma_ and ibv_ prefixed identifiers, and also > > provides both and headers, etc. 
> > Then in the next major libibverbs release, the ibv_ versions of > > identifiers can be deprecated, and finally removed in the next major > > release (which could be renamed to librdmaverbs). > > > > Following that route is something that I've been meaning to work on, > > and I'd be happy to accept patches in that direction. But a big bang > > approach that breaks the world is not a good idea I don't think. > > > > - R. > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From bpradip at in.ibm.com Thu Jul 20 06:12:50 2006 From: bpradip at in.ibm.com (Pradipta Kumar Banerjee) Date: Thu, 20 Jul 2006 18:42:50 +0530 Subject: [openib-general] Problem running rping on chelsio T3 adapters In-Reply-To: References: Message-ID: <44BF8152.8040103@in.ibm.com> Sundeep Narravula wrote: > Hi, > > We are using kernel 2.6.16.3 and installed the latest code/drivers from > the iwarp branch. Currently, we are seeing an address resolution problem > while using rping. > > [narravul at d1-as4 examples]$ ./rping -c -VvC100 -a 150.10.11.200 -p 9999 > libibverbs: Warning: couldn't load driver > /usr/local/lib/infiniband/mthca.so: /usr/local/lib/infiniband/mthca.so: > undefined symbol: ibv_cmd_query_pkey > cma event 1, error -110 Hi Sundeep, What is the output of 'ibv_devices' ? (its in libibverbs/examples) Also can you try disabling the mthca device (unload the driver may be !!) and then test rping with chelsio. Thanks, Pradipta Kumar. > > The server hangs and cannot be killed. > > Both the nodes are connected to the switch and I am able to ping the local > ip addresses. Is there anything we are missing? > > Thanks, > --Sundeep. 
> > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From swise at opengridcomputing.com Thu Jul 20 07:59:32 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 20 Jul 2006 09:59:32 -0500 Subject: [openib-general] Problem running rping on chelsio T3 adapters In-Reply-To: References: Message-ID: <1153407572.12133.12.camel@stevo-desktop> The ifconfig stuff looks ok. thanks for sending it. What firmware version are you using? You might need to contact Chelsio and get the correct firmware for the openib drivers. ethtool -i will show us what version you're using. It should say something like this: vic21:~ # ethtool -i eth2 driver: cxgb3 version: 1.0.0 firmware-version: T 2.38 bus-info: 0000:03:01.0 vic21:~ # Also, ethtool -S will show you statistics to see if packets are flowing. Please build the iw_cxgb3 driver with debug on and send me the /var/log/messages snipits after running rping. That'll help me figure out what's up... Steve. On Thu, 2006-07-20 at 03:05 -0400, Sundeep Narravula wrote: > Hi, > > > What does your network configuration look like? IE send ifconfig output > > on both systems. > > Send the output of lsmod on both systems. > > I have attached both these for both the systems. > > > Does krping work? > > No. It fails with the same error. > > > Can you ssh between the 2 systems using the T3 ip addresses? > > No. > > > Do the following libs exist in /usr/local/lib: > > libibverbs.* > > librmdacm.* > > These exist. > > > libcxgb3.* > > This is not in /usr/local/lib. > /usr/local/lib/infiniband/cxgb3.* exist though. > > > Do ldconfig -v and make sure libibverbs is only found in /usr/local/lib. > > If you have OFED or some other rpm package with libibverbs installed, > > you'll have problems. > > Checked this. Only one exists. 
> > > For some reason it looks to be trying to load the MTHCA Infiniband > > library. Do you have IB cards in the system? > > Yes. We do have IB nics on the system. > > --Sundeep. > > > > > Steve. > > > > > > On Wed, 2006-07-19 at 13:28 -0400, Sundeep Narravula wrote: > > > Hi, > > > > > > We are using kernel 2.6.16.3 and installed the latest code/drivers from > > > the iwarp branch. Currently, we are seeing an address resolution problem > > > while using rping. > > > > > > [narravul at d1-as4 examples]$ ./rping -c -VvC100 -a 150.10.11.200 -p 9999 > > > libibverbs: Warning: couldn't load driver > > > /usr/local/lib/infiniband/mthca.so: /usr/local/lib/infiniband/mthca.so: > > > undefined symbol: ibv_cmd_query_pkey > > > cma event 1, error -110 > > > > > > The server hangs and cannot be killed. > > > > > > Both the nodes are connected to the switch and I am able to ping the local > > > ip addresses. Is there anything we are missing? > > > > > > Thanks, > > > --Sundeep. > > > > > > > > > > > > > > > > > > _______________________________________________ > > > openib-general mailing list > > > openib-general at openib.org > > > http://openib.org/mailman/listinfo/openib-general > > > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > > > From swise at opengridcomputing.com Thu Jul 20 08:03:02 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 20 Jul 2006 10:03:02 -0500 Subject: [openib-general] [librdmacm] ABI file in not being created by cma module, but the user level library tries to use it In-Reply-To: <000001c6abc8$e7d431d0$29d8180a@amr.corp.intel.com> References: <000001c6abc8$e7d431d0$29d8180a@amr.corp.intel.com> Message-ID: <1153407782.12133.17.camel@stevo-desktop> On Wed, 2006-07-19 at 23:51 -0700, Sean Hefty wrote: > I executed the example ucmatose and got the following failure: > > > > > > # ucmatose > librdmacm: couldn't read ABI version. > failed to create event channel Make sure rdma_ucm is loaded. 
Without that module, you'll get that error...

# rmmod rdma_ucm
# rping -s -a 0.0.0.0
librdmacm: couldn't open rdma_cm ABI version.
rdma_create_event_channel error 2
#

From arne.redlich at xiranet.com Thu Jul 20 08:38:16 2006
From: arne.redlich at xiranet.com (Arne Redlich)
Date: Thu, 20 Jul 2006 17:38:16 +0200
Subject: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID
In-Reply-To: <20060719170522.GA16573@mellanox.co.il>
References: <1153294363.6972.151.camel@confield.dd.xiranet.com> <20060719170522.GA16573@mellanox.co.il>
Message-ID: <1153409896.6972.192.camel@confield.dd.xiranet.com>

On Wed, 2006-07-19 at 20:05 +0300, Michael S. Tsirkin wrote:
> Quoting r. Arne Redlich :
> > Subject: Re: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID
> >
> > On Tue, 2006-07-18 at 12:21 -0700, Sean Hefty wrote:
> > > Arne Redlich wrote:
> > > > Yep - the Gen1 SRP initiator does. It sends a REQ with an invalid DGID.
> > > > If rejected with the correct code (INVALID GID), it will retry after
> > > > looking up the GID.
> > >
> > > Didn't it have a DGID from a path record that it had to get before sending the REQ?
> >
> > If I'm not completely mistaken, the path lookup is only done after
> > receiving an IB_CM_REJ_INVALID_GID. I have no idea why it is done this
> > way, though.
> >
> > Arne
>
> Er ... how does it send the REQ then?

The path record's {D,S}LID (as well as the SGID) are initialized
correctly in the 1st REQ.

Arne
--
Arne Redlich
Xiranet Communications GmbH

From mst at mellanox.co.il Thu Jul 20 08:44:33 2006
From: mst at mellanox.co.il (Michael S.
Tsirkin)
Date: Thu, 20 Jul 2006 18:44:33 +0300
Subject: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID
In-Reply-To: <1153409896.6972.192.camel@confield.dd.xiranet.com>
References: <1153294363.6972.151.camel@confield.dd.xiranet.com> <20060719170522.GA16573@mellanox.co.il> <1153409896.6972.192.camel@confield.dd.xiranet.com>
Message-ID: <20060720154433.GA27224@mellanox.co.il>

Quoting r. Arne Redlich :
> Subject: Re: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID
>
> On Wed, 2006-07-19 at 20:05 +0300, Michael S. Tsirkin wrote:
> > Quoting r. Arne Redlich :
> > > Subject: Re: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID
> > >
> > > On Tue, 2006-07-18 at 12:21 -0700, Sean Hefty wrote:
> > > > Arne Redlich wrote:
> > > > > Yep - the Gen1 SRP initiator does. It sends a REQ with an invalid DGID.
> > > > > If rejected with the correct code (INVALID GID), it will retry after
> > > > > looking up the GID.
> > > >
> > > > Didn't it have a DGID from a path record that it had to get before sending the REQ?
> > >
> > > If I'm not completely mistaken, the path lookup is only done after
> > > receiving an IB_CM_REJ_INVALID_GID. I have no idea why it is done this
> > > way, though.
> > >
> > > Arne
> >
> > Er ... how does it send the REQ then?
>
> The path record's {D,S}LID (as well as the SGID) are initialized
> correctly in the 1st REQ.

Sorry for being dense ... Why not the GID then?

--
MST

From halr at voltaire.com Thu Jul 20 08:50:16 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Jul 2006 11:50:16 -0400
Subject: [openib-general] [PATCH] OpenSM/complib: Fix memory corruption in cl_pool.c:cl_qcpool_init
Message-ID: <1153410609.4540.50388.camel@hal.voltaire.com>

OpenSM/complib: Fix memory corruption in cl_pool.c:cl_qcpool_init

This affects big endian 64-bit architectures (it's covered up on little
endian 64-bit architectures).
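[Editorial sketch, not part of the posted patch.] The one-line change in this patch may be easier to follow with a standalone illustration. The sketch assumes that component_sizes is an array of size_t, which the fix's use of sizeof(component_sizes[0]) suggests but the hunk itself does not show. The function names are hypothetical and do not exist in complib.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the pre-patch copy: the byte count assumes
 * 4-byte elements even though the array holds size_t values. */
void copy_sizes_buggy(size_t *dst, const size_t *src, uint32_t n)
{
    memcpy(dst, src, sizeof(uint32_t) * n);
}

/* Post-patch form: the byte count follows the element type itself. */
void copy_sizes_fixed(size_t *dst, const size_t *src, uint32_t n)
{
    memcpy(dst, src, sizeof(src[0]) * n);
}

/* Number of trailing bytes the buggy form never writes for n elements.
 * This is zero wherever size_t is no wider than uint32_t. */
size_t bytes_left_uncopied(uint32_t n)
{
    if (sizeof(size_t) <= sizeof(uint32_t))
        return 0;
    return (sizeof(size_t) - sizeof(uint32_t)) * n;
}
```

Under that assumption, a 64-bit build copies only half of the bytes. With a single small size value, the four bytes that do get copied hold the value itself on a little endian machine and only the zero upper half on a big endian one, which would match the description above.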
Signed-off-by: Sasha Khapyorsky
Signed-off-by: Hal Rosenstock

Index: complib/cl_pool.c
===================================================================
--- complib/cl_pool.c (revision 8616)
+++ complib/cl_pool.c (working copy)
@@ -131,7 +131,7 @@ cl_qcpool_init(
 
     /* Copy the user's sizes into our array for future use. */
     memcpy( p_pool->component_sizes, component_sizes,
-        sizeof(uint32_t) * num_components );
+        sizeof(component_sizes[0]) * num_components );
 
     /* Store the number of components per object. */
     p_pool->num_components = num_components;

From sean.hefty at intel.com Thu Jul 20 08:43:38 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 20 Jul 2006 08:43:38 -0700
Subject: [openib-general] [librdmacm] ABI file in not being created by cma module, but the user level library tries to use it
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E3028D8E74@mtlexch01.mtl.com>
Message-ID: <000201c6ac13$45436a80$48d8180a@amr.corp.intel.com>

in the kernel code:
cma.c:
#define CMA_VERSION 0x00

I'm looking for the version in the rdma_ucm module. Do you see an rdma_cm
directory under /sys/class/misc or /sys/class/infiniband?

From mst at mellanox.co.il Thu Jul 20 09:00:22 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 20 Jul 2006 19:00:22 +0300
Subject: [openib-general] connection test 4
Message-ID: <20060720160022.GA17387@mellanox.co.il>

--
MST

From narravul at cse.ohio-state.edu Thu Jul 20 09:39:20 2006
From: narravul at cse.ohio-state.edu (Sundeep Narravula)
Date: Thu, 20 Jul 2006 12:39:20 -0400 (EDT)
Subject: [openib-general] Problem running rping on chelsio T3 adapters
In-Reply-To: <44BF8152.8040103@in.ibm.com>
Message-ID:

Hi,

> Hi Sundeep,
> What is the output of 'ibv_devices' ? (it's in libibverbs/examples)

This is the output of ibv_devices.
[narravul at d1-as4 examples]$ ./ibv_devices libibverbs: Warning: couldn't load driver /usr/local/lib/infiniband/amso.so: /usr/local/lib/infiniband/amso.so: undefined symbol: sysfs_get_classdev_device libibverbs: Warning: couldn't load driver /usr/local/lib/infiniband/mthca.so: /usr/local/lib/infiniband/mthca.so: undefined symbol: ibv_cmd_query_pkey device node GUID ------ ---------------- cxgb3_0 0007430110610000 > > Also can you try disabling the mthca device (unload the driver may be !!) and > then test rping with chelsio. The mthca device is not loaded. Thanks, --Sundeep. > > Thanks, > Pradipta Kumar. > > > > > The server hangs and cannot be killed. > > > > Both the nodes are connected to the switch and I am able to ping the local > > ip addresses. Is there anything we are missing? > > > > Thanks, > > --Sundeep. > > > > > > > > > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > > > From swise at opengridcomputing.com Thu Jul 20 10:17:14 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 20 Jul 2006 12:17:14 -0500 Subject: [openib-general] Problem running rping on chelsio T3 adapters In-Reply-To: References: Message-ID: <1153415834.12133.25.camel@stevo-desktop> On Thu, 2006-07-20 at 12:39 -0400, Sundeep Narravula wrote: > Hi, > > > Hi Sundeep, > > What is the output of 'ibv_devices' ? (its in libibverbs/examples) > > This is the output of ibv_devices. 
> > [narravul at d1-as4 examples]$ ./ibv_devices > libibverbs: Warning: couldn't load driver > /usr/local/lib/infiniband/amso.so: /usr/local/lib/infiniband/amso.so: > undefined symbol: sysfs_get_classdev_device > libibverbs: Warning: couldn't load driver > /usr/local/lib/infiniband/mthca.so: /usr/local/lib/infiniband/mthca.so: > undefined symbol: ibv_cmd_query_pkey All of the above warnings should go away if you build/install libamso and libmthca. But they are just warnings. They shouldn't affect the T3 behavior. > device node GUID > ------ ---------------- > cxgb3_0 0007430110610000 > The above looks correct for the T3 device. Steve. From narravul at cse.ohio-state.edu Thu Jul 20 10:42:27 2006 From: narravul at cse.ohio-state.edu (Sundeep Narravula) Date: Thu, 20 Jul 2006 13:42:27 -0400 (EDT) Subject: [openib-general] Problem running rping on chelsio T3 adapters In-Reply-To: <1153407572.12133.12.camel@stevo-desktop> Message-ID: Hi Steve, > What firmware version are you using? You might need to contact Chelsio > and get the correct firmware for the openib drivers. ethtool -i will > show us what version you're using. It should say something like this: > > vic21:~ # ethtool -i eth2 > driver: cxgb3 > version: 1.0.0 > firmware-version: T 2.38 > bus-info: 0000:03:01.0 > vic21:~ # I will verify this. > > Also, ethtool -S will show you statistics to see if packets are flowing. Looks like no packets are flowing. All the stats are 0. > Please build the iw_cxgb3 driver with debug on and send me > the /var/log/messages snipits after running rping. That'll help me > figure out what's up... I will try this and get back. Thanks, --Sundeep. > > Steve. > > > > On Thu, 2006-07-20 at 03:05 -0400, Sundeep Narravula wrote: > > Hi, > > > > > What does your network configuration look like? IE send ifconfig output > > > on both systems. > > > Send the output of lsmod on both systems. > > > > I have attached both these for both the systems. > > > > > Does krping work? > > > > No. 
It fails with the same error. > > > > > Can you ssh between the 2 systems using the T3 ip addresses? > > > > No. > > > > > Do the following libs exist in /usr/local/lib: > > > libibverbs.* > > > librmdacm.* > > > > These exist. > > > > > libcxgb3.* > > > > This is not in /usr/local/lib. > > /usr/local/lib/infiniband/cxgb3.* exist though. > > > > > Do ldconfig -v and make sure libibverbs is only found in /usr/local/lib. > > > If you have OFED or some other rpm package with libibverbs installed, > > > you'll have problems. > > > > Checked this. Only one exists. > > > > > For some reason it looks to be trying to load the MTHCA Infiniband > > > library. Do you have IB cards in the system? > > > > Yes. We do have IB nics on the system. > > > > --Sundeep. > > > > > > > > Steve. > > > > > > > > > On Wed, 2006-07-19 at 13:28 -0400, Sundeep Narravula wrote: > > > > Hi, > > > > > > > > We are using kernel 2.6.16.3 and installed the latest code/drivers from > > > > the iwarp branch. Currently, we are seeing an address resolution problem > > > > while using rping. > > > > > > > > [narravul at d1-as4 examples]$ ./rping -c -VvC100 -a 150.10.11.200 -p 9999 > > > > libibverbs: Warning: couldn't load driver > > > > /usr/local/lib/infiniband/mthca.so: /usr/local/lib/infiniband/mthca.so: > > > > undefined symbol: ibv_cmd_query_pkey > > > > cma event 1, error -110 > > > > > > > > The server hangs and cannot be killed. > > > > > > > > Both the nodes are connected to the switch and I am able to ping the local > > > > ip addresses. Is there anything we are missing? > > > > > > > > Thanks, > > > > --Sundeep. 
From swise at opengridcomputing.com Thu Jul 20 10:50:46 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 20 Jul 2006 12:50:46 -0500
Subject: [openib-general] libibverbs question
Message-ID: <1153417846.12133.30.camel@stevo-desktop>

If a SEND is received on a QP and it generates a BOUNDS violation because
the recv buffer posted is too small, will this generate an async error?

I've written a program to generate this event, and I get the appropriate
completion status (IBV_WC_LOC_LEN_ERR), but I'm not getting any poll
event on the async_fd for the associated verbs context. Should I? I was
expecting something like IBV_EVENT_QP_ACCESS_ERR to be passed up.

I'm running this over mthca...

Thanks,

Steve.

From eitan at mellanox.co.il Thu Jul 20 11:18:42 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Thu, 20 Jul 2006 21:18:42 +0300
Subject: [openib-general] [PATCH] OpenSM/complib: Fix memory corruption incl_pool.c:cl_qcpool_init
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3028D9030@mtlexch01.mtl.com>

Hi Sasha, Hal,

Good catch!

Eitan

From tuan.phamdo at intel.com Thu Jul 20 11:47:11 2006
From: tuan.phamdo at intel.com (Phamdo, Tuan)
Date: Thu, 20 Jul 2006 11:47:11 -0700
Subject: [openib-general] [Openib-promoters] Draft topics for OFA Developers' workshop
Message-ID:

Kim, looks good. Do we plan to have any end users talking about their
experiences, requirements, etc.? Also, a common comment we've heard from
previous events concerns the stability and quality of the OFA stack. Can we
talk about "testing/validation and release process"? Just a thought.
Tuan Phamdo Server Platform Group 503-712-4168 (Office) 503-799-9552 (Mobile) -----Original Message----- From: openib-promoters-bounces at openib.org [mailto:openib-promoters-bounces at openib.org] On Behalf Of Kim Yates Sent: Wednesday, July 19, 2006 4:58 PM To: openib-promoters at openib.org; openib-general Subject: [Openib-promoters] Draft topics for OFA Developers' workshop Thanks to those who sent suggestions for topics, and to Bill Boas, Matt Leininger, and Helen Chen for helping come up with this draft list. At this point not considering scheduling or presenters, here's the tentative list of topics, in no particular order, for the next OFA Developers' workshop. Please let me know of any suggestions for additions or changes. Plans for IFED Rel. 2.0 Virtualization Inter-subnet routing Intra-subnet routing Quality of Service Congestion control Security MPI RDS iSCSI ISER SRP SDP Storage Diagnostics Interoperability OpenFabrics on Windows User needs would be addressed primarily in the IBTA+OFA joint DevCon. Before the workshop, Bill Boas and Matt Leininger will lead an effort to come up with a shared technical lexicon, to reduce confusion over terms that have widely different meanings to different groups. ________________________________________________________ Kim Yates LLNL Center for Applied Scientific Computing kimyates at llnl.gov (925) 424-3642 _______________________________________________ openib-promoters mailing list openib-promoters at openib.org http://openib.org/mailman/listinfo/openib-promoters From narravul at cse.ohio-state.edu Thu Jul 20 12:05:37 2006 From: narravul at cse.ohio-state.edu (Sundeep Narravula) Date: Thu, 20 Jul 2006 15:05:37 -0400 (EDT) Subject: [openib-general] Problem running rping on chelsio T3 adapters In-Reply-To: Message-ID: Steve, > > Please build the iw_cxgb3 driver with debug on and send me > > the /var/log/messages snipits after running rping. That'll help me > > figure out what's up... 
I have attached the logs on the server side and the client side. Thanks, --Sundeep. > > I will try this and get back. > > Thanks, > --Sundeep. > > > > > Steve. > > > > > > > > On Thu, 2006-07-20 at 03:05 -0400, Sundeep Narravula wrote: > > > Hi, > > > > > > > What does your network configuration look like? IE send ifconfig output > > > > on both systems. > > > > Send the output of lsmod on both systems. > > > > > > I have attached both these for both the systems. > > > > > > > Does krping work? > > > > > > No. It fails with the same error. > > > > > > > Can you ssh between the 2 systems using the T3 ip addresses? > > > > > > No. > > > > > > > Do the following libs exist in /usr/local/lib: > > > > libibverbs.* > > > > librmdacm.* > > > > > > These exist. > > > > > > > libcxgb3.* > > > > > > This is not in /usr/local/lib. > > > /usr/local/lib/infiniband/cxgb3.* exist though. > > > > > > > Do ldconfig -v and make sure libibverbs is only found in /usr/local/lib. > > > > If you have OFED or some other rpm package with libibverbs installed, > > > > you'll have problems. > > > > > > Checked this. Only one exists. > > > > > > > For some reason it looks to be trying to load the MTHCA Infiniband > > > > library. Do you have IB cards in the system? > > > > > > Yes. We do have IB nics on the system. > > > > > > --Sundeep. > > > > > > > > > > > Steve. > > > > > > > > > > > > On Wed, 2006-07-19 at 13:28 -0400, Sundeep Narravula wrote: > > > > > Hi, > > > > > > > > > > We are using kernel 2.6.16.3 and installed the latest code/drivers from > > > > > the iwarp branch. Currently, we are seeing an address resolution problem > > > > > while using rping. 
> > > > > > > > > > [narravul at d1-as4 examples]$ ./rping -c -VvC100 -a 150.10.11.200 -p 9999 > > > > > libibverbs: Warning: couldn't load driver > > > > > /usr/local/lib/infiniband/mthca.so: /usr/local/lib/infiniband/mthca.so: > > > > > undefined symbol: ibv_cmd_query_pkey > > > > > cma event 1, error -110 > > > > > > > > > > The server hangs and cannot be killed. > > > > > > > > > > Both the nodes are connected to the switch and I am able to ping the local > > > > > ip addresses. Is there anything we are missing? > > > > > > > > > > Thanks, > > > > > --Sundeep. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > openib-general mailing list > > > > > openib-general at openib.org > > > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > > > > > > > > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > -------------- next part -------------- iw_ch:open_rnic_toe line 140 cxio: opening rnic dev cxgb3c0 cxio: rnic cxgb3c0 info: tpt_base=0xa000000 tpt_top=0xbffffff pbl_base=0xe000000 pbl_top=0xfffffff rqt_base=0xc000000, rqt_top=0xdffffff cxio: udbell_len=0x8000000 udbell_physbase=0xf0000000 kdb_addr=f8852004 cxio: CtrlQP dma_addr=0x339a0000 kaddr=f39a0000 size=1024 iw_ch:open_rnic_init line 74 iw_ch:iwch_register_device line 983 iw_ch: dev name = cxgb3_%d iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_port:897 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_port:897 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_port:897 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_pkey:840 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_gid:850 
iw_ch:ibdev f682d400, port 1, index 0, gid f3cb4764 iw_ch:dev f682d400 port 1 netdev f7f50000 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_port:897 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_pkey:840 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_gid:850 iw_ch:ibdev f682d400, port 2, index 0, gid f3cb4724 iw_ch:dev f682d400 port 2 netdev f77e4000 iw_ch:iwch_register_device line 1069 iw_ch:iwch_register_device line 1077 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:show_board:961 dev = 0xf682d400 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:show_hca:952 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:show_fw_ver:940 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:show_rev:928 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_device:867 ADDRCONF(NETDEV_UP): eth2: link is not ready eth2: link up, 10Gbps, full-duplex ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready ADDRCONF(NETDEV_UP): eth3: link is not ready eth2: no IPv6 routers present iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_alloc_ucontext:114 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_device:867 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_dealloc_ucontext:104 -------------- next part -------------- iw_ch:open_rnic_toe line 140 cxio: opening rnic dev cxgb3c0 cxio: rnic cxgb3c0 info: tpt_base=0xa000000 tpt_top=0xbffffff pbl_base=0xe000000 pbl_top=0xfffffff rqt_base=0xc000000, rqt_top=0xdffffff cxio: udbell_len=0x8000000 udbell_physbase=0xf0000000 kdb_addr=f8852004 cxio: CtrlQP dma_addr=0x339a0000 kaddr=f39a0000 size=1024 iw_ch:open_rnic_init line 74 iw_ch:iwch_register_device line 983 iw_ch: dev name = cxgb3_%d iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_port:897 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_port:897 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_port:897 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_pkey:840 
iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_gid:850 iw_ch:ibdev c26adc00, port 1, index 0, gid f5e15804 iw_ch:dev c26adc00 port 1 netdev f7f50000 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_port:897 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_pkey:840 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_gid:850 iw_ch:ibdev c26adc00, port 2, index 0, gid f5e157c4 iw_ch:dev c26adc00 port 2 netdev f7011800 iw_ch:iwch_register_device line 1069 iw_ch:iwch_register_device line 1077 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:show_board:961 dev = 0xc26adc00 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:show_hca:952 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:show_fw_ver:940 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:show_rev:928 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_device:867 ADDRCONF(NETDEV_UP): eth3: link is not ready ADDRCONF(NETDEV_UP): eth2: link is not ready eth2: link up, 10Gbps, full-duplex ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready eth2: no IPv6 routers present From narravul at cse.ohio-state.edu Thu Jul 20 12:07:38 2006 From: narravul at cse.ohio-state.edu (Sundeep Narravula) Date: Thu, 20 Jul 2006 15:07:38 -0400 (EDT) Subject: [openib-general] Problem running rping on chelsio T3 adapters In-Reply-To: Message-ID: Sorry attached the wrong server log. Here are the new logs. --Sundeep. On Thu, 20 Jul 2006, Sundeep Narravula wrote: > Steve, > > > > Please build the iw_cxgb3 driver with debug on and send me > > > the /var/log/messages snipits after running rping. That'll help me > > > figure out what's up... > > I have attached the logs on the server side and the client side. > > Thanks, > --Sundeep. > > > > > I will try this and get back. > > > > Thanks, > > --Sundeep. > > > > > > > > Steve. > > > > > > > > > > > > On Thu, 2006-07-20 at 03:05 -0400, Sundeep Narravula wrote: > > > > Hi, > > > > > > > > > What does your network configuration look like? 
IE send ifconfig output > > > > > on both systems. > > > > > Send the output of lsmod on both systems. > > > > > > > > I have attached both these for both the systems. > > > > > > > > > Does krping work? > > > > > > > > No. It fails with the same error. > > > > > > > > > Can you ssh between the 2 systems using the T3 ip addresses? > > > > > > > > No. > > > > > > > > > Do the following libs exist in /usr/local/lib: > > > > > libibverbs.* > > > > > librmdacm.* > > > > > > > > These exist. > > > > > > > > > libcxgb3.* > > > > > > > > This is not in /usr/local/lib. > > > > /usr/local/lib/infiniband/cxgb3.* exist though. > > > > > > > > > Do ldconfig -v and make sure libibverbs is only found in /usr/local/lib. > > > > > If you have OFED or some other rpm package with libibverbs installed, > > > > > you'll have problems. > > > > > > > > Checked this. Only one exists. > > > > > > > > > For some reason it looks to be trying to load the MTHCA Infiniband > > > > > library. Do you have IB cards in the system? > > > > > > > > Yes. We do have IB nics on the system. > > > > > > > > --Sundeep. > > > > > > > > > > > > > > Steve. > > > > > > > > > > > > > > > On Wed, 2006-07-19 at 13:28 -0400, Sundeep Narravula wrote: > > > > > > Hi, > > > > > > > > > > > > We are using kernel 2.6.16.3 and installed the latest code/drivers from > > > > > > the iwarp branch. Currently, we are seeing an address resolution problem > > > > > > while using rping. > > > > > > > > > > > > [narravul at d1-as4 examples]$ ./rping -c -VvC100 -a 150.10.11.200 -p 9999 > > > > > > libibverbs: Warning: couldn't load driver > > > > > > /usr/local/lib/infiniband/mthca.so: /usr/local/lib/infiniband/mthca.so: > > > > > > undefined symbol: ibv_cmd_query_pkey > > > > > > cma event 1, error -110 > > > > > > > > > > > > The server hangs and cannot be killed. > > > > > > > > > > > > Both the nodes are connected to the switch and I am able to ping the local > > > > > > ip addresses. Is there anything we are missing? 
> > > > > > > > > > > > Thanks, > > > > > > --Sundeep. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > openib-general mailing list > > > > > > openib-general at openib.org > > > > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > > > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > > -------------- next part -------------- cxio: opening rnic dev cxgb3c0 cxio: rnic cxgb3c0 info: tpt_base=0xa000000 tpt_top=0xbffffff pbl_base=0xe000000 pbl_top=0xfffffff rqt_base=0xc000000, rqt_top=0xdffffff cxio: udbell_len=0x8000000 udbell_physbase=0xf0000000 kdb_addr=f8852004 cxio: CtrlQP dma_addr=0x339a0000 kaddr=f39a0000 size=1024 iw_ch:open_rnic_init line 74 iw_ch:iwch_register_device line 983 iw_ch: dev name = cxgb3_%d iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_port:897 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_port:897 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_port:897 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_pkey:840 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_gid:850 iw_ch:ibdev c26adc00, port 1, index 0, gid f5e15804 iw_ch:dev c26adc00 port 1 netdev f7f50000 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_port:897 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_pkey:840 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_gid:850 iw_ch:ibdev c26adc00, port 2, index 0, gid f5e157c4 iw_ch:dev c26adc00 port 2 netdev f7011800 iw_ch:iwch_register_device line 1069 iw_ch:iwch_register_device line 1077 
iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:show_board:961 dev = 0xc26adc00 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:show_hca:952 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:show_fw_ver:940 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:show_rev:928 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_device:867 ADDRCONF(NETDEV_UP): eth3: link is not ready ADDRCONF(NETDEV_UP): eth2: link is not ready eth2: link up, 10Gbps, full-duplex ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready eth2: no IPv6 routers present iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_alloc_ucontext:114 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_device:867 iw_ch:iwch_create_listen (drivers/infiniband/hw/cxgb3/iwch_cm.c line 1944) iw_ch:get_port enter (drivers/infiniband/hw/cxgb3/iwch_cm.c line 148) inp port 9999 iw_ch:get_port alloc port 9999 iw_ch:alloc_ep enter (drivers/infiniband/hw/cxgb3/iwch_cm.c line 290) iw_ch:alloc ep f5e10140 iw_ch: state_set - idle -> listen iw_ch:listen_start enter (drivers/infiniband/hw/cxgb3/iwch_cm.c line 1225) -------------- next part -------------- iw_ch:open_rnic_toe line 140 cxio: opening rnic dev cxgb3c0 cxio: rnic cxgb3c0 info: tpt_base=0xa000000 tpt_top=0xbffffff pbl_base=0xe000000 pbl_top=0xfffffff rqt_base=0xc000000, rqt_top=0xdffffff cxio: udbell_len=0x8000000 udbell_physbase=0xf0000000 kdb_addr=f8852004 cxio: CtrlQP dma_addr=0x339a0000 kaddr=f39a0000 size=1024 iw_ch:open_rnic_init line 74 iw_ch:iwch_register_device line 983 iw_ch: dev name = cxgb3_%d iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_port:897 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_port:897 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_port:897 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_pkey:840 iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_gid:850 iw_ch:ibdev f682d400, port 1, index 0, gid f3cb4764 iw_ch:dev f682d400 port 1 netdev f7f50000 
iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_port:897
iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_pkey:840
iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_gid:850
iw_ch:ibdev f682d400, port 2, index 0, gid f3cb4724
iw_ch:dev f682d400 port 2 netdev f77e4000
iw_ch:iwch_register_device line 1069
iw_ch:iwch_register_device line 1077
iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:show_board:961 dev = 0xf682d400
iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:show_hca:952
iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:show_fw_ver:940
iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:show_rev:928
iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_device:867
ADDRCONF(NETDEV_UP): eth2: link is not ready
eth2: link up, 10Gbps, full-duplex
ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
ADDRCONF(NETDEV_UP): eth3: link is not ready
eth2: no IPv6 routers present
iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_alloc_ucontext:114
iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_query_device:867
iw_ch:drivers/infiniband/hw/cxgb3/iwch_provider.c:iwch_dealloc_ucontext:104

From swise at opengridcomputing.com Thu Jul 20 12:32:31 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 20 Jul 2006 14:32:31 -0500
Subject: [openib-general] Problem running rping on chelsio T3 adapters
In-Reply-To:
References:
Message-ID: <1153423951.12133.49.camel@stevo-desktop>

The server log indicates the T3 RNIC got the listen command for port
9999. The client log, however, shows no activity.

So I _think_ there is still some configuration problem and the RDMA CMA
is choosing the wrong rdma interface for the outgoing connection. It's
certainly _not_ choosing the T3 rnic. In fact, cma event 1 is 'ADDR
ERROR', which I think means it couldn't find a local rdma interface to
reach the dst ipaddr -or- it found the local interface, but couldn't
resolve the neighbour ipaddr to a hw address (ARP failed).
Let's re-examine the test:

> > > > > > > We are using kernel 2.6.16.3 and installed the latest code/drivers from
> > > > > > > the iwarp branch. Currently, we are seeing an address resolution problem
> > > > > > > while using rping.
> > > > > > >
> > > > > > > [narravul at d1-as4 examples]$ ./rping -c -VvC100 -a 150.10.11.200 -p 9999
> > > > > > > libibverbs: Warning: couldn't load driver
> > > > > > > /usr/local/lib/infiniband/mthca.so: /usr/local/lib/infiniband/mthca.so:
> > > > > > > undefined symbol: ibv_cmd_query_pkey
> > > > > > > cma event 1, error -110
> >

Given the above 'rping' command, 150.10.11.200 is supposed to be the rping server's rnic address. However, your ifconfig output showed no interface bound to 150.10.11.200.

Is this a pilot error, or did you change the config since you originally sent out the 'rping' line above?

Steve.

From narravul at cse.ohio-state.edu Thu Jul 20 12:48:16 2006
From: narravul at cse.ohio-state.edu (Sundeep Narravula)
Date: Thu, 20 Jul 2006 15:48:16 -0400 (EDT)
Subject: [openib-general] Problem running rping on chelsio T3 adapters
In-Reply-To: <1153423951.12133.49.camel@stevo-desktop>
Message-ID:

> > > > > > > > We are using kernel 2.6.16.3 and installed the latest code/drivers from
> > > > > > > > the iwarp branch. Currently, we are seeing an address resolution problem
> > > > > > > > while using rping.
> > > > > > > >
> > > > > > > > [narravul at d1-as4 examples]$ ./rping -c -VvC100 -a 150.10.11.200 -p 9999
> > > > > > > > libibverbs: Warning: couldn't load driver
> > > > > > > > /usr/local/lib/infiniband/mthca.so: /usr/local/lib/infiniband/mthca.so:
> > > > > > > > undefined symbol: ibv_cmd_query_pkey
> > > > > > > > cma event 1, error -110
> > >
>
> Given the above 'rping' command, 150.10.11.200 is supposed to be the
> rping server's rnic address. However, your ifconfig output showed no
> interface bound to 150.10.11.200.
>
> Is this a pilot error, or did you change the config since you originally
> sent out the 'rping' line above?

Sorry about the confusion. I recently changed the configuration. Here is the latest error.

[narravul at d1-as4 ~]$ rping -c -vV -C10 -S100 -a 150.10.11.100 -p 9999
libibverbs: Warning: couldn't load driver /usr/local/lib/infiniband/mthca.so: /usr/local/lib/infiniband/mthca.so: undefined symbol: ibv_cmd_query_pkey
cma event 1, error -110

--Sundeep.

>
> Steve.
>
>
>

From arne.redlich at xiranet.com Fri Jul 21 00:44:36 2006
From: arne.redlich at xiranet.com (Arne Redlich)
Date: Fri, 21 Jul 2006 09:44:36 +0200
Subject: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID
In-Reply-To: <20060720154433.GA27224@mellanox.co.il>
References: <1153294363.6972.151.camel@confield.dd.xiranet.com>
 <20060719170522.GA16573@mellanox.co.il>
 <1153409896.6972.192.camel@confield.dd.xiranet.com>
 <20060720154433.GA27224@mellanox.co.il>
Message-ID: <1153467876.6972.235.camel@confield.dd.xiranet.com>

On Thursday, 20.07.2006, at 18:44 +0300, Michael S. Tsirkin wrote:
> Quoting r. Arne Redlich :
> > Subject: Re: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID
> >
> > On Wednesday, 19.07.2006, at 20:05 +0300, Michael S. Tsirkin wrote:
> > > Quoting r. Arne Redlich :
> > > > Subject: Re: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID
> > > >
> > > > On Tuesday, 18.07.2006, at 12:21 -0700, Sean Hefty wrote:
> > > > > Arne Redlich wrote:
> > > > > > Yep - the Gen1 SRP initiator does. It sends a REQ with an invalid DGID.
> > > > > > If rejected with the correct code (INVALID GID), it will retry after
> > > > > > looking up the GID.
> > > > >
> > > > > Didn't it have a DGID from a path record that it had to get before sending the REQ?
> > > >
> > > > If I'm not completely mistaken, the path lookup is only done after
> > > > receiving an IB_CM_REJ_INVALID_GID. I have no idea why it is done this
> > > > way, though.
> > > >
> > > > Arne
> > >
> > > Er ... how does it send the REQ then?
> >
> > The path record's {D,S}LID (as well as the SGID) are initialized
> > correctly in the 1st REQ.
>
> Sorry for being dense ... Why not the GID then?

I only took a rather superficial look at the code. But since the invalid GID REJ is treated with such care, I guess it's intentional and not a bug? Maybe (one of) the original author(s) is around and can shed some light on this?

Arne
-- 
Arne Redlich
Xiranet Communications GmbH

From svenar at simula.no Fri Jul 21 04:12:20 2006
From: svenar at simula.no (Sven-Arne Reinemo)
Date: Fri, 21 Jul 2006 13:12:20 +0200
Subject: [openib-general] A few questions about IBMgtSim
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E302368923@mtlexch01.mtl.com>
References: <6AB138A2AB8C8E4A98B9C0C3D52670E302368923@mtlexch01.mtl.com>
Message-ID: <44C0B694.9070809@simula.no>

Anno Domini 30-06-2006 17:04, Eitan Zahavi wrote:
> Hi Sven,
>
> Currently there is no way to scale simulation time to real time.
> The main reason is that the time scale is mixed:
> * OpenSM calculation time is about the same (if you run the simulator on
> remote node)

So this means that the internal operation of OpenSM with the simulator is identical to its operation with real hardware?

I have done some performance tests with IBMgtSim and OpenSM running on separate machines, and to me it looks like there is very little concurrency between the two processes, i.e. it looks like they spend a lot of time waiting for each other. Below are some results from a few simulation runs; the observed CPU utilization seems quite low. I would have expected much higher CPU load for IBMgtSim... Any thoughts on how this matches your experience?
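One way to read the %cpu column in the results that follow is as (kernel + user) / elapsed, truncated to a whole percent. The sketch below only checks that reading against the posted numbers; the struct and function names are made up for illustration and do not come from IBMgtSim or OpenSM.

```c
#include <stdio.h>

/* One row of the timing results: wall-clock, kernel and user time in seconds. */
struct run_times {
    int elapsed;
    int kernel;
    int user;
};

/* CPU utilisation as a whole percentage, truncated toward zero:
 * the share of wall-clock time the process actually spent on a CPU. */
int cpu_percent(struct run_times r)
{
    return (r.kernel + r.user) * 100 / r.elapsed;
}

/* Seconds in which the process was neither in kernel nor in user mode,
 * i.e. time spent waiting (for the peer process, the network, and so on). */
int idle_seconds(struct run_times r)
{
    return r.elapsed - (r.kernel + r.user);
}
```

For the smallest OpenSM run (585 s elapsed, 109 s kernel, 99 s user) this leaves 377 s of waiting, which is the low concurrency described above expressed in seconds.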

OpenSM
#hosts²  #sw  #ports  elapsed¹  kernel¹  user¹  %cpu  mem
    288   36      24       585      109     99    35  410
    512   48      32       766      144    136    36  520
   1152   72      48      1161      218    211    36  741

IBMgtSim
#hosts²  #sw  #ports  elapsed¹  kernel¹  user¹  %cpu  mem
    288   36      24       586       87    221    52   92
    512   48      32       767      109    278    50  102
   1152   72      48      1161      169    432    51  132

¹time in seconds
²organized in a 3 stage Clos

Best regards,
Sven-Arne

-- 
SAR
---- GnuPG public key - http://home.ifi.uio.no/~svenar/gpg.asc ----
"There are only 10 kinds of people in this world; those who know binary and those who don't." -- Unknown

From swise at opengridcomputing.com Fri Jul 21 08:55:32 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 21 Jul 2006 10:55:32 -0500
Subject: [openib-general] Problem running rping on chelsio T3 adapters
In-Reply-To:
References:
Message-ID: <1153497332.27338.11.camel@stevo-desktop>

On Thu, 2006-07-20 at 15:48 -0400, Sundeep Narravula wrote:
> > > > > > > > > We are using kernel 2.6.16.3 and installed the latest code/drivers from
> > > > > > > > > the iwarp branch. Currently, we are seeing an address resolution problem
> > > > > > > > > while using rping.
> > > > > > > > >
> > > > > > > > > [narravul at d1-as4 examples]$ ./rping -c -VvC100 -a 150.10.11.200 -p 9999
> > > > > > > > > libibverbs: Warning: couldn't load driver
> > > > > > > > > /usr/local/lib/infiniband/mthca.so: /usr/local/lib/infiniband/mthca.so:
> > > > > > > > > undefined symbol: ibv_cmd_query_pkey
> > > > > > > > > cma event 1, error -110
> > > >

Why don't you go ahead and build/install libmthca -or- remove the IB card and see if that helps. This is still some basic configuration/installation problem here...

Steve.

From mshefty at ichips.intel.com Fri Jul 21 09:42:29 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 21 Jul 2006 09:42:29 -0700
Subject: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID
In-Reply-To: <1153467876.6972.235.camel@confield.dd.xiranet.com>
References: <1153294363.6972.151.camel@confield.dd.xiranet.com>
 <20060719170522.GA16573@mellanox.co.il>
 <1153409896.6972.192.camel@confield.dd.xiranet.com>
 <20060720154433.GA27224@mellanox.co.il>
 <1153467876.6972.235.camel@confield.dd.xiranet.com>
Message-ID: <44C103F5.9020906@ichips.intel.com>

Arne Redlich wrote:
> I only took a rather superficial look at the code. But since the invalid
> GID REJ is treated with such care, I guess it's intentional and not a
> bug?

I would lean more towards it being a bug. The data in the CM REQ is wrong. (There's a difference between rejecting a request to another GID, versus the GID carried in the request being bogus.)

In any case, an argument can be made that this change fixes a compliance issue in the IB CM. If the change corrects the problem for you, I will commit it.

- Sean

From Don.Dhondt at Bull.com Fri Jul 21 10:04:16 2006
From: Don.Dhondt at Bull.com (Don.Dhondt at Bull.com)
Date: Fri, 21 Jul 2006 10:04:16 -0700
Subject: [openib-general] mvapich2 pmi scalability problems
Message-ID:

We have been working with LLNL trying to debug a problem using slurm as our resource manager, mvapich2 as our MPI choice and OFED 1.0 as our infiniband stack. The mvapich2 version is mvapich2-0.9.3. The problem arises when we try to scale a simple mpi job. We cannot go much above 128 tasks before we start timing out socket connections on the PMI exchanges. Can anyone at OSU comment?

Processes  PMI_KVS_Put  PMI_KVS_Get  PMI_KVS_Commit  Num Procs ratio  Calls ratio
n32               1024         1248            1024                1            1
n64               4096         4544            4096                2            4
n96               9216         9888            9216                3            8
n128             16384        17280           16384                4           16

Comment from LLNL:
------------------
That is interesting!
The ratio for MPICH2 is constant, so clearly MVAPICH2 is doing something unusual (and unexpected, to me anyway).

What will MVAPICH2 do with really large parallel jobs? We regularly run jobs with thousands to tens of thousands of tasks. If you have located an MVAPICH2 expert, this would definitely be worth asking about. Its use of PMI appears to be non-scalable.
----------------------

Any help is appreciated.

Regards,
Don Dhondt
GCOS 8 Communications Solutions Project Manager
Bull HN Information Systems Inc.
13430 N. Black Canyon Hwy., Phoenix, AZ 85029
Work (602) 862-5245 Fax (602) 862-4290

From koop at cse.ohio-state.edu Fri Jul 21 11:51:31 2006
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Fri, 21 Jul 2006 14:51:31 -0400 (EDT)
Subject: [openib-general] mvapich2 pmi scalability problems
In-Reply-To:
Message-ID:

Don,

Are you using the USE_MPD_RING flag when compiling? If not, can you give that a try? It should very significantly decrease the number of PMI calls that are made.

Thanks,

Matthew Koop

On Fri, 21 Jul 2006 Don.Dhondt at Bull.com wrote:

> We have been working with LLNL trying to debug a problem using slurm as
> our resource manager,
> mvapich2 as our MPI choice and OFED 1.0 as our infiniband stack. The
> mvapich2 version is mvapich2-0.9.3.
> The problem arises when we try to scale a simple mpi job. We cannot go
> much above 128 tasks
> before we start timing out socket connections on the PMI exchanges.
> Can anyone at OSU comment?
>
> Processes  PMI_KVS_Put  PMI_KVS_Get  PMI_KVS_Commit  Num Procs ratio  Calls ratio
> n32               1024         1248            1024                1            1
> n64               4096         4544            4096                2            4
> n96               9216         9888            9216                3            8
> n128             16384        17280           16384                4           16
>
> Comment from LLNL:
> ------------------
> That is interesting! The ratio for MPICH2 is constant, so clearly
> MVAPICH2 is doing something unusual (and unexpected, to me anyway).
>
> What will MVAPICH2 do with really large parallel jobs?
> We regularly
> run jobs with thousands to tens of thousands of tasks. If you have
> located an MVAPICH2 expert, this would definitely be worth asking
> about. Its use of PMI appears to be non-scalable.
> ----------------------
>
> Any help is appreciated.
>
> Regards,
> Don Dhondt
> GCOS 8 Communications Solutions Project Manager
> Bull HN Information Systems Inc.
> 13430 N. Black Canyon Hwy., Phoenix, AZ 85029
> Work (602) 862-5245 Fax (602) 862-4290

From rdreier at cisco.com Fri Jul 21 12:44:27 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 21 Jul 2006 12:44:27 -0700
Subject: [openib-general] libibverbs question
In-Reply-To: <1153417846.12133.30.camel@stevo-desktop> (Steve Wise's message of "Thu, 20 Jul 2006 12:50:46 -0500")
References: <1153417846.12133.30.camel@stevo-desktop>
Message-ID:

 Steve> If a SEND is received on a QP and it generates a BOUNDS
 Steve> violation because the recv buffer posted is too small, will
 Steve> this generate an async error?

No, I don't believe so. The error can be returned synchronously as a completion with status of "local length error."

(I was actually too lazy to look at the IB spec, but you should be able to confirm that by looking at the section of chapter 11 on completion status)

From Don.Dhondt at Bull.com Fri Jul 21 13:19:57 2006
From: Don.Dhondt at Bull.com (Don.Dhondt at Bull.com)
Date: Fri, 21 Jul 2006 13:19:57 -0700
Subject: [openib-general] mvapich2 pmi scalability problems
In-Reply-To:
Message-ID:

Matthew,

We build mvapich2 using the make.mvapich2.gen2 script. Within that script file is the following:

    # Whether to use an optimized queue pair exchange scheme. This is not
    # checked for a setting in the script. It must be set here explicitly.
    # Supported: "-DUSE_MPD_RING" and "" (to disable)
    if [ $ARCH = "_PPC64_" ]; then
        HAVE_MPD_RING=""
    else
        HAVE_MPD_RING="-DUSE_MPD_RING"
    fi

Since we are compiling for ia64 our assumption is it compiled with HAVE_MPD_RING="-DUSE_MPD_RING". Is this correct?
Also, we are not using mpd to start the jobs. Since we are using slurm as the resource manager the jobs are started with srun. Does MPD_RING only apply if using MPD?

-Don

Matthew Koop
07/21/2006 11:51 AM
To: Don.Dhondt at Bull.com
cc: openib-general at openib.org
Subject: Re: [openib-general] mvapich2 pmi scalability problems

Don,

Are you using the USE_MPD_RING flag when compiling? If not, can you give that a try? It should very significantly decrease the number of PMI calls that are made.

Thanks,

Matthew Koop

From sean.hefty at intel.com Fri Jul 21 14:37:40 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 21 Jul 2006 14:37:40 -0700
Subject: [openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM
Message-ID: <000001c6ad0d$e483a290$e598070a@amr.corp.intel.com>

The following set of patches forwards communication related events to the IB CM for processing. Communication events of interest are communication established and path migration, with only the former currently handled by the IB CM.

This removes the need for users to trap for these events and pass the information onto IB CM. Communication established events can be handled by the ib_cm_establish() routine, but no mechanism exists to notify the IB CM of path migration. This adds the framework for doing so.

Signed-off-by: Sean Hefty

From sean.hefty at intel.com Fri Jul 21 14:45:12 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 21 Jul 2006 14:45:12 -0700
Subject: [openib-general] [PATCH 1/4] Add cm_event_handler to struct ib_device
In-Reply-To: <000001c6ad0d$e483a290$e598070a@amr.corp.intel.com>
Message-ID: <000101c6ad0e$f1a927a0$e598070a@amr.corp.intel.com>

Provide a mechanism to dispatch CM events to a registered client of each RDMA device. To avoid reporting QP events to all registered clients, a new CM event handler was added to each ib_device.
Signed-off-by: Sean Hefty --- Can one of the iWarp vendors see if this event_handling is generic enough for their use as well? Index: include/rdma/ib_verbs.h =================================================================== --- include/rdma/ib_verbs.h (revision 8626) +++ include/rdma/ib_verbs.h (working copy) @@ -874,6 +874,7 @@ struct ib_device { u32 flags; + void (*cm_handler)(struct ib_event *); int (*query_device)(struct ib_device *device, struct ib_device_attr *device_attr); int (*query_port)(struct ib_device *device, @@ -1069,6 +1070,11 @@ int ib_register_event_handler (struct i int ib_unregister_event_handler(struct ib_event_handler *event_handler); void ib_dispatch_event(struct ib_event *event); +int ib_register_cm_handler(struct ib_device *device, + void (*cm_handler)(struct ib_event *)); +void ib_unregister_cm_handler(struct ib_device *device); +void ib_dispatch_cm_event(struct ib_event *event); + int ib_query_device(struct ib_device *device, struct ib_device_attr *device_attr); Index: core/device.c =================================================================== --- core/device.c (revision 8626) +++ core/device.c (working copy) @@ -478,6 +478,67 @@ void ib_dispatch_event(struct ib_event * EXPORT_SYMBOL(ib_dispatch_event); /** + * ib_register_cm_handler - Register a CM event handler + * @device:Device to register with + * @cm_handler:Handler to register + * + * ib_register_cm_handler() registers an event handler that will be + * called back when asynchronous communication events occurs. + * This callback may occur in interrupt context. 
+ */ +int ib_register_cm_handler(struct ib_device *device, + void (*cm_handler)(struct ib_event *)) +{ + unsigned long flags; + int ret = 0; + + spin_lock_irqsave(&device->event_handler_lock, flags); + if (!device->cm_handler) + device->cm_handler = cm_handler; + else + ret = -EBUSY; + spin_unlock_irqrestore(&device->event_handler_lock, flags); + + return ret; +} +EXPORT_SYMBOL(ib_register_cm_handler); + +/** + * ib_unregister_cm_handler - Unregister a CM event handler + * @cm_handler:Handler to unregister + * + * Unregister a CM event handler registered with ib_register_cm_handler(). + */ +void ib_unregister_cm_handler(struct ib_device *device) +{ + unsigned long flags; + + spin_lock_irqsave(&device->event_handler_lock, flags); + device->cm_handler = NULL; + spin_unlock_irqrestore(&device->event_handler_lock, flags); +} +EXPORT_SYMBOL(ib_unregister_cm_handler); + +/** + * ib_dispatch_cm_event - Dispatch an asynchronous CM event + * @event:Event to dispatch + * + * Low-level drivers must call ib_dispatch_cm_event() to dispatch the + * event to any registered CM event handler when an asynchronous + * communication event occurs. 
+ */ +void ib_dispatch_cm_event(struct ib_event *event) +{ + unsigned long flags; + + spin_lock_irqsave(&event->device->event_handler_lock, flags); + if (event->device->cm_handler) + event->device->cm_handler(event); + spin_unlock_irqrestore(&event->device->event_handler_lock, flags); +} +EXPORT_SYMBOL(ib_dispatch_cm_event); + +/** * ib_query_device - Query IB device attributes * @device:Device to query * @device_attr:Device attributes From sean.hefty at intel.com Fri Jul 21 14:50:50 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 21 Jul 2006 14:50:50 -0700 Subject: [openib-general] [PATCH 2/4] mthca: forward QP COMM_EST and PATH_MIG events to cm_handler In-Reply-To: <000101c6ad0e$f1a927a0$e598070a@amr.corp.intel.com> Message-ID: <000201c6ad0f$bb63ea80$e598070a@amr.corp.intel.com> Forwards communication established and path migration events on a QP for dispatching to the CM event handler. Signed-off-by: Sean Hefty --- I couldn't locate where asynchronous QP events were handled in the ipath driver. If a lower level driver does not dispatch these events, then connections over that device must wait until an RTU is received during connection establishment. 
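The forwarding described here can be pictured with a small user-space mock. This is a sketch under assumptions, not the kernel code: the mock_* types and functions are invented stand-ins for ib_device, ib_event and mthca_qp_event(), the two counting handlers exist only for demonstration, and the locking that the real handlers take is left out.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's event types. */
enum mock_event_type {
    MOCK_EVENT_COMM_EST,
    MOCK_EVENT_PATH_MIG,
    MOCK_EVENT_QP_FATAL
};

struct mock_event {
    enum mock_event_type type;
    int qp_num;
};

/* One CM-level handler slot per device, as in patch 1/4. */
struct mock_device {
    void (*cm_handler)(struct mock_event *);
};

struct mock_qp {
    struct mock_device *dev;
    int qp_num;
    void (*event_handler)(struct mock_event *); /* per-QP consumer, may be NULL */
};

/* Only one CM handler may be registered; a second attempt gets -EBUSY. */
int mock_register_cm_handler(struct mock_device *dev,
                             void (*handler)(struct mock_event *))
{
    if (dev->cm_handler)
        return -EBUSY;
    dev->cm_handler = handler;
    return 0;
}

/* Driver-side event path: COMM_EST and PATH_MIG go to the CM handler first,
 * then every event is still delivered to the QP's own handler. */
void mock_qp_event(struct mock_qp *qp, enum mock_event_type type)
{
    struct mock_event ev = { type, qp->qp_num };

    if ((type == MOCK_EVENT_COMM_EST || type == MOCK_EVENT_PATH_MIG) &&
        qp->dev->cm_handler)
        qp->dev->cm_handler(&ev);

    if (qp->event_handler)
        qp->event_handler(&ev);
}

/* Demonstration handlers that only count how often they were called. */
static int cm_events_seen;
static int qp_events_seen;

static void count_cm_event(struct mock_event *ev) { (void)ev; cm_events_seen++; }
static void count_qp_event(struct mock_event *ev) { (void)ev; qp_events_seen++; }
```

With both counting handlers installed, a COMM_EST event reaches both of them, while a fatal QP event reaches only the per-QP handler. A driver that never calls the dispatch step leaves the CM handler silent, which is the ipath situation noted above.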
Index: hw/mthca/mthca_qp.c =================================================================== --- hw/mthca/mthca_qp.c (revision 8626) +++ hw/mthca/mthca_qp.c (working copy) @@ -246,12 +246,16 @@ void mthca_qp_event(struct mthca_dev *de return; } - if (event_type == IB_EVENT_PATH_MIG) - qp->port = qp->alt_port; - event.device = &dev->ib_dev; event.event = event_type; event.element.qp = &qp->ibqp; + + if (event_type == IB_EVENT_PATH_MIG) { + qp->port = qp->alt_port; + ib_dispatch_cm_event(&event); + } else if (event_type == IB_EVENT_COMM_EST) + ib_dispatch_cm_event(&event); + if (qp->ibqp.event_handler) qp->ibqp.event_handler(&event, qp->ibqp.qp_context); From sean.hefty at intel.com Fri Jul 21 14:54:11 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 21 Jul 2006 14:54:11 -0700 Subject: [openib-general] [PATCH 3/4] IB CM: register and handle COMM_EST events on a QP In-Reply-To: <000201c6ad0f$bb63ea80$e598070a@amr.corp.intel.com> Message-ID: <000301c6ad10$32d946a0$e598070a@amr.corp.intel.com> Register the IB CM to receive and process communication established events on a QP. This requires tracking connections using local QPNs as part of timewait handling. Signed-off-by: Sean Hefty --- Index: core/cm.c =================================================================== --- core/cm.c (revision 8629) +++ core/cm.c (working copy) @@ -69,6 +69,7 @@ static struct ib_cm { struct rb_root listen_service_table; u64 listen_service_id; /* struct rb_root peer_service_table; todo: fix peer to peer */ + struct rb_root local_qp_table; struct rb_root remote_qp_table; struct rb_root remote_id_table; struct rb_root remote_sidr_table; @@ -110,10 +111,14 @@ struct cm_work { struct cm_timewait_info { struct cm_work work; /* Must be first. 
*/ + struct rb_node local_qp_node; struct rb_node remote_qp_node; struct rb_node remote_id_node; + __be64 local_ca_guid; __be64 remote_ca_guid; + __be32 local_qpn; __be32 remote_qpn; + u8 inserted_local_qp; u8 inserted_remote_qp; u8 inserted_remote_id; }; @@ -508,6 +513,59 @@ static struct cm_timewait_info * cm_find return NULL; } +static struct cm_timewait_info * cm_insert_local_qpn(struct cm_timewait_info + *timewait_info) +{ + struct rb_node **link = &cm.local_qp_table.rb_node; + struct rb_node *parent = NULL; + struct cm_timewait_info *cur_timewait_info; + __be64 local_ca_guid = timewait_info->local_ca_guid; + __be32 local_qpn = timewait_info->local_qpn; + + while (*link) { + parent = *link; + cur_timewait_info = rb_entry(parent, struct cm_timewait_info, + local_qp_node); + if (local_qpn < cur_timewait_info->local_qpn) + link = &(*link)->rb_left; + else if (local_qpn > cur_timewait_info->local_qpn) + link = &(*link)->rb_right; + else if (local_ca_guid < cur_timewait_info->local_ca_guid) + link = &(*link)->rb_left; + else if (local_ca_guid > cur_timewait_info->local_ca_guid) + link = &(*link)->rb_right; + else + return cur_timewait_info; + } + timewait_info->inserted_local_qp = 1; + rb_link_node(&timewait_info->local_qp_node, parent, link); + rb_insert_color(&timewait_info->local_qp_node, &cm.local_qp_table); + return NULL; +} + +static struct cm_timewait_info * cm_find_local_qpn(__be64 local_ca_guid, + __be32 local_qpn) +{ + struct rb_node *node = cm.local_qp_table.rb_node; + struct cm_timewait_info *timewait_info; + + while (node) { + timewait_info = rb_entry(node, struct cm_timewait_info, + local_qp_node); + if (local_qpn < timewait_info->local_qpn) + node = node->rb_left; + else if (local_qpn > timewait_info->local_qpn) + node = node->rb_right; + else if (local_ca_guid < timewait_info->local_ca_guid) + node = node->rb_left; + else if (local_ca_guid > timewait_info->local_ca_guid) + node = node->rb_right; + else + return timewait_info; + } + return NULL; +} + 
static struct cm_timewait_info * cm_insert_remote_qpn(struct cm_timewait_info *timewait_info) { @@ -640,11 +698,22 @@ static inline int cm_convert_to_ms(int i return 1 << max(iba_time - 8, 0); } +static void cm_cleanup_local_qpn(struct cm_timewait_info *timewait_info) +{ + unsigned long flags; + + spin_lock_irqsave(&cm.lock, flags); + rb_erase(&timewait_info->local_qp_node, &cm.local_qp_table); + timewait_info->inserted_local_qp = 0; + spin_unlock_irqrestore(&cm.lock, flags); +} + static void cm_cleanup_timewait(struct cm_timewait_info *timewait_info) { unsigned long flags; if (!timewait_info->inserted_remote_id && + !timewait_info->inserted_local_qp && !timewait_info->inserted_remote_qp) return; @@ -654,6 +723,11 @@ static void cm_cleanup_timewait(struct c timewait_info->inserted_remote_id = 0; } + if (timewait_info->inserted_local_qp) { + rb_erase(&timewait_info->local_qp_node, &cm.local_qp_table); + timewait_info->inserted_local_qp = 0; + } + if (timewait_info->inserted_remote_qp) { rb_erase(&timewait_info->remote_qp_node, &cm.remote_qp_table); timewait_info->inserted_remote_qp = 0; @@ -1015,20 +1089,33 @@ int ib_send_cm_req(struct ib_cm_id *cm_i cm_id_priv->local_ack_timeout = cm_req_get_primary_local_ack_timeout(req_msg); + cm_id_priv->timewait_info->local_ca_guid = req_msg->local_ca_guid; + cm_id_priv->timewait_info->local_qpn = cm_id_priv->local_qpn; + spin_lock_irqsave(&cm.lock, flags); + if (cm_insert_local_qpn(cm_id_priv->timewait_info)) + ret = -EADDRINUSE; + spin_unlock_irqrestore(&cm.lock, flags); + if (ret) + goto error2; + spin_lock_irqsave(&cm_id_priv->lock, flags); ret = ib_post_send_mad(cm_id_priv->msg, NULL); if (ret) { spin_unlock_irqrestore(&cm_id_priv->lock, flags); - goto error2; + goto error3; } BUG_ON(cm_id->state != IB_CM_IDLE); cm_id->state = IB_CM_REQ_SENT; spin_unlock_irqrestore(&cm_id_priv->lock, flags); return 0; -error2: cm_free_msg(cm_id_priv->msg); -error1: kfree(cm_id_priv->timewait_info); -out: return ret; +error3: + 
cm_cleanup_local_qpn(cm_id_priv->timewait_info); +error2: + cm_free_msg(cm_id_priv->msg); +error1: + kfree(cm_id_priv->timewait_info); + return ret; } EXPORT_SYMBOL(ib_send_cm_req); @@ -1444,7 +1531,7 @@ int ib_send_cm_rep(struct ib_cm_id *cm_i struct ib_cm_rep_param *param) { struct cm_id_private *cm_id_priv; - struct ib_mad_send_buf *msg; + struct ib_mad_send_buf *msg = NULL; struct cm_rep_msg *rep_msg; unsigned long flags; int ret; @@ -1458,12 +1545,20 @@ int ib_send_cm_rep(struct ib_cm_id *cm_i if (cm_id->state != IB_CM_REQ_RCVD && cm_id->state != IB_CM_MRA_REQ_SENT) { ret = -EINVAL; - goto out; + goto error; } + cm_id_priv->timewait_info->local_ca_guid = cm_id->device->node_guid; + cm_id_priv->timewait_info->local_qpn = cpu_to_be32(param->qp_num); + spin_lock(&cm.lock); + ret = cm_insert_local_qpn(cm_id_priv->timewait_info) ? -EADDRINUSE : 0; + spin_unlock(&cm.lock); + if (ret) + goto error; + ret = cm_alloc_msg(cm_id_priv, &msg); if (ret) - goto out; + goto error; rep_msg = (struct cm_rep_msg *) msg->mad; cm_format_rep(rep_msg, cm_id_priv, param); @@ -1471,11 +1566,8 @@ int ib_send_cm_rep(struct ib_cm_id *cm_i msg->context[1] = (void *) (unsigned long) IB_CM_REP_SENT; ret = ib_post_send_mad(msg, NULL); - if (ret) { - spin_unlock_irqrestore(&cm_id_priv->lock, flags); - cm_free_msg(msg); - return ret; - } + if (ret) + goto error; cm_id->state = IB_CM_REP_SENT; cm_id_priv->msg = msg; @@ -1483,8 +1575,15 @@ int ib_send_cm_rep(struct ib_cm_id *cm_i cm_id_priv->responder_resources = param->responder_resources; cm_id_priv->rq_psn = cm_rep_get_starting_psn(rep_msg); cm_id_priv->local_qpn = cm_rep_get_local_qpn(rep_msg); + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + return 0; -out: spin_unlock_irqrestore(&cm_id_priv->lock, flags); +error: + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + if (cm_id_priv->timewait_info->inserted_local_qp) + cm_cleanup_local_qpn(cm_id_priv->timewait_info); + if (msg) + cm_free_msg(msg); return ret; } 
EXPORT_SYMBOL(ib_send_cm_rep); @@ -2603,6 +2702,7 @@ static int cm_timewait_handler(struct cm spin_lock_irqsave(&cm_id_priv->lock, flags); if (cm_id_priv->id.state != IB_CM_TIMEWAIT || + cm_id_priv->local_qpn != timewait_info->local_qpn || cm_id_priv->remote_qpn != timewait_info->remote_qpn) { spin_unlock_irqrestore(&cm_id_priv->lock, flags); goto out; @@ -3113,6 +3213,28 @@ static void cm_recv_handler(struct ib_ma queue_work(cm.wq, &work->work); } +static void cm_event_handler(struct ib_event *event) +{ + struct cm_id_private *cm_id_priv = NULL; + struct cm_timewait_info *timewait_info; + unsigned long flags; + + if (event->event == IB_EVENT_COMM_EST) { + spin_lock_irqsave(&cm.lock, flags); + timewait_info = cm_find_local_qpn(event->device->node_guid, + cpu_to_be32(event->element. + qp->qp_num)); + if (timewait_info) + cm_id_priv = cm_get_id(timewait_info->work.local_id, + timewait_info->work.remote_id); + spin_unlock_irqrestore(&cm.lock, flags); + if (cm_id_priv) { + ib_cm_establish(&cm_id_priv->id); + cm_deref_id(cm_id_priv); + } + } +} + static int cm_init_qp_init_attr(struct cm_id_private *cm_id_priv, struct ib_qp_attr *qp_attr, int *qp_attr_mask) @@ -3308,6 +3430,7 @@ static void cm_add_one(struct ib_device cm_dev->device = device; cm_dev->ca_guid = device->node_guid; + ib_register_cm_handler(device, cm_event_handler); set_bit(IB_MGMT_METHOD_SEND, reg_req.method_mask); for (i = 1; i <= device->phys_port_cnt; i++) { @@ -3345,6 +3468,7 @@ error1: ib_modify_port(device, port->port_num, 0, &port_modify); ib_unregister_mad_agent(port->mad_agent); } + ib_unregister_cm_handler(device); kfree(cm_dev); } @@ -3371,6 +3495,7 @@ static void cm_remove_one(struct ib_devi ib_modify_port(device, port->port_num, 0, &port_modify); ib_unregister_mad_agent(port->mad_agent); } + ib_unregister_cm_handler(device); kfree(cm_dev); } @@ -3385,6 +3510,7 @@ static int __init ib_cm_init(void) cm.listen_service_table = RB_ROOT; cm.listen_service_id = 
__constant_be64_to_cpu(IB_CM_ASSIGN_SERVICE_ID); cm.remote_id_table = RB_ROOT; + cm.local_qp_table = RB_ROOT; cm.remote_qp_table = RB_ROOT; cm.remote_sidr_table = RB_ROOT; idr_init(&cm.local_id_table); From sean.hefty at intel.com Fri Jul 21 14:55:59 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 21 Jul 2006 14:55:59 -0700 Subject: [openib-general] [PATCH 4/4] RDMA CM: add handling for IB_CM_USER_ESTABLISHED event In-Reply-To: <000301c6ad10$32d946a0$e598070a@amr.corp.intel.com> Message-ID: <000401c6ad10$736cccf0$e598070a@amr.corp.intel.com> Signed-off-by: Sean Hefty --- Index: core/cma.c =================================================================== --- core/cma.c (revision 8626) +++ core/cma.c (working copy) @@ -886,6 +886,7 @@ static int cma_ib_handler(struct ib_cm_i private_data_len = IB_CM_REP_PRIVATE_DATA_SIZE; break; case IB_CM_RTU_RECEIVED: + case IB_CM_USER_ESTABLISHED: status = cma_rtu_recv(id_priv); event = status ? RDMA_CM_EVENT_CONNECT_ERROR : RDMA_CM_EVENT_ESTABLISHED; From sean.hefty at intel.com Fri Jul 21 15:04:16 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 21 Jul 2006 15:04:16 -0700 Subject: [openib-general] svn checkin to merge upstream Message-ID: <000501c6ad11$9bf546b0$e598070a@amr.corp.intel.com> Roland, I committed a bug fix in svn rev 8629 to the IB CM that I'd like pushed upstream for 2.6.18. It was possible for a function to return 0 (success) if an allocation occurred. Thanks, - Sean From koop at cse.ohio-state.edu Fri Jul 21 15:26:27 2006 From: koop at cse.ohio-state.edu (Matthew Koop) Date: Fri, 21 Jul 2006 18:26:27 -0400 (EDT) Subject: [openib-general] mvapich2 pmi scalability problems In-Reply-To: Message-ID: > Since we are compiling for ia64 our assumption is it compiled with > HAVE_MPD_RING="-DUSE_MPD_RING". Is this correct? > Also, we are not using mpd to run start the jobs. Since we are > using slurm as the resource manager the jobs are started with > srun. Does MPD_RING on apply if using MDP? 
It should be using the USE_MPD_RING flag in that case. Just to make sure, can you just verify by seeing the compile flags used by `grep`ing mpicc: grep MPI_CFLAGS mpicc it should print out -DUSE_MPD_RING. Even though you are using SLURM this option should still work since MPD is not actually used -- it is making use of the PMI interface only. The numbers of messages you posted earlier would be consistent with what would be expected due to IB QP information that is exchanged at startup. The preferred setup, which should show superior scalability is using -DUSE_MPD_RING. Even in the all PMI case we should be able to add some additional optimizations, but the ring startup is really what should be used. Matt From Don.Dhondt at Bull.com Fri Jul 21 16:03:57 2006 From: Don.Dhondt at Bull.com (Don.Dhondt at Bull.com) Date: Fri, 21 Jul 2006 16:03:57 -0700 Subject: [openib-general] mvapich2 pmi scalability problems In-Reply-To: Message-ID: $ grep MPI_CFLAGS ./opt/mpi/mvapich2/bin/mpicc # MPI_CFLAGS - Any special flags needed to compile MPI_CFLAGS="-D_IA64_ -DUSE_INLINE -DRDMA_FAST_PATH -D_SMP_ -DUSE_HEADER_CACHING -DLAZY_MEM_UNREGISTER -DONE_SIDED -D_MLX_PCI_EX_DDR_ -DMPID_USE_SEQUENCE_NUMBERS -D_MEDIUM_CLUSTER -DUSE_MPD_RING -I/usr/local/ofed/include -O2" CFLAGS="$CFLAGS $MPI_CFLAGS" Matthew Koop 07/21/2006 03:26 PM To Don.Dhondt at Bull.com cc openib-general at openib.org Subject Re: [openib-general] mvapich2 pmi scalability problems > Since we are compiling for ia64 our assumption is it compiled with > HAVE_MPD_RING="-DUSE_MPD_RING". Is this correct? > Also, we are not using mpd to run start the jobs. Since we are > using slurm as the resource manager the jobs are started with > srun. Does MPD_RING on apply if using MDP? It should be using the USE_MPD_RING flag in that case. Just to make sure, can you just verify by seeing the compile flags used by `grep`ing mpicc: grep MPI_CFLAGS mpicc it should print out -DUSE_MPD_RING. 
Even though you are using SLURM this option should still work since MPD is not actually used -- it is making use of the PMI interface only. The numbers of messages you posted earlier would be consistent with what would be expected due to IB QP information that is exchanged at startup. The preferred setup, which should show superior scalability is using -DUSE_MPD_RING. Even in the all PMI case we should be able to add some additional optimizations, but the ring startup is really what should be used. Matt From tom at opengridcomputing.com Fri Jul 21 16:47:13 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Fri, 21 Jul 2006 18:47:13 -0500 Subject: [openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM In-Reply-To: <000001c6ad0d$e483a290$e598070a@amr.corp.intel.com> References: <000001c6ad0d$e483a290$e598070a@amr.corp.intel.com> Message-ID: <1153525633.12547.26.camel@trinity.ogc.int> Sean: I apologize in advance if the following generates "who is this newbie" groans from the IB crowd... I sometimes see the COMM_EST event on my (NFS) QP event handler. I assumed that this was occurring due to a race condition between the CM event handler thread (processing the ESTABLISHED event that moves the QP --> RTS), and my I/O tasklets and callbacks. Since my I/Os are handled on interrupt/tasklet context, the response to the request (posting a WR to the SQ) was happening before the QP was moved to RTS in the CM thread. This didn't actually cause a problem since the IB driver (mthca) seemed to simply queue up, but not transmit the SQ WR until the QP-->RTS. All that said... I have a few questions: - Does any of the above make sense? - Do _all_ IB devices (not just mthca) queue WR posted on the SQ when the QP is in RTR or can they generate an error if I don't wait until the QP --> RTS to reply to a request.
- Given a _very_ brief review of the patch, it looks like I don't need to change what I'm doing now (nothing). True? - Will path migration now be handled automagically in the IB CM? Thanks, Tom On Fri, 2006-07-21 at 14:37 -0700, Sean Hefty wrote: > The following set of patches forwards communication related events to the IB CM > for processing. Communication events of interest are communication established > and path migration, with only the former currently handled by the IB CM. > > This removes the need for users to trap for these events and pass the > information onto IB CM. Communication established events can be handled by the > ib_cm_establish() routine, but no mechanism exists to notify the IB CM of path > migration. This adds the framework for doing so. > > Signed-off-by: Sean Hefty From sean.hefty at intel.com Fri Jul 21 17:07:02 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 21 Jul 2006 17:07:02 -0700 Subject: [openib-general] [PATCH 0/4] Dispatch communication relatedevents to the IB CM In-Reply-To: <1153525633.12547.26.camel@trinity.ogc.int> Message-ID: <000001c6ad22$c26c5250$f6c8180a@amr.corp.intel.com> >I sometimes see the COMM_EST event on my (NFS) QP event handler. I >assumed that this was occurring due to a race condition between the CM >event handler thread (processing the ESTABLISHED event that moves the QP >--> RTS), and my I/O tasklets and callbacks. Since my I/Os are handled >on interrupt/tasklet context, the response to the request (posting a WR >to the SQ) was happening before the QP was moved to RTS in the CM >thread. This didn't actually cause a problem since the IB driver (mthca) >seemed to simply queue up, but not transmit the SQ WR until the >QP-->RTS. > >All that said...
I have a few questions: > >- Does any of the above make sense? In the general sense, this is correct. The COMM_EST event is generated when a receive completion occurs on a QP that is in the RTR state - as opposed to RTS state. Typically, a QP is transitioned from RTR to RTS when an RTU message is processed by the CM. Since the client side of the connection can send data immediately after sending an RTU, there's the potential for the event. The user data sent over the connection simply has to be received at the QP before the RTU can be processed. Also note that the RTU is sent over an unreliable QP, so could be dropped. >- Do _all_ IB devices (not just mthca) queue WR posted on the SQ when >the QP is in RTR or can they generate an error if I don't wait until the >QP --> RTS to reply to a request. All IB devices should allow posting of WR to the SQ while in the RTR state. >- Given a _very_ brief review of the patch, it looks like I don't need >to to change what I'm doing now (nothing). True? Correct. The patch fixes what I would consider to be a small window: when user data is received on a QP, but the RTU is continually lost. >- Will path migration now be handled automagically in the IB CM? Not yet. A user can load an alternate path in the IB CM, but there's still no mechanism in place for the IB CM to switch from using an old path to a new one when sending CM messages. The QP itself will fail-over if an alternate path is loaded, but the CM messages will continue to use the old path. This patch simply extends the framework that is there, but additional coding is needed. - Sean From eitan at mellanox.co.il Sat Jul 22 11:25:38 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Sat, 22 Jul 2006 21:25:38 +0300 Subject: [openib-general] A few questions about IBMgtSim Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3028D909B@mtlexch01.mtl.com> Hi Sven, > > > > Currently there is no way to scale simulation time to real time. 
> > The main reason is that the time scale is mixed:
> > * OpenSM calculation time is about the same (if you run the simulator
> > on remote node)
>
> So this means that the internal operation of OpenSM with the simulator is
> identical to its operation with real hardware?

[EZ] Yes, if the algorithmic stage is only computational (like the routing stage) the time it takes is the same as on real hardware. But setting up the entire fabric involves sending and receiving MADs and thus does not scale.

>
> I have done some performance test with IBMgtSim and OpenSM running on
> separate machines and to me it looks like there is very little concurrency
> between the two processes. I.e. it looks like they spend a lot of time waiting
> for each other. Below are some results from a few simulation runs, the
> observed CPU utilization seems quite low. I would have expected much higher
> CPU load for IBMgtSim... Any thoughts on how this matches your experience?

[EZ] Yes - there is not much concurrency. Actually it really depends on the number of MADs you allow on the wire. Also, one of the major limitations I ran into (which made me split the processes across 2 machines) was memory availability for the 10K nodes case. I do not see what the motivation is for doing these comparisons. BTW: do you plan to run the OpenSM tests over the simulator?

>
> OpenSM
> #hosts²  #sw  #ports  elapsed¹  kernel¹  user¹  %cpu  mem
>     288   36      24       585      109     99    35  410
>     512   48      32       766      144    136    36  520
>    1152   72      48      1161      218    211    36  741
>
> IBMgtSim
> #hosts²  #sw  #ports  elapsed¹  kernel¹  user¹  %cpu  mem
>     288   36      24       586       87    221    52   92
>     512   48      32       767      109    278    50  102
>    1152   72      48      1161      169    432    51  132
>
> ¹time in seconds
> ²organized in a 3 stage Clos
>
> Best regards,
> Sven-Arne
>
> --
> SAR
> ---- GnuPG public key - http://home.ifi.uio.no/~svenar/gpg.asc ---- "There are
> only 10 kinds of people in this world; those who know
> binary and those who don't."
> -- Unknown From ogerlitz at voltaire.com Sun Jul 23 01:05:26 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 23 Jul 2006 11:05:26 +0300 Subject: [openib-general] [PATCH 3/4] IB CM: register and handle COMM_EST events on a QP In-Reply-To: <000301c6ad10$32d946a0$e598070a@amr.corp.intel.com> References: <000301c6ad10$32d946a0$e598070a@amr.corp.intel.com> Message-ID: <44C32DC6.2080405@voltaire.com> Sean Hefty wrote: > Register the IB CM to receive and process communication established > events on a QP. This requires tracking connections using local QPNs > as part of timewait handling. > > Signed-off-by: Sean Hefty > --- > Index: core/cm.c > =================================================================== > --- core/cm.c (revision 8629) > +++ core/cm.c (working copy) > @@ -1015,20 +1089,33 @@ int ib_send_cm_req(struct ib_cm_id *cm_i > cm_id_priv->local_ack_timeout = > cm_req_get_primary_local_ack_timeout(req_msg); > > + cm_id_priv->timewait_info->local_ca_guid = req_msg->local_ca_guid; > + cm_id_priv->timewait_info->local_qpn = cm_id_priv->local_qpn; > + spin_lock_irqsave(&cm.lock, flags); > + if (cm_insert_local_qpn(cm_id_priv->timewait_info)) > + ret = -EADDRINUSE; > + spin_unlock_irqrestore(&cm.lock, flags); why bother doing it here? can't the race happen only on the passive side of the connection? Just to make sure, following reading the code/patch i understand the case where the RTU does appear following the event is handled, what would happen in this case is that an EINVAL status would be returned by cm_rtu_handler to cm_work_handler and that's all. Or. 
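The case Or describes above, where the RTU shows up only after the COMM_EST event has already moved the connection to established, can be modelled with a tiny state machine. This is a minimal sketch in plain C and not the kernel code: the enum, the struct and both function names are hypothetical, and the -EINVAL result for the late RTU follows Or's reading of cm_rtu_handler rather than anything verified here.

```c
#include <errno.h>

/* Hypothetical, simplified connection states (not the kernel's enum). */
enum conn_state {
	CONN_REP_SENT,     /* passive side sent REP, waiting for RTU */
	CONN_ESTABLISHED   /* connection is up */
};

struct conn {
	enum conn_state state;
};

/* Models a COMM_EST event: data arrived on the QP before the RTU did,
 * so the connection is forced to established. */
static int on_comm_est(struct conn *c)
{
	if (c->state != CONN_REP_SENT)
		return -EINVAL;
	c->state = CONN_ESTABLISHED;
	return 0;
}

/* Models RTU arrival: only meaningful while still waiting for it.
 * A late RTU finds the connection already established and is rejected. */
static int on_rtu(struct conn *c)
{
	if (c->state != CONN_REP_SENT)
		return -EINVAL;
	c->state = CONN_ESTABLISHED;
	return 0;
}
```

Starting from CONN_REP_SENT, on_comm_est() returns 0 and a following on_rtu() returns -EINVAL, which the caller can simply drop; in the normal order on_rtu() alone returns 0.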
From ogerlitz at voltaire.com Sun Jul 23 01:29:26 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 23 Jul 2006 11:29:26 +0300 Subject: [openib-general] [PATCH 0/4] Dispatch communication relatedevents to the IB CM In-Reply-To: <000001c6ad22$c26c5250$f6c8180a@amr.corp.intel.com> References: <000001c6ad22$c26c5250$f6c8180a@amr.corp.intel.com> Message-ID: <44C33366.2060400@voltaire.com> Sean Hefty wrote: >> - Do _all_ IB devices (not just mthca) queue WR posted on the SQ when >> the QP is in RTR or can they generate an error if I don't wait until the >> QP --> RTS to reply to a request. > > All IB devices should allow posting of WR to the SQ while in the RTR state. Sean, Is this statement based on the IB spec, the mthca driver or both? From the mthca code it seems there is a placeholder for checking and doing some sort of special handling for this case, but this is not implemented yet. mthca/mthca_qp.c:1507: /* XXX check that state is OK to post send */ mthca/mthca_qp.c:1732: /* XXX check that state is OK to post receive */ mthca/mthca_qp.c:1848: /* XXX check that state is OK to post send */ mthca/mthca_qp.c:2101: /* XXX check that state is OK to post receive */ I think special handling (e.g. queuing of TX WRs if the QP is still in RTR) is overdoing it, but let's first see whether it is indeed not implemented and what others think. Or.
From ogerlitz at voltaire.com Sun Jul 23 01:48:20 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 23 Jul 2006 11:48:20 +0300 Subject: [openib-general] [PATCH 3/4] IB CM: register and handle COMM_EST events on a QP In-Reply-To: <000301c6ad10$32d946a0$e598070a@amr.corp.intel.com> References: <000301c6ad10$32d946a0$e598070a@amr.corp.intel.com> Message-ID: <44C337D4.7060807@voltaire.com> Sean Hefty wrote: > --- core/cm.c (revision 8629) > +++ core/cm.c (working copy) > +static void cm_event_handler(struct ib_event *event) > +{ > + struct cm_id_private *cm_id_priv = NULL; > + struct cm_timewait_info *timewait_info; > + unsigned long flags; > + > + if (event->event == IB_EVENT_COMM_EST) { > + spin_lock_irqsave(&cm.lock, flags); > + timewait_info = cm_find_local_qpn(event->device->node_guid, > + cpu_to_be32(event->element. > + qp->qp_num)); > + if (timewait_info) > + cm_id_priv = cm_get_id(timewait_info->work.local_id, > + timewait_info->work.remote_id); > + spin_unlock_irqrestore(&cm.lock, flags); > + if (cm_id_priv) { > + ib_cm_establish(&cm_id_priv->id); > + cm_deref_id(cm_id_priv); > + } > + } > +} > + +/** + * ib_cm_establish - Forces a connection state to established. + * @cm_id: Connection identifier to transition to established. + * + * This routine should be invoked by users who receive messages on a + * connected QP before an RTU has been received. + */ +int ib_cm_establish(struct ib_cm_id *cm_id); Sean, As with your patch ib_cm_establish is now called internally from within the CM with the suggested solution good for both user and kernel space connections/qps, how about removing it from the API? Or. Or. 
From eitan at mellanox.co.il Sun Jul 23 01:53:56 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: 23 Jul 2006 11:53:56 +0300 Subject: [openib-general] [PATCH] osm: bug on SM generated Notice Reports Message-ID: <86mzb0em6z.fsf@mtl066.yok.mtl.com> Hi Hal When formatting an SM generated report the ProducerType was using CL_NTOH32, which cannot be used to format a 24-bit network order number. The proposed fix also avoids the duplication of conversion to and from network order introduced by the previous use of ib_notice_set_prod_type that requires a network ordered argument. The fix introduces ib_notice_set_prod_type_ho which is similar but requires a host order value. For completeness I also introduced the ib_notice_set_vend_id_ho. Eitan Signed-off-by: Eitan Zahavi Index: include/iba/ib_types.h =================================================================== --- include/iba/ib_types.h (revision 8636) +++ include/iba/ib_types.h (working copy) @@ -6838,6 +6838,40 @@ ib_notice_set_prod_type( * ib_mad_notice_attr_t *********/ +/****f* IBA Base: Types/ib_notice_set_prod_type_ho +* NAME +* ib_notice_set_prod_type_ho +* +* DESCRIPTION +* Set the notice Producer Type of Generic Notice given Host Order +* +* SYNOPSIS +*/ +static inline void +ib_notice_set_prod_type_ho( + IN ib_mad_notice_attr_t *p_ntc, + IN uint32_t prod_type_val_ho) +{ + p_ntc->g_or_v.generic.prod_type_lsb = + cl_hton16( (uint16_t)(prod_type_val_ho & 0x0000ffff) ); + p_ntc->g_or_v.generic.prod_type_msb = + (uint8_t)( (prod_type_val_ho & 0x00ff0000) >> 16); +} +/* +* PARAMETERS +* p_ntc +* [in] Pointer to the notice MAD attribute +* +* prod_type +* [in] The producer Type code in host order +* +* RETURN VALUES +* None +* +* SEE ALSO +* ib_mad_notice_attr_t +*********/ + /****f* IBA Base: Types/ib_notice_get_vend_id * NAME * ib_notice_get_vend_id @@ -6901,6 +6935,40 @@ ib_notice_set_vend_id( * ib_mad_notice_attr_t *********/ +/****f* IBA Base: Types/ib_notice_set_vend_id_ho +* NAME +*
ib_notice_set_vend_id_ho +* +* DESCRIPTION +* Set the notice Producer Type of Generic Notice given a host order value +* +* SYNOPSIS +*/ +static inline void +ib_notice_set_vend_id_ho( + IN ib_mad_notice_attr_t *p_ntc, + IN ib_net32_t vend_id_ho) +{ + p_ntc->g_or_v.vend.vend_id_lsb = + cl_hton16((uint16_t)(vend_id_ho & 0x0000ffff)); + p_ntc->g_or_v.vend.vend_id_msb = + (uint8_t)((vend_id_ho & 0x00ff0000) >> 16); +} +/* +* PARAMETERS +* p_ntc +* [in] Pointer to the notice MAD attribute +* +* vend_id_ho +* [in] The producer Type code in host order +* +* RETURN VALUES +* None +* +* SEE ALSO +* ib_mad_notice_attr_t +*********/ + #include typedef struct _ib_inform_info { Index: osmtest/osmt_inform.c =================================================================== --- osmtest/osmt_inform.c (revision 8636) +++ osmtest/osmt_inform.c (working copy) @@ -462,7 +462,7 @@ osmt_send_trap_wait_for_forward( IN osmt /* prepare the notice */ p_ntc->generic_type = 0x82;/* generic, type = 2 */ - ib_notice_set_prod_type(p_ntc, 1); + ib_notice_set_prod_type_ho(p_ntc, 1); p_ntc->g_or_v.generic.trap_num = cl_hton16(0x26); p_ntc->issuer_lid = cl_hton16(2); Index: opensm/osm_state_mgr.c =================================================================== --- opensm/osm_state_mgr.c (revision 8636) +++ opensm/osm_state_mgr.c (working copy) @@ -1709,7 +1709,7 @@ __osm_state_mgr_report_new_ports( /* details of the notice */ notice.generic_type = 0x83; /* is generic subn mgt type */ - ib_notice_set_prod_type( &notice, CL_HTON32( 4 ) ); /* A Class Manager generator */ + ib_notice_set_prod_type_ho( &notice, 4 ); /* A Class Manager generator */ /* endport becomes to be reachable */ notice.g_or_v.generic.trap_num = CL_HTON16( 64 ); /* The sm_base_lid is saved in network order already.
*/ Index: opensm/osm_multicast.c =================================================================== --- opensm/osm_multicast.c (revision 8636) +++ opensm/osm_multicast.c (working copy) @@ -329,7 +329,7 @@ osm_mgrp_send_delete_notice( /* details of the notice */ notice.generic_type = 0x83; /* is generic subn mgt type */ - ib_notice_set_prod_type(&notice, CL_NTOH32(4)); /* A Class Manager generator */ + ib_notice_set_prod_type_ho(&notice, 4); /* A Class Manager generator */ notice.g_or_v.generic.trap_num = CL_HTON16(67); /* delete of mcg */ /* The sm_base_lid is saved in network order already. */ notice.issuer_lid = p_subn->sm_base_lid; @@ -375,7 +375,7 @@ osm_mgrp_send_create_notice( /* details of the notice */ notice.generic_type = 0x83; /* Generic SubnMgt type */ - ib_notice_set_prod_type(&notice, CL_HTON32(4)); /* A Class Manager generator */ + ib_notice_set_prod_type_ho(&notice, 4); /* A Class Manager generator */ notice.g_or_v.generic.trap_num = CL_HTON16(66); /* create of mcg */ /* The sm_base_lid is saved in network order already. */ notice.issuer_lid = p_subn->sm_base_lid; Index: opensm/osm_drop_mgr.c =================================================================== --- opensm/osm_drop_mgr.c (revision 8636) +++ opensm/osm_drop_mgr.c (working copy) @@ -315,7 +315,7 @@ __osm_drop_mgr_remove_port( /* details of the notice */ notice.generic_type = 0x83; /* is generic subn mgt type */ - ib_notice_set_prod_type(&notice, CL_HTON32(4)); /* A class manager generator */ + ib_notice_set_prod_type_ho(&notice, 4); /* A class manager generator */ /* endport ceases to be reachable */ notice.g_or_v.generic.trap_num = CL_HTON16(65); /* The sm_base_lid is saved in network order already.
*/ From eitan at mellanox.co.il Sun Jul 23 02:01:27 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: 23 Jul 2006 12:01:27 +0300 Subject: [openib-general] [PATCH] osm: bug on SM generated Notice Reports Message-ID: <86lkqkelug.fsf@mtl066.yok.mtl.com> Hi Hal When formatting an SM generated report the ProducerType was using CL_NTOH32, which cannot be used to format a 24-bit network order number. The proposed fix also avoids the duplication of conversion to and from network order introduced by the previous use of ib_notice_set_prod_type that requires a network ordered argument. The fix introduces ib_notice_set_prod_type_ho which is similar but requires a host order value. For completeness I also introduced the ib_notice_set_vend_id_ho. Eitan Signed-off-by: Eitan Zahavi Index: include/iba/ib_types.h =================================================================== --- include/iba/ib_types.h (revision 8636) +++ include/iba/ib_types.h (working copy) @@ -6838,6 +6838,40 @@ ib_notice_set_prod_type( * ib_mad_notice_attr_t *********/ +/****f* IBA Base: Types/ib_notice_set_prod_type_ho +* NAME +* ib_notice_set_prod_type_ho +* +* DESCRIPTION +* Set the notice Producer Type of Generic Notice given Host Order +* +* SYNOPSIS +*/ +static inline void +ib_notice_set_prod_type_ho( + IN ib_mad_notice_attr_t *p_ntc, + IN uint32_t prod_type_val_ho) +{ + p_ntc->g_or_v.generic.prod_type_lsb = + cl_hton16( (uint16_t)(prod_type_val_ho & 0x0000ffff) ); + p_ntc->g_or_v.generic.prod_type_msb = + (uint8_t)( (prod_type_val_ho & 0x00ff0000) >> 16); +} +/* +* PARAMETERS +* p_ntc +* [in] Pointer to the notice MAD attribute +* +* prod_type +* [in] The producer Type code in host order +* +* RETURN VALUES +* None +* +* SEE ALSO +* ib_mad_notice_attr_t +*********/ + /****f* IBA Base: Types/ib_notice_get_vend_id * NAME * ib_notice_get_vend_id @@ -6901,6 +6935,40 @@ ib_notice_set_vend_id( * ib_mad_notice_attr_t *********/ +/****f* IBA Base: Types/ib_notice_set_vend_id_ho +* NAME +*
ib_notice_set_vend_id_ho +* +* DESCRIPTION +* Set the notice Producer Type of Generic Notice given a host order value +* +* SYNOPSIS +*/ +static inline void +ib_notice_set_vend_id_ho( + IN ib_mad_notice_attr_t *p_ntc, + IN uint32_t vend_id_ho) +{ + p_ntc->g_or_v.vend.vend_id_lsb = + cl_hton16((uint16_t)(vend_id_ho & 0x0000ffff)); + p_ntc->g_or_v.vend.vend_id_msb = + (uint8_t)((vend_id_ho & 0x00ff0000) >> 16); +} +/* +* PARAMETERS +* p_ntc +* [in] Pointer to the notice MAD attribute +* +* vend_id_ho +* [in] The producer Type code in host order +* +* RETURN VALUES +* None +* +* SEE ALSO +* ib_mad_notice_attr_t +*********/ + #include typedef struct _ib_inform_info { Index: osmtest/osmt_inform.c =================================================================== --- osmtest/osmt_inform.c (revision 8636) +++ osmtest/osmt_inform.c (working copy) @@ -462,7 +462,7 @@ osmt_send_trap_wait_for_forward( IN osmt /* prepare the notice */ p_ntc->generic_type = 0x82;/* generic, type = 2 */ - ib_notice_set_prod_type(p_ntc, 1); + ib_notice_set_prod_type_ho(p_ntc, 1); p_ntc->g_or_v.generic.trap_num = cl_hton16(0x26); p_ntc->issuer_lid = cl_hton16(2); Index: opensm/osm_state_mgr.c =================================================================== --- opensm/osm_state_mgr.c (revision 8636) +++ opensm/osm_state_mgr.c (working copy) @@ -1709,7 +1709,7 @@ __osm_state_mgr_report_new_ports( /* details of the notice */ notice.generic_type = 0x83; /* is generic subn mgt type */ - ib_notice_set_prod_type( &notice, CL_HTON32( 4 ) ); /* A Class Manager generator */ + ib_notice_set_prod_type_ho( &notice, 4 ); /* A Class Manager generator */ /* endport becomes to be reachable */ notice.g_or_v.generic.trap_num = CL_HTON16( 64 ); /* The sm_base_lid is saved in network order already.
*/ Index: opensm/osm_multicast.c =================================================================== --- opensm/osm_multicast.c (revision 8636) +++ opensm/osm_multicast.c (working copy) @@ -329,7 +329,7 @@ osm_mgrp_send_delete_notice( /* details of the notice */ notice.generic_type = 0x83; /* is generic subn mgt type */ - ib_notice_set_prod_type(&notice, CL_NTOH32(4)); /* A Class Manager generator */ + ib_notice_set_prod_type_ho(&notice, 4); /* A Class Manager generator */ notice.g_or_v.generic.trap_num = CL_HTON16(67); /* delete of mcg */ /* The sm_base_lid is saved in network order already. */ notice.issuer_lid = p_subn->sm_base_lid; @@ -375,7 +375,7 @@ osm_mgrp_send_create_notice( /* details of the notice */ notice.generic_type = 0x83; /* Generic SubnMgt type */ - ib_notice_set_prod_type(&notice, CL_HTON32(4)); /* A Class Manager generator */ + ib_notice_set_prod_type_ho(&notice, 4); /* A Class Manager generator */ notice.g_or_v.generic.trap_num = CL_HTON16(66); /* create of mcg */ /* The sm_base_lid is saved in network order already. */ notice.issuer_lid = p_subn->sm_base_lid; Index: opensm/osm_drop_mgr.c =================================================================== --- opensm/osm_drop_mgr.c (revision 8636) +++ opensm/osm_drop_mgr.c (working copy) @@ -315,7 +315,7 @@ __osm_drop_mgr_remove_port( /* details of the notice */ notice.generic_type = 0x83; /* is generic subn mgt type */ - ib_notice_set_prod_type(&notice, CL_HTON32(4)); /* A class manager generator */ + ib_notice_set_prod_type_ho(&notice, 4); /* A class manager generator */ /* endport ceases to be reachable */ notice.g_or_v.generic.trap_num = CL_HTON16(65); /* The sm_base_lid is saved in network order already.
*/ From jackm at mellanox.co.il Sun Jul 23 02:21:35 2006 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Sun, 23 Jul 2006 12:21:35 +0300 Subject: [openib-general] [librdmacm] ABI file in not being created by cma module, but the user level library tries to use it In-Reply-To: <000201c6ac13$45436a80$48d8180a@amr.corp.intel.com> References: <000201c6ac13$45436a80$48d8180a@amr.corp.intel.com> Message-ID: <200607231221.35857.jackm@mellanox.co.il> On Thursday 20 July 2006 18:43, Sean Hefty wrote: > in the kernel code: cma.c: > > #define CMA_VERSION 0x00 > > > I'm looking for the version in the rdma_ucm module. Do you see an rdma_cm > directory under /sys/class/misc or /sys/class/infiniband? Hi, Problem is with kernel 2.6.9, in which "struct miscdevice" did not yet contain the "class" field, so the following code could not be executed in file core/ucma.c, procedure ucma_init() : ret = class_device_create_file(ucma_misc.class, &class_device_attr_abi_version); Thus, the kernel module could not create the abi version file for the userspace code in cma.c to read. (In fact, the backport needed to delete these lines entirely). SVN 7079 corrected this in file "librdmacm/src/cma.c" so that if the ABI was not available via "/sys/class/misc/rdma_cm/abi_version", we would assume that the ABI version was the latest version. (If all was not OK with this assumption in a specific execution instance, there would be downstream failures which would be caught). Unfortunately, SVN 7497 broke this fix. The fix needs to be re-instated. From mst at mellanox.co.il Sun Jul 23 02:29:33 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 23 Jul 2006 12:29:33 +0300 Subject: [openib-general] svn checkin to merge upstream In-Reply-To: <000501c6ad11$9bf546b0$e598070a@amr.corp.intel.com> References: <000501c6ad11$9bf546b0$e598070a@amr.corp.intel.com> Message-ID: <20060723092933.GA26317@mellanox.co.il> Quoting r. 
Sean Hefty : > Subject: svn checkin to merge upstream > > Roland, > > I committed a bug fix in svn rev 8629 to the IB CM that I'd like pushed upstream > for 2.6.18. It was possible for a function to return 0 (success) if an > allocation occurred. > > Thanks, > - Sean This one (below)? Seems to do more than it says, and I now get a warning about the "out:" label being unused. Is it just me? Further, wouldn't if (IS_ERR()) { ret = PTR_ERR(cm_id_priv->timewait_info); goto out; } be more elegant, and a smaller patch? Finally, Sean, could you post patches inline on the list please? It is confusing if the version in SVN differs from was last posted. ---- ------------------------------------------------------------------------ r8629 | sean.hefty | 2006-07-22 00:19:38 +0300 (Sat, 22 Jul 2006) | 5 lines Return error code, rather than success (0), on failure allocating timewait_info in ib_send_cm_req. Signed-off-by: Sean Hefty ------------------------------------------------------------------------ Index: xxx/drivers/infiniband/core/cm.c =================================================================== --- xxx/drivers/infiniband/core/cm.c (revision 8628) +++ xxx/drivers/infiniband/core/cm.c (revision 8629) @@ -969,15 +969,14 @@ int ib_send_cm_req(struct ib_cm_id *cm_i spin_lock_irqsave(&cm_id_priv->lock, flags); if (cm_id->state != IB_CM_IDLE) { spin_unlock_irqrestore(&cm_id_priv->lock, flags); - ret = -EINVAL; - goto out; + return -EINVAL; } spin_unlock_irqrestore(&cm_id_priv->lock, flags); cm_id_priv->timewait_info = cm_create_timewait_info(cm_id_priv-> id.local_id); if (IS_ERR(cm_id_priv->timewait_info)) - goto out; + return PTR_ERR(cm_id_priv->timewait_info); ret = cm_init_av_by_path(param->primary_path, &cm_id_priv->av); if (ret) -- MST From mst at mellanox.co.il Sun Jul 23 02:42:50 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Sun, 23 Jul 2006 12:42:50 +0300 Subject: [openib-general] [PATCH 0/4] Dispatch communication relatedevents to the IB CM In-Reply-To: <44C33366.2060400@voltaire.com> References: <44C33366.2060400@voltaire.com> Message-ID: <20060723094250.GB26317@mellanox.co.il> Quoting r. Or Gerlitz : > Subject: Re: [PATCH 0/4] Dispatch communication relatedevents to the IB CM > > Sean Hefty wrote: > >> - Do _all_ IB devices (not just mthca) queue WR posted on the SQ when > >> the QP is in RTR or can they generate an error if I don't wait until the > >> QP --> RTS to reply to a request. > > > > All IB devices should allow posting of WR to the SQ while in the RTR state. > > Sean, > > Is this statement based on the IB spec, the mthca driver or both? > > From the mthca code it seems there is a place holder for checking and > doing some sort of special handing for this case, but this is not > implemented yet. > > mthca/mthca_qp.c:1507: /* XXX check that state is OK to post send */ > mthca/mthca_qp.c:1732: /* XXX check that state is OK to post receive */ > mthca/mthca_qp.c:1848: /* XXX check that state is OK to post send */ > mthca/mthca_qp.c:2101: /* XXX check that state is OK to post receive */ > > I think special handling (eg queuing of TX WRs if the QP is still in > RTR) is some sort of over doing, but lets first see if indeed its not > implemented and what others think. > > Or. IB spec 1.2 vol 1 page 456 C10-29: If a Work Request is submitted to a Send Queue while its corresponding QP is in the RTR State, an immediate error shall be returned. So, while you technically *can* post on a QP in RTR, you won't, and are not supposed to, get anything useful out of this. Sean, could you please restate why is this actually useful? 
-- MST From ogerlitz at voltaire.com Sun Jul 23 03:44:11 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 23 Jul 2006 13:44:11 +0300 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 release - schedule and features In-Reply-To: <44BF686A.7060707@mellanox.co.il> References: <6AB138A2AB8C8E4A98B9C0C3D52670E3028430B0@mtlexch01.mtl.com> <44BF686A.7060707@mellanox.co.il> Message-ID: <44C352FB.9090009@voltaire.com> Vladimir Sokolovsky wrote: > Hi Or, > I am working on IPoIB failover. > I tried Michael's Tsirkin patch for ipoib (updating neighbor structure) > and it fixes the issue Roland was talking about. > > Meanwhile I have tested the following flow: > /*_Setup description:_*/ > > host1 - 2 IB ports connected to IB switch. > ib0: 11.0.0.1 > ib1: 12.0.0.1 > > host2 - port 1 connected to the IB switch. > ib0: 11.0.0.2 > opensm over port1 > > > /*_Flow description:_*/ > - ping host2 -> 11.0.0.1 (passed) > - set port1 of the host1 to 'DOWN' state (disconnect the port from IB > subnet) > - ping host2 -> 11.0.0.1 (failed) > - ifconfig ib0 0.0.0.0 (on host1) > - ifconfig ib1 11.0.0.1 (on host1) > - arping -I ib1 11.0.0.1 (on host1) > - ping host2 -> 11.0.0.1 (passed) > > arping in this case was not really necessary because ping issues ARP > requests by himself. Hi Vlad, Did you have any special reason to assign host1:ib1 an IP address ***before*** the failover? is the reason for that happen to be having it joins the IPv4 multicast group at "batch time", that is not during the failover? > - arping -I ib1 11.0.0.1 (on host1) -U Unsolicited ARP mode to update neighbours' ARP caches. No replies are expected -A The same as -U, but ARP REPLY packets used instead of ARP REQUEST. I think we want arping to send a gratuitous arp with the MAC of ib1 so weren't you need to provide the -U or -A command line to arping? If i understand correct, gratuitous arp was not sent in your usage case so i am not sure Michael's patch was exercised. 
> Note: I updated the original arping to be able to send broadcast using > ipv4_bcast_addr. Can you please send the patch to arping? > Also, I have tested ssh over IPoIB with the same flow. In this case > arping also wasn't necessary , but it makes an update of neighbors > with the new MAC address (of ib1 interface) more quickly. Two interesting test cases you might want to validate your approach with are "long" runs, i.e. ones that deliver a lot of traffic before and after the failover, e.g. iperf or netperf over TCP and UDP. I have not validated it, but I think UDP would not generate ARP, so the gratuitous ARP is the only way to update the remote system with the MAC change. Or. From tziporet at mellanox.co.il Sun Jul 23 03:55:34 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Sun, 23 Jul 2006 13:55:34 +0300 Subject: [openib-general] openib tools to update mellnox firmware In-Reply-To: <44BE6930.1080105@veritas.com> References: <44BE6930.1080105@veritas.com> Message-ID: <44C355A6.9080009@mellanox.co.il> somenath wrote: > I see this messages while booting my system: > > Jul 17 10:35:37 alekhine kernel: ib_mthca 0000:03:00.0: HCA FW version > 4.6.0 is old (4.7.0 is current). > Jul 17 10:35:37 alekhine kernel: ib_mthca 0000:03:00.0: If you have > problems, try updating your HCA FW. > > where can I find the latest firmware for ib_mthca (pci express card) and > tools to update this? > > thanks, som. > > > The most up-to-date FW is located at: http://www.mellanox.com/support/firmware_download.php You can download a binary image and burn it using mstflint (which is part of OFED), or also download the MFT tools (from the same location) to create a binary image.
Tziporet

From tziporet at mellanox.co.il Sun Jul 23 03:56:51 2006
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Sun, 23 Jul 2006 13:56:51 +0300
Subject: [openib-general] libibverbs question
In-Reply-To:
References: <1153417846.12133.30.camel@stevo-desktop>
Message-ID: <44C355F3.10901@mellanox.co.il>

Roland Dreier wrote:
> Steve> If a SEND is received on a QP and it generates a BOUNDS
> Steve> violation because the recv buffer posted is too small, will
> Steve> this generate an async error?
>
> No, I don't believe so. The error can be returned synchronously as a
> completion with status of "local length error."
>
> (I was actually too lazy to look at the IB spec, but you should be
> able to confirm that by looking at the section of chapter 11 on
> completion status)
>
>
>

Mellanox HCAs behaviour is:
in RC - WQE completed with error
in UD/UC - skip message and resume operation

Tziporet

From swise at opengridcomputing.com Sun Jul 23 06:37:22 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Sun, 23 Jul 2006 08:37:22 -0500
Subject: [openib-general] libibverbs question
References: <1153417846.12133.30.camel@stevo-desktop> <44C355F3.10901@mellanox.co.il>
Message-ID: <006101c6ae5d$23af2430$020010ac@haggard>

Thanks all. This is exactly what I observed in my program.

----- Original Message -----
From: "Tziporet Koren"
To: "Roland Dreier"
Cc: "Steve Wise" ; "openib-general"
Sent: Sunday, July 23, 2006 5:56 AM
Subject: Re: [openib-general] libibverbs question

> Roland Dreier wrote:
>> Steve> If a SEND is received on a QP and it generates a BOUNDS
>> Steve> violation because the recv buffer posted is too small, will
>> Steve> this generate an async error?
>>
>> No, I don't believe so. The error can be returned synchronously as a
>> completion with status of "local length error."
>>
>> (I was actually too lazy to look at the IB spec, but you should be
>> able to confirm that by looking at the section of chapter 11 on
>> completion status)
>>
>>
>>
> Mellanox HCAs behaviour is:
> in RC - WQE completed with error
> in UD/UC - skip message and resume operation
>
> Tziporet
>
>

From vlad at mellanox.co.il Sun Jul 23 06:58:08 2006
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Sun, 23 Jul 2006 16:58:08 +0300
Subject: [openib-general] [openfabrics-ewg] OFED 1.1 release - schedule and features
In-Reply-To: <44C352FB.9090009@voltaire.com>
References: <6AB138A2AB8C8E4A98B9C0C3D52670E3028430B0@mtlexch01.mtl.com> <44BF686A.7060707@mellanox.co.il> <44C352FB.9090009@voltaire.com>
Message-ID: <44C38070.9070507@mellanox.co.il>

Hi Or,
See below,

Regards,
Vladimir

Or Gerlitz wrote:
> Vladimir Sokolovsky wrote:
>> Hi Or,
>> I am working on IPoIB failover.
>> I tried Michael Tsirkin's patch for ipoib (updating neighbor
>> structure) and it fixes the issue Roland was talking about.
>>
>> Meanwhile I have tested the following flow:
>>
>> Setup description:
>>
>> host1 - 2 IB ports connected to IB switch.
>>   ib0: 11.0.0.1
>>   ib1: 12.0.0.1
>>
>> host2 - port 1 connected to the IB switch.
>>   ib0: 11.0.0.2
>>   opensm over port1
>>
>> Flow description:
>> - ping host2 -> 11.0.0.1 (passed)
>> - set port1 of host1 to 'DOWN' state (disconnect the port from the
>>   IB subnet)
>> - ping host2 -> 11.0.0.1 (failed)
>> - ifconfig ib0 0.0.0.0 (on host1)
>> - ifconfig ib1 11.0.0.1 (on host1)
>> - arping -I ib1 11.0.0.1 (on host1)
>> - ping host2 -> 11.0.0.1 (passed)
>>
>> arping in this case was not really necessary because ping issues ARP
>> requests by itself.
>
> Hi Vlad,
>
> Did you have any special reason to assign host1:ib1 an IP address
> ***before*** the failover? Is the reason to have it join the IPv4
> multicast group at "batch time", that is, not during the failover?

The ib1 interface is loaded in any case (with or without configuration) if ib0 is loaded by /etc/init.d/network or /etc/init.d/openibd. It can't be configured with IP 0.0.0.0 - it fails to start with this configuration. So, I gave it some IP in a different IP subnet.

>
> > - arping -I ib1 11.0.0.1 (on host1)
>
> -U Unsolicited ARP mode to update neighbours' ARP caches. No replies
> are expected
>
> -A The same as -U, but ARP REPLY packets used instead of ARP REQUEST.
>
> I think we want arping to send a gratuitous ARP with the MAC of ib1,
> so wouldn't you need to provide the -U or -A command-line option to
> arping?
>

You are right, I used 'arping -A ...' (forgot to include it in the email). Actually, I have added my own flag '-R', which means '-A over IPoIB'.

> If I understand correctly, a gratuitous ARP was not sent in your usage
> case, so I am not sure Michael's patch was exercised.
>
> > Note: I updated the original arping to be able to send broadcast using
> > ipv4_bcast_addr.
>
> Can you please send the patch to arping?
>

Attached (arping_full_ib.c - arping.c with changes for IPoIB).

> > Also, I have tested ssh over IPoIB with the same flow. In this case
> > arping also wasn't necessary, but it makes an update of neighbors
> > with the new MAC address (of ib1 interface) more quickly.
>
> Two interesting test cases to validate your approach with are "long"
> runs, i.e. ones that deliver a lot of traffic before and after the
> failover, e.g. iperf or netperf over TCP and over UDP. I have not
> validated it, but I think UDP would not generate ARP, so the gratuitous
> ARP is the only way to update the remote system with the MAC change.
>

I will test it later.

> Or.
>
>

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: arping_full_ib.c URL: From martingdodo at yahoo.com Sun Jul 23 20:37:18 2006 From: martingdodo at yahoo.com (martingdodo) Date: Mon, 24 Jul 2006 13:37:18 +1000 Subject: [openib-general] Important Report Message-ID: <7wvqmj5graxcvew.240720061337@dma1x> S.O.S. As this is a serious problem against humanity, I'd like to write to bring your awareness. I appologise for any inconvinience. Please read the attached article of "China Exporting Weapons to 'Rogue' Countries" as well. You maybe abble to help by passing the report to whoever concerns. Your contribution to human right counts, and can make a big difference. Highly regards. Marting D. International Human Rights Volunteer ----------------------------------- REPORT INTO ALLEGATIONS OF ORGAN HARVESTING OF FALUN GONG PRACTITIONERS IN CHINA by David Matas and David Kilgour 6 July 2006 i Table of Contents A. Introduction B. Working methods C. The allegation D. Difficulties of proof E. Methods of Proof F. Elements of proof and disproof 1) A perceived threat 2) A policy of persecution 3) Incitement to hatred 4) Massive Arrests 5) Repression 6) The unidentified and the disappeared 7) Sources of transplants 8) Blood testing 9) Corpses with missing organs 10) A confession 11) Admissions 12) Waiting times 13) Incriminating Information on Websites 14) Victim interviews 15) Human rights violations generally 16) Financial considerations ii 17) Corruption 18) Legislation G. Credibility of witnesses and investigators H. Proposed further investigation I. Conclusions J. Recommendations K. Commentary L. 
Appendices 1) Letter of invitation from CIPFG 2) Biography of David Matas 3) Biography of David Kilgour 4) People interviewed 5) Letter to the Chinese embassy 6) Statements by the Government of China on Falun Gong 7) Physical persecution of the Falun Gong 8) Blood testing of Falun Gong prisoners 9) Unidentified Falun Gong in detention 10) Disappearances 11) AI's Records of Number of Executed Prisoners in China Each Year 12) Corpses with missing organs 13) Transcript of Interview 14) Transcripts of telephone investigations A. Introduction The Coalition to Investigate the Persecution of the Falun Gong in China (CIPFG), a nongovernmental organization registered in Washington, D.C. with a branch in Ottawa, Canada, by letter dated May 24, 2006 asked for our assistance in investigating allegations that state institutions and employees of the government of People's Republic of China have been harvesting organs from live Falun Gong practitioners, killing the practitioners in the process. The request letter is attached as an appendix to this report. Many of the friends of China, including us two, are concerned about these allegations. In light of their seriousness, as well as our own commitment respecting human dignity world wide, we accepted the request. David Matas is an immigration, refugee and international human rights lawyer in private practice in Winnipeg. He is actively involved in the promotion of respect for human rights as an author, speaker and participant in several human rights non-governmental organizations. David Kilgour is a former member of Parliament and a former Secretary of State of the Government of Canada for the Asia Pacific region. Before he became a Parliamentarian, he was a Crown prosecutor. The biographies of both authors are attached as appendices to this report. B. Working Methods We conducted our investigation independently from the CIPFG, the Falun Dafa Association, any other organization, and any government. 
We sought to go to China unsuccessfully, but would be willing to go even subsequently to pursue a second stage of the investigation if access to witnesses and institutions can be obtained. We interviewed a number of different people listed in an appendix to this report as well as read extensively any information we could obtain relevant to our report. We were not paid by anyone for this report but rather did this work as volunteers. C. The Allegation It is alleged that Falun Gong practitioners are victims of live organ harvesting throughout China. The allegation is that organ harvesting is inflicted on unwilling Falun Gong practitioners at a wide variety of locations, pursuant to a systematic policy, in large numbers. Organ harvesting is a step in organ transplants. The purpose of organ harvesting is to provide organs for transplants. Transplants do not necessarily have to take place in the same place as the location of the organ harvesting. The two locations are often different, organs harvested in one place are shipped to another place for transplanting. The allegation is further that the organs are harvested from the practitioners while they are still alive. The practitioners are killed in the course of the organ harvesting operations or immediately thereafter. These operations are a form of murder. Finally, we are told that the practitioners killed in this way are then cremated. There is no corpse left to examine to identify as the source of an organ transplant. The thought of such a practice occurring, particularly if it might be at the direction of a government, at the beginning of the 21st century when the value of individual human life is finally gaining more widespread respect, is most alarming. 
Accordingly, when one of the first in camera witnesses, a woman who is not a Falun Gong practitioner, met in the course of this inquiry said that her surgeon husband told her that he personally removed the corneas from approximately 2000 anaesthetized Falun Gong prisoners in northeast China during the two year period before October, 2003 ( at which time he refused to continue), we were shaken. Much of what we have encountered since, as outlined in this report, has been almost equally disturbing. D. Difficulties of proof The allegations, by their very nature, are difficult either to prove or disprove. The best evidence for proving any allegation is eye witness evidence. Yet for this alleged crime, there is unlikely to be any eye witness evidence. The people present at the scene of organ harvesting of Falun Gong practitioners, if it does occur, are either perpetrators or victims. There are no bystanders. Because the victims, according to, the allegation are murdered and cremated, there is no body to be found, no autopsy to be conducted. There are no surviving victims to tell what happened to them. Perpetrators are unlikely to confess to what would be, if they occurred, crimes against humanity. Nonetheless, though we did not get full scale confessions, we garnered a surprising number of admissions through investigator phone calls. The scene of the crime, if the crime has occurred, leaves no traces. Once an organ harvesting is completed, the operating room in which it takes place looks like any other empty operating room. The clampdown on human rights reporting in China makes assessment of the allegations difficult. China, regrettably, represses human rights reporters and defenders. There is no freedom of expression. Those reporting on human rights violations from within China are often jailed and sometimes charged with communicating state secrets. 
In this context, the silence of human rights non-governmental organizations on organ harvesting of unwilling Falun Gong practitioners tells us nothing. The International Committee of the Red Cross is not allowed to visit prisoners in China. Nor is any other organization concerned with human rights of prisoners. That also cuts off a potential avenue of evidence. China has no access to information legislation. It is impossible to get from the Government of China basic information about organ transplants - how many transplants there are, what is the source of the organs, how much is paid for transplants or where that money is spent. We did seek to visit China for this report. Our efforts went nowhere. We asked in writing for a meeting with the embassy to discuss terms of entry. Our letter is attached as an appendix to this report. Our request for a meeting was accepted. But the person who met with David Kilgour was interested only in denying the allegations and not in arranging for our visit. E. Methods of proof We have had to look at a number of factors, to determine whether they present a picture, all together, which make the allegations either true or untrue. None of these elements on its own either establishes or disproves the allegations. Together, they paint a picture. Many of the pieces of evidence we considered, in themselves, do not constitute ironclad proof of the allegation. But their non-existence might well have constituted disproof. The combination of these factors, particularly when there are so many of them, has the effect of making the allegations believable, even when any one of them in isolation might not do so. Where every possible element of disproof we could identify fails to disprove the allegations, the likelihood of the allegations being true becomes substantial. Proof can be either inductive or deductive. Criminal investigation normally works deductively, stringing together individual pieces of evidence into a coherent whole. 
The limitations our investigation faced placed severe constraints in this deductive method. Some elements from which we could deduce what was happening were, nonetheless, available, in particular, the investigator phone calls. We also used inductive reasoning, working backwards as well as forwards. If the allegations were not true, how would we know it was not true? If the allegations were true, what facts would be consistent with those allegations? What would explain the reality of the allegations, if the allegations were real? Answers to those sorts of questions which helped us to form our conclusions. F. Elements of proof and disproof We considered any and all elements of proof and disproof which were available and which might be available. Some evidentiary trails went nowhere. But we attempted to follow them nonetheless. 1) Perceived threat The Chinese Communist Party (CCP) came to see Falun Gong as a threat to its monopoly of ideological power over China in the late 1990s. This perceived threat does not prove the allegations. Yet, if the Falun Gong were not seen as a threat to the power of the CCP, the allegations would be undermined. Falun Gong was founded in north eastern China in 1992 by Li Hongzhi. In the 1980s, Li began practising qigong, a centuries-old system of breathing exercises - occasionally referred to as "Chinese yoga" - which was thought to improve health and spiritual sensitivity. Qigong in all its variations was suppressed across the country in 1949 after the CCP seized office in Beijing, but the police state environment had become less oppressive by the 1980s for qigong in all forms, including Falun Gong. Falun Gong had at the time only recently been developed by Li and included elements of Confucianism, Buddhism and Taoism. In essence, it teaches methods of meditation through exercises intended to improve physical and spiritual health and fitness. 
The movement is not political and it followers seek to promote truth, tolerance and compassion across racial, national and cultural boundaries. Violence is anathema to Falun Gong adherents. Li registered his movement with the government's Qigong Research Association and by the mid-nineties claimed to have approximately 60 million practitioners. The Chinese Government's sports department itself estimated that there were 70 million adherents in 1999. According to Professor Maria Hsia Chang's book, Falun Gong, published by Yale University in 2004, "Reportedly, the middle-aged and those from the middle class comprised (Falun Gong's) main following, although its ranks also included students and the elderly, as well as peasants. They came from all walks of life: teachers, physicians, soldiers, CCP cadres, diplomats posted in foreign countries, and other government officials. More than that, it was reported that among the followers of Master Li were the spouses and family members of some of China's top officials, including President Jiang, Premier Zhu and officials of the State Council, the executive branch of government." Falun Gong was part of the explosion of religious activity that appeared in China since the 1980s as "part of China's post-Mao 'spiritual vacuum' and the scaling back of the Party's ideological control of society..." 2 The popular appeal of Falun Gong in particular was based in part on its commitment to integrate modern science with Chinese traditions. 1 Professor Maria Hsia Chang's book, Falun Gong, published by Yale University, 2004 2 "Falun Gong and Canada's China policy". David Ownby, vol. 56, International Journal, Canadian Institute of International Affairs, Spring 2001. Before Falun Gong was banned in July, 1999, its adherents gathered regularly in China's myriad cities to do their exercises. As Chang notes, in Beijing alone there were more than 2000 practice stations. 
China's Premier Zhu for one, she adds, appeared to be pleased with the rising popularity of Li's movement because its positive social consequences included reducing medical costs for practitioners, who were often healthy. President Jiang himself was reported to have taken up qigong in 1992 by inviting a member of Zhong Gong, a group which then claimed 38 million members, to treat him for arthritis and neck pains (By early 2000, however, Jiang's government banned Zhong Gong as an "evil sect" and drove its leader out of China.).1 Jiang's personal confrontation with Falun Gong had begun to develop in 1996, Chang and many other observers conclude, when Li's book, Rotating the Law Wheel, sold almost a million copies across China. This alerted nervous party leaders, including Jiang, to the growing popularity of the movement. Fearing the possibility of political revolt against the government, they banned the sale of China Falun Gong and others publications and encouraged disgruntled adherents to accuse Li of stealing from the public. Chang notes: "Sensing that he and (Falun Gong) had fallen into disfavour - and reportedly at the urging of authorities - Li emigrated to the United States in early 1998, where he has since acquired permanent residence." 1 The non - violent phase of the campaign continued into May 1998, when a government television interviewer referred to Falun Gong as a "superstition". According to Chang's research, this resulted in about a hundred CCP party, government and military retirees, who were adherents of Falun Gong, petitioning Jiang unsuccessfully to legalize it. The party later had an article published in a magazine (Science and Technology for Youth), which singled out Falun Gong as a superstition and a health risk because practitioners might refuse conventional medical treatments for serious illnesses. A large number of Falun Gong adherents demonstrated peacefully against the contents of the piece outside the Tianjin editor's office. 
When arrests and police beatings resulted, the stage was set for another act of protest in the national capital. 1 On April 25th, 1999, 10,000 - 16,000 ordinary Chinese citizens gathered from dawn until late at night outside the CCP headquarters at Zhongnanhai next to Beijing's Forbidden City. The participants included intellectuals, government officials and party members. The protest was silent; there were no posters and not a single political slogan or defiant thought was voiced. Chang: "On the day of the demonstration, (Jiang) asked to be driven around Zhongnanhai in his limousine and stared at the throng through the tinted windows. That night, clearly alarmed by the demonstration, he wrote the CCP Politburo to assure his colleagues that he believed 'Marxism can triumph over Falun Gong'".1 The CCP's half century of monopolizing power in China was suddenly in the personal view of its current leader in grave danger. David Ownby, Director of the Centre of East Asian studies at the University of Montreal and a specialist in modern Chinese history, wrote candidly about what was occurring in mid-2001 and earlier in a paper prepared five years ago for the Canadian Institute of International Affairs.2 Ownby observes that the "seemingly benign nature of Falungong in North America and its apparently 'evil' character in China might lead Canadians who are concerned about human rights to look very carefully at the Chinese case against Falungong," Though Chinese leaders refer to Falun Gong as a "cult", Ownby notes that "there is little in their practice in Canada and the US that supports the idea that the group is a 'cult' in the general sense of the word. The Chinese government's case against Falungong as a 'cult' can not be convincing unless the government allows third party verification of its allegations of Falungong abuses in China. China has essentially reacted out of fear of Falungong's ability to mobilize its followers..." 
2) A policy of persecution If organ harvesting from Falun Gong practitioners were widespread across China, one would expect some governmental policy directive to that effect. Yet, the secrecy of policy formulation in China prevents us from determining whether such a policy exists. Nonetheless, we do know that persecution of Falun Gong exists, as an official policy. There are some very strong policy statements, attached as an appendix to this report, by the Government of China and the Communist Party of China, calling for the persecution of the Falun Gong, including physical persecution. These statements are consistent with the allegations we have heard. According to Li Baigen, then assistant director of the Beijing Municipal Planning office who attended the meeting, during 1999 the three men heading the 610 office called more than 3000 officials to the Great Hall of the People in the capital to discuss the campaign against Falun Gong, which was then not going well. Demonstrations were continuing to occur around the capital. The head of the 610 office, Li Lanqing, verbally announced the government's new policy on the movement: "defaming their reputations, bankrupting them financially and destroying them physically." It appears to have been only after this meeting that the deaths of adherents at police hands began to be recorded as suicides. We were told by Falun Gong practitioners in Canada, that many of their members in China were told by law-enforcement officers in different parts of China that "death of Falun Gong members count as suicide, and they will be cremated directly". 3) Incitement to hatred The Falun Gong in China are dehumanized both in word and deed. Policy directives are matched by incitement to the population at large both to justify the policy of persecution, to recruit participants, and to forestall opposition. 
This sort of vocabulary directed against a particular group has become both the precursor and the hallmark of gross human violations directed against the group. According to Amnesty International, the Chinese Government adopted three strategies to crush Falun Gong: violence against practitioners who refuse to renounce their beliefs; "brainwashing" to force all known practitioners to abandon Falun Gong and renounce it, and a more effective media campaign to turn public opinion against Falun Gong. The media campaign featured an incident on 23 January 2001 when five persons declared to be Falun Gong practitioners by the government, including a 12 year-old girl and her mother, purportedly set themselves on fire in Tiananmen Square. The state media repeatedly broadcast shocking images of the burning body of the girl and material aimed at discrediting the group after the incident, reportedly changing public opinion about Falun Gong. There is considerable concern about whether in reality the government staged the entire incident. Incitement to hatred is not specific enough to indicate the form that persecution takes. But it promotes any and all violations of the worst sort. It is hard to imagine the allegations we have heard being true in the absence of this sort of hate propaganda. Once this sort of incitement exists, the fact that people would engage in such behaviour against the Falun Gong - harvesting their organs and killing them in the process - ceases to be implausible. 4) Massive Arrests Despite the media campaign, hundreds of thousands of men and women travelled to Beijing to protest or to unfold banners calling for the group's legalization almost daily. 
Author Jennifer Zeng, formerly of Beijing and now living in Australia, confirms that she managed to acquire classified information that by the end of April 2001 there had been 3 http://web.amnesty.org/library/Index/engASA170282001 4 "Few Members of Large Sect to Face Trial, Beijing Says", The New York Times, December 2, 1999, http://www.cesnur.org/testi/falun_023.htm or "Failure admitted in crackdown", South China Morning Post, April 22, 2000 By Willy Wo-Lap Lam approximately 830,000 arrests of Falun Gong adherents. Large numbers of Falun Gong adherents in arbitrary indefinite secret detention alone do not prove the allegations. But the opposite, the absence of such pool of detainees, would undermine the allegations. An extremely large group of people subject to the exercise of the whims and power of the state, without recourse to any form of protection of their rights, provides a potential source for organ harvesting of the unwilling. 5) Repression The crackdown on Falun Gong included President Jiang's creation of a special force, the 6-10 office 5 6, in every province, city, county, university, government department and government-owned business to spearhead the attack. Jiang's mandate to the office was to "eradicate" Falun Gong 6. This included sending thousands upon thousands of its practitioners to prisons and labour camps beginning in the summer of 1999. The US State Department's 2005 country report on China 7, for example, indicates that its police run hundreds of detention centres, with the 340 re-education-through-labour ones alone having a holding capacity of about 300,000 persons. The report also indicates that the number of Falun Gong practitioners who died in custody estimated was from a few hundred to a few thousand. 
The UN Special Rapporteur on Torture's recent report 8 noted that "Since 2000, the Special Rapporteur and his predecessors have reported 314 cases of alleged torture to the Government of China. These cases represent well over 1,160 individuals." And "In addition to this figure, it is to be noted that one case sent in 2003 (E/CN.4/2003/68/Add.1 para. 301) detailed the alleged ill treatment and torture of thousands of Falun Gong practitioners." Furthermore, the report indicated that 66% of the victims of alleged torture and ill-treatment were Falun Gong practitioners, with the remaining victims comprising Uyghurs (11%), sex workers (8%), Tibetans (6%), human rights defenders (5%), political dissidents (2%), and others (persons infected with HIV/AIDS and members of religious groups 2%).

5 Appendix 6, (June 7, 1999) "Comrade Jiang Zemin's speech at the meeting of the Political Bureau of CCCCP regarding speeding up the dealing with and settling the problem of 'FALUN GONG'"
6 H. CON. RES. 188, CONCURRENT RESOLUTION, U.S http://thomas.loc.gov/cgi-bin/query/z?c107:hc188:
7 U.S. Department of State 2005 Country Reports on Human Rights Practices - China, March 8, 2006. (http://www.state.gov/g/drl/rls/hrrpt/2005/61605.htm)
8 U.N. Commission on Human Rights: Report of the Special Rapporteur on torture and other cruel, inhuman or degrading treatment or punishment, Manfred Nowak, on his Mission to China from November 20 to December 2, 2005 (E/CN.4/2006/6/Add.6), March 10, 2006. (http://www.ohchr.org/english/bodies/chr/docs/62chr/ecn4-2006-6-Add6.doc)

Local governments everywhere were given unlimited authority to implement Beijing's orders in 1999 and afterwards. This included numerous staged attempts later on to demonstrate to China's population that practitioners committed suicide by self-immolation, killed and mutilated family members and refused medical treatment.
Over time this campaign had the desired effect and many, if not most, Chinese nationals clearly came to accept the CCP view about Falun Gong. Only later in 1999 did the National People's Congress pass new laws targeting Falun Gong retroactively and purporting to legalize a long list of illegal acts done against its members.

Part of a wire story from the Beijing bureau of the Washington Post fully two summers later (5 Aug 2001) 9 illustrates the severity of the ongoing methods of the 6-10 office and other agents of the regime against Falun Gong practitioners:

"At a police station in western Beijing, Ouyang was stripped and interrogated for five hours. 'If I responded incorrectly, that is if I didn't say, 'yes,' they shocked me with the electric truncheon,' he said. Then, he was transferred to a labour camp in Beijing's western suburbs. There, the guards ordered him to stand facing a wall. If he moved, they shocked him. If he fell down from fatigue, they shocked him..."

"(Later) he was taken before a group of Falun Gong inmates and rejected the group one more time as the video cameras rolled. Ouyang left jail and entered the brainwashing classes. Twenty days after debating Falun Gong for 16 hours a day, he 'graduated'. 'The pressure on me was and is incredible,' he said. 'In the past two years, I have seen the worst of what man can do. We really are the worst animals on Earth.'"

9 Washington Post Foreign Service, "Torture Is Breaking Falun Gong: China Systematically Eradicating Group," John Pomfret and Philip P. Pan, August 5, 2001. (http://www.washingtonpost.com/ac2/wpdyn? pagename=article&node=&contentId=A33055-2001Aug4 )

Ownby noted that human rights organizations "have unanimously condemned China's brutal campaign against the Falungong, and many governments around the world, including Canada's, have expressed their concern."
He cited Amnesty International's report of 2000 which noted that 77 Falun Gong practitioners had "died in custody, or shortly after release, in suspicious circumstances since the crackdown began in July 1999." 2 6) The Unidentified and the disappeared Falun Gong detentions, though in some ways it was just Chinese repression as usual with the Falun Gong being the unlucky targets, presented an unusual feature. Falun Gong practitioners who had come from all over the country to Tiananmen Square in Beijing to appeal or protest were arrested. Those who revealed their identities to their captors would be shipped back to their home localities. Their families would be implicated in their Falun Gong activities and pressured to join in the effort to get the practitioners to renounce Falun Gong. Their workplace leaders, their co-workers, their local government leaders would be held responsible and penalized for the fact that these individuals had gone to Beijing to appeal or protest. To protect their families and avoid the hostility of the people in their locality, many detained Falun Gong declined to identify themselves. The result was a large Falun Gong prison population whose identities the authorities did not know. As well, no one who knew them knew where they were. Though this refusal to identify themselves was done for protection purposes, it may have had the opposite effect. It is easier to victimize a person whose whereabouts is unknown to family members than a person whose location the family knows. This population is a remarkably undefended group of people, even by Chinese standards. This population of the unidentified was treated especially badly. As well, they were moved around within the Chinese prison system for reasons not explained to the prisoners. Was this the population which became the source of harvested Falun Gong organs? Obviously, the mere existence of this population does not tell us that this is so. 
Yet, the existence of this population provides a ready explanation for the source of harvested organs, if the allegations are true. Members of this population could just disappear without anyone outside of the prison system being the wiser. Information about this population of the unidentified is attached as an appendix to this report.

In fact, there are many missing Falun Gong practitioners. An appendix to this report sets out evidence of these disappearances. If every Falun Gong practitioner were present and accounted for, the allegations with which we are faced would be disproved. But a person can go missing for a variety of reasons. Disappearances are a human rights violation for which China should be held accountable. But they are not necessarily this violation.

There is every reason to believe that the Government of China is responsible for the disappearance of many Falun Gong practitioners. Those disappearances do not prove the allegations with which we are faced. But, like many of the other factors we considered, they are consistent with those allegations.

7) Sources of transplants

There are many more transplants than identifiable sources. We know that some organs come from executed prisoners. Very few come from willing donor family members. But these sources leave huge gaps in the totals. The numbers of executed prisoners and willing sources come nowhere close to the number of transplants.

The number of executed prisoners is itself not public. We are operating only from estimates attached as an appendix. Those estimates, when one considers global execution totals, are immense, but nowhere near the estimated totals of transplants.

At least 98% of the organs for transplants come from someone other than family donors [10]. In the case of kidneys, for example, only 227 of 40,393 transplants - about 0.6% - done between 1971 and 2001 in China came from family donors [11]. Chinese nationals, for cultural reasons, are reluctant to donate their organs after death.
There is no organized system of organ donation yet formed in China [12][10]. The government of China admitted to using the organs of executed prisoners only last year [13][14], although it had been going on for many years. The regime has had no barriers to prevent marketing the organs of "enemies of the state".

[10] http://www.transplantation.org.cn/html/2006-04/467.html (Life weekly, 2006-04-07). Archived page: http://archive.edoors.com/render.php?uri=http%3A%2F%2Fwww.transplantation.org.cn%2Fhtml%2F2006-04%2F467.html+&x=26&y=11
[11] http://www.chinapharm.com.cn/html/xxhc/2002124105954.html (China Pharmacy Net, 2002-12-05). Archived page: http://archive.edoors.com/content5.php?uri=http://www.chinapharm.com.cn/html/xxhc/2002124105954.html
[12] http://www.chinadaily.com.cn/china/2006-05/05/content_582847.htm (China Daily, 2006-05-05, English). Archived page: http://archive.edoors.com/content5.php?uri=http://www.chinadaily.com.cn/china/2006-05/05/content_582847.htm
[13] "China to 'tidy up' trade in executed prisoners' organs," The Times, December 03, 2005, http://www.timesonline.co.uk/article/0,,25689-1901558,00.html

According to AI's records [15], the average number of executed prisoners between 1995 and 1999 was 1680 per year. The average between 2000 and 2005 was 1616 per year. The numbers have bounced around from year to year, but the overall average number for the periods before and after Falun Gong persecution began is about the same. Executions cannot explain the increase of organ transplants in China since the persecution of Falun Gong began.

According to public reports, there were approximately 30,000 [16] transplants in total done in China before 1999 and about 18,500 [16][17] in the six-year period 1994 to 1999. Professor Bingyi Shi, vice-chair of the China Medical Organ Transplant Association, says there were about 90,000 [18] in total up until 2005, leaving about 60,000 in the six-year period 2000 to 2005 since the persecution of Falun Gong began.
[14] "Beijing Mulls New Law on Transplants of Deathrow Inmate Organs", http://caijing.hexun.com/english/detail.aspx?issue=147&sl=2488&id=1430379 (Caijing Magazine, Issue 147, Nov 28 2005)
[15] Index of AI Annual reports: http://www.amnesty.org/ailib/aireport/index.html, from here one can select the annual report of each year.
[16] http://www.biotech.org.cn/news/news/show.php?id=864 (China Biotech Information Net, 2002-12-02); http://www.chinapharm.com.cn/html/xxhc/2002124105954.html (China Pharmacy Net, 2002-12-05), archived page: http://archive.edoors.com/content5.php?uri=http://www.chinapharm.com.cn/html/xxhc/2002124105954.html; http://www.people.com.cn/GB/14739/14740/21474/2766303.html (People's Daily, 2004-09-07, from Xinhua News Agency)
[17] "The Number of Renal Transplant (Asia & the Middle and Near East) 1989-2000," Medical Net (Japan), http://www.medi-net.or.jp/tcnet/DATA/renal_a.html
[18] http://www.transplantation.org.cn/html/2006-03/394.html (Health Paper Net, 2006-03-02). Archived page: http://archive.edoors.com/render.php?uri=http%3A%2F%2Fwww.transplantation.org.cn%2Fhtml%2F2006-03%2F394.html+&x=32&y=11

The other identified sources of organ transplants, willing family donors and the brain dead, have always been tiny. In 2005, living-related kidney transplants made up 0.5% of total transplants nationwide [19]. The total number of brain dead donors for all years and all of China was 9 up to March 2006 [19][20]. There is no indication of a significant increase in either of these categories in recent years.

Presumably the identified sources of organ transplants which produced 18,500 organ transplants in the six-year period 1994 to 1999 produced the same number of organs for transplants in the next six-year period 2000 to 2005. That means that the source of 41,500 transplants for the six-year period 2000 to 2005 is unexplained.

Where do the organs come from for the 41,500 transplants? The allegation of organ harvesting from Falun Gong practitioners provides an answer.
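
The subtraction behind the 41,500 figure can be laid out step by step. The following is a minimal sketch in Python that uses only the approximate totals quoted above. The variable names are illustrative, and the assumption that identified sources stayed at their 1994 to 1999 level is the presumption stated in the text, not an independently established fact.

```python
# Approximate totals quoted in the text (rounded public estimates).
total_before_1999 = 30_000      # all transplants in China before 1999
total_up_to_2005 = 90_000       # all transplants up to 2005
identified_1994_1999 = 18_500   # transplants in the six years 1994 to 1999

# Transplants attributed to the six-year period 2000 to 2005.
transplants_2000_2005 = total_up_to_2005 - total_before_1999

# Presumption from the text: identified sources supplied as many organs
# in 2000 to 2005 as they did in 1994 to 1999.
unexplained_2000_2005 = transplants_2000_2005 - identified_1994_1999

print(transplants_2000_2005)   # 60000
print(unexplained_2000_2005)   # 41500
```

Because every input is a rounded estimate, the result is indicative only; changing any input changes the unexplained total by the same amount.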
Again this sort of gap in the figures does not establish that the allegation of harvesting of organs from Falun Gong practitioners is true. But the converse, a full explanation of the source of all organ transplants, would disprove the allegation. If the source of all organ transplants could be traced either to willing donors or executed prisoners, then the allegation about the Falun Gong would be disproved. But such tracing is impossible.

Estimates of executions in China are often much higher than the figures based on publicly available records of executions. There is no official Chinese reporting on overall statistics of executions, leaving totals open to estimation.

One technique some of those involved in estimating executions have used is the number of transplant operations. Because it is known that at least some transplants come from executed prisoners and that family donors are few and far between, some analysts have deduced from the number of transplants that executions have increased.

This reasoning is unpersuasive. One cannot estimate executions from transplants unless executions are the only alleged source of transplants. Yet, Falun Gong practitioners are another alleged source. It is impossible to conclude that those practitioners are not a source of organs for transplants because of the number of executions where the number of executions is deduced from the number of transplants.

[19] "CURRENT SITUATION OF ORGAN DONATION IN CHINA FROM STIGMA TO STIGMATA", Abstract, The World Transplant Congress, http://www.abstracts2view.com/wtc/ Zhonghua K Chen, Fanjun Zeng, Changsheng Ming, Junjie Ma, Jipin Jiang. Institute of Organ Transplantation, Tongji Hospital, Tongji Medical College, HUST, Wuhan, China. http://www.abstracts2view.com/wtc/view.php?nu=WTC06L_1100&terms=
[20] http://www.transplantation.org.cn/html/2006-03/400.html (Beijing Youth Daily, 2006-03-06)
There appeared to be only 22 [21] liver transplant centres operating across China before 1999, compared to fully 500 in mid-April, 2006 [22][12]. The number of liver transplant operations in all of China appeared to total 135 by 1998 [11], contrasted with more than 4000 [18] in 2005 alone. For kidneys, the pattern is also significant (3,596 [11] transplants in 1998 and nearly 10,000 [18] in 2005).

The increase in organ transplants in China parallels the increase in persecution of the Falun Gong. These parallel increases, in themselves, do not prove the allegation. But they are consistent with the allegation. If the parallel did not exist, that hypothetical non-existence would undercut the allegations.

[21] http://unn.people.com.cn/GB/channel413/417/1100/1131/200010/17/1857.html (People's Daily Net and Union News Net, 2000-10-17). Archived at: http://archive.edoors.com/content5.php?uri=http://unn.people.com.cn/GB/channel413/417/1100/1131/200010/17/157.html
[22] According to Deputy Minister of Health, Mr. Huang Jiefu, http://www.transplantation.org.cn/html/2006-04/467.html (Lifeweekly, 2006-04-07). Archived at: http://archive.edoors.com/render.php?uri=http%3A%2F%2Fwww.transplantation.org.cn%2Fhtml%2F2006-04%2F467.html+&x=26&y=11

8) Blood testing

We know that Falun Gong practitioners in detention are systematically blood tested. We have heard such a number of testimonials to that effect that this testing exists beyond a shadow of a doubt. Why is it happening?

The practitioners themselves are not told. It is unlikely that the testing serves a health purpose. For one, it is unnecessary to blood test people systematically simply as a health precaution. For another, the health of the Falun Gong in detention is disregarded in so many other ways that it is implausible that the authorities would blood test Falun Gong as a precautionary health measure.

Blood testing is a pre-requisite for organ transplants.
Donors need to be matched with recipients so that the antibodies of the recipients do not reject the organs of the donors.

The mere fact of blood testing does not establish that organ harvesting of Falun Gong practitioners is taking place. But the opposite is true. If there were no blood testing, the allegation would be disproved. The widespread blood testing of Falun Gong practitioners in detention cuts off this avenue of disproof.

9) Corpses with missing organs

A number of family members of Falun Gong practitioners who died in detention reported seeing the corpses of their loved ones with surgical incisions and body parts missing. The authorities gave no coherent explanation for these mutilated corpses. Again the evidence about these mutilated corpses is attached as an appendix to this report.

We have only a few instances of such mutilated corpses. We have no official explanation why they were mutilated. Their mutilation is consistent with organ harvesting. We cannot even guess otherwise why these corpses would have been mutilated and body parts removed.

10) A confession

We met one witness who said that her surgeon husband told her that he personally removed the corneas from approximately 2,000 anaesthetized Falun Gong prisoners in northeast China during the two year period before October, 2003, at which time he refused to continue. The surgeon made it clear to his wife that none of the cornea "donors" survived the experience because other surgeons removed other vital organs and all of their bodies were then burned. The woman is not a Falun Gong practitioner.

This confession is second hand. The woman is not confessing something she did. Rather she is relating a terrible admission her husband made to her. The statement of this witness needs to be assessed for its credibility, something this report does later. Here we can say that, if it can be believed, it establishes the allegation all on its own.
11) Admissions

One of us has listened with a certified Mandarin-English interpreter to the cited recorded telephone conversations between officials and callers on behalf of the Falun Gong communities in Canada and the United States. Certified copies of the relevant transcripts in Mandarin and English were provided to us. The accuracy of the translations of the portions of them used in this report is attested to by the certified translator, Mr. C. Y., a certified interpreter with the Government of Ontario. He certified that he had listened to the recording of the conversations referred to in this report and has read the transcripts in Chinese and the translated English version of the conversations, and verifies that the transcripts are correct and translations accurate. The original recordings of the calls remain available as well. One of us met with two of the callers in Toronto on May 27th to discuss the routing, timing, recording, accuracy of the translations from Mandarin to English and other features of the calls.

One of the callers, "Ms. M", who will not be identified to avoid risk of harm to family members still in China and will be referred to hereafter as M, told one of us that in early March, 2006 she managed to get through to the Public Security Bureau in Shanxi. The respondent there told her that healthy and young prisoners are selected from the prison population to be organ donors. If the candidates could not be tricked into providing the blood samples necessary for successful transplants, the official went on with guileless candour, employees of the office take the samples by force.

On March 18 or 19, 2006 M spoke to a representative of the Eye Department at the People's Liberation Army hospital in Shenyang in north-eastern China, although she was not able to make a full recorded transcript.
Her notes indicate that the person identifying himself as the department's Chief-Physician said the facility did "many cornea operations", adding that "we also have fresh corneas." Asked what that means, the Chief-Physician replied "...just taken from bodies".

At Army Hospital 301 in Beijing in April, 2006, a surgeon, who told M that she did liver transplants herself, added that the source of the organs was a "state secret" and that anyone revealing the source "could be disqualified from doing such operations."

The second investigator for the World Organization to Investigate the Persecution of Falun Gong placed her calls from within the continental United States and will hereafter be referred to as N. N telephoned approximately thirty hospitals, detention centres and courts across China and recorded a number of them admitting to the use of organs from Falun Gong practitioners. The one of us who met with her in Toronto on May 27th noted that her methods, translations and so on were virtually identical to those of M, and both of us thus accept them as accurately representing what was said over the telephone. The same accredited translator worked on the texts of her recorded conversations.

Hospitals and Detention Centres Admissions in Telephone Conversations

FALUN GONG ORGANS ARE STILL READILY AVAILABLE

Admission from Mishan Detention Centre:

On June 8, 2006, an official at the Mishan city detention centre, Heilongjiang Province, admitted that the centre then had at least five or six male Falun Gong prisoners under 40 years-of-age available as organ suppliers. Mr. Li of the centre also gave details of the operation of selecting Falun Gong prisoners as organ suppliers for hospitals:

1. This particular detention centre at the time picked the organ suppliers, not the hospital.
2. Chief-Physician Cui of the detention centre at the time of the conversation was the point of contact for organ suppliers.
3.
Blood will be drawn from the prisoners picked to become organ suppliers, and such prisoners do not know the purpose of the blood test.
4. The detention centre has various means of obtaining blood samples from reluctant "donors".

Shanghai's Zhongshan hospital: A doctor at this hospital in mid-March of this year said that all of his organs come from Falun Gong practitioners.

Qianfoshan hospital in Shandong: A doctor at this hospital in March implied that he then had organs from Falun Gong persons and added that in April there would be "more of these kinds of bodies..."

Minzu hospital in Nanning city: In May, Dr Lu of this hospital said organs from Falun Gong practitioners were not available at his institution and suggested the caller call Guangzhou to get them. He also admitted that he earlier went to prisons to select healthy Falun Gong persons in their 30s to provide their organs.

Zhengzhou Medical University in Henan province: In mid-March of this year, Dr Wang of this centre agreed that "we pick all the young and healthy kidneys..."

Guangzhou Military region hospital: Dr Zhu of this hospital in April of this year said he then had some type B kidneys from Falun Gong, but would have "several batches" before May 1 and perhaps no more until May 20 or later.

Oriental Organ Transplant Centre: Chief-Physician Song at this centre in mid-March this year volunteered that his hospital had more than ten "beating hearts". The caller asked if that meant "live bodies" and Song replied, "Yes it is so."

Wuhan city Tongji hospital: An official at this hospital two weeks later told the caller that "(i)t's not a problem" for his institution when the caller said, "...we hope the kidney suppliers are alive. (We're) looking for live organ transplants from prisoners, for example, using living bodies from prisoners who practise Falun Gong, Is it possible?"
Detention Centres and Courts:

First Detention Centre of Qinhuangdao City: An official at this centre told the caller in mid-May this year that she should call the Intermediate People's court to obtain Falun Gong kidneys.

Intermediate People's court: The same day, an official at the Intermediate People's court said they had no Falun Gong live kidneys, but had had them in the past, specifically in 2001.

First Criminal Bureau of the Jinzhou people's court: In May of this year, an official in the court told the caller that access to Falun Gong kidneys currently depended on "qualifications" of the organ seekers.

The map of China which follows indicates the regions where detention or hospital personnel have made admissions to telephone investigators:

Most of the excerpted phone call texts are in an appendix. For illustration purposes, excerpts of three conversations follow:

(1) Mishan City Detention Centre, Heilongjiang province (8 June 2006):

M: "Do you have Falun Gong [organ] suppliers? ..."
Li: "We used to have, yes."
M: "... what about now?"
Li: "... Yes."
..
M: "Can we come to select, or you provide directly to us?"
Li: "We provide them to you."
M: "What about the price?"
Li: "We discuss after you come."
..
M: "... How many [Falun Gong suppliers] under age 40 do you have?"
Li: "Quite a few."
..
M: "Are they male or female?"
Li: "Male"
..
M: "Now, for ... the male Falun Gong [prisoners], How many of them do you have?"
Li: "Seven, eight, we have [at least] five, six now."
M: "Are they from countryside or from the city?"
Li: "countryside."

(2) Nanning City Minzu Hospital in Guangxi Autonomous Region (22 May 2006):

M: "...Could you find organs from Falun Gong practitioners?"
Dr. Lu: "Let me tell you, we have no way to get (them). It's rather difficult to get it now in Guangxi. If you cannot wait, I suggest you go to Guangzhou because it's very easy for them to get the organs. They are able to look for (them) nation wide.
As they are performing the liver transplant, they can get the kidney for you at the same time, so it's very easy for them to do. Many places where supplies are short go to them for help..."
M: "Why is it easy for them to get?"
Lu: "Because they are an important institution. They contact the (judicial) system in the name of the whole university."
M: "Then they use organs from Falun Gong practitioners?"
Lu: "Correct..."
M: "...what you used before (organs from Falun Gong practitioners), was it from detention centre(s) or prison(s)?"
Lu: "From prisons."
M: "...and it was from healthy Falun Gong practitioners...?"
Lu: "Correct. We would choose the good ones because we assure the quality in our operation."
M: "That means you choose the organs yourself."
Lu: "Correct..."
M: "Usually, how old is the organ supplier?"
Lu: "Usually in their thirties."
M: "... Then you will go to the prison to select yourself?"
Lu: "Correct. We must select it."
M: "What if the chosen one doesn't want to have blood drawn?"
Lu: "He will for sure let us do it."
M: "How?"
Lu: "They will for sure find a way. What do you worry about? These kinds of things should not be of any concern to you. They have their procedures."
M: "Does the person know that his organ will be removed?"
Lu: "No, he doesn't."

(3) Oriental Organ Transplant Centre (also called Tianjin City No 1 Central Hospital), Tianjin City (15 March 2006):

N: "Is this Chief-Physician Song?"
Song: "Yes, please speak."
..
N: "Her doctor told her that the kidney is quite good because he [the supplier,] practises ...Falun Gong."
Song: "Of course. We have all those who breathe and with heart beat...Up until now, for this year, we have more than ten kidneys, more than ten such kidneys."
N: "More than ten of this kind of kidneys? You mean live bodies?"
Song: "Yes it is so."

12) Waiting times

Hospital web sites in China advertise short waiting times for organ transplants.
Transplants from long-dead donors are not viable because of organ deterioration after death. If we take these hospitals' self-promotions at face value, they tell us that there are a number of people now alive who are available almost on demand as sources of organs.

The wait times for organ transplants for organ recipients in China appear to be much lower than anywhere else. The China International Transplantation Assistant Centre website says, "It may take only one week to find out the suitable (kidney) donor, the maximum time being one month..." [23]. It goes further, "If something wrong with the donor's organ happens, the patient will have the option to be offered another organ donor and have the operation again in one week." [24] The site of the Oriental Organ Transplant Centre in early April, 2006, claimed that "the average waiting time (for a suitable liver) is 2 weeks." [25] The website of the Changzheng Hospital in Shanghai says: "...the average waiting time for a liver supply is one week among all the patients" [26]. In contrast, the median waiting time in Canada was 32.5 months in 2003 and in British Columbia it was even longer at 52.5 months [27].

[23] http://en.zoukiishoku.com/list/qa2.htm Archived page: http://archive.edoors.com/render.php?uri=http%3A%2F%2Fen.zoukiishoku.com%2Flist%2Fqa2.htm&x=19&y=11

If as indicated the survival period for a kidney is between 24-48 hours and a liver about 12 hours [28], the presence of a large bank of living kidney-liver "donors" must be the only way China's transplant centres can assure such short waits to customers. The astonishingly short waiting times advertised for perfectly-matched organs would suggest the existence of both a computer matching system for transplants and a large bank of live prospective 'donors'.

The advertisements do not identify Falun Gong practitioners as the source of these organs. But there are no other identified sources.
Even if the Falun Gong as the sources of these organs is only an allegation, it is the only allegation we have. No other large body of people now alive has been identified to us as a source of organs sufficient in numbers to meet the large number of transplant demands now being made and met in China.

[24] http://en.zoukiishoku.com/list/volunteer.htm Archived at: http://archive.edoors.com/render.php?uri=http%3A%2F%2Fen.zoukiishoku.com%2Flist%2Fvolunteer.htm+&x=8&y=9
[25] The front page has been altered. The archived page is at: http://archive.edoors.com/content5.php?uri=http://www.ootc.net/special_images/ootc1.png
[26] http://www.transorgan.com/apply.asp Archived at: http://archive.edoors.com/render.php?uri=http%3A%2F%2Fwww.transorgan.com%2Fapply.asp&x=15&y=8
[27] Canadian Organ Replacement Register, Canadian Institute for Health Information, (http://www.cihi.ca/cihiweb/en/downloads/CORR-CST2005_Gill-rev_July22_2005.ppt), July 2005
[28] Donor Matching System, The Organ Procurement and Transplantation Network (OPTN) http://www.optn.org/about/transplantation/matchingProcess.asp

13) Incriminating Information on Websites

Some of the material available on the websites of various transplant centres in China before March 9, 2006 (when allegations about large-scale organ seizures resurfaced in Canadian and other world media) is also inculpatory. Understandably, a good deal of it has since been removed. So these comments will refer only to sites that can still be found at archived locations, with the site locations being identified either in the comments or as footnotes. A surprising amount of self-accusatory material is still available as of the final week of June, 2006 to web browsers.
We list here only four examples:

(1) China International Transplantation Network Assistance Centre Website (http://en.zoukiishoku.com/) (Shenyang City)

This website as of May 17, 2006 indicated in the English version (the Mandarin one evidently disappeared after March 9) that the centre was established in 2003 at the First Affiliated Hospital of China Medical University "...specifically for foreign friends. Most of the patients are from all over the world." The opening sentence of the site introduction [29] declares that "Viscera (one dictionary definition: "soft interior organs...including the brain, lungs, heart etc") providers can be found immediately!" On another page [30] on the same site is this statement: "...the number of kidney transplant operations is at least 5,000 every year all over the country. So many transplantation operations are owing to the support of the Chinese government. The supreme demotic court, supreme demotic law - officer, police, judiciary, department of health and civil administration have enacted a law together to make sure that organ donations are supported by the government. This is unique in the world."

[29] The original page has been altered. Older versions with that specific statement can still be found at Internet Archive: http://web.archive.org/web/20050305122521/http://en.zoukiishoku.com/
[30] http://en.zoukiishoku.com/list/facts.htm or use archived version at: http://archive.edoors.com/render.php?uri=http%3A%2F%2Fen.zoukiishoku.com%2Flist%2Ffacts.htm&x=24&y=12

In the 'question and answer' section of the site are found:

"Before the living kidney transplantation, we will ensure the donor's renal function...So it is more safe than in other countries, where the organ is not from a living donor." [31]

"Q: Are the organs for the pancreas transplant(ed) from brain death (sic) (dead) patients?"
"A: Our organs do not come from brain death victims because the state of the organ may not be good."
[32]

(2) Orient Organ Transplant Centre Website (http://www.ootc.net) (Tianjin City)

On a page which we were informed was changed in mid-April (but can still be viewed as an archive [25]) is the claim that from "January 2005 to now, we have done 647 liver transplants - 12 of them done this week; the average waiting time is 2 weeks." A chart also removed about the same time (but still available in archive [33]) indicates that from virtually a standing start in 1998 (when it managed only 9 liver transplants) by 2005 it had completed fully 2248 [34].

In contrast, according to the Canadian Organ Replacement Register [27], the total in Canada for all kinds of organ transplants in 2004 was 1773.

[31] http://en.zoukiishoku.com/list/qa.htm or use archived version: http://archive.edoors.com/render.php?uri=http%3A%2F%2Fen.zoukiishoku.com%2Flist%2Fqa.htm&x=27&y=10
[32] http://en.zoukiishoku.com/list/qa7.htm or use archived version: http://archive.edoors.com/render.php?uri=http%3A%2F%2Fen.zoukiishoku.com%2Flist%2Fqa7.htm&x=35&y=10
[33] The front page has been altered. Archived at: http://archive.edoors.com/content5.php?uri=http://www.ootc.net/special_images/ooct_achievement.jpg and http://archive.edoors.com/content5.php?uri=http://www.ootc.net/special_images/ootc2.png
[34] The front page has been altered. Archived at: http://archive.edoors.com/content5.php?uri=http://www.ootc.net/special_images/ooct_case.jpg and http://archive.edoors.com/content5.php?uri=http://www.ootc.net/special_images/ootc1.png

(3) Jiaotang University Hospital Liver Transplant Centre Website (http://www.firsthospital.cn/hospital/index.asp) (Shanghai)

In a posting on April 26, 2006 [35], the sohu website says in part: "The liver transplant cases (here) are seven in 2001, 53 cases in 2002, 105 cases in 2003, 144 cases in 2004, 147 cases in 2005 and 17 cases in January, 2006".

(4) Website of Changzheng Hospital Organ Transplant Centre, affiliated with No.
2 Military Medical University (http://www.transorgan.com/) (Shanghai)

A page was removed after March 9, 2006. (Internet Archive page is available [36].) It contains the following graph depicting the number of liver transplants each year by this Centre:

In the "Liver Transplant Application" form [37], it states on the top, "...Currently, for the liver transplant, the operation fee and the hospitalisation expense together is about 200,000 yuan ($66,667 CND), and the average waiting time for a liver supply is one week among all the patients in our hospital...."

[35] http://www.health.sohu.com/20060426/n243015842.shtml Archived at: http://archive.edoors.com/content5.php?uri=http://health.sohu.com/52/81/harticle15198152.shtml
[36] The URL of the removed page as of March 2005 in the Internet Archive is http://web.archive.org/web/20050317130117/http://www.transorgan.com/about_g_intro.asp
[37] http://www.transorgan.com/apply.asp Archived at: http://archive.edoors.com/render.php?uri=http%3A%2F%2Fwww.transorgan.com%2Fapply.asp&x=15&y=8

14) Victim interviews

We conducted a number of interviews with victims of Falun Gong repression in China who now reside in Canada. These interviews revealed activities by the authorities which, while inconclusive in isolation, in context with everything else we considered, were corroborative and consistent with the allegations.

(1) Ms. Yuzhi Wang, Vancouver

One of us met with Ms. Wang in Toronto on May 27th at a location at the University of Toronto and heard her deeply disturbing personal history. For being a Falun Gong practitioner and therefore suddenly "an enemy of the people" only as of mid-1999, she spent most of her time between 2000 and the end of 2001 in labour camps, with 20-50 persons squeezed into a cell of approximately 15 square metres.
By late 2001, near death from various forms of torture over a lengthy period for refusing to give up her beliefs, she was sent to hospital for "treatment", which included approximately three months of forced feeding after she began a hunger strike in desperation and more beatings by thugs from the 6-10 office. In Harbin, Wang was examined thoroughly at several hospitals, and the examining doctors indicated that she had organ damage. Later, when she overheard a doctor say that she would not recover, the 6-10 office personnel "suddenly lost interest in me and I (eventually) managed to escape from the hospital."

When in time her health recovered, she found a way to relocate to a country in the Middle East, but even there 6-10 agents attempted to kidnap her because she was criticizing the Jiang regime to tourists visiting there from China. Wang gives much credit to Canadian immigration officials there for intervening and enabling her to come to Canada as a refugee. She is convinced that she survived only because her captors in Harbin concluded they could not profit by selling her organs, which they concluded were damaged by their "treatments".

(2) Mr. Xiaohua Wang, Montreal

At a meeting on May 27th, Mr. Wang provided a detailed statement of his periods of persecution by officials during 2001 and 2002. It began when police arrested him at a Kunming city design institute where he worked as an engineer, ransacked his home, stole his computer, and took him to prison. His wife and two-year-old child could only scream at the departing police vehicle. In jail, he was beaten into unconsciousness by long term inmates on the order of the warden, whose constant mantra was, "Beating is the only way to treat (Falun Gong)".

Wang was later transferred to the local "brainwashing centre". When released, he fled to a distant region of the country without his family, where he found work until he was again arrested as one of the 6-10 office's "most wanted criminals".
He ended up at the Yunnan forced labour camp #2, which manufactures artificial gems and crystals for export through the application of chromium oxide in the manufacturing process. For refusing to recant his Falun Gong beliefs, Wang was kept there for almost two years. His hair turned gray from the constant exposure to the chemical and 16-hour work days. In January, 2002, the local hospital did a comprehensive physical examination on every Falun Gong prisoner, including an electrocardiogram, whole body x-ray, and liver, blood and kidney tests. Beforehand, he was told by the police: "The Communist party cares about you so much. They want to transform Falun Gong at all costs." Little knowing the probable real purpose of the tests at the time, he cooperated. Miraculously, he managed to get out of China and get to Canada in early 2005. He also praises Canadian immigration officials for getting him and his family out speedily.

(3) Ms. Na Gan, Toronto

Ms. Gan worked as a customs officer at the Beijing International Airport for 11 years until mid-July, 1999, when she and five other Falun Gong practitioners attempted to avail themselves of each citizen's specified constitutional right to petition at a designated location near the CCP headquarters in central Beijing. Police beat the group and dragged all of them into waiting buses. Thereafter, she was incarcerated on five further occasions because she refused to renounce Falun Gong. When a psychiatrist examined her in a hospital and pronounced her mentally fit, the police still kept her in a locked room there for eight days with patients who were screaming. When she later unfurled a banner in Tiananmen Square, saying "truthfulness-compassion-tolerance", she was kicked by police. Back in custody, she was beaten by other prisoners at the direction of officials and forced to stand for hours in the snow without an overcoat.
In March, 2000, the banner incident got her a one year sentence under house arrest, expulsion from the Communist party, and termination of her salary. By the year's end, she was back in a crowded cell with mostly Falun Gong adherents. When she refused to read aloud an article defaming Falun Gong, a policeman kicked her in the head repeatedly. She was next moved to the Beijing women's labour camp, where the treatment was so severe that she finally signed a pledge to renounce Falun Gong. Fearing further persecution, she managed to leave China for Canada as an immigrant in May, 2004, but without her husband and daughter.

Gan's observations relative to organ harvesting are probably inconclusive. Numerous Falun Gong prisoners with her in detention in Beijing - some cells holding as many as 30 women - were identified by four digit numbers only. One night, she was awakened by noises, only to find the next morning that some of the numbered inmates had been dragged from their cells and never returned. One cannot fairly conclude the worst here without knowing more. For five months in mid-2001, she was part of a forced labour team of approximately 130 mostly female Falun Gong prisoners. Only the Falun Gong members in the group were taken by soldiers to a nearby police hospital for blood and urine tests, x-rays, and eye examinations. This medical attention seemed to her at the time completely out of character with everything else experienced at the camp. Only later did she learn about the organ harvesting occurring across China.

15) Human rights violations generally

Falun Gong are not the only victims of human rights violations in China. It is incontestable that the organs of prisoners sentenced to death are harvested after execution. Besides Falun Gong, other prime targets of human rights violations are Tibetans, Christians, Uighurs, democracy activists and human rights defenders.
Rule of Law mechanisms to prevent human rights violations, such as an independent judiciary, access to counsel on detention, habeas corpus, and the right to public trial, are glaringly absent in China. China, according to its constitution, is ruled by the Communist Party. It is not ruled by law. This overall pattern of human rights violations, like many other factors, does not in itself prove the allegations. But it removes an element of disproof. It is impossible to say of these allegations that they are out of step with an overall pattern of respect for human rights in China. While the allegations, in themselves, are surprising, they are less surprising with a country that has the human rights record China has than they would be for many other countries.

16) Financial considerations

In China, organ transplanting is a very profitable business. We can trace the money of the people who pay for organ transplants to specific hospitals which do organ transplants, but we cannot go further than that. We do not know who gets the money the hospitals receive. Are doctors and nurses engaged in criminal organ harvesting paid exorbitant sums for their crimes? That was a question it was impossible for us to answer, since we had no way of knowing where the money went.
China International Transplantation Network Assistance Centre Website (http://en.zoukiishoku.com/) (Shenyang City)

Before its indicated removal from the site 38 in April, 2006, the size of the profits for transplants was suggested in the following price list:

Kidney            US$62,000
Liver             US$98,000-130,000
Liver-kidney      US$160,000-180,000
Kidney-pancreas   US$150,000
Lung              US$150,000-170,000
Heart             US$130,000-160,000
Cornea            US$30,000

38 Yet, one can still go to the Internet Archive to find the information on this website from March 2006: http://archive.edoors.com/render.php?uri=http%3A%2F%2Fen.zoukiishoku.com%2Flist%2Fcost.htm+&x=16&y=11

A standard way of investigating any crime allegation where money changes hands is to follow the money trail. But for China, its closed doors mean that following the money trail is impossible. Not knowing where the money goes proves nothing. But it also disproves nothing, including these allegations.

17) Corruption

Corruption is a major problem across China. State institutions are sometimes run for the benefit of those in charge of them, rather than for the benefit of the people. Military hospitals across the country operate independently from the Ministry of Health and, while the figures for their organ transplants are secret, we understand they are large. Trafficking in Falun Gong vital organs would be consistent with the numerous other commercial activities on the part of the Chinese army, especially in the years up until 2004 while Jiang was chairman of the country's Military Commission. The widespread corruption of official Chinese institutions raises the question whether the harvesting of Falun Gong organs for transplants, if it does occur, happens as the result of official policy or as the result of the profiteering of individual hospitals, taking advantage of the defencelessness of a captive Falun Gong population in their regions.
The policy of repression of the Falun Gong means that they are in prison without rights, at the disposition of corrupt authorities. The incitement to hatred against the Falun Gong and their dehumanization means that they can be butchered and killed without qualms by those who buy into this official hate propaganda. Whether the harvesting of Falun Gong organs, if it does occur, happens as the result of official policy or unofficial corruption is difficult for us to determine with absolute certainty. Chinese officials, in theory in charge of the country, sometimes have substantial difficulty in determining whether corruption exists, let alone how to put an end to it. For us, on the outside, it is easier to form a conclusion on the result, whether or not the alleged organ harvesting occurs, than to determine whether this practice, if it exists, is the result of policy or corruption.

18) Legislation

China in March enacted legislation to take effect July 1 to ban sales of human organs and require that donors give written permission for their organs to be transplanted. The legislation is titled a "temporary regulation." The rules further limit transplant surgery to certain institutions. These institutions must verify that the organs are from legal sources. Hospital transplant ethics committees must approve all transplants in advance. This legislation is welcome. Yet, its very enactment highlights the fact that there is no such legislation in place now, the lawlessness now enveloping organ transplants. This very lawlessness, again, though it does not prove the allegations, removes a possible element of disproof. The absence of any legal constraints on organ transplants in China makes the allegations on which this report focusses easier to accept. Up to July 1st, Chinese law has allowed the buying and selling of organs. Chinese law has not required that donors give written permission for their organs to be transplanted.
There has been no restriction on the institutions which could engage in organ harvesting or transplants. Until July 1, there was no requirement that the institutions engaged in transplants had to verify that the organs being transplanted were from legal sources. There was no obligation to have transplant ethics committees approve all transplants in advance. As well, the fact that the legislation came into force on July 1 does not mean that the problem, if it existed, has ceased to exist since that date. In China, there is a large step between the enactment of legislation and its implementation. To take an obvious example, the 1982 Constitution of China provides that the people of China will turn China into a country with a high level of democracy. We are now twenty-four years from the enactment of that commitment to democracy. Yet China is far from democratic. The mere fact that China now has in force organ transplant legislation does not mean, in itself, that the legislation is implemented. Indeed, the overall record of China in implementing new legislation is such that the old practices for organ transplants, whatever they may happen to be, are likely to continue, at least in some places in China, for quite some time.

G. Credibility

We conclude that the verbal admissions in the transcripts of interviews of investigators can be trusted. There is no doubt in our minds that these interviews did take place with the persons claimed to be interviewed at the time and place indicated and that the transcripts accurately reflect what was said. Moreover, the content of what was said can itself be believed.
For one, when weighed against the recent international uproar about alleged organ seizures as the 2008 Beijing Olympics approach, the admissions made at the various institutions are contrary to the reputational interests of the government of China in attempting to convince the international community that the widespread killing of Falun Gong prisoners for their vital organs has not occurred. The testimony of the wife of the surgeon allegedly complicit in Falun Gong organ harvesting seemed credible to us, partly because of its extreme detail. However, that detail also posed a problem for us, because it provided a good deal of information which it was impossible to corroborate independently. We were reluctant to base our findings on sole source information. So, in the end, we relied on the testimony of this witness only where it was corroborative and consistent with other evidence, rather than as sole source information. In the course of our work, we have come across a number of people sceptical of the allegations. This scepticism has a number of different causes. Some of the scepticism is reminiscent of the statement of U.S. Supreme Court Justice Felix Frankfurter in 1943 to a Polish diplomat in reaction to being told by Jan Karski about the Holocaust. Frankfurter said: "I did not say that this young man was lying. I said that I was unable to believe what he told me. There is a difference." The allegations here are so shocking that they are almost impossible to believe. The allegations, if true, would represent a grotesque form of evil which, despite all the depravities humanity has seen, would be new to this planet. The very horror makes us reel back in disbelief. But that disbelief does not mean that the allegations are untrue.

H. Further Research

Obviously, this report is not the final word on this subject. There is much that we ourselves, given the opportunity, would rather do before we completed the report.
But it would mean pursuing avenues of investigation which are not now open to us. We will welcome any comments on its contents or any additional information individuals or governments might be willing to provide. We would like to see Chinese hospital records of transplants. Are there consents on file? Are there records of sources of organs? Donors can survive many forms of transplant operations. No one can survive a full liver or heart donation. But kidney donations are normally not fatal. Where are the surviving donors? We would like to do a random sampling of donations to see if we could locate the donors. Family members of deceased donors should either know of the consents of the donors or, alternatively, have given the consents themselves. Here, too, we would like to do a random sampling of immediate family members of deceased donors to see if the families either consented themselves to the donations or were aware of the consent of the donor. China has engaged in a major expansion of organ transplant facilities in recent years. This expansion likely would have been accompanied by feasibility studies indicating organ sources. We would like to see these feasibility studies. Ideally, we would like to pursue further research before we come to any firm conclusions. But the very willingness to engage in further research may require the forming of tentative conclusions. If we could decide now that there is nothing in the allegations, we might well further conclude that additional research would be pointless.

I. Conclusions

Based on what we now know, we have come to the regrettable conclusion that the allegations are true. We believe that there has been and continues today to be large scale organ seizures from unwilling Falun Gong practitioners.
We have concluded that the government of China and its agencies in numerous parts of the country, in particular hospitals but also detention centres and 'people's courts', since 1999 have put to death a large but unknown number of Falun Gong prisoners of conscience. Their vital organs, including hearts, kidneys, livers and corneas, were virtually simultaneously seized involuntarily for sale at high prices, sometimes to foreigners, who normally face long waits for voluntary donations of such organs in their home countries. How many of the victims were first convicted of any offence, serious or otherwise, in legitimate courts, we are unable to estimate because such information appears to be unavailable both to Chinese nationals and foreigners. It appears to us that many human beings belonging to a peaceful voluntary organization made illegal seven years ago by President Jiang because he thought it might threaten the dominance of the Communist Party of China have been in effect executed by medical practitioners for their organs. Our conclusion comes not from any one single item of evidence, but rather the piecing together of all the evidence we have considered. Each portion of the evidence we have considered is, in itself, verifiable and, in most cases, incontestable. Put together, they paint a damning whole picture. It is their combination that has convinced us.

J. Recommendations

1) It goes without saying that the harvesting of organs of unwilling Falun Gong practitioners, if it is happening, as we believe it is, should cease.

2) Organ harvesting of unwilling donors where it is either systematic or widespread is a crime against humanity. We are not in a position, with the resources and information at our disposal, to conduct a criminal investigation. Criminal authorities in China should investigate the allegation for possible prosecution.
3) Governmental, non-governmental and inter-governmental human rights organizations with far better investigative capacity than ours should take these allegations seriously and make their own determinations whether or not they are true.

4) Article 3 of the United Nations Protocol to Prevent, Suppress and Punish Trafficking in Persons bans, among other practices, ... the removal of organs. Governments should request the relevant agency of the UN (we would suggest the UN Committee Against Torture and the UN Special Rapporteur on Torture) to investigate if the government of China has engaged in, or is engaging in now, violations of any of the terms of Article 3. If so, the necessary steps to seek a remedy should be initiated with deliberate haste.

5) Until the Chinese law on organ transplants is effectively implemented, foreign governments should not issue visas to doctors from China seeking to travel abroad for the purpose of training in organ or bodily tissue transplantation. Any doctor in China known to be involved in trafficking in the organs of prisoners should be barred entry by all foreign countries permanently.

6) All states should strengthen their laws against the crime of trafficking in organs. The laws should require doctors to report to the authorities of their country any evidence suggesting that a patient has obtained an organ from a trafficked person abroad, defined to include persons in detention abroad.

7) All states should prevent and, at the very least, discourage their nationals from obtaining organ transplants in China until the Chinese law on organ transplants is rigorously implemented. States should, if necessary, deny passports or revoke passports of those who are travelling to China for organ transplants.
8) Until the international community is satisfied that the new Chinese law on organ transplants is effectively implemented, foreign funding agencies, medical organizations and individual health professionals should not participate in any Government of China-sponsored organ transplant research or meetings. Foreign companies which currently provide goods and services to China's organ transplant programs should cease and desist immediately until the government of China can demonstrate that its law on organ transplants is effective.

9) The current form of dialogue between Canada and China over human rights should cease. Canadian political scientist and former diplomat Charles Burton recently declared the dialogue a charade. In hindsight, the Government erred in agreeing to the talk fests in exchange for Canada no longer co-sponsoring the yearly motion criticizing China's government at the then UN Human Rights Commission.

10) The repression, imprisonment and severe mistreatment of Falun Gong practitioners must stop immediately.

11) All detention facilities, including forced labour camps, must be opened for international community inspection through the International Committee of the Red Cross or other human rights or humanitarian organizations.

12) Chinese hospitals should keep records of the source of every transplant. These records should be available for inspection by international human rights officials.

13) Every organ transplant donor should consent to the donation in writing. These consents should be available for inspection by international human rights officials.

14) China and every other state now party to the Convention against Torture, including Canada, should accede to the Optional Protocol to the Convention against Torture.

15) Every organ transplant, both donation and receipt, should have official approval from a government supervisory agency before the transplant takes place.

16) Organ harvesting from executed prisoners should cease immediately.
17) Commercialization of organ transplants should cease. Organ transplants should not be for sale.

K. Commentary

To accept the first recommendation would mean accepting that the allegations are true. All the other recommendations we make do not require accepting that the allegations are true. We suggest adoption of these other recommendations in any case. To accept the next three recommendations would mean giving at least some credence to the allegations. The next three recommendations do not require accepting the allegations as true; but they make sense only if there is a reasonable possibility the allegations are true. The remaining recommendations make sense and could be implemented whether the allegations are true or false. The next five recommendations are addressed to the international community, asking the community to promote respect within China of international standards about organ transplants. We are well aware that the Government of China denies the allegations. We suggest that the most credible and effective way for the Government of China to assert that denial is to implement all of the remaining recommendations in this report after the first eight recommendations. If the remaining recommendations were implemented, the allegations considered here could no longer be made. To all those who are sceptical about the allegations, we ask you to ask yourself what you would suggest to prevent, in any state, allegations like these from becoming true. The common sense precautions to prevent the sort of activity here alleged have pretty much all been missing in China. Until the recent legislation was in force, many basic precautions to prevent the abuses here alleged from happening were not in place. That legislation does not fill the gap unless and until it is comprehensively implemented.
Every state, and not just China, needs to lay in its defences in order to prevent the harvesting of organs from the unwilling, the marginalized, the defenceless. Whatever one thinks of the allegations, and we reiterate we believe them to be true, China is remarkably undefended to prevent the sorts of activities here alleged from happening. There are many reasons why the death penalty is wrong. Not least is the desensitization of the executioners. When the state kills defenceless human beings already in detention for their crimes, it becomes all too easy to take the next step, harvesting their organs without their consent. This is a step China undoubtedly took. When the state harvests the organs of executed prisoners without their consent, it is another step that becomes all too easy and tempting to take to harvest the organs of other vilified, depersonalised, defenceless prisoners without their consent, especially when there is big money to be made from it. We urge the government of China, whatever they think of the allegations considered here, to build up their defences against even the slightest possibility of the harvesting of organs from unwilling Falun Gong practitioners.

All of which is respectfully submitted,

(Signature)                  (Signature)
____________________         _______________________
David Matas                  David Kilgour

Ottawa 6 July 2006

(APPENDICES 1-12 are in a separate file.)

APPENDIX 13

TRANSCRIPT OF INTERVIEW

Interview With Ex-Wife Of A Chinese Surgeon Who Removed Corneas Of Falun Gong Practitioners

On May 20, 2006, Mr. David Kilgour conducted an interview in the United States with the ex-wife of a Chinese surgeon who removed corneas of Falun Gong prisoners. The following transcript was abridged and edited to protect those who may be in danger due to the publishing of this interview.

W - Ex-wife of a Chinese surgeon who removed corneas of Falun Gong Practitioners.
A - Another person who was also present at the Interview raised 2 questions.

Kilgour: ...
The closest person who saw this happen is "W". ... In 2001 when did the procurement of food supplies for [Sujiatun Hospital] go up?
W: About July in the summer.
Kilgour: July 2001. You were in the accounting department?
W: Statistics and Logistics Department.
Kilgour: Statistics and Logistics Department. What happened? The procurement of food went up first and then the surgical equipment?
W: In July 2001, there were many people working in the Statistics and Logistics Department. Some of them from procurement brought the receipts to me for signature after they made the purchase. On the receipt I noted sharp increases in the food supplies. Also the people in charge of the logistics were delivering meals to the facilities where Falun Gong practitioners were detained. Other medical staff came to our department to report the purchase of the medical equipment. From the receipts, the medical equipment supplies also sharply increased.
Kilgour: By the way, the facilities to detain Falun Gong practitioners, was it the underground facilities?
W: In the backyard of the hospital, there were some one-story houses typically built for construction workers. After several months, the consumption of food and other supplies gradually decreased. At that time people guessed that maybe the detainees were sent to an underground facility.
Kilgour: When did the supply decrease? September? October?
W: After about 4 or 5 months.
Kilgour: End of 2001?
W: Yes.
Kilgour: How much of an increase did you estimate it was from the food [receipts you saw]? How many people did you estimate were there?
W: The person in charge of getting the food and in charge of sending food to Falun Gong practitioners detained told me that there were about 5000 to 6000 practitioners. At the time, a lot of public security bureaus and hospitals in many areas were detaining many Falun Gong practitioners. A lot of people working at the hospital including me were not Falun Gong practitioners. So we didn't pay attention.
If it were not for what happened in 2003 when I found my ex-husband was directly involved in it, I probably wouldn't be interested in this at all. A lot of the staffers working in our department are family members of the officials in government health care system. For some matters, we knew it in our heart but none of us would discuss these things.
Kilgour: When they decreased the procurement, where did you think the practitioners went?
W: We thought they were released.
Kilgour: At the end of 2001, you thought they were released?
W: Yes.
Kilgour: All 5000 had been released?
W: No, there were still Falun Gong practitioners detained in the hospital, but the number was gradually decreasing. Later in 2003, I learned that Falun Gong practitioners were transferred to the underground complex and other hospitals, because our hospital couldn't hold so many people.
Kilgour: They left the houses or cabins at the backyard to go to underground?
W: Yes, I later got to know these in 2002.
Kilgour: Did you say that you were not the person to send the food to them when practitioners were detained at the houses or cabins at the backyard?
W: No, I was not.
Kilgour: Did you know who supplied their meals after they left your jurisdiction?
W: I didn't know.
Kilgour: I heard a lot of these people were killed for their organs. 2001 and 2002. Was it the correct understanding?
W: During the years of 2001-2002, I didn't know anything about organ harvesting. I only knew the detaining of these people.
Kilgour: So you didn't discover this until your husband told you in 2003.
W: Right.
Kilgour: Did he tell you that in 2001-2002 he had already started doing these operations?
W: Yes, he started from 2002.
Kilgour: Your former husband began in 2002?
W: Yes.
Kilgour: Did you roughly know if there were [organ removal] operations since 2001?
W: The operations started in 2001, some were done in our hospital, and some were done at other hospitals in the region. I found out in 2003.
At the beginning he also did the operations, but he did not know they were Falun Gong practitioners. He was a neuro-surgeon. He removed corneas. Starting from 2002 he got to know those he operated on were Falun Gong practitioners. Because our hospital was not an organ transplant hospital. It was only in charge of removal. How these organs were transplanted, he didn't know.
Kilgour: Your ex-husband started to take organs from Falun Gong practitioners starting from when?
W: At the end of 2001, he started to operate, but he didn't know these live bodies were Falun Gong practitioners. He got to know that in 2002.
Kilgour: What kind of organs did he take out?
W: Corneas.
Kilgour: Just corneas?
W: Yes.
Kilgour: Were these people alive or dead?
W: Usually these Falun Gong practitioners were injected with a shot to cause heart failure. During the process these people would be pushed into operation rooms to have their organs removed. On the surface the heart stopped beating, but the brain was still functioning, because of that shot.
Kilgour: What was the injection called?
W: I don't know the name of it but it caused heart failure. I was not a nurse or a doctor. Don't know the names of the injection.
Kilgour: Causing heart failure, most, or all or some cases?
W: For most people.
Kilgour: So he would take corneas of these people, then what happened to these people?
W: These people were pushed to other operation rooms for removals of heart, liver, kidneys, etc. During one operation when he collaborated with other doctors, he learned they were Falun Gong practitioners, that their organs were removed while alive, and that it was not just cornea removal; they were removing many organs.
Kilgour: They did it in different rooms, didn't they?
W: In the later period of time, when these doctors cooperated together, they started doing the operations together. At the beginning, fearing information could leak out, different organs were removed by different doctors at different rooms.
Later on when they got money, they were no longer afraid any more. They started to remove the organs together. For other practitioners who were operated on in other hospitals, my ex-husband didn't know what happened to them afterwards. For the practitioners in our hospital, after their kidneys, liver, etc. and skin were removed, there were only bones and flesh, etc. left. The bodies were thrown into the boiler room at the hospital. In the beginning, I did not fully believe this had happened. For some doctors who had operation accidents, they may form some illusions. So I checked with other doctors and other officials from the government health care system.
Kilgour: In 2003 or 2002?
W: 2003.
Kilgour: Your husband only did corneas?
W: Yes.
Kilgour: How many cornea operations did your ex-husband perform?
W: He said about 2000.
Kilgour: Corneas of 2000 people, or 2000 corneas?
W: Corneas of around 2000 people.
Kilgour: This is from 2001 to 2003?
W: From the end of 2001 to October 2003.
Kilgour: That was when he left?
W: It was the time that I got to know this and he stopped doing it.
Kilgour: Where did these corneas go?
W: It was usually collected by other hospitals. There was an existing system handling such business of removal and sales of the organs to other hospitals or other areas.
Kilgour: Nearby or far away?
W: I don't know.
Kilgour: All the heart, liver, kidneys and corneas go off to other hospitals?
W: Yes.
Kilgour: Did you know what prices they sold them for?
W: I don't know at the time. However, in year 2002, a neighbor had a liver transplant. It cost 200,000 yuan. The hospital charged a little bit less for Chinese than foreigners.
Kilgour: Which year, 2001 or 2002?
W: 2002.
Kilgour: What was the husband told? How did they justify? These were perfectly healthy people...
W: In the beginning, he wasn't told anything. He was asked to help out in other hospitals.
However, every time he did such a favor, or provided this kind of help, he got lots of money, and cash awards. Several dozens of times his normal salary.
Kilgour: What was the total amount of money he got out of the 2000 cornea removals?
W: Hundreds of thousands of US dollars.
Kilgour: Were they paid in US dollars?
W: Paid in Chinese yuan. Equivalent to hundreds of thousands of US dollars.
Kilgour: How many doctors were working on these organ removals in the hospital, and in which area? Are we talking about 100 doctors or dozens, or 10?
W: I don't know how many people were doing it specifically. But I know that about 4 or 5 doctors who were acquaintances of ours at our hospital were doing it. And in other hospitals, doctors of general practice were also doing this.
Kilgour: Are there any records in the statistics department regarding how many people were operated upon?
W: There was no proper procedure or paper work for this kind of operations. So there was no way to count the number of operations in the normal way.
Kilgour: After practitioners transferred underground at the end of 2001, did you know where their food supplies were from?
W: Food still came from our department. Just the amount gradually decreased. At the end of 2001 we thought they were released. In 2003, I learned that they were not released but were transferred to underground or other hospital.
Kilgour: Was the underground facility run by the military army or by the hospital? You said food was still from the hospital.
W: We weren't responsible for the procurement of the food for the people detained and kept underground. That is why there is so much difference in the procuring of food when people were transferred to the underground complex. But the food of some of the detainees was provided by the hospital, and others were not. The decrease of food was not proportional to the decrease of the number of detainees.
Kilgour: What did your husband tell you about the underground facility?
5000 people killed, or more than 5000? W: He didn't know how many people were detained underground. He only heard from some others that people were detained underground. If three operations were done every day, after several years of operation, for the 5000-6000 people, not many people would be left. This whole scheme and the trading of organs were organized by the government health care system. The doctors' responsibility was simply to do what they were told to do. Kilgour: He didn't go down to the underground facility himself? W: He didn't. Kilgour: Rudimentary operation in the underground facility? W: He had never been there. Kilgour: All of those people, were they dead when they were operated on? Or their hearts stopped? Did he know what they were killed afterwards? They weren't yet dead. W: At the beginning, he didn't know these were Falun Gong practitioners. As time went by, he knew they were Falun Gong practitioners. When they did more of these removals of organs and became bold, these doctors started to do the removals together - this doctor extracted the cornea; another doctor removed the kidney; the third doctor took out the liver. At that time, this patient, or this Falun Gong practitioner, he knew what was the next step to treat the body. (Translator added the translation of the two missed sentences: Yes, the heart stopped beating, but they were still living.) If the victim's skin was not peeled off and only internal organs were removed, the openings of the bodies would be sealed and an agent would sign the paperwork. The bodies would be sent to the crematorium near the Sujiatun area. Kilgour: Only when the skin was removed would they be sent to the boiler room? W: Yes. Kilgour: Usually what was the "supposed" cause of death given? W: Usually no specific reason when the bodies were sent to the crematorium. Usually the reasons were "The heart stopped beating", "heart failure".
When these people were rounded up and detained, nobody knew their names or where they were from. So when they were sent to the crematorium, nobody could claim their bodies. Kilgour: Who administered the drug to cause the heart to stop beating? W: Nurse. Kilgour: Nurse working for the hospital? W: Nurses brought over by these doctors. Doctors including my ex-husband came to this hospital in 1999 or 2000. He brought his nurse over. When organ harvest first started, nurses were assigned to the doctors. Wherever the doctors go, their nurses go with them as far as the organ removal operations were concerned. These nurses were not like personal secretaries. In 2003, government health authorities sent many doctors involved in organ removal operations to an area sealed by the government because of SARS. These doctors believed they were sent there to let them live or die over there. I mean the government already wanted to put to death secretly the first group involved in organ removal. So they sent them to a SARS-affected area in Beijing. From that point on my husband realized that there was danger in doing this and that any time, he could be killed and done away with as an accomplice. Later when he wanted to quit, someone did try to kill him. Kilgour: In the hospital? W: Outside the hospital. Kilgour: Can you give us more details? W: At the end of 2003 after I learned about the issue, he came back from Beijing. He could no longer live a normal life. After I knew about it, he listened to my advice and decided to quit doing it. He submitted the resignation letter. It was around the new year of 2004. In February 2004, after his resignation was granted, the last month working in the hospital, he was finishing open ends at his work. During that time we received phone threats at home. Someone said to him, "You watch out for your life." One day we got off work in the afternoon. There were 2 people walking toward us trying to assassinate him.
If you were a woman, I would show you my scar, because I pushed him aside and took the stab. Men do not have a very good sixth sense, so he kept walking. When I realized the 2 people were going to pull the knife to stab him, I pushed him aside and took the stab for him. Many people came over and I was sent to the hospital. These two men ran away. Kilgour: Which side? (Location of the scar) W: Right side. Kilgour: Do you know who these two people were? W: I didn't know in the beginning. Later I knew. Kilgour: Who were they? W: I learned that they were thugs hired by the government health authorities. Kilgour: How did you find that out about these two? W: Because my family was part of the government health care system. My mom used to be a doctor. After these things happened, our friends suggested we get a divorce so it would separate our children and me from my husband. After all, our children and I didn't participate in any of these. So we were divorced at the end of 2003, very close to the new year of 2004. Kilgour: How many did you think were still alive? W: Initially I estimated there were about 2000 people left at the time I left China in 2004. But I cannot give a figure anymore, because China is still arresting Falun Gong practitioners and there have been people coming in and going out. So I cannot give a figure now any more. Kilgour: How did you come to this number 2000 in 2004? W: According to how many my ex-husband did and how many other doctors did. And how many were sent to other hospitals. Good doctors are well connected within the health care system. Many of them used to be classmates in medical schools. The number was estimated by the few doctors involved. When we were together in private, they discussed how many people in total. At that time, these doctors did not want to continue. They wanted to go to other countries or transfer to other fields. So the total number of deaths was calculated and derived by these doctors involved.
Kilgour: What is their estimate of how many people were killed? W: They estimated 3000-4000 people. Kilgour: This is the estimate by all of the doctors? W: No. By three doctors we were familiar with. Kilgour: Do you have anything else you want to say? W: Chinese or non-Chinese, they think it is impossible Sujiatun detained so many Falun Gong practitioners. They focused on just this Sujiatun hospital. Because most people do not know there are underground facilities. I want to say, even if things were over for Sujiatun, in other hospitals this issue continues. Because I worked in Sujiatun, I know about Sujiatun. Other hospitals and detention centers, inspecting and putting control on these facilities will help reduce the deaths. For Chinese people, one person comes out, there are still family members in China. They still dare not come out to speak the truth. They are afraid it could put their family members in danger. It doesn't mean that they don't know about it. A: Does your mother know about what you are doing? W: Yes. A: Does she still work in the government health care system? W: No. She retired a long time ago. She is almost 70 years old. APPENDIX 14 TRANSCRIPT OF TELEPHONE INVESTIGATIONS (1) Mishan City Detention Centre, Heilongjiang province (8 June 2006): M: Do you have Falun Gong [organ] suppliers? ... Mr. Li: We used to have, yes. M: ... what about now? Mr. Li: ... Yes. ..... M: Can we come to select, or you provide directly to us? Mr. Li: We provide them to you. M: What about the price? Mr. Li: We discuss after you come. ..... M: How many [Falun Gong suppliers] under age 40 do you have? Mr. Li: Quite a few. ..... M: Are they male or female? Mr. Li: Male ..... M: Now, for ... the male Falun Gong [prisoners], how many of them do you have? Mr. Li: Seven, eight, we have [at least] five, six now. M: Are they from countryside or from the city? Mr. Li: Countryside. (2) Shanghai's Zhongshan Hospital Organ Transplant Clinic (16 March 2006): M: Hi.
Are you a doctor? Doctor: Yes, I am... ..... M: ...So how long do I have to wait [for organ transplant surgery]? Doctor: About a week after you come... M: Is there the kind of organs that come from Falun Gong? I heard that they are very good. Doctor: All of ours are those types. (3) Qianfoshan City Liver Transplant Hospital, Shandong province (16 March 2006): Receptionist: Hold a second. I'll get a doctor for you. Doctor: Hello. How are you? M: ... How long have you been doing [these operations]?... Doctor: ... Over four years. .. M: The supply of livers.. the ones from Falun Gong, I want to ask if you have those types? Doctor: It is ok if you come here. M: So that means you have them? Doctor: ...In April, there will be more of these kinds of suppliers.., now gradually we have more and more. M: Why will there be more in April? Doctor: This I can't explain to you... (4) Nanning City Minzu Hospital in Guangxi Autonomous Region (22 May 2006): M: Could you find organs from Falun Gong practitioners? Dr. Lu: Let me tell you, we have no way to get (them). It's rather difficult to get it now in Guangxi. If you cannot wait, I suggest you go to Guangzhou because it's very easy for them to get the organs. They are able to look for them nationwide. As they are performing the liver transplant, they can get the kidney for you at the same time, so it's very easy for them to do. Many places where supplies are short go to them for help. ..... M: Why is it easy for them to get?... Lu: Because they are an important institution. They contact the judicial system in the name of the whole university. M: Then they use organs from Falun Gong practitioners? Lu: Correct... ..... M: ... What you used before (organs from Falun Gong practitioners), were they from detention centre(s) or prison(s)? Lu: From prisons. M: ... And it was from healthy Falun Gong practitioners...? Lu: Correct. We would choose the good ones because we assure the quality in our operation.
M: That means you choose the organs yourself. Lu: Correct... ..... M: Usually, how old is the organ supplier? Lu: Usually in their thirties. M: ... Then you will go to the prison to select yourself? Lu: Correct. We must select it. M: What if the chosen one doesn't want to have blood drawn? Lu: He will for sure let us do it. M: How? Lu: They will for sure find a way. What do you worry about? These kinds of things should not be of any concern to you. They have their procedures. M: Does the person know that his organ will be removed? Lu: No, he doesn't. (5) Shanghai Jiaotong University Hospital's Liver Transplant Centre (16 March 2006): M: I want to know how long [the patients] have to wait (for a liver transplant). Dr. Dai: The supply of organs we have, we have every day. We do them every day. M: We want fresh, alive ones. Dr. Dai: They are all alive, all alive... M: How many [liver transplants] have you done? Dr. Dai: We have done 400 to 500 cases...Your major job is to come, prepare the money, enough money, and come. M: How much is it? Dr. Dai: If everything goes smoothly, it's about RMB 150,000...RMB 200,000. M: How long do I have to wait? Dr. Dai: I need to check your blood type...If you come today, I may do it for you within one week. M: I heard some come from those who practise Falun Gong, those who are very healthy. Dr. Dai: Yes, we have. I can't talk clearly to you over the phone. M: If you can find me this type, I am coming very soon. Dr. Dai: It's ok. Please come. M: ...What is your last name?... Dr. Dai: I'm Doctor Dai. (6) Zhengzhou Medical University Organ Transplant Centre in Henan Province (14 March 2006): Dr. Wang: ...For sure, [the organ] is healthy... If it's not healthy, we won't take it. M: I've heard that those kidneys from Falun Gong practitioners are better. Do you have them? Wang: Yes, yes, we pick all the young and healthy kidneys... M: That is the kind that practises this type of [Falun] Gong. Wang: For this, you could rest assured.
Sorry I can't tell you much on the phone. M: Do you get (them) out of town? Wang: ... We have local ones and out-of-town ones. ..... M: What is your last name? Wang: Wang. (7) Oriental Organ Transplant Center (also called Tianjin City No 1 Central Hospital), Tianjin City (15 March 2006): N: Is this Chief-Physician Song? Song: Yes, please speak. ..... N: Her doctor told her that the kidney is quite good because he [the supplier] practises ... Falun Gong. Song: Of course. We have all those who breathe and with heart beat...Up until now, for this year, we have more than ten kidneys, more than ten such kidneys. N: More than ten of this kind of kidneys? You mean live bodies? Song: Yes it is so. (8) Tongji Hospital in Wuhan City, Hubei Province (30 March 2006): N: How many (kidney transplants) can you do in a year? Official: ... Our department is the one that does the most in the whole Hubei province. We do a lot if the organ suppliers are ample. N: ... We hope the kidney suppliers are alive. [We're] looking for live organ transplants from prisoners, for example, using living bodies from prisoners who practise Falun Gong. Is it possible? Official: It's not a problem. (9) General Hospital of Guangzhou Military Region, Guangdong Province (12 April 2006): N: Is this Dr. Zhu...? Zhu: Yes that's me. N: I'm from hospital 304. ...... I have two relatives in hospital 304. We don't have enough kidney supply right now. We did a lot of [kidney transplants] in 2001, 2002 and 2003... Zhu: Right... N: We found that kidneys from young people and Falun Gong [practitioners] are better. How about your hospital, such as kidneys from Falun Gong? Zhu: We have very few kidneys from Falun Gong. N: But you still have some? Zhu: It is not hard for [blood] type B. If you come here, we can arrange it quickly, definitely before May 1. N: There will be a batch before May 1? Zhu: Several batches. N: Will you have some after May 1?
Zhu: After May 1, you may need to wait until May 20 or later. ..... (10) First Detention Centre of Qinhuangdao City, Shangdong Province (18 May 2006): N: Is this the First Detention Centre of Qinhuangdao City? Official: What's up? N: We are doing kidney transplantations and we don't have enough organs. Official: You don't need to call here. You just call the court. N: Which court? Official: It is the Intermediate People's Court. You need to tell them about such things. N: In 2001, you provided live organs from young and healthy people who practised Falun Gong... Official: You don't need to talk about that time. It has been so many years. Right now it is with the court. You just call them. (11) The Second Detention Centre of Qinhuangdao City, Shangdong Province (18 May 2006): ..... N: ... I wonder if you still have live organ supplies from people such as those practising Falun Gong? Official: No, we don't have Falun Gong [organs] right now. There are very few people - almost none. During earlier 2000s there [were] many Falun Gong [organs]. ..... (12) The Qinhuangdao Intermediate People's Court, Shangdong Province (18 May 2006): ..... N: ... Can your court provide us with some live kidneys from young and healthy people? Official: No matter good or bad, we have none. There is no execution after the Spring festival... N: ... I mean live kidneys from young and healthy people who practise Falun Gong. You had a lot in 2001... Official: We had before... N: ... Not just the executed prisoners - such live organs as Falun Gong...? Official: No, what you said is in 2001. We have to face reality now... ..... (13) The First Criminal Bureau of the Jinzhou Intermediate People's Court (23 May 2006): N: Starting from 2001, we always (got) kidneys from young and healthy people who practise Falun Gong from detention centres and courts...I wonder if you still have such organs in your court right now? Official: That depends on your qualifications...
If you have good qualifications, we may still provide some... N: Are we supposed to get them, or will you prepare for them? Official: According to past experience, it is you that will come here to get them. N: ... What are the qualifications that we must have? Official: ... Let's say for now this year is very different from previous years. This year the situation is very tough...The policy is very strict. Several years ago we had a good relationship with Beijing, but recently it is very tense...It's all about mutual benefits... (14) Kunming Higher People's Court (31 May 2006): N: ... We contacted your court several times in 2001. Your court can provide us with those live kidney organs from those young and healthy Falun Gong practitioners...? Official: I am not sure about that. Such things are related to national secrets. I don't think this is something that we can talk about on the phone. If you want to know more information about these things, you'd better contact us in a formal way, okay? -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: 9CommentsOnCCP.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: InvestigationReport.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: OpenLetterFromEurop.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: WeaponsToRogue.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed...
Name: AssemblyLineOfOrganHarvesting.txt URL: From sean.hefty at intel.com Sun Jul 23 22:35:51 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Sun, 23 Jul 2006 22:35:51 -0700 Subject: [openib-general] [PATCH 0/4] Dispatch communication relatedevents to the IB CM In-Reply-To: <44C33366.2060400@voltaire.com> Message-ID: <000001c6aee3$0738f3e0$87f9070a@amr.corp.intel.com> >> All IB devices should allow posting of WR to the SQ while in the RTR state. > >Is this statement based on the IB spec, the mthca driver or both? Uhm... I thought I was quoting the IB spec (c10-27), but I kind of skipped over the part that mentioned the receive queue. My bad. - Sean From sean.hefty at intel.com Sun Jul 23 22:39:50 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Sun, 23 Jul 2006 22:39:50 -0700 Subject: [openib-general] [PATCH 3/4] IB CM: register and handle COMM_EST events on a QP In-Reply-To: <44C32DC6.2080405@voltaire.com> Message-ID: <000101c6aee3$951b95a0$87f9070a@amr.corp.intel.com> >> @@ -1015,20 +1089,33 @@ int ib_send_cm_req(struct ib_cm_id *cm_i >> cm_id_priv->local_ack_timeout = >> cm_req_get_primary_local_ack_timeout(req_msg); >> >> + cm_id_priv->timewait_info->local_ca_guid = req_msg->local_ca_guid; >> + cm_id_priv->timewait_info->local_qpn = cm_id_priv->local_qpn; >> + spin_lock_irqsave(&cm.lock, flags); >> + if (cm_insert_local_qpn(cm_id_priv->timewait_info)) >> + ret = -EADDRINUSE; >> + spin_unlock_irqrestore(&cm.lock, flags); > >why bother doing it here? can't the race happen only on the passive side >of the connection? The race can only happen on the passive side, but... for peer to peer connections, we won't know which side is passive until later, and this tracks the local QPN as part of timewait now. (Note that peer to peer support requires more work.) 
>Just to make sure, following reading the code/patch i understand the >case where the RTU does appear following the event is handled, what >would happen in this case is that an EINVAL status would be returned by >cm_rtu_handler to cm_work_handler and that's all. Correct. - Sean From sean.hefty at intel.com Sun Jul 23 22:43:42 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Sun, 23 Jul 2006 22:43:42 -0700 Subject: [openib-general] [PATCH 3/4] IB CM: register and handle COMM_EST events on a QP In-Reply-To: <44C337D4.7060807@voltaire.com> Message-ID: <000201c6aee4$1f54b760$87f9070a@amr.corp.intel.com> >+/** >+ * ib_cm_establish - Forces a connection state to established. >+ * @cm_id: Connection identifier to transition to established. >+ * >+ * This routine should be invoked by users who receive messages on a >+ * connected QP before an RTU has been received. >+ */ >+int ib_cm_establish(struct ib_cm_id *cm_id); > >As with your patch ib_cm_establish is now called internally from within >the CM with the suggested solution good for both user and kernel space >connections/qps, how about removing it from the API? I thought about removing it, but it changes the ABI, and existing users may still want to call ib_cm_establish() after processing a receive completion. This will let them immediately send a response without waiting for either the RTU or the COMM_EST event. - Sean From sean.hefty at intel.com Sun Jul 23 22:54:43 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Sun, 23 Jul 2006 22:54:43 -0700 Subject: [openib-general] svn checkin to merge upstream In-Reply-To: <20060723092933.GA26317@mellanox.co.il> Message-ID: <000301c6aee5$a925fa20$87f9070a@amr.corp.intel.com> >This one (below)? >Seems to do more than it says, and I now get a warning about the "out:" label >being unused. Is it just me? I missed the warning because I extracted this patch from my CM event handling patch, which reworked the error handling in this function. 
>Further, wouldn't > if (IS_ERR()) { > ret = PTR_ERR(cm_id_priv->timewait_info); > goto out; > } > >be more elegant, and a smaller patch? It would, but it's possible the CM event patch would just change it again anyway. I will check on that, and at least fix the warning. - Sean From sean.hefty at intel.com Sun Jul 23 23:00:47 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Sun, 23 Jul 2006 23:00:47 -0700 Subject: [openib-general] [librdmacm] ABI file in not being created by cma module, but the user level library tries to use it In-Reply-To: <200607231221.35857.jackm@mellanox.co.il> Message-ID: <000401c6aee6$8252e6a0$87f9070a@amr.corp.intel.com> >SVN 7079 corrected this in file "librdmacm/src/cma.c" so that if the ABI was >not available via "/sys/class/misc/rdma_cm/abi_version", we would assume that >the ABI version was the latest version. (If all was not OK with this >assumption in a specific execution instance, there would be downstream >failures which would be caught). >Unfortunately, SVN 7497 broke this fix. The fix needs to be re-instated. Yes - I inadvertently removed this fix when I removed libsysfs from the librdmacm. I will re-instate the fix. - Sean From mst at mellanox.co.il Sun Jul 23 23:18:20 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 24 Jul 2006 09:18:20 +0300 Subject: [openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM In-Reply-To: <000001c6ad0d$e483a290$e598070a@amr.corp.intel.com> References: <000001c6ad0d$e483a290$e598070a@amr.corp.intel.com> Message-ID: <20060724061820.GA8470@mellanox.co.il> Quoting r. Sean Hefty : > Subject: [PATCH 0/4] Dispatch communication related events to the IB CM > > The following set of patches forwards communication related events to the IB CM > for processing. Communication events of interest are communication established > and path migration, with only the former is currently handled by the IB CM. 
>
> This removes the need for users to trap for these events and pass the
> information onto IB CM. Communication established events can be handled by the
> ib_cm_establish() routine, but no mechanism exists to notify the IB CM of path
> migration. This adds the framework for doing so.
>
> Signed-off-by: Sean Hefty
>

Some things unclear to me:
1. Is a user somehow *required* to call ib_cm_establish if he gets the event?
   E.g. what about cma users?
2. There are now two ways CM gets notified about Communication established
   events - what happens if both the user calls ib_cm_establish, and CM gets
   this directly?
3. If I understand it correctly, CM does not yet know how to switch to the
   alternate path for CM messages. So, is the patch still useful by itself, or
   is it more of a framework preparatory work?

--
MST

From ogerlitz at voltaire.com Mon Jul 24 00:00:01 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Mon, 24 Jul 2006 10:00:01 +0300 (IDT)
Subject: [openib-general] [PATCH] IB/ipoib: fix crash on mcast join finish
Message-ID:

Roland,

This crash happens 1:1 with setting ipoib_debug_mcast, the fix applied by
the patch below is to set mcast->ah before the debug code attempts to access it.

This is 2.6.18 material, correct? also since the crash does not allow
to debug mcast, i guess the fix needs to go into OFED 1.1 as well.

Or.
ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0) ib0: Created ah ffff81002cdb1c00 Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP: [] :ib_ipoib:ipoib_mcast_join_finish+0x273/0x3af PGD 0 Oops: 0000 [1] SMP CPU 1 Modules linked in: ib_ipoib ib_sa autofs nfs lockd sunrpc sg st sd_mod sr_mod scsi_mod ib_mthca ib_mad ib_core i2c_amd8111 i2c_amd756 i2c_core e100 Pid: 3919, comm: ib_mad1 Not tainted 2.6.18-rc1 #6 RIP: 0010:[] [] :ib_ipoib:ipoib_mcast_join_finish+0x273/0x3af RSP: 0018:ffff81003eba9ca0 EFLAGS: 00010206 RAX: 0000000000000000 RBX: ffff81000ddcce80 RCX: 0000000000000012 RDX: 00000000000000ff RSI: ffff810031f63000 RDI: ffffffff880b087b RBP: ffff8100050d61c0 R08: 0000000000000040 R09: 000000000000001b R10: 0000000000000000 R11: 0000000000000000 R12: ffff810031f63500 R13: ffff810031f63000 R14: ffff81003eba9d58 R15: ffff81001a947000 FS: 00002aad0e23a0a0(0000) GS:ffff81003f8b8f40(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000000008 CR3: 0000000000201000 CR4: 00000000000006e0 Process ib_mad1 (pid: 3919, threadinfo ffff81003eba8000, task ffff81001e204a70) Stack: 000000000000c000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 000101030000c000 0000ffff1b4012ff ffffffff00000000 0000000000000000 000101030000c000 00000000000000ff Call Trace: [] :ib_ipoib:ipoib_mcast_join_complete+0x9d/0x24c [] :ib_sa:ib_sa_mcmember_rec_callback+0x40/0x49 [] :ib_sa:recv_handler+0x3a/0x43 [] :ib_mad:ib_mad_completion_handler+0x3ac/0x592 [] run_workqueue+0x9a/0xea [] worker_thread+0x108/0x13a [] kthread+0xc9/0xf2 [] child_rip+0x8/0x12 Code: ff 70 08 0f b6 43 0f 50 0f b6 43 0e 50 0f b6 43 0d 50 0f b6 RIP [] :ib_ipoib:ipoib_mcast_join_finish+0x273/0x3af RSP CR2: 0000000000000008 ---------------------------------------------------------------------- set mcast->ah before the ipoib mcast debug code 
attempts to access it

Signed-off-by: Or Gerlitz

--- linux-2.6.18-rc1-orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2006-07-18 12:53:33.000000000 +0300
+++ linux-2.6.18-rc1/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2006-07-24 09:50:37.000000000 +0300
@@ -264,6 +264,9 @@ static int ipoib_mcast_join_finish(struc
 		if (!ah) {
 			ipoib_warn(priv, "ib_address_create failed\n");
 		} else {
+			spin_lock_irq(&priv->lock);
+			mcast->ah = ah;
+			spin_unlock_irq(&priv->lock);
 			ipoib_dbg_mcast(priv, "MGID " IPOIB_GID_FMT
 					" AV %p, LID 0x%04x, SL %d\n",
 					IPOIB_GID_ARG(mcast->mcmember.mgid),
@@ -271,10 +274,6 @@ static int ipoib_mcast_join_finish(struc
 					be16_to_cpu(mcast->mcmember.mlid),
 					mcast->mcmember.sl);
 		}
-
-		spin_lock_irq(&priv->lock);
-		mcast->ah = ah;
-		spin_unlock_irq(&priv->lock);
 	}
 
 	/* actually send any queued packets */

From arne.redlich at xiranet.com Mon Jul 24 00:30:41 2006
From: arne.redlich at xiranet.com (Arne Redlich)
Date: Mon, 24 Jul 2006 09:30:41 +0200
Subject: [openib-general] [PATCH 2/2] ib_cm: fix REJ due to invalid GID
In-Reply-To: <44C103F5.9020906@ichips.intel.com>
References: <1153294363.6972.151.camel@confield.dd.xiranet.com> <20060719170522.GA16573@mellanox.co.il> <1153409896.6972.192.camel@confield.dd.xiranet.com> <20060720154433.GA27224@mellanox.co.il> <1153467876.6972.235.camel@confield.dd.xiranet.com> <44C103F5.9020906@ichips.intel.com>
Message-ID: <1153726241.6495.4.camel@confield.dd.xiranet.com>

On Friday, 21.07.2006, at 09:42 -0700, Sean Hefty wrote:
> Arne Redlich wrote:
> > I only took a rather superficial look at the code. But since the invalid
> > GID REJ is treated with such care, I guess it's intentional and not a
> > bug?
>
> I would lean more towards it being a bug. The data in the CM REQ is wrong.
> (There's a difference between rejecting a request to another GID, versus the GID
> carried in the request being bogus.)
>
> In any case, an argument can be made that this change fixes a compliancy issue
> in the IB CM.
> If the change corrects the problem for you, I will commit it.

Looks good so far - although I've used the patch against 2.6.17.x.

Thanks,
Arne
--
Arne Redlich
Xiranet Communications GmbH

From ogerlitz at voltaire.com Mon Jul 24 00:36:09 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Mon, 24 Jul 2006 10:36:09 +0300
Subject: [openib-general] [openfabrics-ewg] OFED 1.1 release - schedule and features
In-Reply-To: <44C38070.9070507@mellanox.co.il>
References: <6AB138A2AB8C8E4A98B9C0C3D52670E3028430B0@mtlexch01.mtl.com> <44BF686A.7060707@mellanox.co.il> <44C352FB.9090009@voltaire.com> <44C38070.9070507@mellanox.co.il>
Message-ID: <44C47869.2000708@voltaire.com>

> Or Gerlitz wrote:
>> Vladimir Sokolovsky wrote:

>> Did you have any special reason to assign host1:ib1 an IP address
>> ***before*** the failover? is the reason for that happen to be having
>> it joins the IPv4 multicast group at "batch time", that is not during
>> the failover?

> ib1 interface is loaded in any case (with or without configuration) if
> ib0 is loaded by /etc/init.d/network or /etc/init.d/openibd. It can't be
> configured with IP 0.0.0.0 - it fails to start with this configuration.
> So, I gave it some IP in a different IP subnet.

Not sure what you mean by "loaded": the trigger for IPoIB to register
network devices is plain IB, that is, the "device (not link!) up" event it gets
through the ib stack client register hotplug mechanism. For example, if the
HCA has two ports, IPoIB will register ib0 and ib1 (same for two HCAs, each
of them with one port, etc).

However, I think the trigger for IPoIB to attempt doing the SA Q to have
the port GID associated with the IPoIB netdevice join an mcast group is the
user action towards having this device being "UP" (eg the assignment of
an IP address to it).

Not sure what you mean by "start", you can just do nothing before the
failure of ib0 and during the failover from ib0 to ib1, assign ib1 the
address which used to be of ib0.
>> I think we want arping to send a gratuitous arp with the MAC of ib1 >> so weren't you need to provide the -U or -A command line to arping? > You are right I used 'arping -A ...' (fogot to insert it in the email). > Actually, I have added my flag '-R' which means '-A over IPoIB' thanks for the patch, i am not sure to fully follow the code path when the "unsolicited" flag is set, but i do see what unlike in the -A/-U options you have made the -R option not to set the "unsolicited" flag, can you explain what was the issue? Or. From mst at mellanox.co.il Mon Jul 24 00:38:52 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 24 Jul 2006 10:38:52 +0300 Subject: [openib-general] [PATCH] IB/ipoib: fix crash on mcast join finish In-Reply-To: References: Message-ID: <20060724073852.GA8795@mellanox.co.il> Quoting r. Or Gerlitz : > This is 2.6.18 material, correct? Looks like it. > also since the crash does not allow > to debug mcast, i guess the fix needs to go into OFED 1.1 as well. > > Or. Then Cc me on this patch :) -- MST From ogerlitz at voltaire.com Mon Jul 24 00:46:38 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 24 Jul 2006 10:46:38 +0300 Subject: [openib-general] [PATCH] IB/ipoib: fix crash on mcast join finish In-Reply-To: <20060724073852.GA8795@mellanox.co.il> References: <20060724073852.GA8795@mellanox.co.il> Message-ID: <44C47ADE.4090806@voltaire.com> Michael S. Tsirkin wrote: > Quoting r. Or Gerlitz : >> also since the crash does not allow >> to debug mcast, i guess the fix needs to go into OFED 1.1 as well. > Then Cc me on this patch :) OK, I see, so you are the maintainer of IPoIB in OFED? and of any other modules as well? which? Or. From mst at mellanox.co.il Mon Jul 24 00:53:51 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Mon, 24 Jul 2006 10:53:51 +0300 Subject: [openib-general] 3 patches in mst-for-2.6.18 Message-ID: <20060724075351.GB8795@mellanox.co.il> Roland, I have put the following patches in my mst-for-2.6.18 tree: $ git log --pretty=short origin..mst-for-2.6.18 commit cdfe3231414ee6dca31bb1fa3f9140408ae33a6a Author: Or Gerlitz [PATCH] IB/ipoib: fix oops with ipoib_debug_mcast set commit 7534914e26bd847d61e8de6db3ce4fe83b67299e Author: Sean Hefty [PATCH] IB/mad: validate MADs for spec compliance C13-18.1.1 commit 864cb0bc598c9c6ece816d6bc74efb54e0ef6b70 Author: Michael S. Tsirkin [PATCH] IB/ipoib: fix packet loss after hardware address update I suggest queueing these patches for 2.6.18. You can get them here: git://www.mellanox.co.il/~git/infiniband mst-for-2.6.18 These patches have been posted on the list previously, but let me know and I'll repost them separately. Incidentally, could you please confirm you have the time for this at the moment? -- MST From mst at mellanox.co.il Mon Jul 24 01:08:16 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 24 Jul 2006 11:08:16 +0300 Subject: [openib-general] [PATCH] IB/ipoib: fix crash on mcast join finish In-Reply-To: <44C47ADE.4090806@voltaire.com> References: <44C47ADE.4090806@voltaire.com> Message-ID: <20060724080816.GC8795@mellanox.co.il> Quoting r. Or Gerlitz : > Subject: Re: [PATCH] IB/ipoib: fix crash on mcast join finish > > Michael S. Tsirkin wrote: > > Quoting r. Or Gerlitz : > > >> also since the crash does not allow > >> to debug mcast, i guess the fix needs to go into OFED 1.1 as well. > > > Then Cc me on this patch :) > > OK, I see, so you are the maintainer of IPoIB in OFED? and of any > other modules as well? which? No, Linus is the maintainer :) Seriously, whatever goes into Linus' tree for 2.6.18 will get into OFED more or less automatically. My point was to help make sure things get tested.
First, we are doing OFED tests here at Mellanox, and if I drop patches in the kernel_patches/fixes directory they will get run here nightly even before Linus pulls them. Unfortunately this harness at the moment does not know how to build kernels, so it is less relevant for testing release candidates, and more relevant for testing code backported to various distros' kernels, OFED-style. Further, I am running my own git tree on my development machines, so if I put stuff there you'll get more bug reports. -- MST From vlad at mellanox.co.il Mon Jul 24 02:14:36 2006 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Mon, 24 Jul 2006 12:14:36 +0300 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 release - schedule and features In-Reply-To: <44C47869.2000708@voltaire.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E3028430B0@mtlexch01.mtl.com> <44BF686A.7060707@mellanox.co.il> <44C352FB.9090009@voltaire.com> <44C38070.9070507@mellanox.co.il> <44C47869.2000708@voltaire.com> Message-ID: <44C48F7C.60405@mellanox.co.il> Or Gerlitz wrote: >> Or Gerlitz wrote: >>> Vladimir Sokolovsky wrote: > >>> Did you have any special reason to assign host1:ib1 an IP address >>> ***before*** the failover? is the reason for that happen to be >>> having it joins the IPv4 multicast group at "batch time", that is >>> not during the failover? > >> ib1 interface is loaded in any case (with or without configuration) >> if ib0 is loaded by /etc/init.d/network or /etc/init.d/openibd. It >> can't be configured with IP 0.0.0.0 - it fails to start with this >> configuration. >> So, I gave it some IP in a different IP subnet. > > Not sure what you mean by "loaded": the trigger for IPoIB to registers > network devices is plain IB, that is "device (not link!) up" event it > gets through the ib stack client register hotplug mechanism, for > exampe if the HCA has two ports, IPoIB will register ib0 and ib1 (same > for two HCAs each of them with one port etc).
> However, I think the trigger for IPoIB to attempt doing the SA Q to > have the port GID associated with IPoIB netdevice join an mcast group > is the user action towards having this device being "UP" (eg the > assignment of IP address to it). > > Not sure what you mean by "start", you can just do nothing before the > failure of ib0 and during the failover from ib0 to ib1, assign ib1 the > address which used to be of ib0. By "loaded" I meant that ib1 is configured with an IP address and other parameters after executing '/etc/init.d/network start' or '/etc/init.d/openibd start'. I worked on SuSE 10 and saw that even if ifcfg-ib1 does not exist, ib1 gets the same configuration as ib0. This does not happen on RedHat 4.0. > >>> I think we want arping to send a gratuitous arp with the MAC of ib1 >>> so weren't you need to provide the -U or -A command line to arping? > >> You are right I used 'arping -A ...' (fogot to insert it in the >> email). Actually, I have added my flag '-R' which means '-A over IPoIB' > > thanks for the patch, i am not sure to fully follow the code path when > the "unsolicited" flag is set, but i do see what unlike in the -A/-U > options you have made the -R option not to set the "unsolicited" flag, > can you explain what was the issue? There was no issue; it is just a drop version, so you can change it as you wish. > > Or. > Regards, Vladimir From halr at voltaire.com Mon Jul 24 02:44:54 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 24 Jul 2006 05:44:54 -0400 Subject: [openib-general] [PATCH] osm: bug on SM generated Notice Reports In-Reply-To: <86lkqkelug.fsf@mtl066.yok.mtl.com> References: <86lkqkelug.fsf@mtl066.yok.mtl.com> Message-ID: <1153734291.4537.72821.camel@hal.voltaire.com> Hi Eitan, On Sun, 2006-07-23 at 05:01, Eitan Zahavi wrote: > Hi Hal > > When formating an SM generated report the ProducerType was using CL_NTOH32 > which can not be used to format a 24bit network order number.
> > The propose fix also avoids the duplication of conversion to and from network > order introduced by the previous use of ib_notice_set_prod_type that requires > a network ordered argument. > > The fix introduces ib_notice_set_prod_type_ho which is similar but requires > host order value. > For completeness I also introduced the ib_notice_set_vend_id_ho. Thanks. Applied in r8640. Are the network order routines (ib_notice_set_prod_type and ib_notice_set_vend_id) still needed, or should they be eliminated? -- Hal > Eitan > > Signed-off-by: Eitan Zahavi From k_mahesh85 at yahoo.co.in Mon Jul 24 03:50:35 2006 From: k_mahesh85 at yahoo.co.in (keshetti mahesh) Date: Mon, 24 Jul 2006 11:50:35 +0100 (BST) Subject: [openib-general] connection loss handling in mthca Message-ID: <20060724105035.20971.qmail@web8323.mail.in.yahoo.com> I have a query regarding the handling of asynchronous events in the mthca driver. Consider this situation: the receiver has posted some 10 descriptors, and 5 of them have completed successfully; after that, the connection is lost (at the NIC level) for some reason. Now: 1. How does the QP know about this (there is no IB-specific event)? 2. What about the remaining descriptors on the receiver side: will completions be generated for them? -Mahesh
From halr at voltaire.com Mon Jul 24 03:56:45 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 24 Jul 2006 06:56:45 -0400 Subject: [openib-general] {PATCH]OpenSM: Better handling for Path/MultiPath requests for raw traffic Message-ID: <1153738604.4537.73841.camel@hal.voltaire.com> OpenSM: Better handling for Path/MultiPath requests for raw traffic Signed-off-by: Sasha Khapyorsky Signed-off-by: Hal Rosenstock Index: opensm/osm_sa_path_record.c =================================================================== --- opensm/osm_sa_path_record.c (revision 8640) +++ opensm/osm_sa_path_record.c (working copy) @@ -562,7 +562,7 @@ __osm_pr_rcv_get_path_parms( if( comp_mask & IB_PR_COMPMASK_RAWTRAFFIC && cl_ntoh32( p_pr->hop_flow_raw ) & ( 1<<31 ) ) - pkey = 0; + pkey = osm_physp_find_common_pkey( p_physp, p_dest_physp ); else if( comp_mask & IB_PR_COMPMASK_PKEY ) { pkey = p_pr->pkey; @@ -588,19 +588,27 @@ __osm_pr_rcv_get_path_parms( } } - p_prtn = (osm_prtn_t *)cl_qmap_get(&p_rcv->p_subn->prtn_pkey_tbl, - pkey & cl_ntoh16((uint16_t)~0x8000)); - if ( p_prtn == (osm_prtn_t *)cl_qmap_end(&p_rcv->p_subn->prtn_pkey_tbl) ) - { - /* this may be possible when pkey tables are created somehow in - previous runs or things are going wrong here */ - sl = OSM_DEFAULT_SL; - osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "__osm_pr_rcv_get_path_parms: ERR 1F1C: " - "No partition found for PKey 0x%04x - using default SL %d\n", cl_ntoh16(pkey), sl ); + if (pkey) { + p_prtn = (osm_prtn_t *)cl_qmap_get(&p_rcv->p_subn->prtn_pkey_tbl, + pkey & cl_ntoh16((uint16_t)~0x8000)); + if ( p_prtn == (osm_prtn_t *)cl_qmap_end(&p_rcv->p_subn->prtn_pkey_tbl) ) + { + /* this may be possible when pkey tables are created somehow in + previous runs or things are going wrong here */ + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_pr_rcv_get_path_parms: ERR 1F1C: " + "No partition found for PKey 0x%04x - using default SL %d\n", + cl_ntoh16(pkey), sl ); + sl = OSM_DEFAULT_SL; + } + else + sl = p_prtn->sl; + +
/* reset pkey when raw traffic */ + if( comp_mask & IB_PR_COMPMASK_RAWTRAFFIC && + cl_ntoh32( p_pr->hop_flow_raw ) & ( 1<<31 ) ) + pkey = 0; } - else - sl = p_prtn->sl; if ( ( comp_mask & IB_PR_COMPMASK_SL ) && ib_path_rec_sl( p_pr ) != sl ) { Index: opensm/osm_sa_multipath_record.c =================================================================== --- opensm/osm_sa_multipath_record.c (revision 8640) +++ opensm/osm_sa_multipath_record.c (working copy) @@ -197,7 +197,7 @@ __osm_mpr_rcv_get_path_parms( if ( comp_mask & IB_MPR_COMPMASK_RAWTRAFFIC && cl_ntoh32( p_mpr->hop_flow_raw ) & ( 1<<31 ) ) - required_pkey = 0; + required_pkey = osm_physp_find_common_pkey( p_physp, p_dest_physp ); else if ( comp_mask & IB_MPR_COMPMASK_PKEY ) { required_pkey = p_mpr->pkey; if( !osm_physp_share_this_pkey( p_physp, p_dest_physp, required_pkey ) ) { @@ -225,19 +225,26 @@ __osm_mpr_rcv_get_path_parms( } } - p_prtn = (osm_prtn_t *)cl_qmap_get(&p_rcv->p_subn->prtn_pkey_tbl, - required_pkey & cl_ntoh16((uint16_t)~0x8000)); - if ( p_prtn == (osm_prtn_t *)cl_qmap_end(&p_rcv->p_subn->prtn_pkey_tbl) ) - { - /* this may be possible when pkey tables are created somehow in - previous runs or things are going wrong here */ - required_sl = OSM_DEFAULT_SL; - osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "__osm_mpr_rcv_get_path_parms: ERR 451A: " - "No partition found for PKey 0x%04x - using default SL %d\n", cl_ntoh16(required_pkey), required_sl ); + if (required_pkey) { + p_prtn = (osm_prtn_t *)cl_qmap_get(&p_rcv->p_subn->prtn_pkey_tbl, + required_pkey & cl_ntoh16((uint16_t)~0x8000)); + if ( p_prtn == (osm_prtn_t *)cl_qmap_end(&p_rcv->p_subn->prtn_pkey_tbl) ) + { + /* this may be possible when pkey tables are created somehow in + previous runs or things are going wrong here */ + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_mpr_rcv_get_path_parms: ERR 451A: " + "No partition found for PKey 0x%04x - using default SL %d\n", cl_ntoh16(required_pkey), required_sl ); + required_sl = OSM_DEFAULT_SL; + } + 
else + required_sl = p_prtn->sl; + + /* reset pkey when raw traffic */ + if( comp_mask & IB_PR_COMPMASK_RAWTRAFFIC && + cl_ntoh32( p_mpr->hop_flow_raw ) & ( 1<<31 ) ) + required_pkey = 0; } - else - required_sl = p_prtn->sl; if ( ( comp_mask & IB_MPR_COMPMASK_SL ) && ib_multipath_rec_sl( p_mpr ) != required_sl ) { From dotanb at mellanox.co.il Mon Jul 24 04:10:49 2006 From: dotanb at mellanox.co.il (Dotan Barak) Date: Mon, 24 Jul 2006 14:10:49 +0300 Subject: [openib-general] connection loss handling in mthca In-Reply-To: <20060724105035.20971.qmail@web8323.mail.in.yahoo.com> References: <20060724105035.20971.qmail@web8323.mail.in.yahoo.com> Message-ID: <200607241410.49781.dotanb@mellanox.co.il> Hi. On Monday 24 July 2006 13:50, keshetti mahesh wrote: > i have a query regarding the handling of asynchronous events in mthca driver > consider the situation, receiver has posted some 10 descriptors. and 5 out of them are completd successfully, after that connection is lost( in NIC level) due to some reason > > now, > 1. how do the QP know about this(there is no IB specific event) If the QP was the responder of an RDMA operation which failed, there should be an async event on the QP. > 2. What about the remaining descriptors in the receiver side > are completions will be generated for them In case of an error, the QP state will be changed to error and all the WR (in SQ and RQ) will be flushed (with error) Dotan From k_mahesh85 at yahoo.co.in Mon Jul 24 04:42:00 2006 From: k_mahesh85 at yahoo.co.in (keshetti mahesh) Date: Mon, 24 Jul 2006 12:42:00 +0100 (BST) Subject: [openib-general] connection lose handling in mthca Message-ID: <20060724114200.52467.qmail@web8316.mail.in.yahoo.com> Hi. On Monday 24 July 2006 13:50, keshetti mahesh wrote: > i have a query regarding the handling of asynchronous events in mthca driver > consider the situation, receiver has posted some 10 descriptors. 
and 5 out of them are completd successfully, after that connection is lost( in NIC level) due to some reason > > now, > 1. how do the QP know about this(there is no IB specific event) If the QP was the responder of an RDMA operation which failed, there should be an async event on the QP. > 2. What about the remaining descriptors in the receiver side > are completions will be generated for them In case of an error, the QP state will be changed to error and all the WR (in SQ and RQ) will be flushed (with error) Where does it happen? In the interrupt handler, or somewhere else? I have gone through the mthca code: 1. there is no IQE or event corresponding to the connection loss; 2. in the interrupt handlers only the event handler corresponding to that QP is called (no QP state change). Dotan From k_mahesh85 at yahoo.co.in Mon Jul 24 05:47:54 2006 From: k_mahesh85 at yahoo.co.in (keshetti mahesh) Date: Mon, 24 Jul 2006 13:47:54 +0100 (BST) Subject: [openib-general] connection loss handling in mthca In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E3028D956D@mtlexch01.mtl.com> Message-ID: <20060724124754.24447.qmail@web8313.mail.in.yahoo.com> Dotan Barak wrote: Message -----Original Message----- From: keshetti mahesh [mailto:k_mahesh85 at yahoo.co.in] Sent: Monday, July 24, 2006 3:21 PM To: Dotan Barak Subject: RE: [openib-general] connection loss handling in mthca Dotan Barak wrote: -----Original Message----- From: keshetti mahesh [mailto:k_mahesh85 at yahoo.co.in] Sent: Monday, July 24, 2006 2:40 PM To: Dotan Barak Subject: Re: [openib-general] connection loss handling in mthca Dotan Barak wrote: Hi. On Monday 24 July 2006 13:50, keshetti mahesh wrote: > i have a query regarding the handling of asynchronous events in mthca driver > consider the situation, receiver has posted some 10 descriptors.
and 5 out of them are completd successfully, after that connection is lost( in NIC level) due to some reason > > now, > 1. how do the QP know about this(there is no IB specific event) If the QP was the responder of an RDMA operation which failed, there should be an async event on the QP. > 2. What about the remaining descriptors in the receiver side > are completions will be generated for them In case of an error, the QP state will be changed to error and all the WR (in SQ and RQ) will be flushed (with error) where does it happen? in the interrupt handler or ?? i have gone through the mthca code 1. there is no IQE or event corresponding to the connection lose 2. in the interrupt handlers only the event handler corresponding to that QP is called (no QP state change) [Dotan Barak] When there is an error with the QP, the QP state is changed by the HCA (automatically). The async event occurs only if the operation is an RDMA operation and the QP is the responder. There should be a completion with error after the QP had the problem (if there are WRs in the QP). The event is an affiliated event (only for this QP), so only the event handler of this QP should get this event. Dotan Dotan OK, now what I understand is that in this case (i.e. connection loss) the HCA will automatically change the state of the QP to error. No async event or error will be generated (this is not an RDMA operation), a completion with an error code (which error code?) will be generated for the WR that is in progress, and all other outstanding WRs will be flushed. Is this right? With which error status will the WR in progress be completed? -Mahesh [Dotan Barak] What you understood is correct. I cannot tell you the expected status of the completion if I don't know what you are doing (which opcodes you use, whether the QP which goes to error is the responder or the requestor ...).
The first WR which fails will have a "meaningful" status and the rest of the completions will have the status "flushed with error". Dotan Let me put the whole thing again. A is the sender (who has posted some 5 descriptors) and B is the receiver (who has posted the same 5 receive descriptors). Now the sender (A) HCA has detected the connection loss due to a "TPT error for data buffer" on the receiver (B) side. Then: - will the receiver (B) be notified about this through an interrupt (affiliated asynchronous error)? - upon receiving the interrupt, the receiver (B) HCA will transition the QP state to error - what happens to the WRs in progress at both ends, and with which code will the completions be generated? -Mahesh From minich at ornl.gov Mon Jul 24 05:58:29 2006 From: minich at ornl.gov (Makia Minich) Date: Mon, 24 Jul 2006 08:58:29 -0400 Subject: [openib-general] Problem with OFED on XT3 (update) In-Reply-To: Message-ID: Just wanted to update people on this (still looking for some insight, but not really expecting any). I was successfully able to bring up the entire stack (all loadable modules) without ifconfig'ing the ib0 interface. At this point, I'm able to participate in the network (I can see the subnet manager, ibping works between the XT3 and a separate node on the switch, ibstat sees the card information, and ibnetdiscover can see the entire network). So, it would seem that there's something going on in ipoib specifically. When I have a chance to try some other things, I'll update accordingly, but again if anyone happens to see something that seems interesting from the kernel panic below, let me know (not really expecting much, just hoping that someone has run across a similar problem).
On 7/18/06 10:58 AM, "Makia Minich" wrote: > First, a little bit about what I'm trying to do (hoping that someone becomes > interested enough to keep reading), and then the problem. I'm currently > tasked with getting some form of infiniband up and running on a service node > of the Cray XT3. Because the XT3 is currently shipping with SuSE9 (with the > 2.6.5 based kernel) I decided to go with the OFED 1.0.1 release to see out > of the box what is going to happen. Because of the system layout, I'm > unable to change out the kernel, so there were some minor OFED source tweaks > that I needed to perform (attached) to satisfy some missing symbols. > > On loading modules, I was seemingly successful loading everything up to and > including ib_ipoib. Ifconfig showed the ib0 and ib1 devices available, and > /sys/class/infiniband showed link to the subnet manager was in place. > Attempting to assign an ip-address to the interface proved to be too much, > as the node kernel panicked with the following: > > general protection fault: 0000 [1] > CPU 0 > Pid: 11258, comm: ifconfig Tainted: P U (2.6.5-7.252-ss ) > RIP: 0010:[] > {__kfree_skb+173} > RSP: 0018:00000100c3cf3af8 EFLAGS: 00010286 > RAX: 1b6012ffffffff00 RBX: 0000000000000000 RCX: ffffffffffffffe8 > RDX: 0000000000000000 RSI: ffffffff80421ba0 RDI: 0000010005cfd340 > RBP: 00000100e0c97480 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffff4 > R13: ffffffff8029eeb0 R14: 0000000000000000 R15: 0000000000000003 > FS: 0000002a9588e0a0(0000) GS:ffffffff80514b40(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > CR2: 0000002a9576650c CR3: 0000000000101000 CR4: 00000000000006e0 > Process ifconfig (pid: 11258, threadinfo 00000100c3cf2000, task > 00000100c3eba580) > Stack: > 0000000000000003 > 00000100c281f000 > 00000100e0c97480 > ffffffff802ab825 > 00000100c281f000 > ffffffff8029ef78 > 0000000000000000 > 00000100c281f000 > 0000000000000003 
> ffffffff802a86e3 > Call Trace: > {noop_enqueue+37} > {dev_queue_xmit+200} > {nf_hook_slow+227} > {dev_queue_xmit+0} > {igmp6_send+724} > {fib6_walk_continue+192} > {fib6_clean_node+0} > {igmp6_join_group+51} > {igmp6_group_added+191} > {addrconf_prefix_route+225} > {mld_del_delrec+117} > {ipv6_dev_mc_inc+486} > {addrconf_join_solict+59} > {addrconf_dad_start+28} > {addrconf_add_linklocal+43} > {addrconf_dev_config+229} > {addrconf_notify+123} > {notifier_call_chain+31} > {dev_open+261} > {dev_change_flags+95} > {devinet_ioctl+756} > {inet_ioctl+87} > {sock_ioctl+577} > {sys_ioctl+532} > {error_exit+0} > {system_call+124} > Code: > ff > 08 > 0f > 94 > c2 > 84 > d2 > 74 > 09 > 48 > 8b > 01 > 48 > 89 > c7 > ff > 50 > 08 > 48 > 89 > RIP > {__kfree_skb+173} > RSP <00000100c3cf3af8> > > <0>Kernel panic: Aiee, killing interrupt handler! > In interrupt handler - not syncing > > Due to a lack of system dumps, I'm hoping that someone might have seen a > similar panic and might offer some things to try to resolve this issue. > > Thanks... -- Makia Minich National Center for Computation Science Oak Ridge National Laboratory Phone: 865.574.7460 From sean.hefty at intel.com Mon Jul 24 08:09:20 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 24 Jul 2006 08:09:20 -0700 Subject: [openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM In-Reply-To: <20060724061820.GA8470@mellanox.co.il> Message-ID: <000001c6af33$243f29f0$15d5180a@amr.corp.intel.com> >Some things unclear to me: >1. Is a user somehow *required* to call ib_cm_establish if he gets the event? > E.g. what about cma users? A user is not required to call ib_cm_establish. The patch fixes the issue where CMA users receive a COMM_EST event, but do not have the ability to call ib_cm_establish directly. It does this by pushing COMM_EST event handling directly to the IB CM. >2. 
There are now two ways CM gets notified about Communication established > events - what happends if both the user calls ib_cm_establish, and CM > gets this directly? The second call will fail due to an invalid state. Note that this failure was already possible, since an RTU could have been processed before the user could have called ib_cm_establish. >3. If I understand it correctly, CM does not yet know how to switch to the > alternate path for CM messages. So, is the patch still useful by itself, > or is it more of a framework preparatory work? The patch is useful by itself for the COMM_EST event. It is my intent that the same infrastructure be used to notify the IB CM of path migration. PATH_MIG event handling is currently limited to mthca reporting path migration events to the CM event handler. The IB CM currently just ignores the event. If it makes it cleaner, I can remove the line from mthca that reports PATH_MIG events. - Sean From rdreier at cisco.com Mon Jul 24 09:45:26 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 24 Jul 2006 09:45:26 -0700 Subject: [openib-general] 3 patches in mst-for-2.6.18 In-Reply-To: <20060724075351.GB8795@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 24 Jul 2006 10:53:51 +0300") References: <20060724075351.GB8795@mellanox.co.il> Message-ID: > Roland, I have put the following patches in my mst-for-2.6.18 tree: OK, I have all three. 
I updated this: > [PATCH] IB/mad: validate MADs for spec compliance C13-18.1.1 with the following, which uninlines ib_response_mad() for the following size savings: text data bss dec hex filename 2017675 434628 175844 2628147 281a33 ../vmlinux.orig 2017539 434652 175844 2628035 2819c3 ../vmlinux.noinline diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c index 2d4ee3f..1c3cfbb 100644 --- a/drivers/infiniband/core/mad.c +++ b/drivers/infiniband/core/mad.c @@ -167,6 +167,15 @@ static int is_vendor_method_in_use( return 0; } +int ib_response_mad(struct ib_mad *mad) +{ + return ((mad->mad_hdr.method & IB_MGMT_METHOD_RESP) || + (mad->mad_hdr.method == IB_MGMT_METHOD_TRAP_REPRESS) || + ((mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_BM) && + (mad->mad_hdr.attr_mod & IB_BM_ATTR_MOD_RESP))); +} +EXPORT_SYMBOL(ib_response_mad); + /* * ib_register_mad_agent - Register to send/receive MADs */ diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h index aebcdaf..585d28e 100644 --- a/include/rdma/ib_mad.h +++ b/include/rdma/ib_mad.h @@ -251,13 +251,7 @@ struct ib_mad_send_buf { * ib_response_mad - Returns if the specified MAD has been generated in * response to a sent request or trap. */ -static inline int ib_response_mad(struct ib_mad *mad) -{ - return ((mad->mad_hdr.method & IB_MGMT_METHOD_RESP) || - (mad->mad_hdr.method == IB_MGMT_METHOD_TRAP_REPRESS) || - ((mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_BM) && - (mad->mad_hdr.attr_mod & IB_BM_ATTR_MOD_RESP))); -} +int ib_response_mad(struct ib_mad *mad); /** * ib_get_rmpp_resptime - Returns the RMPP response time. From rdreier at cisco.com Mon Jul 24 09:45:34 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 24 Jul 2006 09:45:34 -0700 Subject: [openib-general] [PATCH] IB/ipoib: fix crash on mcast join finish In-Reply-To: (Or Gerlitz's message of "Mon, 24 Jul 2006 10:00:01 +0300 (IDT)") References: Message-ID: Thanks, applied. 
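Note on the ib_response_mad() patch posted earlier on this date: the predicate is easy to exercise outside the kernel. What follows is only a minimal userspace sketch under stated assumptions, not the kernel code. The struct sketch_mad_hdr type, the sketch_response_mad() name and the SKETCH_* constants are hypothetical stand-ins; the numeric values are meant to mirror include/rdma/ib_mad.h but should be checked against your tree, and attr_mod is kept in host byte order here whereas the kernel header field is big-endian.

```c
#include <stdint.h>

/* Assumed values, intended to mirror include/rdma/ib_mad.h (verify locally). */
#define SKETCH_METHOD_GET           0x01
#define SKETCH_METHOD_SEND          0x03
#define SKETCH_METHOD_TRAP_REPRESS  0x07
#define SKETCH_METHOD_RESP          0x80
#define SKETCH_CLASS_BM             0x05
#define SKETCH_BM_ATTR_MOD_RESP     0x1   /* host byte order in this sketch */

/* Hypothetical cut-down header: only the fields the predicate reads. */
struct sketch_mad_hdr {
	uint8_t  mgmt_class;
	uint8_t  method;
	uint32_t attr_mod;
};

/*
 * Same three-way test as the patch: a MAD counts as a response if
 *  - the response bit is set in the method, or
 *  - it is a TrapRepress (a response although the bit is clear), or
 *  - it is a BM-class MAD whose attribute modifier has the response bit.
 */
static int sketch_response_mad(const struct sketch_mad_hdr *hdr)
{
	return ((hdr->method & SKETCH_METHOD_RESP) ||
		(hdr->method == SKETCH_METHOD_TRAP_REPRESS) ||
		((hdr->mgmt_class == SKETCH_CLASS_BM) &&
		 (hdr->attr_mod & SKETCH_BM_ATTR_MOD_RESP)));
}
```

Compared with the old static response_mad() helper that the later diff removes, the only new case is the BM-class one: a BM Send with the response bit set in the attribute modifier is now treated as a response, while a BM Send without it is still a request.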
From rdreier at cisco.com Mon Jul 24 09:50:23 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 24 Jul 2006 09:50:23 -0700 Subject: [openib-general] [GIT PULL] please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus to get a few fixes: Dotan Barak: IB/mthca: Fix SRQ limit event range check Michael S. Tsirkin: IB/uverbs: Fix unlocking in error paths IB/ipoib: Fix packet loss after hardware address update Or Gerlitz: IB/ipoib: Fix oops with ipoib_debug_mcast set Ralph Campbell: IB/ipath: Fix a data corruption IB/ipath: Fix ib_ipath driver to work with SRP IB/ipath: ipath_skip_sge() can break if num_sge > 1 Roland Dreier: IB/uverbs: Fix lockdep warnings IB/mthca: Initialize max_cmds before debug code prints it Sean Hefty: IB/mad: Validate MADs for spec compliance drivers/infiniband/core/mad.c | 22 +++--- drivers/infiniband/core/user_mad.c | 87 +++++++++++++++++++++--- drivers/infiniband/core/uverbs_cmd.c | 42 ++++++++---- drivers/infiniband/hw/ipath/ipath_driver.c | 76 ++++++++++----------- drivers/infiniband/hw/ipath/ipath_keys.c | 15 ++++ drivers/infiniband/hw/ipath/ipath_verbs.c | 5 - drivers/infiniband/hw/mthca/mthca_cmd.c | 5 + drivers/infiniband/hw/mthca/mthca_srq.c | 3 + drivers/infiniband/ulp/ipoib/ipoib.h | 1 drivers/infiniband/ulp/ipoib/ipoib_main.c | 23 ++++++ drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 8 +- include/rdma/ib_mad.h | 7 ++ 12 files changed, 209 insertions(+), 85 deletions(-) diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c index 5ed4dab..1c3cfbb 100644 --- a/drivers/infiniband/core/mad.c +++ b/drivers/infiniband/core/mad.c @@ -167,6 +167,15 @@ static int is_vendor_method_in_use( return 0; } +int ib_response_mad(struct ib_mad *mad) +{ + return ((mad->mad_hdr.method & IB_MGMT_METHOD_RESP) || + 
(mad->mad_hdr.method == IB_MGMT_METHOD_TRAP_REPRESS) || + ((mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_BM) && + (mad->mad_hdr.attr_mod & IB_BM_ATTR_MOD_RESP))); +} +EXPORT_SYMBOL(ib_response_mad); + /* * ib_register_mad_agent - Register to send/receive MADs */ @@ -570,13 +579,6 @@ int ib_unregister_mad_agent(struct ib_ma } EXPORT_SYMBOL(ib_unregister_mad_agent); -static inline int response_mad(struct ib_mad *mad) -{ - /* Trap represses are responses although response bit is reset */ - return ((mad->mad_hdr.method == IB_MGMT_METHOD_TRAP_REPRESS) || - (mad->mad_hdr.method & IB_MGMT_METHOD_RESP)); -} - static void dequeue_mad(struct ib_mad_list_head *mad_list) { struct ib_mad_queue *mad_queue; @@ -723,7 +725,7 @@ static int handle_outgoing_dr_smp(struct switch (ret) { case IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY: - if (response_mad(&mad_priv->mad.mad) && + if (ib_response_mad(&mad_priv->mad.mad) && mad_agent_priv->agent.recv_handler) { local->mad_priv = mad_priv; local->recv_mad_agent = mad_agent_priv; @@ -1551,7 +1553,7 @@ find_mad_agent(struct ib_mad_port_privat unsigned long flags; spin_lock_irqsave(&port_priv->reg_lock, flags); - if (response_mad(mad)) { + if (ib_response_mad(mad)) { u32 hi_tid; struct ib_mad_agent_private *entry; @@ -1799,7 +1801,7 @@ static void ib_mad_complete_recv(struct } /* Complete corresponding request */ - if (response_mad(mad_recv_wc->recv_buf.mad)) { + if (ib_response_mad(mad_recv_wc->recv_buf.mad)) { spin_lock_irqsave(&mad_agent_priv->lock, flags); mad_send_wr = ib_find_send_mad(mad_agent_priv, mad_recv_wc); if (!mad_send_wr) { diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c index afe70a5..1273f88 100644 --- a/drivers/infiniband/core/user_mad.c +++ b/drivers/infiniband/core/user_mad.c @@ -112,8 +112,10 @@ struct ib_umad_device { struct ib_umad_file { struct ib_umad_port *port; struct list_head recv_list; + struct list_head send_list; struct list_head port_list; spinlock_t recv_lock; + spinlock_t 
send_lock; wait_queue_head_t recv_wait; struct ib_mad_agent *agent[IB_UMAD_MAX_AGENTS]; int agents_dead; @@ -177,12 +179,21 @@ static int queue_packet(struct ib_umad_f return ret; } +static void dequeue_send(struct ib_umad_file *file, + struct ib_umad_packet *packet) + { + spin_lock_irq(&file->send_lock); + list_del(&packet->list); + spin_unlock_irq(&file->send_lock); + } + static void send_handler(struct ib_mad_agent *agent, struct ib_mad_send_wc *send_wc) { struct ib_umad_file *file = agent->context; struct ib_umad_packet *packet = send_wc->send_buf->context[0]; + dequeue_send(file, packet); ib_destroy_ah(packet->msg->ah); ib_free_send_mad(packet->msg); @@ -370,6 +381,51 @@ static int copy_rmpp_mad(struct ib_mad_s return 0; } +static int same_destination(struct ib_user_mad_hdr *hdr1, + struct ib_user_mad_hdr *hdr2) +{ + if (!hdr1->grh_present && !hdr2->grh_present) + return (hdr1->lid == hdr2->lid); + + if (hdr1->grh_present && hdr2->grh_present) + return !memcmp(hdr1->gid, hdr2->gid, 16); + + return 0; +} + +static int is_duplicate(struct ib_umad_file *file, + struct ib_umad_packet *packet) +{ + struct ib_umad_packet *sent_packet; + struct ib_mad_hdr *sent_hdr, *hdr; + + hdr = (struct ib_mad_hdr *) packet->mad.data; + list_for_each_entry(sent_packet, &file->send_list, list) { + sent_hdr = (struct ib_mad_hdr *) sent_packet->mad.data; + + if ((hdr->tid != sent_hdr->tid) || + (hdr->mgmt_class != sent_hdr->mgmt_class)) + continue; + + /* + * No need to be overly clever here. If two new operations have + * the same TID, reject the second as a duplicate. This is more + * restrictive than required by the spec. 
+ */ + if (!ib_response_mad((struct ib_mad *) hdr)) { + if (!ib_response_mad((struct ib_mad *) sent_hdr)) + return 1; + continue; + } else if (!ib_response_mad((struct ib_mad *) sent_hdr)) + continue; + + if (same_destination(&packet->mad.hdr, &sent_packet->mad.hdr)) + return 1; + } + + return 0; +} + static ssize_t ib_umad_write(struct file *filp, const char __user *buf, size_t count, loff_t *pos) { @@ -379,7 +435,6 @@ static ssize_t ib_umad_write(struct file struct ib_ah_attr ah_attr; struct ib_ah *ah; struct ib_rmpp_mad *rmpp_mad; - u8 method; __be64 *tid; int ret, data_len, hdr_len, copy_offset, rmpp_active; @@ -473,28 +528,36 @@ static ssize_t ib_umad_write(struct file } /* - * If userspace is generating a request that will generate a - * response, we need to make sure the high-order part of the - * transaction ID matches the agent being used to send the - * MAD. + * Set the high-order part of the transaction ID to make MADs from + * different agents unique, and allow routing responses back to the + * original requestor. 
*/ - method = ((struct ib_mad_hdr *) packet->msg->mad)->method; - - if (!(method & IB_MGMT_METHOD_RESP) && - method != IB_MGMT_METHOD_TRAP_REPRESS && - method != IB_MGMT_METHOD_SEND) { + if (!ib_response_mad(packet->msg->mad)) { tid = &((struct ib_mad_hdr *) packet->msg->mad)->tid; *tid = cpu_to_be64(((u64) agent->hi_tid) << 32 | (be64_to_cpup(tid) & 0xffffffff)); + rmpp_mad->mad_hdr.tid = *tid; + } + + spin_lock_irq(&file->send_lock); + ret = is_duplicate(file, packet); + if (!ret) + list_add_tail(&packet->list, &file->send_list); + spin_unlock_irq(&file->send_lock); + if (ret) { + ret = -EINVAL; + goto err_msg; } ret = ib_post_send_mad(packet->msg, NULL); if (ret) - goto err_msg; + goto err_send; up_read(&file->port->mutex); return count; +err_send: + dequeue_send(file, packet); err_msg: ib_free_send_mad(packet->msg); err_ah: @@ -657,7 +720,9 @@ static int ib_umad_open(struct inode *in } spin_lock_init(&file->recv_lock); + spin_lock_init(&file->send_lock); INIT_LIST_HEAD(&file->recv_list); + INIT_LIST_HEAD(&file->send_list); init_waitqueue_head(&file->recv_wait); file->port = port; diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index bdf5d50..30923eb 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -42,6 +42,13 @@ #include #include "uverbs.h" +static struct lock_class_key pd_lock_key; +static struct lock_class_key mr_lock_key; +static struct lock_class_key cq_lock_key; +static struct lock_class_key qp_lock_key; +static struct lock_class_key ah_lock_key; +static struct lock_class_key srq_lock_key; + #define INIT_UDATA(udata, ibuf, obuf, ilen, olen) \ do { \ (udata)->inbuf = (void __user *) (ibuf); \ @@ -76,12 +83,13 @@ #define INIT_UDATA(udata, ibuf, obuf, il */ static void init_uobj(struct ib_uobject *uobj, u64 user_handle, - struct ib_ucontext *context) + struct ib_ucontext *context, struct lock_class_key *key) { uobj->user_handle = user_handle; uobj->context = context; 
kref_init(&uobj->ref); init_rwsem(&uobj->mutex); + lockdep_set_class(&uobj->mutex, key); uobj->live = 0; } @@ -470,7 +478,7 @@ ssize_t ib_uverbs_alloc_pd(struct ib_uve if (!uobj) return -ENOMEM; - init_uobj(uobj, 0, file->ucontext); + init_uobj(uobj, 0, file->ucontext, &pd_lock_key); down_write(&uobj->mutex); pd = file->device->ib_dev->alloc_pd(file->device->ib_dev, @@ -591,7 +599,7 @@ ssize_t ib_uverbs_reg_mr(struct ib_uverb if (!obj) return -ENOMEM; - init_uobj(&obj->uobject, 0, file->ucontext); + init_uobj(&obj->uobject, 0, file->ucontext, &mr_lock_key); down_write(&obj->uobject.mutex); /* @@ -770,7 +778,7 @@ ssize_t ib_uverbs_create_cq(struct ib_uv if (!obj) return -ENOMEM; - init_uobj(&obj->uobject, cmd.user_handle, file->ucontext); + init_uobj(&obj->uobject, cmd.user_handle, file->ucontext, &cq_lock_key); down_write(&obj->uobject.mutex); if (cmd.comp_channel >= 0) { @@ -1051,13 +1059,14 @@ ssize_t ib_uverbs_create_qp(struct ib_uv if (!obj) return -ENOMEM; - init_uobj(&obj->uevent.uobject, cmd.user_handle, file->ucontext); + init_uobj(&obj->uevent.uobject, cmd.user_handle, file->ucontext, &qp_lock_key); down_write(&obj->uevent.uobject.mutex); + srq = cmd.is_srq ? idr_read_srq(cmd.srq_handle, file->ucontext) : NULL; pd = idr_read_pd(cmd.pd_handle, file->ucontext); scq = idr_read_cq(cmd.send_cq_handle, file->ucontext); - rcq = idr_read_cq(cmd.recv_cq_handle, file->ucontext); - srq = cmd.is_srq ? idr_read_srq(cmd.srq_handle, file->ucontext) : NULL; + rcq = cmd.recv_cq_handle == cmd.send_cq_handle ? 
+ scq : idr_read_cq(cmd.recv_cq_handle, file->ucontext); if (!pd || !scq || !rcq || (cmd.is_srq && !srq)) { ret = -EINVAL; @@ -1125,7 +1134,8 @@ ssize_t ib_uverbs_create_qp(struct ib_uv put_pd_read(pd); put_cq_read(scq); - put_cq_read(rcq); + if (rcq != scq) + put_cq_read(rcq); if (srq) put_srq_read(srq); @@ -1150,7 +1160,7 @@ err_put: put_pd_read(pd); if (scq) put_cq_read(scq); - if (rcq) + if (rcq && rcq != scq) put_cq_read(rcq); if (srq) put_srq_read(srq); @@ -1751,7 +1761,7 @@ ssize_t ib_uverbs_create_ah(struct ib_uv if (!uobj) return -ENOMEM; - init_uobj(uobj, cmd.user_handle, file->ucontext); + init_uobj(uobj, cmd.user_handle, file->ucontext, &ah_lock_key); down_write(&uobj->mutex); pd = idr_read_pd(cmd.pd_handle, file->ucontext); @@ -1775,7 +1785,7 @@ ssize_t ib_uverbs_create_ah(struct ib_uv ah = ib_create_ah(pd, &attr); if (IS_ERR(ah)) { ret = PTR_ERR(ah); - goto err; + goto err_put; } ah->uobject = uobj; @@ -1811,6 +1821,9 @@ err_copy: err_destroy: ib_destroy_ah(ah); +err_put: + put_pd_read(pd); + err: put_uobj_write(uobj); return ret; @@ -1963,7 +1976,7 @@ ssize_t ib_uverbs_create_srq(struct ib_u if (!obj) return -ENOMEM; - init_uobj(&obj->uobject, cmd.user_handle, file->ucontext); + init_uobj(&obj->uobject, cmd.user_handle, file->ucontext, &srq_lock_key); down_write(&obj->uobject.mutex); pd = idr_read_pd(cmd.pd_handle, file->ucontext); @@ -1984,7 +1997,7 @@ ssize_t ib_uverbs_create_srq(struct ib_u srq = pd->device->create_srq(pd, &attr, &udata); if (IS_ERR(srq)) { ret = PTR_ERR(srq); - goto err; + goto err_put; } srq->device = pd->device; @@ -2029,6 +2042,9 @@ err_copy: err_destroy: ib_destroy_srq(srq); +err_put: + put_pd_read(pd); + err: put_uobj_write(&obj->uobject); return ret; diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index 823131d..f98518d 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -859,6 +859,38 @@ static void 
ipath_rcv_layer(struct ipath __ipath_layer_rcv_lid(dd, hdr); } +static void ipath_rcv_hdrerr(struct ipath_devdata *dd, + u32 eflags, + u32 l, + u32 etail, + u64 *rc) +{ + char emsg[128]; + struct ipath_message_header *hdr; + + get_rhf_errstring(eflags, emsg, sizeof emsg); + hdr = (struct ipath_message_header *)&rc[1]; + ipath_cdbg(PKT, "RHFerrs %x hdrqtail=%x typ=%u " + "tlen=%x opcode=%x egridx=%x: %s\n", + eflags, l, + ipath_hdrget_rcv_type((__le32 *) rc), + ipath_hdrget_length_in_bytes((__le32 *) rc), + be32_to_cpu(hdr->bth[0]) >> 24, + etail, emsg); + + /* Count local link integrity errors. */ + if (eflags & (INFINIPATH_RHF_H_ICRCERR | INFINIPATH_RHF_H_VCRCERR)) { + u8 n = (dd->ipath_ibcctrl >> + INFINIPATH_IBCC_PHYERRTHRESHOLD_SHIFT) & + INFINIPATH_IBCC_PHYERRTHRESHOLD_MASK; + + if (++dd->ipath_lli_counter > n) { + dd->ipath_lli_counter = 0; + dd->ipath_lli_errors++; + } + } +} + /* * ipath_kreceive - receive a packet * @dd: the infinipath device @@ -875,7 +907,6 @@ void ipath_kreceive(struct ipath_devdata struct ipath_message_header *hdr; u32 eflags, i, etype, tlen, pkttot = 0, updegr=0, reloop=0; static u64 totcalls; /* stats, may eventually remove */ - char emsg[128]; if (!dd->ipath_hdrqtailptr) { ipath_dev_err(dd, @@ -938,26 +969,9 @@ reloop: "%x\n", etype); } - if (eflags & ~(INFINIPATH_RHF_H_TIDERR | - INFINIPATH_RHF_H_IHDRERR)) { - get_rhf_errstring(eflags, emsg, sizeof emsg); - ipath_cdbg(PKT, "RHFerrs %x hdrqtail=%x typ=%u " - "tlen=%x opcode=%x egridx=%x: %s\n", - eflags, l, etype, tlen, bthbytes[0], - ipath_hdrget_index((__le32 *) rc), emsg); - /* Count local link integrity errors. 
*/ - if (eflags & (INFINIPATH_RHF_H_ICRCERR | - INFINIPATH_RHF_H_VCRCERR)) { - u8 n = (dd->ipath_ibcctrl >> - INFINIPATH_IBCC_PHYERRTHRESHOLD_SHIFT) & - INFINIPATH_IBCC_PHYERRTHRESHOLD_MASK; - - if (++dd->ipath_lli_counter > n) { - dd->ipath_lli_counter = 0; - dd->ipath_lli_errors++; - } - } - } else if (etype == RCVHQ_RCV_TYPE_NON_KD) { + if (unlikely(eflags)) + ipath_rcv_hdrerr(dd, eflags, l, etail, rc); + else if (etype == RCVHQ_RCV_TYPE_NON_KD) { int ret = __ipath_verbs_rcv(dd, rc + 1, ebuf, tlen); if (ret == -ENODEV) @@ -981,25 +995,7 @@ reloop: else if (etype == RCVHQ_RCV_TYPE_EXPECTED) ipath_dbg("Bug: Expected TID, opcode %x; ignored\n", be32_to_cpu(hdr->bth[0]) & 0xff); - else if (eflags & (INFINIPATH_RHF_H_TIDERR | - INFINIPATH_RHF_H_IHDRERR)) { - /* - * This is a type 3 packet, only the LRH is in the - * rcvhdrq, the rest of the header is in the eager - * buffer. - */ - u8 opcode; - if (ebuf) { - bthbytes = (u8 *) ebuf; - opcode = *bthbytes; - } - else - opcode = 0; - get_rhf_errstring(eflags, emsg, sizeof emsg); - ipath_dbg("Err %x (%s), opcode %x, egrbuf %x, " - "len %x\n", eflags, emsg, opcode, etail, - tlen); - } else { + else { /* * error packet, type of error unknown. * Probably type 3, but we don't know, so don't diff --git a/drivers/infiniband/hw/ipath/ipath_keys.c b/drivers/infiniband/hw/ipath/ipath_keys.c index 46773c6..a5ca279 100644 --- a/drivers/infiniband/hw/ipath/ipath_keys.c +++ b/drivers/infiniband/hw/ipath/ipath_keys.c @@ -197,6 +197,21 @@ int ipath_rkey_ok(struct ipath_ibdev *de size_t off; int ret; + /* + * We use RKEY == zero for physical addresses + * (see ipath_get_dma_mr). 
+ */ + if (rkey == 0) { + sge->mr = NULL; + sge->vaddr = phys_to_virt(vaddr); + sge->length = len; + sge->sge_length = len; + ss->sg_list = NULL; + ss->num_sge = 1; + ret = 1; + goto bail; + } + mr = rkt->table[(rkey >> (32 - ib_ipath_lkey_table_size))]; if (unlikely(mr == NULL || mr->lkey != rkey)) { ret = 0; diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c index 56ac336..d70a9b6 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.c +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c @@ -191,10 +191,6 @@ void ipath_skip_sge(struct ipath_sge_sta { struct ipath_sge *sge = &ss->sge; - while (length > sge->sge_length) { - length -= sge->sge_length; - ss->sge = *ss->sg_list++; - } while (length) { u32 len = sge->length; @@ -627,6 +623,7 @@ static int ipath_query_device(struct ib_ props->device_cap_flags = IB_DEVICE_BAD_PKEY_CNTR | IB_DEVICE_BAD_QKEY_CNTR | IB_DEVICE_SHUTDOWN_PORT | IB_DEVICE_SYS_IMAGE_GUID; + props->page_size_cap = PAGE_SIZE; props->vendor_id = ipath_layer_get_vendorid(dev->dd); props->vendor_part_id = ipath_layer_get_deviceid(dev->dd); props->hw_ver = ipath_layer_get_pcirev(dev->dd); diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c index d0f7731..deabc14 100644 --- a/drivers/infiniband/hw/mthca/mthca_cmd.c +++ b/drivers/infiniband/hw/mthca/mthca_cmd.c @@ -778,11 +778,12 @@ #define QUERY_FW_EQ_SET_CI_BASE_OFFSET 0 ((dev->fw_ver & 0xffff0000ull) >> 16) | ((dev->fw_ver & 0x0000ffffull) << 16); + MTHCA_GET(lg, outbox, QUERY_FW_MAX_CMD_OFFSET); + dev->cmd.max_cmds = 1 << lg; + mthca_dbg(dev, "FW version %012llx, max commands %d\n", (unsigned long long) dev->fw_ver, dev->cmd.max_cmds); - MTHCA_GET(lg, outbox, QUERY_FW_MAX_CMD_OFFSET); - dev->cmd.max_cmds = 1 << lg; MTHCA_GET(dev->catas_err.addr, outbox, QUERY_FW_ERR_START_OFFSET); MTHCA_GET(dev->catas_err.size, outbox, QUERY_FW_ERR_SIZE_OFFSET); diff --git a/drivers/infiniband/hw/mthca/mthca_srq.c 
b/drivers/infiniband/hw/mthca/mthca_srq.c index fab417c..b60a9d7 100644 --- a/drivers/infiniband/hw/mthca/mthca_srq.c +++ b/drivers/infiniband/hw/mthca/mthca_srq.c @@ -370,7 +370,8 @@ int mthca_modify_srq(struct ib_srq *ibsr return -EINVAL; if (attr_mask & IB_SRQ_LIMIT) { - if (attr->srq_limit > srq->max) + u32 max_wr = mthca_is_memfree(dev) ? srq->max - 1 : srq->max; + if (attr->srq_limit > max_wr) return -EINVAL; mutex_lock(&srq->mutex); diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 3f89f5e..474aa21 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -212,6 +212,7 @@ struct ipoib_path { struct ipoib_neigh { struct ipoib_ah *ah; + union ib_gid dgid; struct sk_buff_head queue; struct neighbour *neighbour; diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 1c6ea1c..cf71d2a 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -404,6 +404,8 @@ static void path_rec_completion(int stat list_for_each_entry(neigh, &path->neigh_list, list) { kref_get(&path->ah->ref); neigh->ah = path->ah; + memcpy(&neigh->dgid.raw, &path->pathrec.dgid.raw, + sizeof(union ib_gid)); while ((skb = __skb_dequeue(&neigh->queue))) __skb_queue_tail(&skqueue, skb); @@ -510,6 +512,8 @@ static void neigh_add_path(struct sk_buf if (path->ah) { kref_get(&path->ah->ref); neigh->ah = path->ah; + memcpy(&neigh->dgid.raw, &path->pathrec.dgid.raw, + sizeof(union ib_gid)); ipoib_send(dev, skb, path->ah, be32_to_cpup((__be32 *) skb->dst->neighbour->ha)); @@ -633,6 +637,25 @@ static int ipoib_start_xmit(struct sk_bu neigh = *to_ipoib_neigh(skb->dst->neighbour); if (likely(neigh->ah)) { + if (unlikely(memcmp(&neigh->dgid.raw, + skb->dst->neighbour->ha + 4, + sizeof(union ib_gid)))) { + spin_lock(&priv->lock); + /* + * It's safe to call ipoib_put_ah() inside + * priv->lock here, because we know that + * path->ah 
will always hold one more reference, + * so ipoib_put_ah() will never do more than + * decrement the ref count. + */ + ipoib_put_ah(neigh->ah); + list_del(&neigh->list); + ipoib_neigh_free(neigh); + spin_unlock(&priv->lock); + ipoib_path_lookup(skb, dev); + goto out; + } + ipoib_send(dev, skb, neigh->ah, be32_to_cpup((__be32 *) skb->dst->neighbour->ha)); goto out; diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index ab40488..b5e6a7b 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -264,6 +264,10 @@ static int ipoib_mcast_join_finish(struc if (!ah) { ipoib_warn(priv, "ib_address_create failed\n"); } else { + spin_lock_irq(&priv->lock); + mcast->ah = ah; + spin_unlock_irq(&priv->lock); + ipoib_dbg_mcast(priv, "MGID " IPOIB_GID_FMT " AV %p, LID 0x%04x, SL %d\n", IPOIB_GID_ARG(mcast->mcmember.mgid), @@ -271,10 +275,6 @@ static int ipoib_mcast_join_finish(struc be16_to_cpu(mcast->mcmember.mlid), mcast->mcmember.sl); } - - spin_lock_irq(&priv->lock); - mcast->ah = ah; - spin_unlock_irq(&priv->lock); } /* actually send any queued packets */ diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h index 5ff7755..585d28e 100644 --- a/include/rdma/ib_mad.h +++ b/include/rdma/ib_mad.h @@ -75,6 +75,7 @@ #define IB_MGMT_METHOD_REPORT_RESP 0x86 #define IB_MGMT_METHOD_TRAP_REPRESS 0x07 #define IB_MGMT_METHOD_RESP 0x80 +#define IB_BM_ATTR_MOD_RESP cpu_to_be32(1) #define IB_MGMT_MAX_METHODS 128 @@ -247,6 +248,12 @@ struct ib_mad_send_buf { }; /** + * ib_response_mad - Returns if the specified MAD has been generated in + * response to a sent request or trap. + */ +int ib_response_mad(struct ib_mad *mad); + +/** * ib_get_rmpp_resptime - Returns the RMPP response time. * @rmpp_hdr: An RMPP header. 
*/ From mshefty at ichips.intel.com Mon Jul 24 10:03:37 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 24 Jul 2006 10:03:37 -0700 Subject: [openib-general] [PATCH 3/4] IB CM: register and handle COMM_EST events on a QP In-Reply-To: <000101c6aee3$951b95a0$87f9070a@amr.corp.intel.com> References: <000101c6aee3$951b95a0$87f9070a@amr.corp.intel.com> Message-ID: <44C4FD69.9040005@ichips.intel.com> Sean Hefty wrote: > The race can only happen on the passive side, but... for peer to peer > connections, we won't know which side is passive until later, and this tracks > the local QPN as part of timewait now. (Note that peer to peer support requires > more work.) For peer to peer, tracking the QP would still only need to be done on the passive side, so that justification can be ignored. This means that we are tracking local QPN as part of timewait. - Sean From somenath at veritas.com Mon Jul 24 10:21:05 2006 From: somenath at veritas.com (somenath) Date: Mon, 24 Jul 2006 10:21:05 -0700 Subject: [openib-general] openib tools to update mellnox firmware In-Reply-To: <44C355A6.9080009@mellanox.co.il> References: <44BE6930.1080105@veritas.com> <44C355A6.9080009@mellanox.co.il> Message-ID: <44C50181.4000304@veritas.com> Tziporet Koren wrote: > somenath wrote: > >> I see this messages while booting my system: >> >> Jul 17 10:35:37 alekhine kernel: ib_mthca 0000:03:00.0: HCA FW >> version 4.6.0 is old (4.7.0 is current). >> Jul 17 10:35:37 alekhine kernel: ib_mthca 0000:03:00.0: If you have >> problems, try updating your HCA FW. >> >> where can I find the latest firmware for ib_mthca (pci express card) >> and tools to update this? >> >> thanks, som. >> >> >> > > Most updated FW is located at: > http://www.mellanox.com/support/firmware_download.php > > You can download a binary image and burn it using mstflint (which is > part of OFED), or download also MFT tools (from the same location) to > create a binary image. > > Tziporet > thanks Tziporet. 
From your link, I am looking at
http://www.mellanox.com/support/firmware_table.php
for the exact HCA I have. I can't find an exact match for the OPN and
Card Rev (PSID), and I don't know how to find those values. All I have
found about the card is this:

[root at alekhine bin]# ./ibstat
CA 'mthca0'
CA type: MT25208 (MT23108 compat mode)
Number of ports: 2
Firmware version: 4.6.0
Hardware version: a0
Node GUID: 0x0005ad0000039abc
System image GUID: 0x0005ad000100d050

Is there another tool that will give me the OPN/Card Rev (PSID) so that
I can match it exactly against the table above?

Also, another question: is the latest firmware backward compatible with
the openib stack I am using (from the Red Hat binary distribution, RH4
update 3)?

thanks, som.

From sean.hefty at intel.com Mon Jul 24 09:59:20 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Mon, 24 Jul 2006 09:59:20 -0700
Subject: [openib-general] [PATCH] svn checkin to merge upstream
In-Reply-To: <20060723092933.GA26317@mellanox.co.il>
Message-ID: <000101c6af42$81b73e10$ff0da8c0@amr.corp.intel.com>

I reverted svn checkin 8629 and replaced it with check-in 8646, shown below.
This only causes a minor change to the CM event handling patch.

Return error code, rather than success (0), on failure allocating
timewait_info in ib_send_cm_req.
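
[Illustration, not part of the checked-in patch] The failure mode described
above can be sketched in userspace roughly as follows. This is only a minimal
sketch under stated assumptions: err_ptr(), is_err() and ptr_err() are
stand-ins for the kernel's ERR_PTR/IS_ERR/PTR_ERR helpers, create_info() and
both send_req_*() functions are hypothetical names, and ret is assumed to hold
0 at the point of the check, as the description implies.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ERR_PTR/IS_ERR/PTR_ERR helpers. */
#define MAX_ERRNO 4095

static void *err_ptr(long err)
{
	/* Only small negative errno values can be encoded in a pointer. */
	assert(err < 0 && err >= -MAX_ERRNO);
	return (void *)(intptr_t)err;
}

static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

static long ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Hypothetical allocator: returns an error pointer when told to fail. */
static int dummy_object;

static void *create_info(int fail)
{
	return fail ? err_ptr(-ENOMEM) : (void *)&dummy_object;
}

/* Before the fix: ret is still 0 when we jump to out on failure. */
static int send_req_before(int fail)
{
	int ret = 0;
	void *info = create_info(fail);

	if (is_err(info))
		goto out;
	/* ... the rest of the request setup would go here ... */
out:
	return ret;
}

/* After the fix: the errno encoded in the pointer is propagated. */
static int send_req_after(int fail)
{
	int ret = 0;
	void *info = create_info(fail);

	if (is_err(info)) {
		ret = (int)ptr_err(info);
		goto out;
	}
	/* ... the rest of the request setup would go here ... */
out:
	return ret;
}
```

With these stand-ins, a failed allocation makes send_req_before() report
success while send_req_after() returns -ENOMEM, which is the behavior change
the check-in is after.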
Signed-off-by: Sean Hefty --- Index: cm.c =================================================================== --- cm.c (revision 8645) +++ cm.c (revision 8646) @@ -976,8 +976,10 @@ cm_id_priv->timewait_info = cm_create_timewait_info(cm_id_priv-> id.local_id); - if (IS_ERR(cm_id_priv->timewait_info)) + if (IS_ERR(cm_id_priv->timewait_info)) { + ret = PTR_ERR(cm_id_priv->timewait_info); goto out; + } ret = cm_init_av_by_path(param->primary_path, &cm_id_priv->av); if (ret) From mshefty at ichips.intel.com Mon Jul 24 10:35:21 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 24 Jul 2006 10:35:21 -0700 Subject: [openib-general] [librdmacm] ABI file in not being created by cma module, but the user level library tries to use it In-Reply-To: <000401c6aee6$8252e6a0$87f9070a@amr.corp.intel.com> References: <000401c6aee6$8252e6a0$87f9070a@amr.corp.intel.com> Message-ID: <44C504D9.7080407@ichips.intel.com> Sean Hefty wrote: > Yes - I inadvertently removed this fix when I removed libsysfs from the > librdmacm. I will re-instate the fix. 7497 was a fix that added libsysfs to the librdmacm as a result of it being removed from libibverbs. In any case, I applied a fix to assume the latest ABI version. - Sean From afriedle at open-mpi.org Mon Jul 24 10:59:27 2006 From: afriedle at open-mpi.org (Andrew Friedley) Date: Mon, 24 Jul 2006 10:59:27 -0700 Subject: [openib-general] [PATCH] [RFC] librdmacm: expose device list to users In-Reply-To: <000101c6a6d4$87fe8600$ff0da8c0@amr.corp.intel.com> References: <000101c6a6d4$87fe8600$ff0da8c0@amr.corp.intel.com> Message-ID: <44C50A7F.7010404@open-mpi.org> What's the best way to map the ibv_context's return from rdma_get_devices() to rdma_cm_id's for use with the rest of the RDMA functions? Should I allocate my own rdma_cm_id and fill it in (like rdma_create_id()), or am I missing something? Andrew Sean Hefty wrote: > The following patch adds calls to the userspace RDMA CM to return its list > of RDMA devices. 
The calls are similar to ibv_get_device_list() / > ibv_free_device_list(). > > Currently, RDMA device contexts are handed to the user only after they > create an rdma_cm_id and bind it to a local device. By exposing the device > list to the user, it makes it easier for the user to allocate device > specific resources (such as PDs, CQs, etc.) that are shared among multiple > rdma_cm_id's. > > Signed-off-by: Sean Hefty > --- > Index: include/rdma/rdma_cma.h > =================================================================== > --- include/rdma/rdma_cma.h (revision 8215) > +++ include/rdma/rdma_cma.h (working copy) > @@ -332,4 +332,20 @@ static inline uint16_t rdma_get_dst_port > ((struct sockaddr_in *) &id->route.addr.dst_addr)->sin_port; > } > > +/** > + * rdma_get_devices - Get list of RDMA devices currently available. > + * @num_devices: If non-NULL, set to the number of devices returned. > + * > + * Return a NULL-terminated array of opened RDMA devices. Callers can use this > + * routine to allocate resources on specific RDMA devices that will be shared > + * across multiple rdma_cm_id's. > + * The array must be released by calling rdma_free_devices(). > + */ > +struct ibv_context **rdma_get_devices(int *num_devices); > + > +/** > + * rdma_free_devices - Frees the list of devices returned by rdma_get_devices(). 
> + */ > +void rdma_free_devices(struct ibv_context **list); > + > #endif /* RDMA_CMA_H */ > Index: src/cma.c > =================================================================== > --- src/cma.c (revision 8517) > +++ src/cma.c (working copy) > @@ -216,6 +216,32 @@ err: > return ret; > } > > +struct ibv_context **rdma_get_devices(int *num_devices) > +{ > + struct ibv_context **devs = NULL; > + int i; > + > + if (!cma_dev_cnt && ucma_init()) > + goto out; > + > + devs = malloc(sizeof *devs * (cma_dev_cnt + 1)); > + if (!devs) > + goto out; > + > + for (i = 0; i < cma_dev_cnt; i++) > + devs[i] = cma_dev_array[i].verbs; > + devs[i] = NULL; > +out: > + if (num_devices) > + *num_devices = devs ? cma_dev_cnt : 0; > + return devs; > +} > + > +void rdma_free_devices(struct ibv_context **list) > +{ > + free(list); > +} > + > static void __attribute__((destructor)) rdma_cma_fini(void) > { > ucma_cleanup(); > Index: src/librdmacm.map > =================================================================== > --- src/librdmacm.map (revision 8215) > +++ src/librdmacm.map (working copy) > @@ -21,5 +21,7 @@ RDMACM_1.0 { > rdma_get_dst_attr; > rdma_join_multicast; > rdma_leave_multicast; > + rdma_get_devices; > + rdma_free_devices; > local: *; > }; > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From ishai at mellanox.co.il Mon Jul 24 09:56:02 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Mon, 24 Jul 2006 19:56:02 +0300 Subject: [openib-general] [SRP] [RFC] Needed changes to support fail-over drivers Message-ID: <20060724165602.GA8600@mellanox.co.il> Hi, The current SRP initiator code cannot work with several fail-over mechanisms. The current srp driver's behavior when a target off-line then online: 1) The target is offline. 
2) The initiator tries to reconnect and fails.
3) The initiator calls srp_remove_work, which removes the scsi_host.
4) The target is back online.
5) The user (or the ibsrpdm daemon) is expected to execute a new add_target.
6) This creates a new scsi_host for this target (with new device names and a
   new index in the scsi_host directory in sysfs).

Fail-over drivers (e.g., MPP, which is used by Engenio, and XVM, which is used
by SGI) have problems with this behavior (item 3). They need the scsi_host to
keep existing and to return errors in the meantime, until the connection to the
target resumes. In addition, removing and re-allocating the scsi_host is a
"heavy" operation compared with only disconnecting and reconnecting the
connection.

In order to support these tools, I propose the following changes, which will
allow the user to move the srp initiator to a disconnected state (when the
target leaves the fabric) and reconnect it later (when the target returns to
the fabric). Once these changes are in the ib_srp module, the ibsrpdm daemon
will be able to monitor the presence of targets in the fabric and use this
interface when targets leave or rejoin the fabric.

Here is the description of the new design (I have already implemented most of
the code):

1) Split the function srp_reconnect_target into two functions:
   _srp_disconnect_target and _srp_reconnect_target.
2) Add two new states: SRP_TARGET_DISCONNECTED (the state after
   _srp_disconnect_target was executed and before _srp_reconnect_target is
   executed) and SRP_TARGET_DISCONNECTING (the state while in
   srp_remove_target).
3) Add new input files in sysfs:
   /sys/class/scsi_host/host?/{disconnect_target,connect_target,erase_target}
4) Writing the string "remove" to /sys/class/scsi_host/host?/disconnect_target
   calls srp_disconnect_target, which moves the corresponding target to the
   SRP_TARGET_DISCONNECTED state (after closing the cm and resetting all
   pending requests). Now when the scsi layer performs queuecommand to this
   host, the result is DID_NO_CONNECT.
   This causes the scsi mid-layer to return to the user with an IO error
   without initiating the scsi error auto recovery chain.
5) Writing anything to /sys/class/scsi_host/host?/reconnect_target calls
   _srp_reconnect_target, which moves the target to the SRP_TARGET_LIVE state
   again.
6) Writing "erase" to /sys/class/scsi_host/host?/erase_target calls
   srp_remove_work, which removes the scsi_host.
7) Add output files in sysfs to present the HCA and port that the initiator
   used to connect to the target. Using these files and the target GUID,
   ibsrpdm can determine on which scsi_host to perform the reconnect_target.

Please comment.

--
Ishai Rabinovitz

From mst at mellanox.co.il Mon Jul 24 11:10:06 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 24 Jul 2006 21:10:06 +0300
Subject: [openib-general] 3 patches in mst-for-2.6.18
In-Reply-To:
References:
Message-ID: <20060724181006.GA11915@mellanox.co.il>

Quoting r. Roland Dreier :
> Subject: Re: 3 patches in mst-for-2.6.18
>
> > Roland, I have put the following patches in my mst-for-2.6.18 tree:
>
> OK, I have all three.
>
> I updated this:
>
> > [PATCH] IB/mad: validate MADs for spec compliance C13-18.1.1
>
> with the following, which uninlines ib_response_mad() for the
> following size savings:
>
>    text    data     bss     dec     hex filename
> 2017675  434628  175844 2628147  281a33 ../vmlinux.orig
> 2017539  434652  175844 2628035  2819c3 ../vmlinux.noinline

Looks fine.

One point to take into account, in my experience, is that to be fair with size
savings checks, you should build with -Os. Otherwise the compiler seems to
behave in a wildly unpredictable way with respect to code size - sometimes
removing code results in size increases, as the compiler decides to e.g. unroll
some loops.
--
MST

From sean.hefty at intel.com Mon Jul 24 10:48:30 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Mon, 24 Jul 2006 10:48:30 -0700
Subject: [openib-general] 3 patches in mst-for-2.6.18
In-Reply-To:
Message-ID: <000201c6af49$606d5850$ff0da8c0@amr.corp.intel.com>

>with the following, which uninlines ib_response_mad() for the
>following size savings:

I've committed this change to svn as well. (I committed while writing the svn
log message, so it's truncated.)

- Sean

From rdreier at cisco.com Mon Jul 24 11:39:40 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 24 Jul 2006 11:39:40 -0700
Subject: [openib-general] 3 patches in mst-for-2.6.18
In-Reply-To: <20060724181006.GA11915@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 24 Jul 2006 21:10:06 +0300")
References: <20060724181006.GA11915@mellanox.co.il>
Message-ID:

> One point to take into account, in my experience, is that to be fair with size
> savings checks, you should build with -Os.

Yes, I have CONFIG_CC_OPTIMIZE_FOR_SIZE=y

From sean.hefty at intel.com Mon Jul 24 11:41:19 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Mon, 24 Jul 2006 11:41:19 -0700
Subject: [openib-general] [PATCH] [RFC] librdmacm: expose device list to users
In-Reply-To: <44C50A7F.7010404@open-mpi.org>
Message-ID: <000301c6af50$c1581ea0$ff0da8c0@amr.corp.intel.com>

>What's the best way to map the ibv_context's return from
>rdma_get_devices() to rdma_cm_id's for use with the rest of the RDMA
>functions?

Once an rdma_cm_id has been bound to a device, it will reference one of the
devices (ibv_context) that is also returned from rdma_get_devices(). I.e. the
pointers should be the same.

Is this the information you're looking for?
- Sean From afriedle at open-mpi.org Mon Jul 24 12:35:14 2006 From: afriedle at open-mpi.org (Andrew Friedley) Date: Mon, 24 Jul 2006 12:35:14 -0700 Subject: [openib-general] [PATCH] [RFC] librdmacm: expose device list to users In-Reply-To: <000301c6af50$c1581ea0$ff0da8c0@amr.corp.intel.com> References: <000301c6af50$c1581ea0$ff0da8c0@amr.corp.intel.com> Message-ID: <44C520F2.2020909@open-mpi.org> Sean Hefty wrote: >>What's the best way to map the ibv_context's return from >>rdma_get_devices() to rdma_cm_id's for use with the rest of the RDMA >>functions? > > > Once an rdma_cm_id has been bound to a device, it will reference one of the > devices (ibv_context) that's also return from rdma_get_devices(). I.e. the > pointers should be the same. > > Is this the information you're looking for? Not quite - that's backwards from how I'm looking to do it. I want to use rdma_get_devices() (or something similar) to see which devices are available, then get/create rdma_cm_id's for each of those devices, without having something like an IP address beforehand. Andrew > > - Sean From eitan at mellanox.co.il Mon Jul 24 12:47:15 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 24 Jul 2006 22:47:15 +0300 Subject: [openib-general] [PATCH] osm: bug on SM generated Notice Reports Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3028D9669@mtlexch01.mtl.com> Hi Hal, > Are the network order routines ( ib_notice_set_prod_type and > ib_notice_set_vend_id) still needed or should they be eliminated ? [EZ] I propose to keep them. From sean.hefty at intel.com Mon Jul 24 12:44:53 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 24 Jul 2006 12:44:53 -0700 Subject: [openib-general] [PATCH] [RFC] librdmacm: expose device list to users In-Reply-To: <44C520F2.2020909@open-mpi.org> Message-ID: <000401c6af59$a25b9960$ff0da8c0@amr.corp.intel.com> >Not quite - that's backwards from how I'm looking to do it. 
I want to >use rdma_get_devices() (or something similar) to see which devices are >available, then get/create rdma_cm_id's for each of those devices, >without having something like an IP address beforehand. It sounds like you just want to call verbs directly to get a device list and use that. The rdma_cm is designed around providing a socket-like interface based on IP addressing, so I'm not sure it gives you what you want. - Sean From afriedle at open-mpi.org Mon Jul 24 12:51:26 2006 From: afriedle at open-mpi.org (Andrew Friedley) Date: Mon, 24 Jul 2006 12:51:26 -0700 Subject: [openib-general] [PATCH] [RFC] librdmacm: expose device list to users In-Reply-To: <000401c6af59$a25b9960$ff0da8c0@amr.corp.intel.com> References: <000401c6af59$a25b9960$ff0da8c0@amr.corp.intel.com> Message-ID: <44C524BE.1020208@open-mpi.org> Sean Hefty wrote: >>Not quite - that's backwards from how I'm looking to do it. I want to >>use rdma_get_devices() (or something similar) to see which devices are >>available, then get/create rdma_cm_id's for each of those devices, >>without having something like an IP address beforehand. > > > It sounds like you just want to call verbs directly to get a device list and use > that. The rdma_cm is designed around providing a socket-like interface based on > IP addressing, so I'm not sure it gives you what you want. Yes, that's what I've been doing, though you recommended using the RDMA CM to set up multicast groups. What interface should I be using then? > - Sean From koop at cse.ohio-state.edu Mon Jul 24 13:40:20 2006 From: koop at cse.ohio-state.edu (Matthew Koop) Date: Mon, 24 Jul 2006 16:40:20 -0400 (EDT) Subject: [openib-general] mvapich2 pmi scalability problems In-Reply-To: Message-ID: We've looked at the issue a bit more and discussed off-list, and the issue has been resolved through the attached patch. For best performance, the upcoming release of SLURM (1.1.5) with better optimization of PMI should also be used. 
Thanks, Matthew Koop - Network-Based Computing Lab Ohio State University On Fri, 21 Jul 2006 Don.Dhondt at Bull.com wrote: > Matthew, > > We build mvapich2 using the make.mvapich2.gen2 script. > Within that script file is the following: > # Whether to use an optimized queue pair exchange scheme. This is not > # checked for a setting in the script. It must be set here explicitly. > # Supported: "-DUSE_MPD_RING" and "" (to disable) > if [ $ARCH = "_PPC64_" ]; then > HAVE_MPD_RING="" > else > HAVE_MPD_RING="-DUSE_MPD_RING" > fi > > Since we are compiling for ia64 our assumption is it compiled with > HAVE_MPD_RING="-DUSE_MPD_RING". Is this correct? > Also, we are not using mpd to start the jobs. Since we are > using slurm as the resource manager the jobs are started with > srun. Does MPD_RING only apply if using MPD? > > -Don > > > > > Matthew Koop > 07/21/2006 11:51 AM > > To > Don.Dhondt at Bull.com > cc > openib-general at openib.org > Subject > Re: [openib-general] mvapich2 pmi scalability problems > > > > > > > Don, > > Are you using the USE_MPD_RING flag when compiling? If not, can you give > that a try? It should very significantly decrease the number of PMI calls > that are made.
> > Thanks, > > Matthew Koop > > > > > > > -------------- next part -------------- Index: src/mpid/osu_ch3/channels/mrail/include/mpidi_ch3_pre.h =================================================================== --- src/mpid/osu_ch3/channels/mrail/include/mpidi_ch3_pre.h (revision 377) +++ src/mpid/osu_ch3/channels/mrail/include/mpidi_ch3_pre.h (working copy) @@ -97,6 +97,7 @@ struct MPID_Request * send_active; struct MPID_Request * recv_active; int local_nodes; + int hostid; } MPIDI_CH3I_SMP_VC; #endif Index: src/mpid/osu_ch3/channels/mrail/src/rdma/ch3_smp_progress.c =================================================================== --- src/mpid/osu_ch3/channels/mrail/src/rdma/ch3_smp_progress.c (revision 377) +++ src/mpid/osu_ch3/channels/mrail/src/rdma/ch3_smp_progress.c (working copy) @@ -1069,6 +1069,16 @@ /** exchange address hostid using PMI interface **/ if (pg_size > 1) { +#ifdef USE_MPD_RING + for(i = 0; i < pg_size; i++) { + MPIDI_PG_Get_vc(pg, i, &vc); + if(i == pg_rank) { + hostnames_j[i] = hostid; + } else { + hostnames_j[i] = vc->smp.hostid; + } + } +#else char *key; char *val; @@ -1167,8 +1177,8 @@ mpi_errno); return mpi_errno; } +#endif /* end !MPD_RING */ - } /** end of exchange address **/ Index: src/mpid/osu_ch3/channels/mrail/src/gen2/rdma_iba_priv.c =================================================================== --- src/mpid/osu_ch3/channels/mrail/src/gen2/rdma_iba_priv.c (revision 377) +++ src/mpid/osu_ch3/channels/mrail/src/gen2/rdma_iba_priv.c (working copy) @@ -1209,6 +1209,10 @@ rdma_iba_addr_table.lid[i][0], local_addr_len, QPLEN_XDR); +#ifdef _SMP_ + vc->smp.hostid = rdma_iba_addr_table.hostid[i][0]; +#endif + /* Get the qp, key and buffer for this process */ temp_ptr = alladdr_inv + pg_rank * QPLEN_XDR; From sean.hefty at intel.com Mon Jul 24 13:51:04 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 24 Jul 2006 13:51:04 -0700 Subject: [openib-general] [PATCH] [RFC] librdmacm: expose device list to users In-Reply-To: 
<44C524BE.1020208@open-mpi.org> Message-ID: <000601c6af62$e1be78d0$ff0da8c0@amr.corp.intel.com> >Yes, that's what I've been doing, though you recommended using the RDMA >CM to set up multicast groups. What interface should I be using then? That's the problem. There is no raw userspace SA query interface. The only other alternative is to send MADs directly to the SA through the SA MAD interface, but doing so would bypass the kernel ib_multicast module which is responsible for tracking multicast join / leave requests. (In fact, the userspace MAD interface should eventually prevent a userspace app from sending raw join/leave requests.) Are you trying to have each port join the same multicast group? Are you trying to join the same multicast group multiple times from the same port? How are you determining the multicast group parameters? And who creates the multicast group? - Sean From halr at voltaire.com Mon Jul 24 13:09:05 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 24 Jul 2006 16:09:05 -0400 Subject: [openib-general] [PATCH]OpenSM/osm_prtn.c: Set exactly selectors for MTU and rate as well as in component mask prior to calling osm_mcmr_rcv_find_or_create_new_mgrp Message-ID: <1153771744.4537.81507.camel@hal.voltaire.com> OpenSM/osm_prtn.c: Set exactly selectors for MTU and rate as well as in component mask prior to calling osm_mcmr_rcv_find_or_create_new_mgrp Signed-off-by: Sasha Khapyorsky Signed-off-by: Hal Rosenstock Index: opensm/osm_prtn.c =================================================================== --- opensm/osm_prtn.c (revision 8640) +++ opensm/osm_prtn.c (working copy) @@ -216,17 +216,18 @@ ib_api_status_t osm_prtn_add_mcgroup(osm memcpy(&mc_rec.mgid.raw[4], &pkey, sizeof(pkey)); mc_rec.qkey = CL_HTON32(0x0b1b); - mc_rec.mtu = mtu ? mtu : 4; /* 2048 Bytes */ + mc_rec.mtu = (mtu ? mtu : 4) | (2 << 6); /* 2048 Bytes */ mc_rec.tclass = 0; mc_rec.pkey = pkey; - mc_rec.rate = rate ? rate : 0x3; /* 10Gb/sec */ + mc_rec.rate = (rate ? 
rate : 0x3) | (2 << 6); /* 10Gb/sec */ mc_rec.pkt_life = OSM_DEFAULT_SUBNET_TIMEOUT; mc_rec.sl_flow_hop = ib_member_set_sl_flow_hop(p->sl, 0, 0); /* Note: scope needs to be consistent with MGID */ mc_rec.scope_state = 0x21; /* don't update rate, mtu */ - comp_mask = IB_MCR_COMPMASK_MTU | IB_MCR_COMPMASK_RATE; + comp_mask = IB_MCR_COMPMASK_MTU | IB_MCR_COMPMASK_MTU_SEL + | IB_MCR_COMPMASK_RATE | IB_MCR_COMPMASK_RATE_SEL; status = osm_mcmr_rcv_find_or_create_new_mgrp(&p_sa->mcmr_rcv, comp_mask, &mc_rec, &p_mgrp); if (!p_mgrp || status != IB_SUCCESS) From tziporet at mellanox.co.il Mon Jul 24 14:45:04 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 25 Jul 2006 00:45:04 +0300 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 planning meeting - summary Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA74FE@mtlexch01.mtl.com> Hi all, This is the outcome of the meeting we had today regarding OFED 1.1 schedule and features. Tziporet 1. Schedule: ============ Target release date: 31-Aug Intermediate milestones: 1. Create 1.1 branch of user level code and rc1: 27-Jul 2. Feature freeze : 3-Aug 3. Code freeze (rc-x): 25-Aug 4. Final release: 31-Aug In general all agreed but it seems aggressive schedule. We will delay in 1 week if needed or drop some features. There was a request for another OFED release toward SC06 that will include most updated Open MPI version and we agreed this is possible. git tree of kernel code will be available on Sandia servers once their system administrator will setup the server with git installed (should be this week) 2. Features: ============ Note: features that are under low priority may not be qualified in final release due to schedule limitations. 1. OS: Novell: - SLES 9.0 SP3* - SLES10 (official release)* Redhat: - Redhat EL4 up3 - Redhat EL4 up4* kernel.org: - Kernel 2.6.17* * Changes from last release Note: Redhat EL4 up2, Fedora C4 and SuSE Pro 10 were dropped from the list. 
We will keep the backport patches for these OSes and make sure OFED compiles and loads properly, but will not do a full QA cycle. 2. General changes: - lib32 on 64-bit systems - Kernel code based on 2.6.18 - HCA fatal - full flow support - Low priority - High Availability in IPoIB and SRP - Bug fixes 3. OSM (new code based on the trunk): - Partition Manager (Pkey) - Low priority - Pre-computed routing load from file - Primitive QoS - As technology preview Mainly developed and verified by Voltaire. 4. SDP: - Beta quality (higher stability) - Improved latency - Improved bandwidth of small messages (Nagle algorithm) - done - Support the backlog parameter in the listen call - Support sending/receiving out of band data - Interoperability with previous SDP implementation We need SDP from Cisco to test the interoperability with their SDP 5. SRP: - GA quality - DM (Device Mapper) - for high availability - Basic failover/failback testing with daemon+srp+XVM/MPP and Engenio target A technical mail was published on the general list. (Subject: Needed changes to support fail-over drivers). Need help from Roland to close the technical details since he is the SRP maintainer. 6. IPoIB: - High availability support using a daemon in user level 7. uDAPL: - Scalability features needed for Intel MPI - Going to take the new code from the trunk 8. OSU - MVAPICH: - Based on 0.97 (+ bug fixes) - Message coalescing 9. Open MPI: - Open MPI 1.1.1 - Depending on the dates/schedule of OFED 1.1 and Open MPI 1.1.1 (If not then Open MPI 1.1 will be used) - The major differences between Open MPI 1.1 and 1.1.1 can be seen here: http://svn.open-mpi.org/svn/ompi/trunk/NEWS 10. MPI tests: - Replace with the new test versions from LLNL, Intel, OSU 11. iSER: - Stability - code review and bug fixes in iser and libiscsi code related to error handling (libiscsi is a service module used by both iscsi_tcp & iser) - Testing more platforms (e.g.
ppc64 and ia64) - Performance improvements - The libiscsi fixes are (2.6.18-rc2/3) and will (2.6.19) be pushed upstream and from there be propagated to distros (eg SLES10 RH5) through their merge process. 12. RDS: - Oracle and SilverStorm should update Need to decide if RDS should be removed from OFED since Oracle does not support it for now. Sujal will check it and we will get to decision soon. 13: Management: - Madeye utility - Add saquery tool - Enhancement to ibnetdiscover tool with grouping function - New ibutils package: o Port error counter check o Port performance counters dump o Link width and Link Speed check by flag -----Original Message----- From: openfabrics-ewg-bounces at openib.org [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Sujal Das Sent: Monday, July 17, 2006 8:32 PM To: openfabrics-ewg at openib.org Subject: [openfabrics-ewg] OFED 1.1 planning meeting Hello all, We would like to call an OFED v1.1 planning meeting ASAP. We can use the regular time and call-in number that Shawn/Jeff from Cisco had set up earlier. Shawn/Jeff: will you please confirm and send a reminder? Some critical feature requirements for OFED 1.1 are based on customer feedback on OFED 1.0, specifically some missing features that continue the perception of OFED 1.0 as being inferior to proprietary stacks: - Support for SLES 9 in general - SRP failover and MPP - SDP usability - RDS usability - UDAPL scalability - High Availability - Additional OS/CPU platforms support for iSER, sync with upstream kernel - Etc Thanks, Sujal _______________________________________________ openfabrics-ewg mailing list openfabrics-ewg at openib.org http://openib.org/mailman/listinfo/openfabrics-ewg -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From afriedle at open-mpi.org Mon Jul 24 15:04:58 2006 From: afriedle at open-mpi.org (Andrew Friedley) Date: Mon, 24 Jul 2006 15:04:58 -0700 Subject: [openib-general] [PATCH] [RFC] librdmacm: expose device list to users In-Reply-To: <000601c6af62$e1be78d0$ff0da8c0@amr.corp.intel.com> References: <000601c6af62$e1be78d0$ff0da8c0@amr.corp.intel.com> Message-ID: <44C5440A.4030708@open-mpi.org> Sean Hefty wrote: >>Yes, that's what I've been doing, though you recommended using the RDMA >>CM to set up multicast groups. What interface should I be using then? > > > That's the problem. There is no raw userspace SA query interface. The only > other alternative is to send MADs directly to the SA through the SA MAD > interface, but doing so would bypass the kernel ib_multicast module which is > responsible for tracking multicast join / leave requests. (In fact, the > userspace MAD interface should eventually prevent a userspace app from sending > raw join/leave requests.) I'd rather not do all that. > Are you trying to have each port join the same multicast group? Possibly - could make some implementation easier, though for performance probably not. > Are you trying to join the same multicast group multiple times from the same port? Almost certainly not. > How are you determining the multicast group parameters? Which parameters are you referring to? > And who creates the multicast group? Don't know yet :) Right now I just have a few ideas I want to play around with, but first I need to be able to just set up and use IB/multicast with an API that meshes well with Open MPI. RDMA CM just about does it right - except for how interfaces/ports are presented. I just need to be able to get a list of rdma_cm_id's for the available interfaces. Really, ibverbs is sufficient (and does well) for what I need to do CM-wise, other than support for multicast. 
Andrew > - Sean From sean.hefty at intel.com Mon Jul 24 15:27:18 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 24 Jul 2006 15:27:18 -0700 Subject: [openib-general] [PATCH] [RFC] librdmacm: expose device list to users In-Reply-To: <44C5440A.4030708@open-mpi.org> Message-ID: <000701c6af70$52f2a730$ff0da8c0@amr.corp.intel.com> >> Are you trying to have each port join the same multicast group? > >Possibly - could make some implementation easier, though for performance > probably not. If you only want to join a group once per system, then using the following calls from the RDMA CM should work: rdma_create_id(); rdma_resolve_addr(multicast IP address); /* At this point, you'll have a device */ rdma_create_qp(); rdma_join_multicast(); You would only need to know the address of the multicast group to join, which is somewhat equivalent to knowing the MGID. > > How are you determining the multicast group parameters? > >Which parameters are you referring to? The values in the MCMemberRecord needed when creating or joining the multicast group: MGID, pKey, qKey, SL, flow label, traffic class, etc. >RDMA CM just about does it right - except for how interfaces/ports are >presented. I just need to be able to get a list of rdma_cm_id's for the >available interfaces. I'll think about this some, but I'm not sure there's much else that can be done. At some point, you need an address / identifier for the multicast group. >Really, ibverbs is sufficient (and does well) for what I need to do >CM-wise, other than support for multicast. Ibverbs doesn't really help establish connections, but I'm guessing that you exchange QP information over TCP sockets. 
- Sean From rdreier at cisco.com Mon Jul 24 15:34:14 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 24 Jul 2006 15:34:14 -0700 Subject: [openib-general] [SRP] [RFC] Needed changes to support fail-over drivers In-Reply-To: <20060724165602.GA8600@mellanox.co.il> (Ishai Rabinovitz's message of "Mon, 24 Jul 2006 19:56:02 +0300") References: <20060724165602.GA8600@mellanox.co.il> Message-ID: [CC'ing linux-scsi as well -- I think we'll get better insight from there] > The current SRP initiator code cannot work with several fail-over mechanisms. > > The current srp driver's behavior when a target off-line then online: > 1) The target is offline. > 2) the initiator tries to reconnect and fails > 3) The initiator calls srp_remove_work that removes the scsi_host. > 4) The target is back online. > 5) the user (or the ibsrpdm daemon) is expected to execute a new add_target. > 6) This creates a new scsi_host (with new names to the devices and new index in > the scsi_host directory in sysfs) for this target. > > Fail-over drivers (e.g., MPP that is used by Engenio and XVM that is used by > SGI) have problems with this behavior (item 3). They need the scsi_host to keep > exist and return errors in the meanwhile until the connection to the target > resumes. OK, but is this a valid assumption? What happens for iSCSI and/or iSER? > In addition remove/re-alloc scsi host is a "heavy" operation instead of > disconnect/reconnect the connection only. > > In order to support these tools I propose the following changes that will allow > the user to move the srp initiator to a disconnected state (when the target > leaves the fabric) and reconnect it later (when the target returns to the > fabric). Seems OK but see below... > After these changes will be in the ib_srp module, the ibsrpdm daemon will be > able to monitor the presence of targets in the fabric and to use this interface > (When targets leave or rejoin the fabric). 
How does the daemon know when something is gone for good vs. when it might come back? > Here is the description of the new design: (I already implemented most of the > code) > > 1) Split the function srp_reconnect_target into two functions: > _srp_disconnect_target and _srp_reconnect_target > > 2) Adding two new states: SRP_TARGET_DISCONNECTED (The state after > _srp_disconnect_target was executed and before _srp_reconnect_target is > executed) and SRP_TARGET_DISCONNECTING (The state while in srp_remove_target). > > 3) Adding new input files in sysfs: > /sys/class/scsi_host/host?/{disconnect_target,connect_target,erase_target} > > 4) Writing the string "remove" to /sys/class/scsi_host/host?/disconnect_target > calls srp_disconnect_target that moves the corresponding target to a > SRP_TARGET_DISCONNECTED state (After closing the cm, and reset all pending > requests). Now when the scsi performs queuecommand to this host the result is > DID_NO_CONNECT. This causes the scsi mid-layer to return to the user with an > IO error without initiating the scsi error auto recovery chain. Why does userspace need to be able to disconnect a connection? > 5) Writing anything to /sys/class/scsi_host/host?/reconnect_target calls > _srp_reconnect_target that move the target to SRP_TARGET_LIVE state again. > > 6) Writing "erase" to /sys/class/scsi_host/host?/erase_target calls > srp_remove_work that removes the scsi_host. Why the asymmetry here? In other words, why does anything work for reconnect_target but only the literal "erase" work for erase_target? > 7) Adding output files in sysfs to present the HCA and port that the initiator > used to connect to the target. Using these files and the target GUID the > ibsrpdm can know on which scsi_host to perform the reconnect_target. 
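The target states described in the quoted proposal can be modelled as a small state machine. The following is only an illustrative user-space sketch under stated assumptions, not ib_srp code: the state names SRP_TARGET_LIVE, SRP_TARGET_DISCONNECTED and SRP_TARGET_DISCONNECTING come from the proposal, while struct model_target, the model_*() helpers and the MODEL_* result codes are hypothetical stand-ins (MODEL_NO_CONNECT merely mimics the DID_NO_CONNECT result mentioned above).

```c
#include <assert.h>
#include <stddef.h>

/* State names follow the quoted proposal; everything else is hypothetical. */
enum model_target_state {
    SRP_TARGET_LIVE,
    SRP_TARGET_DISCONNECTING, /* transitional state in the RFC; not exercised here */
    SRP_TARGET_DISCONNECTED
};

/* Local stand-ins for SCSI result codes; these are not the kernel's values. */
enum model_result { MODEL_OK = 0, MODEL_NO_CONNECT = 1 };

struct model_target {
    enum model_target_state state;
    int host_no; /* models the scsi_host index, which must survive a disconnect */
};

/* Models a write to disconnect_target: drop the connection, keep the host. */
static void model_disconnect_target(struct model_target *t)
{
    assert(t != NULL);
    if (t->state == SRP_TARGET_LIVE)
        t->state = SRP_TARGET_DISCONNECTED;
}

/* Models a write to reconnect_target: only valid from the disconnected state. */
static int model_reconnect_target(struct model_target *t)
{
    assert(t != NULL);
    if (t->state != SRP_TARGET_DISCONNECTED)
        return -1;
    t->state = SRP_TARGET_LIVE;
    return 0;
}

/* Models queuecommand: I/O fails fast while the target is not live. */
static enum model_result model_queuecommand(const struct model_target *t)
{
    assert(t != NULL);
    return t->state == SRP_TARGET_LIVE ? MODEL_OK : MODEL_NO_CONNECT;
}
```

In this model a fail-over driver would see the same host_no before and after an outage: commands issued between model_disconnect_target() and model_reconnect_target() return MODEL_NO_CONNECT immediately instead of the host disappearing and being re-created under a new index.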
From michaelc at cs.wisc.edu Mon Jul 24 19:06:39 2006 From: michaelc at cs.wisc.edu (Mike Christie) Date: Mon, 24 Jul 2006 22:06:39 -0400 Subject: [openib-general] [SRP] [RFC] Needed changes to support fail-over drivers In-Reply-To: References: <20060724165602.GA8600@mellanox.co.il> Message-ID: <44C57CAF.7010502@cs.wisc.edu> Roland Dreier wrote: > [CC'ing linux-scsi as well -- I think we'll get better insight from there] > > > The current SRP initiator code cannot work with several fail-over mechanisms. > > > > The current srp driver's behavior when a target off-line then online: > > 1) The target is offline. > > 2) the initiator tries to reconnect and fails > > 3) The initiator calls srp_remove_work that removes the scsi_host. > > 4) The target is back online. > > 5) the user (or the ibsrpdm daemon) is expected to execute a new add_target. > > 6) This creates a new scsi_host (with new names to the devices and new index in > > the scsi_host directory in sysfs) for this target. > > > > Fail-over drivers (e.g., MPP that is used by Engenio and XVM that is used by > > SGI) have problems with this behavior (item 3). They need the scsi_host to keep > > exist and return errors in the meanwhile until the connection to the target > > resumes. > > OK, but is this a valid assumption? What happens for iSCSI and/or iSER? I do not see why the host has to remain constant for the above problem. I can understand why it may be easier to program though. However, this is not a requirement for other multipath drivers like dm-multipath or md multipath and I do not think you should rely on that type of behavior. The short story is that I think we are moving to something similar to what srp does very soon. The long story.... iscsi and iser allocate a host per session (session is allocated in the host's hostdata).
If there are problems with the connection (target goes unreachable for N number of seconds or we get some error value from the network layer, etc) we keep the host, session, connection, target and scsi devices around and try to reconnect. We then have a userspace daemon that tries to reconnect to the target and relogin. If we reconnect within X seconds (we call this the replacement_timeout and it is similar to the FC class dev_loss_tmo), we reuse those structs and go on as normal. If after replacement_timeout seconds we do not reconnect, we can remove the host, session, connection, target and scsi_devices or we can keep them around and reuse them if we later reconnect. If we remove those structs we later have to allocate new ones of course and will get a new host number. Whether we use the model of reusing the structs or removing them is controlled in userspace and we currently do the wrong thing by default and keep the structs around. I guess what we are supposed to do is something similar to the FC class where if dev_loss_tmo expires then we should remove the session, connection, target and devices. I am not sure if we should be removing the scsi host though. I think it makes sense to remove that too, since the host and session are so closely tied in our model. We are in the process of moving to the model where all the structs are removed as the default and only model we support, and it looks like we will do this in 2.6.19. From mst at mellanox.co.il Mon Jul 24 21:40:27 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 25 Jul 2006 07:40:27 +0300 Subject: [openib-general] openib tools to update mellnox firmware In-Reply-To: <44C50181.4000304@veritas.com> References: <44C50181.4000304@veritas.com> Message-ID: <20060725044027.GA7682@mellanox.co.il> Quoting r. somenath : > thanks Tziporet. > from ur link , I am looking here > http://www.mellanox.com/support/firmware_table.php > > for the exact HCA I have from your list.
I don't find the exact match of > OPN and Card Rev (PSID) and I don't know how to find those. > all I found about that card is this: > [root at alekhine bin]# ./ibstat > CA 'mthca0' > CA type: MT25208 (MT23108 compat mode) > Number of ports: 2 > Firmware version: 4.6.0 > Hardware version: a0 > Node GUID: 0x0005ad0000039abc > System image GUID: 0x0005ad000100d050 > > Is there another tool which will give me OPN/Card Rev (PSID) so that I > can match exactly from the table above? > Try this: cat /sys/class/infiniband/mthca0/board_id > also, another question: is the latest firmware backward compatible with > the openib stack I am using? (from redhat binary distribtution RH4 > update 3)? > > thanks, som. Should be. -- MST From sweitzen at cisco.com Mon Jul 24 22:11:36 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 24 Jul 2006 22:11:36 -0700 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 planning meeting - summary Message-ID: Cisco IB host drivers are available at http://www.cisco.com/cgi-bin/tablebuild.pl/sfs-linux and http://www.cisco.com/cgi-bin/tablebuild.pl/sfs-win2K. Scott From mlleinin at hpcn.ca.sandia.gov Tue Jul 25 00:26:10 2006 From: mlleinin at hpcn.ca.sandia.gov (Matt Leininger) Date: Tue, 25 Jul 2006 00:26:10 -0700 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 planning meeting - summary In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA74FE@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA74FE@mtlexch01.mtl.com> Message-ID: <1153812370.13875.162.camel@localhost> On Tue, 2006-07-25 at 00:45 +0300, Tziporet Koren wrote: > Hi all, > > This is the outcome of the meeting we had today regarding OFED 1.1 > schedule and features. > 5. SRP: > > – GA quality > > – DM (Device Mapper) - for high availability > > – Basic failover/failback testing with daemon+srp+XVM/MPP and > Engenio target > Tziporet, Are there any plans to test with the DDN SRP target? Several DoE sites are testing/using the DDN IB based storage. Thanks, - Matt From herbert at gondor.apana.org.au Tue Jul 25 00:39:45 2006 From: herbert at gondor.apana.org.au (Herbert Xu) Date: Tue, 25 Jul 2006 17:39:45 +1000 Subject: [openib-general] [PATCH Round 4 2/3] Core network changes to support network event notification.
In-Reply-To: <20060718184907.25455.8174.stgit@dell3.ogc.int> Message-ID: Steve Wise wrote: > > Routing redirect events are broadcast as a pair of rtmsgs, RTM_DELROUTE > and RTM_NEWROUTE. This may confuse existing rtnetlink users since you're generating an RTM_DELROUTE message that's identical to one triggered by something like 'ip route del'. As you're introducing a completely new RTM_ROUTEUPD type, it might be better to attach any information from the existing route that you need to the ROUTEUPD message. Actually, what was the reason you need the existing route here? > diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c > index 5f87533..33d8a83 100644 > --- a/net/ipv4/fib_semantics.c > +++ b/net/ipv4/fib_semantics.c > @@ -44,6 +44,7 @@ #include > #include > #include > #include > +#include > > #include "fib_lookup.h" > > @@ -279,6 +280,14 @@ void rtmsg_fib(int event, u32 key, struc > struct sk_buff *skb; > u32 pid = req ? req->pid : n->nlmsg_pid; > int size = NLMSG_SPACE(sizeof(struct rtmsg)+256); > + struct netevent_route_info nri; > + int netevent; > + > + nri.family = AF_INET; > + nri.data = &fa->fa_info; > + netevent = event == RTM_NEWROUTE ? NETEVENT_ROUTE_ADD > + : NETEVENT_ROUTE_DEL; > + call_netevent_notifiers(netevent, &nri); Hmm, this is broken. These route events are meaningless without the corresponding IP rule events. Are you sure you really want to make your hardware/driver grok multiple routing tables? Perhaps you should simply stick to dst entries and flush all your tables when the routes are changed. This is what the Linux IP stack does. 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c > index 2dc6dbb..18879e6 100644 > --- a/net/ipv4/route.c > +++ b/net/ipv4/route.c > @@ -1117,6 +1120,52 @@ static void rt_del(unsigned hash, struct > spin_unlock_bh(rt_hash_lock_addr(hash)); > } > > +static void rtm_redirect(struct rtable *old, struct rtable *new) > +{ > + struct netevent_redirect netevent; > + struct sk_buff *skb; > + int err; > + > + netevent.old = &old->u.dst; > + netevent.new = &new->u.dst; > + > + /* notify netevent subscribers */ > + call_netevent_notifiers(NETEVENT_REDIRECT, &netevent); > + > + /* Post NETLINK messages: RTM_DELROUTE for old route, > + RTM_NEWROUTE for new route */ > + skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC); Please use a better size estimate rather than NLMSG_GOODSIZE here since you're doing GFP_ATOMIC. > @@ -1442,6 +1493,32 @@ unsigned short ip_rt_frag_needed(struct > return est_mtu ? : new_mtu; > } > > +static void rtm_pmtu_update(struct rtable *rt) > +{ > + struct sk_buff *skb; > + int err; > + > + call_netevent_notifiers(NETEVENT_PMTU_UPDATE, &rt->u.dst); > + > + skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC); Ditto. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From jackm at mellanox.co.il Tue Jul 25 01:36:41 2006 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Tue, 25 Jul 2006 11:36:41 +0300 Subject: [openib-general] [librdmacm] ABI file in not being created by cma module, but the user level library tries to use it In-Reply-To: <44C504D9.7080407@ichips.intel.com> References: <000401c6aee6$8252e6a0$87f9070a@amr.corp.intel.com> <44C504D9.7080407@ichips.intel.com> Message-ID: <200607251136.42137.jackm@mellanox.co.il> On Monday 24 July 2006 20:35, Sean Hefty wrote: > Sean Hefty wrote: > > Yes - I inadvertently removed this fix when I removed libsysfs from the > > librdmacm. I will re-instate the fix. 
> > 7497 was a fix that added libsysfs to the librdmacm as a result of it being > removed from libibverbs. In any case, I applied a fix to assume the latest > ABI version. > > - Sean Thanks, BTW, 7497 WAS the mod that removed the fix -- see patch file below with diff of cma.c between 7496 and 7497. Jack ---------------------------------- Index: cma.c =================================================================== --- cma.c (revision 7496) +++ cma.c (revision 7497) @@ -49,6 +49,8 @@ #include #include +#include + #include #include #include @@ -140,7 +142,8 @@ static void ucma_cleanup(void) static int check_abi_version(void) { char path[256]; - char val[16]; + struct sysfs_attribute *attr; + int ret = -1; if (sysfs_get_mnt_path(path, sizeof path)) { fprintf(stderr, "librdmacm: couldn't find sysfs mount.\n"); @@ -148,17 +151,33 @@ static int check_abi_version(void) } strncat(path, "/class/misc/rdma_cm/abi_version", sizeof path); - if (!sysfs_read_attribute_value(path, val, sizeof val)) - abi_ver = strtol(val, NULL, 10); + + attr = sysfs_open_attribute(path); + if (!attr) { + fprintf(stderr, "librdmacm: couldn't open rdma_cm ABI version.\n"); + return -ENOSYS; + } + + if (sysfs_read_attribute(attr)) { + fprintf(stderr, "librdmacm: couldn't read rdma_cm ABI version.\n"); + goto out; + } + + abi_ver = strtol(attr->value, NULL, 10); if (abi_ver < RDMA_USER_CM_MIN_ABI_VERSION || abi_ver > RDMA_USER_CM_MAX_ABI_VERSION) { fprintf(stderr, "librdmacm: kernel ABI version %d " "doesn't match library version %d.\n", abi_ver, RDMA_USER_CM_MAX_ABI_VERSION); - return -ENOSYS; + goto out; } - return 0; + + ret = 0; + +out: + sysfs_close_attribute(attr); + return ret; } static int ucma_init(void) From monil at voltaire.com Tue Jul 25 02:27:47 2006 From: monil at voltaire.com (Moni Levy) Date: Tue, 25 Jul 2006 12:27:47 +0300 Subject: [openib-general] [openfabrics-ewg] IPoIB "bonding" solution for OFED 1.1 (was re: [PATCH] ipoib: fix address update handling (was Re: OFED 1.1 release 
- schedule and features)) In-Reply-To: <44BF7323.7010103@mellanox.co.il> References: <20060717140349.GA19080@mellanox.co.il> <44BE1254.8070403@voltaire.com> <44BF7323.7010103@mellanox.co.il> Message-ID: <6a122cc00607250227o460082cu77d85d1057eb34e3@mail.gmail.com> On 7/20/06, Tziporet Koren wrote: > Or Gerlitz wrote: > > Hi Tziporet, > > > > Do you have an initial drop of the bonding solution planned for OFED 1.1 > > that is ready to see the daylight? if not, when is this expected? > > > > As i mentioned to you, we are investigating few possible ways to > > implement HA for IPoIB and want to examine your approach as well. > > > > Or. > > > > > > > Vlad already answered. We will be happy for any help in this area. Tziporet, In order to get as much cooperation as possible I think that we should post an RFC about that before implementing it & before getting the implementation in OFED 1.1. We looked into the more standard implementation that uses bonding device and tried to find out what the issues are. More then that I'm not sure that what you, guys, suggest will work if we have multicast applications running. 
-- Moni > Tziporet > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg > > From k_mahesh85 at yahoo.co.in Tue Jul 25 03:05:03 2006 From: k_mahesh85 at yahoo.co.in (keshetti mahesh) Date: Tue, 25 Jul 2006 11:05:03 +0100 (BST) Subject: [openib-general] connection loss handling in MTHCA Message-ID: <20060725100503.60345.qmail@web8321.mail.in.yahoo.com> Hello all, I have a question about "connection loss" handling in the mthca driver. Consider the following situation: during a data transfer between two connected endpoints, one endpoint (at the HCA level) detects that the connection is lost for some reason, such as "receive queue empty on remote end" or "TPT error for data buffer on remote end". How is this handled in the mthca implementation? What happens to the WRs that are in progress and outstanding? Is there an asynchronous event generated for this? -Mahesh From jlentini at netapp.com Tue Jul 25 06:51:19 2006 From: jlentini at netapp.com (James Lentini) Date: Tue, 25 Jul 2006 09:51:19 -0400 (EDT) Subject: [openib-general] [PATCH 3/4] IB CM: register and handle COMM_EST events on a QP In-Reply-To: <000301c6ad10$32d946a0$e598070a@amr.corp.intel.com> References: <000301c6ad10$32d946a0$e598070a@amr.corp.intel.com> Message-ID: Sean, Thanks for working on this.
One minor suggestion below: On Fri, 21 Jul 2006, Sean Hefty wrote: > @@ -3308,6 +3430,7 @@ static void cm_add_one(struct ib_device > > cm_dev->device = device; > cm_dev->ca_guid = device->node_guid; > + ib_register_cm_handler(device, cm_event_handler); If ib_register_cm_handler() fails, how about logging an error message: if (ib_register_cm_handler(device, cm_event_handler)) printk(KERN_WARNING "IB CM unable to register CM handler\n"); > > set_bit(IB_MGMT_METHOD_SEND, reg_req.method_mask); > for (i = 1; i <= device->phys_port_cnt; i++) { From ogerlitz at voltaire.com Tue Jul 25 07:17:22 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 25 Jul 2006 17:17:22 +0300 Subject: [openib-general] [PATCH 3/4] IB CM: register and handle COMM_EST events on a QP In-Reply-To: <000201c6aee4$1f54b760$87f9070a@amr.corp.intel.com> References: <000201c6aee4$1f54b760$87f9070a@amr.corp.intel.com> Message-ID: <44C627F2.906@voltaire.com> Sean Hefty wrote: >> +/** >> + * ib_cm_establish - Forces a connection state to established. >> + * @cm_id: Connection identifier to transition to established. >> + * >> + * This routine should be invoked by users who receive messages on a >> + * connected QP before an RTU has been received. >> + */ >> +int ib_cm_establish(struct ib_cm_id *cm_id); >> >> As with your patch ib_cm_establish is now called internally from within >> the CM with the suggested solution good for both user and kernel space >> connections/qps, how about removing it from the API? > > I thought about removing it, but it changes the ABI, and existing users may > still want to call ib_cm_establish() after processing a receive completion. > This will let them immediately send a response without waiting for either the > RTU or the COMM_EST event.
there are three types of CM/VERBS consumers: 1) work over the CM/VERBS 2) work over the CMA/VERBS with the CMA doing the QP state changes 3) work over the CMA/VERBS with the ULP doing the QP state changes Your patch provides ***full*** solution for type 2 consumers. Type 3 consumers can't call ib_cm_establish since they don't interact with the CM, as for type 1 consumers, looking on the CM code i was not able to convince myself that calling ib_cm_establish along with modifying the qp state to RTS would provide a full solution for them, what do you think? Or. From mst at mellanox.co.il Tue Jul 25 10:06:43 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 25 Jul 2006 20:06:43 +0300 Subject: [openib-general] 2 patches in mst-for-2.6.18 Message-ID: <20060725170643.GP8795@mellanox.co.il> I have put the following 2 patches in mst-for-2.6.18: Author: Ishai Rabinovitz IB/srp: fix crash in srp_reconnect_target commit 82bf649ad7e434ccb7ba91e2fc5764a5888bbfb4 Author: Sean Hefty IB/cm: fix error handling in ib_send_cm_req Both have been posted on list by the respective authors. I haven't done testing on either, but both seem pretty trivial fixes. I'll be running this tree in the next couple of days. With regard to Ishai's patch, I hear from him that he actually saw a crash in srp_reconnect_target that this is supposed to fix. Pls review and let me know. here: git://www.mellanox.co.il/~git/infiniband mst-for-2.6.18 -- MST From rdreier at cisco.com Tue Jul 25 10:12:56 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 25 Jul 2006 10:12:56 -0700 Subject: [openib-general] [SRP] [RFC] Needed changes to support fail-over drivers In-Reply-To: <20060725134529.GA20068@mellanox.co.il> (Ishai Rabinovitz's message of "Tue, 25 Jul 2006 16:45:29 +0300") References: <20060725134529.GA20068@mellanox.co.il> Message-ID: > > Why does userspace need to be able to disconnect a connection? 
> There are two options on who will initiate the disconnection: the userspace > daemon or the ib_srp module. I considered both options and I was not sure > which one is better. I choose to do it in userspace because it looks a good > symmetry that both the disconnection and reconnection will be initiate in the > same place. I will accept your comment and change it to the kernel. I'm not telling you what to do -- I'm just asking. But it does seem to me that the kernel knows better when to disconnect a connection -- eg I don't think an error completion will be signaled to userspace. Conversely if a target goes away and comes back with no IOs submitted in between, then the connection should survive and there's no reason to disconnect/reconnect. - R. From mshefty at ichips.intel.com Tue Jul 25 10:28:14 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 25 Jul 2006 10:28:14 -0700 Subject: [openib-general] [librdmacm] ABI file in not being created by cma module, but the user level library tries to use it In-Reply-To: <200607251136.42137.jackm@mellanox.co.il> References: <000401c6aee6$8252e6a0$87f9070a@amr.corp.intel.com> <44C504D9.7080407@ichips.intel.com> <200607251136.42137.jackm@mellanox.co.il> Message-ID: <44C654AE.6080307@ichips.intel.com> Jack Morgenstein wrote: >>>Yes - I inadvertently removed this fix when I removed libsysfs from the >>>librdmacm. I will re-instate the fix. >> >>7497 was a fix that added libsysfs to the librdmacm as a result of it being >>removed from libibverbs. In any case, I applied a fix to assume the latest >>ABI version. >> > BTW, 7497 WAS the mod that removed the fix -- see patch file below with diff > of cma.c between 7496 and 7497. Correct - when I looked at the changes from 7497, I initially thought it was my patch that removed libsysfs from the librdmacm. That was incorrect, so I was simply trying to correct my previous comment. Sorry for the confusion. 
- Sean From mshefty at ichips.intel.com Tue Jul 25 10:14:43 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 25 Jul 2006 10:14:43 -0700 Subject: [openib-general] [PATCH 3/4] IB CM: register and handle COMM_EST events on a QP In-Reply-To: References: <000301c6ad10$32d946a0$e598070a@amr.corp.intel.com> Message-ID: <44C65183.6000709@ichips.intel.com> James Lentini wrote: > If ib_register_cm_handler() fails, how about logging an error message: > > if (ib_register_cm_handler(device, cm_event_handler)) > printk(KERN_WARNING "IB CM unable to register CM handler\n"); Thanks - I added this. - Sean From trimmer at silverstorm.com Tue Jul 25 10:06:59 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Tue, 25 Jul 2006 13:06:59 -0400 Subject: [openib-general] [PATCH 0/4] Dispatch communication relatedevents to the IB CM In-Reply-To: <20060723094250.GB26317@mellanox.co.il> Message-ID: > IB spec 1.2 vol 1 page 456 > > C10-29: If a Work Request is submitted to a Send Queue while its > corresponding > QP is in the RTR State, an immediate error shall be returned. > > So, while you technically *can* post on a QP in RTR, you won't, and are > not > supposed to, get anything useful out of this. Perhaps we should pursue changing this in the IBTA spec. Being able to post to the SQ while in RTR makes handling of the Comm Est/RTU race with the CQ callback much easier to handle. It would be better if the IB spec permitted posting to the SQ in RTR but indicated the SQ would not be processed until the QP moved to RTS. I believe the present Mellanox silicon/firmware implements such behavior.
Todd Rimmer Chief Systems Architect SilverStorm Technologies Voice: 610-233-4852 Fax: 610-233-4777 TRimmer at SilverStorm.com www.SilverStorm.com From mshefty at ichips.intel.com Tue Jul 25 10:33:35 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 25 Jul 2006 10:33:35 -0700 Subject: [openib-general] [PATCH 3/4] IB CM: register and handle COMM_EST events on a QP In-Reply-To: <44C62643.6030801@voltaire.com> References: <000101c6aee3$951b95a0$87f9070a@amr.corp.intel.com> <44C4FD69.9040005@ichips.intel.com> <44C62643.6030801@voltaire.com> Message-ID: <44C655EF.4010804@ichips.intel.com> Or Gerlitz wrote: >> For peer to peer, tracking the QP would still only need to be done on >> the passive side, so that justification can be ignored. This means >> that we are tracking local QPN as part of timewait. > > Sorry, i don't fully follow. > > I was thinking that in the peer to peer connection establishment model, > the passive side is not known beforehand, that is both sides do listen > and send the REQ, one side (CM) becomes the "passive" and the other > becomes the "active", this means you can't tell before getting the REQ > if you would be passive or active so you might ment to say that the code > that places the local qpn in the rbtree can be executed once getting the > REQ and not before sending REQ or REP as done today? After thinking about peer to peer more, I think that it could insert the local QPN after the passive side calls ib_send_cm_rep(), similar to what's done in the client-server model. If we only want to track local QPNs for the purposes of handling the COMM_EST event, then tracking on the passive side is sufficient. If we want to track local QPNs as part of timewait, then we want tracking on both sides. I went the latter route. 
- Sean From ishai at mellanox.co.il Tue Jul 25 06:45:29 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Tue, 25 Jul 2006 16:45:29 +0300 Subject: [openib-general] [SRP] [RFC] Needed changes to support fail-over drivers In-Reply-To: References: Message-ID: <20060725134529.GA20068@mellanox.co.il> On Tue, Jul 25, 2006 at 04:58:48PM +0300, rdreier at cisco.com wrote: > [CC'ing linux-scsi as well -- I think we'll get better insight from = > there] > > OK, but is this a valid assumption? What happens for iSCSI and/or iSER? >From Mike's response I understand that it is a reasonable behavior to keep the host (at least for a period of time) and let the userspace daemon be responsible to the reconnection or deallocating of that host. > > How does the daemon know when something is gone for good vs. when it > might come back? > I think we should use a time out in the daemon. > > Why does userspace need to be able to disconnect a connection? > There are two options on who will initiate the disconnection: the userspace daemon or the ib_srp module. I considered both options and I was not sure which one is better. I choose to do it in userspace because it looks a good symmetry that both the disconnection and reconnection will be initiate in the same place. I will accept your comment and change it to the kernel. > > Why the asymmetry here? In other words, why does anything work for > reconnect_target but only the literal "erase" work for erase_target? > Because erase_target is a destructive command that can not be reversed I think it should use a more safe approach. -- Ishai Rabinovitz From swise at opengridcomputing.com Tue Jul 25 08:05:40 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 25 Jul 2006 10:05:40 -0500 Subject: [openib-general] [PATCH Round 4 2/3] Core network changes to support network event notification. 
In-Reply-To: References: Message-ID: <1153839940.14354.37.camel@stevo-desktop> On Tue, 2006-07-25 at 17:39 +1000, Herbert Xu wrote: > Steve Wise wrote: > > > > Routing redirect events are broadcast as a pair of rtmsgs, RTM_DELROUTE > > and RTM_NEWROUTE. > > This may confuse existing rtnetlink users since you're generating an > RTM_DELROUTE message that's identical to one triggered by something > like 'ip route del'. > Yea, I didn't really want to create a REDIRECT rtmsg, so I punted. :-) But they really are seeing a delete followed by an add. That's what the kernel is doing. > As you're introducing a completely new RTM_ROUTEUPD type, it might > be better to attach any information from the existing route that you > need to the ROUTEUPD message. Yea, the main change is the next hop ip address or gateway field. > > Actually, what was the reason you need the existing route here? > The rdma driver needs to update all established rdma connections that are using the next-hop information of the existing route and make them use the next-hop information of the new route. In addition, the rdma driver might have a reference to the old dst entry. So it can release that ref and add a ref to the new dst entry. > > diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c > > index 5f87533..33d8a83 100644 > > --- a/net/ipv4/fib_semantics.c > > +++ b/net/ipv4/fib_semantics.c > > @@ -44,6 +44,7 @@ #include > > #include > > #include > > #include > > +#include > > > > #include "fib_lookup.h" > > > > @@ -279,6 +280,14 @@ void rtmsg_fib(int event, u32 key, struc > > struct sk_buff *skb; > > u32 pid = req ? req->pid : n->nlmsg_pid; > > int size = NLMSG_SPACE(sizeof(struct rtmsg)+256); > > + struct netevent_route_info nri; > > + int netevent; > > + > > + nri.family = AF_INET; > > + nri.data = &fa->fa_info; > > + netevent = event == RTM_NEWROUTE ? NETEVENT_ROUTE_ADD > > + : NETEVENT_ROUTE_DEL; > > + call_netevent_notifiers(netevent, &nri); > > Hmm, this is broken. 
These route events are meaningless without the > corresponding IP rule events. Are you sure you really want to make > your hardware/driver grok multiple routing tables? > > Perhaps you should simply stick to dst entries and flush all your > tables when the routes are changed. This is what the Linux IP stack > does. > I have to admit I'm a little fuzzy on the routing stuff. The main netevents I've utilized in the rdma driver I'm writing are the neighbour update event and the redirect event. Route add/del was added for completeness of "routing" netevents. Can you expand further or point me to code where the IP stack "flushes its tables" when routes are changed? From my experience, all the rdma driver needs is the dst entry. It uses the routing table to determine the dst_entry at connection establish time. And it needs to know if the next-hop or PMTU ever changes. > > diff --git a/net/ipv4/route.c b/net/ipv4/route.c > > index 2dc6dbb..18879e6 100644 > > --- a/net/ipv4/route.c > > +++ b/net/ipv4/route.c > > @@ -1117,6 +1120,52 @@ static void rt_del(unsigned hash, struct > > spin_unlock_bh(rt_hash_lock_addr(hash)); > > } > > > > +static void rtm_redirect(struct rtable *old, struct rtable *new) > > +{ > > + struct netevent_redirect netevent; > > + struct sk_buff *skb; > > + int err; > > + > > + netevent.old = &old->u.dst; > > + netevent.new = &new->u.dst; > > + > > + /* notify netevent subscribers */ > > + call_netevent_notifiers(NETEVENT_REDIRECT, &netevent); > > + > > + /* Post NETLINK messages: RTM_DELROUTE for old route, > > + RTM_NEWROUTE for new route */ > > + skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC); > > Please use a better size estimate rather than NLMSG_GOODSIZE here since > you're doing GFP_ATOMIC. > ok > > @@ -1442,6 +1493,32 @@ unsigned short ip_rt_frag_needed(struct > > return est_mtu ?
: new_mtu; > > } > > > > +static void rtm_pmtu_update(struct rtable *rt) > > +{ > > + struct sk_buff *skb; > > + int err; > > + > > + call_netevent_notifiers(NETEVENT_PMTU_UPDATE, &rt->u.dst); > > + > > + skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC); > > Ditto. > ok Thanks, Steve. From rkuchimanchi at silverstorm.com Tue Jul 25 08:30:48 2006 From: rkuchimanchi at silverstorm.com (Ramachandra K) Date: Tue, 25 Jul 2006 21:00:48 +0530 Subject: [openib-general] RDMA Writes in kernel mode Message-ID: <44C63928.5060401@silverstorm.com> Please correct me if I am wrong, but it seems none of the ULPs in the Open Fabrics stack use RDMA Writes. Is it just because RDMA Write semantics do not make sense in these ULPs or is there any other reason ? The reason I ask is because I am trying to do RDMA Writes in a kernel module and even though I get RDMA Write completions, I do not see the data on the remote peer. I am not really sure why this is happening. Can any one hazard a guess as to the reason for this behavior of RDMA Writes ? Also, is there any piece of code that does RDMA Writes in kernel mode ? I would appreciate any pointers to such code. It will help me in verifying my code to see if I am missing something. Regards, Ram From dotanb at mellanox.co.il Tue Jul 25 08:03:46 2006 From: dotanb at mellanox.co.il (Dotan Barak) Date: Tue, 25 Jul 2006 18:03:46 +0300 Subject: [openib-general] [libmthca] patch to devinfo: Added a print of the IB ports rate values Message-ID: <200607251803.47025.dotanb@mellanox.co.il> attached in the patch of this issue (until i'll fix my mail client) thanks Dotan -------------- next part -------------- Added a print of the IB ports rate values. 
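The rate printed by this change is the decoded link width multiplied by the decoded per-lane speed. A rough standalone sketch of that calculation follows, using the same width and speed encodings as the patch. The helper names (width_lanes, lane_gbps, port_rate_gbps) are hypothetical and are not part of devinfo.c. One difference is deliberate: the sketch rejects invalid encodings before multiplying, since multiplying two -1 "invalid" sentinels would otherwise yield a plausible-looking positive rate.

```c
#include <assert.h>
#include <stdint.h>

/* Decode the active_width encoding into a lane count (1X, 4X, 8X, 12X).
 * Returns -1 for an encoding this sketch does not recognise. */
static int width_lanes(uint8_t width)
{
	switch (width) {
	case 1:  return 1;
	case 2:  return 4;
	case 4:  return 8;
	case 8:  return 12;
	default: return -1;
	}
}

/* Decode the active_speed encoding into a per-lane rate in Gbps.
 * Returns -1.0 for an encoding this sketch does not recognise. */
static double lane_gbps(uint8_t speed)
{
	switch (speed) {
	case 1:  return 2.5;
	case 2:  return 5.0;
	case 4:  return 10.0;
	default: return -1.0;
	}
}

/* Hypothetical helper: combined port rate in Gbps, or -1.0 if either
 * field is invalid. */
static double port_rate_gbps(uint8_t width, uint8_t speed)
{
	int lanes = width_lanes(width);
	double gbps = lane_gbps(speed);

	if (lanes < 0 || gbps < 0.0)
		return -1.0;

	/* Both values are valid, so the product is a meaningful rate. */
	assert(lanes > 0 && gbps > 0.0);
	return lanes * gbps;
}
```

For example, a 4X port (width encoding 2) at 2.5 Gbps per lane (speed encoding 1) gives a rate of 10.0.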
Signed-off-by: Dotan Barak Index: latest/src/userspace/libibverbs/examples/devinfo.c =================================================================== --- latest.orig/src/userspace/libibverbs/examples/devinfo.c 2006-07-25 08:02:39.000000000 +0300 +++ latest/src/userspace/libibverbs/examples/devinfo.c 2006-07-25 17:05:35.000000000 +0300 @@ -104,36 +104,36 @@ static const char *atomic_cap_str(enum i } } -static const char *mtu_str(enum ibv_mtu max_mtu) +static int mtu_num(enum ibv_mtu max_mtu) { switch (max_mtu) { - case IBV_MTU_256: return "256"; - case IBV_MTU_512: return "512"; - case IBV_MTU_1024: return "1024"; - case IBV_MTU_2048: return "2048"; - case IBV_MTU_4096: return "4096"; - default: return "invalid MTU"; + case IBV_MTU_256: return 256; + case IBV_MTU_512: return 512; + case IBV_MTU_1024: return 1024; + case IBV_MTU_2048: return 2048; + case IBV_MTU_4096: return 4096; + default: return -1; } } -static const char *width_str(uint8_t width) +static int width_num(uint8_t width) { switch (width) { - case 1: return "1"; - case 2: return "4"; - case 4: return "8"; - case 8: return "12"; - default: return "invalid width"; + case 1: return 1; + case 2: return 4; + case 4: return 8; + case 8: return 12; + default: return -1; } } -static const char *speed_str(uint8_t speed) +static double speed_num(uint8_t speed) { switch (speed) { - case 1: return "2.5 Gbps"; - case 2: return "5.0 Gbps"; - case 4: return "10.0 Gbps"; - default: return "invalid speed"; + case 1: return 2.5; + case 2: return 5.0; + case 4: return 10.0; + default: return -1; } } @@ -257,10 +257,10 @@ static int print_hca_cap(struct ibv_devi printf("\t\tport:\t%d\n", port); printf("\t\t\tstate:\t\t\t%s (%d)\n", port_state_str(port_attr.state), port_attr.state); - printf("\t\t\tmax_mtu:\t\t%s (%d)\n", - mtu_str(port_attr.max_mtu), port_attr.max_mtu); - printf("\t\t\tactive_mtu:\t\t%s (%d)\n", - mtu_str(port_attr.active_mtu), port_attr.active_mtu); + printf("\t\t\tmax_mtu:\t\t%d (%d)\n", + 
mtu_num(port_attr.max_mtu), port_attr.max_mtu); + printf("\t\t\tactive_mtu:\t\t%d (%d)\n", + mtu_num(port_attr.active_mtu), port_attr.active_mtu); printf("\t\t\tsm_lid:\t\t\t%d\n", port_attr.sm_lid); printf("\t\t\tport_lid:\t\t%d\n", port_attr.lid); printf("\t\t\tport_lmc:\t\t0x%02x\n", port_attr.lmc); @@ -276,10 +276,12 @@ static int print_hca_cap(struct ibv_devi printf("\t\t\tgid_tbl_len:\t\t%d\n", port_attr.gid_tbl_len); printf("\t\t\tsubnet_timeout:\t\t%d\n", port_attr.subnet_timeout); printf("\t\t\tinit_type_reply:\t%d\n", port_attr.init_type_reply); - printf("\t\t\tactive_width:\t\t%sX (%d)\n", - width_str(port_attr.active_width), port_attr.active_width); - printf("\t\t\tactive_speed:\t\t%s (%d)\n", - speed_str(port_attr.active_speed), port_attr.active_speed); + printf("\t\t\tactive_width:\t\t%dX (%d)\n", + width_num(port_attr.active_width), port_attr.active_width); + printf("\t\t\tactive_speed:\t\t%.1lf Gbps (%d)\n", + speed_num(port_attr.active_speed), port_attr.active_speed); + printf("\t\t\trate:\t\t\t%.1lf\n", + width_num(port_attr.active_width) * speed_num(port_attr.active_speed)); printf("\t\t\tphys_state:\t\t%s (%d)\n", port_phy_state_str(port_attr.phys_state), port_attr.phys_state); From tziporet at mellanox.co.il Tue Jul 25 08:39:39 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 25 Jul 2006 18:39:39 +0300 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 planning meeting - summary In-Reply-To: <1153812370.13875.162.camel@localhost> References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA74FE@mtlexch01.mtl.com> <1153812370.13875.162.camel@localhost> Message-ID: <44C63B3B.1040001@mellanox.co.il> Matt Leininger wrote: >> 5. SRP: >> >> – GA quality >> >> – DM (Device Mapper) - for high availability >> >> – Basic failover/failback testing with daemon+srp+XVM/MPP and >> Engenio target >> >> > Tziporet, > > Are there any plans to test with the DDN SRP target? Several DoE > sites are testing/using the DDN IB based storage. 
> > > Mellanox does not have a DDN SRP target. We will be happy to test it if DDN will loan us a system. Another option is for DDN to take the OFED 1.1 RCs and test them in their labs. Can you approach them and ask about this? If yes, then I can cc them on the RC mails so they can do it. Is there any other vendor who has a DDN SRP target and is going to test OFED with it? Tziporet From ogerlitz at voltaire.com Tue Jul 25 07:10:11 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 25 Jul 2006 17:10:11 +0300 Subject: [openib-general] [PATCH 3/4] IB CM: register and handle COMM_EST events on a QP In-Reply-To: <44C4FD69.9040005@ichips.intel.com> References: <000101c6aee3$951b95a0$87f9070a@amr.corp.intel.com> <44C4FD69.9040005@ichips.intel.com> Message-ID: <44C62643.6030801@voltaire.com> Sean Hefty wrote: > Sean Hefty wrote: >> The race can only happen on the passive side, but... for peer to peer >> connections, we won't know which side is passive until later, and this >> tracks >> the local QPN as part of timewait now. (Note that peer to peer >> support requires >> more work.) > > For peer to peer, tracking the QP would still only need to be done on > the passive side, so that justification can be ignored. This means that > we are tracking local QPN as part of timewait. Sorry, i don't fully follow. I was thinking that in the peer to peer connection establishment model, the passive side is not known beforehand, that is both sides do listen and send the REQ, one side (CM) becomes the "passive" and the other becomes the "active", this means you can't tell before getting the REQ if you would be passive or active so you might ment to say that the code that places the local qpn in the rbtree can be executed once getting the REQ and not before sending REQ or REP as done today? Or.
From swise at opengridcomputing.com Tue Jul 25 11:19:42 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 25 Jul 2006 13:19:42 -0500 Subject: [openib-general] Problem running rping on chelsio T3 adapters In-Reply-To: References: Message-ID: <1153851582.14354.60.camel@stevo-desktop> hmm. Lets recap for my brain: 1) krping works over T3. So the physical connectivity is ok. 2) rping doesn't work...rping client gets an error 110 from cma event 1. Event 1 is ADDR_ERROR, which means something balled up trying to figure out which local interface to use, or the ARP resolution failed (timed out). 3) I seem to remember from the T3 debug logs, that the client isn't getting any connect request from the iwarp CM. This jives with #2. Does the server ever get any connection indication at all? Can I log into these systems and poke around? It'll definitely expedite things... Also, a network sniffer trace of pkts going in/out of the client's port would help too... Steve. On Tue, 2006-07-25 at 13:47 -0400, Sundeep Narravula wrote: > > > > > > > > > > > We are using kernel 2.6.16.3 and installed the latest code/drivers from > > > > > > > > > > > the iwarp branch. Currently, we are seeing an address resolution problem > > > > > > > > > > > while using rping. > > > > > > > > > > > > > > > > > > > > > > [narravul at d1-as4 examples]$ ./rping -c -VvC100 -a 150.10.11.200 -p 9999 > > > > > > > > > > > libibverbs: Warning: couldn't load driver > > > > > > > > > > > /usr/local/lib/infiniband/mthca.so: /usr/local/lib/infiniband/mthca.so: > > > > > > > > > > > undefined symbol: ibv_cmd_query_pkey > > > > > > > > > > > cma event 1, error -110 > > > > > > > > > > Why don't you go ahead and build/install libmthca -or- remove the IB > > card and see if that helps. This is still some basic > > configuration/installation problem here... > > Hi Steve, > We have tried both these options. Building libmthca / removing the IB > cards did not help. We are still running the same errors. 
The dmesg log > are also identical. > > --Sundeep. > > > > > Steve. > > > From somenath at veritas.com Tue Jul 25 09:34:35 2006 From: somenath at veritas.com (somenath) Date: Tue, 25 Jul 2006 09:34:35 -0700 Subject: [openib-general] openib tools to update mellnox firmware In-Reply-To: <20060725044027.GA7682@mellanox.co.il> References: <44C50181.4000304@veritas.com> <20060725044027.GA7682@mellanox.co.il> Message-ID: <44C6481B.6080106@veritas.com> Michael S. Tsirkin wrote: >Quoting r. somenath : > > >>thanks Tziporet. >>from ur link , I am looking here >>http://www.mellanox.com/support/firmware_table.php >> >>for the exact HCA I have from your list. I don't find the exact match of >>OPN and Card Rev (PSID) and I don't know how to find those. >>all I found about that card is this: >>[root at alekhine bin]# ./ibstat >>CA 'mthca0' >> CA type: MT25208 (MT23108 compat mode) >> Number of ports: 2 >> Firmware version: 4.6.0 >> Hardware version: a0 >> Node GUID: 0x0005ad0000039abc >> System image GUID: 0x0005ad000100d050 >> >>Is there another tool which will give me OPN/Card Rev (PSID) so that I >>can match exactly from the table above? >> >> >> > >Try this: > >cat /sys/class/infiniband/mthca0/board_id > > in my setup that doesn't produce any output...if I do this: [root at alekhine mthca0]# cat /sys/class/infiniband/mthca0/board_id that's an empty output. however cat'ing other files give some stuff e.g.: [root at alekhine mthca0]# cat /sys/class/infiniband/mthca0/fw_ver 4.6.0 ============================== [root at alekhine mthca0]# cat /sys/class/infiniband/mthca0/hca_type MT25208 (MT23108 compat mode) ========================================== [root at alekhine mthca0]# cat /sys/class/infiniband/mthca0/hw_rev a0 ========================================== thanks, som. >>also, another question: is the latest firmware backward compatible with >>the openib stack I am using? (from redhat binary distribtution RH4 >>update 3)? >> >>thanks, som. >> >> > >Should be. 
> > > From mshefty at ichips.intel.com Tue Jul 25 11:12:57 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 25 Jul 2006 11:12:57 -0700 Subject: [openib-general] [PATCH 3/4] IB CM: register and handle COMM_EST events on a QP In-Reply-To: <44C627F2.906@voltaire.com> References: <000201c6aee4$1f54b760$87f9070a@amr.corp.intel.com> <44C627F2.906@voltaire.com> Message-ID: <44C65F29.3020306@ichips.intel.com> Or Gerlitz wrote: > 1) work over the CM/VERBS > 2) work over the CMA/VERBS with the CMA doing the QP state changes > 3) work over the CMA/VERBS with the ULP doing the QP state changes Side note: type 3 would be userspace CMA users. > Your patch provides ***full*** solution for type 2 consumers. Type 3 > consumers can't call ib_cm_establish since they don't interact with the > CM, as for type 1 consumers, looking on the CM code i was not able to > convince myself that calling ib_cm_establish along with modifying the qp > state to RTS would provide a full solution for them, what do you think? The behavior of this patch is the same in all three cases. For type 1, calling ib_cm_establish() along with transitioning the QP to RTS is sufficient. This method is also usable by type 2, if the CMA were to intercept the COMM_EST events. The method breaks down for type 3 because there's no way for the CMA to get the COMM_EST event. I would say that this provides a full solution for all types of consumers, and agree that ib_cm_establish() could be removed from the API. However, keeping it prevents breaking the ABI, and it does have limited use if called as a result of polling a receive completion, by forcing the connection into the established state immediately. 
- Sean From narravul at cse.ohio-state.edu Tue Jul 25 10:47:55 2006 From: narravul at cse.ohio-state.edu (Sundeep Narravula) Date: Tue, 25 Jul 2006 13:47:55 -0400 (EDT) Subject: [openib-general] Problem running rping on chelsio T3 adapters In-Reply-To: <1153497332.27338.11.camel@stevo-desktop> Message-ID: > > > > > > > > > > We are using kernel 2.6.16.3 and installed the latest code/drivers from > > > > > > > > > > the iwarp branch. Currently, we are seeing an address resolution problem > > > > > > > > > > while using rping. > > > > > > > > > > > > > > > > > > > > [narravul at d1-as4 examples]$ ./rping -c -VvC100 -a 150.10.11.200 -p 9999 > > > > > > > > > > libibverbs: Warning: couldn't load driver > > > > > > > > > > /usr/local/lib/infiniband/mthca.so: /usr/local/lib/infiniband/mthca.so: > > > > > > > > > > undefined symbol: ibv_cmd_query_pkey > > > > > > > > > > cma event 1, error -110 > > > > > > > Why don't you go ahead and build/install libmthca -or- remove the IB > card and see if that helps. This is still some basic > configuration/installation problem here... Hi Steve, We have tried both these options. Building libmthca / removing the IB cards did not help. We are still running the same errors. The dmesg log are also identical. --Sundeep. > > Steve. > From mst at mellanox.co.il Tue Jul 25 11:40:38 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 25 Jul 2006 21:40:38 +0300 Subject: [openib-general] ucma into kernel.org In-Reply-To: <44AD4E04.2030908@ichips.intel.com> References: <44AD4E04.2030908@ichips.intel.com> Message-ID: <20060725184038.GC10375@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: ucma into kernel.org > > Michael S. Tsirkin wrote: > > TCP sockets just expose this to application through the > > TCP_SYNCNT option. > > > > Which leads again to my suggestion: since both TCP and IB CM have this, let us > > change max_cm_retries to max_request_retries, and add this in rdma_cm as a > > generic option. 
>
> I'm not against adding in generic option for this, but the retry values have
> different limits. IB is limited to a maximum of 15, while TCP has a maximum of 255.

I thought about this some more. I think there's value in making it generic.
Can we maybe emulate TCP by changing the TID, or is this better done in the
ULP, in your opinion?

-- MST

From sean.hefty at intel.com Tue Jul 25 11:33:03 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 25 Jul 2006 11:33:03 -0700
Subject: [openib-general] RDMA Writes in kernel mode
In-Reply-To: <44C63928.5060401@silverstorm.com>
Message-ID: <000301c6b018$c38058f0$8698070a@amr.corp.intel.com>

>Also, is there any piece of code that does RDMA Writes in kernel mode ?
>I would appreciate any pointers to such code. It will help me in
>verifying my code to see if I am missing something.

There's a sample program that does writes in:

gen2/utils/src/linux-kernel/infiniband/util/krping

- Sean

From sean.hefty at intel.com Tue Jul 25 12:11:55 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 25 Jul 2006 12:11:55 -0700
Subject: [openib-general] [PATCH 0/4] Dispatch communication relatedevents to the IB CM
In-Reply-To:
Message-ID: <000401c6b01e$31f58800$8698070a@amr.corp.intel.com>

>Perhaps we should pursue changing this in the IBTA spec. Being able to
>post to the SQ while in RTR makes handling of the Comm Est/RTU race with
>the CQ callback much easier to handle.
>
>It would be better if the IB spec permitted posting to the SQ in RTR but
>indicated the SQ would not be processed until the QP moved to RTS. I
>believe the present Mellanox silicon/firmware implements such behavior.

I think it would be simpler to transition the QP to RTS after sending a REP,
with the restriction that a user may not post sends until an RTU is received, a
communication establish event occurs, or a receive message completes on the QP.
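
As a rough illustration of the restriction described above, here is a minimal C sketch. The struct, enum, and function names are hypothetical and are not part of the IB CM or verbs API; the sketch only models when a passive-side consumer would be allowed to post sends if the QP were moved to RTS right after the REP is sent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical events that tell the passive side the connection is live. */
enum conn_event {
	EV_RTU_RECEIVED,	/* the CM delivered the RTU message */
	EV_COMM_EST,		/* communication established event on the QP */
	EV_RECV_COMPLETED	/* a receive completed on the QP */
};

/* Hypothetical per-connection bookkeeping, not a real verbs structure. */
struct passive_conn {
	bool rep_sent;		/* REP sent; QP assumed to be in RTS already */
	bool established;	/* one of the events above has been seen */
};

static void conn_init(struct passive_conn *c)
{
	assert(c != NULL);
	c->rep_sent = false;
	c->established = false;
}

/* Record that the REP went out (the point where the QP would move to RTS). */
static void conn_rep_sent(struct passive_conn *c)
{
	assert(c != NULL);
	c->rep_sent = true;
}

/* Any of the three events lifts the restriction once the REP was sent. */
static void conn_handle_event(struct passive_conn *c, enum conn_event ev)
{
	assert(c != NULL);
	switch (ev) {
	case EV_RTU_RECEIVED:
	case EV_COMM_EST:
	case EV_RECV_COMPLETED:
		if (c->rep_sent)
			c->established = true;
		break;
	}
}

/* The user may post sends only after one of the events has occurred. */
static bool conn_may_post_send(const struct passive_conn *c)
{
	return c->rep_sent && c->established;
}
```

In this model the QP state itself never gates the send; only the software flag does, which is the point of the suggestion.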
- Sean

From sean.hefty at intel.com Tue Jul 25 12:42:37 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 25 Jul 2006 12:42:37 -0700
Subject: [openib-general] ucma into kernel.org
In-Reply-To: <20060725184038.GC10375@mellanox.co.il>
Message-ID: <000501c6b022$7bb7fd70$8698070a@amr.corp.intel.com>

>I thought about this some more. I think there's value in making it generic.
>Can we maybe emulate TCP by changing the TID, or is this better done in the
>ULP, in your opinion?

Thinking out loud here...

I don't think that it makes sense to change the IB CM to support retrying a REQ
more than that specified by the spec. Max CM retries is also used by other CM
messages, plus there's the problem that what the active side is sending as a
retry is really a new request to the passive side, and both requests carry the
same active QPN.

The problem that we were seeing running Intel MPI, and I'm guessing at least a
couple of the other MPI implementations are hitting as well, wasn't that the
number of retries was too small, but that the remote_cm_response_timeout was.
Connections were taking minutes to form. Setting max CM retries to the largest
value only helped to a point.

My solution was to allow the user to override the IB CM REQ parameters used by
the RDMA CM. This included local and remote CM response timeouts, plus max CM
retries. It sounds like the only value that you want to make generic is max CM
retries.

Could the CMA retry a connection request after it times out by the IB CM? I
think so, but that gets back to the issue of the passive IB CM seeing different
connection requests for the same QP. For the actual problem I was trying to
solve, the original REQ had been received, so a second REQ would have been
rejected due to a duplicate QPN.

I think that a generic solution would have to scale down to the lowest value.
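
To make "scale down to the lowest value" concrete, a small C sketch follows. It assumes the limits mentioned earlier in this thread (15 for IB, 255 for TCP); the macro, enum, and function names are made up for illustration and do not exist in the RDMA CM.

```c
#include <assert.h>
#include <stddef.h>

/* Limits as quoted in this thread; the names are hypothetical. */
#define EXAMPLE_IB_MAX_RETRIES	15
#define EXAMPLE_TCP_MAX_RETRIES	255

enum example_transport { EXAMPLE_TRANSPORT_IB, EXAMPLE_TRANSPORT_TCP };

static int transport_max_retries(enum example_transport t)
{
	return t == EXAMPLE_TRANSPORT_IB ? EXAMPLE_IB_MAX_RETRIES
					 : EXAMPLE_TCP_MAX_RETRIES;
}

/* Largest retry count that every listed transport can honour. */
static int generic_max_retries(const enum example_transport *ts, size_t n)
{
	int lowest = EXAMPLE_TCP_MAX_RETRIES;
	size_t i;

	assert(ts != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int m = transport_max_retries(ts[i]);

		if (m < lowest)
			lowest = m;
	}
	return lowest;
}

/* Round a requested retry count down to the given limit. */
static int clamp_retries(int requested, int limit)
{
	if (requested < 0)
		return 0;
	return requested > limit ? limit : requested;
}
```

Rounding down silently is only one possible policy; rejecting an out-of-range value outright is the obvious alternative.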
- Sean From mlleinin at hpcn.ca.sandia.gov Tue Jul 25 13:02:32 2006 From: mlleinin at hpcn.ca.sandia.gov (Matt Leininger) Date: Tue, 25 Jul 2006 13:02:32 -0700 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 planning meeting - summary In-Reply-To: <44C63B3B.1040001@mellanox.co.il> References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA74FE@mtlexch01.mtl.com> <1153812370.13875.162.camel@localhost> <44C63B3B.1040001@mellanox.co.il> Message-ID: <1153857752.13875.200.camel@localhost> On Tue, 2006-07-25 at 18:39 +0300, Tziporet Koren wrote: > Matt Leininger wrote: > >> 5. SRP: > >> > >> – GA quality > >> > >> – DM (Device Mapper) - for high availability > >> > >> – Basic failover/failback testing with daemon+srp+XVM/MPP and > >> Engenio target > >> > >> > > Tziporet, > > > > Are there any plans to test with the DDN SRP target? Several DoE > > sites are testing/using the DDN IB based storage. > > > > > > > Mellanox does not have DDN SRP target. We will be happy to test it of > DDN will loan us a system. > > Another option is that DDN will take OFED 1.1 RCs and test it in their labs. > Can you approach them and ask this. If yes then I can cc them on the RCs > mails so they can do it. > > Is there any other vendor who has DDN SRP target, and going to test OFED > with it? I thought Cisco had a DDN SRP target. - Matt From narravul at cse.ohio-state.edu Tue Jul 25 13:02:54 2006 From: narravul at cse.ohio-state.edu (Sundeep Narravula) Date: Tue, 25 Jul 2006 16:02:54 -0400 (EDT) Subject: [openib-general] Problem running rping on chelsio T3 adapters In-Reply-To: <1153851582.14354.60.camel@stevo-desktop> Message-ID: Hi Steve, > 1) krping works over T3. So the physical connectivity is ok. This does NOT work. It gives out the same error as rping. > 2) rping doesn't work...rping client gets an error 110 from cma event 1. > Event 1 is ADDR_ERROR, which means something balled up trying to figure > out which local interface to use, or the ARP resolution failed (timed > out). 
> > 3) I seem to remember from the T3 debug logs, that the client isn't > getting any connect request from the iwarp CM. This jives with #2. Yes and Yes. > > Does the server ever get any connection indication at all? No. The server logs do not show any indication of conenction request or error > Can I log into these systems and poke around? It'll definitely expedite > things... Yes. We will create an account on these systems. I will send out the information shortly. > Also, a network sniffer trace of pkts going in/out of the client's port > would help too... I will have to check this. Thanks, --Sundeep. > > Steve. > > > > > On Tue, 2006-07-25 at 13:47 -0400, Sundeep Narravula wrote: > > > > > > > > > > > > We are using kernel 2.6.16.3 and installed the latest code/drivers from > > > > > > > > > > > > the iwarp branch. Currently, we are seeing an address resolution problem > > > > > > > > > > > > while using rping. > > > > > > > > > > > > > > > > > > > > > > > > [narravul at d1-as4 examples]$ ./rping -c -VvC100 -a 150.10.11.200 -p 9999 > > > > > > > > > > > > libibverbs: Warning: couldn't load driver > > > > > > > > > > > > /usr/local/lib/infiniband/mthca.so: /usr/local/lib/infiniband/mthca.so: > > > > > > > > > > > > undefined symbol: ibv_cmd_query_pkey > > > > > > > > > > > > cma event 1, error -110 > > > > > > > > > > > > > Why don't you go ahead and build/install libmthca -or- remove the IB > > > card and see if that helps. This is still some basic > > > configuration/installation problem here... > > > > Hi Steve, > > We have tried both these options. Building libmthca / removing the IB > > cards did not help. We are still running the same errors. The dmesg log > > are also identical. > > > > --Sundeep. > > > > > > > > Steve. > > > > > > From mst at mellanox.co.il Tue Jul 25 13:15:07 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Tue, 25 Jul 2006 23:15:07 +0300 Subject: [openib-general] ucma into kernel.org In-Reply-To: <000501c6b022$7bb7fd70$8698070a@amr.corp.intel.com> References: <000501c6b022$7bb7fd70$8698070a@amr.corp.intel.com> Message-ID: <20060725201507.GA11519@mellanox.co.il> Quoting r. Sean Hefty : > Subject: RE: [openib-general] ucma into kernel.org > > >I thought about this some more. I think there's value in making it generic. > >Can we maybe emulate TCP by changing the TID, or is this better done in the > >ULP, in your opinion? > > Thinking out loud here... > > I don't think that it makes sense to change the IB CM to support retrying a REQ > more than that specified by the spec. Max CM retries is also used by other CM > messages, plus there's the problem that what the active side is sending as a > retry is really a new request to the passive side, and both requests carry the > same active QPN. > > The problem that we were seeing running Intel MPI, and I'm guessing at least a > couple of the other MPI implementations are hitting as well, wasn't that the > number of retries was too small, but that the remote_cm_response_timeout was. > Connections were taking minutes to form. Setting max CM retries to the largest > value only helped to a point. > > My solution was to allow the user to override the IB CM REQ parameters used by > the RDMA CM. This included local and remote CM response timeouts, plus max CM > retries. It sounds like the only value that you want to make generic is max CM > retries. > > Could the CMA retry a connection request after it times out by the IB CM? I > think so, but that gets back to the issue of the passive IB CM seeing different > connection requests for the same QP. For the actual problem I was trying to > solve, the original REQ had been received, so a second REQ would have been > rejected due to a duplicate QPN. How do you mean duplicate QPN? You can;t track remote QPNs, can you? 
> I think that a generic solution would have to scale down to the lowest value. > > - Sean > -- MST From mst at mellanox.co.il Tue Jul 25 13:17:29 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 25 Jul 2006 23:17:29 +0300 Subject: [openib-general] [PATCH 0/4] Dispatch communication relatedevents to the IB CM In-Reply-To: <000401c6b01e$31f58800$8698070a@amr.corp.intel.com> References: <000401c6b01e$31f58800$8698070a@amr.corp.intel.com> Message-ID: <20060725201729.GB11519@mellanox.co.il> Quoting r. Sean Hefty : > Subject: RE: [PATCH 0/4] Dispatch communication relatedevents to the IB CM > > >Perhaps we should pursue changing this in the IBTA spec. Being able to > >post to the SQ while in RTR makes handling of the Comm Est/RTU race with > >the CQ callback much easier to handle. > > > >It would be better if the IB spec permitted posting to the SQ in RTR but > >indicated the SQ would not be processed until the QP moved to RTS. I > >believe the present Mellanox silicon/firmware implements such behavior. > > I think it would be simpler to transition the QP to RTS after sending a REP, > with the restriction that a user may not post sends until an RTU is received, a > communication establish event occurs, or a receive message completes on the QP. > > - Sean I think if you post RR on active side in init state, you even can send directly, right? -- MST From swise at opengridcomputing.com Tue Jul 25 11:51:24 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 25 Jul 2006 13:51:24 -0500 Subject: [openib-general] RDMA Writes in kernel mode In-Reply-To: <44C63928.5060401@silverstorm.com> References: <44C63928.5060401@silverstorm.com> Message-ID: <1153853484.14354.76.camel@stevo-desktop> On Tue, 2006-07-25 at 21:00 +0530, Ramachandra K wrote: > Please correct me if I am wrong, but it seems none of the ULPs in the > Open Fabrics stack use RDMA Writes. 
Is it just because RDMA Write
> semantics do not make sense in these ULPs or is there any other reason ?
> The reason I ask is because I am trying to do RDMA Writes in a kernel
> module and even though I get RDMA Write completions, I do not see the
> data on the remote peer. I am not really sure why this is happening. Can
> any one hazard a guess as to the reason for this behavior of RDMA Writes ?
>
> Also, is there any piece of code that does RDMA Writes in kernel mode ?
> I would appreciate any pointers to such code. It will help me in
> verifying my code to see if I am missing something.
>

Check out krping:

gen2/utils/src/linux-kernel/infiniband/util/krping/krping.c

From sean.hefty at intel.com Tue Jul 25 13:28:28 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 25 Jul 2006 13:28:28 -0700
Subject: [openib-general] ucma into kernel.org
In-Reply-To: <20060725201507.GA11519@mellanox.co.il>
Message-ID: <000601c6b028$e3f5b930$8698070a@amr.corp.intel.com>

>How do you mean duplicate QPN? You can't track remote QPNs, can you?

The QPN carried in the original and new REQ messages would be the same. Remote
QPNs are tracked by the IB CM for timewait purposes, in case the remote system
reboots, then tries to establish a new connection using a QP that the local
system still thinks is connected.

- Sean

From weiny2 at llnl.gov Tue Jul 25 13:26:55 2006
From: weiny2 at llnl.gov (Ira Weiny)
Date: Tue, 25 Jul 2006 13:26:55 -0700
Subject: [openib-general] libsysfs dependancy (in libibcm?)
Message-ID: <20060725132655.651fc01f.weiny2@llnl.gov>

I thought the trunk had removed all the libsysfs dependencies. (except for
ibverbs-1.0)

I just got this error on a trunk I pulled yesterday.

weiny2 at xdevi:~/OpenIB-trunk/src/userspace/libibcm 10:54:19 > ./configure --prefix=/home/weiny2/root
checking for a BSD-compatible install... /usr/bin/install -c
checking sysfs/libsysfs.h usability... no
checking sysfs/libsysfs.h presence... no
checking for sysfs/libsysfs.h...
no
configure: error: not found. libibcm requires libsysfs.

Am I thinking of something else?

Ira

From sean.hefty at intel.com Tue Jul 25 13:31:29 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 25 Jul 2006 13:31:29 -0700
Subject: [openib-general] [PATCH 0/4] Dispatch communication relatedevents to the IB CM
In-Reply-To: <20060725201729.GB11519@mellanox.co.il>
Message-ID: <000701c6b029$4f165850$8698070a@amr.corp.intel.com>

>I think if you post RR on active side in init state, you even can send
>directly, right?

The active side can post receives, but incoming messages won't be processed
until the QP transitions to RTR, so sends at this point would be lost. The REP
message is needed to transition from INIT to RTR.

- Sean

From shahanse at cisco.com Tue Jul 25 13:39:10 2006
From: shahanse at cisco.com (Shawn Hansen (shahanse))
Date: Tue, 25 Jul 2006 13:39:10 -0700
Subject: [openib-general] [openfabrics-ewg] OFED 1.1 planning meeting - summary
Message-ID:

Yes, Cisco plans to test OFED on a DDN SRP target.

--Shawn

-----Original Message-----
From: openfabrics-ewg-bounces at openib.org [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Tziporet Koren
Sent: Tuesday, July 25, 2006 8:40 AM
To: Matt Leininger
Cc: openfabrics-ewg at openib.org; openib
Subject: Re: [openfabrics-ewg] OFED 1.1 planning meeting - summary

Matt Leininger wrote:
>> 5. SRP:
>>
>> - GA quality
>>
>> - DM (Device Mapper) - for high availability
>>
>> - Basic failover/failback testing with daemon+srp+XVM/MPP and
>> Engenio target
>>
>>
> Tziporet,
>
> Are there any plans to test with the DDN SRP target? Several DoE
> sites are testing/using the DDN IB based storage.
>
>
>
Mellanox does not have DDN SRP target. We will be happy to test it if
DDN will loan us a system.

Another option is that DDN will take OFED 1.1 RCs and test it in their labs.
Can you approach them and ask this. If yes then I can cc them on the RCs
mails so they can do it.
Is there any other vendor who has DDN SRP target, and going to test OFED
with it?

Tziporet

From sean.hefty at intel.com Tue Jul 25 13:43:13 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 25 Jul 2006 13:43:13 -0700
Subject: [openib-general] libsysfs dependancy (in libibcm?)
In-Reply-To: <20060725132655.651fc01f.weiny2@llnl.gov>
Message-ID: <000801c6b02a$f2ed22a0$8698070a@amr.corp.intel.com>

>I thought the trunk had removed all the libsysfs dependencies. (except for
>ibverbs-1.0)

As you discovered, the libibcm still depends on libsysfs. I removed the dependency from librdmacm.

- Sean

From mst at mellanox.co.il Tue Jul 25 13:53:55 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 25 Jul 2006 23:53:55 +0300
Subject: [openib-general] ucma into kernel.org
In-Reply-To: <000601c6b028$e3f5b930$8698070a@amr.corp.intel.com>
References: <000601c6b028$e3f5b930$8698070a@amr.corp.intel.com>
Message-ID: <20060725205355.GA12113@mellanox.co.il>

Quoting r. Sean Hefty :
> Subject: RE: [openib-general] ucma into kernel.org
>
> >How do you mean duplicate QPN? You can't track remote QPNs, can you?
>
> The QPN carried in the original and new REQ messages would be the same. Remote
> QPNs are tracked by the IB CM for timewait purposes, in case the remote system
> reboots, then tries to establish a new connection using a QP that the local
> system still thinks is connected.

Hmm. Not sure why does CM need to do this? Does it say so in the spec?

-- MST

From sean.hefty at intel.com Tue Jul 25 14:18:07 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 25 Jul 2006 14:18:07 -0700
Subject: [openib-general] ucma into kernel.org
In-Reply-To: <20060725205355.GA12113@mellanox.co.il>
Message-ID: <000901c6b02f$d2e01490$8698070a@amr.corp.intel.com>

>Hmm. Not sure why does CM need to do this?
Does it say so in the spec?

>From 12.9.8.3.1:

(RC, UC) A CM may receive a REQ/REP specifying a remote QPN in "REQ:local QPN"/"REP:local QPN" that the CM already considers connected to a local QP. A local CM may receive such a REQ/REP if its local QP has a stale connection, as described in section 12.4.1. When a CM receives such a REQ/REP it shall abort the connection establishment by issuing REJ to the REQ/REP. It shall then issue DREQ, with "DREQ:remote QPN" set to the remote QPN from the REQ/REP, until DREP is received or Max Retries is exceeded, and place the local QP in the TimeWait state.

- Sean

From mst at mellanox.co.il Tue Jul 25 14:45:01 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 26 Jul 2006 00:45:01 +0300
Subject: [openib-general] ucma into kernel.org
In-Reply-To: <000901c6b02f$d2e01490$8698070a@amr.corp.intel.com>
References: <000901c6b02f$d2e01490$8698070a@amr.corp.intel.com>
Message-ID: <20060725214501.GE12113@mellanox.co.il>

Quoting r. Sean Hefty :
> Subject: RE: ucma into kernel.org
>
> >Hmm. Not sure why does CM need to do this? Does it say so in the spec?
>
> >From 12.9.8.3.1:

Hmm, thanks.
So what you are saying is that we must destroy/recreate the QP as well,
and this implies this must be done at ULP level. Right?
So I guess if CMA rounds the retry count down to the max supported
value, this will be enough for most ULPs, and if not, ULP
can get the retry count, see that it is smaller than what user set,
and destroy/retry.

Sounds good?

-- MST

From mst at mellanox.co.il Tue Jul 25 14:49:15 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 26 Jul 2006 00:49:15 +0300
Subject: [openib-general] [PATCH] RFC: srp filesystem data corruption problem/work-around
Message-ID: <20060725214915.GF12113@mellanox.co.il>

Hi, Roland!
For some time now we've been trying to debug a data corruption failure that occurs when working with Mellanox SRP target.
We could not yet pin-point exactly where the problem is (debugging suggestions
are welcome) but the corruption seems to occur only when the target gets a
request where the virtual address is not 0.

Until there's a better understanding of the issue, I've come up with the
following simple patch that will use indirect mode for this case.

Could you comment on this please?

---

Add work-around for data corruption observed with Mellanox targets when VA != 0.

Signed-off-by: Michael S. Tsirkin

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 8f472e7..6a8b286 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -75,6 +75,12 @@ module_param(topspin_workarounds, int, 0
 MODULE_PARM_DESC(topspin_workarounds,
 		 "Enable workarounds for Topspin/Cisco SRP target bugs if != 0");
 
+static int mellanox_workarounds = 1;
+
+module_param(mellanox_workarounds, int, 0444);
+MODULE_PARM_DESC(mellanox_workarounds,
+		 "Enable workarounds for Mellanox SRP target bugs if != 0");
+
 static const u8 topspin_oui[3] = { 0x00, 0x05, 0xad };
 
 static void srp_add_one(struct ib_device *device);
@@ -581,6 +587,10 @@ static int srp_map_fmr(struct srp_device
 	if (!dev->fmr_pool)
 		return -ENODEV;
 
+	if (mellanox_workarounds &&
+	    sg_dma_address(&scat[0]) & ~dev->fmr_page_mask)
+		return -EINVAL;
+
 	len = page_cnt = 0;
 	for (i = 0; i < sg_cnt; ++i) {
 		if (sg_dma_address(&scat[i]) & ~dev->fmr_page_mask) {

--
The question of whether a computer can think is no more interesting than the
question of whether a submarine can swim. -- Edsger Wybe Dijkstra

----- End forwarded message -----

-- MST

From sean.hefty at intel.com Tue Jul 25 14:54:02 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 25 Jul 2006 14:54:02 -0700
Subject: [openib-general] FW: [PATCH] [RFC] librdmacm: expose device list to users
Message-ID: <000a01c6b034$d7618bc0$8698070a@amr.corp.intel.com>

Just returning thread to list...
I accidentally bumped it off by replying directly. - Sean >Sean Hefty wrote: >>>>Are you wanting to dynamically determine that address? >>> >>>Yes - for one, I don't want to concern the MPI user with that sort of >>>detail. Also, I imagine a single MPI job using many different multicast >>>groups simultaneously. >>> >>>What I figured I would do is have a range of multicast addresses I could >>>select from (which WOULD be configurable by users, but have good >>>defaults), then successively select multicast addresses until a free one >>>is found. >> >> >> There is an interface in the kernel that could be used to determine if a port >> has already joined a multicast group. It may be possible to expose something >> like this to userspace, but how will you ensure that all related MPI jobs >join >> the correct group? Maybe it would help me to understand how MPI uses IP >> multicast groups today. > >Apologies for the long email.. > >determine if a port has already joined a multicast group.. not sure >that's what I'm after. What I'm thinking is that one predefined group >of processes are communicating via a multicast group. Another group of >processes are started up, and want to communicate via multicast as well >- what is to stop this second group from using the same multicast >address, and erroneously sending messages to the first group (and vice >versa)? Note that these groups of processes may or may not be running >on the same ports, though may still be on the same network. > >If we don't have some way to prevent different MPI jobs on a network >from using the same multicast address, multicast is not going to be very >useful. > > >MPI has used very little multicast so far - the mvapich group has a >couple papers on using multicast for collective operations, I don't know >of any other MPI's that have used it. I'm looking to do some similar >work in Open MPI, and hopefully take it further and do more/better. 
In >other words, I don't know for sure how MPI uses multicast yet :) > >It's not difficult to know who should/shouldn't be part of a particular >multicast group. For example, all of the members of a particular >communicator might be in a single multicast group. We always know which >peers are in a particular communicator, so we know which peers should be >in a multicast group together. Coming up with an address for a >particular group is a little more difficult - the address would have to >be chosen by one peer (I'd just pick rank 0 or something), then >communicated OOB to the other peers in the communicator. > >An example of where this mapping of communicator to multicast group >might not hold would be at scale. I have no idea how far the IB >multicast scales yet - but if it doesn't go as far as MPI does, multiple >groups would probably be needed to span the communicator. It may also >be beneficial (performance) to use many smaller groups in a tree-like >fashion instead of one large group. > > >One solution (suggested by Matt Leininger) would be for the IB stack >(CM?) to hand out multicast addresses. I'm thinking it would be useful >to come up with a header file for a solution before implementing >anything. An 'ideal' API might look something like this: > >ObtainMulticastAddress > Returns a multicast address guaranteed not to be in use on the network > >JoinMulticastGroup (aka connect) > Requires a multicast address specifying which multicast group to join > >LeaveMulticastGroup (aka disconnect) > Requires a multicast address specifying which multicast group to leave > >ReturnMulticastAddress (inform IB that an address may be reused) > Requires a multicast address > >I would then use ObtainMulticastAddress, then pass the returned address >via OOB to all the peers I want to be in that multicast group. All the >peers would then use JoinMulticastGroup. 
> >ReturnMulticastAddress could require that all calls to >JoinMulticastGroup on the provided address have been paired with >matching LeaveMulticastGroup calls. What would be better though, is if >the group were asynchronously released when the last peer in the group >calls LeaveMulticastGroup. This would avoid the need for explicitly >fencing between all the LeaveMulticastGroup calls and one peer calling >ReturnMulticastAddress. > >The idea behind ObtainMulticastAddress is that I don't care what the >particular address is - just that I have an address, and nobody else is >using it. The address can take any form - does not have to be IP. > >It's important that everything be non-blocking. If needed, operations >can be done asynchronously (ie completion signalled via an event queue). > >Is this sort of approach even feasible? > >Andrew From rdreier at cisco.com Tue Jul 25 14:54:47 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 25 Jul 2006 14:54:47 -0700 Subject: [openib-general] [PATCH] RFC: srp filesystem data corruption problem/work-around In-Reply-To: <20060725214915.GF12113@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 26 Jul 2006 00:49:15 +0300") References: <20060725214915.GF12113@mellanox.co.il> Message-ID: > Until there's a better understanding of the issue, I've come up with the > following simple patch that will use indirect mode for this case. Is it possible to check the target's OUI to detect Mellanox targets? Right now this workaround affects all targets unconditionally. - R. From rdreier at cisco.com Tue Jul 25 14:59:18 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 25 Jul 2006 14:59:18 -0700 Subject: [openib-general] [PATCH] RFC: srp filesystem data corruption problem/work-around In-Reply-To: (Roland Dreier's message of "Tue, 25 Jul 2006 14:54:47 -0700") References: <20060725214915.GF12113@mellanox.co.il> Message-ID: BTW -- how are you seeing this corruption? Have you tested other targets? 
If not then I would like to be able to run the test and confirm that it really is a problem with the Mellanox target.

- R.

From sean.hefty at intel.com Tue Jul 25 15:01:57 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 25 Jul 2006 15:01:57 -0700
Subject: [openib-general] ucma into kernel.org
In-Reply-To: <20060725214501.GE12113@mellanox.co.il>
Message-ID: <000b01c6b035$f2a6c9d0$8698070a@amr.corp.intel.com>

>Hmm, thanks.
>So what you are saying is that we must destroy/recreate the QP as well,
>and this implies this must be done at ULP level. Right?

I believe so.

>So I guess if CMA rounds the retry count down to the max supported
>value, this will be enough for most ULPs, and if not, ULP
>can get the retry count, see that it is smaller than what user set,
>and destroy/retry.
>
>Sounds good?

I would rather the set failed if the value is too large. We could expose the
maximum supported number of retries through a #define in the header file.

So, is the proposal to provide a generic set_option call for connection
parameters? Is there any other parameter than retries? Does a generic
connection timeout value work for TCP as well?

- Sean

From mst at mellanox.co.il Tue Jul 25 15:02:40 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 26 Jul 2006 01:02:40 +0300
Subject: [openib-general] [PATCH] RFC: srp filesystem data corruption problem/work-around
In-Reply-To:
References:
Message-ID: <20060725220240.GG12113@mellanox.co.il>

Quoting r. Roland Dreier :
> Subject: Re: [PATCH] RFC: srp filesystem data corruption problem/work-around
>
> > Until there's a better understanding of the issue, I've come up with the
> > following simple patch that will use indirect mode for this case.
>
> Is it possible to check the target's OUI to detect Mellanox targets?
> Right now this workaround affects all targets unconditionally.

Hmm ... okay, makes sense. thanks.
It might not be just mellanox OUI, but I'll check how this can be detected.
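
For reference, an OUI check of the kind suggested above could look roughly like this C sketch. It assumes the GUID is held as eight bytes in network byte order with the OUI in the first three; the Topspin OUI value is the one from the patch context, while guid_has_oui() is a hypothetical helper, not a function in ib_srp.c. Which OUI (or OUIs) would identify the affected targets is still an open question here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* OUI value taken from the patch context quoted in this thread. */
static const uint8_t topspin_oui[3] = { 0x00, 0x05, 0xad };

/*
 * Hypothetical helper: returns nonzero when the first three bytes of a
 * big-endian 8-byte GUID match the given OUI.
 */
static int guid_has_oui(const uint8_t guid[8], const uint8_t oui[3])
{
	assert(guid != NULL && oui != NULL);
	return memcmp(guid, oui, 3) == 0;
}
```

A workaround gated this way would apply only to targets whose GUID carries a listed prefix, instead of to every target.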
-- MST

From mst at mellanox.co.il Tue Jul 25 15:09:51 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 26 Jul 2006 01:09:51 +0300
Subject: [openib-general] ucma into kernel.org
In-Reply-To: <000b01c6b035$f2a6c9d0$8698070a@amr.corp.intel.com>
References: <000b01c6b035$f2a6c9d0$8698070a@amr.corp.intel.com>
Message-ID: <20060725220951.GJ12113@mellanox.co.il>

Quoting r. Sean Hefty :
> Subject: RE: ucma into kernel.org
>
> >Hmm, thanks.
> >So what you are saying is that we must destroy/recreate the QP as well,
> >and this implies this must be done at ULP level. Right?
>
> I believe so.
>
> >So I guess if CMA rounds the retry count down to the max supported
> >value, this will be enough for most ULPs, and if not, ULP
> >can get the retry count, see that it is smaller than what user set,
> >and destroy/retry.
> >
> >Sounds good?
>
> I would rather the set failed if the value is too large. We could expose the
> maximum supported number of retries through a #define in the header file.

OK, but wouldn't it be different for IB/TCP? Maybe a function then?

> So, is the proposal to provide a generic set_option call for connection
> parameters? Is there any other parameter than retries? Does a generic
> connection timeout value work for TCP as well?

Not that I know: I see MAX_TCP_SYNCNT but nothing to do with timers.
Ideally CMA would do something adaptive ...

-- MST

From mst at mellanox.co.il Tue Jul 25 15:19:25 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 26 Jul 2006 01:19:25 +0300
Subject: [openib-general] ucma into kernel.org
In-Reply-To: <000b01c6b035$f2a6c9d0$8698070a@amr.corp.intel.com>
References: <000b01c6b035$f2a6c9d0$8698070a@amr.corp.intel.com>
Message-ID: <20060725221925.GA12864@mellanox.co.il>

Quoting r. Sean Hefty :
> So, is the proposal to provide a generic set_option call for connection
> parameters?

yes

> Is there any other parameter than retries?

not that I know of.
-- MST From narravul at cse.ohio-state.edu Tue Jul 25 15:48:52 2006 From: narravul at cse.ohio-state.edu (Sundeep Narravula) Date: Tue, 25 Jul 2006 18:48:52 -0400 (EDT) Subject: [openib-general] Problem running rping on chelsio T3 adapters In-Reply-To: Message-ID: With MSI-X disabled, rping now works. Thanks, --Sundeep. On Tue, 25 Jul 2006, Sundeep Narravula wrote: > Hi Steve, > > > 1) krping works over T3. So the physical connectivity is ok. > > This does NOT work. It gives out the same error as rping. > > > 2) rping doesn't work...rping client gets an error 110 from cma event 1. > > Event 1 is ADDR_ERROR, which means something balled up trying to figure > > out which local interface to use, or the ARP resolution failed (timed > > out). > > > > 3) I seem to remember from the T3 debug logs, that the client isn't > > getting any connect request from the iwarp CM. This jibes with #2. > > Yes and Yes. > > > > > Does the server ever get any connection indication at all? > > No. The server logs do not show any indication of connection request or > error > > > Can I log into these systems and poke around? It'll definitely expedite > > things... > > Yes. We will create an account on these systems. I will send out the > information shortly. > > > Also, a network sniffer trace of pkts going in/out of the client's port > > would help too... > > I will have to check this. > > Thanks, > --Sundeep. > > > > > Steve. > > > > > > > > > > On Tue, 2006-07-25 at 13:47 -0400, Sundeep Narravula wrote: > > > > > > > > > > > > > We are using kernel 2.6.16.3 and installed the latest code/drivers from > > > > > > > > > > > > > the iwarp branch. Currently, we are seeing an address resolution problem > > > > > > > > > > > > > while using rping.
> > > > > > > > > > > > > > > > > > > > > > > > > > [narravul at d1-as4 examples]$ ./rping -c -VvC100 -a 150.10.11.200 -p 9999 > > > > > > > > > > > > > libibverbs: Warning: couldn't load driver > > > > > > > > > > > > > /usr/local/lib/infiniband/mthca.so: /usr/local/lib/infiniband/mthca.so: > > > > > > > > > > > > > undefined symbol: ibv_cmd_query_pkey > > > > > > > > > > > > > cma event 1, error -110 > > > > > > > > > > > > > > > > Why don't you go ahead and build/install libmthca -or- remove the IB > > > > card and see if that helps. This is still some basic > > > > configuration/installation problem here... > > > > > > Hi Steve, > > > We have tried both these options. Building libmthca / removing the IB > > > cards did not help. We are still running the same errors. The dmesg log > > > are also identical. > > > > > > --Sundeep. > > > > > > > > > > > Steve. > > > > > > > > > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From vuhuong at mellanox.com Tue Jul 25 15:53:57 2006 From: vuhuong at mellanox.com (Vu Pham) Date: Tue, 25 Jul 2006 15:53:57 -0700 Subject: [openib-general] [PATCH] RFC: srp filesystem data corruption problem/work-around In-Reply-To: References: <20060725214915.GF12113@mellanox.co.il> Message-ID: <44C6A105.3050902@mellanox.com> Roland Dreier wrote: > > Until there's a better understanding of the issue, I've come up with the > > following simple patch that will use indirect mode for this case. > > Is it possible to check the target's OUI to detect Mellanox targets? 
Mellanox target ioc_guid is derived from node_guid; therefore, Mellanox target's OUI would be "00-02-C9" I think that Engenio and DDN target also use node_guid to derive target ioc_guid -- they may have same target's OUI as Mellanox target > Right now this workaround affects all targets unconditionally. > Can we rework the patch to have mellanox_workarounds=0 by default? Vu From mshefty at ichips.intel.com Tue Jul 25 16:01:55 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 25 Jul 2006 16:01:55 -0700 Subject: [openib-general] FW: [PATCH] [RFC] librdmacm: expose device list to users In-Reply-To: <000a01c6b034$d7618bc0$8698070a@amr.corp.intel.com> References: <000a01c6b034$d7618bc0$8698070a@amr.corp.intel.com> Message-ID: <44C6A2E3.40704@ichips.intel.com> Sean Hefty wrote: >>One solution (suggested by Matt Leininger) would be for the IB stack >>(CM?) to hand out multicast addresses. I'm thinking it would be useful >>to come up with a header file for a solution before implementing >>anything. An 'ideal' API might look something like this: The IB SA does something like this. If a user creates a multicast group with an MGID of 0, the SA will assign an MGID to the group. That MGID then somehow needs to be distributed to everyone wanting to join that group. I'm not aware of any equivalent functionality for IP multicast groups, so I'm not sure if it makes sense to try to provide this functionality through the RDMA CM or expose a raw IB multicast interface. I've been trying to come up with a raw SA interface for userspace, but a generic one that supports RMPP isn't ending up much simpler than the raw MAD interface. And looking at the multicast module in the kernel, I just noticed that it doesn't handle the case if the user tries to join a group with an MGID of 0. >>I would then use ObtainMulticastAddress, then pass the returned address >>via OOB to all the peers I want to be in that multicast group. All the >>peers would then use JoinMulticastGroup. 
In order to use what's there, is there any way that the processes can create unique addresses to use? Maybe map the server port numbers into the address? >>ReturnMulticastAddress could require that all calls to >>JoinMulticastGroup on the provided address have been paired with >>matching LeaveMulticastGroup calls. What would be better though, is if >>the group were asynchronously released when the last peer in the group >>calls LeaveMulticastGroup. This would avoid the need for explicitly >>fencing between all the LeaveMulticastGroup calls and one peer calling >>ReturnMulticastAddress. Your interface looks good. I just need to think more about the details of what approach to use. I want to find the balance between being easy to use, but provide the necessary capabilities. I like the idea that the group is automatically deleted when the last user leaves, since this matches with the IB implementation. We might also be able to remove the ObtainMulticastAddress, by letting JoinMulticastGroup take a wildcard address. Can you take a look at the kernel interface in ib_multicast.h, and let me know if exposing that to userspace would work for you? >>The idea behind ObtainMulticastAddress is that I don't care what the >>particular address is - just that I have an address, and nobody else is >>using it. The address can take any form - does not have to be IP. >> >>It's important that everything be non-blocking. If needed, operations >>can be done asynchronously (ie completion signalled via an event queue). >> >>Is this sort of approach even feasible? I'm pretty sure that the approach is feasible. It's just that the code there now wasn't written to support a call like ObtainMulticastAddress. 
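To make the semantics under discussion concrete (a join with a wildcard address allocates a fresh group, and the group disappears when the last member leaves), here is a toy in-memory model. It is purely illustrative: the names, the 64-bit "address" and the fixed-size table are all assumptions, and none of this reflects the SA, the kernel multicast module, or ib_multicast.h.

```c
#include <stddef.h>
#include <stdint.h>

#define MC_MAX_GROUPS 8

/* Toy group record; a slot with members == 0 is free. */
struct mc_group {
	uint64_t addr;
	unsigned int members;
};

static struct mc_group mc_groups[MC_MAX_GROUPS];
static uint64_t mc_next_addr = 1;

/* Look up a live group by address; NULL if no such group exists. */
static struct mc_group *mc_find(uint64_t addr)
{
	for (size_t i = 0; i < MC_MAX_GROUPS; i++)
		if (mc_groups[i].members > 0 && mc_groups[i].addr == addr)
			return &mc_groups[i];
	return NULL;
}

/*
 * Join a group. addr == 0 is the wildcard: a new, unused address is
 * allocated and the caller becomes its first member. Returns the
 * joined address, or 0 if the table is full.
 */
static uint64_t mc_join(uint64_t addr)
{
	struct mc_group *g = addr ? mc_find(addr) : NULL;

	if (g) {
		g->members++;
		return addr;
	}
	for (size_t i = 0; i < MC_MAX_GROUPS && !g; i++)
		if (mc_groups[i].members == 0)
			g = &mc_groups[i];
	if (!g)
		return 0;
	if (addr == 0) {
		/* Skip 0 and any address that is already in use. */
		do {
			addr = mc_next_addr++;
		} while (addr == 0 || mc_find(addr));
	}
	g->addr = addr;
	g->members = 1;
	return addr;
}

/* Leave a group; the slot is released when the last member leaves. */
static int mc_leave(uint64_t addr)
{
	struct mc_group *g = mc_find(addr);

	if (!g)
		return -1;
	g->members--;
	return 0;
}
```

In this model there is no separate obtain or return step: the first mc_join(0) plays the role of ObtainMulticastAddress, and the final mc_leave() releases the address without any explicit fencing between peers.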
- Sean From vuhuong at mellanox.com Tue Jul 25 16:06:31 2006 From: vuhuong at mellanox.com (Vu Pham) Date: Tue, 25 Jul 2006 16:06:31 -0700 Subject: [openib-general] [PATCH] RFC: srp filesystem data corruption problem/work-around In-Reply-To: References: <20060725214915.GF12113@mellanox.co.il> Message-ID: <44C6A3F7.9080704@mellanox.com> Roland Dreier wrote: > BTW -- how are you seeing this corruption? Host machine: EM64T cpu, RHEL4 u1/2/3 x86 32-bit version Create/mount ext2 or ext3 file system on srp devices using 1K block size (ie. mkfs -t ext2 -b 1024). Copy file, umount, remount and compare to the original file - looping the procedure above and eventually you will get data mismatch. > > Have you tested other targets? If not then I would like to be able to > run the test and confirm that it really is a problem with the Mellanox > target. I tested with Engenio target and have not seen this problem. Please run your file system data corruption test with 1K block size on ext2/ext3 fs Vu From afriedle at open-mpi.org Tue Jul 25 17:07:11 2006 From: afriedle at open-mpi.org (Andrew Friedley) Date: Tue, 25 Jul 2006 17:07:11 -0700 Subject: [openib-general] FW: [PATCH] [RFC] librdmacm: expose device list to users In-Reply-To: <44C6A2E3.40704@ichips.intel.com> References: <000a01c6b034$d7618bc0$8698070a@amr.corp.intel.com> <44C6A2E3.40704@ichips.intel.com> Message-ID: <44C6B22F.4010209@open-mpi.org> Sean Hefty wrote: > The IB SA does something like this. If a user creates a multicast group > with an MGID of 0, the SA will assign an MGID to the group. That MGID > then somehow needs to be distributed to everyone wanting to join that > group. That's exactly what I want, as long as it's certain the assigned MGID is not already in use. > I'm not aware of any equivalent functionality for IP multicast groups, > so I'm not sure if it makes sense to try to provide this functionality > through the RDMA CM or expose a raw IB multicast interface. 
I'd prefer a raw IB interface - like you said, this isn't really analogous to IP, and I'd like to avoid the other non-multicast issue I have with RDMA CM. Also, when I first started looking at IB multicast I was expecting this to be part of the ibverbs interface, not a CM. >>> I would then use ObtainMulticastAddress, then pass the returned address >>> via OOB to all the peers I want to be in that multicast group. All the >>> peers would then use JoinMulticastGroup. > > > In order to use what's there, is there any way that the processes can > create unique addresses to use? Maybe map the server port numbers into > the address? Not sure I understand what you're asking.. addresses to use with what? > Your interface looks good. I just need to think more about the details > of what approach to use. I want to find the balance between being easy > to use, but provide the necessary capabilities. Always good :) > I like the idea that the group is automatically deleted when the last > user leaves, since this matches with the IB implementation. We might > also be able to remove the ObtainMulticastAddress, by letting > JoinMulticastGroup take a wildcard address. Can you take a look at the > kernel interface in ib_multicast.h, and let me know if exposing that to > userspace would work for you? Part of why I defined ReturnMulticastAddress the way I did was because I thought it would be useful to hold on to multicast groups without having any peers joined. These could be kept in a pool for re-use, and have peers join/leave them as needed. The MVAPICH group wrote a paper on a similar idea, where they keep a pool of groups with all peers joined, then any peers not interested in communication when a group is pulled from the pool can pull out. But if the time cost is in the join and not the initial creation, this doesn't solve anything. ib_multicast.h looks good.. lots of functionality packed into very few functions. I don't see any problems with it... 
yet :) I like the callback on join completion, as opposed to polling somewhere. The comments don't say anything about passing an MGID of 0 in - I assume this functionality will be there. Would I pass an MLID of 0 as well, or do I need to come up with a valid MLID from somewhere? Just to make sure, if I pass in an MGID of 0, an MGID will not only be allocated, but joined as well? Again to be clear, ib_free_multicast() will leave the multicast group in question? Is ib_get_mcmember_rec the interface you mentioned for determining whether a port is already in a multicast group? Just thought of a feature that would be nice. As-is, I have no idea when all peers intending to join a multicast group have done so. What would be nice is some sort of notification mechanism - say the ability to provide a callback that is called each time a peer joins a multicast group. I already know which peers I expect to join, so I can keep a list of which ones have/haven't joined, and mark the multicast group as useable when all the expected peers are joined. Would this be reasonable? The alternative for me would be for each peer to send messages OOB to every other peer in the multicast group when it has successfully joined. Andrew From sean.hefty at intel.com Tue Jul 25 18:40:24 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 25 Jul 2006 18:40:24 -0700 Subject: [openib-general] FW: [PATCH] [RFC] librdmacm: expose device list to users In-Reply-To: <44C6B22F.4010209@open-mpi.org> Message-ID: <000001c6b054$77873950$13cd180a@amr.corp.intel.com> >I'd prefer a raw IB interface - like you said, this isn't really >analogous to IP, and I'd like to avoid the other non-multicast issue I >have with RDMA CM. Also, when I first started looking at IB multicast I >was expecting this to be part of the ibverbs interface, not a CM. IB multicast is more of an SA interface, rather than IB verbs or IB CM. I've already started looking at bringing the kernel ib_multicast interface up into userspace. 
>> In order to use what's there, is there any way that the processes can >> create unique addresses to use? Maybe map the server port numbers into >> the address? > >Not sure I understand what you're asking.. addresses to use with what? I was trying to ask if there was any way for the processes to generate unique addresses. For example, what TCP port number do the processes listen on when establishing their out of band connections? Is there some way that you can map the addresses that are used for out of band communication to a multicast IP address, such that the processes get unique addresses? From reading down into your mail, it doesn't sound like this would help much. >Part of why I defined ReturnMulticastAddress the way I did was because I >thought it would be useful to hold on to multicast groups without having >any peers joined. These could be kept in a pool for re-use, and have >peers join/leave them as needed. The MVAPICH group wrote a paper on a >similar idea, where they keep a pool of groups with all peers joined, >then any peers not interested in communication when a group is pulled >from the pool can pull out. But if the time cost is in the join and not >the initial creation, this doesn't solve anything. The only standard defined way to allocate an IB multicast address (i.e. MGID), is for someone to create the multicast group. This does an implicit join by the creator. In IB, the cost is in the join, since it requires programming the switches. The group will continue to exist as long as someone remains in the group. >ib_multicast.h looks good.. lots of functionality packed into very few >functions. I don't see any problems with it... yet :) I think the same basic API can be exposed in userspace. It may be possible to expose a couple of extra helper functions to simplify creating and joining a group, but I'm not sure if they will be worth it. >I like the callback on join completion, as opposed to polling somewhere. 
This doesn't end up working well for userspace apps. To get a callback, the library ends up needing to create a thread to poll for events from the kernel. It makes more sense to give the application control over the threading, and let it poll for the events. >The comments don't say anything about passing an MGID of 0 in - I assume >this functionality will be there. Would I pass an MLID of 0 as well, or >do I need to come up with a valid MLID from somewhere? Well, after looking at the code, an MGID of 0 doesn't currently work. The implementation doesn't handle it. I worked on a design to add support for MGID 0 to the multicast module, and will start on it in the next day or so. Another thought I had is to allow ib_get_mcmember_rec() be called with an MGID of 0. Doing so would return an MCMemberRecord with reasonable default values that could be used when creating a group. (The returned values would either be hard-coded or copy those from the first join on a given port, if one had occurred. In almost all cases, the first join would come from ipoib.) >Just to make sure, if I pass in an MGID of 0, an MGID will not only be >allocated, but joined as well? Correct >Again to be clear, ib_free_multicast() will leave the multicast group in >question? Correct - the function is called "free" instead of "leave" because it must be called even if the join request failed, and may be called if the join operation has not yet completed. >Is ib_get_mcmember_rec the interface you mentioned for determining >whether a port is already in a multicast group? Yes - but it requires that you already know which group (MGID) is being joined. >Just thought of a feature that would be nice. As-is, I have no idea >when all peers intending to join a multicast group have done so. What >would be nice is some sort of notification mechanism - say the ability >to provide a callback that is called each time a peer joins a multicast >group. 
I already know which peers I expect to join, so I can keep a >list of which ones have/haven't joined, and mark the multicast group as >useable when all the expected peers are joined. > >Would this be reasonable? The alternative for me would be for each peer >to send messages OOB to every other peer in the multicast group when it >has successfully joined. There is no way to do this. Note that there may be a delay between a node joining a group and the programming of the switch tables. - Sean From herbert at gondor.apana.org.au Tue Jul 25 20:39:51 2006 From: herbert at gondor.apana.org.au (Herbert Xu) Date: Wed, 26 Jul 2006 13:39:51 +1000 Subject: [openib-general] [PATCH Round 4 2/3] Core network changes to support network event notification. In-Reply-To: <1153839940.14354.37.camel@stevo-desktop> References: <1153839940.14354.37.camel@stevo-desktop> Message-ID: <20060726033951.GB3203@gondor.apana.org.au> On Tue, Jul 25, 2006 at 10:05:40AM -0500, Steve Wise wrote: > > But they really are seeing a delete followed by an add. That's what the > kernel is doing. Actually that's the other thing I don't really like. The user-space monitor may perceive that a route was actually deleted and replaced by a new one even though this isn't what's happening at all. In fact the problem here is that you're sending route notifications when it's really the dst_entry that's changing. User-space as it stands only get notifications about fib changes which is quite different from changes to the transient dst_entry objects which only exist in the route cache. Is anyone actually going to use the user-space interface of this? If not perhaps we should wait until someone really needs it before adding the netlink part of the patch. We can change the kernel interface at will so if we make a mistake with netevent it can be easily corrected. For user-space though the rules are totally different. 
I'd really hate to be stuck with an interface which turns out to not be the one that people actually want to have. > The rdma driver needs to update all established rdma connections that > are using the next-hop information of the existing route and make them > use the next-hop information of the new route. In addition, the rdma > driver might have a reference to the old dst entry. So it can release > that ref and add a ref to the new dst entry. Do you really need the old route for the user-space part of your patch? > I have to admit I'm a little fuzzy on the routing stuff. The main > netevents I've utilized in the the rdma driver I'm writing is the > neighbour update event and the redirect event. Route add/del was added > for completeness of "routing" netevents. So you mean you aren't going to use the route notifications? In that case we should probably just drop them and add them when someone actually needs it. At that point they can tell us what semantics they want from it :) > Can you expand further or point me to code where the IP stack "flushes > its tables" when routes are changed? Grep for rt_cache_flush in net/ipv4/fib_hash.c. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From halr at voltaire.com Tue Jul 25 20:47:50 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 25 Jul 2006 23:47:50 -0400 Subject: [openib-general] FW: [PATCH] [RFC] librdmacm: expose device list to users In-Reply-To: <44C6B22F.4010209@open-mpi.org> References: <000a01c6b034$d7618bc0$8698070a@amr.corp.intel.com> <44C6A2E3.40704@ichips.intel.com> <44C6B22F.4010209@open-mpi.org> Message-ID: <1153885668.4537.112476.camel@hal.voltaire.com> On Tue, 2006-07-25 at 20:07, Andrew Friedley wrote: [snip..] > The comments don't say anything about passing an MGID of 0 in - I assume > this functionality will be there. 
Would I pass an MLID of 0 as well, or > do I need to come up with a valid MLID from somewhere? The MLID comes from the SA. The SM assigns an available MLID for the (SA assigned) MGID and the SA returns this value in the SA MCMemberRecord response to the SA client. -- Hal From mst at mellanox.co.il Tue Jul 25 22:22:51 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 26 Jul 2006 08:22:51 +0300 Subject: [openib-general] [PATCH] IB/uverbs: include cosmetic fix Message-ID: <20060726052251.GA8380@mellanox.co.il> IB/uverbs: include cosmetic fix Since uverbs_cmd.c uses lockdep now, it should include linux/lockdep.h directly rather than rely on linux/file.h to pull it in. Signed-off-by: Michael S. Tsirkin diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index 30923eb..3fbec94 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -37,6 +37,7 @@ #include #include +#include #include -- MST From ogerlitz at voltaire.com Tue Jul 25 22:47:06 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 26 Jul 2006 08:47:06 +0300 Subject: [openib-general] [PATCH 3/4] IB CM: register and handle COMM_EST events on a QP In-Reply-To: <44C655EF.4010804@ichips.intel.com> References: <000101c6aee3$951b95a0$87f9070a@amr.corp.intel.com> <44C4FD69.9040005@ichips.intel.com> <44C62643.6030801@voltaire.com> <44C655EF.4010804@ichips.intel.com> Message-ID: <44C701DA.9030605@voltaire.com> Sean Hefty wrote: > After thinking about peer to peer more, I think that it could insert the > local QPN after the passive side calls ib_send_cm_rep(), similar to > what's done in the client-server model. If we only want to track local > QPNs for the purposes of handling the COMM_EST event, then tracking on > the passive side is sufficient. If we want to track local QPNs as part > of timewait, then we want tracking on both sides. I went the latter route. 
Generally, i guess you need to insert the local QPN into the rb_tree ***before*** sending the REP not after it. Can you state what is the usage being done with the local QPNs in the timeout on REQ flow? Or. From ogerlitz at voltaire.com Tue Jul 25 22:52:02 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 26 Jul 2006 08:52:02 +0300 Subject: [openib-general] [PATCH 0/4] Dispatch communication relatedevents to the IB CM In-Reply-To: References: Message-ID: <44C70302.1020908@voltaire.com> Rimmer, Todd wrote: > Perhaps we should pursue changing this in the IBTA spec. Being able to > post to the SQ while in RTR makes handling of the Comm Est/RTU race with > the CQ callback much easier to handle. > > It would be better if the IB spec permitted posting to the SQ in RTR but > indicated the SQ would not be processed until the QP moved to RTS. I > believe the present Mellanox silicon/firmware implements such behavior. Compared to the actual need and complexity induced on the driver for supporting proxy-ing of TX WRs, I think this (IBTA-ing and implement-ing the thing) is overdoing which should be avoided. Since with Sean's patch it is ensured that an established event would be generated by the CM, a passive side IB consumer (eg iSER/SRP/NFS targets) can queue an incoming RX till the event comes in. Or. From mst at mellanox.co.il Tue Jul 25 23:26:47 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 26 Jul 2006 09:26:47 +0300 Subject: [openib-general] [PATCH] lockdep: don't pull in includes when lockdep disabled In-Reply-To: <20060704115656.GA1539@elte.hu> References: <20060704115656.GA1539@elte.hu> Message-ID: <20060726062647.GA8711@mellanox.co.il> Ingo, does the following look good to you? Do not pull in various includes through lockdep.h if lockdep is disabled. Signed-off-by: Michael S. 
Tsirkin diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index 316e0fb..39d50c4 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -8,13 +8,13 @@ #ifndef __LINUX_LOCKDEP_H #define __LINUX_LOCKDEP_H +#ifdef CONFIG_LOCKDEP + #include #include #include #include -#ifdef CONFIG_LOCKDEP - /* * Lock-class usage-state bits: */ -- MST From arjan at infradead.org Tue Jul 25 23:33:19 2006 From: arjan at infradead.org (Arjan van de Ven) Date: Wed, 26 Jul 2006 08:33:19 +0200 Subject: [openib-general] [PATCH] lockdep: don't pull in includes when lockdep disabled In-Reply-To: <20060726062647.GA8711@mellanox.co.il> References: <20060704115656.GA1539@elte.hu> <20060726062647.GA8711@mellanox.co.il> Message-ID: <1153895599.2896.4.camel@laptopd505.fenrus.org> On Wed, 2006-07-26 at 09:26 +0300, Michael S. Tsirkin wrote: > Ingo, does the following look good to you? > > Do not pull in various includes through lockdep.h if lockdep is disabled. Hi, can you tell us what this fixes? Eg is there a specific problem? I mean... we're adding ifdefs so there better be a real good reason for them.... fixing something real would be such a reason ;-) Greetings, Arjan van de Ven From mst at mellanox.co.il Tue Jul 25 23:43:49 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 26 Jul 2006 09:43:49 +0300 Subject: [openib-general] [PATCH] lockdep: don't pull in includes when lockdep disabled In-Reply-To: <1153895599.2896.4.camel@laptopd505.fenrus.org> References: <1153895599.2896.4.camel@laptopd505.fenrus.org> Message-ID: <20060726064349.GA8874@mellanox.co.il> Quoting r. Arjan van de Ven : > Subject: Re: [PATCH] lockdep: don't pull in includes when lockdep disabled > > On Wed, 2006-07-26 at 09:26 +0300, Michael S. Tsirkin wrote: > > Ingo, does the following look good to you? > > > > Do not pull in various includes through lockdep.h if lockdep is disabled. > > Hi, > > can you tell us what this fixes? Eg is there a specific problem? Er ... 
it's a cosmetic change - there's no serious problem, it is just that even if I disable lockdep, linux/lockdep.h will pull in several headers even though they are not needed -> more useless work for compiler to do. > I mean... we're adding ifdefs Note this doesn't add ifdefs, just moves them around. > so there better be a real good reason for > them.... fixing something real would be such a reason ;-) Well, I don't expect this specific bit to speed compilation up in any measurable way, but unnecessary includes do have the tendency to accumulate and lead to slower builds ... Is that a reason? -- MST From adobriyan at gmail.com Wed Jul 26 00:13:18 2006 From: adobriyan at gmail.com (Alexey Dobriyan) Date: Wed, 26 Jul 2006 11:13:18 +0400 Subject: [openib-general] [PATCH] lockdep: don't pull in includes when lockdep disabled In-Reply-To: <1153895599.2896.4.camel@laptopd505.fenrus.org> References: <20060704115656.GA1539@elte.hu> <20060726062647.GA8711@mellanox.co.il> <1153895599.2896.4.camel@laptopd505.fenrus.org> Message-ID: <20060726071318.GA6824@martell.zuzino.mipt.ru> On Wed, Jul 26, 2006 at 08:33:19AM +0200, Arjan van de Ven wrote: > On Wed, 2006-07-26 at 09:26 +0300, Michael S. Tsirkin wrote: > > Ingo, does the following look good to you? > > > > Do not pull in various includes through lockdep.h if lockdep is disabled. > > Hi, > > can you tell us what this fixes? Eg is there a specific problem? [raises hand] Zillions of warnings on m68k allmodconfig. And, yes, patch removes them. In file included from ... from ... include/linux/list.h: In function `__list_add_rcu': include/linux/list.h:89: warning: implicit declaration of function `smp_wmb' > I mean... we're adding ifdefs so there better be a real good reason for > them.... fixing something real would be such a reason ;-) From mst at mellanox.co.il Wed Jul 26 02:28:43 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 26 Jul 2006 12:28:43 +0300 Subject: [openib-general] [PATCH] RFC: srp filesystem data corruption problem/work-around In-Reply-To: <44C6A105.3050902@mellanox.com> References: <44C6A105.3050902@mellanox.com> Message-ID: <20060726092843.GR8795@mellanox.co.il> Quoting r. Vu Pham : > > Right now this workaround affects all targets unconditionally. > > > > Can we rework the patch to have mellanox_workarounds=0 by > default? Hmm ... since this is a data corruption issue, seems to me the safe setting should be the default one. No? -- MST From k_mahesh85 at yahoo.co.in Wed Jul 26 02:54:39 2006 From: k_mahesh85 at yahoo.co.in (keshetti mahesh) Date: Wed, 26 Jul 2006 10:54:39 +0100 (BST) Subject: [openib-general] connection loss handling in MTHCA Message-ID: <20060726095439.75597.qmail@web8320.mail.in.yahoo.com> Hello all, I have a query about "connection loss" handling in the mthca driver. Consider the following situation: during a data transfer between two connected endpoints, one endpoint (at the HCA level) detects that the connection is lost for some reason, such as "receive queue empty on remote end" or "TPT error for data buffer on remote end". How is this handled in the mthca implementation? What happens to the WRs that are in progress and outstanding? Is there any asynchronous event generated corresponding to that? -Mahesh From mst at mellanox.co.il Wed Jul 26 03:29:44 2006 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Wed, 26 Jul 2006 13:29:44 +0300 Subject: [openib-general] restore missing PCI registers after reset In-Reply-To: <20060717162531.GC4829@kroah.com> References: <20060717162531.GC4829@kroah.com> Message-ID: <20060726102944.GA9411@mellanox.co.il> Quoting r. Greg KH : > Subject: [patch 02/45] IB/mthca: restore missing PCI registers after reset > ------------------ > mthca does not restore the following PCI-X/PCI Express registers after reset: > PCI-X device: PCI-X command register > PCI-X bridge: upstream and downstream split transaction registers > PCI Express : PCI Express device control and link control registers > > This causes instability and/or bad performance on systems where one of > these registers is set to a non-default value by BIOS. > > Signed-off-by: Michael S. Tsirkin > Signed-off-by: Chris Wright > Signed-off-by: Greg Kroah-Hartman By the way, Greg, this code is completely generic, and the same seems to apply to all PCI-X/PCI-Express devices - should not pci_restore_state and friends really know about these registers, as well? What do you think? -- MST From trimmer at silverstorm.com Wed Jul 26 06:21:56 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Wed, 26 Jul 2006 09:21:56 -0400 Subject: [openib-general] [PATCH 0/4] Dispatch communication relatedevents to the IB CM In-Reply-To: <20060725201729.GB11519@mellanox.co.il> Message-ID: > Quoting r. Sean Hefty : > > Subject: RE: [PATCH 0/4] Dispatch communication relatedevents to the IB > CM > > > > >Perhaps we should pursue changing this in the IBTA spec. Being able to > > >post to the SQ while in RTR makes handling of the Comm Est/RTU race > with > > >the CQ callback much easier to handle. > > > > > >It would be better if the IB spec permitted posting to the SQ in RTR > but > > >indicated the SQ would not be processed until the QP moved to RTS. I > > >believe the present Mellanox silicon/firmware implements such behavior. 
> > > > I think it would be simpler to transition the QP to RTS after sending a > REP, > > with the restriction that a user may not post sends until an RTU is > received, a > > communication establish event occurs, or a receive message completes on > the QP. This approach will not work. If the QP is in RTS, the Communication Established event will never be generated. Hence the lost RTU case would not be handled properly, and the ULP would need to take on the burden. It's much better to isolate the solution to the CM and let the ULP post to the send queue in RTR. Todd Rimmer Chief Systems Architect SilverStorm Technologies Voice: 610-233-4852 Fax: 610-233-4777 TRimmer at SilverStorm.com www.SilverStorm.com From ogerlitz at voltaire.com Wed Jul 26 07:26:49 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 26 Jul 2006 17:26:49 +0300 (IDT) Subject: [openib-general] controlling IPoIB debug Message-ID: Roland, This will probably turn out to be a newbie question, but anyway: I can't disable CONFIG_INFINIBAND_IPOIB_DEBUG. That is, I expected to be able to press "n" on the "IP-over-InfiniBand debugging" submenu of "IP-over-InfiniBand", but it has no effect. Also, with CONFIG_INFINIBAND_IPOIB_DEBUG set, no "*" is marked next to it, so it looks like: IP-over-InfiniBand --- IP-over-InfiniBand debugging [ ] IP-over-InfiniBand data path debugging Am I missing anything here, or is there some problem? Attached is my .config. Or.
-------------- next part -------------- # # Automatically generated make config: don't edit # Linux kernel version: 2.6.18-rc2 # Wed Jul 26 17:21:21 2006 # CONFIG_X86_64=y CONFIG_64BIT=y CONFIG_X86=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_SEMAPHORE_SLEEPERS=y CONFIG_MMU=y CONFIG_RWSEM_GENERIC_SPINLOCK=y CONFIG_GENERIC_HWEIGHT=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_X86_CMPXCHG=y CONFIG_EARLY_PRINTK=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_DMI=y CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" # # Code maturity level options # CONFIG_EXPERIMENTAL=y CONFIG_LOCK_KERNEL=y CONFIG_INIT_ENV_ARG_LIMIT=32 # # General setup # CONFIG_LOCALVERSION="" CONFIG_LOCALVERSION_AUTO=y CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_POSIX_MQUEUE=y # CONFIG_BSD_PROCESS_ACCT is not set CONFIG_TASKSTATS=y # CONFIG_TASK_DELAY_ACCT is not set CONFIG_SYSCTL=y # CONFIG_AUDIT is not set CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y # CONFIG_CPUSETS is not set # CONFIG_RELAY is not set CONFIG_INITRAMFS_SOURCE="" CONFIG_UID16=y CONFIG_CC_OPTIMIZE_FOR_SIZE=y # CONFIG_EMBEDDED is not set CONFIG_KALLSYMS=y CONFIG_KALLSYMS_ALL=y # CONFIG_KALLSYMS_EXTRA_PASS is not set CONFIG_HOTPLUG=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_BASE_FULL=y CONFIG_RT_MUTEXES=y CONFIG_FUTEX=y CONFIG_EPOLL=y CONFIG_SHMEM=y CONFIG_SLAB=y CONFIG_VM_EVENT_COUNTERS=y # CONFIG_TINY_SHMEM is not set CONFIG_BASE_SMALL=0 # CONFIG_SLOB is not set # # Loadable module support # CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y CONFIG_MODULE_FORCE_UNLOAD=y # CONFIG_MODVERSIONS is not set # CONFIG_MODULE_SRCVERSION_ALL is not set # CONFIG_KMOD is not set CONFIG_STOP_MACHINE=y # # Block layer # CONFIG_LBD=y # CONFIG_BLK_DEV_IO_TRACE is not set # CONFIG_LSF is not set # # IO Schedulers # CONFIG_IOSCHED_NOOP=y # CONFIG_IOSCHED_AS is not set CONFIG_IOSCHED_DEADLINE=y CONFIG_IOSCHED_CFQ=y # CONFIG_DEFAULT_AS is not set # CONFIG_DEFAULT_DEADLINE is not set 
CONFIG_DEFAULT_CFQ=y # CONFIG_DEFAULT_NOOP is not set CONFIG_DEFAULT_IOSCHED="cfq" # # Processor type and features # CONFIG_X86_PC=y # CONFIG_X86_VSMP is not set # CONFIG_MK8 is not set # CONFIG_MPSC is not set CONFIG_GENERIC_CPU=y CONFIG_X86_L1_CACHE_BYTES=128 CONFIG_X86_L1_CACHE_SHIFT=7 CONFIG_X86_INTERNODE_CACHE_BYTES=128 CONFIG_X86_TSC=y CONFIG_X86_GOOD_APIC=y # CONFIG_MICROCODE is not set CONFIG_X86_MSR=y CONFIG_X86_CPUID=y CONFIG_X86_HT=y CONFIG_X86_IO_APIC=y CONFIG_X86_LOCAL_APIC=y CONFIG_MTRR=y CONFIG_SMP=y CONFIG_SCHED_SMT=y CONFIG_SCHED_MC=y # CONFIG_PREEMPT_NONE is not set CONFIG_PREEMPT_VOLUNTARY=y # CONFIG_PREEMPT is not set CONFIG_PREEMPT_BKL=y CONFIG_NUMA=y CONFIG_K8_NUMA=y CONFIG_NODES_SHIFT=6 CONFIG_X86_64_ACPI_NUMA=y CONFIG_NUMA_EMU=y CONFIG_ARCH_DISCONTIGMEM_ENABLE=y CONFIG_ARCH_DISCONTIGMEM_DEFAULT=y CONFIG_ARCH_SPARSEMEM_ENABLE=y CONFIG_SELECT_MEMORY_MODEL=y # CONFIG_FLATMEM_MANUAL is not set CONFIG_DISCONTIGMEM_MANUAL=y # CONFIG_SPARSEMEM_MANUAL is not set CONFIG_DISCONTIGMEM=y CONFIG_FLAT_NODE_MEM_MAP=y CONFIG_NEED_MULTIPLE_NODES=y # CONFIG_SPARSEMEM_STATIC is not set CONFIG_SPLIT_PTLOCK_CPUS=4 CONFIG_MIGRATION=y CONFIG_RESOURCES_64BIT=y CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID=y CONFIG_OUT_OF_LINE_PFN_TO_PAGE=y CONFIG_NR_CPUS=32 CONFIG_HOTPLUG_CPU=y CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y CONFIG_HPET_TIMER=y CONFIG_HPET_EMULATE_RTC=y CONFIG_IOMMU=y # CONFIG_CALGARY_IOMMU is not set CONFIG_SWIOTLB=y CONFIG_X86_MCE=y CONFIG_X86_MCE_INTEL=y CONFIG_X86_MCE_AMD=y # CONFIG_KEXEC is not set # CONFIG_CRASH_DUMP is not set CONFIG_PHYSICAL_START=0x200000 CONFIG_SECCOMP=y # CONFIG_HZ_100 is not set CONFIG_HZ_250=y # CONFIG_HZ_1000 is not set CONFIG_HZ=250 # CONFIG_REORDER is not set CONFIG_K8_NB=y CONFIG_GENERIC_HARDIRQS=y CONFIG_GENERIC_IRQ_PROBE=y CONFIG_ISA_DMA_API=y CONFIG_GENERIC_PENDING_IRQ=y # # Power management options # CONFIG_PM=y # CONFIG_PM_LEGACY is not set # CONFIG_PM_DEBUG is not set CONFIG_SOFTWARE_SUSPEND=y CONFIG_PM_STD_PARTITION="" 
CONFIG_SUSPEND_SMP=y # # ACPI (Advanced Configuration and Power Interface) Support # CONFIG_ACPI=y CONFIG_ACPI_SLEEP=y CONFIG_ACPI_SLEEP_PROC_FS=y CONFIG_ACPI_SLEEP_PROC_SLEEP=y CONFIG_ACPI_AC=y CONFIG_ACPI_BATTERY=y CONFIG_ACPI_BUTTON=y # CONFIG_ACPI_VIDEO is not set # CONFIG_ACPI_HOTKEY is not set CONFIG_ACPI_FAN=y CONFIG_ACPI_DOCK=y CONFIG_ACPI_PROCESSOR=y CONFIG_ACPI_HOTPLUG_CPU=y CONFIG_ACPI_THERMAL=y CONFIG_ACPI_NUMA=y # CONFIG_ACPI_ASUS is not set # CONFIG_ACPI_IBM is not set CONFIG_ACPI_TOSHIBA=y CONFIG_ACPI_BLACKLIST_YEAR=0 # CONFIG_ACPI_DEBUG is not set CONFIG_ACPI_EC=y CONFIG_ACPI_POWER=y CONFIG_ACPI_SYSTEM=y CONFIG_X86_PM_TIMER=y CONFIG_ACPI_CONTAINER=y CONFIG_ACPI_SBS=m # # CPU Frequency scaling # CONFIG_CPU_FREQ=y CONFIG_CPU_FREQ_TABLE=y # CONFIG_CPU_FREQ_DEBUG is not set CONFIG_CPU_FREQ_STAT=y # CONFIG_CPU_FREQ_STAT_DETAILS is not set CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y # CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set CONFIG_CPU_FREQ_GOV_PERFORMANCE=y # CONFIG_CPU_FREQ_GOV_POWERSAVE is not set CONFIG_CPU_FREQ_GOV_USERSPACE=y CONFIG_CPU_FREQ_GOV_ONDEMAND=y # CONFIG_CPU_FREQ_GOV_CONSERVATIVE is not set # # CPUFreq processor drivers # CONFIG_X86_POWERNOW_K8=y CONFIG_X86_POWERNOW_K8_ACPI=y # CONFIG_X86_SPEEDSTEP_CENTRINO is not set CONFIG_X86_ACPI_CPUFREQ=y # # shared options # CONFIG_X86_ACPI_CPUFREQ_PROC_INTF=y # CONFIG_X86_SPEEDSTEP_LIB is not set # # Bus options (PCI etc.) 
# CONFIG_PCI=y CONFIG_PCI_DIRECT=y CONFIG_PCI_MMCONFIG=y CONFIG_PCIEPORTBUS=y CONFIG_PCI_MSI=y # CONFIG_PCI_DEBUG is not set # # PCCARD (PCMCIA/CardBus) support # # CONFIG_PCCARD is not set # # PCI Hotplug Support # # CONFIG_HOTPLUG_PCI is not set # # Executable file formats / Emulations # CONFIG_BINFMT_ELF=y # CONFIG_BINFMT_MISC is not set CONFIG_IA32_EMULATION=y CONFIG_IA32_AOUT=y CONFIG_COMPAT=y CONFIG_SYSVIPC_COMPAT=y # # Networking # CONFIG_NET=y # # Networking options # # CONFIG_NETDEBUG is not set CONFIG_PACKET=y # CONFIG_PACKET_MMAP is not set CONFIG_UNIX=y # CONFIG_NET_KEY is not set CONFIG_INET=y CONFIG_IP_MULTICAST=y # CONFIG_IP_ADVANCED_ROUTER is not set CONFIG_IP_FIB_HASH=y CONFIG_IP_PNP=y CONFIG_IP_PNP_DHCP=y # CONFIG_IP_PNP_BOOTP is not set # CONFIG_IP_PNP_RARP is not set # CONFIG_NET_IPIP is not set # CONFIG_NET_IPGRE is not set # CONFIG_IP_MROUTE is not set # CONFIG_ARPD is not set # CONFIG_SYN_COOKIES is not set # CONFIG_INET_AH is not set # CONFIG_INET_ESP is not set # CONFIG_INET_IPCOMP is not set # CONFIG_INET_XFRM_TUNNEL is not set # CONFIG_INET_TUNNEL is not set # CONFIG_INET_XFRM_MODE_TRANSPORT is not set # CONFIG_INET_XFRM_MODE_TUNNEL is not set CONFIG_INET_DIAG=y CONFIG_INET_TCP_DIAG=y # CONFIG_TCP_CONG_ADVANCED is not set CONFIG_TCP_CONG_BIC=y CONFIG_IPV6=y # CONFIG_IPV6_PRIVACY is not set # CONFIG_IPV6_ROUTER_PREF is not set # CONFIG_INET6_AH is not set # CONFIG_INET6_ESP is not set # CONFIG_INET6_IPCOMP is not set # CONFIG_INET6_XFRM_TUNNEL is not set # CONFIG_INET6_TUNNEL is not set # CONFIG_INET6_XFRM_MODE_TRANSPORT is not set # CONFIG_INET6_XFRM_MODE_TUNNEL is not set # CONFIG_IPV6_TUNNEL is not set # CONFIG_NETWORK_SECMARK is not set # CONFIG_NETFILTER is not set # # DCCP Configuration (EXPERIMENTAL) # # CONFIG_IP_DCCP is not set # # SCTP Configuration (EXPERIMENTAL) # # CONFIG_IP_SCTP is not set # # TIPC Configuration (EXPERIMENTAL) # # CONFIG_TIPC is not set # CONFIG_ATM is not set # CONFIG_BRIDGE is not set CONFIG_VLAN_8021Q=m # 
CONFIG_DECNET is not set # CONFIG_LLC2 is not set # CONFIG_IPX is not set # CONFIG_ATALK is not set # CONFIG_X25 is not set # CONFIG_LAPB is not set # CONFIG_NET_DIVERT is not set # CONFIG_ECONET is not set # CONFIG_WAN_ROUTER is not set # # QoS and/or fair queueing # # CONFIG_NET_SCHED is not set # # Network testing # # CONFIG_NET_PKTGEN is not set # CONFIG_NET_TCPPROBE is not set # CONFIG_HAMRADIO is not set # CONFIG_IRDA is not set # CONFIG_BT is not set # CONFIG_IEEE80211 is not set # # Device Drivers # # # Generic Driver Options # CONFIG_STANDALONE=y CONFIG_PREVENT_FIRMWARE_BUILD=y CONFIG_FW_LOADER=y # CONFIG_DEBUG_DRIVER is not set # CONFIG_SYS_HYPERVISOR is not set # # Connector - unified userspace <-> kernelspace linker # # CONFIG_CONNECTOR is not set # # Memory Technology Devices (MTD) # # CONFIG_MTD is not set # # Parallel port support # CONFIG_PARPORT=m # CONFIG_PARPORT_PC is not set # CONFIG_PARPORT_GSC is not set # CONFIG_PARPORT_AX88796 is not set # CONFIG_PARPORT_1284 is not set # # Plug and Play support # # CONFIG_PNP is not set # # Block devices # CONFIG_BLK_DEV_FD=y # CONFIG_BLK_CPQ_DA is not set # CONFIG_BLK_CPQ_CISS_DA is not set # CONFIG_BLK_DEV_DAC960 is not set # CONFIG_BLK_DEV_UMEM is not set # CONFIG_BLK_DEV_COW_COMMON is not set CONFIG_BLK_DEV_LOOP=y # CONFIG_BLK_DEV_CRYPTOLOOP is not set # CONFIG_BLK_DEV_NBD is not set # CONFIG_BLK_DEV_SX8 is not set CONFIG_BLK_DEV_RAM=y CONFIG_BLK_DEV_RAM_COUNT=16 CONFIG_BLK_DEV_RAM_SIZE=4096 CONFIG_BLK_DEV_RAM_BLOCKSIZE=1024 CONFIG_BLK_DEV_INITRD=y # CONFIG_CDROM_PKTCDVD is not set # CONFIG_ATA_OVER_ETH is not set # # ATA/ATAPI/MFM/RLL support # CONFIG_IDE=y CONFIG_BLK_DEV_IDE=y # # Please see Documentation/ide.txt for help/info on IDE drives # # CONFIG_BLK_DEV_IDE_SATA is not set # CONFIG_BLK_DEV_HD_IDE is not set CONFIG_BLK_DEV_IDEDISK=y CONFIG_IDEDISK_MULTI_MODE=y CONFIG_BLK_DEV_IDECD=y # CONFIG_BLK_DEV_IDETAPE is not set # CONFIG_BLK_DEV_IDEFLOPPY is not set # CONFIG_BLK_DEV_IDESCSI is not set # 
CONFIG_IDE_TASK_IOCTL is not set # # IDE chipset support/bugfixes # CONFIG_IDE_GENERIC=y # CONFIG_BLK_DEV_CMD640 is not set CONFIG_BLK_DEV_IDEPCI=y # CONFIG_IDEPCI_SHARE_IRQ is not set # CONFIG_BLK_DEV_OFFBOARD is not set # CONFIG_BLK_DEV_GENERIC is not set # CONFIG_BLK_DEV_OPTI621 is not set # CONFIG_BLK_DEV_RZ1000 is not set CONFIG_BLK_DEV_IDEDMA_PCI=y # CONFIG_BLK_DEV_IDEDMA_FORCED is not set CONFIG_IDEDMA_PCI_AUTO=y # CONFIG_IDEDMA_ONLYDISK is not set # CONFIG_BLK_DEV_AEC62XX is not set # CONFIG_BLK_DEV_ALI15X3 is not set CONFIG_BLK_DEV_AMD74XX=y CONFIG_BLK_DEV_ATIIXP=y # CONFIG_BLK_DEV_CMD64X is not set # CONFIG_BLK_DEV_TRIFLEX is not set # CONFIG_BLK_DEV_CY82C693 is not set # CONFIG_BLK_DEV_CS5520 is not set # CONFIG_BLK_DEV_CS5530 is not set # CONFIG_BLK_DEV_HPT34X is not set # CONFIG_BLK_DEV_HPT366 is not set # CONFIG_BLK_DEV_SC1200 is not set CONFIG_BLK_DEV_PIIX=y # CONFIG_BLK_DEV_IT821X is not set # CONFIG_BLK_DEV_NS87415 is not set # CONFIG_BLK_DEV_PDC202XX_OLD is not set CONFIG_BLK_DEV_PDC202XX_NEW=y # CONFIG_BLK_DEV_SVWKS is not set # CONFIG_BLK_DEV_SIIMAGE is not set # CONFIG_BLK_DEV_SIS5513 is not set # CONFIG_BLK_DEV_SLC90E66 is not set # CONFIG_BLK_DEV_TRM290 is not set # CONFIG_BLK_DEV_VIA82CXXX is not set # CONFIG_IDE_ARM is not set CONFIG_BLK_DEV_IDEDMA=y # CONFIG_IDEDMA_IVB is not set CONFIG_IDEDMA_AUTO=y # CONFIG_BLK_DEV_HD is not set # # SCSI device support # # CONFIG_RAID_ATTRS is not set CONFIG_SCSI=m CONFIG_SCSI_PROC_FS=y # # SCSI support type (disk, tape, CD-ROM) # CONFIG_BLK_DEV_SD=m CONFIG_CHR_DEV_ST=m # CONFIG_CHR_DEV_OSST is not set CONFIG_BLK_DEV_SR=m # CONFIG_BLK_DEV_SR_VENDOR is not set CONFIG_CHR_DEV_SG=m # CONFIG_CHR_DEV_SCH is not set # # Some SCSI devices (e.g. 
CD jukebox) support multiple LUNs # # CONFIG_SCSI_MULTI_LUN is not set CONFIG_SCSI_CONSTANTS=y CONFIG_SCSI_LOGGING=y # # SCSI Transport Attributes # CONFIG_SCSI_SPI_ATTRS=m CONFIG_SCSI_FC_ATTRS=m CONFIG_SCSI_ISCSI_ATTRS=m # CONFIG_SCSI_SAS_ATTRS is not set # # SCSI low-level drivers # CONFIG_ISCSI_TCP=m # CONFIG_BLK_DEV_3W_XXXX_RAID is not set # CONFIG_SCSI_3W_9XXX is not set # CONFIG_SCSI_ACARD is not set # CONFIG_SCSI_AACRAID is not set # CONFIG_SCSI_AIC7XXX is not set # CONFIG_SCSI_AIC7XXX_OLD is not set CONFIG_SCSI_AIC79XX=m CONFIG_AIC79XX_CMDS_PER_DEVICE=32 CONFIG_AIC79XX_RESET_DELAY_MS=4000 # CONFIG_AIC79XX_ENABLE_RD_STRM is not set # CONFIG_AIC79XX_DEBUG_ENABLE is not set CONFIG_AIC79XX_DEBUG_MASK=0 # CONFIG_AIC79XX_REG_PRETTY_PRINT is not set # CONFIG_MEGARAID_NEWGEN is not set # CONFIG_MEGARAID_LEGACY is not set # CONFIG_MEGARAID_SAS is not set # CONFIG_SCSI_SATA is not set # CONFIG_SCSI_HPTIOP is not set # CONFIG_SCSI_BUSLOGIC is not set # CONFIG_SCSI_DMX3191D is not set # CONFIG_SCSI_EATA is not set # CONFIG_SCSI_FUTURE_DOMAIN is not set # CONFIG_SCSI_GDTH is not set # CONFIG_SCSI_IPS is not set # CONFIG_SCSI_INITIO is not set # CONFIG_SCSI_INIA100 is not set # CONFIG_SCSI_SYM53C8XX_2 is not set # CONFIG_SCSI_IPR is not set # CONFIG_SCSI_QLOGIC_1280 is not set # CONFIG_SCSI_QLA_FC is not set # CONFIG_SCSI_LPFC is not set # CONFIG_SCSI_DC395x is not set # CONFIG_SCSI_DC390T is not set # CONFIG_SCSI_DEBUG is not set # # Multi-device support (RAID and LVM) # CONFIG_MD=y # CONFIG_BLK_DEV_MD is not set CONFIG_BLK_DEV_DM=y # CONFIG_DM_CRYPT is not set # CONFIG_DM_SNAPSHOT is not set # CONFIG_DM_MIRROR is not set # CONFIG_DM_ZERO is not set # CONFIG_DM_MULTIPATH is not set # # Fusion MPT device support # CONFIG_FUSION=y CONFIG_FUSION_SPI=m # CONFIG_FUSION_FC is not set # CONFIG_FUSION_SAS is not set CONFIG_FUSION_MAX_SGE=128 # CONFIG_FUSION_CTL is not set # # IEEE 1394 (FireWire) support # CONFIG_IEEE1394=y # # Subsystem Options # # CONFIG_IEEE1394_VERBOSEDEBUG 
is not set # CONFIG_IEEE1394_OUI_DB is not set # CONFIG_IEEE1394_EXTRA_CONFIG_ROMS is not set # CONFIG_IEEE1394_EXPORT_FULL_API is not set # # Device Drivers # # CONFIG_IEEE1394_PCILYNX is not set CONFIG_IEEE1394_OHCI1394=y # # Protocol Drivers # # CONFIG_IEEE1394_VIDEO1394 is not set # CONFIG_IEEE1394_SBP2 is not set # CONFIG_IEEE1394_ETH1394 is not set # CONFIG_IEEE1394_DV1394 is not set CONFIG_IEEE1394_RAWIO=y # # I2O device support # # CONFIG_I2O is not set # # Network device support # CONFIG_NETDEVICES=y # CONFIG_DUMMY is not set CONFIG_BONDING=m # CONFIG_EQUALIZER is not set CONFIG_TUN=y # # ARCnet devices # # CONFIG_ARCNET is not set # # PHY device support # # CONFIG_PHYLIB is not set # # Ethernet (10 or 100Mbit) # CONFIG_NET_ETHERNET=y CONFIG_MII=y # CONFIG_HAPPYMEAL is not set # CONFIG_SUNGEM is not set # CONFIG_CASSINI is not set CONFIG_NET_VENDOR_3COM=y CONFIG_VORTEX=y # CONFIG_TYPHOON is not set # # Tulip family network device support # CONFIG_NET_TULIP=y # CONFIG_DE2104X is not set CONFIG_TULIP=y # CONFIG_TULIP_MWI is not set # CONFIG_TULIP_MMIO is not set # CONFIG_TULIP_NAPI is not set # CONFIG_DE4X5 is not set # CONFIG_WINBOND_840 is not set # CONFIG_DM9102 is not set # CONFIG_ULI526X is not set # CONFIG_HP100 is not set CONFIG_NET_PCI=y # CONFIG_PCNET32 is not set # CONFIG_AMD8111_ETH is not set # CONFIG_ADAPTEC_STARFIRE is not set # CONFIG_B44 is not set CONFIG_FORCEDETH=y # CONFIG_DGRS is not set # CONFIG_EEPRO100 is not set CONFIG_E100=m # CONFIG_FEALNX is not set # CONFIG_NATSEMI is not set # CONFIG_NE2K_PCI is not set CONFIG_8139CP=y CONFIG_8139TOO=y # CONFIG_8139TOO_PIO is not set # CONFIG_8139TOO_TUNE_TWISTER is not set # CONFIG_8139TOO_8129 is not set # CONFIG_8139_OLD_RX_RESET is not set # CONFIG_SIS900 is not set # CONFIG_EPIC100 is not set # CONFIG_SUNDANCE is not set # CONFIG_VIA_RHINE is not set # CONFIG_NET_POCKET is not set # # Ethernet (1000 Mbit) # # CONFIG_ACENIC is not set # CONFIG_DL2K is not set CONFIG_E1000=m # 
CONFIG_E1000_NAPI is not set # CONFIG_E1000_DISABLE_PACKET_SPLIT is not set # CONFIG_NS83820 is not set # CONFIG_HAMACHI is not set # CONFIG_YELLOWFIN is not set # CONFIG_R8169 is not set # CONFIG_SIS190 is not set # CONFIG_SKGE is not set # CONFIG_SKY2 is not set # CONFIG_SK98LIN is not set # CONFIG_VIA_VELOCITY is not set CONFIG_TIGON3=y # CONFIG_BNX2 is not set # # Ethernet (10000 Mbit) # # CONFIG_CHELSIO_T1 is not set # CONFIG_IXGB is not set CONFIG_S2IO=m # CONFIG_S2IO_NAPI is not set # CONFIG_MYRI10GE is not set # # Token Ring devices # # CONFIG_TR is not set # # Wireless LAN (non-hamradio) # # CONFIG_NET_RADIO is not set # # Wan interfaces # # CONFIG_WAN is not set # CONFIG_FDDI is not set # CONFIG_HIPPI is not set # CONFIG_PLIP is not set # CONFIG_PPP is not set # CONFIG_SLIP is not set # CONFIG_NET_FC is not set # CONFIG_SHAPER is not set CONFIG_NETCONSOLE=y CONFIG_NETPOLL=y # CONFIG_NETPOLL_RX is not set # CONFIG_NETPOLL_TRAP is not set CONFIG_NET_POLL_CONTROLLER=y # # ISDN subsystem # # CONFIG_ISDN is not set # # Telephony Support # # CONFIG_PHONE is not set # # Input device support # CONFIG_INPUT=y # # Userland interfaces # CONFIG_INPUT_MOUSEDEV=y CONFIG_INPUT_MOUSEDEV_PSAUX=y CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024 CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768 # CONFIG_INPUT_JOYDEV is not set # CONFIG_INPUT_TSDEV is not set CONFIG_INPUT_EVDEV=y # CONFIG_INPUT_EVBUG is not set # # Input Device Drivers # CONFIG_INPUT_KEYBOARD=y CONFIG_KEYBOARD_ATKBD=y # CONFIG_KEYBOARD_SUNKBD is not set # CONFIG_KEYBOARD_LKKBD is not set # CONFIG_KEYBOARD_XTKBD is not set # CONFIG_KEYBOARD_NEWTON is not set CONFIG_INPUT_MOUSE=y CONFIG_MOUSE_PS2=y # CONFIG_MOUSE_SERIAL is not set # CONFIG_MOUSE_VSXXXAA is not set # CONFIG_INPUT_JOYSTICK is not set # CONFIG_INPUT_TOUCHSCREEN is not set # CONFIG_INPUT_MISC is not set # # Hardware I/O ports # CONFIG_SERIO=y CONFIG_SERIO_I8042=y # CONFIG_SERIO_SERPORT is not set # CONFIG_SERIO_CT82C710 is not set # CONFIG_SERIO_PARKBD is not set # 
CONFIG_SERIO_PCIPS2 is not set CONFIG_SERIO_LIBPS2=y # CONFIG_SERIO_RAW is not set # CONFIG_GAMEPORT is not set # # Character devices # CONFIG_VT=y CONFIG_VT_CONSOLE=y CONFIG_HW_CONSOLE=y # CONFIG_VT_HW_CONSOLE_BINDING is not set # CONFIG_SERIAL_NONSTANDARD is not set # # Serial drivers # CONFIG_SERIAL_8250=y CONFIG_SERIAL_8250_CONSOLE=y CONFIG_SERIAL_8250_PCI=y CONFIG_SERIAL_8250_NR_UARTS=4 CONFIG_SERIAL_8250_RUNTIME_UARTS=4 # CONFIG_SERIAL_8250_EXTENDED is not set # # Non-8250 serial port support # CONFIG_SERIAL_CORE=y CONFIG_SERIAL_CORE_CONSOLE=y # CONFIG_SERIAL_JSM is not set CONFIG_UNIX98_PTYS=y CONFIG_LEGACY_PTYS=y CONFIG_LEGACY_PTY_COUNT=256 # CONFIG_PRINTER is not set # CONFIG_PPDEV is not set # CONFIG_TIPAR is not set # # IPMI # # CONFIG_IPMI_HANDLER is not set # # Watchdog Cards # CONFIG_WATCHDOG=y # CONFIG_WATCHDOG_NOWAYOUT is not set # # Watchdog Device Drivers # CONFIG_SOFT_WATCHDOG=y # CONFIG_ACQUIRE_WDT is not set # CONFIG_ADVANTECH_WDT is not set # CONFIG_ALIM1535_WDT is not set # CONFIG_ALIM7101_WDT is not set # CONFIG_SC520_WDT is not set # CONFIG_EUROTECH_WDT is not set # CONFIG_IB700_WDT is not set # CONFIG_IBMASR is not set # CONFIG_WAFER_WDT is not set # CONFIG_I6300ESB_WDT is not set # CONFIG_I8XX_TCO is not set # CONFIG_SC1200_WDT is not set # CONFIG_60XX_WDT is not set # CONFIG_SBC8360_WDT is not set # CONFIG_CPU5_WDT is not set # CONFIG_W83627HF_WDT is not set # CONFIG_W83877F_WDT is not set # CONFIG_W83977F_WDT is not set # CONFIG_MACHZ_WDT is not set # CONFIG_SBC_EPX_C3_WATCHDOG is not set # # PCI-based Watchdog Cards # # CONFIG_PCIPCWATCHDOG is not set # CONFIG_WDTPCI is not set CONFIG_HW_RANDOM=y CONFIG_HW_RANDOM_INTEL=y CONFIG_HW_RANDOM_AMD=y CONFIG_HW_RANDOM_GEODE=y # CONFIG_NVRAM is not set CONFIG_RTC=y # CONFIG_DTLK is not set # CONFIG_R3964 is not set # CONFIG_APPLICOM is not set # # Ftape, the floppy tape device driver # CONFIG_AGP=y CONFIG_AGP_AMD64=y CONFIG_AGP_INTEL=y # CONFIG_AGP_SIS is not set # CONFIG_AGP_VIA is not set # 
CONFIG_DRM is not set # CONFIG_MWAVE is not set # CONFIG_PC8736x_GPIO is not set CONFIG_RAW_DRIVER=y CONFIG_MAX_RAW_DEVS=256 CONFIG_HPET=y # CONFIG_HPET_RTC_IRQ is not set CONFIG_HPET_MMAP=y # CONFIG_HANGCHECK_TIMER is not set # # TPM devices # # CONFIG_TCG_TPM is not set # CONFIG_TELCLOCK is not set # # I2C support # CONFIG_I2C=m CONFIG_I2C_CHARDEV=m # # I2C Algorithms # # CONFIG_I2C_ALGOBIT is not set # CONFIG_I2C_ALGOPCF is not set # CONFIG_I2C_ALGOPCA is not set # # I2C Hardware Bus support # # CONFIG_I2C_ALI1535 is not set # CONFIG_I2C_ALI1563 is not set # CONFIG_I2C_ALI15X3 is not set CONFIG_I2C_AMD756=m CONFIG_I2C_AMD756_S4882=m CONFIG_I2C_AMD8111=m # CONFIG_I2C_I801 is not set # CONFIG_I2C_I810 is not set # CONFIG_I2C_PIIX4 is not set CONFIG_I2C_ISA=m # CONFIG_I2C_NFORCE2 is not set # CONFIG_I2C_OCORES is not set # CONFIG_I2C_PARPORT is not set # CONFIG_I2C_PARPORT_LIGHT is not set # CONFIG_I2C_PROSAVAGE is not set # CONFIG_I2C_SAVAGE4 is not set # CONFIG_I2C_SIS5595 is not set # CONFIG_I2C_SIS630 is not set # CONFIG_I2C_SIS96X is not set # CONFIG_I2C_STUB is not set # CONFIG_I2C_VIA is not set # CONFIG_I2C_VIAPRO is not set # CONFIG_I2C_VOODOO3 is not set # CONFIG_I2C_PCA_ISA is not set # # Miscellaneous I2C Chip support # # CONFIG_SENSORS_DS1337 is not set # CONFIG_SENSORS_DS1374 is not set # CONFIG_SENSORS_EEPROM is not set # CONFIG_SENSORS_PCF8574 is not set # CONFIG_SENSORS_PCA9539 is not set # CONFIG_SENSORS_PCF8591 is not set # CONFIG_SENSORS_MAX6875 is not set # CONFIG_I2C_DEBUG_CORE is not set # CONFIG_I2C_DEBUG_ALGO is not set # CONFIG_I2C_DEBUG_BUS is not set # CONFIG_I2C_DEBUG_CHIP is not set # # SPI support # # CONFIG_SPI is not set # CONFIG_SPI_MASTER is not set # # Dallas's 1-wire bus # # # Hardware Monitoring support # CONFIG_HWMON=y # CONFIG_HWMON_VID is not set # CONFIG_SENSORS_ABITUGURU is not set # CONFIG_SENSORS_ADM1021 is not set # CONFIG_SENSORS_ADM1025 is not set # CONFIG_SENSORS_ADM1026 is not set # CONFIG_SENSORS_ADM1031 is not set 
# CONFIG_SENSORS_ADM9240 is not set # CONFIG_SENSORS_ASB100 is not set # CONFIG_SENSORS_ATXP1 is not set # CONFIG_SENSORS_DS1621 is not set # CONFIG_SENSORS_F71805F is not set # CONFIG_SENSORS_FSCHER is not set # CONFIG_SENSORS_FSCPOS is not set # CONFIG_SENSORS_GL518SM is not set # CONFIG_SENSORS_GL520SM is not set # CONFIG_SENSORS_IT87 is not set # CONFIG_SENSORS_LM63 is not set # CONFIG_SENSORS_LM75 is not set # CONFIG_SENSORS_LM77 is not set # CONFIG_SENSORS_LM78 is not set # CONFIG_SENSORS_LM80 is not set # CONFIG_SENSORS_LM83 is not set # CONFIG_SENSORS_LM85 is not set # CONFIG_SENSORS_LM87 is not set # CONFIG_SENSORS_LM90 is not set # CONFIG_SENSORS_LM92 is not set # CONFIG_SENSORS_MAX1619 is not set # CONFIG_SENSORS_PC87360 is not set # CONFIG_SENSORS_SIS5595 is not set # CONFIG_SENSORS_SMSC47M1 is not set # CONFIG_SENSORS_SMSC47M192 is not set CONFIG_SENSORS_SMSC47B397=m # CONFIG_SENSORS_VIA686A is not set # CONFIG_SENSORS_VT8231 is not set # CONFIG_SENSORS_W83781D is not set # CONFIG_SENSORS_W83791D is not set # CONFIG_SENSORS_W83792D is not set # CONFIG_SENSORS_W83L785TS is not set # CONFIG_SENSORS_W83627HF is not set # CONFIG_SENSORS_W83627EHF is not set # CONFIG_SENSORS_HDAPS is not set # CONFIG_HWMON_DEBUG_CHIP is not set # # Misc devices # # CONFIG_IBM_ASM is not set # # Multimedia devices # # CONFIG_VIDEO_DEV is not set CONFIG_VIDEO_V4L2=y # # Digital Video Broadcasting Devices # # CONFIG_DVB is not set # # Graphics support # CONFIG_FIRMWARE_EDID=y # CONFIG_FB is not set # # Console display driver support # CONFIG_VGA_CONSOLE=y CONFIG_VGACON_SOFT_SCROLLBACK=y CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=256 CONFIG_VIDEO_SELECT=y CONFIG_DUMMY_CONSOLE=y # # Sound # CONFIG_SOUND=y # # Advanced Linux Sound Architecture # # CONFIG_SND is not set # # Open Sound System # CONFIG_SOUND_PRIME=y # CONFIG_SOUND_BT878 is not set # CONFIG_SOUND_EMU10K1 is not set # CONFIG_SOUND_FUSION is not set # CONFIG_SOUND_ES1371 is not set CONFIG_SOUND_ICH=y # CONFIG_SOUND_TRIDENT is 
not set # CONFIG_SOUND_MSNDCLAS is not set # CONFIG_SOUND_MSNDPIN is not set # CONFIG_SOUND_VIA82CXXX is not set # CONFIG_SOUND_OSS is not set # CONFIG_SOUND_TVMIXER is not set # # USB support # CONFIG_USB_ARCH_HAS_HCD=y CONFIG_USB_ARCH_HAS_OHCI=y CONFIG_USB_ARCH_HAS_EHCI=y # CONFIG_USB is not set # # NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' # # # USB Gadget Support # # CONFIG_USB_GADGET is not set # # MMC/SD Card support # # CONFIG_MMC is not set # # LED devices # # CONFIG_NEW_LEDS is not set # # LED drivers # # # LED Triggers # # # InfiniBand support # CONFIG_INFINIBAND=m CONFIG_INFINIBAND_USER_MAD=m CONFIG_INFINIBAND_USER_ACCESS=m CONFIG_INFINIBAND_ADDR_TRANS=y CONFIG_INFINIBAND_MTHCA=m CONFIG_INFINIBAND_MTHCA_DEBUG=y CONFIG_IPATH_CORE=m CONFIG_INFINIBAND_IPATH=m CONFIG_INFINIBAND_IPOIB=m CONFIG_INFINIBAND_IPOIB_DEBUG=y # CONFIG_INFINIBAND_IPOIB_DEBUG_DATA is not set CONFIG_INFINIBAND_SRP=m CONFIG_INFINIBAND_ISER=m # # EDAC - error detection and reporting (RAS) (EXPERIMENTAL) # # CONFIG_EDAC is not set # # Real Time Clock # # CONFIG_RTC_CLASS is not set # # DMA Engine support # # CONFIG_DMA_ENGINE is not set # # DMA Clients # # # DMA Devices # # # Firmware Drivers # # CONFIG_EDD is not set # CONFIG_DELL_RBU is not set # CONFIG_DCDBAS is not set # # File systems # CONFIG_EXT2_FS=y CONFIG_EXT2_FS_XATTR=y CONFIG_EXT2_FS_POSIX_ACL=y # CONFIG_EXT2_FS_SECURITY is not set # CONFIG_EXT2_FS_XIP is not set CONFIG_EXT3_FS=y CONFIG_EXT3_FS_XATTR=y CONFIG_EXT3_FS_POSIX_ACL=y # CONFIG_EXT3_FS_SECURITY is not set CONFIG_JBD=y # CONFIG_JBD_DEBUG is not set CONFIG_FS_MBCACHE=y CONFIG_REISERFS_FS=y # CONFIG_REISERFS_CHECK is not set # CONFIG_REISERFS_PROC_INFO is not set CONFIG_REISERFS_FS_XATTR=y CONFIG_REISERFS_FS_POSIX_ACL=y # CONFIG_REISERFS_FS_SECURITY is not set CONFIG_JFS_FS=m # CONFIG_JFS_POSIX_ACL is not set # CONFIG_JFS_SECURITY is not set # CONFIG_JFS_DEBUG is not set # CONFIG_JFS_STATISTICS is not set CONFIG_FS_POSIX_ACL=y CONFIG_XFS_FS=m # 
CONFIG_XFS_QUOTA is not set # CONFIG_XFS_SECURITY is not set # CONFIG_XFS_POSIX_ACL is not set # CONFIG_XFS_RT is not set CONFIG_OCFS2_FS=m CONFIG_OCFS2_DEBUG_MASKLOG=y # CONFIG_MINIX_FS is not set # CONFIG_ROMFS_FS is not set CONFIG_INOTIFY=y CONFIG_INOTIFY_USER=y # CONFIG_QUOTA is not set CONFIG_DNOTIFY=y CONFIG_AUTOFS_FS=m CONFIG_AUTOFS4_FS=m # CONFIG_FUSE_FS is not set # # CD-ROM/DVD Filesystems # CONFIG_ISO9660_FS=y CONFIG_JOLIET=y # CONFIG_ZISOFS is not set # CONFIG_UDF_FS is not set # # DOS/FAT/NT Filesystems # CONFIG_FAT_FS=y CONFIG_MSDOS_FS=y CONFIG_VFAT_FS=y CONFIG_FAT_DEFAULT_CODEPAGE=437 CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1" # CONFIG_NTFS_FS is not set # # Pseudo filesystems # CONFIG_PROC_FS=y CONFIG_PROC_KCORE=y CONFIG_SYSFS=y CONFIG_TMPFS=y CONFIG_HUGETLBFS=y CONFIG_HUGETLB_PAGE=y CONFIG_RAMFS=y CONFIG_CONFIGFS_FS=m # # Miscellaneous filesystems # # CONFIG_ADFS_FS is not set # CONFIG_AFFS_FS is not set # CONFIG_HFS_FS is not set # CONFIG_HFSPLUS_FS is not set # CONFIG_BEFS_FS is not set # CONFIG_BFS_FS is not set # CONFIG_EFS_FS is not set # CONFIG_CRAMFS is not set # CONFIG_VXFS_FS is not set # CONFIG_HPFS_FS is not set # CONFIG_QNX4FS_FS is not set # CONFIG_SYSV_FS is not set # CONFIG_UFS_FS is not set # # Network File Systems # CONFIG_NFS_FS=m CONFIG_NFS_V3=y # CONFIG_NFS_V3_ACL is not set # CONFIG_NFS_V4 is not set # CONFIG_NFS_DIRECTIO is not set CONFIG_NFSD=m CONFIG_NFSD_V3=y # CONFIG_NFSD_V3_ACL is not set # CONFIG_NFSD_V4 is not set CONFIG_NFSD_TCP=y CONFIG_LOCKD=m CONFIG_LOCKD_V4=y CONFIG_EXPORTFS=m CONFIG_NFS_COMMON=y CONFIG_SUNRPC=m # CONFIG_RPCSEC_GSS_KRB5 is not set # CONFIG_RPCSEC_GSS_SPKM3 is not set # CONFIG_SMB_FS is not set # CONFIG_CIFS is not set # CONFIG_NCP_FS is not set # CONFIG_CODA_FS is not set # CONFIG_AFS_FS is not set # CONFIG_9P_FS is not set # # Partition Types # # CONFIG_PARTITION_ADVANCED is not set CONFIG_MSDOS_PARTITION=y # # Native Language Support # CONFIG_NLS=y CONFIG_NLS_DEFAULT="iso8859-1" 
CONFIG_NLS_CODEPAGE_437=y # CONFIG_NLS_CODEPAGE_737 is not set # CONFIG_NLS_CODEPAGE_775 is not set # CONFIG_NLS_CODEPAGE_850 is not set # CONFIG_NLS_CODEPAGE_852 is not set # CONFIG_NLS_CODEPAGE_855 is not set # CONFIG_NLS_CODEPAGE_857 is not set # CONFIG_NLS_CODEPAGE_860 is not set # CONFIG_NLS_CODEPAGE_861 is not set # CONFIG_NLS_CODEPAGE_862 is not set # CONFIG_NLS_CODEPAGE_863 is not set # CONFIG_NLS_CODEPAGE_864 is not set # CONFIG_NLS_CODEPAGE_865 is not set # CONFIG_NLS_CODEPAGE_866 is not set # CONFIG_NLS_CODEPAGE_869 is not set # CONFIG_NLS_CODEPAGE_936 is not set # CONFIG_NLS_CODEPAGE_950 is not set # CONFIG_NLS_CODEPAGE_932 is not set # CONFIG_NLS_CODEPAGE_949 is not set # CONFIG_NLS_CODEPAGE_874 is not set # CONFIG_NLS_ISO8859_8 is not set # CONFIG_NLS_CODEPAGE_1250 is not set # CONFIG_NLS_CODEPAGE_1251 is not set CONFIG_NLS_ASCII=y CONFIG_NLS_ISO8859_1=y # CONFIG_NLS_ISO8859_2 is not set # CONFIG_NLS_ISO8859_3 is not set # CONFIG_NLS_ISO8859_4 is not set # CONFIG_NLS_ISO8859_5 is not set # CONFIG_NLS_ISO8859_6 is not set # CONFIG_NLS_ISO8859_7 is not set # CONFIG_NLS_ISO8859_9 is not set # CONFIG_NLS_ISO8859_13 is not set # CONFIG_NLS_ISO8859_14 is not set CONFIG_NLS_ISO8859_15=y # CONFIG_NLS_KOI8_R is not set # CONFIG_NLS_KOI8_U is not set CONFIG_NLS_UTF8=y # # Instrumentation Support # CONFIG_PROFILING=y CONFIG_OPROFILE=y CONFIG_KPROBES=y # # Kernel hacking # CONFIG_TRACE_IRQFLAGS_SUPPORT=y # CONFIG_PRINTK_TIME is not set CONFIG_MAGIC_SYSRQ=y CONFIG_UNUSED_SYMBOLS=y CONFIG_DEBUG_KERNEL=y CONFIG_LOG_BUF_SHIFT=18 CONFIG_DETECT_SOFTLOCKUP=y # CONFIG_SCHEDSTATS is not set # CONFIG_DEBUG_SLAB is not set # CONFIG_DEBUG_RT_MUTEXES is not set # CONFIG_RT_MUTEX_TESTER is not set # CONFIG_DEBUG_SPINLOCK is not set # CONFIG_DEBUG_MUTEXES is not set # CONFIG_DEBUG_RWSEMS is not set # CONFIG_DEBUG_LOCK_ALLOC is not set # CONFIG_PROVE_LOCKING is not set # CONFIG_DEBUG_SPINLOCK_SLEEP is not set # CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set # CONFIG_DEBUG_KOBJECT 
is not set # CONFIG_DEBUG_INFO is not set CONFIG_DEBUG_FS=y # CONFIG_DEBUG_VM is not set # CONFIG_FRAME_POINTER is not set CONFIG_UNWIND_INFO=y CONFIG_STACK_UNWIND=y # CONFIG_FORCED_INLINING is not set # CONFIG_RCU_TORTURE_TEST is not set # CONFIG_DEBUG_RODATA is not set # CONFIG_IOMMU_DEBUG is not set # CONFIG_DEBUG_STACKOVERFLOW is not set # CONFIG_DEBUG_STACK_USAGE is not set # # Security options # # CONFIG_KEYS is not set # CONFIG_SECURITY is not set # # Cryptographic options # CONFIG_CRYPTO=y # CONFIG_CRYPTO_HMAC is not set # CONFIG_CRYPTO_NULL is not set # CONFIG_CRYPTO_MD4 is not set CONFIG_CRYPTO_MD5=m # CONFIG_CRYPTO_SHA1 is not set # CONFIG_CRYPTO_SHA256 is not set # CONFIG_CRYPTO_SHA512 is not set # CONFIG_CRYPTO_WP512 is not set # CONFIG_CRYPTO_TGR192 is not set # CONFIG_CRYPTO_DES is not set # CONFIG_CRYPTO_BLOWFISH is not set # CONFIG_CRYPTO_TWOFISH is not set # CONFIG_CRYPTO_SERPENT is not set # CONFIG_CRYPTO_AES is not set # CONFIG_CRYPTO_AES_X86_64 is not set # CONFIG_CRYPTO_CAST5 is not set # CONFIG_CRYPTO_CAST6 is not set # CONFIG_CRYPTO_TEA is not set # CONFIG_CRYPTO_ARC4 is not set # CONFIG_CRYPTO_KHAZAD is not set # CONFIG_CRYPTO_ANUBIS is not set # CONFIG_CRYPTO_DEFLATE is not set # CONFIG_CRYPTO_MICHAEL_MIC is not set CONFIG_CRYPTO_CRC32C=m # CONFIG_CRYPTO_TEST is not set # # Hardware crypto devices # # # Library routines # # CONFIG_CRC_CCITT is not set # CONFIG_CRC16 is not set CONFIG_CRC32=y CONFIG_LIBCRC32C=m CONFIG_PLIST=y From bugzilla-daemon at openib.org Wed Jul 26 07:42:23 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Wed, 26 Jul 2006 07:42:23 -0700 (PDT) Subject: [openib-general] [Bug 181] New: HPL test always failed Message-ID: <20060726144223.6F3B42283D4@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=181 Summary: HPL test always failed Product: OpenFabrics Windows Version: unspecified Platform: X86-64 OS/Version: Other Status: NEW Severity: blocker Priority: P2 Component: WSD 
AssignedTo: bugzilla at openib.org ReportedBy: evgeniyge at voltaire.com CC: evgeniyge at voltaire.com We tried to run without RDMA read,and with low level driver HCA MT25208. command line: mpiexec -hosts 4 hostname1 2 hostname2 2..... hpl.exe example of error msg: job aborted: rank: node: exit code: message 0: parker6: terminated 1: parker6: terminated 2: parker7: terminated 3: parker7: terminated 4: parker8: fatal error: Fatal error in MPI_Send: Internal MPI error!, error stack: MPI_Send(172)...................: MPI_Send(buf=0x0000000002B1A9B8, count=17820, MPI_DOUBLE, dest=3, tag=1001, comm=0x84000002) failed MPIDI_CH3I_Progress(165)........: handle_sock_op failed handle_new_message_read(422)....: MPIDI_CH3U_Handle_recv_pkt(1359): received unknown packet type (type=1071575908) 5: parker8: terminated 6: parker9: terminated 7: parker9: terminated ---- error analysis ----- 4: mpi has detected a fatal error and aborted hpl.exe run on parker8 ---- error analysis ----- example of HPL.dat --------------------------------------HPL.dat------------------------------- HPLinpack benchmark input file Innovative Computing Laboratory, University of Tennessee HPL.out output file name (if any) 6 device out (6=stdout,7=stderr,file) 4 # of problems sizes (N) 5100 3000 3400 3500 Ns 4 # of NBs 100 97 95 90 NBs 0 PMAP process mapping (0=Row-,1=Column-major) 3 # of process grids (P x Q) 2 4 4 Ps 4 2 2 Qs 16.0 threshold 3 # of panel fact 0 1 2 PFACTs (0=left, 1=Crout, 2=Right) 2 # of recursive stopping criterium 2 4 NBMINs (>= 1) 1 # of panels in recursion 2 NDIVs 3 # of recursive panel fact. 
0 1 2 RFACTs (0=left, 1=Crout, 2=Right) 1 # of broadcast 0 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM) 1 # of lookahead depth 0 DEPTHs (>=0) 2 SWAP (0=bin-exch,1=long,2=mix) 64 swapping threshold 0 L1 in (0=transposed,1=no-transposed) form 0 U in (0=transposed,1=no-transposed) form 1 Equilibration (0=no,1=yes) 8 memory alignment in double (> 0) ------------------------------------------------------------------------------- ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From tziporet at mellanox.co.il Wed Jul 26 07:42:19 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Wed, 26 Jul 2006 17:42:19 +0300 Subject: [openib-general] new user level branch for OFED 1.1 Message-ID: <44C77F4B.4070207@mellanox.co.il> Hi All, Toward the OFED 1.1 release I have created the 1.1 branch: https://openib.org/svn/gen2/branches/1.1/ This branch includes src/userspace/ based on trunk r8680, and all the other OFED stuff. Tziporet From rdreier at cisco.com Wed Jul 26 08:06:09 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 26 Jul 2006 08:06:09 -0700 Subject: [openib-general] controlling IPoIB debug In-Reply-To: (Or Gerlitz's message of "Wed, 26 Jul 2006 17:26:49 +0300 (IDT)") References: Message-ID: > I can't disable CONFIG_INFINIBAND_IPOIB_DEBUG, that is i was > expecting to be able to press "n" on the "IP-over-InfiniBand debugging" > submenu of "IP-over-InfiniBand" and it does not have any impact. Debugging is forced to be on unless you set EMBEDDED=y. This is so that everyone will ship modules with debugging enabled, so that when someone has a problem we can actually debug it. - R. 
From afriedle at open-mpi.org Wed Jul 26 08:46:22 2006 From: afriedle at open-mpi.org (Andrew Friedley) Date: Wed, 26 Jul 2006 08:46:22 -0700 Subject: [openib-general] FW: [PATCH] [RFC] librdmacm: expose device list to users In-Reply-To: <000001c6b054$77873950$13cd180a@amr.corp.intel.com> References: <000001c6b054$77873950$13cd180a@amr.corp.intel.com> Message-ID: <44C78E4E.6060103@open-mpi.org> Sean Hefty wrote: > I was trying to ask if there was any way for the processes to generate unique > addresses. For example, what TCP port number do the processes listen on when > establishing their out of band connections? Is there some way that you can map > the addresses that are used for out of band communication to a multicast IP > address, such that the processes get unique addresses? From reading down into > your mail, it doesn't sound like this would help much. Not without breaking many layers of abstraction.. although TCP is all we support for OOB right now, the framework is in place for supporting other (non-TCP/IP) protocols in the future. I'm asking some of our runtime developers if there's anything I could use.. doesn't look like it right now. > I think the same basic API can be exposed in userspace. It may be possible to > expose a couple of extra helper functions to simplify creating and joining a > group, but I'm not sure if they will be worth it. The existing interface seems reasonable - I don't see how adding extra functions would improve anything. > This doesn't end up working well for userspace apps. To get a callback, the > library ends up needing to create a thread to poll for events from the kernel. > It makes more sense to give the application control over the threading, and let > it poll for the events. I figured you would say that. So this would be a separate polling interface from a CQ or what the RDMA CM provides? > Well, after looking at the code, an MGID of 0 doesn't currently work. The > implementation doesn't handle it. 
I worked on a design to add support for MGID > 0 to the multicast module, and will start on it in the next day or so. Okay, I look forward to seeing the patch. > Another thought I had is to allow ib_get_mcmember_rec() be called with an MGID > of 0. Doing so would return an MCMemberRecord with reasonable default values > that could be used when creating a group. (The returned values would either be > hard-coded or copy those from the first join on a given port, if one had > occurred. In almost all cases, the first join would come from ipoib.) This would be very good - it would allow for adjusting such values before the group is actually joined. I see a possible race condition though - consider two processes calling ib_get_mcmember_rec(). Both of them return from this before either can call ib_join_multicast() and create the multicast group. Is it possible for the same MGID to be returned from ib_get_mcmember_rec() in this scenario? > There is no way to do this. Note that there may be a delay between a node > joining a group and the programming of the switch tables. Thought I'd try. Are you saying that just because a join has completed, that doesn't imply the network is fully ready for handling multicast messages for that group? Andrew From mst at mellanox.co.il Wed Jul 26 08:57:51 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 26 Jul 2006 18:57:51 +0300 Subject: [openib-general] [PATCH] IB/mthca: fix mthca_array_clear thinko Message-ID: <20060726155750.GF9411@mellanox.co.il> Discovered by Ali Ayoub: mthca_array_clear does not clear the slot if the used count is positive. This leads to crashes in mthca_qp_event since that uses mthca_array_get to check that the qp is valid. Signed-off-by: Michael S. 
Tsirkin diff --git a/drivers/infiniband/hw/mthca/mthca_allocator.c b/drivers/infiniband/hw/mthca/mthca_allocator.c index 9ba3211..848e583 100644 --- a/drivers/infiniband/hw/mthca/mthca_allocator.c +++ b/drivers/infiniband/hw/mthca/mthca_allocator.c @@ -144,7 +144,9 @@ void mthca_array_clear(struct mthca_arra if (--array->page_list[p].used == 0) { free_page((unsigned long) array->page_list[p].page); array->page_list[p].page = NULL; - } + } else + array->page_list[p].page[index & (PAGE_SIZE / + sizeof (void *) - 1)] = NULL; if (array->page_list[p].used < 0) pr_debug("Array %p index %d page %d with ref count %d < 0\n", -- MST From swise at opengridcomputing.com Wed Jul 26 09:15:43 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 26 Jul 2006 11:15:43 -0500 Subject: [openib-general] [PATCH Round 4 2/3] Core network changes to support network event notification. In-Reply-To: <20060726033951.GB3203@gondor.apana.org.au> References: <1153839940.14354.37.camel@stevo-desktop> <20060726033951.GB3203@gondor.apana.org.au> Message-ID: <1153930543.8155.22.camel@stevo-desktop> On Wed, 2006-07-26 at 13:39 +1000, Herbert Xu wrote: > On Tue, Jul 25, 2006 at 10:05:40AM -0500, Steve Wise wrote: > > > > But they really are seeing a delete followed by an add. That's what the > > kernel is doing. > > Actually that's the other thing I don't really like. The user-space > monitor may perceive that a route was actually deleted and replaced > by a new one even though this isn't what's happening at all. > > In fact the problem here is that you're sending route notifications > when it's really the dst_entry that's changing. User-space as it > stands only get notifications about fib changes which is quite different > from changes to the transient dst_entry objects which only exist in the > route cache. > > Is anyone actually going to use the user-space interface of this? If not > perhaps we should wait until someone really needs it before adding the > netlink part of the patch. 
> > We can change the kernel interface at will so if we make a mistake with > netevent it can be easily corrected. For user-space though the rules > are totally different. I'd really hate to be stuck with an interface > which turns out to not be the one that people actually want to have. > The user interface is not needed for the rdma users. They are all in kernel. I added this at the request of reviewers of this patch. I have no problem at all deferring the rtnetlink integration until someone really needs it. > > The rdma driver needs to update all established rdma connections that > > are using the next-hop information of the existing route and make them > > use the next-hop information of the new route. In addition, the rdma > > driver might have a reference to the old dst entry. So it can release > > that ref and add a ref to the new dst entry. > > Do you really need the old route for the user-space part of your patch? > Not if we remove the user-space parts. :-) > > I have to admit I'm a little fuzzy on the routing stuff. The main > > netevents I've utilized in the rdma driver I'm writing is the > > neighbour update event and the redirect event. Route add/del was added > > for completeness of "routing" netevents. > > So you mean you aren't going to use the route notifications? In that case > we should probably just drop them and add them when someone actually needs > it. At that point they can tell us what semantics they want from it :) > This is fine by me too! The key events needed for rdma are: neighbour update events rtredirect events pmtu change events > > Can you expand further or point me to code where the IP stack "flushes > > its tables" when routes are changed? > > Grep for rt_cache_flush in net/ipv4/fib_hash.c. > thanks. Dave, what do you think about removing the user-space stuff for the first round of integration? IE: Just add netevents and kernel hooks to generate them. Steve. 
From gregkh at suse.de Wed Jul 26 09:20:07 2006 From: gregkh at suse.de (Greg KH) Date: Wed, 26 Jul 2006 09:20:07 -0700 Subject: [openib-general] restore missing PCI registers after reset In-Reply-To: <20060726102944.GA9411@mellanox.co.il> References: <20060717162531.GC4829@kroah.com> <20060726102944.GA9411@mellanox.co.il> Message-ID: <20060726162007.GA9871@suse.de> On Wed, Jul 26, 2006 at 01:29:44PM +0300, Michael S. Tsirkin wrote: > Quoting r. Greg KH : > > Subject: [patch 02/45] IB/mthca: restore missing PCI registers after reset > > ------------------ > > mthca does not restore the following PCI-X/PCI Express registers after reset: > > PCI-X device: PCI-X command register > > PCI-X bridge: upstream and downstream split transaction registers > > PCI Express : PCI Express device control and link control registers > > > > This causes instability and/or bad performance on systems where one of > > these registers is set to a non-default value by BIOS. > > > > Signed-off-by: Michael S. Tsirkin > > Signed-off-by: Chris Wright > > Signed-off-by: Greg Kroah-Hartman > > By the way, Greg, this code is completely generic, and the same seems to apply > to all PCI-X/PCI-Express devices - should not pci_restore_state and > friends really know about these registers, as well? > > What do you think? I think pci_restore_state() already restores the msi and msix state, take a look at the latest kernel version :) thanks, greg k-h From mst at mellanox.co.il Wed Jul 26 09:32:26 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 26 Jul 2006 19:32:26 +0300 Subject: [openib-general] restore missing PCI registers after reset In-Reply-To: <20060726162007.GA9871@suse.de> References: <20060726162007.GA9871@suse.de> Message-ID: <20060726163226.GG9411@mellanox.co.il> Quoting r. 
Greg KH : > I think pci_restore_state() already restores the msi and msix state, > take a look at the latest kernel version :) Yes, I know :) but I am not talking about MSI/MSI-X, I am talking about the following: > > > PCI-X device: PCI-X command register > > > PCI-X bridge: upstream and downstream split transaction registers > > > PCI Express : PCI Express device control and link control registers These register values include the maximum MTU for PCI Express and other vital data. -- MST From mshefty at ichips.intel.com Wed Jul 26 09:35:43 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 26 Jul 2006 09:35:43 -0700 Subject: [openib-general] FW: [PATCH] [RFC] librdmacm: expose device list to users In-Reply-To: <44C78E4E.6060103@open-mpi.org> References: <000001c6b054$77873950$13cd180a@amr.corp.intel.com> <44C78E4E.6060103@open-mpi.org> Message-ID: <44C799DF.7070200@ichips.intel.com> Andrew Friedley wrote: > I figured you would say that. So this would be a separate polling > interface from a CQ or what the RDMA CM provides? Yes. This is one of the issues that I have with the userspace implementation. For a raw IB interface, users can end up needing a half-dozen libraries, each with their own event interface. > I see a possible race condition though - consider two processes calling > ib_get_mcmember_rec(). Both of them return from this before either can > call ib_join_multicast() and create the multicast group. Is it possible > for the same MGID to be returned from ib_get_mcmember_rec() in this > scenario? I probably wasn't being clear. ib_get_mcmember_rec() would return an MCMemberRecord with MGID 0, since that was what was requested. Other default parameters needed to create the group would be filled in. The actual MGID has to come from the SA through the join call. > Thought I'd try. Are you saying that just because a join has completed, > that doesn't imply the network is fully ready for handling multicast > messages for that group? 
I believe that is the case. - Sean From mshefty at ichips.intel.com Wed Jul 26 09:40:41 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 26 Jul 2006 09:40:41 -0700 Subject: [openib-general] [PATCH 3/4] IB CM: register and handle COMM_EST events on a QP In-Reply-To: <44C701DA.9030605@voltaire.com> References: <000101c6aee3$951b95a0$87f9070a@amr.corp.intel.com> <44C4FD69.9040005@ichips.intel.com> <44C62643.6030801@voltaire.com> <44C655EF.4010804@ichips.intel.com> <44C701DA.9030605@voltaire.com> Message-ID: <44C79B09.2050209@ichips.intel.com> Or Gerlitz wrote: > Generally, i guess you need to insert the local QPN into the rb_tree > ***before*** sending the REP not after it. That is what the patch does. > Can you state what is the usage being done with the local QPNs in the > timeout on REQ flow? I don't quite follow what you're asking here. Local QPNs are tracked when a REQ is sent. If a second REQ is sent using the same QPN, it will fail with an address in use error. The local QPN is not removed from the table until the connection fails, or we exit the timewait state after being disconnected. - Sean From vuhuong at mellanox.com Wed Jul 26 16:40:00 2006 From: vuhuong at mellanox.com (Vu Pham) Date: Wed, 26 Jul 2006 16:40:00 -0700 Subject: [openib-general] [PATCH] RFC: srp filesystem data corruption problem/work-around In-Reply-To: <20060726092843.GR8795@mellanox.co.il> References: <44C6A105.3050902@mellanox.com> <20060726092843.GR8795@mellanox.co.il> Message-ID: <44C7FD50.4030502@mellanox.com> Michael S. Tsirkin wrote: > Quoting r. Vu Pham : >>> Right now this workaround affects all targets unconditionally. >>> >> Can we rework the patch to have mellanox_workarounds=0 by >> default? > > Hmm ... since this is a data corruption issue, seems to me the safe > setting should be the default one. No? 
> As Roland pointed out, "Right now this workaround affects all targets unconditionally". We can set mellanox_workarounds=0 by default to avoid affecting other targets. Whoever tests with a Mellanox target can pass mellanox_workarounds=1 at module load time. From mst at mellanox.co.il Wed Jul 26 09:43:50 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 26 Jul 2006 19:43:50 +0300 Subject: [openib-general] [PATCH] RFC: srp filesystem data corruption problem/work-around In-Reply-To: <44C7FD50.4030502@mellanox.com> References: <44C7FD50.4030502@mellanox.com> Message-ID: <20060726164349.GH9411@mellanox.co.il> Quoting r. Vu Pham : > Subject: Re: [PATCH] RFC: srp filesystem data corruption problem/work-around > > Michael S. Tsirkin wrote: > > Quoting r. Vu Pham : > >>> Right now this workaround affects all targets unconditionally. > >>> > >> Can we rework the patch to have mellanox_workarounds=0 by > >> default? > > > > Hmm ... since this is a data corruption issue, seems to me the safe > > setting should be the default one. No? > > > > As Roland pointed out "Right now this workaround affects all targets > unconditionally" I'll fix that up to check the OUI. -- MST From gregkh at suse.de Wed Jul 26 09:42:46 2006 From: gregkh at suse.de (Greg KH) Date: Wed, 26 Jul 2006 09:42:46 -0700 Subject: [openib-general] restore missing PCI registers after reset In-Reply-To: <20060726163226.GG9411@mellanox.co.il> References: <20060726162007.GA9871@suse.de> <20060726163226.GG9411@mellanox.co.il> Message-ID: <20060726164246.GE9871@suse.de> On Wed, Jul 26, 2006 at 07:32:26PM +0300, Michael S. Tsirkin wrote: > Quoting r. 
Greg KH : > > I think pci_restore_state() already restores the msi and msix state, > > take a look at the latest kernel version :) > > Yes, I know :) > but I am not talking about MSI/MSI-X, I am talking about the following: > > > > PCI-X device: PCI-X command register > > > > PCI-X bridge: upstream and downstream split transaction registers > > > > PCI Express : PCI Express device control and link control registers > > These register values include the maximum MTU for PCI Express and other vital > data. Make up a patch that shows how you would save these in a generic way and we can discuss it. I know people have talked about saving the extended PCI config space for devices that need it, so that might be all you need to do here. thanks, greg k-h From mshefty at ichips.intel.com Wed Jul 26 10:38:08 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 26 Jul 2006 10:38:08 -0700 Subject: [openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM In-Reply-To: <000001c6ad0d$e483a290$e598070a@amr.corp.intel.com> References: <000001c6ad0d$e483a290$e598070a@amr.corp.intel.com> Message-ID: <44C7A880.3030900@ichips.intel.com> I don't believe that there were any objections to this patch, and only one minor change request to print a warning message. 
Roland, do the mthca changes look > okay to commit? > > - Sean BTW,do you think we need this for 2.6.18? It does fix a bug when RTU is lost ... -- MST From sean.hefty at intel.com Wed Jul 26 11:21:15 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 26 Jul 2006 11:21:15 -0700 Subject: [openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM In-Reply-To: <20060726175419.GA11624@mellanox.co.il> Message-ID: <000001c6b0e0$4832ced0$e598070a@amr.corp.intel.com> >BTW,do you think we need this for 2.6.18? >It does fix a bug when RTU is lost ... The chances of an RTU being repeatedly lost, but user data being received over the same path seems fairly low IMO. My take is that it's probably not needed for 2.6.18, but that depends on where we are in the 2.6.18 release cycle. - Sean From rdreier at cisco.com Wed Jul 26 12:05:10 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 26 Jul 2006 12:05:10 -0700 Subject: [openib-general] [PATCH] RFC: srp filesystem data corruption problem/work-around In-Reply-To: <20060726164349.GH9411@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 26 Jul 2006 19:43:50 +0300") References: <44C7FD50.4030502@mellanox.com> <20060726164349.GH9411@mellanox.co.il> Message-ID: > I'll fix that up to check the OUI. Makes sense. Unfortunately at least the Engenio target I have access to uses the same Mellanox OUI: IO Unit Info: port LID: 0003 port GID: fe800000000000000002c902004000e6 change ID: 0002 max controllers: 0x10 controller[ 1] GUID: 0002c902004000e4 vendor ID: 0002c9 device ID: 005a44 IO class : 0100 ID: LSI Storage Systems SRP Driver 200400a0b80bdd41 service entries: 1 service[ 0]: 200400a0b80bdd41 / SRP.T10:200400A0B80BDD41 but still I think it's better than nothing to only activate the workaround for GUIDs starting with 0002c9. - R. 
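As a rough illustration of the OUI check discussed above (a sketch only, not the code from any actual patch): the OUI is the top 24 bits of the 64-bit GUID, so a GUID such as 0002c902004000e4 matches vendor ID 0002c9. The helper names guid_oui() and guid_is_mellanox() are hypothetical, and the GUID is assumed to be already in host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mellanox OUI, as in the "vendor ID: 0002c9" line of the IO Unit Info dump. */
#define MELLANOX_OUI 0x0002c9u

/*
 * Hypothetical helper: extract the 24-bit OUI from a 64-bit GUID.
 * Assumes the GUID has already been converted to host byte order.
 */
static inline uint32_t guid_oui(uint64_t guid)
{
	return (uint32_t)(guid >> 40);
}

/* Hypothetical helper: true if the GUID starts with the Mellanox OUI. */
static inline bool guid_is_mellanox(uint64_t guid)
{
	return guid_oui(guid) == MELLANOX_OUI;
}
```

In kernel code the GUID would normally be a big-endian value and need a byte-order conversion first; that step is left out here. As noted above, a match on the OUI does not prove the target is a Mellanox one, since other vendors' targets may carry GUIDs with the same prefix.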
From rdreier at cisco.com Wed Jul 26 12:19:11 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 26 Jul 2006 12:19:11 -0700 Subject: [openib-general] [PATCH] IB/uverbs: include cosmetic fix In-Reply-To: <20060726052251.GA8380@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 26 Jul 2006 08:22:51 +0300") References: <20060726052251.GA8380@mellanox.co.il> Message-ID: > Since uverbs_cmd.c uses lockdep now, it should include > linux/lockdep.h directly rather than rely on linux/file.h to pull > it in. Current style seems to be to let lockdep.h be included implicitly. A quick grep shows that none of the files that call lockdep_set_class() include <linux/lockdep.h>. - R. From mst at mellanox.co.il Wed Jul 26 12:51:42 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 26 Jul 2006 22:51:42 +0300 Subject: [openib-general] [PATCH] IB/uverbs: include cosmetic fix In-Reply-To: References: Message-ID: <20060726195142.GA12182@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] IB/uverbs: include cosmetic fix > > > Since uverbs_cmd.c uses lockdep now, it should include > > linux/lockdep.h directly rather than rely on linux/file.h to pull > > it in. > > Current style seems to be to let lockdep.h be included implicitly. A > quick grep shows that none of the files that call lockdep_set_class() > include <linux/lockdep.h>. > > - R. > > Fair enough. I removed this from mst-for-2.6.18, so there are 3 patches there now: Author: Michael S. Tsirkin IB/mthca: fix mthca_array_clear thinko commit fcba37034273136e6bc3124a2ab21821743ce9fd Author: Ishai Rabinovitz IB/srp: fix crash in srp_reconnect_target commit 82bf649ad7e434ccb7ba91e2fc5764a5888bbfb4 Author: Sean Hefty IB/cm: fix error handling in ib_send_cm_req -- MST From mst at mellanox.co.il Wed Jul 26 12:53:40 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 26 Jul 2006 22:53:40 +0300 Subject: [openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM In-Reply-To: <000001c6b0e0$4832ced0$e598070a@amr.corp.intel.com> References: <000001c6b0e0$4832ced0$e598070a@amr.corp.intel.com> Message-ID: <20060726195340.GB12182@mellanox.co.il> Quoting r. Sean Hefty : > Subject: RE: [PATCH 0/4] Dispatch communication related events to the IB CM > > >BTW, do you think we need this for 2.6.18? > >It does fix a bug when RTU is lost ... > > The chances of an RTU being repeatedly lost, but user data being received over > the same path seems fairly low IMO. My take is that it's probably not needed > for 2.6.18, but that depends on where we are in the 2.6.18 release cycle. Well, I think as long as there's no actual release, theoretical problems are fair game. In -stable only real-world issues count :) -- MST From davem at davemloft.net Wed Jul 26 13:56:42 2006 From: davem at davemloft.net (David Miller) Date: Wed, 26 Jul 2006 13:56:42 -0700 (PDT) Subject: [openib-general] [PATCH Round 4 2/3] Core network changes to support network event notification. In-Reply-To: <1153930543.8155.22.camel@stevo-desktop> References: <1153839940.14354.37.camel@stevo-desktop> <20060726033951.GB3203@gondor.apana.org.au> <1153930543.8155.22.camel@stevo-desktop> Message-ID: <20060726.135642.89059962.davem@davemloft.net> From: Steve Wise Date: Wed, 26 Jul 2006 11:15:43 -0500 > Dave, what do you think about removing the user-space stuff for the > first round of integration? IE: Just add netevents and kernel hooks to > generate them. Sure. 
From vuhuong at mellanox.com Wed Jul 26 15:51:46 2006 From: vuhuong at mellanox.com (Vu Pham) Date: Wed, 26 Jul 2006 15:51:46 -0700 Subject: [openib-general] [SRP] [RFC] Needed changes to support fail-over drivers In-Reply-To: References: <20060725134529.GA20068@mellanox.co.il> Message-ID: <44C7F202.6080009@mellanox.com> Roland Dreier wrote: > > > Why does userspace need to be able to disconnect a connection? > > > There are two options on who will initiate the disconnection: the userspace > > daemon or the ib_srp module. I considered both options and I was not sure > > which one is better. I chose to do it in userspace because it looks like a good > > symmetry that both the disconnection and reconnection will be initiated in the > > same place. I will accept your comment and change it to the kernel. > > I'm not telling you what to do -- I'm just asking. > > But it does seem to me that the kernel knows better when to disconnect > a connection -- eg I don't think an error completion will be signaled > to userspace. Conversely if a target goes away and comes back with no > IOs submitted in between, then the connection should survive and > there's no reason to disconnect/reconnect. > Yes; however, usermode can still signal the kernel about the events, and the kernel will decide whether to disconnect/reconnect. In your example with no I/O, the kernel can check active_q/pending_q and decide to keep the connection intact. While the target is offline and some apps issue I/Os, or in case of error completions/IB errors, the kernel can actively disconnect a connection, moving the target to the DISCONNECTED state if required. And it does seem to me that the kernel does not know a target is off-line until SCSI commands time out and SCSI error recovery kicks in - this will bring the SCSI devices to the off-line state. Some fail-over drivers may not be happy about SCSI devices going off-line. So the kernel can rely on usermode's signal to disconnect. 
In summary, I think that we need usermode and the kernel working together: usermode signals the kernel about off-line/on-line events, and the kernel decides whether or not to disconnect/reconnect. From sean.hefty at intel.com Wed Jul 26 17:10:26 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 26 Jul 2006 17:10:26 -0700 Subject: [openib-general] [PATCH] multicast: add support for MGID 0 Message-ID: <000001c6b111$10204af0$e598070a@amr.corp.intel.com> Add support to join a multicast group with MGID = 0, with the actual MGID of the group returned by the SA. The multicast module must be able to handle multiple requests for MGID = 0, with each request causing a new multicast group to be created. Also enhance the API for ib_get_mcmember_rec() to support a requested MGID of 0. In this case, a default MCMemberRecord is returned to the user and may be used when creating a new multicast group. Currently, the default values are hard-coded by the multicast module, but that can be extended in the future or overridden by the user before creating the group. Signed-off-by: Sean Hefty --- Index: include/rdma/ib_multicast.h =================================================================== --- include/rdma/ib_multicast.h (revision 8647) +++ include/rdma/ib_multicast.h (working copy) @@ -88,8 +88,13 @@ void ib_free_multicast(struct ib_multica * @device: Device associated with the multicast group. * @port_num: Port on the specified device to associate with the multicast * group. - * @mgid: MGID of multicast group. + * @mgid: optional MGID of multicast group. * @rec: Location to copy SA multicast member record. + * + * If an MGID is specified, returns an existing multicast member record if + * one is found for the local port. If no MGID is specified, or the specified + * MGID is 0, returns a multicast member record filled in with default values + * that may be used to create a new multicast group. 
*/ int ib_get_mcmember_rec(struct ib_device *device, u8 port_num, union ib_gid *mgid, struct ib_sa_mcmember_rec *rec); Index: core/multicast.c =================================================================== --- core/multicast.c (revision 8695) +++ core/multicast.c (working copy) @@ -37,8 +37,10 @@ #include #include #include +#include #include +#include MODULE_AUTHOR("Sean Hefty"); MODULE_DESCRIPTION("InfiniBand multicast membership handling"); @@ -63,6 +65,7 @@ static struct ib_client mcast_client = { static struct ib_event_handler event_handler; static struct workqueue_struct *mcast_wq; +static union ib_gid mgid0; struct mcast_device; @@ -144,7 +147,8 @@ static struct mcast_group *mcast_find(st } static struct mcast_group *mcast_insert(struct mcast_port *port, - struct mcast_group *group) + struct mcast_group *group, + int allow_duplicates) { struct rb_node **link = &port->table.rb_node; struct rb_node *parent = NULL; @@ -161,6 +165,8 @@ static struct mcast_group *mcast_insert( link = &(*link)->rb_left; else if (ret > 0) link = &(*link)->rb_right; + else if (allow_duplicates) + link = &(*link)->rb_left; else return cur_group; } @@ -476,6 +482,10 @@ static void join_handler(int status, str else { spin_lock_irq(&group->port->lock); group->rec = *rec; + if (!memcmp(&mgid0, &group->rec.mgid, sizeof mgid0)) { + rb_erase(&group->node, &group->port->table); + mcast_insert(group->port, group, 1); + } spin_unlock_irq(&group->port->lock); } mcast_work_handler(group); @@ -492,12 +502,16 @@ static struct mcast_group *acquire_group { struct mcast_group *group, *cur_group; unsigned long flags; + int is_mgid0; - spin_lock_irqsave(&port->lock, flags); - group = mcast_find(port, mgid); - if (group) - goto found; - spin_unlock_irqrestore(&port->lock, flags); + is_mgid0 = !memcmp(&mgid0, mgid, sizeof mgid0); + if (!is_mgid0) { + spin_lock_irqsave(&port->lock, flags); + group = mcast_find(port, mgid); + if (group) + goto found; + spin_unlock_irqrestore(&port->lock, flags); + } 
group = kzalloc(sizeof *group, gfp_mask); if (!group) @@ -511,7 +525,7 @@ static struct mcast_group *acquire_group spin_lock_init(&group->lock); spin_lock_irqsave(&port->lock, flags); - cur_group = mcast_insert(port, group); + cur_group = mcast_insert(port, group, is_mgid0); if (cur_group) { kfree(group); group = cur_group; @@ -619,19 +633,30 @@ int ib_get_mcmember_rec(struct ib_device struct mcast_port *port; struct mcast_group *group; unsigned long flags; + int ret = 0; dev = ib_get_client_data(device, &mcast_client); if (!dev) return -ENODEV; port = &dev->port[port_num - dev->start_port]; - spin_lock_irqsave(&port->lock, flags); - group = mcast_find(port, mgid); - if (group) - *rec = group->rec; - spin_unlock_irqrestore(&port->lock, flags); + if (mgid && memcmp(mgid, &mgid0, sizeof mgid0)) { + spin_lock_irqsave(&port->lock, flags); + group = mcast_find(port, mgid); + if (group) + *rec = group->rec; + else + ret = -EADDRNOTAVAIL; + spin_unlock_irqrestore(&port->lock, flags); + } else { + memset(rec, 0, sizeof *rec); + ib_get_cached_gid(device, port_num, 0, &rec->port_gid); + rec->pkey = 0xFFFF; + get_random_bytes(&rec->qkey, sizeof rec->qkey); + rec->join_state = 1; + } - return group ? 0 : -EADDRNOTAVAIL; + return ret; } EXPORT_SYMBOL(ib_get_mcmember_rec); From halr at voltaire.com Wed Jul 26 19:11:45 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Jul 2006 22:11:45 -0400 Subject: [openib-general] [PATCH] multicast: add support for MGID 0 In-Reply-To: <000001c6b111$10204af0$e598070a@amr.corp.intel.com> References: <000001c6b111$10204af0$e598070a@amr.corp.intel.com> Message-ID: <1153966304.19539.8614.camel@hal.voltaire.com> Hi Sean, On Wed, 2006-07-26 at 20:10, Sean Hefty wrote: > Add support to join a multicast group with MGID = 0, with the actual > MGID of the group returned by the SA. The multicast module must be able > to handle multiple requests for MGID = 0, with each request causing a > new multicast group to be created. 
> > Also enhance the API for ib_get_mcmember_rec() to support a requested > MGID of 0. In this case, a default MCMemberRecord is returned to the > user and may be used when creating a new multicast group. Currently, > the default values are hard-coded by the multicast module, but that can > be extended in the future or overridden by the user before creating > the group. > > Signed-off-by: Sean Hefty > --- [snip...] > Index: core/multicast.c > =================================================================== > --- core/multicast.c (revision 8695) > +++ core/multicast.c (working copy) [snip...] > + if (mgid && memcmp(mgid, &mgid0, sizeof mgid0)) { > + spin_lock_irqsave(&port->lock, flags); > + group = mcast_find(port, mgid); > + if (group) > + *rec = group->rec; > + else > + ret = -EADDRNOTAVAIL; > + spin_unlock_irqrestore(&port->lock, flags); > + } else { > + memset(rec, 0, sizeof *rec); > + ib_get_cached_gid(device, port_num, 0, &rec->port_gid); > + rec->pkey = 0xFFFF; > + get_random_bytes(&rec->qkey, sizeof rec->qkey); > + rec->join_state = 1; > + } These are the hard coded values for pkey, qkey, and join state mentioned above. Should there be module parameters to override them ? Also, where do the other parameters (components) that are necessary to create a group come from ? Another option would be to obtain all of them from the appropriate (partition based) IPoIB broadcast group. -- Hal From sean.hefty at intel.com Wed Jul 26 21:11:41 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 26 Jul 2006 21:11:41 -0700 Subject: [openib-general] [PATCH] multicast: add support for MGID 0 In-Reply-To: <1153966304.19539.8614.camel@hal.voltaire.com> Message-ID: <000001c6b132$c40cc450$8ed9180a@amr.corp.intel.com> >These are the hard coded values for pkey, qkey, and join state mentioned >above. Should there be module parameters to override them ? My thought was that a user could override any of the values before creating the group. 
I'm not sure module parameters are necessary, but I do see how they might be useful. I'll see what others think. >Also, where do the other parameters (components) that are necessary to >create a group come from ? They default to 0. I looked at the values returned for the ipoib broadcast group that was running on my system, and coded the values to match that. >Another option would be to obtain all of them from the appropriate >(partition based) IPoIB broadcast group. I agree, and this is what the RDMA CM does. This was the future extension that I mentioned, since there are still issues that would need to be worked out. Including partition information in the query changes the API. Also, ipoib depends on the ib_multicast module, so ib_multicast cannot rely on ipoib being loaded. It may work better for a user to get the broadcast address that they want, then query for that MGID, but I haven't looked at this enough to know what makes the most sense yet. - Sean From ogerlitz at voltaire.com Wed Jul 26 22:45:57 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 27 Jul 2006 08:45:57 +0300 Subject: [openib-general] controlling IPoIB debug In-Reply-To: References: Message-ID: <44C85315.4030300@voltaire.com> Roland Dreier wrote: > > I can't disable CONFIG_INFINIBAND_IPOIB_DEBUG, that is i was > > expecting to be able to press "n" on the "IP-over-InfiniBand debugging" > > submenu of "IP-over-InfiniBand" and it does not have any impact. > > Debugging is forced to be on unless you set EMBEDDED=y. This is so > that everyone will ship modules with debugging enabled, so that when > someone has a problem we can actually debug it. OK, this makes sense. However, I could not find the way to set CONFIG_EMBEDDED, can you educate me how to do it? ... thanks. Or. 
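For readers following the MGID 0 thread above: the lookup-or-defaults behavior that Sean describes for ib_get_mcmember_rec(), and the "override before creating the group" usage, can be modelled in plain userspace C. This is only a sketch under stated assumptions, not the ib_multicast code: struct mc_rec, struct mc_port, get_mc_rec() and build_create_rec() are hypothetical names, the group table is a flat array instead of the module's rb-tree, locking is omitted, and rand() stands in for get_random_bytes().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct ib_sa_mcmember_rec. */
struct mc_rec {
	uint8_t  mgid[16];
	uint8_t  port_gid[16];
	uint32_t qkey;
	uint16_t pkey;
	uint8_t  join_state;
};

#define MAX_GROUPS 8

/* Hypothetical per-port state; the real module keeps an rb-tree under a lock. */
struct mc_port {
	uint8_t       port_gid[16];
	struct mc_rec groups[MAX_GROUPS];
	int           ngroups;
};

static const uint8_t mgid0[16] = { 0 };	/* the all-zero MGID */

/*
 * Non-zero MGID: copy the record of an existing group, or fail with
 * -EADDRNOTAVAIL. NULL or zero MGID: hand back a default record.
 */
static int get_mc_rec(const struct mc_port *port, const uint8_t *mgid,
		      struct mc_rec *rec)
{
	int i;

	if (mgid && memcmp(mgid, mgid0, sizeof mgid0)) {
		for (i = 0; i < port->ngroups; i++) {
			if (!memcmp(port->groups[i].mgid, mgid, sizeof mgid0)) {
				*rec = port->groups[i];
				return 0;
			}
		}
		return -EADDRNOTAVAIL;
	}

	/* Defaults mirror the values in the patch; everything else stays 0. */
	memset(rec, 0, sizeof *rec);
	memcpy(rec->port_gid, port->port_gid, sizeof rec->port_gid);
	rec->pkey = 0xFFFF;
	rec->qkey = (uint32_t) rand();	/* stand-in for get_random_bytes() */
	rec->join_state = 1;
	return 0;
}

/* Caller-side pattern: take the defaults, then override what differs. */
static int build_create_rec(const struct mc_port *port, uint16_t pkey,
			    struct mc_rec *rec)
{
	int ret = get_mc_rec(port, mgid0, rec);

	if (ret)
		return ret;
	rec->pkey = pkey;
	return 0;
}
```

A caller that wants a non-default partition would take the defaults and change only the P_Key, which is the pattern build_create_rec() shows.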
From ogerlitz at voltaire.com Wed Jul 26 22:51:01 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 27 Jul 2006 08:51:01 +0300 Subject: [openib-general] [PATCH 0/4] Dispatch communication relatedevents to the IB CM In-Reply-To: References: Message-ID: <44C85445.10300@voltaire.com> Rimmer, Todd wrote: > This approach will not work. If the QP is in RTS the Communication > established event will never be generated. Hence the lost RTU case > would not be properly handled and the ULP would need to take on the > burden. Its much better to isolate the solution to the CM and let the > ULP post to the send Q in RTR. I might miss you allover also is there a chance you might not read the patches with enough attention? Lets first agree that you don't refer to CMA consumers for which the CMA does the state transitions, since for them the CM will always get the COMM_EST async event and will emulate an RTU reception, that is will transition the cm id state and generate CM_USER_ESTABLISHED event for the CMA which will modify the qp state to RTS and generate RDMA_ESTABLISHED event to the ULP. So might mean to other types of CM/CMA consumers, please provide the details, specifically what makes you state "if the QP is in RTS". Or. From ogerlitz at voltaire.com Wed Jul 26 23:34:42 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 27 Jul 2006 09:34:42 +0300 Subject: [openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM In-Reply-To: <000001c6b0e0$4832ced0$e598070a@amr.corp.intel.com> References: <000001c6b0e0$4832ced0$e598070a@amr.corp.intel.com> Message-ID: <44C85E82.3030904@voltaire.com> Sean Hefty wrote: >> BTW,do you think we need this for 2.6.18? >> It does fix a bug when RTU is lost ... > > The chances of an RTU being repeatedly lost, but user data being received over > the same path seems fairly low IMO. My take is that it's probably not needed > for 2.6.18, but that depends on where we are in the 2.6.18 release cycle. 
I think we need first to commit this to the SVN and have different developers (eg people working on iSER/NFSoRDMA/SRP/Lustre passive side and ofcourse SDP) test and experience with it before pushing it upstream, targeting 2.6.19 makes sense to me. Not fully handling this race is not a bug but rather a feature that was missing in the openib stack from day one for which now we have a patch that attempts to address it. Or. From ogerlitz at voltaire.com Thu Jul 27 00:45:23 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 27 Jul 2006 10:45:23 +0300 Subject: [openib-general] [PATCH 3/4] IB CM: register and handle COMM_EST events on a QP In-Reply-To: <44C79B09.2050209@ichips.intel.com> References: <000101c6aee3$951b95a0$87f9070a@amr.corp.intel.com> <44C4FD69.9040005@ichips.intel.com> <44C62643.6030801@voltaire.com> <44C655EF.4010804@ichips.intel.com> <44C701DA.9030605@voltaire.com> <44C79B09.2050209@ichips.intel.com> Message-ID: <44C86F13.7060203@voltaire.com> Sean Hefty wrote: > Or Gerlitz wrote: >> Generally, i guess you need to insert the local QPN into the rb_tree >> ***before*** sending the REP not after it. > That is what the patch does. OK, sorry for making noise, i was referring to your comment: "I think that it could insert the local QPN after the passive side calls ib_send_cm_rep(), similar to what's done in the client-server model." where i should have verified it in the code you sent... >> Can you state what is the usage being done with the local QPNs in the >> timeout on REQ flow? > > I don't quite follow what you're asking here. Local QPNs are tracked > when a REQ is sent. If a second REQ is sent using the same QPN, it will > fail with an address in use error. The local QPN is not removed from > the table until the connection fails, or we exit the timewait state > after being disconnected. 
OK, with s/REQ/REP/g i think to understand it better now: in ib_send_cm_rep you attempt to insert the local QPN into the rb_tree and if this is a resend of the REP or the user is recycling QPs (etc) a records for this QPN would be already there and you get address-in-use, unless you remove it in the timeout code or when disconnecting this connection (which is what you do). This is done also for the ib_send_cm_req case but you agreed it can be removed from there also for the peer to peer case. Or. From tziporet at mellanox.co.il Thu Jul 27 00:51:25 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Thu, 27 Jul 2006 10:51:25 +0300 Subject: [openib-general] [PATCH 0/4] Dispatch communication relatedevents to the IB CM In-Reply-To: References: Message-ID: <44C8707D.4020807@mellanox.co.il> Rimmer, Todd wrote: > Perhaps we should pursue changing this in the IBTA spec. Being able to > post to the SQ while in RTR makes handling of the Comm Est/RTU race with > the CQ callback much easier to handle. > > It would be better if the IB spec permitted posting to the SQ in RTR but > indicated the SQ would not be processed until the QP moved to RTS. I > believe the present Mellanox silicon/firmware implements such behavior. > > > Mellanox HCAs does NOT implement this behavior. Although it does not move the QP to error when WQE is posted, it will not send the posted WQE when the QP is moved to RTS. Tziporet From ogerlitz at voltaire.com Thu Jul 27 01:22:26 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 27 Jul 2006 11:22:26 +0300 Subject: [openib-general] [PATCH] multicast: add support for MGID 0 In-Reply-To: <000001c6b132$c40cc450$8ed9180a@amr.corp.intel.com> References: <000001c6b132$c40cc450$8ed9180a@amr.corp.intel.com> Message-ID: <44C877C2.7070905@voltaire.com> Sean Hefty wrote: >> These are the hard coded values for pkey, qkey, and join state mentioned >> above. Should there be module parameters to override them ? 
> My thought was that a user could override any of the values before creating the > group. I'm not sure module parameters are necessary, but I do see how they > might be useful. I'll see what others think. I don't think there's a need for module params here, as Sean stated, the impl. provide default values and the user can override them. The smartness here should be on the user side, so if they don't like the idea of default values, we can remove them, instead of providing a way to set the default values... >> Another option would be to obtain all of them from the appropriate >> (partition based) IPoIB broadcast group. > I agree, and this is what the RDMA CM does. This was the future extension that > I mentioned, since there are still issues that would need to be worked out. > Including partition information in the query changes the API. Also, ipoib > depends on the ib_multicast module, so ib_multicast cannot rely on ipoib being > loaded. It may work better for a user to get the broadcast address that they > want, then query for that MGID, but I haven't looked at this enough to know what > makes the most sense yet. Again, as this API is targeted for "plain and native IB" apps that is applications that does not want to use the IPoIB association from network to rdma devices (which is used by the CMA to know the pkey etc), if the application want to look in the /sys files of the root/child interface to learn the pkey etc params, let them do it. Or. From halr at voltaire.com Thu Jul 27 03:31:09 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Jul 2006 06:31:09 -0400 Subject: [openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM In-Reply-To: <000001c6b0e0$4832ced0$e598070a@amr.corp.intel.com> References: <000001c6b0e0$4832ced0$e598070a@amr.corp.intel.com> Message-ID: <1153996267.19539.15990.camel@hal.voltaire.com> On Wed, 2006-07-26 at 14:21, Sean Hefty wrote: > >BTW,do you think we need this for 2.6.18? 
> >It does fix a bug when RTU is lost ... > > The chances of an RTU being repeatedly lost, but user data being received over > the same path seems fairly low IMO. CM messages and the connection can be over separate paths. There is no requirement that they follow the same path but they certainly can which is the most common case. -- Hal From ogerlitz at voltaire.com Thu Jul 27 03:31:05 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 27 Jul 2006 13:31:05 +0300 (IDT) Subject: [openib-general] some CM/CMA clarfications/questions related to listener impl. Message-ID: Sean, Does the the code below is what needs to be done by a CMA consumer to listen on a SID based on over all the active IB ports of this node? What's most important for me now is to understand that the code below does what i expect, however looking in ib_cm.h and the cma code while coding the example, one point that i run into was the granularity the IB CM allows consumers "bind" objects to. I see that CM IDs are bounded to **device** and not to couple? if indeed, what was the rational? It means that we don't support a config of a node connected to two IP subnets each over a different port of the same device, and two processes, each attempting to listen over the same port, but with different ip. Is it as of the CM section of the IB spec saying the CM works with node guids and this implies we can't support this config? Or. 
void listen_all(void)
{
	struct rdma_cm_id *cma_id;
	struct sockaddr_in addr;
	int rc,rb,rl;

	rc = rb = rl = 1;

	memset(&addr, 0, sizeof addr);
	addr.sin_family = AF_INET;
	addr.sin_port = htons(0x1234);
	addr.sin_addr.s_addr = htonl(INADDR_ANY);

	cma_id = rdma_create_id(cma_handler, NULL, RDMA_PS_TCP);
	if (IS_ERR(cma_id)) {
		rc = PTR_ERR(cma_id);
		goto end;
	}

	rb = rdma_bind_addr(cma_id, (struct sockaddr *)&addr);
	if(rb)
		goto end;

	rl = rdma_listen(cma_id, 0);
	if(rl)
		goto end;

	rdma_destroy_id(cma_id);
end:
	printk("create %d bind %d listen %d\n",rc,rb,rl);
}

From ogerlitz at voltaire.com Thu Jul 27 03:37:53 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 27 Jul 2006 13:37:53 +0300 (IDT) Subject: [openib-general] some CM/CMA clarfications/questions related to listener impl. In-Reply-To: References: Message-ID: On Thu, 27 Jul 2006, Or Gerlitz wrote: > Does the the code below is what needs to be done by a CMA > consumer to listen on a SID based on over > all the active IB ports of this node? Just to clarify, i understand this call to rdma_bind does not binds the cma id to rdma device; I meant to ask if connection requests arriving over any port of this system to the resulted SID would be delivered up to this listener along with the CM/CMA generated ID binded to the rdma device from where the REQ arrived. Or. From trimmer at silverstorm.com Thu Jul 27 08:09:25 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Thu, 27 Jul 2006 11:09:25 -0400 Subject: [openib-general] [PATCH 0/4] Dispatch communication relatedevents to the IB CM In-Reply-To: <44C85445.10300@voltaire.com> Message-ID: > From: Or Gerlitz [mailto:ogerlitz at voltaire.com] > > Rimmer, Todd wrote: > > This approach will not work. If the QP is in RTS the Communication > > established event will never be generated. Hence the lost RTU case > > would not be properly handled and the ULP would need to take on the > > burden.
Its much better to isolate the solution to the CM and let the > > ULP post to the send Q in RTR. > > I might miss you allover also is there a chance you might not read the > patches with enough attention? > > Lets first agree that you don't refer to CMA consumers for which the CMA > does the state transitions, since for them the CM will always get the > COMM_EST async event and will emulate an RTU reception, that is will > transition the cm id state and generate CM_USER_ESTABLISHED event for > the CMA which will modify the qp state to RTS and generate > RDMA_ESTABLISHED event to the ULP. > > So might mean to other types of CM/CMA consumers, please provide the > details, specifically what makes you state "if the QP is in RTS". > > Or. My comment was in response to Sean's comment: > I think it would be simpler to transition the QP to RTS after sending a > REP, with the restriction that a user may not post sends until an RTU is > received, a communication establish event occurs, or a receive message > completes on the QP. Hence, this was not in the patches, it was something he was proposing as an alternative. My point is that if the CMA moved the QP to RTS, the CMA would not get a HCA Communication Established Async Event, in which case the CMA would have no vehicle to generate the communication established event to the CMA consumers. It seems burdensome for all CMA consumers to need to implement an alternate Tx queue which will only be used for this one rare situation. The result would be that few CMA consumers would implement it and it would be difficult to test. Hence it is best for the CMA and stack to address the race itself. The particular rare race is the case where: active side CMA consumer completes connection process (and CMA sends RTU). 
active side immediately sends a message passive side CQ callback occurs before CMA gets RTU or Communication Established Async event (and hence before CMA has moved QP to RTS) While this race sounds rare, it's the kind of thing which will happen in some large cluster under heavy stress. In which case it will be hard to debug, so its better to design out the race from the start. In this rare case, the passive side needs to queue any response TX it may want to do until it gets Communication established. This sidebar queue would not be required after communication established callback. However to avoid CMA consumer protocol errors, the CMA consumer would have to make sure the messages on this TX queue were unconditionally sent before any future sends. As it turns out, we already have such a Q, the Send Q. The Send Q was created previously and the only true limitation is that per IBTA the HCA hardware may not accept send doorbells until in RTS. So one possible approach internal to the stack would be to allow CMA consumers to post to the Send Q when the QP is in RTR, however internal to the stack do not inform the HCA QP of these WQEs until the QP is moved to RTS. The HCA driver could keep track of how many Send Q posts occurred while in RTR, then upon movement to RTS, it could issue the appropriate doorbells to the hardware. The above approach would solve the race completely transparent to CMA consumers. Todd Rimmer From rdreier at cisco.com Thu Jul 27 08:20:46 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 27 Jul 2006 08:20:46 -0700 Subject: [openib-general] controlling IPoIB debug In-Reply-To: <44C85315.4030300@voltaire.com> (Or Gerlitz's message of "Thu, 27 Jul 2006 08:45:57 +0300") References: <44C85315.4030300@voltaire.com> Message-ID: > OK, this makes sense. However, I could not find the way to set > CONFIG_EMBEDDED, can you educate me how to do it? ... thanks. 
make menuconfig then hit / type embedded and hit enter From sean.hefty at intel.com Thu Jul 27 08:28:20 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 27 Jul 2006 08:28:20 -0700 Subject: [openib-general] [PATCH 3/4] IB CM: register and handle COMM_EST events on a QP In-Reply-To: <44C86F13.7060203@voltaire.com> Message-ID: <000101c6b191$4abcf6b0$1ad4180a@amr.corp.intel.com> >This is done also for the ib_send_cm_req case but you agreed it can be >removed from there also for the peer to peer case. To clarify, it can be removed from ib_send_cm_req() if we only care about this handling the COMM_EST event. If we want to track local QPNs as part of timewait, or want to expand the code to handle path migration, we need it in both places. - Sean From sean.hefty at intel.com Thu Jul 27 08:56:23 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 27 Jul 2006 08:56:23 -0700 Subject: [openib-general] some CM/CMA clarfications/questions related to listener impl. In-Reply-To: Message-ID: <000201c6b195$35d36e10$1ad4180a@amr.corp.intel.com> >Does the the code below is what needs to be done by a CMA >consumer to listen on a SID based on over >all the active IB ports of this node? Your code should be sufficient to listen across all RDMA devices. >What's most important for me now is to understand that the code >below does what i expect, however looking in ib_cm.h and the cma >code while coding the example, one point that i run into was the >granularity the IB CM allows consumers "bind" objects to. The binding is unique system wide if an IP address of 0 is specified, so the generated SID will also be unique system wide. >I see that CM IDs are bounded to **device** and not to > couple? if indeed, what was the rational? Basically, the implementation was easier, and I didn't see a need to track listens per port. 
>It means that we don't support a config of a node connected to >two IP subnets each over a different port of the same device, >and two processes, each attempting to listen over the same port, >but with different ip. As long as the processes bind to different IP addresses, this is supported. Rdma_bind() allows binding to the same port if the IP addresses are different. Rdma_listen() will map both requests to the same SID, but distinguishes them using the private data carried in the REQ. - Sean From sean.hefty at intel.com Thu Jul 27 09:54:48 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 27 Jul 2006 09:54:48 -0700 Subject: [openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM In-Reply-To: <44C85E82.3030904@voltaire.com> Message-ID: <000001c6b19d$5f695c00$c5c8180a@amr.corp.intel.com> Roland, Not sure if you're totally back yet, but can you ACK/NACK this? Thanks, Sean From rdreier at cisco.com Thu Jul 27 10:00:17 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 27 Jul 2006 10:00:17 -0700 Subject: [openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM In-Reply-To: <000001c6b19d$5f695c00$c5c8180a@amr.corp.intel.com> (Sean Hefty's message of "Thu, 27 Jul 2006 09:54:48 -0700") References: <000001c6b19d$5f695c00$c5c8180a@amr.corp.intel.com> Message-ID: > Not sure if you're totally back yet, but can you ACK/NACK this? To be honest I haven't read this in detail. However it seems to have generated a lot of discussion. Has a consensus been reached? - R. From sean.hefty at intel.com Thu Jul 27 10:28:24 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 27 Jul 2006 10:28:24 -0700 Subject: [openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM In-Reply-To: Message-ID: <000101c6b1a2$11269f80$c5c8180a@amr.corp.intel.com> >To be honest I haven't read this in detail. However it seems to have >generated a lot of discussion. Has a consensus been reached? 
Just to make sure, here's my summary of the discussions so far. Anyone who disagrees can jump in. There was discussion on this being 2.6.18 material. The consensus is to let it get wider testing in svn first. The fix / feature isn't critical enough to push upstream immediately, so waiting until 2.6.19 seems reasonable. There was also discussion on whether the CM should track local QPNs on both the active and passive sides, or just the passive side. To properly track timewait and eventually handle path migration, tracking on both sides is needed, which is what the patch does. Or asked about removing the ib_cm_establish() call from the API. It was left as part of the API to avoid breaking the ABI, and still allow users to force connection establishment in case they poll a receive WC before the COMM_EST event occurs. There was also some side discussion about the COMM_EST event in general and trying to respond to request messages while the QP is still in the RTR state. A discussion on possible changes to the spec spawned off from this... - Sean From rdreier at cisco.com Thu Jul 27 10:46:59 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 27 Jul 2006 10:46:59 -0700 Subject: [openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM In-Reply-To: <000101c6b1a2$11269f80$c5c8180a@amr.corp.intel.com> (Sean Hefty's message of "Thu, 27 Jul 2006 10:28:24 -0700") References: <000101c6b1a2$11269f80$c5c8180a@amr.corp.intel.com> Message-ID: > Or asked about removing the ib_cm_establish() call from the API. It was left as > part of the API to avoid breaking the ABI, and still allow users to force > connection establishment in case they poll a receive WC before the COMM_EST > event occurs. Changing the in-kernel interface is no big deal, so I don't think that needs to be considered. However the second reason seems valid to me. 
Without ib_cm_establish() it's not clear what a consumer can do if a receive is polled before the communication established event. - R. From mst at mellanox.co.il Thu Jul 27 11:30:12 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 27 Jul 2006 21:30:12 +0300 Subject: [openib-general] posting send requests in RTR ( was Re: [PATCH 0/4] Dispatch communication related events to the IB CM) In-Reply-To: References: Message-ID: <20060727183012.GB16272@mellanox.co.il> Quoting r. Rimmer, Todd : > The particular rare race is the case where: > active side CMA consumer completes connection process (and CMA sends > RTU). > active side immediately sends a message > passive side CQ callback occurs before CMA gets RTU or Communication > Established Async event (and hence before CMA has moved QP to RTS) > > While this race sounds rare, it's the kind of thing which will happen in > some large cluster under heavy stress. In which case it will be hard to > debug, so its better to design out the race from the start. IB spec is full of such subtle issues. How about disconnect MAD bypassing the send? Protocol designers must be familiar with IB ordering rules. > In this rare case, the passive side needs to queue any response TX it > may want to do until it gets Communication established. Not necessarily. CMA consumer at this point is a rare beast, but since the issue is not specific to CMA, let's look at some IB protocols: IPoIB connected mode can simply drop a packet. So can SRP. SDP (potential CMA consumer!) simply never needs to send any application data before communication established event. Possibly some ULPs may need an extra queue - so let them handle the complexity. 
-- MST From caitlinb at broadcom.com Thu Jul 27 11:47:02 2006 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Thu, 27 Jul 2006 11:47:02 -0700 Subject: [openib-general] posting send requests in RTR ( was Re: [PATCH 0/4] Dispatch communication related events to the IB CM) In-Reply-To: <20060727183012.GB16272@mellanox.co.il> Message-ID: <54AD0F12E08D1541B826BE97C98F99F1744786@NT-SJCA-0751.brcm.ad.broadcom.com> Michael Tsirkin wrote: > > Not necessarily. CMA consumer at this point is a rare beast, > but since the issue is not specific to CMA, let's look at > some IB protocols: IPoIB connected mode can simply drop a > packet. So can SRP. SDP (potential CMA consumer!) simply > never needs to send any application data before communication > established event. > I agree, the simplest and transport neutral rule is that the application is not guaranteed what will happen if they post to the send queue before the communication established event, They MAY get an error, or it MAY be queued, it MAY even be sent because the QP *is* in RTS mode but the consumer just hasn't been told yet, or it may cause a packet to be placed on the wire prematurely with possible adverse effects on the connection (iWARP on the passive side sending a DDP Segment before the active side has received the MPA Response can indeed result in a broken connection). So the best rule for the application is to simply always wait for the communication established event. That always works, and complying with it is hardly a burden for the application. From sean.hefty at intel.com Thu Jul 27 11:49:37 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 27 Jul 2006 11:49:37 -0700 Subject: [openib-general] posting send requests in RTR In-Reply-To: <20060727183012.GB16272@mellanox.co.il> Message-ID: <000001c6b1ad$6985e900$7ad0180a@amr.corp.intel.com> >Not necessarily. 
CMA consumer at this point is a rare beast, but since the >issue >is not specific to CMA, let's look at some IB protocols: IPoIB connected mode >can simply drop a packet. So can SRP. SDP (potential CMA consumer!) simply >never needs to send any application data before communication established >event. > >Possibly some ULPs may need an extra queue - so let them handle the complexity. If the ULP isn't sharing a CQ across multiple QPs, it could just wait to arm the CQ (followed by a poll) after receiving the connection establish event. I think the shared CQ case is the more difficult one for the ULP to handle, and I don't know how many ULPs this affects. If you wanted to allow queuing sends in RTR, could this be done entirely in software under verbs? The difficulty comes if an error occurs while trying to post the sends once the QP transitions to RTS. - Sean From trimmer at silverstorm.com Thu Jul 27 12:06:27 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Thu, 27 Jul 2006 15:06:27 -0400 Subject: [openib-general] posting send requests in RTR In-Reply-To: <000001c6b1ad$6985e900$7ad0180a@amr.corp.intel.com> Message-ID: > From: Sean Hefty [mailto:sean.hefty at intel.com] > > >Not necessarily. CMA consumer at this point is a rare beast, but since > the > >issue > >is not specific to CMA, let's look at some IB protocols: IPoIB connected > mode > >can simply drop a packet. So can SRP. SDP (potential CMA consumer!) > simply > >never needs to send any application data before communication established > >event. > > > >Possibly some ULPs may need an extra queue - so let them handle the > complexity. > > If the ULP isn't sharing a CQ across multiple QPs, it could just wait to > arm the > CQ (followed by a poll) after receiving the connection establish event. I > think > the shared CQ case is the more difficult one for the ULP to handle, and I > don't > know how many ULPs this affects. 
> > If you wanted to allow queuing sends in RTR, could this be done entirely > in > software under verbs? The difficulty comes if an error occurs while > trying to > post the sends once the QP transitions to RTS. > Actually since WQEs are in memory, while in RTR the verbs driver could build the WQEs, post them to the QP, just not issue the doorbells. On RTR->RTS the verbs driver could issue the doorbells. Of course if an RTR->ERR transition occurs, the WQE Flushed behavior would need to be simulated, but of course the WQEs themselves are available to look at (and for the flushed event mainly the Work request Id is of interest) so it shouldn't be too hard. If this approach were taken the verbs driver could permit sends to be posted in RTR and delay informing the HCA until the RTS transition. This type of approach would completely contain the issue in the HCA driver rather than having CMA consumers need to worry about it. Todd Rimmer From mst at mellanox.co.il Thu Jul 27 12:07:28 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 27 Jul 2006 22:07:28 +0300 Subject: [openib-general] posting send requests in RTR In-Reply-To: <000001c6b1ad$6985e900$7ad0180a@amr.corp.intel.com> References: <000001c6b1ad$6985e900$7ad0180a@amr.corp.intel.com> Message-ID: <20060727190728.GA16415@mellanox.co.il> Quoting r. Sean Hefty : > Subject: RE: posting send requests in RTR > > >Not necessarily. CMA consumer at this point is a rare beast, but since the > >issue > >is not specific to CMA, let's look at some IB protocols: IPoIB connected mode > >can simply drop a packet. So can SRP. SDP (potential CMA consumer!) simply > >never needs to send any application data before communication established > >event. > > > >Possibly some ULPs may need an extra queue - so let them handle the complexity. > > If the ULP isn't sharing a CQ across multiple QPs, it could just wait to arm > the CQ (followed by a poll) after receiving the connection establish event. 
I > think the shared CQ case is the more difficult one for the ULP to handle, and > I don't know how many ULPs this affects. You can queue receive completions in ULP and process them after the established event. If WR ID is a running index like in IPoIB you can just increment a counter. It's up to the applications to decide which approach is better. -- MST From mst at mellanox.co.il Thu Jul 27 12:09:54 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 27 Jul 2006 22:09:54 +0300 Subject: [openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM In-Reply-To: References: Message-ID: <20060727190954.GB16415@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH 0/4] Dispatch communication related events to the IB CM > > > Or asked about removing the ib_cm_establish() call from the API. It was left as > > part of the API to avoid breaking the ABI, and still allow users to force > > connection establishment in case they poll a receive WC before the COMM_EST > > event occurs. > > Changing the in-kernel interface is no big deal, so I don't think that > needs to be considered. However the second reason seems valid to me. > Without ib_cm_establish() it's not clear what a consumer can do if a > receive is polled before the communication established event. Hmm. This means we *still* need rdma_cm_establish(). Now *that* is the simple fix we need for 2.6.18. Sean? -- MST From mst at mellanox.co.il Thu Jul 27 12:22:44 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 27 Jul 2006 22:22:44 +0300 Subject: [openib-general] posting send requests in RTR In-Reply-To: References: Message-ID: <20060727192244.GC16415@mellanox.co.il> Quoting r. Rimmer, Todd : > This type of approach would completely contain the > issue in the HCA driver rather than having CMA consumers need to worry > about it. 
So, we are moving code from some ULPs where it's shared for all hardware
into low level where it needs to be re-implemented for each hardware type.
If there's some common code, let's see it used in more than one place,
then we'll move it to a utility function.

-- MST

From caitlinb at broadcom.com Thu Jul 27 12:45:43 2006
From: caitlinb at broadcom.com (Caitlin Bestler)
Date: Thu, 27 Jul 2006 12:45:43 -0700
Subject: [openib-general] posting send requests in RTR
In-Reply-To:
Message-ID: <54AD0F12E08D1541B826BE97C98F99F1744798@NT-SJCA-0751.brcm.ad.broadcom.com>

openib-general-bounces at openib.org wrote:
>> From: Sean Hefty [mailto:sean.hefty at intel.com]
>>
>>> Not necessarily. CMA consumer at this point is a rare beast, but
>>> since the issue is not specific to CMA, let's look at some IB
>>> protocols: IPoIB connected mode can simply drop a packet. So can
>>> SRP. SDP (potential CMA consumer!) simply never needs to send any
>>> application data before communication established event.
>>>
>>> Possibly some ULPs may need an extra queue - so let them handle the
>>> complexity.
>>
>> If the ULP isn't sharing a CQ across multiple QPs, it could just wait
>> to arm the CQ (followed by a poll) after receiving the connection
>> establish event. I think the shared CQ case is the more difficult one
>> for the ULP to handle, and I don't know how many ULPs this affects.
>>
>> If you wanted to allow queuing sends in RTR, could this be done
>> entirely in software under verbs? The difficulty comes if an error
>> occurs while trying to post the sends once the QP transitions to RTS.
>>
>
> Actually since WQEs are in memory, while in RTR the verbs driver could
> build the WQEs, post them to the QP, just not issue the doorbells.

That's an implementation assumption. While it is true for every HCA and
RNIC I am aware of it is still the type of assumption that a good
interface should not make.
The other assumption is that the HCA/RNIC does NOT notice that a QP has
contents until the doorbell is rung.

NOT ringing a doorbell does NOT guarantee that the send queue will NOT
be processed. An implementation that lacked full confidence in its
doorbells might have a background scan to look for non-empty send queues
as a safety mechanism.

I still see a very simple split. If the hardware knows that it has the
QP fenced until RTS then there is no harm in letting send WQEs be
generated in RTR, otherwise it returns an error. Meanwhile, the verbs
consumer is simply told not to post to the send queue until the
Connection Established event.

Each device can then decide when it is safe to generate the Connection
Established event.

From trimmer at silverstorm.com Thu Jul 27 13:41:11 2006
From: trimmer at silverstorm.com (Rimmer, Todd)
Date: Thu, 27 Jul 2006 16:41:11 -0400
Subject: [openib-general] posting send requests in RTR
In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F1744798@NT-SJCA-0751.brcm.ad.broadcom.com>
Message-ID:

> From: Caitlin Bestler [mailto:caitlinb at broadcom.com]
> >
> > Actually since WQEs are in memory, while in RTR the verbs driver could
> > build the WQEs, post them to the QP, just not issue the doorbells.
>
> That's an implementation assumption. While it is true for every HCA
> and RNIC I am aware of it is still the type of assumption that a good
> interface should not make. The other assumption is that the HCA/RNIC
> does NOT notice that a QP has contents until the doorbell is rung.
>
> NOT ringing a doorbell does NOT guarantee that the send queue will NOT
> be processed. An implementation that lacked full confidence in its
> doorbells might have a background scan to look for non-empty send
> queues as a safety mechanism.

The assumption would not be in the interface, but rather in the
hardware specific implementation of the verbs driver. Hence the
appropriate assumption could be made for each hardware model.
As needed the alternate TX queue could be implemented in the verbs
driver.

>
> I still see a very simple split. If the hardware knows that it has the
> QP fenced until RTS then there is no harm in letting send WQEs be
> generated in RTR, otherwise it returns an error. Meanwhile, the verbs
> consumer is simply told not to post to the send queue until the
> Connection Established event.

The challenge is that every consumer must then implement an alternate TX
queue to handle this situation. It would be much better to hide this
race in the stack or the verbs driver rather than expose it for
error-prone implementation in every consumer (both user space and kernel
consumers would be vulnerable).

Todd Rimmer

From caitlinb at broadcom.com Thu Jul 27 13:47:17 2006
From: caitlinb at broadcom.com (Caitlin Bestler)
Date: Thu, 27 Jul 2006 13:47:17 -0700
Subject: [openib-general] posting send requests in RTR
In-Reply-To:
Message-ID: <54AD0F12E08D1541B826BE97C98F99F17447B1@NT-SJCA-0751.brcm.ad.broadcom.com>

Rimmer, Todd wrote:
>> From: Caitlin Bestler [mailto:caitlinb at broadcom.com]
>>
>>>
>>> Actually since WQEs are in memory, while in RTR the verbs driver
>>> could build the WQEs, post them to the QP, just not issue the
>>> doorbells.
>>
>>
>> That's an implementation assumption. While it is true for every HCA
>> and RNIC I am aware of it is still the type of assumption that a good
>> interface should not make. The other assumption is that the HCA/RNIC
>> does NOT notice that a QP has contents until the doorbell is rung.
>>
>> NOT ringing a doorbell does NOT guarantee that the send queue will
>> NOT be processed. An implementation that lacked full confidence in its
>> doorbells might have a background scan to look for non-empty send
>> queues as a safety mechanism.
>
> The assumption would not be in the interface, but rather in
> the hardware specific implementation of the verbs driver.
> Hence the appropriate assumption could be made for each
> hardware model.
> As needed the alternate TX queue could be
> implemented in the verbs driver.
>
>>
>> I still see a very simple split. If the hardware knows that it has
>> the QP fenced until RTS then there is no harm in letting send WQEs be
>> generated in RTR, otherwise it returns an error. Meanwhile, the verbs
>> consumer is simply told not to post to the send queue until the
>> Connection Established event.
>
> The challenge is that every consumer must then implement an
> alternate TX queue to handle this situation. It would be
> much better to hide this race in the stack or the verbs
> driver rather than expose it for error-prone implementation
> in every consumer (both user space and kernel consumers would be
> vulnerable).
>
> Todd Rimmer

That assumes that there is any valid reason for an application to
post send requests before the connection is established. While there
is clearly a need to post receive work requests before the connection
is established I cannot think of any reason why an application needs
to pre-prime the send queue.

Putting unneeded complexity in the definition of a hardware service
just invites more hardware dependencies and eventual hardware specific
bugs that will complicate life for application developers.

"Don't post until the connection is established" is very simple for
*both* the application and the verbs provider.

From sean.hefty at intel.com Thu Jul 27 14:24:53 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 27 Jul 2006 14:24:53 -0700
Subject: [openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM
In-Reply-To: <20060727190954.GB16415@mellanox.co.il>
Message-ID: <000001c6b1c3$1a5672d0$8fc8180a@amr.corp.intel.com>

>Hmm.
>This means we *still* need rdma_cm_establish().
>
>Now *that* is the simple fix we need for 2.6.18.
>Sean?

I'm not sure that we do. A user of the RDMA CM simply must wait until
they receive a connection established event before posting any sends.
This is the requirement placed on the RDMA CM user today.

The basic problem that this patch solves is that if a connection is
established, but the RTU is repeatedly lost, the user will get a failed
connection attempt for what should have been an active connection. I'm
not sure if anyone has hit this in practice.

- Sean

From troy at scl.ameslab.gov Thu Jul 27 14:39:03 2006
From: troy at scl.ameslab.gov (Troy Benjegerdes)
Date: Thu, 27 Jul 2006 16:39:03 -0500
Subject: [openib-general] debian packages for ehca?
Message-ID: <20060727213901.GA5187@minbar-g5.scl.ameslab.gov>

I see there are debian packages in testing for libibverbs and mthca..
is the ehca userspace component ready to be packaged and put in debian
testing? If not, what does it need yet?

From sean.hefty at intel.com Thu Jul 27 14:42:54 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 27 Jul 2006 14:42:54 -0700
Subject: [openib-general] posting send requests in RTR
In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F17447B1@NT-SJCA-0751.brcm.ad.broadcom.com>
Message-ID: <000101c6b1c5$9e4c1de0$8fc8180a@amr.corp.intel.com>

>That assumes that there is any valid reason for an application to
>post send requests before the connection is established. While there
>is clearly a need to post receive work requests before the connection
>is established I cannot think of any reason why an application needs
>to pre-prime the send queue.

It's not pre-priming the send queue. An application could pull a
completed receive work completion off of a CQ. The receive may very
well be a request that requires a response. At this point, the
connection is obviously established from the consumer's viewpoint, but
not necessarily from the viewpoint of the RDMA CM or IB CM. The response
must now be queued.

I believe that the problem can be limited under the following
application conditions:

1. The application uses the CQ with different QPs.
2. The application is on the passive (server) side of the connection.
3. The active (client) side sends a request to the server.

Even combined, these conditions could easily occur. IMO, the question is
do we pass this problem to the applications to deal with, or try to
handle it transparently under verbs. If we try to handle it under verbs,
can it be done in one place? How much would such checks affect the
performance of post send operations? And how would immediate or other
errors be handled when posting queued sends?

My personal take at the moment is to let the ULPs handle the problem.

- Sean

From rdreier at cisco.com Thu Jul 27 14:46:04 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 27 Jul 2006 14:46:04 -0700
Subject: [openib-general] debian packages for ehca?
In-Reply-To: <20060727213901.GA5187@minbar-g5.scl.ameslab.gov> (Troy Benjegerdes's message of "Thu, 27 Jul 2006 16:39:03 -0500")
References: <20060727213901.GA5187@minbar-g5.scl.ameslab.gov>
Message-ID:

>>>>> "Troy" == Troy Benjegerdes writes:

    Troy> I see there are debian packages in testing for libibverbs
    Troy> and mthca.. is the ehca userspace component ready to be
    Troy> packaged and put in debian testing? If not, what does it
    Troy> need yet?

I don't know the state of libehca really but the main thing missing
presumably is a Debian maintainer.

From caitlinb at broadcom.com Thu Jul 27 14:59:17 2006
From: caitlinb at broadcom.com (Caitlin Bestler)
Date: Thu, 27 Jul 2006 14:59:17 -0700
Subject: [openib-general] posting send requests in RTR
In-Reply-To: <000101c6b1c5$9e4c1de0$8fc8180a@amr.corp.intel.com>
Message-ID: <54AD0F12E08D1541B826BE97C98F99F17447DA@NT-SJCA-0751.brcm.ad.broadcom.com>

Sean Hefty wrote:
>> That assumes that there is any valid reason for an application to
>> post send requests before the connection is established. While there
>> is clearly a need to post receive work requests before the
>> connection is established I cannot think of any reason why an
>> application needs to pre-prime the send queue.
>
> It's not pre-priming the send queue.
An application could > pull a completed receive work completion off of a CQ. The > receive may very well be a request that requires a response. > At this point, the connection is obviously established from > the consumers viewpoint, but not necessarily by the viewpoint > of the RDMA CM or IB CM. The response must now be queued. > Or you could have the verbs provider infer that since a receive work request has completed then the connection is obviously established, and it could generate the "connection established" event. That way the Consumer always gets a "connection established" event. Alternately, it would be reasonable to simply document that a receive completion *implied* a connection established event, and therefore the application could post to the send queue after it reaped a receive completion (or got a connection established event). Most servers only transmit in response to requests received, so that could actually be an even simpler rule for many server applications. The iWARP angle on this question does not deal with queuing responses, but rather that RDMAC compliant RNICs assume that the host software is responsible for ensuring that the connection complies with the MPA requirement that the active side send the first MPA Frame. If the application posts to the send queue *before* the Connection Established event, an RDMAC compliant RNIC is likely to send it anyway, thereby putting the MPA connection at risk. If that DDP Segment is received by the active side before the MPA Response there is no guarantee that it will be processed correctly. From troy at scl.ameslab.gov Thu Jul 27 15:09:22 2006 From: troy at scl.ameslab.gov (Troy Benjegerdes) Date: Thu, 27 Jul 2006 17:09:22 -0500 Subject: [openib-general] more ehca errors.. Message-ID: <20060727220922.GB5187@minbar-g5.scl.ameslab.gov> Does this mean I have a mismatched kernel/firmware/ehca revision? 
This is the 2.6.17 kernel with a relatively recent ehca driver from subversion (SVNEHCA_0009) p5l3:~# p5l3:~# strace -ewrite -ewrite=all ibv_rc_pingpong write(3, "\0\0\0\0\0\4\0\2\0\0\0\0\377\350\10\230", 16) = 16 | 00000 00 00 00 00 00 04 00 02 00 00 00 00 ff e8 08 98 ........ ........ | write(3, "\0\0\0\3\0\4\0\1\0\0\0\0\377\350\10\270", 16) = 16 | 00000 00 00 00 03 00 04 00 01 00 00 00 00 ff e8 08 b8 ........ ........ | write(3, "\0\0\0\t\0\f\0\3\0\0\0\0\377\350\10h\0\0\0\0\20\1`\0\0"..., 48) = -1 EINVAL (Invalid argument) write(2, "Couldn\'t allocate MR\n", 21Couldn't allocate MR ) = 21 | 00000 43 6f 75 6c 64 6e 27 74 20 61 6c 6c 6f 63 61 74 Couldn't allocat | | 00010 65 20 4d 52 0a e MR. | p5l3:~# p5l3:~# p5l3:~# dmesg PU0007 000e0091:ehca_hcall_7arg_7ret HCAD_ERROR opcode=160 ret=ffffffffffffffd4 arg1=1000000003000004 arg2=5 arg3=10016000 arg4=1000 arg5=80000000000000 arg6=451573e0 arg7=0 out1=0 out2=0 out3=0 out4=0 out5=0 out6=0 out7=0 PU0007 00090454:ehca_reg_mr HCAD_ERROR hipz_alloc_mr failed, h_ret=ffffffffffffffd4 hca_hndl=1000000003000004 PU0007 00090478:ehca_reg_mr <<< ret=ffffffea shca=c000000002509800 e_mr=c0000003452d8e80 iova_start=0000000010016000 size=1000 acl=1 e_pd=c0000003451573e0 pginfo=c0000001c4e3ba70 num_pages=1 num_4k=1 PU0007 00090176:ehca_reg_user_mr <<< rc=ffffffffffffffea pd=c0000003451573e0 region=c0000003429d1ed0 mr_access_flags=1 udata=c0000001c4e3bba0 From sean.hefty at intel.com Thu Jul 27 15:10:10 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 27 Jul 2006 15:10:10 -0700 Subject: [openib-general] posting send requests in RTR In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F17447DA@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: <000201c6b1c9$6d714340$8fc8180a@amr.corp.intel.com> >Or you could have the verbs provider infer that since a receive >work request has completed then the connection is obviously >established, and it could generate the "connection established" >event. This is what is done today. 
The issue is that the connection established event is delivered asynchronously to the user's event queue and can be processed after the user has already pulled the receive work completion. >Alternately, it would be reasonable to simply document that a >receive completion *implied* a connection established event, >and therefore the application could post to the send queue >after it reaped a receive completion (or got a connection >established event). The problem is that the QP is not in the RTS state, so cannot accept sends. - Sean From mst at mellanox.co.il Thu Jul 27 15:13:54 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 28 Jul 2006 01:13:54 +0300 Subject: [openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM In-Reply-To: <000001c6b1c3$1a5672d0$8fc8180a@amr.corp.intel.com> References: <000001c6b1c3$1a5672d0$8fc8180a@amr.corp.intel.com> Message-ID: <20060727221354.GC17234@mellanox.co.il> Quoting r. Sean Hefty : > The basic problem that this patch solves is if a connection is established, > but the RTU is repeatedly lost, the user will get a failed connection attempt > for what should have been an active connection. I'm not sure if anyone has > hit this in practice. Neither am I, but still I'm a bit uncomfortable - after all, the spec explicitly says: "it should be pointed out that successful reception of the final message in a CM protocol message exchange cannot be guaranteed." 
-- MST From caitlinb at broadcom.com Thu Jul 27 15:18:38 2006 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Thu, 27 Jul 2006 15:18:38 -0700 Subject: [openib-general] posting send requests in RTR In-Reply-To: <000201c6b1c9$6d714340$8fc8180a@amr.corp.intel.com> Message-ID: <54AD0F12E08D1541B826BE97C98F99F17447E0@NT-SJCA-0751.brcm.ad.broadcom.com> Sean Hefty wrote: > >> Alternately, it would be reasonable to simply document that a receive >> completion *implied* a connection established event, and therefore >> the application could post to the send queue after it reaped a >> receive completion (or got a connection established event). > > The problem is that the QP is not in the RTS state, so cannot accept > sends. > Well, I suppose if your adapter can be in a state where it has completed a receive work request for a connection but is not yet convinced that that connection is established then it would have to queue those work completions somewhere. If that is all you are proposing then I have no objections, an iWARP adapter can never be in such a state. But I am curious as to why completing a receive work request does not place the QP in the RTS state since the end-to-end QP pairing has obviously been confirmed, and therefore the QP can send. From krkumar2 at in.ibm.com Thu Jul 27 21:50:01 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Fri, 28 Jul 2006 10:20:01 +0530 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. Message-ID: <20060728045001.9426.85822.sendpatchset@K50wks273950wss.in.ibm.com> This patchset is a proposal to create new API's and data structures with transport neutral names. The idea is to remove the old API once all libraries/applications/examples are gradually converted to use the new API. Patch 1/6 - Changes to libibverbs configuration file to build the libibverbs with the new API. Patch 2/6 - Additions to include files in libibverbs for the new API. Patch 3/6 - Source files in libibverbs defining the new API. 
Patch 4/6 - Convert librdmacm examples to use the new API. Patch 5/6 - Convert librdmacm include files to use the new libibverbs API. Patch 6/6 - Convert librdmacm source files to use the new libibverbs API. Information notes found during the changes : -------------------------------------------- - Added LIBRDMAVERBS_DRIVER_PATH and also use old OPENIB_DRIVER_PATH_ENV for backwards compatibility, but have not set user_path to include OPENIB_DRIVER_PATH_ENV results. - Currently ibv_driver_init is implemented in all drivers. But the function returns a "struct ibv_driver *", while we expect "struct rdma_driver *". In reality this is fine as they are both pointers pointing to identical objects. Otherwise each driver has to be changed now. Once all drivers are changed to use rdma_* API's, this will not be an issue. - IB specific routines are also converted to use RDMA generic API's for sake of uniformness (knowing that transport dependent names will be removed once all apps are converted). - Passing different pointer to verbs, though the end result is the same (no warnings generated though as this is a link-time trick). Eg : int rdma_query_device(struct rdma_context *context, struct rdma_device_attr *device_attr) { return context->ops.query_device(context, device_attr); } However this will not be an issue once the drivers are changed to use the new API. Eg : int mthca_query_device(struct rdma_context *context, struct rdma_device_attr *attr) Signed-off-by: Krishna Kumar From krkumar2 at in.ibm.com Thu Jul 27 21:50:08 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Fri, 28 Jul 2006 10:20:08 +0530 Subject: [openib-general] [PATCH 1/6] libibverbs include files changes. In-Reply-To: <20060728045001.9426.85822.sendpatchset@K50wks273950wss.in.ibm.com> References: <20060728045001.9426.85822.sendpatchset@K50wks273950wss.in.ibm.com> Message-ID: <20060728045008.9426.53690.sendpatchset@K50wks273950wss.in.ibm.com> Additions to include files in libibverbs for the new API. 
Signed-off-by: Krishna Kumar diff -ruNp ORG/libibverbs/include/infiniband/arch.h NEW/libibverbs/include/infiniband/arch.h --- ORG/libibverbs/include/infiniband/arch.h 2006-07-27 15:34:18.000000000 +0530 +++ NEW/libibverbs/include/infiniband/arch.h 2006-07-27 15:34:44.000000000 +0530 @@ -32,8 +32,8 @@ * $Id: arch.h 8358 2006-07-04 20:38:54Z roland $ */ -#ifndef INFINIBAND_ARCH_H -#define INFINIBAND_ARCH_H +#ifndef RDMA_ARCH_H +#define RDMA_ARCH_H #include #include @@ -92,4 +92,4 @@ static inline uint64_t ntohll(uint64_t x #endif -#endif /* INFINIBAND_ARCH_H */ +#endif /* RDMA_ARCH_H */ diff -ruNp ORG/libibverbs/include/infiniband/driver.h NEW/libibverbs/include/infiniband/driver.h --- ORG/libibverbs/include/infiniband/driver.h 2006-07-27 15:34:18.000000000 +0530 +++ NEW/libibverbs/include/infiniband/driver.h 2006-07-27 15:34:44.000000000 +0530 @@ -34,8 +34,8 @@ * $Id: driver.h 7484 2006-05-24 21:12:21Z roland $ */ -#ifndef INFINIBAND_DRIVER_H -#define INFINIBAND_DRIVER_H +#ifndef RDMA_DRIVER_H +#define RDMA_DRIVER_H #include #include @@ -143,4 +143,89 @@ const char *ibv_get_sysfs_path(void); int ibv_read_sysfs_file(const char *dir, const char *file, char *buf, size_t size); -#endif /* INFINIBAND_DRIVER_H */ +/* Generic RDMA declarations follow */ + +typedef struct rdma_device *(*rdma_driver_init_func)(const char *, int); + +int rdma_cmd_get_context(struct rdma_context *context, + struct rdma_get_context *cmd, size_t cmd_size, + struct rdma_get_context_resp *resp, size_t resp_size); +int rdma_cmd_query_device(struct rdma_context *context, + struct rdma_device_attr *device_attr, + uint64_t *raw_fw_ver, + struct rdma_query_device *cmd, size_t cmd_size); +int rdma_cmd_query_port(struct rdma_context *context, uint8_t port_num, + struct rdma_port_attr *port_attr, + struct rdma_query_port *cmd, size_t cmd_size); +int rdma_cmd_query_gid(struct rdma_context *context, uint8_t port_num, + int index, union rdma_gid *gid); +int rdma_cmd_query_pkey(struct rdma_context *context, 
uint8_t port_num, + int index, uint16_t *pkey); +int rdma_cmd_alloc_pd(struct rdma_context *context, struct rdma_pd *pd, + struct rdma_alloc_pd *cmd, size_t cmd_size, + struct rdma_alloc_pd_resp *resp, size_t resp_size); +int rdma_cmd_dealloc_pd(struct rdma_pd *pd); +int rdma_cmd_reg_mr(struct rdma_pd *pd, void *addr, size_t length, + uint64_t hca_va, enum rdma_access_flags access, + struct rdma_mr *mr, struct rdma_reg_mr *cmd, + size_t cmd_size); +int rdma_cmd_dereg_mr(struct rdma_mr *mr); +int rdma_cmd_create_cq(struct rdma_context *context, int cqe, + struct rdma_comp_channel *channel, + int comp_vector, struct rdma_cq *cq, + struct rdma_create_cq *cmd, size_t cmd_size, + struct rdma_create_cq_resp *resp, size_t resp_size); +int rdma_cmd_poll_cq(struct rdma_cq *cq, int ne, struct rdma_wc *wc); +int rdma_cmd_req_notify_cq(struct rdma_cq *cq, int solicited_only); +int rdma_cmd_resize_cq(struct rdma_cq *cq, int cqe, + struct rdma_resize_cq *cmd, size_t cmd_size); +int rdma_cmd_destroy_cq(struct rdma_cq *cq); + +int rdma_cmd_create_srq(struct rdma_pd *pd, + struct rdma_srq *srq, struct rdma_srq_init_attr *attr, + struct rdma_create_srq *cmd, size_t cmd_size, + struct rdma_create_srq_resp *resp, size_t resp_size); +int rdma_cmd_modify_srq(struct rdma_srq *srq, + struct rdma_srq_attr *srq_attr, + enum rdma_srq_attr_mask srq_attr_mask, + struct rdma_modify_srq *cmd, size_t cmd_size); +int rdma_cmd_query_srq(struct rdma_srq *srq, + struct rdma_srq_attr *srq_attr, + struct rdma_query_srq *cmd, size_t cmd_size); +int rdma_cmd_destroy_srq(struct rdma_srq *srq); + +int rdma_cmd_create_qp(struct rdma_pd *pd, + struct rdma_qp *qp, struct rdma_qp_init_attr *attr, + struct rdma_create_qp *cmd, size_t cmd_size, + struct rdma_create_qp_resp *resp, size_t resp_size); +int rdma_cmd_query_qp(struct rdma_qp *qp, struct rdma_qp_attr *qp_attr, + enum rdma_qp_attr_mask attr_mask, + struct rdma_qp_init_attr *qp_init_attr, + struct rdma_query_qp *cmd, size_t cmd_size); +int 
rdma_cmd_modify_qp(struct rdma_qp *qp, struct rdma_qp_attr *attr, + enum rdma_qp_attr_mask attr_mask, + struct rdma_modify_qp *cmd, size_t cmd_size); +int rdma_cmd_destroy_qp(struct rdma_qp *qp); +int rdma_cmd_post_send(struct rdma_qp *ibqp, struct rdma_send_wr *wr, + struct rdma_send_wr **bad_wr); +int rdma_cmd_post_recv(struct rdma_qp *ibqp, struct rdma_recv_wr *wr, + struct rdma_recv_wr **bad_wr); +int rdma_cmd_post_srq_recv(struct rdma_srq *srq, struct rdma_recv_wr *wr, + struct rdma_recv_wr **bad_wr); +int rdma_cmd_create_ah(struct rdma_pd *pd, struct rdma_ah *ah, + struct rdma_ah_attr *attr); +int rdma_cmd_destroy_ah(struct rdma_ah *ah); +int rdma_cmd_attach_mcast(struct rdma_qp *qp, union rdma_gid *gid, + uint16_t lid); +int rdma_cmd_detach_mcast(struct rdma_qp *qp, union rdma_gid *gid, + uint16_t lid); + +/* + * sysfs helper functions + */ +const char *rdma_get_sysfs_path(void); + +int rdma_read_sysfs_file(const char *dir, const char *file, + char *buf, size_t size); + +#endif /* RDMA_DRIVER_H */ diff -ruNp ORG/libibverbs/include/infiniband/kern-abi.h NEW/libibverbs/include/infiniband/kern-abi.h --- ORG/libibverbs/include/infiniband/kern-abi.h 2006-07-27 15:34:18.000000000 +0530 +++ NEW/libibverbs/include/infiniband/kern-abi.h 2006-07-27 15:34:44.000000000 +0530 @@ -878,4 +878,840 @@ struct ibv_create_srq_resp_v5 { __u32 srq_handle; }; +/* Generic RDMA declarations follow */ + +/* + * The minimum and maximum kernel ABI that we can handle. 
+ */ +#define RDMA_USER_VERBS_MIN_ABI_VERSION 1 +#define RDMA_USER_VERBS_MAX_ABI_VERSION 6 + +enum { + RDMA_USER_VERBS_CMD_GET_CONTEXT, + RDMA_USER_VERBS_CMD_QUERY_DEVICE, + RDMA_USER_VERBS_CMD_QUERY_PORT, + RDMA_USER_VERBS_CMD_ALLOC_PD, + RDMA_USER_VERBS_CMD_DEALLOC_PD, + RDMA_USER_VERBS_CMD_CREATE_AH, + RDMA_USER_VERBS_CMD_MODIFY_AH, + RDMA_USER_VERBS_CMD_QUERY_AH, + RDMA_USER_VERBS_CMD_DESTROY_AH, + RDMA_USER_VERBS_CMD_REG_MR, + RDMA_USER_VERBS_CMD_REG_SMR, + RDMA_USER_VERBS_CMD_REREG_MR, + RDMA_USER_VERBS_CMD_QUERY_MR, + RDMA_USER_VERBS_CMD_DEREG_MR, + RDMA_USER_VERBS_CMD_ALLOC_MW, + RDMA_USER_VERBS_CMD_BIND_MW, + RDMA_USER_VERBS_CMD_DEALLOC_MW, + RDMA_USER_VERBS_CMD_CREATE_COMP_CHANNEL, + RDMA_USER_VERBS_CMD_CREATE_CQ, + RDMA_USER_VERBS_CMD_RESIZE_CQ, + RDMA_USER_VERBS_CMD_DESTROY_CQ, + RDMA_USER_VERBS_CMD_POLL_CQ, + RDMA_USER_VERBS_CMD_PEEK_CQ, + RDMA_USER_VERBS_CMD_REQ_NOTIFY_CQ, + RDMA_USER_VERBS_CMD_CREATE_QP, + RDMA_USER_VERBS_CMD_QUERY_QP, + RDMA_USER_VERBS_CMD_MODIFY_QP, + RDMA_USER_VERBS_CMD_DESTROY_QP, + RDMA_USER_VERBS_CMD_POST_SEND, + RDMA_USER_VERBS_CMD_POST_RECV, + RDMA_USER_VERBS_CMD_ATTACH_MCAST, + RDMA_USER_VERBS_CMD_DETACH_MCAST, + RDMA_USER_VERBS_CMD_CREATE_SRQ, + RDMA_USER_VERBS_CMD_MODIFY_SRQ, + RDMA_USER_VERBS_CMD_QUERY_SRQ, + RDMA_USER_VERBS_CMD_DESTROY_SRQ, + RDMA_USER_VERBS_CMD_POST_SRQ_RECV +}; + +/* + * Make sure that all structs defined in this file remain laid out so + * that they pack the same way on 32-bit and 64-bit architectures (to + * avoid incompatibility between 32-bit userspace and 64-bit kernels). + * Specifically: + * - Do not use pointer types -- pass pointers in __u64 instead. + * - Make sure that any structure larger than 4 bytes is padded to a + * multiple of 8 bytes. Otherwise the structure size will be + * different between 32-bit and 64-bit architectures. 
+ */ + +struct rdma_kern_async_event { + __u64 element; + __u32 event_type; + __u32 reserved; +}; + +struct rdma_comp_event { + __u64 cq_handle; +}; + +/* + * All commands from userspace should start with a __u32 command field + * followed by __u16 in_words and out_words fields (which give the + * length of the command block and response buffer if any in 32-bit + * words). The kernel driver will read these fields first and read + * the rest of the command struct based on these value. + */ + +struct rdma_query_params { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; +}; + +struct rdma_query_params_resp { + __u32 num_cq_events; +}; + +struct rdma_get_context { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u64 driver_data[0]; +}; + +struct rdma_get_context_resp { + __u32 async_fd; + __u32 num_comp_vectors; +}; + +struct rdma_query_device { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u64 driver_data[0]; +}; + +struct rdma_query_device_resp { + __u64 fw_ver; + __u64 node_guid; + __u64 sys_image_guid; + __u64 max_mr_size; + __u64 page_size_cap; + __u32 vendor_id; + __u32 vendor_part_id; + __u32 hw_ver; + __u32 max_qp; + __u32 max_qp_wr; + __u32 device_cap_flags; + __u32 max_sge; + __u32 max_sge_rd; + __u32 max_cq; + __u32 max_cqe; + __u32 max_mr; + __u32 max_pd; + __u32 max_qp_rd_atom; + __u32 max_ee_rd_atom; + __u32 max_res_rd_atom; + __u32 max_qp_init_rd_atom; + __u32 max_ee_init_rd_atom; + __u32 atomic_cap; + __u32 max_ee; + __u32 max_rdd; + __u32 max_mw; + __u32 max_raw_ipv6_qp; + __u32 max_raw_ethy_qp; + __u32 max_mcast_grp; + __u32 max_mcast_qp_attach; + __u32 max_total_mcast_qp_attach; + __u32 max_ah; + __u32 max_fmr; + __u32 max_map_per_fmr; + __u32 max_srq; + __u32 max_srq_wr; + __u32 max_srq_sge; + __u16 max_pkeys; + __u8 local_ca_ack_delay; + __u8 phys_port_cnt; + __u8 reserved[4]; +}; + +struct rdma_query_port { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 
response; + __u8 port_num; + __u8 reserved[7]; + __u64 driver_data[0]; +}; + +struct rdma_query_port_resp { + __u32 port_cap_flags; + __u32 max_msg_sz; + __u32 bad_pkey_cntr; + __u32 qkey_viol_cntr; + __u32 gid_tbl_len; + __u16 pkey_tbl_len; + __u16 lid; + __u16 sm_lid; + __u8 state; + __u8 max_mtu; + __u8 active_mtu; + __u8 lmc; + __u8 max_vl_num; + __u8 sm_sl; + __u8 subnet_timeout; + __u8 init_type_reply; + __u8 active_width; + __u8 active_speed; + __u8 phys_state; + __u8 reserved[3]; +}; + +struct rdma_alloc_pd { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u64 driver_data[0]; +}; + +struct rdma_alloc_pd_resp { + __u32 pd_handle; +}; + +struct rdma_dealloc_pd { + __u32 command; + __u16 in_words; + __u16 out_words; + __u32 pd_handle; +}; + +struct rdma_reg_mr { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u64 start; + __u64 length; + __u64 hca_va; + __u32 pd_handle; + __u32 access_flags; + __u64 driver_data[0]; +}; + +struct rdma_reg_mr_resp { + __u32 mr_handle; + __u32 lkey; + __u32 rkey; +}; + +struct rdma_dereg_mr { + __u32 command; + __u16 in_words; + __u16 out_words; + __u32 mr_handle; +}; + +struct rdma_create_comp_channel { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; +}; + +struct rdma_create_comp_channel_resp { + __u32 fd; +}; + +struct rdma_create_cq { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u64 user_handle; + __u32 cqe; + __u32 comp_vector; + __s32 comp_channel; + __u32 reserved; + __u64 driver_data[0]; +}; + +struct rdma_create_cq_resp { + __u32 cq_handle; + __u32 cqe; +}; + +struct rdma_kern_wc { + __u64 wr_id; + __u32 status; + __u32 opcode; + __u32 vendor_err; + __u32 byte_len; + __u32 imm_data; + __u32 qp_num; + __u32 src_qp; + __u32 wc_flags; + __u16 pkey_index; + __u16 slid; + __u8 sl; + __u8 dlid_path_bits; + __u8 port_num; + __u8 reserved; +}; + +struct rdma_poll_cq { + __u32 command; + __u16 in_words; + __u16 
out_words; + __u64 response; + __u32 cq_handle; + __u32 ne; +}; + +struct rdma_poll_cq_resp { + __u32 count; + __u32 reserved; + struct rdma_kern_wc wc[0]; +}; + +struct rdma_req_notify_cq { + __u32 command; + __u16 in_words; + __u16 out_words; + __u32 cq_handle; + __u32 solicited; +}; + +struct rdma_resize_cq { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u32 cq_handle; + __u32 cqe; + __u64 driver_data[0]; +}; + +struct rdma_resize_cq_resp { + __u32 cqe; +}; + +struct rdma_destroy_cq { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u32 cq_handle; + __u32 reserved; +}; + +struct rdma_destroy_cq_resp { + __u32 comp_events_reported; + __u32 async_events_reported; +}; + +struct rdma_kern_global_route { + __u8 dgid[16]; + __u32 flow_label; + __u8 sgid_index; + __u8 hop_limit; + __u8 traffic_class; + __u8 reserved; +}; + +struct rdma_kern_ah_attr { + struct rdma_kern_global_route grh; + __u16 dlid; + __u8 sl; + __u8 src_path_bits; + __u8 static_rate; + __u8 is_global; + __u8 port_num; + __u8 reserved; +}; + +struct rdma_kern_qp_attr { + __u32 qp_attr_mask; + __u32 qp_state; + __u32 cur_qp_state; + __u32 path_mtu; + __u32 path_mig_state; + __u32 qkey; + __u32 rq_psn; + __u32 sq_psn; + __u32 dest_qp_num; + __u32 qp_access_flags; + + struct rdma_kern_ah_attr ah_attr; + struct rdma_kern_ah_attr alt_ah_attr; + + /* ib_qp_cap */ + __u32 max_send_wr; + __u32 max_recv_wr; + __u32 max_send_sge; + __u32 max_recv_sge; + __u32 max_inline_data; + + __u16 pkey_index; + __u16 alt_pkey_index; + __u8 en_sqd_async_notify; + __u8 sq_draining; + __u8 max_rd_atomic; + __u8 max_dest_rd_atomic; + __u8 min_rnr_timer; + __u8 port_num; + __u8 timeout; + __u8 retry_cnt; + __u8 rnr_retry; + __u8 alt_port_num; + __u8 alt_timeout; + __u8 reserved[5]; +}; + +struct rdma_create_qp { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u64 user_handle; + __u32 pd_handle; + __u32 send_cq_handle; + __u32 recv_cq_handle; + 
__u32 srq_handle; + __u32 max_send_wr; + __u32 max_recv_wr; + __u32 max_send_sge; + __u32 max_recv_sge; + __u32 max_inline_data; + __u8 sq_sig_all; + __u8 qp_type; + __u8 is_srq; + __u8 reserved; + __u64 driver_data[0]; +}; + +struct rdma_create_qp_resp { + __u32 qp_handle; + __u32 qpn; + __u32 max_send_wr; + __u32 max_recv_wr; + __u32 max_send_sge; + __u32 max_recv_sge; + __u32 max_inline_data; + __u32 reserved; +}; + +struct rdma_qp_dest { + __u8 dgid[16]; + __u32 flow_label; + __u16 dlid; + __u16 reserved; + __u8 sgid_index; + __u8 hop_limit; + __u8 traffic_class; + __u8 sl; + __u8 src_path_bits; + __u8 static_rate; + __u8 is_global; + __u8 port_num; +}; + +struct rdma_query_qp { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u32 qp_handle; + __u32 attr_mask; + __u64 driver_data[0]; +}; + +struct rdma_query_qp_resp { + struct rdma_qp_dest dest; + struct rdma_qp_dest alt_dest; + __u32 max_send_wr; + __u32 max_recv_wr; + __u32 max_send_sge; + __u32 max_recv_sge; + __u32 max_inline_data; + __u32 qkey; + __u32 rq_psn; + __u32 sq_psn; + __u32 dest_qp_num; + __u32 qp_access_flags; + __u16 pkey_index; + __u16 alt_pkey_index; + __u8 qp_state; + __u8 cur_qp_state; + __u8 path_mtu; + __u8 path_mig_state; + __u8 en_sqd_async_notify; + __u8 max_rd_atomic; + __u8 max_dest_rd_atomic; + __u8 min_rnr_timer; + __u8 port_num; + __u8 timeout; + __u8 retry_cnt; + __u8 rnr_retry; + __u8 alt_port_num; + __u8 alt_timeout; + __u8 sq_sig_all; + __u8 reserved[5]; + __u64 driver_data[0]; +}; + +struct rdma_modify_qp { + __u32 command; + __u16 in_words; + __u16 out_words; + struct rdma_qp_dest dest; + struct rdma_qp_dest alt_dest; + __u32 qp_handle; + __u32 attr_mask; + __u32 qkey; + __u32 rq_psn; + __u32 sq_psn; + __u32 dest_qp_num; + __u32 qp_access_flags; + __u16 pkey_index; + __u16 alt_pkey_index; + __u8 qp_state; + __u8 cur_qp_state; + __u8 path_mtu; + __u8 path_mig_state; + __u8 en_sqd_async_notify; + __u8 max_rd_atomic; + __u8 max_dest_rd_atomic; + __u8 
min_rnr_timer; + __u8 port_num; + __u8 timeout; + __u8 retry_cnt; + __u8 rnr_retry; + __u8 alt_port_num; + __u8 alt_timeout; + __u8 reserved[2]; + __u64 driver_data[0]; +}; + +struct rdma_destroy_qp { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u32 qp_handle; + __u32 reserved; +}; + +struct rdma_destroy_qp_resp { + __u32 events_reported; +}; + +struct rdma_kern_send_wr { + __u64 wr_id; + __u32 num_sge; + __u32 opcode; + __u32 send_flags; + __u32 imm_data; + union { + struct { + __u64 remote_addr; + __u32 rkey; + __u32 reserved; + } rdma; + struct { + __u64 remote_addr; + __u64 compare_add; + __u64 swap; + __u32 rkey; + __u32 reserved; + } atomic; + struct { + __u32 ah; + __u32 remote_qpn; + __u32 remote_qkey; + __u32 reserved; + } ud; + } wr; +}; + +struct rdma_post_send { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u32 qp_handle; + __u32 wr_count; + __u32 sge_count; + __u32 wqe_size; + struct rdma_kern_send_wr send_wr[0]; +}; + +struct rdma_post_send_resp { + __u32 bad_wr; +}; + +struct rdma_kern_recv_wr { + __u64 wr_id; + __u32 num_sge; + __u32 reserved; +}; + +struct rdma_post_recv { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u32 qp_handle; + __u32 wr_count; + __u32 sge_count; + __u32 wqe_size; + struct rdma_kern_recv_wr recv_wr[0]; +}; + +struct rdma_post_recv_resp { + __u32 bad_wr; +}; + +struct rdma_post_srq_recv { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u32 srq_handle; + __u32 wr_count; + __u32 sge_count; + __u32 wqe_size; + struct rdma_kern_recv_wr recv_wr[0]; +}; + +struct rdma_post_srq_recv_resp { + __u32 bad_wr; +}; + +struct rdma_create_ah { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u64 user_handle; + __u32 pd_handle; + __u32 reserved; + struct rdma_kern_ah_attr attr; +}; + +struct rdma_create_ah_resp { + __u32 handle; +}; + +struct rdma_destroy_ah { + __u32 command; + __u16 in_words; + 
__u16 out_words; + __u32 ah_handle; +}; + +struct rdma_attach_mcast { + __u32 command; + __u16 in_words; + __u16 out_words; + __u8 gid[16]; + __u32 qp_handle; + __u16 mlid; + __u16 reserved; + __u64 driver_data[0]; +}; + +struct rdma_detach_mcast { + __u32 command; + __u16 in_words; + __u16 out_words; + __u8 gid[16]; + __u32 qp_handle; + __u16 mlid; + __u16 reserved; + __u64 driver_data[0]; +}; + +struct rdma_create_srq { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u64 user_handle; + __u32 pd_handle; + __u32 max_wr; + __u32 max_sge; + __u32 srq_limit; + __u64 driver_data[0]; +}; + +struct rdma_create_srq_resp { + __u32 srq_handle; + __u32 max_wr; + __u32 max_sge; + __u32 reserved; +}; + +struct rdma_modify_srq { + __u32 command; + __u16 in_words; + __u16 out_words; + __u32 srq_handle; + __u32 attr_mask; + __u32 max_wr; + __u32 srq_limit; + __u64 driver_data[0]; +}; + +struct rdma_query_srq { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u32 srq_handle; + __u32 reserved; + __u64 driver_data[0]; +}; + +struct rdma_query_srq_resp { + __u32 max_wr; + __u32 max_sge; + __u32 srq_limit; + __u32 reserved; +}; + +struct rdma_destroy_srq { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u32 srq_handle; + __u32 reserved; +}; + +struct rdma_destroy_srq_resp { + __u32 events_reported; +}; + +/* + * Compatibility with older ABI versions + */ + +enum { + RDMA_USER_VERBS_CMD_QUERY_PARAMS_V2, + RDMA_USER_VERBS_CMD_GET_CONTEXT_V2, + RDMA_USER_VERBS_CMD_QUERY_DEVICE_V2, + RDMA_USER_VERBS_CMD_QUERY_PORT_V2, + RDMA_USER_VERBS_CMD_QUERY_GID_V2, + RDMA_USER_VERBS_CMD_QUERY_PKEY_V2, + RDMA_USER_VERBS_CMD_ALLOC_PD_V2, + RDMA_USER_VERBS_CMD_DEALLOC_PD_V2, + RDMA_USER_VERBS_CMD_CREATE_AH_V2, + RDMA_USER_VERBS_CMD_MODIFY_AH_V2, + RDMA_USER_VERBS_CMD_QUERY_AH_V2, + RDMA_USER_VERBS_CMD_DESTROY_AH_V2, + RDMA_USER_VERBS_CMD_REG_MR_V2, + RDMA_USER_VERBS_CMD_REG_SMR_V2, + RDMA_USER_VERBS_CMD_REREG_MR_V2, + 
RDMA_USER_VERBS_CMD_QUERY_MR_V2, + RDMA_USER_VERBS_CMD_DEREG_MR_V2, + RDMA_USER_VERBS_CMD_ALLOC_MW_V2, + RDMA_USER_VERBS_CMD_BIND_MW_V2, + RDMA_USER_VERBS_CMD_DEALLOC_MW_V2, + RDMA_USER_VERBS_CMD_CREATE_CQ_V2, + RDMA_USER_VERBS_CMD_RESIZE_CQ_V2, + RDMA_USER_VERBS_CMD_DESTROY_CQ_V2, + RDMA_USER_VERBS_CMD_POLL_CQ_V2, + RDMA_USER_VERBS_CMD_PEEK_CQ_V2, + RDMA_USER_VERBS_CMD_REQ_NOTIFY_CQ_V2, + RDMA_USER_VERBS_CMD_CREATE_QP_V2, + RDMA_USER_VERBS_CMD_QUERY_QP_V2, + RDMA_USER_VERBS_CMD_MODIFY_QP_V2, + RDMA_USER_VERBS_CMD_DESTROY_QP_V2, + RDMA_USER_VERBS_CMD_POST_SEND_V2, + RDMA_USER_VERBS_CMD_POST_RECV_V2, + RDMA_USER_VERBS_CMD_ATTACH_MCAST_V2, + RDMA_USER_VERBS_CMD_DETACH_MCAST_V2, + RDMA_USER_VERBS_CMD_CREATE_SRQ_V2, + RDMA_USER_VERBS_CMD_MODIFY_SRQ_V2, + RDMA_USER_VERBS_CMD_QUERY_SRQ_V2, + RDMA_USER_VERBS_CMD_DESTROY_SRQ_V2, + RDMA_USER_VERBS_CMD_POST_SRQ_RECV_V2, + /* + * Set commands that didn't exist to -1 so our compile-time + * trick opcodes in RDMA_INIT_CMD() doesn't break. + */ + RDMA_USER_VERBS_CMD_CREATE_COMP_CHANNEL_V2 = -1, +}; + +struct rdma_destroy_cq_v1 { + __u32 command; + __u16 in_words; + __u16 out_words; + __u32 cq_handle; +}; + +struct rdma_destroy_qp_v1 { + __u32 command; + __u16 in_words; + __u16 out_words; + __u32 qp_handle; +}; + +struct rdma_destroy_srq_v1 { + __u32 command; + __u16 in_words; + __u16 out_words; + __u32 srq_handle; +}; + +struct rdma_get_context_v2 { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u64 cq_fd_tab; + __u64 driver_data[0]; +}; + +struct rdma_create_cq_v2 { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u64 user_handle; + __u32 cqe; + __u32 event_handler; + __u64 driver_data[0]; +}; + +struct rdma_modify_srq_v3 { + __u32 command; + __u16 in_words; + __u16 out_words; + __u32 srq_handle; + __u32 attr_mask; + __u32 max_wr; + __u32 max_sge; + __u32 srq_limit; + __u32 reserved; + __u64 driver_data[0]; +}; + +struct rdma_create_qp_resp_v3 { + __u32 qp_handle; + __u32 
qpn; +}; + +struct rdma_create_qp_resp_v4 { + __u32 qp_handle; + __u32 qpn; + __u32 max_send_wr; + __u32 max_recv_wr; + __u32 max_send_sge; + __u32 max_recv_sge; + __u32 max_inline_data; +}; + +struct rdma_create_srq_resp_v5 { + __u32 srq_handle; +}; + #endif /* KERN_ABI_H */ diff -ruNp ORG/libibverbs/include/infiniband/marshall.h NEW/libibverbs/include/infiniband/marshall.h --- ORG/libibverbs/include/infiniband/marshall.h 2006-07-27 15:34:18.000000000 +0530 +++ NEW/libibverbs/include/infiniband/marshall.h 2006-07-27 15:34:44.000000000 +0530 @@ -30,8 +30,8 @@ * SOFTWARE. */ -#ifndef INFINIBAND_MARSHALL_H -#define INFINIBAND_MARSHALL_H +#ifndef RDMA_MARSHALL_H +#define RDMA_MARSHALL_H #include #include @@ -60,6 +60,20 @@ void ibv_copy_path_rec_from_kern(struct void ibv_copy_path_rec_to_kern(struct ibv_kern_path_rec *dst, struct ibv_sa_path_rec *src); +/* Generic RDMA declarations follow */ + +void rdma_copy_qp_attr_from_kern(struct rdma_qp_attr *dst, + struct rdma_kern_qp_attr *src); + +void rdma_copy_ah_attr_from_kern(struct rdma_ah_attr *dst, + struct rdma_kern_ah_attr *src); + +void rdma_copy_path_rec_from_kern(struct rdma_sa_path_rec *dst, + struct rdma_kern_path_rec *src); + +void rdma_copy_path_rec_to_kern(struct rdma_kern_path_rec *dst, + struct rdma_sa_path_rec *src); + END_C_DECLS -#endif /* INFINIBAND_MARSHALL_H */ +#endif /* RDMA_MARSHALL_H */ diff -ruNp ORG/libibverbs/include/infiniband/opcode.h NEW/libibverbs/include/infiniband/opcode.h --- ORG/libibverbs/include/infiniband/opcode.h 2006-07-27 15:34:18.000000000 +0530 +++ NEW/libibverbs/include/infiniband/opcode.h 2006-07-27 15:34:44.000000000 +0530 @@ -32,8 +32,8 @@ * $Id: opcode.h 1989 2005-03-14 20:25:13Z roland $ */ -#ifndef INFINIBAND_OPCODE_H -#define INFINIBAND_OPCODE_H +#ifndef RDMA_OPCODE_H +#define RDMA_OPCODE_H /* * This macro cleans up the definitions of constants for BTH opcodes. 
@@ -146,4 +146,108 @@ enum { IBV_OPCODE(UD, SEND_ONLY_WITH_IMMEDIATE) }; -#endif /* INFINIBAND_OPCODE_H */ +/* Generic RDMA declarations follow */ + +#define RDMA_OPCODE(transport, op) \ + RDMA_OPCODE_ ## transport ## _ ## op = \ + RDMA_OPCODE_ ## transport + RDMA_OPCODE_ ## op + +enum { + /* transport types -- just used to define real constants */ + RDMA_OPCODE_RC = 0x00, + RDMA_OPCODE_UC = 0x20, + RDMA_OPCODE_RD = 0x40, + RDMA_OPCODE_UD = 0x60, + + /* operations -- just used to define real constants */ + RDMA_OPCODE_SEND_FIRST = 0x00, + RDMA_OPCODE_SEND_MIDDLE = 0x01, + RDMA_OPCODE_SEND_LAST = 0x02, + RDMA_OPCODE_SEND_LAST_WITH_IMMEDIATE = 0x03, + RDMA_OPCODE_SEND_ONLY = 0x04, + RDMA_OPCODE_SEND_ONLY_WITH_IMMEDIATE = 0x05, + RDMA_OPCODE_RDMA_WRITE_FIRST = 0x06, + RDMA_OPCODE_RDMA_WRITE_MIDDLE = 0x07, + RDMA_OPCODE_RDMA_WRITE_LAST = 0x08, + RDMA_OPCODE_RDMA_WRITE_LAST_WITH_IMMEDIATE = 0x09, + RDMA_OPCODE_RDMA_WRITE_ONLY = 0x0a, + RDMA_OPCODE_RDMA_WRITE_ONLY_WITH_IMMEDIATE = 0x0b, + RDMA_OPCODE_RDMA_READ_REQUEST = 0x0c, + RDMA_OPCODE_RDMA_READ_RESPONSE_FIRST = 0x0d, + RDMA_OPCODE_RDMA_READ_RESPONSE_MIDDLE = 0x0e, + RDMA_OPCODE_RDMA_READ_RESPONSE_LAST = 0x0f, + RDMA_OPCODE_RDMA_READ_RESPONSE_ONLY = 0x10, + RDMA_OPCODE_ACKNOWLEDGE = 0x11, + RDMA_OPCODE_ATOMIC_ACKNOWLEDGE = 0x12, + RDMA_OPCODE_COMPARE_SWAP = 0x13, + RDMA_OPCODE_FETCH_ADD = 0x14, + + /* real constants follow -- see comment about above RDMA_OPCODE() + macro for more details */ + + /* RC */ + RDMA_OPCODE(RC, SEND_FIRST), + RDMA_OPCODE(RC, SEND_MIDDLE), + RDMA_OPCODE(RC, SEND_LAST), + RDMA_OPCODE(RC, SEND_LAST_WITH_IMMEDIATE), + RDMA_OPCODE(RC, SEND_ONLY), + RDMA_OPCODE(RC, SEND_ONLY_WITH_IMMEDIATE), + RDMA_OPCODE(RC, RDMA_WRITE_FIRST), + RDMA_OPCODE(RC, RDMA_WRITE_MIDDLE), + RDMA_OPCODE(RC, RDMA_WRITE_LAST), + RDMA_OPCODE(RC, RDMA_WRITE_LAST_WITH_IMMEDIATE), + RDMA_OPCODE(RC, RDMA_WRITE_ONLY), + RDMA_OPCODE(RC, RDMA_WRITE_ONLY_WITH_IMMEDIATE), + RDMA_OPCODE(RC, RDMA_READ_REQUEST), + RDMA_OPCODE(RC, 
RDMA_READ_RESPONSE_FIRST), + RDMA_OPCODE(RC, RDMA_READ_RESPONSE_MIDDLE), + RDMA_OPCODE(RC, RDMA_READ_RESPONSE_LAST), + RDMA_OPCODE(RC, RDMA_READ_RESPONSE_ONLY), + RDMA_OPCODE(RC, ACKNOWLEDGE), + RDMA_OPCODE(RC, ATOMIC_ACKNOWLEDGE), + RDMA_OPCODE(RC, COMPARE_SWAP), + RDMA_OPCODE(RC, FETCH_ADD), + + /* UC */ + RDMA_OPCODE(UC, SEND_FIRST), + RDMA_OPCODE(UC, SEND_MIDDLE), + RDMA_OPCODE(UC, SEND_LAST), + RDMA_OPCODE(UC, SEND_LAST_WITH_IMMEDIATE), + RDMA_OPCODE(UC, SEND_ONLY), + RDMA_OPCODE(UC, SEND_ONLY_WITH_IMMEDIATE), + RDMA_OPCODE(UC, RDMA_WRITE_FIRST), + RDMA_OPCODE(UC, RDMA_WRITE_MIDDLE), + RDMA_OPCODE(UC, RDMA_WRITE_LAST), + RDMA_OPCODE(UC, RDMA_WRITE_LAST_WITH_IMMEDIATE), + RDMA_OPCODE(UC, RDMA_WRITE_ONLY), + RDMA_OPCODE(UC, RDMA_WRITE_ONLY_WITH_IMMEDIATE), + + /* RD */ + RDMA_OPCODE(RD, SEND_FIRST), + RDMA_OPCODE(RD, SEND_MIDDLE), + RDMA_OPCODE(RD, SEND_LAST), + RDMA_OPCODE(RD, SEND_LAST_WITH_IMMEDIATE), + RDMA_OPCODE(RD, SEND_ONLY), + RDMA_OPCODE(RD, SEND_ONLY_WITH_IMMEDIATE), + RDMA_OPCODE(RD, RDMA_WRITE_FIRST), + RDMA_OPCODE(RD, RDMA_WRITE_MIDDLE), + RDMA_OPCODE(RD, RDMA_WRITE_LAST), + RDMA_OPCODE(RD, RDMA_WRITE_LAST_WITH_IMMEDIATE), + RDMA_OPCODE(RD, RDMA_WRITE_ONLY), + RDMA_OPCODE(RD, RDMA_WRITE_ONLY_WITH_IMMEDIATE), + RDMA_OPCODE(RD, RDMA_READ_REQUEST), + RDMA_OPCODE(RD, RDMA_READ_RESPONSE_FIRST), + RDMA_OPCODE(RD, RDMA_READ_RESPONSE_MIDDLE), + RDMA_OPCODE(RD, RDMA_READ_RESPONSE_LAST), + RDMA_OPCODE(RD, RDMA_READ_RESPONSE_ONLY), + RDMA_OPCODE(RD, ACKNOWLEDGE), + RDMA_OPCODE(RD, ATOMIC_ACKNOWLEDGE), + RDMA_OPCODE(RD, COMPARE_SWAP), + RDMA_OPCODE(RD, FETCH_ADD), + + /* UD */ + RDMA_OPCODE(UD, SEND_ONLY), + RDMA_OPCODE(UD, SEND_ONLY_WITH_IMMEDIATE) +}; + +#endif /* RDMA_OPCODE_H */ diff -ruNp ORG/libibverbs/include/infiniband/sa.h NEW/libibverbs/include/infiniband/sa.h --- ORG/libibverbs/include/infiniband/sa.h 2006-07-27 15:34:18.000000000 +0530 +++ NEW/libibverbs/include/infiniband/sa.h 2006-07-27 15:34:44.000000000 +0530 @@ -33,8 +33,8 @@ * $Id: sa.h 2616 
2005-06-15 15:22:39Z halr $ */ -#ifndef INFINIBAND_SA_H -#define INFINIBAND_SA_H +#ifndef RDMA_SA_H +#define RDMA_SA_H #include @@ -99,4 +99,67 @@ struct ibv_sa_service_rec { uint64_t data64[2]; }; -#endif /* INFINIBAND_SA_H */ +/* Generic RDMA declarations follow */ + +struct rdma_sa_path_rec { + /* reserved */ + /* reserved */ + union rdma_gid dgid; + union rdma_gid sgid; + uint16_t dlid; + uint16_t slid; + int raw_traffic; + /* reserved */ + uint32_t flow_label; + uint8_t hop_limit; + uint8_t traffic_class; + int reversible; + uint8_t numb_path; + uint16_t pkey; + /* reserved */ + uint8_t sl; + uint8_t mtu_selector; + uint8_t mtu; + uint8_t rate_selector; + uint8_t rate; + uint8_t packet_life_time_selector; + uint8_t packet_life_time; + uint8_t preference; +}; + +struct rdma_sa_mcmember_rec { + union rdma_gid mgid; + union rdma_gid port_gid; + uint32_t qkey; + uint16_t mlid; + uint8_t mtu_selector; + uint8_t mtu; + uint8_t traffic_class; + uint16_t pkey; + uint8_t rate_selector; + uint8_t rate; + uint8_t packet_life_time_selector; + uint8_t packet_life_time; + uint8_t sl; + uint32_t flow_label; + uint8_t hop_limit; + uint8_t scope; + uint8_t join_state; + int proxy_join; +}; + +struct rdma_sa_service_rec { + uint64_t id; + union rdma_gid gid; + uint16_t pkey; + /* uint16_t resv; */ + uint32_t lease; + uint8_t key[16]; + uint8_t name[64]; + uint8_t data8[16]; + uint16_t data16[8]; + uint32_t data32[4]; + uint64_t data64[2]; +}; + +#endif /* RDMA_SA_H */ diff -ruNp ORG/libibverbs/include/infiniband/sa-kern-abi.h NEW/libibverbs/include/infiniband/sa-kern-abi.h --- ORG/libibverbs/include/infiniband/sa-kern-abi.h 2006-07-27 15:34:18.000000000 +0530 +++ NEW/libibverbs/include/infiniband/sa-kern-abi.h 2006-07-27 15:34:44.000000000 +0530 @@ -30,8 +30,8 @@ * SOFTWARE. 
*/ -#ifndef INFINIBAND_SA_KERN_ABI_H -#define INFINIBAND_SA_KERN_ABI_H +#ifndef RDMA_SA_KERN_ABI_H +#define RDMA_SA_KERN_ABI_H #include @@ -62,4 +62,28 @@ struct ibv_kern_path_rec { __u8 preference; }; -#endif /* INFINIBAND_SA_KERN_ABI_H */ +/* Generic RDMA declarations follow */ + +struct rdma_kern_path_rec { + __u8 dgid[16]; + __u8 sgid[16]; + __u16 dlid; + __u16 slid; + __u32 raw_traffic; + __u32 flow_label; + __u32 reversible; + __u32 mtu; + __u16 pkey; + __u8 hop_limit; + __u8 traffic_class; + __u8 numb_path; + __u8 sl; + __u8 mtu_selector; + __u8 rate_selector; + __u8 rate; + __u8 packet_life_time_selector; + __u8 packet_life_time; + __u8 preference; +}; + +#endif /* RDMA_SA_KERN_ABI_H */ diff -ruNp ORG/libibverbs/include/infiniband/verbs.h NEW/libibverbs/include/infiniband/verbs.h --- ORG/libibverbs/include/infiniband/verbs.h 2006-07-27 15:34:18.000000000 +0530 +++ NEW/libibverbs/include/infiniband/verbs.h 2006-07-27 15:34:44.000000000 +0530 @@ -35,8 +35,8 @@ * $Id: verbs.h 8076 2006-06-16 18:26:34Z sean.hefty $ */ -#ifndef INFINIBAND_VERBS_H -#define INFINIBAND_VERBS_H +#ifndef RDMA_VERBS_H +#define RDMA_VERBS_H #include #include @@ -1016,8 +1016,984 @@ int ibv_attach_mcast(struct ibv_qp *qp, */ int ibv_detach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid); +/* Generic RDMA declarations follow */ + +union rdma_gid { + uint8_t raw[16]; + struct { + uint64_t subnet_prefix; + uint64_t interface_id; + } global; +}; + +enum rdma_node_type { + RDMA_NODE_CA = 1, + RDMA_NODE_SWITCH, + RDMA_NODE_ROUTER +}; + +enum rdma_device_cap_flags { + RDMA_DEVICE_RESIZE_MAX_WR = 1, + RDMA_DEVICE_BAD_PKEY_CNTR = 1 << 1, + RDMA_DEVICE_BAD_QKEY_CNTR = 1 << 2, + RDMA_DEVICE_RAW_MULTI = 1 << 3, + RDMA_DEVICE_AUTO_PATH_MIG = 1 << 4, + RDMA_DEVICE_CHANGE_PHY_PORT = 1 << 5, + RDMA_DEVICE_UD_AV_PORT_ENFORCE = 1 << 6, + RDMA_DEVICE_CURR_QP_STATE_MOD = 1 << 7, + RDMA_DEVICE_SHUTDOWN_PORT = 1 << 8, + RDMA_DEVICE_INIT_TYPE = 1 << 9, + RDMA_DEVICE_PORT_ACTIVE_EVENT = 1 << 10, + 
RDMA_DEVICE_SYS_IMAGE_GUID = 1 << 11, + RDMA_DEVICE_RC_RNR_NAK_GEN = 1 << 12, + RDMA_DEVICE_SRQ_RESIZE = 1 << 13, + RDMA_DEVICE_N_NOTIFY_CQ = 1 << 14 +}; + +enum rdma_atomic_cap { + RDMA_ATOMIC_NONE, + RDMA_ATOMIC_HCA, + RDMA_ATOMIC_GLOB +}; + +struct rdma_device_attr { + char fw_ver[64]; + uint64_t node_guid; + uint64_t sys_image_guid; + uint64_t max_mr_size; + uint64_t page_size_cap; + uint32_t vendor_id; + uint32_t vendor_part_id; + uint32_t hw_ver; + int max_qp; + int max_qp_wr; + int device_cap_flags; + int max_sge; + int max_sge_rd; + int max_cq; + int max_cqe; + int max_mr; + int max_pd; + int max_qp_rd_atom; + int max_ee_rd_atom; + int max_res_rd_atom; + int max_qp_init_rd_atom; + int max_ee_init_rd_atom; + enum rdma_atomic_cap atomic_cap; + int max_ee; + int max_rdd; + int max_mw; + int max_raw_ipv6_qp; + int max_raw_ethy_qp; + int max_mcast_grp; + int max_mcast_qp_attach; + int max_total_mcast_qp_attach; + int max_ah; + int max_fmr; + int max_map_per_fmr; + int max_srq; + int max_srq_wr; + int max_srq_sge; + uint16_t max_pkeys; + uint8_t local_ca_ack_delay; + uint8_t phys_port_cnt; +}; + +enum rdma_mtu { + RDMA_MTU_256 = 1, + RDMA_MTU_512 = 2, + RDMA_MTU_1024 = 3, + RDMA_MTU_2048 = 4, + RDMA_MTU_4096 = 5 +}; + +enum rdma_port_state { + RDMA_PORT_NOP = 0, + RDMA_PORT_DOWN = 1, + RDMA_PORT_INIT = 2, + RDMA_PORT_ARMED = 3, + RDMA_PORT_ACTIVE = 4, + RDMA_PORT_ACTIVE_DEFER = 5 +}; + +struct rdma_port_attr { + enum rdma_port_state state; + enum rdma_mtu max_mtu; + enum rdma_mtu active_mtu; + int gid_tbl_len; + uint32_t port_cap_flags; + uint32_t max_msg_sz; + uint32_t bad_pkey_cntr; + uint32_t qkey_viol_cntr; + uint16_t pkey_tbl_len; + uint16_t lid; + uint16_t sm_lid; + uint8_t lmc; + uint8_t max_vl_num; + uint8_t sm_sl; + uint8_t subnet_timeout; + uint8_t init_type_reply; + uint8_t active_width; + uint8_t active_speed; + uint8_t phys_state; +}; + +enum rdma_event_type { + RDMA_EVENT_CQ_ERR, + RDMA_EVENT_QP_FATAL, + RDMA_EVENT_QP_REQ_ERR, + 
RDMA_EVENT_QP_ACCESS_ERR, + RDMA_EVENT_COMM_EST, + RDMA_EVENT_SQ_DRAINED, + RDMA_EVENT_PATH_MIG, + RDMA_EVENT_PATH_MIG_ERR, + RDMA_EVENT_DEVICE_FATAL, + RDMA_EVENT_PORT_ACTIVE, + RDMA_EVENT_PORT_ERR, + RDMA_EVENT_LID_CHANGE, + RDMA_EVENT_PKEY_CHANGE, + RDMA_EVENT_SM_CHANGE, + RDMA_EVENT_SRQ_ERR, + RDMA_EVENT_SRQ_LIMIT_REACHED, + RDMA_EVENT_QP_LAST_WQE_REACHED, + RDMA_EVENT_CLIENT_REREGISTER +}; + +struct rdma_async_event { + union { + struct rdma_cq *cq; + struct rdma_qp *qp; + struct rdma_srq *srq; + int port_num; + } element; + enum rdma_event_type event_type; +}; + +enum rdma_wc_status { + RDMA_WC_SUCCESS, + RDMA_WC_LOC_LEN_ERR, + RDMA_WC_LOC_QP_OP_ERR, + RDMA_WC_LOC_EEC_OP_ERR, + RDMA_WC_LOC_PROT_ERR, + RDMA_WC_WR_FLUSH_ERR, + RDMA_WC_MW_BIND_ERR, + RDMA_WC_BAD_RESP_ERR, + RDMA_WC_LOC_ACCESS_ERR, + RDMA_WC_REM_INV_REQ_ERR, + RDMA_WC_REM_ACCESS_ERR, + RDMA_WC_REM_OP_ERR, + RDMA_WC_RETRY_EXC_ERR, + RDMA_WC_RNR_RETRY_EXC_ERR, + RDMA_WC_LOC_RDD_VIOL_ERR, + RDMA_WC_REM_INV_RD_REQ_ERR, + RDMA_WC_REM_ABORT_ERR, + RDMA_WC_INV_EECN_ERR, + RDMA_WC_INV_EEC_STATE_ERR, + RDMA_WC_FATAL_ERR, + RDMA_WC_RESP_TIMEOUT_ERR, + RDMA_WC_GENERAL_ERR +}; + +enum rdma_wc_opcode { + RDMA_WC_SEND, + RDMA_WC_RDMA_WRITE, + RDMA_WC_RDMA_READ, + RDMA_WC_COMP_SWAP, + RDMA_WC_FETCH_ADD, + RDMA_WC_BIND_MW, +/* + * Set value of RDMA_WC_RECV so consumers can test if a completion is a + * receive by testing (opcode & RDMA_WC_RECV). 
+ */ + RDMA_WC_RECV = 1 << 7, + RDMA_WC_RECV_RDMA_WITH_IMM +}; + +enum rdma_wc_flags { + RDMA_WC_GRH = 1 << 0, + RDMA_WC_WITH_IMM = 1 << 1 +}; + +struct rdma_wc { + uint64_t wr_id; + enum rdma_wc_status status; + enum rdma_wc_opcode opcode; + uint32_t vendor_err; + uint32_t byte_len; + uint32_t imm_data; /* in network byte order */ + uint32_t qp_num; + uint32_t src_qp; + enum rdma_wc_flags wc_flags; + uint16_t pkey_index; + uint16_t slid; + uint8_t sl; + uint8_t dlid_path_bits; +}; + +enum rdma_access_flags { + RDMA_ACCESS_LOCAL_WRITE = 1, + RDMA_ACCESS_REMOTE_WRITE = (1<<1), + RDMA_ACCESS_REMOTE_READ = (1<<2), + RDMA_ACCESS_REMOTE_ATOMIC = (1<<3), + RDMA_ACCESS_MW_BIND = (1<<4) +}; + +struct rdma_pd { + struct rdma_context *context; + uint32_t handle; +}; + +struct rdma_mr { + struct rdma_context *context; + struct rdma_pd *pd; + uint32_t handle; + uint32_t lkey; + uint32_t rkey; +}; + +struct rdma_global_route { + union rdma_gid dgid; + uint32_t flow_label; + uint8_t sgid_index; + uint8_t hop_limit; + uint8_t traffic_class; +}; + +struct rdma_grh { + uint32_t version_tclass_flow; + uint16_t paylen; + uint8_t next_hdr; + uint8_t hop_limit; + union rdma_gid sgid; + union rdma_gid dgid; +}; + +enum rdma_rate { + RDMA_RATE_MAX = 0, + RDMA_RATE_2_5_GBPS = 2, + RDMA_RATE_5_GBPS = 5, + RDMA_RATE_10_GBPS = 3, + RDMA_RATE_20_GBPS = 6, + RDMA_RATE_30_GBPS = 4, + RDMA_RATE_40_GBPS = 7, + RDMA_RATE_60_GBPS = 8, + RDMA_RATE_80_GBPS = 9, + RDMA_RATE_120_GBPS = 10 +}; + +/** + * rdma_rate_to_mult - Convert the IB rate enum to a multiple of the + * base rate of 2.5 Gbit/sec. For example, RDMA_RATE_5_GBPS will be + * converted to 2, since 5 Gbit/sec is 2 * 2.5 Gbit/sec. + * @rate: rate to convert. + */ +int rdma_rate_to_mult(enum rdma_rate rate) __attribute_const; + +/** + * mult_to_rdma_rate - Convert a multiple of 2.5 Gbit/sec to an IB rate enum. + * @mult: multiple to convert. 
+ */ +enum rdma_rate mult_to_rdma_rate(int mult) __attribute_const; + +struct rdma_ah_attr { + struct rdma_global_route grh; + uint16_t dlid; + uint8_t sl; + uint8_t src_path_bits; + uint8_t static_rate; + uint8_t is_global; + uint8_t port_num; +}; + +enum rdma_srq_attr_mask { + RDMA_SRQ_MAX_WR = 1 << 0, + RDMA_SRQ_LIMIT = 1 << 1 +}; + +struct rdma_srq_attr { + uint32_t max_wr; + uint32_t max_sge; + uint32_t srq_limit; +}; + +struct rdma_srq_init_attr { + void *srq_context; + struct rdma_srq_attr attr; +}; + +enum rdma_qp_type { + RDMA_QPT_RC = 2, + RDMA_QPT_UC, + RDMA_QPT_UD +}; + +struct rdma_qp_cap { + uint32_t max_send_wr; + uint32_t max_recv_wr; + uint32_t max_send_sge; + uint32_t max_recv_sge; + uint32_t max_inline_data; +}; + +struct rdma_qp_init_attr { + void *qp_context; + struct rdma_cq *send_cq; + struct rdma_cq *recv_cq; + struct rdma_srq *srq; + struct rdma_qp_cap cap; + enum rdma_qp_type qp_type; + int sq_sig_all; +}; + +enum rdma_qp_attr_mask { + RDMA_QP_STATE = 1 << 0, + RDMA_QP_CUR_STATE = 1 << 1, + RDMA_QP_EN_SQD_ASYNC_NOTIFY = 1 << 2, + RDMA_QP_ACCESS_FLAGS = 1 << 3, + RDMA_QP_PKEY_INDEX = 1 << 4, + RDMA_QP_PORT = 1 << 5, + RDMA_QP_QKEY = 1 << 6, + RDMA_QP_AV = 1 << 7, + RDMA_QP_PATH_MTU = 1 << 8, + RDMA_QP_TIMEOUT = 1 << 9, + RDMA_QP_RETRY_CNT = 1 << 10, + RDMA_QP_RNR_RETRY = 1 << 11, + RDMA_QP_RQ_PSN = 1 << 12, + RDMA_QP_MAX_QP_RD_ATOMIC = 1 << 13, + RDMA_QP_ALT_PATH = 1 << 14, + RDMA_QP_MIN_RNR_TIMER = 1 << 15, + RDMA_QP_SQ_PSN = 1 << 16, + RDMA_QP_MAX_DEST_RD_ATOMIC = 1 << 17, + RDMA_QP_PATH_MIG_STATE = 1 << 18, + RDMA_QP_CAP = 1 << 19, + RDMA_QP_DEST_QPN = 1 << 20 +}; + +enum rdma_qp_state { + RDMA_QPS_RESET, + RDMA_QPS_INIT, + RDMA_QPS_RTR, + RDMA_QPS_RTS, + RDMA_QPS_SQD, + RDMA_QPS_SQE, + RDMA_QPS_ERR +}; + +enum rdma_mig_state { + RDMA_MIG_MIGRATED, + RDMA_MIG_REARM, + RDMA_MIG_ARMED +}; + +struct rdma_qp_attr { + enum rdma_qp_state qp_state; + enum rdma_qp_state cur_qp_state; + enum rdma_mtu path_mtu; + enum rdma_mig_state 
path_mig_state; + uint32_t qkey; + uint32_t rq_psn; + uint32_t sq_psn; + uint32_t dest_qp_num; + int qp_access_flags; + struct rdma_qp_cap cap; + struct rdma_ah_attr ah_attr; + struct rdma_ah_attr alt_ah_attr; + uint16_t pkey_index; + uint16_t alt_pkey_index; + uint8_t en_sqd_async_notify; + uint8_t sq_draining; + uint8_t max_rd_atomic; + uint8_t max_dest_rd_atomic; + uint8_t min_rnr_timer; + uint8_t port_num; + uint8_t timeout; + uint8_t retry_cnt; + uint8_t rnr_retry; + uint8_t alt_port_num; + uint8_t alt_timeout; +}; + +enum rdma_wr_opcode { + RDMA_WR_RDMA_WRITE, + RDMA_WR_RDMA_WRITE_WITH_IMM, + RDMA_WR_SEND, + RDMA_WR_SEND_WITH_IMM, + RDMA_WR_RDMA_READ, + RDMA_WR_ATOMIC_CMP_AND_SWP, + RDMA_WR_ATOMIC_FETCH_AND_ADD +}; + +enum rdma_send_flags { + RDMA_SEND_FENCE = 1 << 0, + RDMA_SEND_SIGNALED = 1 << 1, + RDMA_SEND_SOLICITED = 1 << 2, + RDMA_SEND_INLINE = 1 << 3 +}; + +struct rdma_sge { + uint64_t addr; + uint32_t length; + uint32_t lkey; +}; + +struct rdma_send_wr { + struct rdma_send_wr *next; + uint64_t wr_id; + struct rdma_sge *sg_list; + int num_sge; + enum rdma_wr_opcode opcode; + enum rdma_send_flags send_flags; + uint32_t imm_data; /* in network byte order */ + union { + struct { + uint64_t remote_addr; + uint32_t rkey; + } rdma; + struct { + uint64_t remote_addr; + uint64_t compare_add; + uint64_t swap; + uint32_t rkey; + } atomic; + struct { + struct rdma_ah *ah; + uint32_t remote_qpn; + uint32_t remote_qkey; + } ud; + } wr; +}; + +struct rdma_recv_wr { + struct rdma_recv_wr *next; + uint64_t wr_id; + struct rdma_sge *sg_list; + int num_sge; +}; + +struct rdma_srq { + struct rdma_context *context; + void *srq_context; + struct rdma_pd *pd; + uint32_t handle; + + pthread_mutex_t mutex; + pthread_cond_t cond; + uint32_t events_completed; +}; + +struct rdma_qp { + struct rdma_context *context; + void *qp_context; + struct rdma_pd *pd; + struct rdma_cq *send_cq; + struct rdma_cq *recv_cq; + struct rdma_srq *srq; + uint32_t handle; + uint32_t qp_num; + enum 
rdma_qp_state state; + enum rdma_qp_type qp_type; + + pthread_mutex_t mutex; + pthread_cond_t cond; + uint32_t events_completed; +}; + +struct rdma_comp_channel { + int fd; +}; + +struct rdma_cq { + struct rdma_context *context; + void *cq_context; + uint32_t handle; + int cqe; + + pthread_mutex_t mutex; + pthread_cond_t cond; + uint32_t comp_events_completed; + uint32_t async_events_completed; +}; + +struct rdma_ah { + struct rdma_context *context; + struct rdma_pd *pd; + uint32_t handle; +}; + +struct rdma_device; +struct rdma_context; + +struct rdma_device_ops { + struct rdma_context * (*alloc_context)(struct rdma_device *device, + int cmd_fd); + void (*free_context)(struct rdma_context *context); +}; + +enum { + RDMA_SYSFS_NAME_MAX = 64, + RDMA_SYSFS_PATH_MAX = 256 +}; + +struct rdma_device { + struct rdma_driver *driver; + struct rdma_device_ops ops; + /* Name of underlying kernel IB device, eg "mthca0" */ + char name[RDMA_SYSFS_NAME_MAX]; + /* Name of uverbs device, eg "uverbs0" */ + char dev_name[RDMA_SYSFS_NAME_MAX]; + /* Path to infiniband_verbs class device in sysfs */ + char dev_path[RDMA_SYSFS_PATH_MAX]; + /* Path to infiniband class device in sysfs */ + char ibdev_path[RDMA_SYSFS_PATH_MAX]; +}; + +struct rdma_context_ops { + int (*query_device)(struct rdma_context *context, + struct rdma_device_attr *device_attr); + int (*query_port)(struct rdma_context *context, + uint8_t port_num, + struct rdma_port_attr *port_attr); + struct rdma_pd * (*alloc_pd)(struct rdma_context *context); + int (*dealloc_pd)(struct rdma_pd *pd); + struct rdma_mr * (*reg_mr)(struct rdma_pd *pd, void *addr, + size_t length, + enum rdma_access_flags access); + int (*dereg_mr)(struct rdma_mr *mr); + struct rdma_cq * (*create_cq)(struct rdma_context *context, + int cqe, + struct rdma_comp_channel *channel, + int comp_vector); + int (*poll_cq)(struct rdma_cq *cq, int num_entries, + struct rdma_wc *wc); + int (*req_notify_cq)(struct rdma_cq *cq, + int solicited_only); + void 
(*cq_event)(struct rdma_cq *cq); + int (*resize_cq)(struct rdma_cq *cq, int cqe); + int (*destroy_cq)(struct rdma_cq *cq); + struct rdma_srq * (*create_srq)(struct rdma_pd *pd, + struct rdma_srq_init_attr *srq_init_attr); + int (*modify_srq)(struct rdma_srq *srq, + struct rdma_srq_attr *srq_attr, + enum rdma_srq_attr_mask srq_attr_mask); + int (*query_srq)(struct rdma_srq *srq, + struct rdma_srq_attr *srq_attr); + int (*destroy_srq)(struct rdma_srq *srq); + int (*post_srq_recv)(struct rdma_srq *srq, + struct rdma_recv_wr *recv_wr, + struct rdma_recv_wr **bad_recv_wr); + struct rdma_qp * (*create_qp)(struct rdma_pd *pd, + struct rdma_qp_init_attr *attr); + int (*query_qp)(struct rdma_qp *qp, + struct rdma_qp_attr *attr, + enum rdma_qp_attr_mask attr_mask, + struct rdma_qp_init_attr *init_attr); + int (*modify_qp)(struct rdma_qp *qp, + struct rdma_qp_attr *attr, + enum rdma_qp_attr_mask attr_mask); + int (*destroy_qp)(struct rdma_qp *qp); + int (*post_send)(struct rdma_qp *qp, + struct rdma_send_wr *wr, + struct rdma_send_wr **bad_wr); + int (*post_recv)(struct rdma_qp *qp, + struct rdma_recv_wr *wr, + struct rdma_recv_wr **bad_wr); + struct rdma_ah * (*create_ah)(struct rdma_pd *pd, + struct rdma_ah_attr *attr); + int (*destroy_ah)(struct rdma_ah *ah); + int (*attach_mcast)(struct rdma_qp *qp, + union rdma_gid *gid, + uint16_t lid); + int (*detach_mcast)(struct rdma_qp *qp, + union rdma_gid *gid, + uint16_t lid); +}; + +struct rdma_context { + struct rdma_device *device; + struct rdma_context_ops ops; + int cmd_fd; + int async_fd; + int num_comp_vectors; + void *abi_compat; +}; + +/** + * rdma_get_device_list - Get list of IB devices currently available + * @num_devices: optional. if non-NULL, set to the number of devices + * returned in the array. + * + * Return a NULL-terminated array of IB devices. The array can be + * released with rdma_free_device_list(). 
+ */ +struct rdma_device **rdma_get_device_list(int *num_devices); + +/** + * rdma_free_device_list - Free list from rdma_get_device_list() + * + * Free an array of devices returned from rdma_get_device_list(). Once + * the array is freed, pointers to devices that were not opened with + * rdma_open_device() are no longer valid. Client code must open all + * devices it intends to use before calling rdma_free_device_list(). + */ +void rdma_free_device_list(struct rdma_device **list); + +/** + * rdma_get_device_name - Return kernel device name + */ +const char *rdma_get_device_name(struct rdma_device *device); + +/** + * rdma_get_device_guid - Return device's node GUID + */ +uint64_t rdma_get_device_guid(struct rdma_device *device); + +/** + * rdma_open_device - Initialize device for use + */ +struct rdma_context *rdma_open_device(struct rdma_device *device); + +/** + * rdma_close_device - Release device + */ +int rdma_close_device(struct rdma_context *context); + +/** + * rdma_get_async_event - Get next async event + * @event: Pointer to use to return async event + * + * All async events returned by rdma_get_async_event() must eventually + * be acknowledged with rdma_ack_async_event(). + */ +int rdma_get_async_event(struct rdma_context *context, + struct rdma_async_event *event); + +/** + * rdma_ack_async_event - Acknowledge an async event + * @event: Event to be acknowledged. + * + * All async events which are returned by rdma_get_async_event() must + * be acknowledged. To avoid races, destroying an object (CQ, SRQ or + * QP) will wait for all affiliated events to be acknowledged, so + * there should be a one-to-one correspondence between acks and + * successful gets. 
+ */ +void rdma_ack_async_event(struct rdma_async_event *event); + +/** + * rdma_query_device - Get device properties + */ +int rdma_query_device(struct rdma_context *context, + struct rdma_device_attr *device_attr); + +/** + * rdma_query_port - Get port properties + */ +int rdma_query_port(struct rdma_context *context, uint8_t port_num, + struct rdma_port_attr *port_attr); + +/** + * rdma_query_gid - Get a GID table entry + */ +int rdma_query_gid(struct rdma_context *context, uint8_t port_num, + int index, union rdma_gid *gid); + +/** + * rdma_query_pkey - Get a P_Key table entry + */ +int rdma_query_pkey(struct rdma_context *context, uint8_t port_num, + int index, uint16_t *pkey); + +/** + * rdma_alloc_pd - Allocate a protection domain + */ +struct rdma_pd *rdma_alloc_pd(struct rdma_context *context); + +/** + * rdma_dealloc_pd - Free a protection domain + */ +int rdma_dealloc_pd(struct rdma_pd *pd); + +/** + * rdma_reg_mr - Register a memory region + */ +struct rdma_mr *rdma_reg_mr(struct rdma_pd *pd, void *addr, + size_t length, enum rdma_access_flags access); + +/** + * rdma_dereg_mr - Deregister a memory region + */ +int rdma_dereg_mr(struct rdma_mr *mr); + +/** + * rdma_create_comp_channel - Create a completion event channel + */ +struct rdma_comp_channel *rdma_create_comp_channel(struct rdma_context *context); + +/** + * rdma_destroy_comp_channel - Destroy a completion event channel + */ +int rdma_destroy_comp_channel(struct rdma_comp_channel *channel); + +/** + * rdma_create_cq - Create a completion queue + * @context - Context CQ will be attached to + * @cqe - Minimum number of entries required for CQ + * @cq_context - Consumer-supplied context returned for completion events + * @channel - Completion channel where completion events will be queued. + * May be NULL if completion events will not be used. + * @comp_vector - Completion vector used to signal completion events. + * Must be >= 0 and < context->num_comp_vectors. 
+ */ +struct rdma_cq *rdma_create_cq(struct rdma_context *context, int cqe, + void *cq_context, + struct rdma_comp_channel *channel, + int comp_vector); + +/** + * rdma_resize_cq - Modifies the capacity of the CQ. + * @cq: The CQ to resize. + * @cqe: The minimum size of the CQ. + * + * Users can examine the cq structure to determine the actual CQ size. + */ +int rdma_resize_cq(struct rdma_cq *cq, int cqe); + +/** + * rdma_destroy_cq - Destroy a completion queue + */ +int rdma_destroy_cq(struct rdma_cq *cq); + +/** + * rdma_get_cq_event - Read next CQ event + * @channel: Channel to get next event from. + * @cq: Used to return pointer to CQ. + * @cq_context: Used to return consumer-supplied CQ context. + * + * All completion events returned by rdma_get_cq_event() must + * eventually be acknowledged with rdma_ack_cq_events(). + */ +int rdma_get_cq_event(struct rdma_comp_channel *channel, + struct rdma_cq **cq, void **cq_context); + +/** + * rdma_ack_cq_events - Acknowledge CQ completion events + * @cq: CQ to acknowledge events for + * @nevents: Number of events to acknowledge. + * + * All completion events which are returned by rdma_get_cq_event() must + * be acknowledged. To avoid races, rdma_destroy_cq() will wait for + * all completion events to be acknowledged, so there should be a + * one-to-one correspondence between acks and successful gets. An + * application may accumulate multiple completion events and + * acknowledge them in a single call to rdma_ack_cq_events() by passing + * the number of events to ack in @nevents. + */ +void rdma_ack_cq_events(struct rdma_cq *cq, unsigned int nevents); + +/** + * rdma_poll_cq - Poll a CQ for work completions + * @cq:the CQ being polled + * @num_entries:maximum number of completions to return + * @wc:array of at least @num_entries of &struct rdma_wc where completions + * will be returned + * + * Poll a CQ for (possibly multiple) completions. If the return value + * is < 0, an error occurred. 
If the return value is >= 0, it is the + * number of completions returned. If the return value is + * non-negative and strictly less than num_entries, then the CQ was + * emptied. + */ +static inline int rdma_poll_cq(struct rdma_cq *cq, int num_entries, + struct rdma_wc *wc) +{ + return cq->context->ops.poll_cq(cq, num_entries, wc); +} + +/** + * rdma_req_notify_cq - Request completion notification on a CQ. An + * event will be added to the completion channel associated with the + * CQ when an entry is added to the CQ. + * @cq: The completion queue to request notification for. + * @solicited_only: If non-zero, an event will be generated only for + * the next solicited CQ entry. If zero, any CQ entry, solicited or + * not, will generate an event. + */ +static inline int rdma_req_notify_cq(struct rdma_cq *cq, int solicited_only) +{ + return cq->context->ops.req_notify_cq(cq, solicited_only); +} + +/** + * rdma_create_srq - Creates a SRQ associated with the specified protection + * domain. + * @pd: The protection domain associated with the SRQ. + * @srq_init_attr: A list of initial attributes required to create the SRQ. + * + * srq_attr->max_wr and srq_attr->max_sge are read to determine the + * requested size of the SRQ, and set to the actual values allocated + * on return. If rdma_create_srq() succeeds, then max_wr and max_sge + * will always be at least as large as the requested values. + */ +struct rdma_srq *rdma_create_srq(struct rdma_pd *pd, + struct rdma_srq_init_attr *srq_init_attr); + +/** + * rdma_modify_srq - Modifies the attributes for the specified SRQ. + * @srq: The SRQ to modify. + * @srq_attr: On input, specifies the SRQ attributes to modify. On output, + * the current values of selected SRQ attributes are returned. + * @srq_attr_mask: A bit-mask used to specify which attributes of the SRQ + * are being modified.
+ * + * The mask may contain RDMA_SRQ_MAX_WR to resize the SRQ and/or + * RDMA_SRQ_LIMIT to set the SRQ's limit and request notification when + * the number of receives queued drops below the limit. + */ +int rdma_modify_srq(struct rdma_srq *srq, + struct rdma_srq_attr *srq_attr, + enum rdma_srq_attr_mask srq_attr_mask); + +/** + * rdma_query_srq - Returns the attribute list and current values for the + * specified SRQ. + * @srq: The SRQ to query. + * @srq_attr: The attributes of the specified SRQ. + */ +int rdma_query_srq(struct rdma_srq *srq, struct rdma_srq_attr *srq_attr); + +/** + * rdma_destroy_srq - Destroys the specified SRQ. + * @srq: The SRQ to destroy. + */ +int rdma_destroy_srq(struct rdma_srq *srq); + +/** + * rdma_post_srq_recv - Posts a list of work requests to the specified SRQ. + * @srq: The SRQ to post the work request on. + * @recv_wr: A list of work requests to post on the receive queue. + * @bad_recv_wr: On an immediate failure, this parameter will reference + * the work request that failed to be posted on the QP. + */ +static inline int rdma_post_srq_recv(struct rdma_srq *srq, + struct rdma_recv_wr *recv_wr, + struct rdma_recv_wr **bad_recv_wr) +{ + return srq->context->ops.post_srq_recv(srq, recv_wr, bad_recv_wr); +} + +/** + * rdmav_create_qp - Create a queue pair. + */ +struct rdma_qp *rdmav_create_qp(struct rdma_pd *pd, + struct rdma_qp_init_attr *qp_init_attr); + +/** + * rdma_modify_qp - Modify a queue pair. + */ +int rdma_modify_qp(struct rdma_qp *qp, struct rdma_qp_attr *attr, + enum rdma_qp_attr_mask attr_mask); + +/** + * rdma_query_qp - Returns the attribute list and current values for the + * specified QP. + * @qp: The QP to query. + * @attr: The attributes of the specified QP. + * @attr_mask: A bit-mask used to select specific attributes to query. + * @init_attr: Additional attributes of the selected QP. + * + * The qp_attr_mask may be used to limit the query to gathering only the + * selected attributes. 
+ */ +int rdma_query_qp(struct rdma_qp *qp, struct rdma_qp_attr *attr, + enum rdma_qp_attr_mask attr_mask, + struct rdma_qp_init_attr *init_attr); + +/** + * rdmav_destroy_qp - Destroy a queue pair. + */ +int rdmav_destroy_qp(struct rdma_qp *qp); + +/** + * rdma_post_send - Post a list of work requests to a send queue. + */ +static inline int rdma_post_send(struct rdma_qp *qp, struct rdma_send_wr *wr, + struct rdma_send_wr **bad_wr) +{ + return qp->context->ops.post_send(qp, wr, bad_wr); +} + +/** + * rdma_post_recv - Post a list of work requests to a receive queue. + */ +static inline int rdma_post_recv(struct rdma_qp *qp, struct rdma_recv_wr *wr, + struct rdma_recv_wr **bad_wr) +{ + return qp->context->ops.post_recv(qp, wr, bad_wr); +} + +/** + * rdma_create_ah - Create an address handle. + */ +struct rdma_ah *rdma_create_ah(struct rdma_pd *pd, struct rdma_ah_attr *attr); + +/** + * rdma_init_ah_from_wc - Initializes address handle attributes from a + * work completion. + * @context: Device context on which the received message arrived. + * @port_num: Port on which the received message arrived. + * @wc: Work completion associated with the received message. + * @grh: References the received global route header. This parameter is + * ignored unless the work completion indicates that the GRH is valid. + * @ah_attr: Returned attributes that can be used when creating an address + * handle for replying to the message. + */ +int rdma_init_ah_from_wc(struct rdma_context *context, uint8_t port_num, + struct rdma_wc *wc, struct rdma_grh *grh, + struct rdma_ah_attr *ah_attr); + +/** + * rdma_create_ah_from_wc - Creates an address handle associated with the + * sender of the specified work completion. + * @pd: The protection domain associated with the address handle. + * @wc: Work completion information associated with a received message. + * @grh: References the received global route header. 
This parameter is + * ignored unless the work completion indicates that the GRH is valid. + * @port_num: The outbound port number to associate with the address. + * + * The address handle is used to reference a local or global destination + * in all UD QP post sends. + */ +struct rdma_ah *rdma_create_ah_from_wc(struct rdma_pd *pd, struct rdma_wc *wc, + struct rdma_grh *grh, uint8_t port_num); + +/** + * rdma_destroy_ah - Destroy an address handle. + */ +int rdma_destroy_ah(struct rdma_ah *ah); + +/** + * rdma_attach_mcast - Attaches the specified QP to a multicast group. + * @qp: QP to attach to the multicast group. The QP must be a UD QP. + * @gid: Multicast group GID. + * @lid: Multicast group LID in host byte order. + * + * In order to route multicast packets correctly, subnet + * administration must have created the multicast group and configured + * the fabric appropriately. The port associated with the specified + * QP must also be a member of the multicast group. + */ +int rdma_attach_mcast(struct rdma_qp *qp, union rdma_gid *gid, uint16_t lid); + +/** + * rdma_detach_mcast - Detaches the specified QP from a multicast group. + * @qp: QP to detach from the multicast group. + * @gid: Multicast group GID. + * @lid: Multicast group LID in host byte order. + */ +int rdma_detach_mcast(struct rdma_qp *qp, union rdma_gid *gid, uint16_t lid); + END_C_DECLS # undef __attribute_const -#endif /* INFINIBAND_VERBS_H */ +#endif /* RDMA_VERBS_H */ From krkumar2 at in.ibm.com Thu Jul 27 21:50:22 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Fri, 28 Jul 2006 10:20:22 +0530 Subject: [openib-general] [PATCH 3/6] libibverbs configuration files changes. In-Reply-To: <20060728045001.9426.85822.sendpatchset@K50wks273950wss.in.ibm.com> References: <20060728045001.9426.85822.sendpatchset@K50wks273950wss.in.ibm.com> Message-ID: <20060728045021.9426.29350.sendpatchset@K50wks273950wss.in.ibm.com> Configuration/Makefiles to build libibverbs with the new API. 
Signed-off-by: Krishna Kumar diff -ruNp ORG/libibverbs/configure.in NEW/libibverbs/configure.in --- ORG/libibverbs/configure.in 2006-07-27 15:34:18.000000000 +0530 +++ NEW/libibverbs/configure.in 2006-07-27 15:34:44.000000000 +0530 @@ -2,7 +2,7 @@ dnl Process this file with autoconf to p AC_PREREQ(2.57) AC_INIT(libibverbs, 1.1-pre1, openib-general at openib.org) -AC_CONFIG_SRCDIR([src/ibverbs.h]) +AC_CONFIG_SRCDIR([src/rdmaverbs.h]) AC_CONFIG_AUX_DIR(config) AM_CONFIG_HEADER(config.h) AM_INIT_AUTOMAKE(libibverbs, 1.1-pre1) @@ -33,5 +33,5 @@ AC_CACHE_CHECK(whether ld accepts --vers AM_CONDITIONAL(HAVE_LD_VERSION_SCRIPT, test "$ac_cv_version_script" = "yes") -AC_CONFIG_FILES([Makefile libibverbs.spec]) +AC_CONFIG_FILES([Makefile librdmaverbs.spec]) AC_OUTPUT diff -ruNp ORG/libibverbs/libibverbs.spec.in NEW/libibverbs/libibverbs.spec.in --- ORG/libibverbs/libibverbs.spec.in 2006-07-27 15:34:18.000000000 +0530 +++ NEW/libibverbs/libibverbs.spec.in 1970-01-01 05:30:00.000000000 +0530 @@ -1,106 +0,0 @@ -# $Id: libibverbs.spec.in 7484 2006-05-24 21:12:21Z roland $ - -%define ver @VERSION@ - -Name: libibverbs -Version: 1.1 -Release: 0.1.pre1%{?dist} -Summary: A library for direct userspace use of InfiniBand - -Group: System Environment/Libraries -License: GPL/BSD -Url: http://openib.org/ -Source: http://openib.org/downloads/libibverbs-1.1-pre1.tar.gz -BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id_u} -n) - -%description -libibverbs is a library that allows userspace processes to use -InfiniBand "verbs" as described in the InfiniBand Architecture -Specification. This includes direct hardware access for fast path -operations. - -For this library to be useful, a device-specific plug-in module should -also be installed. - -%package devel -Summary: Development files for the libibverbs library -Group: System Environment/Libraries - -%description devel -Static libraries and header files for the libibverbs verbs library. 
- -%package utils -Summary: Examples for the libibverbs library -Group: System Environment/Libraries -Requires: %{name} = %{version}-%{release} - -%description utils -Useful libibverbs1 example programs such as ibv_devinfo, which -displays information about InfiniBand devices. - -%prep -%setup -q -n %{name}-%{ver} - -%build -%configure -make %{?_smp_mflags} - -%install -rm -rf $RPM_BUILD_ROOT -%makeinstall -# remove unpackaged files from the buildroot -rm -f $RPM_BUILD_ROOT%{_libdir}/*.la - -%clean -rm -rf $RPM_BUILD_ROOT - -%post -p /sbin/ldconfig -%postun -p /sbin/ldconfig - -%files -%defattr(-,root,root,-) -%{_libdir}/libibverbs*.so.* -%doc AUTHORS COPYING ChangeLog README - -%files devel -%defattr(-,root,root,-) -%{_libdir}/lib*.so -%{_libdir}/*.a -%{_includedir}/* - -%files utils -%defattr(-,root,root,-) -%{_bindir}/* -%{_mandir}/man1/* - -%changelog -* Mon May 22 2006 Roland Dreier - 1.1-0.1.pre1 -- New upstream release -- Remove dependency on libsysfs, since it is no longer used - -* Thu May 4 2006 Roland Dreier - 1.0.4-1 -- New upstream release - -* Mon Mar 14 2006 Roland Dreier - 1.0.3-1 -- New upstream release - -* Mon Mar 13 2006 Roland Dreier - 1.0.1-1 -- New upstream release - -* Thu Feb 16 2006 Roland Dreier - 1.0-1 -- New upstream release - -* Wed Feb 15 2006 Roland Dreier - 1.0-0.5.rc7 -- New upstream release - -* Sun Jan 22 2006 Roland Dreier - 1.0-0.4.rc6 -- New upstream release - -* Tue Oct 25 2005 Roland Dreier - 1.0-0.3.rc5 -- New upstream release - -* Wed Oct 5 2005 Roland Dreier - 1.0-0.2.rc4 -- Update to upstream 1.0-rc4 release - -* Mon Sep 26 2005 Roland Dreier - 1.0-0.1.rc3 -- Initial attempt at Fedora Extras-compliant spec file diff -ruNp ORG/libibverbs/librdmaverbs.spec.in NEW/libibverbs/librdmaverbs.spec.in --- ORG/libibverbs/librdmaverbs.spec.in 1970-01-01 05:30:00.000000000 +0530 +++ NEW/libibverbs/librdmaverbs.spec.in 2006-07-27 15:34:44.000000000 +0530 @@ -0,0 +1,106 @@ +# $Id: librdmaverbs.spec.in 7484 2006-05-24 21:12:21Z roland 
$ + +%define ver @VERSION@ + +Name: librdmaverbs +Version: 1.1 +Release: 0.1.pre1%{?dist} +Summary: A library for direct userspace use of InfiniBand + +Group: System Environment/Libraries +License: GPL/BSD +Url: http://openib.org/ +Source: http://openib.org/downloads/librdmaverbs-1.1-pre1.tar.gz +BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id_u} -n) + +%description +librdmaverbs is a library that allows userspace processes to use +InfiniBand and iWARP "verbs" as described in the InfiniBand Architecture +Specification and the iWARP documents. This includes direct hardware access +for fast path operations. + +For this library to be useful, a device-specific plug-in module should +also be installed. + +%package devel +Summary: Development files for the librdmaverbs library +Group: System Environment/Libraries + +%description devel +Static libraries and header files for the librdmaverbs verbs library. + +%package utils +Summary: Examples for the librdmaverbs library +Group: System Environment/Libraries +Requires: %{name} = %{version}-%{release} + +%description utils +Useful librdmaverbs example programs such as ibv_devinfo, which +displays information about InfiniBand devices. 
+ +%prep +%setup -q -n %{name}-%{ver} + +%build +%configure +make %{?_smp_mflags} + +%install +rm -rf $RPM_BUILD_ROOT +%makeinstall +# remove unpackaged files from the buildroot +rm -f $RPM_BUILD_ROOT%{_libdir}/*.la + +%clean +rm -rf $RPM_BUILD_ROOT + +%post -p /sbin/ldconfig +%postun -p /sbin/ldconfig + +%files +%defattr(-,root,root,-) +%{_libdir}/libibverbs*.so.* +%doc AUTHORS COPYING ChangeLog README + +%files devel +%defattr(-,root,root,-) +%{_libdir}/lib*.so +%{_libdir}/*.a +%{_includedir}/* + +%files utils +%defattr(-,root,root,-) +%{_bindir}/* +%{_mandir}/man1/* + +%changelog +* Mon May 22 2006 Roland Dreier - 1.1-0.1.pre1 +- New upstream release +- Remove dependency on libsysfs, since it is no longer used + +* Thu May 4 2006 Roland Dreier - 1.0.4-1 +- New upstream release + +* Mon Mar 14 2006 Roland Dreier - 1.0.3-1 +- New upstream release + +* Mon Mar 13 2006 Roland Dreier - 1.0.1-1 +- New upstream release + +* Thu Feb 16 2006 Roland Dreier - 1.0-1 +- New upstream release + +* Wed Feb 15 2006 Roland Dreier - 1.0-0.5.rc7 +- New upstream release + +* Sun Jan 22 2006 Roland Dreier - 1.0-0.4.rc6 +- New upstream release + +* Tue Oct 25 2005 Roland Dreier - 1.0-0.3.rc5 +- New upstream release + +* Wed Oct 5 2005 Roland Dreier - 1.0-0.2.rc4 +- Update to upstream 1.0-rc4 release + +* Mon Sep 26 2005 Roland Dreier - 1.0-0.1.rc3 +- Initial attempt at Fedora Extras-compliant spec file diff -ruNp ORG/libibverbs/Makefile.am NEW/libibverbs/Makefile.am --- ORG/libibverbs/Makefile.am 2006-07-27 15:34:18.000000000 +0530 +++ NEW/libibverbs/Makefile.am 2006-07-27 15:34:44.000000000 +0530 @@ -9,7 +9,7 @@ AM_CFLAGS = -g -Wall -D_GNU_SOURCE src_libibverbs_la_CFLAGS = -g -Wall -D_GNU_SOURCE -DDRIVER_PATH=\"$(libdir)/infiniband\" if HAVE_LD_VERSION_SCRIPT - libibverbs_version_script = -Wl,--version-script=$(srcdir)/src/libibverbs.map + libibverbs_version_script = -Wl,--version-script=$(srcdir)/src/librdmaverbs.map else libibverbs_version_script = endif @@ -18,7 +18,7 @@ 
src_libibverbs_la_SOURCES = src/cmd.c sr src/memory.c src/sysfs.c src/verbs.c src_libibverbs_la_LDFLAGS = -version-info 2 -export-dynamic \ $(libibverbs_version_script) -src_libibverbs_la_DEPENDENCIES = $(srcdir)/src/libibverbs.map +src_libibverbs_la_DEPENDENCIES = $(srcdir)/src/librdmaverbs.map bin_PROGRAMS = examples/ibv_devices examples/ibv_devinfo \ examples/ibv_asyncwatch examples/ibv_rc_pingpong examples/ibv_uc_pingpong \ @@ -56,8 +56,8 @@ DEBIAN = debian/changelog debian/compat EXTRA_DIST = include/infiniband/driver.h include/infiniband/kern-abi.h \ include/infiniband/opcode.h include/infiniband/verbs.h include/infiniband/marshall.h \ include/infiniband/sa-kern-abi.h include/infiniband/sa.h \ - src/ibverbs.h examples/pingpong.h \ - src/libibverbs.map libibverbs.spec.in $(man_MANS) + src/rdmaverbs.h examples/pingpong.h \ + src/librdmaverbs.map libibverbs.spec.in $(man_MANS) dist-hook: libibverbs.spec cp libibverbs.spec $(distdir) From krkumar2 at in.ibm.com Thu Jul 27 21:50:15 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Fri, 28 Jul 2006 10:20:15 +0530 Subject: [openib-general] [PATCH 2/6] libibverbs source files changes. In-Reply-To: <20060728045001.9426.85822.sendpatchset@K50wks273950wss.in.ibm.com> References: <20060728045001.9426.85822.sendpatchset@K50wks273950wss.in.ibm.com> Message-ID: <20060728045014.9426.49275.sendpatchset@K50wks273950wss.in.ibm.com> Source files in libibverbs defining the new API. 
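In the diff that follows, each existing ibv_cmd_* entry point is kept as a thin wrapper that casts its arguments to the corresponding rdma_* types and forwards the call. A minimal sketch of that pattern is below. The struct definitions and the *_sketch function names are hypothetical and simplified for illustration; they are not taken from the patch. The sketch also assumes the legacy struct embeds the new one as its first member, which makes the pointer conversion well-defined in C, whereas the patch itself appears to rely on the two structs having identical layout.

```c
#include <stdint.h>

/* Hypothetical, simplified type for illustration only; the real
 * struct in the patch carries more fields (context, etc.). */
struct rdma_pd {
	uint32_t handle;
};

/* Assumption: the legacy type embeds the new one as its first member,
 * so a pointer to it may be converted to a pointer to struct rdma_pd. */
struct ibv_pd {
	struct rdma_pd base;
};

/* Stand-in for the new-style command. It returns the handle it would
 * pass to the kernel; the real function returns 0 or errno instead. */
int rdma_cmd_dealloc_pd_sketch(struct rdma_pd *pd)
{
	return (int) pd->handle;
}

/* Legacy entry point kept for existing callers: cast and forward. */
int ibv_cmd_dealloc_pd_sketch(struct ibv_pd *pd)
{
	return rdma_cmd_dealloc_pd_sketch((struct rdma_pd *) pd);
}
```

With this arrangement old callers keep compiling against the ibv_ names while new code uses the rdma_ names directly, and both reach the same implementation.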
Signed-off-by: Krishna Kumar diff -ruNp ORG/libibverbs/src/cmd.c NEW/libibverbs/src/cmd.c --- ORG/libibverbs/src/cmd.c 2006-07-27 15:34:18.000000000 +0530 +++ NEW/libibverbs/src/cmd.c 2006-07-27 15:34:44.000000000 +0530 @@ -45,16 +45,16 @@ #include #include -#include "ibverbs.h" +#include "rdmaverbs.h" -static int ibv_cmd_get_context_v2(struct ibv_context *context, - struct ibv_get_context *new_cmd, +static int rdma_cmd_get_context_v2(struct rdma_context *context, + struct rdma_get_context *new_cmd, size_t new_cmd_size, - struct ibv_get_context_resp *resp, + struct rdma_get_context_resp *resp, size_t resp_size) { - struct ibv_abi_compat_v2 *t; - struct ibv_get_context_v2 *cmd; + struct rdma_abi_compat_v2 *t; + struct rdma_get_context_v2 *cmd; size_t cmd_size; uint32_t cq_fd; @@ -65,9 +65,10 @@ static int ibv_cmd_get_context_v2(struct cmd_size = sizeof *cmd + new_cmd_size - sizeof *new_cmd; cmd = alloca(cmd_size); - memcpy(cmd->driver_data, new_cmd->driver_data, new_cmd_size - sizeof *new_cmd); + memcpy(cmd->driver_data, new_cmd->driver_data, + new_cmd_size - sizeof *new_cmd); - IBV_INIT_CMD_RESP(cmd, cmd_size, GET_CONTEXT, resp, resp_size); + RDMA_INIT_CMD_RESP(cmd, cmd_size, GET_CONTEXT, resp, resp_size); cmd->cq_fd_tab = (uintptr_t) &cq_fd; if (write(context->cmd_fd, cmd, cmd_size) != cmd_size) @@ -81,14 +82,16 @@ static int ibv_cmd_get_context_v2(struct return 0; } -int ibv_cmd_get_context(struct ibv_context *context, struct ibv_get_context *cmd, - size_t cmd_size, struct ibv_get_context_resp *resp, +int rdma_cmd_get_context(struct rdma_context *context, + struct rdma_get_context *cmd, + size_t cmd_size, struct rdma_get_context_resp *resp, size_t resp_size) { if (abi_ver <= 2) - return ibv_cmd_get_context_v2(context, cmd, cmd_size, resp, resp_size); + return rdma_cmd_get_context_v2(context, cmd, cmd_size, resp, + resp_size); - IBV_INIT_CMD_RESP(cmd, cmd_size, GET_CONTEXT, resp, resp_size); + RDMA_INIT_CMD_RESP(cmd, cmd_size, GET_CONTEXT, resp, resp_size); if 
(write(context->cmd_fd, cmd, cmd_size) != cmd_size) return errno; @@ -99,14 +102,26 @@ int ibv_cmd_get_context(struct ibv_conte return 0; } -int ibv_cmd_query_device(struct ibv_context *context, - struct ibv_device_attr *device_attr, +int ibv_cmd_get_context(struct ibv_context *context, + struct ibv_get_context *cmd, + size_t cmd_size, struct ibv_get_context_resp *resp, + size_t resp_size) +{ + return rdma_cmd_get_context((struct rdma_context *) context, + (struct rdma_get_context *) cmd, + cmd_size, + (struct rdma_get_context_resp *) resp, + resp_size); +} + +int rdma_cmd_query_device(struct rdma_context *context, + struct rdma_device_attr *device_attr, uint64_t *raw_fw_ver, - struct ibv_query_device *cmd, size_t cmd_size) + struct rdma_query_device *cmd, size_t cmd_size) { - struct ibv_query_device_resp resp; + struct rdma_query_device_resp resp; - IBV_INIT_CMD_RESP(cmd, cmd_size, QUERY_DEVICE, &resp, sizeof resp); + RDMA_INIT_CMD_RESP(cmd, cmd_size, QUERY_DEVICE, &resp, sizeof resp); if (write(context->cmd_fd, cmd, cmd_size) != cmd_size) return errno; @@ -156,13 +171,25 @@ int ibv_cmd_query_device(struct ibv_cont return 0; } -int ibv_cmd_query_port(struct ibv_context *context, uint8_t port_num, - struct ibv_port_attr *port_attr, - struct ibv_query_port *cmd, size_t cmd_size) +int ibv_cmd_query_device(struct ibv_context *context, + struct ibv_device_attr *device_attr, + uint64_t *raw_fw_ver, + struct ibv_query_device *cmd, size_t cmd_size) +{ + return rdma_cmd_query_device((struct rdma_context *) context, + (struct rdma_device_attr *) device_attr, + raw_fw_ver, + (struct rdma_query_device *) cmd, + cmd_size); +} + +int rdma_cmd_query_port(struct rdma_context *context, uint8_t port_num, + struct rdma_port_attr *port_attr, + struct rdma_query_port *cmd, size_t cmd_size) { - struct ibv_query_port_resp resp; + struct rdma_query_port_resp resp; - IBV_INIT_CMD_RESP(cmd, cmd_size, QUERY_PORT, &resp, sizeof resp); + RDMA_INIT_CMD_RESP(cmd, cmd_size, QUERY_PORT, &resp, 
sizeof resp); cmd->port_num = port_num; if (write(context->cmd_fd, cmd, cmd_size) != cmd_size) @@ -191,11 +218,21 @@ int ibv_cmd_query_port(struct ibv_contex return 0; } -int ibv_cmd_alloc_pd(struct ibv_context *context, struct ibv_pd *pd, - struct ibv_alloc_pd *cmd, size_t cmd_size, - struct ibv_alloc_pd_resp *resp, size_t resp_size) +int ibv_cmd_query_port(struct ibv_context *context, uint8_t port_num, + struct ibv_port_attr *port_attr, + struct ibv_query_port *cmd, size_t cmd_size) { - IBV_INIT_CMD_RESP(cmd, cmd_size, ALLOC_PD, resp, resp_size); + return rdma_cmd_query_port((struct rdma_context *) context, + port_num, + (struct rdma_port_attr *) port_attr, + (struct rdma_query_port *) cmd, cmd_size); +} + +int rdma_cmd_alloc_pd(struct rdma_context *context, struct rdma_pd *pd, + struct rdma_alloc_pd *cmd, size_t cmd_size, + struct rdma_alloc_pd_resp *resp, size_t resp_size) +{ + RDMA_INIT_CMD_RESP(cmd, cmd_size, ALLOC_PD, resp, resp_size); if (write(context->cmd_fd, cmd, cmd_size) != cmd_size) return errno; @@ -205,11 +242,21 @@ int ibv_cmd_alloc_pd(struct ibv_context return 0; } -int ibv_cmd_dealloc_pd(struct ibv_pd *pd) +int ibv_cmd_alloc_pd(struct ibv_context *context, struct ibv_pd *pd, + struct ibv_alloc_pd *cmd, size_t cmd_size, + struct ibv_alloc_pd_resp *resp, size_t resp_size) +{ + return rdma_cmd_alloc_pd((struct rdma_context *) context, + (struct rdma_pd *) pd, + (struct rdma_alloc_pd *) cmd, cmd_size, + (struct rdma_alloc_pd_resp *) resp, resp_size); +} + +int rdma_cmd_dealloc_pd(struct rdma_pd *pd) { - struct ibv_dealloc_pd cmd; + struct rdma_dealloc_pd cmd; - IBV_INIT_CMD(&cmd, sizeof cmd, DEALLOC_PD); + RDMA_INIT_CMD(&cmd, sizeof cmd, DEALLOC_PD); cmd.pd_handle = pd->handle; if (write(pd->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) @@ -218,14 +265,19 @@ int ibv_cmd_dealloc_pd(struct ibv_pd *pd return 0; } -int ibv_cmd_reg_mr(struct ibv_pd *pd, void *addr, size_t length, - uint64_t hca_va, enum ibv_access_flags access, - struct ibv_mr *mr, 
struct ibv_reg_mr *cmd, +int ibv_cmd_dealloc_pd(struct ibv_pd *pd) +{ + return rdma_cmd_dealloc_pd((struct rdma_pd *) pd); +} + +int rdma_cmd_reg_mr(struct rdma_pd *pd, void *addr, size_t length, + uint64_t hca_va, enum rdma_access_flags access, + struct rdma_mr *mr, struct rdma_reg_mr *cmd, size_t cmd_size) { - struct ibv_reg_mr_resp resp; + struct rdma_reg_mr_resp resp; - IBV_INIT_CMD_RESP(cmd, cmd_size, REG_MR, &resp, sizeof resp); + RDMA_INIT_CMD_RESP(cmd, cmd_size, REG_MR, &resp, sizeof resp); cmd->start = (uintptr_t) addr; cmd->length = length; @@ -243,11 +295,21 @@ int ibv_cmd_reg_mr(struct ibv_pd *pd, vo return 0; } -int ibv_cmd_dereg_mr(struct ibv_mr *mr) +int ibv_cmd_reg_mr(struct ibv_pd *pd, void *addr, size_t length, + uint64_t hca_va, enum ibv_access_flags access, + struct ibv_mr *mr, struct ibv_reg_mr *cmd, + size_t cmd_size) +{ + return rdma_cmd_reg_mr((struct rdma_pd *) pd, addr, length, + hca_va, access, (struct rdma_mr *) mr, + (struct rdma_reg_mr *) cmd, cmd_size); +} + +int rdma_cmd_dereg_mr(struct rdma_mr *mr) { - struct ibv_dereg_mr cmd; + struct rdma_dereg_mr cmd; - IBV_INIT_CMD(&cmd, sizeof cmd, DEREG_MR); + RDMA_INIT_CMD(&cmd, sizeof cmd, DEREG_MR); cmd.mr_handle = mr->handle; if (write(mr->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) @@ -256,19 +318,27 @@ int ibv_cmd_dereg_mr(struct ibv_mr *mr) return 0; } -static int ibv_cmd_create_cq_v2(struct ibv_context *context, int cqe, - struct ibv_cq *cq, - struct ibv_create_cq *new_cmd, size_t new_cmd_size, - struct ibv_create_cq_resp *resp, size_t resp_size) +int ibv_cmd_dereg_mr(struct ibv_mr *mr) { - struct ibv_create_cq_v2 *cmd; + return rdma_cmd_dereg_mr((struct rdma_mr *) mr); +} + +static int rdma_cmd_create_cq_v2(struct rdma_context *context, int cqe, + struct rdma_cq *cq, + struct rdma_create_cq *new_cmd, + size_t new_cmd_size, + struct rdma_create_cq_resp *resp, + size_t resp_size) +{ + struct rdma_create_cq_v2 *cmd; size_t cmd_size; cmd_size = sizeof *cmd + new_cmd_size - sizeof 
*new_cmd; cmd = alloca(cmd_size); - memcpy(cmd->driver_data, new_cmd->driver_data, new_cmd_size - sizeof *new_cmd); + memcpy(cmd->driver_data, new_cmd->driver_data, + new_cmd_size - sizeof *new_cmd); - IBV_INIT_CMD_RESP(cmd, cmd_size, CREATE_CQ, resp, resp_size); + RDMA_INIT_CMD_RESP(cmd, cmd_size, CREATE_CQ, resp, resp_size); cmd->user_handle = (uintptr_t) cq; cmd->cqe = cqe; cmd->event_handler = 0; @@ -282,17 +352,17 @@ static int ibv_cmd_create_cq_v2(struct i return 0; } -int ibv_cmd_create_cq(struct ibv_context *context, int cqe, - struct ibv_comp_channel *channel, - int comp_vector, struct ibv_cq *cq, - struct ibv_create_cq *cmd, size_t cmd_size, - struct ibv_create_cq_resp *resp, size_t resp_size) +int rdma_cmd_create_cq(struct rdma_context *context, int cqe, + struct rdma_comp_channel *channel, + int comp_vector, struct rdma_cq *cq, + struct rdma_create_cq *cmd, size_t cmd_size, + struct rdma_create_cq_resp *resp, size_t resp_size) { if (abi_ver <= 2) - return ibv_cmd_create_cq_v2(context, cqe, cq, + return rdma_cmd_create_cq_v2(context, cqe, cq, cmd, cmd_size, resp, resp_size); - IBV_INIT_CMD_RESP(cmd, cmd_size, CREATE_CQ, resp, resp_size); + RDMA_INIT_CMD_RESP(cmd, cmd_size, CREATE_CQ, resp, resp_size); cmd->user_handle = (uintptr_t) cq; cmd->cqe = cqe; cmd->comp_vector = comp_vector; @@ -308,20 +378,34 @@ int ibv_cmd_create_cq(struct ibv_context return 0; } -int ibv_cmd_poll_cq(struct ibv_cq *ibcq, int ne, struct ibv_wc *wc) +int ibv_cmd_create_cq(struct ibv_context *context, int cqe, + struct ibv_comp_channel *channel, + int comp_vector, struct ibv_cq *cq, + struct ibv_create_cq *cmd, size_t cmd_size, + struct ibv_create_cq_resp *resp, size_t resp_size) +{ + return rdma_cmd_create_cq((struct rdma_context *) context, cqe, + (struct rdma_comp_channel *) channel, + comp_vector, (struct rdma_cq *) cq, + (struct rdma_create_cq *) cmd, cmd_size, + (struct rdma_create_cq_resp *) resp, + resp_size); +} + +int rdma_cmd_poll_cq(struct rdma_cq *ibcq, int ne, struct 
rdma_wc *wc) { - struct ibv_poll_cq cmd; - struct ibv_poll_cq_resp *resp; + struct rdma_poll_cq cmd; + struct rdma_poll_cq_resp *resp; int i; int rsize; int ret; - rsize = sizeof *resp + ne * sizeof(struct ibv_kern_wc); + rsize = sizeof *resp + ne * sizeof(struct rdma_kern_wc); resp = malloc(rsize); if (!resp) return -1; - IBV_INIT_CMD_RESP(&cmd, sizeof cmd, POLL_CQ, resp, rsize); + RDMA_INIT_CMD_RESP(&cmd, sizeof cmd, POLL_CQ, resp, rsize); cmd.cq_handle = ibcq->handle; cmd.ne = ne; @@ -353,11 +437,17 @@ out: return ret; } -int ibv_cmd_req_notify_cq(struct ibv_cq *ibcq, int solicited_only) +int ibv_cmd_poll_cq(struct ibv_cq *ibcq, int ne, struct ibv_wc *wc) +{ + return rdma_cmd_poll_cq((struct rdma_cq *) ibcq, ne, + (struct rdma_wc *) wc); +} + +int rdma_cmd_req_notify_cq(struct rdma_cq *ibcq, int solicited_only) { - struct ibv_req_notify_cq cmd; + struct rdma_req_notify_cq cmd; - IBV_INIT_CMD(&cmd, sizeof cmd, REQ_NOTIFY_CQ); + RDMA_INIT_CMD(&cmd, sizeof cmd, REQ_NOTIFY_CQ); cmd.cq_handle = ibcq->handle; cmd.solicited = !!solicited_only; @@ -367,12 +457,18 @@ int ibv_cmd_req_notify_cq(struct ibv_cq return 0; } -int ibv_cmd_resize_cq(struct ibv_cq *cq, int cqe, - struct ibv_resize_cq *cmd, size_t cmd_size) +int ibv_cmd_req_notify_cq(struct ibv_cq *ibcq, int solicited_only) +{ + return rdma_cmd_req_notify_cq((struct rdma_cq *) ibcq, + solicited_only); +} + +int rdma_cmd_resize_cq(struct rdma_cq *cq, int cqe, + struct rdma_resize_cq *cmd, size_t cmd_size) { - struct ibv_resize_cq_resp resp; + struct rdma_resize_cq_resp resp; - IBV_INIT_CMD_RESP(cmd, cmd_size, RESIZE_CQ, &resp, sizeof resp); + RDMA_INIT_CMD_RESP(cmd, cmd_size, RESIZE_CQ, &resp, sizeof resp); cmd->cq_handle = cq->handle; cmd->cqe = cqe; @@ -384,11 +480,18 @@ int ibv_cmd_resize_cq(struct ibv_cq *cq, return 0; } -static int ibv_cmd_destroy_cq_v1(struct ibv_cq *cq) +int ibv_cmd_resize_cq(struct ibv_cq *cq, int cqe, + struct ibv_resize_cq *cmd, size_t cmd_size) +{ + return rdma_cmd_resize_cq((struct 
rdma_cq *) cq, cqe, + (struct rdma_resize_cq *) cmd, cmd_size); +} + +static int rdma_cmd_destroy_cq_v1(struct rdma_cq *cq) { - struct ibv_destroy_cq_v1 cmd; + struct rdma_destroy_cq_v1 cmd; - IBV_INIT_CMD(&cmd, sizeof cmd, DESTROY_CQ); + RDMA_INIT_CMD(&cmd, sizeof cmd, DESTROY_CQ); cmd.cq_handle = cq->handle; if (write(cq->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) @@ -397,15 +500,15 @@ static int ibv_cmd_destroy_cq_v1(struct return 0; } -int ibv_cmd_destroy_cq(struct ibv_cq *cq) +int rdma_cmd_destroy_cq(struct rdma_cq *cq) { - struct ibv_destroy_cq cmd; - struct ibv_destroy_cq_resp resp; + struct rdma_destroy_cq cmd; + struct rdma_destroy_cq_resp resp; if (abi_ver == 1) - return ibv_cmd_destroy_cq_v1(cq); + return rdma_cmd_destroy_cq_v1(cq); - IBV_INIT_CMD_RESP(&cmd, sizeof cmd, DESTROY_CQ, &resp, sizeof resp); + RDMA_INIT_CMD_RESP(&cmd, sizeof cmd, DESTROY_CQ, &resp, sizeof resp); cmd.cq_handle = cq->handle; if (write(cq->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) @@ -420,12 +523,17 @@ int ibv_cmd_destroy_cq(struct ibv_cq *cq return 0; } -int ibv_cmd_create_srq(struct ibv_pd *pd, - struct ibv_srq *srq, struct ibv_srq_init_attr *attr, - struct ibv_create_srq *cmd, size_t cmd_size, - struct ibv_create_srq_resp *resp, size_t resp_size) +int ibv_cmd_destroy_cq(struct ibv_cq *cq) +{ + return rdma_cmd_destroy_cq((struct rdma_cq *) cq); +} + +int rdma_cmd_create_srq(struct rdma_pd *pd, + struct rdma_srq *srq, struct rdma_srq_init_attr *attr, + struct rdma_create_srq *cmd, size_t cmd_size, + struct rdma_create_srq_resp *resp, size_t resp_size) { - IBV_INIT_CMD_RESP(cmd, cmd_size, CREATE_SRQ, resp, resp_size); + RDMA_INIT_CMD_RESP(cmd, cmd_size, CREATE_SRQ, resp, resp_size); cmd->user_handle = (uintptr_t) srq; cmd->pd_handle = pd->handle; cmd->max_wr = attr->attr.max_wr; @@ -441,8 +549,8 @@ int ibv_cmd_create_srq(struct ibv_pd *pd attr->attr.max_wr = resp->max_wr; attr->attr.max_sge = resp->max_sge; } else { - struct ibv_create_srq_resp_v5 *resp_v5 = - 
(struct ibv_create_srq_resp_v5 *) resp; + struct rdma_create_srq_resp_v5 *resp_v5 = + (struct rdma_create_srq_resp_v5 *) resp; memmove((void *) resp + sizeof *resp, (void *) resp_v5 + sizeof *resp_v5, @@ -452,20 +560,34 @@ int ibv_cmd_create_srq(struct ibv_pd *pd return 0; } -static int ibv_cmd_modify_srq_v3(struct ibv_srq *srq, - struct ibv_srq_attr *srq_attr, - enum ibv_srq_attr_mask srq_attr_mask, - struct ibv_modify_srq *new_cmd, +int ibv_cmd_create_srq(struct ibv_pd *pd, + struct ibv_srq *srq, struct ibv_srq_init_attr *attr, + struct ibv_create_srq *cmd, size_t cmd_size, + struct ibv_create_srq_resp *resp, size_t resp_size) +{ + return rdma_cmd_create_srq((struct rdma_pd *) pd, + (struct rdma_srq *) srq, + (struct rdma_srq_init_attr *) attr, + (struct rdma_create_srq *) cmd, cmd_size, + (struct rdma_create_srq_resp *) resp, + resp_size); +} + +static int rdma_cmd_modify_srq_v3(struct rdma_srq *srq, + struct rdma_srq_attr *srq_attr, + enum rdma_srq_attr_mask srq_attr_mask, + struct rdma_modify_srq *new_cmd, size_t new_cmd_size) { - struct ibv_modify_srq_v3 *cmd; + struct rdma_modify_srq_v3 *cmd; size_t cmd_size; cmd_size = sizeof *cmd + new_cmd_size - sizeof *new_cmd; cmd = alloca(cmd_size); - memcpy(cmd->driver_data, new_cmd->driver_data, new_cmd_size - sizeof *new_cmd); + memcpy(cmd->driver_data, new_cmd->driver_data, + new_cmd_size - sizeof *new_cmd); - IBV_INIT_CMD(cmd, cmd_size, MODIFY_SRQ); + RDMA_INIT_CMD(cmd, cmd_size, MODIFY_SRQ); cmd->srq_handle = srq->handle; cmd->attr_mask = srq_attr_mask; @@ -480,16 +602,16 @@ static int ibv_cmd_modify_srq_v3(struct return 0; } -int ibv_cmd_modify_srq(struct ibv_srq *srq, - struct ibv_srq_attr *srq_attr, - enum ibv_srq_attr_mask srq_attr_mask, - struct ibv_modify_srq *cmd, size_t cmd_size) +int rdma_cmd_modify_srq(struct rdma_srq *srq, + struct rdma_srq_attr *srq_attr, + enum rdma_srq_attr_mask srq_attr_mask, + struct rdma_modify_srq *cmd, size_t cmd_size) { if (abi_ver == 3) - return ibv_cmd_modify_srq_v3(srq, 
srq_attr, srq_attr_mask, + return rdma_cmd_modify_srq_v3(srq, srq_attr, srq_attr_mask, cmd, cmd_size); - IBV_INIT_CMD(cmd, cmd_size, MODIFY_SRQ); + RDMA_INIT_CMD(cmd, cmd_size, MODIFY_SRQ); cmd->srq_handle = srq->handle; cmd->attr_mask = srq_attr_mask; @@ -502,12 +624,23 @@ int ibv_cmd_modify_srq(struct ibv_srq *s return 0; } -int ibv_cmd_query_srq(struct ibv_srq *srq, struct ibv_srq_attr *srq_attr, - struct ibv_query_srq *cmd, size_t cmd_size) +int ibv_cmd_modify_srq(struct ibv_srq *srq, + struct ibv_srq_attr *srq_attr, + enum ibv_srq_attr_mask srq_attr_mask, + struct ibv_modify_srq *cmd, size_t cmd_size) { - struct ibv_query_srq_resp resp; + return rdma_cmd_modify_srq((struct rdma_srq *) srq, + (struct rdma_srq_attr *) srq_attr, + srq_attr_mask, + (struct rdma_modify_srq *) cmd, cmd_size); +} - IBV_INIT_CMD_RESP(cmd, cmd_size, QUERY_SRQ, &resp, sizeof resp); +int rdma_cmd_query_srq(struct rdma_srq *srq, struct rdma_srq_attr *srq_attr, + struct rdma_query_srq *cmd, size_t cmd_size) +{ + struct rdma_query_srq_resp resp; + + RDMA_INIT_CMD_RESP(cmd, cmd_size, QUERY_SRQ, &resp, sizeof resp); cmd->srq_handle = srq->handle; if (write(srq->context->cmd_fd, cmd, cmd_size) != cmd_size) @@ -520,11 +653,19 @@ int ibv_cmd_query_srq(struct ibv_srq *sr return 0; } -static int ibv_cmd_destroy_srq_v1(struct ibv_srq *srq) +int ibv_cmd_query_srq(struct ibv_srq *srq, struct ibv_srq_attr *srq_attr, + struct ibv_query_srq *cmd, size_t cmd_size) +{ + return rdma_cmd_query_srq((struct rdma_srq *) srq, + (struct rdma_srq_attr *) srq_attr, + (struct rdma_query_srq *) cmd, cmd_size); +} + +static int rdma_cmd_destroy_srq_v1(struct rdma_srq *srq) { - struct ibv_destroy_srq_v1 cmd; + struct rdma_destroy_srq_v1 cmd; - IBV_INIT_CMD(&cmd, sizeof cmd, DESTROY_SRQ); + RDMA_INIT_CMD(&cmd, sizeof cmd, DESTROY_SRQ); cmd.srq_handle = srq->handle; if (write(srq->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) @@ -533,15 +674,15 @@ static int ibv_cmd_destroy_srq_v1(struct return 0; } -int 
ibv_cmd_destroy_srq(struct ibv_srq *srq) +int rdma_cmd_destroy_srq(struct rdma_srq *srq) { - struct ibv_destroy_srq cmd; - struct ibv_destroy_srq_resp resp; + struct rdma_destroy_srq cmd; + struct rdma_destroy_srq_resp resp; if (abi_ver == 1) - return ibv_cmd_destroy_srq_v1(srq); + return rdma_cmd_destroy_srq_v1(srq); - IBV_INIT_CMD_RESP(&cmd, sizeof cmd, DESTROY_SRQ, &resp, sizeof resp); + RDMA_INIT_CMD_RESP(&cmd, sizeof cmd, DESTROY_SRQ, &resp, sizeof resp); cmd.srq_handle = srq->handle; if (write(srq->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) @@ -555,12 +696,17 @@ int ibv_cmd_destroy_srq(struct ibv_srq * return 0; } -int ibv_cmd_create_qp(struct ibv_pd *pd, - struct ibv_qp *qp, struct ibv_qp_init_attr *attr, - struct ibv_create_qp *cmd, size_t cmd_size, - struct ibv_create_qp_resp *resp, size_t resp_size) +int ibv_cmd_destroy_srq(struct ibv_srq *srq) +{ + return rdma_cmd_destroy_srq((struct rdma_srq *) srq); +} + +int rdma_cmd_create_qp(struct rdma_pd *pd, + struct rdma_qp *qp, struct rdma_qp_init_attr *attr, + struct rdma_create_qp *cmd, size_t cmd_size, + struct rdma_create_qp_resp *resp, size_t resp_size) { - IBV_INIT_CMD_RESP(cmd, cmd_size, CREATE_QP, resp, resp_size); + RDMA_INIT_CMD_RESP(cmd, cmd_size, CREATE_QP, resp, resp_size); cmd->user_handle = (uintptr_t) qp; cmd->pd_handle = pd->handle; @@ -591,15 +737,15 @@ int ibv_cmd_create_qp(struct ibv_pd *pd, } if (abi_ver == 4) { - struct ibv_create_qp_resp_v4 *resp_v4 = - (struct ibv_create_qp_resp_v4 *) resp; + struct rdma_create_qp_resp_v4 *resp_v4 = + (struct rdma_create_qp_resp_v4 *) resp; memmove((void *) resp + sizeof *resp, (void *) resp_v4 + sizeof *resp_v4, resp_size - sizeof *resp); } else if (abi_ver <= 3) { - struct ibv_create_qp_resp_v3 *resp_v3 = - (struct ibv_create_qp_resp_v3 *) resp; + struct rdma_create_qp_resp_v3 *resp_v3 = + (struct rdma_create_qp_resp_v3 *) resp; memmove((void *) resp + sizeof *resp, (void *) resp_v3 + sizeof *resp_v3, @@ -609,14 +755,27 @@ int 
ibv_cmd_create_qp(struct ibv_pd *pd, return 0; } -int ibv_cmd_query_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, - enum ibv_qp_attr_mask attr_mask, - struct ibv_qp_init_attr *init_attr, - struct ibv_query_qp *cmd, size_t cmd_size) +int ibv_cmd_create_qp(struct ibv_pd *pd, + struct ibv_qp *qp, struct ibv_qp_init_attr *attr, + struct ibv_create_qp *cmd, size_t cmd_size, + struct ibv_create_qp_resp *resp, size_t resp_size) { - struct ibv_query_qp_resp resp; + return rdma_cmd_create_qp((struct rdma_pd *) pd, + (struct rdma_qp *) qp, + (struct rdma_qp_init_attr *) attr, + (struct rdma_create_qp *) cmd, cmd_size, + (struct rdma_create_qp_resp *) resp, + resp_size); +} + +int rdma_cmd_query_qp(struct rdma_qp *qp, struct rdma_qp_attr *attr, + enum rdma_qp_attr_mask attr_mask, + struct rdma_qp_init_attr *init_attr, + struct rdma_query_qp *cmd, size_t cmd_size) +{ + struct rdma_query_qp_resp resp; - IBV_INIT_CMD_RESP(cmd, cmd_size, QUERY_QP, &resp, sizeof resp); + RDMA_INIT_CMD_RESP(cmd, cmd_size, QUERY_QP, &resp, sizeof resp); cmd->qp_handle = qp->handle; cmd->attr_mask = attr_mask; @@ -689,11 +848,22 @@ int ibv_cmd_query_qp(struct ibv_qp *qp, return 0; } -int ibv_cmd_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, - enum ibv_qp_attr_mask attr_mask, - struct ibv_modify_qp *cmd, size_t cmd_size) +int ibv_cmd_query_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, + enum ibv_qp_attr_mask attr_mask, + struct ibv_qp_init_attr *init_attr, + struct ibv_query_qp *cmd, size_t cmd_size) +{ + return rdma_cmd_query_qp((struct rdma_qp *) qp, + (struct rdma_qp_attr *) attr, attr_mask, + (struct rdma_qp_init_attr *) init_attr, + (struct rdma_query_qp *) cmd, cmd_size); +} + +int rdma_cmd_modify_qp(struct rdma_qp *qp, struct rdma_qp_attr *attr, + enum rdma_qp_attr_mask attr_mask, + struct rdma_modify_qp *cmd, size_t cmd_size) { - IBV_INIT_CMD(cmd, cmd_size, MODIFY_QP); + RDMA_INIT_CMD(cmd, cmd_size, MODIFY_QP); cmd->qp_handle = qp->handle; cmd->attr_mask = attr_mask; @@ -749,11 
+919,21 @@ int ibv_cmd_modify_qp(struct ibv_qp *qp, return 0; } -static int ibv_cmd_destroy_qp_v1(struct ibv_qp *qp) +int ibv_cmd_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, + enum ibv_qp_attr_mask attr_mask, + struct ibv_modify_qp *cmd, size_t cmd_size) +{ + return rdma_cmd_modify_qp((struct rdma_qp *) qp, + (struct rdma_qp_attr *) attr, + attr_mask, (struct rdma_modify_qp *) cmd, + cmd_size); +} + +static int rdma_cmd_destroy_qp_v1(struct rdma_qp *qp) { - struct ibv_destroy_qp_v1 cmd; + struct rdma_destroy_qp_v1 cmd; - IBV_INIT_CMD(&cmd, sizeof cmd, DESTROY_QP); + RDMA_INIT_CMD(&cmd, sizeof cmd, DESTROY_QP); cmd.qp_handle = qp->handle; if (write(qp->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) @@ -762,14 +942,14 @@ static int ibv_cmd_destroy_qp_v1(struct return 0; } -int ibv_cmd_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr, - struct ibv_send_wr **bad_wr) +int rdma_cmd_post_send(struct rdma_qp *ibqp, struct rdma_send_wr *wr, + struct rdma_send_wr **bad_wr) { - struct ibv_post_send *cmd; - struct ibv_post_send_resp resp; - struct ibv_send_wr *i; - struct ibv_kern_send_wr *n, *tmp; - struct ibv_sge *s; + struct rdma_post_send *cmd; + struct rdma_post_send_resp resp; + struct rdma_send_wr *i; + struct rdma_kern_send_wr *n, *tmp; + struct rdma_sge *s; unsigned wr_count = 0; unsigned sge_count = 0; int cmd_size; @@ -783,14 +963,14 @@ int ibv_cmd_post_send(struct ibv_qp *ibq cmd_size = sizeof *cmd + wr_count * sizeof *n + sge_count * sizeof *s; cmd = alloca(cmd_size); - IBV_INIT_CMD_RESP(cmd, cmd_size, POST_SEND, &resp, sizeof resp); + RDMA_INIT_CMD_RESP(cmd, cmd_size, POST_SEND, &resp, sizeof resp); cmd->qp_handle = ibqp->handle; cmd->wr_count = wr_count; cmd->sge_count = sge_count; cmd->wqe_size = sizeof *n; - n = (struct ibv_kern_send_wr *) ((void *) cmd + sizeof *cmd); - s = (struct ibv_sge *) (n + wr_count); + n = (struct rdma_kern_send_wr *) ((void *) cmd + sizeof *cmd); + s = (struct rdma_sge *) (n + wr_count); tmp = n; for (i = wr; i; i 
= i->next) { @@ -799,21 +979,21 @@ int ibv_cmd_post_send(struct ibv_qp *ibq tmp->opcode = i->opcode; tmp->send_flags = i->send_flags; tmp->imm_data = i->imm_data; - if (ibqp->qp_type == IBV_QPT_UD) { + if (ibqp->qp_type == RDMA_QPT_UD) { tmp->wr.ud.ah = i->wr.ud.ah->handle; tmp->wr.ud.remote_qpn = i->wr.ud.remote_qpn; tmp->wr.ud.remote_qkey = i->wr.ud.remote_qkey; } else { switch(i->opcode) { - case IBV_WR_RDMA_WRITE: - case IBV_WR_RDMA_WRITE_WITH_IMM: - case IBV_WR_RDMA_READ: + case RDMA_WR_RDMA_WRITE: + case RDMA_WR_RDMA_WRITE_WITH_IMM: + case RDMA_WR_RDMA_READ: tmp->wr.rdma.remote_addr = i->wr.rdma.remote_addr; tmp->wr.rdma.rkey = i->wr.rdma.rkey; break; - case IBV_WR_ATOMIC_CMP_AND_SWP: - case IBV_WR_ATOMIC_FETCH_AND_ADD: + case RDMA_WR_ATOMIC_CMP_AND_SWP: + case RDMA_WR_ATOMIC_FETCH_AND_ADD: tmp->wr.atomic.remote_addr = i->wr.atomic.remote_addr; tmp->wr.atomic.compare_add = @@ -849,14 +1029,22 @@ int ibv_cmd_post_send(struct ibv_qp *ibq return ret; } -int ibv_cmd_post_recv(struct ibv_qp *ibqp, struct ibv_recv_wr *wr, - struct ibv_recv_wr **bad_wr) +int ibv_cmd_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr, + struct ibv_send_wr **bad_wr) { - struct ibv_post_recv *cmd; - struct ibv_post_recv_resp resp; - struct ibv_recv_wr *i; - struct ibv_kern_recv_wr *n, *tmp; - struct ibv_sge *s; + return rdma_cmd_post_send((struct rdma_qp *) ibqp, + (struct rdma_send_wr *) wr, + (struct rdma_send_wr **) bad_wr); +} + +int rdma_cmd_post_recv(struct rdma_qp *ibqp, struct rdma_recv_wr *wr, + struct rdma_recv_wr **bad_wr) +{ + struct rdma_post_recv *cmd; + struct rdma_post_recv_resp resp; + struct rdma_recv_wr *i; + struct rdma_kern_recv_wr *n, *tmp; + struct rdma_sge *s; unsigned wr_count = 0; unsigned sge_count = 0; int cmd_size; @@ -870,14 +1058,14 @@ int ibv_cmd_post_recv(struct ibv_qp *ibq cmd_size = sizeof *cmd + wr_count * sizeof *n + sge_count * sizeof *s; cmd = alloca(cmd_size); - IBV_INIT_CMD_RESP(cmd, cmd_size, POST_RECV, &resp, sizeof resp); + 
RDMA_INIT_CMD_RESP(cmd, cmd_size, POST_RECV, &resp, sizeof resp); cmd->qp_handle = ibqp->handle; cmd->wr_count = wr_count; cmd->sge_count = sge_count; cmd->wqe_size = sizeof *n; - n = (struct ibv_kern_recv_wr *) ((void *) cmd + sizeof *cmd); - s = (struct ibv_sge *) (n + wr_count); + n = (struct rdma_kern_recv_wr *) ((void *) cmd + sizeof *cmd); + s = (struct rdma_sge *) (n + wr_count); tmp = n; for (i = wr; i; i = i->next) { @@ -907,14 +1095,22 @@ int ibv_cmd_post_recv(struct ibv_qp *ibq return ret; } -int ibv_cmd_post_srq_recv(struct ibv_srq *srq, struct ibv_recv_wr *wr, +int ibv_cmd_post_recv(struct ibv_qp *ibqp, struct ibv_recv_wr *wr, struct ibv_recv_wr **bad_wr) { - struct ibv_post_srq_recv *cmd; - struct ibv_post_srq_recv_resp resp; - struct ibv_recv_wr *i; - struct ibv_kern_recv_wr *n, *tmp; - struct ibv_sge *s; + return rdma_cmd_post_recv((struct rdma_qp *) ibqp, + (struct rdma_recv_wr *) wr, + (struct rdma_recv_wr **) bad_wr); +} + +int rdma_cmd_post_srq_recv(struct rdma_srq *srq, struct rdma_recv_wr *wr, + struct rdma_recv_wr **bad_wr) +{ + struct rdma_post_srq_recv *cmd; + struct rdma_post_srq_recv_resp resp; + struct rdma_recv_wr *i; + struct rdma_kern_recv_wr *n, *tmp; + struct rdma_sge *s; unsigned wr_count = 0; unsigned sge_count = 0; int cmd_size; @@ -928,14 +1124,14 @@ int ibv_cmd_post_srq_recv(struct ibv_srq cmd_size = sizeof *cmd + wr_count * sizeof *n + sge_count * sizeof *s; cmd = alloca(cmd_size); - IBV_INIT_CMD_RESP(cmd, cmd_size, POST_SRQ_RECV, &resp, sizeof resp); + RDMA_INIT_CMD_RESP(cmd, cmd_size, POST_SRQ_RECV, &resp, sizeof resp); cmd->srq_handle = srq->handle; cmd->wr_count = wr_count; cmd->sge_count = sge_count; cmd->wqe_size = sizeof *n; - n = (struct ibv_kern_recv_wr *) ((void *) cmd + sizeof *cmd); - s = (struct ibv_sge *) (n + wr_count); + n = (struct rdma_kern_recv_wr *) ((void *) cmd + sizeof *cmd); + s = (struct rdma_sge *) (n + wr_count); tmp = n; for (i = wr; i; i = i->next) { @@ -965,13 +1161,21 @@ int 
ibv_cmd_post_srq_recv(struct ibv_srq return ret; } -int ibv_cmd_create_ah(struct ibv_pd *pd, struct ibv_ah *ah, - struct ibv_ah_attr *attr) +int ibv_cmd_post_srq_recv(struct ibv_srq *srq, struct ibv_recv_wr *wr, + struct ibv_recv_wr **bad_wr) +{ + return rdma_cmd_post_srq_recv((struct rdma_srq *) srq, + (struct rdma_recv_wr *) wr, + (struct rdma_recv_wr **) bad_wr); +} + +int rdma_cmd_create_ah(struct rdma_pd *pd, struct rdma_ah *ah, + struct rdma_ah_attr *attr) { - struct ibv_create_ah cmd; - struct ibv_create_ah_resp resp; + struct rdma_create_ah cmd; + struct rdma_create_ah_resp resp; - IBV_INIT_CMD_RESP(&cmd, sizeof cmd, CREATE_AH, &resp, sizeof resp); + RDMA_INIT_CMD_RESP(&cmd, sizeof cmd, CREATE_AH, &resp, sizeof resp); cmd.user_handle = (uintptr_t) ah; cmd.pd_handle = pd->handle; cmd.attr.dlid = attr->dlid; @@ -994,11 +1198,18 @@ int ibv_cmd_create_ah(struct ibv_pd *pd, return 0; } -int ibv_cmd_destroy_ah(struct ibv_ah *ah) +int ibv_cmd_create_ah(struct ibv_pd *pd, struct ibv_ah *ah, + struct ibv_ah_attr *attr) +{ + return rdma_cmd_create_ah((struct rdma_pd *) pd, (struct rdma_ah *) ah, + (struct rdma_ah_attr *) attr); +} + +int rdma_cmd_destroy_ah(struct rdma_ah *ah) { - struct ibv_destroy_ah cmd; + struct rdma_destroy_ah cmd; - IBV_INIT_CMD(&cmd, sizeof cmd, DESTROY_AH); + RDMA_INIT_CMD(&cmd, sizeof cmd, DESTROY_AH); cmd.ah_handle = ah->handle; if (write(ah->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) @@ -1007,15 +1218,20 @@ int ibv_cmd_destroy_ah(struct ibv_ah *ah return 0; } -int ibv_cmd_destroy_qp(struct ibv_qp *qp) +int ibv_cmd_destroy_ah(struct ibv_ah *ah) +{ + return rdma_cmd_destroy_ah((struct rdma_ah *) ah); +} + +int rdma_cmd_destroy_qp(struct rdma_qp *qp) { - struct ibv_destroy_qp cmd; - struct ibv_destroy_qp_resp resp; + struct rdma_destroy_qp cmd; + struct rdma_destroy_qp_resp resp; if (abi_ver == 1) - return ibv_cmd_destroy_qp_v1(qp); + return rdma_cmd_destroy_qp_v1(qp); - IBV_INIT_CMD_RESP(&cmd, sizeof cmd, DESTROY_QP, &resp, sizeof 
resp); + RDMA_INIT_CMD_RESP(&cmd, sizeof cmd, DESTROY_QP, &resp, sizeof resp); cmd.qp_handle = qp->handle; if (write(qp->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) @@ -1029,11 +1245,16 @@ int ibv_cmd_destroy_qp(struct ibv_qp *qp return 0; } -int ibv_cmd_attach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid) +int ibv_cmd_destroy_qp(struct ibv_qp *qp) { - struct ibv_attach_mcast cmd; + return rdma_cmd_destroy_qp((struct rdma_qp *) qp); +} + +int rdma_cmd_attach_mcast(struct rdma_qp *qp, union rdma_gid *gid, uint16_t lid) +{ + struct rdma_attach_mcast cmd; - IBV_INIT_CMD(&cmd, sizeof cmd, ATTACH_MCAST); + RDMA_INIT_CMD(&cmd, sizeof cmd, ATTACH_MCAST); memcpy(cmd.gid, gid->raw, sizeof cmd.gid); cmd.qp_handle = qp->handle; cmd.mlid = lid; @@ -1044,11 +1265,17 @@ int ibv_cmd_attach_mcast(struct ibv_qp * return 0; } -int ibv_cmd_detach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid) +int ibv_cmd_attach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid) +{ + return rdma_cmd_attach_mcast((struct rdma_qp *) qp, + (union rdma_gid *) gid, lid); +} + +int rdma_cmd_detach_mcast(struct rdma_qp *qp, union rdma_gid *gid, uint16_t lid) { - struct ibv_detach_mcast cmd; + struct rdma_detach_mcast cmd; - IBV_INIT_CMD(&cmd, sizeof cmd, DETACH_MCAST); + RDMA_INIT_CMD(&cmd, sizeof cmd, DETACH_MCAST); memcpy(cmd.gid, gid->raw, sizeof cmd.gid); cmd.qp_handle = qp->handle; cmd.mlid = lid; @@ -1058,3 +1285,9 @@ int ibv_cmd_detach_mcast(struct ibv_qp * return 0; } + +int ibv_cmd_detach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid) +{ + return rdma_cmd_detach_mcast((struct rdma_qp *) qp, + (union rdma_gid *) gid, lid); +} diff -ruNp ORG/libibverbs/src/device.c NEW/libibverbs/src/device.c --- ORG/libibverbs/src/device.c 2006-07-27 15:34:18.000000000 +0530 +++ NEW/libibverbs/src/device.c 2006-07-27 15:34:44.000000000 +0530 @@ -48,23 +48,23 @@ #include -#include "ibverbs.h" +#include "rdmaverbs.h" static pthread_mutex_t device_list_lock = 
PTHREAD_MUTEX_INITIALIZER; static int num_devices; -static struct ibv_device **device_list; +static struct rdma_device **device_list; -struct ibv_device **ibv_get_device_list(int *num) +struct rdma_device **rdma_get_device_list(int *num) { - struct ibv_device **l; + struct rdma_device **l; int i; pthread_mutex_lock(&device_list_lock); if (!num_devices) - num_devices = ibverbs_init(&device_list); + num_devices = rdmaverbs_init(&device_list); - l = calloc(num_devices + 1, sizeof (struct ibv_device *)); + l = calloc(num_devices + 1, sizeof (struct rdma_device *)); for (i = 0; i < num_devices; ++i) l[i] = device_list[i]; @@ -76,24 +76,40 @@ struct ibv_device **ibv_get_device_list( return l; } -void ibv_free_device_list(struct ibv_device **list) +struct ibv_device **ibv_get_device_list(int *num) +{ + return (struct ibv_device **) rdma_get_device_list(num); +} + +void rdma_free_device_list(struct rdma_device **list) { free(list); } -const char *ibv_get_device_name(struct ibv_device *device) +void ibv_free_device_list(struct ibv_device **list) +{ + rdma_free_device_list((struct rdma_device **) list); + +} + +const char *rdma_get_device_name(struct rdma_device *device) { return device->name; } -uint64_t ibv_get_device_guid(struct ibv_device *device) +const char *ibv_get_device_name(struct ibv_device *device) +{ + return rdma_get_device_name((struct rdma_device *) device); +} + +uint64_t rdma_get_device_guid(struct rdma_device *device) { char attr[24]; uint64_t guid = 0; uint16_t parts[4]; int i; - if (ibv_read_sysfs_file(device->ibdev_path, "node_guid", + if (rdma_read_sysfs_file(device->ibdev_path, "node_guid", attr, sizeof attr) < 0) return 0; @@ -107,11 +123,16 @@ uint64_t ibv_get_device_guid(struct ibv_ return htonll(guid); } -struct ibv_context *ibv_open_device(struct ibv_device *device) +uint64_t ibv_get_device_guid(struct ibv_device *device) +{ + return rdma_get_device_guid((struct rdma_device *) device); +} + +struct rdma_context *rdma_open_device(struct 
rdma_device *device) { char *devpath; int cmd_fd; - struct ibv_context *context; + struct rdma_context *context; asprintf(&devpath, "/dev/infiniband/%s", device->dev_name); @@ -140,14 +161,20 @@ err: return NULL; } -int ibv_close_device(struct ibv_context *context) +struct ibv_context *ibv_open_device(struct ibv_device *device) +{ + return (struct ibv_context *) + rdma_open_device((struct rdma_device *) device); +} + +int rdma_close_device(struct rdma_context *context) { int async_fd = context->async_fd; int cmd_fd = context->cmd_fd; int cq_fd = -1; if (abi_ver <= 2) { - struct ibv_abi_compat_v2 *t = context->abi_compat; + struct rdma_abi_compat_v2 *t = context->abi_compat; cq_fd = t->channel.fd; free(context->abi_compat); } @@ -162,10 +189,15 @@ int ibv_close_device(struct ibv_context return 0; } -int ibv_get_async_event(struct ibv_context *context, - struct ibv_async_event *event) +int ibv_close_device(struct ibv_context *context) +{ + return rdma_close_device((struct rdma_context *) context); +} + +int rdma_get_async_event(struct rdma_context *context, + struct rdma_async_event *event) { - struct ibv_kern_async_event ev; + struct rdma_kern_async_event ev; if (read(context->async_fd, &ev, sizeof ev) != sizeof ev) return -1; @@ -173,23 +205,23 @@ int ibv_get_async_event(struct ibv_conte event->event_type = ev.event_type; switch (event->event_type) { - case IBV_EVENT_CQ_ERR: + case RDMA_EVENT_CQ_ERR: event->element.cq = (void *) (uintptr_t) ev.element; break; - case IBV_EVENT_QP_FATAL: - case IBV_EVENT_QP_REQ_ERR: - case IBV_EVENT_QP_ACCESS_ERR: - case IBV_EVENT_COMM_EST: - case IBV_EVENT_SQ_DRAINED: - case IBV_EVENT_PATH_MIG: - case IBV_EVENT_PATH_MIG_ERR: - case IBV_EVENT_QP_LAST_WQE_REACHED: + case RDMA_EVENT_QP_FATAL: + case RDMA_EVENT_QP_REQ_ERR: + case RDMA_EVENT_QP_ACCESS_ERR: + case RDMA_EVENT_COMM_EST: + case RDMA_EVENT_SQ_DRAINED: + case RDMA_EVENT_PATH_MIG: + case RDMA_EVENT_PATH_MIG_ERR: + case RDMA_EVENT_QP_LAST_WQE_REACHED: event->element.qp = (void 
*) (uintptr_t) ev.element; break; - case IBV_EVENT_SRQ_ERR: - case IBV_EVENT_SRQ_LIMIT_REACHED: + case RDMA_EVENT_SRQ_ERR: + case RDMA_EVENT_SRQ_LIMIT_REACHED: event->element.srq = (void *) (uintptr_t) ev.element; break; @@ -201,12 +233,19 @@ int ibv_get_async_event(struct ibv_conte return 0; } -void ibv_ack_async_event(struct ibv_async_event *event) +int ibv_get_async_event(struct ibv_context *context, + struct ibv_async_event *event) +{ + return rdma_get_async_event((struct rdma_context *) context, + (struct rdma_async_event *) event); +} + +void rdma_ack_async_event(struct rdma_async_event *event) { switch (event->event_type) { - case IBV_EVENT_CQ_ERR: + case RDMA_EVENT_CQ_ERR: { - struct ibv_cq *cq = event->element.cq; + struct rdma_cq *cq = event->element.cq; pthread_mutex_lock(&cq->mutex); ++cq->async_events_completed; @@ -216,16 +255,16 @@ void ibv_ack_async_event(struct ibv_asyn return; } - case IBV_EVENT_QP_FATAL: - case IBV_EVENT_QP_REQ_ERR: - case IBV_EVENT_QP_ACCESS_ERR: - case IBV_EVENT_COMM_EST: - case IBV_EVENT_SQ_DRAINED: - case IBV_EVENT_PATH_MIG: - case IBV_EVENT_PATH_MIG_ERR: - case IBV_EVENT_QP_LAST_WQE_REACHED: + case RDMA_EVENT_QP_FATAL: + case RDMA_EVENT_QP_REQ_ERR: + case RDMA_EVENT_QP_ACCESS_ERR: + case RDMA_EVENT_COMM_EST: + case RDMA_EVENT_SQ_DRAINED: + case RDMA_EVENT_PATH_MIG: + case RDMA_EVENT_PATH_MIG_ERR: + case RDMA_EVENT_QP_LAST_WQE_REACHED: { - struct ibv_qp *qp = event->element.qp; + struct rdma_qp *qp = event->element.qp; pthread_mutex_lock(&qp->mutex); ++qp->events_completed; @@ -235,10 +274,10 @@ void ibv_ack_async_event(struct ibv_asyn return; } - case IBV_EVENT_SRQ_ERR: - case IBV_EVENT_SRQ_LIMIT_REACHED: + case RDMA_EVENT_SRQ_ERR: + case RDMA_EVENT_SRQ_LIMIT_REACHED: { - struct ibv_srq *srq = event->element.srq; + struct rdma_srq *srq = event->element.srq; pthread_mutex_lock(&srq->mutex); ++srq->events_completed; @@ -252,3 +291,8 @@ void ibv_ack_async_event(struct ibv_asyn return; } } + +void ibv_ack_async_event(struct 
ibv_async_event *event)
+{
+	rdma_ack_async_event((struct rdma_async_event *) event);
+}
diff -ruNp ORG/libibverbs/src/ibverbs.h NEW/libibverbs/src/ibverbs.h
--- ORG/libibverbs/src/ibverbs.h	2006-07-27 15:34:18.000000000 +0530
+++ NEW/libibverbs/src/ibverbs.h	1970-01-01 05:30:00.000000000 +0530
@@ -1,88 +0,0 @@
-/*
- * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses. You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- *     Redistribution and use in source and binary forms, with or
- *     without modification, are permitted provided that the following
- *     conditions are met:
- *
- *      - Redistributions of source code must retain the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer.
- *
- *      - Redistributions in binary form must reproduce the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer in the documentation and/or other materials
- *        provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- *
- * $Id: ibverbs.h 4466 2005-12-14 20:44:36Z roland $
- */
-
-#ifndef IB_VERBS_H
-#define IB_VERBS_H
-
-#include
-
-#include
-
-#define HIDDEN		__attribute__((visibility ("hidden")))
-
-#define INIT		__attribute__((constructor))
-#define FINI		__attribute__((destructor))
-
-#define PFX		"libibverbs: "
-
-struct ibv_driver {
-	ibv_driver_init_func	init_func;
-	struct ibv_driver      *next;
-};
-
-struct ibv_abi_compat_v2 {
-	struct ibv_comp_channel	channel;
-	pthread_mutex_t		in_use;
-};
-
-extern HIDDEN int abi_ver;
-
-extern HIDDEN int ibverbs_init(struct ibv_device ***list);
-
-extern HIDDEN int ibv_init_mem_map(void);
-extern HIDDEN int ibv_lock_range(void *base, size_t size);
-extern HIDDEN int ibv_unlock_range(void *base, size_t size);
-
-#define IBV_INIT_CMD(cmd, size, opcode)					\
-	do {								\
-		if (abi_ver > 2)					\
-			(cmd)->command = IB_USER_VERBS_CMD_##opcode;	\
-		else							\
-			(cmd)->command = IB_USER_VERBS_CMD_##opcode##_V2; \
-		(cmd)->in_words  = (size) / 4;				\
-		(cmd)->out_words = 0;					\
-	} while (0)
-
-#define IBV_INIT_CMD_RESP(cmd, size, opcode, out, outsize)		\
-	do {								\
-		if (abi_ver > 2)					\
-			(cmd)->command = IB_USER_VERBS_CMD_##opcode;	\
-		else							\
-			(cmd)->command = IB_USER_VERBS_CMD_##opcode##_V2; \
-		(cmd)->in_words  = (size) / 4;				\
-		(cmd)->out_words = (outsize) / 4;			\
-		(cmd)->response  = (uintptr_t) (out);			\
-	} while (0)
-
-#endif /* IB_VERBS_H */
diff -ruNp ORG/libibverbs/src/init.c NEW/libibverbs/src/init.c
--- ORG/libibverbs/src/init.c	2006-07-27 15:34:18.000000000 +0530
+++ NEW/libibverbs/src/init.c	2006-07-27 15:34:44.000000000 +0530
@@ -46,24 +46,28 @@
 #include
 #include
 
-#include "ibverbs.h"
+#include "rdmaverbs.h"
 
 #ifndef OPENIB_DRIVER_PATH_ENV
 # define OPENIB_DRIVER_PATH_ENV "OPENIB_DRIVER_PATH"
 #endif
 
+#ifndef LIBRDMAVERBS_DRIVER_PATH_ENV
+# define LIBRDMAVERBS_DRIVER_PATH_ENV "LIBRDMAVERBS_DRIVER_PATH"
+#endif
+
 HIDDEN int abi_ver;
 
 static char default_path[] = DRIVER_PATH;
 static const char *user_path;
 
-static struct ibv_driver *driver_list;
+static struct rdma_driver *driver_list;
 
 static void load_driver(char *so_path)
 {
 	void *dlhandle;
-	ibv_driver_init_func init_func;
-	struct ibv_driver *driver;
+	rdma_driver_init_func init_func;
+	struct rdma_driver *driver;
 
 	dlhandle = dlopen(so_path, RTLD_NOW);
 	if (!dlhandle) {
@@ -122,11 +126,11 @@ static void find_drivers(char *dir)
 	globfree(&so_glob);
 }
 
-static struct ibv_device *init_drivers(const char *class_path,
+static struct rdma_device *init_drivers(const char *class_path,
 				       const char *dev_name)
 {
-	struct ibv_driver *driver;
-	struct ibv_device *dev;
+	struct rdma_driver *driver;
+	struct rdma_device *dev;
 	int abi_ver = 0;
 	char sys_path[IBV_SYSFS_PATH_MAX];
 	char ibdev_name[IBV_SYSFS_NAME_MAX];
@@ -135,10 +139,12 @@ static struct ibv_device *init_drivers(c
 	snprintf(sys_path, sizeof sys_path, "%s/%s",
 		 class_path, dev_name);
 
-	if (ibv_read_sysfs_file(sys_path, "abi_version", value, sizeof value) > 0)
+	if (rdma_read_sysfs_file(sys_path, "abi_version", value,
+				 sizeof value) > 0)
 		abi_ver = strtol(value, NULL, 10);
 
-	if (ibv_read_sysfs_file(sys_path, "ibdev", ibdev_name, sizeof ibdev_name) < 0) {
+	if (rdma_read_sysfs_file(sys_path, "ibdev", ibdev_name,
+				 sizeof ibdev_name) < 0) {
 		fprintf(stderr, PFX "Warning: no ibdev class attr for %s\n",
 			sys_path);
 		return NULL;
@@ -151,8 +157,9 @@ static struct ibv_device *init_drivers(c
 
 	dev->driver = driver;
 	strcpy(dev->dev_path, sys_path);
-	snprintf(dev->ibdev_path, IBV_SYSFS_PATH_MAX, "%s/class/infiniband/%s",
-		 ibv_get_sysfs_path(), ibdev_name);
+	snprintf(dev->ibdev_path, IBV_SYSFS_PATH_MAX,
+		 "%s/class/infiniband/%s",
+		 rdma_get_sysfs_path(), ibdev_name);
 	strcpy(dev->dev_name, dev_name);
 	strcpy(dev->name, ibdev_name);
 
@@ -172,7 +179,7 @@ static int check_abi_version(const char
 {
 	char value[8];
 
-	if (ibv_read_sysfs_file(path, "class/infiniband_verbs/abi_version",
+	if (rdma_read_sysfs_file(path, "class/infiniband_verbs/abi_version",
 				value, sizeof value) < 0) {
 		fprintf(stderr, PFX "Fatal: couldn't read uverbs ABI version.\n");
 		return -1;
@@ -191,21 +198,21 @@ static int check_abi_version(const char
 	return 0;
 }
 
-HIDDEN int ibverbs_init(struct ibv_device ***list)
+HIDDEN int rdmaverbs_init(struct rdma_device ***list)
 {
 	const char *sysfs_path;
 	char *wr_path, *dir;
 	char class_path[IBV_SYSFS_PATH_MAX];
 	DIR *class_dir;
 	struct dirent *dent;
-	struct ibv_device *device;
-	struct ibv_device **new_list;
+	struct rdma_device *device;
+	struct rdma_device **new_list;
 	int num_devices = 0;
 	int list_size = 0;
 
 	*list = NULL;
 
-	if (ibv_init_mem_map())
+	if (rdma_init_mem_map())
 		return 0;
 
 	find_drivers(default_path);
@@ -215,12 +222,22 @@ HIDDEN int ibverbs_init(struct ibv_devic
 	 * environment if we're not running SUID.
 	 */
 	if (getuid() == geteuid()) {
-		user_path = getenv(OPENIB_DRIVER_PATH_ENV);
+		const char *user_path_extra;
+
+		user_path = getenv(LIBRDMAVERBS_DRIVER_PATH_ENV);
 		if (user_path) {
 			wr_path = strdupa(user_path);
 			while ((dir = strsep(&wr_path, ";:")))
 				find_drivers(dir);
 		}
+
+		/* for backwards compatibility */
+		user_path_extra = getenv(OPENIB_DRIVER_PATH_ENV);
+		if (user_path_extra) {
+			wr_path = strdupa(user_path_extra);
+			while ((dir = strsep(&wr_path, ";:")))
+				find_drivers(dir);
+		}
 	}
 
 	/*
@@ -230,7 +247,7 @@ HIDDEN int ibverbs_init(struct ibv_devic
 	 */
 	load_driver(NULL);
 
-	sysfs_path = ibv_get_sysfs_path();
+	sysfs_path = rdma_get_sysfs_path();
 	if (!sysfs_path) {
 		fprintf(stderr, PFX "Fatal: couldn't find sysfs mount.\n");
 		return 0;
@@ -258,7 +275,7 @@ HIDDEN int ibverbs_init(struct ibv_devic
 
 		if (list_size <= num_devices) {
 			list_size = list_size ? list_size * 2 : 1;
-			new_list = realloc(*list, list_size * sizeof (struct ibv_device *));
+			new_list = realloc(*list, list_size * sizeof (struct rdma_device *));
 			if (!new_list)
 				goto out;
 			*list = new_list;
diff -ruNp ORG/libibverbs/src/libibverbs.map NEW/libibverbs/src/libibverbs.map
--- ORG/libibverbs/src/libibverbs.map	2006-07-27 15:34:18.000000000 +0530
+++ NEW/libibverbs/src/libibverbs.map	1970-01-01 05:30:00.000000000 +0530
@@ -1,79 +0,0 @@
-IBVERBS_1.0 {
-	global:
-		ibv_get_device_list;
-		ibv_free_device_list;
-		ibv_get_device_name;
-		ibv_get_device_guid;
-		ibv_open_device;
-		ibv_close_device;
-		ibv_get_async_event;
-		ibv_ack_async_event;
-		ibv_query_device;
-		ibv_query_port;
-		ibv_query_gid;
-		ibv_query_pkey;
-		ibv_alloc_pd;
-		ibv_dealloc_pd;
-		ibv_reg_mr;
-		ibv_dereg_mr;
-		ibv_create_comp_channel;
-		ibv_destroy_comp_channel;
-		ibv_create_cq;
-		ibv_resize_cq;
-		ibv_destroy_cq;
-		ibv_get_cq_event;
-		ibv_ack_cq_events;
-		ibv_create_srq;
-		ibv_modify_srq;
-		ibv_query_srq;
-		ibv_destroy_srq;
-		ibv_create_qp;
-		ibv_query_qp;
-		ibv_modify_qp;
-		ibv_destroy_qp;
-		ibv_create_ah;
-		ibv_init_ah_from_wc;
-		ibv_create_ah_from_wc;
-		ibv_destroy_ah;
-		ibv_attach_mcast;
-		ibv_detach_mcast;
-		ibv_cmd_get_context;
-		ibv_cmd_query_device;
-		ibv_cmd_query_port;
-		ibv_cmd_query_gid;
-		ibv_cmd_query_pkey;
-		ibv_cmd_alloc_pd;
-		ibv_cmd_dealloc_pd;
-		ibv_cmd_reg_mr;
-		ibv_cmd_dereg_mr;
-		ibv_cmd_create_cq;
-		ibv_cmd_poll_cq;
-		ibv_cmd_req_notify_cq;
-		ibv_cmd_resize_cq;
-		ibv_cmd_destroy_cq;
-		ibv_cmd_create_srq;
-		ibv_cmd_modify_srq;
-		ibv_cmd_query_srq;
-		ibv_cmd_destroy_srq;
-		ibv_cmd_create_qp;
-		ibv_cmd_query_qp;
-		ibv_cmd_modify_qp;
-		ibv_cmd_destroy_qp;
-		ibv_cmd_post_send;
-		ibv_cmd_post_recv;
-		ibv_cmd_post_srq_recv;
-		ibv_cmd_create_ah;
-		ibv_cmd_destroy_ah;
-		ibv_cmd_attach_mcast;
-		ibv_cmd_detach_mcast;
-		ibv_copy_qp_attr_from_kern;
-		ibv_copy_ah_attr_from_kern;
-		ibv_copy_path_rec_from_kern;
-		ibv_copy_path_rec_to_kern;
-		ibv_rate_to_mult;
-		mult_to_ibv_rate;
-		ibv_get_sysfs_path;
-		ibv_read_sysfs_file;
-
-	local: *;
-};
diff -ruNp ORG/libibverbs/src/librdmaverbs.map NEW/libibverbs/src/librdmaverbs.map
--- ORG/libibverbs/src/librdmaverbs.map	1970-01-01 05:30:00.000000000 +0530
+++ NEW/libibverbs/src/librdmaverbs.map	2006-07-27 15:34:44.000000000 +0530
@@ -0,0 +1,154 @@
+IBVERBS_1.0 {
+	global:
+		ibv_get_device_list;
+		ibv_free_device_list;
+		ibv_get_device_name;
+		ibv_get_device_guid;
+		ibv_open_device;
+		ibv_close_device;
+		ibv_get_async_event;
+		ibv_ack_async_event;
+		ibv_query_device;
+		ibv_query_port;
+		ibv_query_gid;
+		ibv_query_pkey;
+		ibv_alloc_pd;
+		ibv_dealloc_pd;
+		ibv_reg_mr;
+		ibv_dereg_mr;
+		ibv_create_comp_channel;
+		ibv_destroy_comp_channel;
+		ibv_create_cq;
+		ibv_resize_cq;
+		ibv_destroy_cq;
+		ibv_get_cq_event;
+		ibv_ack_cq_events;
+		ibv_create_srq;
+		ibv_modify_srq;
+		ibv_query_srq;
+		ibv_destroy_srq;
+		ibv_create_qp;
+		ibv_query_qp;
+		ibv_modify_qp;
+		ibv_destroy_qp;
+		ibv_create_ah;
+		ibv_init_ah_from_wc;
+		ibv_create_ah_from_wc;
+		ibv_destroy_ah;
+		ibv_attach_mcast;
+		ibv_detach_mcast;
+		ibv_cmd_get_context;
+		ibv_cmd_query_device;
+		ibv_cmd_query_port;
+		ibv_cmd_query_gid;
+		ibv_cmd_query_pkey;
+		ibv_cmd_alloc_pd;
+		ibv_cmd_dealloc_pd;
+		ibv_cmd_reg_mr;
+		ibv_cmd_dereg_mr;
+		ibv_cmd_create_cq;
+		ibv_cmd_poll_cq;
+		ibv_cmd_req_notify_cq;
+		ibv_cmd_resize_cq;
+		ibv_cmd_destroy_cq;
+		ibv_cmd_create_srq;
+		ibv_cmd_modify_srq;
+		ibv_cmd_query_srq;
+		ibv_cmd_destroy_srq;
+		ibv_cmd_create_qp;
+		ibv_cmd_query_qp;
+		ibv_cmd_modify_qp;
+		ibv_cmd_destroy_qp;
+		ibv_cmd_post_send;
+		ibv_cmd_post_recv;
+		ibv_cmd_post_srq_recv;
+		ibv_cmd_create_ah;
+		ibv_cmd_destroy_ah;
+		ibv_cmd_attach_mcast;
+		ibv_cmd_detach_mcast;
+		ibv_copy_qp_attr_from_kern;
+		ibv_copy_ah_attr_from_kern;
+		ibv_copy_path_rec_from_kern;
+		ibv_copy_path_rec_to_kern;
+		ibv_rate_to_mult;
+		mult_to_ibv_rate;
+		ibv_get_sysfs_path;
+		ibv_read_sysfs_file;
+
+		rdma_get_device_list;
+		rdma_free_device_list;
+		rdma_get_device_name;
+		rdma_get_device_guid;
+ rdma_open_device; + rdma_close_device; + rdma_get_async_event; + rdma_ack_async_event; + rdma_query_device; + rdma_query_port; + rdma_query_gid; + rdma_query_pkey; + rdma_alloc_pd; + rdma_dealloc_pd; + rdma_reg_mr; + rdma_dereg_mr; + rdma_create_comp_channel; + rdma_destroy_comp_channel; + rdma_create_cq; + rdma_resize_cq; + rdma_destroy_cq; + rdma_get_cq_event; + rdma_ack_cq_events; + rdma_create_srq; + rdma_modify_srq; + rdma_query_srq; + rdma_destroy_srq; + rdmav_create_qp; + rdma_query_qp; + rdma_modify_qp; + rdmav_destroy_qp; + rdma_create_ah; + rdma_init_ah_from_wc; + rdma_create_ah_from_wc; + rdma_destroy_ah; + rdma_attach_mcast; + rdma_detach_mcast; + rdma_cmd_get_context; + rdma_cmd_query_device; + rdma_cmd_query_port; + rdma_cmd_query_gid; + rdma_cmd_query_pkey; + rdma_cmd_alloc_pd; + rdma_cmd_dealloc_pd; + rdma_cmd_reg_mr; + rdma_cmd_dereg_mr; + rdma_cmd_create_cq; + rdma_cmd_poll_cq; + rdma_cmd_req_notify_cq; + rdma_cmd_resize_cq; + rdma_cmd_destroy_cq; + rdma_cmd_create_srq; + rdma_cmd_modify_srq; + rdma_cmd_query_srq; + rdma_cmd_destroy_srq; + rdma_cmd_create_qp; + rdma_cmd_query_qp; + rdma_cmd_modify_qp; + rdma_cmd_destroy_qp; + rdma_cmd_post_send; + rdma_cmd_post_recv; + rdma_cmd_post_srq_recv; + rdma_cmd_create_ah; + rdma_cmd_destroy_ah; + rdma_cmd_attach_mcast; + rdma_cmd_detach_mcast; + rdma_copy_qp_attr_from_kern; + rdma_copy_ah_attr_from_kern; + rdma_copy_path_rec_from_kern; + rdma_copy_path_rec_to_kern; + rdma_rate_to_mult; + mult_to_rdma_rate; + rdma_get_sysfs_path; + rdma_read_sysfs_file; + + local: *; +}; diff -ruNp ORG/libibverbs/src/marshall.c NEW/libibverbs/src/marshall.c --- ORG/libibverbs/src/marshall.c 2006-07-27 15:34:18.000000000 +0530 +++ NEW/libibverbs/src/marshall.c 2006-07-27 15:34:44.000000000 +0530 @@ -38,8 +38,8 @@ #include -void ibv_copy_ah_attr_from_kern(struct ibv_ah_attr *dst, - struct ibv_kern_ah_attr *src) +void rdma_copy_ah_attr_from_kern(struct rdma_ah_attr *dst, + struct rdma_kern_ah_attr *src) { 
 	memcpy(dst->grh.dgid.raw, src->grh.dgid, sizeof dst->grh.dgid);
 	dst->grh.flow_label = src->grh.flow_label;
@@ -55,8 +55,15 @@ void ibv_copy_ah_attr_from_kern(struct i
 	dst->port_num = src->port_num;
 }
 
-void ibv_copy_qp_attr_from_kern(struct ibv_qp_attr *dst,
-				struct ibv_kern_qp_attr *src)
+void ibv_copy_ah_attr_from_kern(struct ibv_ah_attr *dst,
+				struct ibv_kern_ah_attr *src)
+{
+	rdma_copy_ah_attr_from_kern((struct rdma_ah_attr *) dst,
+				    (struct rdma_kern_ah_attr *) src);
+}
+
+void rdma_copy_qp_attr_from_kern(struct rdma_qp_attr *dst,
+				 struct rdma_kern_qp_attr *src)
 {
 	dst->cur_qp_state = src->cur_qp_state;
 	dst->path_mtu = src->path_mtu;
@@ -73,8 +80,8 @@ void ibv_copy_qp_attr_from_kern(struct i
 	dst->cap.max_recv_sge = src->max_recv_sge;
 	dst->cap.max_inline_data = src->max_inline_data;
 
-	ibv_copy_ah_attr_from_kern(&dst->ah_attr, &src->ah_attr);
-	ibv_copy_ah_attr_from_kern(&dst->alt_ah_attr, &src->alt_ah_attr);
+	rdma_copy_ah_attr_from_kern(&dst->ah_attr, &src->ah_attr);
+	rdma_copy_ah_attr_from_kern(&dst->alt_ah_attr, &src->alt_ah_attr);
 
 	dst->pkey_index = src->pkey_index;
 	dst->alt_pkey_index = src->alt_pkey_index;
@@ -91,8 +98,15 @@ void ibv_copy_qp_attr_from_kern(struct i
 	dst->alt_timeout = src->alt_timeout;
 }
 
-void ibv_copy_path_rec_from_kern(struct ibv_sa_path_rec *dst,
-				 struct ibv_kern_path_rec *src)
+void ibv_copy_qp_attr_from_kern(struct ibv_qp_attr *dst,
+				struct ibv_kern_qp_attr *src)
+{
+	rdma_copy_qp_attr_from_kern((struct rdma_qp_attr *) dst,
+				    (struct rdma_kern_qp_attr *) src);
+}
+
+void rdma_copy_path_rec_from_kern(struct rdma_sa_path_rec *dst,
+				  struct rdma_kern_path_rec *src)
 {
 	memcpy(dst->dgid.raw, src->dgid, sizeof dst->dgid);
 	memcpy(dst->sgid.raw, src->sgid, sizeof dst->sgid);
@@ -116,8 +130,15 @@ void ibv_copy_path_rec_from_kern(struct
 	dst->packet_life_time_selector = src->packet_life_time_selector;
 }
 
-void ibv_copy_path_rec_to_kern(struct ibv_kern_path_rec *dst,
-			       struct ibv_sa_path_rec *src)
+void ibv_copy_path_rec_from_kern(struct
ibv_sa_path_rec *dst,
+				 struct ibv_kern_path_rec *src)
+{
+	rdma_copy_path_rec_from_kern((struct rdma_sa_path_rec *) dst,
+				     (struct rdma_kern_path_rec *) src);
+}
+
+void rdma_copy_path_rec_to_kern(struct rdma_kern_path_rec *dst,
+				struct rdma_sa_path_rec *src)
 {
 	memcpy(dst->dgid, src->dgid.raw, sizeof src->dgid);
 	memcpy(dst->sgid, src->sgid.raw, sizeof src->sgid);
@@ -140,3 +161,10 @@ void ibv_copy_path_rec_to_kern(struct ib
 	dst->preference = src->preference;
 	dst->packet_life_time_selector = src->packet_life_time_selector;
 }
+
+void ibv_copy_path_rec_to_kern(struct ibv_kern_path_rec *dst,
+			       struct ibv_sa_path_rec *src)
+{
+	rdma_copy_path_rec_to_kern((struct rdma_kern_path_rec *) dst,
+				   (struct rdma_sa_path_rec *) src);
+}
diff -ruNp ORG/libibverbs/src/memory.c NEW/libibverbs/src/memory.c
--- ORG/libibverbs/src/memory.c 2006-07-27 15:34:18.000000000 +0530
+++ NEW/libibverbs/src/memory.c 2006-07-27 15:34:44.000000000 +0530
@@ -41,7 +41,7 @@
 #include
 #include
 
-#include "ibverbs.h"
+#include "rdmaverbs.h"
 
 /*
  * We keep a linked list of page ranges that have been locked along with a
@@ -51,21 +51,21 @@
  * to avoid the O(n) cost of registering/unregistering memory.
*/ -struct ibv_mem_node { - struct ibv_mem_node *prev, *next; +struct rdma_mem_node { + struct rdma_mem_node *prev, *next; uintptr_t start, end; int refcnt; }; static struct { - struct ibv_mem_node *first; + struct rdma_mem_node *first; pthread_mutex_t mutex; uintptr_t page_size; } mem_map; -int ibv_init_mem_map(void) +int rdma_init_mem_map(void) { - struct ibv_mem_node *node = NULL; + struct rdma_mem_node *node = NULL; node = malloc(sizeof *node); if (!node) @@ -94,9 +94,9 @@ fail: return -1; } -static struct ibv_mem_node *__mm_find_first(uintptr_t start, uintptr_t end) +static struct rdma_mem_node *__mm_find_first(uintptr_t start, uintptr_t end) { - struct ibv_mem_node *node = mem_map.first; + struct rdma_mem_node *node = mem_map.first; while (node) { if ((node->start <= start && node->end >= start) || @@ -108,18 +108,18 @@ static struct ibv_mem_node *__mm_find_fi return node; } -static struct ibv_mem_node *__mm_prev(struct ibv_mem_node *node) +static struct rdma_mem_node *__mm_prev(struct rdma_mem_node *node) { return node->prev; } -static struct ibv_mem_node *__mm_next(struct ibv_mem_node *node) +static struct rdma_mem_node *__mm_next(struct rdma_mem_node *node) { return node->next; } -static void __mm_add(struct ibv_mem_node *node, - struct ibv_mem_node *new) +static void __mm_add(struct rdma_mem_node *node, + struct rdma_mem_node *new) { new->prev = node; new->next = node->next; @@ -128,7 +128,7 @@ static void __mm_add(struct ibv_mem_node new->next->prev = new; } -static void __mm_remove(struct ibv_mem_node *node) +static void __mm_remove(struct rdma_mem_node *node) { /* Never have to remove the first node, so we can use prev */ node->prev->next = node->next; @@ -136,10 +136,10 @@ static void __mm_remove(struct ibv_mem_n node->next->prev = node->prev; } -int ibv_lock_range(void *base, size_t size) +int rdma_lock_range(void *base, size_t size) { uintptr_t start, end; - struct ibv_mem_node *node, *tmp; + struct rdma_mem_node *node, *tmp; int ret = 0; if (!size) 
@@ -202,10 +202,15 @@ out: return ret; } -int ibv_unlock_range(void *base, size_t size) +int ibv_lock_range(void *base, size_t size) +{ + return rdma_lock_range(base, size); +} + +int rdma_unlock_range(void *base, size_t size) { uintptr_t start, end; - struct ibv_mem_node *node, *tmp; + struct rdma_mem_node *node, *tmp; int ret = 0; if (!size) @@ -256,3 +261,8 @@ out: return ret; } + +int ibv_unlock_range(void *base, size_t size) +{ + return rdma_unlock_range(base, size); +} diff -ruNp ORG/libibverbs/src/rdmaverbs.h NEW/libibverbs/src/rdmaverbs.h --- ORG/libibverbs/src/rdmaverbs.h 1970-01-01 05:30:00.000000000 +0530 +++ NEW/libibverbs/src/rdmaverbs.h 2006-07-27 15:34:44.000000000 +0530 @@ -0,0 +1,125 @@ +/* + * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id: ibverbs.h 4466 2005-12-14 20:44:36Z roland $ + */ + +#ifndef SRC_RDMA_VERBS_H +#define SRC_RDMA_VERBS_H + +#include + +#include + +#define HIDDEN __attribute__((visibility ("hidden"))) + +#define INIT __attribute__((constructor)) +#define FINI __attribute__((destructor)) + +#ifndef PFX +#define PFX "librdmaverbs: " +#endif + +struct ibv_driver { + ibv_driver_init_func init_func; + struct ibv_driver *next; +}; + +struct ibv_abi_compat_v2 { + struct ibv_comp_channel channel; + pthread_mutex_t in_use; +}; + +extern HIDDEN int abi_ver; + +#define IBV_INIT_CMD(cmd, size, opcode) \ + do { \ + if (abi_ver > 2) \ + (cmd)->command = IB_USER_VERBS_CMD_##opcode; \ + else \ + (cmd)->command = IB_USER_VERBS_CMD_##opcode##_V2; \ + (cmd)->in_words = (size) / 4; \ + (cmd)->out_words = 0; \ + } while (0) + +#define IBV_INIT_CMD_RESP(cmd, size, opcode, out, outsize) \ + do { \ + if (abi_ver > 2) \ + (cmd)->command = IB_USER_VERBS_CMD_##opcode; \ + else \ + (cmd)->command = IB_USER_VERBS_CMD_##opcode##_V2; \ + (cmd)->in_words = (size) / 4; \ + (cmd)->out_words = (outsize) / 4; \ + (cmd)->response = (uintptr_t) (out); \ + } while (0) + +/* Generic RDMA declarations follow */ + +struct rdma_driver { + rdma_driver_init_func init_func; + struct rdma_driver *next; +}; + +struct rdma_abi_compat_v2 { + struct rdma_comp_channel channel; + pthread_mutex_t in_use; +}; + +extern HIDDEN int abi_ver; + +extern HIDDEN int rdmaverbs_init(struct rdma_device ***list); + +extern HIDDEN int rdma_init_mem_map(void); +extern HIDDEN int rdma_lock_range(void *base, size_t size); +extern HIDDEN int rdma_unlock_range(void *base, size_t size); + +#define RDMA_INIT_CMD(cmd, size, opcode) \ + do { \ + if (abi_ver > 2) \ + 
(cmd)->command = RDMA_USER_VERBS_CMD_##opcode; \ + else \ + (cmd)->command = RDMA_USER_VERBS_CMD_##opcode##_V2; \ + (cmd)->in_words = (size) / 4; \ + (cmd)->out_words = 0; \ + } while (0) + +#define RDMA_INIT_CMD_RESP(cmd, size, opcode, out, outsize) \ + do { \ + if (abi_ver > 2) \ + (cmd)->command = RDMA_USER_VERBS_CMD_##opcode; \ + else \ + (cmd)->command = RDMA_USER_VERBS_CMD_##opcode##_V2; \ + (cmd)->in_words = (size) / 4; \ + (cmd)->out_words = (outsize) / 4; \ + (cmd)->response = (uintptr_t) (out); \ + } while (0) + +#endif /* SRC_RDMA_VERBS_H */ diff -ruNp ORG/libibverbs/src/sysfs.c NEW/libibverbs/src/sysfs.c --- ORG/libibverbs/src/sysfs.c 2006-07-27 15:34:18.000000000 +0530 +++ NEW/libibverbs/src/sysfs.c 2006-07-27 15:34:44.000000000 +0530 @@ -44,11 +44,11 @@ #include #include -#include "ibverbs.h" +#include "rdmaverbs.h" static char *sysfs_path; -const char *ibv_get_sysfs_path(void) +const char *rdma_get_sysfs_path(void) { char *env = NULL; @@ -77,7 +77,12 @@ const char *ibv_get_sysfs_path(void) return sysfs_path; } -int ibv_read_sysfs_file(const char *dir, const char *file, +const char *ibv_get_sysfs_path(void) +{ + return rdma_get_sysfs_path(); +} + +int rdma_read_sysfs_file(const char *dir, const char *file, char *buf, size_t size) { char *path; @@ -102,3 +107,9 @@ int ibv_read_sysfs_file(const char *dir, return len; } + +int ibv_read_sysfs_file(const char *dir, const char *file, + char *buf, size_t size) +{ + return rdma_read_sysfs_file(dir, file, buf, size); +} diff -ruNp ORG/libibverbs/src/verbs.c NEW/libibverbs/src/verbs.c --- ORG/libibverbs/src/verbs.c 2006-07-27 15:34:18.000000000 +0530 +++ NEW/libibverbs/src/verbs.c 2006-07-27 15:34:44.000000000 +0530 @@ -44,9 +44,9 @@ #include #include -#include "ibverbs.h" +#include "rdmaverbs.h" -int ibv_rate_to_mult(enum ibv_rate rate) +int rdma_rate_to_mult(enum rdma_rate rate) { switch (rate) { case IBV_RATE_2_5_GBPS: return 1; @@ -62,7 +62,12 @@ int ibv_rate_to_mult(enum ibv_rate rate) } } -enum ibv_rate 
mult_to_ibv_rate(int mult) +int ibv_rate_to_mult(enum ibv_rate rate) +{ + return rdma_rate_to_mult(rate); +} + +enum rdma_rate mult_to_rdma_rate(int mult) { switch (mult) { case 1: return IBV_RATE_2_5_GBPS; @@ -78,20 +83,39 @@ enum ibv_rate mult_to_ibv_rate(int mult) } } +enum ibv_rate mult_to_ibv_rate(int mult) +{ + return mult_to_rdma_rate(mult); +} + +int rdma_query_device(struct rdma_context *context, + struct rdma_device_attr *device_attr) +{ + return context->ops.query_device(context, device_attr); +} + int ibv_query_device(struct ibv_context *context, struct ibv_device_attr *device_attr) { - return context->ops.query_device(context, device_attr); + return rdma_query_device((struct rdma_context *) context, + (struct rdma_device_attr *) device_attr); +} + +int rdma_query_port(struct rdma_context *context, uint8_t port_num, + struct rdma_port_attr *port_attr) +{ + return context->ops.query_port(context, port_num, port_attr); } int ibv_query_port(struct ibv_context *context, uint8_t port_num, struct ibv_port_attr *port_attr) { - return context->ops.query_port(context, port_num, port_attr); + return rdma_query_port((struct rdma_context *) context, port_num, + (struct rdma_port_attr *) port_attr); } -int ibv_query_gid(struct ibv_context *context, uint8_t port_num, - int index, union ibv_gid *gid) +int rdma_query_gid(struct rdma_context *context, uint8_t port_num, + int index, union rdma_gid *gid) { char name[24]; char attr[41]; @@ -100,7 +124,7 @@ int ibv_query_gid(struct ibv_context *co snprintf(name, sizeof name, "ports/%d/gids/%d", port_num, index); - if (ibv_read_sysfs_file(context->device->ibdev_path, name, + if (rdma_read_sysfs_file(context->device->ibdev_path, name, attr, sizeof attr) < 0) return -1; @@ -114,7 +138,14 @@ int ibv_query_gid(struct ibv_context *co return 0; } -int ibv_query_pkey(struct ibv_context *context, uint8_t port_num, +int ibv_query_gid(struct ibv_context *context, uint8_t port_num, + int index, union ibv_gid *gid) +{ + return 
rdma_query_gid((struct rdma_context *) context, port_num, index, + (union rdma_gid *) gid); +} + +int rdma_query_pkey(struct rdma_context *context, uint8_t port_num, int index, uint16_t *pkey) { char name[24]; @@ -123,7 +154,7 @@ int ibv_query_pkey(struct ibv_context *c snprintf(name, sizeof name, "ports/%d/pkeys/%d", port_num, index); - if (ibv_read_sysfs_file(context->device->ibdev_path, name, + if (rdma_read_sysfs_file(context->device->ibdev_path, name, attr, sizeof attr) < 0) return -1; @@ -134,9 +165,16 @@ int ibv_query_pkey(struct ibv_context *c return 0; } -struct ibv_pd *ibv_alloc_pd(struct ibv_context *context) +int ibv_query_pkey(struct ibv_context *context, uint8_t port_num, + int index, uint16_t *pkey) +{ + return rdma_query_pkey((struct rdma_context *) context, port_num, index, + pkey); +} + +struct rdma_pd *rdma_alloc_pd(struct rdma_context *context) { - struct ibv_pd *pd; + struct rdma_pd *pd; pd = context->ops.alloc_pd(context); if (pd) @@ -145,15 +183,25 @@ struct ibv_pd *ibv_alloc_pd(struct ibv_c return pd; } -int ibv_dealloc_pd(struct ibv_pd *pd) +struct ibv_pd *ibv_alloc_pd(struct ibv_context *context) +{ + return (struct ibv_pd *) rdma_alloc_pd((struct rdma_context *) context); +} + +int rdma_dealloc_pd(struct rdma_pd *pd) { return pd->context->ops.dealloc_pd(pd); } -struct ibv_mr *ibv_reg_mr(struct ibv_pd *pd, void *addr, - size_t length, enum ibv_access_flags access) +int ibv_dealloc_pd(struct ibv_pd *pd) +{ + return rdma_dealloc_pd((struct rdma_pd *) pd); +} + +struct rdma_mr *rdma_reg_mr(struct rdma_pd *pd, void *addr, + size_t length, enum rdma_access_flags access) { - struct ibv_mr *mr; + struct rdma_mr *mr; mr = pd->context->ops.reg_mr(pd, addr, length, access); if (mr) { @@ -164,14 +212,26 @@ struct ibv_mr *ibv_reg_mr(struct ibv_pd return mr; } -int ibv_dereg_mr(struct ibv_mr *mr) +struct ibv_mr *ibv_reg_mr(struct ibv_pd *pd, void *addr, + size_t length, enum ibv_access_flags access) +{ + return (struct ibv_mr *) rdma_reg_mr((struct 
rdma_pd *) pd, addr, + length, access); +} + +int rdma_dereg_mr(struct rdma_mr *mr) { return mr->context->ops.dereg_mr(mr); } -static struct ibv_comp_channel *ibv_create_comp_channel_v2(struct ibv_context *context) +int ibv_dereg_mr(struct ibv_mr *mr) { - struct ibv_abi_compat_v2 *t = context->abi_compat; + return rdma_dereg_mr((struct rdma_mr *) mr); +} + +static struct rdma_comp_channel *rdma_create_comp_channel_v2(struct rdma_context *context) +{ + struct rdma_abi_compat_v2 *t = context->abi_compat; static int warned; if (!pthread_mutex_trylock(&t->in_use)) @@ -187,14 +247,14 @@ static struct ibv_comp_channel *ibv_crea return NULL; } -struct ibv_comp_channel *ibv_create_comp_channel(struct ibv_context *context) +struct rdma_comp_channel *rdma_create_comp_channel(struct rdma_context *context) { - struct ibv_comp_channel *channel; - struct ibv_create_comp_channel cmd; - struct ibv_create_comp_channel_resp resp; + struct rdma_comp_channel *channel; + struct rdma_create_comp_channel cmd; + struct rdma_create_comp_channel_resp resp; if (abi_ver <= 2) - return ibv_create_comp_channel_v2(context); + return rdma_create_comp_channel_v2(context); channel = malloc(sizeof *channel); if (!channel) @@ -211,17 +271,22 @@ struct ibv_comp_channel *ibv_create_comp return channel; } -static int ibv_destroy_comp_channel_v2(struct ibv_comp_channel *channel) +struct ibv_comp_channel *ibv_create_comp_channel(struct ibv_context *context) +{ + return (struct ibv_comp_channel *) rdma_create_comp_channel((struct rdma_context *) context); +} + +static int rdma_destroy_comp_channel_v2(struct rdma_comp_channel *channel) { - struct ibv_abi_compat_v2 *t = (struct ibv_abi_compat_v2 *) channel; + struct rdma_abi_compat_v2 *t = (struct rdma_abi_compat_v2 *) channel; pthread_mutex_unlock(&t->in_use); return 0; } -int ibv_destroy_comp_channel(struct ibv_comp_channel *channel) +int rdma_destroy_comp_channel(struct rdma_comp_channel *channel) { if (abi_ver <= 2) - return 
ibv_destroy_comp_channel_v2(channel); + return rdma_destroy_comp_channel_v2(channel); close(channel->fd); free(channel); @@ -229,10 +294,17 @@ int ibv_destroy_comp_channel(struct ibv_ return 0; } -struct ibv_cq *ibv_create_cq(struct ibv_context *context, int cqe, void *cq_context, - struct ibv_comp_channel *channel, int comp_vector) +int ibv_destroy_comp_channel(struct ibv_comp_channel *channel) { - struct ibv_cq *cq = context->ops.create_cq(context, cqe, channel, + return rdma_destroy_comp_channel((struct rdma_comp_channel *) channel); +} + +struct rdma_cq *rdma_create_cq(struct rdma_context *context, int cqe, + void *cq_context, + struct rdma_comp_channel *channel, + int comp_vector) +{ + struct rdma_cq *cq = context->ops.create_cq(context, cqe, channel, comp_vector); if (cq) { @@ -247,7 +319,17 @@ struct ibv_cq *ibv_create_cq(struct ibv_ return cq; } -int ibv_resize_cq(struct ibv_cq *cq, int cqe) +struct ibv_cq *ibv_create_cq(struct ibv_context *context, int cqe, + void *cq_context, + struct ibv_comp_channel *channel, int comp_vector) +{ + return (struct ibv_cq *) rdma_create_cq((struct rdma_context *) context, + cqe, cq_context, (struct + rdma_comp_channel *) channel, + comp_vector); +} + +int rdma_resize_cq(struct rdma_cq *cq, int cqe) { if (!cq->context->ops.resize_cq) return ENOSYS; @@ -255,21 +337,30 @@ int ibv_resize_cq(struct ibv_cq *cq, int return cq->context->ops.resize_cq(cq, cqe); } -int ibv_destroy_cq(struct ibv_cq *cq) +int ibv_resize_cq(struct ibv_cq *cq, int cqe) +{ + return rdma_resize_cq((struct rdma_cq *) cq, cqe); +} + +int rdma_destroy_cq(struct rdma_cq *cq) { return cq->context->ops.destroy_cq(cq); } +int ibv_destroy_cq(struct ibv_cq *cq) +{ + return rdma_destroy_cq((struct rdma_cq *) cq); +} -int ibv_get_cq_event(struct ibv_comp_channel *channel, - struct ibv_cq **cq, void **cq_context) +int rdma_get_cq_event(struct rdma_comp_channel *channel, + struct rdma_cq **cq, void **cq_context) { - struct ibv_comp_event ev; + struct rdma_comp_event 
ev; if (read(channel->fd, &ev, sizeof ev) != sizeof ev) return -1; - *cq = (struct ibv_cq *) (uintptr_t) ev.cq_handle; + *cq = (struct rdma_cq *) (uintptr_t) ev.cq_handle; *cq_context = (*cq)->cq_context; if ((*cq)->context->ops.cq_event) @@ -278,7 +369,14 @@ int ibv_get_cq_event(struct ibv_comp_cha return 0; } -void ibv_ack_cq_events(struct ibv_cq *cq, unsigned int nevents) +int ibv_get_cq_event(struct ibv_comp_channel *channel, + struct ibv_cq **cq, void **cq_context) +{ + return rdma_get_cq_event((struct rdma_comp_channel *) channel, + (struct rdma_cq **) cq, cq_context); +} + +void rdma_ack_cq_events(struct rdma_cq *cq, unsigned int nevents) { pthread_mutex_lock(&cq->mutex); cq->comp_events_completed += nevents; @@ -286,10 +384,15 @@ void ibv_ack_cq_events(struct ibv_cq *cq pthread_mutex_unlock(&cq->mutex); } -struct ibv_srq *ibv_create_srq(struct ibv_pd *pd, - struct ibv_srq_init_attr *srq_init_attr) +void ibv_ack_cq_events(struct ibv_cq *cq, unsigned int nevents) +{ + rdma_ack_cq_events((struct rdma_cq *) cq, nevents); +} + +struct rdma_srq *rdma_create_srq(struct rdma_pd *pd, + struct rdma_srq_init_attr *srq_init_attr) { - struct ibv_srq *srq; + struct rdma_srq *srq; if (!pd->context->ops.create_srq) return NULL; @@ -307,27 +410,55 @@ struct ibv_srq *ibv_create_srq(struct ib return srq; } +struct ibv_srq *ibv_create_srq(struct ibv_pd *pd, + struct ibv_srq_init_attr *srq_init_attr) +{ + return (struct ibv_srq *) rdma_create_srq((struct rdma_pd *) pd, + (struct rdma_srq_init_attr *) + srq_init_attr); +} + +int rdma_modify_srq(struct rdma_srq *srq, + struct rdma_srq_attr *srq_attr, + enum rdma_srq_attr_mask srq_attr_mask) +{ + return srq->context->ops.modify_srq(srq, srq_attr, srq_attr_mask); +} + int ibv_modify_srq(struct ibv_srq *srq, struct ibv_srq_attr *srq_attr, enum ibv_srq_attr_mask srq_attr_mask) { - return srq->context->ops.modify_srq(srq, srq_attr, srq_attr_mask); + return rdma_modify_srq((struct rdma_srq *) srq, + (struct rdma_srq_attr *) srq_attr, + 
srq_attr_mask); } -int ibv_query_srq(struct ibv_srq *srq, struct ibv_srq_attr *srq_attr) +int rdma_query_srq(struct rdma_srq *srq, struct rdma_srq_attr *srq_attr) { return srq->context->ops.query_srq(srq, srq_attr); } -int ibv_destroy_srq(struct ibv_srq *srq) +int ibv_query_srq(struct ibv_srq *srq, struct ibv_srq_attr *srq_attr) +{ + return rdma_query_srq((struct rdma_srq *) srq, + (struct rdma_srq_attr *) srq_attr); +} + +int rdma_destroy_srq(struct rdma_srq *srq) { return srq->context->ops.destroy_srq(srq); } -struct ibv_qp *ibv_create_qp(struct ibv_pd *pd, - struct ibv_qp_init_attr *qp_init_attr) +int ibv_destroy_srq(struct ibv_srq *srq) +{ + return rdma_destroy_srq((struct rdma_srq *) srq); +} + +struct rdma_qp *rdmav_create_qp(struct rdma_pd *pd, + struct rdma_qp_init_attr *qp_init_attr) { - struct ibv_qp *qp = pd->context->ops.create_qp(pd, qp_init_attr); + struct rdma_qp *qp = pd->context->ops.create_qp(pd, qp_init_attr); if (qp) { qp->context = pd->context; @@ -345,9 +476,16 @@ struct ibv_qp *ibv_create_qp(struct ibv_ return qp; } -int ibv_query_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, - enum ibv_qp_attr_mask attr_mask, - struct ibv_qp_init_attr *init_attr) +struct ibv_qp *ibv_create_qp(struct ibv_pd *pd, + struct ibv_qp_init_attr *qp_init_attr) +{ + return (struct ibv_qp *) rdmav_create_qp((struct rdma_pd *) pd, + (struct rdma_qp_init_attr *) qp_init_attr); +} + +int rdma_query_qp(struct rdma_qp *qp, struct rdma_qp_attr *attr, + enum rdma_qp_attr_mask attr_mask, + struct rdma_qp_init_attr *init_attr) { int ret; @@ -361,8 +499,17 @@ int ibv_query_qp(struct ibv_qp *qp, stru return 0; } -int ibv_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, - enum ibv_qp_attr_mask attr_mask) +int ibv_query_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, + enum ibv_qp_attr_mask attr_mask, + struct ibv_qp_init_attr *init_attr) +{ + return rdma_query_qp((struct rdma_qp *) qp, + (struct rdma_qp_attr *) attr, attr_mask, + (struct rdma_qp_init_attr *) init_attr); +} 
+ +int rdma_modify_qp(struct rdma_qp *qp, struct rdma_qp_attr *attr, + enum rdma_qp_attr_mask attr_mask) { int ret; @@ -376,14 +523,26 @@ int ibv_modify_qp(struct ibv_qp *qp, str return 0; } -int ibv_destroy_qp(struct ibv_qp *qp) +int ibv_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, + enum ibv_qp_attr_mask attr_mask) +{ + return rdma_modify_qp((struct rdma_qp *) qp, + (struct rdma_qp_attr *) attr, attr_mask); +} + +int rdmav_destroy_qp(struct rdma_qp *qp) { return qp->context->ops.destroy_qp(qp); } -struct ibv_ah *ibv_create_ah(struct ibv_pd *pd, struct ibv_ah_attr *attr) +int ibv_destroy_qp(struct ibv_qp *qp) { - struct ibv_ah *ah = pd->context->ops.create_ah(pd, attr); + return rdmav_destroy_qp((struct rdma_qp *) qp); +} + +struct rdma_ah *rdma_create_ah(struct rdma_pd *pd, struct rdma_ah_attr *attr) +{ + struct rdma_ah *ah = pd->context->ops.create_ah(pd, attr); if (ah) { ah->context = pd->context; @@ -393,22 +552,28 @@ struct ibv_ah *ibv_create_ah(struct ibv_ return ah; } -static int ibv_find_gid_index(struct ibv_context *context, uint8_t port_num, - union ibv_gid *gid) +struct ibv_ah *ibv_create_ah(struct ibv_pd *pd, struct ibv_ah_attr *attr) +{ + return (struct ibv_ah *) rdma_create_ah((struct rdma_pd *) pd, + (struct rdma_ah_attr *) attr); +} + +static int rdma_find_gid_index(struct rdma_context *context, uint8_t port_num, + union rdma_gid *gid) { - union ibv_gid sgid; + union rdma_gid sgid; int i = 0, ret; do { - ret = ibv_query_gid(context, port_num, i++, &sgid); + ret = rdma_query_gid(context, port_num, i++, &sgid); } while (!ret && memcmp(&sgid, gid, sizeof *gid)); return ret ? 
ret : i - 1; } -int ibv_init_ah_from_wc(struct ibv_context *context, uint8_t port_num, - struct ibv_wc *wc, struct ibv_grh *grh, - struct ibv_ah_attr *ah_attr) +int rdma_init_ah_from_wc(struct rdma_context *context, uint8_t port_num, + struct rdma_wc *wc, struct rdma_grh *grh, + struct rdma_ah_attr *ah_attr) { uint32_t flow_class; int ret; @@ -423,7 +588,7 @@ int ibv_init_ah_from_wc(struct ibv_conte ah_attr->is_global = 1; ah_attr->grh.dgid = grh->sgid; - ret = ibv_find_gid_index(context, port_num, &grh->dgid); + ret = rdma_find_gid_index(context, port_num, &grh->dgid); if (ret < 0) return ret; @@ -436,30 +601,66 @@ int ibv_init_ah_from_wc(struct ibv_conte return 0; } -struct ibv_ah *ibv_create_ah_from_wc(struct ibv_pd *pd, struct ibv_wc *wc, - struct ibv_grh *grh, uint8_t port_num) +int ibv_init_ah_from_wc(struct ibv_context *context, uint8_t port_num, + struct ibv_wc *wc, struct ibv_grh *grh, + struct ibv_ah_attr *ah_attr) +{ + return rdma_init_ah_from_wc((struct rdma_context *) context, port_num, + (struct rdma_wc *) wc, + (struct rdma_grh *) grh, + (struct rdma_ah_attr *) ah_attr); +} + +struct rdma_ah *rdma_create_ah_from_wc(struct rdma_pd *pd, struct rdma_wc *wc, + struct rdma_grh *grh, uint8_t port_num) { - struct ibv_ah_attr ah_attr; + struct rdma_ah_attr ah_attr; int ret; - ret = ibv_init_ah_from_wc(pd->context, port_num, wc, grh, &ah_attr); + ret = rdma_init_ah_from_wc(pd->context, port_num, wc, grh, &ah_attr); if (ret) return NULL; - return ibv_create_ah(pd, &ah_attr); + return rdma_create_ah(pd, &ah_attr); } -int ibv_destroy_ah(struct ibv_ah *ah) +struct ibv_ah *ibv_create_ah_from_wc(struct ibv_pd *pd, struct ibv_wc *wc, + struct ibv_grh *grh, uint8_t port_num) +{ + return (struct ibv_ah *) rdma_create_ah_from_wc((struct rdma_pd *) pd, + (struct rdma_wc *) wc, + (struct rdma_grh *) grh, + port_num); +} + +int rdma_destroy_ah(struct rdma_ah *ah) { return ah->context->ops.destroy_ah(ah); } -int ibv_attach_mcast(struct ibv_qp *qp, union ibv_gid *gid, 
uint16_t lid)
+int ibv_destroy_ah(struct ibv_ah *ah)
+{
+	return rdma_destroy_ah((struct rdma_ah *) ah);
+}
+
+int rdma_attach_mcast(struct rdma_qp *qp, union rdma_gid *gid, uint16_t lid)
 {
 	return qp->context->ops.attach_mcast(qp, gid, lid);
 }
 
-int ibv_detach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid)
+int ibv_attach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid)
+{
+	return rdma_attach_mcast((struct rdma_qp *) qp, (union rdma_gid *) gid,
+				 lid);
+}
+
+int rdma_detach_mcast(struct rdma_qp *qp, union rdma_gid *gid, uint16_t lid)
 {
 	return qp->context->ops.detach_mcast(qp, gid, lid);
 }
+
+int ibv_detach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid)
+{
+	return rdma_detach_mcast((struct rdma_qp *) qp, (union rdma_gid *) gid,
+				 lid);
+}

From krkumar2 at in.ibm.com Thu Jul 27 21:50:28 2006
From: krkumar2 at in.ibm.com (Krishna Kumar)
Date: Fri, 28 Jul 2006 10:20:28 +0530
Subject: [openib-general] [PATCH 4/6] librdmacm include file changes.
In-Reply-To: <20060728045001.9426.85822.sendpatchset@K50wks273950wss.in.ibm.com>
References: <20060728045001.9426.85822.sendpatchset@K50wks273950wss.in.ibm.com>
Message-ID: <20060728045028.9426.63971.sendpatchset@K50wks273950wss.in.ibm.com>

Convert librdmacm include files to use the new libibverbs API.
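
As a rough illustration of what the conversion relies on (a toy sketch, not code from this series): the libibverbs hunks above keep each legacy ibv_ entry point as a thin wrapper that casts its arguments and forwards to the rdma_ equivalent. The demo_ names and the single-field structure below are hypothetical stand-ins, chosen so the snippet compiles on its own; the real structures and signatures are different.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a "new API" object; not the real layout. */
struct demo_rdma_ah {
	int handle;
};

/*
 * Legacy type kept opaque in this sketch: callers only pass the
 * pointer around, so casting to and from it is well defined.
 */
struct demo_ibv_ah;

/* New-style entry points (toy bodies, for illustration only). */
static struct demo_rdma_ah *demo_rdma_create_ah(int handle)
{
	struct demo_rdma_ah *ah = malloc(sizeof *ah);

	if (ah)
		ah->handle = handle;
	return ah;
}

static int demo_rdma_destroy_ah(struct demo_rdma_ah *ah)
{
	free(ah);
	return 0;
}

/* Legacy wrappers: cast the handle and forward to the new function. */
static struct demo_ibv_ah *demo_ibv_create_ah(int handle)
{
	return (struct demo_ibv_ah *) demo_rdma_create_ah(handle);
}

static int demo_ibv_destroy_ah(struct demo_ibv_ah *ah)
{
	return demo_rdma_destroy_ah((struct demo_rdma_ah *) ah);
}
```

A caller that does demo_ibv_create_ah(7) and later demo_ibv_destroy_ah() on the result goes through the new functions both times, which is how existing users of the old names can keep working while new code uses the rdma_ names directly.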
Signed-off-by: Krishna Kumar diff -ruNp ORG/librdmacm/include/rdma/rdma_cma_abi.h NEW/librdmacm/include/rdma/rdma_cma_abi.h --- ORG/librdmacm/include/rdma/rdma_cma_abi.h 2006-07-27 15:34:18.000000000 +0530 +++ NEW/librdmacm/include/rdma/rdma_cma_abi.h 2006-07-27 15:34:44.000000000 +0530 @@ -123,7 +123,7 @@ struct ucma_abi_query_route { struct ucma_abi_query_route_resp { __u64 node_guid; - struct ibv_kern_path_rec ib_route[2]; + struct rdma_kern_path_rec ib_route[2]; struct sockaddr_in6 src_addr; struct sockaddr_in6 dst_addr; __u32 num_paths; @@ -194,7 +194,7 @@ struct ucma_abi_leave_mcast { struct ucma_abi_dst_attr_resp { __u32 remote_qpn; __u32 remote_qkey; - struct ibv_kern_ah_attr ah_attr; + struct rdma_kern_ah_attr ah_attr; }; struct ucma_abi_get_dst_attr { diff -ruNp ORG/librdmacm/include/rdma/rdma_cma.h NEW/librdmacm/include/rdma/rdma_cma.h --- ORG/librdmacm/include/rdma/rdma_cma.h 2006-07-27 15:34:18.000000000 +0530 +++ NEW/librdmacm/include/rdma/rdma_cma.h 2006-07-27 15:34:44.000000000 +0530 @@ -68,8 +68,8 @@ enum { }; struct ib_addr { - union ibv_gid sgid; - union ibv_gid dgid; + union rdma_gid sgid; + union rdma_gid dgid; uint16_t pkey; }; @@ -83,7 +83,7 @@ struct rdma_addr { struct rdma_route { struct rdma_addr addr; - struct ibv_sa_path_rec *path_rec; + struct rdma_sa_path_rec *path_rec; int num_paths; }; @@ -92,10 +92,10 @@ struct rdma_event_channel { }; struct rdma_cm_id { - struct ibv_context *verbs; + struct rdma_context *verbs; struct rdma_event_channel *channel; void *context; - struct ibv_qp *qp; + struct rdma_qp *qp; struct rdma_route route; enum rdma_port_space ps; uint8_t port_num; @@ -191,8 +191,8 @@ int rdma_resolve_route(struct rdma_cm_id * QPs allocated to an rdma_cm_id will automatically be transitioned by the CMA * through their states. 
*/ -int rdma_create_qp(struct rdma_cm_id *id, struct ibv_pd *pd, - struct ibv_qp_init_attr *qp_init_attr); +int rdma_create_qp(struct rdma_cm_id *id, struct rdma_pd *pd, + struct rdma_qp_init_attr *qp_init_attr); /** * rdma_destroy_qp - Deallocate the QP associated with the specified RDMA @@ -214,7 +214,7 @@ struct rdma_conn_param { /* Fields below ignored if a QP is created on the rdma_cm_id. */ uint8_t srq; uint32_t qp_num; - enum ibv_qp_type qp_type; + enum rdma_qp_type qp_type; }; /** @@ -341,11 +341,11 @@ static inline uint16_t rdma_get_dst_port * across multiple rdma_cm_id's. * The array must be released by calling rdma_free_devices(). */ -struct ibv_context **rdma_get_devices(int *num_devices); +struct rdma_context **rdma_get_devices(int *num_devices); /** * rdma_free_devices - Frees the list of devices returned by rdma_get_devices(). */ -void rdma_free_devices(struct ibv_context **list); +void rdma_free_devices(struct rdma_context **list); #endif /* RDMA_CMA_H */ diff -ruNp ORG/librdmacm/include/rdma/rdma_cma_ib.h NEW/librdmacm/include/rdma/rdma_cma_ib.h --- ORG/librdmacm/include/rdma/rdma_cma_ib.h 2006-07-27 15:34:18.000000000 +0530 +++ NEW/librdmacm/include/rdma/rdma_cma_ib.h 2006-07-27 15:34:44.000000000 +0530 @@ -34,7 +34,7 @@ /* IB specific option names for get/set. */ enum { - IB_PATH_OPTIONS = 1, /* struct ibv_kern_path_rec */ + IB_PATH_OPTIONS = 1, /* struct rdma_kern_path_rec */ IB_CM_REQ_OPTIONS = 2 /* struct ib_cm_req_opt */ }; @@ -56,7 +56,7 @@ struct ib_cm_req_opt { * Users must have called rdma_connect() to resolve the destination information. 
  */
 int rdma_get_dst_attr(struct rdma_cm_id *id, struct sockaddr *addr,
-		      struct ibv_ah_attr *ah_attr, uint32_t *remote_qpn,
+		      struct rdma_ah_attr *ah_attr, uint32_t *remote_qpn,
 		      uint32_t *remote_qkey);
 
 #endif /* RDMA_CMA_IB_H */

From krkumar2 at in.ibm.com Thu Jul 27 21:50:34 2006
From: krkumar2 at in.ibm.com (Krishna Kumar)
Date: Fri, 28 Jul 2006 10:20:34 +0530
Subject: [openib-general] [PATCH 5/6] librdmacm source file changes.
In-Reply-To: <20060728045001.9426.85822.sendpatchset@K50wks273950wss.in.ibm.com>
References: <20060728045001.9426.85822.sendpatchset@K50wks273950wss.in.ibm.com>
Message-ID: <20060728045034.9426.6547.sendpatchset@K50wks273950wss.in.ibm.com>

Convert librdmacm source files to use the new libibverbs API.

Signed-off-by: Krishna Kumar

diff -ruNp ORG/librdmacm/configure.in NEW/librdmacm/configure.in
--- ORG/librdmacm/configure.in	2006-07-27 15:34:18.000000000 +0530
+++ NEW/librdmacm/configure.in	2006-07-27 15:34:44.000000000 +0530
@@ -25,8 +25,8 @@ AC_CHECK_SIZEOF(long)
 dnl Checks for libraries
 if test "$disable_libcheck" != "yes"
 then
-AC_CHECK_LIB(ibverbs, ibv_get_device_list, [],
-    AC_MSG_ERROR([ibv_get_device_list() not found. librdmacm requires libibverbs.]))
+AC_CHECK_LIB(ibverbs, rdma_get_device_list, [],
+    AC_MSG_ERROR([rdma_get_device_list() not found. librdmacm requires libibverbs.]))
 fi
 
 dnl Checks for header files.
diff -ruNp ORG/librdmacm/src/cma.c NEW/librdmacm/src/cma.c --- ORG/librdmacm/src/cma.c 2006-07-27 15:34:18.000000000 +0530 +++ NEW/librdmacm/src/cma.c 2006-07-27 15:34:44.000000000 +0530 @@ -103,7 +103,7 @@ do { } while (0) struct cma_device { - struct ibv_context *verbs; + struct rdma_context *verbs; uint64_t guid; int port_cnt; }; @@ -130,7 +130,7 @@ static void ucma_cleanup(void) { if (cma_dev_cnt) { while (cma_dev_cnt) - ibv_close_device(cma_dev_array[--cma_dev_cnt].verbs); + rdma_close_device(cma_dev_array[--cma_dev_cnt].verbs); free(cma_dev_array); cma_dev_cnt = 0; @@ -141,7 +141,7 @@ static int check_abi_version(void) { char value[8]; - if (ibv_read_sysfs_file(ibv_get_sysfs_path(), + if (rdma_read_sysfs_file(rdma_get_sysfs_path(), "class/misc/rdma_cm/abi_version", value, sizeof value) < 0) { /* @@ -167,9 +167,9 @@ static int check_abi_version(void) static int ucma_init(void) { - struct ibv_device **dev_list = NULL; + struct rdma_device **dev_list = NULL; struct cma_device *cma_dev; - struct ibv_device_attr attr; + struct rdma_device_attr attr; int i, ret; pthread_mutex_lock(&mut); @@ -180,7 +180,7 @@ static int ucma_init(void) if (ret) goto err; - dev_list = ibv_get_device_list(&cma_dev_cnt); + dev_list = rdma_get_device_list(&cma_dev_cnt); if (!dev_list) { printf("CMA: unable to get RDMA device list\n"); ret = -ENODEV; @@ -196,15 +196,15 @@ static int ucma_init(void) for (i = 0; dev_list[i]; ++i) { cma_dev = &cma_dev_array[i]; - cma_dev->guid = ibv_get_device_guid(dev_list[i]); - cma_dev->verbs = ibv_open_device(dev_list[i]); + cma_dev->guid = rdma_get_device_guid(dev_list[i]); + cma_dev->verbs = rdma_open_device(dev_list[i]); if (!cma_dev->verbs) { printf("CMA: unable to open RDMA device\n"); ret = -ENODEV; goto err; } - ret = ibv_query_device(cma_dev->verbs, &attr); + ret = rdma_query_device(cma_dev->verbs, &attr); if (ret) { printf("CMA: unable to query RDMA device\n"); goto err; @@ -219,13 +219,13 @@ err: ucma_cleanup(); pthread_mutex_unlock(&mut); if 
(dev_list) - ibv_free_device_list(dev_list); + rdma_free_device_list(dev_list); return ret; } -struct ibv_context **rdma_get_devices(int *num_devices) +struct rdma_context **rdma_get_devices(int *num_devices) { - struct ibv_context **devs = NULL; + struct rdma_context **devs = NULL; int i; if (!cma_dev_cnt && ucma_init()) @@ -244,7 +244,7 @@ out: return devs; } -void rdma_free_devices(struct ibv_context **list) +void rdma_free_devices(struct rdma_context **list) { free(list); } @@ -479,7 +479,7 @@ static int ucma_query_route(struct rdma_ id->route.num_paths = resp->num_paths; for (i = 0; i < resp->num_paths; i++) - ibv_copy_path_rec_from_kern(&id->route.path_rec[i], + rdma_copy_path_rec_from_kern(&id->route.path_rec[i], &resp->ib_route[i]); } @@ -578,11 +578,11 @@ int rdma_resolve_route(struct rdma_cm_id return 0; } -static int rdma_init_qp_attr(struct rdma_cm_id *id, struct ibv_qp_attr *qp_attr, +static int rdma_init_qp_attr(struct rdma_cm_id *id, struct rdma_qp_attr *qp_attr, int *qp_attr_mask) { struct ucma_abi_init_qp_attr *cmd; - struct ibv_kern_qp_attr *resp; + struct rdma_kern_qp_attr *resp; struct cma_id_private *id_priv; void *msg; int ret, size; @@ -596,59 +596,59 @@ static int rdma_init_qp_attr(struct rdma if (ret != size) return (ret > 0) ? -ENODATA : ret; - ibv_copy_qp_attr_from_kern(qp_attr, resp); + rdma_copy_qp_attr_from_kern(qp_attr, resp); *qp_attr_mask = resp->qp_attr_mask; return 0; } static int ucma_modify_qp_rtr(struct rdma_cm_id *id) { - struct ibv_qp_attr qp_attr; + struct rdma_qp_attr qp_attr; int qp_attr_mask, ret; if (!id->qp) return -EINVAL; /* Need to update QP attributes from default values. 
*/ - qp_attr.qp_state = IBV_QPS_INIT; + qp_attr.qp_state = RDMA_QPS_INIT; ret = rdma_init_qp_attr(id, &qp_attr, &qp_attr_mask); if (ret) return ret; - ret = ibv_modify_qp(id->qp, &qp_attr, qp_attr_mask); + ret = rdma_modify_qp(id->qp, &qp_attr, qp_attr_mask); if (ret) return ret; - qp_attr.qp_state = IBV_QPS_RTR; + qp_attr.qp_state = RDMA_QPS_RTR; ret = rdma_init_qp_attr(id, &qp_attr, &qp_attr_mask); if (ret) return ret; - return ibv_modify_qp(id->qp, &qp_attr, qp_attr_mask); + return rdma_modify_qp(id->qp, &qp_attr, qp_attr_mask); } static int ucma_modify_qp_rts(struct rdma_cm_id *id) { - struct ibv_qp_attr qp_attr; + struct rdma_qp_attr qp_attr; int qp_attr_mask, ret; - qp_attr.qp_state = IBV_QPS_RTS; + qp_attr.qp_state = RDMA_QPS_RTS; ret = rdma_init_qp_attr(id, &qp_attr, &qp_attr_mask); if (ret) return ret; - return ibv_modify_qp(id->qp, &qp_attr, qp_attr_mask); + return rdma_modify_qp(id->qp, &qp_attr, qp_attr_mask); } static int ucma_modify_qp_err(struct rdma_cm_id *id) { - struct ibv_qp_attr qp_attr; + struct rdma_qp_attr qp_attr; if (!id->qp) return 0; - qp_attr.qp_state = IBV_QPS_ERR; - return ibv_modify_qp(id->qp, &qp_attr, IBV_QP_STATE); + qp_attr.qp_state = RDMA_QPS_ERR; + return rdma_modify_qp(id->qp, &qp_attr, RDMA_QP_STATE); } static int ucma_find_pkey(struct cma_device *cma_dev, uint8_t port_num, @@ -658,7 +658,7 @@ static int ucma_find_pkey(struct cma_dev uint16_t chk_pkey; for (i = 0, ret = 0; !ret; i++) { - ret = ibv_query_pkey(cma_dev->verbs, port_num, i, &chk_pkey); + ret = rdma_query_pkey(cma_dev->verbs, port_num, i, &chk_pkey); if (!ret && pkey == chk_pkey) { *pkey_index = (uint16_t) i; return 0; @@ -668,9 +668,9 @@ static int ucma_find_pkey(struct cma_dev return -EINVAL; } -static int ucma_init_ib_qp(struct cma_id_private *id_priv, struct ibv_qp *qp) +static int ucma_init_ib_qp(struct cma_id_private *id_priv, struct rdma_qp *qp) { - struct ibv_qp_attr qp_attr; + struct rdma_qp_attr qp_attr; struct ib_addr *ibaddr; int ret; @@ -681,15 +681,15 
@@ static int ucma_init_ib_qp(struct cma_id return ret; qp_attr.port_num = id_priv->id.port_num; - qp_attr.qp_state = IBV_QPS_INIT; - qp_attr.qp_access_flags = IBV_ACCESS_LOCAL_WRITE; - return ibv_modify_qp(qp, &qp_attr, IBV_QP_STATE | IBV_QP_ACCESS_FLAGS | - IBV_QP_PKEY_INDEX | IBV_QP_PORT); + qp_attr.qp_state = RDMA_QPS_INIT; + qp_attr.qp_access_flags = RDMA_ACCESS_LOCAL_WRITE; + return rdma_modify_qp(qp, &qp_attr, RDMA_QP_STATE | RDMA_QP_ACCESS_FLAGS | + RDMA_QP_PKEY_INDEX | RDMA_QP_PORT); } -static int ucma_init_ud_qp(struct cma_id_private *id_priv, struct ibv_qp *qp) +static int ucma_init_ud_qp(struct cma_id_private *id_priv, struct rdma_qp *qp) { - struct ibv_qp_attr qp_attr; + struct rdma_qp_attr qp_attr; struct ib_addr *ibaddr; int ret; @@ -700,35 +700,35 @@ static int ucma_init_ud_qp(struct cma_id return ret; qp_attr.port_num = id_priv->id.port_num; - qp_attr.qp_state = IBV_QPS_INIT; + qp_attr.qp_state = RDMA_QPS_INIT; qp_attr.qkey = ntohs(rdma_get_src_port(&id_priv->id)); - ret = ibv_modify_qp(qp, &qp_attr, IBV_QP_STATE | IBV_QP_PKEY_INDEX | - IBV_QP_PORT | IBV_QP_QKEY); + ret = rdma_modify_qp(qp, &qp_attr, RDMA_QP_STATE | RDMA_QP_PKEY_INDEX | + RDMA_QP_PORT | RDMA_QP_QKEY); if (ret) return ret; - qp_attr.qp_state = IBV_QPS_RTR; - ret = ibv_modify_qp(qp, &qp_attr, IBV_QP_STATE); + qp_attr.qp_state = RDMA_QPS_RTR; + ret = rdma_modify_qp(qp, &qp_attr, RDMA_QP_STATE); if (ret) return ret; - qp_attr.qp_state = IBV_QPS_RTS; + qp_attr.qp_state = RDMA_QPS_RTS; qp_attr.sq_psn = 0; - return ibv_modify_qp(qp, &qp_attr, IBV_QP_STATE | IBV_QP_SQ_PSN); + return rdma_modify_qp(qp, &qp_attr, RDMA_QP_STATE | RDMA_QP_SQ_PSN); } -int rdma_create_qp(struct rdma_cm_id *id, struct ibv_pd *pd, - struct ibv_qp_init_attr *qp_init_attr) +int rdma_create_qp(struct rdma_cm_id *id, struct rdma_pd *pd, + struct rdma_qp_init_attr *qp_init_attr) { struct cma_id_private *id_priv; - struct ibv_qp *qp; + struct rdma_qp *qp; int ret; id_priv = container_of(id, struct cma_id_private, id); 
if (id->verbs != pd->context) return -EINVAL; - qp = ibv_create_qp(pd, qp_init_attr); + qp = rdmav_create_qp(pd, qp_init_attr); if (!qp) return -ENOMEM; @@ -742,19 +742,19 @@ int rdma_create_qp(struct rdma_cm_id *id id->qp = qp; return 0; err: - ibv_destroy_qp(qp); + rdmav_destroy_qp(qp); return ret; } void rdma_destroy_qp(struct rdma_cm_id *id) { - ibv_destroy_qp(id->qp); + rdmav_destroy_qp(id->qp); } static void ucma_copy_conn_param_to_kern(struct ucma_abi_conn_param *dst, struct rdma_conn_param *src, uint32_t qp_num, - enum ibv_qp_type qp_type, uint8_t srq) + enum rdma_qp_type qp_type, uint8_t srq) { dst->qp_num = qp_num; dst->qp_type = qp_type; @@ -934,7 +934,7 @@ int rdma_leave_multicast(struct rdma_cm_ struct cma_id_private *id_priv; void *msg; int ret, size, addrlen; - struct ibv_ah_attr ah_attr; + struct rdma_ah_attr ah_attr; uint32_t qp_info; addrlen = ucma_addrlen(addr); @@ -951,7 +951,7 @@ int rdma_leave_multicast(struct rdma_cm_ if (ret) goto out; - ret = ibv_detach_mcast(id->qp, &ah_attr.grh.dgid, ah_attr.dlid); + ret = rdma_detach_mcast(id->qp, &ah_attr.grh.dgid, ah_attr.dlid); if (ret) goto out; } @@ -1075,7 +1075,7 @@ static void ucma_process_mcast(struct rd { struct ucma_abi_join_mcast kmc_data; struct rdma_multicast_data *mc_data; - struct ibv_ah_attr ah_attr; + struct rdma_ah_attr ah_attr; uint32_t qp_info; kmc_data = *(struct ucma_abi_join_mcast *) evt->private_data; @@ -1093,7 +1093,7 @@ static void ucma_process_mcast(struct rd if (evt->status) goto err; - evt->status = ibv_attach_mcast(id->qp, &ah_attr.grh.dgid, ah_attr.dlid); + evt->status = rdma_attach_mcast(id->qp, &ah_attr.grh.dgid, ah_attr.dlid); if (evt->status) goto err; return; @@ -1243,7 +1243,7 @@ int rdma_set_option(struct rdma_cm_id *i } int rdma_get_dst_attr(struct rdma_cm_id *id, struct sockaddr *addr, - struct ibv_ah_attr *ah_attr, uint32_t *remote_qpn, + struct rdma_ah_attr *ah_attr, uint32_t *remote_qpn, uint32_t *remote_qkey) { struct ucma_abi_dst_attr_resp *resp; @@ -1265,7 
+1265,7 @@ int rdma_get_dst_attr(struct rdma_cm_id if (ret != size) return (ret > 0) ? -ENODATA : ret; - ibv_copy_ah_attr_from_kern(ah_attr, &resp->ah_attr); + rdma_copy_ah_attr_from_kern(ah_attr, &resp->ah_attr); *remote_qpn = resp->remote_qpn; *remote_qkey = resp->remote_qkey; return 0; From krkumar2 at in.ibm.com Thu Jul 27 21:50:41 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Fri, 28 Jul 2006 10:20:41 +0530 Subject: [openib-general] [PATCH 6/6] librdmacm examples changes. In-Reply-To: <20060728045001.9426.85822.sendpatchset@K50wks273950wss.in.ibm.com> References: <20060728045001.9426.85822.sendpatchset@K50wks273950wss.in.ibm.com> Message-ID: <20060728045041.9426.80827.sendpatchset@K50wks273950wss.in.ibm.com> Convert librdmacm examples to use the new API. Signed-off-by: Krishna Kumar diff -ruNp ORG/librdmacm/examples/cmatose.c NEW/librdmacm/examples/cmatose.c --- ORG/librdmacm/examples/cmatose.c 2006-07-27 15:34:18.000000000 +0530 +++ NEW/librdmacm/examples/cmatose.c 2006-07-27 15:34:44.000000000 +0530 @@ -62,9 +62,9 @@ struct cmatest_node { int id; struct rdma_cm_id *cma_id; int connected; - struct ibv_pd *pd; - struct ibv_cq *cq; - struct ibv_mr *mr; + struct rdma_pd *pd; + struct rdma_cq *cq; + struct rdma_mr *mr; void *mem; }; @@ -100,8 +100,8 @@ static int create_message(struct cmatest printf("failed message allocation\n"); return -1; } - node->mr = ibv_reg_mr(node->pd, node->mem, message_size, - IBV_ACCESS_LOCAL_WRITE); + node->mr = rdma_reg_mr(node->pd, node->mem, message_size, + RDMA_ACCESS_LOCAL_WRITE); if (!node->mr) { printf("failed to reg MR\n"); goto err; @@ -114,10 +114,10 @@ err: static int init_node(struct cmatest_node *node) { - struct ibv_qp_init_attr init_qp_attr; + struct rdma_qp_init_attr init_qp_attr; int cqe, ret; - node->pd = ibv_alloc_pd(node->cma_id->verbs); + node->pd = rdma_alloc_pd(node->cma_id->verbs); if (!node->pd) { ret = -ENOMEM; printf("cmatose: unable to allocate PD\n"); @@ -125,7 +125,7 @@ static int 
init_node(struct cmatest_node } cqe = message_count ? message_count * 2 : 2; - node->cq = ibv_create_cq(node->cma_id->verbs, cqe, node, 0, 0); + node->cq = rdma_create_cq(node->cma_id->verbs, cqe, node, 0, 0); if (!node->cq) { ret = -ENOMEM; printf("cmatose: unable to create CQ\n"); @@ -139,7 +139,7 @@ static int init_node(struct cmatest_node init_qp_attr.cap.max_recv_sge = 1; init_qp_attr.qp_context = node; init_qp_attr.sq_sig_all = 1; - init_qp_attr.qp_type = IBV_QPT_RC; + init_qp_attr.qp_type = RDMA_QPT_RC; init_qp_attr.send_cq = node->cq; init_qp_attr.recv_cq = node->cq; ret = rdma_create_qp(node->cma_id, node->pd, &init_qp_attr); @@ -159,8 +159,8 @@ out: static int post_recvs(struct cmatest_node *node) { - struct ibv_recv_wr recv_wr, *recv_failure; - struct ibv_sge sge; + struct rdma_recv_wr recv_wr, *recv_failure; + struct rdma_sge sge; int i, ret = 0; if (!message_count) @@ -176,7 +176,7 @@ static int post_recvs(struct cmatest_nod sge.addr = (uintptr_t) node->mem; for (i = 0; i < message_count && !ret; i++ ) { - ret = ibv_post_recv(node->cma_id->qp, &recv_wr, &recv_failure); + ret = rdma_post_recv(node->cma_id->qp, &recv_wr, &recv_failure); if (ret) { printf("failed to post receives: %d\n", ret); break; @@ -187,8 +187,8 @@ static int post_recvs(struct cmatest_nod static int post_sends(struct cmatest_node *node) { - struct ibv_send_wr send_wr, *bad_send_wr; - struct ibv_sge sge; + struct rdma_send_wr send_wr, *bad_send_wr; + struct rdma_sge sge; int i, ret = 0; if (!node->connected || !message_count) @@ -197,7 +197,7 @@ static int post_sends(struct cmatest_nod send_wr.next = NULL; send_wr.sg_list = &sge; send_wr.num_sge = 1; - send_wr.opcode = IBV_WR_SEND; + send_wr.opcode = RDMA_WR_SEND; send_wr.send_flags = 0; send_wr.wr_id = (unsigned long)node; @@ -206,7 +206,7 @@ static int post_sends(struct cmatest_nod sge.addr = (uintptr_t) node->mem; for (i = 0; i < message_count && !ret; i++) { - ret = ibv_post_send(node->cma_id->qp, &send_wr, &bad_send_wr); + ret = 
rdma_post_send(node->cma_id->qp, &send_wr, &bad_send_wr); if (ret) printf("failed to post sends: %d\n", ret); } @@ -350,15 +350,15 @@ static void destroy_node(struct cmatest_ rdma_destroy_qp(node->cma_id); if (node->cq) - ibv_destroy_cq(node->cq); + rdma_destroy_cq(node->cq); if (node->mem) { - ibv_dereg_mr(node->mr); + rdma_dereg_mr(node->mr); free(node->mem); } if (node->pd) - ibv_dealloc_pd(node->pd); + rdma_dealloc_pd(node->pd); /* Destroy the RDMA ID after all device resources */ rdma_destroy_id(node->cma_id); @@ -404,7 +404,7 @@ static void destroy_nodes(void) static int poll_cqs(void) { - struct ibv_wc wc[8]; + struct rdma_wc wc[8]; int done, i, ret; for (i = 0; i < connections; i++) { @@ -412,7 +412,7 @@ static int poll_cqs(void) continue; for (done = 0; done < message_count; done += ret) { - ret = ibv_poll_cq(test.nodes[i].cq, 8, wc); + ret = rdma_poll_cq(test.nodes[i].cq, 8, wc); if (ret < 0) { printf("cmatose: failed polling CQ: %d\n", ret); return ret; diff -ruNp ORG/librdmacm/examples/mckey.c NEW/librdmacm/examples/mckey.c --- ORG/librdmacm/examples/mckey.c 2006-07-27 15:34:18.000000000 +0530 +++ NEW/librdmacm/examples/mckey.c 2006-07-27 15:34:44.000000000 +0530 @@ -50,10 +50,10 @@ struct cmatest_node { int id; struct rdma_cm_id *cma_id; int connected; - struct ibv_pd *pd; - struct ibv_cq *cq; - struct ibv_mr *mr; - struct ibv_ah *ah; + struct rdma_pd *pd; + struct rdma_cq *cq; + struct rdma_mr *mr; + struct rdma_ah *ah; uint32_t remote_qpn; uint32_t remote_qkey; void *mem; @@ -85,14 +85,14 @@ static int create_message(struct cmatest if (!message_count) return 0; - node->mem = malloc(message_size + sizeof(struct ibv_grh)); + node->mem = malloc(message_size + sizeof(struct rdma_grh)); if (!node->mem) { printf("failed message allocation\n"); return -1; } - node->mr = ibv_reg_mr(node->pd, node->mem, - message_size + sizeof(struct ibv_grh), - IBV_ACCESS_LOCAL_WRITE); + node->mr = rdma_reg_mr(node->pd, node->mem, + message_size + sizeof(struct rdma_grh), + 
RDMA_ACCESS_LOCAL_WRITE); if (!node->mr) { printf("failed to reg MR\n"); goto err; @@ -105,10 +105,10 @@ err: static int init_node(struct cmatest_node *node) { - struct ibv_qp_init_attr init_qp_attr; + struct rdma_qp_init_attr init_qp_attr; int cqe, ret; - node->pd = ibv_alloc_pd(node->cma_id->verbs); + node->pd = rdma_alloc_pd(node->cma_id->verbs); if (!node->pd) { ret = -ENOMEM; printf("mckey: unable to allocate PD\n"); @@ -116,7 +116,7 @@ static int init_node(struct cmatest_node } cqe = message_count ? message_count * 2 : 2; - node->cq = ibv_create_cq(node->cma_id->verbs, cqe, node, 0, 0); + node->cq = rdma_create_cq(node->cma_id->verbs, cqe, node, 0, 0); if (!node->cq) { ret = -ENOMEM; printf("mckey: unable to create CQ\n"); @@ -130,7 +130,7 @@ static int init_node(struct cmatest_node init_qp_attr.cap.max_recv_sge = 1; init_qp_attr.qp_context = node; init_qp_attr.sq_sig_all = 0; - init_qp_attr.qp_type = IBV_QPT_UD; + init_qp_attr.qp_type = RDMA_QPT_UD; init_qp_attr.send_cq = node->cq; init_qp_attr.recv_cq = node->cq; ret = rdma_create_qp(node->cma_id, node->pd, &init_qp_attr); @@ -150,8 +150,8 @@ out: static int post_recvs(struct cmatest_node *node) { - struct ibv_recv_wr recv_wr, *recv_failure; - struct ibv_sge sge; + struct rdma_recv_wr recv_wr, *recv_failure; + struct rdma_sge sge; int i, ret = 0; if (!message_count) @@ -162,12 +162,12 @@ static int post_recvs(struct cmatest_nod recv_wr.num_sge = 1; recv_wr.wr_id = (uintptr_t) node; - sge.length = message_size + sizeof(struct ibv_grh); + sge.length = message_size + sizeof(struct rdma_grh); sge.lkey = node->mr->lkey; sge.addr = (uintptr_t) node->mem; for (i = 0; i < message_count && !ret; i++ ) { - ret = ibv_post_recv(node->cma_id->qp, &recv_wr, &recv_failure); + ret = rdma_post_recv(node->cma_id->qp, &recv_wr, &recv_failure); if (ret) { printf("failed to post receives: %d\n", ret); break; @@ -178,8 +178,8 @@ static int post_recvs(struct cmatest_nod static int post_sends(struct cmatest_node *node, int 
signal_flag) { - struct ibv_send_wr send_wr, *bad_send_wr; - struct ibv_sge sge; + struct rdma_send_wr send_wr, *bad_send_wr; + struct rdma_sge sge; int i, ret = 0; if (!node->connected || !message_count) @@ -188,8 +188,8 @@ static int post_sends(struct cmatest_nod send_wr.next = NULL; send_wr.sg_list = &sge; send_wr.num_sge = 1; - send_wr.opcode = IBV_WR_SEND_WITH_IMM; - send_wr.send_flags = IBV_SEND_INLINE | signal_flag; + send_wr.opcode = RDMA_WR_SEND_WITH_IMM; + send_wr.send_flags = RDMA_SEND_INLINE | signal_flag; send_wr.wr_id = (unsigned long)node; send_wr.imm_data = htonl(node->cma_id->qp->qp_num); @@ -197,12 +197,12 @@ static int post_sends(struct cmatest_nod send_wr.wr.ud.remote_qpn = node->remote_qpn; send_wr.wr.ud.remote_qkey = node->remote_qkey; - sge.length = message_size - sizeof(struct ibv_grh); + sge.length = message_size - sizeof(struct rdma_grh); sge.lkey = node->mr->lkey; sge.addr = (uintptr_t) node->mem; for (i = 0; i < message_count && !ret; i++) { - ret = ibv_post_send(node->cma_id->qp, &send_wr, &bad_send_wr); + ret = rdma_post_send(node->cma_id->qp, &send_wr, &bad_send_wr); if (ret) printf("failed to post sends: %d\n", ret); } @@ -241,7 +241,7 @@ err: static int join_handler(struct cmatest_node *node) { - struct ibv_ah_attr ah_attr; + struct rdma_ah_attr ah_attr; int ret; ret = rdma_get_dst_attr(node->cma_id, test.dst_addr, &ah_attr, @@ -251,7 +251,7 @@ static int join_handler(struct cmatest_n goto err; } - node->ah = ibv_create_ah(node->pd, &ah_attr); + node->ah = rdma_create_ah(node->pd, &ah_attr); if (!node->ah) { printf("mckey: failure creating address handle\n"); goto err; @@ -299,21 +299,21 @@ static void destroy_node(struct cmatest_ return; if (node->ah) - ibv_destroy_ah(node->ah); + rdma_destroy_ah(node->ah); if (node->cma_id->qp) rdma_destroy_qp(node->cma_id); if (node->cq) - ibv_destroy_cq(node->cq); + rdma_destroy_cq(node->cq); if (node->mem) { - ibv_dereg_mr(node->mr); + rdma_dereg_mr(node->mr); free(node->mem); } if (node->pd) - 
ibv_dealloc_pd(node->pd); + rdma_dealloc_pd(node->pd); /* Destroy the RDMA ID after all device resources */ rdma_destroy_id(node->cma_id); @@ -356,7 +356,7 @@ static void destroy_nodes(void) static int poll_cqs(void) { - struct ibv_wc wc[8]; + struct rdma_wc wc[8]; int done, i, ret; for (i = 0; i < connections; i++) { @@ -364,7 +364,7 @@ static int poll_cqs(void) continue; for (done = 0; done < message_count; done += ret) { - ret = ibv_poll_cq(test.nodes[i].cq, 8, wc); + ret = rdma_poll_cq(test.nodes[i].cq, 8, wc); if (ret < 0) { printf("mckey: failed polling CQ: %d\n", ret); return ret; diff -ruNp ORG/librdmacm/examples/rping.c NEW/librdmacm/examples/rping.c --- ORG/librdmacm/examples/rping.c 2006-07-27 15:34:18.000000000 +0530 +++ NEW/librdmacm/examples/rping.c 2006-07-27 15:34:44.000000000 +0530 @@ -111,32 +111,32 @@ struct rping_rdma_info { struct rping_cb { int server; /* 0 iff client */ pthread_t cqthread; - struct ibv_comp_channel *channel; - struct ibv_cq *cq; - struct ibv_pd *pd; - struct ibv_qp *qp; + struct rdma_comp_channel *channel; + struct rdma_cq *cq; + struct rdma_pd *pd; + struct rdma_qp *qp; - struct ibv_recv_wr rq_wr; /* recv work request record */ - struct ibv_sge recv_sgl; /* recv single SGE */ + struct rdma_recv_wr rq_wr; /* recv work request record */ + struct rdma_sge recv_sgl; /* recv single SGE */ struct rping_rdma_info recv_buf;/* malloc'd buffer */ - struct ibv_mr *recv_mr; /* MR associated with this buffer */ + struct rdma_mr *recv_mr; /* MR associated with this buffer */ - struct ibv_send_wr sq_wr; /* send work requrest record */ - struct ibv_sge send_sgl; + struct rdma_send_wr sq_wr; /* send work requrest record */ + struct rdma_sge send_sgl; struct rping_rdma_info send_buf;/* single send buf */ - struct ibv_mr *send_mr; + struct rdma_mr *send_mr; - struct ibv_send_wr rdma_sq_wr; /* rdma work request record */ - struct ibv_sge rdma_sgl; /* rdma single SGE */ + struct rdma_send_wr rdma_sq_wr; /* rdma work request record */ + struct 
rdma_sge rdma_sgl; /* rdma single SGE */ char *rdma_buf; /* used as rdma sink */ - struct ibv_mr *rdma_mr; + struct rdma_mr *rdma_mr; uint32_t remote_rkey; /* remote guys RKEY */ uint64_t remote_addr; /* remote guys TO */ uint32_t remote_len; /* remote guys LEN */ char *start_buf; /* rdma read src */ - struct ibv_mr *start_mr; + struct rdma_mr *start_mr; enum test_state state; /* used for cond/signalling */ sem_t sem; @@ -232,7 +232,7 @@ static int rping_cma_event_handler(struc return ret; } -static int server_recv(struct rping_cb *cb, struct ibv_wc *wc) +static int server_recv(struct rping_cb *cb, struct rdma_wc *wc) { if (wc->byte_len != sizeof(cb->recv_buf)) { fprintf(stderr, "Received bogus data, size %d\n", wc->byte_len); @@ -253,7 +253,7 @@ static int server_recv(struct rping_cb * return 0; } -static int client_recv(struct rping_cb *cb, struct ibv_wc *wc) +static int client_recv(struct rping_cb *cb, struct rdma_wc *wc) { if (wc->byte_len != sizeof(cb->recv_buf)) { fprintf(stderr, "Received bogus data, size %d\n", wc->byte_len); @@ -270,39 +270,39 @@ static int client_recv(struct rping_cb * static int rping_cq_event_handler(struct rping_cb *cb) { - struct ibv_wc wc; - struct ibv_recv_wr *bad_wr; + struct rdma_wc wc; + struct rdma_recv_wr *bad_wr; int ret; - while ((ret = ibv_poll_cq(cb->cq, 1, &wc)) == 1) { + while ((ret = rdma_poll_cq(cb->cq, 1, &wc)) == 1) { ret = 0; if (wc.status) { fprintf(stderr, "cq completion failed status %d\n", wc.status); - if (wc.status != IBV_WC_WR_FLUSH_ERR) + if (wc.status != RDMA_WC_WR_FLUSH_ERR) ret = -1; goto error; } switch (wc.opcode) { - case IBV_WC_SEND: + case RDMA_WC_SEND: DEBUG_LOG("send completion\n"); break; - case IBV_WC_RDMA_WRITE: + case RDMA_WC_RDMA_WRITE: DEBUG_LOG("rdma write completion\n"); cb->state = RDMA_WRITE_COMPLETE; sem_post(&cb->sem); break; - case IBV_WC_RDMA_READ: + case RDMA_WC_RDMA_READ: DEBUG_LOG("rdma read completion\n"); cb->state = RDMA_READ_COMPLETE; sem_post(&cb->sem); break; - case 
IBV_WC_RECV: + case RDMA_WC_RECV: DEBUG_LOG("recv completion\n"); ret = cb->server ? server_recv(cb, &wc) : client_recv(cb, &wc); @@ -311,7 +311,7 @@ static int rping_cq_event_handler(struct goto error; } - ret = ibv_post_recv(cb->qp, &cb->rq_wr, &bad_wr); + ret = rdma_post_recv(cb->qp, &cb->rq_wr, &bad_wr); if (ret) { fprintf(stderr, "post recv error: %d\n", ret); goto error; @@ -374,14 +374,14 @@ static void rping_setup_wr(struct rping_ cb->send_sgl.length = sizeof cb->send_buf; cb->send_sgl.lkey = cb->send_mr->lkey; - cb->sq_wr.opcode = IBV_WR_SEND; - cb->sq_wr.send_flags = IBV_SEND_SIGNALED; + cb->sq_wr.opcode = RDMA_WR_SEND; + cb->sq_wr.send_flags = RDMA_SEND_SIGNALED; cb->sq_wr.sg_list = &cb->send_sgl; cb->sq_wr.num_sge = 1; cb->rdma_sgl.addr = (uint64_t) (unsigned long) cb->rdma_buf; cb->rdma_sgl.lkey = cb->rdma_mr->lkey; - cb->rdma_sq_wr.send_flags = IBV_SEND_SIGNALED; + cb->rdma_sq_wr.send_flags = RDMA_SEND_SIGNALED; cb->rdma_sq_wr.sg_list = &cb->rdma_sgl; cb->rdma_sq_wr.num_sge = 1; } @@ -392,14 +392,14 @@ static int rping_setup_buffers(struct rp DEBUG_LOG("rping_setup_buffers called on cb %p\n", cb); - cb->recv_mr = ibv_reg_mr(cb->pd, &cb->recv_buf, sizeof cb->recv_buf, - IBV_ACCESS_LOCAL_WRITE); + cb->recv_mr = rdma_reg_mr(cb->pd, &cb->recv_buf, sizeof cb->recv_buf, + RDMA_ACCESS_LOCAL_WRITE); if (!cb->recv_mr) { fprintf(stderr, "recv_buf reg_mr failed\n"); return errno; } - cb->send_mr = ibv_reg_mr(cb->pd, &cb->send_buf, sizeof cb->send_buf, 0); + cb->send_mr = rdma_reg_mr(cb->pd, &cb->send_buf, sizeof cb->send_buf, 0); if (!cb->send_mr) { fprintf(stderr, "send_buf reg_mr failed\n"); ret = errno; @@ -413,10 +413,10 @@ static int rping_setup_buffers(struct rp goto err2; } - cb->rdma_mr = ibv_reg_mr(cb->pd, cb->rdma_buf, cb->size, - IBV_ACCESS_LOCAL_WRITE | - IBV_ACCESS_REMOTE_READ | - IBV_ACCESS_REMOTE_WRITE); + cb->rdma_mr = rdma_reg_mr(cb->pd, cb->rdma_buf, cb->size, + RDMA_ACCESS_LOCAL_WRITE | + RDMA_ACCESS_REMOTE_READ | + RDMA_ACCESS_REMOTE_WRITE); 
if (!cb->rdma_mr) { fprintf(stderr, "rdma_buf reg_mr failed\n"); ret = errno; @@ -431,10 +431,10 @@ static int rping_setup_buffers(struct rp goto err4; } - cb->start_mr = ibv_reg_mr(cb->pd, cb->start_buf, cb->size, - IBV_ACCESS_LOCAL_WRITE | - IBV_ACCESS_REMOTE_READ | - IBV_ACCESS_REMOTE_WRITE); + cb->start_mr = rdma_reg_mr(cb->pd, cb->start_buf, cb->size, + RDMA_ACCESS_LOCAL_WRITE | + RDMA_ACCESS_REMOTE_READ | + RDMA_ACCESS_REMOTE_WRITE); if (!cb->start_mr) { fprintf(stderr, "start_buf reg_mr failed\n"); ret = errno; @@ -449,32 +449,32 @@ static int rping_setup_buffers(struct rp err5: free(cb->start_buf); err4: - ibv_dereg_mr(cb->rdma_mr); + rdma_dereg_mr(cb->rdma_mr); err3: free(cb->rdma_buf); err2: - ibv_dereg_mr(cb->send_mr); + rdma_dereg_mr(cb->send_mr); err1: - ibv_dereg_mr(cb->recv_mr); + rdma_dereg_mr(cb->recv_mr); return ret; } static void rping_free_buffers(struct rping_cb *cb) { DEBUG_LOG("rping_free_buffers called on cb %p\n", cb); - ibv_dereg_mr(cb->recv_mr); - ibv_dereg_mr(cb->send_mr); - ibv_dereg_mr(cb->rdma_mr); + rdma_dereg_mr(cb->recv_mr); + rdma_dereg_mr(cb->send_mr); + rdma_dereg_mr(cb->rdma_mr); free(cb->rdma_buf); if (!cb->server) { - ibv_dereg_mr(cb->start_mr); + rdma_dereg_mr(cb->start_mr); free(cb->start_buf); } } static int rping_create_qp(struct rping_cb *cb) { - struct ibv_qp_init_attr init_attr; + struct rdma_qp_init_attr init_attr; int ret; memset(&init_attr, 0, sizeof(init_attr)); @@ -482,7 +482,7 @@ static int rping_create_qp(struct rping_ init_attr.cap.max_recv_wr = 2; init_attr.cap.max_recv_sge = 1; init_attr.cap.max_send_sge = 1; - init_attr.qp_type = IBV_QPT_RC; + init_attr.qp_type = RDMA_QPT_RC; init_attr.send_cq = cb->cq; init_attr.recv_cq = cb->cq; @@ -501,43 +501,43 @@ static int rping_create_qp(struct rping_ static void rping_free_qp(struct rping_cb *cb) { - ibv_destroy_qp(cb->qp); - ibv_destroy_cq(cb->cq); - ibv_destroy_comp_channel(cb->channel); - ibv_dealloc_pd(cb->pd); + rdmav_destroy_qp(cb->qp); + 
rdma_destroy_cq(cb->cq); + rdma_destroy_comp_channel(cb->channel); + rdma_dealloc_pd(cb->pd); } static int rping_setup_qp(struct rping_cb *cb, struct rdma_cm_id *cm_id) { int ret; - cb->pd = ibv_alloc_pd(cm_id->verbs); + cb->pd = rdma_alloc_pd(cm_id->verbs); if (!cb->pd) { - fprintf(stderr, "ibv_alloc_pd failed\n"); + fprintf(stderr, "rdma_alloc_pd failed\n"); return errno; } DEBUG_LOG("created pd %p\n", cb->pd); - cb->channel = ibv_create_comp_channel(cm_id->verbs); + cb->channel = rdma_create_comp_channel(cm_id->verbs); if (!cb->channel) { - fprintf(stderr, "ibv_create_comp_channel failed\n"); + fprintf(stderr, "rdma_create_comp_channel failed\n"); ret = errno; goto err1; } DEBUG_LOG("created channel %p\n", cb->channel); - cb->cq = ibv_create_cq(cm_id->verbs, RPING_SQ_DEPTH * 2, cb, + cb->cq = rdma_create_cq(cm_id->verbs, RPING_SQ_DEPTH * 2, cb, cb->channel, 0); if (!cb->cq) { - fprintf(stderr, "ibv_create_cq failed\n"); + fprintf(stderr, "rdma_create_cq failed\n"); ret = errno; goto err2; } DEBUG_LOG("created cq %p\n", cb->cq); - ret = ibv_req_notify_cq(cb->cq, 0); + ret = rdma_req_notify_cq(cb->cq, 0); if (ret) { - fprintf(stderr, "ibv_create_cq failed\n"); + fprintf(stderr, "rdma_create_cq failed\n"); ret = errno; goto err3; } @@ -551,11 +551,11 @@ static int rping_setup_qp(struct rping_c return 0; err3: - ibv_destroy_cq(cb->cq); + rdma_destroy_cq(cb->cq); err2: - ibv_destroy_comp_channel(cb->channel); + rdma_destroy_comp_channel(cb->channel); err1: - ibv_dealloc_pd(cb->pd); + rdma_dealloc_pd(cb->pd); return ret; } @@ -581,14 +581,14 @@ static void *cm_thread(void *arg) static void *cq_thread(void *arg) { struct rping_cb *cb = arg; - struct ibv_cq *ev_cq; + struct rdma_cq *ev_cq; void *ev_ctx; int ret; DEBUG_LOG("cq_thread started.\n"); while (1) { - ret = ibv_get_cq_event(cb->channel, &ev_cq, &ev_ctx); + ret = rdma_get_cq_event(cb->channel, &ev_cq, &ev_ctx); if (ret) { fprintf(stderr, "Failed to get cq event!\n"); exit(ret); @@ -597,19 +597,19 @@ static void 
*cq_thread(void *arg) fprintf(stderr, "Unkown CQ!\n"); exit(-1); } - ret = ibv_req_notify_cq(cb->cq, 0); + ret = rdma_req_notify_cq(cb->cq, 0); if (ret) { fprintf(stderr, "Failed to set notify!\n"); exit(ret); } ret = rping_cq_event_handler(cb); - ibv_ack_cq_events(cb->cq, 1); + rdma_ack_cq_events(cb->cq, 1); if (ret) exit(ret); } } -static void rping_format_send(struct rping_cb *cb, char *buf, struct ibv_mr *mr) +static void rping_format_send(struct rping_cb *cb, char *buf, struct rdma_mr *mr) { struct rping_rdma_info *info = &cb->send_buf; @@ -623,7 +623,7 @@ static void rping_format_send(struct rpi static int rping_test_server(struct rping_cb *cb) { - struct ibv_send_wr *bad_wr; + struct rdma_send_wr *bad_wr; int ret; while (1) { @@ -639,12 +639,12 @@ static int rping_test_server(struct rpin DEBUG_LOG("server received sink adv\n"); /* Issue RDMA Read. */ - cb->rdma_sq_wr.opcode = IBV_WR_RDMA_READ; + cb->rdma_sq_wr.opcode = RDMA_WR_RDMA_READ; cb->rdma_sq_wr.wr.rdma.rkey = cb->remote_rkey; cb->rdma_sq_wr.wr.rdma.remote_addr = cb->remote_addr; cb->rdma_sq_wr.sg_list->length = cb->remote_len; - ret = ibv_post_send(cb->qp, &cb->rdma_sq_wr, &bad_wr); + ret = rdma_post_send(cb->qp, &cb->rdma_sq_wr, &bad_wr); if (ret) { fprintf(stderr, "post send error %d\n", ret); break; @@ -666,7 +666,7 @@ static int rping_test_server(struct rpin printf("server ping data: %s\n", cb->rdma_buf); /* Tell client to continue */ - ret = ibv_post_send(cb->qp, &cb->sq_wr, &bad_wr); + ret = rdma_post_send(cb->qp, &cb->sq_wr, &bad_wr); if (ret) { fprintf(stderr, "post send error %d\n", ret); break; @@ -684,7 +684,7 @@ static int rping_test_server(struct rpin DEBUG_LOG("server received sink adv\n"); /* RDMA Write echo data */ - cb->rdma_sq_wr.opcode = IBV_WR_RDMA_WRITE; + cb->rdma_sq_wr.opcode = RDMA_WR_RDMA_WRITE; cb->rdma_sq_wr.wr.rdma.rkey = cb->remote_rkey; cb->rdma_sq_wr.wr.rdma.remote_addr = cb->remote_addr; cb->rdma_sq_wr.sg_list->length = strlen(cb->rdma_buf) + 1; @@ -693,7 +693,7 @@ 
static int rping_test_server(struct rpin cb->rdma_sq_wr.sg_list->addr, cb->rdma_sq_wr.sg_list->length); - ret = ibv_post_send(cb->qp, &cb->rdma_sq_wr, &bad_wr); + ret = rdma_post_send(cb->qp, &cb->rdma_sq_wr, &bad_wr); if (ret) { fprintf(stderr, "post send error %d\n", ret); break; @@ -710,7 +710,7 @@ static int rping_test_server(struct rpin DEBUG_LOG("server rdma write complete \n"); /* Tell client to begin again */ - ret = ibv_post_send(cb->qp, &cb->sq_wr, &bad_wr); + ret = rdma_post_send(cb->qp, &cb->sq_wr, &bad_wr); if (ret) { fprintf(stderr, "post send error %d\n", ret); break; @@ -757,7 +757,7 @@ static int rping_bind_server(struct rpin static int rping_run_server(struct rping_cb *cb) { - struct ibv_recv_wr *bad_wr; + struct rdma_recv_wr *bad_wr; int ret; ret = rping_bind_server(cb); @@ -776,9 +776,9 @@ static int rping_run_server(struct rping goto err1; } - ret = ibv_post_recv(cb->qp, &cb->rq_wr, &bad_wr); + ret = rdma_post_recv(cb->qp, &cb->rq_wr, &bad_wr); if (ret) { - fprintf(stderr, "ibv_post_recv failed: %d\n", ret); + fprintf(stderr, "rdma_post_recv failed: %d\n", ret); goto err2; } @@ -804,7 +804,7 @@ err1: static int rping_test_client(struct rping_cb *cb) { int ping, start, cc, i, ret = 0; - struct ibv_send_wr *bad_wr; + struct rdma_send_wr *bad_wr; unsigned char c; start = 65; @@ -825,7 +825,7 @@ static int rping_test_client(struct rpin cb->start_buf[cb->size - 1] = 0; rping_format_send(cb, cb->start_buf, cb->start_mr); - ret = ibv_post_send(cb->qp, &cb->sq_wr, &bad_wr); + ret = rdma_post_send(cb->qp, &cb->sq_wr, &bad_wr); if (ret) { fprintf(stderr, "post send error %d\n", ret); break; @@ -841,7 +841,7 @@ static int rping_test_client(struct rpin } rping_format_send(cb, cb->rdma_buf, cb->rdma_mr); - ret = ibv_post_send(cb->qp, &cb->sq_wr, &bad_wr); + ret = rdma_post_send(cb->qp, &cb->sq_wr, &bad_wr); if (ret) { fprintf(stderr, "post send error %d\n", ret); break; @@ -926,7 +926,7 @@ static int rping_bind_client(struct rpin static int 
rping_run_client(struct rping_cb *cb) { - struct ibv_recv_wr *bad_wr; + struct rdma_recv_wr *bad_wr; int ret; ret = rping_bind_client(cb); @@ -945,9 +945,9 @@ static int rping_run_client(struct rping goto err1; } - ret = ibv_post_recv(cb->qp, &cb->rq_wr, &bad_wr); + ret = rdma_post_recv(cb->qp, &cb->rq_wr, &bad_wr); if (ret) { - fprintf(stderr, "ibv_post_recv failed: %d\n", ret); + fprintf(stderr, "rdma_post_recv failed: %d\n", ret); goto err2; } diff -ruNp ORG/librdmacm/examples/udaddy.c NEW/librdmacm/examples/udaddy.c --- ORG/librdmacm/examples/udaddy.c 2006-07-27 15:34:18.000000000 +0530 +++ NEW/librdmacm/examples/udaddy.c 2006-07-27 15:34:44.000000000 +0530 @@ -55,10 +55,10 @@ struct cmatest_node { int id; struct rdma_cm_id *cma_id; int connected; - struct ibv_pd *pd; - struct ibv_cq *cq; - struct ibv_mr *mr; - struct ibv_ah *ah; + struct rdma_pd *pd; + struct rdma_cq *cq; + struct rdma_mr *mr; + struct rdma_ah *ah; uint32_t remote_qpn; uint32_t remote_qkey; void *mem; @@ -90,14 +90,14 @@ static int create_message(struct cmatest if (!message_count) return 0; - node->mem = malloc(message_size + sizeof(struct ibv_grh)); + node->mem = malloc(message_size + sizeof(struct rdma_grh)); if (!node->mem) { printf("failed message allocation\n"); return -1; } - node->mr = ibv_reg_mr(node->pd, node->mem, - message_size + sizeof(struct ibv_grh), - IBV_ACCESS_LOCAL_WRITE); + node->mr = rdma_reg_mr(node->pd, node->mem, + message_size + sizeof(struct rdma_grh), + RDMA_ACCESS_LOCAL_WRITE); if (!node->mr) { printf("failed to reg MR\n"); goto err; @@ -110,10 +110,10 @@ err: static int init_node(struct cmatest_node *node) { - struct ibv_qp_init_attr init_qp_attr; + struct rdma_qp_init_attr init_qp_attr; int cqe, ret; - node->pd = ibv_alloc_pd(node->cma_id->verbs); + node->pd = rdma_alloc_pd(node->cma_id->verbs); if (!node->pd) { ret = -ENOMEM; printf("udaddy: unable to allocate PD\n"); @@ -121,7 +121,7 @@ static int init_node(struct cmatest_node } cqe = message_count ? 
message_count * 2 : 2; - node->cq = ibv_create_cq(node->cma_id->verbs, cqe, node, 0, 0); + node->cq = rdma_create_cq(node->cma_id->verbs, cqe, node, 0, 0); if (!node->cq) { ret = -ENOMEM; printf("udaddy: unable to create CQ\n"); @@ -135,7 +135,7 @@ static int init_node(struct cmatest_node init_qp_attr.cap.max_recv_sge = 1; init_qp_attr.qp_context = node; init_qp_attr.sq_sig_all = 0; - init_qp_attr.qp_type = IBV_QPT_UD; + init_qp_attr.qp_type = RDMA_QPT_UD; init_qp_attr.send_cq = node->cq; init_qp_attr.recv_cq = node->cq; ret = rdma_create_qp(node->cma_id, node->pd, &init_qp_attr); @@ -155,8 +155,8 @@ out: static int post_recvs(struct cmatest_node *node) { - struct ibv_recv_wr recv_wr, *recv_failure; - struct ibv_sge sge; + struct rdma_recv_wr recv_wr, *recv_failure; + struct rdma_sge sge; int i, ret = 0; if (!message_count) @@ -167,12 +167,12 @@ static int post_recvs(struct cmatest_nod recv_wr.num_sge = 1; recv_wr.wr_id = (uintptr_t) node; - sge.length = message_size + sizeof(struct ibv_grh); + sge.length = message_size + sizeof(struct rdma_grh); sge.lkey = node->mr->lkey; sge.addr = (uintptr_t) node->mem; for (i = 0; i < message_count && !ret; i++ ) { - ret = ibv_post_recv(node->cma_id->qp, &recv_wr, &recv_failure); + ret = rdma_post_recv(node->cma_id->qp, &recv_wr, &recv_failure); if (ret) { printf("failed to post receives: %d\n", ret); break; @@ -183,8 +183,8 @@ static int post_recvs(struct cmatest_nod static int post_sends(struct cmatest_node *node, int signal_flag) { - struct ibv_send_wr send_wr, *bad_send_wr; - struct ibv_sge sge; + struct rdma_send_wr send_wr, *bad_send_wr; + struct rdma_sge sge; int i, ret = 0; if (!node->connected || !message_count) @@ -193,8 +193,8 @@ static int post_sends(struct cmatest_nod send_wr.next = NULL; send_wr.sg_list = &sge; send_wr.num_sge = 1; - send_wr.opcode = IBV_WR_SEND_WITH_IMM; - send_wr.send_flags = IBV_SEND_INLINE | signal_flag; + send_wr.opcode = RDMA_WR_SEND_WITH_IMM; + send_wr.send_flags = RDMA_SEND_INLINE | 
signal_flag; send_wr.wr_id = (unsigned long)node; send_wr.imm_data = htonl(node->cma_id->qp->qp_num); @@ -202,12 +202,12 @@ static int post_sends(struct cmatest_nod send_wr.wr.ud.remote_qpn = node->remote_qpn; send_wr.wr.ud.remote_qkey = node->remote_qkey; - sge.length = message_size - sizeof(struct ibv_grh); + sge.length = message_size - sizeof(struct rdma_grh); sge.lkey = node->mr->lkey; sge.addr = (uintptr_t) node->mem; for (i = 0; i < message_count && !ret; i++) { - ret = ibv_post_send(node->cma_id->qp, &send_wr, &bad_send_wr); + ret = rdma_post_send(node->cma_id->qp, &send_wr, &bad_send_wr); if (ret) printf("failed to post sends: %d\n", ret); } @@ -305,7 +305,7 @@ err1: static int resolved_handler(struct cmatest_node *node) { - struct ibv_ah_attr ah_attr; + struct rdma_ah_attr ah_attr; int ret; ret = rdma_get_dst_attr(node->cma_id, test.dst_addr, &ah_attr, @@ -315,7 +315,7 @@ static int resolved_handler(struct cmate goto err; } - node->ah = ibv_create_ah(node->pd, &ah_attr); + node->ah = rdma_create_ah(node->pd, &ah_attr); if (!node->ah) { printf("udaddy: failure creating address handle\n"); goto err; @@ -371,21 +371,21 @@ static void destroy_node(struct cmatest_ return; if (node->ah) - ibv_destroy_ah(node->ah); + rdma_destroy_ah(node->ah); if (node->cma_id->qp) rdma_destroy_qp(node->cma_id); if (node->cq) - ibv_destroy_cq(node->cq); + rdma_destroy_cq(node->cq); if (node->mem) { - ibv_dereg_mr(node->mr); + rdma_dereg_mr(node->mr); free(node->mem); } if (node->pd) - ibv_dealloc_pd(node->pd); + rdma_dealloc_pd(node->pd); /* Destroy the RDMA ID after all device resources */ rdma_destroy_id(node->cma_id); @@ -429,9 +429,9 @@ static void destroy_nodes(void) free(test.nodes); } -static void create_reply_ah(struct cmatest_node *node, struct ibv_wc *wc) +static void create_reply_ah(struct cmatest_node *node, struct rdma_wc *wc) { - node->ah = ibv_create_ah_from_wc(node->pd, wc, node->mem, + node->ah = rdma_create_ah_from_wc(node->pd, wc, node->mem, 
node->cma_id->port_num); node->remote_qpn = ntohl(wc->imm_data); node->remote_qkey = ntohs(rdma_get_dst_port(node->cma_id)); @@ -439,7 +439,7 @@ static void create_reply_ah(struct cmate static int poll_cqs(void) { - struct ibv_wc wc[8]; + struct rdma_wc wc[8]; int done, i, ret; for (i = 0; i < connections; i++) { @@ -447,7 +447,7 @@ static int poll_cqs(void) continue; for (done = 0; done < message_count; done += ret) { - ret = ibv_poll_cq(test.nodes[i].cq, 8, wc); + ret = rdma_poll_cq(test.nodes[i].cq, 8, wc); if (ret < 0) { printf("udaddy: failed polling CQ: %d\n", ret); return ret; @@ -511,7 +511,7 @@ static int run_server(void) printf("sending replies\n"); for (i = 0; i < connections; i++) { - ret = post_sends(&test.nodes[i], IBV_SEND_SIGNALED); + ret = post_sends(&test.nodes[i], RDMA_SEND_SIGNALED); if (ret) goto out; } From zhushisongzhu at yahoo.com Fri Jul 28 00:19:51 2006 From: zhushisongzhu at yahoo.com (zhu shi song) Date: Fri, 28 Jul 2006 00:19:51 -0700 (PDT) Subject: [openib-general] OFED 1.0 - Official Release (Tziporet Koren) In-Reply-To: <449A92D1.8090404@mellanox.co.il> Message-ID: <20060728071952.6646.qmail@web36905.mail.mud.yahoo.com> Dear Sir, what's your progress on sdp connection? I'm waiting for the new release to test. tks zhu --- Tziporet Koren wrote: > zhu shi song wrote: > > I'm sorry SDP is not in production state. SDP is > very > > important for our application and we are waiting > it > > mature enough to be used in our product. And do > you > > have any schedule to let SDP work ok(especially > can > > support many large concurrent connections just > like > > TCP)? I very appreciate I can test new SDP before > end > > of June. > > tks > > zhu > > > > > The plan is to have a stable SDP in 1.1 release. > The schedule of 1.1 is end of July in the best case > (more likely it will > be mid-Aug) > However we will have RCs before this and we can let > you know when many > large concurrent connections are supported. 
> > Tziporet > From Arkady.Kanevsky at netapp.com Fri Jul 28 05:08:43 2006 From: Arkady.Kanevsky at netapp.com (Kanevsky, Arkady) Date: Fri, 28 Jul 2006 08:08:43 -0400 Subject: [openib-general] posting send requests in RTR Message-ID: If a QP is not in the RTS state then Send post should be flushed to CQ for IB. This fact needs to be preserved so ULP can ensure that for Completion Suppression Sends have been completed. Thanks, Arkady Kanevsky email: arkady at netapp.com Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 > -----Original Message----- > From: Caitlin Bestler [mailto:caitlinb at broadcom.com] > Sent: Thursday, July 27, 2006 6:19 PM > To: Sean Hefty; Rimmer, Todd; Michael S. Tsirkin > Cc: Or Gerlitz; Roland Dreier; openib-general at openib.org > Subject: Re: [openib-general] posting send requests in RTR > > Sean Hefty wrote: > > > > >> Alternately, it would be reasonable to simply document > that a receive > >> completion *implied* a connection established event, and therefore > >> the application could post to the send queue after it reaped a > >> receive completion (or got a connection established event). > > > > The problem is that the QP is not in the RTS state, so > cannot accept > > sends. > > > > Well, I suppose if your adapter can be in a state where it > has completed a receive work request for a connection but is > not yet convinced that that connection is established then it > would have to queue those work completions somewhere. > > If that is all you are proposing then I have no objections, > an iWARP adapter can never be in such a state.
> > But I am curious as to why completing a receive work request > does not place the QP in the RTS state since the end-to-end > QP pairing has obviously been confirmed, and therefore the QP > can send. > From trimmer at silverstorm.com Fri Jul 28 06:20:47 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Fri, 28 Jul 2006 09:20:47 -0400 Subject: [openib-general] posting send requests in RTR In-Reply-To: <000101c6b1c5$9e4c1de0$8fc8180a@amr.corp.intel.com> Message-ID: > From: Sean Hefty [mailto:sean.hefty at intel.com] > > >That assumes that there is any valid reason for an application to > >post send requests before the connection is established. While there > >is clearly a need to post receive work requests before the connection > >is established I cannot think of any reason why an application needs > >to pre-prime the send queue. > > It's not pre-priming the send queue. An application could pull a > completed > receive work completion off of a CQ. The receive may very well be a > request > that requires a response. At this point, the connection is obviously > established from the consumers viewpoint, but not necessarily by the > viewpoint > of the RDMA CM or IB CM. The response must now be queued. > > I believe that the problem can be limited under the following application > conditions: > > 1. The application uses the CQ with different QPs. > 2. The application is on the passive (server) side of the connection. > 3. The active (client) side sends a request to the server. > > Even combined these conditions could easily occur. > > IMO, the question is do we pass this problem to the applications to deal > with, > or try to handle transparently it under verbs.
If we try to handle it > under > verbs, can it be done in one place? How much would such checks affects > the > performance of post send operations? And how would immediate or other > errors be > handled when posting queued sends? > > My personal take at the moment is to let the ULPs handle the problem. > I feel this is best solved in the verbs driver and will add no more than 1 QP state test to the data path. The verbs driver will need to test the QP state; if it is RTR, it should process as much of the WQE as it can without notifying the HCA. For Mellanox silicon, this would mean building the WQE but not ringing the doorbell. Theoretically other HCAs may require other algorithms (such as a sidebar TX queue). Since PathScale/QLogic HCA does QP management in software, its solution should be soft as well. Not sure about IBM eHCA. Immediate errors would be tested for as they are presently. Any immediate error tests would precede this test. On transition from RTR to RTS, the verbs driver would appropriately notify the HCA of the queued send WQEs. For Mellanox HCA this would involve ringing the appropriate doorbell. It is important that we keep the writing of applications simple. Requiring applications and ULPs to solve subtle races like this almost guarantees they won't be solved. As Open Fabrics use expands we will find more developers implementing to the APIs. The easier we make it, the more likely Open Fabrics use will expand. Todd Rimmer From trimmer at silverstorm.com Fri Jul 28 06:42:24 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Fri, 28 Jul 2006 09:42:24 -0400 Subject: [openib-general] posting send requests in RTR In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F17447B1@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: > From: Caitlin Bestler [mailto:caitlinb at broadcom.com] > > That assumes that there is any valid reason for an application to > post send requests before the connection is established.
While there > is clearly a need to post receive work requests before the connection > is established I cannot think of any reason why an application needs > to pre-prime the send queue. > > Putting unneeded complexity in the definition of a hardware service > just invites more hardware dependencies and eventual hardware specific > bugs that will complicate life for application developers. "Don't post > until the connection is established" is very simple for *both* the > application and the verbs provider. Here is a real world example and how we uncovered this issue: Native IB storage SRP Targets. SRP Targets implement the passive side; after processing the REQ, they send a REP. However the target QP is in RTR. The SRP client gets the REP, sends the RTU and announces to the OS that a device is available. The OS immediately begins issuing SCSI commands. The target receives the SCSI commands (such as Test Unit Ready or Inquiry) and wants to act on them immediately. However if the command has passed the RTU or the RTU is lost, the target is still in RTR. If the command was very simple, the target may want to answer the query immediately by posting a send with the response. If the RTU is totally lost and async event processing is delayed, the target may even be able to do some processing and still not have the QP in RTS when it has its response ready. While it's possible to build an additional queuing point above the send Q, such queuing points tend to impact performance and latency for high performance protocols. While this example was for SRP, it's not unique to SRP. Most protocols (such as SRP and SDP) include a set of initialization or capability query messages which the client may issue to the target immediately after the client believes it has a connection. Many of those initialization messages are the types of things which the target can answer immediately and may even choose to do so in its completion handler.
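[Editorial sketch] The deferral scheme proposed in this thread (accept sends while the QP is in RTR, hold them back from the hardware, release them on the RTR to RTS transition, and report them as flushed if the QP goes to error) can be modelled in a few lines of plain C. This is only an illustration under stated assumptions: every name below (model_qp, model_post_send, MODEL_SQ_DEPTH and so on) is hypothetical, nothing here is libibverbs or driver code, and "ringing the doorbell" is reduced to incrementing a counter.

```c
/* Hypothetical userspace model of deferred sends; not a real verbs API. */
enum model_qp_state { MODEL_QPS_RTR, MODEL_QPS_RTS, MODEL_QPS_ERR };

#define MODEL_SQ_DEPTH 8	/* assumed send queue depth for the model */

struct model_qp {
	enum model_qp_state state;
	int deferred;	/* WQEs built while in RTR, doorbell not rung */
	int rung;	/* WQEs the (imaginary) HCA has been told about */
};

/* Post one send. Immediate error checks come first, then the state test. */
int model_post_send(struct model_qp *qp)
{
	if (qp->state == MODEL_QPS_ERR)
		return -1;			/* immediate error */
	if (qp->deferred + qp->rung >= MODEL_SQ_DEPTH)
		return -1;			/* send queue full */
	if (qp->state == MODEL_QPS_RTR) {
		qp->deferred++;			/* build WQE, skip doorbell */
		return 0;
	}
	qp->rung++;				/* RTS: notify hardware now */
	return 0;
}

/* RTR -> RTS: hand every deferred WQE to the hardware in one step. */
void model_modify_to_rts(struct model_qp *qp)
{
	qp->state = MODEL_QPS_RTS;
	qp->rung += qp->deferred;
	qp->deferred = 0;
}

/* Move the QP to the error state, for example after a failed connect. */
void model_set_error(struct model_qp *qp)
{
	qp->state = MODEL_QPS_ERR;
}

/* In the error state, report deferred WQEs as flushed completions. */
int model_poll_flushed(struct model_qp *qp)
{
	int n;

	if (qp->state != MODEL_QPS_ERR)
		return 0;
	n = qp->deferred;
	qp->deferred = 0;
	return n;
}
```

In this model the only cost added to the send path is the single state comparison in model_post_send(), which is the property the proposal relies on. Completions of ordinary sends are not modelled, so the rung counter only ever grows.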
Todd Rimmer From trimmer at silverstorm.com Fri Jul 28 06:50:07 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Fri, 28 Jul 2006 09:50:07 -0400 Subject: [openib-general] posting send requests in RTR In-Reply-To: Message-ID: > From: Rimmer, Todd > > > From: Sean Hefty [mailto:sean.hefty at intel.com] > > > > >That assumes that there is any valid reason for an application to > > >post send requests before the connection is established. While there > > >is clearly a need to post receive work requests before the connection > > >is established I cannot think of any reason why an application needs > > >to pre-prime the send queue. > > > > It's not pre-priming the send queue. An application could pull a > > completed > > receive work completion off of a CQ. The receive may very well be a > > request > > that requires a response. At this point, the connection is obviously > > established from the consumers viewpoint, but not necessarily by the > > viewpoint > > of the RDMA CM or IB CM. The response must now be queued. > > > > I believe that the problem can be limited under the following > application > > conditions: > > > > 1. The application uses the CQ with different QPs. > > 2. The application is on the passive (server) side of the connection. > > 3. The active (client) side sends a request to the server. > > > > Even combined these conditions could easily occur. > > > > IMO, the question is do we pass this problem to the applications to > deal > > with, > > or try to handle transparently it under verbs. If we try to handle it > > under > > verbs, can it be done in one place? How much would such checks > affects > > the > > performance of post send operations? And how would immediate or other > > errors be > > handled when posting queued sends? > > > > My personal take at the moment is to let the ULPs handle the problem. > > > I feel this is best solved in the verbs driver and will add no more than > 1 QP state test to the data path. 
The verbs driver will need to test > the QP state, if its RTR, it should process as much of the WQE as it can > without notifying the HCA. For Mellanox silicon, this would mean build > the WQE but don't ring the doorbell. Theoretically other HCAs may > require other algorithms (such as a sidebar TX queue). Since > PathScale/QLogic HCA does QP management in software, its solution should > be soft as well. Not sure about IBM eHCA. > > Immediate errors would be tested for as they are presently. Any > immediate error tests would precede this test. > > On transition from RTR to RTS, the verbs driver would appropriately > notify the HCA of the queued send WQEs. For Mellanox HCA this would > involve ringing the appropriate doorbell. > > It is important that we keep the writing of applications simple. > Requiring applications and ULPs to solve subtle races like this almost > guarantees they won't be solved. As Open Fabrics use expands we will > find more developers implementing to the APIs. The easier we make it, > the more likely Open Fabrics use will expand. > > Todd Rimmer > One addition: poll_cq would also need to test for an error state, in which case it would simulate the Flushed events for those queued SendQ WQEs. Todd Rimmer From mshefty at ichips.intel.com Fri Jul 28 09:56:24 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 28 Jul 2006 09:56:24 -0700 Subject: [openib-general] posting send requests in RTR In-Reply-To: References: Message-ID: <44CA41B8.909@ichips.intel.com> Kanevsky, Arkady wrote: > If a QP is not in the RTS state then Send post should > be flushed to CQ for IB. > This fact need to be preserved so ULP can ensure that > for Completion Suppression Sends have been completed. Sends posted when in the RTR state result in an immediate error. No work completion is generated. Aren't sends required to be queued but not processed while in the SQD state? Would it be that hard to apply the same requirement for RTR?
(Besides the fact that it's a spec change...) - Sean From mshefty at ichips.intel.com Fri Jul 28 10:04:10 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 28 Jul 2006 10:04:10 -0700 Subject: [openib-general] [PATCH 1/6] libibverbs include files changes. In-Reply-To: <20060728045008.9426.53690.sendpatchset@K50wks273950wss.in.ibm.com> References: <20060728045001.9426.85822.sendpatchset@K50wks273950wss.in.ibm.com> <20060728045008.9426.53690.sendpatchset@K50wks273950wss.in.ibm.com> Message-ID: <44CA438A.4050603@ichips.intel.com> Krishna Kumar wrote: > +struct rdma_kern_async_event { Why not just do: #define rdma_structure_name ibv_structure_name Then we simply remove the #define and replace ibv_structure_name with rdma_structure_name when ready. This better shows the changes, prevents duplicating every structure, and should avoid any compile warnings. - Sean From mshefty at ichips.intel.com Fri Jul 28 10:08:30 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 28 Jul 2006 10:08:30 -0700 Subject: [openib-general] [PATCH 1/6] libibverbs include files changes. In-Reply-To: <20060728045008.9426.53690.sendpatchset@K50wks273950wss.in.ibm.com> References: <20060728045001.9426.85822.sendpatchset@K50wks273950wss.in.ibm.com> <20060728045008.9426.53690.sendpatchset@K50wks273950wss.in.ibm.com> Message-ID: <44CA448E.4040603@ichips.intel.com> Krishna Kumar wrote: > +/** > + * rdmav_create_qp - Create a queue pair. > + */ > +struct rdma_qp *rdmav_create_qp(struct rdma_pd *pd, > + struct rdma_qp_init_attr *qp_init_attr); > + > +/** > + * rdmav_destroy_qp - Destroy a queue pair. > + */ > +int rdmav_destroy_qp(struct rdma_qp *qp); I realize why you did this, but adding the 'v' to these two calls makes them inconsistent with all the other calls in the API. I don't know what the right approach is here, but it may be better to have the RDMA CM use different names. 
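[Editorial sketch] The aliasing approach suggested in review of this patch (define each rdma_ name in terms of the existing ibv_ name instead of duplicating the structure) might look like the following. This is a minimal illustration, assuming a mock ibv_pd structure and a mock ibv_dealloc_pd() that stand in for the real libibverbs definitions; aliasing the function name as well as the structure name is an extension made here for the example, not something stated in the review.

```c
/* Mock stand-ins for existing libibverbs definitions (assumption). */
struct ibv_pd {
	int handle;
};

static int ibv_dealloc_pd(struct ibv_pd *pd)
{
	pd->handle = -1;	/* mark the mock PD as released */
	return 0;
}

/*
 * Transitional aliases: the preprocessor rewrites the rdma_ token to the
 * ibv_ token, so both spellings name the same type and the same function.
 * Once callers are converted, the aliases are removed and the ibv_
 * definitions are renamed.
 */
#define rdma_pd		ibv_pd
#define rdma_dealloc_pd	ibv_dealloc_pd
```

Because "struct rdma_pd" expands to "struct ibv_pd", code written against either name compiles against a single definition, which avoids the duplicated structures and the pointer-type warnings that separate definitions would cause.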
- Sean From mshefty at ichips.intel.com Fri Jul 28 10:12:44 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 28 Jul 2006 10:12:44 -0700 Subject: [openib-general] [PATCH 4/6] librdmacm include file changes. In-Reply-To: <20060728045028.9426.63971.sendpatchset@K50wks273950wss.in.ibm.com> References: <20060728045001.9426.85822.sendpatchset@K50wks273950wss.in.ibm.com> <20060728045028.9426.63971.sendpatchset@K50wks273950wss.in.ibm.com> Message-ID: <44CA458C.8040003@ichips.intel.com> Krishna Kumar wrote: > struct ucma_abi_query_route_resp { > __u64 node_guid; > - struct ibv_kern_path_rec ib_route[2]; > + struct rdma_kern_path_rec ib_route[2]; Path records are IB specific. Not sure we need to rename them. > struct ib_addr { > - union ibv_gid sgid; > - union ibv_gid dgid; > + union rdma_gid sgid; > + union rdma_gid dgid; Ditto for GIDs. > struct rdma_route { > struct rdma_addr addr; > - struct ibv_sa_path_rec *path_rec; > + struct rdma_sa_path_rec *path_rec; same > -int rdma_create_qp(struct rdma_cm_id *id, struct ibv_pd *pd, > - struct ibv_qp_init_attr *qp_init_attr); > +int rdma_create_qp(struct rdma_cm_id *id, struct rdma_pd *pd, > + struct rdma_qp_init_attr *qp_init_attr); If we want to avoid changing this, maybe the new verbs call can use 'alloc' in place of 'create'. > - IB_PATH_OPTIONS = 1, /* struct ibv_kern_path_rec */ > + IB_PATH_OPTIONS = 1, /* struct rdma_kern_path_rec */ see above - Sean From mshefty at ichips.intel.com Fri Jul 28 10:18:41 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 28 Jul 2006 10:18:41 -0700 Subject: [openib-general] [PATCH 5/6] librdmacm source file changes. In-Reply-To: <20060728045034.9426.6547.sendpatchset@K50wks273950wss.in.ibm.com> References: <20060728045001.9426.85822.sendpatchset@K50wks273950wss.in.ibm.com> <20060728045034.9426.6547.sendpatchset@K50wks273950wss.in.ibm.com> Message-ID: <44CA46F1.8050704@ichips.intel.com> These changes look fine. 
We just need to decide if we want to change everything that's ibv_* to rdma_*, or keep IB specific names (path records, GIDs, PKeys, etc.) the same. - Sean From mshefty at ichips.intel.com Fri Jul 28 10:23:29 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 28 Jul 2006 10:23:29 -0700 Subject: [openib-general] [PATCH 6/6] librdmacm examples changes. In-Reply-To: <20060728045041.9426.80827.sendpatchset@K50wks273950wss.in.ibm.com> References: <20060728045001.9426.85822.sendpatchset@K50wks273950wss.in.ibm.com> <20060728045041.9426.80827.sendpatchset@K50wks273950wss.in.ibm.com> Message-ID: <44CA4811.7090000@ichips.intel.com> These changes look fine. - Sean From swise at opengridcomputing.com Fri Jul 28 11:28:49 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 28 Jul 2006 13:28:49 -0500 Subject: [openib-general] [PATCH Round 5 0/3] Network Event Notifier Mechanism Message-ID: <20060728182849.14325.62898.stgit@stevo-desktop> Dave/Roland, is this patchset about ready to go? Round 5 Changes: - removed user-mode support - removed route add/del events. ------ This patch implements a mechanism that allows interested clients to register for notification of certain network events. The intended use is to allow RDMA devices (linux/drivers/infiniband) to be notified of neighbour updates, ICMP redirects, and path MTU changes. RDMA devices need these network events because they typically cache this information in hardware and need to be notified when it changes. To learn about RDMA internet protocols, see: http://www.ietf.org/html.charters/rddp-charter.html. The key events of interest are: - neighbour mac address change - routing redirect (the next hop neighbour changes for a dst_entry) - path mtu change (the path mtu for a dst_entry changes). 
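[Editorial sketch] To show how a client might consume the three events listed above, here is a small userspace model of a notifier chain. It is a sketch under stated assumptions, not kernel code: the event names follow the enum in patch 1/3 of this series, but model_notifier_block, the model_ functions, the demo callback and the counters are all hypothetical, there is no locking, and model_call_netevent_notifiers() returns the number of callbacks invoked rather than the kernel's notifier return value.

```c
#include <stddef.h>

/* Event codes as proposed in patch 1/3 of this series. */
enum netevent_notif_type {
	NETEVENT_NEIGH_UPDATE = 1,	/* neighbour mac address change */
	NETEVENT_PMTU_UPDATE,		/* path mtu change */
	NETEVENT_REDIRECT,		/* routing redirect */
};

/* Simplified stand-in for the kernel's notifier block (assumption). */
struct model_notifier_block {
	int (*notifier_call)(struct model_notifier_block *nb,
			     unsigned long event, void *ptr);
	struct model_notifier_block *next;
};

static struct model_notifier_block *model_chain;	/* single-threaded */

int model_register_netevent_notifier(struct model_notifier_block *nb)
{
	nb->next = model_chain;
	model_chain = nb;
	return 0;
}

int model_unregister_netevent_notifier(struct model_notifier_block *nb)
{
	struct model_notifier_block **p;

	for (p = &model_chain; *p; p = &(*p)->next) {
		if (*p == nb) {
			*p = nb->next;	/* unlink from the chain */
			nb->next = NULL;
			return 0;
		}
	}
	return -1;			/* was never registered */
}

/* Invoke every registered callback; returns how many were called. */
int model_call_netevent_notifiers(unsigned long event, void *ptr)
{
	struct model_notifier_block *nb;
	int calls = 0;

	for (nb = model_chain; nb; nb = nb->next) {
		nb->notifier_call(nb, event, ptr);
		calls++;
	}
	return calls;
}

/* Hypothetical client: counts the events an RDMA device cares about. */
struct demo_stats {
	int neigh, pmtu, redirect;
};
static struct demo_stats stats;

static int demo_callback(struct model_notifier_block *nb,
			 unsigned long event, void *ptr)
{
	(void)nb;
	(void)ptr;
	switch (event) {
	case NETEVENT_NEIGH_UPDATE:
		stats.neigh++;		/* refresh cached mac address */
		break;
	case NETEVENT_PMTU_UPDATE:
		stats.pmtu++;		/* refresh cached path mtu */
		break;
	case NETEVENT_REDIRECT:
		stats.redirect++;	/* refresh cached next hop */
		break;
	}
	return 0;
}

static struct model_notifier_block demo_nb = {
	.notifier_call = demo_callback,
};
```

A caller would register demo_nb once, after which each model_call_netevent_notifiers() call reaches demo_callback() until the block is unregistered. A real device driver would update its hardware-cached state where the counters are incremented.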
This patchset consists of 3 patches: 1) New files implementing the Network Event Notifier 2) Core network changes to generate netevents 3) Cleanup ib_addr modules to use the netevent patch Signed-off-by: Tom Tucker Signed-off-by: Steve Wise From swise at opengridcomputing.com Fri Jul 28 11:28:59 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 28 Jul 2006 13:28:59 -0500 Subject: [openib-general] [PATCH Round 5 1/3] Network Event Notifier Mechanism. In-Reply-To: <20060728182849.14325.62898.stgit@stevo-desktop> References: <20060728182849.14325.62898.stgit@stevo-desktop> Message-ID: <20060728182859.14325.86842.stgit@stevo-desktop> This patch uses notifier blocks to implement a network event notifier mechanism. Clients register their callback function by calling register_netevent_notifier() like this: static struct notifier_block nb = { .notifier_call = my_callback_func }; ... register_netevent_notifier(&nb); --- include/net/netevent.h | 33 +++++++++++++++++++++++ net/core/netevent.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 102 insertions(+), 0 deletions(-) diff --git a/include/net/netevent.h b/include/net/netevent.h new file mode 100644 index 0000000..e5d2162 --- /dev/null +++ b/include/net/netevent.h @@ -0,0 +1,33 @@ +#ifndef _NET_EVENT_H +#define _NET_EVENT_H + +/* + * Generic netevent notifiers + * + * Authors: + * Tom Tucker + * Steve Wise + * + * Changes: + */ +#ifdef __KERNEL__ + +#include + +struct netevent_redirect { + struct dst_entry *old; + struct dst_entry *new; +}; + +enum netevent_notif_type { + NETEVENT_NEIGH_UPDATE = 1, /* arg is struct neighbour ptr */ + NETEVENT_PMTU_UPDATE, /* arg is struct dst_entry ptr */ + NETEVENT_REDIRECT, /* arg is struct netevent_redirect ptr */ +}; + +extern int register_netevent_notifier(struct notifier_block *nb); +extern int unregister_netevent_notifier(struct notifier_block *nb); +extern int call_netevent_notifiers(unsigned long val, void *v); + +#endif +#endif diff --git 
a/net/core/netevent.c b/net/core/netevent.c new file mode 100644 index 0000000..35d02c3 --- /dev/null +++ b/net/core/netevent.c @@ -0,0 +1,69 @@ +/* + * Network event notifiers + * + * Authors: + * Tom Tucker + * Steve Wise + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + * Fixes: + */ + +#include +#include + +static ATOMIC_NOTIFIER_HEAD(netevent_notif_chain); + +/** + * register_netevent_notifier - register a netevent notifier block + * @nb: notifier + * + * Register a notifier to be called when a netevent occurs. + * The notifier passed is linked into the kernel structures and must + * not be reused until it has been unregistered. A negative errno code + * is returned on a failure. + */ +int register_netevent_notifier(struct notifier_block *nb) +{ + int err; + + err = atomic_notifier_chain_register(&netevent_notif_chain, nb); + return err; +} + +/** + * unregister_netevent_notifier - unregister a netevent notifier block + * @nb: notifier + * + * Unregister a notifier previously registered by + * register_netevent_notifier(). The notifier is unlinked from the + * kernel structures and may then be reused. A negative errno code + * is returned on a failure. + */ + +int unregister_netevent_notifier(struct notifier_block *nb) +{ + return atomic_notifier_chain_unregister(&netevent_notif_chain, nb); +} + +/** + * call_netevent_notifiers - call all netevent notifier blocks + * @val: value passed unmodified to notifier function + * @v: pointer passed unmodified to notifier function + * + * Call all netevent notifier blocks. Parameters and return value + * are as for notifier_call_chain().
+ */ + +int call_netevent_notifiers(unsigned long val, void *v) +{ + return atomic_notifier_call_chain(&netevent_notif_chain, val, v); +} + +EXPORT_SYMBOL_GPL(register_netevent_notifier); +EXPORT_SYMBOL_GPL(unregister_netevent_notifier); +EXPORT_SYMBOL_GPL(call_netevent_notifiers); From swise at opengridcomputing.com Fri Jul 28 11:29:09 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 28 Jul 2006 13:29:09 -0500 Subject: [openib-general] [PATCH Round 5 2/3] Core net changes to generate netevents In-Reply-To: <20060728182849.14325.62898.stgit@stevo-desktop> References: <20060728182849.14325.62898.stgit@stevo-desktop> Message-ID: <20060728182909.14325.2628.stgit@stevo-desktop> Generate netevents for: - neighbour changes - routing redirects - pmtu changes --- net/core/Makefile | 2 +- net/core/neighbour.c | 14 ++++++++------ net/ipv4/route.c | 8 ++++++++ net/ipv6/route.c | 7 +++++++ 4 files changed, 24 insertions(+), 7 deletions(-) diff --git a/net/core/Makefile b/net/core/Makefile index e9bd246..2645ba4 100644 --- a/net/core/Makefile +++ b/net/core/Makefile @@ -7,7 +7,7 @@ obj-y := sock.o request_sock.o skbuff.o obj-$(CONFIG_SYSCTL) += sysctl_net_core.o -obj-y += dev.o ethtool.o dev_mcast.o dst.o \ +obj-y += dev.o ethtool.o dev_mcast.o dst.o netevent.o \ neighbour.o rtnetlink.o utils.o link_watch.o filter.o obj-$(CONFIG_XFRM) += flow.o diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 7ad681f..5130d2e 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -29,6 +29,7 @@ #include #include #include #include +#include #include #include #include @@ -754,6 +755,7 @@ #endif neigh->nud_state = NUD_STALE; neigh->updated = jiffies; neigh_suspect(neigh); + notify = 1; } } else if (state & NUD_DELAY) { if (time_before_eq(now, @@ -762,6 +764,7 @@ #endif neigh->nud_state = NUD_REACHABLE; neigh->updated = jiffies; neigh_connect(neigh); + notify = 1; next = neigh->confirmed + neigh->parms->reachable_time; } else { NEIGH_PRINTK2("neigh %p is 
probed.\n", neigh); @@ -819,6 +822,8 @@ #endif out: write_unlock(&neigh->lock); } + if (notify) + call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh); #ifdef CONFIG_ARPD if (notify && neigh->parms->app_probes) @@ -926,9 +931,7 @@ int neigh_update(struct neighbour *neigh { u8 old; int err; -#ifdef CONFIG_ARPD int notify = 0; -#endif struct net_device *dev; int update_isrouter = 0; @@ -948,9 +951,7 @@ #endif neigh_suspect(neigh); neigh->nud_state = new; err = 0; -#ifdef CONFIG_ARPD notify = old & NUD_VALID; -#endif goto out; } @@ -1022,9 +1023,7 @@ #endif if (!(new & NUD_CONNECTED)) neigh->confirmed = jiffies - (neigh->parms->base_reachable_time << 1); -#ifdef CONFIG_ARPD notify = 1; -#endif } if (new == old) goto out; @@ -1056,6 +1055,9 @@ out: (neigh->flags & ~NTF_ROUTER); } write_unlock_bh(&neigh->lock); + + if (notify) + call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh); #ifdef CONFIG_ARPD if (notify && neigh->parms->app_probes) neigh_app_notify(neigh); diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 2dc6dbb..19bd49d 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -104,6 +104,7 @@ #include #include #include #include +#include #ifdef CONFIG_SYSCTL #include #endif @@ -1125,6 +1126,7 @@ void ip_rt_redirect(u32 old_gw, u32 dadd struct rtable *rth, **rthp; u32 skeys[2] = { saddr, 0 }; int ikeys[2] = { dev->ifindex, 0 }; + struct netevent_redirect netevent; if (!in_dev) return; @@ -1216,6 +1218,11 @@ void ip_rt_redirect(u32 old_gw, u32 dadd rt_drop(rt); goto do_next; } + + netevent.old = &rth->u.dst; + netevent.new = &rt->u.dst; + call_netevent_notifiers(NETEVENT_REDIRECT, + &netevent); rt_del(hash, rth); if (!rt_intern_hash(hash, rt, &rt)) @@ -1452,6 +1459,7 @@ static void ip_rt_update_pmtu(struct dst } dst->metrics[RTAX_MTU-1] = mtu; dst_set_expires(dst, ip_rt_mtu_expires); + call_netevent_notifiers(NETEVENT_PMTU_UPDATE, dst); } } diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 87c39c9..4b16371 100644 --- a/net/ipv6/route.c +++ 
b/net/ipv6/route.c @@ -53,6 +53,7 @@ #include #include #include #include +#include #include @@ -742,6 +743,7 @@ static void ip6_rt_update_pmtu(struct ds dst->metrics[RTAX_FEATURES-1] |= RTAX_FEATURE_ALLFRAG; } dst->metrics[RTAX_MTU-1] = mtu; + call_netevent_notifiers(NETEVENT_PMTU_UPDATE, dst); } } @@ -1155,6 +1157,7 @@ void rt6_redirect(struct in6_addr *dest, struct rt6_info *rt, *nrt = NULL; int strict; struct fib6_node *fn; + struct netevent_redirect netevent; /* * Get the "current" route for this destination and @@ -1252,6 +1255,10 @@ restart: if (ip6_ins_rt(nrt, NULL, NULL, NULL)) goto out; + netevent.old = &rt->u.dst; + netevent.new = &nrt->u.dst; + call_netevent_notifiers(NETEVENT_REDIRECT, &netevent); + if (rt->rt6i_flags&RTF_CACHE) { ip6_del_rt(rt, NULL, NULL, NULL); return; From swise at opengridcomputing.com Fri Jul 28 11:29:20 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 28 Jul 2006 13:29:20 -0500 Subject: [openib-general] [PATCH Round 5 3/3] Cleanup ib_addr module to use the netevents In-Reply-To: <20060728182849.14325.62898.stgit@stevo-desktop> References: <20060728182849.14325.62898.stgit@stevo-desktop> Message-ID: <20060728182919.14325.58582.stgit@stevo-desktop> --- drivers/infiniband/core/addr.c | 30 ++++++++++++++---------------- 1 files changed, 14 insertions(+), 16 deletions(-) diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index d294bbc..1205e80 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -35,6 +35,7 @@ #include #include #include #include +#include #include MODULE_AUTHOR("Sean Hefty"); @@ -326,25 +327,22 @@ void rdma_addr_cancel(struct rdma_dev_ad } EXPORT_SYMBOL(rdma_addr_cancel); -static int addr_arp_recv(struct sk_buff *skb, struct net_device *dev, - struct packet_type *pkt, struct net_device *orig_dev) +static int netevent_callback(struct notifier_block *self, unsigned long event, + void *ctx) { - struct arphdr *arp_hdr; + if (event == 
NETEVENT_NEIGH_UPDATE) { + struct neighbour *neigh = ctx; - arp_hdr = (struct arphdr *) skb->nh.raw; - - if (arp_hdr->ar_op == htons(ARPOP_REQUEST) || - arp_hdr->ar_op == htons(ARPOP_REPLY)) - set_timeout(jiffies); - - kfree_skb(skb); + if (neigh->dev->type == ARPHRD_INFINIBAND && + (neigh->nud_state & NUD_VALID)) { + set_timeout(jiffies); + } + } return 0; } -static struct packet_type addr_arp = { - .type = __constant_htons(ETH_P_ARP), - .func = addr_arp_recv, - .af_packet_priv = (void*) 1, +static struct notifier_block nb = { + .notifier_call = netevent_callback }; static int addr_init(void) @@ -353,13 +351,13 @@ static int addr_init(void) if (!addr_wq) return -ENOMEM; - dev_add_pack(&addr_arp); + register_netevent_notifier(&nb); return 0; } static void addr_cleanup(void) { - dev_remove_pack(&addr_arp); + unregister_netevent_notifier(&nb); destroy_workqueue(addr_wq); } From caitlinb at broadcom.com Fri Jul 28 11:40:55 2006 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Fri, 28 Jul 2006 11:40:55 -0700 Subject: [openib-general] posting send requests in RTR In-Reply-To: Message-ID: <54AD0F12E08D1541B826BE97C98F99F17DC73B@NT-SJCA-0751.brcm.ad.broadcom.com> Rimmer, Todd wrote: >> From: Caitlin Bestler [mailto:caitlinb at broadcom.com] >> >> That assumes that there is any valid reason for an application to >> post send requests before the connection is established. While there >> is clearly a need to post receive work requests before the >> connection is established I cannot think of any reason why an >> application needs to pre-prime the send queue. >> >> Putting unneeded complexity in the definition of a hardware service >> just invites more hardware dependencies and eventual hardware >> specific bugs that will complicate life for application developers. >> "Don't post until the connection is established" is very simple for >> *both* the application and the verbs provider. 
> > Here is a real world example and how we uncovered this issue: > Native IB storage SRP Targets. > > SRP Targets implement the passive side, after processing the > REQ, they send a REP. However the target QP is in RTR. > > The Srp client gets the REP, sends the RTU and announces to > the OS that a device is available. The OS immediately begins issuing > SCSI commands. > > The target receives the SCSI commands (such as Test Unit Ready or > Inquiry) and wants to act on them immediately. However if > the command has passed the RTU or the RTU is lost, the target > is still in RTR. If the command was very simple, the target > may want to answer the query immediately by posting a send > with the response. If the RTU is totally lost and async > event processing is delayed, the target may even be able to > do some processing and still not have the QP in RTS when it > has its response ready. > > While its possible to build an additional queuing point above > the send Q, such queuing points tend to impact performance > and latency for high performance protocols. > > While this example was for SRP, its not unique to SRP. Most > protocols (such as SRP and SDP) include a set of > initialization or capability query messages which the client > may issue to the target immediately after the client believes > it has a connection. Many of those initialization messages > are the types of things which the target can answer > immediately and may even choose to do so in its completion handler. > When the QP is ready to accept postings to the Send Queue it should report Connection Established to the Consumer. Before it has reported that, any attempt by the Consumer to post should return an error. The problem here is that you are informing the Consumer that the connection is established before you are informing the QP itself. 
Well, effectively you are, because the application naturally assumes that the RDMA device cannot successfully complete a receive work request until after the connection is established. And in fact that assumption is correct. I'm not concerned with how IB drivers resolve this issue; what I am concerned about is that this fix does not create any sort of expectation that the RDMA Device MUST accept and queue "early" send requests. From Tim.Snider at engenio.com Fri Jul 28 11:41:57 2006 From: Tim.Snider at engenio.com (Snider, Tim) Date: Fri, 28 Jul 2006 12:41:57 -0600 Subject: [openib-general] [PATCH] RFC: srp filesystem data corruption problem/work-around Message-ID: <18A61515E49B764AB09447A336E51F5661F637@NAMAIL2.ad.lsil.com> This is an oversight. It will be fixed in the next major version of the LSI Logic Storage controller firmware. The OUI after the fix will be A0B8. Timothy Snider Storage Architect Strategic Planning, Technology and Architecture LSI Logic Corporation 3718 North Rock Road Wichita, KS 67226 (316) 636-8736 tim.snider at lsil.com -----Original Message----- From: Roland Dreier [mailto:rdreier at cisco.com] Sent: Wednesday, July 26, 2006 2:05 PM To: Michael S. Tsirkin Cc: Vu Pham; openib-general at openib.org Subject: Re: [PATCH] RFC: srp filesystem data corruption problem/work-around > I'll fix that up to check the OUI. Makes sense. Unfortunately at least the Engenio target I have access to uses the same Mellanox OUI: IO Unit Info: port LID: 0003 port GID: fe800000000000000002c902004000e6 change ID: 0002 max controllers: 0x10 controller[ 1] GUID: 0002c902004000e4 vendor ID: 0002c9 device ID: 005a44 IO class : 0100 ID: LSI Storage Systems SRP Driver 200400a0b80bdd41 service entries: 1 service[ 0]: 200400a0b80bdd41 / SRP.T10:200400A0B80BDD41 but still I think it's better than nothing to only activate the workaround for GUIDs starting with 0002c9. - R.
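A minimal sketch of the OUI check discussed above, assuming the OUI is simply the top 24 bits of the 64-bit controller GUID (as in the 0002c902004000e4 example) and that the GUID is already held as a host-order integer. The names MELLANOX_OUI, guid_oui() and needs_workaround() are hypothetical and are not taken from the SRP initiator code.

```c
#include <stdint.h>

/* Hypothetical constant: the Mellanox OUI quoted in the thread (0002c9). */
#define MELLANOX_OUI 0x0002c9u

/* Extract the OUI, assumed here to be the top 24 bits of a 64-bit GUID. */
static inline uint32_t guid_oui(uint64_t guid)
{
	return (uint32_t)(guid >> 40);
}

/* Return 1 if the workaround would apply to this target GUID, else 0. */
static inline int needs_workaround(uint64_t guid)
{
	return guid_oui(guid) == MELLANOX_OUI;
}
```

With this check, the controller GUID shown in the IO Unit Info dump would still trigger the workaround, which matches the caveat above that current LSI/Engenio targets are caught as well.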
From mshefty at ichips.intel.com Fri Jul 28 12:19:49 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 28 Jul 2006 12:19:49 -0700 Subject: [openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM In-Reply-To: <20060727221354.GC17234@mellanox.co.il> References: <000001c6b1c3$1a5672d0$8fc8180a@amr.corp.intel.com> <20060727221354.GC17234@mellanox.co.il> Message-ID: <44CA6355.2070607@ichips.intel.com> Michael S. Tsirkin wrote: > Neither am I, but still I'm a bit uncomfortable - after all, > the spec explicitly says: > > "it should be pointed out that successful reception of the final message > in a CM protocol message exchange cannot be guaranteed." The patch handles this by routing the communication establish event directly to the IB CM for processing, which will force the connection in the established state and generate a connection event to the user. The requirement for RDMA CM users is that they must wait for an RDMA_CM_EVENT_ESTABLISHED event before posting sends. Are we happy with this requirement? The effect on the user is that either the receive processing must be deferred or sends must be queued until such an event occurs. The alternative is to add an rdma_establish() call that a user may call after polling their first receive completion. - Sean From rdreier at cisco.com Fri Jul 28 12:27:59 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 28 Jul 2006 12:27:59 -0700 Subject: [openib-general] [PATCH 1/6] libibverbs include files changes. In-Reply-To: <44CA438A.4050603@ichips.intel.com> (Sean Hefty's message of "Fri, 28 Jul 2006 10:04:10 -0700") References: <20060728045001.9426.85822.sendpatchset@K50wks273950wss.in.ibm.com> <20060728045008.9426.53690.sendpatchset@K50wks273950wss.in.ibm.com> <44CA438A.4050603@ichips.intel.com> Message-ID: > Why not just do: > > #define rdma_structure_name ibv_structure_name > > Then we simply remove the #define and replace ibv_structure_name with > rdma_structure_name when ready. 
This better shows the changes, prevents > duplicating every structure, and should avoid any compile warnings. I agree, except let's make rdma_structure_name the real name, and do the define the opposite way. That way it's easier to remove the compatibility stuff later. - R. From trimmer at silverstorm.com Fri Jul 28 12:28:54 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Fri, 28 Jul 2006 15:28:54 -0400 Subject: [openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM In-Reply-To: <44CA6355.2070607@ichips.intel.com> Message-ID: > From: Sean Hefty > > Michael S. Tsirkin wrote: > > Neither am I, but still I'm a bit uncomfortable - after all, > > the spec explicitly says: > > > > "it should be pointed out that successful reception of the final message > > in a CM protocol message exchange cannot be guaranteed." > > The patch handles this by routing the communication establish event > directly to > the IB CM for processing, which will force the connection in the > established > state and generate a connection event to the user. > > The requirement for RDMA CM users is that they must wait for an > RDMA_CM_EVENT_ESTABLISHED event before posting sends. Are we happy with > this > requirement? > > The effect on the user is that either the receive processing must be > deferred or > sends must be queued until such an event occurs. The alternative is to > add an > rdma_establish() call that a user may call after polling their first > receive > completion. > An rdma_establish call would not solve the issue. The problem is that the CQ callbacks do not run in a preemptible context. The rdma_establish call would need to move the QP to RTS, which requires a preemptible context. Todd Rimmer From rdreier at cisco.com Fri Jul 28 12:29:12 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 28 Jul 2006 12:29:12 -0700 Subject: [openib-general] [PATCH 1/6] libibverbs include files changes.
In-Reply-To: <44CA448E.4040603@ichips.intel.com> (Sean Hefty's message of "Fri, 28 Jul 2006 10:08:30 -0700") References: <20060728045001.9426.85822.sendpatchset@K50wks273950wss.in.ibm.com> <20060728045008.9426.53690.sendpatchset@K50wks273950wss.in.ibm.com> <44CA448E.4040603@ichips.intel.com> Message-ID: > I realize why you did this, but adding the 'v' to these two calls makes them > inconsistent with all the other calls in the API. I don't know what the right > approach is here, but it may be better to have the RDMA CM use different names. Maybe the best approach is to use rdmav_ to replace ibv_. - R. From swise at opengridcomputing.com Fri Jul 28 12:35:59 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 28 Jul 2006 14:35:59 -0500 Subject: [openib-general] [PATCH 1/6] libibverbs include files changes. In-Reply-To: References: <20060728045001.9426.85822.sendpatchset@K50wks273950wss.in.ibm.com> <20060728045008.9426.53690.sendpatchset@K50wks273950wss.in.ibm.com> <44CA438A.4050603@ichips.intel.com> Message-ID: <1154115359.7239.51.camel@stevo-desktop> On Fri, 2006-07-28 at 12:27 -0700, Roland Dreier wrote: > > Why not just do: > > > > #define rdma_structure_name ibv_structure_name > > > > Then we simply remove the #define and replace ibv_structure_name with > > rdma_structure_name when ready. This better shows the changes, prevents > > duplicating every structure, and should avoid any compile warnings. > > I agree, except let's make rdma_structure_name the real name, and do > the define the opposite way. That way it's easier to remove the > compatibility stuff later. > > - R. Uh, won't that break binary compatibility? Or is that not a requirement? IE: If you make the ibv_* names #defines, then all apps will have to recompile to use the new lib. If you leave them as the real names, then apps can still link to the new lib and run ok...i think... 
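A minimal sketch of the naming direction Roland suggests above: the rdma_ name is the real structure tag, and the old ibv_ spelling survives only as a compatibility #define. Every identifier here (rdma_example_attr, ibv_example_attr, rdma_example_sum, legacy_caller) is made up for illustration and is not a real libibverbs name. Note that in C a structure tag never appears in the compiled object; only the layout and the exported function names do.

```c
/* Hypothetical structure; rdma_example_attr is not a real libibverbs type. */
struct rdma_example_attr {
	int a;
	int b;
};

/* Compatibility alias: the old ibv_ spelling maps onto the new real name. */
#define ibv_example_attr rdma_example_attr

/* Library entry point; only symbol names like this one are exported. */
int rdma_example_sum(const struct rdma_example_attr *attr)
{
	return attr->a + attr->b;
}

/* Legacy caller, still written against the old ibv_ spelling. */
int legacy_caller(void)
{
	struct ibv_example_attr attr = { 1, 2 };

	return rdma_example_sum(&attr);
}
```

Because the alias is resolved by the preprocessor, removing the compatibility layer later would only mean deleting the #define once callers have switched spellings.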
From rdreier at cisco.com Fri Jul 28 13:21:36 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 28 Jul 2006 13:21:36 -0700 Subject: [openib-general] [PATCH 1/6] libibverbs include files changes. In-Reply-To: <1154115359.7239.51.camel@stevo-desktop> (Steve Wise's message of "Fri, 28 Jul 2006 14:35:59 -0500") References: <20060728045001.9426.85822.sendpatchset@K50wks273950wss.in.ibm.com> <20060728045008.9426.53690.sendpatchset@K50wks273950wss.in.ibm.com> <44CA438A.4050603@ichips.intel.com> <1154115359.7239.51.camel@stevo-desktop> Message-ID: Steve> Uh, won't that break binary compatibility? Or is that not Steve> a requirement? IE: If you make the ibv_* names #defines, Steve> then all apps will have to recompile to use the new lib. Steve> If you leave them as the real names, then apps can still Steve> link to the new lib and run ok...i think... I don't think the names of structures matter for ABI compatibility -- if I have a header with struct foo { int a; int b; }; void blah(struct foo *z); then a binary compiled against that will still work even if I later change the name of struct foo so I have: struct bar { int c; int d; }; void blah(struct bar *z); after all, the layouts of struct foo and struct bar are the same. - R. From swise at opengridcomputing.com Fri Jul 28 13:29:26 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 28 Jul 2006 15:29:26 -0500 Subject: [openib-general] [PATCH 1/6] libibverbs include files changes. In-Reply-To: References: <20060728045001.9426.85822.sendpatchset@K50wks273950wss.in.ibm.com> <20060728045008.9426.53690.sendpatchset@K50wks273950wss.in.ibm.com> <44CA438A.4050603@ichips.intel.com> <1154115359.7239.51.camel@stevo-desktop> Message-ID: <1154118566.7239.56.camel@stevo-desktop> you're right...it's Friday... ;-) On Fri, 2006-07-28 at 13:21 -0700, Roland Dreier wrote: > Steve> Uh, won't that break binary compatibility? Or is that not > Steve> a requirement?
IE: If you make the ibv_* names #defines, > Steve> then all apps will have to recompile to use the new lib. > Steve> If you leave them as the real names, then apps can still > Steve> link to the new lib and run ok...i think... > > I don't think the names of structures matters for ABI compatibility -- > if I have a header with > > struct foo { > int a; > int b; > }; > > void blah(struct foo *z); > > then a binary compiled against that will still work even if I later > change the name of struct foo so I have: > > stuct bar { > int c; > int d; > }; > > void blah(struct bar *z); > > after all the layout of struct foo and struct bar are the same. > > - R. From rdreier at cisco.com Fri Jul 28 13:29:07 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 28 Jul 2006 13:29:07 -0700 Subject: [openib-general] [PATCH] RFC: srp filesystem data corruption problem/work-around In-Reply-To: <18A61515E49B764AB09447A336E51F5661F637@NAMAIL2.ad.lsil.com> (Tim Snider's message of "Fri, 28 Jul 2006 12:41:57 -0600") References: <18A61515E49B764AB09447A336E51F5661F637@NAMAIL2.ad.lsil.com> Message-ID: > This is an oversight. It will be fixed in the next major version of the > LSI Logic Storage controller firmware. The OUI after the fix will be > A0B8. OK great. So let's respin this patch so it defaults on but only triggers for targets with the Mellanox OUI -- that will catch current LSI targets but I think that's acceptable. - R. From rdreier at cisco.com Fri Jul 28 15:20:31 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 28 Jul 2006 15:20:31 -0700 Subject: [openib-general] 2 patches in mst-for-2.6.18 In-Reply-To: <20060725170643.GP8795@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 25 Jul 2006 20:06:43 +0300") References: <20060725170643.GP8795@mellanox.co.il> Message-ID: OK, I applied these patches. 
From rdreier at cisco.com Fri Jul 28 15:20:20 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 28 Jul 2006 15:20:20 -0700 Subject: [openib-general] [PATCH] IB/mthca: fix mthca_array_clear thinko In-Reply-To: <20060726155750.GF9411@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 26 Jul 2006 18:57:51 +0300") References: <20060726155750.GF9411@mellanox.co.il> Message-ID: Thanks, applied. From ardavis at ichips.intel.com Fri Jul 28 15:37:50 2006 From: ardavis at ichips.intel.com (Arlin Davis) Date: Fri, 28 Jul 2006 15:37:50 -0700 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 planning meeting - summary In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA74FE@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA74FE@mtlexch01.mtl.com> Message-ID: <44CA91BE.5050301@ichips.intel.com> Tziporet Koren wrote: > Hi all, > > This is the outcome of the meeting we had today regarding OFED 1.1 > schedule and features. > > > > Can we include librdmacm and dapl in the basic installation option? Also, it would be nice to have rdma_ucm and rdma_cm load on boot by default. Thanks, -arlin From greg.lindahl at qlogic.com Fri Jul 28 19:14:33 2006 From: greg.lindahl at qlogic.com (Greg Lindahl) Date: Fri, 28 Jul 2006 19:14:33 -0700 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. In-Reply-To: <20060728045001.9426.85822.sendpatchset@K50wks273950wss.in.ibm.com> References: <20060728045001.9426.85822.sendpatchset@K50wks273950wss.in.ibm.com> Message-ID: <20060729021433.GD5936@greglaptop.internal.keyresearch.com> > This patchset is a proposal to create new API's and data structures with > transport neutral names. We named ourselves OpenFabrics instead of OpenRDMA for a reason, did I miss some point where we decided that we would use RDMA as a transport neutral name in the source code? -- greg From mst at mellanox.co.il Sat Jul 29 14:55:09 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Sun, 30 Jul 2006 00:55:09 +0300 Subject: [openib-general] posting send requests in RTR In-Reply-To: References: Message-ID: <20060729215509.GB10158@mellanox.co.il> Quoting r. Rimmer, Todd : > The target receives the SCSI commands (such as Test Unit Ready or > Inquiry) and wants to act on them immediately. However if the command > has passed the RTU or the RTU is lost, the target is still in RTR. If > the command was very simple, the target may want to answer the query > immediately by posting a send with the response. Since the response won't go out until the QP is in RTS anyway, why is it important to post the send immediately? The simplest approach for you is to avoid polling the CQ until the QP is in RTS. -- MST From ftillier at silverstorm.com Sat Jul 29 18:55:13 2006 From: ftillier at silverstorm.com (Fabian Tillier) Date: Sat, 29 Jul 2006 18:55:13 -0700 Subject: [openib-general] posting send requests in RTR In-Reply-To: <20060729215509.GB10158@mellanox.co.il> References: <20060729215509.GB10158@mellanox.co.il> Message-ID: <79ae2f320607291855v74c5596btae23d46851b6e867@mail.gmail.com> On 7/29/06, Michael S. Tsirkin wrote: > Quoting r. Rimmer, Todd : > > The target receives the SCSI commands (such as Test Unit Ready or > > Inquiry) and wants to act on them immediately. However if the command > > has passed the RTU or the RTU is lost, the target is still in RTR. If > > the command was very simple, the target may want to answer the query > > immediately by posting a send with the response. > > Since the response won't go out until the QP is in RTS anyway, why is it important > to post the send immediately? The simplest approach for you is to avoid > polling the CQ until the QP is in RTS. Which only works if you have a CQ per QP. If the CQ happens to be shared, you're SOL.
- Fab From ogerlitz at voltaire.com Sat Jul 29 22:55:57 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 30 Jul 2006 08:55:57 +0300 Subject: [openib-general] some CM/CMA clarfications/questions related to listener impl. In-Reply-To: <000201c6b195$35d36e10$1ad4180a@amr.corp.intel.com> References: <000201c6b195$35d36e10$1ad4180a@amr.corp.intel.com> Message-ID: <44CC49ED.8090804@voltaire.com> Sean Hefty wrote: >> Does the the code below is what needs to be done by a CMA >> consumer to listen on a SID based on over >> all the active IB ports of this node? > Your code should be sufficient to listen across all RDMA devices. OK, thanks for confirming that. >> What's most important for me now is to understand that the code >> below does what i expect, however looking in ib_cm.h and the cma >> code while coding the example, one point that i run into was the >> granularity the IB CM allows consumers "bind" objects to. > The binding is unique system wide if an IP address of 0 is > specified, so the generated SID will also be unique system wide. OK >> I see that CM IDs are bounded to **device** and not to >> couple? if indeed, what was the rational? > Basically, the implementation was easier, and I didn't see a need to track > listens per port. OK, given the below info it makes sense >> It means that we don't support a config of a node connected to >> two IP subnets each over a different port of the same device, >> and two processes, each attempting to listen over the same port, >> but with different ip. > As long as the processes bind to different IP addresses, this is supported. > Rdma_bind() allows binding to the same port if the IP addresses are different. > Rdma_listen() will map both requests to the same SID, but distinguishes them > using the private data carried in the REQ. wow! i was not aware to that, thanks Or. 
From ogerlitz at voltaire.com Sat Jul 29 23:26:40 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 30 Jul 2006 09:26:40 +0300 Subject: [openib-general] [PATCH 3/4] IB CM: register and handle COMM_EST events on a QP In-Reply-To: <000101c6b191$4abcf6b0$1ad4180a@amr.corp.intel.com> References: <000101c6b191$4abcf6b0$1ad4180a@amr.corp.intel.com> Message-ID: <44CC5120.6090107@voltaire.com> Sean Hefty wrote: >> This is done also for the ib_send_cm_req case but you agreed it can be >> removed from there also for the peer to peer case. > > To clarify, it can be removed from ib_send_cm_req() if we only care about this > handling the COMM_EST event. If we want to track local QPNs as part of > timewait, or want to expand the code to handle path migration, we need it in > both places. OK, make sense, thanks for clarifying it. Or. From ogerlitz at voltaire.com Sat Jul 29 23:30:19 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 30 Jul 2006 09:30:19 +0300 Subject: [openib-general] controlling IPoIB debug In-Reply-To: References: <44C85315.4030300@voltaire.com> Message-ID: <44CC51FB.20506@voltaire.com> Roland Dreier wrote: > > OK, this makes sense. However, I could not find the way to set > > CONFIG_EMBEDDED, can you educate me how to do it? ... thanks. > > make menuconfig > > then hit / > > type embedded and hit enter OK, thanks Or. 
From jackm at mellanox.co.il Sun Jul 30 00:05:31 2006 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Sun, 30 Jul 2006 10:05:31 +0300 Subject: [openib-general] APM: QP migration state change when failover triggered by hw In-Reply-To: <44B55981.6040408@3leafnetworks.com> References: <44B55981.6040408@3leafnetworks.com> Message-ID: <200607301005.32499.jackm@mellanox.co.il> On Wednesday 12 July 2006 23:20, Venkatesh Babu wrote: > > With OFED 1.0, when cable is removed from the port corresponding to the > primary path, CI sends an event IB_EVENT_PATH_MIG, but is not changeing > the state to "Migrated" and not migrating to the alternate path. So the > traffic doesn't resume on the alternate path. > Could you please describe your flow in more detail (including setup phase), and, if possible, send us a small test program which illustrates your problem? Thanks! - Jack From ogerlitz at voltaire.com Sun Jul 30 00:10:06 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 30 Jul 2006 10:10:06 +0300 Subject: [openib-general] posting send requests in RTR In-Reply-To: <000101c6b1c5$9e4c1de0$8fc8180a@amr.corp.intel.com> References: <000101c6b1c5$9e4c1de0$8fc8180a@amr.corp.intel.com> Message-ID: <44CC5B4E.3030503@voltaire.com> Sean Hefty wrote: >> That assumes that there is any valid reason for an application to >> post send requests before the connection is established. While there >> is clearly a need to post receive work requests before the connection >> is established I cannot think of any reason why an application needs >> to pre-prime the send queue. > > It's not pre-priming the send queue. An application could pull a completed > receive work completion off of a CQ. The receive may very well be a request > that requires a response. At this point, the connection is obviously > established from the consumers viewpoint, but not necessarily by the viewpoint > of the RDMA CM or IB CM. The response must now be queued. 
> > I believe that the problem can be limited under the following application > conditions: > > 1. The application uses the CQ with different QPs. > 2. The application is on the passive (server) side of the connection. > 3. The active (client) side sends a request to the server. > > Even combined these conditions could easily occur. > > IMO, the question is do we pass this problem to the applications to deal with, > or try to handle transparently it under verbs. If we try to handle it under > verbs, can it be done in one place? How much would such checks affects the > performance of post send operations? And how would immediate or other errors be > handled when posting queued sends? > > My personal take at the moment is to let the ULPs handle the problem. My personal take is as of Sean, move it to the ULP, it would be nice to have some document educating ULP programmers about in-nature IB races such eg RX before ESTABLISHED, RX after DISCONNECTED whatever. Its wrong (overdoing, asking for bugs, you named it) adding a requirement for hw drivers to be able to serve post TX-es while QP is in RTR, as Sean pointed how do you return an instant error while posting the queued TX in this case??? how do you flush the queued TX if the QP gets from RTR into ERROR state??? i guess more minds can produce more error schemes which are hell to take care of. Or. From tziporet at mellanox.co.il Sun Jul 30 02:15:17 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Sun, 30 Jul 2006 12:15:17 +0300 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 planning meeting - summary In-Reply-To: <44CA91BE.5050301@ichips.intel.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA74FE@mtlexch01.mtl.com> <44CA91BE.5050301@ichips.intel.com> Message-ID: <44CC78A5.3030104@mellanox.co.il> Arlin Davis wrote: > Can we include librdmacm and dapl in the basic installation option? > Also, it would be nice to have rdma_ucm and rdma_cm load on boot by > default. 
>
> Thanks,
>
> -arlin

This is a small change in the OFED scripts.
I suggest that if we go for this change, we do it for the HPC install
and not for the basic install (which includes only the verbs and IPoIB).

If there is no objection from anyone we will go for this change.

Tziporet

From tziporet at mellanox.co.il Sun Jul 30 02:20:23 2006
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Sun, 30 Jul 2006 12:20:23 +0300
Subject: [openib-general] OFED 1.0 - Official Release (Tziporet Koren)
In-Reply-To: <20060728071952.6646.qmail@web36905.mail.mud.yahoo.com>
References: <20060728071952.6646.qmail@web36905.mail.mud.yahoo.com>
Message-ID: <44CC79D7.3030805@mellanox.co.il>

zhu shi song wrote:
> Dear Sir,
> what's your progress on sdp connection? I'm waiting
> for the new release to test.
> tks
> zhu
>

Progress is very good, and we succeeded in running Polygraph with 800
connections for a few days.

RC1 is expected this week so you will be able to test it yourself.

Tziporet

From moshek at voltaire.com Sun Jul 30 04:25:50 2006
From: moshek at voltaire.com (Moshe Kazir)
Date: Sun, 30 Jul 2006 14:25:50 +0300
Subject: [openib-general] [openfabrics-ewg] new user level branch for OFED 1.1
Message-ID:

Does the new planned OFED 1.1 resolve the open 64 bit compilation
problems over PPC 64 ?

Moshe

____________________________________________________________
Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m)
Voltaire - The Grid Backbone
www.voltaire.com

-----Original Message-----
From: openfabrics-ewg-bounces at openib.org [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Tziporet Koren
Sent: Wednesday, July 26, 2006 5:42 PM
To: EWG; OPENIB
Subject: [openfabrics-ewg] new user level branch for OFED 1.1

Hi All,

Toward the OFED 1.1 release I have created the 1.1 branch:
https://openib.org/svn/gen2/branches/1.1/

This branch includes the src/userspace/ based on trunk r8680, and all
the other OFED stuff.

Tziporet

_______________________________________________
openfabrics-ewg mailing list
openfabrics-ewg at openib.org
http://openib.org/mailman/listinfo/openfabrics-ewg

From tziporet at mellanox.co.il Sun Jul 30 05:50:06 2006
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Sun, 30 Jul 2006 15:50:06 +0300
Subject: [openib-general] [openfabrics-ewg] new user level branch for OFED 1.1
In-Reply-To:
References:
Message-ID: <44CCAAFE.3030002@mellanox.co.il>

Moshe Kazir wrote:
> Does the new planned OFED 1.1 resolve the open 64 bit compilation
> problems over PPC 64 ?
>
> Moshe
>
>
>

Which issues are you referring to?

Tziporet

From moshek at voltaire.com Sun Jul 30 06:11:52 2006
From: moshek at voltaire.com (Moshe Kazir)
Date: Sun, 30 Jul 2006 16:11:52 +0300
Subject: [openib-general] [openfabrics-ewg] new user level branch for OFED 1.1
Message-ID:

Please look at Bugzilla bugs: 169, 170, 171, 174

Best regards,

Moshe

____________________________________________________________
Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m)
Voltaire - The Grid Backbone
www.voltaire.com

-----Original Message-----
From: Tziporet Koren [mailto:tziporet at mellanox.co.il]
Sent: Sunday, July 30, 2006 3:50 PM
To: Moshe Kazir
Cc: EWG; OPENIB
Subject: Re: [openfabrics-ewg] new user level branch for OFED 1.1

Moshe Kazir wrote:
> Does the new planned OFED 1.1 resolve the open 64 bit compilation
> problems over PPC 64 ?
>
> Moshe
>
>
>

Which issues are you referring to?

Tziporet

From sean.hefty at intel.com Sun Jul 30 12:24:21 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Sun, 30 Jul 2006 12:24:21 -0700
Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal.
In-Reply-To: <20060729021433.GD5936@greglaptop.internal.keyresearch.com>
Message-ID: <000001c6b40d$c2b7ae70$cdcc180a@amr.corp.intel.com>

>We named ourselves OpenFabrics instead of OpenRDMA for a reason

Wasn't OpenRDMA already taken?
- Sean

From panda at cse.ohio-state.edu Sun Jul 30 20:05:40 2006
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Sun, 30 Jul 2006 23:05:40 -0400 (EDT)
Subject: [openib-general] Announcing the release of MVAPICH 0.9.8 with on-demand connection management, fault-tolerance and advanced multi-rail scheduling support
Message-ID: <200607310305.k6V35eem007313@xi.cse.ohio-state.edu>

The MVAPICH team is pleased to announce the availability of MVAPICH
0.9.8 with the following new features:

  - On-demand connection management using native InfiniBand
    Unreliable Datagram (UD) support. This feature enables InfiniBand
    connections to be set up dynamically and has `near constant'
    memory usage with increasing number of processes.
    This feature together with the Shared Receive Queue (SRQ)
    feature (available since MVAPICH 0.9.7) enhances the
    scalability of MVAPICH on multi-thousand node clusters.

    Performance of applications and memory scalability using
    on-demand connection management and SRQ support can be
    seen by visiting the following URL:

    http://nowlab.cse.ohio-state.edu/projects/mpi-iba/perf-apps.html

  - Support for Fault Tolerance: Mem-to-mem reliable data transfer
    (detection of I/O bus error with 32bit CRC and retransmission
    in case of error). This mode enables MVAPICH to deliver messages
    reliably in presence of I/O bus errors.

  - Multi-rail communication support with flexible scheduling policies:
      - Separate control of small and large message scheduling
      - Three different scheduling policies for small messages:
          - Using First Subchannel, Round Robin and Process Binding
      - Six different scheduling policies for large messages:
          - Round Robin, Weighted striping, Even striping,
            Stripe Blocking, Adaptive Striping and Process Binding

  - Shared library support for Solaris

  - Integrated and easy-to-use build script which automatically
    detects system architecture and InfiniBand adapter types and
    optimizes MVAPICH for any particular installation

More details on all features and supported platforms can be obtained
by visiting the following URL:

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/mvapich_features.html

MVAPICH 0.9.8 continues to deliver excellent performance. Sample
performance numbers include:

  - OpenIB/Gen2 on EM64T with PCI-Ex and IBA-DDR:
      - 2.93 microsec one-way latency (4 bytes)
      - 1471 MB/sec unidirectional bandwidth
      - 2678 MB/sec bidirectional bandwidth

  - OpenIB/Gen2 on EM64T with PCI-Ex and IBA-DDR (dual-rail):
      - 2534 MB/sec unidirectional bandwidth
      - 3003 MB/sec bidirectional bandwidth

  - OpenIB/Gen2 on Opteron with PCI-Ex and IBA-DDR:
      - 2.65 microsec one-way latency (4 bytes)
      - 1399 MB/sec unidirectional bandwidth
      - 2253 MB/sec bidirectional bandwidth

  - Solaris uDAPL/IBTL on Opteron with PCI-Ex and IBA-SDR:
      - 3.86 microsec one-way latency (4 bytes)
      - 981 MB/sec unidirectional bandwidth
      - 1856 MB/sec bidirectional bandwidth

  - OpenIB/Gen2 uDAPL on EM64T with PCI-Ex and IBA-SDR:
      - 3.80 microsec one-way latency (4 bytes)
      - 963 MB/sec unidirectional bandwidth
      - 1851 MB/sec bidirectional bandwidth

  - OpenIB/Gen2 uDAPL on Opteron with PCI-Ex and IBA-DDR:
      - 2.81 microsec one-way latency (4 bytes)
      - 1411 MB/sec unidirectional bandwidth
      - 2252 MB/sec bidirectional bandwidth

Performance numbers for all other platforms, system configurations and
operations can be viewed by visiting the `Performance' section of the
project's web page.

For downloading the MVAPICH 0.9.8 package and accessing the anonymous
SVN, please visit the following URL:

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/

A stripped down version of this release is also available at the
OpenIB SVN.

All feedback, including bug reports, hints for performance tuning,
patches and enhancements, is welcome. Please post it to the
mvapich-discuss mailing list.

Thanks,

MVAPICH Team at OSU/NBCL

======================================================================
MVAPICH/MVAPICH2 project is currently supported with funding from
U.S. National Science Foundation, U.S. DOE Office of Science,
Mellanox, Intel, Cisco Systems, Sun Microsystems and Linux Networx;
and with equipment support from Advanced Clustering, AMD, Apple,
Appro, Dell, IBM, Intel, Mellanox, Microway, PathScale, SilverStorm
and Sun Microsystems. Other technology partners include Etnus.
======================================================================

From davem at davemloft.net Sun Jul 30 20:45:31 2006
From: davem at davemloft.net (David Miller)
Date: Sun, 30 Jul 2006 20:45:31 -0700 (PDT)
Subject: [openib-general] [PATCH Round 5 0/3] Network Event Notifier Mechanism
In-Reply-To: <20060728182849.14325.62898.stgit@stevo-desktop>
References: <20060728182849.14325.62898.stgit@stevo-desktop>
Message-ID: <20060730.204531.101497100.davem@davemloft.net>

From: Steve Wise
Date: Fri, 28 Jul 2006 13:28:49 -0500

> Dave/Roland, is this patchset about ready to go?

All 3 patches applied, thanks Steve.

From zhushisongzhu at yahoo.com Mon Jul 31 00:39:35 2006
From: zhushisongzhu at yahoo.com (zhu shi song)
Date: Mon, 31 Jul 2006 00:39:35 -0700 (PDT)
Subject: [openib-general] OFED 1.0 - Official Release (Tziporet Koren)
In-Reply-To: <44CC79D7.3030805@mellanox.co.il>
Message-ID: <20060731073935.6825.qmail@web36907.mail.mud.yahoo.com>

tks
zhu

--- Tziporet Koren wrote:

>
>
> zhu shi song wrote:
> > Dear Sir,
> > what's your progress on sdp connection? I'm waiting
> > for the new release to test.
> > tks
> > zhu
> >
>
> Progress is very good, and we succeeded in running Polygraph with 800
> connections for a few days.
>
> RC1 is expected this week so you will be able to
> test it yourself
>
> Tziporet
>

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

From yann.kalemkarian at bull.net Mon Jul 31 01:17:29 2006
From: yann.kalemkarian at bull.net (Yann K.)
Date: Mon, 31 Jul 2006 10:17:29 +0200
Subject: [openib-general] [PROTOCOL] IB stack forcing synchronous exchanges
Message-ID: <44CDBC99.3090005@bull.net>

Hello everybody,

Currently we are using an external MPI code forcing synchronous
exchanges, using the MP_EAGER_LIMIT or the LIBELAN_TPORT_BIGMSG flags at
the network communication library level.

After going through the OFED 1.0.1 code (+ doc) a bit, I did not find
anything related to my problem. Has anybody had the same problem and
found a solution?

Thanks in advance.

Yann K.

From krkumar2 at in.ibm.com Mon Jul 31 02:14:46 2006
From: krkumar2 at in.ibm.com (Krishna Kumar2)
Date: Mon, 31 Jul 2006 14:44:46 +0530
Subject: [openib-general] [PATCH 1/6] libibverbs include files changes.
In-Reply-To:
Message-ID:

Hi Sean & Roland,

Thanks for your review comments. I will respond to all comments in this
mail to make it easy :)

A.
-------
> > Why not just do:
> >
> > #define rdma_structure_name ibv_structure_name
> >
> > Then we simply remove the #define and replace ibv_structure_name with
> > rdma_structure_name when ready. This better shows the changes, prevents
> > duplicating every structure, and should avoid any compile warnings.
>
> I agree, except let's make rdma_structure_name the real name, and do
> the define the opposite way. That way it's easier to remove the
> compatibility stuff later.

I had tried this some time back (with amso driver actually) and found it
had some issues.
Eg, some data structures have the same name as API routines. I did :

"cscope -1 -L" on all function names and found 27 functions that also
clashed with structure names. The full list is :

********* Problem with ibv_query_device **********
include/infiniband/kern-abi.h ibv_query_device 146 struct ibv_query_device {
src/verbs.c ibv_query_device 81 int ibv_query_device(struct ibv_context *context
********* Problem with ibv_query_port **********
include/infiniband/kern-abi.h ibv_query_port 198 struct ibv_query_port {
src/verbs.c ibv_query_port 87 int ibv_query_port(struct ibv_context *context
********* Problem with ibv_alloc_pd **********
include/infiniband/kern-abi.h ibv_alloc_pd 231 struct ibv_alloc_pd {
src/verbs.c ibv_alloc_pd 137 struct ibv_pd *ibv_alloc_pd(struct ibv_context *
********* Problem with ibv_dealloc_pd **********
include/infiniband/kern-abi.h ibv_dealloc_pd 243 struct ibv_dealloc_pd {
src/verbs.c ibv_dealloc_pd 148 int ibv_dealloc_pd(struct ibv_pd *pd)
********* Problem with ibv_reg_mr **********
include/infiniband/kern-abi.h ibv_reg_mr 250 struct ibv_reg_mr {
src/verbs.c ibv_reg_mr 153 struct ibv_mr *ibv_reg_mr(struct ibv_pd *pd,
********* Problem with ibv_dereg_mr **********
include/infiniband/kern-abi.h ibv_dereg_mr 269 struct ibv_dereg_mr {
src/verbs.c ibv_dereg_mr 167 int ibv_dereg_mr(struct ibv_mr *mr)
********* Problem with ibv_create_comp_channel **********
include/infiniband/kern-abi.h ibv_create_comp_channel 276 struct ibv_create_comp_channel {
src/verbs.c ibv_create_comp_channel 190 struct ibv_comp_channel *ibv_create_comp_channel(struct ibv_context *context)
********* Problem with ibv_create_cq **********
include/infiniband/kern-abi.h ibv_create_cq 287 struct ibv_create_cq {
src/verbs.c ibv_create_cq 232 struct ibv_cq *ibv_create_cq(struct ibv_context *context, int cqe, void *cq_context,
********* Problem with ibv_resize_cq **********
include/infiniband/kern-abi.h ibv_resize_cq 346 struct ibv_resize_cq {
src/verbs.c ibv_resize_cq 250 int ibv_resize_cq(struct ibv_cq *cq, int cqe)
********* Problem with ibv_destroy_cq **********
include/infiniband/kern-abi.h ibv_destroy_cq 360 struct ibv_destroy_cq {
src/verbs.c ibv_destroy_cq 258 int ibv_destroy_cq(struct ibv_cq *cq)
********* Problem with ibv_poll_cq **********
include/infiniband/kern-abi.h ibv_poll_cq 323 struct ibv_poll_cq {
include/infiniband/verbs.h ibv_poll_cq 831 static inline int ibv_poll_cq(struct ibv_cq *cq, int num_entries, struct ibv_wc *wc)
********* Problem with ibv_req_notify_cq **********
include/infiniband/kern-abi.h ibv_req_notify_cq 338 struct ibv_req_notify_cq {
include/infiniband/verbs.h ibv_req_notify_cq 845 static inline int ibv_req_notify_cq(struct ibv_cq *cq, int solicited_only)
********* Problem with ibv_create_srq **********
include/infiniband/kern-abi.h ibv_create_srq 696 struct ibv_create_srq {
src/verbs.c ibv_create_srq 289 struct ibv_srq *ibv_create_srq(struct ibv_pd *pd,
********* Problem with ibv_modify_srq **********
include/infiniband/kern-abi.h ibv_modify_srq 716 struct ibv_modify_srq {
src/verbs.c ibv_modify_srq 310 int ibv_modify_srq(struct ibv_srq *srq,
********* Problem with ibv_query_srq **********
include/infiniband/kern-abi.h ibv_query_srq 727 struct ibv_query_srq {
src/verbs.c ibv_query_srq 317 int ibv_query_srq(struct ibv_srq *srq, struct ibv_srq_attr *srq_attr)
********* Problem with ibv_destroy_srq **********
include/infiniband/kern-abi.h ibv_destroy_srq 744 struct ibv_destroy_srq {
src/verbs.c ibv_destroy_srq 322 int ibv_destroy_srq(struct ibv_srq *srq)
********* Problem with ibv_post_srq_recv **********
include/infiniband/kern-abi.h ibv_post_srq_recv 636 struct ibv_post_srq_recv {
include/infiniband/verbs.h ibv_post_srq_recv 901 static inline int ibv_post_srq_recv(struct ibv_srq *srq,
********* Problem with ibv_create_qp **********
include/infiniband/kern-abi.h ibv_create_qp 432 struct ibv_create_qp {
src/verbs.c ibv_create_qp 327 struct ibv_qp *ibv_create_qp(struct ibv_pd *pd,
********* Problem with ibv_modify_qp **********
include/infiniband/kern-abi.h ibv_modify_qp 524 struct ibv_modify_qp {
src/verbs.c ibv_modify_qp 364 int ibv_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr,
********* Problem with ibv_query_qp **********
include/infiniband/kern-abi.h ibv_query_qp 480 struct ibv_query_qp {
src/verbs.c ibv_query_qp 348 int ibv_query_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr,
********* Problem with ibv_destroy_qp **********
include/infiniband/kern-abi.h ibv_destroy_qp 557 struct ibv_destroy_qp {
src/verbs.c ibv_destroy_qp 379 int ibv_destroy_qp(struct ibv_qp *qp)
********* Problem with ibv_post_send **********
include/infiniband/kern-abi.h ibv_post_send 598 struct ibv_post_send {
include/infiniband/verbs.h ibv_post_send 943 static inline int ibv_post_send(struct ibv_qp *qp, struct ibv_send_wr *wr,
********* Problem with ibv_post_recv **********
include/infiniband/kern-abi.h ibv_post_recv 620 struct ibv_post_recv {
include/infiniband/verbs.h ibv_post_recv 952 static inline int ibv_post_recv(struct ibv_qp *qp, struct ibv_recv_wr *wr,
********* Problem with ibv_create_ah **********
include/infiniband/kern-abi.h ibv_create_ah 652 struct ibv_create_ah {
src/verbs.c ibv_create_ah 384 struct ibv_ah *ibv_create_ah(struct ibv_pd *pd, struct ibv_ah_attr *attr)
********* Problem with ibv_destroy_ah **********
include/infiniband/kern-abi.h ibv_destroy_ah 667 struct ibv_destroy_ah {
src/verbs.c ibv_destroy_ah 452 int ibv_destroy_ah(struct ibv_ah *ah)
********* Problem with ibv_attach_mcast **********
include/infiniband/kern-abi.h ibv_attach_mcast 674 struct ibv_attach_mcast {
src/verbs.c ibv_attach_mcast 457 int ibv_attach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid)
********* Problem with ibv_detach_mcast **********
include/infiniband/kern-abi.h ibv_detach_mcast 685 struct ibv_detach_mcast {
src/verbs.c ibv_detach_mcast 462 int ibv_detach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid)

That was the reason I wrote separate routines and
data structures so that #defines don't do wrong things. Does that sound
reasonable ?

B.
-----

- "Maybe the best approach is to use rdmav_ to replace ibv_" :

OK, that sounds consistent. Is "rdv_" better than the longer "rdmav_" ?

C.
--------

- "Path records are IB specific. Not sure we need to rename them" and "These
  changes look fine. We just need to decide if we want to change everything
  that's ibv_* to rdma_*, or keep IB specific names (path records, GIDs, PKeys,
  etc.) the same."

I had indicated this in my "Information notes" in the [PATCH 0/6] :
"IB specific routines are also converted to use RDMA generic API's for sake
of uniformness (knowing that transport dependent names will be removed
once all apps are converted)."

The issue is deciding between rd(ma)v_ and ibv_ for IB specific
structures. Currently no transport other than IB has these specific
structures, but if that changes it might be better to keep the names
transport agnostic. Another reason I see at this time is uniformity: if
this library exports all names under one prefix, I do not have to care
about the underlying transport type, and I also do not have to remember
that ibv_ is for [a, b, c, d] operations and rdma_ is for [e, f, g, h]
operations. What do you think?

D.
--------

> > We named ourselves OpenFabrics instead of OpenRDMA for a reason
> Wasn't OpenRDMA already taken?

Correct. Also, "RDMA" is a generic enough name and has nothing to do with
openRDMA. Roland had rightly pointed out in my first attempt that usage of
"#ifdef OPENRDMA_xxx" was inappropriate, so I corrected that in this
patchset (I had changed OPENIB to OPENRDMA blindly :)).

Thanks,

- KK

From krkumar2 at in.ibm.com Mon Jul 31 04:25:47 2006
From: krkumar2 at in.ibm.com (Krishna Kumar2)
Date: Mon, 31 Jul 2006 16:55:47 +0530
Subject: [openib-general] [PATCH 1/6] libibverbs include files changes.
In-Reply-To:
Message-ID:

OTOH, I think #A below is possible (will provide feedback on this tmrw).
Please let me know comments on other items though.

Thanks,

- KK

Krishna Kumar2/India/IBM wrote on 07/31/2006 02:44:46 PM:

> Hi Sean & Roland,
>
> Thanks for your review comments. I will respond to all comments in
> this mail to make it easy :)
>
> A.
> -------
> > > Why not just do:
> > >
> > > #define rdma_structure_name ibv_structure_name
> > >
> > > Then we simply remove the #define and replace ibv_structure_name with
> > > rdma_structure_name when ready. This better shows the changes, prevents
> > > duplicating every structure, and should avoid any compile warnings.
> >
> > I agree, except let's make rdma_structure_name the real name, and do
> > the define the opposite way. That way it's easier to remove the
> > compatibility stuff later.
> I had tried this some time back (with amso driver actually) and found it had some
> issues. Eg, some data structures have the same name as API routines. I did :
>
> "cscope -1 -L" on all function names and found 27 functions that
> also clashed with structure names.
>
> That was the reason I wrote separate routines and data structures so that
> #defines don't do wrong things. Does that sound reasonable ?
>
> B.
> -----
>
> - "Maybe the best approach is to use rdmav_ to replace ibv_" :
>
> OK, that sounds consistent. Is "rdv_" better than the longer "rdmav_" ?
>
> C.
> --------
>
> - "Path records are IB specific. Not sure we need to rename them" and "These
> changes look fine. We just need to decide if we want to change everything
> that's ibv_* to rdma_*, or keep IB specific names (path records, GIDs, PKeys,
> etc.) the same."
>
> I had indicated this in my "Information notes" in the [PATCH 0/6] :
> "IB specific routines are also converted to use RDMA generic API's for sake
> of uniformness (knowing that transport dependent names will be removed
> once all apps are converted)."
>
> The issue is between deciding to have either rd(ma)_v or ibv_ for IB specific
> structures. Currently there is no other transport other than IB that has these
> specific structures, but if that changes it might be better to keep the name
> transport agnostic. Another reason that I see at this time is to have uniform
> names which means that this library exports names using one prefix - this means
> that I do not have to care about the underlying transport type and I also do not
> have to remember that ibv_ is for [a, b, c, d] operations and rdma_ is for [e, f, g, h]
> operations. What do you feel ?
>
> D.
> --------
> > > We named ourselves OpenFabrics instead of OpenRDMA for a reason
> > Wasn't OpenRDMA already taken?
>
> Correct. Also "RDMA" name is a generic enough name and has nothing to do with
> openRDMA. Roland had rightly pointed out in my first attempt that usage of "#ifdef
> OPENRDMA_xxx" was inappropirate, so I corrected that in this patchset (I had
> changed OPENIB to OPENRDMA blindly :)).
>
> Thanks,
>
> - KK

From mst at mellanox.co.il Mon Jul 31 05:08:12 2006
From: mst at mellanox.co.il (Michael S.
Tsirkin)
Date: Mon, 31 Jul 2006 15:08:12 +0300
Subject: [openib-general] [PATCH repost] libmthca: fix compilation on SLES10
Message-ID: <20060731120812.GJ9411@mellanox.co.il>

Roland, could you take the following please?
Fix compilation on SLES10:

---

mthca.h uses offsetof so it must include stddef.h

Signed-off-by: Jack Morgenstein
Signed-off-by: Michael S. Tsirkin

Index: l/src/userspace/libmthca/src/mthca.h
===================================================================
--- l/src/userspace/libmthca/src/mthca.h (revision 7569)
+++ l/src/userspace/libmthca/src/mthca.h (working copy)
@@ -36,6 +36,7 @@
 #ifndef MTHCA_H
 #define MTHCA_H
 
+#include <stddef.h>
 #include
 #include
 
-- 
MST

From mst at mellanox.co.il Mon Jul 31 05:07:12 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 31 Jul 2006 15:07:12 +0300
Subject: [openib-general] [PATCH repost] libmthca: stricter checks in mthca_create_srq
Message-ID: <20060731120712.GI9411@mellanox.co.il>

Roland, could you comment please?

---

If SRQ max_wr is too high, srq buf size calculation
	srq->buf_size = srq->max << srq->wqe_shift
might overflow, resulting in crash on buffer access.
The simplest solution is to limit max_wr since practically hardware
does not support values bigger than 1 << 16.

Signed-off-by: Michael S. Tsirkin

Index: openib/src/userspace/libmthca/src/verbs.c
===================================================================
--- openib/src/userspace/libmthca/src/verbs.c (revision 8484)
+++ openib/src/userspace/libmthca/src/verbs.c (working copy)
@@ -372,7 +372,7 @@ struct ibv_srq *mthca_create_srq(struct
 	int ret;
 
 	/* Sanity check SRQ size before proceeding */
-	if (attr->attr.max_wr > 16 << 20 || attr->attr.max_sge > 64)
+	if (attr->attr.max_wr >= 1 << 16 || attr->attr.max_sge > 64)
 		return NULL;
 
 	srq = malloc(sizeof *srq);

-- 
MST

From swise at opengridcomputing.com Mon Jul 31 07:52:48 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 31 Jul 2006 09:52:48 -0500
Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal.
In-Reply-To: <000001c6b40d$c2b7ae70$cdcc180a@amr.corp.intel.com>
References: <000001c6b40d$c2b7ae70$cdcc180a@amr.corp.intel.com>
Message-ID: <1154357568.1066.4.camel@stevo-desktop>

On Sun, 2006-07-30 at 12:24 -0700, Sean Hefty wrote:
> >We named ourselves OpenFabrics instead of OpenRDMA for a reason
>
> Wasn't OpenRDMA already taken?
>
> - Sean
>

rdma_* is more descriptive than something like ofv_* or of_* in my
opinion. I would think the prefix should help describe the
functionality being implemented: Transport Neutral RDMA.

Is there a good technical reason for not using rdma_?

Steve.

From greg.lindahl at qlogic.com Mon Jul 31 08:15:39 2006
From: greg.lindahl at qlogic.com (Greg Lindahl)
Date: Mon, 31 Jul 2006 08:15:39 -0700
Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal.
In-Reply-To: <1154357568.1066.4.camel@stevo-desktop>
References: <000001c6b40d$c2b7ae70$cdcc180a@amr.corp.intel.com> <1154357568.1066.4.camel@stevo-desktop>
Message-ID: <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net>

On Mon, Jul 31, 2006 at 09:52:48AM -0500, Steve Wise wrote:

> rdma_* is more descriptive than something like ofv_* or of_* in my
> opinion.
I would think the prefix should help describe the > functionality being implemented: Transport Neutral RDMA. Some functions are RDMA. Others are not. If all are called RDMA, that's misleading. For example, in IB, there is send/receive as well as RDMA. ULPs often use send/receive for short messages. I wouldn't know anything about the non-IB parts of Open Fabrics, but I would bet that there is non-RDMA functionality in them. The common concept is messaging, not RDMA. -- greg From swise at opengridcomputing.com Mon Jul 31 08:24:11 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 31 Jul 2006 10:24:11 -0500 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. In-Reply-To: <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> References: <000001c6b40d$c2b7ae70$cdcc180a@amr.corp.intel.com> <1154357568.1066.4.camel@stevo-desktop> <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> Message-ID: <1154359451.3078.4.camel@stevo-desktop> On Mon, 2006-07-31 at 08:15 -0700, Greg Lindahl wrote: > On Mon, Jul 31, 2006 at 09:52:48AM -0500, Steve Wise wrote: > > > rdma_* is more descriptive than something like ofv_* or of_* in my > > opinion. I would think the prefix should help describe the > > functionality being implemented: Transport Neutral RDMA. > > Some functions are RDMA. Others are not. If all are called RDMA, > that's misleading. > > For example, in IB, there is send/receive as well as RDMA. ULPs often > use send/receive for short messages. > I see your point. However, the IETF RDMA protocol defines SEND as well as READ, WRITE, etc. So in my mind, that's all RDMA, not just read and write. Steve. From caitlinb at broadcom.com Mon Jul 31 09:01:16 2006 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Mon, 31 Jul 2006 09:01:16 -0700 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. 
In-Reply-To: <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> Message-ID: <54AD0F12E08D1541B826BE97C98F99F17DC87D@NT-SJCA-0751.brcm.ad.broadcom.com> openib-general-bounces at openib.org wrote: > On Mon, Jul 31, 2006 at 09:52:48AM -0500, Steve Wise wrote: > >> rdma_* is more descriptive than something like ofv_* or of_* in my >> opinion. I would think the prefix should help describe the >> functionality being implemented: Transport Neutral RDMA. > > Some functions are RDMA. Others are not. If all are called > RDMA, that's misleading. > > For example, in IB, there is send/receive as well as RDMA. > ULPs often use send/receive for short messages. > > I wouldn't know anything about the non-IB parts of Open > Fabrics, but I would bet that there is non-RDMA functionality in them. > > The common concept is messaging, not RDMA. > That would imply that the purpose of the openfabrics stack is to replace netdev. I don't think we want to go there. The broadest scope I can imagine is that openfabrics is for networking where the immediate using layer is aware of memory registration. RDMA-associated send/recv is message based, but then so is SCTP. And SCTP is handled through netdev. The more important distinction is that even when using anonymous buffers via openfabrics they are *registered* buffers. Further, they are supplied by the application layer. That contrasts with non-QP-based networking where buffers are supplied without pre-registration and some degree of system buffering is available to hold received content before the application asks for it. Now you could argue that "RDMA" is not the best label for this class of service. You might be tempted to say it is "qp based" networking, but then you realize that a "Queue Pair" is an RDMA capable network endpoint that is implemented using three or more queues. In other words, no short memorable label is ever perfect. Just go with the label that people recognize whether it is the best theoretical description or not.
Aside from *what* the label is, the key question is whether a given function is "generic RDMA" versus IB or iWARP specific. I believe that is the intent of the "rdma_" prefix, and I haven't spotted any misapplications yet. From greg.lindahl at qlogic.com Mon Jul 31 09:23:33 2006 From: greg.lindahl at qlogic.com (Greg Lindahl) Date: Mon, 31 Jul 2006 09:23:33 -0700 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. In-Reply-To: <1154359451.3078.4.camel@stevo-desktop> References: <000001c6b40d$c2b7ae70$cdcc180a@amr.corp.intel.com> <1154357568.1066.4.camel@stevo-desktop> <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> <1154359451.3078.4.camel@stevo-desktop> Message-ID: <20060731162333.GD23392@greglaptop.hsd1.ca.comcast.net> On Mon, Jul 31, 2006 at 10:24:11AM -0500, Steve Wise wrote: > However, the IETF RDMA protocol defines SEND as well as READ, WRITE, > etc. So in my mind, that's all RDMA, not just read and write. Well, most people think RDMA means RDMA. The RDMA protocol undoubtedly defines SEND/RECV because it's needed in addition to RDMA to get good performance. But trying to call all of that RDMA is a marketing slogan. Here's why it's a problem: I've repeatedly seen people try to use RDMA (get and put) all the time because they think it must be faster than simple send and receive... that's what the slogans tell them. But then they discover that they need to use ordinary SEND/RECV for shorter messages and for conversations with a lot of participants. That's a technical screwup caused by the marketing slogan. Let's pick symbol names that match our organization name. I'm a bit disappointed that several of you who were at the last Sonoma conference forgot we discussed this in a public session right before the name change.
I am not on the steering committee, and wouldn't be surprised if the openrdma domain name issue was the big decider in the name choice, but the wisdom of having RDMA in our name was in doubt for more reasons than just that. -- g From greg.lindahl at qlogic.com Mon Jul 31 10:17:21 2006 From: greg.lindahl at qlogic.com (Greg Lindahl) Date: Mon, 31 Jul 2006 10:17:21 -0700 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F17DC87D@NT-SJCA-0751.brcm.ad.broadcom.com> References: <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> <54AD0F12E08D1541B826BE97C98F99F17DC87D@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: <20060731171721.GB1098@greglaptop.internal.keyresearch.com> On Mon, Jul 31, 2006 at 09:01:16AM -0700, Caitlin Bestler wrote: > That would imply that the purpose of the openfabrics stack > is to replace netdev. I don't think it implies that at all. -- greg From mshefty at ichips.intel.com Mon Jul 31 10:17:20 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 31 Jul 2006 10:17:20 -0700 Subject: [openib-general] [PATCH 1/6] libibverbs include files changes. In-Reply-To: References: Message-ID: <44CE3B20.8030907@ichips.intel.com> > - "Path records are IB specific. Not sure we need to rename them" and "These > changes look fine. We just need to decide if we want to change everything > that's ibv_* to rdma_*, or keep IB specific names (path records, GIDs, PKeys, > etc.) the same." > > I had indicated this in my "Information notes" in the [PATCH 0/6] : "IB > specific routines are also converted to use RDMA generic API's for sake of > uniformness (knowing that transport dependent names will be removed once all > apps are converted)." > > The issue is between deciding to have either rd(ma)_v or ibv_ for IB specific > structures. 
Currently there is no other transport other than IB that has > these specific structures, but if that changes it might be better to keep the > name transport agnostic. Another reason that I see at this time is to have > uniform names which means that this library exports names using one prefix - > this means that I do not have to care about the underlying transport type and > I also do not have to remember that ibv_ is for [a, b, c, d] operations and > rdma_ is for [e, f, g, h] operations. What do you feel ? If an application is looking at a path record, GID, PKey, etc. then they _are_ caring about the underlying transport type and the fact that it is IB. An application that wants to be transport neutral would just need to limit itself to using rdma_* structures and APIs. If we take a larger view, I don't think we want transport neutral names for the IB CM and IB MAD userspace APIs and structures. Things like path records, GIDs, etc. are also used by those libraries. - Sean From jlentini at netapp.com Mon Jul 31 10:25:39 2006 From: jlentini at netapp.com (James Lentini) Date: Mon, 31 Jul 2006 13:25:39 -0400 (EDT) Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. In-Reply-To: <20060731162333.GD23392@greglaptop.hsd1.ca.comcast.net> References: <000001c6b40d$c2b7ae70$cdcc180a@amr.corp.intel.com> <1154357568.1066.4.camel@stevo-desktop> <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> <1154359451.3078.4.camel@stevo-desktop> <20060731162333.GD23392@greglaptop.hsd1.ca.comcast.net> Message-ID: On Mon, 31 Jul 2006, Greg Lindahl wrote: > On Mon, Jul 31, 2006 at 10:24:11AM -0500, Steve Wise wrote: > > > However, the IETF RDMA protocol defines SEND as well as READ, WRITE, > > etc. So in my mind, that's all RDMA, not just read and write. > > Well, most people think RDMA means RDMA. The RDMA protocol undoubtedly > defines SEND/RECV because it's needed in addition to RDMA to get good > performance.
But trying to call all of that RDMA is a marketing slogan. I agree that the term RDMA SEND is confusing. However, the data in an RDMA SEND is deposited directly (zero copy) into the user's memory. With that in mind, I can understand why the authors of the IETF spec termed these operations RDMA SEND/RECV. > Here's why it's a problem: I've repeatedly seen people try to use RDMA > (get and put) all the time because they think it must be faster than I'm assuming RDMA get/put correspond to RDMA READ/WRITE. > simple send and receive... that's what the slogans tell them. But then > they discover that they need to use ordinary SEND/RECV for shorter > messages and for conversations with a lot of participants. By ordinary SEND/RECV, do you mean IB/iWARP SEND/RECV or traditional (sockets) networking send(2)/recv(2)? > That's a technical screwup caused by the marketing slogan. The terms RDMA read and RDMA write are technically accurate. It sounds like these developers were misled to believe that using RDMA will speed up all communications. Of course that is not true. There are situations (e.g. short lived connections) where RDMA may not be appropriate. > Let's pick symbol names that match our organization name. Our organization name has more to do with marketing than anything else. > I'm a bit disappointed that several of you who were at the last > Sonoma conference forgot we discussed this in a public session right > before the name change. I am not on the steering committee, and > wouldn't be surprised if the openrdma domain name issue was the big > decider in the name choice, but the wisdom of having RDMA in our > name was in doubt for more reasons than just that. I think either rdmav_ or rdv_ would be ok, but I see how using RDMA throughout the API could be confusing. Perhaps someone can think of a better prefix. How about dav_ (direct access verb)?
From caitlinb at broadcom.com Mon Jul 31 10:27:59 2006 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Mon, 31 Jul 2006 10:27:59 -0700 Subject: [openib-general] posting send requests in RTR In-Reply-To: <20060729215509.GB10158@mellanox.co.il> Message-ID: <54AD0F12E08D1541B826BE97C98F99F17DC8A9@NT-SJCA-0751.brcm.ad.broadcom.com> > -----Original Message----- > From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] > Sent: Saturday, July 29, 2006 2:55 PM > To: Rimmer, Todd > Cc: Caitlin Bestler; Sean Hefty; Or Gerlitz; Roland Dreier; > openib-general at openib.org > Subject: Re: posting send requests in RTR > > Quoting r. Rimmer, Todd : > > The target receives the SCSI commands (such as Test Unit Ready or > > Inquiry) and wants to act on them immediately. However if > the command > > has passed the RTU or the RTU is lost, the target is still > in RTR. If > > the command was very simple, the target may want to answer > the query > > immediately by posting a send with the response. > > Since the response won't go out until QP is in RTS anyway, > why is it important to post the send immediately? The > simplest approach for you is to avoid polling the CQ until > QP is in RTS. > That only works if you can avoid polling the CQ until *the* QP associated with it is in RTS. But if the CQ supports a large number of QPs then the application should not be expected to figure out which receive completions it cannot process because they are from connections that are not yet established. Especially since the connections are established, because otherwise you could not have a successful completion of a receive work request, but not all of the stack understands that yet. The provider should deal with its own confusion and not inflict it on the consumer. It is very reasonable to expect the active side to wait for a "connection established" event before it sends its first message.
It is something totally different to tell a server application supporting a very large number of connections that it cannot respond to a request received on one of those connections because parts of the driver do not realize that the connection is established yet (although it really is). Whether the need for fencing originates in the protocol (the IETF MPA requirement that the passive side receive the first MPA frame before it sends one) or in the driver (the QP state in memory is not promptly updated) the solution should stay within the Provider and not be foisted on the application. Getting application developers to understand RDMA is challenging enough without adding more rules on top of the ones that are truly needed. From rdreier at cisco.com Mon Jul 31 10:32:05 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 31 Jul 2006 10:32:05 -0700 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. In-Reply-To: <20060731171721.GB1098@greglaptop.internal.keyresearch.com> (Greg Lindahl's message of "Mon, 31 Jul 2006 10:17:21 -0700") References: <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> <54AD0F12E08D1541B826BE97C98F99F17DC87D@NT-SJCA-0751.brcm.ad.broadcom.com> <20060731171721.GB1098@greglaptop.internal.keyresearch.com> Message-ID: I apologize for trying to take this thread in a slightly useful direction, but... Greg, what would be your suggestion of a more generic (not IB-specific) replacement of the libibverbs name and ibv_ prefix? - R. From rdreier at cisco.com Mon Jul 31 10:33:33 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 31 Jul 2006 10:33:33 -0700 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. 
In-Reply-To: (James Lentini's message of "Mon, 31 Jul 2006 13:25:39 -0400 (EDT)") References: <000001c6b40d$c2b7ae70$cdcc180a@amr.corp.intel.com> <1154357568.1066.4.camel@stevo-desktop> <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> <1154359451.3078.4.camel@stevo-desktop> <20060731162333.GD23392@greglaptop.hsd1.ca.comcast.net> Message-ID: > Perhaps someone can think of a better prefix. How about dav_ (direct > access verb)? Ugh... makes me think of http://www.webdav.org/ I think rdmav_ is the best I've seen so far... - R. From greg.lindahl at qlogic.com Mon Jul 31 10:37:42 2006 From: greg.lindahl at qlogic.com (Greg Lindahl) Date: Mon, 31 Jul 2006 10:37:42 -0700 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. In-Reply-To: References: <000001c6b40d$c2b7ae70$cdcc180a@amr.corp.intel.com> <1154357568.1066.4.camel@stevo-desktop> <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> <1154359451.3078.4.camel@stevo-desktop> <20060731162333.GD23392@greglaptop.hsd1.ca.comcast.net> Message-ID: <20060731173742.GF1098@greglaptop.internal.keyresearch.com> On Mon, Jul 31, 2006 at 01:25:39PM -0400, James Lentini wrote: > I agree that the term RDMA SEND is confusing. However, the data in an > RDMA SEND is deposited directly (zero copy) into the users memory. There are many mechanisms other than DMA or RDMA which have this property. You're confusing specification with implementation, too. When I read from a disk on modern Unix, the data is deposited into the user's memory, whether it's DMA or PIO. The defining characteristic of RDMA is that it deposits or reads data based on address provided by the other side, *and* that it has one-sided semantics. In ordinary messaging, data is transferred from buffers which are much less flexibly addressed, and semantics are two-sided. 
> > Here's why it's a problem: I've repeatedly seen people try to use RDMA > > (get and put) all the time because they think it must be faster than > > I'm assuming RDMA get/put correspond to RDMA READ/WRITE. Yes, "get" and "put" are what the general community have traditionally called these operations. These names emphasize the one-sided nature of the operation, unlike the new official(tm) names. > > simple send and receive... that's what the slogans tell them. But then > > they discover that they need to use ordinary SEND/RECV for shorter > > messages and for conversations with a lot of participants. > > By ordinary SEND/RECV, do you mean IB/iWARP SEND/RECV or traditional > (sockets) networking send(2)/recv(2)? I was actually thinking of OpenIB SEND/RECV. > > That's a technical screwup caused by the marketing slogan. > > The terms RDMA read and RDMA write are technically accurate. It seems we have different defintions of "technical", then. Slogans don't make good engineering. > Perhaps someone can think of a better prefix. How about dav_ (direct > access verb)? That's much better than rdma_, but do you really think the Linux folks are going to be happy about OpenFabrics calls with a prefix that doesn't look anything like "Open Fabrics"? -- greg From greg.lindahl at qlogic.com Mon Jul 31 10:38:32 2006 From: greg.lindahl at qlogic.com (Greg Lindahl) Date: Mon, 31 Jul 2006 10:38:32 -0700 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. In-Reply-To: References: <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> <54AD0F12E08D1541B826BE97C98F99F17DC87D@NT-SJCA-0751.brcm.ad.broadcom.com> <20060731171721.GB1098@greglaptop.internal.keyresearch.com> Message-ID: <20060731173832.GG1098@greglaptop.internal.keyresearch.com> On Mon, Jul 31, 2006 at 10:32:05AM -0700, Roland Dreier wrote: > Greg, what would be your suggestion of a more generic (not > IB-specific) replacement of the libibverbs name and ibv_ prefix? 
Anything that makes it clear that it's an Open Fabrics call. Which is what our organization and software stack are called. -- greg From mshefty at ichips.intel.com Mon Jul 31 10:39:49 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 31 Jul 2006 10:39:49 -0700 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. In-Reply-To: References: <000001c6b40d$c2b7ae70$cdcc180a@amr.corp.intel.com> <1154357568.1066.4.camel@stevo-desktop> <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> <1154359451.3078.4.camel@stevo-desktop> <20060731162333.GD23392@greglaptop.hsd1.ca.comcast.net> Message-ID: <44CE4065.2000004@ichips.intel.com> James Lentini wrote: > Perhaps someone can think of a better prefix. How about dav_ (direct > access verb)? Or maybe just "verb". Would that be better? IMO, the underlying issue with using 'rdma' is that a software based solution doesn't actually do 'rdma'. I think this is Greg's complaint, and why he uses the terms 'get/put' instead of rdma read/write. - Sean From rdreier at cisco.com Mon Jul 31 10:41:40 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 31 Jul 2006 10:41:40 -0700 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. In-Reply-To: <20060731173832.GG1098@greglaptop.internal.keyresearch.com> (Greg Lindahl's message of "Mon, 31 Jul 2006 10:38:32 -0700") References: <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> <54AD0F12E08D1541B826BE97C98F99F17DC87D@NT-SJCA-0751.brcm.ad.broadcom.com> <20060731171721.GB1098@greglaptop.internal.keyresearch.com> <20060731173832.GG1098@greglaptop.internal.keyresearch.com> Message-ID: Greg> Anything that makes it clear that it's an Open Fabrics Greg> call. Which is what our organization and software stack are Greg> called. Without a specific suggestion I still think librdmaverbs/rdmav_ are the best solution I've seen so far. 
Let's forget about OpenFabrics marketing for a little bit and just focus on Linux RDMA drivers, which is after all what the point is. - R. From mst at mellanox.co.il Mon Jul 31 10:44:30 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 31 Jul 2006 20:44:30 +0300 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. In-Reply-To: References: Message-ID: <20060731174430.GC11778@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH 0/6] Tranport Neutral Verbs Proposal. > > > Perhaps someone can think of a better prefix. How about dav_ (direct > > access verb)? > > Ugh... makes me think of http://www.webdav.org/ > > I think rdmav_ is the best I've seen so far... Yea. -- MST From rdreier at cisco.com Mon Jul 31 10:45:39 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 31 Jul 2006 10:45:39 -0700 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. In-Reply-To: <20060731173742.GF1098@greglaptop.internal.keyresearch.com> (Greg Lindahl's message of "Mon, 31 Jul 2006 10:37:42 -0700") References: <000001c6b40d$c2b7ae70$cdcc180a@amr.corp.intel.com> <1154357568.1066.4.camel@stevo-desktop> <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> <1154359451.3078.4.camel@stevo-desktop> <20060731162333.GD23392@greglaptop.hsd1.ca.comcast.net> <20060731173742.GF1098@greglaptop.internal.keyresearch.com> Message-ID: > That's much better than rdma_, but do you really think the Linux folks > are going to be happy about OpenFabrics calls with a prefix that > doesn't look anything like "Open Fabrics"? I don't think Linux folks care about Open Fabrics at all. No other drivers have a brand name and it's pretty silly trying to brand IB/iWARP/RDMA/whatever drivers. - R. From greg.lindahl at qlogic.com Mon Jul 31 10:49:52 2006 From: greg.lindahl at qlogic.com (Greg Lindahl) Date: Mon, 31 Jul 2006 10:49:52 -0700 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. 
In-Reply-To: <44CE4065.2000004@ichips.intel.com> References: <000001c6b40d$c2b7ae70$cdcc180a@amr.corp.intel.com> <1154357568.1066.4.camel@stevo-desktop> <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> <1154359451.3078.4.camel@stevo-desktop> <20060731162333.GD23392@greglaptop.hsd1.ca.comcast.net> <44CE4065.2000004@ichips.intel.com> Message-ID: <20060731174952.GH1098@greglaptop.internal.keyresearch.com> On Mon, Jul 31, 2006 at 10:39:49AM -0700, Sean Hefty wrote: > Or maybe just "verb". Would that be better? That's a good one. > IMO, the underlying issue with using 'rdma' is that a software based > solution doesn't actually do 'rdma'. I think this is Greg's complaint, and > why he uses the terms 'get/put' instead of rdma read/write. Actually, no, it isn't that. It's philosophical, a reaction to the marketing over-hyping of RDMA. I'm stunned that you've never heard of "put" and "get" ! Never used CRAY SHMEM or any one-sided interconnect, I guess? MPI uses those terms, too. -- greg From caitlinb at broadcom.com Mon Jul 31 10:54:55 2006 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Mon, 31 Jul 2006 10:54:55 -0700 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. In-Reply-To: <20060731173742.GF1098@greglaptop.internal.keyresearch.com> Message-ID: <54AD0F12E08D1541B826BE97C98F99F17DC8BA@NT-SJCA-0751.brcm.ad.broadcom.com> > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of Greg Lindahl > Sent: Monday, July 31, 2006 10:38 AM > To: James Lentini > Cc: openib-general > Subject: Re: [openib-general] [PATCH 0/6] Tranport Neutral > Verbs Proposal. > > On Mon, Jul 31, 2006 at 01:25:39PM -0400, James Lentini wrote: > > > I agree that the term RDMA SEND is confusing. However, the > data in an > > RDMA SEND is deposited directly (zero copy) into the users memory. > > There are many mechanisms other than DMA or RDMA which have > this property. 
You're confusing specification with > implementation, too. > When I read from a disk on modern Unix, the data is deposited > into the user's memory, whether it's DMA or PIO. > > The defining characteristic of RDMA is that it deposits or > reads data based on address provided by the other side, *and* > that it has one-sided semantics. In ordinary messaging, data > is transferred from buffers which are much less flexibly > addressed, and semantics are two-sided. > Trying to characterize "RDMA" as consisting *solely* of messages that identify target buffers in the message is off target. RDMA protocols rely on the combination of messages that name their target buffer and messages that target "anonymous" buffers (that are selected by the Data Sink). The iWARP terminology, "tagged" and "untagged" is actually quite useful here and it helps emphasize that the two techniques complement each other. The more informative distinction between "RDMA" and conventional networking is that with RDMA even the "anonymous" buffers come directly from the user (not from system buffering), MUST be pre-posted (via RQ or SRQ) and MUST be enabled (registered) for RDMA access explicitly by a layer *above* RDMA. Now if you can come up with a short acronym that conveys that then I am fine with using it. But avoid explanations that imply that RDMA SEND/RECV is somehow less part of "RDMA" than "RDMA Write" or "RDMA Read". Trying to use "openfabrics" either results in something too long or insufficiently clear when viewed in the context of the kernel as a whole ("of_"?). From greg.lindahl at qlogic.com Mon Jul 31 10:58:57 2006 From: greg.lindahl at qlogic.com (Greg Lindahl) Date: Mon, 31 Jul 2006 10:58:57 -0700 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. 
In-Reply-To: References: <000001c6b40d$c2b7ae70$cdcc180a@amr.corp.intel.com> <1154357568.1066.4.camel@stevo-desktop> <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> <1154359451.3078.4.camel@stevo-desktop> <20060731162333.GD23392@greglaptop.hsd1.ca.comcast.net> <20060731173742.GF1098@greglaptop.internal.keyresearch.com> Message-ID: <20060731175857.GJ1098@greglaptop.internal.keyresearch.com> On Mon, Jul 31, 2006 at 10:45:39AM -0700, Roland Dreier wrote: > No other drivers have a brand name and it's pretty silly trying to > brand IB/iWARP/RDMA/whatever drivers. I don't see this as branding or marketing. I see it as trying to come up with a name that's accurate. What do you think of verb_ ? -- greg From swise at opengridcomputing.com Mon Jul 31 11:03:16 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 31 Jul 2006 13:03:16 -0500 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. In-Reply-To: References: <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> <54AD0F12E08D1541B826BE97C98F99F17DC87D@NT-SJCA-0751.brcm.ad.broadcom.com> <20060731171721.GB1098@greglaptop.internal.keyresearch.com> <20060731173832.GG1098@greglaptop.internal.keyresearch.com> Message-ID: <1154368996.3078.10.camel@stevo-desktop> On Mon, 2006-07-31 at 10:41 -0700, Roland Dreier wrote: > Greg> Anything that makes it clear that it's an Open Fabrics > Greg> call. Which is what our organization and software stack are > Greg> called. > > Without a specific suggestion I still think librdmaverbs/rdmav_ are > the best solution I've seen so far. > > Let's forget about OpenFabrics marketing for a little bit and just > focus on Linux RDMA drivers, which is after all what the point is. > I agree. Plus we already have precedence for rdma_ with the RDMA CMA... Steve. From mst at mellanox.co.il Mon Jul 31 11:01:35 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Mon, 31 Jul 2006 21:01:35 +0300 Subject: [openib-general] Fwd: issues in ipoib Message-ID: <20060731180135.GE11778@mellanox.co.il> Hi! Here's a list of ipoib issues raised since 2.6.17 that still need looking into. I'll see if I can find some time to work on some of them for 2.6.18, but - any takers? Especially wrt 1, need to figure out what's the right thing to do. Roland, could you pls find some time to look into 3 - it seems like there's a potential race there. What do you think? ----- Forwarded message from "Michael S. Tsirkin" ----- Date: Mon, 31 Jul 2006 15:30:14 +0300 From: "Michael S. Tsirkin" Subject: issues in ipoib Reply-To: "Michael S. Tsirkin" We have several issues in ipoib and core left unhandled from previous release: 1. pkey cache issues http://thread.gmane.org/gmane.linux.drivers.openib/26684/focus=26692 2. kmalloc GFP_ATOMIC allocation issues http://thread.gmane.org/gmane.linux.drivers.openib/26684/focus=26692 3. ipoib race reported after code review by Eitan Rabin http://openib.org/pipermail/openib-general/2006-June/022916.html 4. module unloading races http://openib.org/pipermail/openib-general/2006-April/020397.html ----- End forwarded message ----- -- MST From tom at opengridcomputing.com Mon Jul 31 11:06:26 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Mon, 31 Jul 2006 13:06:26 -0500 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. In-Reply-To: References: <000001c6b40d$c2b7ae70$cdcc180a@amr.corp.intel.com> <1154357568.1066.4.camel@stevo-desktop> <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> <1154359451.3078.4.camel@stevo-desktop> <20060731162333.GD23392@greglaptop.hsd1.ca.comcast.net> Message-ID: <1154369186.27930.15.camel@trinity.ogc.int> On Mon, 2006-07-31 at 10:33 -0700, Roland Dreier wrote: > > Perhaps someone can think of a better prefix. How about dav_ (direct > > access verb)? > > Ugh...
makes me think of http://www.webdav.org/ > > I think rdmav_ is the best I've seen so far... I second this... > > - R. From greg.lindahl at qlogic.com Mon Jul 31 11:06:51 2006 From: greg.lindahl at qlogic.com (Greg Lindahl) Date: Mon, 31 Jul 2006 11:06:51 -0700 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F17DC8BA@NT-SJCA-0751.brcm.ad.broadcom.com> References: <20060731173742.GF1098@greglaptop.internal.keyresearch.com> <54AD0F12E08D1541B826BE97C98F99F17DC8BA@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: <20060731180651.GK1098@greglaptop.internal.keyresearch.com> On Mon, Jul 31, 2006 at 10:54:55AM -0700, Caitlin Bestler wrote: > Trying to characterize "RDMA" as consisting *solely* of > messages that identify target buffers in the message is > off target. You're using circular arguments: "Because one particular subset of the RDMA community defines RDMA in fashion X, it is off target to define RDMA in any other fashion." One-sided vs. two-sided is important. You've completely left that out. Well, no matter: we don't need to argue about the definition of RDMA to solve the question of what the transport-neutral prefix should be. I have no doubt that we would never agree about the definition. > Now if you can come up with a short acronym that conveys > that then I am fine with using it. Try "now if *someone* can come up with". How did you like verb_ ? -- greg From jlentini at netapp.com Mon Jul 31 11:17:20 2006 From: jlentini at netapp.com (James Lentini) Date: Mon, 31 Jul 2006 14:17:20 -0400 (EDT) Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal.
In-Reply-To: <20060731174952.GH1098@greglaptop.internal.keyresearch.com> References: <000001c6b40d$c2b7ae70$cdcc180a@amr.corp.intel.com> <1154357568.1066.4.camel@stevo-desktop> <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> <1154359451.3078.4.camel@stevo-desktop> <20060731162333.GD23392@greglaptop.hsd1.ca.comcast.net> <44CE4065.2000004@ichips.intel.com> <20060731174952.GH1098@greglaptop.internal.keyresearch.com> Message-ID: On Mon, 31 Jul 2006, Greg Lindahl wrote: > I'm stunned that you've never heard of "put" and "get" ! Never used > CRAY SHMEM or any one-sided interconnect, I guess? MPI uses those > terms, too. Dusting off my copy of vipl.h, circa 1996, I see that these operations were called RDMA READ/WRITE in VIA. For at least a decade, the RDMA READ/WRITE terms have been used consistently in RDMA network specifications. From rdreier at cisco.com Mon Jul 31 11:18:16 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 31 Jul 2006 11:18:16 -0700 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. In-Reply-To: <20060731175857.GJ1098@greglaptop.internal.keyresearch.com> (Greg Lindahl's message of "Mon, 31 Jul 2006 10:58:57 -0700") References: <000001c6b40d$c2b7ae70$cdcc180a@amr.corp.intel.com> <1154357568.1066.4.camel@stevo-desktop> <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> <1154359451.3078.4.camel@stevo-desktop> <20060731162333.GD23392@greglaptop.hsd1.ca.comcast.net> <20060731173742.GF1098@greglaptop.internal.keyresearch.com> <20060731175857.GJ1098@greglaptop.internal.keyresearch.com> Message-ID: > What do you think of verb_ ? My gut reaction is negative. The whole idea of "verbs" is a bit of technical jargon that makes no sense unless you've lived in the RDMA world for a while, so I'd rather not make it the central concept. Also it seems a bit on the generic side -- there might be clashes/confusion with "verbose" or "verbatim". - R. 
From greg.lindahl at qlogic.com Mon Jul 31 11:22:26 2006 From: greg.lindahl at qlogic.com (Greg Lindahl) Date: Mon, 31 Jul 2006 11:22:26 -0700 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. In-Reply-To: References: <000001c6b40d$c2b7ae70$cdcc180a@amr.corp.intel.com> <1154357568.1066.4.camel@stevo-desktop> <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> <1154359451.3078.4.camel@stevo-desktop> <20060731162333.GD23392@greglaptop.hsd1.ca.comcast.net> <44CE4065.2000004@ichips.intel.com> <20060731174952.GH1098@greglaptop.internal.keyresearch.com> Message-ID: <20060731182226.GM1098@greglaptop.internal.keyresearch.com> On Mon, Jul 31, 2006 at 02:17:20PM -0400, James Lentini wrote: > Dusting off my copy of vipl.h, circa 1996, I see that these operations > were called RDMA READ/WRITE in VIA. Yes, and that's the predecessor to IB, so that's no surprise that it uses the same term. The IETF RDMA people also use it. Do you think that's all there is to RDMA? I am not surprised that as a storage guy, that's what you're most familiar with. -- greg From greg.lindahl at qlogic.com Mon Jul 31 11:27:13 2006 From: greg.lindahl at qlogic.com (Greg Lindahl) Date: Mon, 31 Jul 2006 11:27:13 -0700 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. In-Reply-To: <1154368996.3078.10.camel@stevo-desktop> References: <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> <54AD0F12E08D1541B826BE97C98F99F17DC87D@NT-SJCA-0751.brcm.ad.broadcom.com> <20060731171721.GB1098@greglaptop.internal.keyresearch.com> <20060731173832.GG1098@greglaptop.internal.keyresearch.com> <1154368996.3078.10.camel@stevo-desktop> Message-ID: <20060731182713.GO1098@greglaptop.internal.keyresearch.com> On Mon, Jul 31, 2006 at 01:03:16PM -0500, Steve Wise wrote: > I agree. Plus we already have precedence for rdma_ with the RDMA CMA... 
That's precedence about like "I used the term 'wimps' in a poster paper once, so now you should allow me to use 'wimps' in my Astrophysical Journal article." True story. Weakly Interacting Massive Particles. Which in turn spawned MACHOs, MAssive Compact Halo Objects. Fun, but not the way to do software engineering. Hint: did you ever hold a discussion as to whether or not that was the right transport-neutral name? -- greg From greg.lindahl at qlogic.com Mon Jul 31 11:28:46 2006 From: greg.lindahl at qlogic.com (Greg Lindahl) Date: Mon, 31 Jul 2006 11:28:46 -0700 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. In-Reply-To: References: <000001c6b40d$c2b7ae70$cdcc180a@amr.corp.intel.com> <1154357568.1066.4.camel@stevo-desktop> <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> <1154359451.3078.4.camel@stevo-desktop> <20060731162333.GD23392@greglaptop.hsd1.ca.comcast.net> <20060731173742.GF1098@greglaptop.internal.keyresearch.com> <20060731175857.GJ1098@greglaptop.internal.keyresearch.com> Message-ID: <20060731182846.GP1098@greglaptop.internal.keyresearch.com> On Mon, Jul 31, 2006 at 11:18:16AM -0700, Roland Dreier wrote: > My gut reaction is negative. The whole idea of "verbs" is a bit of > technical jargon that makes no sense unless you've lived in the RDMA > world for a while, Given the way you are defining RDMA, I'm not surprised at the conclusion you are coming to. We have been calling these the transport neutral verbs, btw. How about ofabric_ ? -- greg From rdreier at cisco.com Mon Jul 31 11:31:33 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 31 Jul 2006 11:31:33 -0700 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. 
In-Reply-To: <20060731182713.GO1098@greglaptop.internal.keyresearch.com> (Greg Lindahl's message of "Mon, 31 Jul 2006 11:27:13 -0700") References: <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> <54AD0F12E08D1541B826BE97C98F99F17DC87D@NT-SJCA-0751.brcm.ad.broadcom.com> <20060731171721.GB1098@greglaptop.internal.keyresearch.com> <20060731173832.GG1098@greglaptop.internal.keyresearch.com> <1154368996.3078.10.camel@stevo-desktop> <20060731182713.GO1098@greglaptop.internal.keyresearch.com> Message-ID: Greg> Hint: did you ever hold a discussion as to whether or not Greg> that was the right transport-neutral name? Jeeze, Sean posted the RDMA CM code to three mailing lists for review about 100 times. Did you ever complain about the naming convention? - R. From swise at opengridcomputing.com Mon Jul 31 11:34:41 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 31 Jul 2006 13:34:41 -0500 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. In-Reply-To: <20060731182713.GO1098@greglaptop.internal.keyresearch.com> References: <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> <54AD0F12E08D1541B826BE97C98F99F17DC87D@NT-SJCA-0751.brcm.ad.broadcom.com> <20060731171721.GB1098@greglaptop.internal.keyresearch.com> <20060731173832.GG1098@greglaptop.internal.keyresearch.com> <1154368996.3078.10.camel@stevo-desktop> <20060731182713.GO1098@greglaptop.internal.keyresearch.com> Message-ID: <1154370881.3078.16.camel@stevo-desktop> On Mon, 2006-07-31 at 11:27 -0700, Greg Lindahl wrote: > On Mon, Jul 31, 2006 at 01:03:16PM -0500, Steve Wise wrote: > > > I agree. Plus we already have precedence for rdma_ with the RDMA CMA... > > That's precedence about like "I used the term 'wimps' in a poster > paper once, so now you should allow me to use 'wimps' in my > Astrophysical Journal article." > > True story. Weakly Interacting Massive Particles. Which in turn > spawned MACHOs, MAssive Compact Halo Objects. 
Fun, but not the way to > do software engineering. > > Hint: did you ever hold a discussion as to whether or not that was the > right transport-neutral name? > > -- greg You seem to be the only one objecting to rdma_ and/or rdmav_. I've listened to your arguments for why you think rdma is a bad name, and I'm not convinced. This is an interface to RDMA transports... Steve. From greg.lindahl at qlogic.com Mon Jul 31 11:38:49 2006 From: greg.lindahl at qlogic.com (Greg Lindahl) Date: Mon, 31 Jul 2006 11:38:49 -0700 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. In-Reply-To: <1154370881.3078.16.camel@stevo-desktop> References: <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> <54AD0F12E08D1541B826BE97C98F99F17DC87D@NT-SJCA-0751.brcm.ad.broadcom.com> <20060731171721.GB1098@greglaptop.internal.keyresearch.com> <20060731173832.GG1098@greglaptop.internal.keyresearch.com> <1154368996.3078.10.camel@stevo-desktop> <20060731182713.GO1098@greglaptop.internal.keyresearch.com> <1154370881.3078.16.camel@stevo-desktop> Message-ID: <20060731183849.GA1998@greglaptop.internal.keyresearch.com> On Mon, Jul 31, 2006 at 01:34:41PM -0500, Steve Wise wrote: > You seem to be the only one objecting to rdma_ and/or rdmav_. At Sonoma, I was not the only one. I forget, were you there? > I've listened to your arguments for why you think rdma is a bad name, > and I'm not convinced. I'm not surprised, I did not expect to convince everyone. However, it is not the case that you get to pick the name by yourself. Nor I. -- greg From mshefty at ichips.intel.com Mon Jul 31 11:39:11 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 31 Jul 2006 11:39:11 -0700 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. 
In-Reply-To: <1154369186.27930.15.camel@trinity.ogc.int> References: <000001c6b40d$c2b7ae70$cdcc180a@amr.corp.intel.com> <1154357568.1066.4.camel@stevo-desktop> <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> <1154359451.3078.4.camel@stevo-desktop> <20060731162333.GD23392@greglaptop.hsd1.ca.comcast.net> <1154369186.27930.15.camel@trinity.ogc.int> Message-ID: <44CE4E4F.80702@ichips.intel.com> Tom Tucker wrote: >> > Perhaps someone can think of a better prefix. How about dav_ (direct >> > access verb)? >> >>Ugh... makes me think of http://www.webdav.org/ >> >>I think rdmav_ is the best I've seen so far... > > > I second this... rdma_ or rdmav_ is my preference as well. - Sean From rdreier at cisco.com Mon Jul 31 11:40:46 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 31 Jul 2006 11:40:46 -0700 Subject: [openib-general] Fwd: issues in ipoib In-Reply-To: <20060731180135.GE11778@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 31 Jul 2006 21:01:35 +0300") References: <20060731180135.GE11778@mellanox.co.il> Message-ID: > 1. pkey cache issues > http://thread.gmane.org/gmane.linux.drivers.openib/26684/focus=26692 I thought we fixed the P_Key cache issues by correcting the oversight in retrying the P_Key query? > 3. ipoib race reported after code review by Eitan Rabin > http://openib.org/pipermail/openib-general/2006-June/022916.html Yeah, might be a problem I guess. Does it work to do netif_stop_queue() in ipoib_ib_dev_down()? > 4. module unloading races > http://openib.org/pipermail/openib-general/2006-April/020397.html Might be worth looking at to see if it fixes the problems after Sean's multicast changes. Has anyone come up with a way to reliably get the crash to happen? - R. From swise at opengridcomputing.com Mon Jul 31 11:42:13 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 31 Jul 2006 13:42:13 -0500 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal.
In-Reply-To: <20060731183849.GA1998@greglaptop.internal.keyresearch.com> References: <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> <54AD0F12E08D1541B826BE97C98F99F17DC87D@NT-SJCA-0751.brcm.ad.broadcom.com> <20060731171721.GB1098@greglaptop.internal.keyresearch.com> <20060731173832.GG1098@greglaptop.internal.keyresearch.com> <1154368996.3078.10.camel@stevo-desktop> <20060731182713.GO1098@greglaptop.internal.keyresearch.com> <1154370881.3078.16.camel@stevo-desktop> <20060731183849.GA1998@greglaptop.internal.keyresearch.com> Message-ID: <1154371333.3078.20.camel@stevo-desktop> On Mon, 2006-07-31 at 11:38 -0700, Greg Lindahl wrote: > On Mon, Jul 31, 2006 at 01:34:41PM -0500, Steve Wise wrote: > > > You seem to be the only one objecting to rdma_ and/or rdmav_. > > At Sonoma, I was not the only one. I forget, were you there? > No. > > I've listened to your arguments for why you think rdma is a bad name, > > and I'm not convinced. > > I'm not surprised, I did not expect to convince everyone. However, it > is not the case that you get to pick the name by yourself. Nor I. True... From rdreier at cisco.com Mon Jul 31 11:42:53 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 31 Jul 2006 11:42:53 -0700 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. In-Reply-To: <20060731183849.GA1998@greglaptop.internal.keyresearch.com> (Greg Lindahl's message of "Mon, 31 Jul 2006 11:38:49 -0700") References: <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> <54AD0F12E08D1541B826BE97C98F99F17DC87D@NT-SJCA-0751.brcm.ad.broadcom.com> <20060731171721.GB1098@greglaptop.internal.keyresearch.com> <20060731173832.GG1098@greglaptop.internal.keyresearch.com> <1154368996.3078.10.camel@stevo-desktop> <20060731182713.GO1098@greglaptop.internal.keyresearch.com> <1154370881.3078.16.camel@stevo-desktop> <20060731183849.GA1998@greglaptop.internal.keyresearch.com> Message-ID: Greg> I'm not surprised, I did not expect to convince Greg> everyone. 
However, it is not the case that you get to pick Greg> the name by yourself. Nor I. Yeah, as the libibverbs maintainer I guess it's my decision in the end. Is there anyone other than Greg who has a problem with librdmaverbs and rdmav_ symbol names? - R. From greg.lindahl at qlogic.com Mon Jul 31 11:44:15 2006 From: greg.lindahl at qlogic.com (Greg Lindahl) Date: Mon, 31 Jul 2006 11:44:15 -0700 Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal. In-Reply-To: References: <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net> <54AD0F12E08D1541B826BE97C98F99F17DC87D@NT-SJCA-0751.brcm.ad.broadcom.com> <20060731171721.GB1098@greglaptop.internal.keyresearch.com> <20060731173832.GG1098@greglaptop.internal.keyresearch.com> <1154368996.3078.10.camel@stevo-desktop> <20060731182713.GO1098@greglaptop.internal.keyresearch.com> Message-ID: <20060731184415.GB1998@greglaptop.internal.keyresearch.com> On Mon, Jul 31, 2006 at 11:31:33AM -0700, Roland Dreier wrote: > Greg> Hint: did you ever hold a discussion as to whether or not > Greg> that was the right transport-neutral name? > > Jeeze, Sean posted the RDMA CM code to three mailing lists for review > about 100 times. Did you ever complain about the naming convention? Roland, I'm not sure what to say. I suspect you think you're being constructive, but I'm getting tired of being shot at for being the messenger. This is an issue important enough that having an explicit discussion is a good idea. It shouldn't have come up as part of a patch. And it wasn't clear to me that the "RDMA CM" was intended to be part of the transport neutral verbs. If you look at the subject of this thread, it's clear that it's about transport neutral verbs. So I looked, and was surprised.
-- greg From rdreier at cisco.com Mon Jul 31 11:52:18 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 31 Jul 2006 11:52:18 -0700 Subject: [openib-general] [PATCH/RFC] libibverbs and libmthca fork support Message-ID: Here's an initial cut (based on Gleb Natapov's work) at using madvise(MADV_DONTFORK) to support fork() from libibverbs. The main changes from Gleb's earlier work are: - I added code to handle doorbell pages in libmthca. As far as I can see this is necessary -- my tests don't work without it. Gleb, did you ever test your changes on memfree HCAs? - I added a new API function, ibv_fork_init(), which must be called before everything else if an app expects to do fork(). I did this because I wanted a way for apps to know if fork() was expected to work or not, and also because the vast majority of apps don't fork() and probably don't want to pay the price of an extra system call plus RB tree operation for every memory registration. - And the bulk of this patch is converting memory.c over to use RB trees -- I just couldn't bring myself to use an O(N) algorithm at this stage... Comments welcome... - R. 
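The page-rounding step that the patch applies before calling madvise() can be sketched on its own, which may make the description above easier to follow. This is only an illustration under stated assumptions, not code from the patch: the helper names page_floor(), page_last_byte() and advise_range() are hypothetical, the reference counting and red-black tree are left out entirely, and the fallback values 10 and 11 are the same ones the patch defines for headers that lack MADV_DONTFORK and MADV_DOFORK.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

/* Older headers may lack these; the values match the patch's fallbacks. */
#ifndef MADV_DONTFORK
#define MADV_DONTFORK 10
#endif
#ifndef MADV_DOFORK
#define MADV_DOFORK 11
#endif

/* Hypothetical helper: round addr down to the start of its page. */
static uintptr_t page_floor(uintptr_t addr, uintptr_t page)
{
	/* The mask trick below only works for power-of-two page sizes. */
	assert(page && (page & (page - 1)) == 0);
	return addr & ~(page - 1);
}

/*
 * Hypothetical helper: address of the last byte of the last page
 * touched by [addr, addr + size), using the same arithmetic as the
 * patch's "end" computation.
 */
static uintptr_t page_last_byte(uintptr_t addr, size_t size, uintptr_t page)
{
	assert(page && (page & (page - 1)) == 0);
	return ((addr + size + page - 1) & ~(page - 1)) - 1;
}

/*
 * Hypothetical helper: apply advice (MADV_DONTFORK or MADV_DOFORK) to
 * every page touched by [base, base + size). Unlike the patch, this
 * keeps no reference counts, so overlapping ranges are not handled.
 */
static int advise_range(void *base, size_t size, int advice)
{
	long page = sysconf(_SC_PAGESIZE);
	uintptr_t start, last;

	if (page <= 0)
		return -1;
	if (!size)
		return 0;

	start = page_floor((uintptr_t) base, (uintptr_t) page);
	last  = page_last_byte((uintptr_t) base, size, (uintptr_t) page);

	return madvise((void *) start, last - start + 1, advice);
}
```

A caller would use advise_range(buf, len, MADV_DONTFORK) before handing a buffer to the hardware and advise_range(buf, len, MADV_DOFORK) once it is released. Because madvise() works on whole pages, two buffers sharing a page would interfere with each other in this simplified form, which is the reason the patch tracks per-range reference counts.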
Index: libibverbs/include/infiniband/driver.h =================================================================== --- libibverbs/include/infiniband/driver.h (revision 8791) +++ libibverbs/include/infiniband/driver.h (working copy) @@ -135,6 +135,9 @@ int ibv_cmd_destroy_ah(struct ibv_ah *ah int ibv_cmd_attach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid); int ibv_cmd_detach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid); +int ibv_dontfork_range(void *base, size_t size); +int ibv_dofork_range(void *base, size_t size); + /* * sysfs helper functions */ Index: libibverbs/include/infiniband/verbs.h =================================================================== --- libibverbs/include/infiniband/verbs.h (revision 8791) +++ libibverbs/include/infiniband/verbs.h (working copy) @@ -285,6 +285,8 @@ struct ibv_pd { struct ibv_mr { struct ibv_context *context; struct ibv_pd *pd; + void *addr; + size_t length; uint32_t handle; uint32_t lkey; uint32_t rkey; @@ -1016,6 +1018,14 @@ int ibv_attach_mcast(struct ibv_qp *qp, */ int ibv_detach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid); +/** + * ibv_fork_init - Prepare data structures so that fork() may be used + * safely. If this function is not called or returns a non-zero + * status, then libibverbs data structures are not fork()-safe and the + * effect of an application calling fork() is undefined. + */ +int ibv_fork_init(void); + END_C_DECLS # undef __attribute_const Index: libibverbs/ChangeLog =================================================================== --- libibverbs/ChangeLog (revision 8791) +++ libibverbs/ChangeLog (working copy) @@ -1,3 +1,29 @@ +2006-07-26 Roland Dreier + + * src/verbs.c (ibv_reg_mr, ibv_dereg_mr): Add calls to + ibv_dontfork_range() and ibv_dofork_range() for memory regions + registered by library consumers. + + * include/infiniband/verbs.h: Add declaration of ibv_fork_init(). 
+ + * include/infiniband/driver.h: Add declarations of + ibv_dontfork_range() and ibv_dofork_range(). + + * src/memory.c: Rewrite to use a red-black tree instead of a + linked list. Change from doing mlock()/munlock() to + madvise(..., MADV_DONTFORK) and madvise(..., MADV_DOFORK), and + change the name of the entry points to ibv_dontfork_range() and + ibv_dofork_range(). Add ibv_fork_init() for applications to + request fork-safe behavior. + + * src/ibverbs.h: Kill off unused declarations. + + * src/init.c (ibverbs_init): Get rid of call to ibv_init_mem_map(). + + * include/infiniband/verbs.h: Add addr and length field to struct + ibv_mr so that memory regions can be madvised(). This changes the + ABI, since the layout of struct ibv_mr is changed. + 2006-07-04 Roland Dreier * include/infiniband/arch.h: Fix typo in sparc mb() Index: libibverbs/src/libibverbs.map =================================================================== --- libibverbs/src/libibverbs.map (revision 8791) +++ libibverbs/src/libibverbs.map (working copy) @@ -74,6 +74,9 @@ IBVERBS_1.0 { mult_to_ibv_rate; ibv_get_sysfs_path; ibv_read_sysfs_file; + ibv_fork_init; + ibv_dontfork_range; + ibv_dofork_range; local: *; }; Index: libibverbs/src/ibverbs.h =================================================================== --- libibverbs/src/ibverbs.h (revision 8791) +++ libibverbs/src/ibverbs.h (working copy) @@ -58,11 +58,7 @@ struct ibv_abi_compat_v2 { extern HIDDEN int abi_ver; -extern HIDDEN int ibverbs_init(struct ibv_device ***list); - -extern HIDDEN int ibv_init_mem_map(void); -extern HIDDEN int ibv_lock_range(void *base, size_t size); -extern HIDDEN int ibv_unlock_range(void *base, size_t size); +HIDDEN int ibverbs_init(struct ibv_device ***list); #define IBV_INIT_CMD(cmd, size, opcode) \ do { \ Index: libibverbs/src/verbs.c =================================================================== --- libibverbs/src/verbs.c (revision 8791) +++ libibverbs/src/verbs.c (working copy) @@ -155,18 +155,32 @@ 
struct ibv_mr *ibv_reg_mr(struct ibv_pd { struct ibv_mr *mr; + if (ibv_dontfork_range(addr, length)) + return NULL; + mr = pd->context->ops.reg_mr(pd, addr, length, access); if (mr) { mr->context = pd->context; mr->pd = pd; - } + mr->addr = addr; + mr->length = length; + } else + ibv_dofork_range(addr, length); return mr; } int ibv_dereg_mr(struct ibv_mr *mr) { - return mr->context->ops.dereg_mr(mr); + int ret; + void *addr = mr->addr; + size_t length = mr->length; + + ret = mr->context->ops.dereg_mr(mr); + if (!ret) + ibv_dofork_range(addr, length); + + return ret; } static struct ibv_comp_channel *ibv_create_comp_channel_v2(struct ibv_context *context) Index: libibverbs/src/init.c =================================================================== --- libibverbs/src/init.c (revision 8791) +++ libibverbs/src/init.c (working copy) @@ -205,9 +205,6 @@ HIDDEN int ibverbs_init(struct ibv_devic *list = NULL; - if (ibv_init_mem_map()) - return 0; - find_drivers(default_path); /* Index: libibverbs/src/memory.c =================================================================== --- libibverbs/src/memory.c (revision 8791) +++ libibverbs/src/memory.c (working copy) @@ -1,5 +1,6 @@ /* * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved. + * Copyright (c) 2006 Cisco Systems, Inc. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU @@ -36,6 +37,7 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include @@ -44,114 +46,424 @@ #include "ibverbs.h" /* - * We keep a linked list of page ranges that have been locked along with a - * reference count to manage overlapping registrations, etc. - * - * Eventually we should turn this into an RB-tree or something similar - * to avoid the O(n) cost of registering/unregistering memory. + * Most distro's headers don't have these yet. 
*/ +#ifndef MADV_DONTFORK +#define MADV_DONTFORK 10 +#endif + +#ifndef MADV_DOFORK +#define MADV_DOFORK 11 +#endif struct ibv_mem_node { - struct ibv_mem_node *prev, *next; - uintptr_t start, end; - int refcnt; + enum { + IBV_RED, + IBV_BLACK + } color; + struct ibv_mem_node *parent; + struct ibv_mem_node *left, *right; + uintptr_t start, end; + int refcnt; }; -static struct { - struct ibv_mem_node *first; - pthread_mutex_t mutex; - uintptr_t page_size; -} mem_map; +static struct ibv_mem_node *mm_root; +static pthread_mutex_t mm_mutex = PTHREAD_MUTEX_INITIALIZER; +static int page_size; +static int too_late; -int ibv_init_mem_map(void) +int ibv_fork_init(void) { - struct ibv_mem_node *node = NULL; - - node = malloc(sizeof *node); - if (!node) - goto fail; - - node->prev = node->next = NULL; - node->start = 0; - node->end = UINTPTR_MAX; - node->refcnt = 0; + void *tmp; - mem_map.first = node; + if (mm_root) + return 0; - mem_map.page_size = sysconf(_SC_PAGESIZE); - if (mem_map.page_size < 0) - goto fail; + if (too_late) + return EINVAL; - if (pthread_mutex_init(&mem_map.mutex, NULL)) - goto fail; + page_size = sysconf(_SC_PAGESIZE); + if (page_size < 0) + return errno; + + if (posix_memalign(&tmp, page_size, page_size)) + return ENOMEM; + + if (madvise(tmp, page_size, MADV_DONTFORK) || + madvise(tmp, page_size, MADV_DOFORK)) + return ENOSYS; + + free(tmp); + + mm_root = malloc(sizeof *mm_root); + if (!mm_root) + return ENOMEM; + + mm_root->parent = NULL; + mm_root->left = NULL; + mm_root->right = NULL; + mm_root->color = IBV_BLACK; + mm_root->start = 0; + mm_root->end = UINTPTR_MAX; + mm_root->refcnt = 0; return 0; +} -fail: - if (node) - free(node); +static struct ibv_mem_node *__mm_prev(struct ibv_mem_node *node) +{ + if (node->left) { + node = node->left; + while (node->right) + node = node->right; + } else { + while (node->parent && node == node->parent->left) + node = node->parent; - return -1; + node = node->parent; + } + + return node; } -static struct 
ibv_mem_node *__mm_find_first(uintptr_t start, uintptr_t end) +static struct ibv_mem_node *__mm_next(struct ibv_mem_node *node) { - struct ibv_mem_node *node = mem_map.first; + if (node->right) { + node = node->right; + while (node->left) + node = node->left; + } else { + while (node->parent && node == node->parent->right) + node = node->parent; - while (node) { - if ((node->start <= start && node->end >= start) || - (node->start <= end && node->end >= end)) - break; - node = node->next; + node = node->parent; } return node; } -static struct ibv_mem_node *__mm_prev(struct ibv_mem_node *node) +static void __mm_rotate_right(struct ibv_mem_node *node) { - return node->prev; + struct ibv_mem_node *tmp; + + tmp = node->left; + + node->left = tmp->right; + if (node->left) + node->left->parent = node; + + if (node->parent) { + if (node->parent->right == node) + node->parent->right = tmp; + else + node->parent->left = tmp; + } else + mm_root = tmp; + + tmp->parent = node->parent; + + tmp->right = node; + node->parent = tmp; } -static struct ibv_mem_node *__mm_next(struct ibv_mem_node *node) +static void __mm_rotate_left(struct ibv_mem_node *node) +{ + struct ibv_mem_node *tmp; + + tmp = node->right; + + node->right = tmp->left; + if (node->right) + node->right->parent = node; + + if (node->parent) { + if (node->parent->right == node) + node->parent->right = tmp; + else + node->parent->left = tmp; + } else + mm_root = tmp; + + tmp->parent = node->parent; + + tmp->left = node; + node->parent = tmp; +} + +static int verify(struct ibv_mem_node *node) +{ + int hl, hr; + + if (!node) + return 1; + + hl = verify(node->left); + hr = verify(node->right); + + if (!hl || !hr) + return 0; + if (hl != hr) + return 0; + + if (node->color == IBV_RED) { + if (node->left && node->left->color != IBV_BLACK) + return 0; + if (node->right && node->right->color != IBV_BLACK) + return 0; + return hl; + } + + return hl + 1; +} + +static void __mm_add_rebalance(struct ibv_mem_node *node) { - return
node->next; + struct ibv_mem_node *parent, *gp, *uncle; + + while (node->parent && node->parent->color == IBV_RED) { + parent = node->parent; + gp = node->parent->parent; + + if (parent == gp->left) { + uncle = gp->right; + + if (uncle && uncle->color == IBV_RED) { + parent->color = IBV_BLACK; + uncle->color = IBV_BLACK; + gp->color = IBV_RED; + + node = gp; + } else { + if (node == parent->right) { + __mm_rotate_left(parent); + node = parent; + parent = node->parent; + } + + parent->color = IBV_BLACK; + gp->color = IBV_RED; + + __mm_rotate_right(gp); + } + } else { + uncle = gp->left; + + if (uncle && uncle->color == IBV_RED) { + parent->color = IBV_BLACK; + uncle->color = IBV_BLACK; + gp->color = IBV_RED; + + node = gp; + } else { + if (node == parent->left) { + __mm_rotate_right(parent); + node = parent; + parent = node->parent; + } + + parent->color = IBV_BLACK; + gp->color = IBV_RED; + + __mm_rotate_left(gp); + } + } + } + + mm_root->color = IBV_BLACK; } -static void __mm_add(struct ibv_mem_node *node, - struct ibv_mem_node *new) +static void __mm_add(struct ibv_mem_node *new) { - new->prev = node; - new->next = node->next; - node->next = new; - if (new->next) - new->next->prev = new; + struct ibv_mem_node *node, *parent = NULL; + + node = mm_root; + while (node) { + parent = node; + if (node->start < new->start) + node = node->right; + else + node = node->left; + } + + if (parent->start < new->start) + parent->right = new; + else + parent->left = new; + + new->parent = parent; + new->left = NULL; + new->right = NULL; + + new->color = IBV_RED; + __mm_add_rebalance(new); } static void __mm_remove(struct ibv_mem_node *node) { - /* Never have to remove the first node, so we can use prev */ - node->prev->next = node->next; - if (node->next) - node->next->prev = node->prev; + struct ibv_mem_node *child, *parent, *sib, *tmp; + int nodecol; + + if (node->left && node->right) { + tmp = node->left; + while (tmp->right) + tmp = tmp->right; + + nodecol = tmp->color; + 
child = tmp->left; + tmp->color = node->color; + + if (tmp->parent != node) { + parent = tmp->parent; + parent->right = tmp->left; + if (tmp->left) + tmp->left->parent = parent; + + tmp->left = node->left; + node->left->parent = tmp; + } else + parent = tmp; + + tmp->right = node->right; + node->right->parent = tmp; + + tmp->parent = node->parent; + if (node->parent) { + if (node->parent->left == node) + node->parent->left = tmp; + else + node->parent->right = tmp; + } else + mm_root = tmp; + } else { + nodecol = node->color; + + child = node->left ? node->left : node->right; + parent = node->parent; + + if (child) + child->parent = parent; + if (parent) { + if (parent->left == node) + parent->left = child; + else + parent->right = child; + } else + mm_root = child; + } + + free(node); + + if (nodecol == IBV_RED) + return; + + while ((!child || child->color == IBV_BLACK) && child != mm_root) { + if (parent->left == child) { + sib = parent->right; + + if (sib->color == IBV_RED) { + parent->color = IBV_RED; + sib->color = IBV_BLACK; + __mm_rotate_left(parent); + sib = parent->right; + } + + if ((!sib->left || sib->left->color == IBV_BLACK) && + (!sib->right || sib->right->color == IBV_BLACK)) { + sib->color = IBV_RED; + child = parent; + parent = child->parent; + } else { + if (!sib->right || sib->right->color == IBV_BLACK) { + if (sib->left) + sib->left->color = IBV_BLACK; + sib->color = IBV_RED; + __mm_rotate_right(sib); + sib = parent->right; + } + + sib->color = parent->color; + parent->color = IBV_BLACK; + if (sib->right) + sib->right->color = IBV_BLACK; + __mm_rotate_left(parent); + child = mm_root; + break; + } + } else { + sib = parent->left; + + if (sib->color == IBV_RED) { + parent->color = IBV_RED; + sib->color = IBV_BLACK; + __mm_rotate_right(parent); + sib = parent->left; + } + + if ((!sib->left || sib->left->color == IBV_BLACK) && + (!sib->right || sib->right->color == IBV_BLACK)) { + sib->color = IBV_RED; + child = parent; + parent = child->parent; + } 
else { + if (!sib->left || sib->left->color == IBV_BLACK) { + if (sib->right) + sib->right->color = IBV_BLACK; + sib->color = IBV_RED; + __mm_rotate_left(sib); + sib = parent->left; + } + + sib->color = parent->color; + parent->color = IBV_BLACK; + if (sib->left) + sib->left->color = IBV_BLACK; + __mm_rotate_right(parent); + child = mm_root; + break; + } + } + } + + if (child) + child->color = IBV_BLACK; +} + +static struct ibv_mem_node *__mm_find_start(uintptr_t start, uintptr_t end) +{ + struct ibv_mem_node *node = mm_root; + + while (node) { + if (node->start <= start && node->end >= start) + break; + + if (node->start < start) + node = node->right; + else + node = node->left; + } + + return node; } -int ibv_lock_range(void *base, size_t size) +static int ibv_madvise_range(void *base, size_t size, int advice) { uintptr_t start, end; struct ibv_mem_node *node, *tmp; + int inc; int ret = 0; if (!size) return 0; - start = (uintptr_t) base & ~(mem_map.page_size - 1); - end = ((uintptr_t) (base + size + mem_map.page_size - 1) & - ~(mem_map.page_size - 1)) - 1; + inc = advice == MADV_DONTFORK ? 
1 : -1; + + start = (uintptr_t) base & ~(page_size - 1); + end = ((uintptr_t) (base + size + page_size - 1) & + ~(page_size - 1)) - 1; - pthread_mutex_lock(&mem_map.mutex); + pthread_mutex_lock(&mm_mutex); - node = __mm_find_first(start, end); + node = __mm_find_start(start, end); if (node->start < start) { tmp = malloc(sizeof *tmp); @@ -165,11 +477,19 @@ int ibv_lock_range(void *base, size_t si tmp->refcnt = node->refcnt; node->end = start - 1; - __mm_add(node, tmp); + __mm_add(tmp); node = tmp; + } else { + tmp = __mm_prev(node); + if (tmp && tmp->refcnt == node->refcnt + inc) { + tmp->end = node->end; + tmp->refcnt = node->refcnt; + __mm_remove(node); + node = tmp; + } } - while (node->start <= end) { + while (node && node->start <= end) { if (node->end > end) { tmp = malloc(sizeof *tmp); if (!tmp) { @@ -182,13 +502,16 @@ int ibv_lock_range(void *base, size_t si tmp->refcnt = node->refcnt; node->end = end; - __mm_add(node, tmp); + __mm_add(tmp); } + node->refcnt += inc; - if (node->refcnt++ == 0) { - ret = mlock((void *) node->start, - node->end - node->start + 1); + if ((inc == -1 && node->refcnt == 0) || + (inc == 1 && node->refcnt == 1)) { + ret = madvise((void *) node->start, + node->end - node->start + 1, + advice); if (ret) goto out; } @@ -196,63 +519,36 @@ int ibv_lock_range(void *base, size_t si node = __mm_next(node); } + if (node) { + tmp = __mm_prev(node); + if (tmp && node->refcnt == tmp->refcnt) { + tmp->end = node->end; + __mm_remove(node); + } + } + out: - pthread_mutex_unlock(&mem_map.mutex); + pthread_mutex_unlock(&mm_mutex); return ret; } -int ibv_unlock_range(void *base, size_t size) +int ibv_dontfork_range(void *base, size_t size) { - uintptr_t start, end; - struct ibv_mem_node *node, *tmp; - int ret = 0; - - if (!size) + if (mm_root) + return ibv_madvise_range(base, size, MADV_DONTFORK); + else { + too_late = 1; return 0; - - start = (uintptr_t) base & ~(mem_map.page_size - 1); - end = ((uintptr_t) (base + size + mem_map.page_size - 1) & - 
~(mem_map.page_size - 1)) - 1; - - pthread_mutex_lock(&mem_map.mutex); - - node = __mm_find_first(start, end); - - if (node->start != start) { - ret = -1; - goto out; - } - - while (node && node->end <= end) { - if (--node->refcnt == 0) { - ret = munlock((void *) node->start, - node->end - node->start + 1); - } - - if (__mm_prev(node) && node->refcnt == __mm_prev(node)->refcnt) { - __mm_prev(node)->end = node->end; - tmp = __mm_prev(node); - __mm_remove(node); - node = tmp; - } - - node = __mm_next(node); - } - - if (node && node->refcnt == __mm_prev(node)->refcnt) { - __mm_prev(node)->end = node->end; - tmp = __mm_prev(node); - __mm_remove(node); } +} - if (node->end != end) { - ret = -1; - goto out; +int ibv_dofork_range(void *base, size_t size) +{ + if (mm_root) + return ibv_madvise_range(base, size, MADV_DOFORK); + else { + too_late = 1; + return 0; } - -out: - pthread_mutex_unlock(&mem_map.mutex); - - return ret; } Index: libmthca/configure.in =================================================================== --- libmthca/configure.in (revision 8791) +++ libmthca/configure.in (working copy) @@ -26,7 +26,7 @@ AC_C_CONST AC_CHECK_SIZEOF(long) dnl Checks for library functions -AC_CHECK_FUNCS(ibv_read_sysfs_file) +AC_CHECK_FUNCS(ibv_read_sysfs_file ibv_dontfork_range ibv_dofork_range) AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script, if test -n "`$LD --help < /dev/null 2>/dev/null | grep version-script`"; then Index: libmthca/src/memfree.c =================================================================== --- libmthca/src/memfree.c (revision 8791) +++ libmthca/src/memfree.c (working copy) @@ -46,8 +46,8 @@ #define MTHCA_FREE_MAP_SIZE (MTHCA_DB_REC_PER_PAGE / (SIZEOF_LONG * 8)) struct mthca_db_page { - unsigned long free[MTHCA_FREE_MAP_SIZE]; - uint64_t *db_rec; + unsigned long free[MTHCA_FREE_MAP_SIZE]; + struct mthca_buf db_rec; }; struct mthca_db_table { @@ -91,7 +91,7 @@ int mthca_alloc_db(struct mthca_db_table } for (i = start; i != 
end; i += dir)
-		if (db_tab->page[i].db_rec)
+		if (db_tab->page[i].db_rec.buf)
 			for (j = 0; j < MTHCA_FREE_MAP_SIZE; ++j)
 				if (db_tab->page[i].free[j])
 					goto found;
@@ -101,18 +101,14 @@ int mthca_alloc_db(struct mthca_db_table
 		goto out;
 	}
 
-	{
-		void *tmp;
-
-		if (posix_memalign(&tmp, MTHCA_DB_REC_PAGE_SIZE,
-				   MTHCA_DB_REC_PAGE_SIZE)) {
-			ret = -1;
-			goto out;
-		}
-		db_tab->page[i].db_rec = tmp;
+	if (mthca_alloc_buf(&db_tab->page[i].db_rec,
+			    MTHCA_DB_REC_PAGE_SIZE,
+			    MTHCA_DB_REC_PAGE_SIZE)) {
+		ret = -1;
+		goto out;
 	}
 
-	memset(db_tab->page[i].db_rec, 0, MTHCA_DB_REC_PAGE_SIZE);
+	memset(db_tab->page[i].db_rec.buf, 0, MTHCA_DB_REC_PAGE_SIZE);
 	memset(db_tab->page[i].free, 0xff, sizeof db_tab->page[i].free);
 
 	if (group == 0)
@@ -140,7 +136,7 @@ found:
 		j = MTHCA_DB_REC_PER_PAGE - 1 - j;
 
 	ret = i * MTHCA_DB_REC_PER_PAGE + j;
-	*db = (uint32_t *) &db_tab->page[i].db_rec[j];
+	*db = db_tab->page[i].db_rec.buf + j * 8;
 
 out:
 	pthread_mutex_unlock(&db_tab->mutex);
@@ -163,7 +159,7 @@ void mthca_free_db(struct mthca_db_table
 	page = db_tab->page + i;
 
 	pthread_mutex_lock(&db_tab->mutex);
-	page->db_rec[j] = 0;
+	*(uint64_t *) (page->db_rec.buf + j * 8) = 0;
 
 	if (i >= db_tab->min_group2)
 		j = MTHCA_DB_REC_PER_PAGE - 1 - j;
@@ -190,7 +186,7 @@ struct mthca_db_table *mthca_alloc_db_ta
 	db_tab->min_group2 = npages - 1;
 
 	for (i = 0; i < npages; ++i)
-		db_tab->page[i].db_rec = NULL;
+		db_tab->page[i].db_rec.buf = NULL;
 
 	return db_tab;
 }
@@ -203,8 +199,8 @@ void mthca_free_db_tab(struct mthca_db_t
 		return;
 
 	for (i = 0; i < db_tab->npages; ++i)
-		if (db_tab->page[i].db_rec)
-			free(db_tab->page[i].db_rec);
+		if (db_tab->page[i].db_rec.buf)
+			mthca_free_buf(&db_tab->page[i].db_rec);
 
 	free(db_tab);
 }
Index: libmthca/src/qp.c
===================================================================
--- libmthca/src/qp.c	(revision 8791)
+++ libmthca/src/qp.c	(working copy)
@@ -58,12 +58,12 @@ static const uint8_t mthca_opcode[] = {
 
 static void *get_recv_wqe(struct mthca_qp *qp, int n)
 {
-	return qp->buf + (n
<< qp->rq.wqe_shift);
+	return qp->buf.buf + (n << qp->rq.wqe_shift);
 }
 
 static void *get_send_wqe(struct mthca_qp *qp, int n)
 {
-	return qp->buf + qp->send_wqe_offset + (n << qp->sq.wqe_shift);
+	return qp->buf.buf + qp->send_wqe_offset + (n << qp->sq.wqe_shift);
 }
 
 void mthca_init_qp_indices(struct mthca_qp *qp)
@@ -821,13 +821,14 @@ int mthca_alloc_qp_buf(struct ibv_pd *pd
 
 	qp->buf_size = qp->send_wqe_offset + (qp->sq.max << qp->sq.wqe_shift);
 
-	if (posix_memalign(&qp->buf, to_mdev(pd->context->device)->page_size,
-			   align(qp->buf_size, to_mdev(pd->context->device)->page_size))) {
+	if (mthca_alloc_buf(&qp->buf,
+			    align(qp->buf_size, to_mdev(pd->context->device)->page_size),
+			    to_mdev(pd->context->device)->page_size)) {
 		free(qp->wrid);
 		return -1;
 	}
 
-	memset(qp->buf, 0, qp->buf_size);
+	memset(qp->buf.buf, 0, qp->buf_size);
 
 	if (mthca_is_memfree(pd->context)) {
 		struct mthca_next_seg *next;
Index: libmthca/src/verbs.c
===================================================================
--- libmthca/src/verbs.c	(revision 8791)
+++ libmthca/src/verbs.c	(working copy)
@@ -188,11 +188,10 @@ struct ibv_cq *mthca_create_cq(struct ib
 		goto err;
 
 	cqe = align_cq_size(cqe);
-	cq->buf = mthca_alloc_cq_buf(to_mdev(context->device), cqe);
-	if (!cq->buf)
+	if (mthca_alloc_cq_buf(to_mdev(context->device), &cq->buf, cqe))
 		goto err;
 
-	cq->mr = __mthca_reg_mr(to_mctx(context)->pd, cq->buf,
+	cq->mr = __mthca_reg_mr(to_mctx(context)->pd, cq->buf.buf,
 				cqe * MTHCA_CQ_ENTRY_SIZE,
 				0, IBV_ACCESS_LOCAL_WRITE);
 	if (!cq->mr)
@@ -251,7 +250,7 @@ err_unreg:
 	mthca_dereg_mr(cq->mr);
 
 err_buf:
-	free(cq->buf);
+	mthca_free_buf(&cq->buf);
 
 err:
 	free(cq);
@@ -264,7 +263,7 @@ int mthca_resize_cq(struct ibv_cq *ibcq,
 	struct mthca_cq *cq = to_mcq(ibcq);
 	struct mthca_resize_cq cmd;
 	struct ibv_mr *mr;
-	void *buf;
+	struct mthca_buf buf;
 	int old_cqe;
 	int ret;
 
@@ -280,17 +279,15 @@ int mthca_resize_cq(struct ibv_cq *ibcq,
 		goto out;
 	}
 
-	buf = mthca_alloc_cq_buf(to_mdev(ibcq->context->device), cqe);
-	if (!buf) {
-		ret = ENOMEM;
+	ret = mthca_alloc_cq_buf(to_mdev(ibcq->context->device), &buf, cqe);
+	if (ret)
 		goto out;
-	}
 
-	mr = __mthca_reg_mr(to_mctx(ibcq->context)->pd, buf,
+	mr = __mthca_reg_mr(to_mctx(ibcq->context)->pd, buf.buf,
 			    cqe * MTHCA_CQ_ENTRY_SIZE,
 			    0, IBV_ACCESS_LOCAL_WRITE);
 	if (!mr) {
-		free(buf);
+		mthca_free_buf(&buf);
 		ret = ENOMEM;
 		goto out;
 	}
@@ -303,14 +300,14 @@ int mthca_resize_cq(struct ibv_cq *ibcq,
 	ret = ibv_cmd_resize_cq(ibcq, cqe - 1, &cmd.ibv_cmd, sizeof cmd);
 	if (ret) {
 		mthca_dereg_mr(mr);
-		free(buf);
+		mthca_free_buf(&buf);
 		goto out;
 	}
 
-	mthca_cq_resize_copy_cqes(cq, buf, old_cqe);
+	mthca_cq_resize_copy_cqes(cq, buf.buf, old_cqe);
 
 	mthca_dereg_mr(cq->mr);
-	free(cq->buf);
+	mthca_free_buf(&cq->buf);
 
 	cq->buf = buf;
 	cq->mr  = mr;
@@ -336,8 +333,7 @@ int mthca_destroy_cq(struct ibv_cq *cq)
 	}
 
 	mthca_dereg_mr(to_mcq(cq)->mr);
-
-	free(to_mcq(cq)->buf);
+	mthca_free_buf(&to_mcq(cq)->buf);
 	free(to_mcq(cq));
 
 	return 0;
@@ -389,7 +385,7 @@ struct ibv_srq *mthca_create_srq(struct
 	if (mthca_alloc_srq_buf(pd, &attr->attr, srq))
 		goto err;
 
-	srq->mr = __mthca_reg_mr(pd, srq->buf, srq->buf_size, 0, 0);
+	srq->mr = __mthca_reg_mr(pd, srq->buf.buf, srq->buf_size, 0, 0);
 	if (!srq->mr)
 		goto err_free;
 
@@ -430,7 +426,7 @@ err_unreg:
 
 err_free:
 	free(srq->wrid);
-	free(srq->buf);
+	mthca_free_buf(&srq->buf);
 
 err:
 	free(srq);
@@ -469,7 +465,7 @@ int mthca_destroy_srq(struct ibv_srq *sr
 
 	mthca_dereg_mr(to_msrq(srq)->mr);
 
-	free(to_msrq(srq)->buf);
+	mthca_free_buf(&to_msrq(srq)->buf);
 	free(to_msrq(srq)->wrid);
 	free(to_msrq(srq));
 
@@ -507,7 +503,7 @@ struct ibv_qp *mthca_create_qp(struct ib
 	    pthread_spin_init(&qp->rq.lock, PTHREAD_PROCESS_PRIVATE))
 		goto err_free;
 
-	qp->mr = __mthca_reg_mr(pd, qp->buf, qp->buf_size, 0, 0);
+	qp->mr = __mthca_reg_mr(pd, qp->buf.buf, qp->buf_size, 0, 0);
 	if (!qp->mr)
 		goto err_free;
 
@@ -574,7 +570,7 @@ err_unreg:
 
 err_free:
 	free(qp->wrid);
-	free(qp->buf);
+	mthca_free_buf(&qp->buf);
 
 err:
 	free(qp);
@@ -655,8 +651,7 @@ int mthca_destroy_qp(struct
ibv_qp *qp)
 	}
 
 	mthca_dereg_mr(to_mqp(qp)->mr);
-
-	free(to_mqp(qp)->buf);
+	mthca_free_buf(&to_mqp(qp)->buf);
 	free(to_mqp(qp)->wrid);
 	free(to_mqp(qp));
 
Index: libmthca/src/mthca.h
===================================================================
--- libmthca/src/mthca.h	(revision 8791)
+++ libmthca/src/mthca.h	(working copy)
@@ -112,6 +112,11 @@ struct mthca_context {
 	int                qp_table_mask;
 };
 
+struct mthca_buf {
+	void   *buf;
+	size_t  length;
+};
+
 struct mthca_pd {
 	struct ibv_pd         ibv_pd;
 	struct mthca_ah_page *ah_list;
@@ -121,7 +126,7 @@ struct mthca_pd {
 
 struct mthca_cq {
 	struct ibv_cq      ibv_cq;
-	void              *buf;
+	struct mthca_buf   buf;
 	pthread_spinlock_t lock;
 	struct ibv_mr     *mr;
 	uint32_t           cqn;
@@ -137,7 +142,7 @@ struct mthca_cq {
 
 struct mthca_srq {
 	struct ibv_srq     ibv_srq;
-	void              *buf;
+	struct mthca_buf   buf;
 	void              *last;
 	pthread_spinlock_t lock;
 	struct ibv_mr     *mr;
@@ -174,7 +179,7 @@ struct mthca_wq {
 
 struct mthca_qp {
 	struct ibv_qp    ibv_qp;
-	void            *buf;
+	struct mthca_buf buf;
 	uint64_t        *wrid;
 	int              send_wqe_offset;
 	int              max_inline_data;
@@ -259,6 +264,9 @@ static inline int mthca_is_memfree(struc
 	return to_mdev(ibctx->device)->hca_type == MTHCA_ARBEL;
 }
 
+int mthca_alloc_buf(struct mthca_buf *buf, size_t size, int page_size);
+void mthca_free_buf(struct mthca_buf *buf);
+
 int mthca_alloc_db(struct mthca_db_table *db_tab, enum mthca_db_type type,
 		   uint32_t **db);
 void mthca_set_db_qn(uint32_t *db, enum mthca_db_type type, uint32_t qn);
@@ -290,7 +298,7 @@ void mthca_arbel_cq_event(struct ibv_cq
 void mthca_cq_clean(struct mthca_cq *cq, uint32_t qpn,
 		    struct mthca_srq *srq);
 void mthca_cq_resize_copy_cqes(struct mthca_cq *cq, void *buf, int new_cqe);
-void *mthca_alloc_cq_buf(struct mthca_device *dev, int cqe);
+int mthca_alloc_cq_buf(struct mthca_device *dev, struct mthca_buf *buf, int nent);
 
 struct ibv_srq *mthca_create_srq(struct ibv_pd *pd,
 				 struct ibv_srq_init_attr *attr);
Index: libmthca/src/cq.c
===================================================================
---
libmthca/src/cq.c	(revision 8791)
+++ libmthca/src/cq.c	(working copy)
@@ -126,7 +126,7 @@ struct mthca_err_cqe {
 
 static inline struct mthca_cqe *get_cqe(struct mthca_cq *cq, int entry)
 {
-	return cq->buf + entry * MTHCA_CQ_ENTRY_SIZE;
+	return cq->buf.buf + entry * MTHCA_CQ_ENTRY_SIZE;
 }
 
 static inline struct mthca_cqe *cqe_sw(struct mthca_cq *cq, int i)
@@ -612,17 +612,16 @@ void mthca_cq_resize_copy_cqes(struct mt
 		       get_cqe(cq, i & old_cqe), MTHCA_CQ_ENTRY_SIZE);
 }
 
-void *mthca_alloc_cq_buf(struct mthca_device *dev, int nent)
+int mthca_alloc_cq_buf(struct mthca_device *dev, struct mthca_buf *buf, int nent)
 {
-	void *buf;
 	int i;
 
-	if (posix_memalign(&buf, dev->page_size,
-			   align(nent * MTHCA_CQ_ENTRY_SIZE, dev->page_size)))
-		return NULL;
+	if (mthca_alloc_buf(buf, align(nent * MTHCA_CQ_ENTRY_SIZE, dev->page_size),
+			    dev->page_size))
+		return -1;
 
 	for (i = 0; i < nent; ++i)
-		((struct mthca_cqe *) buf)[i].owner = MTHCA_CQ_ENTRY_OWNER_HW;
+		((struct mthca_cqe *) buf->buf)[i].owner = MTHCA_CQ_ENTRY_OWNER_HW;
 
-	return buf;
+	return 0;
 }
Index: libmthca/src/srq.c
===================================================================
--- libmthca/src/srq.c	(revision 8791)
+++ libmthca/src/srq.c	(working copy)
@@ -47,7 +47,7 @@
 
 static void *get_wqe(struct mthca_srq *srq, int n)
 {
-	return srq->buf + (n << srq->wqe_shift);
+	return srq->buf.buf + (n << srq->wqe_shift);
 }
 
 /*
@@ -292,13 +292,14 @@ int mthca_alloc_srq_buf(struct ibv_pd *p
 
 	srq->buf_size = srq->max << srq->wqe_shift;
 
-	if (posix_memalign(&srq->buf, to_mdev(pd->context->device)->page_size,
-			   align(srq->buf_size, to_mdev(pd->context->device)->page_size))) {
+	if (mthca_alloc_buf(&srq->buf,
+			    align(srq->buf_size, to_mdev(pd->context->device)->page_size),
+			    to_mdev(pd->context->device)->page_size)) {
 		free(srq->wrid);
 		return -1;
 	}
 
-	memset(srq->buf, 0, srq->buf_size);
+	memset(srq->buf.buf, 0, srq->buf_size);
 
 	/*
 	 * Now initialize the SRQ buffer so that all of the WQEs are
Index: libmthca/src/ah.c
===================================================================
--- libmthca/src/ah.c	(revision 8791)
+++ libmthca/src/ah.c	(working copy)
@@ -45,7 +45,7 @@
 
 struct mthca_ah_page {
 	struct mthca_ah_page *prev, *next;
-	void                 *buf;
+	struct mthca_buf      buf;
 	struct ibv_mr        *mr;
 	int                   use_cnt;
 	unsigned              free[0];
@@ -60,14 +60,14 @@ static struct mthca_ah_page *__add_page(
 	if (!page)
 		return NULL;
 
-	if (posix_memalign(&page->buf, page_size, page_size)) {
+	if (mthca_alloc_buf(&page->buf, page_size, page_size)) {
 		free(page);
 		return NULL;
 	}
 
-	page->mr = mthca_reg_mr(&pd->ibv_pd, page->buf, page_size, 0);
+	page->mr = mthca_reg_mr(&pd->ibv_pd, page->buf.buf, page_size, 0);
 	if (!page->mr) {
-		free(page->buf);
+		mthca_free_buf(&page->buf);
 		free(page);
 		return NULL;
 	}
@@ -123,7 +123,7 @@ int mthca_alloc_av(struct mthca_pd *pd,
 			if (page->free[i]) {
 				j = ffs(page->free[i]);
 				page->free[i] &= ~(1 << (j - 1));
-				ah->av = page->buf +
+				ah->av = page->buf.buf +
 					(i * 8 * sizeof (int) + (j - 1)) * sizeof *ah->av;
 				break;
 			}
@@ -172,7 +172,7 @@ void mthca_free_av(struct mthca_ah *ah)
 		pthread_mutex_lock(&pd->ah_mutex);
 
 		page = ah->page;
-		i = ((void *) ah->av - page->buf) / sizeof *ah->av;
+		i = ((void *) ah->av - page->buf.buf) / sizeof *ah->av;
 		page->free[i / (8 * sizeof (int))] |= 1 << (i % (8 * sizeof (int)));
 
 		if (!--page->use_cnt) {
@@ -184,7 +184,7 @@ void mthca_free_av(struct mthca_ah *ah)
 				page->next->prev = page->prev;
 
 			mthca_dereg_mr(page->mr);
-			free(page->buf);
+			mthca_free_buf(&page->buf);
 			free(page);
 		}
 
Index: libmthca/ChangeLog
===================================================================
--- libmthca/ChangeLog	(revision 8791)
+++ libmthca/ChangeLog	(working copy)
@@ -1,3 +1,19 @@
+2006-07-26  Roland Dreier
+
+	* src/mthca.h, src/ah.c, src/cq.c, src/memfree.c, src/qp.c,
+	src/srq.c, src/verbs.c: Convert internal allocations for AH pages
+	(for non-memfree HCAs), CQ buffers, doorbell pages (for memfree
+	HCAs), QP buffers and SRQ buffers to use the new buffer
+	allocator.
This makes libmthca fork()-clean when built against
+	libibverbs 1.1.
+
+	* src/buf.c (mthca_alloc_buf, mthca_free_buf): Add new functions
+	to wrap up allocating page-aligned buffers.  The new functions
+	will call ibv_dontfork_range()/ibv_dofork_range() to do proper
+	madvise()ing to handle fork(), if applicable.
+
+	* configure.in: Check for ibv_dontfork_range() and ibv_dofork_range().
+
 2006-07-04  Dotan Barak
 
 	* src/verbs.c (mthca_create_cq, mthca_resize_cq): Passing huge
Index: libmthca/Makefile.am
===================================================================
--- libmthca/Makefile.am	(revision 8791)
+++ libmthca/Makefile.am	(working copy)
@@ -12,10 +12,9 @@ else
     mthca_version_script =
 endif
 
-src_mthca_la_SOURCES = src/ah.c src/cq.c src/memfree.c src/mthca.c src/qp.c \
-    src/srq.c src/verbs.c
-src_mthca_la_LDFLAGS = -avoid-version -module \
-    $(mthca_version_script)
+src_mthca_la_SOURCES = src/ah.c src/buf.c src/cq.c src/memfree.c src/mthca.c \
+    src/qp.c src/srq.c src/verbs.c
+src_mthca_la_LDFLAGS = -avoid-version -module $(mthca_version_script)
 
 DEBIAN = debian/changelog debian/compat debian/control debian/copyright \
     debian/libmthca1.install debian/libmthca-dev.install debian/rules

From rdreier at cisco.com Mon Jul 31 12:04:21 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 31 Jul 2006 12:04:21 -0700
Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal.
In-Reply-To: <20060731184415.GB1998@greglaptop.internal.keyresearch.com> (Greg Lindahl's message of "Mon, 31 Jul 2006 11:44:15 -0700")
References: <20060731151538.GA23392@greglaptop.hsd1.ca.comcast.net>
	<54AD0F12E08D1541B826BE97C98F99F17DC87D@NT-SJCA-0751.brcm.ad.broadcom.com>
	<20060731171721.GB1098@greglaptop.internal.keyresearch.com>
	<20060731173832.GG1098@greglaptop.internal.keyresearch.com>
	<1154368996.3078.10.camel@stevo-desktop>
	<20060731182713.GO1098@greglaptop.internal.keyresearch.com>
	<20060731184415.GB1998@greglaptop.internal.keyresearch.com>
Message-ID:

    Greg> This is an issue important enough that having an explicit
    Greg> discussion is a good idea. It shouldn't have come up as part
    Greg> of a patch.

I'm not really convinced of the importance. To me the exact prefix of
the verbs library symbols seems like a bike shed.

Unless someone else has a problem with the rmdav_ name then I think we
should let this die.

 - R.

From mshefty at ichips.intel.com Mon Jul 31 12:10:15 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 31 Jul 2006 12:10:15 -0700
Subject: [openib-general] Fwd: issues in ipoib
In-Reply-To:
References: <20060731180135.GE11778@mellanox.co.il>
Message-ID: <44CE5597.2050800@ichips.intel.com>

Roland Dreier wrote:
>  > 4. module unloading races
>  > http://openib.org/pipermail/openib-general/2006-April/020397.html
>
> Might be worth looking at to see if it fixes the problems after Sean's
> multicast changes. Has anyone come up with a way to reliably get the
> crash to happen?

I think the problems resulting from the multicast changes have been resolved. I
wasn't treating an SA query id of 0 as valid, which threw off the reference
counting by 1. (I got a callback that I wasn't expecting.)

With ipoib using the new multicast module, I'm not sure that it still has a
module unload issue. The problem gets pushed down to the multicast module instead.

- Sean

From rdreier at cisco.com Mon Jul 31 12:26:05 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 31 Jul 2006 12:26:05 -0700
Subject: [openib-general] Fwd: issues in ipoib
In-Reply-To: <44CE5597.2050800@ichips.intel.com> (Sean Hefty's message of "Mon, 31 Jul 2006 12:10:15 -0700")
References: <20060731180135.GE11778@mellanox.co.il>
	<44CE5597.2050800@ichips.intel.com>
Message-ID:

    Sean> I think the problems resulting from the multicast changes
    Sean> have been resolved. I wasn't treating an SA query id of 0
    Sean> as valid, which threw off the reference counting by 1. (I
    Sean> got a callback that I wasn't expecting.)

OK, cool, I missed that fix.

Was anyone able to reproduce the problem reliably? If so can you
check that it's gone?

 - R.

From mshefty at ichips.intel.com Mon Jul 31 12:30:49 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 31 Jul 2006 12:30:49 -0700
Subject: [openib-general] Fwd: issues in ipoib
In-Reply-To:
References: <20060731180135.GE11778@mellanox.co.il>
	<44CE5597.2050800@ichips.intel.com>
Message-ID: <44CE5A69.9080301@ichips.intel.com>

Roland Dreier wrote:
> Was anyone able to reproduce the problem reliably? If so can you
> check that it's gone?

I wasn't ever able to reproduce the crash, but I modified a local copy of the
code and was able to detect that the reference counts were off from what I was
expecting. The fix at least removed this problem for me.

- Sean

From mst at mellanox.co.il Mon Jul 31 12:58:55 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 31 Jul 2006 22:58:55 +0300
Subject: [openib-general] Fwd: issues in ipoib
In-Reply-To: <44CE5597.2050800@ichips.intel.com>
References: <44CE5597.2050800@ichips.intel.com>
Message-ID: <20060731195855.GA13597@mellanox.co.il>

Quoting r. Sean Hefty :
> Subject: Re: [openib-general] Fwd: issues in ipoib
>
> Roland Dreier wrote:
> >  > 4.
module unloading races
> >  > http://openib.org/pipermail/openib-general/2006-April/020397.html
> >
> > Might be worth looking at to see if it fixes the problems after Sean's
> > multicast changes. Has anyone come up with a way to reliably get the
> > crash to happen?

Yes, this was posted on the list - the instructions just don't happen to trigger
the crash for Sean :(

> I think the problems resulting from the multicast changes have been resolved. I
> wasn't treating an SA query id of 0 as valid, which threw off the reference
> counting by 1. (I got a callback that I wasn't expecting.)

I think I saw crash reports on latest trunk last time I was running it.
I haven't the resources to re-test it at the moment, but I haven't
seen changes in ipoib/multicast since.

>
> With ipoib using the new multicast module, I'm not sure that it still has a
> module unload issue. The problem gets pushed down to the multicast module instead.

Then the problem will be in multicast module unload instead.

--
MST

From greg.lindahl at qlogic.com Mon Jul 31 13:05:20 2006
From: greg.lindahl at qlogic.com (Greg Lindahl)
Date: Mon, 31 Jul 2006 13:05:20 -0700
Subject: [openib-general] [PATCH 0/6] Tranport Neutral Verbs Proposal.
In-Reply-To:
References: <54AD0F12E08D1541B826BE97C98F99F17DC87D@NT-SJCA-0751.brcm.ad.broadcom.com>
	<20060731171721.GB1098@greglaptop.internal.keyresearch.com>
	<20060731173832.GG1098@greglaptop.internal.keyresearch.com>
	<1154368996.3078.10.camel@stevo-desktop>
	<20060731182713.GO1098@greglaptop.internal.keyresearch.com>
	<20060731184415.GB1998@greglaptop.internal.keyresearch.com>
Message-ID: <20060731200520.GB2314@greglaptop.internal.keyresearch.com>

On Mon, Jul 31, 2006 at 12:04:21PM -0700, Roland Dreier wrote:

> Unless someone else has a problem with the rmdav_ name then I think we
> should let this die.

Sounds like a call for an open discussion on it, with a proper subject
line, even. And asking outside of openib-general. Which is what I am
suggesting.

-- greg

From halr at voltaire.com Mon Jul 31 14:58:29 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 31 Jul 2006 17:58:29 -0400
Subject: [openib-general] Critical OpenSM and diags issues over the next week
Message-ID: <1154383107.17511.93594.camel@hal.voltaire.com>

As I will be on vacation starting tomorrow (8/1) through next Monday
(8/7), Sasha will be dealing with OpenSM and diags maintainership during
that time. Please copy him on any critical issues.

-- Hal

From mshefty at ichips.intel.com Mon Jul 31 15:11:29 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 31 Jul 2006 15:11:29 -0700
Subject: [openib-general] Fwd: issues in ipoib
In-Reply-To: <20060731195855.GA13597@mellanox.co.il>
References: <44CE5597.2050800@ichips.intel.com>
	<20060731195855.GA13597@mellanox.co.il>
Message-ID: <44CE8011.40201@ichips.intel.com>

Michael S. Tsirkin wrote:
> I think I saw crash reports on latest trunk last time I was running it.
> I haven't the resources to re-test it at the moment, but I haven't
> seen changes in ipoib/multicast since.

I would definitely like to confirm that the crash still occurs if it is still
happening. If it is I will spend some additional time investigating possible
causes.

- Sean

From venkatesh.babu at 3leafnetworks.com Mon Jul 31 19:19:42 2006
From: venkatesh.babu at 3leafnetworks.com (Venkatesh Babu)
Date: Mon, 31 Jul 2006 19:19:42 -0700
Subject: [openib-general] APM: QP migration state change when failover triggered by hw
In-Reply-To: <200607301005.32499.jackm@mellanox.co.il>
References: <44B55981.6040408@3leafnetworks.com>
	<200607301005.32499.jackm@mellanox.co.il>
Message-ID: <44CEBA3E.3060208@3leafnetworks.com>

I was testing the APM (Automatic Path Migration) functionality. I found
that OFED 1.0 doesn't support it yet, and it is also not planned for
OFED 1.1. It would be interesting to know when this feature is going to
be added.

I found that OFED 1.0 has some bugs and is missing some components
needed to support this feature.
So I opened #160, #172 and #159 to track these.

To answer your question -

Configuration1: Node1 and Node2 connected directly with two IB cables,
without a switch.
Configuration2: Node1 and Node2 connected through two switches, one for
each port.
    Node1, port1 -> switch1 -> Node2, port1
    Node1, port2 -> switch2 -> Node2, port2

Node 1:
1. Call ib_cm_listen() to wait for connection requests.
2. When a REQ message arrives, create an RC QP and establish a connection.
3. Set up callback handlers to receive packets.
4. Receive packets, verify them and drop them.
5. Event IB_MIG_MIGRATED received.
6. Stopped receiving packets.

Node 2:
1. Create an RC QP.
2. Send a REQ message to Node 1 to establish the connection (load both
   primary and alternate paths).
3. Continuously send some packets.
4. Simulate the port failure by unplugging the IB cable.
5. Event IB_MIG_MIGRATED received.

Actually I fixed this problem in Configuration1 by calling ib_modify_qp()
to change the mig_state from IB_MIG_ARMED to IB_MIG_MIGRATED when the
IB_EVENT_PORT_ERR event occurs.

But with Configuration2, when the IB_EVENT_PORT_ERR event occurs on one
node, failover to the alternate path doesn't work and the traffic stops,
because Node1 doesn't know when the IB_EVENT_PORT_ERR event occurred on
Node2. This requires an interface similar to the Gen1 interface
tsIbSetOutofServiceNoticeHandler().

VBabu

Jack Morgenstein wrote:

>On Wednesday 12 July 2006 23:20, Venkatesh Babu wrote:
>
>
>>   With OFED 1.0, when cable is removed from the port corresponding to the
>>primary path, CI sends an event IB_EVENT_PATH_MIG, but is not changing
>>the state to "Migrated" and not migrating to the alternate path. So the
>>traffic doesn't resume on the alternate path.
>>
>>
>>
>
>Could you please describe your flow in more detail (including setup phase),
>and, if possible, send us a small test program which illustrates your
>problem?
>
>Thanks!
>- Jack
>
>

From krkumar2 at in.ibm.com Mon Jul 31 22:23:14 2006
From: krkumar2 at in.ibm.com (Krishna Kumar2)
Date: Tue, 1 Aug 2006 10:53:14 +0530
Subject: [openib-general] [PATCH 1/6] libibverbs include files changes.
In-Reply-To: <44CE3B20.8030907@ichips.intel.com>
Message-ID:

Hi Sean,

I still have a problem. Many enums have both IB specific and RDMA generic
defines. E.g.:

enum ibv_event_type {
        IBV_EVENT_CQ_ERR,
        IBV_EVENT_QP_FATAL,
        IBV_EVENT_QP_REQ_ERR,
        IBV_EVENT_QP_ACCESS_ERR,
        IBV_EVENT_COMM_EST,
        IBV_EVENT_SQ_DRAINED,
        IBV_EVENT_PATH_MIG,
        IBV_EVENT_PATH_MIG_ERR,
        IBV_EVENT_DEVICE_FATAL,
        IBV_EVENT_PORT_ACTIVE,
        IBV_EVENT_PORT_ERR,
        IBV_EVENT_LID_CHANGE,
        IBV_EVENT_PKEY_CHANGE,
        IBV_EVENT_SM_CHANGE,
        IBV_EVENT_SRQ_ERR,
        IBV_EVENT_SRQ_LIMIT_REACHED,
        IBV_EVENT_QP_LAST_WQE_REACHED,
        IBV_EVENT_CLIENT_REREGISTER
};

It looks weird to keep some elements in the above enum as IBV_ and others as
RDMA_. A lot of other enums have this same problem, e.g. ibv_qp_attr_mask; in
fact this is true of all enums/structures/defines that have some elements that
are IB specific.

Thanks,

- KK

Sean Hefty wrote on 07/31/2006 10:47:20 PM:

> > - "Path records are IB specific. Not sure we need to rename them" and "These
> > changes look fine. We just need to decide if we want to change everything
> > that's ibv_* to rdma_*, or keep IB specific names (path records, GIDs, PKeys,
> > etc.) the same."
> >
> > I had indicated this in my "Information notes" in the [PATCH 0/6] : "IB
> > specific routines are also converted to use RDMA generic API's for sake of
> > uniformness (knowing that transport dependent names will be removed once all
> > apps are converted)."
> >
> > The issue is between deciding to have either rd(ma)_v or ibv_ for IB specific
> > structures. Currently there is no other transport other than IB that has
> > these specific structures, but if that changes it might be better to keep the
> > name transport agnostic.
Another reason that I see at this time is to have
> > uniform names which means that this library exports names using one prefix -
> > this means that I do not have to care about the underlying transport type and
> > I also do not have to remember that ibv_ is for [a, b, c, d] operations and
> > rdma_ is for [e, f, g, h] operations. What do you think?
>
> If an application is looking at a path record, GID, PKey, etc. then they _are_
> caring about the underlying transport type and the fact that it is IB. An
> application that wants to be transport neutral would just need to limit itself
> to using rdma_* structures and APIs.
>
> If we take a larger view, I don't think we want transport neutral names for the
> IB CM and IB MAD userspace APIs and structures. Things like path records, GIDs,
> etc. are also used by those libraries.
>
> - Sean
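
The mixed-enum concern raised at the top of this message can be sketched as follows. This is only an illustration under stated assumptions, not code from the thread or from libibverbs: the enum is a local copy of the list quoted above (so the sketch compiles without the library headers), the helper event_is_ib_specific() is hypothetical, and the choice of which events count as IB specific (path migration, LID, P_Key, SM and client re-register events) is an assumption that follows Sean's remark that path records, GIDs and PKeys are IB concepts.

```c
#include <stdbool.h>

/*
 * Local copy of the event list quoted in the message, so the sketch
 * builds without libibverbs. Not the library header itself.
 */
enum ibv_event_type {
	IBV_EVENT_CQ_ERR,
	IBV_EVENT_QP_FATAL,
	IBV_EVENT_QP_REQ_ERR,
	IBV_EVENT_QP_ACCESS_ERR,
	IBV_EVENT_COMM_EST,
	IBV_EVENT_SQ_DRAINED,
	IBV_EVENT_PATH_MIG,
	IBV_EVENT_PATH_MIG_ERR,
	IBV_EVENT_DEVICE_FATAL,
	IBV_EVENT_PORT_ACTIVE,
	IBV_EVENT_PORT_ERR,
	IBV_EVENT_LID_CHANGE,
	IBV_EVENT_PKEY_CHANGE,
	IBV_EVENT_SM_CHANGE,
	IBV_EVENT_SRQ_ERR,
	IBV_EVENT_SRQ_LIMIT_REACHED,
	IBV_EVENT_QP_LAST_WQE_REACHED,
	IBV_EVENT_CLIENT_REREGISTER
};

/*
 * Hypothetical helper: returns true for events that only make sense on
 * an InfiniBand transport. The classification is an assumption made for
 * this sketch; a transport-neutral application would ignore these.
 */
static bool event_is_ib_specific(enum ibv_event_type ev)
{
	switch (ev) {
	case IBV_EVENT_PATH_MIG:
	case IBV_EVENT_PATH_MIG_ERR:
	case IBV_EVENT_LID_CHANGE:
	case IBV_EVENT_PKEY_CHANGE:
	case IBV_EVENT_SM_CHANGE:
	case IBV_EVENT_CLIENT_REREGISTER:
		return true;
	default:
		return false;
	}
}
```

Under this split, a single enum holds members of both kinds, which is the naming problem described above: renaming only the generic members to RDMA_ would leave one enum with two prefixes.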