From vlad at lists.openfabrics.org Mon Sep 1 03:05:40 2008
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox)
Date: Mon, 1 Sep 2008 03:05:40 -0700 (PDT)
Subject: [ofa-general] ofa_1_4_kernel 20080901-0200 daily build status
Message-ID: <20080901100540.6A9C2E608DC@openfabrics.org>

This email was generated automatically, please do not reply

git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git
git_branch: ofed_kernel

Common build parameters:

Passed:
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.26
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.18-53.el5
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.22.5-31-default
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.26
Passed on ia64 with linux-2.6.25
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.19
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.18-8.el5
Passed on ppc64 with linux-2.6.24

Failed:

From vlad at mellanox.co.il Mon Sep 1 07:11:03 2008
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Mon, 1 Sep 2008 17:11:03 +0300
Subject: [ofa-general] [PATCH] IB/mlx4: Set RAE and FRE flags, initialize mtt_sz field in the mpt entry.
Message-ID: <20080901141103.GA32171@mellanox.co.il>

rae - enable remote access on this fast memory region.
fre - enable Fast Registration Operations on this region.
mtt_sz - number of MTT entries allocated for this memory region.

Signed-off-by: Vladimir Sokolovsky
---
 drivers/net/mlx4/mr.c | 6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/net/mlx4/mr.c b/drivers/net/mlx4/mr.c
index 62071d9..9c026e1 100644
--- a/drivers/net/mlx4/mr.c
+++ b/drivers/net/mlx4/mr.c
@@ -67,7 +67,8 @@ struct mlx4_mpt_entry {
 #define MLX4_MPT_FLAG_PHYSICAL (1 << 9)
 #define MLX4_MPT_FLAG_REGION (1 << 8)

-#define MLX4_MPT_PD_FLAG_FAST_REG (1 << 26)
+#define MLX4_MPT_PD_FLAG_FAST_REG (1 << 27)
+#define MLX4_MPT_PD_FLAG_RAE (1 << 28)
 #define MLX4_MPT_PD_FLAG_EN_INV (3 << 24)

 #define MLX4_MTT_FLAG_PRESENT 1
@@ -349,6 +350,9 @@ int mlx4_mr_enable(struct mlx4_dev *dev, struct mlx4_mr *mr)
         /* fast register MR in free state */
         mpt_entry->flags |= cpu_to_be32(MLX4_MPT_FLAG_FREE);
         mpt_entry->pd_flags |= cpu_to_be32(MLX4_MPT_PD_FLAG_FAST_REG);
+        mpt_entry->pd_flags |= cpu_to_be32(MLX4_MPT_PD_FLAG_RAE);
+        mpt_entry->mtt_sz = cpu_to_be32((1 << mr->mtt.order) *
+                                        MLX4_MTT_ENTRY_PER_SEG);
     } else {
         mpt_entry->flags |= cpu_to_be32(MLX4_MPT_FLAG_SW_OWNS);
     }
--
1.6.0.1.90.g27a6e

From kliteyn at dev.mellanox.co.il Mon Sep 1 08:03:28 2008
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Mon, 01 Sep 2008 18:03:28 +0300
Subject: [ofa-general] [PATCH] opensm/Makefile.am: adding yacc-generated .h file as dependency
Message-ID: <48BC0440.8050807@dev.mellanox.co.il>

Hi Sasha,

Adding header file that is produced by yacc/bison to the general
dependencies. W/o it compiling of lex-generated .c file sometimes fails.

Signed-off-by: Yevgeny Kliteynik
---
 opensm/opensm/Makefile.am | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/opensm/opensm/Makefile.am b/opensm/opensm/Makefile.am
index 7ca4c2a..f94842c 100644
--- a/opensm/opensm/Makefile.am
+++ b/opensm/opensm/Makefile.am
@@ -126,7 +126,7 @@ opensminclude_HEADERS = \
     $(srcdir)/../include/opensm/osm_vl15intf.h \
     $(top_builddir)/include/opensm/osm_version.h

-BUILT_SOURCES = osm_version
+BUILT_SOURCES = osm_version osm_qos_parser_y.h
 osm_version:
     if [ -x $(top_srcdir)/../gen_ver.sh ] ; then \
         ver_file=$(top_builddir)/include/opensm/osm_version.h ; \
--
1.5.1.4

From rdreier at cisco.com Mon Sep 1 08:48:09 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 01 Sep 2008 08:48:09 -0700
Subject: [ofa-general] Re: [PATCH] IB/mlx4: Set RAE and FRE flags, initialize mtt_sz field in the mpt entry.
In-Reply-To: <20080901141103.GA32171@mellanox.co.il> (Vladimir Sokolovsky's message of "Mon, 1 Sep 2008 17:11:03 +0300")
References: <20080901141103.GA32171@mellanox.co.il>
Message-ID:

I need help deciding whether to get this in 2.6.27 or not. With this
patch, how is send queue fast register working? If this is the last
fix then I think we can get it in 2.6.27. If you are still debugging
and it still doesn't work well, then I might want to wait and see how
big the required fixes end up being.

Thanks,
Roland

From arlin.r.davis at intel.com Mon Sep 1 19:25:59 2008
From: arlin.r.davis at intel.com (Arlin Davis)
Date: Mon, 1 Sep 2008 19:25:59 -0700
Subject: [ofa-general] [PATCH 2/5][v1.2] dapl: fix compiler warnings in common code
Message-ID: <000101c90ca3$3ced0a60$5464fe0a@amr.corp.intel.com>

Cleanup uDAPL common code.

Signed-off by: Arlin Davis ardavis at ichips.intel.com
---
 dapl/common/dapl_ep_get_status.c | 1 +
 dapl/common/dapl_ep_modify.c | 2 +-
 dapl/common/dapl_rmr_bind.c | 4 +++-
 dapl/udapl/dapl_evd_wait.c | 17 ++++++++++-------
 dapl/udapl/dapl_lmr_create.c | 5 +++--
 5 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/dapl/common/dapl_ep_get_status.c b/dapl/common/dapl_ep_get_status.c
index a931355..3266134 100644
--- a/dapl/common/dapl_ep_get_status.c
+++ b/dapl/common/dapl_ep_get_status.c
@@ -38,6 +38,7 @@

 #include "dapl.h"
 #include "dapl_ring_buffer_util.h"
+#include "dapl_cookie.h"

 /*
  * dapl_ep_get_status
diff --git a/dapl/common/dapl_ep_modify.c b/dapl/common/dapl_ep_modify.c
index 74e3331..05b39db 100644
--- a/dapl/common/dapl_ep_modify.c
+++ b/dapl/common/dapl_ep_modify.c
@@ -84,7 +84,7 @@ dapl_ep_modify (
 {
     DAPL_IA *ia;
     DAPL_EP *ep1, *ep2;
-    DAT_EP_ATTR ep_attr1, ep_attr2;
+    DAT_EP_ATTR ep_attr1 = {0}, ep_attr2 = {0};
     DAPL_EP new_ep, copy_of_old_ep;
     DAPL_EP alloc_ep; /* Holder for resources. */
     DAPL_PZ *tmp_pz;
diff --git a/dapl/common/dapl_rmr_bind.c b/dapl/common/dapl_rmr_bind.c
index 905ea2c..c9dc02f 100644
--- a/dapl/common/dapl_rmr_bind.c
+++ b/dapl/common/dapl_rmr_bind.c
@@ -84,15 +84,17 @@ dapli_rmr_bind_fuse (
     DAPL_COOKIE *cookie;
     DAT_RETURN dat_status;
     DAT_BOOLEAN is_signaled;
+    DAPL_HASH_DATA hash_lmr;

     dat_status = dapls_hash_search (rmr->header.owner_ia->hca_ptr->lmr_hash_table,
                                     lmr_triplet->lmr_context,
-                                    (DAPL_HASH_DATA *) &lmr);
+                                    &hash_lmr);
     if ( DAT_SUCCESS != dat_status)
     {
         dat_status = DAT_ERROR (DAT_INVALID_PARAMETER, DAT_INVALID_ARG2);
         goto bail;
     }
+    lmr = (DAPL_LMR*)hash_lmr;

     /* if the ep in unconnected return an error. IB requires that the */
     /* QP be connected to change a memory window binding since: */
diff --git a/dapl/udapl/dapl_evd_wait.c b/dapl/udapl/dapl_evd_wait.c
index 966cef0..a03c5ea 100644
--- a/dapl/udapl/dapl_evd_wait.c
+++ b/dapl/udapl/dapl_evd_wait.c
@@ -141,27 +141,30 @@ DAT_RETURN dapl_evd_wait (
     waitable = evd_ptr->evd_waitable;

     dapl_os_assert ( sizeof(DAT_COUNT) == sizeof(DAPL_EVD_STATE) );
-    evd_state = dapl_os_atomic_assign ( (DAPL_ATOMIC *)&evd_ptr->evd_state,
-                                        (DAT_COUNT) DAPL_EVD_STATE_OPEN,
-                                        (DAT_COUNT) DAPL_EVD_STATE_WAITED );
-    dapl_os_unlock ( &evd_ptr->header.lock );
+    evd_state = evd_ptr->evd_state;
+    if (evd_ptr->evd_state == DAPL_EVD_STATE_OPEN)
+        evd_ptr->evd_state = DAPL_EVD_STATE_WAITED;

     if ( evd_state != DAPL_EVD_STATE_OPEN )
     {
         /* Bogus state, bail out */
         dat_status = DAT_ERROR (DAT_INVALID_STATE,0);
+        dapl_os_unlock ( &evd_ptr->header.lock );
         goto bail;
     }

     if (!waitable)
     {
         /* This EVD is not waitable, reset the state and bail */
-        (void) dapl_os_atomic_assign ((DAPL_ATOMIC *)&evd_ptr->evd_state,
-                                      (DAT_COUNT) DAPL_EVD_STATE_WAITED,
-                                      evd_state);
+        if (evd_ptr->evd_state == DAPL_EVD_STATE_WAITED)
+            evd_ptr->evd_state = evd_state;
+
         dat_status = DAT_ERROR (DAT_INVALID_STATE, DAT_INVALID_STATE_EVD_UNWAITABLE);
+        dapl_os_unlock ( &evd_ptr->header.lock );
         goto bail;
     }

+    dapl_os_unlock ( &evd_ptr->header.lock );
+
     /*
      * We now own the EVD, even though we don't have the lock anymore,
diff --git a/dapl/udapl/dapl_lmr_create.c b/dapl/udapl/dapl_lmr_create.c
index 2a71864..b2492ea 100644
--- a/dapl/udapl/dapl_lmr_create.c
+++ b/dapl/udapl/dapl_lmr_create.c
@@ -202,6 +202,7 @@ dapli_lmr_create_lmr (
     DAPL_LMR *lmr;
     DAT_REGION_DESCRIPTION reg_desc;
     DAT_RETURN dat_status;
+    DAPL_HASH_DATA hash_lmr;

     dapl_dbg_log (DAPL_DBG_TYPE_API,
                   "dapl_lmr_create_lmr (%p, %p, %p, %x, %p, %p, %p, %p)\n",
@@ -215,13 +216,13 @@ dapli_lmr_create_lmr (

     dat_status = dapls_hash_search (ia->hca_ptr->lmr_hash_table,
                                     original_lmr->param.lmr_context,
-                                    (DAPL_HASH_DATA *) &lmr);
+                                    &hash_lmr);
     if ( dat_status != DAT_SUCCESS )
     {
         dat_status = DAT_ERROR (DAT_INVALID_PARAMETER,DAT_INVALID_ARG2);
         goto bail;
     }
-
+    lmr = (DAPL_LMR*)hash_lmr;
     reg_desc.for_lmr_handle = (DAT_LMR_HANDLE) original_lmr;

     lmr = dapl_lmr_alloc (ia,
--
1.5.2.5

From arlin.r.davis at intel.com Mon Sep 1 19:25:54 2008
From: arlin.r.davis at intel.com (Arlin Davis)
Date: Mon, 1 Sep 2008 19:25:54 -0700
Subject: [ofa-general] [PATCH 1/5][v1.2] dtest/dapltest: fix compiler warnings, add _GNU_SOURCE to test application builds
Message-ID: <000001c90ca3$3b3dd3c0$5464fe0a@amr.corp.intel.com>

Patch set to cleanup all warnings and fix fedora build issues.

dtest/dapltest: fix all compiler warnings, cleanup test code,
build with -Wall, -D_GNU_SOURCE.

Signed-off by: Arlin Davis ardavis at ichips.intel.com
---
 test/dapltest/Makefile.am | 2 +-
 test/dapltest/cmd/dapl_netaddr.c | 2 +-
 test/dapltest/test/dapl_limit.c | 125 ++++++++++++++++++--------------------
 test/dtest/Makefile.am | 2 +-
 test/dtest/dtest.c | 62 ++++++++-----------
 5 files changed, 89 insertions(+), 104 deletions(-)

diff --git a/test/dapltest/Makefile.am b/test/dapltest/Makefile.am
index 1a19c53..d826f80 100755
--- a/test/dapltest/Makefile.am
+++ b/test/dapltest/Makefile.am
@@ -3,7 +3,7 @@ INCLUDES = -I include \
     -I $(srcdir)/../../dat/include

 bin_PROGRAMS = dapltest
-
+dapltest_CFLAGS = -g -Wall -D_GNU_SOURCE
 dapltest_SOURCES = \
     cmd/dapl_main.c \
     cmd/dapl_params.c \
diff --git a/test/dapltest/cmd/dapl_netaddr.c b/test/dapltest/cmd/dapl_netaddr.c
index a306335..e1600d5 100644
--- a/test/dapltest/cmd/dapl_netaddr.c
+++ b/test/dapltest/cmd/dapl_netaddr.c
@@ -90,7 +90,7 @@ DT_NetAddrLookupHostAddress (DAT_IA_ADDRESS_PTR to_netaddr,
             whatzit = "service unavailable";
             break;
         }
-#if !defined(WIN32)
+#if !defined(WIN32) && defined(__USE_GNU)
         case EAI_ADDRFAMILY:
         {
             whatzit = "node has no address in this family";
diff --git a/test/dapltest/test/dapl_limit.c b/test/dapltest/test/dapl_limit.c
index f619edd..e308bef 100644
--- a/test/dapltest/test/dapl_limit.c
+++ b/test/dapltest/test/dapl_limit.c
@@ -36,13 +36,13 @@
 static bool
 more_handles (DT_Tdep_Print_Head *phead,
-              DAT_HANDLE **old_ptrptr, /* pointer to current pointer */
+              void **old_ptrptr, /* pointer to current pointer */
               unsigned int *old_count, /* number pointed to */
               unsigned int size) /* size of one datum */
 {
     unsigned int count = *old_count;
-    DAT_HANDLE *old_handles = *old_ptrptr;
-    DAT_HANDLE *handle_tmp = DT_Mdep_Malloc (count * 2 * size);
+    void *old_handles = *old_ptrptr;
+    void *handle_tmp = DT_Mdep_Malloc (count * 2 * size);

     if (!handle_tmp)
     {
@@ -166,9 +166,9 @@ limit_test ( DT_Tdep_Print_Head *phead,
             DAT_EVD_HANDLE ia_async_handle;
         } OneOpen;
-        unsigned int count = START_COUNT;
-        OneOpen *hdlptr = (OneOpen *)
-                          DT_Mdep_Malloc (count * sizeof (*hdlptr));
+        unsigned int count = START_COUNT;
+        void *hptr = DT_Mdep_Malloc (count * sizeof(OneOpen));
+        OneOpen *hdlptr = (OneOpen *)hptr;

         /* IA Exhaustion test loop */
         if (hdlptr)
@@ -181,14 +181,13 @@ limit_test ( DT_Tdep_Print_Head *phead,
             {
                 DT_Mdep_Schedule();
                 if (w == count
-                    && !more_handles (phead, (DAT_HANDLE **) &hdlptr,
-                                      &count,
-                                      sizeof (*hdlptr)))
+                    && !more_handles (phead, &hptr, &count, sizeof(*hdlptr)))
                 {
                     DT_Tdep_PT_Printf (phead, "%s: IAs opened: %d\n", module, w);
                     retval = true;
                     break;
                 }
+                hdlptr = (OneOpen *)hptr;
                 /* Specify that we want to get back an async EVD. */
                 hdlptr[w].ia_async_handle = DAT_HANDLE_NULL;
                 ret = dat_ia_open (cmd->device_name,
@@ -265,9 +264,9 @@ limit_test ( DT_Tdep_Print_Head *phead,
         /*
          * See how many PZs we can create
          */
-        unsigned int count = START_COUNT;
-        DAT_PZ_HANDLE *hdlptr = (DAT_PZ_HANDLE *)
-                                DT_Mdep_Malloc (count * sizeof (*hdlptr));
+        unsigned int count = START_COUNT;
+        void *hptr = DT_Mdep_Malloc (count * sizeof(DAT_PZ_HANDLE));
+        DAT_PZ_HANDLE *hdlptr = (DAT_PZ_HANDLE *)hptr;

         /* PZ Exhaustion test loop */
         if (hdlptr)
@@ -282,14 +281,13 @@ limit_test ( DT_Tdep_Print_Head *phead,
             {
                 DT_Mdep_Schedule();
                 if (w == count
-                    && !more_handles (phead, (DAT_HANDLE **) &hdlptr,
-                                      &count,
-                                      sizeof (*hdlptr)))
+                    && !more_handles(phead, &hptr, &count, sizeof(*hdlptr)))
                 {
                     DT_Tdep_PT_Printf (phead, "%s: PZs created: %d\n", module, w);
                     retval = true;
                     break;
                 }
+                hdlptr = (DAT_PZ_HANDLE *)hptr;
                 ret = dat_pz_create (hdl_sets[w % cmd->width].ia_handle,
                                      &hdlptr[w]);
                 if (ret != DAT_SUCCESS)
@@ -363,10 +361,10 @@ limit_test ( DT_Tdep_Print_Head *phead,
         /*
          * See how many CNOs we can create
          */
-        unsigned int count = START_COUNT;
-        DAT_CNO_HANDLE *hdlptr = (DAT_CNO_HANDLE *)
-                                 DT_Mdep_Malloc (count * sizeof (*hdlptr));
-
+        unsigned int count = START_COUNT;
+        void *hptr = DT_Mdep_Malloc (count * sizeof(DAT_CNO_HANDLE));
+        DAT_CNO_HANDLE *hdlptr = (DAT_CNO_HANDLE *)hptr;
+
         /* CNO Exhaustion test loop */
         if (hdlptr)
         {
@@ -380,14 +378,13 @@ limit_test ( DT_Tdep_Print_Head *phead,
             {
                 DT_Mdep_Schedule();
                 if (w == count
-                    && !more_handles (phead, (DAT_HANDLE **) &hdlptr,
-                                      &count,
-                                      sizeof (*hdlptr)))
+                    && !more_handles(phead, &hptr, &count, sizeof (*hdlptr)))
                 {
                     DT_Tdep_PT_Printf (phead, "%s: CNOs created: %d\n", module, w);
                     retval = true;
                     break;
                 }
+                hdlptr = (DAT_CNO_HANDLE *)hptr;
                 ret = dat_cno_create (hdl_sets[w % cmd->width].ia_handle,
                                       DAT_OS_WAIT_PROXY_AGENT_NULL,
                                       &hdlptr[w]);
@@ -484,9 +481,10 @@ limit_test ( DT_Tdep_Print_Head *phead,
         /*
          * See how many EVDs we can create
          */
-        unsigned int count = START_COUNT;
-        DAT_EVD_HANDLE *hdlptr = (DAT_EVD_HANDLE *)
-                                 DT_Mdep_Malloc (count * sizeof (*hdlptr));
+        unsigned int count = START_COUNT;
+        void *hptr = DT_Mdep_Malloc(count * sizeof(DAT_EVD_HANDLE));
+        DAT_EVD_HANDLE *hdlptr = (DAT_EVD_HANDLE *)hptr;
+
         DAT_EVD_FLAGS flags = ( DAT_EVD_DTO_FLAG
                                 | DAT_EVD_RMR_BIND_FLAG
                                 | DAT_EVD_CR_FLAG);
@@ -519,14 +517,13 @@ limit_test ( DT_Tdep_Print_Head *phead,
             {
                 DT_Mdep_Schedule();
                 if (w == count
-                    && !more_handles (phead, (DAT_HANDLE **) &hdlptr,
-                                      &count,
-                                      sizeof (*hdlptr)))
+                    && !more_handles(phead, &hptr, &count, sizeof(*hdlptr)))
                 {
                     DT_Tdep_PT_Printf (phead, "%s: EVDs created: %d\n", module, w);
                     retval = true;
                     break;
                 }
+                hdlptr = (DAT_EVD_HANDLE *)hptr;
                 ret = DT_Tdep_evd_create (hdl_sets[w % cmd->width].ia_handle,
                                           DFLT_QLEN,
                                           hdl_sets[w % cmd->width].cno_handle,
@@ -603,9 +600,9 @@ limit_test ( DT_Tdep_Print_Head *phead,
         /*
          * See how many EPs we can create
          */
-        unsigned int count = START_COUNT;
-        DAT_EP_HANDLE *hdlptr = (DAT_EP_HANDLE *)
-                                DT_Mdep_Malloc (count * sizeof (*hdlptr));
+        unsigned int count = START_COUNT;
+        void *hptr = DT_Mdep_Malloc(count * sizeof(DAT_EP_HANDLE));
+        DAT_EP_HANDLE *hdlptr = (DAT_EP_HANDLE *)hptr;

         /* EP Exhaustion test loop */
         if (hdlptr)
@@ -618,14 +615,13 @@ limit_test ( DT_Tdep_Print_Head *phead,
             {
                 DT_Mdep_Schedule();
                 if (w == count
-                    && !more_handles (phead, (DAT_HANDLE **) &hdlptr,
-                                      &count,
-                                      sizeof (*hdlptr)))
+                    && !more_handles(phead, &hptr, &count, sizeof(*hdlptr)))
                 {
                     DT_Tdep_PT_Printf (phead, "%s: EPs created: %d\n", module, w);
                     retval = true;
                     break;
                 }
+                hdlptr = (DAT_EP_HANDLE *)hptr;
                 ret = dat_ep_create (hdl_sets[w % cmd->width].ia_handle,
                                      hdl_sets[w % cmd->width].pz_handle,
                                      hdl_sets[w % cmd->width].evd_handle,
@@ -674,11 +670,11 @@ limit_test ( DT_Tdep_Print_Head *phead,
         /*
          * See how many RSPs we can create
          */
-        unsigned int count = START_COUNT;
-        DAT_RSP_HANDLE *hdlptr = (DAT_RSP_HANDLE *)
-                                 DT_Mdep_Malloc (count * sizeof (*hdlptr));
-        DAT_EP_HANDLE *epptr = (DAT_EP_HANDLE *)
-                               DT_Mdep_Malloc (count * sizeof (*epptr));
+        unsigned int count = START_COUNT;
+        void *hptr = DT_Mdep_Malloc(count * sizeof (DAT_RSP_HANDLE));
+        DAT_RSP_HANDLE *hdlptr = (DAT_RSP_HANDLE *)hptr;
+        void *eptr = DT_Mdep_Malloc(count * sizeof (DAT_EP_HANDLE));
+        DAT_EP_HANDLE *epptr = (DAT_EP_HANDLE *)eptr;

         /* RSP Exhaustion test loop */
         if (hdlptr)
@@ -695,23 +691,21 @@ limit_test ( DT_Tdep_Print_Head *phead,
                     unsigned int count1 = count;
                     unsigned int count2 = count;

-                    if (!more_handles (phead, (DAT_HANDLE **) &hdlptr,
-                                       &count1,
-                                       sizeof (*hdlptr)))
+                    if (!more_handles(phead, &hptr, &count1, sizeof(*hdlptr)))
                     {
                         DT_Tdep_PT_Printf (phead, "%s: RSPs created: %d\n", module, w);
                         retval = true;
                         break;
                     }
-                    if (!more_handles (phead, (DAT_HANDLE **) &epptr,
-                                       &count2,
-                                       sizeof (*epptr)))
+                    hdlptr = (DAT_RSP_HANDLE *)hptr;
+
+                    if (!more_handles (phead, &eptr, &count2, sizeof(*epptr)))
                     {
                         DT_Tdep_PT_Printf (phead, "%s: RSPs created: %d\n", module, w);
                         retval = true;
                         break;
                     }
-
+                    epptr = (DAT_EP_HANDLE *)eptr;
                     if (count1 != count2)
                     {
                         DT_Tdep_PT_Printf (phead, "%s: Mismatch in allocation of handle arrays at point %d\n",
@@ -810,9 +804,9 @@ limit_test ( DT_Tdep_Print_Head *phead,
         /*
          * See how many PSPs we can create
          */
-        unsigned int count = START_COUNT;
-        DAT_PSP_HANDLE *hdlptr = (DAT_PSP_HANDLE *)
-                                 DT_Mdep_Malloc (count * sizeof (*hdlptr));
+        unsigned int count = START_COUNT;
+        void *hptr = DT_Mdep_Malloc (count * sizeof (DAT_PSP_HANDLE));
+        DAT_PSP_HANDLE *hdlptr = (DAT_PSP_HANDLE *)hptr;

         /* PSP Exhaustion test loop */
         if (hdlptr)
@@ -825,14 +819,13 @@ limit_test ( DT_Tdep_Print_Head *phead,
             {
                 DT_Mdep_Schedule();
                 if (w == count
-                    && !more_handles (phead, (DAT_HANDLE **) &hdlptr,
-                                      &count,
-                                      sizeof (*hdlptr)))
+                    && !more_handles (phead, &hptr, &count, sizeof(*hdlptr)))
                 {
                     DT_Tdep_PT_Printf (phead, "%s: PSPs created: %d\n", module, w);
                     retval = true;
                     break;
                 }
+                hdlptr = (DAT_PSP_HANDLE *)hptr;
                 ret = dat_psp_create (hdl_sets[w % cmd->width].ia_handle,
                                       CONN_QUAL0 + w,
                                       hdl_sets[w % cmd->width].evd_handle,
@@ -936,10 +929,10 @@ limit_test ( DT_Tdep_Print_Head *phead,
         /*
          * See how many LMRs we can create
          */
-        unsigned int count = START_COUNT;
-        Bpool **hdlptr = (Bpool **)
-                         DT_Mdep_Malloc (count * sizeof (*hdlptr));
-
+        unsigned int count = START_COUNT;
+        void *hptr = DT_Mdep_Malloc (count * sizeof(Bpool*));
+        Bpool **hdlptr = (Bpool **)hptr;
+
         /* LMR Exhaustion test loop */
         if (hdlptr)
         {
@@ -951,9 +944,7 @@ limit_test ( DT_Tdep_Print_Head *phead,
             {
                 DT_Mdep_Schedule();
                 if (w == count
-                    && !more_handles (phead, (DAT_HANDLE **) &hdlptr,
-                                      &count,
-                                      sizeof (*hdlptr)))
+                    && !more_handles (phead, &hptr, &count, sizeof(*hdlptr)))
                 {
                     DT_Tdep_PT_Printf (phead, "%s: no memory for LMR handles\n",
                                        module);
@@ -961,6 +952,7 @@ limit_test ( DT_Tdep_Print_Head *phead,
                     retval = true;
                     break;
                 }
+                hdlptr = (Bpool **)hptr;
                 /*
                  * Let BpoolAlloc do the hard work; this means that
                  * we're testing unique memory registrations rather
@@ -1012,14 +1004,15 @@ limit_test ( DT_Tdep_Print_Head *phead,
          * We are posting the same buffer 'cnt' times, deliberately,
          * but that should be OK.
          */
-        unsigned int count = START_COUNT;
-        DAT_LMR_TRIPLET *hdlptr = (DAT_LMR_TRIPLET *)
-                                  DT_Mdep_Malloc (count * cmd->width * sizeof (*hdlptr));
+        unsigned int count = START_COUNT;
+        void *hptr =
+            DT_Mdep_Malloc(count * cmd->width * sizeof(DAT_LMR_TRIPLET));
+        DAT_LMR_TRIPLET *hdlptr = (DAT_LMR_TRIPLET *)hptr;

         /* Recv-Post Exhaustion test loop */
         if (hdlptr)
         {
-            unsigned int w = 0;
+            unsigned int w = 0;
             unsigned int i = 0;
             unsigned int done = 0;

@@ -1028,9 +1021,8 @@ limit_test ( DT_Tdep_Print_Head *phead,
             {
                 DT_Mdep_Schedule();
                 if (w == count
-                    && !more_handles (phead, (DAT_HANDLE **) &hdlptr,
-                                      &count,
-                                      cmd->width * sizeof (*hdlptr)))
+                    && !more_handles (phead, &hptr, &count,
+                                      cmd->width * sizeof(*hdlptr)))
                 {
                     DT_Tdep_PT_Printf (phead, "%s: no memory for IOVs \n",
                                        module);
@@ -1042,6 +1034,7 @@ limit_test ( DT_Tdep_Print_Head *phead,
                     done = retval = true;
                     break;
                 }
+                hdlptr = (DAT_LMR_TRIPLET *)hptr;
                 for (i = 0; i < cmd->width; i++)
                 {
                     DAT_LMR_TRIPLET *iovp = &hdlptr[w * cmd->width + i];
diff --git a/test/dtest/Makefile.am b/test/dtest/Makefile.am
index fcb9b4e..fb605ba 100755
--- a/test/dtest/Makefile.am
+++ b/test/dtest/Makefile.am
@@ -1,5 +1,5 @@
 bin_PROGRAMS = dtest
 dtest_SOURCES = dtest.c
+dtest_CFLAGS = -g -Wall -D_GNU_SOURCE
 INCLUDES = -I $(srcdir)/../../dat/include
 dtest_LDADD = $(srcdir)/../../dat/udat/libdat.la
-
diff --git a/test/dtest/dtest.c b/test/dtest/dtest.c
index 039b6bf..a93f878 100755
--- a/test/dtest/dtest.c
+++ b/test/dtest/dtest.c
@@ -50,6 +50,7 @@
 #define DAPL_PROVIDER "OpenIB-cma"
 #endif

+#define F64x "%"PRIx64""
 #define MAX_POLLING_CNT 50000
 #define MAX_RDMA_RD 4
 #define MAX_PROCS 1000
@@ -142,7 +143,6 @@ struct {
 } time;

 /* defaults */
-static int parent=1;
 static int connected=0;
 static int burst=10;
 static int server=1;
@@ -151,17 +151,13 @@ static int polling=0;
 static int poll_count=0;
 static int rdma_wr_poll_count=0;
 static int rdma_rd_poll_count[MAX_RDMA_RD]={0};
-static int pin_memory=0;
 static int delay=0;
 static int buf_len=RDMA_BUFFER_SIZE;
 static int use_cno=0;
-static int post_recv_count=MSG_BUF_COUNT;
 static int recv_msg_index=0;
 static int burst_msg_posted=0;
 static int burst_msg_index=0;

-static pid_t child[MAX_PROCS+1];
-
 /* forward prototypes */
 const char * DT_RetToString (DAT_RETURN ret_value);
 const char * DT_EventToSTr (DAT_EVENT_NUMBER event_code);
@@ -188,7 +184,7 @@ DAT_RETURN do_ping_pong_msg( void );
 #define LOGPRINTF(_format, _aa...) \
     if (verbose) \
         printf(_format, ##_aa)
-
+int
 main(int argc, char **argv)
 {
     int i,c;
@@ -358,12 +354,12 @@ main(int argc, char **argv)
     inet_ntop(AF_INET,
               &((struct sockaddr_in *)ep_param.local_ia_address_ptr)->sin_addr,
               addr_str, sizeof(addr_str));
-    printf("\n%d Query EP: LOCAL addr %s port %d\n", getpid(),
+    printf("\n%d Query EP: LOCAL addr %s port "F64x"\n", getpid(),
            addr_str, ep_param.local_port_qual);
     inet_ntop(AF_INET,
               &((struct sockaddr_in *)ep_param.remote_ia_address_ptr)->sin_addr,
               addr_str, sizeof(addr_str));
-    printf("%d Query EP: REMOTE addr %s port %d\n", getpid(),
+    printf("%d Query EP: REMOTE addr %s port "F64x"\n", getpid(),
            addr_str, ep_param.remote_port_qual);
     fflush(stdout);

@@ -492,6 +488,7 @@ cleanup:
     /* free rdma buffers */
     free(rbuf);
     free(sbuf);
+    return(0);
 }


@@ -577,7 +574,7 @@ send_msg( void *data,

     if ((event.event_data.dto_completion_event_data.transfered_length != size ) ||
         (event.event_data.dto_completion_event_data.user_cookie.as_64 != 0xaaaa )) {
-        fprintf(stderr, "%d: ERROR: DTO len %d or cookie " PRIx64 "\n",
+        fprintf(stderr, "%d: ERROR: DTO len "F64x" or cookie "F64x"\n",
                 getpid(),
                 event.event_data.dto_completion_event_data.transfered_length,
                 event.event_data.dto_completion_event_data.user_cookie.as_64 );
@@ -599,7 +596,6 @@ DAT_RETURN
 connect_ep( char *hostname, int conn_id )
 {
     DAT_SOCK_ADDR remote_addr;
-    DAT_EP_ATTR ep_attr;
     DAT_RETURN ret;
     DAT_REGION_DESCRIPTION region;
     DAT_EVENT event;
@@ -611,7 +607,7 @@ connect_ep( char *hostname, int conn_id )

     /* Register send message buffer */
     LOGPRINTF("%d Registering send Message Buffer %p, len %d\n",
-              getpid(), &rmr_send_msg, sizeof(DAT_RMR_TRIPLET) );
+              getpid(), &rmr_send_msg, (int)sizeof(DAT_RMR_TRIPLET));
     region.for_va = &rmr_send_msg;
     ret = dat_lmr_create( h_ia,
                           DAT_MEM_TYPE_VIRTUAL,
@@ -800,7 +796,8 @@ connect_ep( char *hostname, int conn_id )
     rmr_send_msg.target_address = (DAT_VADDR)(unsigned long)rbuf;
     rmr_send_msg.segment_length = RDMA_BUFFER_SIZE;

-    printf("%d Send RMR to remote: snd_msg: r_key_ctx=%x,pad=%x,va=%llx,len=0x%x\n",
+    printf("%d Send RMR to remote: snd_msg: r_key_ctx=%x,pad=%x, "
+           "va="F64x",len="F64x"\n",
            getpid(), rmr_send_msg.rmr_context, rmr_send_msg.pad,
            rmr_send_msg.target_address, rmr_send_msg.segment_length );

@@ -862,16 +859,17 @@ connect_ep( char *hostname, int conn_id )
          sizeof( DAT_RMR_TRIPLET )) ||
         (event.event_data.dto_completion_event_data.user_cookie.as_64 !=
          recv_msg_index) ) {
-        fprintf(stderr,"ERR recv event: len=%d cookie=" PRIx64 " expected %d/%d\n",
+        fprintf(stderr,"ERR recv event: len=%d cookie="F64x" expected %d/%d\n",
                 (int)event.event_data.dto_completion_event_data.transfered_length,
-                (int)event.event_data.dto_completion_event_data.user_cookie.as_64,
-                sizeof(DAT_RMR_TRIPLET), recv_msg_index );
+                event.event_data.dto_completion_event_data.user_cookie.as_64,
+                (int)sizeof(DAT_RMR_TRIPLET), recv_msg_index );
         return( DAT_ABORT );
     }

     r_iov = rmr_recv_msg[ recv_msg_index ];

-    printf("%d Received RMR from remote: r_iov: r_key_ctx=%x,pad=%x,va=%llx,len=0x%x\n",
+    printf("%d Received RMR from remote: r_iov: r_key_ctx=%x,pad=%x "
+           ",va="F64x",len="F64x"\n",
            getpid(), r_iov.rmr_context, r_iov.pad,
            r_iov.target_address, r_iov.segment_length );

@@ -887,7 +885,6 @@ disconnect_ep()
     DAT_RETURN ret;
     DAT_EVENT event;
     DAT_COUNT nmore;
-    int i,flush_cnt;

     if (connected) {

@@ -962,13 +959,11 @@ disconnect_ep()
 DAT_RETURN
 do_rdma_write_with_msg( )
 {
-    DAT_REGION_DESCRIPTION region;
     DAT_EVENT event;
     DAT_COUNT nmore;
     DAT_LMR_TRIPLET l_iov[MSG_IOV_COUNT];
     DAT_RMR_TRIPLET r_iov;
     DAT_DTO_COOKIE cookie;
-    DAT_RMR_CONTEXT their_context;
     DAT_RETURN ret;
     int i;

@@ -994,7 +989,7 @@ do_rdma_write_with_msg( )
         l_iov[i].virtual_address = (DAT_VADDR)(unsigned long)
                                    (&sbuf[l_iov[i].segment_length*i]);

-        LOGPRINTF("%d rdma_write iov[%d] buf=%p,len=%d\n",
+        LOGPRINTF("%d rdma_write iov[%d] buf=%p,len="F64x"\n",
                   getpid(), i, &sbuf[l_iov[i].segment_length*i],
                   l_iov[i].segment_length);
     }
@@ -1081,17 +1076,17 @@ do_rdma_write_with_msg( )
     if ( (event.event_data.dto_completion_event_data.transfered_length !=
           sizeof( DAT_RMR_TRIPLET )) ||
          (event.event_data.dto_completion_event_data.user_cookie.as_64 !=
           recv_msg_index) ) {
+
-        fprintf(stderr,"unexpected event data for receive: len=%d cookie=" PRIx64 " exp %d/%d\n",
+        fprintf(stderr,"unexpected event data for receive: len=%d cookie="F64x" exp %d/%d\n",
                 (int)event.event_data.dto_completion_event_data.transfered_length,
-                (int)event.event_data.dto_completion_event_data.user_cookie.as_64,
-                sizeof(DAT_RMR_TRIPLET), recv_msg_index );
+                event.event_data.dto_completion_event_data.user_cookie.as_64,
+                (int)sizeof(DAT_RMR_TRIPLET), recv_msg_index );

         return( DAT_ABORT );
     }

     r_iov = rmr_recv_msg[ recv_msg_index ];

-    printf("%d Received RMR from remote: r_iov: ctx=%x,pad=%x,va=%p,len=0x%x\n",
+    printf("%d Received RMR from remote: r_iov: ctx=%x,pad=%x,va=%p,len="F64x"\n",
            getpid(), r_iov.rmr_context, r_iov.pad,
            (void*)(unsigned long)r_iov.target_address,
@@ -1112,13 +1107,11 @@ do_rdma_write_with_msg( )
 DAT_RETURN
 do_rdma_read_with_msg( )
 {
-    DAT_REGION_DESCRIPTION region;
     DAT_EVENT event;
     DAT_COUNT nmore;
     DAT_LMR_TRIPLET l_iov;
     DAT_RMR_TRIPLET r_iov;
     DAT_DTO_COOKIE cookie;
-    DAT_RMR_CONTEXT their_context;
     DAT_RETURN ret;
     int i;

@@ -1191,9 +1184,9 @@ do_rdma_read_with_msg( )
     }
     if ((event.event_data.dto_completion_event_data.transfered_length != buf_len ) ||
         (event.event_data.dto_completion_event_data.user_cookie.as_64 != 0x9999 )) {
-        fprintf(stderr, "%d: ERROR: DTO len %d or cookie " PRIx64 "\n",
+        fprintf(stderr, "%d: ERROR: DTO len %d or cookie "F64x"\n",
                 getpid(),
-                event.event_data.dto_completion_event_data.transfered_length,
+                (int)event.event_data.dto_completion_event_data.transfered_length,
                 event.event_data.dto_completion_event_data.user_cookie.as_64 );
         return( DAT_ABORT );
     }
@@ -1273,17 +1266,17 @@ do_rdma_read_with_msg( )
     if ( (event.event_data.dto_completion_event_data.transfered_length !=
           sizeof( DAT_RMR_TRIPLET )) ||
          (event.event_data.dto_completion_event_data.user_cookie.as_64 !=
           recv_msg_index) ) {
-        fprintf(stderr,"unexpected event data for receive: len=%d cookie=" PRIx64 " exp %d/%d\n",
+        fprintf(stderr,"unexpected event data for receive: len=%d cookie="F64x" exp %d/%d\n",
                 (int)event.event_data.dto_completion_event_data.transfered_length,
-                (int)event.event_data.dto_completion_event_data.user_cookie.as_64,
-                sizeof(DAT_RMR_TRIPLET), recv_msg_index );
+                event.event_data.dto_completion_event_data.user_cookie.as_64,
+                (int)sizeof(DAT_RMR_TRIPLET), recv_msg_index );

         return( DAT_ABORT );
     }

     r_iov = rmr_recv_msg[ recv_msg_index ];

-    printf("%d Received RMR from remote: r_iov: ctx=%x,pad=%x,va=%p,len=0x%x\n",
+    printf("%d Received RMR from remote: r_iov: ctx=%x,pad=%x,va=%p,len="F64x"\n",
            getpid(), r_iov.rmr_context, r_iov.pad,
            (void*)(unsigned long)r_iov.target_address,
            r_iov.segment_length );
@@ -1425,9 +1418,9 @@ do_ping_pong_msg( )
              != buf_len) ||
             (event.event_data.dto_completion_event_data.user_cookie.as_64
              != burst_msg_index) ) {
-            fprintf(stderr,"ERR: recv event: len=%d cookie=" PRIx64 " exp %d/%d\n",
+            fprintf(stderr,"ERR: recv event: len=%d cookie="F64x" exp %d/%d\n",
                     (int)event.event_data.dto_completion_event_data.transfered_length,
-                    (int)event.event_data.dto_completion_event_data.user_cookie.as_64,
+                    event.event_data.dto_completion_event_data.user_cookie.as_64,
                     buf_len, burst_msg_index );

             return( DAT_ABORT );
@@ -1760,7 +1753,6 @@ const char *
 DT_RetToString (DAT_RETURN ret_value)
 {
     const char *major_msg, *minor_msg;
-    int sz;

     dat_strerror (ret_value, &major_msg, &minor_msg);

--
1.5.2.5

From arlin.r.davis at intel.com Mon Sep 1 19:26:00 2008
From: arlin.r.davis at intel.com (Arlin Davis)
Date: Mon, 1 Sep 2008 19:26:00 -0700
Subject: [ofa-general] [PATCH 3/5][v1.2] dat: fix compiler warnings in dat common code
Message-ID: <000201c90ca3$3da95580$5464fe0a@amr.corp.intel.com>

Cleanup uDAT common code

Signed-off by: Arlin Davis ardavis at ichips.intel.com
---
 dat/common/dat_dr.c | 26 +++++++++++---------------
 dat/common/dat_sr.c | 19 ++++++++++++-------
 2 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/dat/common/dat_dr.c b/dat/common/dat_dr.c
index 89fc861..f40a94c 100644
--- a/dat/common/dat_dr.c
+++ b/dat/common/dat_dr.c
@@ -174,16 +174,16 @@ extern DAT_RETURN
 dat_dr_remove (
     IN const DAT_PROVIDER_INFO *info )
 {
-    DAT_DR_ENTRY *data;
     DAT_DICTIONARY_ENTRY dict_entry;
     DAT_RETURN status;
+    DAT_DICTIONARY_DATA data;

     dict_entry = NULL;
     dat_os_lock (&g_dr_lock);

     status = dat_dictionary_search ( g_dr_dictionary,
                                      info,
-                                     (DAT_DICTIONARY_DATA *) &data);
+                                     &data);

     if ( DAT_SUCCESS != status )
     {
@@ -191,7 +191,7 @@ dat_dr_remove (
         goto bail;
     }

-    if ( 0 != data->ref_count )
+    if ( 0 != ((DAT_DR_ENTRY*)data)->ref_count )
     {
         status = DAT_ERROR (DAT_PROVIDER_IN_USE, 0);
         goto bail;
@@ -200,7 +200,7 @@ dat_dr_remove (
     status = dat_dictionary_remove ( g_dr_dictionary,
                                      &dict_entry,
                                      info,
-                                     (DAT_DICTIONARY_DATA *) &data);
+                                     &data);
     if ( DAT_SUCCESS != status )
     {
         /* return status from dat_dictionary_remove() */
@@ -231,20 +231,18 @@ dat_dr_provider_open (
     OUT DAT_IA_OPEN_FUNC *p_ia_open_func )
 {
     DAT_RETURN status;
-    DAT_DR_ENTRY *data;
+    DAT_DICTIONARY_DATA data;

     dat_os_lock (&g_dr_lock);
-
     status = dat_dictionary_search ( g_dr_dictionary,
                                      info,
-                                     (DAT_DICTIONARY_DATA *) &data);
-
+                                     &data);
     dat_os_unlock (&g_dr_lock);

     if ( DAT_SUCCESS == status )
     {
-        data->ref_count++;
-        *p_ia_open_func = data->ia_open_func;
+        ((DAT_DR_ENTRY*)data)->ref_count++;
+        *p_ia_open_func = ((DAT_DR_ENTRY*)data)->ia_open_func;
     }

     return status;
@@ -260,19 +258,17 @@ dat_dr_provider_close (
     IN const DAT_PROVIDER_INFO *info )
 {
     DAT_RETURN status;
-    DAT_DR_ENTRY *data;
+    DAT_DICTIONARY_DATA data;

     dat_os_lock (&g_dr_lock);
-
     status = dat_dictionary_search ( g_dr_dictionary,
                                      info,
-                                     (DAT_DICTIONARY_DATA *) &data);
-
+                                     &data);
     dat_os_unlock (&g_dr_lock);

     if ( DAT_SUCCESS == status )
     {
-        data->ref_count--;
+        ((DAT_DR_ENTRY*)data)->ref_count--;
     }

     return status;
diff --git a/dat/common/dat_sr.c b/dat/common/dat_sr.c
index d5d8666..e3b2a54 100644
--- a/dat/common/dat_sr.c
+++ b/dat/common/dat_sr.c
@@ -129,12 +129,13 @@ dat_sr_insert (
     IN DAT_SR_ENTRY *entry )
 {
     DAT_RETURN status;
-    DAT_SR_ENTRY *data, *prev_data;
+    DAT_SR_ENTRY *data;
     DAT_OS_SIZE lib_path_size;
     DAT_OS_SIZE lib_path_len;
     DAT_OS_SIZE ia_params_size;
     DAT_OS_SIZE ia_params_len;
     DAT_DICTIONARY_ENTRY dict_entry;
+    DAT_DICTIONARY_DATA prev_data;

     if ( NULL == (data = dat_os_alloc (sizeof (DAT_SR_ENTRY))) )
     {
@@ -184,7 +185,7 @@ dat_sr_insert (

     status = dat_dictionary_search (g_sr_dictionary,
                                     info,
-                                    (DAT_DICTIONARY_DATA *) &prev_data);
+                                    &prev_data);
     if ( DAT_SUCCESS == status )
     {
         /* We already have a dictionary entry, so we don't need a new one.
@@ -196,12 +197,12 @@ dat_sr_insert ( dict_entry = NULL; /* Find the next available slot in this chain */ - while (NULL != prev_data->next) + while (NULL != ((DAT_SR_ENTRY*)prev_data)->next) { - prev_data = prev_data->next; + prev_data = ((DAT_SR_ENTRY*)prev_data)->next; } dat_os_assert (NULL != prev_data); - prev_data->next = data; + ((DAT_SR_ENTRY*)prev_data)->next = data; } else { @@ -350,15 +351,17 @@ dat_sr_provider_open ( { DAT_RETURN status; DAT_SR_ENTRY *data; + DAT_DICTIONARY_DATA dict_data; dat_os_lock (&g_sr_lock); status = dat_dictionary_search (g_sr_dictionary, info, - (DAT_DICTIONARY_DATA *) &data); + &dict_data); if ( DAT_SUCCESS == status ) { + data = (DAT_SR_ENTRY*)dict_data; while (data != NULL) { if ( 0 == data->ref_count ) @@ -428,15 +431,17 @@ dat_sr_provider_close ( { DAT_RETURN status; DAT_SR_ENTRY *data; + DAT_DICTIONARY_DATA dict_data; dat_os_lock (&g_sr_lock); status = dat_dictionary_search (g_sr_dictionary, info, - (DAT_DICTIONARY_DATA *) &data); + &dict_data); if ( DAT_SUCCESS == status ) { + data = (DAT_SR_ENTRY*)dict_data; while (data != NULL) { if ( 1 == data->ref_count ) -- 1.5.2.5 From arlin.r.davis at intel.com Mon Sep 1 19:26:04 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Mon, 1 Sep 2008 19:26:04 -0700 Subject: [ofa-general] [PATCH 5/5][v1.2] dapl providers: fix compiler warnings in cma and scm providers Message-ID: <000401c90ca3$40612280$5464fe0a@amr.corp.intel.com> dapl providers: fix compiler warnings in cma and scm providers Include provider definitions after some key definitions required by providers. Check results of writes and reads, print appropriate error message. 
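For readers skimming the diff, the pattern applied to each wakeup write can be sketched on its own. This is a minimal illustration, not code from the patch: wakeup_pipe() is a hypothetical helper, fprintf(stderr, ...) stands in for dapl_log(), and the warning being silenced is assumed to be the compiler's unused-result warning for write() and read().

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of DAPL): send the two-byte wakeup token
 * "w\0" down a pipe and report a failed write instead of ignoring it.
 * fprintf(stderr, ...) stands in for the provider's dapl_log(). */
static int wakeup_pipe(int fd, const char *who)
{
    if (write(fd, "w", sizeof "w") == -1) {
        fprintf(stderr, " %s: thread wakeup error = %s\n",
                who, strerror(errno));
        return -1;
    }
    return 0;
}
```

The patch itself open-codes the check at every call site and still returns DAT_SUCCESS, so a failed wakeup is only logged; a helper like the one above would be one way to avoid repeating the same three lines.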
Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/include/dapl.h | 38 +++++++++++++++++++------------------- dapl/openib_cma/dapl_ib_cq.c | 5 ++++- dapl/openib_cma/dapl_ib_util.c | 32 +++++++++++++++++++++++++------- dapl/openib_cma/dapl_ib_util.h | 14 +------------- dapl/openib_scm/dapl_ib_cm.c | 36 +++++++++++++++++++++++++++++------- dapl/openib_scm/dapl_ib_util.c | 9 ++++++++- dapl/openib_scm/dapl_ib_util.h | 14 +------------- 7 files changed, 87 insertions(+), 61 deletions(-) diff --git a/dapl/include/dapl.h b/dapl/include/dapl.h index 80c9ff3..9d3f546 100644 --- a/dapl/include/dapl.h +++ b/dapl/include/dapl.h @@ -50,19 +50,6 @@ #include "dapl_osd.h" #include "dapl_debug.h" -#ifdef IBAPI -#include "dapl_ibapi_util.h" -#elif VAPI -#include "dapl_vapi_util.h" -#elif __OPENIB__ -#include "dapl_openib_util.h" -#include "dapl_openib_cm.h" -#elif DUMMY -#include "dapl_dummy_util.h" -#elif OPENIB -#include "dapl_ib_util.h" -#endif - /********************************************************************* * * * Enumerations * @@ -215,12 +202,6 @@ typedef struct dapl_rmr_cookie DAPL_RMR_COOKIE; typedef struct dapl_private DAPL_PRIVATE; -typedef void (*DAPL_CONNECTION_STATE_HANDLER) ( - IN DAPL_EP *, - IN ib_cm_events_t, - IN const void *, - OUT DAT_EVENT *); - /********************************************************************* * * @@ -252,6 +233,19 @@ struct dapl_cookie_buffer DAPL_ATOMIC tail; }; +#ifdef IBAPI +#include "dapl_ibapi_util.h" +#elif VAPI +#include "dapl_vapi_util.h" +#elif __OPENIB__ +#include "dapl_openib_util.h" +#include "dapl_openib_cm.h" +#elif DUMMY +#include "dapl_dummy_util.h" +#elif OPENIB +#include "dapl_ib_util.h" +#endif + struct dapl_hca { DAPL_OS_LOCK lock; @@ -673,6 +667,12 @@ void dapls_io_trc_dump ( * * *********************************************************************/ +typedef void (*DAPL_CONNECTION_STATE_HANDLER) ( + IN DAPL_EP *, + IN ib_cm_events_t, + IN const void *, + OUT DAT_EVENT *); + /* * DAT Mandated 
functions */ diff --git a/dapl/openib_cma/dapl_ib_cq.c b/dapl/openib_cma/dapl_ib_cq.c index 25b4551..cf19f38 100644 --- a/dapl/openib_cma/dapl_ib_cq.c +++ b/dapl/openib_cma/dapl_ib_cq.c @@ -497,7 +497,10 @@ dapls_ib_wait_object_wakeup (IN ib_wait_obj_handle_t p_cq_wait_obj_handle) p_cq_wait_obj_handle ); /* write to pipe for wake up */ - write(p_cq_wait_obj_handle->pipe[1], "w", sizeof "w"); + if (write(p_cq_wait_obj_handle->pipe[1], "w", sizeof "w") == -1) + dapl_log(DAPL_DBG_TYPE_UTIL, + " wait object wakeup write error = %s\n", + strerror(errno)); return DAT_SUCCESS; } diff --git a/dapl/openib_cma/dapl_ib_util.c b/dapl/openib_cma/dapl_ib_util.c index e76e319..afb7463 100755 --- a/dapl/openib_cma/dapl_ib_util.c +++ b/dapl/openib_cma/dapl_ib_util.c @@ -321,7 +321,10 @@ DAT_RETURN dapls_ib_open_hca(IN IB_HCA_NAME hca_name, IN DAPL_HCA *hca_ptr) dapl_llist_add_tail(&g_hca_list, (DAPL_LLIST_ENTRY*)&hca_ptr->ib_trans.entry, &hca_ptr->ib_trans.entry); - write(g_ib_pipe[1], "w", sizeof "w"); + if (write(g_ib_pipe[1], "w", sizeof "w") == -1) + dapl_log(DAPL_DBG_TYPE_UTIL, + " open_hca: thread wakeup error = %s\n", + strerror(errno)); dapl_os_unlock(&g_hca_lock); dapl_dbg_log( @@ -388,14 +391,20 @@ DAT_RETURN dapls_ib_close_hca(IN DAPL_HCA *hca_ptr) * Wakeup work thread to remove from polling list */ hca_ptr->ib_trans.destroy = 1; - write(g_ib_pipe[1], "w", sizeof "w"); + if (write(g_ib_pipe[1], "w", sizeof "w") == -1) + dapl_log(DAPL_DBG_TYPE_UTIL, + " close_hca: thread wakeup error = %s\n", + strerror(errno)); /* wait for thread to remove HCA references */ while (hca_ptr->ib_trans.destroy != 2) { struct timespec sleep, remain; sleep.tv_sec = 0; sleep.tv_nsec = 10000000; /* 10 ms */ - write(g_ib_pipe[1], "w", sizeof "w"); + if (write(g_ib_pipe[1], "w", sizeof "w") == -1) + dapl_log(DAPL_DBG_TYPE_UTIL, + " close_hca: thread wakeup error = %s\n", + strerror(errno)); dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " ib_thread_destroy: wait on hca %p destroy\n"); nanosleep (&sleep, 
&remain); @@ -671,14 +680,20 @@ void dapli_ib_thread_destroy(void) goto bail; g_ib_thread_state = IB_THREAD_CANCEL; - write(g_ib_pipe[1], "w", sizeof "w"); + if (write(g_ib_pipe[1], "w", sizeof "w") == -1) + dapl_log(DAPL_DBG_TYPE_UTIL, + " destroy: thread wakeup error = %s\n", + strerror(errno)); while ((g_ib_thread_state != IB_THREAD_EXIT) && (retries--)) { struct timespec sleep, remain; sleep.tv_sec = 0; sleep.tv_nsec = 2000000; /* 2 ms */ dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " ib_thread_destroy: waiting for ib_thread\n"); - write(g_ib_pipe[1], "w", sizeof "w"); + if (write(g_ib_pipe[1], "w", sizeof "w") == -1) + dapl_log(DAPL_DBG_TYPE_UTIL, + " destroy: thread wakeup error = %s\n", + strerror(errno)); dapl_os_unlock( &g_hca_lock ); nanosleep(&sleep, &remain); dapl_os_lock( &g_hca_lock ); @@ -894,8 +909,11 @@ void dapli_thread(void *arg) /* check and process user events, PIPE */ if (ufds[0].revents == POLLIN) { - read(g_ib_pipe[0], rbuf, 2); - + if (read(g_ib_pipe[0], rbuf, 2) == -1) + dapl_log(DAPL_DBG_TYPE_UTIL, + " ib_thread: pipe rd err= %s\n", + strerror(errno)); + /* cleanup any device on list marked for destroy */ for(idx=3;idxdestroy == 1) { diff --git a/dapl/openib_cma/dapl_ib_util.h b/dapl/openib_cma/dapl_ib_util.h index 1e464b2..1d919d9 100755 --- a/dapl/openib_cma/dapl_ib_util.h +++ b/dapl/openib_cma/dapl_ib_util.h @@ -164,18 +164,6 @@ typedef enum } ib_thread_state_t; -/* - * dapl_llist_entry in dapl.h but dapl.h depends on provider - * typedef's in this file first. 
move dapl_llist_entry out of dapl.h - */ -struct ib_llist_entry -{ - struct dapl_llist_entry *flink; - struct dapl_llist_entry *blink; - void *data; - struct dapl_llist_entry *list_head; -}; - struct dapl_cm_id { DAPL_OS_LOCK lock; int destroy; @@ -256,7 +244,7 @@ typedef void (*ib_async_handler_t)( /* ib_hca_transport_t, specific to this implementation */ typedef struct _ib_hca_transport { - struct ib_llist_entry entry; + struct dapl_llist_entry entry; int destroy; struct dapl_hca *d_hca; struct rdma_cm_id *cm_id; diff --git a/dapl/openib_scm/dapl_ib_cm.c b/dapl/openib_scm/dapl_ib_cm.c index 9f845b6..48aa82d 100644 --- a/dapl/openib_scm/dapl_ib_cm.c +++ b/dapl/openib_scm/dapl_ib_cm.c @@ -119,7 +119,10 @@ static void dapli_cm_destroy(struct ib_cm_handle *cm_ptr) dapl_os_unlock(&cm_ptr->lock); /* wakeup work thread */ - write(g_scm_pipe[1], "w", sizeof "w"); + if (write(g_scm_pipe[1], "w", sizeof "w") == -1) + dapl_log(DAPL_DBG_TYPE_UTIL, + " cm_destroy: thread wakeup error = %s\n", + strerror(errno)); } /* queue socket for processing CM work */ @@ -133,7 +136,10 @@ static void dapli_cm_queue(struct ib_cm_handle *cm_ptr) dapl_os_unlock(&cm_ptr->hca->ib_trans.lock); /* wakeup CM work thread */ - write(g_scm_pipe[1], "w", sizeof "w"); + if (write(g_scm_pipe[1], "w", sizeof "w") == -1) + dapl_log(DAPL_DBG_TYPE_UTIL, + " cm_queue: thread wakeup error = %s\n", + strerror(errno)); } static uint16_t dapli_get_lid(IN struct ibv_context *ctx, IN uint8_t port) @@ -167,7 +173,11 @@ dapli_socket_disconnect(ib_cm_handle_t cm_ptr) } else { /* send disc date, close socket, schedule destroy */ if (cm_ptr->socket >= 0) { - write(cm_ptr->socket, &disc_data, sizeof(disc_data)); + if (write(cm_ptr->socket, + &disc_data, sizeof(disc_data)) == -1) + dapl_log(DAPL_DBG_TYPE_WARN, + " cm_disc: write error = %s\n", + strerror(errno)); close(cm_ptr->socket); cm_ptr->socket = -1; } @@ -473,7 +483,10 @@ dapli_socket_connect_rtu(ib_cm_handle_t cm_ptr) dapl_dbg_log(DAPL_DBG_TYPE_EP," connect_rtu: 
send RTU\n"); /* complete handshake after final QP state change */ - write(cm_ptr->socket, &rtu_data, sizeof(rtu_data)); + if (write(cm_ptr->socket, &rtu_data, sizeof(rtu_data)) == -1) + dapl_log(DAPL_DBG_TYPE_UTIL, + " CONN_RTU: write error = %s\n", + strerror(errno)); /* init cm_handle and post the event with private data */ ep_ptr->cm_handle = cm_ptr; @@ -1011,7 +1024,10 @@ dapls_ib_remove_conn_listener ( /* cr_thread will free */ cm_ptr->state = SCM_DESTROY; sp_ptr->cm_srvc_handle = NULL; - write(g_scm_pipe[1], "w", sizeof "w"); + if (write(g_scm_pipe[1], "w", sizeof "w") == -1) + dapl_log(DAPL_DBG_TYPE_UTIL, + " remove_listen: thread wakeup error = %s\n", + strerror(errno)); } return DAT_SUCCESS; } @@ -1106,7 +1122,10 @@ dapls_ib_reject_connection ( /* cr_thread will destroy CR */ cm_ptr->state = SCM_REJECTED; - write(g_scm_pipe[1], "w", sizeof "w"); + if (write(g_scm_pipe[1], "w", sizeof "w") == -1) + dapl_log(DAPL_DBG_TYPE_UTIL, + " reject_connection: thread wakeup error = %s\n", + strerror(errno)); return DAT_SUCCESS; } @@ -1442,7 +1461,10 @@ void cr_thread(void *arg) poll(ufds,idx+1,-1); /* infinite, all sockets and pipe */ /* if pipe used to wakeup, consume */ if (ufds[0].revents == POLLIN) - read(g_scm_pipe[0], rbuf, 2); + if (read(g_scm_pipe[0], rbuf, 2) == -1) + dapl_log(DAPL_DBG_TYPE_CM, + " cr_thread: read pipe error = %s\n", + strerror(errno)); dapl_dbg_log(DAPL_DBG_TYPE_CM," cr_thread: wakeup\n"); dapl_os_lock(&hca_ptr->ib_trans.lock); } diff --git a/dapl/openib_scm/dapl_ib_util.c b/dapl/openib_scm/dapl_ib_util.c index 76bde89..f1f6103 100644 --- a/dapl/openib_scm/dapl_ib_util.c +++ b/dapl/openib_scm/dapl_ib_util.c @@ -359,11 +359,18 @@ DAT_RETURN dapls_ib_close_hca ( IN DAPL_HCA *hca_ptr ) /* destroy cr_thread and lock */ hca_ptr->ib_trans.cr_state = IB_THREAD_CANCEL; - write(g_scm_pipe[1], "w", sizeof "w"); + if (write(g_scm_pipe[1], "w", sizeof "w") == -1) + dapl_log(DAPL_DBG_TYPE_UTIL, + " close_hca: thread wakeup error = %s\n", + 
strerror(errno)); while (hca_ptr->ib_trans.cr_state != IB_THREAD_EXIT) { struct timespec sleep, remain; sleep.tv_sec = 0; sleep.tv_nsec = 2000000; /* 2 ms */ + if (write(g_scm_pipe[1], "w", sizeof "w") == -1) + dapl_log(DAPL_DBG_TYPE_UTIL, + " close_hca: thread wakeup error = %s\n", + strerror(errno)); dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " close_hca: waiting for cr_thread\n"); nanosleep (&sleep, &remain); diff --git a/dapl/openib_scm/dapl_ib_util.h b/dapl/openib_scm/dapl_ib_util.h index 8ed4fac..91a4e67 100644 --- a/dapl/openib_scm/dapl_ib_util.h +++ b/dapl/openib_scm/dapl_ib_util.h @@ -85,18 +85,6 @@ typedef struct _ib_qp_cm union ibv_gid gid; } ib_qp_cm_t; -/* - * dapl_llist_entry in dapl.h but dapl.h depends on provider - * typedef's in this file first. move dapl_llist_entry out of dapl.h - */ -struct ib_llist_entry -{ - struct dapl_llist_entry *flink; - struct dapl_llist_entry *blink; - void *data; - struct dapl_llist_entry *list_head; -}; - typedef enum scm_state { SCM_INIT, @@ -114,7 +102,7 @@ typedef enum scm_state struct ib_cm_handle { - struct ib_llist_entry entry; + struct dapl_llist_entry entry; DAPL_OS_LOCK lock; SCM_STATE state; int socket; -- 1.5.2.5 From arlin.r.davis at intel.com Mon Sep 1 19:26:02 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Mon, 1 Sep 2008 19:26:02 -0700 Subject: [ofa-general] [PATCH 4/5][v1.2] dapl build: add correct CFLAGS for GNU Message-ID: <000301c90ca3$3ed0e590$5464fe0a@amr.corp.intel.com> Signed-off by: Arlin Davis ardavis at ichips.intel.com --- Makefile.am | 10 +++++----- 1 files changed, 5 insertions(+), 5 deletions(-) diff --git a/Makefile.am b/Makefile.am index bccc6ff..29e6b3b 100644 --- a/Makefile.am +++ b/Makefile.am @@ -12,9 +12,9 @@ OSFLAGS += -DREDHAT_EL5 endif if DEBUG -DBGFLAGS = -ggdb -DDAPL_DBG +AM_CFLAGS = -g -Wall -D_GNU_SOURCE -DDAPL_DBG else -DBGFLAGS = -g +AM_CFLAGS = -g -Wall -D_GNU_SOURCE endif datlibdir = $(libdir) @@ -25,17 +25,17 @@ datlib_LTLIBRARIES = dat/udat/libdat.la 
dapllibcma_LTLIBRARIES = dapl/udapl/libdaplcma.la dapllibscm_LTLIBRARIES = dapl/udapl/libdaplscm.la -dat_udat_libdat_la_CFLAGS = -Wall $(DBGFLAGS) -D_GNU_SOURCE $(OSFLAGS) \ +dat_udat_libdat_la_CFLAGS = $(AM_CFLAGS) -D_GNU_SOURCE $(OSFLAGS) \ -I$(srcdir)/dat/include/ -I$(srcdir)/dat/udat/ \ -I$(srcdir)/dat/udat/linux -I$(srcdir)/dat/common/ -dapl_udapl_libdaplcma_la_CFLAGS = -Wall $(DBGFLAGS) -D_GNU_SOURCE $(OSFLAGS) \ +dapl_udapl_libdaplcma_la_CFLAGS = $(AM_CFLAGS) -D_GNU_SOURCE $(OSFLAGS) \ -DOPENIB -DCQ_WAIT_OBJECT \ -I$(srcdir)/dat/include/ -I$(srcdir)/dapl/include/ \ -I$(srcdir)/dapl/common -I$(srcdir)/dapl/udapl/linux \ -I$(srcdir)/dapl/openib_cma -dapl_udapl_libdaplscm_la_CFLAGS = -Wall $(DBGFLAGS) -D_GNU_SOURCE $(OSFLAGS) $(XFLAGS) \ +dapl_udapl_libdaplscm_la_CFLAGS = $(AM_CFLAGS) -D_GNU_SOURCE $(OSFLAGS) $(XFLAGS) \ -DOPENIB -DCQ_WAIT_OBJECT \ -I$(srcdir)/dat/include/ -I$(srcdir)/dapl/include/ \ -I$(srcdir)/dapl/common -I$(srcdir)/dapl/udapl/linux \ -- 1.5.2.5 From arlin.r.davis at intel.com Mon Sep 1 19:33:25 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Mon, 1 Sep 2008 19:33:25 -0700 Subject: [ofa-general] [PATCH 1/5] [v2.0] dtest/dapltest: fix compiler warnings Message-ID: <000501c90ca4$47df0da0$5464fe0a@amr.corp.intel.com> Patch set for DAPL v2.0 to cleanup compiler warnings and fix fedora build issues. 
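As an illustration of the largest change in this patch (dapl_limit.c), here is a minimal sketch of growing a typed array through a plain void * rather than casting the address of a typed pointer to another pointer-to-pointer type. It assumes the warning in question is the type-punning (strict-aliasing) one; grow_array() is a hypothetical stand-in for more_handles(), and malloc/free stand in for the DT_Mdep_* wrappers.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for more_handles(): double the capacity of an
 * array that is tracked through a void * owned by the caller. */
static bool grow_array(void **ptrptr,        /* pointer to current pointer */
                       unsigned int *count,  /* number of elements */
                       unsigned int size)    /* size of one element */
{
    unsigned int old = *count;
    void *tmp = malloc((size_t)old * 2 * size);

    if (!tmp)
        return false;

    memcpy(tmp, *ptrptr, (size_t)old * size);
    free(*ptrptr);
    *ptrptr = tmp;
    *count = old * 2;
    return true;
}
```

The caller keeps both a void * and a typed view of the same block, passes the address of the void *, and refreshes the typed pointer after every successful call. That is the hdlptr = (...)hptr reassignment that appears after each more_handles() call in the diff.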
Signed-off by: Arlin Davis ardavis at ichips.intel.com --- test/dapltest/Makefile.am | 4 +- test/dapltest/cmd/dapl_netaddr.c | 2 +- test/dapltest/test/dapl_limit.c | 125 ++++++++++++++++++-------------------- test/dtest/Makefile.am | 4 +- test/dtest/dtest.c | 35 +++++------ test/dtest/dtestx.c | 20 +++--- 6 files changed, 90 insertions(+), 100 deletions(-) diff --git a/test/dapltest/Makefile.am b/test/dapltest/Makefile.am index 18660c8..fe69d71 100755 --- a/test/dapltest/Makefile.am +++ b/test/dapltest/Makefile.am @@ -4,7 +4,9 @@ else XFLAGS = endif -dapltest_CFLAGS = $(XFLAGS) +AM_CFLAGS = -g -Wall -D_GNU_SOURCE + +dapltest_CFLAGS = $(AM_CFLAGS) $(XFLAGS) INCLUDES = -I include \ -I mdep/linux \ diff --git a/test/dapltest/cmd/dapl_netaddr.c b/test/dapltest/cmd/dapl_netaddr.c index a306335..e1600d5 100644 --- a/test/dapltest/cmd/dapl_netaddr.c +++ b/test/dapltest/cmd/dapl_netaddr.c @@ -90,7 +90,7 @@ DT_NetAddrLookupHostAddress (DAT_IA_ADDRESS_PTR to_netaddr, whatzit = "service unavailable"; break; } -#if !defined(WIN32) +#if !defined(WIN32) && defined(__USE_GNU) case EAI_ADDRFAMILY: { whatzit = "node has no address in this family"; diff --git a/test/dapltest/test/dapl_limit.c b/test/dapltest/test/dapl_limit.c index adf1139..133b3e0 100644 --- a/test/dapltest/test/dapl_limit.c +++ b/test/dapltest/test/dapl_limit.c @@ -36,13 +36,13 @@ static bool more_handles (DT_Tdep_Print_Head *phead, - DAT_HANDLE **old_ptrptr, /* pointer to current pointer */ + void **old_ptrptr, /* pointer to current pointer */ unsigned int *old_count, /* number pointed to */ unsigned int size) /* size of one datum */ { unsigned int count = *old_count; - DAT_HANDLE *old_handles = *old_ptrptr; - DAT_HANDLE *handle_tmp = DT_Mdep_Malloc (count * 2 * size); + void *old_handles = *old_ptrptr; + void *handle_tmp = DT_Mdep_Malloc (count * 2 * size); if (!handle_tmp) { @@ -171,9 +171,9 @@ limit_test ( DT_Tdep_Print_Head *phead, DAT_EVD_HANDLE ia_async_handle; } OneOpen; - unsigned int count = START_COUNT;
- OneOpen *hdlptr = (OneOpen *) - DT_Mdep_Malloc (count * sizeof (*hdlptr)); + unsigned int count = START_COUNT; + void *hptr = DT_Mdep_Malloc (count * sizeof(OneOpen)); + OneOpen *hdlptr = (OneOpen *)hptr; /* IA Exhaustion test loop */ if (hdlptr) @@ -186,14 +186,13 @@ limit_test ( DT_Tdep_Print_Head *phead, { DT_Mdep_Schedule(); if (w == count - && !more_handles (phead, (DAT_HANDLE **) &hdlptr, - &count, - sizeof (*hdlptr))) + && !more_handles (phead, &hptr, &count, sizeof(*hdlptr))) { DT_Tdep_PT_Printf (phead, "%s: IAs opened: %d\n", module, w); retval = true; break; } + hdlptr = (OneOpen *)hptr; /* Specify that we want to get back an async EVD. */ hdlptr[w].ia_async_handle = DAT_HANDLE_NULL; ret = dat_ia_open (cmd->device_name, @@ -270,9 +269,9 @@ limit_test ( DT_Tdep_Print_Head *phead, /* * See how many PZs we can create */ - unsigned int count = START_COUNT; - DAT_PZ_HANDLE *hdlptr = (DAT_PZ_HANDLE *) - DT_Mdep_Malloc (count * sizeof (*hdlptr)); + unsigned int count = START_COUNT; + void *hptr = DT_Mdep_Malloc (count * sizeof(DAT_PZ_HANDLE)); + DAT_PZ_HANDLE *hdlptr = (DAT_PZ_HANDLE *)hptr; /* PZ Exhaustion test loop */ if (hdlptr) @@ -287,14 +286,13 @@ limit_test ( DT_Tdep_Print_Head *phead, { DT_Mdep_Schedule(); if (w == count - && !more_handles (phead, (DAT_HANDLE **) &hdlptr, - &count, - sizeof (*hdlptr))) + && !more_handles(phead, &hptr, &count, sizeof(*hdlptr))) { DT_Tdep_PT_Printf (phead, "%s: PZs created: %d\n", module, w); retval = true; break; } + hdlptr = (DAT_PZ_HANDLE *)hptr; ret = dat_pz_create (hdl_sets[w % cmd->width].ia_handle, &hdlptr[w]); if (ret != DAT_SUCCESS) @@ -368,10 +366,10 @@ limit_test ( DT_Tdep_Print_Head *phead, /* * See how many CNOs we can create */ - unsigned int count = START_COUNT; - DAT_CNO_HANDLE *hdlptr = (DAT_CNO_HANDLE *) - DT_Mdep_Malloc (count * sizeof (*hdlptr)); - + unsigned int count = START_COUNT; + void *hptr = DT_Mdep_Malloc (count * sizeof(DAT_CNO_HANDLE)); + DAT_CNO_HANDLE *hdlptr = (DAT_CNO_HANDLE *)hptr; + 
/* CNO Exhaustion test loop */ if (hdlptr) { @@ -385,14 +383,13 @@ limit_test ( DT_Tdep_Print_Head *phead, { DT_Mdep_Schedule(); if (w == count - && !more_handles (phead, (DAT_HANDLE **) &hdlptr, - &count, - sizeof (*hdlptr))) + && !more_handles(phead, &hptr, &count, sizeof (*hdlptr))) { DT_Tdep_PT_Printf (phead, "%s: CNOs created: %d\n", module, w); retval = true; break; } + hdlptr = (DAT_CNO_HANDLE *)hptr; ret = dat_cno_create (hdl_sets[w % cmd->width].ia_handle, DAT_OS_WAIT_PROXY_AGENT_NULL, &hdlptr[w]); @@ -489,9 +486,10 @@ limit_test ( DT_Tdep_Print_Head *phead, /* * See how many EVDs we can create */ - unsigned int count = START_COUNT; - DAT_EVD_HANDLE *hdlptr = (DAT_EVD_HANDLE *) - DT_Mdep_Malloc (count * sizeof (*hdlptr)); + unsigned int count = START_COUNT; + void *hptr = DT_Mdep_Malloc(count * sizeof(DAT_EVD_HANDLE)); + DAT_EVD_HANDLE *hdlptr = (DAT_EVD_HANDLE *)hptr; + DAT_EVD_FLAGS flags = ( DAT_EVD_DTO_FLAG | DAT_EVD_RMR_BIND_FLAG | DAT_EVD_CR_FLAG); @@ -524,14 +522,13 @@ limit_test ( DT_Tdep_Print_Head *phead, { DT_Mdep_Schedule(); if (w == count - && !more_handles (phead, (DAT_HANDLE **) &hdlptr, - &count, - sizeof (*hdlptr))) + && !more_handles(phead, &hptr, &count, sizeof(*hdlptr))) { DT_Tdep_PT_Printf (phead, "%s: EVDs created: %d\n", module, w); retval = true; break; } + hdlptr = (DAT_EVD_HANDLE *)hptr; ret = DT_Tdep_evd_create (hdl_sets[w % cmd->width].ia_handle, DFLT_QLEN, hdl_sets[w % cmd->width].cno_handle, @@ -608,9 +605,9 @@ limit_test ( DT_Tdep_Print_Head *phead, /* * See how many EPs we can create */ - unsigned int count = START_COUNT; - DAT_EP_HANDLE *hdlptr = (DAT_EP_HANDLE *) - DT_Mdep_Malloc (count * sizeof (*hdlptr)); + unsigned int count = START_COUNT; + void *hptr = DT_Mdep_Malloc(count * sizeof(DAT_EP_HANDLE)); + DAT_EP_HANDLE *hdlptr = (DAT_EP_HANDLE *)hptr; /* EP Exhaustion test loop */ if (hdlptr) @@ -623,14 +620,13 @@ limit_test ( DT_Tdep_Print_Head *phead, { DT_Mdep_Schedule(); if (w == count - && !more_handles (phead, 
(DAT_HANDLE **) &hdlptr, - &count, - sizeof (*hdlptr))) + && !more_handles(phead, &hptr, &count, sizeof(*hdlptr))) { DT_Tdep_PT_Printf (phead, "%s: EPs created: %d\n", module, w); retval = true; break; } + hdlptr = (DAT_EP_HANDLE *)hptr; ret = dat_ep_create (hdl_sets[w % cmd->width].ia_handle, hdl_sets[w % cmd->width].pz_handle, hdl_sets[w % cmd->width].evd_handle, @@ -679,11 +675,11 @@ limit_test ( DT_Tdep_Print_Head *phead, /* * See how many RSPs we can create */ - unsigned int count = START_COUNT; - DAT_RSP_HANDLE *hdlptr = (DAT_RSP_HANDLE *) - DT_Mdep_Malloc (count * sizeof (*hdlptr)); - DAT_EP_HANDLE *epptr = (DAT_EP_HANDLE *) - DT_Mdep_Malloc (count * sizeof (*epptr)); + unsigned int count = START_COUNT; + void *hptr = DT_Mdep_Malloc(count * sizeof (DAT_RSP_HANDLE)); + DAT_RSP_HANDLE *hdlptr = (DAT_RSP_HANDLE *)hptr; + void *eptr = DT_Mdep_Malloc(count * sizeof (DAT_EP_HANDLE)); + DAT_EP_HANDLE *epptr = (DAT_EP_HANDLE *)eptr; /* RSP Exhaustion test loop */ if (hdlptr) @@ -700,23 +696,21 @@ limit_test ( DT_Tdep_Print_Head *phead, unsigned int count1 = count; unsigned int count2 = count; - if (!more_handles (phead, (DAT_HANDLE **) &hdlptr, - &count1, - sizeof (*hdlptr))) + if (!more_handles(phead, &hptr, &count1, sizeof(*hdlptr))) { DT_Tdep_PT_Printf (phead, "%s: RSPs created: %d\n", module, w); retval = true; break; } - if (!more_handles (phead, (DAT_HANDLE **) &epptr, - &count2, - sizeof (*epptr))) + hdlptr = (DAT_RSP_HANDLE *)hptr; + + if (!more_handles (phead, &eptr, &count2, sizeof(*epptr))) { DT_Tdep_PT_Printf (phead, "%s: RSPs created: %d\n", module, w); retval = true; break; } - + epptr = (DAT_EP_HANDLE *)eptr; if (count1 != count2) { DT_Tdep_PT_Printf (phead, "%s: Mismatch in allocation of handle arrays at point %d\n", @@ -815,9 +809,9 @@ limit_test ( DT_Tdep_Print_Head *phead, /* * See how many PSPs we can create */ - unsigned int count = START_COUNT; - DAT_PSP_HANDLE *hdlptr = (DAT_PSP_HANDLE *) - DT_Mdep_Malloc (count * sizeof (*hdlptr)); + unsigned 
int count = START_COUNT; + void *hptr = DT_Mdep_Malloc (count * sizeof (DAT_PSP_HANDLE)); + DAT_PSP_HANDLE *hdlptr = (DAT_PSP_HANDLE *)hptr; /* PSP Exhaustion test loop */ if (hdlptr) @@ -830,14 +824,13 @@ limit_test ( DT_Tdep_Print_Head *phead, { DT_Mdep_Schedule(); if (w == count - && !more_handles (phead, (DAT_HANDLE **) &hdlptr, - &count, - sizeof (*hdlptr))) + && !more_handles (phead, &hptr, &count, sizeof(*hdlptr))) { DT_Tdep_PT_Printf (phead, "%s: PSPs created: %d\n", module, w); retval = true; break; } + hdlptr = (DAT_PSP_HANDLE *)hptr; ret = dat_psp_create (hdl_sets[w % cmd->width].ia_handle, CONN_QUAL0 + w, hdl_sets[w % cmd->width].evd_handle, @@ -941,10 +934,10 @@ limit_test ( DT_Tdep_Print_Head *phead, /* * See how many LMRs we can create */ - unsigned int count = START_COUNT; - Bpool **hdlptr = (Bpool **) - DT_Mdep_Malloc (count * sizeof (*hdlptr)); - + unsigned int count = START_COUNT; + void *hptr = DT_Mdep_Malloc (count * sizeof(Bpool*)); + Bpool **hdlptr = (Bpool **)hptr; + /* LMR Exhaustion test loop */ if (hdlptr) { @@ -956,9 +949,7 @@ limit_test ( DT_Tdep_Print_Head *phead, { DT_Mdep_Schedule(); if (w == count - && !more_handles (phead, (DAT_HANDLE **) &hdlptr, - &count, - sizeof (*hdlptr))) + && !more_handles (phead, &hptr, &count, sizeof(*hdlptr))) { DT_Tdep_PT_Printf (phead, "%s: no memory for LMR handles\n", module); @@ -966,6 +957,7 @@ limit_test ( DT_Tdep_Print_Head *phead, retval = true; break; } + hdlptr = (Bpool **)hptr; /* * Let BpoolAlloc do the hard work; this means that * we're testing unique memory registrations rather @@ -1017,14 +1009,15 @@ limit_test ( DT_Tdep_Print_Head *phead, * We are posting the same buffer 'cnt' times, deliberately, * but that should be OK. 
*/ - unsigned int count = START_COUNT; - DAT_LMR_TRIPLET *hdlptr = (DAT_LMR_TRIPLET *) - DT_Mdep_Malloc (count * cmd->width * sizeof (*hdlptr)); + unsigned int count = START_COUNT; + void *hptr = + DT_Mdep_Malloc(count * cmd->width * sizeof(DAT_LMR_TRIPLET)); + DAT_LMR_TRIPLET *hdlptr = (DAT_LMR_TRIPLET *)hptr; /* Recv-Post Exhaustion test loop */ if (hdlptr) { - unsigned int w = 0; + unsigned int w = 0; unsigned int i = 0; unsigned int done = 0; @@ -1033,9 +1026,8 @@ limit_test ( DT_Tdep_Print_Head *phead, { DT_Mdep_Schedule(); if (w == count - && !more_handles (phead, (DAT_HANDLE **) &hdlptr, - &count, - cmd->width * sizeof (*hdlptr))) + && !more_handles (phead, &hptr, &count, + cmd->width * sizeof(*hdlptr))) { DT_Tdep_PT_Printf (phead, "%s: no memory for IOVs \n", module); @@ -1047,6 +1039,7 @@ limit_test ( DT_Tdep_Print_Head *phead, done = retval = true; break; } + hdlptr = (DAT_LMR_TRIPLET *)hptr; for (i = 0; i < cmd->width; i++) { DAT_LMR_TRIPLET *iovp = &hdlptr[w * cmd->width + i]; diff --git a/test/dtest/Makefile.am b/test/dtest/Makefile.am index aabd026..90c9d95 100755 --- a/test/dtest/Makefile.am +++ b/test/dtest/Makefile.am @@ -1,11 +1,11 @@ bin_PROGRAMS = dtest dtest_SOURCES = dtest.c +dtest_CFLAGS = -g -Wall -D_GNU_SOURCE if EXT_TYPE_IB bin_PROGRAMS += dtestx dtestx_SOURCES = dtestx.c -dtest_CFLAGS = -DDAT_EXTENSIONS -dtestx_CFLAGS = -DDAT_EXTENSIONS +dtestx_CFLAGS = -g -Wall -D_GNU_SOURCE -DDAT_EXTENSIONS dtestx_LDADD = $(srcdir)/../../dat/udat/libdat2.la endif diff --git a/test/dtest/dtest.c b/test/dtest/dtest.c index 095ff40..00d14e3 100755 --- a/test/dtest/dtest.c +++ b/test/dtest/dtest.c @@ -207,7 +207,6 @@ struct dt_time { struct dt_time time; /* defaults */ -static int parent=1; static int failed=0; static int performance_times=0; static int connected=0; @@ -218,17 +217,13 @@ static int polling=0; static int poll_count=0; static int rdma_wr_poll_count=0; static int rdma_rd_poll_count[MAX_RDMA_RD]={0}; -static int pin_memory=0; static int 
delay=0; static int buf_len=RDMA_BUFFER_SIZE; static int use_cno=0; -static int post_recv_count=MSG_BUF_COUNT; static int recv_msg_index=0; static int burst_msg_posted=0; static int burst_msg_index=0; -static int child[MAX_PROCS+1]; - /* forward prototypes */ const char * DT_RetToString (DAT_RETURN ret_value); const char * DT_EventToSTr (DAT_EVENT_NUMBER event_code); @@ -254,6 +249,7 @@ DAT_RETURN do_ping_pong_msg( void ); #define LOGPRINTF if (verbose) printf +int main(int argc, char **argv) { int i,c; @@ -446,7 +442,7 @@ main(int argc, char **argv) inet_ntop(AF_INET, &((struct sockaddr_in *)ep_param.local_ia_address_ptr)->sin_addr, addr_str, sizeof(addr_str)); - printf("\n%d Query EP: LOCAL addr %s port %lld\n", getpid(), + printf("\n%d Query EP: LOCAL addr %s port "F64x"\n", getpid(), addr_str, (ep_param.local_port_qual)); #endif #if defined(_WIN32) @@ -458,7 +454,7 @@ main(int argc, char **argv) inet_ntop(AF_INET, &((struct sockaddr_in *)ep_param.remote_ia_address_ptr)->sin_addr, addr_str, sizeof(addr_str)); - printf("%d Query EP: REMOTE addr %s port %lld\n", getpid(), + printf("%d Query EP: REMOTE addr %s port "F64x"\n", getpid(), addr_str, (ep_param.remote_port_qual)); #endif fflush(stdout); @@ -615,6 +611,7 @@ complete: #if defined(_WIN32) || defined(_WIN64) WSACleanup(); #endif + return(0); } #if defined(_WIN32) || defined(_WIN64) @@ -750,7 +747,7 @@ connect_ep( char *hostname, DAT_CONN_QUAL conn_id ) /* Register send message buffer */ LOGPRINTF("%d Registering send Message Buffer %p, len %d\n", - getpid(), &rmr_send_msg, sizeof(DAT_RMR_TRIPLET) ); + getpid(), &rmr_send_msg, (int)sizeof(DAT_RMR_TRIPLET) ); region.for_va = &rmr_send_msg; ret = dat_lmr_create( h_ia, DAT_MEM_TYPE_VIRTUAL, @@ -848,8 +845,8 @@ connect_ep( char *hostname, DAT_CONN_QUAL conn_id ) else LOGPRINTF("%d dat_psp_created for server listen\n", getpid()); - printf("%d Server waiting for connect request on port %lld\n", - getpid(),conn_id); + printf("%d Server waiting for connect request on 
port "F64x"\n", + getpid(), conn_id); ret = dat_evd_wait( h_cr_evd, SERVER_TIMEOUT, 1, &event, &nmore ); if(ret != DAT_SUCCESS) { @@ -936,7 +933,7 @@ connect_ep( char *hostname, DAT_CONN_QUAL conn_id ) rval = ((struct sockaddr_in *)target->ai_addr)->sin_addr.s_addr; #endif printf ("%d Server Name: %s \n", getpid(), hostname); - printf ("%d Server Net Address: %d.%d.%d.%d port %lld\n", getpid(), + printf ("%d Server Net Address: %d.%d.%d.%d port "F64x"\n", getpid(), (rval >> 0) & 0xff, (rval >> 8) & 0xff, (rval >> 16) & 0xff, (rval >> 24) & 0xff, conn_id); @@ -1099,8 +1096,8 @@ connect_ep( char *hostname, DAT_CONN_QUAL conn_id ) recv_msg_index) ) { fprintf(stderr,"ERR recv event: len=%d cookie="F64x" expected %d/%d\n", (int)event.event_data.dto_completion_event_data.transfered_length, - (int)event.event_data.dto_completion_event_data.user_cookie.as_64, - sizeof(DAT_RMR_TRIPLET), recv_msg_index ); + event.event_data.dto_completion_event_data.user_cookie.as_64, + (int)sizeof(DAT_RMR_TRIPLET), recv_msg_index ); return( DAT_ABORT ); } @@ -1322,8 +1319,8 @@ do_rdma_write_with_msg( void ) (event.event_data.dto_completion_event_data.user_cookie.as_64 != recv_msg_index) ) { fprintf(stderr,"unexpected event data for receive: len=%d cookie="F64x" exp %d/%d\n", (int)event.event_data.dto_completion_event_data.transfered_length, - (int)event.event_data.dto_completion_event_data.user_cookie.as_64, - sizeof(DAT_RMR_TRIPLET), recv_msg_index ); + event.event_data.dto_completion_event_data.user_cookie.as_64, + (int)sizeof(DAT_RMR_TRIPLET), recv_msg_index ); return( DAT_ABORT ); } @@ -1515,8 +1512,8 @@ do_rdma_read_with_msg( void ) fprintf(stderr,"unexpected event data for receive: len=%d cookie="F64x" exp %d/%d\n", (int)event.event_data.dto_completion_event_data.transfered_length, - (int)event.event_data.dto_completion_event_data.user_cookie.as_64, - sizeof(DAT_RMR_TRIPLET), recv_msg_index ); + event.event_data.dto_completion_event_data.user_cookie.as_64, + 
(int)sizeof(DAT_RMR_TRIPLET), recv_msg_index ); return( DAT_ABORT ); } @@ -1678,8 +1675,8 @@ do_ping_pong_msg( ) != burst_msg_index) ) { fprintf(stderr,"ERR: recv event: len=%d cookie="F64x" exp %d/%d\n", (int)event.event_data.dto_completion_event_data.transfered_length, - (int)event.event_data.dto_completion_event_data.user_cookie.as_64, - buf_len, burst_msg_index ); + event.event_data.dto_completion_event_data.user_cookie.as_64, + (int)buf_len, (int)burst_msg_index ); return( DAT_ABORT ); } diff --git a/test/dtest/dtestx.c b/test/dtest/dtestx.c index e568aac..fb89364 100755 --- a/test/dtest/dtestx.c +++ b/test/dtest/dtestx.c @@ -439,7 +439,7 @@ connect_ep(char *hostname) r_iov->virtual_address = hton64((DAT_VADDR)buf[RCV_RDMA_BUF_INDEX]); r_iov->segment_length = hton32(buf_size); - printf("%d Send RMR msg to remote: r_key_ctx=0x%x,va=%p,len=0x%x\n", + printf("%d Send RMR msg to remote: r_key_ctx=0x%x,va="F64x",len=0x%x\n", getpid(), hton32(r_iov->rmr_context), hton64(r_iov->virtual_address), hton32(r_iov->segment_length)); @@ -545,16 +545,14 @@ disconnect_ep(void) int do_immediate() { - DAT_REGION_DESCRIPTION region; DAT_EVENT event; DAT_COUNT nmore; DAT_LMR_TRIPLET iov; DAT_RMR_TRIPLET r_iov; DAT_DTO_COOKIE cookie; - DAT_RMR_CONTEXT their_context; DAT_RETURN status; DAT_UINT32 immed_data; - DAT_UINT32 immed_data_recv; + DAT_UINT32 immed_data_recv = 0; DAT_DTO_COMPLETION_EVENT_DATA *dto_event = &event.event_data.dto_completion_event_data; DAT_IB_EXTENSION_EVENT_DATA *ext_event = @@ -620,10 +618,10 @@ do_immediate() (dto_event->user_cookie.as_64 != RECV_BUF_INDEX+1)) { printf("unexpected event data of immediate write: len=%d " - "cookie=%d expected %d/%d\n", + "cookie="F64x" expected %d/%d\n", (int)dto_event->transfered_length, - (int)dto_event->user_cookie.as_64, - sizeof(int), RECV_BUF_INDEX+1); + dto_event->user_cookie.as_64, + (int)sizeof(int), RECV_BUF_INDEX+1); exit(1); } @@ -669,10 +667,10 @@ do_immediate() (dto_event->user_cookie.as_64 != 
RECV_BUF_INDEX+1)) { printf("unexpected event data of immediate write: len=%d " - "cookie=%d expected %d/%d\n", + "cookie="F64x" expected %d/%d\n", (int)dto_event->transfered_length, - (int)dto_event->user_cookie.as_64, - sizeof(int), RECV_BUF_INDEX+1); + dto_event->user_cookie.as_64, + (int)sizeof(int), RECV_BUF_INDEX+1); exit(1); } @@ -705,7 +703,7 @@ do_immediate() printf("Client received immed_data=0x%x\n",immed_data_recv); printf("rdma buffer %p contains: %s\n", - buf[ RCV_RDMA_BUF_INDEX ], buf[ RCV_RDMA_BUF_INDEX ]); + buf[RCV_RDMA_BUF_INDEX], (char*)buf[RCV_RDMA_BUF_INDEX]); printf("\n RDMA_WRITE_WITH_IMMEDIATE_DATA test - PASSED\n"); return (0); -- 1.5.2.5 From arlin.r.davis at intel.com Mon Sep 1 19:33:31 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Mon, 1 Sep 2008 19:33:31 -0700 Subject: [ofa-general] [PATCH 3/5] [v2.0] dat: fix compiler warnings in dat common code Message-ID: <000601c90ca4$4aedad30$5464fe0a@amr.corp.intel.com> dat: fix compiler warnings in dat common code Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dat/common/dat_dr.c | 26 +++++++++++--------------- dat/common/dat_sr.c | 19 ++++++++++++------- 2 files changed, 23 insertions(+), 22 deletions(-) diff --git a/dat/common/dat_dr.c b/dat/common/dat_dr.c index bda3002..f7e9ffd 100644 --- a/dat/common/dat_dr.c +++ b/dat/common/dat_dr.c @@ -173,16 +173,16 @@ DAT_RETURN dat_dr_remove ( IN const DAT_PROVIDER_INFO *info ) { - DAT_DR_ENTRY *data; DAT_DICTIONARY_ENTRY dict_entry; DAT_RETURN status; + DAT_DICTIONARY_DATA data; dict_entry = NULL; dat_os_lock (&g_dr_lock); status = dat_dictionary_search ( g_dr_dictionary, info, - (DAT_DICTIONARY_DATA *) &data); + &data); if ( DAT_SUCCESS != status ) { @@ -190,7 +190,7 @@ dat_dr_remove ( goto bail; } - if ( 0 != data->ref_count ) + if ( 0 != ((DAT_DR_ENTRY*)data)->ref_count ) { status = DAT_ERROR (DAT_PROVIDER_IN_USE, 0); goto bail; @@ -199,7 +199,7 @@ dat_dr_remove ( status = dat_dictionary_remove ( g_dr_dictionary, 
&dict_entry, info, - (DAT_DICTIONARY_DATA *) &data); + &data); if ( DAT_SUCCESS != status ) { /* return status from dat_dictionary_remove() */ @@ -230,20 +230,18 @@ dat_dr_provider_open ( OUT DAT_IA_OPEN_FUNC *p_ia_open_func ) { DAT_RETURN status; - DAT_DR_ENTRY *data; + DAT_DICTIONARY_DATA data; dat_os_lock (&g_dr_lock); - status = dat_dictionary_search ( g_dr_dictionary, info, - (DAT_DICTIONARY_DATA *) &data); - + &data); dat_os_unlock (&g_dr_lock); if ( DAT_SUCCESS == status ) { - data->ref_count++; - *p_ia_open_func = data->ia_open_func; + ((DAT_DR_ENTRY*)data)->ref_count++; + *p_ia_open_func = ((DAT_DR_ENTRY*)data)->ia_open_func; } return status; @@ -259,19 +257,17 @@ dat_dr_provider_close ( IN const DAT_PROVIDER_INFO *info ) { DAT_RETURN status; - DAT_DR_ENTRY *data; + DAT_DICTIONARY_DATA data; dat_os_lock (&g_dr_lock); - status = dat_dictionary_search ( g_dr_dictionary, info, - (DAT_DICTIONARY_DATA *) &data); - + &data); dat_os_unlock (&g_dr_lock); if ( DAT_SUCCESS == status ) { - data->ref_count--; + ((DAT_DR_ENTRY*)data)->ref_count--; } return status; diff --git a/dat/common/dat_sr.c b/dat/common/dat_sr.c index 05be499..10319b9 100755 --- a/dat/common/dat_sr.c +++ b/dat/common/dat_sr.c @@ -129,12 +129,13 @@ dat_sr_insert ( IN DAT_SR_ENTRY *entry ) { DAT_RETURN status; - DAT_SR_ENTRY *data, *prev_data; + DAT_SR_ENTRY *data; DAT_OS_SIZE lib_path_size; DAT_OS_SIZE lib_path_len; DAT_OS_SIZE ia_params_size; DAT_OS_SIZE ia_params_len; DAT_DICTIONARY_ENTRY dict_entry; + DAT_DICTIONARY_DATA prev_data; if ( NULL == (data = dat_os_alloc (sizeof (DAT_SR_ENTRY))) ) { @@ -184,7 +185,7 @@ dat_sr_insert ( status = dat_dictionary_search (g_sr_dictionary, info, - (DAT_DICTIONARY_DATA *) &prev_data); + &prev_data); if ( DAT_SUCCESS == status ) { /* We already have a dictionary entry, so we don't need a new one. 
@@ -196,12 +197,12 @@ dat_sr_insert ( dict_entry = NULL; /* Find the next available slot in this chain */ - while (NULL != prev_data->next) + while (NULL != ((DAT_SR_ENTRY*)prev_data)->next) { - prev_data = prev_data->next; + prev_data = ((DAT_SR_ENTRY*)prev_data)->next; } dat_os_assert (NULL != prev_data); - prev_data->next = data; + ((DAT_SR_ENTRY*)prev_data)->next = data; } else { @@ -350,15 +351,17 @@ dat_sr_provider_open ( { DAT_RETURN status; DAT_SR_ENTRY *data; + DAT_DICTIONARY_DATA dict_data; dat_os_lock (&g_sr_lock); status = dat_dictionary_search (g_sr_dictionary, info, - (DAT_DICTIONARY_DATA *) &data); + &dict_data); if ( DAT_SUCCESS == status ) { + data = (DAT_SR_ENTRY*)dict_data; while (data != NULL) { if ( 0 == data->ref_count ) @@ -450,15 +453,17 @@ dat_sr_provider_close ( { DAT_RETURN status; DAT_SR_ENTRY *data; + DAT_DICTIONARY_DATA dict_data; dat_os_lock (&g_sr_lock); status = dat_dictionary_search (g_sr_dictionary, info, - (DAT_DICTIONARY_DATA *) &data); + &dict_data); if ( DAT_SUCCESS == status ) { + data = (DAT_SR_ENTRY*)dict_data; while (data != NULL) { if ( 1 == data->ref_count ) -- 1.5.2.5 From arlin.r.davis at intel.com Mon Sep 1 19:33:31 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Mon, 1 Sep 2008 19:33:31 -0700 Subject: [ofa-general] [PATCH 2/5] [v2.0] dapl: fix compiler warnings in common code Message-ID: <000701c90ca4$4bad2ca0$5464fe0a@amr.corp.intel.com> dapl: fix compiler warnings in common code Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/common/dapl_ep_get_status.c | 1 + dapl/common/dapl_ep_modify.c | 2 +- dapl/common/dapl_rmr_bind.c | 4 +++- dapl/udapl/dapl_evd_wait.c | 17 ++++++++++------- dapl/udapl/dapl_lmr_create.c | 5 +++-- 5 files changed, 18 insertions(+), 11 deletions(-) diff --git a/dapl/common/dapl_ep_get_status.c b/dapl/common/dapl_ep_get_status.c index 853afff..3af7f9a 100644 --- a/dapl/common/dapl_ep_get_status.c +++ b/dapl/common/dapl_ep_get_status.c @@ -38,6 +38,7 @@ #include "dapl.h" 
#include "dapl_ring_buffer_util.h" +#include "dapl_cookie.h" /* * dapl_ep_get_status diff --git a/dapl/common/dapl_ep_modify.c b/dapl/common/dapl_ep_modify.c index 05aa0ad..fff21a0 100644 --- a/dapl/common/dapl_ep_modify.c +++ b/dapl/common/dapl_ep_modify.c @@ -84,7 +84,7 @@ dapl_ep_modify ( { DAPL_IA *ia; DAPL_EP *ep1, *ep2; - DAT_EP_ATTR ep_attr1, ep_attr2; + DAT_EP_ATTR ep_attr1 = {0}, ep_attr2 = {0}; DAPL_EP new_ep, copy_of_old_ep; DAPL_EP alloc_ep; /* Holder for resources. */ DAPL_PZ *tmp_pz; diff --git a/dapl/common/dapl_rmr_bind.c b/dapl/common/dapl_rmr_bind.c index 12a98f6..e4b8ecb 100755 --- a/dapl/common/dapl_rmr_bind.c +++ b/dapl/common/dapl_rmr_bind.c @@ -84,15 +84,17 @@ dapli_rmr_bind_fuse ( DAPL_COOKIE *cookie; DAT_RETURN dat_status; DAT_BOOLEAN is_signaled; + DAPL_HASH_DATA hash_lmr; dat_status = dapls_hash_search (rmr->header.owner_ia->hca_ptr->lmr_hash_table, lmr_triplet->lmr_context, - (DAPL_HASH_DATA *)&lmr); + &hash_lmr); if ( DAT_SUCCESS != dat_status) { dat_status = DAT_ERROR (DAT_INVALID_PARAMETER, DAT_INVALID_ARG2); goto bail; } + lmr = (DAPL_LMR*)hash_lmr; /* if the ep in unconnected return an error. 
IB requires that the */ /* QP be connected to change a memory window binding since: */ diff --git a/dapl/udapl/dapl_evd_wait.c b/dapl/udapl/dapl_evd_wait.c index 42a51a7..578041a 100644 --- a/dapl/udapl/dapl_evd_wait.c +++ b/dapl/udapl/dapl_evd_wait.c @@ -141,27 +141,30 @@ DAT_RETURN DAT_API dapl_evd_wait ( waitable = evd_ptr->evd_waitable; dapl_os_assert ( sizeof(DAT_COUNT) == sizeof(DAPL_EVD_STATE) ); - evd_state = dapl_os_atomic_assign ( (DAPL_ATOMIC *)&evd_ptr->evd_state, - (DAT_COUNT) DAPL_EVD_STATE_OPEN, - (DAT_COUNT) DAPL_EVD_STATE_WAITED ); - dapl_os_unlock ( &evd_ptr->header.lock ); + evd_state = evd_ptr->evd_state; + if (evd_ptr->evd_state == DAPL_EVD_STATE_OPEN) + evd_ptr->evd_state = DAPL_EVD_STATE_WAITED; if ( evd_state != DAPL_EVD_STATE_OPEN ) { /* Bogus state, bail out */ dat_status = DAT_ERROR (DAT_INVALID_STATE,0); + dapl_os_unlock ( &evd_ptr->header.lock ); goto bail; } if (!waitable) { /* This EVD is not waitable, reset the state and bail */ - (void) dapl_os_atomic_assign ((DAPL_ATOMIC *)&evd_ptr->evd_state, - (DAT_COUNT) DAPL_EVD_STATE_WAITED, - evd_state); + if (evd_ptr->evd_state == DAPL_EVD_STATE_WAITED) + evd_ptr->evd_state = evd_state; + dat_status = DAT_ERROR (DAT_INVALID_STATE, DAT_INVALID_STATE_EVD_UNWAITABLE); + dapl_os_unlock ( &evd_ptr->header.lock ); goto bail; } + dapl_os_unlock ( &evd_ptr->header.lock ); + /* * We now own the EVD, even though we don't have the lock anymore, diff --git a/dapl/udapl/dapl_lmr_create.c b/dapl/udapl/dapl_lmr_create.c index 350abe0..99b184a 100644 --- a/dapl/udapl/dapl_lmr_create.c +++ b/dapl/udapl/dapl_lmr_create.c @@ -208,6 +208,7 @@ dapli_lmr_create_lmr ( DAPL_LMR *lmr; DAT_REGION_DESCRIPTION reg_desc; DAT_RETURN dat_status; + DAPL_HASH_DATA hash_lmr; dapl_dbg_log (DAPL_DBG_TYPE_API, "dapl_lmr_create_lmr (%p, %p, %p, %x, %x, %p, %p, %p, %p)\n", @@ -221,13 +222,13 @@ dapli_lmr_create_lmr ( dat_status = dapls_hash_search (ia->hca_ptr->lmr_hash_table, original_lmr->param.lmr_context, - (DAPL_HASH_DATA *) 
&lmr); + &hash_lmr); if ( dat_status != DAT_SUCCESS ) { dat_status = DAT_ERROR (DAT_INVALID_PARAMETER,DAT_INVALID_ARG2); goto bail; } - + lmr = (DAPL_LMR*)hash_lmr; reg_desc.for_lmr_handle = (DAT_LMR_HANDLE) original_lmr; lmr = dapl_lmr_alloc (ia, -- 1.5.2.5 From arlin.r.davis at intel.com Mon Sep 1 19:33:36 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Mon, 1 Sep 2008 19:33:36 -0700 Subject: [ofa-general] [PATCH 4/5] [v2.0] dapl providers: fix compiler warnings in cma and scm providers Message-ID: <000801c90ca4$4d6a45f0$5464fe0a@amr.corp.intel.com> dapl providers: cleanup all compiler warnings in cma and scm providers Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/include/dapl.h | 41 +++++++++++++++++++++------------------ dapl/openib_cma/dapl_ib_cq.c | 5 +++- dapl/openib_cma/dapl_ib_dto.h | 2 +- dapl/openib_cma/dapl_ib_util.c | 33 ++++++++++++++++++++++++------- dapl/openib_cma/dapl_ib_util.h | 16 +------------- dapl/openib_scm/dapl_ib_cm.c | 38 +++++++++++++++++++++++++++++------- dapl/openib_scm/dapl_ib_dto.h | 2 +- dapl/openib_scm/dapl_ib_util.c | 9 +++++++- dapl/openib_scm/dapl_ib_util.h | 14 +------------ 9 files changed, 94 insertions(+), 66 deletions(-) diff --git a/dapl/include/dapl.h b/dapl/include/dapl.h index f0f2095..58af95d 100755 --- a/dapl/include/dapl.h +++ b/dapl/include/dapl.h @@ -53,20 +53,7 @@ #include "dapl_osd.h" #include "dapl_debug.h" -#ifdef IBAPI -#include "dapl_ibapi_util.h" -#elif VAPI -#include "dapl_vapi_util.h" -#elif __OPENIB__ -#include "dapl_openib_util.h" -#include "dapl_openib_cm.h" -#elif DUMMY -#include "dapl_dummy_util.h" -#elif OPENIB -#include "dapl_ib_util.h" -#else /* windows - IBAL and/or IBAL+Sock_CM */ -#include "dapl_ibal_util.h" -#endif + /********************************************************************* * * @@ -231,11 +218,6 @@ typedef struct dapl_rmr_cookie DAPL_RMR_COOKIE; typedef struct dapl_private DAPL_PRIVATE; -typedef void (*DAPL_CONNECTION_STATE_HANDLER) ( - IN DAPL_EP *, - 
IN ib_cm_events_t, - IN const void *, - OUT DAT_EVENT *); /********************************************************************* @@ -268,6 +250,21 @@ struct dapl_cookie_buffer DAPL_ATOMIC tail; }; +#ifdef IBAPI +#include "dapl_ibapi_util.h" +#elif VAPI +#include "dapl_vapi_util.h" +#elif __OPENIB__ +#include "dapl_openib_util.h" +#include "dapl_openib_cm.h" +#elif DUMMY +#include "dapl_dummy_util.h" +#elif OPENIB +#include "dapl_ib_util.h" +#else /* windows - IBAL and/or IBAL+Sock_CM */ +#include "dapl_ibal_util.h" +#endif + struct dapl_hca { DAPL_OS_LOCK lock; @@ -701,6 +698,12 @@ void dapls_io_trc_dump ( * * *********************************************************************/ +typedef void (*DAPL_CONNECTION_STATE_HANDLER) ( + IN DAPL_EP *, + IN ib_cm_events_t, + IN const void *, + OUT DAT_EVENT *); + /* * DAT Mandated functions */ diff --git a/dapl/openib_cma/dapl_ib_cq.c b/dapl/openib_cma/dapl_ib_cq.c index d7b3309..742c247 100755 --- a/dapl/openib_cma/dapl_ib_cq.c +++ b/dapl/openib_cma/dapl_ib_cq.c @@ -486,7 +486,10 @@ dapls_ib_wait_object_wakeup (IN ib_wait_obj_handle_t p_cq_wait_obj_handle) p_cq_wait_obj_handle ); /* write to pipe for wake up */ - write(p_cq_wait_obj_handle->pipe[1], "w", sizeof "w"); + if (write(p_cq_wait_obj_handle->pipe[1], "w", sizeof "w") == -1) + dapl_log(DAPL_DBG_TYPE_UTIL, + " wait object wakeup write error = %s\n", + strerror(errno)); return DAT_SUCCESS; } diff --git a/dapl/openib_cma/dapl_ib_dto.h b/dapl/openib_cma/dapl_ib_dto.h index 334fa4b..2b01963 100644 --- a/dapl/openib_cma/dapl_ib_dto.h +++ b/dapl/openib_cma/dapl_ib_dto.h @@ -304,7 +304,7 @@ dapls_ib_post_ext_send ( remote_iov, completion_flags); ib_data_segment_t ds_array[DEFAULT_DS_ENTRIES]; - ib_data_segment_t *ds_array_p, *ds_array_start_p; + ib_data_segment_t *ds_array_p, *ds_array_start_p = NULL; struct ibv_send_wr wr; struct ibv_send_wr *bad_wr; DAT_COUNT i, total_len; diff --git a/dapl/openib_cma/dapl_ib_util.c b/dapl/openib_cma/dapl_ib_util.c index 4bbeb8b..a8e1fe3 
100755 --- a/dapl/openib_cma/dapl_ib_util.c +++ b/dapl/openib_cma/dapl_ib_util.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2005-2007 Intel Corporation. All rights reserved. + * Copyright (c) 2005-2008 Intel Corporation. All rights reserved. * * This Software is licensed under one of the following licenses: * @@ -317,7 +317,10 @@ DAT_RETURN dapls_ib_open_hca(IN IB_HCA_NAME hca_name, IN DAPL_HCA *hca_ptr) dapl_llist_add_tail(&g_hca_list, (DAPL_LLIST_ENTRY*)&hca_ptr->ib_trans.entry, &hca_ptr->ib_trans.entry); - write(g_ib_pipe[1], "w", sizeof "w"); + if (write(g_ib_pipe[1], "w", sizeof "w") == -1) + dapl_log(DAPL_DBG_TYPE_UTIL, + " open_hca: thread wakeup error = %s\n", + strerror(errno)); dapl_os_unlock(&g_hca_lock); dapl_dbg_log( @@ -384,14 +387,20 @@ DAT_RETURN dapls_ib_close_hca(IN DAPL_HCA *hca_ptr) * Wakeup work thread to remove from polling list */ hca_ptr->ib_trans.destroy = 1; - write(g_ib_pipe[1], "w", sizeof "w"); + if (write(g_ib_pipe[1], "w", sizeof "w") == -1) + dapl_log(DAPL_DBG_TYPE_UTIL, + " destroy: thread wakeup error = %s\n", + strerror(errno)); /* wait for thread to remove HCA references */ while (hca_ptr->ib_trans.destroy != 2) { struct timespec sleep, remain; sleep.tv_sec = 0; sleep.tv_nsec = 10000000; /* 10 ms */ - write(g_ib_pipe[1], "w", sizeof "w"); + if (write(g_ib_pipe[1], "w", sizeof "w") == -1) + dapl_log(DAPL_DBG_TYPE_UTIL, + " destroy: thread wakeup error = %s\n", + strerror(errno)); dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " ib_thread_destroy: wait on hca %p destroy\n"); nanosleep (&sleep, &remain); @@ -670,14 +679,20 @@ void dapli_ib_thread_destroy(void) goto bail; g_ib_thread_state = IB_THREAD_CANCEL; - write(g_ib_pipe[1], "w", sizeof "w"); + if (write(g_ib_pipe[1], "w", sizeof "w") == -1) + dapl_log(DAPL_DBG_TYPE_UTIL, + " destroy: thread wakeup error = %s\n", + strerror(errno)); while ((g_ib_thread_state != IB_THREAD_EXIT) && (retries--)) { struct timespec sleep, remain; sleep.tv_sec = 0; sleep.tv_nsec = 2000000; /* 2 ms */ 
dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " ib_thread_destroy: waiting for ib_thread\n"); - write(g_ib_pipe[1], "w", sizeof "w"); + if (write(g_ib_pipe[1], "w", sizeof "w") == -1) + dapl_log(DAPL_DBG_TYPE_UTIL, + " destroy: thread wakeup error = %s\n", + strerror(errno)); dapl_os_unlock( &g_hca_lock ); nanosleep(&sleep, &remain); dapl_os_lock( &g_hca_lock ); @@ -890,9 +905,11 @@ void dapli_thread(void *arg) /* check and process user events, PIPE */ if (ufds[0].revents == POLLIN) { + if (read(g_ib_pipe[0], rbuf, 2) == -1) + dapl_log(DAPL_DBG_TYPE_UTIL, + " cr_thread: pipe rd err= %s\n", + strerror(errno)); - read(g_ib_pipe[0], rbuf, 2); - /* cleanup any device on list marked for destroy */ for(idx=3;idxdestroy == 1) { diff --git a/dapl/openib_cma/dapl_ib_util.h b/dapl/openib_cma/dapl_ib_util.h index 3368180..1e13db9 100755 --- a/dapl/openib_cma/dapl_ib_util.h +++ b/dapl/openib_cma/dapl_ib_util.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2005-2007 Intel Corporation. All rights reserved. + * Copyright (c) 2005-2008 Intel Corporation. All rights reserved. * * This Software is licensed under one of the following licenses: * @@ -155,18 +155,6 @@ typedef enum } ib_thread_state_t; -/* - * dapl_llist_entry in dapl.h but dapl.h depends on provider - * typedef's in this file first. 
move dapl_llist_entry out of dapl.h - */ -struct ib_llist_entry -{ - struct dapl_llist_entry *flink; - struct dapl_llist_entry *blink; - void *data; - struct dapl_llist_entry *list_head; -}; - struct dapl_cm_id { DAPL_OS_LOCK lock; int destroy; @@ -247,7 +235,7 @@ typedef void (*ib_async_handler_t)( /* ib_hca_transport_t, specific to this implementation */ typedef struct _ib_hca_transport { - struct ib_llist_entry entry; + struct dapl_llist_entry entry; int destroy; struct dapl_hca *d_hca; struct rdma_cm_id *cm_id; diff --git a/dapl/openib_scm/dapl_ib_cm.c b/dapl/openib_scm/dapl_ib_cm.c index d2982c7..cf5891d 100644 --- a/dapl/openib_scm/dapl_ib_cm.c +++ b/dapl/openib_scm/dapl_ib_cm.c @@ -119,7 +119,10 @@ static void dapli_cm_destroy(struct ib_cm_handle *cm_ptr) dapl_os_unlock(&cm_ptr->lock); /* wakeup work thread */ - write(g_scm_pipe[1], "w", sizeof "w"); + if (write(g_scm_pipe[1], "w", sizeof "w") == -1) + dapl_log(DAPL_DBG_TYPE_CM, + " cm_destroy: thread wakeup error = %s\n", + strerror(errno)); } /* queue socket for processing CM work */ @@ -133,7 +136,10 @@ static void dapli_cm_queue(struct ib_cm_handle *cm_ptr) dapl_os_unlock(&cm_ptr->hca->ib_trans.lock); /* wakeup CM work thread */ - write(g_scm_pipe[1], "w", sizeof "w"); + if (write(g_scm_pipe[1], "w", sizeof "w") == -1) + dapl_log(DAPL_DBG_TYPE_CM, + " cm_queue: thread wakeup error = %s\n", + strerror(errno)); } static uint16_t dapli_get_lid(IN struct ibv_context *ctx, IN uint8_t port) @@ -167,7 +173,11 @@ dapli_socket_disconnect(dp_ib_cm_handle_t cm_ptr) } else { /* send disc date, close socket, schedule destroy */ if (cm_ptr->socket >= 0) { - write(cm_ptr->socket, &disc_data, sizeof(disc_data)); + if (write(cm_ptr->socket, + &disc_data, sizeof(disc_data)) == -1) + dapl_log(DAPL_DBG_TYPE_WARN, + " cm_disc: write error = %s\n", + strerror(errno)); close(cm_ptr->socket); cm_ptr->socket = -1; } @@ -483,8 +493,11 @@ dapli_socket_connect_rtu(dp_ib_cm_handle_t cm_ptr) dapl_dbg_log(DAPL_DBG_TYPE_EP," 
connect_rtu: send RTU\n"); /* complete handshake after final QP state change */ - write(cm_ptr->socket, &rtu_data, sizeof(rtu_data)); - + if (write(cm_ptr->socket, &rtu_data, sizeof(rtu_data)) == -1) { + dapl_log(DAPL_DBG_TYPE_ERR, + " CONN_RTU: write error = %s\n", strerror(errno)); + goto bail; + } /* init cm_handle and post the event with private data */ ep_ptr->cm_handle = cm_ptr; cm_ptr->state = SCM_CONNECTED; @@ -1097,7 +1110,10 @@ dapls_ib_remove_conn_listener ( /* cr_thread will free */ cm_ptr->state = SCM_DESTROY; sp_ptr->cm_srvc_handle = NULL; - write(g_scm_pipe[1], "w", sizeof "w"); + if (write(g_scm_pipe[1], "w", sizeof "w") == -1) + dapl_log(DAPL_DBG_TYPE_CM, + " cm_destroy: thread wakeup error = %s\n", + strerror(errno)); } return DAT_SUCCESS; } @@ -1199,7 +1215,10 @@ dapls_ib_reject_connection( /* cr_thread will destroy CR */ cm_ptr->state = SCM_REJECTED; - write(g_scm_pipe[1], "w", sizeof "w"); + if (write(g_scm_pipe[1], "w", sizeof "w") == -1) + dapl_log(DAPL_DBG_TYPE_CM, + " cm_destroy: thread wakeup error = %s\n", + strerror(errno)); return DAT_SUCCESS; } @@ -1536,7 +1555,10 @@ void cr_thread(void *arg) poll(ufds,idx+1,-1); /* infinite, all sockets and pipe */ /* if pipe used to wakeup, consume */ if (ufds[0].revents == POLLIN) - read(g_scm_pipe[0], rbuf, 2); + if (read(g_scm_pipe[0], rbuf, 2) == -1) + dapl_log(DAPL_DBG_TYPE_CM, + " cr_thread: read pipe error = %s\n", + strerror(errno)); dapl_dbg_log(DAPL_DBG_TYPE_CM," cr_thread: wakeup\n"); dapl_os_lock(&hca_ptr->ib_trans.lock); } diff --git a/dapl/openib_scm/dapl_ib_dto.h b/dapl/openib_scm/dapl_ib_dto.h index b9826f5..45000b9 100644 --- a/dapl/openib_scm/dapl_ib_dto.h +++ b/dapl/openib_scm/dapl_ib_dto.h @@ -324,7 +324,7 @@ dapls_ib_post_ext_send ( remote_iov, completion_flags, remote_ah); ib_data_segment_t ds_array[DEFAULT_DS_ENTRIES]; - ib_data_segment_t *ds_array_p, *ds_array_start_p; + ib_data_segment_t *ds_array_p, *ds_array_start_p = NULL; struct ibv_send_wr wr; struct ibv_send_wr *bad_wr; 
DAT_COUNT i, total_len; diff --git a/dapl/openib_scm/dapl_ib_util.c b/dapl/openib_scm/dapl_ib_util.c index 11294fa..58c9943 100644 --- a/dapl/openib_scm/dapl_ib_util.c +++ b/dapl/openib_scm/dapl_ib_util.c @@ -359,13 +359,20 @@ DAT_RETURN dapls_ib_close_hca ( IN DAPL_HCA *hca_ptr ) /* destroy cr_thread and lock */ hca_ptr->ib_trans.cr_state = IB_THREAD_CANCEL; - write(g_scm_pipe[1], "w", sizeof "w"); + if (write(g_scm_pipe[1], "w", sizeof "w") == -1) + dapl_log(DAPL_DBG_TYPE_UTIL, + " thread_destroy: thread wakeup err = %s\n", + strerror(errno)); while (hca_ptr->ib_trans.cr_state != IB_THREAD_EXIT) { struct timespec sleep, remain; sleep.tv_sec = 0; sleep.tv_nsec = 2000000; /* 2 ms */ dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " close_hca: waiting for cr_thread\n"); + if (write(g_scm_pipe[1], "w", sizeof "w") == -1) + dapl_log(DAPL_DBG_TYPE_UTIL, + " thread_destroy: thread wakeup err = %s\n", + strerror(errno)); nanosleep (&sleep, &remain); } dapl_os_lock_destroy(&hca_ptr->ib_trans.lock); diff --git a/dapl/openib_scm/dapl_ib_util.h b/dapl/openib_scm/dapl_ib_util.h index 4e75d2c..f0230b8 100644 --- a/dapl/openib_scm/dapl_ib_util.h +++ b/dapl/openib_scm/dapl_ib_util.h @@ -90,18 +90,6 @@ typedef struct _ib_qp_cm uint16_t qp_type; } ib_qp_cm_t; -/* - * dapl_llist_entry in dapl.h but dapl.h depends on provider - * typedef's in this file first. 
move dapl_llist_entry out of dapl.h - */ -struct ib_llist_entry -{ - struct dapl_llist_entry *flink; - struct dapl_llist_entry *blink; - void *data; - struct dapl_llist_entry *list_head; -}; - typedef enum scm_state { SCM_INIT, @@ -119,7 +107,7 @@ typedef enum scm_state struct ib_cm_handle { - struct ib_llist_entry entry; + struct dapl_llist_entry entry; DAPL_OS_LOCK lock; SCM_STATE state; int socket; -- 1.5.2.5 From arlin.r.davis at intel.com Mon Sep 1 19:33:38 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Mon, 1 Sep 2008 19:33:38 -0700 Subject: [ofa-general] [PATCH 5/5] [v2.0] dapl build: add correct CFLAGS, set non-debug build by default for v2 Message-ID: <000901c90ca4$4ed87bf0$5464fe0a@amr.corp.intel.com> dapl build: add correct CFLAGS, set non-debug build by default for v2 Signed-off by: Arlin Davis ardavis at ichips.intel.com --- Makefile.am | 10 +++++----- dapl.spec.in | 5 +---- 2 files changed, 6 insertions(+), 9 deletions(-) diff --git a/Makefile.am b/Makefile.am index dfab5e8..4cb339f 100755 --- a/Makefile.am +++ b/Makefile.am @@ -22,9 +22,9 @@ XPROGRAMS_SCM = endif if DEBUG -DBGFLAGS = -ggdb -DDAPL_DBG +AM_CFLAGS = -g -Wall -D_GNU_SOURCE -DDAPL_DBG else -DBGFLAGS = -g +AM_CFLAGS = -g -Wall -D_GNU_SOURCE endif datlibdir = $(libdir) @@ -35,17 +35,17 @@ datlib_LTLIBRARIES = dat/udat/libdat2.la dapllibofa_LTLIBRARIES = dapl/udapl/libdaplofa.la daplliboscm_LTLIBRARIES = dapl/udapl/libdaploscm.la -dat_udat_libdat2_la_CFLAGS = -Wall $(DBGFLAGS) -D_GNU_SOURCE $(OSFLAGS) $(XFLAGS) \ +dat_udat_libdat2_la_CFLAGS = $(AM_CFLAGS) -D_GNU_SOURCE $(OSFLAGS) $(XFLAGS) \ -I$(srcdir)/dat/include/ -I$(srcdir)/dat/udat/ \ -I$(srcdir)/dat/udat/linux -I$(srcdir)/dat/common/ -dapl_udapl_libdaplofa_la_CFLAGS = -Wall $(DBGFLAGS) -D_GNU_SOURCE $(OSFLAGS) $(XFLAGS) \ +dapl_udapl_libdaplofa_la_CFLAGS = $(AM_CFLAGS) -D_GNU_SOURCE $(OSFLAGS) $(XFLAGS) \ -DOPENIB -DCQ_WAIT_OBJECT \ -I$(srcdir)/dat/include/ -I$(srcdir)/dapl/include/ \ -I$(srcdir)/dapl/common 
-I$(srcdir)/dapl/udapl/linux \ -I$(srcdir)/dapl/openib_cma -dapl_udapl_libdaploscm_la_CFLAGS = -Wall $(DBGFLAGS) -D_GNU_SOURCE $(OSFLAGS) $(XFLAGS) \ +dapl_udapl_libdaploscm_la_CFLAGS = $(AM_CFLAGS) -D_GNU_SOURCE $(OSFLAGS) $(XFLAGS) \ -DOPENIB -DCQ_WAIT_OBJECT \ -I$(srcdir)/dat/include/ -I$(srcdir)/dapl/include/ \ -I$(srcdir)/dapl/common -I$(srcdir)/dapl/udapl/linux \ diff --git a/dapl.spec.in b/dapl.spec.in index e18f19a..4cb8860 100644 --- a/dapl.spec.in +++ b/dapl.spec.in @@ -75,7 +75,7 @@ Useful test suites to validate uDAPL library API's. %setup -q %build -%configure --enable-debug --enable-ext-type=ib +%configure --enable-ext-type=ib make %{?_smp_mflags} %install @@ -132,9 +132,6 @@ fi %{_mandir}/man5/*.5* %changelog -* Thu Aug 21 2008 Arlin Davis - 2.0.12 -- DAT/DAPL Version 2.0.12 Release 1, OFED 1.4 RC - * Sun Jul 20 2008 Arlin Davis - 2.0.11 - DAT/DAPL Version 2.0.11 Release 1, IB UD extensions in SCM provider -- 1.5.2.5 From arlin.r.davis at intel.com Mon Sep 1 20:12:48 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Mon, 1 Sep 2008 20:12:48 -0700 Subject: [ofa-general] [ANNOUNCE] compat-dapl-1.2.10 and dapl-2.0.13 Release Message-ID: New DAPL releases now available from OFA download page: http://www.openfabrics.org/downloads/dapl/ md5sum: 3998feecc43a66c979c3742c05c2bb62 compat-dapl-1.2.10.tar.gz md5sum: 0aa99a9f5a888cc554686d24c4f23369 dapl-2.0.13.tar.gz Summary of changes since last release: v1.2,v2.0 - cleanup all warnings in tests, common code, and providers v1.2,v2.0 - fix Fedora build Vlad, please pick up new packages and install following for OFED 1.4 rc1: compat-dapl-1.2.10-1 compat-dapl-devel-1.2.10-1 dapl-2.0.13-1 dapl-utils-2.0.13-1 dapl-devel-2.0.13-1 dapl-debuginfo-2.0.13-1 -arlin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vlad at lists.openfabrics.org Tue Sep 2 03:01:53 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Tue, 2 Sep 2008 03:01:53 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080902-0200 daily build status Message-ID: <20080902100153.A4645E60865@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.26 Passed on ia64 with linux-2.6.25 Passed on ppc64 with linux-2.6.16 
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.19
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.18-8.el5
Passed on ppc64 with linux-2.6.24

Failed:

From sashak at voltaire.com Tue Sep 2 06:37:20 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Tue, 2 Sep 2008 16:37:20 +0300
Subject: [ofa-general] Re: [PATCH] opensm/Makefile.am: adding yacc-generated .h file as dependency
In-Reply-To: <48BC0440.8050807@dev.mellanox.co.il>
References: <48BC0440.8050807@dev.mellanox.co.il>
Message-ID: <20080902133720.GK19828@sashak.voltaire.com>

Hi Yevgeny,

On 18:03 Mon 01 Sep , Yevgeny Kliteynik wrote:
>
> Adding header file that is produced by yacc/bison to the
> general dependencies. W/o it compiling of lex-generated
> .c file sometimes fails.

Do you have a log of failure?

Sasha

>
> Signed-off-by: Yevgeny Kliteynik
> ---
> opensm/opensm/Makefile.am | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/opensm/opensm/Makefile.am b/opensm/opensm/Makefile.am
> index 7ca4c2a..f94842c 100644
> --- a/opensm/opensm/Makefile.am
> +++ b/opensm/opensm/Makefile.am
> @@ -126,7 +126,7 @@ opensminclude_HEADERS = \
> $(srcdir)/../include/opensm/osm_vl15intf.h \
> $(top_builddir)/include/opensm/osm_version.h
>
> -BUILT_SOURCES = osm_version
> +BUILT_SOURCES = osm_version osm_qos_parser_y.h
> osm_version:
> if [ -x $(top_srcdir)/../gen_ver.sh ] ; then \
> ver_file=$(top_builddir)/include/opensm/osm_version.h ; \
> --
> 1.5.1.4
>

From kliteyn at dev.mellanox.co.il Tue Sep 2 06:55:45 2008
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Tue, 02 Sep 2008 16:55:45 +0300
Subject: [ofa-general] Re: [PATCH] opensm/Makefile.am: adding yacc-generated .h file as dependency
In-Reply-To: <20080902133720.GK19828@sashak.voltaire.com>
References: <48BC0440.8050807@dev.mellanox.co.il> <20080902133720.GK19828@sashak.voltaire.com>
Message-ID: <48BD45E1.3000900@dev.mellanox.co.il>

Sasha Khapyorsky wrote:
> Hi Yevgeny,
>
> On 18:03 Mon 01 Sep , Yevgeny Kliteynik wrote:
>> Adding header file that is produced by yacc/bison to the
>> general dependencies. W/o it compiling of lex-generated
>> .c file sometimes fails.
>
> Do you have a log of failure?

...
/bin/sh ../libtool --tag=CC --mode=link gcc -O2 -g -fmessage-length=0 -D_FORTIFY_SOURCE=2 -I/usr/local/ofed/include -O2 -g -fmessage-length=0 -D_FORTIFY_SOURCE=2 -L/usr/local/ofed/lib64 -L/usr/local/ofed/lib -o libopensm.la -rpath /usr/local/ofed/lib64 -version-info 2:2:0 -export-dynamic -Wl,--version-script=./libopensm.map libopensm_la-osm_log.lo libopensm_la-osm_mad_pool.lo libopensm_la-osm_helper.lo -libumad -ldl -lpthread
if gcc -DHAVE_CONFIG_H -I. -I. -I../include -I./../include -I./../../libibcommon/include -I./../../libibumad/include -I/usr/local/ofed/include -O2 -g -fmessage-length=0 -D_FORTIFY_SOURCE=2 -I/usr/local/ofed/include -Wall -DOSM_VENDOR_INTF_OPENIB -fno-strict-aliasing -DVENDOR_RMPP_SUPPORT -DDUAL_SIDED_RMPP -g -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 -O2 -g -fmessage-length=0 -D_FORTIFY_SOURCE=2 -I/usr/local/ofed/include -MT opensm-osm_qos_parser_l.o -MD -MP -MF ".deps/opensm-osm_qos_parser_l.Tpo" -c -o opensm-osm_qos_parser_l.o `test -f 'osm_qos_parser_l.c' || echo './'`osm_qos_parser_l.c; \
then mv -f ".deps/opensm-osm_qos_parser_l.Tpo" ".deps/opensm-osm_qos_parser_l.Po"; else rm -f ".deps/opensm-osm_qos_parser_l.Tpo"; exit 1; fi
osm_qos_parser_l.l:49:30: error: osm_qos_parser_y.h: No such file or directory
osm_qos_parser_l.l: In function 'yylex':
osm_qos_parser_l.l:206: error: 'TK_TEXT' undeclared (first use in this function)
...

Full log attached.
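
The failure mode in this log can be illustrated with a minimal Makefile.am. This is only a sketch under assumed names: the program foo and the files main.c, parser.y and scanner.l are hypothetical and are not OpenSM's. It assumes scanner.l includes the header that yacc/bison emits when run with -d. Listing that header in BUILT_SOURCES asks automake to generate it before any object file is compiled.

```makefile
# Hypothetical example, not taken from the OpenSM tree.
# scanner.l is assumed to contain: #include "parser.h"

# -d makes yacc/bison write the token definitions to a header (parser.h).
AM_YFLAGS = -d

# Build parser.h first on "make all", "make check" and "make install",
# so compiling the lex-generated scanner.c cannot run ahead of the
# header it includes.
BUILT_SOURCES = parser.h

bin_PROGRAMS = foo
foo_SOURCES = main.c parser.y scanner.l
```

Without the BUILT_SOURCES line such a build may still succeed whenever make happens to process parser.y before scanner.l, which would be consistent with the error appearing only sometimes.
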
The problem and solution is described here:
http://www.gnu.org/software/libtool/manual/automake/Yacc-and-Lex.html

-- Yevgeny

> Sasha
>
>> Signed-off-by: Yevgeny Kliteynik
>> ---
>> opensm/opensm/Makefile.am | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/opensm/opensm/Makefile.am b/opensm/opensm/Makefile.am
>> index 7ca4c2a..f94842c 100644
>> --- a/opensm/opensm/Makefile.am
>> +++ b/opensm/opensm/Makefile.am
>> @@ -126,7 +126,7 @@ opensminclude_HEADERS = \
>> $(srcdir)/../include/opensm/osm_vl15intf.h \
>> $(top_builddir)/include/opensm/osm_version.h
>>
>> -BUILT_SOURCES = osm_version
>> +BUILT_SOURCES = osm_version osm_qos_parser_y.h
>> osm_version:
>> if [ -x $(top_srcdir)/../gen_ver.sh ] ; then \
>> ver_file=$(top_builddir)/include/opensm/osm_version.h ; \
>> --
>> 1.5.1.4
>>
>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: opensm.rpmbuild.log
URL:

From sashak at voltaire.com Tue Sep 2 07:18:05 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Tue, 2 Sep 2008 17:18:05 +0300
Subject: [ofa-general] Re: [PATCH] opensm/Makefile.am: adding yacc-generated .h file as dependency
In-Reply-To: <48BC0440.8050807@dev.mellanox.co.il>
References: <48BC0440.8050807@dev.mellanox.co.il>
Message-ID: <20080902141805.GR19828@sashak.voltaire.com>

On 18:03 Mon 01 Sep , Yevgeny Kliteynik wrote:
> Hi Sasha,
>
> Adding header file that is produced by yacc/bison to the
> general dependencies. W/o it compiling of lex-generated
> .c file sometimes fails.
>
> Signed-off-by: Yevgeny Kliteynik

Applied. Thanks.

Sasha From sashak at voltaire.com Tue Sep 2 07:24:00 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 2 Sep 2008 17:24:00 +0300 Subject: [ofa-general] [PATCH] opensm: move vendor specific compilation flags to config.h Message-ID: <20080902142400.GS19828@sashak.voltaire.com> From: Ira Weiny Move vendor specific compilation flags VENDOR_RMPP_SUPPORT, DUAL_SIDED_RMPP, and OSM_VENDOR_INTF_* to config.h. Signed-off-by: Ira Weiny Signed-off-by: Sasha Khapyorsky --- opensm/config/osmvsel.m4 | 11 ++++++----- opensm/libvendor/Makefile.am | 6 +----- opensm/opensm/Makefile.am | 12 ++---------- opensm/opensm/osm_sa_multipath_record.c | 4 ++-- opensm/osmtest/Makefile.am | 6 +----- 5 files changed, 12 insertions(+), 27 deletions(-) diff --git a/opensm/config/osmvsel.m4 b/opensm/config/osmvsel.m4 index 96208b2..74d5f79 100644 --- a/opensm/config/osmvsel.m4 +++ b/opensm/config/osmvsel.m4 @@ -65,7 +65,7 @@ with_sim="/usr") dnl based on the with_osmv we can try the vendor flag if test $with_osmv = "openib"; then - OSMV_CFLAGS="-DOSM_VENDOR_INTF_OPENIB" + AC_DEFINE(OSM_VENDOR_INTF_OPENIB, 1, [Define as 1 for OpenIB vendor]) OSMV_INCLUDES="-I\$(srcdir)/../include -I\$(srcdir)/../../libibcommon/include -I\$(srcdir)/../../libibumad/include -I\$(includedir)" OSMV_LDADD="-L\$(abs_srcdir)/../../libibumad/.libs -L\$(abs_srcdir)/../../libibcommon/.libs -L\$(libdir) -libumad -libcommon" @@ -76,12 +76,13 @@ if test $with_osmv = "openib"; then if test "x$with_umad_includes" != "x"; then OSMV_INCLUDES="-I$with_umad_includes $OSMV_INCLUDES" fi + AC_DEFINE(DUAL_SIDED_RMPP, 1, [Define as 1 if you want Dual Sided RMPP Support]) elif test $with_osmv = "sim" ; then - OSMV_CFLAGS="-DOSM_VENDOR_INTF_SIM" + AC_DEFINE(OSM_VENDOR_INTF_SIM, 1, [Define as 1 for sim vendor]) OSMV_INCLUDES="-I$with_sim/include -I\$(srcdir)/../include" OSMV_LDADD="-L$with_sim/lib -libmscli" elif test $with_osmv = "gen1"; then - OSMV_CFLAGS="-DOSM_VENDOR_INTF_TS" + AC_DEFINE(OSM_VENDOR_INTF_TS, 1, [Define as 1 for 
ts vendor]) if test -z $MTHOME; then MTHOME=/usr/local/ibgd/driver/infinihost @@ -111,7 +112,7 @@ elif test $with_osmv = "gen1"; then fi OSMV_LDADD="-L/usr/local/ibgd/driver/infinihost/lib -lvapi -lmosal -lmtl_common -lmpga" elif test $with_osmv = "vapi"; then - OSMV_CFLAGS="-DOSM_VENDOR_INTF_MTL" + AC_DEFINE(OSM_VENDOR_INTF_MTL, 1, [Define as 1 for vapi vendor]) OSMV_INCLUDES="-I/usr/mellanox/include -I/usr/include -I\$(srcdir)/../include" OSMV_LDADD="-L/usr/lib -L/usr/mellanox/lib -lib_mgt -lvapi -lmosal -lmtl_common -lmpga" else @@ -122,9 +123,9 @@ AM_CONDITIONAL(OSMV_VAPI, test $with_osmv = "vapi") AM_CONDITIONAL(OSMV_GEN1, test $with_osmv = "gen1") AM_CONDITIONAL(OSMV_SIM, test $with_osmv = "sim") AM_CONDITIONAL(OSMV_OPENIB, test $with_osmv = "openib") +AC_DEFINE(VENDOR_RMPP_SUPPORT, 1, [Define as 1 if you want Vendor RMPP Support]) AC_SUBST(with_osmv) -AC_SUBST(OSMV_CFLAGS) AC_SUBST(OSMV_LDADD) AC_SUBST(OSMV_INCLUDES) diff --git a/opensm/libvendor/Makefile.am b/opensm/libvendor/Makefile.am index f72dbbe..f359dac 100644 --- a/opensm/libvendor/Makefile.am +++ b/opensm/libvendor/Makefile.am @@ -11,11 +11,7 @@ INCLUDES = $(OSMV_INCLUDES) lib_LTLIBRARIES = libosmvendor.la -if OSMV_OPENIB -libosmvendor_la_CFLAGS = -Wall $(OSMV_CFLAGS) -DVENDOR_RMPP_SUPPORT -DDUAL_SIDED_RMPP $(DBGFLAGS) -else -libosmvendor_la_CFLAGS = -Wall $(OSMV_CFLAGS) -DVENDOR_RMPP_SUPPORT $(DBGFLAGS) -endif +libosmvendor_la_CFLAGS = -Wall $(OSMV_CFLAGS) $(DBGFLAGS) if HAVE_LD_VERSION_SCRIPT libosmvendor_version_script = -Wl,--version-script=$(srcdir)/libosmvendor.map diff --git a/opensm/opensm/Makefile.am b/opensm/opensm/Makefile.am index f94842c..522977c 100644 --- a/opensm/opensm/Makefile.am +++ b/opensm/opensm/Makefile.am @@ -9,11 +9,7 @@ else DBGFLAGS = -g endif -if OSMV_OPENIB -libopensm_la_CFLAGS = -Wall $(OSMV_CFLAGS) -DVENDOR_RMPP_SUPPORT -DDUAL_SIDED_RMPP $(DBGFLAGS) -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 -else -libopensm_la_CFLAGS = -Wall $(OSMV_CFLAGS) -DVENDOR_RMPP_SUPPORT $(DBGFLAGS) 
-D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 -endif +libopensm_la_CFLAGS = -Wall $(OSMV_CFLAGS) $(DBGFLAGS) -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 if HAVE_LD_VERSION_SCRIPT libopensm_version_script = -Wl,--version-script=$(srcdir)/libopensm.map @@ -63,11 +59,7 @@ opensm_SOURCES = main.c osm_console_io.c osm_console.c osm_db_files.c \ AM_YFLAGS:= -d -if OSMV_OPENIB -opensm_CFLAGS = -Wall $(OSMV_CFLAGS) -fno-strict-aliasing -DVENDOR_RMPP_SUPPORT -DDUAL_SIDED_RMPP $(DBGFLAGS) -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 -else -opensm_CFLAGS = -Wall $(OSMV_CFLAGS) -fno-strict-aliasing -DVENDOR_RMPP_SUPPORT $(DBGFLAGS) -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 -endif +opensm_CFLAGS = -Wall $(OSMV_CFLAGS) -fno-strict-aliasing $(DBGFLAGS) -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 # we need to be able to load libraries from local build subtree before make install # we always give precedence to local tree libs and then use the pre-installed ones. diff --git a/opensm/opensm/osm_sa_multipath_record.c b/opensm/opensm/osm_sa_multipath_record.c index 2b8e00a..c0a4904 100644 --- a/opensm/opensm/osm_sa_multipath_record.c +++ b/opensm/opensm/osm_sa_multipath_record.c @@ -40,12 +40,12 @@ * This object is part of the opensm family of objects. 
*/ -#if defined (VENDOR_RMPP_SUPPORT) && defined (DUAL_SIDED_RMPP) - #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#if defined (VENDOR_RMPP_SUPPORT) && defined (DUAL_SIDED_RMPP) + #include #include #include diff --git a/opensm/osmtest/Makefile.am b/opensm/osmtest/Makefile.am index 236cdcf..785f1af 100644 --- a/opensm/osmtest/Makefile.am +++ b/opensm/osmtest/Makefile.am @@ -13,11 +13,7 @@ osmtest_SOURCES = main.c osmtest.c osmt_service.c osmt_slvl_vl_arb.c \ if OSMV_VAPI osmtest_SOURCES += osmt_mtl_regular_qp.c endif -if OSMV_OPENIB -osmtest_CFLAGS = -Wall $(OSMV_CFLAGS) -DVENDOR_RMPP_SUPPORT -DDUAL_SIDED_RMPP $(DBGFLAGS) -else -osmtest_CFLAGS = -Wall $(OSMV_CFLAGS) -DVENDOR_RMPP_SUPPORT $(DBGFLAGS) -endif +osmtest_CFLAGS = -Wall $(OSMV_CFLAGS) $(DBGFLAGS) osmtest_LDADD = -L../complib -losmcomp -L../libvendor -losmvendor -L../opensm -lopensm $(OSMV_LDADD) EXTRA_DIST = $(srcdir)/include/osmt_inform.h \ -- 1.5.4.rc2.60.gb2e62 From vlad at mellanox.co.il Tue Sep 2 09:41:40 2008 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 02 Sep 2008 19:41:40 +0300 Subject: [ofa-general] Re: [PATCH] IB/mlx4: Set RAE and FRE flags, initialize mtt_sz field in the mpt entry. In-Reply-To: References: <20080901141103.GA32171@mellanox.co.il> Message-ID: <1220373700.13477.26.camel@vlad-laptop> On Mon, 2008-09-01 at 08:48 -0700, Roland Dreier wrote: > I need help deciding whether to get this in 2.6.27 or not. With this > patch, how is send queue fast register working? If this is the last fix > then I think we can get it in 2.6.27. If you are still debugging and it > still doesn't work well, then I might want to wait and see how big the > required fixes end up being. > > Thanks, > Roland Hi Roland, I am still debugging it, there might be more fixes. 
Regards, Vladimir From jeff at splitrockpr.com Tue Sep 2 09:56:29 2008 From: jeff at splitrockpr.com (Jeffrey Scott) Date: Tue, 02 Sep 2008 09:56:29 -0700 Subject: [ofa-general] IBTA Technical Forum Message-ID: Hello OFA Members - the technical forum is just two weeks away! Please register and book your travel now. We have added more speakers to our agenda, including Jacob Hall from Wachovia. The full agenda is posted at http://www.infinibandta.org/events/IBTATechForum08_. Event: InfiniBand Trade Association's Annual Technical Forum Date: Monday, September 15, 2008 Time: 8am - 5pm with networking reception immediately following Location: Harrah's Las Vegas Register: www.regonline.com/IBTATechForum08 Rate: $299 We're inviting the entire InfiniBand Community! Please assist us in spreading the word about this event to the entire InfiniBand community. The IBTA has created a formal invitation which has been posted online at: http://www.infinibandta.org/events/IBTATechForum08_/Invite_FINAL_081108.pdf. Feel free to forward this invite within your company and to colleagues, vendors, partners, prospects and customers. If you have any questions, please contact: Cheri Winterberg, 978-660-6405, cheriw at owenmedia.com. We'll see you in Las Vegas! -------------- next part -------------- An HTML attachment was scrubbed... URL: From andy.grover at oracle.com Tue Sep 2 13:04:19 2008 From: andy.grover at oracle.com (Andy Grover) Date: Tue, 02 Sep 2008 13:04:19 -0700 Subject: [ofa-general] [RFC] dropping RDS over TCP support Message-ID: <48BD9C43.5020400@oracle.com> We've been discussing dropping RDS's support for using TCP as a transport, and just focusing on RDS as a IB and iWARP-focused protocol. This would simplify the RDS codebase, allow easier inclusion of more IB-centric features, and also give RDS an easier path towards mainline Linux kernel inclusion. Also, the imminent RDS iWARP support will address non-IB use cases. Any objections? Anyone using it? 
Thanks -- Andy From chu11 at llnl.gov Tue Sep 2 13:12:27 2008 From: chu11 at llnl.gov (Al Chu) Date: Tue, 02 Sep 2008 13:12:27 -0700 Subject: [ofa-general] Re: [IBSIM] add ReLink command In-Reply-To: <20080831134503.GL27535@sashak.voltaire.com> References: <1219964487.29252.318.camel@cardanus.llnl.gov> <20080831134503.GL27535@sashak.voltaire.com> Message-ID: <1220386347.29252.358.camel@cardanus.llnl.gov> Hey Sasha, > So if one asked for ReLinking whole node? I think it should be > straightforward - restore links for all ports where previous_remote* > exists. What do you think? I didn't think of that before, but I think it's a good idea. So I tweaked it to handle this case when a port isn't specified. > Maybe "restore previously disconnected link(s)" help message? Actually it > is almost same :) Now that you mention it, I think "restore" is a better word to use than "reconnect". So I've now tweaked it to "restore previously unconnected". The new patch is attached. Thanks, Al On Sun, 2008-08-31 at 16:45 +0300, Sasha Khapyorsky wrote: > Hi Al, > > On 16:01 Thu 28 Aug , Al Chu wrote: > > Hey Sasha, > > > > This adds a "ReLink" command to ibsim. If a link was previously > > unlinked, you can run "ReLink" to reconnect it to whatever it was > > connected to before. It's easier than having to figure out what it was > > connected to previously and input both the local and remote ends under > > the "Link" command. > > > > The idea for this option came up when I was trying to simulate an entire > > cluster going down then going back up. Scripting the cluster to go down > > was easy ("Unlink" all CAs), but scripting it to come back up was a > > little harder since I had to figure out all the other end ports to input > > into "Link". 
> > > > Al > > > > -- > > Albert Chu > > chu11 at llnl.gov > > 925-422-5311 > > Computer Scientist > > High Performance Systems Division > > Lawrence Livermore National Laboratory > > > From ec9cf72ac3dc5950337aa577f49ada6b8887d579 Mon Sep 17 00:00:00 2001 > > From: Albert Chu > > Date: Thu, 28 Aug 2008 15:25:14 -0700 > > Subject: [PATCH] add relink command > > > > > > Signed-off-by: Albert Chu > > --- > > ibsim/sim.h | 2 + > > ibsim/sim_cmd.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > 2 files changed, 70 insertions(+), 0 deletions(-) > > > > diff --git a/ibsim/sim.h b/ibsim/sim.h > > index f989252..32a4e20 100644 > > --- a/ibsim/sim.h > > +++ b/ibsim/sim.h > > @@ -206,6 +206,8 @@ struct Port { > > char alias[ALIASLEN + 1]; > > Node *remotenode; > > int remoteport; > > + Node *previous_remotenode; > > + int previous_remoteport; > > int errrate; > > uint16_t errattr; > > Node *node; > > diff --git a/ibsim/sim_cmd.c b/ibsim/sim_cmd.c > > index d55fb4c..39eb316 100644 > > --- a/ibsim/sim_cmd.c > > +++ b/ibsim/sim_cmd.c > > @@ -149,14 +149,79 @@ static int do_link(FILE * f, char *line) > > if (link_ports(lport, rport) < 0) > > return -fprintf(f, > > "# can't link: local/remote port are already connected\n"); > > + > > + lport->previous_remotenode = NULL; > > + rport->previous_remotenode = NULL; > > + > > + return 0; > > +} > > + > > +static int do_relink(FILE * f, char *line) > > +{ > > + Port *lport, *rport; > > + Node *lnode; > > + char *orig = 0; > > + char *lnodeid = 0; > > + char *s = line, name[NAMELEN], *sp; > > + int lportnum = -1; > > + > > + // parse local > > + if (strsep(&s, "\"")) > > + orig = strsep(&s, "\""); > > + > > + lnodeid = expand_name(orig, name, &sp); > > + if (!sp && s && *s == '[') > > + sp = s + 1; > > + > > + DEBUG("lnodeid %s port [%s", lnodeid, sp); > > + if (!(lnode = find_node(lnodeid))) { > > + fprintf(f, "# nodeid \"%s\" (%s) not found\n", orig, lnodeid); > > + return -1; > > + } > > + > > + if (sp) { > > + 
lportnum = strtoul(sp, &sp, 0); > > + if (lportnum < 1 || lportnum > lnode->numports) { > > + fprintf(f, "# nodeid \"%s\": bad port %d\n", > > + lnodeid, lportnum); > > + return -1; > > + } > > + } else { > > + fprintf(f, "# no local port\n"); > > + return -1; > > So if one asked for ReLinking whole node? I think it should be > straightforward - restore links for all ports where previous_remote* > exists. What do you think? > > > + } > > + > > + lport = node_get_port(lnode, lportnum); > > + > > + if (!lport->previous_remotenode) { > > + fprintf(f, "# no previous link stored\n"); > > + return -1; > > + } > > + > > + rport = node_get_port(lport->previous_remotenode, lport->previous_remoteport); > > + > > + if (link_ports(lport, rport) < 0) > > + return -fprintf(f, > > + "# can't link: local/remote port are already connected\n"); > > + > > + lport->previous_remotenode = NULL; > > + rport->previous_remotenode = NULL; > > + > > return 0; > > } > > > > + > > No need extra lines between functions. > > > static void unlink_port(Node * lnode, Port * lport, Node * rnode, int rportnum) > > { > > Port *rport = node_get_port(rnode, rportnum); > > Port *endport; > > > > + /* save current connection for potential relink later */ > > + lport->previous_remotenode = lport->remotenode; > > + lport->previous_remoteport = lport->remoteport; > > + rport->previous_remotenode = rport->remotenode; > > + rport->previous_remoteport = rport->remoteport; > > + > > lport->remotenode = rport->remotenode = 0; > > lport->remoteport = rport->remoteport = 0; > > lport->remotenodeid[0] = rport->remotenodeid[0] = 0; > > @@ -713,6 +778,7 @@ static int dump_help(FILE * f) > > fprintf(f, "\tDump [nodeid] (def all network)\n"); > > fprintf(f, "\tRoute \n"); > > fprintf(f, "\tLink \"nodeid\"[port] \"remoteid\"[port]\n"); > > + fprintf(f, "\tReLink \"nodeid\"[port] : reconnect previously unconnected link\n"); > > Maybe "restore previously disconnected link(s)" help message? 
Actually it > is almost same :) > > Sasha > > > fprintf(f, "\tUnlink \"nodeid\" : remove all links of the node\n"); > > fprintf(f, "\tUnlink \"nodeid\"[port]\n"); > > fprintf(f, > > @@ -814,6 +880,8 @@ int do_cmd(char *buf, FILE *f) > > * > > * please specify new command support below this comment. > > */ > > + else if (!strncasecmp(line, "ReLink", cmd_len)) > > + r = do_relink(f, line); > > else if (*line != '\n' && *line != '\0') > > fprintf(f, "command \'%s\' unknown - skipped\n", line); > > > > -- > > 1.5.4.5 > > > -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-add-relink-command.patch Type: text/x-patch Size: 4285 bytes Desc: not available URL: From rdreier at cisco.com Tue Sep 2 13:20:10 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 02 Sep 2008 13:20:10 -0700 Subject: [ofa-general] [PATCH 2.6.27] ib/cm: free cm_device structure In-Reply-To: (Sean Hefty's message of "Mon, 25 Aug 2008 12:13:15 -0700") References: Message-ID: thanks, applied. From jon at opengridcomputing.com Tue Sep 2 13:24:15 2008 From: jon at opengridcomputing.com (Jon Mason) Date: Tue, 2 Sep 2008 15:24:15 -0500 Subject: [ofa-general] [RFC] dropping RDS over TCP support In-Reply-To: <48BD9C43.5020400@oracle.com> References: <48BD9C43.5020400@oracle.com> Message-ID: <20080902202415.GG32022@opengridcomputing.com> On Tue, Sep 02, 2008 at 01:04:19PM -0700, Andy Grover wrote: > We've been discussing dropping RDS's support for using TCP as a > transport, and just focusing on RDS as a IB and iWARP-focused protocol. > > This would simplify the RDS codebase, allow easier inclusion of more > IB-centric features, and also give RDS an easier path towards mainline > Linux kernel inclusion. Also, the imminent RDS iWARP support will > address non-IB use cases. > > Any objections? Anyone using it? 
I found it useful for early development of RDS iWARP support. While I believe that mainline inclusion is much more important, I don't see any harm in keeping it around. Is there a thread on lkml listing it as an issue for mainline inclusion? Thanks, Jon > > Thanks -- Andy > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From rdreier at cisco.com Tue Sep 2 13:24:42 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 02 Sep 2008 13:24:42 -0700 Subject: [ofa-general] [PATCH] Bug 988: BMA responses are discarded in kernel In-Reply-To: (Michael Brooks's message of "Wed, 27 Aug 2008 12:44:28 -0500") References: Message-ID: thanks, applied. (this patch was corrupted, because your mailer converted it to quoted-printable; in the future please use an MUA that can handle patches properly. I fixed this one up by hand) From rdreier at cisco.com Tue Sep 2 13:26:51 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 02 Sep 2008 13:26:51 -0700 Subject: [ofa-general] Re: [PATCH] drivers/infiniband/core: Use a NULL test rather than an IS_ERR test In-Reply-To: <200808281531.07942.brunel@diku.dk> (Julien Brunel's message of "Thu, 28 Aug 2008 15:31:07 +0200") References: <200808281531.07942.brunel@diku.dk> Message-ID: thanks, applied From rdreier at cisco.com Tue Sep 2 13:28:25 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 02 Sep 2008 13:28:25 -0700 Subject: [ofa-general] [PATCH] IB/ipath - fix SLID generation for RC/UC QPs In-Reply-To: <20080829171645.14033.34664.stgit@eng-46.mv.qlogic.com> (Ralph Campbell's message of "Fri, 29 Aug 2008 10:16:45 -0700") References: <20080829171645.14033.34664.stgit@eng-46.mv.qlogic.com> Message-ID: thanks, applied. 
From rdreier at cisco.com Tue Sep 2 13:28:38 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 02 Sep 2008 13:28:38 -0700 Subject: [ofa-general] Re: [PATCH] IB/mlx4: Set RAE and FRE flags, initialize mtt_sz field in the mpt entry. In-Reply-To: <20080901141103.GA32171@mellanox.co.il> (Vladimir Sokolovsky's message of "Mon, 1 Sep 2008 17:11:03 +0300") References: <20080901141103.GA32171@mellanox.co.il> Message-ID: thanks, applied. From andy.grover at oracle.com Tue Sep 2 13:50:39 2008 From: andy.grover at oracle.com (Andy Grover) Date: Tue, 02 Sep 2008 13:50:39 -0700 Subject: [ofa-general] [RFC] dropping RDS over TCP support In-Reply-To: <20080902202415.GG32022@opengridcomputing.com> References: <48BD9C43.5020400@oracle.com> <20080902202415.GG32022@opengridcomputing.com> Message-ID: <48BDA71F.5030708@oracle.com> Jon Mason wrote: > On Tue, Sep 02, 2008 at 01:04:19PM -0700, Andy Grover wrote: >> We've been discussing dropping RDS's support for using TCP as a >> transport, and just focusing on RDS as a IB and iWARP-focused protocol. > I found it useful for early development of RDS iWARP support. Hmm! Were you actually running it or just using it as a reference? > While I > believe that mainline inclusion is much more important, I don't see any > harm in keeping it around. Is there a thread on lkml listing it as an > issue for mainline inclusion? We haven't brought up rds inclusion on lkml yet, but I think positioning RDS as an IB protocol is a good thing. RDS may be a better fit in drivers/infiniband/ulp/rds, rather than net/rds, for example. Removing unused code is also always a good thing. 
Lastly, we're looking to extend rds with more ib-centric features in the future, so it would be a development burden to tunnel that over the TCP transport...and that starts to sound a lot like reinventing iwarp ;-) Regards -- Andy From jgunthorpe at obsidianresearch.com Tue Sep 2 14:03:09 2008 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Tue, 2 Sep 2008 15:03:09 -0600 Subject: [ofa-general] [RFC] dropping RDS over TCP support In-Reply-To: <48BDA71F.5030708@oracle.com> References: <48BD9C43.5020400@oracle.com> <20080902202415.GG32022@opengridcomputing.com> <48BDA71F.5030708@oracle.com> Message-ID: <20080902210309.GW4314@obsidianresearch.com> On Tue, Sep 02, 2008 at 01:50:39PM -0700, Andy Grover wrote: > Lastly, we're looking to extend rds with more ib-centric features in the > future, so it would be a development burden to tunnel that over the TCP > transport...and that starts to sound a lot like reinventing iwarp ;-) Has anyone talked about a SW implementation of iWarp? It seems to me this same question is going to keep coming up the more protocols are developed.. Even if there is no HW offload a SW only version should get reasonable performance relative to straight TCP I'd think.. Jason From jon at opengridcomputing.com Tue Sep 2 14:07:09 2008 From: jon at opengridcomputing.com (Jon Mason) Date: Tue, 2 Sep 2008 16:07:09 -0500 Subject: [ofa-general] [RFC] dropping RDS over TCP support In-Reply-To: <48BDA71F.5030708@oracle.com> References: <48BD9C43.5020400@oracle.com> <20080902202415.GG32022@opengridcomputing.com> <48BDA71F.5030708@oracle.com> Message-ID: <20080902210709.GI32022@opengridcomputing.com> On Tue, Sep 02, 2008 at 01:50:39PM -0700, Andy Grover wrote: > Jon Mason wrote: > > On Tue, Sep 02, 2008 at 01:04:19PM -0700, Andy Grover wrote: > >> We've been discussing dropping RDS's support for using TCP as a > >> transport, and just focusing on RDS as a IB and iWARP-focused protocol. 
> > > I found it useful for early development of RDS iWARP support. > > Hmm! Were you actually running it or just using it as a reference? I was running it to see if there were any issues with iWARP and IB co-existing (as I was developing a stand-alone iWARP RDS method at the time). It does work, contrary to the documentation. My only reason to keep it in would be for those developers who do not have access to IB/iWARP hardware. There may be people who would want to use it for something beyond its current usage, and this would lower the bar of entry... but I can't imagine who they are or why they would want to. > > While I > > believe that mainline inclusion is much more important, I don't see any > > harm in keeping it around. Is there a thread on lkml listing it as an > > issue for mainline inclusion? > > We haven't brought up rds inclusion on lkml yet, but I think positioning > RDS as an IB protocol is a good thing. RDS may be a better fit in > drivers/infiniband/ulp/rds, rather than net/rds, for example. Removing the TCP module would remove the need for it to be in the net/ dir (as it would then be only IB/iWARP). It will also remove the need to have it go through the netdev mailing list... which may or may not be a good thing. > Removing unused code is also always a good thing. > > Lastly, we're looking to extend rds with more ib-centric features in the > future, so it would be a development burden to tunnel that over the TCP > transport...and that starts to sound a lot like reinventing iwarp ;-) I have no issues with removing it. I simply wanted to make sure that it is not being used.
Thanks, Jon > Regards -- Andy From rdreier at cisco.com Tue Sep 2 14:24:22 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 02 Sep 2008 14:24:22 -0700 Subject: [ofa-general] [RFC] dropping RDS over TCP support In-Reply-To: <20080902210309.GW4314@obsidianresearch.com> (Jason Gunthorpe's message of "Tue, 2 Sep 2008 15:03:09 -0600") References: <48BD9C43.5020400@oracle.com> <20080902202415.GG32022@opengridcomputing.com> <48BDA71F.5030708@oracle.com> <20080902210309.GW4314@obsidianresearch.com> Message-ID: > Has anyone talked about a SW implementation of iWarp? http://www.osc.edu/research/network_file/projects/iwarp/iwarp_main.shtml From or.gerlitz at gmail.com Tue Sep 2 14:48:53 2008 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Wed, 3 Sep 2008 00:48:53 +0300 Subject: ***SPAM*** Re: [ofa-general] [RFC] dropping RDS over TCP support In-Reply-To: <48BD9C43.5020400@oracle.com> References: <48BD9C43.5020400@oracle.com> Message-ID: <15ddcffd0809021448n55b47b2fv6f23c0e47c38b807@mail.gmail.com> On Tue, Sep 2, 2008 at 11:04 PM, Andy Grover wrote: > We've been discussing dropping RDS's support for using TCP as a transport, > and just focusing on RDS as a IB and iWARP-focused protocol. Do we have any results that compare Oracle IPC using rds/bcopy/tcp vs udp? > This would simplify the RDS codebase, allow easier inclusion of more > IB-centric features, and also give RDS an easier path towards mainline Linux > kernel inclusion. Also, the imminent RDS iWARP support will address non-IB > use cases. So just to make sure, do IB and iWARP share the same transport code today? If yes, does removing TCP mean the transport abstraction would no longer be needed, or do you still want to maintain it for the loopback case? Generally speaking, the loopback transport also uses IB, correct? And if it doesn't, I am quite sure it can. I tend to agree with Jon that removing TCP might help with mainline inclusion or might create damage...
Roland, maybe you have more definitive intuitions re the netdev people potential feedback on rds as a new socket type applicable to RDMA cards such as IB and iWARP using a verbs/rdmacm native transport AND to non RDMA cards with TCP transport, vs the case of RDS being "just" a ULP under drivers/infiniband/ulps that defines a new socket type, etc. Or -------------- next part -------------- An HTML attachment was scrubbed... URL: From jon at opengridcomputing.com Tue Sep 2 14:57:21 2008 From: jon at opengridcomputing.com (Jon Mason) Date: Tue, 2 Sep 2008 16:57:21 -0500 Subject: ***SPAM*** Re: [ofa-general] [RFC] dropping RDS over TCP support In-Reply-To: <15ddcffd0809021448n55b47b2fv6f23c0e47c38b807@mail.gmail.com> References: <48BD9C43.5020400@oracle.com> <15ddcffd0809021448n55b47b2fv6f23c0e47c38b807@mail.gmail.com> Message-ID: <20080902215721.GJ32022@opengridcomputing.com> On Wed, Sep 03, 2008 at 12:48:53AM +0300, Or Gerlitz wrote: > On Tue, Sep 2, 2008 at 11:04 PM, Andy Grover wrote: > > > We've been discussing dropping RDS's support for using TCP as a transport, > > and just focusing on RDS as a IB and iWARP-focused protocol. > > > do we have any results that compare Oracle IPC using rds/bcopy/tcp vs udp? > > This would simplify the RDS codebase, allow easier inclusion of more > > IB-centric features, and also give RDS an easier path towards mainline Linux > > kernel inclusion. Also, the imminent RDS iWARP support will address non-IB > > use cases. > > > So just to make sure, do IB and iWARP share the same transport code today? > if yes, does removing TCP means the transport abstraction would not be > needed any more, or you still want to maintain it for the loopback case? > Generally speaking, the loopback transport also uses IB, correct? and if it > doesn't I am quite sure it can. Not all iWARP devices can do loopback, so having it done in the IB specific code could be painful. I believe there is a loopback module in RDS which can handle this though. 
> > I tend to agree with Jon that removing TCP might help with mainline > inclusion or might create damage... > > Roland, maybe you have more definitive intuitions re the netdev people > potential feedback on rds as a new socket type applicable to RDMA cards such > as IB and iWARP using a verbs/rdmacm native transport AND to non RDMA cards > with TCP transport, vs the case of RDS being "just" a ULP under > drivers/infiniband/ulps that defines a new socket type, etc. > > > Or > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From richard.frank at oracle.com Tue Sep 2 15:42:05 2008 From: richard.frank at oracle.com (Richard Frank) Date: Tue, 02 Sep 2008 18:42:05 -0400 Subject: [rds-devel] ***SPAM*** Re: [ofa-general] [RFC] dropping RDS over TCP support In-Reply-To: <20080902215721.GJ32022@opengridcomputing.com> References: <48BD9C43.5020400@oracle.com> <15ddcffd0809021448n55b47b2fv6f23c0e47c38b807@mail.gmail.com> <20080902215721.GJ32022@opengridcomputing.com> Message-ID: <48BDC13D.9050806@oracle.com> Jon Mason wrote: > On Wed, Sep 03, 2008 at 12:48:53AM +0300, Or Gerlitz wrote: > >> On Tue, Sep 2, 2008 at 11:04 PM, Andy Grover wrote: >> >> >>> We've been discussing dropping RDS's support for using TCP as a transport, >>> and just focusing on RDS as a IB and iWARP-focused protocol. >>> >> do we have any results that compare Oracle IPC using rds/bcopy/tcp vs udp? >> >> This would simplify the RDS codebase, allow easier inclusion of more >> >>> IB-centric features, and also give RDS an easier path towards mainline Linux >>> kernel inclusion. Also, the imminent RDS iWARP support will address non-IB >>> use cases. >>> >> So just to make sure, do IB and iWARP share the same transport code today? 
>> if yes, does removing TCP means the transport abstraction would not be >> needed any more, or you still want to maintain it for the loopback case? >> Generally speaking, the loopback transport also uses IB, correct? and if it >> doesn't I am quite sure it can. >> > > Not all iWARP devices can do loopback, so having it done in the IB > specific code could be painful. I believe there is a loopback module in > RDS which can handle this though. > Currently, loopback (connecting to a local ip:port) is handled by the transport (with IB it uses a local IB RC connection). We moved to this to simplify the driver and to support things like local process communication, including RDMA between processes, which is a key feature in and of itself. If a particular transport cannot support this, then it should either emulate the operations when possible or fail them. For example, getting a key for RDMA may fail if the connection is loopback. > >> I tend to agree with Jon that removing TCP might help with mainline >> inclusion or might create damage... >> >> Roland, maybe you have more definitive intuitions re the netdev people >> potential feedback on rds as a new socket type applicable to RDMA cards such >> as IB and iWARP using a verbs/rdmacm native transport AND to non RDMA cards >> with TCP transport, vs the case of RDS being "just" a ULP under >> drivers/infiniband/ulps that defines a new socket type, etc.
>> >> >> Or >> > > >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general >> > > _______________________________________________ > rds-devel mailing list > rds-devel at oss.oracle.com > http://oss.oracle.com/mailman/listinfo/rds-devel > From richard.frank at oracle.com Tue Sep 2 15:49:18 2008 From: richard.frank at oracle.com (Richard Frank) Date: Tue, 02 Sep 2008 18:49:18 -0400 Subject: [rds-devel] [ofa-general] [RFC] dropping RDS over TCP support In-Reply-To: <15ddcffd0809021448n55b47b2fv6f23c0e47c38b807@mail.gmail.com> References: <48BD9C43.5020400@oracle.com> <15ddcffd0809021448n55b47b2fv6f23c0e47c38b807@mail.gmail.com> Message-ID: <48BDC2EE.5050105@oracle.com> Or Gerlitz wrote: > On Tue, Sep 2, 2008 at 11:04 PM, Andy Grover > wrote: > > We've been discussing dropping RDS's support for using TCP as a > transport, and just focusing on RDS as a IB and iWARP-focused > protocol. > > > do we have any results that compare Oracle IPC using rds/bcopy/tcp vs udp? > Nothing current - would be great to see rds-stress data for RDS/TCP over 10GE compared to RDS/IB over 10GIB and RDS/IWARP over 10G. The purpose of the TCP transport was to support simple ethernet NICs with bcopy. Our thinking at the time was that TCP (even if the path lengths are longer than UDP) would be more efficient under heavy load - than running UDP from user mode. > This would simplify the RDS codebase, allow easier inclusion of > more IB-centric features, and also give RDS an easier path towards > mainline Linux kernel inclusion. Also, the imminent RDS iWARP > support will address non-IB use cases. > > > So just to make sure, do IB and iWARP share the same transport code > today? 
if yes, does removing TCP means the transport abstraction would > not be needed any more, or you still want to maintain it for the > loopback case? Generally speaking, the loopback transport also uses > IB, correct? and if it doesn't I am quite sure it can. > > I tend to agree with Jon that removing TCP might help with mainline > inclusion or might create damage... > Why does having the TCP module affect the issue of main line inclusion - what is / are the issues ? > Roland, maybe you have more definitive intuitions re the netdev people > potential feedback on rds as a new socket type applicable to RDMA > cards such as IB and iWARP using a verbs/rdmacm native transport AND > to non RDMA cards with TCP transport, vs the case of RDS being "just" > a ULP under drivers/infiniband/ulps that defines a new socket type, etc. > > > Or > ------------------------------------------------------------------------ > > _______________________________________________ > rds-devel mailing list > rds-devel at oss.oracle.com > http://oss.oracle.com/mailman/listinfo/rds-devel From amirv at mellanox.co.il Wed Sep 3 00:42:17 2008 From: amirv at mellanox.co.il (Amir Vadai) Date: Wed, 03 Sep 2008 10:42:17 +0300 Subject: [ofa-general] Re: [PATCH] libsdp: enable fallback to TCP for nonblocking sockets In-Reply-To: <48B6E63F.6060309@gmail.com> References: <48AC445D.2050704@gmail.com> <5D49E7A8952DC44FB38C38FA0D758EAD5865EA@mtlexch01.mtl.com> <48AD9C80.8030305@gmail.com> <1219590681.1564.10.camel@amirv-laptop> <48B2CD3A.5020509@gmail.com> <5D49E7A8952DC44FB38C38FA0D758EAD61E699@mtlexch01.mtl.com> <48B6E63F.6060309@gmail.com> Message-ID: <1220427737.6824.3.camel@mtllpt156.mtl.com> Yossi Hi, Because you need things fixed immediately I applied your "enable fallback to TCP..." patch. And will fix it ASAP - not to break the non blocking semantics. If your IO signals solution looks good I'll be happy to use it instead. - Amir. 
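For reference, the fallback scheme from the quoted thread below (return EAGAIN while SDP is still connecting, start TCP once SDP fails, report an error only if TCP fails too) can be modelled as a small state machine. This is only a sketch under assumptions: the names fb_state, link_status and fallback_step are hypothetical and are not libsdp code, and the choice of -ETIMEDOUT and -ENOTCONN as error values is an assumption as well.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Status of whichever connection attempt the socket is currently waiting on. */
enum link_status { LINK_PENDING, LINK_UP, LINK_FAILED };

/* Hypothetical per-socket state; not taken from libsdp. */
enum fb_state {
    FB_SDP_CONNECTING,
    FB_SDP_READY,
    FB_TCP_CONNECTING,
    FB_TCP_READY,
    FB_FAILED
};

/*
 * Would be called at the top of each read()/write() wrapper.
 * Returns 0 when I/O may proceed on the current transport,
 * otherwise a negative errno value. It never blocks.
 */
static int fallback_step(enum fb_state *state, enum link_status transport)
{
    assert(state != NULL);

    switch (*state) {
    case FB_SDP_CONNECTING:
        if (transport == LINK_UP) {
            *state = FB_SDP_READY;
            return 0;
        }
        if (transport == LINK_FAILED) {
            /* A real wrapper would start a non-blocking TCP connect here. */
            *state = FB_TCP_CONNECTING;
        }
        return -EAGAIN;         /* looks like "not connected yet" to the caller */
    case FB_TCP_CONNECTING:
        if (transport == LINK_UP) {
            *state = FB_TCP_READY;
            return 0;
        }
        if (transport == LINK_FAILED) {
            *state = FB_FAILED;
            return -ETIMEDOUT;  /* both transports failed */
        }
        return -EAGAIN;
    case FB_SDP_READY:
    case FB_TCP_READY:
        return 0;
    case FB_FAILED:
    default:
        return -ENOTCONN;
    }
}
```

The caller only ever sees EAGAIN until one transport is up, which is the same behaviour a plain non-blocking TCP socket shows while its connect is still in progress.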
On Thu, 2008-08-28 at 20:54 +0300, Yossi Etigin wrote: > Hi, > > I'm attempting to do this with IO signals - install a signal handler > that > will be called when the connect fails, and it will do the fallback. > > --Yossi > > Amir Vadai wrote: > > > > Yossi Hi, > > > > I'm on vacation till Monday. > > I'll check when can we have the full fix - and if it is not in the > near > > future > > we'll put your patch till the full fix be prepared. > > > > - Amir > > > > -----Original Message----- > > From: Yossi Etigin [mailto:yossi.openib at gmail.com] > > Sent: Mon 8/25/2008 6:18 PM > > To: Amir Vadai > > Cc: general list; Oren Duer; Olga Shern > > Subject: Re: [PATCH] libsdp: enable fallback to TCP for nonblocking > sockets > > > > Hi Amir, > > > > The single case in which we block connect() here (and only on SDP, > which > > is rather fast) is the case that is currenlty not supported anyway. > It can > > also be configurable. > > Anyway, we have a client which uses non-blocking sockets and really > needs > > that feature. How about putting this to OFED now and writing > something > > better > > later on? > > > > --Yossi > > > > > > Amir Vadai wrote: > > > See below > > > > > > On Thu, 2008-08-21 at 19:49 +0300, Yossi Etigin wrote: > > >> Hi Amir, > > >> > > >> What you suggesting is to replace almost all socket functions, > and I > > >> don't think that this is good either. > > > I agree - but to break the non-blocking semantics is worse. > > > > > >> It would be write(), send(), recv(), sendto(), recvfrom(), > sendmsg(), > > >> recvmsg(), and also need to change select() (to not return when > > >> fallback > > >> happens if SDP fails), and maybe also poll(). libsdp tries to > avoid > > >> the fast path. > > > I don't see another option. We could have a #ifdef to enable the > user > > > to choose - non blocking support or cleaner fast-path. 
> > >> Besides, how do we know when to do fallback - can we safely > assume > > >> that if some socket operation fails, then it happened because > > >> connect() failed? > > >>From a brief look at connect man page, they say we should use > select for > > > writing on the socket. after select indicates writability, use > > > getsockopt to determine whether connect() completed successfully > or not. > > >> Anyway, if I understand correctly, you suggest something like: > > >> > > >> int connect(fd, ...) > > >> { > > >> ... > > >> set_state(fd, SDP) > > >> ... > > >> } > > >> > > >> > > >> int read(int fd, ...) > > >> { > > >> int res = socket_funcs.read(shadow_fd(fd), ...); > > >> if (res < 0 && errno != EAGAIN && sock_state(fd) == SDP) > { > > >> sock_state = TCP; > > >> sockt_funs.connect(fd,...); > > >> close(shadow_fd(fd)); > > >> errno = EAGAIN; > > >> } > > >> return res; > > >> } > > >> > > >> > > > ... again, I don't like it too - but I don't think we should > block > > > connect when the user asks not to. > > > - Amir. > > >> --Yossi > > >> > > >> Amir Vadai wrote: > > >>> Yossi Hi, > > >>> > > >>> I think that breaking the semantic of non blocking socket is a > bad > > >> idea. > > >>> There is a solution that won't break this semantics: > > >>> > > >>> 1. User app calls connect(). > > >>> - libsdp try to connect through sdp. > > >>> 2. User app try another operation on the socket (e.g > read/write) > > >>> - if sdp connection established successfully - great > > >>> - if sdp still not established - return -EAGAIN. This is > the > > >>> same behaviour as if the tcp connection wasn't connected yet. > > >>> - if sdp timedout - return -EAGAIN and initiate TCP > connect. > > >>> - if tcp connection established - use it > > >>> - if tcp connection timedout - return error. > > >>> > > >>> Maybe we could optimize it and initiate a tcp connection in > parallel > > >>> with the sdp connection and use it only when the sdp connect is > > >>> timedout. 
> > >>> > > >>> I will add only the second patch (the debug print fix). > > >>> > > >>> - Amir > > >>> > > >>> > > >> > > >> > > > > > > > From sashak at voltaire.com Wed Sep 3 02:47:06 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 3 Sep 2008 12:47:06 +0300 Subject: [ofa-general] Re: [IBSIM] add ReLink command In-Reply-To: <1220386347.29252.358.camel@cardanus.llnl.gov> References: <1219964487.29252.318.camel@cardanus.llnl.gov> <20080831134503.GL27535@sashak.voltaire.com> <1220386347.29252.358.camel@cardanus.llnl.gov> Message-ID: <20080903094706.GE21573@sashak.voltaire.com> On 13:12 Tue 02 Sep , Al Chu wrote: > From 5316be9376a36d3ed075be9ccff58f07aaeb0cbd Mon Sep 17 00:00:00 2001 > From: Albert Chu > Date: Thu, 28 Aug 2008 15:25:14 -0700 > Subject: [PATCH] add relink command > > > Signed-off-by: Albert Chu Applied. Thanks. Sasha From vlad at lists.openfabrics.org Wed Sep 3 03:05:20 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Wed, 3 Sep 2008 03:05:20 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080903-0200 daily build status Message-ID: <20080903100520.B8702E608A2@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with 
linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.26 Passed on ia64 with linux-2.6.25 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.24 Failed: From tziporet at mellanox.co.il Wed Sep 3 09:06:27 2008 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Wed, 3 Sep 2008 19:06:27 +0300 Subject: [ofa-general] OFED status toward RC1 Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD6BB502@mtlexch01.mtl.com> Hi These are the open items we had for RC1: Done: - iSER - MVAPICH2 1.1 - Open MPI 1.2.7 - Extended QP verb - decided not to include this change for 1.4 and use the workaround we had in 1.3 Not done: - NFS/RDMA support for SLES10 - Jeff when do you expect this will be ready In addition we have a critical bug that IPv6 is not working over IPoIB from kernel 2.6.23 and below - Vlad debugging this We thus delay the RC1 release to Friday (if this issue will be closed by tomorrow) of Monday next week. 
Tziporet

From Jeffrey.C.Becker at nasa.gov Wed Sep 3 09:14:51 2008
From: Jeffrey.C.Becker at nasa.gov (Jeff Becker)
Date: Wed, 03 Sep 2008 09:14:51 -0700
Subject: [ofa-general] Re: [ewg] OFED status toward RC1
In-Reply-To: <5D49E7A8952DC44FB38C38FA0D758EAD6BB502@mtlexch01.mtl.com>
References: <5D49E7A8952DC44FB38C38FA0D758EAD6BB502@mtlexch01.mtl.com>
Message-ID: <48BEB7FB.9010802@nasa.gov>

Hi Tziporet

Tziporet Koren wrote:
> Hi
>
> These are the open items we had for RC1:
> Done:
> - iSER
> - MVAPICH2 1.1
> - Open MPI 1.2.7
> - Extended QP verb - decided not to include this change for 1.4 and
> use the workaround we had in 1.3
>
> Not done:
> - NFS/RDMA support for SLES10 - Jeff when do you expect this will be
> ready
>
I should get it to completely build today, and then I will do some light
testing. When it passes, I will send my patches to Vlad, hopefully by
the end of this week. Thanks.

-jeff
> In addition we have a critical bug that IPv6 is not working over IPoIB
> from kernel 2.6.23 and below - Vlad debugging this
>
> We thus delay the RC1 release to Friday (if this issue will be closed by
> tomorrow) of Monday next week.
>
>
> Tziporet
>
> _______________________________________________
> ewg mailing list
> ewg at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>

From chu11 at llnl.gov Wed Sep 3 11:21:18 2008
From: chu11 at llnl.gov (Al Chu)
Date: Wed, 03 Sep 2008 11:21:18 -0700
Subject: [ofa-general] [OPENSM] fix console segfault corner casem
Message-ID: <1220466078.29252.386.camel@cardanus.llnl.gov>

Hey Sasha,

If the call to osm_console_init() fails (most typically b/c bind fails
b/c the port is already used), we can fall through into osm_console()
and segfault b/c a bunch of stuff isn't initialized properly. Can be
handled multiple ways. The patch below makes osm_console_init() return
a non-void so we can recognize if an error occurred.
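To illustrate the general idea (an init routine that reports failure so the caller can skip the console loop), here is a minimal sketch. It is not the attached patch and not OpenSM code: console_sketch, console_init_sketch, console_port_sketch and console_close_sketch are hypothetical names, and binding to the loopback address is an assumption made only to keep the example self-contained.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for the console state; not OpenSM's real struct. */
struct console_sketch {
    int listen_fd;              /* -1 while the console is not usable */
};

/*
 * Returns 0 on success, -1 if the listening socket could not be set up
 * (for example because the port is already bound by another process).
 */
static int console_init_sketch(struct console_sketch *c, unsigned short port)
{
    struct sockaddr_in addr;

    assert(c != NULL);
    c->listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (c->listen_fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(c->listen_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(c->listen_fd, 1) < 0) {
        close(c->listen_fd);
        c->listen_fd = -1;      /* leave the state well defined on failure */
        return -1;
    }
    return 0;
}

/* Port actually bound (useful when port 0 was requested), or 0 on error. */
static unsigned short console_port_sketch(const struct console_sketch *c)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    if (c->listen_fd < 0 ||
        getsockname(c->listen_fd, (struct sockaddr *)&addr, &len) < 0)
        return 0;
    return ntohs(addr.sin_port);
}

/* Releases the socket; safe to call after a failed init. */
static void console_close_sketch(struct console_sketch *c)
{
    if (c->listen_fd >= 0) {
        close(c->listen_fd);
        c->listen_fd = -1;
    }
}
```

A caller would check the return value of console_init_sketch() and enter its console loop only when it is 0, so nothing touches half-initialized state after a failed bind.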
Al

--
Albert Chu
chu11 at llnl.gov
925-422-5311
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-fix-segfault-corner-case-when-osm_console_init-fails.patch
Type: text/x-patch
Size: 3511 bytes
Desc: not available
URL:

From chu11 at llnl.gov Wed Sep 3 11:21:22 2008
From: chu11 at llnl.gov (Al Chu)
Date: Wed, 03 Sep 2008 11:21:22 -0700
Subject: [ofa-general] [OPENSM] close console socket
Message-ID: <1220466082.29252.387.camel@cardanus.llnl.gov>

Hey Sasha,

While fixing the console segfault issue, I noticed that the console
socket never seems to be closed.

Al

--
Albert Chu
chu11 at llnl.gov
925-422-5311
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-close-console-socket-on-cleanup-path.patch
Type: text/x-patch
Size: 786 bytes
Desc: not available
URL:

From christopher.tanner at gatech.edu Wed Sep 3 11:32:15 2008
From: christopher.tanner at gatech.edu (Christopher Tanner)
Date: Wed, 3 Sep 2008 14:32:15 -0400
Subject: [ofa-general] Compiling source using Intel Compiler
Message-ID:

Has anyone built the various IB source packages using the Intel
compilers? The configure, make, and make install all progressed without
any errors. However, when I try to start OpenSM, I get the following
error

error while loading shared libraries: libimf.so: cannot open shared
object file: No such file or directory

The LD_LIBRARY_PATH contains the path to the icc and ifort lib
directories, so this is not the problem. The reason I'm building from
source is because I'm trying to utilize Infiniband on an Ubuntu cluster.
Additionally, I need to use the Intel compilers as some of our Fortran
programs cannot be compiled using gfortran...

Thanks!
-------------------------------------------
Chris Tanner
Space Systems Design Lab
Georgia Institute of Technology
christopher.tanner at gatech.edu
-------------------------------------------

From aj.guillon at gmail.com Wed Sep 3 11:49:00 2008
From: aj.guillon at gmail.com (AJ Guillon)
Date: Wed, 3 Sep 2008 14:49:00 -0400
Subject: [ofa-general] ***SPAM*** Interrupt RDMA Read
In-Reply-To: <2f3bf9a60808312300q778c7aaen23b2ca70d5f2c1ea@mail.gmail.com>
References: <9870a2060808311148h65c7950g735e5d33d4690960@mail.gmail.com>
 <2f3bf9a60808312300q778c7aaen23b2ca70d5f2c1ea@mail.gmail.com>
Message-ID: <5CE471FB-7159-4BFE-BD1B-371089AB8ED6@gmail.com>

Hrrrm. That's really too bad because I would like to use RDMA to steal
work from other nodes along with dependent memory. If I'm loading memory
for a task on one node, and another node steals the task, the node from
which the task was stolen should stop fetching memory required for the
now stolen task. A more complex scheduler might be able to deal with
this but maybe not optimally.

Suggestions for workarounds?

AJ

On Sep 1, 2008, at 2:00 AM, "Dotan Barak" wrote:

> As much as i know, once you posted a WR, you can not cancel it.
> The only thing that you can do is flush the whole QP by changing the
> QP state to ERROR (which flushes the work Queues and produces
> completion for every WR) or to RESET, which cleans the Queues from the
> WRs.
>
>
> Dotan
>
> On Sun, Aug 31, 2008 at 9:48 PM, Adrien Guillon
> wrote:
>> Hey,
>>
>> How can I interrupt an RDMA read cleanly? In my case, I might decide
>> that I don't need to read some memory anymore (because something else
>> happened), so I want to abort.
>> >> AJ >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general >> From tziporet at dev.mellanox.co.il Wed Sep 3 12:32:10 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Wed, 03 Sep 2008 22:32:10 +0300 Subject: [ofa-general] Re: [ewg] OFED status toward RC1 In-Reply-To: <48BEB7FB.9010802@nasa.gov> References: <5D49E7A8952DC44FB38C38FA0D758EAD6BB502@mtlexch01.mtl.com> <48BEB7FB.9010802@nasa.gov> Message-ID: <48BEE63A.2040707@mellanox.co.il> Jeff Becker wrote: >> >> Not done: >> - NFS/RDMA support for SLES10 - Jeff when do you expect this will >> be ready >> > I should get it to completely build today, and then I will do some > light testing. When it passes, I will send my patches to Vlad, > hopefully by the end of this week. Thanks. > So we will wait with RC1 for Monday Can you send first patches to Vlad today so he will try them tomorrow, therwise if we will have a problem it will delay the release to Tuesday Thanks Tziporet From dotanba at gmail.com Wed Sep 3 23:15:08 2008 From: dotanba at gmail.com (Dotan Barak) Date: Thu, 4 Sep 2008 09:15:08 +0300 Subject: [ofa-general] ***SPAM*** Interrupt RDMA Read In-Reply-To: <5CE471FB-7159-4BFE-BD1B-371089AB8ED6@gmail.com> References: <9870a2060808311148h65c7950g735e5d33d4690960@mail.gmail.com> <2f3bf9a60808312300q778c7aaen23b2ca70d5f2c1ea@mail.gmail.com> <5CE471FB-7159-4BFE-BD1B-371089AB8ED6@gmail.com> Message-ID: <2f3bf9a60809032315ha6ba13cobdb9682c1cbe0a5f@mail.gmail.com> How would you solve it if you would have used TCP/IP sockets? Dotan On Wed, Sep 3, 2008 at 9:49 PM, AJ Guillon wrote: > Hrrrm. That's really too bad because I would like to use RDMA to steal work > from other nodes along with dependent memory. 
If I'm loading memory for a > task on one node, and another node steals the task, the node from which the > task was stolen should stop fetching memory required for the now stolen > task. A more complex scheduler might be able to deal with this but maybe not > optimally. > > Suggestions for workarounds? > > AJ > > On Sep 1, 2008, at 2:00 AM, "Dotan Barak" wrote: > >> As much as i know, once you posted a WR, you can not cancel it. >> The only thing that you can do is flush the whole QP by changing the >> QP state to ERROR (which flushes the work Queues and produces >> completion for every WR) or to RESET, which cleans the Queues from the >> WRs. >> >> >> Dotan >> >> On Sun, Aug 31, 2008 at 9:48 PM, Adrien Guillon >> wrote: >>> >>> Hey, >>> >>> How can I interrupt an RDMA read cleanly? In my case, I might decide >>> that I don't need to read some memory anymore (because something else >>> happened), so I want to abort. >>> >>> AJ >>> _______________________________________________ >>> general mailing list >>> general at lists.openfabrics.org >>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >>> >>> To unsubscribe, please visit >>> http://openib.org/mailman/listinfo/openib-general >>> > From vlad at lists.openfabrics.org Thu Sep 4 03:03:37 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Thu, 4 Sep 2008 03:03:37 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080904-0200 daily build status Message-ID: <20080904100337.5264EE60A04@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed 
on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.26 Passed on ia64 with linux-2.6.25 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.24 Failed: From chu11 at llnl.gov Thu Sep 4 10:00:58 2008 From: chu11 at llnl.gov (Al Chu) Date: Thu, 04 Sep 2008 10:00:58 -0700 Subject: [ofa-general] [OPENSM] fix console segfault corner casem In-Reply-To: <1220466078.29252.386.camel@cardanus.llnl.gov> References: <1220466078.29252.386.camel@cardanus.llnl.gov> Message-ID: <1220547658.26758.23.camel@cardanus.llnl.gov> Hey Sasha, I thought of a way to make it slightly cleaner. New patch attached. 
Al

On Wed, 2008-09-03 at 11:21 -0700, Al Chu wrote:
> Hey Sasha,
>
> If the call to osm_console_init() fails (most typically b/c bind fails
> b/c the port is already used), we can fall through into osm_console()
> and segfault b/c a bunch of stuff isn't initialized properly. Can be
> handled multiple ways. The patch below makes osm_console_init() return
> a non-void so we can recognize if an error occurred.
>
> Al
>
--
Albert Chu
chu11 at llnl.gov
925-422-5311
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-fix-segfault-corner-case-when-osm_console_init-fails.patch
Type: text/x-patch
Size: 3494 bytes
Desc: not available
URL:

From aj.guillon at gmail.com Thu Sep 4 10:36:09 2008
From: aj.guillon at gmail.com (AJ Guillon)
Date: Thu, 4 Sep 2008 13:36:09 -0400
Subject: [ofa-general] ***SPAM*** Interrupt RDMA Read
In-Reply-To: <2f3bf9a60809032315ha6ba13cobdb9682c1cbe0a5f@mail.gmail.com>
References: <9870a2060808311148h65c7950g735e5d33d4690960@mail.gmail.com>
 <2f3bf9a60808312300q778c7aaen23b2ca70d5f2c1ea@mail.gmail.com>
 <5CE471FB-7159-4BFE-BD1B-371089AB8ED6@gmail.com>
 <2f3bf9a60809032315ha6ba13cobdb9682c1cbe0a5f@mail.gmail.com>
Message-ID:

Reading the socket would block. Use a signal to interrupt and have a
variable set to tell it to abort.

AJ

On Sep 4, 2008, at 2:15 AM, "Dotan Barak" wrote:

> How would you solve it if you would have used TCP/IP sockets?
>
> Dotan
>
> On Wed, Sep 3, 2008 at 9:49 PM, AJ Guillon
> wrote:
>> Hrrrm. That's really too bad because I would like to use RDMA to
>> steal work
>> from other nodes along with dependent memory.
If I'm loading memory >> for a >> task on one node, and another node steals the task, the node from >> which the >> task was stolen should stop fetching memory required for the now >> stolen >> task. A more complex scheduler might be able to deal with this but >> maybe not >> optimally. >> >> Suggestions for workarounds? >> >> AJ >> >> On Sep 1, 2008, at 2:00 AM, "Dotan Barak" wrote: >> >>> As much as i know, once you posted a WR, you can not cancel it. >>> The only thing that you can do is flush the whole QP by changing the >>> QP state to ERROR (which flushes the work Queues and produces >>> completion for every WR) or to RESET, which cleans the Queues from >>> the >>> WRs. >>> >>> >>> Dotan >>> >>> On Sun, Aug 31, 2008 at 9:48 PM, Adrien Guillon >> > >>> wrote: >>>> >>>> Hey, >>>> >>>> How can I interrupt an RDMA read cleanly? In my case, I might >>>> decide >>>> that I don't need to read some memory anymore (because something >>>> else >>>> happened), so I want to abort. >>>> >>>> AJ >>>> _______________________________________________ >>>> general mailing list >>>> general at lists.openfabrics.org >>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >>>> >>>> To unsubscribe, please visit >>>> http://openib.org/mailman/listinfo/openib-general >>>> >> From sashak at voltaire.com Thu Sep 4 11:18:50 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 4 Sep 2008 21:18:50 +0300 Subject: [ofa-general] Compiling source using Intel Compiler In-Reply-To: References: Message-ID: <20080904181850.GC6273@sashak.voltaire.com> On 14:32 Wed 03 Sep , Christopher Tanner wrote: > Has anyone built the various IB source packages using the Intel compilers? > The configure, make, and make install all progressed without any errors. > However, when I try to start OpenSM, I get the following error > > error while loading shared libraries: libimf.so: cannot open shared object > file: No such file or directory We don't have such library libimf.so. 
It is something from icc... > The LD_LIBRARY_PATH contains the path to the icc and ifort lib directories, > so this is not the problem. The reason I'm building from source is because > I'm trying to utilize Infiniband on an Ubuntu cluster. Additionally, I need > to use the Intel compilers as some of our Fortran programs cannot be > compiled using gfortran... But why you cannot use gcc for building OFED packages? Sasha From hal.rosenstock at gmail.com Thu Sep 4 13:26:07 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Thu, 4 Sep 2008 16:26:07 -0400 Subject: [ofa-general] Re: [PATCH] ibsim: Add support for vendor ID and system image GUID In-Reply-To: <20080831144645.GN27535@sashak.voltaire.com> References: <48A30108.4010307@obsidianresearch.com> <20080818201718.GJ27204@sashak.voltaire.com> <48B5E711.7030503@obsidianresearch.com> <20080831144645.GN27535@sashak.voltaire.com> Message-ID: Hi Sasha, On Sun, Aug 31, 2008 at 10:46 AM, Sasha Khapyorsky wrote: > Hi Hal, > > On 17:45 Wed 27 Aug , Hal Rosenstock wrote: >>>> diff --git a/ibsim/sim_net.c b/ibsim/sim_net.c >>>> index 6e3c0e9..146bcde 100644 >>>> --- a/ibsim/sim_net.c >>>> +++ b/ibsim/sim_net.c >>>> @@ -190,7 +190,9 @@ char (*aliases)[NODEIDLEN + NODEPREFIX + 1]; // >>>> aliases map format: "%s@%s" >>>> int netnodes, netswitches, netports, netaliases; >>>> char netprefix[NODEPREFIX + 1]; >>>> +int netvendid; >>>> int netdevid; >>>> +uint64_t netsysimgguid; >>>> int netwidth = DEFAULT_LINKWIDTH; >>>> int netspeed = DEFAULT_LINKSPEED; >>>> @@ -324,11 +326,12 @@ static Node *new_node(int type, char *nodename, >>>> char *nodedesc, int nodeports) >>>> } >>>> mad_set_field(nd->nodeinfo, 0, IB_NODE_NPORTS_F, nd->numports); >>>> + mad_set_field(nd->nodeinfo, 0, IB_NODE_VENDORID_F, netvendid); >>>> mad_set_field(nd->nodeinfo, 0, IB_NODE_DEVID_F, netdevid); >>>> mad_encode_field(nd->nodeinfo, IB_NODE_GUID_F, &nd->nodeguid); >>>> mad_encode_field(nd->nodeinfo, IB_NODE_PORT_GUID_F, &nd->nodeguid); >>>> - 
mad_encode_field(nd->nodeinfo, IB_NODE_SYSTEM_GUID_F, &nd->nodeguid); >>>> + mad_encode_field(nd->nodeinfo, IB_NODE_SYSTEM_GUID_F, &netsysimgguid); >>>> >>> >>> And when netsysimgguid was not parsed for this node, it will put previous >>> value there (or "0" if it was never parsed)? >>> >> Is "state" for a node in the topology file needed to deal with this ? >> Something like the following: When the vendor ID line is seen, reset >> netsysimgguid and if 0 when new_node is invoked, then use the node GUID as >> currently done. Does that make sense ? > > Why to not reset netsysimgguid unconditionally at end of new_node()? Sure; that's better as the "state" for new node is already determined. Updated patch to follow shortly. -- Hal > The rest could be as you said: > > mad_encode_field(nd->nodeinfo, IB_NODE_SYSTEM_GUID_F, > netsysimgguid ? &netsysimgguid : &nd->nodeguid); > > Sasha > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From halr at obsidianresearch.com Thu Sep 4 13:41:26 2008 From: halr at obsidianresearch.com (Hal Rosenstock) Date: Thu, 04 Sep 2008 14:41:26 -0600 Subject: [ofa-general] [PATCHv2] ibsim: Add support for vendor ID and system image GUID] Message-ID: <48C047F6.9040605@obsidianresearch.com> Sasha, Attached is the updated patch for adding support for vendor ID and system image GUID to ibsim utilizing your idea to reset netsysimgguid to 0 in new_node. -- Hal -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: patch-ibsim-sysimg3 URL: From christopher.tanner at gatech.edu Thu Sep 4 13:50:47 2008 From: christopher.tanner at gatech.edu (Christopher Tanner) Date: Thu, 4 Sep 2008 16:50:47 -0400 Subject: [ofa-general] Compiling source using Intel Compiler In-Reply-To: <20080904181850.GC6273@sashak.voltaire.com> References: <20080904181850.GC6273@sashak.voltaire.com> Message-ID: > We don't have such library libimf.so. It is something from icc... Yes, it is something from icc. The limited support for this type of error states that I need to load a compiler module in order for OpenSM to find the library. However, there's never a mention of the name of the module that I need to load. A modprobe -l *intel* gives the following: lvm-intel.ko intel_vr_nor.ko intelfb.ko intel-agp.ko intel.rng.ko snd-hda-intel.ko snd-intel8x0m.ko snd-intel8x0.ko intel-agp.ich9m.ko Nothing for a modprobe on *icc*. So, I'm stuck... > But why you cannot use gcc for building OFED packages? Our codes have a lot of Fortran 77 in them and gfortran hasn't been compiling those codes very well. Since we're using ifort for Fortran compiling, I figured we ought to use icc (C) and icpc (C++) to use a consistent compiler package. I don't know if programs partially compiled in gcc and ifort will work very well... ------------------------------------------- Chris Tanner Space Systems Design Lab Georgia Institute of Technology christopher.tanner at gatech.edu ------------------------------------------- On Sep 4, 2008, at 2:18 PM, Sasha Khapyorsky wrote: > On 14:32 Wed 03 Sep , Christopher Tanner wrote: >> Has anyone built the various IB source packages using the Intel >> compilers? >> The configure, make, and make install all progressed without any >> errors. >> However, when I try to start OpenSM, I get the following error >> >> error while loading shared libraries: libimf.so: cannot open shared >> object >> file: No such file or directory > > We don't have such library libimf.so. It is something from icc... 
> >> The LD_LIBRARY_PATH contains the path to the icc and ifort lib >> directories, >> so this is not the problem. The reason I'm building from source is >> because >> I'm trying to utilize Infiniband on an Ubuntu cluster. >> Additionally, I need >> to use the Intel compilers as some of our Fortran programs cannot be >> compiled using gfortran... > > But why you cannot use gcc for building OFED packages? > > Sasha From halr at obsidianresearch.com Thu Sep 4 13:56:30 2008 From: halr at obsidianresearch.com (Hal Rosenstock) Date: Thu, 04 Sep 2008 14:56:30 -0600 Subject: [ofa-general] Compiling source using Intel Compiler In-Reply-To: References: <20080904181850.GC6273@sashak.voltaire.com> Message-ID: <48C04B7E.5080609@obsidianresearch.com> Christopher Tanner wrote: >> We don't have such library libimf.so. It is something from icc... > > Yes, it is something from icc. The limited support for this type of > error states that I need to load a compiler module in order for OpenSM > to find the library. However, there's never a mention of the name of > the module that I need to load. A modprobe -l *intel* gives the > following: > lvm-intel.ko > intel_vr_nor.ko > intelfb.ko > intel-agp.ko > intel.rng.ko > snd-hda-intel.ko > snd-intel8x0m.ko > snd-intel8x0.ko > intel-agp.ich9m.ko > > Nothing for a modprobe on *icc*. So, I'm stuck... Isn't it a library, not a module ? Shouldn't it be part of the icc install ? Does the link below help ? http://softwarecommunity.intel.com/isn/Community/en-us/search/SearchResults.aspx?q=libimf.so -- Hal > >> But why you cannot use gcc for building OFED packages? > > Our codes have a lot of Fortran 77 in them and gfortran hasn't been > compiling those codes very well. Since we're using ifort for Fortran > compiling, I figured we ought to use icc (C) and icpc (C++) to use a > consistent compiler package. I don't know if programs partially > compiled in gcc and ifort will work very well... 
> > ------------------------------------------- > Chris Tanner > Space Systems Design Lab > Georgia Institute of Technology > christopher.tanner at gatech.edu > ------------------------------------------- > > > > On Sep 4, 2008, at 2:18 PM, Sasha Khapyorsky wrote: > >> On 14:32 Wed 03 Sep , Christopher Tanner wrote: >>> Has anyone built the various IB source packages using the Intel >>> compilers? >>> The configure, make, and make install all progressed without any >>> errors. >>> However, when I try to start OpenSM, I get the following error >>> >>> error while loading shared libraries: libimf.so: cannot open shared >>> object >>> file: No such file or directory >> >> We don't have such library libimf.so. It is something from icc... >> >>> The LD_LIBRARY_PATH contains the path to the icc and ifort lib >>> directories, >>> so this is not the problem. The reason I'm building from source is >>> because >>> I'm trying to utilize Infiniband on an Ubuntu cluster. Additionally, >>> I need >>> to use the Intel compilers as some of our Fortran programs cannot be >>> compiled using gfortran... >> >> But why you cannot use gcc for building OFED packages? 
>> >> Sasha > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general From caitlin.bestler at gmail.com Thu Sep 4 13:55:03 2008 From: caitlin.bestler at gmail.com (Caitlin Bestler) Date: Thu, 4 Sep 2008 13:55:03 -0700 Subject: [ofa-general] ***SPAM*** Interrupt RDMA Read In-Reply-To: <5CE471FB-7159-4BFE-BD1B-371089AB8ED6@gmail.com> References: <9870a2060808311148h65c7950g735e5d33d4690960@mail.gmail.com> <2f3bf9a60808312300q778c7aaen23b2ca70d5f2c1ea@mail.gmail.com> <5CE471FB-7159-4BFE-BD1B-371089AB8ED6@gmail.com> Message-ID: <469958e00809041355w20969a17gd32e71fd06c7bc2@mail.gmail.com> On Wed, Sep 3, 2008 at 11:49 AM, AJ Guillon wrote: > Hrrrm. That's really too bad because I would like to use RDMA to steal work > from other nodes along with dependent memory. If I'm loading memory for a > task on one node, and another node steals the task, the node from which the > task was stolen should stop fetching memory required for the now stolen > task. A more complex scheduler might be able to deal with this but maybe not > optimally. > > Suggestions for workarounds? > If the target device allows you to have multiple RDMA Reads in flight, you could just break the entire Read down into a series of N smaller Reads, and have a fraction of them (say N/3 or N/4) in flight at a time. When you want to "cancel" the Read, simply stop issuing the remaining Reads. If your Read is so small that the round-trip time, rather than the length read, determines its duration, then you didn't need to cancel anyway.
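The chunking idea above can be sketched in plain C. This is only an illustration of the bookkeeping, under stated assumptions: struct chunked_read, post_read_fn and count_bytes_post are hypothetical names invented here, not part of libibverbs. In a real program the callback would presumably wrap ibv_post_send() with the IBV_WR_RDMA_READ opcode, and chunked_read_complete() would be driven by completions polled from the CQ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one large read split into smaller chunks. */
struct chunked_read {
    size_t total_len;      /* bytes in the whole logical read */
    size_t chunk_len;      /* bytes per chunk (the last one may be shorter) */
    size_t next_off;       /* offset of the next chunk not yet posted */
    size_t in_flight;      /* chunks posted but not yet completed */
    size_t max_in_flight;  /* window size, e.g. N/3 or N/4 of the chunks */
    bool   cancelled;      /* set once the caller no longer wants the data */
};

/* Hypothetical callback: post one read of len bytes at offset.
 * Returns 0 on success. A real one would post an RDMA Read work request. */
typedef int (*post_read_fn)(size_t offset, size_t len, void *ctx);

void chunked_read_init(struct chunked_read *r, size_t total,
                       size_t chunk, size_t window)
{
    assert(chunk > 0 && window > 0);
    r->total_len = total;
    r->chunk_len = chunk;
    r->next_off = 0;
    r->in_flight = 0;
    r->max_in_flight = window;
    r->cancelled = false;
}

/* Post chunks until the window is full, the read is cancelled, or nothing
 * remains. Returns the number of chunks posted, or -1 if a post failed. */
int chunked_read_fill(struct chunked_read *r, post_read_fn post, void *ctx)
{
    int posted = 0;

    while (!r->cancelled && r->in_flight < r->max_in_flight &&
           r->next_off < r->total_len) {
        size_t len = r->total_len - r->next_off;

        if (len > r->chunk_len)
            len = r->chunk_len;
        if (post(r->next_off, len, ctx) != 0)
            return -1;
        r->next_off += len;
        r->in_flight++;
        posted++;
    }
    return posted;
}

/* Call once for every chunk completion reported by the hardware. */
void chunked_read_complete(struct chunked_read *r)
{
    assert(r->in_flight > 0);
    r->in_flight--;
}

/* "Cancel" only stops new chunks; chunks already in flight still finish. */
void chunked_read_cancel(struct chunked_read *r)
{
    r->cancelled = true;
}

bool chunked_read_done(const struct chunked_read *r)
{
    return r->in_flight == 0 &&
           (r->cancelled || r->next_off >= r->total_len);
}

/* Stand-in post function for experiments: just adds up the bytes requested. */
int count_bytes_post(size_t offset, size_t len, void *ctx)
{
    (void)offset;
    *(size_t *)ctx += len;
    return 0;
}
```

A caller would alternate chunked_read_fill() with completion handling, and call chunked_read_cancel() when the task is stolen. The cost of a cancel is then bounded by the window: at most max_in_flight chunks are fetched for nothing.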
From dillowda at ornl.gov Thu Sep 4 14:01:51 2008 From: dillowda at ornl.gov (David Dillow) Date: Thu, 04 Sep 2008 17:01:51 -0400 Subject: [ofa-general] Compiling source using Intel Compiler In-Reply-To: References: <20080904181850.GC6273@sashak.voltaire.com> Message-ID: <1220562111.7854.12.camel@obelisk.thedillows.org> On Thu, 2008-09-04 at 16:50 -0400, Christopher Tanner wrote: > > We don't have such library libimf.so. It is something from icc... > > Yes, it is something from icc. The limited support for this type of > error states that I need to load a compiler module in order for OpenSM > to find the library. [snip] > Nothing for a modprobe on *icc*. So, I'm stuck... You shouldn't need a kernel module for this.... If you do 'locate libimf.so' what do you get? If you get a path to it, try running openSM with LD_LIBRARY_PATH set to include that path, for example: $ locate libimf.so /opt/icc/some/path/libimf.so $ LD_LIBRARY_PATH=/opt/icc/some/path /path/to/opensm options... If that works, then odds are good you installed icc (or its support libraries) incompletely -- are its libraries installed in the correct place (or listed in /etc/ld.so.conf or /etc/ld.so.conf.d/*)? If the path is listed in those places, did you run ldconfig as root after the install? -- Dave Dillow National Center for Computational Science Oak Ridge National Laboratory (865) 241-6602 office From christopher.tanner at gatech.edu Thu Sep 4 14:09:37 2008 From: christopher.tanner at gatech.edu (Christopher Tanner) Date: Thu, 4 Sep 2008 17:09:37 -0400 Subject: [ofa-general] Compiling source using Intel Compiler In-Reply-To: <48C04B7E.5080609@obsidianresearch.com> References: <20080904181850.GC6273@sashak.voltaire.com> <48C04B7E.5080609@obsidianresearch.com> Message-ID: <6032B6F0-F04A-42A0-B155-5A4C57DF8EFB@gatech.edu> > Isn't it a library, not a module ? Yeah, which is why I'm really confused.
Here's the link to the website which says I need to load a module (it's around the middle of the way down) http://asci-training.lanl.gov/BProc/ > Shouldn't it be part of the icc install ? Yup. The library exists and the path to it is in the LD_LIBRARY_PATH. Again, confusion. > Does the link below help ? Yes, I've read this before. However, I've also read that a static compilation won't work very well on a cluster, but I haven't tested it out. Since I'm stuck, I'll try this out to see if it works. Thanks Hal. ------------------------------------------- Chris Tanner Space Systems Design Lab Georgia Institute of Technology christopher.tanner at gatech.edu ------------------------------------------- On Sep 4, 2008, at 4:56 PM, Hal Rosenstock wrote: > Christopher Tanner wrote: >>> We don't have such library libimf.so. It is something from icc... >> >> Yes, it is something from icc. The limited support for this type of >> error states that I need to load a compiler module in order for >> OpenSM to find the library. However, there's never a mention of the >> name of the module that I need to load. A modprobe -l *intel* gives >> the following: >> lvm-intel.ko >> intel_vr_nor.ko >> intelfb.ko >> intel-agp.ko >> intel.rng.ko >> snd-hda-intel.ko >> snd-intel8x0m.ko >> snd-intel8x0.ko >> intel-agp.ich9m.ko >> >> Nothing for a modprobe on *icc*. So, I'm stuck... > Isn't it a library, not a module ? > > Shouldn't it be part of the icc install ? > > Does the link below help ? > > http://softwarecommunity.intel.com/isn/Community/en-us/search/SearchResults.aspx?q=libimf.so > > > -- Hal >> >>> But why you cannot use gcc for building OFED packages? >> >> Our codes have a lot of Fortran 77 in them and gfortran hasn't been >> compiling those codes very well. Since we're using ifort for >> Fortran compiling, I figured we ought to use icc (C) and icpc (C++) >> to use a consistent compiler package. I don't know if programs >> partially compiled in gcc and ifort will work very well... 
>> >> ------------------------------------------- >> Chris Tanner >> Space Systems Design Lab >> Georgia Institute of Technology >> christopher.tanner at gatech.edu >> ------------------------------------------- >> >> >> >> On Sep 4, 2008, at 2:18 PM, Sasha Khapyorsky wrote: >> >>> On 14:32 Wed 03 Sep , Christopher Tanner wrote: >>>> Has anyone built the various IB source packages using the Intel >>>> compilers? >>>> The configure, make, and make install all progressed without any >>>> errors. >>>> However, when I try to start OpenSM, I get the following error >>>> >>>> error while loading shared libraries: libimf.so: cannot open >>>> shared object >>>> file: No such file or directory >>> >>> We don't have such library libimf.so. It is something from icc... >>> >>>> The LD_LIBRARY_PATH contains the path to the icc and ifort lib >>>> directories, >>>> so this is not the problem. The reason I'm building from source >>>> is because >>>> I'm trying to utilize Infiniband on an Ubuntu cluster. >>>> Additionally, I need >>>> to use the Intel compilers as some of our Fortran programs cannot >>>> be >>>> compiled using gfortran... >>> >>> But why you cannot use gcc for building OFED packages? >>> >>> Sasha >> >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From sashak at voltaire.com Thu Sep 4 16:08:12 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 5 Sep 2008 02:08:12 +0300 Subject: [ofa-general] Compiling source using Intel Compiler In-Reply-To: References: <20080904181850.GC6273@sashak.voltaire.com> Message-ID: <20080904230812.GJ6273@sashak.voltaire.com> On 16:50 Thu 04 Sep , Christopher Tanner wrote: > > Our codes have a lot of Fortran 77 in them and gfortran hasn't been > compiling those codes very well. 
Since we're using ifort for Fortran > compiling, I figured we ought to use icc (C) and icpc (C++) to use a > consistent compiler package. I don't know if programs partially compiled in > gcc and ifort will work very well... But you don't need ifort or gfortran for building OpenSM. So you can use gcc for OpenSM and icc/... for the rest. Sasha From sashak at voltaire.com Thu Sep 4 16:23:57 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 5 Sep 2008 02:23:57 +0300 Subject: [ofa-general] [OPENSM] fix console segfault corner casem In-Reply-To: <1220547658.26758.23.camel@cardanus.llnl.gov> References: <1220466078.29252.386.camel@cardanus.llnl.gov> <1220547658.26758.23.camel@cardanus.llnl.gov> Message-ID: <20080904232357.GK6273@sashak.voltaire.com> On 10:00 Thu 04 Sep , Al Chu wrote: > >From 28b61a86e83f547409be6cd6b4a3c6a613e1123f Mon Sep 17 00:00:00 2001 > From: Albert Chu > Date: Thu, 4 Sep 2008 09:58:01 -0700 > Subject: [PATCH] fix segfault corner case when osm_console_init fails > > > Signed-off-by: Albert Chu Applied. Thanks. Sasha From sashak at voltaire.com Thu Sep 4 16:28:26 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 5 Sep 2008 02:28:26 +0300 Subject: [ofa-general] Re: [OPENSM] close console socket In-Reply-To: <1220466082.29252.387.camel@cardanus.llnl.gov> References: <1220466082.29252.387.camel@cardanus.llnl.gov> Message-ID: <20080904232826.GL6273@sashak.voltaire.com> Hi Al, On 11:21 Wed 03 Sep , Al Chu wrote: > > diff --git a/opensm/opensm/osm_console_io.c b/opensm/opensm/osm_console_io.c > index 2822737..3d3ece4 100644 > --- a/opensm/opensm/osm_console_io.c > +++ b/opensm/opensm/osm_console_io.c > @@ -118,6 +118,10 @@ static void osm_console_close(osm_console_t * p_oct, osm_log_t * p_log) > p_oct->client_hn, p_oct->client_ip); > cio_close(p_oct); > } > + if (p_oct->socket > 0) { > + close(p_oct->socket); > + p_oct->socket = -1; > + } > #endif > } Would this work good for stdin (when local console is in use)? 
I see that fd_in descriptor is closed in cio_close(), isn't it enough (I didn't look closely yet)? Sasha From weiny2 at llnl.gov Thu Sep 4 16:41:44 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Thu, 4 Sep 2008 16:41:44 -0700 Subject: [ofa-general] Compiling source using Intel Compiler In-Reply-To: <20080904230812.GJ6273@sashak.voltaire.com> References: <20080904181850.GC6273@sashak.voltaire.com> <20080904230812.GJ6273@sashak.voltaire.com> Message-ID: <20080904164144.3637dc50.weiny2@llnl.gov> Christopher, Correct me if I am wrong below... On Fri, 5 Sep 2008 02:08:12 +0300 Sasha Khapyorsky wrote: > On 16:50 Thu 04 Sep , Christopher Tanner wrote: > > > > Our codes have a lot of Fortran 77 in them and gfortran hasn't been > > compiling those codes very well. Since we're using ifort for Fortran > > compiling, I figured we ought to use icc (C) and icpc (C++) to use a > > consistent compiler package. I don't know if programs partially compiled in > > gcc and ifort will work very well... > > But you don't need ifort or gfortran for building OpenSM. So you can use > gcc for OpenSM and icc/... for the rest. Sasha, I think he is compiling from the OFED release. Unfortunately I believe this only allows you to specify one compiler for the entire "distro". Christopher, If you absolutely can't figure out why icc's libraries are not being found, I can think of 2 alternatives. 1) Try running install.pl twice, once with each compiler: first to build only the packages required for MPI with icc, then all the management and support stuff with gcc. I don't know if this is possible because I am afraid to run install.pl as root and have it corrupt one of my nodes right now. However, looking inside the script leads me to believe you can select the packages you want built. 2) Extract (from the OFED tarball) the OpenSM and management source rpms and build them with gcc.
That list would be: opensm-3.2.2-1.ofed1.4.beta1.src.rpm infiniband-diags-1.4.1-1.ofed1.4.beta1.src.rpm libibcommon-1.1.1-1.ofed1.4.beta1.src.rpm libibmad-1.2.1-1.ofed1.4.beta1.src.rpm libibumad-1.2.1-1.ofed1.4.beta1.src.rpm Here at LLNL we have been building OFED pieces by hand for years. YMMV... Hope this helps, Ira From sashak at voltaire.com Thu Sep 4 16:50:04 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 5 Sep 2008 02:50:04 +0300 Subject: [ofa-general] Compiling source using Intel Compiler In-Reply-To: <20080904164144.3637dc50.weiny2@llnl.gov> References: <20080904181850.GC6273@sashak.voltaire.com> <20080904230812.GJ6273@sashak.voltaire.com> <20080904164144.3637dc50.weiny2@llnl.gov> Message-ID: <20080904235004.GM6273@sashak.voltaire.com> On 16:41 Thu 04 Sep , Ira Weiny wrote: > > I think he is compiling from the OFED release. Unfortunately I believe this > only allows you to specify one compiler for the entire "distro". And how about "CC=gcc ./configure"? I guess something similar may work with rpmbuild, although the management packages can be compiled from tarballs or the git tree just fine.
Sasha From chu11 at llnl.gov Thu Sep 4 17:07:00 2008 From: chu11 at llnl.gov (Al Chu) Date: Thu, 04 Sep 2008 17:07:00 -0700 Subject: [ofa-general] Re: [OPENSM] close console socket In-Reply-To: <20080904232826.GL6273@sashak.voltaire.com> References: <1220466082.29252.387.camel@cardanus.llnl.gov> <20080904232826.GL6273@sashak.voltaire.com> Message-ID: <1220573220.27074.11.camel@cardanus.llnl.gov> Hey Sasha, On Fri, 2008-09-05 at 02:28 +0300, Sasha Khapyorsky wrote: > Hi Al, > > On 11:21 Wed 03 Sep , Al Chu wrote: > > > > diff --git a/opensm/opensm/osm_console_io.c b/opensm/opensm/osm_console_io.c > > index 2822737..3d3ece4 100644 > > --- a/opensm/opensm/osm_console_io.c > > +++ b/opensm/opensm/osm_console_io.c > > @@ -118,6 +118,10 @@ static void osm_console_close(osm_console_t * p_oct, osm_log_t * p_log) > > p_oct->client_hn, p_oct->client_ip); > > cio_close(p_oct); > > } > > + if (p_oct->socket > 0) { > > + close(p_oct->socket); > > + p_oct->socket = -1; > > + } > > #endif > > } > > Would this work good for stdin (when local console is in use)? As far as I can tell, p_oct->socket is only created when OSM_REMOTE_CONSOLE or OSM_LOOPBACK_CONSOLE is set (in osm_console_init()). > I see that > fd_in descriptor is closed in cio_close(), isn't it enough (I didn't > look closely yet)? From osm_console() it seems the in_fd (set via cio_open()) is the socket returned from accept() when a connection is accepted. I couldn't find where the original socket itself was actually being closed.
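To illustrate the point about the two descriptors, here is a minimal sketch, not the actual OpenSM code. struct console, close_and_reset(), console_close() and fd_is_open() are hypothetical stand-ins; only the field names socket and in_fd are borrowed from the discussion above. The sketch assumes unused slots are initialized to -1, which is why it tests ">= 0" where the patch tests "> 0".

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for the console state; NOT the real osm_console_t. */
struct console {
    int socket;  /* listening descriptor, created at init time */
    int in_fd;   /* per-connection descriptor returned by accept() */
};

/* Close a descriptor if it is valid and mark the slot as unused.
 * Assumes unused slots hold -1, so descriptor 0 is never closed by accident
 * as long as the structure was initialized properly. */
void close_and_reset(int *fd)
{
    if (*fd >= 0) {
        close(*fd);
        *fd = -1;
    }
}

/* Closing the accepted connection does not close the listening socket:
 * they are separate descriptors, so cleanup has to release both. */
void console_close(struct console *c)
{
    close_and_reset(&c->in_fd);
    close_and_reset(&c->socket);
}

/* Returns nonzero if fd currently refers to an open descriptor. */
int fd_is_open(int fd)
{
    return fd >= 0 && fcntl(fd, F_GETFD) != -1;
}
```

Because each slot is reset to -1 after close(), calling console_close() twice is harmless, which matters on cleanup paths that can be reached more than once.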
Al > Sasha -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory From raghuarur at gmail.com Thu Sep 4 18:17:46 2008 From: raghuarur at gmail.com (Raghu Arur) Date: Thu, 4 Sep 2008 18:17:46 -0700 Subject: [ofa-general] ***SPAM*** opensm master switchover Message-ID: <90a961640809041817wea775abtfa64aed623abcd2e@mail.gmail.com> When a opensm master changes in a subnet, is there a signal or event that is sent over that applications can listen to ? Thanks, From sashak at voltaire.com Thu Sep 4 18:21:21 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 5 Sep 2008 04:21:21 +0300 Subject: [ofa-general] Re: [OPENSM] close console socket In-Reply-To: <1220573220.27074.11.camel@cardanus.llnl.gov> References: <1220466082.29252.387.camel@cardanus.llnl.gov> <20080904232826.GL6273@sashak.voltaire.com> <1220573220.27074.11.camel@cardanus.llnl.gov> Message-ID: <20080905012121.GN6273@sashak.voltaire.com> On 17:07 Thu 04 Sep , Al Chu wrote: > > As far as I can tell, p_oct->socket is only created when > OSM_REMOTE_CONSOLE or OSM_LOOPBACK_CONSOLE is set (in osm_console_init > ()). Ok, I see now. Sasha From sashak at voltaire.com Thu Sep 4 18:23:00 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 5 Sep 2008 04:23:00 +0300 Subject: [ofa-general] Re: [OPENSM] close console socket In-Reply-To: <1220466082.29252.387.camel@cardanus.llnl.gov> References: <1220466082.29252.387.camel@cardanus.llnl.gov> Message-ID: <20080905012300.GO6273@sashak.voltaire.com> On 11:21 Wed 03 Sep , Al Chu wrote: > Subject: [PATCH] close console socket on cleanup path > > > Signed-off-by: Albert Chu Applied. Thanks. 
Sasha From sashak at voltaire.com Thu Sep 4 18:44:23 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 5 Sep 2008 04:44:23 +0300 Subject: [ofa-general] ***SPAM*** opensm master switchover In-Reply-To: <90a961640809041817wea775abtfa64aed623abcd2e@mail.gmail.com> References: <90a961640809041817wea775abtfa64aed623abcd2e@mail.gmail.com> Message-ID: <20080905014423.GP6273@sashak.voltaire.com> On 18:17 Thu 04 Sep , Raghu Arur wrote: > When a opensm master changes in a subnet, is there a signal or event > that is sent over that applications can listen to ? Look at IBV_EVENT_SM_CHANGE in verbs.h (libibverbs). Sasha From aj.guillon at gmail.com Thu Sep 4 18:53:51 2008 From: aj.guillon at gmail.com (Adrien Guillon) Date: Thu, 4 Sep 2008 21:53:51 -0400 Subject: ***SPAM*** Re: [ofa-general] ***SPAM*** Interrupt RDMA Read In-Reply-To: <469958e00809041355w20969a17gd32e71fd06c7bc2@mail.gmail.com> References: <9870a2060808311148h65c7950g735e5d33d4690960@mail.gmail.com> <2f3bf9a60808312300q778c7aaen23b2ca70d5f2c1ea@mail.gmail.com> <5CE471FB-7159-4BFE-BD1B-371089AB8ED6@gmail.com> <469958e00809041355w20969a17gd32e71fd06c7bc2@mail.gmail.com> Message-ID: <9870a2060809041853m22d820f4gb7d1f533ba79390e@mail.gmail.com> That's another approach I have thought about... breaking the read into bite-sized chunks. However, it seems to me that this could lead to more CPU and process time being spent on managing network traffic. Realistically, I would only want to cancel RDMA reads which will take a relatively long time otherwise. So perhaps I set a data length maximum, and reads are broken down into segments of that maximum size as you suggested.... or I adjust the scheduler to not move tasks with large memory requirements. Another approach is to take a pattern from parallel programming: exponential backoff, but perhaps make it into exponential fetch... I fetch 1U, 2U, 4U, 8U, 16U, 32U... up to 2^n U (where U is the unit of measure, say KB or MB).
This way I can cancel the request at certain points, but it becomes harder to cancel each time because I already have so much of the data. Some random thoughts anyways :-) AJ From sashak at voltaire.com Thu Sep 4 18:57:12 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 5 Sep 2008 04:57:12 +0300 Subject: [ofa-general] Re: [PATCHv2] ibsim: Add support for vendor ID and system image GUID] In-Reply-To: <48C047F6.9040605@obsidianresearch.com> References: <48C047F6.9040605@obsidianresearch.com> Message-ID: <20080905015712.GQ6273@sashak.voltaire.com> On 14:41 Thu 04 Sep , Hal Rosenstock wrote: > Sasha, > > Attached is the updated patch for adding support for vendor ID and system > image GUID > to ibsim utilizing your idea to reset netsysimgguid to 0 in new_node. > > -- Hal > > ibsim: Add support for vendor ID and system image GUID > > Signed-off-by: Hal Rosenstock Applied with few fixes (see below). Thanks. > --- > v2: Reset netsysimgguid in new_node > > diff --git a/ibsim/sim_cmd.c b/ibsim/sim_cmd.c > index 820f77e..d587128 100644 > --- a/ibsim/sim_cmd.c > +++ b/ibsim/sim_cmd.c > @@ -571,8 +571,8 @@ static int dump_net(FILE * f, char *line) > fprintf(f, "\n%s %d \"%s\"", > node_type_name(node->type), > node->numports, node->nodeid); > - fprintf(f, "\tnodeguid %" PRIx64 "\n", node->nodeguid); > - > + fprintf(f, "\tnodeguid %" PRIx64 "\tsysimgguid %" PRIx64 "\n", > + node->nodeguid, node->sysguid); > nports = node->numports; > if (node->type == SWITCH_NODE) { > nports++; > diff --git a/ibsim/sim_net.c b/ibsim/sim_net.c > index 6e3c0e9..55da898 100644 > --- a/ibsim/sim_net.c > +++ b/ibsim/sim_net.c > @@ -190,7 +190,9 @@ char (*aliases)[NODEIDLEN + NODEPREFIX + 1]; // aliases map format: "%s@%s" > > int netnodes, netswitches, netports, netaliases; > char netprefix[NODEPREFIX + 1]; > +int netvendid; > int netdevid; > +uint64_t netsysimgguid; > int netwidth = DEFAULT_LINKWIDTH; > int netspeed = DEFAULT_LINKSPEED; > > @@ -324,11 +326,12 @@ static Node 
*new_node(int type, char *nodename, char *nodedesc, int nodeports) > } > > mad_set_field(nd->nodeinfo, 0, IB_NODE_NPORTS_F, nd->numports); > + mad_set_field(nd->nodeinfo, 0, IB_NODE_VENDORID_F, netvendid); > mad_set_field(nd->nodeinfo, 0, IB_NODE_DEVID_F, netdevid); > > mad_encode_field(nd->nodeinfo, IB_NODE_GUID_F, &nd->nodeguid); > mad_encode_field(nd->nodeinfo, IB_NODE_PORT_GUID_F, &nd->nodeguid); > - mad_encode_field(nd->nodeinfo, IB_NODE_SYSTEM_GUID_F, &nd->nodeguid); > + mad_encode_field(nd->nodeinfo, IB_NODE_SYSTEM_GUID_F, &netsysimgguid); As we discussed, the system image GUID should be encoded from netsysimgguid if it is present in the file, or otherwise from nodeguid (as it was). So I changed this to: mad_encode_field(nd->nodeinfo, IB_NODE_SYSTEM_GUID_F, netsysimgguid ? &netsysimgguid : &nd->nodeguid); > > if ((nd->portsbase = new_ports(nd, nodeports, firstport)) < 0) { > IBWARN("can't alloc %d ports for node %s", nodeports, > @@ -336,6 +339,8 @@ static Node *new_node(int type, char *nodename, char *nodedesc, int nodeports) > return 0; > } > > + netsysimgguid = 0; The same story applies to the newly introduced netvendid, so I added 'netvendid = 0' too.
Sasha From vlad at lists.openfabrics.org Fri Sep 5 03:04:50 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Fri, 5 Sep 2008 03:04:50 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080905-0200 daily build status Message-ID: <20080905100450.5D39CE60972@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.26 Passed on ia64 with linux-2.6.25 Passed on ppc64 with linux-2.6.16 
Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.24 Failed: From yossi.openib at gmail.com Fri Sep 5 04:04:58 2008 From: yossi.openib at gmail.com (Yossi Etigin) Date: Fri, 05 Sep 2008 14:04:58 +0300 Subject: [ofa-general] ***SPAM*** Re: [PATCH] libsdp: enable fallback to TCP for nonblocking sockets In-Reply-To: <1220427737.6824.3.camel@mtllpt156.mtl.com> References: <48AC445D.2050704@gmail.com> <5D49E7A8952DC44FB38C38FA0D758EAD5865EA@mtlexch01.mtl.com> <48AD9C80.8030305@gmail.com> <1219590681.1564.10.camel@amirv-laptop> <48B2CD3A.5020509@gmail.com> <5D49E7A8952DC44FB38C38FA0D758EAD61E699@mtlexch01.mtl.com> <48B6E63F.6060309@gmail.com> <1220427737.6824.3.camel@mtllpt156.mtl.com> Message-ID: <48C1125A.2050702@gmail.com> Thanks, Unfortunately the signal solution does not look so good, mainly because it creates a race with 'select', and interrupts system calls. Looks like doing the fallback inside the signal handler is not valid/compatible behaviour. Amir Vadai wrote: > Yossi Hi, > > Because you need things fixed immediately I applied your "enable > fallback to TCP..." patch. > > And will fix it ASAP - not to break the non blocking semantics. > > If your IO signals solution looks good I'll be happy to use it instead. > > - Amir. > > On Thu, 2008-08-28 at 20:54 +0300, Yossi Etigin wrote: >> Hi, >> >> I'm attempting to do this with IO signals - install a signal handler >> that >> will be called when the connect fails, and it will do the fallback. >> >> --Yossi >> >> Amir Vadai wrote: >>> Yossi Hi, >>> >>> I'm on vacation till Monday. >>> I'll check when can we have the full fix - and if it is not in the >> near >>> future >>> we'll put your patch till the full fix be prepared. 
>>> >>> - Amir >>> >>> -----Original Message----- >>> From: Yossi Etigin [mailto:yossi.openib at gmail.com] >>> Sent: Mon 8/25/2008 6:18 PM >>> To: Amir Vadai >>> Cc: general list; Oren Duer; Olga Shern >>> Subject: Re: [PATCH] libsdp: enable fallback to TCP for nonblocking >> sockets >>> Hi Amir, >>> >>> The single case in which we block connect() here (and only on SDP, >> which >>> is rather fast) is the case that is currenlty not supported anyway. >> It can >>> also be configurable. >>> Anyway, we have a client which uses non-blocking sockets and really >> needs >>> that feature. How about putting this to OFED now and writing >> something >>> better >>> later on? >>> >>> --Yossi >>> >>> >>> Amir Vadai wrote: >>> > See below >>> > >>> > On Thu, 2008-08-21 at 19:49 +0300, Yossi Etigin wrote: >>> >> Hi Amir, >>> >> >>> >> What you suggesting is to replace almost all socket functions, >> and I >>> >> don't think that this is good either. >>> > I agree - but to break the non-blocking semantics is worse. >>> > >>> >> It would be write(), send(), recv(), sendto(), recvfrom(), >> sendmsg(), >>> >> recvmsg(), and also need to change select() (to not return when >>> >> fallback >>> >> happens if SDP fails), and maybe also poll(). libsdp tries to >> avoid >>> >> the fast path. >>> > I don't see another option. We could have a #ifdef to enable the >> user >>> > to choose - non blocking support or cleaner fast-path. >>> >> Besides, how do we know when to do fallback - can we safely >> assume >>> >> that if some socket operation fails, then it happened because >>> >> connect() failed? >>> >>From a brief look at connect man page, they say we should use >> select for >>> > writing on the socket. after select indicates writability, use >>> > getsockopt to determine whether connect() completed successfully >> or not. >>> >> Anyway, if I understand correctly, you suggest something like: >>> >> >>> >> int connect(fd, ...) >>> >> { >>> >> ... >>> >> set_state(fd, SDP) >>> >> ... 
>>> >> } >>> >> >>> >> >>> >> int read(int fd, ...) >>> >> { >>> >> int res = socket_funcs.read(shadow_fd(fd), ...); >>> >> if (res < 0 && errno != EAGAIN && sock_state(fd) == SDP) >> { >>> >> sock_state = TCP; >>> >> sockt_funs.connect(fd,...); >>> >> close(shadow_fd(fd)); >>> >> errno = EAGAIN; >>> >> } >>> >> return res; >>> >> } >>> >> >>> >> >>> > ... again, I don't like it too - but I don't think we should >> block >>> > connect when the user asks not to. >>> > - Amir. >>> >> --Yossi >>> >> >>> >> Amir Vadai wrote: >>> >>> Yossi Hi, >>> >>> >>> >>> I think that breaking the semantic of non blocking socket is a >> bad >>> >> idea. >>> >>> There is a solution that won't break this semantics: >>> >>> >>> >>> 1. User app calls connect(). >>> >>> - libsdp try to connect through sdp. >>> >>> 2. User app try another operation on the socket (e.g >> read/write) >>> >>> - if sdp connection established successfully - great >>> >>> - if sdp still not established - return -EAGAIN. This is >> the >>> >>> same behaviour as if the tcp connection wasn't connected yet. >>> >>> - if sdp timedout - return -EAGAIN and initiate TCP >> connect. >>> >>> - if tcp connection established - use it >>> >>> - if tcp connection timedout - return error. >>> >>> >>> >>> Maybe we could optimize it and initiate a tcp connection in >> parallel >>> >>> with the sdp connection and use it only when the sdp connect is >>> >>> timedout. >>> >>> >>> >>> I will add only the second patch (the debug print fix). 
>>> >>> >>> >>> - Amir >>> >>> >>> >>> >>> >> >>> >> >>> > >>> >> > From Sumit.Gaur at Sun.COM Fri Sep 5 06:17:54 2008 From: Sumit.Gaur at Sun.COM (Sumit Gaur - Sun Microsystem) Date: Fri, 05 Sep 2008 18:47:54 +0530 Subject: [ofa-general] upgrade from 1.2.5* to 1.3.1 In-Reply-To: <20080905015440.9C33FE60D8F@openfabrics.org> References: <20080905015440.9C33FE60D8F@openfabrics.org> Message-ID: <48C13182.3020500@Sun.COM> Hi I have upgraded my OFED version from 1.2.5* to 1.3.1, Now application could not communicate with OFED libraries using umad_send and umad_recv function call for IB_SMI_CLASS (with DR path). Is there any major change in umad lib for such requests. Any help or info is appreciated. sumit From truelove at array.ca Fri Sep 5 06:50:14 2008 From: truelove at array.ca (Steven Truelove) Date: Fri, 05 Sep 2008 09:50:14 -0400 Subject: [ofa-general] ConnectX IB HCA with Ubuntu 8.04 In-Reply-To: References: <48AC6495.1040807@array.ca> Message-ID: <48C13916.9020307@array.ca> Hi, Thanks, I grabbed the Intrepid package and got things rolling. My ports are now in the INIT state. I believe I now need to run OpenSM. Is there a Ubuntu/Debian package that contains it? I haven't been able to find one. If not, could you please point me to what I should install to move forward? Thanks, Steven Truelove Roland Dreier wrote: > > I am trying to get Infiniband up and running on a Ubuntu 8.04 > > system. I can load the modules and see plenty of infiniband content > > under /sys/class, but when I try to run ibv_devices, I get this error: > > > > libibverbs: Warning: no userspace device-specific driver found for > > /sys/class/infiniband_verbs/uverbs0 > > That's because you need to install the device-specific userspace driver ;) > > Add my PPA to your software sources: > > deb http://ppa.launchpad.net/roland.dreier/ubuntu hardy main > deb-src http://ppa.launchpad.net/roland.dreier/ubuntu hardy main > > and do "aptitude install libmlx4-1" and you should be all set. 
> (the libmlx4 packages are also in the 8.10/Intrepid archive already). > > Let me know if you have any issues. > > - R. > > -- Steven Truelove Array Systems Computing, Inc. 1120 Finch Avenue West, 7th Floor Toronto, Ontario M3J 3H7 CANADA http://www.array.ca truelove at array.ca Phone: (416) 736-0900 x307 Fax: (416) 736-4715 From sashak at voltaire.com Fri Sep 5 07:22:26 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 5 Sep 2008 17:22:26 +0300 Subject: [ofa-general] ConnectX IB HCA with Ubuntu 8.04 In-Reply-To: <48C13916.9020307@array.ca> References: <48AC6495.1040807@array.ca> <48C13916.9020307@array.ca> Message-ID: <20080905142226.GV6273@sashak.voltaire.com> On 09:50 Fri 05 Sep , Steven Truelove wrote: > > Thanks, I grabbed the Intrepid package and got things rolling. My ports > are now in the INIT state. I believe I now need to run OpenSM. Is there a > Ubuntu/Debian package that contains it? I haven't been able to find one. I am not aware of one. > If not, could you please point me to what I should install to move forward? http://www.openfabrics.org/downloads/management/README , and latest tarballs: http://www.openfabrics.org/downloads/management , or even more recent stuff directly from git tree: git clone git://git.openfabrics.org/~sashak/management Sasha From bs at q-leap.de Fri Sep 5 07:39:56 2008 From: bs at q-leap.de (Bernd Schubert) Date: Fri, 5 Sep 2008 16:39:56 +0200 Subject: [ofa-general] ConnectX IB HCA with Ubuntu 8.04 In-Reply-To: <20080905142226.GV6273@sashak.voltaire.com> References: <48AC6495.1040807@array.ca> <48C13916.9020307@array.ca> <20080905142226.GV6273@sashak.voltaire.com> Message-ID: <200809051639.57300.bs@q-leap.de> On Friday 05 September 2008 16:22:26 Sasha Khapyorsky wrote: > On 09:50 Fri 05 Sep , Steven Truelove wrote: > > Thanks, I grabbed the Intrepid package and got things rolling. My > > ports are now in the INIT state. I believe I now need to run OpenSM. Is > > there a Ubuntu/Debian package that contains it?
I haven't been able to > > find one. > > I am not aware about such. We just didn't have the time yet to complete all the packaging and to push it upstream to Debian, here is what we have so far # Etchy packages, but also should work for hardy deb http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/infiniband/etch ./ # Hardy packages, but not recently maintained (only for my workstation) deb http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/infiniband/hardy/ ./ Cheers, Bernd -- Bernd Schubert Q-Leap Networks GmbH From yossi.openib at gmail.com Fri Sep 5 08:00:46 2008 From: yossi.openib at gmail.com (Yossi Etigin) Date: Fri, 05 Sep 2008 18:00:46 +0300 Subject: [ofa-general] ***SPAM*** [PATCH] ipoib: fix hang while bringing down uninitialized interface Message-ID: <48C1499E.4080002@gmail.com> Fix bug #1172: If a pkey for an interface is not found during initialization, then poll_timer is left uninitialized. When the device is brought down, ipoib tries to del_timer_sync() it. This call hangs in an infinite loop in lock_timer_base(), because timer_base is NULL. We should check whether the timer was really initialized. 
Signed-off-by: Yossi Etigin

--

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 66cafa2..3bbf46d 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -850,7 +850,10 @@ int ipoib_ib_dev_stop(struct net_device *dev, int flush)
 	ipoib_dbg(priv, "All sends and receives done.\n");
 
 timeout:
-	del_timer_sync(&priv->poll_timer);
+	/* Make sure the timer is initialized */
+	if (priv->poll_timer.function)
+		del_timer_sync(&priv->poll_timer);
+
 	qp_attr.qp_state = IB_QPS_RESET;
 	if (ib_modify_qp(priv->qp, &qp_attr, IB_QP_STATE))
 		ipoib_warn(priv, "Failed to modify QP to RESET state\n");

--Yossi

From sashak at voltaire.com Fri Sep 5 09:14:20 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Fri, 5 Sep 2008 19:14:20 +0300
Subject: [ofa-general] ConnectX IB HCA with Ubuntu 8.04
In-Reply-To: <200809051639.57300.bs@q-leap.de>
References: <48AC6495.1040807@array.ca> <48C13916.9020307@array.ca> <20080905142226.GV6273@sashak.voltaire.com> <200809051639.57300.bs@q-leap.de>
Message-ID: <20080905161420.GZ6273@sashak.voltaire.com>

On 16:39 Fri 05 Sep , Bernd Schubert wrote:
>
> We just didn't have the time yet to complete all the packaging and to push
> it upstream to Debian, here is what we have so far
>
> # Etchy packages, but also should work for hardy
> deb http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/infiniband/etch ./
>
> # Hardy packages, but not recently maintained (only for my workstation)
> deb http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/infiniband/hardy/ ./

Great!

Sasha

From tgree at relay.phys.ualberta.ca Fri Sep 5 12:38:17 2008
From: tgree at relay.phys.ualberta.ca (Terry Greeniaus)
Date: Fri, 5 Sep 2008 13:38:17 -0600 (MDT)
Subject: [ofa-general] ib_cm question
Message-ID:

Hello all,

We are porting our application to run on the OFED stack. As part of the
porting process, I have a series of CM unit tests that I need to get to run.
I am having trouble with one in particular. At a high level, the unit test implements a simple protocol for establishing a connection between a client and a server to test basic CM functionality. The protocol uses the private data field of the CM packets to exchange a key that is generated randomly by the server on a per-connection basis. Essentially, the client sends a REQ with a randomly chosen key which will not match the server's. When the server initially receives a REQ for a particular connection, it generates a random key and compares it against the key stored in the REQ. Since they don't match, the server sends a REJ back to the client, and the REJ contains the correct key in the private data field. Finally, the client resends the REQ, this time with the correct key: Client Server REQ -------------------> w/ bad key <------------------- REJ w/ good key REQ -------------------> w/ good key REP/etc. Everything works well until the second REQ is received at the server. It appears that instead of reusing the previous ib_cm_id, the OFED CM generates a new ib_cm_id to handle the second REQ. The unit test thinks that a new connection attempt is being requested instead of a retry of the original attempt and so it generates a new random key, resulting in the protocol being unable to establish a connection. Is something like I have described above supported by the OFED CM? I can try and distill this down to a fairly short code example if that would make things clearer. Thanks, TG From ofed at kononov.ftml.net Fri Sep 5 13:07:34 2008 From: ofed at kononov.ftml.net (Roman Kononov) Date: Fri, 05 Sep 2008 15:07:34 -0500 Subject: [ofa-general] Bogus Receive Completions Message-ID: <48C19186.2050903@kononov.ftml.net> This is continuation of http://lists.openfabrics.org/pipermail/general/2007-December/043658.html Basically, I have two processes on different computers talking to each other over a single QP per process. 
They both post and receive IBV_WR_RDMA_WRITE_WITH_IMM commands. All Send Work Requests are sequentially numbered in wr_id field. When the process receives Send Work Completion, wr_id is checked for consistency with the posted number. So far so good. All Receive Work Requests are sequentially numbered in wr_id field as well. When the process gets a Receive Work Completion, wr_id is checked for consistency with the posted number. The consistency test eventually fails. The Completion status is "success", wr_id is out of order. I believe that wr_id from Receive Work Completions must arrive in order, but they do not. I managed to reproduce the failure reliably in my environment. Then I modified mthca_tavor_post_recv(), mthca_tavor_post_send() to print all wr->wr_id values passing through them, and I modified mthca_poll_cq() to print all valid wc->wr_id values passing through it. The results from the two processes are attached. In stdout.1.log, one can see that a Receive Work Request with wr_id=0x7f was accepted and immediately completed, while the Receive Queue has 0x7f-0x40=0x3f uncompleted Work Requests. None mthca_tavor_post_recv() calls returned an error. This looks like a bug in libmthca or the firmware. I really need this fixed. Where should go from this point? Any suggestions are appreciated. The QP is created with both SQ and RQ sizes set to 64, with a single CQ. The CQ size is set to 128. I have libibverbs-1.1.2 and libmthca-1.0.5 compiled from sources. 
~>cat /etc/issue
CentOS release 5.2 (Final)
Kernel \r on an \m

~>uname -a
Linux node100 2.6.26.3 #1 SMP PREEMPT Wed Sep 3 14:11:03 CDT 2008 x86_64 x86_64 x86_64 GNU/Linux

~>grep 'model name' /proc/cpuinfo
model name : Dual Core AMD Opteron(tm) Processor 285
model name : Dual Core AMD Opteron(tm) Processor 285

~>ibv_devinfo
hca_id: mthca0
	fw_ver:          4.8.200
	node_guid:       0002:c902:0026:dbe0
	sys_image_guid:  0002:c902:0026:dbe3
	vendor_id:       0x02c9
	vendor_part_id:  25208
	hw_ver:          0xA0
	board_id:        MT_02F0110002
	phys_port_cnt:   2
	...

Thanks,
Roman Kononov

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: stdout.1.log
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: stdout.2.log
URL:

From truelove at array.ca Fri Sep 5 13:09:26 2008
From: truelove at array.ca (Steven Truelove)
Date: Fri, 05 Sep 2008 16:09:26 -0400
Subject: [ofa-general] ConnectX IB HCA with Ubuntu 8.04
In-Reply-To: <20080905161420.GZ6273@sashak.voltaire.com>
References: <48AC6495.1040807@array.ca> <48C13916.9020307@array.ca> <20080905142226.GV6273@sashak.voltaire.com> <200809051639.57300.bs@q-leap.de> <20080905161420.GZ6273@sashak.voltaire.com>
Message-ID: <48C191F6.5020302@array.ca>

Okay, thanks, I have run opensm and I have gotten IPoIB working as well,
although there is a problem. IPoIB works fine with static IPs, but I can't
get DHCP to work. The logs suggest that the DHCP server simply isn't seeing
the DHCPDISCOVERs from the client. Here is the relevant chunk of dhcpd.conf:

subnet 192.168.200.0 netmask 255.255.255.0 {
    always-broadcast on;
    range 192.168.200.10 192.168.200.50;
    option broadcast-address 192.168.200.255;
}

host sappsu4-ib {
    hardware ethernet 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00;
    fixed-address 192.168.200.104;
}

Does this have any chance of working?
Thanks, Steven Truelove Sasha Khapyorsky wrote: > On 16:39 Fri 05 Sep , Bernd Schubert wrote: > >> We just didn't have the time yet to complete all the packaging and to push >> it upstream to Debian, here is what we have so far >> >> # Etchy packages, but also should work for hardy >> deb http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/infiniband/etch ./ >> >> # Hardy packages, but not recently maintained (only for my workstation) >> deb http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/infiniband/hardy/ ./ >> > > Great! > > Sasha > > -- Steven Truelove Array Systems Computing, Inc. 1120 Finch Avenue West, 7th Floor Toronto, Ontario M3J 3H7 CANADA http://www.array.ca truelove at array.ca Phone: (416) 736-0900 x307 Fax: (416) 736-4715 -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Fri Sep 5 15:39:06 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 05 Sep 2008 15:39:06 -0700 Subject: [ofa-general] Bogus Receive Completions In-Reply-To: <48C19186.2050903@kononov.ftml.net> (Roman Kononov's message of "Fri, 05 Sep 2008 15:07:34 -0500") References: <48C19186.2050903@kononov.ftml.net> Message-ID: > I managed to reproduce the failure reliably in my environment. Can you provide the code to reproduce this? I'd like to try it on ConnectX to see if it is HCA-dependent. Also I would suggest raising this issue with whoever sold you the HCAs. 
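
For readers following this thread: the wr_id consistency check Roman describes can be sketched roughly as below. This is an illustration only, not his code (which was not posted). The names wr_seq, wr_seq_init and wr_seq_check are hypothetical, and the sketch assumes that receive work requests are numbered consecutively in wr_id and that the application passes in the wr_id of every successful receive completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libibverbs: remembers which wr_id
 * the application expects to see in the next receive completion. */
struct wr_seq {
	uint64_t next;       /* wr_id expected in the next completion */
	uint64_t mismatches; /* completions that arrived out of sequence */
};

static void wr_seq_init(struct wr_seq *s, uint64_t first)
{
	assert(s != NULL);
	s->next = first;
	s->mismatches = 0;
}

/* Feed the wr_id of each successful receive completion.
 * Returns true when it is the one expected next. */
static bool wr_seq_check(struct wr_seq *s, uint64_t wr_id)
{
	bool ok = (wr_id == s->next);

	if (!ok)
		s->mismatches++;
	/* Advance by one regardless, so that a single bogus completion
	 * does not make every later completion look wrong as well. */
	s->next++;
	return ok;
}
```

In a real program the second argument would come from the wr_id field of each struct ibv_wc returned by ibv_poll_cq(), after checking that the completion status is IBV_WC_SUCCESS.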
From weiny2 at llnl.gov Fri Sep 5 15:47:16 2008
From: weiny2 at llnl.gov (Ira Weiny)
Date: Fri, 5 Sep 2008 15:47:16 -0700
Subject: [ofa-general] [PATCH] ibnetdiscover.c: continue processing other ports even if smpquery fails on one port
Message-ID: <20080905154716.54d82f0e.weiny2@llnl.gov>

>From a08bca968a590bc041dabc733200469c78581d52 Mon Sep 17 00:00:00 2001
From: Ira Weiny
Date: Fri, 5 Sep 2008 15:40:17 -0700
Subject: [PATCH] ibnetdiscover.c: continue processing other ports even if smpquery fails on one port

Signed-off-by: Ira Weiny
---
 infiniband-diags/src/ibnetdiscover.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/infiniband-diags/src/ibnetdiscover.c b/infiniband-diags/src/ibnetdiscover.c
index 803c300..35e7118 100644
--- a/infiniband-diags/src/ibnetdiscover.c
+++ b/infiniband-diags/src/ibnetdiscover.c
@@ -424,7 +424,7 @@ discover(ib_portid_t *from)
 
 			if (get_port(&port_buf, i, path) < 0) {
 				IBWARN("can't reach node %s port %d", portid2str(path), i);
-				return 0;
+				continue;
 			}
 
 			port = find_port(node, &port_buf);
-- 
1.5.4.5

From ofed at kononov.ftml.net Fri Sep 5 16:11:05 2008
From: ofed at kononov.ftml.net (Roman Kononov)
Date: Fri, 05 Sep 2008 18:11:05 -0500
Subject: [ofa-general] Bogus Receive Completions
In-Reply-To:
References: <48C19186.2050903@kononov.ftml.net>
Message-ID: <48C1BC89.4080709@kononov.ftml.net>

On 2008-09-05 17:39, Roland Dreier wrote:
> > I managed to reproduce the failure reliably in my environment.
>
> Can you provide the code to reproduce this? I'd like to try it on
> ConnectX to see if it is HCA-dependent.

Perhaps, I can give you the code, but it needs lots of other HW and SW.
Setting it up will be a big pain. And, by changing the code and moving
stuff around, I can almost mask the problem, and it does not appear that
soon.

>
> Also I would suggest raising this issue with whoever sold you the HCAs.

What do you mean? Do you mean that the HCAs could be defective? The
manufacturer is HP. The retailer is somebody.
I'm sure that money back is the most I can get from them.

BTW, I added more printfs in mthca_poll_one(), when it handles Receive
Completions, and have noticed that cqe->wqe is out of sequence:

...
wr_id=3a, is_error=0, wqe=e81, wqe_index=3a, cqe=0x84a7c0, imm=8000003a
wr_id=3b, is_error=0, wqe=ec1, wqe_index=3b, cqe=0x84a7e0, imm=8000003b
wr_id=3c, is_error=0, wqe=f01, wqe_index=3c, cqe=0x84a800, imm=8000003c
wr_id=3d, is_error=0, wqe=f41, wqe_index=3d, cqe=0x84a820, imm=8000003d
wr_id=3e, is_error=0, wqe=f81, wqe_index=3e, cqe=0x84a840, imm=8000003e
wr_id=3f, is_error=0, wqe=fc1, wqe_index=3f, cqe=0x84a860, imm=8000003f
wr_id=7f, is_error=0, wqe=fc1, wqe_index=3f, cqe=0x84a880, imm=80000040

"imm" is really cqe->imm_etype_pkey_eec, and it comes from the sender
and is in sequence.

Roman

From rdreier at cisco.com Fri Sep 5 16:17:00 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 05 Sep 2008 16:17:00 -0700
Subject: [ofa-general] Bogus Receive Completions
In-Reply-To: <48C1BC89.4080709@kononov.ftml.net> (Roman Kononov's message of "Fri, 05 Sep 2008 18:11:05 -0500")
References: <48C19186.2050903@kononov.ftml.net> <48C1BC89.4080709@kononov.ftml.net>
Message-ID:

> Perhaps, I can give you the code, but it needs lots of other HW and
> SW. Setting it up will be a big pain. And, by changing the code and
> moving stuff around, I can almost mask the problem, and it does not
> appear that soon.

I suspect it's going to be hard to debug this without having a way to
reproduce it.

> > Also I would suggest raising this issue with whoever sold you the HCAs.
>
> What do you mean? Do you mean that the HCAs could be defective? The
> manufacturer is HP. The retailer is somebody. I'm sure that money
> back is the most I can get from them.

I just mean that you should be able to get support as a customer and
escalate this issue.

Probably Mellanox is the only one who can debug this, especially because
it could easily be a firmware issue.

 - R.
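
A side note on the printout Roman posted: the wr_id/wqe_index pairs look like what one would expect from a 64-entry receive ring that is refilled in order, and the printed wqe values fit "slot * 64 with the low bit set". Both observations are inferred only from the posted numbers and the stated RQ size of 64, not from the mthca source, and rq_slot and printed_wqe are hypothetical names. Under those assumptions, the sketch below shows why wr_id=7f lands in the same slot as wr_id=3f.

```c
#include <assert.h>
#include <stdint.h>

/* Slot in a ring of rq_size receive WQEs used by the n-th posted request
 * (n counted from 0). rq_size must be a power of two so that masking
 * is equivalent to n % rq_size. */
static uint32_t rq_slot(uint64_t n, uint32_t rq_size)
{
	assert(rq_size != 0 && (rq_size & (rq_size - 1)) == 0);
	return (uint32_t)(n & (rq_size - 1));
}

/* "wqe" value as it appears in the printout: slot * 64 with the low bit
 * set. This relationship is an assumption read off the printed numbers. */
static uint32_t printed_wqe(uint32_t slot)
{
	return (slot << 6) | 1;
}
```

With these assumptions, the request numbered 0x40 would occupy slot 0 (printed wqe=1), so the completion carrying imm=80000040 would be expected to reference slot 0 rather than slot 3f a second time.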
From christopher.tanner at gatech.edu Fri Sep 5 17:02:19 2008 From: christopher.tanner at gatech.edu (Christopher Tanner) Date: Fri, 5 Sep 2008 20:02:19 -0400 Subject: [ofa-general] OpenSM Ubuntu-unfriendly Message-ID: I'm trying to start the OpenSM daemon at startup by putting the opensmd file in the /etc/init.d directory. However, I get these errors when it tries to start: Starting opensm: /etc/init.d/opensmd: line 64: success: command not found /etc/init.d/opensmd: line 130: rc_exit: command not found Looking at the opensmd script, I noticed some things: a) It contains commands like rc_status, rc_exit, _rc_status_all which are not valid commands in Debian/Ubuntu b) It contains commands like success() and failure() which are not valid in Debian/Ubuntu I think these commands will work on Redhat or SUSE... Does anyone know the equivalent commands in Debian? For example, I think 'success' can be replaced with 'return 1', but I'm not certain how that affects the script. Thanks! ------------------------------------------- Chris Tanner Space Systems Design Lab Georgia Institute of Technology christopher.tanner at gatech.edu ------------------------------------------- From dledford at redhat.com Fri Sep 5 18:41:48 2008 From: dledford at redhat.com (Doug Ledford) Date: Fri, 05 Sep 2008 21:41:48 -0400 Subject: [ofa-general] ConnectX IB HCA with Ubuntu 8.04 In-Reply-To: <48C191F6.5020302@array.ca> References: <48AC6495.1040807@array.ca> <48C13916.9020307@array.ca> <20080905142226.GV6273@sashak.voltaire.com> <200809051639.57300.bs@q-leap.de> <20080905161420.GZ6273@sashak.voltaire.com> <48C191F6.5020302@array.ca> Message-ID: <1220665308.7801.45.camel@firewall.xsintricity.com> On Fri, 2008-09-05 at 16:09 -0400, Steven Truelove wrote: > Okay, thanks, I have run opensm and I have gotten IPoIB working as > well, although there is a problem. IPoIB works fine with static IPs, > but I can't get DHCP to work. 
The logs suggest that the DHCP server > simply isn't seeing the DHCPDISCOVERs from the client. Here is the > relevant chunk of dhcpd.conf: > > subnet 192.168.200.0 netmask 255.255.255.0 { > > always-broadcast on; > > range 192.168.200.10 192.168.200.50; > > option broadcast-address 192.168.200.255; > } > > > host sappsu4-ib { > hardware ethernet 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00; > fixed-address 192.168.200.104; > } > > > Does this have any chance of working? Will your dhcp server even start up with that hardware ethernet line in it? None of the patches for the dhcp server that I've seen enable dhcp to parse that big of an ethernet definition. > Thanks, > > Steven Truelove > > > > > Sasha Khapyorsky wrote: > > On 16:39 Fri 05 Sep , Bernd Schubert wrote: > > > > > We just didn't have the time yet to complete all the packaging and to push > > > it upstream to Debian, here is what we have so far > > > > > > # Etchy packages, but also should work for hardy > > > deb http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/infiniband/etch ./ > > > > > > # Hardy packages, but not recently maintained (only for my workstation) > > > deb http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/infiniband/hardy/ ./ > > > > > > > Great! > > > > Sasha > > > > > > -- > Steven Truelove > Array Systems Computing, Inc. > 1120 Finch Avenue West, 7th Floor > Toronto, Ontario > M3J 3H7 > CANADA > http://www.array.ca > truelove at array.ca > Phone: (416) 736-0900 x307 > Fax: (416) 736-4715 > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: This is a digitally signed message part URL: From vlad at lists.openfabrics.org Sat Sep 6 03:03:16 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Sat, 6 Sep 2008 03:03:16 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080906-0200 daily build status Message-ID: <20080906100316.6609CE60D8B@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with 
linux-2.6.24 Passed on ia64 with linux-2.6.26 Passed on ia64 with linux-2.6.25 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.24 Failed: From truelove at array.ca Sat Sep 6 04:18:23 2008 From: truelove at array.ca (Steven Truelove) Date: Sat, 06 Sep 2008 07:18:23 -0400 Subject: ***SPAM*** Re: [ofa-general] ConnectX IB HCA with Ubuntu 8.04 In-Reply-To: <1220665308.7801.45.camel@firewall.xsintricity.com> References: <48AC6495.1040807@array.ca> <48C13916.9020307@array.ca> <20080905142226.GV6273@sashak.voltaire.com> <200809051639.57300.bs@q-leap.de> <20080905161420.GZ6273@sashak.voltaire.com> <48C191F6.5020302@array.ca> <1220665308.7801.45.camel@firewall.xsintricity.com> Message-ID: <48C266FF.30108@array.ca> Yes, the DHCP server starts just fine. The hardware address was pulled from the output of ifconfig on the client. Even if the hardware address was wrong, or I didn't list the host at all, the 'range' setting should ensure that an address is provided from the open pool. There is output in the logs to indicate that there is no subnet listing for ib1 and eth0, and that it won't be listening on those interfaces. This implies that it is working on eth1 (where DHCP is tested working) and ib0 (where no log references to DHCPDISCOVER are made, even though the client is sending them). That said, am I barking up the wrong tree entirely by even trying to make this work? There are a few references to this being possible when I google for 'infiniband dhcp', and this is where I got the 'always-broadcast on' setting from. Apparently this is necessary. But I couldn't find anything further to help me. Thanks, Steven Truelove Doug Ledford wrote: > On Fri, 2008-09-05 at 16:09 -0400, Steven Truelove wrote: > >> Okay, thanks, I have run opensm and I have gotten IPoIB working as >> well, although there is a problem. 
IPoIB works fine with static IPs, >> but I can't get DHCP to work. The logs suggest that the DHCP server >> simply isn't seeing the DHCPDISCOVERs from the client. Here is the >> relevant chunk of dhcpd.conf: >> >> subnet 192.168.200.0 netmask 255.255.255.0 { >> >> always-broadcast on; >> >> range 192.168.200.10 192.168.200.50; >> >> option broadcast-address 192.168.200.255; >> } >> >> >> host sappsu4-ib { >> hardware ethernet 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00; >> fixed-address 192.168.200.104; >> } >> >> >> Does this have any chance of working? >> > > Will your dhcp server even start up with that hardware ethernet line in > it? None of the patches for the dhcp server that I've seen enable dhcp > to parse that big of an ethernet definition. > > >> Thanks, >> >> Steven Truelove >> >> >> >> >> Sasha Khapyorsky wrote: >> >>> On 16:39 Fri 05 Sep , Bernd Schubert wrote: >>> >>> >>>> We just didn't have the time yet to complete all the packaging and to push >>>> it upstream to Debian, here is what we have so far >>>> >>>> # Etchy packages, but also should work for hardy >>>> deb http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/infiniband/etch ./ >>>> >>>> # Hardy packages, but not recently maintained (only for my workstation) >>>> deb http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/infiniband/hardy/ ./ >>>> >>>> >>> Great! >>> >>> Sasha >>> >>> >>> >> -- >> Steven Truelove >> Array Systems Computing, Inc. >> 1120 Finch Avenue West, 7th Floor >> Toronto, Ontario >> M3J 3H7 >> CANADA >> http://www.array.ca >> truelove at array.ca >> Phone: (416) 736-0900 x307 >> Fax: (416) 736-4715 >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From hal.rosenstock at gmail.com Sat Sep 6 05:23:31 2008
From: hal.rosenstock at gmail.com (Hal Rosenstock)
Date: Sat, 6 Sep 2008 08:23:31 -0400
Subject: ***SPAM*** Re: [ofa-general] ConnectX IB HCA with Ubuntu 8.04
In-Reply-To: <48C191F6.5020302@array.ca>
References: <48AC6495.1040807@array.ca> <48C13916.9020307@array.ca> <20080905142226.GV6273@sashak.voltaire.com> <200809051639.57300.bs@q-leap.de> <20080905161420.GZ6273@sashak.voltaire.com> <48C191F6.5020302@array.ca>
Message-ID:

On Fri, Sep 5, 2008 at 4:09 PM, Steven Truelove wrote:
> Okay, thanks, I have run opensm and I have gotten IPoIB working as well,
> although there is a problem. IPoIB works fine with static IPs, but I can't
> get DHCP to work. The logs suggest that the DHCP server simply isn't seeing
> the DHCPDISCOVERs from the client. Here is the relevant chunk of
> dhcpd.conf:
>
> subnet 192.168.200.0 netmask 255.255.255.0 {
>
> always-broadcast on;
>
> range 192.168.200.10 192.168.200.50;
>
> option broadcast-address 192.168.200.255;
> }
>
>
> host sappsu4-ib {
> hardware ethernet 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00;

I think it's done with client identifier for IB and it should be set
to the IPoIB hardware address (20 bytes) which is QPN + GID. You
should be able to determine this from:
ip addr show ib

e.g.
ip addr show ib0
5: ib0: mtu 2044 qdisc noop qlen 128
    link/[32] 00:0d:00:48:20:06:00:00:00:00:00:00:00:02:c9:03:00:00:14:91 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff

ip addr show ib1
6: ib1: mtu 2044 qdisc noop qlen 128
    link/[32] 00:0d:00:49:fe:80:00:00:00:00:00:00:00:02:c9:03:00:00:14:92 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff

-- Hal

> fixed-address 192.168.200.104;
> }
>
>
> Does this have any chance of working?
> > Thanks, > > Steven Truelove > > > > > Sasha Khapyorsky wrote: > > On 16:39 Fri 05 Sep , Bernd Schubert wrote: > > > We just didn't have the time yet to complete all the packaging and to push > it upstream to Debian, here is what we have so far > # Etchy packages, but also should work for hardy > deb http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/infiniband/etch > ./ > # Hardy packages, but not recently maintained (only for my workstation) > deb > http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/infiniband/hardy/ ./ > > > Great! > Sasha > > > -- > Steven Truelove > Array Systems Computing, Inc. > 1120 Finch Avenue West, 7th Floor > Toronto, Ontario > M3J 3H7 > CANADA > http://www.array.ca > truelove at array.ca > Phone: (416) 736-0900 x307 > Fax: (416) 736-4715 > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From hal.rosenstock at gmail.com Sat Sep 6 06:33:55 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Sat, 6 Sep 2008 09:33:55 -0400 Subject: [ofa-general] ib_cm question In-Reply-To: References: Message-ID: Hi, On Fri, Sep 5, 2008 at 3:38 PM, Terry Greeniaus wrote: > Hello all, > > We are porting out application to run on the OFED stack. As part of the > porting process, I have a series of CM unit tests that I need to get to > run. I am having trouble with one in particular. The CM maintainer is currently on sabbatical for a little while. FWIW, I'll provide my take on this. > At a high level, the unit test implements a simple protocol for > establishing a connection between a client and a server to test basic CM > functionality. The protocol uses the private data field of the CM > packets to exchange a key that is generated randomly by the server on a > per-connection basis. 
Essentially, the client sends a REQ with a
> randomly chosen key which will not match the server's. When the server
> initially receives a REQ for a particular connection, it generates a
> random key and compares it against the key stored in the REQ. Since
> they don't match, the server sends a REJ back to the client, and the REJ
> contains the correct key in the private data field. Finally, the client
> resends the REQ, this time with the correct key:
>
>   Client                          Server
>   REQ        ------------------->
>   w/ bad key
>
>              <-------------------  REJ
>                                    w/ good key
>
>   REQ        ------------------->
>   w/ good key
>
>              REP/etc.

Are the keys in the private data? Out of curiosity, what REJ code is used?

> Everything works well until the second REQ is received at the server.
> It appears that instead of reusing the previous ib_cm_id, the OFED CM
> generates a new ib_cm_id to handle the second REQ. The unit test thinks
> that a new connection attempt is being requested instead of a retry of
> the original attempt and so it generates a new random key, resulting in
> the protocol being unable to establish a connection.
>
> Is something like I have described above supported by the OFED CM?

ib_cm.h states:
 * ib_cm_handler - User-defined callback to process communication events.
 * @cm_id: Communication identifier associated with the reported event.
 * @event: Information about the communication event.
 *
 * IB_CM_REQ_RECEIVED and IB_CM_SIDR_REQ_RECEIVED communication events
 * generated as a result of listen requests result in the allocation of a
 * new @cm_id. The new @cm_id is returned to the user through this callback.

Although some other CMs may have reused the same "cm id" on the
passive side, I don't think that there's a requirement to do so. I
think it's valid either way per the spec.

IMO the unit test/protocol should not depend on implementation-specific
behavior, which is what I think this amounts to.
I don't sufficiently understand the details of your protocol (as to why the initial connection need be rejected) as opposed to passing the key back in the REP. There may also be other possibilities if a protocol change for your application is feasible. -- Hal > I can try and distill this down to a fairly short code example if that > would make things clearer. > > Thanks, > TG > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From ofed at kononov.ftml.net Sat Sep 6 07:03:01 2008 From: ofed at kononov.ftml.net (Roman Kononov) Date: Sat, 06 Sep 2008 09:03:01 -0500 Subject: [ofa-general] Bogus Receive Completions In-Reply-To: References: <48C19186.2050903@kononov.ftml.net> <48C1BC89.4080709@kononov.ftml.net> Message-ID: <48C28D95.5060004@kononov.ftml.net> Roland Dreier wrote: > > Perhaps, I can give you the code, but it needs lots of other HW and > > SW. Setting it up will be a big pain. And, by changing the code and > > moving stuff around, I can almost mask the problem, and it does not > > appear that soon. > > Probably Mellanox is the only one who can debug this, especially because > it could easily be a firmware issue. Mellanox! Please! I can setup an SSH connection to the failing system and provide any assistance here 24/7. 
Roman From dledford at redhat.com Sat Sep 6 09:51:31 2008 From: dledford at redhat.com (Doug Ledford) Date: Sat, 06 Sep 2008 12:51:31 -0400 Subject: [ofa-general] ConnectX IB HCA with Ubuntu 8.04 In-Reply-To: <48C266FF.30108@array.ca> References: <48AC6495.1040807@array.ca> <48C13916.9020307@array.ca> <20080905142226.GV6273@sashak.voltaire.com> <200809051639.57300.bs@q-leap.de> <20080905161420.GZ6273@sashak.voltaire.com> <48C191F6.5020302@array.ca> <1220665308.7801.45.camel@firewall.xsintricity.com> <48C266FF.30108@array.ca> Message-ID: <1220719891.7801.48.camel@firewall.xsintricity.com> On Sat, 2008-09-06 at 07:18 -0400, Steven Truelove wrote: > Yes, the DHCP server starts just fine. The hardware address was > pulled from the output of ifconfig on the client. Even if the hardware > address was wrong, or I didn't list the host at all, the 'range' > setting should ensure that an address is provided from the open pool. > > There is output in the logs to indicate that there is no subnet > listing for ib1 and eth0, and that it won't be listening on those > interfaces. This implies that it is working on eth1 (where DHCP is > tested working) and ib0 (where no log references to DHCPDISCOVER are > made, even though the client is sending them). > > That said, am I barking up the wrong tree entirely by even trying to > make this work? There are a few references to this being possible > when I google for 'infiniband dhcp', and this is where I got the > 'always-broadcast on' setting from. Apparently this is necessary. > But I couldn't find anything further to help me. Did you apply the dhcp patch that's in the OFED distribution to the dhcp server and recompile? Without, it doesn't know how to parse IB broadcast packets (and with it, it still doesn't, but it switches from raw mode to cooked socket mode where it doesn't have to know the structure of a raw IPoIB packet). 
It would certainly explain the dhcp server silently dropping the packets, they wouldn't look like dhcp requests in raw mode. > Thanks, > > Steven Truelove > > > > Doug Ledford wrote: > > On Fri, 2008-09-05 at 16:09 -0400, Steven Truelove wrote: > > > > > Okay, thanks, I have run opensm and I have gotten IPoIB working as > > > well, although there is a problem. IPoIB works fine with static IPs, > > > but I can't get DHCP to work. The logs suggest that the DHCP server > > > simply isn't seeing the DHCPDISCOVERs from the client. Here is the > > > relevant chunk of dhcpd.conf: > > > > > > subnet 192.168.200.0 netmask 255.255.255.0 { > > > > > > always-broadcast on; > > > > > > range 192.168.200.10 192.168.200.50; > > > > > > option broadcast-address 192.168.200.255; > > > } > > > > > > > > > host sappsu4-ib { > > > hardware ethernet 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00; > > > fixed-address 192.168.200.104; > > > } > > > > > > > > > Does this have any chance of working? > > > > > > > Will your dhcp server even start up with that hardware ethernet line in > > it? None of the patches for the dhcp server that I've seen enable dhcp > > to parse that big of an ethernet definition. > > > > > > > Thanks, > > > > > > Steven Truelove > > > > > > > > > > > > > > > Sasha Khapyorsky wrote: > > > > > > > On 16:39 Fri 05 Sep , Bernd Schubert wrote: > > > > > > > > > > > > > We just didn't have the time yet to complete all the packaging and to push > > > > > it upstream to Debian, here is what we have so far > > > > > > > > > > # Etchy packages, but also should work for hardy > > > > > deb http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/infiniband/etch ./ > > > > > > > > > > # Hardy packages, but not recently maintained (only for my workstation) > > > > > deb http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/infiniband/hardy/ ./ > > > > > > > > > > > > > > Great! 
> > > > > > > > Sasha > > > > > > > > > > > > > > > -- > > > Steven Truelove > > > Array Systems Computing, Inc. > > > 1120 Finch Avenue West, 7th Floor > > > Toronto, Ontario > > > M3J 3H7 > > > CANADA > > > http://www.array.ca > > > truelove at array.ca > > > Phone: (416) 736-0900 x307 > > > Fax: (416) 736-4715 > > > _______________________________________________ > > > general mailing list > > > general at lists.openfabrics.org > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: This is a digitally signed message part URL: From truelove at array.ca Sat Sep 6 09:57:10 2008 From: truelove at array.ca (Steven Truelove) Date: Sat, 06 Sep 2008 12:57:10 -0400 Subject: ***SPAM*** Re: [ofa-general] ConnectX IB HCA with Ubuntu 8.04 In-Reply-To: <1220719891.7801.48.camel@firewall.xsintricity.com> References: <48AC6495.1040807@array.ca> <48C13916.9020307@array.ca> <20080905142226.GV6273@sashak.voltaire.com> <200809051639.57300.bs@q-leap.de> <20080905161420.GZ6273@sashak.voltaire.com> <48C191F6.5020302@array.ca> <1220665308.7801.45.camel@firewall.xsintricity.com> <48C266FF.30108@array.ca> <1220719891.7801.48.camel@firewall.xsintricity.com> Message-ID: <48C2B666.3020206@array.ca> Okay, thanks, this is likely what I missed. Steven Truelove Doug Ledford wrote: > On Sat, 2008-09-06 at 07:18 -0400, Steven Truelove wrote: > >> Yes, the DHCP server starts just fine. The hardware address was >> pulled from the output of ifconfig on the client. 
Even if the hardware >> address was wrong, or I didn't list the host at all, the 'range' >> setting should ensure that an address is provided from the open pool. >> >> There is output in the logs to indicate that there is no subnet >> listing for ib1 and eth0, and that it won't be listening on those >> interfaces. This implies that it is working on eth1 (where DHCP is >> tested working) and ib0 (where no log references to DHCPDISCOVER are >> made, even though the client is sending them). >> >> That said, am I barking up the wrong tree entirely by even trying to >> make this work? There are a few references to this being possible >> when I google for 'infiniband dhcp', and this is where I got the >> 'always-broadcast on' setting from. Apparently this is necessary. >> But I couldn't find anything further to help me. >> > > Did you apply the dhcp patch that's in the OFED distribution to the dhcp > server and recompile? Without, it doesn't know how to parse IB > broadcast packets (and with it, it still doesn't, but it switches from > raw mode to cooked socket mode where it doesn't have to know the > structure of a raw IPoIB packet). It would certainly explain the dhcp > server silently dropping the packets, they wouldn't look like dhcp > requests in raw mode. > > >> Thanks, >> >> Steven Truelove >> >> >> >> Doug Ledford wrote: >> >>> On Fri, 2008-09-05 at 16:09 -0400, Steven Truelove wrote: >>> >>> >>>> Okay, thanks, I have run opensm and I have gotten IPoIB working as >>>> well, although there is a problem. IPoIB works fine with static IPs, >>>> but I can't get DHCP to work. The logs suggest that the DHCP server >>>> simply isn't seeing the DHCPDISCOVERs from the client. 
Here is the >>>> relevant chunk of dhcpd.conf: >>>> >>>> subnet 192.168.200.0 netmask 255.255.255.0 { >>>> >>>> always-broadcast on; >>>> >>>> range 192.168.200.10 192.168.200.50; >>>> >>>> option broadcast-address 192.168.200.255; >>>> } >>>> >>>> >>>> host sappsu4-ib { >>>> hardware ethernet 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00; >>>> fixed-address 192.168.200.104; >>>> } >>>> >>>> >>>> Does this have any chance of working? >>>> >>>> >>> Will your dhcp server even start up with that hardware ethernet line in >>> it? None of the patches for the dhcp server that I've seen enable dhcp >>> to parse that big of an ethernet definition. >>> >>> >>> >>>> Thanks, >>>> >>>> Steven Truelove >>>> >>>> >>>> >>>> >>>> Sasha Khapyorsky wrote: >>>> >>>> >>>>> On 16:39 Fri 05 Sep , Bernd Schubert wrote: >>>>> >>>>> >>>>> >>>>>> We just didn't have the time yet to complete all the packaging and to push >>>>>> it upstream to Debian, here is what we have so far >>>>>> >>>>>> # Etchy packages, but also should work for hardy >>>>>> deb http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/infiniband/etch ./ >>>>>> >>>>>> # Hardy packages, but not recently maintained (only for my workstation) >>>>>> deb http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/infiniband/hardy/ ./ >>>>>> >>>>>> >>>>>> >>>>> Great! >>>>> >>>>> Sasha >>>>> >>>>> >>>>> >>>>> >>>> -- >>>> Steven Truelove >>>> Array Systems Computing, Inc. >>>> 1120 Finch Avenue West, 7th Floor >>>> Toronto, Ontario >>>> M3J 3H7 >>>> CANADA >>>> http://www.array.ca >>>> truelove at array.ca >>>> Phone: (416) 736-0900 x307 >>>> Fax: (416) 736-4715 >>>> _______________________________________________ >>>> general mailing list >>>> general at lists.openfabrics.org >>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >>>> >>>> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tziporet at dev.mellanox.co.il Sun Sep 7 02:10:43 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Sun, 07 Sep 2008 12:10:43 +0300 Subject: [ofa-general] Bogus Receive Completions In-Reply-To: <48C28D95.5060004@kononov.ftml.net> References: <48C19186.2050903@kononov.ftml.net> <48C1BC89.4080709@kononov.ftml.net> <48C28D95.5060004@kononov.ftml.net> Message-ID: <48C39A93.8060809@mellanox.co.il> Roman Kononov wrote: > Roland Dreier wrote: >> > Perhaps, I can give you the code, but it needs lots of other HW and >> > SW. Setting it up will be a big pain. And, by changing the code and >> > moving stuff around, I can almost mask the problem, and it does not >> > appear that soon. >> >> Probably Mellanox is the only one who can debug this, especially because >> it could easily be a firmware issue. > > Mellanox! Please! > > I can setup an SSH connection to the failing system and provide any > assistance here 24/7. > > Roman > We will work with you to debug it off line. Tziporet From vlad at lists.openfabrics.org Sun Sep 7 03:05:19 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Sun, 7 Sep 2008 03:05:19 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080907-0200 daily build status Message-ID: <20080907100519.102C0E60B08@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18 Passed 
on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.24 Failed: From vlad at lists.openfabrics.org Mon Sep 8 03:07:52 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Mon, 8 Sep 2008 03:07:52 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080908-0200 daily build status Message-ID: <20080908100752.C4970E60975@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with 
linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.24 Failed: From cap at nsc.liu.se Mon Sep 8 03:50:35 2008 From: cap at nsc.liu.se (Peter Kjellstrom) Date: Mon, 8 Sep 2008 12:50:35 +0200 Subject: [ofa-general] Compiling source using Intel Compiler In-Reply-To: References: <20080904181850.GC6273@sashak.voltaire.com> Message-ID: <200809081250.37307.cap@nsc.liu.se> On Thursday 04 September 2008, Christopher Tanner wrote: > > But why you cannot use gcc for building OFED packages? > > Our codes have a lot of Fortran 77 in them and gfortran hasn't been   > compiling those codes very well. 
Since we're using ifort for Fortran
> compiling, I figured we ought to use icc (C) and icpc (C++) to use a
> consistent compiler package. I don't know if programs partially
> compiled in gcc and ifort will work very well...

This is the case for a lot of users and sites (if not most HPC sites). There is no need whatsoever to compile the IB stack with icc. Just build in the recommended way and compile your applications with icc/ifort.

/Peter

From tziporet at mellanox.co.il Mon Sep 8 04:55:34 2008
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Mon, 08 Sep 2008 14:55:34 +0300
Subject: [ofa-general] OFED meeting agenda for Sep-8 on OFED 1.4 release status
Message-ID: <48C512B6.5080906@mellanox.co.il>

This is the agenda for the OFED meeting today (8-Sep):

1. RC1 status: We have built RC1 today and will run testing today - should be out tomorrow.
2. Missing features for RC2:
   - NFS-RDMA over RHEL 5.1
   - OSM: Cached routing
3. Bug list review:

bug_id  severity  op_sys   assigned_to                     short_desc
1128    blocker   Other    stefan.roscher at de.ibm.com   release IPoIB-CM QP resources in flushing CQE context
1171    critical  Other    swise at opengridcomputing.com no mac stats with ofed-1.4 cxgb3
1113    critical  RHEL 4   vu at mellanox.com             rpm -e scsi-target-utils-0.1-2008715 fails
1117    critical  SLES 10  yannick.cote at qlogic.com     ib_ipath module hangs on unload
1172    major     RHEL 5   eli at mellanox.co.il          soft lockup in ipoib during hw driver unload
1153    major     Other    vlad at mellanox.co.il         OpenSM- Multicast group will not open when IB host is the client (joined as send only).
1164    normal    SLES 10  eli at mellanox.co.il          iperf over IPoIB fails for 100 tcp connections
1131    normal    Other    sashak at voltaire.com         ibnetdiscover - some options are mentioned in the man, but not implemented
1132    normal    Other    sashak at voltaire.com         ibclearcounters - -N flag couse to irrelevant error (usage of perfquery)
1136    normal    All      sashak at voltaire.com         ibtracert - some flags mentioned in man page but doesn't implemented

4. Open Discussion

Tziporet

From Sumit.Gaur at Sun.COM Mon Sep 8 07:50:39 2008
From: Sumit.Gaur at Sun.COM (Sumit Gaur - Sun Microsystem)
Date: Mon, 08 Sep 2008 20:20:39 +0530
Subject: [ofa-general] open_node_name_map on OFED 1.3.1
In-Reply-To: <20080907190004.4D9DDE60B0F@openfabrics.org>
References: <20080907190004.4D9DDE60B0F@openfabrics.org>
Message-ID: <48C53BBF.7030304@Sun.COM>

I have upgraded my OFED version from 1.2.5* to 1.3.1. With OFED 1.3.1 I am facing problems in umad_send and umad_recv. I have gone through the OFED code for these. The only extra thing I observed is a call to open_node_name_map(node_name_map_file);

Is it necessary in OFED 1.3.1 to use the above function before making any smpquery ?

Thanks
sumit

From hal.rosenstock at gmail.com Mon Sep 8 09:01:31 2008
From: hal.rosenstock at gmail.com (Hal Rosenstock)
Date: Mon, 8 Sep 2008 12:01:31 -0400
Subject: [ofa-general] upgrade from 1.2.5* to 1.3.1
In-Reply-To: <48C13182.3020500@Sun.COM>
References: <20080905015440.9C33FE60D8F@openfabrics.org> <48C13182.3020500@Sun.COM>
Message-ID:

On Fri, Sep 5, 2008 at 9:17 AM, Sumit Gaur - Sun Microsystem wrote:
> Hi
> I have upgraded my OFED version from 1.2.5* to 1.3.1, Now application could
> not communicate with OFED libraries using umad_send and umad_recv function
> call for IB_SMI_CLASS (with DR path). Is there any major change in umad lib
> for such requests. Any help or info is appreciated.

What kernel is being used ? On what machine architecture are you running ?
Is it perhaps big endian ? I think there was a change that could affect those machines at a minimum. -- Hal > sumit > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From hal.rosenstock at gmail.com Mon Sep 8 09:05:34 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Mon, 8 Sep 2008 12:05:34 -0400 Subject: [ofa-general] open_node_name_map on OFED 1.3.1 In-Reply-To: <48C53BBF.7030304@Sun.COM> References: <20080907190004.4D9DDE60B0F@openfabrics.org> <48C53BBF.7030304@Sun.COM> Message-ID: On Mon, Sep 8, 2008 at 10:50 AM, Sumit Gaur - Sun Microsystem wrote: > I have upgraded my OFED version from 1.2.5* to 1.3.1. With OFED 1.3.1 I am > facing problems in umad_send and umad_recv I have gone through OFED code for > the same. Only extra thing I observed is call of > open_node_name_map(node_name_map_file); > > Is it necessary in OFED 1.3.1 to use above function before making any > smpquery ? Are you talking about the smpquery diag or a custom SMP query ? In the former case, it should work whether or not there is a node name map file (which is optional). In the latter case, there is no need to issue this call (which is in the common diags). It is merely for getting more user friendly node names if they exist. I don't think it's related to any problems you are observing with umad_send/recv. 
-- Hal

> Thanks
> sumit

From jsquyres at cisco.com Mon Sep 8 09:45:14 2008
From: jsquyres at cisco.com (Jeff Squyres)
Date: Mon, 8 Sep 2008 12:45:14 -0400
Subject: [ofa-general] sched_setaffinity / sched_getaffinity
Message-ID: <51D89AA7-D416-4315-AC33-964C998DC67B@cisco.com>

There's at least one warning in OFED that Betsy mentioned today about sched_setaffinity():

ch3_smp_progress.c:2427: warning: passing argument 3 of 'sched_setaffinity' from incompatible pointer type

Be advised that the prototypes for sched_setaffinity() and sched_getaffinity() have changed multiple times over the life of the 2.4 and 2.6 kernel series. Even worse, the signatures in glibc have not always matched those in the kernel -- even in shipping Linux distros.

The Open MPI project spun off a tiny library to solve exactly this problem (because it really has nothing to do with MPI): the Portable Linux Processor Affinity (PLPA) project. PLPA provides plpa_sched_setaffinity() and plpa_sched_getaffinity() API calls with constant signatures and will do the Right Thing regardless of what version of glibc and/or kernel you have.

PLPA is fully embeddable in other software projects (e.g., we embed it in Open MPI; htop also embeds it, IIRC). PLPA's license is BSD. See the project page here:

http://www.open-mpi.org/projects/plpa/

Ping me on the PLPA mailing list if you have any questions / comments / suggestions / patches / etc. You have to be subscribed to post, sorry.

Enjoy.
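To make the prototype issue concrete, here is a minimal sketch of the two calls as current glibc declares them, where argument 2 is the size of the CPU set in bytes and argument 3 is a pointer to a cpu_set_t. It assumes Linux with _GNU_SOURCE defined; older glibc releases declared these functions differently, which is the kind of mismatch the warning above points at. The helper names pin_to_first_allowed_cpu and is_pinned_to are made up for illustration and are not part of OFED or PLPA.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed for cpu_set_t, CPU_* macros and the affinity calls */
#endif
#include <sched.h>

/* Pin the calling thread to the lowest-numbered CPU it is already
 * allowed to run on. Returns that CPU number, or -1 on failure. */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed;
    cpu_set_t target;
    int cpu;

    /* pid 0 means "the calling thread" */
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            CPU_ZERO(&target);
            CPU_SET(cpu, &target);
            if (sched_setaffinity(0, sizeof(target), &target) != 0)
                return -1;
            return cpu;
        }
    }
    return -1;
}

/* Return 1 if the calling thread is restricted to exactly `cpu`, else 0. */
int is_pinned_to(int cpu)
{
    cpu_set_t now;

    if (sched_getaffinity(0, sizeof(now), &now) != 0)
        return 0;
    return CPU_COUNT(&now) == 1 && CPU_ISSET(cpu, &now);
}
```

Code that has to build against several glibc generations would still need a configure-time check or a wrapper library such as PLPA, since this sketch only matches the current three-argument prototype.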
-- 
Jeff Squyres
Cisco Systems

From tgree at relay.phys.ualberta.ca Mon Sep 8 10:14:46 2008
From: tgree at relay.phys.ualberta.ca (Terry Greeniaus)
Date: Mon, 8 Sep 2008 11:14:46 -0600 (MDT)
Subject: [ofa-general] ib_cm question
In-Reply-To:
References:
Message-ID:

On Sat, 6 Sep 2008, Hal Rosenstock wrote:

> On Fri, Sep 5, 2008 at 3:38 PM, Terry Greeniaus
> wrote:
>
> The CM maintainer is currently on sabbatical for a little while. FWIW,
> I'll provide my take on this.

Thanks for your response Hal!

> > Client                      Server
> >   REQ ------------------->
> >   w/ bad key
> >
> >       <------------------- REJ
> >                            w/ good key
> >
> >   REQ ------------------->
> >   w/ good key
> >
> >                            REP/etc.
>
> Are the keys in the private data ?

Yes.

> Out of curiosity, what REJ code is used ?

28 - consumer reject.

> ib_cm.h states:
> * ib_cm_handler - User-defined callback to process communication events.
> * @cm_id: Communication identifier associated with the reported event.
> * @event: Information about the communication event.
> *
> * IB_CM_REQ_RECEIVED and IB_CM_SIDR_REQ_RECEIVED communication events
> * generated as a result of listen requests result in the allocation of a
> * new @cm_id. The new @cm_id is returned to the user through this callback.
>
> Although some other CM's may have reused the same "cm id" on the
> passive side, I don't think that there's a requirement to do so. I
> think it's valid either way per the spec. IMO the unit test/protocol
> should not depend on implementation specific behavior which is what I
> think this amounts to.

You may be right here. The passive side of the CM state machine diagram (Fig 132 p 688 in my copy of the IBA) has an arc from "REJ Sent" to "REQ Rcvd" labelled "(retry) Rcv REQ". It also has an arc labelled "(no retry)" which essentially frees up the cm id. Unfortunately the spec doesn't specify which arc you should follow.
We had interpreted this as the passive side waiting for a retried REQ if the number of CM retries as specified in the original REQ packet had not yet been exhausted. However, with dropped packets (UD) this could result in the passive cm id never being freed - so the OFED interpretation of freeing it immediately after sending the REJ and using a new cm id for subsequent REQs may be the more sensible interpretation.

> I don't sufficiently understand the details of your protocol (as to
> why the initial connection need be rejected) as opposed to passing the
> key back in the REP. There may also be other possibilities if a
> protocol change for your application is feasible.

The protocol is completely contrived for this particular unit test - it isn't used anywhere in our application and was meant to test these particular state transitions in our CM implementation. It's comforting to know that they did their job and found this difference in how the two CMs work, but I will argue that it shouldn't be run against the OFED stack or that it should be modified to take the OFED interpretation of the spec into consideration.

Thanks for your time,
TG

From tgree at relay.phys.ualberta.ca Mon Sep 8 11:33:34 2008
From: tgree at relay.phys.ualberta.ca (Terry Greeniaus)
Date: Mon, 8 Sep 2008 12:33:34 -0600 (MDT)
Subject: [ofa-general] DM question
Message-ID:

Hi all,

Our application, which we are porting to OFED, makes use of Device Management to both advertise and discover services on the network. Our application uses the user-level MAD library and is currently able to perform DM queries on the subnet. Unfortunately, in order to advertise DM services our application needs to be able to set the "isDeviceManagementSupported" bit on our local port's capabilityMask. I don't see a way to do that from userspace in OFED. This is critical for our application since without that bit set nobody else on the fabric will be able to use our services.

Perhaps I have overlooked something?
If not, what would the recommended way of setting this bit in the capabilityMask be? Thanks, TG From hal.rosenstock at gmail.com Mon Sep 8 13:24:08 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Mon, 8 Sep 2008 16:24:08 -0400 Subject: [ofa-general] DM question In-Reply-To: References: Message-ID: Hi Terry, On Mon, Sep 8, 2008 at 2:33 PM, Terry Greeniaus wrote: > Hi all, > > Our application, which we are porting to OFED, makes use of Device > Management to both advertise and discover services on the network. Our > application uses the user-level MAD library and is currently able to > perform DM queries on the subnet. Unfortunately, in order to advertise > DM services our application needs to be able to set the > "isDeviceManagementSupported" bit on our local port's capabilityMask. I > don't see a way to do that from userspace in OFED. This is critical for > our application since without that bit set nobody else on the fabric > will be able to use our services. > > Perhaps I have overlooked something? If not, what would the recommended > way of setting this bit in the capabilityMask be? The only way I see to do this from user space is something like the following: SubnGet PortInfo of local port Set IsDeviceManagementSupport bit in PortInfo.CapabilityMask Change any other PortInfo fields so set will work (LinkState andPortPhysicalState set to no state change, don't think any others need changing) SubnSet PortInfo of local port (make sure set worked) The downside is that if your application crashes you will need to have a cleanup program to unset that bit. Hope this helps. 
-- Hal > Thanks, > TG > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From tgree at relay.phys.ualberta.ca Mon Sep 8 13:49:26 2008 From: tgree at relay.phys.ualberta.ca (Terry Greeniaus) Date: Mon, 8 Sep 2008 14:49:26 -0600 (MDT) Subject: [ofa-general] DM question In-Reply-To: References: Message-ID: On Mon, 8 Sep 2008, Hal Rosenstock wrote: > > Perhaps I have overlooked something? If not, what would the recommended > > way of setting this bit in the capabilityMask be? > > The only way I see to do this from user space is something like the following: > > SubnGet PortInfo of local port > Set IsDeviceManagementSupport bit in PortInfo.CapabilityMask > Change any other PortInfo fields so set will work (LinkState > andPortPhysicalState set to no state change, don't think any others > need changing) > SubnSet PortInfo of local port (make sure set worked) > > The downside is that if your application crashes you will need to have > a cleanup program to unset that bit. The IBA lists the capabilityMask field as read-only. Is doing a SubnSet on the PortInfo.CapabilityMask field supported in OFED? That would solve the immediate problem. TG From ralph.campbell at qlogic.com Mon Sep 8 13:57:35 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Mon, 08 Sep 2008 13:57:35 -0700 Subject: [ofa-general] DM question In-Reply-To: References: Message-ID: <1220907455.30937.125.camel@chromite.mv.qlogic.com> On Mon, 2008-09-08 at 14:49 -0600, Terry Greeniaus wrote: > On Mon, 8 Sep 2008, Hal Rosenstock wrote: > > > > Perhaps I have overlooked something? If not, what would the recommended > > > way of setting this bit in the capabilityMask be? 
> > > > The only way I see to do this from user space is something like the following: > > > > SubnGet PortInfo of local port > > Set IsDeviceManagementSupport bit in PortInfo.CapabilityMask > > Change any other PortInfo fields so set will work (LinkState > > andPortPhysicalState set to no state change, don't think any others > > need changing) > > SubnSet PortInfo of local port (make sure set worked) > > > > The downside is that if your application crashes you will need to have > > a cleanup program to unset that bit. > > The IBA lists the capabilityMask field as read-only. Is doing a > SubnSet on the PortInfo.CapabilityMask field supported in OFED? That > would solve the immediate problem. You can't use the SubnSet(Portinfo) MADs to change the PortInfo.CapabilityMask. You (or someone else) will need to modify the kernel to call ib_modify_port() with the bit set in ib_port_modify.set_port_cap_mask. From hal.rosenstock at gmail.com Mon Sep 8 16:33:15 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Mon, 8 Sep 2008 19:33:15 -0400 Subject: ***SPAM*** Re: [ofa-general] DM question In-Reply-To: References: Message-ID: On Mon, Sep 8, 2008 at 4:49 PM, Terry Greeniaus wrote: > On Mon, 8 Sep 2008, Hal Rosenstock wrote: > >> > Perhaps I have overlooked something? If not, what would the recommended >> > way of setting this bit in the capabilityMask be? >> >> The only way I see to do this from user space is something like the following: >> >> SubnGet PortInfo of local port >> Set IsDeviceManagementSupport bit in PortInfo.CapabilityMask >> Change any other PortInfo fields so set will work (LinkState >> andPortPhysicalState set to no state change, don't think any others >> need changing) >> SubnSet PortInfo of local port (make sure set worked) >> >> The downside is that if your application crashes you will need to have >> a cleanup program to unset that bit. > > The IBA lists the capabilityMask field as read-only. 
Is doing a > SubnSet on the PortInfo.CapabilityMask field supported in OFED? You're right; my bad :-( -- Hal > That > would solve the immediate problem. > > TG > From hal.rosenstock at gmail.com Mon Sep 8 16:36:05 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Mon, 8 Sep 2008 19:36:05 -0400 Subject: [ofa-general] DM question In-Reply-To: <1220907455.30937.125.camel@chromite.mv.qlogic.com> References: <1220907455.30937.125.camel@chromite.mv.qlogic.com> Message-ID: On Mon, Sep 8, 2008 at 4:57 PM, Ralph Campbell wrote: > On Mon, 2008-09-08 at 14:49 -0600, Terry Greeniaus wrote: >> On Mon, 8 Sep 2008, Hal Rosenstock wrote: >> >> > > Perhaps I have overlooked something? If not, what would the recommended >> > > way of setting this bit in the capabilityMask be? >> > >> > The only way I see to do this from user space is something like the following: >> > >> > SubnGet PortInfo of local port >> > Set IsDeviceManagementSupport bit in PortInfo.CapabilityMask >> > Change any other PortInfo fields so set will work (LinkState >> > andPortPhysicalState set to no state change, don't think any others >> > need changing) >> > SubnSet PortInfo of local port (make sure set worked) >> > >> > The downside is that if your application crashes you will need to have >> > a cleanup program to unset that bit. >> >> The IBA lists the capabilityMask field as read-only. Is doing a >> SubnSet on the PortInfo.CapabilityMask field supported in OFED? That >> would solve the immediate problem. > > You can't use the SubnSet(Portinfo) MADs to change the > PortInfo.CapabilityMask. Right. >You (or someone else) will need > to modify the kernel to call ib_modify_port() with the > bit set in ib_port_modify.set_port_cap_mask. Doing it in the kernel is the straightforward part. It's not available from user space so something needs to be added for that. It could be done like issm but it was decided not to chew up additional fds needlessly. 
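As a rough illustration of the set/clear semantics discussed in this thread, the sketch below models in plain userspace C how a capability mask changes when a set mask and a clear mask are applied. It does not call ib_modify_port() and cannot change a real port. The struct port_modify_model type and the helper functions are hypothetical stand-ins for the two mask fields of the kernel's struct ib_port_modify, and the combine rule is an assumption about how a driver would typically apply them. Bit 19 is IsDeviceManagementSupported in PortInfo.CapabilityMask.

```c
#include <stdint.h>

/* Bit 19 of PortInfo.CapabilityMask: IsDeviceManagementSupported. */
#define CAP_DEVICE_MGMT_SUP (UINT32_C(1) << 19)

/* Hypothetical userspace mirror of the set/clear mask pair carried by
 * the kernel's struct ib_port_modify; this is not the kernel definition. */
struct port_modify_model {
    uint32_t set_port_cap_mask;   /* bits to turn on  */
    uint32_t clr_port_cap_mask;   /* bits to turn off */
};

/* Assumed combine rule: apply the set bits first, then the clear bits. */
uint32_t apply_port_modify(uint32_t cap_mask,
                           const struct port_modify_model *pm)
{
    return (cap_mask | pm->set_port_cap_mask) & ~pm->clr_port_cap_mask;
}

/* What a DM agent would request when it starts advertising services. */
uint32_t advertise_dm(uint32_t cap_mask)
{
    struct port_modify_model pm = { CAP_DEVICE_MGMT_SUP, 0 };
    return apply_port_modify(cap_mask, &pm);
}

/* What the agent (or a cleanup tool run after a crash) would request
 * to withdraw the advertisement. */
uint32_t withdraw_dm(uint32_t cap_mask)
{
    struct port_modify_model pm = { 0, CAP_DEVICE_MGMT_SUP };
    return apply_port_modify(cap_mask, &pm);
}
```

Keeping set and clear as separate masks means a caller only names the bits it owns and leaves every other capability bit untouched, which is why a crashed application would need an explicit clear request to undo its advertisement.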
-- Hal

>

From christopher.tanner at gatech.edu Mon Sep 8 22:08:00 2008
From: christopher.tanner at gatech.edu (Christopher Tanner)
Date: Tue, 9 Sep 2008 01:08:00 -0400
Subject: [ofa-general] Compiled IB packages
Message-ID: <0709481C-38BC-4598-870F-44FE8AE44FCE@gatech.edu>

I am setting up a 16-node (homogeneous) cluster running Ubuntu 8.04 server with Mellanox Infiniband cards. I downloaded (from the OpenFabrics website), compiled, and installed the following IB packages on the master node into the /usr/local/lib directory. The /usr/local directory is being shared to all of the nodes via NFS. All packages seemed to compile and install fine.

  libibverbs
  librdmacm
  libibcm
  libipathverbs
  dapl
  compat-dapl
  libmlx4
  libmthca
  libcxgb3
  libibcommon
  libibumad
  libibmad
  opensm
  infiniband-diags

I have a few questions:

a) Do I need to run 'make install' on each node or just the master node? All of the libraries in /usr/local/lib are visible to all nodes... Stated another way, does 'make install' put files elsewhere besides the /usr/local/lib directory? Does it alter OS configuration files to tell it to look for certain files in /usr/local/lib?

b) I know I need to load the IB kernel modules (mlx4_core, mlx4_ib, rdma_ucm, ib_core, ib_mad, ib_mthca, ib_umad, ib_uverbs) in order for the IB cards to work. Are these compiled and installed with the above packages? How does the kernel know where to look for modules? (Sorry, this question is very similar to the first one).

c) The OFED software stack contains some stuff that isn't available for source download (e.g. ib-bonding, ibsim, libsdp). Are these necessary for the IB network to operate correctly? Since I'm running Ubuntu, obviously the src.rpm file won't work...

Thanks to all for your help. Previous responses regarding issues with OpenSM worked great.
------------------------------------------- Chris Tanner Space Systems Design Lab Georgia Institute of Technology christopher.tanner at gatech.edu ------------------------------------------- From vlad at lists.openfabrics.org Tue Sep 9 03:09:00 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Tue, 9 Sep 2008 03:09:00 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080909-0200 daily build status Message-ID: <20080909100900.62052E60CF5@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 
with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.24 Failed: From mschlining at datadirectnet.com Tue Sep 9 06:07:23 2008 From: mschlining at datadirectnet.com (Marty Schlining) Date: Tue, 9 Sep 2008 06:07:23 -0700 Subject: [ofa-general] Forcing a DDR HCA to SDR speeds Message-ID: <60BA2AA14940C9429038D4E2BC53008D1645AB068D@MAILBOXCLUSTER.datadirect.datadirectnet.com> With OFED 1.3.1 or 1.4, is it possible to force the link speed of a DDR HCA port or the entire DDR HCA from a DDR link to strictly SDR? If so, how can it be done? The HCAs in question is a Mellanox MT25208 dual port HCA, rev A3, firmware 4.8.2. Martin Schlining mschlining at datadirectnet.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From trzyna at us.ibm.com Tue Sep 9 06:01:18 2008 From: trzyna at us.ibm.com (Matthew Trzyna) Date: Tue, 9 Sep 2008 07:01:18 -0600 Subject: [ofa-general] OpenSM Problems/Questions Message-ID: Hello A "Basic Fabric Diagram" at the end. I am working with a customer implementing a large IB fabric and is encountering problems with OpenSM (OFED 1.3) when they added a new 264 node cluster (with its own 288 port IB switch) to their existing cluster. Two more 264 clusters are planned to be added in the near future. They recently moved to SLES 10 SP1 and OFED 1.3 (before adding the new cluster) and had not been experiencing these problems before. Could you help provide answers to the questions listed below? Additional information about the configuration including a basic fabric diagram are provided after the questions. What parameters should be set on the non-SM nodes that affect how the Subnet Administrator functions? 
What parameters should be set on the SM node(s) that affect how the Subnet Administrator functions? And what parameters should be removed from the SM node(s)? (e.g. ib_sa paths_per_dest=0x7f)
How should SM failover be set up? How many failover SMs should be configured? (This must happen quickly and transparently, or GPFS will die everywhere due to timeouts if it takes too long.)
Are there SA (Subnet Administrator) commands that should not be executed on a large "live" fabric? (e.g. "saquery -p")
Should GPFS be configured "off" on the SM node(s)?
Do you know of any other OpenSM implementations that have 5 (or more) 288-port IB switches that might have already encountered/resolved some of these issues?

The following problem that is being encountered may also be SA/SM related. A node (NodeX) may be seen (through IPoIB) by all but a few nodes (NodesA-G). A ping from those nodes (NodesA-G) to NodeX returns "Destination Host Unreachable". A ping from NodeX to NodesA-G works.

--------------------------------------------------------------------------------------------------

System Information

Here is the current opensm.conf file: (See attached file: opensm.conf)

It is the default configuration from the OFED 1.3 build with "priority" added at the bottom. Note that /etc/init.d/opensmd sources /etc/sysconfig/opensm, not /etc/sysconfig/opensm.conf (opensm.conf was just copied to opensm). There are a couple of "proposed" settings, found on the web, that are commented out.
Following are the present settings that may affect the Fabric: /etc/infiniband/openib.conf SET_IPOIB_CM=no /etc/modprobe.conf.local options ib_ipoib send_queue_size=512 recv_queue_size=512 options ib_sa paths_per_dest=0x7f /etc/sysctl.conf net.ipv4.neigh.ib0.base_reachable_time = 1200 net.ipv4.neigh.default.gc_thresh3 = 3072 net.ipv4.neigh.default.gc_thresh2 = 2500 net.ipv4.neigh.default.gc_thresh1 = 2048 /etc/sysconfig/opensm All defaults as supplied with OFED 1.3 OpenSM ------------------------------------------------------- Basic Fabric Diagram +----------+ |Top Level |-------------------+ 20 IO nodes +-----------------| 288 port |----------------+ 16 Viual nodes | | IB Sw |------------+ | 2 Admin nodes | +------| |---+ | | (SM nodes) | | +----------+ | | | 4 Support nodes | | | | | | | | | | | | 24 24 24 24 24 24 <--uplinks | | | | | | | | | | | +------+ | | | | | | |(BASE) |(SCU1) |(SCU2) |(SCU3) |(SCU4) |(SCU5) +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ |288-port| |288-port| |288-port| |288-port| |288-port| |288-port| | IB Sw | | IB Sw | | IB Sw | | IB Sw | | IB Sw | | IB Sw | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ 140-nodes 264-nodes 264-nodes 264-nodes 264-nodes 264-nodes WhiteBox Dell Dell IBM IBM IBM (future) NOTE: SCU4 is not currently connected to the Top Level Switch. We'd like to address these issues before making that connection. Subnet Managers are configured on nodes connected to the Top Leval Switch. Let me know if you need any more information. Any help you could provide would be most appreciated. Thanks. Matt Trzyna IBM Linux Cluster Enablement 3039 Cornwallis Rd. RTP, NC 27709 e-mail: trzyna at us.ibm.com Office: (919) 254-9917 Tie Line: 444 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: opensm.conf
Type: application/octet-stream
Size: 4797 bytes
Desc: not available
URL:

From tziporet at mellanox.co.il Tue Sep 9 07:22:33 2008
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Tue, 9 Sep 2008 17:22:33 +0300
Subject: [ofa-general] OFED meeting summary for Sep 8, 2008 on 1.4 status
Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD75971B@mtlexch01.mtl.com>

OFED meeting summary for Sep 8, 2008 on 1.4 status
==================================================

Summary:
========
- 1.4-rc1 is done on Sep 9.
- Set a target of cleaning up all compilation warnings for RC2
- Moved to weekly meetings starting this week

Details:
========
1. Missing features for RC2:
   - NFS-RDMA over RHEL 5.1
   - OSM: Cached routing

2. We decided to clean up all warnings for RC2.
   Each module owner - please start the cleanup work

3. Bugs review:

bug_id  bug_severity  assigned_to                     Status update
1128    blocker       stefan.roscher at de.ibm.com     fix under test in low level driver - should be done this week
1171    critical      swise at opengridcomputing.com  should be fixed for rc2
1113    critical      vu at mellanox.com              in progress
1117    critical      yannick.cote at qlogic.com      should be fixed in rc1 - to be tested by Qlogic
1153    major         yosefe at voltaire.com          in progress - Voltaire
1164    normal        eli at mellanox.co.il           need details on the HCA type
1178    normal        pasha at mellanox.co.il
1160    normal        perkinjo at cse.ohio-state.edu  should be fixed with new package
1131    normal        sashak at voltaire.com
1132    normal        sashak at voltaire.com
1136    normal        sashak at voltaire.com

4. Testing matrix:
   Betsy needs to send the Qlogic testing table
   Tziporet should publish the full matrix

Tziporet

From hal.rosenstock at gmail.com Tue Sep 9 07:36:50 2008
From: hal.rosenstock at gmail.com (Hal Rosenstock)
Date: Tue, 9 Sep 2008 10:36:50 -0400
Subject: ***SPAM*** Re: [ofa-general] Forcing a DDR HCA to SDR speeds
In-Reply-To: <60BA2AA14940C9429038D4E2BC53008D1645AB068D@MAILBOXCLUSTER.datadirect.datadirectnet.com>
References: <60BA2AA14940C9429038D4E2BC53008D1645AB068D@MAILBOXCLUSTER.datadirect.datadirectnet.com>
Message-ID:

On Tue, Sep 9, 2008 at 9:07 AM, Marty Schlining wrote:
> With OFED 1.3.1 or 1.4, is it possible to force the link speed of a DDR HCA
> port or the entire DDR HCA from a DDR link to strictly SDR? If so, how can
> it be done? The HCAs in question is a Mellanox MT25208 dual port HCA, rev
> A3, firmware 4.8.2.

Yes. For an individual port, look at the infiniband-diags ibportstate command included in management:

ibportstate speed

Using this has ramifications for the SM, in that it must not overwrite the speed (PortInfo:LinkSpeedEnabled). OpenSM has a force_link_speed option for this, which can be set to 0 for this type of operation.

You will also need to reset the port for the change to take effect, as renegotiation does not occur unless this is done.

ibportstate reset

The reset is only allowed on a switch port, so the peer port must be found.

If you want all ports to be SDR and are using OpenSM, you can just set force_link_speed to SDR (1).
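As a rough illustration of the values involved, here is a small C sketch of the PortInfo:LinkSpeedEnabled bitmask as I understand it: bit 0 for 2.5 Gbps (SDR), bit 1 for 5.0 Gbps (DDR), bit 2 for 10.0 Gbps, with only the combinations that include SDR (1, 3, 5, 7) being non-reserved. The helper names are hypothetical and the encoding is an assumption to be checked against the IBA specification and the OpenSM man page; the special force_link_speed settings such as 0 (do not force) are configuration values rather than bitmasks and are deliberately not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed PortInfo:LinkSpeedEnabled bit values. */
#define LINK_SPEED_SDR 0x1u  /* 2.5 Gbps  */
#define LINK_SPEED_DDR 0x2u  /* 5.0 Gbps  */
#define LINK_SPEED_QDR 0x4u  /* 10.0 Gbps */

/* True for the non-reserved combinations 1, 3, 5 and 7,
 * i.e. any mask within the three speed bits that includes SDR. */
static bool link_speed_mask_valid(unsigned mask)
{
    return mask <= 7u && (mask & LINK_SPEED_SDR) != 0u;
}

/* True if the mask restricts the link to SDR only, which is the
 * state the original question is asking for. */
static bool link_speed_mask_is_sdr_only(unsigned mask)
{
    assert(link_speed_mask_valid(mask));
    return mask == LINK_SPEED_SDR;
}
```

Under these assumptions, a mask of 1 leaves only SDR enabled, while 3 still allows the link to come up at DDR after a reset.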
-- Hal > > Martin Schlining > > mschlining at datadirectnet.com > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From akepner at sgi.com Tue Sep 9 07:54:35 2008 From: akepner at sgi.com (akepner at sgi.com) Date: Tue, 9 Sep 2008 07:54:35 -0700 Subject: [ofa-general] [PATCH] ipoib: defer skb_orphan() until irqs enabled Message-ID: <20080909145435.GO2316@sgi.com> If a socket's sk_write_space() method expects to run with interrupts enabled, syslog can get very noisy with messages like: Badness in local_bh_enable at kernel/softirq.c:140 Call Trace: [] show_stack+0x40/0xa0 [] dump_stack+0x30/0x60 [] local_bh_enable+0x90/0x140 [] _spin_unlock_bh+0x30/0x60 [] svc_sock_enqueue+0x750/0x780 [sunrpc] [] svc_write_space+0xc0/0x1c0 [sunrpc] [] sock_wfree+0xd0/0x140 [] ipoib_send+0x1120/0x14a0 [ib_ipoib] [] ipoib_start_xmit+0x380/0x1140 [ib_ipoib] [] dev_hard_start_xmit+0x4b0/0x680 [] __qdisc_run+0x2d0/0x680 A simple fix is to defer calling skb_orphan() until interrupts have been reenabled. 
Signed-off-by: Arthur Kepner --- diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index b0ffc9a..8c9dcf1 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -440,7 +440,7 @@ int ipoib_open(struct net_device *dev); int ipoib_add_pkey_attr(struct net_device *dev); int ipoib_add_umcast_attr(struct net_device *dev); -void ipoib_send(struct net_device *dev, struct sk_buff *skb, +int ipoib_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_ah *address, u32 qpn); void ipoib_reap_ah(struct work_struct *work); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 66cafa2..711a3ac 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -525,13 +525,14 @@ static inline int post_send(struct ipoib_dev_priv *priv, return ib_post_send(priv->qp, &priv->tx_wr, &bad_wr); } -void ipoib_send(struct net_device *dev, struct sk_buff *skb, +int ipoib_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_ah *address, u32 qpn) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_tx_buf *tx_req; int hlen; void *phead; + int ret = 1; /* assume the worst */ if (skb_is_gso(skb)) { hlen = skb_transport_offset(skb) + tcp_hdrlen(skb); @@ -541,7 +542,7 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, ++dev->stats.tx_dropped; ++dev->stats.tx_errors; dev_kfree_skb_any(skb); - return; + return 1; } } else { if (unlikely(skb->len > priv->mcast_mtu + IPOIB_ENCAP_LEN)) { @@ -550,7 +551,7 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, ++dev->stats.tx_dropped; ++dev->stats.tx_errors; ipoib_cm_skb_too_long(dev, skb, priv->mcast_mtu); - return; + return 1; } phead = NULL; hlen = 0; @@ -571,7 +572,7 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, if (unlikely(ipoib_dma_map_tx(priv->ca, tx_req))) { ++dev->stats.tx_errors; dev_kfree_skb_any(skb); - return; + 
return 1; } if (skb->ip_summed == CHECKSUM_PARTIAL) @@ -593,6 +594,7 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, --priv->tx_outstanding; ipoib_dma_unmap_tx(priv->ca, tx_req); dev_kfree_skb_any(skb); + ret = 1; if (netif_queue_stopped(dev)) netif_wake_queue(dev); } else { @@ -600,13 +602,14 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, address->last_send = priv->tx_head; ++priv->tx_head; - skb_orphan(skb); - + ret = 0; } if (unlikely(priv->tx_outstanding > MAX_SEND_CQE)) while (poll_tx(priv)) ; /* nothing */ + + return ret; } static void __ipoib_reap_ah(struct net_device *dev) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 7e9e218..b67c793 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -604,7 +604,7 @@ static void neigh_add_path(struct sk_buff *skb, struct net_device *dev) goto err_drop; } } else - ipoib_send(dev, skb, path->ah, IPOIB_QPN(skb->dst->neighbour->ha)); + (void) ipoib_send(dev, skb, path->ah, IPOIB_QPN(skb->dst->neighbour->ha)); } else { neigh->ah = NULL; @@ -685,7 +685,7 @@ static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev, ipoib_dbg(priv, "Send unicast ARP to %04x\n", be16_to_cpu(path->pathrec.dlid)); - ipoib_send(dev, skb, path->ah, IPOIB_QPN(phdr->hwaddr)); + (void) ipoib_send(dev, skb, path->ah, IPOIB_QPN(phdr->hwaddr)); } else if ((path->query || !path_rec_start(dev, path)) && skb_queue_len(&path->queue) < IPOIB_MAX_PATH_REC_QUEUE) { /* put pseudoheader back on for next time */ @@ -704,6 +704,7 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev) struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_neigh *neigh; unsigned long flags; + int orphan = 0; if (unlikely(!spin_trylock_irqsave(&priv->tx_lock, flags))) return NETDEV_TX_LOCKED; @@ -743,7 +744,9 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev) goto out; } } else 
if (neigh->ah) { - ipoib_send(dev, skb, neigh->ah, IPOIB_QPN(skb->dst->neighbour->ha)); + int ret; + ret = ipoib_send(dev, skb, neigh->ah, IPOIB_QPN(skb->dst->neighbour->ha)); + orphan = !ret; goto out; } @@ -788,6 +791,8 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev) out: spin_unlock_irqrestore(&priv->tx_lock, flags); + if (orphan) + skb_orphan(skb); return NETDEV_TX_OK; } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index ac33c8f..d491801 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -723,7 +723,7 @@ out: } } - ipoib_send(dev, skb, mcast->ah, IB_MULTICAST_QPN); + (void) ipoib_send(dev, skb, mcast->ah, IB_MULTICAST_QPN); } unlock: From vlad at mellanox.co.il Tue Sep 9 08:03:42 2008 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 09 Sep 2008 18:03:42 +0300 Subject: [ofa-general] Compiled IB packages In-Reply-To: <0709481C-38BC-4598-870F-44FE8AE44FCE@gatech.edu> References: <0709481C-38BC-4598-870F-44FE8AE44FCE@gatech.edu> Message-ID: <48C6904E.1020606@mellanox.co.il> Christopher Tanner wrote: > I am setting up a 16-node (homogeneous) cluster running Ubuntu 8.04 > server with Mellanox Infiniband cards. I downloaded (from the > OpenFabrics website), compiled, and installed the following IB packages > on the master node into the /usr/local/lib directory. The /usr/local > directory is being shared to all of the nodes via NFS. All packages > seemed to compile and install fine. > > libibverbs > librdmacm > libibcm > libipathverbs > dapl > compat-dapl > libmlx4 > libmthca > libcxgb3 > libibcommon > libibumad > libibmad > opensm > infiniband-diags > > I have a few questions: > a) Do I need to run 'make install' on each node or just the master node? > All of the libraries in /usr/local/lib are visible to all nodes... 
> Stated another way, does 'make install' put files elsewhere beside the
> /usr/local/lib directory? Does it alter OS configuration files to tell
> it to look for certain files in /usr/local/lib?
>

No, all the packages above will put their files under /usr/local

> b) I know I need to load the IB kernel modules (mlx4_core, mlx4_ib,
> rdma_ucm, ib_core, ib_mad, ib_mthca, ib_umad, ib_uverbs) in order for
> the IB cards to work. Are these compiled and installed with the above
> packages? Where does the kernel know where to look for modules? (Sorry,
> this question is very similar to the first one).
>

The packages above are user space libraries/binaries. To install kernel modules, you should download the latest version of the ofa_1_4_kernel tgz file from:

http://www.openfabrics.org/downloads/ofa_1_4_kernel/

To install, run:

./configure --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod --with-mthca-mod --with-mthca_debug-mod --with-mlx4-mod --with-mlx4_en-mod --with-mlx4_debug-mod --with-cxgb3-mod --with-ehca-mod --with-ipoib-mod --with-ipoib_debug-mod (... , see --help)
make
make install

> c) The OFED software stack contains some stuff that isn't available for
> source download (e.g. ib-bonding, ibsim, libsdp). Are these necessary
> for the IB network to operate correctly? Since I'm running Ubuntu,
> obviously the src.rpm file won't work...
>

All OFED tgz files are available under:
http://www.openfabrics.org/~vlad/ofed_1_4/SOURCES/

The ib-bonding source RPM can be downloaded from (you can open it with cpio to get the tgz file, if you need):
http://www.openfabrics.org/~monis/ofed_1_4/

These packages are not necessary for the IB network to operate correctly, but it depends on what you are planning to do.

Regards,
Vladimir

> Thanks to all for you help. Previous responses regarding issues with
> OpenSM worked great.
>
> -------------------------------------------
> Chris Tanner
> Space Systems Design Lab
> Georgia Institute of Technology
> christopher.tanner at gatech.edu
> -------------------------------------------

From tziporet at mellanox.co.il Tue Sep 9 08:20:23 2008
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Tue, 9 Sep 2008 18:20:23 +0300
Subject: [ofa-general] OFED 1.4-RC1 is available
Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD75979A@mtlexch01.mtl.com>

Hi,

OFED 1.4-RC1 release is available on
http://www.openfabrics.org/downloads/OFED/ofed-1.4/OFED-1.4-rc1.tgz

To get BUILD_ID run ofed_info

Please report any issues in bugzilla https://bugs.openfabrics.org/ for OFED 1.4

Tziporet & Vladimir

========================================================================

Release information:
--------------------

Linux Operating Systems:
- RedHat EL4 up4: 2.6.9-42.ELsmp *
- RedHat EL4 up5: 2.6.9-55.ELsmp
- RedHat EL4 up6: 2.6.9-67.ELsmp
- RedHat EL4 up7: 2.6.9-78.ELsmp
- RedHat EL5: 2.6.18-8.el5
- RedHat EL5 up1: 2.6.18-53.el5
- RedHat EL5 up2: 2.6.18-92.el5
- CentOS 5.2: 2.6.18-92.el5
- Fedora C9: 2.6.25-14.fc9 *
- SLES10: 2.6.16.21-0.8-smp
- SLES10 SP1: 2.6.16.46-0.12-smp
- SLES10 SP1 up1: 2.6.16.53-0.16-smp
- SLES10 SP2: 2.6.16.60-0.21-smp
- OpenSuSE 10.3: 2.6.22.5-31 *
- kernel.org: 2.6.26 and 2.6.27-rc5

* Minimal QA for these versions

Systems:
* x86_64
* x86
* ia64
* ppc64

Main Changes from OFED 1.4-beta
===============================
o Kernel code based on 2.6.27-rc5
o Added NFS-RDMA support for SLES10 SP2 and kernels 2.6.26 and 2.6.27
o iSER backports added; iSER is now available
o New MPI packages: Open MPI 1.2.7, MVAPICH 1.1 and MVAPICH2 1.1
o New DAPL libraries
o 37 bugs fixed (see attached for details)

Tasks that should be completed for the RC2:
===========================================
1. NFS-RDMA to work on RHEL 5.1
2. OSM: Cached routing
3. Clean up compilation warnings
4. Bug fixes

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ofed-1.4-rc1-fixed-bugs.csv Type: application/octet-stream Size: 3628 bytes Desc: ofed-1.4-rc1-fixed-bugs.csv URL: From yossi.openib at gmail.com Tue Sep 9 09:52:17 2008 From: yossi.openib at gmail.com (Yossi Etigin) Date: Tue, 09 Sep 2008 19:52:17 +0300 Subject: [ofa-general] ***SPAM*** [PATCH] ipoib: send creation parameters when doing send-only join Message-ID: <48C6A9C1.5070108@gmail.com> If creation parameters are not sent a sender will not trigger mcast group creation. Fixes bug #1153 in bugzilla. Signed-off-by: Yossi Etigin -- Index: b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2008-09-08 23:04:46.000000000 +0300 +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2008-09-09 19:40:26.000000000 +0300 @@ -327,6 +327,7 @@ static int ipoib_mcast_sendonly_join(str .join_state = 1 #endif }; + ib_sa_comp_mask comp_mask; int ret = 0; if (!test_bit(IPOIB_FLAG_OPER_UP, &priv->flags)) { @@ -339,16 +340,37 @@ static int ipoib_mcast_sendonly_join(str return -EBUSY; } - rec.mgid = mcast->mcmember.mgid; - rec.port_gid = priv->local_gid; - rec.pkey = cpu_to_be16(priv->pkey); + comp_mask = + IB_SA_MCMEMBER_REC_MGID | + IB_SA_MCMEMBER_REC_PORT_GID | + IB_SA_MCMEMBER_REC_PKEY | + IB_SA_MCMEMBER_REC_JOIN_STATE | + IB_SA_MCMEMBER_REC_QKEY | + IB_SA_MCMEMBER_REC_MTU_SELECTOR | + IB_SA_MCMEMBER_REC_MTU | + IB_SA_MCMEMBER_REC_TRAFFIC_CLASS | + IB_SA_MCMEMBER_REC_RATE_SELECTOR | + IB_SA_MCMEMBER_REC_RATE | + IB_SA_MCMEMBER_REC_SL | + IB_SA_MCMEMBER_REC_FLOW_LABEL | + IB_SA_MCMEMBER_REC_HOP_LIMIT; + + rec.mgid = mcast->mcmember.mgid; + rec.port_gid = priv->local_gid; + rec.pkey = cpu_to_be16(priv->pkey); + rec.qkey = priv->broadcast->mcmember.qkey; + rec.mtu_selector = IB_SA_EQ; + rec.mtu = priv->broadcast->mcmember.mtu; + rec.traffic_class = priv->broadcast->mcmember.traffic_class; + rec.rate_selector = IB_SA_EQ; + rec.rate = 
priv->broadcast->mcmember.rate; + rec.sl = priv->broadcast->mcmember.sl; + rec.flow_label = priv->broadcast->mcmember.flow_label; + rec.hop_limit = priv->broadcast->mcmember.hop_limit; mcast->mc = ib_sa_join_multicast(&ipoib_sa_client, priv->ca, priv->port, &rec, - IB_SA_MCMEMBER_REC_MGID | - IB_SA_MCMEMBER_REC_PORT_GID | - IB_SA_MCMEMBER_REC_PKEY | - IB_SA_MCMEMBER_REC_JOIN_STATE, + comp_mask, GFP_ATOMIC, ipoib_mcast_sendonly_join_complete, mcast); -- From chu11 at llnl.gov Tue Sep 9 10:01:43 2008 From: chu11 at llnl.gov (Al Chu) Date: Tue, 09 Sep 2008 10:01:43 -0700 Subject: [ofa-general] [OpenSM][Trivial] Fix comment typo Message-ID: <1220979703.27074.56.camel@cardanus.llnl.gov> Hey Sasha, Noticed it while looking at some other code in the header file. Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-fix-comment-typo.patch Type: text/x-patch Size: 759 bytes Desc: not available URL: From hal.rosenstock at gmail.com Tue Sep 9 11:35:48 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Tue, 9 Sep 2008 14:35:48 -0400 Subject: ***SPAM*** Re: [ofa-general] OpenSM Problems/Questions In-Reply-To: References: Message-ID: Hi, On Tue, Sep 9, 2008 at 9:01 AM, Matthew Trzyna wrote: > Hello > > > A "Basic Fabric Diagram" at the end. > > > I am working with a customer implementing a large IB fabric and is > encountering problems with OpenSM (OFED 1.3) when they added a new 264 node > cluster (with its own 288 port IB switch) to their existing cluster. Two > more 264 clusters are planned to be added in the near future. They recently > moved to SLES 10 SP1 and OFED 1.3 (before adding the new cluster) and had > not been experiencing these problems before. > > Could you help provide answers to the questions listed below? 
Additional > information about the configuration including a basic fabric diagram are > provided after the questions. > > What parameters should be set on the non-SM nodes that affect how the Subnet > Administrator functions? > What parameters should be set on the SM node(s) that affect how the Subnet > Administrator functions? And, what parameters should be removed from the SM > node(s)? (ie. ib_sa paths_per_dest=0x7f) > How should SM failover be setup? How many failover SM's should be > configured? This must happen quickly and transparently or GPFS will die > everywhere due to timeouts if this takes too long). What is quickly enough ? > Are there SA (Subnet Administrator) commands that should not be executed on > a large "live" fabric? (ie. "saquery -p") > Should GPFS be configured "off" on the SM node(s)? > Do you know of any other OpenSM implementations that have 5 (or more) 288 > port IB switches that might have already encountered/resolved some of these > issues? There are some deployments with multiple large switches deployed. Not sure what you mean by issues; I see questions above. > The following problem that is being encountered may also be SA/SM related. A > node (NodeX) may be seen (through IPoIB) by all but a few nodes (NodesA-G). > A ping from those node (NodesA-G) to NodeX returns "Destination Host > Unreachable". A ping from NodeX to NodesA-G works. Sounds like perhaps those nodes were unable to join the broadcast group perhaps due to a rate issue. -- Hal > -------------------------------------------------------------------------------------------------- > > System Information > > Here is the current opensm.conf file: (See attached file: opensm.conf) > > It is the default configuration from the OFED 1.3 build with "priority" > added at the bottom. Note that the /etc/init.d/opensmd sources > /etc/sysconfig/opensm not etc/sysconfig/opensm.conf (opensm.conf was just > copied to opensm). 
There are a couple of "proposed" settings that are > commented out, that were found them on the web. > > Following are the present settings that may affect the Fabric: > > /etc/infiniband/openib.conf > SET_IPOIB_CM=no > > /etc/modprobe.conf.local > options ib_ipoib send_queue_size=512 recv_queue_size=512 > options ib_sa paths_per_dest=0x7f > > /etc/sysctl.conf > net.ipv4.neigh.ib0.base_reachable_time = 1200 > net.ipv4.neigh.default.gc_thresh3 = 3072 > net.ipv4.neigh.default.gc_thresh2 = 2500 > net.ipv4.neigh.default.gc_thresh1 = 2048 > > /etc/sysconfig/opensm > All defaults as supplied with OFED 1.3 OpenSM > > > ------------------------------------------------------- > > > Basic Fabric Diagram > > +----------+ > |Top Level |-------------------+ 20 IO nodes > +-----------------| 288 port |----------------+ 16 Viual nodes > | | IB Sw |------------+ | 2 Admin nodes > | +------| |---+ | | (SM nodes) > | | +----------+ | | | 4 Support nodes > | | | | | | > | | | | | | > 24 24 24 24 24 24 <--uplinks > | | | | | | > | | | | | +------+ > | | | | | | > |(BASE) |(SCU1) |(SCU2) |(SCU3) |(SCU4) |(SCU5) > +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ > |288-port| |288-port| |288-port| |288-port| |288-port| |288-port| > | IB Sw | | IB Sw | | IB Sw | | IB Sw | | IB Sw | | IB Sw | > +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ > 140-nodes 264-nodes 264-nodes 264-nodes 264-nodes 264-nodes > WhiteBox Dell Dell IBM IBM IBM (future) > > NOTE: SCU4 is not currently connected to the Top Level Switch. > We'd like to address these issues before making that connection. > > Subnet Managers are configured on nodes connected to the > Top Leval Switch. > > Let me know if you need any more information. > > Any help you could provide would be most appreciated. > > Thanks. > > Matt Trzyna > IBM Linux Cluster Enablement > 3039 Cornwallis Rd. 
> RTP, NC 27709 > e-mail: trzyna at us.ibm.com > Office: (919) 254-9917 Tie Line: 444 > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From christopher.tanner at gatech.edu Tue Sep 9 11:53:39 2008 From: christopher.tanner at gatech.edu (Christopher Tanner) Date: Tue, 9 Sep 2008 14:53:39 -0400 Subject: [ofa-general] Compiled IB packages In-Reply-To: <48C6904E.1020606@mellanox.co.il> References: <0709481C-38BC-4598-870F-44FE8AE44FCE@gatech.edu> <48C6904E.1020606@mellanox.co.il> Message-ID: Thanks Vladimir - very helpful. However, I'm running into a problem with compiling the ofa package. First, I had to specify the source location on the command line (Ubuntu puts it in a different place than RedHat or SUSE): $ ./configure --kernel-sources=/usr/src/linux-source-2.6.24 ... (other stuff) I'm getting this error: ERROR: Kernel configuration is invalid. include/linux/autoconf.h or include/config/auto.conf are missing. Run 'make oldconfig && make prepare' on kernel src to fix it. This is confusing b/c both of those files exist. $ locate autoconf.h /usr/src/linux-headers-2.6.24-19-generic/include/linux/autoconf.h $ locate auto.conf /usr/src/linux-headers-2.6.24-19-generic/include/config/auto.conf There's a whole bunch more errors that I assume spawn because of this initial error. The output from 'make' is attached (it's pretty long). Let me know what you think. Thanks! ------------------------------------------- Chris Tanner Space Systems Design Lab Georgia Institute of Technology christopher.tanner at gatech.edu ------------------------------------------- On Sep 9, 2008, at 11:03 AM, Vladimir Sokolovsky wrote: > Christopher Tanner wrote: >> I am setting up a 16-node (homogeneous) cluster running Ubuntu 8.04 >> server with Mellanox Infiniband cards. 
I downloaded (from the >> OpenFabrics website), compiled, and installed the following IB >> packages on the master node into the /usr/local/lib directory. The / >> usr/local directory is being shared to all of the nodes via NFS. >> All packages seemed to compile and install fine. >> libibverbs >> librdmacm >> libibcm >> libipathverbs >> dapl >> compat-dapl >> libmlx4 >> libmthca >> libcxgb3 >> libibcommon >> libibumad >> libibmad >> opensm >> infiniband-diags >> I have a few questions: >> a) Do I need to run 'make install' on each node or just the master >> node? All of the libraries in /usr/local/lib are visible to all >> nodes... Stated another way, does 'make install' put files >> elsewhere beside the /usr/local/lib directory? Does it alter OS >> configuration files to tell it to look for certain files in /usr/ >> local/lib? > > No, all the packages above will put their files under /usr/local > >> b) I know I need to load the IB kernel modules (mlx4_core, >> mlx4_ib, rdma_ucm, ib_core, ib_mad, ib_mthca, ib_umad, ib_uverbs) >> in order for the IB cards to work. Are these compiled and installed >> with the above packages? Where does the kernel know where to look >> for modules? (Sorry, this question is very similar to the first one). > > The packages above are user space libraries/binaries. To install > kernel > modules you should download the latest version of the ofa_1_4_kernel > tgz file from: > > http://www.openfabrics.org/downloads/ofa_1_4_kernel/ > To install, run: > ./configure --with-core-mod --with-user_mad-mod --with-user_access- > mod --with-addr_trans-mod --with-mthca-mod --with-mthca_debug-mod -- > with-mlx4-mod --with-mlx4_en-mod --with-mlx4_debug-mod --with-cxgb3- > mod --with-ehca-mod --with-ipoib-mod --with-ipoib_debug-mod (... , > see --help) > make > make install > > >> c) The OFED software stack contains some stuff that isn't available >> for source download (e.g. ib-bonding, ibsim, libsdp). 
Are these >> necessary for the IB network to operate correctly? Since I'm >> running Ubuntu, obviously the src.rpm file won't work... > > All OFED tgz files that are available under: > http://www.openfabrics.org/~vlad/ofed_1_4/SOURCES/ > > ib-bonding source RPM can be downloaded from (you can open it to get > tgz file using cpio, if you need): > http://www.openfabrics.org/~monis/ofed_1_4/ > > This packages are not necessary for the IB network to operate > correctly, but > it depends on what are you planning to do. > > Regards, > Vladimir > >> Thanks to all for you help. Previous responses regarding issues >> with OpenSM worked great. >> ------------------------------------------- >> Chris Tanner >> Space Systems Design Lab >> Georgia Institute of Technology >> christopher.tanner at gatech.edu >> ------------------------------------------- From weiny2 at llnl.gov Tue Sep 9 12:11:40 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Tue, 9 Sep 2008 12:11:40 -0700 Subject: ***SPAM*** Re: [ofa-general] OpenSM Problems/Questions In-Reply-To: References: Message-ID: <20080909121140.1ec7838b.weiny2@llnl.gov> On Tue, 9 Sep 2008 14:35:48 -0400 "Hal Rosenstock" wrote: > Hi, > > On Tue, Sep 9, 2008 at 9:01 AM, Matthew Trzyna wrote: > > Hello > > > > > > A "Basic Fabric Diagram" at the end. > > > > > > I am working with a customer implementing a large IB fabric and is > > encountering problems with OpenSM (OFED 1.3) when they added a new 264 node > > cluster (with its own 288 port IB switch) to their existing cluster. Two > > more 264 clusters are planned to be added in the near future. They recently > > moved to SLES 10 SP1 and OFED 1.3 (before adding the new cluster) and had > > not been experiencing these problems before. Are there routing issues? > > > > Could you help provide answers to the questions listed below? Additional > > information about the configuration including a basic fabric diagram are > > provided after the questions. 
> > > > What parameters should be set on the non-SM nodes that affect how the Subnet > > Administrator functions? > > What parameters should be set on the SM node(s) that affect how the Subnet > > Administrator functions? And, what parameters should be removed from the SM > > node(s)? (ie. ib_sa paths_per_dest=0x7f) > > How should SM failover be setup? How many failover SM's should be > > configured? This must happen quickly and transparently or GPFS will die > > everywhere due to timeouts if this takes too long). > > What is quickly enough ? What does GPFS do that requires the SM/SA to be constantly available? Lustre is pretty stable (IB wise) once connected. Our SysAdmins can restart the SM almost at will without issues. As an aside, we do not run with a standby SM. We have not had many instances where OpenSM crashes (probably about 3 times in 3 years). So I think it is important to find out why GPFS needs the SM/SA and then make sure that is available. > > > Are there SA (Subnet Administrator) commands that should not be executed on > > a large "live" fabric? (ie. "saquery -p") > > Should GPFS be configured "off" on the SM node(s)? > > Do you know of any other OpenSM implementations that have 5 (or more) 288 > > port IB switches that might have already encountered/resolved some of these > > issues? > > There are some deployments with multiple large switches deployed. We have 2 clusters which currently have 4x288 port switches in them. Plus many more 24 port "leafs" off of those cores. OpenSM, while not perfect, does work quite well for us. > > Not sure what you mean by issues; I see questions above. I am not sure what the questions are either. Are you having problems with any particular diag or with OpenSM not running (routing?) correctly? > > > The following problem that is being encountered may also be SA/SM related. A > > node (NodeX) may be seen (through IPoIB) by all but a few nodes (NodesA-G).
> > A ping from those node (NodesA-G) to NodeX returns "Destination Host > > Unreachable". A ping from NodeX to NodesA-G works. > > Sounds like perhaps those nodes were unable to join the broadcast > group perhaps due to a rate issue. Hal is correct, and saquery is your friend here. If you use "genders" and "whatsup" (https://computing.llnl.gov/linux/downloads.html), I have a series of tools, "Pragmatic InfiniBand Utilities (PIU)" (https://computing.llnl.gov/linux/piu.html), which includes a tool called "ibnodeinmcast" that can help debug this. It uses saquery [-g|-m] to find nodes in the multicast groups. With the addition of other LLNL tools this can be boiled down to which nodes "should" be in the group but are not. You are welcome to download that package and adapt it to your environment. Another cause could be that OpenSM is not routing something correctly. That will require some more debugging with dump_lfts.sh and dump_mfts.sh. Ira > > -- Hal > > > -------------------------------------------------------------------------------------------------- > > > > System Information > > > > Here is the current opensm.conf file: (See attached file: opensm.conf) > > > > It is the default configuration from the OFED 1.3 build with "priority" > > added at the bottom. Note that the /etc/init.d/opensmd sources > > /etc/sysconfig/opensm not etc/sysconfig/opensm.conf (opensm.conf was just > > copied to opensm). There are a couple of "proposed" settings that are > > commented out, that were found them on the web.
> > > > Following are the present settings that may affect the Fabric: > > > > /etc/infiniband/openib.conf > > SET_IPOIB_CM=no > > > > /etc/modprobe.conf.local > > options ib_ipoib send_queue_size=512 recv_queue_size=512 > > options ib_sa paths_per_dest=0x7f > > > > /etc/sysctl.conf > > net.ipv4.neigh.ib0.base_reachable_time = 1200 > > net.ipv4.neigh.default.gc_thresh3 = 3072 > > net.ipv4.neigh.default.gc_thresh2 = 2500 > > net.ipv4.neigh.default.gc_thresh1 = 2048 > > > > /etc/sysconfig/opensm > > All defaults as supplied with OFED 1.3 OpenSM > > > > > > ------------------------------------------------------- > > > > > > Basic Fabric Diagram > > > > +----------+ > > |Top Level |-------------------+ 20 IO nodes > > +-----------------| 288 port |----------------+ 16 Viual nodes > > | | IB Sw |------------+ | 2 Admin nodes > > | +------| |---+ | | (SM nodes) > > | | +----------+ | | | 4 Support nodes > > | | | | | | > > | | | | | | > > 24 24 24 24 24 24 <--uplinks > > | | | | | | > > | | | | | +------+ > > | | | | | | > > |(BASE) |(SCU1) |(SCU2) |(SCU3) |(SCU4) |(SCU5) > > +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ > > |288-port| |288-port| |288-port| |288-port| |288-port| |288-port| > > | IB Sw | | IB Sw | | IB Sw | | IB Sw | | IB Sw | | IB Sw | > > +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ > > 140-nodes 264-nodes 264-nodes 264-nodes 264-nodes 264-nodes > > WhiteBox Dell Dell IBM IBM IBM (future) > > > > NOTE: SCU4 is not currently connected to the Top Level Switch. > > We'd like to address these issues before making that connection. > > > > Subnet Managers are configured on nodes connected to the > > Top Leval Switch. > > > > Let me know if you need any more information. > > > > Any help you could provide would be most appreciated. > > > > Thanks. > > > > Matt Trzyna > > IBM Linux Cluster Enablement > > 3039 Cornwallis Rd. 
> > RTP, NC 27709 > > e-mail: trzyna at us.ibm.com > > Office: (919) 254-9917 Tie Line: 444 > > > > _______________________________________________ > > general mailing list > > general at lists.openfabrics.org > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From sweitzen at cisco.com Tue Sep 9 12:52:44 2008 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Tue, 9 Sep 2008 12:52:44 -0700 Subject: [ofa-general] OFED 1.4-RC1 is available In-Reply-To: <5D49E7A8952DC44FB38C38FA0D758EAD75979A@mtlexch01.mtl.com> References: <5D49E7A8952DC44FB38C38FA0D758EAD75979A@mtlexch01.mtl.com> Message-ID: I am unable to build MVAPICH2 for multiple compilers: Building the MVAPICH2 RPM [OFA]... Running rpmbuild --rebuild --define '_topdir /var/tmp/OFED_topdir' --define 'dist %{nil}' --target x86_64 --define '_name mvapich2_gcc' --define 'impl ofa' --define 'rdma --with-rdma=gen2' --define 'ib_include --with-ib-include=/usr/include' --define 'ib_libpath --with-ib-libpath=/usr/lib64' --define 'shared_libs 1' --define 'romio 1' --define 'comp_env CC=gcc CXX=g++ F77=gfortran F90=gfortran' --define 'auto_req 0' --define 'mpi_selector /usr/bin/mpi-selector' --define '_prefix /usr/mpi/gcc/mvapich2-1.2rc2' /tmp/OFED-1.4-rc1/SRPMS/mvapich2-1.2rc2-4.src.rpm Install mvapich2_gcc RPM: Running rpm -iv --nodeps /tmp/OFED-1.4-rc1/RPMS/redhat-release-4AS-4.1/x86_64/mvapich2_gcc-1.2rc2-4.x86_64.rpm Build mvapich2_pgi RPM Building the MVAPICH2 RPM [OFA]...
Running rpmbuild --rebuild --define '_topdir /var/tmp/OFED_topdir' --define 'dist %{nil}' --target x86_64 --define '_name mvapich2_pgi' --define 'impl ofa' --define 'rdma --with-rdma=gen2' --define 'ib_include --with-ib-include=/usr/include' --define 'ib_libpath --with-ib-libpath=/usr/lib64' --define 'shared_libs 1' --define 'romio 1' --define 'comp_env CC=pgcc CXX=pgCC F77=pgf77 F90=pgf90' --define 'auto_req 0' --define 'mpi_selector /usr/bin/mpi-selector' --define '_prefix /usr/mpi/pgi/mvapich2-1.2rc2' /tmp/OFED-1.4-rc1/SRPMS/mvapich2-1.2rc2-4.src.rpm Install mvapich2_pgi RPM: Running rpm -iv --nodeps /tmp/OFED-1.4-rc1/RPMS/redhat-release-4AS-4.1/x86_64/mvapich2_pgi-1.2rc2-4.x86_64.rpm Failed to install mvapich2_pgi RPM See /tmp/OFED.12539.logs/mvapich2_pgi.rpminstall.log # more /tmp/OFED.12539.logs/mvapich2_pgi.rpminstall.log Preparing packages for installation... file /etc/mpe_graphics.conf from install of mvapich2_pgi-1.2rc2-4 conflicts with file from package mvapich2_gcc-1.2rc2-4 file /etc/mpe_log.conf from install of mvapich2_pgi-1.2rc2-4 conflicts with file from package mvapich2_gcc-1.2rc2-4 file /etc/mpe_mpianim.conf from install of mvapich2_pgi-1.2rc2-4 conflicts with file from package mvapich2_gcc-1.2rc2-4 file /etc/mpe_mpicheck.conf from install of mvapich2_pgi-1.2rc2-4 conflicts with file from package mvapich2_gcc-1.2rc2-4 file /etc/mpe_mpilog.conf from install of mvapich2_pgi-1.2rc2-4 conflicts with file from package mvapich2_gcc-1.2rc2-4 file /etc/mpe_mpitrace.conf from install of mvapich2_pgi-1.2rc2-4 conflicts with file from package mvapich2_gcc-1.2rc2-4 file /etc/mpe_nolog.conf from install of mvapich2_pgi-1.2rc2-4 conflicts with file from package mvapich2_gcc-1.2rc2-4 file /etc/mpicc.conf from install of mvapich2_pgi-1.2rc2-4 conflicts with file from package mvapich2_gcc-1.2rc2-4 file /etc/mpicxx.conf from install of mvapich2_pgi-1.2rc2-4 conflicts with file from package mvapich2_gcc-1.2rc2-4 file /etc/mpif77.conf from install of
mvapich2_pgi-1.2rc2-4 conflicts with file from package mvapich2_gcc-1.2rc2-4 file /etc/mpif90.conf from install of mvapich2_pgi-1.2rc2-4 conflicts with file from package mvapich2_gcc-1.2rc2-4 Scott Weitzenkamp SQA and Release Manager Server Access Virtualization Business Unit Cisco Systems > -----Original Message----- > From: general-bounces at lists.openfabrics.org > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of > Tziporet Koren > Sent: Tuesday, September 09, 2008 8:20 AM > To: ewg at lists.openfabrics.org > Cc: general at lists.openfabrics.org > Subject: [ofa-general] OFED 1.4-RC1 is available > > Hi, > OFED 1.4-RC1 release is available on > http://www.openfabrics.org/downloads/OFED/ofed-1.4/OFED-1.4-rc1.tgz > > To get BUILD_ID run ofed_info > > Please report any issues in bugzilla https://bugs.openfabrics.org/ for > OFED 1.4 > > Tziporet & Vladimir > > ============================================================== > ========== > > Release information: > -------------------- > Linux Operating Systems: > - RedHat EL4 up4: 2.6.9-42.ELsmp * > - RedHat EL4 up5: 2.6.9-55.ELsmp > - RedHat EL4 up6: 2.6.9-67.ELsmp > - RedHat EL4 up7: 2.6.9-78.ELsmp > - RedHat EL5: 2.6.18-8.el5 > - RedHat EL5 up1: 2.6.18-53.el5 > - RedHat EL5 up2: 2.6.18-92.el5 > - CentOS 5.2: 2.6.18-92.el5 > - Fedora C9: 2.6.25-14.fc9 * > - SLES10: 2.6.16.21-0.8-smp > - SLES10 SP1: 2.6.16.46-0.12-smp > - SLES10 SP1 up1: 2.6.16.53-0.16-smp > - SLES10 SP2: 2.6.16.60-0.21-smp > - OpenSuSE 10.3: 2.6.22.5-31 * > - kernel.org: 2.6.26 and 2.6.27-rc5 > > * Minimal QA for these versions > > Systems: > * x86_64 > * x86 > * ia64 > * ppc64 > > > Main Changes from OFED 1.4-beta > =============================== > o Kernel code based on 2.6.27-rc5 > o Added NFS-RDMA support for SLES10 SP2 and kernel 2.6.26 and 27 > o iSER backports added and its now available > o New MPI packages: Open MPI 1.2.7, MVAPICH 1.1 and MVAPICH2 1.1 > o New DAPL libraries > o 37 bugs fixed (see attached for details) > > >
Tasks that should be completed for the RC2: > =========================================== > 1. NFS-RDMA to work on RHEL 5.1 > 2. OSM: Cashed routing > 3. Cleanup compilation warning > 4. Bug fixes > From rdreier at cisco.com Tue Sep 9 12:45:28 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 09 Sep 2008 12:45:28 -0700 Subject: [ofa-general] Re: [PATCH] ipoib: send creation parameters when doing send-only join In-Reply-To: <48C6A9C1.5070108@gmail.com> (Yossi Etigin's message of "Tue, 09 Sep 2008 19:52:17 +0300") References: <48C6A9C1.5070108@gmail.com> Message-ID: > If creation parameters are not sent a sender will not trigger > mcast group creation. Fixes bug #1153 in bugzilla. If there are no receivers and only senders, why would we want to create a multicast group? - R. From hal.rosenstock at gmail.com Tue Sep 9 13:16:28 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Tue, 9 Sep 2008 16:16:28 -0400 Subject: [ofa-general] Re: [PATCH] ipoib: send creation parameters when doing send-only join In-Reply-To: References: <48C6A9C1.5070108@gmail.com> Message-ID: On Tue, Sep 9, 2008 at 3:45 PM, Roland Dreier wrote: > > If creation parameters are not sent a sender will not trigger > > mcast group creation. Fixes bug #1153 in bugzilla. > > If there are no receivers and only senders, why would we want to create > a multicast group? IBA states for a MC group to be present there must be at least one "full" member (sender and receiver). So it's not only just senders (SendOnlyNonMembers) but also only just receivers (NonMembers) which won't cause group creation. -- Hal > > - R. 
> _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From perkinjo at cse.ohio-state.edu Tue Sep 9 13:25:21 2008 From: perkinjo at cse.ohio-state.edu (Jonathan Perkins) Date: Tue, 9 Sep 2008 16:25:21 -0400 Subject: [ewg] RE: [ofa-general] OFED 1.4-RC1 is available In-Reply-To: References: <5D49E7A8952DC44FB38C38FA0D758EAD75979A@mtlexch01.mtl.com> Message-ID: <20080909202521.GG3716@cse.ohio-state.edu> Thanks for the note. We are taking a look at this. On Tue, Sep 09, 2008 at 12:52:44PM -0700, Scott Weitzenkamp (sweitzen) wrote: > I am unable to build MVAPICH2 for multiple compilers: > > Building the MVAPICH2 RPM [OFA]... > Running rpmbuild --rebuild --define '_topdir /var/tmp/OFED_topdir' > --define 'di > st %{nil}' --target x86_64 --define '_name mvapich2_gcc' --define 'impl > ofa' --d > efine 'rdma --with-rdma=gen2' --define 'ib_include > --with-ib-include=/usr/includ > e' --define 'ib_libpath --with-ib-libpath=/usr/lib64' --define > 'shared_libs 1' - > -define 'romio 1' --define 'comp_env CC=gcc CXX=g++ F77=gfortran > F90=gfortran' - > -define 'auto_req 0' --define 'mpi_selector /usr/bin/mpi-selector' > --define '_pr > efix /usr/mpi/gcc/mvapich2-1.2rc2' > /tmp/OFED-1.4-rc1/SRPMS/mvapich2-1.2rc2-4.src > .rpm > Install mvapich2_gcc RPM: > Running rpm -iv --nodeps > /tmp/OFED-1.4-rc1/RPMS/redhat-release-4AS-4.1/x86_64/mv > apich2_gcc-1.2rc2-4.x86_64.rpm > Build mvapich2_pgi RPM > Building the MVAPICH2 RPM [OFA]... 
> Running rpmbuild --rebuild --define '_topdir /var/tmp/OFED_topdir' > --define 'di > st %{nil}' --target x86_64 --define '_name mvapich2_pgi' --define 'impl > ofa' --d > efine 'rdma --with-rdma=gen2' --define 'ib_include > --with-ib-include=/usr/includ > e' --define 'ib_libpath --with-ib-libpath=/usr/lib64' --define > 'shared_libs 1' - > -define 'romio 1' --define 'comp_env CC=pgcc CXX=pgCC F77=pgf77 > F90=pgf90' --def > ine 'auto_req 0' --define 'mpi_selector /usr/bin/mpi-selector' --define > '_prefix > /usr/mpi/pgi/mvapich2-1.2rc2' > /tmp/OFED-1.4-rc1/SRPMS/mvapich2-1.2rc2-4.src.rpm > Install mvapich2_pgi RPM: > Running rpm -iv --nodeps > /tmp/OFED-1.4-rc1/RPMS/redhat-release-4AS-4.1/x86_64/mv > apich2_pgi-1.2rc2-4.x86_64.rpm > Failed to install mvapich2_pgi RPM > See /tmp/OFED.12539.logs/mvapich2_pgi.rpminstall.log > > # more /tmp/OFED.12539.logs/mvapich2_pgi.rpminstall.log > Preparing packages for installation... > file /etc/mpe_graphics.conf from install of > mvapich2_pgi-1.2rc2-4 confli > cts with file from package mvapich2_gcc-1.2rc2-4 > file /etc/mpe_log.conf from install of mvapich2_pgi-1.2rc2-4 > conflicts w > ith file from package mvapich2_gcc-1.2rc2-4 > file /etc/mpe_mpianim.conf from install of mvapich2_pgi-1.2rc2-4 > conflic > ts with file from package mvapich2_gcc-1.2rc2-4 > file /etc/mpe_mpicheck.conf from install of > mvapich2_pgi-1.2rc2-4 confli > cts with file from package mvapich2_gcc-1.2rc2-4 > file /etc/mpe_mpilog.conf from install of mvapich2_pgi-1.2rc2-4 > conflict > s with file from package mvapich2_gcc-1.2rc2-4 > file /etc/mpe_mpitrace.conf from install of > mvapich2_pgi-1.2rc2-4 confli > cts with file from package mvapich2_gcc-1.2rc2-4 > file /etc/mpe_nolog.conf from install of mvapich2_pgi-1.2rc2-4 > conflicts > with file from package mvapich2_gcc-1.2rc2-4 > file /etc/mpicc.conf from install of mvapich2_pgi-1.2rc2-4 > conflicts wit > h file from package mvapich2_gcc-1.2rc2-4 > file /etc/mpicxx.conf from install of mvapich2_pgi-1.2rc2-4 
> conflicts wi > th file from package mvapich2_gcc-1.2rc2-4 > file /etc/mpif77.conf from install of mvapich2_pgi-1.2rc2-4 > conflicts wi > th file from package mvapich2_gcc-1.2rc2-4 > file /etc/mpif90.conf from install of mvapich2_pgi-1.2rc2-4 > conflicts wi > th file from package mvapich2_gcc-1.2rc2-4 > > Scott Weitzenkamp > SQA and Release Manager > Server Access Virtualization Business Unit > Cisco Systems > > > > > > -----Original Message----- > > From: general-bounces at lists.openfabrics.org > > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of > > Tziporet Koren > > Sent: Tuesday, September 09, 2008 8:20 AM > > To: ewg at lists.openfabrics.org > > Cc: general at lists.openfabrics.org > > Subject: [ofa-general] OFED 1.4-RC1 is available > > > > Hi, > > OFED 1.4-RC1 release is available on > > http://www.openfabrics.org/downloads/OFED/ofed-1.4/OFED-1.4-rc1.tgz > > > > To get BUILD_ID run ofed_info > > > > Please report any issues in bugzilla https://bugs.openfabrics.org/ for > > OFED 1.4 > > > > Tziporet & Vladimir > > > > ============================================================== > > ========== > > > > Release information: > > -------------------- > > Linux Operating Systems: > > - RedHat EL4 up4: 2.6.9-42.ELsmp * > > - RedHat EL4 up5: 2.6.9-55.ELsmp > > - RedHat EL4 up6: 2.6.9-67.ELsmp > > - RedHat EL4 up7: 2.6.9-78.ELsmp > > - RedHat EL5: 2.6.18-8.el5 > > - RedHat EL5 up1: 2.6.18-53.el5 > > - RedHat EL5 up2: 2.6.18-92.el5 > > - CentOS 5.2: 2.6.18-92.el5 > > - Fedora C9: 2.6.25-14.fc9 * > > - SLES10: 2.6.16.21-0.8-smp > > - SLES10 SP1: 2.6.16.46-0.12-smp > > - SLES10 SP1 up1: 2.6.16.53-0.16-smp > > - SLES10 SP2: 2.6.16.60-0.21-smp > > - OpenSuSE 10.3: 2.6.22.5-31 * > > - kernel.org: 2.6.26 and 2.6.27-rc5 > > > > * Minimal QA for these versions > > > > Systems: > > * x86_64 > > * x86 > > * ia64 > > * ppc64 > > > > > > Main Changes from OFED 1.4-beta > > =============================== > > o Kernel code based on 2.6.27-rc5 > > o Added NFS-RDMA 
support for SLES10 SP2 and kernel 2.6.26 and 27 > > o iSER backports added and its now available > > o New MPI packages: Open MPI 1.2.7, MVAPICH 1.1 and MVAPICH2 1.1 > > o New DAPL libraries > > o 37 bugs fixed (see attached for details) > > > > > > Tasks that should be completed for the RC2: > > =========================================== > > 1. NFS-RDMA to work on RHEL 5.1 > > 2. OSM: Cashed routing > > 3. Cleanup compilation warning > > 4. Bug fixes > > > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg -- Jonathan Perkins http://www.cse.ohio-state.edu/~perkinjo From rdreier at cisco.com Tue Sep 9 13:30:16 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 09 Sep 2008 13:30:16 -0700 Subject: [ofa-general] Re: [PATCH] ipoib: send creation parameters when doing send-only join In-Reply-To: (Hal Rosenstock's message of "Tue, 9 Sep 2008 16:16:28 -0400") References: <48C6A9C1.5070108@gmail.com> Message-ID: > IBA states for a MC group to be present there must be at least one > "full" member (sender and receiver). So it's not only just senders > (SendOnlyNonMembers) but also only just receivers (NonMembers) which > won't cause group creation. Sure, but the same question still applies. More pedantically, if there are only non-members and send-only non-members of a group, why would we expect it to be created? The patch in question wouldn't even work if we actually used the send-only join status in IPoIB. - R. From chu11 at llnl.gov Tue Sep 9 13:46:44 2008 From: chu11 at llnl.gov (Al Chu) Date: Tue, 09 Sep 2008 13:46:44 -0700 Subject: [ofa-general] [OpenSM][Trivial] remove old comments Message-ID: <1220993204.27074.61.camel@cardanus.llnl.gov> Hey Sasha, I assume some legacy comment that is no longer relevant (variable does not exist in the source). 
Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-remove-old-comment.patch Type: text/x-patch Size: 924 bytes Desc: not available URL: From rdreier at cisco.com Tue Sep 9 14:32:44 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 09 Sep 2008 14:32:44 -0700 Subject: [ofa-general] [PATCH] ipoib: defer skb_orphan() until irqs enabled In-Reply-To: <20080909145435.GO2316@sgi.com> (akepner@sgi.com's message of "Tue, 9 Sep 2008 07:54:35 -0700") References: <20080909145435.GO2316@sgi.com> Message-ID: thanks, looks like a good fix. Good debugging too. I'll try to get this into 2.6.27. By the way, looking at this stuff again, it seems we have (a possibly quite unlikely) race where a send can complete before the xmit method finishes, and we end up running skb_orphan on an skb that another context has already freed. I'll have to think about how we can fix that -- but any good ideas are appreciated... - R.
From rdreier at cisco.com Tue Sep 9 14:42:00 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 09 Sep 2008 14:42:00 -0700 Subject: [ofa-general] [PATCH] ipoib: defer skb_orphan() until irqs enabled In-Reply-To: (Roland Dreier's message of "Tue, 09 Sep 2008 14:32:44 -0700") References: <20080909145435.GO2316@sgi.com> Message-ID: Actually I see this is not a regression from 2.6.26 (the bad patch was already in 2.6.26). So I'll queue this for 2.6.28 and hope to come up with a fix for the race in time too. - R From twbowman at gmail.com Tue Sep 9 14:53:56 2008 From: twbowman at gmail.com (Todd Bowman) Date: Tue, 9 Sep 2008 15:53:56 -0600 Subject: [ofa-general] ***SPAM*** opensm failure Message-ID: OpenSM Rev:openib-3.0.13 The opensm segfaulted during an initialization that seems to have been the result of a link state trap (type 1 num12) 09:49:51 914967 [41001960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x011A TID:0x00000000000016cc 09:49:51 948014 [41001960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x011A GID:0xfe80000000000000,0x0008f104003f0ab5 09:49:51 948477 [41802960] -> osm_report_notice: Reporting Generic Notice type:3 num:67 from LID:0x00FD GID:0xfe80000000000000,0x0002c902002064ad 09:49:51 948497 [41802960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x00FD GID:0xfe80000000000000,0x0002c902002064ad 09:49:51 948502 [41802960] -> __osm_drop_mgr_remove_port: Removed port with GUID:0x0002c90200207801 LID range [0x89,0x89] of node:n1008 09:49:51 948519 [41802960] -> osm_report_notice: Reporting Generic Notice type:3 num:67 from LID:0x00FD GID:0xfe80000000000000,0x0002c902002064ad 09:49:51 948529 [41802960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x00FD GID:0xfe80000000000000,0x0002c902002064ad ... ... ... 
09:49:51 962126 [41802960] -> __osm_drop_mgr_remove_port: Removed port with GUID:0x0002c902002064ad LID range [0xFD,0xFD] of node:hn HCA-1 09:49:52 044097 [41802960] -> __osm_lid_mgr_process_our_sm_node: ERR 0308: Can't acquire SM's port object, GUID 0x0002c902002064ad 09:49:52 098558 [41001960] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state OSM_SM_STATE_SET_SUBNET_UCAST_LIDS_WAIT 09:49:52 098917 [41001960] -> __osm_state_mgr_check_tbl_consistency: ERR 3322: lid 0x6E is wrongly assigned to port 0x0008f104003f2cdb in port_lid_tbl 09:49:52 098936 [41001960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x00FD GID:0xfe80000000000000,0x0002c902002064ad 09:49:52 098944 [41001960] -> __osm_state_mgr_report_new_ports: Discovered new port with GUID:0x0008f104003f2cdb LID range [0x0,0x0] of node:ISR9288/ISR9096 Voltaire sLB-24 09:49:52 098957 [41001960] -> osm_ucast_mgr_process: null (min-hop) tables configured on all switches 09:49:52 098992 [41001960] -> __osm_ucast_mgr_process_port: ERR 3A04: Port 0x8f104003f2cdb has LID 0. An initialization error occurred. Ignoring port 09:49:52 103405 [41802960] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state OSM_SM_STATE_SET_LINK_PORTS_WAIT 09:49:52 103626 [41001960] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state OSM_SM_STATE_SET_LINK_PORTS_WAIT 09:49:52 103856 [41001960] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state OSM_SM_STATE_SET_LINK_PORTS_WAIT 09:49:52 104077 [41802960] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state OSM_SM_STATE_SET_LINK_PORTS_WAIT ... ... ... 1) Why does the link down trap start the long chain of __osm_drop_mgr_remove_port? 2) Which of the errors may have caused the segfault?
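A possible first step toward question 2 is to tally the ERR codes in the log, to see which ones dominate and which appear only once. The following is a minimal sketch, not an OpenSM tool: the function name summarize_osm_errors is hypothetical, and it assumes log lines of the form shown in the excerpts above ("-> function: ERR XXXX: message") and a grep that supports -o (GNU or BSD grep).

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenSM): count "ERR <code>" entries in a
# log whose lines look like the excerpts quoted in this thread.
summarize_osm_errors() {
    # $1: path to the log file to scan
    grep -o 'ERR [0-9A-Fa-f]\{4\}' "$1" |   # keep only the "ERR XXXX" tokens
        sort |                               # group identical codes together
        uniq -c |                            # prefix each code with its count
        sort -rn |                           # most frequent code first
        awk '{ print $3, $1 }'               # print as "code count"
}
```

Called as `summarize_osm_errors path/to/osm.log` (the path is a placeholder), it prints one "code count" pair per line, most frequent first. A tally like this only narrows down where to look; it cannot by itself show which error preceded the segfault.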
Thanks, Todd -------------- next part -------------- An HTML attachment was scrubbed... URL: From ramachandra.kuchimanchi at qlogic.com Tue Sep 9 23:56:45 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 10 Sep 2008 12:26:45 +0530 Subject: ***SPAM*** Re: [ofa-general] Compiled IB packages In-Reply-To: References: <0709481C-38BC-4598-870F-44FE8AE44FCE@gatech.edu> <48C6904E.1020606@mellanox.co.il> Message-ID: <71d336490809092356y93ba6bcx304f119c496f0fcf@mail.gmail.com> On Wed, Sep 10, 2008 at 12:23 AM, Christopher Tanner wrote: > $ locate autoconf.h > /usr/src/linux-headers-2.6.24-19-generic/include/linux/autoconf.h Though it may not be related to this error, one issue that I see is that this kernel version may not work with OFED-1.4. OFED-1.4 has a list of kernels it supports and I don't think 2.6.24-19 is supported. One option could be to upgrade the kernel to kernel.org 2.6.26. Regards, Ram From vlad at mellanox.co.il Wed Sep 10 00:28:55 2008 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Wed, 10 Sep 2008 10:28:55 +0300 Subject: [ofa-general] Compiled IB packages In-Reply-To: References: <0709481C-38BC-4598-870F-44FE8AE44FCE@gatech.edu> <48C6904E.1020606@mellanox.co.il> Message-ID: <1221031735.6948.12.camel@vlad-laptop> Hi, From the log file, I see a mismatch between the sources you are passing to the configure command and the autoconf.h/auto.conf files below: /usr/src/linux-headers-2.6.24-19-generic/include/linux/autoconf.h /usr/src/linux-headers-2.6.24-19-generic/include/config/auto.conf From the log file: Kernel version: 2.6.24-16-server Modules directory: //lib/modules/2.6.24-16-server/updates Kernel sources: /usr/src/linux-source-2.6.24 Check that you have the corresponding linux-headers package (matching the running kernel) installed; then you don't have to pass the --kernel-sources and --kernel parameters to the configure script. E.g.
for kernel 2.6.24-19-generic it is linux-headers-2.6.24-19-generic Regards, Vladimir On Tue, 2008-09-09 at 14:53 -0400, Christopher Tanner wrote: > Thanks Vladimir - very helpful. However, I'm running into a problem > with compiling the ofa package. First, I had to specify the source > location on the command line (Ubuntu puts it in a different place than > RedHat or SUSE): > > $ ./configure --kernel-sources=/usr/src/linux-source-2.6.24 ... (other > stuff) > > I'm getting this error: > > ERROR: Kernel configuration is invalid. > include/linux/autoconf.h or include/config/auto.conf are > missing. > Run 'make oldconfig && make prepare' on kernel src to fix it. > > This is confusing b/c both of those files exist. > $ locate autoconf.h > /usr/src/linux-headers-2.6.24-19-generic/include/linux/autoconf.h > > $ locate auto.conf > /usr/src/linux-headers-2.6.24-19-generic/include/config/auto.conf > > There's a whole bunch more errors that I assume spawn because of this > initial error. The output from 'make' is attached (it's pretty long). > Let me know what you think. Thanks! > > ------------------------------------------- > Chris Tanner > Space Systems Design Lab > Georgia Institute of Technology > christopher.tanner at gatech.edu > ------------------------------------------- > > > > On Sep 9, 2008, at 11:03 AM, Vladimir Sokolovsky wrote: > > > Christopher Tanner wrote: > >> I am setting up a 16-node (homogeneous) cluster running Ubuntu 8.04 > >> server with Mellanox Infiniband cards. I downloaded (from the > >> OpenFabrics website), compiled, and installed the following IB > >> packages on the master node into the /usr/local/lib directory. The / > >> usr/local directory is being shared to all of the nodes via NFS. > >> All packages seemed to compile and install fine. 
> >> libibverbs > >> librdmacm > >> libibcm > >> libipathverbs > >> dapl > >> compat-dapl > >> libmlx4 > >> libmthca > >> libcxgb3 > >> libibcommon > >> libibumad > >> libibmad > >> opensm > >> infiniband-diags > >> I have a few questions: > >> a) Do I need to run 'make install' on each node or just the master > >> node? All of the libraries in /usr/local/lib are visible to all > >> nodes... Stated another way, does 'make install' put files > >> elsewhere beside the /usr/local/lib directory? Does it alter OS > >> configuration files to tell it to look for certain files in /usr/ > >> local/lib? > > > > No, all the packages above will put their files under /usr/local > > > >> b) I know I need to load the IB kernel modules (mlx4_core, > >> mlx4_ib, rdma_ucm, ib_core, ib_mad, ib_mthca, ib_umad, ib_uverbs) > >> in order for the IB cards to work. Are these compiled and installed > >> with the above packages? Where does the kernel know where to look > >> for modules? (Sorry, this question is very similar to the first one). > > > > The packages above are user space libraries/binaries. To install > > kernel > > modules you should download the latest version of the ofa_1_4_kernel > > tgz file from: > > > > http://www.openfabrics.org/downloads/ofa_1_4_kernel/ > > To install, run: > > ./configure --with-core-mod --with-user_mad-mod --with-user_access- > > mod --with-addr_trans-mod --with-mthca-mod --with-mthca_debug-mod -- > > with-mlx4-mod --with-mlx4_en-mod --with-mlx4_debug-mod --with-cxgb3- > > mod --with-ehca-mod --with-ipoib-mod --with-ipoib_debug-mod (... , > > see --help) > > make > > make install > > > > > >> c) The OFED software stack contains some stuff that isn't available > >> for source download (e.g. ib-bonding, ibsim, libsdp). Are these > >> necessary for the IB network to operate correctly? Since I'm > >> running Ubuntu, obviously the src.rpm file won't work... 
> > > > All OFED tgz files that are available under: > > http://www.openfabrics.org/~vlad/ofed_1_4/SOURCES/ > > > > ib-bonding source RPM can be downloaded from (you can open it to get > > tgz file using cpio, if you need): > > http://www.openfabrics.org/~monis/ofed_1_4/ > > > > This packages are not necessary for the IB network to operate > > correctly, but > > it depends on what are you planning to do. > > > > Regards, > > Vladimir > > > >> Thanks to all for you help. Previous responses regarding issues > >> with OpenSM worked great. > >> ------------------------------------------- > >> Chris Tanner > >> Space Systems Design Lab > >> Georgia Institute of Technology > >> christopher.tanner at gatech.edu > >> ------------------------------------------- > From kliteyn at dev.mellanox.co.il Wed Sep 10 01:25:12 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Wed, 10 Sep 2008 11:25:12 +0300 Subject: [ofa-general] ***SPAM*** opensm failure In-Reply-To: References: Message-ID: <48C78468.9060101@dev.mellanox.co.il> Hi Todd, Todd Bowman wrote: > OpenSM Rev:openib-3.0.13 Can you upgrade to OFED 1.3.1? We had some bug that was causing opensm to drop the wrong transactions, and the errors in your log could be caused by that. 
The bug was fixed in OFED 1.3 -- Yevgeny > The opensm segfaulted during an initialization that seems to have been > the result of a link state trap (type 1 num12) > > > 09:49:51 914967 [41001960] -> __osm_trap_rcv_process_ > request: Received Generic Notice type:0x01 num:128 Producer:2 from > LID:0x011A TID:0x00000000000016cc > 09:49:51 948014 [41001960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x011A > GID:0xfe80000000000000,0x0008f104003f0ab5 > 09:49:51 948477 [41802960] -> osm_report_notice: Reporting Generic > Notice type:3 num:67 from LID:0x00FD > GID:0xfe80000000000000,0x0002c902002064ad > 09:49:51 948497 [41802960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x00FD > GID:0xfe80000000000000,0x0002c902002064ad > 09:49:51 948502 [41802960] -> __osm_drop_mgr_remove_port: Removed port > with GUID:0x0002c90200207801 LID range [0x89,0x89] of node:n1008 > 09:49:51 948519 [41802960] -> osm_report_notice: Reporting Generic > Notice type:3 num:67 from LID:0x00FD > GID:0xfe80000000000000,0x0002c902002064ad > 09:49:51 948529 [41802960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x00FD > GID:0xfe80000000000000,0x0002c902002064ad > ... > ... > ... 
> > 09:49:51 962126 [41802960] -> __osm_drop_mgr_remove_port: Removed port > with GUID:0x0002c902002064ad LID range [0xFD,0xFD] of node:hn HCA-1 > 09:49:52 044097 [41802960] -> __osm_lid_mgr_process_our_sm_node: ERR > 0308: Can't acquire SM's port object, GUID 0x0002c902002064ad > 09:49:52 098558 [41001960] -> __osm_state_mgr_signal_error: ERR 3303: > Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state > OSM_SM_STATE_SET_SUBNET_UCAST_LIDS_WAIT > 09:49:52 098917 [41001960] -> __osm_state_mgr_check_tbl_consistency: ERR > 3322: lid 0x6E is wrongly assigned to port 0x0008f104003f2cdb in > port_lid_tbl > 09:49:52 098936 [41001960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x00FD > GID:0xfe80000000000000,0x0002c902002064ad > 09:49:52 098944 [41001960] -> __osm_state_mgr_report_new_ports: > Discovered new port with GUID:0x0008f104003f2cdb LID range [0x0,0x0] of > node:ISR9288/ISR9096 Voltaire sLB-24 > 09:49:52 098957 [41001960] -> osm_ucast_mgr_process: null (min-hop) > tables configured on all switches > 09:49:52 098992 [41001960] -> __osm_ucast_mgr_process_port: ERR 3A04: > Port 0x8f104003f2cdb has LID 0. An initialization error occurred. > Ignoring port > 09:49:52 103405 [41802960] -> __osm_state_mgr_signal_error: ERR 3303: > Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state > OSM_SM_STATE_SET_LINK_PORTS_WAIT > 09:49:52 103626 [41001960] -> __osm_state_mgr_signal_error: ERR 3303: > Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state > OSM_SM_STATE_SET_LINK_PORTS_WAIT > 09:49:52 103856 [41001960] -> __osm_state_mgr_signal_error: ERR 3303: > Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state > OSM_SM_STATE_SET_LINK_PORTS_WAIT > 09:49:52 104077 [41802960] -> __osm_state_mgr_signal_error: ERR 3303: > Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state > OSM_SM_STATE_SET_LINK_PORTS_WAIT > ... > ... > ... > > > 1) Why does the link down trap, start the long chain of > __osm_drop_mgr_remove_port? 
> > 2) Which of the errors may have caused the segfault? > > > > Thanks, > Todd > > > ------------------------------------------------------------------------ > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From yossi.openib at gmail.com Wed Sep 10 02:18:07 2008 From: yossi.openib at gmail.com (Yossi Etigin) Date: Wed, 10 Sep 2008 12:18:07 +0300 Subject: [ofa-general] Re: [PATCH] ipoib: send creation parameters when doing send-only join In-Reply-To: References: <48C6A9C1.5070108@gmail.com> Message-ID: <48C790CF.4050505@gmail.com> Roland Dreier wrote: > > IBA states for a MC group to be present there must be at least one > > "full" member (sender and receiver). So it's not only just senders > > (SendOnlyNonMembers) but also only just receivers (NonMembers) which > > won't cause group creation. > > Sure, but the same question still applies. More pedantically, if there > are only non-members and send-only non-members of a group, why would we > expect it to be created? ipoib senders are FullMembers, so the SM will try to create the group even if there are only ipoib senders. > > The patch in question wouldn't even work if we actually used the > send-only join status in IPoIB. > > - R. > But we don't, so it's required. Please see bug #1153 for case description.
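The rule quoted above (a multicast group is only present while it has at least one full member) can be sketched in C as follows. This is a minimal illustration only, not code from IPoIB or OpenSM: the enum and the helper join_may_create_group() are hypothetical names, and the bit positions are an assumption taken from the IBA MCMemberRecord JoinState layout.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * JoinState bits. The values are assumed from the IBA MCMemberRecord
 * definition; the identifiers are hypothetical and exist only for
 * this sketch.
 */
enum mc_join_state {
	MC_JOIN_FULL_MEMBER          = 1 << 0, /* sender and receiver */
	MC_JOIN_NON_MEMBER           = 1 << 1, /* receiver only */
	MC_JOIN_SEND_ONLY_NON_MEMBER = 1 << 2, /* sender only */
};

/*
 * Hypothetical helper: a join request can cause a missing group to be
 * created only if it asks for full membership.
 */
static bool join_may_create_group(uint8_t join_state)
{
	return (join_state & MC_JOIN_FULL_MEMBER) != 0;
}
```

Under these assumptions, a join that sets only the NonMember or SendOnlyNonMember bits never creates a group, which is why it matters that ipoib senders currently join as FullMembers.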
From vlad at lists.openfabrics.org Wed Sep 10 03:09:31 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Wed, 10 Sep 2008 03:09:31 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080910-0200 daily build status Message-ID: <20080910100931.A84D0E60D74@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed 
on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.24 Failed: From Sumit.Gaur at Sun.COM Wed Sep 10 03:32:28 2008 From: Sumit.Gaur at Sun.COM (Sumit Gaur - Sun Microsystem) Date: Wed, 10 Sep 2008 16:02:28 +0530 Subject: [ofa-general] upgrade from 1.2.5* to 1.3.1 In-Reply-To: References: <20080905015440.9C33FE60D8F@openfabrics.org> <48C13182.3020500@Sun.COM> Message-ID: <48C7A23C.8070209@Sun.COM> Hi Hal, I did some more debugging and found that only requests with a hop count of 1 or more fail, with recv packet status 110. I searched for this status in error.h but found no value for it. Any ideas? sumit Hal Rosenstock wrote: > On Fri, Sep 5, 2008 at 9:17 AM, Sumit Gaur - Sun Microsystem > wrote: > >>Hi >>I have upgraded my OFED version from 1.2.5* to 1.3.1, Now application could >>not communicate with OFED libraries using umad_send and umad_recv function >>call for IB_SMI_DIRECT_CLASS (with DR path) requests. Is there any major change in umad lib >>for such requests. Any help or info is appreciated. > > > What kernel is being used ? On what machine architecture are you > running ? Is it perhaps big endian ? I think there was a change that > could affect those machines at a minimum.
> > -- Hal > > >>sumit >> >>_______________________________________________ >>general mailing list >>general at lists.openfabrics.org >>http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >>To unsubscribe, please visit >>http://openib.org/mailman/listinfo/openib-general >> From hal.rosenstock at gmail.com Wed Sep 10 06:07:06 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Wed, 10 Sep 2008 09:07:06 -0400 Subject: [ofa-general] upgrade from 1.2.5* to 1.3.1 In-Reply-To: <48C7A23C.8070209@Sun.COM> References: <20080905015440.9C33FE60D8F@openfabrics.org> <48C13182.3020500@Sun.COM> <48C7A23C.8070209@Sun.COM> Message-ID: Hi Sumit, On Wed, Sep 10, 2008 at 6:32 AM, Sumit Gaur - Sun Microsystem wrote: > Hi Hal, > I did some more debugging and find that only request with hopcount 1 or more > are failing with recv packet status 110. I search for this status in error.h > but find no value for it. Any idea ? 110 is ETIMEDOUT -- Hal > sumit > > Hal Rosenstock wrote: >> >> On Fri, Sep 5, 2008 at 9:17 AM, Sumit Gaur - Sun Microsystem >> wrote: >> >>> Hi >>> I have upgraded my OFED version from 1.2.5* to 1.3.1, Now application >>> could >>> not communicate with OFED libraries using umad_send and umad_recv >>> function >>> call for IB_SMI_DIRECT_CLASS (with DR path) requests. Is there any major >>> change in umad lib >>> for such requests. Any help or info is appreciated. >> >> >> What kernel is being used ? On what machine architecture are you >> running ? Is it perhaps big endian ? I think there was a change that >> could affect those machines at a minimum. 
>> >> -- Hal >> >> >>> sumit >>> >>> _______________________________________________ >>> general mailing list >>> general at lists.openfabrics.org >>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >>> >>> To unsubscribe, please visit >>> http://openib.org/mailman/listinfo/openib-general >>> > From halr at obsidianresearch.com Wed Sep 10 06:19:20 2008 From: halr at obsidianresearch.com (Hal Rosenstock) Date: Wed, 10 Sep 2008 07:19:20 -0600 Subject: [ofa-general] [PATCH][TRIVIAL]osm_(helper trap_rcv).c: Change output format of notice type to unsigned decimal Message-ID: <48C7C958.6080100@obsidianresearch.com> Sasha, Attached is a trivial patch to modify the output format of notice type to unsigned decimal. -- Hal -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: patch-notice-type1 URL: From eli at dev.mellanox.co.il Wed Sep 10 06:51:16 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Wed, 10 Sep 2008 16:51:16 +0300 Subject: [ofa-general] [PATCH] ipoib: defer skb_orphan() until irqs enabled In-Reply-To: References: <20080909145435.GO2316@sgi.com> Message-ID: <20080910135116.GB26881@mtls03> On Tue, Sep 09, 2008 at 02:32:44PM -0700, Roland Dreier wrote: > By the way, looking at this stuff again, it seems we have (a possibly > quite unlikely) race where a send can complete before the xmit method > finishes, and we end up running skb_orphan on an skb that another > context has already freed. I'll have to think about how we can fix > that -- but any good ideas are appreciated... > We can check if there are outstanding WRs after poll_tx is called. If there are no outstanding WRs, it means that the SKB has been freed. If there are outstanding WRs, it means that the last post has not been freed so we can call skb_orphan(). 
Like the following patch (on top of Arthur's): diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 711a3ac..332526a 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -532,7 +532,7 @@ int ipoib_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_tx_buf *tx_req; int hlen; void *phead; - int ret = 1; /* assume the worst */ + int sent; if (skb_is_gso(skb)) { hlen = skb_transport_offset(skb) + tcp_hdrlen(skb); @@ -594,7 +594,7 @@ int ipoib_send(struct net_device *dev, struct sk_buff *skb, --priv->tx_outstanding; ipoib_dma_unmap_tx(priv->ca, tx_req); dev_kfree_skb_any(skb); - ret = 1; + sent = 0; if (netif_queue_stopped(dev)) netif_wake_queue(dev); } else { @@ -602,14 +602,14 @@ int ipoib_send(struct net_device *dev, struct sk_buff *skb, address->last_send = priv->tx_head; ++priv->tx_head; - ret = 0; + sent = 1; } if (unlikely(priv->tx_outstanding > MAX_SEND_CQE)) while (poll_tx(priv)) ; /* nothing */ - return ret; + return !(sent && priv->tx_outstanding); } static void __ipoib_reap_ah(struct net_device *dev) From christopher.tanner at gatech.edu Wed Sep 10 06:59:45 2008 From: christopher.tanner at gatech.edu (Christopher Tanner) Date: Wed, 10 Sep 2008 09:59:45 -0400 Subject: [ofa-general] Compiled IB packages In-Reply-To: <1221031735.6948.12.camel@vlad-laptop> References: <0709481C-38BC-4598-870F-44FE8AE44FCE@gatech.edu> <48C6904E.1020606@mellanox.co.il> <1221031735.6948.12.camel@vlad-laptop> Message-ID: <94325E85-9403-4264-A4FE-90A567A8655B@gatech.edu> Vladimir - Good catch on the linux headers version - I fixed that now. The problem persisted after fixing the headers... but I finally figured out what the issues were. On the configure line: a) the --kernel-sources option needs the path to the linux HEADERS (linux-headers-), not the linux SOURCE (linux-source-). Terminology there is confusing... 
b) If I didn't specify anything for the --modules-dir option, it defaults to /lib/modules/2.6.24-16-server/updates. I don't know what the 'updates' gets appended onto the end, but that is not correct. So I had to specify --modules-dir=/lib/modules/2.6.24-16-server It compiled and installed just fine! My final question - how do I install the kernel modules on the rest of the nodes? The source was compiled in the /home directory, which is shared to all nodes via NFS. However, the kernel headers are NOT shared to the rest of the nodes. Do you recommend I: a) Install the linux headers on all of the nodes and execute 'make install' on all nodes b) Look at where the modules installed to (from the make install output) and copy the files manually Thanks! ------------------------------------------- Chris Tanner Space Systems Design Lab Georgia Institute of Technology christopher.tanner at gatech.edu ------------------------------------------- On Sep 10, 2008, at 3:28 AM, Vladimir Sokolovsky wrote: > Hi, >> From the log file, I see the mismatch between the sources you are > passing to configure command and autoconf.h/auto.conf below: > > /usr/src/linux-headers-2.6.24-19-generic/include/linux/autoconf.h > /usr/src/linux-headers-2.6.24-19-generic/include/config/auto.conf > >> From the log file: > Kernel version: 2.6.24-16-server > Modules directory: //lib/modules/2.6.24-16-server/updates > Kernel sources: /usr/src/linux-source-2.6.24 > > Check that you have corresponding (matching the running kernel) > linux-headers package installed and then you don't have to pass > --kernel-sources and --kernel parameters to the configure script. > > E.g. > for kernel 2.6.24-19-generic it is linux-headers-2.6.24-19-generic > > Regards, > Vladimir > > On Tue, 2008-09-09 at 14:53 -0400, Christopher Tanner wrote: >> Thanks Vladimir - very helpful. However, I'm running into a problem >> with compiling the ofa package. 
First, I had to specify the source >> location on the command line (Ubuntu puts it in a different place >> than >> RedHat or SUSE): >> >> $ ./configure --kernel-sources=/usr/src/linux-source-2.6.24 ... >> (other >> stuff) >> >> I'm getting this error: >> >> ERROR: Kernel configuration is invalid. >> include/linux/autoconf.h or include/config/auto.conf are >> missing. >> Run 'make oldconfig && make prepare' on kernel src to fix >> it. >> >> This is confusing b/c both of those files exist. >> $ locate autoconf.h >> /usr/src/linux-headers-2.6.24-19-generic/include/linux/autoconf.h >> >> $ locate auto.conf >> /usr/src/linux-headers-2.6.24-19-generic/include/config/auto.conf >> >> There's a whole bunch more errors that I assume spawn because of this >> initial error. The output from 'make' is attached (it's pretty long). >> Let me know what you think. Thanks! >> >> ------------------------------------------- >> Chris Tanner >> Space Systems Design Lab >> Georgia Institute of Technology >> christopher.tanner at gatech.edu >> ------------------------------------------- >> >> >> >> On Sep 9, 2008, at 11:03 AM, Vladimir Sokolovsky wrote: >> >>> Christopher Tanner wrote: >>>> I am setting up a 16-node (homogeneous) cluster running Ubuntu 8.04 >>>> server with Mellanox Infiniband cards. I downloaded (from the >>>> OpenFabrics website), compiled, and installed the following IB >>>> packages on the master node into the /usr/local/lib directory. >>>> The / >>>> usr/local directory is being shared to all of the nodes via NFS. >>>> All packages seemed to compile and install fine. >>>> libibverbs >>>> librdmacm >>>> libibcm >>>> libipathverbs >>>> dapl >>>> compat-dapl >>>> libmlx4 >>>> libmthca >>>> libcxgb3 >>>> libibcommon >>>> libibumad >>>> libibmad >>>> opensm >>>> infiniband-diags >>>> I have a few questions: >>>> a) Do I need to run 'make install' on each node or just the master >>>> node? All of the libraries in /usr/local/lib are visible to all >>>> nodes... 
Stated another way, does 'make install' put files >>>> elsewhere beside the /usr/local/lib directory? Does it alter OS >>>> configuration files to tell it to look for certain files in /usr/ >>>> local/lib? >>> >>> No, all the packages above will put their files under /usr/local >>> >>>> b) I know I need to load the IB kernel modules (mlx4_core, >>>> mlx4_ib, rdma_ucm, ib_core, ib_mad, ib_mthca, ib_umad, ib_uverbs) >>>> in order for the IB cards to work. Are these compiled and installed >>>> with the above packages? Where does the kernel know where to look >>>> for modules? (Sorry, this question is very similar to the first >>>> one). >>> >>> The packages above are user space libraries/binaries. To install >>> kernel >>> modules you should download the latest version of the ofa_1_4_kernel >>> tgz file from: >>> >>> http://www.openfabrics.org/downloads/ofa_1_4_kernel/ >>> To install, run: >>> ./configure --with-core-mod --with-user_mad-mod --with-user_access- >>> mod --with-addr_trans-mod --with-mthca-mod --with-mthca_debug-mod -- >>> with-mlx4-mod --with-mlx4_en-mod --with-mlx4_debug-mod --with-cxgb3- >>> mod --with-ehca-mod --with-ipoib-mod --with-ipoib_debug-mod (... , >>> see --help) >>> make >>> make install >>> >>> >>>> c) The OFED software stack contains some stuff that isn't available >>>> for source download (e.g. ib-bonding, ibsim, libsdp). Are these >>>> necessary for the IB network to operate correctly? Since I'm >>>> running Ubuntu, obviously the src.rpm file won't work... >>> >>> All OFED tgz files that are available under: >>> http://www.openfabrics.org/~vlad/ofed_1_4/SOURCES/ >>> >>> ib-bonding source RPM can be downloaded from (you can open it to get >>> tgz file using cpio, if you need): >>> http://www.openfabrics.org/~monis/ofed_1_4/ >>> >>> This packages are not necessary for the IB network to operate >>> correctly, but >>> it depends on what are you planning to do. >>> >>> Regards, >>> Vladimir >>> >>>> Thanks to all for you help. 
Previous responses regarding issues >>>> with OpenSM worked great. >>>> ------------------------------------------- >>>> Chris Tanner >>>> Space Systems Design Lab >>>> Georgia Institute of Technology >>>> christopher.tanner at gatech.edu >>>> ------------------------------------------- >> From vlad at mellanox.co.il Wed Sep 10 07:14:18 2008 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Wed, 10 Sep 2008 17:14:18 +0300 Subject: [ofa-general] Compiled IB packages In-Reply-To: <94325E85-9403-4264-A4FE-90A567A8655B@gatech.edu> References: <0709481C-38BC-4598-870F-44FE8AE44FCE@gatech.edu> <48C6904E.1020606@mellanox.co.il> <1221031735.6948.12.camel@vlad-laptop> <94325E85-9403-4264-A4FE-90A567A8655B@gatech.edu> Message-ID: <48C7D63A.8090005@mellanox.co.il> Christopher Tanner wrote: > Vladimir - > > Good catch on the linux headers version - I fixed that now. The problem > persisted after fixing the headers... but I finally figured out what the > issues were. On the configure line: > > a) the --kernel-sources option needs the path to the linux HEADERS > (linux-headers-), not the linux SOURCE (linux-source-). > Terminology there is confusing... > If you are compiling for the running kernel, configure will find the kernel sources through the /lib/modules/`uname -r`/build link, so you don't have to pass '--kernel-sources' and '--kernel'. > b) If I didn't specify anything for the --modules-dir option, it > defaults to /lib/modules/2.6.24-16-server/updates. I don't know what the > 'updates' gets appended onto the end, but that is not correct. So I had > to specify --modules-dir=/lib/modules/2.6.24-16-server Why do you think that 'updates' is wrong? modprobe handles the /lib/modules/`uname -r`/updates directory as follows: if a kernel module with the same name is present under both /lib/modules/`uname -r`/kernel and /lib/modules/`uname -r`/updates, the module from updates will be loaded. > > It compiled and installed just fine!
> > My final question - how do I install the kernel modules on the rest of > the nodes? The source was compiled in the /home directory, which is > shared to all nodes via NFS. However, the kernel headers are NOT shared > to the rest of the nodes. Do you recommend I: > > a) Install the linux headers on all of the nodes and execute 'make > install' on all nodes > b) Look at where the modules installed to (from the make install output) > and copy the files manually > Both options are good. Note, if you use option b) then you need to run "depmod" after copying kernel modules. Regards, Vladimir > Thanks! > > ------------------------------------------- > Chris Tanner > Space Systems Design Lab > Georgia Institute of Technology > christopher.tanner at gatech.edu > ------------------------------------------- > > > > On Sep 10, 2008, at 3:28 AM, Vladimir Sokolovsky wrote: > >> Hi, >>> From the log file, I see the mismatch between the sources you are >> passing to configure command and autoconf.h/auto.conf below: >> >> /usr/src/linux-headers-2.6.24-19-generic/include/linux/autoconf.h >> /usr/src/linux-headers-2.6.24-19-generic/include/config/auto.conf >> >>> From the log file: >> Kernel version: 2.6.24-16-server >> Modules directory: //lib/modules/2.6.24-16-server/updates >> Kernel sources: /usr/src/linux-source-2.6.24 >> >> Check that you have corresponding (matching the running kernel) >> linux-headers package installed and then you don't have to pass >> --kernel-sources and --kernel parameters to the configure script. >> >> E.g. >> for kernel 2.6.24-19-generic it is linux-headers-2.6.24-19-generic >> >> Regards, >> Vladimir >> >> On Tue, 2008-09-09 at 14:53 -0400, Christopher Tanner wrote: >>> Thanks Vladimir - very helpful. However, I'm running into a problem >>> with compiling the ofa package. 
First, I had to specify the source >>> location on the command line (Ubuntu puts it in a different place than >>> RedHat or SUSE): >>> >>> $ ./configure --kernel-sources=/usr/src/linux-source-2.6.24 ... (other >>> stuff) >>> >>> I'm getting this error: >>> >>> ERROR: Kernel configuration is invalid. >>> include/linux/autoconf.h or include/config/auto.conf are >>> missing. >>> Run 'make oldconfig && make prepare' on kernel src to fix it. >>> >>> This is confusing b/c both of those files exist. >>> $ locate autoconf.h >>> /usr/src/linux-headers-2.6.24-19-generic/include/linux/autoconf.h >>> >>> $ locate auto.conf >>> /usr/src/linux-headers-2.6.24-19-generic/include/config/auto.conf >>> >>> There's a whole bunch more errors that I assume spawn because of this >>> initial error. The output from 'make' is attached (it's pretty long). >>> Let me know what you think. Thanks! >>> >>> ------------------------------------------- >>> Chris Tanner >>> Space Systems Design Lab >>> Georgia Institute of Technology >>> christopher.tanner at gatech.edu >>> ------------------------------------------- >>> >>> >>> >>> On Sep 9, 2008, at 11:03 AM, Vladimir Sokolovsky wrote: >>> >>>> Christopher Tanner wrote: >>>>> I am setting up a 16-node (homogeneous) cluster running Ubuntu 8.04 >>>>> server with Mellanox Infiniband cards. I downloaded (from the >>>>> OpenFabrics website), compiled, and installed the following IB >>>>> packages on the master node into the /usr/local/lib directory. The / >>>>> usr/local directory is being shared to all of the nodes via NFS. >>>>> All packages seemed to compile and install fine. >>>>> libibverbs >>>>> librdmacm >>>>> libibcm >>>>> libipathverbs >>>>> dapl >>>>> compat-dapl >>>>> libmlx4 >>>>> libmthca >>>>> libcxgb3 >>>>> libibcommon >>>>> libibumad >>>>> libibmad >>>>> opensm >>>>> infiniband-diags >>>>> I have a few questions: >>>>> a) Do I need to run 'make install' on each node or just the master >>>>> node? 
All of the libraries in /usr/local/lib are visible to all >>>>> nodes... Stated another way, does 'make install' put files >>>>> elsewhere beside the /usr/local/lib directory? Does it alter OS >>>>> configuration files to tell it to look for certain files in /usr/ >>>>> local/lib? >>>> >>>> No, all the packages above will put their files under /usr/local >>>> >>>>> b) I know I need to load the IB kernel modules (mlx4_core, >>>>> mlx4_ib, rdma_ucm, ib_core, ib_mad, ib_mthca, ib_umad, ib_uverbs) >>>>> in order for the IB cards to work. Are these compiled and installed >>>>> with the above packages? Where does the kernel know where to look >>>>> for modules? (Sorry, this question is very similar to the first one). >>>> >>>> The packages above are user space libraries/binaries. To install >>>> kernel >>>> modules you should download the latest version of the ofa_1_4_kernel >>>> tgz file from: >>>> >>>> http://www.openfabrics.org/downloads/ofa_1_4_kernel/ >>>> To install, run: >>>> ./configure --with-core-mod --with-user_mad-mod --with-user_access- >>>> mod --with-addr_trans-mod --with-mthca-mod --with-mthca_debug-mod -- >>>> with-mlx4-mod --with-mlx4_en-mod --with-mlx4_debug-mod --with-cxgb3- >>>> mod --with-ehca-mod --with-ipoib-mod --with-ipoib_debug-mod (... , >>>> see --help) >>>> make >>>> make install >>>> >>>> >>>>> c) The OFED software stack contains some stuff that isn't available >>>>> for source download (e.g. ib-bonding, ibsim, libsdp). Are these >>>>> necessary for the IB network to operate correctly? Since I'm >>>>> running Ubuntu, obviously the src.rpm file won't work... 
>>>> >>>> All OFED tgz files that are available under: >>>> http://www.openfabrics.org/~vlad/ofed_1_4/SOURCES/ >>>> >>>> ib-bonding source RPM can be downloaded from (you can open it to get >>>> tgz file using cpio, if you need): >>>> http://www.openfabrics.org/~monis/ofed_1_4/ >>>> >>>> This packages are not necessary for the IB network to operate >>>> correctly, but >>>> it depends on what are you planning to do. >>>> >>>> Regards, >>>> Vladimir >>>> >>>>> Thanks to all for you help. Previous responses regarding issues >>>>> with OpenSM worked great. >>>>> ------------------------------------------- >>>>> Chris Tanner >>>>> Space Systems Design Lab >>>>> Georgia Institute of Technology >>>>> christopher.tanner at gatech.edu >>>>> ------------------------------------------- >>> > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general From yossi.openib at gmail.com Wed Sep 10 07:32:27 2008 From: yossi.openib at gmail.com (Yossi Etigin) Date: Wed, 10 Sep 2008 17:32:27 +0300 Subject: ***SPAM*** Fwd: [ofa-general] [PATCH] ipoib: fix hang while bringing down uninitialized interface Message-ID: <48C7DA7B.3050706@gmail.com> Roland, Can you comment on this? It fixes a soft lockup during ipoib stop. -------- Original Message -------- Subject: [ofa-general] ***SPAM*** [PATCH] ipoib: fix hang while bringing down uninitialized interface Date: Fri, 05 Sep 2008 18:00:46 +0300 From: Yossi Etigin To: Roland Dreier CC: Olga Shern , general list Fix bug #1172: If a pkey for an interface is not found during initialization, then poll_timer is left uninitialized. When the device is brought down, ipoib tries to del_timer_sync() it. This call hangs in an infinite loop in lock_timer_base(), because timer_base is NULL. We should check whether the timer was really initialized. 
Signed-off-by: Yossi Etigin -- diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 66cafa2..3bbf46d 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -850,7 +850,10 @@ int ipoib_ib_dev_stop(struct net_device *dev, int flush) ipoib_dbg(priv, "All sends and receives done.\n"); timeout: - del_timer_sync(&priv->poll_timer); + /* Make sure the timer is initialized */ + if (priv->poll_timer.function) + del_timer_sync(&priv->poll_timer); + qp_attr.qp_state = IB_QPS_RESET; if (ib_modify_qp(priv->qp, &qp_attr, IB_QP_STATE)) ipoib_warn(priv, "Failed to modify QP to RESET state\n"); --Yossi _______________________________________________ general mailing list general at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From alexs at linux.vnet.ibm.com Wed Sep 10 07:23:56 2008 From: alexs at linux.vnet.ibm.com (Alexander Schmidt) Date: Wed, 10 Sep 2008 16:23:56 +0200 Subject: [ofa-general] [PATCH] ib/ehca: add flush CQE generation Message-ID: <20080910162356.7294fe87@BL3D1974.boeblingen.de.ibm.com> When a QP goes into error state, it is required that flush CQEs are delivered to the application for any outstanding work requests. eHCA does not do this in hardware, so this patch adds software flush CQE generation to the ehca driver. Whenever a QP gets into error state, it is added to the QP error list of its respective CQ. If the error QP list of a CQ is not empty, poll_cq() generates flush CQEs before polling the actual CQ. Signed-off-by: Alexander Schmidt --- Applies on top of 2.6.27-rc3, please consider this for 2.6.28. 
drivers/infiniband/hw/ehca/ehca_classes.h | 14 + drivers/infiniband/hw/ehca/ehca_cq.c | 3 drivers/infiniband/hw/ehca/ehca_iverbs.h | 2 drivers/infiniband/hw/ehca/ehca_qp.c | 225 ++++++++++++++++++++++++++++-- drivers/infiniband/hw/ehca/ehca_reqs.c | 211 ++++++++++++++++++++++++---- 5 files changed, 412 insertions(+), 43 deletions(-) --- infiniband.git.orig/drivers/infiniband/hw/ehca/ehca_classes.h +++ infiniband.git/drivers/infiniband/hw/ehca/ehca_classes.h @@ -164,6 +164,13 @@ struct ehca_qmap_entry { u16 reported; }; +struct ehca_queue_map { + struct ehca_qmap_entry *map; + unsigned int entries; + unsigned int tail; + unsigned int left_to_poll; +}; + struct ehca_qp { union { struct ib_qp ib_qp; @@ -173,8 +180,9 @@ struct ehca_qp { enum ehca_ext_qp_type ext_type; enum ib_qp_state state; struct ipz_queue ipz_squeue; - struct ehca_qmap_entry *sq_map; + struct ehca_queue_map sq_map; struct ipz_queue ipz_rqueue; + struct ehca_queue_map rq_map; struct h_galpas galpas; u32 qkey; u32 real_qp_num; @@ -204,6 +212,8 @@ struct ehca_qp { atomic_t nr_events; /* events seen */ wait_queue_head_t wait_completion; int mig_armed; + struct list_head sq_err_node; + struct list_head rq_err_node; }; #define IS_SRQ(qp) (qp->ext_type == EQPT_SRQ) @@ -233,6 +243,8 @@ struct ehca_cq { /* mmap counter for resources mapped into user space */ u32 mm_count_queue; u32 mm_count_galpa; + struct list_head sqp_err_list; + struct list_head rqp_err_list; }; enum ehca_mr_flag { --- infiniband.git.orig/drivers/infiniband/hw/ehca/ehca_reqs.c +++ infiniband.git/drivers/infiniband/hw/ehca/ehca_reqs.c @@ -53,9 +53,25 @@ /* in RC traffic, insert an empty RDMA READ every this many packets */ #define ACK_CIRC_THRESHOLD 2000000 +static u64 replace_wr_id(u64 wr_id, u16 idx) +{ + u64 ret; + + ret = wr_id & ~QMAP_IDX_MASK; + ret |= idx & QMAP_IDX_MASK; + + return ret; +} + +static u16 get_app_wr_id(u64 wr_id) +{ + return wr_id & QMAP_IDX_MASK; +} + static inline int ehca_write_rwqe(struct ipz_queue *ipz_rqueue, 
struct ehca_wqe *wqe_p, - struct ib_recv_wr *recv_wr) + struct ib_recv_wr *recv_wr, + u32 rq_map_idx) { u8 cnt_ds; if (unlikely((recv_wr->num_sge < 0) || @@ -69,7 +85,7 @@ static inline int ehca_write_rwqe(struct /* clear wqe header until sglist */ memset(wqe_p, 0, offsetof(struct ehca_wqe, u.ud_av.sg_list)); - wqe_p->work_request_id = recv_wr->wr_id; + wqe_p->work_request_id = replace_wr_id(recv_wr->wr_id, rq_map_idx); wqe_p->nr_of_data_seg = recv_wr->num_sge; for (cnt_ds = 0; cnt_ds < recv_wr->num_sge; cnt_ds++) { @@ -146,6 +162,7 @@ static inline int ehca_write_swqe(struct u64 dma_length; struct ehca_av *my_av; u32 remote_qkey = send_wr->wr.ud.remote_qkey; + struct ehca_qmap_entry *qmap_entry = &qp->sq_map.map[sq_map_idx]; if (unlikely((send_wr->num_sge < 0) || (send_wr->num_sge > qp->ipz_squeue.act_nr_of_sg))) { @@ -158,11 +175,10 @@ static inline int ehca_write_swqe(struct /* clear wqe header until sglist */ memset(wqe_p, 0, offsetof(struct ehca_wqe, u.ud_av.sg_list)); - wqe_p->work_request_id = send_wr->wr_id & ~QMAP_IDX_MASK; - wqe_p->work_request_id |= sq_map_idx & QMAP_IDX_MASK; + wqe_p->work_request_id = replace_wr_id(send_wr->wr_id, sq_map_idx); - qp->sq_map[sq_map_idx].app_wr_id = send_wr->wr_id & QMAP_IDX_MASK; - qp->sq_map[sq_map_idx].reported = 0; + qmap_entry->app_wr_id = get_app_wr_id(send_wr->wr_id); + qmap_entry->reported = 0; switch (send_wr->opcode) { case IB_WR_SEND: @@ -496,7 +512,9 @@ static int internal_post_recv(struct ehc struct ehca_wqe *wqe_p; int wqe_cnt = 0; int ret = 0; + u32 rq_map_idx; unsigned long flags; + struct ehca_qmap_entry *qmap_entry; if (unlikely(!HAS_RQ(my_qp))) { ehca_err(dev, "QP has no RQ ehca_qp=%p qp_num=%x ext_type=%d", @@ -524,8 +542,15 @@ static int internal_post_recv(struct ehc } goto post_recv_exit0; } + /* + * Get the index of the WQE in the recv queue. The same index + * is used for writing into the rq_map. 
+ */ + rq_map_idx = start_offset / my_qp->ipz_rqueue.qe_size; + /* write a RECV WQE into the QUEUE */ - ret = ehca_write_rwqe(&my_qp->ipz_rqueue, wqe_p, cur_recv_wr); + ret = ehca_write_rwqe(&my_qp->ipz_rqueue, wqe_p, cur_recv_wr, + rq_map_idx); /* * if something failed, * reset the free entry pointer to the start value @@ -540,6 +565,11 @@ static int internal_post_recv(struct ehc } goto post_recv_exit0; } + + qmap_entry = &my_qp->rq_map.map[rq_map_idx]; + qmap_entry->app_wr_id = get_app_wr_id(cur_recv_wr->wr_id); + qmap_entry->reported = 0; + wqe_cnt++; } /* eof for cur_recv_wr */ @@ -596,10 +626,12 @@ static const u8 ib_wc_opcode[255] = { /* internal function to poll one entry of cq */ static inline int ehca_poll_cq_one(struct ib_cq *cq, struct ib_wc *wc) { - int ret = 0; + int ret = 0, qmap_tail_idx; struct ehca_cq *my_cq = container_of(cq, struct ehca_cq, ib_cq); struct ehca_cqe *cqe; struct ehca_qp *my_qp; + struct ehca_qmap_entry *qmap_entry; + struct ehca_queue_map *qmap; int cqe_count = 0, is_error; repoll: @@ -674,27 +706,52 @@ repoll: goto repoll; wc->qp = &my_qp->ib_qp; - if (!(cqe->w_completion_flags & WC_SEND_RECEIVE_BIT)) { - struct ehca_qmap_entry *qmap_entry; + if (is_error) { /* - * We got a send completion and need to restore the original - * wr_id. 
+ * set left_to_poll to 0 because in error state, we will not + * get any additional CQEs */ - qmap_entry = &my_qp->sq_map[cqe->work_request_id & - QMAP_IDX_MASK]; + ehca_add_to_err_list(my_qp, 1); + my_qp->sq_map.left_to_poll = 0; - if (qmap_entry->reported) { - ehca_warn(cq->device, "Double cqe on qp_num=%#x", - my_qp->real_qp_num); - /* found a double cqe, discard it and read next one */ - goto repoll; - } - wc->wr_id = cqe->work_request_id & ~QMAP_IDX_MASK; - wc->wr_id |= qmap_entry->app_wr_id; - qmap_entry->reported = 1; - } else + if (HAS_RQ(my_qp)) + ehca_add_to_err_list(my_qp, 0); + my_qp->rq_map.left_to_poll = 0; + } + + qmap_tail_idx = get_app_wr_id(cqe->work_request_id); + if (!(cqe->w_completion_flags & WC_SEND_RECEIVE_BIT)) + /* We got a send completion. */ + qmap = &my_qp->sq_map; + else /* We got a receive completion. */ - wc->wr_id = cqe->work_request_id; + qmap = &my_qp->rq_map; + + qmap_entry = &qmap->map[qmap_tail_idx]; + if (qmap_entry->reported) { + ehca_warn(cq->device, "Double cqe on qp_num=%#x", + my_qp->real_qp_num); + /* found a double cqe, discard it and read next one */ + goto repoll; + } + + wc->wr_id = replace_wr_id(cqe->work_request_id, qmap_entry->app_wr_id); + qmap_entry->reported = 1; + + /* this is a proper completion, we need to advance the tail pointer */ + if (++qmap->tail == qmap->entries) + qmap->tail = 0; + + /* if left_to_poll is decremented to 0, add the QP to the error list */ + if (qmap->left_to_poll > 0) { + qmap->left_to_poll--; + if ((my_qp->sq_map.left_to_poll == 0) && + (my_qp->rq_map.left_to_poll == 0)) { + ehca_add_to_err_list(my_qp, 1); + if (HAS_RQ(my_qp)) + ehca_add_to_err_list(my_qp, 0); + } + } /* eval ib_wc_opcode */ wc->opcode = ib_wc_opcode[cqe->optype]-1; @@ -733,13 +790,88 @@ poll_cq_one_exit0: return ret; } +static int generate_flush_cqes(struct ehca_qp *my_qp, struct ib_cq *cq, + struct ib_wc *wc, int num_entries, + struct ipz_queue *ipz_queue, int on_sq) +{ + int nr = 0; + struct ehca_wqe *wqe; + u64 
offset; + struct ehca_queue_map *qmap; + struct ehca_qmap_entry *qmap_entry; + + if (on_sq) + qmap = &my_qp->sq_map; + else + qmap = &my_qp->rq_map; + + qmap_entry = &qmap->map[qmap->tail]; + + while ((nr < num_entries) && (qmap_entry->reported == 0)) { + /* generate flush CQE */ + memset(wc, 0, sizeof(*wc)); + + offset = qmap->tail * ipz_queue->qe_size; + wqe = (struct ehca_wqe *)ipz_qeit_calc(ipz_queue, offset); + if (!wqe) { + ehca_err(cq->device, "Invalid wqe offset=%#lx on " + "qp_num=%#x", offset, my_qp->real_qp_num); + return nr; + } + + wc->wr_id = replace_wr_id(wqe->work_request_id, + qmap_entry->app_wr_id); + + if (on_sq) { + switch (wqe->optype) { + case WQE_OPTYPE_SEND: + wc->opcode = IB_WC_SEND; + break; + case WQE_OPTYPE_RDMAWRITE: + wc->opcode = IB_WC_RDMA_WRITE; + break; + case WQE_OPTYPE_RDMAREAD: + wc->opcode = IB_WC_RDMA_READ; + break; + default: + ehca_err(cq->device, "Invalid optype=%x", + wqe->optype); + return nr; + } + } else + wc->opcode = IB_WC_RECV; + + if (wqe->wr_flag & WQE_WRFLAG_IMM_DATA_PRESENT) { + wc->ex.imm_data = wqe->immediate_data; + wc->wc_flags |= IB_WC_WITH_IMM; + } + + wc->status = IB_WC_WR_FLUSH_ERR; + + wc->qp = &my_qp->ib_qp; + + /* mark as reported and advance tail pointer */ + qmap_entry->reported = 1; + if (++qmap->tail == qmap->entries) + qmap->tail = 0; + qmap_entry = &qmap->map[qmap->tail]; + + wc++; nr++; + } + + return nr; + +} + int ehca_poll_cq(struct ib_cq *cq, int num_entries, struct ib_wc *wc) { struct ehca_cq *my_cq = container_of(cq, struct ehca_cq, ib_cq); int nr; + struct ehca_qp *err_qp; struct ib_wc *current_wc = wc; int ret = 0; unsigned long flags; + int entries_left = num_entries; if (num_entries < 1) { ehca_err(cq->device, "Invalid num_entries=%d ehca_cq=%p " @@ -749,15 +881,40 @@ int ehca_poll_cq(struct ib_cq *cq, int n } spin_lock_irqsave(&my_cq->spinlock, flags); - for (nr = 0; nr < num_entries; nr++) { + + /* generate flush cqes for send queues */ + list_for_each_entry(err_qp, 
&my_cq->sqp_err_list, sq_err_node) { + nr = generate_flush_cqes(err_qp, cq, current_wc, entries_left, + &err_qp->ipz_squeue, 1); + entries_left -= nr; + current_wc += nr; + + if (entries_left == 0) + break; + } + + /* generate flush cqes for receive queues */ + list_for_each_entry(err_qp, &my_cq->rqp_err_list, rq_err_node) { + nr = generate_flush_cqes(err_qp, cq, current_wc, entries_left, + &err_qp->ipz_rqueue, 0); + entries_left -= nr; + current_wc += nr; + + if (entries_left == 0) + break; + } + + for (nr = 0; nr < entries_left; nr++) { ret = ehca_poll_cq_one(cq, current_wc); if (ret) break; current_wc++; } /* eof for nr */ + entries_left -= nr; + spin_unlock_irqrestore(&my_cq->spinlock, flags); if (ret == -EAGAIN || !ret) - ret = nr; + ret = num_entries - entries_left; poll_cq_exit0: return ret; --- infiniband.git.orig/drivers/infiniband/hw/ehca/ehca_cq.c +++ infiniband.git/drivers/infiniband/hw/ehca/ehca_cq.c @@ -276,6 +276,9 @@ struct ib_cq *ehca_create_cq(struct ib_d for (i = 0; i < QP_HASHTAB_LEN; i++) INIT_HLIST_HEAD(&my_cq->qp_hashtab[i]); + INIT_LIST_HEAD(&my_cq->sqp_err_list); + INIT_LIST_HEAD(&my_cq->rqp_err_list); + if (context) { struct ipz_queue *ipz_queue = &my_cq->ipz_queue; struct ehca_create_cq_resp resp; --- infiniband.git.orig/drivers/infiniband/hw/ehca/ehca_qp.c +++ infiniband.git/drivers/infiniband/hw/ehca/ehca_qp.c @@ -396,6 +396,50 @@ static void ehca_determine_small_queue(s queue->is_small = (queue->page_size != 0); } +/* needs to be called with cq->spinlock held */ +void ehca_add_to_err_list(struct ehca_qp *qp, int on_sq) +{ + struct list_head *list, *node; + + /* TODO: support low latency QPs */ + if (qp->ext_type == EQPT_LLQP) + return; + + if (on_sq) { + list = &qp->send_cq->sqp_err_list; + node = &qp->sq_err_node; + } else { + list = &qp->recv_cq->rqp_err_list; + node = &qp->rq_err_node; + } + + if (list_empty(node)) + list_add_tail(node, list); + + return; +} + +static void del_from_err_list(struct ehca_cq *cq, struct list_head 
*node) +{ + unsigned long flags; + + spin_lock_irqsave(&cq->spinlock, flags); + + if (!list_empty(node)) + list_del_init(node); + + spin_unlock_irqrestore(&cq->spinlock, flags); +} + +static void reset_queue_map(struct ehca_queue_map *qmap) +{ + int i; + + qmap->tail = 0; + for (i = 0; i < qmap->entries; i++) + qmap->map[i].reported = 1; +} + /* * Create an ib_qp struct that is either a QP or an SRQ, depending on * the value of the is_srq parameter. If init_attr and srq_init_attr share @@ -407,12 +451,11 @@ static struct ehca_qp *internal_create_q struct ib_srq_init_attr *srq_init_attr, struct ib_udata *udata, int is_srq) { - struct ehca_qp *my_qp; + struct ehca_qp *my_qp, *my_srq = NULL; struct ehca_pd *my_pd = container_of(pd, struct ehca_pd, ib_pd); struct ehca_shca *shca = container_of(pd->device, struct ehca_shca, ib_device); struct ib_ucontext *context = NULL; - u32 nr_qes; u64 h_ret; int is_llqp = 0, has_srq = 0; int qp_type, max_send_sge, max_recv_sge, ret; @@ -457,8 +500,7 @@ static struct ehca_qp *internal_create_q /* handle SRQ base QPs */ if (init_attr->srq) { - struct ehca_qp *my_srq = - container_of(init_attr->srq, struct ehca_qp, ib_srq); + my_srq = container_of(init_attr->srq, struct ehca_qp, ib_srq); has_srq = 1; parms.ext_type = EQPT_SRQBASE; @@ -716,15 +758,19 @@ static struct ehca_qp *internal_create_q "and pages ret=%i", ret); goto create_qp_exit2; } - nr_qes = my_qp->ipz_squeue.queue_length / + + my_qp->sq_map.entries = my_qp->ipz_squeue.queue_length / my_qp->ipz_squeue.qe_size; - my_qp->sq_map = vmalloc(nr_qes * + my_qp->sq_map.map = vmalloc(my_qp->sq_map.entries * sizeof(struct ehca_qmap_entry)); - if (!my_qp->sq_map) { + if (!my_qp->sq_map.map) { ehca_err(pd->device, "Couldn't allocate squeue " "map ret=%i", ret); goto create_qp_exit3; } + INIT_LIST_HEAD(&my_qp->sq_err_node); + /* to avoid the generation of bogus flush CQEs */ + reset_queue_map(&my_qp->sq_map); } if (HAS_RQ(my_qp)) { @@ -736,6 +782,25 @@ static struct ehca_qp 
*internal_create_q "and pages ret=%i", ret); goto create_qp_exit4; } + + my_qp->rq_map.entries = my_qp->ipz_rqueue.queue_length / + my_qp->ipz_rqueue.qe_size; + my_qp->rq_map.map = vmalloc(my_qp->rq_map.entries * + sizeof(struct ehca_qmap_entry)); + if (!my_qp->rq_map.map) { + ehca_err(pd->device, "Couldn't allocate rqueue " + "map ret=%i", ret); + goto create_qp_exit5; + } + INIT_LIST_HEAD(&my_qp->rq_err_node); + /* to avoid the generation of bogus flush CQEs */ + reset_queue_map(&my_qp->rq_map); + } else if (init_attr->srq) { + /* this is a base QP, use the queue map of the SRQ */ + my_qp->rq_map = my_srq->rq_map; + INIT_LIST_HEAD(&my_qp->rq_err_node); + + my_qp->ipz_rqueue = my_srq->ipz_rqueue; } if (is_srq) { @@ -799,7 +864,7 @@ static struct ehca_qp *internal_create_q if (ret) { ehca_err(pd->device, "Couldn't assign qp to send_cq ret=%i", ret); - goto create_qp_exit6; + goto create_qp_exit7; } } @@ -825,25 +890,29 @@ static struct ehca_qp *internal_create_q if (ib_copy_to_udata(udata, &resp, sizeof resp)) { ehca_err(pd->device, "Copy to udata failed"); ret = -EINVAL; - goto create_qp_exit7; + goto create_qp_exit8; } } return my_qp; -create_qp_exit7: +create_qp_exit8: ehca_cq_unassign_qp(my_qp->send_cq, my_qp->real_qp_num); -create_qp_exit6: +create_qp_exit7: kfree(my_qp->mod_qp_parm); +create_qp_exit6: + if (HAS_RQ(my_qp)) + vfree(my_qp->rq_map.map); + create_qp_exit5: if (HAS_RQ(my_qp)) ipz_queue_dtor(my_pd, &my_qp->ipz_rqueue); create_qp_exit4: if (HAS_SQ(my_qp)) - vfree(my_qp->sq_map); + vfree(my_qp->sq_map.map); create_qp_exit3: if (HAS_SQ(my_qp)) @@ -1035,6 +1104,101 @@ static int prepare_sqe_rts(struct ehca_q return 0; } +static int calc_left_cqes(u64 wqe_p, struct ipz_queue *ipz_queue, + struct ehca_queue_map *qmap) +{ + void *wqe_v; + u64 q_ofs; + u32 wqe_idx; + + /* convert real to abs address */ + wqe_p = wqe_p & (~(1UL << 63)); + + wqe_v = abs_to_virt(wqe_p); + + if (ipz_queue_abs_to_offset(ipz_queue, wqe_p, &q_ofs)) { + ehca_gen_err("Invalid offset 
for calculating left cqes " + "wqe_p=%#lx wqe_v=%p\n", wqe_p, wqe_v); + return -EFAULT; + } + + wqe_idx = q_ofs / ipz_queue->qe_size; + if (wqe_idx < qmap->tail) + qmap->left_to_poll = (qmap->entries - qmap->tail) + wqe_idx; + else + qmap->left_to_poll = wqe_idx - qmap->tail; + + return 0; +} + +static int check_for_left_cqes(struct ehca_qp *my_qp, struct ehca_shca *shca) +{ + u64 h_ret; + void *send_wqe_p, *recv_wqe_p; + int ret; + unsigned long flags; + int qp_num = my_qp->ib_qp.qp_num; + + /* this hcall is not supported on base QPs */ + if (my_qp->ext_type != EQPT_SRQBASE) { + /* get send and receive wqe pointer */ + h_ret = hipz_h_disable_and_get_wqe(shca->ipz_hca_handle, + my_qp->ipz_qp_handle, &my_qp->pf, + &send_wqe_p, &recv_wqe_p, 4); + if (h_ret != H_SUCCESS) { + ehca_err(&shca->ib_device, "disable_and_get_wqe() " + "failed ehca_qp=%p qp_num=%x h_ret=%li", + my_qp, qp_num, h_ret); + return ehca2ib_return_code(h_ret); + } + + /* + * acquire lock to ensure that nobody is polling the cq which + * could mean that the qmap->tail pointer is in an + * inconsistent state. 
+ */ + spin_lock_irqsave(&my_qp->send_cq->spinlock, flags); + ret = calc_left_cqes((u64)send_wqe_p, &my_qp->ipz_squeue, + &my_qp->sq_map); + spin_unlock_irqrestore(&my_qp->send_cq->spinlock, flags); + if (ret) + return ret; + + + spin_lock_irqsave(&my_qp->recv_cq->spinlock, flags); + ret = calc_left_cqes((u64)recv_wqe_p, &my_qp->ipz_rqueue, + &my_qp->rq_map); + spin_unlock_irqrestore(&my_qp->recv_cq->spinlock, flags); + if (ret) + return ret; + } else { + spin_lock_irqsave(&my_qp->send_cq->spinlock, flags); + my_qp->sq_map.left_to_poll = 0; + spin_unlock_irqrestore(&my_qp->send_cq->spinlock, flags); + + spin_lock_irqsave(&my_qp->recv_cq->spinlock, flags); + my_qp->rq_map.left_to_poll = 0; + spin_unlock_irqrestore(&my_qp->recv_cq->spinlock, flags); + } + + /* this assures flush cqes being generated only for pending wqes */ + if ((my_qp->sq_map.left_to_poll == 0) && + (my_qp->rq_map.left_to_poll == 0)) { + spin_lock_irqsave(&my_qp->send_cq->spinlock, flags); + ehca_add_to_err_list(my_qp, 1); + spin_unlock_irqrestore(&my_qp->send_cq->spinlock, flags); + + if (HAS_RQ(my_qp)) { + spin_lock_irqsave(&my_qp->recv_cq->spinlock, flags); + ehca_add_to_err_list(my_qp, 0); + spin_unlock_irqrestore(&my_qp->recv_cq->spinlock, + flags); + } + } + + return 0; +} + /* * internal_modify_qp with circumvention to handle aqp0 properly * smi_reset2init indicates if this is an internal reset-to-init-call for @@ -1539,10 +1703,27 @@ static int internal_modify_qp(struct ib_ goto modify_qp_exit2; } } + if ((qp_new_state == IB_QPS_ERR) && (qp_cur_state != IB_QPS_ERR)) { + ret = check_for_left_cqes(my_qp, shca); + if (ret) + goto modify_qp_exit2; + } if (statetrans == IB_QPST_ANY2RESET) { ipz_qeit_reset(&my_qp->ipz_rqueue); ipz_qeit_reset(&my_qp->ipz_squeue); + + if (qp_cur_state == IB_QPS_ERR) { + del_from_err_list(my_qp->send_cq, &my_qp->sq_err_node); + + if (HAS_RQ(my_qp)) + del_from_err_list(my_qp->recv_cq, + &my_qp->rq_err_node); + } + reset_queue_map(&my_qp->sq_map); + + if 
(HAS_RQ(my_qp)) + reset_queue_map(&my_qp->rq_map); } if (attr_mask & IB_QP_QKEY) @@ -1958,6 +2139,16 @@ static int internal_destroy_qp(struct ib idr_remove(&ehca_qp_idr, my_qp->token); write_unlock_irqrestore(&ehca_qp_idr_lock, flags); + /* + * SRQs will never get into an error list and do not have a recv_cq, + * so we need to skip them here. + */ + if (HAS_RQ(my_qp) && !IS_SRQ(my_qp)) + del_from_err_list(my_qp->recv_cq, &my_qp->rq_err_node); + + if (HAS_SQ(my_qp)) + del_from_err_list(my_qp->send_cq, &my_qp->sq_err_node); + /* now wait until all pending events have completed */ wait_event(my_qp->wait_completion, !atomic_read(&my_qp->nr_events)); @@ -1983,7 +2174,7 @@ static int internal_destroy_qp(struct ib if (qp_type == IB_QPT_GSI) { struct ib_event event; ehca_info(dev, "device %s: port %x is inactive.", - shca->ib_device.name, port_num); + shca->ib_device.name, port_num); event.device = &shca->ib_device; event.event = IB_EVENT_PORT_ERR; event.element.port_num = port_num; @@ -1991,11 +2182,15 @@ static int internal_destroy_qp(struct ib ib_dispatch_event(&event); } - if (HAS_RQ(my_qp)) + if (HAS_RQ(my_qp)) { ipz_queue_dtor(my_pd, &my_qp->ipz_rqueue); + + vfree(my_qp->rq_map.map); + } if (HAS_SQ(my_qp)) { ipz_queue_dtor(my_pd, &my_qp->ipz_squeue); - vfree(my_qp->sq_map); + + vfree(my_qp->sq_map.map); } kmem_cache_free(qp_cache, my_qp); atomic_dec(&shca->num_qps); --- infiniband.git.orig/drivers/infiniband/hw/ehca/ehca_iverbs.h +++ infiniband.git/drivers/infiniband/hw/ehca/ehca_iverbs.h @@ -197,6 +197,8 @@ void ehca_poll_eqs(unsigned long data); int ehca_calc_ipd(struct ehca_shca *shca, int port, enum ib_rate path_rate, u32 *ipd); +void ehca_add_to_err_list(struct ehca_qp *qp, int on_sq); + #ifdef CONFIG_PPC_64K_PAGES void *ehca_alloc_fw_ctrlblock(gfp_t flags); void ehca_free_fw_ctrlblock(void *ptr); From sashak at voltaire.com Wed Sep 10 09:14:41 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 10 Sep 2008 19:14:41 +0300 Subject: [ofa-general] Re: 
[OpenSM][Trivial] Fix comment typo
In-Reply-To: <1220979703.27074.56.camel@cardanus.llnl.gov>
References: <1220979703.27074.56.camel@cardanus.llnl.gov>
Message-ID: <20080910161441.GC11923@sashak.voltaire.com>

On 10:01 Tue 09 Sep , Al Chu wrote:
>
> Noticed it while looking at some other code in the header file.
>
> Signed-off-by: Albert Chu

Applied. Thanks.

Sasha

From sashak at voltaire.com Wed Sep 10 09:27:38 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Wed, 10 Sep 2008 19:27:38 +0300
Subject: [ofa-general] Re: [OpenSM][Trivial] remove old comments
In-Reply-To: <1220993204.27074.61.camel@cardanus.llnl.gov>
References: <1220993204.27074.61.camel@cardanus.llnl.gov>
Message-ID: <20080910162738.GD11923@sashak.voltaire.com>

On 13:46 Tue 09 Sep , Al Chu wrote:
> Hey Sasha,
>
> I assume some legacy comment that is no longer relevant (variable does
> not exist in the source).
>
> Al
>
> Signed-off-by: Albert Chu

Applied. Thanks.

Sasha

From sashak at voltaire.com Wed Sep 10 10:02:00 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Wed, 10 Sep 2008 20:02:00 +0300
Subject: [ofa-general] Re: [PATCH][TRIVIAL]osm_(helper trap_rcv).c: Change output format of notice type to unsigned decimal
In-Reply-To: <48C7C958.6080100@obsidianresearch.com>
References: <48C7C958.6080100@obsidianresearch.com>
Message-ID: <20080910170200.GG11923@sashak.voltaire.com>

On 07:19 Wed 10 Sep , Hal Rosenstock wrote:
> Sasha,
>
> Attached is a trivial patch to modify the output format of notice type to
> unsigned decimal.
>
> -- Hal
>
> opensm/osm_(helper trap_rcv).c: Display type in unsigned decimal rather
> than hex for better clarity and to be consistent with format in osm_inform.c
>
> Signed-off-by: Hal Rosenstock

Applied. Thanks.

Sasha

From sashak at voltaire.com Wed Sep 10 10:17:16 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Wed, 10 Sep 2008 20:17:16 +0300
Subject: [ofa-general] Re: [PATCH] ibnetdiscover.c: continue processing other ports even if smpquery fails on one port
In-Reply-To: <20080905154716.54d82f0e.weiny2@llnl.gov>
References: <20080905154716.54d82f0e.weiny2@llnl.gov>
Message-ID: <20080910171716.GH11923@sashak.voltaire.com>

On 15:47 Fri 05 Sep , Ira Weiny wrote:
>
> Signed-off-by: Ira Weiny

Applied. Thanks.

Sasha

From rdreier at cisco.com Wed Sep 10 11:26:16 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 10 Sep 2008 11:26:16 -0700
Subject: [ofa-general] [PATCH] ipoib: defer skb_orphan() until irqs enabled
In-Reply-To: <20080910135116.GB26881@mtls03> (Eli Cohen's message of "Wed, 10 Sep 2008 16:51:16 +0300")
References: <20080909145435.GO2316@sgi.com> <20080910135116.GB26881@mtls03>
Message-ID:

On Tue, Sep 09, 2008 at 02:32:44PM -0700, Roland Dreier wrote:
> By the way, looking at this stuff again, it seems we have (a possibly
> quite unlikely) race where a send can complete before the xmit method
> finishes, and we end up running skb_orphan on an skb that another
> context has already freed. I'll have to think about how we can fix
> that -- but any good ideas are appreciated...

Actually it looks like Arthur's patch introduces this race. The current
code is OK because skb_orphan is called under tx_lock, which is also
held when we poll the send CQ. But of course the status quo is no good
exactly because of the locking issue Arthur found.

> We can check if there are outstanding WRs after poll_tx is called. If
> there are no outstanding WRs, it means that the SKB has been freed. If
> there are outstanding WRs, it means that the last post has not been
> freed so we can call skb_orphan(). Like the following patch (on top of
> Arthur's):

I don't think this closes the race completely: at the point skb_orphan
is called (after Arthur's patch, by design), we have no locks held. And
so the timer-driven send completion handling could already have run and
freed the skb between when we drop tx_lock and when we call skb_orphan.

 - R.

From rdreier at cisco.com Wed Sep 10 11:29:36 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 10 Sep 2008 11:29:36 -0700
Subject: [ofa-general] Re: [PATCH] ipoib: send creation parameters when doing send-only join
In-Reply-To: <48C790CF.4050505@gmail.com> (Yossi Etigin's message of "Wed, 10 Sep 2008 12:18:07 +0300")
References: <48C6A9C1.5070108@gmail.com> <48C790CF.4050505@gmail.com>
Message-ID:

> But we don't, so it's required. Please see bug #1153 for case description.

Yes, I looked at the bug and I don't see the actual problem that is
caused by the current code. OK, the group doesn't get created if there
are only senders -- so what?

It seems a better fix would be just to get rid of the #if 0 and use
send-only membership after all these years?

 - R.

From rdreier at cisco.com Wed Sep 10 11:31:01 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 10 Sep 2008 11:31:01 -0700
Subject: Fwd: [ofa-general] [PATCH] ipoib: fix hang while bringing down uninitialized interface
In-Reply-To: <48C7DA7B.3050706@gmail.com> (Yossi Etigin's message of "Wed, 10 Sep 2008 17:32:27 +0300")
References: <48C7DA7B.3050706@gmail.com>
Message-ID:

> Subject: [ofa-general] ***SPAM*** [PATCH] ipoib: fix hang while bringing down uninitialized interface

Didn't see this the first time around, I guess because some mail server
flagged it as spam.

Looks like a real issue. Is this a regression from 2.6.26? (ie what
introduced this bug?)

From akepner at sgi.com Wed Sep 10 13:21:25 2008
From: akepner at sgi.com (akepner at sgi.com)
Date: Wed, 10 Sep 2008 13:21:25 -0700
Subject: [ofa-general] [PATCH] ipoib: defer skb_orphan() until irqs enabled
In-Reply-To:
References: <20080909145435.GO2316@sgi.com> <20080910135116.GB26881@mtls03>
Message-ID: <20080910202125.GD31435@sgi.com>

On Wed, Sep 10, 2008 at 11:26:16AM -0700, Roland Dreier wrote:
> ....
> I don't think this closes the race completely: at the point skb_orphan
> is called (after Arthur's patch, by design), we have no locks held. And
> so the timer-driven send completion handling could already have run and
> freed the skb between when we drop tx_lock and when we call skb_orphan.
>

Suppose we could just remove the skb_orphan() call from ipoib_send()
entirely, and wait for net_tx_action() to do it for us. But I imagine
there must be a (performance-related) reason why it's done the way it
is.

-- Arthur

From christopher.tanner at gatech.edu Wed Sep 10 14:13:55 2008
From: christopher.tanner at gatech.edu (Christopher Tanner)
Date: Wed, 10 Sep 2008 17:13:55 -0400
Subject: [ofa-general] Compiled IB packages
In-Reply-To: <48C7D63A.8090005@mellanox.co.il>
References: <0709481C-38BC-4598-870F-44FE8AE44FCE@gatech.edu> <48C6904E.1020606@mellanox.co.il> <1221031735.6948.12.camel@vlad-laptop> <94325E85-9403-4264-A4FE-90A567A8655B@gatech.edu> <48C7D63A.8090005@mellanox.co.il>
Message-ID:

> If you compiling for the running kernel then configure will find
> kernel sources using /lib/modules/`uname -r`/build link.
> So, you don't have to pass '--kernel-sources' and '--kernel'.

Ah, I see now.

> Why you think that updates is wrong?
>
> modprobe works with /lib/modules/`uname -r`/updates directory in the
> following way:
> if kernel module with the same name is present under /lib/modules/
> `uname -r`/kernel and
> under /lib/modules/`uname -r`/updates then the module from updates
> will be loaded.

I only said this because, on my system, the
/lib/modules/2.6.24-16-server/updates directory doesn't exist; thus the
make process was having an error. However, the
/lib/modules/2.6.24-16-server/kernel does exist, but this directory
wasn't searched by the make process (as far as I can tell).

> Both options are good.
> Note, if you use option b) then you need to run "depmod" after
> copying kernel modules.

Ah, thanks for the heads up on depmod.

-------------------------------------------
Chris Tanner
Space Systems Design Lab
Georgia Institute of Technology
christopher.tanner at gatech.edu
-------------------------------------------

On Sep 10, 2008, at 10:14 AM, Vladimir Sokolovsky wrote:

> Christopher Tanner wrote:
>> Vladimir -
>> Good catch on the linux headers version - I fixed that now. The
>> problem persisted after fixing the headers... but I finally figured
>> out what the issues were. On the configure line:
>> a) the --kernel-sources option needs the path to the linux HEADERS
>> (linux-headers-), not the linux SOURCE (linux-source-).
>> Terminology there is confusing...
>
> If you compiling for the running kernel then configure will find
> kernel sources using /lib/modules/`uname -r`/build link.
> So, you don't have to pass '--kernel-sources' and '--kernel'.
>
>> b) If I didn't specify anything for the --modules-dir option, it
>> defaults to /lib/modules/2.6.24-16-server/updates. I don't know
>> what the 'updates' gets appended onto the end, but that is not
>> correct. So I had to specify --modules-dir=/lib/modules/2.6.24-16-
>> server
>
> Why you think that updates is wrong?
>
> modprobe works with /lib/modules/`uname -r`/updates directory in the
> following way:
> if kernel module with the same name is present under /lib/modules/
> `uname -r`/kernel and
> under /lib/modules/`uname -r`/updates then the module from updates
> will be loaded.
>
>> It compiled and installed just fine!
>> My final question - how do I install the kernel modules on the rest >> of the nodes? The source was compiled in the /home directory, which >> is shared to all nodes via NFS. However, the kernel headers are NOT >> shared to the rest of the nodes. Do you recommend I: >> a) Install the linux headers on all of the nodes and execute 'make >> install' on all nodes >> b) Look at where the modules installed to (from the make install >> output) and copy the files manually > > Both options are good. > Note, if you use option b) then you need to run "depmod" after > copying kernel modules. > > Regards, > Vladimir > >> Thanks! >> ------------------------------------------- >> Chris Tanner >> Space Systems Design Lab >> Georgia Institute of Technology >> christopher.tanner at gatech.edu >> ------------------------------------------- >> On Sep 10, 2008, at 3:28 AM, Vladimir Sokolovsky wrote: >>> Hi, >>>> From the log file, I see the mismatch between the sources you are >>> passing to configure command and autoconf.h/auto.conf below: >>> >>> /usr/src/linux-headers-2.6.24-19-generic/include/linux/autoconf.h >>> /usr/src/linux-headers-2.6.24-19-generic/include/config/auto.conf >>> >>>> From the log file: >>> Kernel version: 2.6.24-16-server >>> Modules directory: //lib/modules/2.6.24-16-server/updates >>> Kernel sources: /usr/src/linux-source-2.6.24 >>> >>> Check that you have corresponding (matching the running kernel) >>> linux-headers package installed and then you don't have to pass >>> --kernel-sources and --kernel parameters to the configure script. >>> >>> E.g. >>> for kernel 2.6.24-19-generic it is linux-headers-2.6.24-19-generic >>> >>> Regards, >>> Vladimir >>> >>> On Tue, 2008-09-09 at 14:53 -0400, Christopher Tanner wrote: >>>> Thanks Vladimir - very helpful. However, I'm running into a problem >>>> with compiling the ofa package. 
First, I had to specify the source >>>> location on the command line (Ubuntu puts it in a different place >>>> than >>>> RedHat or SUSE): >>>> >>>> $ ./configure --kernel-sources=/usr/src/linux-source-2.6.24 ... >>>> (other >>>> stuff) >>>> >>>> I'm getting this error: >>>> >>>> ERROR: Kernel configuration is invalid. >>>> include/linux/autoconf.h or include/config/auto.conf are >>>> missing. >>>> Run 'make oldconfig && make prepare' on kernel src to fix >>>> it. >>>> >>>> This is confusing b/c both of those files exist. >>>> $ locate autoconf.h >>>> /usr/src/linux-headers-2.6.24-19-generic/include/linux/autoconf.h >>>> >>>> $ locate auto.conf >>>> /usr/src/linux-headers-2.6.24-19-generic/include/config/auto.conf >>>> >>>> There's a whole bunch more errors that I assume spawn because of >>>> this >>>> initial error. The output from 'make' is attached (it's pretty >>>> long). >>>> Let me know what you think. Thanks! >>>> >>>> ------------------------------------------- >>>> Chris Tanner >>>> Space Systems Design Lab >>>> Georgia Institute of Technology >>>> christopher.tanner at gatech.edu >>>> ------------------------------------------- >>>> >>>> >>>> >>>> On Sep 9, 2008, at 11:03 AM, Vladimir Sokolovsky wrote: >>>> >>>>> Christopher Tanner wrote: >>>>>> I am setting up a 16-node (homogeneous) cluster running Ubuntu >>>>>> 8.04 >>>>>> server with Mellanox Infiniband cards. I downloaded (from the >>>>>> OpenFabrics website), compiled, and installed the following IB >>>>>> packages on the master node into the /usr/local/lib directory. >>>>>> The / >>>>>> usr/local directory is being shared to all of the nodes via NFS. >>>>>> All packages seemed to compile and install fine. 
>>>>>> libibverbs >>>>>> librdmacm >>>>>> libibcm >>>>>> libipathverbs >>>>>> dapl >>>>>> compat-dapl >>>>>> libmlx4 >>>>>> libmthca >>>>>> libcxgb3 >>>>>> libibcommon >>>>>> libibumad >>>>>> libibmad >>>>>> opensm >>>>>> infiniband-diags >>>>>> I have a few questions: >>>>>> a) Do I need to run 'make install' on each node or just the >>>>>> master >>>>>> node? All of the libraries in /usr/local/lib are visible to all >>>>>> nodes... Stated another way, does 'make install' put files >>>>>> elsewhere beside the /usr/local/lib directory? Does it alter OS >>>>>> configuration files to tell it to look for certain files in /usr/ >>>>>> local/lib? >>>>> >>>>> No, all the packages above will put their files under /usr/local >>>>> >>>>>> b) I know I need to load the IB kernel modules (mlx4_core, >>>>>> mlx4_ib, rdma_ucm, ib_core, ib_mad, ib_mthca, ib_umad, ib_uverbs) >>>>>> in order for the IB cards to work. Are these compiled and >>>>>> installed >>>>>> with the above packages? Where does the kernel know where to look >>>>>> for modules? (Sorry, this question is very similar to the first >>>>>> one). >>>>> >>>>> The packages above are user space libraries/binaries. To install >>>>> kernel >>>>> modules you should download the latest version of the >>>>> ofa_1_4_kernel >>>>> tgz file from: >>>>> >>>>> http://www.openfabrics.org/downloads/ofa_1_4_kernel/ >>>>> To install, run: >>>>> ./configure --with-core-mod --with-user_mad-mod --with- >>>>> user_access- >>>>> mod --with-addr_trans-mod --with-mthca-mod --with-mthca_debug- >>>>> mod -- >>>>> with-mlx4-mod --with-mlx4_en-mod --with-mlx4_debug-mod --with- >>>>> cxgb3- >>>>> mod --with-ehca-mod --with-ipoib-mod --with-ipoib_debug-mod (... , >>>>> see --help) >>>>> make >>>>> make install >>>>> >>>>> >>>>>> c) The OFED software stack contains some stuff that isn't >>>>>> available >>>>>> for source download (e.g. ib-bonding, ibsim, libsdp). Are these >>>>>> necessary for the IB network to operate correctly? 
Since I'm >>>>>> running Ubuntu, obviously the src.rpm file won't work... >>>>> >>>>> All OFED tgz files are available under: >>>>> http://www.openfabrics.org/~vlad/ofed_1_4/SOURCES/ >>>>> >>>>> ib-bonding source RPM can be downloaded from (you can open it to get >>>>> tgz file using cpio, if you need): >>>>> http://www.openfabrics.org/~monis/ofed_1_4/ >>>>> >>>>> These packages are not necessary for the IB network to operate correctly, but >>>>> it depends on what you are planning to do. >>>>> >>>>> Regards, >>>>> Vladimir >>>>> >>>>>> Thanks to all for your help. Previous responses regarding issues >>>>>> with OpenSM worked great. >>>>>> ------------------------------------------- >>>>>> Chris Tanner >>>>>> Space Systems Design Lab >>>>>> Georgia Institute of Technology >>>>>> christopher.tanner at gatech.edu >>>>>> ------------------------------------------- >>>> >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From sweitzen at cisco.com Wed Sep 10 14:14:44 2008 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 10 Sep 2008 14:14:44 -0700 Subject: [ewg] RE: [ofa-general] OFED 1.4-RC1 is available In-Reply-To: <20080909202521.GG3716@cse.ohio-state.edu> References: <5D49E7A8952DC44FB38C38FA0D758EAD75979A@mtlexch01.mtl.com> <20080909202521.GG3716@cse.ohio-state.edu> Message-ID: I'm also not getting mpiexec built, at least on the first distro I tried (RHEL4 x86_64): # rpm -qlip mvapich2_gcc-1.2rc2-4.x86_64.rpm | fgrep mpiexec /usr/mpi/gcc/mvapich2-1.2rc2/bin/mpiexec.mpd Scott > -----Original Message----- > From: Jonathan Perkins [mailto:perkinjo at cse.ohio-state.edu] > Sent: Tuesday, September 09, 2008 1:25 PM > To: Scott Weitzenkamp (sweitzen) > Cc: Tziporet Koren; ewg at lists.openfabrics.org; > general at
lists.openfabrics.org > Subject: Re: [ewg] RE: [ofa-general] OFED 1.4-RC1 is available > > Thanks for the note. We are taking a look at this. > > On Tue, Sep 09, 2008 at 12:52:44PM -0700, Scott Weitzenkamp (sweitzen) wrote: > > I am unable to build MVAPICH2 for multiple compilers: > > > > Building the MVAPICH2 RPM [OFA]... > > Running rpmbuild --rebuild --define '_topdir /var/tmp/OFED_topdir' --define 'dist %{nil}' --target x86_64 --define '_name mvapich2_gcc' --define 'impl ofa' --define 'rdma --with-rdma=gen2' --define 'ib_include --with-ib-include=/usr/include' --define 'ib_libpath --with-ib-libpath=/usr/lib64' --define 'shared_libs 1' --define 'romio 1' --define 'comp_env CC=gcc CXX=g++ F77=gfortran F90=gfortran' --define 'auto_req 0' --define 'mpi_selector /usr/bin/mpi-selector' --define '_prefix /usr/mpi/gcc/mvapich2-1.2rc2' /tmp/OFED-1.4-rc1/SRPMS/mvapich2-1.2rc2-4.src.rpm > > Install mvapich2_gcc RPM: > > Running rpm -iv --nodeps /tmp/OFED-1.4-rc1/RPMS/redhat-release-4AS-4.1/x86_64/mvapich2_gcc-1.2rc2-4.x86_64.rpm > > Build mvapich2_pgi RPM > > Building the MVAPICH2 RPM [OFA]... 
> > Running rpmbuild --rebuild --define '_topdir /var/tmp/OFED_topdir' --define 'dist %{nil}' --target x86_64 --define '_name mvapich2_pgi' --define 'impl ofa' --define 'rdma --with-rdma=gen2' --define 'ib_include --with-ib-include=/usr/include' --define 'ib_libpath --with-ib-libpath=/usr/lib64' --define 'shared_libs 1' --define 'romio 1' --define 'comp_env CC=pgcc CXX=pgCC F77=pgf77 F90=pgf90' --define 'auto_req 0' --define 'mpi_selector /usr/bin/mpi-selector' --define '_prefix /usr/mpi/pgi/mvapich2-1.2rc2' /tmp/OFED-1.4-rc1/SRPMS/mvapich2-1.2rc2-4.src.rpm > > Install mvapich2_pgi RPM: > > Running rpm -iv --nodeps /tmp/OFED-1.4-rc1/RPMS/redhat-release-4AS-4.1/x86_64/mvapich2_pgi-1.2rc2-4.x86_64.rpm > > Failed to install mvapich2_pgi RPM > > See /tmp/OFED.12539.logs/mvapich2_pgi.rpminstall.log > > > > # more /tmp/OFED.12539.logs/mvapich2_pgi.rpminstall.log > > Preparing packages for installation... > > file /etc/mpe_graphics.conf from install of mvapich2_pgi-1.2rc2-4 conflicts with file from package mvapich2_gcc-1.2rc2-4 > > file /etc/mpe_log.conf from install of mvapich2_pgi-1.2rc2-4 conflicts with file from package mvapich2_gcc-1.2rc2-4 > > file /etc/mpe_mpianim.conf from install of mvapich2_pgi-1.2rc2-4 conflicts with file from package mvapich2_gcc-1.2rc2-4 > > file /etc/mpe_mpicheck.conf from install of mvapich2_pgi-1.2rc2-4 conflicts with file from package mvapich2_gcc-1.2rc2-4 > > file /etc/mpe_mpilog.conf from install of mvapich2_pgi-1.2rc2-4 conflicts with file from package mvapich2_gcc-1.2rc2-4 > > file /etc/mpe_mpitrace.conf from install of mvapich2_pgi-1.2rc2-4 conflicts with file from package mvapich2_gcc-1.2rc2-4 > > file /etc/mpe_nolog.conf from install of mvapich2_pgi-1.2rc2-4 conflicts with file from package mvapich2_gcc-1.2rc2-4 > > file /etc/mpicc.conf from install of mvapich2_pgi-1.2rc2-4 conflicts with file from package mvapich2_gcc-1.2rc2-4 
> > file /etc/mpicxx.conf from install of mvapich2_pgi-1.2rc2-4 conflicts with file from package mvapich2_gcc-1.2rc2-4 > > file /etc/mpif77.conf from install of mvapich2_pgi-1.2rc2-4 conflicts with file from package mvapich2_gcc-1.2rc2-4 > > file /etc/mpif90.conf from install of mvapich2_pgi-1.2rc2-4 conflicts with file from package mvapich2_gcc-1.2rc2-4 > > > > Scott Weitzenkamp > > SQA and Release Manager > > Server Access Virtualization Business Unit > > Cisco Systems > > > > > > > > > > > -----Original Message----- > > > From: general-bounces at lists.openfabrics.org > > > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Tziporet Koren > > > Sent: Tuesday, September 09, 2008 8:20 AM > > > To: ewg at lists.openfabrics.org > > > Cc: general at lists.openfabrics.org > > > Subject: [ofa-general] OFED 1.4-RC1 is available > > > > > > Hi, > > > OFED 1.4-RC1 release is available on > > > http://www.openfabrics.org/downloads/OFED/ofed-1.4/OFED-1.4-rc1.tgz > > > > > > To get BUILD_ID run ofed_info > > > > > > Please report any issues in bugzilla https://bugs.openfabrics.org/ for OFED 1.4 > > > > > > Tziporet & Vladimir > > > > > > ======================================================================== > > > > > > Release information: > > > -------------------- > > > Linux Operating Systems: > > > - RedHat EL4 up4: 2.6.9-42.ELsmp * > > > - RedHat EL4 up5: 2.6.9-55.ELsmp > > > - RedHat EL4 up6: 2.6.9-67.ELsmp > > > - RedHat EL4 up7: 2.6.9-78.ELsmp > > > - RedHat EL5: 2.6.18-8.el5 > > > - RedHat EL5 up1: 2.6.18-53.el5 > > > - RedHat EL5 up2: 2.6.18-92.el5 > > > - CentOS 5.2: 2.6.18-92.el5 > > > - Fedora C9: 2.6.25-14.fc9 * > > > - SLES10: 2.6.16.21-0.8-smp > > > - SLES10 SP1: 2.6.16.46-0.12-smp > > > - SLES10 SP1 up1: 2.6.16.53-0.16-smp > > > - SLES10 SP2: 2.6.16.60-0.21-smp > > > - OpenSuSE 10.3: 2.6.22.5-31 * > > > - kernel.org: 2.6.26 and 2.6.27-rc5 > > > > > > * 
Minimal QA for these versions > > > > > > Systems: > > > * x86_64 > > > * x86 > > > * ia64 > > > * ppc64 > > > > > > > > > Main Changes from OFED 1.4-beta > > > =============================== > > > o Kernel code based on 2.6.27-rc5 > > > o Added NFS-RDMA support for SLES10 SP2 and kernels 2.6.26 and 2.6.27 > > > o iSER backports added and are now available > > > o New MPI packages: Open MPI 1.2.7, MVAPICH 1.1 and MVAPICH2 1.1 > > > o New DAPL libraries > > > o 37 bugs fixed (see attached for details) > > > > > > > > > Tasks that should be completed for the RC2: > > > =========================================== > > > 1. NFS-RDMA to work on RHEL 5.1 > > > 2. OSM: Cached routing > > > 3. Clean up compilation warnings > > > 4. Bug fixes > > > > > _______________________________________________ > > ewg mailing list > > ewg at lists.openfabrics.org > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > > -- > Jonathan Perkins > http://www.cse.ohio-state.edu/~perkinjo > From panda at cse.ohio-state.edu Wed Sep 10 15:14:37 2008 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Wed, 10 Sep 2008 18:14:37 -0400 (EDT) Subject: [ewg] RE: [ofa-general] OFED 1.4-RC1 is available In-Reply-To: Message-ID: Hi Scott, Thanks for your note. Starting with MVAPICH2 1.2, a new scalable mpirun_rsh job start-up framework (similar to the one used in MVAPICH) has been introduced. This allows MVAPICH2 to start on multi-thousand core clusters in very little time (like MVAPICH). It also allows the job start-up scheme to be uniform across MVAPICH and MVAPICH2. The traditional MPD/mpiexec job start-up option is still there. In the latest MVAPICH2 1.2 SRPM (1.2rc2-4), the default has been set to the new scalable start-up scheme. That's why you are not able to have it built with mpiexec. Since Jonathan is updating the SRPM to take care of the multiple-compiler errors (which you reported yesterday), we will also include an option to have either of these two job start-up schemes (A. 
the new scalable mpirun_rsh framework or B. the traditional MPD-based framework) installed. The new SRPM to be uploaded by tomorrow will have all these fixes. Let us know if this will work out for you. Thanks, DK On Wed, 10 Sep 2008, Scott Weitzenkamp (sweitzen) wrote: > I'm also not getting mpiexec built, at least on the first distro I tried > (RHEL4 x86_64): > > # rpm -qlip mvapich2_gcc-1.2rc2-4.x86_64.rpm | fgrep mpiexec > /usr/mpi/gcc/mvapich2-1.2rc2/bin/mpiexec.mpd > > Scott > From sweitzen at cisco.com Wed Sep 10 15:15:53 2008 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 10 Sep 2008 15:15:53 -0700 Subject: [ewg] RE: [ofa-general] OFED 1.4-RC1 is available In-Reply-To: References: Message-ID: So I can have mpirun_rsh *or* mpiexec, but not both? Scott > -----Original Message----- > From: Dhabaleswar Panda [mailto:panda at cse.ohio-state.edu] > Sent: Wednesday, September 10, 2008 3:15 PM > To: Scott Weitzenkamp (sweitzen) > Cc: Jonathan Perkins; ewg at lists.openfabrics.org; > general at lists.openfabrics.org; Dhabaleswar Panda > Subject: RE: [ewg] RE: [ofa-general] OFED 1.4-RC1 is available > > Hi Scott, > > Thanks for your note. Starting with MVAPICH2 1.2, a new scalable > mpirun_rsh job start-up framework (similar to the one used in > MVAPICH) has > been introduced. This allows MVAPICH2 to start on multi-thousand core > clusters with very little time (like MVAPICH). It also allows > job start-up > scheme to be uniform across MVAPICH and MVAPICH2. The traditional > MPD/mpiexec job start-up option is still there. In the latest > MVAPICH2 1.2 > SRPM (1.2rc2-4), the default has been set for the new > scalable start-up > scheme. That's why you are not able to have it built with > mpiexec. Since > Jonathan is updating the SRPM to take care of the multiple compilers > errors (you reported yesterday), we will also include an > option to have > either of these two job start-up schemes (A. 
the new scalable > mpirun_rsh > framework or B. the traditional MPD-based framework) > installed. The new > SRPM to be uploaded by tomorrow will have all these fixes. > > Let us know if this will work out for you. > > Thanks, > > DK > From panda at cse.ohio-state.edu Wed Sep 10 15:24:46 2008 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Wed, 10 Sep 2008 18:24:46 -0400 (EDT) Subject: [ewg] RE: [ofa-general] OFED 1.4-RC1 is available In-Reply-To: Message-ID: > So I can have mpirun_rsh *or* mpiexec, but not both? Our goal is to provide both. However, it will take us some time to make the necessary changes to the SRPM creation process and test it. For tomorrow's SRPM version, we will provide the option for having one of these two. By next week, we will have an SRPM update to have both. Will that work out? Thanks, DK > Scott > > > > > -----Original Message----- > > From: Dhabaleswar Panda [mailto:panda at cse.ohio-state.edu] > > Sent: Wednesday, September 10, 2008 3:15 PM > > To: Scott Weitzenkamp (sweitzen) > > Cc: Jonathan Perkins; ewg at lists.openfabrics.org; > > general at lists.openfabrics.org; Dhabaleswar Panda > > Subject: RE: [ewg] RE: [ofa-general] OFED 1.4-RC1 is available > > > > Hi Scott, > > > > Thanks for your note. Starting with MVAPICH2 1.2, a new scalable > > mpirun_rsh job start-up framework (similar to the one used in > > MVAPICH) has > > been introduced. 
This allows MVAPICH2 to start on multi-thousand core > > clusters with very little time (like MVAPICH). It also allows > > job start-up > > scheme to be uniform across MVAPICH and MVAPICH2. The traditional > > MPD/mpiexec job start-up option is still there. In the latest > > MVAPICH2 1.2 > > SRPM (1.2rc2-4), the default has been set for the new > > scalable start-up > > scheme. That's why you are not able to have it built with > > mpiexec. Since > > Jonathan is updating the SRPM to take care of the multiple compilers > > errors (you reported yesterday), we will also include an > > option to have > > either of these two job start-up schemes (A. the new scalable > > mpirun_rsh > > framework or B. the traditional MPD-based framework) > > installed. The new > > SRPM to be uploaded by tomorrow will have all these fixes. > > > > Let us know if this will work out for you. > > > > Thanks, > > > > DK > From sweitzen at cisco.com Wed Sep 10 15:27:17 2008 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 10 Sep 2008 15:27:17 -0700 Subject: [ewg] RE: [ofa-general] OFED 1.4-RC1 is available In-Reply-To: References: Message-ID: Sure, that's fine. I don't really have a strong opinion, just wanted to know what you were up to. 
Scott > -----Original Message----- > From: Dhabaleswar Panda [mailto:panda at cse.ohio-state.edu] > Sent: Wednesday, September 10, 2008 3:25 PM > To: Scott Weitzenkamp (sweitzen) > Cc: Jonathan Perkins; ewg at lists.openfabrics.org; > general at lists.openfabrics.org; Dhabaleswar Panda > Subject: RE: [ewg] RE: [ofa-general] OFED 1.4-RC1 is available > > > So I can have mpirun_rsh *or* mpiexec, but not both? > > Our goal is to provide both. However, it will take us some > time to make > the necessary changes to the SRPM creation process and test it. For > tomorrow's SRPM version, we will provide the option for having one of > these two. By next week, we will have an SRPM update to have both. > > Will that work out? > > Thanks, > > DK > > > Scott > > > > > > > > > -----Original Message----- > > > From: Dhabaleswar Panda [mailto:panda at cse.ohio-state.edu] > > > Sent: Wednesday, September 10, 2008 3:15 PM > > > To: Scott Weitzenkamp (sweitzen) > > > Cc: Jonathan Perkins; ewg at lists.openfabrics.org; > > > general at lists.openfabrics.org; Dhabaleswar Panda > > > Subject: RE: [ewg] RE: [ofa-general] OFED 1.4-RC1 is available > > > > > > Hi Scott, > > > > > > Thanks for your note. Starting with MVAPICH2 1.2, a new scalable > > > mpirun_rsh job start-up framework (similar to the one used in > > > MVAPICH) has > > > been introduced. This allows MVAPICH2 to start on > multi-thousand core > > > clusters with very little time (like MVAPICH). It also allows > > > job start-up > > > scheme to be uniform across MVAPICH and MVAPICH2. The traditional > > > MPD/mpiexec job start-up option is still there. In the latest > > > MVAPICH2 1.2 > > > SRPM (1.2rc2-4), the default has been set for the new > > > scalable start-up > > > scheme. That's why you are not able to have it built with > > > mpiexec. 
Since > > > Jonathan is updating the SRPM to take care of the > multiple compilers > > > errors (you reported yesterday), we will also include an > > > option to have > > > either of these two job start-up schemes (A. the new scalable > > > mpirun_rsh > > > framework or B. the traditional MPD-based framework) > > > installed. The new > > > SRPM to be uploaded by tomorrow will have all these fixes. > > > > > > Let us know if this will work out for you. > > > > > > Thanks, > > > > > > DK > > > > > > On Wed, 10 Sep 2008, Scott Weitzenkamp (sweitzen) wrote: > > > > > > > I'm also not getting mpiexec built, at least on the first > > > distro I tried > > > > (RHEL4 x86_64): > > > > > > > > # rpm -qlip mvapich2_gcc-1.2rc2-4.x86_64.rpm | fgrep mpiexec > > > > /usr/mpi/gcc/mvapich2-1.2rc2/bin/mpiexec.mpd > > > > > > > > Scott > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > From: Jonathan Perkins [mailto:perkinjo at cse.ohio-state.edu] > > > > > Sent: Tuesday, September 09, 2008 1:25 PM > > > > > To: Scott Weitzenkamp (sweitzen) > > > > > Cc: Tziporet Koren; ewg at lists.openfabrics.org; > > > > > general at lists.openfabrics.org > > > > > Subject: Re: [ewg] RE: [ofa-general] OFED 1.4-RC1 is available > > > > > > > > > > Thanks for the note. We are taking a look at this. > > > > > > > > > > On Tue, Sep 09, 2008 at 12:52:44PM -0700, Scott Weitzenkamp > > > > > (sweitzen) wrote: > > > > > > I am unable to build MVAPICH2 for multiple compilers: > > > > > > > > > > > > Building the MVAPICH2 RPM [OFA]... 
> > > > > > Running rpmbuild --rebuild --define '_topdir > > > /var/tmp/OFED_topdir' > > > > > > --define 'di > > > > > > st %{nil}' --target x86_64 --define '_name mvapich2_gcc' > > > > > --define 'impl > > > > > > ofa' --d > > > > > > efine 'rdma --with-rdma=gen2' --define 'ib_include > > > > > > --with-ib-include=/usr/includ > > > > > > e' --define 'ib_libpath > --with-ib-libpath=/usr/lib64' --define > > > > > > 'shared_libs 1' - > > > > > > -define 'romio 1' --define 'comp_env CC=gcc CXX=g++ > F77=gfortran > > > > > > F90=gfortran' - > > > > > > -define 'auto_req 0' --define 'mpi_selector > > > /usr/bin/mpi-selector' > > > > > > --define '_pr > > > > > > efix /usr/mpi/gcc/mvapich2-1.2rc2' > > > > > > /tmp/OFED-1.4-rc1/SRPMS/mvapich2-1.2rc2-4.src > > > > > > .rpm > > > > > > Install mvapich2_gcc RPM: > > > > > > Running rpm -iv --nodeps > > > > > > /tmp/OFED-1.4-rc1/RPMS/redhat-release-4AS-4.1/x86_64/mv > > > > > > apich2_gcc-1.2rc2-4.x86_64.rpm > > > > > > Build mvapich2_pgi RPM > > > > > > Building the MVAPICH2 RPM [OFA]... 
> > > > > > Running rpmbuild --rebuild --define '_topdir > > > /var/tmp/OFED_topdir' > > > > > > --define 'di > > > > > > st %{nil}' --target x86_64 --define '_name mvapich2_pgi' > > > > > --define 'impl > > > > > > ofa' --d > > > > > > efine 'rdma --with-rdma=gen2' --define 'ib_include > > > > > > --with-ib-include=/usr/includ > > > > > > e' --define 'ib_libpath > --with-ib-libpath=/usr/lib64' --define > > > > > > 'shared_libs 1' - > > > > > > -define 'romio 1' --define 'comp_env CC=pgcc > CXX=pgCC F77=pgf77 > > > > > > F90=pgf90' --def > > > > > > ine 'auto_req 0' --define 'mpi_selector > > > > > /usr/bin/mpi-selector' --define > > > > > > '_prefix > > > > > > /usr/mpi/pgi/mvapich2-1.2rc2' > > > > > > /tmp/OFED-1.4-rc1/SRPMS/mvapich2-1.2rc2-4.src.rpm > > > > > > Install mvapich2_pgi RPM: > > > > > > Running rpm -iv --nodeps > > > > > > /tmp/OFED-1.4-rc1/RPMS/redhat-release-4AS-4.1/x86_64/mv > > > > > > apich2_pgi-1.2rc2-4.x86_64.rpm > > > > > > Failed to install mvapich2_pgi RPM > > > > > > See /tmp/OFED.12539.logs/mvapich2_pgi.rpminstall.log > > > > > > > > > > > > # more /tmp/OFED.12539.logs/mvapich2_pgi.rpminstall.log > > > > > > Preparing packages for installation... 
> > > > > > file /etc/mpe_graphics.conf from install of > > > > > > mvapich2_pgi-1.2rc2-4 confli > > > > > > cts with file from package mvapich2_gcc-1.2rc2-4 > > > > > > file /etc/mpe_log.conf from install of > > > mvapich2_pgi-1.2rc2-4 > > > > > > conflicts w > > > > > > ith file from package mvapich2_gcc-1.2rc2-4 > > > > > > file /etc/mpe_mpianim.conf from install of > > > > > mvapich2_pgi-1.2rc2-4 > > > > > > conflic > > > > > > ts with file from package mvapich2_gcc-1.2rc2-4 > > > > > > file /etc/mpe_mpicheck.conf from install of > > > > > > mvapich2_pgi-1.2rc2-4 confli > > > > > > cts with file from package mvapich2_gcc-1.2rc2-4 > > > > > > file /etc/mpe_mpilog.conf from install of > > > > > mvapich2_pgi-1.2rc2-4 > > > > > > conflict > > > > > > s with file from package mvapich2_gcc-1.2rc2-4 > > > > > > file /etc/mpe_mpitrace.conf from install of > > > > > > mvapich2_pgi-1.2rc2-4 confli > > > > > > cts with file from package mvapich2_gcc-1.2rc2-4 > > > > > > file /etc/mpe_nolog.conf from install of > > > > > mvapich2_pgi-1.2rc2-4 > > > > > > conflicts > > > > > > with file from package mvapich2_gcc-1.2rc2-4 > > > > > > file /etc/mpicc.conf from install of > > > mvapich2_pgi-1.2rc2-4 > > > > > > conflicts wit > > > > > > h file from package mvapich2_gcc-1.2rc2-4 > > > > > > file /etc/mpicxx.conf from install of > > > mvapich2_pgi-1.2rc2-4 > > > > > > conflicts wi > > > > > > th file from package mvapich2_gcc-1.2rc2-4 > > > > > > file /etc/mpif77.conf from install of > > > mvapich2_pgi-1.2rc2-4 > > > > > > conflicts wi > > > > > > th file from package mvapich2_gcc-1.2rc2-4 > > > > > > file /etc/mpif90.conf from install of > > > mvapich2_pgi-1.2rc2-4 > > > > > > conflicts wi > > > > > > th file from package mvapich2_gcc-1.2rc2-4 > > > > > > > > > > > > Scott Weitzenkamp > > > > > > SQA and Release Manager > > > > > > Server Access Virtualization Business Unit > > > > > > Cisco Systems > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -----Original 
Message----- > > > > > > > From: general-bounces at lists.openfabrics.org > > > > > > > [mailto:general-bounces at lists.openfabrics.org] On > Behalf Of > > > > > > > Tziporet Koren > > > > > > > Sent: Tuesday, September 09, 2008 8:20 AM > > > > > > > To: ewg at lists.openfabrics.org > > > > > > > Cc: general at lists.openfabrics.org > > > > > > > Subject: [ofa-general] OFED 1.4-RC1 is available > > > > > > > > > > > > > > Hi, > > > > > > > OFED 1.4-RC1 release is available on > > > > > > > > > > > > > > > > http://www.openfabrics.org/downloads/OFED/ofed-1.4/OFED-1.4-rc1.tgz > > > > > > > > > > > > > > To get BUILD_ID run ofed_info > > > > > > > > > > > > > > Please report any issues in bugzilla > > > > > https://bugs.openfabrics.org/ for > > > > > > > OFED 1.4 > > > > > > > > > > > > > > Tziporet & Vladimir > > > > > > > > > > > > > > > ============================================================== > > > > > > > ========== > > > > > > > > > > > > > > Release information: > > > > > > > -------------------- > > > > > > > Linux Operating Systems: > > > > > > > - RedHat EL4 up4: 2.6.9-42.ELsmp * > > > > > > > - RedHat EL4 up5: 2.6.9-55.ELsmp > > > > > > > - RedHat EL4 up6: 2.6.9-67.ELsmp > > > > > > > - RedHat EL4 up7: 2.6.9-78.ELsmp > > > > > > > - RedHat EL5: 2.6.18-8.el5 > > > > > > > - RedHat EL5 up1: 2.6.18-53.el5 > > > > > > > - RedHat EL5 up2: 2.6.18-92.el5 > > > > > > > - CentOS 5.2: 2.6.18-92.el5 > > > > > > > - Fedora C9: 2.6.25-14.fc9 * > > > > > > > - SLES10: 2.6.16.21-0.8-smp > > > > > > > - SLES10 SP1: 2.6.16.46-0.12-smp > > > > > > > - SLES10 SP1 up1: 2.6.16.53-0.16-smp > > > > > > > - SLES10 SP2: 2.6.16.60-0.21-smp > > > > > > > - OpenSuSE 10.3: 2.6.22.5-31 * > > > > > > > - kernel.org: 2.6.26 and 2.6.27-rc5 > > > > > > > > > > > > > > * Minimal QA for these versions > > > > > > > > > > > > > > Systems: > > > > > > > * x86_64 > > > > > > > * x86 > > > > > > > * ia64 > > > > > > > * ppc64 > > > > > > > > > > > > > > > > > > > > > Main Changes from OFED 
1.4-beta > > > > > > > =============================== > > > > > > > o Kernel code based on 2.6.27-rc5 > > > > > > > o Added NFS-RDMA support for SLES10 SP2 and kernel > > > 2.6.26 and 27 > > > > > > > o iSER backports added and it's now available > > > > > > > o New MPI packages: Open MPI 1.2.7, MVAPICH 1.1 and > > > MVAPICH2 1.1 > > > > > > > o New DAPL libraries > > > > > > > o 37 bugs fixed (see attached for details) > > > > > > > > > > > > > > > > > > > > > Tasks that should be completed for the RC2: > > > > > > > =========================================== > > > > > > > 1. NFS-RDMA to work on RHEL 5.1 > > > > > > > 2. OSM: Cached routing > > > > > > > 3. Clean up compilation warnings > > > > > > > 4. Bug fixes > > > > > > > > > > > > > _______________________________________________ > > > > > > ewg mailing list > > > > > > ewg at lists.openfabrics.org > > > > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > > > > > > > > > > -- > > > > > Jonathan Perkins > > > > > http://www.cse.ohio-state.edu/~perkinjo > > > > > > > > > _______________________________________________ > > > > general mailing list > > > > general at lists.openfabrics.org > > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > > > > > To unsubscribe, please visit > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > > > > > > > > From christopher.tanner at gatech.edu Wed Sep 10 17:52:12 2008 From: christopher.tanner at gatech.edu (Christopher Tanner) Date: Wed, 10 Sep 2008 20:52:12 -0400 Subject: [ofa-general] Permission denied Message-ID: I'm receiving this error when I try to execute an MPI executable: [node2][0,1,1][btl_openib_component.c:466:init_one_hca] error obtaining device context for mthca0 errno says Permission denied -------------------------------------------------------------------------- WARNING: There were errors during IB HCA initialization on host 'node2'.
-------------------------------------------------------------------------- -------------------------------------------------------------------------- WARNING: There is at least on IB HCA found on host 'node2', but there is no active ports detected. This is most certainly not what you wanted. Check your cables and SM configuration. -------------------------------------------------------------------------- I'm confused about the 'Permission denied'. My user is part of the group 'rdma', which I thought was supposed to give them permission to access the Infiniband devices. I'm also confused because the trivial test cases such as 'Hello World' and 'hostname' execute on all nodes without errors. The 'no active ports' is also curious. On the master node, I am running OpenSM and it indicates that the port is active (using ibv_devinfo). However, I notice that the 'ibv_devinfo' command can only be run by root. Is this an indication that permissions are not set correctly? As another note, my cluster is running Ubuntu 8.04, so I couldn't use the OFED scripts to install the Infiniband drivers, so I had to compile and install everything from source (which seemed to go fine). Thanks for your help! 
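The "Permission denied" reported above most likely comes from opening the verbs device node, so one quick check is whether the affected user can open that node read-write. A minimal sketch in C of such a check; the helper name check_rw_open is made up for illustration, and the node path you pass in (commonly /dev/infiniband/uverbs0) is an assumption, not something taken from the report:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to open `path` read-write, roughly the way a verbs library opens
 * its device node. Returns 0 on success, otherwise the errno value set
 * by open(): EACCES for "Permission denied", ENOENT if the node is
 * missing, and so on.
 *
 * Hypothetical helper for illustration; it is not part of libibverbs.
 */
int check_rw_open(const char *path)
{
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return errno;   /* read errno before any other call can change it */

    close(fd);
    return 0;
}
```

Called as check_rw_open("/dev/infiniband/uverbs0") by the same user that launches the MPI job on node2, a result of EACCES would point at the ownership or mode of the device node rather than at Open MPI or the subnet manager.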
------------------------------------------- Chris Tanner Space Systems Design Lab Georgia Institute of Technology christopher.tanner at gatech.edu ------------------------------------------- From vlad at dev.mellanox.co.il Wed Sep 10 21:52:24 2008 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Thu, 11 Sep 2008 07:52:24 +0300 Subject: [ofa-general] Compiled IB packages In-Reply-To: References: <0709481C-38BC-4598-870F-44FE8AE44FCE@gatech.edu> <48C6904E.1020606@mellanox.co.il> <1221031735.6948.12.camel@vlad-laptop> <94325E85-9403-4264-A4FE-90A567A8655B@gatech.edu> <48C7D63A.8090005@mellanox.co.il> Message-ID: <48C8A408.9060300@dev.mellanox.co.il> Christopher Tanner wrote: > > I only said this because, on my system, the > /lib/modules/2.6.24-16-server/updates directory doesn't exist; thus the > make process was having an error. However, the > /lib/modules/2.6.24-16-server/kernel does exist, but this directory > wasn't searched by the make process (as far as I can tell). > /lib/modules/`uname -r`/updates directory will be created by the "make install" command. kernel and updates directories (under /lib/modules/`uname -r`) are the target directories for modules installation and they are not searched by the make process. 
Regards, Vladimir From vlad at lists.openfabrics.org Thu Sep 11 03:07:55 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Thu, 11 Sep 2008 03:07:55 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080911-0200 daily build status Message-ID: <20080911100755.B1354E60DEB@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with 
linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.24 Failed: From eli at dev.mellanox.co.il Thu Sep 11 04:37:46 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Thu, 11 Sep 2008 14:37:46 +0300 Subject: [ofa-general] [PATCH] ipoib: defer skb_orphan() until irqs enabled In-Reply-To: References: <20080909145435.GO2316@sgi.com> <20080910135116.GB26881@mtls03> Message-ID: <20080911113746.GA6298@mtls03> On Wed, Sep 10, 2008 at 11:26:16AM -0700, Roland Dreier wrote: > On Tue, Sep 09, 2008 at 02:32:44PM -0700, Roland Dreier wrote: > > By the way, looking at this stuff again, it seems we have (a possibly > > quite unlikely) race where a send can complete before the xmit method > > finishes, and we end up running skb_orphan on an skb that another > > context has already freed. I'll have to think about how we can fix > > that -- but any good ideas are appreciated... > > Actually it looks like Arthur's patch introduces this race. The current > code is OK because skb_orphan is called under tx_lock, which is also > held when we poll the send CQ. But of course the status quo is no good > exactly because of the locking issue Arthur found. > > > We can check if there are outstanding WRs after poll_tx is called. If > > there are no outstanding WRs, it means that the SKB has been freed. If > > there are outstanding WRs, it means that the last post has not been > > freed so we can call skb_orphan(). Like the following patch (on top of > > Arthur's): > > I don't think this closes the race completely: at the point skb_orphan > is called (after Arthur's patch, by design), we have no locks held. And > so the timer-driven send completion handling could already have run and > freed the skb between when we drop tx_lock and when we call skb_orphan. > I don't think there is a problem. 
The only SKB subject to this race is the one that we posted right after stopping the net queue. But the interrupt handler (resulting from arming the CQ), and possibly the following timer invocations, will drain the CQ up to the point where half of the queue's WRs are still outstanding. And this one is in the other half of the queue. From olga.shern at gmail.com Thu Sep 11 05:12:18 2008 From: olga.shern at gmail.com (Olga Shern (Voltaire)) Date: Thu, 11 Sep 2008 15:12:18 +0300 Subject: [ofa-general] Re: [PATCH] ipoib: send creation parameters when doing send-only join In-Reply-To: References: <48C6A9C1.5070108@gmail.com> <48C790CF.4050505@gmail.com> Message-ID: > Yes, I looked at the bug and I don't see the actual problem that is > caused by the current code. OK, the group doesn't get created if there > are only senders -- so what? The issue occurs when the senders are on the InfiniBand side and the receivers are on the IP side, i.e. when the setup includes IP-to-IB gateways. > It seems a better fix would be just to get rid of the #if 0 and use > send-only membership after all these years? > So we are back to the same issue we raised in the following thread, which didn't get your reply: http://lists.openfabrics.org/pipermail/general/2008-July/053037.html Olga From julia at diku.dk Thu Sep 11 05:33:01 2008 From: julia at diku.dk (Julia Lawall) Date: Thu, 11 Sep 2008 14:33:01 +0200 (CEST) Subject: [ofa-general] [PATCH 1/5] drivers/infiniband/hw: Drop code after return Message-ID: From: Julia Lawall The break after the return serves no purpose.
Signed-off-by: Julia Lawall --- drivers/infiniband/hw/amso1100/c2_provider.c | 1 - drivers/infiniband/hw/nes/nes_verbs.c | 3 --- 2 files changed, 4 deletions(-) diff -u -p a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c --- a/drivers/infiniband/hw/amso1100/c2_provider.c +++ b/drivers/infiniband/hw/amso1100/c2_provider.c @@ -272,7 +272,6 @@ static struct ib_qp *c2_create_qp(struct pr_debug("%s: Invalid QP type: %d\n", __func__, init_attr->qp_type); return ERR_PTR(-EINVAL); - break; } if (err) { diff -u -p a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c --- a/drivers/infiniband/hw/nes/nes_verbs.c +++ b/drivers/infiniband/hw/nes/nes_verbs.c @@ -1467,7 +1467,6 @@ static struct ib_qp *nes_create_qp(struc default: nes_debug(NES_DBG_QP, "Invalid QP type: %d\n", init_attr->qp_type); return ERR_PTR(-EINVAL); - break; } /* update the QP table */ @@ -2498,7 +2497,6 @@ static struct ib_mr *nes_reg_user_mr(str nes_debug(NES_DBG_MR, "Leaving, ibmr=%p", ibmr); return ibmr; - break; case IWNES_MEMREG_TYPE_QP: case IWNES_MEMREG_TYPE_CQ: nespbl = kzalloc(sizeof(*nespbl), GFP_KERNEL); @@ -2572,7 +2570,6 @@ static struct ib_mr *nes_reg_user_mr(str nesmr->ibmr.lkey = -1; nesmr->mode = req.reg_type; return &nesmr->ibmr; - break; } return ERR_PTR(-ENOSYS); From richard.genoud at gmail.com Thu Sep 11 06:46:11 2008 From: richard.genoud at gmail.com (Richard Genoud) Date: Thu, 11 Sep 2008 15:46:11 +0200 Subject: [ofa-general] Re: [PATCH 1/5] drivers/infiniband/hw: Drop code after return In-Reply-To: References: Message-ID: <80b317760809110646j1e9c5171v3fe78546cd62c9a7@mail.gmail.com> 2008/9/11 Julia Lawall : > From: Julia Lawall > > The break after the return serves no purpose. 
> > Signed-off-by: Julia Lawall Reviewed-by: Richard Genoud > --- > drivers/infiniband/hw/amso1100/c2_provider.c | 1 - > drivers/infiniband/hw/nes/nes_verbs.c | 3 --- > 2 files changed, 4 deletions(-) > > diff -u -p a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c > --- a/drivers/infiniband/hw/amso1100/c2_provider.c > +++ b/drivers/infiniband/hw/amso1100/c2_provider.c > @@ -272,7 +272,6 @@ static struct ib_qp *c2_create_qp(struct > pr_debug("%s: Invalid QP type: %d\n", __func__, > init_attr->qp_type); > return ERR_PTR(-EINVAL); > - break; > } > > if (err) { > diff -u -p a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c > --- a/drivers/infiniband/hw/nes/nes_verbs.c > +++ b/drivers/infiniband/hw/nes/nes_verbs.c > @@ -1467,7 +1467,6 @@ static struct ib_qp *nes_create_qp(struc > default: > nes_debug(NES_DBG_QP, "Invalid QP type: %d\n", init_attr->qp_type); > return ERR_PTR(-EINVAL); > - break; > } > > /* update the QP table */ > @@ -2498,7 +2497,6 @@ static struct ib_mr *nes_reg_user_mr(str > nes_debug(NES_DBG_MR, "Leaving, ibmr=%p", ibmr); > > return ibmr; > - break; > case IWNES_MEMREG_TYPE_QP: > case IWNES_MEMREG_TYPE_CQ: > nespbl = kzalloc(sizeof(*nespbl), GFP_KERNEL); > @@ -2572,7 +2570,6 @@ static struct ib_mr *nes_reg_user_mr(str > nesmr->ibmr.lkey = -1; > nesmr->mode = req.reg_type; > return &nesmr->ibmr; > - break; > } > > return ERR_PTR(-ENOSYS); > From yossi.openib at gmail.com Thu Sep 11 09:09:34 2008 From: yossi.openib at gmail.com (Yossi Etigin) Date: Thu, 11 Sep 2008 19:09:34 +0300 Subject: ***SPAM*** Re: Fwd: [ofa-general] [PATCH] ipoib: fix hang while bringing down uninitialized interface In-Reply-To: References: <48C7DA7B.3050706@gmail.com> Message-ID: <48C942BE.7010606@gmail.com> > Looks like a real issue. Is this a regression from 2.6.26? (ie what > introduced this bug?) 
> Commit http://www.openfabrics.org/git/?p=ofed_1_4/linux-2.6.git;a=commit;h=57ce41d1d18279cc90223f3deadca70c7de1cfca introduced the bug in ipoib, but maybe this causes a hang only in recent kernels due to modifications in the timer code. --Yossi From tdhanu_2000 at yahoo.com Thu Sep 11 11:07:05 2008 From: tdhanu_2000 at yahoo.com (dhananjay tembe) Date: Thu, 11 Sep 2008 23:37:05 +0530 (IST) Subject: [ofa-general] ***SPAM*** Where can I find the topology file? Message-ID: <803776.49372.qm@web94206.mail.in2.yahoo.com> Hi, I am using the OFED stack and OpenSM. I was running some IB tools like ibdiagnet and ibdiagpath. The man page for ibdiagnet shows a -t option with which you can specify the topology file. Could you please tell me what this topology file contains and how I can create/generate it? Thanks in advance. ---Dhananjay. From sashak at voltaire.com Thu Sep 11 13:11:26 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 11 Sep 2008 23:11:26 +0300 Subject: [ofa-general] [PATCH] opensm/opensm.spec: comment out service auto-startup setup Message-ID: <20080911201126.GK25831@sashak.voltaire.com> This addresses bug#1181. Comment out the opensm service auto-startup setup in the %post section. Signed-off-by: Sasha Khapyorsky --- I don't really know why it was done this way originally. So please send any comments and/or objections.
opensm/opensm.spec.in | 10 +++++----- 1 files changed, 5 insertions(+), 5 deletions(-) diff --git a/opensm/opensm.spec.in b/opensm/opensm.spec.in index 2e3abfc..fc7677d 100644 --- a/opensm/opensm.spec.in +++ b/opensm/opensm.spec.in @@ -104,11 +104,11 @@ install -m 755 scripts/sldd.sh $RPM_BUILD_ROOT%{_sbindir}/sldd.sh rm -rf $RPM_BUILD_ROOT %post -if [ $1 = 1 ]; then - /sbin/chkconfig --add opensmd -else - /sbin/service opensmd condrestart -fi +#if [ $1 = 1 ]; then +# /sbin/chkconfig --add opensmd +#else +# /sbin/service opensmd condrestart +#fi %preun if [ $1 = 0 ]; then -- 1.5.4.rc2.60.gb2e62 From sashak at voltaire.com Thu Sep 11 13:36:27 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 11 Sep 2008 23:36:27 +0300 Subject: ***SPAM*** Re: [ofa-general] OpenSM Problems/Questions In-Reply-To: <20080909121140.1ec7838b.weiny2@llnl.gov> References: <20080909121140.1ec7838b.weiny2@llnl.gov> Message-ID: <20080911203627.GN25831@sashak.voltaire.com> On 12:11 Tue 09 Sep , Ira Weiny wrote: > > > > > The following problem that is being encountered may also be SA/SM related. A > > > node (NodeX) may be seen (through IPoIB) by all but a few nodes (NodesA-G). > > > A ping from those node (NodesA-G) to NodeX returns "Destination Host > > > Unreachable". A ping from NodeX to NodesA-G works. > > > > Sounds like perhaps those nodes were unable to join the broadcast > > group perhaps due to a rate issue. > > Hal is correct, and saquery is your friend here. If you use "genders" and > "whatsup" (https://computing.llnl.gov/linux/downloads.html) I have a series of > tools "Pragmatic InfiniBand Utilities (PIU)" > (https://computing.llnl.gov/linux/piu.html) which includes a tool called > "ibnodeinmcast" which can help debug this. What it does is use saquery [-g|-m] > to find nodes in the multicast groups. With the addition of other LLNL tools > this can be boiled down to which nodes "should" be in the group but are not. 
> You are welcome to download that package and adapt it to your environment. Also there was your fix (after OFED 1.3) which is pretty related to unstable links. Sasha commit e40NB597af556fce55e3b205b0cc4ffa6805aeaa Author: Ira Weiny Date: Thu Apr 24 18:16:57 2008 -0700 opensm/opensm/osm_lid_mgr.c: set "send_set" when setting rereg bit (Was: Re: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup.) I did not get any output with multicast_debug_level! But I added some more debugging and finally realized that the set was not being sent. :-( I put a debug statement in OpenSM where the flag was set and therefore thought that OpenSM had set the rereg bit. However, since no other data had changed the "set" MAD was not sent. (I am getting a bit tongue tied reading this back. I hope that all makes sense.) Here is a patch which fixes the problem. (At least with the partial sub-nets configuration I explained before.) I will have to verify this fixes the problem I originally reported. Ira From 2e5511d6daf9c586c39698416e4bd36e24b13e62 Mon Sep 17 00:00:00 2001 From: Ira K. Weiny Date: Thu, 24 Apr 2008 18:05:01 -0700 Subject: [PATCH] opensm/opensm/osm_lid_mgr.c: set "send_set" when setting rereg bit Signed-off-by: Ira K. 
Weiny Signed-off-by: Sasha Khapyorsky diff --git a/opensm/opensm/osm_lid_mgr.c b/opensm/opensm/osm_lid_mgr.c index ab23929..4d628d2 100644 --- a/opensm/opensm/osm_lid_mgr.c +++ b/opensm/opensm/osm_lid_mgr.c @@ -1099,9 +1099,14 @@ __osm_lid_mgr_set_physp_pi(IN osm_lid_mgr_t * const p_mgr, if ((p_mgr->p_subn->first_time_master_sweep == TRUE || p_port->is_new) && !p_mgr->p_subn->opt.no_clients_rereg && ((p_old_pi->capability_mask & IB_PORT_CAP_HAS_CLIENT_REREG) != - 0)) + 0)) { + OSM_LOG(p_mgr->p_log, OSM_LOG_DEBUG, + "Seting client rereg on %s, port %d\n", + p_port->p_node->print_desc, + p_port->p_physp->port_num); ib_port_info_set_client_rereg(p_pi, 1); - else + send_set = TRUE; + } else ib_port_info_set_client_rereg(p_pi, 0); /* We need to send the PortInfo Set request with the new sm_lid From weiny2 at llnl.gov Thu Sep 11 14:13:01 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Thu, 11 Sep 2008 14:13:01 -0700 Subject: ***SPAM*** Re: [ofa-general] OpenSM Problems/Questions In-Reply-To: <20080911203627.GN25831@sashak.voltaire.com> References: <20080909121140.1ec7838b.weiny2@llnl.gov> <20080911203627.GN25831@sashak.voltaire.com> Message-ID: <20080911141301.4823682d.weiny2@llnl.gov> On Thu, 11 Sep 2008 23:36:27 +0300 Sasha Khapyorsky wrote: > On 12:11 Tue 09 Sep , Ira Weiny wrote: > > > > > > > The following problem that is being encountered may also be SA/SM related. A > > > > node (NodeX) may be seen (through IPoIB) by all but a few nodes (NodesA-G). > > > > A ping from those node (NodesA-G) to NodeX returns "Destination Host > > > > Unreachable". A ping from NodeX to NodesA-G works. > > > > > > Sounds like perhaps those nodes were unable to join the broadcast > > > group perhaps due to a rate issue. > > > > Hal is correct, and saquery is your friend here. 
If you use "genders" and
> > "whatsup" (https://computing.llnl.gov/linux/downloads.html) I have a series of
> > tools "Pragmatic InfiniBand Utilities (PIU)"
> > (https://computing.llnl.gov/linux/piu.html) which includes a tool called
> > "ibnodeinmcast" which can help debug this. What it does is use saquery [-g|-m]
> > to find nodes in the multicast groups. With the addition of other LLNL tools
> > this can be boiled down to which nodes "should" be in the group but are not.
> > You are welcome to download that package and adapt it to your environment.
>
> Also there was your fix (after OFED 1.3) which is pretty related to
> unstable links.

True, but as I understood this is happening right after boot. Is this true?

Ira

>
> Sasha
>
>
> commit e40NB597af556fce55e3b205b0cc4ffa6805aeaa
> Author: Ira Weiny
> Date: Thu Apr 24 18:16:57 2008 -0700
>
> opensm/opensm/osm_lid_mgr.c: set "send_set" when setting rereg bit
>
> (Was: Re: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup.)
>
> I did not get any output with multicast_debug_level! But I added some more
> debugging and finally realized that the set was not being sent. :-( I put a
> debug statement in OpenSM where the flag was set and therefore thought that
> OpenSM had set the rereg bit. However, since no other data had changed the
> "set" MAD was not sent. (I am getting a bit tongue tied reading this back. I
> hope that all makes sense.)
>
> Here is a patch which fixes the problem. (At least with the partial sub-nets
> configuration I explained before.) I will have to verify this fixes the problem
> I originally reported.
>
> Ira
>
> From 2e5511d6daf9c586c39698416e4bd36e24b13e62 Mon Sep 17 00:00:00 2001
> From: Ira K. Weiny
> Date: Thu, 24 Apr 2008 18:05:01 -0700
> Subject: [PATCH] opensm/opensm/osm_lid_mgr.c: set "send_set" when setting rereg bit
>
> Signed-off-by: Ira K.
Weiny > Signed-off-by: Sasha Khapyorsky > > diff --git a/opensm/opensm/osm_lid_mgr.c b/opensm/opensm/osm_lid_mgr.c > index ab23929..4d628d2 100644 > --- a/opensm/opensm/osm_lid_mgr.c > +++ b/opensm/opensm/osm_lid_mgr.c > @@ -1099,9 +1099,14 @@ __osm_lid_mgr_set_physp_pi(IN osm_lid_mgr_t * const p_mgr, > if ((p_mgr->p_subn->first_time_master_sweep == TRUE || p_port->is_new) > && !p_mgr->p_subn->opt.no_clients_rereg > && ((p_old_pi->capability_mask & IB_PORT_CAP_HAS_CLIENT_REREG) != > - 0)) > + 0)) { > + OSM_LOG(p_mgr->p_log, OSM_LOG_DEBUG, > + "Seting client rereg on %s, port %d\n", > + p_port->p_node->print_desc, > + p_port->p_physp->port_num); > ib_port_info_set_client_rereg(p_pi, 1); > - else > + send_set = TRUE; > + } else > ib_port_info_set_client_rereg(p_pi, 0); > > /* We need to send the PortInfo Set request with the new sm_lid > From rdreier at cisco.com Thu Sep 11 14:19:24 2008 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 11 Sep 2008 14:19:24 -0700 Subject: [ofa-general] [PATCH] ipoib: defer skb_orphan() until irqs enabled In-Reply-To: <20080911113746.GA6298@mtls03> (Eli Cohen's message of "Thu, 11 Sep 2008 14:37:46 +0300") References: <20080909145435.GO2316@sgi.com> <20080910135116.GB26881@mtls03> <20080911113746.GA6298@mtls03> Message-ID: > I don't think there is a problem. The only SKB which is subject to > this race, is the one we that we posted right after stopping the net > queue. But the interrupt handler (resulting from arming the CQ) and > possibly the following timer invocations, will drain the CQ up to the > point where there are half the queue outstanding WRs. But and this one > is at the other half of the queue. Maybe I'm missing something but where is the logic that stops draining the CQ? 
I just see static int poll_tx(struct ipoib_dev_priv *priv) { int n, i; n = ib_poll_cq(priv->send_cq, MAX_SEND_CQE, priv->send_wc); for (i = 0; i < n; ++i) ipoib_ib_handle_tx_wc(priv->dev, priv->send_wc + i); return n == MAX_SEND_CQE; } and static void drain_tx_cq(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); unsigned long flags; spin_lock_irqsave(&priv->tx_lock, flags); while (poll_tx(priv)) ; /* nothing */ which seem like they could easily poll that last completion. From rdreier at cisco.com Thu Sep 11 14:20:39 2008 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 11 Sep 2008 14:20:39 -0700 Subject: Fwd: [ofa-general] [PATCH] ipoib: fix hang while bringing down uninitialized interface In-Reply-To: <48C942BE.7010606@gmail.com> (Yossi Etigin's message of "Thu, 11 Sep 2008 19:09:34 +0300") References: <48C7DA7B.3050706@gmail.com> <48C942BE.7010606@gmail.com> Message-ID: > Commit http://www.openfabrics.org/git/?p=ofed_1_4/linux-2.6.git;a=commit;h=57ce41d1d18279cc90223f3deadca70c7de1cfca > put the bug in ipoib, but maybe this causes a hang only in recent kernels > due to modifications in timer code. So it looks like not a regression from 2.6.26... I'll queue this for 2.6.28 From chu11 at llnl.gov Thu Sep 11 14:27:59 2008 From: chu11 at llnl.gov (Al Chu) Date: Thu, 11 Sep 2008 14:27:59 -0700 Subject: [ofa-general] [PATCH] opensm/opensm.spec: comment out service auto-startup setup In-Reply-To: <20080911201126.GK25831@sashak.voltaire.com> References: <20080911201126.GK25831@sashak.voltaire.com> Message-ID: <1221168479.19185.135.camel@cardanus.llnl.gov> Hey Sasha, Although the %post script below may not be 100% portable, I think it's pretty typical for system daemon rpms. A quick "rpm -q --scripts " shows its pretty common for system daemons on RHEL. It should be tweaked for portability rather than being removed. Personally, I've never done "/sbin/service FOO condrestart" in rpm scripts. I do "%{initrddir}/FOO condrestart". 
Maybe that's more portable?? Al On Thu, 2008-09-11 at 23:11 +0300, Sasha Khapyorsky wrote: > This addresses bug#1181. > > Comment out opensm service auto-startup setup at %post section. > > Signed-off-by: Sasha Khapyorsky > --- > > I don't really know why it was done this way originally. So please send > any comments and/or objections. > > opensm/opensm.spec.in | 10 +++++----- > 1 files changed, 5 insertions(+), 5 deletions(-) > > diff --git a/opensm/opensm.spec.in b/opensm/opensm.spec.in > index 2e3abfc..fc7677d 100644 > --- a/opensm/opensm.spec.in > +++ b/opensm/opensm.spec.in > @@ -104,11 +104,11 @@ install -m 755 scripts/sldd.sh $RPM_BUILD_ROOT%{_sbindir}/sldd.sh > rm -rf $RPM_BUILD_ROOT > > %post > -if [ $1 = 1 ]; then > - /sbin/chkconfig --add opensmd > -else > - /sbin/service opensmd condrestart > -fi > +#if [ $1 = 1 ]; then > +# /sbin/chkconfig --add opensmd > +#else > +# /sbin/service opensmd condrestart > +#fi > > %preun > if [ $1 = 0 ]; then -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory From rdreier at cisco.com Thu Sep 11 19:59:32 2008 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 11 Sep 2008 19:59:32 -0700 Subject: [ofa-general] Re: [PATCH] ipoib: send creation parameters when doing send-only join In-Reply-To: (Olga Shern's message of "Thu, 11 Sep 2008 15:12:18 +0300") References: <48C6A9C1.5070108@gmail.com> <48C790CF.4050505@gmail.com> Message-ID: > The issue accrues when senders are at Infiniband side and receivers > are at IP side, when setup includes IP to IB Gateways. Shouldn't the IP to IB gateway be a full member of any multicast groups it wants to forward? And figure out which groups to forward by snooping IGMP? 
> So we are back to the same issue we have raised in the following > thread and didn't get your reply > http://lists.openfabrics.org/pipermail/general/2008-July/053037.html Sorry I let that drop, but I don't see what that issue has to do with the question of whether IPoIB should finally use send-only membership? - R. From vlad at lists.openfabrics.org Fri Sep 12 03:07:27 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Fri, 12 Sep 2008 03:07:27 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080912-0200 daily build status Message-ID: <20080912100727.BEFD9E60E09@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with 
linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.24 Failed: From yossi.openib at gmail.com Fri Sep 12 04:22:09 2008 From: yossi.openib at gmail.com (Yossi Etigin) Date: Fri, 12 Sep 2008 14:22:09 +0300 Subject: [ofa-general] [PATCH v2] ipiob: fix rtnl deadlock In-Reply-To: References: <4899CF0A.1060509@Voltaire.COM> <32cb786f0808081155o19f8fb9dm217cd6996dffa3e5@mail.gmail.com> <32cb786f0808090538j272842b1r5117547cccde0d06@mail.gmail.com> <32cb786f0808161218o417553b5w1738a517f0eb468a@mail.gmail.com> Message-ID: <48CA50E1.2090309@gmail.com> Seems like taking rtnl_lock in ipoib_mcast_join_complete() also causes a deadlock. See bug #1186. Roland Dreier wrote: > > What if you bring the device down, while you get a join completion event? > > ipoib_stop() can run in parellel with ipoib_mcast_join_complete(), and you > > will just wait for ipoib_stop() to finish to do netif_carrier_on() afterwards. > > Yes, but after ipoib_stop() finishes, netif_carrier_on() doesn't do > anything that could cause a problem, since the netdev is down. > > - R. 
From eli at dev.mellanox.co.il Fri Sep 12 07:25:05 2008
From: eli at dev.mellanox.co.il (Eli Cohen)
Date: Fri, 12 Sep 2008 17:25:05 +0300
Subject: [ofa-general] [PATCH] ipoib: defer skb_orphan() until irqs enabled
In-Reply-To:
References: <20080909145435.GO2316@sgi.com> <20080910135116.GB26881@mtls03> <20080911113746.GA6298@mtls03>
Message-ID: <1221229505.6869.11.camel@eli-lt>

On Thu, 2008-09-11 at 14:19 -0700, Roland Dreier wrote:
> Maybe I'm missing something but where is the logic that stops draining
> the CQ? I just see
>

Well, there is a hole after all... you're right.

> static int poll_tx(struct ipoib_dev_priv *priv)
> {
>         int n, i;
>
>         n = ib_poll_cq(priv->send_cq, MAX_SEND_CQE, priv->send_wc);
>         for (i = 0; i < n; ++i)
>                 ipoib_ib_handle_tx_wc(priv->dev, priv->send_wc + i);
>
>         return n == MAX_SEND_CQE;
> }
>
> and
>
> static void drain_tx_cq(struct net_device *dev)
> {
>         struct ipoib_dev_priv *priv = netdev_priv(dev);
>         unsigned long flags;
>
>         spin_lock_irqsave(&priv->tx_lock, flags);
>         while (poll_tx(priv))
>                 ; /* nothing */
>
> which seem like they could easily poll that last completion.

Polling the last completion is not a problem in itself. It is only a
problem if this code polls the last completion before the transmit
function calls skb_orphan() on it, and I think that is unlikely to
happen. But if we agree that the SKB posted just before the queue is
stopped is the problematic one, we can extend the following condition
to be:

        if (unlikely(priv->tx_outstanding > MAX_SEND_CQE))
                while (poll_tx(priv))
                        ; /* nothing */

-       return ret;
+       return !(sent && priv->tx_outstanding && !netif_queue_stopped(dev));
 }

What do you think?
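As a rough illustration of why the loop quoted above reaps every completion that is currently available, here is a minimal user-space model of the same poll_tx()/drain_tx_cq() shape. The fake_cq type and fake_poll_cq() are hypothetical stand-ins for the real CQ and ib_poll_cq(); locking and all skb handling are left out, so this is only a sketch of the loop's termination condition, not of the driver.

```c
#include <assert.h>

#define MAX_SEND_CQE 16

/* Hypothetical stand-in for a send CQ: just a count of pending completions. */
struct fake_cq {
	int pending;
};

/* Stand-in for ib_poll_cq(): reap up to max entries and return how many. */
static int fake_poll_cq(struct fake_cq *cq, int max)
{
	int n;

	assert(max > 0);
	n = cq->pending < max ? cq->pending : max;
	cq->pending -= n;
	return n;
}

/* Same shape as the quoted poll_tx(): nonzero means "batch was full, poll again". */
static int poll_tx(struct fake_cq *cq, int *handled)
{
	int n = fake_poll_cq(cq, MAX_SEND_CQE);

	*handled += n;
	return n == MAX_SEND_CQE;
}

/* Same loop as the quoted drain_tx_cq(), minus locking: it stops only on a short batch. */
static int drain_tx_cq(struct fake_cq *cq)
{
	int handled = 0;

	while (poll_tx(cq, &handled))
		; /* nothing */
	return handled;
}
```

With 40 completions queued, the loop sees batches of 16, 16 and 8 and stops with nothing left pending, so in this model nothing prevents the completion of the most recently posted send from being reaped along with the rest.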
From rdreier at cisco.com Fri Sep 12 08:20:29 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 12 Sep 2008 08:20:29 -0700
Subject: [ofa-general] [PATCH v2] ipiob: fix rtnl deadlock
In-Reply-To: <48CA50E1.2090309@gmail.com> (Yossi Etigin's message of "Fri, 12 Sep 2008 14:22:09 +0300")
References: <4899CF0A.1060509@Voltaire.COM> <32cb786f0808081155o19f8fb9dm217cd6996dffa3e5@mail.gmail.com> <32cb786f0808090538j272842b1r5117547cccde0d06@mail.gmail.com> <32cb786f0808161218o417553b5w1738a517f0eb468a@mail.gmail.com> <48CA50E1.2090309@gmail.com>
Message-ID:

> Seems like taking rtnl_lock in ipoib_mcast_join_complete() also causes a deadlock.
> See bug #1186.

I have to admit the deadlock isn't obvious to me...
ipoib_mcast_join_complete() runs in the ib_mad1 thread, so I'm not sure
how that thread is getting flushed. Can you reproduce this deadlock
with lockdep enabled and get the output from that?

 - R.

From yossi.openib at gmail.com Fri Sep 12 08:25:26 2008
From: yossi.openib at gmail.com (Yossi Etigin)
Date: Fri, 12 Sep 2008 18:25:26 +0300
Subject: Re: [ofa-general] [PATCH v2] ipiob: fix rtnl deadlock
In-Reply-To:
References: <4899CF0A.1060509@Voltaire.COM> <32cb786f0808081155o19f8fb9dm217cd6996dffa3e5@mail.gmail.com> <32cb786f0808090538j272842b1r5117547cccde0d06@mail.gmail.com> <32cb786f0808161218o417553b5w1738a517f0eb468a@mail.gmail.com> <48CA50E1.2090309@gmail.com>
Message-ID: <48CA89E6.8030301@gmail.com>

ipoib_stop() calls ipoib_ib_dev_down(), which calls ipoib_mcast_dev_flush(),
which calls ipoib_mcast_free(), which calls ipoib_mcast_leave(). The latter
calls ib_sa_free_multicast(), which waits until the multicast completion
handler finishes. That handler happens to be ipoib_mcast_join_complete(),
which waits for rtnl_lock(), which was already taken by ipoib_stop().

Roland Dreier wrote:
> > Seems like taking rtnl_lock in ipoib_mcast_join_complete() also causes a deadlock.
> > See bug #1186.
>
> I have to admit the deadlock isn't obvious to
> me...
ipoib_mcast_join_complete() runs in the ib_mad1 thread, so I'm not > sure how that thread is getting flushed. Can you reproduce this > deadlock with lockdep enabled and get the output from that? > > - R. > From ctung at neteffect.com Fri Sep 12 09:22:15 2008 From: ctung at neteffect.com (Chien Tung) Date: Fri, 12 Sep 2008 11:22:15 -0500 Subject: [ofa-general] ***SPAM*** [PATCH] RDMA/nes: client side QP destroy Message-ID: <200809121622.m8CGMFZ6001609@velma.neteffect.com> Author: Faisal Latif * Fixed QP not destroyed properly on the client. * Misc cleanup in nes_cm.c patch verified with rping. Signed-off-by: Faisal Latif -- Roland, Please consider this for 2.6.27. It has been applied and tested against 2.6.27-rc5. drivers/infiniband/hw/nes/nes_cm.c | 20 +++++++------------- 1 files changed, 7 insertions(+), 13 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c index 9f0b964..8793aa4 100644 --- a/drivers/infiniband/hw/nes/nes_cm.c +++ b/drivers/infiniband/hw/nes/nes_cm.c @@ -1145,7 +1145,7 @@ static int rem_ref_cm_node(struct nes_cm_core *cm_core, struct nes_timer_entry *recv_entry; struct iw_cm_id *cm_id; struct list_head *list_core, *list_node_temp; - struct nes_qp *nesqp; + struct nes_qp *nesqp = NULL; if (!cm_node) return -EINVAL; @@ -1826,7 +1826,7 @@ static struct nes_cm_listener *mini_cm_listen(struct nes_cm_core *cm_core, /** * mini_cm_connect - make a connection node with params */ -struct nes_cm_node *mini_cm_connect(struct nes_cm_core *cm_core, +static struct nes_cm_node *mini_cm_connect(struct nes_cm_core *cm_core, struct nes_vnic *nesvnic, u16 private_data_len, void *private_data, struct nes_cm_info *cm_info) { @@ -1835,7 +1835,7 @@ struct nes_cm_node *mini_cm_connect(struct nes_cm_core *cm_core, struct nes_cm_listener *loopbackremotelistener; struct nes_cm_node *loopbackremotenode; struct nes_cm_info loopback_cm_info; - u16 mpa_frame_size = sizeof(struct ietf_mpa_frame) + private_data_len; + u16 
mpa_frame_size = 0; struct ietf_mpa_frame *mpa_frame = NULL; /* create a CM connection node */ @@ -1847,7 +1847,8 @@ struct nes_cm_node *mini_cm_connect(struct nes_cm_core *cm_core, mpa_frame->flags = IETF_MPA_FLAGS_CRC; mpa_frame->rev = IETF_MPA_VERSION; mpa_frame->priv_data_len = htons(private_data_len); - + mpa_frame_size = sizeof(struct ietf_mpa_frame) + + private_data_len; /* set our node side to client (active) side */ cm_node->tcp_cntxt.client = 1; cm_node->tcp_cntxt.rcv_wscale = NES_CM_DEFAULT_RCV_WND_SCALE; @@ -1956,13 +1957,6 @@ static int mini_cm_reject(struct nes_cm_core *cm_core, return ret; cleanup_retrans_entry(cm_node); cm_node->state = NES_CM_STATE_CLOSED; - ret = send_fin(cm_node, NULL); - - if (cm_node->accept_pend) { - BUG_ON(!cm_node->listener); - atomic_dec(&cm_node->listener->pend_accepts_cnt); - BUG_ON(atomic_read(&cm_node->listener->pend_accepts_cnt) < 0); - } ret = send_reset(cm_node, NULL); return ret; @@ -2383,6 +2377,7 @@ static int nes_cm_disconn_true(struct nes_qp *nesqp) atomic_inc(&cm_disconnects); cm_event.event = IW_CM_EVENT_DISCONNECT; if (last_ae == NES_AEQE_AEID_LLP_CONNECTION_RESET) { + issued_disconnect_reset = 1; cm_event.status = IW_CM_EVENT_STATUS_RESET; nes_debug(NES_DBG_CM, "Generating a CM " "Disconnect Event (status reset) for " @@ -2508,7 +2503,6 @@ static int nes_disconnect(struct nes_qp *nesqp, int abrupt) nes_debug(NES_DBG_CM, "Call close API\n"); g_cm_core->api->close(g_cm_core, nesqp->cm_node); - nesqp->cm_node = NULL; } return ret; @@ -2837,6 +2831,7 @@ int nes_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param) cm_node->apbvt_set = 1; nesqp->cm_node = cm_node; cm_node->nesqp = nesqp; + nes_add_ref(&nesqp->ibqp); return 0; } @@ -3167,7 +3162,6 @@ static void cm_event_connect_error(struct nes_cm_event *event) if (ret) printk(KERN_ERR "%s[%u] OFA CM event_handler returned, " "ret=%d\n", __func__, __LINE__, ret); - nes_rem_ref(&nesqp->ibqp); cm_id->rem_ref(cm_id); 
rem_ref_cm_node(event->cm_node->cm_core, event->cm_node); From ctung at neteffect.com Fri Sep 12 09:22:15 2008 From: ctung at neteffect.com (Chien Tung) Date: Fri, 12 Sep 2008 11:22:15 -0500 Subject: [ofa-general] ***SPAM*** [PATCH] RDMA/nes: 4 port 1G HP blade card support Message-ID: <200809121622.m8CGMFVS001611@velma.neteffect.com> * Adding support for NetEffect 4 port 1G HP blade card. The mapping between physical port and MAC is different from the standup card. Signed-off-by: Chien Tung -- Roland, Please consider this for 2.6.27. It has been applied and tested against 2.6.27-rc5. drivers/infiniband/hw/nes/nes.c | 29 +++++++++++++--- drivers/infiniband/hw/nes/nes_hw.c | 66 +++++++++++++++++++++++++++-------- drivers/infiniband/hw/nes/nes_hw.h | 1 + 3 files changed, 76 insertions(+), 20 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes.c b/drivers/infiniband/hw/nes/nes.c index b0cab64..a539685 100644 --- a/drivers/infiniband/hw/nes/nes.c +++ b/drivers/infiniband/hw/nes/nes.c @@ -562,7 +562,26 @@ static int __devinit nes_probe(struct pci_dev *pcidev, const struct pci_device_i nesdev->nesadapter->pd_config_base[PCI_FUNC(nesdev->pcidev->devfn)]; */ nesdev->base_doorbell_index = 1; nesdev->doorbell_start = nesdev->nesadapter->doorbell_start; - nesdev->mac_index = PCI_FUNC(nesdev->pcidev->devfn) % nesdev->nesadapter->port_count; + if (nesdev->nesadapter->phy_type[0] == NES_PHY_TYPE_PUMA_1G) { + switch (PCI_FUNC(nesdev->pcidev->devfn) % + nesdev->nesadapter->port_count) { + case 1: + nesdev->mac_index = 2; + break; + case 2: + nesdev->mac_index = 1; + break; + case 3: + nesdev->mac_index = 3; + break; + case 0: + default: + nesdev->mac_index = 0; + } + } else { + nesdev->mac_index = PCI_FUNC(nesdev->pcidev->devfn) % + nesdev->nesadapter->port_count; + } tasklet_init(&nesdev->dpc_tasklet, nes_dpc, (unsigned long)nesdev); @@ -581,7 +600,7 @@ static int __devinit nes_probe(struct pci_dev *pcidev, const struct pci_device_i nesdev->int_req = (0x101 << 
PCI_FUNC(nesdev->pcidev->devfn)) | (1 << (PCI_FUNC(nesdev->pcidev->devfn)+16)); if (PCI_FUNC(nesdev->pcidev->devfn) < 4) { - nesdev->int_req |= (1 << (PCI_FUNC(nesdev->pcidev->devfn)+24)); + nesdev->int_req |= (1 << (PCI_FUNC(nesdev->mac_index)+24)); } /* TODO: This really should be the first driver to load, not function 0 */ @@ -772,14 +791,14 @@ static ssize_t nes_show_adapter(struct device_driver *ddp, char *buf) list_for_each_entry(nesdev, &nes_dev_list, list) { if (i == ee_flsh_adapter) { - devfn = nesdev->nesadapter->devfn; - bus_number = nesdev->nesadapter->bus_number; + devfn = nesdev->pcidev->devfn; + bus_number = nesdev->pcidev->bus->number; break; } i++; } - return snprintf(buf, PAGE_SIZE, "%x:%x", bus_number, devfn); + return snprintf(buf, PAGE_SIZE, "%x:%x\n", bus_number, devfn); } static ssize_t nes_store_adapter(struct device_driver *ddp, diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c index 1513d40..bdd98e6 100644 --- a/drivers/infiniband/hw/nes/nes_hw.c +++ b/drivers/infiniband/hw/nes/nes_hw.c @@ -61,7 +61,7 @@ u32 int_mod_cq_depth_1; static void nes_cqp_ce_handler(struct nes_device *nesdev, struct nes_hw_cq *cq); static void nes_init_csr_ne020(struct nes_device *nesdev, u8 hw_rev, u8 port_count); static int nes_init_serdes(struct nes_device *nesdev, u8 hw_rev, u8 port_count, - u8 OneG_Mode); + struct nes_adapter *nesadapter, u8 OneG_Mode); static void nes_nic_napi_ce_handler(struct nes_device *nesdev, struct nes_hw_nic_cq *cq); static void nes_process_aeq(struct nes_device *nesdev, struct nes_hw_aeq *aeq); static void nes_process_ceq(struct nes_device *nesdev, struct nes_hw_ceq *ceq); @@ -292,9 +292,6 @@ struct nes_adapter *nes_init_adapter(struct nes_device *nesdev, u8 hw_rev) { if ((port_count = nes_reset_adapter_ne020(nesdev, &OneG_Mode)) == 0) return NULL; - if (nes_init_serdes(nesdev, hw_rev, port_count, OneG_Mode)) - return NULL; - nes_init_csr_ne020(nesdev, hw_rev, port_count); max_qp = 
nes_read_indexed(nesdev, NES_IDX_QP_CTX_SIZE); nes_debug(NES_DBG_INIT, "QP_CTX_SIZE=%u\n", max_qp); @@ -353,6 +350,19 @@ struct nes_adapter *nes_init_adapter(struct nes_device *nesdev, u8 hw_rev) { nes_debug(NES_DBG_INIT, "Allocating new nesadapter @ %p, size = %u (actual size = %u).\n", nesadapter, (u32)sizeof(struct nes_adapter), adapter_size); + if (nes_read_eeprom_values(nesdev, nesadapter)) { + printk(KERN_ERR PFX "Unable to read EEPROM data.\n"); + kfree(nesadapter); + return NULL; + } + + if (nes_init_serdes(nesdev, hw_rev, port_count, nesadapter, + OneG_Mode)) { + kfree(nesadapter); + return NULL; + } + nes_init_csr_ne020(nesdev, hw_rev, port_count); + /* populate the new nesadapter */ nesadapter->devfn = nesdev->pcidev->devfn; nesadapter->bus_number = nesdev->pcidev->bus->number; @@ -468,20 +478,25 @@ struct nes_adapter *nes_init_adapter(struct nes_device *nesdev, u8 hw_rev) { /* setup port configuration */ if (nesadapter->port_count == 1) { - u32temp = 0x00000000; + nesadapter->log_port = 0x00000000; if (nes_drv_opt & NES_DRV_OPT_DUAL_LOGICAL_PORT) nes_write_indexed(nesdev, NES_IDX_TX_POOL_SIZE, 0x00000002); else nes_write_indexed(nesdev, NES_IDX_TX_POOL_SIZE, 0x00000003); } else { - if (nesadapter->port_count == 2) - u32temp = 0x00000044; - else - u32temp = 0x000000e4; + if (nesadapter->phy_type[0] == NES_PHY_TYPE_PUMA_1G) { + nesadapter->log_port = 0x000000D8; + } else { + if (nesadapter->port_count == 2) + nesadapter->log_port = 0x00000044; + else + nesadapter->log_port = 0x000000e4; + } nes_write_indexed(nesdev, NES_IDX_TX_POOL_SIZE, 0x00000003); } - nes_write_indexed(nesdev, NES_IDX_NIC_LOGPORT_TO_PHYPORT, u32temp); + nes_write_indexed(nesdev, NES_IDX_NIC_LOGPORT_TO_PHYPORT, + nesadapter->log_port); nes_debug(NES_DBG_INIT, "Probe time, LOG2PHY=%u\n", nes_read_indexed(nesdev, NES_IDX_NIC_LOGPORT_TO_PHYPORT)); @@ -706,23 +721,43 @@ static unsigned int nes_reset_adapter_ne020(struct nes_device *nesdev, u8 *OneG_ * nes_init_serdes */ static int 
nes_init_serdes(struct nes_device *nesdev, u8 hw_rev, u8 port_count, - u8 OneG_Mode) + struct nes_adapter *nesadapter, u8 OneG_Mode) { int i; u32 u32temp; + u32 serdes_common_control; if (hw_rev != NE020_REV) { /* init serdes 0 */ nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_CDR_CONTROL0, 0x000000FF); - if (!OneG_Mode) + if (nesadapter->phy_type[0] == NES_PHY_TYPE_PUMA_1G) { + serdes_common_control = nes_read_indexed(nesdev, + NES_IDX_ETH_SERDES_COMMON_CONTROL0); + serdes_common_control |= 0x000000100; + nes_write_indexed(nesdev, + NES_IDX_ETH_SERDES_COMMON_CONTROL0, + serdes_common_control); + } else if (!OneG_Mode) { nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_TX_HIGHZ_LANE_MODE0, 0x11110000); - if (port_count > 1) { + } + if (((port_count > 1) && + (nesadapter->phy_type[0] != NES_PHY_TYPE_PUMA_1G)) || + ((port_count > 2) && + (nesadapter->phy_type[0] == NES_PHY_TYPE_PUMA_1G))) { /* init serdes 1 */ nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_CDR_CONTROL1, 0x000000FF); - if (!OneG_Mode) + if (nesadapter->phy_type[0] == NES_PHY_TYPE_PUMA_1G) { + serdes_common_control = nes_read_indexed(nesdev, + NES_IDX_ETH_SERDES_COMMON_CONTROL1); + serdes_common_control |= 0x000000100; + nes_write_indexed(nesdev, + NES_IDX_ETH_SERDES_COMMON_CONTROL1, + serdes_common_control); + } else if (!OneG_Mode) { nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_TX_HIGHZ_LANE_MODE1, 0x11110000); } + } } else { /* init serdes 0 */ nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0, 0x00000008); @@ -2258,7 +2293,8 @@ static void nes_process_mac_intr(struct nes_device *nesdev, u32 mac_number) spin_unlock_irqrestore(&nesadapter->phy_lock, flags); } /* read the PHY interrupt status register */ - if (nesadapter->OneG_Mode) { + if ((nesadapter->OneG_Mode) && + (nesadapter->phy_type[mac_index] != NES_PHY_TYPE_PUMA_1G)) { do { nes_read_1G_phy_reg(nesdev, 0x1a, nesadapter->phy_index[mac_index], &phy_data); diff --git a/drivers/infiniband/hw/nes/nes_hw.h b/drivers/infiniband/hw/nes/nes_hw.h 
index 7b81e0a..fc0f063 100644 --- a/drivers/infiniband/hw/nes/nes_hw.h +++ b/drivers/infiniband/hw/nes/nes_hw.h @@ -1100,6 +1100,7 @@ struct nes_adapter { u8 mac_sw_state[4]; u8 mac_link_down[4]; u8 phy_type[4]; + u8 log_port; /* PCI information */ unsigned int devfn; From AHKumar at odu.edu Fri Sep 12 09:43:09 2008 From: AHKumar at odu.edu (Kumar, Amit H.) Date: Fri, 12 Sep 2008 12:43:09 -0400 Subject: [ofa-general] Usage of Infiniband Protocol Stack ? Message-ID: I have some applications(mvapich2, pvfs2 ...) compiled to use the OFED Infiniband protocol stack. May be a stupid question ..: Is it valid to see at the "ifconfig ib0" stats to report the usage of IB protocol stack, regardless what application I making use of the IB stack.? Thank you, Amit -------------- next part -------------- An HTML attachment was scrubbed... URL: From landman at scalableinformatics.com Fri Sep 12 09:46:44 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Fri, 12 Sep 2008 12:46:44 -0400 Subject: [ofa-general] Usage of Infiniband Protocol Stack ? In-Reply-To: References: Message-ID: <48CA9CF4.3070502@scalableinformatics.com> Kumar, Amit H. wrote: > I have some applications(mvapich2, pvfs2 …) compiled to use the OFED > Infiniband protocol stack. > > May be a stupid question ..: > Is it valid to see at the “ifconfig ib0” stats to report the usage of IB > protocol stack, regardless what application I making use of the IB stack.? Hi Amit: Only if they have loaded/configured IPoIB. If they haven't configured it, you might be able to try ibnodes and see if this reports anything on the network. We do tend to configure this for our customers clusters precisely as a diagnostics/testing tool (and as a way to enable infiniband-ignoring MPI stacks such as MPICH1/2 to have a fighting chance of using infiniband). 
Joe > > Thank you, > Amit > > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 cell : +1 734 612 4615 From chu11 at llnl.gov Fri Sep 12 09:59:50 2008 From: chu11 at llnl.gov (Al Chu) Date: Fri, 12 Sep 2008 09:59:50 -0700 Subject: [ofa-general] [OpenSM][Trivial] fix routing algorithm description Message-ID: <1221238790.6274.7.camel@cardanus.llnl.gov> Hey Sasha, I think the text was just old. There are more algorithms than just minhop, updn, and file nowadays. Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-fix-minhop-algorithm-description.patch Type: text/x-patch Size: 1698 bytes Desc: not available URL: From hal.rosenstock at gmail.com Fri Sep 12 10:08:53 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Fri, 12 Sep 2008 13:08:53 -0400 Subject: ***SPAM*** Re: [ofa-general] Usage of Infiniband Protocol Stack ? In-Reply-To: References: Message-ID: On Fri, Sep 12, 2008 at 12:43 PM, Kumar, Amit H. wrote: > I have some applications(mvapich2, pvfs2 …) compiled to use the OFED > Infiniband protocol stack. > > May be a stupid question ..: > Is it valid to see at the "ifconfig ib0" stats to report the usage of IB > protocol stack, regardless what application I making use of the IB stack.? ifconfig for IB interfaces shows the IPoIB stats. "Pure" IB stats are available from the PMA. 
These counters (data in units of 4 bytes, and packets, each for in and
out) are totals across all applications being run. They can be obtained
with the perfquery diagnostic tool or via a Performance Manager.

-- Hal

> Thank you,
> Amit

From sashak at voltaire.com Fri Sep 12 11:24:10 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Fri, 12 Sep 2008 21:24:10 +0300
Subject: [ofa-general] Re: [OpenSM][Trivial] fix routing algorithm description
In-Reply-To: <1221238790.6274.7.camel@cardanus.llnl.gov>
References: <1221238790.6274.7.camel@cardanus.llnl.gov>
Message-ID: <20080912182410.GA17315@sashak.voltaire.com>

On 09:59 Fri 12 Sep , Al Chu wrote:
> Hey Sasha,
>
> I think the text was just old. There are more algorithms than just
> minhop, updn, and file nowadays.
>
> Al
>
> --
> Albert Chu
> chu11 at llnl.gov
> 925-422-5311
> Computer Scientist
> High Performance Systems Division
> Lawrence Livermore National Laboratory

> From 6014c58bc3e63df98135bbb987e89e9b3ae4f706 Mon Sep 17 00:00:00 2001
> From: Albert Chu
> Date: Fri, 12 Sep 2008 09:55:30 -0700
> Subject: [PATCH] fix minhop algorithm description
>
>
> Signed-off-by: Albert Chu

Applied. Thanks.
Sasha From rdreier at cisco.com Fri Sep 12 11:34:09 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 12 Sep 2008 11:34:09 -0700 Subject: [ofa-general] [PATCH v2] ipiob: fix rtnl deadlock In-Reply-To: <48CA89E6.8030301@gmail.com> (Yossi Etigin's message of "Fri, 12 Sep 2008 18:25:26 +0300") References: <4899CF0A.1060509@Voltaire.COM> <32cb786f0808081155o19f8fb9dm217cd6996dffa3e5@mail.gmail.com> <32cb786f0808090538j272842b1r5117547cccde0d06@mail.gmail.com> <32cb786f0808161218o417553b5w1738a517f0eb468a@mail.gmail.com> <48CA50E1.2090309@gmail.com> <48CA89E6.8030301@gmail.com> Message-ID: > ipoib_stop() calls ipoib_ib_dev_down() which calls ipoib_mcast_dev_flush() > which calls ipoib_mcast_free(), which calls ipoib_mcast_leave(). The latter > calls ib_sa_free_multicast(), and this wait until the multicast > completion handler finishes. This happens to be > ipoib_mcast_join_complete(), which > waits for the rtnl_lock(), whcih was already taken by ipoib_stop(). I see... I wonder why lockdep didn't warn about this in my testing. Anyway, any ideas how we want to fix this? - R. From sashak at voltaire.com Fri Sep 12 11:34:07 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 12 Sep 2008 21:34:07 +0300 Subject: ***SPAM*** Re: [ofa-general] OpenSM Problems/Questions In-Reply-To: <20080911141301.4823682d.weiny2@llnl.gov> References: <20080909121140.1ec7838b.weiny2@llnl.gov> <20080911203627.GN25831@sashak.voltaire.com> <20080911141301.4823682d.weiny2@llnl.gov> Message-ID: <20080912183407.GB17315@sashak.voltaire.com> On 14:13 Thu 11 Sep , Ira Weiny wrote: > > > > Also there was your fix (after OFED 1.3) which is pretty related to > > unstable links. > > True, but as I understood this is happening right after boot. Is this true? I think it could be related to a group of nodes where links are unstable. Of course I may be wrong about it - Matt should know better. And if it is - the fix should be relevant. 
Sasha

From AHKumar at odu.edu Fri Sep 12 12:24:49 2008
From: AHKumar at odu.edu (Kumar, Amit H.)
Date: Fri, 12 Sep 2008 15:24:49 -0400
Subject: [ofa-general] Usage of Infiniband Protocol Stack ?
In-Reply-To:
References:
Message-ID:

Thank you Hal & Joe for your prompt reply.

Two more questions:

If ifconfig for IB is just for IPoIB, is it okay to bring down this
interface (ib0) and still be able to run applications like Mvapich2 and
pvfs2?

I also assume that we need at least one Ethernet interface up for
IB-compiled applications to operate correctly, and that without it they
will fail. Is this correct?

Thank you,
Amit

> -----Original Message-----
> From: Hal Rosenstock [mailto:hal.rosenstock at gmail.com]
> Sent: Friday, September 12, 2008 1:09 PM
> To: Kumar, Amit H.
> Cc: general at lists.openfabrics.org
> Subject: Re: [ofa-general] Usage of Infiniband Protocol Stack ?
>
> On Fri, Sep 12, 2008 at 12:43 PM, Kumar, Amit H. wrote:
> > I have some applications (mvapich2, pvfs2 ...) compiled to use the OFED
> > Infiniband protocol stack.
> >
> > May be a stupid question:
> > Is it valid to look at the "ifconfig ib0" stats to report the usage of
> > the IB protocol stack, regardless of which application is making use of
> > the IB stack?
>
> ifconfig for IB interfaces shows the IPoIB stats.
>
> "Pure" IB stats are available from the PMA. These stats
> (bytes*4,packets x in/out) are total (across all applications being
> run). They can be obtained by the perfquery diagnostic tool or via a
> Performance Manager.
>
> -- Hal
>
> > Thank you,
> > Amit

From aj.guillon at gmail.com Fri Sep 12 16:52:19 2008
From: aj.guillon at gmail.com (Adrien Guillon)
Date: Fri, 12 Sep 2008 19:52:19 -0400
Subject: [ofa-general] Sharing CQs across multiple connections with librdmacm
Message-ID: <9870a2060809121652g3a54f817x6d92ad953bcf863f@mail.gmail.com>

Hey...

I want to allocate a send CQ and receive CQ for each HCA, to be shared
by all connections using that HCA. This seems possible according to the
Infiniband standard, but I can't see how to do this in practice using
the ibverbs. I'm using librdmacm for the actual connections.

My problem is that ibv_create_cq() takes an ibv_context* as an argument.
With librdmacm, I can get this through rdma_cm_id->verbs. However, it
looks like ibv_context objects are associated with particular
connections, not particular HCAs, which is what is confusing me. It
seems to me that ibv_create_cq() should be associated with a handle to
the HCA itself, as the "Infiniband Network Architecture" book says.

Ideally I would allocate a data structure with HCA-specific data for
each device (e.g. PD, CQ, etc.) and use the kernel name (e.g. mthca0) to
look up the HCA-specific data. That way I can check the ibv_context to
see if I can use existing HCA-specific data or must create new.

Whew. So the question is... how do I do this given that ibv_create_cq()
takes ibv_context* as an argument? Will it internally just use the
ibv_context to look up the device? What happens when that ibv_context is
destroyed, but I want the PD to remain open (e.g. connection destroyed,
others still open)?
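A minimal sketch of the per-HCA lookup described above. It assumes (this is not confirmed anywhere in the thread) that every connection on the same HCA reports the same context pointer, so the pointer itself can serve as the lookup key. The names hca_resources, hca_get and hca_put are hypothetical, and the pd/send_cq/recv_cq fields are placeholders for whatever ibv_alloc_pd() and ibv_create_cq() would return:

```c
/* Sketch only: one shared-resource record per HCA, keyed by the device
 * context pointer.  ASSUMPTION: all connections on one HCA report the
 * same context pointer (e.g. the same rdma_cm_id->verbs value). */
#include <assert.h>
#include <stddef.h>

#define MAX_DEVICES 8

/* Hypothetical per-HCA bundle of shared objects. */
struct hca_resources {
	const void *ctx;	/* device context, used as the lookup key */
	void *pd;		/* placeholder for the shared PD */
	void *send_cq;		/* placeholder for the shared send CQ */
	void *recv_cq;		/* placeholder for the shared receive CQ */
	int refcount;		/* connections currently using this entry */
};

static struct hca_resources hca_table[MAX_DEVICES];

/* Return the entry for ctx, creating it on first use.
 * Returns NULL if the table is full. */
struct hca_resources *hca_get(const void *ctx)
{
	struct hca_resources *free_slot = NULL;
	size_t i;

	assert(ctx != NULL);
	for (i = 0; i < MAX_DEVICES; i++) {
		if (hca_table[i].ctx == ctx) {
			hca_table[i].refcount++;
			return &hca_table[i];
		}
		if (hca_table[i].ctx == NULL && free_slot == NULL)
			free_slot = &hca_table[i];
	}
	if (free_slot == NULL)
		return NULL;

	/* First connection on this device: real code would create the
	 * PD and CQs here and store them in the entry. */
	free_slot->ctx = ctx;
	free_slot->refcount = 1;
	return free_slot;
}

/* Drop one reference and return the number of remaining users.
 * The entry is kept even at zero so the shared objects can outlive
 * any single connection. */
int hca_put(struct hca_resources *res)
{
	assert(res != NULL && res->refcount > 0);
	return --res->refcount;
}
```

A connection handler would call hca_get() with the connection's context pointer when a connection arrives and hca_put() when it closes. Keeping the entry at a count of zero is a design choice for this sketch, made so that closing one connection never tears down objects that later connections on the same HCA may want to reuse.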
Can I create CQs and PDs using ibv_device at initialization time, so I
don't have to wait for the first connection on each device to come in?

Thanks!

AJ

From sashak at voltaire.com Fri Sep 12 18:18:09 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sat, 13 Sep 2008 04:18:09 +0300
Subject: [ofa-general] [PATCH] infiniband-diags/ibtracert: fix port by direct path resolving
Message-ID: <20080913011809.GC17315@sashak.voltaire.com>

When option '-D' is used, ports are provided to ibtracert in direct path
format. This option was broken (bug #1136) due to incorrect resolution:
the LID was missing.

This addresses bug #1136.

Signed-off-by: Sasha Khapyorsky
---
 infiniband-diags/src/ibtracert.c | 23 +++++++++++++++++++++++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/infiniband-diags/src/ibtracert.c b/infiniband-diags/src/ibtracert.c
index eb9329c..21edfba 100644
--- a/infiniband-diags/src/ibtracert.c
+++ b/infiniband-diags/src/ibtracert.c
@@ -673,6 +673,20 @@ free_name:
 	free(nodename);
 }
 
+static int resolve_lid(ib_portid_t *portid, const void *srcport)
+{
+	uint8_t portinfo[64];
+	uint16_t lid;
+
+	if (!smp_query_via(portinfo, portid, IB_ATTR_PORT_INFO, 0, 0, srcport))
+		return -1;
+	mad_decode_field(portinfo, IB_PORT_LID_F, &lid);
+
+	ib_portid_set(portid, lid, 0, 0);
+
+	return 0;
+}
+
 static void
 usage(void)
 {
@@ -806,6 +820,15 @@ main(int argc, char **argv)
 	if (ib_resolve_portid_str(&dest_portid, argv[1], dest_type, sm_id) < 0)
 		IBERROR("can't resolve destination port %s", argv[1]);
 
+	if (dest_type == IB_DEST_DRPATH) {
+		if (resolve_lid(&src_portid, NULL) < 0)
+			IBERROR("cannot resolve lid for port \'%s\'",
+				portid2str(&src_portid));
+		if (resolve_lid(&dest_portid, NULL) < 0)
+			IBERROR("cannot resolve lid for port \'%s\'",
+				portid2str(&dest_portid));
+	}
+
 	if (dest_portid.lid == 0 || src_portid.lid == 0) {
 		IBWARN("bad src/dest lid");
 		usage();
-- 
1.6.0.1.196.g01914

From sashak at voltaire.com Fri Sep 12 19:12:00 2008
From: sashak at voltaire.com (Sasha
Khapyorsky) Date: Sat, 13 Sep 2008 05:12:00 +0300 Subject: [ofa-general] [PATCH] opensm/opensm.spec: comment out service auto-startup setup In-Reply-To: <1221168479.19185.135.camel@cardanus.llnl.gov> References: <20080911201126.GK25831@sashak.voltaire.com> <1221168479.19185.135.camel@cardanus.llnl.gov> Message-ID: <20080913021200.GE17315@sashak.voltaire.com> Hi Al, On 14:27 Thu 11 Sep , Al Chu wrote: > > Although the %post script below may not be 100% portable, I think it's > pretty typical for system daemon rpms. A quick "rpm -q --scripts > " shows its pretty common for system daemons on RHEL. It > should be tweaked for portability rather than being removed. The issue is that it starts opensm service on boot automatically after installation (without user requesting this with 'chkconfig' or so) - see bug #1181 - https://bugs.openfabrics.org/show_bug.cgi?id=1181 > Personally, I've never done "/sbin/service FOO condrestart" in rpm > scripts. I do "%{initrddir}/FOO condrestart". Maybe that's more > portable?? So script itself should support 'condrestart' command? Sasha > > Al > > On Thu, 2008-09-11 at 23:11 +0300, Sasha Khapyorsky wrote: > > This addresses bug#1181. > > > > Comment out opensm service auto-startup setup at %post section. > > > > Signed-off-by: Sasha Khapyorsky > > --- > > > > I don't really know why it was done this way originally. So please send > > any comments and/or objections. 
> > > > opensm/opensm.spec.in | 10 +++++----- > > 1 files changed, 5 insertions(+), 5 deletions(-) > > > > diff --git a/opensm/opensm.spec.in b/opensm/opensm.spec.in > > index 2e3abfc..fc7677d 100644 > > --- a/opensm/opensm.spec.in > > +++ b/opensm/opensm.spec.in > > @@ -104,11 +104,11 @@ install -m 755 scripts/sldd.sh $RPM_BUILD_ROOT%{_sbindir}/sldd.sh > > rm -rf $RPM_BUILD_ROOT > > > > %post > > -if [ $1 = 1 ]; then > > - /sbin/chkconfig --add opensmd > > -else > > - /sbin/service opensmd condrestart > > -fi > > +#if [ $1 = 1 ]; then > > +# /sbin/chkconfig --add opensmd > > +#else > > +# /sbin/service opensmd condrestart > > +#fi > > > > %preun > > if [ $1 = 0 ]; then > -- > Albert Chu > chu11 at llnl.gov > 925-422-5311 > Computer Scientist > High Performance Systems Division > Lawrence Livermore National Laboratory > From chu11 at llnl.gov Fri Sep 12 20:52:22 2008 From: chu11 at llnl.gov (Al Chu) Date: Fri, 12 Sep 2008 23:52:22 -0400 Subject: [ofa-general] [PATCH] opensm/opensm.spec: comment out service auto-startup setup In-Reply-To: <20080913021200.GE17315@sashak.voltaire.com> References: <20080911201126.GK25831@sashak.voltaire.com> <1221168479.19185.135.camel@cardanus.llnl.gov> <20080913021200.GE17315@sashak.voltaire.com> Message-ID: <1221277942.3059.46.camel@whatsup> Hey Sasha, > The issue is that it starts opensm service on boot automatically after > installation (without user requesting this with 'chkconfig' or so) - see > bug #1181 - https://bugs.openfabrics.org/show_bug.cgi?id=1181 I don't know how this issue is handled in Suse, but I don't think it should be handled by removing the post-install script in the rpm spec file. I believe the post install script is typical for redhat/fedora systems. On my RHEL server, a quick look shows most of the popular daemons automatically add the daemon to chkconfig. 

# > rpm -q --scripts vixie-cron
postinstall scriptlet (using /bin/sh):
/sbin/chkconfig --add crond

# > rpm -q --scripts openssh-server
postinstall scriptlet (using /bin/sh):
/sbin/chkconfig --add sshd

# > rpm -q --scripts httpd
postinstall scriptlet (using /bin/sh):
# Register the httpd service
/sbin/chkconfig --add httpd

# > rpm -q --scripts mysql-server
postinstall scriptlet (using /bin/sh):
if [ $1 = 1 ]; then
	/sbin/chkconfig --add mysqld
fi

Whether the daemon should be started up automatically on boot is
configured in the init.d script by specifying what run levels it should
be configured on/off automatically. I see in the git master opensm it
seems to be off by default:

# > grep chkconfig redhat-opensm.init.in
# chkconfig: - 15 85

So perhaps we need to bug some Suse knowledgeable people on how to do
this properly. Because I think this patch will break RHEL behavior.

> So script itself should support 'condrestart' command?

This is the way I've personally done it. I can't say what the most
common method is, but it seems fairly common. A grep in /etc/init.d on
my RHEL system shows it is all over the place.

Al

-- 
Albert Chu
chu11 at llnl.gov
925-422-5311
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory

From chu11 at llnl.gov Fri Sep 12 21:05:43 2008
From: chu11 at llnl.gov (Al Chu)
Date: Sat, 13 Sep 2008 00:05:43 -0400
Subject: [ofa-general] [PATCH] opensm/opensm.spec: comment out service auto-startup setup
In-Reply-To: <1221277942.3059.46.camel@whatsup>
References: <20080911201126.GK25831@sashak.voltaire.com> <1221168479.19185.135.camel@cardanus.llnl.gov> <20080913021200.GE17315@sashak.voltaire.com> <1221277942.3059.46.camel@whatsup>
Message-ID: <1221278743.3059.54.camel@whatsup>

Hey Sasha,

I suddenly remembered that I've had to support a daemon on Suse before and looked at that
project's init.d script :-)

I *think* I remember how this is handled in Suse now. It's a different
set of comments at the top of the init.d script. In
opensm/scripts/opensm.init.in I see this:

### BEGIN INIT INFO
# Provides: opensm
# Required-Start: $syslog
# Default-Start: 2 3 5
# Default-Stop: 0 1 6
# Description: Manage OpenSM
### END INIT INFO

I think this indicates that by default opensm should start on boot on
run levels 2 3 5. Which I guess is what we don't want. I'm going to
take a guess that the following patch will fix the problem. Patch is
completely untested (I don't have a suse system). So hopefully someone
else can try it out.

Al

-- 
Albert Chu
chu11 at llnl.gov
925-422-5311
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-do-not-start-opensm-on-boot-automatically.patch
Type: application/mbox
Size: 770 bytes
Desc: not available
URL:

From sashak at voltaire.com Sat Sep 13 03:04:45 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sat, 13 Sep 2008 13:04:45 +0300
Subject: [ofa-general] [PATCH] opensm/opensm.spec: comment out service auto-startup setup
In-Reply-To: <1221278743.3059.54.camel@whatsup>
References: <20080911201126.GK25831@sashak.voltaire.com> <1221168479.19185.135.camel@cardanus.llnl.gov> <20080913021200.GE17315@sashak.voltaire.com> <1221277942.3059.46.camel@whatsup> <1221278743.3059.54.camel@whatsup>
Message-ID: <20080913100445.GF17315@sashak.voltaire.com>

Hi Al,

On 00:05 Sat 13 Sep , Al Chu wrote:
>
> I *think* I remember how this is handled in Suse now. It's a different
> set of comments at the top of the init.d script. In
> opensm/scripts/opensm.init.in I see this:
>
> ### BEGIN INIT INFO
> # Provides: opensm
> # Required-Start: $syslog
> # Default-Start: 2 3 5
> # Default-Stop: 0 1 6
> # Description: Manage OpenSM
> ### END INIT INFO

It was my original thought too. But actually those fields are used as
a recommendation to chkconfig --add, chkconfig --del, etc.

> I think this indicates that by default opensm should start on boot on
> run levels 2 3 5. Which I guess is what we don't want. I'm going to
> take a guess that the following patch will fix the problem. Patch is
> completely untested (I don't have a suse system). So hopefully someone
> else can try it out.

The patch is good since it drops an unneeded assumption about configured
runlevels. Without it, chkconfig will use the system defaults, and I
guess that is more portable.

Unfortunately it doesn't solve the original issue.
Sasha From vlad at lists.openfabrics.org Sat Sep 13 03:08:51 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Sat, 13 Sep 2008 03:08:51 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080913-0200 daily build status Message-ID: <20080913100851.485E7E608FF@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 
Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.24 Failed: From sashak at voltaire.com Sat Sep 13 09:25:29 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 13 Sep 2008 19:25:29 +0300 Subject: [ofa-general] [PATCH] opensm/opensm.spec: comment out service auto-startup setup In-Reply-To: <1221277942.3059.46.camel@whatsup> References: <20080911201126.GK25831@sashak.voltaire.com> <1221168479.19185.135.camel@cardanus.llnl.gov> <20080913021200.GE17315@sashak.voltaire.com> <1221277942.3059.46.camel@whatsup> Message-ID: <20080913162529.GH17315@sashak.voltaire.com> Hi Al, On 23:52 Fri 12 Sep , Al Chu wrote: > > I don't know how this issue is handled in Suse, but I don't think it > should be handled by removing the post-install script in the rpm spec > file. What I can find is 'chkconfig --add', requested run levels are described in optional '# Default-Start: ' tag. Something like # Default-Start: none will disable setup and 'chkconfig --add' will do nothing. This could work as immediate solution and keep RH setup untouched. > I believe the post install script is typical for redhat/fedora > systems. On my RHEL server, a quick look shows most of the popular > daemons automatically add the daemon to chkconfig. 

With RH it is different: the startup script must have the tag:

# chkconfig:

> # > rpm -q --scripts vixie-cron
> postinstall scriptlet (using /bin/sh):
> /sbin/chkconfig --add crond
>
> # > rpm -q --scripts openssh-server
> postinstall scriptlet (using /bin/sh):
> /sbin/chkconfig --add sshd
>
> # > rpm -q --scripts httpd
> postinstall scriptlet (using /bin/sh):
> # Register the httpd service
> /sbin/chkconfig --add httpd
>
> # > rpm -q --scripts mysql-server
> postinstall scriptlet (using /bin/sh):
> if [ $1 = 1 ]; then
> 	/sbin/chkconfig --add mysqld
> fi
>
> Whether the daemon should be started up automatically on boot is
> configured in the init.d script by specifying what run levels it should
> be configured on/off automatically. I see in the git master opensm it
> seems to be off by default:
>
> # > grep chkconfig redhat-opensm.init.in
> # chkconfig: - 15 85

Correct, and at post-install time 'chkconfig --add' will do almost
nothing (it will register the service with no startup run-levels). When
a user wants to set up startup on boot, she will need to edit the startup
script and re-run 'chkconfig --add'. If so, what is the clear benefit of
running 'chkconfig --add' at post-install time? I don't know. Maybe only
convention.

Sasha

From sashak at voltaire.com Sat Sep 13 09:39:36 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sat, 13 Sep 2008 19:39:36 +0300
Subject: [ofa-general] [PATCH] opensm: do not start opensm on boot automatically
In-Reply-To: <20080913162529.GH17315@sashak.voltaire.com>
References: <20080911201126.GK25831@sashak.voltaire.com> <1221168479.19185.135.camel@cardanus.llnl.gov> <20080913021200.GE17315@sashak.voltaire.com> <1221277942.3059.46.camel@whatsup> <20080913162529.GH17315@sashak.voltaire.com>
Message-ID: <20080913163936.GI17315@sashak.voltaire.com>

Do not start opensm on boot automatically on non-RH systems.

Signed-off-by: Sasha Khapyorsky
---
 opensm/scripts/opensm.init.in | 4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/opensm/scripts/opensm.init.in b/opensm/scripts/opensm.init.in
index 7673bfa..550a0c1 100644
--- a/opensm/scripts/opensm.init.in
+++ b/opensm/scripts/opensm.init.in
@@ -8,8 +8,8 @@
 ### BEGIN INIT INFO
 # Provides: opensm
 # Required-Start: $syslog
-# Default-Start:
-# Default-Stop: 0 1 2 3 5 6
+# Default-Start: none
+# Default-Stop: 0 1 6
 # Description: Manage OpenSM
 ### END INIT INFO
 #
-- 
1.5.4.rc2.60.gb2e62

From sashak at voltaire.com Sat Sep 13 11:20:29 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sat, 13 Sep 2008 21:20:29 +0300
Subject: [ofa-general] [PATCH] opensm/redhat-opensm.init.in: make config file optional
Message-ID: <20080913182029.GK17315@sashak.voltaire.com>

This file is not installed by default, so make it optional for the
script.

Signed-off-by: Sasha Khapyorsky
---
 opensm/scripts/redhat-opensm.init.in | 6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/opensm/scripts/redhat-opensm.init.in b/opensm/scripts/redhat-opensm.init.in
index 5526e44..d4cc580 100755
--- a/opensm/scripts/redhat-opensm.init.in
+++ b/opensm/scripts/redhat-opensm.init.in
@@ -47,12 +47,10 @@ exec_prefix=@exec_prefix@
 . /etc/rc.d/init.d/functions
 
 CONFIG=@sysconfdir@/sysconfig/opensm.conf
-if [ ! -f $CONFIG ]; then
-	exit 0
+if [ -f $CONFIG ]; then
+	. $CONFIG
 fi
 
-. $CONFIG
-
 prog=@sbindir@/opensm
 bin=${prog##*/}
 
-- 
1.5.4.rc2.60.gb2e62

From sashak at voltaire.com Sat Sep 13 11:29:02 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sat, 13 Sep 2008 21:29:02 +0300
Subject: [ofa-general] [PATCH] opensm/opensm.spec.in: don't install old format conf file
Message-ID: <20080913182902.GL17315@sashak.voltaire.com>

OpenSM uses this name (//opensm.conf) as the default config file name.
The file installed there from the spec is an old-format "config-like"
setup for old init scripts. It has a different format and will break
OpenSM (this currently works in OFED only because some variables are not
substituted correctly during the OFED build and the file is installed in
another place).

Signed-off-by: Sasha Khapyorsky
---
 opensm/opensm.spec.in | 2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/opensm/opensm.spec.in b/opensm/opensm.spec.in
index fc7677d..14c753f 100644
--- a/opensm/opensm.spec.in
+++ b/opensm/opensm.spec.in
@@ -96,7 +96,6 @@ else
 fi
 mkdir -p $etc/{init.d,logrotate.d} $etc/@OPENSM_CONFIG_SUB_DIR@
 install -m 755 scripts/${REDHAT}opensm.init $etc/init.d/opensmd
-install -m 644 scripts/opensm.conf $etc/@OPENSM_CONFIG_SUB_DIR@/opensm.conf
 install -m 644 scripts/opensm.logrotate $etc/logrotate.d/opensm
 install -m 755 scripts/sldd.sh $RPM_BUILD_ROOT%{_sbindir}/sldd.sh
 
@@ -128,7 +127,6 @@ fi
 %doc AUTHORS COPYING README doc/performance-manager-HOWTO.txt
 %{_sysconfdir}/init.d/opensmd
 %{_sbindir}/sldd.sh
-%config(noreplace) %{_sysconfdir}/@OPENSM_CONFIG_SUB_DIR@/opensm.conf
 %config(noreplace) %{_sysconfdir}/logrotate.d/opensm
 %dir /var/cache/opensm
 %dir %{_sysconfdir}/@OPENSM_CONFIG_SUB_DIR@
-- 
1.5.4.rc2.60.gb2e62

From vlad at lists.openfabrics.org Sun Sep 14 03:08:35 2008
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox)
Date: Sun, 14 Sep 2008 03:08:35 -0700 (PDT)
Subject:
[ofa-general] ofa_1_4_kernel 20080914-0200 daily build status Message-ID: <20080914100835.C3016E60C6E@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.24 Failed: 
From tziporet at dev.mellanox.co.il Sun Sep 14 08:29:26 2008
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Sun, 14 Sep 2008 18:29:26 +0300
Subject: Fwd: [ofa-general] [PATCH] ipoib: fix hang while bringing down uninitialized interface
In-Reply-To:
References: <48C7DA7B.3050706@gmail.com> <48C942BE.7010606@gmail.com>
Message-ID: <48CD2DD6.2020404@mellanox.co.il>

Roland Dreier wrote:
> > Commit http://www.openfabrics.org/git/?p=ofed_1_4/linux-2.6.git;a=commit;h=57ce41d1d18279cc90223f3deadca70c7de1cfca
> > put the bug in ipoib, but maybe this causes a hang only in recent kernels
> > due to modifications in timer code.
>
> So it looks like not a regression from 2.6.26... I'll queue this for 2.6.28
>

Yossi - can you create a patch for us for OFED 1.4, since it's not going into 2.6.27?

Thanks,
Tziporet

From kliteyn at dev.mellanox.co.il Sun Sep 14 13:51:38 2008
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Sun, 14 Sep 2008 23:51:38 +0300
Subject: [ofa-general] ***SPAM*** Where can I find the topology file?
In-Reply-To: <803776.49372.qm@web94206.mail.in2.yahoo.com>
References: <803776.49372.qm@web94206.mail.in2.yahoo.com>
Message-ID: <48CD795A.8040807@dev.mellanox.co.il>

Hi Dhananjay,

dhananjay tembe wrote:
> Hi,
> I am using the OFED stack and opensm. I was running some IB tools
> like ibdiagnet and ibdiagpath. The man page for ibdiagnet shows a -t option
> with which you can specify the topology file. Will you please tell me
> what this topology file contains and how I can create/generate it?
> Thanks in advance.

You can find many examples of topology files in "ibutils".
If you don't have the sources, you can pull the git tree from here:

git://staging.openfabrics.org/~orenk/ibutils

All the .topo files are topology files.
Here's a link to the web interface of the git repository:

http://staging.openfabrics.org/git/?p=~orenk/ibutils.git;a=summary

And this is a link to the directory that contains all the .topo files:

http://staging.openfabrics.org/git/?p=~orenk/ibutils.git;a=tree;f=ibmgtsim/tests;h=e8b2d97cddfa9a5e922c8b132fd98ec5e96dec39;hb=master

The topology file describes the hierarchical fabric topology. It can be used as an input file to ibdiagnet, in which case ibdiagnet will match the real fabric against the provided topology file. This is useful when you build a cluster and want to make sure that the cabling was done according to the original plan. It also makes ibdiagnet log messages more "readable", because ibdiagnet uses names from the topology file when reporting issues.

The file can also be used as an input for ibmgtsim (IB management simulator), in which case it describes the fabric that ibmgtsim should simulate.

Topo files can be written by hand, or they can be generated from an existing fabric by ibdiagnet, but generated files will not contain any subnet hierarchy: they will be "flat", like the lst files.

-- Yevgeny

> ---Dhananjay.

From harake at cscs.ch Mon Sep 15 00:51:30 2008
From: harake at cscs.ch (H. N.
Harake)
Date: Mon, 15 Sep 2008 09:51:30 +0200
Subject: [ofa-general] CQ entry and Async event
Message-ID:

Dear All,

I am getting hundreds of these messages in dmesg. We use the IB network for storage (GPFS filesystem), and we lost the filesystem for a short time due to a connectivity issue. Some of the log entries I found:

ib_mthca 0000:09:00.0: CQ entry for unknown QP 6b0431
ib_mthca 0000:09:00.0: CQ entry for unknown QP 6b0431
ib_mthca 0000:09:00.0: CQ entry for unknown QP 6b0431
ib_mthca 0000:09:00.0: Async event for bogus QP 006d0431
ib_mthca 0000:09:00.0: CQ entry for unknown QP 6d0431
ib_mthca 0000:09:00.0: CQ entry for unknown QP 6d0431
ib_mthca 0000:09:00.0: CQ entry for unknown QP 6d0431
ib_mthca 0000:09:00.0: CQ entry for unknown QP 6d0431
ib_mthca 0000:09:00.0: CQ entry for unknown QP 6e0431
ib_mthca 0000:09:00.0: CQ entry for unknown QP 6e0431
ib_mthca 0000:09:00.0: Async event for bogus QP 006e0431
ib_mthca 0000:09:00.0: CQ entry for unknown QP 6e0431
ib_mthca 0000:09:00.0: CQ entry for unknown QP 830412
ib_mthca 0000:09:00.0: CQ entry for unknown QP 830412

I am running SUSE Linux Enterprise Server 10 (x86_64), VERSION = 10.

ifconfig also shows many dropped packets:

UP BROADCAST RUNNING MULTICAST MTU:65520 Metric:1
RX packets:10107122183 errors:0 dropped:0 overruns:0 frame:0
TX packets:15006340193 errors:0 dropped:89357 overruns:0 carrier:0
collisions:0 txqueuelen:128

OFED 1.2.5.1-0
09:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20)

Any hints would be appreciated.

Regards,
H. N. Harake

From kovlensky at interia.pl Mon Sep 15 02:53:39 2008
From: kovlensky at interia.pl (kovlensky at interia.pl)
Date: 15 Sep 2008 11:53:39 +0200
Subject: [ofa-general] ***SPAM*** testing memory chins on ib cards
Message-ID: <20080915095340.266B52E5C7C@f27.poczta.interia.pl>

Hi all,

I observe the following problem appearing randomly on my nodes:

kernel: ib_ipath 0000:03:00.0: RXE parity, Eager TID port 0 idx 0x33c expected 20447819, but got 20047819.
kernel: ib_ipath 0000:03:00.0: infinipath0: RXE parity Eager TID not recoverable, read 20047819, expected 20447819
kernel: ib_ipath 0000:03:00.0: infinipath0: RXE parity, Eager TID error is not recoverable

That's for QLogic cards; the Mellanox ones seem to be much, much more stable.

As I need to stress every card, I am looking for a tool that puts the memory chips on them under heavy load, so far without much luck. So what is the tool for diagnosing the cards?

From vlad at lists.openfabrics.org Mon Sep 15 03:10:39 2008
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox)
Date: Mon, 15 Sep 2008 03:10:39 -0700 (PDT)
Subject: [ofa-general] ofa_1_4_kernel 20080915-0200 daily build status
Message-ID: <20080915101039.E2392E60CBB@openfabrics.org>

This email was generated automatically, please do not reply

git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git
git_branch: ofed_kernel

Common build parameters:

Passed:
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.26
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.18-53.el5
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.22.5-31-default
Passed on
x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.24 Failed: From sashak at voltaire.com Mon Sep 15 03:40:20 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 15 Sep 2008 13:40:20 +0300 Subject: [ofa-general] [PATCH] opensm/osm_multicast.[ch]: simplify flows, remove unused functions Message-ID: <20080915104020.GF17315@sashak.voltaire.com> Simplify flows, remove unused and mean less osm_mgrp_init() functions, consolidate notice sending functions. Signed-off-by: Sasha Khapyorsky --- opensm/include/opensm/osm_multicast.h | 88 ---------------- opensm/opensm/osm_multicast.c | 181 ++++++++++----------------------- 2 files changed, 56 insertions(+), 213 deletions(-) diff --git a/opensm/include/opensm/osm_multicast.h b/opensm/include/opensm/osm_multicast.h index c0bd16e..c860d4a 100644 --- a/opensm/include/opensm/osm_multicast.h +++ b/opensm/include/opensm/osm_multicast.h @@ -81,29 +81,6 @@ BEGIN_C_DECLS * Steve King, Intel * *********/ -/****f* IBA Base: OpenSM: Multicast Group/osm_get_mcast_req_type_str -* NAME -* osm_get_mcast_req_type_str -* -* DESCRIPTION -* Returns a string for the specified osm_mcast_req_type_t value. 
-* -* SYNOPSIS -*/ -const char *osm_get_mcast_req_type_str(IN osm_mcast_req_type_t req_type); -/* -* PARAMETERS -* req_type -* [in] osm_mcast_req_type value -* -* RETURN VALUES -* Pointer to the request type description string. -* -* NOTES -* -* SEE ALSO -*********/ - /****s* OpenSM: Multicast Group/osm_mcast_mgr_ctxt_t * NAME * osm_mcast_mgr_ctxt_t @@ -483,71 +460,6 @@ osm_mgrp_remove_port(IN osm_subn_t * const p_subn, * SEE ALSO *********/ -/****f* OpenSM: Multicast Group/osm_mgrp_get_root_switch -* NAME -* osm_mgrp_get_root_switch -* -* DESCRIPTION -* Returns the "root" switch of this multicast group. The root switch -* is at the trunk of the multicast single spanning tree. -* -* SYNOPSIS -*/ -static inline osm_switch_t *osm_mgrp_get_root_switch(IN const osm_mgrp_t * - const p_mgrp) -{ - if (p_mgrp->p_root) - return (p_mgrp->p_root->p_sw); - else - return (NULL); -} - -/* -* PARAMETERS -* p_mgrp -* [in] Pointer to an osm_mgrp_t object. -* -* RETURN VALUES -* Returns the "root" switch of this multicast group. The root switch -* is at the trunk of the multicast single spanning tree. -* -* NOTES -* -* SEE ALSO -* Multicast Group -*********/ - -/****f* OpenSM: Multicast Group/osm_mgrp_compute_avg_hops -* NAME -* osm_mgrp_compute_avg_hops -* -* DESCRIPTION -* Returns the average number of hops from the given to switch -* to all member of a multicast group. -* -* SYNOPSIS -*/ -float -osm_mgrp_compute_avg_hops(const osm_mgrp_t * const p_mgrp, - const osm_switch_t * const p_sw); -/* -* PARAMETERS -* p_mgrp -* [in] Pointer to an osm_mgrp_t object. -* -* p_sw -* [in] Pointer to the switch from which to measure. -* -* RETURN VALUES -* Returns the average number of hops from the given to switch -* to all member of a multicast group. 
-* -* NOTES -* -* SEE ALSO -* Multicast Group -*********/ - /****f* OpenSM: Multicast Group/osm_mgrp_apply_func * NAME * osm_mgrp_apply_func diff --git a/opensm/opensm/osm_multicast.c b/opensm/opensm/osm_multicast.c index 77e61ad..b810630 100644 --- a/opensm/opensm/osm_multicast.c +++ b/opensm/opensm/osm_multicast.c @@ -51,23 +51,6 @@ /********************************************************************** **********************************************************************/ -/* osm_mcast_req_type_t values converted to test for easier printing. */ -const static char *mcast_req_type_str[] = { - "OSM_MCAST_REQ_TYPE_CREATE", - "OSM_MCAST_REQ_TYPE_JOIN", - "OSM_MCAST_REQ_TYPE_LEAVE", - "OSM_MCAST_REQ_TYPE_SUBNET_CHANGE" -}; - -const char *osm_get_mcast_req_type_str(IN osm_mcast_req_type_t req_type) -{ - if (req_type > OSM_MCAST_REQ_TYPE_SUBNET_CHANGE) - req_type = OSM_MCAST_REQ_TYPE_SUBNET_CHANGE; - return (mcast_req_type_str[req_type]); -} - -/********************************************************************** - **********************************************************************/ void osm_mgrp_delete(IN osm_mgrp_t * const p_mgrp) { osm_mcm_port_t *p_mcm_port; @@ -92,10 +75,13 @@ void osm_mgrp_delete(IN osm_mgrp_t * const p_mgrp) /********************************************************************** **********************************************************************/ -static void -osm_mgrp_init(IN osm_mgrp_t * const p_mgrp, IN const ib_net16_t mlid) +osm_mgrp_t *osm_mgrp_new(IN const ib_net16_t mlid) { - CL_ASSERT(cl_ntoh16(mlid) >= IB_LID_MCAST_START_HO); + osm_mgrp_t *p_mgrp; + + p_mgrp = (osm_mgrp_t *) malloc(sizeof(*p_mgrp)); + if (!p_mgrp) + return NULL; memset(p_mgrp, 0, sizeof(*p_mgrp)); cl_qmap_init(&p_mgrp->mcm_port_tbl); @@ -103,19 +89,8 @@ osm_mgrp_init(IN osm_mgrp_t * const p_mgrp, IN const ib_net16_t mlid) p_mgrp->last_change_id = 0; p_mgrp->last_tree_id = 0; p_mgrp->to_be_deleted = FALSE; -} 
-/********************************************************************** - **********************************************************************/ -osm_mgrp_t *osm_mgrp_new(IN const ib_net16_t mlid) -{ - osm_mgrp_t *p_mgrp; - - p_mgrp = (osm_mgrp_t *) malloc(sizeof(*p_mgrp)); - if (p_mgrp) - osm_mgrp_init(p_mgrp, mlid); - - return (p_mgrp); + return p_mgrp; } /********************************************************************** @@ -132,42 +107,39 @@ osm_mcm_port_t *osm_mgrp_add_port(IN osm_mgrp_t * const p_mgrp, uint8_t prev_scope; p_mcm_port = osm_mcm_port_new(p_port_gid, join_state, proxy_join); - if (p_mcm_port) { - port_guid = p_port_gid->unicast.interface_id; + if (!p_mcm_port) + return NULL; + + port_guid = p_port_gid->unicast.interface_id; + + /* + prev_item = cl_qmap_insert(...) + Pointer to the item in the map with the specified key. If insertion + was successful, this is the pointer to the item. If an item with the + specified key already exists in the map, the pointer to that item is + returned. + */ + prev_item = cl_qmap_insert(&p_mgrp->mcm_port_tbl, + port_guid, &p_mcm_port->map_item); + + /* if already exists - revert the insertion and only update join state */ + if (prev_item != &p_mcm_port->map_item) { + osm_mcm_port_delete(p_mcm_port); + p_mcm_port = (osm_mcm_port_t *) prev_item; /* - prev_item = cl_qmap_insert(...) - Pointer to the item in the map with the specified key. If insertion - was successful, this is the pointer to the item. If an item with the - specified key already exists in the map, the pointer to that item is - returned. 
+ o15.0.1.11 + Join state of the end port should be the or of the + previous setting with the current one */ - prev_item = cl_qmap_insert(&p_mgrp->mcm_port_tbl, - port_guid, &p_mcm_port->map_item); - - /* if already exists - revert the insertion and only update join state */ - if (prev_item != &p_mcm_port->map_item) { - - osm_mcm_port_delete(p_mcm_port); - p_mcm_port = (osm_mcm_port_t *) prev_item; - - /* - o15.0.1.11 - Join state of the end port should be the or of the - previous setting with the current one - */ - ib_member_get_scope_state(p_mcm_port->scope_state, - &prev_scope, - &prev_join_state); - p_mcm_port->scope_state = - ib_member_set_scope_state(prev_scope, - prev_join_state | - join_state); - - } else { - /* track the fact we modified the group ports */ - p_mgrp->last_change_id++; - } + ib_member_get_scope_state(p_mcm_port->scope_state, &prev_scope, + &prev_join_state); + p_mcm_port->scope_state = + ib_member_set_scope_state(prev_scope, + prev_join_state | join_state); + } else { + /* track the fact we modified the group ports */ + p_mgrp->last_change_id++; } return (p_mcm_port); @@ -243,9 +215,7 @@ __osm_mgrp_apply_func_sub(const osm_mgrp_t * const p_mgrp, uint8_t max_children; osm_mtree_node_t *p_child_mtn; - /* - Call the user, then recurse. - */ + /* Call the user, then recurse. 
*/ p_func(p_mgrp, p_mtn, context); max_children = osm_mtree_node_get_max_children(p_mtn); @@ -276,82 +246,43 @@ osm_mgrp_apply_func(const osm_mgrp_t * const p_mgrp, /********************************************************************** **********************************************************************/ -void -osm_mgrp_send_delete_notice(IN osm_subn_t * const p_subn, - IN osm_log_t * const p_log, IN osm_mgrp_t * p_mgrp) +static void mgrp_send_notice(osm_subn_t *subn, osm_log_t *log, + osm_mgrp_t *mgrp, unsigned num) { ib_mad_notice_attr_t notice; ib_api_status_t status; - OSM_LOG_ENTER(p_log); - - /* prepare the needed info */ - - /* details of the notice */ - notice.generic_type = 0x83; /* is generic subn mgt type */ + notice.generic_type = 0x83; /* generic SubnMgt type */ ib_notice_set_prod_type_ho(&notice, 4); /* A Class Manager generator */ - notice.g_or_v.generic.trap_num = CL_HTON16(67); /* delete of mcg */ + notice.g_or_v.generic.trap_num = CL_HTON16(num); /* The sm_base_lid is saved in network order already. */ - notice.issuer_lid = p_subn->sm_base_lid; + notice.issuer_lid = subn->sm_base_lid; /* following o14-12.1.11 and table 120 p726 */ /* we need to provide the MGID */ - memcpy(&(notice.data_details.ntc_64_67.gid), - &(p_mgrp->mcmember_rec.mgid), sizeof(ib_gid_t)); + memcpy(&notice.data_details.ntc_64_67.gid, + &mgrp->mcmember_rec.mgid, sizeof(ib_gid_t)); /* According to page 653 - the issuer gid in this case of trap is the SM gid, since the SM is the initiator of this trap.
*/ - notice.issuer_gid.unicast.prefix = p_subn->opt.subnet_prefix; - notice.issuer_gid.unicast.interface_id = p_subn->sm_port_guid; + notice.issuer_gid.unicast.prefix = subn->opt.subnet_prefix; + notice.issuer_gid.unicast.interface_id = subn->sm_port_guid; - status = osm_report_notice(p_log, p_subn, &notice); - if (status != IB_SUCCESS) { - OSM_LOG(p_log, OSM_LOG_ERROR, "ERR 7601: " + if ((status = osm_report_notice(log, subn, &notice))) + OSM_LOG(log, OSM_LOG_ERROR, "ERR 7601: " "Error sending trap reports (%s)\n", ib_get_err_str(status)); - goto Exit; - } +} -Exit: - OSM_LOG_EXIT(p_log); +void +osm_mgrp_send_delete_notice(IN osm_subn_t * const p_subn, + IN osm_log_t * const p_log, IN osm_mgrp_t * p_mgrp) +{ + mgrp_send_notice(p_subn, p_log, p_mgrp, 67); } -/********************************************************************** - **********************************************************************/ void osm_mgrp_send_create_notice(IN osm_subn_t * const p_subn, IN osm_log_t * const p_log, IN osm_mgrp_t * p_mgrp) { - ib_mad_notice_attr_t notice; - ib_api_status_t status; - - OSM_LOG_ENTER(p_log); - - /* prepare the needed info */ - - /* details of the notice */ - notice.generic_type = 0x83; /* Generic SubnMgt type */ - ib_notice_set_prod_type_ho(&notice, 4); /* A Class Manager generator */ - notice.g_or_v.generic.trap_num = CL_HTON16(66); /* create of mcg */ - /* The sm_base_lid is saved in network order already. */ - notice.issuer_lid = p_subn->sm_base_lid; - /* following o14-12.1.11 and table 120 p726 */ - /* we need to provide the MGID */ - memcpy(&(notice.data_details.ntc_64_67.gid), - &(p_mgrp->mcmember_rec.mgid), sizeof(ib_gid_t)); - - /* According to page 653 - the issuer gid in this case of trap - is the SM gid, since the SM is the initiator of this trap.
*/ - notice.issuer_gid.unicast.prefix = p_subn->opt.subnet_prefix; - notice.issuer_gid.unicast.interface_id = p_subn->sm_port_guid; - - status = osm_report_notice(p_log, p_subn, &notice); - if (status != IB_SUCCESS) { - OSM_LOG(p_log, OSM_LOG_ERROR, "ERR 7602: " - "Error sending trap reports (%s)\n", - ib_get_err_str(status)); - goto Exit; - } - -Exit: - OSM_LOG_EXIT(p_log); + mgrp_send_notice(p_subn, p_log, p_mgrp, 66); } -- 1.6.0.1.196.g01914 From sashak at voltaire.com Mon Sep 15 05:24:14 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 15 Sep 2008 15:24:14 +0300 Subject: [ofa-general] [PATCH] opensm: simplify osm_get_mgrp_by_mgid() search function Message-ID: <20080915122414.GG17315@sashak.voltaire.com> Simplify the MC group search function osm_get_mgrp_by_mgid(): it now returns a pointer to struct osm_mgrp, or NULL if the group was not found (or is marked for deletion). Signed-off-by: Sasha Khapyorsky --- opensm/include/opensm/osm_sa.h | 4 +- opensm/opensm/osm_sa_mcmember_record.c | 93 +++++++++----------------------- opensm/opensm/osm_sa_path_record.c | 51 +++++++---------- 3 files changed, 48 insertions(+), 100 deletions(-) diff --git a/opensm/include/opensm/osm_sa.h b/opensm/include/opensm/osm_sa.h index b861ac4..52b69c5 100644 --- a/opensm/include/opensm/osm_sa.h +++ b/opensm/include/opensm/osm_sa.h @@ -490,9 +490,7 @@ osm_mcmr_rcv_find_or_create_new_mgrp(IN osm_sa_t * sa, * *********/ -ib_api_status_t -osm_get_mgrp_by_mgid(IN osm_sa_t * sa, - IN ib_gid_t * p_mgid, OUT osm_mgrp_t ** pp_mgrp); +osm_mgrp_t *osm_get_mgrp_by_mgid(IN osm_sa_t * sa, IN ib_gid_t * p_mgid); END_C_DECLS #endif /* _OSM_SA_H_ */ diff --git a/opensm/opensm/osm_sa_mcmember_record.c b/opensm/opensm/osm_sa_mcmember_record.c index 13bce1f..81683f0 100644 --- a/opensm/opensm/osm_sa_mcmember_record.c +++ b/opensm/opensm/osm_sa_mcmember_record.c @@ -966,39 +966,16 @@ Exit: } - -typedef struct osm_sa_pr_mcmr_search_ctxt { - ib_gid_t mgid; - osm_mgrp_t *p_mgrp; - osm_sa_t *sa; -}
osm_sa_pr_mcmr_search_ctxt_t; - /********************************************************************** *********************************************************************/ -static void -__search_mgrp_by_mgid(IN osm_mgrp_t * const p_mgrp, IN void *context) +static unsigned match_mgrp_by_mgid(IN osm_mgrp_t * const p_mgrp, ib_gid_t *mgid) { - osm_sa_pr_mcmr_search_ctxt_t *p_ctxt = context; - osm_sa_t *sa = p_ctxt->sa; - /* ignore groups marked for deletion */ - if (p_mgrp->to_be_deleted) - return; - - /* compare entire MGID so different scope will not sneak in for - the same MGID */ - if (memcmp(&p_mgrp->mcmember_rec.mgid, &p_ctxt->mgid, sizeof(ib_gid_t))) - return; - - if (p_ctxt->p_mgrp) { - char gid_str[INET6_ADDRSTRLEN]; - OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1B30: " - "Multiple MC groups for MGID %s\n", - inet_ntop(AF_INET6, p_mgrp->mcmember_rec.mgid.raw, - gid_str, sizeof gid_str)); - return; - } - p_ctxt->p_mgrp = p_mgrp; + if (p_mgrp->to_be_deleted || + memcmp(&p_mgrp->mcmember_rec.mgid, mgid, sizeof(ib_gid_t))) + return 0; + else + return 1; } /********************************************************************** @@ -1023,21 +1000,15 @@ static unsigned match_and_update_ipv6_snm_mgid(ib_gid_t *mgid) return 0; } -ib_api_status_t -osm_get_mgrp_by_mgid(IN osm_sa_t *sa, - IN ib_gid_t *p_mgid, - OUT osm_mgrp_t **pp_mgrp) +osm_mgrp_t *osm_get_mgrp_by_mgid(IN osm_sa_t *sa, IN ib_gid_t *p_mgid) { - osm_sa_pr_mcmr_search_ctxt_t mcmr_search_context; - osm_mgrp_t *p_mgrp; + ib_gid_t mgid; int i; - memcpy(&mcmr_search_context.mgid, p_mgid, sizeof(*p_mgid)); - mcmr_search_context.sa = sa; - mcmr_search_context.p_mgrp = NULL; + memcpy(&mgid, p_mgid, sizeof(mgid)); if (sa->p_subn->opt.consolidate_ipv6_snm_req && - match_and_update_ipv6_snm_mgid(&mcmr_search_context.mgid)) { + match_and_update_ipv6_snm_mgid(&mgid)) { char gid_str[INET6_ADDRSTRLEN]; OSM_LOG(sa->p_log, OSM_LOG_DEBUG, "Special Case Solicited Node Mcast Join for MGID %s\n", @@ -1046,17 +1017,12 @@ 
osm_get_mgrp_by_mgid(IN osm_sa_t *sa, } for (i = 0; i <= sa->p_subn->max_mcast_lid_ho - IB_LID_MCAST_START_HO; - i++) { - p_mgrp = sa->p_subn->mgroups[i]; - if (p_mgrp) { - __search_mgrp_by_mgid(p_mgrp, &mcmr_search_context); - if (mcmr_search_context.p_mgrp) { - *pp_mgrp = mcmr_search_context.p_mgrp; - return IB_SUCCESS; - } - } - } - return IB_NOT_FOUND; + i++) + if (sa->p_subn->mgroups[i] && + match_mgrp_by_mgid(sa->p_subn->mgroups[i], &mgid)) + return sa->p_subn->mgroups[i]; + + return NULL; } /********************************************************************** @@ -1069,11 +1035,12 @@ osm_mcmr_rcv_find_or_create_new_mgrp(IN osm_sa_t * sa, const p_recvd_mcmember_rec, OUT osm_mgrp_t ** pp_mgrp) { - ib_api_status_t status; + osm_mgrp_t *mgrp; - status = osm_get_mgrp_by_mgid(sa, &p_recvd_mcmember_rec->mgid, pp_mgrp); - if (status == IB_SUCCESS) - return status; + if ((mgrp = osm_get_mgrp_by_mgid(sa, &p_recvd_mcmember_rec->mgid))) { + *pp_mgrp = mgrp; + return IB_SUCCESS; + } return osm_mcmr_rcv_create_new_mgrp(sa, comp_mask, p_recvd_mcmember_rec, NULL, pp_mgrp); @@ -1088,7 +1055,6 @@ __osm_mcmr_rcv_leave_mgrp(IN osm_sa_t * sa, { boolean_t valid; osm_mgrp_t *p_mgrp; - ib_api_status_t status; ib_sa_mad_t *p_sa_mad; ib_member_rec_t *p_recvd_mcmember_rec; ib_member_rec_t mcmember_rec; @@ -1100,7 +1066,6 @@ __osm_mcmr_rcv_leave_mgrp(IN osm_sa_t * sa, OSM_LOG_ENTER(sa->p_log); - p_mgrp = NULL; p_sa_mad = osm_madw_get_sa_mad_ptr(p_madw); p_recvd_mcmember_rec = (ib_member_rec_t *) ib_sa_mad_get_payload_ptr(p_sa_mad); @@ -1113,8 +1078,8 @@ __osm_mcmr_rcv_leave_mgrp(IN osm_sa_t * sa, } CL_PLOCK_EXCL_ACQUIRE(sa->p_lock); - status = osm_get_mgrp_by_mgid(sa, &p_recvd_mcmember_rec->mgid, &p_mgrp); - if (status == IB_SUCCESS) { + p_mgrp = osm_get_mgrp_by_mgid(sa, &p_recvd_mcmember_rec->mgid); + if (p_mgrp) { mlid = p_mgrp->mlid; portguid = p_recvd_mcmember_rec->port_gid.unicast.interface_id; @@ -1155,15 +1120,10 @@ __osm_mcmr_rcv_leave_mgrp(IN osm_sa_t * sa, /* OK we can leave 
*/ /* note: osm_sm_mcgrp_leave() will release sa->p_lock */ - - status = - osm_sm_mcgrp_leave(sa->sm, mlid, - portguid); - if (status != IB_SUCCESS) { + if (osm_sm_mcgrp_leave(sa->sm, mlid, portguid)) OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1B09: " "osm_sm_mcgrp_leave failed\n"); - } } } else { char gid_str[INET6_ADDRSTRLEN]; @@ -1223,7 +1183,6 @@ __osm_mcmr_rcv_join_mgrp(IN osm_sa_t * sa, IN osm_madw_t * const p_madw) OSM_LOG_ENTER(sa->p_log); - p_mgrp = NULL; p_sa_mad = osm_madw_get_sa_mad_ptr(p_madw); p_recvd_mcmember_rec = (ib_member_rec_t *) ib_sa_mad_get_payload_ptr(p_sa_mad); @@ -1276,8 +1235,8 @@ __osm_mcmr_rcv_join_mgrp(IN osm_sa_t * sa, IN osm_madw_t * const p_madw) &join_state); /* do we need to create a new group? */ - status = osm_get_mgrp_by_mgid(sa, &p_recvd_mcmember_rec->mgid, &p_mgrp); - if (status == IB_NOT_FOUND || p_mgrp->to_be_deleted) { + p_mgrp = osm_get_mgrp_by_mgid(sa, &p_recvd_mcmember_rec->mgid); + if (!p_mgrp || p_mgrp->to_be_deleted) { /* check for JoinState.FullMember = 1 o15.0.1.9 */ if ((join_state & 0x01) != 0x01) { char gid_str[INET6_ADDRSTRLEN]; diff --git a/opensm/opensm/osm_sa_path_record.c b/opensm/opensm/osm_sa_path_record.c index d60dc01..e614bab 100644 --- a/opensm/opensm/osm_sa_path_record.c +++ b/opensm/opensm/osm_sa_path_record.c @@ -1458,59 +1458,50 @@ __osm_pr_rcv_process_pair(IN osm_sa_t * sa, /********************************************************************** **********************************************************************/ -static void -__osm_pr_get_mgrp(IN osm_sa_t * sa, - IN const osm_madw_t * const p_madw, OUT osm_mgrp_t ** pp_mgrp) +static osm_mgrp_t *pr_get_mgrp(IN osm_sa_t * sa, + IN const osm_madw_t * const p_madw) { ib_path_rec_t *p_pr; const ib_sa_mad_t *p_sa_mad; ib_net64_t comp_mask; - ib_api_status_t status; - - OSM_LOG_ENTER(sa->p_log); + osm_mgrp_t *mgrp = NULL; p_sa_mad = osm_madw_get_sa_mad_ptr(p_madw); p_pr = (ib_path_rec_t *) ib_sa_mad_get_payload_ptr(p_sa_mad); comp_mask = 
p_sa_mad->comp_mask; - if (comp_mask & IB_PR_COMPMASK_DGID) { - status = osm_get_mgrp_by_mgid(sa, &p_pr->dgid, pp_mgrp); - if (status != IB_SUCCESS) { - char gid_str[INET6_ADDRSTRLEN]; - OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1F09: " - "No MC group found for PathRecord destination " - "GID %s\n", - inet_ntop(AF_INET6, p_pr->dgid.raw, gid_str, - sizeof gid_str)); - goto Exit; - } + if ((comp_mask & IB_PR_COMPMASK_DGID) && + !(mgrp = osm_get_mgrp_by_mgid(sa, &p_pr->dgid))) { + char gid_str[INET6_ADDRSTRLEN]; + OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1F09: " + "No MC group found for PathRecord destination GID %s\n", + inet_ntop(AF_INET6, p_pr->dgid.raw, gid_str, + sizeof gid_str)); + goto Exit; } if (comp_mask & IB_PR_COMPMASK_DLID) { - if (*pp_mgrp) { + if (mgrp) { /* check that the MLID in the MC group is */ /* the same as the DLID in the PathRecord */ - if ((*pp_mgrp)->mlid != p_pr->dlid) { + if (mgrp->mlid != p_pr->dlid) { /* Note: perhaps this might be better indicated as an invalid request */ OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1F10: " "MC group MLID 0x%x does not match " "PathRecord destination LID 0x%x\n", - (*pp_mgrp)->mlid, p_pr->dlid); - *pp_mgrp = NULL; + mgrp->mlid, p_pr->dlid); + mgrp = NULL; goto Exit; } - } else { - *pp_mgrp = osm_get_mgrp_by_mlid(sa->p_subn, p_pr->dlid); - if (*pp_mgrp == NULL) - OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1F11: " - "No MC group found for PathRecord " - "destination LID 0x%x\n", p_pr->dlid); - } + } else if (!(mgrp = osm_get_mgrp_by_mlid(sa->p_subn, p_pr->dlid))) + OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1F11: " + "No MC group found for PathRecord " + "destination LID 0x%x\n", p_pr->dlid); } Exit: - OSM_LOG_EXIT(sa->p_log); + return mgrp; } /********************************************************************** @@ -1743,7 +1734,7 @@ McastDest: uint8_t hop_limit; /* First, get the MC info */ - __osm_pr_get_mgrp(sa, p_madw, &p_mgrp); + p_mgrp = pr_get_mgrp(sa, p_madw); if (!p_mgrp) goto Unlock; -- 1.6.0.1.196.g01914 From 
christopher.tanner at gatech.edu Mon Sep 15 05:31:51 2008 From: christopher.tanner at gatech.edu (Christopher Tanner) Date: Mon, 15 Sep 2008 08:31:51 -0400 Subject: [ofa-general] Permission Denied? Message-ID: All - I sent this out last week and haven't gotten any response. Is this problem unsolvable? -------- I'm receiving this error when I try to execute a mpi executable: [node2][0,1,1][btl_openib_component.c:466:init_one_hca] error obtaining device context for mthca0 errno says Permission denied -------------------------------------------------------------------------- WARNING: There were errors during IB HCA initialization on host 'node2'. -------------------------------------------------------------------------- -------------------------------------------------------------------------- WARNING: There is at least on IB HCA found on host 'node2', but there is no active ports detected. This is most certainly not what you wanted. Check your cables and SM configuration. -------------------------------------------------------------------------- I'm confused about the 'Permission denied'. My user is part of the group 'rdma', which I thought was supposed to give them permission to access the Infiniband devices. I'm also confused because the trivial test cases such as 'Hello World' and 'hostname' execute on all nodes without errors. The 'no active ports' is also curious. On the master node, I am running OpenSM and it indicates that the port is active (using ibv_devinfo). However, I notice that the 'ibv_devinfo' command can only be run by root. Is this an indication that permissions are not set correctly? As another note, my cluster is running Ubuntu 8.04, so I couldn't use the OFED scripts to install the Infiniband drivers, so I had to compile and install everything from source (which seemed to go fine). Do I have to do something extra to get permissions set and ports active? Thanks for your help! 
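
A quick way to narrow down a "Permission denied" like the one reported here is to check whether the unprivileged user can open the verbs device node read-write at all. The following is only a minimal sketch and not part of the original post: `check_rw_access` is a hypothetical helper, and `/dev/infiniband/uverbs0` is an assumed device path that may differ on a given system.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to open a device node read-write.
 * Returns 0 on success, or the errno value from open() on failure
 * (for example EACCES when the user lacks permission).
 * The caller chooses the path; /dev/infiniband/uverbs0 is only an
 * assumed, typical location for the first verbs device.
 */
int check_rw_access(const char *path)
{
	int fd;
	int err;

	assert(path != NULL);

	fd = open(path, O_RDWR);
	if (fd < 0) {
		err = errno;	/* save errno before any other call can change it */
		fprintf(stderr, "%s: %s\n", path, strerror(err));
		return err;
	}

	close(fd);
	return 0;
}
```

If `check_rw_access("/dev/infiniband/uverbs0")` returns EACCES for the normal user but 0 for root, the ownership and mode of the device node (and the group membership of the user) are the first things to look at.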
-------------------------------------------
Chris Tanner
Space Systems Design Lab
Georgia Institute of Technology
christopher.tanner at gatech.edu
-------------------------------------------

From vlad at mellanox.co.il Mon Sep 15 05:47:53 2008
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Mon, 15 Sep 2008 15:47:53 +0300
Subject: [ofa-general] [PATCH] mlx4/IB: Set the PRESENT bit in the physical buffer list to match the MTT format.
Message-ID: <20080915124753.GA13187@mellanox.co.il>

Signed-off-by: Vladimir Sokolovsky
---

Hi Roland,
This is the last fix for FRWR.

 drivers/infiniband/hw/mlx4/qp.c | 8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index f29dbb7..5762944 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -57,6 +57,8 @@ enum {
 	MLX4_IB_UD_HEADER_SIZE = 72
 };

+#define MLX4_MTT_FLAG_PRESENT 1
+
 struct mlx4_ib_sqp {
 	struct mlx4_ib_qp qp;
 	int pkey_index;
@@ -1342,6 +1344,12 @@ static __be32 convert_access(int acc)
 static void set_fmr_seg(struct mlx4_wqe_fmr_seg *fseg, struct ib_send_wr *wr)
 {
 	struct mlx4_ib_fast_reg_page_list *mfrpl = to_mfrpl(wr->wr.fast_reg.page_list);
+	int i;
+
+	for (i = 0; i < wr->wr.fast_reg.page_list_len; ++i)
+		wr->wr.fast_reg.page_list->page_list[i] =
+			cpu_to_be64(wr->wr.fast_reg.page_list->page_list[i] |
+				    MLX4_MTT_FLAG_PRESENT);

 	fseg->flags = convert_access(wr->wr.fast_reg.access_flags);
 	fseg->mem_key = cpu_to_be32(wr->wr.fast_reg.rkey);
--
1.6.0.1.90.g27a6e

From Sumit.Gaur at Sun.COM Mon Sep 15 06:52:05 2008
From: Sumit.Gaur at Sun.COM (Sumit Gaur - Sun Microsystem)
Date: Mon, 15 Sep 2008 19:22:05 +0530
Subject: [ofa-general]
sndbuf and recvbuf size In-Reply-To: <20080915122521.264E6E60D12@openfabrics.org> References: <20080915122521.264E6E60D12@openfabrics.org> Message-ID: <48CE6885.8080408@Sun.COM> Hi, I know it could be a basic question, but I don't know the answer. Why do we need to assign approximately double the size to sndbuf and recvbuf in a mad_rpc request, as in memset(sndbuf, 0, umad_size() + IB_MAD_SIZE);? Thanks sumit From tziporet at mellanox.co.il Mon Sep 15 08:33:31 2008 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Mon, 15 Sep 2008 18:33:31 +0300 Subject: [ofa-general] OFED meeting agenda for today (Sep 15) Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD0FE68E@mtlexch01.mtl.com> Agenda for OFED meeting today on OFED 1.4 status: 1. bugs review (see attached) 2. Review testing matrix (was sent before) 3. Open discussion Tziporet -------------- next part -------------- A non-text attachment was scrubbed... Name: bugs-2008-09-15.csv Type: application/octet-stream Size: 2147 bytes Desc: bugs-2008-09-15.csv URL: From AHKumar at odu.edu Mon Sep 15 08:43:36 2008 From: AHKumar at odu.edu (Kumar, Amit H.) Date: Mon, 15 Sep 2008 11:43:36 -0400 Subject: [ofa-general] How to Interpret MTU reported by "ibv_devinfo" vs "ifconfig ib0" In-Reply-To: <48CA9CF4.3070502@scalableinformatics.com> References: <48CA9CF4.3070502@scalableinformatics.com> Message-ID: Hello Everyone, Why do we see a difference in the MTU reported here by ibv_devinfo and "ifconfig ib0"? How do we interpret them? Also, is there a document where I can read in detail about IPoIB and the applications that benefit from it? In general I understand that socket-based applications can make use of IPoIB for better bandwidth, though NOT for better transport latency.
In short I am trying to understand the difference and advantage, for an Application using "Ethernet NIC" vs "InfiniBand HCA(IPoIB enabled)", apart from knowing that there is no advantage in terms of transport latency. Thank you!, Amit # ibv_devinfo hca_id: mthca0 fw_ver: 1.2.0 node_guid: 0006:6a00:9800:e8e7 sys_image_guid: 0006:6a00:9800:e8e7 vendor_id: 0x066a vendor_part_id: 25204 hw_ver: 0xA0 board_id: MT_0230000001 phys_port_cnt: 1 port: 1 state: PORT_ACTIVE (4) max_mtu: 2048 (4) active_mtu: 2048 (4) sm_lid: 2 port_lid: 29 port_lmc: 0x00 # ifconfig ib0 ib0 Link encap:InfiniBand HWaddr 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 inet addr:172.25.44.250 Bcast:172.25.44.255 Mask:255.255.255.0 inet6 addr: fe80::206:6a00:a000:e8e7/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:65520 Metric:1 RX packets:138863 errors:0 dropped:0 overruns:0 frame:0 TX packets:195994 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:256 RX bytes:11000200 (10.4 MiB) TX bytes:4321658633 (4.0 GiB) From dave.olson at qlogic.com Mon Sep 15 09:14:45 2008 From: dave.olson at qlogic.com (Dave Olson) Date: Mon, 15 Sep 2008 09:14:45 -0700 (PDT) Subject: [ofa-general] testing memory chins on ib cards In-Reply-To: <20080915095340.266B52E5C7C@f27.poczta.interia.pl> References: <20080915095340.266B52E5C7C@f27.poczta.interia.pl> Message-ID: On Mon, 15 Sep 2008, kovlensky at interia.pl wrote: | Hi all, | | I observe such problem spawning randomly on my nodes: | | kernel: ib_ipath 0000:03:00.0: RXE parity, Eager TID port 0 idx 0x33c expected 20447819, but got 20047819. | kernel: ib_ipath 0000:03:00.0: infinipath0: RXE parity Eager TID not recoverable, read 20047819, expected 20447819 | kernel: ib_ipath 0000:03:00.0: infinipath0: RXE parity, Eager TID error is not recoverable | | That's for qlogic cards, Mellanox ones seem to be much, much more stable. 
As I need to stress every card I just looking for a tool to make memory chips there under heavy load and, unfortunately, with not much luck. So what's the tool for diagnosing the cards? There is no memory on the card, this is on-chip memory. The only test tool for it is a QLogic internal manufacturing test tool. If you are seeing this more than once on the same card, you should get the card replaced by contacting QLogic support. Some memory errors are inevitable. We try to recover from them, but not all of them are recoverable (have a "known good" backup, or are known to be safe to rewrite and continue). Dave Olson dave.olson at qlogic.com From yossi.openib at gmail.com Mon Sep 15 09:51:12 2008 From: yossi.openib at gmail.com (Yossi Etigin) Date: Mon, 15 Sep 2008 19:51:12 +0300 Subject: ***SPAM*** Re: [ofa-general] [PATCH v2] ipiob: fix rtnl deadlock In-Reply-To: References: <4899CF0A.1060509@Voltaire.COM> <32cb786f0808081155o19f8fb9dm217cd6996dffa3e5@mail.gmail.com> <32cb786f0808090538j272842b1r5117547cccde0d06@mail.gmail.com> <32cb786f0808161218o417553b5w1738a517f0eb468a@mail.gmail.com> <48CA50E1.2090309@gmail.com> <48CA89E6.8030301@gmail.com> Message-ID: <48CE9280.4010806@gmail.com> Roland Dreier wrote: > > ipoib_stop() calls ipoib_ib_dev_down() which calls ipoib_mcast_dev_flush() > > which calls ipoib_mcast_free(), which calls ipoib_mcast_leave(). The latter > > calls ib_sa_free_multicast(), and this wait until the multicast > > completion handler finishes. This happens to be > > ipoib_mcast_join_complete(), which > > waits for the rtnl_lock(), whcih was already taken by ipoib_stop(). > > I see... I wonder why lockdep didn't warn about this in my testing. > Anyway, any ideas how we want to fix this? > > - R. > You queue the netif_carrier_on() stuff on ipoib_workqueue instead of running it (from ipoib_mcast_join_complete()). 
From ralph.campbell at qlogic.com Mon Sep 15 10:30:05 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Mon, 15 Sep 2008 10:30:05 -0700 Subject: [ofa-general] How to Interpret MTU reported by "ibv_devinfo" vs "ifconfig ib0" In-Reply-To: References: <48CA9CF4.3070502@scalableinformatics.com> Message-ID: <1221499805.30937.223.camel@chromite.mv.qlogic.com> The MTU reported by "ifconfig ib0" is the MTU used by the Linux TCP/IP network stack. The MTU reported by ibv_devinfo is the MTU that the hardware is capable of sending. This is limited to 4K by the Infiniband specification. The reason the network stack can have a higher MTU is that ib_ipoib is using the RC QP protocol to send IP messages larger than the hardware MTU. If you use "datagram" mode for ib_ipoib, you will see that the network stack MTU is limited to the hardware MTU - 4. On Mon, 2008-09-15 at 11:43 -0400, Kumar, Amit H. wrote: > Hello Everyone, > > Why do we see a difference in the MTU reported here by: ibv_devinfo and "ifconfig ib0" How do we interpret them? > > > Also Is there a document where I can read in detail about IPoIB and applications that benefit from them. > In general I understand that Socket based applications can make use of IPoIB for a better bandwidth, thought NOT for a better transport latency. > In short I am trying to understand the difference and advantage, for an Application using "Ethernet NIC" vs "InfiniBand HCA(IPoIB enabled)", apart from knowing that there is no advantage in terms of transport latency. 
> > > Thank you!, > Amit > > # ibv_devinfo > hca_id: mthca0 > fw_ver: 1.2.0 > node_guid: 0006:6a00:9800:e8e7 > sys_image_guid: 0006:6a00:9800:e8e7 > vendor_id: 0x066a > vendor_part_id: 25204 > hw_ver: 0xA0 > board_id: MT_0230000001 > phys_port_cnt: 1 > port: 1 > state: PORT_ACTIVE (4) > max_mtu: 2048 (4) > active_mtu: 2048 (4) > sm_lid: 2 > port_lid: 29 > port_lmc: 0x00 > > > > # ifconfig ib0 > ib0 Link encap:InfiniBand HWaddr 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 > inet addr:172.25.44.250 Bcast:172.25.44.255 Mask:255.255.255.0 > inet6 addr: fe80::206:6a00:a000:e8e7/64 Scope:Link > UP BROADCAST RUNNING MULTICAST MTU:65520 Metric:1 > RX packets:138863 errors:0 dropped:0 overruns:0 frame:0 > TX packets:195994 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:256 > RX bytes:11000200 (10.4 MiB) TX bytes:4321658633 (4.0 GiB) > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From ctung at neteffect.com Mon Sep 15 10:36:53 2008 From: ctung at neteffect.com (Chien Tung) Date: Mon, 15 Sep 2008 12:36:53 -0500 Subject: [ofa-general] [PATCH] RDMA/nes: 4 port 1G HP blade card support Message-ID: <200809151736.m8FHaroS010450@velma.neteffect.com> * Adding support for NetEffect 4 port 1G HP blade card. The mapping between physical port and MAC is different from the standup card. Signed-off-by: Chien Tung -- Roland, Please consider this for 2.6.27. It has been applied and tested against 2.6.27-rc5. 
drivers/infiniband/hw/nes/nes.c | 29 +++++++++++++--- drivers/infiniband/hw/nes/nes_hw.c | 66 +++++++++++++++++++++++++++-------- drivers/infiniband/hw/nes/nes_hw.h | 1 + 3 files changed, 76 insertions(+), 20 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes.c b/drivers/infiniband/hw/nes/nes.c index b0cab64..a539685 100644 --- a/drivers/infiniband/hw/nes/nes.c +++ b/drivers/infiniband/hw/nes/nes.c @@ -562,7 +562,26 @@ static int __devinit nes_probe(struct pci_dev *pcidev, const struct pci_device_i nesdev->nesadapter->pd_config_base[PCI_FUNC(nesdev->pcidev->devfn)]; */ nesdev->base_doorbell_index = 1; nesdev->doorbell_start = nesdev->nesadapter->doorbell_start; - nesdev->mac_index = PCI_FUNC(nesdev->pcidev->devfn) % nesdev->nesadapter->port_count; + if (nesdev->nesadapter->phy_type[0] == NES_PHY_TYPE_PUMA_1G) { + switch (PCI_FUNC(nesdev->pcidev->devfn) % + nesdev->nesadapter->port_count) { + case 1: + nesdev->mac_index = 2; + break; + case 2: + nesdev->mac_index = 1; + break; + case 3: + nesdev->mac_index = 3; + break; + case 0: + default: + nesdev->mac_index = 0; + } + } else { + nesdev->mac_index = PCI_FUNC(nesdev->pcidev->devfn) % + nesdev->nesadapter->port_count; + } tasklet_init(&nesdev->dpc_tasklet, nes_dpc, (unsigned long)nesdev); @@ -581,7 +600,7 @@ static int __devinit nes_probe(struct pci_dev *pcidev, const struct pci_device_i nesdev->int_req = (0x101 << PCI_FUNC(nesdev->pcidev->devfn)) | (1 << (PCI_FUNC(nesdev->pcidev->devfn)+16)); if (PCI_FUNC(nesdev->pcidev->devfn) < 4) { - nesdev->int_req |= (1 << (PCI_FUNC(nesdev->pcidev->devfn)+24)); + nesdev->int_req |= (1 << (PCI_FUNC(nesdev->mac_index)+24)); } /* TODO: This really should be the first driver to load, not function 0 */ @@ -772,14 +791,14 @@ static ssize_t nes_show_adapter(struct device_driver *ddp, char *buf) list_for_each_entry(nesdev, &nes_dev_list, list) { if (i == ee_flsh_adapter) { - devfn = nesdev->nesadapter->devfn; - bus_number = nesdev->nesadapter->bus_number; + devfn = 
nesdev->pcidev->devfn; + bus_number = nesdev->pcidev->bus->number; break; } i++; } - return snprintf(buf, PAGE_SIZE, "%x:%x", bus_number, devfn); + return snprintf(buf, PAGE_SIZE, "%x:%x\n", bus_number, devfn); } static ssize_t nes_store_adapter(struct device_driver *ddp, diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c index 1513d40..bdd98e6 100644 --- a/drivers/infiniband/hw/nes/nes_hw.c +++ b/drivers/infiniband/hw/nes/nes_hw.c @@ -61,7 +61,7 @@ u32 int_mod_cq_depth_1; static void nes_cqp_ce_handler(struct nes_device *nesdev, struct nes_hw_cq *cq); static void nes_init_csr_ne020(struct nes_device *nesdev, u8 hw_rev, u8 port_count); static int nes_init_serdes(struct nes_device *nesdev, u8 hw_rev, u8 port_count, - u8 OneG_Mode); + struct nes_adapter *nesadapter, u8 OneG_Mode); static void nes_nic_napi_ce_handler(struct nes_device *nesdev, struct nes_hw_nic_cq *cq); static void nes_process_aeq(struct nes_device *nesdev, struct nes_hw_aeq *aeq); static void nes_process_ceq(struct nes_device *nesdev, struct nes_hw_ceq *ceq); @@ -292,9 +292,6 @@ struct nes_adapter *nes_init_adapter(struct nes_device *nesdev, u8 hw_rev) { if ((port_count = nes_reset_adapter_ne020(nesdev, &OneG_Mode)) == 0) return NULL; - if (nes_init_serdes(nesdev, hw_rev, port_count, OneG_Mode)) - return NULL; - nes_init_csr_ne020(nesdev, hw_rev, port_count); max_qp = nes_read_indexed(nesdev, NES_IDX_QP_CTX_SIZE); nes_debug(NES_DBG_INIT, "QP_CTX_SIZE=%u\n", max_qp); @@ -353,6 +350,19 @@ struct nes_adapter *nes_init_adapter(struct nes_device *nesdev, u8 hw_rev) { nes_debug(NES_DBG_INIT, "Allocating new nesadapter @ %p, size = %u (actual size = %u).\n", nesadapter, (u32)sizeof(struct nes_adapter), adapter_size); + if (nes_read_eeprom_values(nesdev, nesadapter)) { + printk(KERN_ERR PFX "Unable to read EEPROM data.\n"); + kfree(nesadapter); + return NULL; + } + + if (nes_init_serdes(nesdev, hw_rev, port_count, nesadapter, + OneG_Mode)) { + kfree(nesadapter); + return 
NULL; + } + nes_init_csr_ne020(nesdev, hw_rev, port_count); + /* populate the new nesadapter */ nesadapter->devfn = nesdev->pcidev->devfn; nesadapter->bus_number = nesdev->pcidev->bus->number; @@ -468,20 +478,25 @@ struct nes_adapter *nes_init_adapter(struct nes_device *nesdev, u8 hw_rev) { /* setup port configuration */ if (nesadapter->port_count == 1) { - u32temp = 0x00000000; + nesadapter->log_port = 0x00000000; if (nes_drv_opt & NES_DRV_OPT_DUAL_LOGICAL_PORT) nes_write_indexed(nesdev, NES_IDX_TX_POOL_SIZE, 0x00000002); else nes_write_indexed(nesdev, NES_IDX_TX_POOL_SIZE, 0x00000003); } else { - if (nesadapter->port_count == 2) - u32temp = 0x00000044; - else - u32temp = 0x000000e4; + if (nesadapter->phy_type[0] == NES_PHY_TYPE_PUMA_1G) { + nesadapter->log_port = 0x000000D8; + } else { + if (nesadapter->port_count == 2) + nesadapter->log_port = 0x00000044; + else + nesadapter->log_port = 0x000000e4; + } nes_write_indexed(nesdev, NES_IDX_TX_POOL_SIZE, 0x00000003); } - nes_write_indexed(nesdev, NES_IDX_NIC_LOGPORT_TO_PHYPORT, u32temp); + nes_write_indexed(nesdev, NES_IDX_NIC_LOGPORT_TO_PHYPORT, + nesadapter->log_port); nes_debug(NES_DBG_INIT, "Probe time, LOG2PHY=%u\n", nes_read_indexed(nesdev, NES_IDX_NIC_LOGPORT_TO_PHYPORT)); @@ -706,23 +721,43 @@ static unsigned int nes_reset_adapter_ne020(struct nes_device *nesdev, u8 *OneG_ * nes_init_serdes */ static int nes_init_serdes(struct nes_device *nesdev, u8 hw_rev, u8 port_count, - u8 OneG_Mode) + struct nes_adapter *nesadapter, u8 OneG_Mode) { int i; u32 u32temp; + u32 serdes_common_control; if (hw_rev != NE020_REV) { /* init serdes 0 */ nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_CDR_CONTROL0, 0x000000FF); - if (!OneG_Mode) + if (nesadapter->phy_type[0] == NES_PHY_TYPE_PUMA_1G) { + serdes_common_control = nes_read_indexed(nesdev, + NES_IDX_ETH_SERDES_COMMON_CONTROL0); + serdes_common_control |= 0x000000100; + nes_write_indexed(nesdev, + NES_IDX_ETH_SERDES_COMMON_CONTROL0, + serdes_common_control); + } else if 
(!OneG_Mode) { nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_TX_HIGHZ_LANE_MODE0, 0x11110000); - if (port_count > 1) { + } + if (((port_count > 1) && + (nesadapter->phy_type[0] != NES_PHY_TYPE_PUMA_1G)) || + ((port_count > 2) && + (nesadapter->phy_type[0] == NES_PHY_TYPE_PUMA_1G))) { /* init serdes 1 */ nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_CDR_CONTROL1, 0x000000FF); - if (!OneG_Mode) + if (nesadapter->phy_type[0] == NES_PHY_TYPE_PUMA_1G) { + serdes_common_control = nes_read_indexed(nesdev, + NES_IDX_ETH_SERDES_COMMON_CONTROL1); + serdes_common_control |= 0x000000100; + nes_write_indexed(nesdev, + NES_IDX_ETH_SERDES_COMMON_CONTROL1, + serdes_common_control); + } else if (!OneG_Mode) { nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_TX_HIGHZ_LANE_MODE1, 0x11110000); } + } } else { /* init serdes 0 */ nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0, 0x00000008); @@ -2258,7 +2293,8 @@ static void nes_process_mac_intr(struct nes_device *nesdev, u32 mac_number) spin_unlock_irqrestore(&nesadapter->phy_lock, flags); } /* read the PHY interrupt status register */ - if (nesadapter->OneG_Mode) { + if ((nesadapter->OneG_Mode) && + (nesadapter->phy_type[mac_index] != NES_PHY_TYPE_PUMA_1G)) { do { nes_read_1G_phy_reg(nesdev, 0x1a, nesadapter->phy_index[mac_index], &phy_data); diff --git a/drivers/infiniband/hw/nes/nes_hw.h b/drivers/infiniband/hw/nes/nes_hw.h index 7b81e0a..fc0f063 100644 --- a/drivers/infiniband/hw/nes/nes_hw.h +++ b/drivers/infiniband/hw/nes/nes_hw.h @@ -1100,6 +1100,7 @@ struct nes_adapter { u8 mac_sw_state[4]; u8 mac_link_down[4]; u8 phy_type[4]; + u8 log_port; /* PCI information */ unsigned int devfn; From ctung at neteffect.com Mon Sep 15 10:36:53 2008 From: ctung at neteffect.com (Chien Tung) Date: Mon, 15 Sep 2008 12:36:53 -0500 Subject: [ofa-general] [PATCH] RDMA/nes: client side QP destroy Message-ID: <200809151736.m8FHarpC010448@velma.neteffect.com> Author: Faisal Latif * Fixed QP not destroyed properly on the client. 
* Misc cleanup in nes_cm.c patch verified with rping. Signed-off-by: Faisal Latif -- Roland, Please consider this for 2.6.27. It has been applied and tested against 2.6.27-rc5. drivers/infiniband/hw/nes/nes_cm.c | 20 +++++++------------- 1 files changed, 7 insertions(+), 13 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c index 9f0b964..8793aa4 100644 --- a/drivers/infiniband/hw/nes/nes_cm.c +++ b/drivers/infiniband/hw/nes/nes_cm.c @@ -1145,7 +1145,7 @@ static int rem_ref_cm_node(struct nes_cm_core *cm_core, struct nes_timer_entry *recv_entry; struct iw_cm_id *cm_id; struct list_head *list_core, *list_node_temp; - struct nes_qp *nesqp; + struct nes_qp *nesqp = NULL; if (!cm_node) return -EINVAL; @@ -1826,7 +1826,7 @@ static struct nes_cm_listener *mini_cm_listen(struct nes_cm_core *cm_core, /** * mini_cm_connect - make a connection node with params */ -struct nes_cm_node *mini_cm_connect(struct nes_cm_core *cm_core, +static struct nes_cm_node *mini_cm_connect(struct nes_cm_core *cm_core, struct nes_vnic *nesvnic, u16 private_data_len, void *private_data, struct nes_cm_info *cm_info) { @@ -1835,7 +1835,7 @@ struct nes_cm_node *mini_cm_connect(struct nes_cm_core *cm_core, struct nes_cm_listener *loopbackremotelistener; struct nes_cm_node *loopbackremotenode; struct nes_cm_info loopback_cm_info; - u16 mpa_frame_size = sizeof(struct ietf_mpa_frame) + private_data_len; + u16 mpa_frame_size = 0; struct ietf_mpa_frame *mpa_frame = NULL; /* create a CM connection node */ @@ -1847,7 +1847,8 @@ struct nes_cm_node *mini_cm_connect(struct nes_cm_core *cm_core, mpa_frame->flags = IETF_MPA_FLAGS_CRC; mpa_frame->rev = IETF_MPA_VERSION; mpa_frame->priv_data_len = htons(private_data_len); - + mpa_frame_size = sizeof(struct ietf_mpa_frame) + + private_data_len; /* set our node side to client (active) side */ cm_node->tcp_cntxt.client = 1; cm_node->tcp_cntxt.rcv_wscale = NES_CM_DEFAULT_RCV_WND_SCALE; @@ -1956,13 +1957,6 @@ static int 
mini_cm_reject(struct nes_cm_core *cm_core, return ret; cleanup_retrans_entry(cm_node); cm_node->state = NES_CM_STATE_CLOSED; - ret = send_fin(cm_node, NULL); - - if (cm_node->accept_pend) { - BUG_ON(!cm_node->listener); - atomic_dec(&cm_node->listener->pend_accepts_cnt); - BUG_ON(atomic_read(&cm_node->listener->pend_accepts_cnt) < 0); - } ret = send_reset(cm_node, NULL); return ret; @@ -2383,6 +2377,7 @@ static int nes_cm_disconn_true(struct nes_qp *nesqp) atomic_inc(&cm_disconnects); cm_event.event = IW_CM_EVENT_DISCONNECT; if (last_ae == NES_AEQE_AEID_LLP_CONNECTION_RESET) { + issued_disconnect_reset = 1; cm_event.status = IW_CM_EVENT_STATUS_RESET; nes_debug(NES_DBG_CM, "Generating a CM " "Disconnect Event (status reset) for " @@ -2508,7 +2503,6 @@ static int nes_disconnect(struct nes_qp *nesqp, int abrupt) nes_debug(NES_DBG_CM, "Call close API\n"); g_cm_core->api->close(g_cm_core, nesqp->cm_node); - nesqp->cm_node = NULL; } return ret; @@ -2837,6 +2831,7 @@ int nes_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param) cm_node->apbvt_set = 1; nesqp->cm_node = cm_node; cm_node->nesqp = nesqp; + nes_add_ref(&nesqp->ibqp); return 0; } @@ -3167,7 +3162,6 @@ static void cm_event_connect_error(struct nes_cm_event *event) if (ret) printk(KERN_ERR "%s[%u] OFA CM event_handler returned, " "ret=%d\n", __func__, __LINE__, ret); - nes_rem_ref(&nesqp->ibqp); cm_id->rem_ref(cm_id); rem_ref_cm_node(event->cm_node->cm_core, event->cm_node); From yossi.openib at gmail.com Mon Sep 15 10:54:41 2008 From: yossi.openib at gmail.com (Yossi Etigin) Date: Mon, 15 Sep 2008 20:54:41 +0300 Subject: ***SPAM*** Re: Fwd: [ofa-general] [PATCH] ipoib: fix hang while bringing down uninitialized interface In-Reply-To: <48CD2DD6.2020404@mellanox.co.il> References: <48C7DA7B.3050706@gmail.com> <48C942BE.7010606@gmail.com> <48CD2DD6.2020404@mellanox.co.il> Message-ID: <48CEA161.3090302@gmail.com> It's already there in OFED. 
Tziporet Koren wrote: > Roland Dreier wrote: >> > Commit >> http://www.openfabrics.org/git/?p=ofed_1_4/linux-2.6.git;a=commit;h=57ce41d1d18279cc90223f3deadca70c7de1cfca >> >> > put the bug in ipoib, but maybe this causes a hang only in recent >> kernels >> > due to modifications in timer code. >> >> So it looks like not a regression from 2.6.26... I'll queue this for >> 2.6.28 >> >> > Yossi - can you create us a patch for OFEd 1.4 since its not going to > 2.6.27 > > Thanks, > Tziporet > > From rdreier at cisco.com Mon Sep 15 10:57:12 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 15 Sep 2008 10:57:12 -0700 Subject: [ofa-general] Re: [PATCH] RDMA/nes: client side QP destroy In-Reply-To: <200809151736.m8FHarpC010448@velma.neteffect.com> (Chien Tung's message of "Mon, 15 Sep 2008 12:36:53 -0500") References: <200809151736.m8FHarpC010448@velma.neteffect.com> Message-ID: > Please consider this for 2.6.27. It has been applied and tested against 2.6.27-rc5. Need more info about the impact to consider merging this for 2.6.27 now. What is the user-visible impact of this bug? Is this a regression from 2.6.26? - R. From rdreier at cisco.com Mon Sep 15 10:59:50 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 15 Sep 2008 10:59:50 -0700 Subject: [ofa-general] [PATCH] RDMA/nes: 4 port 1G HP blade card support In-Reply-To: <200809151736.m8FHaroS010450@velma.neteffect.com> (Chien Tung's message of "Mon, 15 Sep 2008 12:36:53 -0500") References: <200809151736.m8FHaroS010450@velma.neteffect.com> Message-ID: > Please consider this for 2.6.27. It has been applied and tested against 2.6.27-rc5. Sorry, looks too big and risky for something that only adds new hardware support for this stage of 2.6.27. I'll queue for 2.6.28. - R. 
From ctung at NetEffect.com Mon Sep 15 11:04:40 2008 From: ctung at NetEffect.com (Chien Tung) Date: Mon, 15 Sep 2008 13:04:40 -0500 Subject: [ofa-general] RE: [PATCH] RDMA/nes: client side QP destroy In-Reply-To: References: <200809151736.m8FHarpC010448@velma.neteffect.com> Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC0848ADA8@venom2> > > Please consider this for 2.6.27. It has been applied and > tested against 2.6.27-rc5. > > Need more info about the impact to consider merging this for > 2.6.27 now. > What is the user-visible impact of this bug? Is this a > regression from 2.6.26? Without this patch, rping would not exit on the client side. I had left some code behind for the CM patch and this makes up for it. Totally my fault. Chien From yossi.openib at gmail.com Mon Sep 15 11:18:04 2008 From: yossi.openib at gmail.com (Yossi Etigin) Date: Mon, 15 Sep 2008 21:18:04 +0300 Subject: [ofa-general] ***SPAM*** [PATCH v2] ipoib: fix hang while bringing down uninitialized interface Message-ID: <48CEA6DC.9000904@gmail.com> Fix bug #1172: If a pkey for an interface is not found during initialization, then poll_timer is left uninitialized. When the device is brought down, ipoib tries to del_timer_sync() it. This call hangs in an infinite loop in lock_timer_base(), because timer_base is NULL. We should check whether the timer was really initialized. Changes from v1: - handle a case when ipoib_ib_dev_stop() is called twice on the same dev->priv - zero the timer after its deletion. Signed-off-by: Yossi Etigin PS Vlad, please replace the patch ipoib_0400_fix_hang_while_bringing_down by this one. 
-- Index: b/drivers/infiniband/ulp/ipoib/ipoib_ib.c =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2008-09-09 19:54:11.000000000 +0300 +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2008-09-15 20:58:50.000000000 +0300 @@ -850,7 +850,12 @@ int ipoib_ib_dev_stop(struct net_device ipoib_dbg(priv, "All sends and receives done.\n"); timeout: - del_timer_sync(&priv->poll_timer); + /* Make sure the timer was initialized */ + if (priv->poll_timer.function) { + del_timer_sync(&priv->poll_timer); + memset(&priv->poll_timer, 0, sizeof priv->poll_timer); + } + qp_attr.qp_state = IB_QPS_RESET; if (ib_modify_qp(priv->qp, &qp_attr, IB_QP_STATE)) ipoib_warn(priv, "Failed to modify QP to RESET state\n"); From rdreier at cisco.com Mon Sep 15 11:25:05 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 15 Sep 2008 11:25:05 -0700 Subject: [ofa-general] Re: [PATCH] RDMA/nes: client side QP destroy In-Reply-To: <5E701717F2B2ED4EA60F87C8AA57B7CC0848ADA8@venom2> (Chien Tung's message of "Mon, 15 Sep 2008 13:04:40 -0500") References: <200809151736.m8FHarpC010448@velma.neteffect.com> <5E701717F2B2ED4EA60F87C8AA57B7CC0848ADA8@venom2> Message-ID: > Without this patch, rping would not exit on the client side. I had left > some code behind for the CM patch and this makes up for it. Totally my > fault. And this bug was introduced with the CM rework that went in after 2.6.26? So it is a post-2.6.26 regression? From ctung at NetEffect.com Mon Sep 15 11:46:54 2008 From: ctung at NetEffect.com (Chien Tung) Date: Mon, 15 Sep 2008 13:46:54 -0500 Subject: [ofa-general] RE: [PATCH] RDMA/nes: client side QP destroy In-Reply-To: References: <200809151736.m8FHarpC010448@velma.neteffect.com><5E701717F2B2ED4EA60F87C8AA57B7CC0848ADA8@venom2> Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC0848ADC0@venom2> > And this bug was introduced with the CM rework that went in > after 2.6.26? So it is a post-2.6.26 regression? yes. 
From rdreier at cisco.com Mon Sep 15 11:53:06 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 15 Sep 2008 11:53:06 -0700 Subject: [ofa-general] Re: [PATCH] RDMA/nes: client side QP destroy In-Reply-To: <200809151736.m8FHarpC010448@velma.neteffect.com> (Chien Tung's message of "Mon, 15 Sep 2008 12:36:53 -0500") References: <200809151736.m8FHarpC010448@velma.neteffect.com> Message-ID: > * Fixed QP not destroyed properly on the client. > * Misc cleanup in nes_cm.c can you resend this patch split up so that the minimal fix (for 2.6.27) is one patch, and the misc cleanup is a second patch (for 2.6.28)? We need the smallest safest possible patch for this stage of 2.6.27. Thanks - R. From rdreier at cisco.com Mon Sep 15 12:01:38 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 15 Sep 2008 12:01:38 -0700 Subject: [ofa-general] [PATCH v2] ipiob: fix rtnl deadlock In-Reply-To: <48CE9280.4010806@gmail.com> (Yossi Etigin's message of "Mon, 15 Sep 2008 19:51:12 +0300") References: <4899CF0A.1060509@Voltaire.COM> <32cb786f0808081155o19f8fb9dm217cd6996dffa3e5@mail.gmail.com> <32cb786f0808090538j272842b1r5117547cccde0d06@mail.gmail.com> <32cb786f0808161218o417553b5w1738a517f0eb468a@mail.gmail.com> <48CA50E1.2090309@gmail.com> <48CA89E6.8030301@gmail.com> <48CE9280.4010806@gmail.com> Message-ID: > You queue the netif_carrier_on() stuff on ipoib_workqueue instead of > running it (from ipoib_mcast_join_complete()). I don't get it. How do you flush that workqueue on the device cleanup path without deadlocking on rtnl in the same way? - R. 
From yossi.openib at gmail.com Mon Sep 15 12:11:41 2008 From: yossi.openib at gmail.com (Yossi Etigin) Date: Mon, 15 Sep 2008 22:11:41 +0300 Subject: ***SPAM*** Re: [ofa-general] [PATCH v2] ipiob: fix rtnl deadlock In-Reply-To: References: <4899CF0A.1060509@Voltaire.COM> <32cb786f0808081155o19f8fb9dm217cd6996dffa3e5@mail.gmail.com> <32cb786f0808090538j272842b1r5117547cccde0d06@mail.gmail.com> <32cb786f0808161218o417553b5w1738a517f0eb468a@mail.gmail.com> <48CA50E1.2090309@gmail.com> <48CA89E6.8030301@gmail.com> <48CE9280.4010806@gmail.com> Message-ID: <48CEB36D.7070800@gmail.com> because you flush it on module unload, and not interface stop. module unload does not take rtnl_lock, but interface stop does. Roland Dreier wrote: > > You queue the netif_carrier_on() stuff on ipoib_workqueue instead of > > running it (from ipoib_mcast_join_complete()). > > I don't get it. How do you flush that workqueue on the device cleanup > path without deadlocking on rtnl in the same way? > > - R. > From rdreier at cisco.com Mon Sep 15 12:13:47 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 15 Sep 2008 12:13:47 -0700 Subject: [ofa-general] [PATCH v2] ipiob: fix rtnl deadlock In-Reply-To: <48CEB323.5040304@Voltaire.COM> (Yossi Etigin's message of "Mon, 15 Sep 2008 22:10:27 +0300") References: <4899CF0A.1060509@Voltaire.COM> <32cb786f0808081155o19f8fb9dm217cd6996dffa3e5@mail.gmail.com> <32cb786f0808090538j272842b1r5117547cccde0d06@mail.gmail.com> <32cb786f0808161218o417553b5w1738a517f0eb468a@mail.gmail.com> <48CA50E1.2090309@gmail.com> <48CA89E6.8030301@gmail.com> <48CE9280.4010806@gmail.com> <48CEB323.5040304@Voltaire.COM> Message-ID: > because you flush it on module unload, and not interface stop. > module unload does not take rtnl_lock, but interface stop does. Oh, I see. That's clever and should work I guess. - R. 
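The locking problem discussed in this thread can be illustrated with a small model. The sketch below is a single-threaded toy in plain C, not kernel code: toy_lock stands in for rtnl_lock, toy_workqueue for ipoib_workqueue, and every function name is hypothetical. It assumes that a failed trylock marks the point where a real thread would block forever, and that queued work runs only after the stop path has released the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy, single-threaded model. None of these names are kernel APIs. */
#define MAX_WORK 8

typedef void (*work_fn)(void);

struct toy_lock { bool held; };
struct toy_workqueue { work_fn items[MAX_WORK]; size_t count; };

static struct toy_lock rtnl = { false };      /* stands in for rtnl_lock */
static struct toy_workqueue wq = { {0}, 0 };  /* stands in for ipoib_workqueue */
static bool carrier_on = false;

/* Returns false at the point where a real thread would block. */
static bool toy_trylock(struct toy_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

static void toy_unlock(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Deferred work: takes the lock itself, but runs outside the stop path. */
static void carrier_on_work(void)
{
    bool got = toy_trylock(&rtnl);
    assert(got);
    (void)got;
    carrier_on = true;
    toy_unlock(&rtnl);
}

/* Problematic pattern: the completion handler takes the lock directly. */
static bool join_complete_direct(void)
{
    if (!toy_trylock(&rtnl))
        return false; /* would block while the waiter holds the lock */
    carrier_on = true;
    toy_unlock(&rtnl);
    return true;
}

/* Suggested pattern: the handler only queues work and returns. */
static bool join_complete_deferred(void)
{
    if (wq.count == MAX_WORK)
        return false;
    wq.items[wq.count++] = carrier_on_work;
    return true;
}

/* Interface stop: holds the lock while waiting for the handler to finish. */
static bool interface_stop(bool (*handler)(void))
{
    bool got = toy_trylock(&rtnl);
    assert(got);
    (void)got;
    bool handler_finished = handler();
    toy_unlock(&rtnl);
    return handler_finished;
}

/* Drain the queue; here this happens only after the lock was released. */
static size_t run_queued_work(void)
{
    size_t n = wq.count;
    for (size_t i = 0; i < n; i++)
        wq.items[i]();
    wq.count = 0;
    return n;
}
```

In this model, interface_stop(join_complete_direct) reports that the handler could not finish, while interface_stop(join_complete_deferred) succeeds and the carrier update happens later in run_queued_work().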
From chu11 at llnl.gov Mon Sep 15 12:20:48 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 15 Sep 2008 12:20:48 -0700 Subject: [ofa-general] [OpenSM][0/18] - Routing Chaining Message-ID: <1221506448.6274.32.camel@cardanus.llnl.gov> Hey Sasha, As we've discussed before, we wanted to put routing chaining into opensm. Here is a patch series to support it. For others on the list, routing chaining is the ability to configure the order in which routing algorithms are applied in opensm, i.e. -R ftree,updn,minhop Try using ftree routing. If ftree fails, try updn. If updn fails, try minhop. In order to get this done, some rearchitecture of the routing code had to be done b/c there is no longer an assumption that only one routing engine can be specified. Here's a summary of the overall rearchitecture. osm_ucast defaults to minhop - The current code automatically defaulted to minhop if anything in the selected routing engine failed. Naturally this had to be changed for routing chaining. I moved minhop out of the ucast_mgr code to make it its own routing engine instead. osm_ucast assumption on routing failures - The current code defaulted to minhop if anything in the selected routing engine failed. Because of this some routing engines (most notably "file" routing) intentionally "failed" when it wanted default to some portion of minhop behavior. All routing behavior had to be moved into routing engines to have the routing engines fully fail/succeed on their own. updn routing - currently utilizes the minhop build_fwd_tables but minhop's code assumes if build_lid_matrices is not-null, it is in "up/dn routing mode" instead of "minhop mode". Perfectly fine when you can specify max of one routing engine, but needs to be abstracted out of minhop so up/dn is independent in its routing "attempt" in the chain. dor routing "dependency" on ucast_mgr - the is_dor flag was checked/determined inside the ucast_mgr. 
Dor routing had to be "split out" of the ucast manager so its routing engine is independent of another routing engine's "attempt" in the chain. minhop routing assumed to never fail - Currently minhop routing cannot "fail". So if someone wanted to put minhop into the middle of a routing chain, it would make no sense. I assume this was based on legacy, when the minhop algorithm did not have options like "guid_routing_order_file" that could be parsed incorrectly. So I made changes to allow minhop to have options passed to it that allow it to "fail" or "move on no matter what". Subsequently, if all routing chaining inputs from the user fail, a bare bones "move on no matter what" minhop is executed. If no routing algorithm is specified, we still use minhop by default. So, a lot of rearchitecture was done and a lot of cleanup was done as well. Some bug fixes along the way too. Naturally, there may be some style differences and some code efficiencies I just don't see right now. I may have missed something in the routing rearchitecture in part 2. But at the core, it seems to work :-) I've currently only tested against ibsim, not a real cluster. Hope to do that later on. Please let me know what you think. Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory From chu11 at llnl.gov Mon Sep 15 12:21:45 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 15 Sep 2008 12:21:45 -0700 Subject: [ofa-general] [OpenSM][1/18] - Routing Chaining Message-ID: <1221506505.6274.34.camel@cardanus.llnl.gov> split off minhop code into own files out of osm_ucast_mgr.c. Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed...
Name: 0001-move-minhop-routing-from-osm_ucast_mgr-to-own-files.patch Type: text/x-patch Size: 44748 bytes Desc: not available URL: From chu11 at llnl.gov Mon Sep 15 12:22:10 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 15 Sep 2008 12:22:10 -0700 Subject: [ofa-general] [OpenSM][2/18] - Routing Chaining Message-ID: <1221506530.6274.35.camel@cardanus.llnl.gov> add osm_ucast_lash.h, osm_ucast_updn.h, osm_ucast_file.h, and osm_ucast_ftree.h for consistency. Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0002-add-routing-header-files-for-consistency.patch Type: text/x-patch Size: 12610 bytes Desc: not available URL: From chu11 at llnl.gov Mon Sep 15 12:22:29 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 15 Sep 2008 12:22:29 -0700 Subject: [ofa-general] [OpenSM][3/18] - Routing Chaining Message-ID: <1221506549.6274.37.camel@cardanus.llnl.gov> add and use a osm_ucast_minhop_setup() call for similarity to other routing engines. Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0003-add-osm_ucast_minhop_setup-function.patch Type: text/x-patch Size: 3424 bytes Desc: not available URL: From chu11 at llnl.gov Mon Sep 15 12:22:49 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 15 Sep 2008 12:22:49 -0700 Subject: [ofa-general] [OpenSM][4/18] - Routing Chaining Message-ID: <1221506569.6274.39.camel@cardanus.llnl.gov> split off dor into its own routing engine like other algorithms. Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 0004-add-dor-routing-files.patch Type: text/x-patch Size: 6983 bytes Desc: not available URL: From chu11 at llnl.gov Mon Sep 15 12:24:19 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 15 Sep 2008 12:24:19 -0700 Subject: [ofa-general] [OpenSM][5/18] - Routing Chaining Message-ID: <1221506659.6274.42.camel@cardanus.llnl.gov> move is_dor out of osm_ucast_mgr_t into dor routing engine. Makes a common function called osm_ucast_minhop_and_dor_build_fwd_tables() which takes some flags so appropriate routing engines can specify behavior they'd like. Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0005-move-dor-recognition-from-flag-to-callback.patch Type: text/x-patch Size: 6897 bytes Desc: not available URL: From chu11 at llnl.gov Mon Sep 15 12:25:05 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 15 Sep 2008 12:25:05 -0700 Subject: [ofa-general] [OpenSM][6/18] - Routing Chaining Message-ID: <1221506705.6274.45.camel@cardanus.llnl.gov> move port_order_list out of osm_ucast_mgr_t into minhop routing engine. Unlike the is_dor flag, this one wasn't routing necessary, but made things easier for later. Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0006-move-port_order_list-from-osm_ucast_mgr-into-minhop.patch Type: text/x-patch Size: 4825 bytes Desc: not available URL: From chu11 at llnl.gov Mon Sep 15 12:25:28 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 15 Sep 2008 12:25:28 -0700 Subject: [ofa-general] [OpenSM][7/18] - Routing Chaining Message-ID: <1221506728.6274.46.camel@cardanus.llnl.gov> move some_hop_count_set out of osm_ucast_mgr_t into minhop routing engine. 
Unlike the is_dor flag, this one wasn't routing necessary, but made things easier for later. Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0007-move-some_hop_count_set-from-osm_ucast_mgr-to-minhop.patch Type: text/x-patch Size: 4992 bytes Desc: not available URL: From chu11 at llnl.gov Mon Sep 15 12:28:08 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 15 Sep 2008 12:28:08 -0700 Subject: [ofa-general] [OpenSM][8/18] - Routing Chaining Message-ID: <1221506888.6274.54.camel@cardanus.llnl.gov> make all minhop code take osm_opensm_t as a parameter instead of osm_ucast_mgr_t for consistency to other routing engines. Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0008-rearchitect-all-minhop-to-take-osm_opensm_t-as-param.patch Type: text/x-patch Size: 22495 bytes Desc: not available URL: From chu11 at llnl.gov Mon Sep 15 12:28:13 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 15 Sep 2008 12:28:13 -0700 Subject: [ofa-general] [OpenSM][9/18] - Routing Chaining Message-ID: <1221506893.6274.55.camel@cardanus.llnl.gov> make file routing engine handle default/minhop conditions instead of letting osm_ucast_mgr code handle the defaults. This is so when file routing returns a -1, it's because it's really failed. Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 0009-handle-minhop-lid_matrices-and-fwd_tables-in-file-ro.patch Type: text/x-patch Size: 5462 bytes Desc: not available URL: From chu11 at llnl.gov Mon Sep 15 12:28:15 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 15 Sep 2008 12:28:15 -0700 Subject: [ofa-general] [OpenSM][10/18] - Routing Chaining Message-ID: <1221506895.6274.56.camel@cardanus.llnl.gov> remove minhop assumptions of when it might be running in updn mode by having updn have an explicit callback for build_fwd_tables and specify flags to indicate the behavior it wants from minhop. Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0010-remove-minhop-assumptions-of-being-in-updn-mode.patch Type: text/x-patch Size: 7008 bytes Desc: not available URL: From chu11 at llnl.gov Mon Sep 15 12:31:19 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 15 Sep 2008 12:31:19 -0700 Subject: [ofa-general] [OpenSM][11/18] - Routing Chaining Message-ID: <1221507079.6274.62.camel@cardanus.llnl.gov> Mostly a cleanup patch. Make all previous patch's routing callback functions take a void * instead of osm_opensm_t * for consistency to other routing callbacks. Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0011-all-build_lid_matrices-and-build_fwd_tables-takes-vo.patch Type: text/x-patch Size: 2957 bytes Desc: not available URL: From chu11 at llnl.gov Mon Sep 15 12:31:23 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 15 Sep 2008 12:31:23 -0700 Subject: [ofa-general] [OpenSM][12/18] - Routing Chaining Message-ID: <1221507083.6274.63.camel@cardanus.llnl.gov> make all routing engines always specify build_lid_matrices and build_fwd_tables callback. 
Do not assume you can default to minhop by not specifying a callback. This make "failures" between routing engines consistent. If any single callback to a routing engine fails, you know the routing engine really failed. Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0012-having-routing-engines-always-specify-lid-and-fwd-ca.patch Type: text/x-patch Size: 2836 bytes Desc: not available URL: From chu11 at llnl.gov Mon Sep 15 12:31:35 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 15 Sep 2008 12:31:35 -0700 Subject: [ofa-general] [OpenSM][13/18] - Routing Chaining Message-ID: <1221507095.6274.64.camel@cardanus.llnl.gov> always setup a routing engine, assume no default "fallthrough" minhop routing engine. On configured routing engine failure, do minhop as a last resort. Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0013-always-setup-a-routing-engine.patch Type: text/x-patch Size: 4385 bytes Desc: not available URL: From chu11 at llnl.gov Mon Sep 15 12:31:46 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 15 Sep 2008 12:31:46 -0700 Subject: [ofa-general] [OpenSM][14/18] - Routing Chaining Message-ID: <1221507106.6274.65.camel@cardanus.llnl.gov> allow minhop to fail with errors. Do "minhop but continue on with defaults if there are errors" call as last resort routing Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 0014-allow-minhop-routing-to-fail-on-errors.patch Type: text/x-patch Size: 6469 bytes Desc: not available URL: From chu11 at llnl.gov Mon Sep 15 12:32:56 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 15 Sep 2008 12:32:56 -0700 Subject: [ofa-general] [OpenSM][15/18] - Routing Chaining Message-ID: <1221507176.6274.70.camel@cardanus.llnl.gov> pass a struct osm_routing_engine to all routing engine setup functions to prepare for routing chaining. Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0015-pass-osm_routing_engine-to-routing-setup-functions.patch Type: text/x-patch Size: 11739 bytes Desc: not available URL: From chu11 at llnl.gov Mon Sep 15 12:34:02 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 15 Sep 2008 12:34:02 -0700 Subject: [ofa-general] [OpenSM][16/18] - Routing Chaining Message-ID: <1221507242.6274.74.camel@cardanus.llnl.gov> stick a *next pointer into struct osm_routing_engine. Rearchitect routing engine usage as a list instead of a single struct. Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0016-rearchitect-osm_routing_engine-as-a-list-data-struct.patch Type: text/x-patch Size: 8674 bytes Desc: not available URL: From chu11 at llnl.gov Mon Sep 15 12:34:08 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 15 Sep 2008 12:34:08 -0700 Subject: [ofa-general] [OpenSM][17/18] - Routing Chaining Message-ID: <1221507248.6274.75.camel@cardanus.llnl.gov> implement routing chaining (parse multiple routing chainings, put them into a list, add manpage entries, etc.) 
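Editorial sketch, not taken from the posted patches: the chaining behaviour this series describes (a list of engines linked by a next pointer, tried in order, with a bare-bones minhop as last resort) could look roughly like the C below. The names `struct routing_engine`, `run_chain`, `demo_chain`, and the stub engines are hypothetical stand-ins and are not OpenSM's actual types or functions; the attachments themselves were scrubbed from the archive, so the real implementation may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a routing engine entry; not OpenSM's real struct. */
struct routing_engine {
	const char *name;
	int (*route)(void *ctx);      /* returns 0 on success, -1 on failure */
	struct routing_engine *next;  /* next engine to try in the chain */
};

/* Stub engines for illustration: the first two fail, the third succeeds. */
static int ftree_stub(void *ctx)  { (void)ctx; return -1; }
static int updn_stub(void *ctx)   { (void)ctx; return -1; }
static int minhop_stub(void *ctx) { (void)ctx; return 0; }

/* Last-resort routing, assumed here to "move on no matter what". */
static int minhop_no_fail_stub(void *ctx) { (void)ctx; return 0; }

/*
 * Try each engine in order and return the name of the first that succeeds.
 * If every configured engine fails (or none is configured), run the
 * last-resort fallback instead.
 */
static const char *run_chain(struct routing_engine *head, void *ctx)
{
	struct routing_engine *re;

	for (re = head; re != NULL; re = re->next) {
		assert(re->route != NULL); /* every engine must supply a callback */
		if (re->route(ctx) == 0)
			return re->name;
	}
	minhop_no_fail_stub(ctx);
	return "minhop (last resort)";
}

/* Build the chain corresponding to "-R ftree,updn,minhop" and run it. */
static const char *demo_chain(void)
{
	static struct routing_engine minhop = { "minhop", minhop_stub, NULL };
	static struct routing_engine updn   = { "updn", updn_stub, &minhop };
	static struct routing_engine ftree  = { "ftree", ftree_stub, &updn };

	return run_chain(&ftree, NULL);
}
```

With these stubs, `demo_chain()` walks ftree, then updn, then minhop, and reports the first engine whose callback returns 0. Keeping the fallback outside the list means a user-configured minhop in the middle of the chain is allowed to fail like any other engine.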
Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0017-implement-routing-chaining.patch Type: text/x-patch Size: 6062 bytes Desc: not available URL: From chu11 at llnl.gov Mon Sep 15 12:34:10 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 15 Sep 2008 12:34:10 -0700 Subject: [ofa-general] [OpenSM][18/18] - Routing Chaining Message-ID: <1221507250.6274.76.camel@cardanus.llnl.gov> Messages like "fall back to default routing" no longer make sense. Tweak as necessary. Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0018-cleanup-comments-for-changes-to-routing.patch Type: text/x-patch Size: 3398 bytes Desc: not available URL: From AHKumar at odu.edu Mon Sep 15 12:50:18 2008 From: AHKumar at odu.edu (Kumar, Amit H.) Date: Mon, 15 Sep 2008 15:50:18 -0400 Subject: [ofa-general] How to Interpret MTU reported by "ibv_devinfo" vs "ifconfig ib0" In-Reply-To: <1221499805.30937.223.camel@chromite.mv.qlogic.com> References: <48CA9CF4.3070502@scalableinformatics.com> <1221499805.30937.223.camel@chromite.mv.qlogic.com> Message-ID: > > The MTU reported by "ifconfig ib0" is the MTU used by the Linux > TCP/IP network stack. The MTU reported by ibv_devinfo is the > MTU that the hardware is capable of sending. This is limited > to 4K by the Infiniband specification. The reason the network > stack can have a higher MTU is that ib_ipoib is using the RC > QP protocol to send IP messages larger than the hardware MTU. > If you use "datagram" mode for ib_ipoib, you will see that > the network stack MTU is limited to the hardware MTU - 4. Thank you Ralph!!! 
As far as common applications taking advantage of ib_ipoib Does it help using RC QP with a higher MTU than the hardware MTU? Does an Application, which uses Sockets API, by default make use of ib_ipoib, if it is enabled ? Is there any essential difference between IPoIB and ib_ipoib, or is it just a matter of usage ? Thank you, Amit > > > > Also Is there a document where I can read in detail about IPoIB and > applications that benefit from them. > > In general I understand that Socket based applications can make use > of IPoIB for a better bandwidth, thought NOT for a better transport > latency. > > In short I am trying to understand the difference and advantage, for > an Application using "Ethernet NIC" vs "InfiniBand HCA(IPoIB enabled)", > apart from knowing that there is no advantage in terms of transport > latency. > > From ctung at neteffect.com Mon Sep 15 12:58:02 2008 From: ctung at neteffect.com (Chien Tung) Date: Mon, 15 Sep 2008 14:58:02 -0500 Subject: [ofa-general] [PATCH] RDMA/nes: nes_cm.c cleanup Message-ID: <200809151958.m8FJw2sk012367@velma.neteffect.com> Author: Faisal Latif * Misc cleanup in nes_cm.c patch verified with rping. Signed-off-by: Faisal Latif -- Please queue for 2.6.28. 
 drivers/infiniband/hw/nes/nes_cm.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c
index 499d3cf..8793aa4 100644
--- a/drivers/infiniband/hw/nes/nes_cm.c
+++ b/drivers/infiniband/hw/nes/nes_cm.c
@@ -1145,7 +1145,7 @@ static int rem_ref_cm_node(struct nes_cm_core *cm_core,
 	struct nes_timer_entry *recv_entry;
 	struct iw_cm_id *cm_id;
 	struct list_head *list_core, *list_node_temp;
-	struct nes_qp *nesqp;
+	struct nes_qp *nesqp = NULL;

 	if (!cm_node)
 		return -EINVAL;
@@ -1826,7 +1826,7 @@ static struct nes_cm_listener *mini_cm_listen(struct nes_cm_core *cm_core,
 /**
  * mini_cm_connect - make a connection node with params
  */
-struct nes_cm_node *mini_cm_connect(struct nes_cm_core *cm_core,
+static struct nes_cm_node *mini_cm_connect(struct nes_cm_core *cm_core,
 	struct nes_vnic *nesvnic, u16 private_data_len,
 	void *private_data, struct nes_cm_info *cm_info)
 {
@@ -1835,7 +1835,7 @@ struct nes_cm_node *mini_cm_connect(struct nes_cm_core *cm_core,
 	struct nes_cm_listener *loopbackremotelistener;
 	struct nes_cm_node *loopbackremotenode;
 	struct nes_cm_info loopback_cm_info;
-	u16 mpa_frame_size = sizeof(struct ietf_mpa_frame) + private_data_len;
+	u16 mpa_frame_size = 0;
 	struct ietf_mpa_frame *mpa_frame = NULL;

 	/* create a CM connection node */
@@ -1847,7 +1847,8 @@ struct nes_cm_node *mini_cm_connect(struct nes_cm_core *cm_core,
 	mpa_frame->flags = IETF_MPA_FLAGS_CRC;
 	mpa_frame->rev = IETF_MPA_VERSION;
 	mpa_frame->priv_data_len = htons(private_data_len);
-
+	mpa_frame_size = sizeof(struct ietf_mpa_frame) +
+			private_data_len;
 	/* set our node side to client (active) side */
 	cm_node->tcp_cntxt.client = 1;
 	cm_node->tcp_cntxt.rcv_wscale = NES_CM_DEFAULT_RCV_WND_SCALE;

From ctung at neteffect.com Mon Sep 15 12:58:02 2008
From: ctung at neteffect.com (Chien Tung)
Date: Mon, 15 Sep 2008 14:58:02 -0500
Subject: [ofa-general] [PATCH v2] RDMA/nes: client side QP destroy
Message-ID: <200809151958.m8FJw2pY012365@velma.neteffect.com>

Author: Faisal Latif

* Fixed QP not destroyed properly on the client.

patch verified with rping.

Signed-off-by: Faisal Latif
--
Change from V1: moved cleanup code to its own patch for 2.6.28.

 drivers/infiniband/hw/nes/nes_cm.c |   11 ++---------
 1 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c
index 9f0b964..499d3cf 100644
--- a/drivers/infiniband/hw/nes/nes_cm.c
+++ b/drivers/infiniband/hw/nes/nes_cm.c
@@ -1956,13 +1956,6 @@ static int mini_cm_reject(struct nes_cm_core *cm_core,
 		return ret;
 	cleanup_retrans_entry(cm_node);
 	cm_node->state = NES_CM_STATE_CLOSED;
-	ret = send_fin(cm_node, NULL);
-
-	if (cm_node->accept_pend) {
-		BUG_ON(!cm_node->listener);
-		atomic_dec(&cm_node->listener->pend_accepts_cnt);
-		BUG_ON(atomic_read(&cm_node->listener->pend_accepts_cnt) < 0);
-	}

 	ret = send_reset(cm_node, NULL);
 	return ret;
@@ -2383,6 +2376,7 @@ static int nes_cm_disconn_true(struct nes_qp *nesqp)
 			atomic_inc(&cm_disconnects);
 			cm_event.event = IW_CM_EVENT_DISCONNECT;
 			if (last_ae == NES_AEQE_AEID_LLP_CONNECTION_RESET) {
+				issued_disconnect_reset = 1;
 				cm_event.status = IW_CM_EVENT_STATUS_RESET;
 				nes_debug(NES_DBG_CM, "Generating a CM "
 					"Disconnect Event (status reset) for "
@@ -2508,7 +2502,6 @@ static int nes_disconnect(struct nes_qp *nesqp, int abrupt)
 		nes_debug(NES_DBG_CM, "Call close API\n");

 		g_cm_core->api->close(g_cm_core, nesqp->cm_node);
-		nesqp->cm_node = NULL;
 	}

 	return ret;
@@ -2837,6 +2830,7 @@ int nes_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
 	cm_node->apbvt_set = 1;
 	nesqp->cm_node = cm_node;
 	cm_node->nesqp = nesqp;
+	nes_add_ref(&nesqp->ibqp);

 	return 0;
 }
@@ -3167,7 +3161,6 @@ static void cm_event_connect_error(struct nes_cm_event *event)
 	if (ret)
 		printk(KERN_ERR "%s[%u] OFA CM event_handler returned, "
 			"ret=%d\n", __func__, __LINE__, ret);
-	nes_rem_ref(&nesqp->ibqp);
 	cm_id->rem_ref(cm_id);
rem_ref_cm_node(event->cm_node->cm_core, event->cm_node); From ralph.campbell at qlogic.com Mon Sep 15 13:12:35 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Mon, 15 Sep 2008 13:12:35 -0700 Subject: [ofa-general] How to Interpret MTU reported by "ibv_devinfo" vs "ifconfig ib0" In-Reply-To: References: <48CA9CF4.3070502@scalableinformatics.com> <1221499805.30937.223.camel@chromite.mv.qlogic.com> Message-ID: <1221509555.30937.246.camel@chromite.mv.qlogic.com> On Mon, 2008-09-15 at 15:50 -0400, Kumar, Amit H. wrote: > > > > The MTU reported by "ifconfig ib0" is the MTU used by the Linux > > TCP/IP network stack. The MTU reported by ibv_devinfo is the > > MTU that the hardware is capable of sending. This is limited > > to 4K by the Infiniband specification. The reason the network > > stack can have a higher MTU is that ib_ipoib is using the RC > > QP protocol to send IP messages larger than the hardware MTU. > > If you use "datagram" mode for ib_ipoib, you will see that > > the network stack MTU is limited to the hardware MTU - 4. > > Thank you Ralph!!! As far as common applications taking advantage of ib_ipoib > Does it help using RC QP with a higher MTU than the hardware MTU? > > Does an Application, which uses Sockets API, by default make use of ib_ipoib, > if it is enabled ? > > Is there any essential difference between IPoIB and ib_ipoib, or is it just a matter of usage ? > > Thank you, > Amit IPoIB is ib_ipoib. The first is the name in the IB spec. the second is the name of the kernel module. ib_ipoib just looks like another sockets network device to Linux so Sockets API calls work normally (you just need to use the IP address of the ib0 device). The reason a larger network MTU helps is because the Linux network stack is more efficient when using larger MTUs. From AHKumar at odu.edu Mon Sep 15 13:16:07 2008 From: AHKumar at odu.edu (Kumar, Amit H.) 
Date: Mon, 15 Sep 2008 16:16:07 -0400
Subject: [ofa-general] How to Interpret MTU reported by "ibv_devinfo" vs "ifconfig ib0"
In-Reply-To: <1221509555.30937.246.camel@chromite.mv.qlogic.com>
References: <48CA9CF4.3070502@scalableinformatics.com>
 <1221499805.30937.223.camel@chromite.mv.qlogic.com>
 <1221509555.30937.246.camel@chromite.mv.qlogic.com>
Message-ID:

> IPoIB is ib_ipoib. The first is the name in the IB spec. the second is
> the name of the kernel module.
> ib_ipoib just looks like another sockets network device to Linux so
> Sockets API calls work normally (you just need to use the IP address
> of the ib0 device).
> The reason a larger network MTU helps is because the Linux network
> stack is more efficient when using larger MTUs.

Okay!! Thanks a bunch!!!

From yossi.openib at gmail.com Mon Sep 15 13:45:45 2008
From: yossi.openib at gmail.com (Yossi Etigin)
Date: Mon, 15 Sep 2008 23:45:45 +0300
Subject: [ofa-general] ***SPAM*** [PATCH] ipoib: fix deadlock between join completion handler and ipoib_stop
Message-ID: <48CEC979.8030506@gmail.com>

Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with
ipoib_stop(). We avoid it by scheduling the piece of code that takes
the lock on ipoib_workqueue instead of executing it directly.
The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down()
which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(),
which calls ipoib_mcast_leave(). The latter calls ib_sa_free_multicast(),
and this waits until the multicast completion handler finishes. This
handler is ipoib_mcast_join_complete(), which waits for the rtnl_lock(),
which was already taken by ipoib_stop().

Signed-off-by: Yossi Etigin

--

Index: b/drivers/infiniband/ulp/ipoib/ipoib.h
===================================================================
--- a/drivers/infiniband/ulp/ipoib/ipoib.h	2008-08-27 21:03:44.000000000 +0300
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h	2008-09-15 23:08:30.000000000 +0300
@@ -293,6 +293,7 @@ struct ipoib_dev_priv {

 	struct delayed_work pkey_poll_task;
 	struct delayed_work mcast_task;
+	struct work_struct broadcast_join_task;
 	struct work_struct flush_light;
 	struct work_struct flush_normal;
 	struct work_struct flush_heavy;
@@ -464,6 +465,7 @@ int ipoib_dev_init(struct net_device *de
 void ipoib_dev_cleanup(struct net_device *dev);

 void ipoib_mcast_join_task(struct work_struct *work);
+void ipoib_mcast_broadcast_join_task(struct work_struct *work);
 void ipoib_mcast_send(struct net_device *dev, void *mgid, struct sk_buff *skb);

 void ipoib_mcast_restart_task(struct work_struct *work);
Index: b/drivers/infiniband/ulp/ipoib/ipoib_main.c
===================================================================
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c	2008-09-08 20:14:08.000000000 +0300
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c	2008-09-15 23:07:45.000000000 +0300
@@ -1075,6 +1075,7 @@ static void ipoib_setup(struct net_devic

 	INIT_DELAYED_WORK(&priv->pkey_poll_task, ipoib_pkey_poll);
 	INIT_DELAYED_WORK(&priv->mcast_task, ipoib_mcast_join_task);
+	INIT_WORK(&priv->broadcast_join_task, ipoib_mcast_broadcast_join_task);
 	INIT_WORK(&priv->flush_light, ipoib_ib_dev_flush_light);
 	INIT_WORK(&priv->flush_normal, ipoib_ib_dev_flush_normal);
 	INIT_WORK(&priv->flush_heavy, ipoib_ib_dev_flush_heavy);
Index: b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===================================================================
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2008-09-15 23:02:42.000000000 +0300
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2008-09-15 23:37:41.000000000 +0300
@@ -389,6 +389,21 @@ static int ipoib_mcast_sendonly_join(str
 	return ret;
 }

+void ipoib_mcast_broadcast_join_task(struct work_struct *work)
+{
+	struct ipoib_dev_priv *priv = container_of(work, struct ipoib_dev_priv,
+						   broadcast_join_task);
+
+	/*
+	 * Take rtnl_lock to avoid racing with ipoib_stop()
+	 * and turning the carrier back on while a device
+	 * is being removed.
+	 */
+	rtnl_lock();
+	netif_carrier_on(priv->dev);
+	rtnl_unlock();
+}
+
 static int ipoib_mcast_join_complete(int status,
 				     struct ib_sa_multicast *multicast)
 {
@@ -415,16 +430,9 @@ static int ipoib_mcast_join_complete(int
 					   &priv->mcast_task, 0);
 		mutex_unlock(&mcast_mutex);

-		if (mcast == priv->broadcast) {
-			/*
-			 * Take RTNL lock here to avoid racing with
-			 * ipoib_stop() and turning the carrier back
-			 * on while a device is being removed.
-			 */
-			rtnl_lock();
-			netif_carrier_on(dev);
-			rtnl_unlock();
-		}
+		/* Would deadlock with ipoib_stop if rtnl_lock was taken */
+		if (mcast == priv->broadcast)
+			queue_work(ipoib_workqueue, &priv->broadcast_join_task);

 		return 0;
 	}

From rdreier at cisco.com Mon Sep 15 14:29:33 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 15 Sep 2008 14:29:33 -0700
Subject: [ofa-general] Re: [PATCH] mlx4/IB: Set the PRESENT bit in the physical buffer list to match the MTT format.
In-Reply-To: <20080915124753.GA13187@mellanox.co.il> (Vladimir Sokolovsky's message of "Mon, 15 Sep 2008 15:47:53 +0300")
References: <20080915124753.GA13187@mellanox.co.il>
Message-ID:

Thanks, applied. I moved the MLX4_MTT_FLAG_PRESENT define to a common
header rather than duplicating it in a different place, and updated the
changelog to talk about byte swapping too.

So now fast register is confirmed working with this and your previous
patch?

 - R.
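
Editorial sketch, not part of any posted patch: the ipoib deadlock fix earlier in this thread (Yossi Etigin's patch) moves the lock-taking step out of the join completion handler and onto a work queue. The single-threaded C model below only illustrates that ordering. It assumes a plain flag in place of rtnl_lock and a one-slot flag in place of ipoib_workqueue; all names (`struct model`, `join_complete_direct`, `join_complete_deferred`, `run_pending_work`, `stop_while_join_completes`) are hypothetical and no kernel API is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: flags stand in for rtnl_lock and the work queue. */
struct model {
	bool lock_held;      /* stand-in for rtnl_lock being held */
	bool carrier_on;     /* stand-in for the netif carrier state */
	bool work_pending;   /* one-slot stand-in for a work queue */
	bool would_deadlock; /* set when the handler needed a lock that was held */
};

/* Old behaviour: the completion handler takes the lock itself. */
static void join_complete_direct(struct model *m)
{
	if (m->lock_held) {
		/* A real handler would block here while stop waits for it. */
		m->would_deadlock = true;
		return;
	}
	m->carrier_on = true;
}

/* New behaviour: the handler only queues work and returns immediately. */
static void join_complete_deferred(struct model *m)
{
	m->work_pending = true;
}

/* Worker: runs later, when the lock can be taken safely. */
static void run_pending_work(struct model *m)
{
	assert(m != NULL);
	if (!m->work_pending || m->lock_held)
		return;
	m->lock_held = true;    /* models rtnl_lock() */
	m->carrier_on = true;   /* models netif_carrier_on() */
	m->lock_held = false;   /* models rtnl_unlock() */
	m->work_pending = false;
}

/* The stop path holds the lock while it waits for the handler to finish. */
static struct model stop_while_join_completes(bool deferred)
{
	struct model m = { false, false, false, false };

	m.lock_held = true;          /* the caller of stop holds the lock */
	if (deferred)
		join_complete_deferred(&m);
	else
		join_complete_direct(&m);
	m.lock_held = false;         /* stop returns and the lock is released */
	run_pending_work(&m);        /* the queued work gets its turn */
	return m;
}
```

In this model the direct handler hits the held lock while the stop path is waiting on it, whereas the deferred handler returns at once and the queued work runs after the lock is free. The model does not cover what a driver should do with work that is still queued once the interface has been stopped.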
From rdreier at cisco.com Mon Sep 15 14:30:40 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 15 Sep 2008 14:30:40 -0700 Subject: [ofa-general] Re: [PATCH] RDMA/nes: nes_cm.c cleanup In-Reply-To: <200809151958.m8FJw2sk012367@velma.neteffect.com> (Chien Tung's message of "Mon, 15 Sep 2008 14:58:02 -0500") References: <200809151958.m8FJw2sk012367@velma.neteffect.com> Message-ID: > - struct nes_qp *nesqp; > + struct nes_qp *nesqp = NULL; Is this really just a cleanup, or is this properly part of the previous patch? From rdreier at cisco.com Mon Sep 15 15:53:55 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 15 Sep 2008 15:53:55 -0700 Subject: [ofa-general] Re: [PATCH] ipoib: fix deadlock between join completion handler and ipoib_stop In-Reply-To: <48CEC979.8030506@gmail.com> (Yossi Etigin's message of "Mon, 15 Sep 2008 23:45:45 +0300") References: <48CEC979.8030506@gmail.com> Message-ID: Looks good... I assume this has been tested and fixes the issue? From vlad at dev.mellanox.co.il Mon Sep 15 20:59:24 2008 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 16 Sep 2008 06:59:24 +0300 Subject: [ofa-general] Re: [PATCH] mlx4/IB: Set the PRESENT bit in the physical buffer list to match the MTT format. In-Reply-To: References: <20080915124753.GA13187@mellanox.co.il> Message-ID: <48CF2F1C.9060103@dev.mellanox.co.il> Roland Dreier wrote: > Thanks, applied. I moved the MLX4_MTT_FLAG_PRESENT define to a common > header rather than duplicating it in a different place, and updated the > changelog to talk about byte swapping too. > > So now fast register is confirmed working with this and your previous > patch? > > - R. Yes, Now fast register is working. I used test based on Steve's krping. 
Regards, Vladimir From ogerlitz at voltaire.com Mon Sep 15 22:24:53 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 16 Sep 2008 08:24:53 +0300 Subject: [ofa-general] Re: [PATCH] mlx4/IB: Set the PRESENT bit in the physical buffer list to match the MTT format. In-Reply-To: <48CF2F1C.9060103@dev.mellanox.co.il> References: <20080915124753.GA13187@mellanox.co.il> <48CF2F1C.9060103@dev.mellanox.co.il> Message-ID: <48CF4325.7090804@voltaire.com> Vladimir Sokolovsky wrote: >> So now fast register is confirmed working with this and your previous >> patch? > Yes, Now fast register is working. I used test based on Steve's krping. Vlad, Would it be possible to have you send patch to Steve such that krping could be used to test fmrs for both IB and iWARP? Tziporet, what firmware is going to support this? Or. From ogerlitz at voltaire.com Mon Sep 15 23:00:55 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 16 Sep 2008 09:00:55 +0300 Subject: [ofa-general] How to Interpret MTU reported by "ibv_devinfo" vs "ifconfig ib0" In-Reply-To: <1221509555.30937.246.camel@chromite.mv.qlogic.com> References: <48CA9CF4.3070502@scalableinformatics.com> <1221499805.30937.223.camel@chromite.mv.qlogic.com> <1221509555.30937.246.camel@chromite.mv.qlogic.com> Message-ID: <48CF4B97.2000501@voltaire.com> Ralph Campbell wrote: > The reason a larger network MTU helps is because the Linux network stack is more efficient when using larger MTUs. If all your traffic goes within the IB cluster and only between Linux nodes, the connected-mode / 64k mtu would serve you well. If you use TCP and have to communicate with IB nodes not supporting the connected mode or through IB/Eth gateway, you may want to use an HCA which supports TCP stateless offloads such as LSO (Large Send Offload) and TCP checksum offload. For example with LSO, even though the nic mtu is 2k, the stack would let the nic to send a 64k tcp segment. 
As the name suggests, "stateless offload" are inter-operable, that is there no requirement on the other side. Or. From vlad at dev.mellanox.co.il Tue Sep 16 00:59:33 2008 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 16 Sep 2008 10:59:33 +0300 Subject: [ofa-general] ***SPAM*** [PATCH v2] ipoib: fix hang while bringing down uninitialized interface In-Reply-To: <48CEA6DC.9000904@gmail.com> References: <48CEA6DC.9000904@gmail.com> Message-ID: <48CF6765.3020405@dev.mellanox.co.il> Yossi Etigin wrote: > Fix bug #1172: If a pkey for an interface is not found during > initialization, then poll_timer is left uninitialized. When the > device is brought down, ipoib tries to del_timer_sync() it. This > call hangs in an infinite loop in lock_timer_base(), because > timer_base is NULL. We should check whether the timer was really > initialized. > > Changes from v1: > - handle a case when ipoib_ib_dev_stop() is called twice on the > same dev->priv - zero the timer after its deletion. > > Signed-off-by: Yossi Etigin > PS > Vlad, please replace the patch ipoib_0400_fix_hang_while_bringing_down > by this one. 
> Done, Regards, Vladimir From vlad at lists.openfabrics.org Tue Sep 16 03:08:44 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Tue, 16 Sep 2008 03:08:44 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080916-0200 daily build status Message-ID: <20080916100844.D6354E60D7A@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on 
ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.24 Failed: From tziporet at dev.mellanox.co.il Tue Sep 16 03:38:59 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Tue, 16 Sep 2008 13:38:59 +0300 Subject: [ofa-general] Re: [PATCH] mlx4/IB: Set the PRESENT bit in the physical buffer list to match the MTT format. In-Reply-To: <48CF4325.7090804@voltaire.com> References: <20080915124753.GA13187@mellanox.co.il> <48CF2F1C.9060103@dev.mellanox.co.il> <48CF4325.7090804@voltaire.com> Message-ID: <48CF8CC3.2030709@mellanox.co.il> Or Gerlitz wrote: > > > Tziporet, what firmware is going to support this? > > FW - 2.6.0 Tziporet From hal.rosenstock at gmail.com Tue Sep 16 06:49:43 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Tue, 16 Sep 2008 09:49:43 -0400 Subject: [ofa-general] Usage of Infiniband Protocol Stack ? In-Reply-To: References: Message-ID: On Fri, Sep 12, 2008 at 3:24 PM, Kumar, Amit H. wrote: > Thank you Hal & Joe for your prompt reply. > > Two more questions: > If ifconfig for IB is just for IPoIB, is it okay to bring down this interface(ib0) and still be able to run applications like Mvapich2 and pvfs2 ?? > > And I also assume that we At Least Need 1 Ethernet Interface Up for the correct operation of IB compiled Applications, without which IB compiled Applications will fail. Is this correct ?? I think some applications need an IP interface which could be ethernet or IPoIB or any other network. Others don't need anything else as they use the IB CM or RDMA CM. There are specific mailing lists and documentation for mvapich2 and pvfs2. -- Hal > Thank you, > Amit > >> -----Original Message----- >> From: Hal Rosenstock [mailto:hal.rosenstock at gmail.com] >> Sent: Friday, September 12, 2008 1:09 PM >> To: Kumar, Amit H. 
>> Cc: general at lists.openfabrics.org >> Subject: Re: [ofa-general] Usage of Infiniband Protocol Stack ? >> >> On Fri, Sep 12, 2008 at 12:43 PM, Kumar, Amit H. >> wrote: >> > I have some applications(mvapich2, pvfs2 ...) compiled to use the OFED >> > Infiniband protocol stack. >> > >> > May be a stupid question ..: >> > Is it valid to see at the "ifconfig ib0" stats to report the usage of >> IB >> > protocol stack, regardless what application I making use of the IB >> stack.? >> >> ifconfig for IB interfaces shows the IPoIB stats. >> >> "Pure" IB stats are available from the PMA. These stats >> (bytes*4,packets x in/out) are total (across all applications being >> run). They can be obtained by the perfquery diagnostic tool or via a >> Performance Manager. >> >> -- Hal >> >> > Thank you, >> > Amit >> > >> > >> > >> > _______________________________________________ >> > general mailing list >> > general at lists.openfabrics.org >> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> > >> > To unsubscribe, please visit >> > http://openib.org/mailman/listinfo/openib-general >> > > From hal.rosenstock at gmail.com Tue Sep 16 06:52:05 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Tue, 16 Sep 2008 09:52:05 -0400 Subject: ***SPAM*** Re: [ofa-general] [PATCH] opensm/osm_multicast.[ch]: simplify flows, remove unused functions In-Reply-To: <20080915104020.GF17315@sashak.voltaire.com> References: <20080915104020.GF17315@sashak.voltaire.com> Message-ID: Sasha, On Mon, Sep 15, 2008 at 6:40 AM, Sasha Khapyorsky wrote: > > Simplify flows, remove unused and mean less osm_mgrp_init() functions, > consolidate notice sending functions. 
> > Signed-off-by: Sasha Khapyorsky > --- > opensm/include/opensm/osm_multicast.h | 88 ---------------- > opensm/opensm/osm_multicast.c | 181 ++++++++++----------------------- > 2 files changed, 56 insertions(+), 213 deletions(-) > > diff --git a/opensm/include/opensm/osm_multicast.h b/opensm/include/opensm/osm_multicast.h > index c0bd16e..c860d4a 100644 > --- a/opensm/include/opensm/osm_multicast.h > +++ b/opensm/include/opensm/osm_multicast.h > @@ -81,29 +81,6 @@ BEGIN_C_DECLS > * Steve King, Intel > * > *********/ > -/****f* IBA Base: OpenSM: Multicast Group/osm_get_mcast_req_type_str > -* NAME > -* osm_get_mcast_req_type_str > -* > -* DESCRIPTION > -* Returns a string for the specified osm_mcast_req_type_t value. > -* > -* SYNOPSIS > -*/ > -const char *osm_get_mcast_req_type_str(IN osm_mcast_req_type_t req_type); > -/* > -* PARAMETERS > -* req_type > -* [in] osm_mcast_req_type value > -* > -* RETURN VALUES > -* Pointer to the request type description string. > -* > -* NOTES > -* > -* SEE ALSO > -*********/ > - > /****s* OpenSM: Multicast Group/osm_mcast_mgr_ctxt_t > * NAME > * osm_mcast_mgr_ctxt_t > @@ -483,71 +460,6 @@ osm_mgrp_remove_port(IN osm_subn_t * const p_subn, > * SEE ALSO > *********/ > > -/****f* OpenSM: Multicast Group/osm_mgrp_get_root_switch > -* NAME > -* osm_mgrp_get_root_switch > -* > -* DESCRIPTION > -* Returns the "root" switch of this multicast group. The root switch > -* is at the trunk of the multicast single spanning tree. > -* > -* SYNOPSIS > -*/ > -static inline osm_switch_t *osm_mgrp_get_root_switch(IN const osm_mgrp_t * > - const p_mgrp) > -{ > - if (p_mgrp->p_root) > - return (p_mgrp->p_root->p_sw); > - else > - return (NULL); > -} > - > -/* > -* PARAMETERS > -* p_mgrp > -* [in] Pointer to an osm_mgrp_t object. > -* > -* RETURN VALUES > -* Returns the "root" switch of this multicast group. The root switch > -* is at the trunk of the multicast single spanning tree. 
> -* > -* NOTES > -* > -* SEE ALSO > -* Multicast Group > -*********/ > - > -/****f* OpenSM: Multicast Group/osm_mgrp_compute_avg_hops > -* NAME > -* osm_mgrp_compute_avg_hops > -* > -* DESCRIPTION > -* Returns the average number of hops from the given to switch > -* to all member of a multicast group. > -* > -* SYNOPSIS > -*/ > -float > -osm_mgrp_compute_avg_hops(const osm_mgrp_t * const p_mgrp, > - const osm_switch_t * const p_sw); > -/* > -* PARAMETERS > -* p_mgrp > -* [in] Pointer to an osm_mgrp_t object. > -* > -* p_sw > -* [in] Pointer to the switch from which to measure. > -* > -* RETURN VALUES > -* Returns the average number of hops from the given to switch > -* to all member of a multicast group. > -* > -* NOTES > -* > -* SEE ALSO > -* Multicast Group > -*********/ > - > /****f* OpenSM: Multicast Group/osm_mgrp_apply_func > * NAME > * osm_mgrp_apply_func > diff --git a/opensm/opensm/osm_multicast.c b/opensm/opensm/osm_multicast.c > index 77e61ad..b810630 100644 > --- a/opensm/opensm/osm_multicast.c > +++ b/opensm/opensm/osm_multicast.c > @@ -51,23 +51,6 @@ > > /********************************************************************** > **********************************************************************/ > -/* osm_mcast_req_type_t values converted to test for easier printing. 
*/ > -const static char *mcast_req_type_str[] = { > - "OSM_MCAST_REQ_TYPE_CREATE", > - "OSM_MCAST_REQ_TYPE_JOIN", > - "OSM_MCAST_REQ_TYPE_LEAVE", > - "OSM_MCAST_REQ_TYPE_SUBNET_CHANGE" > -}; > - > -const char *osm_get_mcast_req_type_str(IN osm_mcast_req_type_t req_type) > -{ > - if (req_type > OSM_MCAST_REQ_TYPE_SUBNET_CHANGE) > - req_type = OSM_MCAST_REQ_TYPE_SUBNET_CHANGE; > - return (mcast_req_type_str[req_type]); > -} > - > -/********************************************************************** > - **********************************************************************/ > void osm_mgrp_delete(IN osm_mgrp_t * const p_mgrp) > { > osm_mcm_port_t *p_mcm_port; > @@ -92,10 +75,13 @@ void osm_mgrp_delete(IN osm_mgrp_t * const p_mgrp) > > /********************************************************************** > **********************************************************************/ > -static void > -osm_mgrp_init(IN osm_mgrp_t * const p_mgrp, IN const ib_net16_t mlid) > +osm_mgrp_t *osm_mgrp_new(IN const ib_net16_t mlid) > { > - CL_ASSERT(cl_ntoh16(mlid) >= IB_LID_MCAST_START_HO); > + osm_mgrp_t *p_mgrp; > + > + p_mgrp = (osm_mgrp_t *) malloc(sizeof(*p_mgrp)); > + if (!p_mgrp) > + return NULL; > > memset(p_mgrp, 0, sizeof(*p_mgrp)); > cl_qmap_init(&p_mgrp->mcm_port_tbl); > @@ -103,19 +89,8 @@ osm_mgrp_init(IN osm_mgrp_t * const p_mgrp, IN const ib_net16_t mlid) > p_mgrp->last_change_id = 0; > p_mgrp->last_tree_id = 0; > p_mgrp->to_be_deleted = FALSE; > -} > > -/********************************************************************** > - **********************************************************************/ > -osm_mgrp_t *osm_mgrp_new(IN const ib_net16_t mlid) > -{ > - osm_mgrp_t *p_mgrp; > - > - p_mgrp = (osm_mgrp_t *) malloc(sizeof(*p_mgrp)); > - if (p_mgrp) > - osm_mgrp_init(p_mgrp, mlid); > - > - return (p_mgrp); > + return p_mgrp; > } > > /********************************************************************** > @@ -132,42 +107,39 @@ osm_mcm_port_t 
*osm_mgrp_add_port(IN osm_mgrp_t * const p_mgrp, > uint8_t prev_scope; > > p_mcm_port = osm_mcm_port_new(p_port_gid, join_state, proxy_join); > - if (p_mcm_port) { > - port_guid = p_port_gid->unicast.interface_id; > + if (!p_mcm_port) > + return NULL; > + > + port_guid = p_port_gid->unicast.interface_id; > + > + /* > + prev_item = cl_qmap_insert(...) > + Pointer to the item in the map with the specified key. If insertion > + was successful, this is the pointer to the item. If an item with the > + specified key already exists in the map, the pointer to that item is > + returned. > + */ > + prev_item = cl_qmap_insert(&p_mgrp->mcm_port_tbl, > + port_guid, &p_mcm_port->map_item); > + > + /* if already exists - revert the insertion and only update join state */ > + if (prev_item != &p_mcm_port->map_item) { > + osm_mcm_port_delete(p_mcm_port); > + p_mcm_port = (osm_mcm_port_t *) prev_item; > > /* > - prev_item = cl_qmap_insert(...) > - Pointer to the item in the map with the specified key. If insertion > - was successful, this is the pointer to the item. If an item with the > - specified key already exists in the map, the pointer to that item is > - returned. 
> + o15.0.1.11 > + Join state of the end port should be the or of the > + previous setting with the current one > */ > - prev_item = cl_qmap_insert(&p_mgrp->mcm_port_tbl, > - port_guid, &p_mcm_port->map_item); > - > - /* if already exists - revert the insertion and only update join state */ > - if (prev_item != &p_mcm_port->map_item) { > - > - osm_mcm_port_delete(p_mcm_port); > - p_mcm_port = (osm_mcm_port_t *) prev_item; > - > - /* > - o15.0.1.11 > - Join state of the end port should be the or of the > - previous setting with the current one > - */ > - ib_member_get_scope_state(p_mcm_port->scope_state, > - &prev_scope, > - &prev_join_state); > - p_mcm_port->scope_state = > - ib_member_set_scope_state(prev_scope, > - prev_join_state | > - join_state); > - > - } else { > - /* track the fact we modified the group ports */ > - p_mgrp->last_change_id++; > - } > + ib_member_get_scope_state(p_mcm_port->scope_state, &prev_scope, > + &prev_join_state); > + p_mcm_port->scope_state = > + ib_member_set_scope_state(prev_scope, > + prev_join_state | join_state); > + } else { > + /* track the fact we modified the group ports */ > + p_mgrp->last_change_id++; > } > > return (p_mcm_port); > @@ -243,9 +215,7 @@ __osm_mgrp_apply_func_sub(const osm_mgrp_t * const p_mgrp, > uint8_t max_children; > osm_mtree_node_t *p_child_mtn; > > - /* > - Call the user, then recurse. > - */ > + /* Call the user, then recurse. 
*/
>  	p_func(p_mgrp, p_mtn, context);
>
>  	max_children = osm_mtree_node_get_max_children(p_mtn);
> @@ -276,82 +246,43 @@ osm_mgrp_apply_func(const osm_mgrp_t * const p_mgrp,
>
>  /**********************************************************************
>   **********************************************************************/
> -void
> -osm_mgrp_send_delete_notice(IN osm_subn_t * const p_subn,
> -			    IN osm_log_t * const p_log, IN osm_mgrp_t * p_mgrp)
> +static void mgrp_send_notice(osm_subn_t *subn, osm_log_t *log,
> +			     osm_mgrp_t *mgrp, unsigned num)
>  {
>  	ib_mad_notice_attr_t notice;
>  	ib_api_status_t status;
>
> -	OSM_LOG_ENTER(p_log);
> -
> -	/* prepare the needed info */
> -
> -	/* details of the notice */
> -	notice.generic_type = 0x83;	/* is generic subn mgt type */
> +	notice.generic_type = 0x83;	/* generic SubnMgt type */
>  	ib_notice_set_prod_type_ho(&notice, 4);	/* A Class Manager generator */
> -	notice.g_or_v.generic.trap_num = CL_HTON16(67);	/* delete of mcg */
> +	notice.g_or_v.generic.trap_num = CL_HTON16(num);
>  	/* The sm_base_lid is saved in network order already. */
> -	notice.issuer_lid = p_subn->sm_base_lid;
> +	notice.issuer_lid = subn->sm_base_lid;
>  	/* following o14-12.1.11 and table 120 p726 */
>  	/* we need to provide the MGID */
> -	memcpy(&(notice.data_details.ntc_64_67.gid),
> -	       &(p_mgrp->mcmember_rec.mgid), sizeof(ib_gid_t));
> +	memcpy(&notice.data_details.ntc_64_67.gid,
> +	       &mgrp->mcmember_rec.mgid, sizeof(ib_gid_t));
>
>  	/* According to page 653 - the issuer gid in this case of trap
>  	   is the SM gid, since the SM is the initiator of this trap.
*/
> -	notice.issuer_gid.unicast.prefix = p_subn->opt.subnet_prefix;
> -	notice.issuer_gid.unicast.interface_id = p_subn->sm_port_guid;
> +	notice.issuer_gid.unicast.prefix = subn->opt.subnet_prefix;
> +	notice.issuer_gid.unicast.interface_id = subn->sm_port_guid;
>
> -	status = osm_report_notice(p_log, p_subn, &notice);
> -	if (status != IB_SUCCESS) {
> -		OSM_LOG(p_log, OSM_LOG_ERROR, "ERR 7601: "
> +	if ((status = osm_report_notice(log, subn, &notice)))
> +		OSM_LOG(log, OSM_LOG_ERROR, "ERR 7601: "
>  			"Error sending trap reports (%s)\n",
>  			ib_get_err_str(status));
> -		goto Exit;
> -	}
> +}
>
> -Exit:
> -	OSM_LOG_EXIT(p_log);
> +void
> +osm_mgrp_send_delete_notice(IN osm_subn_t * const p_subn,
> +			    IN osm_log_t * const p_log, IN osm_mgrp_t * p_mgrp)
> +{
> +	mgrp_send_notice(p_subn, p_log, p_mgrp, 67);
>  }

Any reason not to eliminate this extra call level ?

> -/**********************************************************************
> - **********************************************************************/
>  void
>  osm_mgrp_send_create_notice(IN osm_subn_t * const p_subn,
>  			    IN osm_log_t * const p_log, IN osm_mgrp_t * p_mgrp)
>  {
> -	ib_mad_notice_attr_t notice;
> -	ib_api_status_t status;
> -
> -	OSM_LOG_ENTER(p_log);
> -
> -	/* prepare the needed info */
> -
> -	/* details of the notice */
> -	notice.generic_type = 0x83;	/* Generic SubnMgt type */
> -	ib_notice_set_prod_type_ho(&notice, 4);	/* A Class Manager generator */
> -	notice.g_or_v.generic.trap_num = CL_HTON16(66);	/* create of mcg */
> -	/* The sm_base_lid is saved in network order already. */
> -	notice.issuer_lid = p_subn->sm_base_lid;
> -	/* following o14-12.1.11 and table 120 p726 */
> -	/* we need to provide the MGID */
> -	memcpy(&(notice.data_details.ntc_64_67.gid),
> -	       &(p_mgrp->mcmember_rec.mgid), sizeof(ib_gid_t));
> -
> -	/* According to page 653 - the issuer gid in this case of trap
> -	   is the SM gid, since the SM is the initiator of this trap.
*/
> -	notice.issuer_gid.unicast.prefix = p_subn->opt.subnet_prefix;
> -	notice.issuer_gid.unicast.interface_id = p_subn->sm_port_guid;
> -
> -	status = osm_report_notice(p_log, p_subn, &notice);
> -	if (status != IB_SUCCESS) {
> -		OSM_LOG(p_log, OSM_LOG_ERROR, "ERR 7602: "
> -			"Error sending trap reports (%s)\n",
> -			ib_get_err_str(status));
> -		goto Exit;
> -	}
> -
> -Exit:
> -	OSM_LOG_EXIT(p_log);
> +	mgrp_send_notice(p_subn, p_log, p_mgrp, 66);

Similarly, any reason not to eliminate this extra calling level ?

-- Hal

>  }
> --
> 1.6.0.1.196.g01914
>
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>

From marshgr at cse.ohio-state.edu Tue Sep 16 07:08:41 2008
From: marshgr at cse.ohio-state.edu (gregory james marsh)
Date: Tue, 16 Sep 2008 10:08:41 -0400 (EDT)
Subject: [ofa-general] verbs.h & rdma_cma.h: error: comma at end of enumerator list
Message-ID:

Hello,

I'm building an application (Apache Qpid) that includes ofed/include/rdma/rdma_cma.h and receive the following error message during "make all". (The detailed error message is at the end of this message.)

/usr/local/ofed/include/infiniband/verbs.h:209: error: comma at end of enumerator list
/usr/local/ofed/include/rdma/rdma_cma.h:66: error: comma at end of enumerator list

The relevant enums from each of these files are:

verbs.h:
208 enum ibv_event_flags {
209         IBV_XRC_QP_EVENT_FLAG = 0x80000000,
210 };

rdma_cma.h:
63 enum rdma_port_space {
64         RDMA_PS_IPOIB= 0x0002,
65         RDMA_PS_TCP  = 0x0106,
66         RDMA_PS_UDP  = 0x0111,
67 };

I looked at the latest versions of each file at (projects/~shefty/librdmacm.git) and (projects/ofed_1_4/libibverbs.git). Enum ibv_event_flags no longer occurs in verbs.h, but the situation still occurs in the latest rdma_cma.h.

Thanks,
Greg Marsh

g++ -DHAVE_CONFIG_H -I.
-Igen -I./gen -Werror -pedantic -Wall -Wextra -Wno-shadow -Wpointer-arith -Wcast-qual -Wcast-align -Wno-long-long -Wvolatile-register-var -Winvalid-pch -Wno-system-headers -Woverloaded-virtual -Wno-missing-field-initializers -O2 -I/usr/local/ofed/include -MT qpid/sys/rdma/libqpidrdma_la-rdma_factories.lo -MD -MP -MF qpid/sys/rdma/.deps/libqpidrdma_la-rdma_factories.Tpo -c qpid/sys/rdma/rdma_factories.cpp -fPIC -DPIC -o qpid/sys/rdma/.libs/libqpidrdma_la-rdma_factories.o /usr/local/ofed/include/infiniband/verbs.h:209: error: comma at end of enumerator list /usr/local/ofed/include/rdma/rdma_cma.h:66: error: comma at end of enumerator list make[3]: *** [qpid/sys/rdma/libqpidrdma_la-rdma_factories.lo] Error 1 From AHKumar at odu.edu Tue Sep 16 07:18:22 2008 From: AHKumar at odu.edu (Kumar, Amit H.) Date: Tue, 16 Sep 2008 10:18:22 -0400 Subject: [ofa-general] Usage of Infiniband Protocol Stack ? In-Reply-To: References: Message-ID: > I think some applications need an IP interface which could be ethernet > or IPoIB or any other network. Others don't need anything else as they > use the IB CM or RDMA CM. > > There are specific mailing lists and documentation for mvapich2 and > pvfs2. > > -- Hal Thanks Hal !!! I will follow up the discussion with other lists as well. From tziporet at mellanox.co.il Tue Sep 16 07:35:20 2008 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 16 Sep 2008 17:35:20 +0300 Subject: [ofa-general] OFED meeting summary for Sep 15, 2008 on 1.4 Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD867B09@mtlexch01.mtl.com> OFED meeting summary for Sep 15, 2008 on 1.4 status ======================================== Summary: ======== - 1.4-rc2 will be done by end of the week for the interop event - Reviewed the testing matrix. Feedback is good but wish to add more info Details: ====== 1. Bugs review: Reviewed critical and major bugs. All - we must work in high gear to fix all critical bugs 2. No progress on compilation warning cleanup :-( 3. 
Testing matrix: Matrix is important. Need more data: a. Which are the mandatory ULPs for the interop event - Rupert will provide this data. b. Add the interop testing to the table - Tziporet to get data from Rupert b. What is the testing level each company do on each ULP. Tziporet to define testing level and each company will update its status Meeting minutes on the web: http://www.openfabrics.org/txt/documentation/linux/EWG_meeting_minutes/ Tziporet From ctung at NetEffect.com Tue Sep 16 08:22:30 2008 From: ctung at NetEffect.com (Chien Tung) Date: Tue, 16 Sep 2008 10:22:30 -0500 Subject: [ofa-general] RE: [PATCH] RDMA/nes: nes_cm.c cleanup In-Reply-To: References: <200809151958.m8FJw2sk012367@velma.neteffect.com> Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC0848AEAC@venom2> > > - struct nes_qp *nesqp; > > + struct nes_qp *nesqp = NULL; > > Is this really just a cleanup, or is this properly part of > the previous patch? > nesqp in rem_ref_cm_node() is assigned before use so this change really doesn't matter. Do you want me to take it out? Chien From sashak at voltaire.com Tue Sep 16 08:47:52 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 16 Sep 2008 18:47:52 +0300 Subject: [ofa-general] [PATCH] opensm/osm_multicast.[ch]: simplify flows, remove unused functions In-Reply-To: References: <20080915104020.GF17315@sashak.voltaire.com> Message-ID: <20080916154752.GG11962@sashak.voltaire.com> Hi Hal, On 09:52 Tue 16 Sep , Hal Rosenstock wrote: > > +void > > +osm_mgrp_send_delete_notice(IN osm_subn_t * const p_subn, > > + IN osm_log_t * const p_log, IN osm_mgrp_t * p_mgrp) > > +{ > > + mgrp_send_notice(p_subn, p_log, p_mgrp, 67); > > } > > Any reason not to eliminate this extra call level ? Basically I agree, and in future it can be eliminated (I have few more patches in the stack :)). This patch is "cosmetic" one, so I didn't change any OpenSM internal API. 
> > -/**********************************************************************
> > - **********************************************************************/
> >  void
> >  osm_mgrp_send_create_notice(IN osm_subn_t * const p_subn,
> >  			    IN osm_log_t * const p_log, IN osm_mgrp_t * p_mgrp)
> >  {
> > -	ib_mad_notice_attr_t notice;
> > -	ib_api_status_t status;
> > -
> > -	OSM_LOG_ENTER(p_log);
> > -
> > -	/* prepare the needed info */
> > -
> > -	/* details of the notice */
> > -	notice.generic_type = 0x83;	/* Generic SubnMgt type */
> > -	ib_notice_set_prod_type_ho(&notice, 4);	/* A Class Manager generator */
> > -	notice.g_or_v.generic.trap_num = CL_HTON16(66);	/* create of mcg */
> > -	/* The sm_base_lid is saved in network order already. */
> > -	notice.issuer_lid = p_subn->sm_base_lid;
> > -	/* following o14-12.1.11 and table 120 p726 */
> > -	/* we need to provide the MGID */
> > -	memcpy(&(notice.data_details.ntc_64_67.gid),
> > -	       &(p_mgrp->mcmember_rec.mgid), sizeof(ib_gid_t));
> > -
> > -	/* According to page 653 - the issuer gid in this case of trap
> > -	   is the SM gid, since the SM is the initiator of this trap. */
> > -	notice.issuer_gid.unicast.prefix = p_subn->opt.subnet_prefix;
> > -	notice.issuer_gid.unicast.interface_id = p_subn->sm_port_guid;
> > -
> > -	status = osm_report_notice(p_log, p_subn, &notice);
> > -	if (status != IB_SUCCESS) {
> > -		OSM_LOG(p_log, OSM_LOG_ERROR, "ERR 7602: "
> > -			"Error sending trap reports (%s)\n",
> > -			ib_get_err_str(status));
> > -		goto Exit;
> > -	}
> > -
> > -Exit:
> > -	OSM_LOG_EXIT(p_log);
> > +	mgrp_send_notice(p_subn, p_log, p_mgrp, 66);
>
> Similarly, any reason not to eliminate this extra calling level ?

Ditto.
Sasha

From rdreier at cisco.com Tue Sep 16 09:16:58 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 16 Sep 2008 09:16:58 -0700
Subject: [ofa-general] RE: [PATCH] RDMA/nes: nes_cm.c cleanup
In-Reply-To: <5E701717F2B2ED4EA60F87C8AA57B7CC0848AEAC@venom2> (Chien Tung's message of "Tue, 16 Sep 2008 10:22:30 -0500")
References: <200809151958.m8FJw2sk012367@velma.neteffect.com> <5E701717F2B2ED4EA60F87C8AA57B7CC0848AEAC@venom2>
Message-ID:

> nesqp in rem_ref_cm_node() is assigned before use so this change really
> doesn't matter. Do you want me to take it out?

I can drop that chunk myself... It doesn't really seem like a cleanup to me to add a redundant assignment, so if it doesn't fix anything, I'll drop it.

 - R.

From ctung at NetEffect.com Tue Sep 16 09:25:50 2008
From: ctung at NetEffect.com (Chien Tung)
Date: Tue, 16 Sep 2008 11:25:50 -0500
Subject: [ofa-general] RE: [PATCH] RDMA/nes: nes_cm.c cleanup
In-Reply-To:
References: <200809151958.m8FJw2sk012367@velma.neteffect.com><5E701717F2B2ED4EA60F87C8AA57B7CC0848AEAC@venom2>
Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC0848AED3@venom2>

> > nesqp in rem_ref_cm_node() is assigned before use so this change really
> > doesn't matter. Do you want me to take it out?
>
> I can drop that chunk myself... It doesn't really seem like
> a cleanup to me to add a redundant assignment, so if it
> doesn't fix anything, I'll drop it.

Okay, thanks.

Chien

From yosefe at voltaire.com Tue Sep 16 09:42:31 2008
From: yosefe at voltaire.com (Yossi Etigin)
Date: Tue, 16 Sep 2008 19:42:31 +0300
Subject: [ofa-general] ***SPAM*** Re: [PATCH] ipoib: fix deadlock between join completion handler and ipoib_stop
In-Reply-To:
References: <48CEC979.8030506@gmail.com>
Message-ID: <32cb786f0809160942m57d18239yad9b0313b43ae063@mail.gmail.com>

Yes it was.

On Tue, Sep 16, 2008 at 1:53 AM, Roland Dreier wrote:
> Looks good... I assume this has been tested and fixes the issue?
>

From rdreier at cisco.com Tue Sep 16 09:45:38 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 16 Sep 2008 09:45:38 -0700
Subject: [ofa-general] Re: [PATCH] ib/ehca: add flush CQE generation
In-Reply-To: <20080910162356.7294fe87@BL3D1974.boeblingen.de.ibm.com> (Alexander Schmidt's message of "Wed, 10 Sep 2008 16:23:56 +0200")
References: <20080910162356.7294fe87@BL3D1974.boeblingen.de.ibm.com>
Message-ID:

thanks, queued for 2.6.28.

From jsquyres at cisco.com Tue Sep 16 10:02:14 2008
From: jsquyres at cisco.com (Jeff Squyres)
Date: Tue, 16 Sep 2008 13:02:14 -0400
Subject: [ofa-general] Question about RDMA CM
Message-ID:

Greetings. I'm trying to finish up RDMA CM support in Open MPI, but am running into a problem on IB that I have been beating my head over for about a week and I can't figure it out (it seems to work fine on iWARP). I know Sean is out on sabbatical; I'm hoping that someone will have an insight into my problem anyway.

Short version:
==============

Open MPI uses a separate thread for most of its RDMA CM actions to ensure that they can respond in a timely manner / not timeout. All the code seems to work fine on iWARP (tested with Chelsio T3's), but on some older Mellanox HCAs, I sometimes get RNRs after both sides get the ESTABLISHED event and the initiator sends the first message on the new RC QP (not SRQ). I am *sure* that a receive buffer is posted at the receiver, and the QPs appear to be transitioning INIT -> RTR -> RTS properly. I cannot figure out why I am getting RNRs.

These RNRs *only* seem to happen when either or both of the initiator or receiver servers are fully busy (i.e., all cores are 100% busy).

Longer version:
===============

All the code is currently in a development mercurial branch (not on the main Open MPI SVN):

http://www.open-mpi.org/hg/hgwebdir.cgi/jsquyres/openib-fd-progress/

As mentioned above, this all seems to work fine on Chelsio T3's.
I'm getting these RNRs on Mellanox 2 port DDR cards, MT_00A0010001 using OFED 1.3.1. I have not tried on other IB cards. All my servers are 4 core intel machines (slightly old -- pre-woodcrest).

I can pretty consistently get the problems to occur when I run a simple MPI "ring" test program (send a message around in a ring) across 2 servers (4 cores/ea). OMPI uses shared memory for on-node communication and verbs for off-node communication. The program runs fine when I do not use RDMA CM, but gets RNRs for some connections when I use RDMA CM over IB and all 4 cores on both servers are running MPI processes (i.e., are 100% busy polling for message passing progress). The connectivity looks like this:

   node 1
   |--- proc A <- shmem <- proc B <- shmem <- proc C <- shmem <- proc D <-|
 verbs                                                                  verbs
   |--> proc E -> shmem -> proc F -> shmem -> proc G -> shmem -> proc H --|
   node 2

Random notes:

1. Open MPI uses a progress thread in its OpenFabrics support for the RDMA CM. rdma_connect() is initiated from the main thread, but all other events are handled from the progress thread.

2. Due to some uninteresting race conditions, we only allow connections to be made "in one direction" (the lower (IP address, port) tuple is the initiator). If the "wrong" MPI process desires to make a connection, it makes a bogus QP and initiates an rdma_connect(). The receiver process then gets the CONNECT_REQUEST event, detects that the connection is coming the "wrong" way, initiates the connection in the "right" direction, and then rejects the "wrong" connection. The initiator expects the rejection, and simply waits for the CONNECT_REQUEST coming in the other direction.

3. To accommodate iWARP's "initiator must send first" requirement, we have the connection sequence in OMPI only post a single receive buffer that will later be used for an OMPI-level CTS. So during the RDMA CM wireup, there is only *one* receive buffer posted.
Once the ESTABLISHED event arrives, OMPI posts all the rest of its normal receive buffers and then sends the CTS to the peer that will consume the 1 buffer that was previously posted (which is guaranteed to have its CTS buffer posted). OMPI does not start sending anything else until it gets the CTS from its peer.

4. OMPI normally sets non-SRQ RC QP's rnr_retry_count value to 0 because OMPI has its own flow control (read: if we ever get an RNR, it's an OMPI bug).

Consider a scenario where MPI process A wants to connect to MPI process B (on different servers). Let's assume that A->B is the "right" direction for simplicity. Here's what happens:

A: creates QP, posts the 1 CTS receive buffer, and calls rdma_connect()
B: gets CONNECT_REQUEST, creates QP, posts the 1 CTS receive buffer, and calls rdma_accept()
   --> I've verified that B's QP is transitioned to RTR properly
A and B: get ESTABLISHED
   --> I've verified that A and B's QPs are transitioned to RTS properly
A: posts its normal OMPI receive buffers
A: sends the CTS
A: sometimes gets IBV_WC_RNR_RETRY_EXC_ERR

I have done the following to try to track down what is happening:

- after B calls ibv_post_recv(), call sleep(5) before calling rdma_accept() -- just to ensure that the buffer really is posted. No effect; A still gets RNRs.
- verified that A and B's QPs are transitioning into RTR and RTS properly. They seem to be doing this just fine.
- increased the rnr_retry_count on the new QP. When I set it to 0-6, the problem still occurs. When I set it to 7 (infinite), *the problem goes away*.

This last one (setting rnr_retry_count=7) is what kills me. It seems to imply that there is a race condition in the sequence somewhere, but I just can't figure out where. Both sides are posting receive buffers. Both sides are getting ESTABLISHED. Both sides are transitioning INIT -> RTR -> RTS properly. Why is there an RNR occurring?

As noted above, this *only* happens when all the cores on my servers are fully busy.
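To make the rnr_retry_count observation concrete, here is a toy model in C. It is only a sketch under stated assumptions, not verbs code and not Open MPI's implementation: the function name send_survives_rnr() and the ready_after parameter are hypothetical, and the only fact taken from the text above is that the value 7 means "retry indefinitely". It shows why an infinite retry count can hide a late-arriving receive buffer that any finite count may expose.

```c
#include <assert.h>
#include <stdbool.h>

/* The RNR retry field is 3 bits wide; 7 is the "retry forever" value. */
#define RNR_RETRY_INFINITE 7u

/*
 * Toy model (hypothetical, for illustration only).
 *
 * ready_after: number of RNR NAKs the sender receives before the
 *              receiver's buffer finally becomes usable.
 *
 * Returns true if the send eventually completes, false if the sender
 * would give up with a retry-exceeded error.
 */
static bool send_survives_rnr(unsigned rnr_retry, unsigned ready_after)
{
	assert(rnr_retry <= RNR_RETRY_INFINITE);

	if (rnr_retry == RNR_RETRY_INFINITE)
		return true;	/* retries never run out */

	/* One initial attempt plus rnr_retry retries: the send succeeds
	 * only if no more than rnr_retry NAKs are needed first. */
	return ready_after <= rnr_retry;
}
```

In this model a count of 0 fails as soon as a single NAK is needed, while 7 succeeds no matter how late the buffer shows up, so "works with 7" says the buffer does appear eventually but not why it was late.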
If I only run 1 or 2 MPI processes on both servers, the problem does not occur. This seems fishy, but I don't know exactly what it means. This could certainly be a bug in my code, but I just can't figure out where. Any insights or advice would be greatly appreciated; many thanks. -- Jeff Squyres Cisco Systems From rdreier at cisco.com Tue Sep 16 10:23:45 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 16 Sep 2008 10:23:45 -0700 Subject: [ofa-general] Re: [PATCH 1/5] drivers/infiniband/hw: Drop code after return In-Reply-To: (Julia Lawall's message of "Thu, 11 Sep 2008 14:33:01 +0200 (CEST)") References: Message-ID: thanks, applied From arlin.r.davis at intel.com Tue Sep 16 12:23:18 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Tue, 16 Sep 2008 12:23:18 -0700 Subject: [ofa-general] Question about RDMA CM In-Reply-To: References: Message-ID: > >Random notes: > >1. Open MPI uses a progress thread in its OpenFabrics support for the >RDMA CM. rdma_connect() is initiated from the main thread, but all >other events are handled from the progress thread. > >2. Due to some uninteresting race conditions, we only allow >connections to be made "in one direction" (the lower (IP address, >port) tuple is the initiator). If the "wrong" MPI process desires to >make a connection, it makes a bogus QP and initiates an >rdma_connect(). The receiver process then gets the CONNECT_REQUEST >event, detects that the connection is coming the "wrong" way, >initiates the connection in the "right" direction, and then rejects >the "wrong" connection. The initiator expects the rejection, and >simply waits for the CONNECT_REQUEST coming in the other direction. > Do you use rdma_cm to create QP's? If so, you have to be careful about re-using a cm_id's QP after rejections or any other conn event error. 
Not sure from your note here but if you happen to move the rejected initiator cm_id's qp to the new cm_id created from the "right" direction CR coming in as a short cut you may have problems. Also, do you validate the cm_id context and remote/local addresses in your CM processing thread. Could you possibly be getting misguided on the established event and be sending to a QP not yet preposted? I guess you would see other QP errors in that case. -arlin From jsquyres at cisco.com Tue Sep 16 13:05:50 2008 From: jsquyres at cisco.com (Jeff Squyres) Date: Tue, 16 Sep 2008 16:05:50 -0400 Subject: [ofa-general] Question about RDMA CM In-Reply-To: References: Message-ID: <29544E34-60C7-479A-84D4-1814F2B283ED@cisco.com> On Sep 16, 2008, at 3:23 PM, Davis, Arlin R wrote: >> 2. Due to some uninteresting race conditions, we only allow >> connections to be made "in one direction" (the lower (IP address, >> port) tuple is the initiator). If the "wrong" MPI process desires to >> make a connection, it makes a bogus QP and initiates an >> rdma_connect(). The receiver process then gets the CONNECT_REQUEST >> event, detects that the connection is coming the "wrong" way, >> initiates the connection in the "right" direction, and then rejects >> the "wrong" connection. The initiator expects the rejection, and >> simply waits for the CONNECT_REQUEST coming in the other direction. > > Do you use rdma_cm to create QP's? No; we are using ibv_create_qp, and then assigning id->qp afterwards. > If so, you have to be careful > about re-using a cm_id's QP after rejections or any other conn event > error. Not sure from your note here but if you happen to move the > rejected initiator cm_id's qp to the new cm_id created from the > "right" direction CR coming in as a short cut you may have problems. We create a new CM ID for the new connection in the "right" direction; the ID used for the "wrong" direction is eventually discarded. 
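
[Illustration, not part of the original message.] The "one direction" rule quoted above (the lower (IP address, port) tuple is the initiator) can be sketched in C roughly as follows. This is only a sketch under stated assumptions: `struct endpoint`, `endpoint_cmp()` and `is_initiator()` are hypothetical names, addresses are taken to be IPv4 in host byte order, and none of this is taken from the Open MPI source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical endpoint: IPv4 address and port, both in host byte order. */
struct endpoint {
    uint32_t ip;
    uint16_t port;
};

/* Lexicographic comparison of (ip, port): negative, zero or positive. */
static int endpoint_cmp(const struct endpoint *a, const struct endpoint *b)
{
    if (a->ip != b->ip)
        return a->ip < b->ip ? -1 : 1;
    if (a->port != b->port)
        return a->port < b->port ? -1 : 1;
    return 0;
}

/*
 * True if `self` holds the lower tuple, i.e. it is the side that should
 * make the real connection; the other side would expect its own attempt
 * to be rejected and wait for the incoming connect request instead.
 */
static bool is_initiator(const struct endpoint *self,
                         const struct endpoint *peer)
{
    return endpoint_cmp(self, peer) < 0;
}
```

Because both sides evaluate the same comparison on the same pair of tuples, exactly one of them sees itself as the initiator, provided the two tuples differ.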
> Also, do you validate the cm_id context and remote/local addresses > in your CM processing thread. Could you possibly be getting > misguided on the established event and be sending to a QP not > yet preposted? I guess you would see other QP errors in that case. As far as I can tell, I am not sending to the wrong QP. But it is complex code, so there certainly can be a bug in this area. The thing that is weird for me is that setting rnr_retry to 7 makes it work. -- Jeff Squyres Cisco Systems From dledford at redhat.com Tue Sep 16 22:01:59 2008 From: dledford at redhat.com (Doug Ledford) Date: Wed, 17 Sep 2008 01:01:59 -0400 Subject: [ofa-general] compat-dapl-1.2.10 install bogosity Message-ID: <1221627719.1927.563.camel@firewall.xsintricity.com> Arlin, you usually put out such nice stuff. But the install-exec-hook you wrote into Makefile.am (as well as the uninstall-hook) are both busted (not to mention the absolute wrong thing to do anyway). When using automake, all of your install directives that automake creates automatically have $(DESTDIR) prepended to them so that make DESTDIR= install will work. In the case of just about any build system I know if, this is an absolute requirement. When you hand write your own targets via hooks like that, they obviously don't get rewritten by automake so you have to include $(DESTDIR) yourself. When I tried to build this package, it tried to modify the /etc/ofed/dat.conf file on my real filesystem instead of looking into the rpm build root. However, even once you fix that, modifying user configuration files without their consent as part of make install is just a *BAD* idea. The only time you should ever do anything like this is when you just flat don't care about things like rpm packaging (or apt packaging) and having the files on your system trackable, verifiable, erasable, etc. 
Install methods that do this sort of thing are a good way to really alienate people in a position like me ;-) -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: This is a digitally signed message part URL: From dledford at redhat.com Tue Sep 16 22:45:23 2008 From: dledford at redhat.com (Doug Ledford) Date: Wed, 17 Sep 2008 01:45:23 -0400 Subject: [ofa-general] Question about RDMA CM In-Reply-To: <29544E34-60C7-479A-84D4-1814F2B283ED@cisco.com> References: <29544E34-60C7-479A-84D4-1814F2B283ED@cisco.com> Message-ID: <1221630323.1927.575.camel@firewall.xsintricity.com> On Tue, 2008-09-16 at 16:05 -0400, Jeff Squyres wrote: > On Sep 16, 2008, at 3:23 PM, Davis, Arlin R wrote: > > >> 2. Due to some uninteresting race conditions, we only allow > >> connections to be made "in one direction" (the lower (IP address, > >> port) tuple is the initiator). If the "wrong" MPI process desires to > >> make a connection, it makes a bogus QP and initiates an > >> rdma_connect(). The receiver process then gets the CONNECT_REQUEST > >> event, detects that the connection is coming the "wrong" way, > >> initiates the connection in the "right" direction, and then rejects > >> the "wrong" connection. The initiator expects the rejection, and > >> simply waits for the CONNECT_REQUEST coming in the other direction. > > > > Do you use rdma_cm to create QP's? > > No; we are using ibv_create_qp, and then assigning id->qp afterwards. Don't do that. Assume if rdmacm provides an interface for doing something, then there is likely a reason. In this case, when you call rdma_create_qp(), it does more than just call ibv_create_qp() and ibv_modify_qp() on your behalf. 
It also pipes information about the state changes in the qp to the kernel rdma_cm module (by writing the commands in rdma_cm format to id->channel->fd, which is the rdma_cm fd not the qp fd, in places like rdma_init_qp_attr()). > As far as I can tell, I am not sending to the wrong QP. But it is > complex code, so there certainly can be a bug in this area. > > The thing that is weird for me is that setting rnr_retry to 7 makes it > work. I didn't look into the kernel code so I couldn't venture a guess as to whether or not the above is actually a hard requirement, and whether or not it would explain the rnr_retry of 7 getting around the race condition, but I would think it's plausible. -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: This is a digitally signed message part URL: From ogerlitz at voltaire.com Wed Sep 17 00:54:42 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 17 Sep 2008 10:54:42 +0300 Subject: [ofa-general] Question about RDMA CM In-Reply-To: <1221630323.1927.575.camel@firewall.xsintricity.com> References: <29544E34-60C7-479A-84D4-1814F2B283ED@cisco.com> <1221630323.1927.575.camel@firewall.xsintricity.com> Message-ID: <48D0B7C2.1060400@voltaire.com> Doug Ledford wrote: >> No; we are using ibv_create_qp, and then assigning id->qp afterwards. > Don't do that. Assume if rdmacm provides an interface for doing something, then there is likely a reason. In this case, when you call rdma_create_qp(), it does more than just call ibv_create_qp() and ibv_modify_qp() on your behalf. 
It also pipes information about the state changes in the qp to the kernel rdma_cm module (by writing the commands in rdma_cm format to id->channel->fd, which is the rdma_cm fd not the qp fd, in places like rdma_init_qp_attr()).

As far as I remember, unlike the kernel rdma-cm, librdmacm doesn't support a mode where the app creates the qp and maintains the qp state transitions. However, looking at rdma_accept for example, you can see that it calls ucma_modify_qp_rtr unconditionally, and the latter returns -EINVAL if id->qp is NULL. On the other hand, what the accept code does next is branch based on whether id->qp is NULL or not...

Or.

From wangwhao at cn.ibm.com Wed Sep 17 01:25:00 2008
From: wangwhao at cn.ibm.com (Wen Hao Wang)
Date: Wed, 17 Sep 2008 16:25:00 +0800
Subject: [ofa-general] ibcheckerrors give error 5691 within OFED 1.3.1
Message-ID:

Hi all:

I had one IB cluster with eight IBM HS21 blades, running a mix of RHEL5.2 Server and SLES10 SP2. All of them were connected to one IB switch. opensm was running as subnet manager on one blade. The ibcheckerrors command finished smoothly. Last week I got another eight IBM LS21 blades connected to another IB switch.
But after I connected the two switches and turned on all the IB adapters on the new blades, ibcheckerrors gave this error message:

[root at gaia-07 ~]# ibcheckerrors
#warn: counter RcvErrors = 5691 (threshold 10) lid 3 port 1
Error check on lid 3 (gaia-07 HCA-1) port 1: FAILED

## Summary: 19 nodes checked, 0 bad nodes found
## 46 ports checked, 1 ports have errors beyond threshold
[root at gaia-07 ~]# ibv_devinfo
hca_id: mlx4_0
        fw_ver: 2.3.000
        node_guid: 0002:c903:0001:3370
        sys_image_guid: 0002:c903:0001:3373
        vendor_id: 0x02c9
        vendor_part_id: 25418
        hw_ver: 0xA0
        board_id: IBM08A0000001
        phys_port_cnt: 2
                port: 1
                        state: PORT_ACTIVE (4)
                        max_mtu: 2048 (4)
                        active_mtu: 2048 (4)
                        sm_lid: 15
                        port_lid: 3
                        port_lmc: 0x00

                port: 2
                        state: PORT_DOWN (1)
                        max_mtu: 2048 (4)
                        active_mtu: 2048 (4)
                        sm_lid: 0
                        port_lid: 0
                        port_lmc: 0x00
[root at gaia-07 ~]# ibcheckport 3 1
[root at gaia-07 ~]# echo $?
0

I had shut down the embedded subnet manager on the two IB switches. The issue persists, even after I move the subnet manager to another machine. ib0 on machine gaia-07 can communicate with the other machines. All installed IB adapters are ConnectX 4xSDR. Both switches are Topspin switches. Can anyone give some advice about this issue? Thanks in advance!

Wen Hao Wang
Email: wangwhao at cn.ibm.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From vlad at lists.openfabrics.org Wed Sep 17 03:11:06 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Wed, 17 Sep 2008 03:11:06 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080917-0200 daily build status Message-ID: <20080917101106.777D2E60D76@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 
Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.24 Failed: From jsquyres at cisco.com Wed Sep 17 05:15:13 2008 From: jsquyres at cisco.com (Jeff Squyres) Date: Wed, 17 Sep 2008 08:15:13 -0400 Subject: [ofa-general] Question about RDMA CM In-Reply-To: <1221630323.1927.575.camel@firewall.xsintricity.com> References: <29544E34-60C7-479A-84D4-1814F2B283ED@cisco.com> <1221630323.1927.575.camel@firewall.xsintricity.com> Message-ID: <4236041E-C487-4BAB-B9C7-3984F4419E8C@cisco.com> On Sep 17, 2008, at 1:45 AM, Doug Ledford wrote: >> No; we are using ibv_create_qp, and then assigning id->qp afterwards. > > Don't do that. Assume if rdmacm provides an interface for doing > something, then there is likely a reason. In this case, when you call > rdma_create_qp(), it does more than just call ibv_create_qp() and > ibv_modify_qp() on your behalf. It also pipes information about the > state changes in the qp to the kernel rdma_cm module (by writing the > commands in rdma_cm format to id->channel->fd, which is the rdma_cm fd > not the qp fd, in places like rdma_init_qp_attr()). In general, I agree with you (use rdmacm_create_qp instead of making it manually). But I'm looking at the head of the librdmacm git and I don't see what you're talking about. All it does is call ibv_create_qp and ibv_modify_qp. The person who initially wrote this code chose to create the qp manually for two reasons: - rdmacm_create_qp is just a wrapper around ibv_create_qp and ibv_modify_qp - other parts of OMPI (that don't use RDMA CM for wireup) call ibv_create_qp and ibv_modify_qp That being said, I actually spent a little time yesterday trying to convert to use rdmacm_create_qp and was having problems with the comparison at the top of rdmacm_create_qp against the protection domain -- somehow it was failing for me. 
It was not immediately obvious to me where rdmacm was getting that pd from, nor why that comparison would fail. >> As far as I can tell, I am not sending to the wrong QP. But it is >> complex code, so there certainly can be a bug in this area. >> >> The thing that is weird for me is that setting rnr_retry to 7 makes >> it >> work. > > I didn't look into the kernel code so I couldn't venture a guess as to > whether or not the above is actually a hard requirement, and whether > or > not it would explain the rnr_retry of 7 getting around the race > condition, but I would think it's plausible. If all is working properly, a rnr_retry_count of 0 should be sufficient because there should be no race conditions. This is what OMPI has had for years; it's only this new RDMA CM wireup problem that has forced me to set it at 7. However, as I mentioned before, this is complex code, so it's quite possible (likely?) that I have a bug in the code somewhere. I was posting here looking for any possible insights into why this could happen. -- Jeff Squyres Cisco Systems From jsquyres at cisco.com Wed Sep 17 05:16:59 2008 From: jsquyres at cisco.com (Jeff Squyres) Date: Wed, 17 Sep 2008 08:16:59 -0400 Subject: [ofa-general] Question about RDMA CM In-Reply-To: <48D0B7C2.1060400@voltaire.com> References: <29544E34-60C7-479A-84D4-1814F2B283ED@cisco.com> <1221630323.1927.575.camel@firewall.xsintricity.com> <48D0B7C2.1060400@voltaire.com> Message-ID: <69372BB0-5C4C-41B4-BD2C-1AE73638CC84@cisco.com> On Sep 17, 2008, at 3:54 AM, Or Gerlitz wrote: >> Don't do that. Assume if rdmacm provides an interface for doing >> something, then there is likely a reason. In this case, when you >> call rdma_create_qp(), it does more than just call ibv_create_qp() >> and ibv_modify_qp() on your behalf. 
It also pipes information >> about the state changes in the qp to the kernel rdma_cm module (by >> writing the commands in rdma_cm format to id->channel->fd, which is >> the rdma_cm fd not the qp fd, in places like rdma_init_qp_attr()). > As far as I remember, unlike the kernel rdma-cm, librdmacm doesn't > support a mode where the app creates the qp and maintain the qp > state transitions. However, looking for example on rdma_accept, you > can see that it calls ucma_modify_qp_rtr unconditionally and the > latter caused a return of -EINVAL if id->qp is NULL, on the other > hand what the accept code does next is to branch based on whethere > id->qp is NULL or not... Right. I agree that it's a bit weird that we create the QP manually -- it's just what the original author chose to do. But we never did any of the QP state transitions -- user RDMACM does that for us. All I've done in my tests is verify that the state transitions were occurring properly (but putting a little testing code in librdmacm, actually). I'll keep digging... -- Jeff Squyres Cisco Systems From ruimario at gmail.com Wed Sep 17 07:12:44 2008 From: ruimario at gmail.com (Rui Machado) Date: Wed, 17 Sep 2008 16:12:44 +0200 Subject: [ofa-general] atomic operations on ppc64 Message-ID: <6978b4af0809170712j6991bf5ek8ea24adcb2533dfc@mail.gmail.com> Hi list, does anyone have experienced problems using IB atomic operations (fetch and add) on a ppc64 platform? I tried a small example (using fetch and add) on x86 and ppc64 and on x86 worked fine while on ppc64 didn't. 
Thank you for the help Rui From dotanba at gmail.com Wed Sep 17 07:19:39 2008 From: dotanba at gmail.com (Dotan Barak) Date: Wed, 17 Sep 2008 17:19:39 +0300 Subject: [ofa-general] atomic operations on ppc64 In-Reply-To: <6978b4af0809170712j6991bf5ek8ea24adcb2533dfc@mail.gmail.com> References: <6978b4af0809170712j6991bf5ek8ea24adcb2533dfc@mail.gmail.com> Message-ID: <2f3bf9a60809170719x1df462bcl6ae1a4f776d474f9@mail.gmail.com> On Wed, Sep 17, 2008 at 5:12 PM, Rui Machado wrote: > Hi list, > > does anyone have experienced problems using IB atomic operations > (fetch and add) on a ppc64 platform? > I tried a small example (using fetch and add) on x86 and ppc64 and on > x86 worked fine while on ppc64 didn't. Do you handle the ntoh/hton or do you let the driver/HCA deal with it by itself? Dotan From hal.rosenstock at gmail.com Wed Sep 17 07:46:54 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Wed, 17 Sep 2008 10:46:54 -0400 Subject: [ofa-general] ibcheckerrors give error 5691 within OFED 1.3.1 In-Reply-To: References: Message-ID: Hi, On Wed, Sep 17, 2008 at 4:25 AM, Wen Hao Wang wrote: > Hi all: > > I had one IB cluster with eight IBM HS21 blades, mixed with RHEL5.2 Server > and SLES10 SP2. All of them connected to one IB switch. opensm was running > as subnet manager on one blade. Command ibcheckerrors finished smoothly. > Last week I got another eight IBM LS21 blades connected to another IB > switch. 
But after I connected two switches and turned on all the IB adapters > on new blades, ibcheckerrors gave error message: > > [root at gaia-07 ~]# ibcheckerrors > #warn: counter RcvErrors = 5691 (threshold 10) lid 3 port 1 > Error check on lid 3 (gaia-07 HCA-1) port 1: FAILED > > ## Summary: 19 nodes checked, 0 bad nodes found > ## 46 ports checked, 1 ports have errors beyond threshold > [root at gaia-07 ~]# ibv_devinfo > hca_id: mlx4_0 > fw_ver: 2.3.000 > node_guid: 0002:c903:0001:3370 > sys_image_guid: 0002:c903:0001:3373 > vendor_id: 0x02c9 > vendor_part_id: 25418 > hw_ver: 0xA0 > board_id: IBM08A0000001 > phys_port_cnt: 2 > port: 1 > state: PORT_ACTIVE (4) > max_mtu: 2048 (4) > active_mtu: 2048 (4) > sm_lid: 15 > port_lid: 3 > port_lmc: 0x00 > > port: 2 > state: PORT_DOWN (1) > max_mtu: 2048 (4) > active_mtu: 2048 (4) > sm_lid: 0 > port_lid: 0 > port_lmc: 0x00 > [root at gaia-07 ~]# ibcheckport 3 1 > [root at gaia-07 ~]# echo $? > 0 > > I had closed the embeded subnet manager on two IB switches. The issue always > exist, even after I change subnet manager location to another machine. ib0 > of machine gaia-07 can communicate with other machines each other. All > installed IB adapters are ConnectX 4xSDR. Both switches are Topspin > Switches. Will anyone give some advice about this issue? Thanks in advance! counter RcvErrors = 5691 is indicating the value of PortCounters:RcvErrors. Per IBA section 16.1.3.5, it includes: • Local physical errors (ICRC, VCRC, LPCRC, and all physical errors that cause entry into the BAD PACKET or BAD PACKET DISCARD states of the packet receiver state machine) • Malformed data packet errors (LVer, length, VL) • Malformed link packet errors (operand, length, VL) • Packets discarded due to buffer overrun Those errors may have occurred when you plugged in the additional nodes. You might want to clear the errors first and then see if they are continually increasing or stable. 
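
[Illustration, not part of the original message.] The suggestion above (clear the counters, then see whether they keep climbing) comes down to comparing two samples of the same counter. A minimal C sketch of that comparison follows, assuming the two samples have already been read by some other means. `counter_delta()`, `classify_counter()` and the enum names are hypothetical; the threshold of 10 used in the usage note is simply the value shown in the quoted ibcheckerrors warning.

```c
#include <stdint.h>

/* Hypothetical classification of how a counter moved between two samples. */
enum counter_trend {
    COUNTER_STABLE,            /* no new errors since the first sample */
    COUNTER_GROWING,           /* new errors, but within the threshold */
    COUNTER_EXCEEDS_THRESHOLD  /* new errors beyond the threshold */
};

/*
 * Growth between two samples. A second sample lower than the first is
 * treated as "the counter was cleared in between", so the second sample
 * itself is the growth.
 */
static uint32_t counter_delta(uint32_t before, uint32_t after)
{
    return after >= before ? after - before : after;
}

static enum counter_trend classify_counter(uint32_t before, uint32_t after,
                                           uint32_t threshold)
{
    uint32_t delta = counter_delta(before, after);

    if (delta == 0)
        return COUNTER_STABLE;
    return delta > threshold ? COUNTER_EXCEEDS_THRESHOLD : COUNTER_GROWING;
}
```

With this sketch, two samples of 5691 taken some time apart classify as stable, which would be consistent with a one-off burst of errors while the new nodes were being plugged in; a counter that goes from 0 back to a large value after being cleared would point at an ongoing problem on that link.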
-- Hal > > Wen Hao Wang > Email: wangwhao at cn.ibm.com > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From rdreier at cisco.com Wed Sep 17 09:40:39 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 17 Sep 2008 09:40:39 -0700 Subject: [ofa-general] [GIT PULL] please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This will get four patches, which I believe are all appropriate for this stage of 2.6.27: - A fix for a nes regression we introduced in some earlier 2.6.27 patches that leads to userspace processes hanging on exit. - Two fixes that make a new mlx4 feature we added in 2.6.27 actually work. The functions touched are only used for the new fast register work requests so fixing this is low risk. - A fix for an IPoIB RTNL deadlock that we introduced in 2.6.27. 
Faisal Latif (1): RDMA/nes: Fix client side QP destroy Roland Dreier (1): Merge branches 'ipoib', 'mlx4' and 'nes' into for-linus Vladimir Sokolovsky (2): mlx4_core: Set RAE and init mtt_sz field in FRMR MPT entries IB/mlx4: Fix up fast register page list format Yossi Etigin (1): IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop() drivers/infiniband/hw/mlx4/qp.c | 6 ++++ drivers/infiniband/hw/nes/nes_cm.c | 11 +------- drivers/infiniband/ulp/ipoib/ipoib.h | 2 + drivers/infiniband/ulp/ipoib/ipoib_main.c | 1 + drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 31 ++++++++++++++++------- drivers/net/mlx4/mr.c | 10 ++++--- include/linux/mlx4/device.h | 4 +++ 7 files changed, 42 insertions(+), 23 deletions(-) commit e8224e4b804b4fd26723191c1891101a5959bb8a Author: Yossi Etigin Date: Tue Sep 16 11:57:45 2008 -0700 IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop() Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with ipoib_stop(). We avoid it by scheduling the piece of code that takes the lock on ipoib_workqueue instead of executing it directly. This works because we only flush the ipoib_workqueue with the RTNL not held. The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down() which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(), which calls ipoib_mcast_leave(). The latter calls ib_sa_free_multicast(), and this waits until the multicast completion handler finishes. This handler is ipoib_mcast_join_complete(), which waits for the rtnl_lock(), which was already taken by ipoib_stop(). This bug was introduced in commit a77a57a1 ("IPoIB: Fix deadlock on RTNL in ipoib_stop()"). 
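
[Illustration, not part of the original commit message.] The fix described above moves the lock-taking step out of the completion handler and into a work item that is only flushed while the lock is not held. The single-threaded C toy below models just that ordering. It is a sketch under loud assumptions: every `toy_*` name is hypothetical, a boolean stands in for rtnl_lock, and nothing here is kernel code or uses the kernel workqueue API.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_WORK 4

typedef void (*toy_work_fn)(void *ctx);

struct toy_workqueue {
    toy_work_fn fn[TOY_MAX_WORK];
    void *ctx[TOY_MAX_WORK];
    size_t count;
};

struct toy_dev {
    bool rtnl_held;   /* stands in for rtnl_lock held by the stop path */
    bool carrier_on;
    struct toy_workqueue wq;
};

/* Queue a work item; returns false if the toy queue is full. */
static bool toy_queue_work(struct toy_workqueue *wq, toy_work_fn fn, void *ctx)
{
    if (wq->count == TOY_MAX_WORK)
        return false;
    wq->fn[wq->count] = fn;
    wq->ctx[wq->count] = ctx;
    wq->count++;
    return true;
}

/* Run and drain queued work; refuses to run while the lock is held. */
static bool toy_flush_workqueue(struct toy_dev *dev)
{
    size_t i;

    if (dev->rtnl_held)
        return false; /* flushing under the lock would recreate the deadlock */
    for (i = 0; i < dev->wq.count; i++)
        dev->wq.fn[i](dev->wq.ctx[i]);
    dev->wq.count = 0;
    return true;
}

/* Deferred task: takes the lock, turns the carrier on, drops the lock. */
static void toy_carrier_on_task(void *ctx)
{
    struct toy_dev *dev = ctx;

    dev->rtnl_held = true;
    dev->carrier_on = true;
    dev->rtnl_held = false;
}

/* Completion handler: never touches the lock, it only queues the task. */
static void toy_join_complete(struct toy_dev *dev)
{
    toy_queue_work(&dev->wq, toy_carrier_on_task, dev);
}
```

In this model the handler can always return while the stop path still holds the lock, so the stop path's wait for the handler can finish; the carrier is only turned on later, when the queue is flushed with the lock released.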
Signed-off-by: Yossi Etigin Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index b0ffc9a..05eb41b 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -293,6 +293,7 @@ struct ipoib_dev_priv { struct delayed_work pkey_poll_task; struct delayed_work mcast_task; + struct work_struct carrier_on_task; struct work_struct flush_light; struct work_struct flush_normal; struct work_struct flush_heavy; @@ -464,6 +465,7 @@ int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port); void ipoib_dev_cleanup(struct net_device *dev); void ipoib_mcast_join_task(struct work_struct *work); +void ipoib_mcast_carrier_on_task(struct work_struct *work); void ipoib_mcast_send(struct net_device *dev, void *mgid, struct sk_buff *skb); void ipoib_mcast_restart_task(struct work_struct *work); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 7e9e218..1b1df5c 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -1075,6 +1075,7 @@ static void ipoib_setup(struct net_device *dev) INIT_DELAYED_WORK(&priv->pkey_poll_task, ipoib_pkey_poll); INIT_DELAYED_WORK(&priv->mcast_task, ipoib_mcast_join_task); + INIT_WORK(&priv->carrier_on_task, ipoib_mcast_carrier_on_task); INIT_WORK(&priv->flush_light, ipoib_ib_dev_flush_light); INIT_WORK(&priv->flush_normal, ipoib_ib_dev_flush_normal); INIT_WORK(&priv->flush_heavy, ipoib_ib_dev_flush_heavy); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index ac33c8f..aae2862 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -366,6 +366,21 @@ static int ipoib_mcast_sendonly_join(struct ipoib_mcast *mcast) return ret; } +void ipoib_mcast_carrier_on_task(struct work_struct *work) +{ + struct ipoib_dev_priv *priv = 
container_of(work, struct ipoib_dev_priv, + carrier_on_task); + + /* + * Take rtnl_lock to avoid racing with ipoib_stop() and + * turning the carrier back on while a device is being + * removed. + */ + rtnl_lock(); + netif_carrier_on(priv->dev); + rtnl_unlock(); +} + static int ipoib_mcast_join_complete(int status, struct ib_sa_multicast *multicast) { @@ -392,16 +407,12 @@ static int ipoib_mcast_join_complete(int status, &priv->mcast_task, 0); mutex_unlock(&mcast_mutex); - if (mcast == priv->broadcast) { - /* - * Take RTNL lock here to avoid racing with - * ipoib_stop() and turning the carrier back - * on while a device is being removed. - */ - rtnl_lock(); - netif_carrier_on(dev); - rtnl_unlock(); - } + /* + * Defer carrier on work to ipoib_workqueue to avoid a + * deadlock on rtnl_lock here. + */ + if (mcast == priv->broadcast) + queue_work(ipoib_workqueue, &priv->carrier_on_task); return 0; } commit d7ffd5076d4407d54b25bc4b25f3002f74fbafde Author: Faisal Latif Date: Tue Sep 16 11:56:26 2008 -0700 RDMA/nes: Fix client side QP destroy Fix QP not being destroyed properly on the client, which leads to userspace programs hanging on exit. This is a missing chunk from the connection management rewrite in commit 6492cdf3 ("RDMA/nes: CM connection setup/teardown rework"). 
Signed-off-by: Faisal Latif Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c index 9f0b964..499d3cf 100644 --- a/drivers/infiniband/hw/nes/nes_cm.c +++ b/drivers/infiniband/hw/nes/nes_cm.c @@ -1956,13 +1956,6 @@ static int mini_cm_reject(struct nes_cm_core *cm_core, return ret; cleanup_retrans_entry(cm_node); cm_node->state = NES_CM_STATE_CLOSED; - ret = send_fin(cm_node, NULL); - - if (cm_node->accept_pend) { - BUG_ON(!cm_node->listener); - atomic_dec(&cm_node->listener->pend_accepts_cnt); - BUG_ON(atomic_read(&cm_node->listener->pend_accepts_cnt) < 0); - } ret = send_reset(cm_node, NULL); return ret; @@ -2383,6 +2376,7 @@ static int nes_cm_disconn_true(struct nes_qp *nesqp) atomic_inc(&cm_disconnects); cm_event.event = IW_CM_EVENT_DISCONNECT; if (last_ae == NES_AEQE_AEID_LLP_CONNECTION_RESET) { + issued_disconnect_reset = 1; cm_event.status = IW_CM_EVENT_STATUS_RESET; nes_debug(NES_DBG_CM, "Generating a CM " "Disconnect Event (status reset) for " @@ -2508,7 +2502,6 @@ static int nes_disconnect(struct nes_qp *nesqp, int abrupt) nes_debug(NES_DBG_CM, "Call close API\n"); g_cm_core->api->close(g_cm_core, nesqp->cm_node); - nesqp->cm_node = NULL; } return ret; @@ -2837,6 +2830,7 @@ int nes_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param) cm_node->apbvt_set = 1; nesqp->cm_node = cm_node; cm_node->nesqp = nesqp; + nes_add_ref(&nesqp->ibqp); return 0; } @@ -3167,7 +3161,6 @@ static void cm_event_connect_error(struct nes_cm_event *event) if (ret) printk(KERN_ERR "%s[%u] OFA CM event_handler returned, " "ret=%d\n", __func__, __LINE__, ret); - nes_rem_ref(&nesqp->ibqp); cm_id->rem_ref(cm_id); rem_ref_cm_node(event->cm_node->cm_core, event->cm_node); commit 29bdc88384c2b24e37e5760df0dc898546083d6b Author: Vladimir Sokolovsky Date: Mon Sep 15 14:25:23 2008 -0700 IB/mlx4: Fix up fast register page list format Byte swap the addresses in the page list for fast register work requests to big 
endian to match what the HCA expects. Also, the addresses must have the "present" bit set so that the HCA knows it can access them. Otherwise the HCA will fault the first time it accesses the memory region.

Signed-off-by: Vladimir Sokolovsky
Signed-off-by: Roland Dreier

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index f29dbb7..9559248 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1342,6 +1342,12 @@ static __be32 convert_access(int acc)
 static void set_fmr_seg(struct mlx4_wqe_fmr_seg *fseg, struct ib_send_wr *wr)
 {
 	struct mlx4_ib_fast_reg_page_list *mfrpl = to_mfrpl(wr->wr.fast_reg.page_list);
+	int i;
+
+	for (i = 0; i < wr->wr.fast_reg.page_list_len; ++i)
+		wr->wr.fast_reg.page_list->page_list[i] =
+			cpu_to_be64(wr->wr.fast_reg.page_list->page_list[i] |
+				    MLX4_MTT_FLAG_PRESENT);
 
 	fseg->flags = convert_access(wr->wr.fast_reg.access_flags);
 	fseg->mem_key = cpu_to_be32(wr->wr.fast_reg.rkey);
diff --git a/drivers/net/mlx4/mr.c b/drivers/net/mlx4/mr.c
index 644adf0..d1dd5b4 100644
--- a/drivers/net/mlx4/mr.c
+++ b/drivers/net/mlx4/mr.c
@@ -71,8 +71,6 @@ struct mlx4_mpt_entry {
 #define MLX4_MPT_PD_FLAG_RAE (1 << 28)
 #define MLX4_MPT_PD_FLAG_EN_INV (3 << 24)
 
-#define MLX4_MTT_FLAG_PRESENT 1
-
 #define MLX4_MPT_STATUS_SW 0xF0
 #define MLX4_MPT_STATUS_HW 0x00
 
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 655ea0d..b2f9444 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -141,6 +141,10 @@ enum {
 	MLX4_STAT_RATE_OFFSET = 5
 };
 
+enum {
+	MLX4_MTT_FLAG_PRESENT = 1
+};
+
 static inline u64 mlx4_fw_ver(u64 major, u64 minor, u64 subminor)
 {
 	return (major << 32) | (minor << 16) | subminor;

commit c9257433f2eaf8803a1f3d3be5d984232db41ffe
Author: Vladimir Sokolovsky
Date: Tue Sep 2 13:38:29 2008 -0700

mlx4_core: Set RAE and init mtt_sz field in FRMR MPT entries

Set the RAE (remote access enable) bit and correctly initialize the MTT size in MPT entries
being set up for fast register memory regions. Otherwise the callers can't enable remote access and in fact can't fast register at all (since the HCA will think no MTT entries are allocated). Signed-off-by: Vladimir Sokolovsky Signed-off-by: Roland Dreier diff --git a/drivers/net/mlx4/mr.c b/drivers/net/mlx4/mr.c index 62071d9..644adf0 100644 --- a/drivers/net/mlx4/mr.c +++ b/drivers/net/mlx4/mr.c @@ -67,7 +67,8 @@ struct mlx4_mpt_entry { #define MLX4_MPT_FLAG_PHYSICAL (1 << 9) #define MLX4_MPT_FLAG_REGION (1 << 8) -#define MLX4_MPT_PD_FLAG_FAST_REG (1 << 26) +#define MLX4_MPT_PD_FLAG_FAST_REG (1 << 27) +#define MLX4_MPT_PD_FLAG_RAE (1 << 28) #define MLX4_MPT_PD_FLAG_EN_INV (3 << 24) #define MLX4_MTT_FLAG_PRESENT 1 @@ -348,7 +349,10 @@ int mlx4_mr_enable(struct mlx4_dev *dev, struct mlx4_mr *mr) if (mr->mtt.order >= 0 && mr->mtt.page_shift == 0) { /* fast register MR in free state */ mpt_entry->flags |= cpu_to_be32(MLX4_MPT_FLAG_FREE); - mpt_entry->pd_flags |= cpu_to_be32(MLX4_MPT_PD_FLAG_FAST_REG); + mpt_entry->pd_flags |= cpu_to_be32(MLX4_MPT_PD_FLAG_FAST_REG | + MLX4_MPT_PD_FLAG_RAE); + mpt_entry->mtt_sz = cpu_to_be32((1 << mr->mtt.order) * + MLX4_MTT_ENTRY_PER_SEG); } else { mpt_entry->flags |= cpu_to_be32(MLX4_MPT_FLAG_SW_OWNS); } From dledford at redhat.com Wed Sep 17 10:00:23 2008 From: dledford at redhat.com (Doug Ledford) Date: Wed, 17 Sep 2008 13:00:23 -0400 Subject: [ofa-general] Question about RDMA CM In-Reply-To: <4236041E-C487-4BAB-B9C7-3984F4419E8C@cisco.com> References: <29544E34-60C7-479A-84D4-1814F2B283ED@cisco.com> <1221630323.1927.575.camel@firewall.xsintricity.com> <4236041E-C487-4BAB-B9C7-3984F4419E8C@cisco.com> Message-ID: <1221670823.15868.28.camel@firewall.xsintricity.com> On Wed, 2008-09-17 at 08:15 -0400, Jeff Squyres wrote: > On Sep 17, 2008, at 1:45 AM, Doug Ledford wrote: > > >> No; we are using ibv_create_qp, and then assigning id->qp afterwards. > > > > Don't do that. 
Assume if rdmacm provides an interface for doing > > something, then there is likely a reason. In this case, when you call > > rdma_create_qp(), it does more than just call ibv_create_qp() and > > ibv_modify_qp() on your behalf. It also pipes information about the > > state changes in the qp to the kernel rdma_cm module (by writing the > > commands in rdma_cm format to id->channel->fd, which is the rdma_cm fd > > not the qp fd, in places like rdma_init_qp_attr()). > > In general, I agree with you (use rdmacm_create_qp instead of making > it manually). But I'm looking at the head of the librdmacm git and I > don't see what you're talking about. All it does is call > ibv_create_qp and ibv_modify_qp. Does head of git not have this snippet in rdma_create_qp(): if (ucma_is_ud_ps(id->ps)) ret = ucma_init_ud_qp(id_priv, qp); else ret = ucma_init_conn_qp(id_priv, qp); That's in the librdmacm code I have here, and that happens to be called *before* the code sets id->qp = qp. And in that call chain, ucma_init_*_qp() both end up calling rdma_init_qp_attr() and it's here that we write to id->channel->fd (aka, the kernel rdmacm module instead of the verbs module) what we are doing. So unless things have changed, it's not just a simple wrapper. And even if they have changed, if you want to play it safe in terms of older librdmacm versions, you have to assume it isn't a simple wrapper. > The person who initially wrote this code chose to create the qp > manually for two reasons: > > - rdmacm_create_qp is just a wrapper around ibv_create_qp and > ibv_modify_qp > - other parts of OMPI (that don't use RDMA CM for wireup) call > ibv_create_qp and ibv_modify_qp > > That being said, I actually spent a little time yesterday trying to > convert to use rdmacm_create_qp and was having problems with the > comparison at the top of rdmacm_create_qp against the protection > domain -- somehow it was failing for me. 
It was not immediately > obvious to me where rdmacm was getting that pd from, nor why that > comparison would fail. This is some comments and code I had that works just fine for getting at the right id/pd pair when creating a new cm_id: // Before we can create a queue pair (QP), we have to have a protection // domain (PD) and it has to exist on the controller we are going // to create the QP on. Since on the server we want to be able to // share buffers between connections, and the buffer's PD must match // the QP's PD, and all connections have their own QP, we // have to share the same PD across all QPs on a single controller. // However, we don't know what controller rdma_resolve_addr bound // us to. But, inside the cm_id, there is a pointer to an ibv_context, // and our device list is actually a list of ibv_context pointers, so // try to match one of our known device pointers to that pointer and // if we hit a match, we know what our PD needs to be since we // already allocated it. If we don't find a match, we are screwed // and we bail. for (devnum = 0; devnum < num_devices; devnum++) if (t_data->rdma->cm_id->verbs == devlist[devnum].device) break; if (devnum == num_devices || devlist[devnum].domain == NULL) { printf("_rdma_connect: couldn't find matching context\n"); goto out; } In the rdma init code I do this to set up the devlist array in the first place: devices = rdma_get_devices(&num_devices); ... for(i=0; i < num_devices; i++) { devlist[i].device = devices[i]; devlist[i].domain = ibv_alloc_pd(devlist[i].device); An important factor being that you must use rdma_get_devices() and the ib contexts returned there from in your code when you are allocating protection domain contexts. > >> As far as I can tell, I am not sending to the wrong QP. But it is > >> complex code, so there certainly can be a bug in this area. > >> > >> The thing that is weird for me is that setting rnr_retry to 7 makes > >> it > >> work. 
> > > > I didn't look into the kernel code so I couldn't venture a guess as to > > whether or not the above is actually a hard requirement, and whether > > or > > not it would explain the rnr_retry of 7 getting around the race > > condition, but I would think it's plausible. > > > If all is working properly, a rnr_retry_count of 0 should be > sufficient because there should be no race conditions. This is what > OMPI has had for years; it's only this new RDMA CM wireup problem that > has forced me to set it at 7. > > However, as I mentioned before, this is complex code, so it's quite > possible (likely?) that I have a bug in the code somewhere. I was > posting here looking for any possible insights into why this could > happen. > -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: This is a digitally signed message part URL: From ruimario at gmail.com Wed Sep 17 10:14:37 2008 From: ruimario at gmail.com (Rui Machado) Date: Wed, 17 Sep 2008 19:14:37 +0200 Subject: [ofa-general] atomic operations on ppc64 In-Reply-To: <6978b4af0809170807x4e63c41bsb679b6eedd123626@mail.gmail.com> References: <6978b4af0809170712j6991bf5ek8ea24adcb2533dfc@mail.gmail.com> <2f3bf9a60809170719x1df462bcl6ae1a4f776d474f9@mail.gmail.com> <6978b4af0809170728l2417cbach62f080a216db2ab4@mail.gmail.com> <2f3bf9a60809170736id27d1f1s7ee8bd207392e368@mail.gmail.com> <6978b4af0809170744o13b196e5i8ec73303542176ba@mail.gmail.com> <2f3bf9a60809170748m4c9a1ca4t3bf5cdc7bc51a2f3@mail.gmail.com> <6978b4af0809170754w2e9dbd96j58b216f0340b4f66@mail.gmail.com> <2f3bf9a60809170757s5db88243sefa38eeebc412968@mail.gmail.com> <6978b4af0809170807x4e63c41bsb679b6eedd123626@mail.gmail.com> Message-ID: <6978b4af0809171014v468db89ew3534def3d5b6a303@mail.gmail.com> From: Rui 
Machado Date: 2008/9/17 Subject: Re: [ofa-general] atomic operations on ppc64 To: Dotan Barak 2008/9/17 Dotan Barak : > On Wed, Sep 17, 2008 at 5:54 PM, Rui Machado wrote: >> 2008/9/17 Dotan Barak : >>> On Wed, Sep 17, 2008 at 5:44 PM, Rui Machado wrote: >>>> 2008/9/17 Dotan Barak : >>>>> On Wed, Sep 17, 2008 at 5:28 PM, Rui Machado wrote: >>>>>> Hey Dotan, >>>>>> >>>>>> 2008/9/17 Dotan Barak : >>>>>>> On Wed, Sep 17, 2008 at 5:12 PM, Rui Machado wrote: >>>>>>>> Hi list, >>>>>>>> >>>>>>>> does anyone have experienced problems using IB atomic operations >>>>>>>> (fetch and add) on a ppc64 platform? >>>>>>>> I tried a small example (using fetch and add) on x86 and ppc64 and on >>>>>>>> x86 worked fine while on ppc64 didn't. >>>>>>> >>>>>>> Do you handle the ntoh/hton or do you let the driver/HCA deal with it by itself? >>>>>> >>>>>> Nop, I don't use those. I guess then I'm letting the driver/HCA deal with it.... >>>>> >>>>> Do you see endianess issues or completely corrupted data? >>>>> >>>> >>>> Just to make it clear (to me :) ). I'm talking about ppc64<-->ppc64 >>>> communication. >>>> Should I still concern with converting data because of endianess? >>>> What happens is that I ask for a fetch and add and it doesn't happen. >>>> The value on the server doesn't get modified. >>> >>> This is a weird behaviour indeed .. >>> >>> Can you post the code in your program that fill the SR? >>> >>> Dotan >>> >> >> Not sure what do you mean by SR. >> Here's is the function inc() which I call to increment 1 one the >> remote machine. The remote machine has its buffer full of zeroes. >> That's what the client gets all the time although I increment 3 times >> in a row (with a sleep in between) >> >> Is this enough? 
>> Thanks for the help >> >> void inc() >> { >> >> struct ibv_qp_attr check_attr; >> struct ibv_qp_init_attr check_init_attr; >> >> void *ev_ctx; >> >> struct ibv_send_wr *bad_wr; >> struct ibv_wc wc; >> struct ibv_sge slist; >> struct ibv_send_wr swr3; >> >> >> slist.addr = (uintptr_t)buffer; >> slist.length = 8; >> slist.lkey =mr->lkey; >> >> swr3.wr.atomic.remote_addr = remote_node->mi.bufAddr; >> swr3.wr.atomic.rkey = remote_node->mi.buf_rkey; >> swr3.wr.atomic.compare_add = 1; >> >> swr3.wr_id = 1; >> swr3.sg_list = &slist; >> swr3.num_sge = 1; >> swr3.opcode = IBV_WR_ATOMIC_FETCH_AND_ADD; >> swr3.send_flags = IBV_SEND_SIGNALED; >> swr3.next = NULL; >> >> >> if(ibv_post_send(qp,&swr3,&bad_wr)){ >> printf("Couldn't post send...\n"); >> return; >> } >> >> >> int ne=0; >> do{ >> ne = ibv_poll_cq(cq,1,&wc); >> }while(ne==0); >> >> if((ne < 0) || (wc.status != IBV_WC_SUCCESS)){ >> >> //check qp status >> if(!ibv_query_qp(qp,&check_attr,IBV_QP_STATE,&check_init_attr)) >> printf("The qp state is: %d\n ",check_attr.qp_state); >> >> } >> } >> > > The code looks good and it should work... > (I would have memset every structure before using it ..) > > > Did you check the memory in the sender side or in the receiver side? > As I mentioned it does work on x86. Actually on both: server: Initial counter at buffer is 0 counter at buffer is 0 counter at buffer is 0 counter at buffer is 0 counter at buffer is 0 counter at buffer is 0 counter at buffer is 0 counter at buffer is 0 client: initial IB atomic counter 0 IB atomic counter 0 IB atomic counter 0 IB atomic counter 0 What could this be related to? Driver, HW?
From jsquyres at cisco.com Wed Sep 17 10:25:42 2008 From: jsquyres at cisco.com (Jeff Squyres) Date: Wed, 17 Sep 2008 13:25:42 -0400 Subject: [ofa-general] Question about RDMA CM In-Reply-To: <1221670823.15868.28.camel@firewall.xsintricity.com> References: <29544E34-60C7-479A-84D4-1814F2B283ED@cisco.com> <1221630323.1927.575.camel@firewall.xsintricity.com> <4236041E-C487-4BAB-B9C7-3984F4419E8C@cisco.com> <1221670823.15868.28.camel@firewall.xsintricity.com> Message-ID: <28D7C298-C191-4DDB-858C-EC689617E604@cisco.com> On Sep 17, 2008, at 1:00 PM, Doug Ledford wrote: > Does head of git not have this snippet in rdma_create_qp(): > > if (ucma_is_ud_ps(id->ps)) > ret = ucma_init_ud_qp(id_priv, qp); > else > ret = ucma_init_conn_qp(id_priv, qp); > > That's in the librdmacm code I have here, and that happens to be > called > *before* the code sets id->qp = qp. And in that call chain, > ucma_init_*_qp() both end up calling rdma_init_qp_attr() and it's here > that we write to id->channel->fd (aka, the kernel rdmacm module > instead > of the verbs module) what we are doing. And so it does -- I read that code quickly and my eye simply saw what it wanted to see (ibv_query_qp()), not rdma_init_qp_attr(). I'll flog the guy who originally wrote this. :-) > An important factor being that you must use rdma_get_devices() and the > ib contexts returned there from in your code when you are allocating > protection domain contexts. This is what I was afraid of. #@$%@#$%!! We do not use rdma_get_devices() because our wireup scheme is both modular (it may use RDMA CM or it may use something else) and separate from the RDMA device discovery process. Specifically, the PD that we have allocated is from the context we got back from ibv_open_device(), not rdma_get_devices(). I'll have to go tinker around to see if I can make that work. 
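The lookup Doug describes (match the cm_id's verbs context against a list of known contexts, then reuse the protection domain allocated for it) can be sketched without the RDMA libraries. This is only an illustration under stated assumptions: struct ctx and struct pd are hypothetical placeholders for struct ibv_context and struct ibv_pd, and pd_for_context() is an invented helper name. The comparison is by pointer identity, which is why a context obtained from a different call than the one used to build the list would not match.

```c
#include <stddef.h>

/* Hypothetical placeholders for struct ibv_context and struct ibv_pd. */
struct ctx { int unused; };
struct pd { int unused; };

/* One entry per known device: its context and the PD allocated on it. */
struct dev_entry {
	struct ctx *device;
	struct pd *domain;
};

/*
 * Return the PD that belongs to the given context, or NULL when the
 * context is not in the list (or no PD was allocated for it).
 * The match is by pointer identity, as in the loop quoted above.
 */
static struct pd *pd_for_context(const struct dev_entry *devlist,
				 int num_devices, const struct ctx *verbs)
{
	for (int i = 0; i < num_devices; i++)
		if (devlist[i].device == verbs)
			return devlist[i].domain;
	return NULL;
}
```

A caller would pass the context pointer stored in the cm_id and bail out on NULL, as the quoted code does.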
-- Jeff Squyres Cisco Systems From sashak at voltaire.com Wed Sep 17 11:32:58 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 17 Sep 2008 21:32:58 +0300 Subject: [ofa-general] [PATCH] opensm: multicast group create/delete notification fix Message-ID: <20080917183258.GA25831@sashak.voltaire.com> Currently OpenSM does not send a multicast group delete notification when the last full member leaves the group. But according to IBA it should: o15-0.2-1.9: If SA supports UD multicast, then if SA receives a valid MC deletion MAD that removes the last full member, SA shall delete all records belonging to that multicast group. If this port was the last full member, SA may release all resources associated with it, including resources that may exist in the fabric itself. The exact time at which this operation is done is vendor-specific. When the last full member leaves the group, SA shall forward Trap 67 (see 14.4.10 Multicast Group Create/Delete Traps on page 891) to subscribing endports. This patch fixes the bug: OpenSM now tracks the number of full members of each multicast group and sends an MCG creation notification when the first full member joins the group (so it works for "well known" groups as well) and an MCG deletion notification when the last full member leaves. Routing is not calculated for groups without full members (including "well known" ones), and such a group is deleted at the end of the routing cycle (except "well known" groups, as is the case now). As a side effect of the patch, osm_mgrp_remove_port() is no longer called from osm_sm_mcgrp_leave(), so the locking there against the SA MCR processor can be cleaned up. The patch is potentially dangerous since it changes the current behavior, so any kind of testing will be appreciated.
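The counting rule described above can be sketched in isolation. This is a simplified illustration, not the OpenSM code: struct mgrp, mgrp_join_state_changed() and the two notice counters are hypothetical stand-ins for osm_mgrp_t and the create/delete notice calls, and only the full-member bit of JoinState is modelled.

```c
#include <assert.h>
#include <stdint.h>

#define JOIN_STATE_FULL 0x1	/* full-member bit of JoinState */

/* Hypothetical, simplified stand-in for a multicast group record. */
struct mgrp {
	unsigned full_members;	/* number of ports joined as full members */
	int well_known;		/* well-known groups are never deleted */
	int to_be_deleted;	/* group is removed at end of routing cycle */
	int create_notices;	/* counts "group created" notices sent */
	int delete_notices;	/* counts "group deleted" notices sent */
};

/*
 * Update the group when one port's JoinState goes from old_state to
 * new_state. A notice is sent only when the full-member count goes
 * 0 -> 1 (create) or 1 -> 0 (delete).
 */
static void mgrp_join_state_changed(struct mgrp *g, uint8_t old_state,
				    uint8_t new_state)
{
	/* Nothing to do unless the full-member bit actually flipped. */
	if (!((old_state ^ new_state) & JOIN_STATE_FULL))
		return;

	if (new_state & JOIN_STATE_FULL) {
		if (++g->full_members == 1) {
			g->create_notices++;
			g->to_be_deleted = 0;
		}
	} else {
		assert(g->full_members > 0);
		if (--g->full_members == 0) {
			g->delete_notices++;
			if (!g->well_known)
				g->to_be_deleted = 1;
		}
	}
}
```

In this sketch a port that joins only as a non-member or send-only member never changes the count, so a group holding only such ports gets no routing and is marked for deletion once its last full member is gone.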
Signed-off-by: Sasha Khapyorsky --- opensm/include/opensm/osm_multicast.h | 9 ++- opensm/opensm/osm_drop_mgr.c | 2 +- opensm/opensm/osm_mcast_mgr.c | 18 ++---- opensm/opensm/osm_multicast.c | 95 ++++++++++++++++++++++--------- opensm/opensm/osm_sa.c | 3 +- opensm/opensm/osm_sa_mcmember_record.c | 51 ++++------------- opensm/opensm/osm_sm.c | 6 +- 7 files changed, 98 insertions(+), 86 deletions(-) diff --git a/opensm/include/opensm/osm_multicast.h b/opensm/include/opensm/osm_multicast.h index c860d4a..a0eab16 100644 --- a/opensm/include/opensm/osm_multicast.h +++ b/opensm/include/opensm/osm_multicast.h @@ -139,6 +139,7 @@ typedef struct osm_mgrp { boolean_t to_be_deleted; uint32_t last_change_id; uint32_t last_tree_id; + unsigned full_members; } osm_mgrp_t; /* * FIELDS @@ -364,7 +365,8 @@ static inline ib_net16_t osm_mgrp_get_mlid(IN const osm_mgrp_t * const p_mgrp) * * SYNOPSIS */ -osm_mcm_port_t *osm_mgrp_add_port(IN osm_mgrp_t * const p_mgrp, +osm_mcm_port_t *osm_mgrp_add_port(osm_subn_t *subn, osm_log_t *log, + IN osm_mgrp_t * const p_mgrp, IN const ib_gid_t * const p_port_gid, IN const uint8_t join_state, IN boolean_t proxy_join); @@ -433,7 +435,7 @@ osm_mgrp_is_port_present(IN const osm_mgrp_t * const p_mgrp, * SYNOPSIS */ void -osm_mgrp_remove_port(IN osm_subn_t * const p_subn, +osm_mgrp_delete_port(IN osm_subn_t * const p_subn, IN osm_log_t * const p_log, IN osm_mgrp_t * const p_mgrp, IN const ib_net64_t port_guid); @@ -460,6 +462,9 @@ osm_mgrp_remove_port(IN osm_subn_t * const p_subn, * SEE ALSO *********/ +int osm_mgrp_remove_port(osm_subn_t *subn, osm_log_t *log, osm_mgrp_t *mgrp, + osm_mcm_port_t *mcm, uint8_t join_state); + /****f* OpenSM: Multicast Group/osm_mgrp_apply_func * NAME * osm_mgrp_apply_func diff --git a/opensm/opensm/osm_drop_mgr.c b/opensm/opensm/osm_drop_mgr.c index 1aeb172..e827c26 100644 --- a/opensm/opensm/osm_drop_mgr.c +++ b/opensm/opensm/osm_drop_mgr.c @@ -209,7 +209,7 @@ static void __osm_drop_mgr_remove_port(osm_sm_t * sm, IN 
osm_port_t * p_port) while (p_mcm != (osm_mcm_info_t *) cl_qlist_end(&p_port->mcm_list)) { p_mgrp = osm_get_mgrp_by_mlid(sm->p_subn, p_mcm->mlid); if (p_mgrp) { - osm_mgrp_remove_port(sm->p_subn, sm->p_log, + osm_mgrp_delete_port(sm->p_subn, sm->p_log, p_mgrp, p_port->guid); osm_mcm_info_delete((osm_mcm_info_t *) p_mcm); } diff --git a/opensm/opensm/osm_mcast_mgr.c b/opensm/opensm/osm_mcast_mgr.c index 78eed50..fc8533d 100644 --- a/opensm/opensm/osm_mcast_mgr.c +++ b/opensm/opensm/osm_mcast_mgr.c @@ -1073,6 +1073,9 @@ osm_mcast_mgr_process_tree(osm_sm_t * sm, */ __osm_mcast_mgr_clear(sm, p_mgrp); + if (!p_mgrp->full_members) + goto Exit; + status = __osm_mcast_mgr_build_spanning_tree(sm, p_mgrp); if (status != IB_SUCCESS) { OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 0A17: " @@ -1109,20 +1112,11 @@ mcast_mgr_process_mgrp(osm_sm_t * sm, } p_mgrp->last_tree_id = p_mgrp->last_change_id; - /* Remove MGRP only if osm_mcm_port_t count is 0 and - * Not a well known group - */ - if (cl_qmap_count(&p_mgrp->mcm_port_tbl) == 0 && !p_mgrp->well_known) { + /* remove MCGRP if it is marked for deletion */ + if (p_mgrp->to_be_deleted) { OSM_LOG(sm->p_log, OSM_LOG_DEBUG, - "Destroying mgrp with lid:0x%X\n", + "Destroying mgrp with lid:0x%x\n", cl_ntoh16(p_mgrp->mlid)); - if (p_mgrp->to_be_deleted == FALSE) { - p_mgrp->to_be_deleted = TRUE; - /* Send a Report to any InformInfo registered for - Trap 67 : MCGroup delete */ - osm_mgrp_send_delete_notice(sm->p_subn, sm->p_log, - p_mgrp); - } sm->p_subn->mgroups[cl_ntoh16(p_mgrp->mlid) - IB_LID_MCAST_START_HO] = NULL; osm_mgrp_delete(p_mgrp); } diff --git a/opensm/opensm/osm_multicast.c b/opensm/opensm/osm_multicast.c index b810630..83c0399 100644 --- a/opensm/opensm/osm_multicast.c +++ b/opensm/opensm/osm_multicast.c @@ -95,7 +95,8 @@ osm_mgrp_t *osm_mgrp_new(IN const ib_net16_t mlid) /********************************************************************** **********************************************************************/ -osm_mcm_port_t 
*osm_mgrp_add_port(IN osm_mgrp_t * const p_mgrp, +osm_mcm_port_t *osm_mgrp_add_port(IN osm_subn_t *subn, osm_log_t *log, + IN osm_mgrp_t * const p_mgrp, IN const ib_gid_t * const p_port_gid, IN const uint8_t join_state, IN boolean_t proxy_join) @@ -103,7 +104,7 @@ osm_mcm_port_t *osm_mgrp_add_port(IN osm_mgrp_t * const p_mgrp, ib_net64_t port_guid; osm_mcm_port_t *p_mcm_port; cl_map_item_t *prev_item; - uint8_t prev_join_state; + uint8_t prev_join_state = 0; uint8_t prev_scope; p_mcm_port = osm_mcm_port_new(p_port_gid, join_state, proxy_join); @@ -142,43 +143,81 @@ osm_mcm_port_t *osm_mgrp_add_port(IN osm_mgrp_t * const p_mgrp, p_mgrp->last_change_id++; } + if ((join_state ^ prev_join_state) & IB_JOIN_STATE_FULL) { + if (join_state & IB_JOIN_STATE_FULL) { + if (++p_mgrp->full_members == 1) { + osm_mgrp_send_create_notice(subn, log, p_mgrp); + p_mgrp->to_be_deleted = 0; + } + } else if (--p_mgrp->full_members == 0) { + osm_mgrp_send_delete_notice(subn, log, p_mgrp); + if (!p_mgrp->well_known) + p_mgrp->to_be_deleted = 1; + } + } + return (p_mcm_port); } /********************************************************************** **********************************************************************/ -void -osm_mgrp_remove_port(IN osm_subn_t * const p_subn, - IN osm_log_t * const p_log, - IN osm_mgrp_t * const p_mgrp, - IN const ib_net64_t port_guid) +int osm_mgrp_remove_port(osm_subn_t *subn, osm_log_t *log, osm_mgrp_t *mgrp, + osm_mcm_port_t *mcm, uint8_t join_state) { - cl_map_item_t *p_map_item; - - CL_ASSERT(p_mgrp); - - p_map_item = cl_qmap_get(&p_mgrp->mcm_port_tbl, port_guid); - - if (p_map_item != cl_qmap_end(&p_mgrp->mcm_port_tbl)) { - cl_qmap_remove_item(&p_mgrp->mcm_port_tbl, p_map_item); - osm_mcm_port_delete((osm_mcm_port_t *) p_map_item); - - /* track the fact we modified the group */ - p_mgrp->last_change_id++; - } + int ret; + uint8_t port_join_state; + uint8_t new_join_state; /* - no more ports so the group will be deleted after re-route - but only if it 
is not a well known group and not already deleted + * according to the same o15-0.1.14 we get the stored + * JoinState and the request JoinState and they must be + * opposite to leave - otherwise just update it */ - if ((cl_is_qmap_empty(&p_mgrp->mcm_port_tbl)) && - (p_mgrp->well_known == FALSE) && (p_mgrp->to_be_deleted == FALSE)) { - p_mgrp->to_be_deleted = TRUE; + port_join_state = mcm->scope_state & 0x0F; + new_join_state = port_join_state & ~join_state; + + if (new_join_state) { + mcm->scope_state = new_join_state | (mcm->scope_state & 0xf0); + OSM_LOG(log, OSM_LOG_DEBUG, + "updating port 0x%" PRIx64 " JoinState 0x%x -> 0x%x\n", + cl_ntoh64(mcm->port_gid.unicast.interface_id), + port_join_state, new_join_state); + ret = 0; + } else { + cl_qmap_remove_item(&mgrp->mcm_port_tbl, &mcm->map_item); + OSM_LOG(log, OSM_LOG_DEBUG, "removing port 0x%" PRIx64 "\n", + cl_ntoh64(mcm->port_gid.unicast.interface_id)); + osm_mcm_port_delete(mcm); + /* track the fact we modified the group */ + mgrp->last_change_id++; + ret = 1; + } - /* Send a Report to any InformInfo registered for - Trap 67 : MCGroup delete */ - osm_mgrp_send_delete_notice(p_subn, p_log, p_mgrp); + /* no more full members so the group will be deleted after re-route + but only if it is not a well known group */ + if ((port_join_state ^ new_join_state) & IB_JOIN_STATE_FULL) { + if (port_join_state & IB_JOIN_STATE_FULL) { + if (--mgrp->full_members == 0) { + osm_mgrp_send_delete_notice(subn, log, mgrp); + if (!mgrp->well_known) + mgrp->to_be_deleted = 1; + } + } else if (++mgrp->full_members == 1) { + osm_mgrp_send_create_notice(subn, log, mgrp); + mgrp->to_be_deleted = 0; + } } + + return ret; +} + +void osm_mgrp_delete_port(osm_subn_t *subn, osm_log_t *log, osm_mgrp_t *mgrp, + ib_net64_t port_guid) +{ + cl_map_item_t *item = cl_qmap_get(&mgrp->mcm_port_tbl, port_guid); + + if (item != cl_qmap_end(&mgrp->mcm_port_tbl)) + osm_mgrp_remove_port(subn, log, mgrp, (osm_mcm_port_t *)item, 0xf); } 
/********************************************************************** diff --git a/opensm/opensm/osm_sa.c b/opensm/opensm/osm_sa.c index 37750eb..670deae 100644 --- a/opensm/opensm/osm_sa.c +++ b/opensm/opensm/osm_sa.c @@ -1007,7 +1007,8 @@ int osm_sa_db_file_load(osm_opensm_t * p_osm) if (cl_qmap_get(&p_mgrp->mcm_port_tbl, port_gid.unicast.interface_id) == cl_qmap_end(&p_mgrp->mcm_port_tbl)) - osm_mgrp_add_port(p_mgrp, &port_gid, + osm_mgrp_add_port(&p_osm->subn, &p_osm->log, + p_mgrp, &port_gid, scope_state, proxy_join); } else if (!strncmp(p, "Service Record:", 15)) { ib_service_record_t s_rec; diff --git a/opensm/opensm/osm_sa_mcmember_record.c b/opensm/opensm/osm_sa_mcmember_record.c index c4f9d48..3b4b435 100644 --- a/opensm/opensm/osm_sa_mcmember_record.c +++ b/opensm/opensm/osm_sa_mcmember_record.c @@ -206,7 +206,7 @@ __add_new_mgrp_port(IN osm_sa_t * sa, "Create new port with proxy_join TRUE\n"); } - *pp_mcmr_port = osm_mgrp_add_port(p_mgrp, + *pp_mcmr_port = osm_mgrp_add_port(sa->p_subn, sa->p_log, p_mgrp, &p_recvd_mcmember_rec->port_gid, p_recvd_mcmember_rec->scope_state, proxy_join); @@ -951,10 +951,6 @@ osm_mcmr_rcv_create_new_mgrp(IN osm_sa_t * sa, sa->p_subn->mgroups[cl_ntoh16(mlid) - IB_LID_MCAST_START_HO] = *pp_mgrp; - /* Send a Report to any InformInfo registered for - Trap 66: MCGroup create */ - osm_mgrp_send_create_notice(sa->p_subn, sa->p_log, *pp_mgrp); - Exit: OSM_LOG_EXIT(sa->p_log); return status; @@ -1055,8 +1051,7 @@ __osm_mcmr_rcv_leave_mgrp(IN osm_sa_t * sa, ib_net16_t mlid; ib_net64_t portguid; osm_mcm_port_t *p_mcm_port; - uint8_t port_join_state; - uint8_t new_join_state; + int removed; OSM_LOG_ENTER(sa->p_log); @@ -1104,37 +1099,19 @@ __osm_mcmr_rcv_leave_mgrp(IN osm_sa_t * sa, goto Exit; } - /* - * according to the same o15-0.1.14 we get the stored - * JoinState and the request JoinState and they must be - * opposite to leave - otherwise just update it - */ - port_join_state = p_mcm_port->scope_state & 0x0F; - new_join_state = - 
port_join_state & ~(p_recvd_mcmember_rec->scope_state & 0x0F); - if (new_join_state) { - /* Just update the result JoinState */ - p_mcm_port->scope_state = - new_join_state | (p_mcm_port->scope_state & 0xf0); - + mcmember_rec.scope_state = p_mcm_port->scope_state; + /* remove port or update join state */ + removed = osm_mgrp_remove_port(sa->p_subn, sa->p_log, p_mgrp, p_mcm_port, + p_recvd_mcmember_rec->scope_state&0x0F); + if (removed) mcmember_rec.scope_state = p_mcm_port->scope_state; - CL_PLOCK_RELEASE(sa->p_lock); - - OSM_LOG(sa->p_log, OSM_LOG_DEBUG, - "After update JoinState != 0. " - "Updating from 0x%X to 0x%X\n", - port_join_state, new_join_state); - } else { - /* we need to return the stored scope state */ - mcmember_rec.scope_state = p_mcm_port->scope_state; + CL_PLOCK_RELEASE(sa->p_lock); - /* OK we can leave */ - /* note: osm_sm_mcgrp_leave() will release sa->p_lock */ - if (osm_sm_mcgrp_leave(sa->sm, mlid, portguid)) - OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1B09: " - "osm_sm_mcgrp_leave failed\n"); - } + /* we can leave if port was deleted from MCG */ + if (removed && osm_sm_mcgrp_leave(sa->sm, mlid, portguid)) + OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1B09: " + "osm_sm_mcgrp_leave failed\n"); /* Send an SA response */ __osm_mcmr_rcv_respond(sa, p_madw, &mcmember_rec); @@ -1382,9 +1359,7 @@ __osm_mcmr_rcv_join_mgrp(IN osm_sa_t * sa, IN osm_madw_t * const p_madw) /* the request for routing failed so we need to remove the port */ p_mgrp = osm_get_mgrp_by_mlid(sa->p_subn, mlid); if (p_mgrp != NULL) { - osm_mgrp_remove_port(sa->p_subn, - sa->p_log, - p_mgrp, + osm_mgrp_delete_port(sa->p_subn, sa->p_log, p_mgrp, p_recvd_mcmember_rec->port_gid. 
unicast.interface_id); __cleanup_mgrp(sa, mlid); diff --git a/opensm/opensm/osm_sm.c b/opensm/opensm/osm_sm.c index 292eb05..9b7810c 100644 --- a/opensm/opensm/osm_sm.c +++ b/opensm/opensm/osm_sm.c @@ -622,8 +622,8 @@ osm_sm_mcgrp_leave(IN osm_sm_t * const p_sm, /* * Acquire the port object for the port leaving this group. */ - /* note: p_sm->p_lock is locked by caller, but will be released later - this function */ + CL_PLOCK_EXCL_ACQUIRE(p_sm->p_lock); + p_port = osm_get_port_by_guid(p_sm->p_subn, port_guid); if (!p_port) { CL_PLOCK_RELEASE(p_sm->p_lock); @@ -649,8 +649,6 @@ osm_sm_mcgrp_leave(IN osm_sm_t * const p_sm, /* * Walk the list of ports in the group, and remove the appropriate one. */ - osm_mgrp_remove_port(p_sm->p_subn, p_sm->p_log, p_mgrp, port_guid); - osm_port_remove_mgrp(p_port, mlid); __osm_sm_mgrp_disconnect(p_sm, p_mgrp, port_guid); -- 1.5.4.rc2.60.gb2e62 From dledford at redhat.com Wed Sep 17 12:32:08 2008 From: dledford at redhat.com (Doug Ledford) Date: Wed, 17 Sep 2008 15:32:08 -0400 Subject: [ofa-general] Re: compat-dapl-1.2.10 install bogosity In-Reply-To: <1221627719.1927.563.camel@firewall.xsintricity.com> References: <1221627719.1927.563.camel@firewall.xsintricity.com> Message-ID: <1221679928.15868.30.camel@firewall.xsintricity.com> On Wed, 2008-09-17 at 01:03 -0400, Doug Ledford wrote: > Arlin, you usually put out such nice stuff. But the install-exec-hook > you wrote into Makefile.am (as well as the uninstall-hook) are both > busted (not to mention the absolute wrong thing to do anyway). > > When using automake, all of your install directives that automake > creates automatically have $(DESTDIR) prepended to them so that make > DESTDIR= install will work. In the case of just about any > build system I know if, this is an absolute requirement. When you hand > write your own targets via hooks like that, they obviously don't get > rewritten by automake so you have to include $(DESTDIR) yourself. 
When > I tried to build this package, it tried to modify the /etc/ofed/dat.conf > file on my real filesystem instead of looking into the rpm build root. > > However, even once you fix that, modifying user configuration files > without their consent as part of make install is just a *BAD* idea. The > only time you should ever do anything like this is when you just flat > don't care about things like rpm packaging (or apt packaging) and having > the files on your system trackable, verifiable, erasable, etc. Install > methods that do this sort of thing are a good way to really alienate > people in a position like me ;-) Turns out the dapl-2 package was doing this too. I've attached the patch I used to solve the problem. I still don't know if I'd go around editing user config files on install/uninstall though. Oh, and I made the patch in the Makefile.in files because I don't run the automake tools, so it wouldn't get picked up if it was against Makefile.am. But, obviously, that's where you would want to apply these hunks. -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... Name: dapl-install-hook.patch Type: text/x-patch Size: 6362 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: This is a digitally signed message part URL: From wangwhao at cn.ibm.com Thu Sep 18 00:55:08 2008 From: wangwhao at cn.ibm.com (Wen Hao Wang) Date: Thu, 18 Sep 2008 15:55:08 +0800 Subject: [ofa-general] ibcheckerrors give error 5691 within OFED 1.3.1 In-Reply-To: Message-ID: >counter RcvErrors = 5691 is indicating the value of >PortCounters:RcvErrors.
Per IBA section 16.1.3.5, it includes: >• Local physical errors (ICRC, VCRC, LPCRC, and all physical >errors that cause entry into the BAD PACKET or BAD PACKET >DISCARD states of the packet receiver state machine) >• Malformed data packet errors (LVer, length, VL) >• Malformed link packet errors (operand, length, VL) >• Packets discarded due to buffer overrun > >Those errors may have occurred when you plugged in the additional >nodes. You might want to clear the errors first and then see if they >are continually increasing or stable. > >-- Hal Hello Hal: Thanks for your explanation. Unfortunately it is difficult for me to unplug/plug those cables. Moreover, I need to test many items within a limited time, so I have to wait for a suitable time to do that. I will let you know the result then. Wen Hao Wang Email: wangwhao at cn.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From wangwhao at cn.ibm.com Thu Sep 18 02:52:54 2008 From: wangwhao at cn.ibm.com (Wen Hao Wang) Date: Thu, 18 Sep 2008 17:52:54 +0800 Subject: [ofa-general] dapltest couldn't read ABI version Message-ID: Hi all: I installed OFED 1.3.1 and 1.4 RC1 on my SLES10 SP2 x86_64 servers. dapltest fails to run in server mode. The following error is reported: LS21-05:~ # dapltest -T S -d -D ib0 Server_Cmd.debug: 1 Server_Cmd.dapl_name: ib0 librdmacm: couldn't read ABI version. librdmacm: assuming: 4 CMA: unable to open /dev/infiniband/rdma_cm LS21-05:32561: open_hca: ERR - RDMA channel No such file or directory LS21-05:32561: dapls_ib_open_hca failed 40000 DT_cs_Server: Could not open ib0 (DAT_INTERNAL_ERROR ) DT_cs_Server: Waiting for clients to all go away... DT_cs_Server: Cleaning up ... DT_cs_Server (ib0): Exiting. The file /dev/infiniband/rdma_cm exists on the RHEL servers, but not on the SLES servers. ib0 can communicate with other servers. Any advice or comments? Thanks in advance!
Wen Hao Wang Email: wangwhao at cn.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From vlad at lists.openfabrics.org Thu Sep 18 03:10:07 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Thu, 18 Sep 2008 03:10:07 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080918-0200 daily build status Message-ID: <20080918101007.E8CC3E60939@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.24 Passed on ia64 
with linux-2.6.23 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.24 Failed: From hal.rosenstock at gmail.com Thu Sep 18 03:26:22 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Thu, 18 Sep 2008 06:26:22 -0400 Subject: ***SPAM*** Re: [ofa-general] ibcheckerrors give error 5691 within OFED 1.3.1 In-Reply-To: References: Message-ID: On Thu, Sep 18, 2008 at 3:55 AM, Wen Hao Wang wrote: >>counter RcvErrors = 5691 is indicating the value of >>PortCounters:RcvErrors. Per IBA section 16.1.3.5, it includes: >>• Local physical errors (ICRC, VCRC, LPCRC, and all physical >>errors that cause entry into the BAD PACKET or BAD PACKET >>DISCARD states of the packet receiver state machine) >>• Malformed data packet errors (LVer, length, VL) >>• Malformed link packet errors (operand, length, VL) >>• Packets discarded due to buffer overrun >> >>Those errors may have occurred when you plugged in the additional >>nodes. You might want to clear the errors first and then see if they >>are continually increasing or stable. >> >>-- Hal > > Hello Hal: > > Thanks for your explanation. Unfortunately it is difficult for me to > unplug/plug those cables. I wasn't suggesting to do that. Just that the errors may have occurred then. What I was suggesting was to first clear the errors and then recheck them to see if any of the error counters are increasing or whether they are relatively stable. -- Hal >Moreover I need test many items within limited > time. So I have to wait for suitable time to do that. Will let you know the > result at that time. 
>
> Wen Hao Wang
> Email: wangwhao at cn.ibm.com
>

From jean-vincent.ficet at bull.net Thu Sep 18 07:45:10 2008
From: jean-vincent.ficet at bull.net (Vincent Ficet)
Date: Thu, 18 Sep 2008 16:45:10 +0200
Subject: [ofa-general] ibsim: sim_read_pkt: write failed: Resource temporarily unavailable - pkt dropped
Message-ID: <48D26976.8060203@bull.net>

Hello,

While simulating a large fabric using ibsim (roughly 3000 lines of topology, 50 x 36 port switches, 576 HCAs), I get the following errors:

sim_read_pkt: write failed: Resource temporarily unavailable - pkt dropped

The code is as follows (ibsim.c, function sim_read_pkt()):

    // reply
    ret = write(dcl->fd, buf, size);
    if (ret == size)
        return 0;

    if (ret < 0 && (errno == ECONNREFUSED || errno == ENOTCONN)) {
        IBWARN("client %u seems to be dead - disconnecting.", dcl->id);
        disconnect_client(dcl->id);
    }
    IBWARN("write failed: %m - pkt dropped");

The error being thrown out here is EAGAIN and is not handled at all.

When I kill opensm after seeing these errors, I see that the MADs were not acknowledged by ibsim, e.g:

OpenSM: Got signal 2 - exiting...
There are still 51 MADs out.
Forcing the exit of the OpenSM application...

To address this issue, I modified the code as follows:

--- ibsim.c.ORIG 2008-09-18 14:30:07.000000000 +0200
+++ ibsim.c 2008-09-18 15:37:55.000000000 +0200
@@ -481,6 +481,8 @@
         return -1;
     }
     for (;;) {
+        int retry_count = 0;
+
         if ((size = read(fd, buf, sizeof(buf))) <= 0)
             return size;

@@ -497,7 +499,14 @@
             size, sizeof(struct sim_request), dcl->id, dcl->fd);

     // reply
-    ret = write(dcl->fd, buf, size);
+    do {
+        ret = write(dcl->fd, buf, size);
+        if (retry_count && (ret != size)) {
+            IBWARN("failed to send reply: ret = %d, retry_count =%d, errno = %d.",
+                   ret, retry_count, errno);
+        }
+    } while ((retry_count++ < 20) && (ret == -1));
+
     if (ret == size)
         return 0;

Basically, it simply retries 20 times before giving up (and I still get errors, although fewer).
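
A fixed retry count calls write() again immediately, which may not give the receiver time to drain its socket buffer. As a point of comparison, here is a minimal sketch of the usual POSIX pattern: on EAGAIN, wait with poll() until the descriptor is writable, then retry. The name write_retry_eagain() and its timeout_ms parameter are hypothetical and not part of ibsim, and the sketch assumes a non-blocking, message-oriented socket where each write is all-or-nothing.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from ibsim): write one message, and if the
 * descriptor reports EAGAIN, wait until it is writable and try again.
 * Returns the result of write() on success, or -1 with errno set.
 * A timeout while waiting is reported as -1 with errno == EAGAIN.
 */
ssize_t write_retry_eagain(int fd, const void *buf, size_t size, int timeout_ms)
{
    for (;;) {
        ssize_t ret = write(fd, buf, size);
        if (ret >= 0)
            return ret;
        if (errno == EINTR)
            continue;                 /* interrupted by a signal: retry */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                /* a real error: let the caller decide */

        /* Send buffer is full: sleep until the kernel says we can write. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int n = poll(&pfd, 1, timeout_ms);
        if (n == 0) {
            errno = EAGAIN;           /* still not writable after the timeout */
            return -1;
        }
        if (n < 0 && errno != EINTR)
            return -1;                /* poll() itself failed */
    }
}
```

In the reply path, a call such as write_retry_eagain(dcl->fd, buf, size, 1000) would stand in for the bare write(). The one-second timeout is an arbitrary illustration; the existing ECONNREFUSED/ENOTCONN handling would still apply to the returned error.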
The question is: Am I looking at the right thing here, or is the 'pkt dropped' error hiding another problem elsewhere ? Note: both ibsim and opensm codes are pulled from the git head branch. Thanks for your help, Vincent From ruimario at gmail.com Thu Sep 18 08:18:39 2008 From: ruimario at gmail.com (Rui Machado) Date: Thu, 18 Sep 2008 17:18:39 +0200 Subject: [ofa-general] atomic operations on ppc64 In-Reply-To: <6978b4af0809171014v468db89ew3534def3d5b6a303@mail.gmail.com> References: <6978b4af0809170712j6991bf5ek8ea24adcb2533dfc@mail.gmail.com> <2f3bf9a60809170719x1df462bcl6ae1a4f776d474f9@mail.gmail.com> <6978b4af0809170728l2417cbach62f080a216db2ab4@mail.gmail.com> <2f3bf9a60809170736id27d1f1s7ee8bd207392e368@mail.gmail.com> <6978b4af0809170744o13b196e5i8ec73303542176ba@mail.gmail.com> <2f3bf9a60809170748m4c9a1ca4t3bf5cdc7bc51a2f3@mail.gmail.com> <6978b4af0809170754w2e9dbd96j58b216f0340b4f66@mail.gmail.com> <2f3bf9a60809170757s5db88243sefa38eeebc412968@mail.gmail.com> <6978b4af0809170807x4e63c41bsb679b6eedd123626@mail.gmail.com> <6978b4af0809171014v468db89ew3534def3d5b6a303@mail.gmail.com> Message-ID: <6978b4af0809180818h1aea3cqb44ab357f41a3c2a@mail.gmail.com> 2008/9/17 Rui Machado : > From: Rui Machado > Date: 2008/9/17 > Subject: Re: [ofa-general] atomic operations on ppc64 > To: Dotan Barak > > > 2008/9/17 Dotan Barak : >> On Wed, Sep 17, 2008 at 5:54 PM, Rui Machado wrote: >>> 2008/9/17 Dotan Barak : >>>> On Wed, Sep 17, 2008 at 5:44 PM, Rui Machado wrote: >>>>> 2008/9/17 Dotan Barak : >>>>>> On Wed, Sep 17, 2008 at 5:28 PM, Rui Machado wrote: >>>>>>> Hey Dotan, >>>>>>> >>>>>>> 2008/9/17 Dotan Barak : >>>>>>>> On Wed, Sep 17, 2008 at 5:12 PM, Rui Machado wrote: >>>>>>>>> Hi list, >>>>>>>>> >>>>>>>>> does anyone have experienced problems using IB atomic operations >>>>>>>>> (fetch and add) on a ppc64 platform? >>>>>>>>> I tried a small example (using fetch and add) on x86 and ppc64 and on >>>>>>>>> x86 worked fine while on ppc64 didn't. 
>>>>>>>> >>>>>>>> Do you handle the ntoh/hton or do you let the driver/HCA deal with it by itself? >>>>>>> >>>>>>> Nop, I don't use those. I guess then I'm letting the driver/HCA deal with it.... >>>>>> >>>>>> Do you see endianess issues or completely corrupted data? >>>>>> >>>>> >>>>> Just to make it clear (to me :) ). I'm talking about ppc64<-->ppc64 >>>>> communication. >>>>> Should I still concern with converting data because of endianess? >>>>> What happens is that I ask for a fetch and add and it doesn't happen. >>>>> The value on the server doesn't get modified. >>>> >>>> This is a weird behaviour indeed .. >>>> >>>> Can you post the code in your program that fill the SR? >>>> >>>> Dotan >>>> >>> >>> Not sure what do you mean by SR. >>> Here's is the function inc() which I call to increment 1 one the >>> remote machine. The remote machine has its buffer full of zeroes. >>> That's what the client gets all the time although I increment 3 times >>> in a row (with a sleep in between) >>> >>> Is this enough? 
>>> Thanks for the help >>> >>> void inc() >>> { >>> >>> struct ibv_qp_attr check_attr; >>> struct ibv_qp_init_attr check_init_attr; >>> >>> void *ev_ctx; >>> >>> struct ibv_send_wr *bad_wr; >>> struct ibv_wc wc; >>> struct ibv_sge slist; >>> struct ibv_send_wr swr3; >>> >>> >>> slist.addr = (uintptr_t)buffer; >>> slist.length = 8; >>> slist.lkey =mr->lkey; >>> >>> swr3.wr.atomic.remote_addr = remote_node->mi.bufAddr; >>> swr3.wr.atomic.rkey = remote_node->mi.buf_rkey; >>> swr3.wr.atomic.compare_add = 1; >>> >>> swr3.wr_id = 1; >>> swr3.sg_list = &slist; >>> swr3.num_sge = 1; >>> swr3.opcode = IBV_WR_ATOMIC_FETCH_AND_ADD; >>> swr3.send_flags = IBV_SEND_SIGNALED; >>> swr3.next = NULL; >>> >>> >>> if(ibv_post_send(qp,&swr3,&bad_wr)){ >>> printf("Couldn't post send...\n"); >>> return 0; >>> } >>> >>> >>> int ne=0; >>> do{ >>> ne = ibv_poll_cq(cq,1,&wc); >>> }while(ne==0); >>> >>> if((ne < 0) || (wc.status != IBV_WC_SUCCESS)){ >>> >>> //check qp status >>> if(!ibv_query_qp(qp,&check_attr,IBV_QP_STATE,&check_init_attr)) >>> printf("The qp state is: %d\n ",check_attr.qp_state); >>> >>> } >>> } >>> >> >> The code looks good and it should work... >> (I would have memset every structure before using it ..) >> >> >> Did you check the memory in the sender side or in the reciver side? >> > > As I mentioned it does work on x86. > > Actually on both: > > server: > Initial counter at buffer is 0 > counter at buffer is 0 > counter at buffer is 0 > counter at buffer is 0 > counter at buffer is 0 > counter at buffer is 0 > counter at buffer is 0 > counter at buffer is 0 > > > client: > initial IB atomic counter 0 > IB atomic counter 0 > IB atomic counter 0 > IB atomic counter 0 > > What could this be related to? Driver, HW? > Anyone with some insight on this? Maybe how can I debug this further? 
Cheers

From arlin.r.davis at intel.com Thu Sep 18 10:40:53 2008
From: arlin.r.davis at intel.com (Davis, Arlin R)
Date: Thu, 18 Sep 2008 10:40:53 -0700
Subject: [ofa-general] RE: compat-dapl-1.2.10 install bogosity
In-Reply-To: <1221679928.15868.30.camel@firewall.xsintricity.com>
References: <1221627719.1927.563.camel@firewall.xsintricity.com> <1221679928.15868.30.camel@firewall.xsintricity.com>
Message-ID:

>> However, even once you fix that, modifying user configuration files
>> without their consent as part of make install is just a *BAD* idea. The
>> only time you should ever do anything like this is when you just flat
>> don't care about things like rpm packaging (or apt packaging) and having
>> the files on your system trackable, verifiable, erasable, etc. Install
>> methods that do this sort of thing are a good way to really alienate
>> people in a position like me ;-)
>
>Turns out the dapl-2 package was doing this too. I've attached the patch
>I used to solve the problem. I still don't know if I'd go around
>editing user config files on install/uninstall though.
>

Thanks, I will get these changes in as soon as possible. These changes were driven by the fact that there are other DAPL provider vendors that share this same configuration file, all adding/removing entries for their specific provider. What is the proper way to handle the case where packages need to add/remove entries in a shared configuration file?
-arlin

From dledford at redhat.com Thu Sep 18 10:47:53 2008
From: dledford at redhat.com (Doug Ledford)
Date: Thu, 18 Sep 2008 13:47:53 -0400
Subject: [ofa-general] RE: compat-dapl-1.2.10 install bogosity
In-Reply-To:
References: <1221627719.1927.563.camel@firewall.xsintricity.com> <1221679928.15868.30.camel@firewall.xsintricity.com>
Message-ID: <1221760073.15868.93.camel@firewall.xsintricity.com>

On Thu, 2008-09-18 at 10:40 -0700, Davis, Arlin R wrote:
>
> >> However, even once you fix that, modifying user configuration files
> >> without their consent as part of make install is just a *BAD* idea. The
> >> only time you should ever do anything like this is when you just flat
> >> don't care about things like rpm packaging (or apt packaging) and having
> >> the files on your system trackable, verifiable, erasable, etc. Install
> >> methods that do this sort of thing are a good way to really alienate
> >> people in a position like me ;-)
> >
> >Turns out the dapl-2 package was doing this too. I've attached the patch
> >I used to solve the problem. I still don't know if I'd go around
> >editing user config files on install/uninstall though.
> >
>
> Thanks, I will get these changes in as soon as possible. These changes
> were driven by the fact that there are other DAPL provider vendors
> that share this same configuration file, all adding/removing entries for
> their specific provider. What is the proper way to handle the case
> where packages need to add/remove entries in a shared configuration
> file?

Multiple options. Lots of people never change the defaults, so you could just make the default work without any config file entries at all and then use entries only to override the defaults. Or you can do what lots of configurable programs do nowadays: make a directory for people to drop individual config files into, and parse all those individual files.
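
The drop-in directory option can be sketched roughly as below. This is an illustration only, not how dapl reads its configuration: for_each_conf_file(), the ".conf" suffix convention, and the visitor callback are all assumptions made for the example. Each package would install and remove only its own file, so no install script ever edits a shared one.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if name is "<something>.conf", 0 otherwise. */
static int has_conf_suffix(const char *name)
{
    size_t n = strlen(name);
    return n > 5 && strcmp(name + n - 5, ".conf") == 0;
}

/* scandir() filter: keep only entries that look like drop-in files. */
static int conf_filter(const struct dirent *entry)
{
    return has_conf_suffix(entry->d_name);
}

/*
 * Hypothetical helper: call visit(path, arg) for every *.conf file in dir,
 * in alphabetical order so that later files can override earlier ones.
 * Returns the number of files visited, or -1 if dir cannot be read.
 */
int for_each_conf_file(const char *dir,
                       void (*visit)(const char *path, void *arg),
                       void *arg)
{
    struct dirent **list;
    int n = scandir(dir, &list, conf_filter, alphasort);
    if (n < 0)
        return -1;

    for (int i = 0; i < n; i++) {
        char path[4096];
        snprintf(path, sizeof(path), "%s/%s", dir, list[i]->d_name);
        visit(path, arg);
        free(list[i]);
    }
    free(list);
    return n;
}
```

A provider package would then ship a single file of its own into the directory, and the library would parse each visited file with whatever per-entry parser it already has.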
--
Doug Ledford
GPG KeyID: CFBFF194
http://people.redhat.com/dledford

Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband

From jsquyres at cisco.com Thu Sep 18 11:37:11 2008
From: jsquyres at cisco.com (Jeff Squyres)
Date: Thu, 18 Sep 2008 14:37:11 -0400
Subject: [ofa-general] Question about RDMA CM
In-Reply-To: <28D7C298-C191-4DDB-858C-EC689617E604@cisco.com>
References: <29544E34-60C7-479A-84D4-1814F2B283ED@cisco.com> <1221630323.1927.575.camel@firewall.xsintricity.com> <4236041E-C487-4BAB-B9C7-3984F4419E8C@cisco.com> <1221670823.15868.28.camel@firewall.xsintricity.com> <28D7C298-C191-4DDB-858C-EC689617E604@cisco.com>
Message-ID: <3B537352-7E9A-4D47-8EE5-18E4E5BAFC70@cisco.com>

On Sep 17, 2008, at 1:25 PM, Jeff Squyres wrote:

>> We do not use rdma_get_devices() because our wireup scheme is both
>> modular (it may use RDMA CM or it may use something else) and
>> separate from the RDMA device discovery process. Specifically, the
>> PD that we have allocated is from the context we got back from
>> ibv_open_device(), not rdma_get_devices().

To follow up for the list -- preliminary testing seems to indicate that this was exactly the problem. To create QPs with RDMA CM, one must use rdma_create_qp(), not ibv_create_qp(). Once I started doing this, my problems seem to have disappeared.

Jon M. quickly glanced at the RDMA CM kernel code and didn't see anything that jumped out at him that would matter on the iWARP side (i.e., using rdma_create_qp() vs. ibv_create_qp()). But I assume that it *does* matter somehow for IB -- this would explain the behavior we saw (worked on iWARP; failed on IB).

Regardless, rdma_create_qp() is in the RDMA CM interface, and therefore we should (and will) use it when using RDMA CM.
Thanks for the insight; it's exactly what I was looking for. -- Jeff Squyres Cisco Systems From dotanba at gmail.com Thu Sep 18 18:43:29 2008 From: dotanba at gmail.com (Dotan Barak) Date: Fri, 19 Sep 2008 03:43:29 +0200 Subject: [ofa-general] atomic operations on ppc64 In-Reply-To: <6978b4af0809180818h1aea3cqb44ab357f41a3c2a@mail.gmail.com> References: <6978b4af0809170712j6991bf5ek8ea24adcb2533dfc@mail.gmail.com> <2f3bf9a60809170719x1df462bcl6ae1a4f776d474f9@mail.gmail.com> <6978b4af0809170728l2417cbach62f080a216db2ab4@mail.gmail.com> <2f3bf9a60809170736id27d1f1s7ee8bd207392e368@mail.gmail.com> <6978b4af0809170744o13b196e5i8ec73303542176ba@mail.gmail.com> <2f3bf9a60809170748m4c9a1ca4t3bf5cdc7bc51a2f3@mail.gmail.com> <6978b4af0809170754w2e9dbd96j58b216f0340b4f66@mail.gmail.com> <2f3bf9a60809170757s5db88243sefa38eeebc412968@mail.gmail.com> <6978b4af0809170807x4e63c41bsb679b6eedd123626@mail.gmail.com> <6978b4af0809171014v468db89ew3534def3d5b6a303@mail.gmail.com> <6978b4af0809180818h1aea3cqb44ab357f41a3c2a@mail.gmail.com> Message-ID: <48D303C1.8090308@gmail.com> Rui Machado wrote: > 2008/9/17 Rui Machado : > >> From: Rui Machado >> Date: 2008/9/17 >> Subject: Re: [ofa-general] atomic operations on ppc64 >> To: Dotan Barak >> >> >> 2008/9/17 Dotan Barak : >> >>> On Wed, Sep 17, 2008 at 5:54 PM, Rui Machado wrote: >>> >>>> 2008/9/17 Dotan Barak : >>>> >>>>> On Wed, Sep 17, 2008 at 5:44 PM, Rui Machado wrote: >>>>> >>>>>> 2008/9/17 Dotan Barak : >>>>>> >>>>>>> On Wed, Sep 17, 2008 at 5:28 PM, Rui Machado wrote: >>>>>>> >>>>>>>> Hey Dotan, >>>>>>>> >>>>>>>> 2008/9/17 Dotan Barak : >>>>>>>> >>>>>>>>> On Wed, Sep 17, 2008 at 5:12 PM, Rui Machado wrote: >>>>>>>>> >>>>>>>>>> Hi list, >>>>>>>>>> >>>>>>>>>> does anyone have experienced problems using IB atomic operations >>>>>>>>>> (fetch and add) on a ppc64 platform? >>>>>>>>>> I tried a small example (using fetch and add) on x86 and ppc64 and on >>>>>>>>>> x86 worked fine while on ppc64 didn't. 
>>>>>>>>>> >>>>>>>>> Do you handle the ntoh/hton or do you let the driver/HCA deal with it by itself? >>>>>>>>> >>>>>>>> Nop, I don't use those. I guess then I'm letting the driver/HCA deal with it.... >>>>>>>> >>>>>>> Do you see endianess issues or completely corrupted data? >>>>>>> >>>>>>> >>>>>> Just to make it clear (to me :) ). I'm talking about ppc64<-->ppc64 >>>>>> communication. >>>>>> Should I still concern with converting data because of endianess? >>>>>> What happens is that I ask for a fetch and add and it doesn't happen. >>>>>> The value on the server doesn't get modified. >>>>>> >>>>> This is a weird behaviour indeed .. >>>>> >>>>> Can you post the code in your program that fill the SR? >>>>> >>>>> Dotan >>>>> >>>>> >>>> Not sure what do you mean by SR. >>>> Here's is the function inc() which I call to increment 1 one the >>>> remote machine. The remote machine has its buffer full of zeroes. >>>> That's what the client gets all the time although I increment 3 times >>>> in a row (with a sleep in between) >>>> >>>> Is this enough? 
>>>> Thanks for the help >>>> >>>> void inc() >>>> { >>>> >>>> struct ibv_qp_attr check_attr; >>>> struct ibv_qp_init_attr check_init_attr; >>>> >>>> void *ev_ctx; >>>> >>>> struct ibv_send_wr *bad_wr; >>>> struct ibv_wc wc; >>>> struct ibv_sge slist; >>>> struct ibv_send_wr swr3; >>>> >>>> >>>> slist.addr = (uintptr_t)buffer; >>>> slist.length = 8; >>>> slist.lkey =mr->lkey; >>>> >>>> swr3.wr.atomic.remote_addr = remote_node->mi.bufAddr; >>>> swr3.wr.atomic.rkey = remote_node->mi.buf_rkey; >>>> swr3.wr.atomic.compare_add = 1; >>>> >>>> swr3.wr_id = 1; >>>> swr3.sg_list = &slist; >>>> swr3.num_sge = 1; >>>> swr3.opcode = IBV_WR_ATOMIC_FETCH_AND_ADD; >>>> swr3.send_flags = IBV_SEND_SIGNALED; >>>> swr3.next = NULL; >>>> >>>> >>>> if(ibv_post_send(qp,&swr3,&bad_wr)){ >>>> printf("Couldn't post send...\n"); >>>> return 0; >>>> } >>>> >>>> >>>> int ne=0; >>>> do{ >>>> ne = ibv_poll_cq(cq,1,&wc); >>>> }while(ne==0); >>>> >>>> if((ne < 0) || (wc.status != IBV_WC_SUCCESS)){ >>>> >>>> //check qp status >>>> if(!ibv_query_qp(qp,&check_attr,IBV_QP_STATE,&check_init_attr)) >>>> printf("The qp state is: %d\n ",check_attr.qp_state); >>>> >>>> } >>>> } >>>> >>>> >>> The code looks good and it should work... >>> (I would have memset every structure before using it ..) >>> >>> >>> Did you check the memory in the sender side or in the reciver side? >>> >>> >> As I mentioned it does work on x86. >> >> Actually on both: >> >> server: >> Initial counter at buffer is 0 >> counter at buffer is 0 >> counter at buffer is 0 >> counter at buffer is 0 >> counter at buffer is 0 >> counter at buffer is 0 >> counter at buffer is 0 >> counter at buffer is 0 >> >> >> client: >> initial IB atomic counter 0 >> IB atomic counter 0 >> IB atomic counter 0 >> IB atomic counter 0 >> >> What could this be related to? Driver, HW? >> >> > > Anyone with some insight on this? > Maybe how can I debug this further? > Bugs can be anywhere: application / Driver / HW ... 
Can you try running the server on x86 with the client on PPC64, and then the server on PPC64 with the client on x86?

Which OFED version do you use?
Can you send the output of ibv_devinfo?

Dotan

From roger at terascala.com Thu Sep 18 13:45:12 2008
From: roger at terascala.com (Roger Spellman)
Date: Thu, 18 Sep 2008 16:45:12 -0400
Subject: [ofa-general] Intermittent: ib0: multicast join failed
Message-ID: <2C7DE72B9BD00F44BAECA5B0CBB873953217F5@hermes.terascala.com>

I have many nodes, each with a Mellanox MT25204. When I reboot some nodes, they occasionally get the following error:

ib0: multicast join failed

Rebooting the system almost always solves this problem.

What causes this? Is there a way to solve this without rebooting?

Thanks.

Roger Spellman
Sr. Staff Engineer
Terascala, Inc.
www.terascala.com

From wangwhao at cn.ibm.com Fri Sep 19 01:13:06 2008
From: wangwhao at cn.ibm.com (Wen Hao Wang)
Date: Fri, 19 Sep 2008 16:13:06 +0800
Subject: [ofa-general] ibcheckerrors give error 5691 within OFED 1.3.1
In-Reply-To:
Message-ID:

> I wasn't suggesting to do that. Just that the errors may have occurred then.
> What I was suggesting was to first clear the errors and then recheck
> them to see if any of the error counters are increasing or whether
> they are relatively stable.
>
> -- Hal

OK. Could you give some detailed advice on how to clear the errors? Thanks.

Wen Hao Wang
Email: wangwhao at cn.ibm.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From jean-vincent.ficet at bull.net Fri Sep 19 02:11:03 2008 From: jean-vincent.ficet at bull.net (Vincent Ficet) Date: Fri, 19 Sep 2008 11:11:03 +0200 Subject: [ofa-general] [PATCH] ibsim: handle EAGAIN error Message-ID: <48D36CA7.2050201@bull.net> Hello Sasha, Following the issue I raised yesterday ([ofa-general] ibsim: sim_read_pkt: write failed: Resource temporarily unavailable - pkt dropped), I wrote a small patch that fixes the issue (in attachment). All my tests are now running fine with this patch. Please let me know if you have any comments ;-) Cheers, Vincent -------------- next part -------------- A non-text attachment was scrubbed... Name: ibsim-EAGAIN.patch Type: text/x-patch Size: 813 bytes Desc: not available URL: From vlad at lists.openfabrics.org Fri Sep 19 03:11:16 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Fri, 19 Sep 2008 03:11:16 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080919-0200 daily build status Message-ID: <20080919101116.91F59E60AD3@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with 
linux-2.6.19 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.24 Failed: From hal.rosenstock at gmail.com Fri Sep 19 04:19:44 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Fri, 19 Sep 2008 07:19:44 -0400 Subject: ***SPAM*** Re: [ofa-general] ibcheckerrors give error 5691 within OFED 1.3.1 In-Reply-To: References: Message-ID: On Fri, Sep 19, 2008 at 4:13 AM, Wen Hao Wang wrote: >> I wasn't suggesting to do that. Just that the errors may have occurred >> then. >> What I was suggesting was to first clear the errors and then recheck >> them to see if any of the error counters are increasing or whether >> they are relatively stable. >> >> -- Hal > > OK. Would you provide some detailed advice how to clear the errors? ibclearerrors will do this. -- Hal >Thanks. 
> > Wen Hao Wang > Email: wangwhao at cn.ibm.com > From ruimario at gmail.com Fri Sep 19 06:16:08 2008 From: ruimario at gmail.com (Rui Machado) Date: Fri, 19 Sep 2008 15:16:08 +0200 Subject: [ofa-general] atomic operations on ppc64 In-Reply-To: <48D303C1.8090308@gmail.com> References: <6978b4af0809170712j6991bf5ek8ea24adcb2533dfc@mail.gmail.com> <2f3bf9a60809170736id27d1f1s7ee8bd207392e368@mail.gmail.com> <6978b4af0809170744o13b196e5i8ec73303542176ba@mail.gmail.com> <2f3bf9a60809170748m4c9a1ca4t3bf5cdc7bc51a2f3@mail.gmail.com> <6978b4af0809170754w2e9dbd96j58b216f0340b4f66@mail.gmail.com> <2f3bf9a60809170757s5db88243sefa38eeebc412968@mail.gmail.com> <6978b4af0809170807x4e63c41bsb679b6eedd123626@mail.gmail.com> <6978b4af0809171014v468db89ew3534def3d5b6a303@mail.gmail.com> <6978b4af0809180818h1aea3cqb44ab357f41a3c2a@mail.gmail.com> <48D303C1.8090308@gmail.com> Message-ID: <6978b4af0809190616s7f23d5f8uc9c13b8be38fece1@mail.gmail.com> 2008/9/19 Dotan Barak : > Rui Machado wrote: >> >> 2008/9/17 Rui Machado : >> >>> >>> From: Rui Machado >>> Date: 2008/9/17 >>> Subject: Re: [ofa-general] atomic operations on ppc64 >>> To: Dotan Barak >>> >>> >>> 2008/9/17 Dotan Barak : >>> >>>> >>>> On Wed, Sep 17, 2008 at 5:54 PM, Rui Machado wrote: >>>> >>>>> >>>>> 2008/9/17 Dotan Barak : >>>>> >>>>>> >>>>>> On Wed, Sep 17, 2008 at 5:44 PM, Rui Machado >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> 2008/9/17 Dotan Barak : >>>>>>> >>>>>>>> >>>>>>>> On Wed, Sep 17, 2008 at 5:28 PM, Rui Machado >>>>>>>> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> Hey Dotan, >>>>>>>>> >>>>>>>>> 2008/9/17 Dotan Barak : >>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Sep 17, 2008 at 5:12 PM, Rui Machado >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hi list, >>>>>>>>>>> >>>>>>>>>>> does anyone have experienced problems using IB atomic operations >>>>>>>>>>> (fetch and add) on a ppc64 platform? 
>>>>>>>>>>> I tried a small example (using fetch and add) on x86 and ppc64 >>>>>>>>>>> and on >>>>>>>>>>> x86 worked fine while on ppc64 didn't. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Do you handle the ntoh/hton or do you let the driver/HCA deal with >>>>>>>>>> it by itself? >>>>>>>>>> >>>>>>>>> >>>>>>>>> Nop, I don't use those. I guess then I'm letting the driver/HCA >>>>>>>>> deal with it.... >>>>>>>>> >>>>>>>> >>>>>>>> Do you see endianess issues or completely corrupted data? >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> Just to make it clear (to me :) ). I'm talking about ppc64<-->ppc64 >>>>>>> communication. >>>>>>> Should I still concern with converting data because of endianess? >>>>>>> What happens is that I ask for a fetch and add and it doesn't happen. >>>>>>> The value on the server doesn't get modified. >>>>>>> >>>>>> >>>>>> This is a weird behaviour indeed .. >>>>>> >>>>>> Can you post the code in your program that fill the SR? >>>>>> >>>>>> Dotan >>>>>> >>>>>> >>>>> >>>>> Not sure what do you mean by SR. >>>>> Here's is the function inc() which I call to increment 1 one the >>>>> remote machine. The remote machine has its buffer full of zeroes. >>>>> That's what the client gets all the time although I increment 3 times >>>>> in a row (with a sleep in between) >>>>> >>>>> Is this enough? 
>>>>> Thanks for the help >>>>> >>>>> void inc() >>>>> { >>>>> >>>>> struct ibv_qp_attr check_attr; >>>>> struct ibv_qp_init_attr check_init_attr; >>>>> >>>>> void *ev_ctx; >>>>> >>>>> struct ibv_send_wr *bad_wr; >>>>> struct ibv_wc wc; >>>>> struct ibv_sge slist; >>>>> struct ibv_send_wr swr3; >>>>> >>>>> >>>>> slist.addr = (uintptr_t)buffer; >>>>> slist.length = 8; >>>>> slist.lkey =mr->lkey; >>>>> >>>>> swr3.wr.atomic.remote_addr = remote_node->mi.bufAddr; >>>>> swr3.wr.atomic.rkey = remote_node->mi.buf_rkey; >>>>> swr3.wr.atomic.compare_add = 1; >>>>> >>>>> swr3.wr_id = 1; >>>>> swr3.sg_list = &slist; >>>>> swr3.num_sge = 1; >>>>> swr3.opcode = IBV_WR_ATOMIC_FETCH_AND_ADD; >>>>> swr3.send_flags = IBV_SEND_SIGNALED; >>>>> swr3.next = NULL; >>>>> >>>>> >>>>> if(ibv_post_send(qp,&swr3,&bad_wr)){ >>>>> printf("Couldn't post send...\n"); >>>>> return 0; >>>>> } >>>>> >>>>> >>>>> int ne=0; >>>>> do{ >>>>> ne = ibv_poll_cq(cq,1,&wc); >>>>> }while(ne==0); >>>>> >>>>> if((ne < 0) || (wc.status != IBV_WC_SUCCESS)){ >>>>> >>>>> //check qp status >>>>> >>>>> if(!ibv_query_qp(qp,&check_attr,IBV_QP_STATE,&check_init_attr)) >>>>> printf("The qp state is: %d\n >>>>> ",check_attr.qp_state); >>>>> >>>>> } >>>>> } >>>>> >>>>> >>>> >>>> The code looks good and it should work... >>>> (I would have memset every structure before using it ..) >>>> >>>> >>>> Did you check the memory in the sender side or in the reciver side? >>>> >>>> >>> >>> As I mentioned it does work on x86. >>> >>> Actually on both: >>> >>> server: >>> Initial counter at buffer is 0 >>> counter at buffer is 0 >>> counter at buffer is 0 >>> counter at buffer is 0 >>> counter at buffer is 0 >>> counter at buffer is 0 >>> counter at buffer is 0 >>> counter at buffer is 0 >>> >>> >>> client: >>> initial IB atomic counter 0 >>> IB atomic counter 0 >>> IB atomic counter 0 >>> IB atomic counter 0 >>> >>> What could this be related to? Driver, HW? >>> >>> >> >> Anyone with some insight on this? 
>> Maybe how can I debug this further? >> > > Bugs can be anywhere: application / Driver / HW ... > > Can you try to use server in x86 and client in PPC64 and then server in > PPC64 and client in x86? > > Which OFED version do you use? > Can you send the output of ibv_devinfo? > > Dotan > I tried the combination ppc64-x86 and x86-ppc64. The result was a hang on the client side on poll (see code above) on both cases. Could endianess be playing a role here? This is the first time I try to put different architectures to communicate. I have ofed 1.2.5.5 ibv_devinfo (x86 machine used for the ppc64-x86 communication) hca_id: mthca0 fw_ver: 1.2.0 node_guid: 0002:c902:0021:b820 sys_image_guid: 0002:c902:0021:b823 vendor_id: 0x02c9 vendor_part_id: 25204 hw_ver: 0xA0 board_id: MT_03B0110001 phys_port_cnt: 1 port: 1 state: PORT_ACTIVE (4) max_mtu: 2048 (4) active_mtu: 2048 (4) sm_lid: 142 port_lid: 134 port_lmc: 0x01 ibv_devinfo (ppc64) hca_id: mlx4_0 fw_ver: 2.3.000 node_guid: 0002:c903:0000:9334 sys_image_guid: 0002:c903:0000:9337 vendor_id: 0x02c9 vendor_part_id: 25418 hw_ver: 0xA0 board_id: IBM08A0000001 phys_port_cnt: 2 port: 1 state: PORT_ACTIVE (4) max_mtu: 2048 (4) active_mtu: 2048 (4) sm_lid: 142 port_lid: 68 port_lmc: 0x01 port: 2 state: PORT_DOWN (1) max_mtu: 2048 (4) active_mtu: 2048 (4) sm_lid: 0 port_lid: 0 port_lmc: 0x00 Cheers, From dotanba at gmail.com Fri Sep 19 07:21:03 2008 From: dotanba at gmail.com (Dotan Barak) Date: Fri, 19 Sep 2008 16:21:03 +0200 Subject: ***SPAM*** Re: [ofa-general] atomic operations on ppc64 In-Reply-To: <6978b4af0809190616s7f23d5f8uc9c13b8be38fece1@mail.gmail.com> References: <6978b4af0809170712j6991bf5ek8ea24adcb2533dfc@mail.gmail.com> <2f3bf9a60809170736id27d1f1s7ee8bd207392e368@mail.gmail.com> <6978b4af0809170744o13b196e5i8ec73303542176ba@mail.gmail.com> <2f3bf9a60809170748m4c9a1ca4t3bf5cdc7bc51a2f3@mail.gmail.com> <6978b4af0809170754w2e9dbd96j58b216f0340b4f66@mail.gmail.com> 
<2f3bf9a60809170757s5db88243sefa38eeebc412968@mail.gmail.com> <6978b4af0809170807x4e63c41bsb679b6eedd123626@mail.gmail.com> <6978b4af0809171014v468db89ew3534def3d5b6a303@mail.gmail.com> <6978b4af0809180818h1aea3cqb44ab357f41a3c2a@mail.gmail.com> <48D303C1.8090308@gmail.com> <6978b4af0809190616s7f23d5f8uc9c13b8be38fece1@mail.gmail.com> Message-ID: <48D3B54F.3070703@gmail.com> >> Bugs can be anywhere: application / Driver / HW ... >> >> Can you try to use server in x86 and client in PPC64 and then server in >> PPC64 and client in x86? >> >> Which OFED version do you use? >> Can you send the output of ibv_devinfo? >> >> Dotan >> >> > > I tried the combination ppc64-x86 and x86-ppc64. > The result was a hang on the client side on poll (see code above) on both cases. > Could endianess be playing a role here? This is the first time I try > to put different architectures to communicate. > > I have ofed 1.2.5.5 > > ibv_devinfo (x86 machine used for the ppc64-x86 communication) > > hca_id: mthca0 > fw_ver: 1.2.0 > node_guid: 0002:c902:0021:b820 > sys_image_guid: 0002:c902:0021:b823 > vendor_id: 0x02c9 > vendor_part_id: 25204 > hw_ver: 0xA0 > board_id: MT_03B0110001 > phys_port_cnt: 1 > port: 1 > state: PORT_ACTIVE (4) > max_mtu: 2048 (4) > active_mtu: 2048 (4) > sm_lid: 142 > port_lid: 134 > port_lmc: 0x01 > > ibv_devinfo (ppc64) > > hca_id: mlx4_0 > fw_ver: 2.3.000 > node_guid: 0002:c903:0000:9334 > sys_image_guid: 0002:c903:0000:9337 > vendor_id: 0x02c9 > vendor_part_id: 25418 > hw_ver: 0xA0 > board_id: IBM08A0000001 > phys_port_cnt: 2 > port: 1 > state: PORT_ACTIVE (4) > max_mtu: 2048 (4) > active_mtu: 2048 (4) > sm_lid: 142 > port_lid: 68 > port_lmc: 0x01 > > port: 2 > state: PORT_DOWN (1) > max_mtu: 2048 (4) > active_mtu: 2048 (4) > sm_lid: 0 > port_lid: 0 > port_lmc: 0x00 > > > Cheers, > It seems that you didn't check the same HCA in each arch. Can you try to use the mthca device in PPC64 and check the results? 
(Anyway, i would have suggest you to upgrade the OFED package that you are using) Dotan From ruimario at gmail.com Fri Sep 19 06:47:40 2008 From: ruimario at gmail.com (Rui Machado) Date: Fri, 19 Sep 2008 15:47:40 +0200 Subject: ***SPAM*** Re: [ofa-general] atomic operations on ppc64 In-Reply-To: <48D3B54F.3070703@gmail.com> References: <6978b4af0809170712j6991bf5ek8ea24adcb2533dfc@mail.gmail.com> <2f3bf9a60809170748m4c9a1ca4t3bf5cdc7bc51a2f3@mail.gmail.com> <6978b4af0809170754w2e9dbd96j58b216f0340b4f66@mail.gmail.com> <2f3bf9a60809170757s5db88243sefa38eeebc412968@mail.gmail.com> <6978b4af0809170807x4e63c41bsb679b6eedd123626@mail.gmail.com> <6978b4af0809171014v468db89ew3534def3d5b6a303@mail.gmail.com> <6978b4af0809180818h1aea3cqb44ab357f41a3c2a@mail.gmail.com> <48D303C1.8090308@gmail.com> <6978b4af0809190616s7f23d5f8uc9c13b8be38fece1@mail.gmail.com> <48D3B54F.3070703@gmail.com> Message-ID: <6978b4af0809190647q3feeef90jae0aecf45474d4d2@mail.gmail.com> 2008/9/19 Dotan Barak : > >>> Bugs can be anywhere: application / Driver / HW ... >>> >>> Can you try to use server in x86 and client in PPC64 and then server in >>> PPC64 and client in x86? >>> >>> Which OFED version do you use? >>> Can you send the output of ibv_devinfo? >>> >>> Dotan >>> >>> >> >> I tried the combination ppc64-x86 and x86-ppc64. >> The result was a hang on the client side on poll (see code above) on both >> cases. >> Could endianess be playing a role here? This is the first time I try >> to put different architectures to communicate. 
>> >> I have ofed 1.2.5.5 >> >> ibv_devinfo (x86 machine used for the ppc64-x86 communication) >> >> hca_id: mthca0 >> fw_ver: 1.2.0 >> node_guid: 0002:c902:0021:b820 >> sys_image_guid: 0002:c902:0021:b823 >> vendor_id: 0x02c9 >> vendor_part_id: 25204 >> hw_ver: 0xA0 >> board_id: MT_03B0110001 >> phys_port_cnt: 1 >> port: 1 >> state: PORT_ACTIVE (4) >> max_mtu: 2048 (4) >> active_mtu: 2048 (4) >> sm_lid: 142 >> port_lid: 134 >> port_lmc: 0x01 >> >> ibv_devinfo (ppc64) >> >> hca_id: mlx4_0 >> fw_ver: 2.3.000 >> node_guid: 0002:c903:0000:9334 >> sys_image_guid: 0002:c903:0000:9337 >> vendor_id: 0x02c9 >> vendor_part_id: 25418 >> hw_ver: 0xA0 >> board_id: IBM08A0000001 >> phys_port_cnt: 2 >> port: 1 >> state: PORT_ACTIVE (4) >> max_mtu: 2048 (4) >> active_mtu: 2048 (4) >> sm_lid: 142 >> port_lid: 68 >> port_lmc: 0x01 >> >> port: 2 >> state: PORT_DOWN (1) >> max_mtu: 2048 (4) >> active_mtu: 2048 (4) >> sm_lid: 0 >> port_lid: 0 >> port_lmc: 0x00 >> >> >> Cheers, >> > > It seems that you didn't check the same HCA in each arch. > > Can you try to use the mthca device in PPC64 and check the results? > > (Anyway, i would have suggest you to upgrade the OFED package that you are > using) I apologize for my newbieness here but I do not understand what you mean. The two machines have different devices. What do you mean by making the ppc64 machine use the mthca when it does only have the mlx4? 
Unfortunately, upgrading might not be an (easy) option :-/ Thanks for the patience From dotanba at gmail.com Fri Sep 19 07:57:52 2008 From: dotanba at gmail.com (Dotan Barak) Date: Fri, 19 Sep 2008 16:57:52 +0200 Subject: ***SPAM*** Re: [ofa-general] atomic operations on ppc64 In-Reply-To: <6978b4af0809190647q3feeef90jae0aecf45474d4d2@mail.gmail.com> References: <6978b4af0809170712j6991bf5ek8ea24adcb2533dfc@mail.gmail.com> <2f3bf9a60809170748m4c9a1ca4t3bf5cdc7bc51a2f3@mail.gmail.com> <6978b4af0809170754w2e9dbd96j58b216f0340b4f66@mail.gmail.com> <2f3bf9a60809170757s5db88243sefa38eeebc412968@mail.gmail.com> <6978b4af0809170807x4e63c41bsb679b6eedd123626@mail.gmail.com> <6978b4af0809171014v468db89ew3534def3d5b6a303@mail.gmail.com> <6978b4af0809180818h1aea3cqb44ab357f41a3c2a@mail.gmail.com> <48D303C1.8090308@gmail.com> <6978b4af0809190616s7f23d5f8uc9c13b8be38fece1@mail.gmail.com> <48D3B54F.3070703@gmail.com> <6978b4af0809190647q3feeef90jae0aecf45474d4d2@mail.gmail.com> Message-ID: <48D3BDF0.1090103@gmail.com> >> It seems that you didn't check the same HCA in each arch. >> >> Can you try to use the mthca device in PPC64 and check the results? >> >> (Anyway, i would have suggest you to upgrade the OFED package that you are >> using) >> > > I apologize for my newbieness here but I do not understand what you mean. > The two machines have different devices. What do you mean by making > the ppc64 machine use the mthca when it does only have the mlx4? > > Unfortunately, upgrading might not be an (easy) option :-/ > > Thanks for the patience > You have 2 machines, each with another device: x86 has the HCA 25204 PPC64 has the HCA 25418 The fact that the test didn't pass for you on the PPC64 may be related to this device (i think that OFED 1.2.5 was the first version that supported this device). So, i suggest that you'll put the HCA 25204 on the PPC64 and check if the failure still exists. 
Dotan From ruimario at gmail.com Fri Sep 19 07:39:04 2008 From: ruimario at gmail.com (Rui Machado) Date: Fri, 19 Sep 2008 16:39:04 +0200 Subject: [ofa-general] atomic operations on ppc64 In-Reply-To: <48D3BDF0.1090103@gmail.com> References: <6978b4af0809170712j6991bf5ek8ea24adcb2533dfc@mail.gmail.com> <2f3bf9a60809170757s5db88243sefa38eeebc412968@mail.gmail.com> <6978b4af0809170807x4e63c41bsb679b6eedd123626@mail.gmail.com> <6978b4af0809171014v468db89ew3534def3d5b6a303@mail.gmail.com> <6978b4af0809180818h1aea3cqb44ab357f41a3c2a@mail.gmail.com> <48D303C1.8090308@gmail.com> <6978b4af0809190616s7f23d5f8uc9c13b8be38fece1@mail.gmail.com> <48D3B54F.3070703@gmail.com> <6978b4af0809190647q3feeef90jae0aecf45474d4d2@mail.gmail.com> <48D3BDF0.1090103@gmail.com> Message-ID: <6978b4af0809190739l155f81b7q9668e17406f83eca@mail.gmail.com> 2008/9/19 Dotan Barak : > >>> It seems that you didn't check the same HCA in each arch. >>> >>> Can you try to use the mthca device in PPC64 and check the results? >>> >>> (Anyway, i would have suggest you to upgrade the OFED package that you >>> are >>> using) >>> >> >> I apologize for my newbieness here but I do not understand what you mean. >> The two machines have different devices. What do you mean by making >> the ppc64 machine use the mthca when it does only have the mlx4? >> >> Unfortunately, upgrading might not be an (easy) option :-/ >> >> Thanks for the patience >> > > You have 2 machines, each with another device: > x86 has the HCA 25204 > PPC64 has the HCA 25418 > > > The fact that the test didn't pass for you on the PPC64 may be related to > this device > (i think that OFED 1.2.5 was the first version that supported this device). > > So, i suggest that you'll put the HCA 25204 on the PPC64 and check if the > failure still exists. > > > Dotan > Right :) Now I get it! I couldn't (yet at least) put the HCA 25204 on the ppc64. But I tried again with another ppc64 machine. 
ibv_devinfo hca_id: mthca0 fw_ver: 4.8.200 node_guid: 0005:ad00:001d:cd24 sys_image_guid: 0005:ad00:0100:d050 vendor_id: 0x05ad vendor_part_id: 25208 hw_ver: 0xA0 board_id: HCA.HSDC.A0 phys_port_cnt: 2 port: 1 state: PORT_ACTIVE (4) max_mtu: 2048 (4) active_mtu: 2048 (4) sm_lid: 142 port_lid: 156 port_lmc: 0x01 port: 2 state: PORT_DOWN (1) max_mtu: 2048 (4) active_mtu: 512 (2) sm_lid: 0 port_lid: 0 port_lmc: 0x00 The result was still the same as before: ppc64-ppc64 - the counter doesn't get updated. ppc64-x86 - client hangs at poll Just as extra info... From haven.hash at isilon.com Thu Sep 18 18:49:35 2008 From: haven.hash at isilon.com (Haven Hash) Date: Thu, 18 Sep 2008 18:49:35 -0700 Subject: [ofa-general] [PATCH][TRIVIAL]mad.c: Need parens to kmalloc correct amount of memory Message-ID: <1221788975.5804.49.camel@hhash-dev> I assume this has never been a problem because the malloc will probably word align the allocation, but maybe it was desired? Potential patch attached. Haven Hash haven.hash at isilon.com- -------------- next part -------------- A non-text attachment was scrubbed... Name: mad.c.diff Type: text/x-patch Size: 703 bytes Desc: not available URL: From sashak at voltaire.com Fri Sep 19 10:06:22 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 19 Sep 2008 20:06:22 +0300 Subject: [ofa-general] Intermittent: ib0: multicast join failed In-Reply-To: <2C7DE72B9BD00F44BAECA5B0CBB873953217F5@hermes.terascala.com> References: <2C7DE72B9BD00F44BAECA5B0CBB873953217F5@hermes.terascala.com> Message-ID: <20080919170622.GI27236@sashak.voltaire.com> On 16:45 Thu 18 Sep , Roger Spellman wrote: > I have many nodes, each with a Mellanox MT25204. When I reboot some > nodes, they occasionally get the following error: > > ib0: multicast join failed What is the software stack? Which version? > Rebooting the system almost always solves this problem. > > What causes this? Which SM are you using?
If it is OpenSM you can see in the log (/var/log/opensm.log) why the join failed. > Is there a way to solve this without rebooting? Hard to say - the reason for failure is unknown. It could be the port's low speed/width or something else, hard to say without any details. Sasha From sashak at voltaire.com Fri Sep 19 10:36:21 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 19 Sep 2008 20:36:21 +0300 Subject: [ofa-general] Re: [PATCH] ibsim: handle EAGAIN error In-Reply-To: <48D36CA7.2050201@bull.net> References: <48D36CA7.2050201@bull.net> Message-ID: <20080919173621.GJ27236@sashak.voltaire.com> Hi Vincent, On 11:11 Fri 19 Sep , Vincent Ficet wrote: > > Following the issue I raised yesterday ([ofa-general] ibsim: sim_read_pkt: > write failed: Resource temporarily unavailable - pkt dropped), I wrote a > small patch that fixes the issue (in attachment). > All my tests are now running fine with this patch. Please let me know if > you have any comments ;-) The patch looks good to me. Just remember next time to add a 'Signed-off-by' line. Applied. Thanks. Sasha From roger at terascala.com Fri Sep 19 11:28:00 2008 From: roger at terascala.com (Roger Spellman) Date: Fri, 19 Sep 2008 14:28:00 -0400 Subject: [ofa-general] Intermittent: ib0: multicast join failed References: <2C7DE72B9BD00F44BAECA5B0CBB873953217F5@hermes.terascala.com> <20080919170622.GI27236@sashak.voltaire.com> Message-ID: <2C7DE72B9BD00F44BAECA5B0CBB87395321855@hermes.terascala.com> Sasha, I am running OFED 1.3.1. My SN Manager is opensmd.
/var/log/opensm.log shows the following: Sep 19 14:21:19 480217 [43806960] 0x02 -> SUBNET UP Sep 19 14:21:19 818276 [41001960] 0x01 -> __osm_trap_rcv_process_request: Received Generic Notice type:0x04 num:144 Producer:1 (Channel Adapter) from LID:0x0011 TID:0x0000000000000000 Sep 19 14:21:19 818330 [41001960] 0x02 -> osm_report_notice: Reporting Generic Notice type:4 num:144 from LID:0x0011 GID:0xfe80000000000000,0x0002c9020027d451 Sep 19 14:21:19 823408 [43806960] 0x02 -> osm_ucast_mgr_process: minhop tables configured on all switches Sep 19 14:21:19 827220 [43806960] 0x02 -> SUBNET UP Sep 19 14:21:27 283873 [41802960] 0x01 -> __osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed from port 0x0002c9020026e4c1 ( HCA-1), sending IB_SA_MAD_STATUS_REQ_INVALID Sep 19 14:21:43 298367 [42804960] 0x01 -> __osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed from port 0x0002c9020026e4c1 ( HCA-1), sending IB_SA_MAD_STATUS_REQ_INVALID Sep 19 14:21:59 312765 [42003960] 0x01 -> __osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed from port 0x0002c9020026e4c1 ( HCA-1), sending IB_SA_MAD_STATUS_REQ_INVALID Rebooting the node that failed to join the group always seems to solve the problem. Thanks for your help. -Roger > -----Original Message----- > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > Sent: Friday, September 19, 2008 1:06 PM > To: Roger Spellman > Cc: general at lists.openfabrics.org > Subject: Re: [ofa-general] Intermittent: ib0: multicast join failed > > On 16:45 Thu 18 Sep , Roger Spellman wrote: > > I have many nodes, each with a Mellanox MT25204. When I reboot some > > nodes, they occasionally get the following error: > > > > ib0: multicast join failed > > What is the software stack? Which version? > > > Rebooting the system almost always solves this problem. > > > > What causes this? 
> > What are SM you using? If it is OpenSM you can see in the log > (/vat/log/opensm.log) why the join failed. > > > Is there a way to solve this without rebooting? > > Hard to say - the reason for failure is unknown. I could be port's low > speed/width or something else, hard to say without any details. > > Sasha From hal.rosenstock at gmail.com Fri Sep 19 11:33:08 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Fri, 19 Sep 2008 14:33:08 -0400 Subject: ***SPAM*** Re: [ofa-general] Intermittent: ib0: multicast join failed In-Reply-To: <2C7DE72B9BD00F44BAECA5B0CBB87395321855@hermes.terascala.com> References: <2C7DE72B9BD00F44BAECA5B0CBB873953217F5@hermes.terascala.com> <20080919170622.GI27236@sashak.voltaire.com> <2C7DE72B9BD00F44BAECA5B0CBB87395321855@hermes.terascala.com> Message-ID: On Fri, Sep 19, 2008 at 2:28 PM, Roger Spellman wrote: > Sasha, > I am running OFED 1.3.1. > > My SN Manager is opensmd. /var/log/opensm.log shows the following: > > Sep 19 14:21:19 480217 [43806960] 0x02 -> SUBNET UP > Sep 19 14:21:19 818276 [41001960] 0x01 -> > __osm_trap_rcv_process_request: Received Generic Notice type:0x04 > num:144 Producer:1 (Channel Adapter) from LID:0x0011 > TID:0x0000000000000000 > Sep 19 14:21:19 818330 [41001960] 0x02 -> osm_report_notice: Reporting > Generic Notice type:4 num:144 from LID:0x0011 > GID:0xfe80000000000000,0x0002c9020027d451 > Sep 19 14:21:19 823408 [43806960] 0x02 -> osm_ucast_mgr_process: minhop > tables configured on all switches > Sep 19 14:21:19 827220 [43806960] 0x02 -> SUBNET UP > Sep 19 14:21:27 283873 [41802960] 0x01 -> __osm_mcmr_rcv_join_mgrp: ERR > 1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState = > 0 failed from port 0x0002c9020026e4c1 ( HCA-1), sending > IB_SA_MAD_STATUS_REQ_INVALID > Sep 19 14:21:43 298367 [42804960] 0x01 -> __osm_mcmr_rcv_join_mgrp: ERR > 1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState = > 0 failed from port 0x0002c9020026e4c1 ( HCA-1), sending > 
IB_SA_MAD_STATUS_REQ_INVALID > Sep 19 14:21:59 312765 [42003960] 0x01 -> __osm_mcmr_rcv_join_mgrp: ERR > 1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState = > 0 failed from port 0x0002c9020026e4c1 ( HCA-1), sending > IB_SA_MAD_STATUS_REQ_INVALID It's likely a rate issue where the negotiated port rate is not the broadcast group rate. What does ibstat or ibstatus show when the join fails ? Also, what about saquery -g ? > > Rebooting the node that failed to join the group always seems to solve > the problem. Yes, that's consistent with the negotiated rate being a problem. -- Hal > Thanks for your help. > > -Roger > >> -----Original Message----- >> From: Sasha Khapyorsky [mailto:sashak at voltaire.com] >> Sent: Friday, September 19, 2008 1:06 PM >> To: Roger Spellman >> Cc: general at lists.openfabrics.org >> Subject: Re: [ofa-general] Intermittent: ib0: multicast join failed >> >> On 16:45 Thu 18 Sep , Roger Spellman wrote: >> > I have many nodes, each with a Mellanox MT25204. When I reboot some >> > nodes, they occasionally get the following error: >> > >> > ib0: multicast join failed >> >> What is the software stack? Which version? >> >> > Rebooting the system almost always solves this problem. >> > >> > What causes this? >> >> What are SM you using? If it is OpenSM you can see in the log >> (/vat/log/opensm.log) why the join failed. >> >> > Is there a way to solve this without rebooting? >> >> Hard to say - the reason for failure is unknown. I could be port's low >> speed/width or something else, hard to say without any details. 
>> >> Sasha > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From sashak at voltaire.com Fri Sep 19 13:42:40 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 19 Sep 2008 23:42:40 +0300 Subject: [ofa-general] [PATCH] opensm: consolidate mgrp_send_notice calls In-Reply-To: <20080917183258.GA25831@sashak.voltaire.com> References: <20080917183258.GA25831@sashak.voltaire.com> Message-ID: <20080919204240.GO27236@sashak.voltaire.com> Now when all MCG creation/deletion notification calls are consolidated in one place it is easy to cleanup the code - make it static, drop extra call level. Signed-off-by: Sasha Khapyorsky --- opensm/include/opensm/osm_multicast.h | 64 -------------------------- opensm/opensm/osm_multicast.c | 80 +++++++++++++------------------- 2 files changed, 33 insertions(+), 111 deletions(-) diff --git a/opensm/include/opensm/osm_multicast.h b/opensm/include/opensm/osm_multicast.h index a0eab16..bbb7070 100644 --- a/opensm/include/opensm/osm_multicast.h +++ b/opensm/include/opensm/osm_multicast.h @@ -499,69 +499,5 @@ osm_mgrp_apply_func(const osm_mgrp_t * const p_mgrp, * Multicast Group *********/ -/****f* OpenSM: Multicast Group/osm_mgrp_send_delete_notice -* NAME -* osm_mgrp_send_delete_notice -* -* DESCRIPTION -* Sends a notice that the given multicast group is now deleted. -* -* SYNOPSIS -*/ -void -osm_mgrp_send_delete_notice(IN osm_subn_t * const p_subn, - IN osm_log_t * const p_log, IN osm_mgrp_t * p_mgrp); -/* -* PARAMETERS -* p_subn -* Pointer to the Subnet object for this subnet. -* -* p_log -* Pointer to the log object. -* -* p_mgrp -* [in] Pointer to an osm_mgrp_t object. -* -* RETURN VALUES -* None. 
-* -* NOTES -* -* SEE ALSO -* Multicast Group -*********/ - -/****f* OpenSM: Multicast Group/osm_mgrp_send_create_notice -* NAME -* osm_mgrp_send_create_notice -* -* DESCRIPTION -* Sends a notice that the given multicast group is now created. -* -* SYNOPSIS -*/ -void -osm_mgrp_send_create_notice(IN osm_subn_t * const p_subn, - IN osm_log_t * const p_log, IN osm_mgrp_t * p_mgrp); -/* -* PARAMETERS -* p_subn -* Pointer to the Subnet object for this subnet. -* -* p_log -* Pointer to the log object. -* -* p_mgrp -* [in] Pointer to an osm_mgrp_t object. -* -* RETURN VALUES -* None. -* -* NOTES -* -* SEE ALSO -* Multicast Group -*********/ - END_C_DECLS #endif /* _OSM_MULTICAST_H_ */ diff --git a/opensm/opensm/osm_multicast.c b/opensm/opensm/osm_multicast.c index 83c0399..625ca6e 100644 --- a/opensm/opensm/osm_multicast.c +++ b/opensm/opensm/osm_multicast.c @@ -95,6 +95,35 @@ osm_mgrp_t *osm_mgrp_new(IN const ib_net16_t mlid) /********************************************************************** **********************************************************************/ +static void mgrp_send_notice(osm_subn_t *subn, osm_log_t *log, + osm_mgrp_t *mgrp, unsigned num) +{ + ib_mad_notice_attr_t notice; + ib_api_status_t status; + + notice.generic_type = 0x83; /* generic SubnMgt type */ + ib_notice_set_prod_type_ho(&notice, 4); /* A Class Manager generator */ + notice.g_or_v.generic.trap_num = CL_HTON16(num); + /* The sm_base_lid is saved in network order already. */ + notice.issuer_lid = subn->sm_base_lid; + /* following o14-12.1.11 and table 120 p726 */ + /* we need to provide the MGID */ + memcpy(&notice.data_details.ntc_64_67.gid, + &mgrp->mcmember_rec.mgid, sizeof(ib_gid_t)); + + /* According to page 653 - the issuer gid in this case of trap + is the SM gid, since the SM is the initiator of this trap.
*/ + notice.issuer_gid.unicast.prefix = subn->opt.subnet_prefix; + notice.issuer_gid.unicast.interface_id = subn->sm_port_guid; + + if ((status = osm_report_notice(log, subn, &notice))) + OSM_LOG(log, OSM_LOG_ERROR, "ERR 7601: " + "Error sending trap reports (%s)\n", + ib_get_err_str(status)); +} + +/********************************************************************** + **********************************************************************/ osm_mcm_port_t *osm_mgrp_add_port(IN osm_subn_t *subn, osm_log_t *log, IN osm_mgrp_t * const p_mgrp, IN const ib_gid_t * const p_port_gid, @@ -146,11 +175,11 @@ osm_mcm_port_t *osm_mgrp_add_port(IN osm_subn_t *subn, osm_log_t *log, if ((join_state ^ prev_join_state) & IB_JOIN_STATE_FULL) { if (join_state & IB_JOIN_STATE_FULL) { if (++p_mgrp->full_members == 1) { - osm_mgrp_send_create_notice(subn, log, p_mgrp); + mgrp_send_notice(subn, log, p_mgrp, 66); p_mgrp->to_be_deleted = 0; } } else if (--p_mgrp->full_members == 0) { - osm_mgrp_send_delete_notice(subn, log, p_mgrp); + mgrp_send_notice(subn, log, p_mgrp, 67); if (!p_mgrp->well_known) p_mgrp->to_be_deleted = 1; } @@ -198,12 +227,12 @@ int osm_mgrp_remove_port(osm_subn_t *subn, osm_log_t *log, osm_mgrp_t *mgrp, if ((port_join_state ^ new_join_state) & IB_JOIN_STATE_FULL) { if (port_join_state & IB_JOIN_STATE_FULL) { if (--mgrp->full_members == 0) { - osm_mgrp_send_delete_notice(subn, log, mgrp); + mgrp_send_notice(subn, log, mgrp, 67); if (!mgrp->well_known) mgrp->to_be_deleted = 1; } } else if (++mgrp->full_members == 1) { - osm_mgrp_send_create_notice(subn, log, mgrp); + mgrp_send_notice(subn, log, mgrp, 66); mgrp->to_be_deleted = 0; } } @@ -282,46 +311,3 @@ osm_mgrp_apply_func(const osm_mgrp_t * const p_mgrp, if (p_mtn) __osm_mgrp_apply_func_sub(p_mgrp, p_mtn, p_func, context); } - -/********************************************************************** - **********************************************************************/ -static void mgrp_send_notice(osm_subn_t *subn,
osm_log_t *log, - osm_mgrp_t *mgrp, unsigned num) -{ - ib_mad_notice_attr_t notice; - ib_api_status_t status; - - notice.generic_type = 0x83; /* generic SubnMgt type */ - ib_notice_set_prod_type_ho(&notice, 4); /* A Class Manager generator */ - notice.g_or_v.generic.trap_num = CL_HTON16(num); - /* The sm_base_lid is saved in network order already. */ - notice.issuer_lid = subn->sm_base_lid; - /* following o14-12.1.11 and table 120 p726 */ - /* we need to provide the MGID */ - memcpy(&notice.data_details.ntc_64_67.gid, - &mgrp->mcmember_rec.mgid, sizeof(ib_gid_t)); - - /* According to page 653 - the issuer gid in this case of trap - is the SM gid, since the SM is the initiator of this trap. */ - notice.issuer_gid.unicast.prefix = subn->opt.subnet_prefix; - notice.issuer_gid.unicast.interface_id = subn->sm_port_guid; - - if ((status = osm_report_notice(log, subn, &notice))) - OSM_LOG(log, OSM_LOG_ERROR, "ERR 7601: " - "Error sending trap reports (%s)\n", - ib_get_err_str(status)); -} - -void -osm_mgrp_send_delete_notice(IN osm_subn_t * const p_subn, - IN osm_log_t * const p_log, IN osm_mgrp_t * p_mgrp) -{ - mgrp_send_notice(p_subn, p_log, p_mgrp, 67); -} - -void -osm_mgrp_send_create_notice(IN osm_subn_t * const p_subn, - IN osm_log_t * const p_log, IN osm_mgrp_t * p_mgrp) -{ - mgrp_send_notice(p_subn, p_log, p_mgrp, 66); -} -- 1.6.0.2.287.g3791f From yannick.cote at qlogic.com Fri Sep 19 16:03:03 2008 From: yannick.cote at qlogic.com (yannick.cote at qlogic.com) Date: Fri, 19 Sep 2008 16:03:03 -0700 Subject: [ofa-general] [PATCH] IB/ipath: ib_ipath module hangs on unload Message-ID: <1221865383-29438-1-git-send-email-yannick.cote@qlogic.com> This fix handles the case where posting a send is requested when the link is down. This addresses bug #1117. Signed-off-by: Yannick Cote --- Roland, Please consider this for 2.6.27.
drivers/infiniband/hw/ipath/ipath_verbs.c | 7 +++++++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c index b766e40..eabc424 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.c +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c @@ -340,9 +340,16 @@ static int ipath_post_one_send(struct ipath_qp *qp, struct ib_send_wr *wr) int acc; int ret; unsigned long flags; + struct ipath_devdata *dd = to_idev(qp->ibqp.device)->dd; spin_lock_irqsave(&qp->s_lock, flags); + if (qp->ibqp.qp_type != IB_QPT_SMI && + !(dd->ipath_flags & IPATH_LINKACTIVE)) { + ret = -ENETDOWN; + goto bail; + } + /* Check that state is OK to post send. */ if (unlikely(!(ib_ipath_state_ops[qp->state] & IPATH_POST_SEND_OK))) goto bail_inval; -- 1.6.0.2 From rdreier at cisco.com Fri Sep 19 16:26:39 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 19 Sep 2008 16:26:39 -0700 Subject: [ofa-general] Re: [PATCH] IB/ipath: ib_ipath module hangs on unload In-Reply-To: <1221865383-29438-1-git-send-email-yannick.cote@qlogic.com> (yannick cote's message of "Fri, 19 Sep 2008 16:03:03 -0700") References: <1221865383-29438-1-git-send-email-yannick.cote@qlogic.com> Message-ID: > Please consider this for 2.6.27. Is this a regression from 2.6.26? It doesn't seem so to me. My first reaction is that this problem is not severe enough to merit trying to get the patch into 2.6.27. - R. 
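[Editorial note on the IB/ipath thread above] The guard that the patch adds at the top of ipath_post_one_send() can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the driver code: mock_qp, mock_dev, MOCK_LINKACTIVE and post_one_send() are hypothetical stand-ins for the kernel's ipath_qp, ipath_devdata, IPATH_LINKACTIVE and ipath_post_one_send(), and the flag value is made up. Only the control flow, refusing non-SMI sends with -ENETDOWN while the link is not active, is taken from the posted diff.

```c
#include <errno.h>

/* Hypothetical stand-ins for the kernel types; not the real ipath definitions. */
enum mock_qp_type { MOCK_QPT_SMI, MOCK_QPT_GSI, MOCK_QPT_RC };

#define MOCK_LINKACTIVE 0x1u /* assumed flag bit, value chosen for this sketch */

struct mock_dev {
    unsigned int flags;      /* plays the role of dd->ipath_flags */
};

struct mock_qp {
    enum mock_qp_type type;  /* plays the role of qp->ibqp.qp_type */
    struct mock_dev *dev;
};

/*
 * Same shape as the check in the patch: every QP except the SMI QP is
 * refused with -ENETDOWN while the link is not active. The SMI QP is
 * exempt in the patch, presumably so that subnet management packets can
 * still be sent while the link is being brought up.
 */
int post_one_send(const struct mock_qp *qp)
{
    if (qp->type != MOCK_QPT_SMI && !(qp->dev->flags & MOCK_LINKACTIVE))
        return -ENETDOWN;

    /* The real function would go on to validate and queue the request. */
    return 0;
}
```

With the link flag clear, post_one_send() on an RC-type mock QP returns -ENETDOWN while the SMI mock QP is still accepted; once MOCK_LINKACTIVE is set, both succeed. Failing early like this, rather than queueing work that can never complete, is what the patch description relies on to avoid the hang on module unload.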
From vlad at lists.openfabrics.org Sat Sep 20 03:08:42 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Sat, 20 Sep 2008 03:08:42 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080920-0200 daily build status Message-ID: <20080920100842.D686FE60E02@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed 
on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.24 Failed: From arlin.r.davis at intel.com Sat Sep 20 18:34:56 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Sat, 20 Sep 2008 18:34:56 -0700 Subject: [ofa-general] RE: compat-dapl-1.2.10 install bogosity In-Reply-To: <1221679928.15868.30.camel@firewall.xsintricity.com> References: <1221627719.1927.563.camel@firewall.xsintricity.com> <1221679928.15868.30.camel@firewall.xsintricity.com> Message-ID: >Turns out the dapl-2 package was doing this to. I've attached >the patch Can you send me a Signed-off-by: ? Thanks, -arlin From dledford at redhat.com Sat Sep 20 19:00:52 2008 From: dledford at redhat.com (Doug Ledford) Date: Sat, 20 Sep 2008 22:00:52 -0400 Subject: [ofa-general] RE: compat-dapl-1.2.10 install bogosity In-Reply-To: References: <1221627719.1927.563.camel@firewall.xsintricity.com> <1221679928.15868.30.camel@firewall.xsintricity.com> Message-ID: <1221962453.605.92.camel@firewall.xsintricity.com> On Sat, 2008-09-20 at 18:34 -0700, Davis, Arlin R wrote: > > >Turns out the dapl-2 package was doing this to. I've attached > >the patch > > Can you send me a Signed-off-by: ? Signed-off-by: Doug Ledford -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: This is a digitally signed message part URL: From vlad at lists.openfabrics.org Sun Sep 21 03:08:29 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Sun, 21 Sep 2008 03:08:29 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080921-0200 daily build status Message-ID: <20080921100829.55931E60B37@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with 
linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.24 Failed: From sashak at voltaire.com Sun Sep 21 06:53:21 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 21 Sep 2008 16:53:21 +0300 Subject: [ofa-general] [PATCH] opensm/osm_log.c: provide useful error message when file opening fails Message-ID: <20080921135321.GC25831@sashak.voltaire.com> Provide useful error message when log file opening fails, also to stderr. Return IB_ERROR instead of IB_UNKNOWN_ERROR. This addresses bug #1207. Signed-off-by: Sasha Khapyorsky --- opensm/opensm/osm_log.c | 21 +++++++++------------ 1 files changed, 9 insertions(+), 12 deletions(-) diff --git a/opensm/opensm/osm_log.c b/opensm/opensm/osm_log.c index d4118b1..88633ab 100644 --- a/opensm/opensm/osm_log.c +++ b/opensm/opensm/osm_log.c @@ -260,16 +260,13 @@ static int open_out_port(IN osm_log_t * p_log) p_log->out_port = fopen(p_log->log_file_name, "w+"); if (!p_log->out_port) { - if (p_log->accum_log_file) - syslog(LOG_CRIT, - "Cannot open %s for appending. Permission denied\n", - p_log->log_file_name); - else - syslog(LOG_CRIT, - "Cannot open %s for writing. Permission denied\n", - p_log->log_file_name); - - return (IB_UNKNOWN_ERROR); + syslog(LOG_CRIT, "Cannot open file \'%s\' for %s: %s\n", + p_log->log_file_name, + p_log->accum_log_file ? 
"appending" : "writing", + strerror(errno)); + fprintf(stderr, "Cannot open file \'%s\': %s\n", + p_log->log_file_name, strerror(errno)); + return -1; } if (fstat(fileno(p_log->out_port), &st) == 0) @@ -283,7 +280,7 @@ static int open_out_port(IN osm_log_t * p_log) dup2(fileno(p_log->out_port), 2); } - return (0); + return 0; } int osm_log_reopen_file(osm_log_t * p_log) @@ -321,7 +318,7 @@ ib_api_status_t osm_log_init_v2(IN osm_log_t * const p_log, else if (!strcmp(log_file, "stderr")) p_log->out_port = stderr; else if (open_out_port(p_log)) - return (IB_UNKNOWN_ERROR); + return IB_ERROR; if (cl_spinlock_init(&p_log->lock) == CL_SUCCESS) return IB_SUCCESS; -- 1.6.0.2.287.g3791f From eli at mellanox.co.il Sun Sep 21 06:25:22 2008 From: eli at mellanox.co.il (Eli Cohen) Date: Sun, 21 Sep 2008 16:25:22 +0300 Subject: [ofa-general] [PATCH] IB/ipoib: avoid WARN_ON on NULL path->ah Message-ID: <20080921132522.GA25090@mtls03> Commit ee1e2c82c245a5fb2864e9dbcdaab3390fde3fcc introduced an optimization on path flushing. This caused a new possible scenario in which unicast_arp_send triggers path query which could fail, causing path->ah to become NULL. A successive successfull path query will then trigger WARN_ON() in path_rec_completion(). This fix requires old_ah to differ from NULL as a prerequsite to trigger the WARN_ON(). Moreover, that commit also allowed path resolution to be triggered for an invalid path; if that path resolution failed, old_ah would be freed outside priv->lock violating the assumption that dropping references inside the lock are guaranteed not to reach zero reference. 
Signed-off-by: Eli Cohen --- drivers/infiniband/ulp/ipoib/ipoib_main.c | 16 +++++++++++++++- 1 files changed, 15 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 7e9e218..b1b425f 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -442,7 +442,7 @@ static void path_rec_completion(int status, list_for_each_entry_safe(neigh, tn, &path->neigh_list, list) { if (neigh->ah) { - WARN_ON(neigh->ah != old_ah); + WARN_ON(old_ah && neigh->ah != old_ah); /* * Dropping the ah reference inside * priv->lock is safe here, because we @@ -475,6 +475,20 @@ static void path_rec_completion(int status, __skb_queue_tail(&skqueue, skb); } path->valid = 1; + } else { + list_for_each_entry_safe(neigh, tn, &path->neigh_list, list) { + if (neigh->ah) { + /* + * Dropping the ah reference inside + * priv->lock is safe here, because we + * will hold one more reference from + * the original value of path->ah (ie + * old_ah). + */ + ipoib_put_ah(neigh->ah); + neigh->ah = NULL; + } + } } path->query = NULL; -- 1.6.0.2 From yossi.openib at gmail.com Sun Sep 21 13:06:10 2008 From: yossi.openib at gmail.com (Yossi Etigin) Date: Sun, 21 Sep 2008 23:06:10 +0300 Subject: [ofa-general] ***SPAM*** [PATCH] ipoib: fix a deadlock between ipoib start/stop and child interface create/delete Message-ID: <48D6A932.7040505@gmail.com> Fix a deadlock between child interface creation/deletion and ipoib start/stop. The former takes first vlan_mutex, and might take rtnl_lock via register_netdev or unregister_netdev. The latter is executed with rtnl_lock held, and tries to take vlan_mutex. We take the vlan_mutex and bring child interface up/down on a scheduled task instead of during stop/start, since ipoib_workqueue will not be flushed with rtnl_lock held. Signed-off-by: Yossi Etigin --- Fix bug #1198. 
One alternative approach might be to fine-grain the locking (for example use one mutex to sync child creation/deletion, and another one to sync accesses to child_intfs list). drivers/infiniband/ulp/ipoib/ipoib.h | 2 + drivers/infiniband/ulp/ipoib/ipoib_main.c | 33 ++++-------------------------- drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 22 ++++++++++++++++++++ 3 files changed, 29 insertions(+), 28 deletions(-) Index: b/drivers/infiniband/ulp/ipoib/ipoib.h =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib.h 2008-09-21 21:59:57.000000000 +0300 +++ b/drivers/infiniband/ulp/ipoib/ipoib.h 2008-09-21 22:02:17.000000000 +0300 @@ -299,6 +299,7 @@ struct ipoib_dev_priv { struct work_struct flush_heavy; struct work_struct restart_task; struct delayed_work ah_reap_task; + struct work_struct vlan_task; struct ib_device *ca; u8 port; @@ -503,6 +504,7 @@ void ipoib_event(struct ib_event_handler int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey); int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey); +void ipoib_vlan_task(struct work_struct *work); void ipoib_pkey_poll(struct work_struct *work); int ipoib_pkey_dev_delay_open(struct net_device *dev); Index: b/drivers/infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c 2008-09-21 21:59:57.000000000 +0300 +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c 2008-09-21 22:55:55.000000000 +0300 @@ -124,20 +124,8 @@ int ipoib_open(struct net_device *dev) } if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) { - struct ipoib_dev_priv *cpriv; - - /* Bring up any child interfaces too */ - mutex_lock(&priv->vlan_mutex); - list_for_each_entry(cpriv, &priv->child_intfs, list) { - int flags; - - flags = cpriv->dev->flags; - if (flags & IFF_UP) - continue; - - dev_change_flags(cpriv->dev, flags | IFF_UP); - } - mutex_unlock(&priv->vlan_mutex); + 
atomic_long_set(&priv->vlan_task.data, 4); + queue_work(ipoib_workqueue, &priv->vlan_task); } netif_start_queue(dev); @@ -160,20 +148,8 @@ static int ipoib_stop(struct net_device ipoib_ib_dev_stop(dev, 0); if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) { - struct ipoib_dev_priv *cpriv; - - /* Bring down any child interfaces too */ - mutex_lock(&priv->vlan_mutex); - list_for_each_entry(cpriv, &priv->child_intfs, list) { - int flags; - - flags = cpriv->dev->flags; - if (!(flags & IFF_UP)) - continue; - - dev_change_flags(cpriv->dev, flags & ~IFF_UP); - } - mutex_unlock(&priv->vlan_mutex); + atomic_long_set(&priv->vlan_task.data, 0); + queue_work(ipoib_workqueue, &priv->vlan_task); } return 0; @@ -1081,6 +1057,7 @@ static void ipoib_setup(struct net_devic INIT_WORK(&priv->flush_heavy, ipoib_ib_dev_flush_heavy); INIT_WORK(&priv->restart_task, ipoib_mcast_restart_task); INIT_DELAYED_WORK(&priv->ah_reap_task, ipoib_reap_ah); + INIT_WORK(&priv->vlan_task, ipoib_vlan_task); } struct ipoib_dev_priv *ipoib_intf_alloc(const char *name) Index: b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c 2008-09-21 21:59:57.000000000 +0300 +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c 2008-09-21 22:57:10.000000000 +0300 @@ -174,3 +174,25 @@ int ipoib_vlan_delete(struct net_device return ret; } + +void ipoib_vlan_task(struct work_struct *work) +{ + struct ipoib_dev_priv *priv = + container_of(work, struct ipoib_dev_priv, vlan_task); + struct ipoib_dev_priv *cpriv; + int flags, new_flags, iffup_value; + + iffup_value = atomic_long_read(&work->data) ? 
IFF_UP : 0; + + mutex_lock(&priv->vlan_mutex); + list_for_each_entry(cpriv, &priv->child_intfs, list) { + flags = cpriv->dev->flags; + new_flags = (flags & ~IFF_UP) | iffup_value; + if (flags != new_flags) { + rtnl_lock(); + dev_change_flags(cpriv->dev, new_flags); + rtnl_unlock(); + } + } + mutex_unlock(&priv->vlan_mutex); +} From jon at opengridcomputing.com Sun Sep 21 15:45:19 2008 From: jon at opengridcomputing.com (Jon Mason) Date: Sun, 21 Sep 2008 17:45:19 -0500 Subject: [ofa-general] asm/byteorder.h needed in infiniband/cm.h Message-ID: <20080921224519.GA1779@opengridcomputing.com> While building the current ompi-trunk on top of OFED-1.4, I hit the following build break: connect/btl_openib_connect_ibcm.c: In function `ibcm_component_query': connect/btl_openib_connect_ibcm.c:766: error: implicit declaration of function `__constant_cpu_to_be64' make[2]: *** [connect/btl_openib_connect_ibcm.lo] Error 1 The line in question is referring to IB_CM_ASSIGN_SERVICE_ID in infiniband/cm.h. That file does not include a reference to where __constant_cpu_to_be64 is defined. When I included asm/byteorder.h, everything built fine and all iWARP tests passed on OMPI trunk. Below is the patch in question. Thanks, Jon Signed-Off-By: Jon Mason --- /usr/include/infiniband/cm.h.orig 2008-09-21 15:36:46.000000000 -0700 +++ /usr/include/infiniband/cm.h 2008-09-21 14:17:43.000000000 -0700 @@ -38,6 +38,7 @@ #include #include +#include #ifdef __cplusplus extern "C" { From wangwhao at cn.ibm.com Sun Sep 21 17:19:03 2008 From: wangwhao at cn.ibm.com (Wen Hao Wang) Date: Mon, 22 Sep 2008 08:19:03 +0800 Subject: [ofa-general] ibcheckerrors give error 5691 within OFED 1.3.1 In-Reply-To: Message-ID: > ibclearerrors will do this. > > -- Hal Hi Hal: Thanks for your explanation! After I run ibclearerrors, ibcheckerrors did not report errors any more. Will do some more testing to see whether the situation is stable. 
Not sure whether you are responsible for dapl-utils, including dapltest and dtest. If you are, would you please also have a look at my issue about dapltest on SLES10? Thanks again.

Wen Hao Wang
Email: wangwhao at cn.ibm.com

From wangwhao at cn.ibm.com Sun Sep 21 17:33:45 2008
From: wangwhao at cn.ibm.com (Wen Hao Wang)
Date: Mon, 22 Sep 2008 08:33:45 +0800
Subject: [ofa-general] dapltest couldn't read ABI version
In-Reply-To:
Message-ID:

> Hi all:
>
> I installed OFED 1.3.1 and 1.4 RC1 on my SLES10 SP2 x86_64 servers. dapltest failed to run in server mode. The following error is reported:
>
> LS21-05:~ # dapltest -T S -d -D ib0
> Server_Cmd.debug: 1
> Server_Cmd.dapl_name: ib0
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to open /dev/infiniband/rdma_cm
> LS21-05:32561: open_hca: ERR - RDMA channel No such file or directory
> LS21-05:32561: dapls_ib_open_hca failed 40000
> DT_cs_Server: Could not open ib0 (DAT_INTERNAL_ERROR )
> DT_cs_Server: Waiting for clients to all go away...
> DT_cs_Server: Cleaning up ...
> DT_cs_Server (ib0): Exiting.
>
> File /dev/infiniband/rdma_cm exists on RHEL servers, but not on SLES servers. ib0 can communicate with other servers. Any advice or comments? Thanks in advance!

Can anyone give some advice? Thanks in advance. Here are the different file locations on my SLES and RHEL nodes.

SLES:~ # find /sys/ -name abi_*
/sys/class/infiniband_mad/abi_version
/sys/class/infiniband_cm/abi_version
/sys/class/infiniband_verbs/uverbs0/abi_version
/sys/class/infiniband_verbs/abi_version

[root at RHEL ~]# find /sys -name abi_version
/sys/class/infiniband_verbs/uverbs0/abi_version
/sys/class/infiniband_verbs/abi_version
/sys/class/infiniband_mad/abi_version
/sys/class/misc/rdma_cm/abi_version

The default file /sys/class/misc/rdma_cm/abi_version does not exist on SLES. Is this an OFED bug, or did I miss loading something?
SLES:~ # lsmod Module Size Used by ib_ipoib 108640 0 ib_srp 51400 0 ib_sdp 101724 0 ib_iser 51960 0 libiscsi 48000 1 ib_iser rdma_cm 52756 2 ib_sdp,ib_iser iw_cm 27400 1 rdma_cm scsi_transport_iscsi 54040 2 ib_iser,libiscsi ib_umad 33704 0 ib_addr 25992 1 rdma_cm ib_ucm 34184 0 ib_cm 57768 4 ib_ipoib,ib_srp,rdma_cm,ib_ucm ib_uverbs 60080 1 ib_ucm ib_sa 59016 4 ib_ipoib,ib_srp,rdma_cm,ib_cm mlx4_ib 81216 0 mlx4_core 116592 1 mlx4_ib ib_ipath 329288 0 ib_mthca 142884 0 ib_mad 55204 5 ib_umad,ib_cm,ib_sa,mlx4_ib,ib_mthca ib_core 95120 15 ib_ipoib,ib_srp,ib_sdp,ib_iser,rdma_cm,iw_cm,ib_umad,ib_ucm,ib_cm,ib_uverbs,ib_sa,mlx4_ib,ib_ipath,ib_mthca,ib_mad iptable_filter 19712 0 ip_tables 31048 1 iptable_filter x_tables 31112 1 ip_tables nfs 234296 1 lockd 89456 2 nfs nfs_acl 20224 1 nfs sunrpc 174024 4 nfs,lockd,nfs_acl joydev 27520 0 st 55204 0 sr_mod 33060 0 ide_disk 32768 0 ide_cd 57760 0 cdrom 52520 2 sr_mod,ide_cd ide_core 166148 2 ide_disk,ide_cd ipv6 339872 21 ib_ipoib dock 26896 0 button 24352 0 battery 27528 0 ac 22152 0 apparmor 55600 0 loop 33168 0 dm_mod 80400 0 usbhid 61088 0 i2c_piix4 26512 0 i2c_core 39936 1 i2c_piix4 ohci_hcd 38148 0 ehci_hcd 49288 0 usbcore 150312 4 usbhid,ohci_hcd,ehci_hcd bnx2 201736 0 shpchp 61984 0 pci_hotplug 44800 1 shpchp mptctl 47112 0 reiserfs 247040 1 sg 53304 0 edd 26760 0 fan 21896 0 thermal 32400 0 processor 53100 1 thermal mptsas 52112 2 mptscsih 54400 1 mptsas mptbase 95204 3 mptctl,mptsas,mptscsih scsi_transport_sas 49536 1 mptsas sd_mod 37760 3 scsi_mod 170936 12 ib_srp,ib_iser,libiscsi,scsi_transport_iscsi,st,sr_mod,mptctl,sg,mptsas,mptscsih,scsi_transport_sas,sd_mod [root at RHEL ~]# lsmod Module Size Used by nls_utf8 35137 6 loop 48977 12 nfsd 285193 17 exportfs 38849 1 nfsd lockd 99057 2 nfsd nfs_acl 36673 1 nfsd auth_rpcgss 81889 1 nfsd autofs4 57289 2 hidp 83521 2 rfcomm 104809 0 l2cap 89281 10 hidp,rfcomm bluetooth 118597 5 hidp,rfcomm,l2cap sunrpc 198025 13 nfsd,lockd,nfs_acl,auth_rpcgss ib_iser 68344 0 iscsi_tcp 
58752 0 libiscsi 61952 2 ib_iser,iscsi_tcp scsi_transport_iscsi 67344 4 ib_iser,iscsi_tcp,libiscsi rdma_ucm 47232 0 qlgc_vnic 126592 0 ib_sdp 125276 0 rdma_cm 67348 3 ib_iser,rdma_ucm,ib_sdp iw_cm 43656 1 rdma_cm ib_addr 41992 1 rdma_cm ib_ipoib 113248 0 ipoib_helper 35728 2 ib_ipoib ib_cm 67368 3 qlgc_vnic,rdma_cm,ib_ipoib ib_sa 74632 4 qlgc_vnic,rdma_cm,ib_ipoib,ib_cm ib_uverbs 75568 1 rdma_ucm ib_umad 50600 0 iw_cxgb3 104788 0 cxgb3 154224 1 iw_cxgb3 ib_ipath 346444 0 mlx4_ib 95932 0 ib_mthca 159044 0 ib_mad 70948 5 ib_cm,ib_sa,ib_umad,mlx4_ib,ib_mthca ib_core 97664 16 ib_iser,rdma_ucm,qlgc_vnic,ib_sdp,rdma_cm,iw_cm,ib_ipoib,ib_cm,ib_sa,ib_uverbs,ib_umad,iw_cxgb3,ib_ipath,mlx4_ib,ib_mthca,ib_mad dm_multipath 52945 0 video 53197 0 sbs 49921 0 backlight 39873 1 video i2c_ec 38593 1 sbs i2c_core 56129 1 i2c_ec button 40545 0 battery 43849 0 asus_acpi 50917 0 acpi_memhotplug 40133 0 ac 38729 0 ipv6 420481 65 ib_ipoib xfrm_nalgo 43845 1 ipv6 crypto_api 42177 1 xfrm_nalgo parport_pc 62313 0 lp 47121 0 parport 73165 2 parport_pc,lp shpchp 70765 0 sr_mod 50789 0 i5000_edac 42177 0 mlx4_core 109008 1 mlx4_ib cdrom 68713 1 sr_mod bnx2 173917 0 edac_mc 60193 1 i5000_edac sg 69993 0 pcspkr 36289 0 dm_snapshot 50569 0 dm_zero 35265 0 dm_mirror 60489 0 dm_mod 99481 9 dm_multipath,dm_snapshot,dm_zero,dm_mirror usb_storage 116257 2 ata_piix 54981 0 libata 192345 1 ata_piix mptsas 69201 2 mptscsih 69569 1 mptsas mptbase 111461 2 mptsas,mptscsih scsi_transport_sas 66753 1 mptsas sd_mod 56257 6 scsi_mod 188665 12 ib_iser,iscsi_tcp,libiscsi,scsi_transport_iscsi,sr_mod,sg,usb_storage,libata,mptsas,mptscsih,scsi_transport_sas,sd_mod ext3 167249 4 jbd 93873 1 ext3 uhci_hcd 57433 0 ohci_hcd 54493 0 ehci_hcd 65741 0 Wen Hao Wang Email: wangwhao at cn.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From tziporet at mellanox.co.il Mon Sep 22 00:17:50 2008
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Mon, 22 Sep 2008 10:17:50 +0300
Subject: [ofa-general] OFED 1.4-rc2 is available
Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD9043D4@mtlexch01.mtl.com>

Hi,

OFED 1.4-rc2 release is available on
http://www.openfabrics.org/downloads/OFED/ofed-1.4/OFED-1.4-rc2.tgz

To get BUILD_ID run ofed_info

Please report any issues in bugzilla https://bugs.openfabrics.org/ for OFED 1.4

Tziporet & Vladimir

========================================================================

Release information:
------------------------------
Linux Operating Systems:
- RedHat EL4 up4: 2.6.9-42.ELsmp *
- RedHat EL4 up5: 2.6.9-55.ELsmp
- RedHat EL4 up6: 2.6.9-67.ELsmp
- RedHat EL4 up7: 2.6.9-78.ELsmp
- RedHat EL5: 2.6.18-8.el5
- RedHat EL5 up1: 2.6.18-53.el5
- RedHat EL5 up2: 2.6.18-92.el5
- CentOS 5.2: 2.6.18-92.el5
- Fedora C9: 2.6.25-14.fc9 *
- SLES10: 2.6.16.21-0.8-smp
- SLES10 SP1: 2.6.16.46-0.12-smp
- SLES10 SP1 up1: 2.6.16.53-0.16-smp
- SLES10 SP2: 2.6.16.60-0.21-smp
- OpenSuSE 10.3: 2.6.22.5-31 *
- kernel.org: 2.6.26 and 2.6.27-rc5

* Minimal QA for these versions

Systems:
* x86_64
* x86
* ia64
* ppc64

Main Changes from OFED 1.4-rc1
=========================
- Kernel base updated to 2.6.27-rc6
- Updated MPI packages: mvapich-1.1.0-2977 and mvapich2-1.2rc2-6
- Updated bonding package: ib-bonding-0.9.0-30
- 12 bugs fixed (see attached for details)

Tasks that should be completed for the rc3:
================================
1. NFS-RDMA to work on RHEL 5.1
2. OSM: Cached routing
3. Clean up compilation warnings
4. Bug fixes

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ofed-1.4-rc2-fixed-bugs.csv
Type: application/octet-stream
Size: 1266 bytes
Desc: ofed-1.4-rc2-fixed-bugs.csv
URL:

From ogerlitz at voltaire.com Mon Sep 22 01:59:06 2008
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Mon, 22 Sep 2008 11:59:06 +0300
Subject: [ofa-general] ipoib crashes with 2.6.27-rc7
Message-ID: <48D75E5A.7010908@voltaire.com>

While attempting to set up an ipoib / partitioning bonding environment with 2.6.27-rc7, I came across a few ipoib crashes, e.g. these two oops listings. I understand that some patches were sent by Yossi just recently so they may help, or do they fall into the non-regression-from-2.6.26 category?

Or.

this is seen on node startup

> mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
> NET: Registered protocol family 10
> lo: Disabled Privacy Extensions
> ADDRCONF(NETDEV_UP): ib0.8003: link is not ready
> ------------[ cut here ]------------
> kernel BUG at include/linux/netdevice.h:415!
> invalid opcode: 0000 [1] SMP CPU 7
> Modules linked in: rdma_ucm ib_ucm rdma_cm iw_cm ib_addr ib_ipoib ib_cm
> ib_sa inet_lro ipv6 ib_uverbs ib_umad mlx4_ib ib_mthca ib_mad ib_core
> dm_multipath battery ac floppy sr_mod joydev sg igb mlx4_core shpchp
> button pcspkr rng_core dm_snapshot dm_zero dm_mirror dm_log dm_mod
> usb_storage ata_piix libata sd_mod scsi_mod dock ext3 jbd ehci_hcd
> ohci_hcd uhci_hcd [last unloaded: microcode]
> Pid: 3035, comm: ipoib Not tainted 2.6.27-rc7 #2
> RIP: 0010:[] [] ipoib_open+0x3c/0x150
> [ib_ipoib]
> RSP: 0018:ffff880229d15e90 EFLAGS: 00010246
> RAX: ffff88021f00a878 RBX: ffff88021f00a7a0 RCX: 0000000000000000
> RDX: 0003000600000000 RSI: ffff88022e029880 RDI: ffff88021f00a000
> RBP: ffff88021f00a780 R08: 0000000000000000 R09: ffffffff805a8e40
> R10: 0000000000000000 R11: 0000000000000003 R12: ffff88021f00a000
> R13: ffffffffa01f4af2 R14: ffffffff805e32c0 R15: 0000000000000000
> FS: 0000000000000000(0000) GS:ffff88022f826580(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0018 ES: 0018 CR0:
000000008005003b > CR2: 00000000008cb170 CR3: 000000022e5d2000 CR4: 00000000000006e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process ipoib (pid: 3035, threadinfo ffff880229d14000, task ffff88022e195f00) > Stack: ffff88021f00a878 ffff88022d02c780 ffff88021f00a870 ffffffff8023fd92 > ffff88022c531d18 ffff88022d02c780 ffff88022d02c7a8 ffff88022c531d18 > ffffffff805e0e80 ffffffff80240700 0000000000000000 ffff88022e195f00 > Call Trace: > [] ? run_workqueue+0x88/0x118 > [] ? worker_thread+0xd5/0xe0 > [] ? autoremove_wake_function+0x0/0x2e > [] ? worker_thread+0x0/0xe0 > [] ? kthread+0x47/0x73 > [] ? schedule_tail+0x28/0x60 > [] ? child_rip+0xa/0x11 > [] ? kthread+0x0/0x73 > [] ? child_rip+0x0/0x11 > > > Code: 07 00 00 53 7e 12 48 8b 75 18 48 c7 c7 ff c5 1f a0 31 c0 e8 e7 eb 03 > e0 41 f6 84 24 b0 07 00 00 01 49 8d 9c 24 a0 07 00 00 75 04 <0f> 0b eb fe > f0 80 63 10 fe f0 80 8d 80 00 00 00 04 4c 89 e7 e8 > RIP [] ipoib_open+0x3c/0x150 [ib_ipoib] > RSP > ---[ end trace d51c7bec8b19b076 ]--- and this takes place when you attempt to take ib0 down in the presence of child devices which are not running, if there are no child devices it doesn't happen > ib0.8003: Failed to modify QP to ERROR state > BUG: soft lockup - CPU#0 stuck for 61s! 
[ifconfig:7481] > CPU 0: > Modules linked in: autofs4 sunrpc ib_iser iscsi_tcp libiscsi scsi_transport_iscsi bonding rdma_ucm ib_ucm rdma_cm iw_cm ib_addr ib_ipoib ib_cm ib_sa inet_lro ipv6 ib_uverbs ib_umad mlx4_ib ib_mthca ib_mad ib_core dm_multipath battery ac floppy sr_mod igb joydev mlx4_core shpchp sg button pcspkr rng_core dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage ata_piix libata sd_mod scsi_mod dock ext3 jbd ehci_hcd ohci_hcd uhci_hcd [last unloaded: microcode] > Pid: 7481, comm: ifconfig Tainted: G D 2.6.27-rc7 #2 > RIP: 0010:[] [] lock_timer_base+0x15/0x4b > RSP: 0018:ffff880213d75c28 EFLAGS: 00000246 > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000100 > RDX: 0000000000001800 RSI: ffff880213d75c68 RDI: ffff880222cb94d0 > RBP: ffff880222cb8000 R08: 0000000000000100 R09: ffff8800280bb900 > R10: 0000000000000000 R11: ffffffff8031c680 R12: ffff880222cb8780 > R13: ffff880222cb8780 R14: ffff880222cb87a0 R15: ffff88002805cf00 > FS: 00007f7f380fc710(0000) GS:ffffffff805a9a80(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > CR2: 00007f52433af000 CR3: 000000021c5cf000 CR4: 00000000000006e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > > Call Trace: > [] ? try_to_del_timer_sync+0x16/0x5a > [] ? del_timer_sync+0xc/0x16 > [] ? ipoib_ib_dev_stop+0x190/0x26d [ib_ipoib] > [] ? _spin_lock_irqsave+0x9/0xe > [] ? lock_timer_base+0x26/0x4b > [] ? default_wake_function+0x0/0xe > [] ? _spin_unlock_irq+0x9/0xc > [] ? ipoib_flush_paths+0x13a/0x145 [ib_ipoib] > [] ? ipoib_stop+0x7e/0xf8 [ib_ipoib] > [] ? dev_close+0x6f/0x87 > [] ? dev_change_flags+0xa6/0x15c > [] ? ipoib_stop+0xb8/0xf8 [ib_ipoib] > [] ? dev_close+0x6f/0x87 > [] ? dev_change_flags+0xa6/0x15c > [] ? devinet_ioctl+0x242/0x58a > [] ? sock_ioctl+0x1d2/0x1f9 > [] ? vfs_ioctl+0x21/0x6b > [] ? do_vfs_ioctl+0x259/0x272 > [] ? 
sys_ioctl+0x51/0x73 From vlad at lists.openfabrics.org Mon Sep 22 03:09:42 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Mon, 22 Sep 2008 03:09:42 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080922-0200 daily build status Message-ID: <20080922100942.BE31BE60C57@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with 
linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.24 Failed: From tziporet at dev.mellanox.co.il Mon Sep 22 03:38:27 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Mon, 22 Sep 2008 13:38:27 +0300 Subject: [ofa-general] asm/byteorder.h needed in infiniband/cm.h In-Reply-To: <20080921224519.GA1779@opengridcomputing.com> References: <20080921224519.GA1779@opengridcomputing.com> Message-ID: <48D775A3.5010104@mellanox.co.il> Jon Mason wrote: > While building the current ompi-trunk on top of OFED-1.4, I hit the following > build break: > > connect/btl_openib_connect_ibcm.c: In function `ibcm_component_query': > connect/btl_openib_connect_ibcm.c:766: error: implicit declaration of function `__constant_cpu_to_be64' > make[2]: *** [connect/btl_openib_connect_ibcm.lo] Error 1 > > The line in question is referring to IB_CM_ASSIGN_SERVICE_ID in infiniband/cm.h. > That file does not include a reference to where __constant_cpu_to_be64 is > defined. When I included asm/byteorder.h, everything built fine and all iWARP > tests passed on OMPI trunk. > > Below is the patch in question. > > Thanks, > Jon > > Signed-Off-By: Jon Mason > > --- /usr/include/infiniband/cm.h.orig 2008-09-21 15:36:46.000000000 -0700 > +++ /usr/include/infiniband/cm.h 2008-09-21 14:17:43.000000000 -0700 > @@ -38,6 +38,7 @@ > > #include > #include > +#include > > #ifdef __cplusplus > extern "C" { > Arlin Can you look at this since Sean is on vacation Thanks Tziporet From tziporet at dev.mellanox.co.il Mon Sep 22 03:40:37 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Mon, 22 Sep 2008 13:40:37 +0300 Subject: [ofa-general] dapltest couldn't read ABI version In-Reply-To: References: Message-ID: <48D77625.9080600@mellanox.co.il> Wen Hao Wang wrote: > > > Hi all: > > > > I installed OFED 1.3.1 and 1.4 RC1 on my SLES10 SP2 x86_64 servers. 
> dapltest failed to run in server mode. The following error is reported:
> >
> > LS21-05:~ # dapltest -T S -d -D ib0
> > Server_Cmd.debug: 1
> > Server_Cmd.dapl_name: ib0
> > librdmacm: couldn't read ABI version.
> > librdmacm: assuming: 4
> > CMA: unable to open /dev/infiniband/rdma_cm
> > LS21-05:32561: open_hca: ERR - RDMA channel No such file or directory
> > LS21-05:32561: dapls_ib_open_hca failed 40000
> > DT_cs_Server: Could not open ib0 (DAT_INTERNAL_ERROR )
> > DT_cs_Server: Waiting for clients to all go away...
> > DT_cs_Server: Cleaning up ...
> > DT_cs_Server (ib0): Exiting.
> >
> > File /dev/infiniband/rdma_cm exists on RHEL servers, but not on SLES
> servers. ib0 can communicate with other servers. Any advice or
> comments? Thanks in advance!
>
> Can anyone give some advice? Thanks in advance. Here are the
> different file locations on my SLES and RHEL nodes.
>

Arlin is the DAPL maintainer.

Tziporet

From vegalew at hotmail.com Mon Sep 22 04:31:19 2008
From: vegalew at hotmail.com (vega)
Date: Mon, 22 Sep 2008 19:31:19 +0800
Subject: [ofa-general] ***SPAM*** fresh man want some help in intalling the OFED
Message-ID:

Dear all,

I'm a freshman. There is InfiniBand in our Computer Center. Unfortunately, the system administrator knows little about how to use InfiniBand correctly. But I want to use it, because my DFT calculation with openmpi runs with very low efficiency without it.

I have downloaded OFED-1.3.1.tgz, then unpacked it using the command tar -zxvf OFED-1.3.1.tgz. I found install.pl under the folder and ran it using ./install.pl. Then I chose '2' and then '3'. Then I chose OFA, then yes for everything, and entered a location for it. After all of the above was done, the program said 'gcc-3.3.3 rpm is required to build libibverbs'. I have checked my current gcc and found my version is 3.2.3, a little older.

How could I deal with this problem? Should I update gcc to 3.3.3? If so, do you think root permissions are required?
Is there a way to finish the whole installation without root permissions? Do you think the Intel C++ compiler could do the same thing instead of gcc?

Any hints will be deeply appreciated.

vega
=================================================================================
Vega Lew (weijia liu)
PH.D Candidate in Chemical Engineering
State Key Laboratory of Materials-oriented Chemical Engineering
College of Chemistry and Chemical Engineering
Nanjing University of Technology, 210009, Nanjing, Jiangsu, China

From hal.rosenstock at gmail.com Mon Sep 22 05:56:09 2008
From: hal.rosenstock at gmail.com (Hal Rosenstock)
Date: Mon, 22 Sep 2008 08:56:09 -0400
Subject: [ofa-general] ibcheckerrors give error 5691 within OFED 1.3.1
In-Reply-To:
References:
Message-ID:

On Sun, Sep 21, 2008 at 8:19 PM, Wen Hao Wang wrote:
>> ibclearerrors will do this.
>>
>> -- Hal
>
> Hi Hal:
>
> Thanks for your explanation! After I run ibclearerrors, ibcheckerrors did
> not report errors any more. Will do some more testing to see whether the
> situation is stable.
>
> Not sure whether you are responsible for dapl-utils, including dapltest and
> dtest.

A list of the maintainers can be found at:
http://www.openfabrics.org/txt/woody/maintainers.txt for Linux

-- Hal

> If you are, would you please also have a look at my issue about
> dapltest on SLES10? Thanks again.
>
> Wen Hao Wang
> Email: wangwhao at cn.ibm.com

From tziporet at dev.mellanox.co.il Mon Sep 22 06:52:49 2008
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Mon, 22 Sep 2008 16:52:49 +0300
Subject: [ofa-general] ***SPAM*** fresh man want some help in intalling the OFED
In-Reply-To:
References:
Message-ID: <48D7A331.8090203@mellanox.co.il>

vega wrote:
> I have downloaded the OFED-1.3.1.tgz. Then I unpacked it using the
> command like this, tar -zxvf OFED-1.3.1.tgz.
> And I found install.pl under the folder, and ran it using ./install.pl.
> Then I chose '2' and then '3'. Then I chose OFA and then yes for
> everything and entered a location for it.
> After all of the above was done, the program said 'gcc-3.3.3 rpm is
> required to build libibverbs'.
> I have checked my current gcc. I found my version is 3.2.3, a little
> older.
> How could I deal with this problem? Should I update the gcc to 3.3.3?

yes

> If so, do you think root permissions are required?
> Is there a way to finish the whole installation without root
> permissions? Do you think the Intel C++ compiler could
> do the same thing instead of gcc?
>

You must be root to install OFED

Tziporet

From oren at mellanox.co.il Mon Sep 22 07:32:45 2008
From: oren at mellanox.co.il (oren at mellanox.co.il)
Date: Mon, 22 Sep 2008 17:32:45 +0300
Subject: [ofa-general][PATCH 0/3] mlx4: fixes to support mlx4_fc driver
Message-ID:

The following 3 patches fix some bugs and add an API needed to support the mlx4_fc driver.

Patch 1: Adds an API that allows any of the mlx4 interfaces to query the core about a given underlying device structure. Currently implemented only for the mlx4_en interface, where it means getting the relevant mlx4_dev and port number for a given net_device. Later, this API will also allow getting the relevant mlx4_dev and port for a given ib_dev.

Patch 2: Fixes a bug in VLAN registration. The hardware VLAN table starts at entry 2, with entries 0 and 1 reserved for the no-vlan and vlan-miss special cases.

Patch 3: Fixes a bug in reserving FEXCH QPs and MPTs for the mlx4_fc driver. The reserved range should be twice as large, as it is per-port.

All these patches rely on the mlx4_en patches submitted by Yevgeny P.
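The reserved-entry scheme described under Patch 2 can be sketched in plain C as below. This is a minimal user-space illustration, not the mlx4 code: every name prefixed with sketch_ or SKETCH_ is hypothetical, and the table size of 128 is assumed for the example. It only shows that allocation starts at index 2 and that the two special entries are never handed out or released.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; they mirror the idea, not the driver. */
enum {
	SKETCH_NO_VLAN_IDX   = 0,	/* reserved: frames without a VLAN tag */
	SKETCH_VLAN_MISS_IDX = 1,	/* reserved: VLAN tag not in the table */
	SKETCH_VLAN_REGULAR  = 2,	/* first index usable for real VLANs */
	SKETCH_MAX_VLAN_NUM  = 128	/* assumed table size */
};

struct sketch_vlan_table {
	unsigned short vid[SKETCH_MAX_VLAN_NUM];
	int refs[SKETCH_MAX_VLAN_NUM];
};

/* Clear every entry, including the two reserved ones. */
static void sketch_vlan_init(struct sketch_vlan_table *t)
{
	int i;

	assert(t != NULL);
	for (i = 0; i < SKETCH_MAX_VLAN_NUM; i++) {
		t->vid[i] = 0;
		t->refs[i] = 0;
	}
}

/* Return the table index for vid, or -1 if no regular entry is free. */
static int sketch_vlan_register(struct sketch_vlan_table *t, unsigned short vid)
{
	int i, free_idx = -1;

	/* Start at the first regular entry so 0 and 1 are never used. */
	for (i = SKETCH_VLAN_REGULAR; i < SKETCH_MAX_VLAN_NUM; i++) {
		if (t->refs[i] == 0) {
			if (free_idx < 0)
				free_idx = i;
			continue;
		}
		if (t->vid[i] == vid) {
			t->refs[i]++;	/* already registered: share it */
			return i;
		}
	}
	if (free_idx < 0)
		return -1;
	t->vid[free_idx] = vid;
	t->refs[free_idx] = 1;
	return free_idx;
}

/* Return 0 on success, -1 for a reserved, out-of-range or unused index. */
static int sketch_vlan_unregister(struct sketch_vlan_table *t, int index)
{
	if (index < SKETCH_VLAN_REGULAR || index >= SKETCH_MAX_VLAN_NUM)
		return -1;
	if (t->refs[index] == 0)
		return -1;
	t->refs[index]--;
	return 0;
}
```

With an empty table, the first two distinct VLAN IDs registered land at indices 2 and 3, and an attempt to unregister index 0 or 1 is rejected.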
Signed-off-by: Oren Duer

From oren at mellanox.co.il Mon Sep 22 07:33:44 2008
From: oren at mellanox.co.il (oren at mellanox.co.il)
Date: Mon, 22 Sep 2008 17:33:44 +0300
Subject: [ofa-general][PATCH 1/3] mlx4: Query internal device
Message-ID:

mlx4: Add API to query interfaces for given internal device

Updated mlx4_en interface to provide a query function for its
internal net_device structure.

Signed-off-by: Oren Duer

Index: ofed_kernel-2.6.18-EL5.1.orig/drivers/net/mlx4/en_main.c
===================================================================
--- ofed_kernel-2.6.18-EL5.1.orig.orig/drivers/net/mlx4/en_main.c 2008-09-04 14:45:56.000000000 +0300
+++ ofed_kernel-2.6.18-EL5.1.orig/drivers/net/mlx4/en_main.c 2008-09-04 14:46:17.440543000 +0300
@@ -234,10 +234,24 @@ err_free_res:
 	return NULL;
 }

+enum mlx4_query_reply mlx4_en_query(void *endev_ptr, void *int_dev)
+{
+	struct mlx4_en_dev *mdev = endev_ptr;
+	struct net_device *netdev = int_dev;
+	int p;
+
+	for (p = 1; p <= MLX4_MAX_PORTS; ++p)
+		if (mdev->pndev[p] == netdev)
+			return p;
+
+	return MLX4_QUERY_NOT_MINE;
+}
+
 static struct mlx4_interface mlx4_en_interface = {
 	.add = mlx4_en_add,
 	.remove = mlx4_en_remove,
-	.event = mlx4_en_event
+	.event = mlx4_en_event,
+	.query = mlx4_en_query
 };

 static int __init mlx4_en_init(void)
Index: ofed_kernel-2.6.18-EL5.1.orig/drivers/net/mlx4/intf.c
===================================================================
--- ofed_kernel-2.6.18-EL5.1.orig.orig/drivers/net/mlx4/intf.c 2008-09-04 14:45:47.000000000 +0300
+++ ofed_kernel-2.6.18-EL5.1.orig/drivers/net/mlx4/intf.c 2008-09-04 14:46:02.196098000 +0300
@@ -112,6 +112,36 @@ void mlx4_unregister_interface(struct ml
 }
 EXPORT_SYMBOL_GPL(mlx4_unregister_interface);

+struct mlx4_dev *mlx4_query_interface(void *int_dev, int *port)
+{
+	struct mlx4_priv *priv;
+	struct mlx4_device_context *dev_ctx;
+	enum mlx4_query_reply r;
+	unsigned long flags;
+
+	mutex_lock(&intf_mutex);
+
+	list_for_each_entry(priv, &dev_list, dev_list) {
+
spin_lock_irqsave(&priv->ctx_lock, flags); + list_for_each_entry(dev_ctx, &priv->ctx_list, list) { + if (!dev_ctx->intf->query) + continue; + r = dev_ctx->intf->query(dev_ctx->context, int_dev); + if (r != MLX4_QUERY_NOT_MINE) { + *port = r; + spin_unlock_irqrestore(&priv->ctx_lock, flags); + mutex_unlock(&intf_mutex); + return &priv->dev; + } + } + spin_unlock_irqrestore(&priv->ctx_lock, flags); + } + + mutex_unlock(&intf_mutex); + return NULL; +} +EXPORT_SYMBOL_GPL(mlx4_query_interface); + void mlx4_dispatch_event(struct mlx4_dev *dev, enum mlx4_dev_event type, int port) { struct mlx4_priv *priv = mlx4_priv(dev); Index: ofed_kernel-2.6.18-EL5.1.orig/include/linux/mlx4/driver.h =================================================================== --- ofed_kernel-2.6.18-EL5.1.orig.orig/include/linux/mlx4/driver.h 2008-09-04 14:45:47.000000000 +0300 +++ ofed_kernel-2.6.18-EL5.1.orig/include/linux/mlx4/driver.h 2008-09-04 14:46:02.201102000 +0300 @@ -44,15 +44,22 @@ enum mlx4_dev_event { MLX4_DEV_EVENT_PORT_REINIT, }; +enum mlx4_query_reply { + MLX4_QUERY_NOT_MINE = -1, + MLX4_QUERY_MINE_NOPORT = 0 +}; + struct mlx4_interface { void * (*add) (struct mlx4_dev *dev); void (*remove)(struct mlx4_dev *dev, void *context); void (*event) (struct mlx4_dev *dev, void *context, enum mlx4_dev_event event, int port); + enum mlx4_query_reply (*query) (void *context, void *); struct list_head list; }; int mlx4_register_interface(struct mlx4_interface *intf); void mlx4_unregister_interface(struct mlx4_interface *intf); +struct mlx4_dev *mlx4_query_interface(void *, int *port); #endif /* MLX4_DRIVER_H */ From oren at mellanox.co.il Mon Sep 22 07:34:10 2008 From: oren at mellanox.co.il (oren at mellanox.co.il) Date: Mon, 22 Sep 2008 17:34:10 +0300 Subject: [ofa-general][PATCH 2/3] mlx4: Register VLAN bugfix Message-ID: mlx4_core: Fix VLAN registration Signed-off-by: Oren Duer Index: ofed_kernel-2.6.18-EL5.1.orig/drivers/net/mlx4/en_port.c 
=================================================================== --- ofed_kernel-2.6.18-EL5.1.orig.orig/drivers/net/mlx4/en_port.c 2008-09-04 15:00:59.497911000 +0300 +++ ofed_kernel-2.6.18-EL5.1.orig/drivers/net/mlx4/en_port.c 2008-09-04 15:01:16.979282000 +0300 @@ -129,6 +129,10 @@ int mlx4_SET_PORT_qpn_calc(struct mlx4_d context->base_qpn = cpu_to_be32(base_qpn); context->promisc = cpu_to_be32(promisc << SET_PORT_PROMISC_SHIFT | base_qpn); context->mcast = cpu_to_be32(1 << SET_PORT_PROMISC_SHIFT | base_qpn); + context->intra_no_vlan = 0; + context->no_vlan = MLX4_NO_VLAN_IDX; + context->intra_vlan_miss = 0; + context->vlan_miss = MLX4_VLAN_MISS_IDX; in_mod = MLX4_SET_PORT_RQP_CALC << 8 | port; err = mlx4_cmd(dev, mailbox->dma, in_mod, 1, MLX4_CMD_SET_PORT, Index: ofed_kernel-2.6.18-EL5.1.orig/drivers/net/mlx4/en_port.h =================================================================== --- ofed_kernel-2.6.18-EL5.1.orig.orig/drivers/net/mlx4/en_port.h 2008-09-04 15:00:59.500913000 +0300 +++ ofed_kernel-2.6.18-EL5.1.orig/drivers/net/mlx4/en_port.h 2008-09-04 15:01:16.984285000 +0300 @@ -75,9 +75,9 @@ struct mlx4_set_port_rqp_calc_context { __be32 flags; u8 reserved[3]; u8 mac_miss; - u8 reserved2; + u8 intra_no_vlan; u8 no_vlan; - u8 reserved3; + u8 intra_vlan_miss; u8 vlan_miss; u8 reserved4[3]; u8 no_vlan_prio; Index: ofed_kernel-2.6.18-EL5.1.orig/drivers/net/mlx4/mlx4.h =================================================================== --- ofed_kernel-2.6.18-EL5.1.orig.orig/drivers/net/mlx4/mlx4.h 2008-09-04 15:01:13.697393000 +0300 +++ ofed_kernel-2.6.18-EL5.1.orig/drivers/net/mlx4/mlx4.h 2008-09-04 15:01:31.944725000 +0300 @@ -267,7 +267,7 @@ struct mlx4_mac_table { }; struct mlx4_vlan_table { -#define MLX4_MAX_VLAN_NUM 126 +#define MLX4_MAX_VLAN_NUM 128 #define MLX4_VLAN_MASK 0xfff #define MLX4_VLAN_VALID (1 << 31) #define MLX4_VLAN_TABLE_SIZE (MLX4_MAX_VLAN_NUM << 2) Index: ofed_kernel-2.6.18-EL5.1.orig/drivers/net/mlx4/port.c 
=================================================================== --- ofed_kernel-2.6.18-EL5.1.orig.orig/drivers/net/mlx4/port.c 2008-09-04 15:00:59.507914000 +0300 +++ ofed_kernel-2.6.18-EL5.1.orig/drivers/net/mlx4/port.c 2008-09-04 15:01:16.995288000 +0300 @@ -56,7 +56,7 @@ void mlx4_init_vlan_table(struct mlx4_de int i; sema_init(&table->vlan_sem, 1); - for (i = 0; i < MLX4_MAX_MAC_NUM; i++) { + for (i = 0; i < MLX4_MAX_VLAN_NUM; i++) { table->entries[i] = 0; table->refs[i] = 0; } @@ -185,7 +185,7 @@ int mlx4_register_vlan(struct mlx4_dev * int free = -1; down(&table->vlan_sem); - for (i = 0; i < MLX4_MAX_VLAN_NUM; i++) { + for (i = MLX4_VLAN_REGULAR; i < MLX4_MAX_VLAN_NUM; i++) { if (free < 0 && (table->refs[i] == 0)) { free = i; continue; @@ -231,6 +231,11 @@ void mlx4_unregister_vlan(struct mlx4_de { struct mlx4_vlan_table *table = &mlx4_priv(dev)->port[port].vlan_table; + if (index < MLX4_VLAN_REGULAR) { + mlx4_warn(dev, "Trying to free special vlan index %d\n", index); + return; + } + down(&table->vlan_sem); if (!table->refs[index]) { mlx4_warn(dev, "No vlan entry for index %d\n", index); Index: ofed_kernel-2.6.18-EL5.1.orig/include/linux/mlx4/device.h =================================================================== --- ofed_kernel-2.6.18-EL5.1.orig.orig/include/linux/mlx4/device.h 2008-09-04 15:00:59.514915000 +0300 +++ ofed_kernel-2.6.18-EL5.1.orig/include/linux/mlx4/device.h 2008-09-04 15:01:17.003285000 +0300 @@ -162,6 +162,12 @@ enum { MLX4_NUM_FEXCH = 64 * 1024, }; +enum mlx4_special_vlan_idx { + MLX4_NO_VLAN_IDX = 0, + MLX4_VLAN_MISS_IDX, + MLX4_VLAN_REGULAR +}; + #define MLX4_LEAST_ATTACHED_VECTOR 0xffffffff static inline u64 mlx4_fw_ver(u64 major, u64 minor, u64 subminor) From oren at mellanox.co.il Mon Sep 22 07:34:29 2008 From: oren at mellanox.co.il (oren at mellanox.co.il) Date: Mon, 22 Sep 2008 17:34:29 +0300 Subject: [ofa-general][PATCH 3/3] mlx4: Reserve FEXCH per port Message-ID: mlx4_core: Double the amount of reserved QPs for FEXCH 
Needs to be per-port

Signed-off-by: Oren Duer

Index: ofed_kernel-2.6.18-EL5.1/drivers/net/mlx4/main.c
===================================================================
--- ofed_kernel-2.6.18-EL5.1.orig/drivers/net/mlx4/main.c 2008-09-04 13:41:36.000000000 +0300
+++ ofed_kernel-2.6.18-EL5.1/drivers/net/mlx4/main.c 2008-09-04 13:42:11.126779000 +0300
@@ -76,12 +76,12 @@ static char mlx4_version[] __devinitdata
 	DRV_VERSION " (" DRV_RELDATE ")\n";
 
 static struct mlx4_profile default_profile = {
-	.num_qp = 1 << 17,
+	.num_qp = 1 << 18,
 	.num_srq = 1 << 16,
 	.rdmarc_per_qp = 1 << 4,
 	.num_cq = 1 << 16,
 	.num_mcg = 1 << 13,
-	.num_mpt = 1 << 18,
+	.num_mpt = 1 << 19,
 	.num_mtt = 1 << 20,
 };
 
Index: ofed_kernel-2.6.18-EL5.1/include/linux/mlx4/device.h
===================================================================
--- ofed_kernel-2.6.18-EL5.1.orig/include/linux/mlx4/device.h 2008-09-03 17:56:42.000000000 +0300
+++ ofed_kernel-2.6.18-EL5.1/include/linux/mlx4/device.h 2008-09-04 13:46:07.224047000 +0300
@@ -159,7 +159,7 @@ enum mlx4_port_type {
 };
 
 enum {
-	MLX4_NUM_FEXCH = 64 * 1024,
+	MLX4_NUM_FEXCH = MLX4_MAX_PORTS * 64 * 1024,
 };
 
 enum mlx4_special_vlan_idx {

From tziporet at mellanox.co.il Mon Sep 22 08:00:03 2008
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Mon, 22 Sep 2008 18:00:03 +0300
Subject: [ofa-general] OFED meeting agenda for today (Sep 22)
Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD904A32@mtlexch01.mtl.com>

> Agenda for OFED meeting today on OFED 1.4 status:
>
> 1. High priority bugs review (see attached)
>

1128 blo Othe stefan.roscher at de.ibm.com  release IPoIB-CM QP resources in flushing CQE context
1192 cri RHEL alekseys at voltaire.com      Failed to comple ofa_kernel RPM on RH4 up7, ia64 arch'
1215 cri Othe bugzilla at openib.org        The driver is not recognizing the device under CentOS 5.2...
1199 cri RHEL dorons at voltaire.com        tvflash and tgt-generic fail to build if --prefix is used
1188 cri RHEL eli at mellanox.co.il        kernel panic in neigh_destroy during ib_ipoib unload
1191 cri SLES eli at mellanox.co.il        Failed to load ib_ipoib on sles10 sp2, ia64
1196 cri SLES eli at mellanox.co.il        rds module is in use right after installation and thus op...
1113 cri RHEL vu at mellanox.com           rpm -e scsi-target-utils-0.1-2008715 fails
1117 cri SLES yannick.cote at qlogic.com    ib_ipath module hangs on unload
1186 cri RHEL yosefe at voltaire.com        lockup in IPoIB during module unload
1198 cri SLES yosefe at voltaire.com        hang during ipoib create_child/ifdown
1164 maj SLES eli at mellanox.co.il        iperf over IPoIB fails for 100 tcp connections
1183 maj SLES tziporet at mellanox.co.il   higest priority doesn't become the MASTER opensm
1099 maj All  vlad at mellanox.co.il       IPoIB IPv6 does not work
1153 maj Othe yosefe at voltaire.com        OpenSM- Multicast group will not open when IB host is the...

> 2. Review OFED 1.4 status of each company and decide on rc3 date

My suggestion - October 7

> 3. Open discussion
>
> Tziporet
>

From rdreier at cisco.com Mon Sep 22 08:28:52 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 22 Sep 2008 08:28:52 -0700
Subject: [ofa-general][PATCH 0/3] mlx4: fixes to support mlx4_fc driver
In-Reply-To: (oren@mellanox.co.il's message of "Mon, 22 Sep 2008 17:32:45 +0300")
References:
Message-ID:

> Patch 2: Fixes a bug in VLANs registration. The hardware VLANs table starts at entry 2, with 0 and 1 reserved for no-vlan and vlan-miss special cases.
> Patch 3: Fixes a bug in reserving FEXCH QPs and MPTs for the mlx4_fc driver. The reserved range should be twice as large, as it is per-port.

Since I haven't applied any of these original patches, can you please just roll the fixes into the original patches and repost the series?
Also re-reading the patches and fixing the obvious typos wouldn't hurt either. Thanks, Roland From rdreier at cisco.com Mon Sep 22 08:32:31 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 22 Sep 2008 08:32:31 -0700 Subject: [ofa-general] ipoib crashes with 2.6.27-rc7 In-Reply-To: <48D75E5A.7010908@voltaire.com> (Or Gerlitz's message of "Mon, 22 Sep 2008 11:59:06 +0300") References: <48D75E5A.7010908@voltaire.com> Message-ID: > Attempting to set an ipoib / partitioning bonding environment with > 2.6.27-rc7 , I came a cross few ipoib crashes, eg these two oops > listings. I understand that some patches were sent by Yossi just > recently so they may help, or do they fall into the > non-regression-from-2.6.26 category? Well, are they regressions or not? Have you debugged them at all? The second one looks like the uninitialized timer bug, which we have a patch for but which has been there for a long time. - R. From arlin.r.davis at intel.com Mon Sep 22 09:05:30 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Mon, 22 Sep 2008 09:05:30 -0700 Subject: [ofa-general] dapltest couldn't read ABI version In-Reply-To: References: Message-ID: > CMA: unable to open /dev/infiniband/rdma_cm > LS21-05:32561: open_hca: ERR - RDMA channel No such file or directory > LS21-05:32561: dapls_ib_open_hca failed 40000 > DT_cs_Server: Could not open ib0 (DAT_INTERNAL_ERROR ) > DT_cs_Server: Waiting for clients to all go away... > DT_cs_Server: Cleaning up ... > DT_cs_Server (ib0): Exiting. > > File /dev/infiniband/rdma_cm exists on RHEL servers, but not on SLES servers. ib0 can communicate with other servres. Any advice or comments? Thanks in advance! Will anyone give some advice? Thanks in advance. Here are the different file locations on my SLES and RHEL nodes. Looks like rdma_ucm module didn't get loaded. 
Check /etc/infiniband/openib.conf and make sure loading rdma_ucm is set properly:

[root at cst-53 dapl]# cat /etc/infiniband/openib.conf
# Start HCA driver upon boot
ONBOOT=yes

# Load RDMA_CM module
RDMA_CM_LOAD=yes

# Load RDMA_UCM module
RDMA_UCM_LOAD=yes

-arlin

From ronniz at mellanox.co.il Mon Sep 22 09:46:19 2008
From: ronniz at mellanox.co.il (Ronni Zimmermann)
Date: Mon, 22 Sep 2008 19:46:19 +0300
Subject: [ofa-general] atomic operations on ppc64
In-Reply-To: <6978b4af0809190739l155f81b7q9668e17406f83eca@mail.gmail.com>
Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD904AF6@mtlexch01.mtl.com>

Hi,

We run tests which use atomic operations (both fetch-and-add and compare-and-swap) on PPC64 all the time, without experiencing any problem. Just to make sure, I ran a few simple tests which use atomic operations on our PPC64 machines, both with SLES10 SP1 and with RHAS5.1, and all of them passed. I was working with the latest OFED1.4 driver and an mlx4 HCA with the latest released FW and with FW 2.3.000 (on the SLES10 SP1 machine).

Given the above information, I believe that there's either a problem with your code (although looking at the code you posted I couldn’t see anything wrong) or it's an OFED1.2.5 issue, as Dotan suggested.

Ronni.

-----Original Message-----
From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Rui Machado
Sent: Friday, 19 September 2008 17:39
To: Dotan Barak
Cc: general at lists.openfabrics.org
Subject: Re: [ofa-general] atomic operations on ppc64

2008/9/19 Dotan Barak :
>
>>> It seems that you didn't check the same HCA in each arch.
>>>
>>> Can you try to use the mthca device in PPC64 and check the results?
>>>
>>> (Anyway, i would have suggest you to upgrade the OFED package that
>>> you are
>>> using)
>>>
>>
>> I apologize for my newbieness here but I do not understand what you mean.
>> The two machines have different devices. What do you mean by making >> the ppc64 machine use the mthca when it does only have the mlx4? >> >> Unfortunately, upgrading might not be an (easy) option :-/ >> >> Thanks for the patience >> > > You have 2 machines, each with another device: > x86 has the HCA 25204 > PPC64 has the HCA 25418 > > > The fact that the test didn't pass for you on the PPC64 may be related > to this device (i think that OFED 1.2.5 was the first version that > supported this device). > > So, i suggest that you'll put the HCA 25204 on the PPC64 and check if > the failure still exists. > > > Dotan > Right :) Now I get it! I couldn't (yet at least) put the HCA 25204 on the ppc64. But I tried again with another ppc64 machine. ibv_devinfo hca_id: mthca0 fw_ver: 4.8.200 node_guid: 0005:ad00:001d:cd24 sys_image_guid: 0005:ad00:0100:d050 vendor_id: 0x05ad vendor_part_id: 25208 hw_ver: 0xA0 board_id: HCA.HSDC.A0 phys_port_cnt: 2 port: 1 state: PORT_ACTIVE (4) max_mtu: 2048 (4) active_mtu: 2048 (4) sm_lid: 142 port_lid: 156 port_lmc: 0x01 port: 2 state: PORT_DOWN (1) max_mtu: 2048 (4) active_mtu: 512 (2) sm_lid: 0 port_lid: 0 port_lmc: 0x00 The result was still the same as before: ppc64-ppc64 - the counter doesn't get updated. ppc64-x86 - clients hangs at poll Just as an extra info... _______________________________________________ general mailing list general at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From yannick.cote at qlogic.com Mon Sep 22 10:18:41 2008 From: yannick.cote at qlogic.com (Yannick Cote) Date: Mon, 22 Sep 2008 10:18:41 -0700 Subject: [ofa-general] Re: [PATCH] IB/ipath: ib_ipath module hangs on unload In-Reply-To: References: <1221865383-29438-1-git-send-email-yannick.cote@qlogic.com> Message-ID: <48D7D371.8010202@qlogic.com> Roland Dreier wrote: > > Please consider this for 2.6.27. 
> > Is this a regression from 2.6.26? It doesn't seem so to me. My first > reaction is that this problem is not severe enough to merit trying to > get the patch into 2.6.27. > > - R. > No it isn't. Being queued up for 2.6.28 is fine. - Yan From michael.heinz at qlogic.com Mon Sep 22 11:42:08 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Mon, 22 Sep 2008 13:42:08 -0500 Subject: [ofa-general] Allowing end-users to query for fabric information Message-ID: I recently noticed that OFED prohibits non-root users from doing sa queries. This means that non-root users cannot, for example, diagnose connectivity problems without the help of an administrator. What was the reason for making this design choice? While I could certainly provide boot scripts to change the permissions to /dev/infiniband/umad*, I'd rather understand why the decision was made to restrict access. -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -------------- next part -------------- An HTML attachment was scrubbed... URL: From roger at terascala.com Mon Sep 22 11:43:26 2008 From: roger at terascala.com (Roger Spellman) Date: Mon, 22 Sep 2008 14:43:26 -0400 Subject: [ofa-general] Intermittent: ib0: multicast join failed References: <2C7DE72B9BD00F44BAECA5B0CBB873953217F5@hermes.terascala.com> <20080919170622.GI27236@sashak.voltaire.com> <2C7DE72B9BD00F44BAECA5B0CBB87395321855@hermes.terascala.com> Message-ID: <2C7DE72B9BD00F44BAECA5B0CBB873953218BE@hermes.terascala.com> Thanks, Hal. Below is the output to ibstat and ibstatus. It shows that the rate is 2.5 Gb/sec, rather than 10 Gb/sec. Is there a way to get it to renegotiate the rate, short of rebooting? 
[root at ts-raid6-03 lib64]# ibstat CA 'mthca0' CA type: MT25204 Number of ports: 1 Firmware version: 1.2.936 Hardware version: a0 Node GUID: 0x0002c9020026e4c0 System image GUID: 0x0002c9020026e4c3 Port 1: State: Active Physical state: LinkUp Rate: 2 Base lid: 19 LMC: 0 SM lid: 1 Capability mask: 0x02510a68 Port GUID: 0x0002c9020026e4c1 [root at ts-raid6-03 lib64]# ibstatus Infiniband device 'mthca0' port 1 status: default gid: fe80:0000:0000:0000:0002:c902:0026:e4c1 base lid: 0x13 sm lid: 0x1 state: 4: ACTIVE phys state: 5: LinkUp rate: 2.5 Gb/sec (1X) > It's likely a rate issue where the negotiated port rate is not the > broadcast group rate. > What does ibstat or ibstatus show when the join fails ? Also, what > about saquery -g ? > > > > Rebooting the node that failed to join the group always seems to solve > > the problem. > Yes, that's consistent with the negotiated rate being a problem. > -- Hal From hal.rosenstock at gmail.com Mon Sep 22 11:50:03 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Mon, 22 Sep 2008 14:50:03 -0400 Subject: [ofa-general] Intermittent: ib0: multicast join failed In-Reply-To: <2C7DE72B9BD00F44BAECA5B0CBB873953218BE@hermes.terascala.com> References: <2C7DE72B9BD00F44BAECA5B0CBB873953217F5@hermes.terascala.com> <20080919170622.GI27236@sashak.voltaire.com> <2C7DE72B9BD00F44BAECA5B0CBB87395321855@hermes.terascala.com> <2C7DE72B9BD00F44BAECA5B0CBB873953218BE@hermes.terascala.com> Message-ID: On Mon, Sep 22, 2008 at 2:43 PM, Roger Spellman wrote: > Thanks, Hal. > > Below is the output to ibstat and ibstatus. It shows that the rate is > 2.5 Gb/sec, rather than 10 Gb/sec. > > Is there a way to get it to renegotiate the rate, short of rebooting? Try ibportstate reset on the switch peer port. You could also replug the cable on that link. 
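The rate mismatch discussed in this thread comes down to simple arithmetic: a link's rate is its lane count times the per-lane signaling rate, so a 1X SDR link runs at 2.5 Gb/sec while a 4X SDR link runs at 10 Gb/sec. The following is only a minimal sketch of that reasoning, not code from IPoIB or OpenSM. `link_rate_mbps` and `can_join_group` are hypothetical helper names, and the join rule (port rate must be at least the group rate) is an assumption based on the diagnosis given in the thread.

```c
#include <assert.h>

/* Per-lane SDR signaling rate in Mb/sec (2.5 Gb/sec). */
#define SDR_LANE_MBPS 2500

/* Hypothetical helper: link rate = lane count * per-lane rate. */
static int link_rate_mbps(int lanes, int lane_mbps)
{
	assert(lanes > 0 && lane_mbps > 0);
	return lanes * lane_mbps;
}

/*
 * Hypothetical check, assuming a port may join a multicast group
 * only when its negotiated rate is at least the group's rate.
 * Returns 1 if the join would be allowed, 0 otherwise.
 */
static int can_join_group(int port_mbps, int group_mbps)
{
	return port_mbps >= group_mbps;
}
```

Under these assumptions, a port that trained at 1X SDR (2500 Mb/sec) would be refused by a 4X SDR group (10000 Mb/sec), which would match the intermittent join failure reported above.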
> [root at ts-raid6-03 lib64]# ibstat
> CA 'mthca0'
>         CA type: MT25204
>         Number of ports: 1
>         Firmware version: 1.2.936
>         Hardware version: a0
>         Node GUID: 0x0002c9020026e4c0
>         System image GUID: 0x0002c9020026e4c3
>         Port 1:
>                 State: Active
>                 Physical state: LinkUp
>                 Rate: 2
>                 Base lid: 19
>                 LMC: 0
>                 SM lid: 1
>                 Capability mask: 0x02510a68
>                 Port GUID: 0x0002c9020026e4c1
> [root at ts-raid6-03 lib64]# ibstatus
> Infiniband device 'mthca0' port 1 status:
>         default gid:     fe80:0000:0000:0000:0002:c902:0026:e4c1
>         base lid:        0x13
>         sm lid:          0x1
>         state:           4: ACTIVE
>         phys state:      5: LinkUp
>         rate:            2.5 Gb/sec (1X)
>
>
>
>
>> It's likely a rate issue where the negotiated port rate is not the
>> broadcast group rate.

Yes, it's a rate problem: the link is coming up at 1X SDR, which is 2.5 Gbps, whereas I suspect that the group is 10 Gbps, so it can't join.

-- Hal

>> What does ibstat or ibstatus show when the join fails ? Also, what
>> about saquery -g ?
>
>> >
>> > Rebooting the node that failed to join the group always seems to
> solve
>> > the problem.
>
>> Yes, that's consistent with the negotiated rate being a problem.
>
>> -- Hal
>

From rdreier at cisco.com Mon Sep 22 12:17:54 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 22 Sep 2008 12:17:54 -0700
Subject: [ofa-general] Allowing end-users to query for fabric information
In-Reply-To: (Mike Heinz's message of "Mon, 22 Sep 2008 13:42:08 -0500")
References:
Message-ID:

> What was the reason for making this design choice? While I could
> certainly provide boot scripts to change the permissions to
> /dev/infiniband/umad*, I'd rather understand why the decision was made
> to restrict access.

because /dev/infiniband/umadX allows full unfiltered access to send/receive any MADs. Including changing routing tables, bringing ports down, etc. Not stuff that unprivileged users should be able to do.
It would make sense to have a higher-level interface that only allows safe queries without side effects, but that's quite a bit more work than just changing permissions on device nodes. - R. From michael.heinz at qlogic.com Mon Sep 22 12:18:56 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Mon, 22 Sep 2008 14:18:56 -0500 Subject: [ofa-general] Allowing end-users to query for fabric information In-Reply-To: References: Message-ID: Thanks for the explanation. -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -----Original Message----- From: Roland Dreier [mailto:rdreier at cisco.com] Sent: Monday, September 22, 2008 3:18 PM To: Mike Heinz Cc: general at lists.openfabrics.org Subject: Re: [ofa-general] Allowing end-users to query for fabric information > What was the reason for making this design choice? While I could > certainly provide boot scripts to change the permissions to > /dev/infiniband/umad*, I'd rather understand why the decision was made > to restrict access. because /dev/infiniband/umadX allows full unfiltered access to send/receive any MADs. Including changing routing tables, bringing ports down, etc. Not stuff that unprivileged users should be able to do. It would make sense to have a higher-level interface that only allows safe queries without side effects, but that's quite a bit more work than just changing permissions on device nodes. - R. From hal.rosenstock at gmail.com Mon Sep 22 12:19:07 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Mon, 22 Sep 2008 15:19:07 -0400 Subject: ***SPAM*** Re: [ofa-general] Allowing end-users to query for fabric information In-Reply-To: References: Message-ID: On Mon, Sep 22, 2008 at 2:42 PM, Mike Heinz wrote: > I recently noticed that OFED prohibits non-root users from doing sa queries. > This means that non-root users cannot, for example, diagnose connectivity > problems without the help of an administrator. 
>
> What was the reason for making this design choice?

It was decided as the default to limit what non-root users can do to the fabric as a primitive security measure. It controls more than just SA queries.

> While I could certainly
> provide boot scripts to change the permissions to /dev/infiniband/umad*, I'd
> rather understand why the decision was made to restrict access.

udev can easily be changed for a configuration that desires different defaults.

-- Hal

>
> --
> Michael Heinz
> Principal Engineer, Qlogic Corporation
> King of Prussia, Pennsylvania
>

From jon at opengridcomputing.com Mon Sep 22 13:43:31 2008
From: jon at opengridcomputing.com (Jon Mason)
Date: Mon, 22 Sep 2008 15:43:31 -0500
Subject: [ofa-general] [PATCH] iw_cxgb3: populate active_mtu in ib_port_attr
Message-ID: <20080922204330.GA3943@opengridcomputing.com>

When running ibv_devinfo, the active_mtu returned is garbage. This is due to the field not being populated in the query_port function in the driver. The patch below populates the active_mtu field with a MTU of 2k. It also zeros the struct, so that any new additions to it will return 0.
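The zero-then-populate pattern described in this patch can be sketched in plain user-space C. This is an illustration only and not the driver code: `struct port_attr`, its fields, and `query_port` are hypothetical stand-ins for the kernel's `struct ib_port_attr` and the driver's query function, and the MTU values are plain integers rather than the kernel's enum constants.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a port attribute struct. */
struct port_attr {
	int max_mtu;
	int active_mtu;
	int lid;
	int new_field;	/* imagine this field was added to the struct later */
};

/* Hypothetical query function following the zero-then-populate pattern. */
static void query_port(struct port_attr *props)
{
	assert(props != NULL);

	/*
	 * Zero the whole struct first, so any field this function does not
	 * set explicitly (including fields added later) reads back as 0
	 * instead of whatever the caller's memory happened to contain.
	 */
	memset(props, 0, sizeof(*props));

	props->max_mtu = 4096;
	props->active_mtu = 2048;
}
```

The design choice is that explicit `= 0` assignments must be kept in sync with the struct definition by hand, while a single `memset` covers every field automatically.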
Thanks, Jon Signed-Off-By: Jon Mason diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index eb778bf..ecff980 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -1155,13 +1155,11 @@ static int iwch_query_port(struct ib_device *ibdev, u8 port, struct ib_port_attr *props) { PDBG("%s ibdev %p\n", __func__, ibdev); + + memset(props, 0, sizeof(struct ib_port_attr)); props->max_mtu = IB_MTU_4096; - props->lid = 0; - props->lmc = 0; - props->sm_lid = 0; - props->sm_sl = 0; + props->active_mtu = IB_MTU_2048; props->state = IB_PORT_ACTIVE; - props->phys_state = 0; props->port_cap_flags = IB_PORT_CM_SUP | IB_PORT_SNMP_TUNNEL_SUP | @@ -1170,7 +1168,6 @@ static int iwch_query_port(struct ib_device *ibdev, IB_PORT_VENDOR_CLASS_SUP | IB_PORT_BOOT_MGMT_SUP; props->gid_tbl_len = 1; props->pkey_tbl_len = 1; - props->qkey_viol_cntr = 0; props->active_width = 2; props->active_speed = 2; props->max_msg_sz = -1; From or.gerlitz at gmail.com Mon Sep 22 13:46:40 2008 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Mon, 22 Sep 2008 23:46:40 +0300 Subject: ***SPAM*** Re: [ofa-general] ipoib crashes with 2.6.27-rc7 In-Reply-To: References: <48D75E5A.7010908@voltaire.com> Message-ID: <15ddcffd0809221346n3ea620a9n8f2213db8079a162@mail.gmail.com> On Mon, Sep 22, 2008 at 6:32 PM, Roland Dreier wrote: > Well, are they regressions or not? I haven't try the same scheme with 2.6.26, as for the bug-being-a-regression criteria, can you clarify the current policy, are all post (say) -rc2/3 patches should be only fixes to regressions, or is it anything post rc1? is this a new convention? Have you debugged them at all? no, I didn't have the time for that nor for setting a 2.6.26 environment. Indeed the second one seems to be targeted by the patch Yossi sent. As for the first one, I'll double check if it happens also without an child interfaces etc. Or. 

From or.gerlitz at gmail.com Mon Sep 22 14:00:48 2008
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Tue, 23 Sep 2008 00:00:48 +0300
Subject: Re: [ofa-general] ipoib crashes with 2.6.27-rc7
In-Reply-To:
References: <48D75E5A.7010908@voltaire.com>
Message-ID: <15ddcffd0809221400w5977704ble47b5d1bb022414d@mail.gmail.com>

On Mon, Sep 22, 2008 at 6:32 PM, Roland Dreier wrote:

> Well, are they regressions or not?

Over the last weeks I think that there were few patches which fixed something which resolved not to be a regression, so are they all queued now in the for-next branch of your tree?

Is your review for create-qp-expanded thing being dependent on the xrc patch series merge? I understand that Ron was fixing things as was requested but the patches are still not queued for merge. We have now few applications that need cq moderation from user space, but I don't see a point to let someone work on that since the patch would be dependent on the qp-expanded patches which in turn are not in yet and so on.

Or.

From swise at opengridcomputing.com Mon Sep 22 14:22:16 2008
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 22 Sep 2008 16:22:16 -0500
Subject: [ofa-general] Re: [PATCH] iw_cxgb3: populate active_mtu in ib_port_attr
In-Reply-To: <20080922204330.GA3943@opengridcomputing.com>
References: <20080922204330.GA3943@opengridcomputing.com>
Message-ID: <48D80C88.5080001@opengridcomputing.com>

Acked-by: Steve Wise

Jon Mason wrote:
> When running ibv_devinfo, the active_mtu returned is garbage. This is
> due to the field not being populated in the query_port function in the
> driver. The patch below populates the active_mtu field with a MTU of
> 2k. It also zeros the struct, so that any new additions to it will
> return 0.
> > Thanks, > Jon > > Signed-Off-By: Jon Mason > > diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c > index eb778bf..ecff980 100644 > --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c > +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c > @@ -1155,13 +1155,11 @@ static int iwch_query_port(struct ib_device *ibdev, > u8 port, struct ib_port_attr *props) > { > PDBG("%s ibdev %p\n", __func__, ibdev); > + > + memset(props, 0, sizeof(struct ib_port_attr)); > props->max_mtu = IB_MTU_4096; > - props->lid = 0; > - props->lmc = 0; > - props->sm_lid = 0; > - props->sm_sl = 0; > + props->active_mtu = IB_MTU_2048; > props->state = IB_PORT_ACTIVE; > - props->phys_state = 0; > props->port_cap_flags = > IB_PORT_CM_SUP | > IB_PORT_SNMP_TUNNEL_SUP | > @@ -1170,7 +1168,6 @@ static int iwch_query_port(struct ib_device *ibdev, > IB_PORT_VENDOR_CLASS_SUP | IB_PORT_BOOT_MGMT_SUP; > props->gid_tbl_len = 1; > props->pkey_tbl_len = 1; > - props->qkey_viol_cntr = 0; > props->active_width = 2; > props->active_speed = 2; > props->max_msg_sz = -1; > From rdreier at cisco.com Mon Sep 22 14:44:38 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 22 Sep 2008 14:44:38 -0700 Subject: [ofa-general] ipoib crashes with 2.6.27-rc7 In-Reply-To: <15ddcffd0809221346n3ea620a9n8f2213db8079a162@mail.gmail.com> (Or Gerlitz's message of "Mon, 22 Sep 2008 23:46:40 +0300") References: <48D75E5A.7010908@voltaire.com> <15ddcffd0809221346n3ea620a9n8f2213db8079a162@mail.gmail.com> Message-ID: > I haven't try the same scheme with 2.6.26, as for the bug-being-a-regression > criteria, can you clarify the current policy, are all post (say) -rc2/3 > patches should be only fixes to regressions, or is it anything post rc1? is > this a new convention? Linus has been being stricter about what we merge late in the release cycle lately. Definitely if we are going to add a patch after rc7, it better be a very important fix that we're very sure of. - R. 
From rdreier at cisco.com Mon Sep 22 14:46:50 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 22 Sep 2008 14:46:50 -0700 Subject: [ofa-general] ipoib crashes with 2.6.27-rc7 In-Reply-To: <15ddcffd0809221400w5977704ble47b5d1bb022414d@mail.gmail.com> (Or Gerlitz's message of "Tue, 23 Sep 2008 00:00:48 +0300") References: <48D75E5A.7010908@voltaire.com> <15ddcffd0809221400w5977704ble47b5d1bb022414d@mail.gmail.com> Message-ID: > Over the last weeks I think that there were few patches which fixed > something which resolved not to be a regression, so are they all queued now > in the for-next branch of your tree? Most are queued -- there are some I still haven't merged yet and have sitting in my mailbox. > Is your review for create-qp-expanded thing being dependent on the xrc patch > series merge? Yes, because we can't add the libibverbs changes until XRC is ready because of ABI issues. So there's no sane way to get the kernel side of things done first either. This is mostly a mess caused by rushing XRC into OFED too fast. - R. From wangwhao at cn.ibm.com Mon Sep 22 19:24:17 2008 From: wangwhao at cn.ibm.com (Wen Hao Wang) Date: Tue, 23 Sep 2008 10:24:17 +0800 Subject: [ofa-general] dapltest couldn't read ABI version In-Reply-To: Message-ID: > Looks like rdma_ucm module didn't get loaded. Check /etc/infiniband/openib.conf and make sure loading rdma_ucm is set properly: > > [root at cst-53 dapl]# cat /etc/infiniband/openib.conf > # Start HCA driver upon boot > ONBOOT=yes > ># Load RDMA_CM module > RDMA_CM_LOAD=yes > > # Load RDMA_UCM module > RDMA_UCM_LOAD=yes > -arlin Thanks! After I reloaded all the modules including rdma_ucm on SLES, dapltest can be executed now. Wen Hao Wang Email: wangwhao at cn.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From arlin.r.davis at intel.com Mon Sep 22 21:13:19 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Mon, 22 Sep 2008 21:13:19 -0700 Subject: [ofa-general] [ANNOUNCE] new release of IB CM library - libibcm 1.0.4 Message-ID: libibcm 1.0.4 is now available from: http://www.openfabrics.org/downloads/rdmacm/ md5sum: 8f0c6930d638ef965b78dd48db4c6fa7 libibcm-1.0.4.tar.gz Most relevant change: libibcm - missing definition for __constant_cpu_to_be64, include OFED 1.4: Please use this version in the latest OFED release. -arlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From ogerlitz at voltaire.com Mon Sep 22 22:06:15 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 23 Sep 2008 08:06:15 +0300 Subject: [ofa-general] ipoib crashes with 2.6.27-rc7 In-Reply-To: References: <48D75E5A.7010908@voltaire.com> <15ddcffd0809221400w5977704ble47b5d1bb022414d@mail.gmail.com> Message-ID: <48D87947.6020607@voltaire.com> Roland Dreier wrote: > > Is your review for create-qp-expanded thing being dependent on the xrc patch > > series merge? > > Yes, because we can't add the libibverbs changes until XRC is ready because of ABI issues. So there's no sane way to get the kernel side of things done first either. > > This is mostly a mess caused by rushing XRC into OFED too fast. Anything we can do for helping getting out of this mess? from what I recall, Jack was very responsive to comments on his patches. Or. From wangwhao at cn.ibm.com Mon Sep 22 22:25:38 2008 From: wangwhao at cn.ibm.com (Wen Hao Wang) Date: Tue, 23 Sep 2008 13:25:38 +0800 Subject: [ofa-general] ibsysstat cpu output is incomplete Message-ID: Hi all: I find the output if "ibsysstat cpu" is not complete. This issue exists on all my cluster nodes, with RHEL/SLES and OFED 1.3.1/1.4-RC1 installed. 
[root at xblade07 ~]# ibsysstat 13 cpu cpu 0: model Genuine Intel(R) CPU @ 2.83GHz MHZ 2833.512 cpu 1: model Genuine Intel(R) CPU @ 2.83GHz MHZ 2833.512 cpu 2: model Genuine Intel(R) CPU @ 2.83GHz M ---------------------> something missed [root at xblade07 ~]# ibsysstat 13 ping sysstat ping succeeded [root at xblade07 ~]# ibsysstat 13 host xblade03 [root at xblade07 ~]# ssh xblade03 cat /proc/cpuinfo root at xblade03's password: processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Genuine Intel(R) CPU @ 2.83GHz stepping : 4 cpu MHz : 2833.512 cache size : 6144 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 4 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm bogomips : 5669.08 clflush size : 64 cache_alignment : 64 address sizes : 38 bits physical, 48 bits virtual power management: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Genuine Intel(R) CPU @ 2.83GHz stepping : 4 cpu MHz : 2833.512 cache size : 6144 KB physical id : 1 siblings : 4 core id : 0 cpu cores : 4 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm bogomips : 5666.07 clflush size : 64 cache_alignment : 64 address sizes : 38 bits physical, 48 bits virtual power management: processor : 2 vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Genuine Intel(R) CPU @ 2.83GHz stepping : 4 cpu MHz : 2833.512 cache size : 6144 KB physical id : 0 siblings : 4 core id : 1 cpu cores : 4 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx 
fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm bogomips : 5666.07 clflush size : 64 cache_alignment : 64 address sizes : 38 bits physical, 48 bits virtual power management: processor : 3 vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Genuine Intel(R) CPU @ 2.83GHz stepping : 4 cpu MHz : 2833.512 cache size : 6144 KB physical id : 0 siblings : 4 core id : 2 cpu cores : 4 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm bogomips : 5666.13 clflush size : 64 cache_alignment : 64 address sizes : 38 bits physical, 48 bits virtual power management: processor : 4 vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Genuine Intel(R) CPU @ 2.83GHz stepping : 4 cpu MHz : 2833.512 cache size : 6144 KB physical id : 0 siblings : 4 core id : 3 cpu cores : 4 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm bogomips : 5666.13 clflush size : 64 cache_alignment : 64 address sizes : 38 bits physical, 48 bits virtual power management: processor : 5 vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Genuine Intel(R) CPU @ 2.83GHz stepping : 4 cpu MHz : 2833.512 cache size : 6144 KB physical id : 1 siblings : 4 core id : 1 cpu cores : 4 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm bogomips : 5666.09 clflush size : 64 cache_alignment : 64 address sizes : 38 bits physical, 48 bits virtual power 
management: processor : 6 vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Genuine Intel(R) CPU @ 2.83GHz stepping : 4 cpu MHz : 2833.512 cache size : 6144 KB physical id : 1 siblings : 4 core id : 2 cpu cores : 4 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm bogomips : 5666.16 clflush size : 64 cache_alignment : 64 address sizes : 38 bits physical, 48 bits virtual power management: processor : 7 vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Genuine Intel(R) CPU @ 2.83GHz stepping : 4 cpu MHz : 2833.512 cache size : 6144 KB physical id : 1 siblings : 4 core id : 3 cpu cores : 4 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm bogomips : 5666.15 clflush size : 64 cache_alignment : 64 address sizes : 38 bits physical, 48 bits virtual power management: The cluster contains IBM HS21 XM blades and LS21 blades. I have not tried OFED 1.4 RC2, because I failed to find any related fixed bug in RC2 fixed bug list. Is this one known limitation/bug, or I missed some needed configuration? Thanks in advance! Wen Hao Wang Email: wangwhao at cn.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dotanba at gmail.com Mon Sep 22 22:48:17 2008 From: dotanba at gmail.com (Dotan Barak) Date: Tue, 23 Sep 2008 08:48:17 +0300 Subject: [ofa-general] atomic operations on ppc64 In-Reply-To: <5D49E7A8952DC44FB38C38FA0D758EAD904AF6@mtlexch01.mtl.com> References: <6978b4af0809190739l155f81b7q9668e17406f83eca@mail.gmail.com> <5D49E7A8952DC44FB38C38FA0D758EAD904AF6@mtlexch01.mtl.com> Message-ID: <2f3bf9a60809222248x2f2116b4x5e3fe6ed08bd846c@mail.gmail.com> Rui: Do you access the memory as "volatile" (because the HCA changes its content)? Can you try to check if RDMA Write/Read behave the same for you? Dotan 2008/9/22 Ronni Zimmermann : > Hi, > We run tests which use atomic operations (both fetch and add and comp and swap) on PPC64 all the time, without experiencing any problem. > > Just to make sure I ran few simple tests, which use atomic operations, on our PPC64 machines, both with SLES10 SP1 and with RHAS5.1, and all of them passed. > I was working with the latest OFED1.4 driver and mlx4 HCA with the latest released FW and with FW 2.3.000 (on the SLES10 SP1 machine). > > Given the above information I believe that there's either a problem with your code (although looking at the code you posted I couldn't see anything wrong) or it's an OFED1.2.5 issue, as Dotan suggested. > > Ronni. > > > -----Original Message----- > From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Rui Machado > Sent: Fri, 19 September 2008 17:39 > To: Dotan Barak > Cc: general at lists.openfabrics.org > Subject: Re: [ofa-general] atomic operations on ppc64 > > 2008/9/19 Dotan Barak : >> >>>> It seems that you didn't check the same HCA in each arch. >>>> >>>> Can you try to use the mthca device in PPC64 and check the results? >>>> >>>> (Anyway, i would have suggest you to upgrade the OFED package that >>>> you are >>>> using) >>>> >>> >>> I apologize for my newbieness here but I do not understand what you mean. 
>>> The two machines have different devices. What do you mean by making >>> the ppc64 machine use the mthca when it does only have the mlx4? >>> >>> Unfortunately, upgrading might not be an (easy) option :-/ >>> >>> Thanks for the patience >>> >> >> You have 2 machines, each with another device: >> x86 has the HCA 25204 >> PPC64 has the HCA 25418 >> >> >> The fact that the test didn't pass for you on the PPC64 may be related >> to this device (i think that OFED 1.2.5 was the first version that >> supported this device). >> >> So, i suggest that you'll put the HCA 25204 on the PPC64 and check if >> the failure still exists. >> >> >> Dotan >> > > Right :) Now I get it! > > I couldn't (yet at least) put the HCA 25204 on the ppc64. > > But I tried again with another ppc64 machine. > > ibv_devinfo > hca_id: mthca0 > fw_ver: 4.8.200 > node_guid: 0005:ad00:001d:cd24 > sys_image_guid: 0005:ad00:0100:d050 > vendor_id: 0x05ad > vendor_part_id: 25208 > hw_ver: 0xA0 > board_id: HCA.HSDC.A0 > phys_port_cnt: 2 > port: 1 > state: PORT_ACTIVE (4) > max_mtu: 2048 (4) > active_mtu: 2048 (4) > sm_lid: 142 > port_lid: 156 > port_lmc: 0x01 > > port: 2 > state: PORT_DOWN (1) > max_mtu: 2048 (4) > active_mtu: 512 (2) > sm_lid: 0 > port_lid: 0 > port_lmc: 0x00 > > > The result was still the same as before: > ppc64-ppc64 - the counter doesn't get updated. > ppc64-x86 - clients hangs at poll > > Just as an extra info... 
> _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From eli at mellanox.co.il Tue Sep 23 01:39:56 2008 From: eli at mellanox.co.il (Eli Cohen) Date: Tue, 23 Sep 2008 11:39:56 +0300 Subject: [ofa-general] [PATCH] IB/ipoib: Fix typo in IPoIB CM service ID Message-ID: <20080923083956.GA14288@mtls03> Set the correct service ID, as defined by RFC 4755: +--------+--------+--------+--------+-------+-------+--------+-------+ |00000001| Type | Reserved | QPN | +--------+--------+--------+--------+-------+-------+--------+-------+ The "Type" field MUST be set to 0. The "Reserved" field MUST be set to zeroes. The QPN MUST be the UD QP exchanged during address resolution. This fixes bug #1121. 
Found by: Alex Estrin Signed-off-by: Eli Cohen --- drivers/infiniband/ulp/ipoib/ipoib_cm.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index 341ffed..a93ca17 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -55,7 +55,7 @@ MODULE_PARM_DESC(cm_data_debug_level, "Enable data path debug tracing for connected mode if > 0"); #endif -#define IPOIB_CM_IETF_ID 0x1000000000000000ULL +#define IPOIB_CM_IETF_ID 0x100000000000000ULL #define IPOIB_CM_RX_UPDATE_TIME (256 * HZ) #define IPOIB_CM_RX_TIMEOUT (2 * 256 * HZ) -- 1.6.0.2 From ogerlitz at voltaire.com Tue Sep 23 01:58:37 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 23 Sep 2008 11:58:37 +0300 Subject: [ofa-general] Re: [Bug 1194] ipoib_ud_test causes Kernel panic caused by deadlock In-Reply-To: <20080923084437.486ACE60D60@openfabrics.org> References: <20080923084437.486ACE60D60@openfabrics.org> Message-ID: <48D8AFBD.3040405@voltaire.com> > https://bugs.openfabrics.org/show_bug.cgi?id=1194 > eli at mellanox.co.il changed: > What |Removed |Added > ---------------------------------------------------------------------------- > Status|NEW |RESOLVED > Resolution| |FIXED > ------- Comment #2 from eli at mellanox.co.il 2008-09-23 01:44 ------- > Fixed. Patch sent to Roland and will appear in next OFED build Sending a patch is an important step towards fixing a bug, however the patch review is not less important... why put something in before it was reviewed and accepted to the mainline kernel? a patch to solve bug X can cause or reopen Y,Z, W etc new bugs,etc and the review is the way to try and make sure this doesn't happen. Haven't we learned the lesson from the previous release? didn't people came from world wide locations to Sonoma and discussed something? Or. 
From ogerlitz at voltaire.com Tue Sep 23 02:01:24 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 23 Sep 2008 12:01:24 +0300 Subject: [ofa-general] [PATCH] IB/ipoib: avoid WARN_ON on NULL path->ah In-Reply-To: <20080921132522.GA25090@mtls03> References: <20080921132522.GA25090@mtls03> Message-ID: <48D8B064.4060004@voltaire.com> Eli Cohen wrote: > Moreover, that commit also allowed path resolution to be triggered for an invalid path; if that path resolution failed, old_ah would be freed outside priv->lock violating the assumption that dropping references inside the lock are guaranteed not to reach zero reference. and what would happen next, I understand its the oops shown in https://bugs.openfabrics.org/show_bug.cgi?id=1194, I guess you want to mention that in the change log Or. From vlad at lists.openfabrics.org Tue Sep 23 03:10:36 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Tue, 23 Sep 2008 03:10:36 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080923-0200 daily build status Message-ID: <20080923101036.66379E60D5B@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed 
on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.24 Failed: From monis at Voltaire.COM Tue Sep 23 04:01:00 2008 From: monis at Voltaire.COM (Moni Shoua) Date: Tue, 23 Sep 2008 14:01:00 +0300 Subject: [ofa-general] Re: [ewg] [PATCH] IB/ipoib: avoid WARN_ON on NULL path->ah In-Reply-To: <20080921132522.GA25090@mtls03> References: <20080921132522.GA25090@mtls03> Message-ID: <48D8CC6C.3000309@Voltaire.COM> Eli Cohen wrote: > Commit ee1e2c82c245a5fb2864e9dbcdaab3390fde3fcc introduced an > optimization on path flushing. This caused a new possible scenario in > which unicast_arp_send triggers path query which could fail, causing > path->ah to become NULL. A successive successfull path query will then > trigger WARN_ON() in path_rec_completion(). This fix requires old_ah > to differ from NULL as a prerequsite to trigger the WARN_ON(). 
> Moreover, that commit also allowed path resolution to be triggered for > an invalid path; if that path resolution failed, old_ah would be freed > outside priv->lock violating the assumption that dropping references > inside the lock are guaranteed not to reach zero reference. > Eli Roland, I understand that this patch is going to be in OFED. What about upstream kernel? I'd like to add improvements to commit ee1e2c82c245a5fb2864e9dbcdaab3390fde3fcc (the one you referred to) and it will probably be on top of your fix. I'm sorry if I missed Roland's answer. thanks From eli at dev.mellanox.co.il Tue Sep 23 04:07:58 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Tue, 23 Sep 2008 14:07:58 +0300 Subject: [ofa-general] Re: [ewg] [PATCH] IB/ipoib: avoid WARN_ON on NULL path->ah In-Reply-To: <48D8CC6C.3000309@Voltaire.COM> References: <20080921132522.GA25090@mtls03> <48D8CC6C.3000309@Voltaire.COM> Message-ID: <20080923110757.GA14422@mtls03> On Tue, Sep 23, 2008 at 02:01:00PM +0300, Moni Shoua wrote: > Eli Cohen wrote: > > Commit ee1e2c82c245a5fb2864e9dbcdaab3390fde3fcc introduced an > > optimization on path flushing. This caused a new possible scenario in > > which unicast_arp_send triggers path query which could fail, causing > > path->ah to become NULL. A successive successfull path query will then > > trigger WARN_ON() in path_rec_completion(). This fix requires old_ah > > to differ from NULL as a prerequsite to trigger the WARN_ON(). > > Moreover, that commit also allowed path resolution to be triggered for > > an invalid path; if that path resolution failed, old_ah would be freed > > outside priv->lock violating the assumption that dropping references > > inside the lock are guaranteed not to reach zero reference. > > > > Eli Roland, > I understand that this patch is going to be in OFED. > What about upstream kernel? 
> I'd like to add improvements to commit ee1e2c82c245a5fb2864e9dbcdaab3390fde3fcc (the one you referred to) and it will probably be on top of your fix. > > I'm sorry if I missed Roland's answer. > I don't think Roland responded to this patch yet. Still, I think it is important that this patch is reviewed since we have a regression relative to 2.6.26. From monis at Voltaire.COM Tue Sep 23 04:15:47 2008 From: monis at Voltaire.COM (Moni Shoua) Date: Tue, 23 Sep 2008 14:15:47 +0300 Subject: [ofa-general] Re: [ewg] [PATCH] IB/ipoib: avoid WARN_ON on NULL path->ah In-Reply-To: <20080923110757.GA14422@mtls03> References: <20080921132522.GA25090@mtls03> <48D8CC6C.3000309@Voltaire.COM> <20080923110757.GA14422@mtls03> Message-ID: <48D8CFE3.5050800@Voltaire.COM> Eli Cohen wrote: > On Tue, Sep 23, 2008 at 02:01:00PM +0300, Moni Shoua wrote: >> Eli Cohen wrote: >>> Commit ee1e2c82c245a5fb2864e9dbcdaab3390fde3fcc introduced an >>> optimization on path flushing. This caused a new possible scenario in >>> which unicast_arp_send triggers path query which could fail, causing >>> path->ah to become NULL. A successive successfull path query will then >>> trigger WARN_ON() in path_rec_completion(). This fix requires old_ah >>> to differ from NULL as a prerequsite to trigger the WARN_ON(). >>> Moreover, that commit also allowed path resolution to be triggered for >>> an invalid path; if that path resolution failed, old_ah would be freed >>> outside priv->lock violating the assumption that dropping references >>> inside the lock are guaranteed not to reach zero reference. >>> >> Eli Roland, >> I understand that this patch is going to be in OFED. >> What about upstream kernel? >> I'd like to add improvements to commit ee1e2c82c245a5fb2864e9dbcdaab3390fde3fcc (the one you referred to) and it will probably be on top of your fix. >> >> I'm sorry if I missed Roland's answer. >> > > I don't think Roland responded to this patch yet. 
Still, I think it is > important that this patch is reviewed since we have a regression > relative to 2.6.26. > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > I agree I gave a thought here. It's possible, when path_rec_completion() is called with nonzero status, to do nothing with ah. Only when path query finishes with success do the replacement. This is good for cases when old_ah is still good (no remote LID change happened). Besides that I think that the patch is correct. From eli at dev.mellanox.co.il Tue Sep 23 06:22:16 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Tue, 23 Sep 2008 16:22:16 +0300 Subject: [ofa-general] Re: [ewg] [PATCH] IB/ipoib: avoid WARN_ON on NULL path->ah In-Reply-To: <48D8CFE3.5050800@Voltaire.COM> References: <20080921132522.GA25090@mtls03> <48D8CC6C.3000309@Voltaire.COM> <20080923110757.GA14422@mtls03> <48D8CFE3.5050800@Voltaire.COM> Message-ID: <20080923132216.GB14422@mtls03> On Tue, Sep 23, 2008 at 02:15:47PM +0300, Moni Shoua wrote: > I agree > I gave a thought here. > It's possible, when path_rec_completion() is called with nonzero status, to do nothing with ah. > Only when path query finishes with success do the replacement. > This is good for cases when old_ah is still good (no remote LID change happened). But what you're saying is that even though no SM responded to a path query, you still want to use the old ah which is an even stronger request than what the original path did (which waited until the new path resolution to complete before updating the ah). Can you think of scenario that can cause this? 
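For readers following the thread: Moni's proposal quoted above (do nothing with the ah when path_rec_completion() sees a nonzero status, and replace it only when the path query succeeds) might look roughly like the sketch below. This is plain user-space C with made-up names (`struct ah`, `struct path`, `path_query_done`); it is not the ipoib code, and it deliberately leaves out locking and reference counting, which are exactly the parts under discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the kernel's struct ipoib_ah / ipoib_path. */
struct ah {
    int lid;            /* destination LID this handle was created for */
};

struct path {
    struct ah *ah;      /* handle currently used for sends, may be NULL */
};

/*
 * Hypothetical completion handler for a path query.
 *
 * On failure (status != 0) the existing handle is left in place, on the
 * theory that the remote LID most likely did not change.
 * On success the new handle is installed and the previous one is returned,
 * so the caller can drop its reference after releasing any lock it holds.
 */
static struct ah *path_query_done(struct path *path, int status,
                                  struct ah *new_ah)
{
    struct ah *old_ah;

    assert(path != NULL);

    if (status != 0)
        return NULL;            /* keep path->ah untouched */

    old_ah = path->ah;
    path->ah = new_ah;
    return old_ah;              /* caller releases this one, if non-NULL */
}
```

Whether keeping a possibly stale handle after a failed query is acceptable is the open question Eli raises above; the sketch only shows the control flow, not an answer to that.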
From tziporet at dev.mellanox.co.il Tue Sep 23 06:41:27 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Tue, 23 Sep 2008 16:41:27 +0300 Subject: [ofa-general] [PATCH] IB/ipoib: Fix typo in IPoIB CM service ID In-Reply-To: <20080923083956.GA14288@mtls03> References: <20080923083956.GA14288@mtls03> Message-ID: <48D8F207.7090703@mellanox.co.il> Eli Cohen wrote: > Set the correct service ID, as defined by RFC 4755: > > +--------+--------+--------+--------+-------+-------+--------+-------+ > |00000001| Type | Reserved | QPN | > +--------+--------+--------+--------+-------+-------+--------+-------+ > > The "Type" field MUST be set to 0. > The "Reserved" field MUST be set to zeroes. > The QPN MUST be the UD QP exchanged during address resolution. > > Note that this change can make IPoIB not to work between new and older kernels Need to think when and if we wish to do this change Tziporet From tziporet at mellanox.co.il Tue Sep 23 07:36:59 2008 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 23 Sep 2008 17:36:59 +0300 Subject: [ofa-general] OFED meeting minutes Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD94DD2F@mtlexch01.mtl.com> > OFED September 22 meeting summary on OFED 1.4 status: > Meeting Summary: ============== 1. General status: RC2 releases on Sunday Sep 21 and being used in the plugfest. 2. Rc3 is expected on Oct 7 Decided to cleanup compilation warnings by half for RC3 and totally for the GA version 3. Decided not to include both Open MPI 1.2.7 and 1.3 beta in the release. For now we stay with 1.2.7 4. No one from Chelsio or Neteffect participated thus iWARP status is not clear. Details: ====== > 1. 
High priority bugs review:
>
> ID   Sev OS   Owner                        Summary - status
> 1128 blo Othe stefan.roscher at de.ibm.com  release IPoIB-CM QP resources in flushing CQE context - fixed
> 1192 cri RHEL alekseys at voltaire.com      Failed to comple ofa_kernel RPM on RH4 up7, ia64 arch' - fixed
> 1215 cri Othe bugzilla at openib.org        The driver is not recognizing the device under CentOS 5.2... - Xen kernel
> 1199 cri RHEL dorons at voltaire.com        tvflash and tgt-generic fail to build if --prefix is used - fixed
> 1188 cri RHEL eli at mellanox.co.il         kernel panic in neigh_destroy during ib_ipoib unload - fixed
> 1191 cri SLES eli at mellanox.co.il         Failed to load ib_ipoib on sles10 sp2, ia64 - Voltaire will fix
> 1196 cri SLES eli at mellanox.co.il         rds module is in use right after installation and thus op... - assigned to Andy from Oracle
> 1113 cri RHEL vu at mellanox.com            rpm -e scsi-target-utils-0.1-2008715 fails
> 1117 cri SLES yannick.cote at qlogic.com    ib_ipath module hangs on unload - QL need to send a patch to Vlad
> 1186 cri RHEL yosefe at voltaire.com        lockup in IPoIB during module unload - fixed
> 1198 cri SLES yosefe at voltaire.com        hang during ipoib create_child/ifdown - fixed
> 1164 maj SLES eli at mellanox.co.il         iperf over IPoIB fails for 100 tcp connections
> 1183 maj SLES tziporet at mellanox.co.il    higest priority doesn't become the MASTER opensm - Mellanox FW issue - not going to review
> 1099 maj All  vlad at mellanox.co.il        IPoIB IPv6 does not work - problem with RHEL4 only - work in progress
> 1153 maj Othe yosefe at voltaire.com        OpenSM- Multicast group will not open when IB host is the... - Voltaire will send a patch
>
All - please update bug status in bugzilla

2. Testing status:
    Intel - 16-node cluster passes with RHEL 4.4 and ROCKS, plus a few Itanium nodes. Will have 64 nodes soon.
    IBM - testing in progress, no special issues
    Qlogic - passing small cluster, fix critical bugs, running Intel MPI
    Voltaire - testing all arch OSes and ULPs.
bonding
    Mellanox - testing all ULPs besides NFS-RDMA and iSER. Covering x86, x86_64 and PPC64 on most of the supported OSes
    OSU - no special update
    Open MPI - 1.2.7 is stable. Since we decided not to include 1.3 beta in OFED 1.4, the release notes will be updated to point to 1.3.

3. NFS-RDMA: Jeff is working on RHEL 5.1. Should be ready next week.

4. OFED to support more OSes:
    CentOS - already agreed to add version 5.2 - need to update the roadmap on the web.
    OEL (Oracle) - Qlogic will check if they can test it. It can be added as a partially tested system.

5. Compilation warnings: Today we have ~1800 compilation warnings. We wish to reduce them by half for RC3. Betsy and John from Qlogic will open bugs on ULPs with many warnings

6. Track changes between RCs: Will add git diffs between the new RC and the previous one. This will enable everyone to understand the type of changes in each RC

> Tziporet
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rdreier at cisco.com Tue Sep 23 07:47:49 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 23 Sep 2008 07:47:49 -0700
Subject: [ofa-general] Re: [PATCH] IB/ipoib: Fix typo in IPoIB CM service ID
In-Reply-To: <20080923083956.GA14288@mtls03> (Eli Cohen's message of "Tue, 23 Sep 2008 11:39:56 +0300")
References: <20080923083956.GA14288@mtls03>
Message-ID: 

 > -#define IPOIB_CM_IETF_ID 0x1000000000000000ULL
 > +#define IPOIB_CM_IETF_ID 0x100000000000000ULL

Oh man, this means old and new kernels can't work together.

It seems the only way we could ever change this would be to listen on both service IDs, and if we get a reject on connect, retry with the old (wrong) service ID. And then maybe in a few years we can get rid of that code.

This sucks, oh well.

 - R.
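To make the one-digit difference in the quoted #define easier to see, here is a small stand-alone C sketch. The two constants are the ones from the patch; the helper names (`ipoib_cm_service_id`, `service_id_top_byte`) and the `_OLD` suffix are invented for this illustration, and the assumption that the 24-bit UD QPN is simply OR-ed into the low bits follows the RFC 4755 layout quoted in the patch description rather than any particular kernel source line.

```c
#include <assert.h>
#include <stdint.h>

/* Corrected constant from the patch: 15 hex digits, i.e. 0x0100000000000000,
 * so the most significant byte is 0x01 as RFC 4755 requires. */
#define IPOIB_CM_IETF_ID     0x100000000000000ULL

/* Previous constant (name is hypothetical): 16 hex digits, so the most
 * significant byte is 0x10 instead of 0x01. */
#define IPOIB_CM_IETF_ID_OLD 0x1000000000000000ULL

/* Hypothetical helper: combine a service ID prefix with a 24-bit UD QPN.
 * Type and Reserved stay zero, the QPN occupies the low 24 bits. */
static uint64_t ipoib_cm_service_id(uint64_t prefix, uint32_t ud_qpn)
{
    assert(ud_qpn <= 0xFFFFFFu);    /* QPNs are 24 bits wide */
    return prefix | ud_qpn;
}

/* Hypothetical helper: extract the most significant byte of a service ID. */
static unsigned int service_id_top_byte(uint64_t service_id)
{
    return (unsigned int)(service_id >> 56);
}
```

With these definitions, a node using the old prefix and a node using the new prefix compute different service IDs for the same QPN, which is why a connect from one is not matched by a listen on the other, and why the listen-on-both / retry-with-old fallback described above would be needed for interoperability.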
From rdreier at cisco.com Tue Sep 23 07:49:51 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 23 Sep 2008 07:49:51 -0700 Subject: [ofa-general] [PATCH] IB/ipoib: avoid WARN_ON on NULL path->ah In-Reply-To: <48D8B064.4060004@voltaire.com> (Or Gerlitz's message of "Tue, 23 Sep 2008 12:01:24 +0300") References: <20080921132522.GA25090@mtls03> <48D8B064.4060004@voltaire.com> Message-ID: > > Moreover, that commit also allowed path resolution to be triggered > > for an invalid path; if that path resolution failed, old_ah would > > be freed outside priv->lock violating the assumption that dropping > > references inside the lock are guaranteed not to reach zero > > reference. > and what would happen next, I understand its the oops shown in > https://bugs.openfabrics.org/show_bug.cgi?id=1194, I guess you want to > mention that in the change log So is this patch fixing a crash or just avoiding a WARN_ON? I'm not going to send a change that just affects warnings for 2.6.27, but if it's an oops that triggers in real life then that's different. 
From tziporet at mellanox.co.il Tue Sep 23 08:10:07 2008 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 23 Sep 2008 18:10:07 +0300 Subject: [ofa-general] RE: [ewg] OFED 1.4-rc2 is available In-Reply-To: <5D49E7A8952DC44FB38C38FA0D758EAD9043D4@mtlexch01.mtl.com> Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD94DD8B@mtlexch01.mtl.com> Changes in the kernel between rc1 and rc2 are attached Tziporet -----Original Message----- From: ewg-bounces at lists.openfabrics.org [mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of Tziporet Koren Sent: Monday, September 22, 2008 10:18 AM To: ewg at lists.openfabrics.org Cc: general at lists.openfabrics.org Subject: [ewg] OFED 1.4-rc2 is available Hi, OFED 1.4-rc2 release is available on http://www.openfabrics.org/downloads/OFED/ofed-1.4/OFED-1.4-rc2.tgz To get BUILD_ID run ofed_info Please report any issues in bugzilla https://bugs.openfabrics.org/ for OFED 1.4 Tziporet & Vladimir ======================================================================== Release information: ------------------------------ Linux Operating Systems: - RedHat EL4 up4: 2.6.9-42.ELsmp * - RedHat EL4 up5: 2.6.9-55.ELsmp - RedHat EL4 up6: 2.6.9-67.ELsmp - RedHat EL4 up7: 2.6.9-78.ELsmp - RedHat EL5: 2.6.18-8.el5 - RedHat EL5 up1: 2.6.18-53.el5 - RedHat EL5 up2: 2.6.18-92.el5 - CentOS 5.2: 2.6.18-92.el5 - Fedora C9: 2.6.25-14.fc9 * - SLES10: 2.6.16.21-0.8-smp - SLES10 SP1: 2.6.16.46-0.12-smp - SLES10 SP1 up1: 2.6.16.53-0.16-smp - SLES10 SP2: 2.6.16.60-0.21-smp - OpenSuSE 10.3: 2.6.22.5-31 * - kernel.org: 2.6.26 and 2.6.27-rc5 * Minimal QA for these versions Systems: * x86_64 * x86 * ia64 * ppc64 Main Changes from OFED 1.4-rc1 ========================= - Kernel base updated to 2.6.27-rc6 - Updated MPI packages: mvapich-1.1.0-2977 and mvapich2-1.2rc2-6 - Updated bonding package: ib-bonding-0.9.0-30 - 12 bugs fixed (see attached for details) Tasks that should be completed for the rc3: ================================ 1. NFS-RDMA to work on RHEL 5.1 2. 
OSM: Cached routing 3. Cleanup compilation warnings 4. Bug fixes -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: git-log-ofed-1.4-rc1_rc2.txt URL: From eli at dev.mellanox.co.il Tue Sep 23 08:16:27 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Tue, 23 Sep 2008 18:16:27 +0300 Subject: [ofa-general] [PATCH] IB/ipoib: avoid WARN_ON on NULL path->ah In-Reply-To: References: <20080921132522.GA25090@mtls03> <48D8B064.4060004@voltaire.com> Message-ID: <20080923151627.GC14422@mtls03> On Tue, Sep 23, 2008 at 07:49:51AM -0700, Roland Dreier wrote: > > So is this patch fixing a crash or just avoiding a WARN_ON? I'm not > going to send a change that just affects warnings for 2.6.27, but if > it's an oops that triggers in real life then that's different. It's not just the WARN_ON thing. It also prevents a deadlock as appears in bug #1194, caused by dropping the last reference on an ah while priv->lock is held. From swise at opengridcomputing.com Tue Sep 23 08:22:41 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 23 Sep 2008 10:22:41 -0500 Subject: [ofa-general] Re: [ewg] OFED meeting minutes In-Reply-To: <5D49E7A8952DC44FB38C38FA0D758EAD94DD2F@mtlexch01.mtl.com> References: <5D49E7A8952DC44FB38C38FA0D758EAD94DD2F@mtlexch01.mtl.com> Message-ID: <48D909C1.40405@opengridcomputing.com> Tziporet Koren wrote: > > OFED September 22 meeting summary on OFED 1.4 status: > > Meeting Summary: > > ============== > > 1. General status: RC2 releases on Sunday Sep 21 and being used in the > plugfest. > > 2. Rc3 is expected on Oct 7 > > Decided to cleanup compilation warnings by half for RC3 and > totally for the GA version > > 3. Decided not to include both Open MPI 1.2.7 and 1.3 beta in the > release. For now we stay with 1.2.7 > > 4. No one from Chelsio or Neteffect participated thus iWARP status is > not clear. > Sorry I missed the meeting. 
I've run some quick *MPI* regressions over chelsio's rnic on Friday's daily build. Looked ok. I'll update my cluster asap and regression test -RC2. Steve. From monis at Voltaire.COM Tue Sep 23 08:37:04 2008 From: monis at Voltaire.COM (Moni Shoua) Date: Tue, 23 Sep 2008 18:37:04 +0300 Subject: [ofa-general] Re: [ewg] [PATCH] IB/ipoib: avoid WARN_ON on NULL path->ah In-Reply-To: <20080923132216.GB14422@mtls03> References: <20080921132522.GA25090@mtls03> <48D8CC6C.3000309@Voltaire.COM> <20080923110757.GA14422@mtls03> <48D8CFE3.5050800@Voltaire.COM> <20080923132216.GB14422@mtls03> Message-ID: <48D90D20.7080704@Voltaire.COM> Eli Cohen wrote: > On Tue, Sep 23, 2008 at 02:15:47PM +0300, Moni Shoua wrote: >> I agree >> I gave a thought here. >> It's possible, when path_rec_completion() is called with nonzero status, to do nothing with ah. >> Only when path query finishes with success do the replacement. >> This is good for cases when old_ah is still good (no remote LID change happened). > > But what you're saying is that even though no SM responded to a path > query, you still want to use the old ah which is an even stronger > request than what the original path did (which waited until the new > path resolution to complete before updating the ah). Can you think of > scenario that can cause this? > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > I agree that this is a stronger request now. Suppose a host that has an ah to another host gets a SM_CHANGE event. I would like that the first host will send path queries to the existing path because there is a chance that the other host changed its LID. However, the new SM might not be ready immediately so the first host might need to retry the path refresh. In this case I would like to keep the old ah since the chances that it is a good ah are very high. 
BTW, the description of the scenario above is in code that I intend to send soon. From ruimario at gmail.com Tue Sep 23 09:06:17 2008 From: ruimario at gmail.com (Rui Machado) Date: Tue, 23 Sep 2008 18:06:17 +0200 Subject: [ofa-general] atomic operations on ppc64 In-Reply-To: <2f3bf9a60809222248x2f2116b4x5e3fe6ed08bd846c@mail.gmail.com> References: <6978b4af0809190739l155f81b7q9668e17406f83eca@mail.gmail.com> <5D49E7A8952DC44FB38C38FA0D758EAD904AF6@mtlexch01.mtl.com> <2f3bf9a60809222248x2f2116b4x5e3fe6ed08bd846c@mail.gmail.com> Message-ID: <6978b4af0809230906h14a50f09l1f643e9967d3f72c@mail.gmail.com> Hey guys, sorry for the delay. > Do you access the memory as "volatile" (because the HCA changes its content)? Nope. > Can you try to check if RDMA Write/Read behave the same for you? > RDMA write and read seem to work fine. > 2008/9/22 Ronni Zimmermann : >> Hi, >> We run tests which use atomic operations (both fetch and add and comp and swap) on PPC64 all the time, without experiencing any problem. >> >> Just to make sure I ran few simple tests, which use atomic operations, on our PPC64 machines, both with SLES10 SP1 and with RHAS5.1, and all of them passed. >> I was working with the latest OFED1.4 driver and mlx4 HCA with the latest released FW and with FW 2.3.000 (on the SLES10 SP1 machine). >> >> Given the above information I believe that there's either a problem with your code (although looking at the code you posted I couldn't see anything wrong) or it's an OFED1.2.5 issue, as Dotan suggested. >> OK thanks for the feedback. We have ppc64 machines with mlx4 and mthca0 (from ibv_devinfo). Neither works. Any experience with the mthca0? It is older and should be better supported on 1.2.5, right? My priority is the machines with the mlx4 but of course I would like to see both working. I also tried with a 2.6.26.2 kernel (had it at hand) with the same ofed1.2.5 installation and still see the problem. 
I guess my last and longest try to install the whole ofed 1.4 package. btw: changing hardware is not an option :-/ Cheers and thanks for the support, From vegalew at hotmail.com Wed Sep 24 00:52:59 2008 From: vegalew at hotmail.com (vega) Date: Wed, 24 Sep 2008 15:52:59 +0800 Subject: [ofa-general] ***SPAM*** Is there a version of OFED suitable for Red Hat Enterprise Linux WS3.0 and Voltaire infiniband Message-ID: Dear all, My system is Red Hat Enterprise Linux WS3.0. And my infiniband was manufactured by Voltaire. When I tried to install the current version of OFED, the installation programme need gcc-3.3.3. So I think maybe my kernel was also quite old that not fit the current version of OFED. Is there a old version that support my system and manufacturer of infiniband both? thank you for reading. any hints will be deeply appreciated. vega ================================================================================= Vega Lew (weijia liu) PH.D Candidate in Chemical Engineering State Key Laboratory of Materials-oriented Chemical Engineering College of Chemistry and Chemical Engineering Nanjing University of Technology, 210009, Nanjing, Jiangsu, China -------------- next part -------------- An HTML attachment was scrubbed... URL: From tziporet at dev.mellanox.co.il Wed Sep 24 02:45:28 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Wed, 24 Sep 2008 12:45:28 +0300 Subject: [ofa-general] ***SPAM*** Is there a version of OFED suitable for Red Hat Enterprise Linux WS3.0 and Voltaire infiniband In-Reply-To: References: Message-ID: <48DA0C38.9040208@mellanox.co.il> vega wrote: > Dear all, > My system is Red Hat Enterprise Linux /WS3/.0. And my infiniband was > manufactured by Voltaire. > When I tried to install the current version of OFED, the installation > programme need gcc-3.3.3. So I think > maybe my kernel was also quite old that not fit the current version of > OFED. 
> Is there a old version that support my system and manufacturer of > infiniband both? > thank you for reading. > any hints will be deeply appreciated. OFED does not support any RHEL 3.x, since its kernel is based on 2.4 and not 2.6. You must move to RHEL 4.4 or above to use OFED. Tziporet From vlad at lists.openfabrics.org Wed Sep 24 03:13:10 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Wed, 24 Sep 2008 03:13:10 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080924-0200 daily build status Message-ID: <20080924101310.51DD4E60D6C@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.19 Passed on
ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.24 Passed on ppc64 with linux-2.6.19 Failed: From vlad at lists.openfabrics.org Wed Sep 24 04:45:44 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Wed, 24 Sep 2008 04:45:44 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080924-0330 daily build status Message-ID: <20080924114544.F154DE60D8F@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with 
linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.19 Failed: Build failed on x86_64 with linux-2.6.21.1 Log: /home/vlad/tmp/ofa_1_4_kernel-20080924-0330_linux-2.6.21.1_x86_64_check/include/rdma/ib_verbs.h:1833: error: 'struct scatterlist' has no member named 'dma_address' /home/vlad/tmp/ofa_1_4_kernel-20080924-0330_linux-2.6.21.1_x86_64_check/include/rdma/ib_verbs.h: In function 'ib_sg_dma_len': /home/vlad/tmp/ofa_1_4_kernel-20080924-0330_linux-2.6.21.1_x86_64_check/include/rdma/ib_verbs.h:1846: error: 'struct scatterlist' has no member named 'dma_length' make[4]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080924-0330_linux-2.6.21.1_x86_64_check/drivers/infiniband/hw/ipath/ipath_dma.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080924-0330_linux-2.6.21.1_x86_64_check/drivers/infiniband/hw/ipath] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080924-0330_linux-2.6.21.1_x86_64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_4_kernel-20080924-0330_linux-2.6.21.1_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.21.1' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ppc64 with linux-2.6.24 Log: /home/vlad/tmp/ofa_1_4_kernel-20080924-0330_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.c: In function 'ehca_poll_eqs': 
/home/vlad/tmp/ofa_1_4_kernel-20080924-0330_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.c:942: warning: passing argument 1 of 'local_irq_save_ptr' from incompatible pointer type /home/vlad/tmp/ofa_1_4_kernel-20080924-0330_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.c:946: warning: passing argument 1 of 'local_irq_save_ptr' from incompatible pointer type make[4]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080924-0330_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080924-0330_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080924-0330_linux-2.6.24_ppc64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_4_kernel-20080924-0330_linux-2.6.24_ppc64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ppc64/linux-2.6.24' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From jackm at dev.mellanox.co.il Wed Sep 24 06:59:19 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Wed, 24 Sep 2008 16:59:19 +0300 Subject: [ofa-general] [PATCH] libibmad: eliminate compiler warnings on x86_64 Message-ID: <200809241659.19744.jackm@dev.mellanox.co.il> libibmad: eliminate compiler warnings on x86_64 The snprintf's below generated warnings of the form: warning: format '%010lx' expects type 'long unsigned int', but argument 4 has type 'long long unsigned int' on 64-bit systems -- due to the influence of the <>llu constants. Casting solves this. Signed-off-by: Jack Morgenstein --- Please fix for upcoming OFED 1.4 release candidate. 
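For readers following the warning described above, here is a minimal sketch of the underlying type issue. It is not part of the patch, and format_low40 is a hypothetical helper that does not exist in libibmad. The assumption is an LP64 Linux system where uint64_t is unsigned long, so PRIx64 expands to "lx", while an expression masked with an ...llu constant has type unsigned long long. Casting the constant keeps the whole expression at uint64_t so that it matches the format macro.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from libibmad): print the low 5 bytes of v
 * as 10 hex digits. The cast keeps the masked expression at uint64_t,
 * which is the type PRIx64 expects; without it the result would be
 * unsigned long long, which triggers the format warning on LP64.
 */
static int format_low40(char *buf, size_t bufsz, uint64_t v)
{
	return snprintf(buf, bufsz, "0x%010" PRIx64,
			v & (uint64_t) 0xffffffffffULL);
}
```

Calling format_low40(buf, sizeof buf, value) fills buf with the masked value and returns the number of characters snprintf would have written.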
Index: libibmad/src/dump.c =================================================================== --- libibmad.orig/src/dump.c 2008-09-24 09:20:04.000000000 +0300 +++ libibmad/src/dump.c 2008-09-24 09:22:00.000000000 +0300 @@ -115,13 +115,13 @@ mad_dump_hex(char *buf, int bufsz, void snprintf(buf, bufsz, "0x%08x", *(uint32_t *)val); break; case 5: - snprintf(buf, bufsz, "0x%010" PRIx64, *(uint64_t *)val & 0xffffffffffllu); + snprintf(buf, bufsz, "0x%010" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffllu); break; case 6: - snprintf(buf, bufsz, "0x%012" PRIx64, *(uint64_t *)val & 0xffffffffffffllu); + snprintf(buf, bufsz, "0x%012" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffffllu); break; case 7: - snprintf(buf, bufsz, "0x%014" PRIx64, *(uint64_t *)val & 0xffffffffffffffllu); + snprintf(buf, bufsz, "0x%014" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffffffllu); break; case 8: snprintf(buf, bufsz, "0x%016" PRIx64, *(uint64_t *)val); @@ -149,13 +149,13 @@ mad_dump_rhex(char *buf, int bufsz, void snprintf(buf, bufsz, "%08x", *(uint32_t *)val); break; case 5: - snprintf(buf, bufsz, "%010" PRIx64, *(uint64_t *)val & 0xffffffffffllu); + snprintf(buf, bufsz, "%010" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffllu); break; case 6: - snprintf(buf, bufsz, "%012" PRIx64, *(uint64_t *)val & 0xffffffffffffllu); + snprintf(buf, bufsz, "%012" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffffllu); break; case 7: - snprintf(buf, bufsz, "%014" PRIx64, *(uint64_t *)val & 0xffffffffffffffllu); + snprintf(buf, bufsz, "%014" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffffffllu); break; case 8: snprintf(buf, bufsz, "%016" PRIx64, *(uint64_t *)val); From jackm at dev.mellanox.co.il Wed Sep 24 06:59:23 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Wed, 24 Sep 2008 16:59:23 +0300 Subject: [ofa-general] [PATCH] infiniband-diags: eliminate compiler warnings Message-ID: <200809241659.23447.jackm@dev.mellanox.co.il> infiniband: eliminate compiler 
warnings on x86_64 The printf's below generated warnings of the form: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'long long unsigned int' on 64-bit systems -- due to the influence of the <>ull constants. Casting solves this. Signed-off-by: Jack Morgenstein --- Please fix for upcoming OFED 1.4 release candidate. Index: infiniband-diags/src/ibping.c =================================================================== --- infiniband-diags.orig/src/ibping.c 2008-09-21 17:09:26.000000000 +0300 +++ infiniband-diags/src/ibping.c 2008-09-24 16:38:16.000000000 +0300 @@ -174,7 +174,7 @@ report(int sig) printf("\n--- %s (%s) ibping statistics ---\n", last_host, portid2str(&portid)); printf("%" PRIu64 " packets transmitted, %" PRIu64 " received, %" PRIu64 "%% packet loss, time %" PRIu64 " ms\n", ntrans, replied, - (lost != 0) ? lost * 100ull / ntrans : 0ull, total_time / 1000ull); + (uint64_t) ((lost != 0) ? lost * 100ull / ntrans : 0ull), (uint64_t) (total_time / 1000ull)); printf("rtt min/avg/max = %" PRIu64 ".%03" PRIu64 "/%" PRIu64 ".%03" PRIu64 "/%" PRIu64 ".%03" PRIu64 " ms\n", minrtt == ~0ull ? 0 : minrtt/1000, minrtt == ~0ull ? 0 : minrtt%1000, From olga.shern at gmail.com Wed Sep 24 07:55:24 2008 From: olga.shern at gmail.com (Olga Shern (Voltaire)) Date: Wed, 24 Sep 2008 17:55:24 +0300 Subject: [ofa-general] Intermittent: ib0: multicast join failed In-Reply-To: <2C7DE72B9BD00F44BAECA5B0CBB873953217F5@hermes.terascala.com> References: <2C7DE72B9BD00F44BAECA5B0CBB873953217F5@hermes.terascala.com> Message-ID: On Thu, Sep 18, 2008 at 11:45 PM, Roger Spellman wrote: > I have many nodes, each with a Mellanox MT25204. When I reboot some nodes, > they occasionally get the following error: > > > > ib0: multicast join failed > > There can be a case that the first join will fail, but IPoIB sends joins until it will success. 
Usually you will not see that it failed to join the second time (every time it failed it will print this message) If your interface (ib0) is in running state - then it has successfully joined and you don't need to reboot the node. Is this your case? > > Rebooting the system almost always solves this problem. > > > > What causes this? > > > > Is there a way to solve this without rebooting? > > > > Thanks. > > > > Roger Spellman > Sr. Staff Engineer > > Terascala, Inc. > > www.terascala.com > > > > > > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From yosefe at Voltaire.COM Wed Sep 24 08:23:56 2008 From: yosefe at Voltaire.COM (Yossi Etigin) Date: Wed, 24 Sep 2008 18:23:56 +0300 Subject: [ofa-general] [PATCH v2] ipoib: fix a deadlock between ipoib start/stop and child interface create/delete Message-ID: <48DA5B8C.50403@Voltaire.COM> Fix a deadlock between child interface creation/deletion and ipoib start/stop. The former takes first vlan_mutex, and might take rtnl_lock via register_netdev or unregister_netdev. The latter is executed with rtnl_lock held, and tries to take vlan_mutex. We take the vlan_mutex and bring child interface up/down on a scheduled task instead of during stop/start, since ipoib_workqueue will not be flushed with rtnl_lock held. Signed-off-by: Yossi Etigin --- Fix bug #1198. One alternative approach might be to fine-grain the locking (for example use one mutex to sync child creation/deletion, and another one to sync accesses to child_intfs list). 
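The deferral idea described above can be sketched in user space as follows. This is only a model under stated assumptions, not the kernel code: struct parent, parent_request and parent_run_work are hypothetical names, a boolean stands in for queue_work(), and no real locks are taken. It shows the caller that already holds the outer lock only recording the desired state, while the work function applies it later from a context where it can take the locks in one fixed order.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NCHILD 3	/* hypothetical fixed number of child interfaces */

struct parent {
	atomic_int want_up;	/* desired child state, like vlan_task_flag */
	bool work_pending;	/* stands in for queue_work() */
	bool child_up[NCHILD];	/* stands in for IFF_UP on each child */
};

static void parent_init(struct parent *p)
{
	atomic_init(&p->want_up, 0);
	p->work_pending = false;
	for (size_t i = 0; i < NCHILD; i++)
		p->child_up[i] = false;
}

/* Start/stop path: record intent only, touch no child and take no lock. */
static void parent_request(struct parent *p, bool up)
{
	atomic_store(&p->want_up, up ? 1 : 0);
	p->work_pending = true;
}

/*
 * Deferred work: runs later, outside the caller's locking context.
 * Applies the most recently requested state and returns how many
 * children actually changed.
 */
static int parent_run_work(struct parent *p)
{
	int changed = 0;
	bool up;

	if (!p->work_pending)
		return 0;
	p->work_pending = false;

	up = atomic_load(&p->want_up) != 0;
	for (size_t i = 0; i < NCHILD; i++) {
		if (p->child_up[i] != up) {
			p->child_up[i] = up;
			changed++;
		}
	}
	return changed;
}
```

If several requests arrive before the work runs, only the last one takes effect, which matches the behaviour of a single flag read by the work function.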
Changes from v1: - declare an atomic flag in ipoib_dev_priv and use it instead of work_struct::data drivers/infiniband/ulp/ipoib/ipoib.h | 3 ++ drivers/infiniband/ulp/ipoib/ipoib_main.c | 35 +++++------------------------- drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 22 ++++++++++++++++++ 3 files changed, 31 insertions(+), 29 deletions(-) Index: b/drivers/infiniband/ulp/ipoib/ipoib.h =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib.h 2008-09-22 21:24:59.000000000 +0300 +++ b/drivers/infiniband/ulp/ipoib/ipoib.h 2008-09-22 22:11:54.000000000 +0300 @@ -299,6 +299,8 @@ struct ipoib_dev_priv { struct work_struct flush_heavy; struct work_struct restart_task; struct delayed_work ah_reap_task; + struct work_struct vlan_task; + atomic_t vlan_task_flag; struct ib_device *ca; u8 port; @@ -503,6 +505,7 @@ void ipoib_event(struct ib_event_handler int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey); int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey); +void ipoib_vlan_task(struct work_struct *work); void ipoib_pkey_poll(struct work_struct *work); int ipoib_pkey_dev_delay_open(struct net_device *dev); Index: b/drivers/infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c 2008-09-22 21:24:59.000000000 +0300 +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c 2008-09-22 22:13:26.000000000 +0300 @@ -124,20 +124,8 @@ int ipoib_open(struct net_device *dev) } if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) { - struct ipoib_dev_priv *cpriv; - - /* Bring up any child interfaces too */ - mutex_lock(&priv->vlan_mutex); - list_for_each_entry(cpriv, &priv->child_intfs, list) { - int flags; - - flags = cpriv->dev->flags; - if (flags & IFF_UP) - continue; - - dev_change_flags(cpriv->dev, flags | IFF_UP); - } - mutex_unlock(&priv->vlan_mutex); + atomic_set(&priv->vlan_task_flag, 1); +
queue_work(ipoib_workqueue, &priv->vlan_task); } netif_start_queue(dev); @@ -160,20 +148,8 @@ static int ipoib_stop(struct net_device ipoib_ib_dev_stop(dev, 0); if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) { - struct ipoib_dev_priv *cpriv; - - /* Bring down any child interfaces too */ - mutex_lock(&priv->vlan_mutex); - list_for_each_entry(cpriv, &priv->child_intfs, list) { - int flags; - - flags = cpriv->dev->flags; - if (!(flags & IFF_UP)) - continue; - - dev_change_flags(cpriv->dev, flags & ~IFF_UP); - } - mutex_unlock(&priv->vlan_mutex); + atomic_set(&priv->vlan_task_flag, 0); + queue_work(ipoib_workqueue, &priv->vlan_task); } return 0; @@ -1075,12 +1051,13 @@ static void ipoib_setup(struct net_devic INIT_DELAYED_WORK(&priv->pkey_poll_task, ipoib_pkey_poll); INIT_DELAYED_WORK(&priv->mcast_task, ipoib_mcast_join_task); - INIT_WORK(&priv->carrier_on_task, ipoib_mcast_carrier_on_task); + INIT_WORK(&priv->carrier_on_task, ipoib_mcast_carrier_on_task); INIT_WORK(&priv->flush_light, ipoib_ib_dev_flush_light); INIT_WORK(&priv->flush_normal, ipoib_ib_dev_flush_normal); INIT_WORK(&priv->flush_heavy, ipoib_ib_dev_flush_heavy); INIT_WORK(&priv->restart_task, ipoib_mcast_restart_task); INIT_DELAYED_WORK(&priv->ah_reap_task, ipoib_reap_ah); + INIT_WORK(&priv->vlan_task, ipoib_vlan_task); } struct ipoib_dev_priv *ipoib_intf_alloc(const char *name) Index: b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c 2008-09-22 21:05:14.000000000 +0300 +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c 2008-09-22 22:13:43.000000000 +0300 @@ -174,3 +174,25 @@ int ipoib_vlan_delete(struct net_device return ret; } + +void ipoib_vlan_task(struct work_struct *work) +{ + struct ipoib_dev_priv *priv = + container_of(work, struct ipoib_dev_priv, vlan_task); + struct ipoib_dev_priv *cpriv; + int flags, new_flags, iffup_value; + + iffup_value =
atomic_read(&priv->vlan_task_flag) ? IFF_UP : 0; + + mutex_lock(&priv->vlan_mutex); + list_for_each_entry(cpriv, &priv->child_intfs, list) { + flags = cpriv->dev->flags; + new_flags = (flags & ~IFF_UP) | iffup_value; + if (flags != new_flags) { + rtnl_lock(); + dev_change_flags(cpriv->dev, new_flags); + rtnl_unlock(); + } + } + mutex_unlock(&priv->vlan_mutex); +} -- --Yossi From rdreier at cisco.com Wed Sep 24 11:08:05 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 Sep 2008 11:08:05 -0700 Subject: [ofa-general] Re: [PATCH] IB/ipoib: avoid WARN_ON on NULL path->ah In-Reply-To: <20080921132522.GA25090@mtls03> (Eli Cohen's message of "Sun, 21 Sep 2008 16:25:22 +0300") References: <20080921132522.GA25090@mtls03> Message-ID: Would it be simpler to just not update path->ah if the query fails? In the case of the new optimization on path flushing, path->valid will be 0 so there's no reason to change path->ah to NULL if we're not going to change path->valid anyway. In fact this seems more in keeping with the idea behind ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM change events"), because your patch kills the ah in the neigh struct as soon as a path record query fails, when the point of the original patch was to keep using addresses that are probably still valid across an SM failure. The patch I'm proposing is the following. If this looks OK, I'll send it to Linus tomorrow.
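As an illustration only, the behaviour proposed above can be modelled in user space like this. The types and the function path_complete are hypothetical simplifications, not the ipoib structures, and locking is left out. The sketch assumes that a nonzero status means the path record query failed and that the caller releases whatever old handle is returned.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the address handle and path entry. */
struct ah {
	int dlid;
};

struct path {
	struct ah *ah;	/* handle currently used for this path */
};

/*
 * Completion handler model: replace the cached handle only when the
 * query succeeded and produced a new one. On failure the old handle
 * stays in place, since it is probably still valid.
 * Returns the handle the caller should release, or NULL if none.
 */
static struct ah *path_complete(struct path *p, int status, struct ah *new_ah)
{
	struct ah *old_ah = NULL;

	if (status == 0 && new_ah != NULL) {
		old_ah = p->ah;
		p->ah = new_ah;
	}
	return old_ah;
}
```

With this shape, a failed query leaves p->ah untouched and returns NULL, so nothing is freed and existing users of the path keep a usable handle.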
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 1b1df5c..e9ca3cb 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -404,7 +404,7 @@ static void path_rec_completion(int status, struct net_device *dev = path->dev; struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_ah *ah = NULL; - struct ipoib_ah *old_ah; + struct ipoib_ah *old_ah = NULL; struct ipoib_neigh *neigh, *tn; struct sk_buff_head skqueue; struct sk_buff *skb; @@ -428,12 +428,12 @@ static void path_rec_completion(int status, spin_lock_irqsave(&priv->lock, flags); - old_ah = path->ah; - path->ah = ah; - if (ah) { path->pathrec = *pathrec; + old_ah = path->ah; + path->ah = ah; + ipoib_dbg(priv, "created address handle %p for LID 0x%04x, SL %d\n", ah, be16_to_cpu(pathrec->dlid), pathrec->sl); From arlin.r.davis at intel.com Wed Sep 24 13:40:05 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Wed, 24 Sep 2008 13:40:05 -0700 Subject: [ofa-general] [PATCH 5/5][v2.0] Message-ID: <000001c91e85$ba5ed2f0$db97070a@amr.corp.intel.com> From arlin.r.davis at intel.com Wed Sep 24 13:43:55 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Wed, 24 Sep 2008 13:43:55 -0700 Subject: [ofa-general] [PATCH 1/5][v2.0] dapl: add provider specific attribute query option for IB UD MTU size Message-ID: <000101c91e86$4338b2d0$db97070a@amr.corp.intel.com> Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/common/dapl_adapter_util.h | 1 + dapl/common/dapl_ia_query.c | 2 +- dapl/openib_cma/dapl_ib_util.c | 2 ++ dapl/openib_scm/dapl_ib_util.c | 28 +++++++++++++++++++++++++++- dapl/openib_scm/dapl_ib_util.h | 2 ++ dat/include/dat2/dat_ib_extensions.h | 5 +++-- 6 files changed, 36 insertions(+), 4 deletions(-) diff --git a/dapl/common/dapl_adapter_util.h b/dapl/common/dapl_adapter_util.h index 43175a9..c5bf5da 100755 --- a/dapl/common/dapl_adapter_util.h +++ b/dapl/common/dapl_adapter_util.h 
@@ -246,6 +246,7 @@ int dapls_ib_private_data_size ( void dapls_query_provider_specific_attr( + IN DAPL_IA *ia_ptr, IN DAT_PROVIDER_ATTR *provider_attr ); #ifdef CQ_WAIT_OBJECT diff --git a/dapl/common/dapl_ia_query.c b/dapl/common/dapl_ia_query.c index a8c39a3..6c1bf14 100755 --- a/dapl/common/dapl_ia_query.c +++ b/dapl/common/dapl_ia_query.c @@ -172,7 +172,7 @@ dapl_ia_query ( /* * Query for provider specific attributes */ - dapls_query_provider_specific_attr(provider_attr); + dapls_query_provider_specific_attr(ia_ptr, provider_attr); /* * Set up evd_stream_merging_supported options. Note there is diff --git a/dapl/openib_cma/dapl_ib_util.c b/dapl/openib_cma/dapl_ib_util.c index a8e1fe3..72d8237 100755 --- a/dapl/openib_cma/dapl_ib_util.c +++ b/dapl/openib_cma/dapl_ib_util.c @@ -963,8 +963,10 @@ DAT_NAMED_ATTR ib_attrs[] = { #define SPEC_ATTR_SIZE( x ) (sizeof( x ) / sizeof( DAT_NAMED_ATTR)) void dapls_query_provider_specific_attr( + IN DAPL_IA *ia_ptr, IN DAT_PROVIDER_ATTR *attr_ptr ) { attr_ptr->num_provider_specific_attr = SPEC_ATTR_SIZE(ib_attrs); attr_ptr->provider_specific_attr = ib_attrs; } + diff --git a/dapl/openib_scm/dapl_ib_util.c b/dapl/openib_scm/dapl_ib_util.c index 58c9943..92b45d5 100644 --- a/dapl/openib_scm/dapl_ib_util.c +++ b/dapl/openib_scm/dapl_ib_util.c @@ -76,6 +76,18 @@ enum ibv_mtu dapl_ib_mtu(int mtu) } } +char *dapl_ib_mtu_str(enum ibv_mtu mtu) +{ + switch (mtu) { + case IBV_MTU_256: return "256"; + case IBV_MTU_512: return "512"; + case IBV_MTU_1024: return "1024"; + case IBV_MTU_2048: return "2048"; + case IBV_MTU_4096: return "4096"; + default: return "1024"; + } +} + /* just get IP address for hostname */ DAT_RETURN getipaddr( char *addr, int addr_len) { @@ -475,6 +487,11 @@ DAT_RETURN dapls_ib_query_hca ( DAPL_MAX(dev_attr.local_ca_ack_delay, hca_ptr->ib_trans.ack_timer); + /* set MTU in transport specific named attribute */ + hca_ptr->ib_trans.named_attr.name = "DAT_IB_TRANSPORT_MTU"; + hca_ptr->ib_trans.named_attr.value = + 
dapl_ib_mtu_str(hca_ptr->ib_trans.mtu); + dapl_dbg_log (DAPL_DBG_TYPE_UTIL, " query_hca: (%x.%x) ep %d ep_q %d evd %d" " evd_q %d mtu %d\n", @@ -588,6 +605,9 @@ DAT_RETURN dapls_ib_setup_async_callback ( * void */ DAT_NAMED_ATTR ib_attrs[] = { + { + "DAT_IB_TRANSPORT_MTU", "1024" + }, #ifdef DAT_EXTENSIONS { "DAT_EXTENSION_INTERFACE", "TRUE" @@ -610,9 +630,15 @@ DAT_NAMED_ATTR ib_attrs[] = { #define SPEC_ATTR_SIZE( x ) (sizeof( x ) / sizeof( DAT_NAMED_ATTR)) void dapls_query_provider_specific_attr( + IN DAPL_IA *ia_ptr, IN DAT_PROVIDER_ATTR *attr_ptr ) { attr_ptr->num_provider_specific_attr = SPEC_ATTR_SIZE(ib_attrs); - attr_ptr->provider_specific_attr = ib_attrs; + attr_ptr->provider_specific_attr = ib_attrs; + + /* set MTU to actual settings */ + ib_attrs[0].value = ia_ptr->hca_ptr->ib_trans.named_attr.value; } + + diff --git a/dapl/openib_scm/dapl_ib_util.h b/dapl/openib_scm/dapl_ib_util.h index f0230b8..863da2b 100644 --- a/dapl/openib_scm/dapl_ib_util.h +++ b/dapl/openib_scm/dapl_ib_util.h @@ -197,6 +197,7 @@ typedef struct ibv_comp_channel *ib_wait_obj_handle_t; #define IB_MAX_REJ_PDATA_SIZE 148 #define IB_MAX_DREQ_PDATA_SIZE 220 #define IB_MAX_DREP_PDATA_SIZE 224 +#define IB_MAX_RTU_PDATA_SIZE 224 /* DTO OPs, ordered for DAPL ENUM definitions */ #define OP_RDMA_WRITE IBV_WR_RDMA_WRITE @@ -307,6 +308,7 @@ typedef struct _ib_hca_transport uint8_t hop_limit; uint8_t tclass; uint8_t mtu; + DAT_NAMED_ATTR named_attr; } ib_hca_transport_t; /* provider specfic fields for shared memory support */ diff --git a/dat/include/dat2/dat_ib_extensions.h b/dat/include/dat2/dat_ib_extensions.h index 27af51e..eb10714 100755 --- a/dat/include/dat2/dat_ib_extensions.h +++ b/dat/include/dat2/dat_ib_extensions.h @@ -108,8 +108,9 @@ typedef enum dat_ib_ext_type DAT_IB_RECV_IMMED_DATA, // 4 DAT_IB_UD_CONNECT_REQUEST, // 5 DAT_IB_UD_REMOTE_AH, // 6 - DAT_IB_UD_SEND, // 7 - DAT_IB_UD_RECV // 8 + DAT_IB_UD_PASSIVE_REMOTE_AH, // 7 + DAT_IB_UD_SEND, // 8 + DAT_IB_UD_RECV // 9 } 
DAT_IB_EXT_TYPE; -- 1.5.2.5 From arlin.r.davis at intel.com Wed Sep 24 13:44:01 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Wed, 24 Sep 2008 13:44:01 -0700 Subject: [ofa-general] [PATCH 2/5][v2.0] dapl: fixes for IB UD extensions in common code and socket cm provider. Message-ID: <000201c91e86$466cdbc0$db97070a@amr.corp.intel.com> - Manage EP states base on attribute service type. - Allow multiple connections (remote_ah resolution) and accepts on UD type endpoints. - Supply private data on CR conn establishment - Add UD extension conn event type - DAT_IB_UD_PASSIVE_REMOTE_AH Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/common/dapl_cr_accept.c | 8 +++-- dapl/common/dapl_ep_connect.c | 3 +- dapl/common/dapl_ep_disconnect.c | 3 +- dapl/openib_scm/dapl_ib_cm.c | 37 ++++++++++------------ dapl/openib_scm/dapl_ib_qp.c | 63 ++++++++++++++++++++++++------------- 5 files changed, 67 insertions(+), 47 deletions(-) diff --git a/dapl/common/dapl_cr_accept.c b/dapl/common/dapl_cr_accept.c index e221fdc..1059e0c 100644 --- a/dapl/common/dapl_cr_accept.c +++ b/dapl/common/dapl_cr_accept.c @@ -123,15 +123,17 @@ dapl_cr_accept ( if ( ep_handle == NULL ) { ep_handle = cr_ptr->param.local_ep_handle; - if ( (((DAPL_EP *) ep_handle)->param.ep_state != DAT_EP_STATE_TENTATIVE_CONNECTION_PENDING) - && (((DAPL_EP *)ep_handle)->param.ep_state != DAT_EP_STATE_PASSIVE_CONNECTION_PENDING) ) + if ( ((((DAPL_EP *) ep_handle)->param.ep_state != DAT_EP_STATE_TENTATIVE_CONNECTION_PENDING) + && (((DAPL_EP *)ep_handle)->param.ep_state != DAT_EP_STATE_PASSIVE_CONNECTION_PENDING)) && + (((DAPL_EP *)ep_handle)->param.ep_attr.service_type == DAT_SERVICE_TYPE_RC)) { return DAT_INVALID_STATE; } } else { /* ensure this EP isn't connected or in use*/ - if ( ((DAPL_EP *) ep_handle)->param.ep_state != DAT_EP_STATE_UNCONNECTED ) + if ( (((DAPL_EP *)ep_handle)->param.ep_state != DAT_EP_STATE_UNCONNECTED) && + (((DAPL_EP *)ep_handle)->param.ep_attr.service_type == 
DAT_SERVICE_TYPE_RC)) { return DAT_INVALID_STATE; } diff --git a/dapl/common/dapl_ep_connect.c b/dapl/common/dapl_ep_connect.c index f290ebe..0c3f10a 100755 --- a/dapl/common/dapl_ep_connect.c +++ b/dapl/common/dapl_ep_connect.c @@ -234,7 +234,8 @@ dapl_ep_connect ( } } - if ( ep_ptr->param.ep_state != DAT_EP_STATE_UNCONNECTED ) + if (ep_ptr->param.ep_state != DAT_EP_STATE_UNCONNECTED && + ep_ptr->param.ep_attr.service_type == DAT_SERVICE_TYPE_RC) { dapl_os_unlock ( &ep_ptr->header.lock ); dat_status = DAT_ERROR (DAT_INVALID_STATE, dapls_ep_state_subtype (ep_ptr)); diff --git a/dapl/common/dapl_ep_disconnect.c b/dapl/common/dapl_ep_disconnect.c index 0c1dd38..fabce92 100644 --- a/dapl/common/dapl_ep_disconnect.c +++ b/dapl/common/dapl_ep_disconnect.c @@ -95,7 +95,8 @@ dapl_ep_disconnect ( dapl_os_lock ( &ep_ptr->header.lock ); /* Disconnecting a disconnected EP is a no-op. */ - if ( ep_ptr->param.ep_state == DAT_EP_STATE_DISCONNECTED ) + if (ep_ptr->param.ep_state == DAT_EP_STATE_DISCONNECTED || + ep_ptr->param.ep_attr.service_type != DAT_SERVICE_TYPE_RC) { dapl_os_unlock ( &ep_ptr->header.lock ); dat_status = DAT_SUCCESS; diff --git a/dapl/openib_scm/dapl_ib_cm.c b/dapl/openib_scm/dapl_ib_cm.c index cf5891d..5a6aa97 100644 --- a/dapl/openib_scm/dapl_ib_cm.c +++ b/dapl/openib_scm/dapl_ib_cm.c @@ -108,7 +108,7 @@ static void dapli_cm_destroy(struct ib_cm_handle *cm_ptr) dapl_os_lock(&cm_ptr->lock); cm_ptr->state = SCM_DESTROY; - if (cm_ptr->ep) + if ((cm_ptr->ep) && (cm_ptr->ep->cm_handle == cm_ptr)) cm_ptr->ep->cm_handle = IB_INVALID_HANDLE; /* close socket if still active */ @@ -185,18 +185,20 @@ dapli_socket_disconnect(dp_ib_cm_handle_t cm_ptr) } dapl_os_unlock(&cm_ptr->lock); - - if (ep_ptr->cr_ptr) { - dapls_cr_callback(cm_ptr, - IB_CME_DISCONNECTED, - NULL, - ((DAPL_CR *)ep_ptr->cr_ptr)->sp_ptr); - } else { - dapl_evd_connection_callback(ep_ptr->cm_handle, - IB_CME_DISCONNECTED, - NULL, - ep_ptr); - } + /* disconnect events for RC's only */ + if 
(ep_ptr->param.ep_attr.service_type == DAT_SERVICE_TYPE_RC) { + if (ep_ptr->cr_ptr) { + dapls_cr_callback(cm_ptr, + IB_CME_DISCONNECTED, + NULL, + ((DAPL_CR *)ep_ptr->cr_ptr)->sp_ptr); + } else { + dapl_evd_connection_callback(ep_ptr->cm_handle, + IB_CME_DISCONNECTED, + NULL, + ep_ptr); + } + } /* scheduled destroy via disconnect clean in callback */ return DAT_SUCCESS; @@ -477,7 +479,6 @@ dapli_socket_connect_rtu(dp_ib_cm_handle_t cm_ptr) ep_ptr->param.remote_ia_address_ptr)->sin_addr)); goto bail; } - if (dapls_modify_qp_state(ep_ptr->qp_handle, IBV_QPS_RTS, cm_ptr) != DAT_SUCCESS) { dapl_log(DAPL_DBG_TYPE_ERR, @@ -487,9 +488,6 @@ dapli_socket_connect_rtu(dp_ib_cm_handle_t cm_ptr) ep_ptr->param.remote_ia_address_ptr)->sin_addr)); goto bail; } - - ep_ptr->qp_state = IB_QP_STATE_RTS; - dapl_dbg_log(DAPL_DBG_TYPE_EP," connect_rtu: send RTU\n"); /* complete handshake after final QP state change */ @@ -801,7 +799,6 @@ dapli_socket_accept_usr(DAPL_EP *ep_ptr, &cm_ptr->dst.ia_address)->sin_addr)); goto bail; } - ep_ptr->qp_state = IB_QP_STATE_RTS; /* save remote address information */ dapl_os_memcpy( &ep_ptr->remote_ia_address, @@ -911,7 +908,7 @@ dapli_socket_accept_rtu(dp_ib_cm_handle_t cm_ptr) /* post EVENT, modify_qp created ah */ xevent.status = 0; - xevent.type = DAT_IB_UD_REMOTE_AH; + xevent.type = DAT_IB_UD_PASSIVE_REMOTE_AH; xevent.remote_ah.ah = cm_ptr->ah; xevent.remote_ah.qpn = cm_ptr->dst.qpn; dapl_os_memcpy( &xevent.remote_ah.ia_addr, @@ -1529,7 +1526,7 @@ void cr_thread(void *arg) else dapli_socket_connected(cr,errno); } else { - dapl_log(DAPL_DBG_TYPE_WARN, + dapl_log(DAPL_DBG_TYPE_CM, " CM poll ERR, wrong state(%d) -> %s SKIP\n", cr->state, inet_ntoa(((struct sockaddr_in*) diff --git a/dapl/openib_scm/dapl_ib_qp.c b/dapl/openib_scm/dapl_ib_qp.c index 4fae307..9c8a881 100644 --- a/dapl/openib_scm/dapl_ib_qp.c +++ b/dapl/openib_scm/dapl_ib_qp.c @@ -156,7 +156,6 @@ dapls_ib_qp_alloc ( return DAT_INTERNAL_ERROR; } - ep_ptr->qp_state = IB_QP_STATE_INIT; 
return DAT_SUCCESS; } @@ -193,7 +192,6 @@ dapls_ib_qp_free ( return(dapl_convert_errno(errno,"destroy_qp")); ep_ptr->qp_handle = IB_INVALID_HANDLE; - ep_ptr->qp_state = IB_QP_STATE_ERROR; } return DAT_SUCCESS; @@ -241,7 +239,6 @@ dapls_ib_qp_modify ( /* move to error state if necessary */ if ((ep_ptr->qp_state == IB_QP_STATE_ERROR) && (ep_ptr->qp_handle->state != IBV_QPS_ERR)) { - ep_ptr->qp_state = IB_QP_STATE_ERROR; return (dapls_modify_qp_state(ep_ptr->qp_handle, IBV_QPS_ERR, NULL)); } @@ -295,16 +292,17 @@ void dapls_ib_reinit_ep ( IN DAPL_EP *ep_ptr) { - if ( ep_ptr->qp_handle != IB_INVALID_HANDLE ) { + if (ep_ptr->qp_handle != IB_INVALID_HANDLE && + ep_ptr->qp_handle->qp_type != IBV_QPT_UD) { /* move to RESET state and then to INIT */ dapls_modify_qp_state(ep_ptr->qp_handle, IBV_QPS_RESET, 0); dapls_modify_qp_state(ep_ptr->qp_handle, IBV_QPS_INIT, 0); - ep_ptr->qp_state = IB_QP_STATE_INIT; } } /* * Generic QP modify for init, reset, error, RTS, RTR + * For UD, create_ah on RTR, qkey on INIT */ DAT_RETURN dapls_modify_qp_state ( IN ib_qp_handle_t qp_handle, @@ -316,20 +314,21 @@ dapls_modify_qp_state ( IN ib_qp_handle_t qp_handle, DAPL_EP *ep_ptr = (DAPL_EP*)qp_handle->qp_context; DAPL_IA *ia_ptr = ep_ptr->header.owner_ia; ib_qp_cm_t *qp_cm = &cm_ptr->dst; + int ret; dapl_os_memzero((void*)&qp_attr, sizeof(qp_attr)); qp_attr.qp_state = qp_state; - switch (qp_state) { /* additional attributes with RTR and RTS */ case IBV_QPS_RTR: { dapl_dbg_log(DAPL_DBG_TYPE_EP, - " QPS_RTR: type %d qpn %x lid %x" - " port %x\n", - qp_handle->qp_type, - qp_cm->qpn, qp_cm->lid, qp_cm->port); - + " QPS_RTR: type %d state %d qpn %x lid %x" + " port %x ep %p qp_state %d\n", + qp_handle->qp_type, qp_handle->qp_type, + qp_cm->qpn, qp_cm->lid, qp_cm->port, + ep_ptr, ep_ptr->qp_state); + mask |= IBV_QP_AV | IBV_QP_PATH_MTU | IBV_QP_DEST_QPN | @@ -337,7 +336,6 @@ dapls_modify_qp_state ( IN ib_qp_handle_t qp_handle, IBV_QP_MAX_DEST_RD_ATOMIC | IBV_QP_MIN_RNR_TIMER; - qp_attr.qp_state = 
IBV_QPS_RTR; qp_attr.dest_qp_num = qp_cm->qpn; qp_attr.rq_psn = 1; qp_attr.path_mtu = @@ -346,8 +344,8 @@ dapls_modify_qp_state ( IN ib_qp_handle_t qp_handle, ep_ptr->param.ep_attr.max_rdma_read_out; qp_attr.min_rnr_timer = ia_ptr->hca_ptr->ib_trans.rnr_timer; - - /* address handle */ + + /* address handle. RC and UD */ qp_attr.ah_attr.dlid = qp_cm->lid; if (ia_ptr->hca_ptr->ib_trans.global) { qp_attr.ah_attr.is_global = 1; @@ -372,19 +370,24 @@ dapls_modify_qp_state ( IN ib_qp_handle_t qp_handle, if (!cm_ptr->ah) return(dapl_convert_errno(errno, "ibv_ah")); + + /* already RTR, multi remote AH's on QP */ + if (ep_ptr->qp_state == IBV_QPS_RTR || + ep_ptr->qp_state == IBV_QPS_RTS) + return DAT_SUCCESS; } #endif break; } case IBV_QPS_RTS: { - mask |= IBV_QP_SQ_PSN; + /* RC only */ if (qp_handle->qp_type == IBV_QPT_RC) { - mask |= IBV_QP_TIMEOUT | + mask |= IBV_QP_SQ_PSN | + IBV_QP_TIMEOUT | IBV_QP_RETRY_CNT | IBV_QP_RNR_RETRY | IBV_QP_MAX_QP_RD_ATOMIC; - qp_attr.timeout = ia_ptr->hca_ptr->ib_trans.ack_timer; qp_attr.retry_cnt = @@ -394,15 +397,25 @@ dapls_modify_qp_state ( IN ib_qp_handle_t qp_handle, qp_attr.max_rd_atomic = ep_ptr->param.ep_attr.max_rdma_read_out; } + /* RC and UD */ qp_attr.qp_state = IBV_QPS_RTS; qp_attr.sq_psn = 1; dapl_dbg_log(DAPL_DBG_TYPE_EP, " QPS_RTS: psn %x rd_atomic %d ack %d " - " retry %d rnr_retry %d\n", + " retry %d rnr_retry %d ep %p qp_state %d\n", qp_attr.sq_psn, qp_attr.max_rd_atomic, qp_attr.timeout, qp_attr.retry_cnt, - qp_attr.rnr_retry ); + qp_attr.rnr_retry, ep_ptr, ep_ptr->qp_state); +#ifdef DAT_EXTENSIONS + if (qp_handle->qp_type == IBV_QPT_UD) { + /* already RTS, multi remote AH's on QP */ + if (ep_ptr->qp_state == IBV_QPS_RTS) + return DAT_SUCCESS; + else + mask = IBV_QP_STATE | IBV_QP_SQ_PSN; + } +#endif break; } case IBV_QPS_INIT: @@ -419,6 +432,9 @@ dapls_modify_qp_state ( IN ib_qp_handle_t qp_handle, } #ifdef DAT_EXTENSIONS if (qp_handle->qp_type == IBV_QPT_UD) { + /* already INIT, multi remote AH's on QP */ + if 
(ep_ptr->qp_state == IBV_QPS_INIT) + return DAT_SUCCESS; mask |= IBV_QP_QKEY; qp_attr.qkey = SCM_UD_QKEY; } @@ -437,10 +453,13 @@ dapls_modify_qp_state ( IN ib_qp_handle_t qp_handle, } - if (ibv_modify_qp(qp_handle, &qp_attr, mask)) + ret = ibv_modify_qp(qp_handle, &qp_attr, mask); + if (ret == 0) { + ep_ptr->qp_state = qp_state; + return DAT_SUCCESS; + } else { return(dapl_convert_errno(errno,"modify_qp_state")); - - return DAT_SUCCESS; + } } /* -- 1.5.2.5 From arlin.r.davis at intel.com Wed Sep 24 13:44:03 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Wed, 24 Sep 2008 13:44:03 -0700 Subject: [ofa-general] [PATCH 3/5][v2.0] dtestx: Add new options to test UD. Message-ID: <000301c91e86$47d9b230$db97070a@amr.corp.intel.com> - many to one/many EP remote AH resolution, data flow - bi-directional EP remote AH resolution, data flow Signed-off by: Arlin Davis ardavis at ichips.intel.com --- test/dtest/dtestx.c | 650 ++++++++++++++++++++++++++++++++++----------------- 1 files changed, 441 insertions(+), 209 deletions(-) diff --git a/test/dtest/dtestx.c b/test/dtest/dtestx.c index fb89364..cfe00cd 100755 --- a/test/dtest/dtestx.c +++ b/test/dtest/dtestx.c @@ -75,6 +75,8 @@ int disconnect_ep(void); dat_strerror(status, &maj_msg, &min_msg);\ fprintf(stderr, str " returned %s : %s\n", maj_msg, min_msg);\ exit(1);\ + } else if (verbose) {\ + printf("dtestx: %s success\n",str);\ }\ } @@ -85,6 +87,8 @@ int disconnect_ep(void); dat_strerror(status, &maj_msg, &min_msg);\ fprintf(stderr, str " returned %s : %s\n", maj_msg, min_msg);\ exit(1);\ + } else if (verbose) {\ + printf("dtestx: %s\n",str);\ }\ } @@ -119,10 +123,14 @@ int disconnect_ep(void); #define ntoh64(x) (x) #endif /* __BYTE_ORDER == __BIG_ENDIAN */ +#define MIN(a, b) ((a < b) ? (a) : (b)) +#define MAX(a, b) ((a > b) ? 
(a) : (b)) + #define DTO_TIMEOUT (1000*1000*5) -#define CONN_TIMEOUT (1000*1000*10) -#define SERVER_TIMEOUT (1000*1000*120) -#define SERVER_CONN_QUAL 31111 +#define CONN_TIMEOUT (1000*1000*30) +#define SERVER_TIMEOUT (DAT_TIMEOUT_INFINITE) +#define CLIENT_ID 31111 +#define SERVER_ID 31112 #define BUF_SIZE 256 #define BUF_SIZE_ATOMIC 8 #define REG_MEM_COUNT 10 @@ -130,6 +138,7 @@ int disconnect_ep(void); #define RCV_RDMA_BUF_INDEX 1 #define SEND_BUF_INDEX 2 #define RECV_BUF_INDEX 3 +#define MAX_EP_COUNT 8 DAT_VADDR *atomic_buf; DAT_LMR_HANDLE lmr_atomic; @@ -137,14 +146,14 @@ DAT_LMR_CONTEXT lmr_atomic_context; DAT_RMR_CONTEXT rmr_atomic_context; DAT_VLEN reg_atomic_size; DAT_VADDR reg_atomic_addr; -DAT_LMR_HANDLE lmr[ REG_MEM_COUNT ]; -DAT_LMR_CONTEXT lmr_context[ REG_MEM_COUNT ]; -DAT_RMR_TRIPLET rmr[ REG_MEM_COUNT ]; -DAT_RMR_CONTEXT rmr_context[ REG_MEM_COUNT ]; -DAT_VLEN reg_size[ REG_MEM_COUNT ]; -DAT_VADDR reg_addr[ REG_MEM_COUNT ]; -DAT_RMR_TRIPLET * buf[ REG_MEM_COUNT ]; -DAT_EP_HANDLE ep; +DAT_LMR_HANDLE lmr[ REG_MEM_COUNT*MAX_EP_COUNT ]; +DAT_LMR_CONTEXT lmr_context[ REG_MEM_COUNT*MAX_EP_COUNT ]; +DAT_RMR_TRIPLET rmr[ REG_MEM_COUNT*MAX_EP_COUNT ]; +DAT_RMR_CONTEXT rmr_context[ REG_MEM_COUNT*MAX_EP_COUNT ]; +DAT_VLEN reg_size[ REG_MEM_COUNT*MAX_EP_COUNT ]; +DAT_VADDR reg_addr[ REG_MEM_COUNT*MAX_EP_COUNT ]; +DAT_RMR_TRIPLET * buf[ REG_MEM_COUNT*MAX_EP_COUNT ]; +DAT_EP_HANDLE ep[MAX_EP_COUNT]; DAT_EVD_HANDLE async_evd = DAT_HANDLE_NULL; DAT_IA_HANDLE ia = DAT_HANDLE_NULL; DAT_PZ_HANDLE pz = DAT_HANDLE_NULL; @@ -152,21 +161,30 @@ DAT_EVD_HANDLE cr_evd = DAT_HANDLE_NULL; DAT_EVD_HANDLE con_evd = DAT_HANDLE_NULL; DAT_EVD_HANDLE dto_evd = DAT_HANDLE_NULL; DAT_PSP_HANDLE psp = DAT_HANDLE_NULL; -DAT_CR_HANDLE cr = DAT_HANDLE_NULL; int server = 1; int ud_test = 0; +int multi_eps = 0; int buf_size = BUF_SIZE; int msg_size = sizeof(DAT_RMR_TRIPLET); char provider[64] = DAPL_PROVIDER; char hostname[256] = { 0 }; -DAT_IB_ADDR_HANDLE remote_ah; +DAT_IB_ADDR_HANDLE 
remote_ah[MAX_EP_COUNT]; +int eps = 1; +int verbose = 0; + +#define LOGPRINTF if (verbose) printf void print_usage(void) { printf("\n dtestx usage \n\n"); + printf("v: verbose\n"); printf("u unreliable datagram test\n"); + printf("U: unreliable datagram test, UD endpoint count\n"); + printf("m unreliable datagram test, multiple Server endpoints\n"); printf("b: buf length to allocate\n"); - printf("h: hostname/address of server, specified on client\n"); + printf("h: hostname/address of Server, specified on Client\n"); + printf("c: Client\n"); + printf("s: Server, default\n"); printf("P: provider name (default = ofa-v2-ib0)\n"); printf("\n"); } @@ -193,6 +211,7 @@ send_msg( DAT_EVENT event; DAT_COUNT nmore; DAT_RETURN status; + int i,ep_idx=0,ah_idx=0; DAT_DTO_COMPLETION_EVENT_DATA *dto_event = &event.event_data.dto_completion_event_data; @@ -200,26 +219,171 @@ send_msg( iov.virtual_address = (DAT_VADDR)data; iov.segment_length = (DAT_VLEN)size; - if (ud_test) - status = dat_ib_post_send_ud(ep, 1, &iov, - &remote_ah, cookie, flags); - else - status = dat_ep_post_send(ep, 1, &iov, cookie, flags); + for (i=0;isin_addr)); + + /* client expects all data in on first EP */ + status = dat_ib_post_send_ud(ep[ep_idx], + 1, + &iov, + &remote_ah[ah_idx], + cookie, + flags); + + } else { + status = dat_ep_post_send(ep[0], 1, &iov, + cookie, flags); + } + _OK(status, "dat_ep_post_send"); + + if (!(flags & DAT_COMPLETION_SUPPRESS_FLAG)) { + status = dat_evd_wait(dto_evd, DTO_TIMEOUT, + 1, &event, &nmore); + _OK(status, "dat_evd_wait after dat_ep_post_send"); + + if (event.event_number != DAT_DTO_COMPLETION_EVENT && + ud_test && event.event_number != + DAT_IB_DTO_EVENT) { + printf("unexpected event waiting post_send " + "completion - 0x%x\n", + event.event_number); + exit(1); + } + _OK(dto_event->status, "event status for post_send"); + } + } +} - _OK(status, "dat_ep_post_send"); +/* RC - Server only, UD - Server and Client, one per EP */ +void process_cr(int idx) +{ + DAT_EVENT 
event; + DAT_COUNT nmore; + DAT_RETURN status; + int pdata; + DAT_CR_HANDLE cr = DAT_HANDLE_NULL; + DAT_CONN_QUAL exp_qual = server?SERVER_ID:CLIENT_ID; + DAT_CR_PARAM cr_param; + DAT_CR_ARRIVAL_EVENT_DATA *cr_event = + &event.event_data.cr_arrival_event_data; + + LOGPRINTF("%s waiting for connect[%d] request\n", + server?"Server":"Client",idx); - if (!(flags & DAT_COMPLETION_SUPPRESS_FLAG)) { - status = dat_evd_wait(dto_evd, DTO_TIMEOUT, - 1, &event, &nmore); - _OK(status, "dat_evd_wait after dat_ep_post_send"); + status = dat_evd_wait(cr_evd, SERVER_TIMEOUT, 1, &event, &nmore); + _OK(status, "CR dat_evd_wait"); - if ((event.event_number != DAT_DTO_COMPLETION_EVENT) && - (ud_test && event.event_number != DAT_IB_DTO_EVENT)) { - printf("unexpected event waiting for post_send " - "completion - 0x%x\n", event.event_number); - exit(1); - } - _OK(dto_event->status, "event status for post_send"); + if (event.event_number != DAT_CONNECTION_REQUEST_EVENT && + (ud_test && event.event_number != + DAT_IB_UD_CONNECTION_REQUEST_EVENT)) { + printf("unexpected event,!conn req: 0x%x\n", + event.event_number); + exit(1); + } + + if ((cr_event->conn_qual != exp_qual) || + (cr_event->sp_handle.psp_handle != psp)) { + printf("wrong cr event data\n"); + exit(1); + } + + cr = cr_event->cr_handle; + status = dat_cr_query(cr, DAT_CSP_FIELD_ALL, &cr_param); + _OK(status, "dat_cr_query"); + + /* use private data to select EP */ + pdata = ntoh32(*((int*)cr_param.private_data)); + + LOGPRINTF("%s recvd pdata=0x%x, send pdata=0x%x\n", + server?"Server":"Client",pdata, + *(int*)cr_param.private_data); + + status = dat_cr_accept(cr, ep[pdata], 4, cr_param.private_data); + _OK(status, "dat_cr_accept"); + + printf("%s accepted CR on EP[%d]=%p\n", + server?"Server":"Client", + pdata, ep[pdata]); +} + +/* RC - Client and Server: 1, UD - Client: 1 per EP, Server: 2 per EP's */ +void process_conn(int idx) +{ + DAT_EVENT event; + DAT_COUNT nmore; + DAT_RETURN status; + int pdata; + 
DAT_IB_EXTENSION_EVENT_DATA *ext_event = + (DAT_IB_EXTENSION_EVENT_DATA *) + &event.event_extension_data[0]; + DAT_CONNECTION_EVENT_DATA *conn_event = + &event.event_data.connect_event_data; + + LOGPRINTF("%s waiting for connect[%d] establishment\n", + server?"Server":"Client",idx); + + status = dat_evd_wait(con_evd, CONN_TIMEOUT, 1, &event, &nmore); + _OK(status, "CONN dat_evd_wait"); + + LOGPRINTF("%s got connect[%d] event, pdata %p sz=%d\n", + server?"Server":"Client",idx, + conn_event->private_data, + conn_event->private_data_size); + + /* Waiting on CR's or CONN_EST */ + if (event.event_number != DAT_CONNECTION_EVENT_ESTABLISHED && + (ud_test && event.event_number != + DAT_IB_UD_CONNECTION_EVENT_ESTABLISHED)) { + printf("unexpected event, !conn established: 0x%x\n", + event.event_number); + exit(1); + } + + /* RC or PASSIVE CONN_EST we are done */ + if (!ud_test) + return; + + /* store each remote_ah according to remote EP index */ + pdata = ntoh32(*((int*)conn_event->private_data)); + LOGPRINTF(" Client got private data=0x%x\n", pdata); + + /* UD, get AH for sends. + * NOTE: bi-directional AH resolution results in a CONN_EST + * for both outbound connect and inbound CR. + * Use Active CONN_EST which includes server's CR + * pdata for remote_ah idx to send on and ignore PASSIVE CONN_EST. 
+ * + * DAT_IB_UD_PASSIVE_REMOTE_AH == passive side CONN_EST + * DAT_IB_UD_REMOTE_AH == active side CONN_EST + */ + if (ext_event->type == DAT_IB_UD_REMOTE_AH) { + remote_ah[pdata] = ext_event->remote_ah; + printf("remote_ah[%d]: ah=%p, qpn=0x%x " + "addr=%s\n", + pdata, remote_ah[pdata].ah, + remote_ah[pdata].qpn, + inet_ntoa(((struct sockaddr_in*) + &remote_ah[pdata].ia_addr)->sin_addr)); + + } else if (ext_event->type != DAT_IB_UD_PASSIVE_REMOTE_AH) { + printf("unexpected UD ext_event type: 0x%x\n", + ext_event->type); + exit(1); } } @@ -235,28 +399,37 @@ connect_ep(char *hostname) DAT_LMR_TRIPLET iov; DAT_RMR_TRIPLET *r_iov; DAT_DTO_COOKIE cookie; - int i; - DAT_CR_ARRIVAL_EVENT_DATA *cr_event = - &event.event_data.cr_arrival_event_data; - DAT_DTO_COMPLETION_EVENT_DATA *dto_event = + DAT_CONN_QUAL conn_qual; + int i,ii,pdata,ctx; + DAT_PROVIDER_ATTR prov_attrs; + DAT_DTO_COMPLETION_EVENT_DATA *dto_event = &event.event_data.dto_completion_event_data; - DAT_IB_EXTENSION_EVENT_DATA *ext_event = - (DAT_IB_EXTENSION_EVENT_DATA *) - &event.event_extension_data[0]; - + status = dat_ia_open(provider, 8, &async_evd, &ia); _OK(status, "dat_ia_open"); - + + memset(&prov_attrs, 0, sizeof(prov_attrs)); + status = dat_ia_query(ia, NULL, 0, NULL, + DAT_PROVIDER_FIELD_ALL, &prov_attrs); + _OK(status, "dat_ia_query"); + + /* Print provider specific attributes */ + for (i=0;iconn_qual != SERVER_CONN_QUAL) || - (cr_event->sp_handle.psp_handle != psp)) { - printf("wrong cr event data\n"); - exit(1); - } - - cr = cr_event->cr_handle; - status = dat_cr_accept(cr, ep, 0, (DAT_PVOID)0); - printf("Server waiting for accept response\n"); - - } else { + } + + /* ud can resolve_ah and connect both ways */ + if (!server || (server && ud_test)) { struct addrinfo *target; if (getaddrinfo (hostname, NULL, NULL, &target) != 0) { @@ -383,52 +559,58 @@ connect_ep(char *hostname) exit(1); } - printf ("Server Name: %s \n", hostname); - printf ("Server Net Address: %s\n", inet_ntoa( - ((struct 
sockaddr_in *)target->ai_addr)->sin_addr)); + printf ("Remote %s Name: %s \n", + server?"Client":"Server", hostname); + printf ("Remote %s Net Address: %s\n", + server?"Client":"Server", + inet_ntoa(((struct sockaddr_in *) + target->ai_addr)->sin_addr)); remote_addr = *((DAT_IA_ADDRESS_PTR)target->ai_addr); freeaddrinfo(target); - strcpy((char*)buf[ SND_RDMA_BUF_INDEX ],"client written data"); - status = dat_ep_connect(ep, - &remote_addr, - SERVER_CONN_QUAL, - CONN_TIMEOUT, - 0, - (DAT_PVOID)0, - 0, - DAT_CONNECT_DEFAULT_FLAG ); - _OK(status, "dat_psp_create"); - printf("Client waiting for connect response\n"); - } - - status = dat_evd_wait(con_evd, CONN_TIMEOUT, 1, &event, &nmore); - _OK(status, "connect dat_evd_wait"); - - if (event.event_number != DAT_CONNECTION_EVENT_ESTABLISHED && - (ud_test && event.event_number != - DAT_IB_UD_CONNECTION_EVENT_ESTABLISHED)) { - printf("unexpected event, !conn established: 0x%x\n", - event.event_number); - exit(1); + strcpy((char*)buf[SND_RDMA_BUF_INDEX],"Client written data"); + + /* one Client EP, multiple Server EPs, same conn_qual + * use private data to select EP on Server + */ + for (i=0;itype == DAT_IB_UD_REMOTE_AH) { - remote_ah = ext_event->remote_ah; - printf(" remote_ah: ah=%p, qpn=0x%x addr=%s\n", - remote_ah.ah, remote_ah.qpn, - inet_ntoa(((struct sockaddr_in *) - &remote_ah.ia_addr)->sin_addr)); - } else { - printf("unexpected UD ext_event type: 0x%x\n", - ext_event->type); - exit(1); - } + for (i=(server?1:0);ivirtual_address = hton64((DAT_VADDR)buf[RCV_RDMA_BUF_INDEX]); r_iov->segment_length = hton32(buf_size); - printf("%d Send RMR msg to remote: r_key_ctx=0x%x,va="F64x",len=0x%x\n", - getpid(), hton32(r_iov->rmr_context), - hton64(r_iov->virtual_address), hton32(r_iov->segment_length)); + printf("Send RMR message: r_key_ctx=0x%x,va="F64x",len=0x%x\n", + hton32(r_iov->rmr_context), + hton64(r_iov->virtual_address), + hton32(r_iov->segment_length)); send_msg(buf[SEND_BUF_INDEX], sizeof(DAT_RMR_TRIPLET), @@ 
-452,44 +635,69 @@ connect_ep(char *hostname) /* * Wait for their RMR */ - printf("Waiting for remote to send RMR data\n"); - status = dat_evd_wait(dto_evd, DTO_TIMEOUT, 1, &event, &nmore); - _OK(status, "dat_evd_wait after dat_ep_post_send"); + for (i=0,ctx=0;istatus, "event status for post_recv"); - - if (dto_event->transfered_length != msg_size || - dto_event->user_cookie.as_64 != RECV_BUF_INDEX) { - printf("unexpected event data for receive: len=%d cookie=%d " - "expected %d/%d\n", - (int)dto_event->transfered_length, - (int)dto_event->user_cookie.as_64, - msg_size, RECV_BUF_INDEX); - exit(1); - } - - /* swap RMR,address info to host order */ - if (ud_test) - r_iov = (DAT_RMR_TRIPLET*)((char*)buf[RECV_BUF_INDEX]+40); - else - r_iov = (DAT_RMR_TRIPLET*)buf[RECV_BUF_INDEX]; - - r_iov->rmr_context = ntoh32(r_iov->rmr_context); - r_iov->virtual_address = ntoh64(r_iov->virtual_address); - r_iov->segment_length = ntoh32(r_iov->segment_length); + status = dat_evd_wait(dto_evd, DTO_TIMEOUT, 1, &event, &nmore); + _OK(status, "dat_evd_wait after dat_ep_post_send"); - printf("%d Received RMR from remote: " - "r_iov: r_key_ctx=%x,va="F64x",len=0x%x\n", - getpid(), r_iov->rmr_context, - r_iov->virtual_address, - r_iov->segment_length); + if ((event.event_number != DAT_DTO_COMPLETION_EVENT) && + (ud_test && event.event_number != DAT_IB_DTO_EVENT)) { + printf("unexpected event waiting for RMR context " + "- 0x%x\n", event.event_number); + exit(1); + } + _OK(dto_event->status, "event status for post_recv"); + + /* careful when checking cookies: + * Client - receiving multi messages on a single EP + * Server - not receiving on multiple EP's + */ + if (!server || (server && !multi_eps)) { + if (dto_event->transfered_length != msg_size || + dto_event->user_cookie.as_64 != ctx) { + printf("unexpected event data on recv: len=%d" + " cookie="F64x" expected %d/%d\n", + (int)dto_event->transfered_length, + dto_event->user_cookie.as_64, + msg_size, ctx); + exit(1); + } + /* Server - 
receiving one message each across many EP's */ + } else { + if (dto_event->transfered_length != msg_size || + dto_event->user_cookie.as_64 != RECV_BUF_INDEX) { + printf("unexpected event data on recv: len=%d" + "cookie="F64x" expected %d/%d\n", + (int)dto_event->transfered_length, + dto_event->user_cookie.as_64, + msg_size, RECV_BUF_INDEX); + exit(1); + } + } + /* swap RMR,address info to host order */ + if (!server || (server && !multi_eps)) + r_iov = (DAT_RMR_TRIPLET*)buf[ctx]; + else + r_iov = (DAT_RMR_TRIPLET*)buf[(i*REG_MEM_COUNT)+RECV_BUF_INDEX]; + + if (ud_test) + r_iov = (DAT_RMR_TRIPLET*)((char*)r_iov + 40); + + r_iov->rmr_context = ntoh32(r_iov->rmr_context); + r_iov->virtual_address = ntoh64(r_iov->virtual_address); + r_iov->segment_length = ntoh32(r_iov->segment_length); + + printf("Recv RMR message: r_iov(%p):" + " r_key_ctx=%x,va="F64x",len=0x%x on EP=%p\n", + r_iov, r_iov->rmr_context, + r_iov->virtual_address, + r_iov->segment_length, + dto_event->ep_handle); + } return(0); } @@ -497,42 +705,43 @@ int disconnect_ep(void) { DAT_RETURN status; - int i; DAT_EVENT event; DAT_COUNT nmore; + int i; - status = dat_ep_disconnect(ep, DAT_CLOSE_DEFAULT); - _OK2(status, "dat_ep_disconnect"); - - status = dat_evd_wait(con_evd, DAT_TIMEOUT_INFINITE, 1, - &event, &nmore); - _OK(status, "dat_evd_wait"); - - if (server) { + if (!ud_test) { + status = dat_ep_disconnect(ep[0], DAT_CLOSE_DEFAULT); + _OK2(status, "dat_ep_disconnect"); + + status = dat_evd_wait(con_evd, DAT_TIMEOUT_INFINITE, 1, + &event, &nmore); + _OK(status, "dat_evd_wait"); + } + if (psp) { status = dat_psp_free(psp); _OK2(status, "dat_psp_free"); } - - for (i = 0; i < REG_MEM_COUNT; i++) { + for (i = 0; i < REG_MEM_COUNT*eps; i++) { status = dat_lmr_free(lmr[ i ]); _OK2(status, "dat_lmr_free"); } - if (lmr_atomic) { status = dat_lmr_free(lmr_atomic); _OK2(status, "dat_lmr_free_atomic"); } - - status = dat_ep_free(ep); - _OK2(status, "dat_ep_free"); - + for (i=0;i add locking around the modify_qp 
state changes to avoid unnecessary modify_qp calls during multiple resolve remote AH connection events on a single EP. Signed-off-by: Arlin Davis --- dapl/openib_scm/dapl_ib_cm.c | 10 +++++++++- 1 files changed, 9 insertions(+), 1 deletions(-) diff --git a/dapl/openib_scm/dapl_ib_cm.c b/dapl/openib_scm/dapl_ib_cm.c index 5a6aa97..80a7d5e 100644 --- a/dapl/openib_scm/dapl_ib_cm.c +++ b/dapl/openib_scm/dapl_ib_cm.c @@ -470,6 +470,7 @@ dapli_socket_connect_rtu(dp_ib_cm_handle_t cm_ptr) } /* modify QP to RTR and then to RTS with remote info */ + dapl_os_lock(&ep_ptr->header.lock); if (dapls_modify_qp_state(ep_ptr->qp_handle, IBV_QPS_RTR, cm_ptr) != DAT_SUCCESS) { dapl_log(DAPL_DBG_TYPE_ERR, @@ -477,6 +478,7 @@ dapli_socket_connect_rtu(dp_ib_cm_handle_t cm_ptr) strerror(errno), inet_ntoa(((struct sockaddr_in *) ep_ptr->param.remote_ia_address_ptr)->sin_addr)); + dapl_os_unlock(&ep_ptr->header.lock); goto bail; } if (dapls_modify_qp_state(ep_ptr->qp_handle, @@ -486,8 +488,10 @@ dapli_socket_connect_rtu(dp_ib_cm_handle_t cm_ptr) strerror(errno), inet_ntoa(((struct sockaddr_in *) ep_ptr->param.remote_ia_address_ptr)->sin_addr)); + dapl_os_unlock(&ep_ptr->header.lock); goto bail; } + dapl_os_unlock(&ep_ptr->header.lock); dapl_dbg_log(DAPL_DBG_TYPE_EP," connect_rtu: send RTU\n"); /* complete handshake after final QP state change */ @@ -781,6 +785,7 @@ dapli_socket_accept_usr(DAPL_EP *ep_ptr, #endif /* modify QP to RTR and then to RTS with remote info already read */ + dapl_os_lock(&ep_ptr->header.lock); if (dapls_modify_qp_state(ep_ptr->qp_handle, IBV_QPS_RTR, cm_ptr) != DAT_SUCCESS) { dapl_log(DAPL_DBG_TYPE_ERR, @@ -788,6 +793,7 @@ dapli_socket_accept_usr(DAPL_EP *ep_ptr, strerror(errno), inet_ntoa(((struct sockaddr_in *) &cm_ptr->dst.ia_address)->sin_addr)); + dapl_os_unlock(&ep_ptr->header.lock); goto bail; } if (dapls_modify_qp_state(ep_ptr->qp_handle, @@ -797,9 +803,11 @@ dapli_socket_accept_usr(DAPL_EP *ep_ptr, strerror(errno), inet_ntoa(((struct sockaddr_in *) 
&cm_ptr->dst.ia_address)->sin_addr)); + dapl_os_unlock(&ep_ptr->header.lock); goto bail; } - + dapl_os_unlock(&ep_ptr->header.lock); + /* save remote address information */ dapl_os_memcpy( &ep_ptr->remote_ia_address, &cm_ptr->dst.ia_address, -- 1.5.2.5 From arlin.r.davis at intel.com Wed Sep 24 13:44:08 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Wed, 24 Sep 2008 13:44:08 -0700 Subject: [ofa-general] [PATCH 5/5][v2.0] build/install: $(DESTDIR) prepend needed on install hooks for dat.conf Message-ID: <000501c91e86$4ac64ad0$db97070a@amr.corp.intel.com> All install directives that automake creates automatically have $(DESTDIR) prepended to them so that a make DESTDIR= install will work. The hand written install hooks for dat.conf was missing DESTDIR. Signed-off-by: Doug Ledford --- Makefile.am | 33 ++++++++++++++++++--------------- dapl.spec.in | 1 + 2 files changed, 19 insertions(+), 15 deletions(-) diff --git a/Makefile.am b/Makefile.am index 4cb339f..8afa666 100755 --- a/Makefile.am +++ b/Makefile.am @@ -400,24 +400,27 @@ dist-hook: dapl.spec cp dapl.spec $(distdir) install-exec-hook: - if test -e $(sysconfdir)/dat.conf; then \ - sed -e '/ofa-v2-.* u2/d' < $(sysconfdir)/dat.conf > /tmp/$$$$ofadapl; \ - cp /tmp/$$$$ofadapl $(sysconfdir)/dat.conf; \ + if ! 
test -d $(DESTDIR)$(sysconfdir); then \ + mkdir -p $(DESTDIR)$(sysconfdir); \ fi; \ - echo ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 '"ib0 0" ""' >> $(sysconfdir)/dat.conf; \ - echo ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 '"ib1 0" ""' >> $(sysconfdir)/dat.conf; \ - echo ofa-v2-mthca0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 '"mthca0 1" ""' >> $(sysconfdir)/dat.conf; \ - echo ofa-v2-mthca0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 '"mthca0 2" ""' >> $(sysconfdir)/dat.conf; \ - echo ofa-v2-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 '"mlx4_0 1" ""' >> $(sysconfdir)/dat.conf; \ - echo ofa-v2-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 '"mlx4_0 2" ""' >> $(sysconfdir)/dat.conf; - echo ofa-v2-ipath0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 '"ipath0 1" ""' >> $(sysconfdir)/dat.conf; \ - echo ofa-v2-ipath0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 '"ipath0 2" ""' >> $(sysconfdir)/dat.conf; - echo ofa-v2-ehca0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 '"ehca0 1" ""' >> $(sysconfdir)/dat.conf; + if test -e $(DESTDIR)$(sysconfdir)/dat.conf; then \ + sed -e '/ofa-v2-.* u2/d' < $(DESTDIR)$(sysconfdir)/dat.conf > /tmp/$$$$ofadapl; \ + cp /tmp/$$$$ofadapl $(DESTDIR)$(sysconfdir)/dat.conf; \ + fi; \ + echo ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 '"ib0 0" ""' >> $(DESTDIR)$(sysconfdir)/dat.conf; \ + echo ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 '"ib1 0" ""' >> $(DESTDIR)$(sysconfdir)/dat.conf; \ + echo ofa-v2-mthca0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 '"mthca0 1" ""' >> $(DESTDIR)$(sysconfdir)/dat.conf; \ + echo ofa-v2-mthca0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 '"mthca0 2" ""' >> $(DESTDIR)$(sysconfdir)/dat.conf; \ + echo ofa-v2-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 '"mlx4_0 1" ""' >> $(DESTDIR)$(sysconfdir)/dat.conf; \ + echo 
ofa-v2-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 '"mlx4_0 2" ""' >> $(DESTDIR)$(sysconfdir)/dat.conf; + echo ofa-v2-ipath0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 '"ipath0 1" ""' >> $(DESTDIR)$(sysconfdir)/dat.conf; \ + echo ofa-v2-ipath0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 '"ipath0 2" ""' >> $(DESTDIR)$(sysconfdir)/dat.conf; + echo ofa-v2-ehca0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 '"ehca0 1" ""' >> $(DESTDIR)$(sysconfdir)/dat.conf; uninstall-hook: - if test -e $(sysconfdir)/dat.conf; then \ - sed -e '/ofa-v2-.* u2/d' < $(sysconfdir)/dat.conf > /tmp/$$$$ofadapl; \ - cp /tmp/$$$$ofadapl $(sysconfdir)/dat.conf; \ + if test -e $(DESTDIR)$(sysconfdir)/dat.conf; then \ + sed -e '/ofa-v2-.* u2/d' < $(DESTDIR)$(sysconfdir)/dat.conf > /tmp/$$$$ofadapl; \ + cp /tmp/$$$$ofadapl $(DESTDIR)$(sysconfdir)/dat.conf; \ fi; SUBDIRS = . test/dtest test/dapltest diff --git a/dapl.spec.in b/dapl.spec.in index b33271b..ce39cd9 100644 --- a/dapl.spec.in +++ b/dapl.spec.in @@ -83,6 +83,7 @@ rm -rf %{buildroot} make DESTDIR=%{buildroot} install # remove unpackaged files from the buildroot rm -f %{buildroot}%{_libdir}/*.la +rm -f %{buildroot}%{_sysconfdir}/*.conf %clean rm -rf %{buildroot} -- 1.5.2.5 From arlin.r.davis at intel.com Wed Sep 24 14:23:08 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Wed, 24 Sep 2008 14:23:08 -0700 Subject: [ofa-general] [PATCH][v1.2] build/install: $(DESTDIR) prepend needed on install hooks for dat.conf Message-ID: <000601c91e8b$bdbedf20$db97070a@amr.corp.intel.com> All install directives that automake creates automatically have $(DESTDIR) prepended to them so that a make DESTDIR= install will work. The hand written install hooks for dat.conf was missing DESTDIR. 
Signed-off-by: Doug Ledford Signed-off-by: Arlin Davis --- Makefile.am | 27 +++++++++++++++------------ dapl.spec.in | 1 + 2 files changed, 16 insertions(+), 12 deletions(-) diff --git a/Makefile.am b/Makefile.am index 29e6b3b..1dd996c 100644 --- a/Makefile.am +++ b/Makefile.am @@ -383,22 +383,25 @@ dist-hook: dapl.spec cp dapl.spec $(distdir) install-exec-hook: - if test -e $(sysconfdir)/dat.conf; then \ + if ! test -d $(DESTDIR)$(sysconfdir); then \ + mkdir -p $(DESTDIR)$(sysconfdir); \ + fi; \ + if test -e $(DESTDIR)$(sysconfdir)/dat.conf; then \ echo "exec-hook"; \ - sed -e '/OpenIB-.*/d' < $(sysconfdir)/dat.conf > /tmp/$$$$OpenIBdapl; \ - cp /tmp/$$$$OpenIBdapl $(sysconfdir)/dat.conf; \ + sed -e '/OpenIB-.*/d' < $(DESTDIR)$(sysconfdir)/dat.conf > /tmp/$$$$OpenIBdapl; \ + cp /tmp/$$$$OpenIBdapl $(DESTDIR)$(sysconfdir)/dat.conf; \ fi; \ - echo OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 '"ib0 0" ""' >> $(sysconfdir)/dat.conf; \ - echo OpenIB-cma-1 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 '"ib1 0" ""' >> $(sysconfdir)/dat.conf; \ - echo OpenIB-mthca0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 '"mthca0 1" ""' >> $(sysconfdir)/dat.conf; \ - echo OpenIB-mthca0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 '"mthca0 2" ""' >> $(sysconfdir)/dat.conf; \ - echo OpenIB-mlx4_0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 '"mlx4_0 1" ""' >> $(sysconfdir)/dat.conf; \ - echo OpenIB-mlx4_0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 '"mlx4_0 2" ""' >> $(sysconfdir)/dat.conf; + echo OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 '"ib0 0" ""' >> $(DESTDIR)$(sysconfdir)/dat.conf; \ + echo OpenIB-cma-1 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 '"ib1 0" ""' >> $(DESTDIR)$(sysconfdir)/dat.conf; \ + echo OpenIB-mthca0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 '"mthca0 1" ""' >> $(DESTDIR)$(sysconfdir)/dat.conf; \ + echo OpenIB-mthca0-2 u1.2 nonthreadsafe default libdaplscm.so.1 
dapl.1.2 '"mthca0 2" ""' >> $(DESTDIR)$(sysconfdir)/dat.conf; \ + echo OpenIB-mlx4_0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 '"mlx4_0 1" ""' >> $(DESTDIR)$(sysconfdir)/dat.conf; \ + echo OpenIB-mlx4_0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 '"mlx4_0 2" ""' >> $(DESTDIR)$(sysconfdir)/dat.conf; uninstall-hook: - if test -e $(sysconfdir)/dat.conf; then \ - sed -e '/OpenIB-.* u1/d' < $(sysconfdir)/dat.conf > /tmp/$$$$OpenIBdapl; \ - cp /tmp/$$$$OpenIBdapl $(sysconfdir)/dat.conf; \ + if test -e $(DESTDIR)$(sysconfdir)/dat.conf; then \ + sed -e '/OpenIB-.* u1/d' < $(DESTDIR)$(sysconfdir)/dat.conf > /tmp/$$$$OpenIBdapl; \ + cp /tmp/$$$$OpenIBdapl $(DESTDIR)$(sysconfdir)/dat.conf; \ fi; SUBDIRS = . test/dtest test/dapltest diff --git a/dapl.spec.in b/dapl.spec.in index 8c8f62a..b3d103e 100644 --- a/dapl.spec.in +++ b/dapl.spec.in @@ -83,6 +83,7 @@ rm -rf %{buildroot} make DESTDIR=%{buildroot} install # remove unpackaged files from the buildroot rm -f %{buildroot}%{_libdir}/*.la +rm -f %{buildroot}%{_sysconfdir}/*.conf %clean rm -rf %{buildroot} -- 1.5.2.5 From sdake at redhat.com Wed Sep 24 15:43:56 2008 From: sdake at redhat.com (Steven Dake) Date: Wed, 24 Sep 2008 15:43:56 -0700 Subject: [ofa-general] general questions about librdmacm Message-ID: <1222296236.29287.17.camel@balance> Developers, I am a maintainer of a project called openais/corosync (www.openais.org) which implements a network protocol called Totem. This code is the basis for much of the community work on clustering in Linux and other platforms. I don't yet have hardware, but will shortly and intend to add OFED RDMA support to the base Totem protocol used for our communications. Totem is a reliable virtual synchrony multicast protocol which transmits a message from any node to all nodes in a collection of computers (called the configuration or membership). 
It has a few requirements:

unreliable datagram multicast
unreliable datagram unicast
ability to bind to a specific port and interface
ability to poll() (POLLIN) via system call for new multicast datagram messages

Today Totem is based upon IP (v4 or v6 are supported) and uses UDP.

A few questions:

1) I would like to continue to use IP addressing but it looks like I have
to use a different addressing model in librdmacm. I looked at the
examples in the library and it isn't clear to me whether they use IP
addressing or some other addressing model. I see references to IPoverIB
but I don't see any information in the wiki on the topic. Anyone have
links to documentation on the topic of node addressing?

2) The library doesn't have any non-blocking (kernel wait queue based)
polling mechanism that I can see. Am I missing a call here?

3) Of course using the standard socket API would be highly desired as it
requires fewer code changes. Is there some other library I should be
using?

Regards
-steve

From wangwhao at cn.ibm.com Wed Sep 24 16:58:11 2008
From: wangwhao at cn.ibm.com (Wen Hao Wang)
Date: Thu, 25 Sep 2008 07:58:11 +0800
Subject: ***SPAM*** Fwd: [ofa-general] ibsysstat cpu output is incomplete
Message-ID:

Hi, Sasha Khapyorsky:

Would you please give any advice about this issue? Thanks.

> Hi all:
>
> I find the output of "ibsysstat cpu" is not complete. This issue exists
> on all my cluster nodes, with RHEL/SLES and OFED 1.3.1/1.4-RC1 installed.
> > [root at xblade07 ~]# ibsysstat 13 cpu > cpu 0: model Genuine Intel(R) CPU @ 2.83GHz MHZ 2833.512 > cpu 1: model Genuine Intel(R) CPU @ 2.83GHz MHZ 2833.512 > cpu 2: model Genuine Intel(R) CPU @ 2.83GHz M ---------------------> something missed > [root at xblade07 ~]# ibsysstat 13 ping > sysstat ping succeeded > [root at xblade07 ~]# ibsysstat 13 host > xblade03 > [root at xblade07 ~]# ssh xblade03 cat /proc/cpuinfo > root at xblade03's password: > processor : 0 > vendor_id : GenuineIntel > cpu family : 6 > model : 23 > model name : Genuine Intel(R) CPU @ 2.83GHz > stepping : 4 > cpu MHz : 2833.512 > cache size : 6144 KB > physical id : 0 > siblings : 4 > core id : 0 > cpu cores : 4 > fpu : yes > fpu_exception : yes > cpuid level : 10 > wp : yes > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm > bogomips : 5669.08 > clflush size : 64 > cache_alignment : 64 > address sizes : 38 bits physical, 48 bits virtual > power management: > > processor : 1 > vendor_id : GenuineIntel > cpu family : 6 > model : 23 > model name : Genuine Intel(R) CPU @ 2.83GHz > stepping : 4 > cpu MHz : 2833.512 > cache size : 6144 KB > physical id : 1 > siblings : 4 > core id : 0 > cpu cores : 4 > fpu : yes > fpu_exception : yes > cpuid level : 10 > wp : yes > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm > bogomips : 5666.07 > clflush size : 64 > cache_alignment : 64 > address sizes : 38 bits physical, 48 bits virtual > power management: > > processor : 2 > vendor_id : GenuineIntel > cpu family : 6 > model : 23 > model name : Genuine Intel(R) CPU @ 2.83GHz > stepping : 4 > cpu MHz : 2833.512 > cache size : 6144 KB > physical id : 0 > siblings : 4 > core id : 1 > cpu cores : 4 > fpu : yes >
fpu_exception : yes > cpuid level : 10 > wp : yes > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm > bogomips : 5666.07 > clflush size : 64 > cache_alignment : 64 > address sizes : 38 bits physical, 48 bits virtual > power management: > > processor : 3 > vendor_id : GenuineIntel > cpu family : 6 > model : 23 > model name : Genuine Intel(R) CPU @ 2.83GHz > stepping : 4 > cpu MHz : 2833.512 > cache size : 6144 KB > physical id : 0 > siblings : 4 > core id : 2 > cpu cores: 4 > fpu : yes > fpu_exception : yes > cpuid level : 10 > wp : yes > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm > bogomips : 5666.13 > clflush size : 64 > cache_alignment : 64 > address sizes : 38 bits physical, 48 bits virtual > power management: > > processor : 4 > vendor_id : GenuineIntel > cpu family : 6 > model : 23 > model name : Genuine Intel(R) CPU @ 2.83GHz > stepping : 4 > cpu MHz : 2833.512 > cache size : 6144 KB > physical id : 0 > siblings : 4 > core id : 3 > cpu cores : 4 > fpu : yes > fpu_exception : yes > cpuid level : 10 > wp : yes > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm > bogomips : 5666.13 > clflush size : 64 > cache_alignment : 64 > address sizes : 38 bits physical, 48 bits virtual > power management: > > processor : 5 > vendor_id : GenuineIntel > cpu family : 6 > model : 23 > model name : Genuine Intel(R) CPU @ 2.83GHz > stepping : 4 > cpu MHz : 2833.512 > cache size : 6144 KB > physical id : 1 > siblings : 4 > core id : 1 > cpu cores : 4 > fpu : yes > fpu_exception : yes > cpuid level : 10 > wp : yes > flags : fpu vme de pse 
tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm > bogomips : 5666.09 > clflush size : 64 > cache_alignment : 64 > address sizes : 38 bits physical, 48 bits virtual > power management: > > processor : 6 > vendor_id : GenuineIntel > cpu family : 6 > model : 23 > model name : Genuine Intel(R) CPU @ 2.83GHz > stepping : 4 > cpu MHz : 2833.512 > cache size : 6144 KB > physical id : 1 > siblings : 4 > core id : 2 > cpu cores : 4 > fpu : yes > fpu_exception : yes > cpuid level : 10 > wp : yes > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm > bogomips : 5666.16 > clflush size : 64 > cache_alignment : 64 > address sizes : 38 bits physical, 48 bits virtual > power management: > > processor : 7 > vendor_id : GenuineIntel > cpu family : 6 > model : 23 > model name : Genuine Intel(R) CPU @ 2.83GHz > stepping : 4 > cpu MHz : 2833.512 > cache size : 6144 KB > physical id : 1 > siblings : 4 > core id : 3 > cpu cores : 4 > fpu : yes > fpu_exception : yes > cpuid level : 10 > wp : yes > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm > bogomips : 5666.15 > clflush size : 64 > cache_alignment : 64 > address sizes : 38 bits physical, 48 bits virtual > power management: > > The cluster contains IBM HS21 XM blades and LS21 blades. I have not tried OFED 1.4 RC2, because I failed to find any related fixed bug in RC2 fixed bug list. Is this one known limitation/bug, or I missed some needed configuration? > > Thanks in advance! 
Wen Hao Wang
Email: wangwhao at cn.ibm.com

From eli at dev.mellanox.co.il Thu Sep 25 02:38:02 2008
From: eli at dev.mellanox.co.il (Eli Cohen)
Date: Thu, 25 Sep 2008 12:38:02 +0300
Subject: [ofa-general] Re: [ewg] Re: [PATCH] IB/ipoib: avoid WARN_ON on NULL path->ah
In-Reply-To:
References: <20080921132522.GA25090@mtls03>
Message-ID: <20080925093802.GA22409@mtls03>

On Wed, Sep 24, 2008 at 11:08:05AM -0700, Roland Dreier wrote:
> Would it be simpler to just not update path->ah if the query fails? In
> the case of the new optimization on path flushing, path->valid will be 0
> so there's no reason to change path->ah to NULL if we're not going to
> change path->valid anyway.
>
> In fact this seems more in keeping with the idea behind ee1e2c82
> ("IPoIB: Refresh paths instead of flushing them on SM change events"),
> because your patch kills the ah in the neigh struct as soon as a path
> record query fails, when the point of the original patch was to keep
> using addresses that are probably still valid across an SM failure.
>
> The patch I'm proposing is the following. If this looks OK, I'll send
> it to Linus tomorrow.
>
I am OK with this.

From ronniz at mellanox.co.il Thu Sep 25 02:56:50 2008
From: ronniz at mellanox.co.il (Ronni Zimmermann)
Date: Thu, 25 Sep 2008 12:56:50 +0300
Subject: [ofa-general] atomic operations on ppc64
In-Reply-To: <6978b4af0809230906h14a50f09l1f643e9967d3f72c@mail.gmail.com>
Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD9B57B9@mtlexch01.mtl.com>

Rui Machado wrote:
>
> 2008/9/22 Ronni Zimmermann :
> >> Hi,
> >> We run tests which use atomic operations (both fetch and add and
> >> comp and swap) on PPC64 all the time, without experiencing any
> >> problem.
> >>
> >> Just to make sure I ran a few simple tests, which use atomic
> >> operations, on our PPC64 machines, both with SLES10 SP1 and with
> >> RHAS5.1, and all of them passed.
> >> I was working with the latest OFED1.4 driver and mlx4 HCA with the
> >> latest released FW and with FW 2.3.000 (on the SLES10 SP1 machine).
> >>
> >> Given the above information I believe that there's either a problem
> >> with your code (although looking at the code you posted I couldn't
> >> see anything wrong) or it's an OFED1.2.5 issue, as Dotan suggested.
> >>
>
> OK thanks for the feedback. We have ppc64 machines with mlx4
> and mthca0 (from ibv_devinfo). Both don't work. Any
> experience with the mthca0? It is older and should be better
> supported on 1.2.5 or?
> My priority is the machines with the mlx4 but of course I
> would like to see both working.
>

Sorry, I have no experience with mthca0 on PPC64 machines.
It is indeed an older HCA, but I don't know whether or not it's working
properly on PPC64 with ofed 1.2.5.

> I also tried with a 2.6.26.2 kernel (had it at hand) with the same
> ofed1.2.5 installation and still see the problem.
> I guess my last and longest try is to install the whole ofed 1.4 package.
>

Please bear in mind that OFED 1.4 is RC2 and will probably be GA by the
end of October.
If installing a new driver on your machine is a big problem for you, and
you don't need the new features supported by ofed 1.4 and not by ofed
1.3.1, maybe it'll be better for you to install ofed 1.3.1, which is
already GA.

We also recommend that you upgrade the FW in your mlx4 HCAs to the
latest released FW, which is 2.5.000, since your current FW is quite
old.

Good luck,
Ronni.
> btw: changing hardware is not an option :-/ > > Cheers and thanks for the support, > From vlad at lists.openfabrics.org Thu Sep 25 03:13:10 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Thu, 25 Sep 2008 03:13:10 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080925-0200 daily build status Message-ID: <20080925101310.7B12AE60DD7@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.25 Passed on 
ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.19 Failed: Build failed on x86_64 with linux-2.6.21.1 Log: /home/vlad/tmp/ofa_1_4_kernel-20080925-0200_linux-2.6.21.1_x86_64_check/include/rdma/ib_verbs.h:1833: error: 'struct scatterlist' has no member named 'dma_address' /home/vlad/tmp/ofa_1_4_kernel-20080925-0200_linux-2.6.21.1_x86_64_check/include/rdma/ib_verbs.h: In function 'ib_sg_dma_len': /home/vlad/tmp/ofa_1_4_kernel-20080925-0200_linux-2.6.21.1_x86_64_check/include/rdma/ib_verbs.h:1846: error: 'struct scatterlist' has no member named 'dma_length' make[4]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080925-0200_linux-2.6.21.1_x86_64_check/drivers/infiniband/hw/ipath/ipath_dma.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080925-0200_linux-2.6.21.1_x86_64_check/drivers/infiniband/hw/ipath] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080925-0200_linux-2.6.21.1_x86_64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_4_kernel-20080925-0200_linux-2.6.21.1_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.21.1' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ppc64 with linux-2.6.24 Log: /home/vlad/tmp/ofa_1_4_kernel-20080925-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.c: In function 'ehca_poll_eqs': /home/vlad/tmp/ofa_1_4_kernel-20080925-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.c:942: warning: passing argument 1 of 'local_irq_save_ptr' from incompatible pointer type /home/vlad/tmp/ofa_1_4_kernel-20080925-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.c:946: warning: passing argument 1 of 'local_irq_save_ptr' from incompatible pointer type make[4]: *** 
[/home/vlad/tmp/ofa_1_4_kernel-20080925-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080925-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080925-0200_linux-2.6.24_ppc64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_4_kernel-20080925-0200_linux-2.6.24_ppc64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ppc64/linux-2.6.24' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From ruimario at gmail.com Thu Sep 25 08:08:00 2008 From: ruimario at gmail.com (Rui Machado) Date: Thu, 25 Sep 2008 17:08:00 +0200 Subject: [ofa-general] atomic operations on ppc64 In-Reply-To: <6978b4af0809250629l52335b7am85f0b361ec1731df@mail.gmail.com> References: <6978b4af0809230906h14a50f09l1f643e9967d3f72c@mail.gmail.com> <5D49E7A8952DC44FB38C38FA0D758EAD9B57B9@mtlexch01.mtl.com> <6978b4af0809250629l52335b7am85f0b361ec1731df@mail.gmail.com> Message-ID: <6978b4af0809250808oabb752cv23596c0f77c19732@mail.gmail.com> 2008/9/25 Ronni Zimmermann : > Rui Machado wrote: >> > 2008/9/22 Ronni Zimmermann : >> >> Hi, >> >> We run tests which use atomic operations (both fetch and >> add and comp and swap) on PPC64 all the time, without >> experiencing any problem. >> >> >> >> Just to make sure I ran few simple tests, which use atomic >> operations, on our PPC64 machines, both with SLES10 SP1 and >> with RHAS5.1, and all of them passed. >> >> I was working with the latest OFED1.4 driver and mlx4 HCA >> with the latest released FW and with FW 2.3.000 (on the >> SLES10 SP1 machine). >> >> >> >> Given the above information I believe that there's either >> a problem with your code (although looking at the code you >> posted I couldn't see anything wrong) or it's an OFED1.2.5 >> issue, as Dotan suggested. >> >> >> >> OK thanks for the feedback. 
We have ppc64 machines with mlx4 >> and mthca0 (from ibv_devinfo) ) Both don't work. Any >> experience with the mthca0? It is older and should be better >> supported on 1.2.5 or? >> My priority is the machines with the mlx4 but of course I >> would like to see both working. >> > Sorry, I have no experience with mthca0 on PPC64 machines. > It is indeed an older HCA, but I don't know weather or not it's working > properly on PPC64 with ofed 1.2.5. > >> I also tried with a 2.6.26.2 kernel (had it at hand) with the same >> ofed1.2.5 installation and still see the problem. >> I guess my last and longest try to install the whole ofed 1.4 package. >> > > Please bear in mind that OFED 1.4 is RC2 and will probably be GA by the > end of October. > If installing a new driver on youe machine is a big problem for you, and > you don't need the new features supported by ofed 1.4 and not by ofed > 1.3.1, maybe it'll be better for you to install ofed 1.3.1, which is > already GA. > Actually I just tried with ofed 1.4 and still see the problem :( I think I installed it correctly with a 2.6.26.2 kernel although I see the warning: libibverbs: Warning: couldn't load driver 'mthca': libmthca-rdmav2.so: cannot open shared object file: No such file or directory A small example using RDMA read is working. I just wanted to see if the problem exists with 1.4 even if it is a RC. Probably I will install 1.3.1 when I solve this problem. And I really need to solve it! > We also recommend that you updgrade the FW in your mlx4 HCAs to the > latest released FW, which is 2.5.000, since your current FW is quite > old. > from http://www.mellanox.com/support/firmware_table_IBM.php I see a 2.3.000 version Isn't this then the latest for my hw? hca_id: mlx4_0 fw_ver: 2.3.000 node_guid: 0002:c903:0000:917c sys_image_guid: 0002:c903:0000:917f vendor_id: 0x02c9 vendor_part_id: 25418 hw_ver: 0xA0 board_id: IBM08A0000001 <------------- phys_port_cnt: 2 > Good luck, > Ronni. 
Thanks for the help ;) Rui From dotanba at gmail.com Thu Sep 25 16:11:10 2008 From: dotanba at gmail.com (Dotan Barak) Date: Fri, 26 Sep 2008 01:11:10 +0200 Subject: ***SPAM*** Re: [ofa-general] atomic operations on ppc64 In-Reply-To: <6978b4af0809250808oabb752cv23596c0f77c19732@mail.gmail.com> References: <6978b4af0809230906h14a50f09l1f643e9967d3f72c@mail.gmail.com> <5D49E7A8952DC44FB38C38FA0D758EAD9B57B9@mtlexch01.mtl.com> <6978b4af0809250629l52335b7am85f0b361ec1731df@mail.gmail.com> <6978b4af0809250808oabb752cv23596c0f77c19732@mail.gmail.com> Message-ID: <48DC1A8E.5040008@gmail.com> Rui Machado wrote: > 2008/9/25 Ronni Zimmermann : > >> Rui Machado wrote: >> >>>> 2008/9/22 Ronni Zimmermann : >>>> >>>>> Hi, >>>>> We run tests which use atomic operations (both fetch and >>>>> >>> add and comp and swap) on PPC64 all the time, without >>> experiencing any problem. >>> >>>>> Just to make sure I ran few simple tests, which use atomic >>>>> >>> operations, on our PPC64 machines, both with SLES10 SP1 and >>> with RHAS5.1, and all of them passed. >>> >>>>> I was working with the latest OFED1.4 driver and mlx4 HCA >>>>> >>> with the latest released FW and with FW 2.3.000 (on the >>> SLES10 SP1 machine). >>> >>>>> Given the above information I believe that there's either >>>>> >>> a problem with your code (although looking at the code you >>> posted I couldn't see anything wrong) or it's an OFED1.2.5 >>> issue, as Dotan suggested. >>> >>> OK thanks for the feedback. We have ppc64 machines with mlx4 >>> and mthca0 (from ibv_devinfo) ) Both don't work. Any >>> experience with the mthca0? It is older and should be better >>> supported on 1.2.5 or? >>> My priority is the machines with the mlx4 but of course I >>> would like to see both working. >>> >>> >> Sorry, I have no experience with mthca0 on PPC64 machines. >> It is indeed an older HCA, but I don't know weather or not it's working >> properly on PPC64 with ofed 1.2.5. 
>> >> >>> I also tried with a 2.6.26.2 kernel (had it at hand) with the same >>> ofed1.2.5 installation and still see the problem. >>> I guess my last and longest try to install the whole ofed 1.4 package. >>> >>> >> Please bear in mind that OFED 1.4 is RC2 and will probably be GA by the >> end of October. >> If installing a new driver on youe machine is a big problem for you, and >> you don't need the new features supported by ofed 1.4 and not by ofed >> 1.3.1, maybe it'll be better for you to install ofed 1.3.1, which is >> already GA. >> >> > > Actually I just tried with ofed 1.4 and still see the problem :( > I think I installed it correctly with a 2.6.26.2 kernel although I see > the warning: > libibverbs: Warning: couldn't load driver 'mthca': libmthca-rdmav2.so: > cannot open shared object file: No such file or directory > A small example using RDMA read is working. > > I just wanted to see if the problem exists with 1.4 even if it is a > RC. Probably I will install 1.3.1 when I solve this problem. And I > really need to solve it! > > The problem that you describes is pretty basic and even an RC shouldn't have this issue. I think that you should upgrade the HCA's Firmware. as Ronni suggested. I have a feeling that the problem is in your code: You should access the buffer that the HCA read/write as volatile, to "tip" the compiler that this memory will be modified by other components and he shouldn't do any optimization when you want to read data from it and actually do the reading ... 
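
To make the volatile suggestion concrete, here is a minimal sketch in C.
The helper names (read_hca_word, wait_for_change) and the bounded spin
count are assumptions made up for this example and are not part of
libibverbs. It only shows the compiler-side point: every read goes
through a volatile-qualified pointer, so the load is really performed
each time instead of being cached in a register. In a real program you
would normally wait for the work completion (for example with
ibv_poll_cq) before looking at the result buffer.

```c
#include <stdint.h>

/* Read one 64-bit word from a buffer that a device (here: the HCA) may
 * modify behind the compiler's back. The volatile qualifier forces an
 * actual memory load on every call. */
static uint64_t read_hca_word(const volatile uint64_t *p)
{
    return *p;
}

/* Hypothetical helper: spin until the word differs from `old`, or give
 * up after max_spins reads. Returns 1 and stores the new value in *out
 * if a change was seen, 0 otherwise. */
static int wait_for_change(const volatile uint64_t *p, uint64_t old,
                           unsigned long max_spins, uint64_t *out)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        uint64_t now = read_hca_word(p);
        if (now != old) {
            *out = now;
            return 1;
        }
    }
    return 0;
}
```

Note that volatile only keeps the compiler from optimizing the reads
away. It says nothing about the byte order of what the HCA wrote, so the
value may still need converting before it is printed or compared.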
Dotan From ruimario at gmail.com Thu Sep 25 09:29:54 2008 From: ruimario at gmail.com (Rui Machado) Date: Thu, 25 Sep 2008 18:29:54 +0200 Subject: ***SPAM*** Re: [ofa-general] atomic operations on ppc64 In-Reply-To: <48DC1A8E.5040008@gmail.com> References: <6978b4af0809230906h14a50f09l1f643e9967d3f72c@mail.gmail.com> <5D49E7A8952DC44FB38C38FA0D758EAD9B57B9@mtlexch01.mtl.com> <6978b4af0809250629l52335b7am85f0b361ec1731df@mail.gmail.com> <6978b4af0809250808oabb752cv23596c0f77c19732@mail.gmail.com> <48DC1A8E.5040008@gmail.com> Message-ID: <6978b4af0809250929j26095702v6d3c0080093c552e@mail.gmail.com> 2008/9/26 Dotan Barak : > Rui Machado wrote: >> >> 2008/9/25 Ronni Zimmermann : >> >>> >>> Rui Machado wrote: >>> >>>>> >>>>> 2008/9/22 Ronni Zimmermann : >>>>> >>>>>> >>>>>> Hi, >>>>>> We run tests which use atomic operations (both fetch and >>>>>> >>>> >>>> add and comp and swap) on PPC64 all the time, without >>>> experiencing any problem. >>>> >>>>>> >>>>>> Just to make sure I ran few simple tests, which use atomic >>>>>> >>>> >>>> operations, on our PPC64 machines, both with SLES10 SP1 and >>>> with RHAS5.1, and all of them passed. >>>> >>>>>> >>>>>> I was working with the latest OFED1.4 driver and mlx4 HCA >>>>>> >>>> >>>> with the latest released FW and with FW 2.3.000 (on the >>>> SLES10 SP1 machine). >>>> >>>>>> >>>>>> Given the above information I believe that there's either >>>>>> >>>> >>>> a problem with your code (although looking at the code you >>>> posted I couldn't see anything wrong) or it's an OFED1.2.5 >>>> issue, as Dotan suggested. >>>> OK thanks for the feedback. We have ppc64 machines with mlx4 >>>> and mthca0 (from ibv_devinfo) ) Both don't work. Any >>>> experience with the mthca0? It is older and should be better >>>> supported on 1.2.5 or? >>>> My priority is the machines with the mlx4 but of course I >>>> would like to see both working. >>>> >>>> >>> >>> Sorry, I have no experience with mthca0 on PPC64 machines. 
>>> It is indeed an older HCA, but I don't know weather or not it's working >>> properly on PPC64 with ofed 1.2.5. >>> >>> >>>> >>>> I also tried with a 2.6.26.2 kernel (had it at hand) with the same >>>> ofed1.2.5 installation and still see the problem. >>>> I guess my last and longest try to install the whole ofed 1.4 package. >>>> >>>> >>> >>> Please bear in mind that OFED 1.4 is RC2 and will probably be GA by the >>> end of October. >>> If installing a new driver on youe machine is a big problem for you, and >>> you don't need the new features supported by ofed 1.4 and not by ofed >>> 1.3.1, maybe it'll be better for you to install ofed 1.3.1, which is >>> already GA. >>> >>> >> >> Actually I just tried with ofed 1.4 and still see the problem :( >> I think I installed it correctly with a 2.6.26.2 kernel although I see >> the warning: >> libibverbs: Warning: couldn't load driver 'mthca': libmthca-rdmav2.so: >> cannot open shared object file: No such file or directory >> A small example using RDMA read is working. >> >> I just wanted to see if the problem exists with 1.4 even if it is a >> RC. Probably I will install 1.3.1 when I solve this problem. And I >> really need to solve it! >> >> > > The problem that you describes is pretty basic and even an RC shouldn't have > this issue. > Agreed! > I think that you should upgrade the HCA's Firmware. as Ronni suggested. > But I'm not sure about the fw version. As I mentioned, on that Mellanox page the latest firwmare for the IBM version is 2.3.00 which is the one I have. Or am I wrong? > I have a feeling that the problem is in your code: > You should access the buffer that the HCA read/write as volatile, to "tip" > the compiler > that this memory will be modified by other components and he shouldn't do > any optimization > when you want to read data from it and actually do the reading ... > I tried that as you said before but didn't help. And the RDMA read works fine. 
Of course, it is possible that the problem is with my code. In fact it looks every time closer to this possibility. But can the code be in such a way wrong that it works on x86 but not on ppc. That is what intrigues me. Cheers From dotanba at gmail.com Thu Sep 25 16:48:25 2008 From: dotanba at gmail.com (Dotan Barak) Date: Fri, 26 Sep 2008 01:48:25 +0200 Subject: ***SPAM*** Re: [ofa-general] atomic operations on ppc64 In-Reply-To: <6978b4af0809250929j26095702v6d3c0080093c552e@mail.gmail.com> References: <6978b4af0809230906h14a50f09l1f643e9967d3f72c@mail.gmail.com> <5D49E7A8952DC44FB38C38FA0D758EAD9B57B9@mtlexch01.mtl.com> <6978b4af0809250629l52335b7am85f0b361ec1731df@mail.gmail.com> <6978b4af0809250808oabb752cv23596c0f77c19732@mail.gmail.com> <48DC1A8E.5040008@gmail.com> <6978b4af0809250929j26095702v6d3c0080093c552e@mail.gmail.com> Message-ID: <48DC2349.5070709@gmail.com> >> The problem that you describes is pretty basic and even an RC shouldn't have >> this issue. >> >> > Agreed! > > >> I think that you should upgrade the HCA's Firmware. as Ronni suggested. >> >> > > But I'm not sure about the fw version. As I mentioned, on that > Mellanox page the latest firwmare for the IBM version is 2.3.00 which > is the one I have. Or am I wrong? > I checked Mellanox's site and i here is the URL for the FW of ConnectX: http://www.mellanox.com/support/firmware_table_ConnectXIB.php In this page, i can see that the FW version is 2.5.000. > >> I have a feeling that the problem is in your code: >> You should access the buffer that the HCA read/write as volatile, to "tip" >> the compiler >> that this memory will be modified by other components and he shouldn't do >> any optimization >> when you want to read data from it and actually do the reading ... >> >> > I tried that as you said before but didn't help. > And the RDMA read works fine. > Of course, it is possible that the problem is with my code. In fact it > looks every time closer to this possibility. 
But can the code be in
> such a way wrong that it works on x86 but not on ppc. That is what
> intrigues me.
>
Pay attention that the differences between PPC64 and x86 are not only
the pointer size (64 vs. 32) and the endianness issue. It is a different
CPU architecture, which may cause many differences.

If you wish, I can review your code that handles IB offline...

Dotan

> Cheers
>

From gsadasiv7 at gmail.com Thu Sep 25 13:18:00 2008
From: gsadasiv7 at gmail.com (Ganesh Sadasivan)
Date: Thu, 25 Sep 2008 13:18:00 -0700
Subject: [ofa-general] ***SPAM*** Synchronous access of RDMA memory
Message-ID: <532b813a0809251318m22042e3bh61a8f0df0915fe4c@mail.gmail.com>

On a memory region set up for RDMA, is it possible for the local CPU
also to write this memory in a synchronous way, e.g. by doing
ibv_post_send?

Thanks
Ganesh

From ruimario at gmail.com Thu Sep 25 13:22:15 2008
From: ruimario at gmail.com (Rui Machado)
Date: Thu, 25 Sep 2008 22:22:15 +0200
Subject: ***SPAM*** Re: [ofa-general] atomic operations on ppc64
In-Reply-To: <48DC2349.5070709@gmail.com>
References: <6978b4af0809230906h14a50f09l1f643e9967d3f72c@mail.gmail.com> <5D49E7A8952DC44FB38C38FA0D758EAD9B57B9@mtlexch01.mtl.com> <6978b4af0809250629l52335b7am85f0b361ec1731df@mail.gmail.com> <6978b4af0809250808oabb752cv23596c0f77c19732@mail.gmail.com> <48DC1A8E.5040008@gmail.com> <6978b4af0809250929j26095702v6d3c0080093c552e@mail.gmail.com> <48DC2349.5070709@gmail.com>
Message-ID: <6978b4af0809251322x5bea6e39ta42293049137f5da@mail.gmail.com>

>>
>> But I'm not sure about the fw version. As I mentioned, on that
>> Mellanox page the latest firmware for the IBM version is 2.3.000 which
>> is the one I have. Or am I wrong?
>>
>
> I checked Mellanox's site and here is the URL for the FW of ConnectX:
> http://www.mellanox.com/support/firmware_table_ConnectXIB.php
>
> In this page, I can see that the FW version is 2.5.000.

OK.
I thought I could only use the FW version on the IBM table (I mentioned
before) and not the one from the link you provide.

>>
>> I tried that as you said before but didn't help.
>> And the RDMA read works fine.
>> Of course, it is possible that the problem is with my code. In fact it
>> looks every time closer to this possibility. But can the code be in
>> such a way wrong that it works on x86 but not on ppc. That is what
>> intrigues me.
>>
>
> Pay attention that the differences between PPC64 and x86 are not only
> the pointer size (64 vs. 32) and the endianness issue. It is a
> different CPU architecture, which may cause many differences.
>
> If you wish, I can review your code that handles IB offline...
>

Yes, it was in fact an endianness issue. I had some "wrong" pointers and
then was being misled by the printed values.
Took a while but I learned something in between :)

I will install ofed 1.3.1 to be on the stable side of the force.

Many thanks, in particular to Dotan for his review and willingness to
help me.

Cheers

From rdreier at cisco.com Thu Sep 25 15:28:08 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 25 Sep 2008 15:28:08 -0700
Subject: [ofa-general] [PATCH] IPoIB: Fix crash when path record fails after path flush
Message-ID:

From: Roland Dreier

Commit ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM
change events") changed how paths are flushed on an SM event. This
change introduces a problem if the path record query triggered by fails,
causing path->ah to become NULL. A later successful path query will
then trigger WARN_ON() in path_rec_completion(), and crash because
path->ah has already been freed, so the ipoib_put_ah() inside the lock
in path_rec_completion() may actually drop the last reference (contrary
to the comment that claims this is safe).

Fix this by updating path->ah and freeing old_ah only when the path
record query is successful. This prevents the neighbour AH and the path
AH from getting out of sync.
This fixes Reported-by: Rabah Salem Debugged-by: Eli Cohen Signed-off-by: Roland Dreier --- Hi Linus, One more patch for 2.6.27. This fixes a regression from 2.6.26 that causes a panic with IP-over-InfiniBand on some network events. Please apply. Thanks, Roland drivers/infiniband/ulp/ipoib/ipoib_main.c | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 1b1df5c..e9ca3cb 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -404,7 +404,7 @@ static void path_rec_completion(int status, struct net_device *dev = path->dev; struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_ah *ah = NULL; - struct ipoib_ah *old_ah; + struct ipoib_ah *old_ah = NULL; struct ipoib_neigh *neigh, *tn; struct sk_buff_head skqueue; struct sk_buff *skb; @@ -428,12 +428,12 @@ static void path_rec_completion(int status, spin_lock_irqsave(&priv->lock, flags); - old_ah = path->ah; - path->ah = ah; - if (ah) { path->pathrec = *pathrec; + old_ah = path->ah; + path->ah = ah; + ipoib_dbg(priv, "created address handle %p for LID 0x%04x, SL %d\n", ah, be16_to_cpu(pathrec->dlid), pathrec->sl); -- 1.6.0.1 From dotanba at gmail.com Fri Sep 26 01:45:41 2008 From: dotanba at gmail.com (Dotan Barak) Date: Fri, 26 Sep 2008 10:45:41 +0200 Subject: ***SPAM*** Re: [ofa-general] ***SPAM*** Synchronous access of RDMA memory In-Reply-To: <532b813a0809251318m22042e3bh61a8f0df0915fe4c@mail.gmail.com> References: <532b813a0809251318m22042e3bh61a8f0df0915fe4c@mail.gmail.com> Message-ID: <48DCA135.2000904@gmail.com> Ganesh Sadasivan wrote: > > On a memory region set up for RDMA, is it possible for the local CPU > to also write to this memory > in a synchronous way, e.g. by doing ibv_post_send? The CPU can access the memory locally, so it doesn't need any IB access.
If the question was whether an HCA can write to local memory using RDMA write, then the answer is "yes" (if you set up the connection correctly). Dotan From vlad at lists.openfabrics.org Fri Sep 26 03:14:22 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Fri, 26 Sep 2008 03:14:22 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080926-0200 daily build status Message-ID: <20080926101422.1A076E60E16@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.22 Passed on ia64 with
linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.19 Failed: Build failed on x86_64 with linux-2.6.21.1 Log: /home/vlad/tmp/ofa_1_4_kernel-20080926-0200_linux-2.6.21.1_x86_64_check/include/rdma/ib_verbs.h:1833: error: 'struct scatterlist' has no member named 'dma_address' /home/vlad/tmp/ofa_1_4_kernel-20080926-0200_linux-2.6.21.1_x86_64_check/include/rdma/ib_verbs.h: In function 'ib_sg_dma_len': /home/vlad/tmp/ofa_1_4_kernel-20080926-0200_linux-2.6.21.1_x86_64_check/include/rdma/ib_verbs.h:1846: error: 'struct scatterlist' has no member named 'dma_length' make[4]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080926-0200_linux-2.6.21.1_x86_64_check/drivers/infiniband/hw/ipath/ipath_dma.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080926-0200_linux-2.6.21.1_x86_64_check/drivers/infiniband/hw/ipath] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080926-0200_linux-2.6.21.1_x86_64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_4_kernel-20080926-0200_linux-2.6.21.1_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.21.1' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ppc64 with linux-2.6.24 Log: /home/vlad/tmp/ofa_1_4_kernel-20080926-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.c: In function 'ehca_poll_eqs': /home/vlad/tmp/ofa_1_4_kernel-20080926-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.c:942: warning: passing argument 1 of 'local_irq_save_ptr' from incompatible pointer type /home/vlad/tmp/ofa_1_4_kernel-20080926-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.c:946: warning: passing argument 1 of 'local_irq_save_ptr' 
from incompatible pointer type make[4]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080926-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080926-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080926-0200_linux-2.6.24_ppc64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_4_kernel-20080926-0200_linux-2.6.24_ppc64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ppc64/linux-2.6.24' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From Robert at saq.co.uk Fri Sep 26 03:13:02 2008 From: Robert at saq.co.uk (Robert Dunkley) Date: Fri, 26 Sep 2008 11:13:02 +0100 Subject: [ofa-general] Centos - Failure of QLogic Infinipath Driver, How do I stop it loading? Message-ID: Hi, I made a small mistake in my OFED compile and the QLogic driver is broken. I use Mellanox, so I don't need it. How do I stop it loading on bootup? Centos says the following on bootup: Loading QLogic Infinipath Driver: Failed Thanks in advance, Rob From yosefe at Voltaire.COM Fri Sep 26 04:24:49 2008 From: yosefe at Voltaire.COM (Yossi Etigin) Date: Fri, 26 Sep 2008 14:24:49 +0300 Subject: [ofa-general] Centos - Failure of QLogic Infinipath Driver, How do I stop it loading?
In-Reply-To: References: Message-ID: <48DCC681.8080508@Voltaire.COM> In /etc/infiniband/openib.conf, replace: QLGC_VNIC_LOAD=yes with: QLGC_VNIC_LOAD=no Robert Dunkley wrote: > Hi, > > > > I made a small mistake in my OFED compile and the QLogic driver is > broken, I use Mellanox so don’t need it. How do I stop it loading on bootup? > From amar.mudrankit at qlogic.com Fri Sep 26 06:08:11 2008 From: amar.mudrankit at qlogic.com (Amar Mudrankit) Date: Fri, 26 Sep 2008 18:38:11 +0530 Subject: ***SPAM*** Re: [ofa-general] Centos - Failure of QLogic Infinipath Driver, How do I stop it loading? In-Reply-To: <48DCC681.8080508@Voltaire.COM> References: <48DCC681.8080508@Voltaire.COM> Message-ID: On Fri, Sep 26, 2008 at 4:54 PM, Yossi Etigin wrote: > In /etc/infiniband/openib.conf, replace: > QLGC_VNIC_LOAD=yes > with: > QLGC_VNIC_LOAD=no > This controls the QLogic VNIC driver. I think Robert wants to disable loading of the QLogic Infinipath driver, which can be done by setting the parameter IPATH_LOAD=no in /etc/infiniband/openib.conf. Regards, Amar From keshetti.mahesh at gmail.com Fri Sep 26 04:21:45 2008 From: keshetti.mahesh at gmail.com (Keshetti Mahesh) Date: Fri, 26 Sep 2008 16:51:45 +0530 Subject: [ofa-general] ***SPAM*** ibdm network topology format Message-ID: <829ded920809260421h693cafk63beac5881defeda@mail.gmail.com> Hello all, Is there any way or tool available to convert the Infiniband network topology from the 'ibnetdiscover' format to the format understood by 'ibdm'? Thanks in advance, Mahesh From sferris at acm.org Fri Sep 26 08:54:08 2008 From: sferris at acm.org (Scott M.
Ferris) Date: Fri, 26 Sep 2008 10:54:08 -0500 Subject: [ofa-general] general questions about librdmacm In-Reply-To: <1222296236.29287.17.camel@balance> References: <1222296236.29287.17.camel@balance> Message-ID: <20080926155408.GA35815@sferris.acm.org> On Wed, Sep 24, 2008 at 03:43:56PM -0700, Steven Dake wrote: > > Totem is a reliable virtual synchrony multicast protocol which transmits > a message from any node to all nodes in a collection of computers > (called the configuration or membership). It has a few requirements: > > unreliable datagram multicast > unreliable datagram unicast > ability to bind to a specific port and interface > ability to poll() (POLLIN) via system call for new multicast datagram > messages It's been a few years since I read the totem paper, but if I recall correctly, totem also has certain ordering requirements about unicast and multicast, which the paper asserts are true for ethernet. It's not clear to me that Infiniband will provide the same guarantees in all cases. If unicasts and multicasts are sent on different queue pairs, I'm not sure any ordering is guaranteed. I can also imagine the IB virtual lane (VL) feature potentially reordering delivery if messages end up in different VLs. I'd recommend talking to someone more knowledgeable in Infiniband than I am to check what you need to do to meet the totem ordering requirement. > Few questions: > 1) I would like to continue to use IP addressing but it looks like I > have to use a different addressing model in librdmacm. I looked at the > examples in the library and it isn't clear to me whether they use IP > addressing or some other addressing model. I see references to IPoverIB > but I don't see any information in the wiki on the topic. Anyone have > links to documentation on the topic of node addressing? IPoIB is fairly transparent to you. The OFED software provides Linux netdevices (e.g. ib0, ib1) which pass IP traffic just like an ethernet netdevice would.
Normal IP routing controls what interface gets used. For the most part IP-based software just works. Low-level things like DHCP daemons will notice differences, since the hardware addresses are larger. If IPoIB connected mode is being used, unicasts and multicasts will be sent on different queue pairs, since the unicasts will use a connection, and the multicasts can't. You may need to disable IPoIB connected mode in order to get the ordering guarantees totem needs, since I'm not sure any ordering is guaranteed between messages queued to different queue pairs. If you want to use native IB protocols instead of IPoIB, librdmacm will be easier than using libibcm. The RDMA communication manager uses IPoIB to resolve IP addresses to native IB addresses, so you can avoid changing your addressing model. The rdma_cm(7) man page describes how to bind and connect. > 2) The library doesn't have any non blocking (kernel wait queue based) > polling mechanism that I can see. Am I missing a call here? Look at ibv_create_comp_channel(3) and ibv_get_cq_event(3). > 3) Of course using the standard socket API would be highly desired as > it requires less code changes. Is there some other library I should be > using? IPoIB gives you the standard socket API. RDMA CM uses a similar set of calls for binding and connecting, though there are some differences. rdma_cm_ids are somewhat like socket fds. Once you have a multicast or unicast rdma_cm_id, you use the IB verbs API (libibverbs) to send and receive. -- Scott M. Ferris, sferris at acm.org From YJia at tmriusa.com Fri Sep 26 10:39:27 2008 From: YJia at tmriusa.com (Yicheng Jia) Date: Fri, 26 Sep 2008 12:39:27 -0500 Subject: [ofa-general] HCA takes longer time to get LID after opensm restart In-Reply-To: Message-ID: Hi Hal, I use opensm with a single unmanaged switch to connect several HCAs. I found that the HCAs take much longer to get their LIDs from the SA after opensm restarts if the switch is not hard reset.
If the switch is hard reset before opensm restarts, the HCAs get their LIDs sooner. I'm wondering whether a min-hop table already present in the switch prevents the HCAs from regaining their LIDs quickly. Is there any option to reset the switch when opensm starts? Thanks! Yicheng "Hal Rosenstock" 07/10/2008 09:15 PM To "Yicheng Jia" cc "Jim Mott" , general at lists.openfabrics.org Subject Re: [ofa-general] minimum sw components requirement for driver/opensm in a single unmanaged switch network On Thu, Jul 10, 2008 at 7:39 PM, Yicheng Jia wrote: > >> If you want to avoid all the SM stuff, and are willing to program the >> switches directly (a few mads) > > Is it done by opensm? Yes. > What information should be set up in the switch by > opensm? Things like the PortInfos and LFT. See IBA spec vol 1 14.2.5 >> Then to figure out QP connections, you just use a function of 3 >> parameters: >> my_qp_num = fn_sqp(my_node, target_node, qp_num) >> target_qp_num = fn_tqp(my_node, target_node, qp_num) >> Where qp_num is a small number between 0 and the maximum number of QPs you >> need active between any 2 endpoints. > > Can the qp_num be manually assigned? > Does opensm need to be involved? SM has nothing to do with QP numbers. >> If it works, you are done. If not, reset, up, wait for him to connect and >> send something to you. > > Is it reliable? I mean, will the QP connections stay alive during the QPs' > lifecycle? For one thing, the SM needs to try to keep the ports active.
-- Hal > Best, > Yicheng > > > > "Jim Mott" > > 07/10/2008 04:17 PM > > To > "Yicheng Jia" , > cc > Subject > RE: [ofa-general] minimum sw components requirement for driver/opensm in a > single unmanaged switch network > > > > > If you want to avoid all the SM stuff, and are willing to program the > switches directly (a few mads), then I've used schemes like: > > Node LID=base + (switch port * constant) (base=0, constant = 1 works) > > Then to figure out QP connections, you just use a function of 3 parameters: > my_qp_num = fn_sqp(my_node, target_node, qp_num) > target_qp_num = fn_tqp(my_node, target_node, qp_num) > Where qp_num is a small number between 0 and the maximum number of QPs you > need active between any 2 endpoints. > > With the above scheme, you know your node_id (switch port number), your lid, > the lid of the target node, and the QPs on both sides. From there on, it > is clear sailing. You don't even need to send MADs; just transition the QP > up and try and use it. If it works, you are done. If not, reset, up, wait > for him to connect and send something to you. A little timer to make sure > everybody retries once in awhile and what can go wrong? > > Jim > From: general-bounces at lists.openfabrics.org > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Yicheng Jia > Sent: Thursday, July 10, 2008 2:59 PM > To: general at lists.openfabrics.org > Subject: [ofa-general] minimum sw components requirement for driver/opensm > in a single unmanaged switch network > > > Hi Folks, > > I have a IB network which consists of only a single unmanaged switch, all > end nodes connecting with the switch only need to do RDMA read/write > operation with each other. My question is, what are the indispensable > modules in driver's core and opensm that make the network up and run? > > I've been using only ib_mad module in driver's core with a managed switch > before, and the network works fine. 
So I assume that only the ib_mad module > in driver's core and SM in opensm are mandatory in my network. The LIDs are > assigned by them. The SA and CM modules are not useful in my case. Am I > right? > > I need to minimize driver and opensm to fit them in my network, the HCA > driver is mthca. > > Best, > Yicheng > _____________________________________________________________________________ > Scanned by IBM Email Security Management Services powered by MessageLabs. > For more information please visit http://www.ers.ibm.com > _____________________________________________________________________________ > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general >
From ctung at neteffect.com Fri Sep 26 13:08:10 2008 From: ctung at neteffect.com (Chien Tung) Date: Fri, 26 Sep 2008 15:08:10 -0500 Subject: [ofa-general] [PATCH 7/13] RDMA/nes: Use Ethtool timer value Message-ID: <200809262008.m8QK8Aru011704@velma.neteffect.com> Author: John Lacombe Use timer value set via Ethtool instead of #defines. Signed-off-by: John Lacombe Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung -- drivers/infiniband/hw/nes/nes_hw.c | 9 ++++----- 1 files changed, 4 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c index bc16fc0..515c071 100644 --- a/drivers/infiniband/hw/nes/nes_hw.c +++ b/drivers/infiniband/hw/nes/nes_hw.c @@ -222,11 +222,10 @@ static void nes_nic_tune_timer(struct nes_device *nesdev) } /* boundary checking */ - if (shared_timer->timer_in_use > NES_NIC_FAST_TIMER_HIGH) - shared_timer->timer_in_use = NES_NIC_FAST_TIMER_HIGH; - else if (shared_timer->timer_in_use < NES_NIC_FAST_TIMER_LOW) { - shared_timer->timer_in_use = NES_NIC_FAST_TIMER_LOW; - } + if (shared_timer->timer_in_use > shared_timer->threshold_high) + shared_timer->timer_in_use = shared_timer->threshold_high; + else if (shared_timer->timer_in_use < shared_timer->threshold_low) + shared_timer->timer_in_use = shared_timer->threshold_low; nesdev->currcq_count = 0; From ctung at neteffect.com Fri Sep 26 13:08:10 2008 From: ctung at neteffect.com (Chien Tung) Date: Fri, 26 Sep 2008 15:08:10 -0500 Subject: [ofa-general] [PATCH 1/13] RDMA/nes: sysfs permissions Message-ID: <200809262008.m8QK8A2Y011692@velma.neteffect.com> Author: Chien Tung Change permission to 0644 for following files: /sys/module/iw_nes/parameters/mpa_version /sys/module/iw_nes/parameters/disable_mpa_crc
/sys/module/iw_nes/parameters/send_first /sys/module/iw_nes/parameters/nes_drv_opt Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung -- Roland, Please queue up this patch series for 2.6.28 after "nes_cm.c cleanup" patch and "4 port 1G HP blade card support" patch. Thanks, Chien drivers/infiniband/hw/nes/nes.c | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes.c b/drivers/infiniband/hw/nes/nes.c index a539685..30c93db 100644 --- a/drivers/infiniband/hw/nes/nes.c +++ b/drivers/infiniband/hw/nes/nes.c @@ -70,21 +70,21 @@ int interrupt_mod_interval = 0; /* Interoperability */ int mpa_version = 1; -module_param(mpa_version, int, 0); +module_param(mpa_version, int, 0644); MODULE_PARM_DESC(mpa_version, "MPA version to be used int MPA Req/Resp (0 or 1)"); /* Interoperability */ int disable_mpa_crc = 0; -module_param(disable_mpa_crc, int, 0); +module_param(disable_mpa_crc, int, 0644); MODULE_PARM_DESC(disable_mpa_crc, "Disable checking of MPA CRC"); unsigned int send_first = 0; -module_param(send_first, int, 0); +module_param(send_first, int, 0644); MODULE_PARM_DESC(send_first, "Send RDMA Message First on Active Connection"); unsigned int nes_drv_opt = 0; -module_param(nes_drv_opt, int, 0); +module_param(nes_drv_opt, int, 0644); MODULE_PARM_DESC(nes_drv_opt, "Driver option parameters"); unsigned int nes_debug_level = 0; From ctung at neteffect.com Fri Sep 26 13:08:10 2008 From: ctung at neteffect.com (Chien Tung) Date: Fri, 26 Sep 2008 15:08:10 -0500 Subject: [ofa-general] [PATCH 11/13] RDMA/nes: Stop critical error interrupts Message-ID: <200809262008.m8QK8Alb011713@velma.neteffect.com> Author: Chien Tung Mask off a critical error after 100 critical error interrupts to keep the system "sane". 
Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung -- drivers/infiniband/hw/nes/nes_hw.c | 40 +++++++++++++++++++++++++---------- drivers/infiniband/hw/nes/nes_hw.h | 1 + 2 files changed, 29 insertions(+), 12 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c index 515c071..fc45e9c 100644 --- a/drivers/infiniband/hw/nes/nes_hw.c +++ b/drivers/infiniband/hw/nes/nes_hw.c @@ -55,7 +55,7 @@ u32 int_mod_cq_depth_24; u32 int_mod_cq_depth_16; u32 int_mod_cq_depth_4; u32 int_mod_cq_depth_1; - +static const u8 nes_max_critical_error_count = 100; #include "nes_cm.h" static void nes_cqp_ce_handler(struct nes_device *nesdev, struct nes_hw_cq *cq); @@ -67,6 +67,7 @@ static void nes_process_aeq(struct nes_device *nesdev, struct nes_hw_aeq *aeq); static void nes_process_ceq(struct nes_device *nesdev, struct nes_hw_ceq *ceq); static void nes_process_iwarp_aeqe(struct nes_device *nesdev, struct nes_hw_aeqe *aeqe); +static void process_critical_error(struct nes_device *nesdev); static void nes_process_mac_intr(struct nes_device *nesdev, u32 mac_number); static unsigned int nes_reset_adapter_ne020(struct nes_device *nesdev, u8 *OneG_Mode); @@ -1352,7 +1353,7 @@ int nes_init_phy(struct nes_device *nesdev) nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc319, 0x0008); nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc31a, 0x0098); nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0x0026, 0x0E00); - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0x0027, 0x0000); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0x0027, 0x0001); nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0x0028, 0xA528); /* @@ -1991,7 +1992,30 @@ int nes_napi_isr(struct nes_device *nesdev) } } - +static void process_critical_error(struct nes_device *nesdev) +{ + u32 debug_error; + u32 nes_idx_debug_error_masks0 = 0; + u16 error_module = 0; + 
+ debug_error = nes_read_indexed(nesdev, NES_IDX_DEBUG_ERROR_CONTROL_STATUS); + printk(KERN_ERR PFX "Critical Error reported by device!!! 0x%02X\n", + (u16)debug_error); + nes_write_indexed(nesdev, NES_IDX_DEBUG_ERROR_CONTROL_STATUS, + 0x01010000 | (debug_error & 0x0000ffff)); + if (crit_err_count++ > 10) + nes_write_indexed(nesdev, NES_IDX_DEBUG_ERROR_MASKS1, 1 << 0x17); + error_module = (u16) (debug_error & 0x0F00) >> 8; + if (++nesdev->nesadapter->crit_error_count[error_module-1] >= + nes_max_critical_error_count) { + printk(KERN_ERR PFX "Masking off critical error for module " + "0x%02X\n", (u16)error_module); + nes_idx_debug_error_masks0 = nes_read_indexed(nesdev, + NES_IDX_DEBUG_ERROR_MASKS0); + nes_write_indexed(nesdev, NES_IDX_DEBUG_ERROR_MASKS0, + nes_idx_debug_error_masks0 | (1 << error_module)); + } +} /** * nes_dpc */ @@ -2006,7 +2030,6 @@ void nes_dpc(unsigned long param) u32 timer_stat; u32 temp_int_stat; u32 intf_int_stat; - u32 debug_error; u32 processed_intf_int = 0; u16 processed_timer_int = 0; u16 completion_ints = 0; @@ -2084,14 +2107,7 @@ void nes_dpc(unsigned long param) intf_int_stat = nes_read32(nesdev->regs+NES_INTF_INT_STAT); intf_int_stat &= nesdev->intf_int_req; if (NES_INTF_INT_CRITERR & intf_int_stat) { - debug_error = nes_read_indexed(nesdev, NES_IDX_DEBUG_ERROR_CONTROL_STATUS); - printk(KERN_ERR PFX "Critical Error reported by device!!! 
0x%02X\n", - (u16)debug_error); - nes_write_indexed(nesdev, NES_IDX_DEBUG_ERROR_CONTROL_STATUS, - 0x01010000 | (debug_error & 0x0000ffff)); - /* BUG(); */ - if (crit_err_count++ > 10) - nes_write_indexed(nesdev, NES_IDX_DEBUG_ERROR_MASKS1, 1 << 0x17); + process_critical_error(nesdev); } if (NES_INTF_INT_PCIERR & intf_int_stat) { printk(KERN_ERR PFX "PCI Error reported by device!!!\n"); diff --git a/drivers/infiniband/hw/nes/nes_hw.h b/drivers/infiniband/hw/nes/nes_hw.h index 82d0676..1b93c57 100644 --- a/drivers/infiniband/hw/nes/nes_hw.h +++ b/drivers/infiniband/hw/nes/nes_hw.h @@ -1096,6 +1096,7 @@ struct nes_adapter { u16 pd_config_base[4]; u16 link_interrupt_count[4]; + u8 crit_error_count[32]; /* the phy index for each port */ u8 phy_index[4]; From ctung at neteffect.com Fri Sep 26 13:08:10 2008 From: ctung at neteffect.com (Chien Tung) Date: Fri, 26 Sep 2008 15:08:10 -0500 Subject: [ofa-general] [PATCH 6/13] RDMA/nes: Correct MAX TSO frags Message-ID: <200809262008.m8QK8AMw011702@velma.neteffect.com> Author: Bob Sharp Use correct define for max TSO fragments. 
Signed-off-by: Bob Sharp Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung -- drivers/infiniband/hw/nes/nes_nic.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c index 77e258a..96db599 100644 --- a/drivers/infiniband/hw/nes/nes_nic.c +++ b/drivers/infiniband/hw/nes/nes_nic.c @@ -437,7 +437,7 @@ static int nes_netdev_start_xmit(struct sk_buff *skb, struct net_device *netdev) struct nes_hw_nic_sq_wqe *nic_sqe; struct tcphdr *tcph; /* struct udphdr *udph; */ -#define NES_MAX_TSO_FRAGS 18 +#define NES_MAX_TSO_FRAGS MAX_SKB_FRAGS /* 64K segment plus overflow on each side */ dma_addr_t tso_bus_address[NES_MAX_TSO_FRAGS]; dma_addr_t bus_address; From ctung at neteffect.com Fri Sep 26 13:08:10 2008 From: ctung at neteffect.com (Chien Tung) Date: Fri, 26 Sep 2008 15:08:10 -0500 Subject: [ofa-general] [PATCH 3/13] RDMA/nes: MDC setting Message-ID: <200809262008.m8QK8AWQ011696@velma.neteffect.com> Author: Chien Tung Clear MDC bits before setting them to a new value. Adjust MDC value for 10G. 
Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung -- drivers/infiniband/hw/nes/nes_hw.c | 4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c index 0e259a8..1437b6e 100644 --- a/drivers/infiniband/hw/nes/nes_hw.c +++ b/drivers/infiniband/hw/nes/nes_hw.c @@ -1262,6 +1262,7 @@ int nes_init_phy(struct nes_device *nesdev) if (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_1G) { printk(PFX "%s: Programming mdc config for 1G\n", __func__); tx_config = nes_read_indexed(nesdev, NES_IDX_MAC_TX_CONFIG); + tx_config &= 0xFFFFFFE3; tx_config |= 0x04; nes_write_indexed(nesdev, NES_IDX_MAC_TX_CONFIG, tx_config); } @@ -1327,7 +1328,8 @@ int nes_init_phy(struct nes_device *nesdev) (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_ARGUS)) { /* setup 10G MDIO operation */ tx_config = nes_read_indexed(nesdev, NES_IDX_MAC_TX_CONFIG); - tx_config |= 0x14; + tx_config &= 0xFFFFFFE3; + tx_config |= 0x15; nes_write_indexed(nesdev, NES_IDX_MAC_TX_CONFIG, tx_config); } if ((nesadapter->phy_type[mac_index] == NES_PHY_TYPE_ARGUS)) { From ctung at neteffect.com Fri Sep 26 13:08:10 2008 From: ctung at neteffect.com (Chien Tung) Date: Fri, 26 Sep 2008 15:08:10 -0500 Subject: [ofa-general] [PATCH 12/13] RDMA/nes: Handle AE bounds violation Message-ID: <200809262008.m8QK8AnY011722@velma.neteffect.com> Author: Faisal Latif Handle async error NES_AEQE_AEID_AMP_BOUNDS_VIOLATION. 
Signed-off-by: Faisal Latif Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung -- drivers/infiniband/hw/nes/nes_hw.c | 16 ++++++++++++++++ 1 files changed, 16 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c index fc45e9c..5e13c26 100644 --- a/drivers/infiniband/hw/nes/nes_hw.c +++ b/drivers/infiniband/hw/nes/nes_hw.c @@ -3191,6 +3191,22 @@ static void nes_process_iwarp_aeqe(struct nes_device *nesdev, nes_cm_disconn(nesqp); break; /* TODO: additional AEs need to be here */ + case NES_AEQE_AEID_AMP_BOUNDS_VIOLATION: + nesqp = *((struct nes_qp **)&context); + spin_lock_irqsave(&nesqp->lock, flags); + nesqp->hw_iwarp_state = iwarp_state; + nesqp->hw_tcp_state = tcp_state; + nesqp->last_aeq = async_event_id; + spin_unlock_irqrestore(&nesqp->lock, flags); + if (nesqp->ibqp.event_handler) { + ibevent.device = nesqp->ibqp.device; + ibevent.element.qp = &nesqp->ibqp; + ibevent.event = IB_EVENT_QP_ACCESS_ERR; + nesqp->ibqp.event_handler(&ibevent, + nesqp->ibqp.qp_context); + } + nes_cm_disconn(nesqp); + break; default: nes_debug(NES_DBG_AEQ, "Processing an iWARP related AE for QP, misc = 0x%04X\n", async_event_id); From ctung at neteffect.com Fri Sep 26 13:08:10 2008 From: ctung at neteffect.com (Chien Tung) Date: Fri, 26 Sep 2008 15:08:10 -0500 Subject: [ofa-general] [PATCH 9/13] RDMA/nes: Correct tso_wqe_length Message-ID: <200809262008.m8QK8AlS011708@velma.neteffect.com> Author: Chien Tung Fix tso_wqe_length calculation. 
Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung -- drivers/infiniband/hw/nes/nes_nic.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c index 6abd404..852546b 100644 --- a/drivers/infiniband/hw/nes/nes_nic.c +++ b/drivers/infiniband/hw/nes/nes_nic.c @@ -605,6 +605,8 @@ tso_sq_no_longer_full: wqe_fragment_length[wqe_fragment_index] = 0; set_wqe_64bit_value(nic_sqe->wqe_words, NES_NIC_SQ_WQE_FRAG1_LOW_IDX, bus_address); + tso_wqe_length += skb_headlen(skb) - + original_first_length; } while (wqe_fragment_index < 5) { wqe_fragment_length[wqe_fragment_index] = From ctung at neteffect.com Fri Sep 26 13:08:10 2008 From: ctung at neteffect.com (Chien Tung) Date: Fri, 26 Sep 2008 15:08:10 -0500 Subject: [ofa-general] [PATCH 5/13] RDMA/nes: Enable MC/UC after changing MTU Message-ID: <200809262008.m8QK8ACZ011700@velma.neteffect.com> Author: Bob Sharp Re-enable multicast and unicast after changing MTU. 
Signed-off-by: Bob Sharp Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung -- drivers/infiniband/hw/nes/nes_nic.c | 20 ++++++++++++++++++++ 1 files changed, 20 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c index 1b0938c..77e258a 100644 --- a/drivers/infiniband/hw/nes/nes_nic.c +++ b/drivers/infiniband/hw/nes/nes_nic.c @@ -918,6 +918,10 @@ static int nes_netdev_change_mtu(struct net_device *netdev, int new_mtu) struct nes_device *nesdev = nesvnic->nesdev; int ret = 0; u8 jumbomode = 0; + u32 nic_active; + u32 nic_active_bit; + u32 uc_all_active; + u32 mc_all_active; if ((new_mtu < ETH_ZLEN) || (new_mtu > max_mtu)) return -EINVAL; @@ -931,8 +935,24 @@ static int nes_netdev_change_mtu(struct net_device *netdev, int new_mtu) nes_nic_init_timer_defaults(nesdev, jumbomode); if (netif_running(netdev)) { + nic_active_bit = 1 << nesvnic->nic_index; + mc_all_active = nes_read_indexed(nesdev, + NES_IDX_NIC_MULTICAST_ALL) & nic_active_bit; + uc_all_active = nes_read_indexed(nesdev, + NES_IDX_NIC_UNICAST_ALL) & nic_active_bit; + nes_netdev_stop(netdev); nes_netdev_open(netdev); + + nic_active = nes_read_indexed(nesdev, + NES_IDX_NIC_MULTICAST_ALL); + nic_active |= mc_all_active; + nes_write_indexed(nesdev, NES_IDX_NIC_MULTICAST_ALL, + nic_active); + + nic_active = nes_read_indexed(nesdev, NES_IDX_NIC_UNICAST_ALL); + nic_active |= uc_all_active; + nes_write_indexed(nesdev, NES_IDX_NIC_UNICAST_ALL, nic_active); } return ret; From ctung at neteffect.com Fri Sep 26 13:08:10 2008 From: ctung at neteffect.com (Chien Tung) Date: Fri, 26 Sep 2008 15:08:10 -0500 Subject: [ofa-general] [PATCH 2/13] RDMA/nes: module option - wqm_quanta Message-ID: <200809262008.m8QK8AHs011694@velma.neteffect.com> Author: Chien Tung Add module parameter wqm_quanta. It controls the number of segments transmitted at a time. 
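
Illustration only, not part of the patch: a small sketch of how the parameter is folded into the value written to NES_IDX_WQM_CONFIG1, following the "(wqm_quanta << 1) | ..." expressions in the diff. The helper name wqm_config1_value is hypothetical, and nothing is assumed about the meaning of bit 0 beyond the fact that the diff preserves it.

```c
#include <stdint.h>

/* Hypothetical helper: the quanta is shifted up by one bit and bit 0 is
 * carried over from the current register contents, as in the sysfs store
 * path of the patch. */
static uint32_t wqm_config1_value(uint32_t wqm_quanta, uint32_t current_config1)
{
	return (wqm_quanta << 1) | (current_config1 & 0x00000001u);
}
```

With the default of 0x10000 and bit 0 set, this gives 0x00020001, which is the constant the patch replaces in nes_init_csr_ne020().
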
Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung -- drivers/infiniband/hw/nes/nes.c | 58 ++++++++++++++++++++++++++++++++++++ drivers/infiniband/hw/nes/nes.h | 2 +- drivers/infiniband/hw/nes/nes_hw.c | 3 +- drivers/infiniband/hw/nes/nes_hw.h | 2 + 4 files changed, 63 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes.c b/drivers/infiniband/hw/nes/nes.c index 30c93db..a2b04d6 100644 --- a/drivers/infiniband/hw/nes/nes.c +++ b/drivers/infiniband/hw/nes/nes.c @@ -91,6 +91,10 @@ unsigned int nes_debug_level = 0; module_param_named(debug_level, nes_debug_level, uint, 0644); MODULE_PARM_DESC(debug_level, "Enable debug output level"); +unsigned int wqm_quanta = 0x10000; +module_param(wqm_quanta, int, 0644); +MODULE_PARM_DESC(wqm_quanta, "WQM quanta"); + LIST_HEAD(nes_adapter_list); static LIST_HEAD(nes_dev_list); @@ -557,6 +561,7 @@ static int __devinit nes_probe(struct pci_dev *pcidev, const struct pci_device_i goto bail5; } nesdev->nesadapter->et_rx_coalesce_usecs_irq = interrupt_mod_interval; + nesdev->nesadapter->wqm_quanta = wqm_quanta; /* nesdev->base_doorbell_index = nesdev->nesadapter->pd_config_base[PCI_FUNC(nesdev->pcidev->devfn)]; */ @@ -1069,6 +1074,55 @@ static ssize_t nes_store_idx_data(struct device_driver *ddp, return strnlen(buf, count); } + +/** + * nes_show_wqm_quanta + */ +static ssize_t nes_show_wqm_quanta(struct device_driver *ddp, char *buf) +{ + u32 wqm_quanta_value = 0xdead; + u32 i = 0; + struct nes_device *nesdev; + + list_for_each_entry(nesdev, &nes_dev_list, list) { + if (i == ee_flsh_adapter) { + wqm_quanta_value = nesdev->nesadapter->wqm_quanta; + break; + } + i++; + } + + return snprintf(buf, PAGE_SIZE, "0x%X\n", wqm_quanta); +} + + +/** + * nes_store_wqm_quanta + */ +static ssize_t nes_store_wqm_quanta(struct device_driver *ddp, + const char *buf, size_t count) +{ + unsigned long wqm_quanta_value; + u32 wqm_config1; + u32 i = 0; + struct nes_device *nesdev; + + strict_strtoul(buf, 0, &wqm_quanta_value); + 
list_for_each_entry(nesdev, &nes_dev_list, list) { + if (i == ee_flsh_adapter) { + nesdev->nesadapter->wqm_quanta = wqm_quanta_value; + wqm_config1 = nes_read_indexed(nesdev, + NES_IDX_WQM_CONFIG1); + nes_write_indexed(nesdev, NES_IDX_WQM_CONFIG1, + ((wqm_quanta_value << 1) | + (wqm_config1 & 0x00000001))); + break; + } + i++; + } + return strnlen(buf, count); +} + static DRIVER_ATTR(adapter, S_IRUSR | S_IWUSR, nes_show_adapter, nes_store_adapter); static DRIVER_ATTR(eeprom_cmd, S_IRUSR | S_IWUSR, @@ -1087,6 +1141,8 @@ static DRIVER_ATTR(idx_addr, S_IRUSR | S_IWUSR, nes_show_idx_addr, nes_store_idx_addr); static DRIVER_ATTR(idx_data, S_IRUSR | S_IWUSR, nes_show_idx_data, nes_store_idx_data); +static DRIVER_ATTR(wqm_quanta, S_IRUSR | S_IWUSR, + nes_show_wqm_quanta, nes_store_wqm_quanta); static int nes_create_driver_sysfs(struct pci_driver *drv) { @@ -1100,6 +1156,7 @@ static int nes_create_driver_sysfs(struct pci_driver *drv) error |= driver_create_file(&drv->driver, &driver_attr_nonidx_data); error |= driver_create_file(&drv->driver, &driver_attr_idx_addr); error |= driver_create_file(&drv->driver, &driver_attr_idx_data); + error |= driver_create_file(&drv->driver, &driver_attr_wqm_quanta); return error; } @@ -1114,6 +1171,7 @@ static void nes_remove_driver_sysfs(struct pci_driver *drv) driver_remove_file(&drv->driver, &driver_attr_nonidx_data); driver_remove_file(&drv->driver, &driver_attr_idx_addr); driver_remove_file(&drv->driver, &driver_attr_idx_data); + driver_remove_file(&drv->driver, &driver_attr_wqm_quanta); } /** diff --git a/drivers/infiniband/hw/nes/nes.h b/drivers/infiniband/hw/nes/nes.h index 8eb7ae9..1595dc7 100644 --- a/drivers/infiniband/hw/nes/nes.h +++ b/drivers/infiniband/hw/nes/nes.h @@ -169,7 +169,7 @@ extern int disable_mpa_crc; extern unsigned int send_first; extern unsigned int nes_drv_opt; extern unsigned int nes_debug_level; - +extern unsigned int wqm_quanta; extern struct list_head nes_adapter_list; extern atomic_t cm_connects; diff 
--git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c index bdd98e6..0e259a8 100644 --- a/drivers/infiniband/hw/nes/nes_hw.c +++ b/drivers/infiniband/hw/nes/nes_hw.c @@ -861,7 +861,8 @@ static void nes_init_csr_ne020(struct nes_device *nesdev, u8 hw_rev, u8 port_cou nes_write_indexed(nesdev, 0x00005000, 0x00018000); /* nes_write_indexed(nesdev, 0x00005000, 0x00010000); */ - nes_write_indexed(nesdev, 0x00005004, 0x00020001); + nes_write_indexed(nesdev, NES_IDX_WQM_CONFIG1, (wqm_quanta << 1) | + 0x00000001); nes_write_indexed(nesdev, 0x00005008, 0x1F1F1F1F); nes_write_indexed(nesdev, 0x00005010, 0x1F1F1F1F); nes_write_indexed(nesdev, 0x00005018, 0x1F1F1F1F); diff --git a/drivers/infiniband/hw/nes/nes_hw.h b/drivers/infiniband/hw/nes/nes_hw.h index fc0f063..82d0676 100644 --- a/drivers/infiniband/hw/nes/nes_hw.h +++ b/drivers/infiniband/hw/nes/nes_hw.h @@ -156,6 +156,7 @@ enum indexed_regs { NES_IDX_ENDNODE0_NSTAT_TX_OCTETS_HI = 0x7004, NES_IDX_ENDNODE0_NSTAT_TX_FRAMES_LO = 0x7008, NES_IDX_ENDNODE0_NSTAT_TX_FRAMES_HI = 0x700c, + NES_IDX_WQM_CONFIG1 = 0x5004, NES_IDX_CM_CONFIG = 0x5100, NES_IDX_NIC_LOGPORT_TO_PHYPORT = 0x6000, NES_IDX_NIC_PHYPORT_TO_USW = 0x6008, @@ -1079,6 +1080,7 @@ struct nes_adapter { u32 et_rx_max_coalesced_frames_high; u32 et_rate_sample_interval; u32 timer_int_limit; + u32 wqm_quanta; /* Adapter base MAC address */ u32 mac_addr_low; From ctung at neteffect.com Fri Sep 26 13:08:10 2008 From: ctung at neteffect.com (Chien Tung) Date: Fri, 26 Sep 2008 15:08:10 -0500 Subject: [ofa-general] [PATCH 13/13] RDMA/nes: Enhanced PFT management scheme Message-ID: <200809262008.m8QK8A2B011727@velma.neteffect.com> Author: Vadim Makhervaks Change management of perfect filter table to allow enhanced performance applications. 
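
Illustration only, not part of the patch: a user-space sketch of the slot-ownership map the diff introduces (pft_mcast_map, NES_PFT_SIZE entries, 255 meaning free). The helper names are hypothetical. Unlike the loop in the patch, the sketch tests the index bound before reading the map.

```c
#include <string.h>

#define PFT_SIZE 48   /* NES_PFT_SIZE in the patch */
#define PFT_FREE 255  /* value the patch memset()s into the map */

/* Mark every perfect filter table slot as unowned. */
static void pft_map_init(unsigned char *map)
{
	memset(map, PFT_FREE, PFT_SIZE);
}

/* Hypothetical helper: return the first slot at or after 'start' that is
 * free or already owned by nic_index; return 'avail' if there is none.
 * A slot counts as owned by some NIC when its value is below 16. */
static unsigned int pft_next_slot(const unsigned char *map, unsigned int start,
				  unsigned int avail, unsigned char nic_index)
{
	unsigned int i = start;

	while (i < avail && map[i] < 16 && map[i] != nic_index)
		i++;
	return i;
}
```

A caller would write its own nic_index into the returned slot after programming the filter, and write PFT_FREE back when clearing it.
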
Signed-off-by: Vadim Makhervaks Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung -- drivers/infiniband/hw/nes/nes_hw.c | 3 + drivers/infiniband/hw/nes/nes_hw.h | 2 + drivers/infiniband/hw/nes/nes_nic.c | 71 +++++++++++++++++++++++++++++------ 3 files changed, 64 insertions(+), 12 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c index 5e13c26..f7d2065 100644 --- a/drivers/infiniband/hw/nes/nes_hw.c +++ b/drivers/infiniband/hw/nes/nes_hw.c @@ -363,6 +363,9 @@ struct nes_adapter *nes_init_adapter(struct nes_device *nesdev, u8 hw_rev) { } nes_init_csr_ne020(nesdev, hw_rev, port_count); + memset(nesadapter->pft_mcast_map, 255, + sizeof nesadapter->pft_mcast_map); + /* populate the new nesadapter */ nesadapter->devfn = nesdev->pcidev->devfn; nesadapter->bus_number = nesdev->pcidev->bus->number; diff --git a/drivers/infiniband/hw/nes/nes_hw.h b/drivers/infiniband/hw/nes/nes_hw.h index 1b93c57..610b9d8 100644 --- a/drivers/infiniband/hw/nes/nes_hw.h +++ b/drivers/infiniband/hw/nes/nes_hw.h @@ -968,6 +968,7 @@ struct nes_arp_entry { #define DEFAULT_JUMBO_NES_QL_TARGET 40 #define DEFAULT_JUMBO_NES_QL_HIGH 128 #define NES_NIC_CQ_DOWNWARD_TREND 16 +#define NES_PFT_SIZE 48 struct nes_hw_tune_timer { /* u16 cq_count; */ @@ -1117,6 +1118,7 @@ struct nes_adapter { u8 virtwq; u8 et_use_adaptive_rx_coalesce; u8 adapter_fcn_count; + u8 pft_mcast_map[NES_PFT_SIZE]; }; struct nes_pbl { diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c index 639d0fc..26809f4 100644 --- a/drivers/infiniband/hw/nes/nes_nic.c +++ b/drivers/infiniband/hw/nes/nes_nic.c @@ -91,6 +91,7 @@ static struct nic_qp_map *nic_qp_mapping_per_function[] = { static const u32 default_msg = NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_LINK | NETIF_MSG_IFUP | NETIF_MSG_IFDOWN; static int debug = -1; +static int nics_per_function = 1; /** * nes_netdev_poll @@ -201,7 +202,8 @@ static int nes_netdev_open(struct net_device *netdev) 
nes_debug(NES_DBG_NETDEV, "i=%d, perfect filter table index= %d, PERF FILTER LOW" " (Addr:%08X) = %08X, HIGH = %08X.\n", i, nesvnic->qp_nic_index[i], - NES_IDX_PERFECT_FILTER_LOW+((nesvnic->perfect_filter_index + i) * 8), + NES_IDX_PERFECT_FILTER_LOW+ + (nesvnic->qp_nic_index[i] * 8), macaddr_low, (u32)macaddr_high | NES_MAC_ADDR_VALID | ((((u32)nesvnic->nic_index) << 16))); @@ -833,6 +835,7 @@ static void nes_netdev_set_multicast_list(struct net_device *netdev) { struct nes_vnic *nesvnic = netdev_priv(netdev); struct nes_device *nesdev = nesvnic->nesdev; + struct nes_adapter *nesadapter = nesvnic->nesdev->nesadapter; struct dev_mc_list *multicast_addr; u32 nic_active_bit; u32 nic_active; @@ -842,7 +845,12 @@ static void nes_netdev_set_multicast_list(struct net_device *netdev) u8 mc_all_on = 0; u8 mc_index; int mc_nic_index = -1; + u8 pft_entries_preallocated = max(nesadapter->adapter_fcn_count * + nics_per_function, 4); + u8 max_pft_entries_avaiable = NES_PFT_SIZE - pft_entries_preallocated; + unsigned long flags; + spin_lock_irqsave(&nesadapter->resource_lock, flags); nic_active_bit = 1 << nesvnic->nic_index; if (netdev->flags & IFF_PROMISC) { @@ -853,7 +861,7 @@ static void nes_netdev_set_multicast_list(struct net_device *netdev) nic_active |= nic_active_bit; nes_write_indexed(nesdev, NES_IDX_NIC_UNICAST_ALL, nic_active); mc_all_on = 1; - } else if ((netdev->flags & IFF_ALLMULTI) || (netdev->mc_count > NES_MULTICAST_PF_MAX) || + } else if ((netdev->flags & IFF_ALLMULTI) || (nesvnic->nic_index > 3)) { nic_active = nes_read_indexed(nesdev, NES_IDX_NIC_MULTICAST_ALL); nic_active |= nic_active_bit; @@ -876,13 +884,30 @@ static void nes_netdev_set_multicast_list(struct net_device *netdev) (netdev->flags & IFF_ALLMULTI)?1:0); if (!mc_all_on) { multicast_addr = netdev->mc_list; - perfect_filter_register_address = NES_IDX_PERFECT_FILTER_LOW + 0x80; - perfect_filter_register_address += nesvnic->nic_index*0x40; - for (mc_index=0; mc_index < NES_MULTICAST_PF_MAX; 
mc_index++) { - while (multicast_addr && nesvnic->mcrq_mcast_filter && ((mc_nic_index = nesvnic->mcrq_mcast_filter(nesvnic, multicast_addr->dmi_addr)) == 0)) + perfect_filter_register_address = NES_IDX_PERFECT_FILTER_LOW + + pft_entries_preallocated * 0x8; + for (mc_index = 0; mc_index < max_pft_entries_avaiable; + mc_index++) { + while (multicast_addr && nesvnic->mcrq_mcast_filter && + ((mc_nic_index = nesvnic->mcrq_mcast_filter(nesvnic, + multicast_addr->dmi_addr)) == 0)) { multicast_addr = multicast_addr->next; + } if (mc_nic_index < 0) mc_nic_index = nesvnic->nic_index; + while (nesadapter->pft_mcast_map[mc_index] < 16 && + nesadapter->pft_mcast_map[mc_index] != + nesvnic->nic_index && + mc_index < max_pft_entries_avaiable) { + nes_debug(NES_DBG_NIC_RX, + "mc_index=%d skipping nic_index=%d,\ + used for=%d \n", mc_index, + nesvnic->nic_index, + nesadapter->pft_mcast_map[mc_index]); + mc_index++; + } + if (mc_index >= max_pft_entries_avaiable) + break; if (multicast_addr) { DECLARE_MAC_BUF(mac); nes_debug(NES_DBG_NIC_RX, "Assigning MC Address %s to register 0x%04X nic_idx=%d\n", @@ -903,14 +928,31 @@ static void nes_netdev_set_multicast_list(struct net_device *netdev) (u32)macaddr_high | NES_MAC_ADDR_VALID | ((((u32)(1<next; + nesadapter->pft_mcast_map[mc_index] = + nesvnic->nic_index; } else { nes_debug(NES_DBG_NIC_RX, "Clearing MC Address at register 0x%04X\n", perfect_filter_register_address+(mc_index * 8)); nes_write_indexed(nesdev, perfect_filter_register_address+4+(mc_index * 8), 0); + nesadapter->pft_mcast_map[mc_index] = 255; } } + /* PFT is not large enough */ + if (multicast_addr && multicast_addr->next) { + nic_active = nes_read_indexed(nesdev, + NES_IDX_NIC_MULTICAST_ALL); + nic_active |= nic_active_bit; + nes_write_indexed(nesdev, NES_IDX_NIC_MULTICAST_ALL, + nic_active); + nic_active = nes_read_indexed(nesdev, + NES_IDX_NIC_UNICAST_ALL); + nic_active &= ~nic_active_bit; + nes_write_indexed(nesdev, NES_IDX_NIC_UNICAST_ALL, + nic_active); + } + 
spin_unlock_irqrestore(&nesadapter->resource_lock, flags); } } @@ -1615,7 +1657,9 @@ struct net_device *nes_netdev_init(struct nes_device *nesdev, nesvnic, (unsigned long)netdev->features, nesvnic->nic.qp_id, nesvnic->nic_index, nesvnic->logical_port, nesdev->mac_index); - if (nesvnic->nesdev->nesadapter->port_count == 1) { + if (nesvnic->nesdev->nesadapter->port_count == 1 && + nesvnic->nesdev->nesadapter->adapter_fcn_count == 1) { + nesvnic->qp_nic_index[0] = nesvnic->nic_index; nesvnic->qp_nic_index[1] = nesvnic->nic_index + 1; if (nes_drv_opt & NES_DRV_OPT_DUAL_LOGICAL_PORT) { @@ -1626,11 +1670,14 @@ struct net_device *nes_netdev_init(struct nes_device *nesdev, nesvnic->qp_nic_index[3] = nesvnic->nic_index + 3; } } else { - if (nesvnic->nesdev->nesadapter->port_count == 2) { - nesvnic->qp_nic_index[0] = nesvnic->nic_index; - nesvnic->qp_nic_index[1] = nesvnic->nic_index + 2; - nesvnic->qp_nic_index[2] = 0xf; - nesvnic->qp_nic_index[3] = 0xf; + if (nesvnic->nesdev->nesadapter->port_count == 2 || + (nesvnic->nesdev->nesadapter->port_count == 1 && + nesvnic->nesdev->nesadapter->adapter_fcn_count == 2)) { + nesvnic->qp_nic_index[0] = nesvnic->nic_index; + nesvnic->qp_nic_index[1] = nesvnic->nic_index + + 2; + nesvnic->qp_nic_index[2] = 0xf; + nesvnic->qp_nic_index[3] = 0xf; } else { nesvnic->qp_nic_index[0] = nesvnic->nic_index; nesvnic->qp_nic_index[1] = 0xf; From ctung at neteffect.com Fri Sep 26 13:08:10 2008 From: ctung at neteffect.com (Chien Tung) Date: Fri, 26 Sep 2008 15:08:10 -0500 Subject: [ofa-general] [PATCH 4/13] RDMA/nes: Free NIC TX buffers Message-ID: <200809262008.m8QK8ALb011698@velma.neteffect.com> Author: Bob Sharp Free NIC TX buffers when destroying NIC QP. 
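
Illustration only, not part of the patch: a user-space sketch of draining a power-of-two ring from tail to head, which is the shape of the new transmit cleanup loop. The toy_sq type and its fields are hypothetical stand-ins for nic.tx_skb[], nic.sq_head and nic.sq_tail; DMA unmapping is left out entirely.

```c
#include <stdlib.h>

#define SQ_SIZE 8u  /* assumed ring size; must be a power of two */

struct toy_sq {
	void *tx_buf[SQ_SIZE];  /* stand-in for nic.tx_skb[] */
	unsigned int head;      /* next slot the producer will fill */
	unsigned int tail;      /* oldest slot not yet completed */
};

/* Free every buffer still queued between tail and head and return the
 * number of slots visited. */
static unsigned int toy_sq_drain(struct toy_sq *sq)
{
	unsigned int visited = 0;

	while (sq->head != sq->tail) {
		free(sq->tx_buf[sq->tail]);  /* free(NULL) is a no-op */
		sq->tx_buf[sq->tail] = NULL;
		sq->tail = (sq->tail + 1) & (SQ_SIZE - 1);
		visited++;
	}
	return visited;
}
```

The tail is advanced with a plain "tail + 1" so that the variable is modified only once in the statement.
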
Signed-off-by: Bob Sharp Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung -- drivers/infiniband/hw/nes/nes_hw.c | 64 ++++++++++++++++++++++++++++++++++- 1 files changed, 62 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c index 1437b6e..bc16fc0 100644 --- a/drivers/infiniband/hw/nes/nes_hw.c +++ b/drivers/infiniband/hw/nes/nes_hw.c @@ -1797,9 +1797,14 @@ int nes_init_nic_qp(struct nes_device *nesdev, struct net_device *netdev) */ void nes_destroy_nic_qp(struct nes_vnic *nesvnic) { + u64 u64temp; + dma_addr_t bus_address; struct nes_device *nesdev = nesvnic->nesdev; struct nes_hw_cqp_wqe *cqp_wqe; + struct nes_hw_nic_sq_wqe *nic_sqe; struct nes_hw_nic_rq_wqe *nic_rqe; + __le16 *wqe_fragment_length; + u16 wqe_fragment_index; u64 wqe_frag; u32 cqp_head; unsigned long flags; @@ -1808,14 +1813,69 @@ void nes_destroy_nic_qp(struct nes_vnic *nesvnic) /* Free remaining NIC receive buffers */ while (nesvnic->nic.rq_head != nesvnic->nic.rq_tail) { nic_rqe = &nesvnic->nic.rq_vbase[nesvnic->nic.rq_tail]; - wqe_frag = (u64)le32_to_cpu(nic_rqe->wqe_words[NES_NIC_RQ_WQE_FRAG0_LOW_IDX]); - wqe_frag |= ((u64)le32_to_cpu(nic_rqe->wqe_words[NES_NIC_RQ_WQE_FRAG0_HIGH_IDX])) << 32; + wqe_frag = (u64)le32_to_cpu( + nic_rqe->wqe_words[NES_NIC_RQ_WQE_FRAG0_LOW_IDX]); + wqe_frag |= ((u64)le32_to_cpu( + nic_rqe->wqe_words[NES_NIC_RQ_WQE_FRAG0_HIGH_IDX]))<<32; pci_unmap_single(nesdev->pcidev, (dma_addr_t)wqe_frag, nesvnic->max_frame_size, PCI_DMA_FROMDEVICE); dev_kfree_skb(nesvnic->nic.rx_skb[nesvnic->nic.rq_tail++]); nesvnic->nic.rq_tail &= (nesvnic->nic.rq_size - 1); } + /* Free remaining NIC transmit buffers */ + while (nesvnic->nic.sq_head != nesvnic->nic.sq_tail) { + nic_sqe = &nesvnic->nic.sq_vbase[nesvnic->nic.sq_tail]; + wqe_fragment_index = 1; + wqe_fragment_length = (__le16 *) + &nic_sqe->wqe_words[NES_NIC_SQ_WQE_LENGTH_0_TAG_IDX]; + /* bump past the vlan tag */ + wqe_fragment_length++; + if 
(le16_to_cpu(wqe_fragment_length[wqe_fragment_index]) != 0) { + u64temp = (u64)le32_to_cpu( + nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG0_LOW_IDX+ + wqe_fragment_index*2]); + u64temp += ((u64)le32_to_cpu( + nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG0_HIGH_IDX + + wqe_fragment_index*2]))<<32; + bus_address = (dma_addr_t)u64temp; + if (test_and_clear_bit(nesvnic->nic.sq_tail, + nesvnic->nic.first_frag_overflow)) { + pci_unmap_single(nesdev->pcidev, + bus_address, + le16_to_cpu(wqe_fragment_length[ + wqe_fragment_index++]), + PCI_DMA_TODEVICE); + } + for (; wqe_fragment_index < 5; wqe_fragment_index++) { + if (wqe_fragment_length[wqe_fragment_index]) { + u64temp = le32_to_cpu( + nic_sqe->wqe_words[ + NES_NIC_SQ_WQE_FRAG0_LOW_IDX+ + wqe_fragment_index*2]); + u64temp += ((u64)le32_to_cpu( + nic_sqe->wqe_words[ + NES_NIC_SQ_WQE_FRAG0_HIGH_IDX+ + wqe_fragment_index*2]))<<32; + bus_address = (dma_addr_t)u64temp; + pci_unmap_page(nesdev->pcidev, + bus_address, + le16_to_cpu( + wqe_fragment_length[ + wqe_fragment_index]), + PCI_DMA_TODEVICE); + } else + break; + } + } + if (nesvnic->nic.tx_skb[nesvnic->nic.sq_tail]) + dev_kfree_skb( + nesvnic->nic.tx_skb[nesvnic->nic.sq_tail]); + + nesvnic->nic.sq_tail = (++nesvnic->nic.sq_tail) + & (nesvnic->nic.sq_size - 1); + } + spin_lock_irqsave(&nesdev->cqp.lock, flags); /* Destroy NIC QP */ From ctung at neteffect.com Fri Sep 26 13:08:10 2008 From: ctung at neteffect.com (Chien Tung) Date: Fri, 26 Sep 2008 15:08:10 -0500 Subject: [ofa-general] [PATCH 8/13] RDMA/nes: Fill in firmware version Message-ID: <200809262008.m8QK8AeI011706@velma.neteffect.com> Author: Chien Tung Fill in firmware version for ethtool_drvinfo. 
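
Illustration only, not part of the patch: a user-space sketch of the version formatting used in the diff, where the first number comes from the bits above bit 15 and the second from the low byte. The helper name format_fw_version is hypothetical, and snprintf() is swapped in for the patch's sprintf() so the sketch cannot overrun its buffer.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper mirroring the expression in the patch:
 * firmware_version >> 16 and firmware_version & 0xff, printed as "a.b". */
static int format_fw_version(char *buf, size_t len, uint32_t firmware_version)
{
	return snprintf(buf, len, "%u.%u",
			(unsigned int)(firmware_version >> 16),
			(unsigned int)(firmware_version & 0x000000ffu));
}
```

Bits 15..8 of the word are ignored by this formatting, exactly as in the diff.
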
Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung -- drivers/infiniband/hw/nes/nes_nic.c | 4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c index 96db599..6abd404 100644 --- a/drivers/infiniband/hw/nes/nes_nic.c +++ b/drivers/infiniband/hw/nes/nes_nic.c @@ -1228,10 +1228,12 @@ static void nes_netdev_get_drvinfo(struct net_device *netdev, struct ethtool_drvinfo *drvinfo) { struct nes_vnic *nesvnic = netdev_priv(netdev); + struct nes_adapter *nesadapter = nesvnic->nesdev->nesadapter; strcpy(drvinfo->driver, DRV_NAME); strcpy(drvinfo->bus_info, pci_name(nesvnic->nesdev->pcidev)); - strcpy(drvinfo->fw_version, "TBD"); + sprintf(drvinfo->fw_version, "%u.%u", nesadapter->firmware_version>>16, + nesadapter->firmware_version & 0x000000ff); strcpy(drvinfo->version, DRV_VERSION); drvinfo->n_stats = nes_netdev_get_stats_count(netdev); drvinfo->testinfo_len = 0; From ctung at neteffect.com Fri Sep 26 13:08:10 2008 From: ctung at neteffect.com (Chien Tung) Date: Fri, 26 Sep 2008 15:08:10 -0500 Subject: [ofa-general] [PATCH 10/13] RDMA/nes: Stop spurious MAC interrupts Message-ID: <200809262008.m8QK8Aoi011711@velma.neteffect.com> Author: Chien Tung Mask off MAC interrupts on netdev_stop to prevent spurious MAC interrupts on unload/reload of iw_nes. 
Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung -- drivers/infiniband/hw/nes/nes_nic.c | 18 +++++++++++------- 1 files changed, 11 insertions(+), 7 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c index 852546b..639d0fc 100644 --- a/drivers/infiniband/hw/nes/nes_nic.c +++ b/drivers/infiniband/hw/nes/nes_nic.c @@ -272,14 +272,18 @@ static int nes_netdev_stop(struct net_device *netdev) break; } - if (first_nesvnic->netdev_open == 0) + if ((first_nesvnic->netdev_open == 1) && (first_nesvnic != nesvnic) && + (PCI_FUNC(first_nesvnic->nesdev->pcidev->devfn) != + PCI_FUNC(nesvnic->nesdev->pcidev->devfn))) { + nes_write_indexed(nesdev, NES_IDX_MAC_INT_MASK+ + (0x200*nesdev->mac_index), 0xffffffff); + nes_write_indexed(first_nesvnic->nesdev, + NES_IDX_MAC_INT_MASK+ + (0x200*first_nesvnic->nesdev->mac_index), + ~(NES_MAC_INT_LINK_STAT_CHG | NES_MAC_INT_XGMII_EXT | + NES_MAC_INT_TX_UNDERFLOW | NES_MAC_INT_TX_ERROR)); + } else { nes_write_indexed(nesdev, NES_IDX_MAC_INT_MASK+(0x200*nesdev->mac_index), 0xffffffff); - else if ((first_nesvnic != nesvnic) && - (PCI_FUNC(first_nesvnic->nesdev->pcidev->devfn) != PCI_FUNC(nesvnic->nesdev->pcidev->devfn))) { - nes_write_indexed(nesdev, NES_IDX_MAC_INT_MASK + (0x200 * nesdev->mac_index), 0xffffffff); - nes_write_indexed(first_nesvnic->nesdev, NES_IDX_MAC_INT_MASK + (0x200 * first_nesvnic->nesdev->mac_index), - ~(NES_MAC_INT_LINK_STAT_CHG | NES_MAC_INT_XGMII_EXT | - NES_MAC_INT_TX_UNDERFLOW | NES_MAC_INT_TX_ERROR)); } nic_active_mask = ~((u32)(1 << nesvnic->nic_index)); From sdake at redhat.com Fri Sep 26 13:17:02 2008 From: sdake at redhat.com (Steven Dake) Date: Fri, 26 Sep 2008 13:17:02 -0700 Subject: [ofa-general] general questions about librdmacm In-Reply-To: <20080926155408.GA35815@sferris.acm.org> References: <1222296236.29287.17.camel@balance> <20080926155408.GA35815@sferris.acm.org> Message-ID: <1222460222.3742.2.camel@balance> On Fri, 2008-09-26 at 10:54 -0500, Scott M. 
Ferris wrote:
> On Wed, Sep 24, 2008 at 03:43:56PM -0700, Steven Dake wrote:
> >
> > Totem is a reliable virtual synchrony multicast protocol which transmits
> > a message from any node to all nodes in a collection of computers
> > (called the configuration or membership). It has a few requirements:
> >
> > unreliable datagram multicast
> > unreliable datagram unicast
> > ability to bind to a specific port and interface
> > ability to poll() (POLLIN) via system call for new multicast datagram
> > messages
>
> It's been a few years since I read the totem paper, but if I recall
> correctly, totem also has certain ordering requirements about unicast
> and multicast, which the paper asserts are true for ethernet.
>

Totem can deal with reordering of any packets and has no other
requirements than those above. It is designed for these sorts of networks.

Thanks for your detailed response. I have a small question below...

> It's not clear to me that Infiniband will provide the same guarantees
> in all cases. If unicasts and multicasts are sent on different queue
> pairs, I'm not sure any ordering is guaranteed. I can also imagine
> the IB virtual lane (VL) feature potentially reordering delivery if
> messages end up in different VLs. I'd recommend talking to someone
> more knowledgeable in Infiniband than I am to check what you need to do
> to meet the totem ordering requirement.
>
> > Few questions:
> > 1) I would like to continue to use IP addressing but it looks like I
> > have to use a different addressing model in librdmacm. I looked at the
> > examples in the library and it isn't clear to me whether they use IP
> > addressing or some other addressing model. I see references to IPoverIB
> > but I don't see any information in the wiki on the topic. Anyone have
> > links to documentation on the topic of node addressing?
>
> IPoIB is fairly transparent to you. The OFED software provides Linux
> netdevices (e.g.
ib0, ib1) which pass IP traffic just like an ethernet
> netdevice would. Normal IP routing controls what interface gets used.
> For the most part IP-based software just works. Low-level things like
> DHCP daemons will notice differences, since the hardware addresses are
> larger.
>
> If IPoIB connected mode is being used, unicasts and multicasts will be
> sent on different queue pairs, since the unicasts will use a
> connection, and the multicasts can't. You may need to disable IPoIB
> connected mode in order to get the ordering guarantees totem needs,
> since I'm not sure any ordering is guaranteed between messages queued
> to different queue pairs.
>
> If you want to use native IB protocols instead of IPoIB, librdmacm
> will be easier than using libibcm. The RDMA communication manager
> uses IPoIB to resolve IP addresses to native IB addresses, so you can
> avoid changing your addressing model. The rdma_cm(7) man page
> describes how to bind and connect.
>
> > 2) The library doesn't have any non-blocking (kernel wait queue based)
> > polling mechanism that I can see. Am I missing a call here?
>
> Look at ibv_create_comp_channel(3) and ibv_get_cq_event(3).
>
> > 3) Of course using the standard socket API would be highly desired as
> > it requires fewer code changes. Is there some other library I should be
> > using?
>
> IPoIB gives you the standard socket API.
>

Where is IPoIB? Is it a standard feature of the OFED software stack?

Thanks again!

regards
-steve

> RDMA CM uses a similar set of calls for binding and connecting,
> though there are some differences. rdma_cm_ids are somewhat like
> socket fds. Once you have a multicast or unicast rdma_cm_id, you use
> the IB verbs API (libibverbs) to send and receive.
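
Illustration only, not from the thread: a minimal sketch of the polling side of question 2. A struct ibv_comp_channel carries a file descriptor (channel->fd) that can be passed to poll(); the sketch below waits on an arbitrary descriptor so that it builds without libibverbs, and wait_readable is a hypothetical helper name. In real code one would arm the CQ with ibv_req_notify_cq() beforehand and call ibv_get_cq_event() and ibv_ack_cq_events() once the descriptor is readable.

```c
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int rc = poll(&pfd, 1, timeout_ms);

	if (rc < 0)
		return -1;
	return (rc > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

Because the helper takes a plain descriptor, the same call can sit in an existing poll()-based event loop next to ordinary sockets.
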
>

From rdreier at cisco.com Fri Sep 26 13:17:37 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 26 Sep 2008 13:17:37 -0700
Subject: [ofa-general] [PATCH 7/13] RDMA/nes: Use Ethtool timer value
In-Reply-To: <200809262008.m8QK8Aru011704@velma.neteffect.com> (Chien Tung's message of "Fri, 26 Sep 2008 15:08:10 -0500")
References: <200809262008.m8QK8Aru011704@velma.neteffect.com>
Message-ID:

> Author: John Lacombe

Should be "From:" instead of "Author:" for automatic git tools handling.
But I can fix it up by hand for this patch series, no need to resend.

 - R.

From rdreier at cisco.com Fri Sep 26 13:18:46 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 26 Sep 2008 13:18:46 -0700
Subject: [ofa-general] general questions about librdmacm
In-Reply-To: <1222460222.3742.2.camel@balance> (Steven Dake's message of "Fri, 26 Sep 2008 13:17:02 -0700")
References: <1222296236.29287.17.camel@balance> <20080926155408.GA35815@sferris.acm.org> <1222460222.3742.2.camel@balance>
Message-ID:

> Where is IPoIB? Is it a standard feature of the OFED software stack?

IPoIB was the first ULP merged upstream. It is the ib_ipoib module and
it creates interfaces like "ibX"

 - R.

From rdreier at cisco.com Fri Sep 26 13:19:00 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 26 Sep 2008 13:19:00 -0700
Subject: [ofa-general] Re: Continue of "defer skb_orphan() until irqs enabled"
In-Reply-To: <20080925114414.GA25044@mtls03> (Eli Cohen's message of "Thu, 25 Sep 2008 14:44:14 +0300")
References: <48DA643E.9040605@Voltaire.COM> <20080924162034.GE15133@sgi.com> <20080924171135.GF15133@sgi.com> <20080924191623.GJ15133@sgi.com> <20080925114414.GA25044@mtls03>
Message-ID:

How about this? Instead of trying to rely on some complicated and
fragile reasoning about when some race might occur, let's just do what
we want to do anyway and get rid of LLTX. We change from priv->tx_lock
(taken with IRQ disabling) to netif_tx_lock (taken with BH-disabling).
And then we can keep the skb_orphan in the place it is, since our xmit routine runs with IRQs enabled. Most of this patch is just compensating for the fact that the tx_lock regions are now IRQ-enabled, and so we have to convert places that take priv->lock to disable IRQs too. If we could change ipoib_cm_rx_event_handler to not need priv->lock, then we could change priv->lock to a BH-disabling lock too and simplify things a bit further. I've tested this patch some in both datagram and connected mode with a kernel with lockdep and other debugging enabled, so it is at least somewhat sane. However more stress testing would definitely be helpful if we want to put this in 2.6.28. Also it would be interesting to see if there are any performance effects. Thanks, Roland --- drivers/infiniband/ulp/ipoib/ipoib.h | 8 +-- drivers/infiniband/ulp/ipoib/ipoib_cm.c | 88 ++++++++++++++---------- drivers/infiniband/ulp/ipoib/ipoib_ib.c | 30 ++++++-- drivers/infiniband/ulp/ipoib/ipoib_main.c | 68 ++++++++----------- drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 31 ++++----- 5 files changed, 118 insertions(+), 107 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 05eb41b..68ba5c3 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -268,10 +268,9 @@ struct ipoib_lro { }; /* - * Device private locking: tx_lock protects members used in TX fast - * path (and we use LLTX so upper layers don't do extra locking). - * lock protects everything else. lock nests inside of tx_lock (ie - * tx_lock must be acquired first if needed). + * Device private locking: network stack tx_lock protects members used + * in TX fast path, lock protects everything else. lock nests inside + * of tx_lock (ie tx_lock must be acquired first if needed). 
*/ struct ipoib_dev_priv { spinlock_t lock; @@ -320,7 +319,6 @@ struct ipoib_dev_priv { struct ipoib_rx_buf *rx_ring; - spinlock_t tx_lock; struct ipoib_tx_buf *tx_ring; unsigned tx_head; unsigned tx_tail; diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index 341ffed..7b14c2c 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -786,7 +786,8 @@ void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) dev_kfree_skb_any(tx_req->skb); - spin_lock_irqsave(&priv->tx_lock, flags); + netif_tx_lock(dev); + ++tx->tx_tail; if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) && netif_queue_stopped(dev) && @@ -801,7 +802,7 @@ void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) "(status=%d, wrid=%d vend_err %x)\n", wc->status, wr_id, wc->vendor_err); - spin_lock(&priv->lock); + spin_lock_irqsave(&priv->lock, flags); neigh = tx->neigh; if (neigh) { @@ -821,10 +822,10 @@ void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) clear_bit(IPOIB_FLAG_OPER_UP, &tx->flags); - spin_unlock(&priv->lock); + spin_unlock_irqrestore(&priv->lock, flags); } - spin_unlock_irqrestore(&priv->tx_lock, flags); + netif_tx_unlock(dev); } int ipoib_cm_dev_open(struct net_device *dev) @@ -1149,7 +1150,6 @@ static void ipoib_cm_tx_destroy(struct ipoib_cm_tx *p) { struct ipoib_dev_priv *priv = netdev_priv(p->dev); struct ipoib_cm_tx_buf *tx_req; - unsigned long flags; unsigned long begin; ipoib_dbg(priv, "Destroy active connection 0x%x head 0x%x tail 0x%x\n", @@ -1180,12 +1180,12 @@ timeout: DMA_TO_DEVICE); dev_kfree_skb_any(tx_req->skb); ++p->tx_tail; - spin_lock_irqsave(&priv->tx_lock, flags); + netif_tx_lock_bh(p->dev); if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) && netif_queue_stopped(p->dev) && test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags)) netif_wake_queue(p->dev); - spin_unlock_irqrestore(&priv->tx_lock, flags); + 
netif_tx_unlock_bh(p->dev); } if (p->qp) @@ -1202,6 +1202,7 @@ static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id, struct ipoib_dev_priv *priv = netdev_priv(tx->dev); struct net_device *dev = priv->dev; struct ipoib_neigh *neigh; + unsigned long flags; int ret; switch (event->event) { @@ -1220,8 +1221,8 @@ static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id, case IB_CM_REJ_RECEIVED: case IB_CM_TIMEWAIT_EXIT: ipoib_dbg(priv, "CM error %d.\n", event->event); - spin_lock_irq(&priv->tx_lock); - spin_lock(&priv->lock); + netif_tx_lock_bh(dev); + spin_lock_irqsave(&priv->lock, flags); neigh = tx->neigh; if (neigh) { @@ -1239,8 +1240,8 @@ static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id, queue_work(ipoib_workqueue, &priv->cm.reap_task); } - spin_unlock(&priv->lock); - spin_unlock_irq(&priv->tx_lock); + spin_unlock_irqrestore(&priv->lock, flags); + netif_tx_unlock_bh(dev); break; default: break; @@ -1294,19 +1295,24 @@ static void ipoib_cm_tx_start(struct work_struct *work) struct ib_sa_path_rec pathrec; u32 qpn; - spin_lock_irqsave(&priv->tx_lock, flags); - spin_lock(&priv->lock); + netif_tx_lock_bh(dev); + spin_lock_irqsave(&priv->lock, flags); + while (!list_empty(&priv->cm.start_list)) { p = list_entry(priv->cm.start_list.next, typeof(*p), list); list_del_init(&p->list); neigh = p->neigh; qpn = IPOIB_QPN(neigh->neighbour->ha); memcpy(&pathrec, &p->path->pathrec, sizeof pathrec); - spin_unlock(&priv->lock); - spin_unlock_irqrestore(&priv->tx_lock, flags); + + spin_unlock_irqrestore(&priv->lock, flags); + netif_tx_unlock_bh(dev); + ret = ipoib_cm_tx_init(p, qpn, &pathrec); - spin_lock_irqsave(&priv->tx_lock, flags); - spin_lock(&priv->lock); + + netif_tx_lock_bh(dev); + spin_lock_irqsave(&priv->lock, flags); + if (ret) { neigh = p->neigh; if (neigh) { @@ -1320,44 +1326,52 @@ static void ipoib_cm_tx_start(struct work_struct *work) kfree(p); } } - spin_unlock(&priv->lock); - spin_unlock_irqrestore(&priv->tx_lock, flags); + + spin_unlock_irqrestore(&priv->lock, 
flags); + netif_tx_unlock_bh(dev); } static void ipoib_cm_tx_reap(struct work_struct *work) { struct ipoib_dev_priv *priv = container_of(work, struct ipoib_dev_priv, cm.reap_task); + struct net_device *dev = priv->dev; struct ipoib_cm_tx *p; + unsigned long flags; + + netif_tx_lock_bh(dev); + spin_lock_irqsave(&priv->lock, flags); - spin_lock_irq(&priv->tx_lock); - spin_lock(&priv->lock); while (!list_empty(&priv->cm.reap_list)) { p = list_entry(priv->cm.reap_list.next, typeof(*p), list); list_del(&p->list); - spin_unlock(&priv->lock); - spin_unlock_irq(&priv->tx_lock); + spin_unlock_irqrestore(&priv->lock, flags); + netif_tx_unlock_bh(dev); ipoib_cm_tx_destroy(p); - spin_lock_irq(&priv->tx_lock); - spin_lock(&priv->lock); + netif_tx_lock_bh(dev); + spin_lock_irqsave(&priv->lock, flags); } - spin_unlock(&priv->lock); - spin_unlock_irq(&priv->tx_lock); + + spin_unlock_irqrestore(&priv->lock, flags); + netif_tx_unlock_bh(dev); } static void ipoib_cm_skb_reap(struct work_struct *work) { struct ipoib_dev_priv *priv = container_of(work, struct ipoib_dev_priv, cm.skb_task); + struct net_device *dev = priv->dev; struct sk_buff *skb; - + unsigned long flags; unsigned mtu = priv->mcast_mtu; - spin_lock_irq(&priv->tx_lock); - spin_lock(&priv->lock); + netif_tx_lock_bh(dev); + spin_lock_irqsave(&priv->lock, flags); + while ((skb = skb_dequeue(&priv->cm.skb_queue))) { - spin_unlock(&priv->lock); - spin_unlock_irq(&priv->tx_lock); + spin_unlock_irqrestore(&priv->lock, flags); + netif_tx_unlock_bh(dev); + if (skb->protocol == htons(ETH_P_IP)) icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu)); #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) @@ -1365,11 +1379,13 @@ static void ipoib_cm_skb_reap(struct work_struct *work) icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, priv->dev); #endif dev_kfree_skb_any(skb); - spin_lock_irq(&priv->tx_lock); - spin_lock(&priv->lock); + + netif_tx_lock_bh(dev); + spin_lock_irqsave(&priv->lock, flags); } - spin_unlock(&priv->lock); 
- spin_unlock_irq(&priv->tx_lock); + + spin_unlock_irqrestore(&priv->lock, flags); + netif_tx_unlock_bh(dev); } void ipoib_cm_skb_too_long(struct net_device *dev, struct sk_buff *skb, diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 66cafa2..0e748ae 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -468,21 +468,22 @@ void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr) static void drain_tx_cq(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); - unsigned long flags; - spin_lock_irqsave(&priv->tx_lock, flags); + netif_tx_lock(dev); while (poll_tx(priv)) ; /* nothing */ if (netif_queue_stopped(dev)) mod_timer(&priv->poll_timer, jiffies + 1); - spin_unlock_irqrestore(&priv->tx_lock, flags); + netif_tx_unlock(dev); } void ipoib_send_comp_handler(struct ib_cq *cq, void *dev_ptr) { - drain_tx_cq((struct net_device *)dev_ptr); + struct ipoib_dev_priv *priv = netdev_priv(dev_ptr); + + mod_timer(&priv->poll_timer, jiffies); } static inline int post_send(struct ipoib_dev_priv *priv, @@ -614,17 +615,20 @@ static void __ipoib_reap_ah(struct net_device *dev) struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_ah *ah, *tah; LIST_HEAD(remove_list); + unsigned long flags; + + netif_tx_lock_bh(dev); + spin_lock_irqsave(&priv->lock, flags); - spin_lock_irq(&priv->tx_lock); - spin_lock(&priv->lock); list_for_each_entry_safe(ah, tah, &priv->dead_ahs, list) if ((int) priv->tx_tail - (int) ah->last_send >= 0) { list_del(&ah->list); ib_destroy_ah(ah->ah); kfree(ah); } - spin_unlock(&priv->lock); - spin_unlock_irq(&priv->tx_lock); + + spin_unlock_irqrestore(&priv->lock, flags); + netif_tx_unlock_bh(dev); } void ipoib_reap_ah(struct work_struct *work) @@ -761,6 +765,14 @@ void ipoib_drain_cq(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); int i, n; + + /* + * We call completion handling routines that expect to be + * called 
from the BH-disabled NAPI poll context, so disable + * BHs here too. + */ + local_bh_disable(); + do { n = ib_poll_cq(priv->recv_cq, IPOIB_NUM_WC, priv->ibwc); for (i = 0; i < n; ++i) { @@ -784,6 +796,8 @@ void ipoib_drain_cq(struct net_device *dev) while (poll_tx(priv)) ; /* nothing */ + + local_bh_enable(); } int ipoib_ib_dev_stop(struct net_device *dev, int flush) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index e9ca3cb..c0ee514 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -373,9 +373,10 @@ void ipoib_flush_paths(struct net_device *dev) struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_path *path, *tp; LIST_HEAD(remove_list); + unsigned long flags; - spin_lock_irq(&priv->tx_lock); - spin_lock(&priv->lock); + netif_tx_lock_bh(dev); + spin_lock_irqsave(&priv->lock, flags); list_splice_init(&priv->path_list, &remove_list); @@ -385,15 +386,16 @@ void ipoib_flush_paths(struct net_device *dev) list_for_each_entry_safe(path, tp, &remove_list, list) { if (path->query) ib_sa_cancel_query(path->query_id, path->query); - spin_unlock(&priv->lock); - spin_unlock_irq(&priv->tx_lock); + spin_unlock_irqrestore(&priv->lock, flags); + netif_tx_unlock_bh(dev); wait_for_completion(&path->done); path_free(dev, path); - spin_lock_irq(&priv->tx_lock); - spin_lock(&priv->lock); + netif_tx_lock_bh(dev); + spin_lock_irqsave(&priv->lock, flags); } - spin_unlock(&priv->lock); - spin_unlock_irq(&priv->tx_lock); + + spin_unlock_irqrestore(&priv->lock, flags); + netif_tx_unlock_bh(dev); } static void path_rec_completion(int status, @@ -555,6 +557,7 @@ static void neigh_add_path(struct sk_buff *skb, struct net_device *dev) struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_path *path; struct ipoib_neigh *neigh; + unsigned long flags; neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev); if (!neigh) { @@ -563,11 +566,7 @@ static void neigh_add_path(struct 
sk_buff *skb, struct net_device *dev) return; } - /* - * We can only be called from ipoib_start_xmit, so we're - * inside tx_lock -- no need to save/restore flags. - */ - spin_lock(&priv->lock); + spin_lock_irqsave(&priv->lock, flags); path = __path_find(dev, skb->dst->neighbour->ha + 4); if (!path) { @@ -614,7 +613,7 @@ static void neigh_add_path(struct sk_buff *skb, struct net_device *dev) __skb_queue_tail(&neigh->queue, skb); } - spin_unlock(&priv->lock); + spin_unlock_irqrestore(&priv->lock, flags); return; err_list: @@ -626,7 +625,7 @@ err_drop: ++dev->stats.tx_dropped; dev_kfree_skb_any(skb); - spin_unlock(&priv->lock); + spin_unlock_irqrestore(&priv->lock, flags); } static void ipoib_path_lookup(struct sk_buff *skb, struct net_device *dev) @@ -650,12 +649,9 @@ static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev, { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_path *path; + unsigned long flags; - /* - * We can only be called from ipoib_start_xmit, so we're - * inside tx_lock -- no need to save/restore flags. 
- */ - spin_lock(&priv->lock); + spin_lock_irqsave(&priv->lock, flags); path = __path_find(dev, phdr->hwaddr + 4); if (!path || !path->valid) { @@ -667,7 +663,7 @@ static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev, __skb_queue_tail(&path->queue, skb); if (path_rec_start(dev, path)) { - spin_unlock(&priv->lock); + spin_unlock_irqrestore(&priv->lock, flags); path_free(dev, path); return; } else @@ -677,7 +673,7 @@ static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev, dev_kfree_skb_any(skb); } - spin_unlock(&priv->lock); + spin_unlock_irqrestore(&priv->lock, flags); return; } @@ -696,7 +692,7 @@ static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev, dev_kfree_skb_any(skb); } - spin_unlock(&priv->lock); + spin_unlock_irqrestore(&priv->lock, flags); } static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev) @@ -705,13 +701,10 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev) struct ipoib_neigh *neigh; unsigned long flags; - if (unlikely(!spin_trylock_irqsave(&priv->tx_lock, flags))) - return NETDEV_TX_LOCKED; - if (likely(skb->dst && skb->dst->neighbour)) { if (unlikely(!*to_ipoib_neigh(skb->dst->neighbour))) { ipoib_path_lookup(skb, dev); - goto out; + return NETDEV_TX_OK; } neigh = *to_ipoib_neigh(skb->dst->neighbour); @@ -721,7 +714,7 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev) skb->dst->neighbour->ha + 4, sizeof(union ib_gid))) || (neigh->dev != dev))) { - spin_lock(&priv->lock); + spin_lock_irqsave(&priv->lock, flags); /* * It's safe to call ipoib_put_ah() inside * priv->lock here, because we know that @@ -732,25 +725,25 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev) ipoib_put_ah(neigh->ah); list_del(&neigh->list); ipoib_neigh_free(dev, neigh); - spin_unlock(&priv->lock); + spin_unlock_irqrestore(&priv->lock, flags); ipoib_path_lookup(skb, dev); - goto out; + return NETDEV_TX_OK; } if 
(ipoib_cm_get(neigh)) { if (ipoib_cm_up(neigh)) { ipoib_cm_send(dev, skb, ipoib_cm_get(neigh)); - goto out; + return NETDEV_TX_OK; } } else if (neigh->ah) { ipoib_send(dev, skb, neigh->ah, IPOIB_QPN(skb->dst->neighbour->ha)); - goto out; + return NETDEV_TX_OK; } if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) { - spin_lock(&priv->lock); + spin_lock_irqsave(&priv->lock, flags); __skb_queue_tail(&neigh->queue, skb); - spin_unlock(&priv->lock); + spin_unlock_irqrestore(&priv->lock, flags); } else { ++dev->stats.tx_dropped; dev_kfree_skb_any(skb); @@ -779,16 +772,13 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev) IPOIB_GID_RAW_ARG(phdr->hwaddr + 4)); dev_kfree_skb_any(skb); ++dev->stats.tx_dropped; - goto out; + return NETDEV_TX_OK; } unicast_arp_send(skb, dev, phdr); } } -out: - spin_unlock_irqrestore(&priv->tx_lock, flags); - return NETDEV_TX_OK; } @@ -1052,7 +1042,6 @@ static void ipoib_setup(struct net_device *dev) dev->type = ARPHRD_INFINIBAND; dev->tx_queue_len = ipoib_sendq_size * 2; dev->features = (NETIF_F_VLAN_CHALLENGED | - NETIF_F_LLTX | NETIF_F_HIGHDMA); memcpy(dev->broadcast, ipv4_bcast_addr, INFINIBAND_ALEN); @@ -1064,7 +1053,6 @@ static void ipoib_setup(struct net_device *dev) ipoib_lro_setup(priv); spin_lock_init(&priv->lock); - spin_lock_init(&priv->tx_lock); mutex_init(&priv->vlan_mutex); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index aae2862..d9d1223 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -69,14 +69,13 @@ static void ipoib_mcast_free(struct ipoib_mcast *mcast) struct net_device *dev = mcast->dev; struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_neigh *neigh, *tmp; - unsigned long flags; int tx_dropped = 0; ipoib_dbg_mcast(netdev_priv(dev), "deleting multicast group " IPOIB_GID_FMT "\n", IPOIB_GID_ARG(mcast->mcmember.mgid)); - 
spin_lock_irqsave(&priv->lock, flags); + spin_lock_irq(&priv->lock); list_for_each_entry_safe(neigh, tmp, &mcast->neigh_list, list) { /* @@ -90,7 +89,7 @@ static void ipoib_mcast_free(struct ipoib_mcast *mcast) ipoib_neigh_free(dev, neigh); } - spin_unlock_irqrestore(&priv->lock, flags); + spin_unlock_irq(&priv->lock); if (mcast->ah) ipoib_put_ah(mcast->ah); @@ -100,9 +99,9 @@ static void ipoib_mcast_free(struct ipoib_mcast *mcast) dev_kfree_skb_any(skb_dequeue(&mcast->pkt_queue)); } - spin_lock_irqsave(&priv->tx_lock, flags); + netif_tx_lock_bh(dev); dev->stats.tx_dropped += tx_dropped; - spin_unlock_irqrestore(&priv->tx_lock, flags); + netif_tx_unlock_bh(dev); kfree(mcast); } @@ -259,10 +258,10 @@ static int ipoib_mcast_join_finish(struct ipoib_mcast *mcast, } /* actually send any queued packets */ - spin_lock_irq(&priv->tx_lock); + netif_tx_lock_bh(dev); while (!skb_queue_empty(&mcast->pkt_queue)) { struct sk_buff *skb = skb_dequeue(&mcast->pkt_queue); - spin_unlock_irq(&priv->tx_lock); + netif_tx_unlock_bh(dev); skb->dev = dev; @@ -273,9 +272,9 @@ static int ipoib_mcast_join_finish(struct ipoib_mcast *mcast, if (dev_queue_xmit(skb)) ipoib_warn(priv, "dev_queue_xmit failed to requeue packet\n"); - spin_lock_irq(&priv->tx_lock); + netif_tx_lock_bh(dev); } - spin_unlock_irq(&priv->tx_lock); + netif_tx_unlock_bh(dev); return 0; } @@ -286,7 +285,6 @@ ipoib_mcast_sendonly_join_complete(int status, { struct ipoib_mcast *mcast = multicast->context; struct net_device *dev = mcast->dev; - struct ipoib_dev_priv *priv = netdev_priv(dev); /* We trap for port events ourselves. 
*/ if (status == -ENETRESET) @@ -302,12 +300,12 @@ ipoib_mcast_sendonly_join_complete(int status, IPOIB_GID_ARG(mcast->mcmember.mgid), status); /* Flush out any queued packets */ - spin_lock_irq(&priv->tx_lock); + netif_tx_lock_bh(dev); while (!skb_queue_empty(&mcast->pkt_queue)) { ++dev->stats.tx_dropped; dev_kfree_skb_any(skb_dequeue(&mcast->pkt_queue)); } - spin_unlock_irq(&priv->tx_lock); + netif_tx_unlock_bh(dev); /* Clear the busy flag so we try again */ status = test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, @@ -662,12 +660,9 @@ void ipoib_mcast_send(struct net_device *dev, void *mgid, struct sk_buff *skb) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_mcast *mcast; + unsigned long flags; - /* - * We can only be called from ipoib_start_xmit, so we're - * inside tx_lock -- no need to save/restore flags. - */ - spin_lock(&priv->lock); + spin_lock_irqsave(&priv->lock, flags); if (!test_bit(IPOIB_FLAG_OPER_UP, &priv->flags) || !priv->broadcast || @@ -738,7 +733,7 @@ out: } unlock: - spin_unlock(&priv->lock); + spin_unlock_irqrestore(&priv->lock, flags); } void ipoib_mcast_dev_flush(struct net_device *dev) From rdreier at cisco.com Fri Sep 26 13:20:59 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 26 Sep 2008 13:20:59 -0700 Subject: [ofa-general] general questions about librdmacm In-Reply-To: <1222296236.29287.17.camel@balance> (Steven Dake's message of "Wed, 24 Sep 2008 15:43:56 -0700") References: <1222296236.29287.17.camel@balance> Message-ID: > 1) I would like to continue to use IP addressing but it looks like I > have to use a different addressing model in librdmacm. I looked at the > examples in the library and it isn't clear to me whether they use IP > addressing or some other addressing model. I see references to IPoverIB > but I don't see any information in the wiki on the topic. Anyone have > links to documentation on the topic of node addressing? No, librdmacm continues to use IP addressing. 
For any IB device that you want to use with librdmacm, you must create
an IPoIB interface and assign an IP address to it.

> 2) The library doesn't have any non blocking (kernel wait queue based)
> polling mechanism that I can see. Am I missing a call here?

librdmacm just handles connection and multicast membership things. The
actual datapath (sending, receiving, getting events, etc) is in
libibverbs.

> 3) Of course using the standard socket API would be highly desired as
> it requires less code changes. Is there some other library I should be
> using?

Is there some reason you can't just use standard UDP over IP-over-IB?
What are your performance requirements? Do you really need kernel
bypass for ultra-low-latency?

- R.

From sdake at redhat.com Fri Sep 26 13:25:20 2008
From: sdake at redhat.com (Steven Dake)
Date: Fri, 26 Sep 2008 13:25:20 -0700
Subject: [ofa-general] general questions about librdmacm
In-Reply-To:
References: <1222296236.29287.17.camel@balance>
Message-ID: <1222460720.3742.4.camel@balance>

On Fri, 2008-09-26 at 13:20 -0700, Roland Dreier wrote:
>
> > 3) Of course using the standard socket API would be highly desired as
> > it requires less code changes. Is there some other library I should be
> > using?
>
> Is there some reason you can't just use standard UDP over IP-over-IB?
> What are your performance requirements? Do you really need kernel
> bypass for ultra-low-latency?
>
> -

I would like to use standard UDP over IP but I'm not sure how to make
the last part work (IP over IB). I couldn't really find many docs on
how this is done.
Regards
-steve

From akepner at sgi.com Fri Sep 26 14:21:11 2008
From: akepner at sgi.com (akepner at sgi.com)
Date: Fri, 26 Sep 2008 14:21:11 -0700
Subject: [ofa-general] Re: Continue of "defer skb_orphan() until irqs enabled"
In-Reply-To:
References: <48DA643E.9040605@Voltaire.COM> <20080924162034.GE15133@sgi.com> <20080924171135.GF15133@sgi.com> <20080924191623.GJ15133@sgi.com> <20080925114414.GA25044@mtls03>
Message-ID: <20080926212111.GQ15133@sgi.com>

On Fri, Sep 26, 2008 at 01:19:00PM -0700, Roland Dreier wrote:
> How about this? Instead of trying to rely on some complicated and
> fragile reasoning about when some race might occur, let's just do what
> we want to do anyway and get rid of LLTX. We change from priv->tx_lock
> (taken with IRQ disabling) to netif_tx_lock (taken on with
> BH-disabling). And then we can keep the skb_orphan in the place it is,
> since our xmit routine runs with IRQs enabled.
> ...

Thanks for doing this work, Roland.

We'll give it a test asap. (Actually, we'll need to port the patch
to OFED 1.3.1 first, since that's what we're using/shipping.)

--
Arthur

From sashak at voltaire.com Fri Sep 26 14:54:19 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sat, 27 Sep 2008 00:54:19 +0300
Subject: [ofa-general] ibsysstat cpu output is incomplete
In-Reply-To:
References:
Message-ID: <20080926215419.GT16914@sashak.voltaire.com>

Hi,

On 13:25 Tue 23 Sep , Wen Hao Wang wrote:
>
> I find the output if "ibsysstat cpu" is not complete. This issue
> exists on all my cluster nodes, with RHEL/SLES and OFED 1.3.1/1.4-RC1
> installed.
>
> [root at xblade07 ~]# ibsysstat 13 cpu
> cpu 0: model Genuine Intel(R) CPU @ 2.83GHz MHZ 2833.512
> cpu 1: model Genuine Intel(R) CPU @ 2.83GHz MHZ 2833.512
> cpu 2: model Genuine Intel(R) CPU @ 2.83GHz M
> ---------------------> something missed

ibsysstat uses vendor specific class 0x33. The single packet payload
size is limited by 216 bytes. It looks you got this limit.
Sasha From arlin.r.davis at intel.com Fri Sep 26 14:54:53 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Fri, 26 Sep 2008 14:54:53 -0700 Subject: [ofa-general] [PATCH 1/1] dtest: fix 32-bit build issues in dtest and dtestx examples. Message-ID: <000001c92022$8194f690$db97070a@amr.corp.intel.com> Fix 32-bit build issues in dtest and dtestx examples Signed-off-by: Arlin Davis --- test/dtest/dtest.c | 24 ++++++++++++------------ test/dtest/dtestx.c | 28 ++++++++++++++-------------- 2 files changed, 26 insertions(+), 26 deletions(-) diff --git a/test/dtest/dtest.c b/test/dtest/dtest.c index 00d14e3..55f325b 100755 --- a/test/dtest/dtest.c +++ b/test/dtest/dtest.c @@ -92,14 +92,14 @@ (((uint32_t)(x) & 0xFF000000) >> 24)) #define hton32(x) ntoh32(x) #define ntoh64(x) (uint64_t)( \ - (((uint64_t)x & 0x00000000000000FF) << 56) | \ - (((uint64_t)x & 0x000000000000FF00) << 40) | \ - (((uint64_t)x & 0x0000000000FF0000) << 24) | \ - (((uint64_t)x & 0x00000000FF000000) << 8 ) | \ - (((uint64_t)x & 0x000000FF00000000) >> 8 ) | \ - (((uint64_t)x & 0x0000FF0000000000) >> 24) | \ - (((uint64_t)x & 0x00FF000000000000) >> 40) | \ - (((uint64_t)x & 0xFF00000000000000) >> 56)) + (((uint64_t)x & 0x00000000000000FFULL) << 56) | \ + (((uint64_t)x & 0x000000000000FF00ULL) << 40) | \ + (((uint64_t)x & 0x0000000000FF0000ULL) << 24) | \ + (((uint64_t)x & 0x00000000FF000000ULL) << 8 ) | \ + (((uint64_t)x & 0x000000FF00000000ULL) >> 8 ) | \ + (((uint64_t)x & 0x0000FF0000000000ULL) >> 24) | \ + (((uint64_t)x & 0x00FF000000000000ULL) >> 40) | \ + (((uint64_t)x & 0xFF00000000000000ULL) >> 56)) #define hton64(x) ntoh64(x) #elif __BYTE_ORDER == __BIG_ENDIAN #define hton16(x) (x) @@ -1029,7 +1029,7 @@ connect_ep( char *hostname, DAT_CONN_QUAL conn_id ) /* * Setup our remote memory and tell the other side about it */ - rmr_send_msg.virtual_address = hton64((DAT_VADDR)rbuf); + rmr_send_msg.virtual_address = hton64((DAT_VADDR)(uintptr_t)rbuf); rmr_send_msg.segment_length = 
hton32(RDMA_BUFFER_SIZE); rmr_send_msg.rmr_context = hton32(rmr_context_recv); @@ -1230,7 +1230,7 @@ do_rdma_write_with_msg( void ) for (i=0;i> 24)) #define hton32(x) ntoh32(x) #define ntoh64(x) (uint64_t)( \ - (((uint64_t)x & 0x00000000000000FF) << 56) | \ - (((uint64_t)x & 0x000000000000FF00) << 40) | \ - (((uint64_t)x & 0x0000000000FF0000) << 24) | \ - (((uint64_t)x & 0x00000000FF000000) << 8 ) | \ - (((uint64_t)x & 0x000000FF00000000) >> 8 ) | \ - (((uint64_t)x & 0x0000FF0000000000) >> 24) | \ - (((uint64_t)x & 0x00FF000000000000) >> 40) | \ - (((uint64_t)x & 0xFF00000000000000) >> 56)) + (((uint64_t)x & 0x00000000000000FFULL) << 56) | \ + (((uint64_t)x & 0x000000000000FF00ULL) << 40) | \ + (((uint64_t)x & 0x0000000000FF0000ULL) << 24) | \ + (((uint64_t)x & 0x00000000FF000000ULL) << 8 ) | \ + (((uint64_t)x & 0x000000FF00000000ULL) >> 8 ) | \ + (((uint64_t)x & 0x0000FF0000000000ULL) >> 24) | \ + (((uint64_t)x & 0x00FF000000000000ULL) >> 40) | \ + (((uint64_t)x & 0xFF00000000000000ULL) >> 56)) #define hton64(x) ntoh64(x) #elif __BYTE_ORDER == __BIG_ENDIAN #define hton16(x) (x) @@ -216,7 +216,7 @@ send_msg( &event.event_data.dto_completion_event_data; iov.lmr_context = context; - iov.virtual_address = (DAT_VADDR)data; + iov.virtual_address = (DAT_VADDR)(uintptr_t)data; iov.segment_length = (DAT_VLEN)size; for (i=0;irmr_context = hton32(rmr_context[RCV_RDMA_BUF_INDEX]); - r_iov->virtual_address = hton64((DAT_VADDR)buf[RCV_RDMA_BUF_INDEX]); + r_iov->virtual_address = hton64((DAT_VADDR)(uintptr_t)buf[RCV_RDMA_BUF_INDEX]); r_iov->segment_length = hton32(buf_size); printf("Send RMR message: r_key_ctx=0x%x,va="F64x",len=0x%x\n", @@ -781,7 +781,7 @@ do_immediate() r_iov = *buf[RECV_BUF_INDEX]; iov.lmr_context = lmr_context[SND_RDMA_BUF_INDEX]; - iov.virtual_address = (DAT_VADDR) buf[SND_RDMA_BUF_INDEX]; + iov.virtual_address = (DAT_VADDR)(uintptr_t)buf[SND_RDMA_BUF_INDEX]; iov.segment_length = buf_size; cookie.as_64 = 0x9999; @@ -939,7 +939,7 @@ do_cmp_swap() r_iov = 
*buf[ RECV_BUF_INDEX ]; l_iov.lmr_context = lmr_atomic_context; - l_iov.virtual_address = (DAT_UINT64)atomic_buf; + l_iov.virtual_address = (DAT_UINT64)(uintptr_t)atomic_buf; l_iov.segment_length = BUF_SIZE_ATOMIC; cookie.as_64 = 3333; @@ -1040,7 +1040,7 @@ do_fetch_add() r_iov = *buf[ RECV_BUF_INDEX ]; l_iov.lmr_context = lmr_atomic_context; - l_iov.virtual_address = (DAT_UINT64)atomic_buf; + l_iov.virtual_address = (DAT_UINT64)(uintptr_t)atomic_buf; l_iov.segment_length = BUF_SIZE_ATOMIC; cookie.as_64 = 0x7777; -- 1.5.2.5 From rdreier at cisco.com Fri Sep 26 15:11:27 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 26 Sep 2008 15:11:27 -0700 Subject: [ofa-general] general questions about librdmacm In-Reply-To: <1222460720.3742.4.camel@balance> (Steven Dake's message of "Fri, 26 Sep 2008 13:25:20 -0700") References: <1222296236.29287.17.camel@balance> <1222460720.3742.4.camel@balance> Message-ID: > I would like to use standard UDP over IP but I'm not sure how to make > the last part work (IP over IB). I couldn't really find many docs on > how this is done. Just something like modprobe ib_ipoib ifconfig ib0 10.1.2.3 (or ib1, ib2, ... if you have multiple HCA ports in the system) - R. From rdreier at cisco.com Fri Sep 26 15:11:57 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 26 Sep 2008 15:11:57 -0700 Subject: [ofa-general] Re: Continue of "defer skb_orphan() until irqs enabled" In-Reply-To: <20080926212111.GQ15133@sgi.com> (akepner@sgi.com's message of "Fri, 26 Sep 2008 14:21:11 -0700") References: <48DA643E.9040605@Voltaire.COM> <20080924162034.GE15133@sgi.com> <20080924171135.GF15133@sgi.com> <20080924191623.GJ15133@sgi.com> <20080925114414.GA25044@mtls03> <20080926212111.GQ15133@sgi.com> Message-ID: > We'll give it a test asap. (Actually, we'll need to port the patch > to OFED 1.3.1 first, since that's what we're using/shipping.) Thanks! Let me know how it goes. - R. 
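On the dtest/dtestx 32-bit build fix earlier in this batch, the two changes in that patch (ULL suffixes on the 64-bit masks, and casting pointers through uintptr_t before widening them) can be tried in isolation. What follows is a minimal sketch, not the dtest code itself: the function names swap64 and ptr_to_u64 are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/*
 * Reverse the byte order of a 64-bit value, in the same shape as the
 * patched ntoh64() macro. The ULL suffix states the 64-bit type of each
 * mask explicitly instead of relying on how the compiler types a large
 * unsuffixed constant.
 */
uint64_t swap64(uint64_t x)
{
    return ((x & 0x00000000000000FFULL) << 56) |
           ((x & 0x000000000000FF00ULL) << 40) |
           ((x & 0x0000000000FF0000ULL) << 24) |
           ((x & 0x00000000FF000000ULL) <<  8) |
           ((x & 0x000000FF00000000ULL) >>  8) |
           ((x & 0x0000FF0000000000ULL) >> 24) |
           ((x & 0x00FF000000000000ULL) >> 40) |
           ((x & 0xFF00000000000000ULL) >> 56);
}

/*
 * Store a pointer in a 64-bit address field. Going through uintptr_t
 * first converts the pointer to an integer of its own width, and only
 * then widens it, so a build with 32-bit pointers does not see a direct
 * pointer-to-wider-integer cast.
 */
uint64_t ptr_to_u64(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}
```

A caller would then write something like `addr = swap64(ptr_to_u64(buf));` where the patch writes `hton64((DAT_VADDR)(uintptr_t)rbuf)` on a little-endian host.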
From sdake at redhat.com Fri Sep 26 15:27:26 2008
From: sdake at redhat.com (Steven Dake)
Date: Fri, 26 Sep 2008 15:27:26 -0700
Subject: [ofa-general] general questions about librdmacm
In-Reply-To:
References: <1222296236.29287.17.camel@balance> <1222460720.3742.4.camel@balance>
Message-ID: <1222468046.4443.10.camel@balance>

On Fri, 2008-09-26 at 15:11 -0700, Roland Dreier wrote:
> > I would like to use standard UDP over IP but I'm not sure how to make
> > the last part work (IP over IB). I couldn't really find many docs on
> > how this is done.
>
> Just something like
>
>     modprobe ib_ipoib
>     ifconfig ib0 10.1.2.3
>
> (or ib1, ib2, ... if you have multiple HCA ports in the system)
>
> - R.

Roland,

Thanks for the response.

One more question and I'll get to experimenting. Does this kernel
module use "RDMA" and provide highest performance available for the
application?

From rdreier at cisco.com Fri Sep 26 15:33:40 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 26 Sep 2008 15:33:40 -0700
Subject: [ofa-general] general questions about librdmacm
In-Reply-To: <1222468046.4443.10.camel@balance> (Steven Dake's message of "Fri, 26 Sep 2008 15:27:26 -0700")
References: <1222296236.29287.17.camel@balance> <1222460720.3742.4.camel@balance> <1222468046.4443.10.camel@balance>
Message-ID:

> One more question and I'll get to experimenting. Does this kernel
> module use "RDMA" and provide highest performance available for the
> application?

No, it is just using the IB adapter as a very fast NIC.

But you can't really use RDMA for multicast anyway.

- R.
From sashak at voltaire.com Fri Sep 26 15:39:40 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sat, 27 Sep 2008 01:39:40 +0300
Subject: [ofa-general] Re: [PATCH] libibmad: eliminate compiler warnings on x86_64
In-Reply-To: <200809241659.19744.jackm@dev.mellanox.co.il>
References: <200809241659.19744.jackm@dev.mellanox.co.il>
Message-ID: <20080926223940.GU16914@sashak.voltaire.com>

On 16:59 Wed 24 Sep , Jack Morgenstein wrote:
> libibmad: eliminate compiler warnings on x86_64
>
> The snprintf's below generated warnings of the form:
> warning: format '%010lx' expects type 'long unsigned int', but argument 4 has type 'long long unsigned int'
> on 64-bit systems -- due to the influence of the <>llu constants. Casting solves this.
>
> Signed-off-by: Jack Morgenstein

Applied. Thanks.

Sasha

From sashak at voltaire.com Fri Sep 26 15:44:13 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sat, 27 Sep 2008 01:44:13 +0300
Subject: [ofa-general] Re: [PATCH] infiniband-diags: eliminate compiler warnings
In-Reply-To: <200809241659.23447.jackm@dev.mellanox.co.il>
References: <200809241659.23447.jackm@dev.mellanox.co.il>
Message-ID: <20080926224413.GV16914@sashak.voltaire.com>

Hi Jack,

On 16:59 Wed 24 Sep , Jack Morgenstein wrote:
> infiniband: eliminate compiler warnings on x86_64
>
> The printf's below generated warnings of the form:
> warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'long long unsigned int'
> on 64-bit systems -- due to the influence of the <>ull constants. Casting solves this.
>
> Signed-off-by: Jack Morgenstein
> ---
>
> Please fix for upcoming OFED 1.4 release candidate.
> > Index: infiniband-diags/src/ibping.c > =================================================================== > --- infiniband-diags.orig/src/ibping.c 2008-09-21 17:09:26.000000000 +0300 > +++ infiniband-diags/src/ibping.c 2008-09-24 16:38:16.000000000 +0300 > @@ -174,7 +174,7 @@ report(int sig) > printf("\n--- %s (%s) ibping statistics ---\n", last_host, portid2str(&portid)); > printf("%" PRIu64 " packets transmitted, %" PRIu64 " received, %" PRIu64 "%% packet loss, time %" PRIu64 " ms\n", > ntrans, replied, > - (lost != 0) ? lost * 100ull / ntrans : 0ull, total_time / 1000ull); > + (uint64_t) ((lost != 0) ? lost * 100ull / ntrans : 0ull), (uint64_t) (total_time / 1000ull)); I think instead of casting just removing 'ull' should solve the issue. Something like: diff --git a/infiniband-diags/src/ibping.c b/infiniband-diags/src/ibping.c index e847f42..bc3bc84 100644 --- a/infiniband-diags/src/ibping.c +++ b/infiniband-diags/src/ibping.c @@ -174,7 +174,7 @@ report(int sig) printf("\n--- %s (%s) ibping statistics ---\n", last_host, portid2str(&portid)); printf("%" PRIu64 " packets transmitted, %" PRIu64 " received, %" PRIu64 "%% packet loss, time %" PRIu64 " ms\n", ntrans, replied, - (lost != 0) ? lost * 100ull / ntrans : 0ull, total_time / 1000ull); + (lost != 0) ? lost * 100 / ntrans : 0, total_time / 1000); printf("rtt min/avg/max = %" PRIu64 ".%03" PRIu64 "/%" PRIu64 ".%03" PRIu64 "/%" PRIu64 ".%03" PRIu64 " ms\n", minrtt == ~0ull ? 0 : minrtt/1000, minrtt == ~0ull ? 0 : minrtt%1000, Does it work for you? 
Sasha

From sdake at redhat.com Fri Sep 26 15:46:41 2008
From: sdake at redhat.com (Steven Dake)
Date: Fri, 26 Sep 2008 15:46:41 -0700
Subject: [ofa-general] general questions about librdmacm
In-Reply-To:
References: <1222296236.29287.17.camel@balance> <1222460720.3742.4.camel@balance> <1222468046.4443.10.camel@balance>
Message-ID: <1222469202.4443.25.camel@balance>

On Fri, 2008-09-26 at 15:33 -0700, Roland Dreier wrote:
> > One more question and I'll get to experimenting. Does this kernel
> > module use "RDMA" and provide highest performance available for the
> > application?
>
> No, it is just using the IB adapter as a very fast NIC.
>

Thanks so essentially to get RDMA support I'd have to use the RDMA
library.

> But you can't really use RDMA for multicast anyway.
>
> - R

The RDMA library has features for joining a multicast group. Could you
point me at the appropriate specifications about infiniband relating to
the limitations of RDMA when used with multicast (or hand me an
explanation would be fine too :)

Also to answer your questions about performance requirements: today
totem is entirely cpu bound when running on GIGE hardware with jumbo
frames. A majority of this time I believe is spent copying message data
between kernel and userspace, and then the other copies that take place
around the protocol. I'd like to remove any copies that are present if
possible to give the CPU more bandwidth to execute the protocol.

Thanks Roland

Regards
-steve

From rdreier at cisco.com Fri Sep 26 15:56:13 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 26 Sep 2008 15:56:13 -0700
Subject: [ofa-general] general questions about librdmacm
In-Reply-To: <1222469202.4443.25.camel@balance> (Steven Dake's message of "Fri, 26 Sep 2008 15:46:41 -0700")
References: <1222296236.29287.17.camel@balance> <1222460720.3742.4.camel@balance> <1222468046.4443.10.camel@balance> <1222469202.4443.25.camel@balance>
Message-ID:

> The RDMA library has features for joining a multicast group.
> Could you point me at the appropriate specifications about infiniband
> relating to the limitations of RDMA when used with multicast (or hand
> me an explanation would be fine too :)

RDMA (in the sense of remote direct memory access, ie one sided
operations) can only be done over reliable connected transport.
Multicast works only over unreliable datagram transport.

You can get kernel bypass and zero-copy send/receive for multicast with
libibverbs and librdmacm. But that strictly speaking isn't RDMA.

> Also to answer your questions about performance requirements: today
> totem is entirely cpu bound when running on GIGE hardware with jumbo
> frames. A majority of this time I believe is spent copying message data
> between kernel and userspace, and then the other copies that take place
> around the protocol. I'd like to remove any copies that are present if
> possible to give the CPU more bandwidth to execute the protocol.

Seems implausible given today's memory bandwidths. I've seen slightly
in excess of 10 gigabits/sec with TCP on IPoIB -- this is with netpipe
and using large send/large receive offload but still all data is being
copied at least once.

Profiling to find out where your CPU is really going probably wouldn't
be a bad idea before you go to the effort of reimplementing everything
on InfiniBand.

- R.

From robert.j.woodruff at intel.com Fri Sep 26 15:56:58 2008
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Fri, 26 Sep 2008 15:56:58 -0700
Subject: [ofa-general] general questions about librdmacm
In-Reply-To: <1222296236.29287.17.camel@balance>
References: <1222296236.29287.17.camel@balance>
Message-ID:

Steve wrote,

>Developers,

>I am a maintainer of a project called openais/corosync (www.openais.org)
>which implements a network protocol called Totem. This code is the
>basis for much of the community work on clustering in Linux and other
>platforms.
I don't yet have hardware, but will shortly and intend to >add OFED RDMA support to the base Totem protocol used for our >communications. >Totem is a reliable virtual synchrony multicast protocol which transmits >a message from any node to all nodes in a collection of computers >(called the configuration or membership). It has a few requirements: >unreliable datagram multicast >unreliable datagram unicast >ability to bind to a specific port and interface >ability to poll() (POLLIN) via system call for new multicast datagram >messages >Today Totem is based upon IP(v4 or v6 are supported) and uses UDP. >Few questions: >1) I would like to continue to use IP addressing but it looks like I >have to use a different addressing model in librdmacm. I looked at the >examples in the library and it isn't clear to me whether they use IP >addressing or some other addressing model. I see references to IPoverIB >but I don't see any information in the wiki on the topic. Anyone have >links to documentation on the topic of node addressing? >2) The library doesn't have any non blocking (kernel wait queue based) >polling mechanism that I can see. Am I missing a call here? >3) Of course using the standard socket API would be highly desired as >it requires less code changes. Is there some other library I should be >using? >Regards >-steve Hi Steve, Just adding Arlin Davis to this thread in case he missed it. Arlin is covering for Sean Hefty (the rdma_cm maintainer) while Sean is on sabbatical. Arlin also maintains uDAPL, a middleware that supports access closer to the native verbs than IPoIB, but uses a different programming model. There is also something called RDS (Reliable Datagram Sockets), which Oracle maintains, that might fit your application model a little better.
woody _______________________________________________ general mailing list general at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From wangwhao at cn.ibm.com Fri Sep 26 17:55:50 2008 From: wangwhao at cn.ibm.com (Wen Hao Wang) Date: Sat, 27 Sep 2008 08:55:50 +0800 Subject: [ofa-general] ibsysstat cpu output is incomplete In-Reply-To: <20080926215419.GT16914@sashak.voltaire.com> Message-ID: > Hi, > > On 13:25 Tue 23 Sep , Wen Hao Wang wrote: >> >> I find the output if "ibsysstat cpu" is not complete. This issue >> exists on all my cluster nodes, with RHEL/SLES and OFED 1.3.1/1.4-RC1 >> installed. >> >> [root at xblade07 ~]# ibsysstat 13 cpu >> cpu 0: model Genuine Intel(R) CPU @ 2.83GHz MHZ 2833.512 >> cpu 1: model Genuine Intel(R) CPU @ 2.83GHz MHZ 2833.512 >> cpu 2: model Genuine Intel(R) CPU @ 2.83GHz M >> ---------------------> something missed > > ibsysstat uses vendor specific class 0x33. The single packet payload > size is limited by 216 bytes. It looks you got this limit. > > Sasha Hi Sasha: I opened bug 1237 for this issue. If the incomplete output is related to the packet size limitation, I think we may solve the problem by two solutions: 1. Increase the limitation to one large value (not sure whether this is feasible) 2. Transfer multiple packets until all the information is got and printed Thanks. Wen Hao Wang Email: wangwhao at cn.ibm.com From sashak at voltaire.com Fri Sep 26 21:33:07 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 27 Sep 2008 07:33:07 +0300 Subject: [ofa-general] ibsysstat cpu output is incomplete In-Reply-To: References: <20080926215419.GT16914@sashak.voltaire.com> Message-ID: <20080927043307.GW16914@sashak.voltaire.com> On 08:55 Sat 27 Sep , Wen Hao Wang wrote: > > I opened bug 1237 for this issue. Yes, I saw already.
> If the incomplete output is related to > the packet size limitation, I think we may solve the problem by two > solutions: > 1. Increase the limitation to one large value (not sure whether this is > feasible) We cannot do that because of the class 0x33 payload size limitation. > 2. Transfer multiple packets until all the information is got and printed Using RMPP (as allowed for classes in that range). Sasha From vlad at lists.openfabrics.org Sat Sep 27 03:11:22 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Sat, 27 Sep 2008 03:11:22 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080927-0200 daily build status Message-ID: <20080927101123.1BBB2E60C3D@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed
on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.19 Failed: Build failed on x86_64 with linux-2.6.21.1 Log: /home/vlad/tmp/ofa_1_4_kernel-20080927-0200_linux-2.6.21.1_x86_64_check/include/rdma/ib_verbs.h:1833: error: 'struct scatterlist' has no member named 'dma_address' /home/vlad/tmp/ofa_1_4_kernel-20080927-0200_linux-2.6.21.1_x86_64_check/include/rdma/ib_verbs.h: In function 'ib_sg_dma_len': /home/vlad/tmp/ofa_1_4_kernel-20080927-0200_linux-2.6.21.1_x86_64_check/include/rdma/ib_verbs.h:1846: error: 'struct scatterlist' has no member named 'dma_length' make[4]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080927-0200_linux-2.6.21.1_x86_64_check/drivers/infiniband/hw/ipath/ipath_dma.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080927-0200_linux-2.6.21.1_x86_64_check/drivers/infiniband/hw/ipath] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080927-0200_linux-2.6.21.1_x86_64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_4_kernel-20080927-0200_linux-2.6.21.1_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.21.1' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ppc64 with linux-2.6.24 Log: /home/vlad/tmp/ofa_1_4_kernel-20080927-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.c: In function 'ehca_poll_eqs': 
/home/vlad/tmp/ofa_1_4_kernel-20080927-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.c:942: warning: passing argument 1 of 'local_irq_save_ptr' from incompatible pointer type /home/vlad/tmp/ofa_1_4_kernel-20080927-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.c:946: warning: passing argument 1 of 'local_irq_save_ptr' from incompatible pointer type make[4]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080927-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080927-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080927-0200_linux-2.6.24_ppc64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_4_kernel-20080927-0200_linux-2.6.24_ppc64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ppc64/linux-2.6.24' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From go2buaa at 163.com Sat Sep 27 05:04:36 2008 From: go2buaa at 163.com (夏晓爽) Date: Sat, 27 Sep 2008 20:04:36 +0800 Subject: [ofa-general] ***SPAM*** some question for help Message-ID: <200809272004350334611@163.com> Hi, I am new to OpenFabrics, and I want to add my own code to OpenFabrics. What should I do, and how do I compile, test and debug it? Thank you very much; I look forward to hearing from you soon. 2008-09-27 夏晓爽 From thomas.weirding at yahoo.com Sat Sep 27 13:53:18 2008 From: thomas.weirding at yahoo.com (Thomas Weirding) Date: Sat, 27 Sep 2008 13:53:18 -0700 (PDT) Subject: [ofa-general] ***SPAM*** rpmbuild of ofa-kernel fails Message-ID: <911830.50049.qm@web59606.mail.ac4.yahoo.com> Hi, I am a newbie to OFED (OFED 1.3.1, CentOS 5.2). My rpmbuild rebuild of ofa-kernel fails with the following:
Running rpmbuild --rebuild --define '_topdir /var/tmp/OFED_topdir' --define 'configure_options --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod --with-mthca-mod --with-mlx4-mod --with-cxgb3-mod --with-nes-mod --with-ipath_inf-mod --with-ipoib-mod' --define 'build_kernel_ib 1' --define 'build_kernel_ib_devel 1' --define 'KVERSION 2.6.18-53.1.21-lustre-1.6.5.1-ofed-1.3.1' --define 'K_SRC /lib/modules/2.6.18-53.1.21-lustre-1.6.5.1-ofed-1.3.1/build' --define 'network_dir /etc/sysconfig/network-scripts' --define '_prefix /usr' /extra/OFED-1.3.1/SRPMS/ofa_kernel-1.3.1-ofed1.3.1.src.rpm Failed to build ofa_kernel RPM + ./configure --prefix=/usr --kernel-version 2.6.18-53.1.21-lustre-1.6.5.1-ofed-1.3.1 --kernel-sources /lib/modules/2.6.18-53.1.21-lustre-1.6.5.1-ofed-1.3.1/build --modules-dir /lib/modules/2.6.18-53.1.21-lustre-1.6.5.1-ofed-1.3.1/updates --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod --with-mthca-mod --with-mlx4-mod --with-cxgb3-mod --with-nes-mod --with-ipath_inf-mod --with-ipoib-mod ofed_patch.mk does not exist. running ofed_patch.sh /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/ofed_scripts/ofed_patch.sh --kernel-version 2.6.18-53.1.21-lustre-1.6.5.1-ofed-1.3.1 mkdir -p /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/patches> touch /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/patches/quiltrc /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/kernel_patches/fixes/cma_0010_response_timeout.patch /usr/local/bin/quilt --quiltrc /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/patches/quiltrc import /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1 /kernel_patches/fixes/cma_0010_response_timeout.patch Usage: quilt command [-h] ...
Commands are:          add      files    new       push     snapshot          applied  fold     next      refresh  top          delete   fork     patches   remove   unapplied          diff     gendiff  pop       series          edit     import   previous  setup   Failed executing /usr/local/bin/quilt Failed executing /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/ofed_scripts/ofed_patch.sherror: Bad exit status from /var/tmp/rpm-tmp. TIA -------------- next part -------------- An HTML attachment was scrubbed... URL: From panda at cse.ohio-state.edu Sat Sep 27 21:34:04 2008 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Sun, 28 Sep 2008 00:34:04 -0400 (EDT) Subject: [ofa-general] Announcing the release of MVAPICH 1.1RC1 Message-ID: The MVAPICH team is pleased to announce the release of MVAPICH 1.1RC1 with the following NEW features: - New Features for OpenFabrics Gen2-IB Interface - eXtended Reliable Connection (XRC) support - Lock-free design to provide support for asynchronous progress at both sender and receiver to overlap computation and communication - New OpenFabrics Gen2-Hybrid interface - Replaces the Gen2-UD interface of MVAPICH 1.0 series - Targeted for large-scale IB clusters (multi-thousand cores) to provide highest performance and minimal memory usage - Support for UD, RC and XRC transports - Adaptive selection during run-time (based on application and systems characteristics) to switch between RC and UD (or between XRC and UD) transports - Delivers performance and scalability with near constant memory footprint for communication contexts - Zero-copy protocol with UD for large data transfer - Multiple buffer organizations with XRC support - Shared memory communication between cores within a node - Multi-core optimized collectives (MPI_Bcast, MPI_Barrier, MPI_Reduce and MPI_Allreduce) - Enhanced MPI_Allgather collective For downloading MVAPICH 1.1RC1, associated user guide and accessing the SVN, please visit the following URL: 
http://mvapich.cse.ohio-state.edu This version is also being made available through OFED 1.4. All feedbacks, including bug reports and hints for performance tuning, patches and enhancements are welcome. Please post it to the mvapich-discuss mailing list. Thanks, The MVAPICH Team From jackm at dev.mellanox.co.il Sat Sep 27 23:48:00 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Sun, 28 Sep 2008 09:48:00 +0300 Subject: [ofa-general] Re: [PATCH] infiniband-diags: eliminate compiler warnings In-Reply-To: <20080926224413.GV16914@sashak.voltaire.com> References: <200809241659.23447.jackm@dev.mellanox.co.il> <20080926224413.GV16914@sashak.voltaire.com> Message-ID: <200809280948.00866.jackm@dev.mellanox.co.il> On Saturday 27 September 2008 01:44, Sasha Khapyorsky wrote: > I think instead of casting just removing 'ull' should solve the issue. > Something like: > > diff --git a/infiniband-diags/src/ibping.c b/infiniband-diags/src/ibping.c > index e847f42..bc3bc84 100644 > --- a/infiniband-diags/src/ibping.c > +++ b/infiniband-diags/src/ibping.c > @@ -174,7 +174,7 @@ report(int sig) >         printf("\n--- %s (%s) ibping statistics ---\n", last_host, portid2str(&portid)); >         printf("%" PRIu64 " packets transmitted, %" PRIu64 " received, %" PRIu64 "%% packet loss, time %" PRIu64 " ms\n", >                 ntrans, replied, > -               (lost != 0) ?  lost * 100ull / ntrans : 0ull, total_time / 1000ull); > +               (lost != 0) ?  lost * 100 / ntrans : 0, total_time / 1000); >         printf("rtt min/avg/max = %" PRIu64 ".%03" PRIu64 "/%" PRIu64 ".%03" PRIu64 "/%" PRIu64 ".%03" PRIu64 " ms\n", >                 minrtt == ~0ull ? 0 : minrtt/1000, >                 minrtt == ~0ull ? 0 : minrtt%1000, > > Does it work for you? > Yes, it does. 
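A minimal sketch of why dropping the 'ull' suffixes, as in the quoted diff, quiets the format warning. It assumes the ibping counters are uint64_t and a platform (typical 64-bit Linux) where uint64_t is unsigned long, so PRIu64 expands to "lu". There, 'lost * 100ull' is promoted to unsigned long long and no longer matches the format, while plain int constants are converted to uint64_t and the whole expression keeps the type PRIu64 expects. The function format_stats below is hypothetical and not part of ibping.c; the extra ntrans check is only there so the sketch is safe for any input.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not taken from ibping.c: formats the summary line
 * into buf. total_time is assumed to be in microseconds, as the division
 * by 1000 in the quoted diff suggests.
 *
 * Every argument passed for a PRIu64 conversion has type uint64_t:
 * the constants 100, 0 and 1000 are plain ints, so the usual arithmetic
 * conversions turn them into uint64_t instead of widening the expression
 * to unsigned long long.
 */
int format_stats(char *buf, size_t len, uint64_t ntrans, uint64_t replied,
		 uint64_t lost, uint64_t total_time)
{
	return snprintf(buf, len,
			"%" PRIu64 " packets transmitted, %" PRIu64
			" received, %" PRIu64 "%% packet loss, time %"
			PRIu64 " ms",
			ntrans, replied,
			(lost != 0 && ntrans != 0) ? lost * 100 / ntrans : 0,
			total_time / 1000);
}
```

For example, format_stats(buf, sizeof(buf), 8, 6, 2, 7012345) fills buf with "8 packets transmitted, 6 received, 25% packet loss, time 7012 ms".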
- Jack From sashak at voltaire.com Sun Sep 28 02:39:36 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 28 Sep 2008 12:39:36 +0300 Subject: [ofa-general] Re: [PATCH] infiniband-diags: eliminate compiler warnings In-Reply-To: <200809280948.00866.jackm@dev.mellanox.co.il> References: <200809241659.23447.jackm@dev.mellanox.co.il> <20080926224413.GV16914@sashak.voltaire.com> <200809280948.00866.jackm@dev.mellanox.co.il> Message-ID: <20080928093936.GE15515@sashak.voltaire.com> On 09:48 Sun 28 Sep , Jack Morgenstein wrote: > Yes, it does. Applied. Thanks. Sasha From vlad at lists.openfabrics.org Sun Sep 28 03:11:32 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Sun, 28 Sep 2008 03:11:32 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080928-0200 daily build status Message-ID: <20080928101132.A2BEAE60C90@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.9-42.ELsmp 
Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.19 Failed: Build failed on x86_64 with linux-2.6.21.1 Log: /home/vlad/tmp/ofa_1_4_kernel-20080928-0200_linux-2.6.21.1_x86_64_check/include/rdma/ib_verbs.h:1833: error: 'struct scatterlist' has no member named 'dma_address' /home/vlad/tmp/ofa_1_4_kernel-20080928-0200_linux-2.6.21.1_x86_64_check/include/rdma/ib_verbs.h: In function 'ib_sg_dma_len': /home/vlad/tmp/ofa_1_4_kernel-20080928-0200_linux-2.6.21.1_x86_64_check/include/rdma/ib_verbs.h:1846: error: 'struct scatterlist' has no member named 'dma_length' make[4]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080928-0200_linux-2.6.21.1_x86_64_check/drivers/infiniband/hw/ipath/ipath_dma.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080928-0200_linux-2.6.21.1_x86_64_check/drivers/infiniband/hw/ipath] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080928-0200_linux-2.6.21.1_x86_64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_4_kernel-20080928-0200_linux-2.6.21.1_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.21.1' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ppc64 with linux-2.6.24 Log: 
/home/vlad/tmp/ofa_1_4_kernel-20080928-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.c: In function 'ehca_poll_eqs': /home/vlad/tmp/ofa_1_4_kernel-20080928-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.c:942: warning: passing argument 1 of 'local_irq_save_ptr' from incompatible pointer type /home/vlad/tmp/ofa_1_4_kernel-20080928-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.c:946: warning: passing argument 1 of 'local_irq_save_ptr' from incompatible pointer type make[4]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080928-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080928-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080928-0200_linux-2.6.24_ppc64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_4_kernel-20080928-0200_linux-2.6.24_ppc64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ppc64/linux-2.6.24' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From eli at dev.mellanox.co.il Sun Sep 28 04:39:45 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Sun, 28 Sep 2008 14:39:45 +0300 Subject: [ofa-general] Re: Continue of "defer skb_orphan() until irqs enabled" In-Reply-To: References: <48DA643E.9040605@Voltaire.COM> <20080924162034.GE15133@sgi.com> <20080924171135.GF15133@sgi.com> <20080924191623.GJ15133@sgi.com> <20080925114414.GA25044@mtls03> Message-ID: <20080928113945.GA32630@mtls03> On Fri, Sep 26, 2008 at 01:19:00PM -0700, Roland Dreier wrote: > How about this? Instead of trying to rely on some complicated and > fragile reasoning about when some race might occur, let's just do what > we want to do anyway and get rid of LLTX. We change from priv->tx_lock > (taken with IRQ disabling) to netif_tx_lock (taken on with > BH-disabling). 
And then we can keep the skb_orphan in the place it is, > since our xmit routine runs with IRQs enabled. > We'll integrate this into ofed 1.4 and monitor this through our regression system. From olga.shern at gmail.com Sun Sep 28 04:40:21 2008 From: olga.shern at gmail.com (Olga Shern (Voltaire)) Date: Sun, 28 Sep 2008 14:40:21 +0300 Subject: [ofa-general] ***SPAM*** Re: [ewg] Re: Continue of "defer skb_orphan() until irqs enabled" In-Reply-To: <20080928113945.GA32630@mtls03> References: <48DA643E.9040605@Voltaire.COM> <20080924162034.GE15133@sgi.com> <20080924171135.GF15133@sgi.com> <20080924191623.GJ15133@sgi.com> <20080925114414.GA25044@mtls03> <20080928113945.GA32630@mtls03> Message-ID: Hi Eli, We also want to run regression tests with this patch. Please let me know when OFED daily build will include it. Thanks Olga On Sun, Sep 28, 2008 at 2:39 PM, Eli Cohen wrote: > On Fri, Sep 26, 2008 at 01:19:00PM -0700, Roland Dreier wrote: >> How about this? Instead of trying to rely on some complicated and >> fragile reasoning about when some race might occur, let's just do what >> we want to do anyway and get rid of LLTX. We change from priv->tx_lock >> (taken with IRQ disabling) to netif_tx_lock (taken on with >> BH-disabling). And then we can keep the skb_orphan in the place it is, >> since our xmit routine runs with IRQs enabled. >> > > We'll integrate this into ofed 1.4 and monitor this through our > regression system. 
> _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > From eli at dev.mellanox.co.il Sun Sep 28 04:48:03 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Sun, 28 Sep 2008 14:48:03 +0300 Subject: [ofa-general] Re: [ewg] Re: Continue of "defer skb_orphan() until irqs enabled" In-Reply-To: References: <48DA643E.9040605@Voltaire.COM> <20080924162034.GE15133@sgi.com> <20080924171135.GF15133@sgi.com> <20080924191623.GJ15133@sgi.com> <20080925114414.GA25044@mtls03> <20080928113945.GA32630@mtls03> Message-ID: <20080928114803.GB32630@mtls03> On Sun, Sep 28, 2008 at 02:40:21PM +0300, Olga Shern (Voltaire) wrote: > Hi Eli, > > We also want to run regression tests with this patch. > Please let me know when OFED daily build will include it. > > Thanks > Olga > It will most likely appear in today's build. From vlad at mellanox.co.il Sun Sep 28 08:08:06 2008 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Sun, 28 Sep 2008 18:08:06 +0300 Subject: [ofa-general] ***SPAM*** Re: [ewg] Re: Continue of "defer skb_orphan() until irqs enabled" In-Reply-To: References: <48DA643E.9040605@Voltaire.COM> <20080924162034.GE15133@sgi.com> <20080924171135.GF15133@sgi.com> <20080924191623.GJ15133@sgi.com> <20080925114414.GA25044@mtls03> <20080928113945.GA32630@mtls03> Message-ID: <48DF9DD6.80508@mellanox.co.il> Olga Shern (Voltaire) wrote: > Hi Eli, > > We also want to run regression tests with this patch. > Please let me know when OFED daily build will include it. > > Thanks > Olga Hi Olga, OFED-1.4-20080928-0756.tgz includes this patch. 
Regards, Vladimir From monis at Voltaire.COM Sun Sep 28 08:09:14 2008 From: monis at Voltaire.COM (Moni Shoua) Date: Sun, 28 Sep 2008 18:09:14 +0300 Subject: [ofa-general] ***SPAM*** Re: [ewg] Re: Continue of "defer skb_orphan() until irqs enabled" In-Reply-To: <48DF9DD6.80508@mellanox.co.il> References: <48DA643E.9040605@Voltaire.COM> <20080924162034.GE15133@sgi.com> <20080924171135.GF15133@sgi.com> <20080924191623.GJ15133@sgi.com> <20080925114414.GA25044@mtls03> <20080928113945.GA32630@mtls03> <48DF9DD6.80508@mellanox.co.il> Message-ID: <48DF9E1A.3050807@Voltaire.COM> Vladimir Sokolovsky wrote: > Olga Shern (Voltaire) wrote: >> Hi Eli, >> >> We also want to run regression tests with this patch. >> Please let me know when OFED daily build will include it. >> >> Thanks >> Olga > > Hi Olga, > OFED-1.4-20080928-0756.tgz includes this patch. > > Regards, > Vladimir > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > Which commit is it in OFED tree? From vlad at mellanox.co.il Sun Sep 28 08:12:32 2008 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Sun, 28 Sep 2008 18:12:32 +0300 Subject: [ofa-general] ***SPAM*** Re: [ewg] Re: Continue of "defer skb_orphan() until irqs enabled" In-Reply-To: <48DF9E1A.3050807@Voltaire.COM> References: <48DA643E.9040605@Voltaire.COM> <20080924162034.GE15133@sgi.com> <20080924171135.GF15133@sgi.com> <20080924191623.GJ15133@sgi.com> <20080925114414.GA25044@mtls03> <20080928113945.GA32630@mtls03> <48DF9DD6.80508@mellanox.co.il> <48DF9E1A.3050807@Voltaire.COM> Message-ID: <1222614753.10548.16.camel@vlad-laptop> On Sun, 2008-09-28 at 18:09 +0300, Moni Shoua wrote: > Vladimir Sokolovsky wrote: > > Olga Shern (Voltaire) wrote: > >> Hi Eli, > >> > >> We also want to run regression tests with this patch. > >> Please let me know when OFED daily build will include it. 
> >> > >> Thanks > >> Olga > > > > Hi Olga, > > OFED-1.4-20080928-0756.tgz includes this patch. > > > > Regards, > > Vladimir > > _______________________________________________ > > ewg mailing list > > ewg at lists.openfabrics.org > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > > > Which commit is it in OFED tree? commit 47595f49915ac51116916c640f3c0d98df521789 Author: Roland Dreier Date: Sun Sep 28 14:58:30 2008 +0300 IPoIB: Continue of "defer skb_orphan() until irqs enabled" Instead of trying to rely on some complicated and fragile reasoning about when some race might occur, let's just do what we want to do anyway and get rid of LLTX. We change from priv->tx_lock (taken with IRQ disabling) to netif_tx_lock (taken on with BH-disabling). And then we can keep the skb_orphan in the place it is, since our xmit routine runs with IRQs enabled. Most of this patch is just compensating for the fact that the tx_lock regions are now IRQ-enabled, and so we have to convert places that take priv->lock to disable IRQs too. If we could change ipoib_cm_rx_event_handler to not need priv->lock, then we could change priv->lock to a BH-disabling lock too and simplify things a bit further. I've tested this patch some in both datagram and connected mode with a kernel with lockdep and other debugging enabled, so it is at least somewhat sane. However more stress testing would definitely be helpful if we want to put this in 2.6.28. Also it would be interesting to see if there are any performance effects. 
Signed-off-by: Roland Dreier From tziporet at dev.mellanox.co.il Sun Sep 28 08:21:52 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Sun, 28 Sep 2008 18:21:52 +0300 Subject: [ofa-general] Re: [ewg] Re: Continue of "defer skb_orphan() until irqs enabled" In-Reply-To: <20080926212111.GQ15133@sgi.com> References: <48DA643E.9040605@Voltaire.COM> <20080924162034.GE15133@sgi.com> <20080924171135.GF15133@sgi.com> <20080924191623.GJ15133@sgi.com> <20080925114414.GA25044@mtls03> <20080926212111.GQ15133@sgi.com> Message-ID: <48DFA110.9090706@mellanox.co.il> akepner at sgi.com wrote: > We'll give it a test asap. (Actually, we'll need to port the patch > to OFED 1.3.1 first, since that's what we're using/shipping.) > > > If you have a patch for 1.3.1 (with all backports) we can add it to OFED 1.3.2 branch too Tziporet From monis at Voltaire.COM Sun Sep 28 08:23:22 2008 From: monis at Voltaire.COM (Moni Shoua) Date: Sun, 28 Sep 2008 18:23:22 +0300 Subject: [ofa-general] ***SPAM*** Re: [ewg] Re: Continue of "defer skb_orphan() until irqs enabled" In-Reply-To: <1222614753.10548.16.camel@vlad-laptop> References: <48DA643E.9040605@Voltaire.COM> <20080924162034.GE15133@sgi.com> <20080924171135.GF15133@sgi.com> <20080924191623.GJ15133@sgi.com> <20080925114414.GA25044@mtls03> <20080928113945.GA32630@mtls03> <48DF9DD6.80508@mellanox.co.il> <48DF9E1A.3050807@Voltaire.COM> <1222614753.10548.16.camel@vlad-laptop> Message-ID: <48DFA16A.1070601@Voltaire.COM> Vladimir Sokolovsky wrote: > On Sun, 2008-09-28 at 18:09 +0300, Moni Shoua wrote: >> Vladimir Sokolovsky wrote: >>> Olga Shern (Voltaire) wrote: >>>> Hi Eli, >>>> >>>> We also want to run regression tests with this patch. >>>> Please let me know when OFED daily build will include it. >>>> >>>> Thanks >>>> Olga >>> Hi Olga, >>> OFED-1.4-20080928-0756.tgz includes this patch. 
>>> >>> Regards, >>> Vladimir >>> _______________________________________________ >>> ewg mailing list >>> ewg at lists.openfabrics.org >>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg >>> >> Which commit is it in OFED tree? > > commit 47595f49915ac51116916c640f3c0d98df521789 > Author: Roland Dreier > Date: Sun Sep 28 14:58:30 2008 +0300 > > IPoIB: Continue of "defer skb_orphan() until irqs enabled" > > Instead of trying to rely on some complicated and > fragile reasoning about when some race might occur, let's just do what > we want to do anyway and get rid of LLTX. We change from priv->tx_lock > (taken with IRQ disabling) to netif_tx_lock (taken on with > BH-disabling). And then we can keep the skb_orphan in the place it is, > since our xmit routine runs with IRQs enabled. > > Most of this patch is just compensating for the fact that the tx_lock > regions are now IRQ-enabled, and so we have to convert places that take > priv->lock to disable IRQs too. > > If we could change ipoib_cm_rx_event_handler to not need priv->lock, > then we could change priv->lock to a BH-disabling lock too and simplify > things a bit further. > > I've tested this patch some in both datagram and connected mode with a > kernel with lockdep and other debugging enabled, so it is at least > somewhat sane. However more stress testing would definitely be helpful > if we want to put this in 2.6.28. Also it would be interesting to see > if there are any performance effects. > > Signed-off-by: Roland Dreier > > > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > Thanks. Did you check that this patch doesn't invalidate backport patch? Did you have to change any backport patch? 
From tziporet at dev.mellanox.co.il Sun Sep 28 08:25:13 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Sun, 28 Sep 2008 18:25:13 +0300 Subject: ***SPAM*** Re: [ofa-general] atomic operations on ppc64 In-Reply-To: <6978b4af0809251322x5bea6e39ta42293049137f5da@mail.gmail.com> References: <6978b4af0809230906h14a50f09l1f643e9967d3f72c@mail.gmail.com> <5D49E7A8952DC44FB38C38FA0D758EAD9B57B9@mtlexch01.mtl.com> <6978b4af0809250629l52335b7am85f0b361ec1731df@mail.gmail.com> <6978b4af0809250808oabb752cv23596c0f77c19732@mail.gmail.com> <48DC1A8E.5040008@gmail.com> <6978b4af0809250929j26095702v6d3c0080093c552e@mail.gmail.com> <48DC2349.5070709@gmail.com> <6978b4af0809251322x5bea6e39ta42293049137f5da@mail.gmail.com> Message-ID: <48DFA1D9.7080906@mellanox.co.il> Rui Machado wrote: >> I checked Mellanox's site and i here is the URL for the FW of ConnectX: >> http://www.mellanox.com/support/firmware_table_ConnectXIB.php >> >> In this page, i can see that the FW version is 2.5.000. >> > > OK. I thought I could only use the FW version on the IBM table (I > mentioned before) and not the one from the link you provide. > Please take the new FW binary from IBM, since you must use their special configuration. You may also wish to send us the test so that we can debug the problem here. Tziporet From vlad at mellanox.co.il Sun Sep 28 08:37:51 2008 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Sun, 28 Sep 2008 18:37:51 +0300 Subject: [ofa-general] ***SPAM*** Re: [ewg] Re: Continue of "defer skb_orphan() until irqs enabled" In-Reply-To: <48DFA16A.1070601@Voltaire.COM> References: <48DA643E.9040605@Voltaire.COM> <20080924162034.GE15133@sgi.com> <20080924171135.GF15133@sgi.com> <20080924191623.GJ15133@sgi.com> <20080925114414.GA25044@mtls03> <20080928113945.GA32630@mtls03> <48DF9DD6.80508@mellanox.co.il> <48DF9E1A.3050807@Voltaire.COM> <1222614753.10548.16.camel@vlad-laptop> <48DFA16A.1070601@Voltaire.COM> Message-ID: <1222616271.10548.36.camel@vlad-laptop> > Thanks.
> Did you check that this patch doesn't invalidate backport patch? > Did you have to change any backport patch? > Yes, See commit 7891514239b97fe3396e92d2fcf24f7ab18e60e9. Moni, don't you have OFED's git tree to check? Regards, Vladimir From monis at Voltaire.COM Sun Sep 28 08:43:52 2008 From: monis at Voltaire.COM (Moni Shoua) Date: Sun, 28 Sep 2008 18:43:52 +0300 Subject: [ofa-general] ***SPAM*** Re: [ewg] Re: Continue of "defer skb_orphan() until irqs enabled" In-Reply-To: <1222616271.10548.36.camel@vlad-laptop> References: <48DA643E.9040605@Voltaire.COM> <20080924162034.GE15133@sgi.com> <20080924171135.GF15133@sgi.com> <20080924191623.GJ15133@sgi.com> <20080925114414.GA25044@mtls03> <20080928113945.GA32630@mtls03> <48DF9DD6.80508@mellanox.co.il> <48DF9E1A.3050807@Voltaire.COM> <1222614753.10548.16.camel@vlad-laptop> <48DFA16A.1070601@Voltaire.COM> <1222616271.10548.36.camel@vlad-laptop> Message-ID: <48DFA638.1040709@Voltaire.COM> Vladimir Sokolovsky wrote: >> Thanks. >> Did you check that this patch doesn't invalidate backport patch? >> Did you have to change any backport patch? >> > > Yes, > See commit 7891514239b97fe3396e92d2fcf24f7ab18e60e9. > > Moni, don't you have OFED's git tree to check? > > Regards, > Vladimir > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > Sorry. I missed that. I see it in my git now. From sashak at voltaire.com Sun Sep 28 13:22:42 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 28 Sep 2008 23:22:42 +0300 Subject: [ofa-general] Re: [OpenSM][16/18] - Routing Chaining In-Reply-To: <1221507242.6274.74.camel@cardanus.llnl.gov> References: <1221507242.6274.74.camel@cardanus.llnl.gov> Message-ID: <20080928202242.GF25831@sashak.voltaire.com> Hi Al, Some technical comments... On 12:34 Mon 15 Sep , Al Chu wrote: > stick a *next pointer into struct osm_routing_engine. 
Rearchitect > routing engine usage as a list instead of a single struct. > > Al > > -- > Albert Chu > chu11 at llnl.gov > 925-422-5311 > Computer Scientist > High Performance Systems Division > Lawrence Livermore National Laboratory > From 0cc04e2660a57a2b7b03e449563cc19bc5446811 Mon Sep 17 00:00:00 2001 > From: Albert Chu > Date: Fri, 12 Sep 2008 14:22:42 -0700 > Subject: [PATCH] rearchitect osm_routing_engine as a list data structure > > > Signed-off-by: Albert Chu > --- > opensm/include/opensm/osm_opensm.h | 10 +++- > opensm/opensm/osm_opensm.c | 73 +++++++++++++++++++++++++++++++----- > opensm/opensm/osm_ucast_mgr.c | 56 ++++++++++++++++++--------- > 3 files changed, 108 insertions(+), 31 deletions(-) > > diff --git a/opensm/include/opensm/osm_opensm.h b/opensm/include/opensm/osm_opensm.h > index a1c255b..6990307 100644 > --- a/opensm/include/opensm/osm_opensm.h > +++ b/opensm/include/opensm/osm_opensm.h > @@ -126,6 +126,7 @@ struct osm_routing_engine { > int (*ucast_build_fwd_tables) (void *context); > void (*ucast_dump_tables) (void *context); > void (*delete) (void *context); > + struct osm_routing_engine *next; > }; > /* > * FIELDS > @@ -148,6 +149,9 @@ struct osm_routing_engine { > * delete > * The delete method, may be used for routing engine > * internals cleanup. > +* > +* next > +* Pointer to next routing engine in the list. > */ > > /****s* OpenSM: OpenSM/osm_opensm_t > @@ -178,7 +182,7 @@ typedef struct osm_opensm { > osm_log_t log; > cl_dispatcher_t disp; > cl_plock_t lock; > - struct osm_routing_engine routing_engine; > + struct osm_routing_engine *routing_engine_list; > osm_routing_engine_type_t routing_engine_used; > osm_stats_t stats; > osm_console_t console; > @@ -221,8 +225,8 @@ typedef struct osm_opensm { > * lock > * Shared lock guarding most OpenSM structures. > * > -* routing_engine > -* Routing engine; will be initialized then used. > +* routing_engine_list > +* List of routing engines that should be tried for use. 
> * > * routing_engine_used > * Indicates which routing engine was used to route a subnet. > diff --git a/opensm/opensm/osm_opensm.c b/opensm/opensm/osm_opensm.c > index 48e75f5..5b49d0a 100644 > --- a/opensm/opensm/osm_opensm.c > +++ b/opensm/opensm/osm_opensm.c > @@ -135,36 +135,72 @@ osm_routing_engine_type_t osm_routing_engine_type(IN const char *str) > > /********************************************************************** > **********************************************************************/ > +static void append_routing_engine(osm_opensm_t * p_osm, > + struct osm_routing_engine *routing_engine) > +{ > + struct osm_routing_engine *p_routing_engine; > + > + routing_engine->next = NULL; > + > + if (!p_osm->routing_engine_list) { > + p_osm->routing_engine_list = routing_engine; > + return; > + } > + > + p_routing_engine = p_osm->routing_engine_list; > + while (p_routing_engine->next) > + p_routing_engine = p_routing_engine->next; > + > + p_routing_engine->next = routing_engine; > +} > + > static void setup_routing_engine(osm_opensm_t * p_osm, const char *name) > { > + struct osm_routing_engine *routing_engine = NULL; > const struct routing_engine_module *r; > > + routing_engine = malloc(sizeof(struct osm_routing_engine)); > + if (!routing_engine) { > + OSM_LOG(&p_osm->log, OSM_LOG_VERBOSE, > + "routing engine memory allocation failed\n"); > + return; > + } > + memset(routing_engine, '\0', sizeof(struct osm_routing_engine)); > + > if (!name) { > - osm_ucast_minhop_setup(&p_osm->routing_engine, p_osm); > + osm_ucast_minhop_setup(routing_engine, p_osm); > + append_routing_engine(p_osm, routing_engine); > return; > } > > for (r = routing_modules; r->name && *r->name; r++) { > if (!strcmp(r->name, name)) { > - p_osm->routing_engine.name = r->name; > - if (r->setup(&p_osm->routing_engine, p_osm)) { > + routing_engine->name = r->name; > + if (r->setup(routing_engine, p_osm)) { > OSM_LOG(&p_osm->log, OSM_LOG_VERBOSE, > "setup of routing" > " engine \'%s\' 
failed\n", name); > - break; > + return; > } > OSM_LOG(&p_osm->log, OSM_LOG_DEBUG, > "\'%s\' routing engine set up\n", > - p_osm->routing_engine.name); > + routing_engine->name); > + append_routing_engine(p_osm, routing_engine); > return; > } > } > > OSM_LOG(&p_osm->log, OSM_LOG_VERBOSE, > - "cannot find or setup routing engine" > - " \'%s\'. Minhop will be used instead\n", > + "cannot find or setup routing engine \'%s\'", > name); > - osm_ucast_minhop_setup(&p_osm->routing_engine, p_osm); No free() for routing_engine in case of failure. > +} > + > +static void setup_default_routing_engine(osm_opensm_t * p_osm) > +{ > + setup_routing_engine(p_osm, NULL); > + > + OSM_LOG(&p_osm->log, OSM_LOG_VERBOSE, > + "Minhop configured as default routing\n"); > } > > /********************************************************************** > @@ -184,6 +220,21 @@ void osm_opensm_construct(IN osm_opensm_t * const p_osm) > > /********************************************************************** > **********************************************************************/ > +static void _osm_opensm_routing_engine_destroy(IN osm_opensm_t * const p_osm) > +{ > + struct osm_routing_engine *p_routing_engine; > + > + if (!p_osm->routing_engine_list) > + return; > + > + p_routing_engine = p_osm->routing_engine_list; > + while (p_routing_engine) { > + if (p_routing_engine->delete) > + p_routing_engine->delete(p_routing_engine->context); > + p_routing_engine = p_routing_engine->next; No free() for p_routing_engine. 
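A minimal sketch of the teardown loop with the missing free() added. It assumes every node on the list was allocated with malloc(), as setup_routing_engine() above does. The trimmed-down struct and the helper names (routing_engine, prepend_engine, count_delete) are hypothetical stand-ins for illustration, not the OpenSM definitions. The point is only that ->next has to be saved before the node is freed.

```c
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for struct osm_routing_engine. */
struct routing_engine {
	const char *name;
	void *context;
	void (*delete) (void *context);
	struct routing_engine *next;
};

/* Test hook (hypothetical): counts how many delete methods were called. */
static int deleted_count;

static void count_delete(void *context)
{
	(void)context;
	deleted_count++;
}

/* Allocate a node and put it at the head of the list; 0 on success. */
static int prepend_engine(struct routing_engine **list, const char *name,
			  void (*delete_method) (void *context))
{
	struct routing_engine *r = calloc(1, sizeof(*r));

	if (!r)
		return -1;
	r->name = name;
	r->delete = delete_method;
	r->next = *list;
	*list = r;
	return 0;
}

/*
 * Call the optional delete method of every engine and free the node.
 * ->next is read before free(), since the node is invalid afterwards.
 */
static void destroy_routing_engines(struct routing_engine **list)
{
	struct routing_engine *r, *next;

	for (r = *list; r; r = next) {
		next = r->next;
		if (r->delete)
			r->delete(r->context);
		free(r);
	}
	*list = NULL;
}
```

The same applies to the error paths in setup_routing_engine(): a node that was allocated but never appended has to be freed there as well.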
> + } > +} > + > void osm_opensm_destroy(IN osm_opensm_t * const p_osm) > { > /* in case of shutdown through exit proc - no ^C */ > @@ -221,8 +272,7 @@ void osm_opensm_destroy(IN osm_opensm_t * const p_osm) > osm_sa_db_file_dump(p_osm); > > /* do the destruction in reverse order as init */ > - if (p_osm->routing_engine.delete) > - p_osm->routing_engine.delete(p_osm->routing_engine.context); > + _osm_opensm_routing_engine_destroy(p_osm); > osm_sa_destroy(&p_osm->sa); > osm_sm_destroy(&p_osm->sm); > #ifdef ENABLE_OSM_PERF_MGR > @@ -384,6 +434,9 @@ osm_opensm_init(IN osm_opensm_t * const p_osm, > > setup_routing_engine(p_osm, p_opt->routing_engine_name); > > + if (!p_osm->routing_engine_list) > + setup_default_routing_engine(p_osm); > + This (default engine setup) is duplicated in setup_routing_engine(). > p_osm->routing_engine_used = OSM_ROUTING_ENGINE_TYPE_NONE; > > p_osm->node_name_map = open_node_name_map(p_opt->node_name_map_name); > diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c > index b8272ee..f905eab 100644 > --- a/opensm/opensm/osm_ucast_mgr.c > +++ b/opensm/opensm/osm_ucast_mgr.c > @@ -245,20 +245,48 @@ static int ucast_mgr_setup_all_switches(osm_subn_t * p_subn) > > /********************************************************************** > **********************************************************************/ > +int _osm_ucast_mgr_route(struct osm_routing_engine *p_routing_eng, > + osm_opensm_t *p_osm) > +{ > + int blm = -1; > + int ubft = -1; > + > + CL_ASSERT(p_routing_eng->build_lid_matrices); > + CL_ASSERT(p_routing_eng->ucast_build_fwd_tables); > + > + blm = p_routing_eng->build_lid_matrices(p_routing_eng->context); > + > + /* > + Now that the lid matrices have been built, we can > + build and download the switch forwarding tables. 
> + */ > + if (!blm) > + ubft = p_routing_eng->ucast_build_fwd_tables(p_routing_eng->context); > + > + if (!blm && !ubft) { > + p_osm->routing_engine_used = > + osm_routing_engine_type(p_routing_eng->name); > + return 0; > + } > + > + OSM_LOG(&p_osm->log, OSM_LOG_DEBUG, > + "failed to route using routing algorithm %s\n", > + p_routing_eng->name); > + return -1; > +} > + > osm_signal_t osm_ucast_mgr_process(IN osm_ucast_mgr_t * const p_mgr) > { > osm_opensm_t *p_osm; > struct osm_routing_engine *p_routing_eng; > osm_signal_t signal = OSM_SIGNAL_DONE; > cl_qmap_t *p_sw_guid_tbl; > - int blm = -1; > - int ubft = -1; > > OSM_LOG_ENTER(p_mgr->p_log); > > p_sw_guid_tbl = &p_mgr->p_subn->sw_guid_tbl; > p_osm = p_mgr->p_subn->p_osm; > - p_routing_eng = &p_osm->routing_engine; > + p_routing_eng = p_osm->routing_engine_list; > > CL_PLOCK_EXCL_ACQUIRE(p_mgr->p_lock); > > @@ -271,22 +299,14 @@ osm_signal_t osm_ucast_mgr_process(IN osm_ucast_mgr_t * const p_mgr) > > p_mgr->any_change = FALSE; > > - CL_ASSERT(p_routing_eng->build_lid_matrices); > - CL_ASSERT(p_routing_eng->ucast_build_fwd_tables); > - > - blm = p_routing_eng->build_lid_matrices(p_routing_eng->context); > - > - /* > - Now that the lid matrices have been built, we can > - build and download the switch forwarding tables. 
> - */ > - if (!blm) > - ubft = p_routing_eng->ucast_build_fwd_tables(p_routing_eng->context); > + p_osm->routing_engine_used = OSM_ROUTING_ENGINE_TYPE_NONE; > + while (p_routing_eng) { > + if (!_osm_ucast_mgr_route(p_routing_eng, p_osm)) > + break; > + p_routing_eng = p_routing_eng->next; > + } > > - if (!blm && !ubft) > - p_osm->routing_engine_used = > - osm_routing_engine_type(p_routing_eng->name); > - else { > + if (p_osm->routing_engine_used == OSM_ROUTING_ENGINE_TYPE_NONE) { > /* If configured routing algorithm failed, use default MinHop */ > osm_ucast_minhop_no_failure_build_lid_matrices(p_osm); > osm_ucast_minhop_no_failure_build_fwd_tables(p_osm); > -- > 1.5.4.5 > From sashak at voltaire.com Sun Sep 28 13:26:48 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 28 Sep 2008 23:26:48 +0300 Subject: [ofa-general] Re: [OpenSM][0/18] - Routing Chaining In-Reply-To: <1221506448.6274.32.camel@cardanus.llnl.gov> References: <1221506448.6274.32.camel@cardanus.llnl.gov> Message-ID: <20080928202648.GG25831@sashak.voltaire.com> Hi Al, Sorry about the delay. It took some time to review this patch series in depth. On 12:20 Mon 15 Sep , Al Chu wrote: > > As we've discussed before, we wanted to put routing chaining into > opensm. This is great. Thanks! > osm_ucast defaults to minhop - The current code automatically > defaulted to minhop if anything in the selected routing engine failed. > Naturally this had to be changed for routing chaining. I moved minhop > out of the ucast_mgr code to make it its own routing engine instead. I fully agree that minhop should be implemented as a regular routing engine. However, I don't think we need to move osm_ucast_mgr_build_lid_matrices() and ucast_mgr_build_lfts() and make them minhop routing engine specific. osm_ucast_mgr_build_lid_matrices() implements generic lid matrix generation code and ucast_mgr_build_lfts() has a generic balancer; both are useful to other routing engines. 
> osm_ucast assumption on routing failures - The current code defaulted > to minhop if anything in the selected routing engine failed. Because > of this some routing engines (most notably "file" routing) > intentionally "failed" when it wanted default to some portion of > minhop behavior. All routing behavior had to be moved into routing > engines to have the routing engines fully fail/succeed on their own. Maybe we can use a simpler solution: use the method's return status. It can return a negative value on failure, zero on success, and a positive value if method fallback is requested. This will help keep the routing engine methods optional (potentially we can add multicast routing methods there) and keep things simple enough when a single method fallback is desired. > updn routing - currently utilizes the minhop build_fwd_tables but > minhop's code assumes if build_lid_matrices is not-null, it is in > "up/dn routing mode" instead of "minhop mode". Perfectly fine when > you can specify max of one routing engine, but needs to be abstracted > out of minhop so up/dn is independent in its routing "attempt" in the > chain. Agreed. But actually the usage of this check is odd IMHO. I think it is fine to just remove it and emit an unconditional debug warning instead. > minhop routing assumed to never fail - Currently minhop routing cannot > "fail". So if someone wanted to put minhop into the middle of a > routing chain, it makes no sense. I assume this was based on legacy, > when the minhop algorithm did not have options like > "guid_routing_order_file" that could be parsed incorrectly. Sure it is legacy, but not only for this reason. In the old days the OpenSM state machine was designed so that "managers" (such as ucast_mgr, lid_mgr, etc.) needed to return a value (osm_signal_t) that indicates the sending of MADs rather than the actual execution status. Fortunately that is no longer the case, so we can rework all managers (including ucast_mgr) to return their real "status". 
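To illustrate the return-status convention suggested above (negative on failure, zero on success, positive when the method asks for fallback), here is a rough sketch. All names in it (route_method_t, run_method and the toy methods) are made up for the example and are not existing OpenSM symbols. It also assumes that a missing (NULL) method should be treated like a fallback request, which is what keeps the methods optional.

```c
#include <stddef.h>

/* Hypothetical method type: <0 failure, 0 success, >0 fallback requested. */
typedef int (*route_method_t) (void *context);

/*
 * Run an optional routing engine method. A NULL method or a positive
 * return value means "use the generic fallback instead". A fallback
 * that itself returns non-zero is treated as a failure.
 */
static int run_method(route_method_t method, void *context,
		      route_method_t fallback, void *fallback_context)
{
	int ret = method ? method(context) : 1;

	if (ret > 0)
		ret = fallback(fallback_context);
	return ret == 0 ? 0 : -1;
}

/* Toy methods, for illustration only. */
static int generic_calls;

static int generic_method(void *context)
{
	(void)context;
	generic_calls++;
	return 0;
}

static int method_ok(void *context)
{
	(void)context;
	return 0;
}

static int method_fails(void *context)
{
	(void)context;
	return -1;
}

static int method_wants_fallback(void *context)
{
	(void)context;
	return 1;
}
```

With something like this, an engine such as "file" could return a positive value when no file name is configured, while a real error (for example a file that cannot be opened) stays negative and lets the next engine in the chain be tried.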
> So, lots of rearchitecture were done and lots of cleanup was done as > well. Some bug fixes along the way too. In the end this patch series leaves us with: 21 files changed, 1538 insertions(+), 698 deletions(-), which I think is pretty big for routing chaining :(. Assuming that we want this important feature to be included in OFED-1.4 and that the OFED release cycle is already in the "RC" phase, I reworked this as a single, smaller patch based on your original patch series (so obviously authorship is preserved). It includes some of the thoughts mentioned above and also works with ibsim (although I am still testing it). I will post it to the list shortly. Let me know how it looks. Ideally it would be nice to have it integrated before the next RC release (6 Oct). Sasha From sashak at voltaire.com Sun Sep 28 13:42:44 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 28 Sep 2008 23:42:44 +0300 Subject: [ofa-general] [PATCH] opensm: routing chaining In-Reply-To: <20080928202648.GG25831@sashak.voltaire.com> References: <1221506448.6274.32.camel@cardanus.llnl.gov> <20080928202648.GG25831@sashak.voltaire.com> Message-ID: <20080928204244.GH25831@sashak.voltaire.com> From: Albert Chu Routing chaining is the ability to configure the order in which routing algorithms are applied in opensm, e.g. -R ftree,updn,minhop Try using ftree routing. If ftree fails, try updn. If updn fails, try minhop. In order to get this done, some rearchitecture of the routing code had to be done b/c there is no longer an assumption that only one routing engine can be specified. Always set up a routing engine, assume no default "fallthrough" minhop routing engine. On configured routing engine failure, do minhop as a last resort. Stick a *next pointer into struct osm_routing_engine. Rearchitect routing engine usage as a list instead of a single struct. 
Signed-off-by: Sasha Khapyorsky --- opensm/include/opensm/osm_opensm.h | 10 ++- opensm/include/opensm/osm_subnet.h | 7 +- opensm/include/opensm/osm_ucast_mgr.h | 2 +- opensm/man/opensm.8.in | 8 ++- opensm/opensm/main.c | 10 ++- opensm/opensm/osm_opensm.c | 121 +++++++++++++++++++++++---------- opensm/opensm/osm_subnet.c | 11 ++- opensm/opensm/osm_ucast_file.c | 19 ++--- opensm/opensm/osm_ucast_ftree.c | 35 ++++------ opensm/opensm/osm_ucast_lash.c | 16 ++-- opensm/opensm/osm_ucast_mgr.c | 119 +++++++++++++++++++++----------- opensm/opensm/osm_ucast_updn.c | 10 ++-- 12 files changed, 226 insertions(+), 142 deletions(-) diff --git a/opensm/include/opensm/osm_opensm.h b/opensm/include/opensm/osm_opensm.h index 5d45724..c121be4 100644 --- a/opensm/include/opensm/osm_opensm.h +++ b/opensm/include/opensm/osm_opensm.h @@ -126,6 +126,7 @@ struct osm_routing_engine { int (*ucast_build_fwd_tables) (void *context); void (*ucast_dump_tables) (void *context); void (*delete) (void *context); + struct osm_routing_engine *next; }; /* * FIELDS @@ -148,6 +149,9 @@ struct osm_routing_engine { * delete * The delete method, may be used for routing engine * internals cleanup. +* +* next +* Pointer to next routing engine in the list. */ /****s* OpenSM: OpenSM/osm_opensm_t @@ -178,7 +182,7 @@ typedef struct osm_opensm { osm_log_t log; cl_dispatcher_t disp; cl_plock_t lock; - struct osm_routing_engine routing_engine; + struct osm_routing_engine *routing_engine_list; osm_routing_engine_type_t routing_engine_used; osm_stats_t stats; osm_console_t console; @@ -221,8 +225,8 @@ typedef struct osm_opensm { * lock * Shared lock guarding most OpenSM structures. * -* routing_engine -* Routing engine; will be initialized then used. +* routing_engine_list +* List of routing engines that should be tried for use. * * routing_engine_used * Indicates which routing engine was used to route a subnet. 
diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h index f90f7ea..0c7f3b9 100644 --- a/opensm/include/opensm/osm_subnet.h +++ b/opensm/include/opensm/osm_subnet.h @@ -182,7 +182,7 @@ typedef struct osm_subn_opt { char *port_prof_ignore_file; boolean_t port_profile_switch_nodes; boolean_t sweep_on_trap; - char *routing_engine_name; + char *routing_engine_names; boolean_t connect_roots; char *lid_matrix_dump_file; char *lfts_file; @@ -353,9 +353,8 @@ typedef struct osm_subn_opt { * sweep_on_trap * Received traps will initiate a new sweep. * -* routing_engine_name -* Name of used routing engine -* (other than default Min Hop Algorithm) +* routing_engine_names +* Name of routing engine(s) to use. * * connect_roots * The option which will enforce root to root connectivity with diff --git a/opensm/include/opensm/osm_ucast_mgr.h b/opensm/include/opensm/osm_ucast_mgr.h index 1dc9a37..59ba9fa 100644 --- a/opensm/include/opensm/osm_ucast_mgr.h +++ b/opensm/include/opensm/osm_ucast_mgr.h @@ -264,7 +264,7 @@ osm_ucast_mgr_set_fwd_table(IN osm_ucast_mgr_t * const p_mgr, * * SYNOPSIS */ -void osm_ucast_mgr_build_lid_matrices(IN osm_ucast_mgr_t * const p_mgr); +int osm_ucast_mgr_build_lid_matrices(IN osm_ucast_mgr_t * const p_mgr); /* * PARAMETERS * p_mgr diff --git a/opensm/man/opensm.8.in b/opensm/man/opensm.8.in index 565c5f8..6790d11 100644 --- a/opensm/man/opensm.8.in +++ b/opensm/man/opensm.8.in @@ -9,7 +9,7 @@ opensm \- InfiniBand subnet manager and administration (SM/SA) [\-F | \-\-config ] [\-c(reate-config) ] [\-g(uid) ] [\-l(mc) ] [\-p(riority) ] [\-smkey ] [\-r(eassign_lids)] -[\-R | \-\-routing_engine ] +[\-R | \-\-routing_engine ] [\-z | \-\-connect_roots] [\-M | \-\-lid_matrix_file ] [\-U | \-\-lfts_file ] @@ -116,8 +116,10 @@ Without -r, OpenSM attempts to preserve existing LID assignments resolving multiple use of same LID. 
.TP \fB\-R\fR, \fB\-\-routing_engine\fR -This option chooses routing engine instead of Min Hop -algorithm (default). +This option chooses routing engine(s) to use instead of Min Hop +algorithm (default). Multiple routing engines can be specified +separated by commas so that specific ordering of routing algorithms +will be tried if earlier routing engines fail. Supported engines: minhop, updn, file, ftree, lash, dor .TP \fB\-z\fR, \fB\-\-connect_roots\fR diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c index 01bfddf..2f53157 100644 --- a/opensm/opensm/main.c +++ b/opensm/opensm/main.c @@ -177,8 +177,10 @@ static void show_usage(void) " LID assignments resolving multiple use of same LID.\n\n"); printf("-R\n" "--routing_engine \n" - " This option chooses routing engine instead of Min Hop\n" - " algorithm (default).\n" + " This option chooses routing engine(s) to use instead of default\n" + " Min Hop algorithm. Multiple routing engines can be specified\n" + " separated by commas so that specific ordering of routing\n" + " algorithms will be tried if earlier routing engines fail.\n" " Supported engines: updn, file, ftree, lash, dor\n\n"); printf("-z\n" "--connect_roots\n" @@ -851,8 +853,8 @@ int main(int argc, char *argv[]) break; case 'R': - opt.routing_engine_name = optarg; - printf(" Activate \'%s\' routing engine\n", optarg); + opt.routing_engine_names = optarg; + printf(" Activate \'%s\' routing engine(s)\n", optarg); break; case 'z': diff --git a/opensm/opensm/osm_opensm.c b/opensm/opensm/osm_opensm.c index d17fed3..4970d0c 100644 --- a/opensm/opensm/osm_opensm.c +++ b/opensm/opensm/osm_opensm.c @@ -61,24 +61,23 @@ struct routing_engine_module { const char *name; - int (*setup) (osm_opensm_t * p_osm); + int (*setup) (struct osm_routing_engine *, osm_opensm_t *); }; -extern int osm_ucast_updn_setup(osm_opensm_t * p_osm); -extern int osm_ucast_file_setup(osm_opensm_t * p_osm); -extern int osm_ucast_ftree_setup(osm_opensm_t * p_osm); -extern int 
osm_ucast_lash_setup(osm_opensm_t * p_osm); - -static int osm_ucast_null_setup(osm_opensm_t * p_osm); +extern int osm_ucast_minhop_setup(struct osm_routing_engine *, osm_opensm_t *); +extern int osm_ucast_updn_setup(struct osm_routing_engine *, osm_opensm_t *); +extern int osm_ucast_file_setup(struct osm_routing_engine *, osm_opensm_t *); +extern int osm_ucast_ftree_setup(struct osm_routing_engine *, osm_opensm_t *); +extern int osm_ucast_lash_setup(struct osm_routing_engine *, osm_opensm_t *); +extern int osm_ucast_dor_setup(struct osm_routing_engine *, osm_opensm_t *); const static struct routing_engine_module routing_modules[] = { - {"null", osm_ucast_null_setup}, - {"minhop", osm_ucast_null_setup}, + {"minhop", osm_ucast_minhop_setup}, {"updn", osm_ucast_updn_setup}, {"file", osm_ucast_file_setup}, {"ftree", osm_ucast_ftree_setup}, {"lash", osm_ucast_lash_setup}, - {"dor", osm_ucast_null_setup}, + {"dor", osm_ucast_dor_setup}, {NULL, NULL} }; @@ -135,33 +134,77 @@ osm_routing_engine_type_t osm_routing_engine_type(IN const char *str) /********************************************************************** **********************************************************************/ -static int setup_routing_engine(osm_opensm_t * p_osm, const char *name) +static void append_routing_engine(osm_opensm_t *osm, + struct osm_routing_engine *routing_engine) { - const struct routing_engine_module *r; + struct osm_routing_engine *r; + + routing_engine->next = NULL; + + if (!osm->routing_engine_list) { + osm->routing_engine_list = routing_engine; + return; + } + + r = osm->routing_engine_list; + while (r->next) + r = r->next; - for (r = routing_modules; r->name && *r->name; r++) { - if (!strcmp(r->name, name)) { - p_osm->routing_engine.name = r->name; - if (r->setup(p_osm)) { - OSM_LOG(&p_osm->log, OSM_LOG_VERBOSE, + r->next = routing_engine; +} + +static void setup_routing_engine(osm_opensm_t *osm, const char *name) +{ + struct osm_routing_engine *re; + const struct 
routing_engine_module *m; + + for (m = routing_modules; m->name && *m->name; m++) { + if (!strcmp(m->name, name)) { + re = malloc(sizeof(struct osm_routing_engine)); + if (!re) { + OSM_LOG(&osm->log, OSM_LOG_VERBOSE, + "memory allocation failed\n"); + return; + } + memset(re, 0, sizeof(struct osm_routing_engine)); + + re->name = m->name; + if (m->setup(re, osm)) { + OSM_LOG(&osm->log, OSM_LOG_VERBOSE, "setup of routing" " engine \'%s\' failed\n", name); - return -2; + return; } - OSM_LOG(&p_osm->log, OSM_LOG_DEBUG, - "\'%s\' routing engine set up\n", - p_osm->routing_engine.name); - return 0; + OSM_LOG(&osm->log, OSM_LOG_DEBUG, + "\'%s\' routing engine set up\n", re->name); + append_routing_engine(osm, re); + return; } } - return -1; + + OSM_LOG(&osm->log, OSM_LOG_ERROR, + "cannot find or setup routing engine \'%s\'", name); } -static int osm_ucast_null_setup(osm_opensm_t * p_osm) +static void setup_routing_engines(osm_opensm_t *osm, const char *engine_names) { - OSM_LOG(&p_osm->log, OSM_LOG_VERBOSE, - "nothing yet - using default (minhop) routing engine\n"); - return 0; + char *name, *str, *p; + + if (!engine_names || !*engine_names) { + setup_routing_engine(osm, "minhop"); + return; + } + + str = strdup(engine_names); + name = strtok_r(str, ", \t\n", &p); + while (name && *name) { + setup_routing_engine(osm, name); + name = strtok_r(NULL, ", \t\n", &p); + } + free(str); + + if (!osm->routing_engine_list) + setup_routing_engine(osm, "minhop"); } /********************************************************************** @@ -181,6 +224,20 @@ void osm_opensm_construct(IN osm_opensm_t * const p_osm) /********************************************************************** **********************************************************************/ +static void destroy_routing_engines(osm_opensm_t *osm) +{ + struct osm_routing_engine *r, *next; + + next = osm->routing_engine_list; + while (next) { + r = next; + next = r->next; + if (r->delete) + r->delete(r->context); + free(r); 
+ } +} + void osm_opensm_destroy(IN osm_opensm_t * const p_osm) { /* in case of shutdown through exit proc - no ^C */ @@ -218,8 +275,7 @@ void osm_opensm_destroy(IN osm_opensm_t * const p_osm) osm_sa_db_file_dump(p_osm); /* do the destruction in reverse order as init */ - if (p_osm->routing_engine.delete) - p_osm->routing_engine.delete(p_osm->routing_engine.context); + destroy_routing_engines(p_osm); osm_sa_destroy(&p_osm->sa); osm_sm_destroy(&p_osm->sm); #ifdef ENABLE_OSM_PERF_MGR @@ -371,12 +427,7 @@ osm_opensm_init(IN osm_opensm_t * const p_osm, goto Exit; #endif /* ENABLE_OSM_PERF_MGR */ - if (p_opt->routing_engine_name && - setup_routing_engine(p_osm, p_opt->routing_engine_name)) - OSM_LOG(&p_osm->log, OSM_LOG_VERBOSE, - "cannot find or setup routing engine" - " \'%s\'. Default will be used instead\n", - p_opt->routing_engine_name); + setup_routing_engines(p_osm, p_opt->routing_engine_names); p_osm->routing_engine_used = OSM_ROUTING_ENGINE_TYPE_NONE; diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index 278aa3d..a39ce75 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -442,7 +442,7 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * const p_opt) p_opt->port_prof_ignore_file = NULL; p_opt->port_profile_switch_nodes = FALSE; p_opt->sweep_on_trap = TRUE; - p_opt->routing_engine_name = NULL; + p_opt->routing_engine_names = NULL; p_opt->connect_roots = FALSE; p_opt->lid_matrix_dump_file = NULL; p_opt->lfts_file = NULL; @@ -1264,7 +1264,7 @@ int osm_subn_parse_conf_file(char *file_name, osm_subn_opt_t * const p_opts) p_key, p_val, &p_opts->sweep_on_trap); opts_unpack_charp("routing_engine", - p_key, p_val, &p_opts->routing_engine_name); + p_key, p_val, &p_opts->routing_engine_names); opts_unpack_boolean("connect_roots", p_key, p_val, &p_opts->connect_roots); @@ -1521,9 +1521,12 @@ int osm_subn_write_conf_file(char *file_name, IN osm_subn_opt_t *const p_opts) fprintf(opts_file, "# Routing engine\n" + "# Multiple 
routing engines can be specified separated by\n" + "# commas so that specific ordering of routing algorithms will\n" + "# be tried if earlier routing engines fail.\n" "# Supported engines: minhop, updn, file, ftree, lash, dor\n" - "routing_engine %s\n\n", p_opts->routing_engine_name ? - p_opts->routing_engine_name : null_str); + "routing_engine %s\n\n", p_opts->routing_engine_names ? + p_opts->routing_engine_names : null_str); fprintf(opts_file, "# Connect roots (use FALSE if unsure)\n" diff --git a/opensm/opensm/osm_ucast_file.c b/opensm/opensm/osm_ucast_file.c index 3d00cb2..cbd65c1 100644 --- a/opensm/opensm/osm_ucast_file.c +++ b/opensm/opensm/osm_ucast_file.c @@ -135,14 +135,13 @@ static int do_ucast_file_load(void *context) OSM_LOG(&p_osm->log, OSM_LOG_VERBOSE, "LFTs file name is not given; " "using default routing algorithm\n"); - return -1; + return 1; } file = fopen(file_name, "r"); if (!file) { OSM_LOG(&p_osm->log, OSM_LOG_ERROR | OSM_LOG_SYS, "ERR 6302: " - "cannot open ucast dump file \'%s\'; " - "using default routing algorithm\n", file_name); + "cannot open ucast dump file \'%s\': %m\n", file_name); return -1; } @@ -270,15 +269,13 @@ static int do_lid_matrix_file_load(void *context) OSM_LOG(&p_osm->log, OSM_LOG_VERBOSE, "lid matrix file name is not given; " "using default lid matrix generation algorithm\n"); - return -1; + return 1; } file = fopen(file_name, "r"); if (!file) { OSM_LOG(&p_osm->log, OSM_LOG_ERROR | OSM_LOG_SYS, "ERR 6305: " - "cannot open lid matrix file \'%s\'; " - "using default lid matrix generation algorithm\n", - file_name); + "cannot open lid matrix file \'%s\': %m\n", file_name); return -1; } @@ -389,10 +386,10 @@ static int do_lid_matrix_file_load(void *context) return 0; } -int osm_ucast_file_setup(osm_opensm_t * p_osm) +int osm_ucast_file_setup(struct osm_routing_engine *r, osm_opensm_t *osm) { - p_osm->routing_engine.context = (void *)p_osm; - p_osm->routing_engine.build_lid_matrices = do_lid_matrix_file_load; - 
p_osm->routing_engine.ucast_build_fwd_tables = do_ucast_file_load; + r->context = osm; + r->build_lid_matrices = do_lid_matrix_file_load; + r->ucast_build_fwd_tables = do_ucast_file_load; return 0; } diff --git a/opensm/opensm/osm_ucast_ftree.c b/opensm/opensm/osm_ucast_ftree.c index 1d3233c..15168b7 100644 --- a/opensm/opensm/osm_ucast_ftree.c +++ b/opensm/opensm/osm_ucast_ftree.c @@ -3552,8 +3552,7 @@ static int __osm_ftree_construct_fabric(IN void *context) OSM_LOG(&p_ftree->p_osm->log, OSM_LOG_VERBOSE, "Ranking FatTree\n"); if (__osm_ftree_fabric_rank(p_ftree) != 0) { osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, - "Failed ranking the tree - " - "fat-tree routing falls back to default routing\n"); + "Failed ranking the tree\n"); status = -1; goto Exit; } @@ -3567,14 +3566,12 @@ static int __osm_ftree_construct_fabric(IN void *context) "Populating CA & switch ports\n"); if (__osm_ftree_fabric_populate_ports(p_ftree) != 0) { osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, - "Fabric topology is not a fat-tree - " - "routing falls back to default routing\n"); + "Fabric topology is not a fat-tree\n"); status = -1; goto Exit; } else if (p_ftree->cn_num == 0) { osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, - "Fabric has no valid compute nodes - " - "routing falls back to default routing\n"); + "Fabric has no valid compute nodes\n"); status = -1; goto Exit; } @@ -3586,8 +3583,7 @@ static int __osm_ftree_construct_fabric(IN void *context) if (__osm_ftree_fabric_get_rank(p_ftree) > FAT_TREE_MAX_RANK || __osm_ftree_fabric_get_rank(p_ftree) < FAT_TREE_MIN_RANK) { osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, - "Fabric rank is %u (should be between %u and %u) - " - "fat-tree routing falls back to default routing\n", + "Fabric rank is %u (should be between %u and %u)\n", __osm_ftree_fabric_get_rank(p_ftree), FAT_TREE_MIN_RANK, FAT_TREE_MAX_RANK); status = -1; @@ -3600,8 +3596,7 @@ static int __osm_ftree_construct_fabric(IN void *context) validation - it checks that all the CNs are at the 
same rank. */ if (__osm_ftree_fabric_mark_leaf_switches(p_ftree)) { osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, - "Fabric topology is not a fat-tree - " - "routing falls back to default routing\n"); + "Fabric topology is not a fat-tree\n"); status = -1; goto Exit; } @@ -3619,8 +3614,7 @@ static int __osm_ftree_construct_fabric(IN void *context) In any case, the first and the last switches in the array are REAL leafs. */ if (__osm_ftree_fabric_create_leaf_switch_array(p_ftree)) { osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, - "Fabric topology is not a fat-tree - " - "routing falls back to default routing\n"); + "Fabric topology is not a fat-tree\n"); status = -1; goto Exit; } @@ -3640,8 +3634,7 @@ static int __osm_ftree_construct_fabric(IN void *context) if (!__osm_ftree_fabric_roots_provided(p_ftree) && !__osm_ftree_fabric_validate_topology(p_ftree)) { osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, - "Fabric topology is not a fat-tree - " - "routing falls back to default routing\n"); + "Fabric topology is not a fat-tree\n"); status = -1; goto Exit; } @@ -3726,7 +3719,7 @@ static void __osm_ftree_delete(IN void *context) /*************************************************** ***************************************************/ -int osm_ucast_ftree_setup(osm_opensm_t * p_osm) +int osm_ucast_ftree_setup(struct osm_routing_engine *r, osm_opensm_t * p_osm) { ftree_fabric_t *p_ftree = __osm_ftree_fabric_create(); if (!p_ftree) @@ -3734,12 +3727,10 @@ int osm_ucast_ftree_setup(osm_opensm_t * p_osm) p_ftree->p_osm = p_osm; - p_osm->routing_engine.context = (void *)p_ftree; - p_osm->routing_engine.build_lid_matrices = __osm_ftree_construct_fabric; - p_osm->routing_engine.ucast_build_fwd_tables = __osm_ftree_do_routing; - p_osm->routing_engine.delete = __osm_ftree_delete; + r->context = (void *)p_ftree; + r->build_lid_matrices = __osm_ftree_construct_fabric; + r->ucast_build_fwd_tables = __osm_ftree_do_routing; + r->delete = __osm_ftree_delete; + return 0; } - 
-/*************************************************** - ***************************************************/ diff --git a/opensm/opensm/osm_ucast_lash.c b/opensm/opensm/osm_ucast_lash.c index b985e9a..ce3982f 100644 --- a/opensm/opensm/osm_ucast_lash.c +++ b/opensm/opensm/osm_ucast_lash.c @@ -785,7 +785,7 @@ static int init_lash_structures(lash_t * p_lash) unsigned vl_min = p_lash->vl_min; unsigned num_switches = p_lash->num_switches; osm_log_t *p_log = &p_lash->p_osm->log; - int status = IB_SUCCESS; + int status = 0; unsigned int i, j, k; OSM_LOG_ENTER(p_log); @@ -852,7 +852,7 @@ static int init_lash_structures(lash_t * p_lash) goto Exit; Exit_Mem_Error: - status = IB_ERROR; + status = -1; OSM_LOG(p_log, OSM_LOG_ERROR, "ERR 4D01: " "Could not allocate required memory for LASH errno %d, errno %d for lack of memory\n", errno, ENOMEM); @@ -875,7 +875,7 @@ static int lash_core(lash_t * p_lash) int stop = 0, output_link, i_next_switch; int output_link2, i_next_switch2; int cycle_found2 = 0; - int status = IB_SUCCESS; + int status = 0; int *switch_bitmap = NULL; /* Bitmap to check if we have processed this pair */ OSM_LOG_ENTER(p_log); @@ -1028,7 +1028,7 @@ static int lash_core(lash_t * p_lash) goto Exit; Error_Not_Enough_Lanes: - status = IB_ERROR; + status = -1; OSM_LOG(p_log, OSM_LOG_ERROR, "ERR 4D02: " "Lane requirements (%d) exceed available lanes (%d)\n", p_lash->vl_min, lanes_needed); @@ -1360,15 +1360,15 @@ uint8_t osm_get_lash_sl(osm_opensm_t * p_osm, osm_port_t * p_src_port, return (uint8_t) ((switch_t *) p_sw->priv)->routing_table[dst_id].lane; } -int osm_ucast_lash_setup(osm_opensm_t * p_osm) +int osm_ucast_lash_setup(struct osm_routing_engine *r, osm_opensm_t *p_osm) { lash_t *p_lash = lash_create(p_osm); if (!p_lash) return -1; - p_osm->routing_engine.context = p_lash; - p_osm->routing_engine.ucast_build_fwd_tables = lash_process; - p_osm->routing_engine.delete = lash_delete; + r->context = p_lash; + r->ucast_build_fwd_tables = lash_process; + r->delete = 
lash_delete; return 0; } diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c index 9d0ad13..935846c 100644 --- a/opensm/opensm/osm_ucast_mgr.c +++ b/opensm/opensm/osm_ucast_mgr.c @@ -216,7 +216,6 @@ __osm_ucast_mgr_process_port(IN osm_ucast_mgr_t * const p_mgr, uint8_t port; boolean_t is_ignored_by_port_prof; ib_net64_t node_guid; - struct osm_routing_engine *p_routing_eng; unsigned start_from = 1; OSM_LOG_ENTER(p_mgr->p_log); @@ -253,8 +252,6 @@ __osm_ucast_mgr_process_port(IN osm_ucast_mgr_t * const p_mgr, node_guid = osm_node_get_node_guid(p_sw->p_node); - p_routing_eng = &p_mgr->p_subn->p_osm->routing_engine; - /* The lid matrix contains the number of hops to each lid from each port. From this information we determine @@ -269,18 +266,9 @@ __osm_ucast_mgr_process_port(IN osm_ucast_mgr_t * const p_mgr, /* do not try to overwrite the ppro of non existing port ... */ is_ignored_by_port_prof = TRUE; - /* Up/Down routing can cause unreachable routes between some - switches so we do not report that as an error in that case */ - if (!p_routing_eng->build_lid_matrices) { - OSM_LOG(p_mgr->p_log, OSM_LOG_ERROR, "ERR 3A08: " - "No path to get to LID %u from switch 0x%" - PRIx64 "\n", lid_ho, cl_ntoh64(node_guid)); - /* trigger a new sweep - try again ... 
*/ - p_mgr->p_subn->subnet_initialization_error = TRUE; - } else - OSM_LOG(p_mgr->p_log, OSM_LOG_DEBUG, - "No path to get to LID %u from switch 0x%" - PRIx64 "\n", lid_ho, cl_ntoh64(node_guid)); + OSM_LOG(p_mgr->p_log, OSM_LOG_DEBUG, + "No path to get to LID %u from switch 0x%" PRIx64 "\n", + lid_ho, cl_ntoh64(node_guid)); } else { osm_physp_t *p = osm_node_get_physp_ptr(p_sw->p_node, port); @@ -583,7 +571,7 @@ __osm_ucast_mgr_process_neighbors(IN cl_map_item_t * const p_map_item, /********************************************************************** **********************************************************************/ -void osm_ucast_mgr_build_lid_matrices(IN osm_ucast_mgr_t * const p_mgr) +int osm_ucast_mgr_build_lid_matrices(IN osm_ucast_mgr_t * const p_mgr) { uint32_t i; uint32_t iteration_max; @@ -646,6 +634,8 @@ void osm_ucast_mgr_build_lid_matrices(IN osm_ucast_mgr_t * const p_mgr) OSM_LOG(p_mgr->p_log, OSM_LOG_DEBUG, "Min-hop propagated in %d steps\n", i); } + + return 0; } /********************************************************************** @@ -752,7 +742,7 @@ static void clear_prof_ignore_flag(cl_map_item_t * const p_map_item, void *ctx) } } -static void ucast_mgr_build_lfts(osm_ucast_mgr_t *p_mgr) +static int ucast_mgr_build_lfts(osm_ucast_mgr_t *p_mgr) { cl_qlist_init(&p_mgr->port_order_list); @@ -786,27 +776,56 @@ static void ucast_mgr_build_lfts(osm_ucast_mgr_t *p_mgr) __osm_ucast_mgr_process_tbl, p_mgr); cl_qlist_remove_all(&p_mgr->port_order_list); + + return 0; } /********************************************************************** **********************************************************************/ +static int ucast_mgr_route(struct osm_routing_engine *r, osm_opensm_t *osm) +{ + int ret; + + OSM_LOG(&osm->log, OSM_LOG_VERBOSE, + "building routing with \'%s\' routing algorithm...\n", r->name); + + if (!r->build_lid_matrices || + (ret = r->build_lid_matrices(r->context)) > 0) + ret = osm_ucast_mgr_build_lid_matrices(&osm->sm.ucast_mgr); + 
+ if (ret < 0) { + OSM_LOG(&osm->log, OSM_LOG_ERROR, + "%s: cannot build lid matrices.\n", r->name); + return ret; + } + + if (!r->ucast_build_fwd_tables || + (ret = r->ucast_build_fwd_tables(r->context)) > 0) + ret = ucast_mgr_build_lfts(&osm->sm.ucast_mgr); + + if (ret < 0) { + OSM_LOG(&osm->log, OSM_LOG_ERROR, + "%s: cannot build fwd tables.\n", r->name); + return ret; + } + + osm->routing_engine_used = osm_routing_engine_type(r->name); + + return 0; +} + osm_signal_t osm_ucast_mgr_process(IN osm_ucast_mgr_t * const p_mgr) { osm_opensm_t *p_osm; struct osm_routing_engine *p_routing_eng; osm_signal_t signal = OSM_SIGNAL_DONE; cl_qmap_t *p_sw_guid_tbl; - int blm = 0; - int ubft = 0; OSM_LOG_ENTER(p_mgr->p_log); p_sw_guid_tbl = &p_mgr->p_subn->sw_guid_tbl; p_osm = p_mgr->p_subn->p_osm; - p_routing_eng = &p_osm->routing_engine; - - p_mgr->is_dor = p_routing_eng->name - && (strcmp(p_routing_eng->name, "dor") == 0); + p_routing_eng = p_osm->routing_engine_list; CL_PLOCK_EXCL_ACQUIRE(p_mgr->p_lock); @@ -819,28 +838,19 @@ osm_signal_t osm_ucast_mgr_process(IN osm_ucast_mgr_t * const p_mgr) p_mgr->any_change = FALSE; - if (!p_routing_eng->build_lid_matrices || - (blm = p_routing_eng->build_lid_matrices(p_routing_eng->context))) - osm_ucast_mgr_build_lid_matrices(p_mgr); + p_osm->routing_engine_used = OSM_ROUTING_ENGINE_TYPE_NONE; + while (p_routing_eng) { + if (!ucast_mgr_route(p_routing_eng, p_osm)) + break; + p_routing_eng = p_routing_eng->next; + } - /* - Now that the lid matrices have been built, we can - build and download the switch forwarding tables. 
- */ - if (!p_routing_eng->ucast_build_fwd_tables || - (ubft = - p_routing_eng->ucast_build_fwd_tables(p_routing_eng->context))) + if (p_osm->routing_engine_used == OSM_ROUTING_ENGINE_TYPE_NONE) { + /* If configured routing algorithm failed, use default MinHop */ + osm_ucast_mgr_build_lid_matrices(p_mgr); ucast_mgr_build_lfts(p_mgr); - - /* 'file' routing engine has one unique logic corner case */ - if (p_routing_eng->name && (strcmp(p_routing_eng->name, "file") == 0) - && (!blm || !ubft)) - p_osm->routing_engine_used = OSM_ROUTING_ENGINE_TYPE_FILE; - else if (!blm && !ubft) - p_osm->routing_engine_used = - osm_routing_engine_type(p_routing_eng->name); - else p_osm->routing_engine_used = OSM_ROUTING_ENGINE_TYPE_MINHOP; + } OSM_LOG(p_mgr->p_log, OSM_LOG_INFO, "%s tables configured on all switches\n", @@ -861,3 +871,28 @@ Exit: OSM_LOG_EXIT(p_mgr->p_log); return (signal); } + +static int ucast_build_lid_matrices(void *context) +{ + return osm_ucast_mgr_build_lid_matrices(context); +} + +static int ucast_build_lfts(void *context) +{ + return ucast_mgr_build_lfts(context); +} + +int osm_ucast_minhop_setup(struct osm_routing_engine *r, osm_opensm_t *osm) +{ + r->context = &osm->sm.ucast_mgr; + r->build_lid_matrices = ucast_build_lid_matrices; + r->ucast_build_fwd_tables = ucast_build_lfts; + return 0; +} + +int osm_ucast_dor_setup(struct osm_routing_engine *r, osm_opensm_t *osm) +{ + osm_ucast_minhop_setup(r, osm); + osm->sm.ucast_mgr.is_dor = 1; + return 0; +} diff --git a/opensm/opensm/osm_ucast_updn.c b/opensm/opensm/osm_ucast_updn.c index 90e9af8..4fdcc78 100644 --- a/opensm/opensm/osm_ucast_updn.c +++ b/opensm/opensm/osm_ucast_updn.c @@ -643,7 +643,7 @@ static int __osm_updn_call(void *ctx) } else { OSM_LOG(&p_updn->p_osm->log, OSM_LOG_INFO, "disabling UPDN algorithm, no root nodes were found\n"); - ret = 1; + ret = -1; } if (osm_log_is_active(&p_updn->p_osm->log, OSM_LOG_ROUTING)) @@ -669,7 +669,7 @@ static void __osm_updn_delete(void *context) free(context); } 
-int osm_ucast_updn_setup(osm_opensm_t * p_osm) +int osm_ucast_updn_setup(struct osm_routing_engine *r, osm_opensm_t *p_osm) { updn_t *p_updn; @@ -680,9 +680,9 @@ int osm_ucast_updn_setup(osm_opensm_t * p_osm) p_updn->p_osm = p_osm; - p_osm->routing_engine.context = p_updn; - p_osm->routing_engine.delete = __osm_updn_delete; - p_osm->routing_engine.build_lid_matrices = __osm_updn_call; + r->context = p_updn; + r->delete = __osm_updn_delete; + r->build_lid_matrices = __osm_updn_call; return 0; } -- 1.6.0.2.287.g3791f From karun.sharma at qlogic.com Sun Sep 28 23:24:45 2008 From: karun.sharma at qlogic.com (Karun Sharma (Contractor - GS Labs)) Date: Mon, 29 Sep 2008 01:24:45 -0500 Subject: [ofa-general] Centos - Failure of QLogic Infinipath Driver, How do I stop it loading? References: Message-ID: In /etc/infiniband/openib.conf, replace: IPATH_LOAD=yes with IPATH_LOAD=no Thanks Karun -------------------------------------------------------------- Think before you print...... -------------------------------------------------------------- ________________________________ From: general-bounces at lists.openfabrics.org on behalf of Robert Dunkley Sent: Fri 9/26/2008 3:43 PM To: general at lists.openfabrics.org Subject: [ofa-general] Centos - Failure of QLogic Infinipath Driver,How do I stop it loading? Hi, I made a small mistake in my OFED compile and the QLogic driver is broken, I use Mellanox so don't need it. How do I stop it loading on bootup? Centos says the following on bootup: Loading QLogic Infinipath Driver: Failed Thanks in advance, Rob The SAQ Group Registered Office: 18 Chapel Street, Petersfield, Hampshire GU32 3DZ SEMTEC Limited Trading as SAQ is Registered in England & Wales Company Number: 06481952 http://www.saqnet.co.uk AS29219 SAQ Group Delivers high quality, honestly priced communication and I.T. services to UK Business. DSL : Domains : Email : Hosting : CoLo : Servers : Racks : Transit : Backups : Managed Networks : Remote Support. 
Find us in http://www.thebestof.co.uk/petersfield -------------- next part -------------- An HTML attachment was scrubbed... URL: From keshetti.mahesh at gmail.com Mon Sep 29 01:39:48 2008 From: keshetti.mahesh at gmail.com (Keshetti Mahesh) Date: Mon, 29 Sep 2008 14:09:48 +0530 Subject: [ofa-general] ***SPAM*** ibdm network topology format Message-ID: <829ded920809290139vf2cc151w4cc8a6fafb49edfe@mail.gmail.com> Hello, Is there tool available to convert the Infiniband network topology from the 'ibnetdiscover' format to the format understood by 'ibdm' ? Thanks in advance, Mahesh From vlad at lists.openfabrics.org Mon Sep 29 03:13:13 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Mon, 29 Sep 2008 03:13:13 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080929-0200 daily build status Message-ID: <20080929101313.51573E60D60@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with 
linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.19 Failed: Build failed on x86_64 with linux-2.6.21.1 Log: /home/vlad/tmp/ofa_1_4_kernel-20080929-0200_linux-2.6.21.1_x86_64_check/include/rdma/ib_verbs.h:1833: error: 'struct scatterlist' has no member named 'dma_address' /home/vlad/tmp/ofa_1_4_kernel-20080929-0200_linux-2.6.21.1_x86_64_check/include/rdma/ib_verbs.h: In function 'ib_sg_dma_len': /home/vlad/tmp/ofa_1_4_kernel-20080929-0200_linux-2.6.21.1_x86_64_check/include/rdma/ib_verbs.h:1846: error: 'struct scatterlist' has no member named 'dma_length' make[4]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080929-0200_linux-2.6.21.1_x86_64_check/drivers/infiniband/hw/ipath/ipath_dma.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080929-0200_linux-2.6.21.1_x86_64_check/drivers/infiniband/hw/ipath] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080929-0200_linux-2.6.21.1_x86_64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_4_kernel-20080929-0200_linux-2.6.21.1_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.21.1' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ppc64 with linux-2.6.24 Log: 
/home/vlad/tmp/ofa_1_4_kernel-20080929-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.c: In function 'ehca_poll_eqs': /home/vlad/tmp/ofa_1_4_kernel-20080929-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.c:942: warning: passing argument 1 of 'local_irq_save_ptr' from incompatible pointer type /home/vlad/tmp/ofa_1_4_kernel-20080929-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.c:946: warning: passing argument 1 of 'local_irq_save_ptr' from incompatible pointer type make[4]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080929-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080929-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080929-0200_linux-2.6.24_ppc64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_4_kernel-20080929-0200_linux-2.6.24_ppc64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ppc64/linux-2.6.24' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From hal.rosenstock at gmail.com Mon Sep 29 11:16:39 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Mon, 29 Sep 2008 14:16:39 -0400 Subject: [ofa-general] [PATCH][TRIVIAL]mad.c: Need parens to kmalloc correct amount of memory In-Reply-To: <1221788975.5804.49.camel@hhash-dev> References: <1221788975.5804.49.camel@hhash-dev> Message-ID: Roland, On Thu, Sep 18, 2008 at 9:49 PM, Haven Hash wrote: > > I assume this has never been a problem because the malloc will probably > word align the allocation, but maybe it was desired? > > Potential patch attached. FWIW this patch looks right to me. I believe Sean is on sabbatical. It looks like this is non urgent to me. 
-- Hal > Haven Hash > haven.hash at isilon.com- > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From hal.rosenstock at gmail.com Mon Sep 29 13:04:49 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Mon, 29 Sep 2008 16:04:49 -0400 Subject: [ofa-general] ***SPAM*** ibdm network topology format In-Reply-To: <829ded920809290139vf2cc151w4cc8a6fafb49edfe@mail.gmail.com> References: <829ded920809290139vf2cc151w4cc8a6fafb49edfe@mail.gmail.com> Message-ID: On Mon, Sep 29, 2008 at 4:39 AM, Keshetti Mahesh wrote: > Hello, > > Is there tool available to convert the Infiniband network topology > from the 'ibnetdiscover' format to the format understood by 'ibdm' ? AFAIK there is no such tool. From what I know, I think it might be easier to go in the other direction as ibdm has more information not obtained by IB means whereas ibnetdiscover only uses IB obtained information. Also, ibnetdiscover -g is closer to what ibdm does. ibnetdiscover -g relies on the system image GUID for grouping information. -- Hal > Thanks in advance, > Mahesh > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From hbisa at us.ibm.com Mon Sep 29 13:10:17 2008 From: hbisa at us.ibm.com (Hakem B Isa) Date: Mon, 29 Sep 2008 16:10:17 -0400 Subject: [ofa-general] AUTO: Hakem B Isa/Cranford/IBM is out of the office. (returning 10/07/2008) Message-ID: I am out of the office until 10/07/2008. I am out of the country until Oct.7th. Please contact the following : Gunther R.
Schmidt IBM System x Technical Specialist for CitiGroup Office: 212-745-2306 Cell: 1-917-816-3958 grschmid at us.ibm.com ---------------------------------------------------------------------------------------------OR Jim Herrschaft SSM, Citi IA Team Cell: 914-261-1665 Note: This is an automated response to your message "general Digest, Vol 20, Issue 92" sent on 9/29/08 15:00:03. This is the only notification you will receive while this person is away. -------------- next part -------------- An HTML attachment was scrubbed... URL: From halr at obsidianresearch.com Mon Sep 29 13:25:01 2008 From: halr at obsidianresearch.com (Hal Rosenstock) Date: Mon, 29 Sep 2008 14:25:01 -0600 Subject: [ofa-general] [PATCH][MINOR] infiniband-diags/ibsysstat.c: Fix a couple of latent bugs Message-ID: <48E1399D.6070606@obsidianresearch.com> Sasha, This patch is based on a code inspection of ibsysstat.c due to the buffer overflow observed with more than 2 CPUs. It fixes a couple of latent bugs although won't improve that particular issue. -- Hal -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: patch-sysstat1 URL: From chu11 at llnl.gov Mon Sep 29 14:32:23 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 29 Sep 2008 14:32:23 -0700 Subject: [ofa-general] Re: [OpenSM][0/18] - Routing Chaining In-Reply-To: <20080928202648.GG25831@sashak.voltaire.com> References: <1221506448.6274.32.camel@cardanus.llnl.gov> <20080928202648.GG25831@sashak.voltaire.com> Message-ID: <1222723943.32432.307.camel@cardanus.llnl.gov> Hey Sasha, Thoughts below. On Sun, 2008-09-28 at 23:26 +0300, Sasha Khapyorsky wrote: > Hi Al, > > Sorry about delay. It took some time to review this patch series in > deep. > > On 12:20 Mon 15 Sep , Al Chu wrote: > > > > As we've discussed before, we wanted to put routing chaining into > > opensm. > > This is great. Thanks! 
> > > osm_ucast defaults to minhop - The current code automatically > > defaulted to minhop if anything in the selected routing engine failed. > > Naturally this had to be changed for routing chaining. I moved minhop > > out of the ucast_mgr code to make it its own routing engine instead. > > I fully agree that minhop should be implemented as regular routing > engine. However I don't think we need to move > osm_ucast_mgr_build_lid_matrices() and ucast_mgr_build_lfts() and make > it minhop routing engine specific. osm_ucast_mgr_build_lid_matrices() > implements generic lid matrix generation code and ucast_mgr_build_lfts() > has generic balancer, both are useful by other routing engines. > > > osm_ucast assumption on routing failures - The current code defaulted > > to minhop if anything in the selected routing engine failed. Because > > of this some routing engines (most notably "file" routing) > > intentionally "failed" when it wanted default to some portion of > > minhop behavior. All routing behavior had to be moved into routing > > engines to have the routing engines fully fail/succeed on their own. > > Maybe we can use simpler solution - to use method's return status. It > can return negative value on failure, zero on success and positive if > method fallback is requested. This is fine as well. I had considered doing it this way as well, but I guess it comes down to personal programming style. > This will help to keep routing engine method to be optional (potentially > we can add multicast routing methods there) and to keep this simple > enough when single method fallback is desired. > > > updn routing - currently utilizes the minhop build_fwd_tables but > > minhop's code assumes if build_lid_matrices is not-null, it is in > > "up/dn routing mode" instead of "minhop mode". Perfectly fine when > > you can specify max of one routing engine, but needs to be abstracted > > out of minhop so up/dn is independent in its routing "attempt" in the > > chain. > > Agree. 
But actually the use of this check is odd IMHO. I think it is fine > to just remove this and make it an unconditional debug warning instead. > > > minhop routing assumed to never fail - Currently minhop routing cannot > > "fail". So if someone wanted to put minhop into the middle of a > > routing chain, it makes no sense. I assume this was based on legacy, > > when the minhop algorithm did not have options like > > "guid_routing_order_file" that could be parsed incorrectly. > > Sure it is legacy, but not only for this reason. > > In the old days the OpenSM state machine was designed so that > "managers" (such as ucast_mgr, lid_mgr, etc.) were required to return a > value (osm_signal_t) which indicates sending MADs instead of its actual > execution status. > > Fortunately it is not the case anymore, so we can rework all managers > (including ucast_mgr) to return their real "status". > > > So, lots of rearchitecture were done and lots of cleanup was done as > > well. Some bug fixes along the way too. > > Finally this patch series leaves us with: > > 21 files changed, 1538 insertions(+), 698 deletions(-) > > , which I think is pretty big for routing chaining :(. > > Assuming that we want this important feature to be included in OFED-1.4 Although it'd be nice to get into OFED 1.4, I know that we (the lab) aren't in too much of a hurry to see it in OFED 1.4. Of course, we backport new stuff into our local tree whenever we want. That's not most people. Al > and that OFED release cycle is already in "RC" phase I reworked this > as a single, smaller patch which is based on your original patch series > (so obviously authorship is preserved). It includes some thoughts > mentioned above and also works with ibsim (still testing it, though). I > will post it to the list shortly. Let me know how it looks? > > Ideally it would be nice to have it integrated before the next RC release > (6 Oct).
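The return-status convention proposed earlier in this message (negative on failure, zero on success, positive when the engine asks for the generic method to be used instead) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the OpenSM code: route_method_t, run_method, generic_build and the three sample methods are hypothetical names, and the context here is just a call counter so the fallback is observable.

```c
#include <stddef.h>

/* Hypothetical method signature, mirroring "int (*method)(void *context)". */
typedef int (*route_method_t)(void *context);

/* Hypothetical generic implementation; counts how often it was invoked. */
static int generic_build(void *context)
{
	int *calls = context;

	(*calls)++;
	return 0;
}

/*
 * Run one optional engine method.
 *   method == NULL or method returns > 0 : use the generic implementation
 *   method returns 0                     : the engine handled it
 *   method returns < 0                   : hard failure, passed to the caller
 */
static int run_method(route_method_t method, void *context,
		      route_method_t generic)
{
	int ret;

	if (!method || (ret = method(context)) > 0)
		ret = generic(context);
	return ret;
}

/* Sample engine methods, one per outcome (all hypothetical). */
static int always_ok(void *context) { (void)context; return 0; }
static int wants_fallback(void *context) { (void)context; return 1; }
static int always_fails(void *context) { (void)context; return -1; }
```

With this shape a method stays optional: leaving the pointer NULL and returning a positive value both end up in the generic code, while a negative value lets the caller move on to the next engine.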
> > Sasha -- Albert Chu chu11 at llnl.gov Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory From chu11 at llnl.gov Mon Sep 29 15:17:33 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 29 Sep 2008 15:17:33 -0700 Subject: [ofa-general] Re: [PATCH] opensm: routing chaining In-Reply-To: <20080928204244.GH25831@sashak.voltaire.com> References: <1221506448.6274.32.camel@cardanus.llnl.gov> <20080928202648.GG25831@sashak.voltaire.com> <20080928204244.GH25831@sashak.voltaire.com> Message-ID: <1222726653.32432.321.camel@cardanus.llnl.gov> Hey Sasha, Comments inlined. On Sun, 2008-09-28 at 23:42 +0300, Sasha Khapyorsky wrote: > From: Albert Chu > > Routing chaining is the ability to configure the order in which routing > algorithms are applied in opensm, i.e. > > -R ftree,updn,minhop > > Try using ftree routing. If ftree fails, try updn. If updn fails, try > minhop. > > In order to get this done, some rearchitecture of the routing code had > to be done b/c there is no longer an assumption that only one routing > engine can be specified. > > Always setup a routing engine, assume no default "fallthrough" minhop > routing engine. On configured routing engine failure, do minhop as > a last resort. Stick a *next pointer into struct osm_routing_engine. > Rearchitect routing engine usage as a list instead of a single struct. 
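As a rough sketch of the list-based chaining described in the commit message above (engines linked through a next pointer and tried in configured order until one succeeds), assuming a much simplified engine type: struct engine, append_engine, route_with_chain and the two sample route functions are hypothetical names, and the real struct osm_routing_engine carries more methods than shown here.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a routing engine entry. */
struct engine {
	const char *name;
	int (*route)(void *context);	/* 0 on success, negative on failure */
	void *context;
	struct engine *next;
};

/* Append at the tail so engines are tried in the order they were given. */
static void append_engine(struct engine **list, struct engine *e)
{
	struct engine *r;

	e->next = NULL;
	if (!*list) {
		*list = e;
		return;
	}
	for (r = *list; r->next; r = r->next)
		;
	r->next = e;
}

/* Walk the chain; return the first engine that succeeds, NULL if none did. */
static struct engine *route_with_chain(struct engine *list)
{
	struct engine *e;

	for (e = list; e; e = e->next)
		if (e->route && e->route(e->context) == 0)
			return e;
	return NULL;
}

/* Sample outcomes for illustration only. */
static int route_fail(void *context) { (void)context; return -1; }
static int route_ok(void *context) { (void)context; return 0; }
```

A caller that gets NULL back would then run the default Min Hop routing as a last resort, which matches the behaviour the commit message describes for a failed chain such as "-R ftree,updn,minhop".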
> > Signed-off-by: Sasha Khapyorsky > --- > opensm/include/opensm/osm_opensm.h | 10 ++- > opensm/include/opensm/osm_subnet.h | 7 +- > opensm/include/opensm/osm_ucast_mgr.h | 2 +- > opensm/man/opensm.8.in | 8 ++- > opensm/opensm/main.c | 10 ++- > opensm/opensm/osm_opensm.c | 121 +++++++++++++++++++++++---------- > opensm/opensm/osm_subnet.c | 11 ++- > opensm/opensm/osm_ucast_file.c | 19 ++--- > opensm/opensm/osm_ucast_ftree.c | 35 ++++------ > opensm/opensm/osm_ucast_lash.c | 16 ++-- > opensm/opensm/osm_ucast_mgr.c | 119 +++++++++++++++++++++----------- > opensm/opensm/osm_ucast_updn.c | 10 ++-- > 12 files changed, 226 insertions(+), 142 deletions(-) > > diff --git a/opensm/include/opensm/osm_opensm.h b/opensm/include/opensm/osm_opensm.h > index 5d45724..c121be4 100644 > --- a/opensm/include/opensm/osm_opensm.h > +++ b/opensm/include/opensm/osm_opensm.h > @@ -126,6 +126,7 @@ struct osm_routing_engine { > int (*ucast_build_fwd_tables) (void *context); > void (*ucast_dump_tables) (void *context); > void (*delete) (void *context); > + struct osm_routing_engine *next; > }; > /* > * FIELDS > @@ -148,6 +149,9 @@ struct osm_routing_engine { > * delete > * The delete method, may be used for routing engine > * internals cleanup. > +* > +* next > +* Pointer to next routing engine in the list. > */ > > /****s* OpenSM: OpenSM/osm_opensm_t > @@ -178,7 +182,7 @@ typedef struct osm_opensm { > osm_log_t log; > cl_dispatcher_t disp; > cl_plock_t lock; > - struct osm_routing_engine routing_engine; > + struct osm_routing_engine *routing_engine_list; > osm_routing_engine_type_t routing_engine_used; > osm_stats_t stats; > osm_console_t console; > @@ -221,8 +225,8 @@ typedef struct osm_opensm { > * lock > * Shared lock guarding most OpenSM structures. > * > -* routing_engine > -* Routing engine; will be initialized then used. > +* routing_engine_list > +* List of routing engines that should be tried for use. 
> * > * routing_engine_used > * Indicates which routing engine was used to route a subnet. > diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h > index f90f7ea..0c7f3b9 100644 > --- a/opensm/include/opensm/osm_subnet.h > +++ b/opensm/include/opensm/osm_subnet.h > @@ -182,7 +182,7 @@ typedef struct osm_subn_opt { > char *port_prof_ignore_file; > boolean_t port_profile_switch_nodes; > boolean_t sweep_on_trap; > - char *routing_engine_name; > + char *routing_engine_names; > boolean_t connect_roots; > char *lid_matrix_dump_file; > char *lfts_file; > @@ -353,9 +353,8 @@ typedef struct osm_subn_opt { > * sweep_on_trap > * Received traps will initiate a new sweep. > * > -* routing_engine_name > -* Name of used routing engine > -* (other than default Min Hop Algorithm) > +* routing_engine_names > +* Name of routing engine(s) to use. > * > * connect_roots > * The option which will enforce root to root connectivity with > diff --git a/opensm/include/opensm/osm_ucast_mgr.h b/opensm/include/opensm/osm_ucast_mgr.h > index 1dc9a37..59ba9fa 100644 > --- a/opensm/include/opensm/osm_ucast_mgr.h > +++ b/opensm/include/opensm/osm_ucast_mgr.h > @@ -264,7 +264,7 @@ osm_ucast_mgr_set_fwd_table(IN osm_ucast_mgr_t * const p_mgr, > * > * SYNOPSIS > */ > -void osm_ucast_mgr_build_lid_matrices(IN osm_ucast_mgr_t * const p_mgr); > +int osm_ucast_mgr_build_lid_matrices(IN osm_ucast_mgr_t * const p_mgr); > /* > * PARAMETERS > * p_mgr > diff --git a/opensm/man/opensm.8.in b/opensm/man/opensm.8.in > index 565c5f8..6790d11 100644 > --- a/opensm/man/opensm.8.in > +++ b/opensm/man/opensm.8.in > @@ -9,7 +9,7 @@ opensm \- InfiniBand subnet manager and administration (SM/SA) > [\-F | \-\-config ] [\-c(reate-config) ] > [\-g(uid) ] [\-l(mc) ] > [\-p(riority) ] [\-smkey ] [\-r(eassign_lids)] > -[\-R | \-\-routing_engine ] > +[\-R | \-\-routing_engine ] > [\-z | \-\-connect_roots] > [\-M | \-\-lid_matrix_file ] > [\-U | \-\-lfts_file ] > @@ -116,8 +116,10 @@ Without -r, 
OpenSM attempts to preserve existing > LID assignments resolving multiple use of same LID. > .TP > \fB\-R\fR, \fB\-\-routing_engine\fR > -This option chooses routing engine instead of Min Hop > -algorithm (default). > +This option chooses routing engine(s) to use instead of Min Hop > +algorithm (default). Multiple routing engines can be specified > +separated by commas so that specific ordering of routing algorithms > +will be tried if earlier routing engines fail. > Supported engines: minhop, updn, file, ftree, lash, dor > .TP > \fB\-z\fR, \fB\-\-connect_roots\fR > diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c > index 01bfddf..2f53157 100644 > --- a/opensm/opensm/main.c > +++ b/opensm/opensm/main.c > @@ -177,8 +177,10 @@ static void show_usage(void) > " LID assignments resolving multiple use of same LID.\n\n"); > printf("-R\n" > "--routing_engine \n" > - " This option chooses routing engine instead of Min Hop\n" > - " algorithm (default).\n" > + " This option chooses routing engine(s) to use instead of default\n" > + " Min Hop algorithm. 
Multiple routing engines can be specified\n"
> + " separated by commas so that specific ordering of routing\n"
> + " algorithms will be tried if earlier routing engines fail.\n"
>   " Supported engines: updn, file, ftree, lash, dor\n\n");
>   printf("-z\n"
>   "--connect_roots\n"
> @@ -851,8 +853,8 @@ int main(int argc, char *argv[])
>       break;
>
>   case 'R':
> -     opt.routing_engine_name = optarg;
> -     printf(" Activate \'%s\' routing engine\n", optarg);
> +     opt.routing_engine_names = optarg;
> +     printf(" Activate \'%s\' routing engine(s)\n", optarg);
>       break;
>
>   case 'z':
> diff --git a/opensm/opensm/osm_opensm.c b/opensm/opensm/osm_opensm.c
> index d17fed3..4970d0c 100644
> --- a/opensm/opensm/osm_opensm.c
> +++ b/opensm/opensm/osm_opensm.c
> @@ -61,24 +61,23 @@
>
>  struct routing_engine_module {
>      const char *name;
> -    int (*setup) (osm_opensm_t * p_osm);
> +    int (*setup) (struct osm_routing_engine *, osm_opensm_t *);
>  };
>
> -extern int osm_ucast_updn_setup(osm_opensm_t * p_osm);
> -extern int osm_ucast_file_setup(osm_opensm_t * p_osm);
> -extern int osm_ucast_ftree_setup(osm_opensm_t * p_osm);
> -extern int osm_ucast_lash_setup(osm_opensm_t * p_osm);
> -
> -static int osm_ucast_null_setup(osm_opensm_t * p_osm);
> +extern int osm_ucast_minhop_setup(struct osm_routing_engine *, osm_opensm_t *);
> +extern int osm_ucast_updn_setup(struct osm_routing_engine *, osm_opensm_t *);
> +extern int osm_ucast_file_setup(struct osm_routing_engine *, osm_opensm_t *);
> +extern int osm_ucast_ftree_setup(struct osm_routing_engine *, osm_opensm_t *);
> +extern int osm_ucast_lash_setup(struct osm_routing_engine *, osm_opensm_t *);
> +extern int osm_ucast_dor_setup(struct osm_routing_engine *, osm_opensm_t *);
>
>  const static struct routing_engine_module routing_modules[] = {
> -    {"null", osm_ucast_null_setup},

Not sure how many legacy opensm.opts files are out there, but I kept the "null" routing engine in there just for safety. Is it OK to remove?
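If the entry does stay for compatibility, one option is to keep "null" purely as an alias for minhop. A minimal sketch of that lookup, using simplified stand-in types (the table, callbacks and find_setup() below are hypothetical, not the actual OpenSM definitions):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a routing engine setup callback. */
typedef int (*setup_fn)(void);

static int minhop_setup(void) { return 0; }
static int updn_setup(void) { return 0; }

struct engine_module {
    const char *name;
    setup_fn setup;
};

/* Assumption: "null" survives only as a legacy alias that maps to minhop. */
static const struct engine_module modules[] = {
    {"null", minhop_setup},
    {"minhop", minhop_setup},
    {"updn", updn_setup},
    {NULL, NULL}
};

/* Return the setup callback registered under name, or NULL if unknown. */
static setup_fn find_setup(const char *name)
{
    const struct engine_module *m;

    assert(name != NULL);
    for (m = modules; m->name; m++)
        if (!strcmp(m->name, name))
            return m->setup;
    return NULL;
}
```

With this shape an old options file that still says "routing_engine null" resolves to the same callback as "minhop", and unknown names are still rejected.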
> - {"minhop", osm_ucast_null_setup}, > + {"minhop", osm_ucast_minhop_setup}, > {"updn", osm_ucast_updn_setup}, > {"file", osm_ucast_file_setup}, > {"ftree", osm_ucast_ftree_setup}, > {"lash", osm_ucast_lash_setup}, > - {"dor", osm_ucast_null_setup}, > + {"dor", osm_ucast_dor_setup}, > {NULL, NULL} > }; > > @@ -135,33 +134,77 @@ osm_routing_engine_type_t osm_routing_engine_type(IN const char *str) > > /********************************************************************** > **********************************************************************/ > -static int setup_routing_engine(osm_opensm_t * p_osm, const char *name) > +static void append_routing_engine(osm_opensm_t *osm, > + struct osm_routing_engine *routing_engine) > { > - const struct routing_engine_module *r; > + struct osm_routing_engine *r; > + > + routing_engine->next = NULL; > + > + if (!osm->routing_engine_list) { > + osm->routing_engine_list = routing_engine; > + return; > + } > + > + r = osm->routing_engine_list; > + while (r->next) > + r = r->next; > > - for (r = routing_modules; r->name && *r->name; r++) { > - if (!strcmp(r->name, name)) { > - p_osm->routing_engine.name = r->name; > - if (r->setup(p_osm)) { > - OSM_LOG(&p_osm->log, OSM_LOG_VERBOSE, > + r->next = routing_engine; > +} > + > +static void setup_routing_engine(osm_opensm_t *osm, const char *name) > +{ > + struct osm_routing_engine *re; > + const struct routing_engine_module *m; > + > + for (m = routing_modules; m->name && *m->name; m++) { > + if (!strcmp(m->name, name)) { > + re = malloc(sizeof(struct osm_routing_engine)); > + if (!re) { > + OSM_LOG(&osm->log, OSM_LOG_VERBOSE, > + "memory allocation failed\n"); > + return; > + } > + memset(re, 0, sizeof(struct osm_routing_engine)); > + > + re->name = m->name; > + if (m->setup(re, osm)) { > + OSM_LOG(&osm->log, OSM_LOG_VERBOSE, > "setup of routing" > " engine \'%s\' failed\n", name); > - return -2; > + return; > } > - OSM_LOG(&p_osm->log, OSM_LOG_DEBUG, > - "\'%s\' routing engine set 
up\n", > - p_osm->routing_engine.name); > - return 0; > + OSM_LOG(&osm->log, OSM_LOG_DEBUG, > + "\'%s\' routing engine set up\n", re->name); > + append_routing_engine(osm, re); > + return; > } > } > - return -1; > + > + OSM_LOG(&osm->log, OSM_LOG_ERROR, > + "cannot find or setup routing engine \'%s\'", name); > } > > -static int osm_ucast_null_setup(osm_opensm_t * p_osm) > +static void setup_routing_engines(osm_opensm_t *osm, const char *engine_names) > { > - OSM_LOG(&p_osm->log, OSM_LOG_VERBOSE, > - "nothing yet - using default (minhop) routing engine\n"); > - return 0; > + char *name, *str, *p; > + > + if (!engine_names || !*engine_names) { > + setup_routing_engine(osm, "minhop"); > + return; > + } > + > + str = strdup(engine_names); > + name = strtok_r(str, ", \t\n", &p); > + while (name && *name) { > + setup_routing_engine(osm, name); > + name = strtok_r(NULL, ", \t\n", &p); > + } > + free(str); > + > + if (!osm->routing_engine_list) > + setup_routing_engine(osm, "minhop"); > } > > /********************************************************************** > @@ -181,6 +224,20 @@ void osm_opensm_construct(IN osm_opensm_t * const p_osm) > > /********************************************************************** > **********************************************************************/ > +static void destroy_routing_engines(osm_opensm_t *osm) > +{ > + struct osm_routing_engine *r, *next; > + > + next = osm->routing_engine_list; > + while (next) { > + r = next; > + next = r->next; > + if (r->delete) > + r->delete(r->context); > + free(r); > + } > +} > + > void osm_opensm_destroy(IN osm_opensm_t * const p_osm) > { > /* in case of shutdown through exit proc - no ^C */ > @@ -218,8 +275,7 @@ void osm_opensm_destroy(IN osm_opensm_t * const p_osm) > osm_sa_db_file_dump(p_osm); > > /* do the destruction in reverse order as init */ > - if (p_osm->routing_engine.delete) > - p_osm->routing_engine.delete(p_osm->routing_engine.context); > + destroy_routing_engines(p_osm); > 
osm_sa_destroy(&p_osm->sa); > osm_sm_destroy(&p_osm->sm); > #ifdef ENABLE_OSM_PERF_MGR > @@ -371,12 +427,7 @@ osm_opensm_init(IN osm_opensm_t * const p_osm, > goto Exit; > #endif /* ENABLE_OSM_PERF_MGR */ > > - if (p_opt->routing_engine_name && > - setup_routing_engine(p_osm, p_opt->routing_engine_name)) > - OSM_LOG(&p_osm->log, OSM_LOG_VERBOSE, > - "cannot find or setup routing engine" > - " \'%s\'. Default will be used instead\n", > - p_opt->routing_engine_name); > + setup_routing_engines(p_osm, p_opt->routing_engine_names); > > p_osm->routing_engine_used = OSM_ROUTING_ENGINE_TYPE_NONE; > > diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c > index 278aa3d..a39ce75 100644 > --- a/opensm/opensm/osm_subnet.c > +++ b/opensm/opensm/osm_subnet.c > @@ -442,7 +442,7 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * const p_opt) > p_opt->port_prof_ignore_file = NULL; > p_opt->port_profile_switch_nodes = FALSE; > p_opt->sweep_on_trap = TRUE; > - p_opt->routing_engine_name = NULL; > + p_opt->routing_engine_names = NULL; > p_opt->connect_roots = FALSE; > p_opt->lid_matrix_dump_file = NULL; > p_opt->lfts_file = NULL; > @@ -1264,7 +1264,7 @@ int osm_subn_parse_conf_file(char *file_name, osm_subn_opt_t * const p_opts) > p_key, p_val, &p_opts->sweep_on_trap); > > opts_unpack_charp("routing_engine", > - p_key, p_val, &p_opts->routing_engine_name); > + p_key, p_val, &p_opts->routing_engine_names); > > opts_unpack_boolean("connect_roots", > p_key, p_val, &p_opts->connect_roots); > @@ -1521,9 +1521,12 @@ int osm_subn_write_conf_file(char *file_name, IN osm_subn_opt_t *const p_opts) > > fprintf(opts_file, > "# Routing engine\n" > + "# Multiple routing engines can be specified separated by\n" > + "# commas so that specific ordering of routing algorithms will\n" > + "# be tried if earlier routing engines fail.\n" > "# Supported engines: minhop, updn, file, ftree, lash, dor\n" > - "routing_engine %s\n\n", p_opts->routing_engine_name ? 
> - p_opts->routing_engine_name : null_str); > + "routing_engine %s\n\n", p_opts->routing_engine_names ? > + p_opts->routing_engine_names : null_str); > > fprintf(opts_file, > "# Connect roots (use FALSE if unsure)\n" > diff --git a/opensm/opensm/osm_ucast_file.c b/opensm/opensm/osm_ucast_file.c > index 3d00cb2..cbd65c1 100644 > --- a/opensm/opensm/osm_ucast_file.c > +++ b/opensm/opensm/osm_ucast_file.c > @@ -135,14 +135,13 @@ static int do_ucast_file_load(void *context) > OSM_LOG(&p_osm->log, OSM_LOG_VERBOSE, > "LFTs file name is not given; " > "using default routing algorithm\n"); > - return -1; > + return 1; > } > > file = fopen(file_name, "r"); > if (!file) { > OSM_LOG(&p_osm->log, OSM_LOG_ERROR | OSM_LOG_SYS, "ERR 6302: " > - "cannot open ucast dump file \'%s\'; " > - "using default routing algorithm\n", file_name); > + "cannot open ucast dump file \'%s\': %m\n", file_name); > return -1; > } > > @@ -270,15 +269,13 @@ static int do_lid_matrix_file_load(void *context) > OSM_LOG(&p_osm->log, OSM_LOG_VERBOSE, > "lid matrix file name is not given; " > "using default lid matrix generation algorithm\n"); > - return -1; > + return 1; > } > > file = fopen(file_name, "r"); > if (!file) { > OSM_LOG(&p_osm->log, OSM_LOG_ERROR | OSM_LOG_SYS, "ERR 6305: " > - "cannot open lid matrix file \'%s\'; " > - "using default lid matrix generation algorithm\n", > - file_name); > + "cannot open lid matrix file \'%s\': %m\n", file_name); > return -1; > } > > @@ -389,10 +386,10 @@ static int do_lid_matrix_file_load(void *context) > return 0; > } > > -int osm_ucast_file_setup(osm_opensm_t * p_osm) > +int osm_ucast_file_setup(struct osm_routing_engine *r, osm_opensm_t *osm) > { > - p_osm->routing_engine.context = (void *)p_osm; > - p_osm->routing_engine.build_lid_matrices = do_lid_matrix_file_load; > - p_osm->routing_engine.ucast_build_fwd_tables = do_ucast_file_load; > + r->context = osm; > + r->build_lid_matrices = do_lid_matrix_file_load; > + r->ucast_build_fwd_tables = 
do_ucast_file_load; > return 0; > } > diff --git a/opensm/opensm/osm_ucast_ftree.c b/opensm/opensm/osm_ucast_ftree.c > index 1d3233c..15168b7 100644 > --- a/opensm/opensm/osm_ucast_ftree.c > +++ b/opensm/opensm/osm_ucast_ftree.c > @@ -3552,8 +3552,7 @@ static int __osm_ftree_construct_fabric(IN void *context) > OSM_LOG(&p_ftree->p_osm->log, OSM_LOG_VERBOSE, "Ranking FatTree\n"); > if (__osm_ftree_fabric_rank(p_ftree) != 0) { > osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, > - "Failed ranking the tree - " > - "fat-tree routing falls back to default routing\n"); > + "Failed ranking the tree\n"); > status = -1; > goto Exit; > } > @@ -3567,14 +3566,12 @@ static int __osm_ftree_construct_fabric(IN void *context) > "Populating CA & switch ports\n"); > if (__osm_ftree_fabric_populate_ports(p_ftree) != 0) { > osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, > - "Fabric topology is not a fat-tree - " > - "routing falls back to default routing\n"); > + "Fabric topology is not a fat-tree\n"); > status = -1; > goto Exit; > } else if (p_ftree->cn_num == 0) { > osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, > - "Fabric has no valid compute nodes - " > - "routing falls back to default routing\n"); > + "Fabric has no valid compute nodes\n"); > status = -1; > goto Exit; > } > @@ -3586,8 +3583,7 @@ static int __osm_ftree_construct_fabric(IN void *context) > if (__osm_ftree_fabric_get_rank(p_ftree) > FAT_TREE_MAX_RANK || > __osm_ftree_fabric_get_rank(p_ftree) < FAT_TREE_MIN_RANK) { > osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, > - "Fabric rank is %u (should be between %u and %u) - " > - "fat-tree routing falls back to default routing\n", > + "Fabric rank is %u (should be between %u and %u)\n", > __osm_ftree_fabric_get_rank(p_ftree), FAT_TREE_MIN_RANK, > FAT_TREE_MAX_RANK); > status = -1; > @@ -3600,8 +3596,7 @@ static int __osm_ftree_construct_fabric(IN void *context) > validation - it checks that all the CNs are at the same rank. 
*/ > if (__osm_ftree_fabric_mark_leaf_switches(p_ftree)) { > osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, > - "Fabric topology is not a fat-tree - " > - "routing falls back to default routing\n"); > + "Fabric topology is not a fat-tree\n"); > status = -1; > goto Exit; > } > @@ -3619,8 +3614,7 @@ static int __osm_ftree_construct_fabric(IN void *context) > In any case, the first and the last switches in the array are REAL leafs. */ > if (__osm_ftree_fabric_create_leaf_switch_array(p_ftree)) { > osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, > - "Fabric topology is not a fat-tree - " > - "routing falls back to default routing\n"); > + "Fabric topology is not a fat-tree\n"); > status = -1; > goto Exit; > } > @@ -3640,8 +3634,7 @@ static int __osm_ftree_construct_fabric(IN void *context) > if (!__osm_ftree_fabric_roots_provided(p_ftree) && > !__osm_ftree_fabric_validate_topology(p_ftree)) { > osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, > - "Fabric topology is not a fat-tree - " > - "routing falls back to default routing\n"); > + "Fabric topology is not a fat-tree\n"); > status = -1; > goto Exit; > } > @@ -3726,7 +3719,7 @@ static void __osm_ftree_delete(IN void *context) > /*************************************************** > ***************************************************/ > > -int osm_ucast_ftree_setup(osm_opensm_t * p_osm) > +int osm_ucast_ftree_setup(struct osm_routing_engine *r, osm_opensm_t * p_osm) > { > ftree_fabric_t *p_ftree = __osm_ftree_fabric_create(); > if (!p_ftree) > @@ -3734,12 +3727,10 @@ int osm_ucast_ftree_setup(osm_opensm_t * p_osm) > > p_ftree->p_osm = p_osm; > > - p_osm->routing_engine.context = (void *)p_ftree; > - p_osm->routing_engine.build_lid_matrices = __osm_ftree_construct_fabric; > - p_osm->routing_engine.ucast_build_fwd_tables = __osm_ftree_do_routing; > - p_osm->routing_engine.delete = __osm_ftree_delete; > + r->context = (void *)p_ftree; > + r->build_lid_matrices = __osm_ftree_construct_fabric; > + r->ucast_build_fwd_tables = 
__osm_ftree_do_routing; > + r->delete = __osm_ftree_delete; > + > return 0; > } > - > -/*************************************************** > - ***************************************************/ > diff --git a/opensm/opensm/osm_ucast_lash.c b/opensm/opensm/osm_ucast_lash.c > index b985e9a..ce3982f 100644 > --- a/opensm/opensm/osm_ucast_lash.c > +++ b/opensm/opensm/osm_ucast_lash.c > @@ -785,7 +785,7 @@ static int init_lash_structures(lash_t * p_lash) > unsigned vl_min = p_lash->vl_min; > unsigned num_switches = p_lash->num_switches; > osm_log_t *p_log = &p_lash->p_osm->log; > - int status = IB_SUCCESS; > + int status = 0; > unsigned int i, j, k; > > OSM_LOG_ENTER(p_log); > @@ -852,7 +852,7 @@ static int init_lash_structures(lash_t * p_lash) > goto Exit; > > Exit_Mem_Error: > - status = IB_ERROR; > + status = -1; > OSM_LOG(p_log, OSM_LOG_ERROR, "ERR 4D01: " > "Could not allocate required memory for LASH errno %d, errno %d for lack of memory\n", > errno, ENOMEM); > @@ -875,7 +875,7 @@ static int lash_core(lash_t * p_lash) > int stop = 0, output_link, i_next_switch; > int output_link2, i_next_switch2; > int cycle_found2 = 0; > - int status = IB_SUCCESS; > + int status = 0; > int *switch_bitmap = NULL; /* Bitmap to check if we have processed this pair */ > > OSM_LOG_ENTER(p_log); > @@ -1028,7 +1028,7 @@ static int lash_core(lash_t * p_lash) > goto Exit; > > Error_Not_Enough_Lanes: > - status = IB_ERROR; > + status = -1; > OSM_LOG(p_log, OSM_LOG_ERROR, "ERR 4D02: " > "Lane requirements (%d) exceed available lanes (%d)\n", > p_lash->vl_min, lanes_needed); > @@ -1360,15 +1360,15 @@ uint8_t osm_get_lash_sl(osm_opensm_t * p_osm, osm_port_t * p_src_port, > return (uint8_t) ((switch_t *) p_sw->priv)->routing_table[dst_id].lane; > } > > -int osm_ucast_lash_setup(osm_opensm_t * p_osm) > +int osm_ucast_lash_setup(struct osm_routing_engine *r, osm_opensm_t *p_osm) > { > lash_t *p_lash = lash_create(p_osm); > if (!p_lash) > return -1; > > - p_osm->routing_engine.context = 
p_lash; > - p_osm->routing_engine.ucast_build_fwd_tables = lash_process; > - p_osm->routing_engine.delete = lash_delete; > + r->context = p_lash; > + r->ucast_build_fwd_tables = lash_process; > + r->delete = lash_delete; > > return 0; > } > diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c > index 9d0ad13..935846c 100644 > --- a/opensm/opensm/osm_ucast_mgr.c > +++ b/opensm/opensm/osm_ucast_mgr.c > @@ -216,7 +216,6 @@ __osm_ucast_mgr_process_port(IN osm_ucast_mgr_t * const p_mgr, > uint8_t port; > boolean_t is_ignored_by_port_prof; > ib_net64_t node_guid; > - struct osm_routing_engine *p_routing_eng; > unsigned start_from = 1; > > OSM_LOG_ENTER(p_mgr->p_log); > @@ -253,8 +252,6 @@ __osm_ucast_mgr_process_port(IN osm_ucast_mgr_t * const p_mgr, > > node_guid = osm_node_get_node_guid(p_sw->p_node); > > - p_routing_eng = &p_mgr->p_subn->p_osm->routing_engine; > - > /* > The lid matrix contains the number of hops to each > lid from each port. From this information we determine > @@ -269,18 +266,9 @@ __osm_ucast_mgr_process_port(IN osm_ucast_mgr_t * const p_mgr, > /* do not try to overwrite the ppro of non existing port ... */ > is_ignored_by_port_prof = TRUE; > > - /* Up/Down routing can cause unreachable routes between some > - switches so we do not report that as an error in that case */ > - if (!p_routing_eng->build_lid_matrices) { > - OSM_LOG(p_mgr->p_log, OSM_LOG_ERROR, "ERR 3A08: " > - "No path to get to LID %u from switch 0x%" > - PRIx64 "\n", lid_ho, cl_ntoh64(node_guid)); > - /* trigger a new sweep - try again ... 
*/ > - p_mgr->p_subn->subnet_initialization_error = TRUE; > - } else > - OSM_LOG(p_mgr->p_log, OSM_LOG_DEBUG, > - "No path to get to LID %u from switch 0x%" > - PRIx64 "\n", lid_ho, cl_ntoh64(node_guid)); > + OSM_LOG(p_mgr->p_log, OSM_LOG_DEBUG, > + "No path to get to LID %u from switch 0x%" PRIx64 "\n", > + lid_ho, cl_ntoh64(node_guid)); > } else { > osm_physp_t *p = osm_node_get_physp_ptr(p_sw->p_node, port); > > @@ -583,7 +571,7 @@ __osm_ucast_mgr_process_neighbors(IN cl_map_item_t * const p_map_item, > > /********************************************************************** > **********************************************************************/ > -void osm_ucast_mgr_build_lid_matrices(IN osm_ucast_mgr_t * const p_mgr) > +int osm_ucast_mgr_build_lid_matrices(IN osm_ucast_mgr_t * const p_mgr) > { > uint32_t i; > uint32_t iteration_max; > @@ -646,6 +634,8 @@ void osm_ucast_mgr_build_lid_matrices(IN osm_ucast_mgr_t * const p_mgr) > OSM_LOG(p_mgr->p_log, OSM_LOG_DEBUG, > "Min-hop propagated in %d steps\n", i); > } > + > + return 0; > } > > /********************************************************************** > @@ -752,7 +742,7 @@ static void clear_prof_ignore_flag(cl_map_item_t * const p_map_item, void *ctx) > } > } > > -static void ucast_mgr_build_lfts(osm_ucast_mgr_t *p_mgr) > +static int ucast_mgr_build_lfts(osm_ucast_mgr_t *p_mgr) > { > cl_qlist_init(&p_mgr->port_order_list); > > @@ -786,27 +776,56 @@ static void ucast_mgr_build_lfts(osm_ucast_mgr_t *p_mgr) > __osm_ucast_mgr_process_tbl, p_mgr); > > cl_qlist_remove_all(&p_mgr->port_order_list); > + > + return 0; > } > > /********************************************************************** > **********************************************************************/ > +static int ucast_mgr_route(struct osm_routing_engine *r, osm_opensm_t *osm) > +{ > + int ret; > + > + OSM_LOG(&osm->log, OSM_LOG_VERBOSE, > + "building routing with \'%s\' routing algorithm...\n", r->name); > + > + if (!r->build_lid_matrices || 
> + (ret = r->build_lid_matrices(r->context)) > 0) > + ret = osm_ucast_mgr_build_lid_matrices(&osm->sm.ucast_mgr); > + > + if (ret < 0) { > + OSM_LOG(&osm->log, OSM_LOG_ERROR, > + "%s: cannot build lid matrices.\n", r->name); > + return ret; > + } > + > + if (!r->ucast_build_fwd_tables || > + (ret = r->ucast_build_fwd_tables(r->context)) > 0) > + ret = ucast_mgr_build_lfts(&osm->sm.ucast_mgr); > + > + if (ret < 0) { > + OSM_LOG(&osm->log, OSM_LOG_ERROR, > + "%s: cannot build fwd tables.\n", r->name); > + return ret; > + } > + > + osm->routing_engine_used = osm_routing_engine_type(r->name); > + > + return 0; > +} > + > osm_signal_t osm_ucast_mgr_process(IN osm_ucast_mgr_t * const p_mgr) > { > osm_opensm_t *p_osm; > struct osm_routing_engine *p_routing_eng; > osm_signal_t signal = OSM_SIGNAL_DONE; > cl_qmap_t *p_sw_guid_tbl; > - int blm = 0; > - int ubft = 0; > > OSM_LOG_ENTER(p_mgr->p_log); > > p_sw_guid_tbl = &p_mgr->p_subn->sw_guid_tbl; > p_osm = p_mgr->p_subn->p_osm; > - p_routing_eng = &p_osm->routing_engine; > - > - p_mgr->is_dor = p_routing_eng->name > - && (strcmp(p_routing_eng->name, "dor") == 0); > + p_routing_eng = p_osm->routing_engine_list; > > CL_PLOCK_EXCL_ACQUIRE(p_mgr->p_lock); > > @@ -819,28 +838,19 @@ osm_signal_t osm_ucast_mgr_process(IN osm_ucast_mgr_t * const p_mgr) > > p_mgr->any_change = FALSE; > > - if (!p_routing_eng->build_lid_matrices || > - (blm = p_routing_eng->build_lid_matrices(p_routing_eng->context))) > - osm_ucast_mgr_build_lid_matrices(p_mgr); > + p_osm->routing_engine_used = OSM_ROUTING_ENGINE_TYPE_NONE; > + while (p_routing_eng) { > + if (!ucast_mgr_route(p_routing_eng, p_osm)) > + break; > + p_routing_eng = p_routing_eng->next; > + } > > - /* > - Now that the lid matrices have been built, we can > - build and download the switch forwarding tables. 
> -    */
> -   if (!p_routing_eng->ucast_build_fwd_tables ||
> -       (ubft =
> -        p_routing_eng->ucast_build_fwd_tables(p_routing_eng->context)))
> +   if (p_osm->routing_engine_used == OSM_ROUTING_ENGINE_TYPE_NONE) {
> +       /* If configured routing algorithm failed, use default MinHop */
> +       osm_ucast_mgr_build_lid_matrices(p_mgr);
>         ucast_mgr_build_lfts(p_mgr);
> -
> -   /* 'file' routing engine has one unique logic corner case */
> -   if (p_routing_eng->name && (strcmp(p_routing_eng->name, "file") == 0)
> -       && (!blm || !ubft))
> -       p_osm->routing_engine_used = OSM_ROUTING_ENGINE_TYPE_FILE;
> -   else if (!blm && !ubft)
> -       p_osm->routing_engine_used =
> -           osm_routing_engine_type(p_routing_eng->name);
> -   else
>         p_osm->routing_engine_used = OSM_ROUTING_ENGINE_TYPE_MINHOP;
> +   }
>
>     OSM_LOG(p_mgr->p_log, OSM_LOG_INFO,
>         "%s tables configured on all switches\n",
> @@ -861,3 +871,28 @@ Exit:
>     OSM_LOG_EXIT(p_mgr->p_log);
>     return (signal);
>  }
> +
> +static int ucast_build_lid_matrices(void *context)
> +{
> +    return osm_ucast_mgr_build_lid_matrices(context);
> +}
> +
> +static int ucast_build_lfts(void *context)
> +{
> +    return ucast_mgr_build_lfts(context);
> +}
> +
> +int osm_ucast_minhop_setup(struct osm_routing_engine *r, osm_opensm_t *osm)
> +{
> +    r->context = &osm->sm.ucast_mgr;
> +    r->build_lid_matrices = ucast_build_lid_matrices;
> +    r->ucast_build_fwd_tables = ucast_build_lfts;
> +    return 0;
> +}
> +
> +int osm_ucast_dor_setup(struct osm_routing_engine *r, osm_opensm_t *osm)
> +{
> +    osm_ucast_minhop_setup(r, osm);
> +    osm->sm.ucast_mgr.is_dor = 1;

If dor is listed in the routing chain, all other algorithms that may fall through into minhop's build_lfts callback (minhop, updn, file) will be affected by the is_dor flag. Is this intended? If we don't want to abstract it for this round, perhaps we could stick the "is_dor" flag set/unset into ucast_mgr_route() so that is_dor is set only when dor is being routed.
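A rough sketch of that second option, scoping the flag to the engine currently being routed. Everything here is a simplified stand-in written for illustration (struct ucast_mgr, struct routing_engine, route_one() and route_chain() are hypothetical names, not the OpenSM code); it only models the forwarding-table step of ucast_mgr_route():

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the unicast manager state. */
struct ucast_mgr {
    int is_dor;             /* read by the shared (minhop) LFT builder */
    int lfts_built_as_dor;  /* records what the builder saw, for the demo */
};

/* Simplified stand-in for struct osm_routing_engine. */
struct routing_engine {
    const char *name;
    int (*build_fwd_tables)(void *context); /* NULL: use shared builder */
    void *context;
    struct routing_engine *next;
};

/* Shared builder, standing in for ucast_mgr_build_lfts(). */
static int build_lfts(struct ucast_mgr *mgr)
{
    mgr->lfts_built_as_dor = mgr->is_dor;
    return 0;
}

/* Stand-in for an engine that cannot route the fabric. */
static int failing_build(void *context)
{
    (void)context;
    return -1;
}

/* Route with one engine; is_dor is recomputed on every call. */
static int route_one(struct routing_engine *r, struct ucast_mgr *mgr)
{
    int ret = 0;

    assert(r != NULL && r->name != NULL);
    mgr->is_dor = (strcmp(r->name, "dor") == 0);

    if (!r->build_fwd_tables ||
        (ret = r->build_fwd_tables(r->context)) > 0)
        ret = build_lfts(mgr);
    return ret;
}

/* Walk the chain until one engine succeeds; return its name or NULL. */
static const char *route_chain(struct routing_engine *list,
                               struct ucast_mgr *mgr)
{
    struct routing_engine *r;

    for (r = list; r; r = r->next)
        if (route_one(r, mgr) == 0)
            return r->name;
    return NULL;
}
```

Because the flag is derived from the engine name at the start of each attempt, a later engine in the chain that falls through into the shared builder never inherits is_dor from an earlier dor entry.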
> +    return 0;
> +}
> diff --git a/opensm/opensm/osm_ucast_updn.c b/opensm/opensm/osm_ucast_updn.c
> index 90e9af8..4fdcc78 100644
> --- a/opensm/opensm/osm_ucast_updn.c
> +++ b/opensm/opensm/osm_ucast_updn.c
> @@ -643,7 +643,7 @@ static int __osm_updn_call(void *ctx)
>     } else {
>         OSM_LOG(&p_updn->p_osm->log, OSM_LOG_INFO,
>             "disabling UPDN algorithm, no root nodes were found\n");
> -       ret = 1;
> +       ret = -1;
>     }
>
>     if (osm_log_is_active(&p_updn->p_osm->log, OSM_LOG_ROUTING))
> @@ -669,7 +669,7 @@ static void __osm_updn_delete(void *context)
>     free(context);
>  }
>
> -int osm_ucast_updn_setup(osm_opensm_t * p_osm)
> +int osm_ucast_updn_setup(struct osm_routing_engine *r, osm_opensm_t *p_osm)
>  {
>     updn_t *p_updn;
>
> @@ -680,9 +680,9 @@ int osm_ucast_updn_setup(osm_opensm_t * p_osm)
>
>     p_updn->p_osm = p_osm;
>
> -   p_osm->routing_engine.context = p_updn;
> -   p_osm->routing_engine.delete = __osm_updn_delete;
> -   p_osm->routing_engine.build_lid_matrices = __osm_updn_call;
> +   r->context = p_updn;
> +   r->delete = __osm_updn_delete;
> +   r->build_lid_matrices = __osm_updn_call;
>
>     return 0;
>  }

The patch looks fine as a whole.

Thanks,

Al

--
Albert Chu
chu11 at llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory

From chu11 at llnl.gov Mon Sep 29 15:17:33 2008
From: chu11 at llnl.gov (Al Chu)
Date: Mon, 29 Sep 2008 15:17:33 -0700
Subject: [ofa-general] Re: [PATCH] opensm: routing chaining
In-Reply-To: <20080928204244.GH25831@sashak.voltaire.com>
References: <1221506448.6274.32.camel@cardanus.llnl.gov> <20080928202648.GG25831@sashak.voltaire.com> <20080928204244.GH25831@sashak.voltaire.com>
Message-ID: <1222726653.32432.321.camel@cardanus.llnl.gov>

Hey Sasha,

Comments inlined.

On Sun, 2008-09-28 at 23:42 +0300, Sasha Khapyorsky wrote:
> From: Albert Chu
>
> Routing chaining is the ability to configure the order in which routing
> algorithms are applied in opensm, i.e.
>
> -R ftree,updn,minhop
>
> Try using ftree routing.
If ftree fails, try updn. If updn fails, try > minhop. > > In order to get this done, some rearchitecture of the routing code had > to be done b/c there is no longer an assumption that only one routing > engine can be specified. > > Always setup a routing engine, assume no default "fallthrough" minhop > routing engine. On configured routing engine failure, do minhop as > a last resort. Stick a *next pointer into struct osm_routing_engine. > Rearchitect routing engine usage as a list instead of a single struct. > > Signed-off-by: Sasha Khapyorsky > --- > opensm/include/opensm/osm_opensm.h | 10 ++- > opensm/include/opensm/osm_subnet.h | 7 +- > opensm/include/opensm/osm_ucast_mgr.h | 2 +- > opensm/man/opensm.8.in | 8 ++- > opensm/opensm/main.c | 10 ++- > opensm/opensm/osm_opensm.c | 121 +++++++++++++++++++++++---------- > opensm/opensm/osm_subnet.c | 11 ++- > opensm/opensm/osm_ucast_file.c | 19 ++--- > opensm/opensm/osm_ucast_ftree.c | 35 ++++------ > opensm/opensm/osm_ucast_lash.c | 16 ++-- > opensm/opensm/osm_ucast_mgr.c | 119 +++++++++++++++++++++----------- > opensm/opensm/osm_ucast_updn.c | 10 ++-- > 12 files changed, 226 insertions(+), 142 deletions(-) > > diff --git a/opensm/include/opensm/osm_opensm.h b/opensm/include/opensm/osm_opensm.h > index 5d45724..c121be4 100644 > --- a/opensm/include/opensm/osm_opensm.h > +++ b/opensm/include/opensm/osm_opensm.h > @@ -126,6 +126,7 @@ struct osm_routing_engine { > int (*ucast_build_fwd_tables) (void *context); > void (*ucast_dump_tables) (void *context); > void (*delete) (void *context); > + struct osm_routing_engine *next; > }; > /* > * FIELDS > @@ -148,6 +149,9 @@ struct osm_routing_engine { > * delete > * The delete method, may be used for routing engine > * internals cleanup. > +* > +* next > +* Pointer to next routing engine in the list. 
> */ > > /****s* OpenSM: OpenSM/osm_opensm_t > @@ -178,7 +182,7 @@ typedef struct osm_opensm { > osm_log_t log; > cl_dispatcher_t disp; > cl_plock_t lock; > - struct osm_routing_engine routing_engine; > + struct osm_routing_engine *routing_engine_list; > osm_routing_engine_type_t routing_engine_used; > osm_stats_t stats; > osm_console_t console; > @@ -221,8 +225,8 @@ typedef struct osm_opensm { > * lock > * Shared lock guarding most OpenSM structures. > * > -* routing_engine > -* Routing engine; will be initialized then used. > +* routing_engine_list > +* List of routing engines that should be tried for use. > * > * routing_engine_used > * Indicates which routing engine was used to route a subnet. > diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h > index f90f7ea..0c7f3b9 100644 > --- a/opensm/include/opensm/osm_subnet.h > +++ b/opensm/include/opensm/osm_subnet.h > @@ -182,7 +182,7 @@ typedef struct osm_subn_opt { > char *port_prof_ignore_file; > boolean_t port_profile_switch_nodes; > boolean_t sweep_on_trap; > - char *routing_engine_name; > + char *routing_engine_names; > boolean_t connect_roots; > char *lid_matrix_dump_file; > char *lfts_file; > @@ -353,9 +353,8 @@ typedef struct osm_subn_opt { > * sweep_on_trap > * Received traps will initiate a new sweep. > * > -* routing_engine_name > -* Name of used routing engine > -* (other than default Min Hop Algorithm) > +* routing_engine_names > +* Name of routing engine(s) to use. 
> * > * connect_roots > * The option which will enforce root to root connectivity with > diff --git a/opensm/include/opensm/osm_ucast_mgr.h b/opensm/include/opensm/osm_ucast_mgr.h > index 1dc9a37..59ba9fa 100644 > --- a/opensm/include/opensm/osm_ucast_mgr.h > +++ b/opensm/include/opensm/osm_ucast_mgr.h > @@ -264,7 +264,7 @@ osm_ucast_mgr_set_fwd_table(IN osm_ucast_mgr_t * const p_mgr, > * > * SYNOPSIS > */ > -void osm_ucast_mgr_build_lid_matrices(IN osm_ucast_mgr_t * const p_mgr); > +int osm_ucast_mgr_build_lid_matrices(IN osm_ucast_mgr_t * const p_mgr); > /* > * PARAMETERS > * p_mgr > diff --git a/opensm/man/opensm.8.in b/opensm/man/opensm.8.in > index 565c5f8..6790d11 100644 > --- a/opensm/man/opensm.8.in > +++ b/opensm/man/opensm.8.in > @@ -9,7 +9,7 @@ opensm \- InfiniBand subnet manager and administration (SM/SA) > [\-F | \-\-config ] [\-c(reate-config) ] > [\-g(uid) ] [\-l(mc) ] > [\-p(riority) ] [\-smkey ] [\-r(eassign_lids)] > -[\-R | \-\-routing_engine ] > +[\-R | \-\-routing_engine ] > [\-z | \-\-connect_roots] > [\-M | \-\-lid_matrix_file ] > [\-U | \-\-lfts_file ] > @@ -116,8 +116,10 @@ Without -r, OpenSM attempts to preserve existing > LID assignments resolving multiple use of same LID. > .TP > \fB\-R\fR, \fB\-\-routing_engine\fR > -This option chooses routing engine instead of Min Hop > -algorithm (default). > +This option chooses routing engine(s) to use instead of Min Hop > +algorithm (default). Multiple routing engines can be specified > +separated by commas so that specific ordering of routing algorithms > +will be tried if earlier routing engines fail. 
> Supported engines: minhop, updn, file, ftree, lash, dor > .TP > \fB\-z\fR, \fB\-\-connect_roots\fR > diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c > index 01bfddf..2f53157 100644 > --- a/opensm/opensm/main.c > +++ b/opensm/opensm/main.c > @@ -177,8 +177,10 @@ static void show_usage(void) > " LID assignments resolving multiple use of same LID.\n\n"); > printf("-R\n" > "--routing_engine \n" > - " This option chooses routing engine instead of Min Hop\n" > - " algorithm (default).\n" > + " This option chooses routing engine(s) to use instead of default\n" > + " Min Hop algorithm. Multiple routing engines can be specified\n" > + " separated by commas so that specific ordering of routing\n" > + " algorithms will be tried if earlier routing engines fail.\n" > " Supported engines: updn, file, ftree, lash, dor\n\n"); > printf("-z\n" > "--connect_roots\n" > @@ -851,8 +853,8 @@ int main(int argc, char *argv[]) > break; > > case 'R': > - opt.routing_engine_name = optarg; > - printf(" Activate \'%s\' routing engine\n", optarg); > + opt.routing_engine_names = optarg; > + printf(" Activate \'%s\' routing engine(s)\n", optarg); > break; > > case 'z': > diff --git a/opensm/opensm/osm_opensm.c b/opensm/opensm/osm_opensm.c > index d17fed3..4970d0c 100644 > --- a/opensm/opensm/osm_opensm.c > +++ b/opensm/opensm/osm_opensm.c > @@ -61,24 +61,23 @@ > > struct routing_engine_module { > const char *name; > - int (*setup) (osm_opensm_t * p_osm); > + int (*setup) (struct osm_routing_engine *, osm_opensm_t *); > }; > > -extern int osm_ucast_updn_setup(osm_opensm_t * p_osm); > -extern int osm_ucast_file_setup(osm_opensm_t * p_osm); > -extern int osm_ucast_ftree_setup(osm_opensm_t * p_osm); > -extern int osm_ucast_lash_setup(osm_opensm_t * p_osm); > - > -static int osm_ucast_null_setup(osm_opensm_t * p_osm); > +extern int osm_ucast_minhop_setup(struct osm_routing_engine *, osm_opensm_t *); > +extern int osm_ucast_updn_setup(struct osm_routing_engine *, osm_opensm_t *); > +extern 
int osm_ucast_file_setup(struct osm_routing_engine *, osm_opensm_t *); > +extern int osm_ucast_ftree_setup(struct osm_routing_engine *, osm_opensm_t *); > +extern int osm_ucast_lash_setup(struct osm_routing_engine *, osm_opensm_t *); > +extern int osm_ucast_dor_setup(struct osm_routing_engine *, osm_opensm_t *); > > const static struct routing_engine_module routing_modules[] = { > - {"null", osm_ucast_null_setup}, Not sure how much legacy opensm.opts files are out there, but I kept the "null" routing engine in there just for safety. Is it ok to remove? > - {"minhop", osm_ucast_null_setup}, > + {"minhop", osm_ucast_minhop_setup}, > {"updn", osm_ucast_updn_setup}, > {"file", osm_ucast_file_setup}, > {"ftree", osm_ucast_ftree_setup}, > {"lash", osm_ucast_lash_setup}, > - {"dor", osm_ucast_null_setup}, > + {"dor", osm_ucast_dor_setup}, > {NULL, NULL} > }; > > @@ -135,33 +134,77 @@ osm_routing_engine_type_t osm_routing_engine_type(IN const char *str) > > /********************************************************************** > **********************************************************************/ > -static int setup_routing_engine(osm_opensm_t * p_osm, const char *name) > +static void append_routing_engine(osm_opensm_t *osm, > + struct osm_routing_engine *routing_engine) > { > - const struct routing_engine_module *r; > + struct osm_routing_engine *r; > + > + routing_engine->next = NULL; > + > + if (!osm->routing_engine_list) { > + osm->routing_engine_list = routing_engine; > + return; > + } > + > + r = osm->routing_engine_list; > + while (r->next) > + r = r->next; > > - for (r = routing_modules; r->name && *r->name; r++) { > - if (!strcmp(r->name, name)) { > - p_osm->routing_engine.name = r->name; > - if (r->setup(p_osm)) { > - OSM_LOG(&p_osm->log, OSM_LOG_VERBOSE, > + r->next = routing_engine; > +} > + > +static void setup_routing_engine(osm_opensm_t *osm, const char *name) > +{ > + struct osm_routing_engine *re; > + const struct routing_engine_module *m; > + > + for 
(m = routing_modules; m->name && *m->name; m++) { > + if (!strcmp(m->name, name)) { > + re = malloc(sizeof(struct osm_routing_engine)); > + if (!re) { > + OSM_LOG(&osm->log, OSM_LOG_VERBOSE, > + "memory allocation failed\n"); > + return; > + } > + memset(re, 0, sizeof(struct osm_routing_engine)); > + > + re->name = m->name; > + if (m->setup(re, osm)) { > + OSM_LOG(&osm->log, OSM_LOG_VERBOSE, > "setup of routing" > " engine \'%s\' failed\n", name); > - return -2; > + return; > } > - OSM_LOG(&p_osm->log, OSM_LOG_DEBUG, > - "\'%s\' routing engine set up\n", > - p_osm->routing_engine.name); > - return 0; > + OSM_LOG(&osm->log, OSM_LOG_DEBUG, > + "\'%s\' routing engine set up\n", re->name); > + append_routing_engine(osm, re); > + return; > } > } > - return -1; > + > + OSM_LOG(&osm->log, OSM_LOG_ERROR, > + "cannot find or setup routing engine \'%s\'", name); > } > > -static int osm_ucast_null_setup(osm_opensm_t * p_osm) > +static void setup_routing_engines(osm_opensm_t *osm, const char *engine_names) > { > - OSM_LOG(&p_osm->log, OSM_LOG_VERBOSE, > - "nothing yet - using default (minhop) routing engine\n"); > - return 0; > + char *name, *str, *p; > + > + if (!engine_names || !*engine_names) { > + setup_routing_engine(osm, "minhop"); > + return; > + } > + > + str = strdup(engine_names); > + name = strtok_r(str, ", \t\n", &p); > + while (name && *name) { > + setup_routing_engine(osm, name); > + name = strtok_r(NULL, ", \t\n", &p); > + } > + free(str); > + > + if (!osm->routing_engine_list) > + setup_routing_engine(osm, "minhop"); > } > > /********************************************************************** > @@ -181,6 +224,20 @@ void osm_opensm_construct(IN osm_opensm_t * const p_osm) > > /********************************************************************** > **********************************************************************/ > +static void destroy_routing_engines(osm_opensm_t *osm) > +{ > + struct osm_routing_engine *r, *next; > + > + next = 
osm->routing_engine_list; > + while (next) { > + r = next; > + next = r->next; > + if (r->delete) > + r->delete(r->context); > + free(r); > + } > +} > + > void osm_opensm_destroy(IN osm_opensm_t * const p_osm) > { > /* in case of shutdown through exit proc - no ^C */ > @@ -218,8 +275,7 @@ void osm_opensm_destroy(IN osm_opensm_t * const p_osm) > osm_sa_db_file_dump(p_osm); > > /* do the destruction in reverse order as init */ > - if (p_osm->routing_engine.delete) > - p_osm->routing_engine.delete(p_osm->routing_engine.context); > + destroy_routing_engines(p_osm); > osm_sa_destroy(&p_osm->sa); > osm_sm_destroy(&p_osm->sm); > #ifdef ENABLE_OSM_PERF_MGR > @@ -371,12 +427,7 @@ osm_opensm_init(IN osm_opensm_t * const p_osm, > goto Exit; > #endif /* ENABLE_OSM_PERF_MGR */ > > - if (p_opt->routing_engine_name && > - setup_routing_engine(p_osm, p_opt->routing_engine_name)) > - OSM_LOG(&p_osm->log, OSM_LOG_VERBOSE, > - "cannot find or setup routing engine" > - " \'%s\'. Default will be used instead\n", > - p_opt->routing_engine_name); > + setup_routing_engines(p_osm, p_opt->routing_engine_names); > > p_osm->routing_engine_used = OSM_ROUTING_ENGINE_TYPE_NONE; > > diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c > index 278aa3d..a39ce75 100644 > --- a/opensm/opensm/osm_subnet.c > +++ b/opensm/opensm/osm_subnet.c > @@ -442,7 +442,7 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * const p_opt) > p_opt->port_prof_ignore_file = NULL; > p_opt->port_profile_switch_nodes = FALSE; > p_opt->sweep_on_trap = TRUE; > - p_opt->routing_engine_name = NULL; > + p_opt->routing_engine_names = NULL; > p_opt->connect_roots = FALSE; > p_opt->lid_matrix_dump_file = NULL; > p_opt->lfts_file = NULL; > @@ -1264,7 +1264,7 @@ int osm_subn_parse_conf_file(char *file_name, osm_subn_opt_t * const p_opts) > p_key, p_val, &p_opts->sweep_on_trap); > > opts_unpack_charp("routing_engine", > - p_key, p_val, &p_opts->routing_engine_name); > + p_key, p_val, &p_opts->routing_engine_names); > 
> opts_unpack_boolean("connect_roots", > p_key, p_val, &p_opts->connect_roots); > @@ -1521,9 +1521,12 @@ int osm_subn_write_conf_file(char *file_name, IN osm_subn_opt_t *const p_opts) > > fprintf(opts_file, > "# Routing engine\n" > + "# Multiple routing engines can be specified separated by\n" > + "# commas so that specific ordering of routing algorithms will\n" > + "# be tried if earlier routing engines fail.\n" > "# Supported engines: minhop, updn, file, ftree, lash, dor\n" > - "routing_engine %s\n\n", p_opts->routing_engine_name ? > - p_opts->routing_engine_name : null_str); > + "routing_engine %s\n\n", p_opts->routing_engine_names ? > + p_opts->routing_engine_names : null_str); > > fprintf(opts_file, > "# Connect roots (use FALSE if unsure)\n" > diff --git a/opensm/opensm/osm_ucast_file.c b/opensm/opensm/osm_ucast_file.c > index 3d00cb2..cbd65c1 100644 > --- a/opensm/opensm/osm_ucast_file.c > +++ b/opensm/opensm/osm_ucast_file.c > @@ -135,14 +135,13 @@ static int do_ucast_file_load(void *context) > OSM_LOG(&p_osm->log, OSM_LOG_VERBOSE, > "LFTs file name is not given; " > "using default routing algorithm\n"); > - return -1; > + return 1; > } > > file = fopen(file_name, "r"); > if (!file) { > OSM_LOG(&p_osm->log, OSM_LOG_ERROR | OSM_LOG_SYS, "ERR 6302: " > - "cannot open ucast dump file \'%s\'; " > - "using default routing algorithm\n", file_name); > + "cannot open ucast dump file \'%s\': %m\n", file_name); > return -1; > } > > @@ -270,15 +269,13 @@ static int do_lid_matrix_file_load(void *context) > OSM_LOG(&p_osm->log, OSM_LOG_VERBOSE, > "lid matrix file name is not given; " > "using default lid matrix generation algorithm\n"); > - return -1; > + return 1; > } > > file = fopen(file_name, "r"); > if (!file) { > OSM_LOG(&p_osm->log, OSM_LOG_ERROR | OSM_LOG_SYS, "ERR 6305: " > - "cannot open lid matrix file \'%s\'; " > - "using default lid matrix generation algorithm\n", > - file_name); > + "cannot open lid matrix file \'%s\': %m\n", file_name); > return -1; > } > 
> @@ -389,10 +386,10 @@ static int do_lid_matrix_file_load(void *context) > return 0; > } > > -int osm_ucast_file_setup(osm_opensm_t * p_osm) > +int osm_ucast_file_setup(struct osm_routing_engine *r, osm_opensm_t *osm) > { > - p_osm->routing_engine.context = (void *)p_osm; > - p_osm->routing_engine.build_lid_matrices = do_lid_matrix_file_load; > - p_osm->routing_engine.ucast_build_fwd_tables = do_ucast_file_load; > + r->context = osm; > + r->build_lid_matrices = do_lid_matrix_file_load; > + r->ucast_build_fwd_tables = do_ucast_file_load; > return 0; > } > diff --git a/opensm/opensm/osm_ucast_ftree.c b/opensm/opensm/osm_ucast_ftree.c > index 1d3233c..15168b7 100644 > --- a/opensm/opensm/osm_ucast_ftree.c > +++ b/opensm/opensm/osm_ucast_ftree.c > @@ -3552,8 +3552,7 @@ static int __osm_ftree_construct_fabric(IN void *context) > OSM_LOG(&p_ftree->p_osm->log, OSM_LOG_VERBOSE, "Ranking FatTree\n"); > if (__osm_ftree_fabric_rank(p_ftree) != 0) { > osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, > - "Failed ranking the tree - " > - "fat-tree routing falls back to default routing\n"); > + "Failed ranking the tree\n"); > status = -1; > goto Exit; > } > @@ -3567,14 +3566,12 @@ static int __osm_ftree_construct_fabric(IN void *context) > "Populating CA & switch ports\n"); > if (__osm_ftree_fabric_populate_ports(p_ftree) != 0) { > osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, > - "Fabric topology is not a fat-tree - " > - "routing falls back to default routing\n"); > + "Fabric topology is not a fat-tree\n"); > status = -1; > goto Exit; > } else if (p_ftree->cn_num == 0) { > osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, > - "Fabric has no valid compute nodes - " > - "routing falls back to default routing\n"); > + "Fabric has no valid compute nodes\n"); > status = -1; > goto Exit; > } > @@ -3586,8 +3583,7 @@ static int __osm_ftree_construct_fabric(IN void *context) > if (__osm_ftree_fabric_get_rank(p_ftree) > FAT_TREE_MAX_RANK || > __osm_ftree_fabric_get_rank(p_ftree) < FAT_TREE_MIN_RANK) { 
> osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, > - "Fabric rank is %u (should be between %u and %u) - " > - "fat-tree routing falls back to default routing\n", > + "Fabric rank is %u (should be between %u and %u)\n", > __osm_ftree_fabric_get_rank(p_ftree), FAT_TREE_MIN_RANK, > FAT_TREE_MAX_RANK); > status = -1; > @@ -3600,8 +3596,7 @@ static int __osm_ftree_construct_fabric(IN void *context) > validation - it checks that all the CNs are at the same rank. */ > if (__osm_ftree_fabric_mark_leaf_switches(p_ftree)) { > osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, > - "Fabric topology is not a fat-tree - " > - "routing falls back to default routing\n"); > + "Fabric topology is not a fat-tree\n"); > status = -1; > goto Exit; > } > @@ -3619,8 +3614,7 @@ static int __osm_ftree_construct_fabric(IN void *context) > In any case, the first and the last switches in the array are REAL leafs. */ > if (__osm_ftree_fabric_create_leaf_switch_array(p_ftree)) { > osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, > - "Fabric topology is not a fat-tree - " > - "routing falls back to default routing\n"); > + "Fabric topology is not a fat-tree\n"); > status = -1; > goto Exit; > } > @@ -3640,8 +3634,7 @@ static int __osm_ftree_construct_fabric(IN void *context) > if (!__osm_ftree_fabric_roots_provided(p_ftree) && > !__osm_ftree_fabric_validate_topology(p_ftree)) { > osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, > - "Fabric topology is not a fat-tree - " > - "routing falls back to default routing\n"); > + "Fabric topology is not a fat-tree\n"); > status = -1; > goto Exit; > } > @@ -3726,7 +3719,7 @@ static void __osm_ftree_delete(IN void *context) > /*************************************************** > ***************************************************/ > > -int osm_ucast_ftree_setup(osm_opensm_t * p_osm) > +int osm_ucast_ftree_setup(struct osm_routing_engine *r, osm_opensm_t * p_osm) > { > ftree_fabric_t *p_ftree = __osm_ftree_fabric_create(); > if (!p_ftree) > @@ -3734,12 +3727,10 @@ int 
osm_ucast_ftree_setup(osm_opensm_t * p_osm) > > p_ftree->p_osm = p_osm; > > - p_osm->routing_engine.context = (void *)p_ftree; > - p_osm->routing_engine.build_lid_matrices = __osm_ftree_construct_fabric; > - p_osm->routing_engine.ucast_build_fwd_tables = __osm_ftree_do_routing; > - p_osm->routing_engine.delete = __osm_ftree_delete; > + r->context = (void *)p_ftree; > + r->build_lid_matrices = __osm_ftree_construct_fabric; > + r->ucast_build_fwd_tables = __osm_ftree_do_routing; > + r->delete = __osm_ftree_delete; > + > return 0; > } > - > -/*************************************************** > - ***************************************************/ > diff --git a/opensm/opensm/osm_ucast_lash.c b/opensm/opensm/osm_ucast_lash.c > index b985e9a..ce3982f 100644 > --- a/opensm/opensm/osm_ucast_lash.c > +++ b/opensm/opensm/osm_ucast_lash.c > @@ -785,7 +785,7 @@ static int init_lash_structures(lash_t * p_lash) > unsigned vl_min = p_lash->vl_min; > unsigned num_switches = p_lash->num_switches; > osm_log_t *p_log = &p_lash->p_osm->log; > - int status = IB_SUCCESS; > + int status = 0; > unsigned int i, j, k; > > OSM_LOG_ENTER(p_log); > @@ -852,7 +852,7 @@ static int init_lash_structures(lash_t * p_lash) > goto Exit; > > Exit_Mem_Error: > - status = IB_ERROR; > + status = -1; > OSM_LOG(p_log, OSM_LOG_ERROR, "ERR 4D01: " > "Could not allocate required memory for LASH errno %d, errno %d for lack of memory\n", > errno, ENOMEM); > @@ -875,7 +875,7 @@ static int lash_core(lash_t * p_lash) > int stop = 0, output_link, i_next_switch; > int output_link2, i_next_switch2; > int cycle_found2 = 0; > - int status = IB_SUCCESS; > + int status = 0; > int *switch_bitmap = NULL; /* Bitmap to check if we have processed this pair */ > > OSM_LOG_ENTER(p_log); > @@ -1028,7 +1028,7 @@ static int lash_core(lash_t * p_lash) > goto Exit; > > Error_Not_Enough_Lanes: > - status = IB_ERROR; > + status = -1; > OSM_LOG(p_log, OSM_LOG_ERROR, "ERR 4D02: " > "Lane requirements (%d) exceed available lanes 
(%d)\n", > p_lash->vl_min, lanes_needed); > @@ -1360,15 +1360,15 @@ uint8_t osm_get_lash_sl(osm_opensm_t * p_osm, osm_port_t * p_src_port, > return (uint8_t) ((switch_t *) p_sw->priv)->routing_table[dst_id].lane; > } > > -int osm_ucast_lash_setup(osm_opensm_t * p_osm) > +int osm_ucast_lash_setup(struct osm_routing_engine *r, osm_opensm_t *p_osm) > { > lash_t *p_lash = lash_create(p_osm); > if (!p_lash) > return -1; > > - p_osm->routing_engine.context = p_lash; > - p_osm->routing_engine.ucast_build_fwd_tables = lash_process; > - p_osm->routing_engine.delete = lash_delete; > + r->context = p_lash; > + r->ucast_build_fwd_tables = lash_process; > + r->delete = lash_delete; > > return 0; > } > diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c > index 9d0ad13..935846c 100644 > --- a/opensm/opensm/osm_ucast_mgr.c > +++ b/opensm/opensm/osm_ucast_mgr.c > @@ -216,7 +216,6 @@ __osm_ucast_mgr_process_port(IN osm_ucast_mgr_t * const p_mgr, > uint8_t port; > boolean_t is_ignored_by_port_prof; > ib_net64_t node_guid; > - struct osm_routing_engine *p_routing_eng; > unsigned start_from = 1; > > OSM_LOG_ENTER(p_mgr->p_log); > @@ -253,8 +252,6 @@ __osm_ucast_mgr_process_port(IN osm_ucast_mgr_t * const p_mgr, > > node_guid = osm_node_get_node_guid(p_sw->p_node); > > - p_routing_eng = &p_mgr->p_subn->p_osm->routing_engine; > - > /* > The lid matrix contains the number of hops to each > lid from each port. From this information we determine > @@ -269,18 +266,9 @@ __osm_ucast_mgr_process_port(IN osm_ucast_mgr_t * const p_mgr, > /* do not try to overwrite the ppro of non existing port ... 
*/ > is_ignored_by_port_prof = TRUE; > > - /* Up/Down routing can cause unreachable routes between some > - switches so we do not report that as an error in that case */ > - if (!p_routing_eng->build_lid_matrices) { > - OSM_LOG(p_mgr->p_log, OSM_LOG_ERROR, "ERR 3A08: " > - "No path to get to LID %u from switch 0x%" > - PRIx64 "\n", lid_ho, cl_ntoh64(node_guid)); > - /* trigger a new sweep - try again ... */ > - p_mgr->p_subn->subnet_initialization_error = TRUE; > - } else > - OSM_LOG(p_mgr->p_log, OSM_LOG_DEBUG, > - "No path to get to LID %u from switch 0x%" > - PRIx64 "\n", lid_ho, cl_ntoh64(node_guid)); > + OSM_LOG(p_mgr->p_log, OSM_LOG_DEBUG, > + "No path to get to LID %u from switch 0x%" PRIx64 "\n", > + lid_ho, cl_ntoh64(node_guid)); > } else { > osm_physp_t *p = osm_node_get_physp_ptr(p_sw->p_node, port); > > @@ -583,7 +571,7 @@ __osm_ucast_mgr_process_neighbors(IN cl_map_item_t * const p_map_item, > > /********************************************************************** > **********************************************************************/ > -void osm_ucast_mgr_build_lid_matrices(IN osm_ucast_mgr_t * const p_mgr) > +int osm_ucast_mgr_build_lid_matrices(IN osm_ucast_mgr_t * const p_mgr) > { > uint32_t i; > uint32_t iteration_max; > @@ -646,6 +634,8 @@ void osm_ucast_mgr_build_lid_matrices(IN osm_ucast_mgr_t * const p_mgr) > OSM_LOG(p_mgr->p_log, OSM_LOG_DEBUG, > "Min-hop propagated in %d steps\n", i); > } > + > + return 0; > } > > /********************************************************************** > @@ -752,7 +742,7 @@ static void clear_prof_ignore_flag(cl_map_item_t * const p_map_item, void *ctx) > } > } > > -static void ucast_mgr_build_lfts(osm_ucast_mgr_t *p_mgr) > +static int ucast_mgr_build_lfts(osm_ucast_mgr_t *p_mgr) > { > cl_qlist_init(&p_mgr->port_order_list); > > @@ -786,27 +776,56 @@ static void ucast_mgr_build_lfts(osm_ucast_mgr_t *p_mgr) > __osm_ucast_mgr_process_tbl, p_mgr); > > cl_qlist_remove_all(&p_mgr->port_order_list); > + > + 
return 0; > } > > /********************************************************************** > **********************************************************************/ > +static int ucast_mgr_route(struct osm_routing_engine *r, osm_opensm_t *osm) > +{ > + int ret; > + > + OSM_LOG(&osm->log, OSM_LOG_VERBOSE, > + "building routing with \'%s\' routing algorithm...\n", r->name); > + > + if (!r->build_lid_matrices || > + (ret = r->build_lid_matrices(r->context)) > 0) > + ret = osm_ucast_mgr_build_lid_matrices(&osm->sm.ucast_mgr); > + > + if (ret < 0) { > + OSM_LOG(&osm->log, OSM_LOG_ERROR, > + "%s: cannot build lid matrices.\n", r->name); > + return ret; > + } > + > + if (!r->ucast_build_fwd_tables || > + (ret = r->ucast_build_fwd_tables(r->context)) > 0) > + ret = ucast_mgr_build_lfts(&osm->sm.ucast_mgr); > + > + if (ret < 0) { > + OSM_LOG(&osm->log, OSM_LOG_ERROR, > + "%s: cannot build fwd tables.\n", r->name); > + return ret; > + } > + > + osm->routing_engine_used = osm_routing_engine_type(r->name); > + > + return 0; > +} > + > osm_signal_t osm_ucast_mgr_process(IN osm_ucast_mgr_t * const p_mgr) > { > osm_opensm_t *p_osm; > struct osm_routing_engine *p_routing_eng; > osm_signal_t signal = OSM_SIGNAL_DONE; > cl_qmap_t *p_sw_guid_tbl; > - int blm = 0; > - int ubft = 0; > > OSM_LOG_ENTER(p_mgr->p_log); > > p_sw_guid_tbl = &p_mgr->p_subn->sw_guid_tbl; > p_osm = p_mgr->p_subn->p_osm; > - p_routing_eng = &p_osm->routing_engine; > - > - p_mgr->is_dor = p_routing_eng->name > - && (strcmp(p_routing_eng->name, "dor") == 0); > + p_routing_eng = p_osm->routing_engine_list; > > CL_PLOCK_EXCL_ACQUIRE(p_mgr->p_lock); > > @@ -819,28 +838,19 @@ osm_signal_t osm_ucast_mgr_process(IN osm_ucast_mgr_t * const p_mgr) > > p_mgr->any_change = FALSE; > > - if (!p_routing_eng->build_lid_matrices || > - (blm = p_routing_eng->build_lid_matrices(p_routing_eng->context))) > - osm_ucast_mgr_build_lid_matrices(p_mgr); > + p_osm->routing_engine_used = OSM_ROUTING_ENGINE_TYPE_NONE; > + while 
(p_routing_eng) { > + if (!ucast_mgr_route(p_routing_eng, p_osm)) > + break; > + p_routing_eng = p_routing_eng->next; > + } > > - /* > - Now that the lid matrices have been built, we can > - build and download the switch forwarding tables. > - */ > - if (!p_routing_eng->ucast_build_fwd_tables || > - (ubft = > - p_routing_eng->ucast_build_fwd_tables(p_routing_eng->context))) > + if (p_osm->routing_engine_used == OSM_ROUTING_ENGINE_TYPE_NONE) { > + /* If configured routing algorithm failed, use default MinHop */ > + osm_ucast_mgr_build_lid_matrices(p_mgr); > ucast_mgr_build_lfts(p_mgr); > - > - /* 'file' routing engine has one unique logic corner case */ > - if (p_routing_eng->name && (strcmp(p_routing_eng->name, "file") == 0) > - && (!blm || !ubft)) > - p_osm->routing_engine_used = OSM_ROUTING_ENGINE_TYPE_FILE; > - else if (!blm && !ubft) > - p_osm->routing_engine_used = > - osm_routing_engine_type(p_routing_eng->name); > - else > p_osm->routing_engine_used = OSM_ROUTING_ENGINE_TYPE_MINHOP; > + } > > OSM_LOG(p_mgr->p_log, OSM_LOG_INFO, > "%s tables configured on all switches\n", > @@ -861,3 +871,28 @@ Exit: > OSM_LOG_EXIT(p_mgr->p_log); > return (signal); > } > + > +static int ucast_build_lid_matrices(void *context) > +{ > + return osm_ucast_mgr_build_lid_matrices(context); > +} > + > +static int ucast_build_lfts(void *context) > +{ > + return ucast_mgr_build_lfts(context); > +} > + > +int osm_ucast_minhop_setup(struct osm_routing_engine *r, osm_opensm_t *osm) > +{ > + r->context = &osm->sm.ucast_mgr; > + r->build_lid_matrices = ucast_build_lid_matrices; > + r->ucast_build_fwd_tables = ucast_build_lfts; > + return 0; > +} > + > +int osm_ucast_dor_setup(struct osm_routing_engine *r, osm_opensm_t *osm) > +{ > + osm_ucast_minhop_setup(r, osm); > + osm->sm.ucast_mgr.is_dor = 1; If dor is listed in the routing chain, all other algorithms that may fall-through into minhop's build_lfts callback (minhop, updn, file), will be affected by the is_dor flag. Is this intended? 
If we don't want to abstract it for this round, perhaps we could stick the "is_dor" flag set/unset into ucast_mgr_route() so that is_dor is set only when dor is being routed. > + return 0; > +} > diff --git a/opensm/opensm/osm_ucast_updn.c b/opensm/opensm/osm_ucast_updn.c > index 90e9af8..4fdcc78 100644 > --- a/opensm/opensm/osm_ucast_updn.c > +++ b/opensm/opensm/osm_ucast_updn.c > @@ -643,7 +643,7 @@ static int __osm_updn_call(void *ctx) > } else { > OSM_LOG(&p_updn->p_osm->log, OSM_LOG_INFO, > "disabling UPDN algorithm, no root nodes were found\n"); > - ret = 1; > + ret = -1; > } > > if (osm_log_is_active(&p_updn->p_osm->log, OSM_LOG_ROUTING)) > @@ -669,7 +669,7 @@ static void __osm_updn_delete(void *context) > free(context); > } > > -int osm_ucast_updn_setup(osm_opensm_t * p_osm) > +int osm_ucast_updn_setup(struct osm_routing_engine *r, osm_opensm_t *p_osm) > { > updn_t *p_updn; > > @@ -680,9 +680,9 @@ int osm_ucast_updn_setup(osm_opensm_t * p_osm) > > p_updn->p_osm = p_osm; > > - p_osm->routing_engine.context = p_updn; > - p_osm->routing_engine.delete = __osm_updn_delete; > - p_osm->routing_engine.build_lid_matrices = __osm_updn_call; > + r->context = p_updn; > + r->delete = __osm_updn_delete; > + r->build_lid_matrices = __osm_updn_call; > > return 0; > } The patch looks fine as a whole. Thanks, Al -- Albert Chu chu11 at llnl.gov Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory From rdreier at cisco.com Mon Sep 29 20:24:39 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 29 Sep 2008 20:24:39 -0700 Subject: [ofa-general] Re: [PATCH v2] ipoib: fix hang while bringing down uninitialized interface In-Reply-To: <48CEA6DC.9000904@gmail.com> (Yossi Etigin's message of "Mon, 15 Sep 2008 21:18:04 +0300") References: <48CEA6DC.9000904@gmail.com> Message-ID: > - handle a case when ipoib_ib_dev_stop() is called twice on the > same dev->priv - zero the timer after its deletion.
I don't understand why this is an issue and why: > + /* Make sure the timer was initialized */ > + if (priv->poll_timer.function) { > + del_timer_sync(&priv->poll_timer); > + memset(&priv->poll_timer, 0, sizeof priv->poll_timer); this memset is needed. If the timer isn't pending, isn't del_timer_sync() just a no-op? What am I missing? - R. From rdreier at cisco.com Mon Sep 29 21:24:12 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 29 Sep 2008 21:24:12 -0700 Subject: [ofa-general] Re: [PATCH] IB/ipath: ib_ipath module hangs on unload In-Reply-To: <1221865383-29438-1-git-send-email-yannick.cote@qlogic.com> (yannick cote's message of "Fri, 19 Sep 2008 16:03:03 -0700") References: <1221865383-29438-1-git-send-email-yannick.cote@qlogic.com> Message-ID: thanks, applied From rdreier at cisco.com Mon Sep 29 21:41:37 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 29 Sep 2008 21:41:37 -0700 Subject: [ofa-general] [PATCH/RFC] IB/mthca: Use pci_request_regions() Message-ID: Back in prehistoric (pre-git!) days, the kernel's MSI-X support did request_mem_region() on a device's MSI-X tables, which meant that a driver that enabled MSI-X couldn't use pci_request_regions() (since that would clash with the PCI layer's MSI-X request). However, that was removed (by me!) years ago, so mthca can just use pci_request_regions() and pci_release_regions() instead of its own much more complicated code that avoids requesting the MSI-X tables. 
Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mthca/mthca_catas.c | 15 +------- drivers/infiniband/hw/mthca/mthca_eq.c | 51 +++++-------------------- drivers/infiniband/hw/mthca/mthca_main.c | 59 +--------------------------- 3 files changed, 14 insertions(+), 111 deletions(-) diff --git a/drivers/infiniband/hw/mthca/mthca_catas.c b/drivers/infiniband/hw/mthca/mthca_catas.c index cc440f9..65ad359 100644 --- a/drivers/infiniband/hw/mthca/mthca_catas.c +++ b/drivers/infiniband/hw/mthca/mthca_catas.c @@ -149,18 +149,10 @@ void mthca_start_catas_poll(struct mthca_dev *dev) ((pci_resource_len(dev->pdev, 0) - 1) & dev->catas_err.addr); - if (!request_mem_region(addr, dev->catas_err.size * 4, - DRV_NAME)) { - mthca_warn(dev, "couldn't request catastrophic error region " - "at 0x%lx/0x%x\n", addr, dev->catas_err.size * 4); - return; - } - dev->catas_err.map = ioremap(addr, dev->catas_err.size * 4); if (!dev->catas_err.map) { mthca_warn(dev, "couldn't map catastrophic error region " "at 0x%lx/0x%x\n", addr, dev->catas_err.size * 4); - release_mem_region(addr, dev->catas_err.size * 4); return; } @@ -175,13 +167,8 @@ void mthca_stop_catas_poll(struct mthca_dev *dev) { del_timer_sync(&dev->catas_err.timer); - if (dev->catas_err.map) { + if (dev->catas_err.map) iounmap(dev->catas_err.map); - release_mem_region(pci_resource_start(dev->pdev, 0) + - ((pci_resource_len(dev->pdev, 0) - 1) & - dev->catas_err.addr), - dev->catas_err.size * 4); - } spin_lock_irq(&catas_lock); list_del(&dev->catas_err.list); diff --git a/drivers/infiniband/hw/mthca/mthca_eq.c b/drivers/infiniband/hw/mthca/mthca_eq.c index cc6858f..28f0e0c 100644 --- a/drivers/infiniband/hw/mthca/mthca_eq.c +++ b/drivers/infiniband/hw/mthca/mthca_eq.c @@ -652,27 +652,13 @@ static int mthca_map_reg(struct mthca_dev *dev, { unsigned long base = pci_resource_start(dev->pdev, 0); - if (!request_mem_region(base + offset, size, DRV_NAME)) - return -EBUSY; - *map = ioremap(base + offset, size); - if (!*map) { - 
release_mem_region(base + offset, size); + if (!*map) return -ENOMEM; - } return 0; } -static void mthca_unmap_reg(struct mthca_dev *dev, unsigned long offset, - unsigned long size, void __iomem *map) -{ - unsigned long base = pci_resource_start(dev->pdev, 0); - - release_mem_region(base + offset, size); - iounmap(map); -} - static int mthca_map_eq_regs(struct mthca_dev *dev) { if (mthca_is_memfree(dev)) { @@ -699,9 +685,7 @@ static int mthca_map_eq_regs(struct mthca_dev *dev) dev->fw.arbel.eq_arm_base) + 4, 4, &dev->eq_regs.arbel.eq_arm)) { mthca_err(dev, "Couldn't map EQ arm register, aborting.\n"); - mthca_unmap_reg(dev, (pci_resource_len(dev->pdev, 0) - 1) & - dev->fw.arbel.clr_int_base, MTHCA_CLR_INT_SIZE, - dev->clr_base); + iounmap(dev->clr_base); return -ENOMEM; } @@ -710,12 +694,8 @@ static int mthca_map_eq_regs(struct mthca_dev *dev) MTHCA_EQ_SET_CI_SIZE, &dev->eq_regs.arbel.eq_set_ci_base)) { mthca_err(dev, "Couldn't map EQ CI register, aborting.\n"); - mthca_unmap_reg(dev, ((pci_resource_len(dev->pdev, 0) - 1) & - dev->fw.arbel.eq_arm_base) + 4, 4, - dev->eq_regs.arbel.eq_arm); - mthca_unmap_reg(dev, (pci_resource_len(dev->pdev, 0) - 1) & - dev->fw.arbel.clr_int_base, MTHCA_CLR_INT_SIZE, - dev->clr_base); + iounmap(dev->eq_regs.arbel.eq_arm); + iounmap(dev->clr_base); return -ENOMEM; } } else { @@ -731,8 +711,7 @@ static int mthca_map_eq_regs(struct mthca_dev *dev) &dev->eq_regs.tavor.ecr_base)) { mthca_err(dev, "Couldn't map ecr register, " "aborting.\n"); - mthca_unmap_reg(dev, MTHCA_CLR_INT_BASE, MTHCA_CLR_INT_SIZE, - dev->clr_base); + iounmap(dev->clr_base); return -ENOMEM; } } @@ -744,22 +723,12 @@ static int mthca_map_eq_regs(struct mthca_dev *dev) static void mthca_unmap_eq_regs(struct mthca_dev *dev) { if (mthca_is_memfree(dev)) { - mthca_unmap_reg(dev, (pci_resource_len(dev->pdev, 0) - 1) & - dev->fw.arbel.eq_set_ci_base, - MTHCA_EQ_SET_CI_SIZE, - dev->eq_regs.arbel.eq_set_ci_base); - mthca_unmap_reg(dev, ((pci_resource_len(dev->pdev, 0) - 1) & 
- dev->fw.arbel.eq_arm_base) + 4, 4, - dev->eq_regs.arbel.eq_arm); - mthca_unmap_reg(dev, (pci_resource_len(dev->pdev, 0) - 1) & - dev->fw.arbel.clr_int_base, MTHCA_CLR_INT_SIZE, - dev->clr_base); + iounmap(dev->eq_regs.arbel.eq_set_ci_base); + iounmap(dev->eq_regs.arbel.eq_arm); + iounmap(dev->clr_base); } else { - mthca_unmap_reg(dev, MTHCA_ECR_BASE, - MTHCA_ECR_SIZE + MTHCA_ECR_CLR_SIZE, - dev->eq_regs.tavor.ecr_base); - mthca_unmap_reg(dev, MTHCA_CLR_INT_BASE, MTHCA_CLR_INT_SIZE, - dev->clr_base); + iounmap(dev->eq_regs.tavor.ecr_base); + iounmap(dev->clr_base); } } diff --git a/drivers/infiniband/hw/mthca/mthca_main.c b/drivers/infiniband/hw/mthca/mthca_main.c index fb9f91b..52f60f4 100644 --- a/drivers/infiniband/hw/mthca/mthca_main.c +++ b/drivers/infiniband/hw/mthca/mthca_main.c @@ -921,58 +921,6 @@ err_uar_table_free: return err; } -static int mthca_request_regions(struct pci_dev *pdev, int ddr_hidden) -{ - int err; - - /* - * We can't just use pci_request_regions() because the MSI-X - * table is right in the middle of the first BAR. If we did - * pci_request_region and grab all of the first BAR, then - * setting up MSI-X would fail, since the PCI core wants to do - * request_mem_region on the MSI-X vector table. - * - * So just request what we need right now, and request any - * other regions we need when setting up EQs. 
- */ - if (!request_mem_region(pci_resource_start(pdev, 0) + MTHCA_HCR_BASE, - MTHCA_HCR_SIZE, DRV_NAME)) - return -EBUSY; - - err = pci_request_region(pdev, 2, DRV_NAME); - if (err) - goto err_bar2_failed; - - if (!ddr_hidden) { - err = pci_request_region(pdev, 4, DRV_NAME); - if (err) - goto err_bar4_failed; - } - - return 0; - -err_bar4_failed: - pci_release_region(pdev, 2); - -err_bar2_failed: - release_mem_region(pci_resource_start(pdev, 0) + MTHCA_HCR_BASE, - MTHCA_HCR_SIZE); - - return err; -} - -static void mthca_release_regions(struct pci_dev *pdev, - int ddr_hidden) -{ - if (!ddr_hidden) - pci_release_region(pdev, 4); - - pci_release_region(pdev, 2); - - release_mem_region(pci_resource_start(pdev, 0) + MTHCA_HCR_BASE, - MTHCA_HCR_SIZE); -} - static int mthca_enable_msi_x(struct mthca_dev *mdev) { struct msix_entry entries[3]; @@ -1059,7 +1007,7 @@ static int __mthca_init_one(struct pci_dev *pdev, int hca_type) if (!(pci_resource_flags(pdev, 4) & IORESOURCE_MEM)) ddr_hidden = 1; - err = mthca_request_regions(pdev, ddr_hidden); + err = pci_request_regions(pdev, DRV_NAME); if (err) { dev_err(&pdev->dev, "Cannot obtain PCI resources, " "aborting.\n"); @@ -1196,7 +1144,7 @@ err_free_dev: ib_dealloc_device(&mdev->ib_dev); err_free_res: - mthca_release_regions(pdev, ddr_hidden); + pci_release_regions(pdev); err_disable_pdev: pci_disable_device(pdev); @@ -1240,8 +1188,7 @@ static void __mthca_remove_one(struct pci_dev *pdev) pci_disable_msix(pdev); ib_dealloc_device(&mdev->ib_dev); - mthca_release_regions(pdev, mdev->mthca_flags & - MTHCA_FLAG_DDR_HIDDEN); + pci_release_regions(pdev); pci_disable_device(pdev); pci_set_drvdata(pdev, NULL); } -- 1.6.0.1 From keshetti.mahesh at gmail.com Mon Sep 29 23:04:41 2008 From: keshetti.mahesh at gmail.com (Keshetti Mahesh) Date: Tue, 30 Sep 2008 11:34:41 +0530 Subject: [ofa-general] ***SPAM*** ibdm network topology format In-Reply-To: References: <829ded920809290139vf2cc151w4cc8a6fafb49edfe@mail.gmail.com> Message-ID: 
<829ded920809292304k3ffc78c0m556efbdd7d35c528@mail.gmail.com> Thanks Hal for the reply. > From what I know, I think it might be easier to go in the other > direction as ibdm has more information not obtained by IB means > whereas ibnetdiscover only uses IB obtained information. Also, > ibnetdiscover -g is closer to what ibdm does. ibnetdiscover -g relies > on the system image GUID for grouping information. While going through the Infiniband utilities available, I came to know 'ibdiagnet' utility from "IBDM" can generate the network topology file in the format of "ibdm" with '-wt' option. But the problem is 'ibdiagnet' is not working on Infiniband simulator 'ibsim'. AFAIK, all IB utilities which use "libibumad.so" should work seamlessly with the 'ibsim'. Does 'ibdiagnet' use any other libraries ? regards, Mahesh From vlad at lists.openfabrics.org Tue Sep 30 03:12:49 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Tue, 30 Sep 2008 03:12:49 -0700 (PDT) Subject: [ofa-general] ofa_1_4_kernel 20080930-0200 daily build status Message-ID: <20080930101249.65BCAE6095B@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with 
linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.19 Failed: Build failed on x86_64 with linux-2.6.21.1 Log: /home/vlad/tmp/ofa_1_4_kernel-20080930-0200_linux-2.6.21.1_x86_64_check/include/rdma/ib_verbs.h:1833: error: 'struct scatterlist' has no member named 'dma_address' /home/vlad/tmp/ofa_1_4_kernel-20080930-0200_linux-2.6.21.1_x86_64_check/include/rdma/ib_verbs.h: In function 'ib_sg_dma_len': /home/vlad/tmp/ofa_1_4_kernel-20080930-0200_linux-2.6.21.1_x86_64_check/include/rdma/ib_verbs.h:1846: error: 'struct scatterlist' has no member named 'dma_length' make[4]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080930-0200_linux-2.6.21.1_x86_64_check/drivers/infiniband/hw/ipath/ipath_dma.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080930-0200_linux-2.6.21.1_x86_64_check/drivers/infiniband/hw/ipath] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080930-0200_linux-2.6.21.1_x86_64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_4_kernel-20080930-0200_linux-2.6.21.1_x86_64_check] Error 2 make[1]: Leaving directory 
`/home/vlad/kernel.org/x86_64/linux-2.6.21.1' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ppc64 with linux-2.6.24 Log: /home/vlad/tmp/ofa_1_4_kernel-20080930-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.c: In function 'ehca_poll_eqs': /home/vlad/tmp/ofa_1_4_kernel-20080930-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.c:942: warning: passing argument 1 of 'local_irq_save_ptr' from incompatible pointer type /home/vlad/tmp/ofa_1_4_kernel-20080930-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.c:946: warning: passing argument 1 of 'local_irq_save_ptr' from incompatible pointer type make[4]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080930-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca/ehca_main.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080930-0200_linux-2.6.24_ppc64_check/drivers/infiniband/hw/ehca] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_4_kernel-20080930-0200_linux-2.6.24_ppc64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_4_kernel-20080930-0200_linux-2.6.24_ppc64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ppc64/linux-2.6.24' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From raq at cttc.upc.edu Tue Sep 30 03:51:32 2008 From: raq at cttc.upc.edu (Ramiro Alba Queipo) Date: Tue, 30 Sep 2008 12:51:32 +0200 Subject: [ofa-general] Diagnostics output messages Message-ID: <1222771892.31161.219.camel@mundo> Hello everybody: We have just started to run a 22 nodes infiniband cluster (44 in a couple of months) under Ubuntu 8.04 and after carefully reading and testing OFED 1.3.1 diagnogstics packages (ibutils and infiniband-diags), I have got some messages I can not understand: * ibdiagnet -o . 
-t file.topo -s jff -pm -I--------------------------------------------------- -I- IPoIB Subnets Check -I--------------------------------------------------- -I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps SL:0x00 -W- Suboptimal rate for group. Lowest member rate:20Gbps > group-rate:10Gbps What does it mean? * ibchecknet #warn: counter RcvSwRelayErrors = 259 (threshold 100) lid 4 port 255 Error check on lid 4 (MT47396 Infiniscale-III Mellanox Technologies) port all: FAILED I could see that command 'perfquery -a 255' shows its counters, but: - What is for? - ibqueryerrors.pl -a says RcvSwRelayErrors: This counter can increase due to a valid network event Should I worry by switch ports increasing little by little this counter? I am using IPoIB * ibdiagpath -o . -t file.topo -s jff -n jff201 -I--------------------------------------------------- -I- QoS on Path Check -I--------------------------------------------------- -W- VLArbTableLow Entries:6 7 VL > 5 at node:"jff/U1" lid=0x0001 guid=0x0002c90200279295 dev=25204 port:1 -W- VLArbTableHigh Entries:6 7 VL > 5 at node:"jff/U1" lid=0x0001 guid=0x0002c90200279295 dev=25204 port:1 -W- VLArbTableLow Entries:6 7 VL > 5 at node:"switch-1/U1" lid=0x0004 guid=0x000b8cffff0052cf dev=47396 port:1 -W- VLArbTableHigh Entries:6 7 VL > 5 at node:"switch-1/U1" lid=0x0004 guid=0x000b8cffff0052cf dev=47396 port:1 -W- SLs:6 7 14 15 mapped to VL > 5 at node:"switch-1/U1" lid=0x0004 guid=0x000b8cffff0052cf dev=47396 in-port:23 out-port:1 -I- The following SLs can be used:0 1 2 3 4 5 8 9 10 11 12 13 What is the meaning of this messages? Finally, and not related to diagnostics messages, I have to change permissions at crw-rw---- 1 root rdma 231, 192 2008-09-30 09:19 /dev/infiniband/uverbs0 to be 'rw' to everybody. Should I add users to 'rdma' group instead? 
---
Thanks in advance

Regards

--
Aquest missatge ha estat analitzat per MailScanner
a la cerca de virus i d'altres continguts perillosos,
i es considera que està net.
For all your IT requirements visit: http://www.transtec.co.uk

From sashak at voltaire.com  Tue Sep 30 05:12:52 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Tue, 30 Sep 2008 15:12:52 +0300
Subject: [ofa-general] ***SPAM*** ibdm network topology format
In-Reply-To: <829ded920809292304k3ffc78c0m556efbdd7d35c528@mail.gmail.com>
References: <829ded920809290139vf2cc151w4cc8a6fafb49edfe@mail.gmail.com>
	<829ded920809292304k3ffc78c0m556efbdd7d35c528@mail.gmail.com>
Message-ID: <20080930121252.GA7396@sashak.voltaire.com>

Hi Manesh,

On 11:34 Tue 30 Sep     , Keshetti Mahesh wrote:
>
> AFAIK, all IB utilities which use "libibumad.so" should work seamlessly
> with the 'ibsim'. Does 'ibdiagnet' use any other libraries ?

I'm able to run ibdiagnet with ibsim. I need to export SIM_HOST
environment variable so ibdiagnet will start from some host and not a
switch (by default with ibsim application starts running from first
switch in a fabric).

Sasha

From hal.rosenstock at gmail.com  Tue Sep 30 05:21:52 2008
From: hal.rosenstock at gmail.com (Hal Rosenstock)
Date: Tue, 30 Sep 2008 08:21:52 -0400
Subject: ***SPAM*** Re: [ofa-general] ***SPAM*** ibdm network topology format
In-Reply-To: <20080930121252.GA7396@sashak.voltaire.com>
References: <829ded920809290139vf2cc151w4cc8a6fafb49edfe@mail.gmail.com>
	<829ded920809292304k3ffc78c0m556efbdd7d35c528@mail.gmail.com>
	<20080930121252.GA7396@sashak.voltaire.com>
Message-ID:

On Tue, Sep 30, 2008 at 8:12 AM, Sasha Khapyorsky wrote:
> Hi Manesh,
>
> On 11:34 Tue 30 Sep     , Keshetti Mahesh wrote:
>>
>> AFAIK, all IB utilities which use "libibumad.so" should work seamlessly
>> with the 'ibsim'. Does 'ibdiagnet' use any other libraries ?
>
> I'm able to run ibdiagnet with ibsim.
I need to export SIM_HOST > environment variable so ibdiagnet will start from some host and not a > switch (by default with ibsim application starts running from first > switch in a fabric). Won't it work from a switch port 0 then too ? Shouldn't it work from any end port ? -- Hal > > Sasha > From sashak at voltaire.com Tue Sep 30 05:34:44 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 30 Sep 2008 15:34:44 +0300 Subject: [ofa-general] ***SPAM*** ibdm network topology format In-Reply-To: References: <829ded920809290139vf2cc151w4cc8a6fafb49edfe@mail.gmail.com> <829ded920809292304k3ffc78c0m556efbdd7d35c528@mail.gmail.com> <20080930121252.GA7396@sashak.voltaire.com> Message-ID: <20080930123444.GB7396@sashak.voltaire.com> On 08:21 Tue 30 Sep , Hal Rosenstock wrote: > > Won't it work from a switch port 0 then too ? Shouldn't it work from > any end port ? I would expect that it should work, but it doesn't (from switch port 0). I didn't see why. Sasha From hal.rosenstock at gmail.com Tue Sep 30 05:35:20 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Tue, 30 Sep 2008 08:35:20 -0400 Subject: ***SPAM*** Re: [ofa-general] Diagnostics output messages In-Reply-To: <1222771892.31161.219.camel@mundo> References: <1222771892.31161.219.camel@mundo> Message-ID: On Tue, Sep 30, 2008 at 6:51 AM, Ramiro Alba Queipo wrote: > Hello everybody: > > We have just started to run a 22 nodes infiniband cluster (44 in a > couple > of months) under Ubuntu 8.04 and after carefully reading and testing > OFED 1.3.1 diagnogstics packages (ibutils and infiniband-diags), I have > got some messages I can not understand: > > * ibdiagnet -o . -t file.topo -s jff -pm > > > -I--------------------------------------------------- > -I- IPoIB Subnets Check > -I--------------------------------------------------- > -I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps > SL:0x00 > -W- Suboptimal rate for group. 
Lowest member rate:20Gbps > > group-rate:10Gbps > > > What does it mean? This means your subnet is pure DDR and the IPoIB broadcast group can run at a higher rate than the default. This is done via OpenSM configuration which is slightly different depending on which version you are using. > * ibchecknet > > #warn: counter RcvSwRelayErrors = 259 (threshold 100) lid 4 port 255 > Error check on lid 4 (MT47396 Infiniscale-III Mellanox Technologies) > port all: FAILED > > > I could see that command 'perfquery -a 255' shows its counters, but: > > - What is for? > - ibqueryerrors.pl -a says > RcvSwRelayErrors: This counter can increase due to a valid network > event > Should I worry by switch ports increasing little by little this > counter? > > I am using IPoIB Unfortunately when running IPoIB, RcvSwRelayErrors needs to be ignored as multicasts are counted as looping. > * ibdiagpath -o . -t file.topo -s jff -n jff201 > > -I--------------------------------------------------- > -I- QoS on Path Check > -I--------------------------------------------------- > -W- VLArbTableLow Entries:6 7 VL > 5 at node:"jff/U1" lid=0x0001 > guid=0x0002c90200279295 dev=25204 port:1 > -W- VLArbTableHigh Entries:6 7 VL > 5 at node:"jff/U1" lid=0x0001 > guid=0x0002c90200279295 dev=25204 port:1 > -W- VLArbTableLow Entries:6 7 VL > 5 at node:"switch-1/U1" lid=0x0004 > guid=0x000b8cffff0052cf dev=47396 port:1 > -W- VLArbTableHigh Entries:6 7 VL > 5 at node:"switch-1/U1" lid=0x0004 > guid=0x000b8cffff0052cf dev=47396 port:1 > -W- SLs:6 7 14 15 mapped to VL > 5 at node:"switch-1/U1" lid=0x0004 > guid=0x000b8cffff0052cf dev=47396 in-port:23 out-port:1 > -I- The following SLs can be used:0 1 2 3 4 5 8 9 10 11 12 13 > > What is the meaning of this messages? I'm not sure but it looks like it's complaining about an invalid VL. Can you run: smpquery portinfo 1 smpquery sl2vl 1 smpquery vlarb 1 for both of these lids ? 
-- Hal > Finally, and not related to diagnostics messages, I have to change > permissions at > > crw-rw---- 1 root rdma 231, 192 2008-09-30 09:19 /dev/infiniband/uverbs0 > > to be 'rw' to everybody. > > Should I add users to 'rdma' group instead? > > > --- > Thanks in advance > > Regards > > > -- > Aquest missatge ha estat analitzat per MailScanner > a la cerca de virus i d'altres continguts perillosos, > i es considera que està net. > For all your IT requirements visit: http://www.transtec.co.uk > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From hal.rosenstock at gmail.com Tue Sep 30 05:36:53 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Tue, 30 Sep 2008 08:36:53 -0400 Subject: ***SPAM*** Re: [ofa-general] ***SPAM*** ibdm network topology format In-Reply-To: <20080930123444.GB7396@sashak.voltaire.com> References: <829ded920809290139vf2cc151w4cc8a6fafb49edfe@mail.gmail.com> <829ded920809292304k3ffc78c0m556efbdd7d35c528@mail.gmail.com> <20080930121252.GA7396@sashak.voltaire.com> <20080930123444.GB7396@sashak.voltaire.com> Message-ID: On Tue, Sep 30, 2008 at 8:34 AM, Sasha Khapyorsky wrote: > On 08:21 Tue 30 Sep , Hal Rosenstock wrote: >> >> Won't it work from a switch port 0 then too ? Shouldn't it work from >> any end port ? > > I would expect that it should work, but it doesn't (from switch port 0). > I didn't see why. Sounds like a bug :-( Was it entered into bugzilla so it can be tracked ? 
-- Hal

> Sasha
>

From sashak at voltaire.com  Tue Sep 30 05:59:06 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Tue, 30 Sep 2008 15:59:06 +0300
Subject: [ofa-general] Re: [OpenSM][0/18] - Routing Chaining
In-Reply-To: <1222723943.32432.307.camel@cardanus.llnl.gov>
References: <1221506448.6274.32.camel@cardanus.llnl.gov>
	<20080928202648.GG25831@sashak.voltaire.com>
	<1222723943.32432.307.camel@cardanus.llnl.gov>
Message-ID: <20080930125906.GD7396@sashak.voltaire.com>

Hi Al,

On 14:32 Mon 29 Sep     , Al Chu wrote:
> >
> > Maybe we can use simpler solution - to use method's return status. It
> > can return negative value on failure, zero on success and positive if
> > method fallback is requested.
>
> This is fine as well. I had considered doing it this way as well, but I
> guess it comes down to personal programming style.

It is always true :). I called this "simpler" just because in such a
case we are able to handle default method fallback in a single place
instead of mandatory method definitions in multiple points. Also, this
lets us make the assumption about what the "default" is in a central
place (ucast manager) and not inside a particular routing engine (file
in this case).

> > Assuming that we want this important feature to be included in OFED-1.4
>
> Although it'd be nice to get into OFED 1.4, I know that we (the lab)
> aren't in too much of a hurry to see it in OFED 1.4. Of course, we
> backport new stuff into our local tree whenever we want. That's not
> most people.

Sure, you (the lab) are great guys and I would expect from everybody
more active usage of the recent sources :). Unfortunately it is not the
case yet, many will stick with OFED only.
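
For illustration only, the return-status convention discussed above (negative on failure, zero on success, positive when fallback to the default is requested) could be sketched roughly as follows. This is a minimal sketch and not OpenSM code: `route_method_t`, `run_method` and the stand-in callbacks are hypothetical names, and treating a missing (NULL) method as a fallback request is an added assumption.

```c
#include <stddef.h>

/*
 * Status convention assumed from the proposal in this thread:
 *   ret <  0  the method failed
 *   ret == 0  the method succeeded
 *   ret >  0  the method asks the caller to fall back to the default
 * All names below are hypothetical; this is not OpenSM code.
 */
typedef int (*route_method_t)(void *context);

/* Stand-ins for real routing engine callbacks. */
static int engine_ok(void *context) { (void)context; return 0; }
static int engine_fail(void *context) { (void)context; return -1; }
static int engine_wants_default(void *context) { (void)context; return 1; }

/* Stand-in for the default method; it only counts its invocations. */
static int default_method(void *context)
{
	int *calls = context;

	(*calls)++;
	return 0;
}

/*
 * The single place where the fallback decision is made. A NULL method
 * is treated like an explicit fallback request (an assumption of this
 * sketch), so an engine does not have to define every callback.
 */
static int run_method(route_method_t method, route_method_t fallback,
		      void *context)
{
	int ret = method ? method(context) : 1;

	if (ret > 0)
		ret = fallback(context);
	return ret;
}
```

With this shape a caller only checks for a negative result, and the knowledge of what the default is stays in one function instead of in each engine.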
Sasha From sashak at voltaire.com Tue Sep 30 06:00:01 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 30 Sep 2008 16:00:01 +0300 Subject: [ofa-general] Re: [PATCH] opensm: routing chaining In-Reply-To: <1222726653.32432.321.camel@cardanus.llnl.gov> References: <1221506448.6274.32.camel@cardanus.llnl.gov> <20080928202648.GG25831@sashak.voltaire.com> <20080928204244.GH25831@sashak.voltaire.com> <1222726653.32432.321.camel@cardanus.llnl.gov> Message-ID: <20080930130001.GE7396@sashak.voltaire.com> Hi Al, Thanks for the comments. Answers are below. On 15:17 Mon 29 Sep , Al Chu wrote: > > diff --git a/opensm/opensm/osm_opensm.c b/opensm/opensm/osm_opensm.c > > index d17fed3..4970d0c 100644 > > --- a/opensm/opensm/osm_opensm.c > > +++ b/opensm/opensm/osm_opensm.c > > @@ -61,24 +61,23 @@ > > > > struct routing_engine_module { > > const char *name; > > - int (*setup) (osm_opensm_t * p_osm); > > + int (*setup) (struct osm_routing_engine *, osm_opensm_t *); > > }; > > > > -extern int osm_ucast_updn_setup(osm_opensm_t * p_osm); > > -extern int osm_ucast_file_setup(osm_opensm_t * p_osm); > > -extern int osm_ucast_ftree_setup(osm_opensm_t * p_osm); > > -extern int osm_ucast_lash_setup(osm_opensm_t * p_osm); > > - > > -static int osm_ucast_null_setup(osm_opensm_t * p_osm); > > +extern int osm_ucast_minhop_setup(struct osm_routing_engine *, osm_opensm_t *); > > +extern int osm_ucast_updn_setup(struct osm_routing_engine *, osm_opensm_t *); > > +extern int osm_ucast_file_setup(struct osm_routing_engine *, osm_opensm_t *); > > +extern int osm_ucast_ftree_setup(struct osm_routing_engine *, osm_opensm_t *); > > +extern int osm_ucast_lash_setup(struct osm_routing_engine *, osm_opensm_t *); > > +extern int osm_ucast_dor_setup(struct osm_routing_engine *, osm_opensm_t *); > > > > const static struct routing_engine_module routing_modules[] = { > > - {"null", osm_ucast_null_setup}, > > Not sure how much legacy opensm.opts files are out there, but I kept the > "null" routing 
engine in there just for safety. Is it ok to remove?

I think it is a good time to remove it - we changed OpenSM configuration
and opensm.opts is not used anymore. Also, even in case when "null" (or
any other unknown name) is used, the minhop routing engine will be used
instead (as default).

> > -	{"minhop", osm_ucast_null_setup},
> > +	{"minhop", osm_ucast_minhop_setup},
> >  	{"updn", osm_ucast_updn_setup},
> >  	{"file", osm_ucast_file_setup},
> >  	{"ftree", osm_ucast_ftree_setup},
> >  	{"lash", osm_ucast_lash_setup},
> > -	{"dor", osm_ucast_null_setup},
> > +	{"dor", osm_ucast_dor_setup},
> >  	{NULL, NULL}
> >  };
> >
> > @@ -861,3 +871,28 @@ Exit:
> >  	OSM_LOG_EXIT(p_mgr->p_log);
> >  	return (signal);
> >  }
> > +
> > +static int ucast_build_lid_matrices(void *context)
> > +{
> > +	return osm_ucast_mgr_build_lid_matrices(context);
> > +}
> > +
> > +static int ucast_build_lfts(void *context)
> > +{
> > +	return ucast_mgr_build_lfts(context);
> > +}
> > +
> > +int osm_ucast_minhop_setup(struct osm_routing_engine *r, osm_opensm_t *osm)
> > +{
> > +	r->context = &osm->sm.ucast_mgr;
> > +	r->build_lid_matrices = ucast_build_lid_matrices;
> > +	r->ucast_build_fwd_tables = ucast_build_lfts;
> > +	return 0;
> > +}
> > +
> > +int osm_ucast_dor_setup(struct osm_routing_engine *r, osm_opensm_t *osm)
> > +{
> > +	osm_ucast_minhop_setup(r, osm);
> > +	osm->sm.ucast_mgr.is_dor = 1;
>
> If dor is listed in the routing chain, all other algorithms that may
> fall-through into minhop's build_lfts callback (minhop, updn, file),
> will be affected by the is_dor flag. Is this intended?

It is just a bug (wanted to handle this, but forgot... :( ).

> If we don't want to abstract it for this round, perhaps we could stick
> the "is_dor" flag set/unset into ucast_mgr_route() so that is_dor is set
> only when dor is being routed.

I think we may define ucast_build_fwd_tables method for dor, something
like:

static int ucast_dor_build_lfts(void *context)
{
	osm_ucast_mgr_t *mgr = context;
	int ret;

	mgr->is_dor = 1;
	ret = ucast_mgr_build_lfts(mgr);
	mgr->is_dor = 0;

	return ret;
}

I will fix this.

> The patch looks fine as whole.

Thanks!

Sasha

From halr at obsidianresearch.com  Tue Sep 30 06:08:25 2008
From: halr at obsidianresearch.com (Hal Rosenstock)
Date: Tue, 30 Sep 2008 07:08:25 -0600
Subject: [ofa-general] [PATCH][TRIVIAL] OpenSM: More man and doc changes for
	opensm.conf
Message-ID: <48E224C9.4020206@obsidianresearch.com>

Sasha,

Attached are some more man and doc changes for change to opensm.conf
from opensm.opts

-- Hal
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: patch-osmdoc1
URL:

From hal.rosenstock at gmail.com  Tue Sep 30 06:13:13 2008
From: hal.rosenstock at gmail.com (Hal Rosenstock)
Date: Tue, 30 Sep 2008 09:13:13 -0400
Subject: [ofa-general] ***SPAM*** OpenSM script inconsistency ?
Message-ID:

Sasha,

When using the defaults for an opensm configure,
include/config.h:#define HAVE_DEFAULT_OPENSM_CONFIG_FILE "/usr/local/etc/opensm/opensm.conf"

but:
scripts/redhat-opensm.init.in:# config: @sysconfdir@/sysconfig/opensm.conf
scripts/redhat-opensm.init.in:CONFIG=@sysconfdir@/sysconfig/opensm.conf
scripts/sldd.sh.in:# config: @sysconfdir@/sysconfig/opensm.conf
scripts/sldd.sh.in:[ -f @sysconfdir@/sysconfig/opensm.conf ] && CONFIG=@sysconfdir@/sysconfig/opensm.conf

This doesn't look consistent to me. I know RedHat wants things in
certain places. Shouldn't that be documented somewhere ?

Also, what about sldd ? Is that for RedHat or general ?
-- Hal From raq at cttc.upc.edu Tue Sep 30 07:11:32 2008 From: raq at cttc.upc.edu (Ramiro Alba Queipo) Date: Tue, 30 Sep 2008 16:11:32 +0200 Subject: [ofa-general] Diagnostics output messages Message-ID: <1222783892.31161.228.camel@mundo> On Tue, 2008-09-30 at 08:35 -0400, Hal Rosenstock wrote: > On Tue, Sep 30, 2008 at 6:51 AM, Ramiro Alba Queipo wrote: > > Hello everybody: > > > > We have just started to run a 22 nodes infiniband cluster (44 in a > > couple > > of months) under Ubuntu 8.04 and after carefully reading and testing > > OFED 1.3.1 diagnogstics packages (ibutils and infiniband-diags), I have > > got some messages I can not understand: > > > > * ibdiagnet -o . -t file.topo -s jff -pm > > > > > > -I--------------------------------------------------- > > -I- IPoIB Subnets Check > > -I--------------------------------------------------- > > -I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps > > SL:0x00 > > -W- Suboptimal rate for group. Lowest member rate:20Gbps > > > group-rate:10Gbps > > > > > > What does it mean? > > This means your subnet is pure DDR and the IPoIB broadcast group can > run at a higher rate than the default. This is done via OpenSM > configuration which is slightly different depending on which version > you are using. > OpenSM 3.1.11 > > * ibchecknet > > > > #warn: counter RcvSwRelayErrors = 259 (threshold 100) lid 4 port 255 > > Error check on lid 4 (MT47396 Infiniscale-III Mellanox Technologies) > > port all: FAILED > > > > > > I could see that command 'perfquery -a 255' shows its counters, but: > > > > - What is for? > > - ibqueryerrors.pl -a says > > RcvSwRelayErrors: This counter can increase due to a valid network > > event > > Should I worry by switch ports increasing little by little this > > counter? > > > > I am using IPoIB > > Unfortunately when running IPoIB, RcvSwRelayErrors needs to be ignored > as multicasts are counted as looping. > > > * ibdiagpath -o . 
-t file.topo -s jff -n jff201 > > > > -I--------------------------------------------------- > > -I- QoS on Path Check > > -I--------------------------------------------------- > > -W- VLArbTableLow Entries:6 7 VL > 5 at node:"jff/U1" lid=0x0001 > > guid=0x0002c90200279295 dev=25204 port:1 > > -W- VLArbTableHigh Entries:6 7 VL > 5 at node:"jff/U1" lid=0x0001 > > guid=0x0002c90200279295 dev=25204 port:1 > > -W- VLArbTableLow Entries:6 7 VL > 5 at node:"switch-1/U1" lid=0x0004 > > guid=0x000b8cffff0052cf dev=47396 port:1 > > -W- VLArbTableHigh Entries:6 7 VL > 5 at node:"switch-1/U1" lid=0x0004 > > guid=0x000b8cffff0052cf dev=47396 port:1 > > -W- SLs:6 7 14 15 mapped to VL > 5 at node:"switch-1/U1" lid=0x0004 > > guid=0x000b8cffff0052cf dev=47396 in-port:23 out-port:1 > > -I- The following SLs can be used:0 1 2 3 4 5 8 9 10 11 12 13 > > > > What is the meaning of this messages? > > I'm not sure but it looks like it's complaining about an invalid VL. > Can you run: > smpquery portinfo 1 > smpquery sl2vl 1 > smpquery vlarb 1 > for both of these lids ? 
> # Port info: Lid 1 port 1 Mkey:............................0x0000000000000000 GidPrefix:.......................0xfe80000000000000 Lid:.............................0x0001 SMLid:...........................0x0001 CapMask:.........................0x2510a6a IsSM IsTrapSupported IsAutomaticMigrationSupported IsSLMappingSupported IsLedInfoSupported IsSystemImageGUIDsupported IsCommunicatonManagementSupported IsVendorClassSupported IsCapabilityMaskNoticeSupported IsClientRegistrationSupported DiagCode:........................0x0000 MkeyLeasePeriod:.................0 LocalPort:.......................1 LinkWidthEnabled:................1X or 4X LinkWidthSupported:..............1X or 4X LinkWidthActive:.................4X LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps LinkState:.......................Active PhysLinkState:...................LinkUp LinkDownDefState:................Polling ProtectBits:.....................0 LMC:.............................0 LinkSpeedActive:.................5.0 Gbps LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps NeighborMTU:.....................2048 SMSL:............................0 VLCap:...........................VL0-3 InitType:........................0x00 VLHighLimit:.....................0 VLArbHighCap:....................8 VLArbLowCap:.....................8 InitReply:.......................0x00 MtuCap:..........................2048 VLStallCount:....................7 HoqLife:.........................31 OperVLs:.........................VL0-3 PartEnforceInb:..................0 PartEnforceOutb:.................0 FilterRawInb:....................0 FilterRawOutb:...................0 MkeyViolations:..................0 PkeyViolations:..................0 QkeyViolations:..................0 GuidCap:.........................32 ClientReregister:................0 SubnetTimeout:...................18 RespTimeVal:.....................16 LocalPhysErr:....................8 OverrunErr:......................8 MaxCreditHint:...................0 
RoundTrip:.......................0 # SL2VL table: Lid 1 # SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15| ports: in 0, out 0: | 3| 2| 1| 0| 3| 2| 1| 0| 3| 2| 1| 0| 3| 2| 1| 0| # VLArbitration tables: Lid 1 port 1 LowCap 8 HighCap 8 # Low priority VL Arbitration Table: VL : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 | WEIGHT: |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 | # High priority VL Arbitration Table: VL : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 | WEIGHT: |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 | # Port info: Lid 4 port 1 Mkey:............................0x0000000000000000 GidPrefix:.......................0x0000000000000000 Lid:.............................0x0000 SMLid:...........................0x0000 CapMask:.........................0x0 DiagCode:........................0x0000 MkeyLeasePeriod:.................0 LocalPort:.......................23 LinkWidthEnabled:................1X or 4X LinkWidthSupported:..............1X or 4X LinkWidthActive:.................4X LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps LinkState:.......................Active PhysLinkState:...................LinkUp LinkDownDefState:................Polling ProtectBits:.....................0 LMC:.............................0 LinkSpeedActive:.................5.0 Gbps LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps NeighborMTU:.....................2048 SMSL:............................0 VLCap:...........................VL0-7 InitType:........................0x00 VLHighLimit:.....................0 VLArbHighCap:....................8 VLArbLowCap:.....................8 InitReply:.......................0x00 MtuCap:..........................2048 VLStallCount:....................7 HoqLife:.........................16 OperVLs:.........................VL0-3 PartEnforceInb:..................1 PartEnforceOutb:.................1 FilterRawInb:....................0 FilterRawOutb:...................0 MkeyViolations:..................0 PkeyViolations:..................0 
QkeyViolations:..................0 GuidCap:.........................0 ClientReregister:................0 SubnetTimeout:...................0 RespTimeVal:.....................0 LocalPhysErr:....................8 OverrunErr:......................8 MaxCreditHint:...................0 RoundTrip:.......................0 # SL2VL table: Lid 4 # SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15| ports: in 0, out 1: | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| ports: in 1, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| ports: in 2, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| ports: in 3, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| ports: in 4, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| ports: in 5, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| ports: in 6, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| ports: in 7, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| ports: in 8, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| ports: in 9, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| ports: in 10, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| ports: in 11, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| ports: in 12, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| ports: in 13, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| ports: in 14, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| ports: in 15, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| ports: in 16, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| ports: in 17, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| ports: in 18, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| ports: in 19, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| ports: in 20, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| ports: in 21, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| ports: in 22, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 
3| 4| 5| 6| 7|
ports: in 23, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 24, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|

# VLArbitration tables: Lid 4 port 1 LowCap 8 HighCap 8
# Low priority VL Arbitration Table:
VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
WEIGHT: |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |
# High priority VL Arbitration Table:
VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
WEIGHT: |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |

> -- Hal
>
> > Finally, and not related to diagnostics messages, I have to change
> > permissions at
> >
> > crw-rw---- 1 root rdma 231, 192 2008-09-30 09:19 /dev/infiniband/uverbs0
> >
> > to be 'rw' to everybody.
> >
> > Should I add users to 'rdma' group instead?
> >
> >
> > ---
> > Thanks in advance
> >
> > Regards

--
Aquest missatge ha estat analitzat per MailScanner
a la cerca de virus i d'altres continguts perillosos,
i es considera que està net.
For all your IT requirements visit: http://www.transtec.co.uk

From hal.rosenstock at gmail.com  Tue Sep 30 08:41:22 2008
From: hal.rosenstock at gmail.com (Hal Rosenstock)
Date: Tue, 30 Sep 2008 11:41:22 -0400
Subject: ***SPAM*** Fwd: [ofa-general] Diagnostics output messages
In-Reply-To:
References: <1222771892.31161.219.camel@mundo>
	<1222783776.31161.227.camel@mundo>
Message-ID:

---------- Forwarded message ----------
From: Hal Rosenstock
Date: Tue, Sep 30, 2008 at 11:39 AM
Subject: Re: [ofa-general] Diagnostics output messages
To: Ramiro Alba Queipo

On Tue, Sep 30, 2008 at 10:09 AM, Ramiro Alba Queipo wrote:
> On Tue, 2008-09-30 at 08:35 -0400, Hal Rosenstock wrote:
>> On Tue, Sep 30, 2008 at 6:51 AM, Ramiro Alba Queipo wrote:
>> > Hello everybody:
>> >
>> > We have just started to run a 22 nodes infiniband cluster (44 in a
>> > couple
>> > of months) under Ubuntu 8.04 and after carefully reading and testing
>> > OFED 1.3.1 diagnogstics packages (ibutils and infiniband-diags), I have
>> > got some messages I can not understand:
>> >
>> > *
ibdiagnet -o . -t file.topo -s jff -pm >> > >> > >> > -I--------------------------------------------------- >> > -I- IPoIB Subnets Check >> > -I--------------------------------------------------- >> > -I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps >> > SL:0x00 >> > -W- Suboptimal rate for group. Lowest member rate:20Gbps > >> > group-rate:10Gbps >> > >> > >> > What does it mean? >> >> This means your subnet is pure DDR and the IPoIB broadcast group can >> run at a higher rate than the default. This is done via OpenSM >> configuration which is slightly different depending on which version >> you are using. >> > > OpenSM 3.1.11 See the man page on partition configuration for how to fix this. The partition config file should contain: Default=0x7fff,ipoib,rate=6:ALL=full; since a rate of 6 is 20 Gbps >> > * ibchecknet >> > >> > #warn: counter RcvSwRelayErrors = 259 (threshold 100) lid 4 port 255 >> > Error check on lid 4 (MT47396 Infiniscale-III Mellanox Technologies) >> > port all: FAILED >> > >> > >> > I could see that command 'perfquery -a 255' shows its counters, but: >> > >> > - What is for? >> > - ibqueryerrors.pl -a says >> > RcvSwRelayErrors: This counter can increase due to a valid network >> > event >> > Should I worry by switch ports increasing little by little this >> > counter? >> > >> > I am using IPoIB >> >> Unfortunately when running IPoIB, RcvSwRelayErrors needs to be ignored >> as multicasts are counted as looping. >> >> > * ibdiagpath -o . 
-t file.topo -s jff -n jff201 >> > >> > -I--------------------------------------------------- >> > -I- QoS on Path Check >> > -I--------------------------------------------------- >> > -W- VLArbTableLow Entries:6 7 VL > 5 at node:"jff/U1" lid=0x0001 >> > guid=0x0002c90200279295 dev=25204 port:1 >> > -W- VLArbTableHigh Entries:6 7 VL > 5 at node:"jff/U1" lid=0x0001 >> > guid=0x0002c90200279295 dev=25204 port:1 >> > -W- VLArbTableLow Entries:6 7 VL > 5 at node:"switch-1/U1" lid=0x0004 >> > guid=0x000b8cffff0052cf dev=47396 port:1 >> > -W- VLArbTableHigh Entries:6 7 VL > 5 at node:"switch-1/U1" lid=0x0004 >> > guid=0x000b8cffff0052cf dev=47396 port:1 >> > -W- SLs:6 7 14 15 mapped to VL > 5 at node:"switch-1/U1" lid=0x0004 >> > guid=0x000b8cffff0052cf dev=47396 in-port:23 out-port:1 >> > -I- The following SLs can be used:0 1 2 3 4 5 8 9 10 11 12 13 >> > >> > What is the meaning of this messages? >> >> I'm not sure but it looks like it's complaining about an invalid VL. >> Can you run: >> smpquery portinfo 1 >> smpquery sl2vl 1 >> smpquery vlarb 1 >> for both of these lids ? 
>> > > # Port info: Lid 1 port 1 > Mkey:............................0x0000000000000000 > GidPrefix:.......................0xfe80000000000000 > Lid:.............................0x0001 > SMLid:...........................0x0001 > CapMask:.........................0x2510a6a > IsSM > IsTrapSupported > IsAutomaticMigrationSupported > IsSLMappingSupported > IsLedInfoSupported > IsSystemImageGUIDsupported > IsCommunicatonManagementSupported > IsVendorClassSupported > IsCapabilityMaskNoticeSupported > IsClientRegistrationSupported > DiagCode:........................0x0000 > MkeyLeasePeriod:.................0 > LocalPort:.......................1 > LinkWidthEnabled:................1X or 4X > LinkWidthSupported:..............1X or 4X > LinkWidthActive:.................4X > LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps > LinkState:.......................Active > PhysLinkState:...................LinkUp > LinkDownDefState:................Polling > ProtectBits:.....................0 > LMC:.............................0 > LinkSpeedActive:.................5.0 Gbps > LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps > NeighborMTU:.....................2048 > SMSL:............................0 > VLCap:...........................VL0-3 > InitType:........................0x00 > VLHighLimit:.....................0 > VLArbHighCap:....................8 > VLArbLowCap:.....................8 > InitReply:.......................0x00 > MtuCap:..........................2048 > VLStallCount:....................7 > HoqLife:.........................31 > OperVLs:.........................VL0-3 > PartEnforceInb:..................0 > PartEnforceOutb:.................0 > FilterRawInb:....................0 > FilterRawOutb:...................0 > MkeyViolations:..................0 > PkeyViolations:..................0 > QkeyViolations:..................0 > GuidCap:.........................32 > ClientReregister:................0 > SubnetTimeout:...................18 > 
RespTimeVal:.....................16 > LocalPhysErr:....................8 > OverrunErr:......................8 > MaxCreditHint:...................0 > RoundTrip:.......................0 > > # SL2VL table: Lid 1 > # SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15| > ports: in 0, out 0: | 3| 2| 1| 0| 3| 2| 1| 0| 3| 2| 1| 0| 3| 2| 1| 0| > > # VLArbitration tables: Lid 1 port 1 LowCap 8 HighCap 8 > # Low priority VL Arbitration Table: > VL : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 | > WEIGHT: |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 | > # High priority VL Arbitration Table: > VL : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 | > WEIGHT: |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 | > > > # Port info: Lid 4 port 1 > Mkey:............................0x0000000000000000 > GidPrefix:.......................0x0000000000000000 > Lid:.............................0x0000 > SMLid:...........................0x0000 > CapMask:.........................0x0 > DiagCode:........................0x0000 > MkeyLeasePeriod:.................0 > LocalPort:.......................23 > LinkWidthEnabled:................1X or 4X > LinkWidthSupported:..............1X or 4X > LinkWidthActive:.................4X > LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps > LinkState:.......................Active > PhysLinkState:...................LinkUp > LinkDownDefState:................Polling > ProtectBits:.....................0 > LMC:.............................0 > LinkSpeedActive:.................5.0 Gbps > LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps > NeighborMTU:.....................2048 > SMSL:............................0 > VLCap:...........................VL0-7 > InitType:........................0x00 > VLHighLimit:.....................0 > VLArbHighCap:....................8 > VLArbLowCap:.....................8 > InitReply:.......................0x00 > MtuCap:..........................2048 > VLStallCount:....................7 > HoqLife:.........................16 > 
OperVLs:.........................VL0-3 > PartEnforceInb:..................1 > PartEnforceOutb:.................1 > FilterRawInb:....................0 > FilterRawOutb:...................0 > MkeyViolations:..................0 > PkeyViolations:..................0 > QkeyViolations:..................0 > GuidCap:.........................0 > ClientReregister:................0 > SubnetTimeout:...................0 > RespTimeVal:.....................0 > LocalPhysErr:....................8 > OverrunErr:......................8 > MaxCreditHint:...................0 > RoundTrip:.......................0 > > # SL2VL table: Lid 4 > # SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15| > ports: in 0, out 1: | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| > ports: in 1, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| > ports: in 2, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| > ports: in 3, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| > ports: in 4, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| > ports: in 5, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| > ports: in 6, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| > ports: in 7, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| > ports: in 8, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| > ports: in 9, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| > ports: in 10, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| > ports: in 11, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| > ports: in 12, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| > ports: in 13, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| > ports: in 14, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| > ports: in 15, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| > ports: in 16, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| > ports: in 17, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| > ports: in 18, 
out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| > ports: in 19, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| > ports: in 20, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| > ports: in 21, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| > ports: in 22, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| > ports: in 23, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| > ports: in 24, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| > > # VLArbitration tables: Lid 4 port 1 LowCap 8 HighCap 8 > # Low priority VL Arbitration Table: > VL : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 | > WEIGHT: |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 | > # High priority VL Arbitration Table: > VL : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 | > WEIGHT: |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 | I see what it's complaining about: ibdiag/src/ibdebug_if.tcl has the following snippet of code: "-W-ibdiagpath:qos.vlaOverOpVLs" { foreach {name port entries opVLs HL} $args {break} set lastVL [expr $opVLs - 1] if {$lastVL == 15} {set lastVL 14} append msgText "VLArbTable$HL Entries:$entries VL > $lastVL at node: $name port:$port" } There's a similar snippet for the low arb table. If I'm reading this right, those code snippets look wrong to me since it is valid to have the same VL entry in there more than once. The limit which can't be exceeded is the VLArbHigh/LowCap. In terms of the SL mapping, # SL2VL table: Lid 4 # SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15| ports: in 1, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| I think it's complaining about SLs 5-7 being mapped to non-operational VLs. That is also valid but means those SLs would be dropped and not sure if that is what is intended. -- Hal > >> -- Hal >> >> > Finally, and not related to diagnostics messages, I have to change >> > permissions at >> > >> > crw-rw---- 1 root rdma 231, 192 2008-09-30 09:19 /dev/infiniband/uverbs0 >> > >> > to be 'rw' to everybody. 
>> > >> > Should I add users to 'rdma' group instead? >> > >> > >> > --- >> > Thanks in advance >> > >> > Regards >> > >> > >> > -- >> > Aquest missatge ha estat analitzat per MailScanner >> > a la cerca de virus i d'altres continguts perillosos, >> > i es considera que està net. >> > For all your IT requirements visit: http://www.transtec.co.uk >> > >> > _______________________________________________ >> > general mailing list >> > general at lists.openfabrics.org >> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> > >> > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general >> > >> > > > -- > Aquest missatge ha estat analitzat per MailScanner > a la cerca de virus i d'altres continguts perillosos, > i es considera que està net. > For all your IT requirements visit: http://www.transtec.co.uk > > From christopher.tanner at gatech.edu Tue Sep 30 11:44:42 2008 From: christopher.tanner at gatech.edu (Christopher Tanner) Date: Tue, 30 Sep 2008 14:44:42 -0400 Subject: [ofa-general] List of all files and locations Message-ID: <7893495E-1517-42DB-BC8F-01AEE9613100@gatech.edu> All - I'm trying to compile and install the Infiniband libraries and drivers on my system (Ubuntu 8.04 Server, Mellanox cards). I've downloaded all of the source tarballs from the OpenFabrics website, compiled them, and installed them, but it's still not working. a) I'm not sure the drivers are loading correctly b) I'm not sure I have the right permissions (whatever that means). I'm getting 'Permission Denied' errors when I try to execute a MPI program over IB. So, if anyone has this, I would like a list of the pertinent files, their locations (can be in terms of $IB_DIR), and the configuration files I need to change in order for the drivers to load and permissions to be set. This may be a big order, I don't know, but I don't see how else to solve the problems. Thanks! 
------------------------------------------- Chris Tanner Space Systems Design Lab Georgia Institute of Technology christopher.tanner at gatech.edu ------------------------------------------- From rcummins at sgi.com Tue Sep 30 11:42:46 2008 From: rcummins at sgi.com (Robert Cummins) Date: Tue, 30 Sep 2008 12:42:46 -0600 Subject: [ofa-general] List of all files and locations In-Reply-To: <7893495E-1517-42DB-BC8F-01AEE9613100@gatech.edu> References: <7893495E-1517-42DB-BC8F-01AEE9613100@gatech.edu> Message-ID: <1222800166.21330.7.camel@rockymtn> Have you tried using lsmod to see what kernel modules are loaded? What's the output from dmesg? Are you even sure ib is up? What does ibnetdiscover return? Do you have a lid? Is your subnet manager running? Until you have verified that ib is working you're not likely to get an MPI application to work. I would start with something simple like making sure you can ping (requires ipoib to be working) a remote node over infiniband after you've verified the above before attempting to launch the MPI application. There are *many* reasons why you might be getting permission denied. On Tue, 2008-09-30 at 14:44 -0400, Christopher Tanner wrote: > All - > > I'm trying to compile and install the Infiniband libraries and drivers > on my system (Ubuntu 8.04 Server, Mellanox cards). I've downloaded all > of the source tarballs from the OpenFabrics website, compiled them, > and installed them, but it's still not working. > > a) I'm not sure the drivers are loading correctly > b) I'm not sure I have the right permissions (whatever that means). > I'm getting 'Permission Denied' errors when I try to execute a MPI > program over IB. > > So, if anyone has this, I would like a list of the pertinent files, > their locations (can be in terms of $IB_DIR), and the configuration > files I need to change in order for the drivers to load and > permissions to be set. 
> > This may be a big order, I don't know, but I don't see how else to > solve the problems. Thanks! > > ------------------------------------------- > Chris Tanner > Space Systems Design Lab > Georgia Institute of Technology > christopher.tanner at gatech.edu > ------------------------------------------- > > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From sashak at voltaire.com Tue Sep 30 14:02:35 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 1 Oct 2008 00:02:35 +0300 Subject: [ofa-general] Re: OpenSM script inconsistency ? In-Reply-To: References: Message-ID: <20080930210235.GF7396@sashak.voltaire.com> Hi Hal, On 09:13 Tue 30 Sep , Hal Rosenstock wrote: > > When using the defaults for an opensm configure, > > include/config.h:#define HAVE_DEFAULT_OPENSM_CONFIG_FILE "/usr/local/etc/opensm/ > opensm.conf" Right, when prefix is '/usr/local'. > but: > scripts/redhat-opensm.init.in:# config: @sysconfdir@/sysconfig/opensm.conf > scripts/redhat-opensm.init.in:CONFIG=@sysconfdir@/sysconfig/opensm.conf > scripts/sldd.sh.in:# config: @sysconfdir@/sysconfig/opensm.conf > scripts/sldd.sh.in:[ -f @sysconfdir@/sysconfig/opensm.conf ] && CONFIG=@sysconfd > ir@/sysconfig/opensm.conf yes, and: scripts/redhat-opensm.init:# config: ${prefix}/etc/sysconfig/opensm.conf scripts/redhat-opensm.init:prefix=/usr/local scripts/redhat-opensm.init:CONFIG=${prefix}/etc/sysconfig/opensm.conf > This doesn't look consistent to me. What is not consistent? It is all depends from ./configure's options --prefix, --sysconfdir, etc. > I know RedHat wants things in > certain places. Yes, they are running with "prefix=/usr", etc.. > Shouldn't that be documented somewhere ? Also, what > about sldd ? Is that for RedHat or general ? I don't know much about who is using sldd.sh. 
Currently it is installed by RPM for any distro. Sasha From sashak at voltaire.com Tue Sep 30 14:04:03 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 1 Oct 2008 00:04:03 +0300 Subject: [ofa-general] ***SPAM*** ibdm network topology format In-Reply-To: References: <829ded920809290139vf2cc151w4cc8a6fafb49edfe@mail.gmail.com> <829ded920809292304k3ffc78c0m556efbdd7d35c528@mail.gmail.com> <20080930121252.GA7396@sashak.voltaire.com> <20080930123444.GB7396@sashak.voltaire.com> Message-ID: <20080930210403.GG7396@sashak.voltaire.com> On 08:36 Tue 30 Sep , Hal Rosenstock wrote: > On Tue, Sep 30, 2008 at 8:34 AM, Sasha Khapyorsky wrote: > > On 08:21 Tue 30 Sep , Hal Rosenstock wrote: > >> > >> Won't it work from a switch port 0 then too ? Shouldn't it work from > >> any end port ? > > > > I would expect that it should work, but it doesn't (from switch port 0). > > I didn't see why. > > Sounds like a bug :-( Was it entered into bugzilla so it can be tracked ? Not by me - I didn't check it with real switch, only with ibsim. Sasha From sashak at voltaire.com Tue Sep 30 14:15:06 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 1 Oct 2008 00:15:06 +0300 Subject: [ofa-general] Re: [PATCH][TRIVIAL] OpenSM: More man and doc changes for opensm.conf In-Reply-To: <48E224C9.4020206@obsidianresearch.com> References: <48E224C9.4020206@obsidianresearch.com> Message-ID: <20080930211506.GH7396@sashak.voltaire.com> On 07:08 Tue 30 Sep , Hal Rosenstock wrote: > Sasha, > > Attached are some more man and doc changes for change to opensm.conf from > opensm.opts > > -- Hal > > > More changes for opensm.conf (rather than opensm.opts) > > Signed-off-by: Hal Rosenstock Applied. Thanks. Sasha From hal.rosenstock at gmail.com Tue Sep 30 14:18:14 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Tue, 30 Sep 2008 17:18:14 -0400 Subject: [ofa-general] ***SPAM*** Re: OpenSM script inconsistency ? 
In-Reply-To: <20080930210235.GF7396@sashak.voltaire.com> References: <20080930210235.GF7396@sashak.voltaire.com> Message-ID: Sasha, On Tue, Sep 30, 2008 at 5:02 PM, Sasha Khapyorsky wrote: > Hi Hal, > > On 09:13 Tue 30 Sep , Hal Rosenstock wrote: >> >> When using the defaults for an opensm configure, >> >> include/config.h:#define HAVE_DEFAULT_OPENSM_CONFIG_FILE "/usr/local/etc/opensm/ >> opensm.conf" > > Right, when prefix is '/usr/local'. Isn't that the default prefix ? >> but: >> scripts/redhat-opensm.init.in:# config: @sysconfdir@/sysconfig/opensm.conf >> scripts/redhat-opensm.init.in:CONFIG=@sysconfdir@/sysconfig/opensm.conf >> scripts/sldd.sh.in:# config: @sysconfdir@/sysconfig/opensm.conf >> scripts/sldd.sh.in:[ -f @sysconfdir@/sysconfig/opensm.conf ] && CONFIG=@sysconfd >> ir@/sysconfig/opensm.conf > > yes, and: > > scripts/redhat-opensm.init:# config: ${prefix}/etc/sysconfig/opensm.conf > scripts/redhat-opensm.init:prefix=/usr/local > scripts/redhat-opensm.init:CONFIG=${prefix}/etc/sysconfig/opensm.conf > >> This doesn't look consistent to me. > > What is not consistent? It is all depends from ./configure's options > --prefix, --sysconfdir, etc. They are not consistent with defaults. Isn't opensm.conf in .../etc/opensm/ by default (not .../etc/sysconfig/) ? >> I know RedHat wants things in >> certain places. > > Yes, they are running with "prefix=/usr", etc.. That's the prefix part and not the other pieces. >> Shouldn't that be documented somewhere ? Also, what >> about sldd ? Is that for RedHat or general ? > > I don't know much about who is using sldd.sh. Currently it is installed > by RPM for any distro. Looks to me to be consistent with RedHat script but not defaults. -- Hal > Sasha > From sashak at voltaire.com Tue Sep 30 14:29:25 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 1 Oct 2008 00:29:25 +0300 Subject: [ofa-general] Re: OpenSM script inconsistency ? 
In-Reply-To: References: <20080930210235.GF7396@sashak.voltaire.com> Message-ID: <20080930212925.GI7396@sashak.voltaire.com> On 17:18 Tue 30 Sep , Hal Rosenstock wrote: > >> > >> include/config.h:#define HAVE_DEFAULT_OPENSM_CONFIG_FILE "/usr/local/etc/opensm/ > >> opensm.conf" > > > > Right, when prefix is '/usr/local'. > > Isn't that the default prefix ? Yes, it is. > >> but: > >> scripts/redhat-opensm.init.in:# config: @sysconfdir@/sysconfig/opensm.conf > >> scripts/redhat-opensm.init.in:CONFIG=@sysconfdir@/sysconfig/opensm.conf > >> scripts/sldd.sh.in:# config: @sysconfdir@/sysconfig/opensm.conf > >> scripts/sldd.sh.in:[ -f @sysconfdir@/sysconfig/opensm.conf ] && CONFIG=@sysconfd > >> ir@/sysconfig/opensm.conf > > > > yes, and: > > > > scripts/redhat-opensm.init:# config: ${prefix}/etc/sysconfig/opensm.conf > > scripts/redhat-opensm.init:prefix=/usr/local > > scripts/redhat-opensm.init:CONFIG=${prefix}/etc/sysconfig/opensm.conf > > > >> This doesn't look consistent to me. > > > > What is not consistent? It is all depends from ./configure's options > > --prefix, --sysconfdir, etc. > > They are not consistent with defaults. Isn't opensm.conf in > .../etc/opensm/ by default (not .../etc/sysconfig/) ? I see. In this script "sysconfig/opensm.conf" is not OpenSM config file, but legacy script/config file for this particular script (which is optional too). > >> I know RedHat wants things in > >> certain places. > > > > Yes, they are running with "prefix=/usr", etc.. > > That's the prefix part and not the other pieces. > > >> Shouldn't that be documented somewhere ? Also, what > >> about sldd ? Is that for RedHat or general ? > > > > I don't know much about who is using sldd.sh. Currently it is installed > > by RPM for any distro. > > Looks to me to be consistent with RedHat script but not defaults. Do you mean start_sldd(), etc functions in redhat script? Assuming so, why it should be same for any distro, or why it should be there at all? 
I don't know, I don't have any feedback from sldd.sh users.

Sasha

From rdreier at cisco.com Tue Sep 30 14:48:44 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 30 Sep 2008 14:48:44 -0700
Subject: [ofa-general] Re: [PATCH] RDMA/nes: nes_cm.c cleanup
In-Reply-To: <200809151958.m8FJw2sk012367@velma.neteffect.com> (Chien Tung's message of "Mon, 15 Sep 2008 14:58:02 -0500")
References: <200809151958.m8FJw2sk012367@velma.neteffect.com>
Message-ID:

I applied this part:

> -struct nes_cm_node *mini_cm_connect(struct nes_cm_core *cm_core,
> +static struct nes_cm_node *mini_cm_connect(struct nes_cm_core *cm_core,

since that clearly makes sense, but I dropped:

> - struct nes_qp *nesqp;
> + struct nes_qp *nesqp = NULL;

and

> - u16 mpa_frame_size = sizeof(struct ietf_mpa_frame) + private_data_len;
> + u16 mpa_frame_size = 0;

> + mpa_frame_size = sizeof(struct ietf_mpa_frame) +
> + private_data_len;

since I don't see any point to those transformations.

From rdreier at cisco.com Tue Sep 30 14:50:04 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 30 Sep 2008 14:50:04 -0700
Subject: [ofa-general] [PATCH] RDMA/nes: 4 port 1G HP blade card support
In-Reply-To: <200809151736.m8FHaroS010450@velma.neteffect.com> (Chien Tung's message of "Mon, 15 Sep 2008 12:36:53 -0500")
References: <200809151736.m8FHaroS010450@velma.neteffect.com>
Message-ID:

thanks, applied.
From rdreier at cisco.com Tue Sep 30 14:51:42 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 30 Sep 2008 14:51:42 -0700
Subject: [ofa-general] Re: [PATCH] iw_cxgb3: populate active_mtu in ib_port_attr
In-Reply-To: <48D80C88.5080001@opengridcomputing.com> (Steve Wise's message of "Mon, 22 Sep 2008 16:22:16 -0500")
References: <20080922204330.GA3943@opengridcomputing.com> <48D80C88.5080001@opengridcomputing.com>
Message-ID:

thanks, applied

From rdreier at cisco.com Tue Sep 30 15:09:05 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 30 Sep 2008 15:09:05 -0700
Subject: [ofa-general] [PATCH][TRIVIAL]mad.c: Need parens to kmalloc correct amount of memory
In-Reply-To: <1221788975.5804.49.camel@hhash-dev> (Haven Hash's message of "Thu, 18 Sep 2008 18:49:35 -0700")
References: <1221788975.5804.49.camel@hhash-dev>
Message-ID:

> I assume this has never been a problem because the malloc will probably
> word align the allocation, but maybe it was desired?
>
> Potential patch attached.
>
>
> Haven Hash
> haven.hash at isilon.com-

Looks correct to me... can you send a "Signed-off-by:" line for this patch?
From rdreier at cisco.com Tue Sep 30 15:37:54 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 30 Sep 2008 15:37:54 -0700
Subject: [ofa-general] Re: [PATCH 13/13] RDMA/nes: Enhanced PFT management scheme
In-Reply-To: <200809262008.m8QK8A2B011727@velma.neteffect.com> (Chien Tung's message of "Fri, 26 Sep 2008 15:08:10 -0500")
References: <200809262008.m8QK8A2B011727@velma.neteffect.com>
Message-ID:

thanks, applied all 13

From sashak at voltaire.com Tue Sep 30 18:19:20 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Wed, 1 Oct 2008 04:19:20 +0300
Subject: [ofa-general] [PATCH v2] opensm: routing chaining
In-Reply-To: <20080928204244.GH25831@sashak.voltaire.com>
References: <1221506448.6274.32.camel@cardanus.llnl.gov> <20080928202648.GG25831@sashak.voltaire.com> <20080928204244.GH25831@sashak.voltaire.com>
Message-ID: <20081001011920.GJ7396@sashak.voltaire.com>

From: Albert Chu

Routing chaining is the ability to configure the order in which routing
algorithms are applied in opensm, e.g.

-R ftree,updn,minhop

Try using ftree routing. If ftree fails, try updn. If updn fails, try
minhop.

In order to get this done, some rearchitecture of the routing code had to
be done because there is no longer an assumption that only one routing
engine can be specified.

Always set up a routing engine, assume no default "fallthrough" minhop
routing engine. On configured routing engine failure, do minhop as a
last resort.

Stick a *next pointer into struct osm_routing_engine. Rearchitect
routing engine usage as a list instead of a single struct.

Signed-off-by: Sasha Khapyorsky
---

The difference from the previous version is proper 'is_dor' flag handling
in the dor routing engine.
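
The fallback order described above can be sketched in a few lines of standalone C. This is only an illustration under stated assumptions, not the OpenSM code: struct engine, struct engine_module, engine_append(), engines_from_names(), route_with_fallback(), engines_free() and the two stub build functions are hypothetical names, and which engine "fails" is hard-coded for the demo.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for struct osm_routing_engine. */
struct engine {
    const char *name;
    int (*build)(void *context);    /* returns 0 on success */
    void *context;
    struct engine *next;            /* next engine to try */
};

/* Demo stubs: pretend ftree cannot route this fabric, the others can. */
static int stub_fail(void *context) { (void)context; return -1; }
static int stub_ok(void *context) { (void)context; return 0; }

struct engine_module {
    const char *name;
    int (*build)(void *context);
};

static const struct engine_module modules[] = {
    {"ftree", stub_fail},
    {"updn", stub_ok},
    {"minhop", stub_ok},
    {NULL, NULL}
};

/* Append at the tail so engines are tried in the order they were named. */
static void engine_append(struct engine **list, struct engine *e)
{
    assert(list && e);
    e->next = NULL;
    while (*list)
        list = &(*list)->next;
    *list = e;
}

/* Build the chain from a string such as "ftree,updn,minhop".
 * Unknown names are skipped. */
static struct engine *engines_from_names(const char *names)
{
    static const char sep[] = ", \t\n";
    struct engine *list = NULL;

    while (names && *names) {
        const struct engine_module *m;
        size_t len;

        names += strspn(names, sep);
        len = strcspn(names, sep);
        if (!len)
            break;
        for (m = modules; m->name; m++) {
            if (strlen(m->name) == len && !strncmp(m->name, names, len)) {
                struct engine *e = calloc(1, sizeof(*e));
                if (e) {
                    e->name = m->name;
                    e->build = m->build;
                    engine_append(&list, e);
                }
                break;
            }
        }
        names += len;
    }
    return list;
}

/* Try each engine in turn; report minhop if every configured one fails. */
static const char *route_with_fallback(const struct engine *list)
{
    const struct engine *e;

    for (e = list; e; e = e->next)
        if (e->build(e->context) == 0)
            return e->name;
    return "minhop";
}

/* Release every node of the chain. */
static void engines_free(struct engine *list)
{
    while (list) {
        struct engine *next = list->next;
        free(list);
        list = next;
    }
}
```

With these stubs, a chain built from "ftree,updn,minhop" is routed by updn, and a chain containing only "ftree" falls through to minhop. The tail-append keeps the order in which the names were given, which is what makes the comma-separated list an ordered preference.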
opensm/include/opensm/osm_opensm.h | 10 ++- opensm/include/opensm/osm_subnet.h | 7 +- opensm/include/opensm/osm_ucast_mgr.h | 2 +- opensm/man/opensm.8.in | 8 +- opensm/opensm/main.c | 10 ++- opensm/opensm/osm_opensm.c | 121 +++++++++++++++++++++--------- opensm/opensm/osm_subnet.c | 11 ++- opensm/opensm/osm_ucast_file.c | 19 ++--- opensm/opensm/osm_ucast_ftree.c | 35 +++------ opensm/opensm/osm_ucast_lash.c | 16 ++-- opensm/opensm/osm_ucast_mgr.c | 132 ++++++++++++++++++++++----------- opensm/opensm/osm_ucast_updn.c | 10 +- 12 files changed, 239 insertions(+), 142 deletions(-) diff --git a/opensm/include/opensm/osm_opensm.h b/opensm/include/opensm/osm_opensm.h index 5d45724..c121be4 100644 --- a/opensm/include/opensm/osm_opensm.h +++ b/opensm/include/opensm/osm_opensm.h @@ -126,6 +126,7 @@ struct osm_routing_engine { int (*ucast_build_fwd_tables) (void *context); void (*ucast_dump_tables) (void *context); void (*delete) (void *context); + struct osm_routing_engine *next; }; /* * FIELDS @@ -148,6 +149,9 @@ struct osm_routing_engine { * delete * The delete method, may be used for routing engine * internals cleanup. +* +* next +* Pointer to next routing engine in the list. */ /****s* OpenSM: OpenSM/osm_opensm_t @@ -178,7 +182,7 @@ typedef struct osm_opensm { osm_log_t log; cl_dispatcher_t disp; cl_plock_t lock; - struct osm_routing_engine routing_engine; + struct osm_routing_engine *routing_engine_list; osm_routing_engine_type_t routing_engine_used; osm_stats_t stats; osm_console_t console; @@ -221,8 +225,8 @@ typedef struct osm_opensm { * lock * Shared lock guarding most OpenSM structures. * -* routing_engine -* Routing engine; will be initialized then used. +* routing_engine_list +* List of routing engines that should be tried for use. * * routing_engine_used * Indicates which routing engine was used to route a subnet. 
diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h index f90f7ea..0c7f3b9 100644 --- a/opensm/include/opensm/osm_subnet.h +++ b/opensm/include/opensm/osm_subnet.h @@ -182,7 +182,7 @@ typedef struct osm_subn_opt { char *port_prof_ignore_file; boolean_t port_profile_switch_nodes; boolean_t sweep_on_trap; - char *routing_engine_name; + char *routing_engine_names; boolean_t connect_roots; char *lid_matrix_dump_file; char *lfts_file; @@ -353,9 +353,8 @@ typedef struct osm_subn_opt { * sweep_on_trap * Received traps will initiate a new sweep. * -* routing_engine_name -* Name of used routing engine -* (other than default Min Hop Algorithm) +* routing_engine_names +* Name of routing engine(s) to use. * * connect_roots * The option which will enforce root to root connectivity with diff --git a/opensm/include/opensm/osm_ucast_mgr.h b/opensm/include/opensm/osm_ucast_mgr.h index 1dc9a37..59ba9fa 100644 --- a/opensm/include/opensm/osm_ucast_mgr.h +++ b/opensm/include/opensm/osm_ucast_mgr.h @@ -264,7 +264,7 @@ osm_ucast_mgr_set_fwd_table(IN osm_ucast_mgr_t * const p_mgr, * * SYNOPSIS */ -void osm_ucast_mgr_build_lid_matrices(IN osm_ucast_mgr_t * const p_mgr); +int osm_ucast_mgr_build_lid_matrices(IN osm_ucast_mgr_t * const p_mgr); /* * PARAMETERS * p_mgr diff --git a/opensm/man/opensm.8.in b/opensm/man/opensm.8.in index 13d9a32..c1ea584 100644 --- a/opensm/man/opensm.8.in +++ b/opensm/man/opensm.8.in @@ -9,7 +9,7 @@ opensm \- InfiniBand subnet manager and administration (SM/SA) [\-F | \-\-config ] [\-c(reate-config) ] [\-g(uid) ] [\-l(mc) ] [\-p(riority) ] [\-smkey ] [\-r(eassign_lids)] -[\-R | \-\-routing_engine ] +[\-R | \-\-routing_engine ] [\-z | \-\-connect_roots] [\-M | \-\-lid_matrix_file ] [\-U | \-\-lfts_file ] @@ -116,8 +116,10 @@ Without -r, OpenSM attempts to preserve existing LID assignments resolving multiple use of same LID. 
.TP \fB\-R\fR, \fB\-\-routing_engine\fR -This option chooses routing engine instead of Min Hop -algorithm (default). +This option chooses routing engine(s) to use instead of Min Hop +algorithm (default). Multiple routing engines can be specified +separated by commas so that specific ordering of routing algorithms +will be tried if earlier routing engines fail. Supported engines: minhop, updn, file, ftree, lash, dor .TP \fB\-z\fR, \fB\-\-connect_roots\fR diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c index 01bfddf..2f53157 100644 --- a/opensm/opensm/main.c +++ b/opensm/opensm/main.c @@ -177,8 +177,10 @@ static void show_usage(void) " LID assignments resolving multiple use of same LID.\n\n"); printf("-R\n" "--routing_engine \n" - " This option chooses routing engine instead of Min Hop\n" - " algorithm (default).\n" + " This option chooses routing engine(s) to use instead of default\n" + " Min Hop algorithm. Multiple routing engines can be specified\n" + " separated by commas so that specific ordering of routing\n" + " algorithms will be tried if earlier routing engines fail.\n" " Supported engines: updn, file, ftree, lash, dor\n\n"); printf("-z\n" "--connect_roots\n" @@ -851,8 +853,8 @@ int main(int argc, char *argv[]) break; case 'R': - opt.routing_engine_name = optarg; - printf(" Activate \'%s\' routing engine\n", optarg); + opt.routing_engine_names = optarg; + printf(" Activate \'%s\' routing engine(s)\n", optarg); break; case 'z': diff --git a/opensm/opensm/osm_opensm.c b/opensm/opensm/osm_opensm.c index d17fed3..4970d0c 100644 --- a/opensm/opensm/osm_opensm.c +++ b/opensm/opensm/osm_opensm.c @@ -61,24 +61,23 @@ struct routing_engine_module { const char *name; - int (*setup) (osm_opensm_t * p_osm); + int (*setup) (struct osm_routing_engine *, osm_opensm_t *); }; -extern int osm_ucast_updn_setup(osm_opensm_t * p_osm); -extern int osm_ucast_file_setup(osm_opensm_t * p_osm); -extern int osm_ucast_ftree_setup(osm_opensm_t * p_osm); -extern int 
osm_ucast_lash_setup(osm_opensm_t * p_osm); - -static int osm_ucast_null_setup(osm_opensm_t * p_osm); +extern int osm_ucast_minhop_setup(struct osm_routing_engine *, osm_opensm_t *); +extern int osm_ucast_updn_setup(struct osm_routing_engine *, osm_opensm_t *); +extern int osm_ucast_file_setup(struct osm_routing_engine *, osm_opensm_t *); +extern int osm_ucast_ftree_setup(struct osm_routing_engine *, osm_opensm_t *); +extern int osm_ucast_lash_setup(struct osm_routing_engine *, osm_opensm_t *); +extern int osm_ucast_dor_setup(struct osm_routing_engine *, osm_opensm_t *); const static struct routing_engine_module routing_modules[] = { - {"null", osm_ucast_null_setup}, - {"minhop", osm_ucast_null_setup}, + {"minhop", osm_ucast_minhop_setup}, {"updn", osm_ucast_updn_setup}, {"file", osm_ucast_file_setup}, {"ftree", osm_ucast_ftree_setup}, {"lash", osm_ucast_lash_setup}, - {"dor", osm_ucast_null_setup}, + {"dor", osm_ucast_dor_setup}, {NULL, NULL} }; @@ -135,33 +134,77 @@ osm_routing_engine_type_t osm_routing_engine_type(IN const char *str) /********************************************************************** **********************************************************************/ -static int setup_routing_engine(osm_opensm_t * p_osm, const char *name) +static void append_routing_engine(osm_opensm_t *osm, + struct osm_routing_engine *routing_engine) { - const struct routing_engine_module *r; + struct osm_routing_engine *r; + + routing_engine->next = NULL; + + if (!osm->routing_engine_list) { + osm->routing_engine_list = routing_engine; + return; + } + + r = osm->routing_engine_list; + while (r->next) + r = r->next; - for (r = routing_modules; r->name && *r->name; r++) { - if (!strcmp(r->name, name)) { - p_osm->routing_engine.name = r->name; - if (r->setup(p_osm)) { - OSM_LOG(&p_osm->log, OSM_LOG_VERBOSE, + r->next = routing_engine; +} + +static void setup_routing_engine(osm_opensm_t *osm, const char *name) +{ + struct osm_routing_engine *re; + const struct 
routing_engine_module *m; + + for (m = routing_modules; m->name && *m->name; m++) { + if (!strcmp(m->name, name)) { + re = malloc(sizeof(struct osm_routing_engine)); + if (!re) { + OSM_LOG(&osm->log, OSM_LOG_VERBOSE, + "memory allocation failed\n"); + return; + } + memset(re, 0, sizeof(struct osm_routing_engine)); + + re->name = m->name; + if (m->setup(re, osm)) { + OSM_LOG(&osm->log, OSM_LOG_VERBOSE, "setup of routing" " engine \'%s\' failed\n", name); - return -2; + return; } - OSM_LOG(&p_osm->log, OSM_LOG_DEBUG, - "\'%s\' routing engine set up\n", - p_osm->routing_engine.name); - return 0; + OSM_LOG(&osm->log, OSM_LOG_DEBUG, + "\'%s\' routing engine set up\n", re->name); + append_routing_engine(osm, re); + return; } } - return -1; + + OSM_LOG(&osm->log, OSM_LOG_ERROR, + "cannot find or setup routing engine \'%s\'", name); } -static int osm_ucast_null_setup(osm_opensm_t * p_osm) +static void setup_routing_engines(osm_opensm_t *osm, const char *engine_names) { - OSM_LOG(&p_osm->log, OSM_LOG_VERBOSE, - "nothing yet - using default (minhop) routing engine\n"); - return 0; + char *name, *str, *p; + + if (!engine_names || !*engine_names) { + setup_routing_engine(osm, "minhop"); + return; + } + + str = strdup(engine_names); + name = strtok_r(str, ", \t\n", &p); + while (name && *name) { + setup_routing_engine(osm, name); + name = strtok_r(NULL, ", \t\n", &p); + } + free(str); + + if (!osm->routing_engine_list) + setup_routing_engine(osm, "minhop"); } /********************************************************************** @@ -181,6 +224,20 @@ void osm_opensm_construct(IN osm_opensm_t * const p_osm) /********************************************************************** **********************************************************************/ +static void destroy_routing_engines(osm_opensm_t *osm) +{ + struct osm_routing_engine *r, *next; + + next = osm->routing_engine_list; + while (next) { + r = next; + next = r->next; + if (r->delete) + r->delete(r->context); + free(r); 
+ } +} + void osm_opensm_destroy(IN osm_opensm_t * const p_osm) { /* in case of shutdown through exit proc - no ^C */ @@ -218,8 +275,7 @@ void osm_opensm_destroy(IN osm_opensm_t * const p_osm) osm_sa_db_file_dump(p_osm); /* do the destruction in reverse order as init */ - if (p_osm->routing_engine.delete) - p_osm->routing_engine.delete(p_osm->routing_engine.context); + destroy_routing_engines(p_osm); osm_sa_destroy(&p_osm->sa); osm_sm_destroy(&p_osm->sm); #ifdef ENABLE_OSM_PERF_MGR @@ -371,12 +427,7 @@ osm_opensm_init(IN osm_opensm_t * const p_osm, goto Exit; #endif /* ENABLE_OSM_PERF_MGR */ - if (p_opt->routing_engine_name && - setup_routing_engine(p_osm, p_opt->routing_engine_name)) - OSM_LOG(&p_osm->log, OSM_LOG_VERBOSE, - "cannot find or setup routing engine" - " \'%s\'. Default will be used instead\n", - p_opt->routing_engine_name); + setup_routing_engines(p_osm, p_opt->routing_engine_names); p_osm->routing_engine_used = OSM_ROUTING_ENGINE_TYPE_NONE; diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index 278aa3d..a39ce75 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -442,7 +442,7 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * const p_opt) p_opt->port_prof_ignore_file = NULL; p_opt->port_profile_switch_nodes = FALSE; p_opt->sweep_on_trap = TRUE; - p_opt->routing_engine_name = NULL; + p_opt->routing_engine_names = NULL; p_opt->connect_roots = FALSE; p_opt->lid_matrix_dump_file = NULL; p_opt->lfts_file = NULL; @@ -1264,7 +1264,7 @@ int osm_subn_parse_conf_file(char *file_name, osm_subn_opt_t * const p_opts) p_key, p_val, &p_opts->sweep_on_trap); opts_unpack_charp("routing_engine", - p_key, p_val, &p_opts->routing_engine_name); + p_key, p_val, &p_opts->routing_engine_names); opts_unpack_boolean("connect_roots", p_key, p_val, &p_opts->connect_roots); @@ -1521,9 +1521,12 @@ int osm_subn_write_conf_file(char *file_name, IN osm_subn_opt_t *const p_opts) fprintf(opts_file, "# Routing engine\n" + "# Multiple 
routing engines can be specified separated by\n" + "# commas so that specific ordering of routing algorithms will\n" + "# be tried if earlier routing engines fail.\n" "# Supported engines: minhop, updn, file, ftree, lash, dor\n" - "routing_engine %s\n\n", p_opts->routing_engine_name ? - p_opts->routing_engine_name : null_str); + "routing_engine %s\n\n", p_opts->routing_engine_names ? + p_opts->routing_engine_names : null_str); fprintf(opts_file, "# Connect roots (use FALSE if unsure)\n" diff --git a/opensm/opensm/osm_ucast_file.c b/opensm/opensm/osm_ucast_file.c index 3d00cb2..cbd65c1 100644 --- a/opensm/opensm/osm_ucast_file.c +++ b/opensm/opensm/osm_ucast_file.c @@ -135,14 +135,13 @@ static int do_ucast_file_load(void *context) OSM_LOG(&p_osm->log, OSM_LOG_VERBOSE, "LFTs file name is not given; " "using default routing algorithm\n"); - return -1; + return 1; } file = fopen(file_name, "r"); if (!file) { OSM_LOG(&p_osm->log, OSM_LOG_ERROR | OSM_LOG_SYS, "ERR 6302: " - "cannot open ucast dump file \'%s\'; " - "using default routing algorithm\n", file_name); + "cannot open ucast dump file \'%s\': %m\n", file_name); return -1; } @@ -270,15 +269,13 @@ static int do_lid_matrix_file_load(void *context) OSM_LOG(&p_osm->log, OSM_LOG_VERBOSE, "lid matrix file name is not given; " "using default lid matrix generation algorithm\n"); - return -1; + return 1; } file = fopen(file_name, "r"); if (!file) { OSM_LOG(&p_osm->log, OSM_LOG_ERROR | OSM_LOG_SYS, "ERR 6305: " - "cannot open lid matrix file \'%s\'; " - "using default lid matrix generation algorithm\n", - file_name); + "cannot open lid matrix file \'%s\': %m\n", file_name); return -1; } @@ -389,10 +386,10 @@ static int do_lid_matrix_file_load(void *context) return 0; } -int osm_ucast_file_setup(osm_opensm_t * p_osm) +int osm_ucast_file_setup(struct osm_routing_engine *r, osm_opensm_t *osm) { - p_osm->routing_engine.context = (void *)p_osm; - p_osm->routing_engine.build_lid_matrices = do_lid_matrix_file_load; - 
p_osm->routing_engine.ucast_build_fwd_tables = do_ucast_file_load; + r->context = osm; + r->build_lid_matrices = do_lid_matrix_file_load; + r->ucast_build_fwd_tables = do_ucast_file_load; return 0; } diff --git a/opensm/opensm/osm_ucast_ftree.c b/opensm/opensm/osm_ucast_ftree.c index 1d3233c..15168b7 100644 --- a/opensm/opensm/osm_ucast_ftree.c +++ b/opensm/opensm/osm_ucast_ftree.c @@ -3552,8 +3552,7 @@ static int __osm_ftree_construct_fabric(IN void *context) OSM_LOG(&p_ftree->p_osm->log, OSM_LOG_VERBOSE, "Ranking FatTree\n"); if (__osm_ftree_fabric_rank(p_ftree) != 0) { osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, - "Failed ranking the tree - " - "fat-tree routing falls back to default routing\n"); + "Failed ranking the tree\n"); status = -1; goto Exit; } @@ -3567,14 +3566,12 @@ static int __osm_ftree_construct_fabric(IN void *context) "Populating CA & switch ports\n"); if (__osm_ftree_fabric_populate_ports(p_ftree) != 0) { osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, - "Fabric topology is not a fat-tree - " - "routing falls back to default routing\n"); + "Fabric topology is not a fat-tree\n"); status = -1; goto Exit; } else if (p_ftree->cn_num == 0) { osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, - "Fabric has no valid compute nodes - " - "routing falls back to default routing\n"); + "Fabric has no valid compute nodes\n"); status = -1; goto Exit; } @@ -3586,8 +3583,7 @@ static int __osm_ftree_construct_fabric(IN void *context) if (__osm_ftree_fabric_get_rank(p_ftree) > FAT_TREE_MAX_RANK || __osm_ftree_fabric_get_rank(p_ftree) < FAT_TREE_MIN_RANK) { osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, - "Fabric rank is %u (should be between %u and %u) - " - "fat-tree routing falls back to default routing\n", + "Fabric rank is %u (should be between %u and %u)\n", __osm_ftree_fabric_get_rank(p_ftree), FAT_TREE_MIN_RANK, FAT_TREE_MAX_RANK); status = -1; @@ -3600,8 +3596,7 @@ static int __osm_ftree_construct_fabric(IN void *context) validation - it checks that all the CNs are at the 
same rank. */ if (__osm_ftree_fabric_mark_leaf_switches(p_ftree)) { osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, - "Fabric topology is not a fat-tree - " - "routing falls back to default routing\n"); + "Fabric topology is not a fat-tree\n"); status = -1; goto Exit; } @@ -3619,8 +3614,7 @@ static int __osm_ftree_construct_fabric(IN void *context) In any case, the first and the last switches in the array are REAL leafs. */ if (__osm_ftree_fabric_create_leaf_switch_array(p_ftree)) { osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, - "Fabric topology is not a fat-tree - " - "routing falls back to default routing\n"); + "Fabric topology is not a fat-tree\n"); status = -1; goto Exit; } @@ -3640,8 +3634,7 @@ static int __osm_ftree_construct_fabric(IN void *context) if (!__osm_ftree_fabric_roots_provided(p_ftree) && !__osm_ftree_fabric_validate_topology(p_ftree)) { osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, - "Fabric topology is not a fat-tree - " - "routing falls back to default routing\n"); + "Fabric topology is not a fat-tree\n"); status = -1; goto Exit; } @@ -3726,7 +3719,7 @@ static void __osm_ftree_delete(IN void *context) /*************************************************** ***************************************************/ -int osm_ucast_ftree_setup(osm_opensm_t * p_osm) +int osm_ucast_ftree_setup(struct osm_routing_engine *r, osm_opensm_t * p_osm) { ftree_fabric_t *p_ftree = __osm_ftree_fabric_create(); if (!p_ftree) @@ -3734,12 +3727,10 @@ int osm_ucast_ftree_setup(osm_opensm_t * p_osm) p_ftree->p_osm = p_osm; - p_osm->routing_engine.context = (void *)p_ftree; - p_osm->routing_engine.build_lid_matrices = __osm_ftree_construct_fabric; - p_osm->routing_engine.ucast_build_fwd_tables = __osm_ftree_do_routing; - p_osm->routing_engine.delete = __osm_ftree_delete; + r->context = (void *)p_ftree; + r->build_lid_matrices = __osm_ftree_construct_fabric; + r->ucast_build_fwd_tables = __osm_ftree_do_routing; + r->delete = __osm_ftree_delete; + return 0; } - 
-/*************************************************** - ***************************************************/ diff --git a/opensm/opensm/osm_ucast_lash.c b/opensm/opensm/osm_ucast_lash.c index b985e9a..ce3982f 100644 --- a/opensm/opensm/osm_ucast_lash.c +++ b/opensm/opensm/osm_ucast_lash.c @@ -785,7 +785,7 @@ static int init_lash_structures(lash_t * p_lash) unsigned vl_min = p_lash->vl_min; unsigned num_switches = p_lash->num_switches; osm_log_t *p_log = &p_lash->p_osm->log; - int status = IB_SUCCESS; + int status = 0; unsigned int i, j, k; OSM_LOG_ENTER(p_log); @@ -852,7 +852,7 @@ static int init_lash_structures(lash_t * p_lash) goto Exit; Exit_Mem_Error: - status = IB_ERROR; + status = -1; OSM_LOG(p_log, OSM_LOG_ERROR, "ERR 4D01: " "Could not allocate required memory for LASH errno %d, errno %d for lack of memory\n", errno, ENOMEM); @@ -875,7 +875,7 @@ static int lash_core(lash_t * p_lash) int stop = 0, output_link, i_next_switch; int output_link2, i_next_switch2; int cycle_found2 = 0; - int status = IB_SUCCESS; + int status = 0; int *switch_bitmap = NULL; /* Bitmap to check if we have processed this pair */ OSM_LOG_ENTER(p_log); @@ -1028,7 +1028,7 @@ static int lash_core(lash_t * p_lash) goto Exit; Error_Not_Enough_Lanes: - status = IB_ERROR; + status = -1; OSM_LOG(p_log, OSM_LOG_ERROR, "ERR 4D02: " "Lane requirements (%d) exceed available lanes (%d)\n", p_lash->vl_min, lanes_needed); @@ -1360,15 +1360,15 @@ uint8_t osm_get_lash_sl(osm_opensm_t * p_osm, osm_port_t * p_src_port, return (uint8_t) ((switch_t *) p_sw->priv)->routing_table[dst_id].lane; } -int osm_ucast_lash_setup(osm_opensm_t * p_osm) +int osm_ucast_lash_setup(struct osm_routing_engine *r, osm_opensm_t *p_osm) { lash_t *p_lash = lash_create(p_osm); if (!p_lash) return -1; - p_osm->routing_engine.context = p_lash; - p_osm->routing_engine.ucast_build_fwd_tables = lash_process; - p_osm->routing_engine.delete = lash_delete; + r->context = p_lash; + r->ucast_build_fwd_tables = lash_process; + r->delete = 
lash_delete; return 0; } diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c index 9d0ad13..a4967fe 100644 --- a/opensm/opensm/osm_ucast_mgr.c +++ b/opensm/opensm/osm_ucast_mgr.c @@ -216,7 +216,6 @@ __osm_ucast_mgr_process_port(IN osm_ucast_mgr_t * const p_mgr, uint8_t port; boolean_t is_ignored_by_port_prof; ib_net64_t node_guid; - struct osm_routing_engine *p_routing_eng; unsigned start_from = 1; OSM_LOG_ENTER(p_mgr->p_log); @@ -253,8 +252,6 @@ __osm_ucast_mgr_process_port(IN osm_ucast_mgr_t * const p_mgr, node_guid = osm_node_get_node_guid(p_sw->p_node); - p_routing_eng = &p_mgr->p_subn->p_osm->routing_engine; - /* The lid matrix contains the number of hops to each lid from each port. From this information we determine @@ -269,18 +266,9 @@ __osm_ucast_mgr_process_port(IN osm_ucast_mgr_t * const p_mgr, /* do not try to overwrite the ppro of non existing port ... */ is_ignored_by_port_prof = TRUE; - /* Up/Down routing can cause unreachable routes between some - switches so we do not report that as an error in that case */ - if (!p_routing_eng->build_lid_matrices) { - OSM_LOG(p_mgr->p_log, OSM_LOG_ERROR, "ERR 3A08: " - "No path to get to LID %u from switch 0x%" - PRIx64 "\n", lid_ho, cl_ntoh64(node_guid)); - /* trigger a new sweep - try again ... 
*/ - p_mgr->p_subn->subnet_initialization_error = TRUE; - } else - OSM_LOG(p_mgr->p_log, OSM_LOG_DEBUG, - "No path to get to LID %u from switch 0x%" - PRIx64 "\n", lid_ho, cl_ntoh64(node_guid)); + OSM_LOG(p_mgr->p_log, OSM_LOG_DEBUG, + "No path to get to LID %u from switch 0x%" PRIx64 "\n", + lid_ho, cl_ntoh64(node_guid)); } else { osm_physp_t *p = osm_node_get_physp_ptr(p_sw->p_node, port); @@ -583,7 +571,7 @@ __osm_ucast_mgr_process_neighbors(IN cl_map_item_t * const p_map_item, /********************************************************************** **********************************************************************/ -void osm_ucast_mgr_build_lid_matrices(IN osm_ucast_mgr_t * const p_mgr) +int osm_ucast_mgr_build_lid_matrices(IN osm_ucast_mgr_t * const p_mgr) { uint32_t i; uint32_t iteration_max; @@ -646,6 +634,8 @@ void osm_ucast_mgr_build_lid_matrices(IN osm_ucast_mgr_t * const p_mgr) OSM_LOG(p_mgr->p_log, OSM_LOG_DEBUG, "Min-hop propagated in %d steps\n", i); } + + return 0; } /********************************************************************** @@ -752,7 +742,7 @@ static void clear_prof_ignore_flag(cl_map_item_t * const p_map_item, void *ctx) } } -static void ucast_mgr_build_lfts(osm_ucast_mgr_t *p_mgr) +static int ucast_mgr_build_lfts(osm_ucast_mgr_t *p_mgr) { cl_qlist_init(&p_mgr->port_order_list); @@ -786,27 +776,56 @@ static void ucast_mgr_build_lfts(osm_ucast_mgr_t *p_mgr) __osm_ucast_mgr_process_tbl, p_mgr); cl_qlist_remove_all(&p_mgr->port_order_list); + + return 0; } /********************************************************************** **********************************************************************/ +static int ucast_mgr_route(struct osm_routing_engine *r, osm_opensm_t *osm) +{ + int ret; + + OSM_LOG(&osm->log, OSM_LOG_VERBOSE, + "building routing with \'%s\' routing algorithm...\n", r->name); + + if (!r->build_lid_matrices || + (ret = r->build_lid_matrices(r->context)) > 0) + ret = osm_ucast_mgr_build_lid_matrices(&osm->sm.ucast_mgr); + 
+ if (ret < 0) { + OSM_LOG(&osm->log, OSM_LOG_ERROR, + "%s: cannot build lid matrices.\n", r->name); + return ret; + } + + if (!r->ucast_build_fwd_tables || + (ret = r->ucast_build_fwd_tables(r->context)) > 0) + ret = ucast_mgr_build_lfts(&osm->sm.ucast_mgr); + + if (ret < 0) { + OSM_LOG(&osm->log, OSM_LOG_ERROR, + "%s: cannot build fwd tables.\n", r->name); + return ret; + } + + osm->routing_engine_used = osm_routing_engine_type(r->name); + + return 0; +} + osm_signal_t osm_ucast_mgr_process(IN osm_ucast_mgr_t * const p_mgr) { osm_opensm_t *p_osm; struct osm_routing_engine *p_routing_eng; osm_signal_t signal = OSM_SIGNAL_DONE; cl_qmap_t *p_sw_guid_tbl; - int blm = 0; - int ubft = 0; OSM_LOG_ENTER(p_mgr->p_log); p_sw_guid_tbl = &p_mgr->p_subn->sw_guid_tbl; p_osm = p_mgr->p_subn->p_osm; - p_routing_eng = &p_osm->routing_engine; - - p_mgr->is_dor = p_routing_eng->name - && (strcmp(p_routing_eng->name, "dor") == 0); + p_routing_eng = p_osm->routing_engine_list; CL_PLOCK_EXCL_ACQUIRE(p_mgr->p_lock); @@ -819,28 +838,19 @@ osm_signal_t osm_ucast_mgr_process(IN osm_ucast_mgr_t * const p_mgr) p_mgr->any_change = FALSE; - if (!p_routing_eng->build_lid_matrices || - (blm = p_routing_eng->build_lid_matrices(p_routing_eng->context))) - osm_ucast_mgr_build_lid_matrices(p_mgr); + p_osm->routing_engine_used = OSM_ROUTING_ENGINE_TYPE_NONE; + while (p_routing_eng) { + if (!ucast_mgr_route(p_routing_eng, p_osm)) + break; + p_routing_eng = p_routing_eng->next; + } - /* - Now that the lid matrices have been built, we can - build and download the switch forwarding tables. 
- */ - if (!p_routing_eng->ucast_build_fwd_tables || - (ubft = - p_routing_eng->ucast_build_fwd_tables(p_routing_eng->context))) + if (p_osm->routing_engine_used == OSM_ROUTING_ENGINE_TYPE_NONE) { + /* If configured routing algorithm failed, use default MinHop */ + osm_ucast_mgr_build_lid_matrices(p_mgr); ucast_mgr_build_lfts(p_mgr); - - /* 'file' routing engine has one unique logic corner case */ - if (p_routing_eng->name && (strcmp(p_routing_eng->name, "file") == 0) - && (!blm || !ubft)) - p_osm->routing_engine_used = OSM_ROUTING_ENGINE_TYPE_FILE; - else if (!blm && !ubft) - p_osm->routing_engine_used = - osm_routing_engine_type(p_routing_eng->name); - else p_osm->routing_engine_used = OSM_ROUTING_ENGINE_TYPE_MINHOP; + } OSM_LOG(p_mgr->p_log, OSM_LOG_INFO, "%s tables configured on all switches\n", @@ -861,3 +871,41 @@ Exit: OSM_LOG_EXIT(p_mgr->p_log); return (signal); } + +static int ucast_build_lid_matrices(void *context) +{ + return osm_ucast_mgr_build_lid_matrices(context); +} + +static int ucast_build_lfts(void *context) +{ + return ucast_mgr_build_lfts(context); +} + +int osm_ucast_minhop_setup(struct osm_routing_engine *r, osm_opensm_t *osm) +{ + r->context = &osm->sm.ucast_mgr; + r->build_lid_matrices = ucast_build_lid_matrices; + r->ucast_build_fwd_tables = ucast_build_lfts; + return 0; +} + +static int ucast_dor_build_lfts(void *context) +{ + osm_ucast_mgr_t *mgr = context; + int ret; + + mgr->is_dor = 1; + ret = ucast_mgr_build_lfts(mgr); + mgr->is_dor = 0; + + return ret; +} + +int osm_ucast_dor_setup(struct osm_routing_engine *r, osm_opensm_t *osm) +{ + r->context = &osm->sm.ucast_mgr; + r->build_lid_matrices = ucast_build_lid_matrices; + r->ucast_build_fwd_tables = ucast_dor_build_lfts; + return 0; +} diff --git a/opensm/opensm/osm_ucast_updn.c b/opensm/opensm/osm_ucast_updn.c index 90e9af8..4fdcc78 100644 --- a/opensm/opensm/osm_ucast_updn.c +++ b/opensm/opensm/osm_ucast_updn.c @@ -643,7 +643,7 @@ static int __osm_updn_call(void *ctx) } else { 
 		OSM_LOG(&p_updn->p_osm->log, OSM_LOG_INFO,
 			"disabling UPDN algorithm, no root nodes were found\n");
-		ret = 1;
+		ret = -1;
 	}
 
 	if (osm_log_is_active(&p_updn->p_osm->log, OSM_LOG_ROUTING))
@@ -669,7 +669,7 @@ static void __osm_updn_delete(void *context)
 	free(context);
 }
 
-int osm_ucast_updn_setup(osm_opensm_t * p_osm)
+int osm_ucast_updn_setup(struct osm_routing_engine *r, osm_opensm_t *p_osm)
 {
 	updn_t *p_updn;
 
@@ -680,9 +680,9 @@ int osm_ucast_updn_setup(osm_opensm_t * p_osm)
 
 	p_updn->p_osm = p_osm;
 
-	p_osm->routing_engine.context = p_updn;
-	p_osm->routing_engine.delete = __osm_updn_delete;
-	p_osm->routing_engine.build_lid_matrices = __osm_updn_call;
+	r->context = p_updn;
+	r->delete = __osm_updn_delete;
+	r->build_lid_matrices = __osm_updn_call;
 
 	return 0;
 }
-- 
1.6.0.1.196.g01914

From keshetti.mahesh at gmail.com Tue Sep 30 21:38:31 2008
From: keshetti.mahesh at gmail.com (Keshetti Mahesh)
Date: Wed, 1 Oct 2008 10:08:31 +0530
Subject: [ofa-general] ***SPAM*** ibdm network topology format
In-Reply-To: <20080930121252.GA7396@sashak.voltaire.com>
References: <829ded920809290139vf2cc151w4cc8a6fafb49edfe@mail.gmail.com>
	<829ded920809292304k3ffc78c0m556efbdd7d35c528@mail.gmail.com>
	<20080930121252.GA7396@sashak.voltaire.com>
Message-ID: <829ded920809302138o39008b1x9d01fd7a4c4d8cc8@mail.gmail.com>

Thanks Sasha and Hal for your replies.

> I'm able to run ibdiagnet with ibsim. I need to export SIM_HOST
> environment variable so ibdiagnet will start from some host and not a
> switch (by default with ibsim application starts running from first
> switch in a fabric).

I'll try exporting "SIM_HOST" variable.
Do you want me to create a bug entry for ibdiagnet failure when started
from a switch in bugzilla ?

-Mahesh